Calculating Distances in SQL
Sooner or later when dealing with data that contains addresses, you’ll need to perform distance calculations. Some examples might be “which store is nearest to location X” or “display all hotels within 20 miles of the customer’s location sorted by distance”. This post deals with such calculations based on positions designated by latitude and longitude. The process of determining the latitude and longitude of an address is called geocoding and is outside the scope of this article.
Distance on a Sphere
In Euclidean geometry (i.e. on a plane), the distance between two points is simply the length of the line connecting them. However, the Earth is a (slightly irregular) sphere and a line drawn between two points on a sphere assumes you can tunnel through the planet. On a sphere, the shortest distance between two points is the length of the arc of the great circle that passes through both points. This gives distance “as the crow flies” which is usually what we need.
Assuming the earth is a perfect sphere is not perfect, as it actually bulges a bit at the equator, and of course this also conveniently ignores pesky terrain with altitude changes. However, the techniques discussed here provide reasonable distance estimates, usually within 1% of a more “correct” model. You can compensate somewhat for this deviation by using a value for the radius of the earth which corresponds to the latitude of most of your locations.
When using a spherical model of the earth, there are several great circle distance formulas to choose from. While they are all equivalent mathematically, each has different strengths and weaknesses when implemented as a computer algorithm.
-
and
are two geographic
locations where
is latitude and
is longitude
is the great circle distance between the locations
is the radius of the earth- arguments to trig functions need to be given in radians
means inverse cosine or arccos
means inverse tangent or arctan
The simplest formula is derived from the spherical law of cosines. It is the least complex but is also ill conditioned (meaning susceptible to rounding errors) for small distances.
Because of the potential errors introduced by using this formula when used with finite precision variables and small distances, a variation called the haversine formula has been traditionally used in navigation.

and the haversine formula can be derived directly from the spherical law of cosines. It is well conditioned for nearly all distances except for nearly antipodal locations (that is nearly opposite one another on the globe). Go figure.

where
Note that these formulas are mathematically equivalent and exact. It is only when dealing with numbers of finite precision that the rounding errors appear.
Formula Implementations in SQL
Which formula should you use for distance calculations? The answer is (as always) it depends on your needs. The spherical law of cosines degrades for small distances, though in informal testing I’ve found that the largest deviation I could generate was 10-9 (miles) over a total distance of a half mile. This is mostly due to the relatively large precision we can attain using the sql DOUBLE data type. The precision in my source data for latitude and longitude is much worse than this error, so I find the cosine formula to be acceptable for my uses. The haversine formula is more accurate for small distances, but is more computationally expensive, so if accuracy is more important than speed, then this is a good choice.
Here are the two formulas implemented as MySQL UDFs (user defined functions). I
use the built in RADIANS and DEGREES functions to convert between radians
and degrees, but you can also simply multiply by PI/180 (0.017453293) to
convert to radians and 180/PI (57.295780) to convert to degrees. These
functions as written return results in miles, but you can replace 3956.547 (the
average radius of the earth in miles) with 6367 (the same thing in kilometers)
to get the final distance in kilometers.
Distance using the spherical law of cosines:
DROP FUNCTION IF EXISTS fn_distance_cosine;
CREATE FUNCTION fn_distance_cosine (
latitude_1 DOUBLE,
longitude_1 DOUBLE,
latitude_2 DOUBLE,
longitude_2 DOUBLE
)
RETURNS DOUBLE
DETERMINISTIC
SQL SECURITY INVOKER
RETURN ACOS(
SIN(RADIANS(latitude_1)) * SIN(RADIANS(latitude_2))
+ COS(RADIANS(latitude_1)) * COS(RADIANS(latitude_2))
* COS(RADIANS(longitude_2 - longitude_1))
) * 3956.547;
Distance using the haversine formula:
DELIMITER //
DROP FUNCTION IF EXISTS fn_distance_haversine//
CREATE FUNCTION fn_distance_haversine (
latitude_1 DOUBLE,
longitude_1 DOUBLE,
latitude_2 DOUBLE,
longitude_2 DOUBLE
)
RETURNS DOUBLE
DETERMINISTIC
SQL SECURITY INVOKER
BEGIN
DECLARE partial DOUBLE;
SET partial := SIN(RADIANS(latitude_2 - latitude_1)*0.5) *
SIN(RADIANS(latitude_2 - latitude_1)*0.5) +
COS(RADIANS(latitude_1)) * COS(RADIANS(latitude_2)) *
SIN(RADIANS(longitude_2 - longitude_1)*0.5) *
SIN(RADIANS(longitude_2 - longitude_1)*0.5);
RETURN 2.0 * 3956.547 * ATAN(SQRT(partial), SQRT(1.0-partial));
END
//
DELIMITER ;
Benchmark results: notice the haversine version is roughly 60% slower
mysql> SELECT BENCHMARK(100000, fn_distance_cosine(39.39351,-76.611219,38.687371,-89.983306));
1 row in set (0.88 sec)
mysql> SELECT BENCHMARK(100000, fn_distance_haversine(39.39351,-76.611219,38.687371,-89.983306));
1 row in set (1.44 sec)
Calculating Distances Efficiently
Now that we have some formulas, we can jump into using them. Let’s say that we have a table containing store information, including latitude and longitude. We want to find all stores within 30 miles of a customer’s location and order them by distance, near to far.
Here is a table with some sample data. It contains five stores in San Francisco and five more in Los Angeles.
| store_id | name | latitude | longitude |
|---|---|---|---|
| 1 | SF 1 | 37.80118 | -122.47754 |
| 2 | SF 2 | 37.778483 | -122.391121 |
| 3 | SF 3 | 37.78237 | -122.46821 |
| 4 | SF 4 | 37.786816 | -122.39558 |
| 5 | SF 5 | 37.741009 | -122.405017 |
| 6 | LA 1 | 34.101202 | -118.328518 |
| 7 | LA 2 | 34.083529 | -118.348925 |
| 8 | LA 3 | 34.038557 | -118.268695 |
| 9 | LA 4 | 34.062725 | -118.348914 |
| 10 | LA 5 | 34.096877 | -118.328813 |
Let’s say that our hypothetical customer is searching from San Francisco at coordinates (37.81076, -122.47681). The query to find all stores within 30 miles is simple.
SELECT store_id, name, fn_distance_cosine(latitude, longitude, 37.81076, -122.47681) AS distance_in_miles FROM store WHERE fn_distance_cosine(latitude, longitude, 37.81076, -122.47681) <= 30 ORDER BY distance_in_miles;
Results:
| store_id | name | distance_in_miles |
|---|---|---|
| 1 | SF 1 | 0.662742576418294 |
| 3 | SF 3 | 2.01584700390573 |
| 4 | SF 4 | 4.73066680579462 |
| 2 | SF 2 | 5.17993180641564 |
| 5 | SF 5 | 6.20929598602107 |
This gives correct answers and is really simple, but there’s a problem. If the table had a realistic number of rows, the query will start to perform poorly, even if we put an index on the latitude and longitude fields. This is because MySQL is not able to utilize an index on a column if you are using a function in the expression with that column. So the server is forced to do a full table scan, apply the function to every single row, then weed out rows based on the results. This is especially bad if the two locations you are trying to compare are in separate tables. Then the server has to calculate the distance for every combination of rows from the two tables, which is effectively a CROSS JOIN and will quickly bog down with even moderately sized tables.
How do we improve this situation? The trick is to help the server rapidly discard rows which are too far away and only apply the complicated distance formula to rows which are close enough that they might be within the radius. We do this by calculating the minimum and maximum latitude and longitudes which could satisfy our distance condition. If you think of the distance requirement as a circle centered on the starting location whose radius is the distance we’re interested in, then the min/max latitude and longitude form a box around the circle. It’s not actually a square, since the sides are arcs of latitude and longitude, but the concept is similar.
In this diagram, we ultimately want only results in the purple circle, but to make the query efficient, we’ll do two passes, first gathering results inside the whole box (purple + green), then removing the corners (leaving only purple). The bottom and top of the box are the minimum/maximum latitudes while the sides of the box are the minimum/maximum longitude.




where
is the search radius and
is the radius of the earth
Using these numbers we can help the server quickly narrow in on the values we’re interested in. Here I am using session variables to make the query more understandable but you can easily do this with a single longer query, just substitute the calculations for the variables. You can also perform the bounding box calculations outside of SQL and just pass in the final numbers since it only needs to be calculated once.
SET @max_latitude := 37.81076 + DEGREES(30.0/3956.547); SET @min_latitude := 37.81076 - DEGREES(30.0/3956.547); SET @max_longitude := -122.47681 + DEGREES(30.0/3956.547/COS(RADIANS(37.81076))); SET @min_longitude := -122.47681 - DEGREES(30.0/3956.547/COS(RADIANS(37.81076))); SELECT store_id, name, fn_distance_cosine(latitude, longitude, 37.81076, -122.47681) AS distance_in_miles FROM store WHERE latitude BETWEEN @min_latitude AND @max_latitude AND longitude BETWEEN @min_longitude AND @max_longitude HAVING distance_in_miles <= 30.0 ORDER BY distance_in_miles;
Compare this version of the query with the first one. Notice how this time, the server only has to do simple range comparisons on the latitude and longitude and it will be able to use indexes on those columns, greatly speeding the result. Once it has a set of rows which fall inside the bounding box, it performs the more expensive correct distance calculation on each row, and then the HAVING clause removes any results which are too far away (the corners of the box). This is a good use of the often misunderstood HAVING clause as it always applies at the end of the query, after the rows are joined, grouped, and fetched.
Results (same as before, only more efficient):
| store_id | name | distance_in_miles |
|---|---|---|
| 1 | SF 1 | 0.662742576418294 |
| 3 | SF 3 | 2.01584700390573 |
| 4 | SF 4 | 4.73066680579462 |
| 2 | SF 2 | 5.17993180641564 |
| 5 | SF 5 | 6.20929598602107 |
One caveat with this technique is that it will fail if your search radius includes one of the poles (unlikely unless you are Santa Claus) or spans the 180° meridian.
Useful Links
- GIS FAQ 5.1 - census.gov FAQ covering calculating great circle distances
- Movable Type Scripts - implementations of distance, bearing, and other useful geography calculations in javascript
- Haversine formula - Wikipedia article on the Haversine formula
- Spherical Trigonometry - Derivation of the spherical law of cosines and more math fun
- Derivation of the Haversine formula - With nifty ASCII art diagrams
In his post Finding things within some distance in SQL, Baron Schwartz makes a good case for avoiding any of the great circle formulas in the common case where you just want to find results within a radius of a point. Often, a simple square box approximation is good enough.