Sunday, March 15, 2009

The History of Geocoding and Distance Calculation on the Web

I have been working on a project that will allow hunters to mark on a map where they have hunted and what animals they have brought down.  The website should launch in the next month or two.  While working on this project, I've been researching what is new in the geocoding world.  Not surprisingly, much has changed in the past few years.  In this article, I'll go over the old way of geocoding.

So what is geocoding?
Geocoding is the process of finding associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses, or zip codes (postal codes). (Source: Wikipedia)
This process is vital for all sorts of applications.  In the past, you purchased a zip code database from one of several vendors.  I've always used Melissa Data.  The database gave you latitude and longitude coordinates at the zip code level, and census data was included as well.  Back then, before Google Maps and similar services made it possible to build mapping into your own website, this was about as specific as you could get without spending lots of money.

Once you had this database, you could calculate the as-the-crow-flies mileage between zip codes.  This mileage can be calculated using the following formula from Meridian World Data:
sqrt(x * x + y * y)

where x = 69.1 * (lat2 - lat1) 
and y = 53.0 * (lon2 - lon1) 
Latitudes and longitudes are in decimal degrees, and the result is in miles.  Depending on the level of accuracy needed for the application, you can use different formulas.  In most cases, I've found the above formula accurate enough.  As an added bonus, it is simple enough to use directly in a SQL statement.
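
As a minimal sketch of the formula above, here is how it might look in Python.  The function name approx_miles and the sample coordinates are mine, not Meridian World Data's.  Note that 53.0 is roughly 69.1 times the cosine of 40 degrees, so the shortcut appears tuned for mid-latitudes and will drift as you move far north or south of that.

```python
import math

MILES_PER_DEGREE_LAT = 69.1  # constant from the formula above
MILES_PER_DEGREE_LON = 53.0  # constant from the formula above (about 69.1 * cos(40 degrees))


def approx_miles(lat1, lon1, lat2, lon2):
    """Approximate as-the-crow-flies miles between two points in decimal degrees."""
    x = MILES_PER_DEGREE_LAT * (lat2 - lat1)
    y = MILES_PER_DEGREE_LON * (lon2 - lon1)
    return math.sqrt(x * x + y * y)


# Hypothetical coordinates, chosen only to make the arithmetic easy to follow.
print(approx_miles(40.0, -75.0, 41.0, -75.0))  # one degree of latitude apart
print(approx_miles(40.0, -75.0, 40.0, -74.0))  # one degree of longitude apart
```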

An example of an application would be a company that delivers widgets to a local area.  Based on the mileage, the delivery charge can be assessed.  Another example would be a news website that, if it knows the user's zip code, can provide more local news.
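
To make the delivery example concrete, here is a rough sketch that runs the same formula inside a SQL statement, using Python's built-in sqlite3 module.  The zips table, its column names, the zip codes, and the coordinates are all made up for illustration; a purchased zip code database would have its own schema.  Not every SQLite build includes a sqrt function, so the sketch registers Python's math.sqrt under that name.

```python
import math
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical zips table."""
    conn = sqlite3.connect(":memory:")
    # Make sqrt() available to SQL even if this SQLite build lacks it.
    conn.create_function("sqrt", 1, math.sqrt)
    conn.execute("CREATE TABLE zips (zip TEXT PRIMARY KEY, lat REAL, lon REAL)")
    # Placeholder zip codes and coordinates, not real data.
    conn.executemany(
        "INSERT INTO zips VALUES (?, ?, ?)",
        [("00001", 40.0, -75.0), ("00002", 40.1, -75.0), ("00003", 40.0, -74.0)],
    )
    return conn


def zips_within(conn, origin_zip, max_miles):
    """Return (zip, miles) rows within max_miles of origin_zip, nearest first."""
    query = """
        SELECT zip, miles FROM (
            SELECT d.zip AS zip,
                   sqrt((69.1 * (d.lat - o.lat)) * (69.1 * (d.lat - o.lat))
                      + (53.0 * (d.lon - o.lon)) * (53.0 * (d.lon - o.lon))) AS miles
            FROM zips AS o, zips AS d
            WHERE o.zip = ?
        )
        WHERE miles <= ?
        ORDER BY miles
    """
    return conn.execute(query, (origin_zip, max_miles)).fetchall()


conn = build_demo_db()
for zip_code, miles in zips_within(conn, "00001", 25):
    print(zip_code, round(miles, 2))
```

From there, a delivery charge could be derived from the returned mileage, for instance with a flat rate per mile.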

Admittedly, the above formula won't give exact mileage for routing, for a couple of reasons.  One, the calculation is as the crow flies, meaning that the calculated mileage takes no account of the roads you would have to take.  As an extreme example, imagine you are on a Delaware beach and want to get to New Jersey (I have no idea why anyone would want that).  The as-the-crow-flies mileage would be about 25 miles.  The actual driving distance is closer to 167 miles.  Extreme examples aside, as-the-crow-flies mileage is usually pretty close to the actual driving distance.

The other issue is that you are only calculating zip code to zip code.  If the origin address is in Lewes, Delaware and the destination address is in Milton, Delaware, the distance from the above calculation would be measured from the center of the Lewes zip code to the center of the Milton zip code.  What if I am at the north end of Lewes and travelling to a home in the south-east end of Milton?  The distance might only be 5 miles instead of the 12 miles given by the center-of-zip-code calculation.

I'm providing extreme examples to make a point about the inaccuracy of these old methods.  In reality, you can usually accept this level of accuracy.  In the delivery service example, you plan to make money on some deliveries (where the actual distance is less than reported) while losing money on others (where the actual distance is more than reported).

Three or four years ago, this was all we had (without paying lots of money for a routing service).  Good applications were still being built, even if they were not quite as accurate as everyone would have liked.

Fast forward to now and we have several providers (Google and Yahoo! among others) that offer an HTTP-based geocoding service.  Mapping is becoming easier to integrate into your website.  Accuracy is now at the address level, compared to the zip code level of yesterday.  In another post, I will discuss the current implementation of geocoding and the maturation of the applications that can be built as a result.
