Sunday, March 15, 2009

Modified Preorder Tree Traversal

If you are a programmer, hopefully you've heard of this term.  I learned about it 7 years ago while a big city programmer from New York looked down at a country programmer from Sussex County Delaware.  He had a lot to say but of little value.  When we were discussing an e-commerce platform that I had built, he asked which algorithm I was using for parent/child relationships.  I explained how I had a parentCategoryID field that held the value of the parent category in the child category row.  One weakness in this approach is that it is tough to find indirect relationships.  Say I have top level category of animals and plants, then the next level has birds, fish, reptiles, etc.  Then, the next level has the species of animals and plants.  How can you create a query that says display all animals?  Without multiple queries or complex queries, you can't.  

lifeFormIDnameparentLifeFormID
1AnimalNULL
2PlantNULL
3Bird1
4Fish1
5Reptiles1
6Trees2
7Shrubs2
8Cactus2
9Duck3
10Goose3
11Snake4
12Oak Tree6

Using the data above, I want a query that returns duck, goose and snake but not oak tree.  Using this method, there is no clean way.  Either multiple queries 
SELECT lifeFormID, name, parentLifeFormID FROM lifeForm  WHERE parentLifeFormID = 1
then...
SELECT lifeFormID, name, parentLifeFormID FROM lifeForm  WHERE parentLifeFormID IN (list of ID's from the above query)
Yes, if you know that there are only three levels then you could build the query below:
SELECT lf.lifeformID, lf.name, lf.parentLifeformID FROM lifeForm lf WHERE lf.parentLifeFormID IN ( SELECT subLF.lifeFormID FROM lifeForm subLF WHERE subLF.parentLifeFormID = 1 )
But this creates code that is not very modular or efficient.  A better way to perform this is using the Modified Preorder Tree Traversal algorithm.

Lets try setting up the table this way.

lifeFormIDnameparentLifeFormIDtreeLefttreeRighttreeLevel
1AnimalNULL1141
2PlantNULL15241
3Bird1272
4Fish1892
5Reptiles110132
6Trees216192
7Shrubs220212
8Cactus222232
9Duck3343
10Goose3563
11Snake411123
12Oak Tree617183

Now lets say we want to create a SQL query that retrieves all types of animals.
SELECT lifeFormID, name, parentLifeFormID, treeLeft, treeRight, treeLevel FROM lifeForm WHERE 1 < treeLeft AND treeRight < 14
Note: 1 is the treeLeft value of the animal row and 14 is the treeRight value of the animal row 
This method allows for SQL queries to be created that allow for indirect relationships to be found simpler.  The one downside of this algorithm is that it is more complicated to add/edit/delete from the table.  Since most applications use the view data much more often (generally around 90%), it is better to simplify and speed up the view process while losing speed on the admin side.  As for the actual functions and SQL to maintain this, SitePoint has a good article with some useful code that you should be able to migrate into your application.

The History of Geocoding and Distance Calculation on the Web

I have been working on a project that will allow for hunters to mark on a map where they have hunted and what animals they have brought down.  The website should be launched in the next month or two.  In working on this project, I've been researching what is new in the geocoding world.  Not surprisingly, much has changed in the past few years.  In this article, I'll be going over the old way of geocoding.

So what is geocoding?
Geocoding is the process of finding associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses, or zip codes (postal codes). (Source: Wikipedia)
This process is vital for all sorts of applications.  In the past, you purchased a zip code database from one of several vendors.  I've always used Melissa Data.  This would tell you the latitude and longitude coordinates on a zip code level.  Census data would also be included in the database.  Back then, before Google Maps and similar allowed for mapping to be used to develop applications on your website, this was about all the level of specificity you could get without spending lots of money.

Once you had this database, you could calculate the as a crow flies mileage between zip codes.  This mileage can be calculated using the following algorithm from Meridian World Data.  
sqrt(x * x + y * y)

where x = 69.1 * (lat2 - lat1) 
and y = 53.0 * (lon2 - lon1) 
Depending on the level of accuracy needed for the application, you can use different formulas.  In most cases, I've found the above algorithm accurate enough.  As an added bonus, the above formula is simple enough to easily use in a database SQL statement.

An example of an application would be a company that delivers widgets to a local area.  Based on the mileage, the delivery charge can be assessed.  Another example would be a news website that if it knows the users zip code, they can provide news that is more local.

Admittedly, the above formula won't get exact mileage for routing for a couple of reasons.  One, the calculation is as a crow flies meaning that the calculated mileage takes no consideration of the roads that you would have to take.  As an extreme example, imagine you are on a Delaware beach and want to get to New Jersey (I have no idea why anyone would want that).  The as a crow flies mileage would be about 25 miles.  The actual driving distance is closer to 167 miles.   Extreme examples aside, as a crow flies mileage is usually pretty close to the actual driving distance.  

The other issue with this is that you are only calculating zip code to zip code.  If the origin address is in Lewes, Delaware and the destination address is in Milton, Delaware, the distance based on the above calculation would be the center of Lewes, Delaware to the center of Milton, Delaware. What if I am at the north end of Lewes and travelling to a home in the south-east end of Milton?  The distance might only be 5 miles instead of the 12 miles from the center of zip code calculation.

I'm providing extreme examples to prove a point of the inaccuracy of these old methods.  In reality, you can usually accept this level of accuracy.  In the delivery service example, you plan to make money on some delivery (where the actual distance is less than reported) while losing money on others (where actual distance is more than reported).  

Three or four years ago, this was all we had (without paying lots of money for a routing service).  Good applications were still being built if not quite as accurate as everyone would have liked.

Fast forward to now and we have several providers (Google and Yahoo! among others) that provide an HTTP based geocoding service.  Mapping is becoming easier to integrate into your website.  The level of accuracy is at an address level compared to the zip code level of yesterday.  In another post, I will discuss the current implementation of geocoding and the maturation of the applications that can be built as a result.

Tuesday, March 3, 2009

Graphic Design and Hosting Selection - Joomla Install Step By Step Instructions - Step 1

I've been using Joomla over the past half a year now.  The application works very well for professional website design projects.  The Joomla application itself is free under the GPL license.  I enjoy using Joomla because it allows me to focus on my clients and building great websites instead of having to program functionality that has been built many time before.  

In this series of articles, I will go through the steps that I use to build a Joomla content managed website.  I have been using Joomla almost exclusively for the past six months and have setup approximately 15 websites (not all launched yet).    I will be going through the process that I use to setup a website.  

First step, is to have a graphic designer design a great look for your website.  Alternatively, you can select a template from one of several websites.  Monster Templates has many great templates built specifically for Joomla.  RocketTheme has also been a great source for finding a good template to for the website.  While we prefer to also have a graphic designer create a custom built design, sometimes that just isn't in the budget.  Purchasing a template will also free you from programming the CSS and html template files.

Once you've performed this, you'll need to select a hosting provider.  I have had experience hosting websites at the company I worked for, using a local vendor and using a national vendor.  My best experience has been with the national vendors.  While the tech support can be a little lacking sometimes, they have invested enough time in the server setups that little needs to be asked of them.  Some of the hosts that I have experience with are InMotion Hosting, Host Monster, and Host Gator.  For email, I use Google Apps.  It is currently free and provides excellent SPAM protection free of charge.

I would also recommend that if you are also purchasing a domain, do not purchase the domain through your host.  Use a domain registrar like GoDaddy (make sure they have free DNS management).  This makes it much easier to move the website if your host selected isn't satisfactory.  I have some horror stories where a client's website had to go down for 5 days while the domain was transferred away from the current host.  If the domain registrar was elsewhere, this would have been avoided.

After the above steps have completed, you will be able to actually create your Joomla install, install some components and modules, and begin actually creating content.  I will cover how to accomplish these steps in the installment of this series.