Writing your own Geohash algorthim, part 1

Sometimes you find yourself with a random computing science seeming problem that you just want to roll up your sleeves and get you hands dirty with. I had one of those moments, about 4 months ago.

Now, four months may seeeem like a really long time, but as a classic lean saying say’s, results are not the point. It may have taken four months, but in those four months I learned some valuable lessons about computer programming. I thought it would be worth sharing them.

The problem.

The first part of the process is to frame the problem. On maps.nokia.com we have some functionality that means whenever you drag the map, the url bar updates with the new latitude/longitude.

Which is fine.

Unless.

You live in China.

In China you are not allowed to display latitude/longitude. Instead you must hide it from the sight of your viewers, incase they see it and realise the world is not the distorted place they’ve been told it is by their government. Go figure. So, my problem was how to save the location, via the url bar, without using the lat/long.

Googling this topic leads you to three likely looking solutions.

Geohash

The geohash appears to be a relatively new (2008) discovery, invented by Gustavo Niemeyer, you can find the implementation at Geohash.org.

I spent sometime replicating this technique in Javascript, essentially, it involves, systematic storing of deltas (or I guess diff’s for people who know source control). The starting premise is that for example, a latitude exists between -90 and 90 degrees. The first delta identifies whether it is in the first or second half of that range. So if I took the latitude, 52.51607, I would programatically ask, is that between -90 and 0 or 0 and 90, clearly it would be the latter so my first delta would be 1, if it had been -52.51607 then my delta would be 0. Doing this recursively I can then create a pattern of 1’s and 0’s that slowly drill down to my desired latitude. Then, once i’ve done that, I can take that binary string and convert it to integers between 0 and 32 and then using base32 represent it as a string.

It means for the coordinate pair 57.64911,10.40744, you end up with a geohash of u4pruydqqvj which is a saving of 6 characters (including points and the comma delimiter). Not bad. But what are the other options?

Patent No : 7, 302, 343

Working for Nokia, I have luxury of being able to tap into a vast treasure trove of both Microsoft and Nokia patents without the fear of being sued. Now, I’m not saying I think software patents are a good thing, but it was a relief when I found this patent which is described as compact text encoding of Longitude/Latitude was owned by Microsoft rather than Apple. It seemed to be exactly what I was looking for.

This technique takes the long/lats and converts them into non-negative numbers. So for example 47.64932 converts to 18,583,657 to be honest, don’t ask me why this works, I followed the formulas in the paper and boom, got the right number, but the theory behind converting to a non-negative number is lost on me. Then with this integer, you can generate a base-n (so in this case they used base32 again) string.

So with the example co-ordinate pair, 47.6493, -122.12926 you end up with a hash of ry7cx4tp95, a saving of nine whole characters! Pretty damn impressive, but there was one final method I wanted to research, mainly because it was named after a place in my homeland.

The Maidenhead Locator System

300px-maidenhead_grid_over_europe

Developed in 1980 in Maidenhead, England, this system was designed by VHF managers and used by amateur radio enthusiasts. The Maidenhead Locator system was created with the use case of being able to transmit a location within 12km accuracy in a simple 6 character string using morse code.

It essentially works on the idea of converting the world map into a grid, based on degrees and assigning them a letter or a number. Each character in the string and precisely positioned and different locations have different boundaries. An example of a Maidenhead Locator system hash is:

BL11bh16

Stolen from wikipedia, below is a run down of what each of the characters does.

  • Character pairs encode longitude first, and then latitude.
  • The first pair (a field) encodes with base 18 and the letters “A” to “R”.
  • The second pair (square) encodes with base 10 and the digits “0” to “9”.
  • The third pair (subsquare) encodes with base 24 and the letters “A” to “X”.
  • The fourth pair (extended square) encodes with base 10 and the digits “0” to “9”.

Interesting, a system invented in the 80’s that achieves a small, human readable hash of a lat/long with only 8 characters but alas with one large draw back, which is that it’s not as accurate as I need.

Having invested a large amount of time replicating in Javascript each of these techniques, I realised that what I need was a hybrid system, using the Maidenhead locator concept as a basis. In part 2, I will explore the javascript I need to write a geohash which matches the precision of the Geohash and Microsoft patent solution with the number of characters from the Maidenhead system.