GeoDjango and the UK postcode database

This post describes how to play with the UK postcode database that recently turned up on WikiLeaks using GeoDjango.

You will need the following pieces of free software:

  1. PostgreSQL (MySQL does not provide sufficient geospacial features) — I will be using Postgres 8.4.1; please adjust your paths for your version and/or distributor.
  2. PostGIS spacial database extensions — I will be using version 1.4.0.
  3. Django — I will be using version 1.1. This version includes the GeoDjango framework.
  4. GEOS geometry engine — I will be using version 3.0.0 from the libgeos-c1 package in Debian unstable.

You will also need a copy of the leaked postcode data. Usage of this data is almost certainly a copyright violation.

Getting started

First, create a Django project and application for this example:

$ django-admin startproject ukpostcodes
$ cd ukpostcodes
$ ./manage.py startapp postcodes

Next, edit settings.py:

  • Configure DATABASE_*. You must use the postgresql_psycopg2 PostgreSQL database driver.
  • Add django.contrib.gis & ukpostcodes.postcodes to INSTALLED_APPS.

If you have not already done so, you must install PostGIS in your database:

$ sudo -u postgres createlang plpgsql -d <databasename>
$ sudo -u postgres psql <databasename> -f /usr/share/postgresql/8.4/contrib/postgis.sql
$ sudo -u postgres psql <databasename> -f /usr/share/postgresql/8.4/contrib/spatial_ref_sys.sql

Modelling a postcode

Now define a simple model to store imported postcodes:

from django.contrib.gis.db import models

class Postcode(models.Model):
    name = models.CharField(max_length=8, db_index=True)
    location = models.PointField()

    objects = models.GeoManager()
  • We import models from django.contrib.gis.db instead of django.db.
  • We override the default objects manager with GeoDjango's GeoManager() so we can perform spacial queries.

Save this model in postcodes/models.py. We can now create this model in our database using:

$ ./manage syncdb

Importing postcode data

We will use a management command to import the data. First, create the necessary modules with:

$ mkdir -p postcodes/management/commands
$ touch postcodes/management/commands/__init__.py
$ touch postcodes/management/__init__.py

Next, save the following code in postcodes/management/commands/import_postcodes.py:

import sys
import csv

from django.contrib.gis.geos import Point
from django.core.management.base import BaseCommand

from postcodes.models import Postcode

class Command(BaseCommand):
    def handle(self, *args, **options):
        Postcode.objects.all().delete()
        count = 0

        for row in csv.reader(sys.stdin):
            try:
                name = row[0].upper().strip().replace('  ', ' ')
                location = Point(map(float, row[13:15]))
            except ValueError:
                print "I: skipping %r" % row
                continue

            Postcode.objects.create(name=name, location=location)

            count += 1
            if count % 10000 == 0:
                print "Imported %d" % count

Finally, we can import the data using:

$ bunzip2 /path/to/uk-post-codes-2009.bz2 | ./manage.py import_postcodes

This takes approximately one hour on my computer. If you plan to import the data multiple times, consult the Django documentation for faster methods of importing initial data.

Quering the postcode database

We can now query our postcode database. First, let's find the postcode for Buckingham Palace:

>>> from postcodes.models import Postcode
>>> palace = Postcode.objects.get(name='SW1A 1AA')
>>> palace
<Postcode: SW1A 1AA>

Next, let's find all postcodes within 1 mile. For this, we will need to construct a tuple representing the area we want to search in.

>>> from django.contrib.gis.measure import Distance
>>> area = (palace.location, Distance(mi=1))
>>> Postcode.objects.filter(location__distance_lte=area)
[<Postcode: SE1 7BF>, <Postcode: SE1 7SG>, <Postcode: SE1 7JA>, <Postcode: SE1 7PB>,
 <Postcode: SE1 7PD>, <Postcode: SW1A 1DW>, <Postcode: SW1A 1EA>, <Postcode: SW1A 1EE>,
 <Postcode: SW1A 1EF>, <Postcode: SW1A 1EG>, '...(remaining elements truncated)...']
>>> Postcode.objects.filter(location__distance_lte=area)
7024

To find "neighbour" postcodes, we can pass a Point object to the .distance(..) method on our queryset – this adds a field we can then sort on:

>>> Postcode.objects.distance(palace.location).order_by('distance')[:4]
[<Postcode: SW1A 1AA>, <Postcode: SW1E 6JP>, <Postcode: SW1E 6LA>, <Postcode: SW1E 6WG>]

Putting it all together

Let's pretend we are making an app to find pizza from an arbitrary user-submitted postcode. (We will assume we have parsed the user's postcode and have found its Postcode object, but we will continue to use Buckingham Palace here.)

First, define the following model in postcodes/models.py and re-run ./manage.py syncdb:

class PizzaJoint(models.Model):
    name = models.CharField(max_length=200)
    postcode = models.ForeignKey(Postcode)

    objects = models.GeoManager()

    def __unicode__(self):
         return self.name

Now, add some local pizza places:

>>> def add_pizza_joint(name, loc):
...     code = Postcode.objects.get(name=loc)
...     PizzaJoint.objects.create(name=name, postcode=code)
...
>>> add_pizza_joint('Pizza Hut', 'SW1E 6SQ')
>>> add_pizza_joint('Pizza Express', 'SW1E 5NA')
>>> add_pizza_joint('Pizza On The Park', 'SW1X 7LY')

If we were not using GeoDjango, we would have to be content with simply matching the postcode:

>>> PizzaJoint.objects.filter(postcode=palace)
[]

However, we can now find the cloest pizza places:

>>> PizzaJoint.objects.distance(palace.location,
            field_name='postcode__location').order_by('distance')
[<PizzaJoint: Pizza Hut>, <PizzaJoint: Pizza Express>, <PizzaJoint: Pizza On The Park>]

The .distance(..) method also sets a distance attribute on all the objects returned; this makes it easy to show the actual distance to the user:

>>> for x in PizzaJoint.objects.distance(palace.location,
                     field_name='postcode__location').order_by('distance'):
...    print "%s (%d metres away)" % (x, x.distance.m)
...
Pizza Hut (499 metres away)
Pizza Express (509 metres away)
Pizza On The Park (902 metres away)

For more details, see the GeoDjango documentation.

[Belorussian translation of this post]

Comments (7)

Richard Warburton

Any reason you don't simply sed the postcode entries into the appropriate sql statements, and then pipe into postgres? Probably be simpler to write (though obviously you've already written this so a bit irrelevant) and maybe faster. Also bonus points for pro.

Sept. 30, 2009, 1:37 p.m. #
I did that to show how to create model instances that use PointField fields. (An even faster way to import would be to use executemany, avoiding parsing the SQL 1.8 million times.)
Jeremy Dunck

It's useful to note that coordinates in location data like this are relative to a "spatial reference system"; this is a bit like encodings for text, except that translating between reference systems can be lossy, due to the fact that the earth isn't perfectly round.

For this app, using the default SRID for all data is fine-- it's probably wrong, but not in a meaningful way, since all your points are from the same source and in roughly the same location on earth, and therefore are wrong in the same way. But if you start mixing data from different sources, you'll want to look into SRIDs more:

http://geodjan…
http://en.wiki…

Oct. 1, 2009, 3:25 a.m. #
Thanks for the info, Jeremy. I decided to skip over outlining SRIDs for three reasons: First, the post was already too long. Second, anyone making a "real world" application should be skimming over all of the GeoDjango documentation anyway and would find this. Third, finding the SRID of the dataset would be politically problematic, so I would not even be able to recommend which to use. But you're totally right.

Great article, thanks!

At the top you say "MySQL does not provide sufficient geospacial features" - can you elaborate on that?

In the GeoDjango documentation it says "MySQL and Oracle users can skip this [Create a Spatial Database] section because spatial types are already built into the database.", so presumably the steps in this article would work with MySQL, and you wouldn't need to install PostGIS, or are there other features that MySQL is missing?

Thanks, Ben

Oct. 4, 2009, 12:01 p.m. #
Take a look at http://geodjan…. MySQL might do exactly what you require, but if you're just getting started you don't really want your experimentation to be tempered.

I was recently bitten by the problems http://ernestm… is facing, and this is a brilliant way around the problem. It's worth mentioning that this works perfectly fine without GeoDjango (I'm not quite ready to launch into special database extensions for a very minor part of my website project); storing lat and lon as FloatFields. Obviously it needs slightly more code to be written but it seems a nice trade-off, particularly for adding this to an existing project.

Anyway, thanks again.

Cheers,
Carl

Oct. 6, 2009, 1:07 a.m. #
Faizan

Thanks for the tutorial Chris! I'm sure you're thrilled that post code data is now openly available via Code-Point Open.

I've used this data importing procedure in a couple of projects now. One thing I've noticed is that if you don't set Debug=False in your settings file, the import process consumes a LOT of RAM (~1.2GB in my case).

Feb. 20, 2011, 2:27 a.m. #

What if the density of locations in the database is too high around the 1 mile of radius of search point?
Maybe we will only be interested in showing the X(5 0r 10) nearby locations.
Is there any way we can optimize the process to find X nearby locations?

April 23, 2017, 2:02 p.m. #
LIMIT X, surely?

I don't know why I am unable to import the data, It's failing. Can it possibly be my RAM. I have only 2 GB

May 23, 2017, 8:55 p.m. #