September 30th 2009

GeoDjango and the UK postcode database

This post describes how to play with the UK postcode database that recently turned up on WikiLeaks using GeoDjango

You will need the following pieces of free software:

  1. PostgreSQL (MySQL does not provide sufficient geospacial features) — I will be using Postgres 8.4.1; please adjust your paths for your version and/or distributor.

  2. PostGIS spacial database extensions — I will be using version 1.4.0.

  3. Django — I will be using version 1.1. This version includes the GeoDjango framework.

  4. GEOS geometry engine — I will be using version 3.0.0 from the libgeos-c1 package in Debian unstable.

You will also need a copy of the leaked postcode data. Usage of this data is almost certainly a copyright violation.

Getting started

First, create a Django project and application for this example:

$ django-admin startproject ukpostcodes
$ cd ukpostcodes
$ ./manage.py startapp postcodes

Next, edit settings.py:

If you have not already done so, you must install PostGIS in your database:

$ sudo -u postgres createlang plpgsql -d <databasename>
$ sudo -u postgres psql <databasename> -f /usr/share/postgresql/8.4/contrib/postgis.sql
$ sudo -u postgres psql <databasename> -f /usr/share/postgresql/8.4/contrib/spatial_ref_sys.sql

Modelling a postcode

Now define a simple model to store imported postcodes:

from django.contrib.gis.db import models

class Postcode(models.Model):
    name = models.CharField(max_length=8, db_index=True)
    location = models.PointField()

    objects = models.GeoManager()

Save this model in postcodes/models.py. We can now create this model in our database using:

$ ./manage syncdb

Importing postcode data

We will use a management command to import the data. First, create the necessary modules with:

$ mkdir -p postcodes/management/commands
$ touch postcodes/management/commands/__init__.py
$ touch postcodes/management/__init__.py

Next, save the following code in postcodes/management/commands/import_postcodes.py:

import sys
import csv

from django.contrib.gis.geos import Point
from django.core.management.base import BaseCommand

from postcodes.models import Postcode

class Command(BaseCommand):
    def handle(self, *args, **options):
        Postcode.objects.all().delete()
        count = 0

        for row in csv.reader(sys.stdin):
            try:
                name = row[0].upper().strip().replace('  ', ' ')
                location = Point(map(float, row[13:15]))
            except ValueError:
                print "I: skipping %r" % row
                continue

            Postcode.objects.create(name=name, location=location)

            count += 1
            if count % 10000 == 0:
                print "Imported %d" % count

Finally, we can import the data using:

$ bunzip2 /path/to/uk-post-codes-2009.bz2 | ./manage.py import_postcodes

This takes approximately one hour on my computer. If you plan to import the data multiple times, consult the Django documentation for faster methods of importing initial data.

Quering the postcode database

We can now query our postcode database. First, let's find the postcode for Buckingham Palace:

>>> from postcodes.models import Postcode
>>> palace = Postcode.objects.get(name='SW1A 1AA')
>>> palace
<Postcode: SW1A 1AA>

Next, let's find all postcodes within 1 mile. For this, we will need to construct a tuple representing the area we want to search in.

>>> from django.contrib.gis.measure import Distance
>>> area = (palace.location, Distance(mi=1))
>>> Postcode.objects.filter(location__distance_lte=area)
[<Postcode: SE1 7BF>, <Postcode: SE1 7SG>, <Postcode: SE1 7JA>, <Postcode: SE1 7PB>,
 <Postcode: SE1 7PD>, <Postcode: SW1A 1DW>, <Postcode: SW1A 1EA>, <Postcode: SW1A 1EE>,
 <Postcode: SW1A 1EF>, <Postcode: SW1A 1EG>, '...(remaining elements truncated)...']
>>> Postcode.objects.filter(location__distance_lte=area)
7024

To find "neighbour" postcodes, we can pass a Point object to the .distance(..) method on our queryset – this adds a field we can then sort on:

>>> Postcode.objects.distance(palace.location).order_by('distance')[:4]
[<Postcode: SW1A 1AA>, <Postcode: SW1E 6JP>, <Postcode: SW1E 6LA>, <Postcode: SW1E 6WG>]

Putting it all together

Let's pretend we are making an app to find pizza from an arbitrary user-submitted postcode. (We will assume we have parsed the user's postcode and have found its Postcode object, but we will continue to use Buckingham Palace here.)

First, define the following model in postcodes/models.py and re-run ./manage.py syncdb:

class PizzaJoint(models.Model):
    name = models.CharField(max_length=200)
    postcode = models.ForeignKey(Postcode)

    objects = models.GeoManager()

    def __unicode__(self):
         return self.name

Now, add some local pizza places:

>>> def add_pizza_joint(name, loc):
...     code = Postcode.objects.get(name=loc)
...     PizzaJoint.objects.create(name=name, postcode=code)
...
>>> add_pizza_joint('Pizza Hut', 'SW1E 6SQ')
>>> add_pizza_joint('Pizza Express', 'SW1E 5NA')
>>> add_pizza_joint('Pizza On The Park', 'SW1X 7LY')

If we were not using GeoDjango, we would have to be content with simply matching the postcode:

>>> PizzaJoint.objects.filter(postcode=palace)
[]

However, we can now find the closest pizza places:

>>> PizzaJoint.objects.distance(palace.location,
            field_name='postcode__location').order_by('distance')
[<PizzaJoint: Pizza Hut>, <PizzaJoint: Pizza Express>, <PizzaJoint: Pizza On The Park>]

The .distance(..) method also sets a distance attribute on all the objects returned; this makes it easy to show the actual distance to the user:

>>> for x in PizzaJoint.objects.distance(palace.location,                        
    field_name='postcode__location').order_by('distance'):
...    print "%s (%d metres away)" % (x, x.distance.m)
...
Pizza Hut (499 metres away)
Pizza Express (509 metres away)
Pizza On The Park (902 metres away)

For more details, see the GeoDjango documentation.

Belorussian translation of this post




You can subscribe to new posts via email or RSS.