Mar 22, 2011

Building Interactive Maps With Polymaps, TileStache, and MongoDB: Part 1

I’d been toying around with ideas for cool ancillary features for Goalfinch for a while, and finally settled on creating this interactive map of Twitter weight loss goals. I knew what I wanted: a Google Maps-style, draggable, zoomable, slick-looking map, with the ability to combine raster images and styleable vector data. And I didn’t want to use Flash. But as a complete geographic information systems (GIS) neophyte, I had no idea where to start. Luckily there are some new technologies in this area that greatly simplified this project. I’m going to show you how they all fit together so you can create your own interactive maps for the browser.

Overview

The main components of the weight loss goals map are:
  1. Client-side JavaScript that assembles the map from separate layers (using Polymaps)
  2. Server-based application that provides the data for each layer (TileStache, MongoDB, PostGIS, Pylons)
  3. Server-based Python code that runs periodically to search Twitter and update the weight loss goal data
I’ll cover each component separately in upcoming posts, but I’ll start with a high-level description of how the components work together for those of you who are new to web-based interactive maps.

Serving information-rich content to the browser requires programmers to think carefully about performance. For an interactive, detail-filled map of the globe, we could serve a single, very high-resolution image, but it would take a while to load. If we want our map to show up and be usable right away, we need a different strategy. That’s why most online maps (such as Google Maps) use a technique called tiling. With tiling we load a series of smaller images (tiles) that cover the visible map area, and dynamically load tiles covering other areas only as the user pans to them. A tile can be either an image, which the browser stitches together with its neighbors, or vector data for a particular geographical region. This lets us display the map relatively quickly without having to wait for the non-visible tiles to load. Another advantage of tiling is that we can load different tiles for different zoom levels. So when the map initially appears zoomed all the way out, we don’t have to overload the browser with geographic detail that wouldn’t even be discernible at that scale.
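To make the tiling idea concrete, here is a minimal Python sketch of how a longitude/latitude pair maps to a tile at a given zoom level. It assumes the common Web Mercator "slippy map" scheme (2^zoom tiles along each axis, with tile 0,0 at the top left), which is what OpenStreetMap-style tile servers generally use; the function names are my own for illustration and aren't part of any library mentioned in this post.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile that contains a point at the given zoom level.

    Assumes the usual Web Mercator tiling: 2**zoom tiles per axis,
    x growing eastward from -180 degrees, y growing southward from the top.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_count(zoom):
    """Total number of tiles needed to cover the whole world at this zoom."""
    return 4 ** zoom
```

The tile count quadruples with every zoom level, which is why a client only ever asks for the handful of tiles that intersect the visible area instead of the whole set.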

Polymaps is a JavaScript library that handles requesting image and vector tiles, stitching them together, and assembling the multiple layers. Using Polymaps, I was able to assemble a base layer of image tiles and two SVG layers for the county and state boundaries with a bit of JavaScript.

So we have Polymaps assembling the map in the browser on the fly, but where is this data coming from? The short answer is: wherever we want. Here’s the long answer.

For the image tiles, the conventional approach has been to collect a bunch of geographic data from somewhere like OpenStreetMap, shove it into a database, and use that data to render PNG files for the various zoom levels. If you want complete control over how your image tiles look, this is the only way to go. I, however, only wanted a basic, monochrome gray map on which to overlay SVG, and found the perfect solution in the CloudMade Maps API and their free developer account. So rather than building and hosting the map tiles myself, I was able to pull in map tiles from CloudMade’s servers in my Polymaps code.

The vector data for state and county boundaries is served from my own server as GeoJSON using a combination of TileStache (a cache for image and vector map tiles) and PostgreSQL/PostGIS. Integrating TileStache with my existing Pylons application was a breeze. Learning all about PostGIS, shapefiles, SRIDs, projections, and polygon simplification was quite a bit more painful for me, so hopefully my upcoming post on that will help other newcomers get these details right.
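If you haven’t seen GeoJSON before, the Python sketch below shows roughly what a vector tile’s payload looks like: a FeatureCollection holding one polygon Feature per boundary. This only illustrates the format; it is not TileStache’s code, and the helper names, the "name" property, and the coordinates are made up for the example.

```python
import json


def make_feature(name, ring):
    """Build a GeoJSON Feature for one boundary polygon.

    `ring` is a list of [longitude, latitude] pairs. GeoJSON expects the
    first and last positions of a polygon ring to match, so close it if needed.
    """
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    return {
        "type": "Feature",
        "properties": {"name": name},  # hypothetical property for illustration
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def make_tile_response(features):
    """Serialize the features as the FeatureCollection a vector tile would carry."""
    return json.dumps({"type": "FeatureCollection", "features": list(features)})


# A made-up square "county"; real boundaries would come from PostGIS.
example = make_tile_response(
    [make_feature("Example County", [[-100.0, 40.0], [-99.0, 40.0], [-99.0, 41.0], [-100.0, 41.0]])]
)
```

On the client, each Feature’s geometry becomes an SVG path, and the properties are what you would use to style or label it.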

Finally, to get the data I was actually interested in, I wrote a Python script to repeatedly ask Twitter’s search API for tweets related to weight loss in each county across the US, store the results in MongoDB, and do some simple natural language processing to determine how much weight each user wanted to lose. This made it possible to calculate the average weight loss goals of Twitter users on a per-location basis.
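As a rough idea of what the extraction and averaging steps could look like, here is a small Python sketch. To be clear, this is not the script I actually ran: it assumes a single regular expression is good enough to spot phrases like "lose 20 lbs", it takes tweets as plain (county, text) pairs rather than reading them from MongoDB, and every name in it is hypothetical.

```python
import re
from collections import defaultdict

# Matches phrases such as "lose 20 lbs" or "lose 7.5 kg" (illustrative pattern only).
GOAL_PATTERN = re.compile(
    r"\blose\s+(\d+(?:\.\d+)?)\s*(lbs?|pounds?|kgs?|kilograms?)\b",
    re.IGNORECASE,
)
LBS_PER_KG = 2.20462


def extract_goal_lbs(text):
    """Return the weight loss goal in pounds, or None if no goal is found."""
    match = GOAL_PATTERN.search(text)
    if match is None:
        return None
    amount = float(match.group(1))
    unit = match.group(2).lower()
    # Convert metric goals so that every value is comparable.
    return amount * LBS_PER_KG if unit.startswith("k") else amount


def average_goal_by_county(tweets):
    """Average the goals per county; `tweets` is an iterable of (county, text)."""
    goals = defaultdict(list)
    for county, text in tweets:
        goal = extract_goal_lbs(text)
        if goal is not None:
            goals[county].append(goal)
    return {county: sum(values) / len(values) for county, values in goals.items()}
```

Counties with no matching tweets simply don’t appear in the result, which makes it easy to leave them unshaded on the map.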

I’ll get much further into the details of each of these components over the next few days, but for now you can check out the end result. Enjoy!

Continue to part 2

Continue to part 3
