http://ryba4.com

© Dmitri Lebedev (http://ryba4.com) [http://ryba4.com], April 2010, version 1.1.

This is a Python implementation of **Ramer-Douglas-Peucker**
algorithm that simplifies polylines.

- Make a straight line between the beginning and the end of the polyline.
- Calculate the distance between each point and the straight.
- If the farthest point is further than the threshold value, it's included in the end result, and the operation is repeated on the parts to the left and to the right of it.

To simplify the code, my implementation uses object-oriented approach, though it is not hard to re-write it in procedural style. The positive side-effect of OO style is that the function can work with points of any number of dimensions (as long as all the points have it the same).

In this code I use a special formula to calculate the distance between a point and a line. Here's how it looks:

We know X, Y and Z (*begin*, *end* and *curr* in the code). We can calculate automatically *a* and *b* vectors and need vector *c*'s length (module). To find the longest vector, we may use the squares of the lengths. Now see the way of thought:

Note that *|a| ^{2}* actually takes less calculations than

The code:

def ramerdouglas(line, dist): """Does Ramer-Douglas-Peucker simplification of a line with `dist` threshold. `line` must be a list of Vec objects, all of the same type (either 2d or 3d).""" if len(line) < 3: return line begin, end = line[0], line[-1] distSq = [begin.distSq(curr) - ((end - begin) * (curr - begin)) ** 2 / begin.distSq(end) for curr in line[1:-1]] maxdist = max(distSq) if maxdist < dist ** 2: return [begin, end] pos = distSq.index(maxdist) return (ramerdouglas(line[:pos + 2], dist) + ramerdouglas(line[pos + 1:], dist)[1:])

Basically, that's all the algorithm's code. The rest is service code with line and vector classes.

class Line: """Polyline. Contains a list of points and outputs a simplified version of itself.""" def __init__(self, points): pointclass = points[0].__class__ for i in points[1:]: if i.__class__ != pointclass: raise TypeError("""All points in a Line must have the same type""") self.points = points def simplify(self, dist): if self.points[0] != self.points[-1]: points = ramerdouglas(self.points, dist) else: points = ramerdouglas( self.points[:-1], dist) + self.points[-1:] return self.__class__(points) def __repr__(self): return '{0}{1}'.format(self.__class__.__name__, tuple(self.points)) class Vec: """Generic vector class for n-dimensional vectors for any natural n.""" def __eq__(self, obj): """Equality check.""" if self.__class__ == obj.__class__: return self.coords == obj.coords return False def __repr__(self): """String representation. The string is executable as Python code and makes the same vector.""" return '{0}{1}'.format(self.__class__.__name__, self.coords) def __add__(self, obj): """Add a vector.""" if not isinstance(obj, self.__class__): raise TypeError return self.__class__(*map(sum, zip(self.coords, obj.coords))) def __neg__(self): """Reverse the vector.""" return self.__class__(*[-i for i in self.coords]) def __sub__(self, obj): """Substract object from self.""" if not isinstance(obj, self.__class__): raise TypeError return self + (- obj) def __mul__(self, obj): """If obj is scalar, scales the vector. If obj is vector returns the scalar product.""" if isinstance(obj, self.__class__): return sum([a * b for (a, b) in zip(self.coords, obj.coords)]) return self.__class__(*[i * obj for i in self.coords]) def dist(self, obj = None): """Distance to another object. Leave obj empty to get the length of vector from point 0.""" return self.distSq(obj) ** 0.5 def distSq(self, obj = None): """ Square of distance. Use this method to save calculations if you don't need to calculte an extra square root.""" if obj is None: obj = self.__class__(*[0]*len(self.coords)) elif not isinstance(obj, self.__class__): raise TypeError('Parameter must be of the same class') # simple memoization to save extra calculations if obj.coords not in self.distSqMem: self.distSqMem[obj.coords] = sum([(s - o) ** 2 for (s, o) in zip(self.coords, obj.coords)]) return self.distSqMem[obj.coords] class Vec3D(Vec): """3D vector""" def __init__(self, x, y, z): self.coords = x, y, z self.distSqMem = {} class Vec2D(Vec): """2D vector""" def __init__(self, x, y): self.coords = x, y self.distSqMem = {}

Highlighted with Pygments [http://pygments.org].

This code at work:

Figure 1. Original line in 10*10 km square. 492 points.

Figure 2. Simplified line, threshold = 50 m. ~150 points.

Figure 3. Simplified line, threshold = 100 m. ~90 points.

Figure 3. Simplified line, threshold = 250 m. ~45 points.

You can download the code with this line and it's simplifications.

The data is a coastline from this place [http://osm.org/go/2ttZPN0-] in OpenStreetMap [http://osm.org]. Coordinates are in kilometres.

I know this code isn't the fastest, and procedural approach (which I actually like more) can work faster on larger amount of data, but I finally chose OO approach, as it allows readable and maintainable code. Here is a procedural-style implementation [http://mappinghacks.com/2008/05/05/douglas-peucker-line-simplification-in-python/].