Ramer-Douglas-Peucker Algorithm in Python

© Dmitri Lebedev ( [], April 2010, version 1.1.

This is a Python implementation of Ramer-Douglas-Peucker algorithm that simplifies polylines.

Image by Jitse Niesen [].
The algorithm works the following way:

To simplify the code, my implementation uses object-oriented approach, though it is not hard to re-write it in procedural style. The positive side-effect of OO style is that the function can work with points of any number of dimensions (as long as all the points have it the same).

In this code I use a special formula to calculate the distance between a point and a line. Here's how it looks:

We know X, Y and Z (begin, end and curr in the code). We can calculate automatically a and b vectors and need vector c's length (module). To find the longest vector, we may use the squares of the lengths. Now see the way of thought:

Note that |a|2 actually takes less calculations than |a|, so this formula gets rid of few x**.5 operators.

The code:

def ramerdouglas(line, dist):
    """Does Ramer-Douglas-Peucker simplification of
    a line with `dist` threshold.
    `line` must be a list of Vec objects,
	all of the same type (either 2d or 3d)."""
    if len(line) < 3:
        return line
    begin, end = line[0], line[-1]
    distSq = [begin.distSq(curr) -
        ((end - begin) * (curr - begin)) ** 2 /
        begin.distSq(end) for curr in line[1:-1]]
    maxdist = max(distSq)
    if maxdist < dist ** 2:
        return [begin, end]
    pos = distSq.index(maxdist)
    return (ramerdouglas(line[:pos + 2], dist) + 
            ramerdouglas(line[pos + 1:], dist)[1:])

Basically, that's all the algorithm's code. The rest is service code with line and vector classes.

class Line:
    """Polyline. Contains a list of points and outputs
    a simplified version of itself."""
    def __init__(self, points):
        pointclass = points[0].__class__
        for i in points[1:]:
            if i.__class__ != pointclass:
                raise TypeError("""All points in a Line
                                must have the same type""")
        self.points = points
    def simplify(self, dist):
        if self.points[0] != self.points[-1]:
            points = ramerdouglas(self.points, dist)
            points = ramerdouglas(
                self.points[:-1], dist) + self.points[-1:]
        return self.__class__(points)
    def __repr__(self):
        return '{0}{1}'.format(self.__class__.__name__,

class Vec:
    """Generic vector class for n-dimensional vectors
    for any natural n."""
    def __eq__(self, obj):
        """Equality check."""
        if self.__class__ == obj.__class__:
            return self.coords == obj.coords
        return False
    def __repr__(self):
        """String representation. The string is executable as Python
        code and makes the same vector."""
        return '{0}{1}'.format(self.__class__.__name__, self.coords)
    def __add__(self, obj):
        """Add a vector."""
        if not isinstance(obj, self.__class__):
            raise TypeError
        return self.__class__(*map(sum, zip(self.coords, obj.coords)))
    def __neg__(self):
        """Reverse the vector."""
        return self.__class__(*[-i for i in self.coords])
    def __sub__(self, obj):
        """Substract object from self."""
        if not isinstance(obj, self.__class__):
            raise TypeError
        return self + (- obj)

    def __mul__(self, obj):
        """If obj is scalar, scales the vector.
        If obj is vector returns the scalar product."""
        if isinstance(obj, self.__class__):
            return sum([a * b for (a, b) in zip(self.coords, obj.coords)])
        return self.__class__(*[i * obj for i in self.coords])

    def dist(self, obj = None):
        """Distance to another object. Leave obj empty to get
        the length of vector from point 0."""
        return self.distSq(obj) ** 0.5

    def distSq(self, obj = None):
        """ Square of distance. Use this method to save
        calculations if you don't need to calculte an extra square root."""
        if obj is None:
            obj = self.__class__(*[0]*len(self.coords))
        elif not isinstance(obj, self.__class__):
            raise TypeError('Parameter must be of the same class')
        # simple memoization to save extra calculations
        if obj.coords not in self.distSqMem:
            self.distSqMem[obj.coords] = sum([(s - o) ** 2 for (s, o) in
                zip(self.coords, obj.coords)])
        return self.distSqMem[obj.coords]

class Vec3D(Vec):
    """3D vector"""
    def __init__(self, x, y, z):
        self.coords = x, y, z
        self.distSqMem = {}

class Vec2D(Vec):
    """2D vector"""
    def __init__(self, x, y):
        self.coords = x, y
        self.distSqMem = {}

Highlighted with Pygments [].

This code at work:

Figure 1. Original line in 10*10 km square. 492 points.

Figure 2. Simplified line, threshold = 50 m. ~150 points.

Figure 3. Simplified line, threshold = 100 m. ~90 points.

Figure 3. Simplified line, threshold = 250 m. ~45 points.

You can download the code with this line and it's simplifications.

The data is a coastline from this place [] in OpenStreetMap []. Coordinates are in kilometres.

I know this code isn't the fastest, and procedural approach (which I actually like more) can work faster on larger amount of data, but I finally chose OO approach, as it allows readable and maintainable code. Here is a procedural-style implementation [].