http://ryba4.com

# Ramer-Douglas-Peucker Algorithm in Python

© Dmitri Lebedev (http://ryba4.com) [http://ryba4.com], April 2010, version 1.1.

This is a Python implementation of Ramer-Douglas-Peucker algorithm that simplifies polylines.

Image by Jitse Niesen [http://en.wikipedia.org/wiki/User:Jitse_Niesen].
The algorithm works the following way:
• Make a straight line between the beginning and the end of the polyline.
• Calculate the distance between each point and the straight.
• If the farthest point is further than the threshold value, it's included in the end result, and the operation is repeated on the parts to the left and to the right of it.

To simplify the code, my implementation uses object-oriented approach, though it is not hard to re-write it in procedural style. The positive side-effect of OO style is that the function can work with points of any number of dimensions (as long as all the points have it the same).

In this code I use a special formula to calculate the distance between a point and a line. Here's how it looks:

We know X, Y and Z (begin, end and curr in the code). We can calculate automatically a and b vectors and need vector c's length (module). To find the longest vector, we may use the squares of the lengths. Now see the way of thought:

Note that |a|2 actually takes less calculations than |a|, so this formula gets rid of few x**.5 operators.

The code:

```def ramerdouglas(line, dist):
"""Does Ramer-Douglas-Peucker simplification of
a line with `dist` threshold.
`line` must be a list of Vec objects,
all of the same type (either 2d or 3d)."""
if len(line) < 3:
return line

begin, end = line[0], line[-1]
distSq = [begin.distSq(curr) -
((end - begin) * (curr - begin)) ** 2 /
begin.distSq(end) for curr in line[1:-1]]

maxdist = max(distSq)
if maxdist < dist ** 2:
return [begin, end]

pos = distSq.index(maxdist)
return (ramerdouglas(line[:pos + 2], dist) +
ramerdouglas(line[pos + 1:], dist)[1:])
```

Basically, that's all the algorithm's code. The rest is service code with line and vector classes.

```class Line:
"""Polyline. Contains a list of points and outputs
a simplified version of itself."""
def __init__(self, points):
pointclass = points[0].__class__
for i in points[1:]:
if i.__class__ != pointclass:
raise TypeError("""All points in a Line
must have the same type""")
self.points = points

def simplify(self, dist):
if self.points[0] != self.points[-1]:
points = ramerdouglas(self.points, dist)
else:
points = ramerdouglas(
self.points[:-1], dist) + self.points[-1:]
return self.__class__(points)

def __repr__(self):
return '{0}{1}'.format(self.__class__.__name__,
tuple(self.points))

class Vec:
"""Generic vector class for n-dimensional vectors
for any natural n."""
def __eq__(self, obj):
"""Equality check."""
if self.__class__ == obj.__class__:
return self.coords == obj.coords
return False

def __repr__(self):
"""String representation. The string is executable as Python
code and makes the same vector."""
return '{0}{1}'.format(self.__class__.__name__, self.coords)

if not isinstance(obj, self.__class__):
raise TypeError

return self.__class__(*map(sum, zip(self.coords, obj.coords)))

def __neg__(self):
"""Reverse the vector."""
return self.__class__(*[-i for i in self.coords])

def __sub__(self, obj):
"""Substract object from self."""
if not isinstance(obj, self.__class__):
raise TypeError

return self + (- obj)

def __mul__(self, obj):
"""If obj is scalar, scales the vector.
If obj is vector returns the scalar product."""
if isinstance(obj, self.__class__):
return sum([a * b for (a, b) in zip(self.coords, obj.coords)])

return self.__class__(*[i * obj for i in self.coords])

def dist(self, obj = None):
"""Distance to another object. Leave obj empty to get
the length of vector from point 0."""
return self.distSq(obj) ** 0.5

def distSq(self, obj = None):
""" Square of distance. Use this method to save
calculations if you don't need to calculte an extra square root."""
if obj is None:
obj = self.__class__(*[0]*len(self.coords))
elif not isinstance(obj, self.__class__):
raise TypeError('Parameter must be of the same class')

# simple memoization to save extra calculations
if obj.coords not in self.distSqMem:
self.distSqMem[obj.coords] = sum([(s - o) ** 2 for (s, o) in
zip(self.coords, obj.coords)])
return self.distSqMem[obj.coords]

class Vec3D(Vec):
"""3D vector"""
def __init__(self, x, y, z):
self.coords = x, y, z
self.distSqMem = {}

class Vec2D(Vec):
"""2D vector"""
def __init__(self, x, y):
self.coords = x, y
self.distSqMem = {}
```

Highlighted with Pygments [http://pygments.org].

This code at work:

Figure 1. Original line in 10*10 km square. 492 points.

Figure 2. Simplified line, threshold = 50 m. ~150 points.

Figure 3. Simplified line, threshold = 100 m. ~90 points.

Figure 3. Simplified line, threshold = 250 m. ~45 points.

The data is a coastline from this place [http://osm.org/go/2ttZPN0-] in OpenStreetMap [http://osm.org]. Coordinates are in kilometres.

I know this code isn't the fastest, and procedural approach (which I actually like more) can work faster on larger amount of data, but I finally chose OO approach, as it allows readable and maintainable code. Here is a procedural-style implementation [http://mappinghacks.com/2008/05/05/douglas-peucker-line-simplification-in-python/].