3DSoftware.com  >   Cartography  >   Programming  >  Polyline Reduction Cartography Programming
 Polyline Reduction

In 1973, David H. Douglas and Thomas K. Peucker wrote an article in Canadian Cartographer magazine explaining how they developed an algorithm for reducing the number of points in a polyline, when the polyline needs to be displayed at a lower resolution. [ 1 ]  This is a common problem in cartography, part of the concept of generalization:

 “The process of reducing the amount of detail in a map in a meaningful way is called generalization. The process of generalization is normally executed when the map scale has to be reduced.” [ 2 ]

Kraak and Ormeling go on to introduce the concept of line simplification with the nth point algorithm and with the Douglas-Peucker algorithm. [ 3 ]  The nth point algorithm consists of skipping every n points. This is quick and easy, but may miss corners and other important details. The Douglas-Peucker algorithm has a more sophisticated way of skipping points. It basically visualizes long lines, and discards points that are close to the virtual lines. This is more calculation intensive than the nth point algorithm, but provides much better results.

Douglas and Peucker developed iterative and recursive versions of their algorithm, in the fortran and algol programming languages respectively, on an IBM 370 mainframe computer. In this article we will describe an iterative approach using C/C++.  For a C++ implementation using recursion, see the Geometry Algorithms web site. [ 4 ]
Footnotes:

 1. David H. Douglas and Thomas K. Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,” Canadian Cartographer Vol 10 No 2 December 1973, pp. 112-122. 2. Menno-Jan Kraak and Ferjan Ormeling, Cartography: Visualization of Geospatial Data 2nd Ed., p. 75. 3. Ibid., Sec. 5.4.3  pp. 82-83. 4. Dan Sunday, Polyline Simplification

Shortest Distance to a Line Segment

In this section, we will cover Linear Algebra concepts.

We begin with a definition of vector.   A vector, for this discussion, will be a list of numbers. The following are examples of vectors:

 (  1, 2, 3  ) (  3, 0, 4, 1  ) (  2, 3  )

The first vector has 3 components. The second and third vectors have 4 and 2 components respectively. A vector's number of components represents the number of dimensions its vector space has. In this article, we use vectors that have two components, representing points and lines in 2 dimensions. All of the examples in this article may be extended to 3 dimensions, by simply using vectors with 3 components.

The length of a vector is called its magnitude or norm.   This length is a scalar. (A scalar is a single number, not a vector). It can be calculated using the Pythagorean Theorem. For example, consider a point P represented by the vector (  2, 3  ):

The line from the Origin to P can be considered the hypotenuse of a right triangle, with the sides of the triangle being 2 and 3. According to the Pythagorean Theorem, the length of the hypotenuse will equal the square root of the sum of the squares of the sides:

 Distance 0,0 to P = sqrt( 2.0 * 2.0 + 3.0 * 3.0 )

Now we'll define what a unit vector is. A unit vector is a vector with a length of 1. To convert any vector to a unit vector, simply divide each component of the vector with the vector length. This is called normalizing a vector. The unit vector points in the same direction as the vector it represents, but has a length of 1 no matter what the original vector's length is.  For example, the unit vector of (  2, 3  ) is approximately (  0.5547, 0.83205  ).

Next we define the Inner Product, which is also known as the Dot Product (both terms are used interchangably). The Inner Product is an operation that is performed on two vectors that have the same number of components (dimensions), and the result is a scalar (a single number, not a vector). This resulting number (the dot product) is the sum of the products of the corresponding components. [ 5 ]  For example, the dot product of (  2, 3  ) and (  5, 6  ) is:

 (  2, 3  ) · (  5, 6  )  =  2 × 5 + 3 × 6  =  28

Notice that the symbol for a dot product is a dot (·), not the multiplication sign (×).

It is convenient to use labels and symbols to refer to vectors. The vectors (  2, 3  ) and (  5, 6  ) can be called u and v respectively:

 u  =  (  2, 3  ) v  =  (  5, 6  )

The length of a vector is denoted by its name (u or v in this example) surrounded by double vertical bars:

 ||u||  =  sqrt( 13.0 )  =  3.6055513   ||v||  =  sqrt( 61.0 )  =  7.81
 5 For definition of inner product, see linear algebra textbooks, such as:   Peter Shirley, Fundamentals of Computers Graphics (Natick, MA: A.K. Peters, 2002), Fig. 2.18, p. 25.   Steven A. Leduc, CliffNotes Linear Algebra (New York: Wiley, 1996), p. 27.   Seymour Lipschutz and Marc Lipson, Schaum's Outline of Linear Algebra 3rd Ed. (McGraw-Hill, 2001), p. 4.   Gareth Williams, Computational Linear Algebra with Models (Boston: Allyn and Bacon, 1978), p. 186.

The unit vector is a vector divided by its length:

 unit vector of u  = u   ||u||

The symbol for the unit vector is the hat symbol:

 û  = u   ||u||

However, unit vectors are often simply labeled as the vector divided by its length:

 u   ||u||

The inner product of vectors has several interesting properties. Of particular interest for our task is that the inner product can be used to calculate the projection of one vector onto another, which in turn can be used to calculate the shortest distance from the end of one vector (a point) to a line passing through the other vector.

Consider projecting a vector onto the line of another vector. We'll call the first vector u and the second vector v. Of special interest for us is the scalar (single number) that represents how far along the line of the vector v is the perpendicular projection of vector u onto the line of v (the distance from 0,0 to point A):

Conveniently, that number is the inner product of vector u and the unit vector of v:

 distance from 0,0 to point A  = u · v   ||v||

After calculating that number, you can use the Pythagorean Theorem to determine the distance from point A to the end of vector u, which is usually the shortest distance from the end of u to the line segment represented by vector v.

However, if the angle between the vectors u and v is greater than 90, the perpendicular projection will not land on the line segment represented by the vector v, but will instead land on a line that is collinear with v but extends past the vector's endpoints.  In that case, the projection scalar is negative instead of positive, [ 6 ]  and the shortest distance from the end of u to the line segment represented by v is the length of u.
 6 Williams, pp. 198-199. Lipschutz and Lipson, ex. 1.5b, p. 7. Leduc, pp. 36-39.

It is very convenient to have the projection scalar be negative in that case. Your program simply tests for the projection scalar being negative, and if it is negative then use the length of the vector u as the shortest distance to the line segment. Otherwise, calculate the perpendicular projection distance using the Pythagorean Theorem, and use that as the shortest distance to the line segment.

Finally, there is one last consideration. If the length of u is longer than the length of v and the angle between them is small enough, the perpendicular projection of u onto v may not intersect v, but instead intersect the line that is collinear with v extending past v:

In that case, the shortest distance to the line segment is not the perpendicular projection. To detect that possibility, we must repeat this process, but with the v vector reversed:

This is only necessary if the first set of vectors produced a positive (not negative) projection scalar. After reconstructing both vectors, proceed as before with the new vectors. Then if the projection scalar is negative, use the length of u as the shortest distance from the end point of u to the line segment of v. Otherwise use the perpendicular projection distance as the shortest distance.

Using Perpendicular Distance Only

Douglas and Peucker seem to have used only the perpendicular distance to select points, [ 7 ]  which is not necessarily the actual shortest distance (as described above). We'll explain the difference here, and compare results of the two methods.

Using only the perpendicular distance makes use of other math formulas than described above. Those other formulas had special properties of their own, which Douglas and Peucker took advantage of.

Calculating the perpendicular-only distance from a point to a line of infinte length is accomplished with a single math formula. [ 8 ]  That equation has the special property that it causes the distance to change sign when the point crosses to the other side of the infinite line. In other words, the sign changes as the point moves in a direction perpendicular to the line. This is different than our formulas listed above, which detect a sign change as the point moves parallel to the line.

Detecting a sign change as a point moves across a line allowed Douglas and Peucker to implement an estimation accumulator. [ 9 ]  However, computers are now more powerful. [ 10 ]  Those kind of optimizations are now performed at the hardware level. New implementations of the Douglas-Peucker algorithm should use the linear algebra formulas described in this article (see above). These formulas can detect movement parallel to the line, which the earlier formulas cannot.

For comparison, we implemented a perpendicular-only variation to simulate the original Douglas-Peucker algorithm. One characteristic of the perpendicular-only method is that it can ignore long inlets that are parallel to the first and last point of a polyline. Such an inlet could extend far beyond the end points of the polyline, but remain near its infinite line from which perpendicular tolerances are measured.
 7. With the exception of closed polylines: “In the case of closed loops, where the first and last point do not define a line then the maximum perpendicular distance from the segment is replaced with the maximum distance from the point.” – Douglas and Peucker, p. 117. 8. Murray R. Spiegel and John Lui, Schaum's Outline: Mathematical Handbook of Formulas and Tables 2nd Ed. (McGraw-Hill, 1999), Sec. 8.8, p. 20.  See also: William H. Beyer, CRC Standard Mathematical Tables 24th Ed. (CRC Press, 1976), p. 295. 9. Douglas and Peucker, Fig. 4, p. 118. 10. For example, “many modern CPUs can usually perform floating point operations just as fast as they perform integer calculations.” – Peter Shirley, Fundamentals of Computers Graphics (Natick, MA: A.K. Peters, 2002), p. 8.

The following is an example showing the Texas coast line on the Gulf of Mexico. This is a satellite image (courtesy of NASA Visible Earth project), with boundary points from the VMap Level 0 data set.

 Raw boundary points (no polyline reduction) Polyline reduction with perpendicular-only testing Polyline reduction with complete shortest distance testing

Here are blow ups of the two polyline reduction examples:

 Polyline reduction with perpendicular-only distance testing. Note that the satellite image shows shallow water ocean bottom (as well as land cover) so that some areas which appear like land may actually be under water. Also note that we did not implement Douglas and Peucker's exception for closed loops, as specified in Footnote 7 of this article. Polyline reduction with shortest distance testing (not perpendicular-only distances). This is the default for new implementations of the Douglas-Peucker algorithm.

Iterative Implementation

Our iterative implementation consists of two loops, one nested inside the other. The outer loop traverses the entire polyline, one vertex at a time, looking for an anchor point, then (when an anchor point is found) looking for a floater. Then when the corresponding floater is found, the inner loop is invoked.

The inner loop first checks to see if the anchor and floater points are adjacent vertices. If they are, they are marked as points to use in the final polyline, and the inner loop is not invoked (in which case the outer loop starts looking for the next anchor/floater pair).

If the anchor and floater vertices are not adjacent, the inner loop initializes the anchor-to-floater length and unit vector, and then invokes the inner loop, which traverses the line segment one vertex at a time.

For each intermediate point, that point is compared to the anchor, then if necessary compared to the floater.

Comparing to the anchor consists of constructing a vector from the anchor to the point in question, again using simple component subtraction. This is equivalent to the u vector described earlier in this article (in which case the anchor-to-floater vector would be v). Then the projection scalar is calculated. That is the distance from the origin of the vectors to point A in the diagrams above. If that scalar is negative, the length of the u vector is the shortest distance to the line segment and the inner loop goes to the next vertex.

But if the projection scalar is positive, the vertex in question is then compared to the floater point. A vector is constructed from the floater point to the point in question, the projection scalar of the new vector on the floater-to-anchor vector is calculated, and if that projection scalar is negative then the length of the new vector is used as the shortest distance. Otherwise, the perpendicular distance to the line is calculated using the Pythagorean Theorem, and that value is used as the shortest distance from the point to the line segment.

Note that for every vertex, when the minimum distance of that vertex to the anchor/floater line segment is calculated, that minimum distance is compared to a maximum value, and becomes the maximum value if it is greater than the maximum value. That is, the inner loop keeps track of which point has the greatest distance from the anchor/floater line segment. (That maximum value is reset at the beginning of the inner loop.)

After the inner loop reaches and finishes processing the last vertex (the vertex immediately before the floater), it checks to see if the maximum distance is greater than the tolerance value. If it is not greater, then the anchor and floater are marked as points to use in the final polyline. Otherwise, the anchor and floater are marked to be used as anchor and floater points in the next iteration of the outer loop, and the point furthest from the anchor/floater segment is also marked to be used as an anchor and as a floater in the next iteration of the outer loop.

The outer loop terminates when no more anchor points are found.

Note that before the outer loop is initially executed, the first and last points (only) of the polyline are marked as anchor and floater respectively.

Also note that this implementation does not actually delete points. Points are simply marked as suitable for being used. Ignoring such markings lets you access the original polyline. Or if you want to use the reduced polyline, then use only the marked points.

This implementation requires three flags per vertex. One flag specifies whether that point will be used in the final polyline. The other two flags are only needed during processing, and specify if the point will be used as an anchor and/or floater in the next iteration of the outer loop.

An optimization we implement in the source code below is to use a stack instead of the two temporary flags. In this implementation, only one flag per vertex is needed, to specify which vertex will be used in the final generalized polyline. The stack keeps track of anchor and floater pairs, instead of doing that with two temporary flags per vertex. While the order of marking the flags is not important, in the stack scenario it is important to specify the anchor and floater points in the proper order, as is done in the source code example.

Several other optimizations are possible, such as using the square of distances in comparisons, to eliminate usage of the sqrt() function.

This source code example assumes the pnUseFlag integer array is zeroed out before invoking the point reduction function. After the function returns, that array can be examined to determine the array indices of the vertices that are to be in the final polyline. This can then be saved to other data structures, such as a level of detail number for each vertex, as Paul B. Anderson did with MWDBPOLY, which was included on the MicroCAM CD. To review what MWDBPOLY was, here is an excerpt from its documentation:

 “MWDB-POLY was the mapping database used in a Delphi Component called TWorldMap. In 1996 NASA used software that incorporated the TWorldMap component onboard Atlantis and the Russian space station Mir while in orbit, to help identify and capture photographs of the Earth.”

As the MWDBPOLY documentation states, its polylines were provided with 5 levels of detail:

 “Detail level 5 produces the least detailed graphics image. The points at each level of detail are additive to the points at all lower levels. For example, when using detail level 4 the points from both levels 4 and 5 must be used/retrieved …   Note that the same number of polygons exist at all levels of detail …   Actual use of the data in these files has shown that most displays which cover a reasonably large area do not need all of the detail provided at level 1. A large area would be a major portion of the U.S. For large area plots level 3 or 4 is normally sufficient and greatly reduces the number of points which must be processed.”

To develop such a database, you could use multiple invocations of the ReducePoints() function with a different suitable tolerance for each invocation, zeroing out the pnUseFlag array before invoking the function, extracting the indices from that array after invoking the function, and saving a level number for those indices to a tag per vertex as MWDBPOLY did.

Source Code

The following source is provided free in the public domain for use without any attribution or payment. You are free to incorporate this into your software without giving us credit. We are not responsible for use or misuse of this source code.

 struct STACK_RECORD {       int nAnchorIndex, nFloaterIndex;       STACK_RECORD *precPrev; } *m_pStack = null; void StackPush( int nAnchorIndex, int nFloaterIndex ); int StackPop( int *pnAnchorIndex, int *pnFloaterIndex );   void ReducePoints( double *pPointsX, double *pPointsY, int nPointsCount,                   int *pnUseFlag,                   double dTolerance ) { int nVertexIndex, nAnchorIndex, nFloaterIndex; double dSegmentVecLength; double dAnchorVecX, dAnchorVecY; double dAnchorUnitVecX, dAnchorUnitVecY; double dVertexVecLength; double dVertexVecX, dVertexVecY; double dProjScalar; double dVertexDistanceToSegment; double dMaxDistThisSegment; int nVertexIndexMaxDistance;   nAnchorIndex = 0; nFloaterIndex = nPointsCount - 1; StackPush( nAnchorIndex, nFloaterIndex ); while ( StackPop( &nAnchorIndex, &nFloaterIndex ) ) {       // initialize line segment       dAnchorVecX = pPointsX[ nFloaterIndex ] - pPointsX[ nAnchorIndex ];       dAnchorVecY = pPointsY[ nFloaterIndex ] - pPointsY[ nAnchorIndex ];       dSegmentVecLength = sqrt( dAnchorVecX * dAnchorVecX             + dAnchorVecY * dAnchorVecY );       dAnchorUnitVecX = dAnchorVecX / dSegmentVecLength;       dAnchorUnitVecY = dAnchorVecY / dSegmentVecLength;       // inner loop:       dMaxDistThisSegment = 0.0;       nVertexIndexMaxDistance = nAnchorIndex + 1;       for ( nVertexIndex = nAnchorIndex + 1; nVertexIndex < nFloaterIndex; nVertexIndex++ ) {             //compare to anchor             dVertexVecX = pPointsX[ nVertexIndex ] - pPointsX[ nAnchorIndex ];             dVertexVecY = pPointsY[ nVertexIndex ] - pPointsY[ nAnchorIndex ];             dVertexVecLength = sqrt( dVertexVecX * dVertexVecX                   + dVertexVecY * dVertexVecY );             //dot product:             dProjScalar = dVertexVecX * dAnchorUnitVecX + dVertexVecY * dAnchorUnitVecY;             if ( dProjScalar < 0.0 )                   dVertexDistanceToSegment = dVertexVecLength;             else {                   //compare to floater                   dVertexVecX = pPointsX[ nVertexIndex ] - pPointsX[ nFloaterIndex ];                   dVertexVecY = pPointsY[ nVertexIndex ] - pPointsY[ nFloaterIndex ];                   dVertexVecLength = sqrt( dVertexVecX * dVertexVecX                         + dVertexVecY * dVertexVecY );                   //dot product:                   dProjScalar = dVertexVecX * (-dAnchorUnitVecX) + dVertexVecY * (-dAnchorUnitVecY);                   if ( dProjScalar < 0.0 )                         dVertexDistanceToSegment = dVertexVecLength;                   else //calculate perpendicular distance to line (pythagorean theorem):                         dVertexDistanceToSegment =                               sqrt( fabs( dVertexVecLength * dVertexVecLength - dProjScalar * dProjScalar ) );                   } //else                   if ( dMaxDistThisSegment < dVertexDistanceToSegment ) {                         dMaxDistThisSegment = dVertexDistanceToSegment;                         nVertexIndexMaxDistance = nVertexIndex;                   } //if             } //for (inner loop)             if ( dMaxDistThisSegment <= dTolerance ) { //use line segment                   pnUseFlag[ nAnchorIndex ] = 1;                   pnUseFlag[ nFloaterIndex ] = 1;             } //if use points             else {                   StackPush( nAnchorIndex, nVertexIndexMaxDistance );                   StackPush( nVertexIndexMaxDistance, nFloaterIndex );             } //else       } //while (outer loop) } //end of ReducePoints() function   void StackPush( int nAnchorIndex, int nFloaterIndex ) { STACK_RECORD *precPrev = m_pStack; m_pStack = (STACK_RECORD *)malloc( sizeof(STACK_RECORD) ); m_pStack->nAnchorIndex = nAnchorIndex; m_pStack->nFloaterIndex = nFloaterIndex; m_pStack->precPrev = precPrev; } //end of StackPush() function   int StackPop( int *pnAnchorIndex, int *pnFloaterIndex ) { STACK_RECORD *precStack = m_pStack; if ( precStack == null )       return 0; //false *pnAnchorIndex = precStack->nAnchorIndex; *pnFloaterIndex = precStack->nFloaterIndex; m_pStack = precStack->precPrev; free( precStack ); return 1; //true } //end of StackPop() function