Methods

What is R-Tree

A R-Tree, proposed by Antonin Guttman[1], is an index structure for point and spatial data at the same time. Insert, delete and search can be intermixed without periodic reorganization. It uses a tuple to represent a spatial data in the database. In order to retrieve the data, each tuple has a unique identifier, tuple-identifier. At the leaf node of a R-Tree, it has index record that can reference the spatial data. The index record is (I, tuple-identifier). I is an n-dimensional rectangle and it is the bounding rectangle of the spatial data indexed. This rectangle is also known as minimal bounding rectangle, MBR. and each entry in tuple-identifier is the upper and lower bounds, [upper, lower], of the rectangle along the dimension. Non-leaf nodes contain entries (I, childnode-pointer) where I is the minimal rectangle bounding all the rectangles in the lower nodes' entries. Childnode-pointer is the pointer to a lower node in the R-Tree. Let M and m<=M/2 be the maximum and minimum number of entries can be filled into one node respectively.

Properties of R-Tree

A R-Tree satisfies the following properties:

A R-Tree is a height balance tree and all leaves are on the same level.
Root node has at least two children unless it is the leaf node.
Every non-leaf node contains between m and M entries unless it is the root.
For each entries (I, childnode-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains all rectangles in its child nodes.
Every leaf node contains between m and M index records unless it is the root.
For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple.

R*-Tree

The R-tree is based on a heuristic optimization. The optimization criterion is to minimize the area of each enclosing rectangle in the inner nodes. R*-Tree which incorporates a combined optimization of area, margin and overlap of each bounding rectangle in the inner nodes was proposed in [6]. For slightly higher implementation cost, it outperforms the existing R-Tree variants.

Minimizing the area covered by a bounding rectangle should minimize the dead space. This will improve performance since decisions which paths have to be traversed, can be taken on higher level
Minimizing the overlap between bounding rectangles decreases the number of paths to be traversed.
Minimizing the margin of a bounding rectangle will make the rectangle more quadratic. It is because for fixed area, the object with the smallest margin is the square. Quadratic rectangles can be packed easily and thus building a smaller rectangle.

VP-Tree

Conventional spatial index structures divide the multi-dimensional vector space into partitions which have approximately the same number of data points as each other. It facilitates in finding the nearest neighbor of a given query point because it is only necessary to touch a small number of partitions. Most partitioning methods are based on absolute coordinate values of the vector space. R-Tree and R*-Tree described before use this type of partitioning method. The structures partitioned in this way are useful for queries based on absolute coordinates, like range queries. However, in general, it does not maintain any distance information, such as distance between points within a partition and the partition's boundaries. Since this information is critical in pruning the search space for nearest-neighbor search, index structures using partitioning methods based on absolute coordinate are thus not so useful for multi-dimensional nearest-neighbor search.

Nearest-neighbor search by definition is to find out one point with minimum point-to-point distance from a given query point, so it is natural to use partitioning method based on relative distance rather than absolute coordinate values. Vantage-Point tree, or VP-Tree, method was proposed by Peter N.Yianilos. It uses the partitioning method based on relative distance and aims for handling multi-dimensional nearest neighbor search.

As mentioned before, VP-Tree method bases the partitioning on the relative distances among the data points, rather than their absolute coordinate values. It also bases on a particular vantage point. Actually, vantage point is nothing special but a point selected from a vector space, or a set of data points. However, the choice of vantage point plays an important role in the performance of indexing algorithm.

Introduction | Methods | Design | Demostration | References

thip@cs.cuhk.hk & chmak2@cs.cuhk.hk