R-Trees

Description

The R-tree is intended for indexing two (and higher) dimensional objects in terms of their minimum bounding rectangles (MBR). Nodes of the tree store MBRs of objects or collections of objects. The leaf nodes of the R-tree store the exact MBRs or bounding boxes of the individual geometric objects, along with a pointer to the storage location of the contained geometry. All non-leaf nodes store references to several bounding boxes for each of which is a pointer to a lower level node. The tree is constructed hierarchically by grouping the leaf boxes into larger, higher level boxes which may themselves be grouped into even larger boxes at the next higher level. Since the original boxes are never subdivided, a consequence of this approach is that the non-leaf node ‘covering boxes’ can be expected to overlap each other. Another drawback of this method is that it does result in a disjointed decomposition of space. The problem is that an object is only associated with one bounding rectangle. In the worst case, this means that when we wish to determine which object is associated with a particular point in the two-dimensional space from which the objects are drawn, we may have to search the entire database.

Searching the R-tree

It consists of comparing the search window with the boxes in each node, starting at the root, and following the child pointers of those boxes that are included in or overlap the ranges of the search window. The procedure is continued, possibly down several branches, until reaching the leaf nodes, the contents of which are then tested against the extent of the search window.

Depth-first searching

A depth-first search (DFS) explores a path all the way to a leaf before backtracking and exploring another path.
For example, after searching A, then B, then D, the search backtracks and tries another path from B.
Node are explored in the order A B D E H L M N I O P C F G J K Q.
M will be found before J.

Breadth-first searching

A breadth-first search (BFS) explores nodes nearest the root before exploring nodes further away.
For example, after searching A, then B, then C, the search proceeds with D, E, F, G.
Node are explored in the order A B C D E F G H I J K L M N O P Q.
J will be found before M.

Variations of R-tree

To cope with the overlapping boxes this method was improved, decomposing the space into disjoint cells, which are mapped into buckets. This method based on disjointness partitions the objects into arbitrary disjoint subobjects and then groups the subobjects in another structure. This data structure is called R+-tree. Overlapping of rectangles is avoided by clipping them against each other, creating additional, smaller rectangles. Each object is associated with all the bounding rectangles that it intersects. All bounding rectangles in the tree are non-overlapping (with the exception of the bounding rectangles for the objects at the leaf nodes). The result is that there may be several paths starting at the root to the same object. This may lead to an increase in the height of the tree. However, retrieval time is sped up.

The R*-tree is a variant of the R-tree which makes use of the most complex of the node splitting algorithms. The algorithm differs from the other algorithms as it attempts to reduce both overlap and coverage. In particular, the primary focus is on reducing overlap with ties broken by favoring the splits that reduce the coverage by using the splits that minimize the perimeter of the bounding boxes of the resulting nodes. In addition, when a node A overflows, instead of immediately splitting A, an attempt is made first to see if some of the objects in A could possibly be more suited to being in another node. This is achieved by reinserting a fraction (30% has been found to yield good performance) of these objects in the tree (termed forced reinsertion). The node is only split if it has been found to overflow after reinsertion has taken place. This method is quite complex.
The insertion algorithm has following advantages:

Minimize node overlap
Minimize area covered by nodes
Minimize perimeters of the rectangles at leaf nodes

R-Tree split

R*-Tree split

Operation with R+-Tree

Searching operation

The idea is to first decompose the search space into disjoint sub-regions and for each of those descend the tree until the actual data objects are found in the leaves.

Find all objects in range

Result: objects 13, 14, 4, 8

Insertion operation

Insertion of a new rectangle in an R+ - tree is done by searching the tree and adding the rectangle in leaf notes. The difference from R-tree is that the input rectangle may be added to more than one leaf node, the reason is that it may be broken to sub-rectangles along existing partitions of the space.

Add object 15 to leaf C

Traverse tree top-down, finding all nodes whose directory root contain object’s MBR
Node is chosen so that enlargement of rectangles is minimal
Repeat until leaf is reached
If leaf is not full then add and adjust
If leaf is full then a new leaf is created and the objects are split (Split Algorithms)

Deletion operation

Deletion of a rectangle from an R+ - tree is done as in R-trees by first locating the rectangle(s) that must be deleted and then removing it (them) from the leaf nodes. The reason that more than one rectangle may have to be removed from leaf nodes is that the insertion routine outlined above may introduce more than one copy for a newly inserted rectangle.

Node Splitting operation

Splitting algorithm is needed to produce two new nodes. We require that the two sub-nodes cover mutually disjoint areas; we search for a “good” partition that will decompose the space into two sub-regions. Contrary to R-tree, downward propagation of the split may be necessary. This is due to that a rectangle R should not be found in a subtree rooted at a node A unless the rectangle associated with A covers R completely. Hence, nodes intersected by the partitions must be split recursively.