
R-Trees: A Dynamic Index Structure for Spatial Data

Antonin Guttman

R-Trees: Why and What?

Why do we need R-trees?
What are R-trees?
How do we perform operations on them?
Alternatives? Why not a B+-tree?

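Properties of R-Trees

Height balanced: all leaves appear on the same level
Two types of nodes: leaf and internal (non-leaf)
Each node occupies one disk page
Records in the leaves point to the actual data objects
For a maximum capacity of M entries per node, the minimum occupancy m satisfies m ≤ M/2
Completely dynamic: inserts and deletes can be intermixed with searches, with no periodic reorganization
Guaranteed fan-out: every node except the root holds between m and M entries (at least M/2 when m = M/2)
Every leaf record holds the smallest rectangle that bounds its data object
The root has at least two children unless it is a leaf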
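The guaranteed minimum fan-out is what keeps the tree shallow. Guttman bounds the height h of an R-tree holding N index records as below; the worked numbers use assumed values (M = 50, m = 25, one million records), not figures from these slides.

```latex
h \le \lceil \log_m N \rceil - 1
% Assumed example: M = 50, m = M/2 = 25, N = 10^6
\log_{25} 10^{6} = \frac{6}{\log_{10} 25} \approx \frac{6}{1.398} \approx 4.29
\quad\Rightarrow\quad h \le \lceil 4.29 \rceil - 1 = 4
```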

R-Trees: The Structure

Internal nodes: (rectangle, child-pointer)

The rectangle is n-dimensional and is the smallest one that contains all rectangles in the child node.
The child-pointer points to the child node holding those contained rectangles.

Leaf nodes: (MBR, tuple-identifier)

MBR is the minimum bounding rectangle of the data object.
Tuple-identifier is a pointer to the data object.
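A minimal sketch of the two entry kinds, assuming an in-memory Python model rather than disk pages. The class names `Rect`, `LeafEntry`, `InternalEntry`, `Node` and the helpers `mbr` and `entry_for` are hypothetical, and integer ids stand in for real tuple-identifiers.

```python
from dataclasses import dataclass, field
from typing import Iterable, Union


@dataclass(frozen=True)
class Rect:
    """Axis-aligned n-dimensional rectangle given by its lower and upper corners."""
    lo: tuple[float, ...]
    hi: tuple[float, ...]


def mbr(rects: Iterable[Rect]) -> Rect:
    """Minimum bounding rectangle: the smallest Rect covering every input Rect."""
    rects = list(rects)
    dims = range(len(rects[0].lo))
    return Rect(tuple(min(r.lo[d] for r in rects) for d in dims),
                tuple(max(r.hi[d] for r in rects) for d in dims))


@dataclass
class LeafEntry:
    rect: Rect       # MBR of the data object
    tuple_id: int    # hypothetical integer id standing in for a tuple-identifier


@dataclass
class InternalEntry:
    rect: Rect       # smallest Rect covering everything in the child node
    child: "Node"    # child-pointer


@dataclass
class Node:
    leaf: bool
    entries: list[Union[LeafEntry, InternalEntry]] = field(default_factory=list)


def entry_for(child: Node) -> InternalEntry:
    """Build the parent's entry for a child node, covering all of its entries."""
    return InternalEntry(mbr(e.rect for e in child.entries), child)
```

For example, a leaf holding `Rect((0, 0), (1, 2))` and `Rect((2, 1), (3, 4))` gets the parent rectangle `Rect((0, 0), (3, 4))`.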

R-tree of order 4: Example

[Figure, shown over several slides: an example R-tree of order 4 with entries labeled a through p.]

R-Trees: Operations

Inserts
Deletes
Updates (delete and re-insert)
Queries/Searches

Names of all the roads in a 1 sq km area?
Which buildings would be encountered between Rogers Hall and Reitz Union?
Give me all rectangles that are contained in the input rectangle.
Give me all rectangles intersecting this rectangle.
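Both query types can be answered by the same recursive descent. A minimal sketch, assuming 2-D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples and a hypothetical `Node` class whose entries are `(rect, child)` pairs, where `child` is a node in internal nodes and a record id in leaves:

```python
from dataclasses import dataclass, field

# Assumption for brevity: 2-D rectangles as (xmin, ymin, xmax, ymax) tuples.


def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    """True if inner lies entirely inside outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # (rect, child Node or record id)


def search(node, query, contained_only=False):
    """Return ids of leaf records intersecting `query`, or, with
    contained_only=True, only those lying entirely inside it."""
    hits = []
    for rect, child in node.entries:
        if not intersects(rect, query):
            continue  # nothing under this entry can qualify: prune the subtree
        if node.leaf:
            if not contained_only or contains(query, rect):
                hits.append(child)
        else:
            hits.extend(search(child, query, contained_only))
    return hits
```

Pruning by intersection at every level also works for containment queries: a covering rectangle contains every rectangle beneath it, so any qualifying leaf entry sits under entries that intersect the query.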

Insert

Similar to insertion into a B+-tree, but a new entry may go into any leaf; a leaf splits when its capacity M is exceeded, and splits propagate upward.

Which leaf to insert into? (Choose Leaf)
How to split a node? (Node Split)
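The B+-tree-like flow (descend, add to a leaf, split on overflow, push the split upward, grow a new root) can be sketched as follows. This assumes 2-D `(xmin, ymin, xmax, ymax)` tuples, M = 4 to match the order-4 example, and a hypothetical `Node` class. The `split` function here is a plain stand-in (sort by x-center and cut in half), not Guttman's quadratic or linear split.

```python
from dataclasses import dataclass, field

# Assumptions: 2-D rectangles as (xmin, ymin, xmax, ymax); default M = 4.


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # (rect, child Node or record id)


def cover(node):
    """Smallest rectangle covering all entries of a node."""
    out = node.entries[0][0]
    for rect, _ in node.entries[1:]:
        out = union(out, rect)
    return out


def split(node):
    """Stand-in split (NOT Guttman's): sort entries by x-center and cut in half."""
    node.entries.sort(key=lambda e: e[0][0] + e[0][2])
    half = len(node.entries) // 2
    sibling = Node(node.leaf, node.entries[half:])
    node.entries = node.entries[:half]
    return sibling


def _insert(node, rect, rid, M):
    """Insert below `node`; return the new sibling if `node` had to split."""
    if node.leaf:
        node.entries.append((rect, rid))
    else:
        # Descend into the child needing least enlargement (ties: smaller area).
        i = min(range(len(node.entries)),
                key=lambda k: (area(union(node.entries[k][0], rect)) - area(node.entries[k][0]),
                               area(node.entries[k][0])))
        child = node.entries[i][1]
        sibling = _insert(child, rect, rid, M)
        node.entries[i] = (cover(child), child)  # tighten the child's rectangle
        if sibling is not None:
            node.entries.append((cover(sibling), sibling))  # split propagates up
    return split(node) if len(node.entries) > M else None


def insert(root, rect, rid, M=4):
    """Insert and return the (possibly new) root."""
    sibling = _insert(root, rect, rid, M)
    if sibling is None:
        return root
    # Root split: the tree grows by one level, so all leaves stay at the same depth.
    return Node(False, [(cover(root), root), (cover(sibling), sibling)])
```

Usage: start from `root = Node(True)` and call `root = insert(root, rect, rid)` for each record. Because the tree only grows at the root, it stays height balanced.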

Insert: Choose Leaf

[Figure, shown over several slides: Choose Leaf illustrated for entries labeled m and n.]
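The Choose Leaf slides are figures only. Guttman's criterion at each level is to follow the entry whose rectangle needs the least enlargement to include the new rectangle, breaking ties by the smallest area. A minimal sketch, assuming 2-D `(xmin, ymin, xmax, ymax)` tuples and a hypothetical `Node` class:

```python
from dataclasses import dataclass, field

# Assumption: 2-D rectangles as (xmin, ymin, xmax, ymax) tuples.


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def enlargement(r, new):
    """Extra area r must grow by to also cover `new`."""
    grown = (min(r[0], new[0]), min(r[1], new[1]), max(r[2], new[2]), max(r[3], new[3]))
    return area(grown) - area(r)


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # (rect, child Node or record id)


def choose_leaf(root, new):
    """Descend from the root to the leaf where `new` should go.
    Returns the path of nodes visited, root first and leaf last."""
    path = [root]
    node = root
    while not node.leaf:
        # Least enlargement wins; ties go to the entry with the smaller area.
        _, child = min(node.entries, key=lambda e: (enlargement(e[0], new), area(e[0])))
        node = child
        path.append(node)
    return path
```

Returning the whole path is a design choice for this sketch: the insert step needs it afterwards to tighten rectangles and propagate splits back toward the root.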

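Node Splitting

Quadratic method

Select as seeds the pair of entries that would waste the most area if placed in the same node.
Grow two groups from the seeds, each time assigning the entry with the strongest preference for one group to the group whose rectangle needs the least enlargement.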
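A minimal sketch of the quadratic split, assuming 2-D `(xmin, ymin, xmax, ymax)` tuples. It works on the list of rectangles of an overfull node (M + 1 entries) and returns two lists of indices; `pick_seeds` and `quadratic_split` are hypothetical names. It includes Guttman's rule that a group needing every remaining entry to reach m simply takes them all.

```python
from itertools import combinations

# Assumption: 2-D rectangles as (xmin, ymin, xmax, ymax) tuples.


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects):
    """Quadratic PickSeeds: the pair that wastes the most area if grouped together."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)


def quadratic_split(rects, m):
    """Split an overfull node's rectangles into two index groups of size >= m."""
    i, j = pick_seeds(rects)
    groups, covers = [[i], [j]], [rects[i], rects[j]]
    remaining = [k for k in range(len(rects)) if k not in (i, j)]
    while remaining:
        # If a group needs every remaining entry to reach m, give it all of them.
        for g in (0, 1):
            if len(groups[g]) + len(remaining) <= m:
                groups[g].extend(remaining)
                return groups

        def cost(k, g):  # area increase of group g's cover if entry k joins it
            return area(union(covers[g], rects[k])) - area(covers[g])

        # PickNext: the entry with the strongest preference for one group.
        k = max(remaining, key=lambda e: abs(cost(e, 0) - cost(e, 1)))
        # Least enlargement wins; ties go to smaller area, then fewer entries.
        g = min((0, 1), key=lambda grp: (cost(k, grp), area(covers[grp]), len(groups[grp])))
        groups[g].append(k)
        covers[g] = union(covers[g], rects[k])
        remaining.remove(k)
    return groups
```

The cost is quadratic in the number of entries because every pair is examined when choosing seeds.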

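Linear method

Select as seeds the pair with the greatest normalized separation along the x or y axis.
Assign the remaining rectangles to the seeds in arbitrary order, each to the group whose rectangle needs the least enlargement.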
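A minimal sketch of the linear split under the same assumptions (2-D `(xmin, ymin, xmax, ymax)` tuples, index groups out; hypothetical function names). One choice here is not spelled out on the slide: the second seed is forced to differ from the first.

```python
# Assumption: 2-D rectangles as (xmin, ymin, xmax, ymax) tuples.


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def linear_pick_seeds(rects):
    """Per dimension, take the entry with the highest low side and the entry with
    the lowest high side; keep the pair whose separation, normalized by the width
    of the whole set along that dimension, is greatest."""
    best, seeds = None, None
    for lo, hi in ((0, 2), (1, 3)):  # (low, high) coordinate indices for x, then y
        i = max(range(len(rects)), key=lambda k: rects[k][lo])
        # Sketch choice: the second seed must differ from the first.
        j = min((k for k in range(len(rects)) if k != i), key=lambda k: rects[k][hi])
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        sep = (rects[i][lo] - rects[j][hi]) / width if width > 0 else 0.0
        if best is None or sep > best:
            best, seeds = sep, (j, i)
    return seeds


def linear_split(rects, m):
    """Split into two index groups of size >= m, assigning the rest in arbitrary order."""
    a, b = linear_pick_seeds(rects)
    groups, covers = [[a], [b]], [rects[a], rects[b]]
    remaining = [k for k in range(len(rects)) if k not in (a, b)]
    for pos, k in enumerate(remaining):
        left = len(remaining) - pos  # entries not yet assigned, including k
        for g in (0, 1):
            if len(groups[g]) + left <= m:  # this group needs all the rest to reach m
                groups[g].extend(remaining[pos:])
                return groups

        def cost(g):  # area increase of group g's cover if entry k joins it
            return area(union(covers[g], rects[k])) - area(covers[g])

        g = min((0, 1), key=lambda grp: (cost(grp), area(covers[grp]), len(groups[grp])))
        groups[g].append(k)
        covers[g] = union(covers[g], rects[k])
    return groups
```

Seed selection makes one pass per dimension and each remaining entry is placed once, so the work grows linearly with the number of entries.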

Delete

Search for the rectangle.
If the rectangle is found, remove it.
If the node is deficient (fewer than m entries):

Remove the node and put its remaining entries in a re-insert queue.
Adjust the parent rectangle if needed.
Continue this until you reach the root.
Re-insert the queued entries at their original level, so that all leaves remain on the same level.

Adjust the rectangles along the path, making them smaller.
Alternative: merge a deficient node with a sibling, as in a B-tree.

But re-insertion shows similar performance and is simpler to implement.
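A minimal sketch of the deletion steps above, assuming 2-D `(xmin, ymin, xmax, ymax)` tuples and a hypothetical `Node` class. It finds the leaf, removes the record, then walks back up: deficient nodes are dropped and their entries queued, other nodes have their parent rectangle shrunk. Re-insertion is left to the caller, since it reuses the insert routine, which is not reproduced here.

```python
from dataclasses import dataclass, field

# Assumption: 2-D rectangles as (xmin, ymin, xmax, ymax) tuples.


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # (rect, child Node or record id)


def cover(node):
    out = node.entries[0][0]
    for rect, _ in node.entries[1:]:
        out = union(out, rect)
    return out


def find_leaf(node, rect, rid, path=()):
    """Return the root-to-leaf path to the leaf holding (rect, rid), or None."""
    path = path + (node,)
    if node.leaf:
        return path if (rect, rid) in node.entries else None
    for r, child in node.entries:
        if contains(r, rect):  # only subtrees whose rectangle covers the target
            found = find_leaf(child, rect, rid, path)
            if found:
                return found
    return None


def delete(root, rect, rid, m):
    """Remove (rect, rid) and condense the tree. Returns orphaned entries as
    (level, entry) pairs, level 0 = leaf level, to re-insert at that level."""
    path = find_leaf(root, rect, rid)
    if path is None:
        return []
    path[-1].entries.remove((rect, rid))
    orphans = []
    for depth in range(len(path) - 1, 0, -1):  # from the leaf up to the root's children
        node, parent = path[depth], path[depth - 1]
        idx = next(k for k, (_, c) in enumerate(parent.entries) if c is node)
        if len(node.entries) < m:
            del parent.entries[idx]  # deficient: drop the node, queue its entries
            level = len(path) - 1 - depth
            orphans.extend((level, e) for e in node.entries)
        else:
            parent.entries[idx] = (cover(node), node)  # shrink the parent rectangle
    return orphans
```

Recording each orphan's level is what lets entries from eliminated internal nodes go back in above the leaves, keeping the tree height balanced.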

Performance Tests

R-trees implemented in C under UNIX on a VAX-11/780 computer, running on 2-D data (1057 rectangles) with 5 page sizes.

Linear node split was faster than quadratic, as expected.
CPU time was unchanged across page sizes, suggesting that once one group became full, all split algorithms simply put the remaining entries in the other group.
Deletion performance is affected by the fill factor.
Search is insensitive to the fill factor and to the split algorithm used.
Storage space is a function of the fill factor, page size and split algorithm.
All split algorithms came within 10% of the best split algorithm, the exhaustive search.

Performance: 2nd Innings

Same configuration, but on data sets of various sizes: 1057, 2238, 3295 and 4559 rectangles.

Low CPU cost, close to 150 microseconds.
Comparable performance of the split algorithms.
Most space was used by the leaf nodes.

Conclusions from the Paper

R-trees perform well for spatial data objects of non-zero size.
With a smaller node size, the structure can be used as an in-memory spatial index.
CPU performance of an in-memory R-tree index is comparable, and there is no I/O cost.

Linear split was almost as good as the other split algorithms, and it was fast.
Its split quality was a bit off-target, but this did not noticeably hurt search performance.

Possible use with abstract data types and abstract indexes to streamline the handling of spatial data.
