Spatial Data Structures for GIS
Jörg-Rüdiger Sack

School of Computer Science, Carleton University Ottawa, Canada K1S 5B6, sack@scs.carleton.ca

© Jörg-Rüdiger Sack School of Computer Science Carleton University

Course Notes Computational Aspects of GIS

Geometric Objects
A geometric object is an object with a geometric component, which characterizes
• the location and
• the shape
of the object in space. In addition, there is an attribute component, which we will ignore in the discussion in this chapter.

Example
Planar subdivisions, for example, are collections of polygons that represent towns or municipal regions. The geometric information about the location of a place is stored through its polygon. (Non-geometric information such as name, size, … is also stored.)

Operations
There are many operations that need to be carried out on geometric objects. These include:
• point in polygon (point location)
• traversal of a subregion (window queries)
• intersection tests
• …
• other operations: distance, containment, intersection

Operations cont’d
1. Objects are stored on disc; examining (i.e., retrieving) all objects is extremely inefficient!
2. Checking each object is time-consuming (even after retrieval), as the geometry may be complex.
Idea: support spatial queries on geometric objects by realizing a filter, i.e., by computing a superset of the solution set and subsequently refining that set to the correct solution.

Filter
This approach is sometimes referred to as coarse filter / fine filter: the coarse filter refers to the retrieval of a subset of adjacent objects, and it is followed by the fine filter, which analyzes the geometric properties of those objects.
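A minimal Python sketch of the coarse filter / fine filter idea for point location, under a hypothetical setup not taken from the notes. The coarse filter keeps only the regions whose bounding box contains the query point. The fine filter then runs an exact ray-crossing point-in-polygon test on those candidates. Region names and coordinates are made up for illustration, and points lying exactly on a polygon edge are not treated specially.

```python
Point = tuple[float, float]

def bounding_box(poly: list[Point]) -> tuple[float, float, float, float]:
    """Smallest axis-parallel rectangle (xmin, ymin, xmax, ymax) containing poly."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)

def in_box(p: Point, box: tuple[float, float, float, float]) -> bool:
    xmin, ymin, xmax, ymax = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

def in_polygon(p: Point, poly: list[Point]) -> bool:
    """Ray-crossing test: count polygon edges crossed by a ray going right from p."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def locate(p: Point, regions: dict[str, list[Point]]) -> tuple[list[str], list[str]]:
    """Return (candidates from the coarse filter, regions that really contain p)."""
    candidates = [name for name, poly in regions.items()
                  if in_box(p, bounding_box(poly))]        # coarse filter
    hits = [name for name in candidates
            if in_polygon(p, regions[name])]               # fine filter
    return candidates, hits

# Hypothetical example data: a triangle and a unit square.
regions = {"A": [(0, 0), (4, 0), (0, 4)],
           "B": [(5, 5), (6, 5), (6, 6), (5, 6)]}
print(locate((3, 3), regions))  # A passes the coarse filter but not the fine filter
```

The point (3, 3) lies inside the bounding box of triangle A but outside A itself. It therefore shows why the coarse filter yields only a superset of the answer.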

The Idea of a Filter
Create a bounding box for 2-d geometric objects.
Bounding box := the smallest axis-parallel rectangle containing the geometric object.
The database search key for the geometric object is now that of its bounding box. There are many data structures for multi-dimensional data. For d-dimensional objects, let Ui be the universe in the ith dimension. Then U = U1 × U2 × … × Ud is the d-dimensional universe containing all geometric objects.

Filter cont’d
Let G be a particular set of geometric objects. Each g ∈ G is described as:
– g.b: the d-dim bounding box
– g.rest: other attributes that are not relevant for the search
g = (b, rest), where b = (l1, r1, l2, r2, …, ld, rd) is the d-dim interval [l1, r1] × … × [ld, rd]; b.li is the left and b.ri the right boundary of the ith interval.
We write g.li for g.b.li and g.ri for g.b.ri.
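The representation g = (b, rest) can be sketched as a small Python class. Several details are illustrative assumptions: the class name `GeoObject`, the helper `bounding_box`, the accessor names `left(i)` / `right(i)` (standing for g.li / g.ri), and storing `rest` as a dictionary. Indices are 1-based, as in the notes.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class GeoObject:
    """g = (b, rest), where b = (l1, r1, ..., ld, rd) is the d-dim bounding box."""
    b: tuple[float, ...]
    rest: dict[str, Any] = field(default_factory=dict)  # ignored by the search

    @property
    def d(self) -> int:
        return len(self.b) // 2

    def left(self, i: int) -> float:
        """g.li: left boundary of the ith interval (1-based)."""
        return self.b[2 * (i - 1)]

    def right(self, i: int) -> float:
        """g.ri: right boundary of the ith interval (1-based)."""
        return self.b[2 * i - 1]

def bounding_box(points: list[tuple[float, ...]]) -> tuple[float, ...]:
    """Smallest axis-parallel box containing the points, as (l1, r1, ..., ld, rd)."""
    b: list[float] = []
    for i in range(len(points[0])):
        coords = [p[i] for p in points]
        b += [min(coords), max(coords)]
    return tuple(b)

# Hypothetical object: the bounding box of three points, plus a name attribute.
g = GeoObject(bounding_box([(1, 5), (4, 2), (3, 7)]), {"name": "town A"})
print(g.b, g.left(2), g.right(2))  # (1, 4, 2, 7) 2 7
```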

Example
[Figure: a 2-d bounding box [l1, r1] × [l2, r2], drawn with axes dim 1 and dim 2.]

Task: find a secondary storage structure S supporting the following operations: (1) Rangequery, (2) Search, (3) Insert, (4) Remove (Delete). These are defined more formally next.

Rangequery
Rangequery (w, S(G)): for a query range w, with G stored in S, report all objects g in G with g.b ∩ w ≠ Ø.
Assumption: two rectangles that intersect only at their boundaries do not intersect, i.e., intersection(A, B) := closure(interior(A) ∩ interior(B)).
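As a reference for what Rangequery must return, here is a brute-force Python sketch that scans all objects. Such a scan is exactly what a spatial structure is meant to avoid. It implements the boundary convention above: two boxes count as intersecting only if, in every dimension, max(left boundaries) < min(right boundaries). The names `intersects` and `range_query` and the example boxes are hypothetical.

```python
Box = tuple[float, ...]  # (l1, r1, ..., ld, rd)

def intersects(a: Box, w: Box) -> bool:
    """True iff the interiors of boxes a and w overlap; touching boundaries do not count."""
    for i in range(0, len(a), 2):
        if max(a[i], w[i]) >= min(a[i + 1], w[i + 1]):
            return False
    return True

def range_query(w: Box, objects: dict[str, Box]) -> list[str]:
    """Rangequery(w, S(G)) by a full scan: report every g with g.b ∩ w ≠ Ø."""
    return [name for name, b in objects.items() if intersects(b, w)]

# Hypothetical 2-d boxes.
objects = {"p": (0, 2, 0, 2), "q": (2, 4, 0, 2), "r": (1, 3, 1, 3)}
print(range_query((2, 5, 0, 5), objects))  # ['q', 'r']; p only touches w at x = 2
```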

Rangequery cont’d
[Figure: seven rectangles, numbered 1 to 7, and a query range w; the range query reports 1, 6, 3, 5.]

Search
Search (b, S(G)): for a bounding box b, with G stored in S, report all objects g in G with g.b = b.

search - example

[Figure: two objects g and g’; the object g (blue) has a bounding box matching the query box.]

Insert
Insert (g, S(G)): S(G) := S(G ∪ {g}), i.e., add g to G and store it in S.

Remove (Delete)
Remove (Delete) (b, S(G)): remove the object g with g.b = b, i.e., S(G) := S(G \ {g}); remove g from G and store the result in S.

1. Uniqueness of the keys is, to some extent, the underlying assumption; however, it does not pose any serious implementation difficulties.
2. For Insert, Search and Remove, the key is spatial, but the spatial location is not referenced. These operations can therefore be handled by traditional secondary storage structures such as B-trees, dynamic hashing, …, e.g., by mapping the 2-d key components into one 1-dimensional key (lexicographic order).
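To illustrate point 2, the Python sketch below uses the standard `bisect` module on a sorted list as a stand-in for a B-tree. Boxes (l1, r1, …, ld, rd) are compared lexicographically, so exact-match Search, Insert and Remove work without any spatial reasoning. `BoxIndex` is a hypothetical name, and duplicate boxes are simply kept side by side, in line with point 1.

```python
import bisect

class BoxIndex:
    """Exact-match Search/Insert/Remove keyed by the bounding box.

    A sorted list stands in for a B-tree: boxes are compared lexicographically,
    i.e., the 2-d (or d-dim) key is mapped to a single 1-d key order.
    """

    def __init__(self) -> None:
        self._keys: list[tuple[float, ...]] = []
        self._objs: list[str] = []

    def insert(self, b: tuple[float, ...], obj: str) -> None:
        i = bisect.bisect_right(self._keys, b)
        self._keys.insert(i, b)
        self._objs.insert(i, obj)

    def search(self, b: tuple[float, ...]) -> list[str]:
        """Report all objects g with g.b == b."""
        lo = bisect.bisect_left(self._keys, b)
        hi = bisect.bisect_right(self._keys, b)
        return self._objs[lo:hi]

    def remove(self, b: tuple[float, ...]) -> bool:
        """Remove one object with g.b == b; return False if there is none."""
        i = bisect.bisect_left(self._keys, b)
        if i < len(self._keys) and self._keys[i] == b:
            del self._keys[i]
            del self._objs[i]
            return True
        return False

# Hypothetical objects g1 and g2.
idx = BoxIndex()
idx.insert((0, 1, 0, 1), "g1")
idx.insert((2, 3, 0, 1), "g2")
print(idx.search((2, 3, 0, 1)))  # ['g2']
```

Lexicographic order keeps boxes with equal l1 together. However, boxes that are close in space can end up far apart in this order.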

Thus searches can be handled!

Problem: queries of type Rangequery.
They are space-relevant, and the above storage schemes show serious deficiencies for them.

Objective
Find data structures for geometric objects such as points, polygons, etc. that allow efficient retrieval.
Primary concern: when accessing data, long chains of pointers that cross disk block boundaries must be avoided!
Game: design data structures with
– a small internal-memory access structure
– efficient dynamic updates

Basic Concepts
Basic concepts for spatial structures: access time. DRAM (dynamic random access memory) chips for personal computers have access times of 50 to 150 nanoseconds (billionths of a second). Fast hard disk drives for personal computers boast access times of about 9 to 15 milliseconds. Note that this is about 200,000 times slower than average DRAM.

Basic Concepts
Actually many machines have even larger ratios than that.

Typical numbers are:
Memory access time (seconds): 10^-7 … 10^-6
Disc access time (seconds): 10^-2 … 10^-1
Ratio disc/memory access time: 10^4 … 10^5

Basic Concepts

Typical size of transfer unit (bits):
Memory: 10 … 10^2
Disc: 10^4 … 10^5
Ratio disc/memory transfer size: 10^2 … 10^3

Basic Concepts
The time for an operation is thus determined by the time to retrieve the data plus the time required to carry out the local computation. For many operations, the number of disc accesses is the dominating factor. However, there are geometric problems where the internal computations are also costly.
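A back-of-the-envelope cost model makes the point concrete. The numbers are hypothetical, picked from the typical ranges given earlier: t_disc = 10^-2 s per disc access and t_mem = 10^-7 s per local step. With these values, a query with 10 disc accesses and 100,000 local steps spends ten times longer waiting for the disc than computing.

```python
def query_time(disc_accesses: int, memory_steps: int,
               t_disc: float = 1e-2, t_mem: float = 1e-7) -> tuple[float, float]:
    """Return (time spent on disc accesses, time spent on local computation) in seconds.

    Hypothetical cost model: each disc access costs t_disc, each local step t_mem.
    """
    return disc_accesses * t_disc, memory_steps * t_mem

disc, cpu = query_time(disc_accesses=10, memory_steps=100_000)
print(f"disc: {disc:.3f} s, local computation: {cpu:.3f} s")
```

This prints `disc: 0.100 s, local computation: 0.010 s`. Saving a single disc access here is worth as much as 100,000 local steps.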

Proximity
Data on discs are organized in BLOCKS. A block is a unit of data that is retrieved from a disc in one shot. A block contains many data items, and these should be useful for the algorithm and its execution. This calls for:
1. local maintenance of proximity, i.e., objects stored in the same block are physically close in space;
2. global maintenance of proximity, i.e., objects stored in adjacent blocks are physically close.
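One common way to aim for both kinds of proximity is to sort objects along a space-filling curve and cut the sorted sequence into blocks. The Python sketch below does this for points on a 4 × 4 grid using z-order (bit interleaving, one of the space-filling curves covered later in these notes). The grid, the block capacity of 4 and the function names are hypothetical choices for illustration.

```python
def z_value(x: int, y: int, bits: int) -> int:
    """Interleave the bits of y and x (y bit first), most significant bits first."""
    z = 0
    for k in range(bits - 1, -1, -1):
        z = (z << 1) | ((y >> k) & 1)
        z = (z << 1) | ((x >> k) & 1)
    return z

def pack_into_blocks(points: list[tuple[int, int]], capacity: int,
                     bits: int) -> list[list[tuple[int, int]]]:
    """Sort points by z-value and cut the sorted sequence into blocks of `capacity`."""
    ordered = sorted(points, key=lambda p: z_value(p[0], p[1], bits))
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]

# Hypothetical data: all 16 points of a 4 x 4 grid, 4 points per disc block.
grid = [(x, y) for x in range(4) for y in range(4)]
blocks = pack_into_blocks(grid, capacity=4, bits=2)
print(blocks[0])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Each block is one 2 × 2 quadrant of the grid, so local proximity holds. Consecutive blocks are mostly neighbouring quadrants, but z-order also jumps (here from the lower-right to the upper-left quadrant). Such jumps show why global proximity is hard to guarantee.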

Proximity
The last point especially is very difficult to achieve. There is no perfect data organization! Even small improvements here yield noticeable accelerations.

Central issue
Organizing the embedding space versus organizing its content. We will discuss data organizations that depend on the data and, mostly, those that depend on the space. This is the key distinction between spatial and non-spatial data structures.

Non-spatial data structures
Data structures for non-spatial data: any search structure that you may have encountered, for example the binary search tree.
• Searches are comparative.
• Structures exist and are readily available, also balanced ones: AVL trees, 2-3 trees, red-black trees.
• They are excellent search structures, also for statistical queries including median, percentiles, …

Non-spatial data structures
Such data structures are not designed for, nor can they efficiently handle:
• general location queries
– nearest neighbour
– identifying clusters in data

1. Hashing
2. Radix trees
3. Tries
These assign the address of a storage cell to any key value x (see course notes).

k-d trees
k-d trees were invented by Bentley (’75) as generalizations of search trees, i.e., they are comparative. Other relevant structures: Lueker ’78, Lee & Wong ’77, Willard ’78, Bentley ’79, Bentley and Maurer ’80.

k-d trees
An example:
[Figure: a k-d tree whose root splits on x : 50 and whose next level splits on y (e.g., y : 15); the splitting dimension cycles through dim 1, dim 2, dim 3, …, dim d, then dim 1, dim 2, … again.]

k-d trees
Problems:
• it is hard to balance these structures, i.e., to obtain logarithmic height (1-d is easy)
• the space partitioning created lacks regularity
• neighbour queries are difficult

First approaches
First approaches to spatial data structures were based on the existing search structures: they organize the data stored (!), not the space in which the data is embedded.

filter illustration for a rectangular space partitioning
[Figure: a query q over a rectangular space partitioning. The query cells hit by q are retrieved and all objects that intersect q are reported; the oval lies in a query cell but does not intersect q, so it is examined and then dropped; objects in cells that are not hit are ignored (not retrieved).]

Comment
Spatial data structures cover the space with cells. Each cell is stored on disc and therefore is associated with a disc block or blocks.

Three-phase model
Three steps:
1. Cell addressing: for a given query, find all “cells” of the partitioning that could contain elements relevant to the query.
2. Coarse filter: retrieve the elements of the cells found in Step 1 from disc.
3. Fine filter: examine the elements retrieved in Step 2 to determine whether they fit the query.
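The three phases can be sketched in Python on a uniform grid; this is a hypothetical choice, as the notes do not fix the partitioning. Objects are discs stored under their bounding boxes. The query window hits four cells, the coarse filter retrieves discs A and C, and the fine filter keeps only A, much like the oval that is examined and then dropped in the filter illustration. Grid size, object names and coordinates are illustrative assumptions.

```python
import math
from collections import defaultdict

G = 4  # hypothetical grid resolution: G x G cells over the unit square

def cell_index(v: float) -> int:
    return min(max(math.floor(v * G), 0), G - 1)

def cells_for_box(x0: float, y0: float, x1: float, y1: float) -> list[tuple[int, int]]:
    """Cells of the partitioning overlapped by the box [x0, x1] x [y0, y1]."""
    return [(i, j)
            for i in range(cell_index(x0), cell_index(x1) + 1)
            for j in range(cell_index(y0), cell_index(y1) + 1)]

# Hypothetical objects: discs (cx, cy, radius), stored under their bounding boxes.
disc_store = {"A": (0.10, 0.10, 0.05), "B": (0.90, 0.90, 0.05), "C": (0.35, 0.35, 0.05)}
grid: dict[tuple[int, int], list[str]] = defaultdict(list)
for name, (cx, cy, r) in disc_store.items():
    for cell in cells_for_box(cx - r, cy - r, cx + r, cy + r):
        grid[cell].append(name)

def window_query(x0: float, y0: float, x1: float, y1: float) -> tuple[set[str], list[str]]:
    # Step 1: cell addressing
    cells = cells_for_box(x0, y0, x1, y1)
    # Step 2: coarse filter, i.e., retrieve every object stored in those cells
    candidates = {name for cell in cells for name in grid[cell]}
    # Step 3: fine filter, i.e., exact test whether the disc meets the window's interior
    hits = []
    for name in sorted(candidates):
        cx, cy, r = disc_store[name]
        dx = max(x0 - cx, 0, cx - x1)  # distance from the centre to the window in x
        dy = max(y0 - cy, 0, cy - y1)
        if dx * dx + dy * dy < r * r:
            hits.append(name)
    return candidates, hits

candidates, hits = window_query(0.0, 0.0, 0.3, 0.3)
print(sorted(candidates), hits)  # ['A', 'C'] ['A']
```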

Tree-based schemes
Work has been done on how internal-memory data structures such as segment trees and range trees can be extended to external storage. This is not covered here, but it could be a good topic for a class presentation.

Three philosophies
1. Space driven:
   1. multi-dimensional linear hashing
   2. space-filling curves
   3. ...
2. Data driven:
   1. k-d-B-trees
   2. …
3. Combinations:
   1. grid file and its variants
   2. Bang file, …

Linear hashing
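Viewed as a spatial data structure, linear hashing partitions the 1-d data space into intervals.
[Figure: the 1-d data space split into successively smaller intervals, with interval addresses 0 to 7.]
Interval sizes are half of the previous ones; the addressing scheme is simple.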

doubling
Doubling typically means adding a bit to the front (or back) of the address string created thus far. For example, in some of the schemes you would see the addresses 0 1 become 00 10 01 11 (the added bit is in front). This means that when you run out of space, a piece of the same size is appended, resulting in a doubling of the space used. However, address calculations are simple!

MOLPHE
Multidimensional Order Preserving Linear Hashing

[Figure: MOLPHE addresses 0 to 7 of the cells of a 2-d data space after three splits.]
Note the alternation of the split dimensions: the 1st split is by x, the 2nd by y, and the 3rd again by x. Note also that each block is split.

z-hashing
Dynamic z-hashing

[Figure: dynamic z-hashing addresses of the eight cells after three splits (x, y, x); bottom row 0 1 4 5, top row 2 3 6 7.]
Note that the addressing function differs from the one given above. The reason is that proximity is better maintained between adjacent blocks.

space-filling curves
The above schemes define a traversal of the space. Other space-filling curves that are typically used include Peano, z-ordering and Hilbert. They have different properties, and studies have been carried out on them.

space-filling curves
[Figure: the Hilbert curve and the Z-order curve (G.M. Morton).]

Z-order
The z-order of a point with coordinates x, y is obtained by bit-wise interleaving of the x and y bits. Example: y = 2 = 010, x = 5 = 101; interleaving (y bit first) gives 011001 = 25.
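A direct Python transcription of the interleaving, with the y bit placed before the x bit as in the example, plus the inverse mapping back to (x, y). The function names `z_order` and `z_decode` are illustrative.

```python
def z_order(x: int, y: int, bits: int) -> int:
    """Interleave the bits of y and x (y bit first), most significant bits first."""
    z = 0
    for k in range(bits - 1, -1, -1):
        z = (z << 1) | ((y >> k) & 1)
        z = (z << 1) | ((x >> k) & 1)
    return z

def z_decode(z: int, bits: int) -> tuple[int, int]:
    """Inverse of z_order: even bit positions belong to x, odd ones to y."""
    x = y = 0
    for k in range(bits):
        x |= ((z >> (2 * k)) & 1) << k
        y |= ((z >> (2 * k + 1)) & 1) << k
    return x, y

print(z_order(5, 2, 3), format(z_order(5, 2, 3), "06b"))  # 25 011001
```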

Z-order
Range queries are possible; slight care needs to be taken to find the successor of a point in z-order.
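A Python sketch of a window query on a z-ordered index. A sorted list stands in for a 1-d disc structure such as a B-tree, and all names are hypothetical. Z-order is monotone in each coordinate, so every point of the window [x0, x1] × [y0, y1] has a z-value between z(x0, y0) and z(x1, y1). The sketch scans that stretch and filters out points that fall outside the window. The "slight care" from the notes is what a real implementation adds on top: computing the successor, i.e., the next z-value inside the window, to skip such gaps instead of scanning them.

```python
import bisect

def z_order(x: int, y: int, bits: int) -> int:
    """Interleave the bits of y and x (y bit first), most significant bits first."""
    z = 0
    for k in range(bits - 1, -1, -1):
        z = (z << 1) | ((y >> k) & 1)
        z = (z << 1) | ((x >> k) & 1)
    return z

def build_index(points: list[tuple[int, int]], bits: int) -> list[tuple[int, tuple[int, int]]]:
    """Sorted list of (z-value, point), standing in for a 1-d disc index."""
    return sorted((z_order(x, y, bits), (x, y)) for x, y in points)

def range_query(index, x0: int, y0: int, x1: int, y1: int, bits: int) -> list[tuple[int, int]]:
    """Report all points in [x0, x1] x [y0, y1] by scanning the z-interval [lo, hi]."""
    lo, hi = z_order(x0, y0, bits), z_order(x1, y1, bits)
    result = []
    for z, (x, y) in index[bisect.bisect_left(index, (lo,)):]:
        if z > hi:
            break
        if x0 <= x <= x1 and y0 <= y <= y1:  # filter points in the gaps
            result.append((x, y))
    return result

# Hypothetical data: all points of an 8 x 8 grid.
pts = [(x, y) for x in range(8) for y in range(8)]
index = build_index(pts, bits=3)
print(sorted(range_query(index, 2, 1, 3, 2, bits=3)))  # [(2, 1), (2, 2), (3, 1), (3, 2)]
```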

Hilbert curve: mapping
Range queries are more natural, but the successor function is more difficult than with z-ordering.

Hilbert curve cont’d

[Figure: direction in which to draw the elements of the Hilbert curve.]

Peano
