
SPATIAL DATA MANAGEMENT

Spatial data management deals with the storage, indexing, and querying of
data with spatial features such as location and geometric extent.

TYPES OF SPATIAL DATA


We can classify spatial data into two types:
• Point data
• Region data

Point Data:
• Points in a multidimensional space.
• A point has a spatial extent characterized completely by its location; it
occupies no area or volume.
• E.g., raster data such as satellite imagery, where each pixel stores a
measured value.
• E.g., feature vectors extracted from text.

Region Data:
• Objects have spatial extent with location and boundary.
• Region data consists of a collection of regions.
• A DB typically uses geometric approximations constructed using line
segments, polygons, etc., called vector data.

TYPES OF SPATIAL QUERIES

Spatial Range Queries:
• Find all cities within 50 miles of Madison.
• The query has an associated region (location, boundary).
• The answer includes overlapping or contained data regions.

Nearest-Neighbor Queries:
• Find the 10 cities nearest to Madison.
• Results must be ordered by proximity.

Spatial Join Queries:
• Find all cities near a lake.
• Expensive: the join condition involves regions and proximity.
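To pin down what each query type returns, here is a minimal Python sketch that answers all three by brute-force scans over point data. The city and lake names, their (x, y) positions in miles, and the function names are all hypothetical; lakes are simplified to points, whereas a real spatial join would compare regions. These linear scans are the baseline that a spatial index is meant to beat.

```python
import math

# Hypothetical data: names and (x, y) positions in miles are made up.
CITIES = {"Madison": (0.0, 0.0), "CityA": (77.0, -5.0),
          "CityB": (30.0, -35.0), "CityC": (10.0, 140.0)}
LAKES = {"LakeX": (1.0, 3.0), "LakeY": (60.0, 80.0)}


def range_query(points, center, radius):
    """Spatial range query: names of all points within `radius` of `center`."""
    return sorted(n for n, p in points.items() if math.dist(p, center) <= radius)


def nearest_neighbors(points, center, k):
    """Nearest-neighbor query: the k closest points, ordered by proximity."""
    return sorted(points, key=lambda n: math.dist(points[n], center))[:k]


def spatial_join(left, right, max_dist):
    """Spatial join: every (left, right) pair at most `max_dist` apart.

    The nested loop compares all pairs, which is why such joins are expensive."""
    return [(a, b) for a, pa in left.items() for b, pb in right.items()
            if math.dist(pa, pb) <= max_dist]
```

With this data, `range_query(CITIES, CITIES["Madison"], 50)` returns `["CityB", "Madison"]`, and `nearest_neighbors(CITIES, CITIES["Madison"], 2)` returns `["Madison", "CityB"]`.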
APPLICATIONS INVOLVED IN SPATIAL DATA MANAGEMENT

Geographic Information Systems (GIS):
• E.g., ESRI’s ArcInfo; OpenGIS Consortium geospatial information.
• All classes of spatial queries and data are common.

Computer-Aided Design/Manufacturing:
• Store spatial objects such as the surface of an airplane fuselage.
• Range queries and spatial join queries are common.

Multimedia Databases:
• Images, video, text, etc. are stored and retrieved by content.
• They are first converted to feature vector form, which has high
dimensionality.
• Nearest-neighbor queries are the most common.

INTRODUCTION TO SPATIAL INDEXES

Single-Dimensional Indexes
• B+ trees are fundamentally single-dimensional indexes.
• When we create a composite search key B+ tree, e.g., an index on
<age, sal>, we effectively linearize the 2-dimensional space, since we sort
entries first by age and then by sal.

Consider entries: <11, 80>, <12, 10>, <12, 20>, <13, 75>

[Figure: the entries on an age (11, 12, 13) by sal (10 to 80) grid, connected in B+ tree order]
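A small Python sketch makes the linearization concrete, assuming Python's tuple ordering as a stand-in for the key order of a B+ tree on <age, sal>. The names `entries`, `btree_order`, and `sal_range_scan` are illustrative only.

```python
# Composite-key entries <age, sal> from the example above, in arbitrary order.
entries = [(13, 75), (12, 20), (11, 80), (12, 10)]

# A B+ tree on <age, sal> keeps entries sorted first by age, then by sal;
# Python's tuple comparison gives exactly that lexicographic order.
btree_order = sorted(entries)


def sal_range_scan(ordered, lo, hi):
    """Evaluate lo <= sal <= hi over entries in <age, sal> order.

    sal is only the second key component, so matching entries are not
    contiguous in this order and every entry has to be examined.
    Returns (matches, number of entries examined)."""
    matches = [(age, sal) for age, sal in ordered if lo <= sal <= hi]
    return matches, len(ordered)
```

Here `btree_order` is `[(11, 80), (12, 10), (12, 20), (13, 75)]`: <11, 80> and <12, 10> are neighbours in the index even though their sal values are far apart, and `sal_range_scan(btree_order, 70, 80)` must look at all four entries to return `[(11, 80), (13, 75)]`.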

Multidimensional Indexes

• A multidimensional index clusters entries so as to exploit “nearness” in
multidimensional space.
• Keeping track of entries and maintaining a balanced index structure
presents a challenge.

Consider entries: <11, 80>, <12, 10>, <12, 20>, <13, 75>

[Figure: the same entries on the age by sal grid, grouped into spatial clusters and contrasted with B+ tree order]

MOTIVATION FOR MULTIDIMENSIONAL INDEXES:

Spatial queries (GIS, CAD):
• Find all hotels within a radius of 5 miles from the conference venue.
• Find the city with population 500,000 or more that is nearest to
Kalamazoo, MI.
• Find all cities that lie on the Nile in Egypt.
• Find all parts that touch the fuselage (in a plane design).

Similarity queries (content-based retrieval):
• Given a face, find the five most similar faces.

Multidimensional range queries:
• 50 < age < 55 AND 80K < sal < 90K

DIFFICULTY IN SPATIAL DATA:

• An index based on spatial location is needed.
• One-dimensional indexes don’t support multidimensional searching
efficiently.
• Hash indexes only support point queries; we want to support range queries
as well.
• The index must support inserts and deletes gracefully.
• Ideally, it should support non-point data as well (e.g., lines, shapes).
• The R-tree meets these requirements, and variants are widely used today.

INDEXING BASED ON SPACE:

Space-filling curves are based on the assumption that any attribute value can
be represented with some fixed number of bits, say k bits.
The maximum number of values along each dimension is therefore 2^k. We
consider a two-dimensional dataset for simplicity, although the approach can
handle any number of dimensions.

REGION QUAD TREES AND Z-ORDERING: REGION DATA

Z-ordering gives a way to group points according to spatial proximity.

The Region Quad tree structure corresponds directly to the recursive
decomposition of the data space. Each node in the tree corresponds to a
square-shaped region of the data space.
As special cases, the root corresponds to the entire data space, and some
leaf nodes correspond to exactly one point. Each internal node has four
children, corresponding to the four quadrants into which the space
corresponding to the node is partitioned: 00 identifies the bottom left
quadrant, 01 identifies the top left quadrant, 10 identifies the bottom right
quadrant, and 11 identifies the top right quadrant.
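Following the quadrant labels above, a point's Z-value can be computed by interleaving the bits of its X and Y coordinates, X bit first at each level. The 2-bit groups of the result are then exactly the quadrant labels on the path from the quad-tree root down to the point's cell. This sketch assumes non-negative integer coordinates of k bits each; `z_value` and `quadrant_path` are hypothetical helper names.

```python
def z_value(x: int, y: int, k: int) -> int:
    """Interleave the k-bit coordinates x and y into a 2k-bit Z-value.

    At each level the X bit comes first and the Y bit second, matching the
    quadrant labels 00 (bottom left), 01 (top left), 10 (bottom right) and
    11 (top right)."""
    if not (0 <= x < 2 ** k and 0 <= y < 2 ** k):
        raise ValueError("coordinates must be non-negative and fit in k bits")
    z = 0
    for i in reversed(range(k)):        # most significant bit first
        z = (z << 1) | ((x >> i) & 1)   # X bit: left (0) or right (1) half
        z = (z << 1) | ((y >> i) & 1)   # Y bit: bottom (0) or top (1) half
    return z


def quadrant_path(x: int, y: int, k: int) -> list:
    """Quadrant labels on the path from the quad-tree root down to cell (x, y)."""
    bits = format(z_value(x, y, k), f"0{2 * k}b")
    return [bits[i:i + 2] for i in range(0, 2 * k, 2)]
```

For example, the point with X = 01 and Y = 11 (k = 2) gets Z-value 0111, and `quadrant_path(1, 3, 2)` returns `["01", "11"]`: the top left quadrant, then its top right sub-quadrant.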

SPATIAL QUERIES USING Z-ORDERING

Range queries can be handled by translating the query into a collection of
regions, each represented by a Z-value. (We saw how to do this in our
discussion of region data and Region Quad trees.) We then search the B+ tree
to find matching data items.
Nearest neighbor queries can also be handled, although they are a little
trickier because distance in the Z-value space does not always correspond
well to distance in the original X-Y coordinate space (recall the diagonal
jumps in the Z-order curve). The basic idea is to first compute the Z-value
of the query and find the data point with the closest Z-value by using the
B+ tree. Then, to make sure we are not overlooking any points that are closer
in the X-Y space, we compute the actual distance r between the query point
and the retrieved data point and issue a range query centered at the query
point and with radius r. We check all retrieved points and return the one
closest to the query point.
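The two query strategies above can be sketched in Python, with a sorted list plus the standard `bisect` module standing in for the B+ tree on Z-values (an assumption made to keep the example self-contained). Two further simplifications: the range query handles only axis-aligned boxes, scanning the single Z-value interval from z(xlo, ylo) to z(xhi, yhi) and filtering out false positives instead of decomposing the box into quad-tree regions; and the radius-r circle of the nearest-neighbor step is replaced by its bounding square, with exact distances checked afterwards. `ZIndex`, `box_query`, and `nearest` are illustrative names.

```python
import bisect
import math


def z_value(x, y, k):
    """Interleave k-bit coordinates, X bit first at each level."""
    z = 0
    for i in reversed(range(k)):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


class ZIndex:
    """Points kept sorted by Z-value; the sorted list stands in for a B+ tree."""

    def __init__(self, points, k):
        self.k = k
        self.entries = sorted((z_value(x, y, k), (x, y)) for x, y in points)
        self.keys = [z for z, _ in self.entries]

    def box_query(self, xlo, ylo, xhi, yhi):
        """All points with xlo <= x <= xhi and ylo <= y <= yhi.

        Interleaving is monotone in each coordinate, so every point in the box
        has a Z-value between z(xlo, ylo) and z(xhi, yhi); scan that key range
        and drop the false positives that lie outside the box."""
        lo = bisect.bisect_left(self.keys, z_value(xlo, ylo, self.k))
        hi = bisect.bisect_right(self.keys, z_value(xhi, yhi, self.k))
        return [p for _, p in self.entries[lo:hi]
                if xlo <= p[0] <= xhi and ylo <= p[1] <= yhi]

    def nearest(self, qx, qy):
        """Nearest neighbor of the grid point (qx, qy), or None if the index is empty."""
        if not self.entries:
            return None
        # Step 1: the data point with the closest Z-value.
        zq = z_value(qx, qy, self.k)
        i = bisect.bisect_left(self.keys, zq)
        j = min((c for c in (i - 1, i) if 0 <= c < len(self.keys)),
                key=lambda c: abs(self.keys[c] - zq))
        r = math.dist((qx, qy), self.entries[j][1])
        # Step 2: range query of radius r (via its bounding square), exact check.
        top = 2 ** self.k - 1
        box = self.box_query(max(0, math.floor(qx - r)), max(0, math.floor(qy - r)),
                             min(top, math.ceil(qx + r)), min(top, math.ceil(qy + r)))
        return min(box, key=lambda p: math.dist((qx, qy), p))
```

With the points (0, 0), (3, 4), (4, 3), (5, 5), (7, 7) and k = 3, a query at (4, 4) first lands on (5, 5), the point with the closest Z-value; the follow-up range query then finds (3, 4) and (4, 3), which are nearer in X-Y space, and `nearest(4, 4)` returns (3, 4). This is the mismatch between Z-value distance and spatial distance described above.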

GRID FILE

• A grid file or bucket grid is a point access method which splits a space
into a non-periodic grid where one or more cells of the grid refer to a small
set of points.
• Grid files (a symmetric data structure) provide an efficient method of
storing such indexes on disk to perform complex data lookups.
• A grid file is usually used in cases where a single value can be referenced
by multiple keys.
• Grid files came into use because "traditional file structures that provide
multikey access to records, for example, inverted files, are extensions of
file structures originally designed for single-key access. They manifest
various deficiencies in particular for multikey access to highly dynamic
files." [1]
• In a traditional single-dimensional data structure (e.g., a hash), a search
on a single criterion is usually very simple, but searching on a second
criterion can be much more complex.
• Grid files represent a special kind of hashing, where the traditional hash
is replaced by a grid directory.
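A toy two-dimensional grid file in Python illustrates the pieces named above: one linear scale per dimension (the split points), a grid directory mapping each cell to a bucket, and the buckets themselves. This is a sketch under simplifying assumptions: the scales are fixed when the file is created, so the dynamic splitting and merging that let real grid files adapt to the data are left out, and each cell gets its own bucket even though a real grid file lets several cells share one. The class and method names are hypothetical.

```python
import bisect


class GridFile:
    """Toy 2-D grid file: two linear scales, a grid directory, and buckets."""

    def __init__(self, x_splits, y_splits):
        self.x_splits = sorted(x_splits)   # linear scale for the first key
        self.y_splits = sorted(y_splits)   # linear scale for the second key
        nx, ny = len(self.x_splits) + 1, len(self.y_splits) + 1
        # Grid directory: cell (i, j) -> bucket id. Here every cell has its
        # own bucket; a real grid file lets adjacent cells share one.
        self.directory = {(i, j): i * ny + j for i in range(nx) for j in range(ny)}
        self.buckets = {b: [] for b in self.directory.values()}

    def _cell(self, x, y):
        """Locate the cell by binary search on each (in-memory) scale."""
        return (bisect.bisect_right(self.x_splits, x),
                bisect.bisect_right(self.y_splits, y))

    def insert(self, x, y, record):
        self.buckets[self.directory[self._cell(x, y)]].append((x, y, record))

    def range_query(self, xlo, xhi, ylo, yhi):
        """Records with xlo <= x <= xhi and ylo <= y <= yhi."""
        (i0, j0), (i1, j1) = self._cell(xlo, ylo), self._cell(xhi, yhi)
        seen, out = set(), []
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                b = self.directory[(i, j)]
                if b in seen:          # read each bucket once, even if shared
                    continue
                seen.add(b)
                out.extend(rec for x, y, rec in self.buckets[b]
                           if xlo <= x <= xhi and ylo <= y <= yhi)
        return out
```

For example, with age split at 30 and 50 and sal split at 40,000 and 80,000, the query 50 ≤ age ≤ 55 AND 80,000 ≤ sal ≤ 90,000 touches a single cell, so only one bucket is read. An exact-match lookup costs one directory access plus one bucket access, which is where the two-disk-access bound comes from when the directory and the buckets live on disk.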

Advantages

Since a single entry in the grid file contains pointers to all records
indexed by the specified keys:
• No special computations are required.
• Only the right records are retrieved.
• Can also be used for single-search-key queries.
• Easy to extend to queries on n search keys.
• Significant improvement in processing time for multiple-key queries.
• Has a two-disk-access upper bound for exact-match lookups.

Disadvantages
• Imposes space overhead.
• Performance overhead on insertion and deletion.

Adapting Grid Files to Handle Regions
There are two basic approaches to handling region data in a grid file,
neither of which is satisfactory.
First, we can represent a region by a point in a higher-dimensional space;
for example, a box in two dimensions can be represented as a
four-dimensional point by storing two diagonal corner points of the box.
The second approach is to store a record representing the region object in
each grid partition that overlaps the region object.
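Both approaches can be sketched in a few lines of Python. `region_as_point` normalizes two diagonal corners into a four-dimensional point (xlo, ylo, xhi, yhi), and `overlapping_cells` lists the grid cells a box overlaps under fixed linear scales, i.e., the cells in which the second approach would store a copy of the region's record. Both function names and the fixed-scale setup are assumptions for illustration.

```python
import bisect


def region_as_point(corner_a, corner_b):
    """Approach 1: a 2-D box given by two diagonal corners becomes one
    4-D point (xlo, ylo, xhi, yhi)."""
    (ax, ay), (bx, by) = corner_a, corner_b
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


def overlapping_cells(box, x_splits, y_splits):
    """Approach 2: every grid cell the box overlaps; the region's record
    would be stored once in each of them."""
    xlo, ylo, xhi, yhi = box
    i0, i1 = bisect.bisect_right(x_splits, xlo), bisect.bisect_right(x_splits, xhi)
    j0, j1 = bisect.bisect_right(y_splits, ylo), bisect.bisect_right(y_splits, yhi)
    return [(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]
```

A box from (10, 10) to (60, 30), with X split at 25 and 50 and Y split at 20 and 40, overlaps six cells, so its record would be stored six times; this illustrates the duplication the second approach incurs.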
R-TREES: POINT AND REGION DATA

• R-trees can organize data of any dimensionality by representing the data by
a minimum bounding box.
• Each node bounds its children. A node can have many objects in it.
• The leaves point to the actual objects (probably stored on disk).
• The tree is height-balanced, so its height is O(log n) for n objects.

[Figure: R-tree example]

Operations
• Searching: look at all nodes whose bounding boxes intersect the query, then
recurse into those nodes. Many paths may lead nowhere.
• Insertion: locate the place to insert the new entry by searching, and
insert it.
• If a node is full, a split needs to be done.
• Deletion: if a node becomes underfull, remove it and reinsert its remaining
entries to maintain balance.

Queries
The generalized search tree (GiST) abstracts the essential features of tree
index structures and provides 'template' algorithms for insertion, deletion, and
searching.
Searches for region objects and range queries are handled similarly, by
computing a bounding box for the desired region and proceeding as in the
search for an object.
R-Tree Search Algorithm

• The search algorithm descends the tree from the root in a manner similar to
a B-tree.
• Goal: given an R-tree with root node T, find all index entries whose
rectangles overlap a search rectangle S. (E.I denotes an entry's bounding
rectangle and E.p its pointer to a child node.)
• If T is not a leaf, check each entry E to determine whether E.I overlaps S.
For all overlapping entries, invoke Search on the tree whose root node is
pointed to by E.p.
• If T is a leaf, check all entries E to determine whether E.I overlaps S. If
so, E is a qualifying record.
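The search procedure above translates almost line for line into Python. In this sketch (not a production R-tree), each entry carries its rectangle `mbr` (E.I in the pseudocode) and either a `child` pointer (E.p) for internal entries or a `record` for leaf entries; rectangles are axis-aligned 2-D boxes given as (xlo, ylo, xhi, yhi), and all class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rect:
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def overlaps(self, other):
        return (self.xlo <= other.xhi and other.xlo <= self.xhi and
                self.ylo <= other.yhi and other.ylo <= self.yhi)


@dataclass
class Entry:
    mbr: Rect                          # E.I: bounding rectangle of the entry
    child: Optional["Node"] = None     # E.p: pointer to a subtree (internal entries)
    record: object = None              # record id (leaf entries)


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)


def search(node, s):
    """Return the records of all leaf entries whose rectangle overlaps s."""
    if not node.leaf:
        found = []
        for e in node.entries:
            if e.mbr.overlaps(s):               # skip subtrees that cannot match
                found.extend(search(e.child, s))
        return found
    return [e.record for e in node.entries if e.mbr.overlaps(s)]
```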

R-Tree Insert Algorithm

• Similar to insertion in a B-tree.
• The new entry is added to the appropriate leaf.
• The appropriate leaf is found using minimum bounding rectangles (MBRs):
starting at the root, descend into a child whose MBR already contains the new
entry, or else the one whose MBR needs the least enlargement, until a leaf is
reached.
• If the leaf overflows, split it and propagate the split up the tree, as in a
B-tree.
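A compact Python sketch of insertion, assuming a hypothetical node capacity of 4 entries. It descends by least MBR enlargement (ties broken by smaller area), appends the entry to the chosen leaf, splits any node that overflows, and propagates splits upward; if the root splits, a new root is created, so the tree stays height-balanced as in a B-tree. The split rule here, sorting entries by the x-centre of their rectangles and cutting the list in half, is a deliberate simplification and not Guttman's linear or quadratic split. Entries are stored as (Rect, child-or-record) pairs, and all names are illustrative.

```python
from dataclasses import dataclass, field

MAX_ENTRIES = 4  # hypothetical node capacity M


@dataclass
class Rect:
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def area(self):
        return (self.xhi - self.xlo) * (self.yhi - self.ylo)

    def union(self, other):
        return Rect(min(self.xlo, other.xlo), min(self.ylo, other.ylo),
                    max(self.xhi, other.xhi), max(self.yhi, other.yhi))


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # (Rect, child Node or record) pairs


def mbr(node):
    """Minimum bounding rectangle of all entries in a non-empty node."""
    box = node.entries[0][0]
    for rect, _ in node.entries[1:]:
        box = box.union(rect)
    return box


def split(node):
    """Simplified split: sort by x-centre, keep the first half, return the rest as a sibling."""
    node.entries.sort(key=lambda e: e[0].xlo + e[0].xhi)
    half = len(node.entries) // 2
    sibling = Node(node.leaf, node.entries[half:])
    node.entries = node.entries[:half]
    return sibling


def _insert(node, rect, record):
    """Insert below node; return the new sibling if node split, else None."""
    if node.leaf:
        node.entries.append((rect, record))
    else:
        # Choose the child whose MBR grows least (ties: smaller area).
        def cost(j):
            box = node.entries[j][0]
            return (box.union(rect).area() - box.area(), box.area())
        j = min(range(len(node.entries)), key=cost)
        child = node.entries[j][1]
        sibling = _insert(child, rect, record)
        node.entries[j] = (mbr(child), child)             # update the covering MBR
        if sibling is not None:
            node.entries.append((mbr(sibling), sibling))  # propagate the split upward
    return split(node) if len(node.entries) > MAX_ENTRIES else None


def insert(root, rect, record):
    """Insert a record and return the root, which is new if the old root split."""
    sibling = _insert(root, rect, record)
    if sibling is None:
        return root
    return Node(False, [(mbr(root), root), (mbr(sibling), sibling)])


def records(node, depth=0):
    """Yield (depth, record) for every leaf entry below node."""
    for _, item in node.entries:
        if node.leaf:
            yield depth, item
        else:
            yield from records(item, depth + 1)
```

Inserting twenty unit squares into an empty leaf grows the tree through root splits; `records` then reports every stored record, all at the same depth.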

R-Tree Delete Algorithm

• Find the leaf node that contains the entry E.
• Stop if the record was not found.
• If the record was found, remove it from the leaf and propagate the changes
up the tree, as in a B-tree.
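Deletion can be sketched in the same style, again with hypothetical names and a hypothetical minimum fill of 2 entries per node. The entry is located by descending only into subtrees whose MBR contains its rectangle; after removal, each ancestor's MBR is tightened, and any node left underfull is detached from its parent. The detached node's stored (rect, record) pairs are returned to the caller, which is expected to reinsert them with the normal insertion routine (not repeated here). Two simplifications are worth flagging: the sketch flattens a detached subtree to its leaf entries, whereas Guttman's CondenseTree reinserts entries of eliminated internal nodes at their original level; and it returns an empty leaf if the root is left with no entries.

```python
from dataclasses import dataclass, field

MIN_ENTRIES = 2  # hypothetical minimum fill m


@dataclass(frozen=True)
class Rect:
    xlo: float
    ylo: float
    xhi: float
    yhi: float

    def contains(self, other):
        return (self.xlo <= other.xlo and self.ylo <= other.ylo and
                other.xhi <= self.xhi and other.yhi <= self.yhi)

    def union(self, other):
        return Rect(min(self.xlo, other.xlo), min(self.ylo, other.ylo),
                    max(self.xhi, other.xhi), max(self.yhi, other.yhi))


@dataclass
class Node:
    leaf: bool
    entries: list = field(default_factory=list)  # (Rect, child Node or record) pairs


def mbr(node):
    box = node.entries[0][0]
    for rect, _ in node.entries[1:]:
        box = box.union(rect)
    return box


def leaf_entries(node):
    """All (rect, record) pairs stored in the subtree rooted at node."""
    if node.leaf:
        return list(node.entries)
    return [pair for _, child in node.entries for pair in leaf_entries(child)]


def _delete(node, rect, record, orphans):
    """Remove (rect, record) from the subtree; return True if it was found."""
    if node.leaf:
        if (rect, record) in node.entries:
            node.entries.remove((rect, record))
            return True
        return False
    for j, (box, child) in enumerate(node.entries):
        # Only subtrees whose MBR contains the rectangle can hold the entry.
        if box.contains(rect) and _delete(child, rect, record, orphans):
            if len(child.entries) < MIN_ENTRIES:
                orphans.extend(leaf_entries(child))    # detach the underfull child
                del node.entries[j]
            else:
                node.entries[j] = (mbr(child), child)  # tighten the covering MBR
            return True
    return False


def delete(root, rect, record):
    """Return (root, orphans, found); the caller reinserts the orphans."""
    orphans = []
    found = _delete(root, rect, record, orphans)
    while not root.leaf and len(root.entries) == 1:
        root = root.entries[0][1]          # a root with one child is replaced by it
    if not root.leaf and not root.entries:
        root = Node(True)                  # everything below the root was detached
    return root, orphans, found
```

Deleting "a" from a two-leaf tree whose first leaf holds "a" and "b" leaves that leaf with one entry, below the minimum, so the leaf is detached, "b" comes back as an orphan to reinsert, and the root, now with a single child, is replaced by that child.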
