Chapter 1: Introduction to Spatial Databases
1.1 Overview
1.2 Application domains
1.3 Compare a SDBMS with a GIS
1.4 Categories of Users
1.5 An example of an SDBMS application
1.6 A Stroll through a spatial database
1.6.1 Data Models, 1.6.2 Query Language, 1.6.3 Query Processing,
1.6.4 File Organization and Indices, 1.6.5 Query Optimization,
1.6.6 Data Mining
www.spatial.cs.umn.edu/Book/slides/ch1revised.ppt
Learning Objectives
Learning Objectives (LO)
LO1 : Understand the value of SDBMS
• Application domains
• users
• How is it different from a DBMS?
LO2: Understand the concept of spatial databases
LO3: Learn about the Components of SDBMS
Exercise: List two ways you have used spatial data. Which
software did you use to manipulate spatial data?
Learning Objectives
Learning Objectives (LO)
LO1 : Understand the value of SDBMS
LO2: Understand the concept of spatial databases
• What is a SDBMS?
• How is it different from a GIS?
LO3: Learn about the Components of SDBMS
Sections for LO2
Section 1.5 provides an example SDBMS
Sections 1.1 and 1.3 compare SDBMS with DBMS and GIS
What is a SDBMS ?
A SDBMS is a software module that
can work with an underlying DBMS
supports spatial data models, spatial abstract data types
(ADTs) and a query language from which these ADTs are
callable
supports spatial indexing, efficient algorithms for
processing spatial operations, and domain specific rules
for query optimization
Example: Oracle Spatial data cartridge, ESRI SDE
can work with Oracle 8i DBMS
Has spatial data types (e.g. polygon), operations (e.g.
overlap) callable from SQL3 query language
Has spatial indices, e.g. R-trees
SDBMS Example
Consider a spatial dataset with:
County boundary (dashed white line)
Census block - name, area,
population, boundary (dark line)
Water bodies (dark polygons)
Satellite Imagery (gray scale pixels)
Fig 1.2
Modeling Spatial Data in Traditional DBMS
Figure 1.3
Spatial Data Types and Traditional Databases
Traditional relational DBMS
Support simple data types, e.g. number, strings, date
Modeling Spatial data types is tedious
Example: Figure 1.4 shows modeling of polygon using numbers
Three new tables: polygon, edge, points
• Note: Polygon is a polyline where last point and first point are same
A simple unit square represented as 16 rows across 3 tables
Simple spatial operators, e.g. area(), require joining tables
Tedious and computationally inefficient
Fig 1.4
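The cost of the three-table encoding can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the exact layout of Fig 1.4: the table and column names, the boundary identifier 'A1', and the extra `seq` column (marking the start and end point of each edge) are assumptions made so that the area can be computed with the shoelace formula.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE polygon (boundary_id TEXT, edge_name TEXT);
CREATE TABLE edge    (edge_name TEXT, seq INTEGER, endpoint TEXT);  -- seq: 1 = start, 2 = end (assumed)
CREATE TABLE point   (endpoint TEXT, x REAL, y REAL);
""")
edges = ("AB", "BC", "CD", "DA")
conn.executemany("INSERT INTO polygon VALUES (?, ?)", [("A1", e) for e in edges])
conn.executemany("INSERT INTO edge VALUES (?, ?, ?)",
                 [(e, i + 1, p) for e in edges for i, p in enumerate(e)])
conn.executemany("INSERT INTO point VALUES (?, ?, ?)",
                 [("A", 0, 0), ("B", 0, 1), ("C", 1, 1), ("D", 1, 0)])

# 4 polygon rows + 8 edge rows + 4 point rows describe one unit square.
total_rows = sum(conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
                 for t in ("polygon", "edge", "point"))

# area() needs a five-way join: polygon -> edge (twice) -> point (twice).
area = conn.execute("""
SELECT 0.5 * ABS(SUM(p1.x * p2.y - p2.x * p1.y))
FROM polygon pg
JOIN edge e1  ON e1.edge_name = pg.edge_name AND e1.seq = 1
JOIN edge e2  ON e2.edge_name = pg.edge_name AND e2.seq = 2
JOIN point p1 ON p1.endpoint = e1.endpoint
JOIN point p2 ON p2.endpoint = e2.endpoint
WHERE pg.boundary_id = 'A1'
""").fetchone()[0]
print(total_rows, area)
```

With a polygon ADT the same question would be a single call such as Area() on one column value, which is the motivation for the post-relational approach.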
Evolution of DBMS technology
Fig 1.5
Spatial Data Types and Post-relational Databases
Post-relational DBMS
Support user defined abstract data types
Spatial data types (e.g. polygon) can be added
Choice of post-relational DBMS
Object oriented (OO) DBMS
Object relational (OR) DBMS
A spatial database is a collection of spatial data types,
operators, indices, processing strategies, etc. and can work
with many post-relational DBMS as well as programming
languages like Java, Visual Basic etc.
How is a SDBMS different from a GIS ?
GIS is software to visualize and analyze spatial data
using spatial analysis functions such as
Search: thematic search, search by region, (re-)classification
Location analysis: buffer, corridor, overlay
Terrain analysis: slope/aspect, catchment, drainage network
Flow analysis: connectivity, shortest path
Distribution: change detection, proximity, nearest neighbor
Spatial analysis/statistics: pattern, centrality, autocorrelation,
indices of similarity, topology: hole description
Measurements: distance, perimeter, shape, adjacency, direction
GIS uses SDBMS
to store, search, query, share large spatial data sets
How is a SDBMS different from a GIS ?
SDBMS focuses on
Efficient storage, querying, sharing of large spatial datasets
Provides simpler set based query operations
Example operations: search by region, overlay, nearest neighbor,
distance, adjacency, perimeter etc.
Uses spatial indices and query optimization to speed up queries over
large spatial datasets.
SDBMS may be used by applications other than GIS
Astronomy, Genomics, Multimedia information systems, ...
Would one use a GIS or an SDBMS to answer the following:
How many neighboring countries does USA have?
Which country has highest number of neighbors?
Evolution of acronym “GIS”
Geographic Information Systems (1980s)
Geographic Information Science (1990s)
Geographic Information Services (2000s)
Fig 1.1
Three meanings of the acronym GIS
Geographic Information Services
Web-sites and service centers for casual users, e.g. travelers
Example: Service (e.g. AAA, mapquest) for route planning
Geographic Information Systems
Software for professional users, e.g. cartographers
Example: ESRI Arc/View software
Geographic Information Science
Concepts, frameworks, theories to formalize use and
development of geographic information systems and services
Example: design spatial data types and operations for querying
Exercise: Which meaning of the term GIS is closest to the focus of
the book titled “Spatial Databases: A Tour”?
Learning Objectives
Learning Objectives (LO)
LO1 : Understand the value of SDBMS
LO2: Understand the concept of spatial databases
LO3: Learn about the Components of SDBMS
• Architecture choices
• SDBMS components:
– data model, query languages,
– query processing and optimization
– File organization and indices
– Data Mining
Chapter Sections
1.5 second half
1.6 – entire section
Components of a SDBMS
Recall: a SDBMS is a software module that
can work with an underlying DBMS
supports spatial data models, spatial ADTs and a query
language from which these ADTs are callable
supports spatial indexing, algorithms for processing
spatial operations, and domain specific rules for query
optimization
Components include
spatial data model, query language, query processing,
file organization and indices, query optimization, etc.
Figure 1.6 shows these components
We discuss each component briefly in Section 1.6 and in
more detail in later chapters.
Three Layer Architecture Fig 1.6
1.6.1 Spatial Taxonomy, Data Models
Spatial Taxonomy:
multitude of descriptions available to organize space.
Topology models homeomorphic relationships, e.g. overlap
Euclidean space models distance and direction in a plane
Graphs model connectivity, e.g. shortest path
Spatial data models
rules to identify objects and properties of space
Object model helps manage identifiable things, e.g. mountains,
cities, land-parcels etc.
Field model helps manage continuous and amorphous
phenomena, e.g. wetlands, satellite imagery, snowfall etc.
More details in chapter 2.
1.6.2 Spatial Query Language
• Spatial query language
• Spatial data types, e.g. point, linestring, polygon, …
• Spatial operations, e.g. overlap, distance, nearest
neighbor, …
• Callable from a query language (e.g. SQL3) of
underlying DBMS
SELECT S.name
FROM Senator S
WHERE S.district.Area() > 300
• Standards
• SQL3 (a.k.a. SQL 1999) is a standard for query
languages
• OGIS is a standard for spatial data types and operators
• Both standards enjoy wide support in industry
• More details in chapters 2 and 3
Multi-scan Query Example
• Spatial join example
SELECT S.name FROM Senator S, Business B
WHERE S.district.Area() > 300 AND Within(B.location, S.district)
• Non-Spatial Join example
SELECT S.name FROM Senator S, Business B
WHERE S.soc-sec = B.soc-sec AND S.gender = 'Female'
Fig 1.7
1.6.3 Query Processing
• Efficient algorithms to answer spatial queries
• Common Strategy - filter and refine
• Filter Step: Query Region overlaps with MBRs of B, C and D
• Refine Step: Query Region overlaps with B and C
Fig 1.8
Query Processing of Join Queries
•Example - Determining pairs of intersecting rectangles
• (a): Two sets R and S of rectangles, (b): A rectangle with 2 opposite corners
marked, (c): Rectangles sorted by smallest X coordinate value
• Plane sweep filter identifies 5 pairs out of 12 for refinement step
•Details of plane sweep algorithm on page 15
Fig 1.9
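A plane sweep over two sets of rectangles can be sketched as follows. This is a simplified illustration rather than the procedure printed in the book: rectangles are assumed to be closed (xmin, ymin, xmax, ymax) tuples, and the list of active rectangles is pruned with a plain scan instead of a more elaborate structure.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def plane_sweep_pairs(R: List[Rect], S: List[Rect]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) such that R[i] and S[j] intersect."""
    sets = (R, S)
    # One event per rectangle, processed in order of smallest x coordinate.
    events = sorted([(r[0], 0, i) for i, r in enumerate(R)] +
                    [(s[0], 1, j) for j, s in enumerate(S)])
    active = ([], [])  # indices of R and S rectangles the sweep line may still cut
    pairs = []
    for xmin, side, idx in events:
        rect = sets[side][idx]
        other = 1 - side
        # Drop rectangles of the other set that end left of the sweep line.
        active[other][:] = [k for k in active[other] if sets[other][k][2] >= xmin]
        for k in active[other]:
            o = sets[other][k]
            if rect[1] <= o[3] and o[1] <= rect[3]:  # y-intervals overlap
                pairs.append((idx, k) if side == 0 else (k, idx))
        active[side].append(idx)
    return sorted(pairs)

def brute_force_pairs(R: List[Rect], S: List[Rect]) -> List[Tuple[int, int]]:
    """Reference answer: test every pair."""
    return sorted((i, j) for i, r in enumerate(R) for j, s in enumerate(S)
                  if r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3])

R = [(0, 0, 2, 2), (3, 0, 5, 2)]
S = [(1, 1, 4, 3), (6, 0, 7, 1)]
print(plane_sweep_pairs(R, S))
```

Because the rectangles are sorted by xmin, each rectangle is compared only against rectangles of the other set that the sweep line still crosses, rather than against the whole set.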
1.6.4 File Organization and Indices
• A difference between GIS and SDBMS assumptions
•GIS algorithms: dataset is loaded in main memory (Fig. 1.10(a))
•SDBMS: dataset is on secondary storage, e.g. disk (Fig. 1.10(b))
•SDBMS uses space filling curves and spatial indices
•to efficiently search disk resident large spatial datasets
Fig 1.10
Organizing spatial data with space filling curves
•Issue:
•Sorting is not naturally defined on spatial data
•Many efficient search methods are based on sorting datasets
•Space filling curves
•Impose an ordering on the locations in a multi-dimensional space
•Examples: row-order (Fig. 1.11(a)), z-order (Fig. 1.11(b))
• Allow use of traditional efficient search methods on spatial data
Fig 1.11
Spatial Indexing: Search Data-Structures
•Choice for spatial indexing:
•B-tree is a hierarchical collection of ranges of linear keys, e.g. numbers
•B-tree index is used for efficient search of traditional data
•B-tree can be used with space filling curve on spatial data
•R-tree provides better search performance yet!
•R-tree is a hierarchical collection of rectangles
•More details in chapter 4
• Examples:
•A GIS organizes a spatial dataset as a set of layers
•Databases organize a dataset as a collection of tables
Why Data Models?
• Data models facilitate
• Early analysis of properties, e.g. storage cost, querying ability, ...
• Reuse of shared data among multiple applications
• Exchange of data across organization
• Conversion of data to new software / environment
• Example: the Y2K crisis for year 2000
Many computer software systems were developed without well-defined data
models in the 1960s and 1970s. These systems used a variety of data models for
representing time and date. Some of the representations used two digits to
represent years. In the late 1990s, people worried that the two-digit representation of
years might lead to erroneous behavior. For example, the age of a person born in
1960 (represented as 60) in year 2000 (represented as 00) may appear
negative and may be flagged as an illegal data item. A large amount of effort and
resources (hundreds of billions of dollars) was spent revising the software.
Proper use of data models might have significantly reduced the costs. If time and
date had been modeled as abstract data types, only the small portion of
the software implementing the date ADT would have needed to be reviewed and revised.
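The point about abstract data types can be made concrete with a small sketch. The class `Year` and the function `age` are hypothetical names chosen for illustration; the idea is only that callers use the ADT's interface, so a change of internal representation stays inside the class.

```python
class Year:
    """A tiny date ADT: callers never see how the year is stored."""

    def __init__(self, value: int):
        self._yy = value % 100        # legacy two-digit storage
        self._century = value // 100  # the Y2K fix is confined to this class

    def value(self) -> int:
        return self._century * 100 + self._yy


def age(born: Year, today: Year) -> int:
    # Application code depends only on Year.value(), not on the stored digits.
    return today.value() - born.value()


naive_age = 0 - 60                          # two-digit arithmetic: 00 minus 60
adt_age = age(Year(1960), Year(2000))
print(naive_age, adt_age)
```

Code that subtracts raw two-digit fields would have to be found and fixed everywhere it occurs, whereas here only `Year` changes.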
Types of Data Models
•Two Types of data models
•Generic data models
•Developed for business data processing
•Support simple abstract data types (ADTs), e.g. numbers, strings, date
•Not convenient for spatial ADTs, e.g. polygons
•Recall a polygon becomes dozens of rows in 3 tables (Fig. 1.4, p. 8)
•Need to extend with spatial concepts, e.g. ADTs
•Application Domain specific, e.g. spatial models
•Set of concepts developed in Geographic Info. Science
•Common spatial ADTs across different GIS applications
•Plan of Study
•First study concepts in spatial models
•Then study generic model
•Finally put the two together
Learning Objectives
• Learning Objectives (LO)
• LO1: Understand concept of data models
• LO2 : Understand the models of spatial information
• Field based model
• Object based model
• LO3: Understand the 3-step design of databases
• LO4: Learn about the trends in spatial data models
[Figure: the interior of A is shown in green; the boundary of A is shown in red]
Concept | Symbol
Entities
Attributes
Multi-valued Attributes
Relationships
ER Diagram for “State-Park”
Fig 2.4
•Exercise:
•List the entities, attributes, relationships in this ER diagram
•Identify cardinality constraint for each relationship.
•How many roads “Accesses” a “Forest_stand”? (one or many)
2.2.2 Logical Data Model: The Relational Model
• Relational model is based on set theory
• Main concepts
• Domain: a set of values for a simple attribute
• Relation: a subset of the cross-product of a set of domains
• Represents a table, i.e. homogeneous collection of rows (tuples)
• The set of columns (i.e. attributes) are same for each row
• Comparison to concepts in conceptual data model
• Relations are similar to but not identical to entities
• Domains are similar to attributes
• Translation rules establishing exact correspondence are discussed in 2.2.3
Relational Schema
• Schema of a Relation
• Enumerates columns, identifies primary key and foreign keys.
• Primary Key :
• one or more attributes uniquely identify each row within a table
• Foreign keys
• R’s attributes which form primary key of another relation S
• The value of a foreign key in any tuple of R matches the primary key value in some row of S
• Relational schema of a database
• collection of schemas of all relations in the database
• Example: Figure 2.5 (next slide)
• A blueprint summary drawing of the database table structures
• Allows analysis of storage costs, data redundancy, querying capabilities
• Some databases were designed as relational schema in 1980s
• Nowadays, databases are designed as ER models and the relational schema is
generated via CASE tools
Relational Schema Example
•Exercise:
•Identify relations with
•primary keys
•foreign keys
•other attributes
•Compare with ER diagram
•Figure 2.4, p. 37
Fig 2.5
Relational Schema for “Point”, “Line”, “Polygon” and “Elevation”
Fig 2.5
More on Relational Model
• Integrity Constraints
• Key: Every relation has a primary key.
• Entity Integrity: Value of primary key in a row is never undefined
• Referential Integrity: Value of a foreign key attribute must appear as a value
in the primary key of another relation or must be null.
Fig 2.7
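The three integrity constraints can be observed in a small experiment with Python's built-in sqlite3 module. The two-table schema below is an illustrative assumption (a cut-down Country and City pair), not the schema of Fig 2.5. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Country (Name TEXT PRIMARY KEY NOT NULL);
CREATE TABLE City (Name TEXT PRIMARY KEY NOT NULL,
                   Country TEXT REFERENCES Country(Name));
""")
conn.execute("INSERT INTO Country VALUES ('Canada')")
conn.execute("INSERT INTO City VALUES ('Ottawa', 'Canada')")

def rejected(sql: str, params: tuple) -> bool:
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True

insert_city = "INSERT INTO City VALUES (?, ?)"
dup = rejected(insert_city, ("Ottawa", "Canada"))          # key: primary key must be unique
no_key = rejected(insert_city, (None, "Canada"))           # entity integrity: key is never undefined
dangling = rejected(insert_city, ("Atlantis", "Nowhere"))  # referential integrity: no such Country
null_fk = rejected(insert_city, ("Unassigned", None))      # a null foreign key is permitted
print(dup, no_key, dangling, null_fk)
```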
Specifying Pictograms
•Grammar based approach
•Rewrite rule
•like English syntax diagrams
•Classes of pictograms
•Entity pictograms
•basic: point, line, polygon
•collection of basic
•...
•Relationship pictograms
•partition, network
Entity Pictograms: Basic shapes, Collections
Entity Pictograms: Derived and Alternate Shapes
•Derived shape example is city center point from boundary polygon
•Alternate shape example: A road is represented as a polygon for construction
•or as a line for navigation
2.4 Conceptual Data Modeling with UML
•Motivation
•ER Model does not allow user defined operations
•Object oriented software development uses UML
•UML stands for Unified Modeling Language
•It is a standard consisting of several diagrams
•class diagrams are most relevant for data modeling
•UML class diagrams concepts
•Attributes are simple or composite properties
•Methods represent operations, functions and procedures
•Class is a collection of attributes and methods
•Relationships relate classes
•Example UML class diagram: Figure 2.8
UML Class Diagram with Pictograms: Example
•Exercise: Identify classes, attributes, methods, relationships in Fig. 2.8.
•Compare Fig. 2.8 with corresponding ER diagram in Fig. 2.7.
Fig 2.8
Comparing UML Class Diagrams to ER Diagrams
•Concepts in UML class diagram vs. those in ER diagrams
•Class without methods is an Entity
•Attributes are common in both models
•UML does not have key attributes and integrity constraints
• ERD does not have methods
•Relationship properties are richer in ERDs
•Entities in an ER diagram relate to datasets, but a UML class diagram
can contain classes which have little to do with data
2.5 Summary
• Spatial Information modeling can be classed into Field
based and Object based
• Field based for modeling smoothly varying entities, like
rainfall
• Object based for modeling discrete entities, like country
Summary
• A data model is a high level description of the data
• it can help in early analysis of storage cost, data quality
• There are two popular models of spatial information
• Field based and Object based
• Databases are designed in 3 steps
• Conceptual, Logical and Physical
• Pictograms can simplify Conceptual data models
Chapter 3: Spatial Query Languages
3.1 Standard Database Query Languages
3.2 Relational Algebra
3.3 Basic SQL Primer
3.4 Extending SQL for Spatial Data
3.5 Example Queries that emphasize spatial aspects
3.6 Trends: Object-Relational SQL
Learning Objectives
• Learning Objectives (LO)
• LO1: Understand concept of a query language
• What is a query language?
• Why use query languages?
• LO2 : Learn to use standard query language (SQL)
• LO3: Learn to use spatial ADTs with SQL
• LO4: Learn about the trends in query languages
•3 Relations
Country(Name, Cont, Pop, GDP, Life-Exp, Shape)
City(Name, Country, Pop,Capital, Shape)
River(Name, Origin, Length, Shape)
• Keys
•Primary keys are Country.Name, City.Name, River.Name
• Foreign keys are River.Origin, City.Country
•Data for 3 tables
•Shown on next slide
World database data tables
RA(Relational Algebra)
• Two distinct elements
• Ωa : set of operands
• Ωo : set of operations
• Basic operations
• Select
• Project
• Union
• Cross-product
• Difference
• intersection
Select and Project Operations
Output of select and Project operation
Set Operations
• Union
R ∪ S
• Difference
R − S
• Intersection
R ∩ S
• Cross-product
R × S
Join Operation
• Conditional Join
• Natural Join
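The relational algebra operations listed on these slides can be sketched over Python sets. This is an illustration only: a tuple is modeled as a sorted tuple of (attribute, value) pairs so that relations are true sets, the helper names are hypothetical, and `cross` assumes the two relations have no attribute names in common.

```python
def row(**attrs):
    """Build a hashable tuple from attribute=value pairs."""
    return tuple(sorted(attrs.items()))

def select(rel, pred):
    """Keep the tuples satisfying pred (pred receives the tuple as a dict)."""
    return {t for t in rel if pred(dict(t))}

def project(rel, attrs):
    """Keep only the named attributes; duplicates collapse because rel is a set."""
    return {tuple((a, v) for a, v in t if a in attrs) for t in rel}

def cross(r, s):
    """Cross-product; assumes r and s share no attribute names."""
    return {tuple(sorted(a + b)) for a in r for b in s}

def conditional_join(r, s, pred):
    """Cross-product followed by a selection."""
    return select(cross(r, s), pred)

def natural_join(r, s):
    """Combine tuples that agree on every shared attribute."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(tuple(sorted({**da, **db}.items())))
    return out

cities = {row(City="Ottawa", Country="Canada"),
          row(City="Toronto", Country="Canada"),
          row(City="Havana", Country="Cuba")}
countries = {row(Country="Canada", Cont="NAM"), row(Country="Cuba", Cont="NAM")}

canadian = project(select(cities, lambda t: t["Country"] == "Canada"), {"City"})
joined = natural_join(cities, countries)
# Union, difference and intersection are the built-in set operators |, - and &.
both = select(cities, lambda t: t["Country"] == "Cuba") | \
       select(cities, lambda t: t["City"] == "Ottawa")
print(sorted(canadian), len(joined), len(both))
```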
Learning Objectives
• Learning Objectives (LO)
• LO1: Understand concept of a query language
• LO2 : Learn to use standard query language (SQL)
• How to create and populate tables?
• How to query given tables?
• LO3: Learn to use spatial ADTs with SQL
• LO4: Learn about the trends in query languages
• Related statements
• SELECT statement with INTO clause can insert multiple rows in a table
• Bulk load, import commands also add multiple rows
• DELETE statement removes rows
•UPDATE statement can change values within selected rows
Querying populated Tables in SQL
• SELECT statement
• The commonly used statement to query data in one or more tables
•Returns a relation (table) as result
• Has many clauses
• Can refer to many operators and functions
• Allows nested queries which can be hard to understand
• Scope of our discussion
• Learn enough SQL to appreciate spatial extensions
•Observe example queries
• Read and write simple SELECT statement
• Understand frequently used clauses, e.g. SELECT, FROM, WHERE
• Understand a few operators and functions
SELECT Statement- General Information
• Clauses
•SELECT specifies desired columns
•FROM specifies relevant tables
•WHERE specifies qualifying conditions for rows
•ORDER BY specifies sorting columns for results
•GROUP BY, HAVING specify aggregation and statistics
•Operators and functions
•arithmetic operators, e.g. +, -, …
•comparison operators, e.g. =, <, >, BETWEEN, LIKE…
•logical operators, e.g. AND, OR, NOT, EXISTS, …
•set operators, e.g. UNION, IN, ALL, ANY, …
•statistical functions, e.g. SUM, COUNT, ...
• many other operators on strings, date, currency, ...
SELECT Example 1.
• Simplest Query has SELECT and FROM clauses
• Query: List all the cities and the country they belong to.
SELECT Example 2.
• Commonly 3 clauses (SELECT, FROM, WHERE) are used
•Query: List the names of the capital cities in the CITY table.
SELECT *
FROM CITY
WHERE CAPITAL = 'Y'
Query Example…Where clause
Query: List the attributes of countries in the Country relation
where the life-expectancy is less than seventy years.
SELECT Co.Name, Co.Life-Exp
FROM Country Co
WHERE Co.Life-Exp < 70
Multi-table Query Examples
Query: List the capital cities and populations of countries
whose GDP exceeds one trillion dollars.
Note: Tables City and Country are joined by matching “City.Country =
Country.Name”. This simulates the relational operator “join” discussed in 3.2
SELECT Ci.Name, Co.Pop
FROM City Ci, Country Co
WHERE Ci.Country = Co.Name
AND Co.GDP > 1000.0
AND Ci.Capital = 'Y'
Multi-table Query Example
Query: What is the name and population of the capital city in the
country where the St. Lawrence River originates?
Note: Three tables are joined together, a pair at a time. River.Origin is matched
with Country.Name and City.Country is matched with Country.Name. The
order of join is decided by query optimizer and does not affect the result.
Exercise
• Write a query to find the names of the customers and
salesmen who live in the same city.
Salesman table
salesman_id | name | city | commission
-------------+------------+----------+------------
5001 | James Hoog | New York | 0.15
5002 | Nail Knite | Paris | 0.13
5005 | Pit Alex | London | 0.11
5006 | Mc Lyon | Paris | 0.14
5007 | Paul Adam | Rome | 0.13
5003 | Lauson Hen | San Jose | 0.12
Customer table
customer_id | cust_name | city | grade | salesman_id
-------------+---------------------+--------------+--------+-------------
3002 | Nick Rimando | New York | 100 | 5001
3007 | Brad Davis | New York | 200 | 5001
3005 | Graham Zusi | California | 200 | 5002
3008 | Julian Green | London | 300 | 5002
3004 | Fabian Johnson | Paris | 300 | 5006
Solution
SELECT customer.cust_name,
salesman.name, salesman.city
FROM salesman, customer
WHERE salesman.city = customer.city;
Output
cust_name      | name       | city
---------------+------------+----------
Nick Rimando   | James Hoog | New York
Brad Davis     | James Hoog | New York
Julian Green   | Pit Alex   | London
Fabian Johnson | Nail Knite | Paris
Fabian Johnson | Mc Lyon    | Paris
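The same-city join can be checked with Python's built-in sqlite3 module, loading only the salesman and customer rows listed on these slides. The column name `grade` for the fourth customer column is an assumption, and the ORDER BY clause is added only to make the printed order repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesman (salesman_id INTEGER PRIMARY KEY, name TEXT, city TEXT, commission REAL);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, cust_name TEXT, city TEXT,
                       grade INTEGER,  -- column name assumed
                       salesman_id INTEGER);
""")
conn.executemany("INSERT INTO salesman VALUES (?, ?, ?, ?)", [
    (5001, "James Hoog", "New York", 0.15), (5002, "Nail Knite", "Paris", 0.13),
    (5005, "Pit Alex", "London", 0.11), (5006, "Mc Lyon", "Paris", 0.14),
    (5007, "Paul Adam", "Rome", 0.13), (5003, "Lauson Hen", "San Jose", 0.12)])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", [
    (3002, "Nick Rimando", "New York", 100, 5001),
    (3007, "Brad Davis", "New York", 200, 5001),
    (3005, "Graham Zusi", "California", 200, 5002),
    (3008, "Julian Green", "London", 300, 5002),
    (3004, "Fabian Johnson", "Paris", 300, 5006)])

# The join condition is on city, so a customer pairs with every salesman in that city.
rows = conn.execute("""
SELECT customer.cust_name, salesman.name, salesman.city
FROM salesman, customer
WHERE salesman.city = customer.city
ORDER BY customer.customer_id, salesman.salesman_id
""").fetchall()
for r in rows:
    print(r)
```

Paris has two salesmen in the table, so the one customer located there appears in two result rows.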
Exercise
• Write a query to find the names of all the customers along
with the salesmen who work with them.
Solution
SELECT customer.cust_name, salesman.name
FROM customer,salesman
WHERE salesman.salesman_id = customer.salesman_id;
Output
cust_name name
Nick Rimando James Hoog
Brad Davis James Hoog
Graham Zusi Nail Knite
Julian Green Nail Knite
Fabian Johnson Mc Lyon
Exercise
Write a SQL statement to display all those orders by the
customers not located in the same cities where their
salesmen live.
ord_no purch_amt ord_date customer_id salesman_id
---------- ---------- ---------- ----------- -----------
70001 150.5 2012-10-05 3005 5002
70009 270.65 2012-09-10 3001 5005
70002 65.26 2012-10-05 3002 5001
70004 110.5 2012-08-17 3009 5003
70007 948.5 2012-09-10 3005 5002
70005 2400.6 2012-07-27 3007 5001
70008 5760 2012-09-10 3002 5001
70010 1983.43 2012-10-10 3004 5006
70003 2480.4 2012-10-10 3009 5003
70012 250.45 2012-06-27 3008 5002
70011 75.29 2012-08-17 3003 5007
Solution
SELECT ord_no, cust_name, orders.customer_id,
orders.salesman_id
FROM salesman, customer, orders
WHERE customer.city <> salesman.city
AND orders.customer_id = customer.customer_id
AND orders.salesman_id = salesman.salesman_id;
Output
ord_no cust_name customer_id salesman_id
70004 Geoff Cameron 3009 5003
70003 Geoff Cameron 3009 5003
70011 Jozy Altidor 3003 5007
70001 Graham Zusi 3005 5002
Exercise
Write a SQL statement that lists each order number
followed by the name of the customer who placed the
order.
Solution
SELECT orders.ord_no, customer.cust_name
FROM orders, customer
WHERE orders.customer_id = customer.customer_id;
ord_no cust_name
70009 Brad Guzan
70002 Nick Rimando
70004 Geoff Cameron
70005 Brad Davis
70008 Nick Rimando
70010 Fabian Johnson
70003 Geoff Cameron
Query Examples…Aggregate Statistics
Query: What is the average population of the noncapital cities listed in the
City table?
SELECT AVG(Ci.Pop)
FROM City Ci
WHERE Ci.Capital = 'N'
Query: List the countries whose GDP is greater than that of Canada.
SELECT Co.Name
FROM Country Co
WHERE Co.GDP > ANY (SELECT Co1.GDP
FROM Country Co1
WHERE Co1.Name = 'Canada')
Learning Objectives
• Learning Objectives (LO)
• LO1: Understand concept of a query language
• LO2 : Learn to use standard query language (SQL)
• LO3: Learn to use spatial ADTs with SQL
• Learn about OGIS standard spatial data types and operations
• Learn to use OGIS spatial ADTs with SQL
• LO4: Learn about the trends in query languages
Query: The St. Lawrence River can supply water to cities that are
within 300 km. List the cities that can use water from the St.
Lawrence River.
SELECT Ci.Name
FROM City Ci, River R
WHERE Overlap(Ci.Shape, Buffer(R.Shape, 300)) = 1
AND R.Name = 'St. Lawrence'
Note: A complex nested query with aggregate operations can be written as two
expressions, namely a view definition and a query on the view. The inner query
becomes a view and the outer query is run on the view. This is illustrated in the next slide.
Rewriting nested queries using Views
•Views are like tables
•Represent derived data or result of a query
•Can be used to simplify complex nested queries
•Example follows:
CREATE VIEW Neighbor AS
SELECT Co.Name, Count(Co1.Name) AS num_neighbors
FROM Country Co,Country Co1
WHERE Touch(Co.Shape,Co1.Shape)
GROUP BY Co.Name
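The view-based rewriting can be tried out with Python's built-in sqlite3 module. Everything spatial here is a stand-in: SQLite has no OGIS types, so shapes are assumed to be axis-aligned rectangles stored as "xmin,ymin,xmax,ymax" text, `Touch` is a hypothetical user-defined function registered from Python, the four countries A to D are made up, and the outer query on the view is one plausible way to ask for the country with the most neighbors.

```python
import sqlite3

def touch(a: str, b: str) -> int:
    """1 if two rectangles share boundary points but no interior points."""
    ax1, ay1, ax2, ay2 = map(float, a.split(","))
    bx1, by1, bx2, by2 = map(float, b.split(","))
    w = min(ax2, bx2) - max(ax1, bx1)  # overlap width (negative if disjoint)
    h = min(ay2, by2) - max(ay1, by1)  # overlap height
    return int(w >= 0 and h >= 0 and (w == 0 or h == 0))

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA trusted_schema = ON")  # lets a view call an application-defined function
conn.create_function("Touch", 2, touch)
conn.execute("CREATE TABLE Country (Name TEXT PRIMARY KEY, Shape TEXT)")
conn.executemany("INSERT INTO Country VALUES (?, ?)", [
    ("A", "0,0,1,1"), ("B", "1,0,2,1"), ("C", "2,0,3,1"), ("D", "0,1,1.5,2")])

# Inner query becomes a view ...
conn.execute("""
CREATE VIEW Neighbor AS
SELECT Co.Name AS Name, COUNT(Co1.Name) AS num_neighbors
FROM Country Co, Country Co1
WHERE Touch(Co.Shape, Co1.Shape) = 1
GROUP BY Co.Name""")

# ... and the outer query runs on the view.
best = conn.execute("""
SELECT Name, num_neighbors FROM Neighbor
WHERE num_neighbors = (SELECT MAX(num_neighbors) FROM Neighbor)""").fetchall()
print(best)
```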
• Size of sectors
• Larger sectors provide faster transfer of large data sets
• But waste storage space inside sectors for small data sets
Figure 4.3
File Structure : Hash
•Components of a Hash file structure (Fig. 4.2)
• A set of buckets (sectors)
• Hash function : key value --> bucket
• Hash directory: bucket --> sector
• Operations
•find, insert, delete are fast
•compute hash function
•lookup directory
•fetch relevant sector
•findnext, nearest neighbor are slow
•no order among records
Fig 4.2
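A toy in-memory model shows why point lookups are cheap and ordered operations are not. The class `HashFile` is a hypothetical sketch: each Python list stands in for one bucket (sector), and the hash directory is folded into the list index rather than modeled separately.

```python
class HashFile:
    def __init__(self, n_buckets: int = 4):
        # Each inner list stands in for one disk sector.
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Hash function: key value --> bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, record):
        self._bucket(key).append((key, record))

    def find(self, key):
        # Only one bucket is fetched.
        for k, rec in self._bucket(key):
            if k == key:
                return rec
        return None

    def find_next(self, key):
        # No order among records: every bucket has to be scanned.
        larger = [k for bucket in self.buckets for k, _ in bucket if k > key]
        return min(larger) if larger else None


f = HashFile()
for k in (10, 3, 7):
    f.insert(k, f"record-{k}")
print(f.find(3), f.find_next(3), f.find_next(10))
```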
4.1.5 Spatial File Structures: Clustering
● The goal of clustering is to reduce seek (ts) and latency (tl) time
in answering common large queries.
● Three types of clustering supported by SDBMS to provide
efficient query processing are:
● Internal clustering: to speed up access to single object by storing in
single disk page.
● Local clustering: to speed up access to several objects by storing
grouped set of spatial objects onto one page.
● Global clustering: Spatially adjacent objects stored on several
physically consecutive pages that can be accessed by a single read
request.
Clustering
• Motivation:
•Ordered files are not natural for spatial data
• Clustering records in sector by space filling curve is an alternative
•In general, clustering groups records
•accessed by common queries
•into common disk sectors
•to reduce I/O costs for selected queries
•Clustering using Space filling curves
•Z-curve
•Hilbert-curve
•Details on following 3 slides
Z-Curve
•What is a Z-curve?
• A space filling curve
• Generated from interleaving bits
Fig 4.6
•x, y coordinate
•See Fig. 4.6
•Alternative generation method
•see Fig. 4.5
•Connecting points by z-order
•see Fig. 4.4
•looks like Ns or Zs
•Implementing file operations
•similar to ordered files
Fig 4.4
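Bit interleaving and its use with sorted search can be sketched as follows. The bit order is an assumption (the x bit is taken as the more significant bit of each pair; the figures may use the opposite convention), and a sorted list with binary search stands in for an ordered file or B-tree.

```python
from bisect import bisect_left

def z_value(x: int, y: int, bits: int = 2) -> int:
    """Interleave the bits of x and y, most significant bits first."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)  # x bit first (assumed convention)
        z = (z << 1) | ((y >> i) & 1)
    return z

# Order the cells of a 4 x 4 grid along the Z-curve.
cells = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(cells, key=lambda c: z_value(*c))
keys = [z_value(*c) for c in ordered]

def find(cell):
    """Binary search on z-values, as one would do in an ordered file."""
    i = bisect_left(keys, z_value(*cell))
    return i if i < len(keys) and ordered[i] == cell else None

print(ordered[:4], z_value(2, 0), find((2, 0)))
```

Cells that are neighbors in space can still be far apart in this order, which is the effect noted for object C in the z-value example.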
Z-curve
Example of Z-values
•Figure 4.7
• Left part shows a map with spatial objects A, B, C
• Right part and bottom-left part show z-values within A, B and C
•Note C gets z-values of 2 and 8, which are not close
•Exercise: Compute z-values for B.
Fig 4.7
Hilbert Curve
Fig 4.5
• A space filling curve
•Example: Fig. 4.5
•Procedure on p. 92
Fig 4.8
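For comparison with the Z-curve, a Hilbert index can be computed with a widely used iterative formulation. This sketch is not necessarily the procedure referenced on the slide, and the orientation of the curve (which corner it starts from) is an arbitrary choice; the function name is hypothetical and `n` is assumed to be a power of two.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant, in curve order
        # Rotate/flip so the sub-quadrant is traversed in the right orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

n = 4
path = sorted(((x, y) for x in range(n) for y in range(n)),
              key=lambda c: hilbert_index(n, *c))
# Unlike the Z-curve, consecutive cells on the Hilbert curve are always grid neighbors.
steps = [abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(path, path[1:])]
print(path[:4], max(steps))
```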
Handling Regions with Z-curve
Fig 4.9
Learning Objectives
• Learning Objectives (LO)
• LO1: Understand concept of a physical data model
• LO2 : Learn how to efficiently use storage devices
• LO3: Learn how to structure data files
• LO4: Learn how to use auxiliary data-structures
• Concept of index
• Spatial indices, e.g. Grids / Grid-file and R-tree families
• Focus on concepts not procedures!
• LO5: Learn about technology trends in physical data model
Fig 4.13
What is R-tree?
● A Dynamic Index structure for Spatial searching
Fig 4.21
Spatial Join-index Details
Fig 4.22
Fig 4.23
Summary
• Physical DM efficiently implements logical DM on computer hardware
• Physical DM has file-structure, indexes
• Classical methods were designed for data with total ordering
• fall short in handling spatial data
• because spatial data is multi-dimensional
• Two approaches to support spatial data and queries
• Reuse classical methods
• Use Space-Filling curves to impose a total order on multi-dimensional data
• Use new methods
• R-trees, Grid files
Ch. 5: Query Processing and Optimization
Exercise:
Propose a few additional building blocks for spatial queries
• besides spatial selection, spatial join and nearest neighbor
• Use GIS operations (Table 1.1, p. 3) as a guide if needed
Justify the proposal by listing spatial queries needing the component
Detail the proposal by listing a few algorithms for the building block
How would one choose between the available algorithms?
Scope of Discussion
Chapter 5 will discuss
Choice of building blocks for spatial queries
Choice of processing strategies for building blocks
How to choose the “best” strategy from among the applicable ones?
Focus on concepts not procedures
Procedures change with change in computer hardware
Concepts do not change as often
Readers are more likely to remember the concepts after the course
Learning Objectives
Learning Objectives (LO)
LO1: Understand concept of query processing and optimization (QPO)
LO2 : Learn about alternative algorithms to process spatial queries
• What are the building blocks of spatial queries?
• What are common strategies for each building block?
LO3: Learn about query optimizer
LO4: Learn about trends
Focus on concepts not procedures!
Mapping Sections to learning objectives
LO2 - 5.1
LO3 - 5.2, 5.3
LO4 - 5.4, 5.5
Building Blocks for Spatial Queries
Challenges in choosing building blocks
Rich set of data types - point, line string, polygon, …
Rich set of operators - topological, euclidean, set-based, …
Large collection of computational geometry algorithms
• for different spatial operations on different spatial data types
Desire to limit complexity of SDBMS
How to simplify choice of data types and operators?
Reusing a Geographic Information System (GIS)
• which already implements spatial data types and operations
• however, it may have difficulties processing large data sets on disk
SDBMS reduces set of objects to be processed by a GIS
SDBMS is used as a filter
This is filter and refinement approach
The Filter-Refine Paradigm
• Processing a spatial query Q
•Filter step: find a superset S of the objects in the answer to Q
•Using approximations of spatial data types and operators
•Refinement step: find the exact answer to Q, reusing a GIS to process S
•Using exact spatial data types and operations
Fig 5.1
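The two steps can be sketched with discs as the exact geometry and their MBRs as the approximation. The objects A to D, their coordinates, and the query rectangle are made-up values chosen so that one candidate passes the filter but fails refinement; the helper names are hypothetical.

```python
from math import hypot

def mbr(disc):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a disc (cx, cy, r)."""
    cx, cy, r = disc
    return (cx - r, cy - r, cx + r, cy + r)

def rects_overlap(a, b):
    """Cheap approximate test used by the filter step."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def disc_meets_rect(disc, rect):
    """Exact test used by the refinement step."""
    cx, cy, r = disc
    nx = min(max(cx, rect[0]), rect[2])  # point of rect nearest to the centre
    ny = min(max(cy, rect[1]), rect[3])
    return hypot(cx - nx, cy - ny) <= r

objects = {"A": (20, 20, 1), "B": (5, 5, 1), "C": (11, 5, 2), "D": (12, 12, 2.5)}
query = (0, 0, 10, 10)

# Filter: a superset of the answer, found with MBRs only.
candidates = [k for k, d in objects.items() if rects_overlap(mbr(d), query)]
# Refine: exact geometry is examined only for the candidates.
answer = [k for k in candidates if disc_meets_rect(objects[k], query)]
print(candidates, answer)
```

D survives the filter because the corner of its MBR reaches into the query region, but the disc itself does not, so refinement discards it.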
Approximate Spatial Data types
Approximating spatial data types
Minimum orthogonal bounding rectangle (MOBR or MBR)
• approximates line string, polygon, …
• See examples below (black rectangles are MBRs for red objects)
MBRs are used by spatial indexes, e.g. R-tree
Algorithms for spatial operations on MBRs are simple
• A sightseeing trip
•Start : A SQL Query
•End: An execution plan
•Intermediate Stopovers
•query trees
•logical tree transforms
•strategy selection
• What happens after the journey?
•Execution plan is executed
•Query answer returned
Fig 5.2
Query Trees
• Nodes = building blocks of (spatial) queries
• See section 3.2 (p. 55) for symbols sigma, pi and join
• Children = inputs to a building block
• Leaves = Tables
• Example SQL query and its query tree follows:
Fig 5.3
Logical Transformation of Query Trees
• Motivation
• Transformations do not change the answer of the query
• But can reduce computational cost by
• reducing data produced by sub-queries
• reducing computation needs of parent node
• Example Transformation
• Push down select operation below join
• Example: Fig. 5.4 (compare w/ Fig 5.3, last slide)
• Reduces size of table for join operation
• Other common transformations
• Push project down
• Reorder join operations
• ...
Fig 5.4
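The effect of pushing selections below a join can be illustrated by counting the pair comparisons a nested-loop join performs. The lake and facility rows, the one-dimensional positions, and the thresholds are made-up values for illustration, loosely modeled on the campground example used in this chapter.

```python
def join(left, right, pred):
    """Nested-loop join that also reports how many pairs it compared."""
    comparisons = 0
    out = []
    for l in left:
        for r in right:
            comparisons += 1
            if pred(l, r):
                out.append({**l, **r})
    return out, comparisons

lakes = [{"lake": f"L{i}", "area": 10 * i, "lx": 100 * i} for i in range(1, 6)]
facilities = [{"fac": f"F{i}", "kind": "campground" if i % 2 else "marina",
               "fx": 100 * i + 30} for i in range(1, 6)]

def near(l, f):
    return abs(l["lx"] - f["fx"]) < 50

def big(l):
    return l["area"] > 20

def camp(f):
    return f["kind"] == "campground"

# Plan 1: join first, select afterwards.
joined, c1 = join(lakes, facilities, near)
plan1 = [t for t in joined if big(t) and camp(t)]

# Plan 2: push both selections below the join.
plan2, c2 = join([l for l in lakes if big(l)],
                 [f for f in facilities if camp(f)], near)
print(c1, c2, plan1 == plan2)
```

Both plans return the same rows, but the second compares far fewer pairs. For spatial predicates the selections themselves may be expensive, which is why such rules need to be re-examined.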
Logical Transformation and Spatial Queries
• Traditional logical transform rules
•For relational queries with simple data types and operations
• CPU costs are much smaller than I/O costs
• Need to be reviewed for spatial queries
• complex data types, operations
• CPU cost is higher
•Example:
• Push down spatial selection below join
• May not decrease cost if
•area() is costlier than distance()
Fig 5.5
Execution Plans
An execution plan has 3 components
A query tree
A strategy selected for each non-leaf node
An ordering of evaluation of non-leaf nodes
Example
Strategies for Query tree in Fig. 5.5
• Use scan for Area(L.Geometry) > 20 Fig 5.5
• Use index for Fa.Name = ‘Campground’
• Use space-partitioning join for
– Distance(Fa, L) < 50
• Use on-the-fly for projection
Ordering
• As listed above
Choosing strategies for building blocks
A priority scheme
Check applicability of each strategy given file-structures and indices
Choose highest priority strategy
This procedure is fast and is used for complex queries
Rule based approach
System has a set of rules mapping situations to strategy choices
Example: Use scan for range query if result size > 10 % of data file
Cost based approach
See next slide
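The rule-based approach can be sketched as a function that maps a situation to a strategy. The function name, its parameters, and the fallback to a scan when no index exists are assumptions for illustration; only the 10 % threshold comes from the example rule above.

```python
def choose_range_query_strategy(estimated_result_rows, table_rows,
                                has_index, threshold=0.10):
    """Rule: use a scan if the expected result exceeds `threshold` of the file."""
    if not has_index or table_rows == 0:
        return "scan"  # assumed fallback: nothing else is applicable
    if estimated_result_rows / table_rows > threshold:
        return "scan"
    return "index"


print(choose_range_query_strategy(200, 1000, has_index=True))
print(choose_range_query_strategy(50, 1000, has_index=True))
```

The rule needs only an estimate of the result size, which is why it is cheap to apply compared with evaluating a full cost model.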
Choosing strategies for building blocks - 2
Cost model based approach
Single building block
• Use formulas to estimate cost of each strategy, given table size etc.
• Choose the strategy with least cost
• Example cost models for spatial operation in section 5.3
A query tree
• Least cost combination of strategy choices for non-leaf nodes
• Dynamic programming algorithm
Commercial practice
RDBMS use cost based approach for relational building blocks
But cost models for spatial strategies are not mature
Rule based approach is often used for spatial strategies
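For a single building block, the cost-based approach amounts to evaluating one cost formula per strategy and taking the minimum. The sketch below uses deliberately simple, hypothetical formulas (page reads for a full scan versus an unclustered index lookup); they are not the cost models of Section 5.3, and the fanout and rows-per-page values are made up.

```python
def scan_cost(pages, selectivity, rows_per_page=50):
    """A full scan reads every page regardless of selectivity."""
    return pages


def index_cost(pages, selectivity, rows_per_page=50, fanout=100):
    """Unclustered index: descend the tree, then one page read per matching row."""
    height, capacity = 1, fanout
    while capacity < pages:
        capacity *= fanout
        height += 1
    return height + selectivity * pages * rows_per_page


STRATEGIES = {"scan": scan_cost, "index": index_cost}


def cheapest_strategy(pages, selectivity):
    """Estimate the cost of every strategy and return the least-cost one."""
    costs = {name: cost(pages, selectivity) for name, cost in STRATEGIES.items()}
    best = min(costs, key=costs.get)
    return best, costs


print(cheapest_strategy(10_000, 0.01))
print(cheapest_strategy(10_000, 0.10))
```

For a whole query tree, the same estimates would be combined over all non-leaf nodes, which is where the dynamic programming algorithm mentioned above comes in; that step is not shown here.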
Learning Objectives
Learning Objectives (LO)
LO1: Understand concept of query processing and optimization (QPO)
LO2 : Learn about alternative algorithms to process spatial queries
LO3: Learn about query optimizer
LO4: Learn about trends
• Impact of Distributed, Web-based, Parallel Computing Environment
Focus on concepts not procedures!
Mapping Sections to learning objectives
LO2 - 5.1
LO3 - 5.2, 5.3
LO4 - 5.4, 5.5
Trends in Query Processing and Optimization
Motivation
SDBMS and GIS are invaluable to many organizations
Price of success is to get new requests from customers
• to support new computing hardware and environment
• to support new applications
New computing environments
Distributed computing (Section 5.4)
Internet and web (Section 5.4)
Parallel computers (Section 5.5)
New applications
Location based services, transportation (Chapter 6)
Data Mining (Chapter 7)
Raster data (Chapter 8)
5.4 Distributed Spatial Databases
Distributed Environments
Collection of autonomous heterogeneous computers
Connected by networks
Client-server architectures
• Server computer provides well-defined services
• Client computers use the services
New issues for SDBMS
Conceptual data model -
• Translation between heterogeneous schemas
Logical data model
• Naming and querying tables in other SDBMSs
• Keeping copies of tables (in other SDBMSs) consistent with original table
Query Processing and Optimization
• Cost of data transfer over network may dominate CPU and I/O costs
• New strategies to control data transfer costs
5.4 Internet and (World-wide-)web
Internet and Web Environments
Very popular medium of information access in last few years
A distributed environment
Web servers, web clients
• Common data formats (e.g. HTML, XML)
• Common communication protocols (e.g. http)
• Naming - uniform resource locator (url), e.g. www.cs.umn.edu
New issues for SDBMS
Offer SDBMS service on web
Use Web data formats, communication protocols etc.
• Example on next slide
Evaluate and improve web for SDBMS clients and servers
5.4 Web-based Spatial Database Systems
• SDBMS on web
•MapServer case study
• SDBMS talks to a web server
• web server talks to web clients
•Commercial practice
•Several web based products
•Web data formats for spatial data
•GML
•WMS
•Fig 5.10
5.5 Parallel Spatial Databases
Parallel Environments
Computer with multiple CPUs and disk drives (See Fig. 5.11 for examples)
All CPUs and disks available to a SDBMS
Can speed-up processing of spatial queries!
Fig 5.11
5.5 Parallel Spatial Databases - 2
New issues for SDBMS
Physical Data Model
• Declustering: How to partition tables, indices across disk drives?
Query Processing and Optimization
• Query partitioning: How to divide queries among CPUs?
• Cost model of strategies on parallel computers
Example: Techniques for declustering (Fig. 5.12)
Simple technique: round robin based on an order (space filling curve)
Declustering for Data Partitioning
• Example
• A simple technique for declustering (Fig. 5.12)
• 1. Order the spatial objects using a space filling curve
• 2. Allocate to disk drives in a round robin manner
• Effective for point objects, e.g. pixels in an image
• Many queries, e.g. those with large MBRs, are parallelized well
• Ex. Consider a query to retrieve data in bottom-left quarter of the space
• Two data points retrieved from each disk drive for Z-curve
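The two declustering steps above can be sketched as follows. The grid size (4 x 4 points) and the number of disks (2) are assumptions chosen for illustration and are not read off Fig. 5.12; the Z-curve order is computed by interleaving the bits of the x and y coordinates.

```python
from collections import Counter


def z_value(x, y, bits=2):
    """Interleave the bits of x and y to get the Z-order (Morton) code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def decluster(points, n_disks, bits=2):
    """Step 1: order points along the Z-curve. Step 2: assign disks round robin."""
    ordered = sorted(points, key=lambda p: z_value(p[0], p[1], bits))
    return {point: rank % n_disks for rank, point in enumerate(ordered)}


# Assumed setup: a 4 x 4 grid of point objects spread over 2 disks.
grid = [(x, y) for x in range(4) for y in range(4)]
disk_of = decluster(grid, n_disks=2)

# Range query: the bottom-left quarter of the space.
bottom_left = [p for p in grid if p[0] < 2 and p[1] < 2]
per_disk = Counter(disk_of[p] for p in bottom_left)
print(dict(per_disk))
```

In this setup the four points of the bottom-left quarter are split evenly across the two disks, so both drives can serve the query in parallel.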
A Case Study: High Performance GIS
Goal: Meet the response time constraint for real-time
battlefield terrain visualization in a flight simulator.
Methodology:
Data-partitioning approach
Evaluation on parallel computers,
e.g. Cray T3D, SGI Challenge.
Significance:
A major improvement in capability of
geographic information systems for determining
the subset of terrain polygons within the view
point (Range Query) of a soldier in a flight
simulator using real geographic terrain data set.
Graphs represent entities as nodes and the ways in which those entities relate to the
world as relationships.
Why are graphs important?
• Modeling of biological data
• Road network data
• Social network data
• Hierarchical data
• The web data
What is a Graph Database?
In the graph data model, data are modelled as a graph.
What is a Graph Database?
A database for storing, managing and querying highly connected
and complex data.
OR
MATCH (a:Person {name:'Jim'})-[:KNOWS]->(b)-[:KNOWS]->(c),
(a)-[:KNOWS]->(c)
RETURN b, c
Cypher clauses
• WHERE
• CREATE and CREATE UNIQUE
• MERGE
• DELETE
• SET
• FOREACH
• UNION
• WITH
• START
Graph model for data center deployment
Query Processing in Graph Database
Summary
• A data model is a high level description of the data
• it can help in early analysis of storage cost, data quality
• There are two popular models of spatial information
• Field based and Object based
• Databases are designed in 3 steps
• Conceptual, Logical and Physical
• Pictograms can simplify Conceptual data models
Chapter 7
Chapter 6: Building Graph Database Application
6.1 Data Modeling
6.2 Application Architecture
6.3 Testing
6.4 Graph Database Internals:
6.4.1 Native Graph Processing
6.4.2 Native Graph Storage
6.4.3 Advances in the domain
Data Modeling
Graph data modeling is the process in which a user describes an arbitrary domain as
a connected graph of nodes and relationships with properties and labels
Data model for the book reviews user story
AS A reader who likes a book, I WANT to know which books
other readers who like the same book have liked, SO THAT I can
find other books to read.
MATCH (:Reader {name:'Alice'})-[:LIKES]->(:Book {title:'Dune'})
<-[:LIKES]-(:Reader)-[:LIKES]->(books:Book)
RETURN books.title
Model facts as Nodes
When two or more domain entities interact for a period of time, a
fact emerges.
Eg: Employment
CREATE (:Person {name:'Ian'})-[:EMPLOYMENT]->
(employment:Job {start_date:'2011-01-05'})
-[:EMPLOYER]->(:Company {name:'Neo'}),
(employment)-[:ROLE]->(:Role {name:'engineer'})
Cont...
How the fact that William Hartnell played The Doctor in the story
The Sensorites can be represented in the graph.
CREATE (:Actor {name:'William Hartnell'})-[:PERFORMED_IN]->
(performance:Performance {year:1964})-[:PLAYED]->
(:Role {name:'The Doctor'}),
(performance)-[:FOR]->(:Story {title:'The Sensorites'})
Cont...
Ian emailed Jim, and copied in Alistair
CREATE (:Person {name:'Ian'})-[:SENT]->(e:Email {content:'...'})
-[:TO]->(:Person {name:'Jim'}),
(e)-[:CC]->(:Person {name:'Alistair'})
Cont...
How the act of Alistair reviewing a film can be represented in the
graph.
CREATE (:Person {name:'Alistair'})-[:WROTE]->
(review:Review {text:'...'})-[:OF]->(:Film {title:'...'}),
(review)-[:PUBLISHED_IN]->(:Publication {title:'...'})
Represent Complex Value Types as Nodes
Value types are things that do not have an identity, and whose
equivalence is based solely on their values.
Eg: Time.