
Chapter 3: Spatial Query Languages

What is a query?
A query is a “question” posed to a database
Queries are expressed in a high-level declarative manner
• Algorithms needed to answer the query are not specified in the query
Examples:
Mouse click on a map symbol (e.g. road) may mean
• What is the name of road pointed to by mouse cursor ?
Typing a keyword in a search engine (e.g. google, yahoo) means
• Which documents on web contain given keywords?
SELECT S.name FROM Senator S WHERE S.gender = 'F' means
• Which senators are female?
What is a query language?
A language to express interesting questions about data
A query language restricts the set of possible queries
Examples:
Natural language, e.g. English, can express almost all queries
Computer programming languages, e.g. Java,
• can express computable queries
• however, the algorithm that answers the query must be written out
Structured Query Language (SQL)
• Can express common data intensive queries
• Not suitable for recursive queries
Graphical interfaces, e.g. web-search, mouse clicks on a map
• can express few different kinds of queries
An Example World Database
Purpose: Use an example database to learn query language SQL
Conceptual Model
3 Entities: Country, City, River
2 Relationships: capital-of, originates-in
Attributes listed in Figure 3.1
An Example Database - Logical Model

• 3 Relations
Country(Name, Cont, Pop, GDP, Life-Exp, Shape)
City(Name, Country, Pop, Capital, Shape)
River(Name, Origin, Length, Shape)
• Keys
• Primary keys are Country.Name, City.Name, River.Name
• Foreign keys are River.Origin, City.Country
• Data for 3 tables
• Shown on next slide
World database data tables
What is SQL?
SQL - General Information
SQL is a standard query language for relational databases
It supports logical data model concepts, such as relations, keys, ...
Supported by major brands, e.g. IBM DB2, Oracle, MS SQL Server, Sybase, ...
3 versions: SQL1 (1986), SQL2 (1992), SQL 3 (1999)
Can express common data intensive queries
SQL 1 and SQL 2 are not suitable for recursive queries
SQL and spatial data management
ESRI Arc/Info included a custom relational DBMS named Info
Other GIS software can interact with DBMS using SQL
• using open database connectivity (ODBC) or other protocols
In fact, many software packages use SQL to manage data in a back-end DBMS
And the vast majority of SQL queries are generated by other software
Although we will be writing SQL queries manually!
Three Components of SQL
Data Definition Language (DDL)
Creation and modification of relational schema
Schema objects include relations, indexes, etc.
Data Manipulation Language (DML)
Insert, delete, update rows in tables
Query data in tables
Data Control Language (DCL)
Concurrency control, transactions
Administrative tasks, e.g. set up database users, security permissions
Focus for now
A little bit of table creation (DDL) and population (DML)
Primarily Querying (DML)
Creating Tables in SQL
• Table definition
• “CREATE TABLE” statement
• Specifies table name, attribute names and data types
• Create a table with no rows.
• See an example at the bottom
• Related statements
• ALTER TABLE statement modifies table schema if needed
• DROP TABLE statement removes a table, along with any rows it contains
Populating Tables in SQL
• Adding a row to an existing table
• “INSERT INTO” statement
• Specifies table name, attribute names and values
• Example:
INSERT INTO River(Name, Origin, Length) VALUES('Mississippi', 'USA', 6000)

• Related statements
• SELECT statement with INTO clause can insert multiple rows in a table
• Bulk load, import commands also add multiple rows
• DELETE statement removes rows
• UPDATE statement can change values within selected rows
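The table creation and population statements above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions, not the book's example: the column types are guesses, the Shape column is left out because plain SQLite has no spatial type, and the UPDATE adds an arbitrary amount purely for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# DDL: create an empty River table (column types are assumed).
conn.execute("""
    CREATE TABLE River (
        Name   TEXT PRIMARY KEY,
        Origin TEXT,
        Length REAL
    )
""")

# DML: insert the row from the slide, then change it.
conn.execute(
    "INSERT INTO River (Name, Origin, Length) VALUES (?, ?, ?)",
    ("Mississippi", "USA", 6000),
)
# Arbitrary change, only to show UPDATE on selected rows.
conn.execute("UPDATE River SET Length = Length + 20 WHERE Name = 'Mississippi'")
rows = conn.execute("SELECT Name, Origin, Length FROM River").fetchall()

# DELETE removes rows; DROP TABLE removes the table itself.
conn.execute("DELETE FROM River WHERE Name = 'Mississippi'")
remaining = conn.execute("SELECT COUNT(*) FROM River").fetchone()[0]
conn.execute("DROP TABLE River")
```

Passing values as `?` parameters rather than pasting them into the SQL string is the usual way to avoid quoting problems.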
Querying populated Tables in SQL
• SELECT statement
• The commonly used statement to query data in one or more tables
• Returns a relation (table) as result
• Has many clauses
• Can refer to many operators and functions
• Allows nested queries which can be hard to understand
• Scope of our discussion
• Learn enough SQL to appreciate spatial extensions
• Observe example queries
• Read and write simple SELECT statements
• Understand frequently used clauses, e.g. SELECT, FROM, WHERE
• Understand a few operators and functions
SELECT Statement - General Information
• Clauses
• SELECT specifies desired columns
• FROM specifies relevant tables
• WHERE specifies qualifying conditions for rows
• ORDER BY specifies sorting columns for results
• GROUP BY, HAVING specify aggregation and statistics
• Operators and functions
• arithmetic operators, e.g. +, -, …
• comparison operators, e.g. =, <, >, BETWEEN, LIKE, …
• logical operators, e.g. AND, OR, NOT, EXISTS, …
• set operators, e.g. UNION, IN, ALL, ANY, …
• statistical functions, e.g. SUM, COUNT, ...
• many other operators on strings, date, currency, ...
SELECT Example 1.
• Simplest Query has SELECT and FROM clauses
• Query: List all the cities and the country they belong to.

SELECT Name, Country
FROM CITY

Result →
SELECT Example 2.
• Commonly 3 clauses (SELECT, FROM, WHERE) are used
• Query: List the capital cities in the CITY table, with all their attributes.
SELECT *
FROM CITY
WHERE CAPITAL = 'Y'

Result →
Query Example … WHERE clause
Query: List the names and life expectancies of countries in the Country relation
where the life expectancy is less than seventy years.

SELECT Co.Name, Co.Life-Exp
FROM Country Co
WHERE Co.Life-Exp < 70

Note: use of alias ‘Co’ for Table ‘Country’

Result →
Multi-table Query Examples
Query: List the capital cities and populations of countries
whose GDP exceeds one trillion dollars.
Note: Tables City and Country are joined by matching “City.Country =
Country.Name”. This simulates the relational operator “join” discussed in Section 3.2.

SELECT Ci.Name, Co.Pop
FROM City Ci, Country Co
WHERE Ci.Country = Co.Name
AND Co.GDP > 1000.0
AND Ci.Capital = 'Y'
Multi-table Query Example
Query: What is the name and population of the capital city in the
country where the St. Lawrence River originates?

SELECT Ci.Name, Ci.Pop
FROM City Ci, Country Co, River R
WHERE R.Origin = Co.Name
AND Co.Name = Ci.Country
AND R.Name = 'St. Lawrence'
AND Ci.Capital = 'Y'

Note: The three tables are joined a pair at a time. River.Origin is matched
with Country.Name and City.Country is matched with Country.Name. The
order of the joins is decided by the query optimizer and does not affect the result.
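The two multi-table queries above can be run against a toy copy of the World database. The sketch below uses Python's sqlite3 module; the Shape columns are omitted, every number is a placeholder rather than a real statistic, and the Origin value for the St. Lawrence is an assumption made only so that the join has something to match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Name TEXT PRIMARY KEY, Cont TEXT, Pop REAL, GDP REAL);
    CREATE TABLE City    (Name TEXT PRIMARY KEY, Country TEXT, Pop REAL, Capital TEXT);
    CREATE TABLE River   (Name TEXT PRIMARY KEY, Origin TEXT, Length REAL);

    -- Placeholder rows: the names are real, every number is made up.
    INSERT INTO Country VALUES ('Canada', 'NAM', 30.0, 650.0),
                               ('USA',    'NAM', 270.0, 8000.0);
    INSERT INTO City VALUES ('Ottawa',     'Canada', 1.0, 'Y'),
                            ('Toronto',    'Canada', 3.0, 'N'),
                            ('Washington', 'USA',    4.0, 'Y');
    INSERT INTO River VALUES ('St. Lawrence', 'USA', 1200.0);
""")

# Two-table join: capitals of countries whose GDP exceeds the threshold.
rich_capitals = conn.execute("""
    SELECT Ci.Name, Co.Pop
    FROM City Ci, Country Co
    WHERE Ci.Country = Co.Name
      AND Co.GDP > 1000.0
      AND Ci.Capital = 'Y'
""").fetchall()

# Three-table join: capital of the country where the river originates.
river_capital = conn.execute("""
    SELECT Ci.Name, Ci.Pop
    FROM City Ci, Country Co, River R
    WHERE R.Origin = Co.Name
      AND Co.Name = Ci.Country
      AND R.Name = 'St. Lawrence'
      AND Ci.Capital = 'Y'
""").fetchall()
```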
Query Examples … Aggregate Statistics
Query: What is the average population of the noncapital cities listed in the
City table?

SELECT AVG(Ci.Pop)
FROM City Ci
WHERE Ci.Capital = 'N'

Query: For each continent, find the average GDP.

SELECT Co.Cont, AVG(Co.GDP) AS Continent_GDP
FROM Country Co
GROUP BY Co.Cont
Query Example … HAVING clause, Nested queries
Query: For each country in which at least two rivers originate, find the length
of the shortest river.

SELECT R.Origin, MIN(R.Length) AS Min_length
FROM River R
GROUP BY R.Origin
HAVING COUNT(*) > 1

Query: List the countries whose GDP is greater than that of Canada.

SELECT Co.Name
FROM Country Co
WHERE Co.GDP > ANY (SELECT Co1.GDP
                    FROM Country Co1
                    WHERE Co1.Name = 'Canada')
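The HAVING and nested queries on this slide can be checked on a few placeholder rows. This sketch again uses sqlite3; the river names and all numbers are invented for illustration. SQLite does not implement the ANY quantifier, so the nested query is written with a scalar subquery, which gives the same result here because the inner query returns exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Name TEXT PRIMARY KEY, GDP REAL);
    CREATE TABLE River   (Name TEXT PRIMARY KEY, Origin TEXT, Length REAL);

    -- Placeholder rows; none of the numbers are real statistics.
    INSERT INTO Country VALUES ('Canada', 650.0), ('USA', 8000.0), ('Mexico', 400.0);
    INSERT INTO River VALUES ('River A', 'USA',    100.0),
                             ('River B', 'USA',     50.0),
                             ('River C', 'Canada',  70.0);
""")

# Only origins with at least two rivers survive the HAVING filter.
shortest = conn.execute("""
    SELECT R.Origin, MIN(R.Length) AS Min_length
    FROM River R
    GROUP BY R.Origin
    HAVING COUNT(*) > 1
""").fetchall()

# Nested query: the inner SELECT yields Canada's GDP as a single value.
richer = conn.execute("""
    SELECT Co.Name
    FROM Country Co
    WHERE Co.GDP > (SELECT Co1.GDP
                    FROM Country Co1
                    WHERE Co1.Name = 'Canada')
""").fetchall()
```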
3.4 Extending SQL for Spatial Data
Motivation
SQL has simple atomic data-types, like integer, dates and string
Not convenient for spatial data and queries
• Spatial data (e.g. polygons) is complex
• Spatial operations: topological, Euclidean, directional, metric
SQL 3 allows user defined data types and operations
Spatial data types and operations can be added to SQL3
Open Geodata Interoperability Specification (OGIS)
Half a dozen spatial data types
Several spatial operations
Supported by major vendors, e.g. ESRI, Intergraph, Oracle, IBM,...
OGIS Spatial Data Model
Consists of base-class Geometry and four sub-classes:
Point, Curve, Surface and GeometryCollection

Operations fall into three categories:
Apply to all geometry types
• SpatialReference, Envelope, Export, IsSimple, Boundary
Predicates for Topological relationships
• Equal, Disjoint, Intersect, Touch, Cross, Within, Contains
Spatial Data Analysis
• Distance, Buffer, Union, Intersection, ConvexHull, SymDiff
Spatial Queries with SQL/OGIS
• SQL/OGIS - General Information
• Both standards are being adopted by many vendors
• The choice of spatial data types and operations is similar
• Syntax differs from vendor to vendor
• Readers may need to alter SQL/OGIS queries given in text to make
them run on specific commercial products
• Using OGIS with SQL
• Spatial data types can be used in DDL to type columns
• Spatial operations can be used in DML
• Scope of discussion
• Illustrate use of spatial data types with SQL
• Via a set of examples
List of Spatial Query Examples
• Simple SQL SELECT_FROM_WHERE examples
• Spatial analysis operations
• Unary operator: Area (Q5, p. 68)
• Binary operator: Distance (Q3)
• Boolean topological spatial operations - WHERE clause
• Touch (Q1, p. 67)
• Cross (Q2, p. 68)
• Using spatial analysis and topological operations
• Buffer, overlap (Q4)
• Complex SQL examples
• Aggregate SQL queries
• Nested queries
Using spatial operation in SELECT clause
Query: List the name, population, and area of each country listed in
the Country table.

SELECT C.Name, C.Pop, Area(C.Shape) AS "Area"
FROM Country C

Note: This query uses the spatial operation Area(). Note the use of a spatial
operation in place of a column in the SELECT clause.
Using spatial operator Distance
Query: List the GDP and the distance of a country’s capital
city to the equator for all countries.

SELECT Co.GDP, Distance(Point(0, Ci.Shape.y), Ci.Shape) AS "Distance"
FROM Country Co, City Ci
WHERE Co.Name = Ci.Country
AND Ci.Capital = 'Y'
Using Spatial Operation in WHERE clause
Query: Find the names of all countries which are neighbors of the United
States (USA) in the Country table.

SELECT C1.Name AS "Neighbors of USA"
FROM Country C1, Country C2
WHERE Touch(C1.Shape, C2.Shape) = 1
AND C2.Name = 'USA'

Note: Spatial operator Touch() is used in WHERE clause to join Country table
with itself. This query is an example of spatial self join operation.
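A spatial self join of this kind can be imitated in plain SQLite by registering a user-defined function. The sketch below is a toy under loud assumptions: country shapes are axis-aligned rectangles stored as the text 'xmin ymin xmax ymax', the Python function `touch` is a hypothetical stand-in for the OGIS Touch predicate, and the three countries are invented. Real spatial extensions work on true polygons.

```python
import sqlite3

def touch(a, b):
    """Return 1 if two rectangles share boundary points but no interior area."""
    ax1, ay1, ax2, ay2 = map(float, a.split())
    bx1, by1, bx2, by2 = map(float, b.split())
    # Extent of the common region along each axis (negative means a gap).
    dx = min(ax2, bx2) - max(ax1, bx1)
    dy = min(ay2, by2) - max(ay1, by1)
    # They meet (no gap), yet the common region has zero width or zero height.
    return int(dx >= 0 and dy >= 0 and (dx == 0 or dy == 0))

conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the name Touch.
conn.create_function("Touch", 2, touch)

conn.executescript("""
    CREATE TABLE Country (Name TEXT PRIMARY KEY, Shape TEXT);
    -- Invented shapes: A and B share an edge, C is an island.
    INSERT INTO Country VALUES ('A', '0 0 2 2'),
                               ('B', '2 0 4 2'),
                               ('C', '5 5 6 6');
""")

neighbors_of_a = conn.execute("""
    SELECT C1.Name
    FROM Country C1, Country C2
    WHERE Touch(C1.Shape, C2.Shape) = 1
      AND C2.Name = 'A'
""").fetchall()
```

Note that a country does not touch itself under this definition, because its interior overlaps itself, so the self join needs no extra `C1.Name <> C2.Name` condition.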
Spatial Query with multiple tables
Query: For all the rivers listed in the River table, find the countries through
which they pass.

SELECT R.Name, C.Name
FROM River R, Country C
WHERE Cross(R.Shape, C.Shape) = 1

Note: Spatial operation “Cross” is used to join River and Country tables. This
query represents a spatial join operation.

Exercise: Modify the above query to report the length of the river in each country.
Hint: Q6, p. 69
Example Spatial Query…Buffer and Overlap

Query: The St. Lawrence River can supply water to cities that are
within 300 km. List the cities that can use water from the St.
Lawrence.

SELECT Ci.Name
FROM City Ci, River R
WHERE Overlap(Ci.Shape, Buffer(R.Shape, 300)) = 1
AND R.Name = 'St. Lawrence'

Note: This query uses the spatial operation Buffer, which is
illustrated in Figure 3.2 (p. 69).
Recall List of Spatial Query Examples
• Simple SQL SELECT_FROM_WHERE examples
• Spatial analysis operations
• Unary operator: Area
• Binary operator: Distance
• Boolean topological spatial operations - WHERE clause
• Touch
• Cross
• Using spatial analysis and topological operations
• Buffer, overlap
• Complex SQL examples
• Aggregate SQL queries (Q9, p. 70)
• Nested queries (Q3, p. 68; Q10, p. 70)
Using spatial operation in an aggregate query
Query: List all countries, ordered by number of neighboring countries.

SELECT Co.Name, COUNT(Co1.Name)
FROM Country Co, Country Co1
WHERE Touch(Co.Shape, Co1.Shape) = 1
GROUP BY Co.Name
ORDER BY COUNT(Co1.Name)

Notes: This query can be used to differentiate the querying capabilities of simple
GIS software (e.g. Arc/View) and a spatial database. It is quite tedious to carry
out this query in a GIS.

Earlier versions of OGIS did not provide spatial aggregate operations to support
GIS operations like reclassify.
Using Spatial Operation in Nested Queries
Query: For each river, identify the closest city.

SELECT C1.Name, R1.Name
FROM City C1, River R1
WHERE Distance(C1.Shape, R1.Shape) <= ALL (SELECT Distance(C2.Shape, R1.Shape)
                                           FROM City C2
                                           WHERE C1.Name <> C2.Name)

Note: The spatial operation Distance is used in the context of a nested query.

Exercise: It is interesting to note that the SQL expression to find the smallest distance
from each river to its nearest city is much simpler and does not require a nested query.
The audience is encouraged to write a SQL expression for this query.
Nested Spatial Query
Query: List the countries with only one neighboring country. A country is a
neighbor of another country if their land masses share a boundary. According to
this definition, island countries, like Iceland, have no neighbors.
SELECT Co.Name
FROM Country Co
WHERE Co.Name IN (SELECT Co.Name
                  FROM Country Co, Country Co1
                  WHERE Touch(Co.Shape, Co1.Shape) = 1
                  GROUP BY Co.Name
                  HAVING COUNT(*) = 1)

Note: This is a complex nested query with aggregate operations. Such queries can be written as
two expressions, namely a view definition and a query on the view. The inner query becomes a view
and the outer query is run on the view. This is illustrated in the next slide.
Rewriting nested queries using Views
• Views are like tables
• Represent derived data or result of a query
• Can be used to simplify complex nested queries
• Example follows:
CREATE VIEW Neighbor AS
SELECT Co.Name, COUNT(Co1.Name) AS num_neighbors
FROM Country Co, Country Co1
WHERE Touch(Co.Shape, Co1.Shape) = 1
GROUP BY Co.Name

SELECT Name, num_neighbors
FROM Neighbor
WHERE num_neighbors = (SELECT MAX(num_neighbors)
                       FROM Neighbor)
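The view-based rewrite can be tried without any spatial support by replacing the Touch() join with a precomputed adjacency table. In this sketch the table `Touches` and its rows are hypothetical: each row records one ordered pair of neighboring countries, which is what the spatial self join would have produced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the Touch() self join:
    -- one row per ordered pair of neighboring countries.
    CREATE TABLE Touches (Name TEXT, Other TEXT);
    INSERT INTO Touches VALUES ('A', 'B'), ('B', 'A'),
                               ('B', 'C'), ('C', 'B');

    -- The inner (aggregate) query becomes a view.
    CREATE VIEW Neighbor AS
        SELECT Name, COUNT(Other) AS num_neighbors
        FROM Touches
        GROUP BY Name;
""")

# Outer query on the view: the country with the most neighbors.
most = conn.execute("""
    SELECT Name, num_neighbors
    FROM Neighbor
    WHERE num_neighbors = (SELECT MAX(num_neighbors) FROM Neighbor)
""").fetchall()

# The same view also answers the "only one neighbor" question.
single = conn.execute("""
    SELECT Name FROM Neighbor
    WHERE num_neighbors = 1
    ORDER BY Name
""").fetchall()
```

Once the view exists, both questions become simple one-level queries, which is the point of the rewrite.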
Summary
Queries to databases are posed in high level declarative manner
SQL is the “lingua-franca” in the commercial database world
Standard SQL operates on relatively simple data types
SQL3/OGIS supports several spatial data types and operations
Additional spatial data types and operations can be defined
CREATE TYPE statement
Spatial Database Management Systems
Value of SDBMS
Traditional (non-spatial) database management systems provide:
Persistence across failures
Concurrent access to data
Scalability to search queries on very large datasets which do not fit
inside main memories of computers
Efficient for non-spatial queries, but not for spatial queries
Non-spatial queries:
List the names of all bookstores with more than ten thousand titles.
List the names of the top ten customers, in terms of sales, in the year 2001
Spatial Queries:
List the names of all bookstores within ten miles of Minneapolis
List all customers who live in Tennessee and its adjoining states
Value of SDBMS – Spatial Data Examples
Examples of non-spatial data
Names, phone numbers, email addresses of people
Examples of Spatial data
Census Data
NASA satellite imagery - terabytes of data per day
Weather and Climate Data
Rivers, Farms, ecological impact
Medical Imaging
Exercise: Identify spatial and non-spatial data items in
A phone book
A cookbook with recipes
Value of SDBMS – Users, Application Domains
Many important application domains have spatial data and
queries. Some Examples follow:
Army Field Commander: Has there been any significant
enemy troop movement since last night?
Insurance Risk Manager: Which homes are most likely to
be affected in the next great flood on the Mississippi?
Medical Doctor: Based on this patient's MRI, have we
treated somebody with a similar condition ?
Molecular Biologist:Is the topology of the amino acid
biosynthesis gene in the genome found in any other sequence
feature map in the database ?
Astronomer:Find all blue galaxies within 2 arcmin of quasars.

Exercise: List two ways you have used spatial data. Which
software did you use to manipulate spatial data?
What is a SDBMS ?
A SDBMS is a software module that
can work with an underlying DBMS
supports spatial data models, spatial abstract data types (ADTs)
and a query language from which these ADTs are callable
supports spatial indexing, efficient algorithms for processing
spatial operations, and domain specific rules for query
optimization
Example: Oracle Spatial data cartridge, ESRI SDE
can work with Oracle 8i DBMS
Has spatial data types (e.g. polygon), operations (e.g. overlap)
callable from SQL3 query language
Has spatial indices, e.g. R-trees
SDBMS Example
Consider a spatial dataset with:
County boundary (dashed white line)
Census block - name, area, population,
boundary (dark line)
Water bodies (dark polygons)
Satellite Imagery (gray scale pixels)

Storage in a SDBMS table:


create table census_blocks (
name string,
area float,
population number,
boundary polygon );

Fig 1.2
Modeling Spatial Data in Traditional DBMS

•A row in the table census_blocks (Figure 1.3)


• Question: Is Polyline datatype supported in DBMS?

Figure 1.3
Spatial Data Types and Traditional Databases
Traditional relational DBMS
Support simple data types, e.g. number, strings, date
Modeling Spatial data types is tedious
Example: Figure 1.4 shows modeling of polygon using numbers
Three new tables: polygon, edge, points
• Note: Polygon is a polyline where last point and first point are same
A simple unit square represented as 16 rows across 3 tables
Simple spatial operators, e.g. area(), require joining tables
Tedious and computationally inefficient

Question: Name post-relational database management systems
which facilitate modeling of spatial data types, e.g. polygon.
Mapping “census_table” into a Relational Database

Fig 1.4
Evolution of DBMS technology

Fig 1.5
Spatial Data Types and Post-relational Databases
Post-relational DBMS
Support user defined abstract data types
Spatial data types (e.g. polygon) can be added
Choice of post-relational DBMS
Object oriented (OO) DBMS
Object relational (OR) DBMS
A spatial database is a collection of spatial data types, operators,
indices, processing strategies, etc. and can work with many post-
relational DBMS as well as programming languages like Java, Visual
Basic etc.
How is a SDBMS different from a GIS ?
A GIS is software to visualize and analyze spatial data
using spatial analysis functions such as
Search: thematic search, search by region, (re-)classification
Location analysis: buffer, corridor, overlay
Terrain analysis: slope/aspect, catchment, drainage network
Flow analysis: connectivity, shortest path
Distribution: change detection, proximity, nearest neighbor
Spatial analysis/statistics: pattern, centrality, autocorrelation, indices of
similarity, topology: hole description
Measurements: distance, perimeter, shape, adjacency, direction
GIS uses SDBMS
to store, search, query, share large spatial data sets
How is a SDBMS different from a GIS ?
SDBMS focuses on
Efficient storage, querying, sharing of large spatial datasets
Provides simpler set based query operations
Example operations: search by region, overlay, nearest
neighbor, distance, adjacency, perimeter etc.
Uses spatial indices and query optimization to speedup queries
over large spatial datasets.
SDBMS may be used by applications other than GIS
Astronomy, Genomics, Multimedia information systems, ...
Would one use a GIS or a SDBMS to answer the following:
How many neighboring countries does USA have?
Which country has highest number of neighbors?
Evolution of acronym “GIS”
Geographic Information Systems (1980s)
Geographic Information Science (1990s)
Geographic Information Services (2000s)

Fig 1.1
Three meanings of the acronym GIS
Geographic Information Services
Web-sites and service centers for casual users, e.g. travelers
Example: Service (e.g. AAA, mapquest) for route planning
Geographic Information Systems
Software for professional users, e.g. cartographers
Example: ESRI Arc/View software
Geographic Information Science
Concepts, frameworks, theories to formalize use and
development of geographic information systems and services
Example: design spatial data types and operations for querying
Exercise: Which meaning of the term GIS is closest to the focus of
the book titled “Spatial Databases: A Tour”?
Components of a SDBMS
Recall: a SDBMS is a software module that
can work with an underlying DBMS
supports spatial data models, spatial ADTs and a query
language from which these ADTs are callable
supports spatial indexing, algorithms for processing spatial
operations, and domain specific rules for query optimization
Components include
spatial data model, query language, query processing, file
organization and indices, query optimization, etc.
Figure 1.6 shows these components
We discuss each component briefly in chapter 1.6 and in more
detail in later chapters.
Three Layer Architecture Fig 1.6
1.6.1 Spatial Taxonomy, Data Models
Spatial Taxonomy:
multitude of descriptions available to organize space.
Topology models homeomorphic relationships, e.g. overlap
Euclidean space models distance and direction in a plane
Graphs model connectivity, shortest path
Spatial data models
rules to identify objects and properties of space
Object model helps manage identifiable things, e.g. mountains,
cities, land-parcels etc.
Field model helps manage continuous and amorphous
phenomena, e.g. wetlands, satellite imagery, snowfall etc.
More details in chapter 2.
1.6.2 Spatial Query Language
• Spatial query language
• Spatial data types, e.g. point, linestring, polygon, …
• Spatial operations, e.g. overlap, distance, nearest neighbor, …
• Callable from a query language (e.g. SQL3) of underlying DBMS
SELECT S.name
FROM Senator S
WHERE S.district.Area() > 300

• Standards
• SQL3 (a.k.a. SQL 1999) is a standard for query languages
• OGIS is a standard for spatial data types and operators
• Both standards enjoy wide support in industry
• More details in chapters 2 and 3
Multi-scan Query Example
• Spatial join example
SELECT S.name FROM Senator S, Business B
WHERE S.district.Area() > 300 AND Within(B.location, S.district)
• Non-Spatial Join example
SELECT S.name FROM Senator S, Business B
WHERE S.soc-sec = B.soc-sec AND S.gender = 'Female'

Fig 1.7
1.6.3 Query Processing
• Efficient algorithms to answer spatial queries
• Common Strategy - filter and refine
• Filter Step: Query Region overlaps with MBRs of B, C and D
• Refine Step: Query Region overlaps with B and C

Fig 1.8
Query Processing of Join Queries
• Example - Determining pairs of intersecting rectangles
• (a): Two sets R and S of rectangles, (b): A rectangle with 2 opposite corners
marked, (c): Rectangles sorted by smallest X coordinate value
• Plane sweep filter identifies 5 pairs out of 12 for refinement step
• Details of plane sweep algorithm on page 15
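A plane sweep over two sets of rectangles can be sketched as follows. This is one common formulation and not necessarily the exact algorithm given on page 15; the rectangle format (xmin, ymin, xmax, ymax), the function name, and the sample data are all assumptions.

```python
def plane_sweep_pairs(R, S):
    """Return sorted index pairs (i, j) such that R[i] and S[j] intersect."""
    # One event per rectangle, ordered by smallest x coordinate.
    events = [(r[0], 0, i) for i, r in enumerate(R)]
    events += [(s[0], 1, j) for j, s in enumerate(S)]
    events.sort()

    rects = (R, S)
    active = ([], [])   # indices of rectangles the sweep line may still cut
    pairs = []
    for xmin, side, idx in events:
        cur = rects[side][idx]
        other = 1 - side
        # Drop rectangles of the other set that end left of the sweep line.
        active[other][:] = [k for k in active[other]
                            if rects[other][k][2] >= xmin]
        # Survivors overlap cur in x; test the y extents.
        for k in active[other]:
            o = rects[other][k]
            if cur[1] <= o[3] and o[1] <= cur[3]:
                pairs.append((idx, k) if side == 0 else (k, idx))
        active[side].append(idx)
    return sorted(pairs)

# Invented sample data.
R = [(0, 0, 2, 2), (5, 5, 6, 6)]
S = [(1, 1, 3, 3), (2.5, 0, 4, 1), (5.5, 5.5, 7, 7)]
pairs = plane_sweep_pairs(R, S)
```

Sorting by x lets the sweep skip most pairs that cannot intersect, instead of comparing every rectangle in R against every rectangle in S.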

Fig 1.9
1.6.4 File Organization and Indices
• A difference between GIS and SDBMS assumptions
• GIS algorithms: dataset is loaded in main memory (Fig. 1.10(a))
• SDBMS: dataset is on secondary storage, e.g. disk (Fig. 1.10(b))
• SDBMS uses space filling curves and spatial indices
• to efficiently search disk resident large spatial datasets

Fig 1.10
Organizing spatial data with space filling curves
• Issue:
• Sorting is not naturally defined on spatial data
• Many efficient search methods are based on sorting datasets
• Space filling curves
• Impose an ordering on the locations in a multi-dimensional space
• Examples: row-order (Fig. 1.11(a)), z-order (Fig. 1.11(b))
• Allow use of traditional efficient search methods on spatial data
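A z-order key can be computed by interleaving the bits of the two cell coordinates. The sketch below is illustrative: the function name, the 16-bit default, and the choice of putting x in the even bit positions are assumptions, and the convention in Fig. 1.11(b) may differ.

```python
def z_order(x, y, bits=16):
    """Interleave the bits of x and y (x in the even positions) into one key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # bit i of x -> position 2i
        key |= ((y >> i) & 1) << (2 * i + 1)    # bit i of y -> position 2i + 1
    return key

# Cells of a 4 x 4 grid listed in row order, then re-sorted by z-order key.
cells = [(x, y) for y in range(4) for x in range(4)]
by_z = sorted(cells, key=lambda c: z_order(*c))
```

Because the result is an ordinary integer, the keys can be stored in a one-dimensional index such as a B-tree.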

Fig 1.11
Spatial Indexing: Search Data-Structures
• Choice for spatial indexing:
• B-tree is a hierarchical collection of ranges of linear keys, e.g. numbers
• B-tree index is used for efficient search of traditional data
• B-tree can be used with space filling curve on spatial data
• R-tree provides even better search performance
• R-tree is a hierarchical collection of rectangles
• More details in chapter 4

Fig 1.12: B-tree    Fig 1.13: R-tree
1.6.5 Query Optimization
• Query Optimization
• A spatial operation can be processed using different strategies
• Computation cost of each strategy depends on many parameters
• Query optimization is the process of
• ordering operations in a query and
• selecting an efficient strategy for each operation
• based on the details of a given dataset
• Example Query:
SELECT S.name FROM Senator S, Business B
WHERE S.soc-sec = B.soc-sec AND S.gender = 'Female'
• Optimization decision examples
• Process (S.gender = 'Female') before (S.soc-sec = B.soc-sec)
• Do not use index for processing (S.gender = 'Female')
1.6.6 Data Mining
• Analysis of spatial data is of many types
• Deductive Querying, e.g. searching, sorting, overlays
• Inductive Mining, e.g. statistics, correlation, clustering, classification, …
• Data mining is a systematic and semi-automated search for
interesting non-trivial patterns in large spatial databases

• Example applications include
• Infer land-use classification from satellite imagery
• Identify cancer clusters and geographic factors with high correlation
• Identify crime hotspots to assign police patrols and social workers
1.7 Summary
SDBMS is valuable to many important applications
SDBMS is a software module
works with an underlying DBMS
provides spatial ADTs callable from a query language
provides methods for efficient processing of spatial queries
Components of SDBMS include
spatial data model, spatial data types and operators,
spatial query language, processing and optimization
spatial data mining
SDBMS is used to store, query and share spatial data
for GIS as well as other applications
Spatial Databases
Spatial database transformation
Güting's definition of a spatial
database
▪ (1) A spatial database system is a
database system
▪ (2) It offers spatial data types in its data
model and query language
▪ (3) It supports spatial data types in its
implementation, providing at least spatial
indexing and efficient algorithms for
spatial join.
Why use a database for GIS?
▪ GIS are not database systems, they can be
connected to a DBMS.
▪ A GIS cannot efficiently manage large quantities
of non-spatial data (e.g. at government
department level).
▪ They lack ad hoc querying capability (they
provide a restricted form of predefined queries)
▪ They lack indexing structures for fast external
data access (they use in memory techniques).
▪ They lack a 'logic' (e.g. first order logic of the
relational calculus)
Why use a database for GIS?
▪ Databases offer the following functions:
▪ Reliability
▪ Integrity: enforces consistency
▪ Security
▪ User views
▪ User interface
▪ Querying
▪ Updating
▪ Mathematical basis
▪ Data independence
▪ Data Abstraction
▪ Self-describing
▪ Concurrency
▪ Distributed capabilities
▪ High performance
▪ Supports spatial data types using ADTs.
▪ Alternative: files
Why use a database for GIS?
Data Abstraction- allows users to ignore
unimportant details
View Level – a way of presenting data to a
particular group of users
Logical Level – how data is interpreted when
writing queries
Physical Level – how data is manipulated at
storage level by a computer. Most users are
not interested in the physical level.
Databases use high level declarative languages (SQL)
▪ Data Definition Language (DDL)
▪ Create, alter and delete schema objects such as tables and indexes
▪ CREATE TABLE, CREATE INDEX
▪ Data Manipulation Language (DML)
▪ Retrieve and manipulate data
▪ SELECT, UPDATE, DELETE, INSERT
▪ Data Control Languages (DCL)
▪ Control security of data
▪ GRANT, CREATE USER, DROP USER
Spatial Types – OGC Simple Features for SQL
[Class diagram; the legend distinguishes "composed of" and "subtype" relationships]
Geometry (linked to SpatialReferenceSystem)
▪ Point
▪ Curve: LineString, with subtypes Line and LinearRing
▪ Surface: Polygon
▪ GeometryCollection: MultiSurface (subtype MultiPolygon), MultiCurve (subtype MultiLineString), MultiPoint
Spatial Types – OGC Simple Features for SQL (*)
[Figure: operations on OGC Simple Feature types]
OGC Simple Features for SQL (*)
▪ The OGC SF (roughly AKA ISO 19125-1)
describes 2-D geometry with linear
interpolation between vertices. The simple
feature model consists of a root class
Geometry and its subclasses Point, Curve,
Surface, GeometryCollection. The class
GeometryCollection has the subclasses
MultiPoint, MultiCurve, MultiSurface.
OGC Simple Features for SQL (*)
▪ The OGC SF does not include complexes, a third
dimension, non-linear curves, networking or
topology (i.e. connectivity information).
▪ Because of its relative simplicity and its support in
both the commercial & open source community
SFSQL is widely used in DBMS and is
supported in many Web applications.
▪ It is expected that newer more sophisticated
standards such as ISO-19107 will gradually
replace OGC SF.
OGC Simple Features for SQL (*)
▪ Brief description
▪ A simple feature is defined to have both spatial
and non-spatial attributes. Spatial attributes are
geometry valued, and simple features are based
on 2D geometry with linear interpolation
between vertices. Each feature is stored as a
row in a database table. This course covers the
OGC: GEOMETRY type with subtypes such as
POINT, LINE, POLYLINE, POLYGON, and
collections of these.
OGC Spatial Relations
▪ Equals – geometries are the same
▪ Disjoint – geometries share no common point
▪ Intersects – geometries share at least one point
▪ Touches – geometries meet only at their boundaries
▪ Crosses – geometries share some, but not all, interior points (e.g. a line crossing a polygon)
▪ Within – geometry lies completely within another
▪ Contains – geometry completely contains another
▪ Overlaps – geometries of same dimension overlap
▪ Relate – tests the intersections between the interior, boundary and
exterior of two geometries
Contains Relation
Does the base geometry (small circles) contain the comparison geometry
(big circles)?

For the base geometry to contain the comparison geometry it must be a
superset of that geometry.
Geographic Information Systems and Science, Longley, Goodchild, Maguire, Rhind

Touches Relation
Does the base geometry (small circles) touch the comparison geometry
(big circles) ?

Two geometries touch when their boundaries intersect. This raises deep
mathematical issues, e.g. what is the boundary of a point? What about a
tolerance of plus or minus a metre?
Spatial Methods

▪ Distance – shortest distance
▪ Buffer – geometric buffer
▪ ConvexHull – smallest convex polygon geometry
▪ Intersection – points common to two geometries
▪ Union – all points in geometries
▪ Difference – points in one geometry that are not in the
other
▪ SymDifference – points in either, but not both of
input geometries
Convex Hull
The convex hull of a set of points is the intersection of all convex sets which
contain the points. A set of points S is convex if and only if for every pair of
points p, q in S, the line segment pq is completely contained in S.

Left is a convex set and right a non-convex set.

Convex hulls constructed around objects.
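For a finite set of points in the plane, the convex hull can be computed with Andrew's monotone chain algorithm. This is offered as one standard technique, not as the method any particular spatial database uses; the function names and sample points are illustrative.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop while the last two points and p do not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# A square with one interior point and one point on an edge.
sample = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull(sample)
```

Points that lie inside the hull or on one of its edges, such as (1, 1) and (1, 0) here, are dropped from the result.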
Operations on themes (*)
▪ Theme projection: ‘selecting’ some attributes
from the countries theme. Get the Population
and Geometry of European countries
▪ Theme selection: Name and population of
European countries with a population of 50
million or more.
▪ Theme union : European countries with
population less than 10 million joined with those
over 10 million.
▪ Theme overlay: See example
▪ Theme merge : See example
Operations on themes (*)
▪ Theme overlay: Generates a new theme and
new geometry from the overlaid themes. We get
the geometric intersection of spatial objects with
the required themes. See European language
example.
▪ Theme merge : The merge operation performs
the geometric union of the spatial part of n
geographic objects that belong to the same
theme under a constraint condition supplied by
the user. See East/West Germany example.
Projection on Theme (*)

Find the countries of western Europe with population greater than 50
million. This is a projection on the attribute population. Unlike a
conventional database query we often want the query result and the
original context, in this case Europe.
Theme Merge (*)

Merging two geographic objects in a selected theme (say country)
into a single object.
Theme Overlay (*)

Latin languages
Anglo-Saxon

The lower map represents the overlay of European
countries and languages.
Indexing
▪ Indexing is used to speed up queries and locate
rows quickly
▪ Traditional RDBMS use 1-d indexing (B-tree)
▪ Spatial DBMS need 2-d, hierarchical indexing
▪ Grid
▪ Quadtree
▪ R-tree
▪ Others
▪ Multi-level queries often used for performance
(MBR)
R-tree

Examples of R-tree index of polygons
Minimum Bounding Rectangles

[Figure: a minimum bounding rectangle and a study area]
What can PostGIS do?
Many PostGIS functions available via SQL
Compliant with OGC Simple Features Specification
▪ Coordinate transformation
▪ Identify
▪ Buffer
▪ Touches
▪ Crosses
▪ Within
▪ Overlaps
▪ Contains
▪ Area
▪ Length
▪ Point on surface
▪ Return geometry as SVG
What can PostGIS do?
PostGIS supports a geometry type which is
compliant with the OGC standard for Simple
Features.
▪ POINT( 50 100 )
▪ LINESTRING ( 10 10, 20 20 )
▪ POLYGON ( ( 0 0, 5 5, 5 0, 0 0 ) )
▪ MULTIPOINT ( ( 1 1 ), ( 0 0 ) )
▪ MULTILINESTRING ( … )
▪ MULTIPOLYGON ( … )
Spatial Database Features

Informix
PostGIS
MySQL
Oracle
DB2
Spatial Objects ✓ ✓ ✓ ✓ ✓
R-Tree Index ✓ ✓ ✓ ✓
Spatial Functions ✓ ✓ ✓ ✓
OpenGIS ✓ ✓ ✓ ✓

Coord Transform ✓ ✓ ✓ ✓
Spatial Aggregates ✓ ✓ ✓ ✓
How Spatial Databases Fit into GIS
[Diagram labels: LAN, Internet, Database, GIS, Editing, Loading, Analysis, Features, Mapping, Web Client, Other]

Image from Paul Ramsey, Refractions Research


PostgreSQL
▪ PostgreSQL itself provides the main
features of an RDBMS. It also includes other
advanced features such as:
▪ Inheritance
▪ Functions
▪ Constraints
▪ Triggers
▪ Rules
▪ Transactional integrity
▪ Permits an ‘OO like’ style of programming
PostgreSQL/PostGIS
The data is stored in a relatively simple format, with
the attributes and geometry stored in a single
table. The first five columns hold attribute data. The the_geom column
holds the spatial reference number (SRID), the data type (POINT) and
the coordinates.

 name                 | city        | hrs   | status | st_fed | the_geom
----------------------+-------------+-------+--------+--------+---------------------------------------------------
 Brio Refining        | Friendswood | 50.38 | active | Fed    | SRID=32140;POINT(968024.87474318 4198600.9516049)
 Crystal Chemical     | Houston     | 60.9  | active | Fed    | SRID=32140;POINT(932279.183664999 4213955.37498466)
 North Cavalcade      | Houston     | 37.08 | active | Fed    | SRID=32140;POINT(952855.717021537 4223859.84524946)
 Dixie Oil Processors | Friendswood | 34.21 | active | Fed    | SRID=32140;POINT(967568.655313907 4198112.19404211)
 Federated Metals     | Houston     | 21.28 | active | State  | SRID=32140;POINT(961131.619598681 4220206.32109146)
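To show how the three parts of a geometry value fit together, here is a minimal sketch that splits one of the point values above into its spatial reference number, data type and coordinates. The regular expression and the function name parse_ewkt_point are illustrative assumptions; it handles only 2-d points and is not how PostGIS itself parses geometry.

```python
import re

# Matches text such as "SRID=32140;POINT(968024.87 4198600.95)".
EWKT_POINT = re.compile(
    r"^SRID=(\d+);\s*([A-Z]+)\s*\(\s*([-\d.]+)\s+([-\d.]+)\s*\)$"
)


def parse_ewkt_point(text):
    """Split an SRID-prefixed 2-d point into SRID, type and coordinates."""
    match = EWKT_POINT.match(text.strip())
    if match is None or match.group(2) != "POINT":
        raise ValueError(f"not a 2-d point with an SRID prefix: {text!r}")
    return {
        "srid": int(match.group(1)),
        "type": match.group(2),
        "coords": (float(match.group(3)), float(match.group(4))),
    }


# Value copied from the Brio Refining row.
parsed = parse_ewkt_point("SRID=32140;POINT(968024.87474318 4198600.9516049)")
print(parsed)
```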
How does it work?
▪ Spatial data is stored using the coordinate
system of a particular projection.
▪ That projection is referenced with a Spatial
Reference System Identifier (SRID)
▪ This number relates to another table
(spatial_ref_sys) which holds all of the
spatial reference systems available.
▪ This allows the database to know what
projection each table is in and, if need be, to
re-project the data in those tables for calculations or
for joining with other tables.
Coordinate Projection
SRID=3005;MULTILINESTRING((1004687.04355194 594291.053764096,
1004729.74799931 594258.821943696))

SRID=4326;MULTILINESTRING((-125.9341 50.3640700000001,
-125.9335 50.36378))

Coordinates of one table can be converted to those of another table. This
permits the ‘geometry’ in each table to match. Relatively easy to do in PostGIS.
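As a rough sanity check that the two geometries above describe the same line, the sketch below compares the planar length of the SRID=3005 version (metres) with an approximate great-circle length of the SRID=4326 version (longitude and latitude). It assumes each vertex is an x y pair and that the longitudes are negative (west). The haversine formula on a sphere is only an approximation and is not how PostGIS re-projects data.

```python
import math

# Vertices as read from the slide (assumed to be "x y" pairs).
line_3005 = [(1004687.04355194, 594291.053764096),
             (1004729.74799931, 594258.821943696)]
line_4326 = [(-125.9341, 50.3640700000001),
             (-125.9335, 50.36378)]


def planar_length(line):
    """Length in coordinate units (metres for a projected system)."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(line, line[1:]))


def haversine_length(line, radius_m=6371008.8):
    """Approximate length in metres of a lon/lat line on a sphere."""
    total = 0.0
    for (lon1, lat1), (lon2, lat2) in zip(line, line[1:]):
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        d_phi = phi2 - phi1
        d_lam = math.radians(lon2 - lon1)
        a = (math.sin(d_phi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2)
        total += 2 * radius_m * math.asin(math.sqrt(a))
    return total


projected = planar_length(line_3005)
geographic = haversine_length(line_4326)
print(round(projected, 1), round(geographic, 1))
```

Both lengths come out at a little over 50 metres and agree to well within a metre, which is what one would expect if the second geometry is a re-projection of the first.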
Spatial Database Components
▪ The Geometry metadata table

 schema | table name      | geometry column | coord dim | srid  | type
--------+-----------------+-----------------+-----------+-------+-----------------
 brazos | texas_counties  | the_geom        | 2         | 32139 | MULTIPOLYGON
 brazos | texas_rivers    | the_geom        | 2         | 32139 | MULTILINESTRING
 brazos | texas_roads     | the_geom        | 2         | 32139 | MULTILINESTRING
 brazos | tx_maj_aquifers | the_geom        | 2         | 32139 | MULTIPOLYGON
 brazos | tx_min_aquifers | the_geom        | 2         | 32139 | MULTIPOLYGON
 brazos | txzip_codes     | the_geom        | 2         | 32139 | MULTIPOLYGON
 brazos | bz_landmarks    | the_geom        | 2         | 32139 | POINT


spatial_ref_sys
▪ postgis=# \d spatial_ref_sys
Table "public.spatial_ref_sys"
Column | Type | Modifiers
-----------+-------------------------+-----------
srid | integer | not null
auth_name | character varying(256) |
auth_srid | integer |
srtext | character varying(2048) |
proj4text | character varying(2048) |

Indexes:
"spatial_ref_sys_pkey" PRIMARY KEY, btree (srid)
geometry_columns
▪ postgis=# \d geometry_columns

Table "public.geometry_columns"
Column | Type | Modifiers
-------------------+------------------------+-----------
f_table_catalog | character varying(256) | not null
f_table_schema | character varying(256) | not null
f_table_name | character varying(256) | not null
f_geometry_column | character varying(256) | not null
coord_dimension | integer | not null
srid | integer | not null
type | character varying(30) | not null
Indexes:
"geometry_columns_pk" PRIMARY KEY, btree
(f_table_catalog, f_table_schema,
f_table_name, f_geometry_column)
Database Rules

▪ Rules help prevent human error when
modifying a data set
▪ Rules are user defined
▪ Examples of rules:
▪ “A fire hydrant must be located on a water
line”
▪ “Rivers should flow downhill”
Constraints
▪ Constraints are similar to rules, but are
less assertive.
▪ Constraints are provided by the DBMS
and are applied by the user
▪ An example constraint is “Parcel_ID Not Null”,
meaning an ID number has to be provided
when a parcel is created.
Constraints
 Constraint            | GIS example
-----------------------+------------------------------------------------------------------
 Uniqueness            | Two spatial objects cannot exist at the same point
 Non-Null              | All address points must have co-ordinates
 Range                 | All heights in Ireland must be in the range -100 to 2000 metres
 Relationship          | Every river must be connected to the sea, a lake or another river (can rivers cross?)
 Cardinality           | Each side of a triangle has a 1:2 relation with the others
 Inclusion             | All counties are polygons
 Covering              | A boundary may be a townland and/or a barony
 Disjointedness        | Every road must be exactly one of primary, secondary or regional
 Referential Integrity | A county border must be represented by a ground feature
 Geometrical           | Triangles must have three sides
 Orientation           | Roads are usually to the front of houses
 Topological           | Inner walls must be "inside" buildings
 General               | Complex rules built from the above constraints
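The simpler constraint kinds above (Uniqueness, Non-Null, Range) map directly onto declarations that most relational DBMSs provide. The sketch below uses SQLite through Python's sqlite3 module as a stand-in so that it runs without a server. The table address_point and its columns are hypothetical, and the spatial kinds (Topological, Orientation and so on) need spatial functions that this example does not attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE address_point (
        id       INTEGER PRIMARY KEY,
        x        REAL NOT NULL,                                -- Non-Null
        y        REAL NOT NULL,                                -- Non-Null
        height_m REAL CHECK (height_m BETWEEN -100 AND 2000),  -- Range
        UNIQUE (x, y)                                          -- Uniqueness
    )
""")


def try_insert(row):
    """Insert (x, y, height_m); report whether the DBMS accepted the row."""
    try:
        conn.execute(
            "INSERT INTO address_point (x, y, height_m) VALUES (?, ?, ?)", row
        )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert((1.0, 2.0, 150.0)),   # valid row
    try_insert((1.0, 2.0, 10.0)),    # same point twice: violates Uniqueness
    try_insert((3.0, None, 10.0)),   # missing co-ordinate: violates Non-Null
    try_insert((5.0, 6.0, 9000.0)),  # height out of range: violates Range
]
print(results)
```

The constraints are declared once by the user and then enforced by the DBMS on every insert, which is the division of labour the Constraints slide describes.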
Constraints
How can we define "in front of"?
Data integrity

Valid Invalid

select count(*) from bc_voting_areas
where not isvalid(the_geom);
Dynamic and Static Data
▪ Static non-spatial data is usually maintained in
the table with the geometry (e.g. county name).
In this case the geometry is considered
immutable.
▪ Dynamic non-spatial data is usually maintained
in a separate table.
▪ There can be more than one dynamic table for a
geometry table.
▪ Dynamic spatial data can include moving objects or a
changing world (temporal data requires different
treatment)
Joins(*)
▪ Dynamic tables can be joined with the
geometry tables for querying purposes
▪ A primary key is used to relate the 2 tables
together
▪ A primary key is a unique identifier for
each row in a table
Primary Key

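Dynamic table:

 ID | rainfall 10/7/2003 | temp Hi 10/7/2003 | temp Lo 10/7/2003
----+--------------------+-------------------+-------------------
 1  | 0.5                | 78                | 68
 2  | 0                  | 80                | 65
 3  | 1.02               | 76                | 66
 4  | 0.23               | 81                | 68
 5  | 0.18               | 80                | 67

Geometry table:

 ID | the_geom
----+-----------------------------------------------------
 1  | SRID=32140;POINT(968024.87474318 4198600.9516049)
 2  | SRID=32140;POINT(932279.183664999 4213955.37498466)
 3  | SRID=32140;POINT(952855.717021537 4223859.84524946)
 4  | SRID=32140;POINT(967568.655313907 4198112.19404211)
 5  | SRID=32140;POINT(961131.619598681 4220206.32109146)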
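The join between a dynamic table and a geometry table can be sketched with the values shown above. SQLite (through Python's sqlite3) stands in for PostgreSQL here, the geometry is kept as plain text, and the table and column names (weather, site, rainfall) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site (id INTEGER PRIMARY KEY, the_geom TEXT)")
conn.execute("CREATE TABLE weather (id INTEGER PRIMARY KEY, rainfall REAL, "
             "temp_hi INTEGER, temp_lo INTEGER)")

# Geometry table (static): one row per location.
conn.executemany("INSERT INTO site VALUES (?, ?)", [
    (1, "SRID=32140;POINT(968024.87474318 4198600.9516049)"),
    (2, "SRID=32140;POINT(932279.183664999 4213955.37498466)"),
    (3, "SRID=32140;POINT(952855.717021537 4223859.84524946)"),
    (4, "SRID=32140;POINT(967568.655313907 4198112.19404211)"),
    (5, "SRID=32140;POINT(961131.619598681 4220206.32109146)"),
])

# Dynamic table: readings for 10/7/2003, keyed by the same ID.
conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?)", [
    (1, 0.5, 78, 68), (2, 0.0, 80, 65), (3, 1.02, 76, 66),
    (4, 0.23, 81, 68), (5, 0.18, 80, 67),
])

# Relate the two tables through the shared key, then filter on the dynamic data.
rows = conn.execute("""
    SELECT s.id, w.rainfall, s.the_geom
    FROM site s
    JOIN weather w ON w.id = s.id
    WHERE w.rainfall > 0.2
    ORDER BY s.id
""").fetchall()

for row in rows:
    print(row)
```

Keeping the readings in their own table means a new day of data can be added without touching the geometry rows.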
Spatial Join
A typical example of a spatial join is “Find all pairs
of rivers and cities that intersect”. The result of the
join between the set of rivers {R1, R2} and the set of cities
{C1, C2, C3, C4, C5} is {(R1, C1), (R2, C5)}.
Temporal Queries
▪ Find where (x, y) and when (t) it will snow:
Clouds(X, Y, T, humidity)
Region(X, Y, T, temperature)

(SELECT x, y, t
FROM Clouds
WHERE humidity >= 80)
INTERSECT
(SELECT x, y, t
FROM Region
WHERE temperature <= 32)
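The query above can be tried out on a few made-up rows. The sketch uses SQLite through Python's sqlite3 as a stand-in DBMS, and every data value is hypothetical. SQLite does not accept parentheses around the operands of INTERSECT, so they are dropped here; the meaning is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clouds (x INTEGER, y INTEGER, t INTEGER, humidity REAL)")
conn.execute("CREATE TABLE Region (x INTEGER, y INTEGER, t INTEGER, temperature REAL)")

# Hypothetical observations: (x, y, t, value).
conn.executemany("INSERT INTO Clouds VALUES (?, ?, ?, ?)", [
    (1, 1, 10, 85.0),   # humid enough
    (2, 2, 10, 90.0),   # humid enough
    (3, 3, 11, 60.0),   # too dry
])
conn.executemany("INSERT INTO Region VALUES (?, ?, ?, ?)", [
    (1, 1, 10, 30.0),   # cold enough
    (2, 2, 10, 40.0),   # too warm
    (3, 3, 11, 20.0),   # cold enough
])

# Snow is predicted where and when both conditions hold.
snow = conn.execute("""
    SELECT x, y, t FROM Clouds WHERE humidity >= 80
    INTERSECT
    SELECT x, y, t FROM Region WHERE temperature <= 32
""").fetchall()
print(snow)
```

Only the location and time (1, 1, 10) satisfies both the humidity and the temperature condition.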
Temporal Example: roads, buildings, and regions

Consider a line. From the properties of metric spaces it has a length.

Let's call it a road. From graph theory we have a path.

Let's add a field (F1) with an area and a topology. The purple line segment
represents both a road and a fence.

Let's add an administrative region (outer red rectangle) and some houses.

Let's divide the field in two (F2 and F3) by inserting a new fence.
We need to delete the old area and add two new areas.
What about the adjacency relation between fields?

[Figure: the world at Time1 (points A and B, field F1) and at Time2 (points A and B, fields F2 and F3)]

Imagine a picture of the world at Time1 and Time2.
Not only have some objects changed but some spatial
relationships have changed.
An addition can induce a deletion and a deletion can induce
an insertion.
Example of temporal queries
• Is there a route from A to B? (now is
assumed)
• Was there a route from A to B in Time1?
• Does the route in query 1 pass through
the administrative region?
• Does the route in query 1 touch the
administrative region?
• What fields were adjacent to F2 in
Time2?
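The first two queries can be sketched by giving every road segment a validity interval and searching only the segments that exist at the time of interest. The network below is entirely hypothetical (the slides do not say which routes exist at Time1 or Time2), and the names EDGES and route_exists are assumptions for illustration.

```python
from collections import deque

# Hypothetical road segments: (from, to, valid_from, valid_to).
# valid_to is exclusive; None means the segment still exists.
EDGES = [
    ("A", "M", 1, None),  # exists from Time1 onwards
    ("M", "B", 1, 2),     # exists at Time1 only, removed at Time2
]


def route_exists(edges, start, goal, t):
    """Breadth-first search over the segments that are valid at time t."""
    neighbours = {}
    for u, v, valid_from, valid_to in edges:
        if valid_from <= t and (valid_to is None or t < valid_to):
            neighbours.setdefault(u, []).append(v)
            neighbours.setdefault(v, []).append(u)  # roads are two-way here
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(route_exists(EDGES, "A", "B", t=1))  # was there a route at Time1?
print(route_exists(EDGES, "A", "B", t=2))  # is there a route at Time2?
```

With this made-up data the route exists at Time1 but not at Time2, because the M to B segment is no longer valid. The adjacency query for fields could be handled the same way by time-stamping adjacency pairs instead of road segments.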
Raster Image Data Not Covered
References
▪ http://www.spatial.cs.umn.edu/Book/
▪ Rigaux, Scholl, Voisard: Spatial Databases: With Application to GIS
▪ Geography Mark-Up Language: Foundation for the Geo-Web
▪ Michael Worboys, Matt Duckham: GIS: A Computing Perspective
▪ http://www.pragprog.com/titles/sdgis/gis-for-web-developers
▪ Bivand, Pebesma, Gómez-Rubio: Applied Spatial Data Analysis with R
▪ Lloyd: Spatial Data Analysis
▪ Marinos Kavouras, Margarita Kokla: Theories of Geographic Concepts:
Ontological Approaches to Semantic Integration
