
Lecture 5 & 6

Space is defined as a relation on a set of objects.

Metric space refers to a set of locations with coordinates (Xi, Yi).

The coordinates (Xi, Yi) define the position of geographic objects.

A metric space is a set on which a notion of distance (called a metric) between elements of the set is defined. The metric space that most closely corresponds to our intuitive understanding of space is 3-dimensional Euclidean space.

Euclidean space is governed by the relationships among distances and angles, first in a plane and then in space; the study of these relationships is known as two- and three-dimensional Euclidean geometry.

An n-dimensional space with notions of distance and angle that obey the Euclidean relationships is called an n-dimensional Euclidean space.
2D and 3D Cartesian coordinate systems provide the mechanism for describing the geographic location and shape of features using x and y values (or, in rasters, columns and rows).

2D coordinate systems
Two axes:
•One horizontal (x), representing east-west,
•One vertical (y), representing north-south.

Origin: The point at which the axes intersect (0,0).

Locations of geographic objects are defined relative to the origin, using the notation (x,y), where x refers to the distance along the horizontal axis, and y refers to the distance along the vertical axis. (4, 3) records a point that is 4 units over in x and 3 units up in y from the origin.
3D coordinate systems
Apart from (x,y), a Z value can also be used to measure elevation above or below mean sea level, i.e. (x,y,z).

(2,3,4) indicates a point 2 units in x and 3 units in y from (0,0), whose elevation is 4 units above the vertical reference (such as 4 meters above mean sea level).
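Euclidean space is more than just a real coordinate space. The distance between points P and Q, and the angle between vectors u and v, can be measured as:

$$d(P,Q) = \sqrt{\sum_{i=1}^{n} (P_i - Q_i)^2}, \qquad \theta = \arccos\frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$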

This distance function is called the Euclidean metric. It can be viewed as a form of the Pythagorean theorem.

Real coordinate space together with this Euclidean structure is called Euclidean space and is often denoted $E^n$.

Because Euclidean space is a metric space, it is also a topological space, with the natural topology induced by the metric.
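The Euclidean distance between two points in n dimensions, P(P1, P2, …, Pn) and Q(Q1, Q2, …, Qn), is:

$$d(P,Q) = \sqrt{(P_1 - Q_1)^2 + (P_2 - Q_2)^2 + \cdots + (P_n - Q_n)^2} = \sqrt{\sum_{i=1}^{n} (P_i - Q_i)^2}$$

One-dimensional distance
For two 1D points, P(Px) and Q(Qx), the distance is computed as:

$$d(P,Q) = \sqrt{(P_x - Q_x)^2} = |P_x - Q_x|$$

Two-dimensional distance
For two 2D points P(Px, Py) and Q(Qx, Qy), the distance is computed as:

$$d(P,Q) = \sqrt{(P_x - Q_x)^2 + (P_y - Q_y)^2}$$

Three-dimensional distance
For two 3D points P(Px, Py, Pz) and Q(Qx, Qy, Qz), the distance is computed as:

$$d(P,Q) = \sqrt{(P_x - Q_x)^2 + (P_y - Q_y)^2 + (P_z - Q_z)^2}$$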
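The 1D, 2D and 3D distance computations described above are all special cases of one n-dimensional formula, so a single function covers them. The following is a minimal Python sketch; the function names are illustrative, and the angle helper uses the standard dot-product relation cos θ = u·v / (|u||v|). In Python 3.8+ the standard library's `math.dist` computes the same distance.

```python
import math
from typing import Sequence


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean metric: d(P, Q) = sqrt(sum over i of (P_i - Q_i)^2), for any dimension n."""
    if len(p) != len(q):
        raise ValueError("P and Q must have the same number of coordinates")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def angle_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between vectors u and v, from cos(theta) = u.v / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero-length vector")
    # Clamp to [-1, 1] to guard against tiny floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


print(euclidean_distance([2.0], [7.0]))                      # 1D: |2 - 7| = 5.0
print(euclidean_distance([0.0, 0.0], [4.0, 3.0]))            # 2D: 3-4-5 triangle, 5.0
print(euclidean_distance([0.0, 0.0, 0.0], [2.0, 3.0, 4.0]))  # 3D: sqrt(29), about 5.385
print(math.degrees(angle_between([1.0, 0.0], [0.0, 1.0])))   # perpendicular axes, 90.0
```

The 3D call measures the point (2,3,4) from the origin; the 2D call is the Pythagorean theorem with legs 4 and 3.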
Geographically referenced data refers to data referenced by location
on Earth (e.g., latitude/longitude, northing/easting) in some standard
format.

Geographic information contains:
•Either an explicit geographic reference (latitude and longitude or national grid coordinate),
•Or an implicit reference (an address, postal code, census tract name, forest stand identifier, or road name).
Geographic Data
(Geographically referenced data, identified according to location) is divided into:
•Spatial data (the MAP), represented as Raster or Vector;
•Non-spatial data (the DATABASE).
Introduction to Spatial Data

•Spatial data represents spatial information in 2, 3 or 4 dimensions, etc.

•Geographic information is a subset of spatial information.

•The data that indicates the Earth location (latitude and longitude, or height and depth) of objects rendered on a map is the spatial data.

•When the map is rendered, this spatial data is used to project the locations of the objects onto a two-dimensional piece of paper.

•A GIS is often used to store, retrieve, and render this Earth-relative spatial data.
All spatial features are recorded as geographic primitives, each with several primary characteristics:

Points (0-D, no length or width) are represented as a single “dot” on the map.
•Points are used to indicate discrete locations.
•They have no length or area at the given scale, only position in space.
•They usually have a single X, Y coordinate.
•They are used to represent a feature that is too small to be displayed as a line or area.

Lines/arcs (1-D, length, no width) are ordered sets of points that represent a straight line or a curved arc, depending upon the feature they describe. Besides having a position in space, they also have a length.

•They are accompanied by a set of coordinates.

•They are used to represent a geographical feature that is too narrow to have area, such as a stream or a road.
Polygons/areas (2-D, length and width / area and perimeter) are closed features whose boundary encloses a homogeneous area. They have not only a position in space and a length but also a width.

•They have an area that is given by the arcs/lines that make up the boundary.

•They are used to represent features that have area (e.g. lakes, large cities and islands).

Surfaces (3-D areas with a Z dimension) represent continuous values. They represent spatial objects with not only a position in space, a length and a width, but also a depth or height (in other words, they have a volume).
There are 2 basic spatial data types representing the real world: Raster and Vector.

Raster Data
•Points: single cells, unique/known values;
•Lines: strings of cells with common values;
•Polygons/areas: groups of cells with common values;
•Surfaces: cells represent real or virtual elevations.

Vector Data
•Points
•Lines
•Polygons
•TINs
Raster: matrix of cells (pixels) referenced by row/column,
stored as a matrix or array;
•For geo-referenced rasters, every cell represents a given area on
the ground (resolution). The smaller the area the cells represent, the
larger the data set size for a given area.

•Raster cell values represent nominal, ordinal, or continuous data. Numbers in cells can be integer or floating point.

•In a raster, the attributes (cell values) are the data set itself.
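The point that smaller cells mean a larger data set can be made concrete with a little arithmetic. This sketch assumes square cells, a rectangular extent, and one byte per cell (an 8-bit raster with no compression); the 10 km extent and the 30 m and 10 m cell sizes are hypothetical.

```python
import math


def raster_cell_count(width_m: float, height_m: float, cell_size_m: float) -> int:
    """Number of square cells needed to cover a width x height extent (partial cells round up)."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * cols


for cell in (30, 10):
    n = raster_cell_count(10_000, 10_000, cell)
    # Assumes 1 byte per cell (8-bit values) and no compression.
    print(f"{cell} m cells: {n:,} cells, about {n / 1e6:.2f} MB uncompressed")
```

Under these assumptions the 30 m grid needs 111,556 cells and the 10 m grid 1,000,000 cells: cutting the cell size by a factor of 3 multiplies the cell count by roughly 9, because the count grows with the square of the resolution.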


In the ArcGIS grid data model, data tables can store additional information about nominal/categorical data in the Value Attribute Table (VAT). VATs store information about the categories, not about individual cells:

Value  Count  Name                       Suitability  Type
2      30672  Cropland & Pasture         4            Agriculture
3      3339   Urban & Industrial         5            Urban
10     212    Clearings & bush fields    5            Cleared
21     1383   Cottonwood                 4            Riparian
463    142    Ash Cottonwood             3            Woodland
476    7205   Oak                        3            Forest
585    1112   Mixed evergreen broadleaf  2            Forest
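To illustrate that a VAT summarizes categories rather than individual cells, the following Python sketch counts the cells of each value in a small raster and attaches the category descriptions. The 4 × 5 grid is hypothetical, the descriptions reuse four rows of the table above, and `build_vat` is an illustrative name, not an ArcGIS function.

```python
from collections import Counter

# Hypothetical 4 x 5 land-cover raster; each cell holds a category code.
raster = [
    [2, 2, 3, 3, 476],
    [2, 2, 3, 476, 476],
    [21, 2, 2, 476, 476],
    [21, 21, 2, 2, 476],
]

# Category descriptions keyed by cell value: (name, suitability, type).
categories = {
    2: ("Cropland & Pasture", 4, "Agriculture"),
    3: ("Urban & Industrial", 5, "Urban"),
    21: ("Cottonwood", 4, "Riparian"),
    476: ("Oak", 3, "Forest"),
}


def build_vat(grid, lookup):
    """Return one row per distinct cell value: (value, count, name, suitability, type)."""
    counts = Counter(value for row in grid for value in row)
    return [(value, counts[value], *lookup[value]) for value in sorted(counts)]


for vat_row in build_vat(raster, categories):
    print(vat_row)
```

The 20 cells collapse into 4 VAT rows, one per category, which is why a VAT stays small even when the raster is large.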
Raster data are good at:
•Representing continuous data (e.g., slope, elevation, chemical concentrations).

•Representing multiple feature types (e.g., points, lines, and polygons) as a single feature type (cells).

•Rapid computations ("map algebra"), in which raster layers are treated as elements in mathematical expressions, so that analysis of multi-layer or multivariate data (e.g., satellite image processing and analysis) is possible.

•Hogging disk space.
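Map algebra treats co-registered raster layers as operands and applies an expression cell by cell. The sketch below shows a local operation in plain Python; the two 2 × 3 layers, their meanings and the thresholds are hypothetical, and real GIS software performs the same kind of cell-wise operation on much larger arrays.

```python
# Two hypothetical co-registered rasters (same rows, columns and cell size).
slope_class = [
    [1, 2, 3],
    [2, 3, 3],
]
landcover_suitability = [
    [4, 4, 2],
    [5, 3, 1],
]


def local_op(op, *layers):
    """Apply op cell by cell across layers of identical shape (a local map-algebra operation)."""
    return [
        [op(*cells) for cells in zip(*rows)]  # cells: one value from each layer
        for rows in zip(*layers)              # rows: the matching row from each layer
    ]


# Arithmetic overlay: add the two layers.
total = local_op(lambda s, l: s + l, slope_class, landcover_suitability)
# Boolean overlay: 1 where slope class <= 2 AND suitability >= 4, else 0.
mask = local_op(lambda s, l: int(s <= 2 and l >= 4), slope_class, landcover_suitability)

print(total)  # [[5, 6, 5], [7, 6, 4]]
print(mask)   # [[1, 1, 0], [1, 0, 0]]
```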


Vector is a data structure used to store spatial data as discrete Cartesian x,y coordinates.

•Sizes of lines or areas vary, as they trace surface phenomena.

•Data are stored as pairs of x,y coordinates, usually with ID numbers; data are typically stored in separate data tables.

•In ArcGIS, except in polygon coverages, the data tables contain exactly as many records as there are unique features in the data set.
•Points: id (x, y);
•Lines: id (x1,y1, ... xn, yn);
•Polygons: id (x1,y1 ... xn, yn), where xn=x1, yn=y1 (closed);
•Surfaces: represented by Triangulated Irregular Networks (TINs).
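The id + coordinate-list layouts listed above map directly onto simple Python structures. In this sketch the feature IDs and coordinates are hypothetical; line length sums Euclidean segment lengths, and polygon area uses the shoelace formula, a standard planar technique that the lecture does not itself name. `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical features stored as (id, [(x, y), ...]).
point = (1, [(4.0, 3.0)])
line = (2, [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)])
polygon = (3, [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)])


def is_closed(coords):
    """A polygon ring is closed when its last vertex repeats the first (xn = x1, yn = y1)."""
    return len(coords) >= 4 and coords[0] == coords[-1]


def line_length(coords):
    """Sum of Euclidean lengths of the segments between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(coords):
    """Planar area of a closed ring by the shoelace formula."""
    if not is_closed(coords):
        raise ValueError("polygon ring must be closed")
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(coords, coords[1:]))
    return abs(twice_area) / 2.0


print(line_length(line[1]))        # 5 + 6 = 11.0
print(polygon_area(polygon[1]))    # 4 x 3 rectangle: 12.0
print(line_length(polygon[1]))     # perimeter: 14.0
```

Applying `line_length` to a closed ring gives the polygon's perimeter, matching the "area and perimeter" characteristics of polygons.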
A vector-based GIS is defined by the vectorial representation of its geographic data. According to the characteristics of this data model, geographic objects are represented explicitly (spatial component) and thematic aspects are associated with these spatial characteristics (thematic component).

Vectorial systems are composed of two components:
•One that manages spatial data;
•One that manages thematic data.

This is called a hybrid organization system, as it links a relational database for the attributes with a topological database for the spatial data.
A key element in this kind of system is the identifier of every object. This identifier is unique to each object and allows the system to connect both databases.
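The hybrid organization can be sketched as two separate stores that share only the feature identifier. In this Python sketch the IDs, coordinates, attribute values and the `link` helper are all hypothetical; the point is that the unique ID is the only thing connecting the spatial and thematic components.

```python
# Spatial component: geometry keyed by feature ID (hypothetical coordinates).
geometry = {
    101: [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 0.0)],
    102: [(5.0, 0.0), (9.0, 0.0), (9.0, 4.0), (5.0, 0.0)],
}

# Thematic component: attribute records carrying the same IDs.
attributes = [
    {"id": 101, "name": "Stand A", "cover": "Oak"},
    {"id": 102, "name": "Stand B", "cover": "Ash"},
]


def link(geometry_store, attribute_rows):
    """Join each attribute record to its geometry through the shared unique ID."""
    linked = {}
    for record in attribute_rows:
        fid = record["id"]
        if fid not in geometry_store:
            raise KeyError(f"no geometry for feature {fid}")
        linked[fid] = {**record, "coords": geometry_store[fid]}
    return linked


features = link(geometry, attributes)
print(features[102]["cover"], features[102]["coords"][0])  # Ash (5.0, 0.0)
```

A missing or duplicated ID breaks the link, which is why the identifier must be unique for every object.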
Vector data are good at:

•Accurately representing true shape and size

•Representing non-continuous data (e.g., rivers, political boundaries, road lines, mountain peaks)

•Creating aesthetically pleasing maps

•Conserving disk space


TOPOLOGY
Topology refers to the spatial relationships between geographic
features. It describes the relationships between connecting or
adjacent coverage features. Topological relationships are built from
simple elements into complex elements: points (simplest elements),
arcs (sets of connected points), areas (sets of connected arcs), and
routes (sets of sections, which are arcs or portions of arcs).

Topology is useful in GIS because many spatial modeling operations don't require coordinates, only topological information. For example, finding an optimal path between two points requires a list of the arcs that connect to each other and the cost to traverse each arc in each direction. Coordinates are only needed for drawing the path after it is calculated.
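The optimal-path example above needs nothing but arcs and traversal costs. Below is a minimal Python sketch using Dijkstra's algorithm, a standard shortest-path choice that the lecture does not name; the four-node network and its direction-dependent costs are hypothetical. Note that no coordinates appear anywhere.

```python
import heapq

# Hypothetical network: (from_node, to_node) -> cost of traversing the arc in that direction.
arc_costs = {
    (1, 2): 4, (2, 1): 4,
    (1, 3): 1, (3, 1): 2,
    (3, 2): 2, (2, 3): 2,
    (2, 4): 5, (4, 2): 5,
    (3, 4): 8, (4, 3): 8,
}


def shortest_path(costs, start, goal):
    """Dijkstra's algorithm over a directed arc list; returns (total_cost, node_path)."""
    adjacency = {}
    for (a, b), cost in costs.items():
        adjacency.setdefault(a, []).append((b, cost))
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return total, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, cost in adjacency.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (total + cost, nxt, path + [nxt]))
    return float("inf"), []  # goal unreachable


print(shortest_path(arc_costs, 1, 4))  # (8, [1, 3, 2, 4])
```

Coordinates would only be looked up afterwards, to draw the returned node sequence on the map.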
Components of Topology: Topology has three basic components:

I. Connectivity (Arc – Node Topology):

o Points along an arc that define its shape are called Vertices.

o Endpoints of the arc are called Nodes.

o Arcs join only at the Nodes.

II. Area Definition / Containment (Polygon – Arc Topology):

o An enclosed polygon has a measurable area.

o Lists of arcs define boundaries and closed areas are maintained.

o Polygons are represented as a series of (x, y) coordinates that connect to define an area.
III. Contiguity:
o Every arc has a direction
o A GIS maintains a list of Polygons on the left and right side of each
arc.
o The computer then uses this information to determine which
features are next to one another.
Explanation of Topology

Connectivity
Node   Arcs
1      a1, a2, a6
2      a2, a3, a5
3      a1, a3, a4
4      a4, a5, a6

Containment / Area Definition
Polygon   Arcs
A         a1, a2, a3
B         a2, a5, a6
C         a3, a4, a5
D         a1, a4, a6

Contiguity
Arc   Left & Right Polygons
a1    A/D
a2    A/B
a3    A/C
a4    C/D
a5    B/C
a6    B/D
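The contiguity (left/right polygon) list in the topology example already encodes both which arcs bound each polygon and which polygons are neighbours. This Python sketch derives both from that list alone; the function names are illustrative.

```python
# Contiguity list from the topology example: arc -> (left polygon, right polygon).
left_right = {
    "a1": ("A", "D"), "a2": ("A", "B"), "a3": ("A", "C"),
    "a4": ("C", "D"), "a5": ("B", "C"), "a6": ("B", "D"),
}


def polygon_arcs(contiguity):
    """Containment: a polygon's boundary arcs are those with the polygon on either side."""
    result = {}
    for arc, sides in contiguity.items():
        for poly in sides:
            result.setdefault(poly, set()).add(arc)
    return result


def neighbours(contiguity):
    """Contiguity: two polygons are adjacent if one arc has them on opposite sides."""
    result = {}
    for left, right in contiguity.values():
        result.setdefault(left, set()).add(right)
        result.setdefault(right, set()).add(left)
    return result


print(sorted(polygon_arcs(left_right)["D"]))  # ['a1', 'a4', 'a6']
print(sorted(neighbours(left_right)["A"]))    # ['B', 'C', 'D']
```

The derived arc sets reproduce the polygon-arc column of the example, which is how a GIS can answer "what is next to what" without consulting any coordinates.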
The vector data attributes are also held in database tables. Because
the vector data represent both linear and polygonal features, there
will be 2 attribute tables (Polygon attribute & Line attribute).

Polygon attribute

Line attribute
Non-Spatial/Attribute Data: The attributes refer to the properties of spatial entities. They are often referred to as non-spatial data since they do not in themselves represent location information. Attribute data are mainly database information corresponding to the geographic features under consideration.

•This type of data describes characteristics of the spatial features.

•These characteristics can be quantitative and/or qualitative in nature.

•Attribute data is often referred to as tabular data and is linked to the feature by a unique identifier. For example, attributes of a river might include its name, length, and sediment load at a gauging station.
•Non-spatial data can be joined to geocoded files with matching attributes and displayed as regular maps. For example, census information such as race or income, which is not inherently spatial, can be displayed as maps.

•By drawing on cartographic metaphors and representing non-spatial data as maps, or "information maps," the information in non-spatial data can be "spatialized," analyzed, browsed, and processed using GIS and cartographic methods, then shared on the web using internet map servers.

•Non-spatial data often has no corresponding geocoded representation; yet valuable information may still be derived if the right representation can be found.

•Non-spatial (Non-graphic) Database: A set of tabular data records, each record containing multiple data fields. In the context of spatial databases, one of these fields is the Unique ID Number of a corresponding map feature.
•Attribute values in a GIS are stored as relational database tables.

•Each feature (point, line, polygon, or raster) within each GIS layer will be represented as a record in a table.

•Each cell has a coordinate representation within the table and a numeric value (e.g., LU_CODE). Each LU_CODE is associated with a full description through a relational join.
GIS Data Formats & Structures
GIS DATA MODELS:
A GIS is based on data. A data set may be stored in more than one format to ensure that the data can meet a range of business needs and software access requirements of users. There are two types of standard data model that store GIS data. They are:

Data formats:

1. Spatial Data Models
•Vector
•Lattice/Grid/Raster
•Image
•TIN

2. Attribute Data Models
•ASCII
•DWG/DXF
•Tabular Databases
•GeoDatabase
SPATIAL DATA MODELS:

Spatial data has traditionally been stored and presented in the form of a map. Three basic types of spatial data models have evolved for storing geographic data digitally. These are referred to as:

•Lattice/Grid/Raster
•Vector
•Image
Lattice and Grid (Raster):
•Describe a data format that stores positional (horizontal) location information in a row-column (Cartesian) structure of pixels, a highly efficient data storage, access, and manipulation format.

•Although some grids may store multiple attributes just like vector data, grids usually store only a single numerical value.

•Store 2D & 3D information.

•Users with a strong demand for analyzing and manipulating grid data will require Spatial Analyst or similar extensions to their GIS software.
Image
•Images are really just a flavor of a grid or raster.

•An image usually means orthophotography (i.e., aerial or high-resolution satellite imagery).

•Images store their positional (x, y) location information in a pixel-by-pixel pattern, just like grids.

•The ‘Z’ value is a number that is interpreted by software as a shade of gray, as in a panchromatic image, or as a Red-Green-Blue color pattern, as in color photography.

•The ‘Z’ value is just a number, so it can be manipulated as in a grid, allowing image analysis to be performed or imagery color or display characteristics to be modified.
•Imagery plays a key cartographic role, serving as an up-to-date background to other vector datasets.

•Because of the common usage of imagery in GIS, most software supports a range of image file types such as TIF, IMG, PIX etc., with installed or no-cost extensions.
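Because an image's ‘Z’ values are just numbers, display characteristics can be changed arithmetically. One common example is a linear (min/max) contrast stretch, sketched below in Python for a hypothetical 8-bit band flattened to a list; the choice of this particular stretch is an assumption for illustration, not a method the lecture prescribes.

```python
def linear_stretch(pixels, out_min=0, out_max=255):
    """Rescale values linearly so the darkest pixel maps to out_min and the brightest to out_max."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat band has no contrast to stretch.
        return [out_min for _ in pixels]
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (p - lo) * scale) for p in pixels]


# Hypothetical low-contrast 8-bit panchromatic band (grey values crowded into 60-120).
band = [60, 72, 84, 96, 120]
print(linear_stretch(band))  # [0, 51, 102, 153, 255]
```

The stretched values use the full 0-255 grey range, so differences that were hard to see become visible, while the pixel positions are untouched.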
Data storage in Raster/Image/Grid

•Data are stored in binary format (0,1).

•Simple binary data values mean that the possibilities are limited to two values, either 0 or 1. This is an example of a 1-bit raster data file: mathematically, there are only two possibilities for each pixel, 0 or 1. By contrast, in an 8-bit data file there are 256 (2^8) possible data values for each pixel.

•The computer “sees” the cells that contain 0 as “turned off” and the cells that contain 1 as “turned on”.
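The number of possible values per pixel doubles with every added bit (2 raised to the bit depth). The Python sketch below prints a few bit depths and then produces a 1-bit raster by thresholding an 8-bit band; the band values and the threshold of 128 are hypothetical.

```python
def value_range(bits: int) -> int:
    """Number of distinct values a pixel can hold at a given bit depth: 2 ** bits."""
    return 2 ** bits


for bits in (1, 8, 16):
    print(f"{bits}-bit: {value_range(bits)} possible values (0 to {value_range(bits) - 1})")

# Hypothetical 8-bit band reduced to a 1-bit raster: 1 = "turned on", 0 = "turned off".
band_8bit = [[12, 200, 130], [90, 255, 0]]
binary = [[1 if v >= 128 else 0 for v in row] for row in band_8bit]
print(binary)  # [[0, 1, 1], [0, 1, 0]]
```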
Vector:
•Examples: roadways as lines, fire stations as points, and lakes and ponds as polygons (areas).

•Vector data is a straightforward digital version of the lines that define the shape or boundary of a map feature.

•In some software packages, vector data can have a more complex structure, e.g. measures along lines (e.g., roads), or areas of polygon overlap such as animal habitat zones.

•Vector data is stored as Geodatabase (GDB) feature classes and as shapefiles. ArcView 3.x users can access only shapefiles, while ArcGIS software can use GDB feature classes and shapefiles.
•Vector data can store significant amounts of attribute data, or details about features in the data set, providing the real power in using GIS for queries and analyses.

•Vector data usually does not provide a 3-D representation, as this format of data typically describes only the map or 2-D view of the world.
Advantages of Raster data
1. The geographic location of each cell is implied by its position in the cell matrix.

2. Overlaying is easy and efficiently implemented.

3. Due to the nature of the data storage technique, data analysis is usually easy to program and quick to perform.

4. The inherent nature of raster maps, e.g. one-attribute maps, is ideally suited for mathematical modeling and quantitative analysis.

5. Discrete data, e.g. forestry stands, are accommodated equally well as continuous data, e.g. elevation data, which facilitates the integration of the two data types.
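Advantage 1, that a cell's location is implied by its position in the matrix, works because a georeferenced raster needs only an origin and a cell size to turn a (row, column) index into map coordinates. This Python sketch assumes a north-up grid with square cells and an upper-left origin (conventions vary between formats); the origin and the 30 m cell size are hypothetical.

```python
def cell_center(row, col, origin_x, origin_y, cell_size):
    """Map coordinates of a cell's centre. Assumes (origin_x, origin_y) is the upper-left
    corner, columns increase eastward, rows increase southward, and cells are square."""
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y


def cell_of(x, y, origin_x, origin_y, cell_size):
    """Inverse mapping: the (row, col) of the cell containing map point (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col


# Hypothetical 30 m grid whose upper-left corner is at (500000, 4200000).
print(cell_center(2, 3, 500000.0, 4200000.0, 30.0))           # (500105.0, 4199925.0)
print(cell_of(500105.0, 4199925.0, 500000.0, 4200000.0, 30.0))  # (2, 3)
```

No coordinates are stored per cell; they are computed on demand, which is part of why raster storage and overlay are simple.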
Disadvantages of Raster Data:

1. The cell size determines the resolution at which the data is represented.

2. It is especially difficult to adequately represent linear features, depending on the cell resolution. Accordingly, network linkages are difficult to establish.

3. Processing of associated attribute data may be cumbersome if large amounts of data exist. Raster maps inherently reflect only one attribute or characteristic for an area.

4. Since most input data is in vector form, data must undergo vector-to-raster conversion. Besides increased processing requirements, this may introduce data integrity concerns due to generalization and the choice of an inappropriate cell size.
Advantages of Vector Data:

1. Data can be represented at its original resolution without generalization.

2. Graphic output is usually more aesthetically pleasing.

3. Since most data, e.g. hard copy maps, are in vector form, no conversion is required.

4. Accurate geographic location of data is maintained.

5. Allows for efficient encoding of topology, and as a result more efficient operations that require topological information, e.g. proximity, network analysis.
Disadvantages of Vector Data:

1. The location of each vertex needs to be stored explicitly.

2. Algorithms for manipulation and analysis functions are complex and may be processing intensive.

3. Continuous data, such as elevation data, is not effectively represented in vector form.

4. Spatial analysis and filtering within polygons is impossible.


TIN –

•Store 3-D data with an x, y and z value.

•Store and display elevation data.

•They are somewhat specialized in that they require 3-D analysis and display software such as ArcView or ArcGIS 3D Analyst.

•Because there is a continuity relationship between all data formats, TINs can be converted into grids and also into vector equivalents. However, this changes the way the data is modeled and usually involves some interpolation of the data, thus reducing the functionality of the TIN format.
•Even though TINs generally store only a single Z value as an attribute, the TIN format creates very large files, as it stores the relationships between all the features within the data.

•TINs representing thousands or millions of points are not uncommon, and the resulting file size limits TINs to a relatively small tile extent covering a limited geographic area.
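Converting a TIN to a grid involves interpolating a z value at each grid point from the triangle that contains it. One common approach, assumed here rather than taken from the lecture, is linear interpolation on the triangle's plane using barycentric coordinates. The triangle in this Python sketch is hypothetical.

```python
def interpolate_z(x, y, tri):
    """Linearly interpolate z at (x, y) on one TIN facet using barycentric coordinates.
    tri is ((x1, y1, z1), (x2, y2, z2), (x3, y3, z3))."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights: how much each vertex contributes at (x, y).
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside this triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical facet; its vertices lie on the plane z = x + 2y.
facet = ((0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (0.0, 10.0, 20.0))
print(interpolate_z(2.0, 3.0, facet))  # approximately 8.0
```

Sampling such values on a regular lattice produces a grid; anything between sample points is then lost, which illustrates how conversion reduces the functionality of the TIN format.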
ATTRIBUTE DATA MODELS (DBMS Models used in GIS):

A separate data model is used to store and maintain attribute data for GIS software. These data models may exist internally within the GIS software, or may be reflected in external commercial Database Management Software (DBMS). A variety of different data models exist for the storage and management of attribute data. The most common are:

ASCII
DWG/DXF
Tabular Databases
GeoDatabase
ASCII – (American Standard Code for Information Interchange)
•Data in this format is simply a line-by-line listing of information in text format that takes on a geographical meaning when the listing contains positional coordinate information.

•Text information can be easily imported into most GIS and CAD-based software programs, and it is this flexibility that drives storing some point data sets in this format.

•When possible, most point data sets are stored as vector datasets to make them more consumable by ArcView and ArcGIS software packages.

•In the case of elevation data that originate as very large ASCII files, storage as vector point files is not efficient for display and …
DWG/DXF – Drawing files (DWG) and the ASCII export version (DXF)
•Another flavor of vector data, developed for and used extensively in engineering CAD (computer-aided design) software.

•As the line between GIS and traditional CAD software and data types continues to blur, the industry has improved the compatibility, and thus the sharing, of these data types.

•Used to store planimetric linework such as roads, water/sewer infrastructure, and legal description information by public works agencies, survey departments and utility companies.

•For GIS users this data is often converted to GIS-type formats such as vector shapefiles, but DXF and DWG can also be read directly by most GIS software.
•These CAD data types provide a key bridge between GIS and engineering applications. For example, the LiDAR-derived elevation contours in the SDW are provided in both vector shapefile and vector DWG format.

•Though CAD formats provide accurate and detailed location information, they do not store attribute information in the same way as GIS vector data does, but rather provide more limited descriptive information in the LAYER and other DWG entity values.
Tabular databases –
•Microsoft Access, SQL Server, Oracle and other relational database systems serve as storage and access software for a wide range of tabular data tables.

•ASCII data is often moved to a tabular database, arranged in a logical, integrated manner that emphasizes relationships between the data sets.

•Vector data also incorporates this functionality by storing the data as attributes, but large, complex business tables such as financial records, census data, etc., are stored and managed as tables in these more efficient databases.

•This allows the data to be served up from a central point to a variety of web-based applications and query and reporting applications.
•GIS data, particularly vector data, can also access these databases through connections within the GIS software, establishing a relationship between the spatial location of features and the descriptive information about them.

•Extracts of information from these relational databases are sometimes stored in standalone dBASE-format (dbf) tables that are highly compatible with shapefile format data and can be joined to the shapefile dbf attribute table.
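Moving an ASCII point listing into a tabular database can be sketched with Python's standard library alone. In this sketch an in-memory SQLite database stands in for the relational systems named above, and the comma-delimited listing with columns id,x,y,z is a hypothetical example.

```python
import csv
import io
import sqlite3

# Hypothetical ASCII point listing: one record per line, with coordinates and elevation.
ascii_listing = """id,x,y,z
1,500010.5,4199980.2,312.4
2,500040.1,4199975.8,315.9
3,500071.7,4199950.3,309.2
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL, z REAL)")

# Parse each text line and convert the fields to typed values.
reader = csv.DictReader(io.StringIO(ascii_listing))
rows = [(int(r["id"]), float(r["x"]), float(r["y"]), float(r["z"])) for r in reader]
conn.executemany("INSERT INTO points VALUES (?, ?, ?, ?)", rows)

# Once tabular, the listing can be queried like any other table.
high = conn.execute("SELECT id FROM points WHERE z > 310 ORDER BY id").fetchall()
print(high)  # [(1,), (2,)]
```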
Attribute data structure

Tabular Model: ASCII or other standard format.

Hierarchical Model: The hierarchical database organizes data in a tree structure. Data is structured downward in a hierarchy of tables.
Network Model: The network database organizes data in a network structure. Any column in this structure can be linked to any other.
Relational Model: The relational database organizes data in tables. Each table is identified by a unique table name and is organized by rows and columns.
Each column within a table also has a unique name. Columns store the values for a specific attribute, e.g. cover group, tree height.
Rows represent one record in the table. In a GIS each row is usually linked to a separate spatial feature, e.g. a forestry stand. Accordingly, each row would comprise several columns, each column containing a specific value for that geographic feature.
Data is often stored in several tables. Tables can be joined or referenced to each other by common columns (relational fields).

Usually the common column is an identification number for a selected geographic feature, e.g. a forestry stand polygon number.

This identification number acts as the primary key for the table.

The ability to join tables through use of a common column is the essence of the relational model. Such relational joins are usually ad hoc in nature and form the basis for querying in a relational GIS product.

Unlike the other previously discussed database types, relationships are implicit in the character of the data, as opposed to being explicit characteristics of the database setup.
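A relational join on a common column can be shown with Python's built-in `sqlite3` module. The two tables below are hypothetical: `stands` uses the stand number as its primary key, with the cover group and tree height columns mentioned above, and `inventory` refers to it through the same `stand_id` column. The join is ad hoc: nothing links the tables except matching values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stands (stand_id INTEGER PRIMARY KEY, cover_group TEXT, tree_height REAL);
CREATE TABLE inventory (stand_id INTEGER REFERENCES stands(stand_id),
                        survey_year INTEGER, volume REAL);
INSERT INTO stands VALUES (1, 'Oak', 18.5), (2, 'Mixed evergreen', 24.0);
INSERT INTO inventory VALUES (1, 2019, 310.0), (2, 2019, 455.0), (1, 2023, 335.0);
""")

# Join the two tables through the common column (the stand's primary key).
rows = conn.execute("""
SELECT s.stand_id, s.cover_group, i.survey_year, i.volume
FROM stands AS s
JOIN inventory AS i ON i.stand_id = s.stand_id
ORDER BY s.stand_id, i.survey_year
""").fetchall()

for row in rows:
    print(row)
# (1, 'Oak', 2019, 310.0)
# (1, 'Oak', 2023, 335.0)
# (2, 'Mixed evergreen', 2019, 455.0)
```

Stand 1 appears twice because it has two inventory records; the stand's descriptive data is stored once and reused through the join, which is the redundancy saving the relational model offers.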
The relational database model is the most widely accepted for managing the attributes of geographic data.

The relational DBMS is attractive because of its:

•Simplicity in organization and data modeling.

•Flexibility: data can be manipulated in an ad hoc manner by joining tables.

•Efficiency of storage: proper design of data tables can reduce redundancy.

•Queries do not need to take into account the internal organization of data.

The relational DBMS has emerged as the dominant commercial data management tool in GIS implementation and application.
GeoDatabase –
•This close association between spatial vector data and relational database tables is being carried toward a single common format of data storage.

•Beyond enhanced storage efficiencies and improvements in access speeds, geodatabases will help integrate the spatial data of organizations with their extensive business table data. Users will move from accessing their common data types in a file-based model, as is done now, to a design where all GIS data, location and attribute, is accessed from a relational database.
Other types of spatial data that can be stored using the Spatial option, besides GIS data, include data from:
•Computer-aided design (CAD) systems;
•Computer-aided manufacturing (CAM) systems.

Instead of operating on objects on a geographic scale, CAD/CAM systems work on a smaller scale, such as an automobile engine or printed circuit boards.
Object Oriented Model:

The object-oriented database model manages data through objects. An object is a collection of data elements and operations that together are considered a single entity. The object-oriented database is a relatively new model. This approach has the attraction that querying is very natural, as features can be bundled together with attributes at the database administrator's discretion.

To date, only a few GIS packages are promoting the use of this
attribute data model. However, initial impressions indicate that this
approach may hold many operational benefits with respect to
geographic data processing. Fulfillment of this promise with a
commercial GIS product remains to be seen.
