
GIS 205 – GIS and Remote Sensing

Lesson 3
Data Models and Data Structures
Introduction
In order to visualize natural phenomena, one must first determine how to best
represent geographic space. Data models are a set of rules and/or constructs used
to describe and represent aspects of the real world in a computer. Two primary
data models are available to complete this task: raster data models and vector data
models.

This lesson discusses the different data models. By the end of it, you will be able
to create your own spatial data.

Learning Outcomes
Upon completion of this lesson, the students will be able to:
1. Discuss the different spatial data models.
2. Explain geodatabase and metadata.
3. Create a map with vector and raster data.
4. Perform heads-up digitizing.
5. Join layers.
6. Perform an interpolation technique.

ACTIVITY
Please refer to the attached activity.
Activity No. 4 Creating Vector Data
Activity No. 5 Importing CSV File into QGIS

ANALYSIS
1. Compare the raster and vector models in representing geographic features.
2. Illustrate raster and vector data with figures.
3. Why is the snapping tool important?
4. How does scale visibility matter in digitizing?
5. What do you think are the applications of interpolation?

ABSTRACTION
A. Terminologies

Spatial and non-spatial data

Spatial data refers to the data or information that describes the absolute or relative
location of geographic features on the earth. Non-spatial data, or attribute data,
on the other hand, describes the characteristics of the spatial features. These
characteristics can be quantitative or qualitative. (Ex. spatial: a geographic feature
such as a road; non-spatial: information about that road, such as its name,
route number, classification, the number of cars on it, etc.)
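
As a minimal sketch of this distinction, the road example can be written as one record with a geometry part (spatial data) and an attributes part (non-spatial data). The dictionary layout, the coordinates and the attribute values below are hypothetical and chosen only for illustration; they are not a format used by any particular GIS.

```python
# A feature pairs spatial data (geometry) with non-spatial data (attributes).
# All coordinates and attribute values here are made up for illustration.
road = {
    "geometry": {
        "type": "LineString",
        "coordinates": [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)],  # (x, y) pairs
    },
    "attributes": {
        "name": "Example Road",       # qualitative
        "route_number": 12,           # qualitative (an identifier)
        "classification": "primary",  # qualitative
        "cars_per_day": 4500,         # quantitative
    },
}


def describe(feature):
    """Return a one-line summary that uses both parts of the feature."""
    n_vertices = len(feature["geometry"]["coordinates"])
    name = feature["attributes"]["name"]
    return f"{name}: {feature['geometry']['type']} with {n_vertices} vertices"


print(describe(road))
```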

Representation of Space

Burrough & McDonnell (1998) described two ways to represent space (an area,
landscape or some bigger unit), which are as follows:
a) Discrete entities: The space is seen as occupied by entities that
are described by their properties and can be located on earth using
coordinate systems. The entities have a clear boundary. (Ex. buildings,
roads, land parcels, etc.)
b) Continuous fields: The space is seen as the variation of an attribute as a
continuous field. No physical boundary can be observed in such a case.
(Ex. temperature, pressure, elevation, etc. across an area)

Figure 12. The same data is represented differently in vector and raster
formats; the diagram also reflects the corresponding difference between
discrete and continuous data.

B. GIS Data Models

Data models are conceptual models of the real world. They describe how
geographic data are represented and stored. The data models used in GIS
are described below:

A. Vector Data Model


The vector data model is closely linked with the discrete object view. In the vector
data model, geographical phenomena are represented in three different forms: point,
line and polygon. The shape of a spatial entity is stored using a two-dimensional
(x, y) coordinate system.


Figure 13. In vector formats, points, lines, and polygons represent spatial features.

1) Point: A location depicted by a single set of (x, y) coordinates at the scale
of abstraction. (Ex. The wells in a village, electricity poles in a town and
cities on a world map are examples of spatial features described by
points.)

Note: A city can be marked as a single point on a world map but would be
marked as a polygon on a state map. The scale plays an important role in
deciding the geometry of a geographical feature.

2) Line/Arc: An ordered set of (x, y) coordinate pairs arranged to form a linear
feature. The curves in a linear feature are generated by increasing the
density of points/vertices. (Ex. Roads, rails and telephone cables are
examples of spatial features described by lines.)

3) Polygon: A set of (x, y) coordinate pairs enclosing a homogeneous area.
(Ex. Land parcels, agricultural farms and water bodies are examples
of spatial features described by polygons.)
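
The three geometry types can be sketched in a few lines of Python. This is only an illustration under simple assumptions: coordinates are planar (x, y) pairs in arbitrary map units, the feature names and values are hypothetical, and the polygon is stored as a single ring without holes. The length and area helpers are ordinary planar formulas, not the method of any specific GIS package.

```python
import math

# Hypothetical coordinates in arbitrary map units.
well = (2.0, 3.0)                                           # point: one (x, y) pair
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]                # line: ordered vertices
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # polygon: closed ring


def line_length(vertices):
    """Sum of straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(ring):
    """Area enclosed by the ring, using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(road))     # segments of length 5 and 6
print(polygon_area(parcel))  # a 4 x 3 rectangle
```

Adding more vertices to `road` is how a curve would be approximated more closely, as described in the Line/Arc item.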

B. Raster Data Model


The raster data model is commonly associated with the field conceptual model.
Here, geographic space is represented by an array of cells or pixels (picture
elements) which are arranged in rows and columns. Each pixel has a value that
represents information. The value can be an integer, a floating-point number
or an alphanumeric code.
A point can be represented by a single pixel in the raster model. A line is a chain of
spatially connected cells with the same value. Similarly, a water body in raster
data is represented as a set of contiguous pixels having the same value, which
represents a homogeneous area.
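
A small sketch of this idea, assuming the raster is held as a plain list of rows and that the class codes (0 to 3) are hypothetical values chosen for the example:

```python
# A small raster as a list of rows. Values are hypothetical class codes:
# 0 = background, 1 = well (point), 2 = road (line), 3 = lake (area).
grid = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 3, 3],
    [0, 0, 0, 0, 3, 3],
    [2, 2, 2, 2, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def cells_with_value(raster, value):
    """Return (row, column) positions of all cells holding `value`."""
    return [
        (r, c)
        for r, row in enumerate(raster)
        for c, cell in enumerate(row)
        if cell == value
    ]


print(cells_with_value(grid, 1))       # the point: a single cell
print(len(cells_with_value(grid, 2)))  # the line: a chain of connected cells
print(len(cells_with_value(grid, 3)))  # the lake: a block of contiguous cells
```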


Figure 14. A close look at this raster of ocean depth shows that it is composed of square cells. Each cell holds
a numeric value indicating ocean depth.

C. Vector Data Structure

Geographic entities encoded using the vector data model are often called
features. The features can be divided into two classes:

a) Simple features
These are easy to create, store and are rendered on screen very quickly.
They lack connectivity relationships and so are inefficient for modeling
phenomena conceptualized as fields.

b) Topological features
A topology is a mathematical procedure that describes how features are
spatially related and ensures data quality of the spatial relationships.
Topological relationships include the following three basic elements:
i. Connectivity: Information about linkages among spatial objects
ii. Contiguity: Information about neighboring spatial objects
iii. Containment: Information about inclusion of one spatial object within
another spatial object


Figure 15. Topological relationships.

Connectivity

Arc-node topology defines connectivity: arcs are connected to each other if they
share a common node. This is the basis for many network tracing and path finding
operations.

Arcs represent linear features and the borders of area features. Every arc has a
from-node which is the first vertex in the arc and a to-node which is the last vertex.
These two nodes define the direction of the arc. Nodes indicate the endpoints and
intersections of arcs. They do not exist independently and therefore cannot be
added or deleted except by adding and deleting arcs.

Figure 16. Arc-node Topology

Nodes can, however, be used to represent point features which connect segments
of a linear feature (e.g., intersections connecting street segments, valves
connecting pipe segments).

Figure 17. Node showing intersection

Arc-node topology is supported through an arc-node list. For each arc in the list
there is a from node and a to node. Connected arcs are determined by common
node numbers.
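
The arc-node list can be sketched as a simple lookup table. The arc and node numbers below are hypothetical (they are not taken from Figure 18), and the dictionary is only one possible way to hold such a list.

```python
# Arc-node list: arc id -> (from_node, to_node). All ids are hypothetical.
arc_nodes = {
    1: (10, 11),
    2: (11, 12),
    3: (11, 13),
    4: (14, 15),
}


def connected_arcs(arc_id, table):
    """Arcs that share at least one node number with `arc_id`."""
    nodes = set(table[arc_id])
    return sorted(
        other
        for other, ends in table.items()
        if other != arc_id and nodes & set(ends)
    )


print(connected_arcs(1, arc_nodes))  # arcs meeting arc 1 at node 11
print(connected_arcs(4, arc_nodes))  # arc 4 shares no node with the others
```

Repeating this lookup from arc to arc is the basic step behind the network tracing and path finding operations mentioned above.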


Figure 18. Arc-Node Topology with list

Contiguity

Polygon topology defines contiguity. The polygons are said to be contiguous if they
share a common arc. Contiguity allows the vector data model to determine
adjacency.

Figure 19. Polygon Topology

The from node and to node of an arc indicate its direction, which helps determine
the polygons on its left and right sides. Left-right topology refers to the polygons on
the left and right sides of an arc. In the illustration above, polygon B is on the left
and polygon C is on the right of arc 4.

Polygon A is outside the boundary of the area covered by polygons B, C and D. It
is called the external or universe polygon, and represents the world outside the
study area. The universe polygon ensures that each arc always has a left and right
side defined.
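
A minimal sketch of how adjacency can be read from a left-right list. Only the row for arc 4 (B on the left, C on the right) and the role of A as the universe polygon come from the text; every other row is a hypothetical assumption made for the example.

```python
# Left-right list: arc id -> (left_polygon, right_polygon).
# Only arc 4 follows the text; the other rows are hypothetical.
# "A" is the universe polygon.
left_right = {
    1: ("A", "B"),
    2: ("A", "C"),
    4: ("B", "C"),
    5: ("B", "D"),
    6: ("C", "D"),
}


def neighbours(polygon, table, universe="A"):
    """Polygons sharing an arc with `polygon`, ignoring the universe polygon."""
    found = set()
    for left, right in table.values():
        if polygon == left:
            found.add(right)
        elif polygon == right:
            found.add(left)
    found.discard(universe)
    return sorted(found)


print(neighbours("C", left_right))
print(neighbours("D", left_right))
```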

Containment

Geographic features cover a distinguishable area on the surface of the earth. An
area is represented by one or more boundaries defining a polygon. The polygons
can be simple or they can be complex, with a hole or island in the middle. In the
illustration given below, assume a lake with an island in the middle. The lake
actually has two boundaries, one which defines its outer edge and the other (island)
which defines its inner edge. An island defines the inner boundary of a polygon.

Polygon D is made up of arcs 5, 6 and 7. The 0 before the 7 indicates that
arc 7 creates an island in the polygon.

Figure 20. Polygon arc topology

Polygons are represented as an ordered list of arcs and not in terms of X, Y
coordinates. This is called Polygon-Arc topology. Since arcs define the boundary
of polygon, arc coordinates are stored only once, thereby reducing the amount of
data and ensuring no overlap of boundaries of the adjacent polygons.
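
The polygon-arc list can be sketched as follows. The entry for polygon D follows the text (arcs 5 and 6, then a 0, then arc 7 for the island); the arc list for polygon C is a hypothetical assumption added so that a shared boundary can be shown.

```python
# Polygon-arc list: polygon id -> ordered arc ids. A 0 marks that the arcs
# after it bound an island (inner boundary). Polygon D follows the text;
# the arcs listed for polygon C are hypothetical.
polygon_arcs = {
    "C": [2, 4, 6],
    "D": [5, 6, 0, 7],
}


def split_rings(arc_list):
    """Split an arc list into (outer_arcs, island_arcs) at the 0 marker."""
    if 0 in arc_list:
        i = arc_list.index(0)
        return arc_list[:i], arc_list[i + 1:]
    return list(arc_list), []


def shared_arcs(a, b, table):
    """Arcs that appear in both polygons, i.e. their common boundary."""
    return sorted((set(table[a]) & set(table[b])) - {0})


print(split_rings(polygon_arcs["D"]))
print(shared_arcs("C", "D", polygon_arcs))
```

Because the shared arc is referenced by id from both polygons, its coordinates need to be stored only once.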

Simple Features

Point entities: These represent all geographical entities that are positioned by a
single XY coordinate pair. Along with the XY coordinates, the point must store other
information, such as what the point represents.

Line entities: Linear features made by tracing two or more XY coordinate pairs.
• Simple line: It requires a start and an end point.
• Arc: A set of XY coordinate pairs describing a continuous complex line. The
shorter the line segments and the higher the number of coordinate pairs, the
more closely the chain approximates a complex curve.

Simple polygons: Enclosed structures formed by joining a set of XY coordinate
pairs. The structure is simple, but it carries a few disadvantages, which are listed
below:
• Lines between adjacent polygons must be digitized and stored twice;
improper digitization gives rise to slivers and gaps
• It conveys no information about neighbors
• Creating islands is not possible

Topologic Features

Networks: A network is a topologic feature model which is defined as a line graph
composed of links representing linear channels of flow and nodes representing
their connections. The topologic relationship between the features is maintained in
a connectivity table. By consulting the connectivity table, it is possible to trace the
information flowing in the network.

Polygons with explicit topological structures: Introducing explicit topological
relationships takes care of islands as well as neighbors. The topological structures
are built either by creating topological links during data input or by using software.
The Dual Independent Map Encoding (DIME) system of the US Bureau of the Census
was one of the first attempts to create topology in geographic data.

Figure 21. Polygon as a topological feature

Polygons are formed using the lines and their nodes.
Once formed, polygons are individually identified by a unique identification number.
The topological information among the polygons is computed and stored using the
adjacency information (the nodes of a line, and identifiers of the polygons to the
left and right of the line) stored with the lines.

Fully topological polygon network structure

A fully topological polygon network structure is built using boundary chains that are
digitized in any direction. It takes care of islands and lakes and allows automatic
checks for improper polygons. Neighborhood searches are fully supported. These
structures are edited by moving the coordinates of individual points and nodes, by
changing polygon attributes and by cutting out or adding sections of lines or whole
polygons. Changing coordinates requires no modification to the topology, but cutting
out or adding lines and polygons requires recalculation of topology and rebuilding
the database.

Triangular Irregular Network (TIN)

A TIN represents a surface as contiguous non-overlapping triangles created by
performing Delaunay triangulation. These triangles have a unique property: the
circumcircle that passes through the vertices of a triangle contains no other point
inside it. A TIN is created from a set of mass points with x, y and z coordinate values.
This topologic data structure manages information about the nodes that form each
triangle and the neighbors of each triangle.
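
The empty-circumcircle property can be checked with a short sketch. It uses the standard determinant form of the in-circle test and assumes that the triangle vertices are given in counter-clockwise order; the function names and sample points are hypothetical. It only tests a given triangle and does not build a triangulation.

```python
def in_circumcircle(a, b, c, p):
    """True if point p lies strictly inside the circle through a, b and c.

    Determinant form of the in-circle test; a, b, c must be listed in
    counter-clockwise order.
    """
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = (
        (ax * ax + ay * ay) * (bx * cy - cx * by)
        - (bx * bx + by * by) * (ax * cy - cx * ay)
        + (cx * cx + cy * cy) * (ax * by - bx * ay)
    )
    return det > 0


def is_delaunay(triangle, points):
    """True if no other point falls inside the triangle's circumcircle."""
    a, b, c = triangle
    return not any(
        in_circumcircle(a, b, c, p) for p in points if p not in triangle
    )


triangle = ((0, 0), (4, 0), (0, 4))  # counter-clockwise
print(is_delaunay(triangle, [(0, 0), (4, 0), (0, 4), (5, 5)]))  # (5, 5) is outside
print(is_delaunay(triangle, [(0, 0), (4, 0), (0, 4), (3, 3)]))  # (3, 3) is inside
```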

Figure 22. Delaunay Triangulation

Advantages of Delaunay triangulation

• The triangles are as equiangular as possible, thus reducing potential
numerical precision problems created by long skinny triangles
• The triangulation is independent of the order the points are processed
• Ensures that any point on the surface is as close as possible to a node

Because points can be placed irregularly over a surface, a TIN can have higher
resolution in areas where the surface is highly variable. The model incorporates the
original sample points, providing a check on the accuracy of the model. The
information related to a TIN is stored in a file or a database table. Calculation of
elevation, slope, and aspect is easy with a TIN, but TINs are less widely available
than raster surface models and more time consuming in terms of construction and
processing.
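
To illustrate why elevation is easy to compute on a TIN, the sketch below estimates z at a location inside one triangle. It assumes that each triangle is treated as a flat (planar) facet, so the elevation is a weighted average of the three node elevations (barycentric weights). The function name and the sample coordinates are hypothetical.

```python
def interpolate_z(p, a, b, c):
    """Elevation at p = (x, y) on the plane through a, b, c = (x, y, z).

    Returns None when p falls outside the triangle.
    """
    (x, y), (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    # Barycentric weights of the three nodes; they always sum to 1.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangle nodes as (x, y, z).
a, b, c = (0, 0, 100), (10, 0, 200), (0, 10, 300)
print(interpolate_z((5, 0), a, b, c))    # halfway along the edge from a to b
print(interpolate_z((20, 20), a, b, c))  # outside the triangle
```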


The TIN model is a vector data model which is stored using relational attribute
tables. A TIN dataset contains three basic attribute tables:
• Arc attribute table, which contains the length, from node and to node of all the
edges of all the triangles.
• Node attribute table, which contains the x, y coordinates and z (elevation) of the
vertices.
• Polygon attribute table, which contains the areas of the triangles, the identification
numbers of the edges and the identifiers of the adjacent polygons.
Storing data in this manner eliminates redundancy, as all the vertices and edges
are stored only once even if they are used for more than one triangle. As a TIN stores
topological relationships, the datasets can be applied to vector-based
geoprocessing such as automatic contouring, 3D landscape visualization,
volumetric design, surface characterization, etc.

D. Raster data structure

In a simple raster data structure, the geographical entities are stored in a matrix of
rectangular cells. A code is given to each cell which informs users which entity is
present in which cell. The simplest way of encoding raster data in a computer
can be understood as follows:

(a) Entity model: It represents the whole raster data. Let us assume
that the raster data belongs to an area where land is surrounded by
water. Here a particular entity (land) is shown in green and the
area where land is not present is shown in white.

(b) Pixel values: The pixel values for the full image are shown. Cells
containing a part of the land are encoded as 1 and cells where land is
not present are encoded as 0.

(c) File structure: It demonstrates the method of coding raster data.
The first row of the file structure tells us that there are 5 rows and
5 columns in the image, and that 1 is the maximum pixel value. The
subsequent rows have cells with a value of either 0 or 1 (the same as the
pixel values).
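
The file structure described in (c) can be sketched as a small encoder and decoder. The 5 × 5 grid below is a hypothetical land/water example, and the use of spaces to separate values is an assumption; the source does not state the separator.

```python
def encode_raster(grid):
    """Header line 'rows columns max_value', then one line per raster row."""
    rows, cols = len(grid), len(grid[0])
    max_value = max(max(row) for row in grid)
    lines = [f"{rows} {cols} {max_value}"]
    lines += [" ".join(str(v) for v in row) for row in grid]
    return "\n".join(lines)


def decode_raster(text):
    """Rebuild the grid and check it against the header line."""
    lines = text.splitlines()
    rows, cols, _max_value = (int(n) for n in lines[0].split())
    grid = [[int(v) for v in line.split()] for line in lines[1:]]
    if len(grid) != rows or any(len(row) != cols for row in grid):
        raise ValueError("grid does not match the header")
    return grid


# Hypothetical 5 x 5 raster: 1 = land, 0 = water.
land = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

text = encode_raster(land)
print(text)
```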

The huge size of the data is a major problem with raster data. An image consisting
of twenty different land-use classes takes the same storage space as a similar
raster map showing the location of a single forest. To address this problem many
data compaction methods have been developed which are discussed below:
• Run length encoding
o Reduction of data on a row by row basis
o Stores a single value for a group of cells rather than storing values
for individual cells
o First line represents the dimension of the matrix (8×8) and the
number of entities (1) present. In second and subsequent lines, the
first number in the pair represents absence (0) or presence (1) of the
entity and the second number indicates the number of cells
referenced.

• Block encoding
o Data is stored in blocks in the raster matrix.
o The entity is subdivided into hierarchical blocks and the blocks are
located using coordinates.
o The first cell at top left hand is used as the origin for locating the
blocks
o Ex. Instead of storing 64 grid cells, all it takes is just 7 blocks. Using
block coding, it requires one 3×3 block, two 2×2 blocks and four 1×1
cell blocks to encode this raster image. In this block coding example,
the top-left corner is used as a reference for each block.

• Chain encoding
o Works by defining boundary of the entity i.e. sequence of cells
starting from and returning to the given origin
o Ex. We start at position (5,2). From here we define the border using
cardinal directions and number of movements. We move east 3
positions until we hit the edge. At this location, we move south 4
positions. This process continues until the end point hits the start
point.
o Note: Only for the purpose of this exercise, we used north, east,
south and west as alphabetical values. When encoded, it is a
numerical value.


• Quadtree
o A raster is divided into a hierarchy of quadrants that are subdivided
based on similar value pixels.
o The division of the raster stops when a quadrant is made entirely
from cells of the same value.
o A quadrant that cannot be subdivided is called a leaf node.
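
Two of the compaction methods listed above, run length encoding and the quadtree, can be sketched briefly. The code is an illustration under stated assumptions: runs are stored as (value, number of cells) pairs per row, the quadtree expects a square grid whose side is a power of two, and sub-quadrants are listed in the order NW, NE, SW, SE. The 4 × 4 grid is hypothetical.

```python
def rle_encode_row(row):
    """Run-length encode one raster row as [(value, run_length), ...]."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def rle_decode_row(runs):
    """Expand [(value, run_length), ...] back into a row of cells."""
    return [value for value, count in runs for _ in range(count)]


def quadtree(grid, top=0, left=0, size=None):
    """Return the cell value if the square block is uniform (a leaf node),
    otherwise a list of four sub-quadrants in the order NW, NE, SW, SE.

    Assumes a square grid whose side is a power of two.
    """
    if size is None:
        size = len(grid)
    first = grid[top][left]
    if all(
        grid[r][c] == first
        for r in range(top, top + size)
        for c in range(left, left + size)
    ):
        return first  # leaf node: no further subdivision
    half = size // 2
    return [
        quadtree(grid, top, left, half),
        quadtree(grid, top, left + half, half),
        quadtree(grid, top + half, left, half),
        quadtree(grid, top + half, left + half, half),
    ]


grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

print([rle_encode_row(row) for row in grid])
print(quadtree(grid))
```

In this example three of the four top-level quadrants are uniform and become leaf nodes at once; only the south-east quadrant has to be split down to single cells.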

A satellite or remote sensing image is raster data in which each cell has some
value, and together these values create a layer. A raster may have a single layer
or multiple layers. In a multi-layer/multi-band raster, each layer is congruent with
all other layers: it has an identical number of rows and columns and the same
location in the plane. A digital elevation model (DEM) is an example of a
single-band raster dataset, each cell of which contains only one value representing
surface elevation.

A single-layer raster can be represented using:

a. Two colors (binary): The raster is represented as a binary
image with cell values of either 0 or 1, appearing black
and white respectively.

b. Grayscale: Typical remote sensing images are recorded
in an 8-bit digital system. A grayscale image is thus
represented in 256 shades of gray, which range from 0
(black) to 255 (white). However, the human eye cannot
distinguish all 256 shades; it can only
interpret 8 to 16 shades of gray.

A satellite image can have multiple bands, i.e. the scene/details are captured at
different wavelengths (ultraviolet, visible and infrared portions) of the electromagnetic
spectrum. While creating a map, we can choose to display a single band of data or
form a color composite using multiple bands. A combination of any three of the
available bands can be used to create RGB composites. These composites
present a greater amount of information than that provided by a single-band
raster.
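
A minimal sketch of forming RGB composites from congruent bands. The band names and the 2 × 2 pixel values are hypothetical 8-bit numbers; the only point is that any three bands can be assigned to the red, green and blue display channels.

```python
# Three congruent single-band rasters (same rows and columns).
# Band names and 8-bit values are hypothetical.
bands = {
    "green": [[30, 40], [50, 60]],
    "red":   [[20, 25], [90, 95]],
    "nir":   [[200, 210], [80, 85]],
}


def composite(bands, r, g, b):
    """Stack three named bands into one raster of (R, G, B) cells."""
    return [
        list(zip(*rows))
        for rows in zip(bands[r], bands[g], bands[b])
    ]


# Near-infrared shown as red, red as green, green as blue.
false_color = composite(bands, "nir", "red", "green")
print(false_color[0][0])
print(false_color[1][1])
```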

Table 2. Comparison between Vector and Raster Data Models


Data Model  Advantages                                Disadvantages
----------  ----------------------------------------  ----------------------------------------
Raster      Simple data structure                     Cell size determines the resolution at
                                                      which the data is represented
            Compatible with remote sensing or         Requires a lot of storage space
            scanned data
            Spatial analysis is easier                Projection transformations are time
                                                      consuming
            Simulation is easy because each unit      Network linkages are difficult to
            has the same size and shape               establish
Vector      Data is represented at its original       The location of each vertex must be
            resolution and form without               stored explicitly
            generalization
            Requires less storage space               Overlay based on criteria is difficult
            Editing is faster and more convenient     Spatial analysis is cumbersome
            Network analysis is fast                  Simulation is difficult because each
                                                      unit has a different topological form
            Projection transformations are easier


E. Geodatabase and Metadata

Geodatabase

The term ‘Geodatabase’ was introduced by Environmental Systems Research
Institute, Inc. (ESRI) and is defined as a collection of geographic datasets of
various types that are held in a common file system folder, an MS Access
database, or a multiuser relational database (Oracle, SQL Server, DB2, etc.). The
geodatabase is built on an extended relational database. In this model, entities are
represented as objects with properties, behavior, and relationships.

Figure 23. Inside Geodatabase

The geodatabase supports various elements of GIS such as attribute data, CAD data,
geographic features, satellite and aerial images, GPS data and survey
measurements. These types of data can be represented as data objects, viz.
annotation, dimension, feature class, geometric network, raster dataset, tables,
topology, relationship class, etc. Geodatabase design is based on a fundamental
step of GIS design, which involves organizing geographic information into a series
of data themes and then specifying the content and representation of the thematic
layers. Advanced capabilities (network, topology, subtypes, etc.) are added later to
the geodatabase to model GIS behavior and maintain data integrity. Other key
properties of geodatabase design include the definition of coordinate properties and
spatial properties, tolerances, coordinate resolution and metadata documentation
for each dataset.

Metadata

Metadata is structured information that describes and makes it easier to retrieve,
use, or manage an information resource. It is also known as data about data.

Need for metadata
• To enable search over distributed archives: Similar to a
library catalog, it sorts data and makes it easy for a user to find it.
• To help assess the fitness of a dataset for a given use: Metadata is needed
to determine whether a dataset will satisfy a user’s requirement. Does the
data have acceptable quality? It may also have comments from previous
users.
• To provide information about data content: In the case of remotely sensed
images, it may include the percentage of cloud obscuring the scene and
some other information.
• To provide information about handling the dataset: It includes technical
specifications of the data format, software compatible with the data, data
volume, etc.

Geospatial metadata commonly documents Geographic Information System
(GIS) files, geospatial databases, and earth imagery. It can also be used to
document data catalogs, mapping applications, data models and related websites.
Metadata includes library catalog elements such as title, abstract,
and publication; geographic elements such as geographic extent and projection
information; and database elements such as attributes and their values.

One of the most widely used standards for metadata is the US Federal Geographic Data
Committee’s Content Standard for Digital Geospatial Metadata (CSDGM).
CSDGM describes the items that should be present in a metadata archive but
doesn’t prescribe the format in which to present them. Developers implement the
standard in ways that suit them but make sure that the implementations are
interoperable, i.e. can be understood by others.

Figure 24. Screenshot of metadata of a shapefile displayed in ArcCatalog


Temporal Dimension in GIS

Spatial features may change over time in terms of space and the content. The
changes could be geometrical (change in geometry of features), positional (change
in position of features), or a change in attributes of the features. When changes in
locations of a group of objects are observed together, the changes in the spatial
distribution pattern of the objects can be deciphered.

One may analyze temporal data sets to monitor the changes that happen over
time. Many things change with time, but monitoring must be done prudently,
as it involves a huge investment of resources. The monitoring intervals must be
fixed in a manner that captures the change in the spatial phenomena while
remaining efficient and viable.

The effect of urbanization on the land use of an area can be monitored by a change
detection analysis that makes use of temporal satellite images and GIS to
determine the nature, extent and rate of land cover change and fragmentation over
time and space. Temporal GIS studies are quite popular in the field of forest
conservation and management. One such study described the monitoring of
deforestation in a land resource inventory project in Nepal, where within an interval
of 30 years (1950-1980) 50% of the forest land was lost to shrub and agriculture.
Similar temporal studies are carried out for various sectors of natural resources
management such as biodiversity, water, and land/soil, where, considering
future needs, striking a balance between consumption and availability of
natural resources is of utmost importance.
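
A change detection analysis of this kind can be sketched by comparing two classified rasters of the same area cell by cell. The two small grids and the class codes below are hypothetical and are not data from the Nepal study; they are arranged so that half of the forest cells change class, only to mirror the 50% figure quoted above.

```python
# Hypothetical class codes: 1 = forest, 2 = shrub, 3 = agriculture.
before = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [3, 3, 3, 3],
]
after = [
    [1, 1, 2, 2],
    [1, 1, 3, 3],
    [3, 3, 3, 3],
]


def change_summary(before, after, target):
    """Count cells of class `target` in `before` and how many of them hold
    a different class in `after`. Both rasters must be congruent."""
    total = lost = 0
    for row_b, row_a in zip(before, after):
        for b, a in zip(row_b, row_a):
            if b == target:
                total += 1
                if a != target:
                    lost += 1
    return total, lost


total, lost = change_summary(before, after, target=1)
print(f"forest cells: {total}, lost: {lost}, share lost: {100 * lost / total}%")
```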

APPLICATION
Please refer to the attached activity.
Activity No. 4 Creating Vector Data
Activity No. 5 Importing CSV File into QGIS

Closure
You have finished the lesson on data models and data structures. You were
able to create a map with different types of data models by performing heads-up
digitizing and interpolation. In the next lesson, we will explore more on spatial
data input and editing.
