
Geographic Information System (GIS)

1.0 INTRODUCTION

At the end of this course unit the student should be able to:

• Define GIS
• Discuss the components of GIS

1.1 THE PURPOSE OF GIS

Different professionals are involved in studies of their environment, in the hope
of a better understanding of that environment. By environment, we mean the
geographic space of their study and the events that take place there.

For instance:

• An urban planner might like to find out about the urban fringe growth in
her/his city, and quantify the population growth that some suburbs are
witnessing. She/he might also like to understand why it is these suburbs
and not others.

• A biologist might be interested in the impact of slash-and-burn practices
on the populations of amphibian species in the forests of a mountain range,
to obtain a better understanding of the long-term threats to those
populations.

• A natural hazard analyst might like to identify the high-risk areas of natural
monsoon-related flooding by looking at rainfall patterns and terrain
characteristics.

• A geological engineer might want to identify the best localities for buildings
in an area with regular earthquakes by looking at rock formation
characteristics.

• A mining engineer could be interested in determining which prospective
copper mines are best suited for future exploration, taking into account
parameters such as extent, depth and quality of the ore body, amongst
others.

• A geoinformatics engineer hired by a telecommunication company may want
to determine the best sites for the company’s relay stations, taking into
account various cost factors such as land prices, undulation of the terrain,
etc.

• A forest manager might want to optimize timber production using data on
soil and current tree stand distributions, in the presence of a number of
operational constraints such as a requirement to preserve tree diversity.

• A hydrological engineer might want to study a number of water quality
parameters of different sites in a freshwater lake to improve his/her
understanding of the current distribution of Typha reed beds, and why it
differs so much from that of a decade ago.

All the above professionals work with data that relates to space, typically
involving positional data. Positional data determines where things are or
perhaps where they were or will be. More precisely, these professionals deal
with questions related to geographic space, which we might informally
characterize as having positional data relative to the earth’s surface.

Work to do

In your field of study, think of how you can make use of GIS to better carry out
the daily tasks required from you. (The field of study may be imaginary).

1.2 DEFINITION OF GIS

• Geographic Information System is a term applied to computerized
information storage, processing and retrieval systems that have
hardware and software specifically designed to cope with geographically
referenced spatial data and the corresponding attribute data.

• The sources of such spatial data could be maps, field surveys, census,
aerial photographs and satellite imagery. It is therefore evident that
these datasets vary in format, level of detail, accuracy and reference.

• The ability of GIS to combine spatial data from different sources with
non-spatial data (attribute data) distinguishes it from other data
processing software.

Find more definitions in books and on the web.

A GIS is a computer-based system that provides the following four sets of
capabilities to handle geo-referenced data (also referred to as subsystems of GIS):

1. Input

2. Data management (data storage and retrieval)

3. Manipulation and analysis, and

4. Output

INPUT

The data input subsystem collects and/or allows the user to enter spatial data
derived from existing maps, remote sensors, digital files, tables and air photos.

STORAGE AND RETRIEVAL (MANAGEMENT) subsystem organizes the spatial
data in a form that allows it to be quickly retrieved, updated and corrected.

DATA MANIPULATION AND ANALYSIS subsystem performs a variety of tasks
depending on the specific application. A GIS is distinguished from other
information systems by its functions that permit spatial analysis.

DATA OUTPUT subsystem is capable of displaying all the parts of the original
database as well as the manipulated data. The reporting subsystem must be able
to output data in a variety of forms (e.g. maps, digital files and tables).

1.3 COMPONENTS OF GIS

Also referred to as the anatomy of GIS.

The six components of a GIS are:

• Network
• Hardware
• Software
• Data
• Procedures
• People

The network – It is the most fundamental component of a GIS, without which no
rapid communication or sharing of digital information could occur except
between a small group of people crowded around a computer monitor. GIS today
relies heavily on the internet as a mechanism of information exchange.

Hardware – The device that the user interacts with directly in carrying out GIS
operations by typing, pointing, clicking or speaking, and which returns information
by displaying it on the device’s screen or generating meaningful sounds.

Software – This runs locally on the user’s machine.

Database – This consists of a digital representation of selected aspects of some
specific area of the earth’s surface or near-surface, built to serve some problem
solving or scientific purpose.

Procedures (management) – An organization must establish procedures, lines of
reporting, control points and other mechanisms for ensuring that its GIS activities
stay within budgets, maintain high quality, and generally meet the needs of the
organization.

People – GIS requires people to design, program, and maintain it, supply it with
data, and interpret its results. These people will have various skills depending on
the roles they perform.

1.4 BRIEF HISTORY OF GIS

• There is controversy about the history of GIS, since parallel developments took
place in North America, Europe and Australia, but much of the published history
focuses on the US contributions.

• What is certain is that the extraction of simple measures largely led to the
development of the first real GIS, namely the Canadian Geographic
Information System (CGIS), in the mid-1960s as a computerized map
measuring system.

• In the late 1960s, the US Bureau of the Census recognized a need for creating
digital records of all US streets to support automatic referencing and
aggregation of census records.

• In separate developments, cartographers and mapping agencies were
debating the use of computers to reduce the costs and shorten the
time taken to create a map.

• The UK Experimental Cartography Unit (ECU) pioneered high-quality
computer mapping in 1968 and published the world’s first computer-made
map in a regular series in 1973 with the British Geological Survey.

• National mapping agencies, e.g. Britain’s Ordnance Survey, France’s Institut
Géographique National, and the US Geological Survey and Defense Mapping
Agency (now the National Geospatial-Intelligence Agency), began to
investigate the use of computers to support the editing of maps, to avoid
the expensive and slow process of hand correction and redrafting.

• It was not until 1995 that the first country (Great Britain) achieved
complete digital map coverage in a database.

• Remote sensing also contributed to the development of GIS, as a source of
technology as well as a source of data.

• GIS took off in the early 1980s, when the price of computing hardware had
fallen to a level that could sustain a significant software industry and
cost-effective applications.

1.5 EXAMPLES OF APPLICATION AREAS OF GIS

Give examples of these application areas

2.0 DATA STRUCTURES AND DATA MODELS (DATA MANAGEMENT)

At the end of this course unit the student should be able to:

• Apply raster data structures in GIS

• Apply vector data structures in GIS

• Compare data structures

• Understand the basic principles of data management

• Comprehend conventional database management systems

• Comprehend spatial database management

2.1 DEFINITION OF TERMS

2.1.1 DATA AND INFORMATION

Data and information are often used synonymously, but they are not identical.
Data are what you collect through observation, measurement and inference.
Information is obtained after analysis and organization of data; information is
therefore data in a useful form: useful for solving problems and making decisions,
and useful for a certain user at a certain moment in time. The main role of
GIS is to convert data into information.

2.1.2 Spatial data – data that contains positional values. Geospatial data is
spatial data that is georeferenced.

2.1.3 Modeling is a buzzword used in many different ways and with many
different meanings. A representation of some part of the real world can be
considered as a model of that part.

Models, as representations, come in many different flavors. In the GIS
environment, the most familiar model is that of a map. A map is a miniature
representation of some part of the real world. Paper maps are the best known,
but digital maps also exist (to be discussed later).

Another important class of models is databases. A database stores a
usually considerable amount of data and provides various functions to operate
on the stored data.

2.1.4 Data modeling – is the common name for the design effort of
structuring a database.

2.1.5 Spatial databases – are a specific type of database. They store
representations of geographic phenomena in the real world to be
used in GIS.

2.1.6 GIS and databases – A database, like a GIS, is a software package
capable of storing and manipulating data. This raises the question of
when to use which, or possibly when to use both. Historically, these
systems have had different strengths, and the distinction remains today.

Databases are good at storing large quantities of data, they can deal with
multiple users at the same time, they support data integrity and system
recovery, and they have a high-level, easy-to-use data manipulation language.

GIS are not very good at any of this. GIS, however, is tailored to operate on
spatial data, and allows sorts of analysis that are inherently geographic in
nature.

2.2 DATA MODELS

Representing the real world

While looking at a landscape, it is possible to describe the view by breaking
down the landscape into units such as buildings, roads, fields, valleys or hills and
to use geographic referencing in terms of “beside”, “to the left” or “in front” to
describe the features. This is in fact the way to develop a conceptual model of
the landscape (fig. 2.0). When information needs to be exchanged over a large
domain, it becomes necessary to formalize the models used to describe an area
so that data are interpreted without ambiguity and communicated effectively.

Figure 2.0

Conceptual models of real world geographic phenomena

Geographic phenomena require two descriptors to represent the real world:
what is present, and where it is. For the former, phenomenological concepts such as
‘town’, ‘river’, ‘flood plain’, ‘ecotope’ and ‘soil association’ are used as fundamental
building blocks for analyzing and synthesizing complex information. These phenomena
are recognised and described in terms of well-established ‘objects’ or ‘entities’.

When considering any space (a room, a landscape or a continent) we may
adopt several fundamentally different ways to describe what is going on in that
subset of the earth’s surface.

The two extremes are:-

(a) To perceive space as being occupied by entities which are described by
their attributes or properties, and whose position can be mapped using a
geometric coordinate system, or

(b) To imagine the variation of an attribute of interest over the space as some
continuous mathematical function or field.

ENTITIES – The most common view is that space is peopled with ‘objects’
(entities). Defining and recognizing the entity (is it a house, a cable, a forest, a
river, a mountain?) is the first step; listing its attributes and defining its
boundaries is the second.

CONTINUOUS FIELDS – Here the simplest conceptual model represents geographic
space in terms of continuous Cartesian coordinates in 2 or 3 dimensions, or 4
(with time).

The attribute is assumed to vary smoothly and continuously over the space. The
attribute (e.g. air pressure, temperature, elevation above sea level) and its spatial
variation is considered first; only when there are remarkable clusters of like
attribute values in geographical space or time as with hurricanes or mountain
peaks or ‘significant events’ will these zones be recognized as things.

Geographic data models and geographical data primitives

Geographic data models are the formalized equivalents of the conceptual models
used by people to perceive geographic phenomena. Most anthropogenic
phenomena (houses, land parcels, administrative units, roads, cables, pipelines,
agricultural fields) can be handled best using the entity approach. The simplest and
most frequently used data model of reality is a basic spatial entity, which is
further specified by attributes and geographical location. This can be further
subdivided according to one of the three basic geographical data primitives,
namely a point, a line or an area (also known as a polygon in GIS). These are the
fundamental units of the vector data model.

An alternative means of representing entities is to use tessellations of regularly
shaped polygons, that is, sets of pixels.

With continuous field data, although the variation of attributes such as
elevation, air pressure, temperature or clay content of the soil is assumed to be
continuous in 2 or 3 dimensions and also in time, the variation is generally too
complex to be captured by a simple mathematical function such as a polynomial
equation, so it is generally necessary to divide geographic space into discrete
spatial units. The resulting tessellation is taken as a reasonable approximation of
reality at the level of resolution under consideration.

Both the entity and tessellation models assume that phenomena can be
specified exactly in terms of both their attributes and spatial position. There are
some situations where these data models are acceptable representations of
reality, but there will be many others where uncertainties force us to choose the
other option.

RASTERS AND VECTORS

Continuous fields and discrete objects define two conceptual views of geographic
phenomena, but they do not solve the problem of digital representation.

Two methods, called raster and vector, are used to reduce geographic phenomena
to forms that can be coded in computer databases. In principle, both can be
used to code both fields and discrete objects, but in practice there is a strong
association between raster and fields and between vector and discrete objects.

RASTER DATA

In raster representation, space is divided into an array of rectangular (usually
square) cells (fig. 2.1). All geographic variation is then expressed by assigning
properties or attributes to these cells. The cells are sometimes called pixels (short
for picture elements).

One of the commonest forms of raster data comes from remote sensing satellites,
which capture information in this form and send it to the ground to be distributed
and analysed. Other similar data can be obtained from sensors mounted on
aircraft.

Figure 2.1: Raster data model (legend: mixed conifer, Douglas fir, oak savannah, grassland)

Square cells fit together nicely on a flat table or a sheet of paper, but they will not
fit together neatly on the curved surface of the earth. Just as representations
on paper require that the earth be flattened or projected, so too do rasters. Because
of the distortions associated with flattening, the cells in a raster can never be
perfectly equal in shape or area on the Earth’s surface.

Many of the terms that describe raster suggest the laying of a tile floor on a flat
surface. We talk of raster cells tiling an area, and a raster is said to be an instance
of a tessellation, derived from the word for a mosaic.

When information is represented in raster form, all detail about variation within
cells is lost, and instead the cell is given a single value. Suppose we want to
represent the map of the counties of Texas as a raster. Each cell would be given a
single value to identify a county, and we would have to decide the rule to apply
when a cell falls in more than one county. Often the rule is that the county with
the largest share of the cell’s area gets the cell. Sometimes the rule is based on
the central point of the cell, and the county at that point is assigned to the whole
cell (fig. 2.2). The largest share rule is almost always preferred, but the central
point rule is sometimes used in the interest of faster computing and is often used
in creating raster datasets of elevation.
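
The two assignment rules can be compared with a small sketch. It is only an illustration under simplifying assumptions: the three "counties" A, B and C are hypothetical axis-aligned rectangles given as (xmin, ymin, xmax, ymax), and the function names are made up for this example rather than taken from any GIS package.

```python
# Hypothetical counties, each an axis-aligned rectangle (xmin, ymin, xmax, ymax).
counties = {
    "A": (0.0, 0.0, 0.4, 1.0),
    "B": (0.4, 0.55, 1.0, 1.0),
    "C": (0.4, 0.0, 1.0, 0.55),
}


def overlap_area(r1, r2):
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    width = min(r1[2], r2[2]) - max(r1[0], r2[0])
    height = min(r1[3], r2[3]) - max(r1[1], r2[1])
    return max(width, 0.0) * max(height, 0.0)


def largest_share(cell):
    """Largest share rule: the county covering most of the cell gets the cell."""
    return max(counties, key=lambda name: overlap_area(cell, counties[name]))


def central_point(cell):
    """Central point rule: the county at the cell centre gets the cell."""
    cx = (cell[0] + cell[2]) / 2
    cy = (cell[1] + cell[3]) / 2
    for name, (xmin, ymin, xmax, ymax) in counties.items():
        if xmin <= cx < xmax and ymin <= cy < ymax:
            return name
    return None  # the centre falls outside every county


cell = (0.0, 0.0, 1.0, 1.0)
print(largest_share(cell))   # county with the biggest overlap
print(central_point(cell))   # county found at the cell centre
```

In this made-up layout the two rules disagree for the cell: county A covers the largest share of it, while the cell centre lies in county C. The central point rule needs only one point-in-area test per cell, which is why it is the cheaper of the two.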

Figure 2.2: Raster data. Each cell holds a single value and all detail within the
cell is lost (panels: largest share rule, central point rule)

REGULAR TESSELLATIONS

A tessellation (or tiling) is a partition of space into mutually exclusive cells that
together make up the complete study space. With each cell, some thematic value
is associated to characterize that part of space. Three regular tessellation types
are illustrated in fig. 2.3.

In a regular tessellation, the cells are the same shape and size. The simplest
example is a regular raster of unit squares, represented in a computer in the 2D
case as an array of n x m elements.

Figure 2.3: Regular tessellations (square cells, hexagonal cells, triangular cells)

The square cell tessellation is by far the most commonly used, mainly because
georeferencing a cell is so straightforward. Square regular tessellations are
known in GIS as rasters or raster maps.

The size of the area that a raster cell represents is called the raster’s resolution.
Sometimes the word grid is also used, but a grid is an equally spaced collection of
points which all have some attribute value assigned. Grids are often considered
synonymous with rasters.

IRREGULAR TESSELLATIONS

Although regular tessellations provide simple structures with straightforward
algorithms, they are not adaptive to the phenomenon they represent. This is
why irregular tessellations are used: they partition space into mutually disjoint
cells, but these cells may vary in size and shape, allowing them to adapt to
the spatial phenomena that they represent.

Irregular tessellations are more complex than the regular ones but they are also
more adaptive, which typically leads to a reduction in the amount of memory
used to store data. A well-known data structure in this family is the region
quadtree. It is based on a regular tessellation of square cells, but takes advantage
of cases where neighbouring cells have the same field value, so that they can
together be represented as one big cell (fig. 2.4).
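
The idea behind the region quadtree can be sketched in a few lines. This is a minimal illustration, assuming a square raster whose side is a power of two and stored as a list of rows; the function names and the sample values are invented for the example.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a single value if the block is uniform, otherwise a list of
    four subtrees in the order NW, NE, SW, SE."""
    if size is None:
        size = len(grid)  # assumption: square grid, side is a power of two
    first = grid[row][col]
    uniform = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        return first  # the whole block is stored as one big cell
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW quadrant
        build_quadtree(grid, row, col + half, half),         # NE quadrant
        build_quadtree(grid, row + half, col, half),         # SW quadrant
        build_quadtree(grid, row + half, col + half, half),  # SE quadrant
    ]


def count_leaves(node):
    """Number of cells actually stored in the quadtree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


grid = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["A", "A", "C", "B"],
    ["A", "A", "B", "B"],
]
tree = build_quadtree(grid)
print(tree)
print(count_leaves(tree), "leaves instead of", len(grid) * len(grid[0]), "cells")
```

Only the south-east quadrant of this sample raster has to be split down to single cells; the other three quadrants are uniform and are each stored as one leaf.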

Figure 2.4: Irregular tessellations (quadtree)

Characteristics of a raster
Important characteristics of a raster layer are its:-
1. Resolution
2. Orientation
3. Zones, and
4. Location
Resolution – Burrough (1986:182) defines resolution as the smallest size of a
feature that can be mapped or sampled. Spatial resolution is the minimum linear
dimension of the smallest unit of geographic space for which data are recorded.
The size of the cells contained in a raster can vary; the spatial resolution
of the data is therefore determined by the cell size of the grid. High resolution
refers to rasters with small cell dimensions, lots of detail and lots of cells; low
resolution refers to rasters with large cell dimensions, less detail and a small
number of cells (figure 2.5).

Figure 2.5: High resolution (small pixels) and low resolution (large pixels)

Orientation – refers to the angle between true north and the direction defined by
the columns of the raster. Raster columns usually run north-south. However,
this may vary depending on the user’s application.

Zone – Often adjacent cells in a raster have the same value. Contiguous
locations (cells that touch each other) having the same value can be grouped into
zones. Cells in the same zone have the same value. By defining zones within the
raster, area and perimeter calculations may be performed.
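
As a rough sketch of how zones might be derived, the code below groups cells that share an edge and have the same value, then computes area and perimeter for a zone. Treating only edge-sharing cells as "touching" is an assumption made here (diagonal neighbours could also be counted), and the raster values and the 10 m cell size are invented for the example.

```python
def find_zones(raster):
    """Group edge-adjacent cells with equal values into zones.

    Returns a list of (value, cells) pairs, where cells is a set of
    (row, col) positions, in the order the zones are first met."""
    rows, cols = len(raster), len(raster[0])
    seen = set()
    zones = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            value = raster[r][c]
            cells = set()
            stack = [(r, c)]
            seen.add((r, c))
            while stack:  # flood fill over equal-valued neighbours
                cr, cc = stack.pop()
                cells.add((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and (nr, nc) not in seen and raster[nr][nc] == value:
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            zones.append((value, cells))
    return zones


def zone_area(cells, cell_size):
    """Area of a zone: number of cells times the area of one cell."""
    return len(cells) * cell_size ** 2


def zone_perimeter(cells, cell_size):
    """Perimeter of a zone: count cell edges not shared with the same zone."""
    edges = 0
    for r, c in cells:
        for neighbour in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if neighbour not in cells:
                edges += 1
    return edges * cell_size


raster = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
]
zones = find_zones(raster)
value, cells = zones[0]
print(value, zone_area(cells, 10), zone_perimeter(cells, 10))
```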

Location – In a raster, location is generally identified by an ordered pair of
coordinates (row and column numbers) that identify the location of the cell within
the raster.
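
Row and column numbers can be related to map coordinates once the raster is georeferenced. The sketch below assumes a north-up raster with square cells whose origin is the upper-left corner of cell (0, 0); the function names, the origin coordinates and the 30 m cell size are hypothetical values chosen for illustration.

```python
def cell_center(row, col, x_origin, y_origin, cell_size):
    """Map coordinates of the centre of cell (row, col).

    Rows are counted downwards from the top, so y decreases as row grows."""
    x = x_origin + (col + 0.5) * cell_size
    y = y_origin - (row + 0.5) * cell_size
    return x, y


def cell_of(x, y, x_origin, y_origin, cell_size):
    """Row and column of the cell that contains map position (x, y)."""
    col = int((x - x_origin) // cell_size)
    row = int((y_origin - y) // cell_size)
    return row, col


# Hypothetical raster: upper-left corner at (500000, 9000000), 30 m cells.
origin = (500000.0, 9000000.0, 30.0)
print(cell_center(2, 3, *origin))
print(cell_of(500105.0, 8999925.0, *origin))
```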

VECTOR DATA

The vector data model represents space as a series of discrete entities defined by
point, line or polygon units, which are geographically referenced by Cartesian
coordinates.

Point representation – A point entity implies that the geographic extents of the
object are limited to a location that can be specified by one set of xy coordinates
at the level of resolution of the abstraction. e.g. a town could be represented by a
point entity at a continental level of resolution but as a polygon entity at a
regional level.

Line representation – A line entity implies that the geographical extents of the
object may adequately be represented by sets of xy coordinate pairs that define a
connected path through space, but one that has no true width unless specified in
terms of an attached attribute e.g. a road at a national level is adequately
represented by a line; at street level, it becomes an area.

Figure 2.6

Polygon – A polygon is a homogeneous representation of a 2D space. This too
depends on the level of resolution. The polygon can be represented in terms of
the xy coordinates of its boundaries, or in terms of the set of xy coordinates that
are enclosed by such a boundary. Polygons may contain holes, they have direct
neighbours, and different polygons with the same characteristics can occur at
different locations (see fig. 2.7).
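
The three vector primitives can be written down directly as coordinates. The sketch below is an illustration only: the coordinate values are invented, the polygon is a simple ring without holes, and the helper names are not from any particular GIS library.

```python
import math

# A point is one xy pair, a line is an ordered list of xy pairs,
# and a polygon is stored here as the list of its boundary vertices.
point = (3.0, 4.0)
line = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]


def line_length(coords):
    """Length of a line: sum of the distances between successive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Area enclosed by a simple boundary ring (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(line))
print(polygon_area(polygon))
```

Note that the line has length but no width, and the point has neither; any width or extent would have to be carried as an attribute, as described above.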

Figure 2.7: Vector data

TOPOLOGY AND SPATIAL RELATIONSHIPS

Types of vector data models are the spaghetti model, the topological model
and the triangulated irregular network.

SIMPLE FEATURES (SPAGHETTI MODEL)

Geographic entities encoded using the vector data model are usually called
features. Features of the same geometric type are stored in a geographic
database as a feature class. GIS commonly deal with two types of feature: simple
and topological. The structure of simple feature polyline and polygon datasets is
sometimes called spaghetti because, like a plate of cooked spaghetti, lines
(strands of spaghetti) and polygons (spaghetti hoops) can overlap and there are
no relationships between any of the objects.

Simple feature datasets are useful in GIS applications because they

• are easy to create

• are easy to store

• can be retrieved and rendered on the screen very quickly.

On the other hand, they lack advanced data structure characteristics such as
topology. Therefore:

• Lack of connectivity → restricts analysis

• Adjacent polygon boundaries are stored twice, resulting in duplication

• Data redundancy

• Errors can easily creep in

• Inflexible structure: difficult to dissolve polygons

• Polygons can overlap → restricts applications

TOPOLOGICAL FEATURES (TOPOLOGICAL MODEL)

Topological features are simple features structured using topological rules.
Topology is the mathematics and science of geometrical relationships. Topological
relationships are non-metric (qualitative) properties of geographic objects that
remain constant when the geographic space of objects is distorted. For example,
when a map is stretched, properties such as distance and angle change, whereas
topological properties such as adjacency and containment do not. Topology is
important in GIS because of its role in:

• data validation

• modeling integrated feature behaviour

• query optimization.

A common form of the topological data model is the arc-node model. The basic
spatial object of the arc-node model is the arc. An arc is a series of points that
starts and ends at a node. A node is an intersection point where two or more arcs
meet. A node can also occur at the end of a “dangling arc”, where the arc does not
meet another arc. This topological data structure assigns direction to lines between
endpoints (arcs and nodes), and areas are labeled and defined by a closed series of
segments. By assigning direction to lines, one is given the ability to define what is
connected to what and which areas are bordered by the line. Figure 2.8 illustrates
how relationships are defined.

Figure 2.8: Topology of vector data

This type of data model is especially suited to spatial analysis. Common spatial
analytical functions include the examination of explicit relationships, including
adjacency (contiguity) and connectivity.
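
A tiny arc-node table shows how direction makes adjacency and connectivity explicit. The layout is hypothetical: two polygons, P1 and P2, share one boundary arc; each arc records its from-node, its to-node and the polygon on its left and right when travelling in that direction, with None standing for the area outside the map. The field and function names are invented for this sketch.

```python
# Hypothetical arc table for two polygons that share arc a1.
arcs = {
    "a1": {"from": "n1", "to": "n2", "left": "P1", "right": "P2"},  # shared edge
    "a2": {"from": "n2", "to": "n1", "left": "P1", "right": None},  # rest of P1
    "a3": {"from": "n1", "to": "n2", "left": "P2", "right": None},  # rest of P2
}


def adjacent_polygons(arc_table):
    """Adjacency: pairs of polygons found on either side of the same arc."""
    pairs = set()
    for arc in arc_table.values():
        if arc["left"] is not None and arc["right"] is not None:
            pairs.add(frozenset((arc["left"], arc["right"])))
    return pairs


def arcs_at_node(arc_table, node):
    """Connectivity: all arcs that start or end at the given node."""
    return sorted(
        name for name, arc in arc_table.items() if node in (arc["from"], arc["to"])
    )


def boundary_of(arc_table, polygon):
    """Arcs that border the given polygon on either side."""
    return sorted(
        name for name, arc in arc_table.items()
        if polygon in (arc["left"], arc["right"])
    )


print(adjacent_polygons(arcs))
print(arcs_at_node(arcs, "n1"))
print(boundary_of(arcs, "P1"))
```

Because the shared boundary is stored once (arc a1) with a polygon on each side, adjacency is answered by a table lookup rather than by comparing coordinates, which is the contrast with the spaghetti model.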

Triangulated Irregular Network (TIN)

The Triangulated Irregular Network model is used to represent terrain data. In this
case, the surface of the earth is represented by a set of irregular triangular cells.
For each vertex that defines a triangular cell there are three values: x, y and
z. The x and y values represent the horizontal location and the z value represents
elevation.
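
One common use of a TIN is to estimate elevation inside a triangle from its three vertices. The sketch below does this with linear (barycentric) interpolation, which is an assumption made here for illustration rather than a method stated in these notes; the vertex values are invented.

```python
def interpolate_z(triangle, x, y):
    """Estimate z at (x, y) from a triangle given as three (x, y, z) vertices.

    Returns None when (x, y) lies outside the triangle."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    # Barycentric weights: each weight is the share of one vertex.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # the point is not inside this triangle
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangular cell: (x, y, elevation) for each vertex.
triangle = [(0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0)]
print(interpolate_z(triangle, 2.0, 2.0))
print(interpolate_z(triangle, 10.0, 10.0))
```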

Figure 2.9: Topological relationships 1

Figure 2.10: Topological relationships 2

Figure 2.11: Geometry and topology of raster data (point)

COMPARISON OF DATA STRUCTURES

Vector and Raster representations

Figure 2.12: Geometry and topology of raster data (area)

ADVANTAGES AND DISADVANTAGES OF VECTOR DATA MODELS

Advantages                     Disadvantages
- Precise expression           - Complicated structure
- Less data volume             - Difficulty in overlay
- Full topology                - Difficulty in updating
- Fast retrieval               - Expensive data capture
- Fast conversion

ADVANTAGES AND DISADVANTAGES OF RASTER DATA MODELS

Advantages                     Disadvantages
- Simple data structure        - Large data volume
- Easy to overlay and model    - Low precision
- Suitable for 3D display      - Difficulty in network analysis
- Integration of image data    - Slow conversion
- Automated data capture

Organising spatial data

The main principle of data organization in GIS is that of a spatial data layer. A
spatial data layer is a representation of a continuous or discrete field, or a
collection of objects of the same kind. The data is organized by kind, i.e. all
telephone booth objects would be in a single data layer, all road line objects in
another one (Figure 2.13). Attribute data is normally arranged in tabular form. Data
layers can be overlaid with each other inside the GIS package, so as to study
combinations of geographic phenomena.

Figure 2.13: Layer concept
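
Overlaying layers can be sketched with two small raster layers that cover the same cells. Everything in this example is hypothetical: the land cover and slope values, the 15 degree threshold and the overlay helper are chosen only to show how two layers combine cell by cell.

```python
# Two hypothetical layers on the same 2 x 3 grid of cells.
land_cover = [
    ["forest", "forest", "grass"],
    ["forest", "grass", "grass"],
]
slope_deg = [
    [5, 20, 25],
    [18, 30, 2],
]


def overlay(layer_a, layer_b, combine):
    """Combine two layers cell by cell using the given function."""
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Cells that are both forested and steeper than 15 degrees.
steep_forest = overlay(
    land_cover, slope_deg, lambda cover, slope: cover == "forest" and slope > 15
)
for row in steep_forest:
    print(row)
```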

Temporal dimensions

Spatiotemporal data

Besides having geometric, thematic and topological properties, geographic
phenomena change over time; we say that they have temporal characteristics. For
many applications, it is change over time that is the most interesting aspect of the
phenomenon to study. This is commonly known as change detection. Change
detection addresses such questions as:

• Where and when did change take place?

• What kind of change occurred?

• With what speed did change occur?

• What else can be understood about the pattern of change?
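
The first three questions can be illustrated by comparing two raster layers of the same area at two dates. The layers, the class names and the dates (2000 and 2020) below are invented for the sketch, and the function name is not from any GIS package.

```python
def detect_change(before, after):
    """Return {(row, col): (old_value, new_value)} for every changed cell."""
    changes = {}
    for r, (row_before, row_after) in enumerate(zip(before, after)):
        for c, (old, new) in enumerate(zip(row_before, row_after)):
            if old != new:
                changes[(r, c)] = (old, new)
    return changes


# Hypothetical land cover of the same four cells at two dates.
cover_2000 = [["forest", "forest"], ["forest", "grass"]]
cover_2020 = [["forest", "urban"], ["grass", "grass"]]

changes = detect_change(cover_2000, cover_2020)
total_cells = len(cover_2000) * len(cover_2000[0])
changed_fraction = len(changes) / total_cells
rate_per_year = changed_fraction / (2020 - 2000)

print(changes)           # where change took place and what kind it was
print(changed_fraction)  # share of the area that changed
print(rate_per_year)     # average speed of change per year
```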

Homework 3

Write short notes on georeferencing and explain its relationship to spatial data
input into a GIS.

3.0 OVERVIEW OF DATA INPUT AND DATA PREPARATION
CONTENT
Geospatial data acquisition
Concepts of georeferencing
Issues in data preparation

Objectives
At the end of this topic the learner should be able to:
• Appreciate the major data sources for GIS.
• Understand data acquisition techniques, namely manual and automatic
methods.
• Understand the concept of georeferencing.
• Understand the importance of data preparation.

DATA INPUT AND OUTPUT


The first step in using a GIS is to provide it with data. The acquisition and
preprocessing of spatial data are expensive and time-consuming.
Implementing a GIS implicitly entails a “database build” phase. The data
incorporated into your GIS must come from some source. A wide variety of data
sources exist for both spatial and attribute data. The most common general
sources for spatial data are:

• Hardcopy maps (analogue maps)
• Aerial photographs
• Remotely sensed imagery
• Point data samples from surveys
• Existing digital files

Attribute data has an even wider variety of data sources. Any textual or tabular
data that can be referenced to a geographic feature, e.g. a point, line or area, can
be input into a GIS.

Categories of data sources

Data can come from external commercial, non-profit, educational, and


governmental sources, other GIS software users, as well as from your own
organization (internal). Geographic data may be obtained in either digital or
analogue format. Analogue data (hardcopy) must always be digitized before being
added to a geographic database.

Vector and raster geographic data can be classified as primary or secondary: -

Table 3.0: Classification of geographic data for data collection purposes (source:
Longley et al 2001)

            Raster                                  Vector
Primary     • Digital remote sensing images         • GPS measurements
            • Digital aerial photographs            • Survey measurements
Secondary   • Scanned maps or photographs           • Topographic maps
            • Digital elevation models from maps    • Toponymy (placename) databases

A descriptive classification of the origin of data:

• Measured data – Physically collected data, such as surveyors determining
the location of a pipeline.
• Inferred data – This is data calculated from other data, e.g. the type of crops
in a given field, which is inferred from electromagnetic radiation.
• Imported (and converted) data – This is data imported from, or
converted from, disparate digital sources.

TECHNIQUES OF DATA COLLECTION


The process of data collection is also variously referred to as data capture, data
automation, data conversion, data transfer, data translation and digitizing.

a) PRIMARY GEOGRAPHIC DATA CAPTURE


Primary geographic data capture involves the direct measurement of objects.
Both raster and vector data capture methods are available.
1. Raster data capture – The most popular form of primary raster
data capture is remote sensing, a technique used to derive information
about the physical, chemical and biological properties of objects without
direct physical contact.
2. Vector data capture – The two main branches of vector data capture are
ground surveying and the Global Positioning System (GPS).

b) SECONDARY GEOGRAPHIC DATA CAPTURE


Although data conversion from secondary sources is not always ideal, there are
substantial advantages in cost and speed over 100% field data collection.

Geographic data capture from secondary sources is the process of creating raster
and vector files and databases from maps and other hardcopy documents.

Scanning is used to capture raster data. Table digitizing, heads-up digitizing,
stereo-photogrammetry and COGO (Coordinate Geometry) data entry are used for
vector data.

Raster data capture

Raster data capture of secondary data can be done by manual gridding or
scanning.
Manual gridding
Entering data manually by keyboard, pixel by pixel into a raster structure, is a
tedious and time-consuming method of data entry. Entering data in this manner
involves choosing a cell size and creating a transparent raster. This raster is laid
over the document to be coded and the value of the cell is written down or typed
into a file. An encoding system must be devised to define the attributes to be
entered into a raster.

The following figure illustrates the manner in which data could be coded. Note
the redundancy and volume of data required to code the document.

[Source map with regions A, B, C and D, plus a grid overlay]

GRID MAP

A A A B B B
A A B B B B
C C D D B B
C C C D D B
C C C D D D

Figure 3.0: A grid overlay used to manually enter data into a raster GIS database
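
The redundancy in the grid map of Figure 3.0 can be made visible by counting runs of repeated values in each row. Run-length encoding is used here only as an illustration of that redundancy; it is not a step of the manual gridding procedure described above, and the function name is made up for the example.

```python
from itertools import groupby

# The grid map of Figure 3.0, one string per row, one letter per cell.
grid_rows = ["AAABBB", "AABBBB", "CCDDBB", "CCCDDB", "CCCDDD"]


def run_length_encode(row):
    """Replace each run of identical cell values by a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(row)]


encoded = [run_length_encode(row) for row in grid_rows]
total_cells = sum(len(row) for row in grid_rows)
total_runs = sum(len(runs) for runs in encoded)

for runs in encoded:
    print(runs)
print(total_cells, "cell values typed in, but only", total_runs, "runs of values")
```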
Vector data capture
Secondary vector data capture involves digitizing vector objects from maps and
other geographic data sources. The most popular methods are manual digitizing,
heads-up digitizing and vectorization, photogrammetry, and COGO data entry.

Types of scanners are desktop scanners, drum scanners and large-format feed
scanners.

Digitizing
This is the simplest, cheapest and most commonly used means of capturing vector
objects from hardcopy maps. A digitizer is an electronic device consisting of a
table upon which the map or drawing is placed. Digitizers come in different
designs, sizes and shapes. There are two forms of digitizing, namely:

- On-table

- On screen

Manual digitizing (on-table)

Manual digitizing is the most common method of inputting positional data from
maps and photographs. Digitizing is the process of recording the XY coordinates
of spatial features (points, lines, arcs) in a set of computer files. The paper
map, air photo or similar document is taped to the active area of the digitizing
table, ensuring that there are no creases or buckles. The coordinates of a point
on the surface of the table are sent to the computer by a hand-held magnetic
pen, called a cursor or puck. The digitizing table electronically encodes the
position of the puck, relative to the affixed map, with a precision of a
fraction of a millimeter. To do so, the table uses a fine grid of wires embedded
in its surface (figure 3.1).

Figure 3.1

Map registration
This is the process in which geometric transformations are used to assign ground
coordinates (e.g. UTM) to a map. Positions on the digitizer tablet, as
represented by their location on the map and measured in digitizer coordinates,
are correlated to positions on the Earth’s surface as measured in real world
units (meters). When the features on a map are digitized, the digitizer
coordinates of each data point are converted to Easting and Northing
coordinates and then stored in the database (Digital Resource system Limited,
1991).

The maps must be referenced to a particular coordinate system. The map to be
digitized must have reference positions (control points) that are coordinated to
the same positions in the computer generated data layer (coverage or theme).
Three or more control points are determined for each map sheet. These points
must be easy to identify (e.g. intersections of major streams, or intersections
of map neat lines). The coordinates (XY values) of these points must be known in
the coordinate system to be used in the final database. For example, if the
database is referenced to the UTM coordinate system, the positions of the
control points in the real world must be known in, or transformed to, the UTM
coordinate system. If the coordinates of the corners of the map are known, they
can be used to register the map.

The control points are used by the system to calculate the necessary
mathematical transformations to convert all coordinates to the final system.
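
As an illustration of such a transformation, the sketch below fits a six-parameter affine transformation (E = a·x + b·y + c, N = d·x + e·y + f) to exactly three control points and applies it to a digitized point. The choice of an affine model, the function names and the control point values are assumptions made for this example; a real system may use a different model, and with more than three control points it would normally use a least-squares fit.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m * p = v using Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(det(replaced) / d)
    return solution

def fit_affine(digitizer_pts, ground_pts):
    """Return (a, b, c, d, e, f) from three matching control points."""
    m = [[x, y, 1.0] for x, y in digitizer_pts]
    a, b, c = solve3(m, [easting for easting, _ in ground_pts])
    d, e, f = solve3(m, [northing for _, northing in ground_pts])
    return (a, b, c, d, e, f)

def to_ground(params, x, y):
    """Convert digitizer coordinates to Easting and Northing."""
    a, b, c, d, e, f = params
    return (a * x + b * y + c, d * x + e * y + f)

# Hypothetical control points: digitizer centimetres and UTM metres.
digitizer = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ground = [(250000.0, 9850000.0), (252500.0, 9850000.0), (250000.0, 9852500.0)]

params = fit_affine(digitizer, ground)
centre = to_ground(params, 5.0, 5.0)
print(centre)
```

In this made-up case one centimetre on the tablet corresponds to 250 m on the ground, so the point digitized at (5, 5) falls halfway between the control points.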

Digitizing a map’s contents can be done in two different modes: point mode and
stream mode. In point mode, the operator identifies the points to be captured
explicitly by pressing a button on the puck. In stream mode, points are captured
at set time intervals (typically 10 per second) or whenever the cursor has moved
by a fixed distance. Point mode is used more often because it is easier to
control and less prone to shaky hand movements.

Another set of techniques works from a scanned image of the original map and
uses the GIS to find features in the image. These techniques are known as
semi-automatic or automatic digitizing, depending on how much operator
interaction is required. When vector data is to be distilled from a scanned
image, this procedure is less labour intensive than manual digitizing, but it
can only be applied to relatively simple sources.

Semi-Automatic and Automatic


 Whether digitizing is semi-automatic or automatic depends on how much human
intervention is required. For instance, data capture by simply scanning an image is
fully automatic, since there is no human intervention. In semi-automatic digitizing
the operator gives instructions at the beginning, e.g. the starting point, and the rest
is done by the system.

Semi-automatic Automatic

Figure 3.2

Typical Digitizing Errors

Figure 3.3

The scanning process

A scanner is a device that converts hardcopy analogue media into digital images
by scanning successive lines across a map or document and recording the amount
of light reflected from a local data source. Digital scanners have a fixed maximum
resolution, expressed as the highest number of pixels that they can identify per
inch; the unit is dots per inch (dpi).
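
To see what a given scanner resolution means on the ground, the size of one pixel on paper can be multiplied by the scale denominator of the map. The sketch below does this arithmetic; the function names and the example values (300 dpi, scale 1:25,000) are chosen for illustration only.

```python
MM_PER_INCH = 25.4

def pixel_size_mm(dpi):
    """Size of one scanned pixel on the paper map, in millimetres."""
    return MM_PER_INCH / dpi

def ground_resolution_m(dpi, scale_denominator):
    """Ground distance covered by one pixel, in metres."""
    return pixel_size_mm(dpi) * scale_denominator / 1000.0

# A 1:25,000 map scanned at 300 dpi.
print(round(pixel_size_mm(300), 4))
print(round(ground_resolution_m(300, 25000), 2))
```

At 300 dpi each pixel covers about 0.085 mm of paper, which on a 1:25,000 map is roughly 2.1 m on the ground.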

Scanning or scan digitizing is a more automatic method than manual digitizing.
Using a scanner, the operator automatically extracts the spatial data from maps
and photographs. Scan digitizing is thus considerably less labour intensive than
conventional manual digitizing.
A fax machine is probably the most familiar scanning device. Scanning involves
systematically sampling the source document using transmitted or reflected
light. A digital image of the analog document (map) is produced by moving an
electronic detector across the surface of the document.

Scanners are separated into two types: those that scan the map in raster mode
and those that scan lines by following them directly (vector scanners).
Vector scanning, or automatic line following, is used to scan off a
continuous string of coordinates associated with lines on a map. The operator
guides a light beam to the start point of a line. The beam then follows the
line automatically until it meets a junction or returns to the starting point of
that line. Once the line has been scanned, a second laser paints out the scanned
line.

Laser scanning works on the principle that any point or part of the document to
be scanned may have only one of two colours, black or white. The scanned cell
contains intensity values ranging from zero for black to 255 for white. The scanner
or the document moves systematically back and forth. A low-power laser and a
television camera with a high resolution lens record the two colours. The step size
controls the cell size. The resulting raster data is a huge number of pixels that
have been coded either black or white. Colour raster scanning can also be
achieved, whereby each cell contains the intensity of red, green and blue light.

Scanners for Raster Data Input

Figure 3.4

Mechanical Scanner

Figure 3.5

Obtaining spatial data elsewhere

Various spatial data sources are available from elsewhere, though sometimes at a
price. It all depends on the nature, scale and date of production that one requires.
Topographic base data is easier to obtain than elevation data, which in turn is
easier to get than natural resources or census data. Obtaining large scale data is
more difficult than obtaining small scale data, and recent data is more difficult
to obtain than older data.

Some of this data is only available commercially, as is usually the case with
satellite imagery.

National Mapping Organisations (NMOs) have historically been the most important
spatial data providers, but governments can no longer maintain these large
institutions and are looking for alternatives for the nation’s spatial data
production. Private companies will enter the market of providing spatial data,
and for GIS application people this will mean that they no longer have a single
provider.

Clearing houses – As digital data provision is an expertise by itself, many of the
above mentioned organizations dispatch their data via centralized places,
essentially creating a market place where potential data users can ‘shop’. Such
markets for digital data have an entrance through the World Wide Web and
are called spatial data clearing houses. In addition to data they provide a
description of the data available, e.g. type, date of acquisition, mode of
acquisition, etc.: information that helps the user to assess the quality of the
data.

CONCEPTS OF GEO-REFERENCING

 The need to integrate and combine data sets that were acquired using different
techniques and have different references makes it necessary to reference them
to one system, so that the data can be manipulated effectively.
 Geospatial referencing involves the definitions, physical/geometric
constructs and tools required to describe the geometry and motion of
objects near or on the earth’s surface.
 The map legend in most cases contains this information, e.g.:
 Name of the local vertical datum, e.g. Tide gauge Mombasa
 Name of the local horizontal datum, e.g. Potsdam
 Name of the reference ellipsoid and fundamental point, e.g.
Bessel ellipsoid
 Types of co-ordinates associated with the map grid lines, e.g.
geographic co-ordinates, plane co-ordinates, etc.
 Map projection, e.g. Universal Transverse Mercator
 Map scale, e.g. 1:25,000
 Transformation parameters, e.g. from a global datum to a local
horizontal datum
 Some understanding of spatial referencing is important, since it enables
the user of spatial data to understand the problems associated with
incompatibility of data.

DATA PREPARATION
Spatial data preparation aims to make the acquired spatial data fit for use.

 Images may require enhancements and corrections of the classification
scheme of the data.

 Vector data may require editing, such as trimming overshoots of lines at
intersections, deleting duplicate lines, closing gaps in lines and generating
polygons.

 Data may need to be converted to either vector format or raster format to
match other datasets.

 Attribute data may need to be associated with the spatial data, through
manual input or by reading digital attribute files into the GIS/DBMS.

Data checks and repairs

Acquired datasets must be checked for consistency and completeness. This
requirement applies to the geometric and topological quality as well as the
semantic quality of the data.

Different approaches to data clean-up exist:

 Errors can be identified automatically, after which manual editing methods
can be applied to correct them.

 The system may identify and automatically correct many errors.

Clean-up operations are performed in a standard sequence. For example, crossing
lines are split before dangling lines are erased, and nodes are created at
intersections before polygons are generated. A number of clean-up operations
are shown below (figure 3.6).

Examples of data repairs for vector data

Figure 3.6
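
One of the checks behind such repairs can be sketched in a few lines: a line end that is shared with no other line is a candidate dangle. The example assumes that crossing lines have already been split, so that lines only meet at their end points; the function name and the coordinates are hypothetical.

```python
from collections import Counter

def dangling_nodes(lines):
    """Return the end points that are used by exactly one line."""
    degree = Counter()
    for start, end in lines:
        degree[start] += 1
        degree[end] += 1
    return sorted(node for node, count in degree.items() if count == 1)

# Three lines forming a closed triangle, plus one overshoot.
lines = [
    ((0, 0), (4, 0)),
    ((4, 0), (4, 4)),
    ((4, 4), (0, 0)),
    ((4, 4), (6, 5)),   # ends at a point that no other line touches
]

print(dangling_nodes(lines))
```

Whether a reported dangle is really an error (an overshoot or undershoot) or a legitimate dead end, such as a cul-de-sac, is a decision for the operator or for a tolerance rule.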

With polygon data one starts with many polylines that are combined in the first
step (figure 3.7 (a) to (b)). This results in fewer polylines (with more internal
vertices). Then, polygons can be identified (c). Sometimes polylines do not
connect to form closed boundaries, and therefore must be connected; this step is
not indicated in the figure. In a final step, the elementary topology of the
polygons can be deduced (d).

Clean-up operations for vector data, turning spaghetti data into topological
structure (figure 3.7).

Figure 3.7

Associating attributes
Attributes may be automatically associated with features, when they have been
given unique identifiers. In vector data, attributes are assigned directly to the
features, while in a raster the attributes are assigned to the cells that represent a
feature.

RASTERIZATION OR VECTORIZATION

If much or all of the subsequent spatial data analysis is to be carried out on
raster data, one may want to convert vector datasets to raster data. This process
is known as rasterization. It involves assigning point, line and polygon attribute
values to the raster cells that overlap with the respective point, line or
polygon. To avoid information loss, the raster resolution should be carefully
chosen on the basis of the geometric resolution. The inverse operation to
rasterization is vectorization, which produces a vector data set from a raster.
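
A minimal sketch of rasterization for a single polygon is given below. It assigns the polygon's attribute value to every cell whose mid-point lies inside the polygon, using a ray-casting test. Testing the cell mid-point is only one possible rule, and the function names, the triangle and the attribute value 7 are assumptions for this example; points and lines would need their own rules.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (a list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(polygon, value, n_rows, n_cols, cell_size, nodata=0):
    """Rasterize one polygon; row 0 is the top row, lower-left corner at (0, 0)."""
    grid = []
    for row in range(n_rows):
        y = (n_rows - row - 0.5) * cell_size          # mid-point y of this row
        grid.append([
            value if point_in_polygon((col + 0.5) * cell_size, y, polygon) else nodata
            for col in range(n_cols)
        ])
    return grid

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
coarse = rasterize(triangle, 7, n_rows=4, n_cols=4, cell_size=1.0)
for row in coarse:
    print(row)
```

The sloping edge of the triangle becomes a staircase of cells, which is the kind of information loss that a well-chosen cell size keeps small.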

Combining multiple data sources

4.0 GEOSPATIAL DATA ANALYSIS

CONTENT

4.1. Introduction

4.2. Exploratory Operators

4.3. Overlay Operators

4.4. Neighbourhood Operators

4.5. Network Operators

Objectives

At the end of this topic the learner should be able to: -

 Understand the principle of spatial analysis

 Understand the various analysis operators

 Apply the various analysis operators

 Interpret results from the analysis process.

4.1 INTRODUCTION

 Spatial analytic capabilities distinguish GIS from other data processing
systems, since a GIS makes use of both spatial and non-spatial databases to
answer questions and solve problems.
 Principal objective of spatial data analysis:
 To transform and combine data from diverse sources into useful
information, thus improving one’s understanding or satisfying the
requirements or objectives of decision making.
 A GIS application deals with some aspect, or relevant area, of reality,
normally referred to as the universe of discourse of the application.
 A typical problem could be in planning, e.g. what is the most suitable
location for a dam? Or in prediction, e.g. what will be the size of the lake
behind the dam?
 The universe of discourse here is the construction of the dam and its
environmental, social and economic impact.
 The solution to a problem depends on a number of parameters, which
are often interrelated.
 Their interaction is made more precise in an application model. Such a
model describes in a consistent manner how the application’s
universe of discourse behaves. For example, application models used for
planning and site selection are usually prescriptive, i.e. they involve
the use of criteria and parameters to quantify environmental, economic
and social factors (certain conditions must be met).
 Predictive models, on the other hand, involve forecasting the likelihood
of future events, e.g. pollution, erosion, landslides. This involves expert
use of various spatial data layers and combining them methodically in
order to arrive at a sensible conclusion or prediction.

4.1.2 Classification of Analytic GIS capabilities
 Exploratory operators: measurement, retrieval and classification functions.
In general they involve exploring the data without making fundamental
changes, which is useful at the beginning of data analysis.
 Overlay operators: data layers are combined and new information is
derived. The underlying principle is that the layers occupy the same
location. The combination can be made on the basis of arithmetic operations,
relational conditions and many other functions.
 Neighborhood operators: evaluate the characteristics of the area
surrounding a feature’s location, i.e. they consider buffer zones around
features and their impacts.
 Connectivity operators: evaluate how features are connected. They are useful
for applications dealing with networks of connected features, e.g. road
networks, water courses in coastal zones, communication lines in mobile
telephony.

4.2 EXPLORATORY OPERATORS


Exploratory operators have to do with measurement, retrieval and classification
functions. They explore the data without making fundamental changes, and are
useful at the beginning of data analysis.

4.2.1 Measurement
Geometric measurements on spatial features include counting, distance and area
size computation.

Measurement on vector data

Primitives of vector data sets are point, polyline and polygon. Related geometric
measurements are location, length, distance and area size.

Location, length and area size are geometric properties of a feature in
isolation, but distance requires two features to be identified.

The location property of a vector feature is always stored by the GIS: a single
coordinate pair for a point, or a list of pairs for a polyline or polygon boundary.

Length is a geometric property associated with polylines, by themselves or in
their function as polygon boundary. It can be computed by the GIS as the sum of
the lengths of the constituent line segments, but it is quite often stored with
the polyline.

Area size is associated with polygons. Again it can be computed, but it is
usually stored with the polygon as an extra attribute value.

We see that the location, length and area size measurements above need not
require computation, but only a look up in stored data.
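
When length and area size are not stored, they can be computed from the coordinate lists. The sketch below sums segment lengths for a polyline and uses the shoelace formula for a simple (non-self-intersecting) polygon in a planar coordinate system. The function names and coordinates are illustrative.

```python
import math

def polyline_length(vertices):
    """Sum of the lengths of the constituent line segments."""
    return sum(math.dist(p, q) for p, q in zip(vertices, vertices[1:]))

def polygon_area(vertices):
    """Shoelace formula; vertices are listed once, the ring closes implicitly."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

road = [(0, 0), (3, 4), (3, 10)]           # segments of length 5 and 6
parcel = [(0, 0), (4, 0), (4, 3), (0, 3)]  # a 4 by 3 rectangle

print(polyline_length(road))
print(polygon_area(parcel))
```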

Measurement on raster data

Measurements on raster data layers are simpler because of the regularity of the
cells. The area of a cell is constant, and is determined by the cell resolution.
Horizontal and vertical resolution may differ, but typically do not. Together with
the location of a so-called anchor point, this is the only geometric information
stored with the raster data, so all other measurements are computed by the GIS.
The anchor point is fixed by convention to the lower left (or sometimes upper
left) location of the raster.

Location of an individual cell derives from the raster’s anchor point, the cell
resolution, and the position of the cell in the raster. There are two conventions:-

 The cell’s location can be its lower left corner or

 The cell’s mid-point.

The convention is set by the software in use. It becomes more important to be
aware of the convention when low resolution data is used.

The area size of a selected part of a raster (a group of cells) is calculated as the
number of cells multiplied by the cell area size. The distance between two
raster cells is the standard distance function applied to the locations of their
respective mid-points, taking into account the cell resolution.

Where a raster is used to represent line features as strings of cells through the
raster, the length of a line feature is computed as the sum of distances between
consecutive cells. This computation is prone to errors.
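
These raster measurements can be sketched as follows, assuming square cells, an anchor point at the lower left corner of the raster, and the mid-point convention for cell locations. The cell size, the anchor coordinates and the function names are hypothetical.

```python
import math

CELL_SIZE = 30.0                    # hypothetical cell resolution in metres
ANCHOR = (500000.0, 9800000.0)      # hypothetical lower-left corner of the raster

def cell_midpoint(row, col):
    """Mid-point of a cell; row 0 is the bottom row, col 0 the left column."""
    return (ANCHOR[0] + (col + 0.5) * CELL_SIZE,
            ANCHOR[1] + (row + 0.5) * CELL_SIZE)

def area_of(cell_count):
    """Area size of a group of cells."""
    return cell_count * CELL_SIZE * CELL_SIZE

def cell_distance(a, b):
    """Distance between the mid-points of two cells given as (row, col)."""
    return math.dist(cell_midpoint(*a), cell_midpoint(*b))

def line_length(cells):
    """Length of a line stored as a string of consecutive cells."""
    return sum(cell_distance(a, b) for a, b in zip(cells, cells[1:]))

print(cell_midpoint(0, 0))
print(area_of(10))
print(cell_distance((0, 0), (3, 4)))

# The same straight line encoded in two ways.
diagonal = [(0, 0), (1, 1), (2, 2)]
staircase = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2)]
print(round(line_length(diagonal), 2), line_length(staircase))
```

The two encodings of the same line give lengths of about 84.85 m and 120 m, which illustrates why line length computed on a raster is prone to errors.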

4.2.2 Retrieval or spatial data queries


When exploring a spatial dataset, the first thing to be done is to select certain
features on the basis of criteria. Spatial selection can be done on the basis of:-

 Geometric or spatial data

 Attribute data

Selection on the basis of geometric or spatial data

In interactive spatial selection, one defines the selection condition by pointing at
or drawing spatial objects on the screen display, after indicating the spatial data
layer(s) from which to select features. The interactively defined objects are called
selection objects; they can be points, lines or polygons. The GIS then selects the
features in the indicated data layer(s) that overlap (i.e. intersect, meet, contain
or are contained in) the selection objects (see figure 2.9 in the previous chapter).
These become the selected objects.

As we have seen earlier, spatial data is associated with its attribute data
(stored in tables) through a key/foreign key link. Selection of features leads,
via these links, to selection of the records; vice versa, selection of records
leads to selection of features. Interactive spatial selection answers questions
like “what is …?” In figure 4.0, the selection object is a circle and the
selected objects are the red polygons; they overlap with the selection object.

NB. Selected objects are highlighted in red.

Figure 4.0

Spatial selection by attributes


One can select features by stating selection conditions on the features’ attributes.
These conditions are formulated in SQL (Structured Query Language) if the
attribute data reside in a relational database, or in a software specific language
if the data reside in the GIS itself. This type of selection answers questions like
“where are the features with …?” Figure 4.1 shows an example of selection by
attribute. The query expression is area < 400000, which can be interpreted as
“select all the land use areas of which the size is less than 400000”. The
polygons in red are the selected areas; their associated records are also
highlighted in red.

Area≤400000 and landuse=80

Figure 4.1

We can use an already selected set of features as the basis of further
selection. For instance, if we are interested in landuse areas of size less than
400,000 that are of landuse type 80, the selected features of figure 4.1 are
subjected to a further condition, landuse = 80.
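
The two-step selection can be tried out with a small in-memory relational table. The sketch uses Python's built-in sqlite3 module; the table name, column names and records are made up for this example and do not come from the data behind figure 4.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE landuse_areas (id INTEGER PRIMARY KEY, area REAL, landuse INTEGER)"
)
conn.executemany(
    "INSERT INTO landuse_areas VALUES (?, ?, ?)",
    [(1, 150000.0, 80), (2, 520000.0, 80), (3, 390000.0, 40), (4, 80000.0, 80)],
)

# First selection: all land use areas smaller than 400000.
first = [row[0] for row in conn.execute(
    "SELECT id FROM landuse_areas WHERE area < 400000 ORDER BY id")]

# Refined selection: the same condition combined with landuse = 80.
refined = [row[0] for row in conn.execute(
    "SELECT id FROM landuse_areas WHERE area < 400000 AND landuse = 80 ORDER BY id")]

print(first)
print(refined)
conn.close()
```

The returned identifiers play the role of the key that links each record back to its feature, so the matching polygons could then be highlighted on the map.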

4.2.3 Classification
This is the process of highlighting important patterns in the input spatial data by
applying some classification parameter. Classification can be viewed as a data
reduction process.

In classifying vector data there are two possibilities:

 The input features may become the output features in a new data layer,
with an additional category assigned. Here nothing changes with respect to
the spatial extents of the original features (Figure 4.2).

Figure 4.2

 A second type of output is obtained when adjacent features with the same
category are merged into one bigger feature. Such a post-processing
function is called spatial merging or dissolving (Figure 4.3).

Figure 4.3
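
Both possibilities can be sketched without any geometry, if the adjacency between features is given. The example first assigns a category to each feature from class breaks, and then groups adjacent features that share a category, which is the bookkeeping behind dissolving. The attribute values, class breaks, adjacency table and function names are all hypothetical.

```python
def classify(value, breaks):
    """Return the number of the first class whose upper bound exceeds value."""
    for category, upper in enumerate(breaks, start=1):
        if value < upper:
            return category
    return len(breaks) + 1

def dissolve(categories, adjacency):
    """Group adjacent features that share a category (spatial merging)."""
    merged, seen = [], set()
    for start in sorted(categories):
        if start in seen:
            continue
        group, stack = [], [start]
        seen.add(start)
        while stack:
            fid = stack.pop()
            group.append(fid)
            for neighbour in adjacency.get(fid, ()):
                if neighbour not in seen and categories[neighbour] == categories[start]:
                    seen.add(neighbour)
                    stack.append(neighbour)
        merged.append((categories[start], sorted(group)))
    return merged

# Hypothetical attribute value per feature id, and which features touch.
values = {1: 12, 2: 18, 3: 45, 4: 52, 5: 15}
adjacency = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}

categories = {fid: classify(v, breaks=[20, 50]) for fid, v in values.items()}
print(categories)
print(dissolve(categories, adjacency))
```

Features 1 and 2 are merged because they are adjacent and fall in the same class; feature 5 is in that class too but stays separate because it does not touch them.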

4.3. OVERLAY OPERATORS

Overlay operators combine two spatial data layers and produce a third one from
them.

Types of overlay operators are: -


 Vector overlay operators
 Raster overlay operators
 Decision table operators

Vector overlay operators


The combining of spatial data layers is based on binary operators referred to as
spatial overlay operators. The main assumptions in combining two data layers
are:

 They are georeferenced in the same system
 They overlap in the same geographic area

Examples of standard overlay operators include:

 Polygon intersection, also referred to as a polygon join, takes all the
possible polygon intersections; the resulting attribute table is a
join of the two input attribute tables.
 Polygon clipping uses one of the polygon data layers to restrict the
spatial extent of the other layer.
 Polygon overwrite results in the polygons of the second layer, except
where polygons of the first layer exist, since those take precedence.

Polygon Intersection

Figure 4.4

Polygon Clipping
The polygon clipping operator uses one of the polygon data layers to restrict the spatial
extent of the other layer.

Figure 4.5

Polygon Overwrite
Polygon overwrite results in the polygons of the second layer, except where
polygons of the first layer exist, since those take precedence.

Figure 4.6

Raster overlay operators


In comparison to vector overlay operations, the raster ones are less complicated
since they are performed cell by cell. Some of the operators that can be used in
raster calculus include:

 Arithmetic operators, e.g. subtraction, multiplication, division and addition.
For instance, an NDVI (Normalized Difference Vegetation Index) image is
the result of arithmetic operators.

 Comparison and logical operators, which entail comparing rasters cell by cell
using standard comparison operators, e.g. <, <=, >, >=, <>. Logical
connectives include the NOT, OR and AND operators. The effect of these
operators is that they generate an output raster with values that are
either true or false.
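
A cell-by-cell sketch of both kinds of operator is given below. It computes NDVI = (NIR − Red) / (NIR + Red) from two small rasters, and then combines two comparison results with a logical AND. The band values, the elevation raster and the thresholds (0.3 and 1000) are invented for this illustration.

```python
def local_op(a, b, fn):
    """Apply a two-argument function cell by cell to two equally sized rasters."""
    return [[fn(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

# Hypothetical near-infrared and red band values.
nir = [[50, 60], [80, 20]]
red = [[50, 20], [20, 60]]

# Arithmetic raster calculus: NDVI per cell.
ndvi = local_op(nir, red, lambda n, r: (n - r) / (n + r))

# Comparison operators give true/false rasters.
vegetated = [[value > 0.3 for value in row] for row in ndvi]
elevation = [[900, 1500], [1200, 1100]]
high = [[z >= 1000 for z in row] for row in elevation]

# Logical connective AND, again cell by cell.
vegetated_and_high = local_op(vegetated, high, lambda p, q: p and q)

print(ndvi)
print(vegetated_and_high)
```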

Arithmetic Raster Calculi

Figure 4.7

Logic raster calculi
The green cells are the true values

Figure 4.8

Decision table driven operators


 The decision table overlay operators are useful where domain expertise has
to be exploited in combining different raster images to generate an output
on the basis of certain criteria. They are particularly useful for suitability
studies. For instance, one may have land use cover and geological
information, and want to extract suitable areas on the basis of, say, forest
areas on alluvial terrain. This can be effected using an expression of the
form:

Suitability = IF((Landuse = Forest AND Geology = Alluvial) OR
(Landuse = Grass AND Geology = Shale), Suitable, Unsuitable)

 The output will be areas that are suitable and those that are not suitable.
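
The suitability expression can be evaluated cell by cell with a small decision table. In the sketch below the table is a set of (land use, geology) pairs that count as suitable; the two input rasters and the function name are hypothetical.

```python
# Decision table: combinations of (land use, geology) that are suitable.
SUITABLE = {("Forest", "Alluvial"), ("Grass", "Shale")}

def suitability(landuse, geology):
    """Apply the decision table to every pair of corresponding cells."""
    return [
        ["Suitable" if (lu, geo) in SUITABLE else "Unsuitable"
         for lu, geo in zip(landuse_row, geology_row)]
        for landuse_row, geology_row in zip(landuse, geology)
    ]

landuse = [["Forest", "Grass"],
           ["Forest", "Lake"]]
geology = [["Alluvial", "Shale"],
           ["Shale", "Alluvial"]]

for row in suitability(landuse, geology):
    print(row)
```

Keeping the criteria in a table, rather than in nested conditions, makes it easy for a domain expert to add or remove combinations.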

Decision Table in raster Overlay

Figure 4.9

4.4 Neighborhood Operators

Neighborhood operators include: -

 Proximity Computation

 Spread Computation

 Seek Computation
Proximity Computation

 The objective of proximity computation is to establish the characteristics of
the neighborhood of a given location. Such operations are useful because
they can answer suitability questions on the basis of not only what is at a
location but also what is near it.

 There are three fundamental issues that should be addressed before such
computations can be performed, namely:

 Identifying the target locations and their spatial extent

 Defining how the neighborhood is to be determined

 Identifying the characteristics of the target to be used in the
computation for each neighborhood.

 On the basis of geometric distance, proximity computation can be done
using:

 Buffer generation

 Thiessen polygon generation

Buffer Generation

 Buffer generation involves identifying the targets of interest and
determining the area around them on the basis of a distance.

 The figure shows such an example, where the targets are main and minor
roads and different buffer distances are applied.

Figure 4.10
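
Instead of constructing the buffer polygon itself, the same question ("is this location within the buffer?") can be answered with a distance test. The sketch below checks whether a point lies within a given distance of a road stored as a polyline. The roads, the house location and the buffer distances are hypothetical.

```python
import math

def dist_point_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    # Position of the closest point along the segment, clamped to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def in_buffer(p, polyline, distance):
    """True if p lies within the buffer distance of any segment of the polyline."""
    return any(dist_point_segment(p, a, b) <= distance
               for a, b in zip(polyline, polyline[1:]))

main_road = [(0, 0), (100, 0)]
minor_road = [(50, 0), (50, 80)]
house = (60, 30)

print(in_buffer(house, main_road, 50))   # larger buffer for the main road
print(in_buffer(house, minor_road, 5))   # smaller buffer for the minor road
```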

Thiessen Polygon Generation

 Thiessen polygon generation is based on spatially distributed points as
target locations; the idea is to find, for each location, which target is
closest. This involves generating a polygon for each target, thereby
identifying the locations that belong to that target.

Figure 4.11
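
A raster approximation of this idea is easy to sketch: every cell is labelled with the target that is closest to the cell's mid-point. True Thiessen polygons would be constructed as vector geometry; the cell-based version below, with its hypothetical targets and grid size, only illustrates the "closest target" rule.

```python
import math

def nearest_target(x, y, targets):
    """Name of the target point closest to (x, y)."""
    return min(targets, key=lambda name: math.dist((x, y), targets[name]))

def thiessen_raster(targets, n_rows, n_cols, cell_size):
    """Label each cell with its nearest target; row 0 is the southernmost row."""
    return [
        [nearest_target((col + 0.5) * cell_size, (row + 0.5) * cell_size, targets)
         for col in range(n_cols)]
        for row in range(n_rows)
    ]

# Two hypothetical target locations, e.g. two service points.
targets = {"A": (1.0, 1.0), "B": (5.0, 1.0)}

for row in thiessen_raster(targets, n_rows=2, n_cols=6, cell_size=1.0):
    print(row)
```

The boundary between the two regions falls halfway between the targets, which is where the edge of the Thiessen polygons would be.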

Spread Computation

 Here the idea is that the neighborhood of a target will depend not
only on distance but also on direction and on differences in
terrain, e.g. in the case of air pollution, water flow, etc. Hence spread
computation will involve:

 One or more target locations

 A local resistance factor

 The computation will involve determining the least cost path.

Seek Computation

 This involves determining how an object moves in an area in different
directions and against different resistances. A good example is the drainage
pattern in a water catchment area.

 The figure shows a generic example based on terrain differences, where
the least cost path is used.

Figure 4.12

4.5. NETWORK OPERATORS

 Within the context of geospatial analysis, a network can be defined as a
series of connected lines representing some geographic phenomenon,
typically for transportation purposes. Network analysis can be performed
on either raster or vector data. The characteristics of a network depend on
whether it is:

 Directed, where a direction is associated with each line

 Undirected, where lines do not have a direction associated with them

 Network analysis functions commonly supported include:

 Optimal path finding, which determines the least cost path on the basis of
pre-defined locations and the associated attribute data.

 Network partitioning assigns network elements namely nodes or line
segments to different locations using pre-defined criteria.

 Optimal path finding is conducted when the least cost path between the
origin and the destination is required. It involves identifying a sequence of
connected lines that leads from the origin to the destination at the
lowest cost. The lowest cost path can be determined on the basis of:

 The total length of all lines on the path, which is a simple operation.

 In addition to the total length, other parameters such as the
maximum capacity and travel rate of the lines.
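
One well-known way to find such a path is Dijkstra's algorithm, sketched below for a directed network in which every line carries a single non-negative cost. The notes do not say which algorithm a particular GIS uses, so this is an illustration only; the road network and its costs are hypothetical.

```python
import heapq

def least_cost_path(network, origin, destination):
    """network maps each node to a list of (neighbour, cost) pairs (directed)."""
    best = {origin: 0.0}
    previous = {}
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue                      # outdated queue entry
        for neighbour, edge_cost in network.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if destination not in best:
        return None, float("inf")         # destination cannot be reached
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return path[::-1], best[destination]

# Hypothetical road network; costs could be lengths or travel times.
roads = {
    "A": [("B", 4.0), ("C", 2.0)],
    "B": [("D", 3.0)],
    "C": [("B", 1.0), ("D", 7.0)],
    "D": [],
}

path, cost = least_cost_path(roads, "A", "D")
print(path, cost)
```

Using travel time instead of length as the cost, for example length divided by travel rate, changes the result without changing the algorithm.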

The aim of network partitioning is to assign lines or nodes in a network to
target locations that play the role of service centres, e.g. for medical
services, education facilities, water supply, etc. This type of network
partitioning problem is known as the network allocation problem. The issue then
is which part of the network to assign exclusively to which service centre.

Another type is trace analysis, which focuses on problems pertinent to that
part of the network that is upstream or downstream from a given target location.
This finds application in pollution tracing along e.g. rivers or streams,
energy distribution networks, etc. The idea is to find which part of the
network is conditionally connected to a chosen node on the network,
namely the trace origin, where the condition depends on the application
and is logical in nature.
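
A simple upstream trace can be sketched as a search over a directed network. In the example the only condition is the direction of flow; real applications may add further conditions, such as a maximum distance. The river network and the function name are hypothetical.

```python
from collections import deque

def trace_upstream(flows_to, origin):
    """flows_to maps each node to the node it drains into; return all nodes
    from which water can reach the trace origin."""
    upstream_of = {}
    for source, target in flows_to.items():
        upstream_of.setdefault(target, []).append(source)
    found, queue = set(), deque([origin])
    while queue:
        node = queue.popleft()
        for source in upstream_of.get(node, []):
            if source not in found:
                found.add(source)
                queue.append(source)
    return found

# Hypothetical river network.
river = {
    "spring1": "junction",
    "spring2": "junction",
    "junction": "town",
    "creek": "town",
    "town": "estuary",
}

print(sorted(trace_upstream(river, "town")))
```

If pollution is detected at the town, the result lists the parts of the network where its source may lie.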

Homework

Due to the construction of the multibillion shilling Nairobi Thika highway, many
land owners incurred huge losses when their houses and business premises
were pulled down to pave way for the highway. The Kenya Government has laid
out plans on how to compensate those land owners whose land parcels were
taken away by the highway.

(a) State clearly the steps that you would require in order to assess the
losses incurred without the use of GIS.

(b) Then state the steps that you would require to assess the same losses
using GIS.
