
Better life through Technology

Dedan Kimathi University of Technology

GGE 4203 – Intro to GIS & RS
Lecture 3
Ms. Caroline
Comparison of Raster to Vector Models
• Raster datasets record a value for all points in the area covered
which may require more storage space than representing data in
a vector format that can store data only where needed.
• Raster data allows easy implementation of overlay operations,
which are more difficult with vector data.
• Vector data can be displayed as vector graphics used on
traditional maps, whereas raster data will appear as an image that
may have a blocky appearance for object boundaries. (depending
on the resolution of the raster file)
• Vector data can be easier to register, scale, and re-project, which
can simplify combining vector layers from different sources.
• Vector data is more compatible with relational database environments, where it can be stored in a relational table as a normal column and processed using a multitude of operators.
Comparison of Raster to Vector Models (continued)
• Vector files are usually much smaller than raster files; raster data can be 10 to 100 times larger than the equivalent vector data (depending on resolution).
• Vector data is simpler to update and maintain, whereas a raster image will have to be completely reproduced (for example, when a new road is added).
• Vector data allows much more analysis capability, especially
for "networks" such as roads, power, rail,
telecommunications, etc. Raster data will not have all the
characteristics of the features it displays.
• Raster files can be manipulated quickly by the computer, but they are often less detailed and may be less visually appealing than vector data files, which can approximate the appearance of more traditional hand-drafted maps.
Advantages of Raster Data
• data structure is simple
• good for representing continuous surfaces
• location specific data collection is easy
• spatial analytical operations are faster
• different forms of data are available (satellite
images, field data, etc.), and
• mathematical modelling and quantitative analysis can be done easily, due to the inherent grid nature of raster images.
Disadvantages of the Raster Model
• Data volumes are huge
• Poor representation for points, lines and areas
• Cartographic output quality may be low
• Difficult to effectively represent linear features (depending on the cell resolution); hence, network analysis is difficult to establish
• Coordinate transformation is difficult and sometimes causes distortion of grid cell shape
• Suffers from the mixed-pixel problem and from missing or redundant data, and
• Raster images generally have only one attribute or characteristic value for a feature or object; therefore, there is limited scope to handle attribute data.
Advantages of Vector Data
• Data structure is more compact
• Data can be represented with good resolution
• It can clearly describe topology. Hence, good for proximity and network
analysis
• Spatial adjustment of the data is easy with the utilisation of techniques
such as rubber sheeting, affine, etc.
• Graphic output in small scale as well as large scale gives a good accuracy
• Geographic location of entities is accurate
• Updating and generalisation of the entities are possible
• Easy handling of attribute data, and
• Coordinate transformation techniques such as linear transformation,
similarity transformation and affine transformation could be done easily.
Disadvantages of Vector Data
• Data structures are complex
• Overlay analysis is computationally demanding, which often limits functionality for large datasets (e.g., those with a large number of features)
• Data collection may be expensive
• High-resolution drawing, colouring, shading and displaying may be time-consuming
• Technology of data preparation is expensive
• Representation of spatial variability is difficult, and
• Spatial analysis and filtering within polygons is
impossible.
Spatial Data Structures
• Structures that provide the information a computer requires to reconstruct a spatial data model in digital form are called spatial data structures.
• Two basic spatial data structures in GIS:
• Vector – points, lines, polygons
• Raster- It consists of rows and columns of equally sized
pixels interconnected to form a planar surface. These
pixels are used as building blocks for creating points,
lines, areas. The area covered by each pixel determines
the spatial resolution
Raster Data Structure
• Raster or grid data structure refers to the storage of
the raster data for data processing and analysis by
the computer.
• There are three commonly used raster data structures (Chang, 2010):
• cell-by-cell encoding,
• run-length encoding, and
• quadtree.

Cell-by-Cell Encoding
• By convention, raster data is normally stored row by row from the top left corner.
• This is the simplest raster data structure and is characterised by subdividing a geographic space into grid cells. Each pixel or grid cell contains a value. A grid matrix and its cell values for a raster are arranged into a file by row and column.
• It encodes a raster by creating a record for each cell value, by row and column.
• This method is also referred to as “exhaustive enumeration.”
• Cell-by-cell encoding can be used to encode (multispectral) satellite images.
(Figure: multispectral image)

Cell-by-Cell encoding
• For a multi-spectral satellite image, where each cell has more than one value, the data are stored in one of the following formats:
• BSQ (Band Sequential Format): – each line of the data followed immediately by the next
line in the same spectral band. This format is optimal for spatial (X, Y) access of any part of
a single spectral band. Good for multispectral images
• BIP (Band Interleaved by Pixel Format): – the first pixel for all bands in sequential order,
followed by the second pixel for all bands, followed by the third pixel for all bands, etc.,
interleaved up to the number of pixels. This format provides optimum performance for
spectral (Z) access of the image data. Good for hyperspectral images
• BIL (Band Interleaved by Line Format): – the first line of the first band followed by the first
line of the second band, followed by the first line of the third band, interleaved up to the
number of bands. Subsequent lines for each band are interleaved in similar fashion. Good
for images with 20-60 bands.
• The BIL, BIP, and BSQ files are binary files, and they must have an associated
ASCII file header to be interpreted properly.
• This header file contains ancillary data about the image, such as the number of rows and columns, whether there is a color map, and the latitude and longitude.
• Why is it important to know the format? Given the metadata, it allows you to write a program to read the image and retrieve the value of a particular pixel in any given band.
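The offset arithmetic implied by that question can be made concrete. The sketch below assumes one byte per value and no embedded header in the binary file (real files often use multi-byte values, in which case each offset is multiplied by the bytes per value):

```python
# Byte offset of the value for (row, col, band) in a raster with
# n_rows rows, n_cols columns and n_bands bands, one byte per value.
# The formulas follow directly from the BSQ/BIL/BIP layout definitions.

def offset_bsq(row, col, band, n_rows, n_cols, n_bands):
    # Band Sequential: each whole band is stored contiguously, band after band.
    return band * (n_rows * n_cols) + row * n_cols + col

def offset_bil(row, col, band, n_rows, n_cols, n_bands):
    # Band Interleaved by Line: line 0 of every band, then line 1 of every band...
    return row * (n_bands * n_cols) + band * n_cols + col

def offset_bip(row, col, band, n_rows, n_cols, n_bands):
    # Band Interleaved by Pixel: all bands of pixel 0, then all bands of pixel 1...
    return (row * n_cols + col) * n_bands + band

# Example: a 100 x 200 image with 4 bands; pixel (row=3, col=10), band 2.
print(offset_bsq(3, 10, 2, 100, 200, 4))  # 40610
print(offset_bil(3, 10, 2, 100, 200, 4))  # 2810
print(offset_bip(3, 10, 2, 100, 200, 4))  # 2442
```

This also shows why BSQ favours spatial access to one band (consecutive cells of a band are adjacent on disk) while BIP favours spectral access to one pixel (all bands of a pixel are adjacent).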

Cell-by-Cell Encoding: BIL, BSQ and BIP layouts (figures)

Run-Length Raster Encoding
• This method encodes cell values in runs of similarly valued pixels and can result in a highly compressed image file.
• The Run-Length Encoding (RLE) algorithm was developed to handle the problem that a grid often contains redundant or missing data. When the raster contains a great deal of missing data, the cell-by-cell encoding method is not recommended.
• The run-length encoding method is useful in situations where large groups of neighboring pixels have similar values.
• It is less useful where neighboring pixel values vary widely.
(Figure: run-length encoding)
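A minimal run-length encoder for one raster row can be sketched as follows (the pixel values are made up for illustration):

```python
# Run-length encode a row of pixel values: each run of identical values
# is stored once, as a (value, count) pair.
def rle_encode(row):
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]

def rle_decode(runs):
    return [value for value, count in runs for _ in range(count)]

row = [0, 0, 0, 0, 5, 5, 5, 0, 0, 9]
runs = rle_encode(row)
print(runs)                      # [(0, 4), (5, 3), (0, 2), (9, 1)]
print(rle_decode(runs) == row)   # True
```

Here 10 cell values compress to 4 pairs; on a row where every pixel differs from its neighbour, the same scheme would *double* the storage, which is the "less useful where values vary widely" point above.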

Quad-Tree Raster Encoding
• This method divides a raster into a hierarchy of quadrants that are subdivided based on similarly valued pixels.
• The division of the raster stops when a quadrant is made entirely of cells of the same value. A quadrant that cannot be subdivided further is called a “leaf node.”
(Figure: quad-tree)
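The recursive subdivision can be sketched in Python for a square 2^n x 2^n grid (an illustrative toy, ignoring the pointer/index schemes real quadtree file formats use):

```python
# Recursively split a quadrant into four sub-quadrants until a quadrant
# is uniform; a uniform quadrant becomes a leaf node holding one value.
def quadtree(grid, r0, c0, size):
    values = {grid[r][c] for r in range(r0, r0 + size)
                         for c in range(c0, c0 + size)}
    if len(values) == 1:
        return values.pop()          # leaf node: a single value
    half = size // 2
    return [quadtree(grid, r0,        c0,        half),   # NW quadrant
            quadtree(grid, r0,        c0 + half, half),   # NE quadrant
            quadtree(grid, r0 + half, c0,        half),   # SW quadrant
            quadtree(grid, r0 + half, c0 + half, half)]   # SE quadrant

grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
tree = quadtree(grid, 0, 0, 4)
print(tree)  # [1, [0, 0, 0, 2], 1, 1]
```

Three of the four top-level quadrants are uniform and collapse to single leaves; only the mixed NE quadrant is subdivided further, which is where the compression comes from.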

Quad-Tree Example (figure)

Digital Image File Formats
• File formats are intended to store particular kinds of digital
information.
• The standard greyscale images use 256 shades of grey from
0 (black) to 255 (white). With color images, the situation is
more complex.
• For a given number of pixels, considerably more data is
required to represent the image and more than one color
model is used.
• Common data formats:
• GIF
• JPEG
• TIFF
• BMP
• PNG

• GIF (Graphic Interchange Format)
• GIF uses lossless LZW compression for relatively small file sizes, as compared to uncompressed data. GIF files offer optimum compression (smallest files) for solid color graphics, because objects of one exact color compress very efficiently in LZW. The LZW compression is lossless, but of course, the conversion to only 256 colors may be a great loss.
• TIFF (Tagged Image File Format)
• TIFF files have many formats: Black and white,
greyscale, 4- and 8-bit color, full color (24-bit) images.
TIFF files support the use of data compression using
LZW and other compression standards.

• BMP (Bitmap)
• B/W Bitmap is monochrome and the color table contains two entries.
Each bit in the bitmap array represents a pixel. If the bit is clear, the
pixel is displayed with the color of the first entry in the color table. If the
bit is set, the pixel has the color of the second entry in the table.
• JPEG (Joint Photographic Experts Group Format)
• A JPEG image provides very good compression, but it does not decompress exactly as it was; JPEG is a lossy compression technique. Compression ratios of up to 50:1 are easily obtainable.
• PNG (Portable Network Graphics)
• The PNG format was designed to replace the antiquated GIF format, and
to some extent, the TIFF format. It utilizes lossless compression. It is a
universal format that is recognized by the World Wide Web consortium,
and supported by modern web browsers.

Vector Data Structure
• Geographic entities encoded using the vector data model are often called features. The features can be divided into two classes:
• a. Simple features/Spaghetti
These are easy to create, store and are rendered on screen very quickly.
They lack connectivity relationships and so are inefficient for modeling
phenomena conceptualized as fields.
• b. Topological features
A topology is a mathematical procedure that describes how features are
spatially related and ensures data quality of the spatial relationships.
Topological relationships include following three basic elements:
• Connectivity: Information about linkages among spatial objects
• Contiguity: Information about neighboring spatial object
• Containment: Information about inclusion of one spatial object within another
spatial object

Topology features
• Connectivity
• Arc node topology defines connectivity - arcs are
connected to each other if they share a common node.
This is the basis for many network tracing and path
finding operations.
• Arcs represent linear features and the borders of area
features. Every arc has a from-node which is the first
vertex in the arc and a to-node which is the last vertex.
These two nodes define the direction of the arc. Nodes
indicate the endpoints and intersections of arcs. They
do not exist independently and therefore cannot be
added or deleted except by adding and deleting arcs.
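The from-node/to-node structure described above can be sketched as a small data structure. The arc and node IDs here are made up for illustration, not from any particular dataset:

```python
# Arc-node topology: each arc stores its from-node and to-node.
# The pair also defines the direction of the arc.
arcs = {
    1: ("A", "B"),   # arc 1 runs from node A to node B
    2: ("B", "C"),
    3: ("B", "D"),
    4: ("D", "A"),
}

def connected(a, b):
    """Arcs are connected if they share a common node."""
    return bool(set(arcs[a]) & set(arcs[b]))

print(connected(1, 2))  # True  (arcs 1 and 2 share node B)
print(connected(2, 4))  # False (no common node)
```

A shared-node test like this is the primitive behind the network tracing and path-finding operations mentioned above: a trace simply keeps following arcs that are connected at the current node.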

Topology features
• Contiguity
• Polygon topology defines contiguity. The polygons
are said to be contiguous if they share a common
arc. Contiguity allows the vector data model to
determine adjacency.


Topology: arc-node topology and polygon topology (figures)

Topology features
• Containment
• Geographic features cover a distinguishable area on the surface of the earth.
• An area is represented by one or more boundaries
defining a polygon.
• The polygons can be simple or they can be complex
with a hole or island in the middle.

Topology features
• Containment example: a lake D has an island in the middle (polygon-arc topology).
• The lake actually has two boundaries: one which defines its outer edge and the other (the island) which defines its inner edge.
• An island defines the inner boundary of a polygon.
• The polygon D is made up of arcs 5, 6 and 7.
• A 0 before the 7 indicates that arc 7 creates an island in the polygon.
• Polygons are represented as an ordered list of arcs, not in terms of X, Y coordinates. This is called polygon-arc topology.
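The polygon-arc representation of the lake example can be sketched in Python. The list layout and the 0 marker follow the lecture's convention; the exact encoding in a real GIS differs:

```python
# Polygon-arc topology: each polygon is an ordered list of arc IDs,
# not raw X, Y coordinates. A 0 marks that the arcs after it form an
# inner (island) boundary.
polygons = {
    "D": [5, 6, 0, 7],   # outer boundary: arcs 5, 6; island boundary: arc 7
}

def boundaries(polygon_id):
    """Split a polygon's arc list into outer and inner boundary arcs."""
    arc_list = polygons[polygon_id]
    if 0 in arc_list:
        split = arc_list.index(0)
        return arc_list[:split], arc_list[split + 1:]
    return arc_list, []

outer, inner = boundaries("D")
print(outer)  # [5, 6]
print(inner)  # [7]
```

Storing arc IDs rather than coordinates means each shared boundary is digitized once, which is exactly the sliver-and-gap problem that simple (spaghetti) polygons suffer from.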

Simple Features / Spaghetti
• Simple features include the following:
• Point entities: these represent all geographical entities that are positioned by a single XY coordinate pair. Along with the XY coordinates, the point must store other information, such as what the point represents.
• Line entities: linear features made by tracing two or more XY coordinate pairs.
• Simple line: requires a start and an end point.
• Arc: a set of XY coordinate pairs describing a continuous complex line. The shorter the line segments and the higher the number of coordinate pairs, the more closely the chain approximates a complex curve.
• Simple polygons: enclosed structures formed by joining a set of XY coordinate pairs. The structure is simple but carries a few disadvantages:
• Lines between adjacent polygons must be digitized and stored twice; improper digitization gives rise to slivers and gaps
• They convey no information about neighbors
• Creating islands is not possible

GIS Data Acquisition
• Two types of data are input into a GIS,
• spatial and
• attribute.
• The data input process is the operation of encoding both types of data
into the GIS database formats.
• The general consensus among the GIS community is that 60 to 80 % of
the cost incurred during implementation of GIS technology lies in data
acquisition, data compilation and database development.
• A variety of data sources and methods can be used to create new data:
• Remotely sensed imagery
• Aerial photographs
• Field data (survey data and GPS data)
• Text files with x-, y-coordinates
• Hard copy Maps- Scanning, Digitizing using a digitizing table, On-screen
digitizing
• Existing digital data files
GIS Data Acquisition: Sources
• Remote sensing data capture : Remote sensing refers to the
technique of deriving the information about the objects
without getting in physical contact with them. The
information is derived from the measurements of the
amount of electromagnetic (EM) radiations reflected,
emitted or scattered from the objects under observation.
The response is measured /captured by the sensors
deployed in air or in space.
• Surveying : Ground surveying is based on the principle of
determining the 3D location of a point with the help of
angles and distance measured from other known points.
Survey starts from a benchmark position. The location of all
surveyed points is relative to other points. The traditional
surveying involves the use of transits, theodolites, chains
and tapes for angle and distance measurement.
GIS Data Acquisition: Sources (continued)
• Photogrammetry: It is the science of making measurements from aerial
photographs and images. Apart from the 2D measurement from a single
photograph, photogrammetry is also used for making 3D measurements
from models made using stereo pairs of photographs. To make a 3D
model, there must be 60% overlap along each flight line and 30%
overlap between flight lines. The measurements from overlapping pairs
of photographs are captured using stereoplotters. These build a model
and allow 3D measurements to be captured, edited, stored and plotted.
One can extract vector objects from 3D model in a way similar to the
above discussed digitization.
• Obtaining Data from external sources : Creating the same dataset
multiple times for the same area is a time and resource intensive
process. One can always import data from data repositories. Some of
these are freely available while others are available at a price. Internet is
the best way to search geographic data. The internet gives information
about geographic data catalogs and vendors. National agencies of a
state/country also disseminate geographic data through their web
portals or through other digital media on demand made by the users.

Raster data editing


• Raster data editing is concerned with correcting the specific contents of raster images rather than their general geometric characteristics.
• The objective of the editing is to produce an image
suitable for raster geoprocessing.
• The following editing functions are most commonly used for raster data editing:

Raster Editing Functions


• Filling holes and gaps: To fill holes and gaps that
appear in the raster image
• Edge smoothing: To remove or fill single pixel
irregularities in the foreground pixels and background
pixels along lines
• Deskewing: To rotate the image by a small angle so that
it is aligned orthogonally to the x and y axes of the
computer screen
• Filtering: To remove speckles or the random high or
low valued pixels in the image
• Clipping and delete: To create a subset of an image or
to remove unwanted pixels

Vector Editing
• Vector data editing is a post digitizing process that
ensures that the data is free from errors. It ensures
that:
• Lines intersect properly without having any undershoots
or overshoots
• Nodes are created at all points where lines intersect
• All polygons are closed and each of them contains a label point
• Topology of the layer is built

Spatial Data Input & Editing


• Secondary data : It refers to the data obtained from maps, hardcopy documents
etc. Some of the methods to capture secondary data are as follows:
• Scanned data: A scanner is used to convert analog source map or document into
digital images by scanning successive lines across a map or document and
recording the amount of light reflected from the data source.
• Documents such as building plans, CAD drawings, images and maps are scanned prior to vectorization. Scanning helps reduce wear and tear, improves access and provides integrated storage.
• There are three different types of scanner that are widely used:
• Flat bed scanner
• Rotating drum scanner
• Large format feed scanner
• Flat bed scanner is a PC peripheral which is small and comparatively inaccurate.
• The rotating drum scanners are accurate but they tend to be slow and
expensive.
• Large format feed scanners are the most suitable type for inputting GIS data as they are cheap, quick and accurate.

Spatial Data Input & Editing


Digitization
• Digitizing is the process of interpreting and converting paper map or image data to vector digital
data.
• Heads-down digitization
• Digitizers are used to capture data from hardcopy maps. Heads-down digitization is done on a digitizing table using a handheld cursor known as a puck. The position of the cursor or puck is detected when passed over a table inlaid with a fine mesh of wires.
• The function of a digitizer is to input correctly the coordinates of the points and the lines.
Digitization can be done in two modes:
• Point mode: In this mode, digitization is started by placing a point that marks the beginning of the feature
to be digitized and after that more points are added to trace the particular feature (line or a polygon). The
number of points to be added to trace the feature and the space interval between two consecutive points
are decided by the operator.
• Stream mode: In stream digitizing, the cursor is placed at the beginning of the feature, a command is then
sent to the computer to place the points at either equal or unequal intervals as per the position of the
cursor moving over the image of the feature.
Heads-up digitization
• This method uses a scanned copy of the map or image, and digitization is done on the screen of the computer monitor. The scanned map is displayed upright and can be viewed without bending the head down, and the method is therefore called heads-up digitization. Semi-automatic and automatic methods of digitizing require post-processing, but save a lot of time and resources compared to the manual method; they are described in the following section.

Spatial Data Input & Editing


• Vectorization
• Vectorization is the process of converting a raster image into a vector
image. It is a faster way of creating the vector data from raster data.
• Automatic vectorization is performed in either batch or interactive
mode.
• Batch vectorization takes one raster file and converts it into vector
objects in a single operation. Post vectorization editing is required to
remove the errors.
• In interactive vectorization, software is used to automate digitizing. The operator snaps the cursor to a pixel and indicates the direction in which the line is to be digitized. The software then automatically digitizes the line. The operator can set various parameters, such as the density of points, whether to pause at junctions for the operator's intervention, or whether to trace in a specific direction.
• Though the process involves labor, it produces higher-quality data and greater productivity than manual digitization.

Data Quality: Sources of errors


• Errors affect the quality of GIS data. Once the data is collected and prepared for visualization and analysis, it must be checked for errors.
• Burrough (1986) divided the sources of error into
the following categories:
• Common sources of error
• Errors resulting from original measurements
• Errors arising through processing

Common sources of error


• Old data sources: The data sources used for a GIS project
may be too old to use. Data collected in past may not be
acceptable for current time projects.
• Lack of data: The data for a given area may be incomplete
or entirely lacking. For example the land-use map for border
regions may not be available.
• Map scale: The details shown on a map depend on the scale
used. Maps or data of the appropriate scale at which details
are required, must be used for the project. Use of wrong
scale would make the analysis erroneous.
• Observation density: High density of observations in an
area increases the reliability of the data. Insufficient
observations may not provide the level of resolution
required for adequate spatial analysis as expected from the
project.
Errors resulting from original measurements
• Positional accuracy: Representing the correct positions of geographic features on a map depends upon the data being used. Biased field work, improper digitization and scanning errors result in inaccuracies in GIS projects.
• Content accuracy: Maps must be labeled correctly.
An incorrect labelling can introduce errors which
may go unnoticed by the user. Any omission from
map or spatial database may result in inaccurate
analysis.

Errors arising through processing
• Numerical errors: Different computers have different capabilities for mathematical operations. Computer processing errors occur in rounding-off operations and are subject to the inherent limits of number manipulation by the processor.
• Topological errors: Data is subject to variation.
Errors such as dangles, slivers, overlap etc are
found to be present in the GIS data layers.

Topological errors
• Dangle: An arc is said to be a dangling arc if either it is not
connected to another arc properly (undershoot) or is digitized
past its intersection with another arc (overshoot).
• Sliver polygon: It refers to the gap which is created between the
two polygons when snapping is not considered while creating
those polygons.
• These errors can be corrected using the constraints or the rules
which are defined for the layers.
• Topology rules define the permissible spatial relationships
between features.
• Digitizing and geocoding: Many errors arise at the time of
digitization, geocoding, overlaying or rasterizing. The errors
associated with damaged source maps and error while digitizing
can be corrected by comparing original maps with digitized
versions.

Error propagation
• No map stored in a GIS is truly error-free
• Errors include not only "mistakes" and "blunders", but also the statistical concept of error, meaning "variation"
• When maps stored in a GIS database are used as input to a GIS operation, then
the errors in the input will propagate to the output of the operation.
• Moreover, the error propagation continues when the output from one
operation is used as input to an ensuing operation.
• Consequently, when no record is kept of the accuracy of intermediate results, it
becomes extremely difficult to evaluate the accuracy of the final result
• Although users may be aware that errors propagate through their analyses, in
practice they rarely pay attention to this problem. No professional GIS currently
in use can present the user with information about the confidence limits that
should be associated with the results of an analysis
• Therefore, there is a need to be aware of the quality of the data one is using.

Data Quality components


• Lineage – This refers to source materials, methods
of derivation and transformations applied to a
database.
• Includes temporal information (date that the
information refers to on the ground).
• Intended to be precise enough to identify the sources of
individual objects (i.e. if a database was derived from
different source, lineage information is to be assigned as
an additional attribute of objects or as a spatial overlay)

Data Quality components


• Positional accuracy – This refers to the accuracy of the spatial component.
• Subdivided into horizontal and vertical accuracy elements.
• Assessment methods are based on comparison to source,
comparison to a standard of higher accuracy, deductive
estimates or internal evidence.
• Variations in accuracy can be reported as quality overlays or
additional attributes.
• Attribute accuracy -refers to the accuracy of the
thematic component.
• Specific tests vary as a function of measurement scale.
• Assessment methods are based on deductive estimates,
sampling or map overlay

Data Quality components


• Logical consistency -refers to the fidelity of the
relationships encoded in the database.
• Includes tests of valid values for attributes, and
identification of topological inconsistencies based on
graphical or specific topological tests.
• Completeness - refers to the relationship between
database objects and the abstract universe of all
such objects.
• Includes selection criteria, definitions and other
mapping rules used to create the database.
