1
Sources of Geographical Data
• Maps, aerial photographs, satellite images, tables,
paper maps, thematic maps, digital topographic maps
• Data must be geometrically registered to some
coordinate system
• Steps needed to create a spatial database:
• acquire data in digital form from a data supplier,
• digitize existing analogue data, and
• carry out one’s own survey of geographic entities
• interpolate from point observations to continuous
surfaces.
4
Remote Sensing Systems
• Remote sensing is the collection of data about an object without coming
into contact with it.
• This involves the detection and recording of values of emitted or reflected
electromagnetic radiation (energy emitted by all bodies with a
temperature greater than -273 °C) using sensors onboard aircraft and
satellites.
• Simple systems only register a single value for each of a limited number of
wavebands (a range of wavelengths of electromagnetic radiation)
that have been chosen to give as much information as possible about
certain aspects of the earth’s surface such as vegetation, rock and soil
minerals, and water.
• For example, the scanners on the French SPOT satellite record values for
four wavebands (Band 1: 0.5-0.6 µm; Band 2: 0.6-0.7 µm; Band 3: 0.7-0.8
µm; Band 4: 0.8-1.1 µm) in order to be able to detect differences in water,
vegetation, and rock.
• Multispectral scanners now in development record continuous spectra for
each pixel and therefore generate huge amounts of data.
• The spatial resolution, or the area covered by a single pixel,
depends on the altitude of the sensor, the focal length of the lens
or focusing system, the wavelength of the radiation, and other
inherent characteristics of the sensor itself.
• Pixel sizes vary from a square kilometer for data from
meteorological satellites to a few square centimeters for aircraft-
based, high-resolution sensors.
• Data collected by remote sensing are affected by atmospheric
conditions and irregularities of the platform, such as tilt and
orientation.
• Geometric and radiometric corrections are needed prior to data
input to the GIS to minimize these distortions.
• The visual appearance of the images can be improved by increasing
contrast, stretching the range of grey levels or colours used, and by
edge detection, to make it easier to recognize spatial features.
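As a rough illustration of stretching the range of grey levels, the sketch below applies a simple linear contrast stretch to a list of pixel values. The function name `stretch_contrast` and the 0-255 output range are assumptions made for this example; real image-processing packages offer more refined stretches.

```python
def stretch_contrast(pixels, out_min=0, out_max=255):
    """Linearly rescale grey levels so that they span out_min..out_max."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A completely flat image has no contrast to stretch.
        return [out_min for _ in pixels]
    scale = (out_max - out_min) / (hi - lo)
    # The darkest input maps to out_min, the brightest to out_max.
    return [round(out_min + (p - lo) * scale) for p in pixels]


# A dull image row that only uses grey levels 100..140.
row = [100, 110, 120, 130, 140]
print(stretch_contrast(row))
```

After the stretch the darkest pixel is 0 and the brightest is 255, so small differences in the original values become easier to see.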
• Stereo aerial (3D) photographs are overlapping, analogue
images having many applications like creation of topographical
maps and orthophoto maps by photogrammetry.
8
Traditional means of GIS Data collection
9
• The most usual and convenient coordinate system used in GIS
is orthogonal cartesian coordinates, oriented conventionally
north-south and east-west.
10
• According to cartographers:
– cylindrical projections are best for lands between the tropics,
– conical projections are best for temperate latitudes, and
– azimuthal projections are best for polar areas.
11
Base levels
• All elevation data are referenced to mean
annual sea levels.
• These are not constant over the whole world
and may differ by some meters from one side
of a continent to the other.
• All national mapping agencies (NMAs) have
defined local base reference levels to suit their
conditions
12
13
Georeferencing raw data
• Ground and field surveys are georeferenced in many ways.
• Ground checks are needed to locate aerial photographs and
satellite images correctly.
14
Georeferencing with GPS
• Defining and recording the location of a data
point has been eased through the development
of Global Positioning Systems (GPS).
18
Ad hoc data collectors
19
Data providers
20
Data providers
• Data providers offer or sell geographical data in a variety of detail,
formats, scales, and structures.
21
22
Direct spatial data capture
• Data which is captured directly from the environment
is known as primary data.
1. Direct observation of the relevant geographic
phenomena.
2. Done through ground-based field surveys, or by using
remote sensors in satellites or airplanes
• With primary data the core concern in knowing its
properties is to know the process by which it was
captured, the parameters of any instruments used and
the rigour with which quality requirements were
observed.
24
• Remotely sensed imagery is usually not fit for
immediate use, as various sources of error and
distortion may have been present.
25
Indirect spatial data capture
• Any data which is not captured directly from the environment is
known as secondary data.
26
Key sources of secondary data
• Digitizing
• Scanning
• Vectorization
27
Other sources of data
• Clearinghouses and web portals
• Web portal categorizes all available data and
provides a local search engine and links to
data documentation (also called metadata).
• It often also points to data viewing and
processing services.
28
Metadata
• Metadata is defined as background information that describes all
necessary information about the data itself.
• It is known as ‘data about data’.
• Metadata answer who, what, when, where, why, and how
• This includes:
29
Creating digital data sets by manual input
(Secondary data)
• The manual input of data to a GIS involves four main
stages:
• entering the spatial data,
• entering the attribute data,
• spatial and attribute data verification and editing,
• and, where necessary, linking the spatial to the
attribute data.
30
31
32
Entering the spatial data
33
Digitizers
• A digitizer is an electronic or electromagnetic tablet upon
which a map or document is placed. Embedded in the table, or
located directly under it, is a sensing device that can
accurately locate the centre of a pointing device which is used
to trace the data points of the map.
34
Rasterization
• Rasterization (vector to raster conversion) is the
process of converting vector data into a grid of pixel
values.
• This involves basically placing a grid over the map
and then coding the pixels according to the
occurrence or not of the phenomena.
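The grid-overlay idea can be sketched for the simplest case, a set of point features. The function name `rasterize_points`, the grid origin at the lower-left corner, and the 1/0 coding are assumptions made for this example; lines and polygons need more elaborate rules.

```python
def rasterize_points(points, x_min, y_min, cell_size, n_cols, n_rows):
    """Code each cell 1 if at least one point falls in it, otherwise 0.

    Row 0 is the row that starts at y_min (the bottom of the grid).
    """
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        # Points outside the grid extent are ignored.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            grid[row][col] = 1
    return grid


wells = [(0.5, 0.5), (2.5, 1.5), (10.0, 10.0)]  # the last one is off the map
for row in rasterize_points(wells, 0.0, 0.0, 1.0, n_cols=3, n_rows=2):
    print(row)
```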
35
Document scanners
36
Two types of Scanner
• There are two basic kinds of scanners:
1. those that record data on a step-by-step basis, and
2. those that scan a whole document in one operation in a manner akin to
xerography.
37
• A digital two-dimensional image
of the map is built up by the
movement of either the scanner or
the map
38
Second Type
• Modern document scanners that use a method akin to
xerography resemble laser printers in reverse because the
scanning surface is manufactured with a given resolution of
light-sensitive spots that can be directly addressed by
software.
39
Vectorization
• Vectorization (raster to vector conversion) is usually
undertaken using specialist software which provides
algorithms converting arrays of pixels to line data.
40
Analytical stereoplotters
• A third type of technology used for capturing digital
geographical data is a stereoplotter.
• This is a photogrammetric instrument used to record the levels
and positions of terrain and entities directly from stereo pairs
of aerial photographs (taken of the same area but from
slightly different viewing positions).
• In recent developments, digital stereo images from satellite
sensors, video recordings, and digital cameras have been used
to generate elevation data using specialized photogrammetric
algorithms in image processing systems.
41
Entering the attribute data
• Attribute data (sometimes called feature codes) are those
properties of a spatial entity that need to be handled in the GIS,
but which are not themselves spatial.
• For example, a road may be captured as a set of contiguous
pixels, or as a line entity and represented in the spatial part of
the GIS by a certain colour, symbol, or data location.
• Attribute data may come from many different sources such as
paper records, existing databases, spreadsheets, etc. They may
be input into the GIS database either manually or by importing
the data using a standard transfer format such as TXT, CSV, or
ASCII.
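A minimal sketch of importing attribute data from CSV text and attaching it to spatial entities through a shared identifier. The field names (`id`, `name`, `surface`), the dictionary layout of a feature, and both function names are hypothetical and chosen only for illustration.

```python
import csv
import io


def load_attributes(csv_text, key_field):
    """Read CSV text into a dictionary keyed by the identifier column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}


def link_attributes(features, attributes):
    """Attach an attribute record to each feature that has a matching id."""
    linked, unmatched = [], []
    for feature in features:
        record = attributes.get(feature["id"])
        if record is None:
            # Entities without attributes are reported for manual checking.
            unmatched.append(feature["id"])
        else:
            linked.append({**feature, "attributes": record})
    return linked, unmatched


road_table = "id,name,surface\nr1,Main Street,asphalt\nr2,Mill Lane,gravel\n"
roads = [
    {"id": "r1", "geometry": [(0, 0), (1, 1)]},
    {"id": "r3", "geometry": [(2, 2), (3, 3)]},
]
linked, unmatched = link_attributes(roads, load_attributes(road_table, "id"))
print(linked[0]["attributes"]["surface"], unmatched)
```

Reporting the unmatched identifiers separately is one way to support the verification step before the spatial and attribute data are used together.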
42
Data verification and editing
• Once the data have been entered it is important to check them for errors,
possible inaccuracies, omissions, and other problems prior to linking the
spatial and the attribute data.
43
Linking spatial and attribute data
44
Data structuring
• Following the data capture process the data are now in one of
two forms:
• geometrically correct raster data, or
• topologically and geometrically correct vector data.
45
46
47
48
49
50
51
Data Collection
• Spatial data can be obtained from various
sources.
• It can be collected from scratch, using direct
spatial data acquisition techniques, or
indirectly, by making use of existing spatial
data collected by others.
52
Data Quality
• Positional, Temporal and Attribute accuracy
• Lineage
• Completeness
• and logical consistency
53
Accuracy and Precision
54
Positional accuracy
• Human errors in measurement (e.g. reading
errors)
• Instrumental or systematic errors (e.g. due to
misadjustment of instruments).
• Random errors caused by natural variations in
the quantity being measured
• Measurement errors are generally described
in terms of accuracy
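One common way to summarize positional accuracy as a single number is the root mean square error (RMSE) between measured positions and more accurate reference positions. The sketch below assumes planar coordinates in the same units and the same point order in both lists; the function name `positional_rmse` is made up for this example.

```python
import math


def positional_rmse(measured, reference):
    """Root mean square distance between measured and reference points."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty lists of equal length")
    squared = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


digitized = [(100.3, 200.4), (150.0, 250.0)]
surveyed = [(100.0, 200.0), (150.0, 250.0)]
print(positional_rmse(digitized, surveyed))
```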
55
Attribute accuracy
• For nominal or categorical data, the accuracy
of labeling (for example the type of land
cover, road surface, etc).
• For numerical data, numerical accuracy (such
as the concentration of pollutants in the soil,
height of trees in forests, etc).
56
Temporal accuracy
• Spatial data can provide useful temporal information
such as changes in land ownership and the
monitoring of environmental processes such
as deforestation. Analogous to its positional
and attribute components, the quality of
spatial data may also be assessed in terms of
its temporal accuracy.
• For a static feature this refers to the difference
in the values of its coordinates at two different
times.
57
Lineage
• Lineage describes the history of a data set
58
Completeness
• Completeness refers to whether there are data lacking in
the database compared to what exists in the real world.
• Completeness can relate to either spatial, temporal, or
thematic aspects of a data set.
• For example, a data set of property boundaries might be
spatially incomplete because it contains only 10 out of 12
suburbs; it might be temporally incomplete because it
does not include recently subdivided properties; and it
might be thematically overcomplete because it also
includes building footprints.
59
Logical consistency
• For any particular application, (predefined) logical rules
concern:
• The compatibility of data with other data in a data set
(e.g. in terms of data format),
• The absence of any contradictions within a data set,
• The topological consistency of the data set, and
• The allowed attribute value ranges, as well as
combinations of attributes. For example, attribute
values for population, area, and population density
must agree for all entities in the database.
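The population example can be turned into an automated check. This is a sketch only: the field names (`name`, `population`, `area_km2`, `density`) and the 1% tolerance are assumptions for illustration, not a prescribed rule.

```python
import math


def find_density_conflicts(records, rel_tol=0.01):
    """Return the names of entities whose stored density contradicts
    the density computed from population and area."""
    conflicts = []
    for rec in records:
        if rec["area_km2"] <= 0:
            # A non-positive area is itself outside the allowed value range.
            conflicts.append(rec["name"])
            continue
        expected = rec["population"] / rec["area_km2"]
        if not math.isclose(expected, rec["density"], rel_tol=rel_tol):
            conflicts.append(rec["name"])
    return conflicts


districts = [
    {"name": "A", "population": 1000, "area_km2": 10.0, "density": 100.0},
    {"name": "B", "population": 500, "area_km2": 5.0, "density": 80.0},
]
print(find_density_conflicts(districts))
```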
60
Data preparation
• Data checks and repairs
• Combining data from multiple sources
61
Data checks and repairs
• Acquired data sets must be checked for quality in
terms of the accuracy, consistency and completeness
parameters.
• Checks focus on the geometric, topological, and attribute
components of spatial data.
• ‘Clean-up’ operations are often performed in a
standard sequence.
• With polygon data, one usually starts with many
polylines, in an unwieldy format known as spaghetti
data.
62
63
Data checks and repairs
• Associating attributes
• Rasterization or vectorization
• Topology generation
– more topological relations may sometimes be needed,
for instance in networks, e.g. the questions of line
connectivity, flow direction, and which lines have over
and underpasses.
– For polygons, questions that may arise involve
polygon inclusion: Is a polygon inside another one, or is
the outer polygon simply around the inner polygon?
64
Combining data from multiple sources
• Four Cases:
65
Differ in accuracy
• Due to scale differences in the sources, the resulting polygons do not
perfectly coincide, and polygon boundaries cross each other. This causes small,
artefact polygons in the overlay known as sliver polygons.
• If the map scales involved differ significantly, the polygon boundaries of the
large-scale map should probably take priority, but when the differences are
slight, we need interactive techniques to resolve the issues.
66
Differ in Representation
67
Combining Adjacent Areas
68
Different Coordinate system
• Map projections provide means to map
geographic coordinates onto a flat surface (for
map production)
• Data sets to be combined may use different coordinate systems, or be based
upon different datums.
• Combining them may need a coordinate
transformation, or both a coordinate transformation
and a datum transformation.
69
Point data transformation
• Suppose we have captured a sample of points but
wish to derive a value for the phenomenon at another
location or for the whole extent of our study area.
• OR we may want to transform our points into other
representations in order to facilitate interpretation
and/or integration with other data.
• Example: homogeneous areas (polygons) from our
point data, or deriving contour lines.
• This is referred to as interpolation.
70
Interpolation
• the calculation of a value from ‘surrounding’
observations.
• Interpolation is the process of using points
with known values or sample points to
estimate values at other unknown points. It
can be used to predict unknown values for any
geographic point data, such as elevation,
rainfall, chemical concentrations, noise levels,
and so on.
71
Nearest-neighbour interpolation
• Simply find the ‘nearest’ known value to the
point (x, y) location, and assign that value to it.
• Alternatively, use the distance that points are away from (x,
y) to weight their importance in the
calculation (distance weighting).
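The first rule above (assign the value of the nearest sample) can be sketched in a few lines. Samples are assumed to be (x, y, value) tuples in planar coordinates; the function name is made up for this example. Because the value is copied and never averaged, the same code works for categories as well as numbers.

```python
import math


def nearest_neighbour(samples, x, y):
    """Return the value of the sample closest to location (x, y)."""
    if not samples:
        raise ValueError("at least one sample is needed")
    nearest = min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return nearest[2]


land_cover = [(0.0, 0.0, "forest"), (10.0, 0.0, "urban"), (5.0, 8.0, "water")]
print(nearest_neighbour(land_cover, 2.0, 1.0))
```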
72
Discrete and Continuous Data /
Qualitative and Quantitative Data
73
Interpolating discrete data
• For discrete (nominal, categorical or ordinal)
data, we are effectively restricted to using
nearest-neighbour interpolation.
• This technique will construct ‘zones’ around
the points of measurement, with each point
belonging to a zone assigned the same value.
Effectively, this represents an assignment of an
existing value (or category) to a location.
74
Thiessen polygons
• If the desired output was a polygon layer, we
could construct Thiessen polygons around the
points of measurement. The boundaries of
such polygons are the locations for which
more than one point of measurement is the
closest point.
75
Interpolating continuous data
• Interpolation of values from continuous
measurements is significantly more complex.
• Many geographic fields are continuous, for example elevation,
temperature and groundwater salinity.
• Commonly, continuous fields are represented as
rasters.
• The main alternative for continuous field
representation is a polyline vector layer, in which
the lines are isolines.
76
Interpolating continuous data
• Four techniques to use measurements to
obtain a representation of the entire field
using point samples are
– Trend surface fitting using regression,
– Triangulation,
– Spatial moving averages using inverse distance
weighting,
– Kriging.
77
Trend surface fitting
• The entire study area can be represented by a formula f (x, y)
that, for a given location with coordinates (x, y), gives
the approximated value of the field at that location.
• The task is to derive a formula that best describes the field.
• The simplest formula describes a flat but tilted plane:
• f (x, y) = c1 · x + c2 · y + c3.
• Whether a tilted plane is a suitable approximation is a judgement that must be based on domain expertise.
• What remains is determining the best values for the coefficients c1, c2 and c3.
• Statistical techniques known as regression techniques can
be used to determine values for these coefficients ci that
best fit the measurements.
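As a sketch of how regression can deliver c1, c2 and c3, the code below fits the tilted plane by ordinary least squares, solving the 3 × 3 normal equations with a small Gauss-Jordan routine. Samples are assumed to be (x, y, value) tuples; the function names are made up for this example, and a real project would normally rely on a statistics or GIS package instead.

```python
def solve_linear(a, b):
    """Solve the square system a · c = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("samples do not determine a unique plane")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_plane(samples):
    """Least-squares fit of f(x, y) = c1*x + c2*y + c3."""
    sxx = sum(x * x for x, _, _ in samples)
    sxy = sum(x * y for x, y, _ in samples)
    syy = sum(y * y for _, y, _ in samples)
    sx = sum(x for x, _, _ in samples)
    sy = sum(y for _, y, _ in samples)
    sxz = sum(x * z for x, _, z in samples)
    syz = sum(y * z for _, y, z in samples)
    sz = sum(z for _, _, z in samples)
    normal_matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, len(samples)]]
    return solve_linear(normal_matrix, [sxz, syz, sz])


# Measurements that lie exactly on the plane f(x, y) = 2x + 3y + 5.
samples = [(0, 0, 5), (1, 0, 7), (0, 1, 8), (1, 1, 10), (2, 1, 12)]
c1, c2, c3 = fit_plane(samples)
print(c1, c2, c3)
```

With noisy measurements the same code returns the plane that minimizes the sum of squared differences between the plane and the measured values.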
78
Various global trend surfaces obtained from regression
techniques: (a) simple tilted plane; (b) bilinear saddle; (c)
quadratic surface; (d) cubic surface. Values range from white
(low), via blue, and light green to dark green (high).
79
• Not all fields are representable as simple,
tilted planes.
• Higher-order polynomial functions can then be used.
• Bilinear saddle:
– f (x, y) = c1 · x + c2 · y + c3 · xy + c4
• Quadratic surfaces, described by:
– f (x, y) = c1 · x^2 + c2 · x + c3 · y^2 + c4 · y + c5 · xy + c6.
• Cubic Surface given by:
80
• Trend surface fitting is a useful technique of
continuous field approximation, though determining
the ‘best fit’ values for the coefficients ci is a time-
consuming operation.
• Once these best values have been determined, we
know the formula, making it possible to compute an
approximated value for any location in the study
area.
• It is possible to use trend surfaces for both global
and local trends
• Sometimes the assumption that a single formula can describe the field for
the entire study area is unrealistic.
81
• Identify the parts, apply the trend surface fitting
techniques, and obtain an approximation polynomial for
each part.
• Local trend surface fitting is not a popular technique in
practical applications, because it is relatively
difficult to implement, and other techniques such as
moving windows are better for the representation and
identification of local trends.
• It is relatively simple to generate a raster layer, given an
appropriate cell resolution and an approximation
function for the cell’s value.
• In order to generate a vector layer representing this
data, isolines can be derived, for a given set of intervals.
82
Triangulation
• Triangulated Irregular Networks (TINs)
• This technique constructs a triangulation of
the study area from the known measurement
points.
• After having obtained it, we may define for
which values of the field we want to construct
isolines.
83
TIN
• Triangulated Irregular Network, or TIN is a
commonly used data structure in GIS software.
• It can be used to represent any continuous field.
• It is built from a set of locations for which we have a
measurement, for instance an elevation.
• The locations can be arbitrarily scattered in space,
and are usually not on a nice regular grid.
84
TIN
85
• Any location together with its elevation value
can be viewed as a point in three-dimensional
space. From these 3D points, we can construct
an irregular tessellation made of triangles.
86
• In three-dimensional space, three points
uniquely determine a plane, as long as they are
not collinear.
• A plane fitted through these points has a fixed
aspect and gradient, and can be used to
compute an approximation of elevation of other
locations.
• Since we can pick many triples of points, we can
construct many such planes, and therefore we
can have many elevation approximations for a
single location.
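A sketch of the plane-through-three-points idea: the normal vector of the plane is obtained from a cross product, and the plane equation is then solved for the elevation at (x, y). The function name is hypothetical, and the three points are assumed not to be collinear when seen from above.

```python
def plane_elevation(p1, p2, p3, x, y):
    """Elevation at (x, y) on the plane through three (x, y, z) points."""
    ux, uy, uz = p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2]
    vx, vy, vz = p3[0] - p1[0], p3[1] - p1[1], p3[2] - p1[2]
    # Normal vector of the plane: cross product of two edge vectors.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("points are collinear in plan view")
    # Plane equation: nx*(x - x1) + ny*(y - y1) + nz*(z - z1) = 0.
    return p1[2] - (nx * (x - p1[0]) + ny * (y - p1[1])) / nz


# Three anchor points on the plane z = x + 2y.
a, b, c = (0, 0, 0), (10, 0, 10), (0, 10, 20)
print(plane_elevation(a, b, c, 5, 5))
```

Choosing a different triple of anchor points generally gives a different plane and therefore a different estimate for the same location, which is why a single, well-chosen tessellation is needed.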
87
• Obtain a triangular tessellation of the complete
study space.
• There are many different tessellations for a given
input set of anchor points.
• Some tessellations are better than others, in the
sense that they make smaller errors of elevation
approximation.
• The second tessellation (in the figure) will provide a better
approximation because the average distance from P
to the three triangle anchors is smaller.
• A Delaunay triangulation is an optimal
triangulation.
88
• Some properties of a Delaunay triangulation:
– The triangles are as equilateral (‘equiangular’) as they can be, given the set of anchor
points.
– For each triangle, the circumcircle through its three
anchor points does not contain any other anchor point.
• A TIN is a vector representation
• Each anchor point has a stored georeference.
• It is an irregular tessellation
• In this case, the cells do not have an associated
stored value as is typical of tessellations, but rather
a simple interpolation function that uses the
elevation values of its three anchor points.
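The circumcircle property can be tested directly. The sketch below uses the standard determinant test for whether a point lies inside the circle through three anchor points; the function names are made up for this example, and points exactly on the circle are treated as not inside.

```python
def orientation(a, b, c):
    """Positive if a, b, c are in counter-clockwise order."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = (
        (ax * ax + ay * ay) * (bx * cy - cx * by)
        - (bx * bx + by * by) * (ax * cy - cx * ay)
        + (cx * cx + cy * cy) * (ax * by - bx * ay)
    )
    # The sign of the determinant flips for clockwise triangles.
    if orientation(a, b, c) < 0:
        det = -det
    return det > 0


def violates_delaunay(triangle, anchor_points):
    """True if any other anchor point lies inside the triangle's circumcircle."""
    a, b, c = triangle
    others = [p for p in anchor_points if p not in (a, b, c)]
    return any(in_circumcircle(a, b, c, p) for p in others)


tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(violates_delaunay(tri, [*tri, (0.6, 0.6)]))
print(violates_delaunay(tri, [*tri, (2.0, 2.0)]))
```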
89
• For instance, for elevation, we might want to
have the 100 m-isoline, the 200 m-isoline, and
so on. For each edge of a triangle, a geometric
computation can be performed that indicates
which isolines intersect it, and at what
positions they do so. A list of computed locations,
all at the same field value, is used by the
GIS to construct the isoline.
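The per-edge computation can be sketched with linear interpolation along the edge. The end points are assumed to be (x, y, elevation) tuples and the function name is hypothetical.

```python
def isoline_crossings(p, q, levels):
    """Return (level, x, y) for every isoline level that crosses edge p-q."""
    (x1, y1, z1), (x2, y2, z2) = p, q
    crossings = []
    if z1 == z2:
        # A level edge has no single crossing point.
        return crossings
    for level in levels:
        # t is the fraction of the way from p to q at which the level is reached.
        t = (level - z1) / (z2 - z1)
        if 0 <= t <= 1:
            crossings.append((level, x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return crossings


# An edge rising from 50 m to 250 m is crossed by the 100 m and 200 m isolines.
print(isoline_crossings((0, 0, 50), (10, 0, 250), [100, 200, 300]))
```

Collecting the crossings of one level over all triangle edges, and joining them triangle by triangle, yields the isoline for that level.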
90
Triangulation as a means of interpolation
91
Isolines constructed from the triangulation
92
Moving Window averages using inverse
distance weighting (IDW)
• Moving window averaging attempts to directly
derive a raster dataset from a set of sample
points
• sometimes also called ‘gridding’
• A ‘window’ also known as a kernel is defined
• Moving window averaging is said to be a local
interpolation method.
93
[Figure: moving window examples. In one window no measurements are available; in the other the computation is based on eleven measurements.]
94
• The simplest averaging function computes the
arithmetic mean of the measurements in the window.
95
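• In inverse distance weighting, each measurement is weighted by 1/d^p, where d is its distance to the cell centre and p >= 1.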
96
Moving window averaging parameters
• Raster resolution:
– Too large a cell size will smooth the function too much, removing local variations
– Too small a cell size will result in large clusters of equally valued cells, with little added
value
• Shape/size of window:
– Most procedures use square windows, but rectangular, circular or elliptical
windows are also possible. These can be useful in cases where the measurement
points are distributed regularly at fixed distance over the study area, and the
window shape must be chosen to ensure that each raster cell will have its window
include the same number of measurement points. The size of the window is
another important matter. Small windows tend to exaggerate local extreme
values, while large windows have a smoothing effect on the predicted field values.
• Selection criteria:
– Not necessarily all measurements within the window need to be used in averaging. We may
choose to use at most the n nearest measurements.
• Averaging function:
– It is possible to use different distance–weighting functions, each of which will
influence the calculation of the resulting value.
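The parameters above map onto a small function. This is a sketch under stated assumptions: a circular window of the given `radius`, weights of 1/d^power, and an optional `max_points` limit for the selection criterion. All names are made up for this example, and samples are (x, y, value) tuples in planar coordinates.

```python
import math


def idw_estimate(samples, x, y, radius, power=2, max_points=None):
    """Inverse-distance-weighted average of the samples inside the window.

    Returns None when the window contains no measurements.
    """
    in_window = []
    for sx, sy, value in samples:
        distance = math.hypot(sx - x, sy - y)
        if distance <= radius:
            in_window.append((distance, value))
    if not in_window:
        return None
    in_window.sort(key=lambda item: item[0])
    if max_points is not None:
        # Selection criterion: keep only the nearest measurements.
        in_window = in_window[:max_points]
    if in_window[0][0] == 0:
        # The location coincides with a measurement point.
        return in_window[0][1]
    weights = [1 / d ** power for d, _ in in_window]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, in_window))
    return weighted_sum / sum(weights)


rain = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_estimate(rain, 1.0, 0.0, radius=5.0))
print(idw_estimate(rain, 1.0, 0.0, radius=0.5))
```

Calling the function once for every cell centre of the output raster gives the gridded result; cells for which it returns None would be stored as no-data.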
97
Kriging
• Kriging is usually used when the variation of an attribute and/or
the density of sample points is such that simple methods of
interpolation may give unreliable predictions
• Kriging is based on the notion that the spatial change of a
variable can be described as a function of the distance between
points.
• It is similar to IDW interpolation, in that the surrounding
values are weighted to derive a value for an unmeasured
location.
• However, the kriging method also looks at the overall spatial
arrangement of the measured points and the spatial correlation
between their values, to derive values for an unmeasured
location.
98
• The first step in the kriging procedure is to
compare successive pairs of point
measurements to generate a semi-variogram.
• In the second step, the semi-variogram is used
to calculate the weights used in interpolation.
99
• Kriging uses the idea of a regionalized variable, which varies
from place to place with some continuity, but which cannot
be modelled with a single smooth mathematical equation.
• Ex: changes in the grade of ores, variation in soil
qualities, a number of vegetation variables
• Each surface is treated as having three separate components:
1. Drift or structure of the surface, which treats the surface as a
general trend in a particular direction.
2. Small variations from
the general trend, such as small peaks and depressions,
that are random but still related spatially.
3. Random noise that is neither associated with the trend nor
spatially autocorrelated.
100
• Drift is estimated using a mathematical
equation
• The variation in elevation with distance is measured using
a statistical graphing technique called the
semivariogram, which plots the
semivariance against the distance
between samples, called the lag.
• Semivariance is the measure of
interdependency of the elevational values
based on closeness.
102
• A critical component of generating any Kriging
model is creating the semivariogram, which is a plot
that shows how the semivariance of the measured values changes with the distance
between pairs of sampled locations.
• Points near to each other are expected to be more
similar than points that are farther apart.
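• The range is the lag distance up to which autocorrelation exists
among points.
• The nugget is the semivariance at zero lag, representing measurement error or random effect.
• The sill is the semivariance value at which the semivariogram levels off; it is reached at the
range, beyond which points are no longer spatially autocorrelated.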
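The points of an empirical semivariogram can be computed from the sample pairs as sketched below. Pairs are grouped into lag bins of equal width, and for each bin the semivariance is half the mean squared difference of the paired values. The function name and the binning by fixed `lag_width` are assumptions for this example; fitting a model curve to read off range, nugget and sill is a separate step not shown here.

```python
import math
from collections import defaultdict


def empirical_semivariogram(samples, lag_width):
    """Return (lag bin centre, semivariance, pair count) per lag bin."""
    squared_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            xi, yi, zi = samples[i]
            xj, yj, zj = samples[j]
            distance = math.hypot(xi - xj, yi - yj)
            bin_index = int(distance // lag_width)
            squared_sums[bin_index] += (zi - zj) ** 2
            pair_counts[bin_index] += 1
    return [
        ((b + 0.5) * lag_width, squared_sums[b] / (2 * pair_counts[b]), pair_counts[b])
        for b in sorted(pair_counts)
    ]


elevations = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0)]
for lag, gamma, n_pairs in empirical_semivariogram(elevations, lag_width=1.0):
    print(lag, gamma, n_pairs)
```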
103
104
• Kriging is a powerful type of spatial
interpolation that uses complex mathematical
formulas to estimate values at unknown
points based on the values at known points.
There are several different types of Kriging,
including Ordinary, Universal, CoKriging, and
Indicator Kriging.
105
• Two general forms:
• Universal
• Used when surface is estimated from
irregularly distributed samples where trends
exist- nonstationarity
• Punctual
• Assumes that the data exhibit stationarity, are
isotropic, and are sampled at equally spaced locations.
106
Problems of Interpolation
• When applying any interpolation method, the
following four factors matter:
• No. of control points
• Location of control points
• Problem of saddle points
• Area containing data points
107
• No. of control points
– More data points in case of unevenly generalized
surface
– More data points do not always improve accuracy
– A more complex surface needs more control points
to capture the necessary details
108
• Location of points
– The problem of sample placement is more severe when interpolation of
collected data by area is considered to produce isoplethic
map
– Isopleth maps simplify information about a region by
showing areas with continuous distribution. Isopleth
maps may use lines to show areas where elevation,
temperature, rainfall, or some other quality is the same;
values between lines can be interpolated.
– The centroid-of-cell method can be used to locate the sample data
points
– Center of gravity method is most applicable when sample
polygons are either clustered or unevenly distributed.
109
• Saddle point problem
– Sometimes called alternative choice problem
110
111
• Area under consideration
– Select control points from all direction for better
interpolation
– Best interpolation results are obtained when points
from the whole neighbourhood are selected.
112
113
Type of Spatial Data
● Raster
○ PNG
○ Geotiff
● Vector
○ CSV
○ ESRI Shapefile (ESRI: Environmental Systems Research Institute)
○ GeoJSON - a geospatial data interchange format based on
JavaScript Object Notation (JSON). It supports multiple geometry
types: Point, LineString, Polygon, MultiPoint, MultiLineString,
MultiPolygon, and GeometryCollection.
○ TopoJSON - an extension of GeoJSON that encodes topology.
○ KML and KMZ - KML (formerly known as Keyhole Markup Language) is an
XML-based file format for displaying information in a geographic context,
extension (.kml). KMZ stands for KML Zipped.
○ Geopackage - A GeoPackage (GPKG) is an open, non-proprietary,
platform-independent and standards-based data format for geographic
information system implemented as a SQLite database container.
114
How to Create a Map from Phone’s
camera?
● Install ‘gps photo’, ‘gps status’ on mobile
● Take out your phones everyone.
● Open Camera app.
● Go to settings and switch on “Save Location”/“Store
Location”/“On Location”.
● Now click a photo.
● Go to the Info/Details of that photo and you will find
the Latitude/Longitude of the image you clicked.
● Now, tap on those and your image will open in
Google Map.
● In this way you can collect these Lat/Longs and later
load them in QGIS to create a map. (An introduction
to QGIS is in upcoming lectures.)
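Once the Lat/Longs are collected in a table, they can be turned into a point layer. The sketch below converts CSV text into a GeoJSON FeatureCollection using only the Python standard library. The column names (`name`, `lat`, `lon`) and the sample row are hypothetical; note that GeoJSON stores coordinates in longitude, latitude order.

```python
import csv
import io
import json


def csv_to_geojson(csv_text):
    """Convert rows with name, lat and lon columns into GeoJSON points."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat, lon = float(row["lat"]), float(row["lon"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"coordinates out of range for {row['name']}")
        features.append({
            "type": "Feature",
            # GeoJSON order is [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


collected = "name,lat,lon\nCity Hospital,12.9716,77.5946\n"
print(json.dumps(csv_to_geojson(collected), indent=2))
```

The printed text can be saved with a .geojson extension and opened as a vector layer.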
How to Create a Map from Phone’s camera?
Exercise: Roam around and collect pictures of healthcare buildings
(Hospital/Clinic/Medical Shop) and list their Lat/Longs in an Excel sheet.
That’s how you collect material for digitization of those buildings! (The picture
below is just a sample.)
QGIS
Main features offered by QGIS:
● View Data
● Explore data and compose maps
● Create, edit, manage and export data
● Georeference images
● Analyse data
● Publish maps on the Internet : qgis2web
● Explore more functionality through plugins
117
Mobile Apps to make maps
● GPS Marker
■ How to make a point/line/polygon
■ Export them.
● GPS essentials
■ Demo for working of app
■ Hands on : use GPS essentials to make a track with images and
waypoints
■ Exporting (choose data format)
○ Others: Geopaparazzi, KoboToolbox, My Tracks (from Google, now discontinued)
118
Map Marker
● One can easily organize the places one has
visited using the Map Marker app.
● Set a title, a description, a date, a color,
an icon and pictures for each marker
● Organize your markers into different
folders
● Search for places with Google
● Can edit the previous markers, lines and
polygons