
There are three main forms of gathering primary geographical data:

Ground Surveying and Positioning,
Remote Sensing, and
Census and Spatial Sampling Procedures.

Ground Surveying and Remote Sensing are being taught in your other subjects, so this lecture will only cover Census and Spatial Sampling Procedures.

Census and Spatial Sampling may be used for primary data gathering when Ground Surveying and Remote Sensing methods cannot gather some data, e.g. because of:

Impracticality in locating features due to high mobility, short life span or large numbers,
Obstructions, or
Intangible phenomena.
Census – Population Enumerations

Discrete features, such as houses or people, can be counted.

The aim is to identify and count all members of a population.

Full enumeration of a population would be the ideal form of data gathering, but it has disadvantages; it is:

Time consuming,
Laborious, and
Costly, when a population is large and widely dispersed.
Errors also occur for many reasons, e.g.:

Dishonest responses,
People using a variety of addresses,
Homeless people are elusive, and
Field workers are paid at near the local minimum wage.
Geocoding

Geocoding is the practice of attaching locational information to census data.

Methods vary depending on whether the position of individual population entities is preserved, or population elements are aggregated within census regions and reported only as totals.
Entity focus

The most accurate way to geocode census data is to identify the locations as well as the attributes of each population entity.

This can be done in two ways:

Vector Geocoding, or
Raster Geocoding.
Vector Geocoding

Uses a coordinate reference system to define the location of point, line and area features.

The basic coding unit is a point, defined by a coordinate pair (x, y) or triplet (x, y, z).

Line features are treated as a series of points.

Area features are treated as lines that close on themselves.

Terrestrial coordinates (latitudes and longitudes, UTM, etc.) are preferred over local Cartesian (x, y) coordinates for 2D geocoding.
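As a rough sketch of these vector coding units in Python (plain tuples and lists; no particular GIS library is assumed, and the coordinate values are made up):

```python
# Basic coding unit: a point, as a coordinate pair (x, y) or triplet (x, y, z)
point_2d = (18.42, -33.95)          # e.g. (longitude, latitude)
point_3d = (18.42, -33.95, 45.0)    # with an elevation value z

# A line feature is treated as a series of points
line = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]

# An area feature is a line that closes on itself:
# the first and last vertices are the same point
area = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]
```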
Raster Geocoding

An alternative to vector geocoding; instead of coordinates, locations are defined using matrix or grid notation.

Technically speaking, a pixel defines an area, not a point.

But in practice, if pixels are significantly smaller than the spatial entities they represent, they provide a satisfactory locational reference.
Vector and raster geocoding can be thought of as locational aliases:

They represent different but equivalent ways to define the position of environmental features.

Locations in one system can readily be converted to locations in the other.

Thus the terms rasterization and vectorization.
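A minimal sketch of the two conversions, assuming a square grid with a known origin and cell size (the function names and values here are illustrative only):

```python
CELL_SIZE = 10.0                  # ground units per pixel
ORIGIN_X, ORIGIN_Y = 0.0, 0.0     # coordinate of the grid's corner

def rasterize(x, y):
    """Vector -> raster: map a coordinate to (row, col) grid indices."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((y - ORIGIN_Y) // CELL_SIZE)
    return row, col

def vectorize(row, col):
    """Raster -> vector: return the cell-centre coordinate of a pixel."""
    x = ORIGIN_X + (col + 0.5) * CELL_SIZE
    y = ORIGIN_Y + (row + 0.5) * CELL_SIZE
    return x, y

print(rasterize(37.5, 82.0))   # (8, 3)
print(vectorize(8, 3))         # (35.0, 85.0) - the cell centre, not the
                               # original point: a pixel defines an area
```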


Instead of preserving the location of population entities, we may have to aggregate census information within the boundaries of data collection regions.

We may do so:

To reduce costs,
For reasons of confidentiality or to guarantee anonymity, or
To represent the overall (total) population.

Aggregating data raises a variety of mapping concerns: maps based on region totals tend to be more abstract and prone to inaccuracy.

To understand the concern, we need to look at the relation between population distributions in the environment and the ways their characteristics are captured and reported.
Distributions are rarely uniform within regions.

Regions often vary widely in shape, size and orientation.

Typically, census regions are defined by political boundaries:

Countries, provinces, districts, townships and cities are widely used for aggregation.

Government census organizations often define additional units, such as enumeration districts, census tracts and blocks.

Boundaries for these special census regions generally follow political borders where possible, but are also aligned with prominent natural and cultural landmarks.

Natural resource specialists tend to rely on natural landmarks, such as watersheds or flora/fauna communities, for defining their census regions.
Data may also be aggregated by more focused administrative regions, such as suburbs, wards, and zones (formal/informal).

Data may also be aggregated by units whose boundaries define natural regions in the environment, e.g.:

Watersheds,
Climate zones,
Soil classes,
Ecological units, etc.
Relation to Population Distribution

The problem with aggregating census data by regions of any sort is that it is difficult to devise a regional scheme that does not give a biased picture of the population.

The effort and cost required to devise an unbiased data collection scheme often cannot be justified to funders.

Maps based on these data will also suffer.
Aggregated data are generally reported as census region totals.

The information available for mapping is a count by region, commonly represented with value-by-area mapping techniques.
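As an illustrative sketch (the ward names and coordinates are made up), aggregation reduces individual entity records to a count per region:

```python
from collections import Counter

# (x, y, region) records for individual population entities
records = [
    (1.2, 3.4, "Ward A"),
    (1.9, 3.1, "Ward A"),
    (5.6, 0.8, "Ward B"),
    (5.1, 1.2, "Ward B"),
    (5.3, 0.9, "Ward B"),
]

# Report only the totals: entity positions are discarded
totals = Counter(region for _, _, region in records)
print(totals)   # Counter({'Ward B': 3, 'Ward A': 2})
```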

Census totals can also be reported at a region centroid. When doing so, there are several options:

Centre of area, or
Centre of population.
Centre of Area

Is the balance point for the census region's shape.

Bears no relation to the distribution of population within the region.

If the distribution is uniform, the centre of area will provide an unbiased representation.

If otherwise, the centre of area can be misleading.
Centre of Population

Is more responsive to the distribution of population.

For point features, the centre of population is determined by averaging the x and y coordinates of the individual population entities.

The centre of population is therefore also called the bivariate mean.

This centroid can only be computed if the positions of individual features were recorded at the time of data collection.
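A minimal sketch contrasting the two centroids, assuming a rectangular region and made-up entity coordinates:

```python
# Centre of area: the balance point of the region's shape (here a
# rectangle from (0, 0) to (10, 10)); it ignores where people actually are
centre_of_area = ((0 + 10) / 2, (0 + 10) / 2)       # (5.0, 5.0)

# Centre of population (the bivariate mean): average the x and y
# coordinates of the individual population entities
points = [(8.0, 9.0), (9.0, 8.5), (8.5, 9.5), (7.5, 8.0)]
cx = sum(x for x, _ in points) / len(points)
cy = sum(y for _, y in points) / len(points)

print(centre_of_area)   # (5.0, 5.0)
print((cx, cy))         # (8.25, 8.75) - pulled toward the actual cluster
```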
Continuous phenomena, such as temperature or barometric pressure, must be measured rather than counted.

It is not practical to collect complete data on these phenomena, as it would entail making observations at all possible locations.

Census methods are therefore inappropriate for these phenomena.

Even for features that can be counted, such as trees, a full population enumeration is often not practical.

When a census is not appropriate or too costly, the alternative is SPATIAL SAMPLING.
Aim

Make observations at a limited number of carefully chosen locations that are representative of the distribution.

We must find a strategy that allows us to:

Use the smallest number of observations, and
Achieve a given level of accuracy.

The choice of strategy is key to efficient sampling.

From the sample data we can predict the overall character of the population.

The accuracy with which we can do so depends on the quality of the sample.

Effective sampling involves matching several factors to the nature of the distribution under scrutiny.

A poor match will thwart attempts to reconstruct the overall population with any degree of accuracy.

The most important factors to consider are:

Sample size,
Type of sampling units, and
Sampling unit dispersion.

The size of the sample is the first factor to be considered, but it is not as important as the other two.

The key is making each sample location provide as much insight into the nature of the distribution as possible.

A small number of well-chosen samples may be more revealing than a larger number of samples chosen haphazardly.

The literature of inferential statistics indicates that a sample of about 30 is ideal.
The common sampling units are points, lines (called “transects”), and areas (called “quadrats”).

Some liberty is taken in defining these terms.

For example, a transect may consist of counting all features falling within 100 metres on either side of the line.

The size and shape of the area units chosen are relatively unimportant in the context of the other sampling parameters. It is more important that the sampling units are placed so that they generate the most information.

There are a number of ways to disperse point, line and area samples throughout a region.
The most critical sampling parameter is the way sampling units are dispersed.

It is convenient to define SAMPLE SCATTER in terms of deviation from a random dispersion.

Thus a spatial sample may be:

Randomly dispersed,
More clustered than random, or
More uniform than random.

In inferential statistics, a random sample is preferred.
In spatial terms, a random sample is one in which each location has the same chance of being chosen, and the choice of one location in no way changes the probability of selecting other locations to complete the sample.

Sampling of this type is “blind”, as all prior knowledge about the distribution is ignored.

As a result, a random strategy makes for a relatively inefficient sampling method.

Although the sampling is random, the scatter of the resulting sample usually is not:

Some areas are likely to be over-represented by the sample, while other areas are not represented at all.

For this reason, samples with random scatter, as a rule, lead to the least accurate maps.
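A minimal sketch of simple random spatial sampling over a rectangular study area (the bounds and sample size are illustrative):

```python
import random

XMIN, XMAX = 0.0, 100.0
YMIN, YMAX = 0.0, 100.0

def random_sample(n, seed=None):
    """Each location has the same chance of selection, independently."""
    rng = random.Random(seed)
    return [(rng.uniform(XMIN, XMAX), rng.uniform(YMIN, YMAX))
            for _ in range(n)]

sites = random_sample(30, seed=1)
# Although the selection is random, the resulting scatter usually is not:
# some parts of the area are over-represented, others missed entirely.
```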
Greater sampling efficiency can be achieved if the sample scatter is carefully matched to the distribution being sampled.

This is done by apportioning sample sites to locations according to their distributional importance, and by using a sampling interval that is sufficiently small to capture the distributional features of particular interest.
The stratified sampling strategy is widely practiced in national polls and ratings surveys. It can be remarkably efficient and serves mapping purposes equally well.

The logic of stratified spatial sampling is:

First, divide the region into relatively homogeneous sub-regions considered to bear some relation to the phenomenon being sampled. These sub-regions are called “strata”.

Second, assign sample sites to each sub-region in proportion to what each area is thought to contribute to the overall picture.
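A minimal sketch of the proportional allocation step, with made-up strata and weights (how the weights are estimated is a separate question):

```python
# Assumed contribution of each stratum to the overall picture
weights = {"floodplain": 0.5, "hillslope": 0.3, "plateau": 0.2}
total_sites = 30

# Assign sample sites to each stratum in proportion to its weight
allocation = {stratum: round(total_sites * w) for stratum, w in weights.items()}
print(allocation)   # {'floodplain': 15, 'hillslope': 9, 'plateau': 6}

# The allocated sites would then be placed (e.g. at random) within
# each stratum's boundary.
```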
Where there is a lack of information, stratification can only be approximate.

But even sketchy knowledge of the distribution vastly improves sampling accuracy compared to the blind approach.
The sampling theorem is the rule that addresses the sampling interval.

Stratified sampling can reduce the sample size needed to achieve a given level of sampling accuracy, but we must still determine the minimum spacing between sample sites – the “sampling interval”.

A guideline borrowed from communications theory states that the sampling interval should be less than half the size of the target features in a distribution.

A larger interval will not let you reconstruct the sampled population and will therefore lead to distorted impressions of the distribution.
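As a worked illustration of the half-size guideline (the 200-metre feature size is a made-up value):

```python
feature_size = 200.0               # smallest target feature, in metres
max_interval = feature_size / 2.0  # interval must be *less than* this

interval = 90.0
assert interval < max_interval     # 90 m spacing can resolve 200 m features
# An interval of, say, 120 m would exceed the limit and give a
# distorted impression of the distribution.
```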
Electronic technology has contributed to the growing interest in collecting field data in digital form using automated methods.

Once such passive data collection systems are established, data are sensed, transmitted, received and processed without the need for direct human intervention.

The approach is ideal when monitoring movements past a fixed position. Examples include highway traffic and wildlife migration studies.

Passive systems are particularly useful when data must be gathered repeatedly at a large number of locations, or at remote locations.

Transmitters are also used in passive data collection systems. In bio-telemetry studies, for example, census taking involves attaching transmitters to animals, which automatically signal their position to data recording devices.

Traffic flow and density can likewise be monitored with the aid of automated data recorders and vehicles fitted with transmitters.
