
There are three main forms of gathering primary geographical data:

Ground Surveying and Positioning,
Remote Sensing, and
Census and Spatial Sampling Procedures.

Ground Surveying and Remote Sensing are being taught in your other subjects, so this lecture will only cover Census and Spatial Sampling Procedures.

Census and Spatial Sampling may be used for primary data gathering when Ground Surveying and Remote Sensing methods cannot gather some data, e.g. because of:

Impracticality in locating features due to high mobility, short life span or large numbers,
Obstructions, or
Intangible phenomena.
Census – Population Enumerations

Discrete features, such as houses or people, can be counted.

The aim is to identify and count all members of a population.

Full enumeration of a population would be the ideal form of data gathering, but it has disadvantages; it is:

Time consuming,
Laborious, and
Costly, when a population is large and widely dispersed.
Errors also occur for many reasons, e.g.:

Dishonest responses,
People using a variety of addresses,
Homeless people are elusive, and
Field workers are paid at near the local minimum wage.
Geocoding

Geocoding is the practice of attaching locational information to census data.

Methods vary depending on whether the position of individual population entities is preserved, or population elements are aggregated within census regions and reported only as totals.
Entity focus

The most accurate way to geocode census data is to identify the locations as well as the attributes of each population entity.

This can be done in two ways:

Vector Geocoding, or
Raster Geocoding.
Vector Geocoding

Uses a coordinate reference system to define the location of point, line and area features.

The basic coding unit is a point, defined by a coordinate pair (x, y) or triplet (x, y, z).

Line features are treated as a series of points.

Area features are treated as lines that close on themselves.

Terrestrial coordinates (latitudes and longitudes, UTM, etc.) are preferred over local Cartesian (x, y) coordinates for 2D geocoding.
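As a rough sketch of these vector coding units in Python (plain tuples and lists; no particular GIS library is assumed, and the coordinate values are made up):

```python
# Basic coding unit: a point, as a coordinate pair (x, y) or triplet (x, y, z)
point_2d = (18.42, -33.95)          # e.g. (longitude, latitude)
point_3d = (18.42, -33.95, 45.0)    # with an elevation value z

# A line feature is treated as a series of points
line = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]

# An area feature is a line that closes on itself:
# the first and last vertices are the same point
area = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]
```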
Raster Geocoding

An alternative to vector geocoding; instead of coordinates, locations are defined using matrix or grid notation.

Technically speaking, a pixel defines an area, not a point.

But in practice, if pixels are significantly smaller than the spatial entities they represent, they provide a satisfactory locational reference.
Vector and raster geocoding can be thought of as locational aliases:

They represent different but equivalent ways to define the position of environmental features.

Locations in one system can readily be converted to locations in the other.

Thus the terms rasterization and vectorization.
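A minimal sketch of the two conversions, assuming a square grid with a known origin and cell size (the function names and values here are illustrative only):

```python
CELL_SIZE = 10.0                  # ground units per pixel
ORIGIN_X, ORIGIN_Y = 0.0, 0.0     # coordinate of the grid's corner

def rasterize(x, y):
    """Vector -> raster: map a coordinate to (row, col) grid indices."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((y - ORIGIN_Y) // CELL_SIZE)
    return row, col

def vectorize(row, col):
    """Raster -> vector: return the cell-centre coordinate of a pixel."""
    x = ORIGIN_X + (col + 0.5) * CELL_SIZE
    y = ORIGIN_Y + (row + 0.5) * CELL_SIZE
    return x, y

print(rasterize(37.5, 82.0))   # (8, 3)
print(vectorize(8, 3))         # (35.0, 85.0) - the cell centre, not the
                               # original point: a pixel defines an area
```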


Instead of preserving the location of population entities, we may have to aggregate census information within the boundaries of data collection regions.

We may do so:

To reduce costs,
For reasons of confidentiality or to guarantee anonymity, or
To represent the overall (total) population.

Aggregating data raises a variety of mapping concerns: maps based on region totals tend to be more abstract and prone to inaccuracy.

To understand the concern, we need to look at the relation between population distributions in the environment and the ways their characteristics are captured and reported.
Distributions are rarely uniform within regions.

Regions often vary widely in shape, size and orientation.

Typically, census regions are defined by political boundaries:

Countries, provinces, districts, townships and cities are widely used for aggregation.

Government census organizations often define additional units, such as enumeration districts, census tracts and blocks.

Boundaries for these special census regions generally follow political borders where possible, but are also aligned with prominent natural and cultural landmarks.

Natural resource specialists tend to rely on natural landmarks, such as watersheds or flora/fauna communities, for defining their census regions.
Data may also be aggregated by more focused administrative regions, such as suburbs, wards, and zones (formal/informal).

Data may also be aggregated by units whose boundaries define natural regions in the environment, e.g.:

Watersheds,
Climate zones,
Soil classes,
Ecological units, etc.
Relation to Population Distribution

The problem with aggregating census data by regions of any sort is that it is difficult to devise a regional scheme that does not give a biased picture of the population.

The effort and cost required to devise an unbiased data collection scheme often cannot be justified to funders.

Maps based on these data will also suffer.
Aggregated data are generally reported as census region totals.

The information available for mapping is a count by region, commonly represented with value-by-area mapping techniques.
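As an illustrative sketch (the ward names and coordinates are made up), aggregation reduces individual entity records to a count per region:

```python
from collections import Counter

# (x, y, region) records for individual population entities
records = [
    (1.2, 3.4, "Ward A"),
    (1.9, 3.1, "Ward A"),
    (5.6, 0.8, "Ward B"),
    (5.1, 1.2, "Ward B"),
    (5.3, 0.9, "Ward B"),
]

# Report only the totals: entity positions are discarded
totals = Counter(region for _, _, region in records)
print(totals)   # Counter({'Ward B': 3, 'Ward A': 2})
```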

Census totals can also be reported at a region centroid. When doing so, there are several options:

Centre of area, or
Centre of population.
Centre of Area

Is the balance point for the census region's shape.

Bears no relation to the distribution of population within the region.

If the distribution is uniform, the centre of area will provide an unbiased representation.

If otherwise, the centre of area can be misleading.
Centre of Population

Is more responsive to the distribution of population.

For point features, the centre of population is determined by averaging the x and y coordinates of the individual population entities.

The centre of population is therefore also called the bivariate mean.

This centroid can only be computed if the positions of individual features were recorded at the time of data collection.
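A minimal sketch contrasting the two centroids, assuming a rectangular region and made-up entity coordinates:

```python
# Centre of area: the balance point of the region's shape (here a
# rectangle from (0, 0) to (10, 10)); it ignores where people actually are
centre_of_area = ((0 + 10) / 2, (0 + 10) / 2)       # (5.0, 5.0)

# Centre of population (the bivariate mean): average the x and y
# coordinates of the individual population entities
points = [(8.0, 9.0), (9.0, 8.5), (8.5, 9.5), (7.5, 8.0)]
cx = sum(x for x, _ in points) / len(points)
cy = sum(y for _, y in points) / len(points)

print(centre_of_area)   # (5.0, 5.0)
print((cx, cy))         # (8.25, 8.75) - pulled toward the actual cluster
```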
Continuous phenomena, such as temperature or barometric pressure, must be measured rather than counted.

It is not practical to collect complete data on these phenomena, as it would entail making observations at all possible locations.

Census methods are therefore inappropriate for these phenomena.

Even for features that can be counted, such as trees, a full population enumeration is often not practical.

When a census is not appropriate or too costly, the alternative is SPATIAL SAMPLING.
Aim

Make observations at a limited number of carefully chosen locations that are representative of the distribution.

We must find a strategy that allows us to:

Use the smallest number of observations, and
Achieve a given level of accuracy.

The choice of strategy is key to efficient sampling.

From the sample data we can predict the overall character of the population.

The accuracy with which we can do so depends on the quality of the sample.

Effective sampling involves matching several factors to the nature of the distribution under scrutiny.

A poor match will thwart attempts to reconstruct the overall population with any degree of accuracy.

The most important factors to consider are:

Sample size,
Type of sampling units, and
Sampling unit dispersion.

The size of the sample is the first factor to be considered, but it is not as important as the other two.

The key is making each sample location provide as much insight into the nature of the distribution as possible.

A small number of well-chosen samples may be more revealing than a larger number of samples chosen haphazardly.

The literature of inferential statistics indicates that a sample of about 30 is ideal.
The common sampling units are points, lines (called “transects”), and areas (called “quadrats”).

Some liberty is taken in defining these terms.

For example, a transect may consist of counting all features falling within 100 metres on either side of the line.

The size and shape of the area units chosen are relatively unimportant in the context of the other sampling parameters. It is more important that the sampling units are placed so that they generate the most information.

There are a number of ways to disperse point, line and area samples throughout a region.
The most critical sampling parameter is the way sampling units are dispersed.

It is convenient to define SAMPLE SCATTER in terms of deviation from a random dispersion.

Thus a spatial sample may be:

Randomly dispersed,
More clustered than random, or
More uniform than random.

In inferential statistics, a random sample is preferred.
In spatial terms, a random sample is one in which each location has the same chance of being chosen, and the choice of one location in no way changes the probability of selecting other locations to complete the sample.

Sampling of this type is “blind”, as all prior knowledge about the distribution is ignored.

As a result, a random strategy makes for a relatively inefficient sampling method.

Although the sampling is random, the scatter of the resulting sample usually is not:

Some areas are likely to be over-represented by the sample, while other areas are not represented at all.

For this reason, samples with random scatter, as a rule, lead to the least accurate maps.
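A minimal sketch of simple random spatial sampling over a rectangular study area (the bounds and sample size are illustrative):

```python
import random

XMIN, XMAX = 0.0, 100.0
YMIN, YMAX = 0.0, 100.0

def random_sample(n, seed=None):
    """Each location has the same chance of selection, independently."""
    rng = random.Random(seed)
    return [(rng.uniform(XMIN, XMAX), rng.uniform(YMIN, YMAX))
            for _ in range(n)]

sites = random_sample(30, seed=1)
# Although the selection is random, the resulting scatter usually is not:
# some parts of the area are over-represented, others missed entirely.
```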
Greater sampling efficiency can be achieved if the sample scatter is carefully matched to the distribution being sampled.

This is done by apportioning sample sites to locations according to their distributional importance, and by using a sampling interval that is sufficiently small to capture the distributional features of particular interest.
The stratified sampling strategy is widely practiced in national polls and ratings surveys. It can be remarkably efficient and serves mapping purposes equally well.

The logic of stratified spatial sampling is:

First, divide the region into relatively homogeneous sub-regions considered to bear some relation to the phenomenon being sampled. These sub-regions are called “strata”.

Second, assign sample sites to each sub-region in proportion to what each area is thought to contribute to the overall picture.
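A minimal sketch of the proportional allocation step, with made-up strata and weights (how the weights are estimated is a separate question):

```python
# Assumed contribution of each stratum to the overall picture
weights = {"floodplain": 0.5, "hillslope": 0.3, "plateau": 0.2}
total_sites = 30

# Assign sample sites to each stratum in proportion to its weight
allocation = {stratum: round(total_sites * w) for stratum, w in weights.items()}
print(allocation)   # {'floodplain': 15, 'hillslope': 9, 'plateau': 6}

# The allocated sites would then be placed (e.g. at random) within
# each stratum's boundary.
```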
Where there is a lack of information, stratification can only be approximate.

But even sketchy knowledge of the distribution vastly improves sampling accuracy compared to the blind approach.
The sampling theorem is the rule that addresses the sampling interval.

Stratified sampling can reduce the sample size needed to achieve a given level of sampling accuracy, but we must still determine the minimum spacing between sample sites – the “sampling interval”.

A guideline borrowed from communications theory states that the sampling interval should be less than half the size of the target features in a distribution.

A larger interval will not let you reconstruct the sampled population and will therefore lead to distorted impressions of the distribution.
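As a worked illustration of the half-size guideline (the 200-metre feature size is a made-up value):

```python
feature_size = 200.0               # smallest target feature, in metres
max_interval = feature_size / 2.0  # interval must be *less than* this

interval = 90.0
assert interval < max_interval     # 90 m spacing can resolve 200 m features
# An interval of, say, 120 m would exceed the limit and give a
# distorted impression of the distribution.
```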
Electronic technology has contributed to the growing interest in collecting field data in digital form using automated methods.

Once such passive data collection systems are established, data are sensed, transmitted, received and processed without the need for direct human intervention.

The approach is ideal when monitoring movements past a fixed position. Examples include highway traffic and wildlife migration studies.

Passive systems are particularly useful when data must be gathered repeatedly at a large number of locations, or at remote locations.

Transmitters are also used in passive data collection systems. In bio-telemetry studies, for example, census taking involves attaching transmitters to animals, which automatically signal their position to data recording devices.

Traffic flow and density can likewise be monitored with the aid of automated data recorders and vehicles fitted with transmitters.
