
2021/08/23

Lecture 3

GIS220:
The nature of models and data

Prof Gregory Breetzke


greg.breetzke@up.ac.za
Room 1-19, Geography

Lecture outline

• The scientific method

• Accuracy, precision and measurement error

• Scales or levels of measurement

• Sampling
• Reasons for sampling
• Sampling methods
• Probability and non-probability sampling
• Advantages and disadvantages of each sampling method


The scientific method


• The study of geographic phenomena often requires the application of
statistical methods to produce new insight

• Both social scientists and physical scientists make use of the scientific
method in their attempts to learn about the world and solve problems

• E.g., suppose we are interested in showing and describing the spatial
association between TB and alcohol outlets in a metropolitan area

The scientific method


Accuracy and precision


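• Shots tightly grouped in the centre: accurate and precise
• Shots scattered all over the target but centred on the bullseye: accurate but imprecise
• Shots tightly grouped away from the centre: inaccurate but precise
• Shots spread wide apart and away from the centre: inaccurate and imprecise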

What do you say?

Error

Types of measurement errors


1) Gross
Mistakes & blunders & carelessness. Outliers can be identified and
removed.
2) Random
Inconsistency in measured values due to, e.g. precision limitations in
equipment.
3) Systematic
An error having a non-zero mean, so that its effect is not reduced
when observations are averaged.
E.g., a stopwatch that consistently runs slow
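The difference between random and systematic error can be illustrated with a small simulation. This is only a sketch: the true value, the noise level (0.5) and the bias (+2.0) are assumed figures chosen for illustration, not values from the lecture.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 100.0  # hypothetical true distance in metres (assumed)
N_OBS = 10_000      # number of repeated measurements (assumed)

# Random error: zero-mean noise, e.g. limited instrument precision
random_only = [TRUE_VALUE + random.gauss(0.0, 0.5) for _ in range(N_OBS)]

# Systematic error: the same kind of noise plus a constant +2.0 m bias
with_bias = [TRUE_VALUE + 2.0 + random.gauss(0.0, 0.5) for _ in range(N_OBS)]

mean_random = statistics.mean(random_only)
mean_bias = statistics.mean(with_bias)

print(f"mean with random error only: {mean_random:.2f}")
print(f"mean with systematic error:  {mean_bias:.2f}")
```

Averaging pulls the first mean towards the true value, while the second stays offset by roughly the size of the bias, which is what "non-zero mean" implies.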


Gross error


Error

Say you are doing a spatial analysis of TB and you use the wrong
projection. What type of error would you suspect: systematic, random
or gross?

Does it matter?


Why is the Mercator projection a poor choice for Google Maps?
• Google Maps uses a Mercator projection

• A cylindrical projection

• It does not preserve:


– areas,
– distances,
– shortest paths as straight lines,
– shapes (except for very/infinitesimally small areas)

• It does preserve:
– angles

Inability to show entire Earth


In conventional equatorial aspect, polar regions omitted


South Pole is “off the bottom” of the map

Severe distortion in areal scale


Greenland appears larger than South America but is in
fact ~one-eighth the size, etc.
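The areal distortion can be quantified. On a spherical Mercator projection the linear scale factor at latitude φ is sec φ, so areas are inflated by sec² φ relative to the equator. The sketch below evaluates that factor; the function name and the sample latitudes are illustrative choices only.

```python
import math

def mercator_area_factor(lat_deg: float) -> float:
    """Areal exaggeration of the spherical Mercator projection at a given
    latitude, relative to the equator: sec^2(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

# Illustrative latitudes; higher latitudes are inflated far more strongly
for lat in (0, 30, 60, 72, 80):
    print(f"latitude {lat:>2} deg: areas inflated x{mercator_area_factor(lat):.1f}")
```

Because the factor grows without bound towards the poles, high-latitude landmasses such as Greenland look far larger than equatorial ones of much greater true area.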


Severe distortion in linear scale


Christchurch to Falkland Islands, using Google’s scale bar is ~12,200 km,
whereas real distance is ~8,300 km.

Google Earth reports great circle distances


Note that ruler tool yields properly curved lines. Only
the gnomonic projection preserves all great circle
segments as straight lines.
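A great circle distance can be approximated with the haversine formula on a spherical Earth. The sketch below is one way to check the Christchurch to Falkland Islands figure; the coordinates (Christchurch and Stanley) are rounded assumptions, and the mean Earth radius of 6371 km ignores the spheroidal shape.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance in km between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate coordinates (assumed, rounded to two decimals)
christchurch = (-43.53, 172.64)
stanley = (-51.70, -57.85)  # Falkland Islands

distance_km = great_circle_km(*christchurch, *stanley)
print(f"great circle distance: {distance_km:.0f} km")
```

The result should be of the same order as the ~8,300 km quoted above, well below the distance read off a Mercator scale bar.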


Example
The assumption that the earth is a sphere (and NOT
a spheroid!) is fine for small-scale maps (smaller
than 1:5,000,000). At this scale the difference
between a sphere and a spheroid is not detectable.
For large-scale maps, a spheroid is necessary to
represent the shape of the earth more accurately.

Horn, C. A. & Breetzke, G. D. (2009). Informing a crime strategy for the FIFA 2010 World Cup: a case
study for the Loftus Versfeld stadium in the city of Tshwane, South Africa, Urban Forum, 20(1), 19-32.

Scales of measurement
• What a scale actually means and what we can do with it depends on
what its numbers represent.

• Numbers can be grouped into four types or levels: nominal, ordinal, interval,
and ratio.

• Nominal is the simplest, and ratio the most sophisticated.

• Each level possesses the characteristics of the preceding level, plus an
additional quality.

• This matters because the mathematical and statistical methods we can
use depend largely on the scale of measurement of our data.


Scales of measurement
• There are four measurement scales (or types of data):

• nominal
• ordinal
• interval
• ratio

Numeric values can represent four types of information: nominal data (class),
ordinal data (rank), interval data (ordered scale), or ratio data (continuous
scale). The type of information represented may have a dramatic effect on
how you should interpret the values.
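How the level of measurement limits what can be computed is easiest to see with numbers. The sketch below uses made-up observations (all values are hypothetical) and applies only an operation that is meaningful at each level: the mode for nominal data, the median for ordinal data, differences for interval data and ratios for ratio data.

```python
import statistics

# Hypothetical observations at each level of measurement
land_use = ["urban", "forest", "urban", "water", "urban"]  # nominal (class)
risk_rank = [1, 3, 2, 5, 4, 2, 2]                          # ordinal (rank)
temp_c = [10.0, 20.0]                                      # interval (degrees Celsius)
rainfall_mm = [10.0, 20.0]                                 # ratio (true zero)

most_common_class = statistics.mode(land_use)   # counting is all nominal data allow
middle_rank = statistics.median(risk_rank)      # order is meaningful, spacing is not
temp_difference = temp_c[1] - temp_c[0]         # differences are meaningful
rain_ratio = rainfall_mm[1] / rainfall_mm[0]    # "twice as much" is meaningful

# 20 degrees C is not "twice as hot" as 10 degrees C: the Celsius zero is
# arbitrary. On the Kelvin scale, which has a true zero, the ratio is about 1.04.
kelvin_ratio = (temp_c[1] + 273.15) / (temp_c[0] + 273.15)

print(most_common_class, middle_rank, temp_difference, rain_ratio, round(kelvin_ratio, 3))
```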


Scales of measurement

Test


What is sampling?

Sampling
• What is your (target) population of interest?

• To whom do you want to generalize your results?


• Students
• All doctors
• School children
• Sepedi speakers
• Women aged 15-45 years
• Other

• Can you sample the entire population?


Some definitions
• Population: the universe of units from which the sample is to be
selected
• Sample: the segment of population that is selected for investigation
• Sampling frame: list of all units in the population from which the sample is selected
• Representative sample: a sample that reflects the population accurately
• Sample bias: distortion in the representativeness of the sample
• Sampling error: difference between a sample and the population from which it is selected
• Non-sampling error: error arising from sources other than sampling, e.g.
non-response, poor question wording or poor interviewing
• Non-response: when members of the sample are unable or refuse to take part
• Census: data collected from entire population

Types of samples

• Probability (random) samples (each unit has a known chance of selection)

• Simple random sample
• Systematic random sample
• Stratified random sample
• Multistage sample
• Cluster sample


Types of samples

1. Simple random sample


• Every unit has an equal probability of selection

• Sampling fraction: n/N where n = sample size and N = population size

• List all units and number them consecutively

• Use a random number table (or a random number generator) to select units
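The steps above can be sketched with Python's standard `random` module in place of a random number table. The population size (N = 400) and sample size (n = 60) are assumed purely for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

N = 400  # population size (assumed)
n = 60   # sample size (assumed)

# List all units and number them consecutively: 1..N
sampling_frame = list(range(1, N + 1))

# Draw n distinct units; every unit has the same chance of selection
sample = random.sample(sampling_frame, n)

sampling_fraction = n / N
print(sorted(sample)[:10], f"sampling fraction = {sampling_fraction:.2f}")
```

`random.sample` draws without replacement, so no unit can be selected twice.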

Types of samples

2. Systematic sample
• Select units directly from sampling frame

• From a random starting point, choose every kth unit (e.g. every 4th name), where k = N/n is the sampling interval

• Ensure sampling frame has no inherent ordering
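A minimal sketch of systematic selection, assuming a hypothetical frame of 100 names and a sampling interval of 4. The helper `systematic_sample` is an illustrative name, not a library function.

```python
import random

def systematic_sample(frame, interval, rng):
    """Pick a random start within the first interval, then every interval-th unit."""
    start = rng.randrange(interval)  # random start index: 0 .. interval-1
    return frame[start::interval]

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame of 100 names
frame = [f"name_{i:03d}" for i in range(1, 101)]

sample = systematic_sample(frame, 4, rng)
print(len(sample), sample[:3])
```

If the frame were ordered in a cycle that matches the interval (for example every 4th name being a household head), the sample would inherit that pattern, which is why the frame must have no inherent ordering.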


Types of samples

3. Stratified random sample


• Proportionately representative of each stratum

• Stratify population by appropriate criteria

• Randomly select within each category

Types of samples
• Scenario: Imagine you want to obtain a “stratified random sample” of 60
elements or observations (n) from a population (N) of 400

• Suppose the population (N) has been sub-divided into Stratum A (White) with
140 observations (N1), Stratum B (Black) containing 220 observations (N2), and
Stratum C (Coloured/Indian) with 40 observations (N3).

• Question: Can you determine the required number of samples per stratum (i.e. in
A, B and C respectively)?

Answer: The required number of samples in the three strata would be:
21 in A,
33 in B, and
6 in C.
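The answer follows from proportional allocation: each stratum receives n × N_h / N samples, where N_h is the stratum size. A minimal sketch (the function name is illustrative):

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of n across strata in proportion to stratum size.

    Note: with other numbers the rounded values may not sum exactly to n,
    in which case a manual adjustment would be needed.
    """
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

strata = {"A": 140, "B": 220, "C": 40}  # N1, N2, N3 from the scenario (N = 400)
allocation = proportional_allocation(strata, 60)
print(allocation)  # {'A': 21, 'B': 33, 'C': 6}
```

For example, stratum A holds 140/400 = 35% of the population, so it receives 35% of the 60 samples, i.e. 21.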


Types of samples

Components of research
4. Multi-stage cluster sample
• Useful for widely dispersed populations

• Divide population into groups (clusters) of units

• Sample sub-clusters from clusters

• Randomly select units from each (sub) cluster

• Collect data from each cluster of units, consecutively
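The stages above can be sketched as nested random draws. The population structure here (10 clusters, 5 sub-clusters each, 20 units per sub-cluster) and the numbers drawn at each stage are entirely hypothetical.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10 clusters (e.g. municipalities), each with
# 5 sub-clusters (e.g. wards), each holding 20 units (e.g. households)
population = {
    f"cluster_{c}": {
        f"sub_{c}_{s}": [f"unit_{c}_{s}_{u}" for u in range(20)]
        for s in range(5)
    }
    for c in range(10)
}

# Stage 1: randomly select 3 clusters
chosen_clusters = rng.sample(sorted(population), 3)

sample = []
for cluster in chosen_clusters:
    # Stage 2: randomly select 2 sub-clusters within each chosen cluster
    chosen_subs = rng.sample(sorted(population[cluster]), 2)
    for sub in chosen_subs:
        # Stage 3: randomly select 4 units within each chosen sub-cluster
        sample.extend(rng.sample(population[cluster][sub], 4))

print(chosen_clusters, len(sample))
```

Fieldwork is then concentrated in 3 clusters rather than spread across all 10, which is the practical appeal for widely dispersed populations.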


Qualities of a probability sample

• Representative - allows for generalisation from sample to population

• Sample means can be used to estimate population means

• Standard error (SE): estimate of the discrepancy between the sample mean and the
population mean

• 95% of sample means fall within ±1.96 SE of the population mean
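The 95% statement can be checked by simulation. This is a sketch under assumed conditions: a normally distributed population with mean 50 and standard deviation 10, samples of size 40, and SE computed as σ/√n.

```python
import math
import random
import statistics

rng = random.Random(2021)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 50.0, 10.0  # assumed population parameters
n = 40                         # size of each sample (assumed)
TRIALS = 2000                  # number of repeated samples

se = POP_SD / math.sqrt(n)  # standard error of the mean

within = 0
for _ in range(TRIALS):
    draws = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    sample_mean = statistics.fmean(draws)
    if abs(sample_mean - POP_MEAN) <= 1.96 * se:
        within += 1

share_within = within / TRIALS
print(f"share of sample means within 1.96 SE: {share_within:.3f}")
```

The share should come out close to 0.95, with some run-to-run variation that depends on the seed and the number of trials.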

Confidence interval

• If a sample has been selected according to probability sampling
principles, we can be 95% confident that the population mean lies
within the sample mean ± 1.96 multiplied by the standard error of the
mean. This range is known as the confidence interval

• If the HIV rate among incarcerated prisoners at a sample of correctional
centres is 60%, and the standard error of the mean is 6.5%, we can be
95% certain that the population mean will lie between:

60% - (1.96 × 6.5%) and 60% + (1.96 × 6.5%)

i.e. between 47.26% and 72.74%
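The arithmetic in this example can be reproduced in a few lines. The function name is illustrative; 1.96 is the multiplier for a 95% interval as used above.

```python
def confidence_interval_95(estimate, standard_error):
    """Return the (lower, upper) bounds of estimate +/- 1.96 * standard error."""
    margin = 1.96 * standard_error
    return estimate - margin, estimate + margin

# Values from the example: estimate 60%, standard error 6.5%
lower, upper = confidence_interval_95(60.0, 6.5)
print(f"{lower:.2f}% to {upper:.2f}%")  # 47.26% to 72.74%
```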


Sample size

• Absolute size matters more than relative size

• The larger the sample, the more precise and representative it is likely to
be

• As sample size increases, sampling error decreases

• Important to be honest about the limitations of your sample
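Why absolute size matters, and why gains tail off, can be seen from the standard error of a proportion, √(p(1 − p)/n), which depends on n rather than on n/N. The sketch below assumes simple random sampling from a large population and uses p = 0.5, the case that gives the widest margin.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling
    (large population assumed, so no finite population correction)."""
    return math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes; margin = 1.96 * SE, in percentage points
for n in (100, 500, 1000, 2000, 4000):
    margin_pct = 1.96 * se_proportion(0.5, n) * 100
    print(f"n = {n:>4}: 95% margin of error = +/- {margin_pct:.1f} percentage points")
```

Because the error shrinks with the square root of n, quadrupling the sample only halves the margin, so each additional respondent buys less precision than the last.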

Factors affecting sample size

• Time and cost
• After a certain point (around n = 1000), increasing sample size produces less
noticeable gains in precision

• Very large samples are decreasingly cost-efficient (Hazelrigg, 2004)

• Non-response
• Response rate = % of sample who agree to
participate (or % who provide usable data)

• Responders and non-responders may differ on a crucial variable


Factors affecting sample size

• Heterogeneity of the population
• The more varied the population is, the larger the sample will have to be

• Kind of analysis to be carried out
• Some techniques require a large sample (e.g. contingency tables;
inferential statistics)

Limitations to generalisations

• Findings can only be generalised to the population from
which the sample was selected
• Be wary of over-generalising in terms of locality

• Time, historical events and cohort effects
• Results may no longer be relevant and so require updating (replication)


Types of non-probability sampling

• Non-probability samples

• Convenience/opportunity sampling
• Snowball sampling
• Quota sampling
• Judgemental sampling

Types of non-probability sampling

1. Convenience/opportunity sampling
• The most easily accessible individuals
• Useful when piloting a research instrument
• May be a chance to collect data that is too good to miss

2. Snowball sampling
• Researcher makes initial contact with a small group
• These informants lead you to others in their network
• Useful for qualitative studies of deviant groups
• e.g. Becker (1963) marijuana users


Types of non-probability sampling

3. Quota sampling
• Strata: Asian, 25-34 yrs old, lower-middle class
• Often used in market research and opinion polls
• Relatively cheap, quick and easy to manage
• But non-random sampling of each stratum’s units
• Interviewers select people to fit their quota for each category
• Sample biased towards those who appear friendly and accessible (e.g. in the
street)
• Under-representation of less accessible groups

Types of non-probability sampling

4. Judgemental sampling
• Researcher selects observations to be sampled based on their
knowledge…
• Used when the individuals have expertise in the area being
researched
• E.g. conduct field survey in specific areas prone to large fires during
the fire season in Western Cape


References
• de Smith, M.J., Goodchild, M.F. and Longley, P.A. (2018). Geospatial
Analysis: A Comprehensive Guide to Principles, Techniques and Software
Tools. Matador.

• Rogerson, P. (2001). Statistical Methods for Geography. Sage.

• Walford, N. (2011). Practical Statistics for Geographers and Earth
Scientists. John Wiley & Sons.

• Wheeler, D., Shaw, G. and Barr, S. (2010). Statistical Techniques in
Geographical Analysis (3rd ed.). Routledge.
