GIS220 Descriptive Statistics

2021/08/25
Lecture 4
GIS220:
Descriptive statistics
Prof Gregory Breetzke

greg.breetzke@up.ac.za
Room 1-19, Geography
Lecture overview
• What are descriptive statistics?

• Types of descriptive statistics
– Univariate
– Bivariate
• Examples
Descriptive statistics
• Provide an initial entry point
• Some research questions can be answered satisfactorily
using descriptive statistics
Types of descriptive statistics
• Univariate and bivariate statistics

– U: mean, mode, range, standard deviation
– B: correlation coefficient
UNIVARIATE
• Measures of central tendency

– Mean
– Mode
– Median
• Measures of dispersion
– Range
– Interquartile range
– Variance
– Standard deviation
The mean
• The mean is a measure of central value

– What most people mean by “average”
– Sum of a set of numbers divided by the number
of numbers in the set
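A minimal Python sketch of the definition above: add up the values and divide by how many there are. The `mean` helper and the `values` list are illustrative assumptions, not taken from the lecture.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


# Illustrative numbers only (not from the lecture).
values = [2, 4, 6, 8, 10]
average = mean(values)
print(average)
```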
The median
• Middlemost or most central item in the set of
ordered numbers; it separates the distribution
into two equal halves
• If n is odd, the median is the middle value of the ordered sequence
– if X = [1,2,4,6,9,10,12,14,17]
– then 9 is the median
• If n is even, the median is the average of the 2 middle values
– if X= [1,2,4,6,9,10,11,12,14,17]
– then 9.5 is the median; i.e., (9+10)/2
• Median is not affected by extreme values
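The two cases above can be checked with Python's standard `statistics` module. This is only a sketch: the two lists are the ones from the slide, while `with_outlier` is a hypothetical variation added to illustrate the last bullet.

```python
import statistics

odd_set = [1, 2, 4, 6, 9, 10, 12, 14, 17]        # n = 9 (odd)
even_set = [1, 2, 4, 6, 9, 10, 11, 12, 14, 17]   # n = 10 (even)

median_odd = statistics.median(odd_set)    # the single middle value
median_even = statistics.median(even_set)  # average of the 2 middle values

# Hypothetical variation: swap the largest value for an extreme one.
# The median stays where it was, while the mean is pulled upwards.
with_outlier = odd_set[:-1] + [1700]
median_outlier = statistics.median(with_outlier)
mean_before = statistics.mean(odd_set)
mean_outlier = statistics.mean(with_outlier)

print(median_odd, median_even, median_outlier)
```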
The mode
• The mode is the most frequently occurring
number in a distribution
– if X = [1,2,4,7,7,7,8,10,12,14,17]
– then 7 is the mode
• Easy to see in a simple frequency distribution
• Possible to have no modes or more than one
mode
– bimodal and multimodal
• Don’t have to be exactly equal frequency
– major mode, minor mode
• Mode is not affected by extreme values
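As a rough illustration, `statistics.multimode` from the Python standard library returns every value that shares the highest frequency, which covers the single-mode and bimodal cases. The `bimodal` and `no_repeats` lists are hypothetical examples, not lecture data.

```python
import statistics

x = [1, 2, 4, 7, 7, 7, 8, 10, 12, 14, 17]
modes = statistics.multimode(x)            # all most-frequent values

# Hypothetical bimodal data: 2 and 9 both occur three times.
bimodal = [2, 2, 2, 5, 9, 9, 9]
bimodal_modes = statistics.multimode(bimodal)

# When no value repeats, every value is returned; this is usually
# read as "no mode".
no_repeats = statistics.multimode([1, 2, 3])

print(modes, bimodal_modes, no_repeats)
```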
When to use what…?

• The mean is a great measure, but there are times when its
use is inappropriate or impossible
– The data are nominal: Mode
– The distribution is bimodal: Mode
– The data are ordinal: Median or mode
– There are a few extreme scores: Median
Dispersion
• Dispersion
– How tightly clustered or how
variable the values are in a data
set
• Example
– Data set 1: [0,25,50,75,100]
– Data set 2: [48,49,50,51,52]
– Both have a mean of 50, but data
set 1 clearly has greater variability than data set 2
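The comparison above can be sketched in Python. The choice of the sample standard deviation (`statistics.stdev`) as the measure of variability is an assumption made for illustration; any of the dispersion measures in this lecture would show the same contrast.

```python
import statistics

data_set_1 = [0, 25, 50, 75, 100]
data_set_2 = [48, 49, 50, 51, 52]

# Same centre...
mean_1 = statistics.mean(data_set_1)
mean_2 = statistics.mean(data_set_2)

# ...but very different spread around it.
sd_1 = statistics.stdev(data_set_1)
sd_2 = statistics.stdev(data_set_2)

print(mean_1, mean_2)
print(round(sd_1, 2), round(sd_2, 2))
```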
Range
• The difference between the maximum and
minimum values in a set
• Example
– Data set 1: [1,25,50,75,100]; R: 100-1 = 99
– Data set 2: [48,49,50,51,52]; R: 52-48 = 4
– The range ignores how data are distributed and
only takes the extreme scores into account
• Range = $X_{\text{largest}} - X_{\text{smallest}}$
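A small Python sketch of the range, using the two data sets from this slide. The helper name `value_range` is an assumption chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


range_1 = value_range([1, 25, 50, 75, 100])
range_2 = value_range([48, 49, 50, 51, 52])
print(range_1, range_2)
```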
Quartiles
• Split ordered data into four quarters
– Q1 = first quartile (25th percentile)
– Q2 = second quartile = median (50th percentile)
– Q3 = third quartile (75th percentile)
Interquartile range (IQR)

• Difference between third and first quartiles
– Interquartile Range = Q3-Q1
• Spread in middle 50%
• Not affected by extreme values
• The IQR measures how spread out the middle 50% of the data points
are around the median
• The higher the IQR, the more spread out the data points
• The smaller the IQR, the more bunched up the data points are
around the median
• It is best used with other measures, such as the median and
total range, to build a complete picture of how tightly a data set
clusters around its centre.
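A hedged sketch of the IQR in Python. Several quartile conventions exist and software packages can give slightly different answers; this example assumes the default "exclusive" method of `statistics.quantiles`. The list `x` is borrowed from the median slide, and `x_outlier` is a hypothetical variation.

```python
import statistics

x = [1, 2, 4, 6, 9, 10, 12, 14, 17]

# n=4 cuts the ordered data into quarters and returns Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(x, n=4)
iqr = q3 - q1

# Hypothetical variation: an extreme maximum leaves Q1 and Q3,
# and therefore the IQR, unchanged for this data set.
x_outlier = x[:-1] + [1700]
q1_o, _, q3_o = statistics.quantiles(x_outlier, n=4)
iqr_outlier = q3_o - q1_o

print(q1, q2, q3, iqr, iqr_outlier)
```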
Example
• Given the set of values: 27, 18, 19, 12, 15, 1,
2, 6, 5, 9, 7, find the…
– Mean
– Median
– Range
– Interquartile range
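One way to check hand-worked answers to this exercise is a short Python script. It is a sketch only: the quartiles assume the default "exclusive" method of `statistics.quantiles`, and other quartile conventions may give a different interquartile range.

```python
import statistics

values = [27, 18, 19, 12, 15, 1, 2, 6, 5, 9, 7]

mean = statistics.mean(values)
median = statistics.median(values)
value_range = max(values) - min(values)

# Q1 and Q3 under the assumed "exclusive" convention.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(mean, median, value_range, iqr)
```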
Standard deviation
• Let $X = [3, 4, 5, 6, 7]$
– $\bar{X} = 5$
– $(X - \bar{X}) = [-2, -1, 0, 1, 2]$
• Subtract $\bar{X}$ from each number in X
– $(X - \bar{X})^2 = [4, 1, 0, 1, 4]$
• Squared deviations from the mean
– $\sum (X - \bar{X})^2 = 10$
• Sum of squared deviations from the mean (SS)
– $\sum (X - \bar{X})^2 / (n - 1) = 10/4 = 2.5$
• Average squared deviation from the mean
– $\sqrt{\sum (X - \bar{X})^2 / (n - 1)} = \sqrt{2.5} \approx 1.58$
• Square root of the average squared deviation
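The same steps can be followed in Python as a sketch: deviations from the mean, their squares, the sum of squares, division by n - 1, then the square root. The variable names are illustrative; the final line cross-checks against the standard library's sample standard deviation.

```python
import math
import statistics

x = [3, 4, 5, 6, 7]
n = len(x)

mean = sum(x) / n                            # 5.0
deviations = [value - mean for value in x]   # [-2, -1, 0, 1, 2]
squared = [d ** 2 for d in deviations]       # [4, 1, 0, 1, 4]
ss = sum(squared)                            # sum of squares (SS): 10
variance = ss / (n - 1)                      # 10 / 4 = 2.5
sd = math.sqrt(variance)                     # about 1.58

print(ss, variance, round(sd, 2))
print(math.isclose(sd, statistics.stdev(x)))
```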
Standard deviation
• Most South African employers issue raises based on
percent of salary
• Why do supervisors think the most fair raise is a
percentage raise?
• Answer:
1) Because higher paid persons get the most money.
2) The easiest thing to do is raise everyone’s salary by a fixed percent.
• If your budget went up by 5%, salaries can go up by 5%.
• The problem is that the flat percent raise gives
unequal increased rewards
Standard deviation
• Acme Toilet Cleaning Services
• Salary Pool: R200,000
Incomes:
• President: R100K; Manager: R50K; Secretary: R40K; and
Toilet Cleaner: R10K
• Mean: R50K
• Range: R90K
• Variance: R1,050,000,000
• Standard Deviation: R32.4K
• The range, variance and standard deviation can be considered
“measures of inequality”
• Now, let’s apply a 5% raise
Standard deviation
• After a 5% raise, the pool of money increases by R10K to
R210,000
• Incomes:
– President: R105K; Manager: R52.5K; Secretary: R42K; and Toilet Cleaner:
R10.5K
– Mean: R52.5K (went up by 5%)
– Range: R94.5K (went up by 5%)
– Variance: R1,157,625,000
– Standard Deviation: R34K (went up by 5%)
• The flat percentage raise increased
inequality. The top earner got 50% of
the new money. The bottom earner
got 5% of the new money. The range and
standard deviation, as measures of inequality,
went up by 5%; the variance, being in squared
units, went up by 10.25%.
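The Acme figures can be reproduced with a short Python sketch. It assumes the population formulas (`statistics.pvariance`, `statistics.pstdev`, dividing by n), since those match the variance of R1,050,000,000 quoted on the slide; the dictionary layout and variable names are illustrative.

```python
import statistics

before = {"President": 100_000, "Manager": 50_000,
          "Secretary": 40_000, "Toilet Cleaner": 10_000}

# Integer arithmetic keeps the 5% raise exact.
after = {job: pay * 105 // 100 for job, pay in before.items()}

# Share of the new money that each person receives.
new_money = sum(after.values()) - sum(before.values())
share = {job: (after[job] - before[job]) / new_money for job in before}

var_before = statistics.pvariance(list(before.values()))
var_after = statistics.pvariance(list(after.values()))
sd_before = statistics.pstdev(list(before.values()))
sd_after = statistics.pstdev(list(after.values()))

print(new_money, share["President"], share["Toilet Cleaner"])
print(var_before, var_after)
print(round(sd_after / sd_before, 4), round(var_after / var_before, 4))
```

Because every salary is multiplied by the same factor, any measure in rand scales by that factor, while the variance (in squared rand) scales by the factor squared.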
Skew
• Skewness is a measure of the asymmetry of the
probability distribution
• Roughly speaking, a distribution has positive skew
(right-skewed) if the right (higher value) tail is
longer and a negative skew (left-skewed) if the left
(lower value) tail is longer (confusing the two is a
common error)
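To make the sign convention concrete, here is a sketch that computes a moment-based skewness coefficient (third central moment divided by the second central moment to the power 1.5). The lecture does not specify a formula, so this particular coefficient, the `skewness` helper and the three small data sets are all assumptions for illustration.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical data sets.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]    # long tail towards high values
left_tailed = [-v for v in right_tailed]    # mirror image: tail towards low values
symmetric = [1, 2, 3, 4, 5]

skew_right = skewness(right_tailed)
skew_left = skewness(left_tailed)
skew_sym = skewness(symmetric)
print(round(skew_right, 2), round(skew_left, 2), round(skew_sym, 2))
```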
Kurtosis
• A high kurtosis distribution has a sharper "peak"
and fatter "tails", while a low kurtosis distribution
has a more rounded peak with wider "shoulders".
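A rough numerical illustration, assuming the common "excess kurtosis" convention (fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores 0). The lecture does not give a formula; the `excess_kurtosis` helper and both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """m4 / m2 ** 2 - 3, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


# Hypothetical data: most values near the centre plus two far-out values...
heavy_tailed = [-10, -1, 0, 0, 0, 0, 1, 10]
# ...versus values spread evenly with no pronounced peak or tails.
flat = [-2, -1, 0, 1, 2]

k_heavy = excess_kurtosis(heavy_tailed)
k_flat = excess_kurtosis(flat)
print(round(k_heavy, 2), round(k_flat, 2))
```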
Frequency distributions
• Symmetrical distribution
– Approximately equal numbers of observations above and
below the middle
• Skewed distribution
– One side is more spread out than the other, like a tail
– Direction of the skew
• Positive or negative (right or left)
• Side with the fewer scores
• Side that looks like a tail
Symmetrical vs. skewed distributions
BIVARIATE
• Correlation
– linear pattern of relationship between one variable (x) and
another variable (y) –an association between two variables
• Relative position of one variable correlates with relative
distribution of another variable
• Warning:
– No proof of causality
– Cannot assume x causes y
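For reference, and assuming the correlation coefficient meant here is Pearson's product-moment coefficient (the lecture does not name it), the strength of a linear relationship between paired observations $(x_i, y_i)$ is commonly written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

Values near +1 or -1 indicate a strong linear association, and values near 0 indicate little or no linear association. In no case does r say anything about causality.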
Scatterplots and correlation

• A scatter plot (or scatter diagram) is used to show
the relationship between two variables
– Scatter diagram plots pairs of bivariate observations (x, y)
on the X-Y plane
– Y is called the dependent variable
– X is called the independent variable
• Correlation analysis is used to measure strength of
the association (linear relationship) between two
variables
– Only concerned with strength of the
relationship
– No causal effect is implied
Types of correlation
• Positive correlation
– High values of X tend to be associated with high values of Y.
– As X increases, Y increases
• Negative correlation
– High values of X tend to be associated with low values of Y.
– As X increases, Y decreases
• No correlation
– No consistent tendency for values of Y to increase or
decrease as X increases
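The three cases can be sketched in Python with a hand-written Pearson correlation coefficient. This assumes Pearson's r is the coefficient intended; the `pearson_r` helper and all four lists are hypothetical illustrations, not lecture data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 4, 6]     # tends to increase with x
y_falling = [9, 7, 6, 4, 2]    # tends to decrease as x increases
y_neither = [4, 6, 5, 6, 4]    # no consistent linear tendency

r_positive = pearson_r(x, y_rising)
r_negative = pearson_r(x, y_falling)
r_none = pearson_r(x, y_neither)
print(round(r_positive, 2), round(r_negative, 2), round(r_none, 2))
```

Note that r only captures linear association: `y_neither` rises and then falls, so a pattern exists even though r is zero.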
Applications
Individual vs Group (Neighbourhood)
What type of relationship?
[Figure: scatterplot "Video Games and Alcohol Consumption". X-axis: Average Hours of Video Games Per Week (0 to 25); Y-axis: Average Number of Alcoholic Drinks Per Week (0 to 20)]
What type of relationship?
[Figure: scatterplot "Video Games and Test Score". X-axis: Average Hours of Video Games Per Week (0 to 20); Y-axis: Exam Score (0 to 100)]
Each point represents something or some PLACE!!
Practical 1
Date: Thursday 26th August 1130-1430 (Posted on Thursday)
Location: Remotely or on-campus (Brown & Orange & Red IT labs)
Assistance: Thursdays 1130-1420 and Thursdays 14:00-16:00 by appointment via Doodle
Due: Thursday 9th September at 1130 (upload on ClickUp)
Task: Sampling exercise and gaining familiarity with GeoDa and ArcPro
Software: Excel, GeoDa and ArcPro