
25/02/2020

Lecture 3
Measures of central tendency and spread, sampling variation & standard error

Lecture 3 – Introduction

• Measures of central tendency
– Mean, median, mode
• Measures of spread
– Range, inter-quartile range
– Variance, standard deviation
• Sampling variation
• Standard error

[See Kirkwood & Sterne: Chapter 4 – Means, standard deviations and standard errors]


Summarising distributions – Central tendency

[Figure: distribution of a numerical variable, with its central tendency marked]

Central tendency: Mean

• Mean in words: ‘Average’ value
– Sum of the values divided by the number of observations
• Mean in numbers:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

– $\bar{x}$ is read ‘x bar’
– $x_i$ are the values of the variable (i.e., the observations)
– $\Sigma$ (i.e., ‘sigma’) means ‘the sum of’
– $n$ is the number of observations


Example: Calculating the Sample Mean (n = 8)

Plasma volume (litres)
x1 = 2.75
x2 = 2.86
x3 = 3.37
x4 = 2.76
x5 = 2.62
x6 = 3.49
x7 = 3.05
x8 = 3.12

$$\sum_{i=1}^{8} x_i = 2.75 + 2.86 + 3.37 + 2.76 + 2.62 + 3.49 + 3.05 + 3.12 = 24.02$$

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{24.02}{8} = 3.00 \text{ litres}$$

Source: Kirkwood & Sterne, Example 4.1, pg 34

Central tendency: Median

• Median in words: ‘middle observation’
– Half of the observations lie above the median and half below
• Median in numbers:

$$\text{Median} = \frac{(n + 1)}{2}\text{th value of the ordered observations}$$
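The sample mean in the plasma volume example can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic on the slide; the variable names are illustrative and not part of the lecture material.

```python
# Plasma volumes (litres) from the worked example (n = 8).
plasma_volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

n = len(plasma_volumes)      # number of observations
total = sum(plasma_volumes)  # sum of the x_i
mean = total / n             # x bar = sum / n

print(f"n = {n}, sum = {total:.2f}, mean = {mean:.2f} litres")
```

Rounded to two decimal places, this reproduces the sum of 24.02 and the mean of 3.00 litres shown in the example.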


Example: Calculating the Sample Median (n = 8)

First rearrange the values in order:

Plasma volume (litres)    Plasma volume (litres), in order
x1 = 2.75                 x1 = 2.62
x2 = 2.86                 x2 = 2.75
x3 = 3.37                 x3 = 2.76
x4 = 2.76                 x4 = 2.86
x5 = 2.62                 x5 = 3.05
x6 = 3.49                 x6 = 3.12
x7 = 3.05                 x7 = 3.37
x8 = 3.12                 x8 = 3.49

Median = (n + 1)/2 th value of the ordered observations
= value of 4.5th ordered observation
= average of 2.86 (4th) and 3.05 (5th)
= 2.96 litres

Source: Kirkwood & Sterne, Example 4.1, pg 34
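The (n + 1)/2 rule used in the median example can be sketched in Python as follows. The helper name `median_by_position` is hypothetical; the standard library's `statistics.median` is included only as a cross-check.

```python
import statistics

# Plasma volumes (litres) from the worked example (n = 8).
plasma_volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]


def median_by_position(values):
    """Median as the (n + 1)/2-th value of the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: (n + 1)/2 is a whole number, so take that observation.
        return ordered[(n + 1) // 2 - 1]
    # Even n: (n + 1)/2 falls halfway between two observations,
    # so average the n/2-th and (n/2 + 1)-th ordered values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


median = median_by_position(plasma_volumes)
print(f"median by position = {median:.3f} litres")
print(f"statistics.median  = {statistics.median(plasma_volumes):.3f} litres")
```

For these data the position is 4.5, so the result is the average of the 4th and 5th ordered values (2.86 and 3.05), which the slide reports as 2.96 litres after rounding.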


Central tendency: Mode

• Mode in words:
– Value that occurs most often
• Not unique
– e.g., mode of (2, 1, 3, 1, 2, 3, 4, 8, 2, 2, 5, 3, 2) = 2
– e.g., mode of (1, 1, 1, 2, 2, 2, 3, 3, 4) = ??
• Mode in numbers:
– Derived by observation

Which measure to use?

• Mean
– Useful for symmetric distributions
– Takes into account all observations
– Affected by a few extremely high or low values
• Median
– Useful for skewed distributions
• Mode
– Insensitive to outliers, except in very small data sets
– Rarely used
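The point that the mode is not unique can be illustrated with the two lists from the Mode slide. This sketch assumes Python 3.8 or later, where `statistics.multimode` returns every value tied for the highest count.

```python
from statistics import multimode  # available from Python 3.8

# The two example lists from the Mode slide.
first = [2, 1, 3, 1, 2, 3, 4, 8, 2, 2, 5, 3, 2]
second = [1, 1, 1, 2, 2, 2, 3, 3, 4]

# multimode returns a list of all values that occur most often.
print("modes of first list: ", multimode(first))
print("modes of second list:", multimode(second))
```

In the second list, 1 and 2 each occur three times, so there are two modes rather than one.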


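Which measure to use?

[Figure: three distributions with the mean and median marked. Bell shaped distribution: Median = Mean. Other two panels: Median < Mean and Median > Mean.]

Summarising distributions – Spread

[Figure: distribution of a numerical variable, with its spread marked]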


Range and inter-quartile range

• Range
– Range = Highest value – Lowest value
– Based on only two observations

[Figure: three distributions, each with its range marked]

Range

[Figure: plot of Total Body Surface Area (m²), axis from 0.7 to 1, with the highest and lowest values marked]

Highest value = 0.98 m²
Lowest value = 0.74 m²
Range = 0.98 – 0.74 = 0.24 m²

Source: Kidskin Study, Perth, WA, Australia


Inter-quartile range

• Inter-quartile range (IQR)
– IQR = upper quartile – lower quartile
– Spread of the middle 50% of the values

[Figure: three distributions (Figure 1, Figure 2, Figure 3), each with its IQR marked]

Inter-quartile range (IQR)

[Figure: plot of Total Body Surface Area (m²), axis from 0.7 to 1, with the upper and lower quartiles marked]

Upper quartile = 0.89
Lower quartile = 0.80
IQR = 0.89 – 0.80 = 0.09 m²

Source: Kidskin Study, Perth, WA, Australia
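The range and IQR definitions above can be sketched in Python. The Kidskin observations are not reproduced in these slides, so this illustration reuses the plasma volume data from Kirkwood & Sterne Example 4.1 instead. Quartiles can be computed under several conventions; the default method of `statistics.quantiles` is used here as an assumption, and other software may give slightly different quartiles for a sample this small.

```python
import statistics

# Plasma volumes (litres), reused here purely for illustration.
plasma_volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

# Range = highest value - lowest value (uses only two observations).
value_range = max(plasma_volumes) - min(plasma_volumes)

# With n=4, statistics.quantiles returns the three quartile cut points.
# Its default "exclusive" method is one of several conventions.
lower_q, middle_q, upper_q = statistics.quantiles(plasma_volumes, n=4)

# IQR = upper quartile - lower quartile (spread of the middle 50%).
iqr = upper_q - lower_q

print(f"range = {value_range:.2f} litres")
print(f"lower quartile = {lower_q:.4f}, upper quartile = {upper_q:.4f}")
print(f"IQR = {iqr:.3f} litres")
```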


Variance

• Spread of the distribution around the mean

[Figure: three distributions. Figure 1: Mean = 0, Variance = 0.25. Figure 2: Mean = 0, Variance = 1.00. Figure 3: Mean = 0, Variance = 9.00.]

Variance

• Sample variance is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{(n - 1)}$$

• Difference of individual observations from mean (deviations) = $(x_i - \bar{x})$
• Squared deviations = $(x_i - \bar{x})^2$
• Sum of squared deviations = $\sum_{i=1}^{n} (x_i - \bar{x})^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2$
• Number of independent deviations = $(n - 1)$


Example: Calculating the sample variance

Compute sample mean: $\bar{x} = 3.00$

Compute deviations around the mean $(x_i - \bar{x})$:

Plasma volume (litres; n=8)    $(x_i - \bar{x})$
2.75                           2.75 – 3 = -0.25
2.86                           2.86 – 3 = -0.14
3.37                           3.37 – 3 = 0.37
2.76                           2.76 – 3 = -0.24
2.62                           2.62 – 3 = -0.38
3.49                           3.49 – 3 = 0.49
3.05                           3.05 – 3 = 0.05
3.12                           3.12 – 3 = 0.12

Source: Kirkwood & Sterne, Example 4.1, pg 34


Example: Calculating the sample variance

Compute squared deviations around the mean:

Plasma volume (litres; n=8)    $(x_i - \bar{x})$    $(x_i - \bar{x})^2$
2.75                           -0.25                (-0.25)² = 0.0625
2.86                           -0.14                (-0.14)² = 0.0196
3.37                           0.37                 (0.37)² = 0.1369
2.76                           -0.24                (-0.24)² = 0.0576
2.62                           -0.38                (-0.38)² = 0.1444
3.49                           0.49                 (0.49)² = 0.2401
3.05                           0.05                 (0.05)² = 0.0025
3.12                           0.12                 (0.12)² = 0.0144
                                                    Sum = 0.678

$$\sum_{i=1}^{8} (x_i - \bar{x})^2 = 0.678$$

Source: Kirkwood & Sterne, Example 4.1, pg 34


Example: Calculating the sample variance

Sum of the squared deviations for the plasma volume data (litres; n=8):

$$\sum_{i=1}^{8} (x_i - \bar{x})^2 = 0.678$$

$$s^2 = \frac{0.678}{7} = 0.097 \text{ L}^2$$

Source: Kirkwood & Sterne, Example 4.1, pg 34

Standard deviation

• Variance is measured in the square of units used for observations
• Standard deviation is the square root of the variance
– More convenient for many purposes since it is on the same scale as the original measurements
• Sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{(n - 1)}}$$
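The steps of the sample variance example (deviations, squared deviations, their sum, division by n − 1) can be sketched in Python. One assumption to note: the slides round the mean to 3.00 before computing deviations, whereas this sketch uses the unrounded mean (3.0025), so intermediate values differ slightly in later decimal places. `statistics.variance` is included as a cross-check, since it also divides by n − 1.

```python
import statistics

# Plasma volumes (litres) from the worked example (n = 8).
plasma_volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

n = len(plasma_volumes)
mean = sum(plasma_volumes) / n

# Deviations around the mean, then their squares.
deviations = [x - mean for x in plasma_volumes]
squared_deviations = [d ** 2 for d in deviations]

# Sum of squared deviations divided by the number of
# independent deviations (n - 1) gives the sample variance.
sum_of_squares = sum(squared_deviations)
sample_variance = sum_of_squares / (n - 1)

print(f"sum of squared deviations = {sum_of_squares:.3f}")
print(f"sample variance = {sample_variance:.3f} litres squared")
print(f"statistics.variance = {statistics.variance(plasma_volumes):.3f}")
```

To three decimal places this agrees with the slide values of 0.678 and 0.097 L².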


Example: Calculating the sample standard deviation

For the plasma volume data (litres; n=8):

$$\sum_{i=1}^{8} (x_i - \bar{x})^2 = 0.678$$

$$s^2 = \frac{0.678}{7} = 0.097 \text{ L}^2$$

$$s = \sqrt{0.097} \approx 0.311 \text{ L}$$

Source: Kirkwood & Sterne, Example 4.1, pg 34

Interpretation of the standard deviation

Usually about 70% of observations lie within one standard deviation of the sample mean

Usually about 95% of observations lie within two standard deviations of the sample mean

[Figure: distribution with the horizontal axis marked at $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$]
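The sample standard deviation in the plasma volume example is the square root of the sample variance. A minimal Python sketch, using the standard library's `statistics.variance` and `statistics.stdev` (both use the n − 1 divisor):

```python
import math
import statistics

# Plasma volumes (litres) from the worked example (n = 8).
plasma_volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

sample_variance = statistics.variance(plasma_volumes)  # divides by n - 1
sample_sd = math.sqrt(sample_variance)                 # back on the litre scale

print(f"s squared = {sample_variance:.3f} litres squared")
print(f"s = {sample_sd:.3f} litres")
```

To three decimal places this gives s = 0.311 L, matching the slide.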


Interpretation of the standard deviation

• Example
– Sample mean $\bar{x}$ = 80 kg
– Sample standard deviation $s$ = 5.0 kg
• ~70% of observations are between 75 kg and 85 kg
• ~95% of observations are between 70 kg and 90 kg

Which measure to use?

• Bell shaped (symmetric) distributions
– Mean, standard deviation
• Skewed (positive or negative) distributions
– Median, IQR
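The "about 70%" and "about 95%" rules of thumb for the standard deviation can be checked by simulation. The sketch below uses hypothetical data: 10,000 weights drawn from a normal distribution with mean 80 kg and standard deviation 5 kg, chosen only to mirror the example above. It assumes a bell-shaped distribution, so the proportions need not hold for skewed data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: 10,000 weights (kg) from a normal distribution.
weights = [random.gauss(80, 5) for _ in range(10_000)]

mean = statistics.mean(weights)
sd = statistics.stdev(weights)


def fraction_within(values, centre, half_width):
    """Proportion of values lying within centre +/- half_width."""
    inside = sum(1 for v in values if abs(v - centre) <= half_width)
    return inside / len(values)


within_one_sd = fraction_within(weights, mean, sd)
within_two_sd = fraction_within(weights, mean, 2 * sd)

print(f"within 1 s.d.: {within_one_sd:.1%}")
print(f"within 2 s.d.: {within_two_sd:.1%}")
```

For normally distributed data the proportions are close to 68% and 95%, which the slide summarises as "about 70%" and "about 95%".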


Sampling variation and standard error

We infer the characteristics of the population (e.g. mean, spread) from the characteristics of the sample

[Figure: POPULATION and SAMPLE, linked by INFERENCE]

POPULATION
Population mean = $\mu$
Population standard deviation = $\sigma$

SAMPLE
Sample mean = $\bar{x}$
Sample standard deviation = $s$

Sampling distribution

• Mean of a sample is unlikely to be exactly equal to the population mean
• Different samples would give different sample means


Sampling distribution

• If we collected many samples their sample means would form a frequency distribution of their own
• This theoretical distribution of sample means is the SAMPLING DISTRIBUTION
• Variation from one sample mean to another is known as SAMPLING VARIATION

Sampling variation and standard error

We infer the characteristics of the population (e.g. mean, spread) from the characteristics of the sample

[Figure: POPULATION (population mean = $\mu$) and SAMPLES 1, 2, 3, 4, 5, ..., 30, whose sample means $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_{30}$ form the sampling distribution; linked by INFERENCE]


Characteristics of the Sampling distribution

• The mean of the sampling distribution (over all possible samples) is the same as the population mean
• The standard deviation (s.d.) of the sampling distribution equals $\sigma/\sqrt{n}$ (known as the standard error)
– (i.e. the s.d. of the sample mean over repeated samples of the same size from the same population)

Standard error of the sample mean

• The standard error ($\sigma/\sqrt{n}$) of the sample mean measures how precisely a sample mean estimates the population mean
• The size of the standard error of the sample mean is determined both by variation in the population ($\sigma$) and by size of sample ($n$)
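The two characteristics of the sampling distribution stated above can be illustrated by simulation. Everything in this sketch is an assumption made for illustration: a normally distributed population with mean 3.0 and standard deviation 0.3, samples of size 8, and 5,000 repeated samples. None of these values come from the lecture.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

MU = 3.0            # hypothetical population mean
SIGMA = 0.3         # hypothetical population standard deviation
N = 8               # size of each sample
NUM_SAMPLES = 5000  # number of repeated samples

# Draw many samples of size N and record each sample mean.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = SIGMA / math.sqrt(N)

print(f"mean of sample means = {mean_of_means:.3f} (population mean {MU})")
print(f"s.d. of sample means = {sd_of_means:.3f}")
print(f"sigma / sqrt(n)      = {theoretical_se:.3f}")
```

The mean of the sample means should sit close to the population mean, and their standard deviation close to σ/√n. Increasing `N` shrinks the spread of the sample means, which is the sense in which a larger sample estimates the population mean more precisely.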


Standard error of the sample mean

• Usually the population standard deviation, $\sigma$, is unknown
• We can use the sample standard deviation, $s$, to calculate the standard error:

$$\text{s.e.}(\bar{x}) = s/\sqrt{n}$$

Standard deviation versus Standard error

• The standard deviation gives a measure of the variability (spread) of the individual values of a variable
• The standard error gives an estimate of the variability of sample means that would arise from repetitions of the same study of the same sample size drawn from the same population.
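As an illustration of the formula s.e.(x̄) = s/√n, the sketch below applies it to the plasma volume data from Kirkwood & Sterne Example 4.1 used earlier in the lecture. The slides themselves do not compute a standard error for these data, so the resulting figure is this sketch's own calculation.

```python
import math
import statistics

# Plasma volumes (litres) from Kirkwood & Sterne, Example 4.1 (n = 8).
plasma_volumes = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]

n = len(plasma_volumes)
s = statistics.stdev(plasma_volumes)  # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n)     # s.e. of the sample mean

print(f"s = {s:.3f} litres, n = {n}")
print(f"s.e. of the mean = {standard_error:.3f} litres")
```

The standard deviation (about 0.311 L) describes the spread of individual plasma volumes, while the standard error (about 0.110 L) describes how much the sample mean would be expected to vary across repeated samples of eight.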


Standard deviation versus Standard error

• The standard error gives us an idea of how confident we are in our estimate of the true but unknown population mean, based on the observed sample mean and the variability of observations within the sample.

Are the following statements TRUE or FALSE?

• Sample median is always greater than the sample mean
• The standard deviation measures the variability of the observations.
• If the data were symmetrically distributed, we would expect about 2.5% of the observations to be less than the mean minus 2 standard deviations


Summary

• Understand two ways to summarise distribution of numerical variables:
– Measures of central tendency
– Measures of spread (variation)
• Understand sampling variation and standard error
