If we asked all of the students, we would be conducting a census, but this would take too long.
Instead, we ask a random sample (e.g. every 10th name from an alphabetical list)
and get the following data from 20 students:
60, 20, 40, 80, 30, 70, 100, 50, 50, 100, 60, 40, 50, 60, 30, 60, 70, 80, 40, 110
Because the sample was randomly selected from the population, we hope that
the sample is representative of the population. If that is the case, the statistics
calculated for the sample will approximate the parameters for the population.
We have described the data set using:
Minimum, maximum and range
Q1, Q3 and the interquartile range (IQR)
Mean, median and mode
[Figure: frequency chart of the 20 sample values (vertical axis: Frequency)]
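The summary measures listed above can be computed directly. The sketch below is a minimal Python illustration (not the course's own tool) that uses the standard-library `statistics` module on the 20 ordered values as they are listed on the later slides; `multimode` assumes Python 3.8 or later.

```python
from statistics import mean, median, multimode

# The 20 ordered sample values as listed on the slides (student concert expenditures)
data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

minimum = min(data)
maximum = max(data)
data_range = maximum - minimum   # range = maximum - minimum

sample_mean = mean(data)         # sum of the values divided by n
sample_median = median(data)     # n is even, so this averages the 10th and 11th ordered values
modes = multimode(data)          # every value that occurs most often

print(minimum, maximum, data_range)        # 20 110 90
print(sample_mean, sample_median, modes)   # 60 60.0 [60]
```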
$$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
$$CV = \frac{s}{\bar{x}}\,(100\%)$$
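A minimal sketch of the three formulas above, written out by hand so that each term maps onto the notation (MAD divides by n, s divides by n − 1). It assumes the ordered data as listed on the slides.

```python
import math

# The 20 ordered sample values as listed on the slides
data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

n = len(data)
x_bar = sum(data) / n   # sample mean

# Mean absolute deviation: average distance of each value from the mean
mad = sum(abs(x - x_bar) for x in data) / n

# Sample standard deviation: sum of squared deviations over n - 1, then the square root
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

# Coefficient of variation: s expressed as a percentage of the mean
cv = s / x_bar * 100

print(f"MAD = {mad:.2f}, s = {s:.2f}, CV = {cv:.1f}%")   # MAD = 19.00, s = 24.71, CV = 41.2%
```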
[Figure: box plot with min., Q1, Q2 and Q3 labelled]
The box spans Q1 to Q3. A line within the box marks the median (Q2).
From each end of the box, lines (called whiskers) extend to the minimum and maximum values (not including outliers).
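The slides do not say how outliers are identified. Assuming the common convention of fences at 1.5 × IQR beyond the quartiles (an assumption, not something stated here), and taking Q1 = 40 and Q3 = 75 as calculated for this data set on the percentile slides, the whisker ends can be found like this:

```python
# The 20 ordered sample values as listed on the slides
data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

q1, q3 = 40, 75   # quartiles of this sample, as calculated on the slides
iqr = q3 - q1

# ASSUMPTION: outliers are values beyond the 1.5 * IQR fences (convention not given on the slides)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]

# Whiskers run from the box out to the most extreme values that are not outliers
whisker_low, whisker_high = min(inside), max(inside)
print("whiskers:", whisker_low, "to", whisker_high, "| outliers:", outliers)
```

Under this convention no value in the sample is an outlier, so the whiskers reach the minimum (20) and maximum (110).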
Info. obtained from box plots (2)
Signs of positively (right) skewed distributions:
Median lies towards the left of the box
The right whisker is longer than the left
Info. obtained from box plots (3)
Signs of negatively (left) skewed distributions:
Median lies towards the right of the box
The left whisker is longer than the right
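The two signs of skewness can be checked mechanically from a five-number summary. In this sketch the helper `skew_signs` is hypothetical, and "towards the left/right of the box" is interpreted as the median lying left/right of the box's midpoint; the summary (20, 40, 60, 75, 110) is the minimum, Q1, median, Q3 and maximum of the sample data.

```python
def skew_signs(minimum, q1, median, q3, maximum):
    """Return (side of the box the median lies on, longer whisker) for a box plot.

    Hypothetical helper: the median is compared with the box midpoint, and the
    whisker lengths are q1 - minimum (left) and maximum - q3 (right).
    """
    midpoint = (q1 + q3) / 2
    if median < midpoint:
        median_side = "left"      # a sign of positive (right) skew
    elif median > midpoint:
        median_side = "right"     # a sign of negative (left) skew
    else:
        median_side = "centre"

    left_whisker, right_whisker = q1 - minimum, maximum - q3
    if right_whisker > left_whisker:
        longer_whisker = "right"  # a sign of positive (right) skew
    elif left_whisker > right_whisker:
        longer_whisker = "left"   # a sign of negative (left) skew
    else:
        longer_whisker = "equal"
    return median_side, longer_whisker

# Five-number summary of the sample: min, Q1, median, Q3, max
print(skew_signs(20, 40, 60, 75, 110))   # ('right', 'right')
```

For this sample the median sits slightly right of the box's midpoint (57.5) while the right whisker is the longer one, so the two signs point in different directions.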
Quartiles: special percentiles
We have seen how to divide our ordered set of values into quartiles.
The three quartiles split the ordered values into four groups, each containing ¼ (25%) of the observations.
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
Q1 = 40, Q2 (median) = 60, Q3 = 75
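Since Q1, Q2 and Q3 are P25, P50 and P75, they can be found with the position rule used on the percentile slides (c = n × p / 100; round up when c is not a whole number, otherwise average positions c and c + 1). The function name `percentile` is our own, and the sketch assumes an integer p strictly between 0 and 100.

```python
# The 20 ordered sample values as listed on the slides
data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

def percentile(values, p):
    """P_p by the slides' position rule (hypothetical helper, integer 0 < p < 100)."""
    values = sorted(values)
    whole, remainder = divmod(len(values) * p, 100)   # c = whole + remainder / 100
    if remainder:
        return values[whole]                          # round c up: 1-based position whole + 1
    return (values[whole - 1] + values[whole]) / 2    # average positions c and c + 1

q1, q2, q3 = percentile(data, 25), percentile(data, 50), percentile(data, 75)
print(q1, q2, q3, "IQR =", q3 - q1)   # 40.0 60.0 75.0 IQR = 35.0
```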
We might instead start with a given data value and want to establish which percentile it represents. Consider the value 80: 15 of the 20 ordered values lie below it, so
$$\text{Percentile} = \frac{15 + 0.5}{20}\,(100\%) = 77.5\%$$
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
The value 80 is therefore P77.5.
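The calculation for 80 can be written as a small function. It assumes, as the worked numbers suggest, that the 15 counts the values strictly below x and that 0.5 is added to that count before dividing by n; `percentile_rank` is a hypothetical name.

```python
# The 20 ordered sample values as listed on the slides
data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

def percentile_rank(values, x):
    """Percentile of x: (number of values below x + 0.5) / n, as a percentage."""
    below = sum(1 for v in values if v < x)
    # Multiply before dividing so that 15.5 * 100 / 20 stays exact in floating point
    return (below + 0.5) * 100 / len(values)

print(percentile_rank(data, 80))   # 77.5
```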
Calculating Percentiles (2)
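To find P12, first calculate the position c of the percentile in the ordered data:
$$c = \frac{n \times p}{100} = \frac{20 \times 12}{100} = 2.4$$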
Because the position (2.4) is not an integer, we round up to the next whole number (3). The third value in the ordered data set is the 12th percentile:
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
P12 = 30
Calculating Percentiles (3)
To find P10:
$$c = \frac{n \times p}{100} = \frac{20 \times 10}{100} = 2$$
When the position c is an integer, such as 2, the required percentile is the average of the values in positions c and c + 1 (here, the 2nd and 3rd values):
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
P10 = (30 + 30) / 2 = 30
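Both cases of the position rule fit in one helper. This is a sketch, assuming an integer p with 0 < p < 100; `divmod` splits n × p into whole hundreds and a remainder, so the "is c an integer?" test uses exact integer arithmetic rather than comparing floats. The name `percentile` is our own.

```python
# The 20 ordered sample values as listed on the slides
data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

def percentile(values, p):
    """P_p by the slides' position rule, assuming integer p with 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    values = sorted(values)
    whole, remainder = divmod(len(values) * p, 100)   # c = n*p/100 = whole + remainder/100
    if remainder:
        # c is not an integer: round up to position whole + 1 (0-based index whole)
        return values[whole]
    # c is an integer: average the values in positions c and c + 1 (0-based c - 1 and c)
    return (values[whole - 1] + values[whole]) / 2

print(percentile(data, 12))   # 30
print(percentile(data, 10))   # 30.0
```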
Just as P25, P50 and P75 are special percentiles (QUARTILES), so P10, P20, P30, …, P90 are special percentiles (DECILES).
P50 = D5 = Q2 = median
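Applying the same position rule (the helper is repeated so this block runs on its own; its name is our own), the nine deciles of the sample are:

```python
# The 20 ordered sample values as listed on the slides
data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

def percentile(values, p):
    """P_p by the slides' position rule, assuming integer p with 0 < p < 100."""
    values = sorted(values)
    whole, remainder = divmod(len(values) * p, 100)
    if remainder:
        return values[whole]                          # round the position up
    return (values[whole - 1] + values[whole]) / 2    # average positions c and c + 1

# D1 = P10, D2 = P20, ..., D9 = P90
deciles = [percentile(data, 10 * k) for k in range(1, 10)]
print(deciles)   # [30.0, 40.0, 45.0, 50.0, 60.0, 60.0, 70.0, 80.0, 100.0]
```

With n = 20, every decile position c = 2, 4, …, 18 is a whole number, so each decile is the average of two neighbouring values, and D5 = 60 matches the median.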
z score or Standardized Score
Another way to describe the position of a value in our data set (student concert expenditures) is to give it a z score. For sample data:
$$z = \frac{x - \bar{x}}{s}$$
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
z score calculation
We already know the sample mean (x̄ = 60) and sample standard deviation (s ≈ 24.71) for our data set.
What is the z score for the value 100?
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
$$z = \frac{100 - 60}{24.71} \approx 1.62$$
The value 100 lies about 1.6 standard deviations above the mean (we know it is above the mean because the z score is positive).
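A minimal sketch of the calculation for 100: it computes the sample mean and the sample standard deviation (dividing by n − 1) from the ordered data as listed on the slides, then applies z = (x − x̄)/s. The helper name `z_score` is hypothetical.

```python
import math

# The 20 ordered sample values as listed on the slides
data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

n = len(data)
x_bar = sum(data) / n                                          # sample mean
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))  # sample standard deviation

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(100, x_bar, s)
print(f"mean = {x_bar:.1f}, s = {s:.2f}, z = {z:.2f}")   # mean = 60.0, s = 24.71, z = 1.62
```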
z score visualisation
The z score tells us the position of a data value relative to the mean: how many standard deviations the value lies above (+) or below (−) the mean.
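To see every value's position relative to the mean at once, the sketch below computes a z score for each distinct value in the sample, using the ordered data as listed on the slides and the sample standard deviation (dividing by n − 1).

```python
import math

# The 20 ordered sample values as listed on the slides
data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

n = len(data)
x_bar = sum(data) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

# z score for each distinct value: the sign gives the side of the mean,
# the size gives the distance in standard deviations
z_scores = {x: (x - x_bar) / s for x in sorted(set(data))}

for x, z in z_scores.items():
    side = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"{x:>3}: z = {z:+.2f} ({side} the mean)")
```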
Practice questions