MS1 Lecture 3

Math and Statistics 1
Unit 3: Describing positions in data sets

Instructors:
David Leonard, Ph.D. (david.leonard@modul.ac.at)
Daniel Dan, Ph.D. (daniel.dan@modul.ac.at)
Joanne Yu, MSc. (joanne.yu@modul.ac.at)
In the last session, we:
…used descriptive statistics to describe data sets in terms of:
• how large the values are (measures of location/central tendency)
    • mean, median, mode
• how spread out the values are (measures of dispersion/variability)
    • range, interquartile range, mean absolute deviation, variance, standard deviation, coefficient of variation
…noted that different notation is used to distinguish whether we
are using data coming from a sample or the whole population
…saw that the formulas for sample statistics are often different to
those for population parameters (so that we obtain an “unbiased
estimate” when extrapolating from samples to populations)
We continue using the same sample data today to explore how to describe particular positions within data sets.
Example: concert tickets
We want to know the average amount that female students at MU spent on
concert tickets last year.
The population: all those about whom we want to make a general statement…
all female students at MU
If we asked them all, we would be conducting a census, but this would take too long.
Instead, we ask a random sample (e.g. every 10th name from an alphabetical list)
and get the following data from 20 students:
60, 20, 40, 80, 30, 70, 100, 50, 50, 100, 60, 40, 50, 60, 30, 60, 70, 80, 40, 110
Because the sample was randomly selected from the population, we hope that
the sample is representative of the population. If that is the case, the statistics
calculated for the sample will approximate the parameters for the population.
We have described the data set using:
• minimum, maximum, and range
• mean, median, and mode
• Q1, Q3, and the IQR
[Figure: frequency histogram of the sample with these measures marked]
Class ($):      20  30  40  50  60  70  80  90  100  110
Frequency (ƒ):   1   2   3   3   4   2   2   0    2    1
$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
$CV = \frac{s}{\bar{x}} (100\%)$
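As a quick check of the measures recapped above, the following Python sketch recomputes them for the concert-ticket sample. It is an illustration rather than part of the original slides: the variable names are assumptions, and the values are read off the frequency table.

```python
import statistics

# Concert-ticket sample (n = 20), read off the frequency table.
data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

n = len(data)
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
value_range = max(data) - min(data)

# Mean absolute deviation: average distance from the mean (divides by n).
mad = sum(abs(x - mean) for x in data) / n

# Sample standard deviation: statistics.stdev divides by n - 1.
s = statistics.stdev(data)

# Coefficient of variation, as a percentage of the mean.
cv = s / mean * 100

print(f"mean={mean}, median={median}, mode={mode}, range={value_range}")
print(f"MAD={mad}, s={s:.2f}, CV={cv:.1f}%")
```

For a whole population, `statistics.pstdev` (which divides by n) would take the place of `statistics.stdev`.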
Next we describe other positions in the data set, starting with a new visualization: the box plot.
Box plots
A box plot is another way of graphically displaying a distribution. It provides a
summary of the distribution by using five numbers: the minimum value, Q1, Q2,
Q3, and the maximum value, plus any outliers.
On a number line, the box extends from Q1 to Q3
(50% of the values will fall within this range).
A line within the box marks the median (Q2).
From each end of the box, lines (called whiskers) extend
to the min. and max. values (not including outliers).
Beyond the ends of the whiskers, outliers are marked as dots/asterisks.
[Figure: box plot with the min., Q1, Q2, Q3, max., and an outlier labelled]

Info. obtained from box plots (1)
Signs of symmetrical distributions:
Median is close to the center of the box
Whiskers are approximately the same length
Signs of positively (right) skewed distributions:
Median lies towards the left of the box
The right whisker is longer than the left
Signs of negatively (left) skewed distributions:
Median lies towards the right of the box
The left whisker is longer than the right
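The signs listed above compare four lengths: the two whiskers and the two parts of the box on either side of the median. The sketch below computes those lengths from a five-number summary. The function name `box_plot_shape` is hypothetical, and the example assumes that no value of the concert-ticket sample is treated as an outlier, so the whiskers reach the minimum (20) and maximum (110); Q1 = 40, Q2 = 60, and Q3 = 75 are the quartiles of that sample.

```python
def box_plot_shape(minimum, q1, median, q3, maximum):
    """Return the four lengths that the skewness signs compare."""
    return {
        "left_whisker": q1 - minimum,
        "right_whisker": maximum - q3,
        "box_left_of_median": median - q1,
        "box_right_of_median": q3 - median,
    }

# Five-number summary of the concert-ticket sample
# (assumption: no outliers, so whiskers end at min. and max.).
shape = box_plot_shape(20, 40, 60, 75, 110)
for name, length in shape.items():
    print(name, length)
```

For this sample the right whisker (35) is longer than the left (20), while the median sits slightly right of the box's center, so the two signs need not point the same way in real data.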
Quartiles: special Percentiles
We have seen how to divide our ordered set of values into quartiles.
Each quartile cuts off ¼ (25%) of the total observations.
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
Q1 = P25 = 40, Q2 = P50 = 60 (the median), Q3 = P75 = 75
We can refer to Q1 as the 25th percentile (P25) because it cuts off 25% of the values on the left of the distribution.
We can also calculate other percentiles within the data set.
Calculating Percentiles (1)
We might start with a given data value, and want to establish what
percentile it represents: let’s consider the value 80
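20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
There are 15 values below 80 in the ordered data set of 20 values, so:
$\frac{15 + 0.5}{20} = 0.775 = 77.5\%$
The value 80 is therefore the 77.5th percentile (P77.5).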
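The calculation for the value 80 can be sketched in Python as follows. The function name `percentile_rank` is an assumption made for this illustration; it counts the values strictly below the given value, adds 0.5, and divides by the sample size.

```python
def percentile_rank(data, x):
    """Percentile of value x: (values below x + 0.5) / n, as a percentage."""
    below = sum(1 for v in data if v < x)
    return 100 * (below + 0.5) / len(data)

data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

print(percentile_rank(data, 80))  # 77.5
```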
Alternatively, we might want to find the value in a data set that corresponds to a certain percentile (Pk): let’s find P12, the 12th percentile.
$c = \frac{n \cdot k}{100} = \frac{20 \times 12}{100} = 2.4$
Because the position (2.4) is not an integer, we round up to the next
whole number (3). The third value in the data set is the 12th percentile.
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
P12 = 30
Perhaps when we find the position corresponding to a certain percentile, the result is a whole number: for example, let’s find P10.
$c = \frac{n \cdot k}{100} = \frac{20 \times 10}{100} = 2$
When the position (c) is an integer, such as 2, we identify the required
percentile as the average of the values in the c and c+1 positions.
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
The values in positions c = 2 and c + 1 = 3 are both 30, so P10 = (30 + 30) / 2 = 30.
Just as P25, P50, and P75 are special percentiles (QUARTILES),
so P10, P20, P30, …, P90 are special percentiles (DECILES).
P50 = D5 = Q2 = median
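The two cases described above (round up when the position is not an integer, average two neighbouring values when it is) can be combined into one small function. This is a minimal sketch, assuming the data are already sorted and k is a whole number from 1 to 99; the name `percentile_value` is hypothetical.

```python
def percentile_value(sorted_data, k):
    """Value at the k-th percentile (k a whole number from 1 to 99)."""
    n = len(sorted_data)
    # Position c = n * k / 100, split into whole part and remainder.
    c, remainder = divmod(n * k, 100)
    if remainder == 0:
        # Whole-number position: average the values in positions c and c + 1
        # (1-based), i.e. list indices c - 1 and c.
        return (sorted_data[c - 1] + sorted_data[c]) / 2
    # Otherwise round the position up to c + 1 (1-based), i.e. index c.
    return sorted_data[c]

data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

print(percentile_value(data, 12))  # 30
print(percentile_value(data, 10))  # 30.0
print(percentile_value(data, 75))  # 75.0
```

Library routines often use other interpolation rules, so their quartiles can differ slightly from the ones obtained with this rule.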
z score or Standardized Score
A different way we can describe the position of a certain value in our
data set (student concert expenditures) is by allocating it a z-score
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
The z-score tells us how many standard deviations a particular value is above or below the mean.
SAMPLE z-score: $z = \frac{x - \bar{x}}{s}$
POPULATION z-score: $z = \frac{x - \mu}{\sigma}$
To calculate the z score for a particular value, subtract the mean of the
data set from the value and divide the result by the standard deviation.
z score calculation
We already know the sample mean and sample standard deviation for
our data set.
What is the z score for the value of 100?
20, 30, 30, 40, 40, 40, 50, 50, 50, 60, 60, 60, 60, 70, 70, 80, 80, 100, 100, 110
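SAMPLE (with $\bar{x} = 60$ and $s \approx 25$): $z = \frac{x - \bar{x}}{s} = \frac{100 - 60}{s} \approx \frac{40}{25} = 1.6$
The value of 100 is located about 1.6 standard deviations above the mean (we
know it is above the mean, because the z score is positive).
The value of 20 is located about 1.6 standard deviations below the mean ($z \approx \frac{20 - 60}{25} = -1.6$).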
z scores are affected by outliers, because outliers directly affect both the mean and the standard deviation.
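The z scores for 100 and 20 can be reproduced with a short Python sketch. The helper name `z_score` is an assumption for this illustration. It uses the unrounded sample standard deviation (about 24.7), so the results come out as ±1.62 rather than the ±1.6 obtained with s rounded to 25.

```python
import statistics

data = [20, 30, 30, 40, 40, 40, 50, 50, 50, 60,
        60, 60, 60, 70, 70, 80, 80, 100, 100, 110]

mean = statistics.mean(data)   # sample mean
s = statistics.stdev(data)     # sample standard deviation (n - 1)

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(round(z_score(100, mean, s), 2))  # 1.62
print(round(z_score(20, mean, s), 2))   # -1.62
```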
z score visualisation
The z score tells us the position of the data value relative to the mean…
how far the value is above (+) or below (-) the mean
z score = −1: $\bar{x} - s \approx 35$
z score = 0: $\bar{x} = 60$
z score = 1: $\bar{x} + s \approx 85$
z score = 1.6: $x = 100$, because $100 - 60 = 40 = 1.6 \times s$ with $s \approx 25$
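Reading the visualisation in the other direction, a z score can be converted back into a data value by rearranging the z-score formula to x = mean + z × s. The sketch below does this with the rounded values used here (mean 60, s ≈ 25); the function name `value_from_z` is hypothetical.

```python
def value_from_z(z, mean, sd):
    """Data value that lies z standard deviations from the mean."""
    return mean + z * sd

mean, sd = 60, 25  # rounded sample mean and standard deviation

for z in (-1, 0, 1, 1.6):
    print(z, value_from_z(z, mean, sd))
```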
Practice questions
For z scores and percentiles, use Jaisingh Chapter 4, pages 86 to 96.