Professional Documents
Culture Documents
Z Scores
Learning Objectives
By the end of this lecture, you should be able to:
• One method of summarizing the spread has already been discussed: Quartiles and the
related 5-number summary.
• Generally, you should only use Variance / SD with data that has a symmetrical distribution
– That is, using SD/Variance to describe spread can lead to inaccurate calculations if looking at data
that is NOT normally distributed. This is important!
– The reason is that in order to calculate the SD, we need to use the mean of that data.
– Terminology review: We say that the SD is not resistant to skewed data.
Variation
In every set of values (“dataset”—know this term!) e.g. ages, heights, incomes, crop yields,
spitball-distances, etc. there will always be variation. The question is, how do we describe just
how varied our data is? Are all the points closely clustered around some value or are they all over
the map?
Having a sense of how much the various datapoints are spread out around the center (mean or
median) is an important piece of information. Consider employee pay at DePaul University: if we
looked at every income from student workers to the Provost/President, we would see a
tremendous variation around the center as some people make a great deal more than the center,
and some make much less. However, if we focused only on student employees, we would find
that the variation around the center is considerably less.
The most common statistic we used for measuring variability (spread) is called the standard
deviation.
How to calculate the SD
(Note: This is for explanatory purposes
only - you will not have to do this by hand)
1 n
i
3 61 63.4 -2.4 5.76
2
s ( x x ) 4 62 63.4 -1.4 1.96
(n 1) 1 5 62 63.4 -1.4 1.96
• In the math world, there are numerous ‘shortcuts’ that are used
to represent certain concepts.
• Such symbols are both widespread and useful (though I believe
sometimes it's just geeks trying to look impressive).
• However, you must get comfortable with them as they are
introduced.
However, you should always keep in mind what it is you are calculating when you apply
this formula. Do not ever forget that a z-score measures the number of standard
deviations that a data value x is from the mean m.
(x )
z