
SUPPLEMENTARY TOPIC: CHEBYSHEV’S THEOREM

1
Lecture Notes for Introductory Statistics

Neal Smith, Augusta University (2016)

In Chapter 2, we discussed how to calculate the mean and standard deviation of
a data set. As it turns out, simply knowing the mean and standard deviation can
reveal much about the nature of a set of data. A well-known result attributed to
Chebyshev makes this statement precise.

1. Chebyshev’s Theorem
First, we begin by stating Chebyshev’s Theorem. We will not prove the result.

Chebyshev’s Theorem. Given any set of data, and any real number k ≥ 1, at
least 1 − 1/k² of the points in that set of data must fall within k standard deviations
of the mean. That is, at least 1 − 1/k² of the data must lie in the interval between
µ − kσ and µ + kσ.
This is a pretty powerful result, since it allows us to make a statement about
any set of data. Additionally, there are some special cases of this theorem that are
worth knowing, so let’s take a look at a couple of these.
If we set k = 2, then 1 − 1/k² = 1 − 1/4 = 75%, and Chebyshev’s Theorem
tells us that in any data set, at least 75 percent of the data must lie within k = 2
standard deviations of the mean.
If we set k = 3, then 1 − 1/k² = 1 − 1/9 ≈ 88.9%, and Chebyshev’s Theorem
tells us that in any data set, at least 88.9 percent of the data must lie within k = 3
standard deviations of the mean.
If we set k = 4, then 1 − 1/k² = 1 − 1/16 = 93.75%, and Chebyshev’s Theorem
tells us that in any data set, at least 93.75 percent of the data must lie within k = 4
standard deviations of the mean.
We could of course do this all day, but the takeaway is that in any set of data we
can expect the vast majority of the data to fall within just a few standard deviations
of the mean.
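
The special cases above all come from the same formula, so they are easy to tabulate. The following is a minimal Python sketch; the function name chebyshev_bound is our own choice for illustration and does not come from the notes or the textbook.

```python
def chebyshev_bound(k):
    """Minimum fraction of any data set lying within k standard
    deviations of the mean, according to Chebyshev's Theorem."""
    if k < 1:
        raise ValueError("Chebyshev's Theorem is stated for k >= 1")
    return 1 - 1 / k**2


# Reproduce the special cases k = 2, 3, 4 discussed in the text.
for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_bound(k):.2%} of the data")
```

Note that the function accepts non-integer values of k as well, since the theorem holds for any real number k ≥ 1.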

2. Examples Using Chebyshev’s Theorem


Example 1. A group of 20 students was asked how much they spent on
textbooks during the last academic year. The amounts reported, in dollars, were

700, 600, 550, 550, 550, 500, 500, 500, 450, 450,
450, 400, 400, 400, 400, 350, 350, 300, 300, 200

One can check that this population has mean µ = 445 dollars and standard
deviation σ ≈ 113.91 dollars. Chebyshev’s Theorem with k = 2 therefore guarantees
that at least 75 percent of the costs fall in the interval from
445 − 2 × 113.91 = 217.18 to 445 + 2 × 113.91 = 672.82. Examining the data set,
we see that this is certainly true: 18 of the 20 reported costs (90 percent) fall in
this interval. The operative phrase in the theorem is “at least.”
1
These lecture notes are intended to be used with the open source textbook “Introductory
Statistics” by Barbara Illowsky and Susan Dean (OpenStax College, 2013).

Also observe that Chebyshev’s Theorem guarantees that at least 88.9 percent of
the costs fall within three standard deviations of the mean, but it is easy to see that
in fact all of the data in this particular data set lie within three standard
deviations of the mean.
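
The claims in Example 1 can be checked by direct computation. Below is a small Python sketch using the standard library's statistics module; the helper fraction_within is a name we introduce here for illustration. We use pstdev because the example treats the 20 students as a population.

```python
from statistics import mean, pstdev

# Textbook costs (in dollars) reported by the 20 students in Example 1.
costs = [700, 600, 550, 550, 550, 500, 500, 500, 450, 450,
         450, 400, 400, 400, 400, 350, 350, 300, 300, 200]

mu = mean(costs)       # population mean
sigma = pstdev(costs)  # population standard deviation


def fraction_within(data, mu, sigma, k):
    """Fraction of data points in the interval [mu - k*sigma, mu + k*sigma]."""
    lo, hi = mu - k * sigma, mu + k * sigma
    return sum(lo <= x <= hi for x in data) / len(data)


print(f"mean = {mu}, sigma = {sigma:.2f}")
for k in (2, 3):
    actual = fraction_within(costs, mu, sigma, k)
    guaranteed = 1 - 1 / k**2
    print(f"k = {k}: observed {actual:.1%}, guaranteed at least {guaranteed:.1%}")
```

In both cases the observed fraction is at or above the guaranteed minimum, which is all the theorem promises.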
Example 2. A professor tells a class that the mean on a recent exam was 80
with a standard deviation of 6 points. Suppose you wanted to find an interval
in which at least 75 percent of the students must have scored. Since 75 percent
corresponds to k = 2 in Chebyshev’s Theorem, we need only look 2 standard
deviations from the mean to conclude that at least 75 percent of the students
scored between 80 − 2 × 6 = 68 and 80 + 2 × 6 = 92.
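
Example 2 runs the theorem in reverse: we start from a desired proportion p and look for the matching k. Solving 1 − 1/k² = p for k gives k = 1/√(1 − p). The Python sketch below illustrates this; the function names k_for_proportion and chebyshev_interval are our own and are only meant as an illustration.

```python
import math


def k_for_proportion(p):
    """Value of k for which Chebyshev's bound 1 - 1/k**2 equals p (0 <= p < 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return 1 / math.sqrt(1 - p)


def chebyshev_interval(mean, sd, k):
    """Interval from mean - k*sd to mean + k*sd."""
    return mean - k * sd, mean + k * sd


# Example 2: mean 80, standard deviation 6, at least 75 percent of scores.
k = k_for_proportion(0.75)
low, high = chebyshev_interval(80, 6, k)
print(f"k = {k}, interval = [{low}, {high}]")
```

The same two functions work for any other target proportion, for instance p = 0.9, where k is no longer a whole number.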
Depending on the data, Chebyshev’s Theorem may tell you a lot or not so much.
Let’s look at an example where Chebyshev’s Theorem is not too enlightening.
Example 3. A professor tells a class that the mean on a recent (100 point) exam
was 62 and the standard deviation was a whopping 33 points. Again, if we wanted
to get a handle on at least 75 percent of the exam scores, we would let k = 2 and
conclude that at least 75 percent of the students scored between 62 − 2 × 33 = −4
and 62 + 2 × 33 = 128. Since this was a 100 point exam, and presumably negative
scores were not possible, this tells us that at least 75 percent of the students scored
between 0 and 100 on the exam. While this statement is of course true, it is not
terribly enlightening! The large standard deviation indicates high variability in
the data set, and there is simply too much potential variation in the data to draw
strong conclusions knowing only the mean and the standard deviation.
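
The reasoning in Example 3 amounts to computing the Chebyshev interval and then trimming it to the range of scores that are actually possible. Here is a minimal Python sketch of that step; the function clipped_interval and its parameter names are hypothetical and chosen only for this illustration.

```python
def clipped_interval(mean, sd, k, lowest, highest):
    """Chebyshev interval mean +/- k*sd, trimmed to the feasible
    range [lowest, highest] of the data."""
    lo, hi = mean - k * sd, mean + k * sd
    return max(lo, lowest), min(hi, highest)


# Example 3: mean 62, standard deviation 33, k = 2, scores between 0 and 100.
print(clipped_interval(62, 33, 2, 0, 100))
```

When the trimmed interval is the entire feasible range, as it is here, the theorem has told us nothing we did not already know.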
Let’s do one final example.
Example 4. A new college graduate has done their homework and is searching
for their first job. Based on their major, their educational level, the type of job
they are looking for, their experience, and the geographic location where they want
to live, a salary aggregator tells them that the mean salary of new employees is
approximately 45000 dollars with a standard deviation of 2600 dollars. This person
is subsequently offered a salary of 52000 dollars. How good is this offer?
Solution. Well, it’s certainly not terrible, being above the mean, but fortunately
we can quantify this somewhat better. First, since Chebyshev’s Theorem can tell
us what is happening a certain number of standard deviations away from the mean,
it would be nice to know a z-score for this 52000 dollar salary:

z₅₂₀₀₀ = (x − µ)/σ = (52000 − 45000)/2600 ≈ 2.7

Since this salary is 2.7 standard deviations away from the mean, using
Chebyshev’s Theorem with k = 2.7 tells us that at least 1 − 1/2.7² ≈ 86.3 percent of the
salaries must lie within 2.7 standard deviations of the mean; that is, between roughly
38000 and 52000 dollars. Thus, we can safely conclude that this 52000 dollar offer
is greater than or equal to at least 86.3 percent of the other salaries out there.
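
The arithmetic in Example 4 can be reproduced in a few lines. The following Python sketch is only an illustration; the function name z_score is our own, and we round z to one decimal place to match the value of k used in the solution.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Example 4: offer of 52000, mean salary 45000, standard deviation 2600.
z = z_score(52000, 45000, 2600)
k = round(z, 1)          # rounded to one decimal place, as in the solution
bound = 1 - 1 / k**2     # Chebyshev's guaranteed minimum proportion

print(f"z = {z:.4f}, k = {k}, bound = {bound:.1%}")
```

Using the unrounded z-score instead of k = 2.7 changes the bound only slightly, so the conclusion of the example is not sensitive to the rounding.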
