
Using Z-scores to Detect Outliers

Z-scores can quantify the unusualness of an observation when your data follow the normal distribution. A Z-score is the number of standard deviations a value falls above or below the mean. For example, a Z-score of 2 indicates that an observation is two standard deviations above the mean, while a Z-score of -2 signifies it is two standard deviations below the mean. A Z-score of zero represents a value that equals the mean.
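In formula form, a Z-score is the value minus the mean, divided by the standard deviation: Z = (x - mean) / SD. The minimal Python sketch below computes Z-scores for a list of values with the standard library's `statistics` module. The function name `z_scores` and the data are hypothetical, and the sketch assumes the sample standard deviation (`statistics.stdev`); some texts use the population version (`statistics.pstdev`) instead.

```python
import statistics

def z_scores(values):
    """Return the Z-score of each value: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - mean) / sd for x in values]

# Hypothetical data for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print([round(z, 2) for z in z_scores(data)])
```

Values above the mean get positive Z-scores, values below it get negative ones, and the value equal to the mean (5) gets zero.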

The further an observation's Z-score is from zero, the more unusual it is. A common cut-off for identifying outliers is a Z-score of +/-3 or further from zero. The probability distribution below displays the distribution of Z-scores in a standard normal distribution. Z-scores beyond +/-3 are so extreme you can barely see the shading under the curve.
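Applying the +/-3 cut-off is a one-line filter once the Z-scores are available. This is a minimal sketch, assuming the sample standard deviation and a hypothetical function name `flag_z_outliers`; the data are made up for illustration.

```python
import statistics

def flag_z_outliers(values, cutoff=3.0):
    """Return the values whose Z-score is at least `cutoff` away from zero."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs((x - mean) / sd) >= cutoff]

# Hypothetical data: fourteen values near 50 and one value far above them.
data = [48, 50, 51, 49, 52, 50, 47, 53, 50, 49, 51, 50, 48, 52, 90]
print(flag_z_outliers(data))  # → [90]
```

Here the value 90 has a Z-score of roughly 3.6, while every other value stays well inside +/-1.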

In a population that follows the normal distribution, Z-scores more extreme than +/-3 have a probability of 0.0027 (2 * 0.00135), which is about 1 in 370 observations. However, if your data don't follow the normal distribution, these probabilities don't apply and this approach might not be accurate.
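The 0.0027 figure is the two-tailed area of the standard normal distribution beyond +/-3. A minimal sketch of that arithmetic, using the identity P(|Z| >= z) = erfc(z / √2) and Python's `math.erfc`; the helper name `two_tailed_normal_prob` is hypothetical.

```python
import math

def two_tailed_normal_prob(z):
    """P(|Z| >= z) for a standard normal variable, via the complementary error function."""
    return math.erfc(z / math.sqrt(2))

p = two_tailed_normal_prob(3)
print(f"P(|Z| >= 3) = {p:.4f}, about 1 in {1 / p:.0f} observations")
```

This prints a probability of 0.0027, or about 1 in 370 observations, matching the figures above.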

Also, note that the outlier's presence throws off the Z-scores because it inflates the mean and standard deviation, as we saw earlier. Notice how all the Z-scores are negative except the outlier's value. If we calculated Z-scores without the outlier, they'd be different! Be aware that if your dataset contains outliers, Z-scores are biased toward zero, so the outliers appear less extreme than they really are.
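To see how an outlier can hide itself, the sketch below uses hypothetical data: nine identical values of 10 and one value of 1000. Even though 1000 is wildly out of line, its Z-score (using the sample standard deviation) is only about 2.85, below the +/-3 cut-off, because the outlier itself inflates the standard deviation. More generally, with the sample standard deviation no value in a sample of n can have a Z-score larger in absolute value than (n - 1)/√n, which is below 3 whenever n is 10 or fewer. The helper `z_score` is a hypothetical name.

```python
import math
import statistics

def z_score(x, values):
    """Z-score of x relative to `values`, using the sample standard deviation."""
    return (x - statistics.mean(values)) / statistics.stdev(values)

# Hypothetical data: nine identical values plus one extreme value.
with_outlier = [10] * 9 + [1000]
n = len(with_outlier)

z_outlier = z_score(1000, with_outlier)
max_possible = (n - 1) / math.sqrt(n)  # largest |z| attainable in a sample of size n

print(f"Z-score of 1000: {z_outlier:.3f}")
print(f"Largest possible |z| with n = {n}: {max_possible:.3f}")
```

In this configuration the outlier's Z-score reaches that ceiling exactly, yet a +/-3 rule still would not flag it.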

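Using the Interquartile Range to Create Outlier Fences

The interquartile range (IQR) is the distance between the first quartile (Q1) and the third quartile (Q3), so it spans the middle half of the data. The IQR method uses it to build two pairs of fences: inner fences that sit 1.5 IQRs beyond the quartiles and outer fences that sit 3 IQRs beyond them. To calculate the outlier fences, do the following: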

1. Take your IQR and multiply it by 1.5 and by 3. We'll use these values to obtain the inner and outer fences. For our example, the IQR equals 0.222 (Q3 – Q1 = 1.936 – 1.714). Consequently, 0.222 * 1.5 = 0.333 and 0.222 * 3 = 0.666. We'll use 0.333 and 0.666 in the following steps.
2. Calculate the lower inner and lower outer fences. Take the Q1 value and subtract each of the two values from step 1. The two results are the lower inner and lower outer fences. For our example, Q1 is 1.714. So, the lower inner fence = 1.714 – 0.333 = 1.381 and the lower outer fence = 1.714 – 0.666 = 1.048.
3. Calculate the upper inner and upper outer fences. Take the Q3 value and add each of the two values from step 1. The two results are the upper inner and upper outer fences. For our example, Q3 is 1.936. So, the upper inner fence = 1.936 + 0.333 = 2.269 and the upper outer fence = 1.936 + 0.666 = 2.602.
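The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the article's own tool: the function names are hypothetical, and `iqr_fences_from_data` estimates Q1 and Q3 with `statistics.quantiles`, whose default "exclusive" method is only one of several quartile conventions, so other software may report slightly different quartiles for the same data.

```python
import statistics

def iqr_fences(q1, q3):
    """Return (lower_outer, lower_inner, upper_inner, upper_outer) fences from the quartiles."""
    iqr = q3 - q1
    return (
        q1 - 3 * iqr,    # lower outer fence
        q1 - 1.5 * iqr,  # lower inner fence
        q3 + 1.5 * iqr,  # upper inner fence
        q3 + 3 * iqr,    # upper outer fence
    )

def iqr_fences_from_data(values):
    """Estimate Q1 and Q3 with statistics.quantiles (default 'exclusive' method), then build the fences."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return iqr_fences(q1, q3)

# Quartiles from the example dataset.
fences = iqr_fences(q1=1.714, q3=1.936)
print([round(f, 3) for f in fences])  # → [1.048, 1.381, 2.269, 2.602]
```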

Using the Outlier Fences with Our Example Dataset

For our example dataset, the values for these fences are 1.048, 1.381, 2.269, and 2.602. Almost all of our data should fall between the inner fences, which are 1.381 and 2.269. At this point, we look at our data values and determine whether any qualify as minor outliers (between an inner and an outer fence) or major outliers (beyond an outer fence). Fourteen of the 15 data points fall inside the inner fences, so they are not outliers. The 15th data point falls outside the upper outer fence, which makes it a major, or extreme, outlier.
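The classification step can be sketched as follows, using the example's fence values. The function `classify` and the three test values are hypothetical (the example dataset's individual values are not listed here), and the sketch assumes a value sitting exactly on a fence counts as inside it.

```python
def classify(x, fences):
    """Label x as 'not an outlier', 'minor outlier', or 'major outlier' using IQR fences."""
    lower_outer, lower_inner, upper_inner, upper_outer = fences
    if x < lower_outer or x > upper_outer:
        return "major outlier"   # beyond an outer fence
    if x < lower_inner or x > upper_inner:
        return "minor outlier"   # between an inner and an outer fence
    return "not an outlier"      # between the inner fences

# Fence values from the example dataset.
fences = (1.048, 1.381, 2.269, 2.602)

# Hypothetical values, one in each region.
for value in (1.8, 2.4, 3.0):
    print(value, classify(value, fences))
```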

The IQR method is helpful because it uses percentiles, which do not depend on a specific distribution. Additionally, percentiles are relatively robust to the presence of outliers compared to the other quantitative methods, such as Z-scores, which depend on the mean and standard deviation. Values that fall inside the two inner fences are not outliers.
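To illustrate the robustness claim, the sketch below uses hypothetical data (fourteen values near 50) and then adds a single extreme value of 90. It compares how much the mean, the sample standard deviation, and the IQR move. The `iqr` helper is a hypothetical name, and it relies on the default "exclusive" method of `statistics.quantiles`.

```python
import statistics

def iqr(values):
    """Interquartile range using statistics.quantiles (default 'exclusive' method)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data, then the same data plus one extreme value.
clean = [48, 50, 51, 49, 52, 50, 47, 53, 50, 49, 51, 50, 48, 52]
with_outlier = clean + [90]

for label, values in (("without outlier", clean), ("with outlier", with_outlier)):
    print(f"{label}: mean={statistics.mean(values):.2f}, "
          f"SD={statistics.stdev(values):.2f}, IQR={iqr(values):.2f}")
```

With these numbers the standard deviation grows roughly sixfold (about 1.71 to 10.46), while the IQR only moves from 2.50 to 3.00.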
