Detecting Outliers

Outliers (extreme values) can distort an analysis because they directly affect the mean and the standard deviation. That is why we should detect them, and in some cases remove them, before running parametric tests.

There are two common methods for identifying outliers:

1. A numerical method, based on standardized values (z-scores)
2. A graphical method, based on the boxplot chart
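Both methods can be sketched in a few lines of Python with the standard `statistics` module. The cutoffs are assumptions, not taken from the slides: |z| > 3 for standardized values, and the conventional boxplot fences at 1.5 × IQR beyond the first and third quartiles. Quartiles are computed with `statistics.quantiles` (default "exclusive" method), so other software may give slightly different fences. The sample data is the seven-value series from the example on replacing outliers.

```python
import statistics

def zscore_outliers(data, threshold=3.0):
    """Return values whose standardized score |z| exceeds the threshold (assumed cutoff)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return [x for x in data if abs((x - mean) / sd) > threshold]

def boxplot_outliers(data, k=1.5):
    """Return values outside the boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

data = [2.7, 2.2, 5.9, 3.4, 3.0, 3.7, 2.8]
print(zscore_outliers(data))    # []
print(boxplot_outliers(data))   # [5.9]
```

Here only the boxplot rule flags 5.9. Its z-score is about 2.08, and with the sample standard deviation no value in a series of n = 7 can exceed |z| = (n - 1)/√n ≈ 2.27, so a cutoff of 3 cannot fire on such a small sample.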
How to manage the outliers?
There are three kinds of outliers, depending on their source:

• Data entry errors, due to lack of attention, negligence, tiredness, etc.

• Measurement or data collecting errors, due either to human mistakes or to equipment malfunction.

• Real but atypical, unusual values in your population. These are the so-called genuine outliers.
There are two basic solutions for dealing with genuine outliers:

• Remove the outliers from the data series

• Keep the outliers in the data series
If we decide to keep the outliers, we have four possible routes to choose from:

1. Run a nonparametric test, because these tests are less sensitive to outliers.

2. Replace the outliers with less extreme values, closer to the rest of the data. Suppose our data series looks like this: 2.7 2.2 5.9 3.4 3.0 3.7 2.8. The value 5.9 lies far above the others.

3. Run the parametric test anyway, keeping in mind the possible effects of the outliers.

4. Perform a so-called sensitivity analysis: run both the parametric and the nonparametric test. If the results are similar, we can conclude that the outliers do not affect our findings.
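Route 2 can be sketched on the example series as follows. The replacement rule is an assumption, not necessarily the one intended here: each boxplot outlier (1.5 × IQR fences) is pulled in to the nearest value that is not an outlier, a simple form of winsorizing. The helper names are hypothetical. The sketch compares the mean, standard deviation and median before and after; the median only stands in for rank-based (nonparametric) summaries, and this is not a hypothesis test.

```python
import statistics

def boxplot_fences(data, k=1.5):
    """Return (lower, upper) boxplot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def replace_outliers(data, k=1.5):
    """Clamp each outlier to the nearest non-outlying value (hypothetical rule)."""
    lower, upper = boxplot_fences(data, k)
    regular = [x for x in data if lower <= x <= upper]
    low, high = min(regular), max(regular)
    # Values already inside [low, high] are unchanged; outliers move to low or high.
    return [min(max(x, low), high) for x in data]

def summary(data):
    """Mean and sample standard deviation (rounded to 2 decimals), plus the median."""
    return (round(statistics.mean(data), 2),
            round(statistics.stdev(data), 2),
            statistics.median(data))

data = [2.7, 2.2, 5.9, 3.4, 3.0, 3.7, 2.8]
adjusted = replace_outliers(data)
print(adjusted)           # [2.7, 2.2, 3.7, 3.4, 3.0, 3.7, 2.8]
print(summary(data))      # (3.39, 1.21, 3.0)
print(summary(adjusted))  # (3.07, 0.56, 3.0)
```

Replacing 5.9 lowers the mean from about 3.39 to 3.07 and more than halves the standard deviation, while the median stays at 3.0. That contrast is the idea behind routes 1 and 4: rank-based methods barely react to the extreme value, so similar parametric and nonparametric results suggest the outlier is not driving the conclusion.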
