
Detecting Outliers

Outlier detection is both easy and difficult.
• It is easy since there are several relatively straightforward tests for the presence of outliers.
• It is difficult since there are no firm rules as to when outlier removal is appropriate.
Detecting Outliers Using SPSS:
Calculating Z-scores

• Analyze
• Descriptive Statistics

Aniceto B. Naval 2
Calculating Z-scores
This opens a window that allows us to select the
variables for which z-scores will be calculated.

• Save standardized
values as variables

Calculating Z-scores
Expected output:

The only thing that will show in the output window is a
table of descriptive statistics for the selected variables.

Calculating Z-scores:
Looking for Outliers
The z-scores (standardized scores) will show as a
new variable in the Data Editor window.

The easiest way to look for outliers with the z-scores is
to scan the list visually, looking for numbers that are
greater than 3 in absolute value.

Such a value would indicate an outlier.
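
The visual scan described above can also be sketched in code. The following is a minimal illustration in Python, not SPSS syntax. It assumes z-scores are computed with the sample standard deviation (n - 1 denominator); the function names and the sample scores are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]

def flag_outliers(values, threshold=3.0):
    """Return (index, value, z) for every case with |z| above the threshold."""
    return [
        (i, v, z)
        for i, (v, z) in enumerate(zip(values, z_scores(values)))
        if abs(z) > threshold
    ]

# Hypothetical scores: 19 ordinary cases and one extreme case.
scores = [12, 14, 13, 15, 11, 13, 14, 12, 13, 15,
          14, 12, 13, 14, 11, 15, 13, 12, 14, 60]

for index, value, z in flag_outliers(scores):
    print(f"case {index}: value={value}, z={z:.2f}")
```

Note that with a very small sample no case can reach |z| > 3 under this definition, so the threshold is only meaningful for samples of more than about ten cases.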

Dealing with Outliers
Outlier tests are an iterative process.
1. Check most extreme value for being an outlier.
2. If it is, remove it.
3. Check for the next extreme value using the
new, smaller sample. It is smaller because the
first outlier was removed.
4. Repeat the process.

Once all outliers are removed, the sample can be
analyzed.
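
The iterative steps above might be sketched as follows. The slides do not name a specific outlier test for this loop, so this sketch assumes the |z| > 3 rule from the z-score slides; the function name and data are hypothetical.

```python
import statistics

def remove_outliers_iteratively(values, threshold=3.0):
    """Repeatedly test the most extreme value and drop it if it is an outlier.

    Assumption: a value is an outlier when its |z| exceeds `threshold`,
    with mean and sample SD recomputed on the remaining sample each round.
    """
    remaining = list(values)
    removed = []
    while len(remaining) > 2:
        mean = statistics.mean(remaining)
        sd = statistics.stdev(remaining)
        if sd == 0:
            break  # all remaining values are identical
        # Step 1: find the most extreme value in the current sample.
        extreme = max(remaining, key=lambda v: abs(v - mean))
        if abs(extreme - mean) / sd <= threshold:
            break  # most extreme value is not an outlier, so stop
        # Steps 2 and 3: remove it and re-test on the smaller sample.
        remaining.remove(extreme)
        removed.append(extreme)
    return remaining, removed

# Hypothetical scores with one extreme case.
scores = [12, 14, 13, 15, 11, 13, 14, 12, 13, 15,
          14, 12, 13, 14, 11, 15, 13, 12, 14, 60]
cleaned, dropped = remove_outliers_iteratively(scores)
print(dropped)  # [60]
```

Recomputing the mean and standard deviation after each removal matters because a single extreme value inflates both, which can hide a second, less extreme outlier in the first pass.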

Detecting Outliers:
Interquartile Range and Box Plots

Procedure for Identifying Outliers:

From the menu at the top of the screen, click
on Analyze, then click on Descriptive Statistics,
then Explore.

Detecting Outliers:
Interquartile Range and Box Plots
• In the Display section,
make sure Both is
selected. This provides
both Statistics and Plots.
• Click on your variable (e.g.
technology), and move it
into the Dependent
List box. Optionally, choose a factor
(e.g. Gender of the
Respondents) and move
it into the Factor List box.
• Click Statistics.

Detecting Outliers:
Interquartile Range and Box Plots

• Click on Descriptives
and Outliers.
• Then click Continue.

Detecting Outliers:
Interquartile Range and Box Plots

• Click on Plots.

Detecting Outliers:
Interquartile Range and Box Plots

• Click on Histogram.
Expected Outputs

• The Case Processing Summary table displays the valid and missing cases.

Expected Outputs
• The Descriptives table gives an
indication of how much of a problem
the outlying cases are likely to be.
• The value to look at is the 5%
Trimmed Mean.
• SPSS removes the top and
bottom 5 per cent of the cases
and calculates a new mean value
to obtain this Trimmed Mean
value.
• Compare the original mean and
the new trimmed mean. If these
two mean values are very
different, then there is a need to
investigate the data points further.
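
The comparison between the original mean and the trimmed mean can be illustrated with a short sketch. This is a simplified version that drops a whole number of cases from each end; the exact figure SPSS reports may differ slightly when 5% of the sample is not a whole number of cases. The function name and data are hypothetical.

```python
import statistics

def trimmed_mean(values, percent=5):
    """Mean after dropping the lowest and highest `percent` of cases.

    Assumption: the number of cases cut from each end is rounded down,
    which is a simplification of how a statistics package may trim.
    """
    ordered = sorted(values)
    k = (len(ordered) * percent) // 100  # cases to cut from each tail
    return statistics.mean(ordered[k:len(ordered) - k])

# Hypothetical scores with one extreme case.
scores = [12, 14, 13, 15, 11, 13, 14, 12, 13, 15,
          14, 12, 13, 14, 11, 15, 13, 12, 14, 60]

original = statistics.mean(scores)
trimmed = trimmed_mean(scores)
print(f"mean={original:.2f}, 5% trimmed mean={trimmed:.2f}")
```

In this made-up sample the single extreme case pulls the ordinary mean well above the trimmed mean, which is the kind of gap that would prompt a closer look at the data points.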

Expected Outputs

• The Extreme Values table
gives the highest and the
lowest values recorded for
that variable and also
provides the ID of the
person with each score.
• It helps to identify the cases
that have the outlying
values.

Expected Outputs

Have a look at the Histogram and check the tails of the
distribution for data points that sit apart from the rest at the
extremes.

Expected Outputs

Inspect the
Boxplot to see whether
SPSS identifies any
outliers. These
outliers are
displayed as little
circles with an ID
number attached.
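
The boxplot check can be approximated in code using the interquartile range. The slides do not state the cutoff, so this sketch assumes the common convention that a point is an outlier when it lies more than 1.5 times the IQR beyond the edge of the box. The quartile method used here may not match the one SPSS uses for its boxplots, and the function name and data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    Assumptions: k = 1.5 (a common boxplot convention) and quartiles from
    statistics.quantiles with the 'inclusive' method.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical scores with one extreme case.
scores = [12, 14, 13, 15, 11, 13, 14, 12, 13, 15,
          14, 12, 13, 14, 11, 15, 13, 12, 14, 60]
print(iqr_outliers(scores))  # [60]
```

Because quartiles are barely affected by a single extreme value, this rule is less prone than z-scores to having the outlier itself distort the yardstick used to detect it.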
