
2/20/2013

Outlier Detection
Outlier detection is both easy and difficult.
It is easy since there are several relatively straightforward tests for the presence of outliers.
It is difficult since there are no firm rules as to when outlier removal is appropriate.
Outliers may be due to:
Chance.
Measurement error.
Experimental error.
Outliers may or may not be a problem, depending on many factors:
Some statistical tests are robust and can accommodate outliers; others may be severely influenced by outliers.
Parametric tests can be unduly influenced.
Nonparametric tests rarely are.
Some data types will naturally contain extreme values.
Radiation levels often have extreme values (spikes).
The presence of outliers may, in fact, be of interest.
Again, radiation spikes.
The outlier(s) may fall in a region of population overlap. This type of outlier must be removed from the data set.
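As a rough illustration of why parametric procedures are more exposed to outliers than rank-based ones, the sketch below compares how far the mean and the median move when one extreme value is added. The five baseline values are hypothetical, chosen only to resemble Cormic index values; they are not the study data.

```python
import statistics

# Hypothetical baseline sample (illustrative values, not the study data).
baseline = [51.2, 51.8, 52.0, 52.3, 52.9]

# The same sample with one extreme observation appended.
with_outlier = baseline + [57.0]

# The mean underlies most parametric tests; the median depends only on rank order.
mean_shift = statistics.mean(with_outlier) - statistics.mean(baseline)
median_shift = statistics.median(with_outlier) - statistics.median(baseline)

print(f"mean shift:   {mean_shift:.3f}")
print(f"median shift: {median_shift:.3f}")
```

In this small sample the mean moves several times further than the median, which is the behavior the robust versus non-robust distinction refers to.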
Is this observation (57.00) an outlier?
In some cases a single outlier may influence normality; however, in this case the data are normal even with this observation.
Should this observation be examined further at this point?
Tests of Normality

                                      Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                      Statistic   df   Sig.        Statistic   df   Sig.
Male Standing-Sitting Height
Ratio (Cormic Index)                  .065        93   .200*       .986        93   .304

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Male Standing-Sitting Height Ratio (Cormic Index) Stem-and-Leaf Plot

 Frequency    Stem &  Leaf

     1.00 Extremes    (=<48.9)
     3.00       49 .  446
    12.00       50 .  012334557788
    25.00       51 .  0122233344455566678888899
    31.00       52 .  0111222233333334455555677789999
    11.00       53 .  00111345789
     7.00       54 .  0012378
     2.00       55 .  02
     1.00 Extremes    (>=57.0)

 Stem width:  1.00
 Each leaf:   1 case(s)
The observation 57.0 is considered to be an extreme value in the stem-and-leaf plot.
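To make the stem-and-leaf layout concrete, here is a minimal sketch that splits values into a whole-number stem and a first-decimal leaf. It assumes two-decimal data and truncates (rather than rounds) the leaf, which is what the plot above appears to do. The function name `stem_and_leaf` is hypothetical; the eight input values are non-extreme observations reported for this variable in the SPSS Extreme Values output.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by integer stem; the leaf is the truncated first decimal.

    Assumes the data are recorded to two decimal places.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        hundredths = round(v * 100)   # exact integer count of hundredths
        tenths = hundredths // 10     # drop the second decimal (truncate)
        stem, leaf = divmod(tenths, 10)
        stems[stem].append(leaf)
    return dict(stems)

# Non-extreme observations taken from the Extreme Values output.
sample = [49.46, 49.48, 49.68, 50.05, 54.72, 54.87, 55.06, 55.20]
plot = stem_and_leaf(sample)

for stem in sorted(plot):
    print(f"{stem} . {''.join(str(leaf) for leaf in plot[stem])}")
```

The leaves produced for stem 49 (446) and stem 55 (02) match the corresponding rows of the plot, which is consistent with the truncation assumption.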
When examining potential outliers, the detrended normal QQ plot is useful.
Observations are transformed to z scores and plotted as standard deviations from the mean.
This observation is nearly 1.5 standard deviations from the mean.
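One common way to build a detrended normal QQ plot is sketched below: each observation's z score is compared with the z score expected for its rank under normality, and the difference is what gets plotted. The plotting position (rank - 3/8) / (n + 1/4) (Blom's formula) is an assumption here, as is the function name `detrended_qq`; SPSS may be configured differently, so the values need not match its plot exactly.

```python
from statistics import NormalDist, mean, stdev

def detrended_qq(values):
    """Return (value, deviation) pairs for a detrended normal QQ plot.

    deviation = observed z score minus the z score expected for that rank
    under a normal distribution (Blom plotting positions, an assumption).
    """
    n = len(values)
    m, s = mean(values), stdev(values)   # sample mean and sample SD
    standard_normal = NormalDist()
    points = []
    for rank, x in enumerate(sorted(values), start=1):
        observed_z = (x - m) / s
        expected_z = standard_normal.inv_cdf((rank - 0.375) / (n + 0.25))
        points.append((x, observed_z - expected_z))
    return points

# Small symmetric illustrative sample (hypothetical data).
points = detrended_qq([1, 2, 3, 4, 5])
for x, deviation in points:
    print(f"{x}: {deviation:+.3f}")
```

For data that follow a normal distribution the deviations scatter around zero; an outlier shows up as a point sitting well away from the zero line.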
The best method of determining if an observation is an outlier is to use an outlier test.
The test gives the probability that an observation is from a different population.
It is defensible.
It DOES NOT tell you whether or not to remove the extreme observation(s).
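Grubbs Outlier Test
$$G_{max} = \frac{x_{max} - \bar{x}}{s} \quad \text{or} \quad G_{min} = \frac{\bar{x} - x_{min}}{s}$$
where $G_{max}$ is used if the observation is greater than the mean and $G_{min}$ is used if it is less than the mean, and where $x_{max}$ or $x_{min}$ is the extreme observation value.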
Ho: The observation is not different than the sample population.
Ha: The observation is different than the sample population.
$$n = 93, \quad \bar{x} = 52.2, \quad s = 1.38, \quad G_{max} = \frac{57.0 - 52.2}{1.38} = 3.49$$
From the G table at n = 93 and α = 0.05 the critical value is 3.18.
Since 3.49 > 3.18, reject Ho.
The observation is from a different population (G = 3.49, p < 0.025).
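The Grubbs calculation can be sketched in a few lines. The function names `grubbs_from_summary` and `grubbs_from_data` are hypothetical, and the critical value 3.18 is simply the tabled value quoted for n = 93 and α = 0.05. Because the mean and standard deviation are entered here as rounded summary values, the result comes out near 3.48 rather than the reported 3.49, which was presumably computed from unrounded statistics.

```python
import statistics

def grubbs_from_summary(suspect, mean, sd):
    """Grubbs G from summary statistics.

    abs() covers both cases: G_max = (x_max - mean) / s for a high value
    and G_min = (mean - x_min) / s for a low value.
    """
    return abs(suspect - mean) / sd

def grubbs_from_data(values, suspect):
    """Grubbs G computed from raw data, using the sample SD (n - 1 denominator)."""
    return grubbs_from_summary(suspect, statistics.mean(values), statistics.stdev(values))

# Rounded summary values and the tabled critical value quoted in the slides.
g = grubbs_from_summary(suspect=57.0, mean=52.2, sd=1.38)
critical_value = 3.18   # n = 93, alpha = 0.05
reject_h0 = g > critical_value

print(f"G = {g:.2f}, reject Ho: {reject_h0}")
```

As the slides stress, rejecting Ho only says the value is unusually far from the mean; whether to remove it remains a separate judgment.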
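Critical Values of Grubbs Outlier (G) Test
Taken from Grubbs 1969, Table 1
Column headings: N, α = 0.05, α = 0.025, α = 0.01
(Table body not recoverable; an annotation on the table marks where the calculated value falls.)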
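For readers without the table to hand, tabled Grubbs critical values can be related to the t distribution. The expression below is the standard form for a one-sided test, assuming the table here is one-sided (the quoted value of 3.18 at n = 93 is consistent with that); $t_{\alpha/n,\,n-2}$ denotes the upper critical value of the t distribution with $n-2$ degrees of freedom at significance level $\alpha/n$.

```latex
G_{crit} = \frac{n-1}{\sqrt{n}}
           \sqrt{\frac{t^{2}_{\alpha/n,\,n-2}}{\,n - 2 + t^{2}_{\alpha/n,\,n-2}\,}}
```

For a two-sided test the same expression is used with $\alpha/(2n)$ in place of $\alpha/n$.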
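Dixon Outlier (Q) Test
$$Q = \frac{x_n - x_{n-1}}{x_n - x_1}$$
where $x_n$ is the suspected outlier, $x_{n-1}$ is the next ranked observation, and $x_1$ is the last ranked observation.
Note that the data have to be ranked starting from the suspected outlier, so that $x_{n-1}$ is its nearest neighbor and $x_1$ is the value at the opposite end of the data.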
In SPSS, choose Analyze > Descriptive Statistics > Explore, then choose the Statistics button and Outliers.
This gives the upper and lower extremes AND the next several observations, which is very useful when using the Dixon test.
Extreme Values

Male Standing-Sitting Height Ratio (Cormic Index)

                 Case Number   Value
Highest    1     1             57.00
           2     2             55.20
           3     3             55.06
           4     4             54.87
           5     5             54.72
Lowest     1     93            48.93
           2     92            49.46
           3     91            49.48
           4     90            49.68
           5     89            50.05

$$n = 93, \quad Q = \frac{57.00 - 55.20}{57.00 - 48.93} = \frac{1.8}{8.07} = 0.223, \quad Q_{Critical} = 0.1881, \quad 0.05 > p > 0.02$$
Ho: The observation is not different than the sample population.
Ha: The observation is different than the sample population.
The observation is from a different population (Q = 0.223, 0.05 > p > 0.02).
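The Dixon calculation above can be reproduced with a short sketch. The function name `dixon_q` and its `tail` parameter are hypothetical. Only the suspect value, its nearest neighbor, and the opposite extreme enter the ratio, so passing just the ten values from the Extreme Values output gives the same Q as the full data set of 93 would.

```python
def dixon_q(values, tail="high"):
    """Dixon Q: gap between the suspect value and its neighbor, divided by the range."""
    ordered = sorted(values)
    if len(ordered) < 3:
        raise ValueError("need at least three observations")
    data_range = ordered[-1] - ordered[0]
    if data_range == 0:
        raise ValueError("all observations are identical")
    if tail == "high":
        gap = ordered[-1] - ordered[-2]   # suspect is the largest value
    elif tail == "low":
        gap = ordered[1] - ordered[0]     # suspect is the smallest value
    else:
        raise ValueError("tail must be 'high' or 'low'")
    return gap / data_range

# The five highest and five lowest values from the Extreme Values output.
extremes = [57.00, 55.20, 55.06, 54.87, 54.72,
            48.93, 49.46, 49.48, 49.68, 50.05]

q_high = dixon_q(extremes, tail="high")
q_critical = 0.1881   # critical value quoted in the slides for n = 93

print(f"Q = {q_high:.3f}, reject Ho: {q_high > q_critical}")
```

Calling `dixon_q(extremes, tail="low")` applies the same ratio to the lowest observation (48.93) instead.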
Characteristics of the Dixon and Grubbs Tests
Dixon Q:
Is the ratio of the outlier gap to the data range.
Similar to the w/s (range) normality test.
Grubbs G:
Is essentially a z score that references a modified t table.
Very similar to a one sample t test.
Figure: Suspect Cormic value becomes increasingly extreme (data range from more normal to less normal).
The Grubbs test picks up extreme values earlier than the Dixon test, so choose the test that is most appropriate based on your knowledge of the data.
Figure: The same data used to generate the previous graph, displayed as a detrended QQ plot.