https://www.scribd.com/doc/97788185/Grubbs-Test-for-outlier-detection

12/11/2012


Short Communication

A recursive version of Grubbs' test for detecting multiple outliers in environmental and chemical data

Ram B. Jain

Centers for Disease Control and Prevention, 4770 Buford Highway, Chamblee, GA 30341, USA

Article info

Article history: Received 16 January 2010; received in revised form 18 March 2010; accepted 27 April 2010; available online 21 May 2010

Keywords: Outliers; Extreme Studentized Deviate statistic; Simulation; Grubbs

Abstract

Objective: To compare the performance of Grubbs' outlier detection procedure with the recursive Extreme Studentized Deviate (ESD) outlier detection procedure.

Design and methods: Using simulated data, the powers of Grubbs' and ESD procedures were evaluated.

Results: Except when the sample contained exactly one outlier, the power of the ESD procedure was higher than that of Grubbs' procedure.

Conclusion: The ESD recursive procedure is the procedure of choice to detect multiple outliers in environmental and chemical data.

© 2010 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.

Introduction

In any laboratory where data quality is important, certain quality control schemes are practiced. These schemes include discarding certain observations, called outliers, which are beyond certain standard deviations (SD) from their respective means [1]. These outliers must be detected and treated so that they will not have a disproportionate influence on data analysis. Outliers are defined as the observations that do not fit into the pattern of the remaining observations, called inliers [2].

Many procedures to detect outliers have been proposed [3]. In a consecutive procedure like the one by Grubbs [4], only one observation at a time can be tested as an outlier. In the presence of multiple outliers, these procedures must be used repeatedly until no further outliers can be detected. In this case, the total Type I error may be much higher than the intended $\alpha$ of 1% or 5%. This is of concern since chemical data do have multiple outliers [5]. Grubbs' [4] procedure (GRBP) is widely accepted. Among several variations of Grubbs' statistics, the statistics

$$\tau_{(N)} = \frac{X_{(N)} - \bar{X}}{S} \quad \text{and} \quad \tau_{(1)} = \frac{\bar{X} - X_{(1)}}{S}$$

(where $\bar{X}$ = mean of the sample, $S$ = SD of the sample, $X_{(N)}$ = the largest observation in the sample, and $X_{(1)}$ = the smallest observation in the sample) are in use today and the tables of critical values are available [4]. Recently, critical values for sample sizes up to 30,000 have become available [6].

Recursive outlier detection procedures [7] were designed to eliminate some of the shortcomings associated with consecutive procedures. These procedures, even if used to detect the presence of $K_S$ suspected outliers, can detect the presence of any number of outliers, from zero to $K_S$, and can control Type I error to the intended level. Even though critical values for these procedures are available [3], there is a set of tables of critical values for each combination of $N$ and $K_S$.

If $S_{(i)}$ is an ordered sample containing observations $X_{(1S_i)}, \ldots, X_{(NS_i)}$ such that $X_{(1S_i)} \le X_{(2S_i)} \le \cdots \le X_{(N-1\,S_i)} \le X_{(NS_i)}$ in sample $S_{(i)}$, then to detect $K_S$ suspected outliers in the sample, the statistics

$$\mathrm{ESD}_i = \frac{\max\left(X_{(NS_i)} - M_{S_i},\; M_{S_i} - X_{(1S_i)}\right)}{\mathrm{SD}_{S_i}}, \qquad i = 1, \ldots, K_S,$$

are computed, where $X_{(NS_i)}$ = largest observation in sample $S_{(i)}$, $X_{(1S_i)}$ = smallest observation in sample $S_{(i)}$, $M_{S_i}$ = mean of the sample $S_{(i)}$, and $\mathrm{SD}_{S_i}$ = SD of sample $S_{(i)}$. Or, $\mathrm{ESD}_i$ is the extreme studentized deviate for sample $S_{(i)}$. The null hypothesis of no outliers is rejected if $\mathrm{ESD}_i$ is greater than its critical value. The critical values of ESD for $N = 20$ to $100$, for $K$ […] $K_S$.

The powers of consecutive as well as recursive procedures are negatively affected by the masking and swamping effects when the actual number of outliers, $K_A$, is different from $K_S$. The masking effect, defined as the inability to detect an outlier in the presence of another outlier, is present when $K_S < K_A$. The swamping effect, defined as the tendency to detect inliers as outliers in addition to outliers, is present when $K_S > K_A$. Details about masking and swamping effects are presented as Supplemental Information (SI) S1.

In this paper, I explain the algorithm used to compute sample statistics in the ESD procedure (ESDP), develop a model to compute the critical values for ESDP, and compare the performance of ESDP with GRBP by conducting a simulation study.
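The statistics defined in the Introduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's algorithm: the passage that defines how the samples $S_{(i)}$ are formed is missing from this copy, so the sketch assumes the usual generalized ESD recursion in which $S_{(1)}$ is the full sample and each later sample drops the most extreme observation of the previous one. The function names, the use of the sample SD with an $n - 1$ denominator, and the "largest $i$ that exceeds its critical value" decision rule are likewise assumptions.

```python
import statistics


def grubbs_statistics(sample):
    """Return (tau_N, tau_1): studentized distance of the largest and smallest value."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # assumption: sample SD with n - 1 denominator
    return (max(sample) - mean) / sd, (mean - min(sample)) / sd


def esd_statistics(sample, k_s):
    """Return [(ESD_i, removed_value)] for i = 1..k_s.

    Assumption: S(1) is the full sample and S(i+1) is S(i) with its most
    extreme observation deleted.
    """
    remaining = sorted(sample)
    results = []
    for _ in range(k_s):
        mean = statistics.fmean(remaining)
        sd = statistics.stdev(remaining)
        upper = remaining[-1] - mean  # distance of the largest value from the mean
        lower = mean - remaining[0]   # distance of the smallest value from the mean
        if upper >= lower:
            esd, removed = upper / sd, remaining.pop()
        else:
            esd, removed = lower / sd, remaining.pop(0)
        results.append((esd, removed))
    return results


def count_outliers(esd_values, critical_values):
    """Assumed decision rule: the largest i whose ESD_i exceeds its critical value."""
    detected = 0
    for i, (esd, crit) in enumerate(zip(esd_values, critical_values), start=1):
        if esd > crit:
            detected = i
    return detected
```

Under the assumed rule, `count_outliers([3.0, 2.0, 3.5], [2.8, 2.7, 2.6])` returns 3 even though the second statistic is below its critical value, which is how a recursive procedure can get past masking. The sample must contain at least `k_s + 2` distinct-enough values so that the SD stays defined and non-zero at every step.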

Clinical Biochemistry 43 (2010) 1030–1033

☆ The findings and conclusions in this report are those of the author[s] and do not necessarily represent the views of the Centers for Disease Control and Prevention.

⁎ Fax: +1 770 488 0181.

doi:10.1016/j.clinbiochem.2010.04.071

Materials and methods

Critical values of the ESD recursive outlier detection procedure

I fitted non-linear regression models for $\mathrm{ESD}_i$ as a function of $N$ and $K_S$, where $N$ is the sample size of the original sample, for $\alpha = 0.01$ and $0.05$. Models for $\mathrm{ESD}_i$ were:

$$\mathrm{ESD}_i = \alpha + \beta_1 N + \beta_2 N^2 + \beta_3 N^3 + \beta_4 K_S$$

where $N$ is the sample size and $\alpha$ in the model denotes the intercept (Table 1), not the Type I error. The estimated values of regression coefficients […] $N$ from 20 to 100 and $2 \le K_S \le 5$. An example illustrating the computation is given in SI S2. The differences between the tabulated critical values provided elsewhere [3] and those computed here are given in SI S3.
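As a rough illustration of how the fitted model is evaluated, the following sketch plugs the coefficients reported in Table 1 for a Type I error of 0.05 into the polynomial above. The function and dictionary names are hypothetical, and the range checks reflect the ranges quoted in the text ($N$ from 20 to 100, $2 \le K_S \le 5$); the worked example in SI S2 is not available here, so this should not be read as reproducing it.

```python
# Coefficients (intercept, beta1, beta2, beta3, beta4) copied from Table 1
# for Type I error 0.05; keys are the index i of ESD_i.
COEF_05 = {
    1: (2.36272, 0.02164, -0.00011202, 0.0, 0.05400),
    2: (2.08321, 0.02360, -0.00024696, 9.940952e-07, 0.03929),
    3: (2.12080, 0.01562, -0.00014331, 5.330823e-07, 0.02643),
    4: (2.08897, 0.01364, -0.00013213, 5.284038e-07, 0.02143),
    5: (2.26316, 0.00626, -0.00002207, 0.0, 0.00000),
}


def esd_critical_value(i, n, k_s, coef=COEF_05):
    """Evaluate intercept + b1*N + b2*N^2 + b3*N^3 + b4*K_S for ESD_i."""
    if not 20 <= n <= 100:
        raise ValueError("model is quoted for N from 20 to 100")
    if not 2 <= k_s <= 5:
        raise ValueError("model is quoted for 2 <= K_S <= 5")
    if not 1 <= i <= k_s:
        raise ValueError("i must lie between 1 and K_S")
    intercept, b1, b2, b3, b4 = coef[i]
    return intercept + b1 * n + b2 * n**2 + b3 * n**3 + b4 * k_s
```

For instance, `esd_critical_value(1, 20, 2)` evaluates to about 2.859 with these coefficients.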

Simulation study

In the simulation study for ESD, $K_S$ for each sample was arbitrarily specified as $2 \le K_S \le 5$. I generated 500 random samples each of size 20, 30, 40, 50, and 100 from a normal distribution $x \sim N(0, 1)$. Up to five outliers were then introduced in each sample randomly, with the lowest and/or the largest observations in the sample being increased or reduced by a randomly determined value that varied between 1 SD and 3 SD. The number of outliers introduced in each sample was randomly determined with the restriction that no sample had more than 10% outliers. The total number of outliers on the lower and the upper ends of the sample were also randomly determined. The step-by-step details of the simulation algorithm are given in SI S4.
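The sample-generation step might look roughly like the sketch below. SI S4 is not part of this copy, so every detail beyond the paragraph above is an assumption: the function name, the uniform draw between 1 and 3 (in SD units, with SD = 1), the choice of at least one outlier per sample, pushing low values further down and high values further up, and the way outliers are split between the two ends.

```python
import random


def simulate_sample(n, rng, max_outliers=5, max_percent=10):
    """Return (data, k_a, n_low, n_high) for one contaminated N(0, 1) sample."""
    data = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    # No more than five outliers and no more than 10% of the sample.
    limit = min(max_outliers, n * max_percent // 100)
    k_a = rng.randint(1, limit)      # assumption: at least one outlier
    n_low = rng.randint(0, k_a)      # assumption: random split between the ends
    n_high = k_a - n_low
    for j in range(n_low):
        data[j] -= rng.uniform(1.0, 3.0)   # push the lowest values further down
    for j in range(n - n_high, n):
        data[j] += rng.uniform(1.0, 3.0)   # push the largest values further up
    return data, k_a, n_low, n_high


rng = random.Random(2010)  # fixed seed so the sketch is repeatable
samples = {n: [simulate_sample(n, rng) for _ in range(500)]
           for n in (20, 30, 40, 50, 100)}
```

A seeded `random.Random` instance is passed in explicitly so that the same 500 samples per size can be fed to both procedures being compared.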

Results

Simulation study

When $\alpha = 0.05$ and $K_S = K_A$, the power, or the percent probability of detecting the exact number of outliers, varied from 83.6% to 99.8% for the ESDP and from 54% to 81% for GRBP (Fig. 1, Panel A).

Table 1. Intercepts and model coefficients to compute critical values for ESD statistics.

| Type I error | ESD statistic | $\alpha$ (intercept) | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\beta_4$ |
|---|---|---|---|---|---|---|
| 0.05 | ESD1 | 2.36272 | 0.02164 | −0.00011202 | 0.000000E+00 | 0.05400 |
| 0.05 | ESD2 | 2.08321 | 0.02360 | −0.00024696 | 9.940952E-07 | 0.03929 |
| 0.05 | ESD3 | 2.12080 | 0.01562 | −0.00014331 | 5.330823E-07 | 0.02643 |
| 0.05 | ESD4 | 2.08897 | 0.01364 | −0.00013213 | 5.284038E-07 | 0.02143 |
| 0.05 | ESD5 | 2.26316 | 0.00626 | −0.00002207 | 0.000000E+00 | 0.00000 |
| 0.01 | ESD1 | 2.21006 | 0.05135 | −0.00062151 | 2.680000E-06 | 0.04271 |
| 0.01 | ESD2 | 2.31375 | 0.02378 | −0.00024152 | 9.337330E-07 | 0.03900 |
| 0.01 | ESD3 | 2.36188 | 0.01382 | −0.00010112 | 2.566791E-07 | 0.02143 |
| 0.01 | ESD4 | 2.27294 | 0.01538 | −0.00017094 | 7.268129E-07 | 0.02000 |
| 0.01 | ESD5 | 2.40764 | 0.01020 | −0.00012227 | 6.338063E-07 | 0.00000 |

Fig. 1. (A) Probability of detecting the exact number of outliers for Grubbs' and ESD procedures when $K_S = K_A$. (B) Probability of detecting less than the exact number of outliers for Grubbs' and ESD procedures when $K_S = K_A$. (C) Probability of detecting the exact number of outliers for Grubbs' and ESD procedures when $K_S < K_A$. (D) Probability of detecting less than the exact number of outliers for Grubbs' and ESD procedures when $K_S < K_A$. The probabilities for Grubbs' procedure are displayed as dotted lines. The probabilities for the ESD procedure are displayed as solid lines.


The power of ESDP was as much as 38.6% higher than the power of GRBP. The probability of detecting less than the actual number of outliers was as high as 47.2% for GRBP (Fig. 1, Panel B). Thus, when there is exact knowledge of how many outliers are present in the sample, ESD is the procedure of choice.

When $K_S < K_A$, the ESDP performed much better than GRBP. The power of ESDP was as much as 42.2% higher than that of GRBP for $N = 50$, $K_A = 5$, and $K_S$ […] $K_A$ ($P_{high}$) for GRBP was several-fold higher than for ESDP. For example, for $N = 40$, $K_A = 4$, and $K_S = 3$, $P_{high}$ was 44.6% for GRBP and 4% for ESDP (Fig. 1, Panel D). Thus, ESDP was the procedure of choice. However, the closer the values of $K_S$ and $K_A$ were, the better was the power of ESDP.

When $K_S > K_A$ and when $K_A = 1$ (Fig. 2, Panel A), the power of GRBP was as much as 35% higher than that of ESDP. However, as the difference between $K_S$ and $K_A$ decreased and sample size increased, ESDP performed better and better and, actually, it performed better than GRBP. For example, when $N = 40$, $K_A = 4$, and $K_S = 5$, the power of ESDP was 80% and the power of GRBP was 50.8% (Fig. 2, Panel B).

While GRBP performed better than ESDP in quite a few cases, it was still quite sensitive to the masking effect (Fig. 2, Panels C and D) as the difference between $K_S$ and $K_A$ decreased and sample size increased. For example, for $N = 30$, $K_A = 3$, and $K_S = 5$, $P_{high}$ […] $K_S > K_A$ was difficult and depends upon the difference between $K_S$ and $K_A$.

In addition to Tukey's exploratory procedure, other procedures to estimate $K_A$ are available [9] but require use of expected values of normal order statistics, which are easily available [10]. A statistic RDM that can be used to estimate the number of outliers [9] is briefly discussed as SI S5.
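The percentages reported in this section are proportions of simulated samples. A minimal sketch of that tally is given below, assuming each procedure has already produced a count of detected outliers for every sample that shares the same $K_A$; the function name and the dictionary keys are hypothetical.

```python
def detection_rates(detected_counts, k_a):
    """Percent of samples with exactly, fewer than, and more than k_a detections."""
    total = len(detected_counts)
    exact = sum(1 for d in detected_counts if d == k_a)
    fewer = sum(1 for d in detected_counts if d < k_a)
    more = total - exact - fewer
    return {
        "exact": 100.0 * exact / total,  # power as used in the text
        "fewer": 100.0 * fewer / total,  # under-detection, e.g. due to masking
        "more": 100.0 * more / total,    # over-detection, e.g. due to swamping
    }


# Five hypothetical samples that each truly contain three outliers.
rates = detection_rates([3, 3, 2, 4, 3], k_a=3)
```

Here `rates["exact"]` is 60.0 and the other two entries are 20.0 each.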

Discussion

Overall, ESDP works better than GRBP in all situations when $K_S \le K_A$. However, when $K_A = 1$, GRBP may be better. While the ESDP performs satisfactorily when $K_S > K_A$, its performance may be degraded when the difference between $K_S$ and $K_A$ is large. The impact of this can be minimized by having a "good" estimate of $K_S$. I found that Tukey's inner fences do not always work satisfactorily (data not shown). In my opinion, procedures for estimating (as compared to detecting) $K_S$ such as those given elsewhere [9] can provide better results.

The issue of what to do with outliers once they have been detected is complicated. It probably will depend upon the source of outliers. A good discussion is given by Barnett and Lewis [3] in Chapter 2. Finally, an example that demonstrates computations of ESD is presented as SI S6.

Fig. 2. (A) Probability of detecting the exact number of outliers for Grubbs' and ESD procedures when $2 \le K_S \le 5$ and $K_A = 1$. (B) Probability of detecting the exact number of outliers for Grubbs' and ESD procedures when $4 \le K_S \le 5$ and $2 \le K_A \le 4$. (C) Probability of detecting less than the exact number of outliers for Grubbs' and ESD procedures when $4 \le K_S \le 5$ and $2 \le K_A \le 4$. (D) Probability of detecting less than the exact number of outliers for Grubbs' and ESD procedures when $2 \le K_S \le 5$ and $K_A = 1$. The probabilities for Grubbs' procedure are displayed as dotted lines. The probabilities for the ESD procedure are displayed as solid lines.

