
Six

Robust Statistics for Location and Scale Parameters
6.1 WHY DO WE NEED ROBUST STATISTICS?
There may be outliers in the data.
Outliers are sample values that are considered very different from
the majority of the sample.
The data may depart from the underlying distribution assumptions.
6.2 WHAT IS A ROBUST STATISTIC?
A statistical method is robust if the statistic is insensitive to slight
departures from the assumptions that justify the use of the statistic.
We shall see some robust statistics for location and scale parameters
rather than going into the details.
The robustness of a statistic can be quantified by measures such as
the breakdown point, the influence curve and the gross error
sensitivity.
6.3 ROBUST LOCATION ESTIMATORS
Trimmed mean
Winsorized mean
Huber's M-estimator
Tukey's bisquare estimator
Hampel's M-estimator
Trimmed and Winsorized Means
Consider, for example,
X = (2, 3, 4, 6, 8, 10, 12, 14, 18, 27).
For this set of data,
Mean = 10.4;
20% trimmed mean = 9;
(discard the highest 20% and the lowest 20% of the observations, then
average the rest)
20% Winsorized mean = 9.
(a 20% Winsorized mean sets all data below the 20th percentile to the
20th percentile, and all data above the 80th percentile to the
80th percentile, before averaging.)
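The three figures above can be checked with a short Python sketch. It assumes the order-statistic convention: with n observations and proportion p, the k = floor(n p) smallest and k largest values are dropped (trimming) or replaced by their nearest retained neighbour (Winsorizing). Other percentile conventions may give slightly different answers on other samples. The function names are illustrative, not taken from any package.

```python
import math

def trimmed_mean(values, proportion):
    """Mean after dropping floor(n * proportion) values from each end."""
    data = sorted(values)
    k = math.floor(len(data) * proportion)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

def winsorized_mean(values, proportion):
    """Mean after pulling the k smallest/largest values in to the nearest kept value."""
    data = sorted(values)
    n = len(data)
    k = math.floor(n * proportion)
    low, high = data[k], data[n - k - 1]
    clipped = [min(max(v, low), high) for v in data]
    return sum(clipped) / n

x = [2, 3, 4, 6, 8, 10, 12, 14, 18, 27]
print(sum(x) / len(x))          # 10.4
print(trimmed_mean(x, 0.2))     # 9.0
print(winsorized_mean(x, 0.2))  # 9.0
```

Here k = 2, so the trimmed mean averages 4, 6, 8, 10, 12, 14, while the Winsorized mean replaces 2 and 3 by 4 and replaces 18 and 27 by 14.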
M-estimators for Location
Find $\mu$ which minimizes
$$\sum_{i=1}^{n} (y_i - \mu)^2.$$
The solution is:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}.$$
In general, we may find $\mu$ which minimizes
$$\sum_{i=1}^{n} \rho(y_i - \mu),$$
where $\rho$ is some meaningful function.
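To minimize, we differentiate with respect to $\mu$, equate the
derivative to 0 and solve the equation
$$\sum_{i=1}^{n} \rho'(y_i - \mu) = 0.$$
The M-estimator of the location parameter $\mu$ is defined as the
solution of the equation
$$\sum_{i=1}^{n} \psi(y_i - \mu) = 0,$$
for some function $\psi$; taking $\psi = \rho'$ recovers the
minimization problem.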
EXAMPLE 6.1 (SOME EXAMPLES)
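If $\psi(x) = x$, then solving
$$\sum_{i=1}^{n} (y_i - \mu) = 0$$
will give $\hat{\mu} = \bar{y}$.
If $\psi(x) = \operatorname{sign}(x)$, then solving
$$\sum_{i=1}^{n} \operatorname{sign}(y_i - \mu) = 0$$
will give $\hat{\mu} = y_{\text{median}}$.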
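As a quick numerical check of Example 6.1, the sketch below evaluates the estimating sum at the sample mean and at the sample median of the data set used earlier in this chapter. The helper names `sign` and `estimating_sum` are illustrative only.

```python
import statistics

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def estimating_sum(psi, values, mu):
    """Left-hand side of the M-estimating equation: sum of psi(y_i - mu)."""
    return sum(psi(y - mu) for y in values)

y = [2, 3, 4, 6, 8, 10, 12, 14, 18, 27]
mean_y = statistics.mean(y)      # 10.4
median_y = statistics.median(y)  # 9.0

# psi(x) = x: the sum vanishes (up to rounding) at the sample mean.
s_mean = estimating_sum(lambda r: r, y, mean_y)

# psi(x) = sign(x): the sum vanishes at the sample median.
s_median = estimating_sum(sign, y, median_y)

print(abs(s_mean) < 1e-9, s_median == 0)  # True True
```

With an even sample size the sign equation is solved by every value strictly between the two middle observations (here, between 8 and 10); the sample median is the conventional choice.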
Other M-estimators
Metrically trimmed mean:
$$\psi(x) = \begin{cases} x, & |x| < c; \\ 0, & \text{otherwise.} \end{cases}$$
Metrically Winsorized Mean (Huber):
$$\psi(x) = \begin{cases} -c, & x < -c; \\ x, & |x| \le c; \\ c, & x > c. \end{cases}$$
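One common way to solve the estimating equation for Huber's function is iterative reweighting. The sketch below makes assumptions that are not stated in the text: residuals are standardized by a fixed scale of 1.4826 times the median absolute deviation, the tuning constant defaults to c = 1.345, and the iteration starts from the median. The function names are illustrative.

```python
import statistics

def huber_psi(r, c):
    """Huber's psi: the identity on [-c, c], clipped to -c or c outside."""
    return max(-c, min(c, r))

def huber_location(values, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location with a fixed, MAD-based scale (assumed)."""
    med = statistics.median(values)
    scale = 1.4826 * statistics.median([abs(v - med) for v in values])
    if scale == 0:
        return med  # degenerate sample: fall back to the median
    mu = med
    for _ in range(max_iter):
        weights = []
        for v in values:
            r = (v - mu) / scale
            # Weight psi(r)/r: 1 inside [-c, c], smaller for far-out points.
            weights.append(1.0 if r == 0 else huber_psi(r, c) / r)
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

x = [2, 3, 4, 6, 8, 10, 12, 14, 18, 27]
mu_huber = huber_location(x)
print(statistics.median(x) < mu_huber < statistics.mean(x))  # True
```

For this sample only the value 27 is down-weighted, so the estimate falls between the median (9) and the mean (10.4).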
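Tukey's bisquare:
$$\psi(x) = x \left[ 1 - \left( \frac{x}{R} \right)^2 \right]_+^2,$$
where $[u]_+ = \max\{u, 0\}$. The usual choice $R = 4.685$ gives about
95% efficiency under the normal distribution.
Hampel's function:
$$\psi(x) = \operatorname{sign}(x) \begin{cases} |x|, & 0 < |x| < a; \\ a, & a \le |x| < b; \\ a \left( \dfrac{c - |x|}{c - b} \right), & b \le |x| < c; \\ 0, & |x| \ge c. \end{cases}$$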
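The two redescending functions can be written down directly. In this sketch Hampel's function is extended to negative arguments as an odd function (the piecewise definition is applied to |x| and multiplied by the sign of x), which is an assumption of the sketch. The breakpoints a = 2, b = 4, c = 8 in the usage lines are arbitrary illustrative values, not recommended defaults.

```python
def tukey_bisquare_psi(x, R=4.685):
    """Tukey's bisquare psi: x * [1 - (x/R)^2]_+^2, zero for |x| >= R."""
    u = max(1.0 - (x / R) ** 2, 0.0)
    return x * u ** 2

def hampel_psi(x, a, b, c):
    """Hampel's three-part redescending psi, assumed odd in x."""
    ax = abs(x)
    s = (x > 0) - (x < 0)  # sign of x
    if ax < a:
        return s * ax                      # linear part
    if ax < b:
        return s * a                       # constant part
    if ax < c:
        return s * a * (c - ax) / (c - b)  # linear descent to zero
    return 0.0                             # full rejection

print(tukey_bisquare_psi(5.0))          # 0.0, since 5 > R
print(hampel_psi(3.0, 2.0, 4.0, 8.0))   # 2.0
print(hampel_psi(-6.0, 2.0, 4.0, 8.0))  # -1.0
print(hampel_psi(9.0, 2.0, 4.0, 8.0))   # 0.0
```

Unlike Huber's function, both functions return to zero for large |x|, so sufficiently extreme observations have no influence on the estimate.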
6.4 ROBUST MEASURES OF SCALE PARAMETER
The sample standard deviation is a commonly used estimator of the
population scale parameter, $\sigma$.
However, it is sensitive to outliers and may not remain bounded
when a single data point is replaced by an arbitrary number.
With robust scale estimators, the estimates remain bounded even
when a portion of the data points are replaced by arbitrary numbers.
Interquartile Range (IQR)
IQR is defined as $\mathrm{IQR} = Q_3 - Q_1$, where $Q_1$ and $Q_3$ are
the first and third quartiles, respectively.
For a normal distribution, the standard deviation can be estimated
by dividing the interquartile range by 1.34898.
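A minimal Python sketch of the IQR-based estimate follows. Quartile conventions differ between packages; the `method="inclusive"` option of `statistics.quantiles` is chosen here because it gives 9 for the sample used in this chapter, the same value that R's `IQR(x)` reports. The helper name `iqr` is illustrative.

```python
import statistics

def iqr(values):
    """Interquartile range Q3 - Q1 using the 'inclusive' quartile rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

x = [2, 3, 4, 6, 8, 10, 12, 14, 18, 27]
iqr_x = iqr(x)
sigma_hat = iqr_x / 1.34898  # normal-consistent estimate of sigma

print(iqr_x)                # 9.0  (Q1 = 4.5, Q3 = 13.5)
print(round(sigma_hat, 2))  # 6.67
```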
Median Absolute Deviation (MAD)
Most popular robust estimator of scale.
$$\mathrm{MAD} = \operatorname{median}_i \left( \left| y_i - \operatorname{median}_j (y_j) \right| \right),$$
where the inner median, $\operatorname{median}_j (y_j)$, is the median of
the $n$ observations and the outer median, $\operatorname{median}_i$, is
the median of the $n$ absolute values of the deviations about the median.
For a normal distribution, $1.4826 \times \mathrm{MAD}$ can be used to
estimate the standard deviation $\sigma$.
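The definition translates almost line for line into Python. The sketch below also replaces the largest observation by a hypothetical gross error of 1,000,000 to illustrate that the MAD stays put while the sample standard deviation does not. The `constant` parameter is an illustrative convenience for the normal-consistency factor.

```python
import statistics

def mad(values, constant=1.0):
    """Median absolute deviation about the median, times an optional constant."""
    med = statistics.median(values)  # inner median
    return constant * statistics.median([abs(v - med) for v in values])

x = [2, 3, 4, 6, 8, 10, 12, 14, 18, 27]
print(mad(x))          # 5.0
print(mad(x, 1.4826))  # approximately 7.413, the estimate of sigma

# Hypothetical contamination: replace 27 by a gross error.
contaminated = x[:-1] + [1_000_000]
print(mad(contaminated))  # 5.0, unchanged
print(statistics.stdev(x) < statistics.stdev(contaminated))  # True
```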
Gini's Mean Difference
Gini's mean difference is defined as
$$G = \frac{1}{\binom{n}{2}} \sum_{i<j} |y_i - y_j|.$$
If the observations are from a normal distribution, then
$\sqrt{\pi}\, G / 2$ is an unbiased estimator of the standard
deviation $\sigma$.
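Gini's mean difference is the average absolute difference over all distinct pairs, which can be sketched with `itertools.combinations`. The function name is illustrative, and the quadratic-time loop is chosen for clarity rather than speed.

```python
import math
from itertools import combinations

def gini_mean_difference(values):
    """Average of |y_i - y_j| over all pairs i < j."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

x = [2, 3, 4, 6, 8, 10, 12, 14, 18, 27]
g = gini_mean_difference(x)
sigma_hat = math.sqrt(math.pi) * g / 2  # unbiased for sigma under normality

print(round(g, 3))          # 8.889  (400 / 45)
print(round(sigma_hat, 2))  # 7.88
```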
6.5 ROBUST ESTIMATORS: SOFTWARE
It is easy to obtain these estimators using SAS, R and SPSS.
Robust Estimators: SAS
data ex6_1;
input x@@;
datalines;
2 3 4 6 8 10 12 14 18 27
;
proc univariate data=ex6_1 robustscale trimmed = 0.2 winsorized = 0.2;
var x;
run;
Robust Estimators: R
> x <- c(2, 3, 4, 6, 8, 10, 12, 14, 18, 27)
> # Calculate 20% Trimmed Mean
> mean(x, trim=0.2)
[1] 9
> # Calculate MAD
> median(abs(x-median(x)))
[1] 5
> # Calculate estimate of sigma = 1.4826 * MAD
> mad(x)
[1] 7.413
> # Calculate Interquartile Range
> IQR(x)
[1] 9
Robust Estimators: SPSS
"Analyze" > "Descriptive Statistics" > "Explore ...".
Move the variable to the "Dependent list". Then click "Statistics" and
choose "M-estimator".
Click "Continue" and "OK".