2004, Vol. 19, No. 4, 571–578

DOI 10.1214/088342304000000503

© Institute of Mathematical Statistics, 2004

Comparing Variances and Other Measures of Dispersion

Dennis D. Boos and Cavell Brownie

Abstract. Testing hypotheses about variance parameters arises in contexts where uniformity is important and also in relation to checking assumptions as a preliminary to analysis of variance (ANOVA), dose-response modeling, discriminant analysis and so forth. In contrast to procedures for tests on means, tests for variances derived assuming normality of the parent populations are highly nonrobust to nonnormality. Procedures that aim to achieve robustness follow three types of strategies: (1) adjusting a normal-theory test procedure using an estimate of kurtosis, (2) carrying out an ANOVA on a spread variable computed for each observation and (3) using resampling of residuals to determine p values for a given statistic. We review these three approaches, comparing properties of procedures both in terms of the theoretical basis and by presenting examples. Equality of variances is first considered in the two-sample problem followed by the k-sample problem (one-way design).

Key words and phrases: Comparing variances, measures of spread, permutation method, resampling, resamples, variability.

Dennis D. Boos and Cavell Brownie are Professors, Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695, USA (e-mail: boos@stat.ncsu.edu; brownie@stat.ncsu.edu).

1. INTRODUCTION

Tests for equality of variances are of interest in a number of research areas. Increasing uniformity is an important objective in quality control of manufacturing processes (e.g., Nair and Pregibon, 1988; Carroll, 2003), in agricultural production systems (e.g., Fairfull, Crowber and Gowe, 1985) and in the development of educational methods (e.g., Games, Winkler and Probert, 1972). Biologists are interested in differences in the variability of populations for many reasons, for example, as an indicator of genetic diversity and in the study of mechanisms of adaptation.

Procedures for comparing variances are also used as a preliminary to standard analysis of variance, dose–response modeling or discriminant analysis. For example, SAS PROC TTEST presents the F test for equality of variances as a tool for choosing between the pooled variance t and the unequal variances Welch t. The POOL = TEST option in SAS PROC DISCRIM uses Bartlett’s test of equal covariance matrices to decide between fitting linear and quadratic discriminant functions (SAS Institute, 1999). Fifty years ago, the use of such nonrobust variance procedures before relatively robust means procedures prompted Box (1953, page 333) to comment, “To make the preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!” It is unfortunate that procedures condemned in 1953 are still in practice today, and Box’s comment is still relevant. On the other hand, more robust procedures for testing equality of variances are now available, although many are not easily implemented with commercial software. Our plan is to describe the development and properties of these more robust procedures and so encourage their use.

Testing equality of variances, or other measures of scale, is a fundamentally harder problem than comparing means or measures of location. There are two reasons for this. First, standard test statistics for mean comparisons (derived assuming normality) are naturally standardized to be robust to nonnormality via the central limit theorem. (Here we refer to robustness of
validity, that is, whether test procedures have approximately the correct level.) In contrast, normal-theory test statistics for comparing variances are not suitably standardized to be insensitive to nonnormality. Asymptotically, these statistics are not distribution-free, but depend on the kurtosis of the parent distributions. Second, for comparing means, a null hypothesis of identical populations is often appropriate, allowing the use of permutation methods that result in exact level-$\alpha$ test procedures for any type of distribution. For variance comparisons, a null hypothesis of identical populations rarely makes sense; at minimum, we usually want to allow mean differences in the populations. Given that it is necessary to adjust for unknown means or locations, permutation procedures do not provide exact, distribution-free tests for equality of variances, because after subtracting means, the residuals are not exchangeable.

There are three basic approaches that have been used to obtain procedures robust to nonnormality:

1. Adjust the normal theory test procedure using an estimate of kurtosis (Box and Andersen, 1955; Shoemaker, 2003).
2. Perform an analysis of variance (ANOVA) on a data set in which each observation is replaced by a scale variable such as the absolute deviation from the mean or median (Levene, 1960; Brown and Forsythe, 1974). A related procedure is to perform ANOVA on the jackknife pseudo-values of a scale quantity such as the log of the sample variance (Miller, 1968).
3. Use resampling to obtain p values for a given test statistic (Box and Andersen, 1955; Boos and Brownie, 1989).

Assuming normality leads naturally to tests about variance rather than to tests about other measures of dispersion. Approach 1 above is thus related to variance comparisons. The ANOVA and resampling methods, however, can be used with statistics that focus on other measures of dispersion. A reason for emphasizing variances is that variances (and standard deviations) are the most frequently used measures of dispersion and are building blocks in the formulation of many statistical models. On the other hand, test procedures that are based on alternative measures of scale, such as the mean absolute deviation from the median (MDM), can have superior Type I and Type II error properties. In fact, taking these properties into account, as well as simplicity of computation and interpretation, leads us to recommend approach 2, using ANOVA on the absolute deviation from the median as the best procedure. This was also the recommendation of the comprehensive review and Monte Carlo study of Conover, Johnson and Johnson (1981). In situations where power is more important than computational simplicity, we recommend a bootstrap version (approach 3) of this procedure (see Lim and Loh, 1996).

In Section 2 we briefly review the difference between asymptotic properties of normal-theory tests for variances and for means. Important ideas in the development of procedures robust to nonnormality using the three approaches above, as well as Monte Carlo estimates of performance, are reviewed in Section 3. Several specific methods for the two- and k-sample problems are described in more detail. Examples are given in Section 4, followed by concluding remarks in Section 5.

2. ASYMPTOTIC BEHAVIOR OF NORMAL-THEORY TESTS FOR VARIANCES

In the groundbreaking papers by Box (1953) and Box and Andersen (1955), a clear distinction was made between the Type I error robustness of tests on means and the nonrobustness of tests on variances. Box, in fact, coined the term “robustness” in his 1953 paper. We briefly outline the differences in robustness of the two types of tests in the two-sample and k-sample designs.

Our notation for the general k-sample problem is as follows. Let $X_{11}, \ldots, X_{1n_1}, \ldots, X_{k1}, \ldots, X_{kn_k}$ be $k$ independent samples, each sample being i.i.d. with distribution function $G_i(x)$, mean $\mu_i$ and variance $\sigma_i^2$, $i = 1, \ldots, k$. The sample means and variances are $\bar{X}_i = n_i^{-1} \sum_{j=1}^{n_i} X_{ij}$ and $s_i^2 = (n_i - 1)^{-1} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2$, respectively, and the pooled sample mean and variance are $\bar{X} = N^{-1} \sum_{i=1}^{k} n_i \bar{X}_i$ and $s_p^2 = (N - k)^{-1} \sum_{i=1}^{k} (n_i - 1) s_i^2$, respectively, where $N = n_1 + \cdots + n_k$.

We first consider the case of two samples and the pooled t statistic:

$$t_p = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2 (1/n_1 + 1/n_2)}}.$$

Under the null hypothesis $H_0: \mu_1 = \mu_2$ and assuming equal variances in the two populations, $t_p$ is asymptotically a standard normal random variable for any type of population distributions $G_1$ and $G_2$. Thus, if the two populations have equal variances, $t_p$ will be asymptotically distribution-free under $H_0$. Moreover, if variances
are not equal, $t_p$ can be replaced by the asymptotically distribution-free Welch t,

$$t_W = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}.$$

In contrast, consider the logarithm of the normal-theory test statistic $s_1^2/s_2^2$ for $H_0: \sigma_1^2 = \sigma_2^2$, suitably standardized by sample sizes:

$$T = \left(\frac{n_1 n_2}{n_1 + n_2}\right)^{1/2} [\log s_1^2 - \log s_2^2]. \tag{1}$$

The usual test is to compare $s_1^2/s_2^2$ to an F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. Asymptotically this test is equivalent to comparing $T$ to a $N(0, 2)$ distribution. However, if we assume a common underlying distribution function for both populations, $G_1(x) = F_0((x - \mu_1)/\sigma_1)$ and $G_2(x) = F_0((x - \mu_2)/\sigma_2)$, then under $H_0: \sigma_1^2 = \sigma_2^2$, $T$ converges in distribution to a $N(0, \beta_2(F_0) - 1)$ distribution, where $\beta_2(F_0) = \mu_4(F_0)/[\mu_2(F_0)]^2$ is the kurtosis of $F_0$ and $\mu_i(F_0) = \int [x - \int y \, dF_0(y)]^i \, dF_0(x)$ is the $i$th central moment of $F_0$. Some authors use the term “kurtosis” for $\gamma_2 = \beta_2 - 3$, but $\gamma_2$ is more properly called the coefficient of excess or kurtosis excess.

Location–scale population models such as $F_0((x - \mu)/\sigma)$ have the same kurtosis as the parent $F_0$. Normal distributions have kurtosis $\beta_2 = 3$ and thus the $N(0, \beta_2(F_0) - 1)$ distribution is the correct limiting $N(0, 2)$ distribution of (1) when the populations are normal. However, if the populations have kurtosis greater than 3, commonly observed with real data, comparison of $s_1^2/s_2^2$ to an F distribution is asymptotically equivalent to comparing a $N(0, \beta_2(F_0) - 1)$ random variable to a $N(0, 2)$ distribution. For example, the true asymptotic level of a nominal $\alpha = 0.05$ one-sided test would be

$$P\left(Z > \sqrt{\frac{2}{\beta_2(F_0) - 1}} \; 1.645\right),$$

where $Z$ is a standard normal random variable. Thus, if $\beta_2(F_0) = 5$, then the asymptotic level is 0.122. Table 1 of Box (1953) gave similar levels for two-sided tests for comparing two or more population variances. In fact, the problem is even worse when comparing more than two variances.

Assuming the populations are normally distributed, Bartlett’s modified likelihood ratio statistic for $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ in the k-sample problem is given by $B/C$, where

$$B = \sum_{i=1}^{k} (n_i - 1) \log \frac{s_p^2}{s_i^2}, \tag{2}$$

$$C = 1 + \frac{1}{3(k - 1)} \left[\sum_{i=1}^{k} \frac{1}{n_i - 1} - \frac{1}{N - k}\right]$$

and $C$ is a correction factor to speed convergence to a $\chi^2_{k-1}$ distribution.

Now let us relax the normal assumption and assume that each $G_i(x)$ is from the location–scale family generated by $F_0$, $G_i(x) = F_0((x - \mu_i)/\sigma_i)$. Then under $H_0$, $B$ and $B/C$ converge in distribution to $(1/2)[\beta_2(F_0) - 1]$ times a $\chi^2_{k-1}$ random variable. Thus, for a test at nominal level $\alpha$ based on $B/C$ with $\chi^2_{k-1}$ critical values, the asymptotic level will be

$$P\left(\chi^2_{k-1} > \frac{2}{\beta_2(F_0) - 1} \, \chi^2_{k-1}(1 - \alpha)\right),$$

where $\chi^2_{k-1}(1 - \alpha)$ refers to the upper $1 - \alpha$ quantile of the $\chi^2_{k-1}$ distribution. For example, when comparing $k = 5$ population variances, if $\beta_2(F_0) = 5$, the true asymptotic level of a nominal $\alpha = 0.05$ test is 0.315.

In contrast, the one-way ANOVA F statistic for means,

$$F = \frac{1}{k - 1} \sum_{i=1}^{k} n_i \frac{(\bar{X}_i - \bar{X})^2}{s_p^2}, \tag{3}$$

with $F(k - 1, N - k)$ critical values is asymptotically correct for all types of distributions $G_i$ under the null hypothesis of equal means and assuming equal variances. Box (1953) and others have pointed out that (3) is fairly robust to unequal variances in the case of equal sample sizes.

3. DEVELOPMENT OF TYPE I ERROR ROBUST METHODS

In this section we explain three basic strategies for constructing Type I error correct methods for comparing variances or other measures of dispersion. All three approaches originate from ideas in Box (1953) and in Box and Andersen (1955). For specific procedures, we note whether asymptotic validity holds and summarize results for small sample properties obtained via Monte Carlo simulations, often citing the large survey study performed by Conover, Johnson and Johnson (1981) as well as the more recent studies by Boos and Brownie (1989), Lim and Loh (1996) and Shoemaker (2003).

3.1 Kurtosis Adjustment of Normal-Theory Tests on Variances

Box (1953, page 330) commented that an “internal” estimate of the variation of the sample variances
$s_i^2$ should be used to reduce dependency of a normal-theory test statistic on kurtosis. In that paper, Box proposed dividing each sample into subsets, computing the sample variance $s_{ij}^2$ for each subset and using (3) on the $\log s_{ij}^2$. This ad hoc approach actually inspired the strategies of Section 3.2. The idea of an internal estimate of the variation also led to the proposal in Box and Andersen (1955) to divide the statistic $B$ of (2) by a consistent estimate, based on sample cumulants, of $(1/2)[\beta_2(F_0) - 1]$. Layard (1973) proposed the less biased kurtosis estimator

$$\hat{\beta}_2 = \frac{N \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^4}{\left[\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2\right]^2}. \tag{4}$$

Conover, Johnson and Johnson (1981) used Layard’s $\hat{\beta}_2$ to “correct” Bartlett’s statistic. Thus they compared $B/[C(1/2)(\hat{\beta}_2 - 1)]$ to percentiles of the $\chi^2_{k-1}$ distribution and referred to the resulting procedure as Bar2. In their Monte Carlo study of 56 procedures, Conover, Johnson and Johnson (1981) found that Bar2 holds its level well for symmetric distributions, but is liberal for skewed distributions. For example, for the square of a double exponential distribution with $k = 4$ and $n_1 = n_2 = n_3 = n_4 = 10$, Conover, Johnson and Johnson (1981, Table 6) reported that the estimated level of Bar2 for nominal $\alpha = 0.05$ is 0.17. Boos and Brownie (1989) found that Bar2 lost power when compared to bootstrapping of Bartlett’s $B/C$, especially for large $k$, say $k \ge 16$.

Shoemaker (2003) proposed several approximate F tests for the two-sample case by matching moments of $\log(s_1^2/s_2^2)$ with the moments of the logarithm of an F random variable. The estimated degrees of freedom use Layard’s estimate of kurtosis (4). For $k > 2$, Shoemaker proposed using a sum of squares on $\log s_i^2$, standardized using (4), with $\chi^2_{k-1}$ percentiles. Shoemaker’s simulations suggest that, similar to Bar2, the procedures hold their level fairly well for symmetric distributions, but can be liberal for skewed distributions.

3.2 ANOVA on Scale Variables

Levene (1960) proposed using the one-way ANOVA F statistic (3) on new variables $Y_{ij} = |X_{ij} - \bar{X}_i|$ or, more generally, $Y_{ij} = g(|X_{ij} - \bar{X}_i|)$, where $g$ is monotonically increasing on $(0, \infty)$. Miller (1968) showed that ANOVA on Levene’s variables $|X_{ij} - \bar{X}_i|$ will be asymptotically incorrect if the population means are not equal to the population medians (essentially requiring symmetry) and that the problem can be corrected by using medians instead of means to center the variables. Brown and Forsythe (1974) and Conover, Johnson and Johnson (1981) studied the small sample properties of ANOVA on $Y_{ij} = |X_{ij} - M_i|$, where $M_i$ is the sample median for the $i$th sample. Conover, Johnson and Johnson (1981) found that this procedure, referred to as Lev1:med, had satisfactory Type I and Type II error properties for a variety of distributions, although more recent simulation studies have shown that Lev1:med can be conservative with corresponding loss of power (Lim and Loh, 1996; Shoemaker, 2003). Conover, Johnson and Johnson (1981) also reported good properties for ANOVA on normal scores of the ranks of the $|X_{ij} - M_i|$. The asymptotic validity of this rank ANOVA has not been demonstrated, however, and O’Brien (1992) reported loss of power of the rank ANOVA compared to Lev1:med. Other spread variables have been proposed with the goal of producing F tests with good Type I error properties (e.g., O’Brien, 1978, 1979).

Miller (1968) provided the basis for an asymptotically correct approach to develop scale variables to which the ANOVA F test can be applied. The method is based on the jackknife and was prompted by Box’s ANOVA on $\log s_{ij}^2$, the $s_{ij}^2$ being based on splitting each sample into two or more subsets. Miller applied his jackknife approach to $\log s_i^2$, which is not a scale estimator, but rather a monotone function of the scale estimator $s_i$. The F statistic (3) is calculated on the jackknife pseudovalues

$$U_{ij} = n_i \log s_i^2 - (n_i - 1) \log s_{i(j)}^2,$$

where $s_{i(j)}^2$ is the sample variance in the $i$th sample with $X_{ij}$ left out. Miller (1968) proved that the sample variances of the $U_{ij}$ converge in probability to $\beta_2 - 1$ and that the means of the pseudovalues in the $i$th sample are asymptotically normal with mean $\log \sigma_i^2$ and variance $(\beta_2 - 1)/n_i$. Together these facts give the asymptotic correctness of (3) applied to the $U_{ij}$. In small samples, especially with unequal $n_i$, the procedure does not hold its level as well as ANOVA on the Levene-type variables $Y_{ij} = |X_{ij} - M_i|$ (e.g., Conover, Johnson and Johnson, 1981; Boos and Brownie, 1989).

The beauty of the jackknife approach, however, is that it can be applied to any chosen scale estimator for which the jackknife variance estimate is appropriate. For example, Gini’s mean difference

$$\binom{n_i}{2}^{-1} \sum_{j<k} |X_{ij} - X_{ik}| \tag{5}$$
is a measure of scale that could be used in Miller’s jackknife-plus-ANOVA method. Although Gini’s mean difference is not fully robust, it has robustness properties similar to those of the mean absolute deviation from the median. Moreover, as a U statistic it is unbiased for any sample size and, therefore, the mean of the pseudovalues for each sample is exactly equal to (5).

Procedures based on nonrobust scale quantities such as $s_i^2$, $\log s_i^2$ or $\mathrm{MDM}_i$ are surprisingly powerful for heavy-tailed distributions. However, in the face of extremely large outliers, say generated from the Cauchy distribution, one might want to consider more robust scale estimators such as those based on trimmed averages of the $|X_{ij} - X_{ik}|$ (e.g., GLG of Boos and Brownie, 1989, page 73) or single ordered values ($Q_n$ of Rousseeuw and Croux, 1993). For these more robust scale estimators, the resampling methods of the next subsection are more appropriate because the corresponding jackknife variance estimates are deficient in small samples.

3.3 Resampling Methods

Box and Andersen (1955, page 16) introduced the permutation version of the two-sample test of $H_0: \sigma_1^2 = \sigma_2^2$ with known means based on $F = n_2 \sum_{i=1}^{n_1} (X_{1i} - \mu_1)^2 / n_1 \sum_{i=1}^{n_2} (X_{2i} - \mu_2)^2$. They then transformed to a beta random variable and gave an F approximation to the permutation distribution involving the permutation moments that in turn are functions of the sample second and fourth moments. Estimating the means then led to an approximate F distribution (with estimated degrees of freedom) for the standard test statistic $F = s_1^2/s_2^2$. Thus, although motivated by the permutation approach, this method correctly belongs in Section 3.1.

Boos and Brownie (1989) implemented the permutation approach implied by Box and Andersen (1955). That is, they used the permutation distribution based on drawing samples without replacement from

$$S = \{e_{ij} = X_{ij} - \hat{\mu}_i, \; j = 1, \ldots, n_i, \; i = 1, \ldots, k\}, \tag{6}$$

where the $\hat{\mu}_i$ are location estimates such as the sample mean or trimmed mean. They implemented the approach in the two-sample case for $F = s_1^2/s_2^2$ and for a ratio of robust scale estimators. Note that because the residuals $X_{ij} - \hat{\mu}_i$ from different samples are not exchangeable, such a permutation procedure is not exact, but Boos, Janssen and Veraverbeke (1989) proved that the procedure is asymptotically correct for a large class of U statistics.

Since the permutation method based on $S$ is not exact, it makes sense to consider sampling with replacement from $S$, in other words, to bootstrap resample from the residuals $X_{ij} - \hat{\mu}_i$. Formally, the procedure is to generate $B$ sets of resamples from $S$ and compute the statistic of interest $T$ for each set of resamples, resulting in $T_1^*, \ldots, T_B^*$. Then $\hat{p}_B = \{\#\text{ of } T_i^* \ge T_0\}/B$ is an estimated bootstrap p value, where $T_0$ is the statistic calculated for the original sample. It might be more intuitive to think of generating residuals $e_{ij}^*$ and then adding the location estimates $\hat{\mu}_i$, resulting in a bootstrap set of resamples ($X_{ij}^* = \hat{\mu}_i + e_{ij}^*$, $j = 1, \ldots, n_i$, $i = 1, \ldots, k$), as is the case for bootstrapping in regression settings. However, test statistics for comparing scale are invariant to location shifts, so that adding the $\hat{\mu}_i$ back in is not necessary. Note also that drawing each sample from (6) produces a null situation of equal variability (and equal kurtosis, etc.) that is crucial for the bootstrap to work correctly. We mention this because in applications of the bootstrap to obtain standard errors and confidence intervals, it is more usual to resample separately from the original individual samples.

Monte Carlo results in Tables 1–3 of Boos and Brownie (1989) show minor differences between the Box–Andersen approach with estimated degrees of freedom, the permutation approach based on $S$ and the bootstrap approach based on $S$. Generally, though, the bootstrap seemed to be the best procedure and so only bootstrap resampling was studied for the $k > 2$ cases. In those cases, the bootstrap approach applied to Bartlett’s $B/C$ statistic in (2) was comparable to the kurtosis adjusted version of $B/C$ (called Bar2) in terms of Type I error, but the bootstrap had an advantage in terms of power. In fact, bootstrapping can be used to improve the Type I error (and possibly Type II error) properties of any reasonable procedure. Thus, Lim and Loh (1996) found that bootstrapping Lev1:med results in a procedure that is less conservative and has better power than using F percentiles to obtain critical values. It is also worth noting that bootstrapping a Studentized statistic such as Lev1:med produces better Type I error rates compared to bootstrapping an unstudentized statistic like $B/C$, due to improved theoretical convergence rates (see, e.g., Hall, 1992, page 150).

3.4 The Case Against Linear Rank Tests

Rank tests are noticeably absent from the robust methods reviewed. Tests for dispersion based on linear rank statistics do not compare favorably with other robust methods for the following reasons:
1. Linear rank statistics are not typically asymptotically distribution-free under the null of equal scales (i.e., do not hold the correct level in large samples) unless the medians are equal or are known and can be subtracted (rarely the case, but of course then permutation methods give exact tests), or the medians are estimated and subtracted before performing the rank tests and the underlying populations are symmetric (e.g., Boos, 1986, Theorem 1).
2. Even if the tests hold their level, Klotz (1962) found that small sample performance of linear rank tests did not match expectations based on their asymptotic relative efficiency (ARE) values. For example, the normal scores test has ARE = 1 compared to the F test for normal data. For two samples at $n_1 = n_2$ and $\alpha = 0.0635$, he obtained small sample efficiencies from 0.803 to 0.640 as the ratio of scale parameters ranged from 1.5 to 4. The most powerful rank test did not perform much better, having efficiencies from 0.810 to 0.688. Thus Klotz states, “It appears that the loss in efficiency is inherent in the use of ranks for small samples with a rather high price being paid for the insurance obtained with rank statistics.”

4. EXAMPLE

To facilitate the study of pheromone blend composition in two interbreeding moth species (Heliothis subflexa and Heliothis virescens), Groot et al. (2005) investigated the use of pheromone biosynthesis activating neuropeptide (PBAN) to stimulate production of pheromone in mated females. Total production (and blend composition) was compared for virgin and mated PBAN-injected females in each of two seasons for both species. One of the interesting (and unexpected) findings was that variation in pheromone amounts among individuals appeared to be smaller for the mated PBAN-injected females than for virgin females. We use these data to illustrate a number of the tests for variance equality described in Section 3.

Measurements of the total amount of pheromone (nanograms) produced by females in each of eight groups are displayed graphically in Figure 1. Table 1 gives summary statistics for each group. The groups are arranged by species (Hs, Hv), season (1, 2) and PBAN (0 = virgin female; 1 = mated PBAN-injected female). The data tend to be right-skewed with evidence of variance heterogeneity. Before carrying out an ANOVA to compare means, common practice would be to log-transform or square-root-transform the data. When interest is in variability, rather than mean response, there is less justification for employing a “variance stabilizing” transformation. We therefore carried out tests for differences in variation, both on the original scale and on square-root-transformed data (a log-transformation produced negative skewness).

FIG. 1. Box plots of total pheromone for eight treatment groups: Hsij refers to the Heliothis subflexa in season i = 1 or i = 2 and j = 0 (virgin) or j = 1 (injected). Notation for the four Heliothis virescens groups is similar.

Table 2 gives the p values for the following comparisons:

1. Bartlett’s statistic $B/C$ from (2) compared to $\chi^2_{k-1}$ critical values (Bartlett $\chi^2$).
2. Statistic $B/C$ divided by $(1/2)[\hat{\beta}_2 - 1]$ compared to $\chi^2_{k-1}$ critical values (Bar2 $\chi^2$).
3. Shoemaker’s $X^2 = \sum_{i=1}^{k} (n_i - 1)(Z_i - \bar{Z})^2 / (\hat{\beta}_2 - (n_i - 3)/n_i)$, where $Z_i = \log s_i^2$, compared to $\chi^2_{k-1}$ critical values (Shoe $X^2$).
4. The ANOVA F (3) on the $|X_{ij} - M_i|$ compared to F critical values (Lev1:med F).
5. Statistic $B/C$ compared to bootstrap critical values with $B = 9999$ resamples using fractional 20% trimmed means in (6) (Bar boot trim).
6. The ANOVA F (3) on the $|X_{ij} - M_i|$ compared to bootstrap critical values (Lev1:med boot trim).

TABLE 1
Summary statistics for pheromone production

| Statistic | Hs10 | Hs11 | Hs20 | Hs21 | Hv10 | Hv11 | Hv20 | Hv21 |
|---|---|---|---|---|---|---|---|---|
| Sample size | 13 | 23 | 13 | 14 | 7 | 9 | 18 | 14 |
| Mean | 414 | 201 | 166 | 231 | 196 | 260 | 317 | 279 |
| Std. dev. | 261 | 101 | 149 | 106 | 116 | 112 | 194 | 130 |
| MDM | 175 | 82 | 103 | 85 | 91 | 95 | 147 | 107 |

NOTE. MDM is the mean absolute deviation from the median.

TABLE 2
p values for variance equality among groups in the pheromone study

| Test | 1 vs. 2 | All groups | PBAN |
|---|---|---|---|
| Bartlett $\chi^2$ | 0.000 | 0.002 | |
| Bar2 $\chi^2$ | 0.056 | 0.306 | |
| Shoe $X^2$ | 0.049 | 0.390 | |
| Lev1:med F | 0.039 | 0.192 | 0.058 |
| Bar boot trim | 0.028 | 0.144 | |
| Lev1:med boot trim | 0.021 | 0.143 | |

NOTE. PBAN: main effect contrast.

To illustrate two-sample tests we consider only groups 1 and 2, and report the two-tailed p values in column 1 of Table 2. The Bartlett test is very significant, but Layard’s $\hat{\beta}_2$ in (4) is 8.7, suggesting that the Bartlett test cannot be trusted. Each of the five robust tests suggests that variation is lower in the PBAN-injected females, with the bootstrapped Bartlett and Lev1:med having lower p values than Shoemaker’s $X^2$ and Bar2.

Equality of variances for the eight groups was tested, ignoring the $2^3$ factorial structure, to illustrate the k-group analysis. Layard’s estimate of $\beta_2$ for the eight groups was 6.5. Results for the tests appear in column 2 of Table 2, and ignoring Bartlett $\chi^2$, only the bootstrap tests are close to significance, agreeing with Monte Carlo results that show greater power compared to procedures that utilize an explicit correction for kurtosis excess.

With the exception of the bootstrap procedures, test statistics and p values can be obtained, with a little effort, using standard software such as SAS (SAS Institute, 1999). (PROC MEANS followed by a data step can be used to compute absolute deviations from medians, and also second and fourth central sample moments.) The Lev1:med F is produced directly for $k \ge 2$ groups using the MEANS/HOVTEST = BF option in SAS PROC GLM. The bootstrap procedures are not available in any commercial software that we know of.

When the same analyses are carried out on square-root-transformed values (not shown), an indication of increased uniformity among mated PBAN-injected females is provided by the bootstrapped Bartlett test for groups 1 and 2 (p value = 0.098) and by the PBAN main effect contrast (Lev1:med, p value = 0.074).

5. CONCLUDING REMARKS

We have presented the three main approaches for comparing measures of scale or spread with proper Type I error control under a range of distributional types. The computationally simplest Type I error robust procedure is based on comparing means of the scale variable $Y_{ij} = |X_{ij} - M_i|$. This appealing approach (Lev1:med) is based on an efficient scale estimator (the mean absolute deviation from the median) and can be generalized to factorial designs and multivariate data.

A number of Monte Carlo studies have found, however, that in the k-sample problem Lev1:med has significance levels below the nominal level, and especially low if the sample sizes are small and odd. Lim and Loh (1996) demonstrated that this conservatism can be eliminated by using the bootstrap of Section 3.3, but then the computational simplicity is lost. Shoemaker (2003) argued persuasively for kurtosis-adjusted normal-theory methods in situations where the distributions are approximately symmetric. Simulations by Shoemaker (2003) and Lim and Loh (1996) showed good power properties for these methods when $k \le 4$. However, our own simulations showed that for larger $k$ ($k = 8$ not shown; $k = 16, 18$ in Boos and Brownie, 1989, Table 6), these methods lack power compared to Lev1:med (even with the more conservative F percentiles) and bootstrapping Bartlett’s statistic.

If computational simplicity is not important, we agree with Lim and Loh’s (1996) recommendation for the use of Lev1:med with bootstrap critical values.

Finally, to address the question of whether PBAN Moreover, once a decision is made to bootstrap, there

increased uniformity of response, Lev1:med was used is incentive to consider a statistic based on one of the

to test the main effect contrast for PBAN (easily more robust scale estimators mentioned at the end of

computed by applying PROC GLM, followed by a Section 3.2.

CONTRAST statement, to the |Xij − Mi | values). The

third column p value of 0.058 is consistent with the ACKNOWLEDGMENT

results for groups 1 and 2, and suggests that variation

in the amount of pheromone produced is lower among We thank Astrid Groot, Department of Entomology,

mated PBAN-injected females compared to virgin fe- North Carolina State University, for permission to use

males. the data in the example.
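As a supplementary illustration of the Lev1:med procedure described in the concluding remarks, the following Python sketch computes the one-way ANOVA F statistic on the absolute deviations from group medians, and then a bootstrap p value for it. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names `lev1_med_f` and `bootstrap_p_value`, the toy data, and in particular the resampling scheme (drawing with replacement from the pooled median-centered residuals) are assumptions made for illustration; the bootstrap of Section 3.3 may differ in its centering and in how critical values are obtained.

```python
import random
from statistics import mean, median


def lev1_med_f(groups):
    """One-way ANOVA F statistic computed on Y_ij = |X_ij - M_i|.

    groups: list of k >= 2 lists of numbers; M_i is the median of group i.
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    # Scale variable: absolute deviation of each value from its group median.
    y = [[abs(x - median(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in y)
    group_means = [mean(g) for g in y]
    grand_mean = sum(sum(g) for g in y) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(y, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(y, group_means) for v in g)
    df_between, df_within = k - 1, n_total - k
    if ss_within == 0:
        # Degenerate case: no within-group variation in the Y values.
        f_stat = float("inf") if ss_between > 0 else 0.0
    else:
        f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bootstrap_p_value(groups, n_boot=999, seed=0):
    """Bootstrap p value for the F statistic (assumed resampling scheme)."""
    rng = random.Random(seed)
    f_obs, _, _ = lev1_med_f(groups)
    # Assumption: pool the median-centered residuals so that every
    # resampled group is drawn from a population with a common spread.
    pooled = [x - median(g) for g in groups for x in g]
    exceed = 0
    for _ in range(n_boot):
        resample = [rng.choices(pooled, k=len(g)) for g in groups]
        f_star, _, _ = lev1_med_f(resample)
        if f_star >= f_obs:
            exceed += 1
    return exceed / n_boot


if __name__ == "__main__":
    data = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]  # made-up illustrative data
    f_stat, df1, df2 = lev1_med_f(data)
    print(f"F = {f_stat:.4f} on ({df1}, {df2}) df")
    print("bootstrap p value:", bootstrap_p_value(data))
```

The F statistic would ordinarily be referred to an F distribution with the returned degrees of freedom; the bootstrap function instead compares it with the statistics from the resampled data, which is where the computational simplicity is given up.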


**REFERENCES**

Boos, D. D. (1986). Comparing k populations with linear rank statistics. J. Amer. Statist. Assoc. 81 1018–1025.

Boos, D. D. and Brownie, C. (1989). Bootstrap methods for testing homogeneity of variances. Technometrics 31 69–82.

Boos, D. D., Janssen, P. and Veraverbeke, N. (1989). Resampling from centered data in the two-sample problem. J. Statist. Plann. Inference 21 327–345.

Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika 40 318–335.

Box, G. E. P. and Andersen, S. L. (1955). Permutation theory in the derivation of robust criteria and the study of departures from assumption (with discussion). J. Roy. Statist. Soc. Ser. B 17 1–34.

Brown, M. B. and Forsythe, A. B. (1974). Robust tests for the equality of variances. J. Amer. Statist. Assoc. 69 364–367.

Carroll, R. J. (2003). Variances are not always nuisance parameters. Biometrics 59 211–220.

Conover, W. J., Johnson, M. E. and Johnson, M. M. (1981). A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics 23 351–361.

Fairfull, R. W., Crober, D. C. and Gowe, R. S. (1985). Effects of comb dubbing on the performance of laying stocks. Poultry Science 64 434–439.

Games, P. A., Winkler, H. B. and Probert, D. A. (1972). Robust tests for homogeneity of variance. Educational and Psychological Measurement 32 887–909.

Groot, A., Fan, Y., Brownie, C., Jurenka, R. A., Gould, F. and Schal, C. (2005). Effect of PBAN on pheromone production by mated Heliothis virescens and Heliothis subflexa females. J. Chemical Ecology 31 15–28.

Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.

Klotz, J. (1962). Nonparametric tests for scale. Ann. Math. Statist. 33 498–512.

Layard, M. W. J. (1973). Robust large-sample tests for homogeneity of variances. J. Amer. Statist. Assoc. 68 195–198.

Levene, H. (1960). Robust tests for equality of variances. In Contributions to Probability and Statistics (I. Olkin, ed.) 278–292. Stanford Univ. Press, Stanford, CA.

Lim, T.-S. and Loh, W.-Y. (1996). A comparison of tests of equality of variances. Comput. Statist. Data Anal. 22 287–301.

Miller, R. G. (1968). Jackknifing variances. Ann. Math. Statist. 39 567–582.

Nair, V. and Pregibon, D. (1988). Analyzing dispersion effects from replicated factorial experiments. Technometrics 30 247–257.

O’Brien, P. C. (1992). Robust procedures for testing equality of covariance matrices. Biometrics 48 819–827.

O’Brien, R. G. (1978). Robust techniques for testing heterogeneity of variance effects in factorial designs. Psychometrika 43 327–342.

O’Brien, R. G. (1979). A general ANOVA method for robust tests of additive models for variances. J. Amer. Statist. Assoc. 74 877–880.

Rousseeuw, P. J. and Croux, C. (1993). Alternatives to the median absolute deviation. J. Amer. Statist. Assoc. 88 1273–1283.

SAS Institute Inc. (1999). SAS online doc, version 8. SAS Institute Inc., Cary, NC.

Shoemaker, L. H. (2003). Fixing the F test for equal variances. Amer. Statist. 57 105–114.
