INT. J. MATH. EDUC. SCI. TECHNOL., 1985, VOL. 16, NO. 1, 1-9

Analysis of variance and analysing variation

by DONNA F. STROUP and MARY M. WHITESIDE


Department of General Business, University of Texas,
Austin, Texas 78712, U.S.A.

(Received 31 August 1983)

Easily accessible tests for homogeneity of variance are compared to ANOVA
procedures for the problem of deciding whether significant differences exist
among independent samples. Frequently an analysis of variance is performed to
detect such differences assuming the data are similar in shape and variability.
This paper argues that analysis of variance performed on the data transformed to
the squares of the corresponding ranks is preferable for detecting general
differences in distributions. In business and economic decision-making, the
assumptions of similar shape and variability are rarely justified, but ironically the
question of interest is usually that of general differences.

1. Introduction
Frequently, a manager must decide whether there is a difference among several
products or processes. For example: what are the regional differences in sales of a
newly developed caffeine-free soft drink? Or, are managers trained in transcendental
meditation more effective than those adhering to Theory Z? The usual
procedure to address such questions is an analysis of variance (ANOVA), testing for
equality of means. This paper presents a more appropriate procedure for a more
appropriate question. That is, rather than restrict our investigation to differences in
means, are we not interested in general differences between the products or
processes? In other words, are we not also interested in any differences due to variability or
shape characteristics? The results of ANOVA are influenced by these other types of
differences.
In particular, it is often important to test the hypothesis of common variance
when analysing data from several independent samples. Sometimes the central
question is that of common variance itself, while at other times, the assumption of
common variance is necessary for subsequent analysis. An example from the field of
finance illustrates the former situation. A measure of risk for a particular security or
portfolio is obtained by regressing the return of the security against the return of the
market, usually measured by Standard and Poor's 500 Composite Index. The
measure of risk is the beta value obtained from the ordinary least-squares fit for the
regression model. However, in assessing overall risk for several portfolios, it is
clearly important to compare variability of observed returns about predicted values
in addition to beta values. Examples of the latter situation, statistical analyses that
assume common variance, include the classical analysis of variance and empirical
Bayes estimation of population means.
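To make the finance illustration concrete, the following Python sketch fits the regression by ordinary least squares and reports both the beta value and the variability of observed returns about predicted values for each portfolio. The market series, the portfolio labels, and the 60-observation horizon are invented for illustration; they are not data from this study.

import numpy as np

rng = np.random.default_rng(0)
market = rng.normal(0.01, 0.04, size=60)    # hypothetical monthly market returns

def beta_and_residual_variance(portfolio, market):
    # OLS fit of portfolio = alpha + beta * market + error;
    # returns the beta value and the residual variance.
    X = np.column_stack([np.ones_like(market), market])
    coef, *_ = np.linalg.lstsq(X, portfolio, rcond=None)
    residuals = portfolio - X @ coef
    return coef[1], residuals.var(ddof=2)   # ddof = 2: two estimated coefficients

# Two hypothetical portfolios with similar betas but different residual variability.
portfolios = {"A": 0.8 * market + rng.normal(0.0, 0.02, size=60),
              "B": 0.8 * market + rng.normal(0.0, 0.05, size=60)}
for name, returns in portfolios.items():
    beta, s2 = beta_and_residual_variance(returns, market)
    print(f"portfolio {name}: beta = {beta:.2f}, residual variance = {s2:.5f}")

Comparing the two residual variances is precisely the kind of variance-comparison question addressed in the remainder of the paper.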
In spite of the importance of testing the hypothesis of homogeneity of variance,
there is no generally satisfactory test statistic. The very existence of more than 50
alternatives, thoroughly investigated by Conover et al. [6] in the Youden Prize-winning
article for best expository paper, attests that none is appropriate for all
cases. A particularly difficult problem is test selection when dealing with small
samples from non-normal distributions that may differ in both location and shape.
This paper presents two useful results. First, an analysis of variance on the
squared ranks of the data is proposed as a procedure for finding general
differences in data (not restricted to differences in means). Secondly, we demonstrate
that under certain assumptions some of the most readily accessible tests of
homogeneity of variance are most powerful for the hypothesis of identical
distribution versus an alternative of differences in both mean and variance.

2. Background
The general k-sample problem can be posed as follows: given k independent
samples, can we infer that they come from different populations? That is, can we
reject the hypothesis that their distribution functions are identical? How one tests
this hypothesis depends upon the statement of the alternative hypothesis and
assumptions made about the distributions. If the distribution functions are normal
with common variance, the alternative is different means, and the classical
ANOVA F is the appropriate test statistic. Even if the distribution functions are
non-normal, if sample sizes are sufficiently large and homogeneity of variance holds,
then ANOVA F is still indicated.
When sample sizes are relatively small, and variances are equal but the
distributions are non-normal, then the Kruskal-Wallis statistic (or equivalently,
ANOVA F on the rank transformed data) is indicated. Hodges and Lehmann [9]
showed that if distributions all have the same shape and scale and differ only with
respect to location, the asymptotic relative efficiency of the Kruskal-Wallis test with
respect to ANOVA F is bounded below by 0.864, equals 3/π ≈ 0.955 for normal
distributions, and can be arbitrarily large. Extensive results for the rank
transform have been presented by Conover and Iman [4, 5, 10, 11] and others. This
brings us to the case where sample size is small and variances are not necessarily
equal. For this case neither ANOVA F nor Kruskal-Wallis is clearly appropriate. At
this point, a powerful test of the assumption of common variance becomes
important. If homogeneity is erroneously assumed, subsequent conclusions about
the identity of the means are also likely to be invalid. On the other hand, if the
assumption of common variance is rejected, the k-sample problem has been
answered. Thus, the first research question is: 'What is the best test of common
variance for relatively small samples from unknown distributions?' This question
has been specifically addressed by Conover et al. [6], Games et al. [7], Gartside [8],
and Layard [12].
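As a brief illustration of the rank-transform idea mentioned above, the following sketch ranks the pooled observations and runs an ordinary ANOVA F on the ranks, alongside the Kruskal-Wallis test on the raw data. SciPy's rankdata, f_oneway and kruskal routines and the exponential samples are assumptions of the sketch, not part of the original study.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
samples = [rng.exponential(scale=s, size=10) for s in (1.0, 1.0, 1.5)]   # k = 3 small samples

# Rank the pooled observations, then split the ranks back into the k groups.
ranks = stats.rankdata(np.concatenate(samples))
groups = np.split(ranks, np.cumsum([len(s) for s in samples])[:-1])

F, p_rank_anova = stats.f_oneway(*groups)   # ANOVA F on the rank-transformed data
H, p_kruskal = stats.kruskal(*samples)      # Kruskal-Wallis on the raw data
print(f"rank ANOVA:     F = {F:.2f}, p = {p_rank_anova:.3f}")
print(f"Kruskal-Wallis: H = {H:.2f}, p = {p_kruskal:.3f}")

The two p-values differ slightly because F is referred to an F distribution while H is referred to a chi-squared distribution, but F on the ranks is a monotone function of H, so the two procedures order the evidence in the same way.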
Carter and Whiteside [3] examined the effectiveness, for small samples from non-normal
distributions, of the three tests readily available to the business and economic
practitioner in the Statistical Package for the Social Sciences (SPSS) sub-routine
ONEWAY. That study showed little difference among Bartlett's,
Cochran's and Hartley's tests for a given distribution, but considerable variation in
power and robustness for all three tests from one distribution to another. This
variation, a function of the kurtosis of the distribution, is consistent with earlier
findings [7]. These tests are likely to be the most frequently used among alternative
tests for the common variance assumption. Analysis-of-variance-type procedures can
provide another class of tests for the general k-sample problem that are
readily available in the most popular statistical packages.

A recent investigation by Conover et al. [6] divides tests of homogeneity into four
classes:
(1) Tests that are classically based on an estimate of sampling fluctuation
assuming normality (e.g. Bartlett's, Cochran's and Hartley's tests for
equality of variance).
(2) Tests that attempt to estimate kurtosis (e.g. Box's modification of Bartlett's
maximum likelihood test).
(3) Tests based on a modification of the F-test for means (e.g. ANOVA on
location-adjusted data).
(4) Linear rank tests (e.g. tests based on ranks of location-adjusted data).
As superior selections in terms of robustness and power, the authors recommend an
ANOVA on median-adjusted data and a modification of the Fligner-Killeen test also
using ranks of median-adjusted data. The authors have also considered a squared-
ranks test on mean-adjusted data [6].
This paper examines an analysis of variance on raw data, rank transformed data,
and squared rank data in addition to Bartlett's, Cochran's and Hartley's tests as
alternative procedures for the k-sample problem where, under the alternative
hypothesis, populations differ by both location and scale.

3. Methodology
In actual application where the question is one of general differences, it is
inappropriate to adjust data for location. None of these earlier studies considered the
implications of this since interest centred on variance. Table 1 defines the six tests
investigated in this study. The first four (B, C, H, A) were chosen because of their
accessibility as part of most available software packages. The tests A, AR, and AR²
do not adjust data for location prior to ranking. Thus the ANOVA procedures differ
from those previously considered.
Monte Carlo methods were employed as follows. On a CDC Dual Cyber
Computer, a pseudorandom number generator was used to produce uniformly
distributed random numbers which were then transformed to give random values
from normal [1], exponential [13], rectangular, Poisson, and double exponential
distributions. (The first two transformations are available with the Minitab
statistical package; the last three are obtained with inverses of the respective
distribution functions.) These distributions were chosen to give examples of
symmetric, asymmetric, bounded- and unbounded-support densities, and varying
degrees of skewness and kurtosis.
In the cases of the exponential and Poisson distributions, changes in location and
variance occur simultaneously. To investigate the effect of this simultaneous change
on symmetric bounded and unbounded densities, we have repeated the experiment
for the rectangular, double exponential, and the normal distributions with mean
changes that are proportional to the variance shifts.
For a given error distribution, an experiment consists of k = 3 samples with an equal
sample size of nⱼ = 10 in each sample. Presumably, the adverse impact of
heterogeneous variances on ANOVA results would be even greater with unequal
sample sizes, but this case is not addressed here. The variances of the three
populations are taken to be σ₁² = σ₂² = σ₃²/W = 1. The values for W are 1, 1.21, 2.25 and
3.24 (the increments reflecting increases of 10, 50 and 80%, respectively, in the third
population's standard deviation relative to the standard deviation common to
populations 1 and 2).

B: Bartlett test for common variance performed on the data

    B = g ln s̄² − Σⱼ gⱼ ln sⱼ²,  where gⱼ = nⱼ − 1,  g = Σⱼ gⱼ,  sⱼ² is the variance of the
        jth sample, and s̄² = (Σⱼ gⱼ sⱼ²)/g

C: Cochran test for common variance performed on the data

    C = max sⱼ² / Σⱼ sⱼ²

H: Hartley test for common variance performed on the data

    H = max sⱼ² / min sⱼ²

A: Analysis of variance performed on the data

AR: Analysis of variance performed on the ranks of the data (Kruskal-Wallis), where ranks
    are assigned for pooled samples

AR²: Analysis of variance performed on the squared ranks of the data, where ranks are
    assigned for pooled samples

Table 1. Tests of homogeneity considered in this study.

Thus the shifts in location for the symmetric distributions will be 0, 0.05, 0.25, and
0.40, respectively. One thousand replications of samples were generated for each value
of W, and the proportion of the 1000 sets for which the null hypothesis
H₀: σ₁² = σ₂² = σ₃² is rejected at the nominal 5% level is recorded. For the case W = 1,
these proportions are the experimentally determined estimates of the actual
significance levels of the various tests. For W ≠ 1, the proportions of significant
results represent values of the empirical power functions of the tests.
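To make one cell of this design concrete, here is a minimal Python sketch. The random-number generator, the use of SciPy's f_oneway and bartlett routines, and the restriction to normal errors are assumptions of the sketch; the authors' original CDC Cyber implementation is not reproduced here. It generates k = 3 samples of size 10 with σ₁² = σ₂² = 1 and σ₃² = W, applies the AR² procedure (ANOVA F on squared ranks of the pooled data) together with Bartlett's test, and records the proportion of rejections at the nominal 5% level.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
k, n, reps, alpha = 3, 10, 1000, 0.05

def squared_rank_anova_pvalue(samples):
    # AR2: ANOVA F performed on the squared ranks of the pooled data.
    sq_ranks = stats.rankdata(np.concatenate(samples)) ** 2
    groups = np.split(sq_ranks, np.cumsum([len(s) for s in samples])[:-1])
    return stats.f_oneway(*groups).pvalue

def one_replication(W):
    # sigma_1^2 = sigma_2^2 = 1 and sigma_3^2 = W (normal errors here; other error
    # distributions would be substituted via their inverse distribution functions).
    return [rng.normal(0.0, 1.0, n) for _ in range(k - 1)] + [rng.normal(0.0, np.sqrt(W), n)]

for W in (1.0, 1.21, 2.25, 3.24):
    ar2 = bartlett = 0
    for _ in range(reps):
        samples = one_replication(W)
        ar2 += squared_rank_anova_pvalue(samples) < alpha
        bartlett += stats.bartlett(*samples).pvalue < alpha
    # W = 1 estimates the size of each test; W > 1 estimates its power.
    print(f"W = {W}: AR2 rejects {ar2 / reps:.3f}, Bartlett rejects {bartlett / reps:.3f}")

Under this set-up, the proportions printed for W = 1 correspond to empirical significance levels and those for W > 1 to points on the empirical power functions.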

4. Results
The empirical significance levels and average power for the six tests are given in
table 2. Conover et al. [6] call a test with nominal size 0.05 robust if the maximum
type I error rate is less than 0.10. As reported previously for B, C, and H, the
empirical significance levels vary considerably among the distributions. These levels
are strikingly robust for all the ANOVA procedures (A, AR, AR²). The expected
probability of Type I error is 0.05 for normal distributions, and the observed
significance levels for all ANOVA procedures are in fact never significantly different
from this (figures 1-4). For the B, C, and H tests, the probability of Type I error is
significantly larger for the leptokurtic distributions: exponential, Poisson, and
double exponential (for example, see figure 5). For the flat or platykurtic uniform
distribution, the observed probability is considerably smaller than the desired 0.05
level. These findings reflect known properties of the tests with respect to kurtosis.
(For an assumed significance level of 5 per cent, the standard error of the estimated
significance level is √{(0.05)(0.95)/1000} ≈ 0.01.) Trends in power for the first 100 of
the 1000 replications are shown in figures 1 to 6.
The ANOVA-type procedures are apparently preferable for the non-normal
distributions considered. It should be noted, however, that means are unequal in
these cases. In the normal case where means are different, by contrast, the classical tests
of common variance (B, C, and H) are substantially more powerful in detecting
differences.

        NORM         EXPO         UNIF         POIS         DEXP         NORM†

B       0.05 (0.41)  0.36 (0.57)  0.01 (0.37)  0.10 (0.19)  0.31 (0.55)  0.05 (0.41)
C       0.05 (0.32)  0.32 (0.47)  0.00 (0.27)  0.13 (0.16)  0.27 (0.45)  0.05 (0.32)
H       0.05 (0.40)  0.35 (0.56)  0.01 (0.37)  0.10 (0.19)  0.30 (0.54)  0.05 (0.40)
A       0.05 (0.06)  0.06 (0.22)  0.06 (0.51)  0.05 (0.37)  0.05 (0.30)  0.05 (0.10)
AR      0.05 (0.06)  0.06 (0.23)  0.05 (0.44)  0.06 (0.36)  0.05 (0.38)  0.05 (0.10)
AR²     0.05 (0.07)  0.07 (0.26)  0.06 (0.52)  0.05 (0.37)  0.06 (0.40)  0.05 (0.15)

Table 2. Size and average power (parenthesized) for the six tests and six distributions of this
study. Standard error of the size estimate is 0.01; nominal significance level 0.05.
†Proportional mean and variance shifts.

Figure 1. ANOVA procedures for normal distribution.

Figure 2. ANOVA procedures for uniform distribution.

Figure 3. ANOVA procedures for double exponential distribution.

Figure 4. ANOVA procedures for normal distribution (proportional mean and standard
deviation shifts).

Figure 5. Common variance procedures for double exponential distribution.

5. Summary and conclusions


The results of the current study can be summarized as follows. For non-normal
distributions, the best test of the k-sample hypothesis of identical population
distribution functions versus simultaneous changes of location and variance appears
to be an analysis of variance F-test on data transformed by squaring the
corresponding ranks. For normal distributions, the B, C, and H procedures are more
powerful and are, therefore, recommended (compare figures 4 and 6). It should be
noted that the changes in variance in this study and [6] are of the specific type for
which the Cochran test is designed; i.e., there is one large variance. Even for this case
Bartlett's and Hartley's tests are more powerful (shown by table 2 and exemplified in
figures 5 and 6), and this pattern is repeated when variances differ in other ways [2].
For the decision-maker who intends never to use a test more exotic than those
readily available in common statistical packages such as SPSS, the important
implication is this: perform the B or H procedure (or possibly C) in conjunction with
an analysis of variance on the squared ranks of the data. The former procedures are
generally quite powerful and the latter is generally quite robust. When the results
differ, examine the underlying distributions for normality, combining a priori
knowledge and results from historical data with the current sample information. This
placement of the test for normality late in the analysis is non-traditional. However,
goodness-of-fit tests are notoriously weak for small samples and overly sensitive to
deviations for extremely large samples. These guidelines are easily
followed and should result in clear-cut decisions in a majority of applications.
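A minimal sketch of this recommendation in practice is given below; SciPy's bartlett routine and an ANOVA F on squared ranks stand in for the SPSS procedures, and the regional sales figures are hypothetical.

import numpy as np
from scipy import stats

def squared_rank_anova(samples):
    # ANOVA F on the squared ranks of the pooled data (the AR2 procedure).
    sq_ranks = stats.rankdata(np.concatenate(samples)) ** 2
    groups = np.split(sq_ranks, np.cumsum([len(s) for s in samples])[:-1])
    return stats.f_oneway(*groups)

regions = [np.array([12.1, 9.8, 11.5, 10.2, 13.0]),    # hypothetical sales by region
           np.array([14.2, 15.1, 13.8, 16.0, 14.9]),
           np.array([10.0, 18.5, 9.1, 17.2, 11.8])]

b = stats.bartlett(*regions)        # B: powerful when the data are close to normal
ar2 = squared_rank_anova(regions)   # AR2: robust to non-normality
print(f"Bartlett: p = {b.pvalue:.3f};  squared-rank ANOVA: p = {ar2.pvalue:.3f}")
# If the two tests lead to different decisions, examine the samples (and any
# historical data) for normality before choosing which result to act on.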

Figure 6. Common variance procedures for normal distribution (proportional mean and
standard deviation shifts).

References
[1] BOX, G. E. P., and MULLER, M. E., 1958, Ann. Math. Statist., 29, 610.
[2] CARTER, J., 1982, Study of three tests of homogeneity of variance. Proceedings of
    Southwest AIDS, Dallas.
[3] CARTER, J., and WHITESIDE, M. M., 1981, A comparison of tests for homogeneity.
    Proceedings of Southwest AIDS, New Orleans.
[4] CONOVER, W. J., and IMAN, R. L., 1976, Commun. Statist. Theory Methods, 7, 1349.
[5] CONOVER, W. J., and IMAN, R. L., 1978, Commun. Statist. Simulation Computation, B7,
    491.
[6] CONOVER, W. J., JOHNSON, M. E., and JOHNSON, M. M., 1981, Technometrics, 23, 351.
[7] GAMES, P., WINKLER, H., and PROBERT, D., 1972, Educat. Psychol. Measur., 32, 887.
[8] GARTSIDE, P., 1972, J. Am. Statist. Ass., 67, 342.
[9] HODGES, J. L., JR., and LEHMANN, E., 1956, Ann. Math. Statist., 27, 324.
[10] IMAN, R. L., and CONOVER, W. J., 1980, Proc. Am. Inst. Decision Sci., 2, 217.
[11] IMAN, R. L., and CONOVER, W. J., 1980, Proc. Am. Inst. Decision Sci., 2, 218.
[12] LAYARD, M. W. J., 1973, J. Am. Statist. Ass., 68, 195.
[13] LEHMANN, R. S., and BAILEY, D. E., 1968, Digital Computing: Fortran IV and Its
    Applications in Behavioral Science (New York: John Wiley).
