
Explaining Psychological Statistics (2nd Ed.) by Barry H.

changes. In fact, the error term of the Brown-Forsythe F (I'll refer to it as F') is quite similar (though not identical) to what you would get from dividing each sample variance by its corresponding sample size rather than pooling the variances. The formula for F' consists of placing the usual value for MSbet over the following error term (MSW'):

$$MS_W' = \frac{\sum \left(1 - \frac{n_i}{N_T}\right) s_i^2}{df_{bet}}$$

Formula 12.21

where the summation goes from 1 to k, which is the number of groups (i.e., levels of the IV). When all of the variances are equal, the top part of the formula reduces to s² Σ(1 - ni / NT) = (k - 1) s² (because the ni sum to NT), which is why it must be divided by k - 1 (i.e., dfbet). A simple example involving three groups will help to demonstrate the difference between MSW and MSW'. Suppose we are comparing three patient groups on a psychiatric ward, and the sizes of these groups are 10, 15, and 25. Further suppose that the variances of the three groups are 3, 6, and 8, respectively. The ordinary MSW would equal (9*3 + 14*6 + 24*8) / 47 = 303 / 47 = 6.45. However, MSW' will be different:

$$MS_W' = \frac{\left(1 - \frac{10}{50}\right) 3 + \left(1 - \frac{15}{50}\right) 6 + \left(1 - \frac{25}{50}\right) 8}{2} = \frac{10.6}{2} = 5.3$$

In this example, as the group gets larger, so does its variance; this pattern always results in MSW' being smaller than MSW, which means that F' will be larger and more likely to attain significance than the usual F. Consequently, the usual F is conservative in this case, and whereas it usually has less power than F', the conservative statistician will not object to the use of the ordinary ANOVA in this situation. Let's reverse the variances of the largest and smallest groups, and see what happens. The ordinary MSW is reduced to (9*8 + 14*6 + 24*3) / 47 = 228 / 47 = 4.85, but MSW' increases:

$$MS_W' = \frac{\left(1 - \frac{10}{50}\right) 8 + \left(1 - \frac{15}{50}\right) 6 + \left(1 - \frac{25}{50}\right) 3}{2} = \frac{12.1}{2} = 6.05$$
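The two error terms compared in this example can be checked with a few lines of code. What follows is only a minimal sketch in Python, assuming the group sizes and variances of the patient-group example; the function names `pooled_ms_within` and `brown_forsythe_ms_within` are hypothetical labels introduced here for illustration and do not come from any statistical package.

```python
def pooled_ms_within(ns, variances):
    """Ordinary MSW: sample variances pooled with weights n_i - 1."""
    df_w = sum(n - 1 for n in ns)
    return sum((n - 1) * v for n, v in zip(ns, variances)) / df_w


def brown_forsythe_ms_within(ns, variances):
    """Error term of Formula 12.21: sum of (1 - n_i / N_T) * s_i^2, over df_bet."""
    n_total = sum(ns)
    df_bet = len(ns) - 1
    return sum((1 - n / n_total) * v for n, v in zip(ns, variances)) / df_bet


ns = [10, 15, 25]
# First the larger groups have the larger variances, then the pattern is reversed.
for variances in ([3, 6, 8], [8, 6, 3]):
    print(variances,
          round(pooled_ms_within(ns, variances), 2),
          round(brown_forsythe_ms_within(ns, variances), 2))
```

Running the loop for both variance patterns shows the two error terms trading places: the Brown-Forsythe term is the smaller one when variances grow with group size, and the larger one when the pattern is reversed.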

Now, MSW' is larger than MSW, so F' is smaller than F. When the larger groups have the smaller variances, using the usual ANOVA has more power, but can result in a higher Type I error rate than the alpha you are using to make your statistical decisions. This possibility is unacceptable to the conservative researcher. However, if you want to use F' you have to deal with the fact that even when the null hypothesis is true, F' does not follow an F distribution with the usual df's. Fortunately, F' has been found to follow what is called a "quasi-F" distribution, which means that its distribution looks similar to an F distribution, but the df for the error term of the F' distribution is not the usual dfW. If this problem sounds familiar, that's because it is closely related to the Behrens-Fisher problem that you read about in Chapter 7, section C. The most popular solution to that problem is called the Welch-Satterthwaite (W-S) formula, which is used to adjust the df of a separate-variances t test. I did not present that formula in Chapter 7, but I will here, because the df adjustment for F' is a logical extension of the W-S formula, and the F' version is too complex to grasp by looking at it.

Chapter 12 (Section D) B. H. Cohen

Welch-Satterthwaite Degrees of Freedom

When there are only two groups in the analysis, MSW' reduces to the following sum: [1 - (n1 / NT)] s1² + [1 - (n2 / NT)] s2². With a little algebra (note that 1 - n1 / NT = n2 / NT when NT = n1 + n2), this expression can be rearranged like this: (n2 / NT) s1² + (n1 / NT) s2². Because the harmonic mean of the two sample sizes, nh, is equal to 2 n1 n2 / NT, MSW' can be transformed to the following form in the two-group case:

$$MS_W' = \frac{n_h}{2} \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)$$

The relation to the s-v t test should be obvious; when there are only two groups, F' equals the square of the s-v t value (MSbet for two groups equals (nh / 2)(M1 - M2)², so the factor nh / 2 cancels out of the ratio). Moreover, the df associated with the denominator of F', in the two-group case, are the same as you would get from the W-S formula, which I present next. To make the formula easier to read, it is usually expressed in terms of weighting factors, such that w1 = s1² / n1 and w2 = s2² / n2.

$$df_{W\text{-}S} = \frac{(w_1 + w_2)^2}{\dfrac{w_1^2}{n_1 - 1} + \dfrac{w_2^2}{n_2 - 1}}$$

Formula 12.22
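Formula 12.22 translates almost directly into code. The sketch below is in Python and is meant only as an illustration; the function name `welch_satterthwaite_df` and the sample sizes and variances passed to it are assumptions chosen for the demonstration, not values taken from the text.

```python
def welch_satterthwaite_df(n1, var1, n2, var2):
    """Adjusted df for a separate-variances t test (Formula 12.22)."""
    w1 = var1 / n1  # weighting factor s1^2 / n1
    w2 = var2 / n2  # weighting factor s2^2 / n2
    return (w1 + w2) ** 2 / (w1 ** 2 / (n1 - 1) + w2 ** 2 / (n2 - 1))


# Hypothetical inputs: equal n's with equal variances, then unequal n's
# in which the smaller group has the larger variance.
print(welch_satterthwaite_df(10, 5.0, 10, 5.0))
print(welch_satterthwaite_df(10, 8.0, 25, 3.0))
```

The function returns a value that is generally not a whole number, so in practice it would be rounded down (or used as is by software) when looking up a critical value.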

When the two samples are equal in size, the p-v t is the same as the s-v t, and F' is the same as F. However, as long as the sample variances differ, dfW-S will be less than dfW, even when the n's are equal. I have never seen any researcher use dfW-S when the n's are equal, but Formula 12.22 reduces to a simple form, which can be instructive, in this special case. For n1 = n2, the formula for dfW-S reduces to:

$$df_{W\text{-}S} = (n - 1) \frac{(s_1^2 + s_2^2)^2}{s_1^4 + s_2^4}$$

In the above formula, you can see that the term n - 1 is being multiplied by a correction factor (note that s⁴ is the same as squaring the variance). What is less obvious is that the maximum value of this correction factor is 2. The maximum occurs when the variances are equal. Suppose that both variances equal 5. The correction factor would then be: (5 + 5)² / (5² + 5²) = 10² / (25 + 25) = 100 / 50 = 2, so dfW-S = 2 (n - 1). This is the ordinary (i.e., uncorrected) df for a t test with equal n's. Now let us suppose that the two variances still sum to 10, but are much more disparate, say 1 and 9; the correction factor becomes: (1 + 9)² / (1² + 9²) = 10² / (1 + 81) = 100 / 82 = 1.22. As the variances continue to diverge, the correction factor approaches a minimum of 1, and dfW-S approaches n - 1. You may recall from Chapter 7 that in the case of two unequal groups, the minimum for dfW-S is the smaller of n1 - 1 and n2 - 1, while the maximum is n1 + n2 - 2. From the above example you can see that even when the n's are equal the error df can be minimally or maximally corrected, depending on the divergence of the variances.

For the same reason that the above formula reduces dfW-S more and more as the variances increasingly diverge, dfW-S gets smaller in the general case (n's not all equal) as the w's in Formula 12.22 grow further apart. To show what pattern of sample sizes and variances results in a greater discrepancy between the w's, I'll return to the example of the three patient groups. In the first part of the example, when the larger groups have the larger variances, the w's would come out to: .3, .4, and .32. In the second part of the example, the variances of the smallest and largest groups are reversed, leading to w's of: .8, .4, and .12. Notice how much more divergent the w's are when the largest group has the smallest variance (and vice versa). It should be clear that in the two-group case this pattern not only produces a larger error term, making the s-v t smaller than the p-v t, it also yields a greater reduction in df, thus raising the critical value, and further reducing power (but also curbing a possible inflation of the Type I error rate, which is the point of the s-v test).

Brown-Forsythe Degrees of Freedom

As long as all of the samples are the same size, the error df for F' (i.e., dfW') reduces to a simple formula no matter how many groups are involved.

$$df_W' = (n - 1) \frac{\left(\sum s_i^2\right)^2}{\sum s_i^4}$$

Formula 12.23
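The behavior of Formula 12.23 is easy to explore numerically. The following Python sketch is offered only as an illustration; the function name `equal_n_error_df` and the common sample size of 11 are assumptions made for the demonstration, while the variance pairs (5, 5) and (1, 9) are the ones used in the two-group discussion of the correction factor.

```python
def equal_n_error_df(n, variances):
    """Error df when all k groups share the same size n (Formula 12.23)."""
    total = sum(variances)
    sum_of_squares = sum(v ** 2 for v in variances)
    return (n - 1) * total ** 2 / sum_of_squares


n = 11  # hypothetical common sample size, so n - 1 = 10
print(equal_n_error_df(n, [5, 5]))     # equal variances: correction factor of 2
print(equal_n_error_df(n, [1, 9]))     # divergent variances: factor of about 1.22
print(equal_n_error_df(n, [4, 4, 4]))  # three equal variances: k (n - 1)
```

With two groups the function reproduces the special case of the W-S formula for equal n's, which is one way to see that the Brown-Forsythe df adjustment extends the two-group correction.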

(Note that both summations go from 1 to k.) As you may have guessed, dfW' ranges from a maximum of k (n - 1), when all the sample variances are equal, down to a minimum of n - 1, as the variances maximally diverge. The denominator of F' also reduces to a (very) simple formula when all of the groups are the same size:

$$MS_W' = \frac{\sum \left(1 - \frac{n}{N_T}\right) s_i^2}{df_{bet}} = \frac{\left(1 - \frac{n}{N_T}\right) \sum s_i^2}{df_{bet}}$$

Because 1 - (n / NT) happens to equal 1 - (1 / k), which also equals (k - 1) / k, and dfbet = k - 1, the formula can be further simplified.

$$MS_W' = \frac{\frac{k - 1}{k} \sum s_i^2}{k - 1} = \frac{\sum s_i^2}{k}$$

As you can see, when all of the n's are equal, MSW' is identical to the ordinary MSW no matter how many groups are involved, and because F' involves no adjustment of MSbet for equal n's, the Brown-Forsythe F is always the same as the ordinary F when the groups are all the same size. As in the case of two equal-sized groups, the df can still be adjusted when more than two groups are all the same size (the greater the discrepancies among the sample variances, the more severe is the df correction), but because the one-way ANOVA is quite robust with respect to the HOV assumption when all n's are equal, the df adjustment is very rarely used in this case.

Whereas the Brown-Forsythe F makes no adjustment when the n's are equal, and makes no adjustment to MSbet in any case, this is not true of Welch's formula for F, which I will label as F*. To help you understand the correction that F* makes to MSbet, I will compare the usual weighted-means to the unweighted-means solution for the one-way ANOVA. [I glossed over the latter procedure in section B of this chapter, because it is so rarely used in practice. The unweighted-means approach used to be popular for two-way ANOVA (see Chapter 14, section C), but has been almost entirely replaced by the regression approach (see Chapter 18, section A).]


The Analysis of Unweighted Means for One-Way ANOVA

The usual formula for MSbet [Σ ni (Mi - MG)² / (k - 1)] weights the squared difference of each group mean from the grand mean by the size of that group, and therefore forms the basis of what is called "the weighted-means" ANOVA. Let's apply this formula to the patient-group example with n's of 10, 15, and 25; suppose the means of these groups are 7, 9, and 17, respectively. Then, the grand mean is: (10*7 + 15*9 + 25*17) / NT = (70 + 135 + 425) / 50 = 630 / 50 = 12.6. So, MSbet is [10*(-5.6)² + 15*(-3.6)² + 25*(4.4)²] / 2 = (313.6 + 194.4 + 484) / 2 = 992 / 2 = 496. If the means of the largest and smallest groups were reversed, the grand mean would be reduced to 9.6, and MSbet would become: [10*(7.4)² + 15*(-.6)² + 25*(-2.6)²] / 2 = (547.6 + 5.4 + 169) / 2 = 722 / 2 = 361. Notice that when the most deviant mean (i.e., 17) is associated with the largest group (n = 25), MSbet is considerably larger than when the most deviant mean is associated with the smallest group (496 vs. 361, respectively).

The unweighted-means formula for MSbet (you can think of it as the "equally-weighted" formula) is identical to the formula for equal n's, except that the harmonic mean of the sample sizes (nh) replaces "n": unweighted MSbet = nh s², where s² is the unbiased variance of the group means. Using Formula 13.15, nh for 10, 15, and 25 is 14.52, and the unbiased variance of 7, 9, and 17 is 28, so unweighted MSbet = 14.52 * 28 = 406.6. Note that this value for MSbet is between the two more extreme values for the weighted-means solution, and does not depend at all on the association between means and sample sizes. The unweighted-means analysis seems to make sense when the differences in sample sizes are accidental, so that none of the samples actually represents a larger population, but as I mentioned earlier in this chapter, it is rarely used, which is why it is not included in major statistical packages, like SPSS.
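The contrast between the weighted and unweighted numerators can be reproduced in a short Python sketch. It assumes the sample sizes and means of the patient-group example; the function names `weighted_ms_between` and `unweighted_ms_between` are hypothetical, and the harmonic mean and unbiased variance come from Python's standard `statistics` module.

```python
from statistics import harmonic_mean, variance


def weighted_ms_between(ns, means):
    """Usual MSbet: squared deviations from the grand mean, weighted by n_i."""
    grand_mean = sum(n * m for n, m in zip(ns, means)) / sum(ns)
    ss_bet = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    return ss_bet / (len(ns) - 1)


def unweighted_ms_between(ns, means):
    """Unweighted MSbet: harmonic mean of the n's times the variance of the means."""
    return harmonic_mean(ns) * variance(means)


ns = [10, 15, 25]
print(weighted_ms_between(ns, [7, 9, 17]))    # most deviant mean in the largest group
print(weighted_ms_between(ns, [17, 9, 7]))    # most deviant mean in the smallest group
print(unweighted_ms_between(ns, [7, 9, 17]))  # unaffected by which group has which mean
```

Because the sketch does not round the harmonic mean to 14.52 before multiplying, the unweighted result comes out near 406.45 rather than the 406.6 obtained by hand.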
Contributing to the lack of popularity of the method of unweighted means is the fact that the resulting F ratio may be slightly biased in the positive direction, increasing the Type I error rate above the alpha that is used to look up the critical F. So, why am I mentioning this method at all? Because understanding the difference between the weighted and unweighted approaches to MSbet can help you grasp an important difference between the Welch F (F*) and F'. The numerators of both F* and F' are based on a weighted-means solution, but F* uses weights that reflect not only sample sizes, but sample variances as well. I will call the numerator of the Welch formula Wnum, because it is not directly comparable to MSbet. However, dividing Wnum by the denominator of the Welch formula does yield F*, which follows a quasi-F distribution that is similar but not identical to the distribution of F'.

Welch Formula

The formula for Wnum is similar to the weighted-means formula for MSbet, but with different weights. Instead of using only the sample sizes as weights, Wnum uses the ratio of sample size to sample variance for each group. If wi is defined as in Formula 12.22 (i.e., wi = si² / ni), then Wnum can be written as:

$$W_{num} = \frac{\sum \frac{1}{w_i} \left(\bar{X}_i - \bar{X}_{WG}\right)^2}{k - 1}$$

Formula 12.24

[Note that due to font difficulties, I will be using the symbol M for mean in the text, and X-bar in the formulas.] It makes sense to use the reciprocal of wi in the above formula, because the weights are being applied to the numerator rather than the error term or dferror. Also, note that MWG is not the usual grand mean, which can be found by weighting the various group means by their sample sizes, but rather a "Welch" grand mean that is found by using the reciprocals of the wi's as the weighting factors, as follows:


$$\bar{X}_{WG} = \frac{\sum \frac{\bar{X}_i}{w_i}}{\sum \frac{1}{w_i}}$$

The wi's have already been calculated for our patient-group example, so let us see how Wnum is affected by the pattern of means, variances, and sample sizes. In the first part of the example, the larger groups have the larger variances (and means), and the wi's came out to: .3, .4, and .32, so their reciprocals are 3.333, 2.5, and 3.125. Therefore, MWG equals: (3.333*7 + 2.5*9 + 3.125*17) / (3.333 + 2.5 + 3.125) = 98.956 / 8.958 = 11.05 (this is very close to a simple average of 7, 9, and 17, because in this case the larger n's are being divided by larger s²'s, helping to cancel out the larger weights that would be given to the means of the larger groups when finding the ordinary grand mean). For this part of the example, Wnum is equal to:

$$W_{num} = \frac{3.333 (7 - 11.05)^2 + 2.5 (9 - 11.05)^2 + 3.125 (17 - 11.05)^2}{3 - 1} = \frac{54.67 + 10.51 + 110.63}{2} = 87.91$$
In the second part of the example, the variances of the smallest and largest groups are reversed (but not the means), so the 1/wi's are: 1.25, 2.5, and 8.33, and MWG is 14.31 (now, the large group has the small variance, and is therefore having a large effect on the Welch grand mean). In this case, Wnum is equal to [1.25*(7.31)² + 2.5*(5.31)² + 8.33*(2.69)²] / 2 = (66.795 + 70.49 + 60.277) / 2 = 197.56 / 2 = 98.78. You cannot compare this value directly to MSbet, but you can compare 98.78 to 87.91; when the larger groups have the smaller variances, the weighting factors are more discrepant (and on balance, larger), so the pattern of means can have a greater effect. For instance, if we reverse the largest and smallest means in this latest example, so that the smallest group not only has the largest variance but the largest mean as well, the Welch grand mean is about 8.45, and Wnum is only 54.84 (in this case, the most discrepant mean, 17, is getting the smallest weight, 1.25). In comparison, when the larger samples have the larger variances, the weighting factors are less discrepant, minimizing the effect of discrepant means. For instance, in the first part of our example (when the weights are 3.333, 2.5, and 3.125), reversing the means has little effect on the Welch grand mean, which goes from 11.05 to 11.28. Moreover, the value of Wnum, which was 87.91 (see above) before the reversal of means, increases only slightly (to 89.65) upon reversing the means (you might want to calculate this for yourself, as an exercise).

The denominator of the Welch formula is weighted in a manner similar to MSW' (the denominator gets larger when the smaller groups are associated with larger variances), so F*, like F', tends to be conservative in this case, and more powerful when it is the larger groups that have the larger variances (the adjustment of dfW also tends to be similar between the Welch and Brown-Forsythe solutions). However, as we have just seen, the Welch formula can be seriously affected by whether the most discrepant means are associated with, for instance, large groups that have small variances, or small groups with large variances. The association of means and variances can even have an effect when all the n's are equal, so unlike F', F* is not usually equal to the ordinary F when all of the samples are the same size. Therefore, some statisticians recommend the use of F* even when the n's are equal, if the variances are quite discrepant, but given the reputation of the ordinary F's robustness with respect to heterogeneity of variance when the n's are equal, this suggestion has little chance of being widely adopted anytime soon.
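For readers who would like to check the Wnum values in this section, here is a minimal Python sketch of Formula 12.24 and the Welch grand mean. It covers only the numerator, because the denominator of the Welch formula is not given in this section; the function names `welch_grand_mean` and `welch_numerator` are hypothetical. Because the sketch carries full precision rather than rounded weights and means, its results can differ from the hand calculations in the second decimal place.

```python
def welch_weights(ns, variances):
    """Reciprocals of the w_i, that is, n_i / s_i^2 for each group."""
    return [n / v for n, v in zip(ns, variances)]


def welch_grand_mean(ns, variances, means):
    """Grand mean in which each group mean is weighted by 1 / w_i."""
    weights = welch_weights(ns, variances)
    return sum(wt * m for wt, m in zip(weights, means)) / sum(weights)


def welch_numerator(ns, variances, means):
    """W_num of Formula 12.24: weighted squared deviations over k - 1."""
    weights = welch_weights(ns, variances)
    m_wg = welch_grand_mean(ns, variances, means)
    total = sum(wt * (m - m_wg) ** 2 for wt, m in zip(weights, means))
    return total / (len(ns) - 1)


ns = [10, 15, 25]
for variances, means in [([3, 6, 8], [7, 9, 17]),
                         ([8, 6, 3], [7, 9, 17]),
                         ([8, 6, 3], [17, 9, 7]),
                         ([3, 6, 8], [17, 9, 7])]:
    print(variances, means,
          round(welch_grand_mean(ns, variances, means), 2),
          round(welch_numerator(ns, variances, means), 2))
```

The four rows correspond, in order, to the four combinations of variances and means worked through in the text, including the exercise of reversing the means in the first part of the example.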


before choosing the Welch procedure. There is one more point I wish to emphasize. Simulation studies have shown that the various alternatives for F do not diverge dramatically until the variance of one group is at least several times the variance of another. However, if your samples exhibit extreme differences in variance, it simply does not seem sensible to test the null hypothesis that the population means are all equal. Obviously, whatever it is that distinguishes your groups is having some effect on your data, and it would seem incumbent upon you to explore your data further in an attempt to understand just why the variances diverge so widely, before testing any difference in the means.

Effect-size Estimates in One-way ANOVA

Estimating the Proportion of Population Variance Accounted for

As I pointed out in section C of this chapter, eta-squared (η²), which gives you the proportion of variance in your DV that is accounted for by your IV in your data, is biased; it is an overestimate of omega-squared (ω²), the variance that would be accounted for if your study involved the entire population from which you are sampling. Eta-squared, as expressed in Formula 12.12 (i.e., SSbet / SStotal), can be modified to produce a much less biased estimate of ω², as in Formula 12.14. However, if you are reading a journal article that provides an F ratio and its associated df's, but no indication of effect size, it would be much more convenient to calculate eta-squared with Formula 12.13 (reproduced below) than with Formula 12.12.

$$\eta^2 = \frac{df_{bet} F}{df_{bet} F + df_W}$$

Formula 12.13

Of course, you get the same (biased) estimate of omega-squared from both of the formulas for eta-squared, but whereas I showed you the bias correction for Formula 12.12 (see Formula 12.14), I did not show the equivalent correction as applied to Formula 12.13. However, I think it would be instructive to present the "corrected" version of Formula 12.13 here, as Formula 12.25.

$$\text{est. } \omega^2 = \frac{df_{bet} (F - 1)}{df_{bet} (F - 1) + N_T}$$

Formula 12.25

Although both formulas always give exactly the same answer, one fact that becomes obvious in Formula 12.25, but not in Formula 12.14, is that this estimate is not meaningful if F is less than 1.0 (the numerator would be negative, but ω² cannot be negative). The estimate for ω² is zero when F equals 1.0, and by convention it is set to zero for any F below 1.0. Usually ω² is only estimated when F is statistically significant, or was expected to be significant (or, perhaps, if one wants to make a point about how small it is), but very rarely estimated when F is near 1.0 (recall that an F of 1.0 is telling you that the variability of your sample means is just about what you would expect purely from sampling error, without the contribution of any experimental effect). Note that when there are only two groups, dfbet equals 1, F equals t², and NT equals dfW + 2, so Formula 12.25 reduces to the following:

$$\text{est. } \omega^2 = \frac{t^2 - 1}{t^2 - 1 + (df_W + 2)} = \frac{t^2 - 1}{t^2 + df_W + 1}$$

which is identical to Formula 10.15 (the unbiased estimate of omega-squared associated with a two-group t test). To illustrate the use of Formula 12.25, I will use the example summarized in Table 12.3 in section B. However, suppose that you don't have access to the full information in Table 12.3; you have simply seen this phrase in a journal article: "...the difference in means approached statistical significance, F (2, 12) = 3.4, p < .07...", and you want to obtain an (almost) unbiased estimate of omega-squared. You know immediately that dfbet = 2 and dfW = 12, so dftotal = 14 and NT therefore equals 15. Inserting these values into Formula 12.25, we obtain the following estimate of omega-squared:

$$\text{est. } \omega^2 = \frac{2 (3.4 - 1)}{2 (3.4 - 1) + 15} = \frac{4.8}{19.8} = .242$$
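Both effect-size formulas that start from a reported F ratio are simple enough to wrap in small functions. The Python sketch below is illustrative only; the function names `eta_squared_from_f` and `est_omega_squared` are hypothetical, and returning zero for F at or below 1.0 follows the convention described for Formula 12.25.

```python
def eta_squared_from_f(f, df_bet, df_w):
    """Eta-squared recovered from a reported F ratio (Formula 12.13)."""
    return df_bet * f / (df_bet * f + df_w)


def est_omega_squared(f, df_bet, n_total):
    """Estimate of omega-squared from F (Formula 12.25); zero when F <= 1."""
    if f <= 1.0:
        return 0.0
    return df_bet * (f - 1) / (df_bet * (f - 1) + n_total)


# Values from the reported result F(2, 12) = 3.4, with N_T = 15.
print(round(eta_squared_from_f(3.4, 2, 12), 3))
print(round(est_omega_squared(3.4, 2, 15), 3))
```

Comparing the two printed values shows how much of the sample eta-squared is removed by the bias correction in this small-sample example.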

Had I kept three digits to the right of the decimal point when illustrating the use of Formula 12.14 in section C, the value above is exactly what I would have obtained, although I was not using the F value in section C, but rather the appropriate SS components from Table 12.3.

Multiple Regression Approach

Formula 12.25 gives you exactly the same value (except for possible rounding errors) as Formula 12.14, but there is another interesting formula for adjusting eta-squared, which generally gives you a slightly different estimate. To understand this alternative formula, it would help if you have already read pages 528, 535, 536, 566, and 567 in the text. Otherwise, it might be better to skip this section until you have covered some basics of multiple regression. Just as a t test for two independent groups can always be performed by first calculating the corresponding point-biserial r, and then using a t value to test it for significance (see p. 295), a one-way ANOVA can be performed by creating k - 1 "dummy" predictors, using multiple regression to find the R² for predicting the DV from those predictors, and then using an F ratio to test R² for significance. The connection here is that the R² you would get from the multiple regression is exactly the same as eta-squared, and like any R² it is an overestimate of the population variance accounted for (i.e., ω²). In ordinary multiple regression situations it is routine to correct the bias of R² by calculating what is called an adjusted R² (a typical formula for that purpose would be the square of Formula 17.14). Because the number of predictors (P) equals k - 1 when multiple regression is used to perform an ANOVA, I will square and then modify Formula 17.14 to represent eta-squared accordingly.

$$\text{adj. } \eta^2 = \eta^2 - \frac{(k - 1)(1 - \eta^2)}{N_T - k}$$

In terms of the usual df components of a one-way ANOVA, this formula can be expressed as:
$$\text{adj. } \eta^2 = \eta^2 - \frac{df_{bet} (1 - \eta^2)}{df_W}$$

This form of the correction formula is useful in that you can see two factors that affect how much is subtracted from the original eta-squared. First, you can see that for a given total N, there is less adjustment if you have a few relatively large groups as opposed to many smaller groups (you get yet another chance to underestimate the error term, and thereby inflate eta-squared, each time you calculate a sample mean and then the variance around it). Second, it is obvious that the correction gets smaller as eta-squared gets larger; there is simply less room for error as the DV becomes increasingly more predictable from knowing which group a score comes from. I will apply the formula above to the data from Table 12.3; note that η² in that example equals 161.97 / 448.00 = .36154.

$$\text{adj. } \eta^2 = .36154 - \frac{2 (1 - .36154)}{12} = .36154 - \frac{.6385}{6} = .36154 - .10641 = .255$$

Adjusted η² is an almost unbiased estimate of ω², but it is not identical to the estimate I got from Formula 12.25 (the latter was .242, but the former is .255). This is not a rounding error; the two formulas are not algebraically equivalent, and will usually yield values that are different, but only slightly different. Formula 12.25 represents the more traditional approach for estimating ω² in the context of ANOVA, but both estimates are considered reasonable. Although I don't expect you to use it for any practical purpose, I will present another formula for adjusted η², which is algebraically equivalent to the one above.

$$\text{adj. } \eta^2 = \eta^2 \left(1 - \frac{1}{F}\right)$$
Formula 12.26

I like the conceptual simplicity of this formula. Notice that η² is being adjusted by being multiplied by a correction factor that depends only on the F ratio for testing the ANOVA. In the above formula, you can see a property that the adjusted η² has in common with the estimate of omega-squared as expressed by Formula 12.25; that is, the estimate is zero when F equals 1, and the adjustment is not valid for F less than 1. You can also see that as F gets larger, the correction diminishes, with the correction factor heading toward its maximum value of 1.0 (i.e., no adjustment). It doesn't matter if F is getting larger due to a larger effect, or just larger sample sizes; larger F's indicate that the effect in your samples is a more accurate reflection of the effect in the population. In case the formula looks just too simple to work, let's put in the numbers from the example I have been using all along:

$$\text{adj. } \eta^2 = .36154 \left(1 - \frac{1}{3.4}\right) = .36154 (.706) = .255$$

As I mentioned above, the estimate created by Formula 12.25 is preferred to this one when reporting an ANOVA, but I will return to Formula 12.26 in the context of multiple regression, where it is equivalent to the usual adjustment of R².
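The algebraic equivalence of the two forms of adjusted eta-squared can also be confirmed numerically. The Python sketch below assumes the SS values quoted from Table 12.3 (161.97 and 448.00) and the df's of 2 and 12; the function names are hypothetical, and F is recomputed from eta-squared rather than rounded to 3.4 so that the two forms can be compared at full precision.

```python
def adjusted_eta_squared(eta_sq, df_bet, df_w):
    """Adjusted eta-squared in terms of the ANOVA df components."""
    return eta_sq - df_bet * (1 - eta_sq) / df_w


def adjusted_eta_squared_from_f(eta_sq, f):
    """Equivalent form, Formula 12.26: eta-squared times (1 - 1/F)."""
    return eta_sq * (1 - 1 / f)


def f_from_eta_squared(eta_sq, df_bet, df_w):
    """F ratio implied by eta-squared and the two df's."""
    return (eta_sq / df_bet) / ((1 - eta_sq) / df_w)


eta_sq = 161.97 / 448.00
f_ratio = f_from_eta_squared(eta_sq, 2, 12)
print(round(f_ratio, 2))
print(round(adjusted_eta_squared(eta_sq, 2, 12), 3))
print(round(adjusted_eta_squared_from_f(eta_sq, f_ratio), 3))
```

The last two printed values agree, which is what the equivalence of the two formulas requires.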


References

Brown, M. B., & Forsythe, A. B. (1974). The ANOVA and multiple comparisons for data with heterogeneous variances. Biometrics, 30, 719-724.
Clinch, J. J., & Keselman, H. J. (1982). Parametric alternatives to the analysis of variance. Journal of Educational Statistics, 7, 207-214.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
James, G. S. (1951). The comparison of several groups of observations when the ratios of the population variances are unknown. Biometrika, 38, 324-329.
Tomarken, A. J., & Serlin, R. C. (1986). Comparisons of ANOVA alternatives under variance heterogeneity and specific noncentrality structures. Psychological Bulletin, 99, 90-99.
Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38, 330-336.
Wilcox, R. R. (1988). A new alternative to the ANOVA F and new results on James's second-order method. British Journal of Mathematical and Statistical Psychology, 41, 109-117.
Wilcox, R. R. (2001). Fundamentals of modern statistical methods: Substantially improving power and accuracy. New York: Springer-Verlag.