This content downloaded from 159.178.22.27 on Sat, 25 Jun 2016 07:10:38 UTC
All use subject to http://about.jstor.org/terms
Journal of Educational Statistics
Summer 1985, Volume 10, Number 2, pp. 75-98

Empirical Bayes Meta-Analysis

Stephen W. Raudenbush
Anthony S. Bryk
KEY WORDS. Empirical Bayes estimation, mixed linear models, maximum likelihood,
meta-analysis, effect size data.
There has been a recent surge of interest in quantitative methods for sum-
marizing results from many related studies. In this form of inquiry, called
meta-analysis (Glass, 1976), individual studies conducting tests of the same
hypothesis become cases in a "study of the studies." Several early meta-
analyses (Rosenthal & Rubin, 1978; Smith, Glass, & Miller, 1980) focused
primarily on finding the average effect across all studies. Such an approach
assumes that the size of the effect reported in each study is an estimate of a
common effect size of the whole population of studies.
More recent meta-analytic work (Hedges, 1982b; Hedges & Olkin, 1983;
Light & Pillemer, 1984; Rosenthal & Rubin, 1982) concentrates on discover-
ing and explaining variation in effect sizes. It is now common to test the hypoth-
esis that the variability in reported effects is attributable solely to sampling
error. If this hypothesis of homogeneity is upheld, the case for summarizing
all studies with a single average effect size estimate is strengthened. If the
hypothesis is rejected, no single number can adequately account for the
variety of reported results. Then the interesting question is to discover the
sources of variation among the reported outcomes. It is becoming routine to
test hypotheses about how variations in treatments, contexts, subjects, and
methods influence study outcomes. Increasingly, social scientists view a set of
conflicting findings as an opportunity for learning rather than a cause for
dismay. Hedges (1982b) and Hedges and Olkin (1983) have introduced a statistical
framework that greatly facilitates analysis of such conflicting results. The key
76 Raudenbush and Bryk
Empirical Bayes Meta-Analysis 77
unit model is estimated separately for each unit (e.g., school, country, or
study). The parameters of the within-unit models are viewed as varying
randomly across units, so it is logical to pose a second-stage or between-unit
model. This model explains variation in the within-unit parameters as a func-
tion of differences between units.
We assume that the estimated effect size d_i of study i is equal to a true
effect size δ_i plus an error of estimate e_i:

d_i = δ_i + e_i,    e_i ~ N(0, v_i).    (1)

The errors e_i are assumed independently, normally distributed with variance
v_i. Quite commonly, d_i is Glass' (1976) standardized effect size
We note that the sampling variance v_i of d_i depends on the size of δ_i itself.
When the δ_i are small relative to the n_i, this dependence has little consequence.
(See Hedges and Olkin, 1983, for a variance-stabilizing transformation of d_i.)
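The dependence just noted can be made concrete with the usual large-sample
variance formula for a standardized mean difference (Hedges, 1981). The
function names and example sample sizes below are ours, a minimal sketch
rather than the paper's own code:

```python
def glass_d(mean_e, mean_c, sd):
    """Standardized effect size: mean difference divided by a standard deviation."""
    return (mean_e - mean_c) / sd

def var_d(n_e, n_c, d):
    """Large-sample sampling variance of d (Hedges, 1981).

    The second term depends on d itself, which is exactly the dependence of
    v_i on delta_i noted in the text; for small d it is negligible.
    """
    return (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))
```

For example, with 50 subjects per group, var_d grows only slightly as d moves
from 0 to 1, illustrating why the dependence matters little for small effects.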
Between-study model. The effect size parameters δ_i vary as a function of
known study characteristics and random error:
δ_i = W_i'γ + u_i,    i = 1, ..., k,    (5)

where

u_i ~ N(0, τ²).
δ_i = W_i'γ.    (6)

In fact, the nonstochastic model is a special case of Equation 5 with τ² = 0.
To clarify this difference further, we combine Equations 1 and 5 to obtain
L = Π (2πv_i)^(-1/2) exp{-½ Σ (d_i - δ_i)²/v_i}
    × (2πτ²)^(-k/2) exp{-½ Σ (δ_i - W_i'γ)²/τ²}.    (10)

Taking the derivative of log L and setting ∂ log L/∂γ = 0, we find that

Σ λ_i W_i W_i'γ = Σ λ_i W_i d_i,    where λ_i = (v_i + τ²)⁻¹.
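Numerically, the normal equations above are an ordinary weighted least squares
system. A minimal sketch with simulated data (the design matrix, variances, and
coefficients below are invented purely for illustration):

```python
import numpy as np

# Hypothetical setup: k studies, a k x p design matrix W, effects d,
# and weights lam_i = (v_i + tau2)**-1 as in the text.
rng = np.random.default_rng(0)
k = 19
W = np.column_stack([np.ones(k), rng.uniform(0, 3, k)])
v = rng.uniform(0.01, 0.15, k)          # assumed sampling variances
tau2 = 0.02                             # assumed parameter variance
d = W @ np.array([0.4, -0.15]) + rng.normal(0, np.sqrt(v + tau2))

lam = 1.0 / (v + tau2)
# Normal equations: (sum lam_i W_i W_i') gamma = sum lam_i W_i d_i
lhs = (W * lam[:, None]).T @ W
rhs = (W * lam[:, None]).T @ d
gamma_hat = np.linalg.solve(lhs, rhs)
```

Solving the normal equations directly is equivalent to least squares on the
data pre-multiplied by the square roots of the weights.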
d_i = W_i'γ + u_i + e_i,
from which it follows that
L(γ, τ²; d) = Π [2π(v_i + τ²)]^(-1/2) exp{-½ Σ (v_i + τ²)⁻¹ (d_i - W_i'γ)²},    (20)

from which it follows that the log of the likelihood is proportional to
1. estimate the variance of the random effects and test the hypothesis of no
variation among the effect size parameters;
2. estimate the fixed effects and test hypotheses about them, that is, hy-
potheses about the relationship between study characteristics and study
outcomes;
3. find improved empirical Bayes estimates of the individual effects, en-
abling better answers to questions such as: How large is the largest effect
of the experimental treatment?
4. examine the sensitivity of all substantive inferences to likely errors in the
estimation of variance components; and
5. investigate a series of between-study models, monitoring model ade-
quacy in reducing uncertainty about the effect parameters.
In the following example, we restrict our attention to two models. The first
is the unconditional model, wherein the effect parameters vary around a grand
mean. The second is the conditional model, wherein the effect parameters
depend on measured study characteristics plus error. A list of the teacher
expectancy studies and their effect sizes is provided in Table I.
Unconditional Model
TABLE I
Summary Results of Experiments Assessing the Effect of Teacher
Expectancy on Pupil IQ
Study    Weeks of prior contact    Effect size    Standard error of effect size estimate
1. Rosenthal et al. (1974) 2 .03 .125
2. Conn et al. (1968) 21 .12 .147
3. Jose and Cody (1971) 19 -.14 .167
4. Pellegrini and Hicks (1972) 0 1.18 .373
5. Pellegrini and Hicks (1972) 0 .26 .369
6. Evans and Rosenthal (1968) 3 -.06 .103
7. Fielder et al. (1971) 17 -.02 .103
8. Claiborn (1969) 24 -.32 .220
9. Kester (1969) 0 .27 .164
10. Maxwell (1970) 1 .80 .251
11. Carter (1970) 0 .54 .302
12. Flowers (1966) 0 .18 .223
13. Keshock (1970) 1 -.02 .289
14. Henrikson (1970) 2 .23 .290
15. Fine (1972) 17 -.18 .159
16. Greiger (1970) 5 -.06 .167
17. Rosenthal and Jacobson (1968) 1 .30 .139
18. Fleming and Anttonen (1971) 2 .07 .094
19. Ginsburg (1970) 7 -.07 .174
Note. The effect size d_i represents the mean difference between experimental and
control children divided by the standard deviation pooled within groups. Raudenbush
(1984b) used the control group standard deviation instead of the pooled within-group
standard deviation, so the effect size estimates are slightly different.
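The precision-weighted mean and the homogeneity statistic discussed in the text
can be reproduced directly from Table I; minor rounding differences are to be
expected since the table reports only three decimal places:

```python
import numpy as np

# Effect sizes and standard errors transcribed from Table I
d = np.array([0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27,
              0.80, 0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07])
se = np.array([0.125, 0.147, 0.167, 0.373, 0.369, 0.103, 0.103, 0.220, 0.164,
               0.251, 0.302, 0.223, 0.289, 0.290, 0.159, 0.167, 0.139, 0.094, 0.174])
v = se ** 2              # sampling variances v_i
w = 1.0 / v              # precision weights v_i**-1

d_bar = np.sum(w * d) / np.sum(w)      # precision-weighted mean of the d_i
H = np.sum(w * (d - d_bar) ** 2)       # homogeneity statistic, ~ chi2(k-1) under H0
```

With these figures, H lands close to the 35.85 the text reports, comfortably
beyond the 99th percentile (about 34.8) of the chi-square distribution with
18 degrees of freedom.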
FIGURE 1. Values of effect parameter variance, τ². 1a: The relative likelihood
of the 19 observed effects as a function of τ²; the value of τ² maximizing this
likelihood is approximately .019. 1b: Upper and lower 95% confidence limits for
the average effect size, μ, as a function of τ². 1c: Empirical Bayes estimates,
δ_i*, from the five studies reporting the largest effects as a function of
possible values of τ².
and is our point estimate for the variance among the δ_i parameters. We note
that the effect parameter variance, τ², represents the total explainable variation
among the observed d_i. Later we use knowledge about study characteristics to
account for this variation.
Random effects: hypothesis testing. A point estimate of .019 does not rule
out the possibility that τ² = 0. We now pose the null hypothesis

H₀: τ² = 0,

which is equivalent to the hypothesis that all experiments share the same
underlying population effect size. That is,

H₀: δ₁ = δ₂ = ... = δ_k = μ.

When this null hypothesis is true, the variation of the d_i around the grand
mean μ is solely chance variation, so that when H₀ is true,

d_i ~ N(μ, v_i).
Under H₀, the precision-weighted average d̄ = Σ v_i⁻¹ d_i / Σ v_i⁻¹ estimates μ,
and the homogeneity statistic H = Σ v_i⁻¹ (d_i - d̄)²
has an asymptotic chi-square distribution with k -1 degrees of freedom
(Hedges, 1982a; Rosenthal & Rubin, 1982). As the observed values of di
deviate widely from the grand mean (relative to the size of the sampling
variance, vi), the statistic becomes large and the hypothesis is rejected. In the
present case the value of the test statistic is 35.85, which exceeds the 99th
percentile point of the chi-square distribution with 18 degrees of freedom.
Therefore, we infer that there is significant parameter variance, so we accept

H_a: τ² > 0,
which implies that the observed variation across the 19 studies reflects more
than sampling error. Different experiments had different treatment effects,
either because of differential effectiveness of treatments, contextual features,
or methodological differences.
Fixed effects: estimation. In this simple case only one fixed effect, the
average effect size μ, is to be estimated. Here Equation 17 reduces to
z = (μ* - 0)/[Σ (v_i + τ̂²)⁻¹]^(-1/2) = .084/.052 = 1.62,    (34)

which is compared to the critical values for the standard normal distribution.²
Therefore, we cannot reject the hypothesis that the average effect is 0.
However, because the effect sizes have been found to vary, this result does not
imply that each experiment had no effect.
Improved estimation of individual effects. If we believed that τ² = 0, we
would infer that all individual δ_i are equivalent to the grand mean μ. In the
present instance we have estimated τ² to be .019. What does this imply about
the individual effect sizes δ_i? Equation 16 gives the empirical Bayes estimates,
which here reduce to

δ_i* = λ_i* d_i + (1 - λ_i*) μ*,    λ_i* = τ̂²/(τ̂² + v_i).    (35)

In this instance the empirical Bayes estimate δ_i* is a weighted average of an
estimate d_i derived entirely from data gathered within study i and of μ*,
the estimated grand mean for all experiments.
Figure 2 contrasts the frequency distribution of the individual effect esti-
mates d_i and the empirical Bayes estimates δ_i*. Note the smaller dispersion of
the empirical Bayes estimates, which are "shrunk" toward the grand mean μ*
by a factor 1 - λ_i*. This shrinkage becomes more apparent if we re-express
Equation 35 as

δ_i* = d_i - (1 - λ_i*)(d_i - μ*),

where 1 - λ_i* is the shrinking factor of James and Stein (1961) and Efron and
Morris (1975). In addition, λ_i* has a substantive interpretation. It is equal
to the reliability of d_i as an estimate of δ_i, that is, it is the estimated ratio of
parameter ("true score") variance, τ², to "observed score" variance, τ² + v_i.
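The shrinkage arithmetic can be checked against the Table I data by plugging
in the values μ* = .084 and τ̂² = .019 reported above:

```python
import numpy as np

# Effect sizes and standard errors transcribed from Table I
d = np.array([0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27,
              0.80, 0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07])
se = np.array([0.125, 0.147, 0.167, 0.373, 0.369, 0.103, 0.103, 0.220, 0.164,
               0.251, 0.302, 0.223, 0.289, 0.290, 0.159, 0.167, 0.139, 0.094, 0.174])
v = se ** 2

mu_star, tau2 = 0.084, 0.019    # estimates reported in the text

lam = tau2 / (tau2 + v)                       # reliability of d_i for delta_i
delta_star = lam * d + (1 - lam) * mu_star    # Equation 35
shrunk = d - (1 - lam) * (d - mu_star)        # equivalent shrinkage form
```

Each δ_i* lies between its d_i and μ*, and the set of empirical Bayes estimates
is visibly less dispersed than the raw d_i, which is the pattern Figure 2 displays.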
² Since the number of studies is small here, our estimate of τ² is likely to be
imprecise, and so the z statistic from Equation 34 should be viewed as a rough
approximation. The "sensitivity" analysis (Figure 1) provides a check on the
tenability of this approximation.
FIGURE 2. Frequency distributions of the individual effect size estimates d_i
and the empirical Bayes estimates δ_i*.
FIGURE 3. Observed effect sizes plotted against weeks of prior teacher-pupil
contact (0, 1, 2, >2).
In the case of our illustrative example, we consider the hypothesis that the
amount of teacher-pupil contact prior to the experiment influences the size of
the expectancy effect. (See Raudenbush, 1984b, for a discussion of the justifi-
cation of this hypothesis.) Figure 3 indicates a substantially negative relation-
ship. Studies reporting 0 or 1 week of prior contact show larger effects than
studies reporting more contact. After more than two weeks of contact, there
appears to be no effect of expectancy.
Statistical model. To investigate this relationship more formally we pose a
second hierarchical linear model. The first stage model remains the same (see
Equation 1). However, at the second stage we incorporate information about
the timing of expectancy induction:

δ_i = γ₀ + γ₁ W_i + u_i,    u_i ~ N(0, τ²),    (36)

where W_i denotes the number of weeks of prior teacher-pupil contact.
Now τ² is the conditional variance of the effect sizes, that is, the part of the
parameter variance left unexplained after knowing the extent of prior teacher-
student contact. Combining Equations 1 and 36, we have

d_i = γ₀ + γ₁ W_i + ũ_i,    ũ_i = u_i + e_i ~ N(0, v_i + τ²),

so that the concentrated log-likelihood for τ² is proportional to

-½ Σ log(v_i + τ²)
- ½ log{Σ (v_i + τ²)⁻¹ · Σ W_i²(v_i + τ²)⁻¹ - [Σ W_i(v_i + τ²)⁻¹]²}
- ½ Σ (v_i + τ²)⁻¹ (d_i - γ₀* - γ₁* W_i)².    (39)
Figure 4a shows the likelihood of our data as a function of possible values
of τ². The value of τ² that maximizes the likelihood is 0.
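Equation 39 can be profiled numerically over a grid of τ² values. In the sketch
below we code prior contact as 0, 1, 2, or 3+ weeks, an assumption on our part
that matches the grouping shown in Figure 3; with the Table I data the maximum
occurs at (or very near) τ² = 0:

```python
import numpy as np

d = np.array([0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27,
              0.80, 0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07])
se = np.array([0.125, 0.147, 0.167, 0.373, 0.369, 0.103, 0.103, 0.220, 0.164,
               0.251, 0.302, 0.223, 0.289, 0.290, 0.159, 0.167, 0.139, 0.094, 0.174])
weeks = np.array([2, 21, 19, 0, 0, 3, 17, 24, 0, 1, 0, 0, 1, 2, 17, 5, 1, 2, 7])

v = se ** 2
x = np.minimum(weeks, 3)                   # assumed coding: 0, 1, 2, 3+ weeks
W = np.column_stack([np.ones_like(d), x])

def log_lik(tau2):
    """Concentrated log-likelihood of Equation 39, up to an additive constant."""
    a = 1.0 / (v + tau2)
    lhs = (W * a[:, None]).T @ W           # sum a_i W_i W_i'
    gamma = np.linalg.solve(lhs, (W * a[:, None]).T @ d)
    resid = d - W @ gamma
    return (-0.5 * np.sum(np.log(v + tau2))
            - 0.5 * np.log(np.linalg.det(lhs))
            - 0.5 * np.sum(a * resid ** 2))

grid = np.linspace(0.0, 0.10, 101)
ll = np.array([log_lik(t) for t in grid])
tau2_mle = grid[np.argmax(ll)]
```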
Random effects: hypothesis testing. After accounting for the effect of prior
contact, is there significant residual variability in the effect sizes? The fact that
the maximum likelihood estimate of this variance is 0 suggests that the residual
variability in the effect sizes-the component of parameter variation left un-
explained by prior contact-is small. Another way to see this is to test the
hypothesis:
H₀: τ² = 0.
If τ² were zero, then the variance of each d_i would be simply v_i, and the
residual homogeneity statistic H = Σ v_i⁻¹ (d_i - γ₀* - γ₁* W_i)² would be
asymptotically chi-square with k - 2 degrees of freedom.

Fixed effects: estimation. Applying Equation 17 with τ²* = 0 yields

γ₀* = .407,
γ₁* = -.157.
FIGURE 4. Values of residual parameter variance, τ². 4a: The relative likelihood
of the 19 observed effect sizes as a function of possible values of τ²; 4b: Upper
and lower confidence limits for the intercept, γ₀, and slope, γ₁, as a function
of τ²; 4c: Empirical Bayes estimates, δ_i*, from the five studies reporting the
largest effects as a function of τ².
These results suggest first that studies with no teacher-student contact prior
to expectancy induction produce an average experimental effect of .407. Sec-
ond, we are led to expect a loss of effect of .157 for each additional week of
prior contact for each of the next 3 weeks.
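The fixed-effect estimates and the z statistics reported in this section can be
reproduced from Table I by weighted least squares at τ²* = 0, again coding prior
contact as 0, 1, 2, or 3+ weeks (our assumption):

```python
import numpy as np

d = np.array([0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27,
              0.80, 0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07])
se = np.array([0.125, 0.147, 0.167, 0.373, 0.369, 0.103, 0.103, 0.220, 0.164,
               0.251, 0.302, 0.223, 0.289, 0.290, 0.159, 0.167, 0.139, 0.094, 0.174])
weeks = np.array([2, 21, 19, 0, 0, 3, 17, 24, 0, 1, 0, 0, 1, 2, 17, 5, 1, 2, 7])

v = se ** 2
x = np.minimum(weeks, 3)                   # assumed coding: 0, 1, 2, 3+ weeks
W = np.column_stack([np.ones_like(d), x])

tau2 = 0.0                                 # ML estimate of residual variance
a = 1.0 / (v + tau2)                       # GLS weights (v_i + tau2)**-1
lhs = (W * a[:, None]).T @ W
gamma = np.linalg.solve(lhs, (W * a[:, None]).T @ d)  # approximately (.407, -.157)
cov = np.linalg.inv(lhs)                   # [sum (v_i + tau2)**-1 W_i W_i']**-1
z = gamma / np.sqrt(np.diag(cov))          # ratios of estimates to standard errors
```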
Fixed effects: hypothesis testing. We can also formally test hypotheses about
γ₀ and γ₁, for example,

H₀₁: γ₀ = 0,
H₀₂: γ₁ = 0.
The test statistics for these hypotheses are the ratios of the coefficients to
their estimated standard errors. The latter are computed from the square roots
of the diagonal elements of
[Σ (v_i + τ²*)⁻¹ W_i W_i']⁻¹.
These ratios have asymptotic unit normal distributions under the null hypoth-
eses H₀₁ and H₀₂. For the teacher expectancy effect data, we find that for H₀₁,

z = .407/.087 = 4.68,

and for H₀₂,

z = -.157/.036 = -4.39.
Since both z values exceed the 99.9% critical values of the standard normal
distribution, we reject both H₀₁ and H₀₂. We conclude that the average effect
of expectancy for studies with no prior teacher-pupil contact is greater than 0
TABLE II
Estimation and Hypothesis Testing Concerning Fixed Effects:
Intercept and Effect of Prior Contact
Estimation

γ* = [Σ (v_i + τ²*)⁻¹ W_i W_i']⁻¹ Σ (v_i + τ²*)⁻¹ W_i d_i = (.407, -.157)'

Hypothesis Testing
and that the size of the effect diminishes for studies in which such prior contact
occurred.3 See Table II.
³ Again the computed z values are rough approximations (see footnote 2). The
sensitivity analysis (Figure 4) again provides a check on this approximation.
* estimate the residual variation among the effect parameters, noting the
reduction in uncertainty made possible by accounting for study character-
istics;
* test the hypothesis that the residual parameter variance is 0;
* estimate and test hypotheses about the fixed effects, that is, the effects of
between-study predictors;
* calculate improved empirical Bayes estimates of individual study effect
sizes, exploring, for instance, the size of the maximum effects detected in
a set of studies; and
* use the likelihood function to examine the sensitivity of all substantive
inferences to likely errors in variance component estimation.
interact with any of these factors, averaging can be misleading (see Light,
1983, for discussion).
A more sensible approach is to begin by assuming that the effect of a treat-
ment is influenced by a host of factors. Under this assumption the analyst has
several options. First, one might assume that the variability among the true
study effect sizes is entirely random, that is, unexplainable. Then Hedges'
(1983) random effects model is an appropriate way to proceed and can be ex-
tended to include empirical Bayes estimates of individual effects (DerSimonian
& Laird, 1983; Rubin, 1981).
Second, one might assume that the variability among the true effects is
entirely explainable. Then the fixed effects methods of Hedges (1982b) ap-
ply. The mixed model presented here incorporates both possibilities (effects
entirely random, effects entirely explainable) and also provides workable
methods when effects are partly explainable. It should be emphasized that
when parameter variance is small, the mixed model produces results similar to
those of the fixed effects model, and the latter is computationally simpler. The
advantages of the present approach are its generality, its improved estimation
of individual effects, and the sensitivity analysis it facilitates. To gain these
advantages requires distributional assumptions concerning the unobservable
random effects. The robustness of the method to violations of these assumptions
is an important topic for future research. Nevertheless the mixed model ap-
proach, with fixed and random treatment effects, offers promise as a tool for
modeling diversity in a stream of research results.
APPENDIX
Maximum Likelihood Estimation
of Mixed-Model Meta-Analysis via the EM Algorithm
To derive maximum likelihood estimates of v_i and τ² by means of the EM algorithm
(Dempster, Laird, & Rubin, 1977), we rewrite the within- and between-study models
from Equations 1 and 5 as follows:
d = δ + e,    e ~ N(0, V),    (A1)

and

δ = Wγ + u,    u ~ N(0, τ²I).    (A2)
δ* = Λd + (I - Λ)Wγ*,    (A3)

V* = ΛV + (I - Λ)S(I - Λ)',    (A4)

γ* = (W'Δ⁻¹W)⁻¹ W'Δ⁻¹ d,

Var(γ*) = (W'Δ⁻¹W)⁻¹,

where

Λ = diag(λ_i),    λ_i = τ²/(τ² + v_i),

Δ = diag(v_i + τ²),

and

S = Var(Wγ*) = W(W'Δ⁻¹W)⁻¹W'.
For the derivation of the results above, see Raudenbush (1984a). From Equations
A3 and A4 it follows that U = δ - Wγ has a posterior distribution with mean

U* = Λ(d - Wγ*)

and variance

V_U = ΛV + ΛSΛ'.

We further utilize the fact that V has diagonal elements v_i.
The logic of the EM algorithm works as follows. Suppose δ were known. Then v_i
also would be known from Equation 4.
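The iteration above can be sketched for the unconditional model (W a single
column of ones, so Wγ is the grand mean), holding the v_i fixed at their Table I
values; the scalar posterior-variance expression below is our specialization of
the matrix formulas A3-A4, a sketch rather than the authors' own program:

```python
import numpy as np

d = np.array([0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27,
              0.80, 0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07])
se = np.array([0.125, 0.147, 0.167, 0.373, 0.369, 0.103, 0.103, 0.220, 0.164,
               0.251, 0.302, 0.223, 0.289, 0.290, 0.159, 0.167, 0.139, 0.094, 0.174])
v = se ** 2
k = len(d)

tau2 = 0.05                                   # starting value
for _ in range(5000):
    a = 1.0 / (v + tau2)
    mu = np.sum(a * d) / np.sum(a)            # gamma* for a one-column W
    lam = tau2 / (tau2 + v)                   # Lambda = diag(lam_i)
    u_star = lam * (d - mu)                   # posterior means U* = Lambda(d - W gamma*)
    # diagonal of Lambda V + Lambda S Lambda'; here S_ii = 1 / sum(a)
    u_var = lam * v + lam ** 2 / np.sum(a)
    new_tau2 = np.mean(u_star ** 2 + u_var)   # M-step: E[sum u_i**2] / k
    if abs(new_tau2 - tau2) < 1e-12:
        tau2 = new_tau2
        break
    tau2 = new_tau2
```

On this data the iteration converges to a small positive τ², in the neighborhood
of the .019 reported for the unconditional model in the text.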
Acknowledgments
The authors wish to acknowledge partial support for this work from a Spencer
Foundation seed grant to Harvard University. We also wish to thank Larry Hedges and
Richard Light for their extensive comments on earlier drafts of the paper.
References
Deeley, J. J., & Lindley, D. V. (1981). Bayes empirical Bayes. Journal of the American
Statistical Association, 76, 833-841.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from
incomplete data via the EM algorithm (with discussion). Journal of the Royal Statis-
tical Society, Series B, 39, 1-38.
Dempster, A. P., Rubin, D. B., & Tsutakawa, R. K. (1981). Estimation in covariance
components models. Journal of the American Statistical Association, 76, 341-353.
DerSimonian, R., & Laird, N. M. (1983). Evaluating the effect of coaching on SAT
scores: A meta-analysis. Harvard Educational Review, 53 (1), 1-15.
Efron, B., & Morris, C. (1975). Data analysis using Stein's estimator and its gener-
alizations. Journal of the American Statistical Association, 70, 311-319.
Glass, G. V (1976). Primary, secondary, and meta-analysis of research. Educational
Researcher, 5, 3-8.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and
related estimators. Journal of Educational Statistics, 6 (2), 107-128.
Hedges, L. V. (1982a). Estimation of effect size from a series of independent experi-
ments. Psychological Bulletin, 92, 490-499.
Hedges, L. V. (1982b). Fitting continuous models to effect size data. Journal of
Educational Statistics, 7 (4), 245-270.
Hedges, L. V. (1983). A random effects model for effect size. Psychological Bulletin,
93, 388-395.
Hedges, L. V., & Olkin, I. (1983). Regression models in research synthesis. American
Statistician, 37 (2), 137-140.
Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating
research findings across studies. Beverly Hills: Sage.
James, W., & Stein, C. (1961). Estimation with quadratic loss. Proceedings of the
Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1.
Berkeley: University of California Press.
Light, R. J. (Ed.). (1983). Evaluation studies review annual. Beverly Hills: Sage.
Light, R. J., & Pillemer, D. B. (1984). Summing up: The science of reviewing research.
Cambridge, MA: Harvard University Press.
Lindley, D. V., & Smith, A. F. M. (1972). Bayes estimates for the linear model.
Journal of the Royal Statistical Society Series B, 34, 1-41.
Mason, W. M., Wong, G. Y., & Entwistle, B. (1984). Contextual analysis through the
multi-level linear model. Sociological Methodology. San Francisco, CA: Jossey-
Bass.
Authors