This content downloaded from 159.178.22.27 on Sat, 25 Jun 2016 07:10:38 UTC
All use subject to http://about.jstor.org/terms
Journal of Educational Statistics
Summer 1985, Volume 10, Number 2, pp. 75-98

Empirical Bayes Meta-Analysis

Stephen W. Raudenbush
Anthony S. Bryk
KEY WORDS. Empirical Bayes estimation, mixed linear models, maximum likelihood,
meta-analysis, effect size data.
There has been a recent surge of interest in quantitative methods for sum-
marizing results from many related studies. In this form of inquiry, called
meta-analysis (Glass, 1976), individual studies conducting tests of the same
hypothesis become cases in a "study of the studies." Several early meta-
analyses (Rosenthal & Rubin, 1978; Smith, Glass, & Miller, 1980) focused
primarily on finding the average effect across all studies. Such an approach
assumes that the size of the effect reported in each study is an estimate of a
common effect size of the whole population of studies.
More recent meta-analytic work (Hedges, 1982b; Hedges & Olkin, 1983;
Light & Pillemer, 1984; Rosenthal & Rubin, 1982) concentrates on discover-
ing and explaining variation in effect sizes. It is now common to test the hypoth-
esis that the variability in reported effects is attributable solely to sampling
error. If this hypothesis of homogeneity is upheld, the case for summarizing
all studies with a single average effect size estimate is strengthened. If the
hypothesis is rejected, no single number can adequately account for the
variety of reported results. Then the interesting question is to discover the
sources of variation among the reported outcomes. It is becoming routine to
test hypotheses about how variations in treatments, contexts, subjects, and
methods influence study outcomes. Increasingly, social scientists view a set of
conflicting findings as an opportunity for learning rather than a cause for
dismay. Hedges (1982b) and Hedges and Olkin (1983) have introduced a statistical
framework that greatly facilitates analysis of such conflicting results. The key
76 Raudenbush and Bryk
Empirical Bayes Meta-Analysis 77
unit model is estimated separately for each unit (e.g., school, country, or
study). The parameters of the within-unit models are viewed as varying
randomly across units, so it is logical to pose a second-stage or between-unit
model. This model explains variation in the within-unit parameters as a func-
tion of differences between units.
We assume that the estimated effect size d_i of study i is equal to a true
effect size δ_i plus an error of estimate e_i:

d_i = δ_i + e_i,    e_i ~ N(0, v_i).    (1)

The errors e_i are assumed independently, normally distributed with variance
v_i. Quite commonly, d_i is Glass' (1976) standardized effect size
We note that the sampling variance v_i of d_i depends on the size of δ_i itself.
When the δ_i are small relative to the n_i, this dependence has little consequence.
(See Hedges and Olkin, 1983, for a variance-stabilizing transformation of d_i.)
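The dependence just noted can be made concrete with the usual large-sample
variance formula for a standardized mean difference (Hedges, 1981). The
function names and example sample sizes below are ours, a minimal sketch
rather than the paper's own code:

```python
def glass_d(mean_e, mean_c, sd):
    """Standardized effect size: mean difference divided by a standard deviation."""
    return (mean_e - mean_c) / sd

def var_d(n_e, n_c, d):
    """Large-sample sampling variance of d (Hedges, 1981).

    The second term depends on d itself, which is exactly the dependence of
    v_i on delta_i noted in the text; for small d it is negligible.
    """
    return (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))
```

For example, with 50 subjects per group, var_d grows only slightly as d moves
from 0 to 1, illustrating why the dependence matters little for small effects.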
Between-study model. The effect size parameters δ_i vary as a function of
known study characteristics and random error:
δ_i = W_i'γ + u_i,    i = 1, ..., k,    (5)

where

u_i ~ N(0, τ²).
δ_i = W_i'γ.    (6)

In fact, the nonstochastic model is a special case of Equation 5 with τ² = 0.
To clarify this difference further, we combine Equations 1 and 5 to obtain
L = Π (2πv_i)^(-1/2) exp{-½ Σ (d_i - δ_i)²/v_i}
    × (2πτ²)^(-k/2) exp{-½ Σ (δ_i - W_i'γ)²/τ²}.    (10)

Taking the derivative of log L and setting ∂ log L/∂γ = 0, we find that

Σ λ_i W_i W_i'γ = Σ λ_i W_i d_i,    where λ_i = (v_i + τ²)⁻¹.
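Numerically, the normal equations above are an ordinary weighted least squares
system. A minimal sketch with simulated data (the design matrix, variances, and
coefficients below are invented purely for illustration):

```python
import numpy as np

# Hypothetical setup: k studies, a k x p design matrix W, effects d,
# and weights lam_i = (v_i + tau2)**-1 as in the text.
rng = np.random.default_rng(0)
k = 19
W = np.column_stack([np.ones(k), rng.uniform(0, 3, k)])
v = rng.uniform(0.01, 0.15, k)          # assumed sampling variances
tau2 = 0.02                             # assumed parameter variance
d = W @ np.array([0.4, -0.15]) + rng.normal(0, np.sqrt(v + tau2))

lam = 1.0 / (v + tau2)
# Normal equations: (sum lam_i W_i W_i') gamma = sum lam_i W_i d_i
lhs = (W * lam[:, None]).T @ W
rhs = (W * lam[:, None]).T @ d
gamma_hat = np.linalg.solve(lhs, rhs)
```

Solving the normal equations directly is equivalent to least squares on the
data pre-multiplied by the square roots of the weights.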
d_i = W_i'γ + u_i + e_i,
from which it follows that
L(γ, τ²; d) = Π [2π(v_i + τ²)]^(-1/2) exp{-½ Σ (v_i + τ²)⁻¹ (d_i - W_i'γ)²},    (20)

from which it follows that the log of the likelihood is proportional to
1. estimate the variance of the random effects and test the hypothesis of no
variation among the effect size parameters;
2. estimate the fixed effects and test hypotheses about them, that is, hy-
potheses about the relationship between study characteristics and study
outcomes;
3. find improved empirical Bayes estimates of the individual effects, en-
abling better answers to questions such as: How large is the largest effect
of the experimental treatment?
4. examine the sensitivity of all substantive inferences to likely errors in the
estimation of variance components; and
5. investigate a series of between-study models, monitoring model ade-
quacy in reducing uncertainty about the effect parameters.
In the following example, we restrict our attention to two models. The first
is the unconditional model, wherein the effect parameters vary around a grand
mean. The second is the conditional model, wherein the effect parameters
depend on measured study characteristics plus error. A list of the teacher
expectancy studies and their effect sizes is provided in Table I.
Unconditional Model
TABLE I
Summary Results of Experiments Assessing the Effect of Teacher
Expectancy on Pupil IQ
Study    Weeks of prior contact    Effect size    Standard error of effect size estimate
1. Rosenthal et al. (1974) 2 .03 .125
2. Conn et al. (1968) 21 .12 .147
3. Jose and Cody (1971) 19 -.14 .167
4. Pellegrini and Hicks (1972) 0 1.18 .373
5. Pellegrini and Hicks (1972) 0 .26 .369
6. Evans and Rosenthal (1968) 3 -.06 .103
7. Fielder et al. (1971) 17 -.02 .103
8. Claiborn (1969) 24 -.32 .220
9. Kester (1969) 0 .27 .164
10. Maxwell (1970) 1 .80 .251
11. Carter (1970) 0 .54 .302
12. Flowers (1966) 0 .18 .223
13. Keshock (1970) 1 -.02 .289
14. Henrikson (1970) 2 .23 .290
15. Fine (1972) 17 -.18 .159
16. Greiger (1970) 5 -.06 .167
17. Rosenthal and Jacobson (1968) 1 .30 .139
18. Fleming and Anttonen (1971) 2 .07 .094
19. Ginsburg (1970) 7 -.07 .174
Note. The effect size d_i represents the mean difference between experimental and
control children divided by the standard deviation pooled within groups. Raudenbush
(1984b) used the control group standard deviation instead of the pooled within-group
standard deviation, so the effect size estimates are slightly different.
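The precision-weighted mean and the homogeneity statistic discussed in the text
can be reproduced directly from Table I; minor rounding differences are to be
expected since the table reports only three decimal places:

```python
import numpy as np

# Effect sizes and standard errors transcribed from Table I
d = np.array([0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27,
              0.80, 0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07])
se = np.array([0.125, 0.147, 0.167, 0.373, 0.369, 0.103, 0.103, 0.220, 0.164,
               0.251, 0.302, 0.223, 0.289, 0.290, 0.159, 0.167, 0.139, 0.094, 0.174])
v = se ** 2              # sampling variances v_i
w = 1.0 / v              # precision weights v_i**-1

d_bar = np.sum(w * d) / np.sum(w)      # precision-weighted mean of the d_i
H = np.sum(w * (d - d_bar) ** 2)       # homogeneity statistic, ~ chi2(k-1) under H0
```

With these figures, H lands close to the 35.85 the text reports, comfortably
beyond the 99th percentile (about 34.8) of the chi-square distribution with
18 degrees of freedom.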
FIGURE 1. Values of effect parameter variance, τ². 1a: The relative likelihood
of the 19 observed effects as a function of τ²; the value of τ² maximizing this
likelihood is approximately .019. 1b: Upper and lower 95% confidence limits for
the average effect size, μ, as a function of τ². 1c: Empirical Bayes estimates,
δ_i*, from the five studies reporting the largest effects as a function of
possible values of τ².
and is our point estimate for the variance among the δ_i parameters. We note
that the effect parameter variance, τ², represents the total explainable variation
among the observed d_i. Later we use knowledge about study characteristics to
account for this variation.
Random effects: hypothesis testing. A point estimate of .019 does not rule
out the possibility that τ² = 0. We now pose the null hypothesis

H₀: τ² = 0,

which is equivalent to the hypothesis that all experiments share the same
underlying population effect size. That is,

H₀: δ₁ = δ₂ = ... = δ_k = μ.

When this null hypothesis is true, the variation of the d_i around the grand
mean μ is solely chance variation, so that when H₀ is true,

d_i ~ N(μ, v_i).
Under H₀, the precision-weighted average d̄ = Σ v_i⁻¹ d_i / Σ v_i⁻¹ estimates μ,
and the homogeneity statistic H = Σ v_i⁻¹ (d_i - d̄)²
has an asymptotic chi-square distribution with k -1 degrees of freedom
(Hedges, 1982a; Rosenthal & Rubin, 1982). As the observed values of di
deviate widely from the grand mean (relative to the size of the sampling
variance, vi), the statistic becomes large and the hypothesis is rejected. In the
present case the value of the test statistic is 35.85, which exceeds the 99th
percentile point of the chi-square distribution with 18 degrees of freedom.
Therefore, we infer that there is significant parameter variance, so we accept

H_a: τ² > 0,
which implies that the observed variation across the 19 studies reflects more
than sampling error. Different experiments had different treatment effects,
either because of differential effectiveness of treatments, contextual features,
or methodological differences.
Fixed effects: estimation. In this simple case only one fixed effect, the
average effect size μ, is to be estimated. Here Equation 17 reduces to
z = (μ* - 0)/[Σ (v_i + τ̂²)⁻¹]^(-1/2) = .084/.052 = 1.62,    (34)

which is compared to the critical values for the standard normal distribution.²
Therefore, we cannot reject the hypothesis that the average effect is 0.
However, because the effect sizes have been found to vary, this result does not
imply that each experiment had no effect.
Improved estimation of individual effects. If we believed that τ² = 0, we
would infer that all individual δ_i are equivalent to the grand mean μ. In the
present instance we have estimated τ² to be .019. What does this imply about
the individual effect sizes δ_i? Equation 16 gives the empirical Bayes estimates,
which here reduce to

δ_i* = λ_i* d_i + (1 - λ_i*) μ*,    λ_i* = τ̂²/(τ̂² + v_i).    (35)

In this instance the empirical Bayes estimate δ_i* is a weighted average of an
estimate d_i derived entirely from data gathered within study i and of μ*,
the estimated grand mean for all experiments.
Figure 2 contrasts the frequency distribution of the individual effect esti-
mates d_i and the empirical Bayes estimates δ_i*. Note the smaller dispersion of
the empirical Bayes estimates, which are "shrunk" toward the grand mean μ*
by a factor 1 - λ_i*. This shrinkage becomes more apparent if we re-express
Equation 35 as

δ_i* = d_i - (1 - λ_i*)(d_i - μ*),

where 1 - λ_i* is the shrinking factor of James and Stein (1961) and Efron and
Morris (1975). In addition, λ_i* has a substantive interpretation. It is equal
to the reliability of d_i as an estimate of δ_i, that is, it is the estimated ratio of
parameter ("true score") variance, τ², to "observed score" variance, τ² + v_i.
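The shrinkage arithmetic can be checked against the Table I data by plugging
in the values μ* = .084 and τ̂² = .019 reported above:

```python
import numpy as np

# Effect sizes and standard errors transcribed from Table I
d = np.array([0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27,
              0.80, 0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07])
se = np.array([0.125, 0.147, 0.167, 0.373, 0.369, 0.103, 0.103, 0.220, 0.164,
               0.251, 0.302, 0.223, 0.289, 0.290, 0.159, 0.167, 0.139, 0.094, 0.174])
v = se ** 2

mu_star, tau2 = 0.084, 0.019    # estimates reported in the text

lam = tau2 / (tau2 + v)                       # reliability of d_i for delta_i
delta_star = lam * d + (1 - lam) * mu_star    # Equation 35
shrunk = d - (1 - lam) * (d - mu_star)        # equivalent shrinkage form
```

Each δ_i* lies between its d_i and μ*, and the set of empirical Bayes estimates
is visibly less dispersed than the raw d_i, which is the pattern Figure 2 displays.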
² Since the number of studies is small here, our estimate of τ² is likely to be
imprecise, and so the z statistic from Equation 34 should be viewed as a rough
approximation. The "sensitivity" analysis (Figure 1) provides a check on the
tenability of this approximation.
FIGURE 2. Frequency distributions of the individual effect size estimates d_i
and the empirical Bayes estimates δ_i*.
FIGURE 3. Observed effect sizes plotted against weeks of prior teacher-pupil
contact (0, 1, 2, >2).
In the case of our illustrative example, we consider the hypothesis that the
amount of teacher-pupil contact prior to the experiment influences the size of
the expectancy effect. (See Raudenbush, 1984b, for a discussion of the justifi-
cation of this hypothesis.) Figure 3 indicates a substantially negative relation-
ship. Studies reporting 0 or 1 week of prior contact show larger effects than
studies reporting more contact. After more than two weeks of contact, there
appears to be no effect of expectancy.
Statistical model. To investigate this relationship more formally we pose a
second hierarchical linear model. The first stage model remains the same (see
Equation 1). However, at the second stage we incorporate information about
the timing of expectancy induction:

δ_i = γ₀ + γ₁ W_i + u_i,    u_i ~ N(0, τ²),    (36)

where W_i denotes the number of weeks of prior teacher-pupil contact.
Now τ² is the conditional variance of the effect sizes, that is, the part of the
parameter variance left unexplained after knowing the extent of prior teacher-
student contact. Combining Equations 1 and 36, we have

d_i = γ₀ + γ₁ W_i + ũ_i,    ũ_i = u_i + e_i ~ N(0, v_i + τ²),

so that the concentrated log-likelihood for τ² is proportional to

-½ Σ log(v_i + τ²)
- ½ log{Σ (v_i + τ²)⁻¹ · Σ W_i²(v_i + τ²)⁻¹ - [Σ W_i(v_i + τ²)⁻¹]²}
- ½ Σ (v_i + τ²)⁻¹ (d_i - γ₀* - γ₁* W_i)².    (39)
Figure 4a shows the likelihood of our data as a function of possible values
of τ². The value of τ² that maximizes the likelihood is 0.
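Equation 39 can be profiled numerically over a grid of τ² values. In the sketch
below we code prior contact as 0, 1, 2, or 3+ weeks, an assumption on our part
that matches the grouping shown in Figure 3; with the Table I data the maximum
occurs at (or very near) τ² = 0:

```python
import numpy as np

d = np.array([0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27,
              0.80, 0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07])
se = np.array([0.125, 0.147, 0.167, 0.373, 0.369, 0.103, 0.103, 0.220, 0.164,
               0.251, 0.302, 0.223, 0.289, 0.290, 0.159, 0.167, 0.139, 0.094, 0.174])
weeks = np.array([2, 21, 19, 0, 0, 3, 17, 24, 0, 1, 0, 0, 1, 2, 17, 5, 1, 2, 7])

v = se ** 2
x = np.minimum(weeks, 3)                   # assumed coding: 0, 1, 2, 3+ weeks
W = np.column_stack([np.ones_like(d), x])

def log_lik(tau2):
    """Concentrated log-likelihood of Equation 39, up to an additive constant."""
    a = 1.0 / (v + tau2)
    lhs = (W * a[:, None]).T @ W           # sum a_i W_i W_i'
    gamma = np.linalg.solve(lhs, (W * a[:, None]).T @ d)
    resid = d - W @ gamma
    return (-0.5 * np.sum(np.log(v + tau2))
            - 0.5 * np.log(np.linalg.det(lhs))
            - 0.5 * np.sum(a * resid ** 2))

grid = np.linspace(0.0, 0.10, 101)
ll = np.array([log_lik(t) for t in grid])
tau2_mle = grid[np.argmax(ll)]
```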
Random effects: hypothesis testing. After accounting for the effect of prior
contact, is there significant residual variability in the effect sizes? The fact that
the maximum likelihood estimate of this variance is 0 suggests that the residual
variability in the effect sizes-the component of parameter variation left un-
explained by prior contact-is small. Another way to see this is to test the
hypothesis:
H₀: τ² = 0.
If τ² were zero, then the variance of each d_i would be simply v_i, and the
residual homogeneity statistic H = Σ v_i⁻¹ (d_i - γ₀* - γ₁* W_i)² would be
asymptotically chi-square with k - 2 degrees of freedom.

Fixed effects: estimation. Applying Equation 17 with τ²* = 0 yields

γ₀* = .407,
γ₁* = -.157.
FIGURE 4. Values of residual parameter variance, τ². 4a: The relative likelihood
of the 19 observed effect sizes as a function of possible values of τ²; 4b: Upper
and lower confidence limits for the intercept, γ₀, and slope, γ₁, as a function
of τ²; 4c: Empirical Bayes estimates, δ_i*, from the five studies reporting the
largest effects as a function of τ².
These results suggest first that studies with no teacher-student contact prior
to expectancy induction produce an average experimental effect of .407. Sec-
ond, we are led to expect a loss of effect of .157 for each additional week of
prior contact for each of the next 3 weeks.
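The fixed-effect estimates and the z statistics reported in this section can be
reproduced from Table I by weighted least squares at τ²* = 0, again coding prior
contact as 0, 1, 2, or 3+ weeks (our assumption):

```python
import numpy as np

d = np.array([0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27,
              0.80, 0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07])
se = np.array([0.125, 0.147, 0.167, 0.373, 0.369, 0.103, 0.103, 0.220, 0.164,
               0.251, 0.302, 0.223, 0.289, 0.290, 0.159, 0.167, 0.139, 0.094, 0.174])
weeks = np.array([2, 21, 19, 0, 0, 3, 17, 24, 0, 1, 0, 0, 1, 2, 17, 5, 1, 2, 7])

v = se ** 2
x = np.minimum(weeks, 3)                   # assumed coding: 0, 1, 2, 3+ weeks
W = np.column_stack([np.ones_like(d), x])

tau2 = 0.0                                 # ML estimate of residual variance
a = 1.0 / (v + tau2)                       # GLS weights (v_i + tau2)**-1
lhs = (W * a[:, None]).T @ W
gamma = np.linalg.solve(lhs, (W * a[:, None]).T @ d)  # approximately (.407, -.157)
cov = np.linalg.inv(lhs)                   # [sum (v_i + tau2)**-1 W_i W_i']**-1
z = gamma / np.sqrt(np.diag(cov))          # ratios of estimates to standard errors
```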
Fixed effects: hypothesis testing. We can also formally test hypotheses about
γ₀ and γ₁, for example,

H₀₁: γ₀ = 0,
H₀₂: γ₁ = 0.
The test statistics for these hypotheses are the ratios of the coefficients to
their estimated standard errors. The latter are computed from the square roots
of the diagonal elements of
[Σ (v_i + τ²*)⁻¹ W_i W_i']⁻¹.
These ratios have asymptotic unit normal distributions under the null hypoth-
eses H₀₁ and H₀₂. For the teacher expectancy effect data, we find that for H₀₁,

z = .407/.087 = 4.68,

and for H₀₂,

z = -.157/.036 = -4.39.
Since both z values exceed the 99.9% critical values of the standard normal
distribution, we reject both H₀₁ and H₀₂. We conclude that the average effect
of expectancy for studies with no prior teacher-pupil contact is greater than 0
TABLE II
Estimation and Hypothesis Testing Concerning Fixed Effects:
Intercept and Effect of Prior Contact
Estimation

γ* = [Σ (v_i + τ²*)⁻¹ W_i W_i']⁻¹ Σ (v_i + τ²*)⁻¹ W_i d_i = (.407, -.157)'

Hypothesis Testing
and that the size of the effect diminishes for studies in which such prior contact
occurred.3 See Table II.
³ Again the computed z values are rough approximations (see footnote 2). The
sensitivity analysis (Figure 4) again provides a check on this approximation.
* estimate the residual variation among the effect parameters, noting the
reduction in uncertainty made possible by accounting for study character-
istics;
* test the hypothesis that the residual parameter variance is 0;
* estimate and test hypotheses about the fixed effects, that is, the effects of
between-study predictors;
* calculate improved empirical Bayes estimates of individual study effect
sizes, exploring, for instance, the size of the maximum effects detected in
a set of studies; and
* use the likelihood function to examine the sensitivity of all substantive
inferences to likely errors in variance component estimation.
interact with any of these factors, averaging can be misleading (see Light,
1983, for discussion).
A more sensible approach is to begin by assuming that the effect of a treat-
ment is influenced by a host of factors. Under this assumption the analyst has
several options. First, one might assume that the variability among the true
study effect sizes is entirely random, that is, unexplainable. Then Hedges'
(1983) random effects model is an appropriate way to proceed and can be ex-
tended to include empirical Bayes estimates of individual effects (DerSimonian
& Laird, 1983; Rubin, 1981).
Second, one might assume that the variability among the true effects is
entirely explainable. Then the fixed effects methods of Hedges (1982b) ap-
ply. The mixed model presented here incorporates both possibilities (effects
entirely random, effects entirely explainable) and also provides workable
methods when effects are partly explainable. It should be emphasized that
when parameter variance is small, the mixed model produces results similar to
those of the fixed effects model, and the latter is computationally simpler. The
advantages of the present approach are its generality, its improved estimation
of individual effects, and the sensitivity analysis it facilitates. To gain these
advantages requires distributional assumptions concerning the unobservable
random effects. The robustness of the method to violations of these assumptions
is an important topic for future research. Nevertheless the mixed model ap-
proach, with fixed and random treatment effects, offers promise as a tool for
modeling diversity in a stream of research results.
APPENDIX
Maximum Likelihood Estimation
of Mixed-Model Meta-Analysis via the EM Algorithm
To derive maximum likelihood estimates of v_i and τ² by means of the EM algorithm
(Dempster, Laird, & Rubin, 1977), we rewrite the within- and between-study models
from Equations 1 and 5 as follows:
d = δ + e,    e ~ N(0, V),    (A1)

and

δ = Wγ + u,    u ~ N(0, τ²I).    (A2)
δ* = Λd + (I - Λ)Wγ*,    (A3)

V* = ΛV + (I - Λ)S(I - Λ)',    (A4)

γ* = (W'Δ⁻¹W)⁻¹ W'Δ⁻¹ d,

Var(γ*) = (W'Δ⁻¹W)⁻¹,

where

Λ = diag(λ_i),    λ_i = τ²/(τ² + v_i),

Δ = diag(v_i + τ²),

and

S = Var(Wγ*) = W(W'Δ⁻¹W)⁻¹W'.
For the derivation of the results above, see Raudenbush (1984a). From Equations
A3 and A4 it follows that U = δ - Wγ has a posterior distribution with mean

U* = Λ(d - Wγ*)

and variance

V_U = ΛV + ΛSΛ'.

We further utilize the fact that V has diagonal elements v_i.
The logic of the EM algorithm works as follows. Suppose δ were known. Then v_i
also would be known from Equation 4.
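The iteration above can be sketched for the unconditional model (W a single
column of ones, so Wγ is the grand mean), holding the v_i fixed at their Table I
values; the scalar posterior-variance expression below is our specialization of
the matrix formulas A3-A4, a sketch rather than the authors' own program:

```python
import numpy as np

d = np.array([0.03, 0.12, -0.14, 1.18, 0.26, -0.06, -0.02, -0.32, 0.27,
              0.80, 0.54, 0.18, -0.02, 0.23, -0.18, -0.06, 0.30, 0.07, -0.07])
se = np.array([0.125, 0.147, 0.167, 0.373, 0.369, 0.103, 0.103, 0.220, 0.164,
               0.251, 0.302, 0.223, 0.289, 0.290, 0.159, 0.167, 0.139, 0.094, 0.174])
v = se ** 2
k = len(d)

tau2 = 0.05                                   # starting value
for _ in range(5000):
    a = 1.0 / (v + tau2)
    mu = np.sum(a * d) / np.sum(a)            # gamma* for a one-column W
    lam = tau2 / (tau2 + v)                   # Lambda = diag(lam_i)
    u_star = lam * (d - mu)                   # posterior means U* = Lambda(d - W gamma*)
    # diagonal of Lambda V + Lambda S Lambda'; here S_ii = 1 / sum(a)
    u_var = lam * v + lam ** 2 / np.sum(a)
    new_tau2 = np.mean(u_star ** 2 + u_var)   # M-step: E[sum u_i**2] / k
    if abs(new_tau2 - tau2) < 1e-12:
        tau2 = new_tau2
        break
    tau2 = new_tau2
```

On this data the iteration converges to a small positive τ², in the neighborhood
of the .019 reported for the unconditional model in the text.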
Acknowledgments
The authors wish to acknowledge partial support for this work from a Spencer
Foundation seed grant to Harvard University. We also wish to thank Larry Hedges and
Richard Light for their extensive comments on earlier drafts of the paper.
References
Deeley, J. J., & Lindley, D. V. (1981). Bayes empirical Bayes. Journal of the American
Statistical Association, 76, 833-841.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from
incomplete data via the EM algorithm (with discussion). Journal of the Royal Statis-
tical Society, Series B, 39, 1-38.
Dempster, A. P., Rubin, D. B., & Tsutakawa, R. K. (1981). Estimation in covariance
components models. Journal of the American Statistical Association, 76, 341-353.
DerSimonian, R., & Laird, N. M. (1983). Evaluating the effect of coaching on SAT
scores: A meta-analysis. Harvard Educational Review, 53 (1), 1-15.
Efron, B., & Morris, C. (1975). Data analysis using Stein's estimator and its gener-
alizations. Journal of the American Statistical Association, 70, 311-319.
Glass, G. V (1976). Primary, secondary, and meta-analysis of research. Educational
Researcher, 5, 3-8.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and
related estimators. Journal of Educational Statistics, 6 (2), 107-128.
Hedges, L. V. (1982a). Estimation of effect size from a series of independent experi-
ments. Psychological Bulletin, 92, 490-499.
Hedges, L. V. (1982b). Fitting continuous models to effect size data. Journal of
Educational Statistics, 7 (4), 245-270.
Hedges, L. V. (1983). A random effects model for effect size. Psychological Bulletin,
93, 388-395.
Hedges, L. V., & Olkin, I. (1983). Regression models in research synthesis. American
Statistician, 37 (2), 137-140.
Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating
research findings across studies. Beverly Hills: Sage.
James, W., & Stein, C. (1961). Estimation with quadratic loss. Proceedings of the
Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1.
Berkeley: University of California Press.
Light, R. J. (Ed.). (1983). Evaluation studies review annual. Beverly Hills: Sage.
Light, R. J., & Pillemer, D. B. (1984). Summing up: The science of reviewing research.
Cambridge, MA: Harvard University Press.
Lindley, D. V., & Smith, A. F. M. (1972). Bayes estimates for the linear model.
Journal of the Royal Statistical Society Series B, 34, 1-41.
Mason, W. M., Wong, G. Y., & Entwistle, B. (1984). Contextual analysis through the
multi-level linear model. Sociological Methodology. San Francisco, CA: Jossey-
Bass.
Authors