
STATISTICS IN MEDICINE

Statist. Med. 18, 2331–2341 (1999)

DIAGNOSTIC STATISTICAL PROCEDURES IN MEDICAL META-ANALYSES

INGRAM OLKIN*
Department of Statistics and School of Education, Stanford University, Stanford, CA 94305, U.S.A.

SUMMARY
The number of published meta-analyses in medicine has grown phenomenally, to the point where over 300
are published yearly. Because meta-analyses tend to lead to policy decisions, it is
extremely important that the analyses be robust and that alternative analyses yield consistent results. We
herein provide a discussion of diagnostic statistical procedures. Copyright © 1999 John Wiley & Sons, Ltd.

1. INTRODUCTION
Meta-analysis has been defined in a variety of ways. One popularized definition (New York Times,
1994) is: 'A meta-analysis aims at gleaning more information from existing data by pooling the
results of many smaller studies and applying one or more statistical techniques. The benefits or
hazards that might not be detected in small studies can be found in meta-analysis that uses data
from thousands of patients.'
Why has meta-analysis become so popular? I believe that there are several reasons. First, and
perhaps foremost, is the information explosion that has occurred between the 1940s and the
present. In most fields the number of journals has increased approximately ten- to twelve-fold.
For example, the number of biomedical journals has increased from 2300 to over 25,000. This
translates to over 9 million articles per year. A similar escalation has occurred in other
fields such as psychology, education and mathematics.
A second impetus to meta-analysis in medicine is the movement towards evidence-based health
care, both in providing care and in establishing public policy. Meta-analysis is
particularly appealing because it makes use of existing information. The Cochrane Collaboration
is a move towards implementing data centres that provide systematic reviews in a wide range of
medical areas, for example, acute respiratory infections, colorectal cancer, depression, hypertension,
pregnancy and childbirth, and schizophrenia.
To illustrate the growth and breadth of meta-analytic studies in medicine, in Table I we list by
area the meta-analyses for 1996 found in MEDLINE. We thus find approximately 300 meta-analytic
studies that encompass the results of over 3000 articles.
There are two distinct, but related, parts of a meta-analysis: the pre-statistical and the statistical.



Table I. Meta-analyses for 1996

Types Number

Cardiovascular diseases 26
Gastro-intestinal diseases 12
Hypertension 11
Infectious diseases 19
Joint and bone diseases 13
Neurologic disorders 13
Oncology 32
Pregnancy 16
Psychological techniques (non-pharmacological) 28
Other 59
Total 229

Epidemiological meta-analysis 42
Diagnostic 22
Cost effectiveness 13

The pre-statistical part involves searching, coding, evaluating coding decisions, determining
variables, extracting data, determining inclusion–exclusion criteria, dealing with missing data,
judging research quality and managing databases. This part of the project can be, and often is,
formidable. The entire analysis depends heavily on how bias, quality, equivalence of measures,
different doses, compliance, drop-outs, and so on, will be resolved. It is in this portion of
a meta-analysis that decisions have to be made on how to treat studies that have dissimilarities
as well as similarities. Indeed, it is here that different analysts may make different decisions,
and thereby arrive at somewhat different conclusions. One would expect, however, that re-analyses
might differ somewhat in the estimates of risk, but not in direction or in effectiveness. In
general, it is most helpful to plan a meta-analysis with a three-pronged team consisting of
a medically knowledgeable member (that is, a specialist in the substantive aspects), an epidemiologist,
and a biostatistician. Of course, having several members in each category is preferable. There are
currently a number of sources that discuss the pre-statistical portion of a meta-analytic project
(for example, see Light and Pillemer, Lang and Secic, and Cooper and Hedges).

The fact that the pre-statistical problems loom large does not lessen the need to pay
considerable attention to the statistical issues. It is in this context that I confine myself to a set of
guiding principles that focus on statistical diagnostic measures that should be considered. Diagnostic
procedures are not automatic, and their interpretation is an art. If a treatment or intervention is
to have an effect on health care practice, then we need to be confident of the conclusions.

2. DIAGNOSTIC PRINCIPLES
The following are some basic principles that should be considered in the planning and the
statistical portion of a meta-analysis.

2.1. Consistency of results

Do not rely on the results of a single analysis. Rather, carry out alternative statistical analyses.
Although the resulting estimates might differ quantitatively to some degree, they should provide
qualitatively similar conclusions about the effectiveness of the treatment or intervention.


Table II. Twelve studies on the effectiveness of beta-blockers

Beta-blocker Mean age Beta-blocker Mean age

Bucindolol 55 Carvedilol 52
Bucindolol 54 Carvedilol 58
Bucindolol 52 Metoprolol 50
Carvedilol 67 Metoprolol 50
Carvedilol 55 Metoprolol 63
Carvedilol 51 Metoprolol 49

The following are some examples of alternative analyses:


(i) "xed e!ects models and random e!ects models;
(ii) subgroup analyses and analysis of variance;
(iii) multivariate analyses and Bonferroni adjustments for multiple univariate analyses;
(iv) Bayesian analysis and Bayesian hierarchical models.
To illustrate alternative analyses using subgroups and analysis of variance, consider a set of 12
studies on the effectiveness of beta-blockers with age as a covariate, as shown in Table II. One
analysis, without taking account of age or type of beta-blocker, would give an overall mean effect,
after which subsequent subgroup analyses with respect to age or beta-blocker could be made. An
alternative analysis is a two-way unbalanced analysis of variance with type of beta-blocker as
the row effects and age (less than 55, 55 and over) as the column effects. This analysis then gives
a mean effect, a beta-blocker effect, an age effect and an interaction effect. It also obviates the need
for multiple comparison adjustments.
A word of warning is needed if an analysis of variance is to be used. The assumptions of
independence, an additive model and approximate normality are satisfied for most measures of
effectiveness of a treatment versus a control, such as the difference of proportions, the difference of the
logarithms of proportions, or odds ratios. However, the assumption of homoscedasticity is
violated. To use an analysis of variance model, we should first stabilize the variances by using the
variance-stabilizing transformation z = 2 arcsin √p, for which the asymptotic variance is 1/n.
A consequence of this transformation is that we have created an unbalanced analysis of variance
for which the variances are known. The analysis for such a design is that of weighted least
squares.
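As a minimal sketch of this approach in Python, the fragment below transforms each study's proportions with z = 2 arcsin √p and fits the unbalanced two-way model by weighted least squares with the known variances. The drug types and mean ages echo Table II; the event proportions and sample sizes are hypothetical placeholders, since the table does not report them.

    import numpy as np

    # Drug type and mean age are taken from Table II; proportions and sample
    # sizes below are hypothetical, purely for illustration.
    drug = np.array([0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2])  # 0 bucindolol, 1 carvedilol, 2 metoprolol
    age = np.array([55, 54, 52, 67, 55, 51, 52, 58, 50, 50, 63, 49])

    rng = np.random.default_rng(0)
    n_t = n_c = np.full(12, 150)                 # hypothetical sample sizes
    p_t = rng.uniform(0.08, 0.14, 12)            # hypothetical treatment event proportions
    p_c = rng.uniform(0.15, 0.22, 12)            # hypothetical control event proportions

    # Variance-stabilizing transformation z = 2 arcsin(sqrt(p)) with var(z) ~ 1/n,
    # so each study's effect z_t - z_c has known variance 1/n_t + 1/n_c.
    effect = 2 * np.arcsin(np.sqrt(p_t)) - 2 * np.arcsin(np.sqrt(p_c))
    w = 1.0 / (1.0 / n_t + 1.0 / n_c)            # known precisions serve as WLS weights

    # Unbalanced two-way layout (no interaction, for brevity): drug rows, age columns.
    X = np.column_stack([
        np.ones(12),                             # overall mean effect
        (drug == 1).astype(float),               # carvedilol vs. bucindolol
        (drug == 2).astype(float),               # metoprolol vs. bucindolol
        (age >= 55).astype(float),               # age effect: 55 and over vs. under 55
    ])

    # Weighted least squares with known variances: solve (X'WX) b = X'W y.
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ effect)
    print(dict(zip(["mean", "carvedilol", "metoprolol", "age 55+"], np.round(beta, 3))))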
The arcsin transformation for proportions is similar in spirit to the transformation of correlations
to Fisher's z. In the case of the variance-stabilizing transformation for correlations, Fisher
states three advantages: (i) the variance is independent of the parameter; (ii) the distribution of
z tends to normality rapidly as the sample size is increased; (iii) the distribution of the sample
correlation changes in form as the population parameter changes, whereas the distribution
of z is nearly constant. These advantages remain true for proportions as well as for
correlations. Although the arcsin transformation is somewhat alien to many practitioners, it is
easy to use and has the distinct advantage of not requiring the variance to be estimated, as is the case for
other measures.
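For reference, the two transformations and their standard large-sample variances, written here in LaTeX notation (standard results, not quoted from the text above), are:

    z_r = \tfrac{1}{2}\log\frac{1+r}{1-r}, \qquad \operatorname{var}(z_r) \approx \frac{1}{n-3};
    \qquad
    z_p = 2\arcsin\sqrt{\hat p}, \qquad \operatorname{var}(z_p) \approx \frac{1}{n}.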
The use of transformations was studied early on in the development of analysis of
variance methodology. Bartlett discusses the square root transformation, the inverse sine or
angular transformation, and the probit transformation.


Table III. Relative risk for 35 studies on passive smoking effects

                        Relative risk    Confidence interval

Fixed effects               1.19            (1.02, 1.38)
Random effects              1.20            (1.07, 1.34)
Bayesian hierarchical       1.22            (1.08, 1.37)

Cochran discusses the square root transformation and the angular transformation. Because p and
1 - p arise in the binomial distribution it was natural to consider p = sin²θ and 1 - p = cos²θ.
Cochran's summary states, 'This paper discusses the theoretical basis for the use of the square
root and inverse sine transformations in analyzing data whose experimental errors follow the
Poisson and binomial frequency laws respectively.' Curtiss provides the necessary limiting
distribution of the inverse sine statistic.
An analysis that shows consistency of results is provided by Givens et al., who obtained data
from 35 studies concerning environmental tobacco smoke with a comparison of alternative
analyses (Table III). The authors show a confidence interval for each of the 35 studies and also
display a combined effect size confidence interval for the EPA fixed effects meta-analysis
(based on U.S. studies only), the standard random effects meta-analysis, the standard Bayesian
meta-analysis, and the Bayesian meta-analysis accounting for potential bias. For these data
there are only very small discrepancies in the estimates, so that we obtain almost complete consistency
of results using alternative methods of analysis. This consistency is very comforting to the
researcher.
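As an illustration of comparing alternative analyses, the following Python sketch pools a set of hypothetical log relative risks (illustrative values, not the Givens et al. data) by the fixed effects inverse-variance method and by a DerSimonian–Laird random effects method, reporting both on the relative risk scale as in Table III.

    import numpy as np

    # Hypothetical per-study log relative risks and within-study variances.
    y = np.array([0.25, 0.10, 0.32, -0.05, 0.18, 0.22, 0.08])
    v = np.array([0.020, 0.015, 0.050, 0.030, 0.010, 0.025, 0.040])

    def fixed_effects(y, v):
        # Inverse-variance weighted mean and its variance.
        w = 1.0 / v
        return np.sum(w * y) / np.sum(w), 1.0 / np.sum(w)

    def random_effects(y, v):
        # DerSimonian-Laird moment estimate of the between-study variance tau^2,
        # then an inverse-variance mean with the augmented weights.
        w = 1.0 / v
        ybar = np.sum(w * y) / np.sum(w)
        Q = np.sum(w * (y - ybar) ** 2)
        c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
        tau2 = max(0.0, (Q - (len(y) - 1)) / c)
        w_star = 1.0 / (v + tau2)
        return np.sum(w_star * y) / np.sum(w_star), 1.0 / np.sum(w_star)

    for name, (est, var) in [("fixed", fixed_effects(y, v)), ("random", random_effects(y, v))]:
        lo, hi = est - 1.96 * np.sqrt(var), est + 1.96 * np.sqrt(var)
        # Report on the relative risk scale, as in Table III.
        print(f"{name:6s} RR = {np.exp(est):.2f}  95% CI ({np.exp(lo):.2f}, {np.exp(hi):.2f})")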

2.2. Sensitivity analyses


I think of sensitivity analyses as a set of robustness procedures designed to determine whether the
results hold up under modest perturbations of the data. It is essential that policy conclusions not
be reversed by the results of a few cases.
1. Small cell counts can create changes in results. Redo the analyses with small modifications
in the cell counts. For example, would the results vary if a proportion is changed from 1/20
to 2/20 or 0/20? The resulting conclusions should not be too different.
2. Some measures of the difference between treatment and control are sensitive to perturbations,
whereas others are relatively stable. For example, if p_C = 0.02 and p_T = 0.01, then
√p_C - √p_T = 0.041 and log(p_C/p_T) = 0.69. A change of p_T to 0.008 changes the square root
difference to 0.052 and the difference of logarithms to 0.92. To determine the importance of
such changes we need to refer to the distributions of these measures. Under the null
hypothesis of no difference between treatment and control, the difference in square roots
is approximately normally distributed with mean zero and variance
(1 - p_T)/(4n_T) + (1 - p_C)/(4n_C), whereas the difference in logarithms has approximate variance
(1 - p_T)/(p_T n_T) + (1 - p_C)/(p_C n_C).
3. Drop one study at a time and obtain a confidence interval for the remaining studies. If there
are k studies, this method requires redoing the analysis (k + 1) times. The resulting display
will show whether a single study is particularly influential. This type of analysis is a good
check for consistency (a numerical sketch appears at the end of this section).


Figure 1. Plot of ordered p-values p_(i) versus their rank i

4. Plot, plot, plot whenever and whatever you can. 'A picture is worth a thousand words'. Plots
often exhibit patterns that tabular displays do not show.
(a) Plot funnel plots to help detect heterogeneity. This display is most useful when there is
an ample number of studies. Funnel plots offer a visualization of effects, but suffer from
a lack of inferential methods.
(b) Plot p-values against ranks. Under the null hypothesis these should fall on a line (see
Figure 1).
(c) Plot the combined results of the k studies with one omitted, plus the overall effect (see
Figure 2).
(d) Plot cumulative results in which the cumulation is based on different covariates such as
chronology, sample size, effect size, and so on. (For a discussion of cumulative meta-analysis
and examples, see Lau et al.)
(e) Plot regressions of effect size against covariates. For example, the time that thrombolytic
therapy was administered is very important. In Figure 1 the x-axis represents time and
the y-axis represents effect size. Such a plot will show whether time alters the effectiveness
of the therapy. In other examples, age may be an important covariate, or length
of time used.
(f) Another type of 'regression' is to plot effect size against control rate. When control rates
are very low, that is, a low incidence (a healthy population), then the effects will not be
too pronounced. For more details on this see Schmid et al.
(g) Compare the results of large studies with those of small studies. Is there a difference? An
analysis of studies on thrombolytic therapy in acute myocardial infarction yielded the
following odds ratios using fixed and random effects models:

12 studies with sample sizes over 500:

fixed effects 0.53 (0.39–0.72)
random effects 0.59 (0.42–0.81)


Figure 2. Confidence intervals and relative risks for the 35 ETS studies, the EPA fixed effects meta-analysis (based on
U.S. studies only), the standard random effects meta-analysis, the standard Bayesian meta-analysis and the Bayesian
meta-analysis accounting for potential publication bias (with permission from Statistical Science)

32 studies with sample sizes under 100:

fixed effects 0.75 (0.70–0.80)
random effects 0.75 (0.70–0.80).

Here we find close agreement between the fixed and random effects models, some
discrepancy in mean effect between large and small studies, but consistency in terms of
direction. Thus, the issue here is to explain the sample size discrepancy, not potential
heterogeneity.
(h) P-values have a bad reputation because they focus on a single overall number and do
not lead to confidence intervals. Although I agree with this view in a general way,
p-value analyses are very useful in that they provide simple non-parametric diagnostic
procedures. The Fisher procedure of combining p-values, comparing -2 Σ_{i=1}^{k} log p_i
with a chi-square distribution with 2k degrees of freedom, permits a variety of diagnostic
procedures (also sketched at the end of this section).
5. Be conservative. The notion of being conservative in science is at the heart of testing null
hypotheses in that you do not prove, but disprove, the null hypothesis. In an article (New
York Times, October 12, Section 4, p. 6) there is a quotation from a molecular biologist on
this point: 'It's a very peculiar psychological thing that goes on. If you're a good scientist,
you should always be doing experiments to attempt to discredit your own theory. But you
have this psychological block: you don't want there to be an experiment that disagrees.'
Some good advice from another scientist: 'It's OK to sleep with a hypothesis, but you should
never become married to one.'
It is often stated that 5–10 per cent of a data set is flawed in some way. This suggests that
some trimming at the extremes might be appropriate. (For example, omit 1 extreme study
out of 10, and 1–2 out of 20 studies.) Omission of some of the most significant results is being
conservative, but one has to be careful that this does not mask a true effect. Does the
trimming alter the results in terms of direction?
A note of caution is needed here because omission of the most significant result means
omission of an order statistic. Thus, the analysis will require alternative theory for using
order statistics (see Saner).
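The following Python sketch illustrates two of the diagnostics above on hypothetical data: the leave-one-out confidence intervals of point 3 and the Fisher combination of p-values of point 4(h). The effect sizes, variances and p-values are invented for illustration.

    import numpy as np
    from scipy import stats

    # Hypothetical per-study effects (log odds ratios), variances and p-values.
    y = np.array([0.40, 0.35, 0.10, 0.55, 0.30, 0.45, 0.20, 0.38])
    v = np.array([0.04, 0.03, 0.06, 0.05, 0.02, 0.04, 0.07, 0.03])
    p = np.array([0.01, 0.03, 0.20, 0.004, 0.06, 0.02, 0.15, 0.03])
    k = len(y)

    def pooled(y, v):
        # Fixed effects (inverse-variance) estimate with a 95% confidence interval.
        w = 1.0 / v
        est = np.sum(w * y) / np.sum(w)
        se = np.sqrt(1.0 / np.sum(w))
        return est, est - 1.96 * se, est + 1.96 * se

    # Point 3: drop one study at a time and recompute the pooled estimate,
    # giving (k + 1) analyses in all.
    print("all studies:", np.round(pooled(y, v), 3))
    for i in range(k):
        keep = np.arange(k) != i
        print(f"omit study {i + 1}:", np.round(pooled(y[keep], v[keep]), 3))

    # Point 4(h): Fisher's combination of p-values, -2 * sum(log p_i) referred
    # to a chi-square distribution with 2k degrees of freedom.
    statistic = -2.0 * np.sum(np.log(p))
    combined_p = stats.chi2.sf(statistic, df=2 * k)
    print(f"Fisher chi-square = {statistic:.2f} on {2 * k} df, combined p = {combined_p:.4f}")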

2.3. Statistical theory


Although many of the procedures currently in use in meta-analysis are standard procedures,
questions continue to arise that require further study. Indeed, a well done meta-analysis is almost
certain to require consideration of methods for which there is little theory. In this
section I present a number of such studies in which I have been involved. Thus, this list is a very
personal one. Many additional examples could be provided by other authors.
1. A fundamental question in the analysis of 2×2 tables is how to measure the effectiveness of
a treatment versus a control. Measures of effectiveness that are in some use are:
(a) √p_C - √p_T;
(b) p_C - p_T or (b') 2 arcsin √p_C - 2 arcsin √p_T;
(c) log p_C - log p_T;
(d) log[p_C/(1 - p_C)] - log[p_T/(1 - p_T)] or (d') h(p_C) - h(p_T),
where h(p) = (2p - 1)√[p(1 - p)] + arcsin(2p - 1).
Why do we need different measures of effect? Each measure has some advantages and some
disadvantages, so there is no 'best' distance measure. (A numerical sketch of these measures
appears at the end of this list.)
A geographical atlas will display multiple types of projection: Mercator, conic, azimuthal and
sinusoidal. Each projection exhibits and emphasizes different relationships between countries.
The Richter scale for measuring earthquakes is a universally used logarithmic scale (base 10);
for example, a value of 100 is 2 on the Richter scale. One can move readily from one scale to the
other. However, averages can no longer be translated: the average of 2 and 4 on the Richter scale
does not translate to the average of 100 and 10,000, and each scale provides a different perception
of heterogeneity. This phenomenon is recognized in the economic literature, wherein salaries are
often reported on a square root or logarithmic scale.
In the case of the measures (a) to (d), only (b) gives an unbiased estimate of the corresponding
population parameter. Measures (b') and (d') have variances independent of the parameters, that is,
they are variance-stabilized measures. Some measures are more sensitive to perturbations,
and some show more heterogeneity than others.
What is unknown is the degree of bias of these measures and their mean square errors. The
approach to normality is known to some degree, but some questions remain unanswered.
Fisher provides graphs of the density of a correlation and the variance-stabilized version; the
difference in the approach to normality is marked. Some simulations to study bias and mean
square error for these measures, jointly with L. V. Hedges, are currently in progress.
2. The number needed to treat (NNT) is a new measure that has some appeal. It is defined as
the number of patients who must be treated in order to prevent one adverse event. In this
context the ideal NNT is one, because that would be the situation in which all patients who
receive the treatment benefit, and all those in the control group do not benefit. The sample
NNT is 1/(p_T - p_C). (For a more detailed discussion of the rationale and use of this measure
see Cook and Sackett, McQuay et al., Rembold, and McQuay and Moore.) Although
a confidence interval for NNT can be obtained directly from a confidence interval for
proportions, there are alternative methods for obtaining a confidence interval that may have
shorter expected length, or better coverage probabilities. This is being studied jointly with
L. J. Gleser (see the sketch at the end of this list for the point estimate).
3. When proportions measure the occurrence of rare events, how should we compare the
binomial and Poisson model? This comparison in the context of meta-analysis has not been
studied.
4. It has long been recognized that there may be a bias due to the fact that significant results
have a better chance of being published. A number of models have been presented to deal
with this problem. Rosenthal defined a fail-safe number as the number of non-significant
studies required to reverse a significant meta-analytic result. Modifications and extensions
of this model are considered by Iyengar and Greenhouse. These models are diagnostic and
do not provide inferential procedures. An inferential procedure was obtained by Gleser and
Olkin.
On the assumption that the k observed studies are the most significant from a total of
N + k studies, where N is the number of unreported studies, we can obtain a maximum
likelihood estimator N̂ of N: N̂ = [k(1 - p_(k))/p_(k)], where p_(k) is the largest observed significance
value. More important, we can obtain a lower confidence bound for N. For example, with
k = 7 studies, and a largest p-value of 0.095, the lower confidence bound for N is 30. More
general models provide other estimates. (The maximum likelihood calculation is sketched at the
end of this list.)
5. Meta-analysis is normally carried out using literature summary data (MAL). In some cases
individual patient data may be available from which a meta-analysis can be carried out
(MAP).
Clearly, a first question is whether a MAL and a MAP will yield different results.
Olkin and Sampson show that the meta-analysis estimators based on data from
k independent and homogeneous studies are the least-squares estimators of treatment
contrasts in the two-way fixed effects analysis of variance model with no interaction. Thus
the two methods, MAL and MAP, yield the same analyses. Of course, a MAP will contain
other data that are unavailable in a MAL.


A surprising by-product of this analysis is the fact that the meta-analysis approach
provides an explicit representation of treatment contrasts in the unbalanced two-way model.
We note that Mathew and Nordström show that a similar result applies to the case in
which there are covariates.
6. A comparison of a meta-analysis using literature summary data (MAL) and a meta-analysis
from individual patient data (MAP) using study as a factor was carried out to determine
empirically the potential differences between the methods.
The following are some of the conclusions from a study of the association between ovarian
cancer and 10 reproductive variables and contraceptive use:
(i) A MAL can incorporate studies for which individual patient data are unavailable.
(ii) A MAP can scan for subgroup analyses that may not be available in summary data.
(iii) A MAP can use a logistic regression model, which is generally not powerful with
summary data.
(iv) A MAP is costly and time consuming.
The MAL for these data took over 1000 hours. The cost comparison showed that the MAP was
over 5 times as costly as the MAL.
The results for hospital control studies with 4- to 5-year oral contraceptive use yielded an
odds ratio of 0.69 for the MAP and 0.64 for the MAL.
Because individual patient data provide measures of many characteristics, they often lead
to exploratory analyses. Of course, findings from an exploratory analysis can lead to
significantly positive results, which has been labelled a 'self-fulfilling prophecy'. Thus, care
must be exercised to distinguish between exploratory and confirmatory analyses.
A MAP will normally generate more tests than a MAL, so it is important to control
the type I error in multiple tests (see point 7).
7. Multiple comparison procedures should be employed more often. There is now a variety of
alternatives to the use of Bonferroni inequality corrections. See Hochberg and Tamhane
for a general discussion, and Shaffer for an exposition of multiple comparison methods in
applications and for more recent developments and references. To date few of the more
recently developed methods have been used.
8. Almost all studies will examine several correlated outcome measures. These can be dealt with
using Bonferroni corrections or by a multivariate analysis. To date few multivariate analyses
have been used, and a majority of examined meta-analyses ignore correlational effects.
In some cases there will be a single control with unrelated multiple endpoints. However,
because each endpoint is compared to the same control, the ensuing analysis will involve
correlated effect measures. An analysis for dealing with such a design is provided by Gleser
and Olkin.
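To make several of the quantities in this section concrete, the following Python sketch evaluates the effect measures (a) to (d) of point 1, the sample NNT of point 2, and the maximum likelihood estimate of the number of unreported studies of point 4. The proportions are hypothetical; the fail-safe inputs (k = 7, largest p-value 0.095) are those quoted in the text. The lower confidence bound for N mentioned above requires additional theory and is not computed here.

    import numpy as np

    # Hypothetical 2x2 summary: control and treatment event proportions.
    p_c, p_t = 0.18, 0.12

    # Effect measures (a)-(d) from point 1.
    measures = {
        "(a) sqrt difference": np.sqrt(p_c) - np.sqrt(p_t),
        "(b) risk difference": p_c - p_t,
        "(b') arcsin difference": 2 * np.arcsin(np.sqrt(p_c)) - 2 * np.arcsin(np.sqrt(p_t)),
        "(c) log relative risk": np.log(p_c) - np.log(p_t),
        "(d) log odds ratio": np.log(p_c / (1 - p_c)) - np.log(p_t / (1 - p_t)),
    }
    for label, value in measures.items():
        print(f"{label}: {value:.3f}")

    # Point 2: sample number needed to treat, the reciprocal of the difference in
    # proportions (absolute value used so the sign convention does not matter).
    print("NNT =", round(1.0 / abs(p_c - p_t), 1))

    # Point 4: maximum likelihood estimate of the number N of unreported studies,
    # assuming the k observed studies are the most significant of N + k.
    k, p_max = 7, 0.095                  # largest observed p-value, as in the text
    n_hat = int(k * (1 - p_max) / p_max)
    print("estimated number of unreported studies:", n_hat)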

In this section I have listed a number of theoretical issues that have arisen in the conduct of meta-analysis.
To aid the reader it may be useful to mention several other papers that have appeared
recently. The paper by Givens et al. provides a discussion of publication bias; the references
therein and in the commentary papers cover a spectrum of the field. Vevea and Hedges present a
general linear model for estimating effect size when there may be publication bias. The edited book
by Berry provides a number of additional papers and references. In particular, the paper by DuMouchel
deals with Bayesian meta-analysis. The paper by Abrams and Sansó gives a Bayesian approach
to random effects models. A discussion of heterogeneity appears in Hardy and Thompson.


3. DISCUSSION
The theory for meta-analysis is reasonably well established. There are statistical procedures,
guidelines, checklists and software to help the analyst. All of these are designed to simplify
carrying out a meta-analysis. However, there still remain uncertainties that need to be examined.
There are many decision points and a variety of alternative ways to analyse the data. A good
meta-analysis will require looking at the data from different perspectives to increase the validity
of the results. As in many other contexts, data analysis and diagnosis is an art. We here provide
a set of principles and diagnostics that are indicative of an approach to a meta-analytic study. The
guiding principle is to view a meta-analysis as you would a court case, with opposing attorneys
scrutinizing every step and every conclusion.

REFERENCES
1. Kupelnick, B. 'Compilation of meta-analyses for the year 1996 (provisional)', Online Journal of Current Clinical Trials, May 1997.
2. Light, R. J. and Pillemer, D. B. Summing Up: The Science of Reviewing Research, Harvard University Press, Cambridge, MA, 1984.
3. Lang, T. A. and Secic, M. How to Report Statistics in Medicine, American College of Physicians, Philadelphia, PA, 1997.
4. Cooper, H. and Hedges, L. V. (eds) The Handbook of Research Synthesis, Russell Sage Foundation, New York, 1993.
5. Fisher, R. A. Statistical Methods for Research Workers, 10th edn, Oliver and Boyd, Edinburgh, 1948.
6. Bartlett, M. S. 'The square root transformation in the analysis of variance', Journal of the Royal Statistical Society Supplement, 3, 68–78 (1936).
7. Bartlett, M. S. 'The use of transformations', Biometrics Bulletin, 3, 39–53 (1947).
8. Cochran, W. G. 'The analysis of variance when experimental errors follow the Poisson or binomial laws', Annals of Mathematical Statistics, 11, 335–347 (1940).
9. Curtiss, J. H. 'On transformations used in the analysis of variance', Annals of Mathematical Statistics, 14, 107–122 (1943).
10. Givens, G. H., Smith, D. D. and Tweedie, R. L. 'Publication bias in meta-analysis: a Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate (with comments and rejoinder)', Statistical Science, 12, 221–250 (1997).
11. Lau, J., Antman, E. M., Jimenez-Silva, J., Kupelnick, B., Mosteller, F. and Chalmers, T. C. 'Cumulative meta-analysis of therapeutic trials for myocardial infarction', New England Journal of Medicine, 327, 248–254 (1992).
12. Schmid, C. H., Lau, J., McIntosh, M. W. and Cappelleri, J. C. 'An empirical study of the effect of the control rate as a predictor of treatment efficacy in meta-analysis of clinical trials', Statistics in Medicine, 17, 1923–1942 (1998).
13. Saner, H. 'Approximations to a conservative inverse chi-square procedure for combining p-values in integrative research', Statistics and Probability Letters, 15, 215–222 (1992).
14. Cook, R. J. and Sackett, D. L. 'The number needed to treat: a clinically useful measure of treatment effect', British Medical Journal, 310, 452–454 (1995).
15. McQuay, H. J., Tramèr, M., Nye, B. A., Carroll, D., Wiffen, P. J. and Moore, R. A. 'A systematic review of antidepressants in neuropathic pain', Pain, 68, 217–227 (1996).
16. Rembold, C. M. 'A number needed to treat (NNT) analysis of the prevention of myocardial infarction and death by dyslipidemic therapy', Journal of Family Practice, 42, 577–581 (1996).
17. McQuay, H. J. and Moore, R. A. 'Using numerical results from systematic reviews in clinical practice', Annals of Internal Medicine, 126, 712–720 (1997).
18. Rosenthal, R. 'The "file drawer problem" and tolerance for null results', Psychological Bulletin, 86, 638–641 (1979).
19. Iyengar, S. and Greenhouse, J. B. 'Selection models and the file-drawer problem', Statistical Science, 3(1), 109–135 (1988).
20. Gleser, L. J. and Olkin, I. 'Models for estimating the number of unpublished studies', Statistics in Medicine, 15(23), 2493–2507 (1996).
21. Olkin, I. and Sampson, A. 'Comparison of meta-analysis versus analysis of variance of individual patient data', Biometrics, 54, 272–277 (1998).
22. Mathew, T. and Nordström, K. 'On the equivalence of meta-analysis using literature and using individual patient data', Personal communication, 1998.
23. Steinberg, K. K., Smith, S. J., Stroup, D. F., Olkin, I., Lee, N. C., Williamson, G. D. and Thacker, S. B. 'A comparison of effect estimates from a meta-analysis of summary data from published studies and from a meta-analysis using individual patient data for ovarian cancer studies', American Journal of Epidemiology, 145(10), 917–925 (1997).
24. Hochberg, Y. and Tamhane, A. C. Multiple Comparison Procedures, Wiley, 1987.
25. Shaffer, J. 'Multiple hypothesis testing', Annual Review of Psychology, 46, 561–584 (1995).
26. Gleser, L. J. and Olkin, I. 'Stochastically dependent effect sizes', in Cooper, H. and Hedges, L. V. (eds), The Handbook of Research Synthesis, Russell Sage Foundation, New York, 1993, pp. 339–355.
27. Vevea, J. L. and Hedges, L. V. 'A general linear model for estimating effect size in the presence of publication bias', Psychometrika, 60(3), 419–435 (1995).
28. Berry, D. (ed.) Statistical Methods for Pharmacology, M. Dekker, New York, 1990.
29. DuMouchel, W. 'Bayesian meta-analysis', in Berry, D. (ed.), Statistical Methods for Pharmacology, M. Dekker, New York, 1990, pp. 509–529.
30. Abrams, K. and Sansó, B. 'Approximate Bayesian inferences for a random effects meta-analysis', Statistics in Medicine, 17, 201–218 (1998).
31. Hardy, R. J. and Thompson, S. G. 'Detecting and describing heterogeneity in meta-analysis', Statistics in Medicine, 17, 841–856 (1998).

