INGRAM OLKIN*
Department of Statistics and School of Education, Stanford University, Stanford, CA 94305, U.S.A.
SUMMARY
The number of published meta-analyses in medicine has had phenomenal growth, to a point where over 300
meta-analyses in medicine are published yearly. Because meta-analyses tend to lead to policy decisions, it is
extremely important that the analyses be robust and that alternative analyses yield consistent results. We
herein provide a discussion of diagnostic statistical procedures. Copyright 1999 John Wiley & Sons, Ltd.
1. INTRODUCTION
Meta-analysis has been defined in a variety of ways. One popularized definition (New York Times,
1994) is: 'A meta-analysis aims at gleaning more information from existing data by pooling the
results of many smaller studies and applying one or more statistical techniques. The benefits or
hazards that might not be detected in small studies can be found in meta-analysis that uses data
from thousands of patients.'
Why has meta-analysis become so popular? I believe that there are several reasons. First, and
perhaps foremost, is the information explosion that has occurred between the 1940s and the
present. In most fields the number of journals has increased approximately ten- to twelve-fold.
For example, the number of biomedical journals has increased from 2300 to over 25,000. This
number is translated to over 9 million articles per year. A similar escalation has occurred in other
fields such as psychology, education and mathematics.
A second impetus to meta-analysis in medicine is the movement towards the use of evidence-
based health care both in providing health care and in establishing public policy. Meta-analysis is
particularly appealing because it makes use of existing information. The Cochrane Collaboration
is a move towards implementing data centres that provide systematic reviews in a wide range of
medical areas, for example, acute respiratory infections, colorectal cancer, depression, hypertension,
pregnancy and childbirth, and schizophrenia.
To illustrate the growth and breadth of meta-analytic studies in medicine, in Table I we list by
area the meta-analyses in 1996 found in MEDLINE. Thus we find approximately 300 meta-analytic
studies that encompass results of over 3000 articles.
There are two distinct, but related, parts of a meta-analysis: the pre-statistical and the
statistical. The pre-statistical part involves searching, coding, evaluating coding decisions,
* Correspondence to: Ingram Olkin, Department of Statistics and School of Education, Stanford University, Stanford,
CA 94305, U.S.A.
CCC 0277-6715/99/172331-11$17.50
Table I. Meta-analyses in 1996 found in MEDLINE, by area

Types                                              Number
Cardiovascular diseases                                26
Gastro-intestinal diseases                             12
Hypertension                                           11
Infectious diseases                                    19
Joint and bone diseases                                13
Neurologic disorders                                   13
Oncology                                               32
Pregnancy                                              16
Psychological techniques (non-pharmacological)         28
Other                                                  59
Total                                                 229

Epidemiological meta-analysis                          42
Diagnostic                                             22
Cost effectiveness                                     13
2. DIAGNOSTIC PRINCIPLES
The following are some basic principles that should be considered in the planning and the
statistical portion of a meta-analysis.
Copyright 1999 John Wiley & Sons, Ltd. Statist. Med. 18, 2331-2341 (1999)
DIAGNOSTIC STATISTICAL PROCEDURES IN MEDICAL META-ANALYSES
Bucindolol 55 Carvedilol 52
Bucindolol 54 Carvedilol 58
Bucindolol 52 Metoprolol 50
Carvedilol 67 Metoprolol 50
Carvedilol 55 Metoprolol 63
Carvedilol 51 Metoprolol 49
transformation and the angular transformation. Because $p$ and $1 - p$ arise in the binomial
distribution it was natural to consider $p = \sin^2 \alpha$ and $1 - p = \cos^2 \alpha$. Cochran's summary states,
'This paper discusses the theoretical basis for the use of the square root and inverse sine
transformations in analyzing data whose experimental errors follow the Poisson and binomial
frequency laws respectively.' Curtiss provides the necessary limiting distribution of the inverse
sine statistic.
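The angular transformation can be sketched in a few lines. This is a minimal illustration and not code from the paper: the function names are hypothetical, and the variance approximation 1/(4n) for the transformed sample proportion (angle in radians) is the usual large-sample result, stated here as an assumption about what "variance stabilized" means in practice.

```python
import math

def angular_transform(p: float) -> float:
    """Return the angle a (in radians) for which p = sin(a)**2."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.asin(math.sqrt(p))

def approx_variance(n: int) -> float:
    """Large-sample variance of the transformed sample proportion.

    It depends only on the sample size n, not on p, which is the
    point of a variance-stabilizing transformation.
    """
    return 1.0 / (4.0 * n)

a = angular_transform(0.25)
# sin(a)**2 recovers p and cos(a)**2 recovers 1 - p
print(math.sin(a) ** 2, math.cos(a) ** 2)
print(approx_variance(100))
```

Working on the transformed scale means each study's weight is driven by its sample size alone, which is one reason the choice of scale can change how heterogeneous a set of studies appears.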
An analysis that shows consistency of results is provided by Givens et al. who obtained data
from 35 studies concerning environmental tobacco smoke with a comparison of alternative
analyses (Table III). The authors show a confidence interval for each of the 35 studies and also
display a combined effect size confidence interval for the EPA fixed effects meta-analysis
(based on U.S. studies only), the standard random effects meta-analysis, the standard Bayesian
meta-analysis, and the Bayesian meta-analysis accounting for potential bias. For this data
there are very small discrepancies in the estimates, so that we obtain almost complete consistency
of results using alternative methods of analysis. This consistency is very comforting to the
researcher.
4. Plot, plot, plot whenever and whatever you can. 'A picture is worth a thousand words'. Plots
often exhibit patterns that tabular displays do not show.
(a) Plot funnel plots to help detect heterogeneity. This display is most useful when there is
an ample number of studies. Funnel plots offer a visualization of effects, but suffer from
a lack of inferential methods.
(b) Plot p-values against ranks. Under the null hypothesis these should fall on a line (see
Figure 1).
(c) Plot the combined results of the k studies with one omitted, plus the overall effect (see
Figure 2).
(d) Plot cumulative results in which the cumulation is based on different covariates such as
chronology, sample size, effect size, and so on. (For a discussion of cumulative meta-analysis
and examples, see Lau et al.)
(e) Plot regressions of effect size against covariates. For example, the time that thrombolytic
therapy was administered is very important. In Figure 1 the x-axis represents time and
the y-axis represents effect size. Such a plot will show whether time alters the effectiveness
of the therapy. In other examples, age may be an important covariate, or length
of time used.
(f) Another type of 'regression' is to plot effect size against control rate. When control rates
are very low, that is, a low incidence (a healthy population), then the effects will not be
too pronounced. For more details on this see Schmid et al.
(g) Compare the results of large studies with those of small studies. Is there a difference? An
analysis of studies on thrombolytic therapy in acute myocardial infarction yielded the
following odds ratios using fixed and random effects models:
Figure 2. Confidence intervals and relative risks for the 35 ETS studies, the EPA fixed effects meta-analysis (based on
U.S. studies only), the standard random effects meta-analysis, the standard Bayesian meta-analysis and the Bayesian
meta-analysis accounting for potential publication bias (with permission from Statistical Science)
Here we find close agreement between the fixed and random effects models, some
discrepancy in mean effect between large and small studies, but consistency in terms of
direction. Thus, the issue here is to explain sample size discrepancy, but not potential
heterogeneity.
(h) P-values have a bad reputation because they focus on an overall single number and do
not lead to confidence intervals. Although I agree with this view in a general way,
p-value analyses are very useful in that they provide simple non-parametric diagnostic
procedures. The Fisher procedure of combining p-values by comparing $-2 \sum_{i=1}^{k} \log p_i$
with a chi-square distribution with 2k degrees of freedom, permits a variety of diagnostic
procedures.
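As one illustration of the diagnostics that the Fisher procedure permits, the sketch below combines k p-values and then repeats the combination with each study omitted in turn. It is a minimal sketch under stated assumptions: the function names and the p-values are hypothetical, and the chi-square tail probability uses the closed form that holds for an even number of degrees of freedom (2k), so only the standard library is needed.

```python
import math

def fisher_statistic(p_values):
    """Fisher's statistic: -2 times the sum of the log p-values."""
    return -2.0 * sum(math.log(p) for p in p_values)

def chi2_sf_even_df(x, k):
    """P(X > x) for X chi-square with 2k degrees of freedom.

    For even degrees of freedom the tail is
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total

def fisher_combined_p(p_values):
    """Combined p-value from k independent p-values."""
    p_values = list(p_values)
    return chi2_sf_even_df(fisher_statistic(p_values), len(p_values))

def leave_one_out(p_values):
    """Combined p-value with each study omitted in turn."""
    p_values = list(p_values)
    return [fisher_combined_p(p_values[:i] + p_values[i + 1:])
            for i in range(len(p_values))]

# Hypothetical p-values from five studies (illustration only)
p_obs = [0.04, 0.20, 0.011, 0.35, 0.08]
print(fisher_combined_p(p_obs))
print(leave_one_out(p_obs))
```

If the combined p-value moves sharply when a single study is dropped, that study deserves a closer look before the overall result is reported.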
5. Be conservative. The notion of being conservative in science is at the heart of testing null
hypotheses in that you do not prove, but disprove, the null hypothesis. In an article (New
York Times, October 12, Section 4, p. 6) there is a quotation from a molecular biologist on
this point: 'It's a very peculiar psychological thing that goes on. If you're a good scientist,
you should always be doing experiments to attempt to discredit your own theory. But you
have this psychological block - you don't want there to be an experiment that disagrees.'
Some good advice from another scientist: 'It's OK to sleep with a hypothesis, but you should
never become married to one.'
It is often stated that 5-10 per cent of a data set is flawed in some way. This suggests that
some trimming at the extremes might be appropriate. (For example, omit 1 extreme study
out of 10, and 1-2 out of 20 studies.) Omission of some of the most significant results is being
conservative, but one has to be careful that this does not mask a true effect. Does the
trimming alter the results in terms of direction?
A note of caution is needed here because omission of the most significant result means
omission of an order statistic. Thus, the analysis will require alternative theory for using
order statistics (see Saner).
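The trimming check described above (omit the most extreme study and ask whether the direction of the result changes) can be sketched as follows. Everything here is an assumption made for illustration: the effect sizes and variances are hypothetical, the pooling is a simple inverse-variance fixed effects mean, and "most significant" is taken to mean the largest absolute z-score. The sketch only compares point estimates; as noted above, valid inference after trimming requires order-statistic theory, which this code does not attempt.

```python
def pooled_fixed_effect(effects, variances):
    """Inverse-variance weighted mean of the study effects."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def drop_most_significant(effects, variances):
    """Remove the study with the largest |z| = |effect| / standard error."""
    z = [abs(e) / v ** 0.5 for e, v in zip(effects, variances)]
    drop = z.index(max(z))
    kept = [(e, v) for i, (e, v) in enumerate(zip(effects, variances))
            if i != drop]
    return [e for e, _ in kept], [v for _, v in kept], drop

# Hypothetical log odds ratios and their variances for 10 studies
effects = [-0.30, -0.10, -0.45, 0.05, -0.20, -0.90, -0.15, -0.25, -0.05, -0.35]
variances = [0.04, 0.09, 0.06, 0.10, 0.05, 0.04, 0.08, 0.07, 0.12, 0.05]

full = pooled_fixed_effect(effects, variances)
trim_e, trim_v, dropped = drop_most_significant(effects, variances)
trimmed = pooled_fixed_effect(trim_e, trim_v)

# Does omitting 1 study out of 10 change the direction of the pooled effect?
same_direction = (full < 0) == (trimmed < 0)
print(dropped, full, trimmed, same_direction)
```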
phenomenon is recognized in the economic literature wherein salaries are often on a square
root or logarithmic scale.
In the case of the measures (a) to (d), only (b) gives an unbiased estimate of the population
parameters. Measures (b) and (d) have variances independent of the parameters, that is,
they are variance stabilized measures. Some measures are more sensitive to perturbations,
and some show more heterogeneity than others.
What is unknown is the degree of bias of these measures and their mean square errors. The
approach to normality is known to some degree, but some questions remain unanswered.
Fisher provides graphs of the density of a correlation and the variance-stabilized version; the
difference in the approach to normality is marked. Some simulations to study bias and mean
square error for these measures, joint with L.V. Hedges, are currently in progress.
2. The number needed to treat (NNT) is a new measure that has some appeal. It is defined as
the number of patients who must be treated in order to prevent one adverse event. In this
context the ideal NNT is one, because that would be the situation in which all patients who
receive the treatment benefit, and all those in the control group do not benefit. The sample
NNT is $1/(p_2 - p_1)$. (For a more detailed discussion of the rationale and use of this measure
see Cook and Sackett, McQuay et al., Rembold, and McQuay and Moore.) Although
a confidence interval of NNT can be obtained directly from a confidence interval for
proportions, there are alternative methods for obtaining a confidence interval that may have
shorter expected length, or better coverage probabilities. This is being studied jointly with
L.J. Gleser.
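A minimal sketch of the sample NNT, taken as the reciprocal of a difference of two proportions, and of the direct interval obtained by inverting a confidence interval for that difference. The assumptions are mine, not the paper's: `p2` is read as the adverse event proportion in the control group and `p1` as that in the treated group, the interval for the difference is a simple Wald interval, and the function names are hypothetical. The alternative intervals with better length or coverage mentioned above are not attempted.

```python
import math

def nnt(p1, p2):
    """Sample NNT: reciprocal of the difference in event proportions."""
    diff = p2 - p1
    if diff == 0:
        raise ValueError("NNT is undefined when the two proportions are equal")
    return 1.0 / diff

def nnt_wald_interval(p1, n1, p2, n2, z=1.96):
    """Invert a Wald interval for p2 - p1 to get an interval for NNT.

    Returns None unless the whole interval for the difference is positive,
    because the reciprocal of an interval that touches or crosses zero is
    not a single bounded interval.
    """
    diff = p2 - p1
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    lower, upper = diff - z * se, diff + z * se
    if lower <= 0.0:
        return None
    return 1.0 / upper, 1.0 / lower

# Hypothetical trial: 20% events among 200 treated, 30% among 200 controls
print(nnt(0.20, 0.30))
print(nnt_wald_interval(0.20, 200, 0.30, 200))
# The ideal case: every control patient has the event, no treated patient does
print(nnt(0.0, 1.0))
```

The interval is asymmetric around the point estimate and becomes very wide as the lower limit for the difference approaches zero, which is one motivation for looking at alternative interval methods.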
3. When proportions measure the occurrence of rare events, how should we compare the
binomial and Poisson model? This comparison in the context of meta-analysis has not been
studied.
4. It has long been recognized that there may be a bias due to the fact that significant results
have a better chance of being published. A number of models have been presented to deal
with this problem. Rosenthal defined a fail-safe number as the number of non-significant
studies required to reverse a significant meta-analytic result. Modifications and extensions
of this model are considered by Iyengar and Greenhouse. These models are diagnostic and
do not provide inferential procedures. An inferential procedure was obtained by Gleser and
Olkin.
On the assumption that the k observed studies are the most significant from a total of
$N + k$ studies, where $N$ is the number of unreported studies, we can obtain a maximum
likelihood estimator $\hat{N}$ of $N$: $\hat{N} = [k(1 - p_k)/p_k]$, where $p_k$ is the largest significance
value. More important, we can obtain a lower confidence bound for N. For example, with
k = 7 studies, and a largest p-value of 0.095, the lower confidence bound for N is 30. More
general models provide other estimates.
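The point estimate of the number of unreported studies can be computed directly. The sketch below takes the estimator to be the integer part of k(1 - p)/p, where p is the largest observed p-value; reading the estimator as an integer part is my assumption, and the function name is hypothetical. The lower confidence bound quoted above is not reproduced, because its formula is not given here.

```python
import math

def mle_unreported(k, p_largest):
    """Integer part of k * (1 - p) / p, with p the largest observed p-value."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 < p_largest <= 1.0:
        raise ValueError("p_largest must lie in (0, 1]")
    return math.floor(k * (1.0 - p_largest) / p_largest)

# The example in the text: k = 7 studies, largest p-value 0.095
print(mle_unreported(7, 0.095))  # prints 66
```

The estimate grows quickly as the largest observed p-value shrinks: a set of uniformly very significant results is, under this model, evidence of many unreported studies.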
5. Meta-analysis is normally carried out using literature summary data (MAL). In some cases
individual patient data may be available from which a meta-analysis can be carried out
(MAP).
Clearly, a first question is whether the results from a MAL and MAP will yield different
results. Olkin and Sampson show that the meta-analysis estimators based on data from
k independent and homogeneous studies are the least-squares estimators of treatment
contrasts in the two-way fixed effects analysis of variance model with no interaction. Thus
the two methods of MAL and MAP yield the same analyses. Of course, MAP will contain
other data that are unavailable in a MAL.
A surprising by-product of this analysis is the fact that the meta-analysis approach
provides an explicit representation of treatment contrasts in the unbalanced two-way model.
We note that Mathew and Nordström show that a similar result applies to the case in
which there are covariates.
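The equivalence of the two approaches can be checked numerically in the simplest special case: one treatment versus one control in each study, a common error variance, and no interaction. This is a sketch under those assumptions and not the derivation in Olkin and Sampson; the function names and the simulated data are hypothetical. The summary-data estimate is an inverse-variance weighted mean of the study differences, and the patient-level estimate is the least-squares treatment coefficient with study as a factor, obtained by centring within each study.

```python
import random

def mal_estimate(summaries):
    """Pooled treatment-control difference from (n_t, mean_t, n_c, mean_c)."""
    num = den = 0.0
    for n_t, mean_t, n_c, mean_c in summaries:
        w = n_t * n_c / (n_t + n_c)  # inverse variance, up to the common sigma^2
        num += w * (mean_t - mean_c)
        den += w
    return num / den

def map_estimate(patients):
    """Least-squares treatment effect with study as a factor.

    patients: rows of (study, treated, y) with treated coded 0 or 1.
    Centring treated and y within each study removes the study effects.
    """
    by_study = {}
    for study, treated, y in patients:
        by_study.setdefault(study, []).append((treated, y))
    sxy = sxx = 0.0
    for rows in by_study.values():
        x_bar = sum(t for t, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        for t, y in rows:
            sxy += (t - x_bar) * (y - y_bar)
            sxx += (t - x_bar) ** 2
    return sxy / sxx

def summarize(patients):
    """Collapse patient rows into per-study arm sizes and means."""
    arms = {}
    for study, treated, y in patients:
        arms.setdefault((study, treated), []).append(y)
    out = []
    for study in sorted({s for s, _, _ in patients}):
        t, c = arms[(study, 1)], arms[(study, 0)]
        out.append((len(t), sum(t) / len(t), len(c), sum(c) / len(c)))
    return out

# Simulated, unbalanced data for three hypothetical studies
rng = random.Random(0)
design = {"A": (12, 8), "B": (5, 15), "C": (20, 20)}
patients = []
for study, (n_t, n_c) in design.items():
    shift = rng.gauss(0.0, 1.0)  # study effect
    patients += [(study, 1, shift + 0.5 + rng.gauss(0.0, 1.0)) for _ in range(n_t)]
    patients += [(study, 0, shift + rng.gauss(0.0, 1.0)) for _ in range(n_c)]

mal = mal_estimate(summarize(patients))
ipd = map_estimate(patients)
print(mal, ipd, abs(mal - ipd) < 1e-9)
```

The two numbers agree up to rounding error even though the arms are unbalanced, which is the explicit representation of the treatment contrast in the unbalanced two-way model.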
6. A comparison of a meta-analysis using literature summary data (MAL) and a meta-analysis
from individual patient data (MAP) using study as a factor was analysed to empirically
determine potential differences between the methods.
The following are some of the conclusions on a study of the association between ovarian
cancer and 10 other reproductive variables and contraceptive use:
(i) A MAL can incorporate studies for which individual patient data is unavailable.
(ii) A MAP can scan for subgroup analyses that may not be available in summary data.
(iii) A MAP can use a logistic regression model, which is generally not powerful with
summary data.
(iv) A MAP is costly and time consuming.
The MAL for this data took over 1000 hours. The cost comparison showed that MAP was
over 5 times as costly as the MAL.
The results for hospital control studies with 4- to 5-year oral contraceptive use yielded an
odds ratio of 0.69 for MAP and 0.64 for MAL.
Because individual patient data provides measures of many characteristics, it often leads
to exploratory analyses. Of course, findings from an exploratory analysis can lead to
significantly positive results, which has been labelled 'self-fulfilling prophecy'. Thus, care
must be exercised to distinguish between exploratory and confirmatory analyses.
A MAP will normally generate more tests than a MAL, so that it is important to control
the type I error in multiple tests (see point 7).
7. Multiple comparison procedures should be employed more often. There is now a variety of
alternatives to the use of Bonferroni inequality corrections. See Hochberg and Tamhane
for a general discussion and Shaffer for an exposition of multiple comparison methods in
applications, and for more recent developments and references. To date few of the more
recently developed methods have been used.
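One widely used alternative to a plain Bonferroni correction is Holm's step-down procedure, sketched below next to Bonferroni for comparison. Holm's method is my choice of example, not one singled out in the text; the function names and the p-values are hypothetical. Both procedures control the familywise type I error rate at alpha, but Holm's never rejects fewer hypotheses than Bonferroni.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm step-down: compare the sorted p-values with alpha / (m - rank).

    Testing stops at the first p-value that fails its threshold; every
    hypothesis with a larger p-value is then retained.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject

# Hypothetical p-values for four outcome measures
p_obs = [0.01, 0.02, 0.03, 0.005]
print(bonferroni(p_obs))
print(holm(p_obs))
```

With these p-values Bonferroni rejects two of the four hypotheses while Holm rejects all four, which illustrates the gain available from the more recently developed methods.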
8. Almost all studies will examine several correlated outcome measures. These can be dealt with
using Bonferroni corrections or by a multivariate analysis. To date few multivariate analyses
have been used, and a majority of examined meta-analyses ignore correlational effects.
In some cases there will be a single control with unrelated multiple endpoints. However,
because each endpoint is compared to the same control, the ensuing analysis will involve
correlated effect measures. An analysis for dealing with such a design is provided by Gleser
and Olkin.
In this section I have listed a number of theoretical issues that have arisen in the conduct of meta-
analysis. To aid the reader it may be useful to mention several other papers that have appeared
recently. The paper by Givens et al. provides a discussion of publication bias; the references
therein and in the commentary papers cover a spectrum of the field. Vevea and Hedges present a
general linear model for estimating effect size when there may be publication bias. The book edited
by Berry provides a number of additional papers and references. In particular, the paper by DuMouchel
deals with Bayesian meta-analysis. The paper by Abrams and Sansó gives a Bayesian approach
to random effects models. A discussion of heterogeneity appears in Hardy and Thompson.
3. DISCUSSION
The theory for meta-analysis is reasonably well established. There are statistical procedures,
guidelines, checklists and software to help the analyst. All of these are designed to simplify
carrying out a meta-analysis. However, there still remain uncertainties that need to be examined.
There are many decision points and a variety of alternative ways to analyse the data. A good
meta-analysis will require looking at the data from different perspectives to increase the validity
of the results. As in many other contexts, data analysis and diagnosis is an art. We here provide
a set of principles and diagnostics that are indicative of an approach to a meta-analytic study. The
guiding principle is to view a meta-analysis as you would a court case, with opposing attorneys
scrutinizing every step and every conclusion.
REFERENCES
1. Kupelnick, B. 'Compilation of meta-analyses for the year 1996 (provisional)', Online Journal of Current
Clinical Trials, May 1997.
2. Light, R. J. and Pillemer, D. B. Summing up: The Science of Reviewing Research, Harvard University
Press, Cambridge, MA, 1984.
3. Lang, T. A. and Cecic, M. How to Report Statistics in Medicine, American College of Physicians,
Philadelphia, PA, 1997.
4. Cooper, H. and Hedges, L. V. (eds) The Handbook of Research Synthesis, Russell Sage Foundation, New
York, 1993.
5. Fisher, R. A. Statistical Methods for Research Workers, 10th edn, Oliver and Boyd, Edinburgh,
1948.
6. Bartlett, M. S. 'The square root transformation in the analysis of variance', Journal of the Royal Statistical
Society Supplement, 3, 68-78 (1936).
7. Bartlett, M. S. 'The use of transformations', Biometrics Bulletin, 3, 39-53 (1947).
8. Cochran, W. G. 'The analysis of variance when experimental errors follow the Poisson or binomial laws',
Annals of Mathematical Statistics, 11, 335-347 (1940).
9. Curtiss, J. H. 'On transformations used in the analysis of variance', Annals of Mathematical Statistics, 14,
107-122 (1943).
10. Givens, G. H., Smith, D. D. and Tweedie, R. L. 'Publication bias in meta-analysis: a Bayesian
data-augmentation approach to account for issues exemplified in the passive smoking debate (with
comments and rejoinder)', Statistical Science, 12, 221-250 (1997).
11. Lau, J., Antman, E. M., Jimenez-Silva, J., Kupelnick, B., Mosteller, F. and Chalmers, T. C. 'Cumulative
meta-analysis of therapeutic trials for myocardial infarction', New England Journal of Medicine, 327,
248-254 (1992).
12. Schmid, C. H., Lau, J., McIntosh, M. W. and Cappelleri, J. C. 'An empirical study of the effect of the
control rate as a predictor of treatment efficacy in meta-analysis of clinical trials', Statistics in Medicine,
17, 1923-1942 (1998).
13. Saner, H. 'Approximations to a conservative inverse chi-square procedure for combining p-values in
integrative research', Statistics and Probability Letters, 15, 215-222 (1992).
14. Cook, R. J. and Sackett, D. L. 'The number needed to treat: a clinically useful measure of treatment
effect', British Medical Journal, 310, 452-454 (1995).
15. McQuay, H. M., Trainer, M., Nye, B. A., Carroll, D., Witten, P. J. and Moore, R. H. 'A systematic review
of antidepressants in neuropathic pain', Pain, 68, 217-227 (1996).
16. Rembold, C. M. 'A number needed to treat (NNT) analysis of the prevention of myocardial infarction
and death by dislipidemic therapy', Journal of Family Practice, 42, 577-581 (1996).
17. McQuay, H. J. and Moore, R. A. 'Using numerical results from systematic reviews in clinical practice',
Annals of Internal Medicine, 126, 712-720 (1997).
18. Rosenthal, R. 'The "file drawer problem" and tolerance for null results', Psychological Bulletin, 86,
638-641 (1979).
19. Iyengar, S. and Greenhouse, J. B. 'Selection models and the file-drawer problem', Statistical Science, 3,
(1), 109-135 (1988).
20. Gleser, L. J. and Olkin, I. 'Models for estimating the number of unpublished studies', Statistics in
Medicine, 15, (23), 2493-2507 (1996).
21. Olkin, I. and Sampson, A. 'Comparison of meta-analysis versus analysis of variance of individual patient
data', Biometrics, 54, 272-277 (1998).
22. Mathew, T. and Nordström, K. 'On the equivalence of meta-analysis using literature and using
individual patient data', Personal communication, 1998.
23. Steinberg, K. K., Smith, S. J., Stroup, D. F., Olkin, I., Lee, N. C., Williamson, G. D. and Thacker, S. B. 'A
comparison of effect estimates from a meta-analysis of summary data from published studies and from
a meta-analysis using individual patient data for ovarian cancer studies', American Journal of
Epidemiology, 145, (10), 917-925 (1997).
24. Hochberg, Y. and Tamhane, A. C. Multiple Comparison Procedures, Wiley, 1987.
25. Shaffer, J. 'Multiple hypothesis testing', Annual Review of Psychology, 46, 561-584 (1995).
26. Gleser, L. J. and Olkin, I. 'Stochastically dependent effect sizes', in Cooper, H. and Hedges, L. V. (eds),
The Handbook of Research Synthesis, Russell Sage Foundation, New York, 1993, pp. 339-355.
27. Vevea, J. L. and Hedges, L. V. 'A general linear model for estimating effect size in the presence of
publication bias', Psychometrika, 60(3), 419-435 (1995).
28. Berry, D. (ed.) Statistical Methods for Pharmacology, M. Dekker, New York, 1990.
29. DuMouchel, W. 'Bayesian meta-analysis', in Berry, D. (ed.), Statistical Methods for Pharmacology,
M. Dekker, New York, 1990, pp. 509-529.
30. Abrams, K. and Sansó, B. 'Approximate Bayesian inferences for a random effects meta-analysis',
Statistics in Medicine, 17, 201-218 (1998).
31. Hardy, R. J. and Thompson, S. G. 'Detecting and describing heterogeneity in meta-analysis', Statistics in
Medicine, 17, 841-856 (1998).