
A Review and Evaluation of Meta-Analysis

Practices in Management Research


Inge Geyskens*
Tilburg University, Warandelaan 2, 5000 LE Tilburg, the Netherlands
Rekha Krishnan
Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
Jan-Benedict E. M. Steenkamp
Kenan-Flagler Business School, University of North Carolina, Chapel Hill, NC 27599-3490
Paulo V. Cunha
Rua das Musas, 2.05.02, 1990-174, Lisbon, Portugal

Meta-analysis has become increasingly popular in management research to quantitatively
integrate research findings across a large number of studies. In an effort to help shape future
applications of meta-analysis in management, this study chronicles and evaluates the decisions
that management researchers made in 69 meta-analytic studies published between 1980 and
2007 in 14 management journals. It performs four meta-analyses of relationships that have been
studied with varying frequency in management research, to provide empirical evidence that
meta-analytical decisions influence results. The implications of the findings are discussed with
a focus on the changes that seem appropriate.

Keywords: meta-analysis; empirical generalizations; research synthesis

The first author gratefully acknowledges support from the Netherlands Organization for Scientific Research. This
research was conducted while the fourth author was a doctoral student at Tilburg University. This article was
accepted under the editorship of Russell Cropanzano.

*Corresponding author: Tel.: +31-13-466 80 83; fax: +31-13-466 83 54.

E-mail address: i.geyskens@uvt.nl


Journal of Management, Vol. 35 No. 2, April 2009 393-419
DOI: 10.1177/0149206308328501
© 2009 Southern Management Association. All rights reserved.

Over the past 25 years, there has been a massive growth in the volume of management
research, with the results being spread over many journals. Despite the large number of empir-
ical studies, or perhaps because of it, insights from management research have not always been
cumulative. Conflicting findings are more numerous today than before, which impedes scien-
tific progress. In response, meta-analysisthe quantitative synthesis of research findings
across a large number of studiesenjoys a growing interest in the management area. Meta-
analysis has, for example, been used to establish empirical generalizations in the contexts of
franchising (Combs & Ketchen, 2003), employee turnover (Griffeth, Hom, & Gaertner, 2000),
first-mover advantages (VanderWerf & Mahon, 1997), the concentrationperformance rela-
tionship (Datta & Narayanan, 1989), and transaction cost theory (Geyskens, Steenkamp, &
Kumar, 2006).
Meta-analysis is important because some primary studies lack sufficient power (i.e., sam-
ple size) to achieve statistically significant results and nearly all studies lack the power for a
precise estimate of effect size (Lipsey & Wilson, 2001). By combining into a single estimate
the findings of multiple independent studies that bear on the same relationship, while cor-
recting for the distorting effects of artifacts that may produce the illusion of conflicting find-
ings, meta-analysis arrives at more accurate conclusions than those presented in any one
primary study (Hunter & Schmidt, 2004). If variation in results remains among studies
after artifact correction, meta-analysis helps researchers identify moderator variables and
areas in which future research is needed.
Although management researchers have recognized the importance of meta-analysis, con-
siderable variation exists regarding the procedures used. This variation may have important
consequences, namely, yielding different meta-analytic conclusions. In an effort to help
shape future applications of meta-analysis in management research, the goals of this article
are to (a) reveal the state of meta-analytic research in the field and (b) demonstrate how the
decisions made by researchers may affect the conclusions they reach through meta-analysis.
To meet these goals, we chronicle and evaluate what decisions researchers made in 69
meta-analyses published between 1980 and 2007 in 14 management journals. We discuss the
implications of our findings, with a focus on the changes that seem appropriate. Then, we
perform four meta-analyses, to provide empirical evidence that meta-analytical decisions
influence results. Before beginning our analyses, we develop a brief checklist of critical
decisions that the meta-analyst makes. This checklist serves as our organizing framework for
the remainder of the article.

Critical Decisions in the Application of Meta-Analysis

Users of meta-analysis are confronted with many critical decisions. We focus on the
sequential set of decisions outlined in Table 1.¹ For each decision, we discuss options
available, list the advantages and disadvantages, and recommend high-quality meta-analytic
practices. It is important to note up front that for a number of these decisions, there is no right
answer. In these instances, we draw attention to existing controversies. Although Table 1
does not present anything new or unavailable in other sources, it does present all the infor-
mation in one place and, as such, may serve a useful function.



Table 1
Critical Decisions Confronting Users of Meta-Analysis

Effect size metric (James, Demaree, & Mulaik, 1986; Law, 1995; Schulze, 2004; Shadish & Haddock, 1994)ᵃ
  r
    Advantages: Easy to interpret. Downward bias produced by r is less than rounding error (.005).
    Disadvantages: As population value of r gets further from zero, distribution of rs sampled from that population becomes more skewed. Variance of r depends on population correlation value.
  Fisher's z
    Advantages: Nearly normally distributed. Variance of z does not depend on population correlation value.
    Disadvantages: Less easy to interpret. Upward bias that is larger than downward bias produced by using r.
  Our recommendation: No preferred procedure.

Corrections for systematic attenuating artifacts (Bass & Ager, 1991; Hunter & Schmidt, 1990, 1994; Kemery, Dunlap, & Griffeth, 1988; Muchinsky, 1996; Paese & Switzer, 1988; Rosenthal, 1994; Sackett, 2003; Schmidt & Hunter, 1999; Williams, 1990)
  None
    Advantages: Information required to apply corrections is often missing. Unadjusted effect sizes are more comparable than when some effect sizes are adjusted while others are not.
    Disadvantages: Downward bias in mean effect size.
  At the individual level
    Advantages: More accurate estimate of true population effect size.
    Disadvantages: Corrections may lead to correlations larger than 1. Artifact information required to apply corrections at the individual level is often missing.
  Using artifact distributions
    Advantages: Applicable when artifact information is only sporadically available.
    Disadvantages: Corrections may lead to correlations larger than 1. Artifact distribution may not always represent studies for which artifact information is not available.
  Our recommendation: Correct for measurement error and dichotomization of continuous variables if information is available at the individual level. Since usage of artifact distributions and corrections for range restriction are controversial, report results before and after correcting. Examples: Balkundi & Harrison (2006); Cohen (1993).

Correction for interdependent effect sizes (Cheung & Chan, 2004; Hunter & Schmidt, 2004; Martinussen & Bjornstad, 1999)
  Selection of one effect size
    Advantages: Simplest approach. Choice can be theory-based or random.
    Disadvantages: Judgmental biases may arise (ignoring potentially meaningful information).
  Averaging
    Advantages: More precise measure than any single effect size.
    Disadvantages: Unclear what to use as sample size. Underestimates degree of heterogeneity of effect sizes.
  Creation of a single composite variable
    Advantages: Theoretically most correct measure.
    Disadvantages: Presence of correlation structure among interdependent effect sizes is required.
  Our recommendation: Create a single composite variable. If intercorrelations among the dependent effect sizes are not available, use averaging. Examples: Geyskens, Steenkamp, & Kumar (2006); LePine, Podsakoff, & LePine (2005).

Identification of outliers (Beal, Corey, & Dunlap, 2002; Huffcutt & Arthur, 1995)
  Traditional outlier detection techniques (e.g., schematic plot analysis)
    Advantages: Simple.
    Disadvantages: Do not take sample size into account.
  Sample-adjusted meta-analytic deviancy (SAMD)
    Advantages: Takes sample size into account.
    Disadvantages: Developed for r as opposed to z. Overidentifies small relative to large correlations as outliers, especially for small (k < 20) meta-analyses.
  Our recommendation: Use SAMD on r. Report results with and without outliers. Examples: Geyskens, Steenkamp, & Kumar (2006) for SAMD; Williams & Livingstone (1994) for analyses with and without outlier elimination.

Mean effect size (Schulze, 2004)
  Variance weighted
    Advantages: Optimal weights from a theoretical point of view.
    Disadvantages: Requires z transformation.
  Sample-size weighted
    Advantages: Easy to use.
    Disadvantages: Sample size is not the optimal weight from a theoretical point of view.
  Our recommendation: It is more important to weight than what weights to use. Example: Combs & Ketchen (2003).

Homogeneity analysis (Cortina, 2003; Koslowsky & Sagie, 1993; Schulze, 2004; Shadish & Haddock, 1994)
  Chi-square test
    Advantages: Used most frequently.
    Disadvantages: Low power unless number of effect sizes is large. When within-study sample sizes are very large, homogeneity may be rejected even when individual effect size estimates hardly vary.
  75% rule
    Advantages: Most powerful moderator detection technique, if population correlations differ by more than .2 and if number of studies meta-analyzed is large.
    Disadvantages: Indicates heterogeneity in homogeneous situation far too often. Small number of studies meta-analyzed, lack of between-study variability on a moderator, and small differences between population effect sizes may also lead to a variance ratio larger than 75%.
  Credibility interval
    Advantages: Intuitively appealing.
    Disadvantages: Credibility intervals may include zero if population mean is small, even in absence of a moderator. Arbitrary cutoff of what is a large credibility interval.
  Residual standard deviation
    Advantages: Intuitively appealing.
    Disadvantages: No widespread agreement about cut-off.
  Our recommendation: Use combination of procedures. Examples: Griffeth, Hom, & Gaertner (2000); Webber & Donahue (2001).

Moderator analysis (Raudenbush, 1994; Steel & Kammeyer-Mueller, 2002; Viswesvaran & Ones, 1998)
  Subgroup analysis
    Advantages: Easy.
    Disadvantages: Moderators are seldom orthogonal. Only for categorical moderators.
  Hierarchical subgroup analysis
    Advantages: Is able to separate effects of different moderators.
    Disadvantages: Number of studies available is usually too small. Only for categorical moderators.
  Weighted regression analysis
    Advantages: Is able to separate effects of different moderators.
    Fixed-effects
      Advantages: Analytically easier than random-effects model. More statistical power to detect moderator relationships.
      Disadvantages: Underlying assumption that population effect size is the same in all studies is usually false, in which case type I error rates are (often far) too high. Inferences apply only to studies like those under examination.
    Random-effects
      Advantages: More realistic: population effect size varies from study to study. Always yields appropriate type I error rate.
      Disadvantages: When the number of studies meta-analyzed is small, estimates of random effects can become unstable.
  Our recommendation: Weighted least squares is preferred above subgroup analyses. Use random-effects models provided that the number of studies is large enough. Otherwise use fixed-effects models. Examples: Bhaskar-Shrinivas et al. (2005); Callahan, Kiker, & Cross (2003); Sturman (2003).

Publication bias (Begg, 1994; Duval & Tweedie, 2000; Rosenthal, 1979)
  Compare published with unpublished studies
    Advantages: Easy to calculate and interpret.
    Disadvantages: Requires that sufficient number of unpublished studies is available. Sample of unpublished studies may be unrepresentative of unpublished work in the area.
  Rosenthal's file-drawer method
    Advantages: Easy to calculate and interpret.
    Disadvantages: Does not account for sample sizes of studies. Choice of zero for average effect of the unpublished study is arbitrary.
  Trim and fill
    Advantages: Accounts for sample sizes of the studies. Estimates number of unpublished studies and publication-bias-adjusted estimate of true mean effect size.
    Disadvantages: More difficult to compute.
  Our recommendation: Use trim and fill. Example: Geyskens, Steenkamp, & Kumar (2006).

ᵃ The advantages and disadvantages listed are discussed more extensively in the references next to each meta-analytic decision.

Step 1: Data Preparation

Choice of effect size metric. The two most generally accepted effect size metrics are the
correlation coefficient r and Cohen's d,² of which r is most commonly used by management
researchers.³ As shown in Table 1, the literature is divided about whether one should use r
or Fisher's variance-stabilizing z-transform. Although the use of z does not have an absolute
advantage over r in terms of estimation accuracy (Shadish & Haddock, 1994), z-transformed
correlations have the desirable statistical properties of (a) being approximately normally
distributed and (b) having the sample variance depend only on sample size and not on the
population correlation itself; that is, from a theoretical point of view, one can weight the
correlations with the optimal weight.
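The weighting argument can be made concrete. Below is a minimal Python sketch (all correlations and sample sizes hypothetical, function names ours) of combining correlations via Fisher's z, whose sampling variance 1/(n − 3) makes n − 3 the inverse-variance weight:

```python
import math

def fisher_z(r):
    """Fisher's variance-stabilizing transform of a correlation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Back-transform z to the correlation metric."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

def variance_weighted_mean_r(rs, ns):
    """Variance-weighted mean correlation via Fisher's z.

    Because the sampling variance of z is 1 / (n - 3), the optimal
    (inverse-variance) weight for study i is simply n_i - 3.
    """
    zs = [fisher_z(r) for r in rs]
    weights = [n - 3 for n in ns]
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return inverse_fisher_z(mean_z)

# Hypothetical correlations and sample sizes from five primary studies
rs = [0.10, 0.25, 0.30, 0.15, 0.40]
ns = [50, 120, 80, 200, 60]
print(round(variance_weighted_mean_r(rs, ns), 3))
```

Note that the mean is computed in the z metric and only back-transformed to r for reporting.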

Correction for systematic artifacts.⁴ Some imperfections or artifacts that systematically
attenuate observed correlations and so inflate their variability can, relatively easily, be
corrected for, namely: (a) measurement error, which occurs when there is unreliability in
either variable upon which the correlation is based; (b) range restriction in either variable,
which occurs if a variable upon which the correlation is based has a smaller standard
deviation in the study sample than it does in the population; and (c) dichotomization of a
truly continuous variable, which causes the (biserial) correlation for the dichotomized
variable to have a maximum size of .7978 (MacCallum, Zhang, Preacher, & Rucker, 2002).⁵
If systematic artifact information is individually available for (nearly) every primary
study, each correlation (prior to the z transform) can be individually corrected for the
attenuating effect of the systematic artifact, and the meta-analysis can be conducted on the
individually corrected correlations.⁶ The few missing artifact values can be replaced by the mean
value of the artifact across the studies where information is given. If systematic artifact infor-
mation is missing for many studies, the method of artifact distributions can be used to cor-
rect the weighted mean correlation and variance at a later point (Hunter & Schmidt, 2004).
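As an illustration of the individual-level correction for measurement error, the standard disattenuation formula divides each observed correlation by the square root of the product of the two reliabilities. The sketch below (hypothetical data; function names ours) also shows the mean-imputation of the few missing reliabilities mentioned above:

```python
import math

def correct_for_attenuation(r, rxx, ryy):
    """Disattenuate an observed correlation for measurement error.

    r    observed correlation
    rxx  reliability of the first measure (e.g., coefficient alpha)
    ryy  reliability of the second measure
    """
    return r / math.sqrt(rxx * ryy)

def fill_missing_reliabilities(reliabilities):
    """Replace missing (None) reliabilities with the mean of the
    reliabilities that were reported, as the text suggests."""
    known = [x for x in reliabilities if x is not None]
    mean_rel = sum(known) / len(known)
    return [x if x is not None else mean_rel for x in reliabilities]

# Hypothetical data: three studies, one missing a reliability estimate
rs = [0.30, 0.22, 0.35]
rxx = fill_missing_reliabilities([0.80, None, 0.90])
ryy = [0.70, 0.75, 0.85]
corrected = [correct_for_attenuation(r, a, b)
             for r, a, b in zip(rs, rxx, ryy)]
```

Since reliabilities are below 1, each corrected correlation is larger in absolute value than its observed counterpart, which is why corrections can occasionally exceed 1.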
Because the goal of science is to establish relations among constructs, not relations
among imperfect measures of constructs, there is relatively widespread agreement that one
should correct for measurement error and dichotomization if this information is available for
every study or nearly every study. Consensus is lacking regarding correcting for measure-
ment error using the method of artifact distributions and correcting for range restriction.
Given the lack of consensus in the latter cases (for opposing views, see Hunter & Schmidt,
2004; Rosenthal, 1991), we recommend that when such corrections are made, the uncor-
rected results be presented as well.

Correction for interdependent effect sizes. Interdependent effect sizes occur when more
than one effect size relevant to a given relationship comes from the same sample. If interde-
pendent effect sizes were to be treated as if they were independent data points, the studies in
question would be overrepresented; sample sizes would be artificially inflated beyond the
number of actual participants; observed variability of the effect sizes would be reduced; and
standard errors would be biased (Arthur, Bennett, & Huffcutt, 2001). To deal with interde-
pendent effect sizes, the most accurate procedure is to combine them into a single correla-
tion, using the formulas for the correlations of composites provided by Hunter and Schmidt
(1990: 457-460).
400 Journal of Management / April 2009

When this approach is not feasible because between-measure correlations are not avail-
able, the best alternative is to average the conceptually equivalent correlations. Alternatively,
one could randomly select an effect size (Martinussen & Bjornstad, 1999). We prefer aver-
aging to random selection because the latter is usually done manually, which is likely to lead
to judgmental biases (Lipsey & Wilson, 2001; Wanous, Sullivan, & Malinak, 1989).
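Both options can be sketched in a few lines. The composite formula below assumes an equally weighted composite of the dependent measures in the style of Hunter and Schmidt; all numbers are hypothetical:

```python
import math

def composite_r(r_xy, r_yy):
    """Correlation of x with an equally weighted composite of the
    dependent measures y1..yk (a Hunter & Schmidt-style composite).

    r_xy  correlations of x with each dependent measure
    r_yy  full k x k intercorrelation matrix of the dependent
          measures (1s on the diagonal)
    """
    k = len(r_xy)
    denom = math.sqrt(sum(r_yy[i][j] for i in range(k) for j in range(k)))
    return sum(r_xy) / denom

def average_r(r_xy):
    """Fallback when the intercorrelations are unavailable."""
    return sum(r_xy) / len(r_xy)

# Hypothetical: one sample correlates x with two measures of y
r_xy = [0.30, 0.40]
r_yy = [[1.0, 0.5],
        [0.5, 1.0]]
print(composite_r(r_xy, r_yy))  # x with the y1 + y2 composite
print(average_r(r_xy))
```

Either way, the sample contributes exactly one effect size to the meta-analysis, so it is not overrepresented.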

Identification of outliers. Traditional outlier detection techniques, such as schematic plot


analysis or number of standard deviations from the mean, are inappropriate in the context of
a meta-analysis because they do not take sample size into account. A better method is to use
Huffcutt and Arthur's (1995) sample-adjusted meta-analytic deviancy statistic (SAMD). This
statistic computes the difference between each primary study's effect size and the mean
sample-weighted effect size (with the latter value not including the former value); then, it
adjusts that difference for the sample size of the study. It is by no means established that out-
liers, once identified, should be eliminated. A sensitivity analysis is advisable, comparing the
results from a meta-analysis on the full data set with the results from a comparable meta-
analysis on a reduced data set excluding the outliers.
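The leave-one-out logic of the SAMD statistic can be sketched as follows, assuming the usual large-sample approximation for the sampling variance of r; the exact Huffcutt and Arthur formula differs in detail, so treat this as illustrative only (all data hypothetical):

```python
import math

def samd(rs, ns):
    """SAMD-style deviancy statistics for a set of correlations.

    For each study i, compare r_i with the sample-size-weighted mean
    of the remaining studies, then scale the difference by an estimate
    of r_i's sampling standard error (here the usual large-sample
    variance of r: (1 - rbar^2)^2 / (n_i - 1)).
    """
    stats = []
    for i, (r_i, n_i) in enumerate(zip(rs, ns)):
        num = sum(r * n for j, (r, n) in enumerate(zip(rs, ns)) if j != i)
        den = sum(n for j, n in enumerate(ns) if j != i)
        mean_without_i = num / den
        se = math.sqrt((1 - mean_without_i ** 2) ** 2 / (n_i - 1))
        stats.append((r_i - mean_without_i) / se)
    return stats

rs = [0.20, 0.25, 0.22, 0.80, 0.18]  # hypothetical: fourth study deviates
ns = [100, 150, 120, 40, 90]
deviancy = samd(rs, ns)
```

In this hypothetical set, the fourth study yields by far the largest deviancy value, flagging it as a candidate outlier for the sensitivity analysis described above.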

Step 2: Substantive Questions

Mean effect size. The random effect of sampling error can be corrected for by averaging
effect sizes. When averaging effect sizes, each one should be weighted so that its contribu-
tion to the mean is proportionate to its precision. Two forms of weights can be used: the rec-
iprocals of the estimated variances of the observed effect sizes and the individual study
sample sizes, where the former is the better choice (see Table 1).
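The effect of weighting is easy to demonstrate. A minimal sketch of the sample-size-weighted mean (the bare-bones Hunter and Schmidt estimator) against an unweighted mean, with hypothetical data in which one large study dominates:

```python
def sample_size_weighted_mean(rs, ns):
    """Hunter & Schmidt-style bare-bones mean: weight each r by n."""
    return sum(r * n for r, n in zip(rs, ns)) / sum(ns)

def unweighted_mean(rs):
    """Treats all effect sizes as equally precise."""
    return sum(rs) / len(rs)

# Hypothetical: one large study (n = 1000) and four small ones
rs = [0.30, 0.10, 0.12, 0.08, 0.11]
ns = [1000, 40, 50, 45, 60]
print(sample_size_weighted_mean(rs, ns))  # pulled toward the precise study
print(unweighted_mean(rs))
```

The weighted mean is pulled toward the large study's estimate, which embodies the least sampling error; the unweighted mean treats all five studies as equal.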

Homogeneity analysis. A subsequent step analyzes the adequacy of the mean effect size for
representing the entire distribution of effect size values. Calculating a mean effect size without
evidence of homogeneity is problematic. In that case, the calculated mean effect size is the
mean of several population effect sizes rather than an estimate of one population effect size.
Four alternative approaches have been used (see Table 1 for an overview).
Importantly, no single test is without drawbacks (Cortina, 2003). We offer two best-practice
recommendations. First and more preferable, researchers should develop hypotheses for mod-
erators a priori, and they should test these, regardless of the outcome of the heterogeneity tests.
Second and less preferable, the heterogeneity tests listed above should be used in combination,
as opposed to isolation, to suggest the presence of moderators post hoc. As a simple rule of
thumb, the researcher draws the conclusion that follows from the majority of tests.
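As an illustration of the 75% rule, the simplified sketch below compares the variance expected from sampling error alone (using the mean sample size) with the observed sample-size-weighted variance of the correlations; published implementations differ in detail, so this is illustrative only (hypothetical data):

```python
def seventy_five_percent_rule(rs, ns):
    """Simplified Hunter & Schmidt-style 75% rule.

    If the variance expected from sampling error alone accounts for
    at least 75% of the observed variance of the correlations, the
    effect sizes are treated as homogeneous (no moderator search
    needed on this criterion).
    """
    total_n = sum(ns)
    mean_r = sum(r * n for r, n in zip(rs, ns)) / total_n
    observed_var = sum(n * (r - mean_r) ** 2
                       for r, n in zip(rs, ns)) / total_n
    mean_n = total_n / len(ns)
    sampling_var = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    ratio = sampling_var / observed_var
    return ratio, ratio >= 0.75

# Hypothetical homogeneous-looking set: correlations scatter only a little
rs = [0.28, 0.31, 0.30, 0.29, 0.32]
ns = [100, 120, 90, 110, 105]
ratio, homogeneous = seventy_five_percent_rule(rs, ns)
```

With widely scattered correlations the ratio drops well below .75, suggesting heterogeneity and hence a moderator search.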

Moderator analysis. The next phase of the analysis is to search for moderators, or bound-
ary conditions of relationships between variables. From a statistical perspective, weighted
regression analysis is recommended because it controls for correlations between moderators
and does not require moderators to be nested. It can be performed using fixed-effects or
random-effects models (Hedges, 1994; Raudenbush, 1994). Neither model is without draw-
backs. Although the fixed-effects model has more statistical power to detect moderator rela-
tionships, it has high Type I error rates when the underlying assumption is false that the
Geyskens et al. / Meta-Analysis Practices in Management Research 401

population effect size is the same in all studies. Conversely, the random-effects model is the
more realistic and therefore the better choice, but estimates of random effects can become
unstable when the number of studies meta-analyzed is small (Schulze, 2004).
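A fixed-effects weighted least squares moderator test can be sketched as follows, using Fisher-z effect sizes, inverse-variance weights n − 3, and a single hypothetical binary moderator; a random-effects version would add an estimated between-study variance component to each weight's denominator:

```python
def wls_slope(effect_sizes, moderator, weights):
    """Weighted least squares slope of effect size on one moderator.

    This is the fixed-effects case: the weights are pure
    inverse-variance weights with no between-study variance term.
    With a 0/1 moderator the slope equals the weighted difference
    between the two subgroup means.
    """
    sw = sum(weights)
    xbar = sum(w * x for w, x in zip(weights, moderator)) / sw
    ybar = sum(w * y for w, y in zip(weights, effect_sizes)) / sw
    num = sum(w * (x - xbar) * (y - ybar)
              for w, x, y in zip(weights, moderator, effect_sizes))
    den = sum(w * (x - xbar) ** 2 for w, x in zip(weights, moderator))
    return num / den

# Hypothetical: Fisher-z effect sizes, a 0/1 moderator (e.g., student
# vs. field sample), and inverse-variance weights n - 3
zs = [0.10, 0.15, 0.35, 0.40]
moderator = [0, 0, 1, 1]
ns = [100, 150, 120, 90]
weights = [n - 3 for n in ns]
slope = wls_slope(zs, moderator, weights)
```

A positive slope here indicates larger effect sizes in the moderator = 1 subgroup; with several moderators the same logic extends to multiple weighted regression.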

Step 3: Publication Bias

Publication bias, often referred to as the "file drawer problem," is a serious problem for
the meta-analyst. Publication bias is present when the probability that a study is published is
contingent on the magnitude, direction, or significance of the study's results (Begg, 1994).
This may occur, for instance, when studies that fail to uncover statistically significant find-
ings are less likely to be submitted to journals or accepted for publication but are consigned
to the file drawer (Rosenthal, 1979).
There are three major approaches to deal with publication bias (Table 1), of which Duval
and Tweedie's (2000) trim and fill method is clearly superior because it accounts for the
sample sizes of the studies and estimates not only the number of unpublished studies but also
the publication-bias-adjusted estimate of the true mean effect size.
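Trim and fill is iterative and not easily shown in a few lines. As a simpler illustration of file-drawer reasoning, the sketch below computes Rosenthal's fail-safe N: how many unpublished studies averaging a null result would be needed to drag the combined result below one-tailed significance (hypothetical Z values):

```python
import math

def fail_safe_n(z_values, alpha_z=1.645):
    """Rosenthal's file-drawer (fail-safe) N.

    Using Stouffer's method, the k observed Z statistics combine to
    sum(Z) / sqrt(k). Solving for the number of additional zero-effect
    studies that pushes the combined Z down to the one-tailed critical
    value alpha_z gives (sum(Z) / alpha_z)^2 - k.
    """
    z_sum = sum(z_values)
    return (z_sum / alpha_z) ** 2 - len(z_values)

# Hypothetical one-tailed Z statistics from six published studies
zs = [2.1, 1.8, 2.5, 1.9, 2.2, 2.0]
print(math.floor(fail_safe_n(zs)))
```

As Table 1 notes, the method ignores sample sizes and assumes the average unpublished effect is exactly zero, which is why trim and fill is the recommended choice.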
Now that we have described the sequential set of critical decisions that the meta-analyst
makes, we continue by first evaluating the application of meta-analysis in management
research. Do meta-analyses in management adhere to established methodological guidelines,
or is there room for improvement? Second, we provide four meta-analytical examples to
illustrate how the decisions that a researcher makes may affect the conclusions reached
through meta-analysis.

Survey of Current Meta-Analytic Practices in Management Research

Sample and Coding

To investigate the application of meta-analysis in management research, we searched for
meta-analytical studies published in management journals in the period dating 1980 to 2007.
The management journals searched included MacMillan's forum for strategy research (1991,
1994), supplemented with the journals included by Dalton and Dalton (2005) in their overview
of meta-analysis in strategic management. This yielded a sample of 69 meta-analytical studies
that statistically integrate the results of independent primary studies, of which 59 use Hunter
and Schmidt's method (1990); 15, Hedges and Olkin's method (1985); 3, Rosenthal's method
(1991); and 1, Glass's method (Glass, McGaw, & Smith, 1981).⁷
The journals and the number of meta-analyses published in these journals are as follows:
Academy of Management Journal (n = 19), Academy of Management Review (n = 3),
Administrative Science Quarterly (n = 2), Entrepreneurship Theory and Practice (n = 2),
Journal of Business Research (n = 7), Journal of International Business Studies (n = 3),
Journal of Management (n = 12), Journal of Management Studies (n = 5), Journal of
Managerial Issues (n = 1), Management Science (n = 3), Omega (n = 2), Organization
Science (n = 1), Organization Studies (n = 5), and Strategic Management Journal (n = 4).
Each study was coded on a number of variables, using the framework developed in the
previous section. The last author coded all 69 studies. To check on the reliability of coding,
the first author coded a random sample of 25 studies. Coding agreement was perfect.

Results

Table 2 provides a summary of decisions made in the 69 meta-analyses on each issue


identified in Table 1. Some articles report more than one meta-analysis; that is, they meta-
analyze more than one relationship. All our descriptive statistics pertain to the article level
and not to the level of the individual meta-analysis.

Choice of effect size metric. Of the 69 studies reviewed, 47 studies use r, 9 use Fisher's
z, 6 use d, and 7 use other metrics (e.g., omega squared). The literature is divided about
whether one should use r or its Fisher's z-transformed value (see Table 1). This has also been
reflected in meta-analyses in management research, with 47 meta-analyses using r and 9
meta-analyses using its Fisher's z transform. Although there is no universal agreement about
the advantage of using z over r, 47 meta-analyses forsake the advantage of optimal weighting
that the Fisher's z transformation provides.

Correction for systematic artifacts. In total, 34 studies explicitly indicated that they cor-
rected for measurement error; 15 studies noted that they corrected for range restriction; and
5 studies mentioned correcting for dichotomization. Furthermore, 10 out of 15 studies cor-
rected for range restriction in continuous variables, whereas 5 studies corrected for range
restriction in dichotomous variables caused by divergence from a 50-50 split.⁸
Of the 34 studies that indicated correcting for measurement error, 13 (38%) correct for
measurement error at the individual level. For 7 out of these 13 studies, complete reliability
information was available. Five studies imputed sample-size-weighted reliability estimates
from other studies when reliability information was unavailable in the original studies, and
one meta-analysis performed individual-level measurement error corrections only for stud-
ies where reliability information was available.
In sum, 19 out of 34 (56%) studies correcting for measurement error used the method of
artifact distribution. Artifact distributions can be either empirical or assumed (Sackett,
2003). Using an empirical artifact distribution involves collecting whatever artifact infor-
mation is available in the studies included in the meta-analysis and assuming that this dis-
tribution represents the studies for which such artifact information was not available. In
contrast, using an assumed artifact distribution, as 7 studies in our sample did, involves con-
structing a distribution that is assumed to represent the artifacts in the set of studies meta-
analyzed. For example, a uniform reliability of .80 can be assumed. Assumed artifact
distributions should be used with caution because the accuracy of the estimates of the mean
and, especially, the variance of the population correlations is affected by how closely the
assumed artifact distributions match the true distributions of the artifacts, the degree to
which is unknown (Raju & Drasgow, 2003).
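A simplified sketch of an artifact-distribution correction of the mean correlation: divide the bare-bones mean by the mean attenuation factor built from whatever reliabilities the primary studies report (hypothetical values; the full Hunter and Schmidt procedure corrects the variance as well):

```python
import math

def artifact_distribution_correction(mean_r, rxx_values, ryy_values):
    """Correct a mean observed correlation using artifact distributions.

    A simplified sketch of the Hunter & Schmidt logic: the mean
    attenuation factor is the product of the mean square-root
    reliabilities of the two variables, computed from whatever
    reliability values the primary studies happen to report.
    """
    a = sum(math.sqrt(x) for x in rxx_values) / len(rxx_values)
    b = sum(math.sqrt(y) for y in ryy_values) / len(ryy_values)
    return mean_r / (a * b)

# Hypothetical: reliabilities reported by only a subset of studies
rxx = [0.80, 0.85, 0.75]
ryy = [0.70, 0.78]
rho = artifact_distribution_correction(0.25, rxx, ryy)
```

The accuracy of the result hinges on the assumption, noted above, that these reported reliabilities also represent the studies that reported none.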
Our findings indicate that 35 out of 69 (51%) meta-analyses do not report making cor-
rections for any of these statistical artifacts. This does not imply that these meta-analyses are

Table 2
Summary of Decisions Made by Management
Researchers When Using Meta-Analysis
Decision Number of Articles Percentage of Total

Effect size metric


r 47 68
Fisher's z 9 13
d 6 9
Other metrics (e.g., elasticities) 7 10
Correction for systematic attenuating artifacts
Measurement error 34 49
At individual level 13 19
Using empirical artifact distributions 12 17
Using assumed artifact distributions 7 10
No information provided about which type 2 3
of correction is used
Dichotomization 5 7
At individual level 5 7
Using empirical artifact distributions 0 0
Range restriction 15 22
At individual level 5 7
Using empirical artifact distributions 10 15
No information provided 35 51
Correction for interdependent effect sizes
Choice of one best estimate 4 5
Averaging 15 22
Composite correlations 10 15
Treating interdependent effect sizes as independent 3 4
No information provided 37 54
Identification of outliers
Sample size outliers 6 9
Effect size outliers 5 7
Traditional outlier detection techniques 4 6
Sample-adjusted meta-analytic deviancy 1 1
No information provided 58 84
Mean effect size
Unweighted average 9 13
Sample-size-weighted average 56 81
Variance-weighted average 3 4
No information provided 1 2
Homogeneity test
Chi-square test 24 35
75 percent rule 32 46
Credibility interval 15 22
Residual standard deviation 1 1
Other 1 1
No homogeneity test 13 19
Single method 41 59
Multiple methods 15 22


Moderator analysis
Subgroup analysis 42 61
Hierarchical 15 22
Nonhierarchical 27 39
Regression analysis / analysis of variance 18 26
Ordinary least squares 8 12
Weighted least squares 10 15
Fixed effects 15 22
Random effects 3 4
Correlation between moderator and effect size 6 9
Other methods (e.g., cluster analysis) 1 2
No moderator analysis 6 9
Publication bias
Comparison of published with unpublished studies 0 0
Rosenthal's file drawer analysis 10 15
Trim and fill 1 2
Comparison of focal with non-focal relationships 1 2
No analysis of publication bias 57 83

Note: The number of articles adopting each methodological choice does not necessarily add up to 69, because each
article may combine multiple approaches in each stage of the meta-analytic procedure. For instance, in performing
moderator analysis, 5 articles combine subgroup analysis and regression analysis.

bad practice, given that some of these corrections may not have been required in a number
of these meta-analyses in the first place. (The alternativethat corrections may have been
implemented but not reportedis less likely, because it is common practice to report what
one has corrected for.) Nevertheless, the number of studies that do not correct for any type
of statistical artifact seems on the high side, especially when considering that it includes
measurement error as well. Measurement error has a unique status among the systematic
artifacts in that it is present in all data because there are no perfectly reliable measures
(Hunter & Schmidt, 2004: 462). This practice may have resulted in biased findings and the
conclusion that effect sizes were more variable across samples than they actually were.
In contrast to measurement error, other systematic artifacts, such as range restriction and
dichotomization of continuous variables, may be absent in a set of studies being subjected to
meta-analysis. For example, there are research domains where the dependent variable has
never been dichotomized; hence, there need be no correction for that artifact. Only two meta-
analytical studies in our sample included a rationale for why dichotomization corrections
(Griffeth et al., 2000) and range restriction corrections in dichotomous variables (Sturman,
2003) were not needed. For the remaining 67 meta-analyses, it was not clear whether these
corrections were not needed or whether they were needed but overlooked. Thus, firm
conclusions about the state of the field on these dimensions are not possible. In view of this,
we recommend that future publications always state the grounds for not making corrections.

Correction for interdependent effect sizes. Thirty-seven meta-analyses did not indicate how
they dealt with interdependent effect sizes. It is unclear, however, whether the issue was
ignored or whether there were no interdependent effect sizes. In 3 studies, the authors decided
to treat correlations as independent, despite recognizing that interdependence between effect
sizes was present, thereby overrepresenting the studies in question. Of the 29 meta-analyses
that dealt with nonindependence, 10 (34%) combined the interdependent correlations into a
single correlation, using the formulas for the correlations of composites provided by Hunter
and Schmidt (1990: 457-460). Furthermore, 15 (52%) studies averaged the interdependent
correlations, and the average was used as the one value representing the study. In 4 studies,
the authors chose one estimate that they believed to be best, and they omitted the others.

Identification of outliers. Identifying outliers in meta-analytic datasets is important


because they can have a substantial impact on empirical findings and so alter the conclusions
reached. In terms of outlier identification, the data paint a fairly bleak picture: 58 studies
(84%) did not report whether they tested for outliers; furthermore, only 5 studies tested for
sample size outliers whereas 6 studies tested for effect size outliers. We recommend against
excluding studies with large sample sizes, namely, because the statistical theory underlying
the meta-analysis procedures described below requires that in any statistical analysis,
estimates based on large samples should play a larger role than estimates based on smaller
samples. Removing sample size outliers places a bonus on small and moderate sample-size
studies, whereas an effect size based on a large (random) sample is a more precise estimate
of the population value than an effect size based on a small (random) sample.
In addition, 5 out of 6 studies (83%) that tested for effect size outliers used traditional outlier detection techniques, such as schematic plot analysis and number of standard deviations from the mean, which are inappropriate in the context of a meta-analysis because they do not take sample size into account. Only one study in our review used the more appropriate sample-adjusted meta-analytic deviancy (SAMD) statistic.
Finally, 5 out of 11 studies (45%) testing for outliers followed recommended practice and
compared the results from a meta-analysis on the full data set with the results from a com-
parable meta-analysis on a reduced data set excluding the identified outliers.
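The SAMD logic can be sketched as follows. This is an illustrative implementation of one common form of the statistic: each correlation's deviation from the sample-size-weighted mean computed without that study, scaled by its estimated sampling standard error, so that sample size enters the test. The inputs are hypothetical:

```python
import math

def samd(rs, ns):
    """Sample-adjusted meta-analytic deviancy (one common form): each
    correlation's deviation from the sample-size-weighted mean computed
    WITHOUT that study, divided by the sampling standard error of r at
    that study's sample size."""
    stats = []
    for i, (r_i, n_i) in enumerate(zip(rs, ns)):
        others = [(r, n) for j, (r, n) in enumerate(zip(rs, ns)) if j != i]
        mean_wo = sum(n * r for r, n in others) / sum(n for _, n in others)
        se = math.sqrt((1 - mean_wo ** 2) ** 2 / (n_i - 1))
        stats.append((r_i - mean_wo) / se)
    return stats

# Hypothetical correlations; the fourth value stands well apart
stats = samd([0.10, 0.12, 0.15, 0.60], [100, 150, 120, 80])
```

Correlations whose |SAMD| values stand well apart from the rest (here the fourth input) are candidate outliers, whereas a merely large sample is not flagged by itself.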

Mean effect size. We coded whether studies reported weighted or unweighted mean effect
sizes. Given the importance of weighting so that effect sizes that embody less sampling error
play a proportionally larger role than do those embodying more sampling error, it is encour-
aging that only 9 meta-analyses (13%) reported only an unweighted mean effect size and
hence treated all effect sizes as if they were equal.
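The weighting at issue is simple to state: each effect size is weighted by its sample size. A minimal sketch with hypothetical inputs shows how far an unweighted mean can drift from the weighted one:

```python
def weighted_mean_r(rs, ns):
    """Sample-size-weighted mean correlation: effect sizes from larger
    samples (less sampling error) carry proportionally more weight."""
    return sum(n * r for r, n in zip(rs, ns)) / sum(ns)

# Hypothetical: a large study finding r = .10 and a small study finding r = .50
rs = [0.10, 0.50]
ns = [1000, 50]
unweighted = sum(rs) / len(rs)      # treats both studies as equally precise
weighted = weighted_mean_r(rs, ns)  # dominated by the precise large study
```

Here the unweighted mean is .30, whereas the weighted mean of about .12 stays close to the far more precise large-sample estimate.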

Homogeneity analysis. Of the 69 meta-analyses that we reviewed, 13 (19%) did not test
for heterogeneity in effect sizes; the other 56 approached the question of homogeneity of
effect sizes in four ways: The chi-square test was used in 24 of these 56 meta-analyses
(43%). This test examines whether the observed variation in effect size values is greater than
that expected from sampling error alone. When this test is significant, it suggests the pres-
ence of possible moderator variables. Furthermore, 32 of 56 meta-analyses (57%) used the
75% rule of thumb, which states that looking for moderators is warranted if less than 75%
of the observed variance in correlations is attributable to sampling error and artifacts. Fifteen
meta-analyses (27%) calculated a credibility interval around the mean effect size. When a
credibility interval is sufficiently wide (exceeding .11) or includes zero, moderators are prob-
ably in operation (Koslowsky & Sagie, 1993). One meta-analysis considered the size of the
residual standard deviation, which is the standard deviation in the observed correlations after
removing sampling error and study-to-study artifact variations. Forty-one of 56 meta-analyses
testing for heterogeneity (73%) concentrated on one test, whereas the other 15 meta-analyses
(27%) used multiple approaches to test for heterogeneity. The latter approach is preferable
because none of the heterogeneity tests are without drawbacks (see Table 1).
It is important to note that two meta-analyses have confused credibility intervals with
confidence intervals, which can lead to substantive differences in conclusions. Whereas a
credibility interval addresses the question of whether moderators are operating, a confidence
interval provides information about the accuracy and the significance of the estimate of the
population effect size or, in other terms, the extent to which sampling error remains in the
mean effect size (Whitener, 1990). The appropriate procedure for assessing whether moder-
ators are operating is to use credibility intervals, not confidence intervals.9
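Three of the four approaches can be sketched together in bare-bones form (that is, on correlations without artifact corrections). The function and its inputs are illustrative: it returns the chi-square statistic, the percentage of observed variance attributable to sampling error (the quantity the 75% rule inspects), and a 95% credibility interval:

```python
import math

def heterogeneity(rs, ns):
    """Bare-bones heterogeneity diagnostics for a set of correlations:
    chi-square statistic (df = k - 1), percentage of observed variance
    attributable to sampling error (input to the 75% rule), and a 95%
    credibility interval around the sample-size-weighted mean."""
    k = len(rs)
    total_n = sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    var_err = (1 - mean_r ** 2) ** 2 / (total_n / k - 1)  # at the average N
    chi_sq = total_n * var_obs / (1 - mean_r ** 2) ** 2
    pct_error = 100 * var_err / var_obs
    sd_rho = math.sqrt(max(var_obs - var_err, 0.0))
    cred = (mean_r - 1.96 * sd_rho, mean_r + 1.96 * sd_rho)
    return chi_sq, pct_error, cred

# Hypothetical: three correlations spread more widely than sampling error allows
chi_sq, pct_error, cred = heterogeneity([0.10, 0.30, 0.50], [100, 100, 100])
```

With these hypothetical inputs, roughly 31% of the observed variance is attributable to sampling error and the credibility interval is far wider than .11, so the 75% rule and the credibility interval would both point to moderators.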

Moderator analysis. The most common moderator analysis is subgroup analysis (42
meta-analyses; 61%), followed by multiple regression analysis (18 meta-analyses; 26%). Six
meta-analyses (9%) report the correlation between the moderator and the effect size, and one
(2%) uses cluster analysis. Six meta-analyses (9%) do not search for moderators.
Of 42 meta-analyses performing subgroup analyses, only 15 (36%) performed hierarchi-
cal subgroup analyses. It is disconcerting that of those studies that searched for moderators,
64% used simple, one-by-one moderator testing. Because moderators are seldom orthogo-
nal, these studies are not able to separate effects of different moderators. Furthermore, of 18
meta-analyses performing multiple regression analysis, only 10 used weighted least squares.
The use of fixed-effects models is the rule rather than the exception. Only 3 out of 18
studies (17%) performing a multiple regression analysis used a random-effects model, with
15 studies (83%) using a fixed-effects model. These 15 studies may have understated the
actual uncertainty in research findings.
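For a single moderator, weighted least squares estimation reduces to a few lines. The sketch below weights each study by its sample size; the studies and codings are hypothetical:

```python
def wls_simple(ys, xs, ws):
    """Weighted least squares regression of effect sizes (ys) on one
    moderator (xs), weighting each study by its sample size (ws)."""
    sw = sum(ws)
    xbar = sum(w * x for x, w in zip(xs, ws)) / sw
    ybar = sum(w * y for y, w in zip(ys, ws)) / sw
    slope = (sum(w * (x - xbar) * (y - ybar) for x, y, w in zip(xs, ys, ws))
             / sum(w * (x - xbar) ** 2 for x, w in zip(xs, ws)))
    return ybar - slope * xbar, slope

# Hypothetical correlations from U.S. (x = 1) and non-U.S. (x = 0) samples
rs = [0.12, 0.14, 0.20, 0.22]
us = [1, 1, 0, 0]
ns = [200, 100, 150, 50]
intercept, slope = wls_simple(rs, us, ns)
# With a binary moderator, the slope equals the weighted U.S. mean minus
# the weighted non-U.S. mean
```

With one binary moderator, WLS regression and subgroup analysis coincide; the advantage of regression appears once several correlated moderators are entered simultaneously, which requires solving the full set of weighted normal equations rather than this single-predictor special case.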

Publication bias. The results for publication bias testing are rather unsettling. The majority of the meta-analyses, 57 (83%), did not report a publication bias test.10 Hence, caution is needed when drawing conclusions from the results obtained because undetected publication bias may lead to spurious conclusions. When a publication bias test is used, it is nearly always Rosenthal's (1979) file drawer method or at least a variant of it. One meta-analysis
used the trim and fill method. Another meta-analysis evaluated publication bias by com-
paring the effect size between articles where the relationship is focal and those where the
relationship is nonfocal, with the idea being that effect sizes for nonfocal relationships are
not critical determinants of publishability and are thus less likely to suffer from publication
bias (Wagner, Stimpert, & Fubara, 1998).
The paucity of meta-analytical studies testing for publication bias offers cause for con-
cern because confidence in the validity of the findings of a meta-analysis depends on ruling
out the presence of publication bias. In conducting a meta-analysis, researchers should
always make efforts to assess to what extent publication bias may affect their findings,
preferably using the trim and fill method.
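Although trim and fill is the preferable test, the file drawer method encountered most often in the reviewed studies is simple enough to sketch: Rosenthal's fail-safe N estimates how many unpublished null-result studies would be needed to drop a combined one-tailed result below significance. The per-study z statistics below are hypothetical:

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: how many averaged-null unpublished studies
    would be needed to drop the combined one-tailed result below
    significance (alpha = .05 one-tailed by default)."""
    total_z = sum(z_scores)
    return total_z ** 2 / z_alpha ** 2 - len(z_scores)

# Hypothetical per-study z statistics
n_fs = fail_safe_n([2.1, 1.8, 2.5, 3.0, 1.2])
```

A small fail-safe N relative to the number of included studies suggests the combined result is fragile to unpublished null findings; the method, however, says nothing about the size of the bias, which is why trim and fill is preferred.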

Relationship of Meta-Analytic Decisions With Time

Since the pioneering work in the 1970s, the number of articles using and further devel-
oping meta-analytical methods has increased substantially. Considering the length of time
that meta-analysis has been applied in management research, one would expect its more
recent applications to be less likely to overlook the critical issues outlined in Table 1. If this
is true, our concern about the state of meta-analytic application in management research may
be lessened. To assess whether the application of meta-analysis has improved over time, we
grouped the 69 meta-analytical studies into two periods: early meta-analyses (1980 to 1994;
n = 24) and recent meta-analyses (1995 to 2007; n = 45).
For six meta-analytic practices, we compared the percentage of studies following that
practice between the two periods, using a z test to compare the proportions from two inde-
pendent groups. The meta-analytic practices considered were as follows: correcting for mea-
surement error, reporting weighted mean effect sizes, using multiple heterogeneity tests in
combination, using moderator regression analysis (as opposed to subgroup analysis), using
weighted versus unweighted regression analysis, and performing a publication bias test.
These meta-analytic practices are incontestably best practices, and sufficient information is
available in each of the 69 meta-analyses to make a meaningful comparison.11 No significant
differences were found, for any of the meta-analysis practices, between meta-analyses pub-
lished recently versus those published earlier. Taken together, the lack of a significant trend
toward more accurate meta-analytic methods and the low percentages overall suggest that
there is no compelling evidence that meta-analytical applications in management research
have yet caught up with statistical progress in meta-analytical methods.
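The comparison of proportions can be sketched with the standard pooled two-proportion z test; the counts in the example are hypothetical, not our observed frequencies:

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z test for the difference between two independent proportions,
    using the pooled proportion to estimate the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 10 of 24 early vs. 22 of 45 recent studies
# following a given practice
z = two_proportion_z(10, 24, 22, 45)
```

Values of |z| below 1.96 indicate no significant difference between the two periods at the 5% level.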

Relationship of Meta-Analytic Decisions With Rigor of Review Process

Review processes are typically more rigorous in higher-quality journals. Hence, it is pos-
sible that meta-analyses published in top-tier journals have made relatively higher-quality
decisions than have meta-analyses published in second-tier journals. We classified journals
as top-tier when they had an impact factor larger than 2 and a cited half-life larger than 10
years. Again, we used a z test to compare the percentages of studies, between top-tier jour-
nals and other journals, following the same six meta-analytic practices discussed above. No
significant differences were found in meta-analytic techniques adopted by studies published
in top-tier journals versus other journals.
In addition, journals that publish many meta-analyses are more likely to have more meta-analysis experts on their review boards and among their ad hoc reviewers. We distinguished between journals that published more than 10 meta-analyses in the 1980-2007 period (Academy of Management Journal, n = 19, and Journal of Management, n = 12) versus those that published fewer than 10 (all other journals). Again, no
significant differences were found for any of the meta-analysis practices between studies
printed in journals that publish meta-analyses frequently and those printed in journals that
publish meta-analyses infrequently.

Consequences of Critical Decisions in Meta-Analysis

Although the previous analyses suggest that meta-analytical practice in management research has not generally caught up with established methodological guidelines, these findings do not offer direct evidence regarding whether overlooking certain critical issues does
in fact influence meta-analytical results. The purpose of this section is to examine the effect
of overlooking the critical issues on empirical results by meta-analyzing four relationships
that have been studied with varying degrees of frequency in the management literature, namely the relationship between transaction-specific assets and the choice between hierarchical and market governance (Meta-Analysis 1, k = 78 independent samples), the
relationship between asymmetric ownership and alliance performance (Meta-Analysis 2, k
= 21 independent samples), the relationship between alliance size and alliance performance
(Meta-Analysis 3, k = 18 independent samples), and the relationship between international
experience and alliance performance (Meta-Analysis 4, k = 10 independent samples). These
values are similar to many published meta-analyses, as well as previously published Monte
Carlo simulations of meta-analyses (e.g., Beal, Corey, & Dunlap, 2002; Harwell, 1997).12

Full-Fledged Meta-Analyses

For each of the four relationships under study, we carried out a full-fledged statistical
meta-analysis of effect sizes. First, we converted every outcome statistic (e.g., r, univariate F, t, χ2) to a correlation coefficient. Second, we individually corrected each retrieved correlation coefficient, r, for the biasing influence of four systematic artifacts (if applicable):
dichotomization of a continuous dependent variable, dichotomization of a continuous inde-
pendent variable, range restriction in a dependent dichotomous variable, and range restric-
tion in an independent dichotomous variable.
Third, if a sample reported more than one correlation for a single relationship (e.g.,
because of multiple operationalizations of the same construct), these correlations were com-
bined into a linear composite correlation using the formulas provided by Hunter and
Schmidt (1990: 457-460). Reliabilities of the newly formed combined measures were com-
puted accordingly, using the Mosier formula (Hunter & Schmidt, 1990: 461). In the few
cases where this was not possible, the correlations were averaged, and only the average cor-
relation was entered into the meta-analysis.
Fourth, we computed Huffcutt and Arthur's (1995) sample-adjusted meta-analytic deviancy (SAMD) statistic to detect outlying observations (i.e., correlations). These were subsequently
dropped from the data set. Then, the partially corrected data points were meta-analyzed,
yielding a sample-size-weighted mean correlation. The obtained mean was corrected for
measurement error using the method of artifact distributions, because measurement error
information was not available for all data points. These corrections yield the key summary
statistic that describes the true relation between the study variables in the population: the
average corrected correlation, ρ (see Table 3).
Table 3
Consequences of Critical Decisions in Meta-Analysis

Cell entries are the average corrected correlation ρ, with the percentage difference from the full-fledged meta-analysis in parentheses. MA1 = Meta-Analysis 1: TSA-Hierarchical Governance Choice (k = 78); MA2 = Meta-Analysis 2: Asymmetric Ownership-Alliance Performance (k = 21); MA3 = Meta-Analysis 3: Alliance Size-Alliance Performance (k = 18); MA4 = Meta-Analysis 4: International Experience-Alliance Performance (k = 10).

Decision                                                MA1          MA2          MA3          MA4
Full-fledged meta-analysis                              .16          .06          .14          .32
No systematic artifact correction at the
  individual level (a)                                  .14 (-13%)   .01 (-78%)   .06 (-57%)   .20 (-38%)
No systematic artifact correction using
  artifact distributions (b)                            .12 (-25%)   .05 (-17%)   .13 (-7%)    .27 (-16%)
No correction for interdependent effect sizes           .09 (-44%)   .03 (-50%)   .14 (0%)     .29 (-9%)
No outlier elimination                                  .10 (-38%)   --           --           .25 (-22%)
Unweighted                                              .21 (+31%)   .10 (+67%)   .15 (+7%)    .36 (+13%)
No systematic artifact corrections (c) and
  no outlier elimination                                .06 (-63%)   .01 (-82%)   .06 (-60%)   .12 (-63%)

Note: TSA = transaction-specific assets. Dashes (--) indicate not applicable (no outliers were identified, leaving ρ unchanged).
a. Dichotomization, range restriction in dichotomous variables.
b. Measurement error.
c. Dichotomization, range restriction, measurement error.

Before and After Analyses

We then reran our meta-analyses five times, to demonstrate how the decisions that a researcher makes may affect the conclusions reached through meta-analysis. First, we calculated ρ without correcting for systematic artifacts at the individual correlation level, that is, dichotomization and range restriction in a dichotomous variable. Second, we calculated ρ without correcting for measurement error using artifact distributions. Third, we calculated ρ without correcting for nonindependent observations. Fourth, we calculated ρ without removing outliers. Fifth, we calculated an unweighted as opposed to a sample-size-weighted ρ.
Measurement error, dichotomization of continuous variables, and range restriction systematically attenuate the effect size. As shown in Table 3, not correcting for systematic artifacts at the individual level (dichotomization of continuous variables, range restriction in dichotomous variables) downwardly affects ρ by 13% (Meta-Analysis 1), 38% (Meta-Analysis 4), 57% (Meta-Analysis 3), and 78% (Meta-Analysis 2).13 Not correcting for measurement error using artifact distributions results in a downward effect on ρ, ranging from 7% to 25%. Thus, none of the systematic artifacts can be routinely ignored.
Although not correcting for interdependent effect sizes has no effect in Meta-Analysis 3, the other three ρ values are reduced by 9%, 44%, and 50%.14 On the basis of a sample-adjusted meta-analytic deviancy analysis, we identified three outliers for Meta-Analysis 1 and one outlier for Meta-Analysis 4. Not removing these outliers downwardly affects these relationships by 38% and 22%, respectively. Calculating an unweighted ρ instead of a sample-size-weighted ρ leads to an upward bias in all four meta-analyses, ranging from 7% to 67%.
It is important to note that whereas corrections for measurement error, dichotomization of continuous variables, and range restriction systematically increase ρ, there is no reason to expect a consistent effect for the other corrections. Results depend on the specific correlations in the studies that were not combined, eliminated, or weighted. In our case, results across the four meta-analytical examples happen to be consistent, but this is not to be expected in all cases.
More than half the 69 meta-analyses that were included in our content analysis did not correct for systematic artifacts (dichotomization, range restriction, and measurement error), nor did they eliminate outliers. We therefore also calculated ρ for the combination of not correcting for systematic artifacts (dichotomization, range restriction in a dichotomous variable) at the individual level, not correcting for measurement error using artifact distributions, and not eliminating outliers. Combined, failure to correct for these biasing factors leads to a downward effect on ρ ranging from 60% to 82% (Table 3, seventh data row).
We further examined whether the individual correlations on which the average correla-
tions are based are drawn from the same population, using four tests: chi-square, Hunter and
Schmidt's 75% rule of thumb, a credibility interval, and the residual standard deviation (see
Table 4).
Some of the heterogeneity statistics in our four meta-analytic examples point in different
directions, underscoring the importance of using a combination of heterogeneity tests. For
example, although the significant chi-square statistic in the meta-analysis of the asymmet-
ric ownership relationship suggests the presence of moderators, the small residual standard
deviation suggests the absence of moderators. As another example, although the nonsignificant chi-square in Meta-Analysis 3 suggests the absence of moderators, the wide credibility interval including zero suggests the presence of moderators. Even when we focus solely on the two most uncontested heterogeneity tests (the credibility interval and the residual standard deviation), we find inconclusive results in two out of four cases. Whereas the credibility intervals of Meta-Analyses 2 and 3 suggest the existence of moderators, the residual standard deviations suggest the opposite, which further underscores the importance of specifying moderator hypotheses a priori and testing for them regardless of the outcome of the heterogeneity tests.

Table 4
Heterogeneity Tests

MA1 = Meta-Analysis 1: TSA-Hierarchical Governance Choice (k = 78); MA2 = Meta-Analysis 2: Asymmetric Ownership-Alliance Performance (k = 21); MA3 = Meta-Analysis 3: Alliance Size-Alliance Performance (k = 18); MA4 = Meta-Analysis 4: International Experience-Alliance Performance (k = 10).

Test                                         MA1              MA2              MA3              MA4
Chi-square                                   756.6 (p < .01)  36.11 (p < .05)  27.08 (p > .05)  17.22 (p < .05)
  Result suggests presence of moderators?    Yes              Yes              No               Yes
75% rule                                     10.7             58.1             66.5             58.1
  Result suggests presence of moderators?    Yes              Yes              Yes              Yes
Credibility interval                         [-.16, .49]      [-.38, .49]      [-.05, .32]      [.08, .56] (a)
  Result suggests presence of moderators?    Yes              Yes              Yes              Yes
Residual standard deviation                  .13              .02              .02              .04
  Result suggests presence of moderators? (b) No              No               No               No

Note: TSA = transaction-specific assets.
a. Although this credibility interval does not include zero, it is wider than .11 and therefore suggests the presence of moderators.
b. Any selection of a cutoff for the residual standard deviation is somewhat arbitrary. We follow Cortina's (2003) recommendation that .19 be the absolute maximum.
We further illustrate the importance of not confusing confidence intervals and credibility
intervals in Meta-Analysis 1. This meta-analysis results in a confidence interval of [.09, .16] and a credibility interval of [-.16, .49]. The confidence interval does not include zero, indicating that the mean effect size is significantly different from zero. The credibility interval
is wide and it includes zero, indicating that the mean effect size is the mean of several sub-
populations and that moderators are in operation. If the meta-analyst had interpreted the con-
fidence interval as if it were a credibility interval, it would have led to the faulty conclusion
that a search for moderators is not warranted. Conversely, if the meta-analyst had interpreted
the credibility interval as if it were a confidence interval, it would have led to the faulty con-
clusion that the mean effect size is not significantly different from zero.
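The two intervals can be computed side by side in bare-bones form (no artifact corrections). The sketch below uses hypothetical correlations chosen so that, as in Meta-Analysis 1, the confidence interval excludes zero while the credibility interval includes it:

```python
import math

def intervals(rs, ns):
    """Bare-bones 95% confidence interval (precision of the mean effect
    size) versus 95% credibility interval (dispersion of the underlying
    population effects) for a set of correlations."""
    k = len(rs)
    total_n = sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    var_err = (1 - mean_r ** 2) ** 2 / (total_n / k - 1)
    se_mean = math.sqrt(var_obs / k)           # shrinks as k grows
    sd_rho = math.sqrt(max(var_obs - var_err, 0.0))  # does not shrink with k
    confidence = (mean_r - 1.96 * se_mean, mean_r + 1.96 * se_mean)
    credibility = (mean_r - 1.96 * sd_rho, mean_r + 1.96 * sd_rho)
    return confidence, credibility

# Hypothetical correlations: precise nonzero mean, widely dispersed effects
conf, cred = intervals([-0.10, 0.05, 0.15, 0.25, 0.35, 0.45], [200] * 6)
```

With these inputs the confidence interval excludes zero while the wider credibility interval includes it, reproducing (with different numbers) the pattern described above for Meta-Analysis 1.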
Finally, we compare the use of regression analysis with the use of subgroup analysis for
our largest meta-analysis (Meta-Analysis 1, k = 75 after removal of 3 outliers). We tested for
the following two study characteristics as potential moderators: data source (primary data
versus secondary data) and country (United States versus other country). If the meta-analyst
had performed two subgroup analyses, she or he would have found a significant effect of country (ρ = .12 for the United States versus ρ = .21 for other countries; Z = 2.97, p < .01) and of data source (ρ = .13 for primary data versus ρ = .19 for secondary data; Z = 1.99, p < .05). In contrast, a weighted least squares multiple regression of the artifact-adjusted correlation coefficient on data source (1 = primary data, 0 = secondary data) and country (1 = United States, 0 = other) simultaneously shows a significant effect of country (b = -.088, p = .05, for the fixed-effects model; b = -.085, p = .09, for the random-effects model) but a nonsignificant effect of data source (b = -.029, p = .50, for the fixed-effects model; b = -.030, p = .61, for the random-effects model).

Conclusion

For the further advancement of management science, meta-analyses are increasingly used
to summarize knowledge and establish empirical generalizations. The soundness of such
empirical generalizations depends on the accuracy with which meta-analysis methods are
applied. Any critical issue that is overlooked can have fundamental implications for the qual-
ity of the resulting empirical generalizations. In view of this, we conducted a comprehensive
review of meta-analytic practices. Our meta-analysis of meta-analyses shows that there is
considerable room for improvement in the way that meta-analyses are conducted and
reported in management research. Our concern about meta-analytic decisions is validated by
four empirical examples that provide concrete evidence indicating that such decisions can
make a considerable difference. It is therefore important that researchers give adequate thought to the various decisions required in meta-analysis and be sensitive to the consequences of those decisions.
We highlight the following areas as being particularly important for improvement. We
urge researchers to make greater use of composite correlations to correct for interdependent
effect sizes. We further advise researchers to correct for measurement error and dichotomiza-
tion of continuous variables, if this information is available for nearly every study. We rec-
ommend that outliers be identified using the sample-adjusted meta-analytic deviancy
statistic. Heterogeneity tests should never be used in isolation. Moderator hypotheses should
be offered a priori, and they should always be tested, regardless of the outcome of hetero-
geneity tests. Moderator tests should be performed using weighted least squares. Finally,
researchers should always assess to what extent publication bias may affect their findings,
preferably by using the trim and fill method.
Some other meta-analytic decisions remain contestable. Regarding systematic artifact correction, there is no single best method for artifact correction using artifact distributions or for corrections for range restriction. Similarly, it is not established that outliers, once identified, should be eliminated. We therefore advise researchers to report meta-analytic effects with and without systematic artifact correction and with and without outlier elimination.
We further urge researchers to be much more specific in their reporting. We found that
substantial percentages of studies omitted important information about their meta-analyses.
For example, we found that 54% of the time, meta-analyses did not specify whether inter-
dependent effect sizes occurred and were corrected for. Similarly, 51% of the meta-analyses
did not specify whether artifact correction was needed, and 84% provided no information
about outlier identification. This may be due to a persistent belief that it does not matter very
much which decisions are made, so there is no reason to report this information (cf. Conway
& Huffcutt, 2003); alternatively, the issue in question did not occur in the meta-analysis, or
the issue was simply overlooked. We have demonstrated that meta-analytic decisions can be quite consequential (see our four examples, where ignoring a few outliers or a few correlations that needed adjustment for systematic artifacts or interdependencies created fairly large differences in the outcome of the meta-analysis); as such, it is important for readers to be able to evaluate researchers' meta-analytic results. This requires that researchers report important decisions regarding their analyses, as summarized in Table 5. Researchers should report every decision, and the reasoning behind every decision, so that others can reanalyze their meta-analyses with alternative decisions.
We hope that our guidelines will stimulate researchers to give adequate thought to their
meta-analytic decisions and that they will help shape future applications of meta-analysis in
management research. Meta-analysis is an important research tool. With the recognition of
how it can be done better, it will continue to serve an important function in management
research.

Table 5
Checklist of What to Report

Meta-Analytic Decision                        Authors Should Report . . .

Effect size metric                            Whether a Fisher's z transform was used
Corrections for systematic                    For how many effect sizes measurement error was reported;
  attenuating artifacts                         how many effect sizes were affected by dichotomization of
                                                continuous variables and range restriction; results with
                                                and without artifact corrections; whether corrections were
                                                made at the individual level or using the method of
                                                artifact distributions (empirical or assumed)
Correction for interdependent                 How many interdependent effect sizes were present; how
  effect sizes                                  interdependent effect sizes were corrected for
Identification of outliers                    Whether outliers were present or not; outlier test and
                                                results with and without outlier elimination
Mean effect size                              Whether variance weights or sample-size weights were used
Homogeneity analyses                          Multiple homogeneity tests
Moderator analyses                            Whether variance weights or sample-size weights were used;
                                                whether fixed effects or random effects were used and why
Publication bias                              An estimate of the number of unpublished studies and the
                                                publication-bias-adjusted estimate of the true effect size

Appendix:
Studies Included in the Alliance Performance Meta-Analyses
Aulakh, P., Kotabe, M., & Sahay, A. 1996. Trust and performance in cross-border marketing partnerships: A behav-
ioral approach. Journal of International Business Studies, 27: 1005-1032.
Boateng, A., & Glaister, K. W. 2002. Performance of international joint ventures: Evidence for West Africa.
International Business Review, 11: 523-541.
Child, J., & Yan, Y. 2003. Predicting the performance of international joint ventures: An investigation in China.
Journal of Management Studies, 40: 283-320.
Choi, C., & Beamish, P. W. 2004. Split management control and international joint venture performance. Journal
of International Business Studies, 35: 201-215.
Cullen, J. B., Johnson, J. L., & Sakano, T. 1995. Japanese and local partner commitment to IJVs: Psychological
consequences of outcomes and investments in the IJV relationship. Journal of International Business Studies,
26: 91-115.
Deeds, D. L., & Rothaermel, F. T. 2003. Honeymoons and liabilities: The relationship between age and perfor-
mance in research and development alliances. Journal of Product Innovation Management, 20: 468-484.
Garcia-Canal, E., Valdes-Llaneza, A., & Arino, A. 2003. Effectiveness of dyadic and multi-party joint ventures.
Organization Studies, 24: 743-770.
Heide, J. B., & Stump, R. L. 1995. Performance implications of buyer-supplier relationships in industrial markets.
Journal of Business Research, 32: 57-66.
Hill, R. C., & Hellriegel, D. 1994. Critical contingencies in joint venture management: Some lessons from man-
agers. Organization Science, 5: 594-607.
Hu, M. Y., & Chen, H. 1996. An empirical analysis of factors explaining foreign joint venture performance in
China. Journal of Business Research, 35: 165-173.

Isobe, T., Makino, S., & Montgomery, D. B. 2000. Resource commitment, entry timing and market performance of
foreign direct investments in emerging economies: The case of Japanese international joint ventures in China.
Academy of Management Journal, 43: 468-484.
Konopaske, R., Werner, S., & Neupert, K. E. 2002. Entry mode strategy and performance: The role of FDI staffing.
Journal of Business Research, 55: 759-770.
Krishnan, R., Martin, X., & Noorderhaven, N. G. 2006. When does trust matter to alliance performance? Academy
of Management Journal, 49: 894-917.
Lane, P. J., Salk, J. E., & Lyles, M. A. 2001. Absorptive capacity, learning, and performance in international joint
ventures. Strategic Management Journal, 22: 1139-1161.
Lee, J., Chen, W., & Kao, C. 2003. Determinants and performance impact of asymmetric governance structures in
international joint ventures: An empirical investigation. Journal of Business Research, 56: 815-828.
Lin, X., & Germain, R. 1998. Sustaining satisfactory joint venture relationships: The role of conflict resolution
strategy. Journal of International Business Studies, 29: 179-196.
Lopez-Navarro, M. A., & Camison-Zornoza, C. 2003. The effect of group composition and autonomy on the per-
formance of joint ventures (JVs): An analysis based on Spanish export JVs. International Business Review, 12:
17-39.
Lu, J. W., & Beamish, P. W. 2006. Partnering strategies and performance of SMEs' international joint ventures.
Journal of Business Venturing, 21: 461-486.
Luo, Y. 1997. Partner selection and venturing success: The case of joint ventures with firms in the People's Republic
of China. Organization Science, 8: 648-662.
Luo, Y. 2002a. Contract, cooperation and performance in international joint ventures. Strategic Management
Journal, 23: 903-919.
Luo, Y. 2002b. Product diversification in international joint ventures: Performance implications in an emerging
market. Strategic Management Journal, 23: 1-20.
Luo, Y. 2007. Are joint venture partners more opportunistic in a more volatile environment? Strategic Management
Journal, 28: 39-60.
Luo, Y., & Park, S. H. 2004. Multiparty cooperation and performance in international equity joint ventures. Journal
of International Business Studies, 35: 142-160.
Lyles, M. A., Doanh, L. D., & Barden, J. Q. 2000. Trust, organizational controls, knowledge acquisition from for-
eign parents, and performance in Vietnamese international joint ventures (Working Paper No. 329). Ann Arbor:
University of Michigan, Stephen M. Ross School of Business.
Makino, S., & Delios, A. 1996. Local knowledge transfer and performance: Implications for alliance formation in
Asia. Journal of International Business Studies, 27: 905-927.
Mjoen, H., & Tallman, S. 1997. Control and performance in international joint ventures. Organization Science, 8: 257-274.
Pearce, R. J. 2001. Looking inside the joint venture to help understand the link between inter-parent cooperation
and performance. Journal of Management Studies, 38: 557-582.
Pothukuchi, V., Damanpour, F., Choi, J., Chen, C. C., & Park, S. H. 2002. National and organizational culture dif-
ferences and international joint venture performance. Journal of International Business Studies, 33: 243-265.
Ramaswamy, K., Gomes, L., & Veliyath, R. 1998. The performance correlates of ownership control: A study of U.S.
and European MNE joint ventures in India. International Business Review, 7: 423-441.
Sengupta, S., Castaldi, R. M., & Silverman, M. 2000. A dependence-trust model of export alliance performance in
small and medium enterprises (SMEs). Journal of Transnational Management Development, 5: 25-40.
Sim, A. B., & Ali, Y. 1998. Performance of international joint ventures from developing and developed countries:
An empirical study in a developing country context. Journal of World Business, 33: 357-377.
Steensma, H. K., Tihanyi, L., Lyles, M. A., & Dhanaraj, C. 2005. The evolving value of foreign partnerships in tran-
sitioning economies. Academy of Management Journal, 48: 213-235.
Yan, A., & Gray, B. 2001. Antecedents and effects of parent control in international joint ventures. Journal of
Management Studies, 38: 393-416.
Yeheskel, O., Zeira, Y., Shenkar, O., & Newburry, W. 2001. Partner company dissimilarity and equity international
joint venture effectiveness. Journal of International Management, 7: 81-104.
Notes
1. These steps are preceded by problem specification and an extensive search of the literature to retrieve the
relevant studies. It is outside the scope of this article to discuss these steps, given that they do not distinguish meta-
analyses from narrative reviews. See Lipsey and Wilson (2001) for details.
2. For ease of explication and because of r's dominance in management research, meta-analytic decisions are discussed in this article in terms of r. We do not want to imply that r is superior to d. The same principles apply to d.
3. Formulas for computing r from a variety of commonly reported statistics are provided in the work of Hunter
and Schmidt (1990: 272) and Lipsey and Wilson (2001: 189-195). Peterson and Brown (2005) have recently pro-
vided a formula to convert standardized regression coefficients into r. Descriptive statistics, such as the mean of a
variable or the proportion of respondents in a sample with a particular characteristic, can also be used as effect size
statistics. This type of effect size is, however, seldom meta-analyzed in management research.
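As an illustration, two of the standard conversions can be sketched in a few lines of code. This is a generic sketch using well-known formulas (the function names are my own); it does not reproduce the exact procedures of the works cited above:

```python
import math

def r_from_t(t, df):
    """Convert an independent-samples t statistic to r."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

def r_from_d(d, n1, n2):
    """Convert a standardized mean difference d to r, allowing for
    unequal group sizes (a equals 4 when n1 == n2)."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)

# t = 2.0 with df = 98 corresponds to r of about .198
print(round(r_from_t(2.0, 98), 3))
# d = 0.5 with two groups of 50 corresponds to r of about .243
print(round(r_from_d(0.5, 50, 50), 3))
```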
4. In addition to these systematic artifacts, the correlation coefficient is affected by sampling error. Sampling
error is a nonsystematic artifact: The magnitude of the sampling error in any one study is unknown; hence, the sam-
pling error in a single correlation cannot be corrected. Sampling error can, however, be corrected for at the level of
the meta-analysis, which is discussed in Step 2.
5. Note that truly dichotomous theoretical constructs (e.g., gender) are appropriately represented by dichoto-
mous variables, which should not be corrected for dichotomization.
6. Although measurement error information is usually available for nearly every study, this is not always the
case. Dichotomization information is typically available for every study (or nearly so); hence, correlations can be
individually corrected for the attenuating effect of dichotomization. Range restriction information for dichotomous
variables is reported in almost every study, whereas this is not so for range restriction information for continuous
variables.
7. In addition, we located seven studies that were exclusively based on the vote-counting procedure or on the
combination of significance levels, and these studies were not confined to earlier years. In fact, four out of the seven
studies were published after 1995. Because meta-analysis of effect sizes is more informative and more precise than
vote counting and the combination of significance levels, we concentrate on the former in this article.
8. Because the variance of a dichotomous variable is the proportion of cases in one group multiplied by the
proportion of cases in the other group, the maximum variance of a dichotomized variable is .25, and it occurs in
case of a 50-50 split on the dichotomous variable. For correlations between a continuous variable and a dichotomous variable, the range restriction correction estimates what the correlation would have been in case of a 50-50
split on the dichotomous variable. The maximum correlation possible after this correction is made is .80 (specifi-
cally, .7978). The correction for dichotomization further increases the maximum size of the correlation to 1.00.
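The quantities in this note can be verified directly. The sketch below (Python, for illustration only) computes the variance of a dichotomous variable and the attenuation factor of approximately .7979 that arises when a normal variable is dichotomized at a 50-50 split:

```python
from math import pi, sqrt

def dichotomous_variance(p):
    """Variance of a 0/1 variable: p(1 - p), maximized (.25) at p = .5."""
    return p * (1 - p)

def attenuation_50_50():
    """Attenuation factor for splitting a standard normal variable at its
    mean (a 50-50 dichotomization): phi(0) / sqrt(.5 * .5), about .7979.
    The observed point-biserial r equals the true r times this factor."""
    phi_0 = 1 / sqrt(2 * pi)  # standard normal density at the cut point 0
    return phi_0 / sqrt(0.25)

print(dichotomous_variance(0.5))  # 0.25, the maximum possible variance
factor = attenuation_50_50()
print(round(factor, 4))           # 0.7979
max_observed = 1.0 * factor       # a true r of 1.00 appears as ~.80
print(round(max_observed / factor, 2))  # the correction restores 1.0
```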
9. Confidence intervals are generated around the uncorrected weighted mean effect size, using the standard
error of the mean effect size. If a confidence interval includes zero, it suggests that the mean effect size is not significantly different from zero. Credibility intervals are generated around the mean corrected effect size, using the
corrected standard deviation around the mean. If a credibility interval is wide and includes zero, it suggests that
moderators may be operating. For more information, see Whitener (1990).
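As a minimal illustration of the distinction, the following sketch computes both intervals for hypothetical data using the basic Hunter-Schmidt quantities (sample-size-weighted mean r, observed variance, and expected sampling error variance); artifact corrections are omitted for simplicity:

```python
import math

def hs_intervals(rs, ns):
    """Bare-bones sketch: a 95% confidence interval around the N-weighted
    mean r, using the standard error of the mean, and an 80% credibility
    interval using the residual SD that remains after subtracting the
    expected sampling error variance (Hunter & Schmidt, 2004)."""
    k, total_n = len(rs), sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    var_e = (1 - mean_r ** 2) ** 2 / (total_n / k - 1)  # sampling error
    var_rho = max(var_obs - var_e, 0.0)                 # residual variance
    se_mean = math.sqrt(var_obs / k)
    ci_95 = (mean_r - 1.96 * se_mean, mean_r + 1.96 * se_mean)
    cv_80 = (mean_r - 1.28 * math.sqrt(var_rho),
             mean_r + 1.28 * math.sqrt(var_rho))
    return mean_r, ci_95, cv_80

mean_r, ci, cv = hs_intervals([0.30, 0.10, 0.45, 0.25], [100, 150, 80, 120])
# A 95% confidence interval that includes zero suggests the mean effect
# is not significantly different from zero; a wide 80% credibility
# interval that includes zero suggests moderators may be operating.
```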
10. Twelve studies included unpublished papers. Whereas doing so may attenuate publication bias, it does not
solve the problem, because one can never be sure that all, or even most, of the unpublished studies have been iden-
tified (Begg, 1994).
11. We did not compare the percentages of studies correcting for range restriction, correcting for dichotomiza-
tion of continuous variables, correcting for interdependent effect sizes, and identifying outliers, because 51% of the
meta-analyses did not specify whether artifact correction was needed and 84% provided no information about out-
lier identification; that is, outliers may have been tested for, but the test may not have been reported, because no
outliers were identified.
12. Data for the relationship between transaction-specific assets and the choice between hierarchical versus market
governance were taken from Geyskens, Steenkamp, and Kumar (2006). Data for the three meta-analyses on alliance
performance were identified through four phases of data collection. First, we performed a bibliographic search of
ABI/Inform Global, EconLit, JSTOR, Kluwer Online, Elsevier Science Direct, and the Social Science Citation Index,
using the terms joint venture(s) and strategic alliance(s). Second, we performed manual searches over the 1980-2007
period of 10 leading journals in management and marketing: Academy of Management Journal, Administrative Science
Quarterly, Journal of International Business Studies, Journal of Management, Journal of Marketing, Journal of
Marketing Research, Management Science, Organization Science, Organization Studies, and Strategic Management
Journal. Third, we performed Internet searches using standard search engines. Finally, we examined the reference sec-
tions of all the articles retrieved and of prior narrative reviews of the strategic alliance literature (e.g., Gulati, 1998).
Studies included in the three meta-analyses on alliance performance are listed in the appendix.
13. The number of dichotomized and range-restricted continuous variables equals 14 dependent and 15 indepen-
dent variables in Meta-Analysis 1, 3 independent and 2 dependent variables in Meta-Analysis 2, 4 dependent and no
independent variables in Meta-Analysis 3, and 2 dependent and 2 independent variables in Meta-Analysis 4.
14. If we had not corrected for interdependencies, the number of correlations in Meta-Analyses 1, 2, 3, and 4
would have increased from 78 to 104, from 21 to 29, from 18 to 23, and from 10 to 14, respectively. Most interdependent correlations arose because studies used multiple measures to assess transaction-specific assets (Meta-Analysis 1) or alliance performance (Meta-Analyses 2-4).
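A simple way to implement such a correction is to collapse dependent correlations to one entry per study. The sketch below uses plain averaging on hypothetical data; more refined composite approaches that weight by measure intercorrelations also exist (e.g., Hunter & Schmidt, 2004):

```python
def pool_dependent_rs(study_rs):
    """Collapse multiple correlations reported by the same study into a
    single effect size per study by simple averaging -- a common (if
    conservative) way to avoid counting dependent effect sizes twice."""
    return {study: sum(rs) / len(rs) for study, rs in study_rs.items()}

# Hypothetical data: study B reports two measures of the same construct
raw = {"A": [0.20], "B": [0.35, 0.25], "C": [0.10]}
pooled = pool_dependent_rs(raw)
print(round(pooled["B"], 2))  # 0.3 -- one pooled effect size for study B
print(len(pooled))            # 3 independent entries instead of 4 correlations
```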
References
Arthur, W., Jr., Bennett, W., Jr., & Huffcutt, A. I. 2001. Conducting meta-analysis using SAS. Mahwah, NJ:
Lawrence Erlbaum.
Balkundi, P., & Harrison, D. A. 2006. Ties, leaders, and time in teams: Strong inference about network structure's effects on team viability and performance. Academy of Management Journal, 49: 49-68.
Bass, A. R., & Ager, J. 1991. Correcting point-biserial turnover correlations for comparative analysis. Journal of
Applied Psychology, 76: 595-598.
Beal, D. J., Corey, D. M., & Dunlap, W. P. 2002. On the bias of Huffcutt and Arthur's (1995) procedure for identifying outliers in the meta-analysis of correlations. Journal of Applied Psychology, 87: 583-589.
Begg, C. B. 1994. Publication bias. In H. M. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis:
399-409. New York: Russell Sage.
Bhaskar-Shrinivas, P., Harrison, D. A., Shaffer, M. A., & Luk, D. M. 2005. Input-based and time-based models of
international adjustment: Meta-analytic evidence and theoretical extensions. Academy of Management Journal,
48: 257-281.
Callahan, J. S., Kiker, D. S., & Cross, T. 2003. Does method matter? A meta-analysis of the effects of training
method on older learner training performance. Journal of Management, 29: 663-680.
Cheung, S. F., & Chan, D. K.-S. 2004. Dependent effect sizes in meta-analysis: Incorporating the degree of inter-
dependence. Journal of Applied Psychology, 89: 780-791.
Cohen, A. 1993. Organizational commitment and turnover: A meta-analysis. Academy of Management Journal, 36:
1140-1157.
Combs, J. G., & Ketchen, D. J., Jr. 2003. Why do firms use franchising as an entrepreneurial strategy? A meta-
analysis. Journal of Management, 29: 443-465.
Conway, J. M., & Huffcutt, A. I. 2003. A review and evaluation of exploratory factor analysis practices in
organizational research. Organizational Research Methods, 6: 147-168.
Cortina, J. M. 2003. Apples and oranges (and pears, oh my!): The search for moderators in meta-analysis.
Organizational Research Methods, 6: 415-439.
Dalton, D. R., & Dalton, C. M. 2005. Strategic management studies are a special case for meta-analysis. Research
Methodology in Strategy and Management, 2: 31-63.
Datta, D. K., & Narayanan, V. 1989. A meta-analytic review of the concentration-performance relationship:
Aggregating findings in strategic management. Journal of Management, 15: 469-483.
Duval, S., & Tweedie, R. 2000. A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95: 89-98.
Geyskens, I., Steenkamp, J. B. E. M., & Kumar, N. 2006. Make, buy, or ally: A transaction cost theory meta-analysis.
Academy of Management Journal, 49: 519-543.
Glass, G. V., McGaw, B., & Smith, M. L. 1981. Meta-analysis in social research. Beverly Hills, CA: Sage.
Griffeth, R. W., Hom, P. W., & Gaertner, S. 2000. A meta-analysis of antecedents and correlates of employee
turnover: Update, moderator tests, and research implications for the next millennium. Journal of Management,
26: 463-488.
Gulati, R. 1998. Alliances and networks. Strategic Management Journal, 19: 293-317.
Harwell, M. 1997. An empirical study of Hedges' homogeneity test. Psychological Methods, 2: 219-231.
Hedges, L. V. 1994. Fixed effects models. In H. M. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis: 285-299. New York: Russell Sage.
Hedges, L. V., & Olkin, I. 1985. Statistical methods for meta-analysis. San Diego, CA: Academic Press.
Huffcutt, A. I., & Arthur, W. 1995. Development of a new outlier statistic for meta-analytic data. Journal of Applied
Psychology, 80: 327-334.
Hunter, J. E., & Schmidt, F. L. 1990. Methods of meta-analysis: Correcting error and bias in research findings.
Thousand Oaks, CA: Sage.
Hunter, J. E., & Schmidt, F. L. 1994. Correcting for sources of artificial variation across studies. In H. M. Cooper
& L. V. Hedges (Eds.), The handbook of research synthesis: 323-336. New York: Russell Sage.
Hunter, J. E., & Schmidt, F. L. 2004. Methods of meta-analysis: Correcting error and bias in research findings (2nd
ed.). Thousand Oaks, CA: Sage.
James, L. R., Demaree, R. G., & Mulaik, S. A. 1986. A note on validity generalization procedures. Journal of
Applied Psychology, 71: 440-450.
Kemery, E. R., Dunlap, W. P., & Griffeth, R. W. 1988. Correction for variance restriction in point-biserial correla-
tions. Journal of Applied Psychology, 73: 688-691.
Koslowsky, M., & Sagie, A. 1993. On the efficacy of credibility intervals as indicators of moderator effects in meta-
analytic research. Journal of Organizational Behavior, 14: 695-699.
Law, K. S. 1995. The use of Fisher's Z in Schmidt-Hunter-type meta-analyses. Journal of Educational and
Behavioral Statistics, 20: 287-306.
LePine, J. A., Podsakoff, N. P., & LePine, M. A. 2005. A meta-analytic test of the challenge stressor-hindrance
stressor framework: An explanation for inconsistent relationships among stressors and performance. Academy
of Management Journal, 48: 764-775.
Lipsey, M. W., & Wilson, D. B. 2001. Practical meta-analysis. Thousand Oaks, CA: Sage.
MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. 2002. On the practice of dichotomization of quan-
titative variables. Psychological Methods, 7: 19-40.
MacMillan, I. C. 1991. The emerging forum for business policy scholars. Strategic Management Journal, 12: 161-165.
MacMillan, I. C. 1994. The emerging forum for business policy scholars. Journal of Business Venturing, 9: 85-89.
Martinussen, M., & Bjornstad, J. F. 1999. Meta-analysis calculations based on independent and nonindependent
cases. Educational and Psychological Measurement, 59: 928-950.
Muchinsky, P. M. 1996. The correction for attenuation. Educational and Psychological Measurement, 56: 63-75.
Paese, P. W., & Switzer, F. S. 1988. Validity generalization and hypothetical reliability distributions: A test of the
Schmidt-Hunter procedure. Journal of Applied Psychology, 73: 267-274.
Peterson, R. A., & Brown, S. P. 2005. On the use of beta coefficients in meta-analysis. Journal of Applied
Psychology, 90: 175-181.
Raju, N. S., & Drasgow, F. 2003. Maximum likelihood estimation in validity generalization. In K. R. Murphy (Ed.),
Validity generalization: A critical review: 263-285. Mahwah, NJ: Lawrence Erlbaum.
Raudenbush, S. W. 1994. Random effects models. In H. M. Cooper & L. V. Hedges (Eds.), The handbook of
research synthesis: 301-321. New York: Russell Sage.
Rosenthal, R. 1979. The file-drawer problem and tolerance for null results. Psychological Bulletin, 86: 638-641.
Rosenthal, R. 1991. Meta-analytic procedures for social research. Beverly Hills, CA: Sage.
Rosenthal, R. 1994. Parametric measures of effect size. In H. M. Cooper & L. V. Hedges (Eds.), The handbook of
research synthesis: 231-244. New York: Russell Sage.
Sackett, P. R. 2003. The status of validity generalization research: Key issues in drawing inferences from cumula-
tive research findings. In K. R. Murphy (Ed.), Validity generalization: A critical review: 91-114. Mahwah, NJ:
Lawrence Erlbaum.
Schmidt, F. L., & Hunter, J. E. 1999. Comparison of three meta-analysis methods revisited: An analysis of Johnson,
Mullen, and Salas (1995). Journal of Applied Psychology, 84: 144-148.
Schulze, R. 2004. Meta-analysis: A comparison of approaches. Cambridge, MA: Hogrefe & Huber.
Shadish, W. R., & Haddock, C. K. 1994. Combining estimates of effect size. In H. M. Cooper & L. V. Hedges (Eds.),
The handbook of research synthesis: 261-281. New York: Russell Sage.
Steel, P. D., & Kammeyer-Mueller, J. D. 2002. Comparing meta-analytic moderator estimation techniques under
realistic conditions. Journal of Applied Psychology, 87: 96-111.
Sturman, M. C. 2003. Searching for the inverted U-shaped relationship between time and performance: Meta-analy-
ses of the experience/performance, tenure/performance, and age/performance relationships. Journal of
Management, 29: 609-640.
VanderWerf, P. A., & Mahon, J. F. 1997. Meta-analysis of the impact of research methods on findings of first-mover
advantage. Management Science, 43: 1510-1519.
Viswesvaran, C., & Ones, D. S. 1998. Moderator search in meta-analysis: A review and cautionary note on existing
approaches. Educational and Psychological Measurement, 58: 77-87.
Wagner, J. A., III, Stimpert, J. L., & Fubara, E. I. 1998. Board composition and organizational performance: Two
studies of insider/outsider effects. Journal of Management Studies, 35: 655-677.
Wanous, J. P., Sullivan, S. E., & Malinak, J. 1989. The role of judgment calls in meta-analysis. Journal of Applied
Psychology, 74: 259-264.
Webber, S. S., & Donahue, L. 2001. Impact of highly and less job-related diversity on work group cohesion and
performance: A meta-analysis. Journal of Management, 27: 141-162.
Whitener, E. M. 1990. Confusion of confidence intervals and credibility intervals in meta-analysis. Journal of
Applied Psychology, 75: 315-321.
Williams, C. R. 1990. Deciding when, how, and if to correct turnover correlations. Journal of Applied Psychology,
75: 732-737.
Williams, C. R., & Livingstone, L. P. 1994. Another look at the relationship between performance and voluntary
turnover. Academy of Management Journal, 37: 269-298.