
Journal of Experimental Psychopathology
JEP Volume 2 (2011), Issue 2, 210–251
ISSN 2043-8087 / DOI: 10.5127/jep.010410

An SEM Perspective on Evaluating Mediation: What Every Clinical Researcher Needs to Know

Erik Woody
University of Waterloo

Abstract
After a brief consideration of the definition and importance of mediation, statistical tests for mediation are
reviewed, including the joint significance of the two effects involved in the mediation, the Sobel test and
its variants, resampling with the bootstrap, Bayesian estimation using MCMC simulation, and the effect
ratio. A structural-equation-modeling (SEM) perspective on mediation then introduces the alternative
scenarios that could yield a false-positive mediation finding. Design-based, partial solutions are
advanced for problems of measurement, uncontrolled common causes, and temporal ordering that can
confound mediation analysis. Next, the issues of heterogeneity of effects and statistical interactions in
mediation analyses are addressed, including a discussion of moderated mediation and mediated
moderation. Finally, the relation of mediation analysis to experimentation is discussed, with attention to
the possibility of creatively integrating SEM-based mediation analysis and experimental design.
© Copyright 2011 Textrum Ltd. All rights reserved.
Keywords: Mediation; mediational analysis; bootstrap; Bayesian estimation; effect ratio; structural equation
modeling; moderated mediation; mediated moderation; experimental design
Correspondence to: Erik Woody, Department of Psychology, University of Waterloo, Waterloo, Ontario N2L 3G1,
Canada. Email: ewoody@uwaterloo.ca
Received 5-Aug-10; received in revised form 26-Oct-10; accepted 26-Oct-10

Table of Contents
Introduction
Some Definitions
Statistical Tests of Mediation
Joint significance of the two paths involving the mediator.
Sobel test and its variants.
Resampling with the bootstrap.
Bayesian estimation using MCMC simulation techniques.
“Partial” versus “full” mediation and the effect ratio.
Recommendations and power.
How Hard Is It to Pass a Statistical Test of Mediation?
The Need for Explicit Models
An SEM-Based Perspective on Mediation
Alternative Models Having to do with Measurement Issues
Alternative Models Having to do with Uncontrolled (Omitted) Common Causes
Alternative Models Having to do with Temporal Ordering
Implications of Alternative Models
Multiple indicators.
Correlated errors.
Measuring and controlling for third variables.
Measuring and controlling for earlier levels of the same variables as in the model.
Recommendations
Heterogeneity and Interactions in the Evaluation of Mediation
Heterogeneity of Partial Slopes
Moderated Mediation and Mediated Moderation
The Relation between Mediation Analysis and Experimental Design
Integrating Mediation Analysis and Experimental Design
Concluding Recommendations
Acknowledgements
References
Appendix A: Demonstration of SEM-Based Methods for Evaluating Mediation
Joint significance of the two effects involved in the mediation.
Sobel test and its variants.
Bias-corrected bootstrap.
Bayesian estimation using the MCMC technique.
The effect ratio.
Appendix B: Demonstration of Two-Mediator, Two-Sample SEM and Contrasts Using Bayesian Custom-Estimands in Amos

Introduction
In the last two decades, the field of psychology has witnessed a remarkable surge of interest in the
concept of mediation. Why has there been so much attention to this topic? We will see that both
conceptual and methodological developments have been key factors.
Zanna and Fazio (1982) pointed out that as a field of research advances, the nature of cutting-edge
questions in it goes through successive “generations”. First-generation questions address simply
whether variables are related. For example, does a particular treatment, compared to a control condition,
lead to improvement in the symptoms of a particular psychopathology? Once these relations are
established, research moves to later generations of questions. Second-generation questions address
boundary conditions: when, or under what circumstances, the variables are related. For example, does a

particular treatment lead to improvement in symptoms only for patients from particular backgrounds, or
only when the psychotherapist has particular kinds of training? Such second-generation questions
concern issues of moderation—that is, the conditions under which relations are enhanced or diminished.
Finally, third-generation questions address the processes that underlie relations between variables. For
example, does the treatment ameliorate symptoms because it produces changes in underlying
cognitions, which in turn produce symptom change? Such third-generation questions concern issues of
mediation—that is, the underlying processes (or mechanisms) that produce relations between variables.
As the foregoing sequence of questions illustrates, when any field of research matures, it naturally
moves toward questions about mediation. In this sense, the recent burgeoning of interest in issues of
mediation could be regarded as a healthy sign that many areas of psychological research have
advanced well beyond the initial stage of demonstrating relations among relevant variables. In addition,
psychological theories do not just make predictions about which variables should be related; they also
specify the underlying processes that are hypothesized to yield these relations. Hence, the burgeoning
interest in mediation attests to researchers’ concern for grounding their work in strong theoretical
foundations.

Some Definitions
The term mediation is somewhat unfortunate, because it is very easily confused with the term
moderation. Indeed, it is a rather common error, both in speech and in print, for a psychologist to use
one of these terms when he or she really means the other.
Thus, let’s make sure the meaning of these terms is clear. Consider that variable X has a relation with
variable Y. When we say that variable M is a mediator of this relation, we mean that X has an effect on
M, and M in turn has an effect on Y. That is, a mediator is an intermediary, a link in a causal chain. If we
could block the effect of X on M, or the effect of M on Y, or both, we would weaken (or possibly even
eliminate) the relation of X with Y.
In contrast, when we say that variable V is a moderator of the relation of X to Y, we mean that the form
or strength of this relation depends on the level of V. That is, a moderator tells us about relevant contexts
or boundary conditions. For example, there may be a level of V under which the relation of X with Y is
much weaker (or possibly even absent).
The concepts of mediation and moderation can also be combined. For example, in moderated mediation,
the level of a moderator variable V affects how strongly the mediating variable M links X and Y. For
example, change in underlying cognitions (M) may strongly mediate the relation of treatment (X) to
symptom amelioration (Y) only if the treatment model has been explained well to the client (V).

Statistical Tests of Mediation


In terms of impact on the world of psychological research, the methodological article on mediation by
Baron and Kenny (1986) was surely one of the most influential events in the last quarter of a century.
This article has now been cited more than ten thousand times. To clarify the concept of mediation, Baron
and Kenny adopted the logic of path analysis (or structural equation modeling), but without burdening the
reader with a detailed explanation of the crucial assumptions on which the soundness of such logic rests
(as covered more thoroughly, for example, in Kenny, 1979). The inadvertent result was that the article
had the effect of convincing a great many psychologists that evaluating mediation is a fairly
straightforward problem of applying appropriate statistical tests. Just as there are familiar statistical tests
to evaluate the first-generation question of whether X is related to Y, so there appeared to be statistical
tests to evaluate the third-generation question of whether M mediates the relation of X to Y. That is,

evaluating mediation appeared simply to involve performing such statistical tests and looking at the
resulting p-values.
The most widely known of such mediation tests is the Sobel test (Sobel, 1982, 1986). Use of such tests
has become pervasive in many areas of psychological research; indeed, Gilovich (2009) quipped, “Sobel
test is now the password for JPSP [the Journal of Personality and Social Psychology]”.
There are numerous other statistical tests of mediation that have been proposed and subsequently
evaluated to some extent in methodological research (e.g., see MacKinnon, Lockwood, Hoffman, West,
& Sheets, 2002; MacKinnon, Lockwood, & Williams, 2004; Shrout & Bolger, 2002). Nonetheless, all
these proposed tests rest on a common underlying logic, which can be represented in the form of a
structural diagram, as shown in Figure 1.

Figure 1: Structural diagram for a model with a single mediator, M, of the effect of X on Y

In the diagram, the letters on the paths denote partial regression coefficients for the relation between the
two variables connected by the path: a represents the effect of X on M; b represents the effect of M on Y,
holding X constant; and c represents the effect of X on Y, holding M constant. The error variable e1
represents the variance in M that is not explained by X, and the error variable e2 represents the variance
in Y that is not explained by M and X.
The path coefficients a, b, and c can be estimated readily either through multiple regression or structural
equation modeling (SEM) software. The diagram implies two regression equations: one for M as the
criterion variable, and another for Y as the criterion variable. A regression predicting M with X will yield
the parameter estimate for a; and a second regression predicting Y simultaneously with M and X will
yield the parameter estimates for b and c, respectively. These parameter estimates may be expressed in
either unstandardized form (B’s) or standardized form (betas). The standardized coefficients are
generally much easier for an audience to understand, because their interpretation does not depend on
knowledge of how each variable is scaled. As an alternative to using a multiple regression program, the
diagram shown in Figure 1 can be drawn into SEM software, such as Amos (Arbuckle, 2009). The
software will then estimate each of the coefficients. Both approaches will also yield estimates of the
standard errors for each of the coefficients.
Let’s assume standardized coefficients for simplicity. The indirect effect (IE) is the product of a and b:
IE = ab
The indirect effect represents the effect of X on Y that is mediated by M. For example, say that ab
equalled .3. Then if there were a one standard deviation increase in X, we would expect the causal chain
through M to yield an increase in Y of .3 standard deviations.
According to the model in the diagram, another part of the effect of X on Y is unmediated, as
represented by the coefficient c. This is called the direct effect (DE):

DE = c
For example, say that c equalled .2. Then if there were a one standard deviation increase in X, we would
expect that other processes, distinct from whatever M transmits, would yield an increase in Y of .2
standard deviations. A direct effect has as its necessary context the model in which it is embedded,
because it is always possible that a direct effect is really an indirect effect in which the relevant
mediators have yet to be discovered or specified.
Finally, the total effect (TE) of X on Y is the sum of the indirect effect plus the direct effect:
TE = IE + DE = ab + c
If the coefficients (a, b, and c) are in standardized form, then the total effect is equal to the correlation
between X and Y. Thus, this mediation model explains the correlation between X and Y as the sum of an
indirect effect (mediated by M) and a direct effect (unmediated by M).
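The decomposition above can be checked numerically. The following is a minimal numpy sketch using simulated data; the true path values (0.5, 0.4, 0.2) are invented for illustration, and regressions are run without intercepts because the variables are standardized first.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X0 = rng.standard_normal(n)
M0 = 0.5 * X0 + rng.standard_normal(n)            # hypothetical path a
Y0 = 0.4 * M0 + 0.2 * X0 + rng.standard_normal(n) # hypothetical paths b and c

# Standardize so the estimated coefficients are betas
standardize = lambda v: (v - v.mean()) / v.std()
X, M, Y = standardize(X0), standardize(M0), standardize(Y0)

# Two regressions implied by Figure 1: M on X (gives a), Y on M and X (gives b, c)
a = np.linalg.lstsq(X[:, None], M, rcond=None)[0][0]
b, c = np.linalg.lstsq(np.c_[M, X], Y, rcond=None)[0]

IE, DE = a * b, c
TE = IE + DE
r_XY = np.corrcoef(X, Y)[0, 1]
print(round(TE, 6), round(r_XY, 6))  # identical: TE = ab + c = r_XY
```

The two printed values agree to machine precision, confirming that for standardized OLS coefficients the total effect equals the correlation between X and Y.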
Coming up with a statistical test for mediation is rather tricky, because it involves not only two
parameter estimates, a and b, but also their product, ab. Baron and Kenny (1986; see also Judd &
Kenny, 1981) proposed a sequence of four “steps” to evaluate mediation. Although these steps are very
widely known and served as the foundation for much later thinking, they should now be regarded as
obsolete. One weakness of this approach is that there is a consensus among methodologists that Baron
and Kenny’s proposed first step—to verify that there is a statistically significant relation between X and
Y—may be ill-advised (MacKinnon & Fairchild, 2009; MacKinnon et al., 2002; Shrout & Bolger, 2002). In
some instances, such as when power is relatively low or there is statistical suppression, the data would
fail this first test, yet would correctly support the hypothesis of mediation using other approaches.
Of the many other statistical tests for mediation that have been proposed, five of them merit specific
attention, as follows. (Practical information about performing each of these tests, together with a
numerical example, is provided in Appendix A.)

Joint significance of the two paths involving the mediator.


According to this simple rule of thumb, we infer support for mediation if two conditions are met: path a is
statistically significant, and path b is also statistically significant. No further calculations are needed for
this method. Even though this test is simply the second and third of Baron and Kenny’s steps, in
methodological evaluations with simulated data it consistently outperforms the full set of four steps (e.g.,
Fritz & MacKinnon, 2007; MacKinnon et al., 2002). In addition, few psychologists seem to be aware that
this simple rule also outperforms the much more widely used Sobel test (MacKinnon et al., 2002); for
example, it is consistently more powerful than the Sobel test (Fritz & MacKinnon, 2007). Thus, the
strengths of this test are that it is the simplest to use and it works very well compared to the alternative
tests. Its main weakness is that it does not yield a confidence interval for the indirect (mediated) effect.

Sobel test and its variants.


The approach taken in the Sobel method and its variants is to perform a statistical test of the product ab.
To do this, we need to know the standard error of the product ab. According to Sobel (1982, 1986), this
standard error can be estimated with the following expression, using unstandardized estimates for a and
b:

SE_ab = √(a²·SE_b² + b²·SE_a²)



Variants of this method involve other, slightly different ways to estimate this standard error. For example,
a formula derived by Goodman (1960) includes a third term (typically quite negligible in size) under the
square root sign:

SE_ab = √(a²·SE_b² + b²·SE_a² − SE_a²·SE_b²)

Once we have an estimate of the standard error, a z-test may be obtained by dividing the value of ab by
it:
z = ab / SE_ab
For p < .05, the criterion z value is the usual 1.96; any z value greater than this would be interpreted as
support for the hypothesis of mediation. (Alternatively, if the analyst prefers, the value of z can be
converted into an exact p-value by using a z table or the corresponding function in a spreadsheet such
as Excel.) In addition, we can obtain a 95% confidence interval for the indirect effect as follows:
ab ± 1.96·SE_ab
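The Sobel calculation is simple enough to do by hand or in a few lines of code. Here is a stdlib-only sketch; the estimates and standard errors are made-up numbers, not values from any dataset in this article.

```python
from math import sqrt, erf

# Hypothetical unstandardized estimates and standard errors
a, se_a = 0.50, 0.10
b, se_b = 0.40, 0.12

# Sobel (1982) standard error of the product ab
se_ab = sqrt(a**2 * se_b**2 + b**2 * se_a**2)

z = (a * b) / se_ab
# Two-tailed p-value from the standard normal CDF (via the error function)
p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
ci = (a * b - 1.96 * se_ab, a * b + 1.96 * se_ab)
print(z, p, ci)
```

Note that the normal-theory z and symmetric interval computed here inherit the shortcoming discussed next: the true sampling distribution of ab is usually skewed.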
This approach appears to be an advance over the joint significance rule in two ways: A test statistic is
actually calculated for the indirect effect, and a confidence interval may be computed. However, it has a
potentially serious shortcoming: The sampling distribution for ab is often not normal (instead, it is
skewed), and this lack of normality makes both the significance test and the confidence interval
somewhat inaccurate. In particular, methodological evaluations with simulated data show that the Sobel
test can have low power: It generates a relatively high rate of Type II errors (MacKinnon et al., 2002).
One way to attempt to improve the Sobel test is to evaluate the statistic according to the actual non-
normal shape of the sampling distribution. In a corresponding fashion, a more correct (asymmetric)
confidence interval can also be computed. In MacKinnon et al. (2002), this approach is called the z' test,
and the empirical critical value for the .05 significance level is proposed to be .97, rather than the 1.96
based on normal theory. Obviously such a difference in the criterion value could be important for many
studies. Unfortunately, this z' test, although more powerful, tended to introduce a new defect: It yielded a
relatively high rate of Type I errors when either path (a or b) is zero in the population and the other path
is not (MacKinnon et al., 2002).
More recent work by MacKinnon and his colleagues has taken a somewhat different approach that
appears to overcome such limitations. Available is a downloadable program called PRODCLIN that
computes a confidence interval for the mediated effect based on the non-normal distribution of the
product (MacKinnon, Fritz, Williams, & Lockwood, 2007). This improved procedure has performed well in
simulation studies (Fritz & MacKinnon, 2007).
In summary, the strengths of the Sobel test are that it is very widely known and produces statistics that
look precise—e.g., one can compute an exact p-value and derive a confidence interval for the mediated
effect, ab. Weaknesses are the Sobel test may have relatively low power (relatively high rate of Type II
errors), and the estimated confidence interval may be somewhat inaccurate. These shortcomings may
be overcome by special software (i.e., PRODCLIN) that takes into account the non-normality of the
product ab.

Resampling with the bootstrap.


Another way to deal with the non-normality of the sampling distribution of the product ab is to use
resampling, also called “the bootstrap”. In this approach, many new samples are created from the data

by random selection of cases with replacement (in each of the samples, any case can be included more
than once). From the estimates of ab computed from each of hundreds or thousands of such new
samples, the characteristics of the sampling distribution of ab can be inferred without assuming
normality. Shrout and Bolger (2002) provide a more detailed explanation of this approach.
Rather than what is called the percentile bootstrap, slightly more accurate results are obtained with the
bias-corrected bootstrap (Shrout & Bolger, 2002; Fritz & MacKinnon, 2007). Although the bootstrap
procedure is very computationally intensive, it is easy to do using some SEM software, particularly Amos
(Arbuckle, 2009). The Amos output provides the upper and lower bounds for a confidence interval of any
level specified by the analyst (e.g., 95%), as well as an exact p-value to test the indirect effect against
zero. The bootstrap can be performed on either the unstandardized or the standardized indirect effect (or
both); typically the obtained p-values for these are very similar, but not exactly equal. (Amos provides
bootstrapped standard errors for the indirect effect as well, but the confidence intervals and associated
p-values are more useful because they entirely avoid assuming a symmetric sampling distribution.)
An alternative to using SEM software to perform a bootstrap analysis of mediation is a stand-alone
program called indirect.sps devised by Preacher and Hayes (2008). This program runs using the SPSS
Statistics Basic Script Editor and has an easy-to-use interface. One advantage of this program is that it
readily handles multiple simultaneous mediators (to be discussed later).
In methodological evaluations with simulated samples, the bias-corrected bootstrap test for mediation
performs extremely well (MacKinnon et al., 2004). For example, among the various mediation tests
investigated by Fritz and MacKinnon (2007), the bias-corrected bootstrap was consistently the most
powerful. Some analysts believe that the bias-corrected bootstrap is the first choice among the currently
available mediation tests (e.g., Shrout & Bolger, 2002). A possible weakness is that it may have a
tendency toward somewhat inflated Type I error rates under some conditions (see MacKinnon et al.,
2004).
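The mechanics of the bias-corrected bootstrap do not require SEM software. The following sketch, using only numpy and the standard library, resamples cases with replacement, recomputes ab each time, and applies the standard bias-correction adjustment to the percentile interval; the simulated data and path values are invented for illustration.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n = 200
X = rng.standard_normal(n)
M = 0.5 * X + rng.standard_normal(n)              # hypothetical path a
Y = 0.4 * M + 0.2 * X + rng.standard_normal(n)    # hypothetical paths b and c

def indirect(X, M, Y):
    """Estimate ab from the two regressions implied by Figure 1."""
    a = np.linalg.lstsq(np.c_[np.ones(len(X)), X], M, rcond=None)[0][1]
    b = np.linalg.lstsq(np.c_[np.ones(len(Y)), M, X], Y, rcond=None)[0][1]
    return a * b

ab_hat = indirect(X, M, Y)
boots = np.empty(2000)
for i in range(2000):
    idx = rng.integers(0, n, n)                   # resample cases with replacement
    boots[i] = indirect(X[idx], M[idx], Y[idx])

# Bias-corrected 95% interval: shift the percentile levels by 2*z0
nd = NormalDist()
z0 = nd.inv_cdf(np.mean(boots < ab_hat))          # bias-correction factor
lo = nd.cdf(2 * z0 + nd.inv_cdf(0.025))
hi = nd.cdf(2 * z0 + nd.inv_cdf(0.975))
ci = np.quantile(boots, [lo, hi])
print(ab_hat, ci)                                 # interval excluding 0 supports mediation
```

Because the interval is read off the empirical distribution of ab, it can be asymmetric, which is exactly what the skewed sampling distribution of a product calls for.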

Bayesian estimation using MCMC simulation techniques.


Yet another way to deal with the non-normality of the sampling distribution of the product ab is to use
Bayesian estimation based on Markov chain Monte Carlo (MCMC) techniques. In a sense, this approach
is somewhat similar to the bootstrap, in that it is extremely computationally intensive, involving the
generation of thousands of simulated analysis samples. The Bayesian approach has now been
implemented in Amos; straightforward instructions for its use are provided in Examples 26-29 of
Arbuckle (2009). In particular, Examples 28 and 29 specifically illustrate the Bayesian estimation of
mediation statistics.
One possible disadvantage of this approach is that it is based on Bayesian statistics, which may be
unfamiliar to many psychologists and involves some new terminology. In Amos, the Bayesian analysis of
an indirect effect yields a marginal posterior density distribution for the product ab. The mean of this
distribution, called the posterior mean, corresponds to the conventional point estimate of the indirect
effect. The standard deviation of the distribution, called the posterior standard deviation, corresponds to
the conventional standard error. The marginal posterior distribution may have any kind of skew or other
non-normality, as readily shown by plotting it. By finding the 2.5 percentile and 97.5 percentile points in
this distribution, the Bayesian 95% credible interval may be obtained, which corresponds to the
conventional confidence interval. Finally, by evaluating the proportion of the marginal posterior
distribution that lies in the tail or tails (depending on whether we want a one- or two-tailed test), we can
obtain a Bayesian type of p-value to test the indirect effect against zero (or any other value).
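The flavor of this approach can be conveyed without Amos. The sketch below is not MCMC proper; it is the closely related Monte Carlo method of simulating the distribution of the product ab from normal approximations for a and b (cf. MacKinnon et al., 2004), and the estimates and standard errors are made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical estimates and standard errors for the two paths
a, se_a = 0.50, 0.10
b, se_b = 0.40, 0.12

# Simulate the distribution of the product ab
draws = rng.normal(a, se_a, 100_000) * rng.normal(b, se_b, 100_000)

mean, sd = draws.mean(), draws.std()        # analogous to posterior mean / SD
ci = np.quantile(draws, [0.025, 0.975])     # analogous to a 95% credible interval
# Analogue of a two-tailed p-value: probability mass beyond zero
p_two_tailed = 2 * min((draws <= 0).mean(), (draws >= 0).mean())
print(mean, sd, ci, p_two_tailed)
```

Plotting `draws` as a histogram makes the skew of the product distribution directly visible, which parallels the posterior-density plot available in Amos.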

Because the Bayesian approach is quite new and has not yet been used much, it is difficult to know how
it compares to the foregoing approaches. The present author’s experience has been that it produces
results that correspond closely to those obtained with the bias-corrected bootstrap. With modest sample
sizes, the p-values seem to be very slightly more conservative for the Bayesian approach compared to
the bootstrap; however, there is some reason to believe that the Bayesian approach may be more
appropriate for relatively small samples (Arbuckle, 2009). In any case, fuller knowledge of the properties
of Bayesian estimation for mediation will depend on more formal methodological evaluations.
Despite its newness and lack of familiarity, the Bayesian approach to testing mediation, as implemented
in Amos, is an attractive option to consider for at least two reasons. First, the option of estimating and
statistically evaluating any user-defined quantity (or estimand) makes the approach extremely flexible.
To illustrate, Example 29 in Arbuckle (2009) shows the estimation and significance testing of the
difference between the direct effect and the indirect effect. It is also possible to evaluate, for example,
multiple mediators across multiple groups, as illustrated later in this article. Second, the plot of the
marginal posterior density distribution available in Amos allows the analyst to directly examine the
posterior probabilities of the effect under consideration and see whatever non-normality is present. The
next section of this article provides an illustration of how such an examination may be informative.

“Partial” versus “full” mediation and the effect ratio.


Although tests of mediation focus mainly on the statistical significance of the product ab, a second major
focus in many published analyses is the direct effect, c, as shown in Figure 1. (To reiterate, c represents
the effect of X on Y holding M constant.) Specifically, if c is not significantly different from zero, many
researchers claim to have demonstrated “full” mediation, implying that M is the sole mediator of the
effect of X on Y. If instead c is significantly different from zero, then such analysts claim to have shown
only “partial” mediation, implying that some of the effect of X on Y is not mediated by M. The extra value
that researchers perceive in demonstrating “full” rather than “partial” mediation is another legacy of the
seminal article by Baron and Kenny (1986).
Unfortunately, such claims of demonstrating “full” mediation are almost invariably dubious because they
involve the fallacy of proving the null hypothesis—that is, asserting that if an effect is not statistically
significantly different from zero, then it must be zero. In a similar vein, Shrout and Bolger (2002) pointed
out that much greater statistical power than is typical for psychological studies would be needed to make
a compelling case for full rather than partial mediation. In short, especially with the modest sample sizes
that psychologists often have to work with, failure to show that c is statistically significant should not be
regarded as a demonstration of full mediation.
By the way, the same type of issue also underlies why one fairly common, informal method for
evaluating mediation is faulty—specifically, showing that when M is held constant, the relation between X
and Y becomes statistically insignificant. Consider, for example, a correlation between X and Y with p =
.04; when M is held constant, the resulting partial regression coefficient (or partial correlation) has p =
.06 (i.e., becomes “insignificant”). It should be intuitively obvious that such a trivial difference
demonstrates nothing; thus, the insignificance of the partial relation is not necessarily informative.
Instead, what is more germane is whether the unpartialled relation (rXY) and partial relation (c) are
significantly different. Normally (e.g., for ordinary least squares statistics), this question is equivalent to
ascertaining whether the product ab is significant. We can show this equivalence readily with some
simple algebra for the standardized coefficients (referring to Figure 1):
r_XY = IE + DE = ab + c
r_XY − c = ab

Returning to the underlying question of how important M is as a mediator, an alternative to the “full”
versus “partial” dichotomy is a continuously graded statistic called the effect ratio. This statistic, for which
Shrout and Bolger (2002) use the symbol P̂_M, is defined as the ratio of the indirect effect to the total
effect:
P̂_M = IE / TE
For one mediator (M in Figure 1),
P̂_M = ab / (ab + c)
In other words, the effect ratio is the proportion of the total effect of X on Y that is mediated by M. (Thus,
“full” mediation would imply an effect ratio of 1.)
The effect ratio is an intuitively appealing index. To illustrate, the sample data in Appendix A yield an
effect ratio of .37, indicating that 37% of the effect of X on Y is mediated by M. This seems much
sounder than interpreting the non-significance of the direct effect (c) as support for “full” mediation.
Figure 2: Marginal Posterior Density Distribution of the Effect Ratio

Nonetheless, the effect ratio has some important limitations. One is that it becomes nonsensical if there
is any suppression, whereupon its estimate can take on values less than 0 or greater than 1, which have
no sensible interpretation as a ratio. In the context of our mediation model, suppression means that the
direct and indirect effects of X on Y are opposite in sign (i.e., ab and c have opposite signs). As an
example, consider X as shyness, M as amount of studying in school, and Y as post-school occupational
success. Shyer people may study harder (rather than partying), and this better studying, in turn, may
mediate greater post-school success. However, their shyness may also have a direct negative effect on
occupational success. Such a model makes conceptual sense, but the effect ratio is meaningless for it.

For example, if the indirect effect (via studying) and the direct effect were of equal magnitude, the
denominator of the effect ratio would be zero.
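A tiny numeric illustration makes the breakdown concrete; the coefficient values are invented.

```python
def effect_ratio(a, b, c):
    """Proportion of the total effect of X on Y mediated by M: ab / (ab + c)."""
    return (a * b) / (a * b + c)

# Same-sign case: ab = .3, c = .2, so 60% of the total effect is mediated
print(round(effect_ratio(0.5, 0.6, 0.2), 3))   # 0.6

# Suppression: ab = .3, c = -.2; the "ratio" exceeds 1 and is uninterpretable
print(round(effect_ratio(0.5, 0.6, -0.2), 3))  # 3.0
```

With ab = .3 and c = −.3, the denominator would be exactly zero and the ratio undefined, which is the limiting case described above.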
Another important limitation of the effect ratio is that it has a surprisingly large amount of sampling error
and an extremely skewed sampling distribution. Therefore, it tends to be only a very rough estimate
unless the sample size is quite large (e.g., greater than 500). To illustrate, Figure 2 shows a graph of the
marginal posterior density distribution of the effect ratio for a small sample (N = 50), as calculated using
the user-defined-estimand option in the Bayesian estimation procedure of Amos (Arbuckle, 2009).
(Appendix A provides relevant details about how this analysis was performed.) Note that the distribution
of the effect ratio is extremely broad and extremely skewed. It is obvious that we actually know rather
little about the effect ratio on the basis of these data: There is a huge range of reasonably probable
values. Indeed, the corresponding 95% credible interval ranges from .04 to .89! Also note that, because
of the skew, the posterior mean in this case is not a particularly good point estimate: 37% is not the most
probable value; instead, a value about 10 percentage points lower is more probable. Most importantly,
these results rather dramatically illustrate the point that with a modest sample size, a point estimate of
the effect ratio may not be particularly informative.

Recommendations and power.


Painstaking work by MacKinnon and his colleagues (MacKinnon et al. 2002, 2004; Fritz & MacKinnon,
2007) has shown that the joint-significance criterion, the PRODCLIN-based procedure, and the bias-
corrected bootstrap consistently outperform (i.e., yielding lower rates of Type I and II errors) the more
widely used four steps of Baron and Kenny (1986) and Sobel test. The Bayesian SEM-based approach,
although as yet much less investigated, appears to yield results fairly comparable to the bias-corrected
bootstrap. An advantage of the PRODCLIN-based procedure, the bias-corrected bootstrap, and the
Bayesian approach over the joint-significance criterion is that they provide good estimates of the
confidence interval for the mediated effect.
Given that the main shortcoming of the Sobel test appears to be low power (MacKinnon et al., 2002), it
may be reasonable to assume that if the Sobel test is statistically significant, then better tests would
likely lead to the same conclusion. A similar argument likely applies to the Baron and Kenny (1986) four
steps (given that the joint-significance criterion is steps 2 and 3 alone). However, neither of these
approaches provides a good estimate of the confidence interval, and, if a confidence interval is not
wanted, it is hard to understand why analysts would prefer them to the much simpler joint-significance
criterion.
Especially for these not-recommended tests of mediation, statistical power may be relatively low in many
psychological studies. Considering that the Baron and Kenny (1986) approach is still the most widely
used test, the following quote from MacKinnon and Fairchild (2009) is memorable: “The Baron and
Kenny causal-steps approach required approximately 21,000 subjects for adequate ability to detect an
effect when the effect sizes of the a and b paths were of small strength and all of the relation of X to Y
was mediated” (p. 17). For the purposes of grant applications and other research proposals, an
invaluable, straightforward resource for estimating statistical power is Fritz and MacKinnon (2007; see
Table 3, p. 237). However, it is well to keep in mind that such power estimates may be somewhat
simplistic, given that they do not take account of the effects of measurement error, confounds, and other
issues discussed in the later parts of this article.
Journal of Experimental Psychopathology, Volume 2 (2011), Issue 2, 210–251

How Hard is It to Pass a Statistical Test of Mediation?


There is a general perception among psychologists that it is a noteworthy achievement to show one’s
data pass a statistical test of mediation. Many recently published articles would probably not have been
regarded as publishable without such a demonstration, and for many articles in some areas of
psychology, a mediation test is the centerpiece of the results.
Unfortunately, it does not seem to be widely understood that it can be remarkably easy to pass tests of
mediation, even if no mediation whatever is actually present. A good way to appreciate this is with a
straightforward demonstration.
The Restraint Scale is a ten-item measure of the tendency toward chronic dieting (Herman & Mack,
1975). Psychometrically, it is similar to thousands of other scales used in psychological research, with
fairly good internal consistency. In addition, it is useful for our purposes because I assume that no
readers would find mediation to be a plausible hypothesis for the relations within subsets of items from
such a scale. Nonetheless, using a real sample of 80 respondents, let's see what happens when we
apply a test of mediation to various subsets of the items. We will apply the highly recommended bias-
corrected bootstrap approach, using Amos.
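The article runs this test in Amos; as a minimal sketch of what the bias-corrected (BC) percentile bootstrap for the indirect effect involves, the following Python fragment implements it with two ordinary-least-squares regressions. The simulated dataset (with genuine mediation, true ab = 0.25) is hypothetical and is only there to exercise the code.

```python
import numpy as np
from statistics import NormalDist

def indirect_effect(x, m, y):
    """ab from two OLS fits: a from M ~ X; b from Y ~ M + X."""
    a = np.polyfit(x, m, 1)[0]
    design = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(design, y, rcond=None)[0][1]
    return a * b

def bc_bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected percentile bootstrap CI for the indirect effect."""
    rng = np.random.default_rng(seed)
    est = indirect_effect(x, m, y)
    n = len(x)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)           # resample cases with replacement
        boots[i] = indirect_effect(x[idx], m[idx], y[idx])
    nd = NormalDist()
    # bias-correction constant: how far the original estimate sits
    # from the median of the bootstrap distribution
    z0 = nd.inv_cdf(np.mean(boots < est))
    zc = nd.inv_cdf(1 - alpha / 2)
    lo = np.quantile(boots, nd.cdf(2 * z0 - zc))
    hi = np.quantile(boots, nd.cdf(2 * z0 + zc))
    return est, (lo, hi)

# Hypothetical data with genuine mediation: M = .5X + e, Y = .5M + e
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.5 * m + rng.normal(size=n)
est, (lo, hi) = bc_bootstrap_ci(x, m, y)
print(round(est, 2), round(lo, 2), round(hi, 2))  # interval excludes zero
```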
To begin, let's treat items #1, #2, and #3 of the scale as X, M, and Y, respectively. The bias-
corrected bootstrap test of mediation yields a two-tailed p-value of .003. The standardized estimates of
paths a and b are .33 and .51, respectively, both significant at the p < .01 level; whereas the
standardized estimate of path c is .07, p = .48. From these results, some analysts would be willing to
declare support for “full” mediation!
How about another, slightly different subset: items #1, #2, and #4 as X, M, and Y, respectively? The test
of mediation for these three measures yields a two-tailed p-value of .002. How about further along in the
scale: items #6, #7, and #8? The test of mediation for these three measures yields a two-tailed p-value
of .002. How about sets of items in other orders, such as items #6, #4, and #2 as X, M, and Y,
respectively? The test of mediation for this set of three yields a two-tailed p-value of .01. Indeed,
although not all other sets of three items pass the statistical test of mediation, a high proportion of them do.
Obviously we could “discover” countless other, similarly bogus mediated relations in a great many other
such datasets. In addition, this insight may be readily generalized beyond scale items. Meehl (1990)
noted that in the social sciences “everything correlates to some extent with everything” (p. 123) for
relatively trivial or uninteresting reasons. Lykken (1968) called this tendency the ambient correlation
noise, and Meehl dubbed it the crud factor. Across several areas of psychology, Meehl roughly
estimated the magnitude of the crud factor at about r = .3. Based on this estimate, the crud factor alone,
together with a sample size of 60 or more, is sufficient to pass a statistical test of mediation at p < .05,
even in the absence of any real mediation.
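This claim can be checked by simulation. The sketch below generates samples of three variables whose only structure is the crud factor (every pairwise correlation is .3, with no causal mediation), applies the joint-significance criterion, and reports how often "mediation" is found. The normal critical value 2.0 is used as an approximation to the t distribution at these degrees of freedom.

```python
import numpy as np

def tstat(design, y, j):
    """t statistic for coefficient j in an OLS fit of y on `design`."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    df = len(y) - design.shape[1]
    sigma2 = resid @ resid / df
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[j] / np.sqrt(cov[j, j])

def crud_mediation_rate(n=60, r=0.3, reps=2000, seed=0):
    """Proportion of samples that pass the joint-significance test of
    mediation when the only structure is the crud factor: every pair of
    variables correlates r, with no causal mediation at all."""
    rng = np.random.default_rng(seed)
    Sigma = np.full((3, 3), r) + (1 - r) * np.eye(3)
    crit = 2.0  # approximate two-tailed .05 critical value
    hits = 0
    for _ in range(reps):
        X, M, Y = rng.multivariate_normal(np.zeros(3), Sigma, n).T
        ones = np.ones(n)
        t_a = tstat(np.column_stack([ones, X]), M, 1)       # X -> M path
        t_b = tstat(np.column_stack([ones, M, X]), Y, 1)    # M -> Y path
        hits += (abs(t_a) > crit) and (abs(t_b) > crit)
    return hits / reps

print(crud_mediation_rate())  # well above the nominal 5% rate
```

The rejection rate is far above what a valid test of a true null (no mediation) should produce, even though no mediation of any kind generated the data.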
One might hope that researchers would be able to disregard many of the numerous possible instances
of bogus mediation findings by discerning that the constructs represented by the three variables do not
really fit the concept of mediation. However, this hope seems dubious. To illustrate, consider the content
of the variables for the first example in the foregoing demonstration:
X: 1. How often are you dieting?
M: 2. What is the maximum amount of weight (in pounds) you have ever lost within one month?
Y: 3. What is your maximum weight gain within a week?
Post hoc at least, it is easy to fashion a mediation scenario to this set of variables: Chronic dieting (X)
tends to lead to instances of great weight loss (M), which in turn leads to a boomerang effect as
physiologically based mechanisms promote subsequent weight gain (Y) (e.g., see Herman & Polivy,
1980). The problem is that it is very hard to tell whether the statistical test of mediation is really
evaluating such a scenario, rather than simply reflecting ambient correlation noise. Also relevant is work
by Gergen (1989) showing that people can readily invent psychological scenarios to explain why virtually
any three variables fit together, no matter what they are.

The Need for Explicit Models


Clearly something is wrong: It is very possible to pass a statistical test of mediation even if there is, in
actuality, no mediation at all. How can this be?
The problem is not in the numbers or the statistical tests themselves; it is in the inference drawn from the
tests: “If significance is found, mediation is considered to be present” (Fritz & MacKinnon, 2007, p. 235).
All psychologists are taught that one cannot infer a causal linkage from a correlation. A test of mediation
involves a double jeopardy: two successive causal links. Yet, with tests of mediation, the assumptions
that would be necessary to support an inference of mediation from a positive test result readily fall into
the background, out of view.
It is, in fact, no sounder to call these statistics “tests of mediation” than it would be to call the significance
test of a correlation a “test of causation”. They are more properly significance tests of the product of two
regression weights. Whether a positive test result means more than this depends crucially on a set of
assumptions that are challenging to evaluate fully, in the same sense that it is often difficult to evaluate
the assumptions that underlie drawing an inference of causation from a statistically significant
correlation.
A statistical test of mediation is always relative to a model, such as the one shown in Figure 1, and
inferences drawn from the test are provisional, in the sense that we need to preface our inference with
the qualifier, “If this model were correct, …” Hence, the underlying model needs to be made explicit.
Even the very simple three-variable model shown in Figure 1 is potentially informative, because it should
encourage analysts and their audiences to consider what is missing from the model – e.g., confounding,
third variables. In addition, we need to be able to rule out, conceptually or empirically, alternative models
that do not involve the mediation hypothesized by the researcher, but could otherwise yield the same
results. Making these alternative models explicit, using a structural-equation-modeling (SEM) framework,
is the theme of the next part of this article.

An SEM-Based Perspective on Mediation


As just explained, it is important to realize that the statistical significance of an indirect effect is, by itself,
a surprisingly lax criterion for demonstrating mediation. In many published studies of mediation, the
authors make little or no attempt to consider models that are plausible alternatives to mediation and
would fit the data just as well. Indeed, researchers may be unaware of the existence of such alternative
models, none of which are evaluated by commonly used statistical tests of mediation. In addition,
because the application of procedures like the Sobel test is so straightforward, requiring just three
measures, researchers may use a research design that is too limited to effectively address these
alternative models.
To illustrate the estimation of different models, we will use data from a published study (Wiebe &
McCabe, 2002). (From the statistics provided in the article, it is possible to recover the covariance matrix
needed for these re-analyses.) Although based on a somewhat modest sample size (N = 55), these
data are reasonably typical of those for a great many published tests of mediation. Figure 3 shows the
mediation model proposed by the researchers, together with the standardized regression coefficients
estimated from the data. Rather than labelling the putative mediator with M, as we did in Figure 1, here it
is denoted by Y1, indicating that it is the first explained (or “endogenous”) variable in the model. We use
this notation because in some alternative models, Y1 will not be a mediator. The putative outcome
variable, on which X exerts a mediated effect through Y1, is denoted by Y2, indicating that it is the
second explained variable in the model.

Figure 3: Y1 as mediator of the effect of X on Y2

Although the original authors tested mediation by using the Sobel test on coefficients estimated with
multiple regression, we may readily verify statistical significance using the bias-corrected bootstrap
procedure of Amos. The resulting p-value for the indirect effect is .027.
Although this result is encouraging, we need to keep in mind that the only alternative models such a test
allows us to reject are those in which one or both of the paths involving the proposed mediator, Y1, are
zero in the population. Other possible alternative models become plausible if one or more of three types
of assumption underlying the mediation model are not met:
1. Measurement: The variables need to have been measured with high reliability and validity.
2. No omitted variables: The model must not have omitted any important confounds—that is, common
causes or third variables.
3. Correct temporal ordering: The variables must be in the correct temporal sequence, with X preceding
Y1, which in turn precedes Y2.
Let’s look at the alternative models that result from difficulties in meeting each of these three types of
assumption.

Alternative Models Having to do with Measurement Issues


Imperfect reliability may affect the results of the mediation analysis shown in Figure 3 in important ways.
The lower the reliability of the measure of Y1, the more the model will tend to underestimate the size of
the mediated effect, ab. Obviously, modest reliability of Y1 could make a statistical test of mediation
nonsignificant, even when the mediation hypothesis is actually true. Modest reliability of the measure of X may also
distort the results, in potentially complex ways (see Dwyer, 1983, pp. 211-219). When X is an
experimentally manipulated variable (e.g., treatment group versus control), reliability is most likely not an
issue; nonetheless, even in such instances the variable X is “measured,” in the sense that the condition
to which each participant was assigned needs to have been recorded correctly.
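The attenuating effect of unreliability can be seen in a small simulation. The data below are hypothetical: a true mediation chain with ab = 0.25, where the mediator is then measured with reliability .60. The attenuation here appears in path b, and hence in the product ab.

```python
import numpy as np

def estimated_ab(x, m, y):
    """Indirect effect from the two usual OLS regressions."""
    a = np.polyfit(x, m, 1)[0]
    b = np.linalg.lstsq(np.column_stack([np.ones_like(x), m, x]),
                        y, rcond=None)[0][1]
    return a * b

rng = np.random.default_rng(0)
n = 50_000                # large n: the gap below is bias, not noise
x = rng.normal(size=n)
m_true = 0.5 * x + rng.normal(scale=np.sqrt(0.75), size=n)  # var(M) = 1
y = 0.5 * m_true + rng.normal(size=n)                       # true ab = 0.25

# Measure the mediator with reliability .60: add error so that
# var(true) / var(observed) = .60.
m_noisy = m_true + rng.normal(scale=np.sqrt(1 / 0.6 - 1), size=n)

ab_clean = estimated_ab(x, m_true, y)
ab_noisy = estimated_ab(x, m_noisy, y)
print(round(ab_clean, 3), round(ab_noisy, 3))  # attenuated well below 0.25
```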
Three interesting alternative models arise from problems with the validity of the measurement of the
variables. First, Panel A of Figure 4 shows a model in which Y1 is an inadvertent measure of the same
outcome variable as Y2. For the purposes of this model, the underlying outcome is represented as a
latent variable (or construct), LY, and there are two measures (or “indicators”) of this latent variable, Y1
and Y2. The error variable d1 (also known as a “disturbance”) represents variance in the underlying
outcome that is not explained by X, and the error variables e1 and e2 represent measurement error in
Y1 and Y2, respectively. Estimation of this model results in the standardized path coefficients shown on
the diagram. These estimates are quite plausible, and indicate, for example, that Y1 and Y2 are about
equally good measures of the outcome (correlating .62 and .56 with it, respectively).
The essential point here is that if the model in Panel A of Figure 4 is the state of affairs that generated
the data, then the mediation model of Figure 3 will show statistically significant mediation even though
there was, in fact, no mediation. In short, another measure of the outcome can readily masquerade as a
“mediator”. In addition, note that this model, like all the other alternative models we will be considering,
fits the data perfectly. Thus, there is no way to show statistically that it is inferior to the mediation model
of Figure 3.
Second, Panel B of Figure 4 shows a model in which Y1 is an inadvertent measure of the same predictor
variable as X. The underlying predictor is represented as a latent variable, LX, and there are two
measures of it, Y1 and X. Error variables e1 and e3 represent measurement error in Y1 and X,
respectively, and e2 represents both unexplained variance and measurement error in the outcome, Y2.
Estimation of this model results in the standardized path coefficients shown on the diagram, which again
are quite plausible.
If this model represents the state of affairs that generated the data, then the mediation model of Figure 3
will likewise show statistically significant mediation even though there was, in fact, no mediation. Thus,
what this model demonstrates is that another measure of the predictor can also masquerade as a
“mediator”.
Finally, a more subtle measurement problem is shown in Panel C of Figure 4. In this model, Y1 is
correlated with the mediating variable, but is not actually the mediator. The latent variable LM represents
the underlying mediator, and measure Y1 serves as an indicator of it. The error variable e1 represents
measurement error in Y1, whereas d1 represents variance in the underlying mediator that is not
explained by X. As before, e2 represents both unexplained variance and measurement error in the
outcome, Y2. Estimation of this model results in the standardized path coefficients shown on the
diagram: The latent variable LM serves as a mediator, as shown by the paths of .49 and .46, and the
measure Y1 correlates .70 with this underlying mediator. Furthermore, the remaining, direct effect of X
(i.e., unmediated by LM) appears to be negligible (.09).
If this model represents the state of affairs that generated the data, then the mediation model of Figure 3
will again show statistically significant mediation. But why is this a problem? A classic illustration was
provided by Myers (1979, pp. 430-431). Imagine that X is two different methods of teaching (e.g., a
dummy or effect-coded variable), Y1 is a measure of each student’s amount of study time, and Y2 is an
outcome measure of how much each student learned. Results like those shown in Figure 3 would
suggest that amount of study time mediates the effect of teaching method on the amount learned. That
is, study time is the “active ingredient”. However, this conclusion could be quite incorrect. Consider LM in
Panel C of Figure 4 as the variable motivation. One teaching method may yield better outcomes than the
other method because it is more motivating, and study time may simply serve as an indicator of such
motivation. If so, then study time may not be the active ingredient at all. More generally, anything that
correlates with the real mediating variable can look like the mediator, even if it is only an indicator (that
is, a side effect) of the mediator.

A) Y1 as an inadvertent measure of the outcome

B) Y1 as an inadvertent measure of the predictor

C) Y1 as an indicator, but not the mediating variable

Figure 4: Alternative Models Having to do with Measurement Issues

In summary, shortcomings in the validity of the measures can lead to a statistical test of mediation
supporting the hypothesis of mediation even when mediation is actually absent. In addition, all three of
the alternative models we have just considered yield perfect fit to the data, and hence they cannot
normally be ruled out statistically. One way around this impasse is to collect multiple measures for at
least some of the variables, as we shall see later.

Alternative Models Having to do with Uncontrolled (Omitted) Common Causes


Once a mediation analysis is depicted as a structural model, as in Figure 3, a question that needs to be
asked is whether it is safe to have left out other, potentially relevant variables. Prime candidates are
uncontrolled common causes.
First, Panel A of Figure 5 shows a model that uses an SEM convention to depict an omitted common
cause or causes of Y1 and Y2. In this model, Y1 does not have an effect on Y2, as indicated by the
absence of a path. Instead, their relation is attributable to the correlation of their error variables, e1 and
e2, representing the effect of a common cause or causes. A simple example would be measuring both
Y1 and Y2 with self-report or some other common method, such that these variables share systematic
biases or other irrelevant variance. This model, like the previous ones, fits the data perfectly and yields
plausible estimates.
Some methodologists have pointed to this type of scenario as the nearly inescapable Achilles’ heel of
mediation analysis (Bullock, Green, & Ha, 2008, 2010). Bullock and his colleagues (2010, p. 551) argued
that the errors e1 and e2 “are almost certain to covary,” either because an unobserved variable affects
both Y1 and Y2, or because Y1 is correlated with another, unobserved mediator. This problem
introduces bias that leads to overestimation of mediation in the model of Figure 3, and hence it may lead
to a finding of “mediation” even when Y1 has no genuine effect on Y2. Moreover, this bias cannot be
overcome through experimental manipulation and random assignment: Random assignment of X does
not address the correlation of e1 with e2, and random assignment of Y1 would necessarily wipe out the
relation of Y1 with X, making mediation analysis pointless.
Panel B of Figure 5 shows a model in which X does not have an effect on Y1, as indicated by the
absence of a path; instead, their relation is depicted as an unanalyzed correlation, representing the
effect of an unspecified common cause or causes. This scenario is a threat whenever X is not a
randomly assigned variable. For example, if treatment is not randomly assigned, background factors that
led a participant to join one type of treatment rather than another may produce the relation between the
treatment variable X and Y1, rather than any genuine effect of treatment. This model, too, fits the data
perfectly and yields plausible estimates.
Finally, Panel C of Figure 5 shows a model in which there are no causal linkages among X, Y1, and Y2
at all. Instead, all three variables are effects of a common underlying influence or factor, F. As before,
this model fits the data perfectly and yields estimates that are highly plausible. This is the type of
common-factor model that is conceptually appropriate for the Restraint Scale items that we considered
earlier. The essential point here is that even if a simple common factor generated the data, the mediation
model of Figure 3 will, nonetheless, show statistically significant “mediation”.
In summary, omitted common causes can readily produce a supportive finding for a statistical test of
mediation even when mediation is actually absent. In addition, once again, all three of the alternative
models we have just considered yield perfect fit to the data and hence cannot normally be ruled out
statistically. Manipulation and random assignment for X can help ensure against the scenarios in Panels
B and C, but it does not address the scenario in Panel A. Partial solutions to the problem in Panel A may
require good guesses about the nature of the omitted common causes, as we shall explore later.

A) Relation of Y1 to Y2 due to omitted common cause

B) Relation of X to Y1 due to omitted common cause

C) Common factor model

Figure 5: Alternative Models Having to do with Uncontrolled Common Causes

Alternative Models Having to do with Temporal Ordering


A third set of possible alternatives consists of models that involve mediation, but not the type of
mediation proposed by the researchers (as shown in Figure 3). These alternative models involve other
temporal orders for the three variables.
A crucial assumption of the mediation model of Figure 3 is that all paths in the reverse direction (e.g.,
from Y1 to X, from Y2 to Y1, and from Y2 to X) are zero—in other words, that they can safely be omitted
from the model. Hence, violations of this assumption are sometimes referred to as the problem of
“reverse causation”.
Panel A of Figure 6 shows a model in which Y1 is the effect of Y2, rather than its cause. Panel B shows
a model in which Y1 is the cause of X, rather than its effect. Finally, Panel C shows a model with causal
ordering that is the complete reverse of the model in Figure 3: unlike the models in Panels A and B, Y1 is
indeed a mediator, but it mediates the effect of Y2 on X, not vice versa.

A) Y1 as the effect of Y2 (rather than its cause)

B) Y1 as the cause of X (rather than its effect)

C) Reverse causal ordering

Figure 6: Alternative Models Having to do with Temporal Ordering

All three of these models involving temporal orders that are alternatives to the one specified by the
researchers fit the data perfectly and yield highly plausible standardized path coefficients for our
illustrative data, as can be seen from the values in Figure 6. The essential point here is that even if some
other temporal ordering, which reverses the direction of one or more of the researchers’ hypotheses,
actually generated the data, the incorrect mediation model of Figure 3 will nonetheless pass a statistical
test of mediation.
What considerations allow us to shuffle around the temporal order of the variables like this? If the
variables have been measured cross-sectionally (i.e., at the same time), then the order of the variables
proposed in the model represents a set of assumptions that are typically untestable. Alternative temporal
orders may be just as plausible and fit the data equally well. Even if the variables have actually been
measured in the temporal order that corresponds with the researcher’s favored model, this does not
necessarily ensure that the temporal sequence proposed in the model is correct. For example, if
attitudes were measured at a time well before the time of measuring sex, this would not ensure that it is
attitudes that have an effect on sex, rather than vice versa.
Sometimes researchers try to turn their uncertainty about the correct temporal order into an empirical
question to be answered with the data. That is, they try different orders and see which resulting set of
path estimates seems most plausible; or they omit the direct effects from the models (e.g., remove path
c from the model in Figure 3), in which case the models no longer necessarily fit perfectly and thus may
possibly fit better or worse than one another. These are faulty strategies, because they involve mutually
contradictory assumptions. For example, consider the comparison between the model in Figure 3 and
the model in Panel A of Figure 6, which pose alternative directions of effect for Y1 and Y2. In the model
of Figure 3, correctly estimating the effect of Y1 on Y2 requires the assumption that Y2 has no effect on
Y1. In the model in Panel A of Figure 6, correctly estimating the effect of Y2 on Y1 requires the
assumption that Y1 has no effect on Y2. If the researcher is so uncertain about the direction of effects, a
more plausible model is one with reciprocal causation, which includes the paths in both directions.
Unfortunately, a model with reciprocal causation can only be estimated under special circumstances,
beyond the three-variable case we are considering here (e.g., with longitudinal data in which variables
are remeasured at multiple times—see Dwyer, 1983—or with non-hierarchical models involving
instrumental variables—e.g., see Sadler & Woody, 2003). In short, researchers who are quite uncertain
about the correct temporal ordering of their variables should not be evaluating mediation.
Even if the temporal order of a model is correctly specified, the only data available for estimating it may
be cross-sectional (measured at the same time). In such cases, the measured levels of at least some of
the variables must serve as proxies for their levels at earlier times, when the proposed causal processes
were actually occurring. Unfortunately, as Maxwell and Cole (2007) showed, such data can yield
substantially biased estimates for the model.

Implications of Alternative Models


The alternative models we have just considered dramatically illustrate the assumptions necessary for
estimated parameters to be interpreted as a mediated causal chain. In essence, we have portrayed each
of various potential threats to internal validity as a structural diagram, thus using SEM as the “medium of
methodological discourse” (Dwyer, 1983, p. viii).
The basic principle that these models demonstrate is that the evidence for any proposed model, such as
the mediation model of Figure 3, can only become convincing to the extent that major alternative models
can be ruled out. Researchers need to be aware of what these major alternative models are, and they
need to consider which of them can be argued against, on the basis of prior knowledge, the research
design used, or empirical analysis of the data.
Arguments on the basis of prior knowledge include the following rationales:
1. Alternative models having to do with measurement issues are less plausible if it has already been
established in previous work that the measures used have excellent reliability and validity (including
discriminant validity).
2. Alternative models having to do with uncontrolled common causes may be less plausible if the
research is conducted in a setting known to eliminate, or at least substantially reduce, the impact of
important confounds (unmeasured third variables).
3. Alternative models having to do with temporal ordering are less plausible if prior work, or else careful
conceptual analysis, has shown that reverse causation is unlikely.

Consideration of major alternative models, best done before collecting the data, may reveal that some of
them cannot be ruled out adequately unless the research design is improved. Mediation tests and other
SEM-based statistics can be applied to an extremely broad spectrum of research designs—including
true experiments with manipulated, randomly assigned variables; quasi-experiments such as
nonequivalent control group designs; longitudinal designs based on passive observation; and cross-
sectional designs. Because the calculation of the estimates and statistical tests is the same for all of
these, it is easy to forget that the differences between research designs, in terms of plausibly ruling out
alternative models, can be huge. Possible design upgrades include the following:
1. Rather than passive observation, use randomly assigned manipulation of one or more of the
variables. This very important possibility is discussed in more detail in a later section. To illustrate,
random assignment for X plausibly rules out the alternative models shown in Panels B and C of
Figure 5.
2. Rather than a cross-sectional design, collect the data longitudinally—that is, at multiple, theoretically
appropriate times of measurement. This helps to rule out the alternative models shown in Figure 6.
However, determining the appropriate times of measurement can be difficult, because many theories
in psychology are not explicit about the time lags over which the posited causal processes take
place. Unfortunately, measuring a variable (or wave of variables) too early or too late can
substantially bias the estimates of the model (Kessler, 1987).
A final approach to dealing with major alternative models is to try to disconfirm them with the empirical
data. However, all of the three-variable models we have been considering—the favored mediation model
and the nine alternative models—fit the data equally well. A set of models that fit data equally well are
termed equivalent models (Hershberger, 1994; Lee & Hershberger, 1990; MacCallum, Wegener,
Uchino, & Fabrigar, 1993). It is important to understand that this equivalence is a property of the models
themselves, not any particular dataset; the fact that these are equivalent models applies to all possible
datasets. Because they are equivalent models, the possibilities for empirically disconfirming any of these
alternative models are very limited. All we can do for each alternative model is to inspect the parameter
estimates to see if they are implausible on either of two grounds. First, a parameter estimate may be
inadmissible—that is, outside of the range of sensible values. For example, an error variance (e.g., for
e1 or d1) may be negative. Second, a parameter estimate may be clearly wrong, based on prior
knowledge. An example might be a negative path coefficient for the effect of intelligence on
achievement. Implausible estimates suggest that the alternative model in which they occur may not be
viable. (The present illustrative dataset, as we saw, happened to yield no such implausibility for any of
the nine alternative models.)
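The perfect fit of the temporally reordered models can be verified directly: a saturated recursive path model implies exactly the same covariance matrix regardless of the causal order imposed on the three variables. The sketch below uses an arbitrary illustrative covariance matrix (not the Wiebe and McCabe data) and checks that the hypothesized mediation model and its complete reverse both reproduce it exactly.

```python
import numpy as np

def implied_cov(S, order):
    """Fit the saturated path model v1 -> v2 -> v3 (with the direct
    v1 -> v3 path) to covariance matrix S, taking the variables in the
    causal order given, and return the model-implied covariance matrix."""
    p1, p2, p3 = order
    a = S[p1, p2] / S[p1, p1]                         # v1 -> v2
    A = S[np.ix_([p1, p2], [p1, p2])]
    c, b = np.linalg.solve(A, S[np.ix_([p1, p2], [p3])]).ravel()  # v1, v2 -> v3
    d2 = S[p2, p2] - a**2 * S[p1, p1]                 # disturbance variances
    d3 = S[p3, p3] - np.array([c, b]) @ A @ np.array([c, b])
    # reassemble the covariance matrix the fitted model implies
    out = np.empty((3, 3))
    out[p1, p1] = S[p1, p1]
    out[p1, p2] = out[p2, p1] = a * S[p1, p1]
    out[p2, p2] = a**2 * S[p1, p1] + d2
    out[p1, p3] = out[p3, p1] = c * S[p1, p1] + b * out[p1, p2]
    out[p2, p3] = out[p3, p2] = c * out[p1, p2] + b * out[p2, p2]
    out[p3, p3] = (c**2 * S[p1, p1] + b**2 * out[p2, p2]
                   + 2 * b * c * out[p1, p2] + d3)
    return out

# Any positive-definite covariance matrix for (X, Y1, Y2) will do.
S = np.array([[1.0, 0.4, 0.3],
              [0.4, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
print(np.allclose(implied_cov(S, (0, 1, 2)), S))  # X -> Y1 -> Y2: True
print(np.allclose(implied_cov(S, (2, 1, 0)), S))  # Y2 -> Y1 -> X: True
```

Because both models reproduce S exactly, no fit statistic can distinguish them, for this or any other dataset.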
To entertain the possibility of rejecting some of the alternative explanations on the basis of poor fit, we
would need to enhance the design of the research, typically by adding particular kinds of additional
measures. The following are four major possibilities:

Multiple indicators.
Figure 7 shows a model in which the mediator and outcome are represented by latent variables LM and
LY, respectively. Each of these constructs is tapped by multiple measures, called indicators—M1, M2,
and M3 for the mediator; and Y1, Y2, and Y3 for the outcome. The error variables e1 through e6
represent the measurement errors for each indicator (for the moment, ignore the arcs between some
pairs of them), and d1 and d2 represent unexplained variance in the two constructs. A further possibility
for some research, particularly in which X is not manipulated (and therefore presumably measured
perfectly), would be multiple indicators for it as well.

Figure 7: Mediation Model with Multiple Indicators and Correlated Errors

This more sophisticated type of model can only be estimated with SEM software (such as Amos). In
addition, estimating it well requires a larger sample size than models that have only measured
variables—say, at least 100 to 200 cases. However, it has several potential advantages. One is that the
mediated effect, represented by coefficients a and b, is couched in terms of error-free latent variables;
thus, these values are corrected for imperfect reliability in the indicators and should be more accurate.
Even though the product ab is the combination of two latent relations, it can be tested readily using SEM
software. For example, in Amos, confidence intervals and significance tests for the product can be
performed using either the bias-corrected bootstrap or Bayesian estimation. Another important
advantage is that with multiple indicators, the model becomes testable, and its fit helps to evaluate the
measurement assumptions that underlie the model. For example, it will fit well only if it is the theoretically
relevant content that matters, as represented by the latent variables, rather than the specific content of
particular indicators.
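To see why imperfect reliability matters for the a and b paths, the attenuation effect can be illustrated with a small simulation. This is only a sketch in Python/NumPy (true latent-variable estimation requires SEM software such as Amos), with arbitrary coefficient and error-variance values: a slope estimated from a single fallible indicator is attenuated toward zero, whereas a composite of three indicators, being more reliable, is attenuated less.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# True latent mediator LM affects latent outcome LY with slope 0.5
LM = rng.normal(size=n)
b_true = 0.5
LY = b_true * LM + rng.normal(scale=0.8, size=n)

# Three fallible indicators of LM, each with its own measurement error
indicators = LM[:, None] + rng.normal(scale=1.0, size=(n, 3))

# The outcome is also measured with error
Y = LY + rng.normal(scale=0.5, size=n)

def slope(x, y):
    """OLS slope of y on x."""
    x = x - x.mean()
    return float(x @ (y - y.mean()) / (x @ x))

b_single = slope(indicators[:, 0], Y)            # one indicator: strongly attenuated
b_composite = slope(indicators.mean(axis=1), Y)  # composite of three: less attenuated

print(b_single, b_composite, b_true)
```

With these values, the single-indicator slope is pulled roughly halfway toward zero, while the composite recovers substantially more of the true latent slope; a latent-variable model would, in principle, correct the estimate fully.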

Correlated errors.
As mentioned earlier, a major potential shortcoming of the mediation model is the possibility of correlated
errors between the measure of the mediator and the measure of the outcome, as shown in Panel A of
Figure 5. Having multiple indicators allows us to model some types of correlated errors, as shown by the
arcs in Figure 7. For example, imagine that M1 and Y1 are both self-report measures and hence may
share common biases. The inclusion of the arc between the errors of these indicators, e1 and e4,
models this irrelevant common variance, and as a result, the estimate of the latent relation b is corrected
for such systematic errors. Similarly, indicators M2 and Y2 could share some other common method—for
example, they might both be based on observer ratings—and this irrelevant common variance may be
modeled by allowing e2 and e5 to correlate. Note that we cannot just draw in correlated errors
everywhere; instead, there needs to be a specific, credible rationale for each one.

Measuring and controlling for third variables.


Another potentially important strategy for dealing with confounds is to measure them as additional
variables, build these variables into the model, and thereby control for them statistically. As a simple

illustration, Figure 8 shows one measured confound, C1, being controlled for with regard to the relation
of X to M, and another measured confound, C2, being controlled for with regard to the relation of M to Y.

Figure 8: Mediation Model Controlling for Third Variables C1 and C2

The principal shortcoming of such statistical control is that it is useful only for ruling out specific, known, and measurable confounds, rather than an entire class of alternative models (as shown, for example, in
Panels A and B of Figure 5). It is important to realize that even when such statistical controls are
included in an analysis, the results remain somewhat provisional and potentially open to criticism,
because it is unlikely that all relevant third variables are known and measured (Dwyer, 1983; Kessler,
1987).
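The logic of statistical control can be demonstrated with simulated data. The following is an illustrative Python/NumPy sketch (coefficient values are arbitrary, not drawn from any real study): when a common cause of M and Y, here playing the role of C2 in Figure 8, is omitted from the regression, the estimate of the M-to-Y path is biased; including it as a control recovers the true value.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# C2 is a common cause of both M and Y (a confound of the b path)
C2 = rng.normal(size=n)
X = rng.binomial(1, 0.5, size=n).astype(float)  # treatment indicator
M = 0.6 * X + 0.8 * C2 + rng.normal(size=n)
b_true = 0.3
Y = b_true * M + 0.4 * X + 0.8 * C2 + rng.normal(size=n)

def ols(design, y):
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

ones = np.ones(n)
# Omitting C2 inflates the estimated M-to-Y coefficient ...
b_naive = ols(np.column_stack([ones, M, X]), Y)[1]
# ... while controlling for C2 recovers the true value
b_controlled = ols(np.column_stack([ones, M, X, C2]), Y)[1]

print(b_naive, b_controlled)
```

As the surrounding text cautions, this remedy works only for confounds one has thought to measure; any unmeasured C2 leaves the naive bias in place.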

Measuring and controlling for earlier levels of the same variables as in the model.
A somewhat related strategy is to measure earlier levels of the mediator and outcome variables and use
these earlier levels as control variables. To illustrate, in Figure 8 imagine that C1 is the earlier level of M
(e.g., baseline, measured prior to X), and C2 is the earlier level of Y. This is essentially a simple type of
autoregressive model (Bollen & Curran, 2006), in which measures are used to predict their later values.
The rationale for using the earlier values as statistical controls is that they serve as a proxy for
unmeasured and possibly unknown third variables; hence, they may help establish that it is change in M
and change in Y that are being modeled, rather than the distorting effects of relations of these variables
with background factors. Although some methodologists are guardedly optimistic about how well this
strategy may work (e.g., Shrout & Bolger, 2010), others have been somewhat pessimistic (e.g., Kessler,
1987). In addition, the once considerable enthusiasm about autoregressive models in SEM (Cole &
Maxwell, 2003; Dwyer, 1983) has dissipated somewhat because of unresolved controversy about how to
model change (e.g., McArdle & Hamagami, 2001) and increasing interest in the alternative of latent
growth curve models (e.g., Duncan, Duncan, & Strycker, 2006). In such latent curve models, the
outcome variable is tracked across time and its slope or rate of change becomes the Y variable.
Although the concept of mediation can certainly be extended to such models, further consideration of
them lies beyond the practical limits of this article. An excellent resource for further information is Bollen
& Curran (2006). In a related vein, mediation analysis has also been extended to multilevel SEM
(Preacher, Zyphur, & Zhang, 2010).

Recommendations
Let’s close this SEM-based consideration of mediation with the following summary recommendations:
1. It is always a good idea to draw up one’s mediation hypotheses as a structural diagram and identify
major alternative models that may compete with it. Ideally, this exercise should precede the
finalization of the research design and data collection.

2. For those alternative models that one cannot plausibly rule out on the basis of prior knowledge,
consider whether an enhanced research design would substantially improve matters.
3. The most appropriate model may well require additional variables beyond the basic threesome
underlying the logic of mediation. SEM provides a very flexible method for estimating these more
elaborate models. (See Arbuckle, 2009, and Kline, 2011, for excellent general resources on SEM.)

Heterogeneity and Interactions in the Evaluation of Mediation


Unfortunately, it is possible that a mediated effect can be heterogeneous across different subsets of the
participants. Some critical reconsiderations of mediation have focused on the possibility of such
heterogeneity (Bullock et al., 2010; Kraemer, Kiernan, Essex, & Kupfer, 2008). In addition, the presence
of such heterogeneity implies a statistical interaction. There has been considerable interest in how to
incorporate interactions into mediation models (Muller, Judd, & Yzerbyt, 2005; Muller, Yzerbyt, & Judd,
2010). The present section briefly reviews these lines of thought.
However, first it is important to understand clearly how the assumption of homogeneity of partial slopes
contributes to the usual interpretation of a mediation analysis. It will help to focus our attention
productively if we consider scenarios in which X (the predictor variable) consists of two levels, a
treatment group and a control group, and this manipulation produces a substantial effect on both M (the
mediating variable) and Y (the outcome variable). Such a circumstance is shown in Panel A of Figure 9,
in which the two groups clearly differ in their mean levels of both M and Y.
Is this configuration sufficient to support an inference of mediation? The answer is no. Indeed, to answer
the question of whether there is mediation, we need to take into account the within-group slopes relating
M to Y.
To see this, look at Panel B, in which the homogeneous within-group slopes are flat. In this scenario, all
of the between-groups difference on Y is attributable to the direct effect of X; there is no mediation by M.
Using the conventions of the earlier figures (e.g., Figure 1), we designate the partial (within-groups)
slope relating M to Y as b, and the direct effect of X on Y, which holds M constant, as c. In this scenario,
even when we control for M, the effect of X on Y is the full difference in means between the groups
(represented by c).

Figure 9: Within-Group Slopes Relating M to Y under the Assumption of Homogeneity. Panel A shows the group means for M and Y for the treatment and control groups. Panel B shows within-group slopes of zero (b = 0): no mediation, with the direct effect c equal to the full between-groups mean difference. Panel C shows moderate within-group slopes (b = +): partial mediation. Panel D shows strong within-group slopes (b = ++) with c ≈ 0: full mediation.

In contrast, in Panel C, the homogeneous within-group slopes are moderately positive. As shown along
the Y-axis, this positive slope reduces the value of c, the direct effect, compared to the foregoing
scenario. Thus, this scenario represents partial mediation: Some of the effect of X on Y is mediated by
M, but the rest of it is attributable to the direct effect of X on Y.
Finally, in Panel D, the homogeneous within-group slopes are more strongly positive, and as a result the
value of c approaches zero. This scenario represents full mediation: Once we control for the within-group
effect of M on Y, there is no remaining, direct effect of X on Y.
In summary, the partial (within-group) slope relating M to Y is crucial for the inference of mediation. In
addition, this is a good juncture at which to explain why it is the partial slope we are concerned about
(i.e., controlling for X), rather than the unpartialled relation between M and Y. A plausible alternative
hypothesis to mediation is what may be termed a concomitant-effects model. In this model, X has
concomitant (i.e., co-occurring) effects on M and Y, but M has no role in transmitting any of the effect of
X on Y. A concomitant-effects model is highly plausible in many clinical applications; for example,
psychotherapy has many concomitant effects, at least some of which probably have nothing to do with
producing the hoped-for therapeutic outcome.
However, concomitant effects will tend to be associated solely because they share the treatment as a
common cause. That is, even if M has no effect on Y, there will be an association between M and Y,
because X is their common cause. Hence the unpartialled association between M and Y cannot be used
to evaluate mediation, even though this is sometimes advocated (e.g., James & Brett, 1984). Panel B of
Figure 9 shows the concomitant-effects model: M and Y will be correlated because they share X as a
cause, but M has no role in affecting Y, as shown by the partial slope of zero.
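This point is easy to verify with a toy simulation (a Python/NumPy sketch with arbitrary effect sizes): even when M has no effect on Y by construction (b = 0), the unpartialled correlation between M and Y is clearly nonzero because X is their common cause, whereas the partial slope controlling for X is near zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000

# Concomitant-effects model: X affects both M and Y, but M has NO effect on Y
X = rng.binomial(1, 0.5, size=n).astype(float)
M = 1.0 * X + rng.normal(size=n)
Y = 1.0 * X + rng.normal(size=n)

# Unpartialled association: M and Y correlate via their common cause X
r_MY = np.corrcoef(M, Y)[0, 1]

# Partial (within-group) slope of Y on M, controlling for X, is near zero
design = np.column_stack([np.ones(n), M, X])
b_partial = np.linalg.lstsq(design, Y, rcond=None)[0][1]

print(r_MY, b_partial)
```

Only the partial slope correctly diagnoses the absence of mediation here, which is why the unpartialled M-Y association is an unsafe basis for inference.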

Heterogeneity of Partial Slopes


What happens if the within-group slopes are heterogeneous? Figure 10 shows some possible scenarios.
In Panel A, there is a within-groups relation of M to Y in the control group, but it disappears in the
treatment group. In Panel B, conversely, there is a within-groups relation in the treatment group, but it
does not show up in the control group. Finally, in Panel C the within-group slopes even have opposite
signs. In none of these scenarios is it immediately clear how, or even if, the concept of mediation can be
applied.

Figure 10: Heterogeneous Within-Group Slopes Relating M to Y. Panel A shows a stronger within-group slope for the control group than for the treatment group; Panel B shows a stronger within-group slope for the treatment group than for the control group; and Panel C shows within-group slopes of opposite sign for the two groups.

To capture any such configuration of heterogeneous slopes, we would need to include the product term, X∙M, in the regression equation explaining Y. That is, rather than

Y = bM + cX + error,

we would have the following equation that allows X and M to interact statistically:

Y = bM + cX + d(X∙M) + error.

Capturing heterogeneity in this way is potentially so important that some clinical methodologists have argued that the X∙M product term (that is, the treatment-by-mediator interaction) should always be included in mediation analyses (Kraemer et al., 2008).

However, the presence of an interaction is classically linked with the concept of moderation, rather than mediation. Thus, heterogeneous slopes confront us with the question of how these two terms, mediation and moderation, should be applied to such scenarios.

There seem to be alternative answers to this question. One possible answer is simply to acknowledge that a variable can simultaneously be both a mediator (e.g., M's role in the bM term) and a moderator (e.g., M's role in the d(X∙M) term). This is the approach advocated by Muller and colleagues (2010), who argue that researchers are now generally so familiar with the concepts of mediation and moderation that applying both at the same time need not generate any undue confusion.
In contrast, a second position is that the resulting terminological confusion is a serious problem, and
therefore the concepts of mediation and moderation need to be redefined to overcome it. This is the
position taken by Kraemer and colleagues (2008), who propose new definitions of these terms such that
it is never possible for the same variable to be simultaneously both a mediator and a moderator. In the
“MacArthur approach” that they advocate, the X∙M product term (the treatment-by-mediator interaction)
is always included in the model, and “mediation” is established by the presence of either a main effect of
M (represented by coefficient b in the foregoing equation) or an interaction between X and M
(represented by coefficient d in the equation). In other words, the interaction is counted as “mediation”.
To distinguish the term “moderation” from this kind of interaction effect, they stipulate that for M to be a
“moderator”, it must have temporal precedence over X; whereas for M to be a “mediator”, the reverse
must be true—X must have temporal precedence over M.
In addition, in this approach the term “mediator” is purely descriptive and does not imply any underlying
causal model or other causal claims:
In the MacArthur approach, the criteria used to define moderators and mediators do not ... assume
any necessarily causal role for moderators or mediators once identified. Instead, the criteria for
establishing whether variables are moderators or mediators are based solely on temporal
precedence, association, and the nature of the joint association. (Kraemer et al., 2008, p. S105)
In other words, from this viewpoint, on the basis of results from some dataset it could be correct to call a
variable a mediator even if one already knew for certain that it did not function as an intermediary or
underlying mechanism! This position is starkly inconsistent with what many other methodologists mean
by the term “mediator”. For example, Bullock and colleagues (2010) consider the claim that a variable is
a “mediator” to be a very strong, theoretically anchored assertion about underlying mechanism that can
only be established through a program of multiple studies using diverse designs.
The present author’s position is that heterogeneous scenarios like those shown in Figure 10 stretch the
terms mediation and moderation to the breaking point—perhaps beyond their useful range of application.
These words are both simply terms of convenience and might better be avoided when they become
inconvenient, let alone counterproductive. No researcher has much to gain by fretting about whether M
in the scenarios in Figure 10 should be called a mediator and a moderator, only a mediator, or only a
moderator. Moreover, if asked, all researchers ought to be able to explain their ideas clearly without
using the terms mediation, moderation, mediator, and moderator at all.
What is more productive in heterogeneous scenarios is to focus on the best possible account of the
functional form of the relations involved and to develop ideas about what underlying processes could

give rise to that form of relation. For example, in Panel A of Figure 10, it would appear that both the
treatment (X) and the mediator (M) can increase the outcome (Y), but their effects do not sum. Instead,
each is functioning as if it is sufficient but not necessary to increase the outcome—that is, X and M
appear to be able to substitute for each other. Thus, the treatment may be useful only for those for whom
the mediator alone would not be effective.
As another example, in Panel B the effects of the treatment and the mediator again do not sum. In this
scenario, higher levels of the mediator are associated with higher outcomes, but only in the treatment
group. Thus, the treatment may be functioning as a necessary but not sufficient condition for increases in
M to lead to increases in Y. That is, being in treatment may enable other processes. In any particular
study, the researcher’s knowledge of the specific variables involved may suggest plausible, interesting
reasons why such non-additive processes would be taking place.
Another possible interpretation of such scenarios is that they may reflect underlying nonlinearity in the
relation of M to Y. To illustrate, look again at Panel A of Figure 10 and picture an underlying curvilinear
function relating M to Y, in which the function at first increases sharply across relatively low values of M,
but then decelerates in slope and approaches an upper asymptote across relatively high levels of M.
This underlying functional shape would explain why differences in relatively low levels of M are
associated with differences in Y, but differences in relatively high levels of M are not. In effect, the
treatment, by moving participants to the right, may be pushing them into the range on M at which there is
little or no further effect on Y.
As another illustration, look again at Panel B and picture an underlying curvilinear function that is
relatively flat across low values of M, but then accelerates in slope and becomes quite positive across
relatively high levels of M. In effect, the treatment may be pushing participants into the range on M at
which it starts to have effect on Y. Finally, the scenario in Panel C could reflect an underlying
curvilinearity of the inverted-U sort. For example, given an inverted-U relation between arousal and
performance, treatment may push participants into a range in which greater arousal, rather than aiding
performance, detracts from it. Obviously such nonlinear interpretations would need to be based on an
extensive understanding of the specific variables in any particular study. Their representation in a data
analytic model would require a curvilinear function, such as the quadratic:
Y = bM + cX + dM² + error.
See Hayes and Preacher (2010) for a further discussion of such analytic possibilities.
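A small simulation can make this nonlinear interpretation concrete. In this Python/NumPy sketch (all parameter values are arbitrary), Y depends on M through a concave (decelerating) function, and the treatment shifts M upward. The result is exactly the Panel-A pattern: a steeper within-group linear slope for controls (low M) than for treated participants (high M), while a quadratic term recovers the underlying curvature directly.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000

# Treatment shifts M upward; Y depends on M through a concave quadratic
X = rng.binomial(1, 0.5, size=n).astype(float)
M = 1.5 * X + rng.normal(size=n)
Y = 1.0 * M - 0.25 * M**2 + rng.normal(scale=0.5, size=n)

def slope(x, y):
    """OLS slope of y on x."""
    x = x - x.mean()
    return float(x @ (y - y.mean()) / (x @ x))

# Within-group linear slopes look heterogeneous (the Panel-A pattern):
# steeper for controls (low M) than for treated participants (high M)
s_control = slope(M[X == 0], Y[X == 0])
s_treat = slope(M[X == 1], Y[X == 1])

# A quadratic term in the model captures the curvilinearity directly
design = np.column_stack([np.ones(n), M, X, M**2])
d_hat = np.linalg.lstsq(design, Y, rcond=None)[0][3]

print(s_control, s_treat, d_hat)
```

Here the apparent "moderation" of the M-to-Y slope by treatment group is entirely a by-product of the curvilinear functional form, illustrating why the two accounts can be hard to distinguish in practice.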
The division of research participants into treatment and control groups, which has provided the basis for
the foregoing discussion, is only one possible source of heterogeneous effects. As Bullock and
colleagues (2010) point out, there are many other possibilities for subsets of participants who might
show heterogeneous effects not only of M on Y, but also of X on M. One particularly disturbing possibility
is that the participants for whom X has an effect on M are not the same ones as those for whom M
has an effect on Y. In such a scenario, the logic of mediation analysis, based on examining the product
ab which combines these two component effects, clearly breaks down.
To attempt to avoid such problems of heterogeneity, Bullock and colleagues (2010, p. 555) make the
following recommendation to researchers: “Identify relatively homogeneous subgroups and make
inferences about indirect effects for each subgroup rather than a single inference about an average
indirect effect for an entire sample”. Unfortunately, using such subgroup-based analysis productively
may require larger samples than many psychologists work with. The same reservation likely applies to
the examination of non-additive and nonlinear effects. Nonetheless, even when the sample size does not

allow for a more formal evaluation of these possibilities concerning heterogeneity, it is important for
researchers to keep them in mind when interpreting and discussing their mediation analyses.

Moderated Mediation and Mediated Moderation


The compound concepts of moderated mediation and mediated moderation have attracted the interest of
many researchers, particularly since the publication of Muller et al. (2005), which explained them clearly,
gave substantive examples, and provided a data-analytic framework. We can only review this topic
briefly here, and we refer readers to that article and others (Edwards & Lambert, 2007; Preacher,
Rucker, & Hayes, 2007) for further information.
As mentioned earlier, in moderated mediation, the level of a moderator variable V affects how strongly (or possibly in what direction) a mediating variable M links X and Y. Conducting such an analysis in a graphically oriented SEM program like Amos requires representing the product terms as separate variables, as shown in Figure 11. To the basic mediation model involving X, M, and Y, we have added a fourth variable, V, as well as its products with X and M, namely, V∙X and V∙M. These three new predictor variables are allowed to covary with X and with one another, as indicated by the arcs at the left. Note that the path coefficient from V∙X to M is labeled e, the path coefficient from V∙M to Y is labeled d, and the path coefficient from V∙X to Y is labeled f.

Figure 11: Mediation Model with a Fourth Variable, V, and its Interaction Terms Added

This model allows both the effect of X on M and the effect of M on Y to vary as a function of the level of
V. According to this model, the effect of X on M is (a + eV), and the effect of M on Y is (b + dV). Because
the indirect effect is the product of these two effects (i.e., X on M, and M on Y), it is evident that the level
of V can affect the size (or even direction) of the mediation:
(a + eV)(b + dV) = ab + adV + beV + deV².
In an SEM that includes product terms representing interactions, like the one shown in Figure 11, there
are three important things to keep in mind. First, although not strictly necessary, it is good practice to
mean-center the predictors (i.e., X, V, and M). Doing so makes the “main effects” for the predictors more
interpretable. For example, with centered variables, the product ab can be interpreted as the indirect
effect when V is at its mean (that is, a typical or average indirect effect). Second, all main effects (or

lower-order terms) that go into product terms must be included. For example, it would be a mistake to
omit V from the model, even if the analyst is not interested in its main effect. Third, in models with
product terms, only the unstandardized path coefficients are useful. The “standardized estimates”
provided in the output for such models are problematic because standardization of the product variables
is not handled correctly.
Even though Figure 11 sets up the correct SEM analysis, it does not provide a particularly intelligible way
to convey the results. Instead, better communication with an audience comes from providing two
mediation path diagrams (i.e., showing only X, M, and Y), one diagram for a high value of V (e.g., one
standard deviation above its mean) and a contrasting diagram for a low value of V (e.g., one standard
deviation below its mean). Given estimates for the coefficients a through f in Figure 11, the values for
these simple effects can be computed readily as follows:
Coefficient from X to M: a + eV,
Coefficient from M to Y: b + dV,
Coefficient from X to Y: c + fV,
where V = a value one SD above the mean for one diagram, and a value one SD below the mean for the
other diagram.
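These conditional paths are simple enough to compute directly once the coefficients are in hand. The following plain-Python sketch uses made-up coefficient values (not estimates from any real model) to produce the quantities for the two contrasting diagrams, at V one SD above and one SD below its mean:

```python
# Illustrative coefficient values (hypothetical, assuming centered predictors)
a, b, c = 0.40, 0.50, 0.20   # paths for X->M, M->Y, and X->Y
e, d, f = 0.15, 0.30, 0.05   # coefficients on the V*X->M, V*M->Y, and V*X->Y terms
sd_V = 1.0                    # standard deviation of the (centered) moderator V

def simple_paths(V):
    """Conditional path coefficients and indirect effect at a given value of V."""
    return {
        "X -> M": a + e * V,
        "M -> Y": b + d * V,
        "X -> Y (direct)": c + f * V,
        "indirect effect": (a + e * V) * (b + d * V),
    }

high = simple_paths(+sd_V)   # one SD above the mean of V
low = simple_paths(-sd_V)    # one SD below the mean of V
print(high)
print(low)
```

With these particular values, the indirect effect at high V, (0.55)(0.80) = 0.44, is much larger than at low V, (0.25)(0.20) = 0.05, which is the kind of contrast the two diagrams are meant to convey to an audience.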
For circumstances in which the moderator (i.e., V) is a categorical variable (consisting of two or more
discrete kinds), rather than a continuous variable, a very attractive method for studying moderated
mediation is multiple-group SEM (Arbuckle, 2009). There are no product terms to wrestle with, and the
Amos output is already in the form of contrasting path diagrams, one for each condition or sample. We
return to this approach later in the article.
What about mediated moderation? This term refers to a mediating variable that explains how an
interaction between two other variables is transmitted to a later outcome. As Muller et al. (2005, p. 852)
pointed out, moderated mediation and mediated moderation are “in some sense the flip sides of the
same coin”. This can be seen in Figure 11, previously described as depicting moderated mediation. The
figure also shows a mediated moderation: M is the mediator of the effect of the interaction of V and X
(i.e., V∙X) on Y. Although this makes a kind of sense, it is probably not the side of the coin that
researchers should usually be looking at.
Finally, given the concerns raised in earlier parts of this article about the largely underappreciated
difficulties in establishing the case for mediation, one should be at least as skeptical about claims of
moderated mediation and mediated moderation. In addition, such claims tend to be vacuous unless they
are yoked to a strong and compelling theoretical framework.

The Relation between Mediation Analysis and Experimental Design


Prior to Baron and Kenny (1986), the prevailing belief among psychologists was that to advance the
issue of underlying mechanisms beyond the realm of speculation, we require well-conducted
experiments with carefully designed manipulations. The Baron and Kenny article convinced a great
many researchers that a much broader array of data could be brought to bear on questions of underlying
mechanism, including quasi-experimental, passive-observational, and purely questionnaire-based survey
data. However, some critiques of mediation analysis have attempted to shift the balance back toward the
older wisdom strongly favoring true experiments (Bullock et al., 2010; Spencer, Zanna, & Fong, 2005;
Stone-Romero & Rosopa, 2008).
If we experimentally manipulate X and then observe subsequent levels of M and Y, our inference of the
effect of X on M is clear, but our inference of the effect of M on Y is still subject to confounds, because M

was not manipulated. For this reason, methodologists often recommend a second experiment, in which
M is experimentally manipulated to more unambiguously study its effect on Y (e.g., Spencer et al., 2005).
Hence, in this two-experiment approach, there is a separate experiment for each link in the causal chain.
However, the recommendation to experimentally manipulate the putative mediator is often of limited
value to clinical researchers. This is because there may be no more direct way to affect M than the
manipulation of X. As an example, consider cognitive therapy, in which the therapeutic manipulation has
the goal of changing underlying cognitions, which in turn ameliorate symptoms. If we already knew of a
more direct (and ethical) way to change cognitions, this would presumably already be incorporated into
cognitive therapy!
In clinically oriented research, the main application of directly manipulating the mediator is to test
relatively simple (and possibly simplistic) explanations of effects. Such “nothing-but” explanations imply
that a simpler condition should produce the same effects as a more complex, standard one. For
example, Spanos (1986) proposed that hypnotic amnesia was nothing but motivated disattention to the
to-be-forgotten material. If this proposal were correct, then giving explicit instructions for such
disattention, which would more directly manipulate the hypothesized mediator, should be just as effective
as standard hypnotic suggestions for amnesia. Bowers and Woody (1996) ran an experiment to test this
prediction and found, to the contrary, that attempts to disattend led to frequent intrusions of the to-be-
forgotten material, whereas hypnotic suggestions for amnesia did not. Thus, direct manipulation of the
hypothesized mediator yielded evidence against the nothing-but position.
Because it may be difficult or impossible to directly manipulate the putative mediator in clinical research,
a different, somewhat indirect strategy is often needed. It may be possible to come up with manipulations
that, although they are not themselves sufficient to produce the mediating effect, may either weaken or
strengthen it. MacKinnon (2008) calls these blockage and enhancement designs, and Spencer et al.
(2005) call them the moderation-of-process design.
Hypnosis research also yields a useful example of this strategy. Hypnotic suggestions for analgesia tend
to elicit both counter-pain imagery and pain reduction, and most clinical hypnotists believe that the
imagery mediates the pain reduction. In contrast, Hargadon, Bowers, & Woody (1995) hypothesized that
the counter-pain imagery has no causal role in hypnotic pain reduction. To test this hypothesis, they ran
a hypnosis experiment contrasting a condition in which counter-pain imagery was proscribed (i.e.,
blocked) versus a condition in which counter-pain imagery was prescribed (i.e., enhanced). Consistent
with their hypothesis, they found that although the two conditions produced very different levels of
counter-pain imagery, they were equally effective for pain reduction.
It is interesting to compare such an experiment to an alternative opened up by mediation analysis. The
Hargadon et al. (1995) hypothesis corresponds to what we earlier termed the concomitant-effects
model—that is, counter-pain imagery and pain reduction are concomitant effects of hypnotic suggestions
for analgesia, but imagery does not mediate the effect of the suggestions on pain reduction. By using
mediation analysis, one could investigate this hypothesis without manipulating the mediator. In such a
study, X would be hypnotic suggestions for analgesia versus a control condition, M would be the self-
reported frequency of counter-pain images, and Y would be self-reported pain. The shortcoming of this
design is that M and Y could share any of many possible uncontrolled common causes, just one
example of which is self-report bias. Hence, although the results of such a study could be quite
interesting, they would not likely be as clear as the ones from the foregoing experiment.
Nonetheless, it is important to emphasize that experimental studies attempting to manipulate the
mediator do not overcome all ambiguities about mediation. In particular, it needs to be established that
the experimental manipulation actually changed the underlying mediator of interest, and that it did not

inadvertently affect other possible mediators that could have an impact on Y (Bullock et al., 2010). One
way to try to establish this is to incorporate a measure of the mediator into the study and show that the
manipulation has a strong effect on this measure, and that scores on the measure explain the effect of
the manipulation on Y. The astute reader will realize that this is a mediation analysis, and we have come
full circle. Thus, it makes the best sense to consider experimental designs and mediation analysis as
complementary tools, rather than competing alternatives.

Integrating Mediation Analysis and Experimental Design


There are many creative possibilities for integrating mediation analysis and experimental design. To
illustrate, let’s look at an example of such a model.
As just mentioned, a shortcoming of experimental manipulations of the mediating variable is uncertainty
about whether the manipulation specifically affected the targeted mediator rather than some other
mediator. One way to address this problem is to pose a model in which there are two (or more)
simultaneous mediators and two (or more) experimentally manipulated conditions that should
differentially affect these mediators.
Such a model can be handled elegantly with multiple-group SEM, as shown in Figure 12. The same
model appears for each of two samples, and this model poses two simultaneous mediators, M1 and M2,
as intermediaries in the effect of X (e.g., treatment versus control) on Y (the outcome). In the model, M1
and M2 are measured variables. It is the two samples that represent contrasting experimentally
manipulated conditions. For example, Sample 1 could be those participants in a condition that should
enhance (or alternatively, block) M1, but leave M2 unaffected; and Sample 2 could be those participants
in a condition that should enhance (or alternatively, block) M2, but leave M1 unaffected. (A third, control
condition which has neither of these manipulations may be helpful in interpreting effects, but is not
shown in this particular model.) The pairs of paths comprising the indirect effects are labeled with letters
a and b, as in our earlier models. The first number after the letter indicates which mediator the indirect
effect is about, and the second number indicates which sample is involved.
This model allows a double dissociation in which we can verify that each manipulation affects the
intended mediator and not the other. Using the Bayesian custom-estimands option in Amos, we can set
up contrasts between the various indirect effects. For instance, to test the hypothesis that the condition
provided to Sample 1 differentially affected M1, we can estimate and test the following contrast:
a₁₁b₁₁ − a₁₂b₁₂
Likewise, to test the hypothesis that the condition provided to Sample 2 differentially affected M2, we can
estimate and test the following contrast:
a₂₁b₂₁ − a₂₂b₂₂
Appendix B shows how to run such a model and perform these contrasts. (This type of model would also
be useful for studying already-existing groups of people—for example, Samples 1 and 2 could be people
from two different cultural groups.)
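The arithmetic behind these contrasts is simply a difference of products of the a and b paths across the two samples. As a rough illustration outside Amos — using hypothetical path estimates, not values from any data set in this article — the double dissociation can be sketched in Python:

```python
# Hypothetical a and b path estimates (not from the article's data),
# indexed by (mediator, sample)
a = {(1, 1): 0.60, (1, 2): 0.10, (2, 1): 0.05, (2, 2): 0.55}
b = {(1, 1): 0.40, (1, 2): 0.40, (2, 1): 0.35, (2, 2): 0.35}

def contrast(m):
    """Difference between the indirect effects through mediator m
    in Sample 1 versus Sample 2."""
    return a[(m, 1)] * b[(m, 1)] - a[(m, 2)] * b[(m, 2)]

# Sample 1's condition selectively boosts the path through M1,
# so the M1 contrast is positive and the M2 contrast is negative
print(round(contrast(1), 3))
print(round(contrast(2), 3))
```

With these illustrative values, the M1 contrast is clearly positive and the M2 contrast clearly negative, the pattern expected when each manipulation affects only its intended mediator.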
Such integrations of SEM-based mediation analysis and experimental design may help to overcome
some of the respective weaknesses of either approach alone. The basic idea is to build mediation-
based, theoretically relevant contrasts into the experimental design and evaluate them using SEM.
Ideally, this gives us the best of both worlds.
Figure 12: Multiple-Group SEM with Two Simultaneous Mediators, M1 and M2 (panels for Sample 1 and Sample 2)

Concluding Recommendations
Let’s conclude with a few important summary recommendations:
1. Be appropriately skeptical of researchers’ claims that mediation has been demonstrated. In and of
itself, passing a statistical test of mediation is not particularly persuasive – keep in mind that the crud
factor and N = 60 would be enough. Claims of “full mediation” are especially unlikely to be warranted,
given the modest sample sizes on which such claims tend to be based.
2. Keep in mind that compared to prevailing practice in psychology over the last twenty years, a much
more critical perspective is presently emerging about claims of mediation from non-experimental
data. To illustrate, consider the following two quotes:
Recent decades have seen dramatic growth of interest in mediation ... yet the quality of
argumentation remains inadequate because researchers have not come to grips with
some of the key assumptions on which their analyses depend. Deficient argumentation in
turn leads to insufficient attention to issues of design. Assessing mediation is a
conceptually deep and empirically vexing task, and those impatient for answers to
questions of mediation seem to underestimate the challenges presented by the study of
causal pathways. (Bullock et al., 2010, p. 557)
The recent literature on mediation ... has insufficiently emphasized the very difficult
assumptions that underlie the mediational model. This deficiency has led to many papers
that employ a mediation test without any discussion of the strong assumptions that are
implicitly being made. ... The most difficult issue is in establishing the adequacy of the
theory underlying the mediational model and the design considerations that have been
used to buttress that theory. We should expect to see an extended discussion of these
issues whenever mediational analyses are presented. (Judd & Kenny, 2010, p. 119)
3. Both in designing research and interpreting results, use an SEM framework (e.g., structural
diagrams) to pose and evaluate alternative, competing models. Which can be ruled out plausibly on a
priori grounds? Which can only be ruled out by empirical results, and what design features (and
analytic strategies) are needed in the research to make such tests possible?
4. Give some thought to the possibilities of heterogeneity of effects, non-additivity, and non-linearity.
Although these possibilities can seem statistically exotic and require relatively large samples to nail
down well, their clinical implications can be very substantial. For example, a mediating process that
has a large effect for a small minority of therapy clients is very different from one that has a moderate
effect for the great majority of clients.
5. Consider experiments that directly manipulate the hypothesized mediator, if possible. Without these,
the M-to-Y causal link remains at least somewhat provisional. There are important quasi-
experimental possibilities, as well, such as “natural experiments” and the use of instrumental
variables, as yet infrequently used by psychologists. See Antonakis, Bendahan, Jacquart, and Lalive
(2010) for a recent review of these quasi-experimental approaches.
6. Realize that establishing mediation is difficult, requiring multiple converging approaches in a creative
program of research. It is not, and never will be, reducible to any formulaic statistical test, no matter
how sophisticated.

Acknowledgements
Parts of this article are based on lecture material from the author’s graduate course, and I thank students
and colleagues for their feedback.
Preparation of this article was supported by an operating grant from the Natural Sciences and
Engineering Research Council of Canada (RGPGP 283352-04).
References
Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making causal claims: A review and
recommendations. Leadership Quarterly, 21, 1086-1120. doi:10.1016/j.leaqua.2010.10.010
Arbuckle, J. (2009). Amos 18 User’s Guide. Chicago: Amos Development Corporation.
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological
research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social
Psychology, 51, 1173-1182. doi:10.1037/0022-3514.51.6.1173
Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. Hoboken,
NJ: Wiley.
Bowers, K. S., & Woody, E. Z. (1996). Hypnotic amnesia and the paradox of intentional forgetting.
Journal of Abnormal Psychology, 105, 381-390. doi:10.1037/0021-843X.105.3.381
Bullock, J. G., Green, D. P., & Ha, S. E. (2008). Experimental approaches to mediation: A new guide for
assessing causal pathways. Unpublished manuscript, Yale University, New Haven, CT.
Bullock, J. G., Green, D. P., & Ha, S. E. (2010). Yes, but what’s the mechanism? (Don’t expect an easy
answer). Journal of Personality and Social Psychology, 98, 550-558. doi:10.1037/a0018933
Cole, D. A., & Maxwell, S. E. (2003). Testing mediation with longitudinal data: Questions and tips in the
use of structural equation modeling. Journal of Abnormal Psychology, 112, 558-577.
doi:10.1037/0021-843X.112.4.558
Duncan, T. E., Duncan, S. C., & Strycker, L. A. (2006). An introduction to latent variable growth curve
modeling: Concepts, issues, and applications (2nd ed.). Mahwah, NJ: Erlbaum.
Dwyer, J. H. (1983). Statistical models for the social and behavioral sciences. New York: Oxford
University Press.
Edwards, J. R., & Lambert, L. S. (2007). Methods for integrating moderation and mediation: A general
analytical framework using moderated path analysis. Psychological Methods, 12, 1-22.
doi:10.1037/1082-989X.12.1.1
Fritz, M. S., & MacKinnon, D. P. (2007). Required sample size to detect the mediated effect.
Psychological Science, 18, 233-239. doi:10.1111/j.1467-9280.2007.01882.x
Gergen, K. J. (1989). The possibility of psychological knowledge: A hermeneutic inquiry. In M. J. Packer
& R. B. Addison (Eds.), Entering the circle: Hermeneutic investigation in psychology (pp. 239-258).
Albany: State University of New York Press.
Gilovich, T. (2009, May). Where the mind goes. The Third Annual Ziva Kunda Memorial Lecture,
University of Waterloo, Waterloo, Ontario.
Goodman, L. A. (1960). On the exact variance of products. Journal of the American Statistical Association,
55, 708-713.
Hargadon, R., Bowers, K. S., & Woody, E. Z. (1995). Does counter-pain imagery mediate hypnotic
analgesia? Journal of Abnormal Psychology, 104, 508-516. doi:10.1037/0021-843X.104.3.508
Hayes, A. F., & Preacher, K. J. (2010). Quantifying and testing indirect effects in simple mediation
models when the constituent paths are nonlinear. Multivariate Behavioral Research, 45, 627-660.
doi:10.1080/00273171.2010.498290
Herman, C. P., & Mack, D. (1975). Restrained and unrestrained eating. Journal of Personality, 43, 647-
660. doi:10.1111/j.1467-6494.1975.tb00727.x
Herman, C. P., & Polivy, J. (1980). Restrained eating. In A. J. Stunkard (Ed.), Obesity (pp. 208–225).
London: W.B. Saunders.
Hershberger, S. L. (1994). The specification of equivalent models before the collection of data. In A.
von Eye & C. C. Clogg (Eds.), Latent variables analysis (pp. 68-105). Thousand Oaks, CA: Sage.
James, L. R., & Brett, J. M. (1984). Mediators, moderators and tests for mediation. Journal of Applied
Psychology, 69, 307-321. doi:10.1037/0021-9010.69.2.307
Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treatment evaluations.
Evaluation Review, 5, 602-619. doi:10.1177/0193841X8100500502
Judd, C. M., & Kenny, D. A. (2010). Data analysis in social psychology: Recent and recurring issues. In
S. T. Fiske, D. T. Gilbert, & G. Lindzey (Eds.), Handbook of social psychology, Vol. 1 (pp. 115-139). New
York: Wiley.
Kenny, D. A. (1979). Correlation and causality. New York: Wiley.
Kessler, R. C. (1987). The interplay of research design strategies and data analysis procedures in
evaluating the effects of stress on health. In S. V. Kasl & C. L. Cooper (Eds.), Stress and health:
Issues in research methodology (pp. 113-140). New York: Wiley.
Kraemer, H. C., Kiernan, M., Essex, M., & Kupfer, D. J. (2008). How and why criteria defining
moderators and mediators differ between the Baron & Kenny and MacArthur approaches. Health
Psychology, 27, S101-S108.
Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). New York: Guilford.
Lee, S., & Hershberger, S. L. (1990). A simple rule for generating equivalent models in covariance
structure modeling. Multivariate Behavioral Research, 25, 313-334.
Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70, 151-
159. doi:10.1037/h0026141
MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of equivalent
models in applications of covariance structure analysis. Psychological Bulletin, 114, 185-199.
doi:10.1037/0033-2909.114.1.185
MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. Mahwah, NJ: Erlbaum.
MacKinnon, D. P., & Fairchild, A. J. (2009). Current directions in mediation analysis. Current Directions
in Psychological Science, 18, 16-20. doi:10.1111/j.1467-8721.2009.01598.x
MacKinnon, D. P., Fritz, M. S., Williams, J., & Lockwood, C. M. (2007). Distribution of the product
confidence limits for the indirect effect: Program PRODCLIN. Behavior Research Methods, 39, 384-389.
MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of
methods to test mediation and other intervening variable effects. Psychological Methods, 7, 83-104.
doi:10.1037/1082-989X.7.1.83
MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect:
Distribution of the product and resampling methods. Multivariate Behavioral Research, 39, 99-128.
doi:10.1207/s15327906mbr3901_4
Maxwell, S. E., & Cole, D. A. (2007). Bias in cross-sectional analyses of longitudinal mediation.
Psychological Methods, 12, 23-44. doi:10.1037/1082-989X.12.1.23
McArdle, J. J., & Hamagami, F. (2001). Latent difference score structural models for linear dynamic
analyses with incomplete longitudinal data. In L. M. Collins & A. G. Sayer (Eds.), New methods for the
analysis of change (pp. 137-175). Washington, DC: American Psychological Association.
doi:10.1037/10409-005
Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian defense and two
principles that warrant it. Psychological Inquiry, 1, 108-141. doi:10.1207/s15327965pli0102_1
Muller, D., Judd, C. M., & Yzerbyt, V. Y. (2005). When moderation is mediated and mediation is
moderated. Journal of Personality and Social Psychology, 89, 852-863. doi:10.1037/0022-
3514.89.6.852
Muller, D., Yzerbyt, V., & Judd, C. M. (2010). Can a variable be both a mediator and a moderator? Paper
presented at the 11th Annual Meeting for the Society for Personality and Social Psychology, Las
Vegas, Nevada.
Myers, J. L. (1979). Fundamentals of experimental design (3rd ed.). Boston: Allyn and Bacon.
Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling procedures for assessing and
comparing indirect effects in multiple mediator models. Behavior Research Methods, 40, 879-891.
doi:10.3758/BRM.40.3.879
Preacher, K. J., Rucker, D. D., & Hayes, A. F. (2007). Addressing moderated mediation hypotheses:
Theory, methods, and prescriptions. Multivariate Behavioral Research, 42, 185-227.
Preacher, K. J., Zyphur, M. J., & Zhang, Z. (2010). A general multilevel SEM framework for assessing
multilevel mediation. Psychological Methods, 15, 209-233. doi:10.1037/a0020141
Sadler, P., & Woody, E. (2003). Is who you are who you’re talking to? Interpersonal style and
complementarity in mixed-sex interactions. Journal of Personality and Social Psychology, 84, 80-96.
doi:10.1037/0022-3514.84.1.80
Shrout, P. E., & Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New
procedures and recommendations. Psychological Methods, 7, 422-445. doi:10.1037/1082-
989X.7.4.422
Shrout, P. E., & Bolger, N. (2010, January). Refining inferences about mediated effects in studies of
personality and social psychological processes. Paper presented at the 11th Annual Meeting for the
Society for Personality and Social Psychology, Las Vegas, Nevada.
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. In
S. Leinhardt (Ed.), Sociological methodology, 1982 (pp. 290-312). Washington, DC: American
Sociological Association. doi:10.2307/270723
Sobel, M. E. (1986). Some new results on indirect effects and their standard errors in covariance
structure models. Sociological Methodology, 16, 159-186.
Spanos, N. P. (1986). Hypnotic behavior: A social-psychological interpretation of amnesia, analgesia,
and “trance logic”. Behavioral and Brain Sciences, 9, 449-467. doi:10.1017/S0140525X00046537
Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005). Establishing a causal chain: Why experiments are
often more effective than mediational analyses in examining psychological processes. Journal of
Personality and Social Psychology, 89, 845-851. doi:10.1037/0022-3514.89.6.845
Stone-Romero, E. F., & Rosopa, P. J. (2008). The relative validity of inferences about mediation as a
function of research design characteristics. Organizational Research Methods, 11, 326-352.
doi:10.1177/1094428107300342
Wiebe, R. E., & McCabe, S. B. (2002). Relationship perfectionism, dysphoria, and hostile interpersonal
behaviours. Journal of Social and Clinical Psychology, 21, 67-91.
Zanna, M. P., & Fazio, R. H. (1982). The attitude-behavior relation: Moving toward a third generation of
research. In M. P. Zanna, E. T. Higgins, & C. P. Herman (Eds.), Consistency in social behavior: The
Ontario Symposium (Vol. 2, pp. 283-301). Hillsdale, NJ: Erlbaum.
Appendix A: Demonstration of SEM-Based Methods for Evaluating Mediation
To run all parts of this demonstration, the following six files need to be in the same directory:
01 Check relation of X to Y.amw
02 Test mediation using SEM.amw
03 Bayesian estimation of effect ratio.AmosBayes
03 Bayesian estimation of effect ratio.amw
03 Effect ratio.vb
Mediation data.sav
We will use the simulated data in the file Mediation data.sav and the SEM program Amos to
demonstrate the different SEM-based methods for evaluating mediation. The data have been devised to
illustrate some interesting differences between the methods.
Preliminarily, we can look at whether there is a significant relation of X with Y (ignoring M, the mediator).
Running the Amos input file, 01 Check relation of X to Y.amw, yields the following text output:
Regression Weights:

             Estimate    S.E.    C.R.      P   Label
Y <--- X         .428    .234   1.832   .067

The non-significance of this X-to-Y relation fails the first of Baron and Kenny’s (1986) well-known steps for
evaluating mediation. However, more recently methodologists have expressed skepticism about the
value of this step in evaluating mediation. For example, Shrout and Bolger (2002, p. 422) state that it
“should not be a requirement when there is a priori belief that the effect size is small or suppression is a
possibility”. Oddly enough, even if there is no suppression, the mediated relation may be significant
although the relation of X to Y is not. The present data provide an example of this possibility.
Running the mediation model in the Amos input file, 02 Test mediation using SEM.amw, yields the
following text output:

Regression Weights:

             Estimate    S.E.    C.R.      P   Label
M <--- X        -.357    .208  -1.722   .085   a
Y <--- M        -.451    .147  -3.056   .002   b
Y <--- X         .267    .221   1.211   .226   c

Joint significance of the two effects involved in the mediation.


Are both paths involving M statistically significant? Here the answer is no, because the X → M path is not
significant. Thus, this test does not support the hypothesis of mediation.

Sobel test and its variants.


The Sobel test is readily computed from the estimates and standard errors in the foregoing table:
z = ab / √(a²·SE_b² + b²·SE_a²) = (−.357)(−.451) / √[(−.357)²(.147)² + (−.451)²(.208)²] = 1.49

Because this z falls short of 1.96, the Sobel test also fails to support mediation.
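For readers working outside Amos, the Sobel z is straightforward to compute from any regression output. The following Python sketch reproduces the calculation above, using the path estimates and standard errors from the regression-weights table:

```python
import math

def sobel_z(a, se_a, b, se_b):
    """First-order Sobel z statistic for the indirect effect ab."""
    return (a * b) / math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)

# Estimates from the demonstration data (M <--- X and Y <--- M paths)
z = sobel_z(a=-0.357, se_a=0.208, b=-0.451, se_b=0.147)
print(f"z = {z:.2f}")  # close to the 1.49 computed above
```

Because a and b are each squared against the other path's standard error, the sign of the indirect effect carries through the numerator while the denominator stays positive.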
Alternatively, we can evaluate the foregoing z against non-normal sampling distributions. In MacKinnon
et al. (2002), this is called the z' test, and the critical value for the .05 significance level is stated to be
0.97 (p. 90). Because the foregoing z of 1.49 exceeds 0.97, this test supports the hypothesis of
mediation. An improved version of this approach is available through use of the program PRODCLIN
(MacKinnon, Fritz, Williams, & Lockwood, 2007), not illustrated here.

Bias-corrected bootstrap.
Using the same Amos input file, we can obtain bias-corrected bootstrap estimates for the indirect effect
with the following steps:
1. In View / Analysis Properties, under the Output tab, click the box for Indirect, direct & total
effects.
2. Also in View / Analysis Properties, under the Bootstrap tab, click the box for Perform bootstrap
and enter 5000 (or more) for the Number of bootstrap samples. Click the box for Bias-corrected
confidence intervals. The default confidence level is 90; to obtain 95% confidence intervals, change
it to 95.
3. Run the model. (It will take a little while.)
4. Go to View / Text Output. In the list that appears at the upper left, double-click on Estimates. Then
double-click on Matrices (which appears under Estimates). Next, click on either Indirect Effects or
Standardized Indirect Effects (depending on whether you prefer unstandardized or standardized
estimates). Finally, in the Estimates/Bootstrap panel at the left in the middle, click on Bootstrap
Confidence to get Bias-corrected percentile method. Confidence intervals and two-tailed p-values
appear for each of the indicated indirect effects.
For our demonstration data set, here is the resulting bootstrap text output for the standardized estimates:
Standardized Indirect Effects - Lower Bounds (BC)

X M
M .000 .000
Y .006 .000

Standardized Indirect Effects - Upper Bounds (BC)

X M
M .000 .000
Y .186 .000

Standardized Indirect Effects - Two Tailed Significance (BC)

X M
M ... ...
Y .040 ...

Thus, this test offers statistically significant support for the hypothesis of mediation (p = .04), and the
95% confidence interval for the standardized indirect effect is 0.006 to 0.186.
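The bootstrap logic itself is not tied to Amos. The following Python sketch (numpy only, using simulated X, M, Y data rather than the article's demonstration file) resamples cases with replacement, re-estimates the a and b paths by ordinary least squares, and reads off percentile confidence limits for ab. For brevity it uses the simple percentile method; Amos's bias-corrected adjustment is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Simulated data with a genuine indirect effect (illustrative, not the article's data set)
x = rng.normal(size=n)
m = -0.4 * x + rng.normal(size=n)
y = -0.45 * m + 0.25 * x + rng.normal(size=n)

def indirect(x, m, y):
    """Estimate a (M on X) and b (Y on M, controlling X) by OLS; return ab."""
    a = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), m, rcond=None)[0][1]
    coefs = np.linalg.lstsq(np.column_stack([np.ones_like(x), m, x]), y, rcond=None)[0]
    return a * coefs[1]

boot = []
for _ in range(5000):
    idx = rng.integers(0, n, n)              # resample cases with replacement
    boot.append(indirect(x[idx], m[idx], y[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])    # 95% percentile interval for ab
print(f"ab = {indirect(x, m, y):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The key point the sketch makes concrete is that the entire two-equation estimation is repeated inside each bootstrap sample, so the interval reflects sampling variability in both paths jointly.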

Bayesian estimation using the MCMC technique.


Using the same Amos input file, we can also obtain Bayesian estimates for the indirect effect. To begin,
select Analyze / Bayesian Estimation. This procedure generates many random samples in order to
estimate the parameters of the model. It will take a little while. We can tell that we have enough samples
when the Convergence Statistic (C.S.) closely approaches a value of 1, and the unhappy face becomes
a happy face. Also relevant is the column headed S.E., for Standard Error. These values indicate how
close the Monte-Carlo-based parameter estimate is likely to be to the true value. Thus, we want these S.E.
values to be very small.
Note that these S.E. values do not have the same meaning as we are used to for the standard error of a
parameter, for example in maximum likelihood estimation. Instead, in the Bayesian approach, it is the
posterior standard deviation (labeled S.D. in the Amos output) that is analogous to the standard error in
ML estimation. In addition, in the Bayesian approach, it is the posterior mean (labeled Mean in the Amos
output) that is analogous to the ML parameter estimate.
Once we get the happy face indicating convergence, we can proceed with obtaining the Bayesian results
for the indirect effect:
1. In the Bayesian SEM dialog box, select View / Additional estimands.
2. In the Additional Estimands dialog box, use the slider (towards the upper left) to find Standardized
Indirect Effects and click on the box next to it. For these data, we see that the mean is 0.090; this is
our best single estimate of the standardized indirect effect.
3. Click on 95% Lower bound and then 95% Upper bound at the middle left. (If the confidence level is
not this, go to View / Options and change Confidence level to 95.) The values shown are the limits
for a Bayesian 95% credible interval. The Bayesian credible interval is analogous to a conventional
confidence interval, but its interpretation is more straightforward. The credible interval is a probability
statement about the parameter itself: We are 95% sure that the true value of the parameter lies
between the two limits. For these data, we find that the 95% credible interval is -0.018 to 0.231.
Because this interval includes zero (although only just barely), we lack reasonable confidence that
the indirect effect is not zero. Thus, note that the Bayesian approach provides a slightly more
conservative test for mediation than the bootstrapping approach, at least for a small sample like the
present one (N = 50).
4. We can also examine a graph of the posterior probabilities for various possible values of the
standardized indirect effect. In the Additional Estimands dialog box, click on View / Posterior, and
then click on the corresponding value (e.g., 0.090) in the window to the right. Note that the
distribution has some positive skew; this is the issue that causes problems for the conventional Sobel
test.
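Although Amos handles the MCMC machinery internally, the posterior summaries it reports are just descriptive statistics on the retained draws. Given an array of draws of the indirect effect — simulated below as a skewed product of normals for illustration, not Amos's actual chain for the demonstration data — the posterior mean, posterior S.D., and 95% credible limits are computed as follows:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for MCMC draws of the indirect effect ab: a product of two
# (assumed independent) normal draws, which is positively skewed here
draws = rng.normal(-0.36, 0.21, 20000) * rng.normal(-0.45, 0.15, 20000)

post_mean = draws.mean()                      # analogue of the ML point estimate
post_sd = draws.std(ddof=1)                   # analogue of the ML standard error
lo, hi = np.percentile(draws, [2.5, 97.5])    # 95% credible interval
print(f"mean {post_mean:.3f}, sd {post_sd:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

A histogram of such draws shows the positive skew noted above — the same feature of the product distribution that undermines the normal-theory Sobel test.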

The effect ratio.


Based on the same regression weights as we used for the Sobel test, we can quantify the strength of the
mediation by computing the effect ratio:
P̂_M = ab / (ab + c) = (−.357)(−.451) / [(−.357)(−.451) + .267] = .376
That is, 37.6% of the effect of X on Y is mediated by M.
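The point estimate of the effect ratio can likewise be checked by hand; this Python snippet reproduces the calculation with the path estimates reported above:

```python
def effect_ratio(a, b, c):
    """Proportion of the total effect of X on Y carried by the indirect path ab."""
    return (a * b) / (a * b + c)

# Path estimates a, b, and c from the mediation model above
ratio = effect_ratio(a=-0.357, b=-0.451, c=0.267)
print(f"{ratio:.3f}")  # .376
```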
However, it is more informative to use the Bayesian approach to estimate the effect ratio. The Amos
input file called 03 Bayesian estimation of effect ratio.amw is set up to do this. There are two
important points to this set-up:
1. We need to disallow the rare random samples produced by Amos that are cases of suppression—
that is, in which the indirect and direct effects are of opposite sign. I have done this by modifying the
prior probabilities of the model’s parameters. To see this, go to View / Priors. Clicking on M <--- X
under Regression weights, we see the prior probability distribution for this parameter. This
distribution represents what we believe about the parameter before we analyze the data from this
study. The Amos default is a uniform prior distribution, said to be diffuse or non-informative; it
represents the position that prior to the study we know nothing. I have reset the upper bound to zero.
This bounded distribution represents the following position: Although we do not know what the size of
the weight is, we are sure that it does not make sense for it to be positive. Likewise, clicking on the
regression weight for YM, note that again I have specified an upper bound of zero. Because both
these weights will now be constrained to be negative, their product, the indirect effect, will always be
positive. Finally, clicking on the regression weight for Y <--- X, note that I have specified a lower bound
of zero. Thus, all estimates of the direct effect to be considered will be positive. In this way, we
disallow all instances of suppression (for which the effect ratio is undefined).
2. We need to add the code needed to calculate the effect ratio. Select View / Custom estimands,
which opens a programming dialog box. Use File / Open to open the file called 03 Effect ratio.vb,
which contains the simple code to calculate the effect ratio. (For instructions about writing this code,
see Example 29 of Arbuckle, 2009.)
3. Now we click the Run button, and in a few moments we get our results. Our point estimate, the
Mean, is about .37. Of particular interest is the 95% credible interval, which runs from about .04 to
about .89. Thus, the Bayesian approach gives us the important additional information that this
sample does not actually provide a good estimate of the effect ratio.
4. To see a graph of posterior probabilities for the effect ratio, click on View / Posterior, and then click
on effectratio (under Numeric Estimands). The graph that appears is the one shown as Figure 2 in
the main text.
Appendix B: Demonstration of Two-Mediator, Two-Sample SEM and Contrasts Using Bayesian Custom-Estimands in Amos
To run this demonstration, the following four files need to be in the same directory:
Two-mediator two-sample SEM (Fig 12).amw
Custom estimands for multiple-group SEM.vb
Sample 1 for multiple-group SEM.sav
Sample 2 for multiple-group SEM.sav
Opening the file Two-mediator two-sample SEM (Fig 12).amw in Amos brings up the two-sample
model shown in Figure 12 of the main text. This can be estimated in the usual way with ML, which is
worth doing first to see what the resulting path coefficients are like on the structural diagram. (For
instructions about how to set up and estimate models with multiple samples in Amos, see Examples 10
and 11 of Arbuckle, 2009.)
Bayesian estimation allows us to evaluate contrasts between the various indirect effects. To see this,
select Analyze / Bayesian Estimation. Wait until the Convergence Statistic (C.S.) closely approaches a
value of 1. Then select View / Custom estimands, which opens a programming dialog box. Use File /
Open to open the file called Custom estimands for multiple-group SEM.vb, which contains the
code to calculate two differences between respective indirect effects and their associated p-values. (For
instructions about how to write such code, see Example 29 of Arbuckle, 2009.) These custom estimands
are as follows:
“contrast for M1” = a₁₁b₁₁ − a₁₂b₁₂
“contrast for M2” = a₂₁b₂₁ − a₂₂b₂₂
“p for M1”: One-tailed p-value for the first contrast
“p for M2”: One-tailed p-value for the second contrast.
Click Run, and in a few moments the results for these contrasts appear in the Custom Estimands box.
One-tailed p-values appear near the bottom of the box (Dichotomous Estimands), in the column
headed P. Even if we double these results to obtain two-tailed p-values, we see that both contrasts are
statistically significant, p < .05. The means and 95% credible intervals for the contrasts appear near the
top of the box (Numerical Estimands). Note that the direction of the second contrast is not consistent
with the hypothesis of differential effects: The condition administered to Sample 1 appears to have
enhanced both mediators, M1 and M2, compared to the condition administered to Sample 2.
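The one-tailed p-values Amos reports for these dichotomous estimands are simply the proportion of posterior draws falling on the "wrong" side of zero. With arrays of draws for each path — simulated below from hypothetical values, not the Amos chains themselves — the M1 contrast and its posterior p can be computed as:

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 20000
# Stand-in posterior draws for the four paths entering the M1 contrast
# (hypothetical means and spreads, chosen so Sample 1 boosts the M1 path)
a11 = rng.normal(0.60, 0.10, n_draws); b11 = rng.normal(0.40, 0.10, n_draws)
a12 = rng.normal(0.10, 0.10, n_draws); b12 = rng.normal(0.40, 0.10, n_draws)

contrast = a11 * b11 - a12 * b12          # draws of a11*b11 - a12*b12
p_one_tailed = np.mean(contrast <= 0)     # posterior probability the contrast is not positive
print(f"mean contrast {contrast.mean():.3f}, one-tailed p {p_one_tailed:.4f}")
```

Because each draw combines the four path parameters jointly, this approach automatically propagates their uncertainty into the contrast, with no normality assumption about the product terms.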