Sample Size Limits for Estimating Upper Level Mediation Models Using
Multilevel SEM
S. Natasha Beretvas
University of Texas at Austin
To cite this article: Xin Li & S. Natasha Beretvas (2013) Sample Size Limits for Estimating Upper
Level Mediation Models Using Multilevel SEM, Structural Equation Modeling: A Multidisciplinary
Journal, 20:2, 241-264, DOI: 10.1080/10705511.2013.769391
Download by: [University of Texas Libraries] Date: 02 May 2016, At: 09:40
Structural Equation Modeling, 20:241–264, 2013
Copyright © Taylor & Francis Group, LLC
ISSN: 1070-5511 print/1532-8007 online
DOI: 10.1080/10705511.2013.769391
The University of Texas at Austin
This simulation study investigated use of the multilevel structural equation model (MLSEM) for
handling measurement error in both mediator and outcome variables (M and Y) in an upper
level multilevel mediation model. Mediation and outcome variable indicators were generated with
measurement error. Parameter and standard error bias, confidence interval coverage, and power
to detect the ab mediated effect using Empirical-M confidence interval estimates were assessed
for the correct MLSEM versus a conventional multilevel model (MM) that used composite scores
for M and Y. The following conditions were manipulated: level 1 and 2 sample sizes, intraclass
correlation, degree of measurement error in M, and the true value of ab. The MLSEM more
accurately recovered the ab effect’s value, but serious convergence issues were encountered with
MLSEM estimates based on fewer than 80 clusters. More power for detecting a nonzero ab was
found for MM than for MLSEM estimates.
A mediating variable, M, is defined as a variable that accounts fully or partially for the
relationship between a dependent variable, Y, and an independent variable, X (Baron & Kenny,
1986). Using Baron and Kenny’s notation, the total effect of X on Y is represented using c.
Coefficient a is used to represent the impact of X on M and b represents the relationship
between M and Y, controlling for X. The coefficient $c'$, commonly termed the direct effect,
is used to represent the effect of X on Y after controlling for M. For the simplest mediation
models, researchers have used two measures of the mediated effect. The product, ab, of the a
and b coefficients provides one measure of the mediated effect. The other measure entails the
difference between the total and the direct effects (i.e., $c - c'$). Both mediated effect estimators
are equivalent for single-level data sets (MacKinnon, 2008). However, although estimates of
ab and of $(c - c')$ differ only negligibly when multilevel mediation models are estimated, use
of ab provides a more efficient estimate of the mediated effect (Krull & MacKinnon, 1999). In
addition, for multiple-mediator models, use of the ab estimator permits assessment of
mediator-specific mediated effects. This study thus focuses solely on the $\hat{a}\hat{b}$ estimate of ab and leaves
consideration of the $(c - c')$ measure as a direction for future research.

Correspondence should be addressed to S. Natasha Beretvas, 1 University Station, The University of Texas at Austin,
Educational Psychology Department, Population Research Center and Meadows Center for Preventing Educational
Risk, MS/D5800, Austin, TX 78712. E-mail: tberetvas@austin.utexas.edu

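The single-level equivalence of the two estimators noted above can be checked numerically. The following is a minimal sketch in Python (NumPy) and is not part of the original study: the sample size, the seed, and the helper name `ols` are illustrative assumptions, and the generating values (0.3, 0.3, and 0.1) are chosen only for illustration. With ordinary least squares fitted to the same cases, the product of the estimated a and b paths and the difference between the estimated total and direct effects agree to floating-point precision.

```python
import numpy as np

def ols(y, *predictors):
    """Return OLS coefficients [intercept, slope_1, slope_2, ...]."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 500                                        # illustrative sample size
x = rng.normal(size=n)
m = 0.3 * x + rng.normal(size=n)               # a = 0.3 (illustrative)
y = 0.3 * m + 0.1 * x + rng.normal(size=n)     # b = 0.3, c' = 0.1 (illustrative)

c_total = ols(y, x)[1]                # total effect c: Y regressed on X
a_hat = ols(m, x)[1]                  # path a: M regressed on X
_, b_hat, c_direct = ols(y, m, x)     # path b and direct effect c'

product = a_hat * b_hat               # ab estimator
difference = c_total - c_direct       # c - c' estimator
print(product, difference)
```

The identity holds only because all three regressions use the same single-level sample; it is not guaranteed once the paths are estimated with a multilevel model.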
There are countless examples of research studies assessing mediation effects across a
wide range of fields and study designs. Mediation analysis has not only been conducted
with conventional single-level data sets, but has been extended for use with multilevel data
structures. Krull and MacKinnon (2001) introduced use of the $L_X \rightarrow L_M \rightarrow L_Y$ notation
(where $L_j$ indicates the level of variable $j$) to categorize some of the different possible
designs encountered in multilevel mediation data. Note that it is possible that a data structure
entails more than two levels, but this study focuses solely on a two-level data structure modeling
individuals at Level 1 clustered within sites (e.g., schools) at Level 2. The upper level mediation
model (see Figure 1) is denoted as a 2 → 2 → 1 design (Krull & MacKinnon, 2001), indicating
that the first two variables in the mediation chain (X and M) are measured at Level 2, whereas
the dependent variable, Y, is measured at the individual level. The mediated effect in the
2 → 2 → 1 model is calculated as the product of the a and b coefficients in Figure 1.
Methodological research focusing on multilevel mediation models has considered only
manifest X, M, and Y variables in which a single observed variable (typically a total or a
mean score) is used to represent respondents’ scores on the relevant latent construct. When a
single variable is used to measure a latent construct, this involves the assumption that the single
measure is perfectly reliable and represents the latent construct with no measurement error.
However, measurement of psychological and educational constructs is rarely perfectly reliable.
Ignoring measurement error can lead to biased parameter estimates. As noted by MacKinnon
(2008), the size of the mediated effect will be diminished by the unreliability of the measures
used. Hoyle and Kenny (1999) conducted a simulation study assessing recovery of the true
value of a mediated effect in a single-level mediation model using observed indicators. As part
of their study, the authors manipulated the reliability of scores on indicators of the mediating
variable. Hoyle and Kenny found that use of a latent rather than an observed mediating variable
improved the accuracy of the estimated mediated effect. Hoyle and Kenny’s results are not
unexpected, as described next.
FIGURE 1 Path model of the upper level mediation model with observed mediating and outcome variables.
$C_M$ and $C_Y$ are used to represent composite scores on the mediating and outcome variables, respectively.

Bollen and Lennox (1991) illustrated how an observed composite score variable calculated as
the sum or mean of a latent variable’s indicators would differ from the latent variable with
individual indicators. Their illustration will be extended, here, to explain the expectation that
measurement error will hamper recovery of the a and b and thus the ab effects when it is
assumed that there is no measurement error in either endogenous variable versus when the
measurement error is explicitly modeled.
For a single-mediator structural equation model, latent variable, M, might be measured by u
indicators, $m_1$ through $m_u$, and latent variable Y is measured using v indicators $y_1$ through $y_v$.
To simplify this explanation, the values of the set of loadings for each latent variable (M and
Y) are going to be assumed equal. And $\lambda^S_m$ will be used to represent the standardized loading
of each indicator, m, on M, and $\lambda^S_y$ will represent the standardized loading of each indicator,
y, on Y. Because the standardized loadings are assumed, here, to be the same for each of the
M and Y factors, the measurement errors of each of the two factors’ indicators will also be the
same (denoted here by $\varepsilon_m$ and $\varepsilon_y$ for each m and y, respectively). The measurement model can
be written as follows:
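$$
\begin{bmatrix} m_1 \\ \vdots \\ m_u \\ y_1 \\ \vdots \\ y_v \end{bmatrix}
=
\begin{bmatrix}
\lambda^S_m & 0 \\
\vdots & \vdots \\
\lambda^S_m & 0 \\
0 & \lambda^S_y \\
\vdots & \vdots \\
0 & \lambda^S_y
\end{bmatrix}
\begin{bmatrix} M \\ Y \end{bmatrix}
+
\begin{bmatrix} \varepsilon_m \\ \vdots \\ \varepsilon_m \\ \varepsilon_y \\ \vdots \\ \varepsilon_y \end{bmatrix}.
\tag{1}
$$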
Instead of using a structural equation model that models the measurement error in the
measures of the M and Y factors, mediation researchers have used composite scores (either the
sum of or mean across indicator scores) as the relevant endogenous variables. The composite
score for the mediating construct, M, termed $C_M$, will be such that $C_M = \sum_{i=1}^{u} m_i$. Similarly,
the composite score for Y, $C_Y$, will be calculated such that $C_Y = \sum_{i=1}^{v} y_i$.
Based on Equation 1 and Bollen and Lennox (1991), the relationship between the composite
scores and their respective latent variables, M and Y, can then be expressed as:
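$$
\begin{bmatrix} C_M \\ C_Y \end{bmatrix}
=
\begin{bmatrix} \lambda^S_m & 0 \\ 0 & \lambda^S_y \end{bmatrix}
\begin{bmatrix} M \\ Y \end{bmatrix}
+
\begin{bmatrix} \varepsilon_m \\ \varepsilon_y \end{bmatrix}.
\tag{2}
$$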
Thus, clearly the composite scores $C_M$ and $C_Y$ differ from their respective latent variable
scores, M and Y, as a function of the indicators’ loadings (and of their measurement errors).
The composite scores only equal the factor scores when the measurement error, $\varepsilon$, for each
indicator is zero. Solving Equation 2 for the latent variables M and Y, we obtain the following:
$$
\begin{bmatrix} M \\ Y \end{bmatrix}
=
\begin{bmatrix} 1/\lambda^S_m & 0 \\ 0 & 1/\lambda^S_y \end{bmatrix}
\begin{bmatrix} C_M - \varepsilon_m \\ C_Y - \varepsilon_y \end{bmatrix}.
\tag{3}
$$
The structural model for the latent variables in the single-level, structural equation model for
mediation can be written as follows:
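$$
\begin{bmatrix} M \\ Y \end{bmatrix}
=
\begin{bmatrix} 0 & 0 \\ b & 0 \end{bmatrix}
\begin{bmatrix} M \\ Y \end{bmatrix}
+
\begin{bmatrix} a \\ c' \end{bmatrix} [X]
+
\begin{bmatrix} \zeta_M \\ \zeta_Y \end{bmatrix},
\tag{4}
$$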
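where $\zeta_j$ represents the disturbance for factor j. Substituting Equation 3 into the left side of
Equation 4, and solving for the vector of composite scores, we see that now
$$
\begin{bmatrix} C_M \\ C_Y \end{bmatrix}
=
\begin{bmatrix} 0 & 0 \\ \dfrac{b^*}{\lambda^S_m} & 0 \end{bmatrix}
\begin{bmatrix} M \\ Y \end{bmatrix}
+
\begin{bmatrix} \dfrac{a^*}{\lambda^S_m} \\[6pt] \dfrac{c'}{\lambda^S_y} \end{bmatrix} [X]
+
\begin{bmatrix} \dfrac{\zeta_M}{\lambda^S_m} \\[6pt] \dfrac{\zeta_Y}{\lambda^S_y} \end{bmatrix}.
\tag{5}
$$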
Solving Equation 4 for a and b, we obtain the following results for a and b as a function of
the latent variable scores on M and Y:
$$
\begin{cases}
a = \dfrac{M - \zeta_M}{X} \\[8pt]
b = \dfrac{Y - \zeta_Y - c'X}{M}
\end{cases}.
\tag{6}
$$
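Solving Equation 5 for $a^*$ and $b^*$ as a function of $C_M$ and $C_Y$, we obtain the following:
$$
\begin{cases}
a^* = \dfrac{\lambda^S_m (C_M - \varepsilon_m) - \zeta_M}{X} \\[8pt]
b^* = \dfrac{\lambda^S_m}{\lambda^S_y} \left( \dfrac{\lambda^S_y (C_Y - \varepsilon_y) - \zeta_Y - c'X}{M} \right)
\end{cases},
\tag{7}
$$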
where $a^*$ is used to represent the path between X and the composite mediating variable, $C_M$,
and $b^*$ represents the path between the composite mediator and outcome variables, $C_M$ and
$C_Y$.
Comparing $a$ and $a^*$ in Equations 6 and 7, it is obvious that the path, $a^*$, between the
independent variable, X, and the composite mediating variable ($C_M$) will be smaller than the
corresponding path, $a$, between X and the latent mediating variable (M) if $\lambda^S_m < 1$. And with more
measurement error, the loading, $\lambda^S_m$, will be smaller and thus $a^*$ will be comparatively less
than $a$. However, a comparison of $b^*$ with $b$ depends on the ratio of $\lambda^S_m$ to $\lambda^S_y$, although $b^*$ will
always be less than $b$.
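The attenuation described above can be illustrated with a small simulation. This is a hedged sketch rather than the authors' procedure: it uses single-level data, standardized variables, four indicators with a common loading, and a summed composite, and the function names, sample size, and seed are assumptions made for the illustration. Because a summed composite is on a different metric than the latent variable, the comparison is made on the standardized X-to-mediator path (the correlation).

```python
import numpy as np

def standardized_slope(x, y):
    """Standardized simple-regression slope, i.e., the Pearson correlation."""
    return np.corrcoef(x, y)[0, 1]

def simulate(loading, n=200_000, u=4, a=0.3, seed=1):
    """Return the standardized X->M path for the latent M and for its composite."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    # Latent mediator with unit variance: M = a*X + disturbance.
    m_latent = a * x + rng.normal(scale=np.sqrt(1 - a**2), size=n)
    # u unit-variance indicators sharing one standardized loading.
    errors = rng.normal(scale=np.sqrt(1 - loading**2), size=(n, u))
    indicators = loading * m_latent[:, None] + errors
    composite = indicators.sum(axis=1)          # C_M as the sum of indicators
    return standardized_slope(x, m_latent), standardized_slope(x, composite)

latent_08, composite_08 = simulate(0.8)   # less measurement error
latent_05, composite_05 = simulate(0.5)   # more measurement error
print(latent_08, composite_08, composite_05)
```

Under these assumptions the composite-based path is smaller than the latent path, and the shortfall grows as the loading drops from 0.8 to 0.5.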
One of the primary general benefits of structural equation modeling (SEM) is that it allows
modeling of measurement error in indicators of an underlying factor (Bollen, 1989), resulting
in more precise estimates of parameters describing relationships among factors and observed
variables (Hoyle & Kenny, 1999). This particular benefit of SEM also applies to its use for
mediation model analyses and applied researchers have adopted the use of SEM in single-
level mediation model analyses (see, e.g., Dunkley & Blankstein, 2000; Wei, Heppner, &
Mallinckrodt, 2003). The SEM framework has been extended to include modeling of the
clustering of Level 1 units (e.g., students) within Level 2 units (e.g., schools) through the
use of multilevel SEM (MLSEM). However, although applied researchers are increasingly
using MLSEM to handle measurement error and multilevel data structures’ complications,
MLSEM is still not being used for multilevel mediation model analyses. Preacher, Zyphur,
and Zhang (2010) proposed using the MLSEM framework for assessing multilevel mediated
effects because MLSEM provides a more flexible framework that can be used to analyze a larger
variety of designs. In addition, as the authors explained, use of MLSEM allows partitioning
of the variability in each variable into its Level 1 and Level 2 components, thereby removing
the potential conflation of relationships among Level 2 variables with those among Level 1
variables that hamper use of the conventional multilevel model (MM). Pituch and Stapleton
(2011) demonstrated the use of Mplus software for different multilevel mediation models with
observed variables. Preacher, Zhang, and Zyphur (2011) conducted a simulation study to assess
mediated effects’ parameter recovery using MLSEM estimation procedures. However, Preacher
et al.’s study assessed MLSEM estimation in scenarios with composite scores for M and Y.
Thus, methodological researchers are encouraging use of the MLSEM framework for mediation
analysis. However, no research has been conducted that has empirically assessed use
of the MLSEM framework for assessing multilevel mediated effects to handle endogenous
constructs that are measured with measurement error. Estimation of an MLSEM that includes
latent variables is more parameterized than the corresponding MM and it is unclear how well
the value of the true mediated effect will be recovered when using MLSEM. Thus, this study
was designed to assess parameter recovery for the mediated effect when the mediator and the
outcome variable are both latent factors with multiple indicators.
MULTILEVEL SEM
The statistical theory that underlies MLSEM has been discussed for decades (e.g., Hox, 1995;
McDonald & Goldstein, 1989; Muthén, 1989, 1990, 1994; Muthén & Satorra, 1995); however,
procedures for estimating MLSEM models have become available only more recently. MLSEM
allows modeling of interrelationships among latent and observed variables while appropriately
handling the dependencies originating in the multilevel structure of the data set. MLSEM takes
into account both between-cluster (cluster-level) variability and within-cluster (individual-level)
variability.
As with the conventional SEM model, MLSEM is founded on a measurement model and a
structural model. The measurement model consists of the pattern of the relationships between
each factor and its manifest indicators. Fit of the measurement or multilevel confirmatory
factor analysis (MLCFA) model can be assessed. The structural model describes the pattern of
relationships among factors and observed variables.
As with the conventional MM, the intraclass correlation, $\rho_{ICC}$, provides a measure of the
degree of cluster-related dependence for each endogenous observed variable. For a multivariate
data set scenario, calculation of the jth variable’s $\rho_{ICC}$ in an MLSEM analysis can be expressed
as follows:
$$
\rho_{ICC_j} = \frac{d[\Sigma_B]_j}{d[\Sigma_B]_j + d[\Sigma_W]_j},
\tag{8}
$$
(Muthén, 1991) where $d[\Sigma_B]_j$ is the jth diagonal entry of the covariance matrix for the between-cluster
structure and $d[\Sigma_W]_j$ is the jth diagonal entry of the covariance matrix for the within-cluster
structure (i.e., the jth variance component at the respective levels). However, when
calculating the set of sample $\rho_{ICC}$s in the multivariate case, it is not as simple as substituting
sample variance components into Equation 8 directly, because the sample between- and within-cluster
covariance matrices ($S_B$ and $S_W$) are not unbiased estimators of the population between-
and within-cluster covariance matrices, $\Sigma_B$ and $\Sigma_W$, respectively. Once the unbiased estimators
are defined, the sample $\rho_{ICC}$s can then be computed by substituting the corresponding
components into Equation 8.
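Equation 8 amounts to an elementwise ratio of the diagonals of the two covariance matrices. The sketch below shows that computation in Python; the function name and the example matrices are illustrative assumptions, and the inputs are taken to be population (or already unbiased) between- and within-cluster covariance matrices, in line with the caution above about raw sample matrices.

```python
import numpy as np

def icc_per_variable(sigma_b, sigma_w):
    """Apply Equation 8 to each variable: between variance / total variance."""
    var_between = np.diag(sigma_b)
    var_within = np.diag(sigma_w)
    return var_between / (var_between + var_within)

# Hypothetical two-variable example with unit total variances.
sigma_b = np.array([[0.05, 0.01],
                    [0.01, 0.15]])
sigma_w = np.array([[0.95, 0.10],
                    [0.10, 0.85]])
icc = icc_per_variable(sigma_b, sigma_w)
print(icc)
```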
Measurement Model
Estimation of an MLCFA model starts with partitioning of the total covariance matrix into two
components:
$$
\Sigma_T = \Sigma_B + \Sigma_W,
\tag{9}
$$
where $\Sigma_B$ and $\Sigma_W$ are the population between- and within-cluster covariance matrices, respectively,
and $\Sigma_T$ is the population total covariance matrix. Using matrix notation, for an MLCFA
model with a set of p indicators regressed on r latent variables at the between-cluster level and
q indicators regressed on s latent variables at the within-cluster level, the covariance structure
models at both the between- and within-cluster levels can be expressed using the following
respective equations:
$$
\Sigma_B = \Lambda_B \Psi_B \Lambda_B' + \Theta_B,
\tag{10}
$$
and
$$
\Sigma_W = \Lambda_W \Psi_W \Lambda_W' + \Theta_W,
\tag{11}
$$
where $\Lambda_B$ and $\Lambda_W$ are matrices of dimensions $(p \times r)$ and $(q \times s)$, respectively, whose elements
consist of the factor loadings (i.e., $\lambda$s); $\Psi_B$ and $\Psi_W$ are the $r \times r$ and $s \times s$ estimated covariance
matrices for the latent factors at the between- and within-cluster levels, respectively; and $\Theta_B$
and $\Theta_W$ are the $p \times p$ and $q \times q$ estimated covariance matrices of the measurement errors, $\varepsilon$,
at the between- and within-cluster levels, respectively.
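The covariance structure in Equations 10 and 11 can be computed directly once the loading, factor covariance, and error covariance matrices are given. The following sketch evaluates the between-cluster structure for a hypothetical one-factor model with four standardized indicators; the loading of 0.8, the unit factor variance, and the function name are assumptions chosen for illustration only.

```python
import numpy as np

def implied_covariance(loadings, factor_cov, error_cov):
    """Model-implied covariance: Lambda @ Psi @ Lambda' + Theta."""
    return loadings @ factor_cov @ loadings.T + error_cov

loading = 0.8                                   # illustrative standardized loading
lambda_b = np.full((4, 1), loading)             # (p x r) with p = 4, r = 1
psi_b = np.array([[1.0]])                       # (r x r) factor variance
theta_b = np.eye(4) * (1.0 - loading**2)        # (p x p) diagonal error variances

sigma_b = implied_covariance(lambda_b, psi_b, theta_b)
print(sigma_b)
```

With these values each indicator has unit variance and every pair of indicators has covariance equal to the squared loading.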
Structural Model
Recall that the model-implied covariance matrix $\hat{\Sigma}$ for a conventional single-level SEM model
for a set of $l$ exogenous variables regressed on k exogenous latent variables and a set of m
endogenous variables regressed on n endogenous latent variables with both measurement and
structural models can be expressed as:
FIGURE 2 Multilevel structural equation model (MLSEM) path model of the upper level mediation model
with latent mediating and outcome variables.
where $\Lambda_{Y_W}$ is a $(v \times 1)$ vector, whose elements consist of the factor loadings (i.e., $\lambda_{(W)y}$s) for
the indicators (ys) of the within-cluster level factor, $Y_W$; $\eta_W$ is a $(1 \times 1)$ vector representing the

where $\Lambda_{Y_B}$ is a $[(u + v) \times 2]$ matrix, whose elements consist of the $\lambda_m$s and $\lambda_{(B)y}$ factor
loadings for each of the m and y indicators of their respective between-cluster level M and $Y_B$
factors; $\eta_B$ is a $(2 \times 1)$ vector of endogenous latent variables, and $\varepsilon_B$ is a $[(u + v) \times 1]$ vector
of measurement errors for each indicator of factors M and $Y_B$.
The 2 → 2 → 1 model indicates that the independent variable, X, and the latent mediating
variable, M, are only measured at the cluster level, and thus their effects on the latent dependent
variable, Y, are hypothesized only at the cluster level. That is, under the 2 → 2 → 1 model,
variables X, M, and Y are not assumed to have any structural relationship at the within-cluster
level. Therefore, the structural portion of a 2 → 2 → 1 model is only specified at the between-cluster
level and it can be represented as follows:
$$
\eta_B = \mathbf{B}\eta_B + \Gamma\xi + \zeta,
\tag{15}
$$
where $\eta_B$ is a $(2 \times 1)$ column vector containing the endogenous factors, $M_B$ and $Y_B$, and $\xi$ is a
$(1 \times 1)$ vector containing the independent variable, X; $\mathbf{B}$ is a $(2 \times 2)$ square matrix containing
structural coefficients between $M_B$ and $Y_B$; $\Gamma$ is a $(2 \times 1)$ vector of structural parameters from X
to $M_B$ and $Y_B$; and $\zeta$ is a $(2 \times 1)$ column vector of disturbances for the two endogenous factors,
$M_B$ and $Y_B$. Last, the between- and within-cluster $\Psi$ and $\Theta_\varepsilon$ matrices (i.e., $\Psi_B$, $\Psi_W$, $\Theta_B$, and
$\Theta_W$) between the latent mediating and dependent variables must also be specified.
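To make the structure of Equation 15 concrete, the sketch below fills in the coefficient matrices for the upper level mediation model and solves for the reduced form, $(\mathbf{I} - \mathbf{B})^{-1}\Gamma$. The variable names and the numeric values (0.3, 0.3, and 0.1) are illustrative assumptions; the mediated effect is the product of the X-to-M and M-to-Y paths, and the reduced-form entry for Y is the direct effect plus that product.

```python
import numpy as np

# Illustrative structural coefficients: a, b, and the direct effect c'.
gamma_mx, beta_ym, gamma_yx = 0.3, 0.3, 0.1

# B holds paths among the endogenous factors (rows/columns ordered M_B, Y_B).
B = np.array([[0.0,     0.0],
              [beta_ym, 0.0]])
# Gamma holds paths from X to M_B and Y_B.
Gamma = np.array([[gamma_mx],
                  [gamma_yx]])

# Reduced form: total effects of X on each endogenous factor.
reduced_form = np.linalg.solve(np.eye(2) - B, Gamma)

indirect_effect = gamma_mx * beta_ym        # mediated effect, ab
total_effect_on_y = reduced_form[1, 0]      # c' + ab
print(indirect_effect, total_effect_on_y)
```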
Several specialized statistical software packages, including, for example, LISREL and Mplus,
can be used to estimate the MLSEM mediation model. The estimate of the resulting mediated
effect can then be calculated as the product of the relevant two parameter estimates, $\hat{\gamma}_{MX}$
and $\hat{\beta}_{YM}$ (see Figure 2). Researchers are usually also interested in testing the statistical
significance of the mediated effect estimate. A variety of methods for testing mediated effects
have been suggested, including the four causal steps strategy (Baron & Kenny, 1986), the
joint significance test (Cohen & Cohen, 1983), the traditional Z-test statistic (MacKinnon,
Lockwood, & Williams, 2004), the Empirical-M test of the mediated effect (Meeker, Cornwell,
& Aroian, 1981; MacKinnon, Fritz, Williams, & Lockwood, 2007), and bootstrapping-based
tests. Among these methods for testing the significance of the mediated effect, the Empirical-M
test of mediation and the bias-corrected bootstrap method have been found to perform best,
exhibiting strong statistical power while maintaining reasonable Type I error rates. The bias-
corrected bootstrap method has been found to outperform the Empirical-M method with slightly
more power with differences appearing only at the second decimal place and diminishing for
larger total sample sizes (MacKinnon et al., 2004). Thus, given the ease with which applied
researchers can use the Empirical-M test (by using MacKinnon et al.’s PRODCLIN software),
this study only investigates use of the Empirical-M test for testing the statistical significance
of the estimated mediated effect. Future research could extend this investigation by assessing
the functioning of bootstrapping procedures in similar contexts.
PRODCLIN then uses numerical integration to provide the critical values necessary for the
asymmetric confidence interval estimate of ab.
Research has supported use of the Empirical-M test of the mediated effect with conventional
MM estimates of a and b assuming perfectly reliable measures (e.g., see Pituch, Stapleton,
& Kang, 2006; Pituch, Whittaker, & Stapleton, 2005). And in the context of the MLSEM
formulation of the 2 → 2 → 1 model, the estimates of the a and b parameters correspond to
the $\hat{\gamma}_{MX}$ and $\hat{\beta}_{YM}$ paths, respectively (see Figure 2). Thus, values for $\hat{\gamma}_{MX}$ and $\hat{\beta}_{YM}$ as well
as their standard error estimates can be entered into PRODCLIN to obtain the asymmetric
confidence interval estimate of ab. However, performance of the Empirical-M test has not been
evaluated with MLSEM estimates of the ab mediated effect.
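The asymmetry of a confidence interval for a product of two estimates can be seen with a simple approximation. The sketch below is not PRODCLIN and does not use numerical integration: it swaps in a Monte Carlo approximation that draws the two path estimates from independent normal distributions and takes percentiles of their product. The function name, the number of draws, the seed, and the example estimates and standard errors are all assumptions made for illustration.

```python
import numpy as np

def monte_carlo_product_ci(a_hat, se_a, b_hat, se_b,
                           level=0.95, draws=200_000, seed=0):
    """Percentile interval for a*b from independent normal draws of a and b."""
    rng = np.random.default_rng(seed)
    a = rng.normal(a_hat, se_a, size=draws)
    b = rng.normal(b_hat, se_b, size=draws)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(a * b, [tail, 1.0 - tail])
    return lower, upper

# Hypothetical estimates and standard errors for the two paths.
lower, upper = monte_carlo_product_ci(0.3, 0.1, 0.3, 0.1)
print(lower, upper)
```

In this example the upper limit lies farther from the point estimate of 0.09 than the lower limit does, which is the kind of asymmetry that a symmetric normal-theory interval cannot reflect.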
In summary, although the basic mediation model has already been extended to permit
modeling of multilevel data structures, research has only assessed estimation of mediated effects
under the assumption of perfectly reliable variables. This study is intended to assess use of
the MLSEM framework for handling the complexities introduced by imperfectly measured
mediating and outcome variables and multilevel data structures. It is as yet unclear what
minimum sample sizes might be needed to provide accurate and precise tests of the mediated
effect when using MLSEM estimates. With multilevel data, sample size is affected by both
the number of Level 2 units as well as the number of Level 1 units per Level 2 unit. It
is also unclear how other important design factors might impact estimation of the mediated
effect under MLSEM. These other conditions include the degree of cluster-level dependence
as operationalized using the $\rho_{ICC}$, and the true magnitude of the mediated effect, both of which
have been found to affect recovery of the mediated effect parameter (Krull & MacKinnon, 1999,
2001; Pituch et al., 2005). And last, use of the MLSEM rather than the MM permits appropriate
modeling of measurement error associated with the mediating and outcome variables. Thus,
this study will also manipulate the degree of measurement error in the mediating variable as
a design condition. The primary goal of this study was then to assess how well use of the
MLSEM techniques works in terms of estimating the mediated effect when both the mediating
and outcome variables are latent factors with multiple indicators and different degrees of
measurement error. And although use of the MLSEM has been suggested very recently for
multilevel mediation models (e.g., Pituch & Stapleton, 2011; Preacher et al., 2010), applied
researchers are still using the conventional MM. Thus, this study compared recovery of the
mediated effect when using the MLSEM as opposed to its recovery when using the traditional MM.
METHOD
Data were generated to fit an upper level 2 → 2 → 1 mediation model mimicking a cluster
randomized trial with Level 2 units randomly assigned to intervention. A dichotomous independent
variable, X, was used to represent treatment versus control group membership. The
mediator, M, was a continuous, cluster-level latent variable measured by four ($u = 4$) indicators.
The outcome variable, Y, was also a latent variable measured by four ($v = 4$) indicators and
measured at the individual level.
The simulation study was designed to assess use of MLSEM model estimation for recovering
the mediated effect under a variety of manipulated conditions. The manipulated conditions in
the simulation study included the following: $\rho_{ICC}$; number of clusters, G; within-cluster sample
size, $n_j$; degree of measurement error, $\lambda^S_m$, in measures of the latent mediating variable; and
true values of the a and b parameters whose product provides the mediated effect.
Simulated Conditions
$\rho_{ICC}$. Values of the $\rho_{ICC}$ ($\rho_{ICC} = 0.05$ and $\rho_{ICC} = 0.15$) were manipulated based on values
found from a search of methodological research studies investigating multilevel mediation
(Krull & MacKinnon, 1999, 2001; Pituch et al., 2006) and general MM studies (Julian, 2001;
Maas & Hox, 2005; Meyers & Beretvas, 2006).
Number of clusters. Several researchers (Hox & Maas, 2001; Kreft & de Leeuw, 1998;
Maas & Hox, 2005; Van der Leeden, Busing, & Meijer, 1997) have made recommendations
about the optimal minimal number of clusters, G, to use when estimating MMs. High proportions
of nonconvergent cases, and biased parameter and standard error estimates have been
associated with small numbers of clusters in previous simulation studies (Hox & Maas, 2001;
Julian, 2001; Stapleton, 2002; Ryu & West, 2009). Hox and Maas (2001) conducted a simulation
study that found that MLCFA Level 2 parameter estimates were not substantially biased when
the number of clusters was as low as 100. However, it can be difficult to obtain data from a
large number of clusters and many applied studies have used MLSEM with smaller numbers
of clusters. In addition, values of G used in previous simulation studies of multilevel mediation
analysis have mostly ranged from 10 to 200 (Bauer, Preacher, & Gil, 2006; Kenny, Korchmaros,
& Bolger, 2003; Krull & MacKinnon, 1999, 2001; Pituch et al., 2006; Pituch et al., 2005).
Therefore, to assess the impact of G in multilevel mediation analysis using MLSEM and to
test the sample size limits when estimating an MLSEM, three levels of G were considered (20,
40, and 80) in this study. For all levels of G, half of the clusters were randomly assigned to
the experimental and half to the control group.
Within-cluster sample size. For simplicity’s sake, this study only considered balanced
designs with the same within-cluster sample size, $n_j$, across clusters. Thus, the total sample size,
N, equals $G n_j$. Choice of the values for $n_j$ of 20 and 40 was based on previous methodological
research findings (Hox & Maas, 2001; Julian, 2001; Pfeffermann, Skinner, Holmes, Goldstein,
& Rasbash, 1998; Preacher et al., 2011; Stapleton, 2002) concerning minimum within-cluster
sample sizes.
Degree of measurement error for M. Measurement error for indicators of the cluster-level
mediator, M, is a function of the factor’s standardized loading values, the variance of
M, and the variance of each indicator of M (see Figure 2). Thus, to manipulate the degree
of measurement error as a design condition, two different standardized loading ($\lambda^S_m$) values
were used. Because the standardized between-cluster factor loading values found in previous
applied and simulation MLCFA and MLSEM research typically vary from 0.4 to 0.8 (e.g., Hox
& Maas, 2001; Peugh & Enders, 2010; Rowe, 2003; Yuan & Bentler, 2002; Yuan & Hayashi,
2005), the two standardized loading values were chosen to be 0.5 and 0.8 to represent large and
small degrees of measurement error in M, respectively. Note that scores on the indicators of
the outcome latent variable, Y, were also generated as imperfectly reliable although using only
a single standardized loading value of 0.8 at both the within- and between-cluster levels.
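The link between a standardized loading and the amount of measurement error can be made explicit. The sketch below assumes unit-variance indicators and factor, uncorrelated errors, and four equally loaded indicators; under those assumptions each indicator's error variance is one minus the squared loading, and the reliability of the summed composite follows from the ratio of true to total composite variance. The function names are hypothetical, and these reliabilities are derived here for illustration rather than reported in the study.

```python
def error_variance(loading):
    """Error variance of a unit-variance indicator with a standardized loading."""
    return 1.0 - loading**2

def composite_reliability(loading, n_indicators=4):
    """Reliability of a summed composite of equally loaded indicators."""
    true_variance = (n_indicators * loading) ** 2
    error_total = n_indicators * error_variance(loading)
    return true_variance / (true_variance + error_total)

for loading in (0.5, 0.8):
    print(loading, error_variance(loading), composite_reliability(loading))
```

Under these assumptions, a loading of 0.5 leaves three quarters of each indicator's variance as error, whereas a loading of 0.8 leaves 36%.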
True values of a and b. The mediated effect assessed in this study was calculated using
ab. Conditions in which there was no mediated effect (i.e., where $ab = 0$) were examined, as
well as conditions with a small-size mediated effect. Three null conditions were examined (i.e.,
$ab = 0$), in which the following combinations of true a and b values were used: ($a = 0$, $b = 0$),
($a = 0$, $b = 0.3$), and ($a = 0.3$, $b = 0$). (Remember that in MLSEM notation, $a = \gamma_{MX}$ and
$b = \beta_{YM}$ in Figure 2.) These a and b values were chosen because they are among those that have
been investigated in previous simulation studies of multilevel mediation analysis (e.g., Krull
& MacKinnon, 1999; Pituch et al., 2006; Pituch et al., 2005). Previous studies (MacKinnon
et al., 2004; Pituch et al., 2005) have found differences in results when both a and b are zero
(leading to no mediated effect) versus conditions in which either a or b was zero.
Three nonnull conditions (here, $ab = 0.09$) were generated using the following combinations
of values: ($a = 0.3$, $b = 0.3$), ($a = 0.2$, $b = 0.45$), and ($a = 0.45$, $b = 0.2$). The true value
of 0.09 for the mediated effect is similar to values investigated in several previous multilevel
mediation simulation studies (Krull & MacKinnon, 1999, 2001; Pituch et al., 2006; Pituch
et al., 2005; Preacher et al., 2011). Finally, the direct effect of X on Y was generated with a
constant value of 0.1 across conditions. This value matched the value
for $c'$ used in previous related studies (Krull & MacKinnon, 1999, 2001; Pituch et al., 2006;
Pituch et al., 2005; Preacher et al., 2011).
Data Generation
SAS (Version 9.2) was used to generate all data sets designed to fit the model depicted in
Figure 2 with four indicators for each of the mediating and outcome factors (i.e., $u = v = 4$).
Values on the cluster-level dichotomous independent variable, X, were sampled from a binomial
distribution with a mean of 0.5. Values for each of the other observed and latent variables were
sampled from normal distributions with means of zero and condition-specific covariance matrix
element values.
SAS PROC IML was used for generating data. Data were generated for all possible combinations
of conditions yielding a total number of 144 combinations of conditions. For each
combination of conditions, 1,000 replication data sets were generated. The multilevel mediation
models in Figures 1 and 2 were estimated assuming a conventional MM and MLSEM model,
respectively. For the MM analysis, total scores on each set of four indicators of M and of Y
were used as the observed M and Y variables (corresponding to $C_M$ and $C_Y$, respectively) in
Figure 1.
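The count of 144 design cells follows from fully crossing the manipulated factors described in the Simulated Conditions section. The short Python sketch below enumerates that grid; it is only a bookkeeping illustration (the study itself generated data in SAS), and the variable names are assumptions.

```python
from itertools import product

icc_values = (0.05, 0.15)
numbers_of_clusters = (20, 40, 80)            # G
cluster_sizes = (20, 40)                      # n_j
mediator_loadings = (0.5, 0.8)                # standardized loadings for M
ab_pairs = ((0.0, 0.0), (0.0, 0.3), (0.3, 0.0),      # null conditions
            (0.3, 0.3), (0.2, 0.45), (0.45, 0.2))    # nonnull conditions

conditions = list(product(icc_values, numbers_of_clusters, cluster_sizes,
                          mediator_loadings, ab_pairs))

# Total sample size N = G * n_j for each cell of the design.
total_sample_sizes = sorted({g * n for _, g, n, _, _ in conditions})
print(len(conditions), total_sample_sizes)
```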
Evaluation Criteria
Both the MM and MLSEM mediation models were estimated using Mplus software’s (Muthén
& Muthén, 2007) MLR estimation procedure, which provides maximum likelihood parameter
estimates and robust standard errors. Convergence rates for the first 1,000 data sets were
tallied. Relative parameter bias (RPB; Hoogland & Boomsma, 1998) and relative standard
error bias (RSEB) estimates for the ab mediated effect for 1,000 converged solutions for each
combination of conditions were compared across conditions and across the two estimating
models. A maximum value of 0.05 for RPB and 0.10 for RSEB was assumed to constitute
acceptable degrees of bias.
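The two bias measures are not written out in this section, so the sketch below uses definitions that are common in simulation research and should be read as an assumption: relative parameter bias as the mean estimate minus the true value, divided by the true value, and relative standard error bias as the mean estimated standard error minus the empirical standard deviation of the estimates, divided by that standard deviation. The function names and example numbers are hypothetical. Note that the first measure is undefined when the true value is zero.

```python
import numpy as np

def relative_parameter_bias(estimates, true_value):
    """(mean estimate - true value) / true value; undefined if true_value == 0."""
    return (np.mean(estimates) - true_value) / true_value

def relative_se_bias(se_estimates, estimates):
    """(mean estimated SE - empirical SD of estimates) / empirical SD."""
    empirical_sd = np.std(estimates, ddof=1)
    return (np.mean(se_estimates) - empirical_sd) / empirical_sd

# Hypothetical replication results for a true mediated effect of 0.09.
ab_estimates = np.array([0.08, 0.10, 0.12])
ab_standard_errors = np.array([0.022, 0.022, 0.022])

rpb = relative_parameter_bias(ab_estimates, 0.09)
rseb = relative_se_bias(ab_standard_errors, ab_estimates)
print(rpb, rseb)
```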
In addition, PRODCLIN software (MacKinnon et al., 2007) was used to estimate the 95%
confidence interval limits of the mediated effect. To assess the accuracy of the asymmetric
confidence interval limits of the mediated effect obtained using PRODCLIN, the proportion
of replications in which each confidence interval estimate contained the true mediated effect’s
value was tallied. Last, the proportions of replications for the $ab = 0.09$ conditions in which
the confidence interval estimates did not contain zero were tallied.
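The two tallies described above reduce to simple proportions over replications. The sketch below shows one way to compute them in Python from arrays of lower and upper confidence limits; the function names and the four example intervals are hypothetical and are used only to show the bookkeeping.

```python
import numpy as np

def coverage_rate(lower, upper, true_value):
    """Proportion of intervals that contain the true mediated effect."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    return np.mean((lower <= true_value) & (true_value <= upper))

def rejection_rate(lower, upper):
    """Proportion of intervals that exclude zero (power when ab is nonzero)."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    return np.mean((lower > 0) | (upper < 0))

# Hypothetical limits from four replications with a true effect of 0.09.
lower_limits = [0.01, -0.02, 0.03, 0.10]
upper_limits = [0.20, 0.15, 0.25, 0.30]

coverage = coverage_rate(lower_limits, upper_limits, 0.09)
power = rejection_rate(lower_limits, upper_limits)
print(coverage, power)
```

Applied to conditions in which the true mediated effect is zero, the same exclusion rate would estimate the Type I error rate rather than power.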
RESULTS
Inadmissible Solutions
Across all conditions, almost all MM solutions were admissible. However, estimation of
the MLSEM mediation model was far more problematic, leading to a much higher rate of
inadmissible solutions (see Tables 1 and 2). The proportion of inadmissible cases under the
MLSEM was mostly influenced by the number of clusters, G, and the value of the true ρ_ICC.
More specifically, the proportion of inadmissible solutions decreased as the number of clusters
increased. The overall average percentages of inadmissible solutions for G = 20, 40, and
80 conditions were 19.7%, 6.7%, and 2.5%, respectively. The average percentages for the
ρ_ICC = 0.05 and ρ_ICC = 0.15 conditions were 14.0% and 5.3%, respectively. The larger the
TABLE 1
Proportion of Inadmissible Cases for Each of the Three ab = 0 Conditions by
Estimating Model and Condition
Note. MLSEM = multilevel structural equation model; MM = conventional multilevel mediation model.
true ρ_ICC value, the more admissible solutions there were. The degree of measurement error in
the indicators of the latent mediating variable also influenced the proportion of inadmissible
solutions such that for conditions with more measurement error (λ^S_mi = 0.5), the average
percentage was 12.1% and when there was less measurement error (λ^S_mi = 0.8), the average was
7.2%. The within-cluster sample size had a slightly lesser effect such that the average percentage
of the nonconvergent cases decreased as n_j increased, where the average percentages were
10.9% and 8.5% for n_j = 20 and n_j = 40, respectively.
TABLE 2
Proportion of Inadmissible Cases for the Three ab = 0.09 Conditions by Estimating Model and Condition
Note. MLSEM = multilevel structural equation model; MM = conventional multilevel mediation model.
measurement error. Across conditions, the average RPB found in λ^S_mi = 0.8 conditions was
19.8% and in λ^S_mi = 0.5 conditions, the mean RPB was 49.8%.
Substantial bias was found in a much smaller subset of conditions for the MLSEM estimates
(than for the MM estimates) and the degree of bias was consistently less than that found for
the MM estimates. When substantial bias was found it was mostly positive. The pattern of bias
noted for the MLSEM estimates appeared to depend on several factors. Bias was found in more
conditions in which the value of a was greater than b (a = 0.45, b = 0.2) as compared with
conditions in which a was less than or equal to b (i.e., for [a = 0.2, b = 0.45] and [a = 0.3,
b = 0.3], respectively). No substantial bias was found in the (a = 0.3, b = 0.3) conditions
with the larger ρ_ICC value and similarly, less bias was found in the smaller ρ_ICC conditions for
the other pairs of true a and b value conditions. Similarly across all conditions, no substantial
bias was found when the data sets entailed 80 Level 2 units.
TABLE 3
Relative Parameter Bias for the ab = 0.09 Conditions by Model and Condition
Note. Boldface and italicized values indicate substantial relative parameter bias (Hoogland & Boomsma, 1998).
MLSEM = multilevel structural equation model; MM = conventional multilevel mediation model.
(see Table 4). This bias was worst for the (a = 0, b = 0) conditions for which almost all
conditions led to substantial RSEB for both MM and MLSEM estimates. Substantial RSEB
was identified for MM SE estimates in only 1 of 24 (a = 0, b = 0.3) and 4 of the 24 (a = 0.3,
b = 0) conditions. For MM SE estimates in the (a = 0, b = 0) conditions, more positive
RSEB was found in data sets with the largest number of Level 2 units (G = 80) and for data
sets with a smaller n_j (specifically, when n_j = 20 the average RSEB was 15.3% rather than
13.2% for n_j = 40).
The magnitude of the RSEB was consistently larger for the MLSEM than the MM estimates
across ab = 0 conditions. And the degree of bias was worse for smaller n_j than larger n_j
conditions and in ab = 0 conditions with a smaller ρ_ICC as compared with the larger ρ_ICC
conditions. When only one of the a or b parameters was zero and ρ_ICC was 0.15 rather than
0.05, then substantial RSEB for the MLSEM SE estimates was only found in 1 or 2 of the 12
conditions (see Table 4).
In conditions in which ab ≠ 0, far fewer conditions led to substantial RSEB for both MM
and MLSEM estimates. Almost all of the substantial RSEB in MLSEM estimates was found
TABLE 4
Relative Standard Error Bias for the ab = 0 Conditions by Model and Condition
Note. Boldface and italicized values indicate substantial relative standard error bias (Hoogland & Boomsma,
1998). MLSEM = multilevel structural equation model; MM = conventional multilevel mediation model.
for the smallest G value. And although most of this substantial RSEB was positive, a couple of
RSEB estimates were negative (see Table 5). However, the substantial RSEB found in the MM
estimates was quite consistently negative. The (a = 0.45, b = 0.2) conditions were associated
with more frequent substantial RSEB for MM estimates and only when G was 20 or 40.
TABLE 5
Relative Standard Error Bias for the ab = 0.09 Conditions by Model and Condition
Note. Boldface and italicized values indicate substantial relative standard error bias (Hoogland & Boomsma,
1998). MLSEM = multilevel structural equation model; MM = conventional multilevel mediation model.
estimates exhibited better coverage rates (89.6% overall) than the MM estimates (82.3%
overall). For both models’ estimates, the number of Level 2 clusters was positively related to
the percentage of coverage. No other factor appeared to strongly influence the coverage rates.
Power
Because applied researchers typically test the statistical significance of the mediated effect
(i.e., against a null hypothesized value), the proportions of replications in which the confidence
interval estimates did not contain zero were tallied for the ab = 0.09 conditions (see Table 8).
These results equate to a two-tailed power analysis of the MM and MLSEM estimates. Power
was slightly higher for MM (18.0% overall) than for MLSEM estimates (15.6% overall)
although the rejection rates for both models’ estimates were quite low for a true ab value
of 0.09. Both n_j and G were positively related to power with the number of clusters having a
stronger effect than the number of individuals per cluster. In addition, higher power was found
in conditions with a more reliable mediating variable (i.e., for λ^S_m = 0.8 vs. λ^S_m = 0.5) for both
MM and MLSEM estimates. Last, for the (a = 0.3, b = 0.3) conditions slightly more power
TABLE 6
Confidence Interval Coverage for the ab = 0 Condition by Condition and Model
Note. MLSEM = multilevel structural equation model; MM = conventional multilevel mediation model.
was found in ρ_ICC = 0.05 than in ρ_ICC = 0.15 conditions for MM (20.2% vs. 16.6%) and for
MLSEM (18.5% vs. 15.5%) estimates. However, the reverse pattern was seen for the (a = 0.2,
b = 0.45) and (a = 0.45, b = 0.2) conditions where the overall MM estimates were 14.1%
versus 21.5%, and the overall MLSEM estimates were 10.1% and 19.7% for the ρ_ICC = 0.05
and ρ_ICC = 0.15 conditions, respectively.
DISCUSSION
When using the MLSEM framework to assess a multilevel mediation model, the mediated effect
of X on the latent variable Y through the latent variable M can be modeled. When using the MM,
however, the mediated effect of X on the observed variable Y through the observed variable M is
modeled. If Y and M are perfectly reliable, and assuming that the confounds noted in Preacher
et al. (2011) are handled, the mediated effect estimated assuming the MLSEM versus the MM
framework should not differ (beyond estimation procedure differences). However, if either or
both of Y and M are not perfectly reliable, then the MLSEM and MM mediated effect estimates
TABLE 7
Confidence Interval Coverage for the ab = 0.09 Condition by Condition and Model
Note. MLSEM = multilevel structural equation model; MM = conventional multilevel mediation model.
are not expected to be the same (see Equations 6 and 7). In fact, the MLSEM estimates of ab
are expected to be more accurate, provided that the model’s parameter estimates are well recovered.
In addition, the less reliable the Y and M scores, the more accurate the MLSEM estimates
should be when compared with the MM estimates of ab. However, the caution associated with
this expectation is that many more parameters are estimated as part of the MLSEM versus
a corresponding MM model and thus it was also expected that larger sample sizes would be
required for reasonably performing MLSEM model estimation.
Given what was derived in Equations 6 versus 7, underestimation of the ab effect would be
expected under the MM (Hoyle & Kenny, 1999) when imperfectly reliable composite CM and
CY variables are used in the model. And the underestimation bias in the MM estimates would
be expected to be worse in conditions with more measurement error. These expectations were
validated by the substantially negative RPB results for MM estimates of ab and matched
the pattern of negative bias found in MM estimates by Preacher et al. (2011). However,
note Preacher et al.’s study generated perfectly reliable CM and CY variables. In this study,
measurement error in the M variable was generated and the degree of the negative bias in MM
estimates of ab depended primarily on the degree of measurement error in M.
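The attenuation argument can be illustrated with a small simulation. The sketch below is a deliberately simplified, single-level analogue (not the multilevel design or the four-indicator composites used in this study): X affects a latent M, M affects Y, and the analyst observes M with added random error. The reliabilities of 0.5 and 0.8, the sample size, and all names are illustrative assumptions; they are not the standardized loadings manipulated in the study.

```python
import numpy as np

def simulate_ab(reliability_m, a=0.3, b=0.3, n=200_000, seed=0):
    """Estimate ab by OLS when M is observed with the given reliability.

    Simplified single-level sketch: X -> M -> Y with no direct effect,
    Var(M) = 1, and classical (independent, additive) measurement error in M.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    m_true = a * x + rng.normal(0.0, np.sqrt(1 - a**2), n)
    y = b * m_true + rng.normal(0.0, np.sqrt(1 - b**2), n)

    if reliability_m < 1.0:
        # Error variance implied by reliability = Var(true) / Var(observed).
        error_var = (1.0 - reliability_m) / reliability_m
        m_obs = m_true + rng.normal(0.0, np.sqrt(error_var), n)
    else:
        m_obs = m_true

    # a path: regress observed M on X.
    a_hat = np.polyfit(x, m_obs, 1)[0]
    # b path: regress Y on observed M, controlling for X.
    design = np.column_stack([np.ones(n), m_obs, x])
    b_hat = np.linalg.lstsq(design, y, rcond=None)[0][1]
    return float(a_hat * b_hat)

ab_perfect = simulate_ab(1.0)   # M measured without error
ab_high = simulate_ab(0.8)      # more reliable M
ab_low = simulate_ab(0.5)       # less reliable M
```

Under these assumptions the estimated product shrinks toward zero as the reliability of M falls, which is the direction of bias described for the MM estimates. In this unstandardized sketch the shrinkage enters through the b path; a standardized analysis would also show attenuation in a.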
TABLE 8
Rejection Rates (Power) for the ab = 0.09 Condition by Condition and Model
Note. MLSEM = multilevel structural equation model; MM = conventional multilevel mediation model.
Additional analyses were run to assess individual parameter bias in each of the two com-
ponents (a and b) that constitute ab. The degree of bias for MM estimates of b was greater
than the degree of the bias found in MM estimates of a. This matches what would be expected
based on Equation 7. Only the unreliability of M contributes to bias in the estimation of a; however, the
degree of underestimation in b is aggravated not only by λ_m but also by λ_y.
Although MLSEM estimates were expected to better recover the true ab effect given the
direct correspondence between the estimating and generating MLSEM models, it was unclear
what sort of minimum sample sizes might be required for decent estimates. The true mediated
effect parameter values were consistently better recovered using the MLSEM than the MM
when the model’s estimation converged. However, serious convergence issues were encountered
when estimating the MLSEM with the smaller sample size (for G < 80) conditions. Preacher
et al. (2011) also found negligible convergence problems with smaller sample sizes when
estimating the MM for mediation with observed M and Y variables (using MLSEM estimation).
Similarly, the pattern of convergence issues encountered with the MLSEM mediation model
matched Hox and Maas’s (2001) results and, more generally, results from other MLSEM
estimation research (Julian, 2001; Stapleton, 2002; Ryu & West, 2009). Hox and Maas also
found that the smallest number of clusters (G = 50 in their study) was associated with the
largest percentage of inadmissible solutions and that the lower the ρ_ICC, the higher the
inadmissible solutions’ rate. In his single-level SEM estimation study, Boomsma (1983) found
that SEM estimation for samples with a total N of less than 100 tended to result in higher
nonconvergence rates. To test the sample size limits in this study, several of the conditions
examined here entailed values for total N of less than 100. This study’s results all strongly
indicate that the MLSEM should be used only with data sets involving an absolute minimum
of 80 clusters and only when estimating the simplest MLSEM mediation model (matching that
investigated here).
Standard error bias was quite consistently worse for MLSEM than for MM estimates of
Sobel’s σ_ab (see Equation 16). MLSEM estimates of σ_ab were overestimated across most of
the (a = 0, b = 0) conditions and less so in the other combinations of true a and b values. SE
bias was improved in conditions with larger sample sizes (as a function of both G and of n_j)
for both MM and MLSEM estimates and in the ab ≠ 0 conditions. MLSEM estimates of the
standard error of âb̂ were also improved in conditions with the larger ρ_ICC value.
Parameter recovery was assessed using PRODCLIN and the MLSEM and MM estimates
of ab. Coverage rates were overall quite high (see Tables 6 and 7). For conditions in which
ab = 0, the confidence intervals’ coverage rates provide a form of two-tailed Type I error
rate assessment. The results then indicated that both MLSEM and MM estimates performed
somewhat conservatively, with many conditions leading to coverage rates exceeding the nominal
95% level (i.e., Type I error rates below the nominal alpha level of 5%). The conservative rates increased for larger values of G and of ρ_ICC. MM
estimates were slightly more conservative than MLSEM estimates.
Despite the slightly conservative coverage rates for the set of three ab = 0 conditions,
coverage rates were quite high for the ab = 0.09 conditions (see Table 7). For the ab = 0.09
conditions, coverage rates for MLSEM estimates were better than for the MM. This matches
the expectation that modeling of measurement error using latent variables in the MLSEM
framework will lead to more accurate estimation of the ab parameter (see Equations 6 and 7).
Power was found to be quite low across conditions (and estimates). The nonzero ab value that
was generated closely corresponds with values used in other multilevel mediation model studies
(e.g., Krull & MacKinnon, 1999, 2001; Pituch et al., 2006; Pituch et al., 2005; Preacher et al.,
2011). The low power (ranging from 3.7% up to 43.6% across conditions and models estimated)
matches results found with similar conditions in previous research (Pituch & Stapleton, 2011;
Pituch et al., 2006; Preacher et al., 2011). However, no previous research has investigated power
in conditions with endogenous variables that were not perfectly reliable. In addition, although
Preacher et al. (2011) had investigated a model similar to the MM model assessed here by
using composite variables for M and Y, the researchers had used the Wald test rather than the
Empirical-M test when constructing confidence intervals for ab and looked at a 2 → 1 → 1
rather than a 2 → 2 → 1 design. However, despite the differences between the Preacher et al.
study and this study, both studies found that power was (as expected) positively related to
both Level 1 and Level 2 sample sizes. In addition, lower power was found with smaller ρ_ICC
values. This study also found (as would be expected from Equation 7) that the more reliable
were the indicators of M, the more power was found. This result held for both MM and
MLSEM estimates. Unfortunately, however, despite the enhanced accuracy of MLSEM versus
MM estimates of the nonzero ab, more power was found for MM than MLSEM Empirical-M
confidence interval estimates (see Table 8).
increases the number of parameters to be estimated. This means that larger sample sizes (in
terms of the number of Level 2 units) will be required to avoid convergence issues. Given
that the analyst does not necessarily have the option of obtaining a larger sample, alternative
methods for handling measurement error in multilevel mediation models should be explored.
Thus, future research could explore use of Spearman’s attenuation correction with the relevant
variables in a multilevel mediation model. Research conducted by Goldstein, Kounali, and
Robinson (2008) on including modeling of measurement error in the conventional MM could
prove useful for this research.
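For reference, Spearman's classical correction for attenuation divides an observed correlation by the square root of the product of the two variables' reliabilities. The sketch below shows only that textbook single-level formula; how best to apply it within a multilevel mediation model is the open question raised above, and the function name and numeric inputs are hypothetical.

```python
import math

def disattenuate(r_observed, reliability_x, reliability_y):
    """Spearman's correction: r_true = r_observed / sqrt(rel_x * rel_y)."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Hypothetical example: observed r = 0.30 with reliabilities of 0.80 and 0.90.
corrected = disattenuate(0.30, 0.80, 0.90)
```

With perfectly reliable measures the correction leaves the correlation unchanged, and with imprecise reliability estimates the corrected value can exceed 1, which is one reason the approach would need careful evaluation before being recommended.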
Another limitation in this study is that the degree of measurement error was only manipulated
for the cluster-level mediating variable, M. In reality, both M and the outcome variable, Y, could
have measurement error. In addition, the measurement error for Y could be found on either or
both levels of the data structure and to varying degrees. The degree of measurement error in Y
will also impact recovery of model parameters as it did for M (see Equation 6). Future research
could explore how various levels of measurement error in Y affect recovery of the mediated
effect parameter under both the MLSEM and the MM.
In addition, this study looked only at balanced per-cluster sample sizes. Real-world data
more typically do not consist of equal per-cluster sample sizes. Including additional values for
the manipulated design conditions would help to provide a stronger rationale and guidelines
for when to use the MLSEM for assessing multilevel mediated effects.
Conclusions
Despite the inaccuracies identified with estimating the ab effect when ignoring the unreliability
of the mediating variable’s indicators, use of composite scores with the conventional MM results
in more powerful detection of a nonzero ab effect than modeling the measurement error using
MLSEM. As already noted, this research is limited to the conditions examined here including
the smaller sample sizes that were specifically used to test the limits of estimating the more
parameterized MLSEM model. Future research should extend the current research assessing
MLSEM and MM estimates of latent mediating variable effects with different true values and
with larger cluster sizes to investigate whether MLSEM estimation performance improves. In
addition, this line of research could be extended to include different multilevel mediation model
designs and manipulation of measurement error in Y for larger sample sizes. Use of statewide or
large-scale assessment data might involve larger values for G and it would therefore prove useful
in that context to assess how well the mediated effect is recovered when measurement error
is modeled using both the MLSEM and other alternatives that have already been mentioned.
Unfortunately, based on this study’s results, modeling of measurement error using the MLSEM
cannot be recommended over the use of composite scores and the conventional MM for
multilevel mediation modeling in the majority of the conditions examined here.
REFERENCES
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research:
Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.
Bauer, D. J., Preacher, K. J., & Gil, K. M. (2006). Conceptualizing and testing random indirect effects and moderated
mediation in multilevel models: New procedures and recommendations. Psychological Methods, 11, 142–163.
Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: Wiley.
Bollen, K. A., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective.
Psychological Bulletin, 110, 305–314.
Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood estimation) against small sample size and
nonnormality. Amsterdam, The Netherlands: Sociometric Research Foundation.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.).
Hillsdale, NJ: Erlbaum.
Dunkley, D. M., & Blankstein, K. R. (2000). Self-critical perfectionism, coping, hassles, and current distress: A
structural equation modeling approach. Cognitive Therapy and Research, 24, 713–730.
Goldstein, H., Kounali, D., & Robinson, A. (2008). Modelling measurement errors and category misclassifications in
multilevel models. Statistical Modelling, 8, 243–261.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling: An overview and a
meta-analysis. Sociological Methods and Research, 26, 329–367.
Hox, J. J. (1995). Applied multilevel analysis. Amsterdam, The Netherlands: TT-Publikaties.
Hox, J. J., & Maas, C. (2001). The accuracy of multilevel structural equation modeling with pseudobalanced groups
and small samples. Structural Equation Modeling, 8, 157–174.
Hoyle, R. H., & Kenny, D. A. (1999). Sample size, reliability, and tests of statistical mediation. In R. H. Hoyle (Ed.),
Statistical strategies for small sample research (pp. 195–222). Thousand Oaks, CA: Sage.
Julian, M. W. (2001). The consequences of ignoring multilevel data structures in nonhierarchical covariance modeling.
Structural Equation Modeling, 8, 325–352.
Kenny, D. A., Korchmaros, J. D., & Bolger, N. (2003). Lower level mediation in multilevel models. Psychological
Methods, 8, 115–128.
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York, NY: Guilford.
Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. Newbury Park, CA: Sage.
Krull, J. L., & MacKinnon, D. P. (1999). Multilevel mediation modeling in group-based intervention studies. Evaluation
Review, 23, 144–158.
Krull, J. L., & MacKinnon, D. P. (2001). Multilevel modeling of individual and group level mediated effects.
Multivariate Behavioral Research, 36, 249–277.
Maas, C. J. M., & Hox, J. J. (2005). Sufficient sample size for multilevel modeling. Methodology: European Journal
of Research Methods for the Behavioral & Social Sciences, 1, 86–92.
MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. New York, NY: Erlbaum.
MacKinnon, D. P., Fritz, M. S., Williams, J., & Lockwood, C. M. (2007). Distribution of the product confidence limits
for the indirect effect: Program PRODCLIN. Behavior Research Methods, 39, 384–389.
MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of
the product and resampling methods. Multivariate Behavioral Research, 39, 99–128.
McDonald, R. P., & Goldstein, H. (1989). Balanced versus unbalanced designs for linear structural relations in two-level
data. British Journal of Mathematical and Statistical Psychology, 42, 215–232.
Meeker, W. Q., Cornwell, L. W., & Aroian, L. A. (1981). Selected tables in mathematical statistics: Vol. VII. The
product of two normally distributed random variables. Providence, RI: American Mathematical Society.
Meyers, J. L., & Beretvas, S. N. (2006). The impact of inappropriate modeling of cross classified data structures.
Multivariate Behavioral Research, 41, 473–497.
Muthén, B. O. (1989). Multiple-group structural modeling with non-normal continuous variables. British Journal of
Mathematical and Statistical Psychology, 42, 55–62.
Muthén, B. O. (1990, April). Multilevel structural equation modeling. Paper presented at the annual meeting of the
American Educational Research Association, Boston, MA.
Muthén, B. O. (1991). Multilevel factor analysis of class and student achievement components. Journal of Educational
Measurement, 28, 338–354.
Muthén, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods & Research, 22, 376–398.
Muthén, B. O., & Satorra, A. (1995). Complex sample data in structural equation modeling. In P. V. Marsden (Ed.),
Sociological methodology (pp. 267–316). Washington, DC: American Sociological Association.
Muthén, L. K., & Muthén, B. O. (2007). Mplus (Version 5) [Computer program]. Los Angeles, CA: Muthén & Muthén.
Peugh, J. L., & Enders, C. K. (2010). Specification searches in multilevel structural equation modeling: A Monte Carlo