Applications of Structural Equation Modeling (SEM) in Ecological Studies: An Updated Review
Abstract
Aims: This review was developed to introduce the essential components and variants of structural equation
modeling (SEM), synthesize the common issues in SEM applications, and share our views on SEM’s future in
ecological research.
Methods: We searched the Web of Science on SEM applications in ecological studies from 1999 through 2016 and
summarized the potential of SEMs, with a special focus on unexplored uses in ecology. We also analyzed and
discussed the common issues with SEM applications in previous publications and presented our view for its future
applications.
Results: We searched and found 146 relevant publications on SEM applications in ecological studies. We found that
five SEM variants had not commonly been applied in ecology, including the latent growth curve model, Bayesian
SEM, partial least squares SEM, hierarchical SEM, and variable/model selection. We identified ten common issues in
SEM applications including strength of causal assumption, specification of feedback loops, selection of models and
variables, identification of models, methods of estimation, explanation of latent variables, selection of fit indices,
report of results, estimation of sample size, and the fit of model.
Conclusions: In previous ecological studies, measurements of latent variables, explanations of model parameters,
and reports of key statistics were commonly overlooked, while several advanced uses of SEM had been ignored
overall. With the increasing availability of data, the use of SEM holds immense potential for ecologists in the future.
Keywords: SEM, Ecological, Model fit, Sample size, Feedback loops, Model identification, Model selection, Bayesian,
Latent growth curve
© The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made.
Fan et al. Ecological Processes (2016) 5:19 Page 2 of 12
latent and observable variables, Grace and Bollen (2008) introduced composite variables for ecological applications of SEM. Composite variables are also unobservable variables, but they assume no error variance among the indicators and are not estimated by factor analysis. Instead of being extracted as a factor from a set of indicators, a composite variable is an exact linear combination of the indicator variables based on given weights. For example, Chaudhary et al. (2009) conducted a study on the ecological relationships in semiarid shrublands and measured fungal abundance, which is composed of hyphal density and the concentration of Bradford-reactive soil proteins, as a composite variable. Jones et al. (2014) applied soil minerals as a composite variable to represent the concentrations of zinc, iron, and phosphorus in soil.
Confirmatory factor analysis
Confirmatory factor analysis (CFA) is the method for measuring latent variables (Hoyle 1995, 2011; Kline 2010; Byrne 2013). It extracts the latent construct from other variables and shares the most variance with related variables. For example, abiotic stress as a latent variable is measured by the observation of soil changes (i.e., soil salinity, organic matter, flooding height; Fig. 2, Grace et al. 2010). Confirmatory factor analysis estimates latent variables based on the correlated variations of the dataset (e.g., association, causal relationship) and can reduce the data dimensions, standardize the scale of multiple indicators, and account for the correlations inherent in the dataset (Byrne 2013). Therefore, to postulate a latent variable, one should be concerned about the reason to use a latent variable. In the abiotic stress example given above, community stress and disturbance are latent variables that account for the correlation in the dataset. Shao et al. (2015) applied CFA to condense the soil-nutrition features into one variable that accounted for soil organic carbon, litter total nitrogen, and carbon-to-nitrogen ratio. Also, Capmourteres and Anand (2016) defined the habitat function as an environmental indicator that explained both plant cover and native bird abundance for the forest ecosystems by using CFA.
Fig. 2 Measurements of the latent variables. This SEM measures abstract concepts (i.e., latent variables) in the ovals based on the observed variables (modified from Grace et al. 2010)
In addition to CFA, there is another type of factor analysis: exploratory factor analysis (EFA). The statistical estimation technique is the same for both. CFA is applied when the indicators for each latent variable are specified according to the related theories or prior knowledge (Joreskog 1969; Brown 2006; Harrington 2009), whereas EFA is applied to find the underlying latent variables. In practice, EFA is often performed to select the useful underlying latent constructs for CFA when there is little prior knowledge about the latent construct (Browne and Cudeck 1993; Cudeck and Odell 1994; Tucker and MacCallum 1997).
SEM is composed of the measurement model and the structural model. A measurement model measures the latent variables or composite variables (Hoyle 1995, 2011; Kline 2010), while the structural model tests all the hypothetical dependencies based on path analysis (Hoyle 1995, 2011; Kline 2010).
Performing SEM
There are five logical steps in SEM: model specification, identification, parameter estimation, model evaluation, and model modification (Kline 2010; Hoyle 2011; Byrne 2013). Model specification defines the hypothesized relationships among the variables in an SEM based on one’s knowledge. Model identification checks whether the model is over-identified, just-identified, or under-identified. Model coefficients can only be estimated in a just-identified or over-identified model. Model evaluation assesses model performance or fit, with quantitative indices calculated for the overall goodness of fit. Modification adjusts the model to improve model fit, i.e., the post hoc model modification. Validation is the process
to improve the reliability and stability of the model. Popular programs for SEM applications are often equipped with intuitive manuals, such as AMOS, Mplus, LISREL, lavaan (R package), piecewiseSEM (R package), and Matlab (Rosseel 2012; Byrne 2013; Lefcheck 2015). The specific details for SEM applications are complicated, but users can seek help from tutorials provided by Grace (2006) and Byrne (2013).
Model evaluation indices
SEM evaluation is based on the fit indices for the test of a single path coefficient (i.e., p value and standard error) and the overall model fit (i.e., χ2, RMSEA). From the literature, the usability of model fit indices appears flexible. Generally, the more fit indices applied to an SEM, the more likely that a misspecified model will be rejected, which also increases the probability of good models being rejected. This suggests that one should use a combination of at least two fit indices (Hu and Bentler 1999). There are recommended cutoff values for some indices, though none serves as the golden rule for all applications (Fan et al. 1999; Chen et al. 2008; Kline 2010; Hoyle 2011).
Chi-square test (χ2): χ2 tests whether there is a discrepancy between the model-implied covariance matrix and the original covariance matrix. Therefore, a non-significant discrepancy is preferred. For optimal fitting of the chosen SEM, the χ2 test would be ideal with p > 0.05 (Bentler and Bonett 1980; Mulaik et al. 1989; Hu and Bentler 1999). One should not be overly concerned regarding the χ2 test because it is very sensitive to the sample size and not comparable among different SEMs (Bentler and Bonett 1980; Joreskog and Sorbom 1993; Hu and Bentler 1999; Curran et al. 2002).
Root mean square error of approximation (RMSEA) and standardized root mean square residual (SRMR): RMSEA is a “badness of fit” index where 0 indicates the perfect fit and higher values indicate the lack of fit (Browne and Cudeck 1993; Hu and Bentler 1999; Chen et al. 2008). It is useful for detecting model misspecification and is less sensitive to sample size than the χ2 test. The acceptable RMSEA should be less than 0.06 (Browne and Cudeck 1993; Hu and Bentler 1999; Fan et al. 1999). SRMR is similar to RMSEA and should be less than 0.09 for a good model fit (Hu and Bentler 1999).
Comparative fit index (CFI): CFI represents the amount of variance that has been accounted for in a covariance matrix. It ranges from 0.0 to 1.0. A higher CFI value indicates a better model fit. In practice, the CFI should be close to 0.95 or higher (Hu and Bentler 1999). CFI is less affected by sample size than the χ2 test (Fan et al. 1999; Tabachnick and Fidell 2001).
Goodness-of-fit index (GFI): The range of GFI is 0–1.0, with the best fit at 1.0. Because GFI is affected by sample size, it is no longer recommended (MacCallum and Hong 1997; Sharma et al. 2005).
Normed fit index (NFI): NFI is highly sensitive to the sample size (Bentler 1990). For this reason, NFI is no longer used to assess model fit (Bentler 1990; Hoyle 2011).
Tucker-Lewis index (TLI): TLI is a non-normed fit index (NNFI) that partly overcomes the disadvantages of NFI and offers a fit index independent of sample size (Bentler and Bonett 1980; Bentler 1990). A TLI of >0.90 is considered acceptable (Hu and Bentler 1999).
Akaike information criterion (AIC) and Bayesian information criterion (BIC): AIC and BIC are two relative measures from the perspective of model selection rather than the null hypothesis test. AIC offers a relative estimation of the information lost when the given model is used to generate data (Akaike 1974; Kline 2010; Hoyle 2011). BIC is an estimation of how parsimonious a model is among several candidate models (Schwarz 1978; Kline 2010; Hoyle 2011). AIC and BIC are not useful in testing the null hypothesis but are useful for selecting the model with the least overfitting (Burnham and Anderson 2004; Johnson and Omland 2004).
Powerful yet unexplored SEMs
Experimental and observational datasets in ecological studies are often complex, non-randomly distributed, and hierarchically organized, and they have spatial and temporal constraints (i.e., potential autocorrelations). While corresponding SEMs exist for each type of unique data, these powerful and flexible SEMs have not yet been widely explored in ecological research. Here we introduce some unexplored SEM uses for future endeavors.
Latent growth curve (LGC) model
LGC models can be used to interpret data with serial changes over time. The LGC model is built on the assumption that there is a structure growing along with the data series. The slope of growth is a latent variable, which represents the change in growth within a specified interval, and the factor loadings are a series of growing subjects specified by the user (Kline 2010; Hoyle 2011; Duncan et al. 2013).
There are few ecological publications using LGC models. However, we found a civil engineering study on water quality applying the LGC model to examine the acidic deposition from acid rain in 21 stream sites across the Appalachian Mountain Region from 1980 to 2006 (Chen and Lin 2010). This study estimated the time-varying latent variable for each stream as the change of water properties over time by using the LGC model. Because longitudinal data (e.g., time series) are common in ecological research, LGC is especially effective in testing time-varying effects (Duncan et al. 2013; Kline 2010; Hoyle 2011).
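As an illustration of the LGC structure described above, the following minimal Python sketch computes the means and covariances implied by a linear growth model with a latent intercept and a latent slope. It assumes the standard linear specification, in which the intercept loadings are fixed to 1 and the slope loadings are fixed to the time scores, so that the implied covariance is ΛΨΛ' plus a residual variance on the diagonal. The function name, the four time scores, and all parameter values are hypothetical and are not taken from any study cited here.

```python
def lgc_implied_moments(times, mean_intercept, mean_slope,
                        var_intercept, var_slope, cov_int_slope, resid_var):
    """Model-implied moments of a linear latent growth curve (illustrative).

    Returns (loadings, implied_means, implied_cov) for the repeated measures.
    """
    # Loading matrix: column 1 (intercept factor) is fixed to 1,
    # column 2 (slope factor) is fixed to the time scores.
    loadings = [[1.0, float(t)] for t in times]
    factor_means = [mean_intercept, mean_slope]
    factor_cov = [[var_intercept, cov_int_slope],
                  [cov_int_slope, var_slope]]
    n = len(times)

    # Implied mean at each occasion: loadings times factor means.
    means = [sum(loadings[i][k] * factor_means[k] for k in range(2))
             for i in range(n)]

    # Implied covariance: loadings * factor_cov * loadings^T,
    # plus an (assumed equal) residual variance on the diagonal.
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cov[i][j] = sum(loadings[i][k] * factor_cov[k][l] * loadings[j][l]
                            for k in range(2) for l in range(2))
        cov[i][i] += resid_var
    return loadings, means, cov


if __name__ == "__main__":
    # Hypothetical example: four equally spaced measurement occasions.
    _, means, cov = lgc_implied_moments(
        times=[0, 1, 2, 3], mean_intercept=10.0, mean_slope=2.0,
        var_intercept=4.0, var_slope=1.0, cov_int_slope=0.5, resid_var=1.0)
    print("implied means:", means)
    for row in cov:
        print(row)
```

In an actual analysis these parameters would be estimated from the data by SEM software rather than supplied by hand; the sketch only shows how the fixed loadings tie the repeated measures to the two latent growth factors.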
In addition to LGC, SEM can be incorporated into a time series analysis (e.g., autoregressive integrated moving average model). For example, Almaraz (2005) applied a time series SEM to predict the population growth of the purple heron (Ardea purpurea). The moving average process was used as a matrix of time-based weights for analyzing the seasonal changes and autocorrelations. From an ecological perspective, LGC is more plausible than the conventional time series analysis, because an LGC only needs longitudinal data with more than three periods, whereas a time series analysis requires a larger time series with more observations (e.g., time series of economic or climatic changes). The LGC assumes a stable growth curve of the observation. Therefore, users can weigh the curve based on the time span rather than on a time series, which requires steady intervals in the series. For further guidance, refer to the book written by Bollen and Curran (2006).
Bayesian SEM (BSEM)
BSEM assumes theoretical support and that the prior beliefs are strong. One can use new data to update a prior model so that posterior parameters can be estimated (Raftery 1993; Lee 2007; Kaplan and Depaoli 2012). The advantage of BSEM is that it has no requirements on sample size. However, it needs prior knowledge on data distribution and parameters. Arhonditsis et al. (2006) applied BSEM to explore spatiotemporal phytoplankton dynamics, with a sample size of <60. The estimation of the model parameters’ posterior distribution is based on various Monte Carlo simulations to compute the overall mean and a 95% confidence interval. Due to the Bayesian framework, the model assessment of BSEM is more like a model comparison that is not based on χ2, RMSEA, CFI, etc. There are many comparison methods for the Bayesian approach. BIC is widely used, and many statisticians suggest posterior predictive checking to estimate the predictive ability of the model (Raftery 1993; Lee 2007; Kaplan and Depaoli 2012). The SEM analysis, which uses maximum likelihood (ML) and the likelihood ratio χ2 test, often strictly rejects the substantive theory and unnecessarily utilizes model modification to improve the model fit by chance. Therefore, the Bayesian approach has received escalating attention in SEM applications due to its flexibility and better representation of the theory.
Partial least squares SEM (PLS-SEM)
PLS-SEM is the preferred method when the study object does not have a well-developed theoretical base, particularly when there is little prior knowledge on causal relationships. The emphasis here is on exploration rather than confirmation. PLS-SEM requires neither a large sample size nor a specific assumption on the distribution of the data, or even the missing data. Users with small sample sizes and less theoretical support for their research can apply PLS-SEM to test the causal relationship (Hair et al. 2013). The algorithm of PLS-SEM is different from the common SEM, which is based on maximum likelihood. When the sample size and data distribution of a study can hardly be handled by a common SEM, PLS-SEM has a functional advantage.
By 2016, no publications on the application of PLS-SEM in ecological studies were found, according to our literature search. We recommend that users at the beginning stage or those who have fewer data apply PLS-SEM to generate the necessary evidence for causal relationships and variable selection. This will allow users to continue collecting long-term data while updating their hypotheses (Monecke and Leisch 2012).
Hierarchical SEM
The hierarchical model, also known as multilevel SEM, analyzes hierarchically clustered data. Hierarchical SEM can specify the direct and indirect causal effects between clusters (Curran 2003; Mehta and Neale 2005; Kline 2010). It is common for an experiment to hold some variables constant, resulting in multiple groups or a nested dataset. The conventional SEM omits the fact that path coefficients and intercepts will potentially vary between hierarchical levels (Curran 2003; Mehta and Neale 2005; Shipley 2009; Kline 2010). This method focuses on data generated with a hierarchical structure. Therefore, the sample size needs to be large.
The application of hierarchical SEM is flexible. Take the work by Shipley (2009), for example, who analyzed the nested effects on plant growth between hierarchies, which included three clusters: site, year, and age. The causal relationship between the levels could be developed by Shipley’s d-sep test. With knowledge of a causal nested system, one can first specify the hierarchies before developing the SEM analysis within each nested structure (Curran 2003; Mehta and Neale 2005; Kline 2010, Fig. 3). The model in Fig. 3 is a confirmatory factor analysis, with model parameters varying in each hierarchy.
SEM models and variable selection
Selecting the appropriate variables and models is the initial step in an SEM application. The selection algorithm can be based on preferable variables and models according to certain statistical criteria (Burnham and Anderson 2002; Burnham et al. 2011). For example, the selection criterion could be based on fit indexes (e.g., AIC and BIC). Variable selection is also called feature selection (a process of selecting the most relevant variables to prevent overfitting; Sauerbrei et al. 2007; Murtaugh 2009; Burnham and Anderson 2002) and is also a required procedure for both PLS-SEM and exploratory
Table 2 Presence of papers with sound explanations or justifications of necessary procedures, parameters, and indices
Characteristics                      All (%)   With latent variable (%)   Without latent variable (%)
Justification of sample size         0.0       0.0                        0.0
Evidence of data screening           0.0       0.0                        0.0
Data distribution and preparation    0.0       0.0                        0.0
Variable selection                   8.9       7.5                        9.4
Hypothesized model diagram           96.6      94.7                       97.0
Evidence of causal relationship      94.2      90.0                       96.2
Explain models modification          13.7      12.5                       14.2
Path coefficients                    82.2      75.0                       84.9
Standard errors                      8.9       7.5                        6.8
strong assumption encoded by setting the coefficient to zero. Or, if one assumes that two disturbances are uncorrelated, then we have another strong assumption that the covariance equals zero.
confuse a new user. Kline (2006) listed two assumptions for feedback loops:
One is that of equilibrium, which means that any changes in the system underlying a feedback relation have already manifested their effects and that the system is in a steady state (Heise 1975). The other assumption is that the underlying causal structure does not change over time.
Some data are generated naturally from the ecosystem without artificial manipulation. The specification of the cause and outcome of ecological dynamics is confusing because the underlying mechanisms of data generation are complex and simultaneous. The applications of feedback loops, which specify the causal relationship in a loop, can explain the ecological dynamics in a cyclical perspective. When a research design is based on a loop perspective, the SEM analysis can evaluate if the cycle is virtuous, vicious, or neutral.
Fig. 4 Illustration of feedback loops in ecosystem analysis. Feedback loops in SEM analysis are flexible and can be direct or indirect
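The equilibrium assumption quoted above can be made concrete for the simplest case, a direct loop between two variables. The Python sketch below is a hypothetical illustration, not a procedure from the reviewed studies: the function name, the coefficient values, and the labels "reinforcing" and "balancing" are assumptions introduced here and are not meant to map onto the terms virtuous, vicious, or neutral. It checks whether the product of the two reciprocal path coefficients is smaller than 1 in absolute value, which is the condition under which effects circulating through the loop die out, and then sums the resulting geometric series to obtain the total effect.

```python
def loop_summary(b_yx, b_xy):
    """Summarize a direct two-variable feedback loop (illustrative only).

    b_yx: direct effect of x on y; b_xy: direct effect of y on x.
    """
    # Effect carried once around the loop x -> y -> x.
    loop_gain = b_yx * b_xy

    # The loop settles to a steady state only if |loop_gain| < 1.
    stable = abs(loop_gain) < 1.0

    # Hypothetical labels based on the sign of the loop gain.
    if loop_gain > 0:
        kind = "reinforcing"
    elif loop_gain < 0:
        kind = "balancing"
    else:
        kind = "no loop"

    # Total effect of x on y: b_yx * (1 + g + g^2 + ...) = b_yx / (1 - g),
    # defined only when the series converges.
    total_x_on_y = b_yx / (1.0 - loop_gain) if stable else None

    return {"loop_gain": loop_gain, "stable": stable,
            "kind": kind, "total_x_on_y": total_x_on_y}


if __name__ == "__main__":
    # Hypothetical standardized coefficients for a reciprocal relationship.
    print(loop_summary(b_yx=0.5, b_xy=0.4))
    # A loop gain above 1 violates the equilibrium assumption.
    print(loop_summary(b_yx=1.5, b_xy=0.8))
```

For indirect loops that pass through more than two variables, the same idea applies to the full matrix of path coefficients, where the stability check is usually carried out on its eigenvalues.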
Model identification: Model identification was often overlooked, with only 67.8% of the publications reporting the model identification, which happened when latent variables were estimated. Kline (2010) proposed three essential requirements when identifying the appropriate SEM: (1) “the model degrees of freedom must be at least zero to assure the degrees of freedom (df) is greater than zero”; (2) “every latent variable (including the residual terms) must be assigned a scale, meaning that either the residual terms’ (disturbance) path coefficient and one of the latent variable’s factor loading should be fixed to 1 or that the variance of a latent variable should be fixed to 1”; and (3) “every latent variable should have at least two indicators.”
Most publications provided the df values in their SEMs, and we estimated the df of those that did not report them. All publications had a df greater than zero. All the models with CFA met the requirement that each latent variable should have at least two indicators. However, many studies skipped over scaling the latent variables before estimation, resulting in non-robust results. The unscaled latent variable can hardly provide useful information to the causal test. Otherwise, it is likely that the user had just fit the model by chance.
Estimation methods: Many estimation methods in SEM exist, such as maximum likelihood (ML), generalized least squares, weighted least squares, and partial least squares. Maximum likelihood estimation is the default estimation method in many SEM software packages (Kline 2010; Hoyle 2011). All of the publications stated that the estimation methods were based on ML, which assumes that (1) no skewness or kurtosis in the joint distribution of the variables exists (e.g., multivariate normality); (2) the variables are continuous; and (3) there are very few missing data (i.e., <5%, Kline 2010; Hoyle 2011). However, very few publications provided this key information about their data. Instead, they simply ignored the data quality or chose not to discuss the raw data. Some papers briefly discussed the multivariate normality of their data, but none discussed the data screening and transformation (i.e., skewness or kurtosis, continuous or discrete, and missing data). We assume that most of their ecological data were continuous, yet one needs to assure the continuity of the data to support the choice of estimation methods. The partial least squares method requires neither continuous data nor multivariate normality.
Explanations of the measured latent variables: We did not find a publication with sufficient explanation for its CFA in regard to the prior knowledge or preferred function (i.e., unmeasured directly, quantifiable, and necessary to the model) for measuring the latent variable. Factor analysis is a useful tool for dimension reduction. The factor analysis applied in SEM measurement models (CFA or EFA) is used to measure the latent variable, which requires a theoretical basis. The prior knowledge of a measurement model includes two parts: (1) the prior knowledge of indicators for a latent variable and (2) the prior knowledge of the relationships between the latent variable and its indicators (Bentler and Chou 1987). For example, the soil fertility of a forest as a latent variable was estimated based on two types of prior knowledge, including (1) the observation of tree density, water resources, and presence of microorganisms and (2) the positive correlations among the three observed variables.
If the estimation of a latent variable is performed without prior knowledge, CFA will become a method only for data dimension reduction. In addition, we did not find any CFAs in the ecological publications explaining the magnitude of the latent variable. Therefore, these latent variables lack a meaningful explanation in regard to the hypothesis of an SEM (Bollen 2002; Duncan et al. 2013). Another issue concerning latent variables in ecological research is that some “observable variables” (e.g., salinity, pH, temperature, and density) are measured as a latent variable. The reasons for measuring a latent variable are very flexible, but they require the user to explain the application of CFA carefully.
SEM requires measurement models to be based on prior knowledge so that latent variables can be interpreted correctly (Bentler and Chou 1987). SEM is not a method to only reduce data dimensions. Instead, one should explain the magnitude and importance of indicators and latent variables. Therefore, users should base their explanations on theory when discussing the associated changes between latent variables and indicators. The explanation should include the analysis of the magnitude of the latent variable, indicators, and factor loadings.
Report of model fit indices: Reporting of fit indices in any SEM is strongly recommended and needed. Approximately 93.8% of the publications provided model fit indices. However, none justified their usage of the chosen fit indices. Those that did not report model fit indices also did not provide the reason for doing so. From these publications, χ2, CFI, RMSEA, TLI, GFI, NFI, SRMR, AIC, and BIC were frequently used. The χ2 was included in almost every paper because it is the robust measure for model fit. Some publications without significant χ2 tests reported their SEM results regardless. In addition, GFI and NFI were also used even though they are not recommended as measures for model fit.
Fit indices are important indicators of model performance. Due to their different properties, they are sensitive to many factors, such as data distribution, missing data, model size, and sample size (Hu and Bentler 1999; Fan and Sivo 2005; Barrett 2007). Most fit indices (i.e., χ2, CFI, RMSEA, TLI, GFI, NFI, SRMR) are greatly
influenced by multivariate normality (i.e., a property of the ML method that is applied in SEM). Meanwhile, CFI, RMSEA, and SRMR are useful in detecting model misspecification, and relative fit indices (e.g., AIC and BIC) are mainly used for model selection (Curran et al. 1996; Fan and Sivo 2005; Ryu 2011). Selection of model fit indices in an SEM exercise is key to explaining the model (e.g., type, structure, and hypothesis). Users should at least discuss the usage of fit indices to ensure that they are consistent with their study objectives.
Report of the results: An SEM report should include all the estimation and modeling process reports. However, most publications did not include a full description of the results for their hypothesis tests. Some publications provided their SEMs based on a covariance matrix (Table 1), while even fewer studies reported the exact input covariance or correlation matrix. No study reported the multivariate normality, missing values, or outliers of their data. The majority of the papers (82.2%) reported the path coefficients, but very few reported both unstandardized and standardized path coefficients. A small percentage (8.9%) of the publications reported the standard error for the path coefficient. The basic statistics (i.e., p value, R2, standard errors) are as important as the overall fit indices because they explain the validity and reliability of each path, providing evidence for when the overall fit is poor (Kline 2010; Hoyle 2011).
Hoyle and Isherwood (2013) suggested that a publication with an SEM analysis should follow the Journal Article Reporting Standards of the American Psychological Association. The reporting guidelines comprise five components (McDonald and Ho 2002; Jackson et al. 2009; Kline 2010; Hoyle and Isherwood 2013):
1. Model specification: Model specification process should be reported, including prior knowledge of the theoretically plausible models, prior knowledge of the positive or negative direct effects among variables, data sampling method, sample size, and model type.
2. Data preparation: Data processing should be reported, including the assessment of multivariate normality, analysis of missing data, method to address missing data, and data transformations.
3. Estimation of SEM: The estimation procedure should be reported, including the input matrix, estimation method, software brand and version, and method for fixing the scale of latent variables.
4. Model evaluation and modification: The model evaluation should be reported, including fit indices with cutoff values and model modification.
5. Reports of findings: All of the findings from an SEM analysis should be reported, including latent variables, factor loadings, standard errors, p values, R2, standardized and unstandardized structure coefficients, and graphic representations of the model.
Sample size: The estimation of sample size is another issue for the SEM application. So far the estimation of sample size is flexible, and users could refer to several authors’ recommendations (Fan et al. 1999; Muthen and Muthen 2002; Iacobucci 2010). While some (61.0%) studies reported the sample size clearly, none of them provided a justification for the sample size with sound theory (Table 2). Technically, sample size for an SEM varies depending on many factors, including fit index, model size, distribution of the variables, amount of missing data, reliability of the variables, and strength of path parameters (Fan et al. 1999; Muthen and Muthen 2002; Fritz and MacKinnon 2007; Iacobucci 2010). Some researchers recommend a minimum sample size of 100–200 or five cases per free parameter in the model (Tabachnick and Fidell 2001; Kline 2010). One should be cautious when applying these general rules, however. Increasingly, use of model-based methods for estimation of sample size is highly recommended, with sound methods based on fit indices or power analysis of the model. Muthen and Muthen (2002) developed a method based on the Monte Carlo simulation to utilize SEM’s statistical power analysis and calculate sample size (Cohen 2013). Kim (2005) developed equations to compute the sample size based on model fit indices for a given statistical power.
Model validation: We did not find that SEM was validated in the reviewed ecological studies, even though it is a necessary process for quantitative analysis. This is probably because most SEM software is developed without model validation features. The purpose of model validation is to provide more evidence for the hypothetical model. The basic method of model validation is to test a model with two or more random datasets from the same sample. Therefore, the validation requires a large sample size. The principle of the model validation is to assure that the parameters are similar when a model is based on different datasets from the same population. This technique is a required step in many learning models. However, it is still unpopular in SEM applications.
Conclusions
SEM is a powerful multivariate analysis tool that has great potential in ecological research, as data accessibility continues to increase. However, it remains a challenge even though it was introduced to the ecological community decades ago. Regardless of its rapidly increased application
Burnham KP, Anderson DR, Huyvaert KP (2011) AIC model selection and Hoyle RH (2011) Structural equation modeling for social and personality
multimodel inference in behavioral ecology: some background, observations, psychology. Sage, London
and comparisons. Behav Ecol Sociobiol 65(1):23–35 Hoyle RH, Isherwood JC (2013) Reporting results from structural equation
Byrne BM (2013) Structural equation modeling with AMOS: basic concepts, modeling analyses in Archives of Scientific Psychology. Arch Sci
applications, and programming. Routledge, New York Psychol 1:14–22
Capmourteres V, Anand M (2016) Assessing ecological integrity: a multi-scale Hu LT, Bentler PM (1999) Cutoff criteria for fit indexes in covariance structure
structural and functional approach using structural equation modeling. Ecol analysis: conventional criteria versus new alternatives. Struct Equ Modeling
Indic http://dx.doi.org/10.1016/j.ecolind.2016.07.006 6(1):1–55
Chang WY (1981) Path analysis and factors affecting primary productivity. J Iacobucci D (2010) Structural equations modeling: fit indices, sample size, and
Freshwater Ecol 1(1):113–120 advanced topics. J Consum Psychol doi: http://dx.doi.org/10.1016/j.jcps.2009.09.003
Chapin FS, Conway AJ, Johnstone JF, Hollingsworth TN, Hollingsworth J (2016) Jackson DL, Gillaspy JA, Purc-Stephenson R (2009) Reporting practices in
Absence of net long-term successional facilitation by alder in a boreal Alaska confirmatory factor analysis: an overview and some recommendations.
floodplain. Ecology, doi: http://dx.doi.org/10.1002/ecy.1529 Psychol Methods 14(1):6–23
Chaudhary VB, Bowker MA, O'Dell TE, Grace JB, Redman AE, Rillig MC, Johnson Johnson JB, Omland KS (2004) Model selection in ecology and evolution. Trends
NC (2009) Untangling the biological contributions to soil stability in semiarid Ecol Evol 19(2):101–108
shrublands. Ecol Appl 19(1):110–122 Jones CM, Spor A, Brennan FP, Breuil MC, Bru D, Lemanceau P, Griffiths B, Hallin
Chen Y, Lin L (2010) Structural equation-based latent growth curve modeling of S, Philippot L (2014) Recently identified microbial guild mediates soil N2O
watershed attribute-regulated stream sensitivity to reduced acidic deposition. sink capacity. Nat Clim Change 4(9):801–805
Ecol Model 221(17):2086–2094 Joreskog KG (1969) A general approach to confirmatory maximum likelihood
Chen F, Curran PJ, Bollen KA, Kirby J, Paxton P (2008) An empirical evaluation of factor analysis. Psychometrika 34(2):183–202
the use of fixed cutoff points in RMSEA test statistic in structural equation Joreskog KG (1970) A general method for estimating a linear structural equation
models. Socio Meth Res 36(4):462–494 system. ETS Research Bulletin Series 1970(2):1–41
Chen J, John R, Shao C, Fan Y, Zhang Y, Amarjargal A, Brown DG, Qi J, Han J, Joreskog KG (1978) Structural analysis of covariance and correlation matrices.
Lafortezza R, Dong G (2015) Policy shifts influence the functional changes of Psychometrika 43(4):443–473
the CNH systems on the Mongolian plateau. Environ Res Lett 10(8):085003 Joreskog KG, Goldberger AS (1975) Estimation of a model with multiple indicators
Cohen J (2013) Statistical power analysis for the behavioral sciences. Academic and multiple causes of a single latent variable. JASA 70(351a):631–639
Press, New York Joreskog K, Sorbom D (1993) LISREL 8: structural equation modeling with the
Cover TM, Thomas JA (2012) Elements of information theory. John Wiley & Sons, SIMPLIS command language. Scientific Software International, Chicago
Hoboken, New Jersey Kaplan D, Depaoli S (2012) Bayesian structural equation modeling. In: Hoyle RH
Cudeck R, Odell LL (1994) Applications of standard error estimates in unrestricted (ed) Handbook of structural equation modeling. Guilford, New York
factor analysis: significance tests for factor loadings and correlations. Psychol Kim KH (2005) The relation among fit indexes, power, and sample size in
Bull 115(3):475–487 structural equation modeling. Struct Equ Modeling 12(3):368–390
Curran PJ (2003) Have multilevel models been structural equation models all Kline RB (2006) Reverse arrow dynamics. Formative measurement and feedback
along? Multivar Behav Res 38(4):529–569 loops. In: Hancock GR, Mueller RO (eds) Structural equation modeling: A
Curran PJ, West SG, Finch JF (1996) The robustness of test statistics to second course. Information Age Publishing, Greenwich
nonnormality and specification error in confirmatory factor analysis. Psychol Kline RB (2010) Principles and practice of structural equation modeling. Guilford
Methods 1(1):16–29 Press, New York
Curran PJ, Bollen KA, Paxton P, Kirby J, Chen F (2002) The noncentral chi-square Lamb EG, Mengersen KL, Stewart KJ, Attanayake U, Siciliano SD (2014) Spatially
distribution in misspecified structural equation models: finite sample results explicit structural equation modeling. Ecology 95(9):2434–2442
from a Monte Carlo simulation. Multivar Behav Res 37(1):1–36 LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Duncan TE, Duncan SC, Strycker LA (2013) An introduction to latent variable Lee SY (2007) Structural equation modeling: a Bayesian approach. John Wiley &
growth curve modeling: concepts, issues, and applications, 2nd edn. Sons, Hong Kong
Psychology Press, New York Lefcheck J (2015) piecewiseSEM: Piecewise structural equation modelling in r
Eisenhauer N, Bowker M, Grace J, Powell J (2015) From patterns to causal for ecology, evolution, and systematics. Methods Ecol Evol,
understanding: structural equation modeling (SEM) in soil ecology. doi: http://dx.doi.org/10.1111/2041-210X.12512
Pedobiologia 58(2):65–72 Liu X, Swenson NG, Lin D, Mi X, Umaña MN, Schmid B, Ma K (2016) Linking
Fan X, Sivo SA (2005) Sensitivity of fit indexes to misspecified structural or individual-level functional traits to tree growth in a subtropical forest.
measurement model components: rationale of two-index strategy revisited. Ecology, doi: http://dx.doi.org/10.1002/ecy.1445
Struct Equ Modeling 12(3):343–367 MacCallum RC, Hong S (1997) Power analysis in covariance structure modeling
Fan X, Thompson B, Wang L (1999) Effects of sample size, estimation methods, using GFI and AGFI. Multivar Behav Res 32(2):193–210
and model specification on structural equation modeling fit indexes. Struct Maddox GD, Antonovics J (1983) Experimental ecological genetics in Plantago: a
Equ Modeling 6(1):56–83 structural equation approach to fitness components in P. aristata and P.
Fritz MS, MacKinnon DP (2007) Required sample size to detect the mediated patagonica. Ecology 64(5):1092–1099
effect. Psychol Sci 18(3):233–239 McDonald RP, Ho MH (2002) Principles and practice in reporting structural
Galton F (1888) Personal identification and description. Nature 38:173–177 equation analyses. Psychol Methods 7(1):64–82
Grace JB (2006) Structural equation modeling and natural systems. Cambridge Mehta PD, Neale MC (2005) People are variables too: multilevel structural
University Press, New York equations modeling. Psychol Methods 10(3):259–284
Grace JB, Bollen KA (2008) Representing general theoretical concepts in structural Monecke A, Leisch F (2012) semPLS: structural equation modeling using partial
equation models: the role of composite variables. Environ Ecol Stat 15(2):191–213 least squares. J Stat Softw 48(3):1–32
Grace JB, Anderson TM, Olff H, Scheiner SM (2010) On the specification of Mulaik SA, James LR, Van Alstine J, Bennett N, Lind S, Stilwell CD (1989)
structural equation models for ecological systems. Ecol Monogr 80(1):67–87 Evaluation of goodness-of-fit indices for structural equation models. Psychol
Haavelmo T (1943) The statistical implications of a system of simultaneous Bull 105(3):430–445
equations. Econometrica 11(1):1–12 Murtaugh PA (2009) Performance of several variable-selection methods applied
Hair JF, Hult GT, Ringle C, Sarstedt M (2013) A primer on partial least squares to real ecological data. Ecol Lett 12(10):1061–1068
structural equation modeling (PLS-SEM). Sage, Thousand Oak Muthen LK, Muthen BO (2002) How to use a Monte Carlo study to decide on
Harrington D (2009) Confirmatory factor analysis. Oxford University Press, New York sample size and determine power. Struct Equ Modeling 9(4):599–620
Heise DR (1975) Causal analysis. John Wiley & Sons, Oxford Pearl J (2003) Causality: models, reasoning, and inference. Econometr Theor, doi: :
Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief http://dx.doi.org/10.1017/S0266466603004109
nets. Neural Comput 18(7):1527–1554 Pearl J (2009) Causality. Cambridge University Press, New York
Hoyle RH (1995) Structural equation modeling: concepts, issues, and applications. Pearl J (2012) The causal foundations of structural equation modeling. In: Hoyle RH
Sage, Thousand Oak (ed) Handbook of Structural Equation Modeling. Guilford, New York, pp 68–91