
58  I. Foundations

This rather intricate question, which involves nested conditionals, is the basis for defining mediation, to be discussed fully in the next section. Using our subscript notation, the quantity sought can be written as Y0,Z1, where Z1 is the value that Z would attain had X been one. To compute this quantity we need to form two modified models. The first, shown in Figure 3.5a, serves to compute Z1, the second antecedent in Y0,Z1:

Z1 = 1.0 × 0.5 + 0.75 = 1.25

The second, shown in Figure 3.5b, serves to compute Y0,Z1 and thus provide an answer to Q2:

Y0,Z1 = Y0,1.25 = 1.25 × 0.4 + 0.75 = 1.25

If we compare this value of Y0,Z1 = 1.25 with Joe's outcome had he not received any treatment, Y0 = 0.75 × 0.4 + 0.75 = 1.05, the difference is, as expected, the indirect effect of X on Y: Y0,Z1 − Y0 = 0.20 = β × γ.

This exercise may seem unnecessarily complicated in linear models, where we can compute our desired quantity directly from the product β × γ. The benefit of using counterfactuals is revealed in the next section, where indirect effects are defined for discrete variables and estimated from data without assuming any parametric forms of the equations.

Predicting Outcomes and Potential Outcomes
in Empirical Studies

Having convinced ourselves that every counterfactual question can be answered (using Equation 3.6) from a fully specified structural model, we next move to population-level analysis and ask a policy-related question on a set of 10 individuals, with Joe being participant 1. Each is characterized by a distinct vector ui = (ε1, ε2, ε3), as shown in the first three columns of Table 3.1. For each triplet (ε1, ε2, ε3), the model of Figure 3.4a enables us to complete a full row of the table, including Y0 and Y1, which stand for the potential outcomes under control (X = 0) and treatment (X = 1) conditions, respectively. We see that a simple structural model like the one in Figure 3.4a encodes in effect a synthetic population of individuals together with their predicted behavior under both observational and experimental conditions. The columns labeled X, Y, Z predict the results of observational studies, and those labeled Y0, Y1, Z0, Z1 predict the hypothetical outcomes under two treatment regimens, X = 0 and X = 1. Many more, in fact infinitely many, potential outcomes may be predicted as well, for example, YX=0.5,Z=2.0, computed in Figure 3.4c, and all combinations of subscripted variables. From this synthetic population one can find the distribution of every counterfactual query on variables X, Y, Z, including, in particular, retrospective counterfactuals, such as the probability that a person chosen at random would have passed the exam by getting assistance given that, in reality, he/she failed the exam and did not receive any assistance.7

This prediction power was facilitated, of course, with the help of two untestable pieces of information: (1) the structure of the model (which includes the assumption of independent error terms) and (2) the values of the model parameters (which include the distribution of each exogenous variable). Whereas the latter can often be inferred from the data (see the next section), the former depends largely on scientific judgment.

Now assume that we have no information whatsoever about the underlying model and all we have are

[Figure 3.5 displays the two modified unit-level models, with parameters α = 0.7, β = 0.4, γ = 0.5 and Joe's error terms ε1 = 0.5, ε2 = 0.75, ε3 = 0.75. Panel (a): X = 1, yielding Z = 1.25 and Y = 1.95. Panel (b): X = 0 with Z held at 1.25, yielding Y = 1.25.]

FIGURE 3.5. Unit-specific structural models used for answering a nested counterfactual question concerning the indirect effect of X on Y. (a) Modified model needed for calculating Z1. (b) Modified model needed for calculating Y0,Z1.
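The two-step computation of Figures 3.5a and 3.5b can be sketched directly; a minimal sketch, with parameter values and error terms taken from the worked example:

```python
# Unit-level structural model of Figure 3.4a/3.5:
#   X = eps1,  Z = gamma*X + eps2,  Y = alpha*X + beta*Z + eps3
alpha, beta, gamma = 0.7, 0.4, 0.5
eps1, eps2, eps3 = 0.5, 0.75, 0.75           # Joe's characteristics

def Z(x):                                    # Z under do(X = x)
    return gamma * x + eps2

def Y(x, z):                                 # Y under do(X = x, Z = z)
    return alpha * x + beta * z + eps3

Z1 = Z(1)                                    # Figure 3.5a: 1.25
Y0_Z1 = Y(0, Z1)                             # Figure 3.5b: 1.25
Y0 = Y(0, Z(0))                              # no treatment at all: 1.05
indirect = Y0_Z1 - Y0                        # 0.20 = beta * gamma
```

The nesting is visible in the code: the first model fixes X = 1 only to obtain Z1, which the second model then holds fixed while setting X = 0.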
3. The Causal Foundations of Structural Equation Modeling  59

TABLE 3.1. Potential and Observed Outcomes Predicted by the Structural Model of Figure 3.4(a)

             Participant
             characteristics     Observed behavior      Predicted potential outcomes
Participant  ε1    ε2    ε3      X     Y     Z          Y0    Y1    Z0    Z1    Y00  . . .
 1           0.5   0.75  0.75    0.5   1.50  1.0        1.05  1.95  0.75  1.25  0.75
 2           0.3   0.1   0.4     0.3   0.71  0.25       0.44  1.34  0.1   0.6   0.4
 3           0.5   0.9   0.2     0.5   1.01  1.15       0.56  1.46  0.9   1.4   0.2
 4           0.6   0.5   0.3     0.6   1.04  0.8        0.50  1.40  0.5   1.0   0.3
 5           0.5   0.8   0.9     0.5   1.67  1.05       1.22  2.12  0.8   1.3   0.9
 6           0.7   0.9   0.3     0.7   1.29  1.25       0.66  1.56  0.9   1.4   0.3
 7           0.2   0.3   0.8     0.2   1.10  0.4        0.92  1.82  0.3   0.8   0.8
 8           0.4   0.6   0.2     0.4   0.80  0.8        0.44  1.34  0.6   1.1   0.2
 9           0.6   0.4   0.3     0.6   1.00  0.7        0.46  1.36  0.4   0.9   0.3
10           0.3   0.8   0.3     0.3   0.89  0.95       0.62  1.52  0.8   1.3   0.3

Note. Units were selected at random, with each εi uniformly distributed over [0, 1].
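Each row of Table 3.1 follows mechanically from the model equations; a minimal sketch that reproduces the predicted columns from a unit's (ε1, ε2, ε3) triplet, with the parameter values of the worked example:

```python
# Structural model of Figure 3.4a: X = eps1, Z = gamma*X + eps2,
# Y = alpha*X + beta*Z + eps3.  Each unit u = (eps1, eps2, eps3)
# determines one full row of Table 3.1.
alpha, beta, gamma = 0.7, 0.4, 0.5

def row(eps1, eps2, eps3):
    x = eps1                           # observed treatment
    z = gamma * x + eps2               # observed mediator
    y = alpha * x + beta * z + eps3    # observed outcome
    z0, z1 = eps2, gamma + eps2        # Z under do(X=0), do(X=1)
    y0 = beta * z0 + eps3              # Y under do(X=0)
    y1 = alpha + beta * z1 + eps3      # Y under do(X=1)
    return x, y, z, y0, y1, z0, z1

joe = row(0.5, 0.75, 0.75)             # participant 1
# In this linear model every unit's causal effect Y1 - Y0 equals
# alpha + beta*gamma = 0.9, the "true average treatment effect"
# reported under Table 3.2.
```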

measurements on Y taken in the experimental study in which X is randomized over two levels, X = 0 and X = 1.

Table 3.2 describes the responses of the same 10 participants (Joe being participant 1) under such experimental conditions. The first two columns give the true potential outcomes (taken from Table 3.1), while the last two columns describe the information available to the experimenter, where a square indicates that the response was not observed.8 Randomization assures us that although half of the potential outcomes are not observed, the difference between the observed means in the treatment and control groups will converge to the average of the true difference, E(Y1 − Y0) = 0.9.

TABLE 3.2. Potential and Observed Outcomes in a Randomized Clinical Trial with X Randomized over X = 0 and X = 1

             Predicted potential
             outcomes               Observed outcomes
Participant    Y0      Y1             Y0      Y1
 1            1.05    1.95           1.05     □
 2            0.44    1.34            □      1.34
 3            0.56    1.46            □      1.46
 4            0.50    1.40            □      1.40
 5            1.22    2.12           1.22     □
 6            0.66    1.56           0.66     □
 7            0.92    1.82            □      1.82
 8            0.44    1.34           0.44     □
 9            0.46    1.36            □      1.36
10            0.62    1.52           0.62     □

True average treatment effect: 0.90        Study average treatment effect: 0.68

In our model, since all exogenous variables are independent, the slope of the regression of Y on X would also converge to the average causal effect. Bias will be introduced if ε1 is correlated with ε2 or with ε3. However, such correlation will not bias the average causal effect estimated in the experimental study.

Relations to the Potential-Outcome Framework

Definition 2 constitutes the bridge between SEM and a framework called "potential outcome" (Rubin, 1974), which is often presented as a "more principled alternative" to SEM (Holland, 1988; Rubin, 2004; Sobel, 1996, 2008; Wilkinson et al., 1999). Such presentations are misleading and misinformed; the two frameworks have been proven logically equivalent, differing only in the language in which researchers are permitted to express assumptions. A theorem in one is a theorem in the other (Pearl, 2009, pp. 228–231), with Definition 2 providing the formal basis for both.

The idea of potential-outcome analysis is simple. Researchers who feel uncomfortable presenting their assumptions in diagrams or structural equations may do

so in a roundabout way, using the randomized trial as the ruling paradigm, and interpret the counterfactual Yx(u) as the potential outcome of subject u to hypothetical treatment X = x, ignoring the mechanisms that govern that outcome. The causal inference problem is then set up as one of "missing data," where the missing data are the potential outcomes Yx(u) under the treatment not received, while the observed data are the potential outcomes under the received treatments, as shown in Table 3.2.

Thus, Yx becomes a new latent variable that reveals its value only when X = x, through the relation

X = x ⇒ Yx = Y    (3.7)

sometimes written (for binary X)

Y = xY1 + (1 − x)Y0

Beyond this relation (known as the "consistency assumption"), the investigator may ignore the fact that Yx is actually Y itself, only measured under different conditions (as in Figure 3.4c), and proceed to estimate the average causal effect, E(Yx′) − E(Yx), with all the machinery that statistics has developed for missing data. Moreover, since Equation 3.7 is also a theorem in the logic of structural counterfactuals (Pearl, 2009, Ch. 7), and a complete one,9 researchers in this camp are guaranteed never to obtain results that conflict with those derived in the structural framework.

The weakness of this approach surfaces in the problem formulation phase where, deprived of diagrams and structural equations, researchers are forced to express the (inescapable) assumption set A in a language totally removed from scientific knowledge, for example, in the form of conditional independencies among counterfactual variables (see Pearl, 2010a). For example, to express the fact that, in a randomized trial, X is independent of both ε2 and ε3 (Figure 3.4a), the investigator would need to write the cryptic "strong ignorability" expression X ⊥⊥ {Z1, Z0, Y00, Y01, Y10, Y11}.

To overcome this obstacle, Pearl (2009) has devised a way of combining the best features of the two approaches. It is based on encoding causal assumptions in the language of diagrams or structural equations, translating these assumptions into counterfactual notation, performing derivations in the algebraic language of counterfactuals, using axioms derived from Equation 3.6, and, finally, interpreting the result in plain causal language. The mediation problem discussed in the next section illustrates how such symbiosis clarifies the conceptualization and estimation of direct and indirect effects, a task that has lingered on for several decades.

THE TESTABLE IMPLICATIONS
OF STRUCTURAL MODELS

This section deals with the testable implications of structural models, sometimes called "overidentifying restrictions," and ways of reading them from the graph.

The d-Separation Criterion

Although each causal assumption in isolation cannot be tested in nonexperimental studies, the sum total of all causal assumptions in a model often has testable implications. The chain model of Figure 3.3a, for example, encodes seven causal assumptions, each corresponding to a missing arrow or a missing double-arrow between a pair of variables. None of those assumptions is testable in isolation, yet the totality of all seven assumptions implies that Z is unassociated with Y in every stratum of X. Such testable implications can be read off the diagrams using a graphical criterion known as d-separation (Pearl, 1986, 1988), which is the basis of all methods of discovering structure from data.

Definition 3 (d-separation)
A set S of nodes is said to block a path p if either (1) p contains at least one arrow-emitting node that is in S, or (2) p contains at least one collision node that is outside S and has no descendant in S. If S blocks all paths from set X to set Y, it is said to "d-separate X and Y," and then, it can be shown that variables X and Y are independent given S, written X ⊥⊥ Y | S.10

To illustrate, the path UZ → Z → X → Y in Figure 3.3a is blocked by S = {Z} and by S = {X}, since each emits an arrow along that path. Consequently, we can infer that the conditional independencies UZ ⊥⊥ Y | Z and UZ ⊥⊥ Y | X will be satisfied in any probability function that this model can generate, regardless of how we parametrize the arrows. Likewise, the path UZ → Z → X ← UX is blocked by the null set {∅}, but it is not blocked by S = {Y}, since Y is a descendant of the collision node X. Consequently, the marginal independence UZ ⊥⊥ UX will hold in the distribution, but

UZ ⊥⊥ UX | Y may or may not hold. This special handling of collision nodes (or colliders, e.g., Z → X ← UX) reflects a general phenomenon known as Berkson's paradox (Berkson, 1946), whereby observations on a common consequence of two independent causes render those causes dependent. For example, the outcomes of two independent coins are rendered dependent by the testimony that at least one of them is a tail.

The testable implications of any given model are vividly advertised by its associated graph G. Each d-separation condition in G corresponds to a conditional independence test that can be performed on the data to support or refute the validity of M. These can easily be enumerated by attending to each missing edge in the graph and selecting a set of variables that d-separate the pair of variables corresponding to that missing edge. For example, in Figure 3.6, three of the missing edges are Z1–Z2, Z1–Y, and Z2–X, with separating sets {∅}, {X, Z2, Z3}, and {Z1, Z3}, respectively. Accordingly, the testable implications of M include Z1 ⊥⊥ Z2, Z1 ⊥⊥ Y | {X, Z2, Z3}, and Z2 ⊥⊥ X | {Z1, Z3}.

In linear systems, these conditional independence constraints translate into zero partial correlations, or zero coefficients in the corresponding regression equations. For example, the three implications above translate into the following constraints: rZ1Z2 = 0, rYZ1·XZ2Z3 = 0, and rZ2X·Z1Z3 = 0.

Such tests are easily conducted by routine regression techniques, and they provide valuable diagnostic information for model modification, in case any of them fail (see Pearl, 2009, pp. 143–145). Software routines for automatic detection of all such tests, as well as other implications of graphical models, are reported in Kyono (2010).

If the model is Markovian (i.e., acyclic with uncorrelated errors), then the d-separation conditions are the only testable implications of the model. If the model contains correlated errors, additional constraints are imposed, called "dormant independence" (Shpitser & Pearl, 2008) or Verma's constraints (McDonald, 2002; Verma & Pearl, 1990), generated by missing links that would otherwise be identified (e.g., the missing link from X to W in Figure 3.7). This means that traditional algebraic methods of recognizing "overidentified models," deriving "overidentifying restrictions," and determining "parameter identification" (Kenny & Milan, 2012)11 can be replaced by simple graphical conditions, advertised by nonadjacent variables in the model.

Equivalent Models

D-separation also defines conditions for model equivalence that are easily ascertained in Markovian models (Verma & Pearl, 1990) as well as semi-Markovian models (Ali, Richardson, & Spirtes, 2009). These mathematically proven conditions should amend the restricted (and error-prone) rules currently prevailing in SEM research (Kline, 2011; Williams, 2012), based primarily on the replacement rules of Lee and Hershberger (1990). The general necessary rule for any modification of a model to preserve equivalence is that the modification not create or destroy any d-separation condition in the modified graph.

For example, consider the model of Figure 3.7. According to the replacement criterion of Lee and Hershberger, we can replace the arrow X → Y with a double-arrow edge X ↔ Y (representing residual correlation) when all predictors (Z) of the effect variable (Y) are the same as those for the source variable (X) (see Hershberger, 2006). Unfortunately, the postreplacement model imposes a constraint, rWZ·Y = 0, that is not imposed by the prereplacement model.

Z1
Z2 Z X W
W1
Z3 W2

X
W3
Y Y

FIGURE 3.6. A Markovian model illustrating d-separation. FIGURE 3.7. Showing discrepancy between Lee and Hersh-
Error terms are assumed mutually independent and not berger’s replacement rule and d-separation, which forbids the
shown explicitly. replacement of X → Y by X ↔ Y.
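The d-separation claims made about Figure 3.6 can be verified mechanically. Below is a minimal brute-force checker (not from the chapter); the edge list for Figure 3.6 is reconstructed from the paths enumerated in the text, so treat it as an assumption:

```python
from itertools import chain

# Edge list for Figure 3.6, reconstructed from the paths cited in the
# text (an assumption, since the diagram itself is not reproduced here).
FIG_3_6 = [("Z1", "Z3"), ("Z2", "Z3"), ("Z1", "W1"), ("W1", "X"),
           ("Z2", "W2"), ("W2", "Y"), ("Z3", "X"), ("Z3", "Y"),
           ("X", "W3"), ("W3", "Y")]

def d_separated(edges, x, y, s):
    """Brute-force d-separation test for two single nodes in a DAG."""
    children, parents = {}, {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
        parents.setdefault(b, []).append(a)

    def descendants(n):
        out, stack = set(), [n]
        while stack:
            for c in children.get(stack.pop(), []):
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def paths(cur, visited):          # all simple undirected paths x..y
        if cur == y:
            yield visited
            return
        for nxt in chain(children.get(cur, []), parents.get(cur, [])):
            if nxt not in visited:
                yield from paths(nxt, visited + [nxt])

    for p in paths(x, [x]):
        blocked = False
        for i in range(1, len(p) - 1):
            a, m, b = p[i - 1], p[i], p[i + 1]
            collider = m in children.get(a, []) and m in children.get(b, [])
            if collider:
                if m not in s and not (descendants(m) & s):
                    blocked = True    # closed collider blocks the path
                    break
            elif m in s:
                blocked = True        # conditioned chain/fork blocks it
                break
        if not blocked:
            return False              # an open path d-connects x and y
    return True
```

Running it confirms the three testable implications read off the missing edges: Z1 ⊥⊥ Z2, Z1 ⊥⊥ Y | {X, Z2, Z3}, and Z2 ⊥⊥ X | {Z1, Z3}.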

That the postreplacement model imposes this constraint can be seen from the fact that, conditioned on Y, the path Z → Y ← X ↔ W is unblocked and will become blocked if replaced by Z → Y ↔ X ↔ W. The same applies to the path Z → X ↔ W, since Y would cease to be a descendant of X.

Identification Using Graphs:
The Back-Door Criterion

Consider an observational study where we wish to find the effect of X on Y—for example, treatment on response—and assume that the factors deemed relevant to the problem are structured as in Figure 3.6; some of these factors may be unmeasurable, such as genetic trait or lifestyle; others are measurable, such as gender, age, and salary level. Using the terminology of the third section, our problem is to determine whether the query Q = P(y | do(x)) is identifiable given the model and, if so, to derive an estimand Q(P) to guide the estimation of Q.

This problem is typically solved by "adjustment," that is, selecting a subset of factors for measurement so that comparison of treated versus untreated subjects having the same values of the selected factors gives the correct treatment effect in that subpopulation of subjects. Such a set of factors is called a "sufficient set" or "admissible set" for adjustment.

The following criterion, named "back-door" in Pearl (1993), provides a graphical method of selecting admissible sets of factors and demonstrates that nonparametric queries such as Q = P(y | do(x)) can sometimes be identified with no knowledge of the functional form of the equations or the distributions of the latent variables in M.

Definition 4 (admissible sets—the back-door criterion)
A set S is admissible (or "sufficient") if two conditions hold:
1. No element of S is a descendant of X.
2. The elements of S "block" all "back-door" paths from X to Y—namely, all paths that end with an arrow pointing to X.

In this criterion, "blocking" is interpreted as in Definition 3. Based on this criterion we see, for example in Figure 3.6, that the sets {Z1, Z2, Z3}, {Z1, Z3}, {W1, Z3}, and {W2, Z3} are each sufficient for adjustment, because each blocks all back-door paths between X and Y. The set {Z3}, however, is not sufficient for adjustment because it does not block the path X ← W1 ← Z1 → Z3 ← Z2 → W2 → Y.

The intuition behind the back-door criterion is as follows: The back-door paths in the diagram carry spurious associations from X to Y, while the paths directed along the arrows from X to Y carry causative associations. Blocking the former paths (by conditioning on S) ensures that the measured association between X and Y is purely causal, namely, that it correctly represents the target quantity: the causal effect of X on Y. The reason for excluding descendants of X (e.g., W3 or any of its descendants), and conditions for relaxing this restriction, are given in Pearl (2009, pp. 338–341).

Identifying Parameters and Causal Effects

The back-door criterion provides a simple solution to many identification problems, in both linear and nonlinear models, and is summarized in the next theorem.

Theorem 1 (causal effects identification)
For any two disjoint sets of variables, X and Y, in a causal diagram G, the causal effect of X on Y is given by

P(Y = y | do(X = x)) = Σs P(Y = y | X = x, S = s) P(S = s)    (3.8)

where S is any set of covariates satisfying the back-door condition of Definition 4.

Since all factors on the right-hand side of the equation are estimable (e.g., by regression) from preinterventional data, the causal effect can likewise be estimated from such data without bias.

In linear systems, identified causal effect expressions like Equation 3.8 reduce to sums and products of partial regression coefficients. For example, if we wish to estimate the total effect tXY of X on Y in the linear version of Figure 3.6, we simply take the regression coefficient of Y on X, partialed on any sufficient set S, giving

tXY = rYX·S = rYX·Z1Z3 = rYX·W1Z3 = . . .

Current SEM practices do not take advantage of this capability to decide identification graphically, prior to obtaining data, and to estimate the identified quantities directly, by partialing out sufficient sets (see Kenny & Milan, 2012).
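The adjustment formula of Theorem 1 can be exercised on a minimal discrete example; the probability tables below are illustrative inventions, not taken from the chapter:

```python
# A single binary confounder S -> X, S -> Y, with X -> Y.  We compute
# P(Y=1 | do(X=1)) by the adjustment formula of Equation 3.8 and
# contrast it with the unadjusted conditional P(Y=1 | X=1).
# All probability tables are made-up illustrative numbers.
P_s = {0: 0.5, 1: 0.5}                       # P(S = s)
P_x_given_s = {0: 0.2, 1: 0.8}               # P(X = 1 | S = s)
P_y_given_xs = {(0, 0): 0.1, (1, 0): 0.5,    # P(Y = 1 | X = x, S = s)
                (0, 1): 0.4, (1, 1): 0.8}

# Equation 3.8: sum_s P(Y=1 | X=1, S=s) P(S=s)
p_do = sum(P_y_given_xs[(1, s)] * P_s[s] for s in (0, 1))

# Naive observational quantity: sum_s P(Y=1 | X=1, S=s) P(S=s | X=1)
p_x1 = sum(P_x_given_s[s] * P_s[s] for s in (0, 1))
p_naive = sum(P_y_given_xs[(1, s)] * P_x_given_s[s] * P_s[s] / p_x1
              for s in (0, 1))
# p_do = 0.65 while p_naive = 0.74: adjusting for S removes the
# spurious association carried by the back-door path X <- S -> Y.
```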

Rather, the prevailing practice is either to engage in lengthy algebraic manipulations, or to identify the model in its entirety by running ML routines on noisy data and hoping for their convergence. This is unfortunate because the target quantity may often be identifiable when the model as a whole is not (see Pearl, 2009, p. 151, for examples). Moreover, estimation accuracy deteriorates when we allow noisy data of irrelevant variables to corrupt the estimation of the target quantity (McDonald, 2004). The theory of d-separation and the back-door criterion enable us to focus the identification of target quantities on the relevant variables and extract an identifying estimand by inspection or through algorithmic routines (Kyono, 2010). We also note that, when applied to linear models, all identification conditions are valid for feedback systems as well.

The back-door criterion is only one among many patterns in the causal diagram that allow nonparametric identification. Another pattern, known as the "front door," has a totally different structure and still permits us to identify causal effects, by double adjustment (Pearl, 1995). The more general question of deciding when and by what means we can identify causal effects has received a complete answer through do-calculus—a set of three rules that transform do-expressions into modified do-expressions whenever appropriate conditions prevail in the diagram. Identification holds if and only if the rules of do-calculus succeed in removing the do-operator from the causal effect P(y | do(x)), thus reducing it to an expression in ordinary probability calculus (Shpitser & Pearl, 2006b).

Parametric Identification in Linear SEM

Remarkably, a close cousin of the back-door criterion has resolved an age-old identification problem in linear SEMs: Under what conditions can a path coefficient bXY be estimated by regression, and what variables should serve as the regressors? The answer is given by a criterion called "single door" (Pearl, 2009, p. 150), which reads:

Corollary 1 (the single-door criterion)
Let bXY be the structural coefficient labeling the arrow X → Y, and let rYX·S stand for the X coefficient (slope) in the regression of Y on X and S, namely, rYX·S = ∂/∂x E(Y | x, s). The equality bXY = rYX·S holds if
1. The set S contains no descendant of Y.
2. S blocks all paths between X and Y, except the direct path X → Y.

In Figure 3.7, for example, bXY equals rYX·Z, or the coefficient b1 in the regression Y = b1X + b2Z + ε, while bYW, labeling the arrow Y → W, is equal to rWY·XZ. Note that regressing W on Y and X alone is insufficient, for it would leave the path Y ← Z → X ↔ W unblocked. In a similar fashion we obtain bZY = rYZ·X and bZX = rXZ. If no set S can be found that satisfies the conditions of Corollary 1, then bXY cannot be reduced to a single regression coefficient, and other identification techniques may be invoked, for example, instrumental variables (Brito & Pearl, 2002a).

Recognizing Instrumental Variables

Instrumental variables is one of the oldest identification techniques devised for linear systems (Wright, 1928). The method relies on finding a variable Z that is correlated with X and is deemed uncorrelated with the error term in an equation (see Pearl, 2009, pp. 242–248, for a formal definition). While no statistical test can certify a variable as an instrument, the d-separation criterion permits us to identify such variables in the causal graph and use them to identify parameters that do not satisfy the condition of Corollary 1. Moreover, the graph also shows us how to turn variables into instruments when none exist. In Figure 3.6, for example, Z1 is not an instrumental variable for the effect of Z3 on Y because there is a directed path from Z3 to Y, via W1 and X. Controlling for X will not remedy the situation, because X, being a descendant of Z3, would unblock the path Z1 → Z3 ← Z2 → W2 → Y. However, controlling for W1 will render Z1 a legitimate instrumental variable, since all paths connecting Z1 to Y would go through Z3.

The general criterion is given by the following theorem.

Theorem 2 (identification using instrumental variables)
Let bXY stand for the path coefficient assigned to the arrow X → Y in a causal graph G. Parameter bXY is identified if there exists a pair (Z, W), where Z is a single node in G (not excluding Z = X), and W is a (possibly empty) set of nodes in G, such that
1. W consists of nondescendants of Y,
2. W d-separates Z from Y in the graph GXY formed by removing X → Y from G,
3. Z and X are d-connected, given W, in GXY.
Moreover, the estimand induced by the pair (Z, W) is given by

bXY = cov(Y, Z | W) / cov(X, Z | W)

Additional identification conditions for linear models are given in Pearl (2009, Ch. 5), McDonald (2002, 2004), and Brito and Pearl (2002a, 2002b), and implemented in Kyono (2010). For example, a sufficient model-identification condition resulting from these techniques is the "non-bow rule" (Brito & Pearl, 2002b), that is, that any pair of variables be connected by at most one type of edge. For example, one can add a bidirected arc between any two nonadjacent variables in Figure 3.6 and still be able to identify all model parameters.12 Complete graphical criteria for causal-effect identification in nonparametric models are developed in Tian and Pearl (2002) and Shpitser and Pearl (2006b).

Mediation: Direct and Indirect Effects

Decomposing Effects, Aims, and Challenges

The decomposition of effects into their direct and indirect components carries theoretical scientific importance, for it tells us "how nature works" and therefore enables us to predict behavior under a rich variety of conditions and interventions. For example, an investigator may be interested in assessing the extent to which the effect of a given variable can be reduced by weakening an intermediate process, standing between that variable and the outcome.

Structural equation models provide a natural language for analyzing path-specific effects and, indeed, a considerable literature on direct, indirect, and total effects has been authored by SEM researchers (Bollen, 1989) for both recursive and nonrecursive models. This analysis usually involves sums of powers of coefficient matrices, where each matrix represents the path coefficients associated with the structural equations.

Yet despite its ubiquity, the analysis of mediation has long been a thorny issue in the social and behavioral sciences (Baron & Kenny, 1986; MacKinnon, 2008), primarily because causal parameters and their regressional interpretations were often conflated, as in Holland (1995) and Sobel (2008). The difficulties were further amplified in nonlinear models, where sums and products are no longer applicable. As demands grew to tackle problems involving categorical variables and nonlinear interactions, researchers could no longer define direct and indirect effects in terms of structural or regressional coefficients, and all attempts to extend the linear paradigms of effect decomposition to nonlinear systems produced distorted results (MacKinnon, Lockwood, Brown, Wang, & Hoffman, 2007). The counterfactual reading of structural equations (Equation 3.6) enables us to redefine and analyze direct and indirect effects from first principles, uncommitted to distributional assumptions or a particular parametric form of the equations. This will be demonstrated in the next two subsections, using the mediation model of Figure 3.8, in which it is desired to find the direct and indirect effect of X on Y, mediated by Z.

FIGURE 3.8. A generic model depicting mediation through Z (a) with no confounders and (b) two confounders, W1 and W2.

Direct Effects

Conceptually, we can define the direct effect DEx,x′(Y)13 as the expected change in Y induced by changing X from x to x′, while keeping all mediating factors constant at whatever value they would have obtained under do(x) (Pearl, 2001; Robins & Greenland, 1992). Accordingly, Pearl (2001) defined the direct effect using counterfactual notation:

DEx,x′(Y) = E(Yx′,Zx) − E(Yx)    (3.9)

Here, Yx′,Zx represents the value that Y would attain under the operation of setting X to x′ and, simultaneously, setting Z to whatever value it would have obtained under the setting X = x. Given certain assumptions of "no confounding," it is possible to show (Pearl, 2001) that the direct effect can be reduced to a do-expression:

DEx,x′(Y) = Σzw [E(Y | do(x′, z), w) − E(Y | do(x, z), w)] P(z | do(x), w) P(w)    (3.10)

where W satisfies the back-door criterion relative to TE x , x′ (Y )  E (Yx′ −


= Yx ) DE x , x′ (Y ) − IE x′, x (Y ) (3.14)
both X → Z and (X, Z) → Y.
In particular, expression (Equation 3.10) is both In linear systems, where reversal of transitions amounts
valid and identifiable in Markovian models (i.e., no to negating the signs of their effects, we have the stan-
unobserved confounders) where each term on the right dard additive formula
can be reduced to a “do-free” expression using Equa-
tion 3.8, then estimated by regression. TE
= x , x′
(Y ) DE x, x′ (Y ) + IE x, x′ (Y ) (3.15)
For example, for the model in Figure 3.8b, Equation
3.10 reads Since each term above is based on an independent op-
erational definition, this equality constitutes a formal
DE x , x′ (Y ) = ∑∑ P ( w2 )  E (Y x′, z , w2 )
 ) justification for the additive formula used routinely in
z w linear systems.
)
− E (Y x, z , w2 )  (3.11)
2


∑ P ( z x, w1, w2 ) P ( w1 ) The Mediation Formula: A Simple Solution
w1 to a Thorny Problem
while for the confounding-free model of Figure 3.8(a), This subsection demonstrates how the solution provid-
we have ed in Equations 3.12 and 3.15 can be applied in assess-
ing mediation effects in nonlinear models. We use the
simple mediation model of Figure 3.8a, where all error
DE x , x′ (Y )
= ∑  E (Y x′, z ) − E (Y x, z )  P ( z x ) (3.12)
terms (not shown explicitly) are assumed to be mutual-
z
ly independent, with the understanding that adjustment
Equations 3.11 and 3.12 can be estimated by a two-step for appropriate sets of covariates W may be necessary
regression.

Indirect Effects

Remarkably, the definition of the direct effect (Equation 3.9) can be turned around and provide an operational definition for the indirect effect—a concept shrouded in mystery and controversy because it is impossible, by controlling any of the variables in the model, to disable the direct link from X to Y so as to let X influence Y solely via indirect paths.

The indirect effect, IE, of the transition from x to x′ is defined as the expected change in Y effected by holding X constant, at X = x, and changing Z to whatever value it would have attained had X been set to X = x′. Formally, this reads

IE_{x,x′}(Y) ≜ E[Y_{x,Z_{x′}}] – E(Y_x)   (3.13)

which is almost identical to the direct effect (Equation 3.9) save for exchanging x and x′ in the first term (Pearl, 2001).

Indeed, it can be shown that, in general, the total effect TE of a transition is equal to the difference between the direct effect of that transition and the indirect effect of the reverse transition. Formally,

TE_{x,x′}(Y) = DE_{x,x′}(Y) – IE_{x′,x}(Y)   (3.14)

… to achieve this independence (as in Equation 3.11) and that integrals should replace summations when dealing with continuous variables (Imai, Keele, & Yamamoto, 2010).

Combining Equations 3.12 and 3.14, the expression for the indirect effect, IE, becomes

IE_{x,x′}(Y) = Σ_z E(Y | x, z)[P(z | x′) – P(z | x)]   (3.16)

which provides a general formula for mediation effects, applicable to any nonlinear system, any distribution (of U), and any type of variables. Moreover, the formula is readily estimable by regression. Owing to its generality and ubiquity, I have referred to this expression as the “Mediation Formula” (Pearl, 2009, 2012).

The Mediation Formula represents the average increase in the outcome Y that the transition from X = x to X = x′ is expected to produce absent any direct effect of X on Y. Though based on solid causal principles, it embodies no causal assumption other than the generic mediation structure of Figure 3.8a. When the outcome Y is binary (e.g., recovery, or hiring), the ratio (1 – IE/TE) represents the fraction of responding individuals who owe their response to direct paths, while (1 – DE/TE) represents the fraction who owe their response to Z-mediated paths.
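As a minimal sketch, Equation 3.16 can be transcribed directly for a discrete mediator; the conditional expectations and mediator distributions below are illustrative values of my own choosing, not taken from the chapter:

```python
# Direct transcription of the Mediation Formula (Eq. 3.16) for a binary
# mediator Z; the tables of E(Y | x, z) and P(z | x) are made-up examples.
E_y = {  # E(Y | x, z)
    (0, 0): 0.1, (0, 1): 0.4,
    (1, 0): 0.5, (1, 1): 0.9,
}
P_z = {  # P(Z = z | x)
    0: {0: 0.7, 1: 0.3},
    1: {0: 0.2, 1: 0.8},
}

def IE(x, x_prime):
    """IE_{x,x'}(Y) = sum_z E(Y | x, z) [P(z | x') - P(z | x)]."""
    return sum(E_y[(x, z)] * (P_z[x_prime][z] - P_z[x][z]) for z in (0, 1))

print(round(IE(0, 1), 3))   # 0.15: the Z-mediated change for x=0 -> x'=1
```

Note that only the conditional expectation E(Y | x, z) and the conditional distribution P(z | x) enter the computation, which is what makes the formula estimable by regression.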

The Mediation Formula tells us that IE depends only on the expectation of the counterfactual Y_xz, not on its functional form f_Y(x, z, u_Y) or its distribution P(Y_xz = y). It calls therefore for a two-step regression that, in principle, can be performed nonparametrically. In the first step, we regress Y on X and Z, and obtain the estimate

g(x, z) = E(Y | x, z)   (3.17)

for every (x, z) cell. In the second step, we fix x and regard g(x, z) as a function g_x(z) of Z. We now estimate the conditional expectation of g_x(z), conditional on X = x′ and X = x, respectively, and take the difference

IE_{x,x′}(Y) = E_{Z|X}[g_x(Z) | x′] – E_{Z|X}[g_x(Z) | x]   (3.18)

Nonparametric estimation is not always practical. When Z consists of a vector of several mediators, the dimensionality of the problem might prohibit the estimation of E(Y | x, z) for every (x, z) cell, and the need arises to use parametric approximation. We can then choose any convenient parametric form for E(Y | x, z) (e.g., linear, logit, probit), estimate the parameters separately (e.g., by regression or ML methods), insert the parametric approximation into Equation 3.16, and estimate its two conditional expectations (over z) to get the mediated effect (VanderWeele, 2009).

Let us examine what the Mediation Formula yields when applied to the linear version of Figure 3.8a, which reads

x = u_X
z = b0 + bx + u_Z   (3.19)
y = c0 + ax + gz + u_Y

with u_X, u_Y, and u_Z uncorrelated, zero-mean error terms. Computing the conditional expectation in Equation 3.16 gives

E(Y | x, z) = E(c0 + ax + gz + u_Y) = c0 + ax + gz

and yields

IE_{x,x′}(Y) = Σ_z (ax + gz)[P(z | x′) – P(z | x)]   (3.20)
            = g[E(Z | x′) – E(Z | x)]
            = (x′ – x)(bg)   (3.21)
            = (x′ – x)(t – a)   (3.22)

where t is the slope of the total effect:

t = [E(Y_{x′}) – E(Y_x)]/(x′ – x) = a + bg

We thus obtained the standard expressions for indirect effects in linear systems, which can be estimated either as a difference t – a of two regression coefficients (Equation 3.22) or as a product bg of two regression coefficients (Equation 3.21) (see MacKinnon et al., 2007). These two strategies do not generalize to nonlinear systems; direct application of Equation 3.16 is necessary (Pearl, 2010a).

To understand the difficulty, assume that the correct model behind the data contains a product term dxz added to Equation 3.19, giving:

y = c0 + ax + gz + dxz + u_Y

Further assume that we correctly account for this added term and, through sophisticated regression analysis, we obtain accurate estimates of all parameters in the model. It is still not clear what combinations of parameters measure the direct and indirect effects of X on Y, or, more specifically, how to assess the fraction of the total effect that is explained by mediation and the fraction that is owed to mediation. In linear analysis, the former fraction is captured by the product bg/t (Equation 3.21), the latter by the difference (t – a)/t (Equation 3.22), and the two quantities coincide. In the presence of interaction, however, each fraction demands a separate analysis, as dictated by the Mediation Formula.

To witness, substituting the nonlinear equation in Equations 3.12, 3.15, and 3.16, and assuming x = 0 and x′ = 1, yields the following effect decomposition:

DE = a + b0d
IE = bg
TE = a + b0d + b(g + d)
   = DE + IE + bd

We therefore conclude that the portion of output change for which mediation would be sufficient is

IE = bg

while the portion for which mediation would be necessary is

TE – DE = bg + bd
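The two decompositions above can be checked numerically. The sketch below uses illustrative parameter values of my own choosing (not from the text) and evaluates the population-level quantities directly; with d = 0 it reproduces the linear results, and with d ≠ 0 it confirms TE = DE + IE + bd:

```python
# Population-level check of the mediation decompositions, using
# illustrative parameter values (not from the chapter's example).
b0, b, a, g, d = 0.5, 0.7, 0.3, 0.6, 0.4  # z = b0 + b*x;  y = c0 + a*x + g*z + d*x*z

def E_z(x):     # E(Z | do(x)) = b0 + b*x
    return b0 + b * x

def E_y(x, z):  # E(Y | x, z) minus the constant c0, which cancels in every effect
    return a * x + g * z + d * x * z

# Mediation Formula quantities for the transition x=0 -> x'=1
DE = E_y(1, E_z(0)) - E_y(0, E_z(0))   # hold Z at its x=0 value
IE = E_y(0, E_z(1)) - E_y(0, E_z(0))   # hold X at 0, move Z to its x'=1 value (Eq. 3.16)
TE = E_y(1, E_z(1)) - E_y(0, E_z(0))   # total effect

assert abs(DE - (a + b0 * d)) < 1e-12
assert abs(IE - b * g) < 1e-12
assert abs(TE - (DE + IE + b * d)) < 1e-12       # interaction contributes bd, not bg
assert abs((TE - DE) - (b * g + b * d)) < 1e-12  # portion for which mediation is necessary
# With d = 0, the linear case is recovered: IE = bg = t - a, where t = a + bg.
```

Because E(Y | x, z) is linear in z for fixed x, evaluating it at E(Z | x) is equivalent to averaging over P(z | x), so these one-line evaluations implement Equations 3.12 and 3.16 exactly for this model.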
3. The Causal Foundations of Structural Equation Modeling  67

We note that, due to interaction, a direct effect can be sustained even when the parameter a vanishes and, moreover, a total effect can be sustained even when both the direct and indirect effects vanish. This illustrates that estimating parameters in isolation tells us little about the effect of mediation and, more generally, that mediation and moderation are intertwined and cannot be assessed separately.

If the policy evaluated aims to prevent the outcome Y by way of weakening the mediating pathways, the target of analysis should be the difference TE – DE, which measures the highest prevention potential of any such policy. If, on the other hand, the policy aims to prevent the outcome by weakening the direct pathway, the target of analysis should shift to IE, for TE – IE measures the highest preventive potential of this type of policy.

The main power of the Mediation Formula shines in studies involving categorical variables, especially when we have no parametric model of the data-generating process. To illustrate, consider the case where all variables are binary, still allowing for arbitrary interactions and arbitrary distributions of all processes. The low dimensionality of the binary case permits both a nonparametric solution and an explicit demonstration of how mediation can be estimated directly from the data. Generalizations to multivalued outcomes are straightforward.

Assume that the model of Figure 3.8a is valid and that the observed data is given by Table 3.3. The factors E(Y | x, z) and P(Z | x) can be readily estimated as shown in the two right-most columns of Table 3.3 and, when substituted in Equations 3.12, 3.15, and 3.16, yield

DE = (g10 – g00)(1 – h0) + (g11 – g01)h0   (3.23)

IE = (h1 – h0)(g01 – g00)   (3.24)

TE = g11h1 + g10(1 – h1) – [g01h0 + g00(1 – h0)]   (3.25)

We see that logistic or probit regression is not necessary; simple arithmetic operations suffice to provide a general solution for any conceivable data set, regardless of the data-generating process.

Numerical Example

To anchor these formulas in a concrete example, let us assume that X = 1 stands for a drug treatment, Y = 1 for recovery, and Z = 1 for the presence of a certain enzyme in a patient’s blood that appears to be stimulated by the treatment. Assume further that the data described in Tables 3.4 and 3.5 was obtained in a randomized clinical trial and that our research question is whether Z mediates the action of X on Y, or is merely a catalyst that accelerates the action of X on Y.

Substituting this data into Equations 3.23 to 3.25 yields

DE = (0.40 – 0.20)(1 – 0.40) + (0.80 – 0.30)(0.40) = 0.32
IE = (0.75 – 0.40)(0.30 – 0.20) = 0.035
TE = 0.80 × 0.75 + 0.40 × 0.25 – (0.30 × 0.40 + 0.20 × 0.60) = 0.46
IE/TE = 0.07   DE/TE = 0.696   1 – DE/TE = 0.304

TABLE 3.3. Computing the Mediation Formula for the Model in Figure 3.8a, with X, Y, Z Binary

Number of samples   X   Z   Y    E(Y | x, z) = gxz        E(Z | x) = hx
n1                  0   0   0    g00 = n2/(n1 + n2)       h0 = (n3 + n4)/(n1 + n2 + n3 + n4)
n2                  0   0   1
n3                  0   1   0    g01 = n4/(n3 + n4)
n4                  0   1   1
n5                  1   0   0    g10 = n6/(n5 + n6)       h1 = (n7 + n8)/(n5 + n6 + n7 + n8)
n6                  1   0   1
n7                  1   1   0    g11 = n8/(n7 + n8)
n8                  1   1   1
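The estimation routine implied by Table 3.3 and Equations 3.23 to 3.25 amounts to a few lines of arithmetic; the cell counts n1 through n8 below are made up for illustration:

```python
# Mediation measures for binary X, Y, Z from the cell counts of Table 3.3.
# The counts below are fabricated for illustration; any n1..n8 will do.
n1, n2, n3, n4, n5, n6, n7, n8 = 40, 10, 12, 18, 15, 10, 15, 60

# Conditional expectations, as in the two right-most columns of Table 3.3
g00 = n2 / (n1 + n2)                    # E(Y | X=0, Z=0)
g01 = n4 / (n3 + n4)                    # E(Y | X=0, Z=1)
g10 = n6 / (n5 + n6)                    # E(Y | X=1, Z=0)
g11 = n8 / (n7 + n8)                    # E(Y | X=1, Z=1)
h0 = (n3 + n4) / (n1 + n2 + n3 + n4)    # E(Z | X=0)
h1 = (n7 + n8) / (n5 + n6 + n7 + n8)    # E(Z | X=1)

DE = (g10 - g00) * (1 - h0) + (g11 - g01) * h0                # Eq. 3.23
IE = (h1 - h0) * (g01 - g00)                                  # Eq. 3.24
TE = g11 * h1 + g10 * (1 - h1) - (g01 * h0 + g00 * (1 - h0))  # Eq. 3.25

# Sanity check: Eq. 3.25 reduces to E(Y | X=1) - E(Y | X=0)
assert abs(TE - ((n6 + n8) / (n5 + n6 + n7 + n8)
                 - (n2 + n4) / (n1 + n2 + n3 + n4))) < 1e-12
```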

TABLE 3.4. How Parameter gxz in Table 3.3 Is Computed in Experimental Example

Treatment X   Enzyme present Z   Percentage cured gxz = E(Y | x, z)
YES           YES                g11 = 80%
YES           NO                 g10 = 40%
NO            YES                g01 = 30%
NO            NO                 g00 = 20%

TABLE 3.5. How Parameter hx in Table 3.3 Is Computed in Experimental Example

Treatment X   Percentage with Z present
NO            h0 = 40%
YES           h1 = 75%

We conclude that 30.4% of all recoveries is owed to the capacity of the treatment to enhance the secretion of the enzyme, while only 7% of recoveries would be sustained by enzyme enhancement alone. The policy implication of such a study would be that efforts to develop a cheaper drug, identical to the one studied, but lacking the potential to stimulate enzyme secretion, would face a reduction of 30.4% in recovery cases. More decisively, proposals to substitute the drug with one that merely mimics its stimulant action on Z but has no direct effect on Y are bound for failure; the drug evidently has a beneficial effect on recovery that is independent of, though enhanced by, enzyme stimulation.

In comparing these results to those produced by conventional mediation analyses, we should note that conventional methods do not define direct and indirect effects in a setting where the underlying process is unknown. MacKinnon (2008, Ch. 11), for example, analyzes categorical data using logistic and probit regressions, and constructs effect measures using products and differences of the parameters in those regressional forms. This strategy is not compatible with the causal interpretation of effect measures, even when the parameters are precisely known; IE and DE may be extremely complicated functions of those regression coefficients (Pearl, 2012). Fortunately, those coefficients need not be estimated at all; effect measures can be estimated directly from the data, circumventing the parametric analysis altogether, as shown in Equations 3.23 to 3.25.

Attempts to extend the difference and product heuristics to nonparametric analysis have encountered ambiguities that conventional analysis fails to resolve. The product-of-coefficients heuristic advises us to multiply the unit effect of X on Z,

Cb = E(Z | X = 1) – E(Z | X = 0) = h1 – h0

by the unit effect of Z on Y given X,

Cg = E(Y | X = x, Z = 1) – E(Y | X = x, Z = 0) = gx1 – gx0

but does not specify on what value we should condition X. Equation 3.24 resolves this ambiguity by determining that Cg should be conditioned on X = 0; only then would the product CbCg yield the correct mediation measure, IE.

The difference-in-coefficients heuristic instructs us to estimate the direct effect coefficient

Ca = E(Y | X = 1, Z = z) – E(Y | X = 0, Z = z) = g1z – g0z

and subtract it from the total effect, but does not specify on what value we should condition Z. Equation 3.23 determines that the correct way of estimating Ca would be to condition on both Z = 0 and Z = 1 and take their weighted average, with h0 = P(Z = 1 | X = 0) serving as the weighting function.

To summarize, the Mediation Formula dictates that in calculating DE, we should condition on both Z = 1 and Z = 0 and average, while in calculating IE, we should condition on only one value, X = 0, and no average need be taken.

The difference and product heuristics are both legitimate, with each seeking a different effect measure. The difference heuristic, leading to TE – DE, seeks to measure the percentage of units for which mediation was necessary. The product heuristic, on the other hand, leading to IE, seeks to estimate the percentage of units for which mediation was sufficient. The former informs policies aiming to modify the mediating pathways, while the latter informs those aiming to modify the direct pathway.

In addition to providing causally sound estimates for mediation effects, the Mediation Formula also enables researchers to evaluate analytically the effectiveness of various parametric specifications relative to any as-
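The arithmetic of the drug/enzyme example, together with the disambiguation of the product heuristic, can be replicated in a few lines (values copied from Tables 3.4 and 3.5):

```python
# Values from Tables 3.4 and 3.5 (drug/enzyme example)
g11, g10, g01, g00 = 0.80, 0.40, 0.30, 0.20   # E(Y | x, z)
h1, h0 = 0.75, 0.40                           # E(Z | x)

DE = (g10 - g00) * (1 - h0) + (g11 - g01) * h0                # Eq. 3.23
IE = (h1 - h0) * (g01 - g00)                                  # Eq. 3.24
TE = g11 * h1 + g10 * (1 - h1) - (g01 * h0 + g00 * (1 - h0))  # Eq. 3.25

print(round(DE, 3), round(IE, 3), round(TE, 3))   # 0.32 0.035 0.46
print(round(1 - DE / TE, 3))                      # 0.304: fraction owed to Z-mediated paths

# The product Cb*Cg recovers IE only when Cg is conditioned on X = 0:
Cb = h1 - h0
assert abs(Cb * (g01 - g00) - IE) < 1e-12   # conditioning on X = 0 works
assert abs(Cb * (g11 - g10) - IE) > 0.05    # conditioning on X = 1 does not
```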

sumed model. This type of analytical “sensitivity analysis” has been used extensively in statistics for parameter estimation but could not be applied to mediation analysis, owing to the absence of an objective target quantity that captures the notion of indirect effect in both linear and nonlinear systems, free of parametric assumptions. The Mediation Formula of Equation 3.16 explicates this target quantity formally and casts it in terms of estimable quantities. It has also been used by Imai and colleagues (2010) to examine the robustness of empirical findings to the possible existence of unmeasured confounders.

The derivation of the Mediation Formula was facilitated by taking seriously the graphical–counterfactual–structural symbiosis spawned by the surgical interpretation of counterfactuals (Equation 3.6). In contrast, when the mediation problem is approached from an exclusivist potential-outcome viewpoint, void of the structural guidance of Equation 3.6, counterintuitive definitions ensue, carrying the label “principal stratification” (Rubin, 2004), which are at variance with common understanding of direct and indirect effects. For example, the direct effect is definable only in units absent of indirect effects. This means that a grandfather would be deemed to have no direct effect on his grandson’s behavior in families where he has had some effect on the father. This precludes from the analysis all typical families, in which a father and a grandfather have simultaneous, complementary influences on children’s upbringing. In linear systems, to take an even sharper example, the direct effect would be undefined whenever indirect paths exist from the cause to its effect. The emergence of such paradoxical conclusions underscores the wisdom, if not necessity, of a symbiotic analysis, in which the counterfactual notation Yx(u) is governed by its structural definition, Equation 3.6.¹⁴

EXTERNAL VALIDITY AND TRANSPORTABILITY

Generalizing empirical findings to new environments, settings, or populations, often called “external validity,” is critical in most scientific explorations since, invariably, the conclusions of such explorations are intended to be applied in settings that differ from those in the study. Remarkably, the theory of external validity has not advanced since Donald Campbell and Julian Stanley (1966) recognized and defined the term. While several efforts were attempted in economics (Manski, 2007) and psychology (Shadish, Cook, & Campbell, 2002), the statistical language available to researchers prior to the advent of graphical models was not sufficiently powerful for the task. External validity requires a formal language within which the notion of an “experimental setting” can be given a precise characterization and differences among settings can be encoded and analyzed.

I next illustrate a particular variant of generalizability, called “transportability,” that has received a complete formal treatment using the do-calculus. Transportability is defined as a license to transfer causal effects learned in experimental studies to a new population, in which only observational studies can be conducted. Using a representation called “selection diagrams” to encode knowledge about differences and commonalities among populations of interest, Pearl and Bareinboim (2014) have reduced questions of transportability to symbolic derivations in the do-calculus and developed procedures for deciding whether causal effects in the target population can be inferred from experimental findings in the study population.

A selection diagram is a causal diagram annotated with new variables, called S-nodes, which point to the mechanisms where discrepancies between the two populations are suspected to take place (see Figure 3.9). The task of deciding if transportability is feasible now reduces to a syntactic problem of separating (using the do-calculus) the do-operator from the S-variables in the query expression P(y | do(x), z, s).

Theorem 3 (Pearl & Bareinboim, 2011). Let D be the selection diagram characterizing two populations, p and p*, and S a set of selection variables in D. The relation R = P*(y | do(x), z) is transportable from p to p* if and only if the expression P(y | do(x), z, s) is reducible, using the rules of do-calculus, to an expression in which S appears only as a conditioning variable in do-free terms.

While Theorem 3 does not specify the sequence of rules leading to the needed reduction (if one exists), a complete and effective graphical procedure devised by Bareinboim and Pearl (2014) also produces a transport formula whenever possible. Each transport formula determines what information needs to be extracted from the experimental and observational studies and how it ought to be combined to yield an unbiased estimate of the relation R = P(y | do(x), s) in the target population p*.
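As an illustrative sketch of such pooling (synthetic numbers of my own construction, not from the chapter), a transport formula that reweights an experimentally estimated z-specific effect from the study population by the mediator distribution observed in the target population can be coded directly:

```python
# Sketch of transport-formula pooling: combine the z-specific causal effect
# P(y | do(x), z), estimated by experiment in the source population, with the
# distribution of Z observed in the target population. Numbers are made up.
p_y_dox_z = {0: 0.30, 1: 0.70}   # P(Y = 1 | do(X = 1), Z = z), source (experimental)
p_z_target = {0: 0.2, 1: 0.8}    # P(Z = z) in the target (observational)

# Transported effect: sum over z of P(y | do(x), z) * P(z in target)
p_y_dox_target = sum(p_y_dox_z[z] * p_z_target[z] for z in p_z_target)
print(round(p_y_dox_target, 2))   # 0.62
```

Which factors may be taken from which population, and whether the weighting distribution must further be conditioned on x, is exactly what the transport formula derived from the selection diagram dictates.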

FIGURE 3.9. Selection diagrams depicting differences in populations. In (a), the two populations differ in age distributions. In (b), the populations differ in how reading skills (Z) depend on age (an unmeasured variable, represented by the hollow circle) and the age distributions are the same. In (c), the populations differ in how Z depends on X.

For example, the transport formulas induced by the three models in Figure 3.9 are given by

(a) P(y | do(x), s) = Σ_z P(y | do(x), z) P(z | s)
(b) P(y | do(x), s) = P(y | do(x))
(c) P(y | do(x), s) = Σ_z P(y | do(x), z) P(z | x, s)

Each of these formulas satisfies Theorem 3, and each describes a different procedure of pooling information from p and p*.

For example, (c) states that to estimate the causal effect of X on Y in the target population p*, we must estimate the z-specific effect P(y | do(x), z) in the source population p and average it over z, weighted by P(z | x, s), that is, the conditional probability P(z | x) estimated at the target population p*.

A generalization of transportability theory to multiple environments has led to a method called “data fusion” (Bareinboim & Pearl, 2016), aimed at combining results from many experimental and observational studies, each conducted on a different population and under a different set of conditions, so as to synthesize an aggregate measure of effect size in yet another environment, different from the rest. This fusion problem has received enormous attention in the health and social sciences, where it is typically handled inadequately by a statistical method called “meta-analysis,” which “averages out” differences instead of rectifying them.

Using multiple “selection diagrams” to encode commonalities among studies, Bareinboim and Pearl (2013, 2014) “synthesized” an estimator that is guaranteed to provide an unbiased estimate of the desired quantity based on information that each study shares with the target environment. Remarkably, a consistent estimator may be constructed from multiple sources even in cases where it is not constructible from any one source in isolation.

Another problem that falls under the Data Fusion umbrella is that of “Selection Bias” (Bareinboim, Tian, & Pearl, 2014), which requires a generalization from a subpopulation selected for a study to the population at large, the target of the intended policy.

Selection bias is induced by preferential selection of units for data analysis, usually governed by unknown factors including treatment, outcome, and their consequences, and represents a major obstacle to valid causal and statistical inferences. It cannot be removed by randomized experiments and can rarely be detected in either experimental or observational studies. For instance, subjects recruited for a medical trial are typically motivated by financial incentives or expectations to benefit from the treatment. Since the sample no longer represents the population for which the treatment is intended, biased estimates will be produced regardless of how many samples were collected. The analysis of Bareinboim et al. (2014) identifies conditions under which such nonrepresentative selection of units can be neutralized.

RECOVERY FROM MISSING DATA

Although the study of missing data has been part of SEM research since the 1980s (Muthén, Kaplan, & Hollis, 1987), this work was almost entirely tied to Rubin’s theory and taxonomy of missing data problems (Rubin, 1976); therefore, it suffers from basic limitations along three dimensions: transparency, estimability, and testability.

• Transparency: The criteria distinguishing different levels of the taxonomy are cognitively formidable, making it almost impossible to decide what type of missingness is present in one’s data and, consequently, what tools would be appropriate for analysis or estimation.
• Estimability: Users cannot ascertain whether the parameter of interest can be estimated consistently from the partially observed data available and/or whether the estimate obtained by any given method is consistent.
• Testability: It is impossible to tell if any of the model’s assumptions is incompatible with the available data (corrupted by missingness).

These three limitations have been lifted recently using “missingness graphs”—a graphical encoding of the reasons for missingness (Mohan & Pearl, 2021; Thoemmes & Mohan, 2015). Significantly, Rubin’s taxonomy has been replaced by a variable-based taxonomy of “missing at random” (MAR) categories that researchers can both comprehend and test against data. In particular, simple procedures were devised that operate on the missingness diagram and provide meaningful performance guarantees in broad categories of missing data problems, including when data are missing not at random (MNAR). These include testability conditions for both MAR and MNAR categories.

More generally, the missing-data problem was shown to be a causal, not a statistical, problem. The statistical terminology that has dominated the SEM literature in the past is incapable of capturing the assumptions needed for processing missing data problems.

CONCLUSION

This chapter casts the methodology of SEM as a causal inference engine that takes qualitative causal assumptions, data, and queries as inputs and produces quantitative causal claims, conditional on the input assumptions, together with data-fitness ratings according to well-defined statistical tests.

I have shown that graphical encodings of the input assumptions can also be used as efficient mathematical tools for identifying testable implications, deciding query identification, and generating estimable expressions for causal and counterfactual quantities. I discussed the logical equivalence of the structural and potential-outcome frameworks and demonstrated the advantages of a symbiotic approach by offering a simple solution to the mediation problem for models with categorical data.

Finally, I have sketched progress in two problem areas that have been lingering for decades, external validity and missing data, for which complete algorithms have been developed using causal graphical models. An issue that was not discussed in this chapter is the problem of going from population data to estimating individual behavior, as well as identifying situation-specific causes of effects. I refer the reader to Pearl (2015a) and Li and Pearl (2019), where these issues receive formal treatments.

Some researchers would naturally prefer a methodology in which claims are less sensitive to judgmental assumptions; unfortunately, no such methodology exists. The relationship between assumptions and claims is a universal one—namely, for every set A of assumptions (knowledge) there is a unique set of conclusions C that one can deduce from A, given the data, regardless of the method used. The completeness results of Shpitser and Pearl (2006a) imply that SEM operates at the boundary of this universal relationship; no method can do better without strengthening the assumptions.

ACKNOWLEDGMENTS

This chapter has benefited from discussions with Elias Bareinboim, Peter Bentler, Ken Bollen, James Heckman, Jeffrey Hoyle, Marshall Joffe, David Kaplan, David Kenny, David MacKinnon, Rod McDonald, Karthika Mohan, Stanley Mulaik, William Shadish, Leland Wilkinson, and Larry Williams, and was supported in part by grants from the National Institutes of Health (1R01 LM009961-01), the National Science Foundation (IIS-0914211 and IIS-1018922), and the Office of Naval Research (N000-14-09-1-0665).

NOTES

1. An account of Wright’s heroic insistence on the causal reading of SEM is narrated in Pearl and Mackenzie (2018). A tribute to Haavelmo’s contributions to economics, in particular his causal interpretation of path coefficients, is given in Pearl (2015b), which also discusses the tension between the “structuralist” and “experimentalist” schools in econometrics.

2. A more comprehensive account of the history of SEM and its causal interpretations is given in Pearl (1998). Pearl (2009, pp. 368–374) devotes a section of his book Causality to advise SEM students on the causal reading of SEM and

how to defend it against the skeptics. Another gentle introduction is given in Pearl et al. (2016), while a nontechnical perspective can be found in Pearl and Mackenzie (2018).

3. This is important to emphasize in view of the often-heard criticism that in SEM, one must start with a model in which all causal relations are presumed known, at least qualitatively. All other methods must rest on the same knowledge, though some tend to hide the assumptions under catchall terms such as “ignorability” or “nonconfoundedness.” When a priori knowledge is not available, the uncertainty can be represented in SEM by adding links with unspecified parameters.

4. Causal relationships among latent variables are assessed by treating their indicators as noisy measurements of the former (Bollen, 1989; Cai & Kuroki, 2008; Pearl, 2010b).

5. The reason for this fundamental limitation is that no death case can be tested twice, with and without treatment. For example, if we measure equal proportions of deaths in the treatment and control groups, we cannot tell how many death cases are actually attributable to the treatment itself; it is quite possible that many of those who died under treatment would be alive if untreated and, simultaneously, many of those who survived with treatment would have died if not treated.

6. Connections between structural equations and a restricted class of counterfactuals were first recognized by Simon and Rescher (1966). These were later generalized by Balke and Pearl (1995), using surgeries (Equation 3.6), thus permitting endogenous variables to serve as counterfactual antecedents. The “surgery definition” was used in Pearl (2000, p. 417) and defended in Pearl (2009, pp. 362–382, 374–379).

7. This probability, written P(Y1 = 1 | X = 0, Y = 0), also known as the “probability of causation” (Pearl, 2009, Ch. 9), quantifies “causes of effect,” as opposed to “effect of causes,” and was excluded, prematurely I presume, from the province of potential-outcome analysis (Holland, 1986).

8. Such tables are normally used to explain the philosophy behind the potential-outcome framework (e.g., West & Thoemmes, 2010) in which Y1 and Y0 are taken as unexplained random variables. Here they are defined by, and derived from, a simple structural model.

9. In other words, a complete axiomatization of structural counterfactuals in recursive systems consists of Equation 3.7 and a few nonessential details.

10. See Hayduk et al. (2003), Mulaik (2009), and Pearl (2009, p. 335) for a gentle introduction to d-separation. Pearl (1986) demonstrates how d-separation yields a method of structuring causal trees from data, despite the presence of hidden variables.

11. The nomenclature “overidentifying restriction” is somewhat misleading because a model may have many testable implications and none of its parameters identified. Likewise, the traditional algebraic distinction between “overidentified” and “just identified” parameters is usually misleading (see Pearl, 2004).

12. This rule subsumes Bollen’s (1989, p. 95) “recursive rule,” which forbids a bidirected arc between a variable and any of its ancestors.

13. Robins and Greenland (1992) called this notion of direct effect “Pure” while Pearl called it “Natural,” denoted NDE, to be distinguished from the “controlled direct effect” that is specific to one level of the mediator Z. We delete the letter N from the acronyms of both the direct and indirect effects and use DE and IE, respectively.

14. Such symbiosis is now standard in epidemiology research (Hafeman & Schwartz, 2009; Joffe & Green, 2009; Petersen, Sinisi, & van der Laan, 2006; Robins, 2001; VanderWeele, 2009; VanderWeele & Robins, 2007) and is making its way slowly toward the social and behavioral sciences (Imai et al., 2010; Morgan & Winship, 2007).

REFERENCES

Ali, R., Richardson, T., & Spirtes, P. (2009). Markov equivalence for ancestral graphs. Annals of Statistics, 37, 2808–2837.

Balke, A., & Pearl, J. (1995). Counterfactuals and policy analysis in structural models. In P. Besnard & S. Hanks (Eds.), Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence (pp. 11–18). San Francisco: Morgan Kaufmann.

Bareinboim, E., & Pearl, J. (2013). Meta-transportability of causal effects: A formal approach. Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 135–143.

Bareinboim, E., & Pearl, J. (2014). Transportability of causal effects: Completeness results (Tech. Rep. R-390-L). Retrieved from http://ftp.cs.ucla.edu/pub/stat_ser/r390-L.pdf. Extended version of paper in the 2012 Proceedings of the 26th AAAI Conference, Toronto, Canada, pp. 698–704.

Bareinboim, E., & Pearl, J. (2016). Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113, 7345–7352.

Bareinboim, E., Tian, J., & Pearl, J. (2014). Recovering from selection bias in causal and statistical inference. In C. E. Brodley & P. Stone (Eds.), Proceedings of the 28th AAAI Conference on Artificial Intelligence (pp. 2410–2416). Palo Alto, CA: AAAI Press. Best Paper Award (http://ftp.cs.ucla.edu/pub/stat_ser/r425.pdf).

Baron, R., & Kenny, D. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.

Baumrind, D. (1993). Specious causal attributions in social

sciences: The reformulated stepping-stone theory of heroin use as exemplar. Journal of Personality and Social Psychology, 45, 1289–1298.

Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital data. Biometrics Bulletin, 2, 47–53.

Blalock, H. (1964). Causal inferences in nonexperimental research. Chapel Hill: University of North Carolina Press.

Bollen, K. (1989). Structural equations with latent variables. New York: Wiley.

Bollen, K., & Pearl, J. (2013). Eight myths about causality and structural equation models. In S. Morgan (Ed.), Handbook of causal analysis for social research (pp. 301–328). New York: Springer.

Brito, C., & Pearl, J. (2002a). Generalized instrumental variables. In A. Darwiche & N. Friedman (Eds.), Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (pp. 85–93). San Francisco: Morgan Kaufmann.

Brito, C., & Pearl, J. (2002b). A new identification condition for recursive models with correlated errors. Structural Equation Modeling, 9, 459–474.

Byrne, B. (2006). Structural equation modeling with EQS: Basic concepts, applications, and programming (2nd ed.). New York: Routledge.

Cai, Z., & Kuroki, M. (2008). On identifying total effects in the presence of latent variables and selection bias. In D. McAllester & P. Myllymäki (Eds.), Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (pp. 62–69). Arlington, VA: AUAI Press.

Campbell, D., & Stanley, J. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.

Chin, W. (1998). Commentary: Issues and opinion on structural equation modeling. Management Information Systems Quarterly, 22, 7–16.

Cliff, N. (1983). Some cautions concerning the application of causal modeling methods. Multivariate Behavioral Research, 18, 115–126.

Duncan, O. (1975). Introduction to structural equation models. New York: Academic Press.

Freedman, D. (1987). As others see us: A case study in path analysis (with discussion). Journal of Educational Statistics, 12, 101–223.

Galles, D., & Pearl, J. (1998). An axiomatic characteriza-

One more step into causal thinking. Structural Equation Modeling, 10, 289–311.

Hershberger, S. L. (2006). The problem of equivalent structural models. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (pp. 21–25). Greenwich, CT: Information Age.

Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960.

Holland, P. (1988). Causal inference, path analysis, and recursive structural equations models. In C. Clogg (Ed.), Sociological methodology (pp. 449–484). Washington, DC: American Sociological Association.

Holland, P. (1995). Some reflections on Freedman’s critiques. Foundations of Science, 1, 50–57.

Imai, K., Keele, L., & Yamamoto, T. (2010). Identification, inference, and sensitivity analysis for causal mediation effects. Statistical Science, 25, 51–71.

Imbens, G. (2010). An economist’s perspective on Shadish (2010) and West and Thoemmes (2010). Psychological Methods, 15, 47–55.

Joffe, M., & Green, T. (2009). Related causal frameworks for surrogate outcomes. Biometrics, 65, 530–538.

Kelloway, E. (1998). Using LISREL for structural equation modeling. Thousand Oaks, CA: Sage.

Kenny, D. A., & Milan, S. (2012). Identification: A nontechnical discussion of a technical issue. In R. Hoyle (Ed.), Handbook of structural equation modeling (pp. 145–163). New York: Guilford Press.

Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). New York: Guilford Press.

Koopmans, T. (1953). Identification problems in econometric model construction. In W. Hood & T. Koopmans (Eds.), Studies in econometric method (pp. 27–48). New York: Wiley.

Kyono, T. (2010). Commentator: A front-end user-interface module for graphical and structural equation modeling (Tech. Rep. R-364, Master’s thesis). Los Angeles: Department of Computer Science, University of California.

Lee, S., & Hershberger, S. (1990). A simple rule for generating equivalent models in covariance structure modeling. Multivariate Behavioral Research, 25, 313–334.

Li, A., & Pearl, J. (2019). Unit selection based on counterfactual logic. Proceedings of the 28th International Joint
tion of causal counterfactuals. Foundation of Science, 3, Conference on Artificial Intelligence, IJCAI-19, pp. 1793–
151–182. 1799.
Haavelmo, T. (1943). The statistical implications of a sys- MacKinnon, D. (2008). Introduction to statistical mediation
tem of simultaneous equations. Econometrica, 11, 1–12. analysis. New York: Erlbaum.
Reprinted in D. F. Hendry & M. S. Morgan (Eds.), The MacKinnon, D., Lockwood, C., Brown, C., Wang, W., &
foundations of econometric analysis (pp. 477–490, 1995). Hoffman, J. (2007). The intermediate endpoint effect in
Cambridge, UK: Cambridge University Press. logistic and probit regression. Clinical Trials, 4, 499–513.
Hafeman, D., & Schwartz, S. (2009). Opening the black box: Manski, C. (2007). Identification for prediction and deci-
A motivation for the assessment of mediation. Internation- sion. Cambridge, MA: Harvard University Press.
al Journal of Epidemiology, 3, 838–845. McDonald, R. (2002). What can we learn from the path
Hayduk, L., Cummings, G., Stratkotter, R., Nimmo, M., Gry- equations?: Identifiability constraints, equivalence. Psy-
goryev, K., Dosman, D., et al. (2003). Pearl’s d-separation: chometrika, 67, 225–249.
74  I. F ou n dat i o ns

McDonald, R. (2004). The specific analysis of structural causal calculus [Special issue on Haavelmo Centennial].
equation models. Multivariate Behavioral Research, 39, Econometric Theory, 31, 152–179.
687–713. Pearl, J. (2017). A linear “microscope” for interventions and
Mohan, K., & Pearl, J. (2021). Graphical models for process- counterfactuals. Journal of Causal Inference, 5, 1–15.
ing missing data. Journal of the American Statistical As- Pearl, J., & Bareinboim, E. (2011). Transportability across
sociation, 116, 1023–1037. studies: A formal approach. Proceedings of the 25th Con-
Morgan, S., & Winship, C. (2007). Counterfactuals and ference on Artificial Intelligence (AAAI-11), pp. 95–101.
causal inference: Methods and principles for social re- http://ftp.cs.ucla.edu/pub/stat_ser/r372a.pdf.
search (Analytical Methods for Social Research). New Pearl, J., & Bareinboim, E. (2014). External validity: From
York: Cambridge University Press. do-calculus to transportability across populations. Statis-
Mulaik, S. A. (2009). Linear causal modeling with structural tical Science, 29, 579–595.
equations. New York: CRC Press. Pearl, J., Glymour, M., & Jewell, N. (2016). Causal inference
Muthén, B. (1987). Response to Freedman’s critique of path in statistics: A primer. New York: Wiley.
analysis: Improve credibility by better methodological Pearl, J., & Mackenzie, D. (2018). The book of why: The new
training. Journal of Educational Statistics, 12, 178–184. science of cause and effect. New York: Basic Books.
Muthén, B., Kaplan, D., & Hollis, M. (1987). On structural Petersen, M., Sinisi, S., & van der Laan, M. (2006). Es­
equation modeling with data that are not missing com- timation of direct causal effects. Epidemiology, 17, 276–
pletely at random. Psychometrika, 52, 431–462. 284.
Pearl, J. (1986). Fusion, propagation, and structuring in belief Robins, J. (2001). Data, design, and background knowledge in
networks. Artificial Intelligence, 29, 241–288. etiologic inference. Epidemiology, 12, 313–320.
Pearl, J. (1988). Probabilistic reasoning in intelligent sys- Robins, J., & Greenland, S. (1992). Identifiability and ex-
tems. San Mateo, CA: Morgan Kaufmann. changeability for direct and indirect effects. Epidemiol-
Pearl, J. (1993). Comment: Graphical models, causality, and ogy, 3, 143–155.
intervention. Statistical Science, 8, 266–269. Rubin, D. (1974). Estimating causal effects of treatments in
Pearl, J. (1995). Causal diagrams for empirical research. randomized and nonrandomized studies. Journal of Edu-
Biometrika, 82, 669–710. cational Psychology, 66, 688–701.
Pearl, J. (1998). Graphs, causality, and structural equation Rubin, D. (1976). Inference and missing data. Biometrika, 63,
models. Sociological Methods and Research, 27, 226–284. 581–592.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Rubin, D. (2004). Direct and indirect causal effects via po-
New York: Cambridge University Press. tential outcomes. Scandinavian Journal of Statistics, 31,
Pearl, J. (2001). Direct and indirect effects. In J. Breese & 161–170.
D. Koller (Eds.), Proceedings of the 17th Conference on Shadish, W., Cook, T., & Campbell, D. (2002). Experimental
Uncertainty in Artificial Intelligence (pp. 411–420). San and quasi-experimental design for generalized causal in-
Francisco: Morgan Kaufmann. ference. Boston: Houghton Mifflin.
Pearl, J. (2004). Robustness of causal claims. In M. Chicker- Shpitser, I., & Pearl, J. (2006a). Identification of conditional
ing & J. Halpern (Eds.), Proceedings of the 20th Confer- interventional distributions. In R. Dechter & T. Richard-
ence on Uncertainty in Artificial Intelligence (pp. 446– son (Eds.), Proceedings of the 22nd Conference on Uncer-
453). Arlington, VA: AUAI Press. tainty in Artificial Intelligence (pp. 437–444). Corvallis,
Pearl, J. (2009). Causality: Models, reasoning, and inference OR: AUAI Press.
(2nd ed.). New York: Cambridge University Press. Shpitser, I., & Pearl, J. (2006b). Identification of joint inter-
Pearl, J. (2010a). An introduction to causal inference. The In- ventional distributions in recursive semi-Markovian caus-
ternational Journal of Biostatistics, 6, doi: 10.2202/1557– al models. In Proceedings of the 21st National Conference
4679.1203. on Artificial Intelligence (pp. 1219–1226). Menlo Park,
Pearl, J. (2010b). On measurement bias in causal inference. CA: AAAI Press.
In P. Grünwald & P. Spirtes (Eds.), Proceedings of the Shpitser, I., & Pearl, J. (2008). Dormant independence. In
26th Conference on Uncertainty in Artificial Intelligence Proceedings of the 23rd Conference on Artificial Intel-
(pp. 425–432). Corvallis, OR: AUAI. ligence (pp. 1081–1087). Menlo Park, CA: AAAI Press.
Pearl, J. (2012). The mediation formula: A guide to the as- Simon, H., & Rescher, N. (1966). Cause and counterfactual.
sessment of causal pathways in non-linear models. In C. Philosophy and Science, 33, 323–340.
Berzuini, P. Dawid, & L. Bernardinelli (Eds.), Causality: Sobel, M. (1996). An introduction to causal inference. Socio-
Statistical perspectives and applications (pp. 151–179). logical Methods and Research, 24, 353–379.
Hoboken, NJ: Wiley. Sobel, M. (2008). Identification of causal parameters in ran-
Pearl, J. (2015a). Causes of effects and effects of causes. Jour- domized studies with mediating variables. Journal of Ed-
nal of Sociological Methods and Research, 44, 149–164. ucational and Behavioral Statistics, 33, 230–231.
Pearl, J. (2015b). Trygve Haavelmo and the emergence of Sørensen, A. (1998). Theoretical methanisms and the empiri-
3. The Causal Foundations of Structural Equation Modeling  75

cal study of social processes. In P. Hedström & R. Swed- modification: A classification based on directed acyclic
berg (Eds.), Social mechanisms: An analytical approach graphs. Epidemiology, 18, 561–568.
to social theory, studies in rationality and social change Verma, T., & Pearl, J. (1990). Equivalence and synthesis of
(pp. 238–266). Cambridge, UK: Cambridge University causal models. Proceedings of the 6th Conference on Un-
Press. certainty in Artificial Intelligence, pp. 220–227.
Stelzl, I. (1986). Changing a causal hypothesis without chang- West, S., & Thoemmes, F. (2010). Campbell’s and Rubin’s
ing the fit: Some rules for generating equivalent path mod- perspectives on causal inference. Psychological Methods,
els. Multivariate Behavioral Research, 21, 309–331. 15, 18–37.
Thoemmes, F., & Mohan, K. (2015). Graphical representation Wilkinson, L., Task Force on Statistical Inference, & APA
of missing data problems. Structural Equation Modeling: Board of Scientific Affairs. (1999). Statistical methods in
A Multidisciplinary Journal, 22, 631–642. psychology journals: Guidelines and explanations. Ameri-
Tian, J., & Pearl, J. (2002). A general identification condition can Psychologist, 54, 594–604.
for causal effects. In R. Dechter, M. Kearns, & R. Sut- Williams, L. J. (2012). Equivalent models: Concepts, prob-
ton (Eds.), Proceedings of the 18th National Conference lems, alternatives. In R. Hoyle (Ed.), Handbook of struc-
on Artificial Intelligence (pp. 567–573). Menlo Park, CA: tural equation modeling (pp. 247–260). New York: Guil-
AAAI Press/The MIT Press. ford Press.
VanderWeele, T. (2009). Marginal structural models for the Wright, P. (1928). The tariff on animal and vegetable oils.
estimation of direct and indirect effects. Epidemiology, New York: Macmillan.
20, 18–26. Wright, S. (1921). Correlation and causation. Journal of Agri-
VanderWeele, T., & Robins, J. (2007). Four types of effect cultural Research, 20, 557– 585.
CHAPTER 4

Visualizations for Structural Equation Modeling


Jolynn Pek
Erin K. Davisson
Rick H. Hoyle

Visualizations are essential for communicating complex relationships among variables in an accessible and succinct manner. Statistical graphics also serve the important role of facilitating the exploration of multivariate data, conducting diagnostics to aid in modeling the data, and presenting arrays of results. Because models fit using structural equation modeling (SEM) tend to be multivariate in nature, often expressing a complex network of directional and nondirectional relationships among manifest and latent variables, researchers rely on graphics to facilitate specifying and expressing the model, analyzing the data, and presenting their results.

In this chapter, we examine the fundamental role of visualizations and offer strategies and recommendations on how to use graphics to facilitate the use of SEM to fit models to data. We begin by reviewing aspects of model specification using the popular LISREL matrix notation (Jöreskog & Sörbom, 2006),¹ highlighting the isomorphism between the algebraic representation of models and the path diagram. Here, we emphasize advantages and caveats associated with the use of graphics in model specification. Next, we introduce several univariate and multivariate graphics that are useful for modeling data with SEM. Finally, we extend the use of graphics to the presentation of SEM results. In each of these sections, we illustrate the use of graphics with an empirical example examining the effects of sensation seeking and self-regulation on problem behavior among adolescents. The first example introduces the basic covariance structure model without mean structure, and the second example extends the model to include mean structure. We conclude with a discussion of strategies to consider when making use of graphics with SEM.

MODEL SPECIFICATION

Specification of a model involves formally expressing it in terms of mathematical equations or a path diagram. A given model encompasses a set of manifest variables and latent variables. A manifest variable (MV) is a variable for which scores are available in the data set, such as participants' responses to a Likert-type item measuring positive emotions. In contrast, a latent variable (LV) is a variable for which scores are not available in the data set but rather assumed to be reflected by MVs (see Bollen & Hoyle, Chapter 5, this volume, for a detailed treatment of LVs). Often, LVs are regarded as key constructs in the social and behavioral sciences that are indicated by MVs. For instance, depression as measured by the Beck Depression Inventory is an LV indicated by 21 MVs (Beck, Steer, Ball, & Ranieri, 1996). When an LV is indicated by multiple MVs, it is known as a common factor in factor analysis. Common factors indicated by multiple MVs are considered free

of measurement error whereby the variance of the LV represents the communality (common variance) among the MVs. The residual variances of each of these MVs are called uniquenesses, which can be further partitioned into specific and error variance. Specific variance is the systematic variance that is particular to the MV, whereas the error variance represents noise. Note that residual variances are also considered LVs because they are not directly measured.

With a set of MVs and LVs, the model expresses (linear) directional and nondirectional relationships among the MVs and LVs. Directional relationships are influences of a predictor variable on an outcome variable (i.e., a regression slope or coefficient). For example, the idea of the experience of a recent traumatic event triggering increased levels of depression implies a directional effect of an MV on an LV. A nondirectional relationship is a correlational association between two variables (i.e., the two variables have no special status as predictor vs. outcome). For example, expecting anxiety and depression to covary in a similar direction implies a positive nondirectional relationship between two LVs. A researcher thus specifies a model by defining the form of the network or system of directional and nondirectional relationships among MVs and LVs.

Every variable in the model is either an endogenous or an exogenous variable. Endogenous variables receive at least one directional influence from another variable in the system, implying that their variance can be accounted for by that variable in the system. Endogenous variables can also directly influence another variable in the system (e.g., a mediating variable between a predictor and an outcome or an error term with autoregressive effects). However, not all of the variance of an endogenous variable can be accounted for by the variables included within the system. Thus, the residual variance of an endogenous variable is defined as an "error term" (e.g., unique variance of an MV that is an indicator of a factor). Error terms are also examples of exogenous LVs. In contrast to endogenous variables, exogenous variables are not directly influenced by another variable. Instead, exogenous variables typically exert directional influences on endogenous variables and are often associated with other exogenous variables by nondirectional relationships (e.g., predictors in a multiple linear regression specified to be correlated with one another). Because the variances of exogenous variables are not explained by any variable in the model, it is assumed that what influences them is external to the model. For this reason, exogenous variables within a model are specified to correlate with one another.

The parameters of the model are directional paths from exogenous MVs and LVs to endogenous MVs and LVs, nondirectional paths among exogenous MVs and LVs, and variances of exogenous MVs and LVs. Importantly, the variances of the endogenous variables (MVs and LVs) are functions of other parameters in the model. Stated differently, nondirectional associations involving endogenous variables are not permissible because these associations are indirectly implied by other variables in the system (MacCallum, 1995). Each of these parameters will either be free (i.e., its value is to be estimated from data) or fixed (i.e., specified to take on a particular value). Fixing parameters to specific values is often motivated by the need to identify the model.

Identification

Identification is necessary to obtain unique values for estimated parameters. A common identification constraint involves setting the scale of an LV. Because LVs are not directly observed, they are scale free. To obtain unique values for directional and nondirectional paths between an LV and other variables, the LV requires an explicitly defined unit of measurement. Consider a direct effect of an exogenous LV predictor on an endogenous MV outcome. Let this LV be depression and the MV be number of suicidal thoughts. For the direct effect to have a value such that a one-unit increase in depression is associated with some increment in the count of suicidal thoughts, the depression LV requires a scale. To set the scale of an exogenous LV so that the direct effect of depression on counts of suicidal thoughts is identified (i.e., has a unique estimated value), we can fix the variance of depression to be 1.0 (and its mean to be 0) such that this LV adopts the standardized scale (i.e., values on the LV are z-scores). Then, a direct effect of value 0.5 implies that a 1 standard unit increase in the LV depression is associated with an increment of 0.5 in the number of suicidal thoughts. Note that when both variables involved in directional or nondirectional relations are standardized, the estimated parameter is a standardized effect (akin to a standardized regression coefficient).

Recall that the variance of an endogenous LV is a function of other parameters of the model. Thus, one cannot directly fix the variance of an endogenous LV by setting its scale. Instead, when the variance of an
endogenous LV is to be scaled to 1.0, a constraint needs to be imposed on the function of the parameters that make up the variance of the endogenous LV. Although this approach has been implemented by Browne and Mels (1992) in RAMONA, which is incorporated into SYSTAT, it is not readily available in other SEM software. However, one can impose a constraint of 1.0 on a function of parameters that form the variance of the endogenous LV to achieve this form of identification (see Example 1 on the next page).

The default approach to setting the scale of exogenous and endogenous LVs in most SEM software, instead, is to fix a directional path from the LV to an MV to 1.0. Consider the Rosenberg (1965) self-esteem scale in which responses to 10 items are made on a 4-point Likert-type scale with 1 = strongly disagree, 2 = disagree, 3 = agree, and 4 = strongly agree. When the LV is scaled by fixing the direct path (factor loading) from the LV to the first indicator, the LV takes on the scale of this MV item (i.e., a value of 1 represents a change from one level of the ordered categorical response to the next). For example, suppose that the LV self-esteem, scaled according to an indicator MV, is predicted by a binary MV, where 0 = female and 1 = male. A direct path value of 0.8 is then interpreted as males having a 0.8 higher self-esteem score than females; the scale of this self-esteem LV adopts the MV's 4-point Likert-type scale. Note also that error terms (LVs) are often scaled by fixing 1.0 to the direct path of the error term to its respective endogenous variable.

Model identification is a complex topic. There are two necessary but insufficient conditions to obtain an identified model. The first condition is to set the scale of all LVs in the model. As reviewed earlier, this can be done by fixing the LV variances to 1.0 to standardize them or by setting a directional path from the LV to an indicator to adopt the scale of that MV. The second condition is to ensure that the model has nonzero degrees of freedom. The degrees of freedom of the model are the total number of unique elements in the covariance matrix of the MVs minus the effective number of estimated parameters in the model. The effective number of estimated parameters is a count of the total number of parameters to be estimated minus any additional constraints placed on them (e.g., equality constraints). Suppose that we have p = 6 MVs, resulting in p(p + 1)/2 = 21 unique elements in the sample covariance matrix. If there are 20 parameters and three identification constraints, the number of effective parameters is 20 – 3 = 17. Taken together, the degrees of freedom would be 21 – 17 = 4 (see Wheaton, Muthén, Alwin, & Summers, 1977, on which this example is based). Because these conditions are insufficient, meeting them does not guarantee that a model is identified. The identifiability of a model can be algebraically determined (see Long, 1983, for examples), but empirical checks implemented in SEM software have become the default approach to diagnosing problems with identification (Bollen & Bauldry, 2010).

In the next section, we illustrate how models can be equivalently specified with path diagrams and LISREL equations in two examples. We also demonstrate the imposition of identification constraints and how to tally model degrees of freedom.

Path Diagrams

The development of path diagrams to communicate structural equation models is attributed to Wright (1920), who studied gene and environment effects on the coat color of guinea pigs. The standard convention is to use squares or rectangles (□) to represent MVs and circles or ellipses (○) to represent LVs (including error terms). Directional effects are represented by single-headed arrows emanating from a predictor variable to an endogenous variable. Nondirectional effects between variables are represented by double-headed arrows. Variances of exogenous variables are also represented by double-headed arrows that start and return to the variables themselves. Each of these directional and nondirectional arrows represents a parameter of the model, of which some are fixed and others are to be estimated. Before estimation, parameters are usually represented by Greek letters associated with their respective LISREL matrices (to be reviewed below), and estimated parameters are represented by their estimated numerical values.

Beyond these standard conventions, there are variations in how other features of models are represented. For example, means or intercepts for models with a mean structure are traditionally represented as directional paths from a triangle (Δ) with a value of 1.0 within it to their respective variables. Alternatively, means and intercepts can be represented by Greek letters or their estimated values resting on their respective LV ellipse or MV rectangle (e.g., Bauer & Curran, 2020; see Figure 4.2). Another variation in path diagrams is to suppress information about the error terms by not displaying them as LVs (ellipses; e.g., Murayama, 1998; Paxton, Hipp, & Marquart-Pyatt, 2011) or not explicitly graphing their correlational structure with double-headed arrows (e.g., Bollen & Curran,
2006). In more complex models such as those with multilevel structures, researchers have made use of different font types (e.g., italics, bold) in path diagrams to distinguish between different approaches to centering data (Curran & Bauer, 2007). Consistent with the principle of transparency, we propose that path diagrams should explicitly represent all the parameters of the model such that they are equivalent to their mathematical expression.

Examples

Our two examples are based on data from the third wave of a four-wave study of self-regulation in adolescence. At this wave (N = 703), the mean age of participants was 15.04 (SD = 1.15; range: 11–18). Here, we examined relations among the LVs of sensation seeking, self-regulation, and problem behaviors while conditioning on the MV of subjective socioeconomic status (SES). Sensation seeking was measured by the four-item Brief Sensation-Seeking Scale (Hoyle, Stephenson, Palmgreen, Pugzles Lorch, & Donohew, 2002; response scale: 1 = not at all true of me to 5 = very much true of me), denoted by x1, x2, x3, and x4. Self-regulation was measured by the 13-item Questionnaire on Self-Regulation (Novak & Clayton, 2001; response scale: 1 = not at all true of me to 5 = very much true of me). These 13 items were reduced to three subscale scores (x5, x6, and x7) that are the means across specific sets of items. Subjective SES (x8) was measured by a single item created for the study (response scale: 1 = we do not have enough money to meet our basic needs to 4 = we have enough money to do most anything we want). Problem behaviors (i.e., aggressive and deviant behavior) were measured using 26 items from the Problem Behavior Frequency Scale (Multisite Violence Prevention Project, 2004; response scale: 0 = never to 5 = 20+ times in the last month). The three indicators of problem behaviors (y1, y2, and y3) are subscale scores obtained by averaging responses over specific sets of items. Participant sex, used to illustrate mean structure, was coded 0 = male and 1 = female.

Example 1

The path diagram in Figure 4.1 expresses the linear relationships among MVs and LVs in which sensation seeking (ξ1) and self-regulation (ξ2), conditioned on SES (x8), predict problem behavior (η1). Path diagrams are a visual expression of the following three LISREL data model equations as combinations of matrices:

x = Λxξ + δ    (4.1)

y = Λyη + ε    (4.2)

η = Γξ + Bη + ζ    (4.3)

In general, matrices (i.e., not scalars, which have a single element) are denoted by bold symbols. Equation 4.1 expresses the measurement model for the exogenous variables. In our example, x is an 8 × 1 vector of exogenous MVs in which x1 to x4 indicate sensation seeking (ξ1), x5 to x7 indicate self-regulation (ξ2), and x8 is the MV for SES. Λx is an 8 × 3 matrix of factor loadings, ξ is a 3 × 1 vector of exogenous variables, and δ is an 8 × 1 vector of exogenous MV unique variances.

In matrix notation, Equation 4.1 can be expanded to

\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \end{pmatrix} =
\begin{pmatrix}
\lambda_{x11} & 0 & 0 \\
\lambda_{x21} & 0 & 0 \\
\lambda_{x31} & 0 & 0 \\
\lambda_{x41} & 0 & 0 \\
0 & \lambda_{x52} & 0 \\
0 & \lambda_{x62} & 0 \\
0 & \lambda_{x72} & 0 \\
0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_2 \\ x_8 \end{pmatrix} +
\begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \\ \delta_5 \\ \delta_6 \\ \delta_7 \\ 0 \end{pmatrix}
\qquad (4.4)

Elements of matrices take on lowercase letters of their uppercase matrix counterpart. For instance, an element in Λx is denoted by λx. The numerical subscripts accompanying each element indicate the row and then column position where the element sits within the matrix. For example, λx52 is the element within Λx located on the fifth row and second column. This element is the factor loading of x5 that is an indicator of ξ2. Elements in vectors have a single numerical subscript to indicate their row position (e.g., δ7). Additionally, consider the first equation, x1 = λx11ξ1 + 1.0δ1. In Figure 4.1, this equation is represented by the two directional arrows from ξ1 (represented by ○ labeled ξ1) and δ1 (represented by ○ labeled δ1) to x1 (represented by □ labeled x1). Note that the magnitude of the path from ξ1 to x1 is λx11 and the magnitude of the path from δ1 to x1 is 1.0. In this vein, Equation 4.1 represents the exogenous portion of the path diagram relating LVs to their indicators. Note that the LISREL equations only allow linear relations among LVs. Thus, Equation 4.4 expresses the exogenous MV of SES (x8) as x8 = x8 such that δ8 = 0 (Long, 1983, p. 28). Alternatively, as shown
[Figure 4.1 appears here: a path diagram whose labeled parameters include the loadings λx11–λx72 and λy11–λy31, structural paths γ11, γ12, and γ18, covariances φ21, φ31, φ32, variances φ11, φ22, φ33, ψ11, and the unique variances θδ11–θδ77 (with θδ75) and θε11–θε33, plus two breakout plots described in the caption below.]
FIGURE 4.1. Path diagram of Example 1 depicting the hypothesized directional effects of sensation seeking and self-regulation on problem behavior, conditioning on socioeconomic status (SES). The breakout plot from SES illustrates how an exogenous MV is specified in LISREL as an LV that is perfectly indicated by the MV. The breakout plot from problem behavior presents an alternative approach to setting the scale of the endogenous LV by constraining its variance to 1.0 (see Equation 4.16) instead of fixing λy11 = 1.0.
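The two scaling choices named in the caption are reparameterizations of the same model. As a minimal numerical sketch of this equivalence (the values below are made up for illustration and are not estimates from the chapter's data; the structural part is ignored by treating the LV variance as a single number ψ), the implied covariance matrix of y1–y3 is unchanged when the loadings are rescaled so that the LV variance equals 1.0:

```python
import numpy as np

# Hypothetical values for illustration only (not estimates from the chapter's data).
lam = np.array([1.0, 0.8, 0.9])        # loadings, with lambda_y11 fixed to 1.0
psi = 0.5                              # variance of the endogenous LV
theta_eps = np.diag([0.3, 0.4, 0.35])  # unique variances, Theta_epsilon

# Implied covariance of (y1, y2, y3) under the fixed-loading scaling:
sigma_fixed = psi * np.outer(lam, lam) + theta_eps

# Alternative scaling: constrain the LV variance to 1.0 and rescale the loadings.
lam_std = lam * np.sqrt(psi)
sigma_std = 1.0 * np.outer(lam_std, lam_std) + theta_eps

print(np.allclose(sigma_fixed, sigma_std))  # True
```

Because both parameterizations imply identical covariance matrices, they fit data equally well; the choice affects only how the estimated loadings and LV variance are interpreted.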

in the breakout plot in Figure 4.1, exogenous MVs can be thought of as perfect indicators of an exogenous LV; in Equation 4.4, x8 in ξ can be replaced by ξ3, the value 0 in δ can be replaced by δ8, and the variance of δ8 (VAR[δ8] = θδ88) is constrained to 0. Similar to Equation 4.1, Equation 4.2 can be expanded to

\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} =
\begin{pmatrix} \lambda_{y11} \\ \lambda_{y21} \\ \lambda_{y31} \end{pmatrix} [\eta_1] +
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{pmatrix}
\qquad (4.5)

Equation 4.5 is the measurement model for the endogenous portion of the system of equations, relating MVs to their respective LVs. Here, y is a 3 × 1 vector of MVs in which y1 to y3 indicate problem behaviors, Λy is a 3 × 1 factor loading matrix linking the MVs to the LV, and ε is the 3 × 1 vector of endogenous MV unique variances. Finally, Equation 4.3 is termed the "structural equation" because it expresses direct (structural) paths among the exogenous and endogenous variables. In matrix notation, Equation 4.3 can be reexpressed as

\eta_1 = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} \end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_2 \\ x_8 \end{pmatrix} + \zeta_1
\qquad (4.6)
where Γ is the 1 × 3 matrix of structural paths from the exogenous variables to the endogenous variables and ζ contains a single residual term of the endogenous LV of problem behaviors (η1). Because the theorized structural relations in Equation 4.6 do not involve endogenous variables predicting other endogenous variables, the matrix B is a zero matrix, which does not show up in Equation 4.6 (cf. Equation 4.3). In scalar form, Equation 4.6 is η1 = γ11ξ1 + γ12ξ2 + γ13x8 + ζ1, which maps onto the directional arrows from ξ1, ξ2, and x8 to η1 in Figure 4.1. Note that the elements in the data model (Equations 4.4 to 4.6) pertain to only directional paths (i.e., single-headed arrows) in a path diagram. The nondirectional paths (i.e., double-headed arrows) are parameters within covariance matrices in the covariance structure of the model.

The covariance structure of the model is obtained by applying covariance algebra to Equations 4.1 to 4.3, whereby the elements of the covariance matrix for the MVs are an expression of model parameters. Let Σ denote the population covariance matrix for the 11 MVs in Example 1.

$$\Sigma = \begin{bmatrix} \Sigma_{xx} & \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} \qquad (4.7)$$

where Σxx is the 8 × 8 covariance matrix for x1 to x8, Σyy is the 3 × 3 covariance matrix for y1 to y3, and Σyx the 3 × 8 covariance matrix among pairs of exogenous and endogenous MVs. Σ in Equation 4.7 is a square and symmetric matrix, and we follow convention by suppressing elements in the upper triangular portion of the matrix; matrix elements above the diagonal are the transpose of elements below the diagonal (e.g., Σxy = Σ′yx). By expressing the submatrices as functions of model parameters, we have

$$\Sigma_{xx} = \Lambda_x \Phi \Lambda'_x + \Theta_\delta \qquad (4.8)$$

$$\Sigma_{yx} = \Lambda_y (I - B)^{-1} \Gamma \Phi \Lambda'_x \qquad (4.9)$$

$$\Sigma_{yy} = \Lambda_y (I - B)^{-1} \left( \Gamma \Phi \Gamma' + \Psi \right) (I - B)'^{-1} \Lambda'_y + \Theta_\varepsilon \qquad (4.10)$$

where I is an identity matrix of the same order as B, Φ is the covariance matrix among the exogenous variables in ξ, Ψ is the covariance matrix of the residuals of the endogenous LVs in η, and Θδ and Θε are the covariance matrices of the unique variances of the exogenous and endogenous MVs, respectively. For our example,

$$\mathrm{VAR}[\xi] = \Phi = \begin{bmatrix} \phi_{11} & & \\ \phi_{21} & \phi_{22} & \\ \phi_{31} & \phi_{32} & \phi_{33} \end{bmatrix} \qquad (4.11)$$

$$\mathrm{VAR}[\zeta] = \Psi = [\psi_{11}] \qquad (4.12)$$

$$\mathrm{VAR}[\delta] = \Theta_\delta = \begin{bmatrix} \theta_{\delta 11} & & & & & & & \\ 0 & \theta_{\delta 22} & & & & & & \\ 0 & 0 & \theta_{\delta 33} & & & & & \\ 0 & 0 & 0 & \theta_{\delta 44} & & & & \\ 0 & 0 & 0 & 0 & \theta_{\delta 55} & & & \\ 0 & 0 & 0 & 0 & 0 & \theta_{\delta 66} & & \\ 0 & 0 & 0 & 0 & \theta_{\delta 75} & 0 & \theta_{\delta 77} & \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (4.13)$$

$$\mathrm{VAR}[\varepsilon] = \Theta_\varepsilon = \mathrm{diag}\left[ \theta_{\varepsilon 11}, \theta_{\varepsilon 22}, \theta_{\varepsilon 33} \right] \qquad (4.14)$$

Note that diag[ ] denotes a square matrix with the diagonal elements listed within the brackets. The elements within the matrices of Equations 4.11 to 4.14 represent nondirectional paths. Variances of the variables are the diagonal elements of the covariance matrices and are represented as double-headed arrows starting and ending at the variable itself. For instance, the variance of SES (x8) is denoted as the double-headed arrow labeled as φ33 in Figure 4.1. Covariances of the variables are the off-diagonal elements of the matrices. Thus, the double-headed arrow labeled φ21 is the covariance between sensation seeking (ξ1) and self-regulation (ξ2). Similarly, the covariance between the unique variances of x5 and x7, depicted by the double-headed arrow labeled θδ75, implies shared systematic variance that is unaccounted for by self-regulation (ξ2).

Model Identification. The specification of the model for Example 1 in Equations 4.4 to 4.6 and 4.11 to 4.14 represents theorized relations among MVs and LVs in the population. At this point, the system of simultaneous equations is independent of data; fitting the model to data obtains unique numerical values (estimates) for the model parameters, as well as indices of model fit. Not all Greek letters in the model equations are parameters to be estimated; the nonparameter Greek letters are ξ, η, δ, and ε, which denote LVs.
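Because B is a zero matrix in this model, (I − B)⁻¹ reduces to the identity, and the covariance structure in Equations 4.7 to 4.10 can be assembled directly. The following numpy sketch shows the assembly; every loading, variance, and path value below is invented for illustration and is not an estimate from the chapter.

```python
import numpy as np

# Hypothetical parameter values shaped like Example 1 (8 exogenous MVs on
# 3 exogenous LVs, 3 endogenous MVs on 1 endogenous LV).
Lam_x = np.zeros((8, 3))
Lam_x[0:4, 0] = [1.0, 0.8, 0.9, 0.7]   # x1-x4 load on xi1 (sensation seeking)
Lam_x[4:7, 1] = [1.0, 0.85, 0.75]      # x5-x7 load on xi2 (self-regulation)
Lam_x[7, 2] = 1.0                      # x8 as a perfect indicator of xi3 (SES)

Phi = np.array([[1.00, 0.30, 0.20],    # Eq. 4.11 with phi11 = phi22 = 1.0
                [0.30, 1.00, 0.25],
                [0.20, 0.25, 0.50]])

Theta_d = np.diag([0.40, 0.50, 0.45, 0.55, 0.40, 0.50, 0.45, 0.0])  # theta_d88 = 0
Theta_d[6, 4] = Theta_d[4, 6] = 0.10   # theta_d75, the unique-factor covariance

Lam_y = np.array([[1.0], [0.8], [0.9]])
Gamma = np.array([[0.4, -0.3, 0.2]])   # 1 x 3 structural paths
B = np.zeros((1, 1))                   # no eta -> eta paths, so B = 0
Psi = np.array([[0.6]])
Theta_e = np.diag([0.30, 0.35, 0.40])

IBinv = np.linalg.inv(np.eye(1) - B)   # reduces to the identity here

Sig_xx = Lam_x @ Phi @ Lam_x.T + Theta_d                                # Eq. 4.8
Sig_yx = Lam_y @ IBinv @ Gamma @ Phi @ Lam_x.T                          # Eq. 4.9
Sig_yy = (Lam_y @ IBinv @ (Gamma @ Phi @ Gamma.T + Psi) @ IBinv.T
          @ Lam_y.T + Theta_e)                                          # Eq. 4.10

Sigma = np.block([[Sig_xx, Sig_yx.T],  # Eq. 4.7: 11 x 11 and symmetric
                  [Sig_yx, Sig_yy]])
```

One property of the perfect-indicator constraint falls out immediately: with λ = 1.0 and θδ88 = 0, the implied variance of x8 equals φ33.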
The model parameters associated with these LVs are their variances and covariances in the matrices Φ, Ψ, Θδ, and Θε, respectively, as well as the directional paths among ξ and η in the matrices B and Γ.

Recall that model identification is necessary for obtaining unique values of parameter estimates. In Example 1, the first necessary condition is to set the scale of the LVs of sensation seeking (ξ1), self-regulation (ξ2), and problem behavior (η1). In Figure 4.1, the scale of the exogenous LVs (sensation seeking and self-regulation) is set by fixing their variances to 1.0 (i.e., φ11 = φ22 = 1.0). Because problem behavior is an endogenous LV, and its variance is a function of other parameters in the model, the most straightforward way to set its scale is to fix the loading of y1 such that λy11 = 1.0 (see Figure 4.1). Alternatively, the scale of problem behavior can be fixed to 1.0 by imposing a particular constraint. By applying covariance algebra to Equation 4.6, it can be shown that the variance of the endogenous LV is

$$\mathrm{VAR}[\eta_1] = \gamma_{11}^2 \phi_{11} + \gamma_{12}^2 \phi_{22} + \gamma_{13}^2 \phi_{33} + \psi_{11} + 2\left[ \gamma_{11}\gamma_{12}\phi_{21} + \gamma_{11}\gamma_{13}\phi_{31} + \gamma_{12}\gamma_{13}\phi_{32} \right] \qquad (4.15)$$

By substituting φ11 = φ22 = 1.0 as identification constraints to set the scale for the exogenous LVs in Equation 4.15, we can impose the following constraint to fix the variance of η1 to 1.0:

$$\gamma_{11}^2 + \gamma_{12}^2 + \gamma_{13}^2 \phi_{33} + \psi_{11} + 2\left[ \gamma_{11}\gamma_{12}\phi_{21} + \gamma_{11}\gamma_{13}\phi_{31} + \gamma_{12}\gamma_{13}\phi_{32} \right] = 1.0 \qquad (4.16)$$

In Figure 4.1, this alternative approach to setting the scale of the endogenous LV is shown in the breakout plot located to the right. The breakout plot depicts the variance of η1 by a dashed double-headed arrow because VAR[η1] is a function of model parameters. Fixing VAR[η1] = 1.0 is achieved by imposing the constraint in Equation 4.16. Finally, recall that error terms are also LVs. Typically, these LVs are identified by fixing the directional paths from the error terms (i.e., ζ, δ, and ε) to their respective variables to 1.0 (see Figure 4.1). In this vein, the variances of the error terms take on the scale of their respective indicators.

The second necessary but insufficient condition of an identified model is to have nonzero degrees of freedom. The total number of unique elements in the covariance matrix of p = 11 MVs is 11(12)/2 = 66. From Figure 4.1, there are 10 factor loadings, 10 MV unique variances and one covariance between two unique variances (θδ75), 6 variances and covariances among the exogenous variables, 1 residual LV variance (ζ1), and 3 structural paths (elements in Γ). These add up to 31 effective parameters. Given that there are three identification constraints, the degrees of freedom would be 66 – (31 – 3) = 38. Taken together, the model specified in Figure 4.1 meets the two necessary but insufficient conditions of identification. The next example examines noninvariance in the measurement model of sensation seeking (ξ1) among boys and girls to illustrate mean structure.

Example 2

Consider modeling group differences in a multiple group SEM (Jöreskog, 1971; Thompson, Liu, & Green, Chapter 21, this volume). Let g denote different groups that map onto the populations of boys (g = 1) versus girls (g = 2). Then, the LISREL data model for exogenous MVs in Equation 4.1 and the covariance structure in Equation 4.8 can be expanded to allow group differences, whereby each group has its own set of equations. Thus, model parameters can differ across groups. Because groups can differ on the means of exogenous variables and the intercepts of endogenous variables, the model equations are expanded to allow for means and intercepts. In the previous example, it was assumed that the means and intercepts of MVs and LVs are 0. With mean structure, Equation 4.1 for group g becomes

$$\mathbf{x}_g = \boldsymbol{\nu}_{xg} + \Lambda_{xg} \boldsymbol{\xi}_g + \boldsymbol{\delta}_g \qquad (4.17)$$

where νxg is the vector of intercept terms for the exogenous MVs associated with group g. Furthermore, an additional parameter vector for the means of the exogenous LVs for each gth group is denoted as κg = E[ξg], where E[·] is the expectation function. With these expanded equations, the covariance and mean structures for the MVs for each gth group are

$$E[\mathbf{x}_g] = \boldsymbol{\nu}_{xg} + \Lambda_{xg} \boldsymbol{\kappa}_g \qquad (4.18)$$

$$\Sigma_{xxg} = \Lambda_{xg} \Phi_g \Lambda'_{xg} + \Theta_{\delta g} \qquad (4.19)$$

We report on scalar invariance between the measurement models for the boys (g = 1) and girls (g = 2) (see Widaman & Olivera-Aguilar, Chapter 20, this volume, for a detailed treatment of measurement invariance testing). With scalar invariance, the item intercepts (νx) and factor loadings (Λx) are constrained to be equal across groups.
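With a single LV per group, the group-specific mean and covariance structures in Equations 4.18 and 4.19 reduce to simple forms. A minimal numpy sketch follows; the intercepts, loadings, and variances below are invented for illustration, not estimates from the chapter.

```python
import numpy as np

# Hypothetical two-group values (boys g=1, girls g=2) for the single-factor
# sensation-seeking model.
nu_x = np.array([2.0, 1.8, 2.2, 1.9])    # item intercepts, equal across groups
lam_x = np.array([1.0, 0.8, 0.9, 0.7])   # factor loadings, equal across groups

kappa = {1: 0.0, 2: 0.35}                # LV means; boys fixed at 0
phi = {1: 1.0, 2: 1.40}                  # LV variances; boys fixed at 1.0
theta = {1: np.diag([0.50, 0.60, 0.55, 0.50]),   # group-specific unique variances
         2: np.diag([0.45, 0.65, 0.50, 0.60])}

implied = {}
for g in (1, 2):
    mu_g = nu_x + lam_x * kappa[g]                       # Eq. 4.18
    Sig_g = np.outer(lam_x, lam_x) * phi[g] + theta[g]   # Eq. 4.19
    implied[g] = (mu_g, Sig_g)
```

Because κ for boys is fixed at 0, the boys' implied means equal the item intercepts, which is exactly how the identification constraints below give the girls' LV mean its interpretation as a difference from the boys.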
Group differences are expected for the LV means (κ), the LV variances (Φ), and the MV unique variances (Θδ). Thus, the parameter matrices in Equation 4.17 are

$$\boldsymbol{\nu}_{x1} = \boldsymbol{\nu}_{x2} = \boldsymbol{\nu}_x = \begin{bmatrix} \nu_{x_1} \\ \nu_{x_2} \\ \nu_{x_3} \\ \nu_{x_4} \end{bmatrix} \qquad \Lambda_{x1} = \Lambda_{x2} = \Lambda_x = \begin{bmatrix} \lambda_{x_1} \\ \lambda_{x_2} \\ \lambda_{x_3} \\ \lambda_{x_4} \end{bmatrix} \qquad \boldsymbol{\delta}_1 = \begin{bmatrix} \delta_{1_1} \\ \delta_{1_2} \\ \delta_{1_3} \\ \delta_{1_4} \end{bmatrix} \quad \boldsymbol{\delta}_2 = \begin{bmatrix} \delta_{2_1} \\ \delta_{2_2} \\ \delta_{2_3} \\ \delta_{2_4} \end{bmatrix}$$

When there are group differences, the first numerical subscript in our expanded notation denotes the group and the numerical sub-subscript indicates the position of the element within its vector or matrix. For example, δ2₃ is the error term for girls (g = 2) associated with the third item (x2₃) of the sensation-seeking measure. Because the MVs are not fully invariant, the MVs for the two groups have a group subscript. Similarly, the LVs are assumed to be noninvariant across the groups in terms of their means and variances, in which we distinguish ξ1 for boys from ξ2 for girls. In this vein, the means on sensation seeking for boys and girls are κ1 = [κ1₁] and κ2 = [κ2₁], respectively. Similarly, the parameter matrices in Equation 4.19 for boys are Φ1 = [φ1₁₁] and

$$\Theta_{\delta 1} = \mathrm{diag}\left[ \theta_{\delta 1_{11}}, \theta_{\delta 1_{22}}, \theta_{\delta 1_{33}}, \theta_{\delta 1_{44}} \right]$$

Those for girls are Φ2 = [φ2₁₁] and

$$\Theta_{\delta 2} = \mathrm{diag}\left[ \theta_{\delta 2_{11}}, \theta_{\delta 2_{22}}, \theta_{\delta 2_{33}}, \theta_{\delta 2_{44}} \right]$$

For identification, the mean and variance of sensation seeking for boys are scaled to be standardized (i.e., κ1₁ = 0 and φ1₁₁ = 1.0).

With the inclusion of mean structure, the isomorphism between the model equations and path diagram remains preserved by including mean and intercept parameters in the diagram. Because means and intercepts are not variables, they are conventionally represented by a triangle with a value of 1.0 to distinguish them from MVs (represented by rectangles) and LVs (represented by ellipses). Similar to Example 1, any equation can be reproduced from a diagram using path tracing rules. For example, consider x1₁ = 1.0νx₁ + λx₁ξ1 + 1.0δ1₁. The MV x1₁ is represented by a rectangle with its label, the 1.0 multiplier of νx₁ would traditionally be represented by a triangle with a 1.0 inside it, and the LVs of ξ1 and δ1₁ would each be represented by an ellipse labeled with their respective Greek symbols. As shown in Figure 4.2, we recommend removing the triangles for means and intercepts to avoid “chartjunk” (i.e., unnecessary visual elements for comprehending the information in the figure; Tufte, 2001). This approach to simplifying path diagrams with mean structure was introduced by Bauer and Curran (2020). In Figure 4.2, MV intercepts and LV means are depicted as parameters sitting on top of their rectangles and ellipses, respectively. Equality of parameters across the two groups is represented by the same symbols (e.g., factor loadings and item intercepts), whereas group differences are depicted by different symbols (e.g., LV means). An alternative depiction using the triangle representation is provided in the online supplement.

Model Identification. In multiple group SEM, the model is fit to the covariance matrix and mean vector of the MVs for each group. The total number of unique elements in the covariance matrices for p = 4 MVs by 2 groups is 4(5)/2 × 2 = 20. The total number of means is p × 2 = 4 × 2 = 8. Thus, the total number of unique elements is 20 + 8 = 28. From Figure 4.2, there are four factor loadings, four item intercepts, eight MV unique variances, two LV means, and two LV variances, totaling 20 effective parameters. With two identification constraints, the degrees of freedom are 28 – (20 – 2) = 10.

We next provide information on how to generate path diagrams, features of a good diagram, and how to use such diagrams for model specification.

Strategies and Recommendations

In this first section, we have discussed the fundamentals of model specification using matrix notation and path diagrams. Using two examples, we have demonstrated how models can be explicitly defined by matrices or path diagrams, emphasizing their isomorphism. When a path diagram is produced, its value lies in the clarity, accuracy, and completeness of the information