Discovering Pure Measurement Models
1) Motivation
2) Representing/Modeling Causal Systems
1
Discovering
Pure Measurement Models
Richard Scheines
Carnegie Mellon University
Ricardo Silva*
University College London
3
Goals:
• What Latents are out there?
• Causal Relationships Among Latent Constructs
Relationship
Depression Satisfaction
or
Relationship
Depression Satisfaction
or ?
4
Needed:
Ability to detect
conditional independence
among latent variables
5
Lead and IQ
e2
Parental Resources e3
Lead IQ
Exposure
Lead _||_ IQ | PR
PR ~ N(μ = 10, σ = 3)
6
Pseudorandom sample: N = 2,000
Parental Resources
Lead
Exposure IQ
Regression of IQ on Lead, PR
PR 0.98 0.000 No
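The simulation described on this slide can be sketched as follows. Only PR ~ N(10, 3) and N = 2,000 come from the slides; the structural coefficients (-0.5 and 1.0), the unit noise variances, and the helper name `ols` are assumptions made for illustration. The sketch compares the Lead coefficient with and without Parental Resources (PR) in the regression.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 2000                         # sample size from the slide

# PR ~ N(mean 10, sd 3) as on the slide. The coefficients -0.5 and 1.0
# and the unit noise variances are invented for this illustration.
pr = rng.normal(10.0, 3.0, n)
lead = -0.5 * pr + rng.normal(0.0, 1.0, n)   # Lead Exposure <- PR
iq = 1.0 * pr + rng.normal(0.0, 1.0, n)      # IQ <- PR, no Lead -> IQ edge


def ols(y, *predictors):
    """Least-squares slopes of y on the predictors (intercept dropped)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


(lead_only,) = ols(iq, lead)                 # PR omitted: confounded slope
lead_given_pr, pr_coef = ols(iq, lead, pr)   # PR included: Lead screened off

print("Lead coefficient, PR omitted: ", round(lead_only, 3))
print("Lead coefficient, PR included:", round(lead_given_pr, 3))
print("PR coefficient:               ", round(pr_coef, 3))
```

Under these assumed parameters the Lead slope is clearly negative when PR is omitted and close to zero once PR is included, which is the pattern Lead _||_ IQ | PR predicts.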
7
Measuring the Confounder
e1 e2 e3
X1 X2 X3
Parental
Lead Resources
IQ
Exposure
8
Scales don't preserve conditional independence
X1 X2 X3
Parental
Lead Resources
IQ
Exposure
9
Indicators Don’t Preserve Conditional Independence
X1 X2 X3
Parental
Lead Resources
IQ
Exposure
X1 0.22 0.002 No
X2 0.45 0.000 No
X3 0.18 0.013 No
Lead -0.414 0.000 No
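A minimal sketch of why indicators and scales fail here, assuming the same kind of toy model: Parental Resources (PR) is a common cause of Lead and IQ, and X1 to X3 are noisy measures of PR. All coefficients and noise levels are assumptions for illustration, not values from the slides. Conditioning on the indicators, or on their average as a scale, leaves part of the confounding in place, so the Lead coefficient stays away from zero.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed so the run is repeatable
n = 2000

# Assumed toy model: PR confounds Lead and IQ; X1..X3 measure PR with noise.
pr = rng.normal(10.0, 3.0, n)
lead = -0.5 * pr + rng.normal(0.0, 1.0, n)
iq = 1.0 * pr + rng.normal(0.0, 1.0, n)
x = pr[:, None] + rng.normal(0.0, 3.0, (n, 3))   # columns are X1, X2, X3


def ols(y, predictors):
    """Least-squares slopes of y on the predictor columns (intercept dropped)."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


lead_given_pr = ols(iq, np.column_stack([lead, pr]))[0]          # true confounder
lead_given_items = ols(iq, np.column_stack([lead, x]))[0]        # X1, X2, X3
lead_given_scale = ols(iq, np.column_stack([lead, x.mean(axis=1)]))[0]  # scale

print("Lead | PR:         ", round(lead_given_pr, 3))
print("Lead | X1, X2, X3: ", round(lead_given_items, 3))
print("Lead | scale(X):   ", round(lead_given_scale, 3))
```

Only the regression that conditions on the latent itself recovers a Lead coefficient near zero; the other two inherit a spurious negative effect from measurement error in the indicators.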
10
Structural Equation Models Work
X1 X2 X3
Parental
Resources
Lead
Exposure IQ
F1 F2 F3
12
Local Independence Desirable
Truth
F1 F2 F3
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
1 2 3
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
$E(\hat{\beta}_{31}) = 0$
13
Correct Specification Crucial
Truth
F1 F2 F3
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
F4
1 2 3
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
$E(\hat{\beta}_{31}) \neq 0$
14
Strategies
15
Correctly Specify Deviations from Local Independence
Truth
F1 F2 F3
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
F4
1 2 3
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
x4 z4
$E(\hat{\beta}_{31}) = 0$
16
Correctly Specifying Deviations from Local Independence
is Often Very Hard
Truth
F1 F2 F3
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
F5 F4 F6
17
Finding Pure Measurement Models -
Much Easier
Truth
F1 F2 F3
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
F5 F4 F6
Truth
F1 F2 F3
x1 x2 x3 y1 y2 y3 y4 z3 z4
F5 F6
18
Tetrad Constraints
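[Diagram: latent L with indicators W, X, Y, Z and loadings $\lambda_1, \lambda_2, \lambda_3, \lambda_4$]
$W = \lambda_1 L + \epsilon_1$
$X = \lambda_2 L + \epsilon_2$
$Y = \lambda_3 L + \epsilon_3$
$Z = \lambda_4 L + \epsilon_4$
• It follows that the tetrad constraints hold:
$\mathrm{Cov}(W,X)\,\mathrm{Cov}(Y,Z) = (\lambda_1\lambda_2\sigma_L^2)(\lambda_3\lambda_4\sigma_L^2) = (\lambda_1\lambda_3\sigma_L^2)(\lambda_2\lambda_4\sigma_L^2) = \mathrm{Cov}(W,Y)\,\mathrm{Cov}(X,Z)$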
[Diagram: latent g with indicators m1, m2, r1, r2]
$\rho_{m1,r1}\,\rho_{m2,r2} = \rho_{m1,r2}\,\rho_{m2,r1}$
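The tetrad constraints for a single-latent model can be checked numerically. The sketch below builds the population covariance matrix implied by W, X, Y, Z each being a linear function of one latent L plus independent noise; the particular loadings, latent variance, and noise variances are assumed values chosen only for illustration.

```python
import numpy as np

# Assumed parameters for a one-latent model:
# indicator_i = loading_i * L + independent noise_i
loadings = np.array([0.9, 0.7, 1.2, 0.5])    # lambda_1 .. lambda_4
var_latent = 2.0                             # variance of L
noise_var = np.array([1.0, 0.5, 0.8, 1.5])   # variances of the error terms

# Implied covariance: off-diagonal entries are lambda_i * lambda_j * var(L).
cov = var_latent * np.outer(loadings, loadings) + np.diag(noise_var)

W, X, Y, Z = range(4)
t_wx_yz = cov[W, X] * cov[Y, Z]
t_wy_xz = cov[W, Y] * cov[X, Z]
t_wz_xy = cov[W, Z] * cov[X, Y]

print(t_wx_yz, t_wy_xz, t_wz_xy)   # the three products coincide
```

The noise variances sit only on the diagonal, so they never enter a tetrad product; this is why the constraints hold for any choice of loadings and error variances.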
Impurities/Deviations from Local Independence
defeat tetrad constraints selectively
Truth Truth
F1 F1
x1 x2 x3 x4 x1 x2 x3 x4
F5
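A small sketch of what "selectively" means here, assuming the impurity takes the form of a second latent F5 that affects x3 and x4 in addition to F1 (the exact edges of the diagram are not recoverable from the text, so this structure and all loadings are assumptions). The extra latent changes only the covariance of x3 and x4, so the tetrad that pairs x3 with x4 fails while the others survive.

```python
import numpy as np


def tetrad_products(cov, i, j, k, l):
    """The three pair products whose equalities are the tetrad constraints."""
    return (cov[i, j] * cov[k, l],
            cov[i, k] * cov[j, l],
            cov[i, l] * cov[j, k])


# Column 0: loadings on F1. Column 1: loadings on the assumed nuisance latent F5.
pure_loadings = np.array([[1.0, 0.0], [0.8, 0.0], [1.1, 0.0], [0.9, 0.0]])
impure_loadings = np.array([[1.0, 0.0], [0.8, 0.0], [1.1, 0.7], [0.9, 0.6]])

# Latents are uncorrelated with unit variance; indicator noise has unit variance.
cov_pure = pure_loadings @ pure_loadings.T + np.eye(4)
cov_impure = impure_loadings @ impure_loadings.T + np.eye(4)

pure = tetrad_products(cov_pure, 0, 1, 2, 3)
impure = tetrad_products(cov_impure, 0, 1, 2, 3)

print("pure:  ", pure)     # all three products agree
print("impure:", impure)   # the product involving cov(x3, x4) stands apart
```

Which tetrads break therefore carries information about where the impurity sits, and that is what a purification search can exploit.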
True Model F1 F2 F3
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
Initially Specified F3
F1 F2
Measurement Model
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
22
Purify
Iteratively remove item whose removal most improves
measurement model fit (tetrads or χ²)
– stop when confirmatory fit is acceptable
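The greedy loop described above can be sketched for a single cluster. This is a simplified illustration, not the Tetrad implementation: it works on a population covariance matrix instead of sample statistics, it uses a sum of squared tetrad differences as a stand-in for a fit statistic, and the names `tetrad_misfit` and `purify` as well as all parameter values are hypothetical.

```python
import itertools
import numpy as np


def tetrad_misfit(cov, items):
    """Sum of squared tetrad differences over all quartets of the given items."""
    total = 0.0
    for i, j, k, l in itertools.combinations(items, 4):
        p1 = cov[i, j] * cov[k, l]
        p2 = cov[i, k] * cov[j, l]
        p3 = cov[i, l] * cov[j, k]
        total += (p1 - p2) ** 2 + (p1 - p3) ** 2 + (p2 - p3) ** 2
    return total


def purify(cov, items, tol=1e-9):
    """Drop, one at a time, the item whose removal most reduces the misfit."""
    items = list(items)
    while len(items) > 4 and tetrad_misfit(cov, items) > tol:
        items = min(
            ([m for m in items if m != drop] for drop in items),
            key=lambda rest: tetrad_misfit(cov, rest),
        )
    return items


# Assumed example: six indicators of one latent; the last two also load on
# a shared nuisance latent (second column), which makes them impure.
loadings = np.array([[1.0, 0.0], [0.8, 0.0], [1.2, 0.0],
                     [0.9, 0.0], [1.1, 0.6], [0.7, 0.5]])
cov = loadings @ loadings.T + np.eye(6)

kept = purify(cov, range(6))
print("kept items:", kept)
```

Removing either of the two impure items is enough, because a nuisance latent with a single remaining child only adds to that item's variance. With real data the stopping rule would be a statistical test of fit, as the slide states, not a fixed tolerance.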
F1 F2 F3
Remove x4
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
F1 F2 F3
Remove z2
x1 x2 x3 x4 y1 y2 y3 y4 z1 z2 z3 z4
F
Purify
F1 F2 F3
x1 x2 x3 y1 y2 y3 y4 z1 z3 z4
24
How a pure measurement model is useful
F1 F2 F3
x1 x2 x3 y1 y2 y3 y4 z1 z3 z4
b21
Input:
- Purified Measurement Model
- Covariance matrix over set of pure items
MIMbuild
PC algorithm with independence tests
performed directly on latent variables
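To illustrate what an independence test "performed directly on latent variables" can look like, here is a hedged sketch that is not the MIMbuild algorithm itself. It assumes a chain F1 → F2 → F3, two pure indicators per latent with positive loadings, and population covariances; the function names and every numeric value are illustrative assumptions. With pure indicators, the correlation between two latents is identified from four indicator covariances, and a partial correlation among latents can then be computed.

```python
import numpy as np

# Assumed latent structure: F1 -> F2 -> F3 with unit-variance disturbances.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
A = np.linalg.inv(np.eye(3) - B)
phi = A @ A.T                                  # covariance of (F1, F2, F3)

# Two pure indicators per latent: x1, x2 | y1, y2 | z1, z2 (assumed loadings).
loadings = np.zeros((6, 3))
loadings[0:2, 0] = [1.0, 0.8]
loadings[2:4, 1] = [1.0, 1.2]
loadings[4:6, 2] = [1.0, 0.7]
cov = loadings @ phi @ loadings.T + np.eye(6)  # unit indicator noise


def latent_corr(cov, a, b):
    """Correlation of two latents from two pure indicators of each."""
    r2 = (cov[a[0], b[0]] * cov[a[1], b[1]]) / (cov[a[0], a[1]] * cov[b[0], b[1]])
    return np.sign(cov[a[0], b[0]]) * np.sqrt(r2)  # sign assumes positive loadings


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y given z."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


f1, f2, f3 = (0, 1), (2, 3), (4, 5)
latent_partial = partial_corr(latent_corr(cov, f1, f3),
                              latent_corr(cov, f1, f2),
                              latent_corr(cov, f3, f2))

# The same question asked of single indicators x1, z1 given y1.
d = np.sqrt(np.diag(cov))
corr = cov / np.outer(d, d)
observed_partial = partial_corr(corr[0, 4], corr[0, 2], corr[4, 2])

print("F1, F3 given F2 (latent level):   ", latent_partial)
print("x1, z1 given y1 (indicator level):", observed_partial)
```

At the latent level the partial correlation vanishes, matching the chain, while at the indicator level it does not. A constraint-based search such as PC needs the former kind of test to recover structure among the latents.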
28
Purify & MIMbuild
29
Goal 2: What Latents are out there?
30
Latents and the clustering of items they measure
imply tetrad constraints differentially
Model 1 Model 2
F1 F2 F1 F2
x1 x2 x3 x4 x5 x5 x1 x2 x3 x4 x5 x5
Model 3 Model 4
F3 F1 F2 F3
F1 F2
x6 x1 x2 x3 x4 x5 x6
x1 x2 x3 x4 x5
31
Build Pure Clusters (BPC)
Input:
- Covariance matrix over set of original items
BPC
1) Cluster (complicated boolean combinations of tetrads)
2) Purify
32
Build Pure Clusters
Qualitative Assumptions:
1. Two types of nodes: measured (M) and latent (L)
2. M ↛ L (measured variables don’t cause latents)
3. Each m ∈ M measures (is a direct effect of) at least one l ∈ L
4. No cycles involving M
Quantitative Assumptions:
1. Each m ∈ M is a linear function of its parents plus noise
2. P(L) has second moments, positive variances, and no deterministic relations
34
Build Pure Clusters
Output - provably reliable (pointwise consistent):
Equivalence class of measurement models over a pure subset of M
For example:
L1 L2 L3
True Model
m1 m2 m3 m4 m5 m6 m10 m11 m7 m8 m9
L1 L2 L3
Output
m1 m2 m3 m4 m5 m6 m7 m8 m9
35
Build Pure Clusters
Measurement models in the equivalence class are at most refinements, but never coarsenings or permuted clusterings.
L1 L2 L3 L4
m1 m2 m3 m4 m5 m6 m7 m8 m9
L1 L3
L1 L2 L3
m1 m2 m3 m4 m5 m6 m7 m8 m9
m1 m2 m3 m4 m5 m6 m7 m8 m9
L1 L2 L3
Output
m1 m2 m3 m4 m5 m6 m7 m8 m9
36
Build Pure Clusters
Algorithm Sketch:
1. Use particular rank (tetrad) constraints on the measured correlations
to find pairs of items mj, mk that do NOT share a single latent parent
2. Add a latent for each subset S of M such that no pair in S was found
NOT to share a latent parent in step 1.
3. Purify
4. Remove latents with no children
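Steps 1 and 2 of the sketch can be illustrated with a toy version. This is a deliberately simplified stand-in, not the Build Pure Clusters procedure, which combines tetrad tests in more complicated ways: it assumes population covariances, a pure two-latent model, and a single rule (if exactly one of a quartet's three pair products differs from the other two, the pairs not joined in the odd product do not share a latent). The function names and all parameter values are hypothetical.

```python
import itertools
import numpy as np


def not_sharing_pairs(cov, n):
    """Step 1 (toy rule): flag pairs that cannot share a single latent parent."""
    flagged = set()
    for a, b, c, d in itertools.combinations(range(n), 4):
        pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        prods = [cov[p] * cov[q] for p, q in pairings]
        for odd in range(3):
            u, v = [t for t in range(3) if t != odd]
            if np.isclose(prods[u], prods[v]) and not np.isclose(prods[odd], prods[u]):
                # The odd product joins same-cluster pairs; all other pairs cross clusters.
                all_pairs = set(itertools.combinations((a, b, c, d), 2))
                flagged |= all_pairs - set(pairings[odd])
    return flagged


def candidate_clusters(n, flagged):
    """Step 2: maximal item sets containing no flagged pair (brute force)."""
    clusters = []
    for size in range(n, 1, -1):
        for subset in itertools.combinations(range(n), size):
            compatible = not any(p in flagged for p in itertools.combinations(subset, 2))
            if compatible and not any(set(subset) <= set(c) for c in clusters):
                clusters.append(subset)
    return clusters


# Assumed pure model: items 0-2 measure L1, items 3-5 measure L2, corr(L1, L2) = 0.5.
loadings = np.zeros((6, 2))
loadings[:3, 0] = [1.0, 0.8, 1.2]
loadings[3:, 1] = [0.9, 1.1, 0.7]
phi = np.array([[1.0, 0.5], [0.5, 1.0]])
cov = loadings @ phi @ loadings.T + np.eye(6)

flagged = not_sharing_pairs(cov, 6)
clusters = candidate_clusters(6, flagged)
print("clusters:", clusters)
```

Quartets with three items from one cluster and one from the other satisfy all tetrad constraints, so only the two-and-two quartets separate the clusters in this toy rule. The purification and pruning in steps 3 and 4 would follow on sample data.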
37
Build Pure Clusters + MIMbuild
38
Case Studies
39
Case Study: Stress, Depression, and Religion
[Figure: specified model with latents Stress (indicators St1 to St21), Depression (indicators Dep1 to Dep20), and Coping (indicators C1 to C20), connected by signed (+/-) edges; p = 0.00]
40
Case Study: Test Anxiety
Bartholomew and Knott (1999), Latent Variable Models and Factor Analysis
12th Grade Males in British Columbia (N = 335)
20-item survey (Likert-scale items), X1 - X20:
Exploratory Factor Analysis:
[Figure: two-factor model with latents Emotionality and Worry; items shown: X2, X3, X4, X5, X6, X7, X8, X9, X10, X14, X15, X16, X17, X18, X20]
43
Case Study: Test Anxiety
[Figure: three-latent model with latents Cares About Achieving, Emotionality, and Self-Defeating; items shown: X2, X3, X5, X6, X7, X8, X9, X10, X11, X14, X16, X18]
44
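Case Study: Test Anxiety
[Figure: two models side by side. Left: two-factor model with latents Emotionality and Worry; items shown: X2, X3, X4, X5, X6, X7, X8, X9, X10, X14, X15, X16, X17, X18, X20. Right: three-latent model with latents Worries About Achieving, Emotionality, and Self-Defeating; items shown: X2, X3, X5, X6, X7, X8, X9, X10, X11, X14, X16, X18]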
45
Case Study: Test Anxiety
[Figure: models relating Emotionality and Self-Defeating, one using an Emotionality scale; items shown: X6, X7, X9, X10, X11, X14, X16, X18]
p = .43 Uninformative
46
Limitations
47
Open Questions/Projects
• IRT models?
• Bi-factor model extensions?
• Appropriate incorporation of background knowledge
48
References
• Tetrad: www.phil.cmu.edu/projects/tetrad_download
• Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, 2nd Edition, MIT Press.
• Pearl, J. (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press.
• Silva, R., Glymour, C., Scheines, R., and Spirtes, P. (2006). “Learning the Structure of Linear Latent Variable Models,” Journal of Machine Learning Research, 7, 191-246.
• Silva, R., Scheines, R., Glymour, C., and Spirtes, P. (2003). “Learning Measurement Models for Unobserved Variables,” in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, U. Kjaerulff and C. Meek, eds., Morgan Kaufmann.
49