Standard research model
Seminar overview
[Path diagram: indicators V1–V9 loading on latent factors F1, F2 and F3]

Measurement model
[Path diagram: the relations between the indicators V1–V9 and their factors F1, F2, F3]

Structural model
[Path diagram: the structural relations among factors F1, F2 and F3]
Before estimating the structural model, the measurement model must be validated (Session 1)
Standard research model
[Path diagram: indicators V1–V9 loading on factors F1, F2 and F3]

Seminar on July 14th, 2011: estimate SEM using a completely different approach, Partial Least Squares (PLSPM)
Example (tribute to Spearman, 1904; Sharma, 1996). Suppose we have students' grades for the following courses: Mathematics (M), Physics (P), Chemistry (C), English (E), History (H) and French (F). The researcher may have no hypothesis about whether intelligence is a one-dimensional factor (general intelligence) or two factors (quantitative and verbal intelligence): Exploratory Factor Analysis (EFA) is data driven. Or the researcher does have a hypothesis about the dimensions of intelligence: quantitative intelligence will produce better performance in the M, P and C courses, while verbal intelligence will produce better performance in E, H and F: Confirmatory Factor Analysis (CFA) is theory driven.
Exploratory Factor Analysis
[Path diagram: two correlated factors ξ1 and ξ2 (φ12 = φ21), each with loadings λ11…λ61 and λ12…λ62 on all six indicators x1–x6]
Confirmatory Factor Analysis
CFA is theory driven, not data driven. We must decide a priori on the number of factors and their inter-correlations, and we must decide in advance which variables load on which factors.
Terminology
Manifest variables (indicators, observed variables): X1 to X6, i.e. M, P, C, E, H and F. Represented by a square box.
Latent variables (factors). Common factors: Verbal intelligence (V), Quantitative intelligence (Q). Specific factors (errors, uniqueness, disturbances): random measurement errors. Represented by circles.
Covariance between latent variables: curved two-headed arrow.
Factor loadings (structural coefficients, paths): straight arrow.
[Path diagram: covariance algebra illustrated on X3 and X4]

For a simple regression Y = a + bX + e:

Var(Y) = Var(a + bX + e) = b²·Var(X) + Var(e)
Cov(X, Y) = Cov(X, a + bX + e) = b·Var(X)

Implied covariance matrix of (Y, X):

| b²·Var(X) + Var(e)    b·Var(X) |
| b·Var(X)              Var(X)   |
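These identities can be checked with a quick simulation (pure Python; the values a = 1, b = 2, Var(X) = 4, Var(e) = 1 are arbitrary illustrative choices):

```python
import random
import statistics

random.seed(42)
a, b = 1.0, 2.0
n = 100_000

# X ~ N(0, sd=2) so Var(X) = 4; e ~ N(0, 1) so Var(e) = 1
xs = [random.gauss(0.0, 2.0) for _ in range(n)]
es = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [a + b * x + e for x, e in zip(xs, es)]

var_x = statistics.pvariance(xs)
var_y = statistics.pvariance(ys)
mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Var(Y) = b^2 Var(X) + Var(e) ~= 17;  Cov(X, Y) = b Var(X) ~= 8
print(round(var_y), round(cov_xy))
```

The sample moments land on the theoretical values up to sampling noise, which is exactly the logic SEM exploits: parameters are chosen so that the implied moments reproduce the observed ones.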
Formulation

x1 = λ11·ξ1 + δ1
x2 = λ21·ξ1 + δ2
…
x6 = λ62·ξ2 + δ6

In matrix form:

| x1 |   | λ11   0  |          | δ1 |
| x2 |   | λ21   0  |          | δ2 |
| x3 | = | λ31   0  | | ξ1 | + | δ3 |
| x4 |   |  0   λ42 | | ξ2 |   | δ4 |
| x5 |   |  0   λ52 |          | δ5 |
| x6 |   |  0   λ62 |          | δ6 |

x = Λξ + δ
Formulation

x = Λξ + δ
Φ = E[ξξ′]    Θ = E[δδ′]
Σ = ΛΦΛ′ + Θ

      | λ11   0  |
      | λ21   0  |
Λ =   | λ31   0  |   ;   Φ = | φ11  φ12 |
      |  0   λ42 |           | φ12  φ22 |
      |  0   λ52 |
      |  0   λ62 |

      | θ11 θ12 θ13   0    0    0  |
      | θ21 θ22 θ23   0    0    0  |
Θ =   | θ31 θ32 θ33   0    0    0  |
      |  0    0    0  θ44 θ45 θ46 |
      |  0    0    0  θ54 θ55 θ56 |
      |  0    0    0  θ64 θ65 θ66 |
Formulation

The elements of Σ = ΛΦΛ′ + Θ (lower triangle):

σ11 = λ11²·φ11 + θ11
σ21 = λ21·λ11·φ11 + θ21    σ22 = λ21²·φ11 + θ22
σ31 = λ31·λ11·φ11 + θ31    σ32 = λ31·λ21·φ11 + θ32    σ33 = λ31²·φ11 + θ33
σ41 = λ42·λ11·φ21          σ42 = λ42·λ21·φ21          σ43 = λ42·λ31·φ21    σ44 = λ42²·φ22 + θ44
σ51 = λ52·λ11·φ21          σ52 = λ52·λ21·φ21          σ53 = λ52·λ31·φ21    σ54 = λ52·λ42·φ22 + θ54    σ55 = λ52²·φ22 + θ55
σ61 = λ62·λ11·φ21          σ62 = λ62·λ21·φ21          σ63 = λ62·λ31·φ21    σ64 = λ62·λ42·φ22 + θ64    σ65 = λ62·λ52·φ22 + θ65    σ66 = λ62²·φ22 + θ66
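As a sketch of how the implied matrix Σ = ΛΦΛ′ + Θ is assembled in practice, here is a small pure-Python example; the parameter values are hypothetical and Θ is taken diagonal for simplicity:

```python
# Sketch: implied covariance matrix Sigma = Lambda Phi Lambda' + Theta
# (hypothetical parameter values; diagonal Theta for simplicity)

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Simple structure: x1-x3 load on xi1, x4-x6 on xi2
lam = [[0.7, 0.0], [0.6, 0.0], [0.5, 0.0],
       [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]]
phi = [[1.0, 0.5], [0.5, 1.0]]                 # factor (co)variances
theta = [0.51, 0.64, 0.75, 0.36, 0.51, 0.64]   # error variances

sigma = matmul(matmul(lam, phi), transpose(lam))
for i in range(6):
    sigma[i][i] += theta[i]

# Diagonal element: lam^2 * phi + theta, e.g. 0.7^2 * 1 + 0.51 = 1.0
# Cross-block element: lam11 * phi12 * lam42 = 0.7 * 0.5 * 0.8 = 0.28
print(round(sigma[0][0], 6), round(sigma[0][3], 6))   # 1.0 0.28
```

Note how the cross-block covariances carry no θ term, matching the expanded lower triangle above.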
Formulation

Unrestricted population covariance matrix (infinite number of solutions):

Σ = ΛΦΛ′ + Θ

  ↓ Identification (restrictions)

Restricted population covariance matrix:

Σ* = Λ*Φ*Λ*′ + Θ*

  ↓ Estimation

Solution:

Σ̂ = Λ̂Φ̂Λ̂′ + Θ̂
Model specification
[Path diagram: ξ1 with loadings λ11, λ21, λ31 on x1–x3; ξ2 with loadings λ42, λ52, λ62 on x4–x6]
Model specification
Step 1. An error term must be specified for each dependent variable
[Path diagram: error terms δ1–δ6 added to indicators x1–x6]
Model specification
Step 2. Only independent variables admit covariances
Covariances among error terms are technically possible but conceptually questionable
Model specification
Step 3. Check that no error term has been drawn for independent variables

Model specification
Step 4. Variances for each independent variable in the model must be estimated (marked * in the diagram)

Model specification
Step 5. Covariances between independent variables must be estimated (marked * in the diagram)
Model specification
Step 6. Regression coefficients must be estimated: structural coefficients, loadings, error term paths
[Path diagram: factor covariance φ12 = φ21 free (*); loadings λ11, λ21, λ31 and λ42, λ52, λ62 free (*); error paths for δ1–δ6 free (*)]
Identification (Diamantopoulos & Siguaw, 2000, pp. 48-49)

Can a single unique value for each and every free parameter be obtained from the observed data? If AB = 40, some candidate pairs can be discarded, e.g. (2, 25) or (4, 5), but many others cannot: (1, 40), (2, 20), (4, 10), (5, 8). If the pieces of information in the sample covariance matrix are not enough, the estimation process can find an infinite number of parameter estimates that all yield perfect model fit: an identification problem. What does "enough pieces of information" mean? We should have more variances and covariances in the sample matrix than parameters to be estimated: an overidentified model. If this happens, a unique estimate for each parameter can be obtained in multiple ways from the sample data. If we have as many data points as parameters to be estimated, the model is just-identified: a unique estimate for each parameter exists but can be found through one and only one manipulation of the observed data (e.g. AB = 40 and A + 2B = 18 → A = 8, B = 5).
Identification (Diamantopoulos & Siguaw, 2000, pp. 48-49)

Can a single unique value for each and every free parameter be obtained from the observed data? More data points than parameters: overidentified model.
AB = 40 (1)
A + 2B = 18 (2)
2A + B = 24 (3)
From (1) and (2): A = 8, B = 5. From (2) and (3): A = 10, B = 4. As at least two estimates can be found for each parameter, the second estimate can be used to test the model: if the estimates differ significantly, there is evidence that the model cannot be trusted (Aaker & Bagozzi, 1979, p. 153).
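The overidentification logic of this toy example can be traced numerically; the sketch below (pure Python) recovers the two disagreeing estimation routes:

```python
import math

# Toy overidentified "model": two parameters, three data points
# (1) A*B = 40    (2) A + 2B = 18    (3) 2A + B = 24

# Route 1: solve the linear pair (2) and (3) exactly -> A = 10, B = 4
A1, B1 = 10.0, 4.0
assert A1 + 2 * B1 == 18 and 2 * A1 + B1 == 24

# Route 2: solve (1) and (2); substituting A = 18 - 2B into A*B = 40
# gives 2B^2 - 18B + 40 = 0, with roots B = 4 and B = 5
disc = math.sqrt(18**2 - 4 * 2 * 40)
B2 = (18 + disc) / 4              # take the B = 5 root, as in the slide
A2 = 18 - 2 * B2                  # A = 8
print((A1, B1), (A2, B2))         # (10.0, 4.0) (8.0, 5.0)
```

The two routes deliver different estimates, (10, 4) versus (8, 5); with real sample data a significant disagreement between routes is evidence against the model, which is the intuition behind the chi-square test.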
Identification
Can a single unique value for each and every free parameter be obtained from the observed data?
Scaling the latent variables (LVs are unobserved and thus have no defined metric): either the loading of one indicator of the LV is fixed to 1, or the variance of the LV is fixed to 1. The error term path coefficient is fixed to 1 (as in OLS regression).
Checking that our model is overidentified:
Data points: sample variances and covariances: q(q+1)/2
Parameters to be estimated: LV covariances, LV variances, loadings, error variances
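For this six-indicator, two-factor model the counting can be sketched in a few lines of Python (assuming one loading per factor is fixed to 1 to scale the latent variables, as in the EQS syntax later in the slides):

```python
# Count data points vs. free parameters for the 6-indicator, 2-factor CFA
# (one loading per factor fixed to 1 to scale the latent variables)
q = 6                                   # observed variables
data_points = q * (q + 1) // 2          # distinct variances + covariances = 21

free_loadings = 6 - 2                   # 2 of 6 loadings fixed to 1 -> 4
lv_variances = 2
lv_covariances = 1
error_variances = 6
free_params = free_loadings + lv_variances + lv_covariances + error_variances

df = data_points - free_params
print(data_points, free_params, df)     # 21 13 8
```

The 8 degrees of freedom match the chi-square reported in the EQS output later in the slides.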
Identification
Scaling the latent variables (LVs are unobserved and thus have no defined metric): either the loading of one indicator of each LV is fixed to 1, or the variance of the LV is fixed to 1.
[Path diagram: one loading per factor fixed to 1; remaining loadings, factor variances and the factor covariance φ12 = φ21 free (*)]

Cov(X1, X3) = λ11 · Var(ξ1) · λ31
Identification
[Path diagram: the same model with all error term paths fixed to 1 and the error variances free (*)]
Sample correlation matrix:

      E      H      F      M      P      C
E   1.00
H   .493   1.00
F   .401   .314   1.00
M   .278   .347   .147   1.00
P   .317   .318   .183   .587   1.00
C   .284   .327   .179   .463   .453   1.00

Checking that our model is overidentified:
Data points (sample variances and covariances): q(q+1)/2 = 6·7/2 = 21
Parameters to be estimated: 1 LV covariance + 2 LV variances + 4 free loadings + 6 error variances = 13
Estimation
Unweighted least squares (ULS), Generalized least squares (GLS), Maximum likelihood (ML), Asymptotically distribution-free (ADF) = Weighted least squares (WLS)

Minimize the fit function; for ML:

F_ML = ln|Σ(θ)| + tr(S·Σ(θ)⁻¹) − ln|S| − q
Estimation
Non metric (categorical) indicators (Brown, 2006): WLS (ADF in Amos o AGLS in EQS) WLSMV, only implemented in Mplus Non-normality (Chou, Bentler & Satorra,1991; Hu, Bentler & Kano, 1992) : ML estimation (known as MLM in Mplus) but Corrected standard errors and Chi-square statistic (Satorra-Bentler scaled chi-square; Satorra & Bentler, 1988)
35
Let us write EQS syntax
[Path diagram: the two-factor CFA with fixed (1) and free (*) parameters]

/TITLE
 CFA QI and VI
/SPECIFICATIONS
 CASE=275; VAR=6; ME=ML; MA=COR; ANAL=COV;
/MATRIX
 1.000
 0.493 1.000
 0.401 0.314 1.000
 0.278 0.347 0.147 1.000
 0.317 0.318 0.183 0.587 1.000
 0.284 0.327 0.179 0.463 0.453 1.000
/STANDARD DEVIATIONS
 1.09 0.59 0.98 1.10 0.41 1.11
Let us write EQS syntax
/LABELS
V1=E;
V2=H;
V3=F;
V4=M;
V5=P;
V6=C;F1=IV;
F2=IQ;
/EQUATIONS
V1=
F1+E1;
V2=*F1+E2;
V3=*F1+E3;
V4=
F2+E4;
V5=*F2+E5;
V6=*F2+E6;
/VARIANCES
F1
TO
F2=*;
E1
TO
E6=*;
/COVARIANCES
F1
TO
F2=*;
12=21 *
1
* *
2
* 1 *
11 21 31
* 1 *
42 52 62
x1
x2
x3
x4
x5
x6
1 1 1 1 1 1 * * * * * *
1 2 3 4 5 6
37
Let us write EQS syntax
[Path diagram: the two-factor CFA with fixed (1) and free (*) parameters, matching the syntax above]
Goodness-of-Fit
To what degree is the model-implied covariance matrix similar to the sample covariance matrix?

χ² = (N − 1) · F_ML

Chi-square: degrees of freedom are taken into account in its evaluation. It tests the null hypothesis that the implied covariance matrix equals the sample covariance matrix (H0: S = Σ(θ)), so non-rejection is preferred (the higher the p-value, the better).
Important criticisms (Long, 1983, p. 75; Marsh, Balla & McDonald, 1988, p. 392; Brown, 2006, p. 81):
In many instances (small N, non-normal data) its underlying distribution is not chi-square, compromising statistical significance tests.
It is inflated by sample size, so large-N solutions are routinely rejected even when the differences between matrices are negligible.
It is based on a very stringent hypothesis.
Check the residual covariance matrix: small residuals (centred and symmetrical EQS graph).
Goodness-of-Fit
Given the chi-square limitations, many alternative fit indices based on less stringent standards have been proposed; more than 30 have been developed. The GOF indices we present were selected on the basis of their favourable performance in Monte Carlo research (Hu & Bentler, 1998; Marsh, Balla & McDonald, 1988).
Absolute fit: indicates how well the estimated model reproduces the observed data, evaluating the reasonability of the null hypothesis H0: S = Σ(θ).
SRMR (Standardized Root Mean Square Residual): the square root of the sum of the squared elements of the residual correlation matrix (S − Σ̂) divided by the number of elements of that matrix (on and below the diagonal, i.e. q(q+1)/2). SRMR < .08 is the benchmark for good fit (Hu & Bentler, 1999).
Goodness-of-Fit
Parsimony-corrected: incorporates a penalty for poor model parsimony (equal absolute fit with more free parameters = lower df).
RMSEA (Root Mean Square Error of Approximation): the chi-square statistic is corrected so that the higher the parsimony (higher df), the lower the index (better fit):

RMSEA = sqrt( max(χ² − df, 0) / (df · (N − 1)) )

Benchmarks (Browne & Cudeck, 1993): < .05 good fit; .05 to .08 reasonable; > .08 mediocre.
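As a quick sketch (pure Python), plugging the values reported in the EQS output later in these slides (χ² = 8.842, df = 8, N = 275) into the RMSEA formula reproduces the printed index:

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Values from the EQS run reported in these slides
value = rmsea(chi2=8.842, df=8, n=275)
print(f"{value:.3f}")   # 0.020, matching the EQS output
```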
Goodness-of-Fit
Comparative fit: indicates how well the estimated model fits relative to some alternative baseline model. The most common baseline model assumes all observed variables are uncorrelated, i.e. all single-item scales.
NFI (Normed Fit Index; Bentler & Bonett, 1980): > .90 good fit (Ullman, 2001)

NFI = (χ²_B − χ²_M) / χ²_B

NNFI (Non-Normed Fit Index, aka TLI; Tucker & Lewis, 1973): corrects for df but can take values higher than 1; > .90 good fit (Schumacker & Lomax, 1996)

NNFI = TLI = (χ²_B/df_B − χ²_M/df_M) / (χ²_B/df_B − 1)

CFI (Comparative Fit Index; Bentler, 1990): like the TLI it corrects for parsimony (df) but avoids values higher than 1.0; .90-.95 acceptable fit, > .95 good fit (Hu & Bentler, 1999)

CFI = 1 − max(χ²_M − df_M, 0) / max(χ²_M − df_M, χ²_B − df_B, 0)
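These three formulas can be verified against the chi-squares reported in the EQS output later in these slides (model χ² = 8.842 with df = 8; independence/baseline χ² = 392.818 with df = 15); a pure-Python sketch:

```python
# Comparative fit indices from the chi-squares in the EQS output:
# model chi2 = 8.842 (df 8), baseline chi2 = 392.818 (df 15)

def nfi(chi2_m, chi2_b):
    return (chi2_b - chi2_m) / chi2_b

def tli(chi2_m, df_m, chi2_b, df_b):
    return (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)

def cfi(chi2_m, df_m, chi2_b, df_b):
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    return 1.0 - num / den

print(f"{nfi(8.842, 392.818):.3f}")          # 0.977
print(f"{tli(8.842, 8, 392.818, 15):.3f}")   # 0.996
print(f"{cfi(8.842, 8, 392.818, 15):.3f}")   # 0.998
```

All three match the NFI, NNFI and CFI values printed in the goodness-of-fit summary.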
Reporting goodness-of-fit
Combine goodness-of-fit with badness-of-fit indices; report at least one index of each type: absolute (e.g. SRMR), parsimony-corrected (e.g. RMSEA) and comparative (e.g. CFI).
Goodness-of-Fit
Standardized model residual covariance matrix

STANDARDIZED RESIDUAL MATRIX:

           V1      V2      V3      V4      V5      V6
 V1     0.000
 V2    -0.011   0.000
 V3     0.041  -0.024   0.000
 V4    -0.046   0.043  -0.070   0.000
 V5    -0.007   0.013  -0.035   0.010   0.000
 V6     0.022   0.081   0.003  -0.004  -0.014   0.000

AVERAGE ABSOLUTE STANDARDIZED RESIDUALS              = 0.0201
AVERAGE OFF-DIAGONAL ABSOLUTE STANDARDIZED RESIDUALS = 0.0282

LARGEST STANDARDIZED RESIDUALS:

 V6,V2    V4,V3    V4,V1    V4,V2    V3,V1
 0.081   -0.070   -0.046    0.043    0.041

 V5,V3    V3,V2    V6,V1    V6,V5    V5,V2
-0.035   -0.024    0.022   -0.014    0.013

 V2,V1    V5,V4    V5,V1    V6,V4    V6,V3
-0.011    0.010   -0.007   -0.004    0.003

 V2,V2    V3,V3    V6,V6    V5,V5    V4,V4
 0.000    0.000    0.000    0.000    0.000
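The two summary statistics can be recomputed from the off-diagonal residuals shown above (pure Python; EQS works with unrounded residuals, so the last digit can differ):

```python
# Off-diagonal standardized residuals from the EQS matrix above
residuals = [-0.011, 0.041, -0.024, -0.046, 0.043, -0.070,
             -0.007, 0.013, -0.035, 0.010, 0.022, 0.081,
             0.003, -0.004, -0.014]
q = 6
n_all = q * (q + 1) // 2          # 21 elements on and below the diagonal
n_off = q * (q - 1) // 2          # 15 off-diagonal elements

total = sum(abs(r) for r in residuals)   # diagonal residuals are all 0.000
avg_all = total / n_all
avg_off = total / n_off
# EQS prints 0.0201 / 0.0282 from unrounded residuals
print(f"{avg_all:.4f} {avg_off:.4f}")
```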
Goodness-of-Fit
Standardized model residual covariance matrix

DISTRIBUTION OF STANDARDIZED RESIDUALS

[Histogram: all 21 standardized residuals fall between -0.1 and 0.1]

     RANGE           FREQ   PERCENT
 1   -0.5  -  --        0     0.00%
 2   -0.4  -  -0.5      0     0.00%
 3   -0.3  -  -0.4      0     0.00%
 4   -0.2  -  -0.3      0     0.00%
 5   -0.1  -  -0.2      0     0.00%
 6    0.0  -  -0.1     11    52.38%
 7    0.1  -   0.0     10    47.62%
 8    0.2  -   0.1      0     0.00%
 9    0.3  -   0.2      0     0.00%
 A    0.4  -   0.3      0     0.00%
 B    0.5  -   0.4      0     0.00%
 C    ++   -   0.5      0     0.00%
     TOTAL             21   100.00%

EACH "*" REPRESENTS 1 RESIDUAL
Goodness-of-Fit
Chi-square and GOF indices

GOODNESS OF FIT SUMMARY

INDEPENDENCE MODEL CHI-SQUARE = 392.818 ON 15 DEGREES OF FREEDOM

INDEPENDENCE AIC = 362.81793    INDEPENDENCE CAIC = 293.56637
MODEL AIC        =  -7.15793    MODEL CAIC        = -44.09210

CHI-SQUARE = 8.842 BASED ON 8 DEGREES OF FREEDOM
PROBABILITY VALUE FOR THE CHI-SQUARE STATISTIC IS 0.35579
THE NORMAL THEORY RLS CHI-SQUARE FOR THIS ML SOLUTION IS 9.157.

BENTLER-BONETT NORMED FIT INDEX      = 0.977
BENTLER-BONETT NONNORMED FIT INDEX   = 0.996
COMPARATIVE FIT INDEX (CFI)          = 0.998
BOLLEN (IFI) FIT INDEX               = 0.998
McDonald (MFI) FIT INDEX             = 0.998
LISREL GFI FIT INDEX                 = 0.989
LISREL AGFI FIT INDEX                = 0.971
ROOT MEAN SQUARED RESIDUAL (RMR)     = 0.027
STANDARDIZED RMR                     = 0.044
ROOT MEAN SQ. ERROR OF APP. (RMSEA)  = 0.020
90% CONFIDENCE INTERVAL OF RMSEA (0.000, 0.075)
ITERATIVE SUMMARY

              PARAMETER
ITERATION    ABS CHANGE      ALPHA     FUNCTION
    1          0.298689    1.00000     0.88599
    2          0.124292    1.00000     0.10692
    3          0.026794    1.00000     0.03287
    4          0.008439    1.00000     0.03231
    5          0.001469    1.00000     0.03227
    6          0.000443    1.00000     0.03227
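The model chi-square follows directly from the converged function value in the summary above, via χ² = (N − 1)·F_ML:

```python
# chi-square from the converged ML fit function: chi2 = (N - 1) * F_ML
n = 275                 # CASE=275 in the EQS /SPECIFICATIONS
f_ml = 0.03227          # final FUNCTION value in the iterative summary
chi2 = (n - 1) * f_ml
print(f"{chi2:.3f}")    # 8.842, the model chi-square in the fit summary
```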
Interpretation of the model
BEFORE interpreting: are the parameter estimates statistically reasonable? Correlations higher than |1|? Standardized loadings higher than |1|? Negative variances? Are the estimates significant?
Interpretation of the model
BEFORE interpreting: are the parameter estimates statistically reasonable?

VARIANCES OF INDEPENDENT VARIABLES
----------------------------------
(estimate*, standard error, test statistic)

F1 - IV    .636*   (SE .117, z =  5.443)
F2 - IQ    .698*   (SE .112, z =  6.244)

E1 - E     .552*   (SE .088, z =  6.256)
E2 - H     .183*   (SE .025, z =  7.294)
E3 - F     .728*   (SE .071, z = 10.281)
E4 - M     .512*   (SE .075, z =  6.828)
E5 - P     .071*   (SE .010, z =  6.807)
E6 - C     .767*   (SE .079, z =  9.655)

Negative variances? No.
Interpretation of the model
BEFORE interpreting: are the parameter estimates statistically reasonable?

CORRELATIONS AMONG INDEPENDENT VARIABLES
----------------------------------------

F1 - IV  with  F2 - IQ :   .582*
Interpretation of the model
Are the parameter estimates significant?

MEASUREMENT EQUATIONS WITH STANDARD ERRORS AND TEST STATISTICS

E = V1 = 1.000 F1 + 1.000 E1
H = V2 =  .509*F1 + 1.000 E2   (SE .068, z = 7.467)
F = V3 =  .604*F1 + 1.000 E3   (SE .096, z = 6.319)
M = V4 = 1.000 F2 + 1.000 E4
P = V5 =  .373*F2 + 1.000 E5   (SE .039, z = 9.467)
C = V6 =  .817*F2 + 1.000 E6   (SE .096, z = 8.552)
Interpretation of the model
Interpret!

STANDARDIZED SOLUTION:

E = V1 = .732 F1 + .682 E1
H = V2 = .688*F1 + .725 E2
F = V3 = .492*F1 + .871 E3
M = V4 = .759 F2 + .651 E4
P = V5 = .760*F2 + .650 E5
C = V6 = .615*F2 + .789 E6

[Path diagram: factor correlation 0.582; standardized loadings 0.732, 0.688, 0.492 on x1–x3 (F1) and 0.759, 0.760, 0.615 on x4–x6 (F2); standardized error paths 0.682, 0.725, 0.871, 0.651, 0.650, 0.789]
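Two quick consistency checks on the standardized solution above (pure Python): each indicator's variance decomposes into communality plus uniqueness (λ² + e² = 1), and the product of two same-factor loadings gives the implied correlation, whose gap to the observed correlation is the residual:

```python
# Standardized loadings and error paths from the EQS solution above
pairs = [(0.732, 0.682), (0.688, 0.725), (0.492, 0.871),
         (0.759, 0.651), (0.760, 0.650), (0.615, 0.789)]

for loading, error in pairs:
    # communality + uniqueness should equal 1 (up to rounding)
    assert abs(loading**2 + error**2 - 1.0) < 0.005

# Implied correlation for two indicators of the same factor: lambda_i * lambda_j
implied_EH = 0.732 * 0.688        # E and H both load on F1
residual = 0.493 - implied_EH     # observed r(E, H) = .493
print(f"{residual:.3f}")          # -0.011, matching the residual matrix
```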
Model respecification: Modification indices
How much would the chi-square change if a free parameter were constrained, or a constrained parameter freely estimated?
Rationale: improving fit (?); testing an additional hypothesis.
In our opinion: we shouldn't forget we are using a confirmatory technique, not an exploratory one. Our goal is not to find a good-fitting model but to look for evidence about whether or not our model fits the real world. Specification searches rarely led to the true model in simulation research (MacCallum, 1986). Estimating a fixed parameter should only be done if this parameter can be interpreted substantively and can be justified on the basis of prior theory (Jöreskog, 1993; Silvia & MacCallum, 1988). Another adverse consequence of atheoretical specification searches is capitalization on chance associations in the sample data, that is, accounting for weak effects that stem from sampling error and are not apt to be replicated in independent data sets (Brown, 2006).
Model respecification: Modification indices
EQS provides two modification indices:
Lagrange Multiplier test: how much the chi-square would be reduced (fit improves) if we freely estimated a constrained parameter (i.e. if we added a non-hypothesized relationship).
Wald test: how much the chi-square would increase (fit worsens) if we constrained a parameter that was freely estimated (i.e. if we deleted a hypothesized relationship).
Model respecification: Modification indices

MULTIVARIATE LAGRANGE MULTIPLIER TEST BY SIMULTANEOUS PROCESS IN STAGE 1
PARAMETER SETS (SUBMATRICES) ACTIVE AT THIS STAGE ARE:
  PVV PFV PFF PDD GVV GVF GFV GFF BVF BFF

       CUMULATIVE MULTIVARIATE STATISTICS        UNIVARIATE INCREMENT
       ----------------------------------        --------------------
STEP   PARAMETER   CHI-SQUARE  D.F.  PROBABILITY   CHI-SQUARE  PROBABILITY
 1     V2,F2          4.741      1      0.029         4.741       0.029

WALD TEST (FOR DROPPING PARAMETERS)
MULTIVARIATE WALD TEST BY SIMULTANEOUS PROCESS

NONE OF THE FREE PARAMETERS IS DROPPED IN THIS PROCESS.