
Confirmatory Factor Analysis

Definition

Confirmatory factor analysis (CFA) is a procedure for learning the extent to which k observed
variables might measure m abstract variables, wherein m is less than k. In CFA, we indirectly
measure non-observable behavior by taking measures on multiple observed behaviors.
Conceptually, in using CFA we can assume either nominalist or realist constructs, yet most
applications of CFA in the social sciences assume realist constructs.

Terminology: Factor = Abstract Concept = Abstract Construct = Latent Variable.

CFA differs from EFA in that it specifies a factor structure based upon expected theoretical
relationships. Whereas we might think of EFA as a procedure for inductive theory construction,
CFA is a procedure for testing hypotheses deduced from theory. CFA allows the researcher to
conduct two forms of data analysis not available in EFA:

1. CFA allows for the examination of second-order (i.e., higher-order) latent variables. We
might posit, for example, that marital satisfaction (a latent variable) consists of four sub-
dimensions (each a latent variable), satisfaction with: romance, companionship, family
finances, and child rearing.
2. CFA allows for testing hypotheses related to construct validity. We can test for statistical
significance of the effect of a latent variable on each of the observed variables posited to
measure it.

The web page entitled, "Using CFA to Test Empirical Validity" provides an example of how CFA
can be used to examine construct and predictive validity for a second-order latent variable.

CFA requires one to specify the measurement of and relationships among the factors.
Therefore, it relies upon deductive examination of a theory. Deductive analysis has the
advantage of knowing a priori the factor structure, which allows one to test hypotheses related
to examining the various types of construct validity. However, whereas the EFA model is never
underidentified, the CFA model can be underidentified, requiring one to understand
mathematical identification and the rules for certifying model identification.

Assumptions

1. Typically, realism rather than nominalism: Abstract variables are real in their consequences.
2. Normally distributed observed variables.
3. Continuous-level data.
4. Linear relationships among the observed variables.
5. Content validity of the items used to measure an abstract concept.
6. E(ei) = 0 (random error).
7. Theoretically specified relationships among observed variables and factors.
8. A sample size greater than 100 (more is better).

Note: In CFA:
1. we use the symbol $\xi$ (xi) to refer to an exogenous factor (an independent latent variable).
2. we use the symbol $\eta$ (eta) to refer to an endogenous factor (a dependent latent variable).
3. we use the symbol $\tau$ (tau) to refer to the intercept of the measurement model.
4. we use the symbol $\Phi$ (phi) to refer to the variance/covariance matrix of the factor(s). Note that the
variance of a factor always equals 1 in EFA.

The diagram below shows the terminology typically used in CFA and Structural Equation
Modeling (SEM). This course addresses the "measurement" model, meaning the
measurements of and relationships among the exogenous variables. Soc 613 addresses the
"causal" model, referring to relationships among the exogenous and endogenous variables.

Most of the notes in this lecture refer to measuring $\xi$. When we address second-order CFA, we
will discuss measuring $\xi$ and $\eta$. The principles discussed in measuring $\xi$ apply also to
measuring $\eta$.

Notation

$y$: p x 1 vector of observed endogenous (i.e., dependent) variables.
$x$: q x 1 vector of observed exogenous (i.e., independent) variables.
$\eta$: m x 1 vector of latent endogenous variables (i.e., dependent factors).
$\xi$: n x 1 vector of latent exogenous variables (i.e., independent factors).
$\epsilon$: p x 1 vector of measurement errors in y.
$\delta$: q x 1 vector of measurement errors in x.
$\Gamma$: m x n matrix of coefficients of the $\xi$ variables.
$B$: m x m matrix of coefficients of the $\eta$ variables.
$\zeta$: m x 1 vector of equation errors in the relationships between $\eta$ and $\xi$.
$\Psi$: m x m matrix of covariances among the errors in the relationships between $\eta$ and $\xi$.

Example of a CFA Model

The model shown below specifies that a set of three abstract variables related
to locus of control (internal, chance, and powerful others) can be measured with sufficient
validity and reliability by nine observed variables, wherein each latent variable is measured with
three observed variables.

Software Packages for Conducting CFA

The Sociology 512 web site provides examples of conducting CFA using six well known
software packages: LISREL, MPlus, R, SAS, SPSS/AMOS, and Stata. The examples shown in
these notes rely mainly upon the LISREL software package.

Consequences of Measurement Error

We noted above that CFA assumes a sample size of at least 100. Understanding the
consequences of measurement error can explain why we make this assumption.

Single Indicator of $\xi$

$X_1 = \tau_1 + \lambda_{11}\xi_1 + \delta_1$, where:

1. $\tau_1$ refers to the intercept of the equation
2. the population mean of $\xi_1$ = $\kappa_1$ (kappa)
3. the population mean of $X_1$ = $\mu_{x1}$ (mu)

$E(X_1) = E(\tau_1 + \lambda_{11}\xi_1 + \delta_1)$, or: $\mu_{x1} = \tau_1 + \lambda_{11}\kappa_1$

Recall that this equation cannot be solved because of the linear dependencies in the matrix.
To estimate the parameters, we must make one of the following assumptions:

1. The variance of the factors equals one.
2. The parameter estimate ($\lambda_{11}$) for the effect of the factor ($\xi_1$) on $X_1$ equals one.
3. For example:
   a. let $\tau_1 = 0$ (standardized variables),
   b. set $\lambda_{11} = 1$.
4. Then, $X_1 = \xi_1 + \delta_1$.
5. Then, $E(X_1) = E(\xi_1) + E(\delta_1)$.
6. Assume: $E(\delta_1) = 0$ (i.e., random errors in measurement).
7. Then, $E(X_1) = E(\xi_1)$, or $\mu_{x1} = \kappa_1$.
8. Hence, $E(X_1)$ is an unbiased estimator of $\kappa_1$ (so far, so good!).

Multiple Indicators of $\xi$

Consider two indicators of $\xi_1$:

$X_1 = \tau_1 + \lambda_{11}\xi_1 + \delta_1$
$X_2 = \tau_2 + \lambda_{21}\xi_1 + \delta_2$

We must set a scale for one of the latent variables.

1. let $\tau_1 = 0$ (standardized variables),
2. set $\lambda_{11} = 1$.
3. Then, $E(X_1) = \kappa_1$.
4. Having set a scale for $\xi_1$ using $\tau_1$ and $\lambda_{11}$, it is unnecessary and incorrect to do so again for $\tau_2$
and $\lambda_{21}$.
5. $E(X_1) = \kappa_1$ (as before).
6. The mean of $X_2$, however, may not equal the mean of $\xi_1$.

Consider the consequences of estimating the variance of $\xi_1$:

1. $var(X_1) = \lambda_{11}^2\phi_{11} + var(\delta_1)$, where $\phi_{11} = var(\xi_1)$
2. if $\lambda_{11} = 1$ and $var(\delta_1) = 0$, then $var(X_1) = \phi_{11}$ (unbiased estimator)
3. if $var(\delta_2) > 0$, however, then $var(X_2) > \phi_{11}$ (biased estimator).
4. Therefore, $var(X_2)$ is a biased estimator of $\phi_{11}$.
5. To resolve this issue, CFA relies upon asymptotic distribution theory, which essentially
states that for large sample sizes ($n \geq 100$), $var(X_2)$ is an unbiased estimator of $\phi_{11}$.

The Measurement Model

A measurement model specifies a structural relationship that connects latent variables to one or
more observed variables. The general linear model for specifying these relationships is:

$\Sigma = \Sigma(\theta) = E(XX')$, where:

1. $\Sigma$ refers to reality.
2. $\Sigma(\theta)$ refers to theory.
3. $E(XX')$ refers to the correlation matrix of observed variables.

Consider the following example of the measurement model:

For standardized variables:

$X_1 = \tau_1 + \lambda_{11}\xi_1 + \delta_1$
$X_2 = \tau_2 + \lambda_{21}\xi_1 + \delta_2$

or, in general:

$X = \Lambda_x\xi + \delta$

Most latent variables in the social sciences are abstract ones. Abstract variables require an
arbitrary scale. There are two approaches to setting a scale:

1. Set the variances of all latent variables ($\phi_{jj}$) to 1.
2. Set one of the estimates in $\Lambda_x$ (for each $\xi_i$) to 1.
3. Do not do both procedures in the same model.

The Covariance Matrix for the CFA Model

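1. The covariance matrix for $X = E(XX')$
2. Therefore, $\Sigma(\theta) = E(XX')$, wherein: $X = \Lambda_x\xi + \delta$
3. $XX' = (\Lambda_x\xi + \delta)(\Lambda_x\xi + \delta)'$
4. $XX' = (\Lambda_x\xi + \delta)(\xi'\Lambda_x' + \delta')$
5. $XX' = \Lambda_x\xi\xi'\Lambda_x' + \Lambda_x\xi\delta' + \delta\xi'\Lambda_x' + \delta\delta'$
6. $E(XX') = \Lambda_x E(\xi\xi')\Lambda_x' + \Lambda_x E(\xi\delta') + E(\delta\xi')\Lambda_x' + E(\delta\delta')$

Assume:
1. $E(\xi\delta') = E(\delta\xi') = 0$; factors are not correlated with errors (random errors in measurement).
2. $E(\xi\xi')$ is the covariance matrix of latent variables: $\Phi$
3. $E(\delta\delta')$ is the covariance matrix of errors: $\Theta_\delta$
4. Therefore: $\Sigma = \Sigma(\theta) = E(XX') = \Lambda_x\Phi\Lambda_x' + \Theta_\delta$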

Model Identification

In conducting CFA we specify a set of parameters to be estimated. We therefore must specify a
model that contains sufficient information to estimate these parameters.

Consider the following model:

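[Path diagram: $\xi_1$ is measured by $X_1$ and $X_2$; $\xi_2$ is measured by $X_3$ and $X_4$.]

Let: $\lambda_{11} = 1$ to set the scale for $\xi_1$

Let: $\lambda_{32} = 1$ to set the scale for $\xi_2$

Assume: uncorrelated error terms. This assumption is not necessary in CFA; it is made here to
simplify the presentation regarding model identification.

Then,

$$
X = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}, \quad
\Lambda_x = \begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ 0 & 1 \\ 0 & \lambda_{42} \end{bmatrix}, \quad
\xi = \begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix}, \quad
\Phi = \begin{bmatrix} \phi_{11} & \phi_{21} \\ \phi_{21} & \phi_{22} \end{bmatrix}
$$

$$
\Theta_\delta = \begin{bmatrix}
var(\delta_1) & 0 & 0 & 0 \\
0 & var(\delta_2) & 0 & 0 \\
0 & 0 & var(\delta_3) & 0 \\
0 & 0 & 0 & var(\delta_4)
\end{bmatrix}
$$

Compute $\Sigma(\theta)$:

$$
\Lambda_x\Phi =
\begin{bmatrix} 1 & 0 \\ \lambda_{21} & 0 \\ 0 & 1 \\ 0 & \lambda_{42} \end{bmatrix}
\begin{bmatrix} \phi_{11} & \phi_{21} \\ \phi_{21} & \phi_{22} \end{bmatrix}
=
\begin{bmatrix}
\phi_{11} & \phi_{21} \\
\lambda_{21}\phi_{11} & \lambda_{21}\phi_{21} \\
\phi_{21} & \phi_{22} \\
\lambda_{42}\phi_{21} & \lambda_{42}\phi_{22}
\end{bmatrix}
$$

$$
\Lambda_x\Phi\Lambda_x' =
(\Lambda_x\Phi)
\begin{bmatrix} 1 & \lambda_{21} & 0 & 0 \\ 0 & 0 & 1 & \lambda_{42} \end{bmatrix}
=
\begin{bmatrix}
\phi_{11} & & & \\
\lambda_{21}\phi_{11} & \lambda_{21}^2\phi_{11} & & \\
\phi_{21} & \lambda_{21}\phi_{21} & \phi_{22} & \\
\lambda_{42}\phi_{21} & \lambda_{21}\lambda_{42}\phi_{21} & \lambda_{42}\phi_{22} & \lambda_{42}^2\phi_{22}
\end{bmatrix}
$$

Adding $\Theta_\delta$ gives (lower triangle shown):

$$
\Sigma(\theta) =
\begin{bmatrix}
\phi_{11} + var(\delta_1) & & & \\
\lambda_{21}\phi_{11} & \lambda_{21}^2\phi_{11} + var(\delta_2) & & \\
\phi_{21} & \lambda_{21}\phi_{21} & \phi_{22} + var(\delta_3) & \\
\lambda_{42}\phi_{21} & \lambda_{21}\lambda_{42}\phi_{21} & \lambda_{42}\phi_{22} & \lambda_{42}^2\phi_{22} + var(\delta_4)
\end{bmatrix}
$$

Using $E(XX')$:

$$
E(XX') =
\begin{bmatrix}
var(X_1) & & & \\
cov(X_1 X_2) & var(X_2) & & \\
cov(X_1 X_3) & cov(X_2 X_3) & var(X_3) & \\
cov(X_1 X_4) & cov(X_2 X_4) & cov(X_3 X_4) & var(X_4)
\end{bmatrix}
$$

Then:
$\phi_{21} = cov(X_1 X_3)$
$\lambda_{21} = cov(X_2 X_3) / cov(X_1 X_3)$
$\lambda_{42} = cov(X_1 X_4) / cov(X_1 X_3)$
$\phi_{11} = [cov(X_1 X_2) \cdot cov(X_1 X_3)] / cov(X_2 X_3)$
$\phi_{22} = [cov(X_3 X_4) \cdot cov(X_1 X_3)] / cov(X_1 X_4)$
$var(\delta_1) = var(X_1) - \phi_{11}$
$var(\delta_2) = var(X_2) - \lambda_{21}^2\phi_{11}$
$var(\delta_3) = var(X_3) - \phi_{22}$
$var(\delta_4) = var(X_4) - \lambda_{42}^2\phi_{22}$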

Example

Assume the correlation matrix shown below. Calculate the parameter estimates given the
model as identified above.

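$$
R = \begin{bmatrix}
1 & & & \\
.305 & 1 & & \\
.233 & .230 & 1 & \\
.216 & .213 & .308 & 1
\end{bmatrix}
$$

$\phi_{21} = r_{x1x3} = .233$
$\lambda_{21} = r_{x2x3} / r_{x1x3} = .987$
$\lambda_{42} = r_{x1x4} / r_{x1x3} = .927$
$\phi_{11} = (r_{x1x2} \cdot r_{x1x3}) / r_{x2x3} = .309$
$\phi_{22} = (r_{x3x4} \cdot r_{x1x3}) / r_{x1x4} = .332$
$var(\delta_1) = 1 - .309 = .691$
$var(\delta_2) = 1 - [.987^2 \cdot .309] = .699$
$var(\delta_3) = 1 - .332 = .668$
$var(\delta_4) = 1 - [.927^2 \cdot .332] = .715$

Item reliabilities (squared multiple correlation) = $\lambda_{ij}^2\phi_{jj} / var(X_i)$

$X_1 = (1^2)(.309) / 1 = .309$.
$X_2 = (.987^2)(.309) / 1 = .301$.
$X_3 = (1^2)(.332) / 1 = .332$.
$X_4 = (.927^2)(.332) / 1 = .285$.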
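
The hand calculation above can be checked with a few lines of Python. This is only a sketch of the algebra for this particular two-factor, four-indicator model (with $\lambda_{11} = \lambda_{32} = 1$ and uncorrelated errors); the function name `solve_two_factor` and the dictionary layout are illustrative choices, not part of any CFA package. Because the code carries full precision rather than rounding at each step, the third decimal can differ slightly from the hand-rounded values.

```python
# Observed correlations among X1..X4 from the example, keyed by (i, j) with i < j.
r = {
    (1, 2): 0.305,
    (1, 3): 0.233, (2, 3): 0.230,
    (1, 4): 0.216, (2, 4): 0.213, (3, 4): 0.308,
}

def solve_two_factor(r, variances=(1.0, 1.0, 1.0, 1.0)):
    """Solve the identification equations with lambda11 = lambda32 = 1."""
    phi21 = r[(1, 3)]
    lam21 = r[(2, 3)] / r[(1, 3)]
    lam42 = r[(1, 4)] / r[(1, 3)]
    phi11 = r[(1, 2)] * r[(1, 3)] / r[(2, 3)]
    phi22 = r[(3, 4)] * r[(1, 3)] / r[(1, 4)]

    # X1 and X2 load on factor 1; X3 and X4 load on factor 2.
    loadings = [1.0, lam21, 1.0, lam42]
    factor_vars = [phi11, phi11, phi22, phi22]
    explained = [lam ** 2 * phi for lam, phi in zip(loadings, factor_vars)]
    error_vars = [v - e for v, e in zip(variances, explained)]
    reliabilities = [e / v for e, v in zip(explained, variances)]
    return {
        "phi21": phi21, "lam21": lam21, "lam42": lam42,
        "phi11": phi11, "phi22": phi22,
        "error_vars": error_vars, "reliabilities": reliabilities,
    }

estimates = solve_two_factor(r)
for name, value in estimates.items():
    print(name, value)
```

Note that the correlation between $X_2$ and $X_4$ is not used in solving for the parameters; it is the one piece of leftover information that gives this model its single degree of freedom.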

Summary

All parameters to be estimated in () must be expressed in terms present in E(XX').

Rules for Mathematical Identification

Some rules can help one know if a model will be identified.

T-rule

$t \leq \frac{1}{2}(q)(q + 1)$, where:

t = the number of parameters to be estimated
q = the number of observed variables.

The t-rule is necessary, but not sufficient for mathematical identification.

Example: For the model shown above:

1. $\frac{1}{2}(q)(q + 1) = \frac{1}{2}(4)(5) = 10$.
2. The nine parameters to be estimated are: $\lambda_{21}$, $\lambda_{42}$, $\phi_{11}$, $\phi_{22}$, $\phi_{21}$, $\theta_{11}$, $\theta_{22}$, $\theta_{33}$, and
$\theta_{44}$ (the four error variances).
3. Therefore, the model meets the t-rule. In this case, the model is said to be "overidentified"
because t < 10.

Three Indicator Rule

1. Three or more observed variables per latent variable.
2. Each row of $\Lambda_x$ has only one non-zero element in addition to $\lambda_{11} = 1$. That is, each X is an
indicator of just one latent variable.
3. $\Theta_\delta$ is a diagonal matrix. That is, the errors are uncorrelated.

This rule is sufficient, but not necessary for mathematical identification.


Two Indicator Rule

1. Two observed variables per latent variable.
2. Each row of $\Lambda_x$ has only one non-zero element in addition to $\lambda_{11} = 1$. That is, each X is an
indicator of just one latent variable.
3. $\Theta_\delta$ is a diagonal matrix. That is, the errors are uncorrelated.
4. More than one latent variable.
5. $\Phi$ has no zero elements. That is, the latent variables are correlated with one another.

This rule is sufficient, but not necessary for mathematical identification.

Degrees of Freedom

The degrees of freedom for a CFA model:

d.f. = [q(q+1) / 2] - t.

That is, the number of potential parameters minus the number of estimated parameters.
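
As a small illustration, the t-rule and the degrees-of-freedom formula can be written as two helper functions. The function names are hypothetical; the arithmetic follows the formulas above.

```python
def potential_parameters(q):
    """Number of distinct variances and covariances among q observed variables."""
    return q * (q + 1) // 2

def meets_t_rule(q, t):
    """Necessary (not sufficient) condition: t <= q(q + 1) / 2."""
    return t <= potential_parameters(q)

def degrees_of_freedom(q, t):
    """Potential parameters minus estimated parameters."""
    return potential_parameters(q) - t

# The two-factor example: q = 4 observed variables, t = 9 estimated parameters.
print(meets_t_rule(4, 9), degrees_of_freedom(4, 9))
```

For the two-factor example this gives one degree of freedom, which is why that model is overidentified.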

Model Evaluation

Theoretical proposition:

$\Sigma = \Sigma(\theta) = E(XX')$, where:
1. $\Sigma$ refers to reality.
2. $\Sigma(\theta)$ refers to theory.
3. $E(XX')$ refers to the correlation matrix of observed variables.
Notation:
$S = E(XX')$, the observed correlation matrix.
$\Sigma(\hat{\theta})$ = the matrix of estimated parameters.
Alternative Hypothesis: The theory fits the data.
$S = \Sigma(\hat{\theta})$
Null Hypothesis: There is no difference between the estimated parameter matrix and the
observed correlation matrix.
$S - \Sigma(\hat{\theta}) = 0$
Note: A relatively small value for a model test statistic, such as chi-square, indicates that the
theory fits the data. Such a finding would indicate support for the theory. Thus, in evaluating
model fit, we look for a low chi-square value relative to the degrees of freedom, showing a
probability greater than alpha = .05.

Note: Measures of overall fit are not applicable to exactly identified models because at least one
degree of freedom is required for the hypothesis.

Note: Although evaluation statistics might indicate an overall good fit for the model, the individual
parameter estimates might be theoretically inappropriate or statistically non-significant.

Maximum Likelihood Chi-Square

Ho: $S - \Sigma(\hat{\theta}) = 0$

$\chi^2 = (n - 1)\left(\log|\Sigma(\hat{\theta})| + tr(\Sigma(\hat{\theta})^{-1}S) - \log|S| - q\right)$, where:

n = sample size.
log refers to the natural log.

d.f. = [1/2 (q) (q+1)] - t, where:

t = the number of parameters to be estimated.
q = the number of observed variables.

Consider the conceptual foundation of chi-square. It equals a summary of the estimated score
minus the observed score in a table. In this same manner, chi-square equals the estimated
parameters plus their item reliabilities (the trace, or diagonal of the observed correlations divided
by the estimated parameters) minus the observed correlation matrix minus the number of
observed variables.
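
The maximum likelihood chi-square can be sketched with NumPy as shown below. The function name `ml_chi_square` is illustrative. The example rebuilds the implied matrix $\Lambda_x\Phi\Lambda_x' + \Theta_\delta$ from the rounded estimates of the two-factor example; the sample size n = 200 is an assumption made only for the illustration, because the example does not report one.

```python
import numpy as np

def ml_chi_square(S, Sigma, n):
    """(n - 1) * (log|Sigma| + tr(Sigma^-1 S) - log|S| - q)."""
    S = np.asarray(S, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    q = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)      # natural log of |Sigma|
    _, logdet_s = np.linalg.slogdet(S)              # natural log of |S|
    trace_term = np.trace(np.linalg.solve(Sigma, S))  # tr(Sigma^-1 S)
    return (n - 1) * (logdet_sigma + trace_term - logdet_s - q)

# Observed correlation matrix from the two-factor example.
S = np.array([
    [1.000, 0.305, 0.233, 0.216],
    [0.305, 1.000, 0.230, 0.213],
    [0.233, 0.230, 1.000, 0.308],
    [0.216, 0.213, 0.308, 1.000],
])

# Implied matrix from the rounded estimates of that example.
Lambda = np.array([[1.0, 0.0], [0.987, 0.0], [0.0, 1.0], [0.0, 0.927]])
Phi = np.array([[0.309, 0.233], [0.233, 0.332]])
Theta = np.diag(1.0 - np.diag(Lambda @ Phi @ Lambda.T))
Sigma_hat = Lambda @ Phi @ Lambda.T + Theta

n = 200  # hypothetical sample size, not given in the notes
chi_square_model = ml_chi_square(S, Sigma_hat, n)
print(chi_square_model)
```

When the implied matrix reproduces S exactly, the statistic is zero; the worse the reproduction, the larger the value.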

Coefficient of Determination

The coefficient of determination (R-square) calculates the percent of variance explained in the
observed variables (X matrix) by the latent variables ($\xi$ matrix). It equals 1 minus the
determinant of the errors in estimating X (the $\Theta_\delta$ matrix) divided by the determinant of Sigma-hat
(i.e., the input correlation matrix).

$R^2 = 1 - [\, |\hat{\Theta}_\delta| / |\hat{\Sigma}| \,]$

Goodness of Fit Indexes

Various goodness of fit indexes have been developed to assess model fit. Ones more
commonly used are the Goodness of Fit Index (GFI) and the Adjusted Goodness of Fit Index
(AGFI). The Residual Mean Square (RMS) and Critical N (CN) also are popular statistics
used to assess model fit. Critical N is equal to "what chi-square would be if the sample size
were 200." Thus, Critical N adjusts chi-square for very large samples, wherein a large sample
size can create a large chi-square statistic even when the "amount of error" is small.

These indexes have the disadvantage of not having ratio scales. Thus, the community of
scholars must arrive at some agreed upon level of the indexes that assures them of adequate
model fit. In general, a GFI or AGFI of .9 or above is considered acceptable. The community of
scholars looks for an RMS of below .05. The community of scholars looks for a CN of above
200, meaning that "a sample size of more than 200 is needed to arrive at a chi-square that
indicates a probability of alpha greater than .05." See related article by Schreiber et al. for a
detailed description of model evaluation for CFA and Structural Equation models.

Component Fit Measures

The t-test at 1 degree of freedom is used to evaluate the statistical significance of a parameter
estimate, wherein t = estimate / standard error of the estimate. A t-ratio of 1.98 or greater
indicates statistical significance at alpha = .05.
Reliability of the Parameter Estimates

Consider this model:

Reliability of Xi

The reliability (i.e., communality) of Xi is the magnitude of the direct relationship that all latent
variables have on Xi.

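Thus, the reliability of $X_i = \sum_j \lambda_{ij}^2\phi_{jj} / var(X_i)$, summed over the latent variables $\xi_j$ that have a direct effect on $X_i$.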

Reliability and Model Specification

The reliability of Xi can vary depending upon model specification:

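In the Parallel Model:

$\lambda_{11} = \lambda_{21}$
$var(\delta_1) = var(\delta_2)$

In the Tau-Equivalent Model:

$\lambda_{11} = \lambda_{21}$
$var(\delta_1) \neq var(\delta_2)$

In the Congeneric Model:

$\lambda_{11} \neq \lambda_{21}$
$var(\delta_1) \neq var(\delta_2)$

The Reliability of $\xi$

$\rho_\xi$ (the reliability of $\xi$) = $\left(\sum_{i=1}^{q}\lambda_{xi}\right)^2 / \left[\left(\sum_{i=1}^{q}\lambda_{xi}\right)^2 + \sum_{i=1}^{q} var(\delta_i)\right]$

Average Variance Explained by $\xi$

$\rho_{vc(\xi)}$ (the average variance explained by $\xi$) = $\sum_{i=1}^{q}\lambda_{xi}^2 / \left[\sum_{i=1}^{q}\lambda_{xi}^2 + \sum_{i=1}^{q} var(\delta_i)\right]$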
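
The two formulas above translate directly into code. The sketch below uses three hypothetical standardized loadings (.7, .8, .6), with each error variance taken as one minus the squared loading; these numbers are for illustration only and do not come from the locus of control data.

```python
def composite_reliability(loadings, error_variances):
    """(sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]."""
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(error_variances))

def average_variance_explained(loadings, error_variances):
    """sum of squared loadings / [sum of squared loadings + sum of error variances]."""
    sum_of_squares = sum(lam ** 2 for lam in loadings)
    return sum_of_squares / (sum_of_squares + sum(error_variances))

# Hypothetical standardized loadings for three indicators of one factor.
loadings = [0.7, 0.8, 0.6]
error_variances = [1 - lam ** 2 for lam in loadings]

reliability = composite_reliability(loadings, error_variances)
ave = average_variance_explained(loadings, error_variances)
print(round(reliability, 3), round(ave, 3))
```

The reliability of the factor is always at least as large as the average variance explained, because the squared sum of the loadings includes the cross-products that the sum of squares omits.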

Standardized Parameter Estimates

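Reporting standardized parameter estimates enables the community of scholars to compare
different studies of the same model. The formulas for calculating standardized estimates are:

$\lambda_{ij}^s = \lambda_{ij}[\phi_{jj} / var(X_i)]^{1/2}$

$\phi_{ij}^s = \phi_{ij} / (\phi_{ii}^{1/2}\phi_{jj}^{1/2})$

$\theta_{ij}^s = \theta_{ij} / [var(X_i)\,var(X_j)]^{1/2}$, which reduces to $\theta_{ii} / var(X_i)$ for i = j

where, for $\lambda_{ij}$, i refers to an observed variable and j refers to a latent variable.

In matrix format:

$\Lambda_x^s = D_x^{-1}\Lambda_x D_\phi$
$\Phi^s = D_\phi^{-1}\Phi D_\phi^{-1}$
$\Theta_\delta^s = D_x^{-1}\Theta_\delta D_x^{-1}$

where:

$D_x = (diag[\Lambda_x\Phi\Lambda_x' + \Theta_\delta])^{1/2}$

$D_\phi = (diag\,\Phi)^{1/2}$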
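
As a rough check on the scalar formulas, the sketch below standardizes two of the estimates from the earlier two-factor example ($\lambda_{21} = .987$, $\phi_{11} = .309$, $\phi_{22} = .332$, $\phi_{21} = .233$, with var(X) = 1 because a correlation matrix was analyzed). The function names are illustrative.

```python
import math

def standardized_loading(lam_ij, phi_jj, var_xi):
    """lambda_ij * sqrt(phi_jj / var(X_i))."""
    return lam_ij * math.sqrt(phi_jj / var_xi)

def standardized_factor_covariance(phi_ij, phi_ii, phi_jj):
    """phi_ij / (sqrt(phi_ii) * sqrt(phi_jj)), i.e., the factor correlation."""
    return phi_ij / (math.sqrt(phi_ii) * math.sqrt(phi_jj))

# Estimates from the two-factor example (correlation matrix input, so var(X2) = 1).
lam21_std = standardized_loading(0.987, 0.309, 1.0)
phi21_std = standardized_factor_covariance(0.233, 0.309, 0.332)
print(lam21_std, phi21_std)
```

The squared standardized loading equals the item reliability, so the value for $X_2$ squares to roughly the .301 reported in that example.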

Unique Validity Variance

In cases where a measurement model specifies correlated factors or error terms one might want
to know the unique commonality for an observed variable.

$U_{x_i\xi_j}$ (the unique validity variance, or commonality, of the effect of $\xi_j$ on $X_i$) = $R_{x_i}^2 - R_{x_i(\xi_j)}^2$, where:

$R_{x_i}^2$ is the squared multiple correlation coefficient for $X_i$. This is the proportion of variance in $X_i$
explained by all latent variables in the model that have a direct effect on $X_i$.

$R_{x_i}^2 = \rho_{x_i}\Phi^{*-1}\rho_{x_i}'$ where:

1. $\rho_{x_i}$ is the vector of correlations of $\xi$ with $X_i$, for all $\xi$ that affect $X_i$ (a 1 x d vector, where d is the
number of $\xi$ with direct effects on $X_i$).
2. $\Phi^*$ = correlation matrix of all $\xi$ with direct effects on $X_i$.

$R_{x_i(\xi_j)}^2$ is the squared multiple correlation coefficient for $X_i$, controlling for the effects of the latent
variable on other observed variables.

$R_{x_i(\xi_j)}^2 = [\rho_{x_i(\xi_j)}\Phi_{(\xi_j)}^{*-1}\rho_{x_i(\xi_j)}'] / var(X_i)$, where:

1. $\rho_{x_i(\xi_j)}$ is the vector of correlations of $\xi$ with $X_i$, for all $\xi$ that affect $X_i$, except for $\xi_j$, the latent variable of
interest (a 1 x d vector, where d is the number of $\xi$ with direct effects on $X_i$).
2. $\Phi_{(\xi_j)}^*$ = correlation matrix of all $\xi$ with direct effects on $X_i$, except for $\xi_j$, the latent variable of
interest.

Note: The unique validity variance might be relatively low in comparison with $R_{x_i}^2$ because $X_i$
might depend upon highly correlated latent variables.

Degree of Collinearity

A measurement model with more than one latent variable, wherein the latent variables are
correlated with one another, should be evaluated for its degree of collinearity.

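$R_{\xi_j}^2 = [\phi_{j(j)}\Phi_{(j)}^{-1}\phi_{j(j)}'] / \phi_{jj}$

or, the R-square for the $\xi$ affecting $X_i$ other than the $\xi$ of interest. Here $\phi_{j(j)}$ is the row vector of covariances of $\xi_j$ with the other $\xi$ affecting $X_i$, and $\Phi_{(j)}$ is the covariance matrix of those other $\xi$.

Note: For just two $\xi$ affecting $X_i$:

$R_{\xi_j}^2 = \phi_{12}^2 / (\phi_{11}\phi_{22})$ (i.e., the squared correlation of $\xi_1$ and $\xi_2$).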


Factor Score Estimation

Having found some underlying dimension(s) in the data, the researcher might want to
construct a factor scale. A factor scale is a latent variable derived from two or more
observed variables that have been demonstrated to have content and construct validity,
and which are sufficiently reliable to be used for further analysis.

Factor scales can be used in two ways: 1) to examine observations in terms of their
scores on the latent variables, 2) to use the latent variables in subsequent analysis as
independent and/or dependent variables.

Measurements on factor scales can be constructed in several ways. First, they can be
calculated by simply adding or obtaining the mean of the two or more observed variables
comprising the scale. If the observed variables differ in their item reliabilities, however, the
researcher might want to construct the factor scale based upon weighted observed
variables. Observed variables typically are weighted by their parameter estimates on the
factor. Listed below are three procedures that use different assumptions to create more
refined factor scores.

Least Squares Procedure

Factor Score = $(\Lambda_x' S^{-1})x$, where S = the observed correlation matrix.

Bollen's Procedure

Bollen suggests accounting for the correlations among the latent variables:
Factor Score = $(\Phi\Lambda_x' S^{-1})x$, where S = the observed correlation matrix.

Bartlett's Procedure

Bartlett suggests giving more weight to observed variables with greater item reliability:
Factor Score = [(x'-2)(S-2S)-1]x, where S = the observed correlation matrix.
Hypothesis Testing and Model Comparison

One advantage to theory testing and the subsequent use of CFA is that nested models can be
used to test hypotheses. One can conduct a difference in chi-square test, for example, to
evaluate the extent to which changes in model specification affect model fit.

Research and Null Hypotheses for All Models

Ha: The model fits the data.

If the model fits the data, then chi-square will be low and the prob. of a type-I error will be
over .05 (assuming an assigned type-1 error rate of 5%).

Ho: There is no relationship between the model and the data.

If there is no relationship between the model and the data, then chi-square will be high and
the prob. of a type-I error will be less than .05 (assuming an assigned type-1 error rate of
5%).

The approach to testing differences in estimates across two samples, or testing for the
moderating effect of an external variable, is to estimate a baseline model that assumes no
difference in estimates across the two samples. Then, estimate less restricted models, ones
that allow for differences in parameter estimates across levels of the external variable. The chi-
square calculation for each less restricted model will be less than the chi-square value for the
baseline model. And the degrees of freedom for the less restricted model will be less than that
of the baseline model. To determine if a less restricted model fits the data better than the
baseline model, one can calculate a chi-square difference test:

$\chi^2_r - \chi^2_u$ = chi-square (baseline) - chi-square (less restricted).

This difference score is evaluated at the difference in the degrees of freedom for the two
models:

df (baseline) - df (less restricted).

For example, suppose the chi-square for a baseline model that contains three parameters in the
gamma matrix equals 142.691 at 123 d.f. Suppose that a less restricted model is estimated that
allows for the three parameters in the gamma matrix to be estimated separately for the two
groups under consideration. And suppose that the chi-square for this less restricted model
equals 110.527 at 120 d.f. Then the difference in chi-square equals 32.164 at 3 d.f. The critical
value of chi-square at three degrees of freedom for a type-I error rate of 5% equals 7.815.
Therefore, we would conclude that, at a type-I error rate of 5%, the less restricted model fits the
data better than does the baseline model, meaning that the parameter estimates differ
significantly from one another across the two levels of the external variable. The next step
would be to conduct a chi-square difference test for each of the paths in the gamma matrix to
determine which of the three paths has significantly different parameter estimates across the
two levels of the external variable.
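
The chi-square difference test in this example can be reproduced with the sketch below. To stay within the Python standard library, the upper-tail probability is computed from a series expansion of the regularized incomplete gamma function; in practice one would more likely call a statistics library. The function names are illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

def chi_square_difference(chi2_baseline, df_baseline, chi2_less, df_less):
    """Return (difference in chi-square, difference in d.f., p-value)."""
    diff = chi2_baseline - chi2_less
    ddf = df_baseline - df_less
    return diff, ddf, chi2_sf(diff, ddf)

# Values from the example: baseline 142.691 at 123 d.f., less restricted 110.527 at 120 d.f.
diff, ddf, p_value = chi_square_difference(142.691, 123, 110.527, 120)
print(diff, ddf, p_value)
```

The p-value falls far below .05, which agrees with the comparison of 32.164 against the critical value of 7.815.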

Typically, one would allow a matrix of estimates, such as the lambda, gamma, beta, and error
matrices (psi, theta-delta, and theta-epsilon), to become less restricted to examine the
possibility of differences in parameters across the levels of the external variable. If the chi-
square difference test indicates that the baseline model and less restricted model contain at
least some significantly different parameter estimates, then one would test each path within a
matrix at a time to locate the ones that differ significantly from one another (they might all be
significantly different from one another).

If one finds a less restricted model that fits the data significantly better than the baseline model,
then this model becomes the new "baseline" model for testing of further differences in
parameter estimates across levels of the external variable.

The Sociology 512 web site includes notes on hypothesis testing using the SAS and LISREL
software packages.

Second-Order (Higher-Order) Factor Analysis

Some latent variables are themselves considered to be composed of multiple latent variables.
The latent variable Locus of Control (LOC), for example, is thought to comprise three sub-
dimensions: internal, chance, and powerful others. The diagram below illustrates a second-
order model of LOC, with the variable "perceived risk" used to assess the predictive validity of
the measure of LOC.

$\eta = \Gamma\xi + \zeta$, where:

$\eta$ (eta): dependent (i.e., endogenous) latent variable.
$\Gamma$ (gamma): parameter estimates for the independent (i.e., exogenous) latent variables.
$\xi$ (xi): independent (i.e., exogenous) latent variables.
$\zeta$ (zeta): errors for the equation.

Sensitivity Analysis: Testing Equality of Parameter Estimates Across Two Groups

A central premise of CFA is that the theory fits the data. Thus, if an observed variable is posited
to measure just one latent variable then it should not also have a significant parameter estimate
on another latent variable. If an observed variable X1 is posited to measure $\xi_1$, for example,
then X1 should not have a significant parameter estimate on $\xi_2$. If it does, then we can question
the construct validity of X1 as an indicator of $\xi_1$ as well as the theory that specifies that X1
measures only $\xi_1$.

Sensitivity analysis examines the extent to which a theory has construct validity: the extent to
which hypotheses of no relationship are supported by the data.

Consider, for example, the Locus of Control CFA model as specified by Sapp and Harrod (see:
http://www.soc.iastate.edu/sapp/Soc512MeasurementRefs.html). Sapp and Harrod posit that 1)
the latent variable Internal is measured with three observed variables: Own Actions, Protect,
and Determine, 2) the latent variable Chance is measured with three observed variables:
Accidental Happenings, Bad Luck Happenings, and Lucky, and 3) the latent variable Powerful
Others is measured with three observed variables: Pressure Groups, Powerful Others, and
Powerful People (see: http://www.soc.iastate.edu/sapp/soc512LOCCFAModel.pdf). Implied by
this model is that Own Actions, for example, which is posited to measure the latent variable
Internal, is not significantly related to either of the remaining latent variables: Chance or
Powerful Others.

Sensitivity analysis examines whether the implied hypotheses of no relationship are supported
by the data. Shown below are examples of sensitivity analysis for the Sapp and Harrod LOC
model conducted in LISREL.

The Sociology 512 web site includes notes on hypothesis testing using the SAS and LISREL
software packages.

Means and Intercepts for Latent Variables

In CFA with multiple samples, it is possible to estimate means and intercepts for the latent
variables.

$X^{(g)} = \tau_x + \Lambda_x\xi^{(g)} + \delta^{(g)}$

where:

$\tau_x$ is the constant intercept term for each $X_i$. This value is set to be equal across samples (g).

Common Metric Standardized Solution Factor Loadings

Loadings are listed in the Lambda X matrices. Intercepts are listed in the Tau X matrices.
These matrices are the same for all groups.

            Factor Loadings                              Tau-x
Item        Internal   Chance   P. Others   Var(x)   Intercept

Actions     .579                            1.144    5.880
Protect     .673                            0.860    5.670
Detrmine    .520                            2.453    4.556
Acchap                 .538                 1.449    5.181
Badhap                 .543                 1.838    4.818
Lucky                  .692                 2.743    4.813
Pressure                        .512        1.935    4.966
Powoth                          .822        2.231    5.260
Powple                          .721        1.469    5.002

Factor Covariance Matrices

Phi Matrices in Common Metric Standardized Solution for Each Group

Risk                        Latent Variables
Perceivers    Factor        Internal   Chance   Powerful Others

Low (n=67)    Internal      1.121
              Chance        .612       .679
              P. Others     .586       .251     1.024

High (n=62)   Internal      .869
              Chance        .814       .348
              P. Others     .627       .696     .974

Factor Means (Kappa Matrix)

         Factor Means                    Scaled Factor Means
Group    Internal   Chance   Power       Internal   Chance   Power

Low      .000       .000     .000        .113       .129     .014
High     -.218      -.248    -.026       -.113      -.129    -.014

Scaled Factor Mean

$\left[ (X_{ij} - \bar{X}_{.j}) \left( \sum_{i=1}^{g} n_i / g \right) \right] / n_i$

Where: i = 1, 2, 3 ... g groups

j = 1, 2, 3 ... k factors

n = number of observations in the group

Example for $X_{21}$

$\bar{X}_{.1}$ = (0 + -.218) / 2 = -.109

$\sum_{i=1}^{g} n_i / g$ = (62 + 67) / 2 = 64.5

$\left[ (X_{21} - \bar{X}_{.1}) \left( \sum_{i=1}^{g} n_i / g \right) \right] / n_2$ = [ (-.218 + .109) 64.5 ] / 62 = -.113
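
The scaled factor mean calculation for the High group on the Internal factor can be sketched as follows. The function name and argument layout are illustrative; the inputs are the factor means (.000 and -.218) and group sizes (67 and 62) reported above.

```python
def scaled_factor_mean(factor_means, group_sizes, group):
    """[(X_ij - mean over groups) * (average group size)] / n_i for one factor."""
    g = len(factor_means)
    mean_over_groups = sum(factor_means) / g
    average_n = sum(group_sizes) / g
    deviation = factor_means[group] - mean_over_groups
    return deviation * average_n / group_sizes[group]

# Internal factor means for the Low and High risk perceiver groups.
internal_means = [0.000, -0.218]
group_sizes = [67, 62]

high_internal = scaled_factor_mean(internal_means, group_sizes, group=1)
print(round(high_internal, 3))  # -0.113
```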
Analysis of Ordinal Variables

Albright and Park (2009) note that:

The maximum likelihood estimation (MLE) approach relies on the strong assumption of
multivariate normality. In practice, a substantial amount of social science data is non-normal.
Survey responses are often coded as yes/no or as scores on an ordered scale (e.g. strongly
disagree, disagree, neutral, agree, strongly agree). In the presence of categorical or ordinal
data, MLE may not work properly, calling for alternative estimation methods.

Mplus and LISREL employ a multi-step method for ordinal outcome variables that analyzes a
matrix of polychoric correlations rather than covariances. This approach works as follows:

1) thresholds are estimated by maximum likelihood,


2) these estimates are used to estimate a polychoric correlation matrix, which in turn is used to
3) estimate parameters through (diagonally) weighted least squares using the inverse of the
asymptotic covariance matrix as the weight matrix (Muthén, 1984; Jöreskog, 1990).

In LISREL, the diagonally weighted least squares (DWLS) method needs to be specified.
Alternatively, the polychoric correlation matrix and asymptotic covariance matrix is estimated
and saved into a LISREL system file (.dsf) using PRELIS before fitting the model.

Mplus automatically follows above steps when the syntax includes a line identifying observed
variables as categorical.

Instructions

[For those times when you will be using data collected by persons other than those who
graduated from ISU, given that ISU graduates never would be so silly as to collect ordinal-level
data! ]

In cases of non-normality (i.e., assumed for ordinal-level data), it is a misuse of CFA
methodology to:

1. Use arbitrary scale scores for categories, pretending that these scores have interval
scale properties.
2. Compute a covariance matrix or product-moment correlation matrix for such scores.
3. Analyze cov/correlation matrices using the method of maximum-likelihood.

Such misuse can lead to:

1. distorted parameter estimates.
2. incorrect measures of chi-square.
3. incorrect estimates of standard error, and therefore incorrect t-ratios.

When conducting CFA with ordinal-level data, use weighted least squares with an asymptotic
covariance matrix. N must be at least 200 if k < 12 and at least 1.5 k(k+1) if k $\geq$ 12.
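
The sample size guideline above amounts to a one-line rule. The sketch below assumes k is the number of observed variables, as in the definition at the start of these notes; the function name is hypothetical.

```python
import math

def minimum_n_for_wls(k):
    """Minimum N for weighted least squares with k observed variables."""
    if k < 12:
        return 200
    return math.ceil(1.5 * k * (k + 1))

print(minimum_n_for_wls(9), minimum_n_for_wls(12), minimum_n_for_wls(20))
```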
Power Analysis

From MEERA: [http://meera.snre.umich.edu/plan-an-evaluation/related-topics/power-analysis-statistical-significance-effect-size]

What is power?

To understand power, it is helpful to review what inferential statistics test. When you conduct
an inferential statistical test, you are often comparing two hypotheses:

1. The null hypothesis: This hypothesis predicts that your program will not have an
effect on your variable of interest. For example, if you are measuring students' level of
concern for the environment before and after a field trip, the null hypothesis is that their
level of concern will remain the same.
2. The alternative hypothesis: This hypothesis predicts that you will find a difference
between groups. Using the example above, the alternative hypothesis is that students'
post-trip level of concern for the environment will differ from their pre-trip level of
concern.

Statistical tests look for evidence that you can reject the null hypothesis and conclude that
your program had an effect. With any statistical test, however, there is always the possibility
that you will find a difference between groups when one does not actually exist. This is
called a Type I error. Likewise, it is possible that when a difference does exist, the test will
not be able to identify it. This type of mistake is called a Type II error.

Power refers to the probability that your test will find a statistically significant difference when
such a difference actually exists. In other words, power is the probability that you will reject
the null hypothesis when you should (and thus avoid a Type II error). It is generally accepted
that power should be .8 or greater; that is, you should have an 80% or greater chance of
finding a statistically significant difference when there is one.

Power Estimation: Bollen Procedure

(See pages 338-349).

1. Estimate the more specified model and ACOV(a), the covariance matrix of the parameter
estimates for this model.
2. Calculate the added parameter estimates for the more specified model (Ha) under the
assumption that all standardized estimates equal .1.
3. NCP = [(column x row) matrix of the added parameter estimates] * [diagonal matrix of the
variances of the added parameters (inverse)] * [(row * column) matrix of the added
parameter estimates].
4. Calculate the power of the test.
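
Steps 3 and 4 can be sketched in code. This is a rough illustration under stated assumptions, not Bollen's own implementation: it assumes the variance matrix of the added parameters is diagonal (as in step 3), that the test statistic follows a noncentral chi-square distribution with degrees of freedom equal to the number of added parameters, and that the estimates and variances below are invented for illustration. All function names are hypothetical, and the noncentral distribution is approximated with a truncated Poisson mixture that suits small to moderate NCP values.

```python
import math


def wald_ncp(estimates, variances):
    """Step 3 with a diagonal variance matrix: sum of estimate**2 / variance."""
    return sum(t * t / v for t, v in zip(estimates, variances))


def chi2_sf(x, df):
    """P(X > x) for a central chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, sf = 1.0, math.exp(-half)               # closed form for df = 2
    else:
        a, sf = 0.5, math.erfc(math.sqrt(half))    # closed form for df = 1
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while 2 * a < df:
        sf += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1.0
    return sf


def chi2_critical(df, alpha=0.05):
    """Critical value found by bisection on the survival function."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ncx2_sf(x, df, ncp, terms=200):
    """P(X > x) for a noncentral chi-square, as a truncated Poisson mixture."""
    if ncp == 0:
        return chi2_sf(x, df)
    total = 0.0
    for j in range(terms):
        log_weight = -ncp / 2 + j * math.log(ncp / 2) - math.lgamma(j + 1)
        total += math.exp(log_weight) * chi2_sf(x, df + 2 * j)
    return total


def wald_power(estimates, variances, alpha=0.05):
    """Step 4: probability of exceeding the critical value under Ha."""
    df = len(estimates)
    return ncx2_sf(chi2_critical(df, alpha), df, wald_ncp(estimates, variances))


# Hypothetical inputs: two added parameters set to .1, with invented variances.
added_estimates = [0.1, 0.1]
added_variances = [0.002, 0.004]
ncp = wald_ncp(added_estimates, added_variances)
power = wald_power(added_estimates, added_variances)
print(f"NCP = {ncp:.2f}, power = {power:.3f}")
```

In practice a statistics library's noncentral chi-square routine would replace the hand-written functions; they are spelled out here only to keep the sketch self-contained.
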
The Multitrait-Multimethod Matrix

The Multitrait-Multimethod Matrix (hereafter labeled MTMM) is an approach to
assessing the construct validity of a set of measures in a study. It was developed in
1959 by Campbell and Fiske (Campbell, D. and Fiske, D. (1959). Convergent and
discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 2, 81-105.) in part
as an attempt to provide a practical methodology that researchers could actually use
(as opposed to the nomological network idea which was theoretically useful but did
not include a methodology). Along with the MTMM, Campbell and Fiske introduced
two new types of validity -- convergent and discriminant -- as subcategories of
construct validity. Convergent validity is the degree to which concepts that should
be related theoretically are interrelated in reality. Discriminant validity is the degree
to which concepts that should not be related theoretically are, in fact, not interrelated
in reality. You can assess both convergent and discriminant validity using the
MTMM. In order to be able to claim that your measures have construct validity, you
have to demonstrate both convergence and discrimination.

The MTMM is simply a matrix or table of correlations arranged to facilitate the
interpretation of the assessment of construct validity. The MTMM assumes that you
measure each of several concepts (called traits by Campbell and Fiske) by each of
several methods (e.g., a paper-and-pencil test, a direct observation, a performance
measure). The MTMM is a very restrictive methodology -- ideally you should
measure each concept by each method.

To construct an MTMM, you need to arrange the correlation matrix by concepts
within methods. The figure shows an MTMM for three concepts (traits A, B and C)
each of which is measured with three different methods (1, 2 and 3). Note that you
lay the matrix out in blocks by method. Essentially, the MTMM is just a correlation
matrix between your measures, with one exception -- instead of 1's along the
diagonal (as in the typical correlation matrix) we substitute an estimate of the
reliability of each measure as the diagonal.

Before you can interpret an MTMM, you have to understand how to identify the
different parts of the matrix. First, you should note that the matrix consists of
nothing but correlations. It is a square, symmetric matrix, so we only need to look at
half of it (the figure shows the lower triangle). Second, these correlations can be
grouped into three kinds of shapes: diagonals, triangles, and blocks. The specific
shapes are:

The Reliability Diagonal (monotrait-monomethod)

Estimates of the reliability of each measure in the matrix. You can
estimate reliabilities a number of different ways (e.g., test-retest, internal
consistency). There are as many correlations in the reliability diagonal as
there are measures -- in this example there are nine measures and nine
reliabilities. The first reliability in the example is the correlation of Trait
A, Method 1 with Trait A, Method 1 (hereafter, I'll abbreviate this
relationship A1-A1). Notice that this is essentially the correlation of the
measure with itself. In fact such a correlation would always be perfect
(i.e., r=1.0). Instead, we substitute an estimate of reliability. You could
also consider these values to be monotrait-monomethod correlations.

The Validity Diagonals (monotrait-heteromethod)

Correlations between measures of the same trait measured using
different methods. Since the MTMM is organized into method blocks,
there is one validity diagonal in each method block. For example, look
at the A1-A2 correlation of .57. This is the correlation between two
measures of the same trait (A) measured with two different methods (1
and 2). Because the two measures are of the same trait or concept, we
would expect them to be strongly correlated. You could also consider
these values to be monotrait-heteromethod correlations.

The Heterotrait-Monomethod Triangles

These are the correlations among measures that share the same method
of measurement. For instance, A1-B1 = .51 in the upper left heterotrait-
monomethod triangle. Note that what these correlations share is
method, not trait or concept. If these correlations are high, it is because
measuring different things with the same method results in correlated
measures. Or, in more straightforward terms, you've got a strong
"methods" factor.

Heterotrait-Heteromethod Triangles

These are correlations that differ in both trait and method. For instance,
A1-B2 is .22 in the example. Generally, because these correlations share
neither trait nor method we expect them to be the lowest in the matrix.

The Monomethod Blocks

These consist of all of the correlations that share the same method of
measurement. There are as many blocks as there are methods of
measurement.

The Heteromethod Blocks

These consist of all correlations that do not share the same methods.
There are (K(K-1))/2 such blocks, where K = the number of methods.
In the example, there are 3 methods and so there are (3(3-1))/2 =
(3(2))/2 = 6/2 = 3 such blocks.
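
The shapes described above follow mechanically from whether two measures share a trait, a method, both, or neither. As a minimal sketch, assuming each measure is represented as a (trait, method) pair and using the three traits and three methods of the example (the function and variable names are illustrative):

```python
from itertools import combinations


def classify_cell(measure_a, measure_b):
    """Name the MTMM region for a pair of (trait, method) measures."""
    trait_a, method_a = measure_a
    trait_b, method_b = measure_b
    same_trait = trait_a == trait_b
    same_method = method_a == method_b
    if same_trait and same_method:
        return "reliability diagonal"
    if same_trait:
        return "validity diagonal"
    if same_method:
        return "heterotrait-monomethod triangle"
    return "heterotrait-heteromethod triangle"


traits = ["A", "B", "C"]
methods = [1, 2, 3]
# Measures ordered by trait within method, the way the matrix is laid out.
measures = [(t, m) for m in methods for t in traits]

# Count the cells of the lower triangle (diagonal included) by region.
counts = {}
for i, row in enumerate(measures):
    for col in measures[: i + 1]:
        region = classify_cell(row, col)
        counts[region] = counts.get(region, 0) + 1

# Number of heteromethod blocks, K(K-1)/2.
heteromethod_blocks = len(list(combinations(methods, 2)))

print(classify_cell(("A", 1), ("A", 2)))
print(counts, heteromethod_blocks)
```

The monomethod blocks are the cells where the two methods match, and the heteromethod blocks are the rest, so the same two comparisons also recover the block structure.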

Now that you can identify the different parts of the MTMM, you can begin to
understand the rules for interpreting it. You should realize that MTMM
interpretation requires the researcher to use judgment. Even though some of the
principles may be violated in an MTMM, you may still wind up concluding that you
have fairly strong construct validity. In other words, you won't necessarily get perfect
adherence to these principles in applied research settings, even when you do have
evidence to support construct validity. To me, interpreting an MTMM is a lot like a
physician's reading of an x-ray. A practiced eye can often spot things that the
neophyte misses! A researcher who is experienced with MTMM can use it to identify
weaknesses in measurement as well as to assess construct validity.

To help make the principles more concrete, let's make the example a bit more
realistic. We'll imagine that we are going to conduct a study of sixth grade students
and that we want to measure three traits or concepts: Self Esteem (SE), Self
Disclosure (SD) and Locus of Control (LC). Furthermore, let's measure each of
these three different ways: a Paper-and-Pencil (P&P) measure, a Teacher rating, and a
Parent rating. The results are arrayed in the MTMM. As the principles are presented,
try to identify the appropriate coefficients in the MTMM and make a judgement
yourself about the strength of construct validity claims.

The basic principles or rules for the MTMM are:

Coefficients in the reliability diagonal should consistently be the highest in the
matrix.

That is, a trait should be more highly correlated with itself than with
anything else! This is uniformly true in our example.

Coefficients in the validity diagonals should be significantly different from
zero and high enough to warrant further investigation.

This is essentially evidence of convergent validity. All of the correlations
in our example meet this criterion.

A validity coefficient should be higher than values lying in its column and row
in the same heteromethod block.

In other words, (SE P&P)-(SE Teacher) should be greater than (SE
P&P)-(SD Teacher), (SE P&P)-(LC Teacher), (SE Teacher)-(SD P&P)
and (SE Teacher)-(LC P&P). This is true in all cases in our example.

A validity coefficient should be higher than all coefficients in the heterotrait-
monomethod triangles.

This essentially emphasizes that trait factors should be stronger than
methods factors. Note that this is not true in all cases in our example.
For instance, the (LC P&P)-(LC Teacher) correlation of .46 is less than
(SE Teacher)-(SD Teacher), (SE Teacher)-(LC Teacher), and (SD
Teacher)-(LC Teacher) -- evidence that there might be a methods
factor, especially on the Teacher observation method.

The same pattern of trait interrelationship should be seen in all triangles.

The example clearly meets this criterion. Notice that in all triangles the
SE-SD relationship is approximately twice as large as the relationships
that involve LC.
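
The third and fourth rules are simple comparisons, so they can be sketched in code. The correlations below are invented for illustration and are not the values from the figure; the sketch also uses only two of the three methods to stay short. The function names and the dictionary layout are assumptions.

```python
from itertools import combinations

TRAITS = ["SE", "SD", "LC"]
METHODS = ["P&P", "Teacher"]

# Hypothetical correlations, keyed by a pair of (trait, method) measures.
R = {
    # heterotrait-monomethod triangles
    (("SE", "P&P"), ("SD", "P&P")): 0.50,
    (("SE", "P&P"), ("LC", "P&P")): 0.30,
    (("SD", "P&P"), ("LC", "P&P")): 0.25,
    (("SE", "Teacher"), ("SD", "Teacher")): 0.60,
    (("SE", "Teacher"), ("LC", "Teacher")): 0.35,
    (("SD", "Teacher"), ("LC", "Teacher")): 0.30,
    # validity diagonal
    (("SE", "P&P"), ("SE", "Teacher")): 0.65,
    (("SD", "P&P"), ("SD", "Teacher")): 0.55,
    (("LC", "P&P"), ("LC", "Teacher")): 0.45,
    # heterotrait-heteromethod triangles
    (("SE", "P&P"), ("SD", "Teacher")): 0.30,
    (("SE", "P&P"), ("LC", "Teacher")): 0.15,
    (("SD", "P&P"), ("SE", "Teacher")): 0.28,
    (("SD", "P&P"), ("LC", "Teacher")): 0.12,
    (("LC", "P&P"), ("SE", "Teacher")): 0.14,
    (("LC", "P&P"), ("SD", "Teacher")): 0.13,
}


def r(a, b):
    """Look up a correlation regardless of the order of the two measures."""
    return R[(a, b)] if (a, b) in R else R[(b, a)]


def rule3_ok(trait, m1, m2):
    """Validity coefficient exceeds the other values in its row and column
    of the same heteromethod block."""
    validity = r((trait, m1), (trait, m2))
    others = [t for t in TRAITS if t != trait]
    rivals = [r((trait, m1), (o, m2)) for o in others]
    rivals += [r((trait, m2), (o, m1)) for o in others]
    return all(validity > x for x in rivals)


def rule4_ok(trait, m1, m2):
    """Validity coefficient exceeds every heterotrait-monomethod coefficient."""
    validity = r((trait, m1), (trait, m2))
    mono = [r((a, m), (b, m)) for m in METHODS for a, b in combinations(TRAITS, 2)]
    return all(validity > x for x in mono)


results = {t: (rule3_ok(t, "P&P", "Teacher"), rule4_ok(t, "P&P", "Teacher"))
           for t in TRAITS}
print(results)
```

With these invented numbers every trait passes the third rule, while SD and LC fail the fourth because the Teacher triangle contains a correlation of .60. That is the same kind of pattern the text reads as a possible methods factor. A check like this only flags violations; deciding how much they matter still takes judgment.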

The MTMM idea provided an operational methodology for assessing construct
validity. In the one matrix it was possible to examine both convergent and
discriminant validity simultaneously. By its inclusion of methods on an equal footing
with traits, Campbell and Fiske stressed the importance of looking for the effects of
how we measure in addition to what we measure. And, MTMM provided a rigorous
framework for assessing construct validity.

Despite these advantages, MTMM has received little use since its introduction in
1959. There are several reasons. First, in its purest form, MTMM requires that you
have a fully-crossed measurement design -- each of several traits is measured by each
of several methods. While Campbell and Fiske explicitly recognized that one could
have an incomplete design, they stressed the importance of multiple replication of
the same trait across method. In some applied research contexts, it just isn't possible
to measure all traits with all desired methods (would you use an "observation" of
weight?). In most applied social research, it just wasn't feasible to make methods an
explicit part of the research design. Second, the judgmental nature of the MTMM
may have worked against its wider adoption (although it should actually be perceived
as a strength). Many researchers wanted a test for construct validity that would result
in a single statistical coefficient that could be tested -- the equivalent of a reliability
coefficient. It was impossible with MTMM to quantify the degree of construct validity
in a study. Finally, the judgmental nature of MTMM meant that different researchers
could legitimately arrive at different conclusions.

As mentioned above, one of the most difficult aspects of MTMM from an implementation
point of view is that it required a design that included all combinations of both traits and
methods. But the ideas of convergent and discriminant validity do not
require the methods factor. To see this, we have to reconsider what Campbell and
Fiske meant by convergent and discriminant validity.

Convergent validity is the principle that measures of theoretically similar constructs should be highly
intercorrelated. We can extend this idea further by thinking of a measure that has
multiple items, for instance, a four-item scale designed to measure self-esteem. If
each of the items actually does reflect the construct of self-esteem, then we would
expect the items to be highly intercorrelated as shown in the figure. These strong
intercorrelations are evidence in support of convergent validity.

Discriminant validity is the principle that measures of theoretically different constructs should not correlate highly
with each other. We can see this in the example that shows two constructs --
self-esteem and locus of control -- each measured in two instruments. We would
expect that, because these are measures of different constructs, the cross-construct
correlations would be low, as shown in the figure. These low correlations are
evidence for discriminant validity. Finally, we can put this all together to see how we can address
both convergent and discriminant validity simultaneously. Here, we have two
constructs -- self-esteem and locus of control -- each measured with three
instruments. The red and green correlations are within-construct ones. They are a
reflection of convergent validity and should be strong. The blue correlations are
cross-construct and reflect discriminant validity. They should be uniformly lower
than the convergent coefficients.
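
The comparison described here, where cross-construct correlations should be uniformly lower than within-construct ones, can be sketched as follows. The correlations are invented for illustration and are not the values in the figure; the measure names (SE1, LC1, and so on) are hypothetical labels for three instruments per construct.

```python
from itertools import combinations

# Which construct each hypothetical measure is meant to tap.
construct_of = {
    "SE1": "SE", "SE2": "SE", "SE3": "SE",
    "LC1": "LC", "LC2": "LC", "LC3": "LC",
}

# Hypothetical correlations for every pair of measures.
corr = {
    ("SE1", "SE2"): 0.83, ("SE1", "SE3"): 0.89, ("SE2", "SE3"): 0.85,
    ("LC1", "LC2"): 0.84, ("LC1", "LC3"): 0.93, ("LC2", "LC3"): 0.91,
    ("SE1", "LC1"): 0.02, ("SE1", "LC2"): 0.12, ("SE1", "LC3"): 0.00,
    ("SE2", "LC1"): 0.05, ("SE2", "LC2"): 0.11, ("SE2", "LC3"): 0.03,
    ("SE3", "LC1"): 0.04, ("SE3", "LC2"): 0.09, ("SE3", "LC3"): 0.06,
}

convergent, discriminant = [], []
for a, b in combinations(construct_of, 2):
    if construct_of[a] == construct_of[b]:
        convergent.append(corr[(a, b)])     # within-construct pair
    else:
        discriminant.append(corr[(a, b)])   # cross-construct pair

# "Uniformly lower": the largest cross-construct correlation is still
# below the smallest within-construct correlation.
pattern_holds = max(discriminant) < min(convergent)
print(min(convergent), max(discriminant), pattern_holds)
```

Comparing the extremes is a strict reading of "uniformly lower"; comparing the averages of the two lists would be a more forgiving alternative.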

The important thing to notice about this matrix is that it does not explicitly include a
methods factor as a true MTMM would. The matrix examines both convergent and
discriminant validity (like the MTMM) but it only explicitly looks at construct intra-
and interrelationships. We can see in this example that the MTMM idea really had
two major themes. The first was the idea of looking simultaneously at the pattern of
convergence and discrimination. This idea is similar in purpose to the notions
implicit in the nomological network -- we are looking at the pattern of
interrelationships based upon our theory of the nomological net. The second idea in
MTMM was the emphasis on methods as a potential confounding factor.

While methods may confound the results, they won't necessarily do so in any given
study. And, while we need to examine our results for the potential for methods
factors, it may be that combining this desire to assess the confound with the need to
assess construct validity is more than one methodology can feasibly handle. Perhaps
if we split the two agendas, we will find that the possibility that we can examine
convergent and discriminant validity is greater. But what do we do about methods
factors? One way to deal with them is through replication of research projects, rather
than trying to incorporate a methods test into a single research study. Thus, if we
find a particular outcome in a study using several measures, we might see if that same
outcome is obtained when we replicate the study using different measures and
methods of measurement for the same constructs. The methods issue is considered
more as an issue of generalizability (across measurement methods) rather than one of
construct validity.

When viewed this way, we have moved from the idea of an MTMM to that of the
multitrait matrix that enables us to examine convergent and discriminant validity, and
hence construct validity. We will see that when we move away from the explicit
consideration of methods and when we begin to see convergence and discrimination
as differences of degree, we essentially have the foundation for the pattern matching
approach to assessing construct validity.

Copyright 2006, William M.K. Trochim, All Rights Reserved


Last Revised: 10/20/2006