Exploratory Factor Analysis
• Developed for problems that involve quantities that are
not directly measurable
→ E.g. social class, intelligence, . . .
→ These quantities are called latent variables
• Examine such quantities through measurable indicators
that are related to the quantity of interest
→ E.g. Educational background, occupation, income,
test scores, . . .
→ These indicators are called manifest variables
Exploratory Factor Analysis
• Uncover the relationship between the latent and
manifest variables
• dimension reduction
Factor Analysis Model
• The manifest variables are centered:
xj := xj − x̄j
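The centering step can be sketched in a few lines of NumPy. The data matrix below is hypothetical and only serves to illustrate the operation; rows are observations and columns are manifest variables.

```python
import numpy as np


def center_columns(X):
    """Subtract each column mean so every manifest variable has mean zero."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)


# Hypothetical data: 4 observations of 2 manifest variables.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
Xc = center_columns(X)
```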
Factor Analysis Model
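A regression model of the following form is assumed for observation $i$ and manifest variables $j = 1, \dots, p$:
$$x_{ij} = \lambda_{j1} f_{i1} + \lambda_{j2} f_{i2} + \dots + \lambda_{jk} f_{ik} + u_{ij}$$
or, in matrix form, $x_i = \Lambda f_i + u_i$, where
$$\Lambda = \begin{pmatrix} \lambda_{11} & \lambda_{12} & \cdots & \lambda_{1k} \\ \lambda_{21} & \lambda_{22} & \cdots & \lambda_{2k} \\ \vdots & \vdots & & \vdots \\ \lambda_{p1} & \lambda_{p2} & \cdots & \lambda_{pk} \end{pmatrix}$$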
Factor loadings
• $\lambda_{ql}$ expresses how manifest variable $x_q$
depends on common factor $f_l$
Factor Analysis Model: assumptions
• The specific variates $u_1, \dots, u_p$ are uncorrelated with
each other and with the factors $f_1, \dots, f_k$.
→ Correlations between manifest variables arise from
their relationship with the factors
Variance decomposition
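From the assumptions it follows that
$$\operatorname{var}(x_j) = \sum_{l=1}^{k} \lambda_{jl}^2 + \psi_j = h_j^2 + \psi_j$$
where $\psi_j = \operatorname{var}(u_j)$.
• $h_j^2 = \sum_{l=1}^{k} \lambda_{jl}^2$ is the communality of $x_j$, that is, the part of its variance explained by the common factors
• For two different manifest variables $x_j$ and $x_q$:
$$\operatorname{cov}(x_j, x_q) = \sum_{l=1}^{k} \lambda_{jl} \lambda_{ql}$$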
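As a small numerical illustration of these formulas, the sketch below computes communalities, variances and one covariance from a loadings matrix. The loadings and specific variances are hypothetical values chosen so that every variance equals 1, and the sketch assumes uncorrelated factors with unit variance.

```python
import numpy as np


def communalities(loadings):
    """Row sums of squared loadings: h_j^2 = sum_l lambda_jl^2."""
    return (np.asarray(loadings, dtype=float) ** 2).sum(axis=1)


# Hypothetical loadings for p = 3 manifest variables and k = 2 factors.
Lambda = np.array([[0.8, 0.1],
                   [0.6, 0.5],
                   [0.2, 0.7]])
# Hypothetical specific variances psi_j.
psi = np.array([0.35, 0.39, 0.47])

h2 = communalities(Lambda)       # communality of each variable
var_x = h2 + psi                 # var(x_j) = h_j^2 + psi_j
cov_12 = Lambda[0] @ Lambda[1]   # cov(x_1, x_2) = sum_l lambda_1l * lambda_2l
```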
Variance-covariance matrix
decomposition
The factor analysis model assumes that the
variance-covariance matrix S can be decomposed
as follows:
$$S = \Lambda \Lambda^t + \psi$$
where $\psi = \operatorname{diag}(\psi_1, \dots, \psi_p)$
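The decomposition can be checked by simulation. The following sketch builds the model-implied matrix from hypothetical loadings and specific variances, then draws data from the factor model and compares the sample covariance with it. Normally distributed, uncorrelated, unit-variance factors are an assumption of this sketch.

```python
import numpy as np


def implied_covariance(loadings, psi):
    """Model-implied covariance: Lambda Lambda^t + diag(psi)."""
    L = np.asarray(loadings, dtype=float)
    return L @ L.T + np.diag(psi)


# Hypothetical parameters: p = 3 manifest variables, k = 2 factors.
Lambda = np.array([[0.8, 0.1],
                   [0.6, 0.5],
                   [0.2, 0.7]])
psi = np.array([0.35, 0.39, 0.47])
S_model = implied_covariance(Lambda, psi)

# Simulate x_i = Lambda f_i + u_i with a fixed seed.
rng = np.random.default_rng(0)
n = 200_000
f = rng.standard_normal((n, 2))                 # common factors
u = rng.standard_normal((n, 3)) * np.sqrt(psi)  # specific variates
x = f @ Lambda.T + u
S_sample = np.cov(x, rowvar=False)
```

With a sample this large, `S_sample` should agree with `S_model` up to sampling noise.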
Finding FA solutions
• Several estimation procedures have been
proposed
Example: Examination marks
Children’s examination marks in the following
subjects are considered
x1 Classics
x2 French
x3 English
Example: Examination marks
The data produced the following correlation matrix
x1 x2 x3
Classics 1.00 0.83 0.78
Example: Factor model
We assume a single factor model
xi1 = λ1 fi + ui1
xi2 = λ2 fi + ui2
xi3 = λ3 fi + ui3
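With one factor and three standardised variables, the model implies that the correlation between $x_j$ and $x_q$ equals $\lambda_j \lambda_q$, so the loadings can be solved for in closed form. The sketch below does this. The values 0.83 and 0.78 are the Classics correlations given above; the French/English correlation is not shown in the text, so the value 0.67 used here is a placeholder assumption, as is the unit variance of the factor.

```python
import math


def one_factor_solution(r12, r13, r23):
    """Solve r_jq = lambda_j * lambda_q for three standardised variables.

    Assumes all three correlations are positive and the factor has
    unit variance. Returns (loadings, specific variances).
    """
    lam1 = math.sqrt(r12 * r13 / r23)
    lam2 = math.sqrt(r12 * r23 / r13)
    lam3 = math.sqrt(r13 * r23 / r12)
    loadings = (lam1, lam2, lam3)
    specific = tuple(1.0 - lam ** 2 for lam in loadings)  # psi_j = 1 - lambda_j^2
    return loadings, specific


R12, R13 = 0.83, 0.78   # Classics with French and with English (from the text)
R23_ASSUMED = 0.67      # placeholder assumption: not given in the text
loadings, specific = one_factor_solution(R12, R13, R23_ASSUMED)
```

The products of the returned loadings reproduce the three input correlations exactly, which is why a one-factor model with three variables always fits perfectly whenever the implied specific variances are positive.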
Example: Solution
The solution is
Choosing the number of factors
• The solution changes completely if the number of factors is
changed
Choosing the number of factors
• Sometimes suggested by the problem and data
Example: Life expectancy
Life expectancy in years for 31 countries and for 8
categories of people
M0 Males at age 0
M25 Males at age 25
M50 Males at age 50
M75 Males at age 75
W0 Females at age 0
W25 Females at age 25
W50 Females at age 50
W75 Females at age 75
Example: Life expectancy
We use the MLE test to determine the number of factors:
Example: Life expectancy
→ 3 factors are sufficient
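Assuming the test referred to here is the likelihood ratio test that accompanies a maximum likelihood fit, its degrees of freedom for a $k$-factor model with $p$ manifest variables are $((p-k)^2 - (p+k))/2$. The sketch below tabulates this count for the $p = 8$ life expectancy variables. The function name is hypothetical.

```python
def lr_test_df(p, k):
    """Degrees of freedom of the likelihood ratio test for a k-factor model.

    (p - k)^2 and (p + k) always have the same parity, so the integer
    division is exact.
    """
    return ((p - k) ** 2 - (p + k)) // 2


# Degrees of freedom for the 8 life expectancy variables.
df_by_k = {k: lr_test_df(8, k) for k in range(1, 6)}
# The test is only defined while the degrees of freedom stay positive.
testable_k = [k for k, df in df_by_k.items() if df > 0]
```

For eight variables the count turns negative at five factors, so under this assumption at most four factors can be tested.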
Factor rotation
• The factors are not uniquely defined!
• Consider an orthogonal matrix $M$ ($M^t = M^{-1}$), then
$$x_i = \Lambda f_i + u_i = (\Lambda M)(M^t f_i) + u_i$$
and
$$S = (\Lambda M)(\Lambda M)^t + \psi = \Lambda \Lambda^t + \psi$$
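The non-uniqueness is easy to verify numerically. In this sketch the loadings are hypothetical and $M$ is a plane rotation by 30 degrees: the rotated loadings differ from the originals, yet they imply exactly the same common-factor part of the covariance matrix.

```python
import numpy as np

# Hypothetical loadings for p = 3 manifest variables and k = 2 factors.
Lambda = np.array([[0.8, 0.1],
                   [0.6, 0.5],
                   [0.2, 0.7]])

# An orthogonal matrix: rotation by 30 degrees in the factor plane.
theta = np.pi / 6
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

rotated = Lambda @ M                # new loadings Lambda M
common_original = Lambda @ Lambda.T
common_rotated = rotated @ rotated.T
```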
Factor rotation
Rotate factors to obtain a simple structure
Orthogonal factor rotation
• Factors remain orthogonal
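Varimax, which appears in the examples further on, is one orthogonal rotation. The sketch below is a minimal version of the commonly used SVD-based iteration for raw varimax, without Kaiser normalisation; it is not necessarily the variant used to produce the results in these slides. The function name, defaults and test loadings are assumptions.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation via the standard SVD-based iteration.

    Returns the rotated loadings and the orthogonal rotation matrix R.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the varimax criterion with respect to the rotation.
        grad = L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(grad)
        R = U @ Vt                       # closest orthogonal matrix to grad
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break                        # criterion has stopped improving
        crit_old = crit
    return L @ R, R


# Hypothetical unrotated loadings: 4 manifest variables, 2 factors.
Lambda = np.array([[0.7, 0.5],
                   [0.6, 0.5],
                   [0.5, -0.5],
                   [0.4, -0.4]])
rotated, R = varimax(Lambda)
```

Because `R` is orthogonal, the communalities and the implied covariance matrix are unchanged by the rotation; only the distribution of the loadings across factors changes.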
Oblique factor rotation
• Give up on orthogonality of factors
• A simpler structure of loadings is possible
• Sometimes correlated factors are expected
• Loadings are no longer covariances/correlations
between manifest variables and factors!
Example: Life expectancy
No rotation Varimax
Example: Life Expectancy
Factor interpretation
Example: Life expectancy
Varimax Promax
Finding factor scores
Under the assumption of normality, one can obtain
factor scores as
$$\hat{f}_i = \hat{\Lambda}^t S^{-1} x_i$$
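A minimal sketch of this score formula, written as $\hat{f}_i = \hat{\Lambda}^t S^{-1} x_i$ so that the dimensions match, is given below. The loadings, covariance matrix and centered observations are hypothetical, and the function name is an assumption.

```python
import numpy as np


def regression_scores(X, loadings, S):
    """Factor scores Lambda^t S^{-1} x_i for every row x_i of centered X.

    Solves S w = x_i instead of forming the inverse of S explicitly.
    Returns an (n, k) array with one row of scores per observation.
    """
    L = np.asarray(loadings, dtype=float)
    X = np.asarray(X, dtype=float)
    return np.linalg.solve(S, X.T).T @ L


# Hypothetical one-factor model with two manifest variables.
Lambda = np.array([[0.8],
                   [0.6]])
psi = np.array([0.36, 0.64])
S = Lambda @ Lambda.T + np.diag(psi)   # [[1, 0.48], [0.48, 1]]

# Hypothetical centered observations.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
scores = regression_scores(X, Lambda, S)
```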
Example: Life expectancy
[Figure: factor scores of the countries plotted on the axes Factor1, Factor2 and Factor3, with each point labelled by country]
FA vs PCA
PCA and FA both focus on explaining a data
structure through a small number of dimensions
FA vs PCA
• PCA results based on the covariance matrix and on the
correlation matrix can be completely different, whereas FA
results based on the covariance matrix and on the
correlation matrix are essentially equivalent
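The difference in scale behaviour can be illustrated numerically. In the sketch below, a hypothetical one-factor correlation matrix is rescaled by a diagonal matrix $D$, as if the variables were measured in different units. The factor model carries over exactly with loadings $D\Lambda$ and specific variances $D \psi D$, whereas the first principal component changes direction and is dominated by the variable with the largest variance.

```python
import numpy as np


def first_pc(C):
    """Leading eigenvector of a symmetric matrix, with a fixed sign."""
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    v = eigvecs[:, -1]
    return v * np.sign(v[np.argmax(np.abs(v))])


# Hypothetical one-factor model on standardised variables.
Lambda = np.array([[0.9], [0.8], [0.7]])
psi = 1.0 - (Lambda ** 2).ravel()
S = Lambda @ Lambda.T + np.diag(psi)       # a correlation matrix

# Change of units: rescale the variables by 1, 10 and 100.
D = np.diag([1.0, 10.0, 100.0])
S_scaled = D @ S @ D

# FA: the rescaled parameters reproduce the rescaled covariance exactly.
fa_reproduces = np.allclose(
    S_scaled, (D @ Lambda) @ (D @ Lambda).T + D @ np.diag(psi) @ D
)

# PCA: the leading component differs between the two matrices.
pc_corr = first_pc(S)
pc_cov = first_pc(S_scaled)
```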
Confirmatory factor analysis
• Used to check some existing theory
• Incorporate constraints on loadings into
model
Final example: Drug use
Data about the use of psychoactive substances
among school children (7th-11th grade) in 11 schools
in the Los Angeles metropolitan area. The number of
times a particular substance had been used is
recorded on a 5-point scale:
1 : Never tried
2 : Only once
3 : A few times
4 : Many times
5 : Regularly
Example: Drug use
Questions about the following substances were
asked
x1 : cigarettes
x2 : beer
x3 : wine
x4 : liquor
x5 : cocaine
x6 : tranquillizers
x7 : drug store medicine
x8 : heroin/opiates
x9 : marijuana
x10 : hashish
x11 : inhalants (glue, gasoline,. . . )
x12 : hallucinogenics (LSD, . . . )
x13 : amphetamine, stimulants
Example: Drug use
Factor1 Factor2 Factor3 Factor4
Example: Drug use
Factor interpretation