
Factor analysis

(PCA)

Prof : Xavier Boute


boute@hec.fr

The 5 keys for decision


I : Data structuration

II : Data integration

III : Basic descriptive statistics

IV : Multidimensional analysis

V : Visualization
IV : Multidimensional analysis
Id              Variable 1   Variable 2   …   Variable p
Individual 1    Id 1
Individual 2    Id 2
…
Individual N    Id N

N in millions, p in hundreds

Describe
Explain: $Var_p = f(Var_1; Var_2; \dots; Var_{p-1})$

Factor analysis : what is it ?


Geometrical approach
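[Figure: cloud of points (*) in the 3D space of the variables X2, X5 and X6, projected onto the 2D plane of the factors F1 and F2 through the origin 0 (3D / 2D)]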
Example
Describe a dataset of individuals × variables
n = 24 cars
p = 6 variables

  X1 = Engine size (cm³)
  X2 = Power (horsepower)
  X3 = Speed (km/h)
  X4 = Weight (kg)
  X5 = Width (cm)
  X6 = Length (cm)

Source : l’argus de l’automobile 2004


First factor
F1 = a1X1 + a2X2 + ⋯ + apXp

  X1 = Engine size (cm³)
  X2 = Power (horsepower)
  X3 = Speed (km/h)
  X4 = Weight (kg)
  X5 = Width (cm)
  X6 = Length (cm)

[Figure: cloud of points (*) in the space of X2, X5 and X6, with the factor axes F1 and F2 through the origin 0]
« Cabbages and carrots » (the French counterpart of « apples and oranges »)

We introduce new standardized variables, the Z-scores:

$Z_j = \dfrac{X_j - \mu_j}{\sigma_j}$

Z : no unit
Mean = 0
Standard deviation = 1
For example, Enginesize (Citroën C2) = 1124 cm³
$\mu_{Enginesize} = 2722.54$ cm³
$\sigma_{Enginesize} = 1516.445$ cm³

So $Z_{Enginesize}(\text{Citroën C2}) = \dfrac{1124 - 2722.54}{1516.445} = -1.05$
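The standardization can be checked numerically. A minimal Python sketch, using only the figures quoted on the slide; the function name `z_score` is an illustrative choice, not part of the course material:

```python
def z_score(x, mean, std):
    """Standardize a raw value: subtract the mean, divide by the standard deviation."""
    return (x - mean) / std

# Engine size of the Citroën C2, with the mean and standard deviation
# of engine size over the 24 cars (all in cm³), as given on the slide.
z_c2 = z_score(1124, 2722.54, 1516.445)
print(round(z_c2, 2))  # -1.05
```

The result has no unit: the cm³ cancel in the division, which is what makes Z-scores of different variables comparable.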

Properties of Zscore
If Zj(i) > 0, then individual i has a value above the mean for the variable j.

If Zj(i) < 0, then individual i has a value below the mean for the variable j.

If Zj(i) > 2, then individual i has an extremely high value for the variable j.

If Zj(i) < −2, then individual i has an extremely low value for the variable j.
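These four rules translate directly into a small helper. The sketch below is only an illustration: the function name `describe_z` and the wording of the labels are assumptions, while the Z-scores are those of the Citroën C2 used in the F1 calculation later in the deck.

```python
def describe_z(z):
    """Read a Z-score with the four rules: the ±2 thresholds are tested first."""
    if z > 2:
        return "extremely high"
    if z < -2:
        return "extremely low"
    if z > 0:
        return "above the mean"
    if z < 0:
        return "below the mean"
    return "equal to the mean"

# Z-scores of the Citroën C2 on the six variables (values from the slides)
citroen_z = {"Engsize": -1.054, "Power": -0.935, "Speed": -1.000,
             "Weight": -1.431, "Width": -0.812, "Length": -1.052}
readings = {name: describe_z(z) for name, z in citroen_z.items()}
```

Every reading for this car is "below the mean", and none reaches the −2 threshold.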
The standardized data (Zscore)

Easier to compare, to identify extreme values, and to see for which variable!
First factor

F1 = u11Z1 + u12Z2 + ⋯ + u1pZp

[Figure: cloud of points (*) in the standardized space of Z2, Z5 and Z6, with the factor axes F1 and F2 through the origin 0]

F1 has to be well correlated with Z1; Z2; ⋯; Zp
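Matrix of correlations
Correlation coefficient R

[Figure: three scatter plots of Y against X, for R ≈ −1 (negative correlation), R ≈ 0 (no correlation) and R ≈ +1 (positive correlation). Below them, the scale of R runs from −1 to +1, with marks at −2/√n and +2/√n on either side of 0.]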

Matrix of correlations
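To make the three cases R ≈ −1, R ≈ 0 and R ≈ +1 concrete, here is a minimal pure-Python sketch of the correlation coefficient and of the matrix of correlations. The toy variables are hypothetical and are not the car data; the function names are illustrative.

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation coefficient R between two lists of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Matrix of correlations for a list of variables (one list per variable)."""
    return [[correlation(a, b) for b in columns] for a in columns]

# Hypothetical toy variables, NOT the car data
x = [1, 2, 3, 4, 5]
up = [2, 4, 6, 8, 10]      # rises with x: positive correlation
down = [10, 8, 6, 4, 2]    # falls as x rises: negative correlation
flat = [1, -1, 0, -1, 1]   # built to be uncorrelated with x
R = correlation_matrix([x, up, down, flat])
```

The first row of `R` holds the correlations of `x` with itself, `up`, `down` and `flat`. The matrix is symmetric with ones on the diagonal.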
First factor

F1 = u11Z1 + u12Z2 + ⋯ + u1pZp

Problem : find u11; u12; ⋯; u1p such that

$\sum_{j=1}^{p} R^2(F_1; Z_j)$ is maximum,

where R(F1; Zj) is the correlation between F1 and Zj.
HOW IS IT POSSIBLE ???
First factor : process

Matrix of correlations → eigenvalues λ1 ≥ λ2 ≥ ⋯ ≥ λp

F1 = u11Z1 + u12Z2 + ⋯ + u1pZp, where (u11; u12; ⋯; u1p) is the eigenvector associated with λ1

In this case : $\sum_{j=1}^{p} R^2(F_1; Z_j)$ is maximum and is equal to $\lambda_1$

Mean (F1) = 0 and standard-deviation(F1) = 1
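The claims of this slide can be checked on a small example. The sketch below is an illustration under stated assumptions: the data are hypothetical (three variables, six individuals, not the car data), and the eigenvector is found by power iteration, a simple technique swapped in here for readability, not necessarily what SPSS does. The unit-norm eigenvector is divided by √λ1, which is the scaling that gives F1 a standard deviation of 1.

```python
from math import sqrt

def standardize(col):
    """Z-scores of one variable (sample standard deviation, n - 1)."""
    n = len(col)
    mean = sum(col) / n
    sd = sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def corr(a, b):
    """Correlation of two columns that are already standardized."""
    return sum(x * y for x, y in zip(a, b)) / (len(a) - 1)

def first_factor(columns, iterations=500):
    """Return (lambda1, u1, F1, Z) for a list of variables."""
    Z = [standardize(c) for c in columns]
    p, n = len(Z), len(Z[0])
    R = [[corr(Z[i], Z[j]) for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        # Power iteration: multiply by R, then renormalize to unit length.
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the largest eigenvalue.
    lam = sum(v[i] * sum(R[i][j] * v[j] for j in range(p)) for i in range(p))
    u = [x / sqrt(lam) for x in v]  # rescaled so that sd(F1) = 1
    F1 = [sum(u[j] * Z[j][i] for j in range(p)) for i in range(n)]
    return lam, u, F1, Z

# Hypothetical toy data, NOT the car data
toy = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [1, 3, 2, 5, 4, 6]]
lam1, u1, F1, Z = first_factor(toy)
sum_r2 = sum(corr(F1, z) ** 2 for z in Z)  # sum of squared correlations
```

On this toy data `sum_r2` agrees with `lam1` up to floating-point error, and `F1` has mean 0 and standard deviation 1.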
SPSS : eigenvalues

λ1 = 4.411

So $\sum_{j=1}^{p} R^2(F_1; Z_j) = 4.411$
SPSS : eigenvectors

F1 = .218ZEngsize
+.209ZPower
+.201ZSpeed
+.172ZWeight
+.182ZWidth
+.180ZLength

Calculate F1 for all the individuals
Interpret F1 as a new variable
Give the quality of F1
Calculate F1 for all the individuals

F1 = .218ZEngsize + .209ZPower + .201ZSpeed + .172ZWeight + .182ZWidth + .180ZLength

F1(Citroën) = .218ZEngsize(Citroën) + .209ZPower(Citroën) + .201ZSpeed(Citroën) + .172ZWeight(Citroën) + .182ZWidth(Citroën) + .180ZLength(Citroën)

F1(Citroën) = .218 × (−1.054) + .209 × (−.935) + .201 × (−1.000) + .172 × (−1.431) + .182 × (−.812) + .180 × (−1.052)

F1(Citroën) = −1.210
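The same calculation as a short Python sketch, using the coefficients and Z-scores printed on the slide. The helper name `factor_score` is illustrative.

```python
def factor_score(coefficients, z_scores):
    """Weighted sum of Z-scores: the value of a factor for one individual."""
    return sum(coefficients[name] * z_scores[name] for name in coefficients)

# Coefficients of F1 as displayed by SPSS (rounded to three decimals)
f1_coefficients = {"Engsize": 0.218, "Power": 0.209, "Speed": 0.201,
                   "Weight": 0.172, "Width": 0.182, "Length": 0.180}
# Z-scores of the Citroën C2
citroen_z = {"Engsize": -1.054, "Power": -0.935, "Speed": -1.000,
             "Weight": -1.431, "Width": -0.812, "Length": -1.052}

f1_citroen = factor_score(f1_coefficients, citroen_z)
```

With these rounded inputs the sum is about −1.209. The small gap with the −1.210 on the slide comes from the rounding of the displayed coefficients.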
Interpret F1 as a new variable :
F1 = .218ZEngsize
+.209ZPower
+.201ZSpeed
+.172ZWeight
+.182ZWidth
+.180ZLength

Give a name to both parts of the axis (− and +).

Tina's question: « Which variables make F1 for a car as negative as possible? As positive as possible? »

− Small cars                                0                                Big cars +
ZEngsize < 0   Engsize < mean(Engsize)      |   Engsize > mean(Engsize)      ZEngsize > 0
ZPower < 0     Power < mean(Power)          |   Power > mean(Power)          ZPower > 0
ZSpeed < 0     Speed < mean(Speed)          |   Speed > mean(Speed)          ZSpeed > 0
ZWidth < 0     Width < mean(Width)          |   Width > mean(Width)          ZWidth > 0
ZLength < 0    Length < mean(Length)        |   Length > mean(Length)        ZLength > 0
ZWeight < 0    Weight < mean(Weight)        |   Weight > mean(Weight)        ZWeight > 0
Quality of F1
$\sum_{j=1}^{p} R^2(F_1; Z_j)$ is maximum and is equal to $\lambda_1$.

If F1 were totally correlated with all the Zj, $\sum_{j=1}^{p} R^2(F_1; Z_j)$ would be equal to p.

So, the quality of F1 = $\dfrac{\lambda_1}{p}$

F1 explains 73 % of the total variance of the p original variables.
Second factor

F2 = u21Z1 + u22Z2 + ⋯ + u2pZp

[Figure: cloud of points (*) in the standardized space of Z2, Z5 and Z6, with the factor axes F1 and F2 through the origin 0]
F2 has to be well correlated with Z1; Z2; ⋯; Zp

Cor(F1; F2) = 0
Second factor : process

Matrix of correlations → eigenvalues λ1 ≥ λ2 ≥ ⋯ ≥ λp

F2 = u21Z1 + u22Z2 + ⋯ + u2pZp, where (u21; u22; ⋯; u2p) is the eigenvector associated with λ2

In this case : Cor(F1; F2) = 0; $\sum_{j=1}^{p} R^2(F_2; Z_j)$ is maximum and is equal to $\lambda_2$

Mean (F2) = 0 and standard-deviation(F2) = 1
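That F2 is uncorrelated with F1 follows from a standard property of symmetric matrices. A short derivation, under notation assumed here and not given on the slide: $u_1$ and $u_2$ are the coefficient vectors of F1 and F2, $R$ is the matrix of correlations, and $\lambda_1 \neq \lambda_2$. Since both factors have standard deviation 1, their correlation equals their covariance.

```latex
\begin{aligned}
\operatorname{Cor}(F_1; F_2) = \operatorname{Cov}(F_1; F_2)
  &= u_1^{\top} R \, u_2
  && \text{(the } Z_j \text{ are standardized, so their covariance matrix is } R\text{)} \\
  &= \lambda_2 \, u_1^{\top} u_2
  && \text{(} R\,u_2 = \lambda_2 u_2 \text{)} \\
  &= 0
  && \text{(eigenvectors of a symmetric matrix with distinct eigenvalues are orthogonal)}
\end{aligned}
```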
SPSS : eigenvalues
Total Variance Explained

Component   Eigenvalue   % of variance   Cumulative %
1           4.411        73.521          73.521
2           .853         14.223          87.745
3           .436         7.261           95.006
4           .236         3.931           98.937
5           .051         .857            99.794
6           .012         .206            100.000
Extraction Method: Principal Component Analysis.

λ2 = 0.853

So $\sum_{j=1}^{p} R^2(F_2; Z_j) = 0.853$

Quality of F2 = $\dfrac{\lambda_2}{p} = \dfrac{.853}{6}$ = 14.223 %

Quality of (F1; F2) = $\dfrac{\lambda_1 + \lambda_2}{p}$ = 87.745 %
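The quality figures can be recomputed from the six eigenvalues reported by SPSS. A minimal sketch, where the function name `explained_variance` is illustrative and the eigenvalues are the three-decimal values shown in the output:

```python
def explained_variance(eigenvalues):
    """Rows of (component, eigenvalue, % of variance, cumulative %)."""
    p = len(eigenvalues)  # with standardized variables, total variance = p
    rows, cumulative = [], 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        share = 100 * lam / p
        cumulative += share
        rows.append((k, lam, share, cumulative))
    return rows

# Eigenvalues as displayed by SPSS (rounded to three decimals)
eigenvalues = [4.411, 0.853, 0.436, 0.236, 0.051, 0.012]
table = explained_variance(eigenvalues)
for k, lam, share, cumulative in table:
    print(f"{k}  {lam:.3f}  {share:6.2f} %  {cumulative:6.2f} %")
```

Because the inputs are rounded, the percentages differ from the SPSS columns in the second decimal at most (for instance about 73.52 % and 87.73 % for the first two rows).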
SPSS : eigenvectors

F2 = −.149ZEngsize
−.413ZPower
−.397ZSpeed
+.675ZWeight
−.130ZWidth
+.591ZLength

Interpret F2 as a new variable
F2 = −.149ZEngsize
−.413ZPower
−.397ZSpeed
+.675ZWeight
−.130ZWidth
+.591ZLength

Tina's question: « Which variables make F2 for a car as negative as possible? As positive as possible? »

− Sport cars                                0                                Family cars +
ZWeight < 0    Weight < mean(Weight)        |   Weight > mean(Weight)        ZWeight > 0
ZLength < 0    Length < mean(Length)        |   Length > mean(Length)        ZLength > 0
ZPower > 0     Power > mean(Power)          |   Power < mean(Power)          ZPower < 0
ZSpeed > 0     Speed > mean(Speed)          |   Speed < mean(Speed)          ZSpeed < 0
ZEngsize > 0   Engsize > mean(Engsize)      |   Engsize < mean(Engsize)      ZEngsize < 0
ZWidth > 0     Width > mean(Width)          |   Width < mean(Width)          ZWidth < 0
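Tina's question can be answered mechanically from the signs and sizes of the coefficients. The sketch below is one illustrative way to do it; the function name `pole_profile` and the "high"/"low" labels are assumptions, while the coefficients are those of F2 given by SPSS.

```python
def pole_profile(coefficients, sign):
    """For one pole of the axis (sign = +1 or -1), say whether each variable
    must be high or low to push the factor that way, strongest weight first."""
    ordered = sorted(coefficients.items(), key=lambda item: -abs(item[1]))
    return [(name, "high" if coef * sign > 0 else "low")
            for name, coef in ordered]

# Coefficients of F2 as displayed by SPSS
f2_coefficients = {"Engsize": -0.149, "Power": -0.413, "Speed": -0.397,
                   "Weight": 0.675, "Width": -0.130, "Length": 0.591}

positive_pole = pole_profile(f2_coefficients, +1)  # F2 as positive as possible
negative_pole = pole_profile(f2_coefficients, -1)  # F2 as negative as possible
```

The positive pole starts with high Weight and high Length, then low Power and low Speed. The negative pole is the mirror image: light, short, powerful and fast cars.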

Rank  Model                           Factor 2
1     Land Rover Discovery Td5        2.035
2     Jaguar S-Type 2.7 V6 Bi-Turbo   .951
3     Mercedes Classe S 400 CDI       .858
4     Land Rover Defender Td5         .796
5     Nissan X-Trail 2.2 dCi          .765
6     Volkswagen Touran 1.9 TDI 105   .755
7     BMW 745i                        .646
8     Peugeot 407 3.0 V6 BVA          .554
9     Mercedes Classe C 270 CDI       .510
10    BMW 530d                        .488
11    Renault Scenic 1.9 dCi 120      .403
12    Peugeot 307 1.4 HDI 70          .318
13    Audi A3 1.9 TDI                 .179
14    Bentley Continental GT          .068
15    Citroën C3 Pluriel 1.6i         -.231
16    Nissan Micra 1.2 65             -.428
17    Audi TT 1.8T 180                -.487
18    Citroën C2 1.1 Base             -.540
19    BMW Z4 2.5i                     -.632
20    Aston Martin Vanquish           -.678
21    Mini 1.6 170                    -.864
22    Renault Clio 3.0 V6             -.970
23    Smart Fortwo Coupé              -1.765
24    Ferrari Enzo                    -2.734

− Sport cars          0          Family cars +
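The 24 scores in this ranking make it possible to check the stated properties of F2 (mean 0, standard deviation 1) and to apply the ±2 rule for extreme values to the factor itself. A minimal sketch using Python's standard `statistics` module; the use of the sample standard deviation (n − 1) is an assumption about the convention behind the scores.

```python
from statistics import mean, stdev

# Factor 2 scores of the 24 cars, copied from the ranking
f2_scores = {
    "Land Rover Discovery Td5": 2.035, "Jaguar S-Type 2.7 V6 Bi-Turbo": 0.951,
    "Mercedes Classe S 400 CDI": 0.858, "Land Rover Defender Td5": 0.796,
    "Nissan X-Trail 2.2 dCi": 0.765, "Volkswagen Touran 1.9 TDI 105": 0.755,
    "BMW 745i": 0.646, "Peugeot 407 3.0 V6 BVA": 0.554,
    "Mercedes Classe C 270 CDI": 0.510, "BMW 530d": 0.488,
    "Renault Scenic 1.9 dCi 120": 0.403, "Peugeot 307 1.4 HDI 70": 0.318,
    "Audi A3 1.9 TDI": 0.179, "Bentley Continental GT": 0.068,
    "Citroën C3 Pluriel 1.6i": -0.231, "Nissan Micra 1.2 65": -0.428,
    "Audi TT 1.8T 180": -0.487, "Citroën C2 1.1 Base": -0.540,
    "BMW Z4 2.5i": -0.632, "Aston Martin Vanquish": -0.678,
    "Mini 1.6 170": -0.864, "Renault Clio 3.0 V6": -0.970,
    "Smart Fortwo Coupé": -1.765, "Ferrari Enzo": -2.734,
}

f2_mean = mean(f2_scores.values())
f2_sd = stdev(f2_scores.values())  # sample standard deviation (n - 1)
extremes = [car for car, score in f2_scores.items() if abs(score) > 2]
```

The mean comes out at 0 and the standard deviation at 1 up to rounding of the printed scores. Two cars lie beyond ±2: the Land Rover Discovery Td5 at the family end and the Ferrari Enzo at the sport end.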
Factorial map

[Figure: scatter plot of the 24 cars on the factorial plane. Horizontal axis: Factor 1 (73%), from Small cars (−) to Big cars (+), graduated from −2 to 3. Vertical axis: Factor 2 (14%), from Sport cars (−) to Family cars (+), graduated from −3 to 2. The Land Rover Discovery is at the top of the map; the Smart Fortwo Coupé and the Ferrari Enzo are at the bottom.]
« You cannot teach a man anything.
You can only help him discover what he has within himself »
