Principal Component Analysis (PCA)
II : Data integration
IV : Multidimensional analysis
V : Visualization
               Id     Variable 1   Variable 2   …   Variable p
Individual 1   Id 1
Individual 2   Id 2
…
Individual N   Id N
(N in millions, p in hundreds)
Describe
Explain: Varp = f(Var1; Var2; …; Varp−1)
[Figure: cloud of points (*) around the origin 0, with axes X2 and X5 and the first factor axis F1; projection from 3D to 2D]
Example
Describe an individuals × variables dataset
n=24 cars
p=6 variables
First factor
F1 = a1X1 + a2X2 + ⋯ + apXp
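« Cabbages and carrots »: variables measured in different units cannot be compared or added directly, so they are standardized first.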
So $Z_{\text{Enginesize}}(\text{Citroën C2}) = \dfrac{1124 - 2722.54}{1516.445} = -1.05$
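The standardization above can be sketched in a few lines of Python. The function name `z_score` is illustrative. The three numbers are those of the slide: the engine size of the Citroën C2, then the mean and standard deviation of engine size over the 24 cars.

```python
def z_score(value, mean, std):
    """Standardize a raw value: subtract the mean, divide by the standard deviation."""
    return (value - mean) / std

# Figures taken from the slide (engine size, mean, standard deviation).
z_engsize_c2 = z_score(1124, 2722.54, 1516.445)
print(round(z_engsize_c2, 2))  # -1.05
```

The negative sign says that the Citroën C2 has an engine size below the mean, by roughly one standard deviation.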
Properties of Zscore
If Zj(i) > 0, then individual i has a value above the mean for the variable j .
Z-scores make individuals easier to compare, and make it easier to identify extreme values and the variables on which they occur.
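As a small illustration of these properties, the sketch below standardizes a whole column. The weights are hypothetical values chosen for the example (not the course data), the 1.5 cut-off for "extreme" is an arbitrary assumption, and the sample standard deviation is used, which may differ from the convention used in the slides.

```python
from statistics import mean, stdev

def standardize(values):
    """Return the z-scores of a list of raw values (sample standard deviation)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical weights in kg, for illustration only.
weights = [900, 1100, 1300, 1500, 2700]
z = standardize(weights)

# Z > 0 means the individual is above the mean for this variable.
above_mean = [w for w, zi in zip(weights, z) if zi > 0]
# Hypothetical rule of thumb: flag |Z| > 1.5 as an extreme value.
extreme = [w for w, zi in zip(weights, z) if abs(zi) > 1.5]

print(above_mean, extreme)
```

After standardization the column has mean 0 and standard deviation 1, whatever its original unit, which is what makes variables comparable.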
First factor
F1 has to be well correlated with Z1; Z2; ⋯; Zp
Matrix of correlations
R
[Figure: three scatter plots illustrating the correlation scale: −1 (negative correlation), 0 (no correlation), +1 (positive correlation)]
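A minimal sketch of how a matrix of correlations can be built, in plain Python. The helper names `pearson` and `correlation_matrix` and the toy columns are assumptions made for the example. The three columns are chosen so that the extreme cases +1 and −1 appear.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Square matrix whose (i, j) entry is the correlation of column i with column j."""
    return [[pearson(a, b) for b in columns] for a in columns]

# Hypothetical toy data: the second column rises with the first, the third falls.
x = [1.0, 2.0, 3.0, 4.0]
columns = [x, [2 * v + 1 for v in x], [-v for v in x]]
R = correlation_matrix(columns)
for row in R:
    print([round(r, 3) for r in row])
```

The diagonal is always 1 (each variable is perfectly correlated with itself) and the matrix is symmetric.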
First factor
Eigenvalues of the matrix of correlations: λ1 ≥ λ2 ≥ ⋯ ≥ λp
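To make the eigenvalue step concrete, here is a sketch of power iteration, one simple way to approximate the largest eigenvalue of a symmetric matrix. It is not necessarily what SPSS does internally. The 2 × 2 matrix is a hypothetical correlation matrix with r = 0.8, whose eigenvalues are known in closed form (1 + r and 1 − r).

```python
from math import sqrt

def leading_eigenpair(matrix, iterations=200):
    """Approximate the largest eigenvalue and a unit eigenvector by power iteration."""
    n = len(matrix)
    # Uneven starting vector, to avoid starting orthogonal to the answer.
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' M v gives the eigenvalue for a unit vector v.
    lam = sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v

# Hypothetical correlation matrix of two variables with r = 0.8.
R = [[1.0, 0.8], [0.8, 1.0]]
lam1, u1 = leading_eigenpair(R)
print(round(lam1, 3), [round(c, 3) for c in u1])  # 1.8 [0.707, 0.707]
```

With two equally weighted, positively correlated variables, the first factor gives both the same weight, and λ1 = 1.8 out of p = 2.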
λ1 = 4.411
So $\sum_{j=1}^{p} R^2(F_1; Z_j) = 4.411$
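One way to see why the sum of squared correlations equals the eigenvalue is sketched below. The notation is an assumption, not taken from the slides: $u_1 = (u_{11}, \dots, u_{1p})$ is the unit eigenvector of the matrix of correlations R associated with λ1, F1 is scaled to unit variance, and $R(Z_k; Z_j)$ is the (k, j) entry of R.

```latex
\begin{align*}
F_1 &= \frac{1}{\sqrt{\lambda_1}} \sum_{k=1}^{p} u_{1k} Z_k,
  \qquad R\,u_1 = \lambda_1 u_1,
  \qquad \sum_{j=1}^{p} u_{1j}^2 = 1 \\
R(F_1; Z_j) &= \frac{1}{\sqrt{\lambda_1}} \sum_{k=1}^{p} u_{1k}\, R(Z_k; Z_j)
  = \frac{(R\,u_1)_j}{\sqrt{\lambda_1}}
  = \sqrt{\lambda_1}\, u_{1j} \\
\sum_{j=1}^{p} R^2(F_1; Z_j) &= \lambda_1 \sum_{j=1}^{p} u_{1j}^2 = \lambda_1
\end{align*}
```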
SPSS : eigenvectors
F1 = .218ZEngsize
+.209ZPower
+.201ZSpeed
+.172ZWeight
+.182ZWidth
+.180ZLength
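The printed coefficients can be cross-checked against λ1 = 4.411. The sketch below assumes that each coefficient equals the correlation R(F1; Zj) divided by λ1, which is how component score coefficients are usually obtained. That reading of the SPSS output is an assumption, although the numbers are consistent with it.

```python
lambda1 = 4.411

# Coefficients of F1 as printed on the slide.
coef_f1 = {
    "Engsize": .218, "Power": .209, "Speed": .201,
    "Weight": .172, "Width": .182, "Length": .180,
}

# Assumption: coefficient = R(F1; Zj) / lambda1, hence R(F1; Zj) = lambda1 * coefficient.
correlations = {name: lambda1 * c for name, c in coef_f1.items()}
sum_of_squares = sum(r ** 2 for r in correlations.values())

for name, r in correlations.items():
    print(f"R(F1; Z{name}) = {r:.3f}")
print(round(sum_of_squares, 2))  # about 4.41, i.e. lambda1 up to rounding
```

Under this assumption every variable has a correlation with F1 between about 0.76 and 0.96, and the squared correlations add up to λ1 again.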
Calculate F1 for all the individuals
Interpret F1 as a new variable
Give the quality of F1
Calculate F1 for all the individuals
F1(Citroën) = − 1.210
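Computing F1 for one individual is a weighted sum of its z-scores. The sketch below uses the F1 coefficients printed on the slides. The two sets of z-scores are hypothetical, since the slides only give the engine-size z-score of the Citroën C2, so the result is not expected to reproduce −1.210.

```python
# Coefficients of F1 as printed on the slides.
COEF_F1 = {
    "Engsize": .218, "Power": .209, "Speed": .201,
    "Weight": .172, "Width": .182, "Length": .180,
}

def factor_score(coefs, z_scores):
    """Weighted sum of z-scores: F = sum over variables of coefficient * Z."""
    return sum(coefs[name] * z_scores[name] for name in coefs)

# Hypothetical car sitting one standard deviation below the mean on every variable.
z_small_car = {name: -1.0 for name in COEF_F1}
# Hypothetical car exactly at the mean on every variable.
z_average_car = {name: 0.0 for name in COEF_F1}

print(round(factor_score(COEF_F1, z_small_car), 3))    # -1.162
print(factor_score(COEF_F1, z_average_car))            # 0.0
```

Because all six coefficients are positive, a car that is below the mean on every variable gets a negative F1, and a car above the mean on every variable gets a positive one.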
Interpret F1 as a new variable:
F1 = .218ZEngsize
+.209ZPower
+.201ZSpeed
+.172ZWeight
+.182ZWidth
+.180ZLength
Give a name to both parts of the axis (− and +).
« Which variables make F1 for a car as negative as possible? As positive as possible? »
Tina’s question!
If F1 were totally correlated with all the Zj, then $\sum_{j=1}^{p} R^2(F_1; Z_j)$ would be equal to p.
So, the quality of F1 = $\dfrac{\lambda_1}{p} = \dfrac{4.411}{6} \approx 73.5\,\%$
F1 explains about 73.5 % of the total variance of the p original variables.
Second factor
F2 = u21Z1 + u22Z2 + ⋯ + u2pZp
[Figure: cloud of points (*) around the origin 0, with axes Z2, Z5 and Z6 and the two factor axes F1 and F2]
F2 has to be well correlated with Z1; Z2; ⋯; Zp
Cor(F1; F2) = 0
Second factor: process
Eigenvalues of the matrix of correlations: λ1 ≥ λ2 ≥ ⋯ ≥ λp
Eigenvalues: λ2 = 0.853
So $\sum_{j=1}^{p} R^2(F_2; Z_j) = 0.853$

Component   Eigenvalue   % of variance   Cumulative %
1           4.411        73.521          73.521
2           .853         14.223          87.745
3           .436         7.261           95.006
4           .236         3.931           98.937
5           .051         .857            99.794
6           .012         .206            100.000
Extraction Method: Principal Component Analysis.
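Quality of F2 = $\dfrac{\lambda_2}{p} = \dfrac{.853}{6} \approx 14.2\,\%$ (14.223 % in the SPSS output)
Quality of (F1; F2) = $\dfrac{\lambda_1 + \lambda_2}{p} \approx 87.7\,\%$ (87.745 % in the SPSS output)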
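The quality figures can be recomputed from the eigenvalues alone. This sketch uses the six eigenvalues as printed in the SPSS output. Because they are rounded to three decimals, the results differ from the SPSS percentages in the second decimal place.

```python
from itertools import accumulate

# Eigenvalues as printed in the SPSS output (rounded to three decimals).
eigenvalues = [4.411, .853, .436, .236, .051, .012]
p = len(eigenvalues)  # number of standardized variables, here 6

# Quality of each factor: lambda_k / p, as a percentage.
pct = [100 * lam / p for lam in eigenvalues]
# Quality of the first k factors together: running total of the percentages.
cumulative = list(accumulate(pct))

for k, (lam, q, c) in enumerate(zip(eigenvalues, pct, cumulative), start=1):
    print(f"F{k}: eigenvalue {lam:.3f}, {q:.1f} % of variance, {c:.1f} % cumulative")
```

The first two lines of the output correspond to the quality of F1 and of the pair (F1; F2): roughly 73.5 % and 87.7 %.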
SPSS : eigenvectors
F2 = −.149ZEngsize
−.413ZPower
−.397ZSpeed
+.675ZWeight
−.130ZWidth
+.591ZLength
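The condition Cor(F1; F2) = 0 has a visible trace in the printed coefficients. Assuming the two coefficient vectors are proportional to eigenvectors of the matrix of correlations (an assumption about the SPSS output), they should be orthogonal, because eigenvectors of a symmetric matrix with distinct eigenvalues are orthogonal. The sketch checks this with a dot product.

```python
# Order of variables: Engsize, Power, Speed, Weight, Width, Length.
coef_f1 = [.218, .209, .201, .172, .182, .180]     # F1 coefficients from the slides
coef_f2 = [-.149, -.413, -.397, .675, -.130, .591]  # F2 coefficients from the slides

# Dot product of the two coefficient vectors; close to 0 if they are orthogonal.
dot = sum(a * b for a, b in zip(coef_f1, coef_f2))
print(f"{dot:.4f}")  # close to zero, up to the rounding of the coefficients
```

The signs also show the structure of F2: Weight and Length pull it up, while Power, Speed, Engsize and Width pull it down.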
« Which variables make F2 for a car as negative as possible? As positive as possible? »
Tina’s question!
Interpret F2 as a new variable
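F2 axis: Sport cars (−) ← 0 → Family cars (+)

Variable   Sport cars (F2 < 0)                       Family cars (F2 > 0)
Weight     ZWeight < 0 (Weight below the mean)       ZWeight > 0 (Weight above the mean)
Length     ZLength < 0 (Length below the mean)       ZLength > 0 (Length above the mean)
Power      ZPower > 0 (Power above the mean)         ZPower < 0 (Power below the mean)
Speed      ZSpeed > 0 (Speed above the mean)         ZSpeed < 0 (Speed below the mean)
Engsize    ZEngsize > 0 (Engsize above the mean)     ZEngsize < 0 (Engsize below the mean)
Width      ZWidth > 0 (Width above the mean)         ZWidth < 0 (Width below the mean)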
Model Factor 2
1 Land Rover Discovery Td5 2.035
2 Jaguar S-Type 2.7 V6 Bi-Turbo .951
3 Mercedes Classe S 400 CDI .858
4 Land Rover Defender Td5 .796
5 Nissan X-Trail 2.2 dCi .765
6 Volkswagen Touran 1.9 TDI 105 .755
7 BMW 745i .646
8 Peugeot 407 3.0 V6 BVA .554
9 Mercedes Classe C 270 CDI .510
10 BMW 530d .488
11 Renault Scenic 1.9 dCi 120 .403
12 Peugeot 307 1.4 HDI 70 .318
13 Audi A3 1.9 TDI .179
14 Bentley Continental GT .068
15 Citroën C3 Pluriel 1.6i -.231
16 Nissan Micra 1.2 65 -.428
17 Audi TT 1.8T 180 -.487
18 Citroën C2 1.1 Base -.540
19 BMW Z4 2.5i -.632
20 Aston Martin Vanquish -.678
21 Mini 1.6 170 -.864
22 Renault Clio 3.0 V6 -.970
23 Smart Fortwo Coupé -1.765
24 Ferrari Enzo -2.734
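Reading the table amounts to sorting the cars by their F2 score and looking at the sign. The sketch below does this for a handful of rows copied from the table. The function name `side_of_f2` is illustrative, and attaching a label purely by sign is a simplification of the interpretation.

```python
# A few F2 scores copied from the table (not the full list of 24 cars).
f2_scores = {
    "Land Rover Discovery Td5": 2.035,
    "Volkswagen Touran 1.9 TDI 105": .755,
    "Bentley Continental GT": .068,
    "Audi TT 1.8T 180": -.487,
    "Smart Fortwo Coupé": -1.765,
    "Ferrari Enzo": -2.734,
}

def side_of_f2(score):
    """Name of the side of the F2 axis: positive side or negative side."""
    return "Family cars" if score > 0 else "Sport cars"

# Rank from the most positive to the most negative score.
ranked = sorted(f2_scores.items(), key=lambda item: item[1], reverse=True)
labels = {model: side_of_f2(score) for model, score in f2_scores.items()}

for model, score in ranked:
    print(f"{score:+.3f}  {labels[model]:<11}  {model}")
```

The Smart Fortwo lands on the "Sport cars" side because it is light and short, which shows that the names given to the two sides summarize the variables behind the axis and do not classify each car.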
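Factorial map
[Figure: factorial map of the 24 cars. Horizontal axis: Factor 1, graduated from −3 to 3. Vertical axis: Factor 2 (14 %), from Sport cars (−) to Family cars (+). The Ferrari Enzo point is labelled.]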
« You cannot teach a man anything.
You can only help him discover what he has within himself »