
Data Common Structure Groups Study Partial Analyses To go further Example

Multiple Factor Analysis

Julie Josse

Applied Mathematics Department, Agrocampus Ouest

Stat 300
Stanford, July 2015


Multiple Factor Analysis

1 Data - Issues
2 Common Structure
3 Groups Study
4 Partial Analyses
5 Example


Multi-blocks data set


• Sensory analysis: products - sensorial, physico-chemical
• Survey: individuals - questionnaire themes (students' health: addictive consumption, psychological conditions, sleep, id)
• Economy: countries - economic indicators each year
• Biology: samples - -omics data (brain tumors: CGH, transcriptome; mouse: transcriptome, hepatic fatty acid measurements)

⇒ Generalized Canonical Correlation, Procrustes, Statis, etc.

⇒ MFA (Escofier & Pagès, 1998)

⇒ Continuous / categorical / contingency sets of variables

Groups of variables (MFA)

Groups of variables are quantitative and/or qualitative

Objectives:
- study the link between the sets of variables
- balance the influence of each group of variables
- give the classical graphs but also specific graphs: groups of variables - partial representation

Examples:
- Genomic: DNA, protein
- Sensory analysis: sensorial, physico-chemical
- Comparison of coding (quantitative / qualitative)

The data

Example: gliomas (brain tumors), WHO classification (Bredel et al., 2005)

43 tumor samples:
- astrocytoma (A): x5
- oligodendroglioma (O): x8
- oligo-astrocytoma (OA): x6
- glioblastoma (GBM): x24

• Transcriptional modification (RNA), microarrays: 489 variables
• Damage to DNA, CGH arrays: 113 variables

Merged data tables ('-omics' data): tumors in rows; transcriptome variables (1, ..., j1, ..., J1) and genome alteration variables (1, ..., j2, ..., J2) in columns


Objectives

• Study the similarities between individuals with respect to all the variables
• Study the linear relationships between variables

⇒ taking into account the structure of the data (balance the influence of each group)

• Find the common structure with respect to all the groups - highlight the specificities of each group
• Compare the typologies obtained from each group of variables (separate analyses)


Principal component methods

The core of principal component methods is PCA on particular matrices

"Doing a data analysis, in good mathematics, is simply searching eigenvectors, all the science of it (the art) is just to find the right matrix to diagonalize"
Benzécri

MFA is a particular weighted PCA!


Balancing the groups of variables


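MFA is a weighted PCA:
• compute the first eigenvalue $\lambda_1^j$ of each group of variables $j$
• perform a global PCA on the weighted data table:

$$\left[ \frac{X_1}{\sqrt{\lambda_1^1}} \; ; \; \frac{X_2}{\sqrt{\lambda_1^2}} \; ; \; \dots \; ; \; \frac{X_J}{\sqrt{\lambda_1^J}} \right]$$

⇒ Same idea as in PCA when variables are standardized: variables are weighted to compute distances between individuals $i$ and $i'$

[Figure: individuals i and i′ described by one group of 8 variables and one group of 2 variables; annotation: highly correlated]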


Balancing the groups of variables

Transcriptome Genome
λ1 162 12
λ2 35 10
λ3 21 5

This weighting ensures that:
• All the variables of one group receive the same weight: the structure of the group is preserved
• For each group, the variance of the main dimension of variability (first eigenvalue) is equal to 1
• No group can generate the first global dimension by itself
• A multidimensional group will contribute to the construction of more dimensions than a one-dimensional group
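
The weighting can be sketched in base R as follows. This is a minimal illustration on simulated data, not the FactoMineR implementation: the data, the group sizes (8 and 2 variables) and the helper names `std` and `first_eig` are assumptions made for the example.

```r
# Minimal sketch of MFA as a weighted PCA (base R only, simulated data).
set.seed(1)
I <- 30
X1 <- matrix(rnorm(I * 8), I, 8)   # group 1: 8 variables
X2 <- matrix(rnorm(I * 2), I, 2)   # group 2: 2 variables

# Centre and scale each variable with the population sd (divisor I)
std <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))
  sweep(Xc, 2, sqrt(colSums(Xc^2) / nrow(X)), "/")
}

# First eigenvalue of the separate PCA of one group
first_eig <- function(Z) (svd(Z)$d[1])^2 / nrow(Z)

Z1 <- std(X1)
Z2 <- std(X2)
lambda1 <- c(first_eig(Z1), first_eig(Z2))

# Weighted table: each group divided by the square root of its first eigenvalue
Zw <- cbind(Z1 / sqrt(lambda1[1]), Z2 / sqrt(lambda1[2]))

# Global PCA of the weighted table
sv <- svd(Zw)
eig_global <- sv$d^2 / I            # eigenvalues of the global analysis
coord <- sv$u %*% diag(sv$d)        # coordinates of the individuals

# After weighting, the first eigenvalue of each group equals 1
weighted_first <- c(first_eig(Z1 / sqrt(lambda1[1])),
                    first_eig(Z2 / sqrt(lambda1[2])))
```

With two groups, the first global eigenvalue `eig_global[1]` lies between 1 (no common structure) and 2 (the first dimension is shared by both groups), which is why no group can generate the first dimension alone.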


Individuals and variables representations

[Figure: individual factor map (left) and correlation circle of the variables (right); axes Dim 1 (20.99 %) and Dim 2 (13.51 %)]

⇒ What can be added to interpret?


Individuals and variables representations

[Figure: individual factor map with the tumor types (A, GBM, O, OA) displayed as categories; axes Dim 1 (20.99 %) and Dim 2 (13.51 %)]

Figure 4: Multi-way glioma data set: Characteristics of oligodendrogliomas are linked to modifica
the genomic status of genes located on 1p and 19q positions.

⇒ Do the means of the tumor coordinates per stage on each dimension differ significantly from each other?


Groups study

⇒ Synthetic comparison of the groups

⇒ Are the relative positions of individuals globally similar from one group to another? Are the partial clouds similar?

⇒ Do the groups bring the same information?


Principal component in MFA

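MFA = weighted PCA ⇒ the first principal component of MFA maximizes

$$\sum_{j=1}^{J} \sum_{k \in K_j} \operatorname{cov}^2\!\left( \frac{x_{.k}}{\sqrt{\lambda_1^j}}, F_1 \right) = \sum_{j=1}^{J} L_g(F_1, K_j)$$

$$L_g(F_1, K_j) = \left\langle \frac{W_{K_j}}{\lambda_1^j}, F_1 F_1' \right\rangle = \frac{1}{\lambda_1^j} \operatorname{trace}\!\left( W_{K_j}' F_1 F_1' \right)$$

⇒ $F_1$ is the variable most related to the groups in the $L_g$ sense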


Representation of the groups


Group j has the coordinates $(L_g(F_1, K_j), L_g(F_2, K_j))$

[Figure: groups representation with CGH, expr and WHO; axes Dim 1 (20.99 %) and Dim 2 (13.51 %), both from 0 to 1]

• Two groups are all the closer as they induce the same structure
• The 1st dimension is common to all the groups
• The 2nd dimension is mainly due to CGH

$$0 \le L_g(F_1, K_j) = \frac{1}{\lambda_1^j} \underbrace{\sum_{k \in K_j} \operatorname{cov}^2(x_{.k}, F_1)}_{\le \lambda_1^j} \le 1$$

⇒ Could you predict the results of the PCA for each group?

The RV coefficient
$X_j$ ($I \times K_j$) and $X_m$ ($I \times K_m$) are not directly comparable
$W_j = X_j X_j'$ and $W_m = X_m X_m'$ (both $I \times I$) can be compared
Inner product matrices = relative position of the individuals
Covariance between two groups:

$$\langle W_j, W_m \rangle = \sum_{k \in K_j} \sum_{l \in K_m} \operatorname{cov}^2(x_{.k}, x_{.l})$$

Correlation between two groups (Escoufier, 1973):

$$RV(K_j, K_m) = \frac{\langle W_j, W_m \rangle}{\lVert W_j \rVert \, \lVert W_m \rVert}, \qquad 0 \le RV \le 1$$

RV = 0: variables of $K_j$ are uncorrelated with variables of $K_m$
RV = 1: the two clouds of points are homothetic
⇒ Extension of the notion of correlation matrix
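
A minimal base-R sketch of the RV coefficient may help fix ideas. The helper name `rv_coef` and the simulated data are assumptions made for this example; it checks the homothetic case (a rotation and rescaling of the same cloud gives RV = 1).

```r
# Sketch of the RV coefficient between two groups of variables.
# rv_coef is a hypothetical helper name, not a function of an existing package.
rv_coef <- function(X, Y) {
  X <- scale(X, center = TRUE, scale = FALSE)   # centre the columns
  Y <- scale(Y, center = TRUE, scale = FALSE)
  Wx <- tcrossprod(X)                           # I x I inner-product matrix X X'
  Wy <- tcrossprod(Y)
  sum(Wx * Wy) / sqrt(sum(Wx^2) * sum(Wy^2))    # <Wx, Wy> / (||Wx|| ||Wy||)
}

set.seed(1)
X <- matrix(rnorm(20 * 3), 20, 3)

# Homothetic clouds: Y is a rotated and rescaled copy of X
theta <- pi / 6
R <- diag(3)
R[1:2, 1:2] <- c(cos(theta), sin(theta), -sin(theta), cos(theta))
rv_same <- rv_coef(X, 2.5 * X %*% R)

# Two unrelated groups give a value between 0 and 1
rv_indep <- rv_coef(X, matrix(rnorm(20 * 4), 20, 4))
```

The normalisation factors of the covariances cancel between numerator and denominator, so the sketch works directly with the unnormalised inner-product matrices.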

Similarity between two groups

Measure of similarity between groups $K_j$ and $K_m$:

$$L_g(K_j, K_m) = \sum_{k \in K_j} \sum_{l \in K_m} \operatorname{cov}^2\!\left( \frac{x_{.k}}{\sqrt{\lambda_1^j}}, \frac{x_{.l}}{\sqrt{\lambda_1^m}} \right)$$

Ramsay (1984): "Matrices may be similar or dissimilar in many ways"

Canonical correlation (Hotelling, 1936), Mantel (1967), Procrustes (Gower, 1971), dCov (Szekely et al., 2007), kernel-based HSIC (Gretton et al., 2005), etc.


Numeric indicators

> res.mfa$group$Lg
      CGH  expr   WHO
CGH  2.51  0.60  0.46
expr 0.60  1.10  0.36
WHO  0.46  0.36  0.50

$$L_g(K_j, K_j) = \frac{\sum_{k=1}^{K_j} (\lambda_k^j)^2}{(\lambda_1^j)^2} = 1 + \frac{\sum_{k=2}^{K_j} (\lambda_k^j)^2}{(\lambda_1^j)^2}$$

> res.mfa$group$RV
      CGH  expr   WHO
CGH  1.00  0.36  0.41
expr 0.36  1.00  0.48
WHO  0.41  0.48  1.00

• CGH gives a richer description (larger Lg)
• RV: a standardized Lg
• CGH and expr are only weakly linked (RV = 0.36)

Contribution of each group to each component of the MFA

> res.mfa$group$contrib
     Dim.1 Dim.2 Dim.3
CGH   45.8  93.3  78.1
expr  54.2   6.7  21.9

• Similar contribution of the 2 groups to the first dimension
• Second dimension almost entirely due to CGH
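
That RV is a standardized Lg can be checked on the numbers shown on this slide. The short sketch below only reuses the printed Lg matrix (rounded to two decimals), so the result agrees with the printed RV matrix up to that rounding.

```r
# Sketch: recover the RV matrix from the Lg matrix printed on the slide.
# The values are copied from the slide (rounded to 2 decimals).
groups <- c("CGH", "expr", "WHO")
Lg <- matrix(c(2.51, 0.60, 0.46,
               0.60, 1.10, 0.36,
               0.46, 0.36, 0.50),
             nrow = 3, byrow = TRUE, dimnames = list(groups, groups))

# RV(Kj, Km) = Lg(Kj, Km) / sqrt(Lg(Kj, Kj) * Lg(Km, Km))
RV <- Lg / sqrt(outer(diag(Lg), diag(Lg)))
round(RV, 2)
```

This is the same operation as turning a covariance matrix into a correlation matrix.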


Partial analyses

⇒ Comparison of the groups through the individuals

⇒ Comparison of the typologies provided by each group in a common space

⇒ Are there individuals very particular with respect to one group?

⇒ Comparison of the separate PCA


Projection of partial points


[Figure: projection of partial points. The data table (individuals × groups G1, G2, G3) gives the MFA configuration of the individuals in $R^K = \oplus R^{K_j}$. For each group j, the table in which the columns of the other groups are set to 0 is projected, giving the partial point $i^j$ of individual i (partial points 1, 2 and 3); the mean point i is the average of its partial points]

Partial points
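[Figure: individuals described by two groups of variables, opinion ("What you think") and attitude ("What you do"); on the plane (F1, F2), distant partial points of individual i indicate a behavioral conflict]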

Partial points

[Figure: tutorial participants described by two groups of variables, "What you expected for the tutorial" and "What you have learned during the tutorial"; on the plane (F1, F2), the relative position of the two partial points distinguishes a happy learner from a disappointed learner]


Representation of the partial points


[Figure: two individual factor maps with the partial points of the CGH and expr groups; axes Dim 1 (20.99 %) and Dim 2 (13.51 %)]

• an individual is at the barycentre of its partial points
• an individual is all the more "homogeneous" as its superposed partial representations are close to each other

Identify particular individuals

[Figure: individual factor map with partial points for the CGH and expr groups; axes Dim 1 (20.99 %) and Dim 2 (13.51 %)]


Numeric indicators

$$\sum_{i=1}^{I} \sum_{j=1}^{J} (F_{i^j q})^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} (F_{iq})^2 + \sum_{i=1}^{I} \sum_{j=1}^{J} (F_{i^j q} - F_{iq})^2$$

Total inertia = Between indiv. inertia + Within indiv. inertia

> res.mfa$inertia.ratio
Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
 0.84  0.56  0.44  0.59  0.43

• For the first dimension, the partial points of each individual have close coordinates (ratio of 0.84, close to 1)
• The within inertia can be decomposed by individuals: res.mfa$ind$within.inertia
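
The decomposition can be reproduced with a small base-R sketch. It is an illustration under assumptions, not the FactoMineR code: the data are simulated, the helper name `weight_group` is hypothetical, and the partial point of group j is taken as J times the projection of the table in which the other groups are set to 0, so that each individual is the barycentre of its partial points. The ratio computed at the end is between inertia divided by total inertia.

```r
# Sketch: partial points, barycentre property and inertia ratio (simulated data).
set.seed(2)
I <- 25
groups <- list(matrix(rnorm(I * 4), I, 4),
               matrix(rnorm(I * 3), I, 3),
               matrix(rnorm(I * 5), I, 5))
J <- length(groups)

# Standardize each variable, then weight the group by 1 / sqrt(first eigenvalue)
weight_group <- function(X) {
  Z <- scale(X) * sqrt(nrow(X) / (nrow(X) - 1))   # population standard deviation
  Z / svd(Z)$d[1] * sqrt(nrow(X))                 # first eigenvalue becomes 1
}

Zw <- lapply(groups, weight_group)
Zall <- do.call(cbind, Zw)
V <- svd(Zall)$v                        # loadings of the global PCA
coord <- Zall %*% V                     # global coordinates F_iq

# Partial point of group j: keep only the columns of group j, rescale by J
ends <- cumsum(sapply(Zw, ncol))
starts <- ends - sapply(Zw, ncol) + 1
partial <- lapply(seq_len(J), function(j) {
  cols <- starts[j]:ends[j]
  J * Zw[[j]] %*% V[cols, , drop = FALSE]
})

# Each individual is the barycentre of its partial points
barycentre <- Reduce(`+`, partial) / J

# Inertia decomposition per dimension: total = between + within
total   <- Reduce(`+`, lapply(partial, function(P) colSums(P^2)))
between <- J * colSums(coord^2)
within  <- Reduce(`+`, lapply(partial, function(P) colSums((P - coord)^2)))
inertia_ratio <- between / total
```

A ratio close to 1 on a dimension means that, on that dimension, the partial points of each individual are close to their mean point.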


Representation of the partial components


Do the separate analyses give dimensions similar to those of MFA?

[Figure: schematic of the separate PCA of each group, each yielding components 1, ..., q, ..., Q]


Representation of the partial components


[Figure: partial axes; the first three dimensions of the separate analyses of CGH, expr and WHO (Dim1.CGH, ..., Dim3.WHO) projected on the MFA plane; axes Dim 1 (20.99 %) and Dim 2 (13.51 %)]

• The first dimension of each group is well projected
• CGH has the same dimensions as MFA


Use of biological knowledge


Genes can be grouped by gene ontology (GO) biological process

GO:0006928, cell motility: ANXA1, CALD1, EGFR, ENPP2, FN1, FPRL2, LSP1, MSN, PDPN, PLAUR, PRSS3, SAA2, SPINT2, TNFRSF12A, VEGF, WASF1, YARS

GO:0009966, regulation of signal transduction: CASP1, EDG2, F2R, HCLS1, HMOX1, IGFBP3, IQSEC1, LYN, MALT1, TCF7L1, TNFAIP3, TRIO, VEGF, YWHAG, YWHAH

GO:0052276, chromosome organisation and biogenesis: CBX6, NUSAP1, PCOLN3, PTTG1, SUV39H1, TCF7L1, TSPYL1


Use of biological knowledge

• Biological processes considered as supplementary groups of variables

[Figure: modular approach; the '-omics' data table (tumors in rows) is extended with modules of genes M1, M2, M3, ...]

⇒ Integration of the modules as groups of supplementary variables


Use of biological knowledge


[Figure: groups representation with CGH, expr, WHO and the supplementary biological-process groups; axes Dim 1 (20.99 %) and Dim 2 (13.51 %)]

Many biological processes induce the same structure on the individuals as MFA


To go further

• Mixed data: MFA with 1 group = 1 variable
continuous variables: PCA is recovered; categorical variables: MCA is recovered; mixed: FAMD
• MFA used for methodological purposes:
• comparison of coding (continuous or categorical)
• comparison between preprocessing (standardized PCA and
unstandardized PCA)
• comparison of results from different analyses
• Hierarchical Multiple Factor Analysis:
Takes into account a hierarchy on the variables: variables are
grouped and subgrouped (like in questionnaires structured in
topics and subtopics)


Clustering: MFA as a preprocessing


[Figure: data table split into two groups X1 and X2, with individuals i and i′]

MFA balances the influence of the groups when computing distances between individuals

$$d^2(i, i') = \sum_{j=1}^{J} \sum_{k=1}^{K_j} \left( \frac{x_{ik} - x_{i'k}}{\sqrt{\lambda_1^j}} \right)^2$$

AHC or k-means on the first principal components $(F_{.1}, \dots, F_{.Q})$ obtained from MFA makes it possible to:
• take into account the group structure in the clustering
• make the clustering more robust by discarding the last dimensions
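
As a rough sketch of this preprocessing idea, the base-R code below runs a Ward AHC on the first Q components of a weighted PCA. The toy data (two groups sharing a two-cluster structure), the helper name `weight_group` and the choice Q = 5 are assumptions made for the example.

```r
# Sketch: clustering on the first Q components of a weighted PCA (toy data).
set.seed(3)
I <- 40
cl_true <- rep(1:2, each = I / 2)                 # assumed true clusters
X1 <- matrix(rnorm(I * 6), I, 6) + 3 * (cl_true == 2)
X2 <- matrix(rnorm(I * 3), I, 3) + 3 * (cl_true == 2)

# Standardize each variable, then weight the group by 1 / sqrt(first eigenvalue)
weight_group <- function(X) {
  Z <- scale(X) * sqrt(nrow(X) / (nrow(X) - 1))
  Z / svd(Z)$d[1] * sqrt(nrow(X))
}

Zw <- cbind(weight_group(X1), weight_group(X2))
sv <- svd(Zw)
Q <- 5
coord <- sv$u[, 1:Q] %*% diag(sv$d[1:Q])          # first Q components

# Ward AHC on the retained components, then cut the tree
tree <- hclust(dist(coord), method = "ward.D2")
clusters <- cutree(tree, k = 2)
table(clusters, cl_true)
```

Keeping all the components instead of the first Q would reproduce exactly the weighted distances of the formula; truncating to Q components drops the last dimensions before clustering.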

Clustering
AHC on the first 5 principal components from MFA

[Figure: hierarchical clustering; cluster dendrogram of the 43 tumors with the inertia gain bar plot]

Individuals are sorted according to their coordinate on F.1


Partition from the tree

[Figure: hierarchical clustering; cluster dendrogram of the 43 tumors with the inertia gain bar plot]

An empirical number of clusters is suggested


Partition on the principal component map


[Figure: factor map of the individuals coloured by cluster (cluster 1, cluster 2, cluster 3); axes Dim 1 (20.99 %) and Dim 2 (13.51 %)]

Continuous view (principal components) and discontinuous view (clusters)

Cluster description by variables

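$$\text{v.test} = \frac{\bar{x}_\ell - \bar{x}}{\sqrt{\dfrac{s^2}{I_\ell} \left( \dfrac{I - I_\ell}{I - 1} \right)}} \qquad H_0: \text{random sampling of } I_\ell \text{ values from } I$$

with $\bar{x}_\ell$ the mean of variable $x$ in cluster $\ell$, $\bar{x}$ ($s$) the mean (standard deviation) of the variable $x$ in the data set, and $I_\ell$ the cardinality of cluster $\ell$

$desc.var$quanti$`1`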
v.test Mean in category Overall mean sd in category Overall sd p.value
TMEM49 4.488 -0.430 -1.424 0.722 1.277 0.000
TNFRSF12A 4.433 -0.794 -1.838 0.789 1.357 0.000
LGALS3 4.369 -0.222 -1.216 0.861 1.312 0.000
S100A11 4.300 -0.737 -1.500 0.525 1.024 0.000
BGN 4.273 2.105 1.106 0.697 1.348 0.000
IFI30 4.264 0.987 0.026 0.979 1.300 0.000
....
....
C9orf48 -4.411 -0.686 -0.037 0.540 0.848 0.000
PSD3 -4.594 -1.684 -1.024 0.419 0.829 0.000
AA398420 -4.635 0.324 1.134 0.635 1.007 0.000
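
The v.test formula can be checked on the TMEM49 row of this table. In the sketch below the helper name `v_test` is hypothetical, and the cluster size I_l = 19 is an assumption inferred from the Cla/Mod and Mod/Cla percentages reported for cluster 1 (18 of the 24 GBM tumors, i.e. 94.7 % of the cluster); I = 43 is the number of tumors.

```r
# Sketch of the v.test, using the TMEM49 row of the table.
# I_l = 19 is an assumption inferred from the percentages reported for cluster 1.
v_test <- function(mean_cluster, mean_all, sd_all, I_l, I) {
  (mean_cluster - mean_all) / sqrt(sd_all^2 / I_l * (I - I_l) / (I - 1))
}

v_tmem49 <- v_test(mean_cluster = -0.430, mean_all = -1.424,
                   sd_all = 1.277, I_l = 19, I = 43)
p_value <- 2 * pnorm(-abs(v_tmem49))   # two-sided p-value under normality
```

The value obtained is close to the 4.488 printed in the table, and the p-value is well below 0.001, in line with the displayed 0.000.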


Cluster description by observations


• paragon (parangon): the observations closest to the centroid of their cluster

$$\min_{i \in \ell} d(x_{i.}, C_\ell) \quad \text{with } C_\ell \text{ the centroid of cluster } \ell$$

• specific observations: the observations furthest from the centroids of the other clusters (the observations are sorted by decreasing distance to the closest centroid of another cluster)

$$\max_{i \in \ell} \min_{\ell' \neq \ell} d(x_{i.}, C_{\ell'})$$

desc.ind$para
cluster: 1
GBM11 GBM28 GBM5 GBM25 GBM31
0.6649847 0.7001998 0.7973604 0.8869271 0.9674042
---------------------------------------------------------------
desc.ind$dist
cluster: 1
GBM30 GS2 GBM21 GBM22 GBM27
3.227968 3.096048 3.031256 2.904327 2.778950
---------------------------------------------------------------
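
The two criteria can be sketched in base R on simulated coordinates. The data, the cluster labels and the helper name `describe_cluster` are assumptions made for the example; the output names `para` and `dist` simply mirror the ones shown above.

```r
# Sketch: paragons and specific observations of a cluster (simulated data).
set.seed(4)
coord <- rbind(matrix(rnorm(20, mean = 0), 10, 2),
               matrix(rnorm(20, mean = 4), 10, 2),
               matrix(rnorm(20, mean = 8), 10, 2))
rownames(coord) <- paste0("ind", seq_len(nrow(coord)))
clusters <- rep(1:3, each = 10)

# Centroid of each cluster (one row per cluster)
centroids <- apply(coord, 2, function(col) tapply(col, clusters, mean))

# Distance from every observation to every centroid (I x number of clusters)
d <- sapply(seq_len(nrow(centroids)), function(l)
  sqrt(rowSums(sweep(coord, 2, centroids[l, ])^2)))

describe_cluster <- function(l, n = 3) {
  members <- which(clusters == l)
  own   <- d[members, l]                                  # distance to own centroid
  other <- apply(d[members, -l, drop = FALSE], 1, min)    # closest other centroid
  list(para = sort(own)[seq_len(n)],                      # paragons
       dist = sort(other, decreasing = TRUE)[seq_len(n)]) # specific observations
}
res1 <- describe_cluster(1)
```

Paragons are typical members of a cluster, whereas specific observations are the ones that differ most from every other cluster.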

Cluster description

• by the principal components (coordinates of the observations): same description as for continuous variables
$desc.axes$quanti$`1`
       v.test  Mean in category  Overall mean  sd in category  Overall sd  p.value
Dim.2   2.919             0.511             0           0.465       1.010    0.004
Dim.1  -4.458            -0.974             0           0.560       1.259    0.000

• by categorical variables: chi-square and hypergeometric tests
$test.chi2
          p.value  df
type 8.433474e-06   6

⇒ Active and supplementary elements are used
⇒ Only significant results are presented


Cluster description

$`1`
          Cla/Mod   Mod/Cla    Global       p.value     v.test
type=GBM       75  94.73684  55.81395  3.300966e-06   4.651145
type=OA         0   0.00000  13.95349  2.207775e-02  -2.289028
type=O          0   0.00000  18.60465  5.071916e-03  -2.802430

$`2`
        Cla/Mod  Mod/Cla    Global    p.value    v.test
type=A       60     37.5  11.62791  0.0398214  2.055597

$`3`
          Cla/Mod  Mod/Cla    Global       p.value     v.test
type=O      100.0    50.00  18.60465  8.875341e-05   3.919444
type=GBM     12.5    18.75  55.81395  2.319983e-04  -3.681354


Complementarity between hierarchical clustering and partitioning

• Partitioning after AHC: the k-means algorithm is initialized from the centroids of the partition obtained from the tree
  • consolidates the partition
  • loses the hierarchy

• AHC with many individuals: time-consuming
⇒ partitioning before AHC
  • compute k-means with approximately 100 clusters
  • AHC on the weighted centroids obtained from the k-means
⇒ the top of the tree is approximately the same
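
Both combinations can be sketched with base R functions (`kmeans`, `hclust`, `cutree`). The simulated data and the choice of 50 preliminary clusters (rather than about 100, to keep the toy example small) are assumptions made for the example.

```r
# Sketch: k-means before AHC, then k-means consolidation after AHC (toy data).
set.seed(5)
X <- rbind(matrix(rnorm(600, mean = 0), 300, 2),
           matrix(rnorm(600, mean = 6), 300, 2))

# 1) Partitioning before AHC: k-means with many small clusters
km <- kmeans(X, centers = 50, nstart = 5, iter.max = 50)

# 2) Ward AHC on the centroids, weighted by the cluster sizes
tree <- hclust(dist(km$centers), method = "ward.D2", members = km$size)
top <- cutree(tree, k = 2)              # partition of the 50 centroids
partition <- top[km$cluster]            # carried back to the 600 individuals

# 3) Partitioning after AHC: k-means started from the centroids of the partition
init <- apply(X, 2, function(col) tapply(col, partition, mean))
consolidated <- kmeans(X, centers = init)
table(consolidated$cluster)
```

Step 2 only sees 50 weighted points instead of 600 individuals, which is what makes the tree affordable on large data sets; step 3 trades the hierarchy for a partition with a slightly lower within-cluster inertia.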


Other methods: ade4


Two-table analysis: available methods

[Figure: two tables of variables describing the same individuals]

Function name   Analysis name
between         Between-class analysis
within          Within-class analysis
discrimin       Discriminant analysis
coinertia       Coinertia analysis
cca             Canonical correspondence analysis
pcaiv           PCA on Instrumental Variables
pcaivortho      Orthogonal PCAIV
procuste        Procrustes analysis
niche           Niche (OMI) analysis

Source: Stéphane Dray (Univ. Lyon 1), CARME 2011, Rennes


Other methods: ade4


K-table: other functions

[Figure: K-table, a sequence of individuals × variables tables]

Function name   Analysis name
sepan           K-table separate analyses
pta             Partial triadic analysis
foucart         Foucart analysis
statis          STATIS analysis
mfa             Multiple factor analysis
mcoa            Multiple coinertia analysis
statico         2 K-table analysis

Source: Stéphane Dray (Univ. Lyon 1), CARME 2011, Rennes


Other methods

Predict one block with others:

• Multi-block PLS regression
• Multi-block PCA on instrumental variables, ...


RV Tests

Is there any (linear) relationship between the 2 sets? $H_0: \rho_V = 0$

Asymptotic tests: normal or elliptical distributions, ranks (Robert et al., 1985; Cléroux & Ducharme, 1989; Cléroux, 1995)

$$n \, RV \sim \sum_i \lambda_i Z_i^2$$

⇒ sensitive to departures from the assumed distribution and to n

Permutation tests:
permute the rows of one matrix - compute the RV for the n! permutations
p-value: proportion of the values greater than the observed one
⇒ computationally costly (an "old-fashioned" argument?)

Approximation of the permutation distribution
• sampling from the permutations - package ade4 (RV.rtest)
• moment matching: Pearson family, Edgeworth expansion
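
The "sampling from the permutations" option can be sketched in base R as a Monte Carlo permutation test. This is an illustration and not the ade4 implementation: the helper name `rv_coef`, the simulated data and the number of permutations B = 999 are assumptions.

```r
# Sketch: Monte Carlo permutation test of the RV coefficient (simulated data).
rv_coef <- function(X, Y) {
  X <- scale(X, center = TRUE, scale = FALSE)
  Y <- scale(Y, center = TRUE, scale = FALSE)
  Wx <- tcrossprod(X)
  Wy <- tcrossprod(Y)
  sum(Wx * Wy) / sqrt(sum(Wx^2) * sum(Wy^2))
}

set.seed(6)
n <- 30
X <- matrix(rnorm(n * 4), n, 4)
Y <- X[, 1:2] + matrix(rnorm(n * 2, sd = 0.3), n, 2)   # Y linearly related to X

rv_obs <- rv_coef(X, Y)

# Permute the rows of one matrix and recompute the RV each time
B <- 999
rv_perm <- replicate(B, rv_coef(X, Y[sample(n), , drop = FALSE]))

# Proportion of permuted values at least as large as the observed one
p_value <- (1 + sum(rv_perm >= rv_obs)) / (B + 1)
```

Adding 1 to the numerator and denominator counts the observed configuration as one of the permutations, so the estimated p-value can never be exactly 0.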


Moments matching

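The first three moments under H0 (Kazi-Aoual et al., 1995)

$$E_{H_0}(RV) = \frac{\sqrt{\beta_x \times \beta_y}}{n - 1}, \qquad \beta_x = \frac{(\operatorname{tr}(X'X))^2}{\operatorname{tr}((X'X)^2)} = \frac{(\sum_i \lambda_i)^2}{\sum_i \lambda_i^2}$$

$\beta_x$: a measure of complexity, $1 \le \beta_x \le p$
The RV tends to be large under H0 when n is small and each group has many orthogonal variables

⇒ Normal approximation:

$$RV_{std} = \frac{RV - E_{H_0}(RV)}{\sqrt{V_{H_0}(RV)}}$$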
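
The expectation formula can be checked numerically on a tiny example by enumerating all n! permutations. The sketch below is only a sanity check under assumptions: the data are simulated with n = 5 (120 permutations), and the helper names `perms`, `rv` and `beta` are hypothetical.

```r
# Sketch: compare E_H0(RV) = sqrt(beta_x * beta_y) / (n - 1) with the exact
# mean of the RV over all n! row permutations (tiny simulated example).
perms <- function(v) {                      # all permutations of v, one per row
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) cbind(v[i], perms(v[-i]))))
}

rv <- function(X, Y) {                      # X and Y are assumed column-centred
  Wx <- tcrossprod(X)
  Wy <- tcrossprod(Y)
  sum(Wx * Wy) / sqrt(sum(Wx^2) * sum(Wy^2))
}

beta <- function(X) {                       # (sum lambda_i)^2 / sum lambda_i^2
  lambda <- eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values
  sum(lambda)^2 / sum(lambda^2)
}

set.seed(7)
n <- 5
X <- scale(matrix(rnorm(n * 3), n, 3), scale = FALSE)
Y <- scale(matrix(rnorm(n * 2), n, 2), scale = FALSE)

expected_rv <- sqrt(beta(X) * beta(Y)) / (n - 1)
P <- perms(seq_len(n))                      # 120 x 5 matrix of row orders
rv_all <- apply(P, 1, function(idx) rv(X, Y[idx, , drop = FALSE]))
exact_mean <- mean(rv_all)
```

With n this small the expected RV under H0 is far from 0, which illustrates why a raw RV value should not be interpreted without reference to n and to the dimensionality of the groups.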


Moments matching

Problem: the exact distribution of the RVstd is often skewed

[Figure: histogram of the standardized RV with the Normal, Gamma and Edgeworth density approximations]

⇒ Pearson type III density (skewness $\gamma$):

$$f(x) = \frac{(2/\gamma)^{4/\gamma^2}}{\Gamma(4/\gamma^2)} \left( \frac{2 + \gamma x}{\gamma} \right)^{(4 - \gamma^2)/\gamma^2} e^{-2(2 + x\gamma)/\gamma^2}$$

⇒ package FactoMineR (coeffRV) (Josse et al., 2008)



Back to the wine example!

• 10 white wines from Val de Loire (5 Vouvray - 5 Sauvignon)
• 27 continuous variables: sensory descriptors

Columns: O.fruity, O.passion, O.citrus, …, Sweetness, Acidity, Bitterness, Astringency, Aroma.intensity, Aroma.persistency, Visual.intensity, Odor.preference, Overall.preference, Label

S Michaud         4.3 2.4 5.7 … 3.5 5.9 4.1 1.4 7.1 6.7 5.0 6.0 5.0 Sauvignon
S Renaudie        4.4 3.1 5.3 … 3.3 6.8 3.8 2.3 7.2 6.6 3.4 5.4 5.5 Sauvignon
S Trotignon       5.1 4.0 5.3 … 3.0 6.1 4.1 2.4 6.1 6.1 3.0 5.0 5.5 Sauvignon
S Buisse Domaine  4.3 2.4 3.6 … 3.9 5.6 2.5 3.0 4.9 5.1 4.1 5.3 4.6 Sauvignon
S Buisse Cristal  5.6 3.1 3.5 … 3.4 6.6 5.0 3.1 6.1 5.1 3.6 6.1 5.0 Sauvignon
V Aub Silex       3.9 0.7 3.3 … 7.9 4.4 3.0 2.4 5.9 5.6 4.0 5.0 5.5 Vouvray
V Aub Marigny     2.1 0.7 1.0 … 3.5 6.4 5.0 4.0 6.3 6.7 6.0 5.1 4.1 Vouvray
V Font Domaine    5.1 0.5 2.5 … 3.0 5.7 4.0 2.5 6.7 6.3 6.4 4.4 5.1 Vouvray
V Font Brûlés     5.1 0.8 3.8 … 3.9 5.4 4.0 3.1 7.0 6.1 7.4 4.4 6.4 Vouvray
V Font Coteaux    4.1 0.9 2.7 … 3.8 5.1 4.3 4.3 7.3 6.6 6.3 6.0 5.7 Vouvray


Back to the wine example!


• 3 panels (oenologists, naive consumers, our students!)
• 60 preference scores: taste evaluation from 1 to 10

[Table layout: rows wine 1, wine 2, ..., wine 10; continuous variables: Expert (27), Consumer (15), Student (15), Preference (60); categorical variable: Label (1)]

• How are the products described by the panels?
• Do the panels describe the products in the same way? Is there a specific description given by one panel?

Practice with R

1 Define groups of active and supplementary variables
2 Decide whether or not to scale the variables
3 Perform MFA
4 Choose the number of dimensions to interpret
5 Simultaneously interpret the individuals and variables graphs
6 Study the groups of variables
7 Study the partial representations
8 Use indicators to enrich the interpretation


Practice with R

library(FactoMineR)

# Read the four tables (wine names as row names)
Expert <- read.table("http://factominer.free.fr/docs/Expert_wine.csv",
                     header=TRUE, sep=";", row.names=1)
Consu <- read.table(".../Consumer_wine.csv", header=TRUE, sep=";", row.names=1)
Stud <- read.table(".../Student_wine.csv", header=TRUE, sep=";", row.names=1)
Pref <- read.table(".../Preference_wine.csv", header=TRUE, sep=";", row.names=1)

palette(c("black","red","blue","orange","darkgreen","maroon","darkviolet"))
# Label (1) + Expert (27) + Consumer (15) + Student (15) + Preference (60)
complet <- cbind.data.frame(Expert[,1:28], Consu[,2:16], Stud[,2:16], Pref)
# Groups 1 (Label, categorical "n") and 5 (Preference) are supplementary;
# the continuous groups are scaled ("s")
res.mfa <- MFA(complet, group=c(1,27,15,15,60), type=c("n",rep("s",4)),
               num.group.sup=c(1,5), graph=FALSE,
               name.group=c("Label","Expert","Consumer","Student","Preference"))
plot(res.mfa, choix="group", palette=palette())                    # groups
plot(res.mfa, choix="var", invisible="quanti.sup", habillage="group",
     palette=palette())                                            # active variables
plot(res.mfa, choix="ind", partial="all", habillage="group",
     palette=palette())                                            # partial points
plot(res.mfa, choix="axes", habillage="group", palette=palette())  # partial axes
dimdesc(res.mfa)                                   # describe the dimensions
write.infile(res.mfa, file="my_wine_results.csv")  # export the results list


Representation of the individuals

[Figure: individual factor map of the 10 wines with the Sauvignon and Vouvray categories; axes Dim 1 (42.52 %) and Dim 2 (24.42 %)]

• The two labels are well separated
• The Vouvray wines are sensorially more diverse
• Several groups of wines, ...


Representation of the active variables

[Figure: correlation circle of the active variables, coloured by group (Expert, Consumer, Student); axes Dim 1 (42.52 %) and Dim 2 (24.42 %)]


Representation of the active variables

[Figure: same correlation circle restricted to Acidity, O.passion and Sweetness as rated by the three panels (Expert, Consumer _C, Student _S), Dim 1 (42.52 %) vs Dim 2 (24.42 %).]


Representation of the groups

[Figure: groups map, Dim 1 (42.52 %) vs Dim 2 (24.42 %), both axes from 0 to 1. Groups shown: Expert, Consumer, Student, Preference, Label.]

• Two groups are all the closer as they induce the same structure
• The 1st dimension is common to all the panels
• The 2nd dimension is mainly due to the experts
• Preference is linked to the sensory description


Representation of the partial points

[Figure: individuals factor map with the partial points of each wine as seen by each panel (Expert, Consumer, Student), Dim 1 (42.52 %) vs Dim 2 (24.42 %).]


Representation of the partial dimensions

[Figure: projection of the first two dimensions of each separate group analysis (Dim1 and Dim2 for Expert, Consumer, Student, Preference; Dim1 for Label) on the MFA map, Dim 1 (42.52 %) vs Dim 2 (24.42 %).]

• The first two dimensions of each group are well projected
• The Consumer group has the same dimensions as the MFA


Representation of supplementary continuous variables

[Figure: correlation circle of the supplementary Preference variables, Dim 1 (42.52 %) vs Dim 2 (24.42 %).]

⇒ Preferences do not participate in the construction of the dimensions
⇒ Preferences are linked to the sensory description

Representation of supplementary continuous variables

[Figure: two panels on Dim 1 (42.52 %) vs Dim 2 (24.42 %). Left: individuals factor map with the ten wines and the label categories Sauvignon and Vouvray. Right: correlation circle restricted to Acidity, O.passion and Sweetness for the Expert, Consumer (_C) and Student (_S) panels.]


Representation of supplementary continuous variables

⇒ Main information: the favourite is Vouvray Aubuisières Silex



Helps to interpret

• Contribution of each group of variables to each component of the MFA

> res.mfa$group$contrib
          Dim.1 Dim.2 Dim.3
Expert     30.5  46.0  33.7
Consumer   33.2  23.1  31.2
Student    36.3  30.9  35.1

• Similar contribution of the 3 groups to the first dimension
• Second dimension mainly due to the experts

• Correlation between the global cloud and each partial cloud

> res.mfa$group$correlation
          Dim.1 Dim.2 Dim.3
Expert     0.95  0.95  0.96
Consumer   0.95  0.83  0.87
Student    0.99  0.99  0.84

First components are highly linked to the 3 groups: the 3 clouds of points are nearly homothetic
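The reading of the contribution table can be checked with a few lines of base R. This is only a sketch: the numbers are retyped from the slide rather than taken from a fitted `res.mfa` object, and the names `contrib`, `col_totals` and `dominant` are introduced here for illustration.

```r
# Group contributions (in %) retyped from the printed res.mfa$group$contrib table
contrib <- matrix(c(30.5, 46.0, 33.7,
                    33.2, 23.1, 31.2,
                    36.3, 30.9, 35.1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("Expert", "Consumer", "Student"),
                                  c("Dim.1", "Dim.2", "Dim.3")))

# Contributions of the active groups are percentages, so each column should total 100
col_totals <- colSums(contrib)

# Group with the largest contribution on each dimension
dominant <- rownames(contrib)[apply(contrib, 2, which.max)]
names(dominant) <- colnames(contrib)

# Spread between the largest and smallest contribution on each dimension:
# a small spread means a balanced dimension, a large one a dimension driven by one group
spread <- apply(contrib, 2, function(x) max(x) - min(x))

print(col_totals)
print(dominant)
print(spread)
```

With three active groups, a perfectly balanced dimension would give each group about 33.3 %, which is the benchmark behind the comment on the first dimension.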


Similarity measures between groups


> res.mfa$group$Lg
           Expert Consumer Student Preference Label  MFA
Expert       1.45     0.94    1.17       1.01  0.89 1.33
Consumer     0.94     1.25    1.04       1.11  0.28 1.21
Student      1.17     1.04    1.29       1.03  0.62 1.31
Preference   1.01     1.11    1.03       1.47  0.37 1.18
Label        0.89     0.28    0.62       0.37  1.00 0.67
MFA          1.33     1.21    1.31       1.18  0.67 1.44

> res.mfa$group$RV
           Expert Consumer Student Preference Label  MFA
Expert       1.00     0.70    0.85       0.69  0.74 0.92
Consumer     0.70     1.00    0.82       0.82  0.25 0.90
Student      0.85     0.82    1.00       0.75  0.55 0.96
Preference   0.69     0.82    0.75       1.00  0.31 0.81
Label        0.74     0.25    0.55       0.31  1.00 0.56
MFA          0.92     0.90    0.96       0.81  0.56 1.00

• Expert gives a richer description (largest Lg among the active groups: 1.45)
• Groups Student and Expert are linked (RV = 0.85)
• Group Student is the closest to the overall MFA configuration (RV = 0.96)
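The two tables are related: the RV coefficient is the Lg coefficient normalised by the Lg of each group with itself, RV(j, k) = Lg(j, k) / sqrt(Lg(j, j) Lg(k, k)). The sketch below retypes the printed Lg values (two decimals only, so agreement with the printed RV table is expected only up to rounding); the names `Lg` and `RV_from_Lg` are introduced here for illustration.

```r
# Lg coefficients retyped from the printed res.mfa$group$Lg table
groups <- c("Expert", "Consumer", "Student", "Preference", "Label", "MFA")
Lg <- matrix(c(1.45, 0.94, 1.17, 1.01, 0.89, 1.33,
               0.94, 1.25, 1.04, 1.11, 0.28, 1.21,
               1.17, 1.04, 1.29, 1.03, 0.62, 1.31,
               1.01, 1.11, 1.03, 1.47, 0.37, 1.18,
               0.89, 0.28, 0.62, 0.37, 1.00, 0.67,
               1.33, 1.21, 1.31, 1.18, 0.67, 1.44),
             nrow = 6, byrow = TRUE, dimnames = list(groups, groups))

# RV(j, k) = Lg(j, k) / sqrt(Lg(j, j) * Lg(k, k))
RV_from_Lg <- Lg / sqrt(outer(diag(Lg), diag(Lg)))

print(round(RV_from_Lg, 2))
```

Lg grows with the dimensionality of a group's structure, which is why it is read as "richness", while RV is bounded by 1 and is read as a similarity between configurations.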

Partition from the tree


An empirical number of clusters is suggested: $\min_q \dfrac{W_q - W_{q+1}}{W_{q-1} - W_q}$

[Figure: hierarchical clustering dendrogram of the ten wines (height axis from 0.0 to 2.0), with an inset bar plot of the inertia gains.]
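The criterion above can be sketched in base R. The inertia values below are hypothetical, chosen only to illustrate the arithmetic (they are not the wine results), and W_q is assumed to denote the within-cluster inertia of the partition into q clusters; the names `W`, `gain`, `ratio` and `q_best` are introduced here.

```r
# HYPOTHETICAL within-cluster inertia W_q for q = 1, ..., 6 clusters (illustration only)
W <- c(10.0, 6.0, 3.0, 2.5, 2.2, 2.0)

# gain[q] = W_q - W_{q+1}: inertia gained when going from q to q + 1 clusters
gain <- -diff(W)

# Ratio (W_q - W_{q+1}) / (W_{q-1} - W_q), defined for q = 2, ..., length(W) - 1
q <- 2:(length(W) - 1)
ratio <- gain[q] / gain[q - 1]
names(ratio) <- q

# Suggested number of clusters: the q with the smallest ratio
q_best <- q[which.min(ratio)]

print(round(ratio, 3))
print(q_best)
```

A small ratio at q means that splitting further brings little gain compared with the split that produced q clusters, so the tree is cut there.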


Partition on the principal component map

[Figure: individuals factor map with the ten wines coloured by cluster (clusters 1 to 5), Dim 1 (42.52 %) vs Dim 2 (24.42 %).]

Continuous view (principal components) and discontinuous view (clusters)