PCA with Missing Data in R Using missMDA

Uploaded by

Thierry Nesztler

PCA with missing data using the missMDA R package

François Husson

Applied Mathematics Department, Rennes Agrocampus

husson@agrocampus-ouest.fr

Using missMDA to deal with missing data

> library(missMDA)
> data(orange)

   Color.intensity Odor.intensity Attack.intensity Sweet Acid Bitter Pulp Typicity
 1            4.79           5.29               NA    NA   NA   2.83   NA     5.21
 2            4.58           6.04             4.42  5.46 4.13   3.54 4.62     4.46
 3            4.71           5.33               NA    NA 4.29   3.17 6.25     5.17
 4            6.58           6.00             7.42  4.17 6.75     NA 1.42     3.42
 5              NA           6.17             5.33  4.08   NA   4.38 3.42     4.42
 6            6.33           5.00             5.38  5.00 5.50   3.63 4.21     4.88
 7            4.29           4.92             5.29  5.54 5.25     NA 1.29     4.33
 8              NA           4.54             4.83    NA 4.96   2.92 1.54     3.96
 9            4.42             NA             5.17  4.62 5.04   3.67 1.54     3.96
10            4.54           4.29               NA  5.79 4.38     NA   NA     5.00
11            4.08           5.13             3.92    NA   NA     NA 7.33     5.25
12            6.50           5.88             6.13  4.88 5.29   4.17 1.50     3.50

Some (bad) easy methods
• Delete individuals or variables with missing data: usually not a good idea
• Replace missing data with the mean (default in several packages including FactoMineR)

> library(FactoMineR)       ## provides PCA()
> res.pca <- PCA(orange)    ## missing values are replaced by the column means

[Figure: individuals factor map (PCA) and variables factor map (PCA) for the mean-imputed orange data; Dim 1 (51.45%), Dim 2 (18.32%)]

[Figure: scatter plot of y against x]

Big distortion of links between variables
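
The distortion caused by mean imputation can be illustrated with a small simulation. The sketch below is not taken from the slides: the simulated variables `x` and `y`, the noise level and the 40% missing rate are all assumptions chosen only to make the effect visible.

```r
# Simulated data (an assumption, not the orange data):
# two strongly correlated variables.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.3)
cor_full <- cor(x, y)

# Remove 40% of the y values at random, then fill them
# with the mean of the remaining observed values.
miss <- sample(n, 40)
y_mean <- y
y_mean[miss] <- mean(y[-miss])

# Mean imputation shrinks both the correlation and the spread of y.
cor_mean <- cor(x, y_mean)
sd_ratio <- sd(y_mean) / sd(y)

cat("correlation, complete data :", round(cor_full, 2), "\n")
cat("correlation, mean-imputed  :", round(cor_mean, 2), "\n")
cat("sd(y) ratio after imputing :", round(sd_ratio, 2), "\n")
```

All imputed points sit on a horizontal line at the mean of y, whatever their x value, which is why the link between the two variables is weakened.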

Iterative PCA

Ideas:
• Since x and y are strongly correlated, impute a missing y value using the x value
• If individuals i and j have similar values for all the other variables, impute the missing value of i for a variable using the value of j for that variable

⇒ takes into account both the global similarity between individuals and the links between variables

Iterative PCA

Incomplete data table (x2 is missing for the fourth individual):

    x1     x2
  -2.0  -2.01
  -1.5  -1.48
   0.0  -0.01
   1.5     NA
   2.0   1.98

Initialize: impute the mean

    x1     x2
  -2.0  -2.01
  -1.5  -1.48
   0.0  -0.01
   1.5   0.00
   2.0   1.98

Do PCA on imputed table → axes and components; fitted values:

     x1     x2
  -1.98  -2.04
  -1.44  -1.56
   0.15  -0.18
   1.00   0.57
   2.27   1.67

Missing data imputed using PCA → new imputed data table

    x1     x2
  -2.0  -2.01
  -1.5  -1.48
   0.0  -0.01
   1.5   0.57
   2.0   1.98

Do PCA on the new imputed table; fitted values:

     x1     x2
  -2.00  -2.01
  -1.47  -1.52
   0.09  -0.11
   1.20   0.90
   2.18   1.78

New imputed data table

    x1     x2
  -2.0  -2.01
  -1.5  -1.48
   0.0  -0.01
   1.5   0.90
   2.0   1.98

Repeat these steps until convergence

    x1     x2
  -2.0  -2.01
  -1.5  -1.48
   0.0  -0.01
   1.5   1.48
   2.0   1.98

Do PCA on imputed data table

[Figure: scatter plot of x2 against x1, redrawn at each step with the current imputed point]

Iterative PCA

1. Initialization: impute using the mean
2. Step ℓ:
(a) do PCA on the imputed data table, retaining S dimensions
(b) impute the missing data using the PCA
(c) update the means (and standard deviations)
3. Iterate the estimation and imputation steps

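
The steps above can be sketched in base R on the toy table from the previous slides. This is a minimal illustration and not the missMDA implementation: the function name `iterative_pca`, the stopping rule and the tolerance are assumptions, the variables are centred but not scaled, and no regularization is applied.

```r
# Toy table from the slides: one missing value in x2.
X <- cbind(x1 = c(-2.0, -1.5, 0.0, 1.5, 2.0),
           x2 = c(-2.01, -1.48, -0.01, NA, 1.98))

# Hypothetical helper: plain (non-regularized) iterative PCA imputation.
iterative_pca <- function(X, ncp = 1, tol = 1e-12, max_iter = 1000) {
  miss <- is.na(X)
  keep <- seq_len(ncp)
  # 1. Initialization: fill each missing cell with its column mean.
  Xhat <- X
  col_means <- colMeans(X, na.rm = TRUE)
  Xhat[miss] <- col_means[col(X)[miss]]
  for (iter in seq_len(max_iter)) {
    # 2(a) and 2(c). Centre with the current means, then do the PCA
    # through an SVD and keep ncp dimensions.
    mu <- colMeans(Xhat)
    s <- svd(sweep(Xhat, 2, mu))
    fit <- s$u[, keep, drop = FALSE] %*% diag(s$d[keep], ncp) %*%
      t(s$v[, keep, drop = FALSE])
    fit <- sweep(fit, 2, mu, "+")
    # 2(b). Overwrite only the missing cells with their fitted values.
    Xnew <- Xhat
    Xnew[miss] <- fit[miss]
    # 3. Stop once the imputed values no longer change.
    change <- sum((Xnew - Xhat)^2)
    Xhat <- Xnew
    if (change < tol) break
  }
  list(completeObs = Xhat, iterations = iter)
}

res <- iterative_pca(X, ncp = 1)
print(round(res$completeObs, 2))
```

Observed cells are never modified; only the missing cell moves, step by step, towards the first principal axis of the completed table. The intermediate values need not match the slides exactly, since the initial value and centring conventions here are assumptions.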
Overfitting problem due to believing too much in the links between variables
⇒ regularized iterative PCA

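
One way to picture the regularization is as a shrinkage of the retained singular values before the fitted values are computed. The sketch below is an assumption-laden illustration: the helper `shrunk_fit` is hypothetical, the noise variance is estimated very simply as the mean of the discarded squared singular values, and the estimate used inside missMDA may be normalized differently.

```r
# Hypothetical helper: rank-ncp reconstruction of a centred matrix Z
# with shrunk singular values.
shrunk_fit <- function(Z, ncp) {
  s <- svd(Z)
  keep <- seq_len(ncp)
  # Simplified noise estimate (assumption): mean of the discarded
  # squared singular values.
  sigma2 <- if (ncp < length(s$d)) mean(s$d[-keep]^2) else 0
  # Shrink each retained singular value d to (d^2 - sigma2) / d,
  # floored at zero.
  d_shrunk <- pmax(s$d[keep]^2 - sigma2, 0) / s$d[keep]
  s$u[, keep, drop = FALSE] %*% diag(d_shrunk, ncp) %*%
    t(s$v[, keep, drop = FALSE])
}

# Simulated centred data (an assumption, for illustration only).
set.seed(2)
Z <- scale(matrix(rnorm(60), 12, 5), scale = FALSE)

# Plain rank-2 fit versus shrunk rank-2 fit.
plain <- with(svd(Z), u[, 1:2] %*% diag(d[1:2]) %*% t(v[, 1:2]))
shrunk <- shrunk_fit(Z, ncp = 2)

# The shrunk fit is pulled towards the column means (zero here).
print(sum(shrunk^2) < sum(plain^2))
```

Used inside the imputation step, such a shrunk fit gives imputed values that lie closer to the mean when the data are noisy, so the algorithm relies less on the estimated links between variables.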
Running missMDA in R
> library(missMDA)
> library(FactoMineR)                           ## provides PCA()
> data(orange)
> nb <- estim_ncpPCA(orange, scale=TRUE)        ## Estimate the number of dimensions
> comp <- imputePCA(orange, ncp=2, scale=TRUE)  ## Impute the table
> res.pca <- PCA(comp$completeObs)              ## Do the PCA

> orange                               > comp$completeObs

Sweet Acid Bitter Pulp Typicity        Sweet Acid Bitter Pulp Typicity
   NA   NA   2.83   NA     5.21         5.54 4.13   2.83 5.89     5.21
 5.46 4.13   3.54 4.62     4.46         5.46 4.13   3.54 4.62     4.46
   NA 4.29   3.17 6.25     5.17         5.45 4.29   3.17 6.25     5.17
  ...                                    ...
 4.88 5.29   4.17 1.50     3.50         4.88 5.29   4.17 1.50     3.50

[Figure: individuals factor map (PCA) and variables factor map (PCA) for the imputed table; Dim 1 (71.34%), Dim 2 (17.16%)]


Is running the imputation algorithm once sufficient?

[Figure: scatter plot of y against x]

⇒ Reinforces links between variables


Visualizing uncertainty due to missing data

What confidence can we give to the results? Is there an idea of variance?

⇒ A single value $(\hat{F}\hat{U}')_{ik}$ cannot show variability in the predicted value

⇒ Multiple imputation: generate several plausible values for each missing data point

Visualizing uncertainty due to missing data
> mi <- MIPCA(orange, scale = TRUE, ncp=2)
> plot(mi)

[Figure: four panels titled "Supplementary projection", "Variable representation", "Multiple imputation using Procrustes" and "Projection of the Principal Components"; Dim 1 (71.33%), Dim 2 (17.17%)]

