ade4 and factoextra : Principal Component Analysis ‐ R software and data mining
Required packages
Prepare the data
Principal component analysis
Variances of the principal components
Extract the eigenvalues
Make a scree plot using ade4 base graphics
Make the scree plot using the package factoextra
Graph of variables : the circle of correlations
Coordinates of variables on the principal components
Graph of variables using ade4 base graph
Graph of variables using factoextra
Cos2 : quality of the representation for variables on the factor map
Contributions of the variables to the principal components
Graph of individuals
Coordinates of individuals on the principal components
Cos2 : quality of the representation for individuals on the principal components
Contribution of the individuals to the principal components
Graph of individuals using ade4 base graph
Biplot of individuals and variables using ade4
Graph of individuals using factoextra
Change the color of individuals by groups
Principal component analysis using supplementary individuals and variables
Supplementary individuals
Supplementary quantitative variables
Infos
http://www.sthda.com/english/wiki/ade4andfactoextraprincipalcomponentanalysisrsoftwareanddatamining 1/42
2017/4/9 ade4 and factoextra : Principal Component Analysis R software and data mining Easy Guides Wiki STHDA
This R tutorial describes how to perform a Principal Component Analysis (PCA) using R software and the ade4
package.
Required packages
The package ade4 can be installed and loaded as follows :
install.packages("ade4")
library("ade4")
The package factoextra is used to visualize the results of the principal component analysis.
factoextra can be installed as follows :
# install.packages("devtools")
devtools::install_github("kassambara/factoextra")
Load it :
library("factoextra")
Prepare the data
We'll use the data set decathlon2 from the package factoextra :
library("factoextra")
data(decathlon2)
head(decathlon2[, 1:6])
Only some of these individuals and variables will be used to perform the principal component analysis
(PCA).
The coordinates of the remaining individuals and variables on the factor map will be predicted after the
PCA.
Active individuals (in blue, rows 1:23) : Individuals that are used during the principal component
analysis.
Supplementary individuals (in green, rows 24:27) : The coordinates of these individuals will be predicted
using the PCA information and parameters obtained with the active individuals/variables.
Active variables (in pink, columns 1:10) : Variables that are used for the principal component analysis.
Supplementary variables : As with supplementary individuals, the coordinates of these variables will also be
predicted.
Supplementary continuous variables : Columns 11 and 12, corresponding respectively to the rank and the
points of athletes.
Supplementary qualitative variable : Column 13, corresponding to the two athletic meetings (2004
Olympic Games or 2004 Decastar). This factor variable will be used to color individuals by groups.
Extract only active individuals and variables for principal component analysis:
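A minimal sketch of this extraction, assuming the row and column ranges listed above (rows 1:23, columns 1:10) and the object name decathlon2.active that the dudi.pca() call later in this tutorial expects:

```r
library("factoextra")  # provides the decathlon2 data set
data(decathlon2)

# Keep only the active individuals (rows 1:23) and active variables (columns 1:10)
decathlon2.active <- decathlon2[1:23, 1:10]
head(decathlon2.active[, 1:6])
```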
The PCA is computed with the function dudi.pca() [in ade4 package]. Its main arguments are :
df : a data frame. Rows are individuals and columns are numeric variables.
center : a logical value specifying whether the variables should be shifted to be zero centered.
scale : a logical value. If TRUE, the data are scaled to unit variance before the analysis. This
standardization to the same scale prevents some variables from dominating just because of their large
measurement units.
scannf : a logical value specifying whether the scree plot should be displayed.
nf : number of dimensions kept in the final results.
In the R code below, the PCA is performed only on the active individuals/variables :
library("ade4")
res.pca <- dudi.pca(decathlon2.active, scannf = FALSE, nf = 5)
summary(res.pca)
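The eigenvalues are also stored directly in the dudi object. A minimal sketch, assuming the eig component of res.pca and illustrative variable names (eig.val, variance, cumvar, eig.table are not from the original), that computes the percentage of variance and its cumulative sum by hand:

```r
library("ade4")
data("decathlon2", package = "factoextra")  # data set shipped with factoextra
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

eig.val <- res.pca$eig                    # all eigenvalues, not only the nf kept axes
variance <- eig.val * 100 / sum(eig.val)  # percentage of variance per component
cumvar <- cumsum(variance)                # cumulative percentage
eig.table <- data.frame(eig = eig.val, variance = variance,
                        cumvariance = cumvar)
head(eig.table)
```

Because the variables are standardized, the eigenvalues sum to the number of active variables (10 here).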
You can also use the package factoextra to extract the eigenvalues :
library("factoextra")
eig.val <- get_eigenvalue(res.pca)
head(eig.val)
The function screeplot() can be used to represent the amount of inertia (variance) associated with each principal
component (PC).
A simplified format is :
Example of usage :
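A short sketch of such a call, assuming the screeplot() method that ade4 provides for dudi objects; the title text is only illustrative:

```r
library("ade4")
data("decathlon2", package = "factoextra")  # data set shipped with factoextra
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

# Bar plot of the eigenvalues; main sets the plot title
screeplot(res.pca, main = "Screeplot - Eigenvalues")
```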
You can also customize the plot using the standard barplot() function. In the R code below, we’ll draw the
percentage of variances retained by each component :
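One possible version of this plot, sketched with base graphics only. The variable names (variance, bar.x) and the colors are assumptions chosen for illustration:

```r
library("ade4")
data("decathlon2", package = "factoextra")  # data set shipped with factoextra
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

# Percentage of variance retained by each component
variance <- res.pca$eig * 100 / sum(res.pca$eig)

# barplot() returns the x positions of the bars, reused to overlay a line
bar.x <- barplot(variance, names.arg = seq_along(variance),
                 main = "Variances",
                 xlab = "Principal Components",
                 ylab = "Percentage of variances",
                 col = "steelblue")
lines(bar.x, variance, type = "b", pch = 19, col = "red")
```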
About 60% of the information (variance) contained in the data is retained by the first two principal
components.
fviz_screeplot(res.pca, ncp=10)
# Column coordinates
head(res.pca$co)
The function s.corcircle() can be used to plot the correlation circle. A simplified format is :
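A sketch of a call that spells out the main arguments, assuming the s.corcircle() arguments xax and yax (axes to plot) and clabel (label size); the values shown are illustrative:

```r
library("ade4")
data("decathlon2", package = "factoextra")  # data set shipped with factoextra
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

# dfxy: variable coordinates; xax/yax: columns used for the x and y axes
s.corcircle(res.pca$co, xax = 1, yax = 2, clabel = 0.8)
```

In a standardized PCA the variable coordinates are correlations with the axes, which is why every arrow stays inside the unit circle.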
# Graph of variables
s.corcircle(res.pca$co)
# Default plot
fviz_pca_var(res.pca)
Read more about the function fviz_pca_var() : Graph of variables ‐ Principal Component Analysis
The cos2 and the contributions of variables (columns) / individuals (rows) are calculated using the function
inertia.dudi() as follows :
Note that the contributions and the cos2 are printed in 1/10 000. The sign is the sign of the coordinates.
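A hedged sketch of the call. The object name inertia is an assumption, and the scale of the returned values may depend on the installed ade4 version, so it is worth inspecting their range before rescaling:

```r
library("ade4")
data("decathlon2", package = "factoextra")  # data set shipped with factoextra
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

# Request the decomposition of inertia for both rows and columns
inertia <- inertia.dudi(res.pca, row.inertia = TRUE, col.inertia = TRUE)
names(inertia)

head(inertia$col.abs)  # absolute contributions of the variables
head(inertia$col.rel)  # relative contributions (signed cos2) of the variables
```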
# squared coordinates
head(res.pca$co^2)
Using factoextra package, the color of variables can be automatically controlled by the value of their
cos2.
fviz_pca_var(res.pca, col.var="cos2") +
scale_color_gradient2(low="white", mid="blue",
high="red", midpoint=0.5) + theme_minimal()
Note that you can also use the function get_pca_var() [from factoextra package]. It provides a list of
matrices containing all the results for the active variables (coordinates, correlation between variables and
axes, squared cosine and contributions).
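A minimal sketch of this call; the object name var matches the one used in the code that follows:

```r
library("ade4")
library("factoextra")
data(decathlon2)
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

var <- get_pca_var(res.pca)
names(var)       # components available for the active variables
head(var$coord)  # coordinates of variables
head(var$cos2)   # quality of representation
```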
# Contributions of variables
head(var$contrib)
Using factoextra package, the color of variables can be automatically controlled by the value of their
contributions :
fviz_pca_var(res.pca, col.var="contrib") +
scale_color_gradient2(low="white", mid="blue",
high="red", midpoint=50) + theme_minimal()
This is helpful to highlight the most important variables for the principal components.
The most important variables for a given PC can be visualized using the function fviz_pca_contrib() [factoextra
package] :
(factoextra >= 1.0.1 is required)
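In more recent factoextra releases, fviz_pca_contrib() is deprecated in favor of fviz_contrib(). A sketch assuming an installed version that provides fviz_contrib(); the object names p1 and p12 are illustrative:

```r
library("ade4")
library("factoextra")
data(decathlon2)
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

# Contributions of variables to PC1
p1 <- fviz_contrib(res.pca, choice = "var", axes = 1)
# Total contribution of variables to PC1 and PC2
p12 <- fviz_contrib(res.pca, choice = "var", axes = 1:2)
print(p1)
print(p12)
```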
Read more about fviz_pca_contrib() : Principal Component Analysis: How to reveal the most important variables
in your data?
Graph of individuals
The coordinates of the individuals on the factor maps can be extracted as follows :
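A minimal sketch, assuming the li component of the dudi object (the same component passed to s.label() later in this tutorial):

```r
library("ade4")
data("decathlon2", package = "factoextra")  # data set shipped with factoextra
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

# Row coordinates: one row per active individual, one column per kept axis
head(res.pca$li)
```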
It's also possible to use the function get_pca_ind() [from factoextra package]. It provides a list
of matrices containing all the results for the active individuals (coordinates, squared cosine and
contributions).
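A minimal sketch of this call; the object name ind matches the one used in the code that follows:

```r
library("ade4")
library("factoextra")
data(decathlon2)
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

ind <- get_pca_ind(res.pca)
names(ind)       # components available for the active individuals
head(ind$coord)  # coordinates of individuals
head(ind$cos2)   # quality of representation
```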
# Contributions of individuals
head(ind$contrib)
Use the function fviz_pca_contrib()[factoextra package] to visualize the most contributing individuals :
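A sketch using fviz_contrib(), the replacement for fviz_pca_contrib() in more recent factoextra releases (assumed to be installed here); the object name p.ind is illustrative:

```r
library("ade4")
library("factoextra")
data(decathlon2)
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

# Total contribution of individuals to PC1 and PC2
p.ind <- fviz_contrib(res.pca, choice = "ind", axes = 1:2)
print(p.ind)
```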
A biplot can be drawn by combining the two functions below :
s.label() to plot individuals
s.arrow() to add variables
# Plot of individuals
s.label(res.pca$li, xax = 1, yax = 2)
# Add variables
s.arrow(7*res.pca$c1, add.plot = TRUE)
scatter(res.pca)
NULL
Note that the argument clab.col = 0 can be used to remove variable labels.
fviz_pca_ind(res.pca)
The color of individuals can be controlled automatically using their cos2 values (the quality of representation
of the individuals on the factor map) :
fviz_pca_ind(res.pca, col.ind="cos2") +
scale_color_gradient2(low="white", mid="blue",
high="red", midpoint=0.50)
fviz_pca_ind(res.pca, col.ind="cos2") +
scale_color_gradient2(low="white", mid="blue",
high="red", midpoint=0.50) + theme_minimal()
Read more about fviz_pca_biplot() : Biplot of individuals and variables ‐ principal component analysis
The data set decathlon2 contains a supplementary qualitative variable in column 13, corresponding to the type
of competition.
Qualitative variables can be helpful for interpreting the data and for coloring individuals by groups :
The function s.class() can be used to visualize the classes (groups) of points :
dfxy : a data frame containing the two columns for x and y axes
fac : a factor variable partitioning the individuals in classes
xax, yax : a numeric value specifying the column number containing x and y values
col : a vector of colors used to draw each class in a different color
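A sketch of these arguments in use, assuming the grouping factor is taken from column 13 for the active rows and named quali.sup, the name used by the biplot code in this section:

```r
library("ade4")
data("decathlon2", package = "factoextra")  # data set shipped with factoextra
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

# Competition type of the 23 active individuals
quali.sup <- as.factor(decathlon2[1:23, 13])
head(quali.sup)

# One color per level of the factor
s.class(res.pca$li, fac = quali.sup, xax = 1, yax = 2,
        col = c("blue", "red"))
```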
# Make a biplot
# clab.row : hide the label for rows (individuals)
res <- scatter(res.pca, clab.row = 0, posieig = "none")
s.class(res.pca$li, fac = quali.sup, col = c("blue", "red"),
add.plot = TRUE)
data(iris)
head(iris)
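The biplot code in this section expects an object named iris.pca. A minimal sketch of how it could be computed, assuming the PCA is run on the four numeric columns with the Species column left out:

```r
library("ade4")
data(iris)

# Column 5 (Species) is a factor, so it is excluded from the PCA
iris.pca <- dudi.pca(iris[, -5], scannf = FALSE, nf = 4)
head(iris.pca$li)
```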
Now, let’s :
make a biplot of individuals and variables
change the color of individuals by groups
change the transparency of variable colors by their cos2 values
show only the labels for variables
fviz_pca_biplot(iris.pca,
habillage = iris$Species, addEllipses = TRUE,
col.var = "red", alpha.var ="cos2",
label = "var") +
scale_color_brewer(palette="Dark2")+
theme_minimal()
As described above, the data set decathlon2 contains supplementary continuous variables (quanti.sup, columns
11:12), supplementary qualitative variables (quali.sup, column 13) and supplementary
individuals (ind.sup, rows 24:27).
Supplementary variables / individuals are not used to compute the principal components. Their coordinates are
predicted using only the information provided by the performed principal component analysis on active variables /
individuals.
The functions suprow() and supcol() [in ade4 package] are used to calculate the coordinates of supplementary
rows (individuals) and columns (variables), respectively.
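In simplified form, both take the PCA result as the first argument and the table of supplementary data as the
second : suprow(x, Xsup) and supcol(x, Xsup).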
Supplementary individuals
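A sketch of the projection step, assuming the supplementary rows 24:27 are restricted to the ten active columns and that the result is stored as ind.sup.pca, the name used by the code that follows:

```r
library("ade4")
data("decathlon2", package = "factoextra")  # data set shipped with factoextra
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

# Supplementary individuals, same columns as the active table
ind.sup <- decathlon2[24:27, 1:10]

# Project them onto the axes computed from the active individuals
ind.sup.pca <- suprow(res.pca, ind.sup)
names(ind.sup.pca)
```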
# coordinates
ind.sup.coord <- ind.sup.pca$lisup
head(ind.sup.coord)
How to calculate the cos2 (quality of the representation) for supplementary individuals?
cos2.func <-function(x){x^2/sum(x^2)}
ind.sup.cos2 <- t(apply(ind.sup.coord, 1, cos2.func))
head(ind.sup.cos2)
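Note that cos2.func() normalizes over whatever axes it is given. By definition, cos2 is the squared coordinate divided by the squared distance to the origin over all axes, so applying the function to only the retained axes gives values relative to that subspace. The sketch below illustrates the difference; it swaps in base R prcomp() and the built-in USArrests data purely for illustration, and all object names are hypothetical:

```r
# Base R only; USArrests is used purely for illustration
active <- scale(USArrests[1:40, ])                 # standardized active rows
pca <- prcomp(active, center = FALSE, scale. = FALSE)

# Standardize the held-out rows with the active means and standard deviations
sup <- scale(USArrests[41:50, ],
             center = attr(active, "scaled:center"),
             scale = attr(active, "scaled:scale"))
coord <- sup %*% pca$rotation                      # coordinates on all 4 axes

cos2.func <- function(x) x^2 / sum(x^2)
cos2.all <- t(apply(coord, 1, cos2.func))          # normalized over every axis
cos2.kept <- t(apply(coord[, 1:2], 1, cos2.func))  # normalized over 2 kept axes only

# Rows of cos2.kept always sum to 1, even for points poorly shown on the plane
round(cbind(plane.quality = rowSums(cos2.all[, 1:2]),
            kept.only.sum = rowSums(cos2.kept)), 3)
```

If the quality of representation in the full space is needed, one option is to keep all axes (nf equal to the number of active variables) before applying cos2.func().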
          Rank Points
SEBRLE       1   8217
CLAY         2   8122
BERNARD      4   8067
YURKOV       5   8036
ZSIVOCZKY    7   8004
McMULLEN     8   7995
Remember that rows 24:27 are supplementary individuals. We don't want them in the current analysis,
which is why only rows 1:23 were extracted.
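A sketch of the corresponding steps, assuming the supplementary columns 11:12 are standardized with scale() before projection and that the result is stored as quanti.pca, the name used by the code that follows. scale() divides by the sample standard deviation (n - 1 denominator) while dudi.pca() normalizes with uniform row weights, so the coordinates may differ slightly from exact correlations:

```r
library("ade4")
data("decathlon2", package = "factoextra")  # data set shipped with factoextra
res.pca <- dudi.pca(decathlon2[1:23, 1:10], scannf = FALSE, nf = 5)

# Supplementary quantitative variables for the active individuals only
quanti.sup <- decathlon2[1:23, 11:12, drop = FALSE]
head(quanti.sup)

# Standardize, then project the columns onto the existing axes
quanti.pca <- supcol(res.pca, as.data.frame(scale(quanti.sup)))
names(quanti.pca)
```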
# coordinates
quanti.coord <- quanti.pca$cosup
head(quanti.coord)
Infos
This analysis has been performed using R software (ver. 3.1.2) and ggplot2 (ver. 1.0.0)
By kassambara