Principal Component Analysis (PCA)
What is it?
• It is a dimensionality reduction method.
• It analyzes the data and extracts the directions that capture most of its variance.
• It is used:
• to analyze all the dimensions of a given dataset, and
• to reduce the dimensions as much as possible without incurring a
significant information loss.
• E.g., in machine learning, an nD (n > 3) feature vector can be
transformed into a 2D or 3D feature vector.
Why do we use it?
• In machine learning:
• As unsupervised learning:
for data analysis,
for data compression, i.e., to compress information,
to group data into dependent and independent variables.
• As preprocessing for both supervised and unsupervised learning:
in data cleaning, data de-noising, and preparing a good feature vector for training.
• Used as a preprocessing step in:
face recognition,
image classification,
pattern recognition,
clustering, etc.
Steps in PCA
1. Standardize the range of values in the data or feature vector.
2. Compute the covariance matrix to identify correlations.
3. Compute the eigenvalues and eigenvectors of the covariance matrix to identify the principal
components.
4. Create a feature vector to decide which principal components to keep.
5. Recast the data along the principal component axes.
Standardize the range of values → Compute the covariance matrix → Compute the eigenvalues and eigenvectors → Create a feature vector → Recast the data
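The five steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `pca`, the randomly generated sample data, and the choice of the sample standard deviation (`ddof=1`) are all assumptions made for the example.

```python
import numpy as np

def pca(data, n_components):
    """Sketch of the five PCA steps for a (samples x features) array."""
    data = np.asarray(data, dtype=float)

    # Step 1: standardize each feature (column) to zero mean and unit std.
    # ddof=1 (sample standard deviation) is an assumption of this sketch.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

    # Step 2: covariance matrix of the standardized features.
    cov = np.cov(z, rowvar=False)

    # Step 3: eigenvalues and eigenvectors; eigh is meant for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Step 4: sort by decreasing eigenvalue and keep the leading eigenvectors
    # as the feature vector.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    feature_vector = eigenvectors[:, order[:n_components]]

    # Step 5: recast the data along the principal component axes.
    projected = z @ feature_vector
    return projected, feature_vector, eigenvalues


rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 5))   # made-up data with 5 features
projected, feature_vector, eigenvalues = pca(sample, n_components=2)
print(projected.shape)              # (50, 2)
```

Sorting by eigenvalue matters because each eigenvalue measures the variance captured along its eigenvector, so the leading columns of the feature vector are the components that lose the least information.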
Standardizing
• The range of each variable is standardized so that every variable
contributes equally to the analysis.
Let
$x$ be a value in a given input data or feature vector,
$\bar{x}$ be the mean of the input data or feature vector, and
$\sigma(x)$ be the standard deviation of the input data, given by
$\sigma(x) = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$,
where $n$ is the size or number of values (terms) in the input data.
Then, the standardization or transformation is done using the z-score:
$z = \frac{x - \bar{x}}{\sigma(x)}$
The standardized data will have zero mean and a standard deviation of 1.
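As a small illustration of the z-score, the sketch below standardizes a made-up list of values and checks the zero-mean, unit-deviation property. The function name `z_score` and the `ddof` parameter are assumptions for this example: `ddof=0` divides by n, while `ddof=1` would divide by n - 1 (the sample standard deviation).

```python
import numpy as np

def z_score(values, ddof=0):
    """Standardize a 1-D sequence: subtract the mean, divide by the std."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=ddof)

z = z_score([2.0, 4.0, 4.0, 6.0])   # made-up input values
print(z)
print(z.mean(), z.std())            # approximately 0 and 1
```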
Example
• Let us assume a given dataset has four features (F1, F2, F3, and F4) with
the following data collected.
F1 F2 F3 F4
1 5 3 1
4 2 6 3
1 4 3 2
4 4 1 1
5 5 2 3
Standardized data:
F1 F2 F3 F4
-1.0695 0.8196 0 -1
0.5347 -1.6393 1.6042 1
-1.0695 0 0 0
0.5347 0 -1.0695 -1
1.0695 0.8196 -0.5347 1

v1 v1
0.515514 -0.623014
-0.616625 0.113105
0.399314 0.744256
0.441098 0.212477
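The standardized values can be checked with a short NumPy sketch. The tabulated numbers appear to match the sample standard deviation (dividing by n - 1) rounded to two decimals before the division. That is an inference from the numbers themselves, not something the source states, so the rounding step below is an assumption.

```python
import numpy as np

# Raw example data: rows are samples, columns are F1..F4.
data = np.array([
    [1, 5, 3, 1],
    [4, 2, 6, 3],
    [1, 4, 3, 2],
    [4, 4, 1, 1],
    [5, 5, 2, 3],
], dtype=float)

mean = data.mean(axis=0)
std = data.std(axis=0, ddof=1)      # sample standard deviation (n - 1)

z_exact = (data - mean) / std

# Assumption: the std is rounded to two decimals before dividing,
# which seems to be how the tabulated values were obtained.
z_rounded = (data - mean) / np.round(std, 2)

print(np.round(z_exact, 4))
print(np.round(z_rounded, 4))
```

The two results differ only from the fourth decimal place onward, so either is a reasonable input to the covariance step.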
Example