
PRINCIPAL COMPONENT ANALYSIS

Presented by:
Zoha Ahmed (F20604017)
Faria Shoaib (F20604022)
Qundeel Saleem (F20604028)
Eesha Noor (F20604030)
Aliza Mushtaq (F20604039)
INTRODUCTION

 Principal component analysis, or PCA, is a dimensionality reduction method.
 It is often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the large set.
 An orthogonal transformation is used to transform correlated variables into linearly uncorrelated variables, called principal components.
 Principal components show the directions of the most variation in the data.
 The first, and most important, principal component captures the maximum variance in the data.
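In practice, this reduction is a few lines of code. Below is a minimal sketch using scikit-learn's PCA on a made-up synthetic data set (the data, seed, and two-component choice are illustrative, not from these slides):

import numpy as np
from sklearn.decomposition import PCA

# Illustrative synthetic data: 100 samples, 5 correlated features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))           # two underlying factors
mixing = rng.normal(size=(2, 5))             # spread over five variables
X = latent @ mixing + 0.1 * rng.normal(size=(100, 5))

# Keep two principal components: uncorrelated linear combinations
# of the original variables, ordered by explained variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # PC1 captures the largest share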

CONTD.
INTRODUCTION

 The second principal component describes the remaining variance in the data and is uncorrelated with the first principal component.
 Be aware that the PCA transformation depends on how the original variables are scaled relative to one another.
 Before applying PCA, the data columns must be normalized. Note that the new coordinates no longer correspond to the original, system-produced variables.
 Your data set becomes less interpretable after applying PCA.
 PCA is not the right transformation for your application if interpretability of the results is crucial to your analysis.
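Because PCA is scale-sensitive, standardizing the columns first changes the result. A small sketch of the effect, again on made-up data (the column scales are chosen only to exaggerate the point):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Two features on very different scales (e.g. metres vs. milligrams).
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(0, 1, 200),       # small-scale column
                     rng.normal(0, 1000, 200)])   # large-scale column

# Without normalization, the large-scale column dominates PC1.
print(PCA(n_components=1).fit(X).explained_variance_ratio_)

# Standardizing gives every column unit variance before PCA.
X_std = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_std).explained_variance_ratio_)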
WHY PCA?

Curse of Dimensionality:
The issues that arise when dealing with high-dimensional data. A problem set may have:
 a large number of features,
 making the model extremely slow,
 and even making it difficult to find a solution.

Hence, more data is good, but more detailed (higher-dimensional) data might not be.
CONTD.

SOLUTION: DIMENSIONALITY REDUCTION

 Data can be represented by fewer dimensions.
 Reduce dimensionality by feature elimination.

Example
DIMENSIONALITY REDUCTION TECHNIQUES

Principal Component Analysis (PCA):

 PCA transforms the original variables of a data set into a new set of variables called principal components.
 The first principal component (PC1) has the largest possible variance.
 Each succeeding component has the highest possible variance under the constraint that it is orthogonal to the preceding components.
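Both properties are easy to check numerically. A sketch with scikit-learn on illustrative random data (not from these slides):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))  # correlated features

pca = PCA().fit(X)

# Variances come out in decreasing order: PC1 has the largest.
print(pca.explained_variance_)

# The component directions are mutually orthogonal unit vectors.
print(np.round(pca.components_ @ pca.components_.T, 6))  # ~ identity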
PRINCIPAL COMPONENTS

Example
WORKING
 Plot some data on a graph and calculate the average measurement for each variable in the sample.
 With these averages, we can find the center of our data and shift the data so that its center lies on the origin of the graph (0, 0).
 Now we try to fit a line to this data: we start by drawing a random line that passes through the origin of the graph.
 Then we rotate the line until it fits the data as well as it can, with the condition that the line must still go through the origin.
 In the end, some line will fit best.
 PCA finds this line as the one that maximizes the sum of squared distances from the projected points to the origin (see the sketch below).
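A minimal NumPy sketch of this search, scanning candidate lines through the origin of centered 2-D data (the brute-force scan is purely illustrative; real implementations use the eigendecomposition described next):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])

X_centered = X - X.mean(axis=0)  # shift the center of the data to (0, 0)

# Try many candidate lines through the origin and score each one by the
# sum of squared distances of the projected points from the origin.
angles = np.linspace(0.0, np.pi, 1800, endpoint=False)
best_score, best_dir = -np.inf, None
for theta in angles:
    direction = np.array([np.cos(theta), np.sin(theta)])  # unit vector
    projections = X_centered @ direction  # signed distances along the line
    score = np.sum(projections ** 2)
    if score > best_score:
        best_score, best_dir = score, direction

print("best-fitting direction (PC1):", best_dir)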

CONTD.
WORKING

 To find the line that fits the data best, measure the distance from each data point to the candidate line.
 Then find the line that minimizes these distances. (Minimizing these perpendicular distances is equivalent to maximizing the distances of the projected points from the origin, since each point's squared distance from the origin is fixed.)
 We normalize the best-fitting vector so that its length is one unit.
 Such a unit vector is called a singular vector or eigenvector.
 Eigenvectors are anchored at the origin and only indicate a direction. The sum of squared distances (SSD) of the projected points from the origin is a scaling value for that direction; it is a representation of the eigenvalue.
 The square root of the SSD is the singular value.
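The same quantities fall out of a standard eigendecomposition. A sketch on the centered data from the previous snippet, showing how the eigenvalue and singular value relate to the SSD (the n − 1 divisor is the sample-covariance convention):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])
X_centered = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix yields unit eigenvectors
# (directions) and eigenvalues (variance along each direction).
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order

pc1 = eigenvectors[:, -1]                  # direction of PC1
projections = X_centered @ pc1
ssd = np.sum(projections ** 2)             # sum of squared distances

n = X_centered.shape[0]
print(eigenvalues[-1], ssd / (n - 1))      # eigenvalue = SSD / (n - 1)
print(np.sqrt(ssd))                        # singular value = sqrt(SSD)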
ALGORITHM

1. Data acquisition
2. Standardization of the data set
3. Calculation of the covariance matrix
4. Calculation of eigenvalues and eigenvectors
5. Choosing components and forming a feature vector
6. Deriving a new data set
IMPLEMENTATION
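A minimal from-scratch NumPy sketch of the six steps above (the data set is synthetic and the choice of two components is illustrative):

import numpy as np

# Step 1: data acquisition (illustrative synthetic data set).
rng = np.random.default_rng(4)
X = rng.normal(size=(150, 3)) @ rng.normal(size=(3, 3))

# Step 2: standardization (zero mean, unit variance per column).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 3: covariance matrix of the standardized data.
cov = np.cov(X_std, rowvar=False)

# Step 4: eigenvalues and eigenvectors of the covariance matrix.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Step 5: choose the top-k components (largest eigenvalues) and stack
# their eigenvectors into a feature vector (projection matrix).
k = 2
order = np.argsort(eigenvalues)[::-1]
feature_vector = eigenvectors[:, order[:k]]

# Step 6: derive the new data set by projecting onto the components.
X_new = X_std @ feature_vector
print(X_new.shape)  # (150, 2)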
CONCLUSION

 Dimensionality reduction is, simply, the process of reducing the dimension of your feature set.
 Your feature set could be a data set with a hundred columns (i.e. features), or it could be an array of points that make up a large sphere in three-dimensional space.
 The principal components of a collection of points in a real p-dimensional space are a sequence of direction vectors, where the i-th vector is the direction of a line that best fits the data while being orthogonal to the first i − 1 vectors.
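In symbols (a standard formulation, not taken from these slides): with the data matrix X centered so that every column has zero mean, the direction vectors are

\begin{aligned}
\mathbf{w}_1 &= \operatorname*{arg\,max}_{\lVert \mathbf{w} \rVert = 1} \lVert X\mathbf{w} \rVert^2, \\
\mathbf{w}_i &= \operatorname*{arg\,max}_{\lVert \mathbf{w} \rVert = 1,\; \mathbf{w} \perp \mathbf{w}_1, \dots, \mathbf{w}_{i-1}} \lVert X\mathbf{w} \rVert^2 .
\end{aligned}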
