PRINCIPAL COMPONENT ANALYSIS
Presented by:
Zoha Ahmed (F20604017)
Faria Shoaib (F20604022)
Qundeel Saleem (F20604028)
Eesha Noor (F20604030)
Aliza Mushtaq (F20604039)
INTRODUCTION
Principal Component Analysis (PCA) re-expresses a data set in terms of new, uncorrelated variables called principal components. The first principal component is the direction along which the data varies the most.
CONTD.
INTRODUCTION
The second principal component describes the remaining variance in the data and is uncorrelated with the first principal component.
Be aware that the PCA transformation depends on how the original variables were scaled relative to one another.
Before applying PCA, the ranges of the data columns must be normalized. Note that the new coordinates no longer correspond to the original, system-produced variables.
Your data set becomes less interpretable after applying PCA.
PCA is not the right transformation for your application if the interpretability of the results is crucial to your analysis.
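As a concrete illustration of scaling before PCA, here is a minimal sketch using scikit-learn; the matrix X and its values are made up for illustration:

# Minimal sketch: normalize the columns before applying PCA.
# Assumes scikit-learn is installed; X is a made-up example matrix.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.array([[2.5, 2400.0],
              [0.5,  700.0],
              [2.2, 2900.0],
              [1.9, 2200.0],
              [3.1, 3000.0]])  # columns on very different scales

X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per column
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(pca.explained_variance_ratio_)  # variance captured by each component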
WHY PCA?
Curse of Dimensionality:
These are the issues that arise when dealing with high-dimensional data. Some problem sets may have:
A large number of features
Making the model extremely slow
Even making it difficult to find a solution
Hence, more data is good, but more detailed data might not be.
CONTD.
SOLUTION: DIMENSIONALITY REDUCTION
Example
DIMENSIONALITY REDUCTION TECHNIQUES
Example
WORKING
Plot some data on a graph and calculate the average measurement for the sample.
With these average values, we can calculate the center of our data and shift the data so that its center sits at the origin of the graph (0, 0).
Now we will try to fit a line to this data: we start by drawing a random line that passes through the origin of the graph.
Then we rotate the line until it fits the data as well as it can, with the condition that the line must still pass through the origin.
In the end, some line will fit best.
PCA finds the line that maximizes the sum of squared distances (SSD) from the projected points to the origin.
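The rotating-line idea can be sketched directly as a brute-force search over angles; this is only an illustration of the intuition, not how PCA is computed in practice, and the toy data below is made up:

# Brute-force illustration of the rotating-line intuition for 2-D data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # correlated toy data
X = X - X.mean(axis=0)  # shift so the center of the data sits at the origin

best_angle, best_ssd = 0.0, -np.inf
for angle in np.linspace(0.0, np.pi, 1800):      # rotate a line through the origin
    direction = np.array([np.cos(angle), np.sin(angle)])
    projections = X @ direction                  # signed distances of projected points to the origin
    ssd = np.sum(projections ** 2)               # sum of squared distances (SSD)
    if ssd > best_ssd:
        best_angle, best_ssd = angle, ssd

print("best-fit direction:", np.cos(best_angle), np.sin(best_angle))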
CONTD.
WORKING
To find the line that fits the data the best way possible, measure the distance
from each data point to the candidate line.
Then find the line that minimizes these distances.
We normalize the resulting vectors so that each is one unit long.
Such unit vectors are called singular vectors or eigenvectors.
Eigenvectors have the characteristic that, no matter what, their line always
passes through the origin. The SSD is a scaling value or, in other words, a
representation of the eigenvalue.
The square root of the SSD is the singular value.
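These relationships can be checked numerically with NumPy; the toy data below is made up for illustration:

# Relationship between eigenvalues of X^T X and singular values of X.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2)) @ np.array([[2.0, 0.0], [0.8, 0.3]])
X = X - X.mean(axis=0)                        # center the data

eigenvalues, eigenvectors = np.linalg.eigh(X.T @ X)  # eigenvalues are the SSDs
U, singular_values, Vt = np.linalg.svd(X, full_matrices=False)

print(np.sqrt(eigenvalues[::-1]))             # square roots of the SSDs ...
print(singular_values)                        # ... match the singular values
print(np.linalg.norm(eigenvectors, axis=0))   # eigenvectors have unit length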
ALGORITHM
Choosing components and forming a feature vector
Deriving a new data set
IMPLEMENTATION
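The original implementation slides are not reproduced here; the following is a minimal sketch of the algorithm above in NumPy, assuming a data matrix X with samples in rows (the function name and example data are illustrative):

# Minimal PCA pipeline in NumPy: standardize, covariance, eigen-decomposition,
# choose components (feature vector), derive the new data set.
import numpy as np

def pca(X, n_components):
    # 1. Standardize each column (zero mean, unit variance).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized data.
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvectors and eigenvalues of the covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort by eigenvalue (descending) and keep the top components
    #    as the feature vector.
    order = np.argsort(eigenvalues)[::-1]
    feature_vector = eigenvectors[:, order[:n_components]]
    # 5. Derive the new data set by projecting onto the feature vector.
    return X_std @ feature_vector

X = np.random.default_rng(2).normal(size=(20, 5))
X_reduced = pca(X, n_components=2)
print(X_reduced.shape)  # (20, 2)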
CONTD.
CONCLUSION
Dimensionality reduction is simply the process of reducing the dimension of your feature set.
Your feature set could be a dataset with a hundred columns (i.e., features), or it could be an array of points that make up a large sphere in three-dimensional space.
The principal components of a collection of points in a real p-space are a sequence of p direction vectors, where the i-th vector is the direction of a line that best fits the data while being orthogonal to the first i - 1 vectors.
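Formally (this is the standard textbook definition, with X the centered data matrix and w a unit-length direction vector; the symbols are not from the slides):

\[
\mathbf{w}_1 = \operatorname*{arg\,max}_{\lVert \mathbf{w} \rVert = 1} \lVert X\mathbf{w} \rVert^2,
\qquad
\mathbf{w}_k = \operatorname*{arg\,max}_{\substack{\lVert \mathbf{w} \rVert = 1 \\ \mathbf{w} \perp \mathbf{w}_1, \dots, \mathbf{w}_{k-1}}} \lVert X\mathbf{w} \rVert^2 .
\]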