
Principal Component Analysis (PCA)
What is PCA?
• The main idea of Principal Component Analysis (PCA) is
to reduce the dimensionality of a data set consisting of
many mutually correlated variables while retaining as much
of the variation present in the data set as possible.
Need For PCA

Problem of Overfitting

Inaccurate Assessment of Target Values


[Figure: the original high-dimensional data set (overfitted) beside the two-dimensional data set after PCA (best fit)]
PCA Method
• 1. Standardize the data.
• 2. Generate the covariance matrix.
• 3. Obtain Eigenvectors and Eigenvalues from the covariance matrix.
• 4. Sort the eigenvalues in descending order.
• 5. Select the k eigenvectors with the largest eigenvalues.
• 6. Construct a projection matrix from the selected k eigenvectors and use it to transform the standardized data.
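The six steps can be sketched by hand with NumPy. This is only an illustrative sketch, not the approach used in the rest of the slides: the function name `pca_from_scratch` and the random demo data are assumptions made for this example.

```python
import numpy as np

def pca_from_scratch(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # 1. Standardize: zero mean and unit variance for every column.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features (features are columns).
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvalues and eigenvectors; eigh is meant for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort the eigenvalues (and matching eigenvectors) in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Keep the k eigenvectors with the largest eigenvalues.
    W = eigvecs[:, :k]
    # 6. Use them as a projection matrix to transform the data.
    return X_std @ W, eigvals

# Hypothetical demo data: 100 samples, 5 features, two of them correlated.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
X_demo[:, 1] = 2 * X_demo[:, 0] + 0.1 * X_demo[:, 1]

Z, eigvals = pca_from_scratch(X_demo, k=2)
print(Z.shape)  # (100, 2)
```

Each eigenvalue equals the variance of the data along its eigenvector, which is why keeping the largest ones retains the most variation.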

BUT DON’T WORRY, WE WILL BE USING A SHORTCUT APPROACH TO PCA.


So we are going to implement PCA on the Boston House Prices Dataset
IMPORTING THE DATASET
DATA PREPROCESSING
EXPLORATORY DATA ANALYSIS
1. Understanding the values of the target column by plotting a histogram
INFERENCE: We see that the values of 'target' are approximately normally distributed, with a few outliers.
• Next, we create a correlation matrix that measures the linear
relationships between the variables. We will use the heatmap function
from the seaborn library to plot the correlation matrix.
• The correlation coefficient ranges from -1 to 1. If the value is close to
1, it means that there is a strong positive correlation between the two
variables. When it is close to -1, the variables have a strong negative
correlation.
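As a small illustration of how to read these coefficients, the sketch below builds a correlation matrix with NumPy for four made-up variables. The variables `a`, `b`, `c` and `d` are hypothetical and are not columns of the Boston data set; the slides themselves plot the matrix with seaborn's heatmap.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = 3 * a + rng.normal(scale=0.1, size=200)  # rises with a: strong positive correlation
c = -a + rng.normal(scale=0.1, size=200)     # falls as a rises: strong negative correlation
d = rng.normal(size=200)                     # generated independently of a

# Each column is one variable, so rowvar=False.
corr = np.corrcoef(np.column_stack([a, b, c, d]), rowvar=False)
print(np.round(corr, 2))
```

The diagonal is always 1 because every variable is perfectly correlated with itself, and the matrix is symmetric.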
StandardScaler is a common method used to
standardize data: for each feature, the mean
is subtracted from every value and the result
is divided by the standard deviation.
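The same calculation can be written out by hand. This is a minimal sketch: the helper `standardize` and the small array are made up for illustration, and the population standard deviation (`ddof=0`) is used because that is what StandardScaler uses.

```python
import numpy as np

def standardize(X):
    """Subtract each column's mean and divide by its standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population standard deviation (ddof=0)
    return (X - mean) / std

# Hypothetical data: two features on very different scales.
X_small = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])
X_scaled_small = standardize(X_small)
print(X_scaled_small)
```

After scaling, both columns have mean 0 and standard deviation 1, so neither feature dominates the covariance matrix just because of its units.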
Fitting the Standardized Dataset Using the PCA Class of sklearn
Displaying Principal Components
Concatenating Principal Components with Target Values

This is our required dataset, which has been reduced from (506, 14) to (506, 3).
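The scale, fit and concatenate steps can be sketched end to end as follows. This is a sketch under stated assumptions, not the code from the original slides: random stand-in data with 13 feature columns and one target column is used in place of the Boston data (the `load_boston` loader was removed from scikit-learn in version 1.2), and the names `features`, `target`, `PC1` and `PC2` are made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data with the same shape as the data set in the slides: (506, 14).
rng = np.random.default_rng(42)
features = pd.DataFrame(rng.normal(size=(506, 13)),
                        columns=[f"f{i}" for i in range(13)])
target = pd.Series(rng.normal(size=506), name="target")

# Standardize the features, then keep the first two principal components.
X_scaled = StandardScaler().fit_transform(features)
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

# Concatenate the principal components with the target values.
pc_df = pd.DataFrame(components, columns=["PC1", "PC2"])
reduced = pd.concat([pc_df, target], axis=1)

print(reduced.shape)                  # (506, 3)
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```

Checking `explained_variance_ratio_` on the real data shows how much of the original variation the two components retain.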
Plotting of Principal Components
