
Ex. No: 5 Principal Component Analysis

Aim

To illustrate the functionality of Principal Component Analysis using Python.

Procedure
Step 1: Standardize the dataset.
Step 2: Calculate the covariance matrix for the features in the dataset.
Step 3: Calculate the eigenvalues and eigenvectors for the covariance matrix.
Step 4: Sort eigenvalues and their corresponding eigenvectors.
Step 5: Pick k eigenvalues and form a matrix of eigenvectors.
Step 6: Transform the original matrix.
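The six steps above can be sketched directly with NumPy (a minimal illustration for reference only, not part of the program below; the small two-feature dataset is made up for demonstration):

```python
import numpy as np

# Step 1: standardize the dataset (zero mean, unit variance per feature)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features
cov = np.cov(X_std, rowvar=False)

# Step 3: eigenvalues and eigenvectors (eigh suits symmetric matrices)
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: sort eigenvalues and matching eigenvectors in descending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 5: pick the top k eigenvectors as the projection matrix
k = 1
W = eigvecs[:, :k]

# Step 6: transform the original (standardized) matrix
X_pca = X_std @ W
print(X_pca.shape)  # (6, 1)
```

scikit-learn's `PCA`, used in the program below, performs steps 2-6 internally (it centers the data but does not standardize it, which is why `StandardScaler` is often added as a pipeline step).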

Program

from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
# define dataset
X, y = make_classification(n_samples=1000, n_features=20,
n_informative=15, n_redundant=5, random_state=7)
# define the model
steps = [('pca', PCA(n_components=15)), ('m', LogisticRegression())]
model = Pipeline(steps=steps)
# fit the model on the whole dataset
model.fit(X, y)
# make a single prediction
row = [[0.2929949, -4.21223056, -1.288332, -2.17849815, -0.64527665,
        2.58097719, 0.28422388, -7.1827928, -1.91211104, 2.73729512,
        0.81395695, 3.96973717, -2.66939799, 3.34692332, 4.19791821,
        0.99990998, -0.30201875, -4.43170633, -2.82646737, 0.44916808]]
yhat = model.predict(row)
print('Predicted Class: %d' % yhat[0])
Output
Predicted Class: 1
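As a quick check on the fitted pipeline (an optional addition, not part of the original listing), the PCA step can be pulled out via `named_steps` and its explained variance inspected; the dataset and pipeline are rebuilt here so the snippet is self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# same dataset and pipeline definition as in the program above
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=15, n_redundant=5, random_state=7)
model = Pipeline([('pca', PCA(n_components=15)), ('m', LogisticRegression())])
model.fit(X, y)

# fraction of total variance captured by each of the 15 retained components
ratios = model.named_steps['pca'].explained_variance_ratio_
print('components kept: %d' % len(ratios))
print('variance retained: %.3f' % ratios.sum())
```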

Output Interpretation
>1 0.542 (0.048)
>2 0.713 (0.048)
>3 0.720 (0.053)
>4 0.723 (0.051)
>5 0.725 (0.052)
>6 0.730 (0.046)
>7 0.805 (0.036)
>8 0.800 (0.037)
>9 0.814 (0.036)
>10 0.816 (0.034)
>11 0.819 (0.035)
>12 0.819 (0.038)
>13 0.819 (0.035)
>14 0.853 (0.029)
>15 0.865 (0.027)
>16 0.865 (0.027)
>17 0.865 (0.027)
>18 0.865 (0.027)
>19 0.865 (0.027)
>20 0.865 (0.027)

We see a general trend of increased performance as the number of dimensions is increased. On this
dataset, the results suggest a trade-off between the number of dimensions and the classification
accuracy of the model. Interestingly, we see no improvement beyond 15 components. This matches the
definition of the problem, where 15 of the 20 input features are informative and the remaining five
are redundant. A new row of data with 20 columns is provided; it is automatically transformed to 15
components and fed to the logistic regression model to predict the class label.
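The per-component accuracies listed above come from evaluating the pipeline at each component count with repeated cross-validation; the program itself fits only a single 15-component model. A reduced sketch of such a sweep (fewer component counts and folds than the full table, to keep runtime short, and the `>k mean (std)` output format matching the listing above):

```python
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# same dataset definition as the main program
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=15, n_redundant=5, random_state=7)

# evaluate the PCA + logistic regression pipeline at several component counts
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
for k in (5, 10, 15):
    model = Pipeline([('pca', PCA(n_components=k)), ('m', LogisticRegression())])
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv)
    print('>%d %.3f (%.3f)' % (k, mean(scores), std(scores)))
```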

Result
Thus the program to illustrate Principal Component Analysis was executed successfully.
