ML Lab R20
GARIVIDI
Vizianagaram Dist (AP)
SEMESTER : III – I
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE ENGINEERING (AI&ML)
CERTIFICATE
This is to certify that this is the bona fide record of the work done in ______________________
Laboratory by Mr./Ms. _______________________________________________________
bearing Regd. No./Roll No. _________________________ of __________________________
course during ______________________________________________________________
EXTERNAL EXAMINER
INDEX
Experiment-1: FIND-S Algorithm
Experiment-2: Candidate-Elimination Algorithm
Experiment-3: Decision Tree (ID3) Algorithm
Experiment-4: Linear Regression, Logistic Regression, Binary Classifier
Experiment-5: Bias, Variance, Remove Duplicates, Cross Validation
Experiment-6: Categorical Encoding, One-hot Encoding
Experiment-7: Artificial Neural Network with Backpropagation
Experiment-8: k-Nearest Neighbour on the Iris Data Set
Experiment-9: Locally Weighted Regression
Experiment-10: Naïve Bayesian Document Classifier
Experiment-11: EM and k-Means Clustering
Experiment-12: Exploratory Data Analysis with Pandas/Matplotlib
Experiment-13: Bayesian Network for Medical Diagnosis
Experiment-14: Support Vector Machines and PCA
Experiment-15: Principal Component Analysis
Experiment-1
AIM: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file.
Data set:
sky Airtemp Humidity Wind Water Forecast EnjoySports
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes
PROGRAM:
import csv

num_attributes = 6
a = []

# Read the training examples from the CSV file (filename assumed)
with open('enjoysports.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    print("\nThe Given Training Data Set\n")
    for row in reader:
        a.append(row)
        print(row)

# Start from the most specific hypothesis
hypothesis = ['0'] * num_attributes
print("\nThe initial value of hypothesis:\n")
print(hypothesis)

# Seed the hypothesis with the first example, then generalise it
# attribute by attribute on every positive example
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

print("\nFind S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    if a[i][num_attributes] == 'yes':
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    print("For Training instance No:", i, "the hypothesis is", hypothesis)

print("\nThe Maximally Specific Hypothesis for a given Training Examples:\n")
print(hypothesis)
OUTPUT:
The Given Training Data Set
['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']

The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']

Find S: Finding a Maximally Specific Hypothesis
For Training instance No: 0 the hypothesis is ['sunny', 'warm', 'normal', 'strong', 'warm', 'same']
For Training instance No: 1 the hypothesis is ['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training instance No: 2 the hypothesis is ['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training instance No: 3 the hypothesis is ['sunny', 'warm', '?', 'strong', '?', '?']

The Maximally Specific Hypothesis for a given Training Examples:
['sunny', 'warm', '?', 'strong', '?', '?']
Experiment-2
AIM: For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.
Data set:
sky Airtemp Humidity Wind Water Forecast PlayTennis
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes
PROGRAM:
import numpy as np
import pandas as pd

# Load the training examples (filename as in the original record)
data = pd.DataFrame(data=pd.read_csv('enjoyysports.csv'))
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])
print(concepts)
print(target)

# S starts at the first example; G starts as the most general boundary
specific_h = concepts[0].copy()
print(specific_h)
general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]
print(general_h)

for i, h in enumerate(concepts):
    if target[i] == "yes":
        # Positive example: generalise S and drop the parts of G it contradicts
        for x in range(len(specific_h)):
            if h[x] != specific_h[x]:
                specific_h[x] = '?'
                general_h[x][x] = '?'
        print(specific_h)
        print(general_h)
    if target[i] == "no":
        # Negative example: specialise G against S
        for x in range(len(specific_h)):
            if h[x] != specific_h[x]:
                general_h[x][x] = specific_h[x]
            else:
                general_h[x][x] = '?'

# Discard the rows of G that stayed fully general
indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
for i in indices:
    general_h.remove(['?', '?', '?', '?', '?', '?'])

print("Final Specific_h:", specific_h)
print("Final General_h:", general_h)
OUTPUT:
[['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
['sunny' 'warm' 'high' 'strong' 'warm' 'same']
['rainy' 'cold' 'high' 'strong' 'warm' 'change']
['sunny' 'warm' 'high' 'strong' 'cool' 'change']]
['yes' 'yes' 'no' 'yes']
Final Specific_h:
['sunny' 'warm' '?' 'strong' '?' '?']
Final General_h:
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]
Experiment-3
AIM: Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify
a new sample.
Data set:
outlook temp humidity windy pt
sunny hot high weak no
sunny hot high strong no
overcast hot high weak yes
rainy mild high weak yes
rainy cool normal weak yes
rainy cool normal strong no
overcast cool normal strong yes
sunny mild high weak no
sunny cool normal weak yes
rainy mild normal weak yes
sunny mild normal strong yes
overcast mild high strong yes
overcast hot normal weak yes
rainy mild high strong no
PROGRAM:
import numpy as np
import math
import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    # Partition the rows into one subtable per value of the chosen column
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1
    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)
    return items, dict

def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros((items.shape[0], 1))
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)
    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    # Information gain of splitting on col, normalised by the intrinsic value
    items, dict = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))
    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv

def create_node(data, metadata):
    # A pure node becomes a leaf carrying the class label
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dict = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))
    return node

def empty(size):
    s = ""
    for x in range(size):
        s += "  "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

metadata, traindata = read_data("tennisdata.csv")   # filename assumed
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)
OUTPUT:
outlook
  overcast
    b'yes'
  rainy
    windy
      b'strong'
        b'no'
      b'weak'
        b'yes'
  sunny
    humidity
      b'high'
        b'no'
      b'normal'
        b'yes'
Experiment-4
AIM: Exercises to solve the real-world problems using the following machine learning methods:
a) Linear Regression
b) Logistic Regression
c) Binary Classifier
a) Linear Regression:
Linear regression is probably one of the most important and widely used regression techniques. It’s
among the simplest regression methods. One of its main advantages is the ease of interpreting results.
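Interpretability here is concrete: with a single feature the fitted model is just y = w * x + b, so any prediction can be checked by hand from the coefficient and intercept that the program below prints. A small illustration (the feature value is an arbitrary example; w and b are rounded from the OUTPUT):
import numpy as np

w, b = 941.43, 153.40   # coefficient and intercept printed by the program below
x = np.array([0.05])    # an arbitrary scaled feature value
print(w * x + b)        # the model's prediction, roughly 200.47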
PROGRAM:
from sklearn import datasets
from sklearn import metrics
disease = datasets.load_diabetes()
print(disease.keys())
import numpy as np
disease_X = disease.data[:, np.newaxis,2]
disease_X_train = disease_X[:-30]
disease_X_test = disease_X[-20:]
disease_Y_train = disease.target[:-30]
disease_Y_test = disease.target[-20:]
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(disease_X_train,disease_Y_train)
y_predict = reg.predict(disease_X_test)
# Note: despite the name, this reports the mean squared error (lower is better)
accuracy = metrics.mean_squared_error(disease_Y_test, y_predict)
print("accuracy=", accuracy)
weights = reg.coef_
intercept = reg.intercept_
print(weights,intercept)
import matplotlib.pyplot as plt
plt.scatter(disease_X_test, disease_Y_test)
plt.plot(disease_X_test,y_predict)
plt.show()
OUTPUT:
dict_keys(['data', 'target', 'frame', 'DESCR', 'feature_names', 'data_filename','target_filename',
'data_module'])
accuracy= 2561.3204277283867
[941.43097333] 153.39713623331698
b) Logistic Regression:
Logistic regression is a classification algorithm used when the target variable is
categorical. Its main objective is to model the relationship between the
features and the probability of a particular outcome.
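Concretely, the model passes a weighted sum of the features through the sigmoid function to obtain that probability. A minimal illustration (the weights below are made-up numbers, not fitted values):
import numpy as np

def sigmoid(z):
    # squashes any real number into the (0, 1) probability range
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.08, 0.00002])   # hypothetical weights for [Age, EstimatedSalary]
b = -5.0                        # hypothetical intercept
x = np.array([45, 60000])       # one customer
print(sigmoid(np.dot(w, x) + b))  # predicted probability that Purchased == 1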
Data Set:
User ID Gender Age EstimatedSalary Purchased
15624510 Male 19 19000 0
15810944 Male 25 20000 0
15668575 Female 26 43000 0
15603246 Female 27 57000 0
15804002 Male 19 76000 0
15728773 Male 27 58000 0
15598044 Female 27 84000 0
15694829 Female 32 150000 1
15600575 Male 25 33000 0
15727311 Female 35 65000 0
15570769 Female 26 80000 0
15606274 Female 26 52000 0
15746139 Male 20 86000 0
15704987 Male 32 18000 0
15628972 Male 18 82000 0
15697686 Male 29 80000 0
15733883 Male 47 25000 1
15617482 Male 45 26000 1
15704583 Male 46 28000 1
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
import warnings
warnings.filterwarnings("ignore")

dataset = pd.read_csv("User_Data.csv")
x = dataset.iloc[:, [2, 3]].values   # Age and EstimatedSalary
y = dataset.iloc[:, 4].values        # Purchased

# Split, scale, fit and evaluate (split parameters assumed)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25, random_state=0)
sc = StandardScaler()
xtrain = sc.fit_transform(xtrain)
xtest = sc.transform(xtest)
print(xtrain[0:10, :])
classifier = LogisticRegression(random_state=0)
classifier.fit(xtrain, ytrain)
y_pred = classifier.predict(xtest)
print("Confusion Matrix :\n", confusion_matrix(ytest, y_pred))
print("Accuracy :", accuracy_score(ytest, y_pred))

# Plot the test points coloured by class
X_set, y_set = xtest, ytest
X1, X2 = X_set[:, 0], X_set[:, 1]
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.show()
Output:
[[ 2.149452 -1.02601437]
[-0.28717375 0.70708966]
[-1.26182405 0.4720925 ]
[-0.40900504 -0.49727077]
[-0.28717375 -0.0566511 ]
[ 0.32198269 -1.23163688]
[ 0.68747655 0.14897141]
[ 0.32198269 2.6458162 ]
[ 1.90578942 -0.99663973]
[-0.40900504 -0.23289897]]
Confusion Matrix :
[[4 0]
[0 1]]
Accuracy : 1.0
c) Binary Classifier:
In machine learning, binary classification is a supervised learning task in which
new observations are categorized into one of two classes.
If the model successfully predicts the patients as positive, this case is called True
Positive (TP). If the model successfully predicts patients as negative, this is called
True Negative (TN).
The binary classifier may misdiagnose some patients as well. If a diseased patient is
classified as healthy by a negative test result, this error is called False Negative (FN).
Similarly, if a healthy patient is classified as diseased by a positive test result, this
error is called False Positive (FP).
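These four counts are exactly what a confusion matrix holds. A small sketch with made-up labels (not drawn from any data set in this record):
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # hypothetical ground truth (1 = diseased)
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]   # hypothetical test results
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)   # TP: 3 TN: 3 FP: 1 FN: 1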
PROGRAM:
from collections import Counter
from sklearn.datasets import make_blobs
from matplotlib import pyplot

# Generate a two-class data set (parameters chosen to match the OUTPUT below)
X, y = make_blobs(n_samples=1000, centers=2, random_state=1)
print(X.shape, y.shape)
counter = Counter(y)
print(counter)
for i in range(10):
    print(X[i], y[i])
# Scatter plot of the samples, coloured by class
for label, _ in counter.items():
    pyplot.scatter(X[y == label, 0], X[y == label, 1], label=str(label))
pyplot.legend()
pyplot.show()
OUTPUT:
(1000, 2) (1000,)
Counter({0: 500, 1: 500})
[-3.05837272 4.48825769] 0
[-8.60973869 -3.72714879] 1
[1.37129721 5.23107449] 0
[-9.33917563 -2.9544469 ] 1
[-11.57178593 -3.85275513] 1
[-11.42257341 -4.85679127] 1
[-10.44518578 -3.76476563] 1
[-10.44603561 -3.26065964] 1
[-0.61947075 3.48804983] 0
[-10.91115591 -4.5772537] 1
Experiment-5
AIM: Develop a program for Bias, Variance, Remove duplicates, Cross Validation.
PROGRAM:
from mlxtend.evaluate import bias_variance_decomp
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import warnings
warnings.filterwarnings('ignore')

X, y = fetch_california_housing(return_X_y=True)
# Split ratio and seed are assumptions; the recorded split was not preserved
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
model_lr = LinearRegression()
mse, bias, var = bias_variance_decomp(model_lr, X_train, y_train, X_test, y_test,
                                      loss='mse', num_rounds=200, random_seed=123)
print('MSE from bias_variance lib [avg expected loss]: %.3f' % mse)
print('Avg Bias: %.3f' % bias)
print('Avg Variance: %.3f' % var)
model_lr.fit(X_train, y_train)
y_pred = model_lr.predict(X_test)
print('Mean Square error by Sckit-learn lib: %.3f' % mean_squared_error(y_test, y_pred))
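The AIM also names duplicate removal and cross-validation, which the recorded listing never reaches. A short continuation sketching both steps (the 5-fold choice is an assumption):
import pandas as pd
from sklearn.model_selection import cross_val_score

# Remove duplicate rows before modelling
df = pd.DataFrame(X)
df['target'] = y
before = df.shape[0]
df = df.drop_duplicates()
print('Duplicate rows removed:', before - df.shape[0])

# 5-fold cross-validated MSE for the same linear model
scores = cross_val_score(model_lr, df.iloc[:, :-1], df['target'],
                         cv=5, scoring='neg_mean_squared_error')
print('Cross-validated MSE:', -scores.mean())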
OUTPUT:
MSE from bias_variance lib [avg expected loss]: 0.527
Avg Bias: 0.525
Avg Variance: 0.002
Mean Square error by Sckit-learn lib: 0.527
Experiment-6
AIM: Write a program to implement Categorical Encoding, One-hot Encoding.
Data set:
Sno Empid Gender Remarks
0 45 male Nice
1 78 female Good
2 56 female Great
3 12 male Great
4 7 female Nice
PROGRAM:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

data = pd.read_csv("emp.csv")
print(data.head())
print(data['Gender'].unique())
print(data['Remarks'].unique())
print(data['Gender'].value_counts())
print(data['Remarks'].value_counts())
encoder = OneHotEncoder(sparse=False)   # on scikit-learn >= 1.2, use sparse_output=False
onehot = encoder.fit_transform(data)
print(onehot)
OUTPUT:
Sno Empid Gender Remarks
0 0 45 male Nice
1 1 78 female Good
2 2 56 female Great
3 3 12 male Great
4 4 7 female Nice
['male' 'female']
['Nice' 'Good' 'Great']
female 3
male 2
Name: Gender, dtype: int64
Nice 2
Great 2
Good 1
Name: Remarks, dtype: int64
[[1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 1. 0.]
[0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 1. 0. 1. 0.]
[0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1. 0. 0. 0. 1.]]
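The AIM also asks for plain categorical (label) encoding, which the listing above skips. A minimal sketch on the same columns, assuming the emp.csv file from the data set:
import pandas as pd

data = pd.read_csv("emp.csv")
# Map each category to an integer code (categorical / label encoding)
data['Gender_code'] = data['Gender'].astype('category').cat.codes
data['Remarks_code'] = data['Remarks'].astype('category').cat.codes
print(data[['Gender', 'Gender_code', 'Remarks', 'Remarks_code']])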
Experiment-7
AIM: Build an Artificial Neural Network by implementing the Back propagation algorithm
and test the same using appropriate data sets.
PROGRAM:
import numpy as np

# Training data chosen to match the normalised Input in the OUTPUT below
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)   # normalise features column-wise
y = y/100                  # scale targets into (0, 1)

def sigmoid(x):
    return 1/(1 + np.exp(-x))

def derivatives_sigmoid(x):
    return x * (1 - x)

epoch=1   # as in the recorded run; increase (e.g. 5000) for a closer fit
lr=0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

# Random initialisation of weights and biases
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))

for i in range(epoch):
    # Forward propagation
    hinp1=np.dot(X,wh)
    hinp=hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1=np.dot(hlayer_act,wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    # Backpropagation of the error
    EO = y-output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) *lr
    wh += X.T.dot(d_hiddenlayer) *lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n" + str(output))
OUTPUT:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.75910827]
[0.75067151]
[0.76258194]]
Experiment-8
AIM: Write a program to implement k-Nearest Neighbor algorithm to classify the iris data
set. Print both correct and wrong predictions.
PROGRAM:
import pandas as pd
from sklearn.datasets import load_iris
iris=load_iris()
iris.keys()
df=pd.DataFrame(iris['data'])
print(df)
print(iris['target_names'])
X=df
y=iris['target']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.neighbors import KNeighborsClassifier
knn=KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train,y_train)
import numpy as np
import warnings
warnings.filterwarnings('ignore')
x_new=np.array([[5,2.9,1,0.2]])
prediction=knn.predict(x_new)
iris['target_names'][prediction]
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
y_pred=knn.predict(X_test)
cm=confusion_matrix(y_test,y_pred)
print(cm)
print(" correct predicition",accuracy_score(y_test,y_pred))
print(" worng predicition",(1-accuracy_score(y_test,y_pred)))
OUTPUT:
0 1 2 3
[[19 0 0]
[ 0 15 0]
[ 0 1 15]]
Experiment-9
AIM: Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select appropriate data set for your experiment and draw graphs.
PROGRAM:
from math import ceil
import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt

def lowess(x, y, f, iterations):
    # Locally weighted regression with a tricube kernel and robustifying weights
    n = len(x)
    r = int(ceil(f * n))
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest

import math
n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)   # noisy sine data (assumed)
f = 0.25
iterations = 10
yest = lowess(x, y, f, iterations)
plt.plot(x,y,"r.")
plt.plot(x,yest,"b-")
OUTPUT:
[<matplotlib.lines.Line2D at 0x1755ac36cd0>]
Experiment-10
AIM: Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to write the
program. Calculate the accuracy, precision, and recall for your data set.
Data set:
I love this sandwich pos
This is an amazing place pos
I feel very good about these beers pos
This is my best work pos
What an awesome view pos
I do not like this restaurant neg
I am tired of this stuff neg
I can't deal with this neg
He is my sworn enemy neg
My boss is horrible neg
This is an awesome place pos
PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
import warnings
warnings.filterwarnings("ignore")

msg = pd.read_csv('document.csv', names=['message', 'label'])   # filename assumed
print("Total Instances of Dataset:", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names())   # get_feature_names_out() on newer scikit-learn
print(df[0:5])
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
print('Accuracy Metrics:')
print('Accuracy:', accuracy_score(ytest, pred))
print('Recall:', recall_score(ytest, pred))
print('Precision:', precision_score(ytest, pred))
print('Confusion Matrix:\n', confusion_matrix(ytest, pred))
OUTPUT:
Total Instances of Dataset: 11
0 1 0 0 0 1 0 0 0 0 0 ... 0
1 0 0 0 0 0 1 0 0 0 0 ... 0
2 0 0 0 0 0 0 0 1 1 0 ... 0
3 0 0 1 1 0 0 0 0 0 0 ... 0
4 0 0 1 1 0 0 0 0 0 0 ... 0
0 0 1 0 0 1 0 0 0 0
1 0 0 1 0 0 0 0 0 1
2 0 0 1 0 0 0 0 1 0
3 0 0 0 0 0 1 1 0 0
4 0 0 1 0 0 0 0 0 0
[5 rows x 29 columns]
Accuracy Metrics:
Accuracy: 0.3333333333333333
Recall: 0.5
Precision: 0.5
Confusion Matrix:
[[0 1]
[1 1]]
Experiment-11
AIM: Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in the
program.
PROGRAM:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np
iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']
model = KMeans(n_clusters=3)
model.fit(X)
plt.figure(figsize=(14,7))
colormap = np.array(['red', 'lime', 'black'])
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Clusters')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.subplot(1, 3, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K-Means Clustering')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
from sklearn import preprocessing
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns = X.columns)
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3)   # three clusters, matching the three species
gmm.fit(xs)
y_cluster_gmm = gmm.predict(xs)
plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Clustering')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
print('Observation: The GMM using EM algorithm based clustering matched the true labels more closely than the Kmeans.')
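To put a number on that observation, one option (an addition, not part of the recorded run) is the adjusted Rand index of each clustering against the true labels:
from sklearn.metrics import adjusted_rand_score

# Closer to 1.0 means closer agreement with the true species labels
print('ARI for K-Means:', adjusted_rand_score(y.Targets, model.labels_))
print('ARI for GMM/EM :', adjusted_rand_score(y.Targets, y_cluster_gmm))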
OUTPUT:
Observation: The GMM using EM algorithm based clustering matched the true labels more
closely than the Kmeans.
Experiment-12
AIM: Exploratory Data Analysis for Classification using Pandas or Matplotlib.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as pp
for i in df:
df[i].hist()
pp.show()
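Only the histogram loop survives in the listing above. A sketch of the full program, with the frame reconstructed from the OUTPUT below:
import pandas as pd
import matplotlib.pyplot as pp

# Reconstructed from the OUTPUT: seven integers and their squares
df = pd.DataFrame({'nums': [1, 2, 3, 4, 5, 6, 7],
                   'sqrs': [1, 4, 9, 16, 25, 36, 49]})
print("The data is :\n", df)
print("Calling head :\n", df.head())
print("Calling tail :\n", df.tail())
for i in df:
    print("For", i, ":")
    df[i].hist()
    pp.show()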
OUTPUT:
The data is :
nums sqrs
0 1 1
1 2 4
2 3 9
3 4 16
4 5 25
5 6 36
6 7 49
Calling head :
nums sqrs
0 1 1
1 2 4
2 3 9
3 4 16
4 5 25
Calling tail :
nums sqrs
2 3 9
3 4 16
4 5 25
5 6 36
6 7 49
For nums :
For sqrs :
Experiment-13
AIM: Write a Python program to construct a Bayesian network considering medical data.
Use this model to demonstrate the diagnosis of heart patients using standard Heart Disease
Data Set.
Data set:
age Gender Family diet Lifestyle cholestrol heartdisease
0 0 1 1 3 0 1
0 1 1 1 3 0 1
1 0 0 0 2 1 1
4 0 1 1 3 2 0
3 1 1 0 0 2 0
2 0 1 1 1 0 1
4 0 1 0 2 0 1
0 0 1 1 3 0 1
3 1 1 0 0 2 0
1 1 0 0 0 2 1
4 1 0 1 2 0 1
4 0 1 1 3 2 0
2 1 0 0 0 0 0
2 0 1 1 1 0 1
3 1 1 0 0 1 0
0 0 1 0 0 2 1
1 1 0 1 2 1 1
3 1 1 1 0 1 0
4 0 1 1 3 2 0
PROGRAM:
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination
data = pd.read_csv("heart.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease)
model = BayesianNetwork([('age', 'Lifestyle'), ('Gender', 'Lifestyle'),
    ('Family', 'heartdisease'), ('diet', 'cholestrol'),
    ('Lifestyle', 'diet'), ('cholestrol', 'heartdisease')])
model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)
HeartDisease_infer = VariableElimination(model)

# Query the network with evidence entered by the user (integer codes as in the data set)
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={
    'age': int(input('Enter Age:')),
    'Gender': int(input('Enter Gender:')),
    'Family': int(input('Enter Family History:')),
    'diet': int(input('Enter Diet:')),
    'Lifestyle': int(input('Enter Lifestyle:')),
    'cholestrol': int(input('Enter Cholestrol:'))
})
print(q)
OUTPUT:
age Gender Family diet Lifestyle cholestrol heartdisease
0 0 0 1 1 3 0 1
1 0 1 1 1 3 0 1
2 1 0 0 0 2 1 1
3 4 0 1 1 3 2 0
4 3 1 1 0 0 2 0
5 2 0 1 1 1 0 1
6 4 0 1 0 2 0 1
7 0 0 1 1 3 0 1
8 3 1 1 0 0 2 0
9 1 1 0 0 0 2 1
10 4 1 0 1 2 0 1
11 4 0 1 1 3 2 0
12 2 1 0 0 0 0 0
13 2 0 1 1 1 0 1
14 3 1 1 0 0 1 0
15 0 0 1 0 0 2 1
16 1 1 0 1 2 1 1
17 3 1 1 1 0 1 0
18 4 0 1 1 3 2 0
Enter Age:2
Enter Gender:1
Enter Family History:1
Enter Diet: 0
Enter Lifestyle:2
Enter Cholestrol: 2
+-----------------+---------------------+
| heartdisease | phi(heartdisease) |
+=================+=====================+
| heartdisease(0) | 0.8333 |
+-----------------+---------------------+
| heartdisease(1) | 0.1667 |
+-----------------+---------------------+
Experiment-14
AIM: Write a program to implement Support Vector Machines and Principal Component
Analysis.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
iris= pd.read_csv(r'iris.csv',header=0)
print(iris.columns)
print(iris.shape)
X = iris.drop('Species', axis=1)
iris['Species'] = iris['Species'].astype('category')
y = iris['Species'].cat.codes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
n_components = 2
pca = PCA(n_components=n_components)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train_pca, y_train)
y_pred = svm.predict(X_test_pca)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy of SVM with PCA:", accuracy)
plt.figure(figsize=(8, 6))
plt.scatter(X_train_pca[:, 0], X_train_pca[:, 1],
c=y_train, cmap=plt.cm.Paired, marker='o')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
h = .02   # mesh step for the decision-boundary sketch below
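The listing stops at the mesh step h, which suggests a decision-boundary plot was intended. A sketch of that plot (the grid margins are assumptions):
# Evaluate the trained SVM over a grid in PCA space and shade the regions
x_min, x_max = X_train_pca[:, 0].min() - 1, X_train_pca[:, 0].max() + 1
y_min, y_max = X_train_pca[:, 1].min() - 1, X_train_pca[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = svm.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.3)
plt.show()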
OUTPUT:
Index(['sno', 'Id', 'SepalLengthCm', 'SepalWidthCm',
(150, 7)
Experiment-15
AIM: Write a program to implement Principal Component Analysis.
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df.head()

# Standardise the features before PCA
scalar = StandardScaler()
scalar.fit(df)
scaled_data = scalar.transform(df)

# Project onto the first two principal components
pca = PCA(n_components=2)
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)

plt.figure(figsize=(8, 6))
plt.scatter(x_pca[:, 0], x_pca[:, 1], c=cancer['target'], cmap='plasma')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
OUTPUT:
Text(0, 0.5, 'Second Principal Component')