ML Lab R20

The document is a lab record for the Machine Learning course at Avanthi's St. Theressa Institute of Engineering & Technology, detailing various experiments and algorithms related to machine learning. It includes a certificate section, an index of experiments, and specific programming tasks such as implementing the FIND-S algorithm, Candidate-Elimination algorithm, and decision tree ID3 algorithm. Each experiment is outlined with aims, datasets, and sample code for practical implementation.


AVANTHI’S ST. THERESSA INSTITUTE OF ENGINEERING & TECHNOLOGY

GARIVIDI
Vizianagaram Dist (AP)

MACHINE LEARNING LAB RECORD

R20 CSE (AI&ML)

SEMESTER : III – I

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE ENGINEERING (AI&ML)
III-I CSE (AI&ML) PAGE NO:

AVANTHI’S ST. THERESSA INSTITUTE OF ENGINEERING & TECHNOLOGY


GARIVIDI
Vizianagaram Dist (AP)

CERTIFICATE
This is to certify that this is the bona fide record of the work done in ______________________
Laboratory by Mr./Ms. _______________________________________________________
bearing Regd. No./Roll No. _________________________ of __________________________
course during ______________________________________________________________

Total Number of Experiments held: __________        Total Number of Experiments done: ___________

LAB INCHARGE HEAD OF THE DEPARTMENT

EXTERNAL EXAMINER

ASTC MACHINE LEARNING - LAB (R20) ROLL NO:



INDEX

S.No.  EXPERIMENT

1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

4. Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression, b) Logistic Regression, c) Binary Classifier.

5. Develop a program for Bias, Variance, Remove duplicates, Cross Validation.

6. Write a program to implement Categorical Encoding, One-hot Encoding.

7. Build an Artificial Neural Network by implementing the Back propagation algorithm and test the same using appropriate data sets.

8. Write a program to implement the k-Nearest Neighbor algorithm to classify the iris data set. Print both correct and wrong predictions.

9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

10. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

11. Apply the EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

12. Exploratory Data Analysis for Classification using Pandas or Matplotlib.

13. Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.

14. Write a program to implement Support Vector Machines and Principal Component Analysis.

15. Write a program to implement Principal Component Analysis.


Experiment-1
AIM: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file.

Data set:
sky Airtemp Humidity Wind Water Forecast EnjoySports
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes

PROGRAM:
import csv

num_attributes = 6
a = []

print("\n The Given Training Data Set \n")
with open('enjoyysports.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Initialise the hypothesis with the first training instance
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    if a[i][num_attributes] == 'yes':
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    print(" For Training instance No:{0} the hypothesis is".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)

OUTPUT:
The Given Training Data Set

['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']

The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']

Find S: Finding a Maximally Specific Hypothesis

For Training instance No: 0 the hypothesis is ['sunny', 'warm', 'normal', 'strong', 'warm', 'same']
For Training instance No: 1 the hypothesis is ['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training instance No: 2 the hypothesis is ['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training instance No: 3 the hypothesis is ['sunny', 'warm', '?', 'strong', '?', '?']

The Maximally Specific Hypothesis for a given Training Examples:
['sunny', 'warm', '?', 'strong', '?', '?']


Experiment-2
AIM: For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.

Data set:
sky Airtemp Humidity Wind Water Forecast PlayTennis
sunny warm normal strong warm same yes
sunny warm high strong warm same yes
rainy cold high strong warm change no
sunny warm high strong cool change yes

PROGRAM:
import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('enjoyysports.csv'))

concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("Initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)

    for i, h in enumerate(concepts):
        if target[i] == "yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
            print("Specific Hypothesis after step", i + 1)
            print(specific_h)
            print("General Hypothesis after step", i + 1)
            print(general_h)
        if target[i] == "no":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'

    # Discard rows of general_h that remained fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

OUTPUT:
[['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
['sunny' 'warm' 'high' 'strong' 'warm' 'same']
['rainy' 'cold' 'high' 'strong' 'warm' 'change']
['sunny' 'warm' 'high' 'strong' 'cool' 'change']]
['yes' 'yes' 'no' 'yes']


Initialization of specific_h and general_h


['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

Specific Hypothesis after step 1


['sunny' 'warm' 'normal' 'strong' 'warm' 'same']

General Hypothesis after step 1


[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

Specific Hypothesis after step 2


['sunny' 'warm' '?' 'strong' 'warm' 'same']

General Hypothesis after step 2


[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

Specific Hypothesis after step 4


['sunny' 'warm' '?' 'strong' '?' '?']

General Hypothesis after step 4


[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

Final Specific_h:
['sunny' 'warm' '?' 'strong' '?' '?']

Final General_h:
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]


Experiment-3
AIM: Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify
a new sample.

Data set:
outlook temp humidity windy pt
sunny hot high weak no
sunny hot high strong no
overcast hot high weak yes
rainy mild high weak yes
rainy cool normal weak yes
rainy cool normal strong no
overcast cool normal strong yes
sunny mild high weak no
sunny cool normal weak yes
rainy mild normal weak yes
sunny mild normal strong yes
overcast mild high strong yes
overcast hot normal weak yes
rainy mild high strong no
PROGRAM:
import numpy as np
import math
import csv

def read_data(filename):
with open(filename, 'r') as csvfile:
datareader = csv.reader(csvfile, delimiter=',')
headers = next(datareader)
metadata = []
traindata = []
for name in headers:
metadata.append(name)
for row in datareader:
traindata.append(row)
return (metadata, traindata)

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""


    def __str__(self):
        return self.attribute

def subtables(data, col, delete):


dict = {}
items = np.unique(data[:, col])
count = np.zeros((items.shape[0], 1), dtype=np.int32)

for x in range(items.shape[0]):
for y in range(data.shape[0]):
if data[y, col] == items[x]:
count[x] += 1

for x in range(items.shape[0]):
dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
pos = 0
for y in range(data.shape[0]):
if data[y, col] == items[x]:
dict[items[x]][pos] = data[y]
pos += 1
if delete:
dict[items[x]] = np.delete(dict[items[x]], col, 1)

return items, dict

def entropy(S):
items = np.unique(S)

if items.size == 1:
return 0

counts = np.zeros((items.shape[0], 1))


sums = 0

for x in range(items.shape[0]):
counts[x] = sum(S == items[x]) / (S.size * 1.0)

for count in counts:


sums += -1 * count * math.log(count, 2)
return sums


def gain_ratio(data, col):


items, dict = subtables(data, col, delete=False)
total_size = data.shape[0]
entropies = np.zeros((items.shape[0], 1))
intrinsic = np.zeros((items.shape[0], 1))

for x in range(items.shape[0]):
ratio = dict[items[x]].shape[0]/(total_size * 1.0)
entropies[x] = ratio * entropy(dict[items[x]][:, -1])
intrinsic[x] = ratio * math.log(ratio, 2)

total_entropy = entropy(data[:, -1])


iv = -1 * sum(intrinsic)

for x in range(entropies.shape[0]):
total_entropy -= entropies[x]
return total_entropy / iv

def create_node(data, metadata):


if (np.unique(data[:, -1])).shape[0] == 1:
node = Node("")
node.answer = np.unique(data[:, -1])[0]
return node

gains = np.zeros((data.shape[1] - 1, 1))

for col in range(data.shape[1] - 1):


gains[col] = gain_ratio(data, col)

split = np.argmax(gains)

node = Node(metadata[split])
metadata = np.delete(metadata, split, 0)

items, dict = subtables(data, split, delete=True)

for x in range(items.shape[0]):
child = create_node(dict[items[x]], metadata)
node.children.append((items[x], child))

return node


def empty(size):
s = ""
for x in range(size):
s += " "
return s

def print_tree(node, level):


if node.answer != "":
print(empty(level), node.answer)
return
print(empty(level), node.attribute)
for value, n in node.children:
print(empty(level + 1), value)
print_tree(n, level + 2)

metadata, traindata = read_data("ID3.csv")


data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

OUTPUT:
outlook
 overcast
  b'yes'
 rainy
  windy
   b'strong'
    b'no'
   b'weak'
    b'yes'
 sunny
  humidity
   b'high'
    b'no'
   b'normal'
    b'yes'


Experiment-4
AIM: Exercises to solve real-world problems using the following machine learning methods:
a) Linear Regression
b) Logistic Regression
c) Binary Classifier

a) Linear Regression:
Linear regression is one of the most important and widely used regression techniques, and also
among the simplest. One of its main advantages is the ease of interpreting its results.
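As a quick illustration of the idea (not part of the lab program, and using made-up numbers), a straight line can be fitted by ordinary least squares with NumPy:

```python
import numpy as np

# Illustrative data: y is roughly 3*x + 1 plus a little noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.9, 7.2, 9.8, 13.1])

# Closed-form least-squares fit: design matrix [x, 1] -> slope and intercept
A = np.vstack([x, np.ones_like(x)]).T
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
print(slope, intercept)  # close to 3 and 1
```

The fitted coefficients are directly interpretable: the slope is the change in y per unit of x, and the intercept is the predicted y at x = 0.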

PROGRAM:
from sklearn import datasets
from sklearn import metrics
disease = datasets.load_diabetes()
print(disease.keys())
import numpy as np
disease_X = disease.data[:, np.newaxis,2]
disease_X_train = disease_X[:-30]
disease_X_test = disease_X[-20:]
disease_Y_train = disease.target[:-30]
disease_Y_test = disease.target[-20:]
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(disease_X_train,disease_Y_train)
y_predict = reg.predict(disease_X_test)
accuracy = metrics.mean_squared_error(disease_Y_test,y_predict,)
print("accuracy=",accuracy)
weights = reg.coef_
intercept = reg.intercept_
print(weights,intercept)
import matplotlib.pyplot as plt
plt.scatter(disease_X_test, disease_Y_test)
plt.plot(disease_X_test,y_predict)
plt.show()


OUTPUT:
dict_keys(['data', 'target', 'frame', 'DESCR', 'feature_names', 'data_filename','target_filename',

'data_module'])

accuracy= 2561.3204277283867

[941.43097333] 153.39713623331698


b) Logistic Regression:
Logistic regression is a classification algorithm used when the target variable is categorical.
Its main objective is to model the relationship between the features and the probability of a
particular outcome.
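As background (not part of the lab program), logistic regression maps a linear score onto a probability by passing it through the logistic (sigmoid) function; a minimal sketch:

```python
import math

def sigmoid(z):
    """Map a linear score z = w.x + b to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))   # 0.5: a score of zero means either class is equally likely
print(sigmoid(4))   # close to 1: strongly positive score
print(sigmoid(-4))  # close to 0: strongly negative score
```

Training finds the weights w and b that make these probabilities match the observed class labels, which is what `LogisticRegression` in the program below does internally.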

Data Set:
User ID Gender Age EstimatedSalary Purchased
15624510 Male 19 19000 0
15810944 Male 25 20000 0
15668575 Female 26 43000 0
15603246 Female 27 57000 0
15804002 Male 19 76000 0
15728773 Male 27 58000 0
15598044 Female 27 84000 0
15694829 Female 32 150000 1
15600575 Male 25 33000 0
15727311 Female 35 65000 0
15570769 Female 26 80000 0
15606274 Female 26 52000 0
15746139 Male 20 86000 0
15704987 Male 32 18000 0
15628972 Male 18 82000 0
15697686 Male 29 80000 0
15733883 Male 47 25000 1
15617482 Male 45 26000 1
15704583 Male 46 28000 1

PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

dataset = pd.read_csv("User_Data.csv")
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

from sklearn.model_selection import train_test_split


xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25, random_state=0)


from sklearn.preprocessing import StandardScaler


sc_x = StandardScaler()
xtrain = sc_x.fit_transform(xtrain)
xtest = sc_x.transform(xtest)
print (xtrain[0:10, :])

from sklearn.linear_model import LogisticRegression


classifier = LogisticRegression(random_state = 0)
classifier.fit(xtrain, ytrain)
y_pred = classifier.predict(xtest)

from sklearn.metrics import confusion_matrix


cm = confusion_matrix(ytest, y_pred)
print ("Confusion Matrix : \n", cm)

from sklearn.metrics import accuracy_score


print ("Accuracy : ", accuracy_score(ytest, y_pred))

from matplotlib.colors import ListedColormap


X_set, y_set = xtest, ytest
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1,
stop = X_set[:, 0].max() + 1, step = 0.01),
np.arange(start = X_set[:, 1].min() - 1,
stop = X_set[:, 1].max() + 1, step = 0.01))

plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(),


X2.ravel()]).T).reshape(X1.shape),
alpha = 0.75, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())

for i, j in enumerate(np.unique(y_set)):
plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Classifier (Test set)')


plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()


Output:
[[ 2.149452 -1.02601437]
[-0.28717375 0.70708966]
[-1.26182405 0.4720925 ]
[-0.40900504 -0.49727077]
[-0.28717375 -0.0566511 ]
[ 0.32198269 -1.23163688]
[ 0.68747655 0.14897141]
[ 0.32198269 2.6458162 ]
[ 1.90578942 -0.99663973]
[-0.40900504 -0.23289897]]
Confusion Matrix :
[[4 0]
[0 1]]
Accuracy : 1.0


c) Binary Classifier:
 In machine learning, binary classification is a supervised learning algorithm that
categorizes new observations into one of two classes.
 If the model successfully predicts the patients as positive, this case is called True
Positive (TP). If the model successfully predicts patients as negative, this is called
True Negative (TN).
 The binary classifier may misdiagnose some patients as well. If a diseased patient is
classified as healthy by a negative test result, this error is called False Negative (FN).
 Similarly, if a healthy patient is classified as diseased by a positive test result, this
error is called False Positive (FP).
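The four outcomes above, and the precision and recall built from them, can be tallied directly; a minimal sketch with made-up labels and predictions (1 = diseased, 0 = healthy):

```python
# Illustrative true labels and model predictions
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # of the patients flagged diseased, how many really are
recall = tp / (tp + fn)     # of the truly diseased patients, how many were caught
print(tp, tn, fp, fn)       # 3 3 1 1
print(precision, recall)    # 0.75 0.75
```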

PROGRAM:

from numpy import where

from collections import Counter

from sklearn.datasets import make_blobs

from matplotlib import pyplot

X, y = make_blobs(n_samples=1000, centers=2, random_state=1)

print(X.shape, y.shape)

counter = Counter(y)

print(counter)

for i in range(10):

print(X[i], y[i])

for label, _ in counter.items():

row_ix = where(y == label)[0]

pyplot.scatter(X[row_ix, 0], X[row_ix, 1],label=str(label))

pyplot.legend()

pyplot.show()


OUTPUT:
(1000, 2) (1000,)
Counter({0: 500, 1: 500})
[-3.05837272 4.48825769] 0
[-8.60973869 -3.72714879] 1
[1.37129721 5.23107449] 0
[-9.33917563 -2.9544469 ] 1
[-11.57178593 -3.85275513] 1
[-11.42257341 -4.85679127] 1
[-10.44518578 -3.76476563] 1
[-10.44603561 -3.26065964] 1
[-0.61947075 3.48804983] 0
[-10.91115591 -4.5772537] 1


Experiment-5

AIM: Develop a program for Bias, Variance, Remove duplicates, Cross Validation.
PROGRAM:
from mlxtend.evaluate import bias_variance_decomp

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

import warnings

warnings.filterwarnings('ignore')

from sklearn.datasets import fetch_california_housing

housing=fetch_california_housing()

from sklearn import metrics

X, y = fetch_california_housing(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

model_lr = LinearRegression()

mse, bias, var = bias_variance_decomp(model_lr, X_train, y_train,

X_test, y_test, loss='mse',

num_rounds=200, random_seed=123)

y_pred=model_lr.predict(X_test)

print('MSE from bias_variance lib [avg expected loss]: %.3f' % mse)

print('Avg Bias: %.3f' % bias)

print('Avg Variance: %.3f' % var)

print('Mean Square error by scikit-learn lib: %.3f' % metrics.mean_squared_error(y_test,y_pred))

OUTPUT:
MSE from bias_variance lib [avg expected loss]: 0.527
Avg Bias: 0.525
Avg Variance: 0.002
Mean Square error by scikit-learn lib: 0.527
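The aim also mentions duplicate removal and cross validation, which the program above does not demonstrate; a minimal sketch on made-up data (column names are illustrative), using pandas `drop_duplicates` and scikit-learn `cross_val_score`:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Remove duplicate rows with pandas (two rows below repeat earlier ones)
df = pd.DataFrame({'x': [1, 2, 2, 3, 4, 4],
                   'y': [2, 4, 4, 6, 8, 8]})
df = df.drop_duplicates()
print("Rows after removing duplicates:", len(df))  # 4

# 2-fold cross validation of a linear model on the de-duplicated data
X = df[['x']].values
y = df['y'].values
scores = cross_val_score(LinearRegression(), X, y, cv=2,
                         scoring='neg_mean_squared_error')
print("Per-fold MSE:", -scores)
```

Since the toy data are exactly linear, the per-fold mean squared errors come out essentially zero; on real data they estimate how well the model generalises.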


Experiment-6
AIM: Write a program to implement Categorical Encoding, One-hot Encoding.

Data set:
Sno Empid Gender Remarks
0 45 male Nice
1 78 female Good
2 56 female Great
3 12 male Great
4 7 female Nice

PROGRAM:
import pandas as pd

from numpy import asarray

from sklearn.preprocessing import OneHotEncoder

data = pd.read_csv("emp.csv")

print(data.head())

print(data['Gender'].unique())

print(data['Remarks'].unique())

print(data['Gender'].value_counts())

print(data['Remarks'].value_counts())

encoder = OneHotEncoder(sparse=False)

onehot = encoder.fit_transform(data)

print(onehot)

OUTPUT:
Sno Empid Gender Remarks
0 0 45 male Nice
1 1 78 female Good
2 2 56 female Great
3 3 12 male Great
4 4 7 female Nice


['male' 'female']
['Nice' 'Good' 'Great']
female 3
male 2
Name: Gender, dtype: int64
Nice 2
Great 2
Good 1
Name: Remarks, dtype: int64
[[1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1.]
[0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0.]
[0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 1. 0.]
[0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 1. 0. 1. 0.]
[0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 1. 0. 0. 0. 1.]]
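The aim also mentions plain categorical (label) encoding, which the program above does not show separately; a minimal sketch using pandas category codes on columns like those in the data set above:

```python
import pandas as pd

# Categorical (label) encoding: each distinct string gets an integer code
df = pd.DataFrame({'Gender': ['male', 'female', 'female', 'male', 'female'],
                   'Remarks': ['Nice', 'Good', 'Great', 'Great', 'Nice']})
df['Gender_code'] = df['Gender'].astype('category').cat.codes
df['Remarks_code'] = df['Remarks'].astype('category').cat.codes
print(df)
```

Pandas assigns codes in sorted order of the categories (female=0, male=1; Good=0, Great=1, Nice=2), so each column is replaced by a single integer column rather than the several binary columns one-hot encoding produces.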


Experiment-7
AIM: Build an Artificial Neural Network by implementing the Back propagation algorithm
and test the same using appropriate data sets.

PROGRAM:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)
y = y/100

def sigmoid(x):
    return 1/(1 + np.exp(-x))

def derivatives_sigmoid(x):
    return x * (1 - x)

epoch = 1
lr = 0.1
inputlayer_neurons = 2
hiddenlayer_neurons = 3
output_neurons = 1

wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Back propagation of the error
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad

    # Weight and bias updates
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)

OUTPUT:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.75910827]
[0.75067151]
[0.76258194]]


Exercise-8
AIM: Write a program to implement k-Nearest Neighbor algorithm to classify the iris data
set. Print both correct and wrong predictions.

PROGRAM:
import pandas as pd
from sklearn.datasets import load_iris
iris=load_iris()
iris.keys()
df=pd.DataFrame(iris['data'])
print(df)
print(iris['target_names'])
X=df
y=iris['target']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
from sklearn.neighbors import KNeighborsClassifier
knn=KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train,y_train)
import numpy as np
import warnings
warnings.filterwarnings('ignore')
x_new=np.array([[5,2.9,1,0.2]])
prediction=knn.predict(x_new)
iris['target_names'][prediction]
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
y_pred=knn.predict(X_test)
cm=confusion_matrix(y_test,y_pred)
print(cm)
print(" correct prediction", accuracy_score(y_test, y_pred))
print(" wrong prediction", (1 - accuracy_score(y_test, y_pred)))


OUTPUT:
0 1 2 3

0 5.1 3.5 1.4 0.2

1 4.9 3.0 1.4 0.2

2 4.7 3.2 1.3 0.2

3 4.6 3.1 1.5 0.2

4 5.0 3.6 1.4 0.2

.. ... ... ... ...

145 6.7 3.0 5.2 2.3

146 6.3 2.5 5.0 1.9

147 6.5 3.0 5.2 2.0

148 6.2 3.4 5.4 2.3

149 5.9 3.0 5.1 1.8

[150 rows x 4 columns]

['setosa' 'versicolor' 'virginica']

[[19 0 0]

[ 0 15 0]

[ 0 1 15]]

correct prediction 0.98

wrong prediction 0.020000000000000018


Experiment-9
AIM: Implement the non-parametric Locally Weighted Regression algorithm in order to fit
data points. Select appropriate data set for your experiment and draw graphs.

PROGRAM:
from math import ceil
import math
import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt

def lowess(x, y, f, iterations):
    n = len(x)
    r = int(ceil(f * n))
    # Bandwidth for each point: distance to its r-th nearest neighbour
    h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)]
    w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0)
    w = (1 - w ** 3) ** 3  # tricube weights
    yest = np.zeros(n)
    delta = np.ones(n)
    for iteration in range(iterations):
        for i in range(n):
            weights = delta * w[:, i]
            b = np.array([np.sum(weights * y), np.sum(weights * y * x)])
            A = np.array([[np.sum(weights), np.sum(weights * x)],
                          [np.sum(weights * x), np.sum(weights * x * x)]])
            beta = linalg.solve(A, b)
            yest[i] = beta[0] + beta[1] * x[i]
        # Robustifying weights: down-weight points with large residuals
        residuals = y - yest
        s = np.median(np.abs(residuals))
        delta = np.clip(residuals / (6.0 * s), -1, 1)
        delta = (1 - delta ** 2) ** 2
    return yest

n = 100
x = np.linspace(0, 2 * math.pi, n)
y = np.sin(x) + 0.3 * np.random.randn(n)
f = 0.25
iterations = 10
yest = lowess(x, y, f, iterations)

plt.plot(x, y, "r.")
plt.plot(x, yest, "b-")

OUTPUT:
[<matplotlib.lines.Line2D at 0x1755ac36cd0>]


Exercise-10
AIM: Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to write the
program. Calculate the accuracy, precision, and recall for your data set.

Data set:
I love this sandwich pos
This is an amazing place pos
I feel very good about these beers pos
This is my best work pos
What an awesome view pos
I do not like this restaurant neg
I am tired of this stuff neg
I can't deal with this neg
He is my sworn enemy neg
My boss is horrible neg
This is an awesome place pos

PROGRAM:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

msg = pd.read_csv('ex10.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

X = msg.message
y = msg.labelnum

from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

from sklearn.feature_extraction.text import CountVectorizer
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names())
print(df[0:5])

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)

# Show each test document with its predicted class
for doc, p in zip(Xtest, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))


OUTPUT:
Total Instances of Dataset: 11

about am an awesome beers best boss can deal do ... restaurant \

0 1 0 0 0 1 0 0 0 0 0 ... 0

1 0 0 0 0 0 1 0 0 0 0 ... 0

2 0 0 0 0 0 0 0 1 1 0 ... 0

3 0 0 1 1 0 0 0 0 0 0 ... 0

4 0 0 1 1 0 0 0 0 0 0 ... 0

stuff these this tired very view what with work

0 0 1 0 0 1 0 0 0 0

1 0 0 1 0 0 0 0 0 1

2 0 0 1 0 0 0 0 1 0

3 0 0 0 0 0 1 1 0 0

4 0 0 1 0 0 0 0 0 0

[5 rows x 29 columns]

I feel very good about these beers -> pos

This is my best work -> neg

I can't deal with this -> pos

Accuracy Metrics:

Accuracy: 0.3333333333333333

Recall: 0.5

Precision: 0.5

Confusion Matrix:

[[0 1]

[1 1]]


Exercise-11

AIM: Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in the
program.

PROGRAM:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np
iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']
model = KMeans(n_clusters=3)
model.fit(X)
plt.figure(figsize=(14,7))
colormap = np.array(['red', 'lime', 'black'])
plt.subplot(1, 3, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Clusters')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.subplot(1, 3, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K-Means Clustering')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
from sklearn import preprocessing
scaler = preprocessing.StandardScaler()
scaler.fit(X)


xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns = X.columns)
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
gmm_y = gmm.predict(xs)
plt.subplot(1, 3, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[gmm_y], s=40)
plt.title('GMM Clustering')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
print('Observation: The GMM using EM algorithm based clustering '
      'matched the true labels more closely than the Kmeans.')

OUTPUT:
Observation: The GMM using EM algorithm based clustering matched the true labels more
closely than the Kmeans.
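The AIM asks for a comparison of clustering quality, which the plots show only visually. A short sketch (an assumed add-on, not part of the recorded program) that makes the comparison quantitative with the adjusted Rand index, where 1.0 means the clusters match the true labels exactly and 0.0 is chance level:

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import adjusted_rand_score

iris = datasets.load_iris()
X, y = iris.data, iris.target

# K-Means on the raw features
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# GMM (fitted by the EM algorithm) on standardized features
Xs = StandardScaler().fit_transform(X)
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(Xs)

print("ARI (K-Means):", adjusted_rand_score(y, km_labels))
print("ARI (GMM/EM) :", adjusted_rand_score(y, gmm_labels))
```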


Exercise-12
AIM: Exploratory Data Analysis for Classification using Pandas or Matplotlib.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as pp

df = pd.DataFrame({"nums": list(range(1, 8)),
                   "sqrs": [i**2 for i in range(1, 8)]})

print("\nThe data is : \n", df)
print("\nCalling head : \n", df.head())
print("\nCalling head with 'n' : \n", df.head(3))
print("\nCalling tail : \n", df.tail())
print("\nCalling tail with 'n' : \n", df.tail(3))
print("\nGetting the description of our dataset : \n", df.describe())
print("\nGetting all correlations of our dataset : \n", df.corr())

for i in df:
    print(f"\nFor {i} : \n")
    df[i].hist()
    pp.show()

OUTPUT:
The data is :
nums sqrs
0 1 1
1 2 4
2 3 9
3 4 16
4 5 25
5 6 36
6 7 49


Calling head :
nums sqrs
0 1 1
1 2 4
2 3 9
3 4 16
4 5 25

Calling head with 'n' :


nums sqrs
0 1 1
1 2 4
2 3 9

Calling tail :
nums sqrs
2 3 9
3 4 16
4 5 25
5 6 36
6 7 49

Calling tail with 'n' :


nums sqrs
4 5 25
5 6 36
6 7 49

Getting the description of our dataset :


nums sqrs
count 7.000000 7.000000
mean 4.000000 20.000000
std 2.160247 17.682383
min 1.000000 1.000000
25% 2.500000 6.500000
50% 4.000000 16.000000
75% 5.500000 30.500000
max 7.000000 49.000000

Getting all correlations of our dataset :


nums sqrs
nums 1.000000 0.977356
sqrs 0.977356 1.000000


For nums :

For sqrs :
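The program above explores a toy two-column frame; for classification the same pandas calls are most informative when applied per class. A short sketch (assuming sklearn's bundled iris data set, not part of the recorded program) of two EDA checks that matter before training a classifier:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus a 'target' column

# Class balance: heavily skewed classes call for stratified splits or resampling
print(df['target'].value_counts())

# Per-class feature means: features whose means separate well are strong predictors
print(df.groupby('target').mean())
```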


Exercise-13
AIM: Write a Python program to construct a Bayesian network considering medical data.
Use this model to demonstrate the diagnosis of heart patients using standard Heart Disease
Data Set.

Data set:
age Gender Family diet Lifestyle cholestrol heartdisease
0 0 1 1 3 0 1
0 1 1 1 3 0 1
1 0 0 0 2 1 1
4 0 1 1 3 2 0
3 1 1 0 0 2 0
2 0 1 1 1 0 1
4 0 1 0 2 0 1
0 0 1 1 3 0 1
3 1 1 0 0 2 0
1 1 0 0 0 2 1
4 1 0 1 2 0 1
4 0 1 1 3 2 0
2 1 0 0 0 0 0
2 0 1 1 1 0 1
3 1 1 0 0 1 0
0 0 1 0 0 2 1
1 1 0 1 2 1 1
3 1 1 1 0 1 0
4 0 1 1 3 2 0
PROGRAM:
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination
data = pd.read_csv("heart.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease)
model = BayesianNetwork([('age', 'Lifestyle'), ('Gender', 'Lifestyle'),
                         ('Family', 'heartdisease'), ('Lifestyle', 'diet'),
                         ('diet', 'cholestrol'), ('cholestrol', 'heartdisease')])
model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)
HeartDisease_infer = VariableElimination(model)


print('For Age enter SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4')


print('For Gender enter Male:0, Female:1')
print('For Family History enter Yes:1, No:0')
print('For Diet enter High:0, Medium:1')
print('for LifeStyle enter Athlete:0, Active:1, Moderate:2, Sedentary:3')
print('for Cholesterol enter High:0, BorderLine:1, Normal:2')
q = HeartDisease_infer.query(variables=['heartdisease'],
                             evidence={'age': int(input('Enter Age:')),
                                       'Gender': int(input('Enter Gender:')),
                                       'Family': int(input('Enter Family History:')),
                                       'diet': int(input('Enter Diet: ')),
                                       'Lifestyle': int(input('Enter Lifestyle:')),
                                       'cholestrol': int(input('Enter Cholestrol: '))})
print(q)

OUTPUT:
age Gender Family diet Lifestyle cholestrol heartdisease
0 0 0 1 1 3 0 1
1 0 1 1 1 3 0 1
2 1 0 0 0 2 1 1
3 4 0 1 1 3 2 0
4 3 1 1 0 0 2 0
5 2 0 1 1 1 0 1
6 4 0 1 0 2 0 1
7 0 0 1 1 3 0 1
8 3 1 1 0 0 2 0
9 1 1 0 0 0 2 1
10 4 1 0 1 2 0 1
11 4 0 1 1 3 2 0
12 2 1 0 0 0 0 0
13 2 0 1 1 1 0 1
14 3 1 1 0 0 1 0
15 0 0 1 0 0 2 1
16 1 1 0 1 2 1 1
17 3 1 1 1 0 1 0
18 4 0 1 1 3 2 0

For Age enter SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4
For Gender enter Male:0, Female:1
For Family History enter Yes:1, No:0
For Diet enter High:0, Medium:1
for LifeStyle enter Athlete:0, Active:1, Moderate:2, Sedentary:3
for Cholesterol enter High:0, BorderLine:1, Normal:2


Enter Age:2
Enter Gender:1
Enter Family History:1
Enter Diet: 0
Enter Lifestyle:2
Enter Cholestrol: 2

+-----------------+---------------------+
| heartdisease | phi(heartdisease) |
+=================+=====================+
| heartdisease(0) | 0.8333 |
+-----------------+---------------------+
| heartdisease(1) | 0.1667 |
+-----------------+---------------------+
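Under the hood, `MaximumLikelihoodEstimator` builds each conditional probability distribution by simple conditional counting. A sketch in plain pandas (no pgmpy required; the rows are copied from the data set listed above) that reproduces the P(heartdisease | cholestrol) table one node of the network learns:

```python
import pandas as pd

# Rows copied from the heart data set above:
# (age, Gender, Family, diet, Lifestyle, cholestrol, heartdisease)
rows = [
    (0,0,1,1,3,0,1), (0,1,1,1,3,0,1), (1,0,0,0,2,1,1), (4,0,1,1,3,2,0),
    (3,1,1,0,0,2,0), (2,0,1,1,1,0,1), (4,0,1,0,2,0,1), (0,0,1,1,3,0,1),
    (3,1,1,0,0,2,0), (1,1,0,0,0,2,1), (4,1,0,1,2,0,1), (4,0,1,1,3,2,0),
    (2,1,0,0,0,0,0), (2,0,1,1,1,0,1), (3,1,1,0,0,1,0), (0,0,1,0,0,2,1),
    (1,1,0,1,2,1,1), (3,1,1,1,0,1,0), (4,0,1,1,3,2,0),
]
cols = ['age', 'Gender', 'Family', 'diet', 'Lifestyle', 'cholestrol', 'heartdisease']
df = pd.DataFrame(rows, columns=cols)

# Maximum-likelihood CPD: normalize counts within each cholestrol value
cpd = pd.crosstab(df['cholestrol'], df['heartdisease'], normalize='index')
print(cpd)
```

Reading the table: high cholesterol (0) in this data set gives P(heartdisease=1) = 7/8, while normal cholesterol (2) gives only 2/7, which is why the query above with cholestrol=2 favours heartdisease(0).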


Exercise-14
AIM: Write a program to Implement Support Vector Machines and Principal Component
Analysis.

PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
iris= pd.read_csv(r'iris.csv',header=0)
print(iris.columns)
print(iris.shape)
iris['Species'] = iris['Species'].astype('category')
X = iris.drop('Species', axis=1)   # feature matrix (was assigned to lowercase x)
y = iris['Species'].cat.codes      # numeric class labels for the SVM and plotting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
n_components = 2
pca = PCA(n_components=n_components)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train_pca, y_train)
y_pred = svm.predict(X_test_pca)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy of SVM with PCA:", accuracy)
plt.figure(figsize=(8, 6))
plt.scatter(X_train_pca[:, 0], X_train_pca[:, 1],
c=y_train, cmap=plt.cm.Paired, marker='o')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
h = .02


x_min, x_max = X_train_pca[:, 0].min() - 1, X_train_pca[:, 0].max() + 1
y_min, y_max = X_train_pca[:, 1].min() - 1, X_train_pca[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = svm.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.title('SVM Decision Boundary with PCA')
plt.show()

OUTPUT:
Index(['sno', 'Id', 'SepalLengthCm', 'SepalWidthCm',

'PetalLengthCm', 'PetalWidthCm', 'Species'], dtype='object')

(150, 7)

Accuracy of SVM with PCA: 0.9666666666666667
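The choice of `n_components=2` can be justified by how much variance the two components retain. A sketch (an assumed add-on, using sklearn's bundled iris rather than the local `iris.csv`) that prints the explained-variance ratios:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)        # per-component share of the total variance
print(pca.explained_variance_ratio_.sum())  # the 2D projection keeps most of the variance
```

On iris the two leading components retain well over 95% of the variance, so the high SVM accuracy after projection is expected.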


Exercise-15
AIM: Write a program to Implement Principal Component Analysis.
PROGRAM:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# %matplotlib inline  (Jupyter magic; not needed when running as a script)
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
df.head()

from sklearn.preprocessing import StandardScaler
scalar = StandardScaler()
scalar.fit(df)
scaled_data = scalar.transform(df)

from sklearn.decomposition import PCA
components = 2
pca = PCA(n_components=components)
pca.fit(scaled_data)
x_pca = pca.transform(scaled_data)

plt.figure(figsize=(8, 6))
plt.scatter(x_pca[:, 0], x_pca[:, 1], c=cancer['target'], cmap='plasma')
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.show()


OUTPUT:
Text(0, 0.5, 'Second Principal Component')
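The scatter plot shows the two classes separating in the projected space, but the components themselves can also be interpreted. A sketch (an assumption, not part of the recorded program) that inspects the component loadings: each row of `pca.components_` holds the weight of every original feature in that component.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

cancer = load_breast_cancer()
scaled = StandardScaler().fit_transform(cancer['data'])
pca = PCA(n_components=2).fit(scaled)

loadings = pd.DataFrame(pca.components_,
                        columns=cancer['feature_names'],
                        index=['PC1', 'PC2'])
# The largest-magnitude weight names the feature each component is most driven by
print(loadings.abs().idxmax(axis=1))
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```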
