
B. M. S. EVENING COLLEGE OF ENGINEERING

Bull Temple Road, Bangalore – 19

Department of Computer Science & Engineering

Machine Learning
Laboratory Record

Subject Code: 17CSL76

Name: _______________

USN: ________________
B M S EVENING
COLLEGE OF ENGINEERING

(Affiliated to VTU, Belagavi)

LABORATORY CERTIFICATE

This is to certify that Mr / Ms ______________________________ has
satisfactorily completed the course of experiments in Practical
___________________________ prescribed by the Visvesvaraya
Technological University for _____________________ Semester
________________________ Course in the Laboratory of the college in
the year 2020-21.

Head of the Department Staff in-charge of the batch

Date: ____________
Particulars of the Experiments Performed

CONTENTS

Expt No. | Date | Experiment | Marks Obtained
01 | | Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file. |
02 | | For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples. |
03 | | Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample. |
04 | | Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets. |
05 | | Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets. |
06 | | Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set. |
07 | | Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use Java/Python ML library classes/API. |
08 | | Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program. |
09 | | Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem. |
10 | | Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select appropriate data set for your experiment and draw graphs. |
MACHINE LEARNING LABORATORY (17CSL76)

1. Implement and demonstrate the FIND-S algorithm for finding the most
specific hypothesis based on a given set of training data samples. Read the
training data from a .CSV file.

import csv

def loadCsv(filename):
    # Read every row of the CSV file into a list of lists of strings
    with open(filename, "rt") as f:
        return list(csv.reader(f))

attributes = ['Sky','Temp','Humidity','Wind','Water','Forecast']
print(attributes)
n = len(attributes)
dataset = loadCsv("pgm1.csv")
print(dataset)
# '0' stands for the most specific (empty) constraint on an attribute
h = ['0'] * n
print("Initial hypothesis")
print(h)
print("The hypotheses are")

for i in range(len(dataset)):
    target = dataset[i][-1]
    if target == 'Yes':                   # FIND-S ignores negative examples
        for j in range(n):
            if h[j] == '0':
                h[j] = dataset[i][j]      # first positive example: copy its value
            elif h[j] != dataset[i][j]:
                h[j] = '?'                # conflicting values: generalise
    print(i+1, '=', h)

print("Final hypothesis")
print(h)
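
The contents of pgm1.csv are not reproduced in this record. As a minimal sketch, the same update rule can be exercised on rows held in memory. The four rows below follow the familiar EnjoySport textbook example and are an assumption, as is the helper name find_s.

```python
def find_s(rows, positive_label="Yes"):
    """Return the most specific hypothesis consistent with the positive rows."""
    n = len(rows[0]) - 1          # last column holds the class label
    h = ['0'] * n                 # start from the most specific hypothesis
    for row in rows:
        if row[-1] != positive_label:
            continue              # negative examples are ignored by FIND-S
        for j in range(n):
            if h[j] == '0':
                h[j] = row[j]     # first positive example: copy the value
            elif h[j] != row[j]:
                h[j] = '?'        # disagreement: generalise the attribute
    return h

# Assumed sample data (not taken from pgm1.csv)
rows = [
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Yes'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Yes'],
    ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'No'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Yes'],
]
print(find_s(rows))
```

Wrapping the loop in a function makes it possible to check the hypothesis without a CSV file on disk.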

2. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the
set of all hypotheses consistent with the training examples.

import csv

def get_domains(examples):
    # Collect the sorted set of values seen in each column
    d = [set() for i in examples[0]]
    for x in examples:
        for i, xi in enumerate(x):
            d[i].add(xi)
    return [list(sorted(x)) for x in d]

def more_general(h1, h2):
    # True if hypothesis h1 is more general than or equal to h2
    more_general_parts = []
    for x, y in zip(h1, h2):
        mg = x == "?" or (x != "0" and (x == y or y == "0"))
        more_general_parts.append(mg)
    return all(more_general_parts)

def fulfills(example, hypothesis):
    # True if the hypothesis covers the example;
    # the implementation is the same as for hypotheses
    return more_general(hypothesis, example)

def min_generalizations(h, x):
    # Minimal generalization of h that covers example x
    h_new = list(h)
    for i in range(len(h)):
        if not fulfills(x[i:i+1], h[i:i+1]):
            h_new[i] = '?' if h[i] != '0' else x[i]
    return [tuple(h_new)]

def min_specializations(h, domains, x):
    # Minimal specializations of h that exclude example x
    results = []
    for i in range(len(h)):
        if h[i] == "?":
            for val in domains[i]:
                if x[i] != val:
                    h_new = h[:i] + (val,) + h[i+1:]
                    results.append(h_new)
        elif h[i] != "0":
            h_new = h[:i] + ('0',) + h[i+1:]
            results.append(h_new)
    return results

def generalize_S(x, G, S):
    # Generalize the members of S that do not cover the positive example x
    S_prev = list(S)
    for s in S_prev:
        if s not in S:
            continue
        if not fulfills(x, s):
            S.remove(s)
            Splus = min_generalizations(s, x)
            # keep only generalizations that have a counterpart in G
            S.update([h for h in Splus if any([more_general(g, h) for g in G])])
            # remove hypotheses less specific than any other in S
            S.difference_update([h for h in S if any([more_general(h, h1) for h1 in S if h != h1])])
    return S

def specialize_G(x, domains, G, S):
    # Specialize the members of G that cover the negative example x
    G_prev = list(G)
    for g in G_prev:
        if g not in G:
            continue
        if fulfills(x, g):
            G.remove(g)
            Gminus = min_specializations(g, domains, x)
            # keep only specializations that have a counterpart in S
            G.update([h for h in Gminus if any([more_general(h, s) for s in S])])
            # remove hypotheses less general than any other in G
            G.difference_update([h for h in G if any([more_general(g1, h) for g1 in G if h != g1])])
    return G

def candidate_elimination(examples):
    domains = get_domains(examples)[:-1]    # drop the class-label column
    n = len(domains)
    G = set([("?",)*n])
    S = set([("0",)*n])
    print("Maximally specific hypotheses - S ")
    print("Maximally general hypotheses - G ")
    i = 0
    print("\nS[0]:", str(S), "\nG[0]:", str(G))
    for xcx in examples:
        i = i+1
        x, cx = xcx[:-1], xcx[-1]
        if cx == 'Y':    # x is a positive example
            G = {g for g in G if fulfills(x, g)}
            S = generalize_S(x, G, S)
        else:            # x is a negative example
            S = {s for s in S if not fulfills(x, s)}
            G = specialize_G(x, domains, G, S)
        print("\nS[{0}]:".format(i), S)
        print("G[{0}]:".format(i), G)

with open('program2.csv') as csvFile:
    examples = [tuple(line) for line in csv.reader(csvFile)]

candidate_elimination(examples)

3. Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and apply
this knowledge to classify a new sample.

import math
import csv

def load_csv(filename):
    # The first row of the file holds the attribute names
    with open(filename, "r") as f:
        dataset = list(csv.reader(f))
    headers = dataset.pop(0)
    return dataset, headers

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    # Group the rows of data by the value found in column col
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))    # All values of the attribute retrieved
    for k in attr:
        dic[k] = []
    for y in range(len(data)):
        key = data[y][col]
        if delete:
            del data[y][col]
        dic[key].append(data[y])
    return attr, dic

def entropy(S):
    # Entropy of the class labels in S; works for any number of classes
    total = len(S)
    sums = 0
    for label in set(S):
        p = S.count(label) / total
        sums += -1 * p * math.log(p, 2)
    return sums

def compute_gain(data, col):
    # Information gain obtained by splitting data on column col
    attValues, dic = subtables(data, col, delete=False)
    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attValues)):
        ratio = len(dic[attValues[x]]) / (len(data) * 1.0)
        entro = entropy([row[-1] for row in dic[attValues[x]]])
        total_entropy -= ratio*entro
    return total_entropy

def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    if (len(set(lastcol))) == 1:    # If all samples have the same label, return that label
        node = Node("")
        node.answer = lastcol[0]
        return node
    n = len(data[0])-1
    if n == 0:                      # No attributes left: fall back to the majority label
        node = Node("")
        node.answer = max(set(lastcol), key=lastcol.count)
        return node
    gains = [compute_gain(data, col) for col in range(n)]
    split = gains.index(max(gains))    # Index of the attribute with maximum gain
    node = Node(features[split])       # 'node' stores the attribute selected
    fea = features[:split]+features[split+1:]
    attr, dic = subtables(data, split, delete=True)    # Data is split into subtables
    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node

def print_tree(node, level):
    if node.answer != "":
        print(" "*level, node.answer)    # Displays leaf node yes/no
        return
    print(" "*level, node.attribute)     # Displays attribute name
    for value, n in node.children:
        print(" "*(level+1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

# Main program
dataset, features = load_csv("pgm3a.csv")    # Read Tennis data
node = build_tree(dataset, features)         # Build decision tree
print("The decision tree for the dataset using ID3 algorithm is ")
print_tree(node, 0)
testdata, features = load_csv("pgm3b.csv")
for xtest in testdata:
    print("The test instance : ", xtest)
    print("The predicted label : ", end="")
    classify(node, xtest, features)
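
The split choice in build_tree rests on entropy and information gain. A minimal sketch of that calculation on a tiny in-memory table may help when checking results by hand. The six rows and the names entropy and information_gain below are hypothetical and are not taken from pgm3a.csv.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, col):
    """Entropy of the labels minus the weighted entropy after splitting on col."""
    base = entropy([r[-1] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical toy rows: Outlook, Windy, Play
rows = [
    ['Sunny', 'No', 'No'],
    ['Sunny', 'Yes', 'No'],
    ['Overcast', 'No', 'Yes'],
    ['Rain', 'No', 'Yes'],
    ['Rain', 'Yes', 'No'],
    ['Overcast', 'Yes', 'Yes'],
]
print("Gain(Outlook) =", information_gain(rows, 0))
print("Gain(Windy)   =", information_gain(rows, 1))
```

On these rows Outlook separates the labels far better than Windy, so ID3 would pick Outlook as the root attribute.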

4. Build an Artificial Neural Network by implementing the Backpropagation
algorithm and test the same using appropriate data sets.

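import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)    # scale each input column to the range [0, 1]
y = y/100                   # scale the target to the range [0, 1]

def sigmoid(x):
    return 1/(1 + np.exp(-x))

def dersig(x):
    # Derivative of the sigmoid, where x is already a sigmoid output
    return x * (1 - x)

e = 7000    # number of training epochs
lr = 0.1    # learning rate
iln = 2     # neurons in the input layer
hln = 3     # neurons in the hidden layer
oln = 1     # neurons in the output layer

wh = np.random.uniform(size=(iln, hln))
bh = np.random.uniform(size=(1, hln))
wout = np.random.uniform(size=(hln, oln))
bout = np.random.uniform(size=(1, oln))

for i in range(e):
    # Forward pass
    h1 = np.dot(X, wh)
    h = h1 + bh
    hla = sigmoid(h)
    oi1 = np.dot(hla, wout)
    oi = oi1 + bout
    op = sigmoid(oi)

    # Backward pass
    EO = y - op
    og = dersig(op)
    dop = EO * og
    EH = dop.dot(wout.T)
    hg = dersig(hla)
    dhl = EH * hg

    # Update weights and biases (the biases were never updated before)
    wout += hla.T.dot(dop) * lr
    bout += np.sum(dop, axis=0, keepdims=True) * lr
    wh += X.T.dot(dhl) * lr
    bh += np.sum(dhl, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", op)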
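
Note that dersig expects a value that has already passed through sigmoid, which is easy to overlook. As a small sanity check (a sketch only, with the test points z and step eps chosen arbitrarily), the analytic derivative can be compared against a central finite difference.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def dersig(a):
    # a must already be a sigmoid activation, i.e. a = sigmoid(z)
    return a * (1 - a)

z = np.linspace(-3, 3, 7)    # arbitrary test points
eps = 1e-6                   # finite-difference step
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = dersig(sigmoid(z))
print("largest difference:", np.max(np.abs(numeric - analytic)))
```

If dersig were applied to the raw input z instead of sigmoid(z), the two arrays would no longer agree.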

5. Write a program to implement the naïve Bayesian classifier for a sample
training data set stored as a .CSV file. Compute the accuracy of the classifier,
considering few test data sets.

import csv
import random
import math

def loadCsv(filename):
    with open(filename, "r") as f:
        dataset = list(csv.reader(f))
    for i in range(len(dataset)):
        # convert strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    # splitRatio is the fraction of rows used for training (0.67 in main)
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        # pick a random remaining row and move it into the training set
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    # build a dictionary that maps each class value to the instances belonging to it
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers)/float(len(numbers))

def stdev(numbers):
    # sample standard deviation (divides by n-1)
    avg = mean(numbers)
    variance = sum([pow(x-avg, 2) for x in numbers])/float(len(numbers)-1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]    # the last column is the class label, not an attribute
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        # summaries maps each class value to a list of (mean, stdev) tuples
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mu, sigma):
    # Gaussian (normal) probability density of x
    exponent = math.exp(-(math.pow(x-mu, 2)/(2*math.pow(sigma, 2))))
    return (1 / (math.sqrt(2*math.pi) * sigma)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    # summaries holds the mean and stdev of every attribute, separately for each class
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mu, sigma = classSummaries[i]
            x = inputVector[i]    # i-th attribute of the test vector
            probabilities[classValue] *= calculateProbability(x, mu, sigma)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    # assign the class that has the highest probability
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct/float(len(testSet))) * 100.0

def main():
    filename = '5.csv'
    splitRatio = 0.67
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
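
Multiplying many small densities, as calculateClassProbabilities does, can underflow to zero when a data set has many attributes. One common alternative, sketched here under the assumption that summaries has the same shape as above, is to add log densities instead. The names gaussian_log_pdf and predict_log and the sample summaries are hypothetical.

```python
import math

def gaussian_log_pdf(x, mu, sigma):
    """Natural log of the Gaussian density of x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def predict_log(summaries, vector):
    """Pick the class whose summed log densities are largest."""
    scores = {
        label: sum(gaussian_log_pdf(x, mu, sigma)
                   for x, (mu, sigma) in zip(vector, stats))
        for label, stats in summaries.items()
    }
    return max(scores, key=scores.get)

# Hypothetical per-class (mean, stdev) summaries for two attributes
summaries = {
    0.0: [(1.0, 0.5), (10.0, 2.0)],
    1.0: [(3.0, 0.5), (20.0, 2.0)],
}
print(predict_log(summaries, [1.2, 11.0]))
```

Because the logarithm is monotonic, this selects the same class as the product of densities whenever that product does not underflow. Like the program above, the sketch ignores class priors.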

6. Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to
write the program. Calculate the accuracy, precision, and recall for your data
set.

import pandas as pd

msg = pd.read_csv('pgm6.csv', names=['message', 'label'])
print('Total instances in the dataset:', msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
Y = msg.labelnum
print('\nThe message and its label of first 5 instances are listed below')
X5, Y5 = X[0:5], msg.label[0:5]
for x, y in zip(X5, Y5):
    print(x, ',', y)

from sklearn.model_selection import train_test_split

xtrain, xtest, ytrain, ytest = train_test_split(X, Y)
print('\nDataset is split into Training and Testing samples')
print('Total training instances :', xtrain.shape[0])
print('Total testing instances :', xtest.shape[0])

from sklearn.feature_extraction.text import CountVectorizer

# Build the vocabulary on the training messages only, then reuse it for the test messages
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print('\nTotal features extracted using CountVectorizer:', xtrain_dtm.shape[1])
print('\nFeatures for first 5 training instances are listed below')
# get_feature_names() was removed from scikit-learn; get_feature_names_out() replaces it
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names_out())
print(df[0:5])

from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)
print('\nClassification results of testing samples are given below')
for doc, p in zip(xtest, predicted):
    pred = 'pos' if p == 1 else 'neg'
    print('%s -> %s ' % (doc, pred))

from sklearn import metrics

print('\nAccuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Recall :', metrics.recall_score(ytest, predicted), '\nPrecision :', metrics.precision_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))

7. Write a program to construct a Bayesian network considering medical data.
Use this model to demonstrate the diagnosis of heart patients using standard
Heart Disease Data Set. You can use Java/Python ML library classes/API.

Initial Setup

import numpy as np
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
# Note: recent pgmpy releases provide this class under the name BayesianNetwork
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)    # '?' marks a missing value

print('Sample instances from the dataset are given below')
print(heartDisease.head())

print('\n Attributes and datatypes')
print(heartDisease.dtypes)

# Each pair is a directed edge (parent, child) of the network
model = BayesianModel([('age', 'heartdisease'), ('sex', 'heartdisease'),
                       ('exang', 'heartdisease'), ('cp', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

print('\n Inferencing with Bayesian Network:')
HeartDiseasetest_infer = VariableElimination(model)

print('\n 1. Probability of HeartDisease given evidence= restecg')
q1 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q1)

print('\n 2. Probability of HeartDisease given evidence= cp ')
q2 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'cp': 2})
print(q2)

8. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same
data set for clustering using k-Means algorithm. Compare the results of these
two algorithms and comment on the quality of clustering. You can add
Java/Python ML library classes/API in the program.

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import sklearn.metrics as sm
import pandas as pd
import numpy as np

l1 = [0, 1, 2]

def rename(s):
    # Relabel the clusters 0, 1, 2 in order of first appearance so that they can be
    # compared with iris.target, whose samples are stored class by class.
    # Note that s is modified in place.
    l2 = []
    for i in s:
        if i not in l2:
            l2.append(i)

    for i in range(len(s)):
        pos = l2.index(s[i])
        s[i] = l1[pos]

    return s

iris = datasets.load_iris()

X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']

y = pd.DataFrame(iris.target)
y.columns = ['Targets']

print("Actual Target is:\n", iris.target)

# k-Means clustering
model = KMeans(n_clusters=3)
model.fit(X)

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])
plt.subplot(1, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')

plt.subplot(1, 2, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K Mean Classification')
plt.show()

km = rename(model.labels_)
print("\nWhat KMeans thought: \n", km)
print("Accuracy of KMeans is ", sm.accuracy_score(y, km))
print("Confusion Matrix for KMeans is \n", sm.confusion_matrix(y, km))

# EM clustering with a Gaussian mixture model, fitted on standardised features
from sklearn import preprocessing

scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)
print("\n", xs.sample(5))

from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=3)
gmm.fit(xs)

y_cluster_gmm = gmm.predict(xs)

plt.figure(figsize=(14, 7))
plt.subplot(1, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')
plt.show()

em = rename(y_cluster_gmm)
print("\nWhat EM thought: \n", em)
print("Accuracy of EM is ", sm.accuracy_score(y, em))
print("Confusion Matrix for EM is \n", sm.confusion_matrix(y, em))

9. Write a program to implement k-Nearest Neighbour algorithm to classify the
iris data set. Print both correct and wrong predictions. Java/Python ML library
classes can be used for this problem.

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets

iris = datasets.load_iris()
print("Iris Data set loaded...")

# Hold out 10% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.1)
print("Dataset is split into training and testing...")
print("Size of training data and its label", x_train.shape, y_train.shape)
print("Size of testing data and its label", x_test.shape, y_test.shape)

for i in range(len(iris.target_names)):
    print("Label", i, "-", str(iris.target_names[i]))

classifier = KNeighborsClassifier(n_neighbors=1)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)

print("Results of Classification using K-nn with K=1 ")
for r in range(0, len(x_test)):
    # Mark each prediction so that correct and wrong ones are both visible
    status = "Correct" if y_test[r] == y_pred[r] else "Wrong"
    print(" Sample:", str(x_test[r]), " Actual-label:", str(y_test[r]), " Predicted-label:", str(y_pred[r]), " ->", status)

print("Classification Accuracy :", classifier.score(x_test, y_test))

from sklearn.metrics import classification_report, confusion_matrix

print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))

10. Implement the non-parametric Locally Weighted Regression algorithm in
order to fit data points. Select appropriate data set for your experiment and
draw graphs.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    # Diagonal matrix of Gaussian weights around the query point; k is the bandwidth
    m, n = np.shape(xmat)
    weights = np.asmatrix(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp((diff*diff.T)[0, 0]/(-2.0*k**2))
    return weights

def localWeight(point, xmat, ymat, k):
    # Weighted least-squares solution for the query point
    wei = kernel(point, xmat, k)
    W = (xmat.T*(wei*xmat)).I*(xmat.T*(wei*ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = (xmat[i]*localWeight(xmat[i], xmat, ymat, k))[0, 0]
    return ypred

def graphPlot(X, y, ypred):
    xvals = np.asarray(X[:, 1]).ravel()    # column 1 holds the bill amount
    sortindex = xvals.argsort()            # indices that sort the bills in ascending order
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.scatter(xvals, y, color='green')
    ax.plot(xvals[sortindex], ypred[sortindex], color='red', linewidth=5)
    plt.xlabel('Total bill')
    plt.ylabel('Tip')
    plt.show()

# load data points
data = pd.read_csv('pgm10.csv')
bill = np.array(data.total_bill)    # We use only Bill amount and Tips data
tip = np.array(data.tip)
mbill = np.asmatrix(bill)           # asmatrix converts the 1-D array into a 1 x m matrix
mtip = np.asmatrix(tip)
m = np.shape(mbill)[1]
one = np.asmatrix(np.ones(m))
X = np.hstack((one.T, mbill.T))     # 244 rows, 2 cols
# increase k to get smooth curves
ypred = localWeightRegression(X, mtip, 3)
graphPlot(X, tip, ypred)
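
The same locally weighted fit can also be written with plain NumPy arrays instead of matrix objects. The sketch below is an alternative formulation, not the program above: the function name lwr_predict, the bandwidth name tau, and the synthetic straight-line data are all assumptions made for illustration.

```python
import numpy as np

def lwr_predict(x0, X, y, tau):
    """Predict at query row x0 using Gaussian-weighted least squares."""
    d2 = np.sum((X - x0) ** 2, axis=1)      # squared distance to every row
    w = np.exp(-d2 / (2.0 * tau ** 2))      # Gaussian weights, one per row
    XtW = X.T * w                           # same as X.T @ diag(w)
    theta = np.linalg.solve(XtW @ X, XtW @ y)
    return x0 @ theta

# Synthetic, noiseless data on the line y = 2x + 1 (assumed for illustration)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0
X = np.column_stack((np.ones_like(x), x))   # bias column plus the feature

pred = np.array([lwr_predict(row, X, y, tau=1.0) for row in X])
print(np.max(np.abs(pred - y)))
```

Since the data lie exactly on a line, every local fit should recover that line, which gives a quick way to check the weighting code before moving to real data.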
