
Machine Learning LAB CSA-405

PRACTICAL-1

AIM : Implement Exploratory Data Analysis on any data set.


Dataset Used : Iris Flower Data Set
CODE :
#Importing packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

#Loading the dataset and setting column names as the header of the data frame
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pd.read_csv(url, names=names)
dataset.head()

'''Now analyse the data frame and print some useful information about the data,
such as its shape, column info, mean, quantiles, count, max value, min value,
and the data type of the values.'''
dataset.shape

dataset.info()

dataset.describe()
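
A class-balance check is another common EDA step; a minimal sketch using the 'class' column defined above:

#Count the rows belonging to each class (the Iris set has 50 per class)
print(dataset.groupby('class').size())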

'''Now graphically represent the behaviour of the dataset in the form of a box plot,
histogram, and scatter matrix.'''
dataset.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()

dataset.hist()
plt.show()

from pandas.plotting import scatter_matrix

scatter_matrix(dataset)

plt.show()
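
An equivalent pairwise view can also be produced with seaborn (a sketch assuming seaborn is installed; it is not used elsewhere in this program):

import seaborn as sns

#Pairwise scatter plots of all features, coloured by class label
sns.pairplot(dataset, hue='class')
plt.show()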


PRACTICAL-2

AIM : Implement Linear Regression on any data set.


Dataset Used : Random values generated using the NumPy library
CODE :
#Importing packages
import numpy as np
import tensorflow as tf
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10,6)

#Declaring independent variable "x" and dependent variable "y" for a trial run
x = np.arange(0.0,5.0,0.1)
a,b = 1,0
y = a*x+b

#Plot the relationship between the dependent and independent variables
plt.plot(x,y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()

'''Declare the independent variable "x_data" and the target variable "y_data"
(with a=3 and b=2, so the target line is y=3x+2), then add Gaussian noise to
"y_data" using numpy's random module.'''
x_data = np.random.rand(100).astype(np.float32)
y_data = 3*x_data+2
y_data = np.vectorize(lambda y:y+np.random.normal(loc=0.0,scale=0.1))(y_data)
a = tf.Variable(1.0)
b = tf.Variable(0.2)
y = a*x_data+b
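#Note: np.vectorize above is used only for convenience; NumPy broadcasting works too,
#e.g. y_data = y_data + np.random.normal(loc=0.0, scale=0.1, size=y_data.shape)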

'''Compute the mean squared error with tensorflow's reduce_mean to measure the gap between the
target and predicted values, then minimize it with tensorflow's GradientDescentOptimizer at a
learning rate of 0.5.'''
loss = tf.reduce_mean(tf.square(y-y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

'''Initialize the variables with tensorflow's global_variables_initializer and start a session
(note that this uses the TensorFlow 1.x API) so the regression can be trained and plotted.'''
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

'''Finally, run the optimization loop for the regression algorithm and then draw the scatter of
the data along the fitted regression lines.'''
train_data = []
for step in range(100):
    evals = sess.run([train,a,b])[1:]
    if step%5 == 0:
        print(step," ",evals)
        train_data.append(evals)
cr,cg,cb = (1.0,1.0,1.0)

for f in train_data:
    cb += 1.0/len(train_data)
    cg -= 1.0/len(train_data)
    if cb > 1.0:
        cb = 1.0
    if cg < 0.0:
        cg = 0.0
    [a,b] = f
    f_y = np.vectorize(lambda x:a*x+b)(x_data)
    line = plt.plot(x_data,f_y)
    plt.setp(line,color=(cr,cg,cb))

plt.plot(x_data,y_data,'ro')
data_patch = mpatches.Patch(color='red',label='Data Points')
plt.legend(handles=[data_patch])
plt.show()
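
Note that tf.Session, tf.global_variables_initializer, and tf.train.GradientDescentOptimizer belong to the TensorFlow 1.x API and were removed in TensorFlow 2.x. As a rough sketch (not part of the original program), the same fit on TF 2.x could look like this:

import numpy as np
import tensorflow as tf

x_data = np.random.rand(100).astype(np.float32)
y_data = 3*x_data + 2 + np.random.normal(0.0, 0.1, 100).astype(np.float32)

a = tf.Variable(1.0)
b = tf.Variable(0.2)
opt = tf.keras.optimizers.SGD(learning_rate=0.5)

for step in range(100):
    with tf.GradientTape() as tape:
        #Mean squared error between the current line and the noisy targets
        loss = tf.reduce_mean(tf.square(a*x_data + b - y_data))
    grads = tape.gradient(loss, [a, b])
    opt.apply_gradients(zip(grads, [a, b]))

print(a.numpy(), b.numpy())  #should approach 3 and 2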


PRACTICAL-3

AIM: Program to implement Confusion Matrix and ROC Curve.


Dataset Used: Iris Flower dataset imported from the sklearn library
CODE:
'''Import the libraries needed to implement logistic regression and then evaluate it using a
confusion matrix and an ROC curve.'''
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
iris=load_iris()

#Splitting the dataset into training and test sets so that we can evaluate the model later
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
#Implementing the Logistic Regression in below steps
logreg = LogisticRegression(C=0.1).fit(X_train, y_train)
pred_logreg = logreg.predict(X_test)
print("logreg score: %f" % logreg.score(X_test, y_test))

'''The confusion matrix below compares the true labels of the test data with the predictions
obtained from the logistic regression above.'''
from sklearn.metrics import confusion_matrix
confusion = confusion_matrix(y_test, pred_logreg)
print(confusion)
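
For a per-class breakdown of precision and recall, sklearn's classification_report can be printed alongside the matrix (an optional extra, not part of the original program):

from sklearn.metrics import classification_report

#Precision, recall and F1-score for each of the three Iris classes
print(classification_report(y_test, pred_logreg))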

'''In the steps below we plot ROC curves for SVM classifiers on a binary version of the digits
dataset (detecting the digit 9), using metric functions from sklearn.'''
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
digits = load_digits()
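#Binarize the target: True for images of the digit 9, False otherwise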
y = digits.target == 9

X_train, X_test, y_train, y_test = train_test_split(
    digits.data, y, random_state=0)
plt.figure()
for gamma in [1, 0.05, 0.01]:
    svc = SVC(gamma=gamma).fit(X_train, y_train)
    accuracy = svc.score(X_test, y_test)
    auc = roc_auc_score(y_test, svc.decision_function(X_test))
    fpr, tpr, _ = roc_curve(y_test, svc.decision_function(X_test))
    print("gamma = %.02f accuracy = %.02f AUC = %.02f" % (gamma, accuracy, auc))
    plt.plot(fpr, tpr, label="gamma=%.03f" % gamma, linewidth=4)
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend(loc="best")
plt.show()


PRACTICAL-4

AIM: Implement Support Vector Machine on any dataset and compare its accuracy
with Logistic Regression.
Dataset Used: Iris Flower dataset
CODE:
#Importing library to load dataset
import pandas as pd
dataset = pd.read_csv('/Iris.csv')
dataset.head()
#The output shows the first five rows of the Iris data frame

'''In the steps below we drop the unwanted Id column from the data frame and collect the
distinct target values using a set.'''
dataset = dataset.drop(['Id'],axis=1)
target = dataset['Species']
s = set()
for val in target:
    s.add(val)
s = list(s)
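#Equivalently, the distinct species can be collected in one step: s = list(set(target))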
'''The Iris dataset has three classes, so in the steps below we reorganize the dataset and
remove one of the classes. This leaves us with a binary classification problem.'''
rows = list(range(100,150))
dataset = dataset.drop(dataset.index[rows])

'''Four features are available, but we will use only two: sepal length and petal length. In the
subsequent steps we take these two features and plot them to visualize the data.'''
import matplotlib.pyplot as plt
x = dataset['SepalLengthCm']
y = dataset['PetalLengthCm']
setosa_x = x[:50]
setosa_y = y[:50]
versicolor_x = x[50:]
versicolor_y = y[50:]
plt.figure(figsize=(8,6))
plt.scatter(setosa_x,setosa_y,marker='+',color='green')
plt.scatter(versicolor_x,versicolor_y,marker='_',color='red')
plt.show()
'''The scatter plot above shows how the data points are distributed, which gives an idea of
the hyperplane position during the SVM implementation.'''

#In the following steps we split the dataset into training and test sets
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
import numpy as np
#Drop the two unused feature columns, keeping sepal length, petal length, and the target
dataset = dataset.drop(['SepalWidthCm','PetalWidthCm'],axis=1)
Y = []
target = dataset['Species']
for val in target:
    if(val=='Iris-setosa'):
        Y.append(-1)
    else:
        Y.append(1)
dataset = dataset.drop(['Species'],axis=1)
X = dataset.values.tolist()
#Now shuffle and split the data into training and test sets
X,Y = shuffle(X,Y)
x_train,y_train = [],[]
x_test,y_test = [],[]
x_train,x_test,y_train,y_test = train_test_split(X,Y,train_size=0.9)
x_train,y_train = np.array(x_train),np.array(y_train)
x_test,y_test = np.array(x_test),np.array(y_test)
y_train = y_train.reshape(90,1)
y_test = y_test.reshape(10,1)
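#With 100 remaining rows and train_size=0.9, the split yields 90 training and
#10 test samples, which is why the reshapes above use 90 and 10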

'''In the steps below we implement the SVM algorithm with a learning rate of 0.0001 for 10000
iterations. The regularization parameter changes on each iteration as (1/epochs), where epochs
holds the current iteration number; the regularization strength therefore decreases as the
number of epochs increases. This is called adjustment of the regularization parameter.'''
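'''In symbols, with regularization strength λ = 1/epoch and learning rate α, the loop below
applies the hinge-loss update to each weight w (f is the matching feature, y the ±1 label):
    if y*(w1*f1 + w2*f2) >= 1:   w ← w − α*(2λw)
    else:                        w ← w + α*(y*f − 2λw)
This restates the code that follows; it is not an extra computation.'''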
train_f1 = x_train[:,0]
train_f2 = x_train[:,1]
train_f1 = train_f1.reshape(90,1)
train_f2 = train_f2.reshape(90,1)
w1 = np.zeros((90,1))
w2 = np.zeros((90,1))
epochs=1
alpha = 0.0001
while(epochs<10001):
    y = w1*train_f1 + w2*train_f2
    prod = y*y_train
    if(epochs%1000==0):
        print(epochs)
    count = 0
    for val in prod:
        if(val>=1):
            cost = 0
            w1 = w1-alpha*(2*1/epochs*w1)
            w2 = w2-alpha*(2*1/epochs*w2)
        else:
            cost = 1-val
            w1 = w1+alpha*(train_f1[count]*y_train[count]-2*1/epochs*w1)
            w2 = w2+alpha*(train_f2[count]*y_train[count]-2*1/epochs*w2)
        count += 1
    epochs += 1

'''Now we clip the weights down to 10 rows, since the test data contains only 10 data points.
(Every row of w1 and w2 receives the same scalar update in the loop above, so all 90 entries
remain identical and keeping only the first 10 loses nothing.) We then extract the features
from the test data, predict the values, and compare the predictions with the actual values to
obtain the accuracy of our model.'''
from sklearn.metrics import accuracy_score
#Weight clipping
index = list(range(10,90))
w1 = np.delete(w1,index)
w2 = np.delete(w2,index)
w1 = w1.reshape(10,1)
w2 = w2.reshape(10,1)
#Extracting the test data features
test_f1 = x_test[:,0]
test_f2 = x_test[:,1]
test_f1 = test_f1.reshape(10,1)
test_f2 = test_f2.reshape(10,1)
#Now make predictions on the test features
y_pred = w1*test_f1 + w2*test_f2
predictions = []
for val in y_pred:
    if(val>1):
        predictions.append(1)
    else:
        predictions.append(-1)
print(accuracy_score(y_test,predictions),"\n\n")
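
For reference, the same binary problem can be fit with sklearn's built-in SVC on the same train/test split (a sketch for comparison, not part of the original program):

from sklearn.svm import SVC

#Linear-kernel SVM on the same two features; ravel() flattens the (n,1) targets
clf = SVC(kernel='linear').fit(x_train, y_train.ravel())
print(clf.score(x_test, y_test.ravel()))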

'''In the steps below we implement logistic regression using the sklearn library and then
compare its accuracy with the SVM above.'''
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
iris=load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
#Implementing the Logistic Regression in below steps
logreg = LogisticRegression(C=0.1).fit(X_train, y_train)
pred_logreg = logreg.predict(X_test)
print("logreg score: %f" % logreg.score(X_test, y_test))

We can now see that the prediction accuracy of Logistic Regression is around 63.15%, which is
much lower than the prediction accuracy of the SVM. We can therefore say that the SVM
classifies this dataset more accurately than Logistic Regression.
