
Artificial Intelligence and Machine Learning

Lab (CC3230)

Name: - Ananiya Sardana

Reg. No. 199303079

Computer and Communication Engineering

School of Computing and Information Technology

Manipal University Jaipur, Jaipur

(Jan 2022- May 2022)


Index
Serial Number  Title                                                                                   Page No.
1-2            Introduction to Python                                                                  1-6
3              Write a program to implement hill climbing search algorithm                             7-9
4              Write a program to implement A* search algorithm                                        10-12
5              Write a program to solve some real-world problem using constraint satisfaction          13-14
6-7            Write a program to implement Simple Linear and Logistic Regression                      15-27
8              Write a program to implement the Bayes Classifier and SVM Classifier                    28-37
9              Write a program to implement Decision Tree Algorithm                                    38-41
10             Write a program to implement k-Nearest Neighbors                                        42-46
11             Write a program to implement k-means algorithm                                          47-51
12             Write a program to implement Principal Component Analysis for dimensionality reduction  52-55
13             Write programs to implement the Perceptron Algorithm and the Backpropagation Algorithm  56-61

Lab 1

Ananiya Sardana
199303079
CCE-B

Introduction to Python
Program 1: Python Program for factorial of a number

In [1]:

def factorial(n):
    if n == 1 or n == 0:
        return 1
    else:
        return n * factorial(n - 1)

In [2]:

number = int(input("Enter a number: "))


print("Factorial of",number,"is", factorial(number))

Enter a number: 10
Factorial of 10 is 3628800

Program 2: Python Program for compound interest

In [3]:

def compound_interest(p, r, t):
    amt = p * pow((1 + r / 100), t)
    CI = amt - p
    print("Compound interest:", CI)

In [4]:

p = float(input("Enter principle: "))


r = float(input("Enter rate: "))
t = float(input("Enter time/annum: "))
compound_interest(p,r,t)

Enter principle: 20000


Enter rate: 12
Enter time/annum: 5
Compound interest: 15246.833664000013
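As a quick sanity check on the formula CI = P(1 + r/100)^t - P: with P = 20000, r = 12 and t = 5, the amount is 20000 × 1.12^5 ≈ 35246.83, so CI ≈ 15246.83, which matches the output above.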

Program 3: Python program to print all Prime numbers in an Interval

In [5]:

def prime(start, end):
    numCount = 0
    print("Prime numbers between", start, "and", end, "are:\n")
    for i in range(start, end + 1):
        count = 0
        for j in range(2, i):
            if i % j == 0:
                count = count + 1
        if count == 0:
            numCount = numCount + 1
            print(i, "\n")
    if numCount == 0:
        print("Prime number between", start, "and", end, "doesn't exist!!")

In [6]:

start = int(input("Enter start range:"))
end = int(input("Enter end range:"))
prime(start, end)

Enter start range:1


Enter end range:15
Prime numbers between 1 and 15 are:

11

13

Program 4: Python program to print Fibonacci series

In [7]:

def fibonacci(n):
    f1 = 0
    f2 = 1
    if n == 0:
        print(f1)
    elif n == 1:
        print(f2)
    else:
        print(f1, " ", f2, " ", end=" ")
        for i in range(3, n + 1):
            term = f1 + f2
            f1 = f2
            f2 = term
            print(term, " ", end=" ")

In [8]:

n = int(input("Enter nth term: "))
fibonacci(n)

Enter nth term: 10


0 1 1 2 3 5 8 13 21 34

Program 5: Python Program for How to check if a given number is Fibonacci number

In [9]:

import math

def isPerfectSquare(x):
    s = int(math.sqrt(x))
    return s * s == x

def isFibonacci(n):
    return isPerfectSquare(5 * n * n + 4) or isPerfectSquare(5 * n * n - 4)

In [10]:

n = int(input("Enter a number: "))

if isFibonacci(n):
    print(n, "is a Fibonacci Number")
else:
    print(n, "is not a Fibonacci Number")

Enter a number: 12
12 is not a Fibonacci Number
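The check relies on the identity that n is a Fibonacci number exactly when 5n² + 4 or 5n² − 4 is a perfect square. For example, isFibonacci(8) returns True because 5·8² + 4 = 324 = 18², while for n = 12 neither 724 nor 716 is a perfect square, which is why the program reports 12 as not a Fibonacci number.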

Program 6: Python Program for cube sum of first n natural numbers

In [11]:

def SumOfCube(n):
    sum = 0
    for i in range(1, n + 1):
        sum += i * i * i
    print("Cube Sum of first", n, "natural numbers:", sum)

In [12]:

n = int(input("Enter value of n:"))


SumOfCube(n)

Enter value of n:5


Cube Sum of first 5 natural numbers: 225

Program 7: Python program to print double side staircase pattern

In [34]:

def stairCase(num):
    for i in range(1, num + 1):
        for j in range(2):
            for k in range(i, num):
                print(" ", end="")
            for l in range(2 * i):
                print("*", end="")
            print()

In [36]:

n = int(input("Enter value of n:"))


stairCase(n)

Enter value of n:5


**
**
****
****
******
******
********
********
**********
**********

Lab 2

Ananiya Sardana
199303079
CCE-B

Advance Topics in Python


Program 1: Python program to find the N smallest and largest number in a list

In [3]:

def smallAndLarge():
    max = float('-inf')
    min = float('inf')
    list1 = []
    n = int(input("Enter size of list: "))

    for i in range(0, n):
        list1.append(int(input("Enter number: ")))

    for i in range(0, n):
        if list1[i] > max:
            max = list1[i]
        if list1[i] < min:
            min = list1[i]

    print("\nLargest number in the list: ", max)
    print("\nSmallest number in the list: ", min)

In [4]:

smallAndLarge()

Enter size of list: 5


Enter number: 1
Enter number: 2
Enter number: 3
Enter number: 4
Enter number: 5

Largest number in the list: 5

Smallest number in the list: 1

Program 2: Python program to find and print all even and odd numbers in a list

In [13]:

def oddOrEven():
    list1 = []
    n = int(input("Enter size of list: "))

    for i in range(0, n):
        list1.append(int(input("Enter number: ")))
    print("\nOdd Numbers in the list:")
    for i in range(0, n):
        if list1[i] % 2 == 1:
            print(list1[i], end=' ')
    print("\nEven Numbers in the list:")
    for i in range(0, n):
        if list1[i] % 2 == 0:
            print(list1[i], end=' ')

In [14]:

oddOrEven()

Enter size of list: 4
Enter number: 1
Enter number: 2
Enter number: 3
Enter number: 4

Odd Numbers in the list:


1 3
Even Numbers in the list:
2 4

Program 3: Create a dictionary “Employee" add dataitems name, age, salary and company name by taking input
from the user.

In [15]:
def EmpDict():
    Employee = dict({'Name': input("Enter Name of the Employee: "),
                     'Age': int(input("Enter Age of the Employee: ")),
                     'Salary': int(input("Enter Salary of the Employee: ")),
                     'Company': input("Enter Company Name of the Employee: ")})
    print(Employee)

In [16]:
EmpDict()

Enter Name of the Employee: Leo


Enter Age of the Employee: 35
Enter Salary of the Employee: 10000000
Enter Company Name of the Employee: Google
{'Name': 'Leo', 'Age': 35, 'Salary': 10000000, 'Company': 'Google'}

Program 4: Explain the usage of cmp, len,max,min and tuple inbuilt python functions for tuple data structure
using some programming examples.

In [20]:

# The cmp function doesn't exist in Python 3 anymore, so we define our own
def cmp(a, b):
return (a > b) - (a < b)

In [22]:

def cmpExample():
    print("Usage of cmp Example:\n")
    a = 5
    b = 10
    print(cmp(a, b))

    a = 5
    b = 5
    print(cmp(a, b))

    a = 10
    b = 5
    print(cmp(a, b))
In [23]:

cmpExample()

Usage of cmp Example:

-1
0
1

In [24]:

def lenExample():
    print("Usage of len Example:\n")
    a = "Leo Messi"
    b = [1, 2, 3, 4, 5, 6, 7]

    print("Length of the string: ", len(a))
    print("Length of the list: ", len(b))

In [25]:

lenExample()

Usage of len Example:

Length of the string: 9
Length of the list: 7

In [26]:

def MaxAndMinExample():
    print("Usage of max and min Example:\n")
    a = [1, 2, 3, 4, 5, 6, 7]

    print("Largest number in the list: ", max(a))
    print("Smallest number in the list: ", min(a))

In [27]:

MaxAndMinExample()

Usage of max and min Example:

Largest number in the list: 7


Smallest number in the list: 1

Lab 3

Ananiya Sardana
199303079
CCE-B

Hill Climbing Algorithm


In [1]:

import random

Step 1 -> Create a Problem Space

In [2]:

tsp = [
[0, 400, 500, 300],
[400, 0, 300, 500],
[500, 300, 0, 400],
[300, 500, 400, 0],
]

In [3]:

tsp
Out[3]:

[[0, 400, 500, 300],


[400, 0, 300, 500],
[500, 300, 0, 400],
[300, 500, 400, 0]]

In [4]:

len(tsp)
Out[4]:
4

Step 2 -> Create a Random Solution

Create random sequence of cities

In [5]:
def randomSolution(tsp):
    cities = list(range(len(tsp)))
    solution = []

    for i in range(len(tsp)):
        randomcity = cities[random.randint(0, len(cities) - 1)]
        solution.append(randomcity)
        cities.remove(randomcity)  # drop city that is already appended to solution
    return solution
In [6]:
solution = randomSolution(tsp)

In [7]:
print(solution)

[3, 2, 0, 1]

Calculate Route Length

In [8]:

def routeLength(tsp, solution):
    routeLength = 0
    for i in range(len(solution)):
        routeLength = routeLength + tsp[solution[i - 1]][solution[i]]
    return routeLength

In [9]:
routelength = routeLength(tsp,solution)

In [10]:
print(routelength)

1800

Step 3 -> Generate Neighbours

In [11]:

def getNeigbours(solution):
    neighbours = []
    for i in range(len(solution)):
        for j in range(i + 1, len(solution)):
            neighbour = solution.copy()
            neighbour[i] = solution[j]
            neighbour[j] = solution[i]
            neighbours.append(neighbour)
    return neighbours

In [12]:
neighbours = getNeigbours(solution)

In [13]:
print(neighbours)

[[2, 3, 0, 1], [0, 2, 3, 1], [1, 2, 0, 3], [3, 0, 2, 1], [3, 1, 0, 2], [3, 2, 1, 0]]

Step 4 -> Find out best neighbour

In [19]:

def getBestNeigbour(neighbours):
    neigbourRoute = [0] * len(neighbours)
    for i in range(len(neighbours)):
        neigbourRoute[i] = routeLength(tsp, neighbours[i])
    bestNeighbour = neighbours[neigbourRoute.index(min(neigbourRoute))]
    bestRouteLength = min(neigbourRoute)
    return bestRouteLength, bestNeighbour

In [20]:
bestRouteLength, bestNeighbour = getBestNeigbour(neighbours)

In [22]:

print(bestRouteLength)

1400

In [21]:

print(bestNeighbour)

[2, 3, 0, 1]

Step 5 -> Implement Hill Climbing

In [55]:

def hillClimbing(tsp):
    currentSolution = randomSolution(tsp)
    currentRouteLength = routeLength(tsp, currentSolution)
    print(currentSolution)
    print(currentRouteLength)
    neighbors = getNeigbours(currentSolution)
    bestneighborroutelength, bestneighbor = getBestNeigbour(neighbors)
    # print(bestneighborroutelength)
    # print(currentRouteLength)
    while bestneighborroutelength < currentRouteLength:
        currentSolution = bestneighbor
        currentRouteLength = bestneighborroutelength
        neighbors = getNeigbours(currentSolution)
        # keep the (routeLength, neighbour) order returned by getBestNeigbour
        bestneighborroutelength, bestneighbor = getBestNeigbour(neighbors)

In [56]:
print(hillClimbing(tsp))

[3, 0, 1, 2]
1400
None
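The function above only prints the starting tour and returns nothing, which is why print(hillClimbing(tsp)) shows None. A minimal sketch of a variant that also returns the final route and its length, reusing the same helper functions (the function name here is chosen just for illustration):

def hillClimbingWithResult(tsp):
    # start from a random tour and keep moving to the best neighbour
    currentSolution = randomSolution(tsp)
    currentRouteLength = routeLength(tsp, currentSolution)
    neighbours = getNeigbours(currentSolution)
    bestRouteLength, bestNeighbour = getBestNeigbour(neighbours)
    while bestRouteLength < currentRouteLength:
        currentSolution = bestNeighbour
        currentRouteLength = bestRouteLength
        neighbours = getNeigbours(currentSolution)
        bestRouteLength, bestNeighbour = getBestNeigbour(neighbours)
    return currentSolution, currentRouteLength

# print(hillClimbingWithResult(tsp)) would then print a tour such as ([3, 0, 1, 2], 1400)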

Lab 4

Ananiya Sardana
199303079
CCE-B

A* Algorithm
Swapping of tiles

In [1]:

def move(ar, p, st):
    rh = 999999
    store_st = st.copy()
    print("store_st: ", store_st)
    for i in range(len(ar)):
        dupl_st = st.copy()
        temp = dupl_st[p]
        # print(temp)
        dupl_st[p] = dupl_st[ar[i]]
        print("x", dupl_st[p])
        dupl_st[ar[i]] = temp

        tmp_rh = count(dupl_st)

        if tmp_rh < rh:
            rh = tmp_rh
            store_st = dupl_st.copy()
    return store_st, rh

Function to print in 3x3 format

In [2]:

def print_in_format(matrix):
    for i in range(9):
        if i % 3 == 0 and i > 0:
            print("")
        print(str(matrix[i]) + " ", end="")

Step 1 -> Create a Problem Space

In [3]:

start = [1,2,3,
0,5,6,
4,7,8]

Step 2 -> Create a function that counts no.of misplaced tiles

In [4]:

def count(s):
    c = 0
    ideal = [1, 2, 3,
             4, 5, 6,
             7, 8, 0]
    for i in range(9):
        if s[i] != 0 and s[i] != ideal[i]:
            c = c + 1
    return c

Step 3 -> Define Moves

In [5]:

h = count(start)
level = 1
print("\n............Level " + str(level) + "")
print_in_format(start)
print("\nHeuristic Value(misplaced tiles): " + str(h))
while h > 0:
    pos = int(start.index(0))
    level = level + 1
    if pos == 0:
        arr = [1, 3]
    elif pos == 1:
        arr = [0, 2, 4]
    elif pos == 2:
        arr = [1, 5]
    elif pos == 3:
        arr = [0, 4, 6]
    elif pos == 4:
        arr = [1, 3, 5, 7]
    elif pos == 5:
        arr = [2, 4, 8]
    elif pos == 6:
        arr = [3, 7]
    elif pos == 7:
        arr = [4, 6, 8]
    elif pos == 8:
        arr = [5, 7]
    start, h = move(arr, pos, start)
    print("\n..............Level:" + str(level) + "")
    print_in_format(start)
    print("\nheuristic Value(no of misplaced tiles): " + str(h))

............Level 1............
1 2 3
0 5 6
4 7 8
Heuristic Value(misplaced tiles): 3
store_st: [1, 2, 3, 0, 5, 6, 4, 7, 8]
x 1
x 5
x 4

..............Level:2...............
1 2 3
4 5 6
0 7 8
heuristic Value(no of misplaced tiles): 2
store_st: [1, 2, 3, 4, 5, 6, 0, 7, 8]
x 4
x 7

..............Level:3...............
1 2 3
4 5 6
7 0 8
heuristic Value(no of misplaced tiles): 1
store_st: [1, 2, 3, 4, 5, 6, 7, 0, 8]
x 5
x 7
x 8

..............Level:4...............
1 2 3
4 5 6
7 8 0
heuristic Value(no of misplaced tiles): 0

Lab 5

Ananiya Sardana
199303079
CCE-B

Constraint Satisfaction Problem


In [3]:

# CSP
# find the values of x and y such that x is from [1,2,3], y is from 0 to 9, and x+y >= 5

# constraint module

import constraint
from constraint import *

problem = Problem()
problem.addVariable('x', [1, 2, 3])
problem.addVariable('y', range(10))

def my_constraint(x, y):
    if x + y >= 5:
        return True

problem.addConstraint(my_constraint, ['x', 'y'])
solutions = problem.getSolutions()
for solution in solutions:
    print(solution)

{'x': 3, 'y': 9}
{'x': 3, 'y': 8}
{'x': 3, 'y': 7}
{'x': 3, 'y': 6}
{'x': 3, 'y': 5}
{'x': 3, 'y': 4}
{'x': 3, 'y': 3}
{'x': 3, 'y': 2}
{'x': 2, 'y': 9}
{'x': 2, 'y': 8}
{'x': 2, 'y': 7}
{'x': 2, 'y': 6}
{'x': 2, 'y': 5}
{'x': 2, 'y': 4}
{'x': 2, 'y': 3}
{'x': 1, 'y': 9}
{'x': 1, 'y': 8}
{'x': 1, 'y': 7}
{'x': 1, 'y': 6}
{'x': 1, 'y': 5}
{'x': 1, 'y': 4}

In [6]:

# find the values of x and y such that x is from 1 to 10,
# y is from 1 to 20, and y/x should be an even number

import constraint
from constraint import *

problem = Problem()
problem.addVariable('x', [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
problem.addVariable('y', range(1, 21))

def my_constraint(x, y):
    if (y / x) % 2 == 0:
        return True

problem.addConstraint(my_constraint, ['x', 'y'])
solutions = problem.getSolutions()
for solution in solutions:
    print(solution)

{'x': 10, 'y': 20}


{'x': 9, 'y': 18}
{'x': 8, 'y': 16}
{'x': 7, 'y': 14}
{'x': 6, 'y': 12}
{'x': 5, 'y': 10}
{'x': 5, 'y': 20}
{'x': 4, 'y': 8}
{'x': 4, 'y': 16}
{'x': 3, 'y': 6}
{'x': 3, 'y': 18}
{'x': 3, 'y': 12}
{'x': 2, 'y': 4}
{'x': 2, 'y': 20}
{'x': 2, 'y': 8}
{'x': 2, 'y': 16}
{'x': 2, 'y': 12}
{'x': 1, 'y': 2}
{'x': 1, 'y': 14}
{'x': 1, 'y': 10}
{'x': 1, 'y': 6}
{'x': 1, 'y': 18}
{'x': 1, 'y': 4}
{'x': 1, 'y': 20}
{'x': 1, 'y': 8}
{'x': 1, 'y': 16}
{'x': 1, 'y': 12}
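The same constraint can also be registered as a lambda, which avoids the separate my_constraint function. A minimal sketch for the first problem, assuming the same python-constraint package used above:

from constraint import Problem

problem = Problem()
problem.addVariable('x', [1, 2, 3])
problem.addVariable('y', range(10))
# the lambda must return True for every (x, y) pair that satisfies x + y >= 5
problem.addConstraint(lambda x, y: x + y >= 5, ['x', 'y'])
print(len(problem.getSolutions()))  # should report the same 21 solutions listed earlier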

In [ ]:

Lab 6 Part 1

Ananiya Sardana
199303079
CCE-B

Single Variable Linear Regression

Importing the libraries


In [1]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset


In [2]:

dataset = pd.read_csv('canada_per_capita_income.csv')
x = dataset.iloc[:,:-1].values
y = dataset.iloc[:,-1].values

In [3]:
print(y)
print(x)

[ 3399.299037 3768.297935 4251.175484 4804.463248 5576.514583


5998.144346 7062.131392 7100.12617 7247.967035 7602.912681
8355.96812 9434.390652 9619.438377 10416.53659 10790.32872
11018.95585 11482.89153 12974.80662 15080.28345 16426.72548
16838.6732 17266.09769 16412.08309 15875.58673 15755.82027
16369.31725 16699.82668 17310.75775 16622.67187 17581.02414
18987.38241 18601.39724 19232.17556 22739.42628 25719.14715
29198.05569 32738.2629 36144.48122 37446.48609 32755.17682
38420.52289 42334.71121 42665.25597 42676.46837 41039.8936
35175.18898 34229.19363 ]
[[1970]
[1971]
[1972]
[1973]
[1974]
[1975]
[1976]
[1977]
[1978]
[1979]
[1980]
[1981]
[1982]
[1983]
[1984]
[1985]
[1986]
[1987]
[1988]
[1989]
[1990]
[1991]
[1992]
[1993]
[1994]
[1995]
[1996]
[1997]
[1998]
[1999]
[2000]
[2001]
[2002]
[2003]
[2004]
[2005]
[2006]
[2007]
[2008]
[2009]
[2010]
[2011]
[2012]
[2013]
[2014]
[2015]
[2016]]

Training the Simple Linear Regression model on the Training set


In [4]:
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(x, y)

Out[4]:

LinearRegression()

Predicting the Test set results


In [5]:
y_pred = regressor.predict([[2020]])

In [6]:
y_pred

Out[6]:
array([41288.69409442])

Visualising the Training set results


In [7]:
plt.scatter(x,y, color ='red')
plt.plot(x, regressor.predict(x), color='blue')
plt.title('Year vs per Capita Income (US$) (Training Set)')
plt.xlabel('Year')
plt.ylabel('per Capita Income (US$)')
plt.show()
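To see the fitted line itself, the learned slope and intercept can be printed; a small sketch using standard scikit-learn attributes (the exact numbers are not reproduced here):

# slope (per-year increase in income) and intercept of the fitted line
print(regressor.coef_, regressor.intercept_)
# the 2020 prediction above is just intercept + slope * 2020
print(regressor.intercept_ + regressor.coef_[0] * 2020)  # ~41288.69, matching y_pred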

In [ ]:

In [ ]:

Lab 6 Part 2

Ananiya Sardana
199303079
CCE-B

In [1]:
! pip3 install word2number

Collecting word2number
Downloading word2number-1.1.zip (9.7 kB)
Preparing metadata (setup.py) ... done
Building wheels for collected packages: word2number
Building wheel for word2number (setup.py) ... done
Created wheel for word2number: filename=word2number-1.1-py3-none-any.whl size=5586 sha2
56=91dc8db272f453f02ecf59edfe402555a66b387ca007f41e65e602149c251ec2
Stored in directory: /Users/priyamthakkar/Library/Caches/pip/wheels/cb/f3/5a/d88198fdeb
46781ddd7e7f2653061af83e7adb2a076d8886d6
Successfully built word2number
Installing collected packages: word2number
Successfully installed word2number-1.1
WARNING: You are using pip version 21.3.1; however, version 22.0.4 is available.
You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.8/
bin/python3.8 -m pip install --upgrade pip' command.

Multiple Variable Linear Regression

Importing the libraries


In [63]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Importing the dataset


In [95]:

dataset = pd.read_csv('hiring.csv')
dataset.experience = dataset.experience.fillna(str("zero"))
x = dataset.iloc[:,:-1].values
y = dataset.iloc[:,-1].values

In [96]:
print(x)

[['zero' 8.0 9]
['zero' 8.0 6]
['five' 6.0 7]
['two' 10.0 10]
['seven' 9.0 6]
['three' 7.0 10]
['ten' nan 7]
['eleven' 7.0 8]]

In [97]:
print(y)

[50000 45000 60000 65000 70000 62000 72000 80000]

In [98]:
print(x[0])

['zero' 8.0 9]

Taking care of missing data


In [99]:

from word2number import w2n

for i in x:
    i[0] = w2n.word_to_num(i[0])

In [100]:
print(x)

[[0 8.0 9]
[0 8.0 6]
[5 6.0 7]
[2 10.0 10]
[7 9.0 6]
[3 7.0 10]
[10 nan 7]
[11 7.0 8]]

In [101]:

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(x[:, :])
x[:, :] = imputer.transform(x[:, :])

In [102]:

print(x)

[[0.0 8.0 9.0]
 [0.0 8.0 6.0]
 [5.0 6.0 7.0]
 [2.0 10.0 10.0]
 [7.0 9.0 6.0]
 [3.0 7.0 10.0]
 [10.0 7.857142857142857 7.0]
 [11.0 7.0 8.0]]

Training the Multiple Linear Regression model on the Training set


In [103]:

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(x, y)

Out[103]:

LinearRegression()
Predicting the Test set results

In [108]:

x_test = [[2, 9, 6], [12, 10, 10]]

In [109]:

y_pred = regressor.predict(x_test)
print(y_pred)

[53290.89 92268.07]

In [110]:

regressor.coef_

Out[110]:

array([2827.63, 1912.94, 2196.98])

In [111]:

regressor.intercept_

Out[111]:

17237.330313727172
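The predictions can be reproduced by hand from these values: for the first test candidate [2, 9, 6], 17237.33 + 2827.63·2 + 1912.94·9 + 2196.98·6 ≈ 53290.9, which agrees with the first value of y_pred (the small difference comes from the rounded coefficients shown above).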

In [ ]:

Lab 7 Part 1

Ananiya Sardana
199303079
CCE-B

Logistic Regression (Binary)


In [1]:

import pandas as pd
import matplotlib.pyplot as plt

In [2]:

df = pd.read_csv (r'insurancedata.csv')
df.head()
Out[2]:

age bought_insurance

0 22 0

1 25 0

2 47 1

3 52 0

4 46 1

In [3]:

plt.scatter(df.age,df.bought_insurance,marker='+',color='red')
Out[3]:
<matplotlib.collections.PathCollection at 0x7f9a78448940>

In [4]:
from sklearn.model_selection import train_test_split
In [5]:
df
Out[5]:

age bought_insurance

0 22 0

1 25 0

2 47 1

3 52 0

4 46 1

5 56 1

6 55 0

7 60 1

8 62 1

9 61 1

10 18 0

11 28 0

12 27 0

13 29 0

14 49 1

15 55 1

16 25 1

17 58 1

18 19 0

19 18 0

20 21 0

21 26 0

22 40 1

23 45 1

24 50 1

25 54 1

26 23 0

In [6]:
X_train, X_test, y_train, y_test = train_test_split(df[['age']], df.bought_insurance, train_size=0.8)

In [7]:
y_test

Out[7]:

16 1
3 0
18 0
1 0
20 0
4 1
Name: bought_insurance, dtype: int64
In [8]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()

In [9]:

model.fit(X_train, y_train)
Out[9]:

LogisticRegression()

In [10]:

X_test
Out[10]:

age

16 25

3 52

18 19

1 25

20 21

4 46

In [11]:

y_predicted = model.predict(X_test)
y_predicted
Out[11]:

array([0, 1, 0, 0, 0, 1])

In [12]:

model.score(X_test,y_test)
Out[12]:
0.6666666666666666
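Beyond the hard 0/1 predictions, the model's class probabilities can be inspected; a minimal sketch using the standard scikit-learn API (the exact probabilities depend on the random train/test split):

# each row gives P(bought_insurance = 0) and P(bought_insurance = 1) for one test age
probs = model.predict_proba(X_test)
print(probs)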

In [ ]:

Lab 7 Part 2

Ananiya Sardana
199303079
CCE-B

Multi-class Logistic regression


In [1]:

# loading dataset: digits


from sklearn.datasets import load_digits

In [2]:

digits = load_digits()
print(digits.data.shape)

(1797, 64)

In [3]:
digits.data

Out[3]:
array([[ 0., 0., 5., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 10., 0., 0.],
[ 0., 0., 0., ..., 16., 9., 0.],
...,
[ 0., 0., 1., ..., 6., 0., 0.],
[ 0., 0., 2., ..., 12., 0., 0.],
[ 0., 0., 10., ..., 12., 1., 0.]])

In [4]:

# data visualization
from matplotlib import pyplot as plt

In [5]:

plt.gray() # to show gray scale images


for i in range(0,5):
plt.matshow(digits.images[i])
plt.axis('off')
plt.show()

<Figure size 432x288 with 0 Axes>



In [6]:
# dataset description
# checking the dataset directory
dir(digits)
Out[6]:
['DESCR', 'data', 'feature_names', 'frame', 'images', 'target', 'target_names']
In [7]:
digits.data.shape

Out[7]:

(1797, 64)

In [8]:
digits.DESCR

Out[8]:

".. _digits_dataset:\n\nOptical recognition of handwritten digits dataset\n--------------


------------------------------------\n\n**Data Set Characteristics:**\n\n :Number of
Instances: 1797\n :Number of Attributes: 64\n :Attribute Information: 8x8 image of
integer pixels in the range 0..16.\n :Missing Attribute Values: None\n :Creator: E.
Alpaydin (alpaydin '@' boun.edu.tr)\n :Date: July; 1998\n\nThis is a copy of the test
set of the UCI ML hand-written digits
datasets\nhttps://archive.ics.uci.edu/ml/datasets/O
ptical+Recognition+of+Handwritten+Digits\n\nThe data set contains images of hand-written
digits: 10 classes where\neach class refers to a digit.\n\nPreprocessing programs made
av ailable by NIST were used to extract\nnormalized bitmaps of handwritten digits from a
pre printed form. From a\ntotal of 43 people, 30 contributed to the training set and
differen t 13\nto the test set. 32x32 bitmaps are divided into nonoverlapping blocks of\
n4x4 and t he number of on pixels are counted in each block. This generates\nan input
matrix of 8x8 where each element is an integer in the range\n0..16. This reduces
dimensionality and giv es invariance to small\ndistortions.\n\nFor info on NIST
preprocessing routines, see M. D
. Garris, J. L. Blue, G.\nT. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet
, and C.\nL. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,\n1994.\
n\ n.. topic:: References\n\n - C. Kaynak (1995) Methods of Combining Multiple
Classifiers and Their\n Applications to Handwritten Digit Recognition, MSc
Thesis, Institute of\n Graduate Studies in Science and Engineering, Bogazici
University.\n - E. Alpaydin, C. Ka ynak (1998) Cascading Classifiers, Kybernetika.\n -
Ken Tang and Ponnuthurai N. Sugantha n and Xi Yao and A. Kai Qin.\n Linear
dimensionalityreduction using relevance weighted LDA. School of\n Electrical and
Electronic Engineering Nanyang Technological Universit y.\n 2005.\n - Claudio Gentile.
A New Approximate Maximal Margin Classification\n Algorithm. NIPS. 2000.\n"

In [9]:
digits.feature_names[0:5]

Out[9]:

['pixel_0_0', 'pixel_0_1', 'pixel_0_2', 'pixel_0_3', 'pixel_0_4']

In [10]:
digits.frame

In [11]:
digits.images.shape

Out[11]:

(1797, 8, 8)

In [12]:
digits.target.shape

Out[12]:

(1797,)

In [13]:
# importing model
from sklearn.linear_model import LogisticRegression
In [14]:
model = LogisticRegression(max_iter = 10000)

In [15]:
# dataset splitting
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(digits.data, digits.target, train_size
= 0.8, random_state = 42)

In [16]:
x_train.shape

Out[16]:

(1437, 64)

In [17]:
y_train.shape

Out[17]:

(1437,)

In [18]:
y_test.shape

Out[18]:

(360,)

In [19]:
x_test.shape

Out[19]:

(360, 64)

In [21]:
# training model
model.fit(x_train,y_train)

Out[21]:

LogisticRegression(max_iter=10000)

In [22]:
y = model.predict(x_test)

In [23]:
model.score(x_test,y_test)

Out[23]:
0.9722222222222222

In [24]:
y[0:5]

Out[24]:

array([6, 9, 3, 7, 2])

Lab 8 Part 1

Ananiya Sardana
199303079
CCE-B

Naive Bayes

Importing the libraries


In [1]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset


In [5]:

titanicdf = pd.read_csv('titanic.csv')

In [9]:

titanicdf[0:5]
Out[9]:

   PassengerId  Survived  Pclass  Name                                                Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0            1         0       3  Braund, Mr. Owen Harris                             male    22.0      1      0  A/5 21171          7.2500    NaN  S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...   female  38.0      1      0  PC 17599          71.2833    C85  C
2            3         1       3  Heikkinen, Miss. Laina                              female  26.0      0      0  STON/O2. 3101282   7.9250    NaN  S
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)        female  35.0      1      0  113803            53.1000   C123  S
4            5         0       3  Allen, Mr. William Henry                            male    35.0      0      0  373450             8.0500    NaN  S

In [10]:

titanicdf.drop(titanicdf.columns.difference(['Pclass', 'Age', 'Fare', 'Survived']), 1, inplace=True)

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: FutureWarning: In a future version of pandas all arguments of DataFrame.drop except for the argument 'labels' will be keyword-only
  """Entry point for launching an IPython kernel.
In [11]:
titanicdf

Out[11]:

Survived Pclass Age Fare

0 0 3 22.0 7.2500

1 1 1 38.0 71.2833

2 1 3 26.0 7.9250

3 1 1 35.0 53.1000

4 0 3 35.0 8.0500

... ... ... ... ...

886 0 2 27.0 13.0000

887 1 1 19.0 30.0000

888 0 3 NaN 23.4500

889 1 1 26.0 30.0000

890 0 3 32.0 7.7500

891 rows × 4 columns

In [12]:
target = titanicdf.iloc[:, 0].values

In [13]:

target
Out[13]:

array([0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1,
1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1,
1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0,
0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0,
1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0,
1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0,
0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1,
0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1,
1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0,
0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0,
0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0,
0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0,
1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0,
1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0,
0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1,
1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1,
0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0,
0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0,
0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0,
1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1,
0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1,
0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1,
1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1,
1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0])

In [20]:
x = titanicdf.iloc[:, 1:].values

In [21]:

x
Out[21]:

array([[ 3. , 22. , 7.25 ],


[ 1. , 38. , 71.2833],
[ 3. , 26. , 7.925 ],
...,
[ 3. , nan, 23.45 ],
[ 1. , 26. , 30. ],
[ 3. , 32. , 7.75 ]])

In [17]:
print("Number of missing values in Survived:",titanicdf['Survived'].isnull().sum()) print("Number of missing values in
Pclass:",titanicdf['Pclass'].isnull().sum()) print("Number of missing values in Age:",titanicdf['Age'].isnull().sum()) print("Number of
missing values in Fare:",titanicdf['Fare'].isnull().sum())

Number of missing values in Survived: 0


Number of missing values in Pclass: 0
Number of missing values in Age: 177
Number of missing values in Fare: 0

In [23]:

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(x[:, :])
x[:, :] = imputer.transform(x[:, :])

In [24]:
x

Out[24]:
array([[ 3. , 22. , 7.25 ],
[ 1. , 38. , 71.2833 ],
[ 3. , 26. , 7.925 ],
...,
[ 3. , 29.69911765, 23.45 ],
[ 1. , 26. , 30. ],
[ 3. , 32. , 7.75 ]])

Splitting the dataset into the Training set and Test set

In [35]:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, target, test_size=0.2, random_state=90)

Feature Scaling

In [36]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

Training the Naive Bayes model on the Training set


In [37]:

from pandas.core.common import random_state


from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)
Out[37]:

GaussianNB()

Making the Confusion Matrix


In [38]:

from sklearn.metrics import confusion_matrix, accuracy_score


y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[106 17]
[ 30 26]]

Out[38]:
0.7374301675977654
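The accuracy can be read directly from the confusion matrix: (106 + 26) correct predictions out of 106 + 17 + 30 + 26 = 179 test passengers gives 132/179 ≈ 0.7374, the score reported above.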

Lab 8 Part 2

Ananiya Sardana
199303079
CCE-B

Support Vector
In [40]:

import pandas as pd
from sklearn.datasets import load_iris
iris=load_iris()

In [41]:

dir(iris)
Out[41]:

['DESCR',
'data',
'feature_names',
'filename',
'frame',
'target',
'target_names']

In [7]:

iris.target
Out[7]:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [42]:

iris.feature_names
Out[42]:
['sepal length (cm)',
'sepal width (cm)',
'petal length (cm)',
'petal width (cm)']

In [9]:

iris.data
Out[9]:
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
[5. , 3.4, 1.5, 0.2],
[4.4, 2.9, 1.4, 0.2],
[4.9, 3.1, 1.5, 0.1],
[5.4, 3.7, 1.5, 0.2],
[4.8, 3.4, 1.6, 0.2],
[4.8, 3. , 1.4, 0.1],
[4.3, 3. , 1.1, 0.1],
[5.8, 4. , 1.2, 0.2],
[5.7, 4.4, 1.5, 0.4],
[5.4, 3.9, 1.3, 0.4],
[5.1, 3.5, 1.4, 0.3],
[5.7, 3.8, 1.7, 0.3],
[5.1, 3.8, 1.5, 0.3],
[5.4, 3.4, 1.7, 0.2],
[5.1, 3.7, 1.5, 0.4],
[4.6, 3.6, 1. , 0.2],
[5.1, 3.3, 1.7, 0.5],
[4.8, 3.4, 1.9, 0.2],
[5. , 3. , 1.6, 0.2],
[5. , 3.4, 1.6, 0.4],
[5.2, 3.5, 1.5, 0.2],
[5.2, 3.4, 1.4, 0.2],
[4.7, 3.2, 1.6, 0.2],
[4.8, 3.1, 1.6, 0.2],
[5.4, 3.4, 1.5, 0.4],
[5.2, 4.1, 1.5, 0.1],
[5.5, 4.2, 1.4, 0.2],
[4.9, 3.1, 1.5, 0.2],
[5. , 3.2, 1.2, 0.2],
[5.5, 3.5, 1.3, 0.2],
[4.9, 3.6, 1.4, 0.1],
[4.4, 3. , 1.3, 0.2],
[5.1, 3.4, 1.5, 0.2],
[5. , 3.5, 1.3, 0.3],
[4.5, 2.3, 1.3, 0.3],
[4.4, 3.2, 1.3, 0.2],
[5. , 3.5, 1.6, 0.6],
[5.1, 3.8, 1.9, 0.4],
[4.8, 3. , 1.4, 0.3],
[5.1, 3.8, 1.6, 0.2],
[4.6, 3.2, 1.4, 0.2],
[5.3, 3.7, 1.5, 0.2],
[5. , 3.3, 1.4, 0.2],
[7. , 3.2, 4.7, 1.4],
[6.4, 3.2, 4.5, 1.5],
[6.9, 3.1, 4.9, 1.5],
[5.5, 2.3, 4. , 1.3],
[6.5, 2.8, 4.6, 1.5],
[5.7, 2.8, 4.5, 1.3],
[6.3, 3.3, 4.7, 1.6],
[4.9, 2.4, 3.3, 1. ],
[6.6, 2.9, 4.6, 1.3],
[5.2, 2.7, 3.9, 1.4],
[5. , 2. , 3.5, 1. ],
[5.9, 3. , 4.2, 1.5],
[6. , 2.2, 4. , 1. ],
[6.1, 2.9, 4.7, 1.4],
[5.6, 2.9, 3.6, 1.3],
[6.7, 3.1, 4.4, 1.4],
[5.6, 3. , 4.5, 1.5],
[5.8, 2.7, 4.1, 1. ],
[6.2, 2.2, 4.5, 1.5],
[5.6, 2.5, 3.9, 1.1],
[5.9, 3.2, 4.8, 1.8],
[6.1, 2.8, 4. , 1.3],
[6.3, 2.5, 4.9, 1.5],
[6.1, 2.8, 4.7, 1.2],
[6.4, 2.9, 4.3, 1.3],
[6.6, 3. , 4.4, 1.4],
[6.8, 2.8, 4.8, 1.4],
[6.7, 3. , 5. , 1.7],
[6. , 2.9, 4.5, 1.5],
[5.7, 2.6, 3.5, 1. ],
[5.5, 2.4, 3.8, 1.1],
[5.5, 2.4, 3.7, 1. ],
[5.8, 2.7, 3.9, 1.2],
[6. , 2.7, 5.1, 1.6],
[5.4, 3. , 4.5, 1.5],
[6. , 3.4, 4.5, 1.6],
[6.7, 3.1, 4.7, 1.5],
[6.3, 2.3, 4.4, 1.3],
[5.6, 3. , 4.1, 1.3],
[5.5, 2.5, 4. , 1.3],
[5.5, 2.6, 4.4, 1.2],
[6.1, 3. , 4.6, 1.4],
[5.8, 2.6, 4. , 1.2],
[5. , 2.3, 3.3, 1. ],
[5.6, 2.7, 4.2, 1.3],
[5.7, 3. , 4.2, 1.2],
[5.7, 2.9, 4.2, 1.3],
[6.2, 2.9, 4.3, 1.3],
[5.1, 2.5, 3. , 1.1],
[5.7, 2.8, 4.1, 1.3],
[6.3, 3.3, 6. , 2.5],
[5.8, 2.7, 5.1, 1.9],
[7.1, 3. , 5.9, 2.1],
[6.3, 2.9, 5.6, 1.8],
[6.5, 3. , 5.8, 2.2],
[7.6, 3. , 6.6, 2.1],
[4.9, 2.5, 4.5, 1.7],
[7.3, 2.9, 6.3, 1.8],
[6.7, 2.5, 5.8, 1.8],
[7.2, 3.6, 6.1, 2.5],
[6.5, 3.2, 5.1, 2. ],
[6.4, 2.7, 5.3, 1.9],
[6.8, 3. , 5.5, 2.1],
[5.7, 2.5, 5. , 2. ],
[5.8, 2.8, 5.1, 2.4],
[6.4, 3.2, 5.3, 2.3],
[6.5, 3. , 5.5, 1.8],
[7.7, 3.8, 6.7, 2.2],
[7.7, 2.6, 6.9, 2.3],
[6. , 2.2, 5. , 1.5],
[6.9, 3.2, 5.7, 2.3],
[5.6, 2.8, 4.9, 2. ],
[7.7, 2.8, 6.7, 2. ],
[6.3, 2.7, 4.9, 1.8],
[6.7, 3.3, 5.7, 2.1],
[7.2, 3.2, 6. , 1.8],
[6.2, 2.8, 4.8, 1.8],
[6.1, 3. , 4.9, 1.8],
[6.4, 2.8, 5.6, 2.1],
[7.2, 3. , 5.8, 1.6],
[7.4, 2.8, 6.1, 1.9],
[7.9, 3.8, 6.4, 2. ],
[6.4, 2.8, 5.6, 2.2],
[6.3, 2.8, 5.1, 1.5],
[6.1, 2.6, 5.6, 1.4],
[7.7, 3. , 6.1, 2.3],
[6.3, 3.4, 5.6, 2.4],
[6.4, 3.1, 5.5, 1.8],
[6. , 3. , 4.8, 1.8],
[6.9, 3.1, 5.4, 2.1],
[6.7, 3.1, 5.6, 2.4],
[6.9, 3.1, 5.1, 2.3],
[5.8, 2.7, 5.1, 1.9],
[6.8, 3.2, 5.9, 2.3],
[6.7, 3.3, 5.7, 2.5],
[6.7, 3. , 5.2, 2.3],
[6.3, 2.5, 5. , 1.9],
[6.5, 3. , 5.2, 2. ],
[6.2, 3.4, 5.4, 2.3],
[5.9, 3. , 5.1, 1.8]])

In [8]:
iris.target_names

Out[8]:

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [11]:
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df

Out[11]:

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)

0 5.1 3.5 1.4 0.2

1 4.9 3.0 1.4 0.2

2 4.7 3.2 1.3 0.2

3 4.6 3.1 1.5 0.2

4 5.0 3.6 1.4 0.2

... ... ... ... ...

145 6.7 3.0 5.2 2.3

146 6.3 2.5 5.0 1.9

147 6.5 3.0 5.2 2.0

148 6.2 3.4 5.4 2.3

149 5.9 3.0 5.1 1.8

150 rows × 4 columns

In [12]:
iris.target

Out[12]:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [13]:
df['target'] = iris.target
df.head()

Out[13]:

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target

0 5.1 3.5 1.4 0.2 0

1 4.9 3.0 1.4 0.2 0

2 4.7 3.2 1.3 0.2 0


3 4.6 3.1 1.5 0.2 0

4 5.0 3.6 1.4 0.2 0

In [18]:
df['flowername'] = df.target.apply(lambda x: iris.target_names[x])
df.head()

Out[18]:

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target flowername

0 5.1 3.5 1.4 0.2 0 setosa

1 4.9 3.0 1.4 0.2 0 setosa

2 4.7 3.2 1.3 0.2 0 setosa

3 4.6 3.1 1.5 0.2 0 setosa

4 5.0 3.6 1.4 0.2 0 setosa

In [20]:
iris.target_names

Out[20]:

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [22]:
from sklearn.model_selection import train_test_split

x = df.drop(['target', 'flowername'], axis='columns')
y = df.target

In [38]:
x_test

Out[38]:

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)

44 5.1 3.8 1.9 0.4

69 5.6 2.5 3.9 1.1

121 5.6 2.8 4.9 2.0

53 5.5 2.3 4.0 1.3

89 5.5 2.5 4.0 1.3

13 4.3 3.0 1.1 0.1

105 7.6 3.0 6.6 2.1

86 6.7 3.1 4.7 1.5

62 6.0 2.2 4.0 1.0

20 5.4 3.4 1.7 0.2

1 4.9 3.0 1.4 0.2

92 5.8 2.6 4.0 1.2

102 7.1 3.0 5.9 2.1

133 6.3 2.8 5.1 1.5

112 6.8 3.0 5.5 2.1

115 6.4 3.2 5.3 2.3

85 6.0 3.4 4.5 1.6

48 5.3 3.7 1.5 0.2


125 7.2 3.2 6.0 1.8
138 6.0 3.0 4.8 1.8

73 6.1 2.8 4.7 1.2

63 6.1 2.9 4.7 1.4

39 5.1 3.4 1.5 0.2

2 4.7 3.2 1.3 0.2

146 6.3 2.5 5.0 1.9

33 5.5 4.2 1.4 0.2

149 5.9 3.0 5.1 1.8

81 5.5 2.4 3.7 1.0

72 6.3 2.5 4.9 1.5

60 5.0 2.0 3.5 1.0

In [33]:

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)

In [34]:

from sklearn.svm import SVC


model=SVC()

In [35]:

model.fit(x_train,y_train)
Out[35]:

SVC()

In [36]:

model.score(x_test,y_test)
Out[36]:
0.9666666666666667

In [39]:

model.predict([[5.0,2.0,3.5,1.0]])
Out[39]:
array([1])
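SVC here uses its default RBF kernel with C = 1.0. As a hedged sketch, the kernel and regularisation strength can be varied and compared on the same split (scores will vary from run to run because the train/test split is random):

from sklearn.svm import SVC

# a linear kernel and a more heavily weighted RBF kernel for comparison
for clf in (SVC(kernel='linear'), SVC(kernel='rbf', C=10)):
    clf.fit(x_train, y_train)
    print(clf, clf.score(x_test, y_test))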

In [ ]:

Lab 9

Ananiya Sardana
199303079
CCE-B

Decision Tree

Importing the libraries


In [1]:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset


In [2]:

import requests
import io

In [3]:

url = "https://raw.githubusercontent.com/codebasics/py/master/ML/9_decision_tree/salaries
.csv"

In [4]:

download = requests.get(url).content

In [5]:

df = pd.read_csv(io.StringIO(download.decode('utf-8')))

In [6]:

df
Out[6]:

    company     job                  degree     salary_more_then_100k
0   google      sales executive      bachelors  0
1   google      sales executive      masters    0
2   google      business manager     bachelors  1
3   google      business manager     masters    1
4   google      computer programmer  bachelors  0
5   google      computer programmer  masters    1
6   abc pharma  sales executive      masters    0
7   abc pharma  computer programmer  bachelors  0
8   abc pharma  business manager     bachelors  0
9   abc pharma  business manager     masters    1
10  facebook    sales executive      bachelors  1
11  facebook    sales executive      masters    1
12  facebook    business manager     bachelors  1
13  facebook    business manager     masters    1
14  facebook    computer programmer  bachelors  1
15  facebook    computer programmer  masters    1
In [7]:
target = df.iloc[:, -1].values

In [8]:
target

Out[8]:

array([0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])

In [9]:

x = df.iloc[:, 0:-1].values

In [10]:

x

Out[10]:

array([['google', 'sales executive', 'bachelors'],


['google', 'sales executive', 'masters'],
['google', 'business manager', 'bachelors'],
['google', 'business manager', 'masters'],
['google', 'computer programmer', 'bachelors'],
['google', 'computer programmer', 'masters'],
['abc pharma', 'sales executive', 'masters'],
['abc pharma', 'computer programmer', 'bachelors'],
['abc pharma', 'business manager', 'bachelors'],
['abc pharma', 'business manager', 'masters'],
['facebook', 'sales executive', 'bachelors'],
['facebook', 'sales executive', 'masters'],
['facebook', 'business manager', 'bachelors'],
['facebook', 'business manager', 'masters'],
['facebook', 'computer programmer', 'bachelors'],
['facebook', 'computer programmer', 'masters']], dtype=object)

Encoding categorical data

Encoding the Dependent Variable

In [11]:

from sklearn.preprocessing import LabelEncoder

le_target = LabelEncoder()
target = le_target.fit_transform(target)
In [12]:
print(target)

[0 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1]

In [13]:
le_company = LabelEncoder()
x[:,0] = le_company.fit_transform(x[:,0])

In [14]:
le_job = LabelEncoder()
x[:,1] = le_job.fit_transform(x[:,1])

In [15]:
le_degree = LabelEncoder()
x[:,2] = le_degree.fit_transform(x[:,2])

In [16]:
print(target)

[0 0 1 1 0 1 0 0 0 1 1 1 1 1 1 1]

In [17]:

print(x)

[[2 2 0]
 [2 2 1]
 [2 0 0]
 [2 0 1]
 [2 1 0]
 [2 1 1]
 [0 2 1]
 [0 1 0]
 [0 0 0]
 [0 0 1]
 [1 2 0]
 [1 2 1]
 [1 0 0]
 [1 0 1]
 [1 1 0]
 [1 1 1]]

Splitting the dataset into the Training set and Test set

In [18]:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, target, test_size=0.3, random_state=40)

Training the Decision Tree model on the Training set


In [19]:

from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier(criterion='entropy', random_state=40)
classifier.fit(x_train, y_train)

Out[19]:

DecisionTreeClassifier(criterion='entropy', random_state=40)
Making the Confusion Matrix
In [20]:

from sklearn.metrics import confusion_matrix, accuracy_score


y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[1 0]
 [0 4]]

Out[20]:

1.0

In [22]:

from sklearn.metrics import classification_report


print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         4

    accuracy                           1.00         5
   macro avg       1.00      1.00      1.00         5
weighted avg       1.00      1.00      1.00         5
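To inspect the learned splits, scikit-learn's export_text can print the trained tree as nested rules; a small sketch (the feature names supplied here are simply labels for the three label-encoded columns):

from sklearn.tree import export_text

# text dump of the trained tree; thresholds refer to the label-encoded values
print(export_text(classifier, feature_names=['company', 'job', 'degree']))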

In [ ]:

Lab 10

Ananiya Sardana
199303079
CCE-B

K-Nearest Neighbour
In [3]:

import pandas as pd
from sklearn.datasets import load_iris
iris=load_iris()

In [4]:

dir(iris)
Out[4]:

['DESCR',
'data',
'feature_names',
'filename',
'frame',
'target',
'target_names']

In [39]:

iris.frame

In [37]:

iris.feature_names
Out[37]:
['sepal length (cm)',
'sepal width (cm)',
'petal length (cm)',
'petal width (cm)']

In [49]:

df=pd.DataFrame(iris.data,columns=iris.feature_names)
df[9:12]
Out[49]:

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)

9 4.9 3.1 1.5 0.1

10 5.4 3.7 1.5 0.2

11 4.8 3.4 1.6 0.2


In [12]:
df['target']=iris.target
df
Out[12]:

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target

0 5.1 3.5 1.4 0.2 0

1 4.9 3.0 1.4 0.2 0

2 4.7 3.2 1.3 0.2 0

3 4.6 3.1 1.5 0.2 0

4 5.0 3.6 1.4 0.2 0

... ... ... ... ... ...

145 6.7 3.0 5.2 2.3 2

146 6.3 2.5 5.0 1.9 2

147 6.5 3.0 5.2 2.0 2

148 6.2 3.4 5.4 2.3 2

149 5.9 3.0 5.1 1.8 2

150 rows × 5 columns

In [15]:
df[df.target==2].head()

Out[15]:

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
100 6.3 3.3 6.0 2.5 2

101 5.8 2.7 5.1 1.9 2

102 7.1 3.0 5.9 2.1 2

103 6.3 2.9 5.6 1.8 2

104 6.5 3.0 5.8 2.2 2

In [16]:
df['flowername'] = df.target.apply(lambda x: iris.target_names[x])
df

Out[16]:

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target flowername

0 5.1 3.5 1.4 0.2 0 setosa

1 4.9 3.0 1.4 0.2 0 setosa

2 4.7 3.2 1.3 0.2 0 setosa

3 4.6 3.1 1.5 0.2 0 setosa

4 5.0 3.6 1.4 0.2 0 setosa

... ... ... ... ... ... ...

145 6.7 3.0 5.2 2.3 2 virginica

146 6.3 2.5 5.0 1.9 2 virginica

147 6.5 3.0 5.2 2.0 2 virginica

148 6.2 3.4 5.4 2.3 2 virginica

149 5.9 3.0 5.1 1.8 2 virginica


150 rows × 6 columns

In [18]:

from sklearn.model_selection import train_test_split

x = df.drop(['target', 'flowername'], axis='columns')
y = df.target

In [32]:

y
Out[32]:

0 0
1 0
2 0
3 0
4 0
..
145 2
146 2
147 2
148 2
149 2
Name: target, Length: 150, dtype: int32

In [21]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)

In [23]:
len(x_test)

Out[23]:

30

In [52]:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=20)

In [53]:

knn.fit(x_train, y_train)
knn.score(x_test, y_test)

Out[53]:
0.9666666666666667

In [44]:
knn.predict([[4.8,3.0,1.5,0.3]])

Out[44]:

array([0])

In [46]:

from sklearn.metrics import confusion_matrix

y_pred = knn.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
cm

Out[46]:

array([[10, 0, 0],
[ 0, 10, 1],
[ 0, 0, 9]], dtype=int64)
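The score above follows directly from this matrix: the diagonal 10 + 10 + 9 = 29 test flowers are classified correctly out of 30, i.e. 29/30 ≈ 0.9667.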

In [54]:
y_pred

Out[54]:

array([2, 0, 1, 1, 1, 0, 0, 2, 2, 2, 2, 1, 2, 0, 0, 1, 2, 1, 2, 2, 1, 0,
1, 0, 0, 0, 0, 1, 1, 2])

In [55]:
y_test

Out[55]:

121 2
21 0
90 1
55 1
72 1
33 0
49 0
149 2
119 2
117 2
135 2
84 1
130 2
13 0
47 0
67 1
83 1
82 1
145 2
116 2
85 1
45 0
86 1
48 0
0 0
34 0
36 0
87 1
91 1
124 2
Name: target, dtype: int32

In [47]:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sn

plt.figure(figsize=(7, 5))
sn.heatmap(cm, annot=True)
plt.xlabel('Predicted')
plt.ylabel('Truth')

Out[47]:

Text(42.0, 0.5, 'Truth')



In [48]:

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))
precision recall f1-score support

0 1.00 1.00 1.00 10


1 1.00 0.91 0.95 11
2 0.90 1.00 0.95 9

accuracy 0.97 30
macro avg 0.97 0.97 0.97 30
weighted avg 0.97 0.97 0.97 30

In [ ]:

Lab 11

Ananiya Sardana
199303079
CCE-B

K-Means Clustering
In [29]:

from sklearn.cluster import KMeans


import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from matplotlib import pyplot as plt
%matplotlib inline

In [30]:

url=("https://raw.githubusercontent.com/codebasics/py/master/ML/13_kmeans/income.csv")
df = pd.read_csv(url)
df.head()
Out[30]:

Name Age Income($)

0 Rob 27 70000

1 Michael 29 90000

2 Mohan 29 61000

3 Ismail 28 60000

4 Kory 42 150000

In [31]:

plt.scatter(df.Age,df['Income($)'])
plt.xlabel('Age')
plt.ylabel('Income($)')
Out[31]:

Text(0, 0.5, 'Income($)')



In [32]:

km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age', 'Income($)']])
y_predicted

Out[32]:

array([2, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0])

In [33]:

df['cluster'] = y_predicted
df.head()

Out[33]:

Name Age Income($) cluster

0 Rob 27 70000 2

1 Michael 29 90000 2

2 Mohan 29 61000 0

3 Ismail 28 60000 0

4 Kory 42 150000 1

In [34]:
km.cluster_centers_

Out[34]:

array([[3.29090909e+01, 5.61363636e+04],
[3.82857143e+01, 1.50000000e+05],
[3.40000000e+01, 8.05000000e+04]])

In [35]:

df1 = df[df.cluster == 0]
df2 = df[df.cluster == 1]
df3 = df[df.cluster == 2]
plt.scatter(df1.Age, df1['Income($)'], color='green')
plt.scatter(df2.Age, df2['Income($)'], color='red')
plt.scatter(df3.Age, df3['Income($)'], color='black')
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], color='purple', marker='*', label='centroid')
plt.xlabel('Age')
plt.ylabel('Income ($)')
plt.legend()

Out[35]:

<matplotlib.legend.Legend at 0x1aca98e40d0>
In [36]:
scaler = MinMaxScaler()

scaler.fit(df[['Income($)']])
df['Income($)'] = scaler.transform(df[['Income($)']])

scaler.fit(df[['Age']])
df['Age'] = scaler.transform(df[['Age']])

In [37]:
df.head()

Out[37]:

Name Age Income($) cluster

0 Rob 0.058824 0.213675 2

1 Michael 0.176471 0.384615 2

2 Mohan 0.176471 0.136752 0

3 Ismail 0.117647 0.128205 0

4 Kory 0.941176 0.897436 1

In [38]:
plt.scatter(df.Age,df['Income($)'])

Out[38]:

<matplotlib.collections.PathCollection at 0x1aca997a520>

In [39]:

km = KMeans(n_clusters=3)
y_predicted = km.fit_predict(df[['Age', 'Income($)']])
y_predicted

Out[39]:

array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2])

In [40]:
km.cluster_centers_

Out[40]:

array([[0.1372549 , 0.11633428],
[0.72268908, 0.8974359 ],
[0.85294118, 0.2022792 ]])
In [41]:
df['cluster']=y_predicted
df.head()
Out[41]:

Name Age Income($) cluster

0 Rob 0.058824 0.213675 0

1 Michael 0.176471 0.384615 0

2 Mohan 0.176471 0.136752 0

3 Ismail 0.117647 0.128205 0

4 Kory 0.941176 0.897436 1

In [42]:

df1 = df[df.cluster == 0]
df2 = df[df.cluster == 1]
df3 = df[df.cluster == 2]
plt.scatter(df1.Age, df1['Income($)'], color='green')
plt.scatter(df2.Age, df2['Income($)'], color='red')
plt.scatter(df3.Age, df3['Income($)'], color='black')
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], color='purple', marker='*', label='centroid')
plt.legend()

Out[42]:

<matplotlib.legend.Legend at 0x1aca99e0ee0>

In [43]:

sse = []
k_rng = range(1, 10)
for k in k_rng:
    km = KMeans(n_clusters=k)
    km.fit(df[['Age', 'Income($)']])
    sse.append(km.inertia_)
plt.xlabel('K')
plt.ylabel('Sum of squared error')
plt.plot(k_rng, sse)

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py:881: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
  warnings.warn(
Out[43]:

[<matplotlib.lines.Line2D at 0x1aca9a5d1c0>]

In [ ]:

Lab 12

Ananiya Sardana
199303079
CCE-B

Principal Component Analysis


In [1]:

# PCA
# import digit dataset to implement PCA
from sklearn.datasets import load_digits
import pandas as pd
dataset = load_digits()
# use to check what the dataset is all about
dataset.keys()
Out[1]:
dict_keys(['data', 'target', 'frame', 'feature_names', 'target_names', 'images', 'DESCR']
)

In [2]:

# to check the shape of the data [it contains 1797 samples and each sample contains 64 features]
dataset.data.shape
Out[2]:
(1797, 64)

In [3]:
# first element of dataset which is in the form of 1 d array
dataset.data[0]

Out[3]:

array([ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.,  0.,  0., 13., 15., 10.,
       15.,  5.,  0.,  0.,  3., 15.,  2.,  0., 11.,  8.,  0.,  0.,  4.,
       12.,  0.,  0.,  8.,  8.,  0.,  0.,  5.,  8.,  0.,  0.,  9.,  8.,
        0.,  0.,  4., 11.,  0.,  1., 12.,  7.,  0.,  0.,  2., 14.,  5.,
       10., 12.,  0.,  0.,  0.,  0.,  6., 13., 10.,  0.,  0.,  0.])

In [4]:

# for visualization using matplotlib, convert the 1-d array to a 2-d array

dataset.data[0].reshape(8, 8)

Out[4]:

array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])
In [5]:
# data visualization
from matplotlib import pyplot as plt
%matplotlib inline

plt.gray()
plt.matshow(dataset.data[0].reshape(8, 8))

Out[5]:

<matplotlib.image.AxesImage at 0x1ae666ed4c0>

<Figure size 432x288 with 0 Axes>

In [7]:
# check the dataset target
dataset.target[:5]

Out[7]:

array([0, 1, 2, 3, 4])

In [8]:
# create dataframe
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
df.head()

Out[8]:

   pixel_0_0  pixel_0_1  pixel_0_2  pixel_0_3  pixel_0_4  pixel_0_5  pixel_0_6  pixel_0_7  pixel_1_0  pixel_1_1  ...  pixel_6_6
0        0.0        0.0        5.0       13.0        9.0        1.0        0.0        0.0        0.0        0.0  ...        0.0
1        0.0        0.0        0.0       12.0       13.0        5.0        0.0        0.0        0.0        0.0  ...        0.0
2        0.0        0.0        0.0        4.0       15.0       12.0        0.0        0.0        0.0        0.0  ...        5.0
3        0.0        0.0        7.0       15.0       13.0        1.0        0.0        0.0        0.0        8.0  ...        9.0
4        0.0        0.0        0.0        1.0       11.0        0.0        0.0        0.0        0.0        0.0  ...        0.0

5 rows × 64 columns

In [9]:
X = df
y = dataset.target

In [10]:

# scaling the values using StandardScaler [range is -1 to 1]; you can use MinMaxScaler also
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled

Out[10]:

array([[ 0.        , -0.33501649, -0.04308102, ..., -1.14664746,
        -0.5056698 , -0.19600752],
       [ 0.        , -0.33501649, -1.09493684, ...,  0.54856067,
        -0.5056698 , -0.19600752],
       [ 0.        , -0.33501649, -1.09493684, ...,  1.56568555,
         1.6951369 , -0.19600752],
       ...,
       [ 0.        , -0.33501649, -0.88456568, ..., -0.12952258,
        -0.5056698 , -0.19600752],
       [ 0.        , -0.33501649, -0.67419451, ...,  0.8876023 ,
        -0.5056698 , -0.19600752],
       [ 0.        , -0.33501649,  1.00877481, ...,  0.8876023 ,
        -0.26113572, -0.19600752]])

In [12]:

# splitting the dataset and using logistic regression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=30)

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)

Out[12]:

0.9722222222222222

In [14]:

# apply PCA, retaining 95 percent of the useful feature variance
from sklearn.decomposition import PCA

pca = PCA(0.95)
X_pca = pca.fit_transform(X)
X_pca.shape

Out[14]:

(1797, 29)
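How much variance each retained component carries can be checked on the fitted PCA object; a minimal sketch using the standard attribute (values not reproduced here):

import numpy as np

# per-component variance ratios and their running total; the cumulative sum
# should reach roughly 0.95 by the 29th component, matching PCA(0.95) above
print(pca.explained_variance_ratio_)
print(np.cumsum(pca.explained_variance_ratio_))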

In [17]:

# split the new dataframe and train the model
X_train_pca, X_test_pca, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=30)

In [18]:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
model.fit(X_train_pca, y_train)
model.score(X_test_pca, y_test)

Out[18]:

0.9694444444444444

In [21]:

# from the above scenario we can see that despite dropping lots of features we obtained nearly the same accuracy
In [22]:

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
X_pca.shape

Out[22]:

(1797, 2)

In [23]:

X_train_pca, X_test_pca, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=30)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_pca, y_train)
model.score(X_test_pca, y_test)

Out[23]:

0.6083333333333333

In [25]:

# We get less accuracy (~60%) as using only 2 components did not retain much of the feature information.
# However, in real life you will find many cases where using 2 or a few PCA components can still give you a pretty good accuracy.

Lab 13

Ananiya Sardana
199303079
CCE-B

Neural Network
In [1]:

#Handwritten digits classification using neural network


#In this notebook we will classify handwritten digits using a simple
#neural network which has only input and output layers.
#We will then add a hidden layer and see how the performance of the model improves

In [2]:

# import all the required framework and libraries


import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

In [3]:

# load mnist dataset


(X_train, y_train) , (X_test, y_test) = keras.datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 9s 1us/step
11501568/11490434 [==============================] - 9s 1us/step

In [4]:

#normalize the images


X_train = X_train / 255
X_test = X_test / 255

In [5]:

#reshape and flattened image array


X_train_flattened = X_train.reshape(len(X_train), 28*28)
X_test_flattened = X_test.reshape(len(X_test), 28*28)
X_train_flattened.shape
Out[5]:
(60000, 784)

In [6]:

# create a neural network with one input and one output layer
model = keras.Sequential([
keras.layers.Dense(10, input_shape=(784,), activation='sigmoid')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train the neural network


model.fit(X_train_flattened, y_train, epochs=5)
2022-05-15 00:30:43.502161: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Epoch 1/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.4679 - accuracy: 0.8770
Epoch 2/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.3044 - accuracy: 0.9156
Epoch 3/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2836 - accuracy: 0.9208
Epoch 4/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2733 - accuracy: 0.9234
Epoch 5/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2669 - accuracy: 0.9254
Out[6]:

<keras.callbacks.History at 0x7f8bc51d0b50>

In [7]:

# evaluate the performance of the neural network
model.evaluate(X_test_flattened, y_test)

313/313 [==============================] - 0s 925us/step - loss: 0.2721 - accuracy: 0.9244

Out[7]:

[0.2720883786678314, 0.9243999719619751]

In [8]:
# some predictions
y_predicted = model.predict(X_test_flattened)
y_predicted[0]

Out[8]:

array([1.6449779e-02, 5.1594122e-07, 4.0429801e-02, 9.6460772e-01,


1.2413859e-03, 1.0533932e-01, 1.7105019e-06, 9.9976885e-01,
8.2001984e-02, 6.8728566e-01], dtype=float32)

In [9]:
# display input image at particular index
plt.matshow(X_test[0])

Out[9]:

<matplotlib.image.AxesImage at 0x7f8bc55fd250>

In [10]:
#np.argmax finds a maximum element from an array and returns the index of it
np.argmax(y_predicted[0])

Out[10]:

7
In [11]:
#predictions
y_predicted_labels = [np.argmax(i) for i in y_predicted]
y_predicted_labels[:5]

Out[11]:

[7, 2, 1, 0, 4]

In [12]:
# create confusion matrix
cm = tf.math.confusion_matrix(labels=y_test, predictions=y_predicted_labels)
cm

Out[12]:

<tf.Tensor: shape=(10, 10), dtype=int32, numpy=


array([[ 962, 0, 2, 2, 0, 5, 6, 2, 1, 0],
[ 0, 1119, 2, 2, 0, 1, 4, 2, 5, 0],
[ 7, 12, 893, 33, 7, 4, 12, 11, 48, 5],
[ 1, 0, 9, 939, 0, 21, 2, 10, 21, 7],
[ 1, 2, 2, 2, 895, 0, 15, 4, 11, 50],
[ 9, 3, 1, 45, 6, 769, 17, 4, 32, 6],
[ 9, 3, 4, 2, 7, 12, 917, 2, 2, 0],
[ 1, 10, 18, 9, 3, 1, 0, 946, 3, 37],
[ 5, 11, 5, 27, 9, 21, 8, 11, 871, 6],
[ 11, 7, 1, 14, 13, 5, 0, 20, 5, 933]],
dtype=int32)>
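The evaluate() accuracy can be recovered from this matrix as well; a short sketch (cm is the TensorFlow tensor printed above):

import numpy as np

cm_np = cm.numpy()              # convert the tf.Tensor to a NumPy array
correct = np.trace(cm_np)       # diagonal entries are the correctly classified digits
total = cm_np.sum()             # all 10000 test images
print(correct / total)          # ~0.9244, matching model.evaluate above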

In [13]:
# plot confusion matrix
import seaborn as sn

plt.figure(figsize=(10, 7))
sn.heatmap(cm, annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Truth')

Out[13]:

Text(69.0, 0.5, 'Truth')



In [15]:

# add a hidden layer in the neural network
model = keras.Sequential([
    keras.layers.Dense(100, input_shape=(784,), activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train_flattened, y_train, epochs=5)

Epoch 1/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2746 - accuracy: 0.9219
Epoch 2/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.1270 - accuracy: 0.9626
Epoch 3/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0889 - accuracy: 0.9733
Epoch 4/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0668 - accuracy: 0.9794
Epoch 5/5
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0533 - accuracy: 0.9831
Out[15]:

<keras.callbacks.History at 0x7f8bc8723cd0>

In [16]:
# evaluate the performance of the new neural network with hidden layers
model.evaluate(X_test_flattened,y_test)

313/313 [==============================] - 0s 1ms/step - loss: 0.0794 - accuracy: 0.9751


Out[16]:

[0.07935800403356552, 0.9750999808311462]

In [17]:

# predictions
y_predicted = model.predict(X_test_flattened)
y_predicted_labels = [np.argmax(i) for i in y_predicted]

# create confusion matrix
cm = tf.math.confusion_matrix(labels=y_test, predictions=y_predicted_labels)

# plot confusion matrix
plt.figure(figsize=(10, 7))
sn.heatmap(cm, annot=True, fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Truth')

Out[17]:
Text(69.0, 0.5, 'Truth')

In [18]:

# Using a Flatten layer so that we don't have to call .reshape on the input dataset
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

#train the model


model.fit(X_train, y_train, epochs=10)

Epoch 1/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2665 - accuracy: 0.9227
Epoch 2/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.1204 - accuracy: 0.9642
Epoch 3/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0852 - accuracy: 0.9743
Epoch 4/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0642 - accuracy: 0.9803
Epoch 5/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0510 - accuracy: 0.9845
Epoch 6/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0408 - accuracy: 0.9877
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.0339 - accuracy: 0.9892
Epoch 8/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0269 - accuracy: 0.9918
Epoch 9/10
