
Python for Machine Learning

Tips & Tools

Prof. Sajeev G P
December 6, 2018
Dept Of CSE, Amrita School of Engineering
Amrita Vishwa Vidyapeetham
sajeevgp@am.amrita.edu

Outline

Contents
• Python Data Structures
• Python Modules for Machine Learning
• NumPy, random
• Collections
• Sklearn
• K-Nearest-Neighbor (KNN) Algorithm
• Basics, Example
• Hands-On
• References

Python Data Structures for ML

Data structures
• List
• Tuple
• Set
• Dictionary
• String

List
• Mutable data type
• List Comprehension
• Mixed data types possible

Python Lists
Example
# empty list
my_list = []
# list of integers
my_list = [1, 2, 3]
# list with mixed datatypes
my_list = [1, "Hello", 3.4]
# nested list
my_list = ["mouse", [8, 4, 6], ['a']]
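
The other built-in types named on this slide, together with list mutability and a list comprehension, can be sketched as follows. All values and variable names here are hypothetical and chosen only for illustration.

```python
# Hypothetical sample values, chosen only for illustration.
point = (5.1, 3.5)                           # tuple: ordered and immutable
labels = {"setosa", "virginica", "setosa"}   # set: duplicates collapse
counts = {"setosa": 2, "virginica": 1}       # dictionary: key -> value
name = "versicolor"                          # string: immutable character sequence

# Lists are mutable: elements can be changed or appended in place.
my_list = [1, 2, 3]
my_list[0] = 10
my_list.append(4)

# A list comprehension builds a new list from an iterable.
squares = [x * x for x in my_list]

print(point[0], len(labels), counts["setosa"], name.upper())
print(my_list, squares)
```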

Basic Programs

Palindrome
Ask the user for a string and print out whether this string is a
palindrome or not.

Solution
# input() already returns a string in Python 3
wrd = input("Please enter a word: ")
rvs = wrd[::-1]  # reversed copy of the string
print(rvs)
if wrd == rvs:
    print("This word is a palindrome")
else:
    print("This word is not a palindrome")
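
The same slicing check can be wrapped in a function so that it is easy to test. The sketch below additionally ignores case and non-alphanumeric characters; that is an assumption beyond the exercise as stated, and the name is_palindrome is hypothetical.

```python
def is_palindrome(text):
    """Return True if text reads the same in both directions.

    Assumption (not part of the original exercise): case, spaces
    and punctuation are ignored.
    """
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


print(is_palindrome("Madam"))
print(is_palindrome("python"))
```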

Lists: Example 1

List Comprehension
Write a Python program to print a specified list after removing
the 0th, 4th and 5th elements.

Solution

color = ['Red', 'Green', 'White',
         'Black', 'Pink', 'Yellow']
color = [x for (i, x) in enumerate(color)
         if i not in (0, 4, 5)]
print(color)

NumPy

2-dimensional NumPy array


Write a Python program to create a 2d array with 1 on the
border and 0 inside.

Solution
import numpy as np
x = np.ones((5, 5))   # start with a 5x5 array of ones
print(x)
x[1:-1, 1:-1] = 0     # zero out everything except the border
print(x)
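
An alternative sketch of the same exercise builds the zero interior first and lets np.pad add the border of ones. The variable name inner is illustrative; the result should match the slicing solution above.

```python
import numpy as np

# 3x3 interior of zeros; padding by one cell on every side gives 5x5.
inner = np.zeros((3, 3))
x = np.pad(inner, pad_width=1, mode="constant", constant_values=1)
print(x)
```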

Libraries Used

• sklearn
• sklearn.datasets
• numpy.random, numpy.random.seed
• collections.Counter

sklearn

Example
sklearn

>>> from sklearn.datasets import load_iris
>>> data = load_iris()
>>> data.target[[10, 25, 50]]
array([0, 0, 1])
>>> list(data.target_names)
['setosa', 'versicolor', 'virginica']

NumPy.Random

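• numpy.random.random_sample(size):- returns random floats in the
half-open interval [0.0, 1.0)
• numpy.random.seed(seed):- seeds the random number generator with
the given seed value, so the same sequence of random numbers is
reproduced on every run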

NumPy.Random (contd.)

Example

>>> import numpy as np
>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
<class 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482, 0.86820401, 0.1654503 ,
        0.11659149, 0.54323428])

NumPy.Random (contd.)

Example
seed

>>> import numpy as np
>>> np.random.seed(0)
>>> perm = np.random.permutation(10)
>>> print(perm)
[2 8 4 9 1 6 7 3 0 5]
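
A minimal sketch of why the seed matters: resetting the generator to the same seed reproduces the same permutation, which keeps a train/test split repeatable between runs. The names first and second are illustrative.

```python
import numpy as np

np.random.seed(0)
first = np.random.permutation(10)

np.random.seed(0)                  # reset the generator to the same state
second = np.random.permutation(10)

# Same seed, same permutation; every index 0..9 appears exactly once.
print(np.array_equal(first, second))
print(sorted(first.tolist()) == list(range(10)))
```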

Counter in Collections

A Counter is a container that keeps track of how many times
equivalent values are added.

>>> import collections
>>> print(collections.Counter(['a', 'b', 'c', 'a', 'b', 'b']))
Counter({'b': 3, 'a': 2, 'c': 1})

Counter in Collections (contd.)

Example
Getting the count

import collections
c = collections.Counter('abcdaab')
for letter in 'abcde':
    # a missing key such as 'e' has a count of 0
    print('%s : %d' % (letter, c[letter]))

most_common(n) returns a list of the n most frequently
encountered input values and their respective counts

Counter in Collections (contd.)

Example

import collections
c = collections.Counter('azxyxyzbcdaababababbbxxxxxa')
for letter, count in c.most_common(2):
    print('%s: %7d' % (letter, count))

KNN Classifier

Steps Involved

1. Load the iris data using sklearn
2. Prepare the training and test sets using numpy.random
3. Calculate the distance using np.linalg.norm
4. Find the K nearest neighbors
5. Vote for the majority class, with a probability calculation
6. List the predicted classes using the vote
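
Steps 3 to 6 can be sketched end to end on a toy data set. This is a compact illustration under stated assumptions, not the step-by-step solution developed on the following slides: the data is hypothetical (two well-separated 2-D classes instead of iris, so steps 1 and 2 are skipped), and knn_predict is a hypothetical helper name.

```python
from collections import Counter

import numpy as np

# Hypothetical toy data: two well-separated 2-D classes (not the iris set).
train_data = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                       [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_labels = np.array([0, 0, 0, 1, 1, 1])


def knn_predict(query, data, labels, k=3):
    """Return (predicted label, share of the k votes it received)."""
    # Step 3: Euclidean distance from the query to every training point.
    dists = np.linalg.norm(data - np.asarray(query), axis=1)
    # Step 4: indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # Step 5: majority vote among the neighbors' labels.
    votes = Counter(labels[nearest].tolist())
    winner, n_votes = votes.most_common(1)[0]
    return winner, n_votes / k


# Step 6: list the predicted class for a few query points.
for query in ([1.1, 0.9], [4.0, 4.0]):
    print(query, knn_predict(query, train_data, train_labels))
```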

Data Handling

Data handling: sklearn
Write a Python program to load the iris dataset and print some
samples and their labels using sklearn
Solution
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
iris_data = iris.data
iris_labels = iris.target
print(iris_data[0], iris_data[79], iris_data[100])
print(iris_labels[0], iris_labels[79], iris_labels[100])

Preparing Training Dataset

random method in NumPy
Write a Python program to prepare the learning set and the test
set using random in NumPy
Solution
np.random.seed(42)
indices = np.random.permutation(len(iris_data))
# despite its name, this is the number of samples held out for testing
n_training_samples = 12
learnset_data = iris_data[indices[:-n_training_samples]]
learnset_labels = iris_labels[indices[:-n_training_samples]]
testset_data = iris_data[indices[-n_training_samples:]]
testset_labels = iris_labels[indices[-n_training_samples:]]
print(learnset_data[:4], learnset_labels[:4])
print(testset_data[:4], testset_labels[:4])
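
A quick sanity check on this kind of split is that the two index sets are disjoint and together cover every sample. The sketch below works on indices only, so it does not need the data itself; n_samples = 150 is the size of the iris data set, and the names learn_idx and test_idx are illustrative.

```python
import numpy as np

n_samples = 150   # number of samples in the iris data set
n_test = 12       # samples held out for testing, as in the split above

np.random.seed(42)
indices = np.random.permutation(n_samples)
learn_idx = indices[:-n_test]   # everything except the last 12 indices
test_idx = indices[-n_test:]    # the last 12 indices

print(len(learn_idx), len(test_idx))
# No index may appear in both sets.
print(np.intersect1d(learn_idx, test_idx).size)
```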

Visualizing the data
Matplotlib
Write a Python program to visualize the data using Matplotlib

Solution
import matplotlib.pyplot as plt
# needed to register the 3d projection on older Matplotlib versions
from mpl_toolkits.mplot3d import Axes3D
# X[iclass] holds three coordinate lists for the samples of that class
X = []
for iclass in range(3):
    X.append([[], [], []])
    for i in range(len(learnset_data)):
        if learnset_labels[i] == iclass:
            X[iclass][0].append(learnset_data[i][0])
            X[iclass][1].append(learnset_data[i][1])
            # third axis: sum of the remaining two features
            X[iclass][2].append(sum(learnset_data[i][2:]))
colours = ("r", "g", "y")
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for iclass in range(3):
    ax.scatter(X[iclass][0], X[iclass][1],
               X[iclass][2], c=colours[iclass])
plt.show()

Computing Similarity

Write a distance function to calculate Euclidean distance between
two instances.

Solution
def distance(instance1, instance2):
    # just in case, if the instances are lists or tuples:
    instance1 = np.array(instance1)
    instance2 = np.array(instance2)

    return np.linalg.norm(instance1 - instance2)


print(distance([3, 5], [1, 1]))
print(distance(learnset_data[3], learnset_data[44]))
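
When distances from one instance to many training instances are needed, np.linalg.norm can compute them in a single call with axis=1 instead of looping. This is a sketch of that variant, not part of the original solution; distances_to_all and points are hypothetical names.

```python
import numpy as np


def distances_to_all(test_instance, training_set):
    """Euclidean distance from test_instance to each row of training_set."""
    training_set = np.asarray(training_set, dtype=float)
    test_instance = np.asarray(test_instance, dtype=float)
    # Broadcasting subtracts test_instance from every row;
    # axis=1 takes the norm of each row separately.
    return np.linalg.norm(training_set - test_instance, axis=1)


points = [[1, 1], [3, 5], [0, 0]]   # hypothetical training instances
print(distances_to_all([3, 5], points))
```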

Finding Neighbors

Write a function to return the K nearest neighbors.


Solution
def get_neighbors(training_set,
                  labels, test_instance,
                  k, distance=distance):
    # collect (instance, distance, label) for every training sample
    distances = []
    for index in range(len(training_set)):
        dist = distance(test_instance, training_set[index])
        distances.append((training_set[index],
                          dist, labels[index]))
    # sort by distance and keep the k closest
    distances.sort(key=lambda x: x[1])
    neighbors = distances[:k]
    return neighbors

Majority Voting

Counter in Collections
Write a Python program to find most common class

Solution
from collections import Counter
def vote(neighbors):
    class_counter = Counter()
    for neighbor in neighbors:
        # neighbor is (instance, distance, label)
        class_counter[neighbor[2]] += 1
    return class_counter.most_common(1)[0][0]

Majority Voting

Vote on Test Set
Write a program to get the vote for each test sample
Solution
for i in range(n_training_samples):
    neighbors = get_neighbors(learnset_data,
                              learnset_labels,
                              testset_data[i],
                              3,
                              distance=distance)
    print("index: ", i,
          ", result of vote: ", vote(neighbors),
          ", label: ", testset_labels[i],
          ", data: ", testset_data[i])

Voting with Probability

Vote Probability
Write a function to return the class name and the Probability.
Solution
def vote_prob(neighbors):
    class_counter = Counter()
    for neighbor in neighbors:
        class_counter[neighbor[2]] += 1
    labels, votes = zip(*class_counter.most_common())
    winner = class_counter.most_common(1)[0][0]
    votes4winner = class_counter.most_common(1)[0][1]
    # share of all votes that went to the winning class
    return winner, votes4winner/sum(votes)

Testing
Predicted classes
Write a Python program to list the predicted class and its vote
probability for each test sample
Solution
for i in range(n_training_samples):
    neighbors = get_neighbors(learnset_data,
                              learnset_labels,
                              testset_data[i],
                              5,
                              distance=distance)
    print("index: ", i,
          ", vote_prob: ", vote_prob(neighbors),
          ", label: ", testset_labels[i],
          ", data: ", testset_data[i])
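
One way to summarise such a listing is the fraction of test samples whose predicted class equals the true label. The sketch below uses hypothetical prediction and label arrays purely for illustration; they are not results from the iris run above.

```python
import numpy as np

# Hypothetical predictions and true labels, for illustration only.
predicted = np.array([1, 0, 2, 1, 1, 0])
actual = np.array([1, 0, 2, 2, 1, 0])

# Element-wise comparison gives booleans; their mean is the accuracy.
accuracy = np.mean(predicted == actual)
print(accuracy)
```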

References

Duncan, S.
Automate the boring stuff with python.
Software Quality Professional 17, 4 (2015), 53.
Harms, D. D., and McDonald, K.
The quick python book.
Manning, 2000.
Python Machine Learning Tutorial.
KNN Classifier with Python.
https://www.python-course.eu/k_nearest_neighbor_classifier.php
