Python for Machine Learning
Tips & Tools
Prof. Sajeev G P
December 6, 2018
Dept Of CSE, Amrita School of Engineering
Amrita Vishwa Vidyapeetham
sajeevgp@am.amrita.edu
Outline
Contents
• Python Data Structures
• Python Modules for Machine Learning
• NumPy, random
• Collections
• Sklearn
• K-Nearest-Neighbor (KNN) Algorithm
• Basics, Example
• Hands-On
• References
Python Data Structures for ML
Python Lists
• List
• Tuple
• Set
• Dictionary
• String
List
• Mutable data type
• List Comprehension
• Mixed data types possible
Example
# empty list
my_list = []
# list of integers
my_list = [1, 2, 3]
# list with mixed datatypes
my_list = [1, "Hello", 3.4]
# nested list
my_list = ["mouse", [8, 4, 6], ['a']]
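The built-in types named on this slide, together with a list comprehension, can be illustrated with a short sketch. All variable names and values below are illustrative assumptions and do not come from the slides.

```python
# Illustrative values only; none of these names come from the slides.
point = (5.1, 3.5)                           # tuple: immutable, fixed-size record
labels = {"setosa", "setosa", "virginica"}   # set: duplicates collapse
counts = {"setosa": 2, "virginica": 1}       # dictionary: key -> value
name = "iris"                                # string: immutable sequence of characters

features = [5.1, 3.5, 1.4]                   # list: mutable, so it can grow in place
features.append(0.2)

# List comprehension: build a new list from an iterable in one expression.
squares = [x * x for x in range(5)]
evens = [x for x in range(10) if x % 2 == 0]

print(len(labels), features)   # 2 [5.1, 3.5, 1.4, 0.2]
print(squares, evens)          # [0, 1, 4, 9, 16] [0, 2, 4, 6, 8]
```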
Basic Programs
Palindrome
Ask the user for a string and print out whether this string is a
palindrome or not.
Solution
wrd = input("Please enter a word: ")
rvs = wrd[::-1]  # reverse the string with a slice
print(rvs)
if wrd == rvs:
    print("This word is a palindrome")
else:
    print("This word is not a palindrome")
Lists: Example 1
List Comprehension
Write a Python program to print a specified list after removing
the 0th, 4th and 5th elements.
Solution
color = ['Red', 'Green', 'White', 'Black', 'Pink', 'Yellow']
color = [x for (i, x) in enumerate(color) if i not in (0, 4, 5)]
print(color)
NumPy
2-dimensional NumPy array

Write a Python program to create a 2d array with 1 on the
border and 0 inside.
Solution
import numpy as np
x = np.ones((5,5))
print(x)
x[1:-1,1:-1] = 0
print(x)
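As an alternative sketch (not the approach on the slide), the same array can be built from the inside out: start with the block of zeros and pad it with a border of ones using np.pad. The names inner and y are illustrative.

```python
import numpy as np

# Alternative sketch: start from the 3x3 block of zeros and pad it with a
# one-element border of ones, giving the same 5x5 result.
inner = np.zeros((3, 3))
y = np.pad(inner, pad_width=1, mode="constant", constant_values=1)

# Reference result built the way the solution above does it.
x = np.ones((5, 5))
x[1:-1, 1:-1] = 0

print(np.array_equal(x, y))   # True
```

Slicing is usually the simpler choice when the final shape is known up front; padding is convenient when the inner block already exists.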
Libraries Used
• sklearn
• sklearn.datasets
• numpy.random, numpy.random.seed
• collections.Counter
sklearn
Example
sklearn
>>> from sklearn.datasets import load_iris
>>> data = load_iris()
>>> data.target[[10, 25, 50]]
array([0, 0, 1])
>>> list(data.target_names)
['setosa', 'versicolor', 'virginica']
NumPy.Random
• numpy.random.random_sample(): returns random floats in the
half-open interval [0.0, 1.0)
• numpy.random.seed(seed): seeds the random number generator with
the given seed value
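A minimal sketch of why seeding matters: re-seeding with the same value restarts the same pseudo-random sequence, which makes experiments repeatable. The seed value 123 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(123)                      # 123 is an arbitrary seed chosen for illustration
first = np.random.random_sample(3)       # three floats in [0.0, 1.0)

np.random.seed(123)                      # re-seeding restarts the same sequence
second = np.random.random_sample(3)

print(np.array_equal(first, second))     # True
```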
NumPy.Random (contd.)
Example
>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
<class 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482, 0.86820401, 0.1654503 ,
0.11659149, 0.54323428])
NumPy.Random (contd.)
Example
seed
>>> import numpy as np
>>> np.random.seed(0)
>>> perm = np.random.permutation(10)
>>> print(perm)
[2 8 4 9 1 6 7 3 0 5]
Counter in Collections
A Counter is a container that keeps track of how many times
equivalent values are added.
>>> import collections
>>> print(collections.Counter(['a', 'b', 'c', 'a', 'b', 'b']))
Counter({'b': 3, 'a': 2, 'c': 1})
Counter in Collections (contd.)
Example
Getting the count
import collections
c = collections.Counter('abcdaab')
for letter in 'abcde':
    print('%s : %d' % (letter, c[letter]))
most_common(n) produces a sequence of the n most frequently
encountered input values and their respective counts.
Counter in Collections (contd.)
Example
import collections
c = collections.Counter('azxyxyzbcdaababababbbxxxxxa')
for letter, count in c.most_common(2):
    print('%s: %7d' % (letter, count))
KNN Classifier
Steps Involved
1. Load the iris data using sklearn
2. Prepare training and test set using random
3. Calculate the distance using np.linalg
4. Find K Neighbors
5. Vote for the Majority class using Probability calculation
6. List predicted Classes using vote
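Steps 3 to 6 can be previewed in one compact, vectorized sketch before they are built up function by function. This is not the deck's implementation: the helper knn_predict and the toy 2-D data are assumptions made up for illustration, standing in for the iris data of steps 1 and 2.

```python
import numpy as np
from collections import Counter

# Toy 2-D data, made up for illustration (the slides use the iris dataset).
train_data = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                       [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_labels = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(query, data, labels, k=3):
    """Hypothetical helper: classify `query` by majority vote of its k nearest points."""
    # Step 3: Euclidean distance from the query to every training point.
    dists = np.linalg.norm(data - np.asarray(query), axis=1)
    # Step 4: indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Steps 5-6: count the labels of those neighbors and return the winner
    # together with its share of the votes.
    votes = Counter(labels[nearest].tolist())
    winner, n_votes = votes.most_common(1)[0]
    return winner, n_votes / k

print(knn_predict([1.1, 0.9], train_data, train_labels))   # (0, 1.0)
print(knn_predict([4.9, 5.0], train_data, train_labels))   # (1, 1.0)
```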
Data Handling
Data handling: sklearn
Write a Python program to load the iris dataset and print its labels
using sklearn.
Solution
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
iris_data = iris.data
iris_labels = iris.target
print(iris_data[0], iris_data[79], iris_data[100])
print(iris_labels[0], iris_labels[79], iris_labels[100])
Preparing Training Dataset
random method in NumPy

Write a Python program to prepare the learning set and the test set
using random in NumPy.
Solution
# continues from the previous slide (np, iris_data, iris_labels)
np.random.seed(42)
indices = np.random.permutation(len(iris_data))
# note: despite its name, this is the number of samples held out for testing
n_training_samples = 12
learnset_data = iris_data[indices[:-n_training_samples]]
learnset_labels = iris_labels[indices[:-n_training_samples]]
testset_data = iris_data[indices[-n_training_samples:]]
testset_labels = iris_labels[indices[-n_training_samples:]]
print(learnset_data[:4], learnset_labels[:4])
print(testset_data[:4], testset_labels[:4])
Visualizing the data
Matplotlib
Write a Python program to visualize the data using Matplotlib.
Solution
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# X[iclass] holds three coordinate lists (x, y, z) for each class
X = []
for iclass in range(3):
    X.append([[], [], []])
    for i in range(len(learnset_data)):
        if learnset_labels[i] == iclass:
            X[iclass][0].append(learnset_data[i][0])
            X[iclass][1].append(learnset_data[i][1])
            # the third axis combines the last two features by summing them
            X[iclass][2].append(sum(learnset_data[i][2:]))
colours = ("r", "g", "y")
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for iclass in range(3):
    ax.scatter(X[iclass][0], X[iclass][1],
               X[iclass][2], c=colours[iclass])
plt.show()
Computing Similarity
Write a distance function to calculate the Euclidean distance between
two instances.
Solution
def distance(instance1, instance2):
    # just in case, if the instances are lists or tuples:
    instance1 = np.array(instance1)
    instance2 = np.array(instance2)
    return np.linalg.norm(instance1 - instance2)

print(distance([3, 5], [1, 1]))
print(distance(learnset_data[3], learnset_data[44]))
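The first call above, distance([3, 5], [1, 1]), can be checked by hand: the Euclidean distance is the square root of the summed squared differences, here sqrt(2² + 4²) = sqrt(20). The sketch below compares that hand calculation with np.linalg.norm; the variable names are illustrative.

```python
import math
import numpy as np

a = np.array([3, 5])
b = np.array([1, 1])

# Euclidean distance written out: square root of the summed squared differences.
by_hand = math.sqrt((3 - 1) ** 2 + (5 - 1) ** 2)   # sqrt(4 + 16) = sqrt(20)
by_norm = np.linalg.norm(a - b)

print(by_hand, by_norm)                 # both are approximately 4.472
print(math.isclose(by_hand, by_norm))   # True
```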
Finding Neighbors
Write a function to return the K nearest neighbors.

Solution
def get_neighbors(training_set,
                  labels, test_instance,
                  k, distance=distance):
    # collect (instance, distance, label) for every training sample
    distances = []
    for index in range(len(training_set)):
        dist = distance(test_instance, training_set[index])
        distances.append((training_set[index],
                          dist, labels[index]))
    # sort by distance and keep the k closest
    distances.sort(key=lambda x: x[1])
    neighbors = distances[:k]
    return neighbors
Majority Voting
Write a Python program to find the most common class among the neighbors.
Solution
from collections import Counter
def vote(neighbors):
    class_counter = Counter()
    for neighbor in neighbors:
        class_counter[neighbor[2]] += 1
    return class_counter.most_common(1)[0][0]
Majority Voting
Vote on Test Set
Write a program to get the vote for each test sample.
Solution
# n_training_samples is the size of the test set (12)
for i in range(n_training_samples):
    neighbors = get_neighbors(learnset_data,
                              learnset_labels,
                              testset_data[i],
                              3,
                              distance=distance)
    print("index: ", i,
          ", result of vote: ", vote(neighbors),
          ", label: ", testset_labels[i],
          ", data: ", testset_data[i])
Voting with Probability
Vote Probability
Write a function to return the class name and the Probability.
Solution
def vote_prob(neighbors):
    class_counter = Counter()
    for neighbor in neighbors:
        class_counter[neighbor[2]] += 1
    labels, votes = zip(*class_counter.most_common())
    winner = class_counter.most_common(1)[0][0]
    votes4winner = class_counter.most_common(1)[0][1]
    return winner, votes4winner/sum(votes)
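To see what the returned probability means, the same counting can be traced on a small hand-made neighbor list. This is a sketch under stated assumptions: the list below is hypothetical, uses the (instance, distance, label) layout produced by get_neighbors, and leaves the instances as None because only the labels are counted.

```python
from collections import Counter

# Hypothetical neighbor list in the (instance, distance, label) layout;
# the instances are None because only the label is counted.
neighbors = [(None, 0.2, 1), (None, 0.3, 1), (None, 0.5, 2),
             (None, 0.6, 1), (None, 0.9, 0)]

class_counter = Counter(neighbor[2] for neighbor in neighbors)
winner, votes_for_winner = class_counter.most_common(1)[0]
probability = votes_for_winner / sum(class_counter.values())

print(winner, probability)   # 1 0.6
```

Class 1 receives three of the five votes, so the reported probability is simply the winner's share of the k neighbors.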
Testing
Predicted classes
Write a Python program to list the predicted class and its vote
probability for each test sample.
Solution
for i in range(n_training_samples):
    neighbors = get_neighbors(learnset_data,
                              learnset_labels,
                              testset_data[i],
                              5,
                              distance=distance)
    print("index: ", i,
          ", vote_prob: ", vote_prob(neighbors),
          ", label: ", testset_labels[i],
          ", data: ", testset_data[i])
References
Duncan, S.
Automate the boring stuff with python.
Software Quality Professional 17, 4 (2015), 53.
Harms, D. D., and McDonald, K.
The quick python book.
Manning, 2000.
Python Machine Learning Tutorial.
KNN Classifier with Python.
https://www.python-course.eu/k_nearest_neighbor_classifier.php