20MEMECH Part 3 - Classification

PART 3
Classification
1
Classification is a supervised learning task.
It assigns each data element to the class to which it belongs.
Two common types of Classification:
Binary (2 classes)
Multi-Class (More than 2 classes)
2
Applications:
Social media sentiment analysis has two
potential outcomes, positive or negative, as
displayed by the chart given below.
To find whether a received email is spam or not
To find whether a bank loan is granted or not
To identify whether a student will pass or fail an examination
To classify images
3
Types of classification algorithms
(discriminative and generative learning
algorithms)
A discriminative learning algorithm tries to find a
decision boundary (a straight line in the simplest case)
that separates the classes (e.g. cats and dogs) from each other.
E.g. SVM (to be discussed).
A generative learning algorithm builds a separate
model of each class (cats and dogs). E.g. Naïve
Bayes (to be discussed).
4
5
Types of Classification Algorithms
Logistic Regression
Naïve Bayes
Support Vector Machines
K-nearest Neighbors (KNN)
Decision Tree Classification
Random Forest
(Assignment)
6
Logistic regression
It is named for the logistic function it uses.
The logistic or sigmoid function, σ(z) = 1 / (1 + e^(-z)), is an S-shaped curve that
maps any real-valued number to a value between 0
and 1, but never exactly at those limits.
SIGMOID FUNCTION
7
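The S-shaped mapping described above can be tried out numerically. The following is a minimal sketch, assuming the standard definition σ(z) = 1 / (1 + e^(-z)); the function name `sigmoid` and the sample inputs are chosen here for illustration only.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps a real number z into the range 0..1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form e^z / (1 + e^z)
    # so that exp() is never called with a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Large negative inputs approach 0, large positive inputs approach 1,
# and z = 0 sits exactly in the middle.
for z in (-6, -2, 0, 2, 6):
    print(f"sigmoid({z:+d}) = {sigmoid(z):.4f}")
```

Because the curve is symmetric about z = 0, sigmoid(z) + sigmoid(-z) = 1 for any z.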
Unlike linear regression, which outputs
continuous numeric values, logistic regression
transforms its output using the logistic sigmoid
function to return a probability value, which can
then be mapped to two or more discrete
classes.
Linear Regression could help us predict the
student’s test score on a scale of 0 - 100.
Logistic Regression could help us predict
whether the student passed or failed.
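The step from a probability to a discrete class can be illustrated with the pass/fail case. This is a sketch only: the hours-studied data below is invented for illustration, and the 0.5 threshold is the usual default for two classes rather than something stated in these slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied, and whether the student passed (1) or failed (0).
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

# predict_proba returns one column per class; column 1 is P(pass).
prob_pass = model.predict_proba(hours)[:, 1]

# Map each probability to a class label using a 0.5 threshold.
labels = (prob_pass >= 0.5).astype(int)

for h, p, c in zip(hours.ravel(), prob_pass, labels):
    print(f"{h} h studied -> P(pass) = {p:.2f} -> {'pass' if c else 'fail'}")
```

For a two-class problem, thresholding P(pass) at 0.5 gives the same labels as calling `model.predict(hours)` directly.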
8
Types of logistic regression
Binary (example: Pass/Fail)
Multiclass (Example: Cats, Dogs, Sheep)
Ordinal (Example: Low, Medium, High)
9
Model Building
10
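Python Example: Digits Dataset
The digits dataset is included in scikit-learn.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Jupyter magic: show plots inside the notebook (omit in a plain .py script)
%matplotlib inline
from sklearn.datasets import load_digits
digits = load_digits()
# 1797 images, each flattened to 64 pixel values
print(digits.data.shape)
# Display the last image (index 1796) as an 8x8 grid
plt.matshow(digits.images[1796])
plt.show()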
11
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)
12
Scikit-learn 4-Step Modeling Pattern
Step 1. Import the model you want to use
from sklearn.linear_model import LogisticRegression
Step 2. Make an instance of the Model
logisticRegr = LogisticRegression()
Step 3. Training the model on the data, storing the information learned from the data
logisticRegr.fit(x_train, y_train)
Step 4. Predict the labels of new data
y_pred = logisticRegr.predict(x_test)
13
Model Performance
Confusion matrix and classification report are used to check model performance.
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
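To see how the printed confusion matrix is laid out, it can help to run it on a handful of labels small enough to count by hand. The labels below are invented for illustration; the sketch relies on scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels, invented for illustration: 1 = positive, 0 = negative.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat = [0, 1, 0, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes (labels sorted: 0, 1).
cm = confusion_matrix(y_true, y_hat)
print(cm)

# For a binary problem, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

For the ten-class digits example, the same rule applies: entry (i, j) counts test images whose true digit is i and whose predicted digit is j, so correct predictions lie on the diagonal.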
14
Confusion Matrix
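1. Accuracy
Accuracy = (TP+TN) / (TP+FP+FN+TN)
where TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives.
Ratio of correctly predicted observations to the total observations.
Accuracy is suitable when you have symmetric datasets where
the numbers of false positives and false negatives are almost the same.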
15
Accuracy is suitable for symmetric datasets (i.e. the numbers of false
positives and false negatives are almost the same).
Is accuracy a good measure for the following confusion matrix?
16
2. Precision
Precision = (TP) / (TP+FP)
Precision is a good measure to use when the cost of a false
positive is high (e.g. in email spam detection).
17
3. Recall
Recall = (TP) / (TP+FN)
Recall is a good measure to use when the cost of a false
negative is high (e.g. in fraud detection).
18
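4. F1 Measure
F1 = 2 · (Precision · Recall) / (Precision + Recall)
The F1 score, the harmonic mean of Precision and Recall, is a better measure to use if we need
a balance between Precision and Recall AND there is
an uneven class distribution.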
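The four measures can be computed directly from the confusion-matrix counts. The sketch below uses the Accuracy, Precision and Recall formulas from these slides together with the standard harmonic-mean definition of F1; the function name and the imbalanced counts (10 positives among 1000 samples) are hypothetical, chosen to show how accuracy can look good while recall is poor.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced case: only 10 of 1000 samples are positive,
# and the classifier finds just one of them.
metrics = classification_metrics(tp=1, fp=1, fn=9, tn=989)
for name, value in metrics.items():
    print(f"{name:9s} = {value:.3f}")
```

With these counts the accuracy is 0.99 even though nine of the ten positives are missed (recall 0.1), which is why F1 is preferred when the classes are uneven.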
19
ASSIGNMENT 02
Date of submission:
Use a logistic regression model on the MNIST database.
Run the four steps of scikit-learn
Calculate the confusion matrix
Find the performance measures
20
Naïve Bayes classifier
The Naïve Bayes classifier is a probabilistic algorithm
used for classification. It uses Bayes’ theorem of
probability to predict the class of unknown data.
It can be used in a
wide variety of classification tasks. Typical
applications include filtering spam and sentiment
prediction. The word naïve is used because features
are assumed to be independent of each other given the class. Naïve
Bayes is a simple yet powerful and fast algorithm.
21
Play-tennis example
22
23
Will you play or not if it rains, the temperature
is hot, the humidity is high and there is light
wind?
X = rain, hot temperature, high humidity, light wind
P(play | X) = P(X | play) · P(play) / P(X)
= P(rain | play) · P(hot temperature | play) · P(high humidity | play) · P(light wind | play) · P(play) / P(X)
= (3/9 · 2/9 · 3/9 · 6/9 · 9/14) / (5/14 · 4/14 · 7/14 · 8/14)
≈ 0.36
24
X = rain, hot temperature, high humidity, light wind
P(don’t play | X) = P(X | don’t play) · P(don’t play) / P(X)
= P(rain | don’t play) · P(hot temperature | don’t play) · P(high humidity | don’t play) · P(light wind | don’t play) · P(don’t play) / P(X)
= (2/5 · 2/5 · 4/5 · 2/5 · 5/14) / (5/14 · 4/14 · 7/14 · 8/14)
≈ 0.63
Here P(X) is approximated by the product of the individual feature probabilities, so the two values need not sum to exactly 1.
Since P(don’t play | X) > P(play | X), the prediction is don’t play.
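The play-tennis arithmetic can be checked with exact fractions. The sketch below reuses the conditional probabilities and priors from the two calculations above. As an assumption that differs slightly from the slides, it normalises by the sum of the two class scores instead of dividing by the approximated P(X), so that the two posteriors sum to exactly 1; the variable names are illustrative.

```python
from fractions import Fraction as F

# Conditional probabilities for X = (rain, hot, high humidity, light wind),
# taken from the play-tennis calculation in these slides.
likelihoods = {
    "play": [F(3, 9), F(2, 9), F(3, 9), F(6, 9)],
    "don't play": [F(2, 5), F(2, 5), F(4, 5), F(2, 5)],
}
priors = {"play": F(9, 14), "don't play": F(5, 14)}


def product(factors):
    """Multiply a list of fractions together."""
    result = F(1)
    for factor in factors:
        result *= factor
    return result


# Unnormalised score for each class: P(X | class) * P(class).
scores = {c: product(likelihoods[c]) * priors[c] for c in priors}

# Normalise by the sum of the scores (assumption: used here in place of P(X)).
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}

for c, p in posteriors.items():
    print(f"P({c} | X) = {float(p):.3f}")

prediction = max(posteriors, key=posteriors.get)
print("Prediction:", prediction)
```

Either way of normalising leads to the same decision, because both classes are divided by the same quantity and only their relative size matters.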
25
Implementation in sklearn
In jupyter notebook
26
Support Vector Machines (SVM)
The SVM algorithm finds a hyperplane that separates
data points of different classes.
A hyperplane is a:
point for 1-feature data,
line for 2-feature data,
plane for 3-feature data,
and hyperplane for data with more than 3
features.
27
Suppose we have to classify 2 types of objects
(represented by circles and squares below) on
the basis of two features (X1 and X2).
28
An infinite number of lines may be drawn to separate
them. The optimal hyperplane is shown below.
29
30
31
Consider the case where the data are not
linearly separable. For example, both
low and high amounts of a drug failed to
cure the disease (red dots).
32
33
34
35
Two-feature data that is not linearly
separable is shown in the figure below.
36
In this case the input space is transformed into a higher
dimensional space as shown below. The data points are
plotted on the x-axis and z-axis such that z = x^2 + y^2
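The effect of adding the squared-distance feature z can be sketched with synthetic data. The two rings of points below are generated for illustration and are not the data shown in the slides; the helper name `ring` and the radii are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def ring(radius_low, radius_high, n):
    """Sample n points whose distance from the origin lies in [radius_low, radius_high)."""
    r = rng.uniform(radius_low, radius_high, n)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    return r * np.cos(theta), r * np.sin(theta)


# Synthetic two-feature data: one class near the origin, the other surrounding it.
# No straight line in the (x, y) plane separates them.
x_in, y_in = ring(0.0, 1.0, 50)
x_out, y_out = ring(2.0, 3.0, 50)

# Add the new dimension z as the squared distance from the origin.
z_in = x_in**2 + y_in**2
z_out = x_out**2 + y_out**2

# In the new dimension a single threshold on z separates the classes.
print("largest z, inner class :", z_in.max())
print("smallest z, outer class:", z_out.min())
```

Any threshold on z between those two printed values acts as a flat decision boundary in the transformed space, which corresponds to a circle in the original (x, y) plane.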
37
In the original input space, the decision boundary
(blue circle) looks as shown below.
38
KERNEL
A kernel transforms a low-dimensional
input space into a higher-dimensional
space, i.e. it converts a non-separable
problem into a separable problem by adding
more dimensions to it.
Three types of kernels are commonly used:
1. Linear Kernel
2. Polynomial Kernel
3. Radial Basis Function Kernel
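The three kernels can be compared on data that is not linearly separable. This is a rough sketch, not the example from the slides: it assumes scikit-learn's `make_circles` as a stand-in dataset and sets `degree=2` so that the polynomial kernel can represent a circular boundary (the `degree` parameter is ignored by the other kernels).

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: a small circle of one class inside a larger circle of the other.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, degree=2)
    clf.fit(X_train, y_train)
    # score() returns the accuracy on the held-out test set.
    scores[kernel] = clf.score(X_test, y_test)
    print(f"{kernel:6s} kernel: test accuracy = {scores[kernel]:.2f}")
```

On data like this the linear kernel is expected to do little better than guessing, while the RBF kernel can separate the two circles.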
39
Example:
Classifier Building in Scikit-learn
We will use the banknote dataset. This example is available
online at: https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/
The task is to predict whether a bank currency note is authentic
or not (i.e. binary classification).
Four attributes of the image:
1. skewness
2. variance
3. entropy
4. kurtosis
40
The following script imports required libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Importing the Dataset
The data is available for download at the following link:
https://drive.google.com/file/d/13nw-uRXPY8XIZQxKRNZ3yYlho-CYm_Qt/view
The detailed information about the data is available at the
following link:
https://archive.ics.uci.edu/ml/datasets/banknote+authentication
Download the dataset from the Google Drive link and store
it locally on your machine.
41
Load dataset:
bankdata = pd.read_csv("D:/Datasets/bill_authentication.csv")
Shape of dataset:
bankdata.shape
To check first five rows:

bankdata.head()
42
Data Preprocessing
Data preprocessing involves
(1) dividing the data into attributes and labels and
(2) dividing the data into training and testing sets.
43
(1) Dividing the data into attributes and
labels
X = bankdata.drop('Class', axis=1) #1
y = bankdata['Class'] #2
#1 The drop() command returns a copy of the data without the column
labeled ‘Class’ (axis=1 means a column, rather than a row, is dropped).
#2 Only the Class column is stored in
the y variable.
Now, the X variable contains the features while the y
variable contains the corresponding labels.
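The attribute/label split can be tried without the downloaded file. In the sketch below the DataFrame is a tiny hypothetical stand-in: the rows are invented, and the feature column names are assumptions (only the 'Class' column name comes from the code above).

```python
import pandas as pd

# Hypothetical stand-in for the banknote data: invented rows, assumed feature names.
bankdata = pd.DataFrame({
    "Variance": [3.62, 4.55, -1.39, -3.75],
    "Skewness": [8.67, 8.17, 3.32, -13.46],
    "Kurtosis": [-2.81, -2.46, -1.39, 17.59],
    "Entropy": [-0.45, -1.46, -1.99, -2.78],
    "Class": [0, 0, 1, 1],
})

# Features: every column except 'Class'. Labels: the 'Class' column only.
X = bankdata.drop("Class", axis=1)
y = bankdata["Class"]

print(X.shape)              # one row per note, one column per feature
print(list(X.columns))
print(y.tolist())
# drop() returned a new DataFrame; the original still has its 'Class' column.
print("Class" in bankdata.columns)
```

Because `drop()` does not modify `bankdata` in place, the label column is still available when `y` is extracted on the next line.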
44
(2) dividing the data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
45
Training the Algorithm
The Scikit-Learn svm library contains built-in classes for
different SVM algorithms.
We will use the support vector classifier (SVC) class.
The fit method of the SVC class is called to train the algorithm on the training data:
from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
Making Predictions
y_pred = svclassifier.predict(X_test)
46
Evaluating the Algorithm
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
47
ASSIGNMENT NO. 1
1. Download any publicly available linearly
separable dataset. Run SVM. Put your
code, dataset and confusion matrix in
a single Word file. What do you conclude?
48
THE END
49
