
PART 3

Classification

Classification is a supervised learning task.

It specifies the class to which each data element belongs.

Two common types of classification:

Binary (2 classes)
Multi-Class (More than 2 classes)

Applications:
Social media sentiment analysis, which has two possible outcomes, positive or negative, as displayed by the chart given below.
Finding whether a received email is spam or not.
Finding whether a bank loan is granted or not.
Identifying whether a student will pass or fail an examination.
Classifying images.
Types of classification algorithms
(discriminative and generative learning algorithms)

A discriminative learning algorithm tries to find a decision boundary (e.g., a straight line) that separates the classes (e.g., cats and dogs) from each other. Example: SVM (to be discussed).
A generative learning algorithm builds a separate model of each class (cats and dogs). Example: Naïve Bayes (to be discussed).
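A minimal sketch of the difference, assuming synthetic two-class data generated with make_blobs (not part of the original slides): a linear SVM keeps only the separating boundary (coef_ and intercept_), while Gaussian Naïve Bayes keeps a separate model per class (here, the per-class feature means in theta_).

```python
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Hypothetical toy data: two classes, two features each
X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=0)

# Discriminative: learns only the boundary w . x + b = 0 between the classes
svm = SVC(kernel="linear").fit(X, y)
print("boundary weights:", svm.coef_, "bias:", svm.intercept_)

# Generative: learns a Gaussian model of each class separately
nb = GaussianNB().fit(X, y)
print("per-class feature means:\n", nb.theta_)
```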

Types of Classification Algorithms

Logistic Regression
Naïve Bayes
Support Vector Machines
K-Nearest Neighbors (KNN)
Decision Tree Classification
Random Forest
(Assignment)
Logistic regression
Logistic regression is named after the logistic function it uses.
The logistic (sigmoid) function, sigmoid(z) = 1 / (1 + e^(-z)), is an S-shaped curve that can take any real-valued number and map it to a value between 0 and 1, but never exactly at those limits.
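A minimal NumPy sketch of the sigmoid, 1 / (1 + e^(-z)). The function name sigmoid is our own choice here, not a library API. Note that in floating point, a very large |z| rounds to exactly 0 or 1, even though the mathematical function never reaches those limits.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(z))  # increasing S-shaped values; sigmoid(0) is exactly 0.5
```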

[Figure: Sigmoid function]
Unlike linear regression, which outputs continuous numeric values, logistic regression transforms its output using the logistic sigmoid function to return a probability value, which can then be mapped to two or more discrete classes.

Linear regression could help us predict a student's test score on a scale of 0 to 100.

Logistic regression could help us predict whether the student passed or failed.
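A small sketch of the pass/fail case, using invented data (hours studied versus pass/fail, purely hypothetical). predict_proba returns the sigmoid probability of each class, and for a binary model predict amounts to thresholding P(pass) at 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> passed (1) or failed (0)
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(hours, passed)

new_students = np.array([[1.0], [4.0]])
proba_pass = model.predict_proba(new_students)[:, 1]  # column 1 holds P(pass)
print("P(pass):", proba_pass.round(3))
print("predicted class:", model.predict(new_students))  # 1 where P(pass) > 0.5
```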
Types of logistic regression

Binary (example: Pass/Fail)

Multiclass (Example: Cats, Dogs, Sheep)

Ordinal (Example: Low, Medium, High)

Model Building

Python Example: Digits Dataset

The digits dataset is included in scikit-learn.

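import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.data.shape)  # (1797, 64): 1797 images, each 8x8 image flattened to 64 pixel values

plt.matshow(digits.images[1796])  # show the last image (index 1796) as an 8x8 grid
plt.show()
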
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=0)

Scikit-learn 4-Step Modeling Pattern

Step 1. Import the model you want to use

from sklearn.linear_model import LogisticRegression

Step 2. Make an instance of the Model

logisticRegr = LogisticRegression()  # if a ConvergenceWarning appears, try a larger max_iter, e.g. LogisticRegression(max_iter=1000)

Step 3. Train the model on the data, storing the information learned from the data

logisticRegr.fit(x_train, y_train)

Step 4. Predict the labels of new data

y_pred = logisticRegr.predict(x_test)
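After the four steps, score() gives a quick accuracy check and predict_proba shows the per-class probabilities behind each prediction. This self-contained sketch repeats the split used above; max_iter=1000 is our own assumption (not in the slides), added to reduce the chance of a convergence warning on the raw pixel values.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

logisticRegr = LogisticRegression(max_iter=1000)  # max_iter is an assumption, see above
logisticRegr.fit(x_train, y_train)

# score() returns the mean accuracy on the given test data
accuracy = logisticRegr.score(x_test, y_test)
print("test accuracy:", accuracy)

# One probability per digit class (10 in total); each row sums to 1
probs = logisticRegr.predict_proba(x_test[:1])
print("predicted digit:", logisticRegr.predict(x_test[:1])[0])
print("class probabilities:", probs.round(3))
```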
Model Performance

The confusion matrix and the classification report are used to check model performance.

from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

Confusion Matrix

1. Accuracy

Accuracy = (TP+TN) / (TP+FP+FN+TN)

Accuracy is the ratio of correctly predicted observations to the total number of observations.

Accuracy is suitable when you have symmetric datasets, where the numbers of false positives and false negatives are almost the same.

Since accuracy is suitable mainly for symmetric datasets, is accuracy a good measure for the following confusion matrix?
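To see why the question matters, consider a hypothetical imbalanced test set (invented for illustration, not the matrix on the slide): 95 negatives, 5 positives, and a "classifier" that always predicts negative. The sketch computes accuracy with the formula (TP+TN) / (TP+FP+FN+TN) and compares it with recall.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hypothetical imbalanced test set: 95 negatives (0) and 5 positives (1)
y_true = np.array([0] * 95 + [1] * 5)
# A useless classifier that always predicts the majority class
y_pred = np.zeros_like(y_true)

# For binary labels [0, 1], ravel() returns TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + fp + fn + tn)

print("accuracy:", accuracy)                    # high, although every positive is missed
print("same via sklearn:", accuracy_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))  # 0.0: no positive case was found
```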

2. Precision

Precision = (TP) / (TP+FP)

Precision is a good measure to use when the cost of a false positive is high (e.g., in email spam detection, where a false positive sends a legitimate email to the spam folder).

3. Recall

Recall = (TP) / (TP+FN)

Recall is a good measure to use when the cost of a false negative is high (e.g., in fraud detection, where a false negative lets a fraudulent transaction go unnoticed).

4. F1 Measure

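The F1 score is the harmonic mean of precision and recall:

F1 = 2 · (Precision · Recall) / (Precision + Recall)

It is a better measure than accuracy when we need a balance between precision and recall and the class distribution is uneven.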
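A short sketch that computes precision, recall and F1 by hand from a confusion matrix and checks them against scikit-learn's metric functions. The labels are hypothetical, chosen only for illustration.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical binary labels (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of all actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print("by hand:     ", precision, recall, round(f1, 4))
print("scikit-learn:", precision_score(y_true, y_pred), recall_score(y_true, y_pred),
      round(f1_score(y_true, y_pred), 4))
```

Here TP = 3, FP = 2 and FN = 1, so precision is 3/5 = 0.6, recall is 3/4 = 0.75, and F1 is about 0.667.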

ASSIGNMENT 02

Date of submission:

Use the logistic regression model on the MNIST database.
Run the four scikit-learn steps.
Calculate the confusion matrix.
Find the performance measures.

Naïve Bayes classifier

Naïve Bayes classifier is a probabilistic algorithm used for classification. It uses Bayes' theorem of probability to predict the class of unknown data.

It can be used in a wide variety of classification tasks; typical applications include spam filtering and sentiment prediction. The word naïve is used because the features are assumed to be independent of each other. Naïve Bayes is a simple yet powerful and fast algorithm.

Play-tennis example

Will you play or not if it is raining, the temperature is hot, the humidity is high and there is a light wind?

X = rain, hot temperature, high humidity, light wind

P(play | X) = P(X | play) · P(play) / P(X)

= P(rain | play) · P(hot temperature | play) · P(high humidity | play) · P(light wind | play) · P(play) / P(X)

= (3/9 · 2/9 · 3/9 · 6/9 · 9/14) / (5/14 · 4/14 · 7/14 · 8/14)

≈ 0.363

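P(don’t play | X) = P(X | don’t play) · P(don’t play) / P(X)

= P(rain | don’t play) · P(hot temperature | don’t play) · P(high humidity | don’t play) · P(light wind | don’t play) · P(don’t play) / P(X)

= (2/5 · 2/5 · 4/5 · 2/5 · 5/14) / (5/14 · 4/14 · 7/14 · 8/14)

≈ 0.627

Because P(X) is the same for both classes, only the numerators need to be compared: 2/5 · 2/5 · 4/5 · 2/5 · 5/14 ≈ 0.0183 for don’t play versus 3/9 · 2/9 · 3/9 · 6/9 · 9/14 ≈ 0.0106 for play, so the classifier predicts that you will not play.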
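The hand calculation for both classes can be checked exactly with Python's fractions module. This sketch uses the same fractions as the play-tennis calculation; the variable names are our own. The two results sum to about 0.99 rather than exactly 1 because P(X) is approximated as a product of the individual feature probabilities.

```python
from fractions import Fraction as F

# Numerators P(X | class) · P(class) under the naive independence assumption
num_play = F(3, 9) * F(2, 9) * F(3, 9) * F(6, 9) * F(9, 14)
num_dont = F(2, 5) * F(2, 5) * F(4, 5) * F(2, 5) * F(5, 14)

# P(X) approximated as P(rain) · P(hot) · P(high humidity) · P(light wind)
p_x = F(5, 14) * F(4, 14) * F(7, 14) * F(8, 14)

print("P(play | X)       ~", round(float(num_play / p_x), 3))
print("P(don't play | X) ~", round(float(num_dont / p_x), 3))
print("prediction:", "play" if num_play > num_dont else "don't play")
```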

Implementation in sklearn

In a Jupyter notebook
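One possible scikit-learn sketch, not the notebook referred to here. It assumes the classic 14-day play-tennis table (Outlook, Temperature, Humidity, Wind), whose counts match the fractions used in the worked example, and uses CategoricalNB because all features are categorical. "Light wind" corresponds to the value "Weak" in this table. CategoricalNB applies Laplace smoothing by default (alpha=1.0), so its probabilities differ somewhat from the unsmoothed hand calculation, but the predicted class is the same.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Assumed data: the classic play-tennis table (Outlook, Temperature, Humidity, Wind)
X_raw = np.array([
    ["Sunny", "Hot", "High", "Weak"],
    ["Sunny", "Hot", "High", "Strong"],
    ["Overcast", "Hot", "High", "Weak"],
    ["Rain", "Mild", "High", "Weak"],
    ["Rain", "Cool", "Normal", "Weak"],
    ["Rain", "Cool", "Normal", "Strong"],
    ["Overcast", "Cool", "Normal", "Strong"],
    ["Sunny", "Mild", "High", "Weak"],
    ["Sunny", "Cool", "Normal", "Weak"],
    ["Rain", "Mild", "Normal", "Weak"],
    ["Sunny", "Mild", "Normal", "Strong"],
    ["Overcast", "Mild", "High", "Strong"],
    ["Overcast", "Hot", "Normal", "Weak"],
    ["Rain", "Mild", "High", "Strong"],
])
y = np.array(["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
              "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"])

# CategoricalNB expects integer category codes, so encode each column
encoder = OrdinalEncoder()
X = encoder.fit_transform(X_raw).astype(int)

# Default alpha=1.0 adds Laplace smoothing to the per-category counts
model = CategoricalNB().fit(X, y)

# Rain, hot temperature, high humidity, light ("Weak") wind
x_new = encoder.transform(np.array([["Rain", "Hot", "High", "Weak"]])).astype(int)
print(model.predict(x_new))
print(dict(zip(model.classes_, model.predict_proba(x_new)[0].round(3))))
```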

Support Vector Machines (SVM)

The SVM algorithm finds a hyperplane that separates the data points into classes.

A hyperplane is a:
point for 1-feature data,
line for 2-feature data,
plane for 3-feature data,
and hyperplane for data with more than 3 features.

Suppose we have to classify two types of objects (represented by circles and squares below) on the basis of two features (X1 and X2).

An infinite number of lines may be drawn to separate them. SVM picks the optimal hyperplane, the one with the largest margin to the nearest points of each class, as shown below.
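A minimal sketch of a linear SVM on two features, assuming hypothetical, clearly separable data from make_blobs. A large C (our choice) heavily penalizes margin violations, approximating a hard margin; coef_ and intercept_ describe the learned line, and support_vectors_ are the points that fix the margin.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical separable data: two tight clusters described by features X1 and X2
X, y = make_blobs(n_samples=40, centers=[[-2, -2], [2, 2]], cluster_std=0.5, random_state=0)

# Large C penalizes margin violations heavily, approximating a hard margin
clf = SVC(kernel="linear", C=1000).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane: %.2f*X1 + %.2f*X2 + %.2f = 0" % (w[0], w[1], b))
print("margin width:", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
print("training accuracy:", clf.score(X, y))
```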

Consider the case when the data are not linearly separable. For example, low and high amounts of a drug did not cure the disease (red dots), so no single threshold on the amount separates cured from uncured patients.

The two-feature, linearly non-separable data are shown in the figure below.

In this case, the input space is transformed into a higher-dimensional space, as shown below. The data points are plotted on the x-axis and z-axis, where $z = x^2 + y^2$.
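A sketch of this transformation, assuming hypothetical ring-shaped data from make_circles (one class inside a ring of the other). Adding the third coordinate z = x^2 + y^2 by hand lets a plain linear SVM separate the classes; C=10 is our own choice.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Hypothetical two-feature data: an inner disc surrounded by a ring
X, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

# In the original (x, y) space, no straight line separates the classes
acc_2d = SVC(kernel="linear").fit(X, y).score(X, y)

# Add z = x^2 + y^2 (squared distance from the origin) as a third feature
z = (X ** 2).sum(axis=1, keepdims=True)
X_3d = np.hstack([X, z])
acc_3d = SVC(kernel="linear", C=10).fit(X_3d, y).score(X_3d, y)

print("linear SVM on (x, y):   ", acc_2d)
print("linear SVM on (x, y, z):", acc_3d)
```

In the lifted space the inner points have small z and the outer points large z, so a flat plane (roughly z = constant) separates them; projected back to (x, y), that plane is a circle.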

The decision boundary (blue circle) in the original input space looks like the figure below.

KERNEL
A kernel transforms a low-dimensional input space into a higher-dimensional space, i.e., it converts a non-separable problem into a separable one by adding more dimensions to it.

Three types of kernels are commonly used:

1. Linear kernel
2. Polynomial kernel
3. Radial basis function (RBF) kernel
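A short sketch comparing the three kernels through SVC's kernel parameter, on hypothetical ring-shaped data from make_circles. degree=2 for the polynomial kernel is our own choice (SVC's default degree is 3); all other hyperparameters are left at their defaults. The scores are training accuracies, so they show separability, not generalization.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Hypothetical ring-shaped data that no straight line can separate
X, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

classifiers = {
    "linear": SVC(kernel="linear"),
    "poly": SVC(kernel="poly", degree=2),  # degree=2 is an assumption; default is 3
    "rbf": SVC(kernel="rbf"),              # "rbf" is also SVC's default kernel
}

scores = {}
for name, clf in classifiers.items():
    scores[name] = clf.fit(X, y).score(X, y)  # accuracy on the training data itself
    print(f"{name:>6} kernel: training accuracy = {scores[name]:.2f}")
```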

Example:
Classifier Building in Scikit-learn

We will use the banknote dataset. This example is available online at: https://stackabuse.com/implementing-svm-and-kernel-svm-with-pythons-scikit-learn/

The task is to predict whether a banknote is authentic or not (i.e., binary classification).

Four attributes of the note image are used:

1. skewness
2. variance
3. entropy
4. kurtosis

The following script imports required libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Importing the Dataset

The data is available for download at the following link:
https://drive.google.com/file/d/13nw-uRXPY8XIZQxKRNZ3yYlho-CYm_Qt/view
Detailed information about the data is available at the following link:
https://archive.ics.uci.edu/ml/datasets/banknote+authentication
Download the dataset from the Google Drive link and store it locally on your machine.
Load the dataset (change the path to wherever you saved the file):
bankdata = pd.read_csv("D:/Datasets/bill_authentication.csv")

Shape of the dataset:
bankdata.shape

To check the first five rows:
bankdata.head()

Data Preprocessing

Data preprocessing involves
(1) dividing the data into attributes and labels, and
(2) dividing the data into training and testing sets.

(1) Dividing the data into attributes and labels

X = bankdata.drop('Class', axis=1)  #1
y = bankdata['Class']  #2

#1 drop() removes the whole column labeled 'Class' (axis=1 means a column is dropped, not a row); it returns a new DataFrame and leaves bankdata unchanged.
#2 Only the Class column is stored in the y variable.
Now the X variable contains the features, while y contains the corresponding labels.
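The split can be tried without the CSV file on a tiny stand-in DataFrame. The column names below are an assumption about bill_authentication.csv and the values are invented placeholders; only the drop()/indexing behaviour is the point.

```python
import pandas as pd

# Hypothetical stand-in: assumed column names, placeholder values
bankdata = pd.DataFrame({
    "Variance": [0.1, 0.2, 0.3],
    "Skewness": [1.1, 1.2, 1.3],
    "Curtosis": [2.1, 2.2, 2.3],
    "Entropy": [3.1, 3.2, 3.3],
    "Class": [0, 1, 0],
})

X = bankdata.drop("Class", axis=1)  # new DataFrame without the label column
y = bankdata["Class"]               # Series holding only the labels

print(list(X.columns))
print(y.tolist())
print("Class" in bankdata.columns)  # True: drop() did not modify bankdata
```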
(2) Dividing the data into training and testing sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

Training the Algorithm
Scikit-learn's svm module contains built-in classes for different SVM algorithms.
We will use the support vector classifier (SVC) class.

The fit method of the SVC class is called to train the algorithm on the training data:

from sklearn.svm import SVC

svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

Making Predictions
y_pred = svclassifier.predict(X_test)
Evaluating the Algorithm

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

ASSIGNMENT NO. 1

1. Download any publicly available linearly separable dataset. Run SVM. Put your code, dataset and confusion matrix in a single Word file. What do you conclude?

THE END
