Classification
Classification is a task of supervised learning.
Binary (2 classes)
Multi-Class (More than 2 classes)
Applications:
Social media sentiment analysis, which has two potential outcomes, positive or negative.
Image classification.
Types of classification algorithms
(discriminative and generative learning
algorithms)
Types of Classification Algorithms
Logistic Regression
Naïve Bayes
Support Vector Machines
Sigmoid Function
Unlike linear regression, which outputs continuous values, logistic regression transforms its output using the logistic sigmoid function to return a probability, which can then be mapped to two or more discrete classes.
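The sigmoid behind this mapping can be written as follows (a standard formulation added here for reference, not copied from the slides):

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma : \mathbb{R} \to (0, 1)
% For logistic regression the probability estimate is
% P(y = 1 \mid x) = \sigma(w^{\top} x + b),
% and a threshold (e.g. 0.5) maps the probability to a discrete class.
```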
Model Building
Python Example: Digits Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.data.shape)  # (1797, 64): 1797 images of 8x8 pixels, flattened
plt.matshow(digits.images[1796])  # show the last image in the dataset
plt.show()
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)
Scikit-learn 4-Step Modeling Pattern
from sklearn.linear_model import LogisticRegression  # 1. import the model class
logisticRegr = LogisticRegression()                  # 2. instantiate the model
logisticRegr.fit(x_train, y_train)                   # 3. fit on the training data
y_pred = logisticRegr.predict(x_test)                # 4. predict on the test data
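Putting the fragments above together, a minimal end-to-end sketch of the digits example (raising max_iter is an addition of mine so the solver converges on this dataset; it is not in the original code):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

digits = load_digits()

# Same split as on the slides: 25% held out for testing
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# max_iter raised from the default so the solver converges (an assumption)
logisticRegr = LogisticRegression(max_iter=1000)
logisticRegr.fit(x_train, y_train)
y_pred = logisticRegr.predict(x_test)

acc = accuracy_score(y_test, y_pred)
print(f"test accuracy: {acc:.3f}")
```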
Model Performance
Confusion Matrix
1. Accuracy
Accuracy is suitable for symmetric datasets, i.e. where the numbers of false positives and false negatives are almost the same.
2. Precision
3. Recall
4. F1 Measure
ASSIGNMENT 02
Date of submission:
Naïve Bayes classifier
Play-tennis example
Will you play or not if the outlook is rain, the temperature is hot, the humidity is high and there is light wind?
X = (rain, hot temperature, high humidity, light wind)
[The slide's worked Bayes computation did not survive extraction; only the intermediate values 3.26 and 0.62 remain.]
Implementation in scikit-learn (in a Jupyter notebook)
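A minimal sketch of the play-tennis example with scikit-learn's CategoricalNB. The 14-row table below is the classic textbook dataset, and mapping the slide's "light" wind to "weak" is my assumption; neither is read directly from the slides:

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.naive_bayes import CategoricalNB

# Classic 14-row play-tennis table: outlook, temperature, humidity, wind -> play
rows = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
X = [list(r[:4]) for r in rows]
y = [r[4] for r in rows]

enc = OrdinalEncoder()       # map the categorical strings to integer codes
X_enc = enc.fit_transform(X)

clf = CategoricalNB()        # Laplace smoothing (alpha=1) by default
clf.fit(X_enc, y)

# Query from the slide: rain, hot temperature, high humidity, light (weak) wind
query = enc.transform([["rain", "hot", "high", "weak"]])
print(clf.predict(query))
print(clf.predict_proba(query))
```

With Laplace smoothing the "no" class wins this query, consistent with the classic worked example.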
Support Vector Machines (SVM)
A hyperplane is:
a point for 1-feature data,
a line for 2-feature data,
a plane for 3-feature data,
and a hyperplane for data with more than 3 features.
Suppose we have to classify two types of objects (represented by circles and squares) on the basis of two features (X1 and X2).
An infinite number of lines can be drawn to separate them. The optimal hyperplane is the one with the maximum margin, i.e. the largest distance to the nearest data points of each class.
Consider the case when the data is not linearly separable. For example, low and high doses of a drug did not cure the disease (red dots), while moderate doses did.
The two-feature linearly non-separable data is shown in the figure below.
In this case the input space is transformed into a higher-dimensional space, and the data points are plotted against a new z-axis such that z = x² + y².
The decision boundary (a blue circle) in the original input space is shown below.
KERNEL
A kernel transforms a low-dimensional input space into a higher-dimensional space, i.e. it converts a non-separable problem into a separable one by adding more dimensions.
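A small sketch of this idea on synthetic concentric data (an illustration of mine, not the slides' figure): no line separates the inner cluster from the surrounding ring in two dimensions, but the lifted coordinate z = x² + y² separates them perfectly, which is what an RBF-kernel SVM exploits implicitly.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)

# Class 0: inner disc (radius <= 1); class 1: outer ring (radius 2..3)
r_in = rng.uniform(0.0, 1.0, 100)
r_out = rng.uniform(2.0, 3.0, 100)
X = np.vstack([
    np.c_[r_in * np.cos(theta), r_in * np.sin(theta)],
    np.c_[r_out * np.cos(theta), r_out * np.sin(theta)],
])
y = np.r_[np.zeros(100), np.ones(100)]

# A linear boundary cannot separate concentric classes...
print("linear kernel:", SVC(kernel="linear").fit(X, y).score(X, y))
# ...but the RBF kernel implicitly maps to a space where they are separable
print("rbf kernel   :", SVC(kernel="rbf").fit(X, y).score(X, y))

# The explicit lift from the previous slide: z = x^2 + y^2
z = (X ** 2).sum(axis=1)
print("inner z max:", z[:100].max(), "  outer z min:", z[100:].min())
```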
Example: Classifier Building in Scikit-learn
The banknote-authentication dataset has four features:
1. skewness
2. variance
3. entropy
4. kurtosis
The following script imports the required libraries and loads the dataset (the CSV filename is an assumption; use the path of your copy of the banknote-authentication data):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
bankdata = pd.read_csv("bill_authentication.csv")  # assumed filename
Shape of dataset:
bankdata.shape
Data Preprocessing
(1) Dividing the data into attributes and labels
X = bankdata.drop('Class', axis=1)  # attributes: every column except the label
y = bankdata['Class']               # labels
Training the Algorithm
Scikit-learn's svm library contains built-in classes for different SVM algorithms; we will use the support vector classifier (SVC) class.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
Making Predictions
y_pred = svclassifier.predict(X_test)
Evaluating the Algorithm
ASSIGNMENT NO. 1
THE END