
Classification

CLASSIFICATION TYPES

Binary Classification
Binary classification is the task of classifying the elements of a given set into two groups on the basis of a classification rule.

Multi-class Classification
Multiclass classification is the task of classifying the elements of a given set into more than two groups on the basis of a classification rule.
Classification
• Can you separate the red class from the blue class?
Linear Boundary

• Straight line for two dimensions
• Plane for three dimensions
• Hyperplane for higher dimensions

Confusion Matrix and Accuracy

Accuracy = percentage of correctly classified data points
Accuracy = (TP + TN) / (TP + FN + FP + TN)

Confusion Matrix
                   Predicted Positive   Predicted Negative
Actual Positive    a (TP)               b (FN)
Actual Negative    c (FP)               d (TN)

Sensitivity = a / (a + b) = TP / (TP + FN)
Specificity = d / (c + d) = TN / (TN + FP)

Other error metrics
• Precision
• Recall
• F score
• ROC curve
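
These metrics can be computed directly in Python. A minimal sketch using scikit-learn on made-up labels (the arrays and values below are illustrative, not from the slides):

from sklearn.metrics import confusion_matrix, accuracy_score

y_actual    = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # hypothetical true labels
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical model output

# Rows = actual, columns = predicted; labels=[1, 0] puts the positive class first,
# so the flattened matrix reads TP, FN, FP, TN (a, b, c, d above).
tp, fn, fp, tn = confusion_matrix(y_actual, y_predicted, labels=[1, 0]).ravel()

accuracy    = (tp + tn) / (tp + fn + fp + tn)   # same value as accuracy_score(...)
sensitivity = tp / (tp + fn)                    # a / (a + b)
specificity = tn / (tn + fp)                    # d / (c + d)

print(accuracy, accuracy_score(y_actual, y_predicted))
print(sensitivity, specificity)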
Algorithm

• Construct a frequency table for the target and select its most frequent value.
• For classification this is the baseline model.

Disease       Yes    No
Count         9      6
Proportion    0.6    0.4

ZeroR Method
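
The ZeroR baseline is available in scikit-learn as a "dummy" classifier. A minimal sketch with made-up data matching the table above (9 "Yes", 6 "No"):

from sklearn.dummy import DummyClassifier
import numpy as np

# Hypothetical target with 9 "Yes" and 6 "No" cases, as in the frequency table.
y = np.array(["Yes"] * 9 + ["No"] * 6)
X = np.zeros((len(y), 1))          # features are ignored by ZeroR

zero_r = DummyClassifier(strategy="most_frequent").fit(X, y)
print(zero_r.predict(X[:3]))       # always predicts 'Yes'
print(zero_r.score(X, y))          # 0.6 -- the baseline accuracy to beat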
Classification

• Target variable is discrete
• Target: 0 / 1

[Figure: blood pressure data plotted against a binary (0/1) target]
Can you fit a linear regression model?
Algorithm

• Model the log-odds as a linear function of the independent variables, then convert the log-odds to a probability using the sigmoid (logistic) function.

Logistic Function: p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))

Logistic Regression
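
A minimal sketch of logistic regression in scikit-learn; the synthetic dataset and parameter values are illustrative choices, not from the slides:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

# predict_proba applies the sigmoid to the linear log-odds b0 + b1*x1 + ... + bn*xn
print(model.predict_proba(X_test[:3]))   # class probabilities in [0, 1]
print(model.score(X_test, y_test))       # accuracy on the held-out split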
Algorithm

• Calculate the posterior probability, P(A|B), from P(A), P(B), and P(B|A).
• The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors.

Naive Bayes
Bayes' rule:
P(D | A) = P(A | D) * P(D) / P(A)

Example: predicting disease (D) from Alcohol, Smoking and Age:
P(D | Alcohol & Smoking & Age) ∝ P(Alcohol | D) * P(Smoking | D) * P(Age | D) * P(D)

Naive Bayes
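
A minimal sketch of a Naive Bayes classifier for categorical predictors like those in the example; the feature encoding and data below are made up for illustration:

import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Columns: Alcohol (0/1), Smoking (0/1), Age group (0 = young, 1 = middle, 2 = old)
X = np.array([[1, 1, 2], [0, 1, 1], [1, 0, 2], [0, 0, 0],
              [1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 2]])
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])   # 1 = disease, 0 = no disease

nb = CategoricalNB().fit(X, y)

# Posterior P(D | Alcohol, Smoking, Age) for a new person, built from the
# per-class likelihoods and the class prior, as in the proportionality above.
print(nb.predict_proba([[1, 0, 1]]))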
Naive Bayes

PROS
• Very easy and fast
• Can be used for multiclass prediction
• Performs well with categorical features
• If features are independent, NB gives superior predictions

CONS
• Features are not independent in most real-life examples
• Issues with a category that was not seen in the training data
• Assumes that numerical features follow a normal distribution

P(Y | X) ∝ P(X1 | Y) * P(X2 | Y) * ... * P(Xn | Y) * P(Y)
Algorithm

• SVM performs classification by finding a hyperplane that maximizes the separation margin between the two classes. The vectors that support the hyperplane are the support vectors.

• Plot all the data rows as points in N-dimensional space
• The N dimensions refer to the N features
• Each point's coordinates are its feature values
• Find a hyperplane that separates these points into the different classes in that N-dimensional space in the best way possible

Support Vector Machines (SVM)
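
A minimal sketch of a linear SVM in scikit-learn; the synthetic blobs and C=1.0 are illustrative defaults, not values from the slides:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes in 2-D feature space
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)   # the points that "support" the maximum-margin hyperplane
print(clf.score(X, y))        # training accuracy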
Which one would you select? Why?

• The black line
► Because it is classifying the points accurately with the highest margin

Decision Boundary
Which one would you select? Why?

• I would again select the black line
► Because it is classifying the points accurately with the highest margin

Decision Boundary
Maximum Margin Classifier

► Classifies with the maximum margin

Decision Boundary
What will you do in this case?

• Can we calculate some other feature from X and Y and then try to separate this?
• How about Z = X² + Y²?

Decision Boundary
What will you do in this case?

• Can we calculate some other feature from F1 and F2 and then try to separate this?
• How about Z = F1² + F2²?

Decision Boundary
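
A minimal sketch of this feature trick on synthetic data: points that are not linearly separable in (F1, F2) become separable once Z = F1² + F2² is added as a third coordinate. The dataset and scores are illustrative only; in practice an SVM kernel (e.g. kernel="rbf") performs this kind of lifting implicitly.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric classes: no straight line in 2-D separates them.
F, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

Z = (F[:, 0] ** 2 + F[:, 1] ** 2).reshape(-1, 1)   # Z = F1² + F2²
F3 = np.hstack([F, Z])                             # lift the data into 3-D

print(SVC(kernel="linear").fit(F, y).score(F, y))    # poor in the original 2-D space
print(SVC(kernel="linear").fit(F3, y).score(F3, y))  # near-perfect once Z is added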
Support Vector Machines

PROS
• Works very well for small and clean datasets
• Works well with clear separation margins
• Very effective in high-dimensional spaces
• Kernels give more flexibility

CONS
• Large data sets require a lot of training time and it eventually won't perform well
• Can't do a good job when noisy data is given (overlapping classes)
Algorithm

• Decision trees use Entropy and Information Gain to construct a tree: a top-down, greedy search through the space of possible branches with no backtracking.

Entropy for one attribute: E(S) = Σ −pᵢ log₂(pᵢ)

Decision Trees
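
A minimal sketch (not from the slides) of the entropy formula, applied to the earlier Disease example with 9 "Yes" and 6 "No" cases:

import numpy as np

def entropy(counts):
    """E(S) = sum over classes of -p_i * log2(p_i), from the class proportions."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                       # treat 0 * log2(0) as 0
    return float(-(p * np.log2(p)).sum())

print(entropy([9, 6]))    # ≈ 0.971 bits for the 0.6 / 0.4 split
print(entropy([15, 0]))   # 0.0 -- a pure node, so no further splitting is needed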
Steps in Decision Trees

Step 1: Calculate the entropy of the target variable
Step 2: Calculate the entropy for each branch (split by various features)
Step 3: Calculate the information gain for each of the above splits
Step 4: Choose the attribute with the largest information gain as the decision node
Step 5: Check if the entropy is zero, else continue further
Step 6: Run recursively on all branches, until all data is classified (branch entropy == 0)
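
These steps are what scikit-learn's decision tree performs internally. A minimal sketch on the Iris dataset (used only for illustration); criterion="entropy" selects the entropy / information-gain splitting rule described above:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# The printed rules show the recursive splits chosen by information gain.
print(export_text(tree))
print(tree.score(X, y))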
Decision Trees

PROS
• Implicitly perform feature selection
• Discover nonlinear relationships
• Not affected by outliers
• Easy to interpret and explain
• Generate rules which can be shared easily

CONS
• Decision trees do not work well if you have smooth boundaries
• Super attributes (features with many distinct values) will give higher information gain
• Missing values are ignored
Bias-Variance Tradeoff

BIAS: How well the model fits the data
VARIANCE: How much the model changes based on changes in the inputs

• Simpler models: stable (low variance) but they don't get close to the truth (high bias).
• Complex models: more prone to being overfit (high variance) but they are expressive enough to get close to the truth (low bias).
Decision Trees + Bagging

01  If the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree.

02  If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node.

03  The value of m is held constant while the forest is grown.

Random Forest
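
A minimal sketch of a random forest in scikit-learn; n_estimators, max_features="sqrt" (one common choice for the m << M rule) and the synthetic dataset are illustrative, not values from the slides:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,        # number of bootstrapped trees
    max_features="sqrt",     # m variables tried at each node, m << M
    oob_score=True,          # out-of-bag (OOB) error estimate
    random_state=0,
).fit(X, y)

print(forest.oob_score_)            # OOB accuracy estimate
print(forest.feature_importances_)  # implicit estimate of variable importance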
BAGGING

• It is also called bootstrap aggregating

• Bagging tries to combine the predictions of multiple similar learners trained on different datasets by averaging their predictions

• It reduces variance and helps to avoid overfitting

• Mostly used with decision trees
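
Bagging can also be done explicitly with any base learner. A minimal sketch bagging decision trees on synthetic data (all parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

bagged = BaggingClassifier(
    DecisionTreeClassifier(),   # the base learner being bagged
    n_estimators=50,            # 50 bootstrap samples -> 50 trees
    bootstrap=True,             # sample with replacement
    random_state=0,
).fit(X, y)

print(bagged.score(X, y))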
Features of Random Forests

01  Unexcelled in accuracy among current algorithms.
02  Runs efficiently on large data sets.
03  Can handle thousands of input variables without variable deletion.
04  Implicitly gives estimates of what variables are important in the classification.
05  Has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
06  Has methods for balancing error in data sets with unbalanced class populations.
07  Generated forests can be saved for future use on other data.
08  Offers an experimental method for detecting variable interactions.
09  Uses OOB (out-of-bag) samples for error calculation.
