
Classification

CLASSIFICATION TYPES

Binary Classification
Binary classification is the task of classifying the elements of a given set into two groups on the basis of a classification rule.

Multi-class Classification
Multiclass classification is the task of classifying the elements of a given set into more than two groups on the basis of a classification rule.
Classification
• Can you separate the red class from the blue class?
Linear Boundary

• Straight line for two dimensions
• Plane for three dimensions
• Hyperplane for higher dimensions

Confusion Matrix and Accuracy

Accuracy = percentage of correctly classified data points
Accuracy = (TP + TN) / (TP + FN + FP + TN)

Confusion Matrix
                   Predicted Positive   Predicted Negative
Actual Positive    a (TP)               b (FN)
Actual Negative    c (FP)               d (TN)

Sensitivity = a / (a + b) = TP / (TP + FN)
Specificity = d / (c + d) = TN / (TN + FP)

Other error metrics
• Precision
• Recall
• F score
• ROC curve
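
These metrics can be computed directly in Python. A minimal sketch using scikit-learn on made-up labels (the arrays and values below are illustrative, not from the slides):

from sklearn.metrics import confusion_matrix, accuracy_score

y_actual    = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # hypothetical true labels
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hypothetical model output

# Rows = actual, columns = predicted; labels=[1, 0] puts the positive class first,
# so the flattened matrix reads TP, FN, FP, TN (a, b, c, d above).
tp, fn, fp, tn = confusion_matrix(y_actual, y_predicted, labels=[1, 0]).ravel()

accuracy    = (tp + tn) / (tp + fn + fp + tn)   # same value as accuracy_score(...)
sensitivity = tp / (tp + fn)                    # a / (a + b)
specificity = tn / (tn + fp)                    # d / (c + d)

print(accuracy, accuracy_score(y_actual, y_predicted))
print(sensitivity, specificity)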
Algorithm

• Construct a frequency table for the target and select its most frequent value.
• For classification this is the baseline model.

Disease       Yes    No
Count         9      6
Proportion    0.6    0.4

ZeroR Method
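
The ZeroR baseline is available in scikit-learn as a "dummy" classifier. A minimal sketch with made-up data matching the table above (9 "Yes", 6 "No"):

from sklearn.dummy import DummyClassifier
import numpy as np

# Hypothetical target with 9 "Yes" and 6 "No" cases, as in the frequency table.
y = np.array(["Yes"] * 9 + ["No"] * 6)
X = np.zeros((len(y), 1))          # features are ignored by ZeroR

zero_r = DummyClassifier(strategy="most_frequent").fit(X, y)
print(zero_r.predict(X[:3]))       # always predicts 'Yes'
print(zero_r.score(X, y))          # 0.6 -- the baseline accuracy to beat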
Classification

• Target variable is discrete
• Target: 0 / 1

[Figure: blood pressure data plotted against a binary (0/1) target]
Can you fit a linear regression model?
Algorithm

• Model the log-odds as a linear function of the independent variables, then convert the log-odds to a probability using the sigmoid (logistic) function.

Logistic Function: p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))

Logistic Regression
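
A minimal sketch of logistic regression in scikit-learn; the synthetic dataset and parameter values are illustrative choices, not from the slides:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

# predict_proba applies the sigmoid to the linear log-odds b0 + b1*x1 + ... + bn*xn
print(model.predict_proba(X_test[:3]))   # class probabilities in [0, 1]
print(model.score(X_test, y_test))       # accuracy on the held-out split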
Algorithm

• Calculate the posterior probability, P(A|B), from P(A), P(B), and P(B|A).
• The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors.

Naive Bayes
Bayes' rule:
P(D | A) = P(A | D) * P(D) / P(A)

Example: predicting disease (D) from Alcohol, Smoking and Age:
P(D | Alcohol & Smoking & Age) ∝ P(Alcohol | D) * P(Smoking | D) * P(Age | D) * P(D)

Naive Bayes
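
A minimal sketch of a Naive Bayes classifier for categorical predictors like those in the example; the feature encoding and data below are made up for illustration:

import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Columns: Alcohol (0/1), Smoking (0/1), Age group (0 = young, 1 = middle, 2 = old)
X = np.array([[1, 1, 2], [0, 1, 1], [1, 0, 2], [0, 0, 0],
              [1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 2]])
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])   # 1 = disease, 0 = no disease

nb = CategoricalNB().fit(X, y)

# Posterior P(D | Alcohol, Smoking, Age) for a new person, built from the
# per-class likelihoods and the class prior, as in the proportionality above.
print(nb.predict_proba([[1, 0, 1]]))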
Naive Bayes

PROS
• Very easy and fast
• Can be used for multiclass prediction
• Performs well with categorical features
• If features are independent, NB gives superior predictions

CONS
• Features are not independent in most real-life examples
• Issues with a category that was not seen in the training data
• Assumes that numerical features follow a normal distribution

P(Y | X) ∝ P(X1 | Y) * P(X2 | Y) * ... * P(Xn | Y) * P(Y)
Algorithm

• SVM performs classification by finding a hyperplane that maximizes the separation margin between the two classes. The vectors that support the hyperplane are the support vectors.

• Plot all the data rows as points in N-dimensional space
• The N dimensions refer to the N features
• Each point's coordinates are its feature values
• Find a hyperplane that separates these points into the different classes in that N-dimensional space in the best way possible

Support Vector Machines (SVM)
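
A minimal sketch of a linear SVM in scikit-learn; the synthetic blobs and C=1.0 are illustrative defaults, not values from the slides:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes in 2-D feature space
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)   # the points that "support" the maximum-margin hyperplane
print(clf.score(X, y))        # training accuracy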
Which one would you select? Why?

• The black line
► Because it is classifying the points accurately with the highest margin

Decision Boundary
Which one would you select? Why?

• I would again select the black line
► Because it is classifying the points accurately with the highest margin

Decision Boundary
Maximum Margin Classifier

► Classifies with the maximum margin

Decision Boundary
What will you do in this case?

• Can we calculate some other feature from X and Y and then try to separate this?
• How about Z = X² + Y²?

Decision Boundary
What will you do in this case?

• Can we calculate some other feature from F1 and F2 and then try to separate this?
• How about Z = F1² + F2²?

Decision Boundary
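
A minimal sketch of this feature trick on synthetic data: points that are not linearly separable in (F1, F2) become separable once Z = F1² + F2² is added as a third coordinate. The dataset and scores are illustrative only; in practice an SVM kernel (e.g. kernel="rbf") performs this kind of lifting implicitly.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric classes: no straight line in 2-D separates them.
F, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

Z = (F[:, 0] ** 2 + F[:, 1] ** 2).reshape(-1, 1)   # Z = F1² + F2²
F3 = np.hstack([F, Z])                             # lift the data into 3-D

print(SVC(kernel="linear").fit(F, y).score(F, y))    # poor in the original 2-D space
print(SVC(kernel="linear").fit(F3, y).score(F3, y))  # near-perfect once Z is added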
Support Vector Machines

PROS
• Works very well for small and clean datasets
• Works well with clear separation margins
• Very effective in high-dimensional spaces
• Kernels give more flexibility

CONS
• Large data sets require a lot of training time and it eventually won't perform well
• Can't do a good job when noisy data is given (overlapping classes)
Algorithm

• Decision trees use Entropy and Information Gain to construct a tree: a top-down, greedy search through the space of possible branches with no backtracking.

Entropy for one attribute: E(S) = Σ −pᵢ log₂(pᵢ)

Decision Trees
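
A minimal sketch (not from the slides) of the entropy formula, applied to the earlier Disease example with 9 "Yes" and 6 "No" cases:

import numpy as np

def entropy(counts):
    """E(S) = sum over classes of -p_i * log2(p_i), from the class proportions."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                       # treat 0 * log2(0) as 0
    return float(-(p * np.log2(p)).sum())

print(entropy([9, 6]))    # ≈ 0.971 bits for the 0.6 / 0.4 split
print(entropy([15, 0]))   # 0.0 -- a pure node, so no further splitting is needed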
Steps in Decision Trees

Step 1: Calculate the entropy of the target variable
Step 2: Calculate the entropy for each branch (split by various features)
Step 3: Calculate the information gain for each of the above splits
Step 4: Choose the attribute with the largest information gain as the decision node
Step 5: Check if the entropy is zero, else continue further
Step 6: Run recursively on all branches, until all data is classified (branch entropy == 0)
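
These steps are what scikit-learn's decision tree performs internally. A minimal sketch on the Iris dataset (used only for illustration); criterion="entropy" selects the entropy / information-gain splitting rule described above:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# The printed rules show the recursive splits chosen by information gain.
print(export_text(tree))
print(tree.score(X, y))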
Decision Trees

PROS
• Implicitly perform feature selection
• Discover nonlinear relationships
• Not affected by outliers
• Easy to interpret and explain
• Generate rules which can be shared easily

CONS
• Decision trees do not work well if you have smooth boundaries
• Super attributes (features with many distinct values) will give higher information gain
• Missing values are ignored
Bias-Variance Tradeoff

BIAS: How well the model fits the data
VARIANCE: How much the model changes based on changes in the inputs

• Simpler models: stable (low variance) but they don't get close to the truth (high bias).
• Complex models: more prone to being overfit (high variance) but they are expressive enough to get close to the truth (low bias).
Decision Trees + Bagging

01  If the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree.

02  If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node.

03  The value of m is held constant while the forest is grown.

Random Forest
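
A minimal sketch of a random forest in scikit-learn; n_estimators, max_features="sqrt" (one common choice for the m << M rule) and the synthetic dataset are illustrative, not values from the slides:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,        # number of bootstrapped trees
    max_features="sqrt",     # m variables tried at each node, m << M
    oob_score=True,          # out-of-bag (OOB) error estimate
    random_state=0,
).fit(X, y)

print(forest.oob_score_)            # OOB accuracy estimate
print(forest.feature_importances_)  # implicit estimate of variable importance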
BAGGING

• It is also called bootstrap aggregating

• Bagging tries to combine the predictions of multiple similar learners trained on different datasets by averaging their predictions

• It reduces variance and helps to avoid overfitting

• Mostly used with decision trees
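
Bagging can also be done explicitly with any base learner. A minimal sketch bagging decision trees on synthetic data (all parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

bagged = BaggingClassifier(
    DecisionTreeClassifier(),   # the base learner being bagged
    n_estimators=50,            # 50 bootstrap samples -> 50 trees
    bootstrap=True,             # sample with replacement
    random_state=0,
).fit(X, y)

print(bagged.score(X, y))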
Features of Random Forests

01  Unexcelled in accuracy among current algorithms.
02  Runs efficiently on large data sets.
03  Can handle thousands of input variables without variable deletion.
04  Implicitly gives estimates of what variables are important in the classification.
05  Has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
06  Has methods for balancing error in data sets with unbalanced class populations.
07  Generated forests can be saved for future use on other data.
08  Offers an experimental method for detecting variable interactions.
09  Uses OOB (out-of-bag) samples for error calculation.
