
Introduction to Management

Science
Session 16
Monday, 8th November 2021
Recap
• Time series analysis
– Types of data (stationary, non-stationary, seasonal)
– Metrics for evaluating effectiveness of one
technique over another (MAD, MSE etc.)
– Techniques
• Simple moving average
• Weighted average
• Exponential smoothing
• Additive seasonality adjusted forecasting
Classification Models
• Supervised Learning
– Show what to recognize (using labeled “training” data)
– Test how good the model is on the basis of unseen
(validation/test) data
– For now we will talk about classification models with
categorical/class/group outcome
• Techniques we will briefly discuss
– KNN (k-nearest neighbor)
– Logistic Regression
k-NN (k-Nearest Neighbors)
• Classify or predict an outcome of a new
observation
– by observing the k most similar observations from
the training set
– Similarity defined in terms of Euclidean distance
– New observation belongs to a group if the
percentage of its k nearest neighbors in that group
is greater than or equal to a cutoff value
How does kNN classification work?
• “Learn” to differentiate between observations by looking at the “k” neighboring/nearby observations
• The classifier/model “learns” on the basis of labeled training data
• New observations get classified on the basis of the “most popular class” amongst the “k nearest neighbors”
• Simple, useful, but computationally expensive learning technique
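The steps above can be sketched as a minimal kNN classifier in plain Python. The tiny two-feature training set and the Red/Blue labels are made up purely for illustration:

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest
    labeled training observations (Euclidean distance)."""
    # Distance from the new observation to every labeled observation
    distances = sorted(
        (math.dist(features, new_point), label) for features, label in train
    )
    # "Most popular class" amongst the k nearest neighbors
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data: (Feature1, Feature2) -> class
train = [((1.0, 1.0), "Red"), ((1.5, 2.0), "Red"),
         ((5.0, 5.0), "Blue"), ((6.0, 5.5), "Blue")]

print(knn_classify(train, (1.2, 1.5), k=3))  # Red
```

Note where the expense comes from: every single classification computes a distance to every training observation, which is why kNN is called an expensive technique.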
How does kNN work?
[Figure: two known observations, one red and one blue, plotted with Feature 1 (Height, GPA, etc.) on the x-axis and Feature 2 (Weight, GMAT, etc.) on the y-axis]
• Two known observations belonging to two predefined classes along two dimensions/features
• Red/Blue predefined classes: Healthy/Unhealthy, DHL/Non-DHL
• Can we develop a model that “learns” to differentiate between them such that when a new observation is introduced, it can be appropriately “classified”?
How does kNN work?
[Figure: the same two observations with a line drawn between them; Feature 1 (Height, GPA, etc.) on the x-axis and Feature 2 (Weight, GMAT, etc.) on the y-axis]
• In essence we have drawn a decision boundary between these two observations
• Points appearing on either side of the line get “classified” accordingly (Red/Blue predefined classes: Healthy/Unhealthy, DHL/Non-DHL)

How does kNN work?
[Figure: three observations, red, green and blue, with boundaries drawn between each pair; Feature 1 (Height, GPA, etc.) on the x-axis and Feature 2 (Weight, GMAT, etc.) on the y-axis]
• Let’s say we introduce a third class, represented by green, and draw boundaries between each pair of observations
• Red/Green/Blue predefined classes: Healthy/Normal/Unhealthy, DHL/Average/Non-DHL
• These “regions” are called Voronoi cells

How does kNN work?
[Figure: a Voronoi tessellation over many observations, with the overall decision boundary shown and a new unlabeled point marked; Feature 1 (Height, GPA, etc.) on the x-axis and Feature 2 (Weight, GMAT, etc.) on the y-axis]
• For visualization purposes only at this point: this is a Voronoi tessellation
• These “regions” are called Voronoi cells; the boundary between the classes’ cells is the overall decision boundary
• How would you classify the new point with k = 1? k = 3?

Before we move on …
• Learns and then classifies a new observation on the
basis of the most popular class amongst k nearest
neighbors
• Some things to think about
– How will it learn and classify?
• Training
• Validation
• Testing
– What are the implications of k on the performance of the
technique?
– Euclidean is not the only distance measure
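On that last point, a quick made-up example showing that Euclidean and Manhattan distances can disagree about which neighbor is nearest:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

p, q1, q2 = (0, 0), (2, 2), (0, 3.5)
print(euclidean(p, q1), euclidean(p, q2))  # ~2.83 vs 3.5 -> q1 is nearer
print(manhattan(p, q1), manhattan(p, q2))  # 4.0  vs 3.5 -> q2 is nearer
```

So the choice of distance measure can change which observations count as the “k nearest neighbors”, and hence the classification itself.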
So how good is a classifier?
• Accuracy – Confusion Matrix
– Reports counts of correct and incorrect classifications
– False positive
– False negative
– True positive
– True negative
Sensitivity and Specificity
• Sensitivity: TP / (TP + FN)
• Specificity: TN / (TN + FP)
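These two formulas, applied to hypothetical confusion-matrix counts:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts from a confusion matrix
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=45, fp=5)
print(sens, spec)  # 0.8 0.9
```

Sensitivity is the fraction of actual positives the classifier catches; specificity is the fraction of actual negatives it correctly leaves alone.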
Logistic Regression
• Classification technique built along the lines of linear regression
• Classifies a categorical outcome on the basis of a combination of features
• Input features can be numerical or categorical
This is not (Linear) Regression
• Classification problem
• Two classes (typically, 0 and 1)
• Regression
– Line of best fit
– Outcome variable should be continuous
– Operates under certain assumptions
Linear Regression on a Binary Outcome
[Figure: line of best fit Y = b0 + b1*X1 through binary (0/1) outcome data, with Ycutoff on the y-axis and the corresponding X1cutoff marked on the x-axis]
Classification rule:
If new X1 >= X1cutoff, assign Class ‘1’
Else, assign Class ‘0’
Linear Regression on a Binary Outcome
[Figure: the same plot with the line of best fit Y = b0 + b1*Feature1 shifted by additional observations, moving the cutoff to a new X1 value]
• Any other challenges?
– The assumptions of linear regression are violated
– Predicted outcome values are not restricted to the 0–1 range
Logistic Regression Classification
• Classification technique based on supervised learning
• Predict the probability of being in one class or
another
P(Class = 1) = b0 + b1*X1 + b2*X2 + … + bn*Xn
• The probabilities of being in class 0 and in class 1 should add up to 1; the linear form above does not guarantee that each lies between 0 and 1
– Truncate? Not ideal
• Find a mathematical form that can provide values
between 0 and 1
One Approach
• P = b0 + b1*X1 (can fall outside the 0–1 range)
• P = e^(b0 + b1*X1) >= 0 (always non-negative, but can still exceed 1)
• P = e^(b0 + b1*X1) / (e^(b0 + b1*X1) + 1) (now 0 <= P <= 1)
• Re-arrange:
• P(e^(b0 + b1*X1) + 1) = e^(b0 + b1*X1)
• P = e^(b0 + b1*X1) – P*e^(b0 + b1*X1)
• P = (1 – P) e^(b0 + b1*X1)
• P / (1 – P) = e^(b0 + b1*X1)
• LN(P / (1 – P)) = b0 + b1*X1
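The derivation can be checked numerically. With hypothetical coefficients b0 = -2 and b1 = 0.5 (chosen purely for illustration), the transformed P always lands in (0, 1), and taking LN(P/(1 – P)) recovers the linear form:

```python
import math

def logistic_p(b0, b1, x1):
    """P = e^(b0 + b1*X1) / (e^(b0 + b1*X1) + 1)."""
    z = b0 + b1 * x1
    return math.exp(z) / (math.exp(z) + 1)

b0, b1 = -2.0, 0.5   # hypothetical coefficients
for x1 in (-10, 0, 10):
    p = logistic_p(b0, b1, x1)
    assert 0 < p < 1  # bounded between 0 and 1, no truncation needed
    # LN(P / (1 - P)) gives back b0 + b1*X1
    assert abs(math.log(p / (1 - p)) - (b0 + b1 * x1)) < 1e-9
print("derivation checks out")
```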
Alternative way of thinking
• If p is the probability of the occurrence of an event
• p/q is the “odds” of the event, where q is the probability of the event not occurring, i.e. q = 1 – p
• So p/(1 – p) is the “odds” of the occurrence of the event
• Logistic regression computes the log odds of the event
– So the beta weights (or coefficients) represent the effect of a unit change in each feature on the log odds of the outcome
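A quick numeric check of that interpretation, with hypothetical coefficients b0 = -1 and b1 = 0.7: a one-unit increase in X1 adds b1 to the log odds, which multiplies the odds themselves by e^b1:

```python
import math

b0, b1 = -1.0, 0.7   # hypothetical coefficients

def odds(x1):
    # odds = P / (1 - P) = e^(b0 + b1*X1)
    return math.exp(b0 + b1 * x1)

# A one-unit increase in X1 multiplies the odds by e^b1
print(odds(3) / odds(2), math.exp(b1))  # both ~2.0138
```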
