
Introduction to Management

Science
Session 16
Monday, 8th November 2021
Recap
• Time series analysis
– Types of data (stationary, non-stationary, seasonal)
– Metrics for evaluating effectiveness of one
technique over another (MAD, MSE etc.)
– Techniques
• Simple moving average
• Weighted average
• Exponential smoothing
• Additive seasonality adjusted forecasting
Classification Models
• Supervised Learning
– Show what to recognize (using labeled “training” data)
– Test how good the model is on the basis of unseen
(validation/test) data
– For now we will talk about classification models with
categorical/class/group outcome
• Techniques we will briefly discuss
– KNN (k-nearest neighbor)
– Logistic Regression
k-NN (k-Nearest Neighbors)
• Classify or predict an outcome of a new
observation
– by observing the k most similar observations from
the training set
– Similarity defined in terms of Euclidean distance
– New observation belongs to a group if the
percentage of its k nearest neighbors in that group
is greater than or equal to a cutoff value
How does kNN classification work?
• “Learn” to differentiate between observations by looking at the “k” neighboring/nearby observations
• The classifier/model “learns” on the basis of labeled training data
• New observations get classified on the basis of the “most popular class” amongst the “k nearest neighbors”
• Simple, useful, but computationally expensive learning technique
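The steps above can be sketched as a minimal kNN classifier in plain Python. The tiny two-feature training set and the Red/Blue labels are made up purely for illustration:

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest
    labeled training observations (Euclidean distance)."""
    # Distance from the new observation to every labeled observation
    distances = sorted(
        (math.dist(features, new_point), label) for features, label in train
    )
    # "Most popular class" amongst the k nearest neighbors
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data: (Feature1, Feature2) -> class
train = [((1.0, 1.0), "Red"), ((1.5, 2.0), "Red"),
         ((5.0, 5.0), "Blue"), ((6.0, 5.5), "Blue")]

print(knn_classify(train, (1.2, 1.5), k=3))  # Red
```

Note where the expense comes from: every single classification computes a distance to every training observation, which is why kNN is called an expensive technique.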
How does kNN work?
[Figure: two known observations, one red and one blue, plotted with Feature 1 (Height, GPA, etc.) on the x-axis and Feature 2 (Weight, GMAT, etc.) on the y-axis]
• Two known observations belonging to two predefined classes along two dimensions/features
• Red/Blue predefined classes: Healthy/Unhealthy, DHL/Non-DHL
• Can we develop a model that “learns” to differentiate between them such that when a new observation is introduced, it can be appropriately “classified”?
How does kNN work?
[Figure: the same two observations with a line drawn between them; Feature 1 (Height, GPA, etc.) on the x-axis and Feature 2 (Weight, GMAT, etc.) on the y-axis]
• In essence we have drawn a decision boundary between these two observations
• Points appearing on either side of the line get “classified” accordingly (Red/Blue predefined classes: Healthy/Unhealthy, DHL/Non-DHL)

How does kNN work?
[Figure: three observations, red, green and blue, with boundaries drawn between each pair; Feature 1 (Height, GPA, etc.) on the x-axis and Feature 2 (Weight, GMAT, etc.) on the y-axis]
• Let’s say we introduce a third class, represented by green, and draw boundaries between each pair of observations
• Red/Green/Blue predefined classes: Healthy/Normal/Unhealthy, DHL/Average/Non-DHL
• These “regions” are called Voronoi cells

How does kNN work?
[Figure: a Voronoi tessellation over many observations, with the overall decision boundary shown and a new unlabeled point marked; Feature 1 (Height, GPA, etc.) on the x-axis and Feature 2 (Weight, GMAT, etc.) on the y-axis]
• For visualization purposes only at this point: this is a Voronoi tessellation
• These “regions” are called Voronoi cells; the boundary between the classes’ cells is the overall decision boundary
• How would you classify the new point with k = 1? k = 3?

Before we move on …
• Learns and then classifies a new observation on the
basis of the most popular class amongst k nearest
neighbors
• Some things to think about
– How will it learn and classify?
• Training
• Validation
• Testing
– What are the implications of k on the performance of the
technique?
– Euclidean is not the only distance measure
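On that last point, a quick made-up example showing that Euclidean and Manhattan distances can disagree about which neighbor is nearest:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

p, q1, q2 = (0, 0), (2, 2), (0, 3.5)
print(euclidean(p, q1), euclidean(p, q2))  # ~2.83 vs 3.5 -> q1 is nearer
print(manhattan(p, q1), manhattan(p, q2))  # 4.0  vs 3.5 -> q2 is nearer
```

So the choice of distance measure can change which observations count as the “k nearest neighbors”, and hence the classification itself.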
So how good is a classifier?
• Accuracy – Confusion Matrix
– Reports counts of correct and incorrect classifications
– False positive
– False negative
– True positive
– True negative
Sensitivity and Specificity
• Sensitivity: TP / (TP + FN)
• Specificity: TN / (TN + FP)
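These two formulas, applied to hypothetical confusion-matrix counts:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts from a confusion matrix
sens, spec = sensitivity_specificity(tp=40, fn=10, tn=45, fp=5)
print(sens, spec)  # 0.8 0.9
```

Sensitivity is the fraction of actual positives the classifier catches; specificity is the fraction of actual negatives it correctly leaves alone.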
Logistic Regression
• Classification technique built along the lines of linear regression
• Classifies a categorical outcome on the basis of a combination of features
• Input features can be numerical or categorical
This is not (Linear) Regression
• Classification problem
• Two classes (typically, 0 and 1)
• Regression
– Line of best fit
– Outcome variable should be continuous
– Operates under certain assumptions
Linear Regression on a Binary Outcome
[Figure: line of best fit Y = b0 + b1*X1 through binary (0/1) outcome data, with Ycutoff on the y-axis and the corresponding X1cutoff marked on the x-axis]
Classification rule:
If new X1 >= X1cutoff, assign Class ‘1’
Else, assign Class ‘0’
Linear Regression on a Binary Outcome
[Figure: the same plot with the line of best fit Y = b0 + b1*Feature1 shifted by additional observations, moving the cutoff to a new X1 value]
• Any other challenges?
– The assumptions of linear regression are violated
– Predicted outcome values are not restricted to the 0–1 range
Logistic Regression Classification
• Classification technique based on supervised learning
• Predict the probability of being in one class or
another
P(Class = 1) = b0 + b1*X1 + b2*X2 + … + bn*Xn
• The probabilities of being in class 0 and in class 1 should add up to 1; the linear form above does not guarantee that each lies between 0 and 1
– Truncate? Not ideal
• Find a mathematical form that can provide values
between 0 and 1
One Approach
• P = b0 + b1*X1 (can fall outside the 0–1 range)
• P = e^(b0 + b1*X1) >= 0 (always non-negative, but can still exceed 1)
• P = e^(b0 + b1*X1) / (e^(b0 + b1*X1) + 1) (now 0 <= P <= 1)
• Re-arrange:
• P(e^(b0 + b1*X1) + 1) = e^(b0 + b1*X1)
• P = e^(b0 + b1*X1) – P*e^(b0 + b1*X1)
• P = (1 – P) e^(b0 + b1*X1)
• P / (1 – P) = e^(b0 + b1*X1)
• LN(P / (1 – P)) = b0 + b1*X1
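The derivation can be checked numerically. With hypothetical coefficients b0 = -2 and b1 = 0.5 (chosen purely for illustration), the transformed P always lands in (0, 1), and taking LN(P/(1 – P)) recovers the linear form:

```python
import math

def logistic_p(b0, b1, x1):
    """P = e^(b0 + b1*X1) / (e^(b0 + b1*X1) + 1)."""
    z = b0 + b1 * x1
    return math.exp(z) / (math.exp(z) + 1)

b0, b1 = -2.0, 0.5   # hypothetical coefficients
for x1 in (-10, 0, 10):
    p = logistic_p(b0, b1, x1)
    assert 0 < p < 1  # bounded between 0 and 1, no truncation needed
    # LN(P / (1 - P)) gives back b0 + b1*X1
    assert abs(math.log(p / (1 - p)) - (b0 + b1 * x1)) < 1e-9
print("derivation checks out")
```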
Alternative way of thinking
• If p is the probability of the occurrence of an event
• p/q is the “odds” of the event, where q is the probability of the event not occurring, i.e. q = 1 – p
• So p/(1 – p) is the “odds” of the occurrence of the event
• Logistic regression computes the log odds of the event
– So the beta weights (or coefficients) represent the effect of a unit change in each feature on the log odds of the outcome
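A quick numeric check of that interpretation, with hypothetical coefficients b0 = -1 and b1 = 0.7: a one-unit increase in X1 adds b1 to the log odds, which multiplies the odds themselves by e^b1:

```python
import math

b0, b1 = -1.0, 0.7   # hypothetical coefficients

def odds(x1):
    # odds = P / (1 - P) = e^(b0 + b1*X1)
    return math.exp(b0 + b1 * x1)

# A one-unit increase in X1 multiplies the odds by e^b1
print(odds(3) / odds(2), math.exp(b1))  # both ~2.0138
```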
