
Classification using Binary Logistic Regression

Dr. Karuna Reddy


STEMP, The University of the South Pacific

Overview
• Understand what classification is
• Learn about binary logistic regression
• Confusion matrix
• Performance metrics

Classification
• The process of making categorical predictions using data
• You have some data that has already been placed in the correct categories
• The goal is to learn from this data in order to make good predictions for new data

Classification uses
• Make weather forecasts
• Classify emails as spam or not spam
• Predict whether a patient has a particular disease
• Predict which candidate a person will vote for in an election
• Predict the genre of a song/movie/TV show
• Predict whether two users are compatible

Classification
• Observation - a situation where you want to make a prediction
• Attributes - certain known aspects that describe the observation
• Each observation belongs to a class, which is unknown
• Classification predicts the classes using the attributes of the observations

Training data
• Consists of observations that have already been correctly classified
• Classifier - an algorithm that helps you classify future observations

Classification

Logistic Regression – Bernoulli Distribution
• Toss a fair coin: S = {Heads, Tails}
• Outcomes are mapped to 1 (positive class) or 0 (negative class)
• $P(Y = y) = p^{y} (1 - p)^{1 - y}, \quad y \in \{0, 1\}$
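
The Bernoulli probability mass function above can be checked numerically. The following is a minimal sketch; the function name `bernoulli_pmf` and the example values are illustrative assumptions, not part of the slides.

```python
def bernoulli_pmf(y: int, p: float) -> float:
    """P(Y = y) = p**y * (1 - p)**(1 - y) for y in {0, 1}."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p ** y * (1.0 - p) ** (1 - y)


# Fair coin: Heads -> 1 (positive class), Tails -> 0, so p = 0.5
fair_heads = bernoulli_pmf(1, 0.5)
fair_tails = bernoulli_pmf(0, 0.5)

# A biased coin with p = 0.3 (hypothetical value)
biased_heads = bernoulli_pmf(1, 0.3)
biased_tails = bernoulli_pmf(0, 0.3)
```

For any valid p, the two probabilities sum to 1, which is what makes this a proper distribution over the two classes.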

Logistic Regression Setting
• Predicting a dichotomous response variable Y
• Y follows a Bernoulli distribution
• $E(Y_i \mid X_1 = x_1, \ldots, X_k = x_k) = p_i$


Non-linear Regression
• Non-linear regression - dependent variable is binary or categorical
• Binary logistic - dependent variable is {0,1}
• Multinomial logistic - dependent variable is categorical, without any particular ordering
• Ordered logistic - dependent variable is categorical, with ordering important

Why not Linear Regression?
• How should we estimate p(x)?
• The dependent variable is categorical (e.g., binary) or a count, not continuous
• The “Generalised Linear Model (GLM)” extends the linear model to such response variables

Fitting with Linear Regression
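
To illustrate why a straight-line fit is awkward for a 0/1 response, here is a minimal sketch using ordinary least squares with a single attribute. The data values and the function name `fit_simple_linear` are hypothetical, chosen only for illustration.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical toy data: x is a single attribute, y is the 0/1 class label
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_simple_linear(xs, ys)

# Fitted values read as "probabilities" at a small and a large x
fitted_low = b0 + b1 * 0
fitted_high = b0 + b1 * 10
```

On this toy data the fitted line gives a value below 0 at x = 0 and above 1 at x = 10, so it cannot be read as a probability p(x).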

Logistic Regression
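
As a rough sketch of how a binary logistic regression could be fitted, the code below maximises the Bernoulli log-likelihood by plain gradient ascent for one attribute. This is an illustrative assumption, not the method used in the slides; statistical software typically uses a different optimiser. The data, learning rate, and function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, n_iter=5000):
    """Gradient ascent on the mean log-likelihood of p(x) = sigmoid(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed class minus fitted probability
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical training data (classes overlap, so the estimates stay finite)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
fitted_probs = [sigmoid(b0 + b1 * x) for x in xs]
```

Unlike a straight-line fit, every fitted value lies strictly between 0 and 1 and increases with x when the slope is positive.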

The Logistic Function
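
The formula on this slide did not survive extraction. The standard logistic function for a single attribute x is sketched below; the coefficient names $\beta_0$ and $\beta_1$ are assumed notation, not taken from the slides.

```latex
p(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
     = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}},
\qquad 0 < p(x) < 1
```

Whatever the value of x, p(x) stays strictly between 0 and 1, so it can be interpreted as a probability.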

Logistic Function & logit transformation
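
A short derivation of the link between the two, assuming the standard single-attribute form $p(x) = e^{\beta_0 + \beta_1 x} / (1 + e^{\beta_0 + \beta_1 x})$ with assumed coefficient names $\beta_0$ and $\beta_1$:

```latex
\begin{aligned}
1 - p(x) &= \frac{1}{1 + e^{\beta_0 + \beta_1 x}} \\
\frac{p(x)}{1 - p(x)} &= e^{\beta_0 + \beta_1 x} \\
\operatorname{logit} p(x) = \log\!\left(\frac{p(x)}{1 - p(x)}\right) &= \beta_0 + \beta_1 x
\end{aligned}
```

The log-odds are therefore linear in x, even though p(x) itself is not.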

The logit Transformation

• Predict the response categories:
• If p(x) > 0.5, predict the positive class
• Otherwise, predict the negative class
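
The decision rule above can be sketched as a small function. The name `predict_class` and the probability values are hypothetical; the 0.5 cut-off comes from the slide.

```python
def predict_class(p_x: float, threshold: float = 0.5) -> int:
    """Return 1 (positive class) if p(x) > threshold, otherwise 0 (negative class)."""
    return 1 if p_x > threshold else 0


# Hypothetical fitted probabilities p(x) for four observations
probabilities = [0.10, 0.50, 0.51, 0.93]
predictions = [predict_class(p) for p in probabilities]
```

Note that p(x) = 0.5 exactly falls in the negative class under this rule, because the comparison is strict.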

Confusion Matrix
• Evaluation of prediction accuracy
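
A confusion matrix simply counts how predicted classes line up with actual classes. The sketch below tallies the four cells for 0/1 labels; the label vectors are made-up illustrative data, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1   # predicted positive, actually positive
        elif a == 0 and p == 1:
            counts["FP"] += 1   # predicted positive, actually negative
        elif a == 0 and p == 0:
            counts["TN"] += 1   # predicted negative, actually negative
        elif a == 1 and p == 0:
            counts["FN"] += 1   # predicted negative, actually positive
        else:
            raise ValueError("labels must be 0 or 1")
    return counts


# Hypothetical labels for ten observations
actual    = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
```

Accuracy is the share of observations on the diagonal of the matrix (TP and TN) out of all observations.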

Confusion Matrix

Overall, 58 patients (77%) were correctly classified. We predicted that 8 patients would not develop heart disease when in fact they did develop heart disease (false negatives).

Performance Metrics
• F1 score - gives equal weight to false positives and false negatives
• Precision - function of false positives
• Recall - function of false negatives
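
These three metrics can be computed directly from confusion-matrix counts using their standard definitions. The sketch below uses hypothetical counts that are not taken from the heart disease example.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that are detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts
tp, fp, fn = 30, 10, 10

prec = precision(tp, fp)
rec = recall(tp, fn)
f1 = f1_score(tp, fp, fn)
```

Because FP and FN enter the F1 denominator symmetrically, swapping the two counts leaves the score unchanged, which is the sense in which they are weighted equally.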

Performance Metrics
• Sensitivity - function of false negatives (recall)
• Specificity - function of false positives
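
Sensitivity and specificity follow the same pattern, using the standard definitions. The counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate (same as recall): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Hypothetical counts
tp, fn, tn, fp = 30, 10, 40, 10

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

More false negatives lower sensitivity, while more false positives lower specificity.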

Performance Metrics

Questions
