
Logistic Regression

 Predicts a binary outcome variable

🞑 E.g.,
 Whether or not a patient has a disease
 Whether or not a new applicant would succeed in the program

🞑 The outcome is neither continuous nor normally distributed


Introduction to Logistic Regression

[Figure: Patient Status After Five Years (Lost vs. Survived) plotted against Number of Positive Nodes]

Linear Regression for Classification?

[Figure: Patient Status After Five Years, coded Lost = 1.0 and Survived = 0.0, plotted against Number of Positive Nodes, with a fitted regression line and the resulting 0/1 predictions at a 0.5 threshold]

$y_\beta(x) = \beta_0 + \beta_1 x + \varepsilon$

 If model result > 0.5: predict lost
 If model result < 0.5: predict survived
A problem with linear regression
 When we have a binary response variable,
🞑 We code “disease” as 1 and “no disease” as 0. Can we just fit a line through those points as we would with linear regression?

🞑 Possible! But there are some problems.


A problem with linear regression
 The problem of fitting a regular regression line to a binary dependent variable
A problem with linear regression
 The line seems to oversimplify the relationship
 It gives predictions that cannot be observed values of Y (below 0 or above 1) for extreme values of X
 The approach is analogous to fitting a linear model to the probability of the event, but the observed outcomes are only 1 or 0
Linear model for a Binary outcome
In the OLS regression: y = Xb + e, where y ∈ {0, 1}
 e is not normally distributed because Y takes on only two values (0 and 1)
 The predicted probabilities can be greater than 1 or less than 0
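
To make this concrete, here is a minimal sketch (scikit-learn, with a small made-up dataset of my own) of OLS fit to a 0/1 outcome producing an impossible "probability":

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: number of positive nodes vs. patient status (1 = lost, 0 = survived)
X = np.array([[0], [1], [2], [3], [10], [20], [30]])
y = np.array([0, 0, 0, 1, 1, 1, 1])

ols = LinearRegression().fit(X, y)

# For large X the fitted line rises above 1, which cannot be a probability
print(ols.predict([[0], [40]]))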
Probabilistic Approach
 Learn P(Y|X) directly
 Cumulative probability distribution
 Using a sigmoid function
Probabilistic Approach
 P(Y|X) using the sigmoid function

🞑 $P(Y=1 \mid X) = \frac{1}{1+\exp(-\mathbf{Xb})}$

🞑 $P(Y=0 \mid X) = 1 - P(Y=1 \mid X) = \frac{\exp(-\mathbf{Xb})}{1+\exp(-\mathbf{Xb})} = \frac{1}{1+\exp(\mathbf{Xb})}$
What is this Function?

$y = \frac{1}{1+e^{-x}}$

[Figure: the sigmoid curve plotted for x from -10 to 10, rising from 0 to 1]
Understanding the sigmoid
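
A minimal NumPy sketch (not from the slides; the z values are arbitrary) of how the sigmoid maps any real score to (0, 1), with the two class probabilities summing to 1:

import numpy as np

def sigmoid(z):
    """Sigmoid function: maps any real z to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
p1 = sigmoid(z)    # P(Y = 1 | X) = 1 / (1 + exp(-Xb))
p0 = sigmoid(-z)   # P(Y = 0 | X) = 1 / (1 + exp(Xb))

print(p1)          # [~0.00005, 0.269, 0.5, 0.731, ~0.99995]
print(p1 + p0)     # all ones: the two probabilities sum to 1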
Logistic Regression Model

$P(Y=1 \mid X) = \frac{1}{1+\exp(-\mathbf{Xb})}$

$P(Y=0 \mid X) = \frac{1}{1+\exp(\mathbf{Xb})}$

$\log \frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = \mathbf{Xb}$
Logistic Regression

[Figure: the same patient-status data with a fitted sigmoid curve in place of the regression line]

$\hat{y}(x) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x + \varepsilon)}}$
Relationship of Logistic to Linear Regression

Logistic Function: $P(x) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x)}} = \frac{e^{\beta_0 + \beta_1 x}}{1+e^{\beta_0 + \beta_1 x}}$

Odds Ratio: $\frac{P(x)}{1-P(x)} = e^{\beta_0 + \beta_1 x}$

Log Odds: $\log \frac{P(x)}{1-P(x)} = \beta_0 + \beta_1 x$
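
To make the three forms concrete, a small sketch (the β values and x are hypothetical choices of mine):

import numpy as np

beta0, beta1 = -2.0, 0.5   # hypothetical coefficients
x = 3.0

log_odds = beta0 + beta1 * x     # linear in x
odds = np.exp(log_odds)          # P(x) / (1 - P(x))
p = odds / (1.0 + odds)          # logistic function

# Round trip: recover the linear predictor from the probability
assert np.isclose(np.log(p / (1.0 - p)), log_odds)
print(p)  # 0.3775... = 1 / (1 + exp(0.5))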
Maximum Likelihood Estimation (MLE)
 Likelihood function: the probability of the observed data as a function of the unknown parameters
 MLE estimates the unknown parameters that maximize the likelihood of the data we observed under the probabilistic model
Likelihood Function
 Logistic regression predicts probabilities rather than classes: a stochastic approach
🞑 Fit the model using the likelihood
🞑 Maximum Likelihood Estimation
Likelihood Function

 Likelihood of the observed data: $L(\mathbf{b}) = \prod_i p_i^{y_i} (1-p_i)^{1-y_i}$, where $p_i = P(Y=1 \mid x_i)$

 The log-likelihood turns products into sums:

$\ell(\mathbf{b}) = \sum_i \left[\, y_i \log p_i + (1-y_i) \log(1-p_i) \,\right]$
Likelihood Function

 Derivative with respect to one component $b_j$:

$\frac{\partial \ell}{\partial b_j} = \sum_i (y_i - p_i)\, x_{ij}$
Negative Log-Likelihood Function

 Negative log-likelihood: $-\ell(\mathbf{b})$
🞑 Turns MLE into a minimization problem
🞑 Solve by gradient descent
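
Below is a minimal sketch of this minimization in NumPy (made-up data; the learning rate and iteration count are arbitrary choices of mine), using the negative log-likelihood gradient $\sum_i (p_i - y_i)\, x_{ij}$, i.e., the derivative above with the sign flipped:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: first column of ones is the intercept term
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0],
              [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

b = np.zeros(2)
learning_rate = 0.05
for _ in range(10000):
    p = sigmoid(X @ b)     # P(Y = 1 | X) under the current parameters
    grad = X.T @ (p - y)   # gradient of the negative log-likelihood
    b = b - learning_rate * grad

print(b)                        # estimated (intercept, slope)
print(sigmoid(X @ b).round(2))  # fitted probabilities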
Logistic Regression: The Syntax

Import the class containing the classification method
from sklearn.linear_model import LogisticRegression

Create an instance of the class
LR = LogisticRegression(penalty='l2', C=10.0)  # penalty and C are the regularization parameters

Fit the instance on the data and then predict the expected value
LR = LR.fit(X_train, y_train)
y_predict = LR.predict(X_test)

Tune regularization parameters with cross-validation: LogisticRegressionCV.
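
A possible follow-up sketch (synthetic data via make_classification; the Cs and cv values are my own choices):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

# Synthetic data standing in for X_train / y_train from the slide
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Search 10 candidate C values with 5-fold cross-validation
LR_cv = LogisticRegressionCV(Cs=10, cv=5, penalty='l2').fit(X_train, y_train)
print(LR_cv.C_)                      # selected inverse regularization strength
print(LR_cv.score(X_test, y_test))  # accuracy on held-out data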


Classification Error Metrics
Choosing the Right Error Measurement

• You are asked to build a classifier for leukemia


• Training data: 1% patients with leukemia, 99% healthy
• Measure accuracy: total % of predictions that are correct
• Build a simple model that always predicts "healthy"
• Accuracy will be 99%...
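
A tiny sketch of that baseline (the 1%/99% labels are made up):

import numpy as np

# Hypothetical labels: 1% leukemia (1), 99% healthy (0)
y_true = np.array([1] * 1 + [0] * 99)

# A "model" that always predicts healthy
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()
print(accuracy)  # 0.99, yet the classifier never detects leukemia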
Confusion Matrix

                   Predicted Positive                   Predicted Negative
Actual Positive    True Positive (TP)                   False Negative (FN): Type II Error
Actual Negative    False Positive (FP): Type I Error    True Negative (TN)
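
A short sketch with scikit-learn (hypothetical labels); note that confusion_matrix orders rows and columns as [negative, positive], so the layout is flipped relative to the table above:

from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

# Row 0 = actual negative [TN, FP]; row 1 = actual positive [FN, TP]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, fp, tn)  # 2 TP, 1 FN (Type II), 1 FP (Type I), 4 TN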
Error Measurements
Evaluation Metrics
ROC curve

 Receiver Operating Characteristic

🞑 Graphical approach for displaying the tradeoff between the true positive rate (TPR) and false positive rate (FPR) of a classifier
 TPR = positives correctly classified / total positives
 FPR = negatives incorrectly classified / total negatives

🞑 TPR on the y-axis and FPR on the x-axis

Receiver Operating Characteristic (ROC)

[Figure: ROC space with True Positive Rate (Sensitivity) on the y-axis and False Positive Rate (1 - Specificity) on the x-axis; the diagonal is a random guess, curves above it are better, the top-left corner is a perfect model, and curves below the diagonal are worse]

 Evaluation of the model at all possible thresholds
ROC curve
 Points of interest (TPR, FPR)
🞑 (0, 0): everything is classified as negative
🞑 (1, 1): everything is classified as positive
🞑 (1, 0): perfect (ideal) classification

 Diagonal line
🞑 Random guessing (50%)

 Area Under Curve (AUC)
🞑 Measures how good the model is on average
🞑 Good for comparison with other methods
Area Under Curve (AUC)

[Figure: three ROC curves with AUC = 0.9, AUC = 0.75, and AUC = 0.5 (the diagonal), plotted as True Positive Rate (Sensitivity) against False Positive Rate (1 - Specificity)]

 Measures the total area under the ROC curve
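
A minimal sketch (synthetic data and a logistic model, both my own choices) computing the ROC curve and its AUC:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # P(Y = 1), evaluated at all thresholds

fpr, tpr, thresholds = roc_curve(y_test, scores)
print(roc_auc_score(y_test, scores))  # area under the ROC curve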
Multiple Class Error Metrics

                  Predicted Class 1   Predicted Class 2   Predicted Class 3
Actual Class 1    TP1
Actual Class 2                        TP2
Actual Class 3                                            TP3

Accuracy = (TP1 + TP2 + TP3) / Total

Most multi-class error metrics are similar to their binary versions; just expand the elements as a sum.
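
A small sketch (hypothetical three-class labels) of the diagonal-sum accuracy above:

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical three-class labels
y_true = [1, 1, 2, 2, 2, 3, 3, 3, 3]
y_pred = [1, 2, 2, 2, 3, 3, 3, 3, 1]

cm = confusion_matrix(y_true, y_pred)
print(np.trace(cm) / cm.sum())         # (TP1 + TP2 + TP3) / Total
print(accuracy_score(y_true, y_pred))  # same value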
Classification Error Metrics: The Syntax

Import the desired error function
from sklearn.metrics import accuracy_score

Calculate the error on the test and predicted data sets
accuracy_value = accuracy_score(y_test, y_pred)

Lots of other error metrics and diagnostic tools:
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, roc_curve, precision_recall_curve
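
For instance (reusing hypothetical labels), the other scores follow the same pattern as accuracy_score:

from sklearn.metrics import precision_score, recall_score, f1_score

y_test = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

print(precision_score(y_test, y_pred))  # TP / (TP + FP) = 2/3
print(recall_score(y_test, y_pred))     # TP / (TP + FN) = 2/3
print(f1_score(y_test, y_pred))         # harmonic mean of precision and recall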
