
Logistic Regression

 Predicts a binary outcome variable

🞑 E.g.,
 Whether or not a patient has a disease
 Whether or not a new applicant would succeed in the program

🞑 The outcome is neither continuous nor normally distributed


Introduction to Logistic Regression

[Figure: Patient Status After Five Years (Lost vs. Survived) plotted against Number of Positive Nodes]

Linear Regression for Classification?

[Figure: Patient Status After Five Years, coded Lost = 1.0 and Survived = 0.0, plotted against Number of Positive Nodes, with a fitted regression line and the resulting 0/1 predictions at a 0.5 threshold]

$y_\beta(x) = \beta_0 + \beta_1 x + \varepsilon$

 If model result > 0.5: predict lost
 If model result < 0.5: predict survived
A problem with linear regression
 When we have a binary response variable,
🞑 We code “disease” as 1 and “no disease” as 0. Can we just fit a line through those points as we would with linear regression?

🞑 Possible! But there are some problems.


A problem with linear regression
 The problem of fitting a regular regression line to a binary dependent variable
A problem with linear regression
 The line seems to oversimplify the relationship
 It gives predictions that cannot be observed values of Y (below 0 or above 1) for extreme values of X
 The approach is analogous to fitting a linear model to the probability of the event, but the observed outcomes are only 1 or 0
Linear model for a Binary outcome
In the OLS regression: y = Xb + e, where y ∈ {0, 1}
 e is not normally distributed because Y takes on only two values (0 and 1)
 The predicted probabilities can be greater than 1 or less than 0
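
To make this concrete, here is a minimal sketch (scikit-learn, with a small made-up dataset of my own) of OLS fit to a 0/1 outcome producing an impossible "probability":

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: number of positive nodes vs. patient status (1 = lost, 0 = survived)
X = np.array([[0], [1], [2], [3], [10], [20], [30]])
y = np.array([0, 0, 0, 1, 1, 1, 1])

ols = LinearRegression().fit(X, y)

# For large X the fitted line rises above 1, which cannot be a probability
print(ols.predict([[0], [40]]))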
Probabilistic Approach
 Learn P(Y|X) directly
 Cumulative probability distribution
 Using a sigmoid function
Probabilistic Approach
 P(Y|X) using the sigmoid function

🞑 $P(Y=1 \mid X) = \frac{1}{1+\exp(-\mathbf{Xb})}$

🞑 $P(Y=0 \mid X) = 1 - P(Y=1 \mid X) = \frac{\exp(-\mathbf{Xb})}{1+\exp(-\mathbf{Xb})} = \frac{1}{1+\exp(\mathbf{Xb})}$
What is this Function?

$y = \frac{1}{1+e^{-x}}$

[Figure: the sigmoid curve plotted for x from -10 to 10, rising from 0 to 1]
Understanding the sigmoid
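
A minimal NumPy sketch (not from the slides; the z values are arbitrary) of how the sigmoid maps any real score to (0, 1), with the two class probabilities summing to 1:

import numpy as np

def sigmoid(z):
    """Sigmoid function: maps any real z to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
p1 = sigmoid(z)    # P(Y = 1 | X) = 1 / (1 + exp(-Xb))
p0 = sigmoid(-z)   # P(Y = 0 | X) = 1 / (1 + exp(Xb))

print(p1)          # [~0.00005, 0.269, 0.5, 0.731, ~0.99995]
print(p1 + p0)     # all ones: the two probabilities sum to 1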
Logistic Regression Model

$P(Y=1 \mid X) = \frac{1}{1+\exp(-\mathbf{Xb})}$

$P(Y=0 \mid X) = \frac{1}{1+\exp(\mathbf{Xb})}$

$\log \frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = \mathbf{Xb}$
Logistic Regression

[Figure: the same patient-status data with a fitted sigmoid curve in place of the regression line]

$\hat{y}(x) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x + \varepsilon)}}$
Relationship of Logistic to Linear Regression

Logistic Function: $P(x) = \frac{1}{1+e^{-(\beta_0 + \beta_1 x)}} = \frac{e^{\beta_0 + \beta_1 x}}{1+e^{\beta_0 + \beta_1 x}}$

Odds Ratio: $\frac{P(x)}{1-P(x)} = e^{\beta_0 + \beta_1 x}$

Log Odds: $\log \frac{P(x)}{1-P(x)} = \beta_0 + \beta_1 x$
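
To make the three forms concrete, a small sketch (the β values and x are hypothetical choices of mine):

import numpy as np

beta0, beta1 = -2.0, 0.5   # hypothetical coefficients
x = 3.0

log_odds = beta0 + beta1 * x     # linear in x
odds = np.exp(log_odds)          # P(x) / (1 - P(x))
p = odds / (1.0 + odds)          # logistic function

# Round trip: recover the linear predictor from the probability
assert np.isclose(np.log(p / (1.0 - p)), log_odds)
print(p)  # 0.3775... = 1 / (1 + exp(0.5))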
Maximum Likelihood Estimation (MLE)
 Likelihood function: the probability of the observed data as a function of the unknown parameters
 MLE estimates the unknown parameters that maximize the likelihood of the data we observed under the probabilistic model
Likelihood Function
 Logistic regression predicts probabilities rather than classes: a stochastic approach
🞑 Fit the model using the likelihood
🞑 Maximum Likelihood Estimation
Likelihood Function

 Likelihood of the observed data: $L(\mathbf{b}) = \prod_i p_i^{y_i} (1-p_i)^{1-y_i}$, where $p_i = P(Y=1 \mid x_i)$

 The log-likelihood turns products into sums:

$\ell(\mathbf{b}) = \sum_i \left[\, y_i \log p_i + (1-y_i) \log(1-p_i) \,\right]$
Likelihood Function

 Derivative with respect to one component $b_j$:

$\frac{\partial \ell}{\partial b_j} = \sum_i (y_i - p_i)\, x_{ij}$
Negative Log-Likelihood Function

 Negative log-likelihood: $-\ell(\mathbf{b})$
🞑 Turns MLE into a minimization problem
🞑 Solve by gradient descent
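
Below is a minimal sketch of this minimization in NumPy (made-up data; the learning rate and iteration count are arbitrary choices of mine), using the negative log-likelihood gradient $\sum_i (p_i - y_i)\, x_{ij}$, i.e., the derivative above with the sign flipped:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: first column of ones is the intercept term
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0],
              [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

b = np.zeros(2)
learning_rate = 0.05
for _ in range(10000):
    p = sigmoid(X @ b)     # P(Y = 1 | X) under the current parameters
    grad = X.T @ (p - y)   # gradient of the negative log-likelihood
    b = b - learning_rate * grad

print(b)                        # estimated (intercept, slope)
print(sigmoid(X @ b).round(2))  # fitted probabilities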
Logistic Regression: The Syntax

Import the class containing the classification method
from sklearn.linear_model import LogisticRegression

Create an instance of the class
LR = LogisticRegression(penalty='l2', C=10.0)  # penalty and C are the regularization parameters

Fit the instance on the data and then predict the expected value
LR = LR.fit(X_train, y_train)
y_predict = LR.predict(X_test)

Tune regularization parameters with cross-validation: LogisticRegressionCV.
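
A possible follow-up sketch (synthetic data via make_classification; the Cs and cv values are my own choices):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

# Synthetic data standing in for X_train / y_train from the slide
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Search 10 candidate C values with 5-fold cross-validation
LR_cv = LogisticRegressionCV(Cs=10, cv=5, penalty='l2').fit(X_train, y_train)
print(LR_cv.C_)                      # selected inverse regularization strength
print(LR_cv.score(X_test, y_test))  # accuracy on held-out data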


Classification Error Metrics
Choosing the Right Error Measurement

• You are asked to build a classifier for leukemia


• Training data: 1% patients with leukemia, 99% healthy
• Measure accuracy: total % of predictions that are correct
• Build a simple model that always predicts "healthy"
• Accuracy will be 99%...
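
A tiny sketch of that baseline (the 1%/99% labels are made up):

import numpy as np

# Hypothetical labels: 1% leukemia (1), 99% healthy (0)
y_true = np.array([1] * 1 + [0] * 99)

# A "model" that always predicts healthy
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()
print(accuracy)  # 0.99, yet the classifier never detects leukemia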
Confusion Matrix

                   Predicted Positive                   Predicted Negative
Actual Positive    True Positive (TP)                   False Negative (FN): Type II Error
Actual Negative    False Positive (FP): Type I Error    True Negative (TN)
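
A short sketch with scikit-learn (hypothetical labels); note that confusion_matrix orders rows and columns as [negative, positive], so the layout is flipped relative to the table above:

from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

# Row 0 = actual negative [TN, FP]; row 1 = actual positive [FN, TP]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, fp, tn)  # 2 TP, 1 FN (Type II), 1 FP (Type I), 4 TN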
Error Measurements
Evaluation Metrics
ROC curve

 Receiver Operating Characteristic

🞑 Graphical approach for displaying the tradeoff between the true positive rate (TPR) and false positive rate (FPR) of a classifier
 TPR = positives correctly classified / total positives
 FPR = negatives incorrectly classified / total negatives

🞑 TPR on the y-axis and FPR on the x-axis

Receiver Operating Characteristic (ROC)

[Figure: ROC space with True Positive Rate (Sensitivity) on the y-axis and False Positive Rate (1 - Specificity) on the x-axis; the diagonal is a random guess, curves above it are better, the top-left corner is a perfect model, and curves below the diagonal are worse]

 Evaluation of the model at all possible thresholds
ROC curve
 Points of interest (TPR, FPR)
🞑 (0, 0): everything is classified as negative
🞑 (1, 1): everything is classified as positive
🞑 (1, 0): perfect (ideal) classification

 Diagonal line
🞑 Random guessing (50%)

 Area Under Curve (AUC)
🞑 Measures how good the model is on average
🞑 Good for comparison with other methods
Area Under Curve (AUC)

[Figure: three ROC curves with AUC = 0.9, AUC = 0.75, and AUC = 0.5 (the diagonal), plotted as True Positive Rate (Sensitivity) against False Positive Rate (1 - Specificity)]

 Measures the total area under the ROC curve
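
A minimal sketch (synthetic data and a logistic model, both my own choices) computing the ROC curve and its AUC:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # P(Y = 1), evaluated at all thresholds

fpr, tpr, thresholds = roc_curve(y_test, scores)
print(roc_auc_score(y_test, scores))  # area under the ROC curve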
Multiple Class Error Metrics

                  Predicted Class 1   Predicted Class 2   Predicted Class 3
Actual Class 1    TP1
Actual Class 2                        TP2
Actual Class 3                                            TP3

Accuracy = (TP1 + TP2 + TP3) / Total

Most multi-class error metrics are similar to their binary versions; just expand the elements as a sum.
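
A small sketch (hypothetical three-class labels) of the diagonal-sum accuracy above:

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical three-class labels
y_true = [1, 1, 2, 2, 2, 3, 3, 3, 3]
y_pred = [1, 2, 2, 2, 3, 3, 3, 3, 1]

cm = confusion_matrix(y_true, y_pred)
print(np.trace(cm) / cm.sum())         # (TP1 + TP2 + TP3) / Total
print(accuracy_score(y_true, y_pred))  # same value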
Classification Error Metrics: The Syntax

Import the desired error function
from sklearn.metrics import accuracy_score

Calculate the error on the test and predicted data sets
accuracy_value = accuracy_score(y_test, y_pred)

Lots of other error metrics and diagnostic tools:
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, roc_curve, precision_recall_curve
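
For instance (reusing hypothetical labels), the other scores follow the same pattern as accuracy_score:

from sklearn.metrics import precision_score, recall_score, f1_score

y_test = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

print(precision_score(y_test, y_pred))  # TP / (TP + FP) = 2/3
print(recall_score(y_test, y_pred))     # TP / (TP + FN) = 2/3
print(f1_score(y_test, y_pred))         # harmonic mean of precision and recall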
