Logistic Regression - Day5
LOGISTIC REGRESSION
January 2020
Definition
• In many situations, the response variable is qualitative or, in other words, categorical. For
example, gender is qualitative, taking on values male or female.
• Predicting a qualitative response for an observation can be referred to as classifying that
observation, since it involves assigning the observation to a category, or class. On the other
hand, the methods that are often used for classification first predict the probability of each of
the categories of a qualitative variable, as the basis for making the classification.
• Linear regression is not well suited to predicting probabilities. If you use linear regression to model a binary response variable, the resulting model may not restrict the predicted Y values to the interval [0, 1]. This is where logistic regression comes into play: it yields a probability score that reflects the probability of the occurrence of the event.
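To see why logistic regression keeps predictions in bounds, here is a minimal sketch (with hypothetical linear scores) showing that the logistic (sigmoid) function maps any real-valued score into (0, 1):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A linear model can produce scores far outside [0, 1]...
linear_scores = np.array([-3.0, -0.5, 0.0, 2.0, 7.5])

# ...but passing them through the sigmoid yields valid probabilities.
probabilities = sigmoid(linear_scores)
print(probabilities)
```

Note that a score of 0 maps to a probability of exactly 0.5, the natural decision boundary.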
Logistic Regression: statistical model equation
• Logistic regression models the log of the odds of the event (the logit function), ln(P / (1 − P)), as a linear function of the predictors:
ln(P / (1 − P)) = β0 + β1X1 + … + βkXk
where P is the probability of the event.
• P always lies between 0 and 1.
Where,
• β: the log-odds change associated with a one-unit increase in a predictor
• e^β: the corresponding odds ratio
Logistic regression is fitted by Maximum Likelihood Estimation: the coefficients are chosen so as to maximize the probability of the observed Y given X (the likelihood).
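As a small illustration of maximum likelihood fitting, the sketch below estimates a one-predictor logistic model by gradient ascent on the log-likelihood. The data, learning rate, and iteration count are all hypothetical; a real analysis would use a library such as statsmodels or scikit-learn.

```python
import numpy as np

# Hypothetical toy data: x = a single predictor, y = binary outcome
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

b0, b1 = 0.0, 0.0            # intercept and slope on the log-odds scale
lr = 0.05                    # learning rate (chosen small enough to converge)
for _ in range(20000):
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))   # current predicted P(y=1)
    # Gradient of the log-likelihood with respect to each coefficient
    b0 += lr * np.sum(y - p)
    b1 += lr * np.sum((y - p) * x)

# exp(b1) is the odds ratio: the multiplicative change in the odds
# of the event per one-unit increase in x
print(b0, b1, np.exp(b1))
```

The update directions follow from differentiating the Bernoulli log-likelihood; because that likelihood is concave in the coefficients, gradient ascent with a small step size converges to the MLE.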
Assumptions of Logistic Regression
• The response variable is binary (for binary logistic regression).
• The observations are independent of each other.
• The log-odds of the response is a linear function of the predictors.
• There is little or no multicollinearity among the predictors.
Confusion Matrix
• A confusion matrix is formed from the four outcomes produced as a result of binary
classification.
Four outcomes of classification
• True positive (TP): correct positive prediction
• False positive (FP): incorrect positive prediction
• True negative (TN): correct negative prediction
• False negative (FN): incorrect negative prediction
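The four outcomes can be counted directly from paired true labels and predictions; the labels below are hypothetical example data:

```python
# Hypothetical true labels and model predictions for 8 observations
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Tally each of the four confusion-matrix outcomes
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correct positive
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # incorrect positive
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # correct negative
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # incorrect negative

print(tp, fp, tn, fn)  # 3 1 3 1
```

In practice a library routine (e.g. scikit-learn's `confusion_matrix`) produces the same counts as a 2×2 table.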
Evaluate Logistic Regression Model Fit
and Accuracy
Accuracy
• Accuracy (ACC) is calculated as the number of correct predictions divided by the total number of observations in the dataset. The best accuracy is 1.0, whereas the worst is 0.0. It can also be calculated as 1 – ERR.
Error rate
• Error rate (ERR) is calculated as the number of incorrect predictions divided by the total number of observations in the dataset. The best error rate is 0.0, whereas the worst is 1.0.
• Metrics such as sensitivity and specificity are often more informative than accuracy and error rate, especially when the classes are imbalanced.
Sensitivity (Recall or True positive rate)
• Sensitivity is calculated as the number of correct positive predictions (TP) divided by the total number
of positives (P).
• It is also called recall (REC) or true positive rate (TPR). The best sensitivity is 1.0, whereas the worst
is 0.0.
Specificity (True negative rate)
• Specificity is calculated as the number of correct negative predictions (TN) divided by the total
number of negatives (N).
• It is also called true negative rate (TNR). The best specificity is 1.0, whereas the worst is 0.0.
Precision (Positive predictive value)
• Precision is calculated as the number of correct positive predictions (TP) divided by the total number
of positive predictions (TP + FP).
• It is also called positive predictive value (PPV). The best precision is 1.0, whereas the worst is 0.0.
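Each of the metrics above is a simple ratio of the four confusion-matrix counts; the counts below are example values:

```python
# Example confusion-matrix counts (hypothetical values)
tp, fp, tn, fn = 3, 1, 3, 1
total = tp + fp + tn + fn

accuracy    = (tp + tn) / total   # correct predictions / all observations
error_rate  = (fp + fn) / total   # incorrect predictions / all observations (= 1 - ACC)
sensitivity = tp / (tp + fn)      # recall / true positive rate (TPR)
specificity = tn / (tn + fp)      # true negative rate (TNR)
precision   = tp / (tp + fp)      # positive predictive value (PPV)

print(accuracy, error_rate, sensitivity, specificity, precision)
```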
False positive rate
• False positive rate (FPR) is calculated as the number of incorrect positive predictions divided by the total number of negatives. The best false positive rate is 0.0, whereas the worst is 1.0. It can also be calculated as 1 – specificity.
F-score
• F-score is the harmonic mean of precision and recall: F = 2 × (precision × recall) / (precision + recall).
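False positive rate and F-score follow from the same confusion-matrix counts (example values again):

```python
# Example confusion-matrix counts (hypothetical values)
tp, fp, tn, fn = 3, 1, 3, 1

precision = tp / (tp + fp)
recall    = tp / (tp + fn)
fpr       = fp / (fp + tn)   # false positive rate = 1 - specificity
f_score   = 2 * precision * recall / (precision + recall)  # harmonic mean

print(fpr, f_score)
```

Being a harmonic mean, the F-score is pulled toward the smaller of precision and recall, so a model cannot score well by excelling at only one of them.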
Note:
In general, we are most concerned with one of the metrics defined above. For instance, a pharmaceutical company will be more concerned with minimizing false positive diagnoses, and hence with high specificity. On the other hand, an attrition model will be more concerned with sensitivity. Confusion matrices are generally used only with class-output models.
ROC CURVE
• The ROC curve is the plot of sensitivity against (1 − specificity) as the classification threshold varies.
• (1 − specificity) is also known as the false positive rate, and sensitivity is also known as the true positive rate.
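A sketch of how the ROC curve is traced: sweep the classification threshold over the model's probability scores (hypothetical values below) and record the (FPR, TPR) pair at each threshold.

```python
# Hypothetical predicted probabilities and true labels for 8 observations
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]

points = []
for thr in sorted(set(scores), reverse=True):
    # Classify as positive when the score meets the current threshold
    pred = [1 if s >= thr else 0 for s in scores]
    tp = sum(p and t for p, t in zip(pred, labels))
    fp = sum(p and not t for p, t in zip(pred, labels))
    fn = sum((not p) and t for p, t in zip(pred, labels))
    tn = sum((not p) and (not t) for p, t in zip(pred, labels))
    points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR)

print(points)
```

Lowering the threshold can only add positive predictions, so both FPR and TPR rise monotonically from (0, 0) toward (1, 1); plotting the recorded points gives the ROC curve.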