
ASET

Module-IV

Logistic Regression
Prepared by: Dr. Ram Paul Hathwal
Dept of CSE, ASET, AUUP
Logistic Regression

 Logistic regression is one of the most popular Machine Learning algorithms, which
comes under the Supervised Learning technique.
 It is used for predicting a categorical dependent variable using a given set of
independent variables.
 In a classification problem, the target variable (or output), y, can take only discrete
values for a given set of features (or inputs), X.
 The objective of logistic regression is to find the best-fitting model to describe the
relationship between the dichotomous characteristic of interest and a set of independent
variables.
 It is used when a researcher is interested in predicting whether an event will
occur.
Graphical Representation:
[Figures: Logistic Regression; Linear vs Logistic Regression]
Logistic Regression: Mathematical Representation

 In linear regression, the output Y is in the same units as the target variable (the
thing you are trying to predict).
 However, in logistic regression the output Y is in log odds.

Odds is just another way of expressing the probability of an event, P(Event).
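
The relationship between probability, odds and log odds can be checked numerically. The snippet below is a minimal sketch; the helper names odds and log_odds are illustrative and not part of the original slides.

```python
import math

def odds(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural log of the odds, i.e. the logit (requires 0 < p < 1)."""
    return math.log(odds(p))

# Probabilities below 0.5 give odds below 1 and negative log odds;
# probabilities above 0.5 give odds above 1 and positive log odds.
for p in (0.1, 0.5, 0.8):
    print(f"p={p:.1f}  odds={odds(p):.3f}  log-odds={log_odds(p):+.3f}")
```

Note that odds range over (0, infinity) while log odds range over the whole real line, which is why log odds can be matched to an unbounded linear expression.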


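sigmoid(z) = 1 / (1 + e^(-z)) = 1 / (1 + e^(-θ^T x_i))

(θ is the vector of coefficients and x_i is the i-th input.)

The sigmoid function converts any real-valued input into the range 0 to 1.

e = Euler’s number ≈ 2.71828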
Types of Logistic Regression

 Binomial:
The target variable can have only 2 possible types, “0” or “1”, which may
represent “win” vs “loss”, “pass” vs “fail”, “dead” vs “alive”, etc.
 Multinomial:
The target variable can have 3 or more possible types which are not ordered
(i.e. the types have no quantitative significance), like “disease A” vs “disease
B” vs “disease C”.
 Ordinal:
The target variable has ordered categories. For example, a test
score can be categorized as “very poor”, “poor”, “good”, “very good”.
Here, each category can be given a score like 0, 1, 2, 3.
How does Logistic Regression Work?

 Consider a model with one predictor “x” and one Bernoulli response variable “ŷ”,
where p is the probability of ŷ=1. The linear equation can be written as:
p = b0+b1x --------> (1)
 Odds: The ratio of the probability of an event occurring to the probability of the event not occurring.
Odds = p/(1-p)
 The right-hand side of equation (1) is unbounded, while p must lie between 0 and 1, so we model the odds instead:
p/(1-p) = b0+b1x --------> (2)
 Odds can only be positive, while b0+b1x can be negative. To handle this, we predict the logarithm of the odds.
Log of odds (i.e. logit) = ln(p/(1-p))
 Replacing the odds in equation (2) with the log of odds gives:
ln(p/(1-p)) = b0+b1x --------> (3)
 To recover p from equation (3), we apply the exponential function to both sides.
exp(ln(p/(1-p))) = exp(b0+b1x)
e^(ln(p/(1-p))) = e^(b0+b1x)

 From the inverse rule of logarithms
p/(1-p) = e^(b0+b1x)
 Simple algebraic manipulations
p = (1-p) * e^(b0+b1x) = e^(b0+b1x) - p * e^(b0+b1x)
 Collecting the terms in p on the left-hand side
p + p * e^(b0+b1x) = e^(b0+b1x)
p * (1 + e^(b0+b1x)) = e^(b0+b1x)
p = e^(b0+b1x) / (1 + e^(b0+b1x))
 Dividing numerator and denominator by e^(b0+b1x) on the right-hand side
p = 1 / (1 + e^(-(b0+b1x)))
 Similarly, the equation for a logistic model with ‘n’ predictors is as below:
p = 1 / (1 + e^(-(b0+b1x1+b2x2+b3x3+...+bnxn)))
 This is the sigmoid function. It squeezes the output into the range between 0 and 1.
 The regression coefficients for logistic regression are estimated using maximum
likelihood estimation.
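
Maximum likelihood estimation can be illustrated with a small sketch for the one-predictor model. It assumes plain gradient ascent on the average log-likelihood, which is only one of several ways to maximize it. The toy data, learning rate and step count are made up for illustration and do not come from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood of labels ys given predictor values xs."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit(xs, ys, lr=0.1, steps=5000):
    """Estimate b0 and b1 by gradient ascent on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the log-likelihood: sum of (y - p) and of (y - p) * x.
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(errors)
        g1 = sum(e * x for e, x in zip(errors, xs))
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical toy data: the classes overlap, so the estimates stay finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit(xs, ys)
print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("log-likelihood at start:", round(log_likelihood(0.0, 0.0, xs, ys), 3))
print("log-likelihood after fit:", round(log_likelihood(b0, b1, xs, ys), 3))
```

The fitted coefficients give a higher log-likelihood than the starting point b0 = b1 = 0, which is the sense in which they are a better fit to the data.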
Sigmoid function

The sigmoid function maps any real-valued input, such as the linear predictor b0+b1x, to a
value between 0 and 1, which can be interpreted as a probability.
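
A minimal sketch of the sigmoid in code, together with its inverse, the logit. The branch on the sign of z is an implementation choice made here to avoid overflow in exp for large negative inputs; the slides do not prescribe it.

```python
import math

def sigmoid(z):
    """Sigmoid of z; branches on the sign of z so exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p):
    """Inverse of the sigmoid: the log odds of p (requires 0 < p < 1)."""
    return math.log(p / (1.0 - p))

# Applying logit to sigmoid(z) recovers z (up to rounding error).
for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    p = sigmoid(z)
    print(f"z={z:+.1f}  sigmoid(z)={p:.4f}  logit(sigmoid(z))={logit(p):+.4f}")
```

For very large positive or negative z the floating-point result rounds to exactly 1.0 or 0.0, so logit should only be applied to values strictly between 0 and 1.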

Logistic Regression assumptions

 Consider removing outliers from your training set, because extreme values can
distort the estimated coefficients.
 Does not favor sparse (consisting of a lot of zero values) data.
 Logistic regression is a classification model, unlike linear regression.
 The coefficients in logistic regression are estimated using a process called
maximum-likelihood estimation.
 In a binary logistic regression, the dependent variable must be binary.
 For a binary regression, factor level 1 of the dependent variable should
represent the desired outcome.
 The independent variables should be independent of each other. This means the
model should have little or no multicollinearity
 Remove highly correlated inputs.
 Logistic regression requires quite large sample sizes
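
One simple way to look for highly correlated inputs is to compute pairwise Pearson correlations and flag pairs above a cutoff. The sketch below assumes a cutoff of 0.9 and uses made-up feature columns; both are illustrative choices, not values from the slides.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def highly_correlated_pairs(features, threshold=0.9):
    """Return (name1, name2, r) for every pair with |r| >= threshold."""
    names = list(features)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(features[names[i]], features[names[j]])
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], r))
    return pairs

# Hypothetical inputs: height in cm and in inches carry the same information.
features = {
    "height_cm": [150, 160, 165, 172, 180, 185],
    "height_in": [59.1, 63.0, 65.0, 67.7, 70.9, 72.8],
    "age": [34, 22, 51, 29, 45, 38],
}

for name1, name2, r in highly_correlated_pairs(features):
    print(f"{name1} and {name2} are highly correlated (r = {r:.3f})")
```

When such a pair is found, one of the two inputs can be dropped before fitting the model.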
Making Predictions with Logistic Regression

 Making predictions with a logistic regression model is as simple as plugging
numbers into the logistic regression equation and calculating a result.
 Let’s say we have a model that can predict whether a person is male or female
based on their height. Given a height of 150cm, is the person male or female?
 We have learned the coefficients b0 = -100 and b1 = 0.6. Using the equation
above, we can calculate the probability of male given a height of 150cm, or more
formally P(male|height=150). We will use EXP() for e, because that is what you can
use if you type this example into your spreadsheet:
y = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))
y = EXP(-100 + 0.6*150) / (1 + EXP(-100 + 0.6*150))
y = 0.0000453978687
• Or a probability of near zero that the person is a male.
• In practice we can use the probabilities directly. Because this is classification and
we want a crisp answer, we can snap the probabilities to a binary class value.
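
The worked example can be reproduced in a few lines. This is a sketch using the coefficients from the slide (b0 = -100, b1 = 0.6). The 0.5 cutoff used to snap the probability to a class is an assumption, since the slide text does not state the cutoff, and the function names are illustrative.

```python
import math

def predict_proba(x, b0, b1):
    """P(class = 1 | x) from the logistic regression equation."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def predict_class(x, b0, b1, threshold=0.5):
    """Snap the probability to 0 or 1; the 0.5 threshold is an assumed default."""
    return 1 if predict_proba(x, b0, b1) >= threshold else 0

b0, b1 = -100.0, 0.6  # coefficients from the example (1 = male, 0 = female)

p_male = predict_proba(150, b0, b1)
print(f"P(male | height=150) = {p_male:.13f}")
print("predicted class for 150cm:", predict_class(150, b0, b1))
print("predicted class for 170cm:", predict_class(170, b0, b1))
```

With these coefficients the probability crosses 0.5 where b0 + b1*X = 0, so the class switches at a height of about 166.7cm.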
Uses of Logistic Regression

 Prediction of group membership.
 It also provides knowledge of the relationships among the variables and their strength.
 Causal relationship between one or more independent variables and one binary dependent
variable.
 Used to forecast the outcome event.
 Used to predict changes in probabilities.
 Using the logistic regression algorithm, banks can predict whether a customer would
default on loans or not
 To predict the weather conditions of a certain place (sunny, windy, rainy, humid, etc.)
 E-commerce companies can identify buyers who are likely to purchase a certain product
 Companies can predict whether they will gain or lose money in the next quarter, year, or
month based on their current performance
 To classify objects based on their features and attributes
Linear Regression vs. Logistic Regression

Linear Regression | Logistic Regression
Used to solve regression problems | Used to solve classification problems
The response variables are continuous in nature | The response variable is categorical in nature
It helps estimate the dependent variable when there is a change in the independent variable | It helps to calculate the possibility of a particular event taking place
It is a straight line | It is an S-curve (S = Sigmoid)



Thanks!
