
Logistic Regression
PhD. Msc. David C. Baldears S.
TC3007C

https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc
Simple linear regression
Age and systolic blood pressure (SBP) among 33 adult women

Simple linear regression
● Relation between 2 continuous variables (SBP and age)

● Regression coefficient β1
○ Measures association between y and x
○ Amount by which y changes on average when x changes by one unit
○ Least squares method
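
For reference, the model behind these bullets (its equation is not reproduced on the slide) is:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where β1 is the slope estimated by least squares.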

Multiple linear regression
● Relation between a continuous variable and a set of continuous variables xi

● Partial regression coefficients βi
○ Amount by which y changes on average when xi changes by one unit and all the other xis remain constant
○ Measures association between xi and y adjusted for all other xi
● Example
○ SBP versus age, weight, height, etc.
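
A minimal statement of the model, assuming k predictors x1, …, xk (the slide's own equation is not shown):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$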

Multiple linear regression
Common names for the two sides of the model:

y (left-hand side)        x (right-hand side)
Predicted                 Predictor variables
Response variable         Explanatory variables
Outcome variable          Covariables
Dependent variable        Independent variables

Example — Data
Age and signs of coronary heart disease (CD)
What is the relationship between age and CD?

Linear Regression: Not a good fit
There is an increasing relationship between age and CD.
But we may want a simpler model, with CD predicted as 0 or 1.
We need a model for analyzing data with a binary response.

Logistic regression
● It is a regression-type machine learning algorithm deployed to solve classification (categorical) problems.
● It does this by predicting categorical outcomes, unlike linear regression, which predicts a continuous outcome.
● Problems with binary outcomes, such as Yes/No, 0/1, or True/False, are called classification problems.
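
As an illustrative sketch (not from the slides), here is scikit-learn's LogisticRegression on a made-up binary problem:

```python
# Hypothetical sketch: logistic regression on a made-up binary dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: hours studied (x) vs. passed exam (y = 1) or not (y = 0).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

print(model.predict([[2.5]]))        # predicted class, e.g. [0]
print(model.predict_proba([[2.5]]))  # probabilities for the two classes
```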

Why Apply Logistic Regression?
Linear regression does not give a good fitted line for problems whose outcome takes only two values.

Being linear in nature, it fails to cover such datasets and gives lower accuracy when predicting.

We need a model for analyzing data with a binary response.

Logistic regression
Prevalence (%) of signs of CD according to age group

Comparing the Linear Probability (LP) and Logistic Regression Models

Mathematics Involved in Logistic Regression
The sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

which can also be expressed in terms of the intercept and weights, z = β0 + β1x:

$$\pi(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

Can also be expressed as:

$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

Probability
The sigmoid output can be expressed as a probability:

$$\pi(x) = P(Y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

π is the probability that the event Y occurs.

1 − π is the probability that the event Y does not occur.

Probability
The odds of an event is the probability of it occurring divided by the probability of it not occurring. Thus, if π is the probability of the event:

$$\text{odds} = \frac{\pi}{1 - \pi}$$

The log-odds, or "logit", is linear in x:

$$\text{logit}(\pi) = \ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x$$

● β0 = log odds of disease in the unexposed (x = 0)
● β1 = log odds ratio associated with being exposed
● $e^{\beta_1}$ = odds ratio

Note: this is the natural log (aka "ln").


Why Log Odds?
In a study, 70.5% of men and 72.9% of women voted.

The odds of a man voting were 0.705/0.295 = 2.39, and the log odds were ln(2.39) = 0.8712.

The odds of a woman voting were 0.729/0.271 = 2.69, and the log odds were ln(2.69) = 0.9896.

Note that ln(1.0) = 0, so when the odds are 1.0 (0.5/0.5) the log odds are zero.
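
These numbers can be re-computed directly (a quick sketch; the percentages are taken from the slide):

```python
# Re-computing the voting example's odds and log odds.
import math

for group, p in [("men", 0.705), ("women", 0.729)]:
    odds = p / (1 - p)
    print(group, round(odds, 2), round(math.log(odds), 4))
# men 2.39 0.8712
# women 2.69 0.9896
```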

Why use Logarithms?
● They have 3 advantages:
1. Odds vary from 0 to ∞, whereas log odds vary from −∞ to +∞ and are centered at 0. Odds less than 1 have negative log odds and odds greater than 1 have positive log odds. This accords better with the real number line, which runs from −∞ to +∞.
2. Multiplying any two numbers together is equivalent to adding their logs. Thus logs make it possible to convert multiplicative models to additive models, a useful property for logistic regression, which is a non-linear multiplicative model when not expressed in logs.
3. A useful statistic for evaluating the fit of models is −2 log-likelihood (also known as the deviance). The model has to be expressed in logarithms for this to work.
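
A one-line numeric illustration of point 2 (the example values are mine):

```python
# Multiplying two numbers is equivalent to adding their logs.
import math

a, b = 2.0, 3.0
print(math.log(a * b), math.log(a) + math.log(b))  # both ~1.7918
```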

Logistic Regression Model
● The logistic distribution constrains the estimated probabilities to lie between 0 and 1.
● The estimated probability is:

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

○ if you let β0 + β1x = 0, then p = .50
○ as β0 + β1x gets really big, p approaches 1
○ as β0 + β1x gets really small, p approaches 0
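
A quick numeric check of these three bullets (a hand-rolled sketch, not the slide's code):

```python
# The sigmoid maps beta0 + beta1*x to a probability in (0, 1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5       (beta0 + beta1*x = 0)
print(sigmoid(10.0))   # ~0.99995  (large z -> p approaches 1)
print(sigmoid(-10.0))  # ~0.00005  (small z -> p approaches 0)
```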

Cost Function of Logistic Regression
The cost for a single prediction can be defined as:

$$\text{Cost}(\pi(x), y) = \begin{cases} -\log(\pi(x)) & \text{if } y = 1 \\ -\log(1 - \pi(x)) & \text{if } y = 0 \end{cases}$$

in a nutshell:

$$J(\beta) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\pi(x_i)) + (1 - y_i)\log(1 - \pi(x_i))\right]$$

where n is the number of records that we have.
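
A direct NumPy translation of this cost (a sketch; the names are mine):

```python
# Binary cross-entropy cost for logistic regression.
import numpy as np

def cost(y, pi):
    """y: true labels (0/1); pi: predicted probabilities pi(x)."""
    n = len(y)
    return -np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi)) / n

y  = np.array([1, 0, 1, 1])
pi = np.array([0.9, 0.1, 0.8, 0.6])
print(cost(y, pi))  # ~0.24
```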

Fitting equation to the data
● Linear regression: Least squares
● Logistic regression: Maximum likelihood
● Likelihood function
○ Estimates parameters β0 and β1
○ Practically easier to work with log-likelihood
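
For reference, the log-likelihood being maximised is the standard one implied by the cost above (the slide's own formula is not reproduced here):

$$\ell(\beta_0, \beta_1) = \sum_{i=1}^{n}\left[y_i \ln \pi(x_i) + (1 - y_i)\ln\left(1 - \pi(x_i)\right)\right]$$

Maximising ℓ is equivalent to minimising the cost J(β) = −ℓ/n from the previous slide.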

Explanation of Cases of Cost Function

When Y = 1: the cost is −log(π(x)), which penalizes the prediction as it moves farther from 1.

When Y = 0: the cost is −log(1 − π(x)), which penalizes the prediction as it moves farther from 0.

[Figure: two panels plotting cost against the model prediction π(x), one for each case.]

Note: our model's prediction won't exceed 1 and won't go below 0, so that part is outside of our worries.
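
A numeric illustration of the two cases (the probe values are mine):

```python
# When y = 1, the cost -log(pi) grows as pi moves away from 1;
# when y = 0, the cost -log(1 - pi) grows as pi moves away from 0.
import math

for pi in [0.9, 0.5, 0.1]:
    print(pi, round(-math.log(pi), 3), round(-math.log(1 - pi), 3))
# 0.9 -> 0.105 (good if y = 1), 2.303 (bad if y = 0)
# 0.5 -> 0.693, 0.693
# 0.1 -> 2.303, 0.105
```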
Derivative of Sigmoid Function
Starting from the sigmoid function, simplify the expression to reduce the number of exponential operations, then apply the reciprocal rule:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Applying the reciprocal rule to differentiate:

$$\frac{d\sigma}{dz} = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}}\cdot\frac{e^{-z}}{1 + e^{-z}} = \sigma(z)\left(1 - \sigma(z)\right)$$
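
A finite-difference check of this identity (a sketch):

```python
# Verify sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) numerically.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, h = 0.7, 1e-6
numeric  = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
analytic = sigmoid(z) * (1 - sigmoid(z))
print(abs(numeric - analytic) < 1e-9)  # True
```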

Gradient Descent and Cost Function Derivatives
Having the cost function

$$J(\beta) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\pi(x_i)) + (1 - y_i)\log(1 - \pi(x_i))\right]$$

for convenience, we need to find its partial derivatives ∂J/∂βj. The gradient descent formula:

$$\beta_j := \beta_j - \alpha\,\frac{\partial J}{\partial \beta_j}$$

with α as the learning rate.


Gradient Descent and Cost Function Derivatives
Split the derivative into two parts: Part 1 (call it A) for the yi log(π(xi)) term, and Part 2 (call it B) for the (1 − yi) log(1 − π(xi)) term. Using the sigmoid derivative above, adding A + B gives:

$$\frac{\partial J}{\partial \beta_j} = \frac{1}{n}\sum_{i=1}^{n}\left(\pi(x_i) - y_i\right)x_{ij}$$
Gradient Descent and Cost Function Derivatives
Hence, the update for β:

$$\beta_j := \beta_j - \frac{\alpha}{n}\sum_{i=1}^{n}\left(\pi(x_i) - y_i\right)x_{ij}$$
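
Putting the update rule to work, a minimal batch gradient-descent fit in Python (a sketch with made-up data; the function and variable names are mine):

```python
# Batch gradient descent for logistic regression:
# beta -= alpha/n * X^T (pi - y), repeated until convergence.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, alpha=0.1, iters=5000):
    X = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(X.shape[1])                # start coefficients at 0
    n = len(y)
    for _ in range(iters):
        pi = sigmoid(X @ beta)                 # current predicted probabilities
        beta -= alpha / n * X.T @ (pi - y)     # the update rule above
    return beta

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 1, 0, 1, 1])
print(fit(X, y))  # [beta0, beta1]
```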

Maximum likelihood
● Iterative computing
○ Choice of an arbitrary value for the coefficients (usually 0)
○ Computing of log-likelihood
○ Variation of coefficients’ values
○ Reiteration until maximisation (plateau)
● Results
○ Maximum Likelihood Estimates (MLE) for β0 and β1
○ Estimates of P(y) for a given value of x
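
For comparison, the same fit via a library MLE routine (a sketch using statsmodels, which iterates until the log-likelihood plateaus; the data are the same made-up values as above):

```python
# Maximum-likelihood fit via statsmodels.
import numpy as np
import statsmodels.api as sm

X = sm.add_constant(np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
y = np.array([0, 0, 1, 0, 1, 1])

result = sm.Logit(y, X).fit()  # prints convergence information
print(result.params)           # MLEs for beta0 and beta1
```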

Multiple logistic regression
● More than one independent variable
○ Dichotomous, ordinal, nominal, continuous …

● Interpretation of βi
○ Increase in log-odds for a one unit increase in xi with all the other xis constant
○ Measures association between xi and log-odds adjusted for all other xi
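
A sketch of reading adjusted odds ratios off a fitted multiple logistic regression (invented data; exponentiating each coefficient gives the odds ratio per one-unit increase in that xi):

```python
# exp(beta_i) is the odds ratio for a one-unit increase in x_i,
# holding the other predictors constant.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                        # two made-up predictors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100) > 0).astype(int)

model = LogisticRegression().fit(X, y)
print(np.exp(model.coef_))  # adjusted odds ratios for x1 and x2
```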

Types of Logistic Regression
● Binary Logistic Regression: the dependent/target variable has two
distinct values
○ For example 0 or 1, malignant or benign, passed or failed, admitted or not admitted.
● Multinomial Logistic Regression: the dependent/target variable has
three or more possible (unordered) values.
○ For example, the use of Chest X-ray images as features that give indication about one of
the three possible outcomes (No disease, Viral Pneumonia, COVID-19).
● Ordinal Logistic Regression: the target variable is of ordinal nature. In
this type, the categories are ordered in a meaningful manner and each
category has quantitative significance.
○ For example, the grades obtained on an exam have categories that have quantitative
significance and they are ordered. Keeping it simple, the grades can be A, B, or C.
○ “very poor”, “poor”, “good”, “very good”
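
A quick sketch of the multinomial case with invented data (scikit-learn's default lbfgs solver handles three classes with a multinomial formulation):

```python
# Multinomial logistic regression over three made-up classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))       # four made-up features
y = rng.integers(0, 3, size=150)    # three classes: 0, 1, 2

model = LogisticRegression(max_iter=500).fit(X, y)
print(model.predict_proba(X[:1]))   # probabilities over the 3 classes
```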
Examples
● Education sector:
○ Predicting whether a student gets admission into a university program or not, based on test
scores and various other factors.
○ On e-learning platforms, predicting whether a student will complete a course on time,
based on past activity and other statistics relevant to the problem.
● Business sector:
○ Predicting whether a credit card transaction made by a user is fraudulent or not.
● Medical sector:
○ Predicting whether a person has a disease or not, based on values obtained from test
reports or other factors in general.
○ A very innovative application of Machine Learning being used by researchers is to predict
whether a person has COVID-19 or not using Chest X-ray images.
● Other applications:
○ Email Classification – Spam or not spam
○ Sentiment Analysis – Person is sad or happy based on a text message
○ Object Detection and Classification – Classifying an image to be a cat image or a dog image
