You are on page 1of 64

Logistic Regression Algorithm

www.intellipaat.com
Agenda for
Today’s Session
01 03 05

Introduction to Why Logistic Summary


Logistic Regression?
Regression

What is Logistic
Linear Regression Spam Email
Regression?
Recap Classifier
02 04 06

www.intellipaat.com
Look back at Linear Regression

www.intellipaat.com
Hey Josh! Help me

Linear find a property with


bigger garden area
for “X” bucks.
Hey Lauren! Sure,
I will help you.

Regression
(Recap)

www.intellipaat.com
Hey. Can you do
Linear something about it?
Sure…

Regression
(Recap)

www.intellipaat.com
So,
Hey. Can you do Spending “X” bucks
Linear something about it? will get her a
property of area “Y”
Regression
(Recap)

www.intellipaat.com
How did you find Simple linear
Linear out? regression.

Regression
(Recap)

www.intellipaat.com
How did you find Let me show you
Linear out? how.

Regression
(Recap)

www.intellipaat.com
Dependent
Variable
(Property Size)
+ ve

Spending more
Linear money will get her
bigger property.

Regression
(Recap)

Independent
Variable
(Money)

www.intellipaat.com
Dependent
Variable
(House Area)

Bigger garden area


Linear means smaller
house.

Regression
(Recap)
- ve

Independent
Variable
(Garden Area)

www.intellipaat.com
Dependent
Variable
(Property Area)

Y
Linear
Regression
(Recap)

What he can say If Lauren spends “X” amount of money,


Independent
she can buy a property of area “Y”. Variable
Will the property have a good neighbourhood (Price)
What he can’t say or not?
Will the location be noiseless suburb or a
bustling city? www.intellipaat.com
Will the property have a good neighbourhood?

Will it rain tomorrow or not?

Introductio Is the mail spam or not?

n to Classification
Linear
Regression

Logistic Problems cannot


answer

Regression
Logistic
Regression
comes into
the picture

www.intellipaat.com
Linear
Introductio Regression
Regression

n to
Logistic Supervised
Logistic
Regression
Regression Machine
Learning
Classification
SVM

Unsupervised

www.intellipaat.com
 A statistical classification model
 Deals with categorical dependent variables
What is  Could be binary or dichotomous

Logistic  Could be multinomial



Regression Takes both continuous and discrete input data

www.intellipaat.com
 Tool for applied statistics and discrete data analysis
 Gives outcome in terms of probability
Why  In-turn helps in classifying the given data
Logistic
Regression
?

www.intellipaat.com
Let’s understand this with an example

www.intellipaat.com
Let us explore with an example of Spam email classifier .

Spam Email Spam

Classifier Email

Not
Spam

www.intellipaat.com
Steps:

Understanding the variable

Plot the labeled data


Spam Email
Classifier Draw regression curve

Find out the best fitted curve using


Maximum Likelihood
Estimator(MLE)

www.intellipaat.com
Spam Email
Let’s get started.
Classifier

www.intellipaat.com
Independent variable:
 Count of spam words

STEP:1 Most common spam words:


 Buy
Define the variable  Get paid
 Guarantee
 Winner
Plot labeled data
 Unlimited
Bag of spam words
Draw regression line

Find out the best using MLE

www.intellipaat.com
Dependent variable:
 Probability of mail being spam.
 Binary or dichotomous

STEP:1 1

Define the variable Probability of mail Spam


Being spam
Plot labeled data 0

Draw regression line Not Spam

Find out the best using MLE

Word
count
0 1 2 3 4 5 6 7 9 10

www.intellipaat.com
Probability
2 words 0

6 words 1

STEP:2 8 words 1

1 words 0
Define the variable

7 words 1
Plot labeled data
3 words 0
Draw regression line
9 words 1
Find out the best using MLE
8 words 0

Pre-labeled Dataset

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

Probability
1
1 words 0

5 words 1

3 words 1

2 words 0

7 words 1

4 words 0

9 words 1 0

8 words 0
www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

Probability Probability
1 words 0 1

5 words 1 Spam if
>= 0.5
3 words 1

2 words 0 0.5

7 words 1 Not spam if


< 0.5
4 words 0

9 words 1 0

8 words 0
www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

Probability Probability
1 words 0 1

5 words 1 Spam if
>= 0.5
3 words 1

2 words 0 0.5

7 words 1 Not spam if


< 0.5
4 words 0

9 words 1 0

8 words 0
www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

Probability
1

 Need to find a regression curve


which would be the best fit Spam if
> 0.5
 That would be our logistic
regression curve 0.5

Not spam if
But how do we find the < 0.5

best fit? 0

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

How? log(odds)
+∞

Probability
1

log(odds) 0

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds) How?
+∞

Probability of
mail being spam
1

Sigmoid
0
Function

0.03
0

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

How?

Probability of
mail being spam
1

Individual
log(likelihood)
likelihood

0.03
0

www.intellipaat.com
STEP:3
Define the variable
What does log(odds) mean?
Plot labeled data

Draw regression line

Find out the best using MLE

www.intellipaat.com
Probability vs odds
 This guy went fishing 5 times
a week

STEP:3 

Caught a fish 2 times
Failed to catch 3 times.

Define the variable What is the probability and odds


of getting a Fish for dinner?
Plot labeled data

Chances for 2
Draw regression line
Probability= =
Total chances 5
Find out the best using MLE

Chances for 2
Odds = =
Chances Against 3

www.intellipaat.com
Log(odds) and log(odds ratio)

STEP:3  Log(odds)=Logit Function  Note: Odds ≠ Odds Ratio

2
 Odds of catching on sunny day=
3
Define the variable
3
 Odds of catching on rainy day=
Plot labeled data 2
2
 Log(odds of catching on sunny day) =log
3
Draw regression line
3
 Log(odds of catching on rainy day)=log( )
Find out the best using MLE 2
2
𝑜𝑑𝑑𝑠 𝑜𝑛 𝑟𝑎𝑖𝑛𝑦 𝑑𝑎𝑦
 Log(odds ratio)= Log( )=log ( ) =log(0.44)
3
3
𝑜𝑑𝑑𝑠 𝑜𝑛 𝑠𝑢𝑛𝑛𝑦 𝑑𝑎𝑦
2

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds)
+∞

Probability
1

log(odds) 0

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

𝑃(𝑆𝑝𝑎𝑚)
log(odds) = log( )
1−𝑃(𝑆𝑝𝑎𝑚)

1
Probability log(odds) = log( )
1−1
1 1
log(odds) = log( )
0
+∞

log𝑏 0 = 𝑐
=>0=𝑏 𝑐 -------(1)
0
So, for equation(1) to be true.
If b<1 =>c  - ∞
If b>1 =>c  + ∞
www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds)
+∞

Probability
1 𝑃(𝑆𝑝𝑎𝑚)
log(odds) = log( )
1−𝑃(𝑆𝑝𝑎𝑚)

1
log(odds) = log( ) 0
0

log(odds)=log(1)-log(0)
log(odds)=0-(- ∞)+∞
0

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds)
+∞

Probability
1 𝑃(𝑁𝑜𝑡 𝑆𝑝𝑎𝑚)
log(odds) = log( )
1−𝑃(𝑁𝑜𝑡 𝑆𝑝𝑎𝑚)

0
log(odds) = log( ) 0
1−0
0
log(odds) = log( )
1

0 log(odds)=log(0)-log(1)-∞

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds) log(odds)
+∞ +∞

0 0

-∞ -∞

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds) How?
+∞

Probability of
mail being spam
1

Sigmoid
0
Function

0.03
0

-∞

www.intellipaat.com
STEP:3
Define the variable What does Sigmoid Function mean?
Plot labeled data

Draw regression line

Find out the best using MLE

www.intellipaat.com
Sigmoid Function:
Sigmoid function is the standard logistic function
𝐿(𝑒 𝑘 𝑥−𝑥0 )
Logistic function=
STEP:3 1+𝑒 𝑘(𝑥−𝑥0 )

Here,
Define the variable
L – Curve’s maximum value
k – Steepness of the curve
x0 – x value of Sigmoid midpoint
Plot labeled data
𝑒𝑥
Sigmoid function=
Draw regression line 1+𝑒 𝑥
Here,
Find out the best using MLE k=1
x0=0
L=1

www.intellipaat.com
Sigmoid Function:
𝑒𝑥
Sigmoid function=
1+𝑒 𝑥

STEP:3 

S’ shaped curve.
Sigmoid curve has a finite limit of:
Define the variable ‘0’ as x approaches −∞
‘1’ as x approaches +∞
Plot labeled data

Y
Draw regression line 1

Find out the best using MLE

0 X
www.intellipaat.com
Sigmoid Function:

STEP:3 X 1

Define the variable

Plot labeled data


0 X

Draw regression line

Find out the best using MLE  Takes any real-valued number and maps into a
value between 0 and 1.
 Helpful while solving classification problems

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds) How?
+∞

Probability of
mail being spam
1

Sigmoid
0
Function

0.03
0

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds)

+∞
Probability of
mail being spam

𝑒 log(𝑜𝑑𝑑𝑠) 1
P=
1+𝑒 log(𝑜𝑑𝑑𝑠)

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds)

+∞
Probability of
mail being spam

𝑒 log(−3.2) 1
P= = 0.03
1+𝑒 log(−3.2)

0.5
0

-3.2
0.03 .
0

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds)

+∞
Probability of
mail being spam

𝑒 log(5.6) 1
0.99
5.6
P= = 0.99
1+𝑒 log(5.6)

-3.2

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds)

+∞
Probability of
mail being spam

𝑒 log(−4.5) 1
5.6
P= = 0.01
1+𝑒 log(−4.5)

-3.2
0.01
-4.5 0

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:3 Define the variable Plot labeled data
Line using MLE

log(odds)

+∞
Probability of
mail being spam

5.6
𝑒 log(odds) 1
P=
1+𝑒 log(𝑜𝑑𝑑𝑠)

-3.2
0.01
-4.5 0

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

Probability of Likelihood of 1st mail being spam = 0.01


mail being spam
Likelihood of 2nd mail being spam = 0.01
1
0.99 Likelihood of 3 rd mail being spam = 0.03

Likelihood of 4 th mail being spam = 0.05


Now let us find out the
individual log likelihood of Likelihood of 5 th mail being spam = 0.97

each mail Likelihood of 6 th mail being spam = 0.99

0.03 Likelihood of 7th mail being spam = 0.99


0.01
0
Likelihood of 8 th mail being spam = 0.99

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

Probability
1
0.99

(1-0.01) x (1-0.01) x (1-0.03) x (1-


Likelihood of data given this curve =
0.5 0.05) x 0.97 x 0.99 x 0.99 x 0.99

0.03
0

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

Probability
1
0.99
log(1-0.01) + log(1-0.02)
+ log(1-0.05) + log(1-0.05)
log(Likelihood of data given this curve) = -0.084
0.5 + log(0.97) + log(0.99)
+ log(0.99) + log(0.99)

0.03
0

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

+∞

Probability
5.6 1
0.99

This means that log


0
0.5 likelihood of this line is
-0.084
-3.2

0.03
0

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

+∞

5.6

0 Now let us rotate this line to find out the best fitted
regression line.
-3.2

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

+∞

Probability
1
0.93
2.7

Again we will calculate


0 0.5 individual log likelihood of
-1.5
each mail
0.18

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

Probability
1
0.93
log(0.93) + log(0.97)
log(Likelihood of data given this curve) = + log(0.99) + log(0.99)
-0.207
0.5 + log(1-0.18) + log(1-0.2) +
log(1-0.03) + log(1-0.03)

0.18

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

+∞

Probability
1
0.93
2.7

This means that log


0 0.5 likelihood of this line is
-1.5 -0.207
0.18

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

+∞ +∞

Line “A” Line “B”

5.6

2.7

0 Line “A” has a better 0


likelihood value than -1.5
-3.2 line “B”

-∞ -∞

log(likelihood)= - 0.084 log(likelihood)= - 0.207


www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

+∞

0 Again we will rotate the line

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

+∞

0 We will keep on rotating the line

-∞

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

+∞

0 We will keep on rotating the line

-∞ Until we get the maximum


log likelihood

www.intellipaat.com
Draw Regression Find out the best
STEP:4 Define the variable Plot labeled data
Line using MLE

+∞

0 We will keep on rotating the line until we get the


maximum log likelihood

-∞ That would be the best


fitted regression line

www.intellipaat.com
Summary

Look back of Linear Introduction to Logistic What and Why logistic


Regression Regression regression?

www.intellipaat.com
Summary

A B C

Draw a regression line and Use MLE to find out the best
Plot the labeled data
keep rotating fitted regression line

Data 1 Data 1 Data 1

0 0 0

Sigmoid
Log(odds) MLE
Function

www.intellipaat.com
www.intellipaat.com

India : +91-7847955955

US : 1-800-216-8930 (TOLL FREE)

sales@intellipaat.com

www.intellipaat.com

You might also like