
Logistic Regression

The Classification Problem

1. Given a set of observations {x1i, x2i}
2. And a classification, or label, associated with each observation: Class-0 or Class-1
3. Create a model that learns the boundary between the two classes, such that any candidate observation {x1c, x2c} is classified into one of these classes with high reliability

Logistic Regression is one method to solve this problem

Logistic Unit
• If the combination of inputs
– {x1, x2, x3, …, xn}
• Results in a response that is a “categorical variable”
– With two possible states: 0 and 1
• Then we have a unit that is known as the
– Logistic Unit
• And we need a function that will
– Trigger 0 or 1 as an output, based on the inputs
– Such a function is known as an Activation Function

Logistic Unit and Logistic Regression

SIGMOID
• S(a) = 1 / (1 + e^(−a))

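As a minimal sketch (not code from the slides), the sigmoid activation can be implemented directly from the formula:

```python
import math

def sigmoid(a: float) -> float:
    """Sigmoid activation S(a) = 1 / (1 + e^(-a)); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

# S(0) = 0.5; large positive inputs approach 1, large negative inputs approach 0
print(sigmoid(0.0))  # 0.5
```

The S-shape is what lets the unit act as a soft 0/1 switch on its input.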
Logistic Regression simplified

Logistic Regression: Notations

Matrix Notations: Logistic Regression

Matrix Operations

Logistic Regression: Calculating the weights wi
• Given
– N observations (data points: X)
– Corresponding targets (t) … see Note below
• Goal
– Calculate the weights wi
– Such that the likelihood of getting the desired targets, given the observations X, is maximized

• Note:
– In the training phase, the input data points xj and the outputs yj are both known. In this case, yj is known as the target and is denoted by tj
• The known x and t are used to find w
– While predicting, the input data point x and the weights w are known, and the corresponding output y is computed
• Recall that y is calculated using the SIGMOID function and ranges from 0 to 1. It can therefore be viewed as a probability p(y=1|x). If this probability is more than 0.5, the output is set to 1, else 0.

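The prediction rule in the note above can be sketched as follows; the weight and input values are illustrative assumptions, not values from the slides:

```python
import math

def predict(w, x, threshold=0.5):
    """Classify x: p = sigmoid(w . x) is treated as p(y=1|x);
    output 1 if p exceeds the threshold, else 0."""
    a = sum(wi * xi for wi, xi in zip(w, x))  # linear combination w . x
    p = 1.0 / (1.0 + math.exp(-a))            # sigmoid maps a to (0, 1)
    return 1 if p > threshold else 0

# Illustrative weights [bias, w1, w2] applied to input [1, x1, x2]
print(predict([-1.0, 2.0, 0.5], [1.0, 1.0, 1.0]))  # a = 1.5, p ≈ 0.82 → 1
```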
Theory: How to calculate the weights?

−log(L) is also known as the “error function”
Maximizing Log Likelihood
Maximizing the likelihood = minimizing the error function:

Goal:
– Minimize this error function
– That is, find the weights w that minimize the error function with respect to w:
Minimizing the error function (Logistic)
Calculating the weights wi
• Now that we have an expression for the gradient of the error function with respect to wi
• We can begin the process of calculating the weights themselves
• This is an iterative procedure known as the “Gradient Descent Method”, and it works as follows (also refer to the next slide):
1. Initialize w randomly
2. Compute the predicted y
3. Compute the gradient
4. Descend along the gradient to get new weights
5. Repeat until the termination criterion is reached
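A minimal sketch of the five steps above, assuming plain-Python lists, a fixed epoch count as the termination criterion, and an illustrative learning rate (none of these are specified in the slides):

```python
import math

def train(X, t, lr=0.1, epochs=1000):
    """Gradient descent for logistic regression.
    X: feature vectors (first component 1.0 for the bias), t: 0/1 targets."""
    w = [0.0] * len(X[0])                            # step 1: initialize w (zeros here)
    for _ in range(epochs):                          # step 5: repeat until termination
        grad = [0.0] * len(w)
        for x, tn in zip(X, t):
            a = sum(wi * xi for wi, xi in zip(w, x))
            y = 1.0 / (1.0 + math.exp(-a))           # step 2: predicted y via sigmoid
            for i, xi in enumerate(x):
                grad[i] += (y - tn) * xi             # step 3: gradient of the error
        w = [wi - lr * g for wi, g in zip(w, grad)]  # step 4: descend along the gradient
    return w

# Tiny illustrative 1-D problem: class 0 for small x, class 1 for large x
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]]
t = [0, 0, 1, 1]
w = train(X, t)
```

After training, the learned w places the decision boundary between the two groups of points.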
Basis of the Gradient Descent method
• The error function is given by:

E(w) = −Σn [ tn ln(yn) + (1 − tn) ln(1 − yn) ]

• The gradient of this error function is:

∇E(w) = Σn (yn − tn) xn

• In gradient descent, the weights are updated as:

w ← w − η ∇E(w), where η is the learning rate
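The gradient of the error function can be checked numerically; this sketch (assuming NumPy and made-up data, not the slides' example) compares Σn (yn − tn) xn against a finite-difference estimate of E(w):

```python
import numpy as np

def error(w, X, t):
    """Cross-entropy error E(w) = -sum[t ln y + (1-t) ln(1-y)], y = sigmoid(X @ w)."""
    y = 1.0 / (1.0 + np.exp(-X @ w))
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def gradient(w, X, t):
    """Analytic gradient: sum_n (y_n - t_n) x_n."""
    y = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (y - t)

# Made-up data: 4 points, 2 inputs (first column is the bias input)
X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, 3.0]])
t = np.array([0.0, 1.0, 0.0, 1.0])
w = np.array([0.1, -0.2])

# Central finite differences along each weight dimension
eps = 1e-6
num = np.array([(error(w + eps * np.eye(2)[i], X, t) -
                 error(w - eps * np.eye(2)[i], X, t)) / (2 * eps) for i in range(2)])
print(np.allclose(gradient(w, X, t), num))  # True: the formula matches
```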
Logistic Regression: Example
Logistic Regression: Quality Metrics

CONFUSION MATRIX

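A minimal sketch of building the four confusion-matrix counts from true and predicted labels (the label vectors here are made up for illustration):

```python
def confusion_matrix(t_true, t_pred):
    """Return (TP, FP, FN, TN) counts, treating class 1 as 'positive'."""
    tp = sum(1 for t, p in zip(t_true, t_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(t_true, t_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(t_true, t_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(t_true, t_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

# Made-up labels: 8 observations
t_true = [1, 0, 1, 1, 0, 0, 1, 0]
t_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(t_true, t_pred))  # (3, 1, 1, 3)
```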
The F1 Score

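The F1 score is the harmonic mean of precision and recall; a minimal sketch, assuming the TP/FP/FN counts come from a confusion matrix:

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of actual positives that are found
    return 2 * precision * recall / (precision + recall)

# With TP=3, FP=1, FN=1: precision = recall = 0.75, so F1 = 0.75
print(f1_score(3, 1, 1))  # 0.75
```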
ROC Curves

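An ROC curve plots TPR against FPR as the probability threshold sweeps from 0 to 1. A sketch with made-up predicted probabilities (the "predict 1 when score > threshold" convention is an assumption; conventions vary):

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, classify score > threshold as 1 and return (FPR, TPR) pairs."""
    pos = sum(labels)            # number of actual positives
    neg = len(labels) - pos      # number of actual negatives
    points = []
    for th in thresholds:
        preds = [1 if s > th else 0 for s in scores]
        tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
        fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
        points.append((fp / neg, tp / pos))
    return points

# Made-up predicted probabilities and true labels
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
print(roc_points(scores, labels, [0.0, 0.5, 1.0]))
```

The curve always runs from (1, 1) at threshold 0 (everything predicted positive) to (0, 0) at threshold 1.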
ROC – Example Dataset and Classification

Confusion Matrix v/s Threshold Values

Confusion matrices for probability threshold values ranging from 0.0 to 1.0.

The red-highlighted confusion matrix is the default one, where the probability threshold is 0.5.
Logistic Regression Metrics v/s Threshold

Threshold   Accuracy   Sensitivity (TPR)   Specificity (TNR)   FPR
0.0         0.5        0.0                 1.0                 0.0
0.1         0.9597     0.9472              0.9722              0.0278
0.2         0.9569     0.9472              0.9667              0.0333
0.3         0.9569     0.9528              0.9611              0.0389
0.4         0.9556     0.9528              0.9583              0.0417
0.5         0.9569     0.9583              0.9556              0.0444
0.6         0.9542     0.9611              0.9472              0.0528
0.7         0.9556     0.9639              0.9472              0.0528
0.8         0.9528     0.9639              0.9417              0.0583
0.9         0.9403     0.9667              0.9139              0.0861
1.0         0.5583     1.000               0.1167              0.8833
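Metrics of this kind can be computed from the confusion matrix at each threshold; a minimal, convention-dependent sketch with made-up scores (not the slides' dataset):

```python
def metrics_at_threshold(scores, labels, th):
    """Accuracy, TPR (sensitivity), TNR (specificity), and FPR at one threshold,
    predicting class 1 when score > th."""
    preds = [1 if s > th else 0 for s in scores]
    tp = sum(p == 1 and l == 1 for p, l in zip(preds, labels))
    tn = sum(p == 0 and l == 0 for p, l in zip(preds, labels))
    fp = sum(p == 1 and l == 0 for p, l in zip(preds, labels))
    fn = sum(p == 0 and l == 1 for p, l in zip(preds, labels))
    accuracy = (tp + tn) / len(labels)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return accuracy, tpr, tnr, 1 - tnr   # FPR = 1 - TNR

# Made-up example at the default threshold of 0.5
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
print(metrics_at_threshold(scores, labels, 0.5))
```

Sweeping th over 0.0–1.0 and tabulating the results produces a table of the form shown above.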
ROC Plot

Impact of Data on ROC Plots

Impact of Data Characteristics on ROC Plots

It can be seen that clear class separation results in a sharper ROC curve.
Handling more than 2 classes …
The following techniques are used to handle multiple classes:

The choice between One-vs-Rest and Multinomial Logistic Regression depends on factors like dataset size, computational resources, and the nature of the problem.

Softmax regression is a more direct and computationally efficient approach for handling multiple classes when compared to One-vs-Rest.
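A minimal sketch of the softmax function that underlies multinomial (softmax) regression; the 3-class scores are illustrative, not from the slides:

```python
import math

def softmax(scores):
    """Map K class scores to probabilities that are positive and sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative 3-class scores: the largest score gets the largest probability
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

With K = 2 classes, softmax reduces to the sigmoid of the score difference, which is why binary logistic regression is the special case.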
