
Logistic Regression

The Classification Problem

1. Given a set of observations {x1i, x2i}
2. And a classification, or label, associated with each observation: Class-0 or Class-1
3. Create a model that learns the boundary between the two classes, such that any candidate observation {x1c, x2c} is classified into one of these classes with high reliability

Logistic Regression is one method to solve this problem

Logistic Unit
• If the combination of inputs
– {x1, x2, x3, …, xn}
• Results in a response that is a “categorical variable”
– With two possible states: 0 and 1
• Then we have a unit that is known as the
– Logistic Unit
• And we need a function that will
– Trigger 0 or 1 as an output, based on the inputs
– Such a function is known as an Activation Function

Logistic Unit and Logistic Regression

SIGMOID
• S(a) = 1 / (1 + e^(−a))

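As a minimal sketch (not code from the slides), the sigmoid activation can be implemented directly from the formula:

```python
import math

def sigmoid(a: float) -> float:
    """Sigmoid activation S(a) = 1 / (1 + e^(-a)); output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

# S(0) = 0.5; large positive inputs approach 1, large negative inputs approach 0
print(sigmoid(0.0))  # 0.5
```

The S-shape is what lets the unit act as a soft 0/1 switch on its input.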
Logistic Regression simplified

Logistic Regression: Notations

Matrix Notations: Logistic Regression

Matrix Operations

Logistic Regression: Calculating the weights wi
• Given
– N observations (data points: X)
– Corresponding targets (t) … see Note below
• Goal
– Calculate the weights wi
– Such that the likelihood of getting the desired targets, given the observations X, is maximized

• Note:
– In the training phase, the input data points xj and the outputs yj are both known. In this case, yj is known as the target and is denoted by tj
• The known x and t are used to find w
– While predicting, the input data point x and the weights w are known, and the corresponding output y is computed
• Recall that y is calculated using the SIGMOID function and ranges from 0 to 1. It can therefore be viewed as a probability p(y=1|x). If this probability is more than 0.5, the output is set to 1, else 0.

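The prediction rule in the note above can be sketched as follows; the weight and input values are illustrative assumptions, not values from the slides:

```python
import math

def predict(w, x, threshold=0.5):
    """Classify x: p = sigmoid(w . x) is treated as p(y=1|x);
    output 1 if p exceeds the threshold, else 0."""
    a = sum(wi * xi for wi, xi in zip(w, x))  # linear combination w . x
    p = 1.0 / (1.0 + math.exp(-a))            # sigmoid maps a to (0, 1)
    return 1 if p > threshold else 0

# Illustrative weights [bias, w1, w2] applied to input [1, x1, x2]
print(predict([-1.0, 2.0, 0.5], [1.0, 1.0, 1.0]))  # a = 1.5, p ≈ 0.82 → 1
```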
Theory: How to calculate the weights?

−log(L) is also known as the “error function”
Maximizing Log Likelihood
Maximizing the likelihood = minimizing the error function:

Goal:
– Minimize this error function
– That is, find the weights w that minimize the error function with respect to w:
Minimizing the error function (Logistic)
Calculating the weights wi
• Now that we have an expression for the gradient of the error function with respect to wi
• We can begin the process of calculating the weights themselves
• This is an iterative procedure known as the “Gradient Descent Method”, and it works as follows (also refer to the next slide):
1. Initialize w randomly
2. Compute the predicted y
3. Compute the gradient
4. Descend along the gradient to get new weights
5. Repeat until the termination criterion is reached
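A minimal sketch of the five steps above, assuming plain-Python lists, a fixed epoch count as the termination criterion, and an illustrative learning rate (none of these are specified in the slides):

```python
import math

def train(X, t, lr=0.1, epochs=1000):
    """Gradient descent for logistic regression.
    X: feature vectors (first component 1.0 for the bias), t: 0/1 targets."""
    w = [0.0] * len(X[0])                            # step 1: initialize w (zeros here)
    for _ in range(epochs):                          # step 5: repeat until termination
        grad = [0.0] * len(w)
        for x, tn in zip(X, t):
            a = sum(wi * xi for wi, xi in zip(w, x))
            y = 1.0 / (1.0 + math.exp(-a))           # step 2: predicted y via sigmoid
            for i, xi in enumerate(x):
                grad[i] += (y - tn) * xi             # step 3: gradient of the error
        w = [wi - lr * g for wi, g in zip(w, grad)]  # step 4: descend along the gradient
    return w

# Tiny illustrative 1-D problem: class 0 for small x, class 1 for large x
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]]
t = [0, 0, 1, 1]
w = train(X, t)
```

After training, the learned w places the decision boundary between the two groups of points.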
Basis of the Gradient Descent method
• The error function is given by:

E(w) = −Σn [ tn ln(yn) + (1 − tn) ln(1 − yn) ]

• The gradient of this error function is:

∇E(w) = Σn (yn − tn) xn

• In gradient descent, the weights are updated as:

w ← w − η ∇E(w), where η is the learning rate
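The gradient of the error function can be checked numerically; this sketch (assuming NumPy and made-up data, not the slides' example) compares Σn (yn − tn) xn against a finite-difference estimate of E(w):

```python
import numpy as np

def error(w, X, t):
    """Cross-entropy error E(w) = -sum[t ln y + (1-t) ln(1-y)], y = sigmoid(X @ w)."""
    y = 1.0 / (1.0 + np.exp(-X @ w))
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def gradient(w, X, t):
    """Analytic gradient: sum_n (y_n - t_n) x_n."""
    y = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (y - t)

# Made-up data: 4 points, 2 inputs (first column is the bias input)
X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, 3.0]])
t = np.array([0.0, 1.0, 0.0, 1.0])
w = np.array([0.1, -0.2])

# Central finite differences along each weight dimension
eps = 1e-6
num = np.array([(error(w + eps * np.eye(2)[i], X, t) -
                 error(w - eps * np.eye(2)[i], X, t)) / (2 * eps) for i in range(2)])
print(np.allclose(gradient(w, X, t), num))  # True: the formula matches
```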
Logistic Regression: Example
Logistic Regression: Quality Metrics

CONFUSION MATRIX

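A minimal sketch of building the four confusion-matrix counts from true and predicted labels (the label vectors here are made up for illustration):

```python
def confusion_matrix(t_true, t_pred):
    """Return (TP, FP, FN, TN) counts, treating class 1 as 'positive'."""
    tp = sum(1 for t, p in zip(t_true, t_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(t_true, t_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(t_true, t_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(t_true, t_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

# Made-up labels: 8 observations
t_true = [1, 0, 1, 1, 0, 0, 1, 0]
t_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(t_true, t_pred))  # (3, 1, 1, 3)
```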
The F1 Score

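The F1 score is the harmonic mean of precision and recall; a minimal sketch, assuming the TP/FP/FN counts come from a confusion matrix:

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of actual positives that are found
    return 2 * precision * recall / (precision + recall)

# With TP=3, FP=1, FN=1: precision = recall = 0.75, so F1 = 0.75
print(f1_score(3, 1, 1))  # 0.75
```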
ROC Curves

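An ROC curve plots TPR against FPR as the probability threshold sweeps from 0 to 1. A sketch with made-up predicted probabilities (the "predict 1 when score > threshold" convention is an assumption; conventions vary):

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, classify score > threshold as 1 and return (FPR, TPR) pairs."""
    pos = sum(labels)            # number of actual positives
    neg = len(labels) - pos      # number of actual negatives
    points = []
    for th in thresholds:
        preds = [1 if s > th else 0 for s in scores]
        tp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 1)
        fp = sum(1 for p, l in zip(preds, labels) if p == 1 and l == 0)
        points.append((fp / neg, tp / pos))
    return points

# Made-up predicted probabilities and true labels
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
print(roc_points(scores, labels, [0.0, 0.5, 1.0]))
```

The curve always runs from (1, 1) at threshold 0 (everything predicted positive) to (0, 0) at threshold 1.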
ROC – Example Dataset and Classification

Confusion Matrix v/s Threshold Values

Confusion matrices for probability threshold values ranging from 0.0 to 1.0.

The red-highlighted confusion matrix is the default one, where the probability threshold is 0.5.
Logistic Regression Metrics v/s Threshold

Threshold   Accuracy   Sensitivity (TPR)   Specificity (TNR)   FPR
0.0         0.5        0.0                 1.0                 0.0
0.1         0.9597     0.9472              0.9722              0.0278
0.2         0.9569     0.9472              0.9667              0.0333
0.3         0.9569     0.9528              0.9611              0.0389
0.4         0.9556     0.9528              0.9583              0.0417
0.5         0.9569     0.9583              0.9556              0.0444
0.6         0.9542     0.9611              0.9472              0.0528
0.7         0.9556     0.9639              0.9472              0.0528
0.8         0.9528     0.9639              0.9417              0.0583
0.9         0.9403     0.9667              0.9139              0.0861
1.0         0.5583     1.000               0.1167              0.8833
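Metrics of this kind can be computed from the confusion matrix at each threshold; a minimal, convention-dependent sketch with made-up scores (not the slides' dataset):

```python
def metrics_at_threshold(scores, labels, th):
    """Accuracy, TPR (sensitivity), TNR (specificity), and FPR at one threshold,
    predicting class 1 when score > th."""
    preds = [1 if s > th else 0 for s in scores]
    tp = sum(p == 1 and l == 1 for p, l in zip(preds, labels))
    tn = sum(p == 0 and l == 0 for p, l in zip(preds, labels))
    fp = sum(p == 1 and l == 0 for p, l in zip(preds, labels))
    fn = sum(p == 0 and l == 1 for p, l in zip(preds, labels))
    accuracy = (tp + tn) / len(labels)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return accuracy, tpr, tnr, 1 - tnr   # FPR = 1 - TNR

# Made-up example at the default threshold of 0.5
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
labels = [0, 0, 1, 1, 1, 0]
print(metrics_at_threshold(scores, labels, 0.5))
```

Sweeping th over 0.0–1.0 and tabulating the results produces a table of the form shown above.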
ROC Plot

Impact of Data on ROC Plots

Impact of Data Characteristics on ROC Plots

It can be seen that clear class separation results in a sharper ROC curve.
Handling more than 2 classes …
The following techniques are used to handle multiple classes:

The choice between One-vs-Rest and Multinomial Logistic Regression depends on factors like dataset size, computational resources, and the nature of the problem.

Softmax regression is a more direct and computationally efficient approach for handling multiple classes when compared to One-vs-Rest.
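A minimal sketch of the softmax function that underlies multinomial (softmax) regression; the 3-class scores are illustrative, not from the slides:

```python
import math

def softmax(scores):
    """Map K class scores to probabilities that are positive and sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative 3-class scores: the largest score gets the largest probability
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

With K = 2 classes, softmax reduces to the sigmoid of the score difference, which is why binary logistic regression is the special case.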
