Logistic Unit
• If the combination of inputs
– {x1, x2, x3, …, xn}
• Results in a response that is a “categorical variable”
– With two possible states: 0 and 1
• Then, we have a unit that is known as the
– Logistic Unit
• And we need a function that will
– Trigger 0 or 1 as an output, based on the inputs
– Such a function is known as an Activation Function
Logistic Unit and Logistic Regression
SIGMOID
• S(a) = 1 / (1 + e^(−a))
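The sigmoid formula above translates directly into Python (a minimal sketch; the function name `sigmoid` is illustrative):

```python
import math

def sigmoid(a):
    """Logistic (sigmoid) activation: maps any real a into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))
```

At a = 0 the sigmoid returns exactly 0.5; large positive inputs approach 1 and large negative inputs approach 0, which is what makes it usable as a 0/1 activation.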
Logistic Regression simplified
Logistic Regression: Notations
Matrix Notations: Logistic Regression
Matrix Operations
Logistic Regression: Calculating the weights wi
• Given
– N observations (data points: X)
– Corresponding targets (t) … see Note below
• Goal
– Calculate weights wi
– Such that the likelihood of getting the desired targets is maximized, given the observations X
• Note:
– In the training phase, both the input data points xj and the outputs yj are known. In this case, yj is known as the target and denoted by tj
• The known x and t are used to find w
– While predicting, the input data point x and the weights w are known, and the corresponding output y is computed
• Recall that y is calculated using the SIGMOID function and ranges from 0 to 1. It can therefore be viewed as a ‘probability’ p(y = 1 | x). If this probability is more than 0.5, the output is set to 1, else 0.
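The prediction rule described above can be sketched as follows (the names `predict`, `w`, and `b`, and the separate bias term, are illustrative choices):

```python
import math

def predict(x, w, b=0.0, threshold=0.5):
    """Predict 0 or 1 for input x given weights w and bias b.

    y = S(w . x + b) is read as the probability p(y = 1 | x);
    the output is 1 when that probability exceeds the threshold.
    """
    a = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = 1.0 / (1.0 + math.exp(-a))
    return 1 if y > threshold else 0
```

For example, with all-zero weights the probability is exactly 0.5, which does not exceed the default threshold, so the output is 0.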
Theory: How to calculate the weights?
Goal:
– Minimize this error function with respect to the weights w. That is:
Minimizing the error function (Logistic)
Calculating the weights wi
• Now that we have an expression for minimizing
the error function with respect to wi
• We can begin the process of calculating the
weights themselves
• This is an iterative procedure known as the
“Gradient Descent Method”, and it works as
follows (also refer to the next slide):
1. Initialize w randomly
2. Find the predicted y
3. Find the gradient of the error with respect to w
4. Descend along the gradient to get new weights
5. Repeat until a termination criterion is reached
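The five steps can be sketched in plain Python. This is a minimal illustration, not the lecture's reference implementation: the name `train_logistic`, the zero initialization (the slides say random), and the fixed epoch count used as the termination criterion are all assumed choices; the gradient used is that of the cross-entropy error, sum over n of (yn − tn)·xn.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_logistic(X, t, lr=0.1, epochs=1000):
    """Batch gradient descent for logistic regression.

    The bias is folded in as w[0] by prepending a constant
    input x0 = 1 to every data point.
    """
    X = [[1.0] + list(x) for x in X]
    w = [0.0] * len(X[0])                                    # 1. initialize w
    for _ in range(epochs):
        # 2. predicted y for every data point
        y = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in X]
        # 3. gradient of the cross-entropy error: sum_n (y_n - t_n) * x_n
        grad = [sum((yn - tn) * x[j] for yn, tn, x in zip(y, t, X))
                for j in range(len(w))]
        # 4. descend along the gradient
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w                                                 # 5. fixed-epoch stop
```

On a tiny separable 1-D dataset such as x = 0, 1, 2, 3 with targets 0, 0, 1, 1, the learned weights place the decision boundary between x = 1 and x = 2.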
Basis of the Gradient Descent method
• The error function (the cross-entropy, i.e. the negative log-likelihood) is given by:
– E(w) = −Σn [ tn ln(yn) + (1 − tn) ln(1 − yn) ]
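Assuming the error function meant here is the usual cross-entropy for logistic regression (the negative log-likelihood of the targets), it can be computed as (function name illustrative):

```python
import math

def cross_entropy_error(y, t):
    """E(w) = -sum_n [ t_n ln(y_n) + (1 - t_n) ln(1 - y_n) ].

    y: predicted probabilities in (0, 1); t: targets in {0, 1}.
    """
    return -sum(tn * math.log(yn) + (1 - tn) * math.log(1 - yn)
                for yn, tn in zip(y, t))
```

The error is small when each yn is close to its target tn and grows without bound as a prediction approaches the wrong extreme, which is what gradient descent exploits.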
Logistic Regression: Example
Logistic Regression: Quality Metrics
CONFUSION MATRIX
The F1 Score
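The confusion-matrix entries and the F1 score can be computed as follows (a minimal sketch for binary 0/1 labels; the function names are illustrative, and the F1 computation assumes at least one positive prediction and one positive target):

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with targets [1, 1, 1, 0, 0, 0] and predictions [1, 1, 0, 1, 0, 0], precision and recall are both 2/3, so F1 is also 2/3.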
ROC Curves
ROC – Example Dataset and Classification
Confusion Matrix v/s Threshold Values
Confusion matrices for probability threshold values ranging from 0.0 to 1.0
Logistic Regression Metrics v/s Threshold
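The threshold sweep behind an ROC plot can be sketched as follows (an illustrative helper, not the lecture's code: each point is the false-positive rate and true-positive rate obtained by classifying score > threshold as 1):

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, classify score > thr as 1 and return (FPR, TPR)."""
    pos = sum(labels)              # number of actual positives
    neg = len(labels) - pos        # number of actual negatives
    points = []
    for thr in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s > thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s > thr and l == 0)
        points.append((fp / neg, tp / pos))
    return points
```

Threshold 0 classifies everything as positive, giving the (1, 1) corner of the plot; threshold 1 classifies nothing as positive, giving (0, 0); intermediate thresholds trace the curve between them.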
ROC Plot
Impact of Data on ROC Plots
Impact of Data Characteristics on ROC Plots
Handling more than 2 classes …
The following techniques are used to handle multiple classes:
– One-vs-Rest (one binary classifier per class)
– One-vs-One (one binary classifier per pair of classes)
– Multinomial (softmax) logistic regression
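One common technique, One-vs-Rest, trains one binary logistic classifier per class and predicts the class whose classifier gives the highest probability. A minimal sketch (the name `predict_ovr` and the weight layout, with the bias stored at w[0], are assumed conventions):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def predict_ovr(x, classifiers):
    """One-vs-Rest prediction.

    classifiers: one weight vector per class, bias at w[0].
    Returns the index of the class with the highest p(class | x).
    """
    xb = [1.0] + list(x)  # prepend constant input for the bias
    scores = [sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
              for w in classifiers]
    return max(range(len(scores)), key=scores.__getitem__)
```

Each binary classifier is trained exactly as in the two-class case, with its own class relabelled 1 and all other classes relabelled 0.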