
Classification with Logistic Regression,
Newton’s Method for Optimization,
Generalized Linear Models

Fundamentals of Data Science

12, 17, 19 November 2020
Prof. Fabio Galasso
Classifiers

Classification
Email: Spam / Not Spam?
Online Transactions: Fraudulent (Yes / No)?
Tumor: Malignant / Benign?

y ∈ {0, 1}
0: “Negative Class” (e.g., benign tumor)
1: “Positive Class” (e.g., malignant tumor)

Classifiers

Linear classifiers in a nutshell

(Figure: example restaurant reviews, with sentiment words highlighted)
• “Hamburger was awesome … service was awful … price was awesome”
• “Pizza was awesome … hot dog was awful … price was awesome”
• “Pizza was awful … hot dog was awesome … price was awesome”
• “Pizza was awful … hot dog was awful … price was awful”
Sentiment analysis

(Table: counting sentiment words in each review)

review                                                          #awful  #awesome  sentiment
“pizza was awful… ice cream was awesome… price was awesome”        1       2         +
“hot dog was awesome… martini was awesome… price was awful”        1       2         +
“everything was awful”                                             1       0         -
“wine was awful… pasta was awful… dessert was awful… view was…”    3       1         -
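A linear classifier scores each review as a weighted sum of its word counts and predicts by the sign of the score. A minimal Python sketch of the idea above, assuming the illustrative weights +1 for “awesome” and −1 for “awful” (punctuation handling is omitted for brevity):

```python
# Minimal linear sentiment classifier: score(x) = sum_j weight_j * count_j(x).
# The two weights below are illustrative assumptions, not learned values.
weights = {"awesome": 1.0, "awful": -1.0}

def score(review: str) -> float:
    """Sum the weights of the sentiment words appearing in the review."""
    return sum(weights.get(word, 0.0) for word in review.lower().split())

def predict(review: str) -> str:
    """Predict '+' if the score is positive, '-' otherwise."""
    return "+" if score(review) > 0 else "-"

print(predict("pizza was awful ice cream was awesome price was awesome"))  # +
print(predict("everything was awful"))                                     # -
```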
(Figure: malignant (1, “Yes”) vs. benign (0, “No”) tumors plotted against tumor size, with a straight regression line fit through the points)

Threshold classifier output at 0.5:
If h_Θ(x) ≥ 0.5, predict “y = 1”
If h_Θ(x) < 0.5, predict “y = 0”

Classification: y = 0 or 1, but a linear hypothesis h_Θ(x) = Θᵀx can be > 1 or < 0.

Logistic Regression: 0 ≤ h_Θ(x) ≤ 1
Outline

• Classification with Logistic Regression

• Newton’s Method for Optimization

• Exponential Family of Distributions

• Generalized Linear Models

Classification with Logistic Regression
Logistic Regression Model
Want 0 ≤ h_Θ(x) ≤ 1

h_Θ(x) = g(Θᵀx), where g(z) = 1 / (1 + e^(−z))

g is the sigmoid function, or logistic function: g(z) → 0 as z → −∞, g(0) = 0.5, and g(z) → 1 as z → +∞.
(Figure: the S-shaped sigmoid curve, crossing 0.5 at z = 0)
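As a quick check of these properties, a minimal NumPy sketch (function names are my own):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic-regression hypothesis h_Theta(x) = g(Theta^T x)."""
    return sigmoid(np.dot(theta, x))

print(sigmoid(-10.0), sigmoid(0.0), sigmoid(10.0))  # ~0.0, 0.5, ~1.0
```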
Interpretation of Hypothesis Output
h_Θ(x) = estimated probability that y = 1 on input x

Example: if h_Θ(x) = 0.7, tell the patient there is a 70% chance of the tumor being malignant.

h_Θ(x) = P(y = 1 | x; Θ): the “probability that y = 1, given x, parameterized by Θ”.
Since y ∈ {0, 1}, it follows that P(y = 0 | x; Θ) = 1 − h_Θ(x).
Classification with Logistic Regression:
Decision Boundary

Logistic regression: h_Θ(x) = g(Θᵀx), and g(z) ≥ 0.5 exactly when z ≥ 0.

Suppose we predict “y = 1” if h_Θ(x) ≥ 0.5, i.e. Θᵀx ≥ 0,
and predict “y = 0” if h_Θ(x) < 0.5, i.e. Θᵀx < 0.

Decision Boundary
Illustrative example: h_Θ(x) = g(Θ₀ + Θ₁x₁ + Θ₂x₂) with Θ = (−3, 1, 1).
Predict “y = 1” if −3 + x₁ + x₂ ≥ 0, i.e. x₁ + x₂ ≥ 3.
(Figure: the line x₁ + x₂ = 3 in the (x₁, x₂) plane, separating the region predicted “y = 1” from the region predicted “y = 0”)
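A tiny sketch of this decision rule, using the illustrative parameters Θ = (−3, 1, 1) from the example above:

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])  # illustrative parameters: boundary x1 + x2 = 3

def predict(x1, x2):
    """Predict y = 1 exactly when Theta^T x >= 0, i.e. x1 + x2 >= 3."""
    x = np.array([1.0, x1, x2])     # x_0 = 1 intercept term
    return 1 if theta @ x >= 0 else 0

print(predict(1.0, 1.0))  # 0: the point lies below the boundary line
print(predict(3.0, 3.0))  # 1: the point lies above the boundary line
```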
Non-linear decision boundaries
Illustrative example: h_Θ(x) = g(Θ₀ + Θ₁x₁ + Θ₂x₂ + Θ₃x₁² + Θ₄x₂²) with Θ = (−1, 0, 0, 1, 1).
Predict “y = 1” if −1 + x₁² + x₂² ≥ 0, i.e. x₁² + x₂² ≥ 1.
(Figure: the decision boundary is the unit circle x₁² + x₂² = 1 in the (x₁, x₂) plane)
Higher-order polynomial features give more complex decision boundaries.
Classification with Logistic Regression:
Learn 𝚯’s

Training set: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}, m examples
x ∈ ℝ^(n+1) with x₀ = 1; y ∈ {0, 1}
h_Θ(x) = 1 / (1 + e^(−Θᵀx))

How to choose the parameters Θ?
Logistic Regression Algorithm

• Recall the probabilistic interpretation of the hypothesis: 𝑃(𝑦 = 1|𝑥; Θ) = h_Θ(x) and 𝑃(𝑦 = 0|𝑥; Θ) = 1 − h_Θ(x),
  written compactly as P(y | x; Θ) = h_Θ(x)^y (1 − h_Θ(x))^(1−y)
• Likelihood of the parameters, assuming independently generated training examples:
  L(Θ) = ∏_{i=1}^m P(y^(i) | x^(i); Θ) = ∏_{i=1}^m h_Θ(x^(i))^(y^(i)) (1 − h_Θ(x^(i)))^(1−y^(i))
Logistic Regression Algorithm

• Log likelihood: l(Θ) = log L(Θ) = ∑_{i=1}^m [ y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i))) ]
• Maximize l(Θ) by gradient ascent; differentiating gives the update
  Θ_j := Θ_j + α ∑_{i=1}^m ( y^(i) − h_Θ(x^(i)) ) x_j^(i)
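A minimal NumPy sketch of this batch gradient-ascent update (the tiny dataset, step size and iteration count are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, iters=1000):
    """Batch gradient ascent on the log likelihood l(Theta).

    X: (m, n+1) design matrix with x_0 = 1; y: (m,) labels in {0, 1}.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)          # h_Theta(x^(i)) for every example i
        theta += alpha * X.T @ (y - h)  # Theta_j += alpha * sum_i (y_i - h_i) * x_ij
    return theta

# Tiny illustrative dataset: intercept column x_0 = 1 plus one feature.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(fit_logistic(X, y))
```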
Digression:
Perceptron

• Use the step function as the activation function instead of the sigmoid: g(z) = 1 if z ≥ 0, else g(z) = 0
• The update rule has the same form as for logistic regression: Θ_j := Θ_j + α ( y^(i) − h_Θ(x^(i)) ) x_j^(i)
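A minimal sketch of the perceptron with its step activation, updating one example at a time (dataset and learning rate are illustrative):

```python
import numpy as np

def step(z):
    """Step activation: g(z) = 1 if z >= 0, else 0."""
    return 1.0 if z >= 0 else 0.0

def perceptron(X, y, alpha=1.0, epochs=10):
    """Perceptron updates; Theta changes only when an example is misclassified."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            h = step(x_i @ theta)
            theta += alpha * (y_i - h) * x_i  # same form as the logistic update
    return theta

# Tiny linearly separable example (intercept column x_0 = 1).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(perceptron(X, y))
```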
Newton’s Method for Optimization
Recap on logistic regression: choose Θ to maximize the log likelihood
l(Θ) = ∑_{i=1}^m [ y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i))) ].
Optimization algorithm
Given Θ, we have code that can compute:
- J(Θ)
- ∂J(Θ)/∂Θ_j (for j = 0, 1, …, n)

Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS

Advantages of the latter three:
- No need to manually pick the learning rate α
- Often faster than gradient descent
Disadvantages:
- More complex
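For illustration, an off-the-shelf quasi-Newton optimizer applied to logistic regression: SciPy’s BFGS minimizing the negative log likelihood J(Θ) = −l(Θ). The toy dataset is an assumption of this sketch:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(Theta) = -l(Theta); minimizing J maximizes the log likelihood."""
    h = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0)
    return -np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def grad(theta, X, y):
    """dJ/dTheta_j = -sum_i (y_i - h_i) * x_ij."""
    return -(X.T @ (y - sigmoid(X @ theta)))

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
res = minimize(cost, np.zeros(2), args=(X, y), jac=grad, method="BFGS")
print(res.x)  # fitted Theta; note that no learning rate had to be picked
```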
Newton’s Method

To find a root of f(Θ) = 0, iterate: Θ := Θ − f(Θ) / f′(Θ).
To maximize l(Θ), find a root of its derivative, i.e. apply the iteration to f = l′: Θ := Θ − l′(Θ) / l″(Θ).

Newton’s Method

• When 𝚯 is a vector, the update becomes Θ := Θ − H⁻¹ ∇_Θ l(Θ),
  where H is the Hessian matrix of second derivatives, H_jk = ∂²l(Θ) / (∂Θ_j ∂Θ_k).
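A minimal sketch of Newton’s method for the logistic-regression log likelihood, with gradient and Hessian in closed form (the tiny ridge term and toy data are my own assumptions, the former added to keep H invertible on separable data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, iters=5):
    """Newton updates Theta := Theta - H^{-1} grad l(Theta) for logistic regression."""
    n = X.shape[1]
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)                   # gradient of l(Theta)
        W = np.diag(h * (1.0 - h))             # per-example weights h_i (1 - h_i)
        H = -(X.T @ W @ X) - 1e-8 * np.eye(n)  # Hessian of l(Theta), plus tiny ridge
        theta -= np.linalg.solve(H, grad)      # Theta := Theta - H^{-1} grad
    return theta

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(newton_logistic(X, y))  # typically converges in very few iterations
```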
Exponential Family of Distributions
Recap:
Bernoulli and Gaussian distributions

Bernoulli(φ): P(y; φ) = φ^y (1 − φ)^(1−y), for y ∈ {0, 1}
Gaussian N(μ, σ²): P(y; μ, σ²) = (1/√(2πσ²)) exp(−(y − μ)² / (2σ²))

The Exponential Family of Distributions

A family of distributions of the form P(y; η) = b(y) exp( ηᵀ T(y) − a(η) ), where
η is the natural parameter, T(y) the sufficient statistic, a(η) the log partition function, and b(y) the base measure.
More on the Bernoulli Distribution

• P(y; η) = b(y) exp( ηᵀ T(y) − a(η) )
• Rewrite the Bernoulli in this form:
  P(y; φ) = φ^y (1 − φ)^(1−y) = exp( y log φ + (1 − y) log(1 − φ) )
          = exp( [log (φ / (1 − φ))] y + log(1 − φ) )
• Matching terms: η = log(φ / (1 − φ)), T(y) = y, a(η) = −log(1 − φ) = log(1 + e^η), b(y) = 1
• Inverting the natural parameter gives φ = 1 / (1 + e^(−η)): the sigmoid function arises naturally.
More on the Gaussian Distribution

• P(y; η) = b(y) exp( ηᵀ T(y) − a(η) )
• With fixed σ² = 1:
  P(y; μ) = (1/√(2π)) exp(−(y − μ)²/2) = (1/√(2π)) exp(−y²/2) · exp( μy − μ²/2 )
• Matching terms: η = μ, T(y) = y, a(η) = η²/2, b(y) = (1/√(2π)) exp(−y²/2)
Generalized Linear Models
Generalized Linear Models (GLM)

Three assumptions, following the standard GLM construction:
1. y | x; Θ ~ ExponentialFamily(η)
2. Given x, predict the expected value of T(y): h_Θ(x) = E[T(y) | x] (for most of our examples, T(y) = y)
3. The natural parameter depends linearly on the inputs: η = Θᵀx
GLM for Ordinary Least Squares
Take y | x; Θ Gaussian with σ² = 1, so that η = μ. Then
h_Θ(x) = E[y | x; Θ] = μ = η = Θᵀx: the linear-regression hypothesis.
GLM for Logistic Regression
Take y | x; Θ Bernoulli, so that φ = 1 / (1 + e^(−η)). Then
h_Θ(x) = E[y | x; Θ] = φ = 1 / (1 + e^(−Θᵀx)): the logistic hypothesis follows from the GLM assumptions.
Generalized Linear Models:
Multi-Class Classification

Multiclass classification
Email foldering/tagging: Work, Friends, Family, Hobby
Medical diagrams: Not ill, Cold, Flu
Weather: Sunny, Cloudy, Rain, Snow

Binary classification: two classes. Multi-class classification: three or more classes.
(Figures: a two-class point cloud vs. a three-class point cloud in the (x₁, x₂) plane)
Multinomial Classification
(Figure: data points from three classes in the (x₁, x₂) plane)
Multinomial and the Exponential Family

The multinomial over k outcomes, with parameters φ₁, …, φ_k and ∑_c φ_c = 1, is a member of the exponential family: take T(y) to be the one-hot indicator vector of the outcome and natural parameters η_c = log(φ_c / φ_k).

Multinomial and the Exponential Family continued

Inverting the natural parameters gives the softmax function: φ_c = e^(η_c) / ∑_{l=1}^k e^(η_l).
Softmax Regression

With the GLM assumption η_c = Θ_cᵀx (one parameter vector per class), the hypothesis outputs a probability for each class:
P(y = c | x; Θ) = exp(Θ_cᵀx) / ∑_{l=1}^k exp(Θ_lᵀx)
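A minimal NumPy sketch of the softmax hypothesis (the parameter matrix and input below are illustrative):

```python
import numpy as np

def softmax(z):
    """Softmax over class scores z_c = Theta_c^T x (numerically stable form)."""
    z = z - np.max(z)  # shifting by a constant leaves the softmax unchanged
    e = np.exp(z)
    return e / e.sum()

# Illustrative parameters: one row Theta_c per class (k = 3 classes, n + 1 = 3 inputs).
Theta = np.array([[ 0.5,  1.0, -1.0],
                  [ 0.0, -1.0,  1.0],
                  [-0.5,  0.0,  0.0]])
x = np.array([1.0, 2.0, 0.5])  # x_0 = 1 intercept term
probs = softmax(Theta @ x)
print(probs, probs.sum())      # P(y = c | x; Theta) for each class; sums to 1
```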
Learn θ’s for Softmax Regression

As for logistic regression, choose the parameters that maximize the log likelihood of the training set,
l(Θ) = ∑_{i=1}^m log P(y^(i) | x^(i); Θ), e.g. by gradient ascent or Newton’s method.
One-vs-all (one-vs-rest):

(Figure: the three-class dataset is split into three binary problems, each separating one class from the remaining two)

Class 1: h_Θ^(1)(x)
Class 2: h_Θ^(2)(x)
Class 3: h_Θ^(3)(x)

h_Θ^(c)(x) = P(y = c | x; Θ)   (c = 1, 2, 3)
One-vs-all

Train a logistic regression classifier h_Θ^(c)(x) for each class c to predict the probability that y = c.

On a new input x, to make a prediction, pick the class c that maximizes h_Θ^(c)(x).
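A minimal sketch of one-vs-all on a toy three-class problem, reusing the batch gradient-ascent trainer from earlier (dataset, step size and iteration count are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, classes, alpha=0.1, iters=1000):
    """Fit one logistic-regression classifier per class against all the others."""
    thetas = {}
    for c in classes:
        y_c = (y == c).astype(float)  # relabel: 1 for class c, 0 for the rest
        theta = np.zeros(X.shape[1])
        for _ in range(iters):
            theta += alpha * X.T @ (y_c - sigmoid(X @ theta))
        thetas[c] = theta
    return thetas

def predict(thetas, x):
    """Pick the class whose classifier assigns the highest probability."""
    return max(thetas, key=lambda c: sigmoid(thetas[c] @ x))

X = np.array([[1.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 0.0, 2.0]])
y = np.array([1, 2, 3])
thetas = train_one_vs_all(X, y, classes=[1, 2, 3])
print(predict(thetas, np.array([1.0, 1.8, 0.1])))  # expected: class 2
```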
Softmax Regression and Cross Entropy Minimization

Maximizing the softmax log likelihood is equivalent to minimizing the cross entropy
−∑_c 1{y = c} log P(y = c | x; Θ) between the one-hot label distribution and the predicted distribution.
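A tiny numeric illustration of the cross-entropy loss for one example (the probability vector is an assumed softmax output):

```python
import numpy as np

def cross_entropy(probs, true_class):
    """-log P(y = true_class | x; Theta): the loss for a one-hot label."""
    return -np.log(probs[true_class])

probs = np.array([0.7, 0.2, 0.1])  # softmax output for one example
print(cross_entropy(probs, 0))     # small loss: the true class got high probability
print(cross_entropy(probs, 2))     # large loss: the true class got low probability
```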
Additional References

• Chapter 4 (in particular Sections 4.1 and 4.3) and Section 2.4 in [Bishop, 2006. Pattern Recognition and Machine Learning]
• https://beginningwithml.wordpress.com/2018/06/22/3-4-softmax-regression/
Thank you

Acknowledgements: slides and material from Andrew Ng, Alessandro Panconesi, Marco Bressan
