
Classification with Logistic Regression,
Newton’s Method for Optimization,
Generalized Linear Models

Fundamentals of Data Science

12, 17, 19 November 2020
Prof. Fabio Galasso
Classifiers

Classification
Email: Spam / Not Spam?
Online Transactions: Fraudulent (Yes / No)?
Tumor: Malignant / Benign?

y ∈ {0, 1}
0: “Negative Class” (e.g., benign tumor)
1: “Positive Class” (e.g., malignant tumor)

Classifiers

Linear classifiers in a nutshell

(Figure: example restaurant reviews, with sentiment words highlighted)
• “Hamburger was awesome … service was awful … price was awesome”
• “Pizza was awesome … hot dog was awful … price was awesome”
• “Pizza was awful … hot dog was awesome … price was awesome”
• “Pizza was awful … hot dog was awful … price was awful”
Sentiment analysis

(Table: counting sentiment words in each review)

review                                                          #awful  #awesome  sentiment
“pizza was awful… ice cream was awesome… price was awesome”        1       2         +
“hot dog was awesome… martini was awesome… price was awful”        1       2         +
“everything was awful”                                             1       0         -
“wine was awful… pasta was awful… dessert was awful… view was…”    3       1         -
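A linear classifier scores each review as a weighted sum of its word counts and predicts by the sign of the score. A minimal Python sketch of the idea above, assuming the illustrative weights +1 for “awesome” and −1 for “awful” (punctuation handling is omitted for brevity):

```python
# Minimal linear sentiment classifier: score(x) = sum_j weight_j * count_j(x).
# The two weights below are illustrative assumptions, not learned values.
weights = {"awesome": 1.0, "awful": -1.0}

def score(review: str) -> float:
    """Sum the weights of the sentiment words appearing in the review."""
    return sum(weights.get(word, 0.0) for word in review.lower().split())

def predict(review: str) -> str:
    """Predict '+' if the score is positive, '-' otherwise."""
    return "+" if score(review) > 0 else "-"

print(predict("pizza was awful ice cream was awesome price was awesome"))  # +
print(predict("everything was awful"))                                     # -
```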
(Figure: malignant (1, “Yes”) vs. benign (0, “No”) tumors plotted against tumor size, with a straight regression line fit through the points)

Threshold classifier output at 0.5:
If h_Θ(x) ≥ 0.5, predict “y = 1”
If h_Θ(x) < 0.5, predict “y = 0”

Classification: y = 0 or 1, but a linear hypothesis h_Θ(x) = Θᵀx can be > 1 or < 0.

Logistic Regression: 0 ≤ h_Θ(x) ≤ 1
Outline

• Classification with Logistic Regression

• Newton’s Method for Optimization

• Exponential Family of Distributions

• Generalized Linear Models

Classification with Logistic Regression
Logistic Regression Model
Want 0 ≤ h_Θ(x) ≤ 1

h_Θ(x) = g(Θᵀx), where g(z) = 1 / (1 + e^(−z))

g is the sigmoid function, or logistic function: g(z) → 0 as z → −∞, g(0) = 0.5, and g(z) → 1 as z → +∞.
(Figure: the S-shaped sigmoid curve, crossing 0.5 at z = 0)
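As a quick check of these properties, a minimal NumPy sketch (function names are my own):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Logistic-regression hypothesis h_Theta(x) = g(Theta^T x)."""
    return sigmoid(np.dot(theta, x))

print(sigmoid(-10.0), sigmoid(0.0), sigmoid(10.0))  # ~0.0, 0.5, ~1.0
```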
Interpretation of Hypothesis Output
h_Θ(x) = estimated probability that y = 1 on input x

Example: if h_Θ(x) = 0.7, tell the patient there is a 70% chance of the tumor being malignant.

h_Θ(x) = P(y = 1 | x; Θ): the “probability that y = 1, given x, parameterized by Θ”.
Since y ∈ {0, 1}, it follows that P(y = 0 | x; Θ) = 1 − h_Θ(x).
Classification with Logistic Regression:
Decision Boundary

Logistic regression: h_Θ(x) = g(Θᵀx), and g(z) ≥ 0.5 exactly when z ≥ 0.

Suppose we predict “y = 1” if h_Θ(x) ≥ 0.5, i.e. Θᵀx ≥ 0,
and predict “y = 0” if h_Θ(x) < 0.5, i.e. Θᵀx < 0.

Decision Boundary
Illustrative example: h_Θ(x) = g(Θ₀ + Θ₁x₁ + Θ₂x₂) with Θ = (−3, 1, 1).
Predict “y = 1” if −3 + x₁ + x₂ ≥ 0, i.e. x₁ + x₂ ≥ 3.
(Figure: the line x₁ + x₂ = 3 in the (x₁, x₂) plane, separating the region predicted “y = 1” from the region predicted “y = 0”)
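A tiny sketch of this decision rule, using the illustrative parameters Θ = (−3, 1, 1) from the example above:

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])  # illustrative parameters: boundary x1 + x2 = 3

def predict(x1, x2):
    """Predict y = 1 exactly when Theta^T x >= 0, i.e. x1 + x2 >= 3."""
    x = np.array([1.0, x1, x2])     # x_0 = 1 intercept term
    return 1 if theta @ x >= 0 else 0

print(predict(1.0, 1.0))  # 0: the point lies below the boundary line
print(predict(3.0, 3.0))  # 1: the point lies above the boundary line
```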
Non-linear decision boundaries
Illustrative example: h_Θ(x) = g(Θ₀ + Θ₁x₁ + Θ₂x₂ + Θ₃x₁² + Θ₄x₂²) with Θ = (−1, 0, 0, 1, 1).
Predict “y = 1” if −1 + x₁² + x₂² ≥ 0, i.e. x₁² + x₂² ≥ 1.
(Figure: the decision boundary is the unit circle x₁² + x₂² = 1 in the (x₁, x₂) plane)
Higher-order polynomial features give more complex decision boundaries.
Classification with Logistic Regression:
Learn 𝚯’s

Training set: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}, m examples
x ∈ ℝ^(n+1) with x₀ = 1; y ∈ {0, 1}
h_Θ(x) = 1 / (1 + e^(−Θᵀx))

How to choose the parameters Θ?
Logistic Regression Algorithm

• Recall the probabilistic interpretation of the hypothesis: 𝑃(𝑦 = 1|𝑥; Θ) = h_Θ(x) and 𝑃(𝑦 = 0|𝑥; Θ) = 1 − h_Θ(x),
  written compactly as P(y | x; Θ) = h_Θ(x)^y (1 − h_Θ(x))^(1−y)
• Likelihood of the parameters, assuming independently generated training examples:
  L(Θ) = ∏_{i=1}^m P(y^(i) | x^(i); Θ) = ∏_{i=1}^m h_Θ(x^(i))^(y^(i)) (1 − h_Θ(x^(i)))^(1−y^(i))
Logistic Regression Algorithm

• Log likelihood: l(Θ) = log L(Θ) = ∑_{i=1}^m [ y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i))) ]
• Maximize l(Θ) by gradient ascent; differentiating gives the update
  Θ_j := Θ_j + α ∑_{i=1}^m ( y^(i) − h_Θ(x^(i)) ) x_j^(i)
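A minimal NumPy sketch of this batch gradient-ascent update (the tiny dataset, step size and iteration count are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, iters=1000):
    """Batch gradient ascent on the log likelihood l(Theta).

    X: (m, n+1) design matrix with x_0 = 1; y: (m,) labels in {0, 1}.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)          # h_Theta(x^(i)) for every example i
        theta += alpha * X.T @ (y - h)  # Theta_j += alpha * sum_i (y_i - h_i) * x_ij
    return theta

# Tiny illustrative dataset: intercept column x_0 = 1 plus one feature.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(fit_logistic(X, y))
```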
Digression:
Perceptron

• Use the step function as the activation function instead of the sigmoid: g(z) = 1 if z ≥ 0, else g(z) = 0
• The update rule has the same form as for logistic regression: Θ_j := Θ_j + α ( y^(i) − h_Θ(x^(i)) ) x_j^(i)
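A minimal sketch of the perceptron with its step activation, updating one example at a time (dataset and learning rate are illustrative):

```python
import numpy as np

def step(z):
    """Step activation: g(z) = 1 if z >= 0, else 0."""
    return 1.0 if z >= 0 else 0.0

def perceptron(X, y, alpha=1.0, epochs=10):
    """Perceptron updates; Theta changes only when an example is misclassified."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            h = step(x_i @ theta)
            theta += alpha * (y_i - h) * x_i  # same form as the logistic update
    return theta

# Tiny linearly separable example (intercept column x_0 = 1).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(perceptron(X, y))
```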
Newton’s Method for Optimization
Recap on logistic regression: choose Θ to maximize the log likelihood
l(Θ) = ∑_{i=1}^m [ y^(i) log h_Θ(x^(i)) + (1 − y^(i)) log(1 − h_Θ(x^(i))) ].
Optimization algorithm
Given Θ, we have code that can compute:
- J(Θ)
- ∂J(Θ)/∂Θ_j (for j = 0, 1, …, n)

Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS
- L-BFGS

Advantages of the latter three:
- No need to manually pick the learning rate α
- Often faster than gradient descent
Disadvantages:
- More complex
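For illustration, an off-the-shelf quasi-Newton optimizer applied to logistic regression: SciPy’s BFGS minimizing the negative log likelihood J(Θ) = −l(Θ). The toy dataset is an assumption of this sketch:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(Theta) = -l(Theta); minimizing J maximizes the log likelihood."""
    h = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0)
    return -np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def grad(theta, X, y):
    """dJ/dTheta_j = -sum_i (y_i - h_i) * x_ij."""
    return -(X.T @ (y - sigmoid(X @ theta)))

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
res = minimize(cost, np.zeros(2), args=(X, y), jac=grad, method="BFGS")
print(res.x)  # fitted Theta; note that no learning rate had to be picked
```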
Newton’s Method

To find a root of f(Θ) = 0, iterate: Θ := Θ − f(Θ) / f′(Θ).
To maximize l(Θ), find a root of its derivative, i.e. apply the iteration to f = l′: Θ := Θ − l′(Θ) / l″(Θ).

Newton’s Method

• When 𝚯 is a vector, the update becomes Θ := Θ − H⁻¹ ∇_Θ l(Θ),
  where H is the Hessian matrix of second derivatives, H_jk = ∂²l(Θ) / (∂Θ_j ∂Θ_k).
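A minimal sketch of Newton’s method for the logistic-regression log likelihood, with gradient and Hessian in closed form (the tiny ridge term and toy data are my own assumptions, the former added to keep H invertible on separable data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, iters=5):
    """Newton updates Theta := Theta - H^{-1} grad l(Theta) for logistic regression."""
    n = X.shape[1]
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)                   # gradient of l(Theta)
        W = np.diag(h * (1.0 - h))             # per-example weights h_i (1 - h_i)
        H = -(X.T @ W @ X) - 1e-8 * np.eye(n)  # Hessian of l(Theta), plus tiny ridge
        theta -= np.linalg.solve(H, grad)      # Theta := Theta - H^{-1} grad
    return theta

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(newton_logistic(X, y))  # typically converges in very few iterations
```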
Exponential Family of Distributions
Recap:
Bernoulli and Gaussian distributions

Bernoulli(φ): P(y; φ) = φ^y (1 − φ)^(1−y), for y ∈ {0, 1}
Gaussian N(μ, σ²): P(y; μ, σ²) = (1/√(2πσ²)) exp(−(y − μ)² / (2σ²))

The Exponential Family of Distributions

A family of distributions of the form P(y; η) = b(y) exp( ηᵀ T(y) − a(η) ), where
η is the natural parameter, T(y) the sufficient statistic, a(η) the log partition function, and b(y) the base measure.
More on the Bernoulli Distribution

• P(y; η) = b(y) exp( ηᵀ T(y) − a(η) )
• Rewrite the Bernoulli in this form:
  P(y; φ) = φ^y (1 − φ)^(1−y) = exp( y log φ + (1 − y) log(1 − φ) )
          = exp( [log (φ / (1 − φ))] y + log(1 − φ) )
• Matching terms: η = log(φ / (1 − φ)), T(y) = y, a(η) = −log(1 − φ) = log(1 + e^η), b(y) = 1
• Inverting the natural parameter gives φ = 1 / (1 + e^(−η)): the sigmoid function arises naturally.
More on the Gaussian Distribution

• P(y; η) = b(y) exp( ηᵀ T(y) − a(η) )
• With fixed σ² = 1:
  P(y; μ) = (1/√(2π)) exp(−(y − μ)²/2) = (1/√(2π)) exp(−y²/2) · exp( μy − μ²/2 )
• Matching terms: η = μ, T(y) = y, a(η) = η²/2, b(y) = (1/√(2π)) exp(−y²/2)
Generalized Linear Models
Generalized Linear Models (GLM)

Three assumptions, following the standard GLM construction:
1. y | x; Θ ~ ExponentialFamily(η)
2. Given x, predict the expected value of T(y): h_Θ(x) = E[T(y) | x] (for most of our examples, T(y) = y)
3. The natural parameter depends linearly on the inputs: η = Θᵀx
GLM for Ordinary Least Squares
Take y | x; Θ Gaussian with σ² = 1, so that η = μ. Then
h_Θ(x) = E[y | x; Θ] = μ = η = Θᵀx: the linear-regression hypothesis.
GLM for Logistic Regression
Take y | x; Θ Bernoulli, so that φ = 1 / (1 + e^(−η)). Then
h_Θ(x) = E[y | x; Θ] = φ = 1 / (1 + e^(−Θᵀx)): the logistic hypothesis follows from the GLM assumptions.
Generalized Linear Models:
Multi-Class Classification

Multiclass classification
Email foldering/tagging: Work, Friends, Family, Hobby
Medical diagrams: Not ill, Cold, Flu
Weather: Sunny, Cloudy, Rain, Snow

Binary classification: two classes. Multi-class classification: three or more classes.
(Figures: a two-class point cloud vs. a three-class point cloud in the (x₁, x₂) plane)
Multinomial Classification
(Figure: data points from three classes in the (x₁, x₂) plane)
Multinomial and the Exponential Family

The multinomial over k outcomes, with parameters φ₁, …, φ_k and ∑_c φ_c = 1, is a member of the exponential family: take T(y) to be the one-hot indicator vector of the outcome and natural parameters η_c = log(φ_c / φ_k).

Multinomial and the Exponential Family continued

Inverting the natural parameters gives the softmax function: φ_c = e^(η_c) / ∑_{l=1}^k e^(η_l).
Softmax Regression

With the GLM assumption η_c = Θ_cᵀx (one parameter vector per class), the hypothesis outputs a probability for each class:
P(y = c | x; Θ) = exp(Θ_cᵀx) / ∑_{l=1}^k exp(Θ_lᵀx)
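A minimal NumPy sketch of the softmax hypothesis (the parameter matrix and input below are illustrative):

```python
import numpy as np

def softmax(z):
    """Softmax over class scores z_c = Theta_c^T x (numerically stable form)."""
    z = z - np.max(z)  # shifting by a constant leaves the softmax unchanged
    e = np.exp(z)
    return e / e.sum()

# Illustrative parameters: one row Theta_c per class (k = 3 classes, n + 1 = 3 inputs).
Theta = np.array([[ 0.5,  1.0, -1.0],
                  [ 0.0, -1.0,  1.0],
                  [-0.5,  0.0,  0.0]])
x = np.array([1.0, 2.0, 0.5])  # x_0 = 1 intercept term
probs = softmax(Theta @ x)
print(probs, probs.sum())      # P(y = c | x; Theta) for each class; sums to 1
```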
Learn θ’s for Softmax Regression

As for logistic regression, choose the parameters that maximize the log likelihood of the training set,
l(Θ) = ∑_{i=1}^m log P(y^(i) | x^(i); Θ), e.g. by gradient ascent or Newton’s method.
One-vs-all (one-vs-rest):

(Figure: the three-class dataset is split into three binary problems, each separating one class from the remaining two)

Class 1: h_Θ^(1)(x)
Class 2: h_Θ^(2)(x)
Class 3: h_Θ^(3)(x)

h_Θ^(c)(x) = P(y = c | x; Θ)   (c = 1, 2, 3)
One-vs-all

Train a logistic regression classifier h_Θ^(c)(x) for each class c to predict the probability that y = c.

On a new input x, to make a prediction, pick the class c that maximizes h_Θ^(c)(x).
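A minimal sketch of one-vs-all on a toy three-class problem, reusing the batch gradient-ascent trainer from earlier (dataset, step size and iteration count are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, classes, alpha=0.1, iters=1000):
    """Fit one logistic-regression classifier per class against all the others."""
    thetas = {}
    for c in classes:
        y_c = (y == c).astype(float)  # relabel: 1 for class c, 0 for the rest
        theta = np.zeros(X.shape[1])
        for _ in range(iters):
            theta += alpha * X.T @ (y_c - sigmoid(X @ theta))
        thetas[c] = theta
    return thetas

def predict(thetas, x):
    """Pick the class whose classifier assigns the highest probability."""
    return max(thetas, key=lambda c: sigmoid(thetas[c] @ x))

X = np.array([[1.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 0.0, 2.0]])
y = np.array([1, 2, 3])
thetas = train_one_vs_all(X, y, classes=[1, 2, 3])
print(predict(thetas, np.array([1.0, 1.8, 0.1])))  # expected: class 2
```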
Softmax Regression and Cross Entropy Minimization

Maximizing the softmax log likelihood is equivalent to minimizing the cross entropy
−∑_c 1{y = c} log P(y = c | x; Θ) between the one-hot label distribution and the predicted distribution.
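A tiny numeric illustration of the cross-entropy loss for one example (the probability vector is an assumed softmax output):

```python
import numpy as np

def cross_entropy(probs, true_class):
    """-log P(y = true_class | x; Theta): the loss for a one-hot label."""
    return -np.log(probs[true_class])

probs = np.array([0.7, 0.2, 0.1])  # softmax output for one example
print(cross_entropy(probs, 0))     # small loss: the true class got high probability
print(cross_entropy(probs, 2))     # large loss: the true class got low probability
```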
Additional References

• Chapter 4 (in particular Sections 4.1 and 4.3) and Section 2.4 in [Bishop, 2006. Pattern Recognition and Machine Learning]
• https://beginningwithml.wordpress.com/2018/06/22/3-4-softmax-regression/
Thank you

Acknowledgements: slides and material from Andrew Ng, Alessandro Panconesi, Marco Bressan
