You are on page 1of 100

LINEAR MODEL

Bùi Tiến Lên

2023
Contents

1. Linear Regression

2. Classification

3. Logistic Regression

4. Softmax Regression

5. Capacity, Overfitting and Underfitting


Notation
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
symbol meaning
Logistic
Regression a, b, c, N . . . scalar number
Binary Classification
Evaluation
w, v, x, y . . . column vector
Multi-class
Classification
X, Y . . . matrix operator meaning
Softmax R set of real numbers w| transpose
set of integer numbers matrix multiplication
Regression
Softmax Regression
Z XY
Cross Entropy vs. MSE
N set of natural numbers X −1 inverse
RD
Capacity,
Overfitting and set of vectors
set
Underfitting
Model Capacity
X , Y, . . .
Model vs. Data
Bias-Variance
A algorithm
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

3
Learning diagram
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic
Regression
Binary Classification
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

4
Linear Regression
• Simple Linear Model
• Weighted Linear Model
• Linear Basis Function Model
Problem 1
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Consider the Advertising data set Dtrain consists of the sales of that product
Logistic in 200 different markets, along with advertising budgets for the product in
Regression
Binary Classification each of those markets for the media TV. Find the relationship between TV
(input) and sales (output)
Evaluation
Multi-class
Classification

Softmax

25

25
Regression
Softmax Regression
Cross Entropy vs. MSE

20

20
Capacity,
Overfitting and
Sales

Sales
15

15
Underfitting
Model Capacity
Model vs. Data
10

10
Bias-Variance
Tradeoff of Capacity
Regularization
5

5
Tradeoff of
Regularization

0 50 100 200 300 0 50 100 200 300

TV TV

6
Linear Regression Model
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Concept 1
Logistic
Regression A linear regression is a model that assumes a linear relationship between
Binary Classification
Evaluation
inputs and the output.
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

7
Problem Statement
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • The requirement is to build a system that can take a vector x ∈ RD+1 as
Logistic input and predict the value of a scalar y ∈ R as its output
Regression
Binary Classification
Evaluation
• The hypothesis set H
Multi-class
Classification

Softmax
y ≈ ŷ = hw (x) = w | x (1)
Regression

where ŷ be the value that our model (function) predicts y and w ∈ RD+1 is
Softmax Regression
Cross Entropy vs. MSE

Capacity, a vector of parameters of the model


Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

8
Problem Statement (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• Task T : to predict y from x by outputting ŷ = hw (x) = w | x
Logistic • The train set Dtrain denoted as (X, y) including N samples
Regression
Binary Classification {(x 1 , y1 ), (x 2 , y2 ) . . . (x N , yN )}, construct the matrix X and the vectors y and

Evaluation
Multi-class

x1 y1 ŷ1
Classification  |     
Softmax
Regression  x |2   y2   ŷ2 
Softmax Regression
X =  .  , y =  . , ŷ =  .  (2)
     
Cross Entropy vs. MSE
 .  . .
 .   . .
Capacity,
Overfitting and x| yN ŷN
Underfitting
| {z N } | {z } | {z }
target vector output vector
Model Capacity
Model vs. Data
input data matrix
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

9
Problem Statement (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• Performance measure P:
Logistic
Regression Concept 2
Binary Classification
Evaluation The mean squared error MSEtrain of the model on the train set Dtrain
Multi-class
Classification

Softmax N
1 1 X
Regression
MSEtrain = kŷ − yk2 = (ŷn − yn )2 (3)
Softmax Regression
N N
Cross Entropy vs. MSE
n=1
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

10
Problem Statement (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic
Regression
Binary Classification
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

11
Problem Statement (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• The learning goal: find the vector of parameter w such that
Logistic
Regression w = arg min(MSEtrain ) (4)
Binary Classification w
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

12
Solving Problem
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Solution
Logistic • Compute the gradient of MSEtrain
Regression
Binary Classification
Evaluation
Multi-class
∇w (MSEtrain ) = ∇w (w | X | Xw − w | X | y − y | Xw + y | y)
Classification

Softmax
= 2X | Xw − 2X | y (5)
Regression
Softmax Regression
Cross Entropy vs. MSE
• If MSEtrain reach the min value then ∇w (MSEtrain ) = 0
Capacity,
Overfitting and
Underfitting ∇w (MSEtrain ) = 0
X | Xw − X | y
Model Capacity
Model vs. Data = 0
Bias-Variance
Tradeoff of Capacity X Xw
|
= X |y
Regularization
Tradeoff of
Regularization w = (X | X)−1 X | y (6)

13
Solving Problem (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic
Regression
Binary Classification
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

14
Programming Example
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Use seaborn to read tips dataset and find the linear relationship between
Logistic total_bill and tip
Regression
Binary Classification
Evaluation
import numpy as np
Multi-class import seaborn as sns
Classification
import matplotlib . pyplot as plt
Softmax
Regression
Softmax Regression sns. set_style (" darkgrid ")
Cross Entropy vs. MSE
tips = sns. load_dataset ("tips")
Capacity,
Overfitting and
sns. regplot (x=" total_bill ", y="tip", data=tips , ci=None , line_kws
Underfitting ={ 'color ':'red '})
Model Capacity
Model vs. Data
plt.show ()
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

15
Programming Example (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
10
Logistic
Regression
Binary Classification
Evaluation
Multi-class 8
Classification

Softmax
Regression
Softmax Regression 6
Cross Entropy vs. MSE
tip
Capacity,
Overfitting and
Underfitting
4
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity 2
Regularization
Tradeoff of
Regularization

10 20 30 40 50
total_bill

16
Word Example
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification 1. Find the linear regression function y = f (x) = w0 + w1 x given the following
Logistic data set D
Regression
Binary Classification input x target y
1 2
Evaluation
Multi-class

2 3
Classification

Softmax
Regression 3 3
4 5
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and 2. Find the linear regression function y = f (x) = f (x1 , x2 ) = w0 + w1 x1 + w2 x2
Underfitting
Model Capacity
given the following data set D
Model vs. Data
Bias-Variance
input x target y
Tradeoff of Capacity
Regularization
(1, 1) 1
Tradeoff of
Regularization
(2, 3) 3
(3, 4) 4
(4, 3) 5
17
Discussion
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • D is a large number


Logistic
Regression
• Online learning
Binary Classification
Evaluation
• Limitations of the model (hypothesis set)
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

18
Weighted Linear Model
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • In some cases the observations may be weighted; for example, they may not
Logistic be equally reliable. In this case, we find the vector of parameters w to
Regression
Binary Classification minimize the weighted sum of squares of errors
Evaluation
Multi-class
Classification N
Etrain = an (ŷn − yn )2 (7)
X
Softmax
Regression
Softmax Regression n=1
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

19
Solving Problem
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic 1. Construct the matrices X, A and the vector y


Regression
Binary Classification

x1 a1 0 · · · 0 y1
 |     
Evaluation

 0 a2 · · · 0  y2
Multi-class
Classification  x |2   
Softmax X =  . , A =  . .. , y = (8)
     
.. . . ..
Regression  ..   ..

Softmax Regression
. . .   . 
Cross Entropy vs. MSE
x |N 0 0 · · · aN yN
Capacity, | {z } | {z } | {z }
Overfitting and input data matrix weight matrix target vector
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
2. Calculate the vector of parameters
Tradeoff of Capacity

w = (X | AX)−1 X | Ay (9)
Regularization
Tradeoff of
Regularization

20
Linear in What?
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Linearity in the weights


Logistic
Regression
Binary Classification
hw (x) = w0 + w 1 x1 + ... + w D xD (10)
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

21
Linear Basis Function Models
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Concept 3
Logistic
Regression A linear basis function model is a linear combination of fixed nonlinear
Binary Classification
Evaluation
functions of the input variables
Multi-class
Classification

Softmax hw (x) = w0 φ0 (x) + w1 φ1 (x) + ... + wM φM (x) (11)


Regression

where φi (x) are basis functions


Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

22
Some types of basis functions
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Polynomial
Logistic
Regression
Binary Classification
Evaluation
1
Multi-class
Classification
φj (x) = x j (12)
0.5
Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE
0
Capacity,
Overfitting and −0.5
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
−1
Tradeoff of Capacity
−1 0 1
Regularization
Tradeoff of
Regularization

23
Some types of basis functions (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Gaussian
Logistic
Regression
Binary Classification
Evaluation
1
Multi-class
Classification
!
0.75 (x − µj )2
Softmax φj (x; µj , sj ) = exp − (13)
Regression
Softmax Regression
sj2
Cross Entropy vs. MSE
0.5
Capacity,
Overfitting and 0.25
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
0
Tradeoff of Capacity
−1 0 1
Regularization
Tradeoff of
Regularization

24
Some types of basis functions (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Sigmoid
Logistic
Regression
Binary Classification
Evaluation
1
1
Multi-class
Classification φj (x; µj , sj ) = x−µj (14)
0.75 −
Softmax
Regression
1+e sj

Softmax Regression
Cross Entropy vs. MSE
0.5
Capacity,
Overfitting and 0.25
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
0
Tradeoff of Capacity
−1 0 1
Regularization
Tradeoff of
Regularization

25
Problem 2
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• Find the relationship between Years of Education and Income based on
Logistic the given data
Regression
Binary Classification
Evaluation
Multi-class
Classification

80

80
Softmax
Regression

70

70
Softmax Regression
Cross Entropy vs. MSE
60

60
Capacity,
Income

Income
Overfitting and
50

50
Underfitting
40

40
Model Capacity
Model vs. Data
Bias-Variance
30

30
Tradeoff of Capacity
Regularization
20

20
Tradeoff of
Regularization

10 12 14 16 18 20 22 10 12 14 16 18 20 22

Years of Education Years of Education

26
Problem Statement
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • The requirement is to build a system that can take a vector x ∈ RD as


Logistic input and predict the value of a scalar y ∈ R as its output
Regression
Binary Classification
Evaluation
• The hypothesis set H
Multi-class
Classification

Softmax
y ≈ ŷ = hw (x) = w | φ(x) (15)
Regression

where ŷ be the value that our model (function) predicts y, w ∈ RM+1 is a


Softmax Regression
Cross Entropy vs. MSE

Capacity, vector of parameters of the model and φ is a set of M + 1 basis functions


Overfitting and
Underfitting  
Model Capacity
Model vs. Data
φ0 (x)
Bias-Variance  φ1 (x) 
Tradeoff of Capacity
φ(x) =  (16)
 
Regularization .. 
Tradeoff of
Regularization
 . 
φM (x)

27
Problem Statement (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• Task T : to predict y from x by outputting ŷ = hw (x) = w | φ(x)
Logistic • Performance measure P:
Regression
Binary Classification The mean squared error MSEtrain of the model on the train set Dtrain
including N samples {(x 1 , y1 ), (x 2 , y2 ) . . . (x N , yN )}
Evaluation
Multi-class
Classification

Softmax • The learning goal: find the vector of parameter w such that
Regression
Softmax Regression
Cross Entropy vs. MSE w = arg min(MSEtrain )
w
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

28
Solving Problem
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic 1. Construct the matrix Φ and the vector y


Regression
Binary Classification

φ(x 1 )| y1
   
Evaluation

 y2
Multi-class
Classification  φ(x 2 )|  
Softmax Φ= , y =  .. (17)
   
Regression
.. 
Softmax Regression
 .   . 
Cross Entropy vs. MSE
φ(x N ) | yN
Capacity, | {z } | {z }
Overfitting and design matrix target vector
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
2. Calculate the vector of parameters
Tradeoff of Capacity

w = (Φ| Φ)−1 Φ| y (18)


Regularization
Tradeoff of
Regularization

29
Programming Example
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification import numpy as np


Logistic import seaborn as sns
Regression
import matplotlib . pyplot as plt
Binary Classification
Evaluation
Multi-class
Classification
sns. set_style (" darkgrid ")
Softmax
x = [1, 2, 3, 4, 5, 8, 10]
Regression y = [1.1 , 3.8, 8.5, 16, 24, 65, 99.2]
Softmax Regression
Cross Entropy vs. MSE
sns. regplot (x, y, order =2, ci=None , line_kws ={'color ':'red '})
Capacity,
plt. xlabel ('x')
Overfitting and plt. ylabel ('y')
Underfitting
Model Capacity
plt.show ()
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

30
Programming Example (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
100
Logistic
Regression
Binary Classification
Evaluation
80
Multi-class
Classification

Softmax
Regression 60
Softmax Regression
Cross Entropy vs. MSE y
Capacity, 40
Overfitting and
Underfitting
Model Capacity
Model vs. Data 20
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of 0
Regularization

1 2 3 4 5 6 7 8 9 10
x

31
Word Example
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Find a polynomial regression function y = f (x) = w0 + w1 x + w2 x 2 given the


Logistic data set D
Regression
Binary Classification input x target y
1 2
Evaluation
Multi-class
Classification

Softmax
2 3
Regression
Softmax Regression
3 3
Cross Entropy vs. MSE 4 5
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

32
Puzzle
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • What basis functions?


Logistic
Regression
Binary Classification
Evaluation 1.0
Multi-class
Classification

Softmax
Regression 0.5
Softmax Regression
Cross Entropy vs. MSE

Capacity, 0.0
Overfitting and
Underfitting
y
Model Capacity
Model vs. Data
0.5
Bias-Variance
Tradeoff of Capacity
Regularization 1.0
Tradeoff of
Regularization
1.0 0.5 0.0 0.5 1.0
x

33
Classification
A real data set
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Some 16-by-16 pixel grayscale image from the MNIST database
Logistic
Regression
Binary Classification
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

35
Input representation
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Input representation or feature extraction


Logistic • “raw” input
Regression
pixels x | = x0 x1 . . . x256 

Binary Classification

linear model w | = w0 w1 . . . w 256


Evaluation
Multi-class
Classification

Softmax
• Feature extraction: extract useful information
intensity and symmetry x | = x1 x2 

Regression

linear model w | = w1 w 2
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

36
Illustration of Features
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic
Regression
Binary Classification
Evaluation
Multi-class

2x
Classification

Softmax

symmetry
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

intensity x1

37
The Problem with Categorical Data
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Some algorithms can work with categorical data directly.


Logistic
Regression
• For example, a decision tree can be learned directly from categorical
Binary Classification
Evaluation
data with no data transform required
Multi-class
Classification • Many machine learning algorithms cannot operate on label data directly.
Softmax • They require all input variables and output variables to be numeric.
Regression
Softmax Regression
Cross Entropy vs. MSE
• There are two common types of conversion: integer encoding and one-hot
Capacity, encoding
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

38
Integer Encoding
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • For categorical variables where no such ordinal relationship exists, the integer
Logistic encoding is not enough.
Regression
Binary Classification
Evaluation
Multi-class
id color id color
Classification
1 red 1 1
Softmax
Regression 2 green 2 2
Softmax Regression
Cross Entropy vs. MSE 3 blue Integer 3 3
Capacity,
4 red encoding 4 1
Overfitting and
Underfitting
Model Capacity
… … … …
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

39
One-Hot Encoding
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • One-hot encoding ensures that machine learning does not assume that higher
Logistic numbers are more important.
Regression
Binary Classification
Evaluation
Multi-class
id color id color_red color_green color_blue
Classification
1 red 1 1 0 0
Softmax
Regression 2 green 2 0 1 0
Softmax Regression
Cross Entropy vs. MSE 3 blue One-hot 3 0 0 1
Capacity,
4 red encoding 4 1 0 0
Overfitting and
Underfitting
Model Capacity
… … … … … …
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

40
Classifier Training
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• Select the learning model for classifier, e.g., Perceptron
Logistic • Train the classifier/model using a training set D = {(x 1 , y1 ), ..., (x N , yN )}
Regression
Binary Classification
Evaluation
Multi-class
Classification
training
set sample
Softmax
Regression
Softmax Regression input set
Cross Entropy vs. MSE ... ...
get
Capacity, next sample Model
... ...
Overfitting and training
Underfitting set
Model Capacity ... ...
Model vs. Data
Bias-Variance ... ... update
model cost value
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization No statisfied? Yes Model

features target

41
Logistic Regression
• Binary Classification
• Evaluation
• Multi-class Classification
A Third Linear Model
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
d
Logistic
s= wi xi
X
Regression

i=0
Binary Classification
Evaluation
Multi-class

linear classification linear regression logistic regression


Classification

Softmax
Regression h(x) = sign(s) h(x) = s h(x) = σ(s)
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

43
The logistic function
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• The formula
Logistic 1
Regression
1
Binary Classification
σ(s) = (19)
Evaluation
Multi-class
1 + e −s 0.75
Classification

Softmax • The logistic function converts a σ


0.5
Regression
Softmax Regression score to a probability
Cross Entropy vs. MSE

Capacity,
• Properties 0.25

Overfitting and
Underfitting σ(−s) = 1 − σ(s)
Model Capacity −6 −4 −2 0 2 4 6
Model vs. Data s
Bias-Variance
Tradeoff of Capacity
Regularization
σ 0 (s) = σ(s)(1 − σ(s))
Tradeoff of
Regularization

44
Probability Interpretation
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • h(x) = σ(s) can be interpreted as a probability


Logistic • For example, prediction of heart attacks
Regression
Binary Classification • Input x: cholesterol level, age, weight, etc.
Evaluation
Multi-class • The signal s = w T x: risk score
Classification

Softmax
• σ(s): probability of a heart attack
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

score probability of heart attack

45
Problem Statement
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • The target function f is the probability distribution


Logistic
Regression
Binary Classification
f : RD → [0, 1]
Evaluation

• Hypothesis set hw (x) = σ(w T x) and the conditional probability


Multi-class
Classification

Softmax
Regression
hw (x) for y = 1

P(y | x, w) = (20)
Softmax Regression

1 − hw (x) for y = 0
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

46
Error measure
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification We define error measurement based on likelihood


Logistic • For each (x, y), y is generated by probability hw (x).
Regression
Binary Classification
Evaluation
• Plausible error measure based on likelihood of y given x and w
Multi-class

P(y | x, w) = z y (1 − z)1−y
Classification

Softmax
(21)
Regression

where
Softmax Regression
Cross Entropy vs. MSE

Capacity, z = hw (x) = σ(w | x) (22)


Overfitting and
Underfitting
Model Capacity
• Likelihood of D = {(x 1 , y1 )...(x N , yN )} given w is
Model vs. Data
Bias-Variance
Tradeoff of Capacity N
P(yn | x n , w) (23)
Y
Regularization
Tradeoff of
Regularization
n=1

47
Error measure (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• Learning goal: maximizing likelihood
Logistic QN
Regression Maximize P(y | x , w)
Binary Classification QN n n
n=1
Evaluation
⇔ Minimize − log n=1 P(yn | x n , w) (24)
⇔ Minimize − N
Multi-class

n=1 (yn log zn + (1 − yn ) log(1 − zn ))


Classification
P
Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

48
Learning Algorithm (Gradient Descent)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification 1. Initialize the weights (parameters) at t = 0 w 0


Logistic 2. For t = 1, 2, 3, ... do
Regression
Binary Classification
2.1 Compute the outputs zn for each x n (n = 1, ..., N)
Evaluation
Multi-class
Classification
zn = σ(w T
t xn) (25)
Softmax
Regression 2.2 Compute the gradient
Softmax Regression
Cross Entropy vs. MSE N
1 X
∇w E = x n (zn − yn ) (26)
Capacity, N
Overfitting and n=1
Underfitting
Model Capacity 2.3 Update the weights
Model vs. Data
Bias-Variance
Tradeoff of Capacity w t+1 = w t − η∇w E (27)
Regularization
Tradeoff of
Regularization where η is a learning rate (hyper-parameter)
Iterate the next step until w is not changes
3. Return the final weights w
49
Learning Rate
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • How η affects the algorithm?


Logistic
Regression
Binary Classification
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

50
Programming Example
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification import matplotlib . pyplot as plt


Logistic import seaborn as sns
Regression
sns.set(style=" darkgrid ")
Binary Classification
Evaluation
Multi-class
Classification
# Load the example titanic dataset
Softmax
df = sns. load_dataset (" titanic ")
Regression
Softmax Regression
Cross Entropy vs. MSE
# Make a custom palette with gendered colors
Capacity,
pal = dict(male=" #6495 ED", female ="# F08080 ")
Overfitting and
Underfitting
Model Capacity
# Show the survival proability as a function of age and sex
Model vs. Data g = sns. lmplot (x="age", y=" survived ", col="sex", hue="sex", data=df ,
Bias-Variance
Tradeoff of Capacity
palette =pal , y_jitter =.02 , logistic =True , ci=None)
Regularization g.set(xlim =(0, 80) , ylim =( -.05 , 1.05))
Tradeoff of
Regularization plt.show ()

51
Programming Example (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
sex = male sex = female
Logistic 1.0
Regression
Binary Classification
Evaluation
Multi-class
0.8
Classification

Softmax
Regression 0.6
survived

Softmax Regression
Cross Entropy vs. MSE
0.4
Capacity,
Overfitting and
Underfitting
Model Capacity
0.2
Model vs. Data
Bias-Variance
Tradeoff of Capacity 0.0
Regularization
Tradeoff of 0 10 20 30 40 50 60 70 80 0 10 20 30 40 50 60 70 80
Regularization age age

52
Evaluation
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Consider two-class problem with two classes ⊕ and


Logistic
Regression
• The performance of logistic regression model is based on a threshold th
Binary Classification

y is ⊕ if P(y | x) ≥ th
Evaluation
Multi-class
Classification

Softmax y is if P(y | x) < th (28)


Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
• High threshold: high specificity, low sensitivity
Underfitting
Model Capacity
• Low threshold: low specificity, high sensitivity
Model vs. Data
Bias-Variance
• We should select the best threshold for the trade-off between the cost of false
positives vs false negatives
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

53
ROC Curve
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • The receiver operating characteristic (ROC) curve is plot which shows the
Logistic performance of a binary classifier as function of its cut-off threshold.
Regression
Binary Classification • It essentially shows the true positive rate (sensitivity) against the false
positive rate (1-specificity) for various threshold values.
Evaluation
Multi-class
Classification

Softmax
• The area under the curve (AUC) is an aggregated measure of performance.
Regression
1.00
Softmax Regression 1.00

Cross Entropy vs. MSE

Capacity, 0.75
Overfitting and
0.75

Underfitting

sensitivity
Model Capacity
0.50 0.50
p

Model vs. Data


Bias-Variance
Tradeoff of Capacity
Regularization 0.25 0.25

Tradeoff of
Regularization

0.00 0.00

−4 −2 0 2 0.00 0.25 0.50 0.75 1.00


x 1 − specificity

54
Multi-class classification problems
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Email foldering/tagging: Work (1), Friends (2), Family (3), Hobby (4)
Logistic
Regression
• Medical diagrams: Not ill, Cold, Flu
Binary Classification
Evaluation
• Weather: Sunny, Cloudy, Rain, Snow
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

55
Visual of Binary vs Multi-class classification
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic
Regression
Binary Classification
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

56
Approaches
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • One-vs-one
Logistic
Regression
• Hierarchical
Binary Classification
Evaluation
• One-vs-all
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

57
One-vs-all
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic
Regression
• Class # (1):
Binary Classification
Evaluation
Multi-class
h(1) (x) = P(y = 1 | x, w 1 )
Classification

Softmax
Regression
• Class M (2):
Softmax Regression

h(2) (x) = P(y = 2 | x, w 2 )


Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
• Class ♦ (3):
Model vs. Data

h(3) (x) = P(y = 3 | x, w 3 )


Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

58
One-vs-all (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Learning
Logistic • Train a logistic regression classifier h(i) (x) for each class i to predict the
Regression
Binary Classification probability that y = i.
Evaluation
Multi-class
Classification
Prediction
Softmax • On a new input x, to make a prediction, pick the class i that maximizes
Regression
Softmax Regression
Cross Entropy vs. MSE
arg max(h(i) (x)) (29)
Capacity, i
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

59
Decision boundaries and decision regions
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Logistic regression for Iris dataset


Logistic
Regression
Binary Classification
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

60
Softmax Regression
• Softmax Regression
• Cross Entropy vs. MSE
Softmax Regression
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Concept 4
Logistic
Regression Softmax regression is a generalization of logistic regression that we can use for
Binary Classification
Evaluation
multi-class classification
Multi-class
Classification

Softmax
• In Softmax regression, we replace the sigmoid function by the so-called
Regression
Softmax Regression
softmax function φ(·) = {φ1 , ..., φC }.
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

62
Score function
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Concept 5
Logistic
Regression The score function f that maps the raw features to class scores.
Binary Classification

z = f (x; W , b) = W x + b (30)
Evaluation
Multi-class
Classification

Softmax
Regression input image
Softmax Regression
Cross Entropy vs. MSE

Capacity, 56
Overfitting and 0.2 -0.5 0.1 2.0 1.1 -96.8 cat score
Underfitting
Model Capacity 231
Model vs. Data
1.5 1.3 2.1 0.0 3.2 437.9 dog score
Bias-Variance
Tradeoff of Capacity 24
Regularization 0 0.25 0.2 -0.3 -1.2 61.95 ship score
Tradeoff of
Regularization 2

63
Score function (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Use bias trick W0 = b to represent the two parameters W , b as one


Logistic
Regression
Binary Classification
z = f (x; W ) = W x (31)
Evaluation
Multi-class
Classification new new
Softmax
Regression 1
Softmax Regression
Cross Entropy vs. MSE 56
0.2 -0.5 0.1 2.0 1.1 1.1 0.2 -0.5 0.1 2.0 56
Capacity,
Overfitting and 231
Underfitting 1.5 1.3 2.1 0.0 3.2 3.2 1.5 1.3 2.1 0.0 231
Model Capacity
Model vs. Data
24
Bias-Variance 0 0.25 0.2 -0.3 -1.2 -1.2 0 0.25 0.2 -0.3 24
Tradeoff of Capacity
Regularization
2
Tradeoff of 2
Regularization

64
Softmax function
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Concept 6
Logistic
Regression The softmax function converts a score vector z = (z1 , ..., zC ) to a discrete
distribution vector p = (p1 , ..., pC )
Binary Classification
Evaluation
Multi-class

e zi
Classification

Softmax
Regression
pi = P(y = i | z) = φi (z) = PC , i ∈ [1, ..., C] (32)
zj
Softmax Regression j=1 e
Cross Entropy vs. MSE

Capacity,
Overfitting and
where
Underfitting
Model Capacity
zi = w i0 + w i1 x1 + ... + w iD xD = w |i x (33)
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

65
Softmax function (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification score vectors


Logistic
Regression
Binary Classification
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression softmax
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

distribution vectors

66
Softmax function (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
input image
Logistic
Regression
Binary Classification
1
Evaluation
Multi-class
Classification 56 -96.8 0 cat
Softmax
Regression
Softmax Regression
231 437.9 1 dog
Cross Entropy vs. MSE

Capacity, 24 61.95 0 ship


Overfitting and
Underfitting
Model Capacity 2
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

67
Problem Statement
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Given D = {(x 1 , y1 )...(x N , yN )} where yn ∈ {1, ..., C}.


Logistic
Regression
• Denote t n is a one-hot encoding of yn (or target discrete distribution)
Binary Classification
Evaluation
• Learning goal: Find a softmax function φW (·) = {φ1 , ..., φC } that minimize
Multi-class
Classification

Softmax
N
arg min E (φW ) = arg min CE (p n , t n ) (34)
X
Regression
Softmax Regression
W W
Cross Entropy vs. MSE n=1
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

68
Cross Entropy
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Concept 7
Logistic
Regression Cross-entropy (CE) is a measure of the difference between two probability
distributions. The cross-entropy between a “true” distribution t = (t1 , ..., tC ) and
Binary Classification
Evaluation
Multi-class
Classification an estimated distribution p = (p1 , ..., pC ) is defined as
Softmax
Regression C
CE (p, t) = − ti log pi (35)
Softmax Regression
X
Cross Entropy vs. MSE

Capacity, i=1
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

69
Cross Entropy (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification p t
Logistic 0.45
Regression 0.4
0.35
Binary Classification

probability
0.3
Evaluation 0.25
Multi-class 0.2
Classification 0.15
0.1
Softmax 0.05
Regression 0
Softmax Regression A B C D E
Cross Entropy vs. MSE
class
Capacity,
Overfitting and
Underfitting

p = (0.1, 0.2, 0.4, 0.2, 0.1)


Model Capacity

→ CE (p, t) = 1.678
Model vs. Data
Bias-Variance
Tradeoff of Capacity t = (0.2, 0.4, 0.2, 0.1, 0.1)
Regularization
Tradeoff of
Regularization
• CE > 0
• CE (p, t) 6= CE (t, p)
• CE minimize if pi = ti , ∀i
70
MSE
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Concept 8
Logistic
Regression Mean squared error (MSE) is a measure of the average of the squares of the
errors.
Binary Classification
Evaluation
Multi-class C
1X
MSE (p, t) = (pi − ti )2 (36)
Classification

Softmax
C
Regression i=1
Softmax Regression
Cross Entropy vs. MSE

Capacity, • MSE ≥ 0
Overfitting and
Underfitting • MSE (p, t) = MSE (t, p)
Model Capacity
Model vs. Data
Bias-Variance
• MSE = 0 if pi = ti , ∀i
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

71
Cross Entropy vs. MSE
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Consider three “true” binary distributions p = (0.1, 0.9), (0.5, 0.5) and
Logistic (0.8, 0.2)
Regression
Binary Classification
Evaluation p = 0.1 p = 0.5 p = 0.8
Multi-class
6 6 6
Classification

Softmax
5 5 5
Regression
Softmax Regression 4 4 4
Cross Entropy vs. MSE
3 3 3
Capacity,
Overfitting and
Underfitting 2 (0.1log(q) + 0.9log(1 q)) 2 2 (0.8log(q) + 0.2log(1 q))
Model Capacity
1 1
(0.5log(q) + 0.5log(1 q)) 1
Model vs. Data
Bias-Variance (q 0.1)2 (q 0.5)2 (q 0.8)2
Tradeoff of Capacity 0 0 0
Regularization
Tradeoff of 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
Regularization

72
Learning Algorithm
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification 1. Initialize the weights W 0 (parameters) at t = 0


Logistic 2. For t = 1, 2, 3, ... do
Regression
Binary Classification
2.1 Compute the ouput distribution p n for each x n (n = 1...N)
Evaluation
Multi-class
Classification
p n = softmax(W t x n ) (37)
Softmax
Regression 2.2 Compute the gradient
Softmax Regression
Cross Entropy vs. MSE N
1 X
∇W E = (p n − t n )x |n (38)
Capacity, N
Overfitting and n=1
Underfitting
Model Capacity 2.3 Update the weights
Model vs. Data
Bias-Variance
Tradeoff of Capacity W t+1 = W t − η∇W E (39)
Regularization
Tradeoff of
Regularization Iterate the next step until W is not change
3. Return the final weights W

73
Example
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Softmax regression for Iris dataset


Logistic
Regression
Binary Classification Softmax Regression - Gradient Descent
Evaluation 0
1 0.30
Multi-class 2 2
Classification

Softmax 0.25
1
Regression
Softmax Regression 0.20

Cost
Cross Entropy vs. MSE 0

Capacity, 0.15
Overfitting and 1
Underfitting 0.10
Model Capacity
2
Model vs. Data
Bias-Variance
0.05
2 1 0 1 2 3 0 100 200 300 400 500
Tradeoff of Capacity Iterations
Regularization
Tradeoff of
Regularization

74
Capacity, Overfitting and Underfitting
• Model Capacity
• Model vs. Data
• Bias-Variance
• Tradeoff of Capacity
• Regularization
• Tradeoff of Regularization
Model Capacity
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic Concept 9
Regression
Model space
Binary Classification
Evaluation
Capacity is model complexity.
Multi-class

The most common ways to estimate the


Classification

Softmax
Regression
Softmax Regression
capacity of a model:
Cross Entropy vs. MSE
• VC dimension
Capacity,
Overfitting and • The number of parameters
Underfitting
Model Capacity
Model vs. Data
• The norm of parameters simple
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization complex

76
Model vs. Data
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Concept 10
Logistic
Regression Data is divided into three sets: training set, validation set and test set.
Binary Classification
Evaluation

Concept 11
Multi-class
Classification

Softmax
Regression Models can be too limited. We can’t find a function that fits the data well. This
Softmax Regression
Cross Entropy vs. MSE
is called underfitting.
Capacity,
Overfitting and
Underfitting Concept 12
Models can also be too rich. We don’t just model the data, but also the
Model Capacity
Model vs. Data

underlying noise. This is called overfitting.


Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

77
Model vs. Data (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Given the data set D = {(x1 , y1 ), ...(x10 , y10 )} shown in the following figure,
Logistic find the best regression function to the data
Regression
Binary Classification
Evaluation
Multi-class
Classification
1
Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity, y0
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
−1
Tradeoff of
Regularization

0 1
x
78
Model vs. Data (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Consider the simple hypothesis set


Logistic
Regression
Binary Classification
H1 = {h | y = h(x) = w0 + w 1 x} (40)
Evaluation
Multi-class
Classification

Softmax
1 1 M =1
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity, y0 y0
Overfitting and
Underfitting
Model Capacity
Model vs. Data −1 −1
Bias-Variance
Tradeoff of Capacity
Regularization 0 1 0 1
Tradeoff of x x
Regularization

79
Model vs. Data (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Consider the hypothesis set


Logistic
Regression
Binary Classification
H3 = {h | y = h(x) = w0 + w 1 x + w2 x 2 + w 3 x 3 } (41)
Evaluation

(note that H1 ⊂ H3 )
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE
1 1 M =3

Capacity,
Overfitting and
Underfitting y0 y0
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity −1 −1
Regularization
Tradeoff of
Regularization
0 1 0 1
x x

80
Model vs. Data (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Consider the hypothesis set


Logistic
Regression
Binary Classification
H9 = {h | y = h(x) = w 0 + w 1 x + w 2 x 2 + ... + w 9 x 9 } (42)
Evaluation

(note that H1 ⊂ H3 ⊂ H9 )
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE
1 1 M =9

Capacity,
Overfitting and
Underfitting y0 y0
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity −1 −1
Regularization
Tradeoff of
Regularization
0 1 0 1
x x

81
Model vs. Data (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic Which one 1 1 M =1


Regression
Binary Classification • Under-fitting
Evaluation y0 y0
Multi-class
Classification
• Over-fitting
Softmax • Appropriate −1 −1

Regression
Softmax Regression fitting 0
x
1 0
x
1

Cross Entropy vs. MSE

Capacity, M =3 M =9
1 1
Overfitting and
Underfitting
Model Capacity y0 y0
Model vs. Data
Bias-Variance
Tradeoff of Capacity −1 −1
Regularization
Tradeoff of
0 1 0 1
Regularization x x

82
Model vs. Data (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
M=1 M=3 M=9
Logistic
Regression w0 0.82 0.31 0.35
Binary Classification
Evaluation
w1 -1.27 7.99 232.37
Multi-class
Classification w2 -25.43 -5321.83
Softmax w3 17.37 48568.31
Regression
Softmax Regression w4 -231639.30
Cross Entropy vs. MSE
w5 640042.26
Capacity,
Overfitting and w6 -1061800.52
Underfitting
Model Capacity w7 1042400.18
Model vs. Data
Bias-Variance
w8 -557682.99
Tradeoff of Capacity
Regularization
w9 125201.43
Tradeoff of
Regularization

83
Model Performance
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification 1
Logistic Training
Regression
Binary Classification
Test
Evaluation
Multi-class
Classification

ERMS
Softmax 0.5
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data 0
Bias-Variance 0 3 M 6 9
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization
Figure 1: Graphs of the root-mean-square error evaluated on the training set and on an
independent test set for various values of M

84
What happen if increasing N
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification

Logistic
Regression 1 N = 15 1 N = 100
Binary Classification
Evaluation
Multi-class
Classification

Softmax
y0 y0
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity, −1 −1
Overfitting and
Underfitting
Model Capacity
Model vs. Data 0 1 0 1
Bias-Variance x x
Tradeoff of Capacity
Regularization
Tradeoff of
Figure 2: Using the M = 9 polynomial for N = 15 data points (left plot) and N = 100
Regularization
data points (right plot). We see that increasing the size of the data set reduces the
over-fitting problem

85
Errors in Learning Model
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Bias errors: error due to assumption in the model


Logistic
Regression
• High bias to signify underfitting
Binary Classification
Evaluation
• Variance errors: It measures the variability in the results given by model
Multi-class
Classification when the dataset is changed
Softmax • High variance to signify overfitting
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting Expected error = Bias + Variance + Irreducible Error (43)
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

86
Errors in Learning Model (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Concept 13
Logistic
Regression Given a learning model hH, Ai, we define the “average” hypothesis g(x)
Binary Classification

g(x) = ED [gD (x)] (44)


Evaluation
Multi-class
Classification

Softmax
Regression where gD (x) is the “best” hypothesis given the data set D
Softmax Regression
Cross Entropy vs. MSE • Bias of learning model
Capacity,
Overfitting and
Bias = Ex (g(x) − f (x))2 (45)
h i
Underfitting
Model Capacity
Model vs. Data

where f (x) is the “truth” function


Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization
• Variance of learning model

Variance = Ex ED (gD (x) − g(x))2 (46)


  

87
Errors in Learning Model (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

truth function
Classification

Logistic
Regression
Binary Classification
bias
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
variance
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
• Given many independent data sets D1 , D2 , ..., DK , we can estimate g(x) by
Regularization
K
1 X
g(x) ≈ gDi (x) (47)
K
i=1
88
Example: two learning models
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Consider a target function sine


Logistic
Regression
f : [−1, 1] →
(48)
Binary Classification
R
Evaluation
Multi-class
x → sin(πx)
Classification

Softmax
Regression
• We generate 100 data sets {Di } , i = 1, ..., 100, each containing N = 2 data
Softmax Regression
Cross Entropy vs. MSE
points, independently from the sinusoidal curve f (x) = sin(πx). For each
Capacity,
data set Di , we fit the data using one of two models
Overfitting and
Underfitting
• H0 : set of all lines of the form h(x) = b
Model Capacity
Model vs. Data
• H1 : set of all lines of the form h(x) = ax + b
Bias-Variance • Note that H0 ⊂ H1
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

89
Example: two learning models (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• Given a data set Di = {(x1 , y1 ), (x2 , y2 )}.
Logistic • For H0 , we choose the constant hypothesis that best fits the data (the
horizontal line at the midpoint, b = (y1 + y2 )/2).
Regression
Binary Classification
Evaluation
Multi-class
• For H1 , we choose the line that passes through the two data points
Classification
(x1 , y1 ) and (x2 , y2 ).
Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

90
Example: two learning models (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Repeating this process with 100 data sets {Di } , i = 1, ..., 100,
Logistic
Regression
Binary Classification
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

91
Example: two learning models (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • The bias-variance for each learning model


Logistic
Regression
Binary Classification
Evaluation
Multi-class
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
bias=0.50 bias=0.21
Tradeoff of
Regularization var=0.25 var=1.69

92
Generalization and Capacity
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification The criteria determining how well a machine learning model will perform:
Logistic 1. Make the training error small.
Regression
Binary Classification
Evaluation
2. Make the gap between training and test (generalization) error small.
Multi-class
Classification

Softmax Training error


Regression Under-fitting Over-fitting
Softmax Regression Generalization error
Cross Entropy vs. MSE

Capacity,
Error

Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Generalization gap
Regularization
Tradeoff of
Regularization 0 Optimal Capacity
Capacity

93
Regularization
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification Concept 14
Logistic
Regression Regularization is any modification we make to a learning model that is intended
Binary Classification
Evaluation
to reduce its generalization error but not its training error.
Multi-class
Classification
performance measure
Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
algorithm
Model Capacity
Model vs. Data
Bias-Variance
data
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

model

94
Addressing
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
High bias High variance
Logistic
Regression Obtain more features Decrease number of features
Binary Classification
Evaluation
Decrease regularization λ Increase regularization λ
Multi-class
Classification Extend model Obtain more data
Softmax Train longer Stop early
Regression
Softmax Regression New model architecture New model architecture
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

95
Regularization for Linear Regression
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Using regularized MSEtrain for linear regression


Logistic
Regression
  
1 T
Binary Classification
w = arg min MSEtrain + λ w w (49)
Evaluation
Multi-class
w N
Classification

Softmax
Regression
where λ is the regularization coefficient (hyper-parameter) that controls the
Softmax Regression relative importance of the data-dependent error MSEtrain and the
1 T
regularization term λ N w w
Cross Entropy vs. MSE 
Capacity,
Overfitting and
Underfitting
• Solving for w, we obtain
Model Capacity

w = (X | X + λI)−1 X | y (50)
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

96
Example: one learning model
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification • Consider a target function sine


Logistic
Regression
f : [−1, 1] →
(51)
Binary Classification
R
Evaluation
Multi-class
x → sin(2πx)
Classification

Softmax
Regression
• We generate 100 data sets {Di } , i = 1, ..., 100, each containing N = 25 data
Softmax Regression
Cross Entropy vs. MSE
points, independently from the sinusoidal curve f (x) = sin(2πx). For each
Capacity,
data set Di , we fit a model with 24 Gaussian basis functions by minimizing
Overfitting and
Underfitting
the regularized error function
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization

97
Example: one learning model (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• Illustration of the dependence of bias and variance on model regularization
Logistic coefficient
Regression
Binary Classification
Evaluation
Multi-class y y y
Classification

Softmax
Regression
Softmax Regression
Cross Entropy vs. MSE

Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity y y y
Regularization
Tradeoff of
Regularization

98
Example: one learning model (cont.)
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model

Classification
• Summary of the dependence of bias and variance on model regularization
Logistic coefficient
Regression
Binary Classification 0.15
Evaluation
Multi-class
Classification
(bias)2
0.12 variance
Softmax
Regression (bias)2 + variance
Softmax Regression
0.09 test error
Cross Entropy vs. MSE

Capacity,
Overfitting and 0.06
Underfitting
Model Capacity
Model vs. Data
0.03
Bias-Variance
Tradeoff of Capacity
Regularization
Tradeoff of
0
Regularization −3 −2 −1 0 1 2
ln λ

99
References

Goodfellow, I., Bengio, Y., and Courville, A. (2016).


Deep learning.
MIT press.
Lê, B. and Tô, V. (2014).
Cở sở trí tuệ nhân tạo.
Nhà xuất bản Khoa học và Kỹ thuật.
Russell, S. and Norvig, P. (2021).
Artificial intelligence: a modern approach.
Pearson Education Limited.

You might also like