Lect03 Linear Model ML

LINEAR MODEL
Bùi Tiến Lên
2023
Contents
1. Linear Regression
2. Classification
3. Logistic Regression
4. Softmax Regression
5. Capacity, Overfitting and Underfitting

Notation
Linear
Regression
Simple Linear Model
Weighted Linear Model
Linear Basis Function
Model
Classification
symbol meaning
Logistic
Regression a, b, c, N . . . scalar number
Binary Classification
Evaluation
w, v, x, y . . . column vector
Multi-class
Classification
X, Y . . . matrix operator meaning
Softmax R set of real numbers w| transpose
set of integer numbers matrix multiplication
Regression
Softmax Regression
Z XY
Cross Entropy vs. MSE
N set of natural numbers X −1 inverse
RD
Capacity,
Overfitting and set of vectors
set
Underfitting
Model Capacity
X , Y, . . .
Model vs. Data
Bias-Variance
A algorithm
Tradeoff of Capacity
Regularization
Tradeoff of
Regularization
3
Learning diagram
Linear
Regression
Simple Linear Model
Model
Classification
Logistic
Regression
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
4
Linear Regression
• Simple Linear Model
• Weighted Linear Model
• Linear Basis Function Model
Problem 1
Linear
Regression
Simple Linear Model
Model
Classification • Consider the Advertising data set Dtrain consists of the sales of that product
Logistic in 200 different markets, along with advertising budgets for the product in
Regression
Binary Classification each of those markets for the media TV. Find the relationship between TV
(input) and sales (output)
Evaluation
Multi-class
Classification
Softmax
25
25
Regression
Softmax Regression
20
20
Capacity,
Overfitting and
Sales
Sales
15
15
Underfitting
Model Capacity
Model vs. Data
10
10
Bias-Variance
Regularization
5
5
Tradeoff of
Regularization
0 50 100 200 300 0 50 100 200 300
TV TV
6
Linear Regression Model
Linear
Regression
Simple Linear Model
Model
Classification Concept 1
Logistic
Regression A linear regression is a model that assumes a linear relationship between
Evaluation
inputs and the output.
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
7
Problem Statement
Linear
Regression
Simple Linear Model
Model
Classification • The requirement is to build a system that can take a vector x ∈ RD+1 as
Logistic input and predict the value of a scalar y ∈ R as its output
Regression
Evaluation
• The hypothesis set H
Multi-class
Classification
Softmax
y ≈ ŷ = hw (x) = w | x (1)
Regression
where ŷ be the value that our model (function) predicts y and w ∈ RD+1 is
Softmax Regression
Capacity, a vector of parameters of the model

Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
8
Problem Statement (cont.)
Linear
Regression
Simple Linear Model
Model
Classification
• Task T : to predict y from x by outputting ŷ = hw (x) = w | x
Logistic • The train set Dtrain denoted as (X, y) including N samples
Regression
Binary Classification {(x 1 , y1 ), (x 2 , y2 ) . . . (x N , yN )}, construct the matrix X and the vectors y and
ŷ
Evaluation
Multi-class
x1 y1 ŷ1
Classification  |     
Softmax
Regression  x |2   y2   ŷ2 
Softmax Regression
X =  .  , y =  . , ŷ =  .  (2)
     
 .  . .
 .   . .
Capacity,
Overfitting and x| yN ŷN
Underfitting
| {z N } | {z } | {z }
target vector output vector
Model Capacity
Model vs. Data
input data matrix
Bias-Variance
Regularization
Tradeoff of
Regularization
9
Linear
Regression
Simple Linear Model
Model
Classification
• Performance measure P:
Logistic
Regression Concept 2
Evaluation The mean squared error MSEtrain of the model on the train set Dtrain
Multi-class
Classification
Softmax N
1 1 X
Regression
MSEtrain = kŷ − yk2 = (ŷn − yn )2 (3)
Softmax Regression
N N
n=1
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
10
Linear
Regression
Simple Linear Model
Model
Classification
Logistic
Regression
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
11
Linear
Regression
Simple Linear Model
Model
Classification
• The learning goal: find the vector of parameter w such that
Logistic
Regression w = arg min(MSEtrain ) (4)
Binary Classification w
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
12
Solving Problem
Linear
Regression
Simple Linear Model
Model
Classification Solution
Logistic • Compute the gradient of MSEtrain
Regression
Evaluation
Multi-class
∇w (MSEtrain ) = ∇w (w | X | Xw − w | X | y − y | Xw + y | y)
Classification
Softmax
= 2X | Xw − 2X | y (5)
Regression
Softmax Regression
• If MSEtrain reach the min value then ∇w (MSEtrain ) = 0
Capacity,
Overfitting and
Underfitting ∇w (MSEtrain ) = 0
X | Xw − X | y
Model Capacity
Model vs. Data = 0
Bias-Variance
Tradeoff of Capacity X Xw
|
= X |y
Regularization
Tradeoff of
Regularization w = (X | X)−1 X | y (6)
13
Solving Problem (cont.)
Linear
Regression
Simple Linear Model
Model
Classification
Logistic
Regression
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
14
Programming Example
Linear
Regression
Simple Linear Model
Model
Classification • Use seaborn to read tips dataset and find the linear relationship between
Logistic total_bill and tip
Regression
Evaluation
import numpy as np
Multi-class import seaborn as sns
Classification
import matplotlib . pyplot as plt
Softmax
Regression
Softmax Regression sns. set_style (" darkgrid ")
tips = sns. load_dataset ("tips")
Capacity,
Overfitting and
sns. regplot (x=" total_bill ", y="tip", data=tips , ci=None , line_kws
Underfitting ={ 'color ':'red '})
Model Capacity
Model vs. Data
plt.show ()
Bias-Variance
Regularization
Tradeoff of
Regularization
15
Programming Example (cont.)
Linear
Regression
Simple Linear Model
Model
Classification
10
Logistic
Regression
Evaluation
Multi-class 8
Classification
Softmax
Regression
Softmax Regression 6
tip
Capacity,
Overfitting and
Underfitting
4
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity 2
Regularization
Tradeoff of
Regularization
10 20 30 40 50
total_bill
16
Word Example
Linear
Regression
Simple Linear Model
Model
Classification 1. Find the linear regression function y = f (x) = w0 + w1 x given the following
Logistic data set D
Regression
Binary Classification input x target y
1 2
Evaluation
Multi-class
2 3
Classification
Softmax
Regression 3 3
4 5
Softmax Regression
Capacity,
Overfitting and 2. Find the linear regression function y = f (x) = f (x1 , x2 ) = w0 + w1 x1 + w2 x2
Underfitting
Model Capacity
given the following data set D
Model vs. Data
Bias-Variance
input x target y
Regularization
(1, 1) 1
Tradeoff of
Regularization
(2, 3) 3
(3, 4) 4
(4, 3) 5
17
Discussion
Linear
Regression
Simple Linear Model
Model
Classification • D is a large number

Logistic
Regression
• Online learning
Evaluation
• Limitations of the model (hypothesis set)
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
18
Linear
Regression
Simple Linear Model
Model
Classification • In some cases the observations may be weighted; for example, they may not
Logistic be equally reliable. In this case, we find the vector of parameters w to
Regression
Binary Classification minimize the weighted sum of squares of errors
Evaluation
Multi-class
Classification N
Etrain = an (ŷn − yn )2 (7)
X
Softmax
Regression
Softmax Regression n=1
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
19
Solving Problem
Linear
Regression
Simple Linear Model
Model
Classification
Logistic 1. Construct the matrices X, A and the vector y

Regression
x1 a1 0 · · · 0 y1
 |     
Evaluation
 0 a2 · · · 0  y2
Multi-class
Classification  x |2   
Softmax X =  . , A =  . .. , y = (8)
     
.. . . ..
Regression  ..   ..

Softmax Regression
. . .   . 
x |N 0 0 · · · aN yN
Capacity, | {z } | {z } | {z }
Overfitting and input data matrix weight matrix target vector
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
2. Calculate the vector of parameters
w = (X | AX)−1 X | Ay (9)
Regularization
Tradeoff of
Regularization
20
Linear in What?
Linear
Regression
Simple Linear Model
Model
Classification • Linearity in the weights

Logistic
Regression
hw (x) = w0 + w 1 x1 + ... + w D xD (10)
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
21
Linear Basis Function Models
Linear
Regression
Simple Linear Model
Model
Logistic
Regression A linear basis function model is a linear combination of fixed nonlinear
Evaluation
functions of the input variables
Multi-class
Classification
Softmax hw (x) = w0 φ0 (x) + w1 φ1 (x) + ... + wM φM (x) (11)

Regression
where φi (x) are basis functions

Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
22
Some types of basis functions
Linear
Regression
Simple Linear Model
Model
Classification • Polynomial
Logistic
Regression
Evaluation
1
Multi-class
Classification
φj (x) = x j (12)
0.5
Softmax
Regression
Softmax Regression
0
Capacity,
Overfitting and −0.5
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
−1
−1 0 1
Regularization
Tradeoff of
Regularization
23
Some types of basis functions (cont.)
Linear
Regression
Simple Linear Model
Model
Classification • Gaussian
Logistic
Regression
Evaluation
1
Multi-class
Classification
!
0.75 (x − µj )2
Softmax φj (x; µj , sj ) = exp − (13)
Regression
Softmax Regression
sj2
0.5
Capacity,
Overfitting and 0.25
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
0
−1 0 1
Regularization
Tradeoff of
Regularization
24
Some types of basis functions (cont.)
Linear
Regression
Simple Linear Model
Model
Classification • Sigmoid
Logistic
Regression
Evaluation
1
1
Multi-class
Classification φj (x; µj , sj ) = x−µj (14)
0.75 −
Softmax
Regression
1+e sj
Softmax Regression
0.5
Capacity,
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
0
−1 0 1
Regularization
Tradeoff of
Regularization
25
Problem 2
Linear
Regression
Simple Linear Model
Model
Classification
• Find the relationship between Years of Education and Income based on
Logistic the given data
Regression
Evaluation
Multi-class
Classification
80
80
Softmax
Regression
70
70
Softmax Regression
60
60
Capacity,
Income
Income
Overfitting and
50
50
Underfitting
40
40
Model Capacity
Model vs. Data
Bias-Variance
30
30
Regularization
20
20
Tradeoff of
Regularization
10 12 14 16 18 20 22 10 12 14 16 18 20 22
Years of Education Years of Education
26
Problem Statement
Linear
Regression
Simple Linear Model
Model
Classification • The requirement is to build a system that can take a vector x ∈ RD as

Logistic input and predict the value of a scalar y ∈ R as its output
Regression
Evaluation
• The hypothesis set H
Multi-class
Classification
Softmax
y ≈ ŷ = hw (x) = w | φ(x) (15)
Regression
where ŷ be the value that our model (function) predicts y, w ∈ RM+1 is a

Softmax Regression
Capacity, vector of parameters of the model and φ is a set of M + 1 basis functions

Overfitting and
Underfitting  
Model Capacity
Model vs. Data
φ0 (x)
Bias-Variance  φ1 (x) 
φ(x) =  (16)
 
Regularization .. 
Tradeoff of
Regularization
 . 
φM (x)
27
Linear
Regression
Simple Linear Model
Model
Classification
• Task T : to predict y from x by outputting ŷ = hw (x) = w | φ(x)
Logistic • Performance measure P:
Regression
Binary Classification The mean squared error MSEtrain of the model on the train set Dtrain
including N samples {(x 1 , y1 ), (x 2 , y2 ) . . . (x N , yN )}
Evaluation
Multi-class
Classification
Softmax • The learning goal: find the vector of parameter w such that
Regression
Softmax Regression
Cross Entropy vs. MSE w = arg min(MSEtrain )
w
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
28
Solving Problem
Linear
Regression
Simple Linear Model
Model
Classification
Logistic 1. Construct the matrix Φ and the vector y

Regression
φ(x 1 )| y1
   
Evaluation
 y2
Multi-class
Classification  φ(x 2 )|  
Softmax Φ= , y =  .. (17)
   
Regression
.. 
Softmax Regression
 .   . 
φ(x N ) | yN
Capacity, | {z } | {z }
Overfitting and design matrix target vector
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
2. Calculate the vector of parameters
w = (Φ| Φ)−1 Φ| y (18)

Regularization
Tradeoff of
Regularization
29
Programming Example
Linear
Regression
Simple Linear Model
Model
Classification import numpy as np

Logistic import seaborn as sns
Regression
import matplotlib . pyplot as plt
Evaluation
Multi-class
Classification
sns. set_style (" darkgrid ")
Softmax
x = [1, 2, 3, 4, 5, 8, 10]
Regression y = [1.1 , 3.8, 8.5, 16, 24, 65, 99.2]
Softmax Regression
sns. regplot (x, y, order =2, ci=None , line_kws ={'color ':'red '})
Capacity,
plt. xlabel ('x')
Overfitting and plt. ylabel ('y')
Underfitting
Model Capacity
plt.show ()
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
30
Linear
Regression
Simple Linear Model
Model
Classification
100
Logistic
Regression
Evaluation
80
Multi-class
Classification
Softmax
Regression 60
Softmax Regression
Cross Entropy vs. MSE y
Capacity, 40
Overfitting and
Underfitting
Model Capacity
Model vs. Data 20
Bias-Variance
Regularization
Tradeoff of 0
Regularization
1 2 3 4 5 6 7 8 9 10
x
31
Word Example
Linear
Regression
Simple Linear Model
Model
Classification • Find a polynomial regression function y = f (x) = w0 + w1 x + w2 x 2 given the

Logistic data set D
Regression
Binary Classification input x target y
1 2
Evaluation
Multi-class
Classification
Softmax
2 3
Regression
Softmax Regression
3 3
Cross Entropy vs. MSE 4 5
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
32
Puzzle
Linear
Regression
Simple Linear Model
Model
Classification • What basis functions?

Logistic
Regression
Evaluation 1.0
Multi-class
Classification
Softmax
Regression 0.5
Softmax Regression
Capacity, 0.0
Overfitting and
Underfitting
y
Model Capacity
Model vs. Data
0.5
Bias-Variance
Regularization 1.0
Tradeoff of
Regularization
1.0 0.5 0.0 0.5 1.0
x
33
Classification
A real data set
Linear
Regression
Simple Linear Model
Model
Classification • Some 16-by-16 pixel grayscale image from the MNIST database
Logistic
Regression
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
35
Input representation
Linear
Regression
Simple Linear Model
Model
Classification Input representation or feature extraction

Logistic • “raw” input
Regression
pixels x | = x0 x1 . . . x256

linear model w | = w0 w1 . . . w 256

Evaluation
Multi-class
Classification
Softmax
• Feature extraction: extract useful information
intensity and symmetry x | = x1 x2

Regression
linear model w | = w1 w 2
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
36
Illustration of Features
Linear
Regression
Simple Linear Model
Model
Classification
Logistic
Regression
Evaluation
Multi-class
2x
Classification
Softmax
symmetry
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
intensity x1
37
The Problem with Categorical Data
Linear
Regression
Simple Linear Model
Model
Classification • Some algorithms can work with categorical data directly.

Logistic
Regression
• For example, a decision tree can be learned directly from categorical
Evaluation
data with no data transform required
Multi-class
Classification • Many machine learning algorithms cannot operate on label data directly.
Softmax • They require all input variables and output variables to be numeric.
Regression
Softmax Regression
• There are two common types of conversion: integer encoding and one-hot
Capacity, encoding
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
38
Integer Encoding
Linear
Regression
Simple Linear Model
Model
Classification • For categorical variables where no such ordinal relationship exists, the integer
Logistic encoding is not enough.
Regression
Evaluation
Multi-class
id color id color
Classification
1 red 1 1
Softmax
Regression 2 green 2 2
Softmax Regression
Cross Entropy vs. MSE 3 blue Integer 3 3
Capacity,
4 red encoding 4 1
Overfitting and
Underfitting
Model Capacity
… … … …
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
39
One-Hot Encoding
Linear
Regression
Simple Linear Model
Model
Classification • One-hot encoding ensures that machine learning does not assume that higher
Logistic numbers are more important.
Regression
Evaluation
Multi-class
id color id color_red color_green color_blue
Classification
1 red 1 1 0 0
Softmax
Regression 2 green 2 0 1 0
Softmax Regression
Cross Entropy vs. MSE 3 blue One-hot 3 0 0 1
Capacity,
4 red encoding 4 1 0 0
Overfitting and
Underfitting
Model Capacity
… … … … … …
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
40
Classifier Training
Linear
Regression
Simple Linear Model
Model
Classification
• Select the learning model for classifier, e.g., Perceptron
Logistic • Train the classifier/model using a training set D = {(x 1 , y1 ), ..., (x N , yN )}
Regression
Evaluation
Multi-class
Classification
training
set sample
Softmax
Regression
Softmax Regression input set
Cross Entropy vs. MSE ... ...
get
Capacity, next sample Model
... ...
Overfitting and training
Underfitting set
Model Capacity ... ...
Model vs. Data
Bias-Variance ... ... update
model cost value
Regularization
Tradeoff of
Regularization No statisfied? Yes Model
features target
41
Logistic Regression
• Binary Classification
• Evaluation
• Multi-class Classification
A Third Linear Model
Linear
Regression
Simple Linear Model
Model
Classification
d
Logistic
s= wi xi
X
Regression
i=0
Evaluation
Multi-class
linear classification linear regression logistic regression

Classification
Softmax
Regression h(x) = sign(s) h(x) = s h(x) = σ(s)
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
43
The logistic function
Linear
Regression
Simple Linear Model
Model
Classification
• The formula
Logistic 1
Regression
1
σ(s) = (19)
Evaluation
Multi-class
1 + e −s 0.75
Classification
Softmax • The logistic function converts a σ

0.5
Regression
Softmax Regression score to a probability
Capacity,
• Properties 0.25
Overfitting and
Underfitting σ(−s) = 1 − σ(s)
Model Capacity −6 −4 −2 0 2 4 6
Model vs. Data s
Bias-Variance
Regularization
σ 0 (s) = σ(s)(1 − σ(s))
Tradeoff of
Regularization
44
Probability Interpretation
Linear
Regression
Simple Linear Model
Model
Classification • h(x) = σ(s) can be interpreted as a probability

Logistic • For example, prediction of heart attacks
Regression
Binary Classification • Input x: cholesterol level, age, weight, etc.
Evaluation
Multi-class • The signal s = w T x: risk score
Classification
Softmax
• σ(s): probability of a heart attack
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
score probability of heart attack
45
Problem Statement
Linear
Regression
Simple Linear Model
Model
Classification • The target function f is the probability distribution

Logistic
Regression
f : RD → [0, 1]
Evaluation
• Hypothesis set hw (x) = σ(w T x) and the conditional probability

Multi-class
Classification
Softmax
Regression
hw (x) for y = 1

P(y | x, w) = (20)
Softmax Regression
1 − hw (x) for y = 0
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
46
Error measure
Linear
Regression
Simple Linear Model
Model
Classification We define error measurement based on likelihood

Logistic • For each (x, y), y is generated by probability hw (x).
Regression
Evaluation
• Plausible error measure based on likelihood of y given x and w
Multi-class
P(y | x, w) = z y (1 − z)1−y
Classification
Softmax
(21)
Regression
where
Softmax Regression
Capacity, z = hw (x) = σ(w | x) (22)

Overfitting and
Underfitting
Model Capacity
• Likelihood of D = {(x 1 , y1 )...(x N , yN )} given w is
Model vs. Data
Bias-Variance
Tradeoff of Capacity N
P(yn | x n , w) (23)
Y
Regularization
Tradeoff of
Regularization
n=1
47
Error measure (cont.)
Linear
Regression
Simple Linear Model
Model
Classification
• Learning goal: maximizing likelihood
Logistic QN
Regression Maximize P(y | x , w)
Binary Classification QN n n
n=1
Evaluation
⇔ Minimize − log n=1 P(yn | x n , w) (24)
⇔ Minimize − N
Multi-class
n=1 (yn log zn + (1 − yn ) log(1 − zn ))

Classification
P
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
48
Learning Algorithm (Gradient Descent)
Linear
Regression
Simple Linear Model
Model
Classification 1. Initialize the weights (parameters) at t = 0 w 0

Logistic 2. For t = 1, 2, 3, ... do
Regression
2.1 Compute the outputs zn for each x n (n = 1, ..., N)
Evaluation
Multi-class
Classification
zn = σ(w T
t xn) (25)
Softmax
Regression 2.2 Compute the gradient
Softmax Regression
Cross Entropy vs. MSE N
1 X
∇w E = x n (zn − yn ) (26)
Capacity, N
Overfitting and n=1
Underfitting
Model Capacity 2.3 Update the weights
Model vs. Data
Bias-Variance
Tradeoff of Capacity w t+1 = w t − η∇w E (27)
Regularization
Tradeoff of
Regularization where η is a learning rate (hyper-parameter)
Iterate the next step until w is not changes
3. Return the final weights w
49
Learning Rate
Linear
Regression
Simple Linear Model
Model
Classification • How η affects the algorithm?

Logistic
Regression
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
50
Programming Example
Linear
Regression
Simple Linear Model
Model
Classification import matplotlib . pyplot as plt

Logistic import seaborn as sns
Regression
sns.set(style=" darkgrid ")
Evaluation
Multi-class
Classification
# Load the example titanic dataset
Softmax
df = sns. load_dataset (" titanic ")
Regression
Softmax Regression
# Make a custom palette with gendered colors
Capacity,
pal = dict(male=" #6495 ED", female ="# F08080 ")
Overfitting and
Underfitting
Model Capacity
# Show the survival proability as a function of age and sex
Model vs. Data g = sns. lmplot (x="age", y=" survived ", col="sex", hue="sex", data=df ,
Bias-Variance
palette =pal , y_jitter =.02 , logistic =True , ci=None)
Regularization g.set(xlim =(0, 80) , ylim =( -.05 , 1.05))
Tradeoff of
Regularization plt.show ()
51
Linear
Regression
Simple Linear Model
Model
Classification
sex = male sex = female
Logistic 1.0
Regression
Evaluation
Multi-class
0.8
Classification
Softmax
Regression 0.6
survived
Softmax Regression
0.4
Capacity,
Overfitting and
Underfitting
Model Capacity
0.2
Model vs. Data
Bias-Variance
Tradeoff of Capacity 0.0
Regularization
Tradeoff of 0 10 20 30 40 50 60 70 80 0 10 20 30 40 50 60 70 80
Regularization age age
52
Evaluation
Linear
Regression
Simple Linear Model
Model
Classification • Consider two-class problem with two classes ⊕ and

Logistic
Regression
• The performance of logistic regression model is based on a threshold th
y is ⊕ if P(y | x) ≥ th
Evaluation
Multi-class
Classification
Softmax y is if P(y | x) < th (28)

Regression
Softmax Regression
Capacity,
Overfitting and
• High threshold: high specificity, low sensitivity
Underfitting
Model Capacity
• Low threshold: low specificity, high sensitivity
Model vs. Data
Bias-Variance
• We should select the best threshold for the trade-off between the cost of false
positives vs false negatives
Regularization
Tradeoff of
Regularization
53
ROC Curve
Linear
Regression
Simple Linear Model
Model
Classification • The receiver operating characteristic (ROC) curve is plot which shows the
Logistic performance of a binary classifier as function of its cut-off threshold.
Regression
Binary Classification • It essentially shows the true positive rate (sensitivity) against the false
positive rate (1-specificity) for various threshold values.
Evaluation
Multi-class
Classification
Softmax
• The area under the curve (AUC) is an aggregated measure of performance.
Regression
1.00
Softmax Regression 1.00
Capacity, 0.75
Overfitting and
0.75
Underfitting
sensitivity
Model Capacity
0.50 0.50
p
Model vs. Data

Bias-Variance
Regularization 0.25 0.25
Tradeoff of
Regularization
0.00 0.00
−4 −2 0 2 0.00 0.25 0.50 0.75 1.00

x 1 − specificity
54
Multi-class classification problems
Linear
Regression
Simple Linear Model
Model
Classification • Email foldering/tagging: Work (1), Friends (2), Family (3), Hobby (4)
Logistic
Regression
• Medical diagrams: Not ill, Cold, Flu
Evaluation
• Weather: Sunny, Cloudy, Rain, Snow
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
55
Visual of Binary vs Multi-class classification
Linear
Regression
Simple Linear Model
Model
Classification
Logistic
Regression
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
56
Approaches
Linear
Regression
Simple Linear Model
Model
Classification • One-vs-one
Logistic
Regression
• Hierarchical
Evaluation
• One-vs-all
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
57
One-vs-all
Linear
Regression
Simple Linear Model
Model
Classification
Logistic
Regression
• Class # (1):
Evaluation
Multi-class
h(1) (x) = P(y = 1 | x, w 1 )
Classification
Softmax
Regression
• Class M (2):
Softmax Regression
h(2) (x) = P(y = 2 | x, w 2 )

Capacity,
Overfitting and
Underfitting
Model Capacity
• Class ♦ (3):
Model vs. Data
h(3) (x) = P(y = 3 | x, w 3 )

Bias-Variance
Regularization
Tradeoff of
Regularization
58
One-vs-all (cont.)
Linear
Regression
Simple Linear Model
Model
Classification Learning
Logistic • Train a logistic regression classifier h(i) (x) for each class i to predict the
Regression
Binary Classification probability that y = i.
Evaluation
Multi-class
Classification
Prediction
Softmax • On a new input x, to make a prediction, pick the class i that maximizes
Regression
Softmax Regression
arg max(h(i) (x)) (29)
Capacity, i
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
59
Decision boundaries and decision regions
Linear
Regression
Simple Linear Model
Model
Classification • Logistic regression for Iris dataset

Logistic
Regression
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
60
Softmax Regression
• Softmax Regression
• Cross Entropy vs. MSE
Softmax Regression
Linear
Regression
Simple Linear Model
Model
Logistic
Regression Softmax regression is a generalization of logistic regression that we can use for
Evaluation
multi-class classification
Multi-class
Classification
Softmax
• In Softmax regression, we replace the sigmoid function by the so-called
Regression
Softmax Regression
softmax function φ(·) = {φ1 , ..., φC }.
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
62
Score function
Linear
Regression
Simple Linear Model
Model
Logistic
Regression The score function f that maps the raw features to class scores.
z = f (x; W , b) = W x + b (30)
Evaluation
Multi-class
Classification
Softmax
Regression input image
Softmax Regression
Capacity, 56
Overfitting and 0.2 -0.5 0.1 2.0 1.1 -96.8 cat score
Underfitting
Model Capacity 231
Model vs. Data
1.5 1.3 2.1 0.0 3.2 437.9 dog score
Bias-Variance
Tradeoff of Capacity 24
Regularization 0 0.25 0.2 -0.3 -1.2 61.95 ship score
Tradeoff of
Regularization 2
63
Score function (cont.)
Linear
Regression
Simple Linear Model
Model
Classification • Use bias trick W0 = b to represent the two parameters W , b as one

Logistic
Regression
z = f (x; W ) = W x (31)
Evaluation
Multi-class
Classification new new
Softmax
Regression 1
Softmax Regression
Cross Entropy vs. MSE 56
0.2 -0.5 0.1 2.0 1.1 1.1 0.2 -0.5 0.1 2.0 56
Capacity,
Overfitting and 231
Underfitting 1.5 1.3 2.1 0.0 3.2 3.2 1.5 1.3 2.1 0.0 231
Model Capacity
Model vs. Data
24
Bias-Variance 0 0.25 0.2 -0.3 -1.2 -1.2 0 0.25 0.2 -0.3 24
Regularization
2
Tradeoff of 2
Regularization
64
Softmax function
Linear
Regression
Simple Linear Model
Model
Logistic
Regression The softmax function converts a score vector z = (z1 , ..., zC ) to a discrete
distribution vector p = (p1 , ..., pC )
Evaluation
Multi-class
e zi
Classification
Softmax
Regression
pi = P(y = i | z) = φi (z) = PC , i ∈ [1, ..., C] (32)
zj
Softmax Regression j=1 e
Capacity,
Overfitting and
where
Underfitting
Model Capacity
zi = w i0 + w i1 x1 + ... + w iD xD = w |i x (33)
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
65
Softmax function (cont.)
Linear
Regression
Simple Linear Model
Model
Classification score vectors

Logistic
Regression
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression softmax
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
distribution vectors
66
Softmax function (cont.)
Linear
Regression
Simple Linear Model
Model
Classification
input image
Logistic
Regression
1
Evaluation
Multi-class
Classification 56 -96.8 0 cat
Softmax
Regression
Softmax Regression
231 437.9 1 dog
Capacity, 24 61.95 0 ship

Overfitting and
Underfitting
Model Capacity 2
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
67
Problem Statement
Linear
Regression
Simple Linear Model
Model
Classification • Given D = {(x 1 , y1 )...(x N , yN )} where yn ∈ {1, ..., C}.

Logistic
Regression
• Denote t n is a one-hot encoding of yn (or target discrete distribution)
Evaluation
• Learning goal: Find a softmax function φW (·) = {φ1 , ..., φC } that minimize
Multi-class
Classification
Softmax
N
arg min E (φW ) = arg min CE (p n , t n ) (34)
X
Regression
Softmax Regression
W W
Cross Entropy vs. MSE n=1
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
68
Cross Entropy
Linear
Regression
Simple Linear Model
Model
Logistic
Regression Cross-entropy (CE) is a measure of the difference between two probability
distributions. The cross-entropy between a “true” distribution t = (t1 , ..., tC ) and
Evaluation
Multi-class
Classification an estimated distribution p = (p1 , ..., pC ) is defined as
Softmax
Regression C
CE (p, t) = − ti log pi (35)
Softmax Regression
X
Capacity, i=1
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
69
Cross Entropy (cont.)
Linear
Regression
Simple Linear Model
Model
Classification p t
Logistic 0.45
Regression 0.4
0.35
probability
0.3
Evaluation 0.25
Multi-class 0.2
Classification 0.15
0.1
Softmax 0.05
Regression 0
Softmax Regression A B C D E
class
Capacity,
Overfitting and
Underfitting
p = (0.1, 0.2, 0.4, 0.2, 0.1)

Model Capacity

→ CE (p, t) = 1.678
Model vs. Data
Bias-Variance
Tradeoff of Capacity t = (0.2, 0.4, 0.2, 0.1, 0.1)
Regularization
Tradeoff of
Regularization
• CE > 0
• CE (p, t) 6= CE (t, p)
• CE minimize if pi = ti , ∀i
70
MSE
Linear
Regression
Simple Linear Model
Model
Logistic
Regression Mean squared error (MSE) is a measure of the average of the squares of the
errors.
Evaluation
Multi-class C
1X
MSE (p, t) = (pi − ti )2 (36)
Classification
Softmax
C
Regression i=1
Softmax Regression
Capacity, • MSE ≥ 0
Overfitting and
Underfitting • MSE (p, t) = MSE (t, p)
Model Capacity
Model vs. Data
Bias-Variance
• MSE = 0 if pi = ti , ∀i
Regularization
Tradeoff of
Regularization
71
Linear
Regression
Simple Linear Model
Model
Classification • Consider three “true” binary distributions p = (0.1, 0.9), (0.5, 0.5) and
Logistic (0.8, 0.2)
Regression
Evaluation p = 0.1 p = 0.5 p = 0.8
Multi-class
6 6 6
Classification
Softmax
5 5 5
Regression
Softmax Regression 4 4 4
3 3 3
Capacity,
Overfitting and
Underfitting 2 (0.1log(q) + 0.9log(1 q)) 2 2 (0.8log(q) + 0.2log(1 q))
Model Capacity
1 1
(0.5log(q) + 0.5log(1 q)) 1
Model vs. Data
Bias-Variance (q 0.1)2 (q 0.5)2 (q 0.8)2
Tradeoff of Capacity 0 0 0
Regularization
Tradeoff of 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
Regularization
72
Learning Algorithm
Linear
Regression
Simple Linear Model
Model
Classification 1. Initialize the weights W 0 (parameters) at t = 0

Logistic 2. For t = 1, 2, 3, ... do
Regression
2.1 Compute the ouput distribution p n for each x n (n = 1...N)
Evaluation
Multi-class
Classification
p n = softmax(W t x n ) (37)
Softmax
Regression 2.2 Compute the gradient
Softmax Regression
Cross Entropy vs. MSE N
1 X
∇W E = (p n − t n )x |n (38)
Capacity, N
Overfitting and n=1
Underfitting
Model Capacity 2.3 Update the weights
Model vs. Data
Bias-Variance
Tradeoff of Capacity W t+1 = W t − η∇W E (39)
Regularization
Tradeoff of
Regularization Iterate the next step until W is not change
3. Return the final weights W
73
Example
Linear
Regression
Simple Linear Model
Model
Classification • Softmax regression for Iris dataset

Logistic
Regression
Binary Classification Softmax Regression - Gradient Descent
Evaluation 0
1 0.30
Multi-class 2 2
Classification
Softmax 0.25
1
Regression
Softmax Regression 0.20
Cost
Cross Entropy vs. MSE 0
Capacity, 0.15
Overfitting and 1
Underfitting 0.10
Model Capacity
2
Model vs. Data
Bias-Variance
0.05
2 1 0 1 2 3 0 100 200 300 400 500
Tradeoff of Capacity Iterations
Regularization
Tradeoff of
Regularization
74
Capacity, Overfitting and Underfitting
• Model Capacity
• Model vs. Data
• Bias-Variance
• Tradeoff of Capacity
• Regularization
• Tradeoff of Regularization
Model Capacity
Linear
Regression
Simple Linear Model
Model
Classification
Logistic Concept 9
Regression
Model space
Evaluation
Capacity is model complexity.
Multi-class
The most common ways to estimate the

Classification
Softmax
Regression
Softmax Regression
capacity of a model:
• VC dimension
Capacity,
Overfitting and • The number of parameters
Underfitting
Model Capacity
Model vs. Data
• The norm of parameters simple
Bias-Variance
Regularization
Tradeoff of
Regularization complex
76
Model vs. Data
Linear
Regression
Simple Linear Model
Model
Logistic
Regression Data is divided into three sets: training set, validation set and test set.
Evaluation
Concept 11
Multi-class
Classification
Softmax
Regression Models can be too limited. We can’t find a function that fits the data well. This
Softmax Regression
is called underfitting.
Capacity,
Overfitting and
Underfitting Concept 12
Models can also be too rich. We don’t just model the data, but also the
Model Capacity
Model vs. Data
underlying noise. This is called overfitting.

Bias-Variance
Regularization
Tradeoff of
Regularization
77
Model vs. Data (cont.)
Linear
Regression
Simple Linear Model
Model
Classification • Given the data set D = {(x1 , y1 ), ...(x10 , y10 )} shown in the following figure,
Logistic find the best regression function to the data
Regression
Evaluation
Multi-class
Classification
1
Softmax
Regression
Softmax Regression
Capacity, y0
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
−1
Tradeoff of
Regularization
0 1
x
78
Linear
Regression
Simple Linear Model
Model
Classification • Consider the simple hypothesis set

Logistic
Regression
H1 = {h | y = h(x) = w0 + w 1 x} (40)
Evaluation
Multi-class
Classification
Softmax
1 1 M =1
Regression
Softmax Regression
Capacity, y0 y0
Overfitting and
Underfitting
Model Capacity
Model vs. Data −1 −1
Bias-Variance
Regularization 0 1 0 1
Tradeoff of x x
Regularization
79
Linear
Regression
Simple Linear Model
Model
Classification • Consider the hypothesis set

Logistic
Regression
H3 = {h | y = h(x) = w0 + w 1 x + w2 x 2 + w 3 x 3 } (41)
Evaluation
(note that H1 ⊂ H3 )
Multi-class
Classification
Softmax
Regression
Softmax Regression
1 1 M =3
Capacity,
Overfitting and
Underfitting y0 y0
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity −1 −1
Regularization
Tradeoff of
Regularization
0 1 0 1
x x
80
Linear
Regression
Simple Linear Model
Model
Classification • Consider the hypothesis set

Logistic
Regression
H9 = {h | y = h(x) = w 0 + w 1 x + w 2 x 2 + ... + w 9 x 9 } (42)
Evaluation
(note that H1 ⊂ H3 ⊂ H9 )
Multi-class
Classification
Softmax
Regression
Softmax Regression
1 1 M =9
Capacity,
Overfitting and
Underfitting y0 y0
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
0 1 0 1
x x
81
Linear
Regression
Simple Linear Model
Model
Classification
Logistic Which one 1 1 M =1

Regression
Binary Classification • Under-fitting
Evaluation y0 y0
Multi-class
Classification
• Over-fitting
Softmax • Appropriate −1 −1
Regression
Softmax Regression fitting 0
x
1 0
x
1
Capacity, M =3 M =9
1 1
Overfitting and
Underfitting
Model Capacity y0 y0
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
0 1 0 1
Regularization x x
82
Linear
Regression
Simple Linear Model
Model
Classification
M=1 M=3 M=9
Logistic
Regression w0 0.82 0.31 0.35
Evaluation
w1 -1.27 7.99 232.37
Multi-class
Classification w2 -25.43 -5321.83
Softmax w3 17.37 48568.31
Regression
Softmax Regression w4 -231639.30
w5 640042.26
Capacity,
Overfitting and w6 -1061800.52
Underfitting
Model Capacity w7 1042400.18
Model vs. Data
Bias-Variance
w8 -557682.99
Regularization
w9 125201.43
Tradeoff of
Regularization
83
Model Performance
Linear
Regression
Simple Linear Model
Model
Classification 1
Logistic Training
Regression
Test
Evaluation
Multi-class
Classification
ERMS
Softmax 0.5
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data 0
Bias-Variance 0 3 M 6 9
Regularization
Tradeoff of
Regularization
Figure 1: Graphs of the root-mean-square error evaluated on the training set and on an
independent test set for various values of M
84
What happen if increasing N
Linear
Regression
Simple Linear Model
Model
Classification
Logistic
Regression 1 N = 15 1 N = 100
Evaluation
Multi-class
Classification
Softmax
y0 y0
Regression
Softmax Regression
Capacity, −1 −1
Overfitting and
Underfitting
Model Capacity
Model vs. Data 0 1 0 1
Bias-Variance x x
Regularization
Tradeoff of
Figure 2: Using the M = 9 polynomial for N = 15 data points (left plot) and N = 100
Regularization
data points (right plot). We see that increasing the size of the data set reduces the
over-fitting problem
85
Errors in Learning Model
Linear
Regression
Simple Linear Model
Model
Classification • Bias errors: error due to assumption in the model

Logistic
Regression
• High bias to signify underfitting
Evaluation
• Variance errors: It measures the variability in the results given by model
Multi-class
Classification when the dataset is changed
Softmax • High variance to signify overfitting
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting Expected error = Bias + Variance + Irreducible Error (43)
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
86
Errors in Learning Model (cont.)
Linear
Regression
Simple Linear Model
Model
Logistic
Regression Given a learning model hH, Ai, we define the “average” hypothesis g(x)
g(x) = ED [gD (x)] (44)

Evaluation
Multi-class
Classification
Softmax
Regression where gD (x) is the “best” hypothesis given the data set D
Softmax Regression
Cross Entropy vs. MSE • Bias of learning model
Capacity,
Overfitting and
Bias = Ex (g(x) − f (x))2 (45)
h i
Underfitting
Model Capacity
Model vs. Data
where f (x) is the “truth” function

Bias-Variance
Regularization
Tradeoff of
Regularization
• Variance of learning model
Variance = Ex ED (gD (x) − g(x))2 (46)

87
Errors in Learning Model (cont.)
Linear
Regression
Simple Linear Model
Model
truth function
Classification
Logistic
Regression
bias
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
variance
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
• Given many independent data sets D1 , D2 , ..., DK , we can estimate g(x) by
Regularization
K
1 X
g(x) ≈ gDi (x) (47)
K
i=1
88
Example: two learning models
Linear
Regression
Simple Linear Model
Model
Classification • Consider a target function sine

Logistic
Regression
f : [−1, 1] →
(48)
R
Evaluation
Multi-class
x → sin(πx)
Classification
Softmax
Regression
• We generate 100 data sets {Di } , i = 1, ..., 100, each containing N = 2 data
Softmax Regression
points, independently from the sinusoidal curve f (x) = sin(πx). For each
Capacity,
data set Di , we fit the data using one of two models
Overfitting and
Underfitting
• H0 : set of all lines of the form h(x) = b
Model Capacity
Model vs. Data
• H1 : set of all lines of the form h(x) = ax + b
Bias-Variance • Note that H0 ⊂ H1
Regularization
Tradeoff of
Regularization
89
Example: two learning models (cont.)
Linear
Regression
Simple Linear Model
Model
Classification
• Given a data set Di = {(x1 , y1 ), (x2 , y2 )}.
Logistic • For H0 , we choose the constant hypothesis that best fits the data (the
horizontal line at the midpoint, b = (y1 + y2 )/2).
Regression
Evaluation
Multi-class
• For H1 , we choose the line that passes through the two data points
Classification
(x1 , y1 ) and (x2 , y2 ).
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
90
Linear
Regression
Simple Linear Model
Model
Classification • Repeating this process with 100 data sets {Di } , i = 1, ..., 100,
Logistic
Regression
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
91
Linear
Regression
Simple Linear Model
Model
Classification • The bias-variance for each learning model

Logistic
Regression
Evaluation
Multi-class
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
bias=0.50 bias=0.21
Tradeoff of
Regularization var=0.25 var=1.69
92
Generalization and Capacity
Linear
Regression
Simple Linear Model
Model
Classification The criteria determining how well a machine learning model will perform:
Logistic 1. Make the training error small.
Regression
Evaluation
2. Make the gap between training and test (generalization) error small.
Multi-class
Classification
Softmax Training error

Regression Under-fitting Over-fitting
Softmax Regression Generalization error
Capacity,
Error
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Generalization gap
Regularization
Tradeoff of
Regularization 0 Optimal Capacity
Capacity
93
Regularization
Linear
Regression
Simple Linear Model
Model
Logistic
Regression Regularization is any modification we make to a learning model that is intended
Evaluation
to reduce its generalization error but not its training error.
Multi-class
Classification
performance measure
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
algorithm
Model Capacity
Model vs. Data
Bias-Variance
data
Regularization
Tradeoff of
Regularization
model
94
Addressing
Linear
Regression
Simple Linear Model
Model
Classification
High bias High variance
Logistic
Regression Obtain more features Decrease number of features
Evaluation
Decrease regularization λ Increase regularization λ
Multi-class
Classification Extend model Obtain more data
Softmax Train longer Stop early
Regression
Softmax Regression New model architecture New model architecture
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
95
Regularization for Linear Regression
Linear
Regression
Simple Linear Model
Model
Classification • Using regularized MSEtrain for linear regression

Logistic
Regression

1 T
w = arg min MSEtrain + λ w w (49)
Evaluation
Multi-class
w N
Classification
Softmax
Regression
where λ is the regularization coefficient (hyper-parameter) that controls the
Softmax Regression relative importance of the data-dependent error MSEtrain and the
1 T
regularization term λ N w w
Capacity,
Overfitting and
Underfitting
• Solving for w, we obtain
Model Capacity
w = (X | X + λI)−1 X | y (50)
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
96
Example: one learning model
Linear
Regression
Simple Linear Model
Model
Classification • Consider a target function sine

Logistic
Regression
f : [−1, 1] →
(51)
R
Evaluation
Multi-class
x → sin(2πx)
Classification
Softmax
Regression
• We generate 100 data sets {Di } , i = 1, ..., 100, each containing N = 25 data
Softmax Regression
points, independently from the sinusoidal curve f (x) = sin(2πx). For each
Capacity,
data set Di , we fit a model with 24 Gaussian basis functions by minimizing
Overfitting and
Underfitting
the regularized error function
Model Capacity
Model vs. Data
Bias-Variance
Regularization
Tradeoff of
Regularization
97
Example: one learning model (cont.)
Linear
Regression
Simple Linear Model
Model
Classification
• Illustration of the dependence of bias and variance on model regularization
Logistic coefficient
Regression
Evaluation
Multi-class y y y
Classification
Softmax
Regression
Softmax Regression
Capacity,
Overfitting and
Underfitting
Model Capacity
Model vs. Data
Bias-Variance
Tradeoff of Capacity y y y
Regularization
Tradeoff of
Regularization
98
Example: one learning model (cont.)
Linear
Regression
Simple Linear Model
Model
Classification
• Summary of the dependence of bias and variance on model regularization
Logistic coefficient
Regression
Binary Classification 0.15
Evaluation
Multi-class
Classification
(bias)2
0.12 variance
Softmax
Regression (bias)2 + variance
Softmax Regression
0.09 test error
Capacity,
Underfitting
Model Capacity
Model vs. Data
0.03
Bias-Variance
Regularization
Tradeoff of
0
Regularization −3 −2 −1 0 1 2
ln λ
99
References
Goodfellow, I., Bengio, Y., and Courville, A. (2016).

Deep learning.
MIT press.
Lê, B. and Tô, V. (2014).
Cở sở trí tuệ nhân tạo.
Nhà xuất bản Khoa học và Kỹ thuật.
Russell, S. and Norvig, P. (2021).
Artificial intelligence: a modern approach.
Pearson Education Limited.

Lect03 Linear Model ML

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lect03 Linear Model ML

Uploaded by

Copyright:

Available Formats

LINEAR MODEL

Bùi Tiến Lên

5. Capacity, Overfitting and Underfitting

0 50 100 200 300 0 50 100 200 300

Capacity, a vector of parameters of the model

Classification • D is a large number

Logistic 1. Construct the matrices X, A and the vector y

Classification • Linearity in the weights

Softmax hw (x) = w0 φ0 (x) + w1 φ1 (x) + ... + wM φM (x) (11)

where φi (x) are basis functions

Years of Education Years of Education

Classification • The requirement is to build a system that can take a vector x ∈ RD as

where ŷ be the value that our model (function) predicts y, w ∈ RM+1 is a

Capacity, vector of parameters of the model and φ is a set of M + 1 basis functions

Logistic 1. Construct the matrix Φ and the vector y

w = (Φ| Φ)−1 Φ| y (18)

Classification import numpy as np

Classification • Find a polynomial regression function y = f (x) = w0 + w1 x + w2 x 2 given the

Classification • What basis functions?

Classification Input representation or feature extraction

linear model w | = w0 w1 . . . w 256

Classification • Some algorithms can work with categorical data directly.

linear classification linear regression logistic regression

Softmax • The logistic function converts a σ

Classification • h(x) = σ(s) can be interpreted as a probability

score probability of heart attack

Classification • The target function f is the probability distribution

• Hypothesis set hw (x) = σ(w T x) and the conditional probability

Classification We define error measurement based on likelihood

Capacity, z = hw (x) = σ(w | x) (22)

n=1 (yn log zn + (1 − yn ) log(1 − zn ))

Classification 1. Initialize the weights (parameters) at t = 0 w 0

Classification • How η affects the algorithm?

Classification import matplotlib . pyplot as plt

Classification • Consider two-class problem with two classes ⊕ and

Softmax y is if P(y | x) < th (28)

Cross Entropy vs. MSE

Model vs. Data

−4 −2 0 2 0.00 0.25 0.50 0.75 1.00

h(2) (x) = P(y = 2 | x, w 2 )

h(3) (x) = P(y = 3 | x, w 3 )

Classification • Logistic regression for Iris dataset

Classification • Use bias trick W0 = b to represent the two parameters W , b as one

Classification score vectors

Capacity, 24 61.95 0 ship

Classification • Given D = {(x 1 , y1 )...(x N , yN )} where yn ∈ {1, ..., C}.

p = (0.1, 0.2, 0.4, 0.2, 0.1)

Classification 1. Initialize the weights W 0 (parameters) at t = 0

Classification • Softmax regression for Iris dataset

The most common ways to estimate the

underlying noise. This is called overfitting.

Classification • Consider the simple hypothesis set

Classification • Consider the hypothesis set

Classification • Consider the hypothesis set

Logistic Which one 1 1 M =1

Cross Entropy vs. MSE

Classification • Bias errors: error due to assumption in the model

g(x) = ED [gD (x)] (44)

where f (x) is the “truth” function

Variance = Ex ED (gD (x) − g(x))2 (46)

Classification • Consider a target function sine

Classification • The bias-variance for each learning model

Softmax Training error