From Linear Regression To Logistic Regression - Update - 1
All-in-One Course
Quang-Vinh Dinh
PhD in Computer Science
Year 2023
Outline
➢ Optimization Review
➢ Linear Regression Review
➢ Logistic Regression
➢ Examples
➢ Vectorization
➢ Implementation (optional)
AI VIETNAM
All-in-One Course

Optimization
❖ Gradient descent

[Figure: a convex curve J(x); the derivative dJ/dx(x₁) > 0 to the right of the optimum x_op, and dJ/dx(x₂) < 0 to the left]

Update rule:
x_new = x_old − η·(dJ/dx)(x_old)
where (dJ/dx)(x_old) is the derivative of J at x_old and η is the learning rate.

Derivative definition:
dJ(x)/dx = lim_{Δx→0} [J(x + Δx) − J(x)] / Δx

Move θ in the direction opposite to the derivative.
Optimization
❖ Gradient descent

[Figure: a curve J(θ) illustrating the moves below]

Initialize a value for θ, then repeat:
If dJ/dθ(θ) > 0, move θ to the left.
If dJ/dθ(θ) < 0, move θ to the right.
Keep moving θ opposite to the sign of the derivative.
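The procedure above can be sketched in Python. The quadratic J(θ) = θ² is an assumed stand-in for any differentiable loss; moving θ against the derivative drives it toward the optimum at 0:

```python
def J(theta):
    return theta ** 2          # example loss curve

def dJ_dtheta(theta):
    return 2 * theta           # its derivative

theta = 5.0                    # initialize theta
eta = 0.1                      # learning rate

for _ in range(100):
    # move theta opposite to the derivative
    theta = theta - eta * dJ_dtheta(theta)

print(theta)                   # close to the optimum at 0
```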
Optimization
❖ Square function

f(x) = x²,  −100 ≤ x ≤ 100,  x ∈ ℝ

Update rule: x_t = x_{t−1} − η·f′(x_{t−1})

Procedure: initialize x, then repeatedly compute the derivative at x and move x opposite to it.
Optimization
❖ Square function

f(x) = x²,  x₀ = 99.0,  η = 0.1
x_t = x_{t−1} − η·f′(x_{t−1})

[Figure: the iterates converge quickly toward x = 0]
Optimization
❖ Square function

f(x) = x²,  x₀ = 99.0,  η = 0.001
x_t = x_{t−1} − η·f′(x_{t−1})

[Figure: with a very small learning rate, the iterates move toward 0 only slowly]
Optimization
❖ Square function

f(x) = x²,  x₀ = 99.0,  η = 0.8
x_t = x_{t−1} − η·f′(x_{t−1})

[Figure: with η = 0.8 the iterates overshoot and oscillate around 0, but still converge]
Optimization
❖ Square function

f(x) = x²,  x₀ = 99.0,  η = 1.1
x_t = x_{t−1} − η·f′(x_{t−1})

[Figure: with η = 1.1 the iterates oscillate with growing magnitude; the update diverges]
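The four slides above can be reproduced with a few lines of Python. For f(x) = x², each update multiplies x by (1 − 2η), which explains the four behaviors: fast convergence, slow convergence, oscillating convergence, and divergence:

```python
def grad_step(x, eta, steps=50):
    """Run x_t = x_{t-1} - eta * f'(x_{t-1}) for f(x) = x**2."""
    for _ in range(steps):
        x = x - eta * 2 * x    # f'(x) = 2x
    return x

x0 = 99.0
for eta in (0.1, 0.001, 0.8, 1.1):
    print(f"eta={eta}: x after 50 steps = {grad_step(x0, eta):.4g}")
```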
Optimization
❖ Optimization: 2D function

f(x, y) = x² + y²,  −100 ≤ x, y ≤ 100,  x, y ∈ ℝ
Optimization
❖ Optimization: 2D function (worked trace, η = 1.0)

f(x, y) = x² + y²,  −100 ≤ x, y ≤ 100

Update rules:
x = x − η·∂f(x, y)/∂x
y = y − η·∂f(x, y)/∂y

Trace of iterates and derivatives:
(x₀, y₀) = (3.0, 4.0):  ∂f/∂x = 6.0,  ∂f/∂y = 8.0
(x₁, y₁) = (2.0, 3.0):  ∂f/∂x = 4.0,  ∂f/∂y = 6.0
(x₂, y₂) = (1.0, 2.0):  ∂f/∂x = 2.0,  ∂f/∂y = 4.0
(x₃, y₃) = (0.0, 1.0):  ∂f/∂x = 0.0,  ∂f/∂y = 2.0
(x₄, y₄) = (0.0, 0.0)
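A minimal 2D gradient-descent sketch for f(x, y) = x² + y². The learning rate η = 0.1 is an assumed value chosen so the plain update rule converges smoothly; the starting point matches the slide:

```python
def grad_f(x, y):
    # partial derivatives of f(x, y) = x**2 + y**2
    return 2 * x, 2 * y

x, y = 3.0, 4.0
eta = 0.1                      # assumed learning rate for this sketch

for _ in range(100):
    dx, dy = grad_f(x, y)
    # update each coordinate against its own partial derivative
    x, y = x - eta * dx, y - eta * dy

print(x, y)                    # both close to 0
```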
Optimization
❖ Composite functions

f(x) = 2x − 1,   g(f) = (f − 3)²

x feeds into f, and f feeds into g: x → f → g, so the composition is g(f(x)).

Chain rule:
d/dx g(f(x)) = [d/df g(f)] · [d/dx f(x)]
Optimization
❖ Composite functions: direct expansion

f(x) = 2x − 1,   g(f) = (f − 3)²

Substituting f into g:
g(x) = ((2x − 1) − 3)² = (2x − 4)²

Differentiating directly:
g′(x) = 2(2x − 4)·2 = 4(2x − 4) = 8x − 16
Optimization
❖ Composite functions and the chain rule

f(x) = 2x − 1  →  f′(x) = 2
g(f) = (f − 3)²  →  g′(f) = 2(f − 3)

dg/dx = (dg/df)·(df/dx) = 2(f − 3)·2 = 4((2x − 1) − 3) = 8x − 16

The same result as the direct expansion.
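The chain-rule result above can be checked numerically against a finite difference, using the same f and g as the slides:

```python
def f(x):
    return 2 * x - 1                  # inner function

def g(u):
    return (u - 3) ** 2               # outer function

def composed(x):
    return g(f(x))                    # g(f(x)) = (2x - 4)^2

# Chain rule: (g∘f)'(x) = g'(f(x)) * f'(x) = 2*(f(x) - 3) * 2 = 8x - 16
def chain_rule_grad(x):
    return 2 * (f(x) - 3) * 2

# Check against a central finite difference at x0 = 5
x0, h = 5.0, 1e-6
numeric = (composed(x0 + h) - composed(x0 - h)) / (2 * h)
print(chain_rule_grad(x0), numeric)   # both ≈ 8*5 - 16 = 24
```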
Outline
➢ Optimization Review
➢ Linear Regression Review
➢ Logistic Regression
➢ Examples
➢ Vectorization
➢ Implementation (optional)
House Price Prediction

Feature: area.  Label: price.

Candidate lines:
price = w₁·area + b₁
price = w₂·area + b₂
price = w₃·area + b₃
→ choose one line: price = w·area + b

Model: ŷ = wx + b
Loss:  L(ŷ, y) = (ŷ − y)²
Linear Regression
❖ Area-based house price prediction

Example parameters: w = 1.17, b = 0.26
predicted_price = w·area + b
error = (predicted_price − real_price)²

Model: ŷ = wx + b
Loss:  L(ŷ, y) = (ŷ − y)²
Linear Regression
❖ Area-based house price prediction

Model: ŷ = wx + b.  Loss: L(ŷ, y) = (ŷ − y)².
Question: how should w and b change so that L(ŷ, y) decreases?

[Figure: two fits, one with w = −0.34, b = 0.04 and one with w = 1.17, b = 0.26]
Linear Regression

Model: ŷ = wx + b.  Loss: L(ŷ, y) = (ŷ − y)².

[Figure: loss curves for different b values with a fixed w, and for different w values with a fixed b]
Linear Regression
❖ Training procedure

Linear equation: ŷ = wx + b, where ŷ is the predicted value, w and b are parameters, and x is the input feature (here x = area and y = price).

Error (loss) computation. Idea: compare the predicted value ŷ with the label y using the squared loss L(ŷ, y) = (ŷ − y)².

Training loop:
1) Initialize w and b
2) Pick a sample (x, y) from the training data
3) Compute the output ŷ
4) Compute the loss
5) Compute the derivative for each parameter
6) Update the parameters
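The training loop above, sketched in Python. The first sample (6.7, 9.1) and the initial parameters come from the slides; the other data points are illustrative values invented for this sketch:

```python
# Stochastic gradient descent for y_hat = w*x + b with squared loss.
# Only (6.7, 9.1) is from the slides; the rest of the data is illustrative.
data = [(6.7, 9.1), (4.6, 5.5), (3.5, 4.1), (5.5, 6.8)]

w, b = -0.34, 0.04        # 1) initialize parameters
eta = 0.01                # learning rate

for epoch in range(100):
    for x, y in data:     # 2) pick a sample
        y_hat = w * x + b                 # 3) compute the output
        loss = (y_hat - y) ** 2           # 4) compute the loss
        dL_dw = 2 * x * (y_hat - y)       # 5) derivative for each parameter
        dL_db = 2 * (y_hat - y)
        w -= eta * dL_dw                  # 6) update the parameters
        b -= eta * dL_db
```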
Linear Regression
❖ Model diagram

[Diagram: model ŷ = wx + b with parameters b and w; loss (ŷ − y)²; training loop: pick a sample (x, y) from the training data, compute the loss, compute the derivative for each parameter, update the parameters; x = area and y = price]
Linear Regression
❖ Toy example: first forward pass

Initialize b = 0.04 and w = −0.34. Given the sample x = 6.7 (feature) with label y = 9.1:

Forward propagation:
ŷ = xw + b = −2.238
Loss: (ŷ − y)² = 128.5
Linear Regression
❖ Toy example: backpropagation, then a second forward pass

Backpropagation (η = 0.01):
∂L/∂w = 2x(ŷ − y) = −151.9292
∂L/∂b = 2(ŷ − y) = −22.676
w = w − η·∂L/∂w = 1.17929
b = b − η·∂L/∂b = 0.26676

Forward propagation with the new parameters (x = 6.7):
(ŷ − y)² = 0.868, down from 128.5: the new w and b reduce the loss.
Linear Regression
❖ Toy example

[Figure: model prediction before and after the first update]

Updates: w = w − η·∂L/∂w,  b = b − η·∂L/∂b, where η is the learning rate and L = (ŷ − y)².
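The toy example's single update step can be reproduced exactly in Python, using the numbers from the slides:

```python
# One SGD step for linear regression, reproducing the slides' numbers.
x, y = 6.7, 9.1          # sample (area, price)
w, b = -0.34, 0.04       # initial parameters
eta = 0.01               # learning rate

# forward pass
y_hat = w * x + b             # -2.238
loss = (y_hat - y) ** 2       # ≈ 128.55

# gradients of the squared loss
dL_dw = 2 * x * (y_hat - y)   # ≈ -151.9292
dL_db = 2 * (y_hat - y)       # ≈ -22.676

# parameter update
w -= eta * dL_dw              # ≈ 1.17929
b -= eta * dL_db              # ≈ 0.26676

# loss after the update
new_loss = (w * x + b - y) ** 2   # ≈ 0.868
print(loss, new_loss)
```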
Outline
➢ Optimization Review
➢ Linear Regression Review
➢ Logistic Regression
➢ Examples
➢ Vectorization
➢ Implementation (optional)
Idea of Logistic Regression
❖ Linear regression

[Figure: training data (x, y) with the fitted line model]
Idea of Logistic Regression
❖ Given a new kind of data

The label is now a category (Category 0 or Category 1) instead of a number. Assign numbers to the categories and plot the data against the feature.

[Figure: samples plotted at label 0 and label 1]

A line is not suitable for this data.
Idea of Logistic Regression
❖ Sigmoid function

σ(z) = 1 / (1 + e^(−z))
z ∈ (−∞, +∞),  σ(z) ∈ (0, 1)

Property (monotonicity): for all z₁, z₂ ∈ [a, b] with z₁ ≤ z₂, we have σ(z₁) ≤ σ(z₂).

[Figure: sigmoid curve mapping z₁ < z₂ to σ₁ < σ₂]
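The sigmoid function above is a one-liner in Python:

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```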
Idea of Logistic Regression
❖ From line to sigmoid

z = wx + b,  z ∈ (−∞, +∞)
σ(z) = 1 / (1 + e^(−z)),  σ(z) ∈ (0, 1)

[Figure: the line z(x) and the squashed curve σ(z)]
Idea of Logistic Regression
❖ Fitting the sigmoid to data

Data with labels Category 0 and Category 1; pipeline x → z = wx + b → σ(z) = 1/(1 + e^(−z)) ∈ (0, 1).

Example fit: z = 0.535·x − 0.654

[Figure: data and the curve σ(z(x))]
Idea of Logistic Regression
❖ Fitting the sigmoid to data

Another fit: z = 2.331·x − 5.156

[Figure: data and the steeper curve σ(z(x))]
Idea of Logistic Regression
❖ Toward a loss function

Pipeline: z = wx + b,  σ(z) = 1/(1 + e^(−z)) ∈ (0, 1).

[Figure: z(x) and σ(z), alongside a gallery of candidate penalty curves: y = x², y = 2x, y = log(x), y = −log(x), y = log(1 − x), y = −log(1 − x)]
Idea of Logistic Regression
❖ Loss function

z = wx + b,  ŷ = σ(z) = 1/(1 + e^(−z)) ∈ (0, 1)

if y = 1:  L(ŷ) = −log(ŷ)
if y = 0:  L(ŷ) = −log(1 − ŷ)

[Figure: the error −log(1 − ŷ) when y = 0 and −log(ŷ) when y = 1, both as functions of ŷ]

How can we remove the if?
Idea of Logistic Regression
❖ Loss function

if y = 0:  L(ŷ) = −log(1 − ŷ)
if y = 1:  L(ŷ) = −log(ŷ)

Binary cross-entropy merges the two cases into one formula:
L(y, ŷ) = −y·log(ŷ) − (1 − y)·log(1 − ŷ)

[Figure: the error curves −log(1 − ŷ) for y = 0 and −log(ŷ) for y = 1]

Next, the same loss function introduced in another way.
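The single binary cross-entropy formula reproduces both branches of the if, because one of its two terms always vanishes:

```python
import math

def bce(y: int, y_hat: float) -> float:
    """Binary cross-entropy for one sample; y is 0 or 1, y_hat in (0, 1)."""
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)

# The y = 1 term: equals -log(y_hat), small for a confident correct prediction
print(bce(1, 0.9))
# The y = 0 term: equals -log(1 - y_hat), large for a confident wrong prediction
print(bce(0, 0.9))
```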
Idea of Logistic Regression
❖ Given a new kind of data

The sigmoid function could fit the data. With the categories encoded as 0 and 1:
z = wx + b  (in vector form, z = θᵀx)
ŷ = σ(z) = 1/(1 + e^(−z)),  ŷ ∈ (0, 1)

For some θ, measure the error of each prediction:
if y = 1:  error = 1 − ŷ
if y = 0:  error = ŷ

[Figure: data with the fitted sigmoid and the per-sample errors]
Idea of Logistic Regression
❖ Construct the loss

Error                         Belief
if y = 1:  error = 1 − ŷ      if y = 1:  belief = ŷ
if y = 0:  error = ŷ          if y = 0:  belief = 1 − ŷ

The two belief cases combine into a single expression:
P = ŷ^y · (1 − ŷ)^(1−y)

Taking the log (log_belief = log P) and negating gives binary cross-entropy:
L(ŷ, y) = −y·log(ŷ) − (1 − y)·log(1 − ŷ)
Logarithm
❖ Application in Machine Learning

Common identities:
log_a(a) = 1
log_a(xy) = log_a(x) + log_a(y)

When the data samples are collected independently of one another:
argmax_θ f(θ) = argmax_θ [P_θ(sample_1) · … · P_θ(sample_n)]

The maximizer of f(θ) and of log f(θ) is the same, so we can apply the log:
argmax_θ log f(θ) = argmax_θ [log P_θ(sample_1) + … + log P_θ(sample_n)]
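Turning the product into a sum is not just algebraically convenient; it also avoids numerical underflow. A small sketch with illustrative probability values (not from the slides):

```python
import math

# Probabilities of independent samples under some model (illustrative values).
probs = [0.02, 0.1, 0.05, 0.2] * 100   # 400 small factors

# The raw product underflows to exactly 0.0 in double precision ...
product = 1.0
for p in probs:
    product *= p

# ... while the sum of logs stays finite and well-scaled.
log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0 (underflow)
print(log_sum)   # a finite negative number
```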
Idea of Logistic Regression
❖ Construct the loss

[Figure: the two penalty curves −log(1 − ŷ) and −log(ŷ) plotted against ŷ]
Idea of Logistic Regression

In vector form, z = θᵀx = xᵀθ and ŷ = σ(z) = 1/(1 + e^(−z)).

[Figure: two datasets with Category 0 and Category 1 labels, each fitted by a sigmoid curve 1/(1 + e^(−z))]
Outline
➢ Optimization Review
➢ Linear Regression Review
➢ Logistic Regression
➢ Examples
➢ Vectorization
➢ Implementation (optional)
Logistic Regression-Stochastic

θᵀ = [b  w],  xᵀ = [1  x]

1) Pick a sample (x, y) from the training data
2) Compute the output ŷ:
   z = wx + b
   ŷ = σ(z) = 1/(1 + e^(−z))
3) Compute the loss:
   L(ŷ, y) = −y·log(ŷ) − (1 − y)·log(1 − ŷ)
4) Compute the derivatives:
   ∂L/∂w = x(ŷ − y),   ∂L/∂b = ŷ − y
5) Update the parameters:
   w = w − η·∂L/∂w,   b = b − η·∂L/∂b

[Diagram: model with parameters b and w; z = wx + b; ŷ = σ(z); label y; loss −y·log(ŷ) − (1 − y)·log(1 − ŷ)]
Logistic Regression-Stochastic
❖ Worked example: forward pass

Sample from the dataset: xᵀ = [1  1.4], y = 0. Parameters: b = 0.1, w = −0.1.

z = wx + b = −0.04
ŷ = σ(z) = 0.49
L = −y·log(ŷ) − (1 − y)·log(1 − ŷ) = 0.6733
Logistic Regression-Stochastic
❖ Worked example: backpropagation (η = 0.01)

Gradients:
∂L/∂b = ŷ − y = 0.49
∂L/∂w = x(ŷ − y) = 1.4·0.49 = 0.686

Updates:
b = 0.1 − η·0.49 = 0.0951
w = −0.1 − η·0.686 = −0.1069
Logistic Regression-Stochastic
❖ Worked example: second forward pass

With the updated parameters b = 0.0951, w = −0.1069:
z = wx + b = −0.0545
ŷ = σ(z) = 0.486
L = 0.666, down from the previous 0.6733.
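The worked example above, as runnable Python covering both update steps:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Sample and initial parameters from the worked example
x, y = 1.4, 0
w, b = -0.1, 0.1
eta = 0.01

losses = []
for step in range(2):
    z = w * x + b
    y_hat = sigmoid(z)
    loss = -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)
    losses.append(loss)
    # gradients of the binary cross-entropy
    dL_dw = x * (y_hat - y)
    dL_db = y_hat - y
    w -= eta * dL_dw
    b -= eta * dL_db

print(losses)   # ≈ [0.6733, 0.6663]: the loss decreases
```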
Another example
Logistic Regression-Stochastic
❖ Two input features

θᵀ = [b  w₁  w₂],  xᵀ = [1  x₁  x₂]

1) Pick a sample (x, y) from the training data
2) Compute the output ŷ:
   z = w₁x₁ + w₂x₂ + b
   ŷ = σ(z) = 1/(1 + e^(−z))
3) Compute the loss:
   L(ŷ, y) = −y·log(ŷ) − (1 − y)·log(1 − ŷ)
4) Compute the derivatives:
   ∂L/∂wᵢ = xᵢ(ŷ − y),   ∂L/∂b = ŷ − y
5) Update the parameters:
   wᵢ = wᵢ − η·∂L/∂wᵢ,   b = b − η·∂L/∂b
Logistic Regression-Stochastic
❖ Worked example: forward pass

Sample: xᵀ = [1  1.4  0.2], y = 0. Parameters: b = 0.1, w₁ = 0.5, w₂ = −0.1.

z = w₁x₁ + w₂x₂ + b = 0.78
ŷ = σ(z) = 0.6856
L = −y·log(ŷ) − (1 − y)·log(1 − ŷ) = 1.1573
Logistic Regression-Stochastic
❖ Worked example: backpropagation (η = 0.01)

Gradients:
∂L/∂b  = ŷ − y = 0.6856
∂L/∂w₁ = x₁(ŷ − y) = 0.9599
∂L/∂w₂ = x₂(ŷ − y) = 0.1371

Updates:
b  = 0.1 − η·0.6856 = 0.0931
w₁ = 0.5 − η·0.9599 = 0.4904
w₂ = −0.1 − η·0.1371 = −0.1014
Logistic Regression-Stochastic
❖ Worked example: second forward pass

With the updated parameters b = 0.0931, w₁ = 0.4904, w₂ = −0.1014:
z = w₁x₁ + w₂x₂ + b = 0.759
ŷ = σ(z) = 0.6812
The loss drops below the previous value of 1.1573.
Review
❖ Transpose

[1 2]ᵀ   [1 3]
[3 4]  = [2 4]

❖ Scalar multiplication

2 · [2  3] = [4  6]

❖ Dot product

v = [v₁ … vₙ]ᵀ,  u = [u₁ … uₙ]ᵀ
v · u = v₁u₁ + ⋯ + vₙuₙ

Example: v = [1  2], w = [2  3]  →  v · w = 1·2 + 2·3 = 8
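The dot product in the review can be written directly in Python:

```python
# Dot product of two vectors, matching the review example.
def dot(v, u):
    return sum(vi * ui for vi, ui in zip(v, u))

print(dot([1, 2], [2, 3]))  # 1*2 + 2*3 = 8
```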
Vectorization
❖ From scalars to vectors

Traditional (scalar) steps:
1) Pick a sample (x, y) from the training data
2) Compute the output ŷ:  z = wx + b,  ŷ = σ(z) = 1/(1 + e^(−z))
3) Compute the loss:  L(ŷ, y) = −y·log(ŷ) − (1 − y)·log(1 − ŷ)
4) Compute the derivatives:  ∂L/∂w = x(ŷ − y),  ∂L/∂b = ŷ − y
5) Update the parameters:  w = w − η·∂L/∂w,  b = b − η·∂L/∂b  (η is the learning rate)

Stack the input and the parameters:
x = [1  x]ᵀ,  θ = [b  w]ᵀ,  so θᵀ = [b  w]

Then the linear part becomes a dot product:
z = wx + b·1 = [b  w]·[1  x]ᵀ = θᵀx
Vectorization

With x = [1  x]ᵀ and θ = [b  w]ᵀ, step 2 is now vectorized:
z = θᵀx,  ŷ = σ(z) = 1/(1 + e^(−z))

The remaining steps still work on individual numbers:
3) Compute the loss:  L(ŷ, y) = −y·log(ŷ) − (1 − y)·log(1 − ŷ)
4) Compute the derivatives:  ∂L/∂w = x(ŷ − y),  ∂L/∂b = ŷ − y
5) Update the parameters:  w = w − η·∂L/∂w,  b = b − η·∂L/∂b  (η is the learning rate)

What will we do about them?
The two derivatives stack into the gradient vector ∇_θL, and the two updates collapse into one:
∂L/∂w = x(ŷ − y),  ∂L/∂b = ŷ − y   →   ∇_θL = x(ŷ − y)
w = w − η·∂L/∂w,  b = b − η·∂L/∂b  →   θ = θ − η·∇_θL  (η is the learning rate)
Vectorization
❖ Fully vectorized training step

1) Pick a sample (x, y) from the training data
2) Compute the output ŷ:  z = θᵀx = xᵀθ,  ŷ = σ(z) = 1/(1 + e^(−z))
3) Compute the loss:  L(ŷ, y) = −y·log(ŷ) − (1 − y)·log(1 − ŷ)
4) Compute the derivative:  ∇_θL = x(ŷ − y)
5) Update the parameters:  θ = θ − η·∇_θL  (η is the learning rate)
❖ Vectorized worked example

Given x = [1  x₁  x₂]ᵀ = [1  1.4  0.2]ᵀ,  θ = [b  w₁  w₂]ᵀ = [0.1  0.5  −0.1]ᵀ,  η = 0.01, label y = 0.

1) Pick the sample (x, y)
2) Compute the output:  ŷ = σ(θᵀx) = 0.6856
3) Compute the loss:  L(ŷ, y) = −y·log(ŷ) − (1 − y)·log(1 − ŷ) = 1.1573
4) Compute the derivative:
   ∇_θL = x(ŷ − y) = [1  1.4  0.2]ᵀ · 0.6856 = [0.6856  0.9599  0.1371]ᵀ = [L′_b  L′_w₁  L′_w₂]ᵀ
5) Update the parameters:
   θ = θ − η·∇_θL = [0.1  0.5  −0.1]ᵀ − η·[0.6856  0.9599  0.1371]ᵀ = [0.0931  0.4904  −0.1014]ᵀ
Logistic Regression-Stochastic
❖ Demo
Summary of the stochastic training loop:

1) Pick a sample (x, y) from the training data
2) Compute the output ŷ:  z = θᵀx,  ŷ = σ(z) = 1/(1 + e^(−z))
3) Compute the loss:  L(θ) = −y·log(ŷ) − (1 − y)·log(1 − ŷ)
4) Compute the derivative:  ∇_θL = x(ŷ − y)
5) Update the parameters:  θ = θ − η·∇_θL  (η is the learning rate)