
AI VIETNAM
All-in-One Course

From Linear Regression to Logistic Regression

Quang-Vinh Dinh
PhD in Computer Science

Year 2023
Outline
➢ Optimization Review
➢ Linear Regression Review
➢ Logistic Regression
➢ Examples
➢ Vectorization
➢ Implementation (optional)
Optimization
❖ Gradient descent

[Figure: a curve J(x) with its minimum at x_op; to the left, dJ(x₂)/dx < 0, and to the right, dJ(x₁)/dx > 0]

Update rule:
x_new = x_old − η · dJ(x_old)/dx
where dJ(x_old)/dx is the derivative at x_old and η is the learning rate.

Definition of the derivative:
dJ(x)/dx = lim_{Δx→0} [J(x + Δx) − J(x)] / Δx

Move θ in the direction opposite to the derivative.
Optimization
❖ Gradient descent

Initialize a value for θ.

If dJ(θ)/dθ > 0, move θ to the left.
If dJ(θ)/dθ < 0, move θ to the right.

Keep moving θ in the direction opposite to the derivative.

[Figure: the curve J(θ) over θ, with θ repeatedly shifted opposite to the sign of the derivative until it reaches the minimum]
Optimization
❖ Square function

f(x) = x²,  −100 ≤ x ≤ 100,  x ∈ ℝ

Update rule: x_t = x_{t−1} − η f′(x_{t−1})

Procedure: initialize x, compute the derivative at x, then move x opposite to the derivative. Repeat.
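A minimal Python sketch of this loop, using the starting point x₀ = 99.0 and the learning rates from the following slides (the step count of 30 is an arbitrary choice):

```python
def gradient_descent(x0, lr, steps=30):
    """Minimize f(x) = x**2, whose derivative is f'(x) = 2x."""
    x = x0
    for _ in range(steps):
        grad = 2 * x        # derivative of f at the current x
        x = x - lr * grad   # move x opposite to the derivative
    return x

# Learning rates used on the following slides
for lr in [0.1, 0.001, 0.8, 1.1]:
    print(lr, gradient_descent(x0=99.0, lr=lr))
# 0.1   -> ~0.12    (converges)
# 0.001 -> ~93.2    (converges very slowly)
# 0.8   -> ~2e-5    (oscillates but converges)
# 1.1   -> ~2.4e+4  (diverges)
```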
Optimization
❖ Square function

f(x) = x²,  x₀ = 99.0,  update: x_t = x_{t−1} − η f′(x_{t−1})

Since f′(x) = 2x, each update gives x_t = (1 − 2η) x_{t−1}, so the learning rate determines the behavior:

η = 0.1:   x_t = 0.8 x_{t−1}   (converges steadily toward 0)
η = 0.001: x_t = 0.998 x_{t−1} (converges, but very slowly)
η = 0.8:   x_t = −0.6 x_{t−1}  (overshoots and oscillates, yet still converges)
η = 1.1:   x_t = −1.2 x_{t−1}  (oscillates with growing amplitude and diverges)
Optimization
❖ Optimization: 2D function

f(x, y) = x² + y²,  −100 ≤ x, y ≤ 100

[Figure: surface plot of f(x, y) = x² + y²]
Optimization
❖ Optimization: 2D function

f(x, y) = x² + y²,  η = 1.0

Update rules (one per variable):
x = x − η ∂f(x, y)/∂x
y = y − η ∂f(x, y)/∂y

Illustrated trace:
x₀ = 3.0, y₀ = 4.0:  ∂f(x₀, y₀)/∂x = 6.0,  ∂f(x₀, y₀)/∂y = 8.0
x₁ = 2.0, y₁ = 3.0:  ∂f(x₁, y₁)/∂x = 4.0,  ∂f(x₁, y₁)/∂y = 6.0
x₂ = 1.0, y₂ = 2.0:  ∂f(x₂, y₂)/∂x = 2.0,  ∂f(x₂, y₂)/∂y = 4.0
x₃ = 0.0, y₃ = 1.0:  ∂f(x₃, y₃)/∂x = 0.0,  ∂f(x₃, y₃)/∂y = 2.0
x₄ = 0.0, y₄ = 0.0
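The same loop in two variables, as a minimal sketch (the learning rate of 0.1 and the step count are my own choices for a convergent run):

```python
def gd_2d(x0, y0, lr=0.1, steps=50):
    """Gradient descent on f(x, y) = x**2 + y**2."""
    x, y = x0, y0
    for _ in range(steps):
        dx, dy = 2 * x, 2 * y              # partial derivatives of f
        x, y = x - lr * dx, y - lr * dy    # update each variable
    return x, y

print(gd_2d(3.0, 4.0))  # both coordinates shrink toward the minimum at (0, 0)
```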
Optimization
❖ For composite function

f(x) = 2x − 1,   g(f) = (f − 3)²

[Diagram: x → f → g, i.e. the composition g(f(x)), with the local derivatives d f(x)/dx and d g(f)/df at each stage]

Chain rule:
d g(f(x))/dx = d g(f)/df ∗ d f(x)/dx
Optimization
❖ For composite function

f(x) = 2x − 1
g(f) = (f − 3)²

Substituting f into g directly:
g(x) = (2x − 1 − 3)² = (2x − 4)²

g′(x) = 4(2x − 4) = 8x − 16
Optimization
❖ For composite function and chain rule

f(x) = 2x − 1  →  f′(x) = 2
g(f) = (f − 3)²  →  g′(f) = 2(f − 3)

dg/dx = (dg/df)(df/dx)
      = 2(f − 3) ∗ 2
      = 4(2x − 1 − 3)
      = 8x − 16

The chain rule gives the same result as direct substitution.
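A small sketch that checks the analytic derivative 8x − 16 against a central finite difference of the composite:

```python
def f(x):
    return 2 * x - 1

def g(f_val):
    return (f_val - 3) ** 2

def analytic(x):
    # chain rule: g'(f(x)) * f'(x) = 2 * (f(x) - 3) * 2 = 8x - 16
    return 2 * (f(x) - 3) * 2

def numeric(x, h=1e-6):
    # central finite difference of g(f(x))
    return (g(f(x + h)) - g(f(x - h))) / (2 * h)

for x in [-1.0, 0.0, 2.5]:
    print(x, analytic(x), round(numeric(x), 4))  # the two estimates agree
```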
Outline
➢ Optimization Review
➢ Linear Regression Review
➢ Logistic Regression
➢ Examples
➢ Vectorization
➢ Implementation (optional)
House Price Prediction

[Tables: house price data, each row a Feature (area) and a Label (price)]

Candidate models:
price = w₁ ∗ area + b₁
price = w₂ ∗ area + b₂
price = w₃ ∗ area + b₃
price = w ∗ area + b

With a fitted line, if area = 6.0 then price = 7.28; for unseen data, if area = 6.0, price = ?
Linear Regression
❖ Area-based house price prediction

w = −0.34, b = 0.04

predicted_price = w ∗ area + b
error = (predicted_price − real_price)²

In symbols:
ŷ = wx + b
L(ŷ, y) = (ŷ − y)²
Linear Regression
❖ Area-based house price prediction

w = 1.17, b = 0.26

predicted_price = w ∗ area + b
error = (predicted_price − real_price)²

In symbols:
ŷ = wx + b
L(ŷ, y) = (ŷ − y)²
Linear Regression
❖ Area-based house price prediction

ŷ = wx + b
L(ŷ, y) = (ŷ − y)²

How can we change w and b so that L(ŷ, y) decreases?

[Figure: the two candidate lines, w = −0.34, b = 0.04 and w = 1.17, b = 0.26, plotted over the data]
Linear Regression
❖ Understanding the loss function

ŷ = wx + b
L(ŷ, y) = (ŷ − y)²

How can we change w and b so that L(ŷ, y) decreases?

[Figure: loss curves for different b values with a fixed w, and for different w values with a fixed b]
Linear Regression

Linear equation:
ŷ = wx + b
where ŷ is a predicted value, w and b are parameters, and x is the input feature (here x = area and y = price).

Error (loss) computation:
Idea: compare the predicted value ŷ with the label value y.
Squared loss: L(ŷ, y) = (ŷ − y)²

Training loop: initialize w and b → pick a sample (x, y) from the training data → compute output ŷ → compute loss → compute the derivative for each parameter → update parameters → repeat.
Linear Regression

Find better w and b: use gradient descent to minimize the loss function L(ŷ, y) = (ŷ − y)², where ŷ = wx + b.

Compute the derivative for each parameter:
∂L/∂w = (∂L/∂ŷ)(∂ŷ/∂w) = 2x(ŷ − y)
∂L/∂b = (∂L/∂ŷ)(∂ŷ/∂b) = 2(ŷ − y)

Update parameters:
w = w − η ∂L/∂w
b = b − η ∂L/∂b
where η is the learning rate.
Linear Regression
❖ Example

[Diagram: the input x (Feature) feeds the model with parameters b and w; the model computes ŷ = wx + b, and the loss (ŷ − y)² compares ŷ against the label y]

Training loop: initialize w and b → pick a sample (x, y) from the training data (x = area, y = price) → compute output ŷ → compute loss → compute the derivative for each parameter → update parameters.
Linear Regression

1  Forward propagation. Initialize b = 0.04 and w = −0.34; take the sample x = 6.7 (area), y = 9.1 (price):

ŷ = xw + b = 6.7 ∗ (−0.34) + 0.04 = −2.238

Loss: (ŷ − y)² = (−2.238 − 9.1)² = 128.5
Linear Regression

2  Backpropagation with η = 0.01 (x = 6.7, y = 9.1, ŷ = −2.238):

∂L/∂w = 2x(ŷ − y) = −151.9292
∂L/∂b = 2(ŷ − y) = −22.676

w = w − η ∂L/∂w = 1.17929
b = b − η ∂L/∂b = 0.26676

3  Forward propagation with the new parameters (w = 1.17929, b = 0.26676):

ŷ = xw + b = 8.168

Loss: (ŷ − y)² = 0.868. The new w and b help reduce the loss (down from 128.5).
Linear Regression
❖ Toy example

Model prediction before and after the first update:

Before updating: w = −0.34, b = 0.04, L = 128.55
After updating:  w = 1.179292, b = 0.26676, L = 0.868

[Figure: the fitted line before and after the first update]
Linear Regression
❖ Summary (one feature and one sample)

1) Pick a sample (x, y) from the training data
2) Compute the output: ŷ = wx + b
3) Compute the loss: L(ŷ, y) = (ŷ − y)²
4) Compute the derivatives: ∂L/∂w = 2x(ŷ − y),  ∂L/∂b = 2(ŷ − y)
5) Update the parameters: w = w − η ∂L/∂w,  b = b − η ∂L/∂b, where η is the learning rate
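These five steps translate directly into code. A minimal sketch that reproduces the toy example above (x = 6.7, y = 9.1, starting from w = −0.34, b = 0.04, η = 0.01):

```python
def sgd_step(x, y, w, b, lr):
    y_hat = w * x + b            # 2) compute the output
    loss = (y_hat - y) ** 2      # 3) squared loss
    dw = 2 * x * (y_hat - y)     # 4) derivatives
    db = 2 * (y_hat - y)
    w = w - lr * dw              # 5) update the parameters
    b = b - lr * db
    return w, b, loss

w, b, loss = sgd_step(6.7, 9.1, w=-0.34, b=0.04, lr=0.01)
print(w, b, loss)  # 1.179292, 0.26676, 128.55 (loss before the update)
```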
Outline
➢ Optimization Review
➢ Linear Regression Review
➢ Logistic Regression
➢ Examples
➢ Vectorization
➢ Implementation (optional)
Idea of Logistic Regression
❖ Linear regression

[Figure: area-based house price data with a fitted line; the vertical gaps between the points and the line are the errors]

From the training data (Feature = area, Label = price), construct
ŷ = θᵀx = wx + b,  ŷ ∈ (−∞, +∞)

Find the line ŷ = θᵀx that best fits the given data, then use ŷ to predict for new data.
Idea of Logistic Regression
❖ Given a new kind of data

The labels are now two categories, Category 0 and Category 1, instead of continuous values. Assign numbers to the categories (0 and 1) and plot the data.

[Figure: the feature on the horizontal axis, with Category-0 points at height 0 and Category-1 points at height 1]

A line is not suitable for this data.
Idea of Logistic Regression

Sigmoid function:
σ(z) = 1 / (1 + e⁻ᶻ)
z ∈ (−∞, +∞),  σ(z) ∈ (0, 1)

Property (monotonicity):
∀ z₁, z₂ ∈ (a, b) with z₁ ≤ z₂:  σ(z₁) ≤ σ(z₂)

[Figure: the sigmoid curve, with z₁ ≤ z₂ mapped to σ₁ ≤ σ₂]
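A NumPy version of the function for later use (a minimal sketch):

```python
import numpy as np

def sigmoid(z):
    # squashes any real z into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                     # 0.5
print(sigmoid(np.array([-5.0, 0.78])))  # [0.0067 0.6856]
```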
Idea of Logistic Regression

z = wx + b,  z ∈ (−∞, +∞)

σ(z) = 1 / (1 + e⁻ᶻ),  σ(z) ∈ (0, 1)

[Figures: the line z = wx + b over the feature x, and the sigmoid σ(z) squashing z into (0, 1)]
Idea of Logistic Regression

z = wx + b
σ(z) = 1 / (1 + e⁻ᶻ),  σ(z) ∈ (0, 1)

Example fit: z = 0.535 ∗ x − 0.654

[Figure: the data (Category 0 and Category 1 over the feature x), the line z(x), and the resulting curve σ(z)]
Idea of Logistic Regression

z = wx + b
σ(z) = 1 / (1 + e⁻ᶻ),  σ(z) ∈ (0, 1)

Another fit: z = 2.331 ∗ x − 5.156

[Figure: the same data with the steeper line z(x) and its curve σ(z)]
Idea of Logistic Regression

z = wx + b
σ(z) = 1 / (1 + e⁻ᶻ),  σ(z) ∈ (0, 1)

[Figure: the data, the line z(x), and the curve σ(z) over the feature x]

How can we evaluate the performance of a model?
❖ Suggested Functions

[Figure: plots of candidate functions]

y = x/6    y = x/2    y = 2x    y = x²    y = eˣ
y = log(x)    y = −log(x)    y = log(1 − x)    y = −log(1 − x)
Idea of Logistic Regression
❖ Loss function

z = wx + b
σ(z) = 1 / (1 + e⁻ᶻ),  σ(z) ∈ (0, 1)

if y = 1:  L(ŷ) = −log(ŷ)
if y = 0:  L(ŷ) = −log(1 − ŷ)

[Figure: the error −log(1 − ŷ) as a function of ŷ when y = 0, and −log(ŷ) when y = 1]

How can we remove the if?
Idea of Logistic Regression
❖ Loss function

if y = 0:  L(ŷ) = −log(1 − ŷ)
if y = 1:  L(ŷ) = −log(ŷ)

Binary cross-entropy combines both cases in one expression:
L(y, ŷ) = −y log(ŷ) − (1 − y) log(1 − ŷ)

[Figure: the error −log(1 − ŷ) for y = 0 and −log(ŷ) for y = 1, as functions of ŷ]
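A direct NumPy transcription of the combined loss (the epsilon clip that guards against log(0) is my addition):

```python
import numpy as np

def bce(y, y_hat, eps=1e-12):
    # binary cross-entropy: -y*log(y_hat) - (1 - y)*log(1 - y_hat)
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

print(bce(0, 0.49))  # 0.6733, matching the stochastic example later on
```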
Introduce the loss function in another way
Idea of Logistic Regression
❖ Given a new kind of data

A sigmoid function could fit the data:
z = wx + b
ŷ = σ(z) = 1 / (1 + e⁻ᶻ) = 1 / (1 + e^(−θᵀx)),  ŷ ∈ (0, 1)

Error, for some θ:
if y = 1:  error = 1 − ŷ
if y = 0:  error = ŷ

[Figure: the categorical data with a fitted sigmoid; the error is the vertical gap 1 − ŷ for a Category-1 point and ŷ for a Category-0 point]
Idea of Logistic Regression
❖ Construct loss

Error:                      Belief:
if y = 1:  error = 1 − ŷ    if y = 1:  belief = ŷ
if y = 0:  error = ŷ        if y = 0:  belief = 1 − ŷ

Both belief cases combine into a single expression:
P = ŷʸ (1 − ŷ)^(1−y)

Minimizing the error ~ maximizing the belief ~ minimizing (−belief)
Idea of Logistic Regression
❖ Construct loss (one sample)

belief = P = ŷʸ (1 − ŷ)^(1−y)
(if y = 1, belief = ŷ; if y = 0, belief = 1 − ŷ)

log_belief = log P = y log(ŷ) + (1 − y) log(1 − ŷ)

loss = −log_belief = −[y log(ŷ) + (1 − y) log(1 − ŷ)]

L(ŷ, y) = −y log(ŷ) − (1 − y) log(1 − ŷ)   (binary cross-entropy)
Logarithm: applications in Machine Learning

Common formulas:
log_a(a) = 1
log_a(xy) = log_a(x) + log_a(y)

The log function is monotonic (it preserves order):
∀ x₁, x₂ ∈ (a, b) with x₁ ≤ x₂:  log(x₁) ≤ log(x₂)

[Figure: f(x) and log f(x) reach their maxima at the same location]

Goal: find the parameter set θ of a model so that the model describes the training data:
argmax_θ f(θ) = argmax_θ P_θ(training data)

When the data samples are collected independently of one another:
argmax_θ f(θ) = argmax_θ [P_θ(sample_1) ∗ ⋯ ∗ P_θ(sample_n)]

The position of the maximum of f(θ) and of log f(θ) is the same, so using the log:
argmax_θ log f(θ) = argmax_θ [log P_θ(sample_1) + ⋯ + log P_θ(sample_n)]
Idea of Logistic Regression
❖ Construct loss (N samples)

Per sample: Pᵢ = ŷᵢ^(yᵢ) (1 − ŷᵢ)^(1−yᵢ)
(if yᵢ = 1, belief = ŷᵢ; if yᵢ = 0, belief = 1 − ŷᵢ)

belief = ∏ᵢ₌₁ⁿ Pᵢ   (since the samples are i.i.d.)

log_belief = Σᵢ₌₁ⁿ log(Pᵢ) = Σᵢ₌₁ⁿ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)]

loss = −log_belief = −Σᵢ₌₁ⁿ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)]

In vector form, the binary cross-entropy over N samples:
L = −(1/N) [yᵀ log(ŷ) + (1 − y)ᵀ log(1 − ŷ)]
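The vector form maps directly to NumPy. A sketch assuming ŷ and y are arrays of N predictions and labels:

```python
import numpy as np

def bce_loss(y, y_hat):
    # L = -(1/N) [ y^T log(y_hat) + (1 - y)^T log(1 - y_hat) ]
    n = len(y)
    return -(y @ np.log(y_hat) + (1 - y) @ np.log(1 - y_hat)) / n

y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.7])
print(bce_loss(y, y_hat))  # ~0.2284
```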
Idea of Logistic Regression
❖ Construct loss: derivative

Model and loss:
z = wx + b
ŷ = σ(z) = 1 / (1 + e⁻ᶻ)
L(ŷ, y) = −y log(ŷ) − (1 − y) log(1 − ŷ)

Chain rule:
∂L/∂θᵢ = (∂L/∂ŷ)(∂ŷ/∂z)(∂z/∂θᵢ)

∂L/∂ŷ = −y/ŷ + (1 − y)/(1 − ŷ) = (ŷ − y) / [ŷ(1 − ŷ)]
∂ŷ/∂z = ŷ(1 − ŷ)
∂z/∂θᵢ = xᵢ

∂L/∂θᵢ = xᵢ(ŷ − y)
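A quick numerical check of the closed form ∂L/∂w = x(ŷ − y), assuming a single sample:

```python
import numpy as np

def loss(w, b, x, y):
    y_hat = 1.0 / (1.0 + np.exp(-(w * x + b)))
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

x, y, w, b = 1.4, 0.0, -0.1, 0.1
y_hat = 1.0 / (1.0 + np.exp(-(w * x + b)))

analytic = x * (y_hat - y)  # closed-form derivative
h = 1e-6
numeric = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
print(analytic, numeric)    # both ~0.686
```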
Idea of Logistic Regression

z = θᵀx = xᵀθ
ŷ = σ(z) = 1 / (1 + e⁻ᶻ)

[Figures: two datasets (Feature, Label with Category 0 and Category 1), each fitted with a sigmoid curve 1 / (1 + e⁻ᶻ)]
Outline
➢ Optimization Review
➢ Linear Regression Review
➢ Logistic Regression
➢ Examples
➢ Vectorization
➢ Implementation (optional)
Logistic Regression-Stochastic

θᵀ = [b  w],  xᵀ = [1  x]

1) Pick a sample (x, y) from the training data
2) Compute output ŷ:  z = wx + b,  ŷ = σ(z) = 1 / (1 + e⁻ᶻ)
3) Compute loss:  L(ŷ, y) = −y log(ŷ) − (1 − y) log(1 − ŷ)
4) Compute derivatives:  ∂L/∂w = x(ŷ − y),  ∂L/∂b = (ŷ − y)
5) Update parameters:  w = w − η ∂L/∂w,  b = b − η ∂L/∂b

[Diagram: input x → model (b, w) → z = wx + b → ŷ = σ(z) → loss against the label y]
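The five steps as a function; running it on the sample from the next slides (x = 1.4, y = 0, starting from b = 0.1, w = −0.1, η = 0.01) reproduces the numbers shown there:

```python
import math

def logistic_sgd_step(x, y, w, b, lr=0.01):
    z = w * x + b                                # 2) linear part
    y_hat = 1.0 / (1.0 + math.exp(-z))           # 2) sigmoid
    loss = -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)  # 3)
    dw = x * (y_hat - y)                         # 4) derivatives
    db = (y_hat - y)
    return w - lr * dw, b - lr * db, loss        # 5) update

w, b, loss = logistic_sgd_step(1.4, 0, w=-0.1, b=0.1)
print(w, b, loss)  # ~ -0.1069, 0.0951, 0.6733
```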
Logistic Regression-Stochastic

Pick the sample x = 1.4, y = 0 from the dataset (x = [1  1.4]ᵀ, y = 0).

Model: b = 0.1, w = −0.1

z = wx + b = −0.04
ŷ = σ(z) = 0.49

Loss: L = −y log(ŷ) − (1 − y) log(1 − ŷ) = 0.6733
Logistic Regression-Stochastic

Backpropagation with η = 0.01 and ŷ = 0.49:

[L′_b, L′_w] = [1 ∗ 0.49, 1.4 ∗ 0.49] = [0.49, 0.686]

b = 0.1 − η ∗ 0.49 = 0.095
w = −0.1 − η ∗ 0.686 = −0.1068

(loss at this step: L = 0.6733)
Logistic Regression-Stochastic

With the updated model b = 0.095, w = −0.1068 on the same sample x = 1.4, y = 0:

z = wx + b = −0.0545
ŷ = σ(z) = 0.486

Loss: L = 0.666 (previous L = 0.6733)
Another example

Logistic Regression-Stochastic (two features)

θᵀ = [b  w₁  w₂],  xᵀ = [1  x₁  x₂]

1) Pick a sample (x, y) from the training data
2) Compute output ŷ:  z = w₁x₁ + w₂x₂ + b,  ŷ = σ(z) = 1 / (1 + e⁻ᶻ)
3) Compute loss:  L(ŷ, y) = −y log(ŷ) − (1 − y) log(1 − ŷ)
4) Compute derivatives:  ∂L/∂wᵢ = xᵢ(ŷ − y),  ∂L/∂b = (ŷ − y)
5) Update parameters:  wᵢ = wᵢ − η ∂L/∂wᵢ,  b = b − η ∂L/∂b
Logistic Regression-Stochastic

Pick the sample x₁ = 1.4, x₂ = 0.2, y = 0 from the dataset (x = [1  1.4  0.2]ᵀ).

Model: b = 0.1, w₁ = 0.5, w₂ = −0.1

z = w₁x₁ + w₂x₂ + b = 0.78
ŷ = σ(z) = 0.6856

Loss: L = −y log(ŷ) − (1 − y) log(1 − ŷ) = 1.1573
Logistic Regression-Stochastic

Backpropagation with η = 0.01 and ŷ = 0.6856:

[L′_b, L′_w₁, L′_w₂] = [1 ∗ 0.6856, 1.4 ∗ 0.6856, 0.2 ∗ 0.6856] = [0.6856, 0.9599, 0.1371]

b  = 0.1 − η ∗ 0.6856 = 0.0931
w₁ = 0.5 − η ∗ 0.9599 = 0.4904
w₂ = −0.1 − η ∗ 0.1371 = −0.1013

(loss at this step: L = 1.1573)
Logistic Regression-Stochastic

With the updated model b = 0.0931, w₁ = 0.4904, w₂ = −0.1013 on the same sample:

z = w₁x₁ + w₂x₂ + b = 0.75
ŷ = σ(z) = 0.6812

Loss: L = 1.1432 (previous L = 1.1573)


Outline
➢ Optimization Review
➢ Linear Regression Review
➢ Logistic Regression
➢ Examples
➢ Vectorization
➢ Implementation (optional)
Review

Transpose:
v = [v₁ … vₙ]ᵀ (a column vector),  vᵀ = [v₁ … vₙ] (a row vector)

For a matrix A with entries a₁₁ … a_mn, the transpose Aᵀ swaps rows and columns: (Aᵀ)ᵢⱼ = Aⱼᵢ

Example: [[1, 2], [3, 4]]ᵀ = [[1, 3], [2, 4]]

Multiply with a number:
αu = α [u₁ … uₙ]ᵀ = [αu₁ … αuₙ]ᵀ

Example: 2 ∗ [1, 2, 3]ᵀ = [2, 4, 6]ᵀ
Review

Dot product:
For v = [v₁ … vₙ]ᵀ and u = [u₁ … uₙ]ᵀ:
v · u = v₁ × u₁ + ⋯ + vₙ × uₙ

Example: v = [1, 2], w = [2, 3]:  v · w = 1∗2 + 2∗3 = 8
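The same operations in NumPy:

```python
import numpy as np

v = np.array([1, 2])
w = np.array([2, 3])
print(v @ w)                    # dot product: 8

A = np.array([[1, 2], [3, 4]])
print(A.T)                      # transpose: [[1 3], [2 4]]
print(2 * np.array([1, 2, 3]))  # scalar multiply: [2 4 6]
```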
Vectorization

1) Pick a sample (x, y) from the training data (Feature x, Label y)
2) Compute the output ŷ:  z = wx + b,  ŷ = σ(z) = 1 / (1 + e⁻ᶻ)
3) Compute loss:  L(ŷ, y) = −y log(ŷ) − (1 − y) log(1 − ŷ)
4) Compute derivatives:  ∂L/∂w = x(ŷ − y),  ∂L/∂b = (ŷ − y)
5) Update parameters:  w = w − η ∂L/∂w,  b = b − η ∂L/∂b, where η is the learning rate

Vectorizing the linear part: stack the input and the parameters as
x = [1  x]ᵀ,  θ = [b  w]ᵀ  →  θᵀ = [b  w]

Then z = wx + b ∗ 1 = [b  w][1  x]ᵀ = θᵀx, a dot product.
Vectorization

Traditional: z = wx + b.  Vectorized: z = θᵀx with x = [1  x]ᵀ, θ = [b  w]ᵀ.
In both cases ŷ = σ(z) = 1 / (1 + e⁻ᶻ).

The loss L(ŷ, y) = −y log(ŷ) − (1 − y) log(1 − ŷ) operates on plain numbers, so it needs no change.

What about the derivative step, ∂L/∂w = x(ŷ − y) and ∂L/∂b = (ŷ − y)? What will we do?
Vectorization

4) Compute derivatives. Both partial derivatives share the common factor (ŷ − y):

∂L/∂b = (ŷ − y) = (ŷ − y) ∗ 1
∂L/∂w = x(ŷ − y) = (ŷ − y) ∗ x

Stacking them:
∇_θ L = [∂L/∂b  ∂L/∂w]ᵀ = (ŷ − y) [1  x]ᵀ = x(ŷ − y)
Vectorization

z = θᵀx,  x = [1  x]ᵀ,  θ = [b  w]ᵀ,  ∇_θ L = [∂L/∂b  ∂L/∂w]ᵀ

5) Update parameters. The traditional updates
b = b − η ∂L/∂b
w = w − η ∂L/∂w
stack into a single vector update:
θ = θ − η ∇_θ L
Vectorization

Traditional:
1) Pick a sample (x, y) from the training data
2) z = wx + b,  ŷ = σ(z) = 1 / (1 + e⁻ᶻ)
3) L(ŷ, y) = −y log(ŷ) − (1 − y) log(1 − ŷ)
4) ∂L/∂w = x(ŷ − y),  ∂L/∂b = (ŷ − y)
5) w = w − η ∂L/∂w,  b = b − η ∂L/∂b

Vectorized:
1) Pick a sample (x, y) from the training data
2) z = θᵀx = xᵀθ,  ŷ = σ(z) = 1 / (1 + e⁻ᶻ)
3) L(ŷ, y) = −y log(ŷ) − (1 − y) log(1 − ŷ)
4) ∇_θ L = x(ŷ − y)
5) θ = θ − η ∇_θ L

(η is the learning rate)
Vectorization
❖ Implementation (using NumPy)

1) Pick a sample (x, y) from the training data
2) Compute output ŷ:  z = θᵀx = xᵀθ,  ŷ = σ(z) = 1 / (1 + e⁻ᶻ)
3) Compute loss:  L(ŷ, y) = −y log(ŷ) − (1 − y) log(1 − ŷ)
4) Compute derivative:  ∇_θ L = x(ŷ − y)
5) Update parameters:  θ = θ − η ∇_θ L, where η is the learning rate
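The code block on this slide did not survive extraction; a minimal NumPy sketch of these five steps (function and variable names are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, theta, lr=0.01):
    # x = [1, x1, ..., xk], theta = [b, w1, ..., wk]
    y_hat = sigmoid(x @ theta)                                 # 2) output
    loss = -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)   # 3) loss
    grad = x * (y_hat - y)                                     # 4) gradient
    theta = theta - lr * grad                                  # 5) update
    return theta, loss

# The worked example on the next slide:
theta = np.array([0.1, 0.5, -0.1])   # [b, w1, w2]
x = np.array([1.0, 1.4, 0.2])
theta, loss = train_step(x, y=0, theta=theta)
print(theta, loss)  # ~[0.0931 0.4904 -0.1014], 1.1573
```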
Given θ = [b  w₁  w₂]ᵀ = [0.1  0.5  −0.1]ᵀ,  x = [1  x₁  x₂]ᵀ = [1  1.4  0.2]ᵀ,  y = 0,  η = 0.01

1) Pick the sample (x, y) from the training data
2) Compute output:  ŷ = σ(θᵀx) = 0.6856
3) Compute loss:  L = 1.1573
4) Compute derivative:
∇_θ L = x(ŷ − y) = [1  1.4  0.2]ᵀ ∗ 0.6856 = [0.6856  0.9599  0.1371]ᵀ = [L′_b  L′_w₁  L′_w₂]ᵀ
5) Update parameters:
θ = θ − η ∇_θ L = [0.1  0.5  −0.1]ᵀ − η [0.6856  0.9599  0.1371]ᵀ = [0.093  0.490  −0.101]ᵀ
Logistic Regression-Stochastic

Dataset sample: x = [1  1.4  0.2]ᵀ,  y = 0

1) Pick a sample (x, y) from the training data
2) Compute output ŷ:  z = θᵀx,  ŷ = σ(z) = 1 / (1 + e⁻ᶻ)
3) Compute loss:  L(θ) = −y log(ŷ) − (1 − y) log(1 − ŷ)
4) Compute derivative:  ∇_θ L = x(ŷ − y)
5) Update parameters:  θ = θ − η ∇_θ L, where η is the learning rate

Demo
