
Artificial Neural Networks

Lecturer: Javier Machacuay


Mechanical-Electrical Engineer (UDEP)
Graduate of the MicroMasters Program in Statistics and Data Science (MIT)
[Diagram: Inputs $x$ → Process/System $f$ → Outputs $y$, so that $y = f(x)$]
Examples:
• Tabular data: Classification
• Tabular data: Regression
• Computer vision: Image classification (e.g., an image as input, the label "Cat" as output)
• Computer vision: Object detection
• Natural Language Processing (an input mapped to an output)
[Diagram: Inputs $x$ → Process/System $f$ → Outputs $y$, with $y = f(x)$]

When the Process/System is unknown, a Machine Learning model takes its place:

[Diagram: Inputs → Machine Learning model (in place of the Process/System) → Outputs]

In particular, an Artificial Neural Network can be that model:

[Diagram: Inputs → Artificial Neural Network (in place of the Process/System) → Outputs]

[Diagram: $x$ → Artificial Neural Network $\hat{f}$ → $\hat{y} = \hat{f}(x)$]
What is an artificial neural network?
Artificial Neuron

$z_1 = W_{11} x_1 + W_{12} x_2 + \dots + W_{1d} x_d + b_1$

$a_1 = f(z_1) = f(W_{11} x_1 + W_{12} x_2 + \dots + W_{1d} x_d + b_1)$

[Diagram: inputs $x_1, x_2, \dots, x_d$, each scaled by a weight $W_{1j}$, summed with the bias into $z_1$, then passed through the activation $f$ to produce $a_1$]

Each connection has a weight $W_{ij}$ (and each neuron a bias $b_i$).
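A minimal sketch of one artificial neuron in Python. The slides leave the activation $f$ generic; the sigmoid used here, and the name `neuron_forward`, are illustrative assumptions.

```python
import math

def neuron_forward(x, w, b):
    """Single artificial neuron: z = sum_j w_j * x_j + b, then a = f(z).
    f is chosen as the sigmoid here (an assumption; the slides leave f generic)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# One neuron with d = 3 inputs: z = 0.5*1 - 0.25*2 + 0.1*3 + 0 = 0.3
a = neuron_forward([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], b=0.0)
```

Any differentiable $f$ (tanh, ReLU, …) could be substituted for the sigmoid without changing the structure.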
Artificial Neurons

A layer applies $h$ neurons to the same inputs $x_1, \dots, x_d$:

$a_1 = f(z_1) = f(W_{11} x_1 + W_{12} x_2 + \dots + W_{1d} x_d + b_1)$

$a_2 = f(z_2) = f(W_{21} x_1 + W_{22} x_2 + \dots + W_{2d} x_d + b_2)$

$a_3 = f(z_3) = f(W_{31} x_1 + W_{32} x_2 + \dots + W_{3d} x_d + b_3)$

$\vdots$

$a_h = f(z_h) = f(W_{h1} x_1 + W_{h2} x_2 + \dots + W_{hd} x_d + b_h)$
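The $h$ equations above can be computed in one loop. A pure-Python sketch (sigmoid assumed as $f$; `layer_forward` is an illustrative name):

```python
import math

def layer_forward(x, W, b):
    """One layer of h neurons over d inputs:
    z_i = sum_j W[i][j] * x[j] + b[i], then a_i = f(z_i).
    W is an h x d list of lists; sigmoid is assumed as f."""
    a = []
    for i in range(len(W)):
        z_i = sum(W[i][j] * x[j] for j in range(len(x))) + b[i]
        a.append(1.0 / (1.0 + math.exp(-z_i)))
    return a

# h = 2 neurons, d = 2 inputs; all-zero parameters give a_i = f(0) = 0.5
activations = layer_forward([1.0, 1.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
```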
Artificial Neural Network (ANN)

[Diagram: inputs $x_1, \dots, x_d$ feed a sequence of layers $l = 1, 2, 3, \dots, L$; each layer computes $(z_i, a_i)$ for its neurons from the previous layer's activations]
Artificial Neural Network (ANN): Forward Equations

$a_i^l = f(z_i^l) = f\left( \sum_{j=1}^{n_{l-1}} W_{ij}^l a_j^{l-1} + b_i^l \right)$

Where: $a_j^1 = x_j$
Artificial Neural Network (ANN): Forward Equations

$a_i^l = f(z_i^l) = f\left( \sum_{j=1}^{n_{l-1}} W_{ij}^l a_j^{l-1} + b_i^l \right)$

In vector form, with $\tilde{f}$ applying $f$ element-wise:

$a^l = \tilde{f}(z^l) = \tilde{f}(W^l a^{l-1} + b^l)$
ANN: Forward equations (Computational Graph)

$a^l = \tilde{f}(z^l) = \tilde{f}(W^l a^{l-1} + b^l)$

[Computational graph: $W^l$ and $a^{l-1}$ enter a matrix-product node ($\times$); adding $b^l$ ($+$) gives $z^l$; $z^l$ passes through the element-wise nonlinear function $\tilde{f}$ to give $a^l$]
ANN: Forward Propagation

$a^1 \rightarrow a^2 \rightarrow \dots \rightarrow a^L$

Where: $a^1 = x$
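Forward propagation chains the vector equation $a^l = \tilde{f}(W^l a^{l-1} + b^l)$ from $a^1 = x$ to $a^L$. A pure-Python sketch (sigmoid assumed as $f$; `forward` and the `params` layout are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Forward propagation: a^1 = x, then a^l = f~(W^l a^{l-1} + b^l).
    params is a list of (W, b) pairs, one per layer l = 2..L; each W is a
    list of rows. f~ applies the (assumed) sigmoid element-wise."""
    a = x
    for W, b in params:
        a = [sigmoid(sum(Wij * aj for Wij, aj in zip(row, a)) + bi)
             for row, bi in zip(W, b)]
    return a

# Two layers with zero parameters: every activation is f(0) = 0.5
out = forward([1.0], [([[0.0]], [0.0]), ([[0.0]], [0.0])])
```

The loop makes the recursion explicit: each iteration consumes $a^{l-1}$ and produces $a^l$, so only the current activation vector is kept.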
ANN: Learnable parameters

$a^l = \tilde{f}(z^l) = \tilde{f}(W^l a^{l-1} + b^l)$

The learnable parameters are the weights $W^l$ and biases $b^l$. How to define them?
Machine Learning (Supervised Learning) Framework

Set up the optimization problem:

$\min_{W,b} L(data, W, b)$
Standard optimization algorithm in Machine Learning: Gradient descent

$\theta^{k+1} = \theta^k - \alpha \frac{\partial L}{\partial \theta}$

[Plot: loss $L(\theta)$ versus $\theta$, with gradient descent stepping downhill toward a minimum]

Remarks:
• It is a local optimizer.
• It only requires derivative computation.
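The update rule above can be demonstrated on a one-dimensional loss. The function name, step count, and the example loss $L(\theta) = (\theta - 3)^2$ are illustrative choices, not from the slides:

```python
def gradient_descent(grad, theta0, alpha=0.1, steps=100):
    """Iterate theta^{k+1} = theta^k - alpha * dL/dtheta."""
    theta = theta0
    for _ in range(steps):
        theta -= alpha * grad(theta)
    return theta

# Minimize L(theta) = (theta - 3)^2, whose derivative is 2*(theta - 3);
# the iterates converge toward the minimizer theta = 3.
theta_star = gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=0.0)
```

Note the remarks above in action: the routine needs only the derivative `grad`, never $L$ itself, and it converges to whichever local minimum the starting point leads to.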
The Loss Function $L$

$\min_{W,b} L(data, W, b)$

Regression ($y^{(i)} \in \mathbb{R}^d, \forall i$):

$L = \frac{1}{2n} \sum_{i=1}^{n} \left( a^{L,(i)} - y^{(i)} \right)^2$

Classification ($y^{(i)} \in Y, \forall i$):

$L = -\frac{1}{n} \sum_{i=1}^{n} y^{(i)} \ln a^{L,(i)}$
ANN: Derivatives (Backward) computation

[Computational graph run in reverse: $\frac{\partial L}{\partial a^l}$ flows back through $\tilde{f}$ to give $\frac{\partial L}{\partial z^l}$, which splits at the $+$ node into $\frac{\partial L}{\partial b^l}$ and, through the matrix-product node, into $\frac{\partial L}{\partial W^l}$ and $\frac{\partial L}{\partial a^{l-1}}$]
ANN: Derivatives (Backward) computation
(Mathematical proofs skipped)

$\frac{\partial L}{\partial z^l} = \frac{\partial L}{\partial a^l} * \frac{\partial a^l}{\partial z^l}$ (element-wise product)

$\frac{\partial L}{\partial b^l} = \frac{\partial L}{\partial z^l}$

$\frac{\partial L}{\partial W^l} = \frac{\partial L}{\partial z^l} \times \left( a^{l-1} \right)^T$

$\frac{\partial L}{\partial a^{l-1}} = \left( W^l \right)^T \times \frac{\partial L}{\partial z^l}$
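The four backward equations for one layer, written out with plain lists. `layer_backward` is an illustrative name; `f_prime_z` holds $f'(z_i^l)$ for each neuron, which makes the element-wise product $\frac{\partial L}{\partial a^l} * \frac{\partial a^l}{\partial z^l}$ explicit:

```python
def layer_backward(dL_da, a_prev, W, f_prime_z):
    """Backward pass through one layer:
      dL/dz^l     = dL/da^l * f'(z^l)        (element-wise)
      dL/db^l     = dL/dz^l
      dL/dW^l     = dL/dz^l x (a^{l-1})^T    (outer product)
      dL/da^{l-1} = (W^l)^T x dL/dz^l
    """
    dL_dz = [g * d for g, d in zip(dL_da, f_prime_z)]
    dL_db = list(dL_dz)
    dL_dW = [[gi * aj for aj in a_prev] for gi in dL_dz]
    dL_da_prev = [sum(W[i][j] * dL_dz[i] for i in range(len(W)))
                  for j in range(len(a_prev))]
    return dL_dW, dL_db, dL_da_prev
```

Chaining this over $l = L, \dots, 2$ and feeding each layer's `dL_da_prev` into the next call reproduces backpropagation.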
ANN: Backward propagation (backpropagation)

$\frac{\partial L}{\partial a^1} \leftarrow \frac{\partial L}{\partial a^2} \leftarrow \dots \leftarrow \frac{\partial L}{\partial a^L}$

ANN: Forward & Backward propagation

Forward: $a^1 \rightarrow a^2 \rightarrow \dots \rightarrow a^L$

Backward: $\frac{\partial L}{\partial a^1} \leftarrow \frac{\partial L}{\partial a^2} \leftarrow \dots \leftarrow \frac{\partial L}{\partial a^L}$
Solving the optimization problem for ANNs

$\min_{W,b} L(data, W, b)$

$W^l \leftarrow W^l - \alpha \frac{\partial L}{\partial W^l}$

$b^l \leftarrow b^l - \alpha \frac{\partial L}{\partial b^l}$
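The per-layer update is the gradient-descent rule applied to $W^l$ and $b^l$ with the gradients backpropagation provides. A sketch (`update_params` is an illustrative name):

```python
def update_params(W, b, dL_dW, dL_db, alpha):
    """One gradient-descent step on a layer's parameters:
      W^l <- W^l - alpha * dL/dW^l
      b^l <- b^l - alpha * dL/db^l
    """
    W_new = [[Wij - alpha * gij for Wij, gij in zip(row, grow)]
             for row, grow in zip(W, dL_dW)]
    b_new = [bi - alpha * gi for bi, gi in zip(b, dL_db)]
    return W_new, b_new
```

A full training loop repeats: forward propagation, loss evaluation, backpropagation, then this update, once per iteration.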
Worked example (Handwritten digit recognition)

[Diagram: an image of a handwritten digit → Artificial Neural Network → "It's a 5."]
