ANN (Artificial Neural Networks)
Many tasks can be described as a process or system $f$ that maps inputs $x$ to outputs $y$: $y = f(x)$.

Examples (Tabular data: Classification)
Examples (Tabular data: Regression)
Examples (Computer vision: Image Classification): an image is mapped to the label "Cat".
Examples (Computer vision: Object detection)
Examples (Natural Language Processing): an input text is mapped to an output text.

In every example, some process/system turns the inputs into the outputs.
When the true process/system $f$ with $y = f(x)$ is unknown, machine learning replaces it with a model learned from input/output pairs. An artificial neural network is one such model: it takes the place of the process/system between inputs and outputs.
The artificial neural network implements an approximation $\hat{f}$ of the unknown function, producing $\hat{y} = \hat{f}(x)$.

What is an artificial neural network?
Artificial Neuron

A single artificial neuron computes a weighted sum of its inputs plus a bias, and then applies an activation function $f$:

$z_1 = W_{11} x_1 + W_{12} x_2 + \cdots + W_{1d} x_d + b_1$

$a_1 = f(z_1) = f(W_{11} x_1 + W_{12} x_2 + \cdots + W_{1d} x_d + b_1)$

Each connection has a weight $W_{ij}$, and each neuron has a bias $b_i$.
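As a minimal sketch (not from the slides), the neuron above can be written in a few lines of NumPy; the sigmoid activation and all the numbers are assumptions for illustration, since the slides leave $f$ generic:

```python
import numpy as np

def sigmoid(z):
    """An example choice of activation function f (the slides leave f generic)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical numbers: d = 3 inputs feeding one neuron.
x = np.array([0.5, -1.0, 2.0])   # inputs x_1 ... x_d
W1 = np.array([0.1, 0.4, -0.2])  # weights W_11 ... W_1d
b1 = 0.3                         # bias b_1

z1 = W1 @ x + b1      # weighted sum plus bias: z_1
a1 = sigmoid(z1)      # activation a_1 = f(z_1)
print(z1, a1)
```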
Artificial Neurons

A layer contains several neurons; each neuron has its own weights and bias:

$a_1 = f(z_1) = f(W_{11} x_1 + W_{12} x_2 + \cdots + W_{1d} x_d + b_1)$

$a_2 = f(z_2) = f(W_{21} x_1 + W_{22} x_2 + \cdots + W_{2d} x_d + b_2)$

$\vdots$

$a_h = f(z_h) = f(W_{h1} x_1 + W_{h2} x_2 + \cdots + W_{hd} x_d + b_h)$

Such layers can be stacked (e.g. layers $l = 2$ and $l = 3$), with the activations of one layer serving as the inputs of the next. In general, neuron $i$ of layer $l$ computes

$a_i^{(l)} = f\left(z_i^{(l)}\right) = f\left(\sum_{j=1}^{n_{l-1}} W_{ij}^{(l)} a_j^{(l-1)} + b_i^{(l)}\right)$

Where: $a_j^{(1)} = x_j$ (the first layer holds the inputs).
Artificial Neural Network (ANN): Forward Equations

Per neuron: $a_i^{(l)} = f\left(z_i^{(l)}\right) = f\left(\sum_{j=1}^{n_{l-1}} W_{ij}^{(l)} a_j^{(l-1)} + b_i^{(l)}\right)$

Collecting a whole layer into vectors, with $\tilde{f}$ denoting the element-wise application of $f$:

$a^{(l)} = \tilde{f}\left(z^{(l)}\right) = \tilde{f}\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$
ANN: Forward equations (Computational Graph)

$a^{(l)} = \tilde{f}\left(z^{(l)}\right) = \tilde{f}\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$

As a computational graph: $W^{(l)}$ and $a^{(l-1)}$ feed a matrix product, $b^{(l)}$ is added to give $z^{(l)}$, and the element-wise nonlinear function $\tilde{f}$ produces $a^{(l)}$.
ANN: Forward Propagation

Chaining the layers computes the activations from the inputs to the outputs: $a^{(1)} \rightarrow a^{(2)} \rightarrow \cdots \rightarrow a^{(L)}$

Where: $a^{(1)} = x$.
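The forward propagation above can be sketched in NumPy. The layer sizes, random weights, and tanh activation are all assumptions for illustration:

```python
import numpy as np

def forward(x, Ws, bs, f=np.tanh):
    """Forward propagation: a^(1) = x, then a^(l) = f(W^(l) a^(l-1) + b^(l))."""
    a = x
    activations = [a]
    for W, b in zip(Ws, bs):
        z = W @ a + b   # z^(l) = W^(l) a^(l-1) + b^(l)
        a = f(z)        # element-wise nonlinearity f~
        activations.append(a)
    return activations

# Hypothetical sizes: 4 inputs -> 3 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
bs = [np.zeros(3), np.zeros(2)]
x = rng.normal(size=4)

acts = forward(x, Ws, bs)
print([a.shape for a in acts])  # [(4,), (3,), (2,)]
```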
ANN: Learnable parameters

Each layer $a^{(l)} = \tilde{f}\left(z^{(l)}\right) = \tilde{f}\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$ contributes a weight matrix $W^{(l)}$ and a bias vector $b^{(l)}$. Training the network means solving

$\min_{W,b} L(\mathit{data}, W, b)$
Optimization algorithm: Gradient descent

The standard optimization algorithm in machine learning: starting from an initial guess, repeatedly step against the gradient of the loss $L(\theta)$:

$\theta^{(k+1)} = \theta^{(k)} - \alpha \frac{\partial L}{\partial \theta}$

Remarks:
• Local optimizer.
• Only requires derivative computation.
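To illustrate the update rule on a toy problem of my own (not from the slides): minimizing $L(\theta) = (\theta - 3)^2$, whose gradient is $2(\theta - 3)$:

```python
# Gradient descent on L(theta) = (theta - 3)^2.
theta = 0.0   # initial guess theta^(0)
alpha = 0.1   # learning rate
for k in range(100):
    grad = 2.0 * (theta - 3.0)    # dL/dtheta
    theta = theta - alpha * grad  # theta^(k+1) = theta^(k) - alpha * dL/dtheta
print(theta)  # converges toward the minimizer theta* = 3
```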
The Loss Function $L$

$\min_{W,b} L(\mathit{data}, W, b)$

Regression (with $y^{i} \in \mathbb{R}^{d}$ for all $i$):

$L = \frac{1}{2n} \sum_{i=1}^{n} \left( a^{(L),i} - y^{i} \right)^{2}$

Classification (with $y^{i} \in Y$ for all $i$):

$L = -\frac{1}{n} \sum_{i=1}^{n} y^{i} \ln a^{(L),i}$
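Both losses translate directly to NumPy. The data below is made up, and the classification sketch assumes one-hot targets $y$ and network outputs that are already probabilities:

```python
import numpy as np

def mse_loss(a, y):
    """Regression loss: L = 1/(2n) * sum_i (a^(L),i - y^i)^2."""
    n = len(y)
    return np.sum((a - y) ** 2) / (2 * n)

def cross_entropy_loss(a, y):
    """Classification loss: L = -1/n * sum_i y^i ln a^(L),i
    (y one-hot, a a valid probability vector per sample)."""
    n = y.shape[0]
    return -np.sum(y * np.log(a)) / n

# Hypothetical outputs/targets.
a_reg = np.array([1.0, 2.0, 3.0])
y_reg = np.array([1.5, 2.0, 2.0])
a_cls = np.array([[0.8, 0.2], [0.3, 0.7]])
y_cls = np.array([[1.0, 0.0], [0.0, 1.0]])
print(mse_loss(a_reg, y_reg), cross_entropy_loss(a_cls, y_cls))
```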
ANN: Derivatives (Backward) computation

Running the computational graph backwards turns $\frac{\partial L}{\partial a^{(l)}}$ into $\frac{\partial L}{\partial z^{(l)}}$, and from there into $\frac{\partial L}{\partial W^{(l)}}$, $\frac{\partial L}{\partial b^{(l)}}$ and $\frac{\partial L}{\partial a^{(l-1)}}$.

(Mathematical proofs skipped)

$\frac{\partial L}{\partial z^{(l)}} = \frac{\partial L}{\partial a^{(l)}} * \frac{\partial a^{(l)}}{\partial z^{(l)}}$   (element-wise product)

$\frac{\partial L}{\partial b^{(l)}} = \frac{\partial L}{\partial z^{(l)}}$

$\frac{\partial L}{\partial W^{(l)}} = \frac{\partial L}{\partial z^{(l)}} \times \left( a^{(l-1)} \right)^{T}$

$\frac{\partial L}{\partial a^{(l-1)}} = \left( W^{(l)} \right)^{T} \times \frac{\partial L}{\partial z^{(l)}}$
ANN: Backward propagation (backpropagation)

The derivatives are propagated layer by layer, from the output back to the input: $\frac{\partial L}{\partial a^{(L)}} \rightarrow \cdots \rightarrow \frac{\partial L}{\partial a^{(2)}} \rightarrow \frac{\partial L}{\partial a^{(1)}}$

ANN: Forward & Backward propagation

A forward pass ($a^{(1)} \rightarrow \cdots \rightarrow a^{(L)}$) followed by a backward pass ($\frac{\partial L}{\partial a^{(L)}} \rightarrow \cdots \rightarrow \frac{\partial L}{\partial a^{(1)}}$) yields all the gradients needed for training.
Solving the optimization problem for ANNs

$\min_{W,b} L(\mathit{data}, W, b)$

Gradient descent is applied to the parameters of every layer:

$W^{(l)} \leftarrow W^{(l)} - \alpha \frac{\partial L}{\partial W^{(l)}}$

$b^{(l)} \leftarrow b^{(l)} - \alpha \frac{\partial L}{\partial b^{(l)}}$
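Putting the pieces together, here is a toy end-to-end sketch of my own construction: a single linear layer (identity activation) trained by gradient descent on a made-up regression problem with the $\frac{1}{2n}\sum(a-y)^2$ loss:

```python
import numpy as np

# Made-up data: learn y = W_true x with one linear layer (f = identity).
rng = np.random.default_rng(2)
W_true = np.array([[2.0, -1.0]])
X = rng.normal(size=(2, 50))  # 50 samples as columns, 2 features
Y = W_true @ X

W = np.zeros((1, 2))
b = np.zeros((1, 1))
alpha = 0.1
n = X.shape[1]
for epoch in range(200):
    A = W @ X + b                # forward pass: a^L (identity activation)
    dL_dZ = (A - Y) / n          # gradient of L = 1/(2n) sum (a - y)^2 w.r.t. z
    dL_dW = dL_dZ @ X.T          # dL/dW = dL/dz x (a^(l-1))^T
    dL_db = dL_dZ.sum(axis=1, keepdims=True)
    W = W - alpha * dL_dW        # W^l <- W^l - alpha dL/dW^l
    b = b - alpha * dL_db        # b^l <- b^l - alpha dL/db^l

print(W)  # approaches W_true = [[2, -1]]
```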
Worked example (Handwritten digit recognition)

An image of a handwritten digit is fed to the artificial neural network, which outputs the predicted class: "It's a 5."