Introduction to Deep Learning Chapter 1
HCM City Univ. of Technology, Faculty of Mechanical Engineering, Duong Van Tu
Motivation: a ZIP code recognizer on real U.S. mail for the postal service!
[Figure: a handwritten "three" and part of its grid of grayscale pixel values (0-255); background pixels are 0, stroke pixels reach values such as 120, 240 and 255]
Digit is a 7 if P1 > 128 and P2 > 128 and P3 > 128
[Figure: a 28x28 image of a handwritten 7 with the three probe pixels P1, P2, P3 marked on the stroke]
Slanted digit? Pixel 3 is no longer dark!
[Figure: the same probe pixels on a slanted 7, where pixel 3 no longer lands on the stroke]
An Improved Heuristic!
Digit is a 7 if P1 > ...
[Figure: the adjusted rule applied to the slanted 7]
Not so fast...
Digit shifted up? The pixel values are completely different.
[Figure: the same digit shifted upward, so the probe pixels land on completely different values]
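To make the fragility concrete, here is a minimal Python sketch of such a pixel-threshold rule. The probe coordinates below are placeholders chosen for illustration, not the actual positions on the slides:

```python
import numpy as np

# Hypothetical probe positions (row, col) in a 28x28 image; the slides'
# actual P1, P2, P3 locations are not given, so these are made up.
PROBES = [(6, 18), (12, 14), (20, 10)]

def looks_like_a_seven(image: np.ndarray, threshold: int = 128) -> bool:
    """Naive heuristic: it is a '7' if all three probe pixels exceed the threshold."""
    return all(image[r, c] > threshold for (r, c) in PROBES)

# A slanted or shifted digit moves the stroke away from the probe pixels,
# so the very same '7' can fail this test.
```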
MNIST
Handwritten digits
0-9 (10 classes)
70,000 images
[Figure: sample MNIST digits, including one ambiguous example labeled "Not a 2"]
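As a quick sketch, assuming TensorFlow/Keras is installed, the dataset can be loaded like this; the standard 60,000 + 10,000 train/test split makes up the 70,000 images:

```python
from tensorflow.keras.datasets import mnist

# Standard MNIST split: 60,000 training images + 10,000 test images = 70,000.
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)   # (60000, 28, 28), grayscale values 0-255
print(y_train[:10])    # digit labels 0-9
```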
The Perceptron
The weights w_i and the bias b are the model's parameters.

f_Φ(x) = 1 if b + Σ_i w_i x_i > 0, and 0 otherwise

The bias can be folded into the weight vector by appending a constant input of 1:

(x_0, x_1, x_2, ..., x_n) · (w_0, w_1, w_2, ..., w_n) + b = (x_0, x_1, x_2, ..., x_n, 1) · (w_0, w_1, w_2, ..., w_n, b)

[Figure: a perceptron with inputs x_1...x_7, weights w_1...w_7, bias b and a summation node Σ, drawn twice: once with an explicit bias and once with the bias folded into an extra constant input]
• n = 784 (28 × 28 pixel values)
• Output is either 0 or 1
• 0 → input is not the digit type we're looking for
• 1 → input is the digit type we're looking for
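A minimal sketch of this perceptron in Python with NumPy, including the bias-folding trick above (function and variable names are mine, not from the slides):

```python
import numpy as np

def perceptron(x: np.ndarray, w: np.ndarray, b: float) -> int:
    """f(x) = 1 if b + sum_i w_i * x_i > 0, else 0."""
    return int(b + np.dot(w, x) > 0)

def perceptron_folded(x: np.ndarray, w_aug: np.ndarray) -> int:
    """Same perceptron with the bias folded into the weights:
    append a constant 1 to x and the bias b to w."""
    x_aug = np.append(x, 1.0)
    return int(np.dot(w_aug, x_aug) > 0)

# The two forms agree:
x = np.random.rand(784)       # 28*28 flattened pixel values
w = np.random.randn(784)
b = -0.5
assert perceptron(x, w, b) == perceptron_folded(x, np.append(w, b))
```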
• Iterate over the training set several times, feeding each training example into the model, producing an output, and adjusting the parameters according to whether that output was right or wrong.
• Stop once we either (a) get every training example right or (b) reach N iterations, a number set by the programmer.
• If our label a_k is 1 and our model's output is 0, we update the i-th weight by:
  (1 - 0) · x_i^k = x_i^k
  The output was 0 and should have been 1, so make the output more positive.
• If our label a_k is 0 and our model's output is 1, we update the i-th weight by:
  (0 - 1) · x_i^k = -x_i^k
  The output was 1 and should have been 0, so make the output more negative.
A minimal training-loop sketch follows this list.
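Assuming NumPy, 0/1 labels, and the bias folded into the weights as above (names are mine, not the slides'):

```python
import numpy as np

def train_perceptron(X, labels, n_iters=50):
    """X: (num_examples, n) inputs; labels: 0 or 1 per example.
    Returns augmented weights (last entry is the bias)."""
    X_aug = np.hstack([X, np.ones((len(X), 1))])   # fold the bias in
    w = np.zeros(X_aug.shape[1])
    for _ in range(n_iters):                       # stop after N iterations...
        mistakes = 0
        for x, a in zip(X_aug, labels):
            out = int(np.dot(w, x) > 0)
            if out != a:
                w += (a - out) * x                 # +x if output too low, -x if too high
                mistakes += 1
        if mistakes == 0:                          # ...or once every example is right
            break
    return w
```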
[Figure: worked example of the update rule on a two-input perceptron; first training example x_1 = 0.8, x_2 = 0, next example x_1 = 0.9, x_2 = 0.9]
A single perceptron can only make one yes/no decision, e.g. "the handwritten digit is a 0" or "the handwritten digit is a 9", so we need one perceptron per digit class.
[Figure: several perceptrons sharing the same inputs x_1...x_7, one per output; drawn together, each input x_i connects to every output_j through its own weight w_{i,j}, so the whole layer amounts to one weight matrix]
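A sketch of that combined layer as a single matrix-vector product (NumPy, names are mine): each column of W holds the weights of one per-digit perceptron.

```python
import numpy as np

n_inputs, n_classes = 784, 10

# One column of weights (plus one bias) per digit class.
W = np.random.randn(n_inputs, n_classes)
b = np.zeros(n_classes)

def layer(x: np.ndarray) -> np.ndarray:
    """Ten perceptrons evaluated at once: output_j = 1 if b_j + w_j . x > 0."""
    return (x @ W + b > 0).astype(int)

x = np.random.rand(n_inputs)
print(layer(x))   # a 0/1 vector, one entry per digit class
```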
Activation Functions
1. Binary Step Function
   f(x) = 1, x >= 0
        = 0, x < 0
2. Linear Function
   f(x) = ax
3. Sigmoid
   f(x) = 1/(1 + e^-x)
4. Tanh
   tanh(x) = 2*sigmoid(2x) - 1
5. ReLU
   f(x) = max(0, x)
6. Leaky ReLU
   f(x) = 0.01x, x < 0
        = x,     x >= 0
7. Parameterised ReLU
   f(x) = x,  x >= 0
        = ax, x < 0
8. ELU
   f(x) = x,          x >= 0
        = a(e^x - 1), x < 0
9. Swish Function
   f(x) = x*sigmoid(x) = x/(1 + e^-x)
A NumPy sketch of these functions follows the list.
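This is a minimal sketch assuming NumPy; the shape parameter a is exposed explicitly wherever a formula above uses one, with placeholder default values:

```python
import numpy as np

def binary_step(x):      return np.where(x >= 0, 1.0, 0.0)
def linear(x, a=1.0):    return a * x
def sigmoid(x):          return 1.0 / (1.0 + np.exp(-x))
def tanh(x):             return 2.0 * sigmoid(2.0 * x) - 1.0   # same as np.tanh(x)
def relu(x):             return np.maximum(0.0, x)
def leaky_relu(x):       return np.where(x >= 0, x, 0.01 * x)
def prelu(x, a=0.25):    return np.where(x >= 0, x, a * x)
def elu(x, a=1.0):       return np.where(x >= 0, x, a * (np.exp(x) - 1.0))
def swish(x):            return x * sigmoid(x)
```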