Deep learning attracts a lot of attention.
• You have probably seen many exciting results already.
[Figure: a single "neuron" computes a weighted sum z of its inputs and passes it through an activation function]
Neural Network
Different connections lead to different network structures.
Network parameter 𝜃: all the weights and biases in the "neurons".
Fully Connected Feedforward Network

Example: for input (1, −1), a neuron with weights (1, −2) and bias 1 computes
$z = 1 \cdot 1 + (-1) \cdot (-2) + 1 = 4$, so its output is $\sigma(4) \approx 0.98$;
a neuron with weights (−1, 1) and bias 0 computes
$z = 1 \cdot (-1) + (-1) \cdot 1 + 0 = -2$, so its output is $\sigma(-2) \approx 0.12$.

Sigmoid Function:
$\sigma(z) = \dfrac{1}{1 + e^{-z}}$
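A minimal sketch of this single-layer computation in Python, using the weights, biases, and input from the example above:

```python
import math

def sigmoid(z):
    # Sigmoid activation: squashes any real z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

x = (1, -1)                       # input vector from the slide

# Neuron 1: weights (1, -2), bias 1
z1 = 1 * x[0] + (-2) * x[1] + 1   # = 4
# Neuron 2: weights (-1, 1), bias 0
z2 = -1 * x[0] + 1 * x[1] + 0     # = -2

print(round(sigmoid(z1), 2))      # 0.98
print(round(sigmoid(z2), 2))      # 0.12
```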
Fully Connected Feedforward Network

Feeding (1, −1) through the whole three-layer network, the outputs of successive layers are:
(0.98, 0.12) → (0.86, 0.11) → (0.62, 0.83)

Feeding (0, 0) through the same network:
(0.73, 0.50) → (0.72, 0.12) → (0.51, 0.85)
This is a function: input vector in, output vector out.

$f\left(\begin{bmatrix}1\\-1\end{bmatrix}\right)=\begin{bmatrix}0.62\\0.83\end{bmatrix}$  $f\left(\begin{bmatrix}0\\0\end{bmatrix}\right)=\begin{bmatrix}0.51\\0.85\end{bmatrix}$

[Figure: input layer (x1 … xN) → hidden layers → output layer (y1 … yM)]
Deep = Many hidden layers
ImageNet error rate vs. network depth
(source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf)

AlexNet (2012): 8 layers, 16.4%
VGG (2014): 19 layers, 7.3%
GoogleNet (2014): 22 layers, 6.7%
Residual Net (2015): 152 layers, special structure, 3.57%

[Figure: the networks drawn to scale by depth, with the 152-layer Residual Net compared to Taipei 101]
Matrix Operation

The first layer of the example can be written as a single matrix operation:

$\sigma\left(\begin{bmatrix}1 & -2\\-1 & 1\end{bmatrix}\begin{bmatrix}1\\-1\end{bmatrix}+\begin{bmatrix}1\\0\end{bmatrix}\right)=\sigma\left(\begin{bmatrix}4\\-2\end{bmatrix}\right)=\begin{bmatrix}0.98\\0.12\end{bmatrix}$
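The same matrix operation as a NumPy sketch, with the weights, biases, and input taken from the example above:

```python
import numpy as np

def sigmoid(z):
    # Element-wise sigmoid: works on vectors as well as scalars.
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[ 1.0, -2.0],
              [-1.0,  1.0]])   # one row of weights per neuron
b = np.array([1.0, 0.0])       # one bias per neuron
x = np.array([1.0, -1.0])      # input vector

a = sigmoid(W @ x + b)         # sigma(Wx + b)
print(np.round(a, 2))          # [0.98 0.12]
```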
Neural Network

With weight matrices $W^1, W^2, \dots, W^L$ and bias vectors $b^1, b^2, \dots, b^L$, the layers compute:

$a^1 = \sigma(W^1 x + b^1)$
$a^2 = \sigma(W^2 a^1 + b^2)$
…
$y = \sigma(W^L a^{L-1} + b^L)$
Neural Network

The whole network is therefore one nested function from input x to output y:

$y = f(x) = \sigma\left(W^L \cdots \sigma\left(W^2 \, \sigma\left(W^1 x + b^1\right) + b^2\right) \cdots + b^L\right)$
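A sketch of this nested computation as a loop over layers; the layer sizes and random parameters below are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # y = sigma(W^L ... sigma(W^2 sigma(W^1 x + b^1) + b^2) ... + b^L)
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Illustrative 2 -> 3 -> 2 network with random parameters.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((2, 3))]
biases  = [rng.standard_normal(3), rng.standard_normal(2)]
print(forward(np.array([1.0, -1.0]), weights, biases))
```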
Output Layer

The hidden layers act as a feature extractor, replacing manual feature engineering; applying softmax to the output layer turns the network into a multi-class classifier.

[Figure: input layer (x1 … xK) → hidden layers → softmax output layer (y1 … yM)]
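A minimal softmax sketch; subtracting the maximum before exponentiating is a standard numerical-stability trick, not something shown on the slides:

```python
import numpy as np

def softmax(z):
    # Shift by max(z): leaves the result unchanged but avoids overflow.
    e = np.exp(z - np.max(z))
    return e / e.sum()   # entries are positive and sum to 1

print(softmax(np.array([3.0, 1.0, -3.0])))  # ~[0.88 0.12 0.00]
```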
Example Application

Input: a 16 × 16 = 256-pixel image of a handwritten digit, flattened into a 256-dim vector (ink → 1, no ink → 0).
Output: a 10-dim vector; each dimension represents the confidence that the image is a particular digit.
For example, y1 = 0.1 ("is 1"), y2 = 0.7 ("is 2"), …, y10 = 0.2 ("is 0"), so the image is "2".
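A sketch of this input/output encoding; the image array and output values below are hypothetical:

```python
import numpy as np

# Hypothetical 16x16 binary image: ink -> 1, no ink -> 0.
image = np.zeros((16, 16))
image[2:14, 7:9] = 1.0        # a crude vertical stroke

x = image.flatten()           # 256-dim input vector
print(x.shape)                # (256,)

# Suppose the network outputs 10 confidences, ordered as on the slide:
# y1 = "is 1", y2 = "is 2", ..., y10 = "is 0".
y = np.array([0.1, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2])
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
print(digits[int(np.argmax(y))])   # 2
```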
Example Application
• Handwriting Digit Recognition

[Figure: 256-dim image vector → Neural Network → 10-dim confidence vector → "2"]

What is needed is a function with
input: 256-dim vector, output: 10-dim vector.
Example Application

[Figure: input layer → Layer 1 → Layer 2 → … → Layer L → output layer, with y1 = "is 1", y2 = "is 2", …, y10 = "is 0"]

Choosing a network structure defines a function set containing the candidates for handwriting digit recognition.
Given a set of parameters, compare the softmax output $y$ with the target $\hat{y}$. For an image of "1", the target is $\hat{y} = [1, 0, \dots, 0]^\top$.

Cross entropy:
$C(y, \hat{y}) = -\sum_{i=1}^{10} \hat{y}_i \ln y_i$
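A direct translation of this formula into Python; the output and target values below are illustrative:

```python
import numpy as np

def cross_entropy(y, y_hat):
    # C(y, y_hat) = -sum_i y_hat_i * ln(y_i)
    return -np.sum(y_hat * np.log(y))

y     = np.array([0.8, 0.1, 0.1])   # softmax output (sums to 1)
y_hat = np.array([1.0, 0.0, 0.0])   # one-hot target
print(cross_entropy(y, y_hat))      # -ln(0.8) ~ 0.223
```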
Total Loss

For all training data, the total loss is

$L = \sum_{n=1}^{N} C^n$

where $C^n$ is the cross entropy of the n-th training example. Goal: find a function in the function set that minimizes the total loss L.
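Summing the per-example cross entropies over a hypothetical training set of N = 3 examples:

```python
import numpy as np

# Hypothetical softmax outputs and one-hot targets for N = 3 examples.
outputs = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
targets = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

# L = sum_n C^n, with C^n = -sum_i y_hat_i * ln(y_i)
L = -np.sum(targets * np.log(outputs))
print(L)   # ~1.50
```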
Gradient Descent

Pick initial values for the parameters 𝜃, compute the gradient of the total loss, and repeatedly update each parameter in the direction that decreases L:

$w \leftarrow w - \mu \, \partial L / \partial w$,  $b \leftarrow b - \mu \, \partial L / \partial b$

For example, over successive updates: $w_1$: 0.2 → 0.15 → 0.09 → …; $w_2$: −0.1 → 0.05 → 0.15 → …; $b_1$: 0.3 → 0.2 → …
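A minimal sketch of this update rule on a toy one-parameter loss; the loss function and the learning rate μ = 0.1 are assumptions for illustration, not from the slides:

```python
# Toy loss L(w) = (w - 3)^2, with gradient dL/dw = 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

w, mu = 0.2, 0.1               # initial parameter and learning rate
for step in range(5):
    w = w - mu * grad(w)       # w <- w - mu * dL/dw
    print(round(w, 3))         # w moves toward the minimum at w = 3
```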
libdnn (developed by 周伯威, a student at National Taiwan University)
Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
Concluding Remarks

Any continuous function $f : \mathbb{R}^N \to \mathbb{R}^M$ can be realized by a network with one hidden layer (given enough hidden neurons).
Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html