Professional Documents
Culture Documents
Multilayer
Perceptron. Backpropagation
COMP90051 Statistical Machine Learning
Semester 2, 2017
Lecturer: Andrey Kan
This lecture
• Multilayer perceptron
∗ Model structure
∗ Universal approximation
∗ Training preliminaries
• Backpropagation
∗ Step-by-step derivation
∗ Notes on regularisation
2
Statistical Machine Learning (S2 2017) Deck 7
Feed-forward
Multilayer perceptrons networks art: OpenClipartVectors
at pixabay.com (CC0)
Perceptrons Convolutional
neural networks
Multilayer Perceptron
4
Statistical Machine Learning (S2 2017) Deck 7
AND OR XOR
5
Statistical Machine Learning (S2 2017) Deck 7
Sputnik – first
artificial Earth
satellite
6
Statistical Machine Learning (S2 2017) Deck 7
1, 𝑖𝑖𝑖𝑖 𝑠𝑠 ≥ 0
Sign function 𝑓𝑓 𝑠𝑠 = �
−1, 𝑖𝑖𝑖𝑖 𝑠𝑠 < 0
1
Logistic function 𝑓𝑓 𝑠𝑠 =
1 + 𝑒𝑒 −𝑠𝑠
7
Statistical Machine Learning (S2 2017) Deck 7
8
Statistical Machine Learning (S2 2017) Deck 7
𝑢𝑢𝑗𝑗 = 𝑔𝑔(𝑟𝑟𝑗𝑗 )
𝑢𝑢1 𝑠𝑠𝑘𝑘 = 𝑤𝑤0𝑘𝑘 + � 𝑢𝑢𝑗𝑗 𝑤𝑤𝑗𝑗𝑗𝑗
𝑥𝑥2 𝑧𝑧2
𝑗𝑗=1
𝑚𝑚 …
𝑟𝑟𝑗𝑗 = 𝑣𝑣0𝑗𝑗 + � 𝑥𝑥𝑖𝑖 𝑣𝑣𝑖𝑖𝑖𝑖 note that 𝑧𝑧𝑘𝑘 is a
… … function composition
𝑖𝑖=1
𝑢𝑢𝑝𝑝 (a function applied to
𝑧𝑧𝑞𝑞 the result of another
function, etc.)
𝑥𝑥𝑚𝑚
you can add bias node 𝑥𝑥0 = 1 to simplify
here 𝑔𝑔, ℎ are activation equations: 𝑟𝑟𝑗𝑗 = ∑𝑚𝑚
𝑖𝑖=0 𝑥𝑥𝑖𝑖 𝑣𝑣𝑖𝑖𝑖𝑖
functions. These can be
either same (e.g., both similarly you can add bias node 𝑢𝑢0 = 1 to
𝑝𝑝
sigmoid) or different simplify equations: 𝑠𝑠𝑘𝑘 = ∑𝑗𝑗=0 𝑢𝑢𝑗𝑗 𝑤𝑤𝑗𝑗𝑗𝑗
9
Statistical Machine Learning (S2 2017) Deck 7
• Binary classification
∗ e.g., predict whether a patient has type II diabetes
• Multivariate classification
∗ e.g., handwritten digits recognition with labels “1”, “2”, etc.
10
Statistical Machine Learning (S2 2017) Deck 7
𝑥𝑥 𝑢𝑢2 𝑧𝑧
𝑢𝑢3
Plot by Geek3
Wikimedia Commons (CC3)
11
Statistical Machine Learning (S2 2017) Deck 7
• You know the drill: Define the loss function and find
parameters that minimise the loss on training data
13
Statistical Machine Learning (S2 2017) Deck 7
…
• This will simplify description 𝑢𝑢𝑝𝑝
of backpropagation. In other
settings, the training 𝑥𝑥𝑚𝑚
procedure is similar
14
Statistical Machine Learning (S2 2017) Deck 7
𝑚𝑚 + 1 𝑝𝑝
art: OpenClipartVectors at
pixabay.com (CC0)
15
Statistical Machine Learning (S2 2017) Deck 7
1
Need to compute partial
2
𝐿𝐿 = 𝑧𝑧𝑗𝑗 − 𝑦𝑦𝑗𝑗 𝜕𝜕𝐿𝐿 𝜕𝜕𝐿𝐿
2 derivatives and
𝜕𝜕𝜐𝜐𝑖𝑖𝑗𝑗 𝜕𝜕𝑤𝑤𝑗𝑗
17
Statistical Machine Learning (S2 2016) Deck 7
Backpropagation
= “backward propagation of errors”
18
Statistical Machine Learning (S2 2017) Deck 7
• Here 𝐿𝐿 = 0.5 𝑧𝑧 − 𝑦𝑦 2
𝑥𝑥1 𝑣𝑣𝑖𝑖𝑖𝑖 𝑤𝑤𝑗𝑗
and 𝑧𝑧 = 𝑠𝑠
𝑢𝑢1 Thus 𝛿𝛿 = 𝑧𝑧 − 𝑦𝑦
𝑥𝑥2 𝑧𝑧 𝑝𝑝
… • Here 𝑠𝑠 = ∑𝑗𝑗=0 𝑢𝑢𝑗𝑗 𝑤𝑤𝑗𝑗 and
…
𝑢𝑢𝑝𝑝 𝑢𝑢𝑗𝑗 = ℎ 𝑟𝑟𝑗𝑗
Thus 𝜀𝜀𝑗𝑗 = 𝛿𝛿𝑤𝑤𝑗𝑗 ℎ′ (𝑟𝑟𝑗𝑗 )
𝑥𝑥𝑚𝑚
20
Statistical Machine Learning (S2 2017) Deck 7
Backpropagation equations
• We have … where • Recall that
𝑝𝑝
∗
𝜕𝜕𝜕𝜕
= 𝛿𝛿
𝜕𝜕𝜕𝜕
∗ 𝛿𝛿 =
𝜕𝜕𝜕𝜕
= 𝑧𝑧 − 𝑦𝑦 ∗ 𝑠𝑠 = ∑𝑗𝑗=0 𝑢𝑢𝑗𝑗 𝑤𝑤𝑗𝑗
𝜕𝜕𝑤𝑤𝑗𝑗 𝜕𝜕𝑤𝑤𝑗𝑗 𝜕𝜕𝜕𝜕
𝜕𝜕𝜕𝜕 ∗ 𝑟𝑟𝑗𝑗 = ∑𝑚𝑚
𝑖𝑖=0 𝑥𝑥𝑖𝑖 𝑣𝑣𝑖𝑖𝑖𝑖
∗
𝜕𝜕𝜕𝜕
= 𝜀𝜀𝑗𝑗
𝜕𝜕𝑟𝑟𝑗𝑗 ∗ 𝜀𝜀𝑗𝑗 = = 𝛿𝛿𝑤𝑤𝑗𝑗 ℎ′ (𝑟𝑟𝑗𝑗 )
𝜕𝜕𝑟𝑟𝑗𝑗
𝜕𝜕𝑣𝑣𝑖𝑖𝑖𝑖 𝜕𝜕𝑣𝑣𝑖𝑖𝑖𝑖
𝜕𝜕𝑠𝑠 𝜕𝜕𝑟𝑟𝑗𝑗
• So = 𝑢𝑢𝑗𝑗 and = 𝑥𝑥𝑖𝑖
𝜕𝜕𝑤𝑤𝑗𝑗 𝜕𝜕𝑣𝑣𝑖𝑖𝑖𝑖
𝑥𝑥1 𝑣𝑣𝑖𝑖𝑖𝑖 𝑤𝑤𝑗𝑗
𝑢𝑢1 • We have
𝑥𝑥2 𝑧𝑧 𝜕𝜕𝜕𝜕
… ∗ = 𝛿𝛿𝑢𝑢𝑗𝑗 = 𝑧𝑧 − 𝑦𝑦 𝑢𝑢𝑗𝑗
𝜕𝜕𝑤𝑤𝑗𝑗
…
𝑢𝑢𝑝𝑝 ∗
𝜕𝜕𝜕𝜕
= 𝜀𝜀𝑗𝑗 𝑥𝑥𝑖𝑖 = 𝛿𝛿𝑤𝑤𝑗𝑗 ℎ′ 𝑟𝑟𝑗𝑗 𝑥𝑥𝑖𝑖
𝜕𝜕𝑣𝑣𝑖𝑖𝑖𝑖
𝑥𝑥𝑚𝑚
21
Statistical Machine Learning (S2 2017) Deck 7
Forward propagation
• Use current • Calculate 𝑟𝑟𝑗𝑗 ,
estimates of 𝑢𝑢𝑗𝑗 , 𝑠𝑠 and 𝑧𝑧
𝑣𝑣𝑖𝑖𝑖𝑖 and 𝑤𝑤𝑗𝑗
𝑥𝑥𝑚𝑚
22
Statistical Machine Learning (S2 2017) Deck 7
𝑥𝑥𝑚𝑚
23
Statistical Machine Learning (S2 2017) Deck 7
24
Statistical Machine Learning (S2 2017) Deck 7
Explicit regularisation
• Alternatively, an explicit regularisation can be used,
much like in ridge regression
• Instead of minimising the loss 𝐿𝐿, minimise regularised
𝑝𝑝 𝑝𝑝
function 𝐿𝐿 + 𝜆𝜆 ∑𝑚𝑚 ∑ 𝑣𝑣 2
𝑖𝑖=0 𝑗𝑗=1 𝑖𝑖𝑖𝑖 + ∑ 𝑤𝑤
𝑗𝑗=0 𝑗𝑗
2
• This will simply add 2𝜆𝜆𝑣𝑣𝑖𝑖𝑖𝑖 and 2𝜆𝜆𝑤𝑤𝑗𝑗 terms to the partial
derivatives
• With some activation functions this also shrinks the ANN
towards a linear model
25
Statistical Machine Learning (S2 2017) Deck 7
This lecture
• Multilayer perceptron
∗ Model structure
∗ Universal approximation
∗ Training preliminaries
• Backpropagation
∗ Step-by-step derivation
∗ Notes on regularisation
26