05 Neural Network
Deep Learning
CS60010
Abir Das
Assistant Professor
Computer Science and Engineering Department
Indian Institute of Technology Kharagpur
http://cse.iitkgp.ac.in/~adas/
Image courtesy: F. A. Makinde et al., “Prediction of crude oil viscosity using feed-forward back-propagation neural network (FFBPNN)”, Petroleum & Coal, 2012.
Artificial Neuron
$\boldsymbol{w} = [w_1\ w_2\ \dots\ w_d]^T$ and $\boldsymbol{x} = [x_1\ x_2\ \dots\ x_d]^T$

$a(\boldsymbol{x}) = b + \sum_{i=1}^{d} w_i x_i = [\boldsymbol{w}^T\ b] \begin{bmatrix} \boldsymbol{x} \\ 1 \end{bmatrix}$, where $w_0 = b$

$y = g(a(\boldsymbol{x}))$

Terminologies:-
$\boldsymbol{x}$: input, $\boldsymbol{w}$: weights, $b$: bias
$a(\boldsymbol{x})$: pre-activation (input activation)
$g$: activation function
$y = g(a(\boldsymbol{x}))$: activation (output activation)
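As a concrete illustration (not from the slides), here is a minimal NumPy sketch of this computation; the sigmoid choice for $g$ and all numeric values are assumptions made for the example:

```python
import numpy as np

def sigmoid(a):
    # One common choice for the activation function g
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b, g=sigmoid):
    # Pre-activation: a(x) = b + sum_i w_i * x_i = w^T x + b
    a = w @ x + b
    # Activation (output): y = g(a(x))
    return g(a)

# Example with d = 3 (values are arbitrary)
x = np.array([1.0, 2.0, -1.0])
w = np.array([0.5, -0.3, 0.8])
b = 0.1
print(neuron(x, w, b))  # a scalar in (0, 1)
```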
Perceptron
Binary Classification (Rosenblatt, 1957)

$g(a) = \begin{cases} 1, & a \ge 0 \\ 0, & a < 0 \end{cases}$

The perceptron classification rule, thus, translates to

$y = \begin{cases} 1, & \boldsymbol{w}^T\boldsymbol{x} + b \ge 0 \\ -1, & \boldsymbol{w}^T\boldsymbol{x} + b < 0 \end{cases}$

Remember: $\boldsymbol{w}^T\boldsymbol{x} + b = 0$ represents a hyperplane.
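The slides state only the classification rule; the sketch below adds the standard Rosenblatt training loop (on a mistake, $\boldsymbol{w} \leftarrow \boldsymbol{w} + y\boldsymbol{x}$, $b \leftarrow b + y$) for illustration, with made-up toy data:

```python
import numpy as np

def predict(x, w, b):
    # y = +1 if w^T x + b >= 0, else -1
    return 1 if w @ x + b >= 0 else -1

def train_perceptron(X, y, max_epochs=100):
    # Standard perceptron rule: on a mistake, w <- w + y*x, b <- b + y
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if predict(xi, w, b) != yi:
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:  # converged: all points classified correctly
            break
    return w, b

# Toy linearly separable data (labels in {+1, -1})
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
```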
Convergence Theorem:-
For a finite and linearly separable set of data, the perceptron learning algorithm will find a linear separator in at most $(R/\gamma)^2$ iterations, where the maximum length of any data point is $R$ and $\gamma$ is the maximum margin of the linear separators.
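The bound can be checked empirically; in this sketch the toy data, the reference separator `w_star`, and the no-bias simplification (used on the next slide) are all assumptions, and the count compares perceptron mistakes against $(R/\gamma)^2$:

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])

w_star = np.array([1.0, 0.2])          # a separator found by inspection
w_star /= np.linalg.norm(w_star)
R = max(np.linalg.norm(x) for x in X)  # max data-point length
gamma = min(yi * (w_star @ xi) for xi, yi in zip(X, y))  # its margin

w, mistakes = np.zeros(2), 0
for _ in range(100):                   # epochs
    done = True
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:         # mistake: update and count it
            w += yi * xi
            mistakes += 1
            done = False
    if done:
        break
print(mistakes, "<=", (R / gamma) ** 2)
```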
[Figure: three panels showing data points $\boldsymbol{x}^{(1)}, \boldsymbol{x}^{(2)}, \boldsymbol{x}^{(3)}$, the weight vector $\boldsymbol{w}$, and the hyperplane $\boldsymbol{w}^T\boldsymbol{x} = a$ passing through $(0,0)$]
• For simplicity, let us assume that the hyperplane passes through the origin. This does not affect the problem at hand, as it just shifts the origin. But that means $b = 0$. Thus the classification rule becomes $y = \operatorname{sign}(\boldsymbol{w}^T\boldsymbol{x})$.
• There may not exist such a hyperplane that can separate the two sets of data. If it exists, then the data is known to be linearly separable.
• A more formal definition is: two sets $P$ and $N$ of $d$-dimensional points are said to be linearly separable if there exist real numbers $w_1, \dots, w_d, \theta$ such that every point $\boldsymbol{x} \in P$ satisfies $\sum_{i=1}^{d} w_i x_i \ge \theta$ and every point $\boldsymbol{x} \in N$ satisfies $\sum_{i=1}^{d} w_i x_i < \theta$.
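One way to test this definition programmatically (not covered on the slides) is to pose it as a linear-programming feasibility problem; the sketch below uses `scipy.optimize.linprog`, and the margin-1 normalization is an assumption that makes strict separation checkable, since any strict separator can be rescaled to satisfy it:

```python
import numpy as np
from scipy.optimize import linprog

def linearly_separable(P, N):
    # Feasibility LP: find w, theta with  w.x >= theta + 1 for x in P
    # and w.x <= theta - 1 for x in N.
    X = np.vstack([P, N])
    y = np.concatenate([np.ones(len(P)), -np.ones(len(N))])
    d = X.shape[1]
    # Inequalities: -y_i * (w.x_i - theta) <= -1, variables z = [w, theta]
    A_ub = np.hstack([-y[:, None] * X, y[:, None]])
    b_ub = -np.ones(len(X))
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))
    return res.success

P = np.array([[2.0, 1.0], [1.0, 3.0]])
N = np.array([[-1.0, -1.0], [-2.0, 1.0]])
print(linearly_separable(P, N))  # True for this toy data
```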
• Margin of separation: this is the minimum distance between any data point and the hyperplane. The hyperplane that maximizes this margin is defined to be the maximum margin separator.
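Since the distance from a point $\boldsymbol{x}$ to the hyperplane $\boldsymbol{w}^T\boldsymbol{x} + b = 0$ is $|\boldsymbol{w}^T\boldsymbol{x} + b| / \lVert\boldsymbol{w}\rVert$, the margin is a one-line computation; the data and separator below are the toy values from the earlier sketch:

```python
import numpy as np

def margin_of_separation(X, w, b):
    # Minimum over the data of |w.x + b| / ||w||
    return np.min(np.abs(X @ w + b)) / np.linalg.norm(w)

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 1.0]])
w, b = np.array([1.0, 0.2]), 0.0
print(margin_of_separation(X, w, b))
```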
• So, the length of the updated weight vector is increasing, but it is increasing in a controlled way.
Multilayer Neural Network

[Figure: input layer ($x_1, x_2, \dots, x_d$, plus a constant 1), a hidden layer of units $g(a)$, and an output layer producing $y$]

Intuition:-
o Hidden layer: extracts a better representation of the input data
o Output layer: does the classification
Each hidden unit computes a pre-activation from the input and passes it through $g$. For the first hidden unit, writing the weights as the row vector $\boldsymbol{w}_1^{(1)} = [w_{1,1}^{(1)}, w_{1,2}^{(1)}, \cdots, w_{1,d}^{(1)}]$, the input as the column vector $\boldsymbol{x} = [x_1\ x_2\ \dots\ x_d]^T$, and $w_{1,0}^{(1)} = b_1^{(1)}$:

$a_1^{(1)}(\boldsymbol{x}) = b_1^{(1)} + \sum_{i=1}^{d} w_{1,i}^{(1)} x_i = [\boldsymbol{w}_1^{(1)}\ b_1^{(1)}] \begin{bmatrix} \boldsymbol{x} \\ 1 \end{bmatrix}, \qquad h_1^{(1)} = g(a_1^{(1)}(\boldsymbol{x}))$

Similarly, for the second hidden unit (with $w_{2,0}^{(1)} = b_2^{(1)}$):

$a_2^{(1)}(\boldsymbol{x}) = b_2^{(1)} + \sum_{i=1}^{d} w_{2,i}^{(1)} x_i = [\boldsymbol{w}_2^{(1)}\ b_2^{(1)}] \begin{bmatrix} \boldsymbol{x} \\ 1 \end{bmatrix}, \qquad h_2^{(1)} = g(a_2^{(1)}(\boldsymbol{x}))$

and so on, up to the $M$-th hidden unit (with $w_{M,0}^{(1)} = b_M^{(1)}$):

$a_M^{(1)}(\boldsymbol{x}) = b_M^{(1)} + \sum_{i=1}^{d} w_{M,i}^{(1)} x_i = [\boldsymbol{w}_M^{(1)}\ b_M^{(1)}] \begin{bmatrix} \boldsymbol{x} \\ 1 \end{bmatrix}, \qquad h_M^{(1)} = g(a_M^{(1)}(\boldsymbol{x}))$
Stacking all $M$ hidden pre-activations into one column vector gives the vectorized form:

$\boldsymbol{a}^{(1)}(\boldsymbol{x}) = \begin{bmatrix} a_1^{(1)}(\boldsymbol{x}) \\ a_2^{(1)}(\boldsymbol{x}) \\ \vdots \\ a_M^{(1)}(\boldsymbol{x}) \end{bmatrix} = \begin{bmatrix} \boldsymbol{w}_1^{(1)} & b_1^{(1)} \\ \boldsymbol{w}_2^{(1)} & b_2^{(1)} \\ \vdots & \vdots \\ \boldsymbol{w}_M^{(1)} & b_M^{(1)} \end{bmatrix} \begin{bmatrix} \boldsymbol{x} \\ 1 \end{bmatrix} = [\boldsymbol{W}^{(1)}\ \boldsymbol{b}^{(1)}] \begin{bmatrix} \boldsymbol{x} \\ 1 \end{bmatrix}$

Here $[\boldsymbol{W}^{(1)}\ \boldsymbol{b}^{(1)}]$ is a matrix, while $\boldsymbol{a}^{(1)}(\boldsymbol{x})$, $[\boldsymbol{x}\ 1]^T$, and $\boldsymbol{h}^{(1)}$ are column vectors, with

$\boldsymbol{h}^{(1)} = \begin{bmatrix} h_1^{(1)} \\ h_2^{(1)} \\ \vdots \end{bmatrix} = \begin{bmatrix} g(a_1^{(1)}) \\ g(a_2^{(1)}) \\ \vdots \end{bmatrix}$
Output layer pre-activation:

$a_1^{(2)}(\boldsymbol{h}^{(1)}) = [\boldsymbol{w}_1^{(2)}\ b_1^{(2)}] \begin{bmatrix} \boldsymbol{h}^{(1)} \\ 1 \end{bmatrix}$

Output layer activation:

$y = o(a_1^{(2)}(\boldsymbol{h}^{(1)}))$

o $o$ can have many options
• sigmoid
• tanh
• ReLU etc.
One special choice is the softmax function.
With $C$ output units, the softmax output layer computes

$y_1 = o_1(\boldsymbol{a}^{(2)}) = \frac{\exp(a_1^{(2)})}{\sum_c \exp(a_c^{(2)})}, \qquad y_2 = o_2(\boldsymbol{a}^{(2)}) = \frac{\exp(a_2^{(2)})}{\sum_c \exp(a_c^{(2)})}, \qquad \dots, \qquad y_C = o_C(\boldsymbol{a}^{(2)}) = \frac{\exp(a_C^{(2)})}{\sum_c \exp(a_c^{(2)})}$

Softmax function: exponentiate each component of the output activation vector and elementwise divide by the sum of the exponentials.
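A direct sketch of this definition follows; subtracting the maximum before exponentiating is a standard numerical-stability trick (not from the slides) that leaves the result unchanged because it cancels in the ratio:

```python
import numpy as np

def softmax(a):
    # Exponentiate each component, divide by the sum of exponentials
    e = np.exp(a - np.max(a))   # max-subtraction for numerical stability
    return e / e.sum()

a2 = np.array([2.0, 1.0, 0.1])  # example output pre-activations a^(2)
y = softmax(a2)
print(y, y.sum())                # components are positive and sum to 1
```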
$y_C$ is interpreted as the probability that the input belongs to class $C$. Maximizing the log-likelihood of the true class can be equivalently expressed as minimizing what is known as the cross-entropy loss.
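For a single example with true class index $c$, the cross-entropy loss reduces to $J = -\log y_c$; the sketch below illustrates this, with the probability values made up:

```python
import numpy as np

def cross_entropy(y, c):
    # y: softmax output (predicted class probabilities)
    # c: index of the true class
    # J = -log y_c, the negative log-likelihood of the true class
    return -np.log(y[c])

y = np.array([0.7, 0.2, 0.1])   # example predicted probabilities
print(cross_entropy(y, c=0))     # small loss: true class has high probability
```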
Backpropagation
$\boldsymbol{h}^{(l-1)} \xrightarrow{\;\boldsymbol{W}^{(l)},\,\boldsymbol{b}^{(l)}\;} \boldsymbol{a}^{(l)} \xrightarrow{\;g(.)\;} \boldsymbol{h}^{(l)} \xrightarrow{\;\boldsymbol{W}^{(l+1)},\,\boldsymbol{b}^{(l+1)}\;} \boldsymbol{a}^{(l+1)} \xrightarrow{\;g(.)\;} \boldsymbol{h}^{(l+1)} \;\cdots\; \xrightarrow{\;\boldsymbol{W}^{(L)},\,\boldsymbol{b}^{(L)}\;} \boldsymbol{a}^{(L)} \xrightarrow{\;o\;} \boldsymbol{y} = \boldsymbol{f}(\boldsymbol{x}, \boldsymbol{\theta}) \longrightarrow J(\boldsymbol{f}(\boldsymbol{x}, \boldsymbol{\theta}), \boldsymbol{c})$

where $\boldsymbol{c}$ is the true class of the example.

For gradient descent we need to compute $\frac{\partial J}{\partial \boldsymbol{W}^{(l)}}$ and $\frac{\partial J}{\partial \boldsymbol{b}^{(l)}}$ for every layer $l$.

Backpropagation consists of intelligent use of the chain rule of differentiation for the computation of these gradients.
By the chain rule,

$\Rightarrow \frac{\partial J}{\partial \boldsymbol{W}_{i,j}^{(l)}} = \frac{\partial J}{\partial \boldsymbol{a}_i^{(l)}} \frac{\partial \boldsymbol{a}_i^{(l)}}{\partial \boldsymbol{W}_{i,j}^{(l)}} = \frac{\partial J}{\partial \boldsymbol{a}_i^{(l)}} \boldsymbol{h}_j^{(l-1)} \quad \dots (1)$

We know $\boldsymbol{h}_j^{(l-1)}$ from the forward pass, so we need $\frac{\partial J}{\partial \boldsymbol{a}_i^{(l)}}$.

And similarly $\frac{\partial J}{\partial \boldsymbol{b}_i^{(l)}} = \frac{\partial J}{\partial \boldsymbol{a}_i^{(l)}} \quad \dots (2)$
To get $\frac{\partial J}{\partial \boldsymbol{h}_i^{(l)}}$, note that $\boldsymbol{h}_i^{(l)}$ influences every pre-activation of the next layer:

$\boldsymbol{a}_m^{(l+1)} = \sum_j \boldsymbol{W}_{m,j}^{(l+1)} \boldsymbol{h}_j^{(l)} + \boldsymbol{b}_m^{(l+1)} = \boldsymbol{W}_{m,1}^{(l+1)} \boldsymbol{h}_1^{(l)} + \boldsymbol{W}_{m,2}^{(l+1)} \boldsymbol{h}_2^{(l)} + \dots + \boldsymbol{W}_{m,i}^{(l+1)} \boldsymbol{h}_i^{(l)} + \dots + \boldsymbol{b}_m^{(l+1)}$

$\Rightarrow \frac{\partial J}{\partial \boldsymbol{h}_i^{(l)}} = \sum_{m=1}^{M_{l+1}} \frac{\partial J}{\partial \boldsymbol{a}_m^{(l+1)}} \boldsymbol{W}_{m,i}^{(l+1)} \quad \dots (4)$
[Figure: local computation at unit $i$ of layer $l$. $\boldsymbol{h}_j^{(l-1)}$ is multiplied by $[\boldsymbol{W}_{i,j}^{(l)}]$ and added to $\boldsymbol{b}_i^{(l)}$ to produce $\boldsymbol{a}_i^{(l)}$, which passes through $g(.)$ to give $\boldsymbol{h}_i^{(l)}$; the gradients $\frac{\partial J}{\partial \boldsymbol{W}_{i,j}^{(l)}}$, $\frac{\partial J}{\partial \boldsymbol{b}_i^{(l)}}$, $\frac{\partial J}{\partial \boldsymbol{a}_i^{(l)}}$, $\frac{\partial J}{\partial \boldsymbol{h}_i^{(l)}}$ flow backwards through the same graph]

And since $\boldsymbol{h}_i^{(l)} = g(\boldsymbol{a}_i^{(l)})$,

$\frac{\partial J}{\partial \boldsymbol{a}_i^{(l)}} = \frac{\partial J}{\partial \boldsymbol{h}_i^{(l)}} g'(\boldsymbol{a}_i^{(l)}) \quad \dots (3)$

Collecting all the pieces, $\forall\, i = 1 : M_l$:

$\frac{\partial J}{\partial \boldsymbol{b}_i^{(l)}} = \frac{\partial J}{\partial \boldsymbol{a}_i^{(l)}} \quad \dots (2)$

$\frac{\partial J}{\partial \boldsymbol{W}_{i,j}^{(l)}} = \frac{\partial J}{\partial \boldsymbol{a}_i^{(l)}} \boldsymbol{h}_j^{(l-1)} \quad \dots (1)$

and $\frac{\partial J}{\partial \boldsymbol{h}_i^{(l-1)}} = \sum_{m=1}^{M_l} \frac{\partial J}{\partial \boldsymbol{a}_m^{(l)}} \boldsymbol{W}_{m,i}^{(l)} \quad \dots (4)$
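Putting equations (1)-(4) together, here is a hedged NumPy sketch of backpropagation for a small MLP with a softmax + cross-entropy output (as introduced above). The tanh hidden activation, layer sizes, and initialization are assumptions; the shortcut $\frac{\partial J}{\partial \boldsymbol{a}^{(L)}} = \boldsymbol{y} - \text{onehot}(c)$ is the standard softmax/cross-entropy result:

```python
import numpy as np

def g(a):
    # Hidden activation (tanh chosen for the example; the slides keep g generic)
    return np.tanh(a)

def g_prime(a):
    # Derivative of tanh
    return 1.0 - np.tanh(a) ** 2

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def forward(x, params):
    # cache[l] holds (a^(l), h^(l)); cache[0] holds the input as h^(0)
    cache = [(None, x)]
    h = x
    for l, (W, b) in enumerate(params):
        a = W @ h + b                              # a^(l) = W^(l) h^(l-1) + b^(l)
        h = softmax(a) if l == len(params) - 1 else g(a)
        cache.append((a, h))
    return cache

def backward(cache, params, c):
    # Gradients via equations (1)-(4); c is the index of the true class.
    grads = [None] * len(params)
    y = cache[-1][1]
    dJ_da = y.copy()                               # dJ/da^(L) = y - onehot(c)
    dJ_da[c] -= 1.0
    for l in range(len(params) - 1, -1, -1):
        W, _ = params[l]
        h_prev = cache[l][1]                       # h^(l-1) from the forward pass
        grads[l] = (np.outer(dJ_da, h_prev),       # (1): dJ/dW^(l)
                    dJ_da)                         # (2): dJ/db^(l)
        if l > 0:
            dJ_dh = W.T @ dJ_da                    # (4): dJ/dh^(l-1)
            dJ_da = dJ_dh * g_prime(cache[l][0])   # (3): via a^(l-1)
    return grads

# Toy usage: d = 4 inputs, M = 3 hidden units, C = 2 classes (all arbitrary)
rng = np.random.default_rng(0)
sizes = [4, 3, 2]
params = [(0.1 * rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
cache = forward(rng.normal(size=4), params)
grads = backward(cache, params, c=1)
```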
Training Data
[Figure panels: # of hidden neurons = 32, 64, 128, 256]
[Figure panels: # of neurons in each hidden layer = 2, 4, 8]