
Deep Learning

Dr. Sugata Ghosal


sugata.ghosal@pilani.bits-pilani.ac.in
BITS Pilani, Pilani Campus

Solved Examples

These slides are assembled by the instructor with grateful acknowledgement of the many
others who made their course materials freely available online.
Weight Updates in Deep Networks

[Figure: a two-layer feed-forward network. Inputs y1^(0), y2^(0) feed hidden units z1^(1), z2^(1) with outputs y1^(1), y2^(1), which feed the single output unit z1^(2); the network output is y = y1^(2). Input-to-hidden weights: (0.25, 0.75) into z1^(1) and (0.75, 0.25) into z2^(1); hidden-to-output weights: (0.67, 0.33).]

• Training input: (x1, x2) = (1, 1); target output: d = 0.0
• div function: square error
• Activation function f(·): ReLU
• Bias: 0 at all nodes
• Learning rate: 0.1
• What are the values of w31 and w12 in the next iteration?


Weight Updates with ReLU, square error

[Figure: the same network as on the previous slide.]

• Forward pass:
  z1^(1) = 0.25*1 + 0.75*1 = 1.0        z2^(1) = 0.75*1 + 0.25*1 = 1.0
  y1^(1) = ReLU(z1^(1)) = 1             y2^(1) = ReLU(z2^(1)) = 1
  z1^(2) = 0.67*1 + 0.33*1 = 1.0
  y = ReLU(z1^(2)) = 1.0

• Backward pass (div = (1/2)*(y − d)^2):
  ddiv/dy = y − d = 1
  ddiv/dz1^(2) = ddiv/dy * ReLU'(z1^(2)) = 1*1 = 1
  ddiv/dy1^(1) = ddiv/dz1^(2) * w31 = 1*0.67 = 0.67
  ddiv/dw31 = ddiv/dz1^(2) * y1^(1) = 1*1 = 1
  ddiv/dz1^(1) = ddiv/dy1^(1) * ReLU'(z1^(1)) = 0.67*1 = 0.67

• Weight update: w31(next) = 0.67 − 0.1*1 = 0.57
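A minimal numpy sketch of this computation is given below. It assumes the indexing convention w_ij = weight from node j into node i, so w31 is the hidden-1 to output weight 0.67 and w12 the input-2 to hidden-1 weight 0.75; the variable names (W1, w2) are illustrative, not from the slides.

```python
import numpy as np

# Assumed layout: inputs y0, hidden weights W1 (row i = weights into hidden unit i),
# output weights w2, ReLU everywhere, div = 0.5*(y - d)^2, learning rate 0.1.
y0 = np.array([1.0, 1.0])                 # (x1, x2)
W1 = np.array([[0.25, 0.75],              # weights into z1^(1)
               [0.75, 0.25]])             # weights into z2^(1)
w2 = np.array([0.67, 0.33])               # weights into z1^(2)
d, lr = 0.0, 0.1

relu = lambda v: np.maximum(v, 0.0)

# Forward pass
z1 = W1 @ y0          # [1.0, 1.0]
y1 = relu(z1)         # [1.0, 1.0]
z2 = w2 @ y1          # 1.0
y  = relu(z2)         # 1.0

# Backward pass
ddiv_dy  = y - d                          # 1.0
ddiv_dz2 = ddiv_dy * (z2 > 0)             # 1.0 (ReLU derivative is 1 for z > 0)
grad_w2  = ddiv_dz2 * y1                  # gradients of the hidden-to-output weights
ddiv_dy1 = ddiv_dz2 * w2                  # [0.67, 0.33]
ddiv_dz1 = ddiv_dy1 * (z1 > 0)            # [0.67, 0.33]
grad_W1  = np.outer(ddiv_dz1, y0)         # gradients of the input-to-hidden weights

print("w31 next:", w2[0] - lr * grad_w2[0])        # 0.67 - 0.1*1    = 0.57
print("w12 next:", W1[0, 1] - lr * grad_W1[0, 1])  # 0.75 - 0.1*0.67 = 0.683
```

Under this naming assumption, the same backward pass also answers the w12 part of the question (0.75 updates to about 0.683).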
Weight Updates with ReLU Hidden, Sigmoid Output and Binary Cross-Entropy

[Figure: the same network, but with a sigmoid activation at the output node.]

• Forward pass:
  z1^(1) = 0.25*1 + 0.75*1 = 1.0        z2^(1) = 0.75*1 + 0.25*1 = 1.0
  y1^(1) = ReLU(z1^(1)) = 1             y2^(1) = ReLU(z2^(1)) = 1
  z1^(2) = 0.67*1 + 0.33*1 = 1.0
  y = sigmoid(z1^(2)) = e/(1+e)

• div = −d*log(y) − (1−d)*log(1−y)
  ddiv/dy = −d/y + (1−d)/(1−y) = 1/(1−y) = 1+e            (since d = 0)
  ddiv/dz1^(2) = ddiv/dy * sigmoid(z1^(2))*(1 − sigmoid(z1^(2))) = (1+e) * e/(1+e)^2 = e/(1+e)
  ddiv/dw31 = ddiv/dz1^(2) * y1^(1) = e/(1+e)

• Weight update: w31(next) = 0.67 − 0.1*e/(1+e) = 0.597
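The same check for this variant, a sketch in which only the output nonlinearity and the div function change from the previous example:

```python
import numpy as np

# Same network as above, but with a sigmoid output and binary cross-entropy.
y0 = np.array([1.0, 1.0])
W1 = np.array([[0.25, 0.75], [0.75, 0.25]])
w2 = np.array([0.67, 0.33])
d, lr = 0.0, 0.1

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

# Forward pass
y1 = np.maximum(W1 @ y0, 0.0)   # [1.0, 1.0]
z2 = w2 @ y1                    # 1.0
y  = sigmoid(z2)                # e/(1+e) ~ 0.731

# Backward pass: for sigmoid + binary cross-entropy, ddiv/dz2 simplifies to (y - d)
ddiv_dz2 = y - d                # e/(1+e)
grad_w31 = ddiv_dz2 * y1[0]     # e/(1+e)

print("w31 next:", w2[0] - lr * grad_w31)   # 0.67 - 0.1*0.731 ~ 0.597
```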
Example Special Case: Softmax

• Softmax output layer (y^(k-1) → z^(k) → y^(k)):
  y_i^(k) = exp(z_i^(k)) / Σ_j exp(z_j^(k))

• Derivative of the softmax:
  dy_j^(k)/dz_i^(k) = y_j^(k) * (δ_ij − y_i^(k))

• Therefore
  dDiv/dz_i^(k) = Σ_j dDiv/dy_j^(k) * dy_j^(k)/dz_i^(k) = Σ_j dDiv/dy_j^(k) * y_j^(k) * (δ_ij − y_i^(k))

• For future reference: δ_ij is the Kronecker delta: δ_ij = 1 if i = j, and 0 otherwise
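The Kronecker-delta form of the softmax derivative can be verified numerically; a minimal sketch (the test vector z is arbitrary):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # shift for numerical stability
    return e / e.sum()

z = np.array([1.0, 0.0, -0.5])
y = softmax(z)

# Analytic Jacobian: dy_j/dz_i = y_j * (delta_ij - y_i)
J_analytic = np.diag(y) - np.outer(y, y)

# Central finite-difference estimate of the same Jacobian
eps = 1e-6
J_numeric = np.zeros((3, 3))
for i in range(3):
    dz = np.zeros(3)
    dz[i] = eps
    J_numeric[:, i] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.allclose(J_analytic, J_numeric, atol=1e-6))   # True
```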
Weight Updates with Softmax

[Figure: softmax output layer. Inputs h1, h2 feed output units z1, z2 through weights w11 = 1, w12 = 0, w21 = 0, w22 = 1; the softmax over (z1, z2) gives outputs y1, y2.]

• Training input: (h1, h2) = (1, 0); target output: (d1, d2) = (0, 1)
• div function: cross-entropy
• Learning rate: 0.1
• Bias = 0
• What will be the value of w11 in the next iteration?

• Input to the softmax nodes: z1 = w11*h1 + w12*h2 = w11 = 1;  z2 = w21*h1 + w22*h2 = 0
• y1 = e/(1+e)
• y2 = 1/(1+e)
• div = −d1*log(y1) − d2*log(y2) = −log(y2) = log(1+e) = 1.313
Weight Updates with Softmax …

• Recall: dDiv/dz_i^(k) = Σ_j dDiv/dy_j^(k) * y_j^(k) * (δ_ij − y_i^(k))

[Figure: the same softmax output layer, with weights w11 = 1, w12 = 0, w21 = 0, w22 = 1.]
• Change in w11:  Δw11 = −0.1 * ddiv/dw11
                       = −0.1 * ddiv/dz1 * dz1/dw11 = −0.1 * ddiv/dz1 * h1

• ddiv/dy1 = −d1/y1 = 0        ddiv/dy2 = −d2/y2 = −1/y2

• ddiv/dz1 = ddiv/dy1 * y1*(1−y1) + ddiv/dy2 * y1*(−y2) = 0 + (−1/y2)*(−y1*y2) = y1

• Input to the softmax node: z1 = w11 = 1; z2 = 0; y1 = e/(1+e)
• Δw11 = −0.1 * e/(1+e) = −0.0731, so w11(next) = 1 − 0.0731 = 0.9269
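A short sketch of the whole example in numpy; it uses the standard softmax/cross-entropy shortcut ddiv/dz = y − d, which is what the chain rule above reduces to:

```python
import numpy as np

# Softmax output layer example: inputs (h1, h2) = (1, 0), targets (d1, d2) = (0, 1).
h = np.array([1.0, 0.0])
d = np.array([0.0, 1.0])
W = np.array([[1.0, 0.0],     # row i holds the weights into z_i: (w11, w12)
              [0.0, 1.0]])    #                                   (w21, w22)
lr = 0.1

z = W @ h                               # [1, 0]
y = np.exp(z) / np.exp(z).sum()         # [e/(1+e), 1/(1+e)]
div = -np.sum(d * np.log(y))            # log(1+e) ~ 1.313

# Softmax + cross-entropy: the chain rule collapses to ddiv/dz = y - d
ddiv_dz = y - d
grad_W = np.outer(ddiv_dz, h)           # ddiv/dw_ij = ddiv/dz_i * h_j

W_next = W - lr * grad_W
print(div, W_next[0, 0])                # ~1.313, ~0.9269
```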
Weight Updates with Sigmoid Output

[Figure: sigmoid output layer. Inputs h1, h2 feed output units z1, z2 through weights w11 = 1, w12 = 0, w21 = 0, w22 = 1; each z is passed through a sigmoid, giving y1, y2.]

• Training input: (h1, h2) = (1, 0); target output: (d1, d2) = (0, 1)
• Activation: sigmoid; Bias = 0
• div function: cross-entropy
• Learning rate: 0.1
• What will be the value of w11 in the next iteration?

• Input to the output nodes: z1 = w11*h1 + w12*h2 = w11 = 1;  z2 = 0
• y1 = 1/(1 + e^(−1)) = e/(1+e)        y2 = 1/(1 + e^0) = 1/2
• div = −d1*log(y1) − d2*log(y2) = −log(y2) = log 2
Weight Updates with Sigmoid Outputs …

• Input (h1, h2) = (1.0, 0.0); target output: (d1, d2) = (0, 1)
• w11(t+1) = ?

[Figure: the same sigmoid output layer, with weights w11 = 1, w12 = 0, w21 = 0, w22 = 1.]

• div = −d1*log(y1) − d2*log(y2)
• Since z2 does not depend on w11, only the y1 term contributes:
  ddiv/dw11 = ddiv/dy1 * dy1/dz1 * dz1/dw11 = (−d1/y1) * y1*(1−y1) * h1 = −d1*(1−y1)*h1 = 0   (since d1 = 0)
• w11(t+1) = w11 − 0.1 * ddiv/dw11 = 1.0
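A sketch of the same computation with independent sigmoid outputs, showing that the w11 gradient vanishes when d1 = 0:

```python
import numpy as np

# Two independent sigmoid outputs with div = -d1*log(y1) - d2*log(y2).
h = np.array([1.0, 0.0])
d = np.array([0.0, 1.0])
W = np.array([[1.0, 0.0], [0.0, 1.0]])
lr = 0.1

z = W @ h                          # [1, 0]
y = 1.0 / (1.0 + np.exp(-z))       # [e/(1+e), 0.5]

# Each y_i depends only on z_i, so ddiv/dz_i = (-d_i/y_i) * y_i*(1-y_i) = -d_i*(1-y_i)
ddiv_dz = -d * (1.0 - y)           # [0, -0.5]
grad_W = np.outer(ddiv_dz, h)      # ddiv/dw_ij = ddiv/dz_i * h_j

print((W - lr * grad_W)[0, 0])     # 1.0: w11 is unchanged because d1 = 0
```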
L2 Regularization

• Loss(x, y) = 3x^2 + 6xy + y^2
• What is the value of (xopt, yopt) that minimizes the loss, if an L2 regularization constant of 0.1 is applied?

• For the regularized case, the optimal θR* is given by
      θR* ≈ (H + 0.1 I)^(−1) H θ*
  where H is the Hessian of Loss(x, y) and θ* is the optimal value in the unregularized case.
L2 Regularization

• Loss(x, y) = 3x^2 + 6xy + y^2. What is the value of (xopt, yopt) that minimizes the loss, if an L2 regularization constant of 0.1 is applied?

• dLoss/dx = 6x + 6y = 0 and dLoss/dy = 6x + 2y = 0 at (xopt, yopt). Solving these two equations, we get, in the unregularized case, xopt = yopt = 0.

• Now, for the regularized case, the optimal θR* is given by
      θR* ≈ (H + 0.1 I)^(−1) H θ*
  where H is the Hessian of Loss(x, y) and θ* is the optimal value in the unregularized case.

• Since in the unregularized case xopt = yopt = 0, θR* is also the zero vector: the regularized optimum is (xopt, yopt) = (0, 0).
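A one-line numpy check of the regularized-optimum formula for this example:

```python
import numpy as np

# L2-regularized optimum via the quadratic approximation theta_R ~ (H + a*I)^-1 H theta*
H = np.array([[6.0, 6.0],            # Hessian of 3x^2 + 6xy + y^2
              [6.0, 2.0]])
alpha = 0.1
theta_star = np.array([0.0, 0.0])    # unregularized optimum from the gradient equations

theta_R = np.linalg.solve(H + alpha * np.eye(2), H @ theta_star)
print(theta_R)                       # [0. 0.] -- shrinking the zero vector leaves it at zero
```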
L1 Regularization

• Loss(x, y) = 3x^2 − 6x + y^2
• What is the value of (xopt, yopt) that minimizes the loss, if an L1 regularization constant of 0.1 is applied?

• (θR*)_i ≈ max(θ*_i − α/H_ii, 0)   if θ*_i ≥ 0
  (θR*)_i ≈ min(θ*_i + α/H_ii, 0)   if θ*_i < 0

• So, in the regularized case, xopt = 1 − 0.1/6 = 0.9833 and yopt = max(0 − 0.1/2, 0) = 0.
L1 Regularization

• Loss(x, y) = 3x^2 − 6x + y^2. What is the value of (xopt, yopt) that minimizes the loss, if an L1 regularization constant of 0.1 is applied?

• H11 = d^2 Loss/dx^2 = 6, H12 = H21 = d^2 Loss/dxdy = 0, and H22 = d^2 Loss/dy^2 = 2. So H is diagonal and positive definite. Hence

  (θR*)_i ≈ max(θ*_i − α/H_ii, 0)   if θ*_i ≥ 0
  (θR*)_i ≈ min(θ*_i + α/H_ii, 0)   if θ*_i < 0

• dLoss(x, y)/dx = 6x − 6 = 0 and dLoss(x, y)/dy = 2y = 0 at the unregularized (xopt, yopt). Solving these, we get, in the unregularized case, xopt = 1 and yopt = 0.

• So, in the regularized case, xopt = max(1 − 0.1/6, 0) = 0.9833 and yopt = max(0 − 0.1/2, 0) = 0.
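Since H is diagonal here, the L1 result is coordinate-wise soft-thresholding; a minimal numpy sketch of that check:

```python
import numpy as np

# L1-regularized optimum for a diagonal Hessian: soft-threshold each coordinate by
# alpha / H_ii, i.e. sign(t*) * max(|t*| - alpha/H_ii, 0).
H_diag = np.array([6.0, 2.0])        # H11, H22 of 3x^2 - 6x + y^2
theta_star = np.array([1.0, 0.0])    # unregularized optimum (xopt, yopt)
alpha = 0.1

theta_R = np.sign(theta_star) * np.maximum(np.abs(theta_star) - alpha / H_diag, 0.0)
print(theta_R)                       # [0.98333..., 0.]
```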
