
Deep Learning

Dr. Sugata Ghosal


sugata.ghosal@pilani.bits-pilani.ac.in
BITS Pilani, Pilani Campus

Solved Examples

These slides are assembled by the instructor with grateful acknowledgement of the many
others who made their course materials freely available online.
Weight Updates in Deep Networks

[Figure: a two-layer feed-forward network. Inputs y1^(0), y2^(0) feed hidden units z1^(1), z2^(1) with outputs y1^(1), y2^(1), which feed the single output unit z1^(2); the network output is y = y1^(2). Input-to-hidden weights: (0.25, 0.75) into z1^(1) and (0.75, 0.25) into z2^(1); hidden-to-output weights: (0.67, 0.33).]

• Training input: (x1, x2) = (1, 1); target output: d = 0.0
• div function: square error
• Activation function f(·): ReLU
• Bias: 0 at all nodes
• Learning rate: 0.1
• What are the values of w31 and w12 in the next iteration?


Weight Updates with ReLU, square error

[Figure: the same network as on the previous slide.]

• Forward pass:
  z1^(1) = 0.25*1 + 0.75*1 = 1.0        z2^(1) = 0.75*1 + 0.25*1 = 1.0
  y1^(1) = ReLU(z1^(1)) = 1             y2^(1) = ReLU(z2^(1)) = 1
  z1^(2) = 0.67*1 + 0.33*1 = 1.0
  y = ReLU(z1^(2)) = 1.0

• Backward pass (div = (1/2)*(y − d)^2):
  ddiv/dy = y − d = 1
  ddiv/dz1^(2) = ddiv/dy * ReLU'(z1^(2)) = 1*1 = 1
  ddiv/dy1^(1) = ddiv/dz1^(2) * w31 = 1*0.67 = 0.67
  ddiv/dw31 = ddiv/dz1^(2) * y1^(1) = 1*1 = 1
  ddiv/dz1^(1) = ddiv/dy1^(1) * ReLU'(z1^(1)) = 0.67*1 = 0.67

• Weight update: w31(next) = 0.67 − 0.1*1 = 0.57
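A minimal numpy sketch of this computation is given below. It assumes the indexing convention w_ij = weight from node j into node i, so w31 is the hidden-1 to output weight 0.67 and w12 the input-2 to hidden-1 weight 0.75; the variable names (W1, w2) are illustrative, not from the slides.

```python
import numpy as np

# Assumed layout: inputs y0, hidden weights W1 (row i = weights into hidden unit i),
# output weights w2, ReLU everywhere, div = 0.5*(y - d)^2, learning rate 0.1.
y0 = np.array([1.0, 1.0])                 # (x1, x2)
W1 = np.array([[0.25, 0.75],              # weights into z1^(1)
               [0.75, 0.25]])             # weights into z2^(1)
w2 = np.array([0.67, 0.33])               # weights into z1^(2)
d, lr = 0.0, 0.1

relu = lambda v: np.maximum(v, 0.0)

# Forward pass
z1 = W1 @ y0          # [1.0, 1.0]
y1 = relu(z1)         # [1.0, 1.0]
z2 = w2 @ y1          # 1.0
y  = relu(z2)         # 1.0

# Backward pass
ddiv_dy  = y - d                          # 1.0
ddiv_dz2 = ddiv_dy * (z2 > 0)             # 1.0 (ReLU derivative is 1 for z > 0)
grad_w2  = ddiv_dz2 * y1                  # gradients of the hidden-to-output weights
ddiv_dy1 = ddiv_dz2 * w2                  # [0.67, 0.33]
ddiv_dz1 = ddiv_dy1 * (z1 > 0)            # [0.67, 0.33]
grad_W1  = np.outer(ddiv_dz1, y0)         # gradients of the input-to-hidden weights

print("w31 next:", w2[0] - lr * grad_w2[0])        # 0.67 - 0.1*1    = 0.57
print("w12 next:", W1[0, 1] - lr * grad_W1[0, 1])  # 0.75 - 0.1*0.67 = 0.683
```

Under this naming assumption, the same backward pass also answers the w12 part of the question (0.75 updates to about 0.683).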
Weight Updates with ReLU Hidden, Sigmoid Output and Binary Cross-Entropy

[Figure: the same network, but with a sigmoid activation at the output node.]

• Forward pass:
  z1^(1) = 0.25*1 + 0.75*1 = 1.0        z2^(1) = 0.75*1 + 0.25*1 = 1.0
  y1^(1) = ReLU(z1^(1)) = 1             y2^(1) = ReLU(z2^(1)) = 1
  z1^(2) = 0.67*1 + 0.33*1 = 1.0
  y = sigmoid(z1^(2)) = e/(1+e)

• div = −d*log(y) − (1−d)*log(1−y)
  ddiv/dy = −d/y + (1−d)/(1−y) = 1/(1−y) = 1+e            (since d = 0)
  ddiv/dz1^(2) = ddiv/dy * sigmoid(z1^(2))*(1 − sigmoid(z1^(2))) = (1+e) * e/(1+e)^2 = e/(1+e)
  ddiv/dw31 = ddiv/dz1^(2) * y1^(1) = e/(1+e)

• Weight update: w31(next) = 0.67 − 0.1*e/(1+e) = 0.597
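The same check for this variant, a sketch in which only the output nonlinearity and the div function change from the previous example:

```python
import numpy as np

# Same network as above, but with a sigmoid output and binary cross-entropy.
y0 = np.array([1.0, 1.0])
W1 = np.array([[0.25, 0.75], [0.75, 0.25]])
w2 = np.array([0.67, 0.33])
d, lr = 0.0, 0.1

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

# Forward pass
y1 = np.maximum(W1 @ y0, 0.0)   # [1.0, 1.0]
z2 = w2 @ y1                    # 1.0
y  = sigmoid(z2)                # e/(1+e) ~ 0.731

# Backward pass: for sigmoid + binary cross-entropy, ddiv/dz2 simplifies to (y - d)
ddiv_dz2 = y - d                # e/(1+e)
grad_w31 = ddiv_dz2 * y1[0]     # e/(1+e)

print("w31 next:", w2[0] - lr * grad_w31)   # 0.67 - 0.1*0.731 ~ 0.597
```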
Example Special Case: Softmax

• Softmax output layer (y^(k-1) → z^(k) → y^(k)):
  y_i^(k) = exp(z_i^(k)) / Σ_j exp(z_j^(k))

• Derivative of the softmax:
  dy_j^(k)/dz_i^(k) = y_j^(k) * (δ_ij − y_i^(k))

• Therefore
  dDiv/dz_i^(k) = Σ_j dDiv/dy_j^(k) * dy_j^(k)/dz_i^(k) = Σ_j dDiv/dy_j^(k) * y_j^(k) * (δ_ij − y_i^(k))

• For future reference: δ_ij is the Kronecker delta: δ_ij = 1 if i = j, and 0 otherwise
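The Kronecker-delta form of the softmax derivative can be verified numerically; a minimal sketch (the test vector z is arbitrary):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # shift for numerical stability
    return e / e.sum()

z = np.array([1.0, 0.0, -0.5])
y = softmax(z)

# Analytic Jacobian: dy_j/dz_i = y_j * (delta_ij - y_i)
J_analytic = np.diag(y) - np.outer(y, y)

# Central finite-difference estimate of the same Jacobian
eps = 1e-6
J_numeric = np.zeros((3, 3))
for i in range(3):
    dz = np.zeros(3)
    dz[i] = eps
    J_numeric[:, i] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.allclose(J_analytic, J_numeric, atol=1e-6))   # True
```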
Weight Updates with Softmax

[Figure: softmax output layer. Inputs h1, h2 feed output units z1, z2 through weights w11 = 1, w12 = 0, w21 = 0, w22 = 1; the softmax over (z1, z2) gives outputs y1, y2.]

• Training input: (h1, h2) = (1, 0); target output: (d1, d2) = (0, 1)
• div function: cross-entropy
• Learning rate: 0.1
• Bias = 0
• What will be the value of w11 in the next iteration?

• Input to the softmax nodes: z1 = w11*h1 + w12*h2 = w11 = 1;  z2 = w21*h1 + w22*h2 = 0
• y1 = e/(1+e)
• y2 = 1/(1+e)
• div = −d1*log(y1) − d2*log(y2) = −log(y2) = log(1+e) = 1.313
Weight Updates with Softmax …

• Recall: dDiv/dz_i^(k) = Σ_j dDiv/dy_j^(k) * y_j^(k) * (δ_ij − y_i^(k))

[Figure: the same softmax output layer, with weights w11 = 1, w12 = 0, w21 = 0, w22 = 1.]
• Change in w11:  Δw11 = −0.1 * ddiv/dw11
                       = −0.1 * ddiv/dz1 * dz1/dw11 = −0.1 * ddiv/dz1 * h1

• ddiv/dy1 = −d1/y1 = 0        ddiv/dy2 = −d2/y2 = −1/y2

• ddiv/dz1 = ddiv/dy1 * y1*(1−y1) + ddiv/dy2 * y1*(−y2) = 0 + (−1/y2)*(−y1*y2) = y1

• Input to the softmax node: z1 = w11 = 1; z2 = 0; y1 = e/(1+e)
• Δw11 = −0.1 * e/(1+e) = −0.0731, so w11(next) = 1 − 0.0731 = 0.9269
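A short sketch of the whole example in numpy; it uses the standard softmax/cross-entropy shortcut ddiv/dz = y − d, which is what the chain rule above reduces to:

```python
import numpy as np

# Softmax output layer example: inputs (h1, h2) = (1, 0), targets (d1, d2) = (0, 1).
h = np.array([1.0, 0.0])
d = np.array([0.0, 1.0])
W = np.array([[1.0, 0.0],     # row i holds the weights into z_i: (w11, w12)
              [0.0, 1.0]])    #                                   (w21, w22)
lr = 0.1

z = W @ h                               # [1, 0]
y = np.exp(z) / np.exp(z).sum()         # [e/(1+e), 1/(1+e)]
div = -np.sum(d * np.log(y))            # log(1+e) ~ 1.313

# Softmax + cross-entropy: the chain rule collapses to ddiv/dz = y - d
ddiv_dz = y - d
grad_W = np.outer(ddiv_dz, h)           # ddiv/dw_ij = ddiv/dz_i * h_j

W_next = W - lr * grad_W
print(div, W_next[0, 0])                # ~1.313, ~0.9269
```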
Weight Updates with Sigmoid Output

[Figure: sigmoid output layer. Inputs h1, h2 feed output units z1, z2 through weights w11 = 1, w12 = 0, w21 = 0, w22 = 1; each z is passed through a sigmoid, giving y1, y2.]

• Training input: (h1, h2) = (1, 0); target output: (d1, d2) = (0, 1)
• Activation: sigmoid; Bias = 0
• div function: cross-entropy
• Learning rate: 0.1
• What will be the value of w11 in the next iteration?

• Input to the output nodes: z1 = w11*h1 + w12*h2 = w11 = 1;  z2 = 0
• y1 = 1/(1 + e^(−1)) = e/(1+e)        y2 = 1/(1 + e^0) = 1/2
• div = −d1*log(y1) − d2*log(y2) = −log(y2) = log 2
Weight Updates with Sigmoid Outputs …

• Input (h1, h2) = (1.0, 0.0); target output: (d1, d2) = (0, 1)
• w11(t+1) = ?

[Figure: the same sigmoid output layer, with weights w11 = 1, w12 = 0, w21 = 0, w22 = 1.]

• div = −d1*log(y1) − d2*log(y2)
• Since z2 does not depend on w11, only the y1 term contributes:
  ddiv/dw11 = ddiv/dy1 * dy1/dz1 * dz1/dw11 = (−d1/y1) * y1*(1−y1) * h1 = −d1*(1−y1)*h1 = 0   (since d1 = 0)
• w11(t+1) = w11 − 0.1 * ddiv/dw11 = 1.0
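A sketch of the same computation with independent sigmoid outputs, showing that the w11 gradient vanishes when d1 = 0:

```python
import numpy as np

# Two independent sigmoid outputs with div = -d1*log(y1) - d2*log(y2).
h = np.array([1.0, 0.0])
d = np.array([0.0, 1.0])
W = np.array([[1.0, 0.0], [0.0, 1.0]])
lr = 0.1

z = W @ h                          # [1, 0]
y = 1.0 / (1.0 + np.exp(-z))       # [e/(1+e), 0.5]

# Each y_i depends only on z_i, so ddiv/dz_i = (-d_i/y_i) * y_i*(1-y_i) = -d_i*(1-y_i)
ddiv_dz = -d * (1.0 - y)           # [0, -0.5]
grad_W = np.outer(ddiv_dz, h)      # ddiv/dw_ij = ddiv/dz_i * h_j

print((W - lr * grad_W)[0, 0])     # 1.0: w11 is unchanged because d1 = 0
```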
L2 Regularization

• Loss(x, y) = 3x^2 + 6xy + y^2
• What is the value of (xopt, yopt) that minimizes the loss, if an L2 regularization constant of 0.1 is applied?

• For the regularized case, the optimal θR* is given by
      θR* ≈ (H + 0.1 I)^(−1) H θ*
  where H is the Hessian of Loss(x, y) and θ* is the optimal value in the unregularized case.
L2 Regularization

• Loss(x, y) = 3x^2 + 6xy + y^2. What is the value of (xopt, yopt) that minimizes the loss, if an L2 regularization constant of 0.1 is applied?

• dLoss/dx = 6x + 6y = 0 and dLoss/dy = 6x + 2y = 0 at (xopt, yopt). Solving these two equations, we get, in the unregularized case, xopt = yopt = 0.

• Now, for the regularized case, the optimal θR* is given by
      θR* ≈ (H + 0.1 I)^(−1) H θ*
  where H is the Hessian of Loss(x, y) and θ* is the optimal value in the unregularized case.

• Since in the unregularized case xopt = yopt = 0, θR* is also the zero vector: the regularized optimum is (xopt, yopt) = (0, 0).
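A one-line numpy check of the regularized-optimum formula for this example:

```python
import numpy as np

# L2-regularized optimum via the quadratic approximation theta_R ~ (H + a*I)^-1 H theta*
H = np.array([[6.0, 6.0],            # Hessian of 3x^2 + 6xy + y^2
              [6.0, 2.0]])
alpha = 0.1
theta_star = np.array([0.0, 0.0])    # unregularized optimum from the gradient equations

theta_R = np.linalg.solve(H + alpha * np.eye(2), H @ theta_star)
print(theta_R)                       # [0. 0.] -- shrinking the zero vector leaves it at zero
```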
L1 Regularization

• Loss(x, y) = 3x^2 − 6x + y^2
• What is the value of (xopt, yopt) that minimizes the loss, if an L1 regularization constant of 0.1 is applied?

• (θR*)_i ≈ max(θ*_i − α/H_ii, 0)   if θ*_i ≥ 0
  (θR*)_i ≈ min(θ*_i + α/H_ii, 0)   if θ*_i < 0

• So, in the regularized case, xopt = 1 − 0.1/6 = 0.9833 and yopt = max(0 − 0.1/2, 0) = 0.
L1 Regularization

• Loss(x, y) = 3x^2 − 6x + y^2. What is the value of (xopt, yopt) that minimizes the loss, if an L1 regularization constant of 0.1 is applied?

• H11 = d^2 Loss/dx^2 = 6, H12 = H21 = d^2 Loss/dxdy = 0, and H22 = d^2 Loss/dy^2 = 2. So H is diagonal and positive definite. Hence

  (θR*)_i ≈ max(θ*_i − α/H_ii, 0)   if θ*_i ≥ 0
  (θR*)_i ≈ min(θ*_i + α/H_ii, 0)   if θ*_i < 0

• dLoss(x, y)/dx = 6x − 6 = 0 and dLoss(x, y)/dy = 2y = 0 at the unregularized (xopt, yopt). Solving these, we get, in the unregularized case, xopt = 1 and yopt = 0.

• So, in the regularized case, xopt = max(1 − 0.1/6, 0) = 0.9833 and yopt = max(0 − 0.1/2, 0) = 0.
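Since H is diagonal here, the L1 result is coordinate-wise soft-thresholding; a minimal numpy sketch of that check:

```python
import numpy as np

# L1-regularized optimum for a diagonal Hessian: soft-threshold each coordinate by
# alpha / H_ii, i.e. sign(t*) * max(|t*| - alpha/H_ii, 0).
H_diag = np.array([6.0, 2.0])        # H11, H22 of 3x^2 - 6x + y^2
theta_star = np.array([1.0, 0.0])    # unregularized optimum (xopt, yopt)
alpha = 0.1

theta_R = np.sign(theta_star) * np.maximum(np.abs(theta_star) - alpha / H_diag, 0.0)
print(theta_R)                       # [0.98333..., 0.]
```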
