
Gradient of our cost function with 2-Layer NN

\[
J\left(W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}\right) = \frac{1}{m} \sum_{i=1}^{m} L\left(\hat{y}^{(i)}, y^{(i)}\right)
\]

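\[
\begin{aligned}
Z^{[1]} &= W^{[1]} X + b^{[1]} \\
A^{[1]} &= \sigma\left(Z^{[1]}\right) \\
Z^{[2]} &= W^{[2]} A^{[1]} + b^{[2]} \\
A^{[2]} &= \sigma\left(Z^{[2]}\right)
\end{aligned}
\]

\[
\tilde{Y} = A^{[2]} = \sigma\left(Z^{[2]}\right) = \frac{1}{1 + e^{-Z^{[2]}}}
\]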
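The forward pass defined above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the layer sizes, the random initialisation, and the function names `sigmoid` and `forward` are assumptions, and the hidden layer is taken to use the same sigmoid $\sigma$ as the output layer.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass of the 2-layer network; each column of X is one example."""
    Z1 = W1 @ X + b1   # b1 has shape (n_h, 1) and broadcasts over the columns
    A1 = sigmoid(Z1)   # assumed: sigmoid activation in the hidden layer
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)   # A2 plays the role of Y-tilde
    return Z1, A1, Z2, A2

# Illustrative sizes only: 3 input features, 4 hidden units, 5 examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))

Z1, A1, Z2, A2 = forward(X, W1, b1, W2, b2)
print(A2.shape)  # (1, 5): one prediction per example
```

Storing examples as columns keeps the code in the same orientation as the matrix equations, so `W1 @ X` corresponds directly to $W^{[1]} X$.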
The output neuron uses the sigmoid activation function. Substituting the quantities above, the cost function can be written as follows:
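\[
\begin{aligned}
J &= \frac{1}{m} \sum_{i=1}^{m} L\left(\hat{y}^{(i)}, y^{(i)}\right) \\
&= \frac{1}{m} \sum \left\{ -Y \log \tilde{Y} - (1 - Y) \log\left(1 - \tilde{Y}\right) \right\} \\
&= \frac{1}{m} \sum \left\{ -Y \log\left(\frac{1}{1 + e^{-Z^{[2]}}}\right) - (1 - Y) \log\left(1 - \frac{1}{1 + e^{-Z^{[2]}}}\right) \right\} \\
&= \frac{1}{m} \sum \left\{ -Y \log\left(\frac{1}{1 + e^{-\left[W^{[2]} A^{[1]} + b^{[2]}\right]}}\right) - (1 - Y) \log\left(1 - \frac{1}{1 + e^{-\left[W^{[2]} A^{[1]} + b^{[2]}\right]}}\right) \right\} \\
&= \frac{1}{m} \sum \left\{ -Y \log\left(\frac{1}{1 + e^{-\left[W^{[2]} \sigma\left(Z^{[1]}\right) + b^{[2]}\right]}}\right) - (1 - Y) \log\left(1 - \frac{1}{1 + e^{-\left[W^{[2]} \sigma\left(Z^{[1]}\right) + b^{[2]}\right]}}\right) \right\} \\
&= \frac{1}{m} \sum \left\{ -Y \log\left(\frac{1}{1 + e^{-\left[W^{[2]} \sigma\left(W^{[1]} X + b^{[1]}\right) + b^{[2]}\right]}}\right) - (1 - Y) \log\left(1 - \frac{1}{1 + e^{-\left[W^{[2]} \sigma\left(W^{[1]} X + b^{[1]}\right) + b^{[2]}\right]}}\right) \right\}
\end{aligned}
\]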
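The fully substituted cost can be evaluated directly from the inputs and parameters. The sketch below assumes NumPy, examples stored as columns, and a sigmoid hidden activation; the function name `cost` and the toy data are illustrative only.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, Y, W1, b1, W2, b2):
    """Cross-entropy cost J, averaged over the m columns of X."""
    m = X.shape[1]
    Y_tilde = sigmoid(W2 @ sigmoid(W1 @ X + b1) + b2)
    losses = -Y * np.log(Y_tilde) - (1 - Y) * np.log(1 - Y_tilde)
    return float(np.sum(losses) / m)

# Sanity check: with all parameters zero, Y_tilde = 0.5 for every example,
# so each loss term is -log(0.5) and J = log 2 regardless of the labels.
X = np.ones((3, 5))
Y = np.array([[0, 1, 1, 0, 1]])
J = cost(X, Y, np.zeros((4, 3)), np.zeros((4, 1)),
         np.zeros((1, 4)), np.zeros((1, 1)))
print(J)  # log 2, approximately 0.6931
```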

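Let us start with the gradient of this cost function. We need to find $\dfrac{\partial J}{\partial W^{[1]}}$, $\dfrac{\partial J}{\partial b^{[1]}}$, $\dfrac{\partial J}{\partial W^{[2]}}$ and $\dfrac{\partial J}{\partial b^{[2]}}$.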

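\[
\frac{\partial J}{\partial W^{[1]}} = \frac{\partial J}{\partial \tilde{Y}} \frac{\partial \tilde{Y}}{\partial W^{[1]}}
\]

\[
\begin{aligned}
\frac{\partial J}{\partial \tilde{Y}} &= \frac{1}{m} \left[ -\frac{\partial}{\partial \tilde{Y}} \left( Y \log \tilde{Y} \right) - \frac{\partial}{\partial \tilde{Y}} \left( (1 - Y) \log\left(1 - \tilde{Y}\right) \right) \right] \\
&= \frac{1}{m} \left[ -Y \frac{1}{\tilde{Y}} + (1 - Y) \frac{1}{1 - \tilde{Y}} \right] \qquad \text{since } \frac{d}{dx} \log(x) = \frac{1}{x} \\
&= \frac{1}{m} \left[ -\frac{Y}{\tilde{Y}} + \frac{1 - Y}{1 - \tilde{Y}} \right] \qquad \text{since } \frac{d}{dx} \log(1 - x) = -\frac{1}{1 - x}
\end{aligned}
\]

\[
\frac{\partial J}{\partial W^{[1]}} = \frac{1}{m} \left[ -\frac{Y}{\tilde{Y}} + \frac{1 - Y}{1 - \tilde{Y}} \right] \frac{\partial \tilde{Y}}{\partial W^{[1]}}
\]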
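\[
\frac{\partial \tilde{Y}}{\partial W^{[1]}} = \frac{\partial \tilde{Y}}{\partial Z^{[2]}} \frac{\partial Z^{[2]}}{\partial W^{[1]}} = \sigma'\left(Z^{[2]}\right) \frac{\partial Z^{[2]}}{\partial W^{[1]}}
\]

\[
\frac{\partial \tilde{Y}}{\partial Z^{[2]}} = \sigma'\left(Z^{[2]}\right) = \frac{\partial}{\partial Z^{[2]}} \left( \frac{1}{1 + e^{-Z^{[2]}}} \right) = \frac{1}{1 + e^{-Z^{[2]}}} \left( 1 - \frac{1}{1 + e^{-Z^{[2]}}} \right)
\]

\[
\frac{\partial \tilde{Y}}{\partial Z^{[2]}} = \tilde{Y} \left( 1 - \tilde{Y} \right)
\]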
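The identity $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ used here can be checked numerically against a central finite difference. This is a small sketch using only the standard library; the helper names and the step size `h` are illustrative choices.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Analytic derivative: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def central_difference(f, z, h=1e-6):
    """Numerical derivative (f(z + h) - f(z - h)) / (2h)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

# Largest gap between the analytic and numerical derivative at a few points.
max_gap = max(
    abs(sigmoid_prime(z) - central_difference(sigmoid, z))
    for z in (-2.0, 0.0, 0.5, 3.0)
)
print(max_gap < 1e-8)  # True
```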
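\[
\begin{aligned}
\frac{\partial J}{\partial W^{[1]}} &= \frac{1}{m} \left[ -\frac{Y}{\tilde{Y}} + \frac{1 - Y}{1 - \tilde{Y}} \right] \frac{\partial \tilde{Y}}{\partial W^{[1]}} = \frac{1}{m} \left[ -\frac{Y}{\tilde{Y}} + \frac{1 - Y}{1 - \tilde{Y}} \right] \frac{\partial \tilde{Y}}{\partial Z^{[2]}} \frac{\partial Z^{[2]}}{\partial W^{[1]}} \\
&= \frac{1}{m} \left[ -\frac{Y}{\tilde{Y}} + \frac{1 - Y}{1 - \tilde{Y}} \right] \left[ \tilde{Y} \left( 1 - \tilde{Y} \right) \right] \frac{\partial Z^{[2]}}{\partial W^{[1]}} \\
&= \frac{1}{m} \left( \tilde{Y} - Y \right) \frac{\partial Z^{[2]}}{\partial W^{[1]}}
\end{aligned}
\]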
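The last simplification can be made explicit by multiplying the bracket through by $\tilde{Y}(1 - \tilde{Y})$. This is only the intermediate algebra, written out under the same definitions:

\[
\left[ -\frac{Y}{\tilde{Y}} + \frac{1 - Y}{1 - \tilde{Y}} \right] \tilde{Y} \left( 1 - \tilde{Y} \right) = -Y \left( 1 - \tilde{Y} \right) + (1 - Y)\, \tilde{Y} = -Y + Y \tilde{Y} + \tilde{Y} - Y \tilde{Y} = \tilde{Y} - Y
\]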
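\[
\frac{\partial J}{\partial W^{[1]}} = \frac{\partial J}{\partial \tilde{Y}} \frac{\partial \tilde{Y}}{\partial Z^{[2]}} \frac{\partial Z^{[2]}}{\partial W^{[1]}}
\]

\[
\frac{\partial Z^{[2]}}{\partial W^{[1]}} = \frac{\partial Z^{[2]}}{\partial A^{[1]}} \frac{\partial A^{[1]}}{\partial W^{[1]}}
\]

\[
\frac{\partial Z^{[2]}}{\partial A^{[1]}} = W^{[2]}
\]

\[
\frac{\partial J}{\partial W^{[1]}} = \frac{\partial J}{\partial \tilde{Y}} \frac{\partial \tilde{Y}}{\partial Z^{[2]}} \frac{\partial Z^{[2]}}{\partial A^{[1]}} \frac{\partial A^{[1]}}{\partial W^{[1]}}
\]

\[
\frac{\partial J}{\partial W^{[1]}} = \frac{1}{m} W^{[2]T} \cdot \left( \tilde{Y} - Y \right) \frac{\partial A^{[1]}}{\partial W^{[1]}}
\]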
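\[
\frac{\partial A^{[1]}}{\partial W^{[1]}} = \frac{\partial A^{[1]}}{\partial Z^{[1]}} \frac{\partial Z^{[1]}}{\partial W^{[1]}}
\]

\[
\frac{\partial A^{[1]}}{\partial Z^{[1]}} = \sigma'\left(Z^{[1]}\right), \qquad \frac{\partial Z^{[1]}}{\partial W^{[1]}} = X
\]

\[
\frac{\partial J}{\partial W^{[1]}} = \frac{\partial J}{\partial \tilde{Y}} \frac{\partial \tilde{Y}}{\partial Z^{[2]}} \frac{\partial Z^{[2]}}{\partial A^{[1]}} \frac{\partial A^{[1]}}{\partial Z^{[1]}} \frac{\partial Z^{[1]}}{\partial W^{[1]}}
\]

\[
\frac{\partial J}{\partial W^{[1]}} = \frac{1}{m} \left\{ W^{[2]T} \cdot \left( \tilde{Y} - Y \right) \odot \sigma'\left(Z^{[1]}\right) \right\} \cdot X^{T}
\]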
Similarly,

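\[
\frac{\partial J}{\partial b^{[1]}} = \frac{\partial J}{\partial \tilde{Y}} \frac{\partial \tilde{Y}}{\partial Z^{[2]}} \frac{\partial Z^{[2]}}{\partial A^{[1]}} \frac{\partial A^{[1]}}{\partial Z^{[1]}} \frac{\partial Z^{[1]}}{\partial b^{[1]}}
\]

\[
\frac{\partial Z^{[1]}}{\partial b^{[1]}} = 1
\]

\[
\frac{\partial J}{\partial b^{[1]}} = \frac{1}{m} \sum W^{[2]T} \cdot \left( \tilde{Y} - Y \right) \odot \sigma'\left(Z^{[1]}\right)
\]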
Here, $\cdot$ represents the dot product (matrix multiplication), whereas $\odot$ represents the element-wise product of two matrices.
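The first-layer gradients can be sketched in NumPy, where `@` stands for the dot product and `*` for the element-wise product. This is an illustrative check rather than a definitive implementation: the sizes, the random data, and the names `cost` and `grads_layer1` are assumptions, the hidden activation is taken to be sigmoid, and the sum in the bias gradient is read as a sum over the examples (columns).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, Y, W1, b1, W2, b2):
    """Cross-entropy cost J, averaged over the m columns of X."""
    m = X.shape[1]
    Y_tilde = sigmoid(W2 @ sigmoid(W1 @ X + b1) + b2)
    return float(np.sum(-Y * np.log(Y_tilde) - (1 - Y) * np.log(1 - Y_tilde)) / m)

def grads_layer1(X, Y, W1, b1, W2, b2):
    """Gradients of J with respect to W1 and b1."""
    m = X.shape[1]
    A1 = sigmoid(W1 @ X + b1)
    Y_tilde = sigmoid(W2 @ A1 + b2)
    # W2^T . (Y_tilde - Y), element-wise times sigma'(Z1) = A1 * (1 - A1)
    dZ1 = (W2.T @ (Y_tilde - Y)) * (A1 * (1 - A1))
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m  # sum over examples
    return dW1, db1

# Illustrative sizes: 3 features, 4 hidden units, 6 examples.
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 6))
Y = rng.integers(0, 2, size=(1, 6))
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))
dW1, db1 = grads_layer1(X, Y, W1, b1, W2, b2)

# Central finite-difference check on one entry of W1 and one entry of b1.
h = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[2, 1] += h
W1_minus[2, 1] -= h
num_dW1 = (cost(X, Y, W1_plus, b1, W2, b2) - cost(X, Y, W1_minus, b1, W2, b2)) / (2 * h)

b1_plus, b1_minus = b1.copy(), b1.copy()
b1_plus[0, 0] += h
b1_minus[0, 0] -= h
num_db1 = (cost(X, Y, W1, b1_plus, W2, b2) - cost(X, Y, W1, b1_minus, W2, b2)) / (2 * h)

print(abs(num_dW1 - dW1[2, 1]) < 1e-6, abs(num_db1 - db1[0, 0]) < 1e-6)
```

The resulting `dW1` has the same shape as `W1`, which is a quick way to confirm that the transposes in the formula are placed correctly.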

Now, let us look at the terms $\dfrac{\partial J}{\partial W^{[2]}}$ and $\dfrac{\partial J}{\partial b^{[2]}}$.
Starting from,

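\[
\frac{\partial J}{\partial W^{[2]}} = \frac{\partial J}{\partial \tilde{Y}} \frac{\partial \tilde{Y}}{\partial W^{[2]}}
\]

\[
\frac{\partial J}{\partial W^{[2]}} = \frac{1}{m} \left[ -\frac{Y}{\tilde{Y}} + \frac{1 - Y}{1 - \tilde{Y}} \right] \frac{\partial \tilde{Y}}{\partial W^{[2]}}
\]

\[
\begin{aligned}
\frac{\partial \tilde{Y}}{\partial W^{[2]}} &= \frac{\partial \tilde{Y}}{\partial Z^{[2]}} \frac{\partial Z^{[2]}}{\partial W^{[2]}} \\
&= \tilde{Y} \left( 1 - \tilde{Y} \right) \frac{\partial Z^{[2]}}{\partial W^{[2]}}
\end{aligned}
\]

\[
\frac{\partial Z^{[2]}}{\partial W^{[2]}} = A^{[1]}
\]

\[
\begin{aligned}
\frac{\partial J}{\partial W^{[2]}} &= \frac{\partial J}{\partial \tilde{Y}} \frac{\partial \tilde{Y}}{\partial Z^{[2]}} \frac{\partial Z^{[2]}}{\partial W^{[2]}} \\
&= \frac{1}{m} \left( \tilde{Y} - Y \right) \cdot A^{[1]T}
\end{aligned}
\]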
Similarly,

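\[
\begin{aligned}
\frac{\partial J}{\partial b^{[2]}} &= \frac{\partial J}{\partial \tilde{Y}} \frac{\partial \tilde{Y}}{\partial Z^{[2]}} \frac{\partial Z^{[2]}}{\partial b^{[2]}} \\
&= \frac{1}{m} \sum \left( \tilde{Y} - Y \right)
\end{aligned}
\]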
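The second-layer gradients admit the same kind of numerical check. As before, this is a sketch under stated assumptions: NumPy, examples stored as columns, a sigmoid hidden activation, and hypothetical names `cost` and `grads_layer2`; the sum in the bias gradient is taken over the examples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, Y, W1, b1, W2, b2):
    """Cross-entropy cost J, averaged over the m columns of X."""
    m = X.shape[1]
    Y_tilde = sigmoid(W2 @ sigmoid(W1 @ X + b1) + b2)
    return float(np.sum(-Y * np.log(Y_tilde) - (1 - Y) * np.log(1 - Y_tilde)) / m)

def grads_layer2(X, Y, W1, b1, W2, b2):
    """Gradients of J with respect to W2 and b2."""
    m = X.shape[1]
    A1 = sigmoid(W1 @ X + b1)
    Y_tilde = sigmoid(W2 @ A1 + b2)
    dZ2 = Y_tilde - Y                              # (Y_tilde - Y)
    dW2 = dZ2 @ A1.T / m                           # (1/m) (Y_tilde - Y) . A1^T
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m   # (1/m) sum over examples
    return dW2, db2

# Illustrative sizes: 3 features, 4 hidden units, 6 examples.
rng = np.random.default_rng(2)
X = rng.normal(size=(3, 6))
Y = rng.integers(0, 2, size=(1, 6))
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))
dW2, db2 = grads_layer2(X, Y, W1, b1, W2, b2)

# Central finite-difference check on one entry of W2 and on b2.
h = 1e-6
W2_plus, W2_minus = W2.copy(), W2.copy()
W2_plus[0, 3] += h
W2_minus[0, 3] -= h
num_dW2 = (cost(X, Y, W1, b1, W2_plus, b2) - cost(X, Y, W1, b1, W2_minus, b2)) / (2 * h)
num_db2 = (cost(X, Y, W1, b1, W2, b2 + h) - cost(X, Y, W1, b1, W2, b2 - h)) / (2 * h)

print(abs(num_dW2 - dW2[0, 3]) < 1e-6, abs(num_db2 - db2[0, 0]) < 1e-6)
```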
