Backprop For Recurrent Networks
Steady state

Reward R is an explicit function of x, the steady state of a recurrent network:

    x_i = f\left( \sum_j W_{ij} x_j + b_i \right)

Objective:

    \max_{W,b} R(x)
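As a concrete sketch, the steady state can be found by fixed-point iteration. The choice f = tanh and the particular W, b below are illustrative assumptions, not from the slides; small weights make the map a contraction so the iteration converges.

```python
import numpy as np

# Sketch: find the steady state x = f(Wx + b) by fixed-point iteration.
# f = tanh and the random W, b are illustrative assumptions.
rng = np.random.default_rng(0)
n = 4
W = 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)  # small weights -> contraction
b = rng.standard_normal(n)
f = np.tanh

x = np.zeros(n)
for _ in range(500):
    x_new = f(W @ x + b)
    if np.max(np.abs(x_new - x)) < 1e-12:
        x = x_new
        break
    x = x_new

# At convergence, x satisfies the steady-state equation x = f(Wx + b).
residual = np.max(np.abs(x - f(W @ x + b)))
```

With contracting weights the residual shrinks geometrically, so a few hundred iterations reach machine precision.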
Recurrent backpropagation

    x = f(Wx + b)

    (D^{-1} - W)^T s = \frac{\partial R}{\partial x}

    \Delta W = \eta\, s x^T
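A minimal numerical sketch of this rule, assuming f = tanh and a linear reward R(x) = c·x (so ∂R/∂x = c); the sensitivity vector s is obtained by solving the linear system above, and the learning rate η and all parameter values are illustrative.

```python
import numpy as np

# Sketch of recurrent backprop at a steady state.
# Assumptions (not from the slides): f = tanh, reward R(x) = c @ x.
rng = np.random.default_rng(1)
n = 4
W = 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
c = rng.standard_normal(n)          # dR/dx for R(x) = c @ x
eta = 0.1                           # learning rate (illustrative)

# Steady state by fixed-point iteration.
x = np.zeros(n)
for _ in range(2000):
    x = np.tanh(W @ x + b)

# D = diag{f'(Wx + b)}; for tanh at the steady state, f' = 1 - x^2.
Dinv = np.diag(1.0 / (1.0 - x**2))

# Solve (D^{-1} - W)^T s = dR/dx for the sensitivity vector s.
s = np.linalg.solve((Dinv - W).T, c)

# Gradient-ascent updates on the weights and biases.
dW = eta * np.outer(s, x)
db = eta * s
```

Solving the linear system replaces the relaxation of a second (adjoint) network that recurrent backprop is usually described with; for small n a direct solve is simplest.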
Sensitivity lemma

    \frac{\partial R}{\partial W_{ij}} = x_j \frac{\partial R}{\partial b_i}
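The lemma follows in one line because W_{ij} and b_i enter the dynamics only through the argument u_i = \sum_j W_{ij} x_j + b_i, so a perturbation of W_{ij} acts, to first order, exactly like a perturbation of b_i scaled by x_j (with x_j held at the steady state; the implicit dependence of x on the parameters enters both sides identically). A sketch of the step:

```latex
% Sketch: u_i = \sum_j W_{ij} x_j + b_i is the only place W_{ij} and b_i appear.
\[
\frac{\partial R}{\partial W_{ij}}
  = \frac{\partial R}{\partial u_i}\,\frac{\partial u_i}{\partial W_{ij}}
  = \frac{\partial R}{\partial u_i}\,x_j ,
\qquad
\frac{\partial R}{\partial b_i}
  = \frac{\partial R}{\partial u_i}\,\frac{\partial u_i}{\partial b_i}
  = \frac{\partial R}{\partial u_i} ,
\]
\[
\text{hence}\qquad
\frac{\partial R}{\partial W_{ij}} = x_j\,\frac{\partial R}{\partial b_i}.
\]
```

This is why the derivation below only needs the gradient with respect to the biases: the weight gradient comes for free.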
Jacobian matrix

Invert the steady-state equation:

    b_i = f^{-1}(x_i) - \sum_j W_{ij} x_j

    \frac{\partial b_i}{\partial x_j} = (f^{-1})'(x_i)\,\delta_{ij} - W_{ij} = (D^{-1} - W)_{ij}
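The identity ∂b/∂x = D^{-1} - W can be checked numerically. The sketch below assumes f = tanh (so f^{-1} = arctanh and, at a consistent (x, b) pair, f'(Wx + b) = 1 - x^2); the particular W and x are illustrative.

```python
import numpy as np

# Numerical check (sketch) that db/dx = D^{-1} - W when b = f^{-1}(x) - Wx.
# Assumption: f = tanh, so f^{-1} = arctanh and D = diag{1 - x^2}.
rng = np.random.default_rng(2)
n = 3
W = 0.2 * rng.standard_normal((n, n))
x = 0.5 * rng.uniform(-1, 1, n)      # keep |x| < 1 so arctanh is defined

def b_of_x(x):
    return np.arctanh(x) - W @ x

# Finite-difference Jacobian of b with respect to x.
eps = 1e-6
J = np.zeros((n, n))
for j in range(n):
    dx = np.zeros(n)
    dx[j] = eps
    J[:, j] = (b_of_x(x + dx) - b_of_x(x - dx)) / (2 * eps)

# Analytic form: D^{-1} - W with D = diag{f'(Wx + b)} = diag{1 - x^2}.
Dinv = np.diag(1.0 / (1.0 - x**2))
err = np.max(np.abs(J - (Dinv - W)))
```

The central difference agrees with the analytic Jacobian to well below single precision.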
Chain rule

    \frac{\partial R}{\partial x_j} = \sum_i \frac{\partial R}{\partial b_i} \frac{\partial b_i}{\partial x_j} = \sum_i \frac{\partial R}{\partial b_i} (D^{-1} - W)_{ij}

    \frac{\partial R}{\partial b} = (D^{-1} - W)^{-T} \frac{\partial R}{\partial x}
Trajectories

Initialize at x(0). Iterate for T time steps:

    x_i(t) = f\left( \sum_j W_{ij} x_j(t-1) + b_i \right)
Multilayer perceptron

The unrolled trajectory is a multilayer perceptron with:
- the same number of neurons in each layer
- the same weights and biases in each layer (weight sharing)

    x(0) --W,b--> x(1) --W,b--> ... --W,b--> x(T) --> R
    s(t) = D(t)\, W^T s(t+1) + D(t)\, \frac{\partial R}{\partial x(t)}

    \Delta W = \eta \sum_{t=1}^{T} s(t)\, x(t-1)^T
    \Delta b = \eta \sum_{t=1}^{T} s(t)
Jacobian matrix

Invert the update equation:

    b(t) = f^{-1}(x(t)) - W x(t-1)

    \frac{\partial b_i(t)}{\partial x_j(t')} = \delta_{tt'}\, (D^{-1}(t))_{ij} - W_{ij}\, \delta_{t-1,t'}

    D(t) = \mathrm{diag}\{ f'(Wx(t-1) + b(t)) \}
Chain rule

    \frac{\partial R}{\partial x_j(t)} = \sum_{i,t'} \frac{\partial R}{\partial b_i(t')} \frac{\partial b_i(t')}{\partial x_j(t)}

With s(t) = \partial R / \partial b(t), this gives

    \frac{\partial R}{\partial x(t)} = D^{-1}(t)\, s(t) - W^T s(t+1)
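Putting the trajectory, the backward recursion for s(t), and the summed updates together gives backprop through time. The sketch below assumes f = tanh and a reward R = c·x(T) on the final state only (so ∂R/∂x(t) = 0 for t < T); W, b, c, T, and η = 1 are illustrative.

```python
import numpy as np

# Sketch of backprop through time for the unrolled network
# x(t) = f(Wx(t-1) + b), assuming f = tanh and reward R = c @ x(T).
rng = np.random.default_rng(3)
n, T = 4, 5
W = 0.3 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
c = rng.standard_normal(n)           # dR/dx(T); dR/dx(t) = 0 for t < T

# Forward pass: store the trajectory x(0), ..., x(T).
x = [np.zeros(n)]
for t in range(1, T + 1):
    x.append(np.tanh(W @ x[t - 1] + b))

# Backward pass: s(t) = D(t) W^T s(t+1) + D(t) dR/dx(t),
# with D(t) = diag{f'(Wx(t-1) + b)} = diag{1 - x(t)^2} for tanh.
s = [np.zeros(n) for _ in range(T + 2)]   # s[T+1] = 0
for t in range(T, 0, -1):
    dRdx = c if t == T else np.zeros(n)
    d = 1.0 - x[t] ** 2                   # diagonal of D(t)
    s[t] = d * (W.T @ s[t + 1] + dRdx)

# Accumulated gradients (weight sharing; learning rate eta = 1 here).
dW = sum(np.outer(s[t], x[t - 1]) for t in range(1, T + 1))
db = sum(s[t] for t in range(1, T + 1))
```

Because the unrolled network is finite, these gradients are exact and can be checked directly against finite differences of the reward.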