
30

Adam
echizen_tm
Apr.29, 2015


Adam

Adam(1p)
SGD(1p)
Adam(2p)
(3p)
Adam(1p)

Adam
state of the art
AdaGrad+RMSProp()

(AdaGrad)

SGD
$g_t = \nabla_\theta f_t(\theta_{t-1})$

$\theta_t = \theta_{t-1} - \alpha g_t$

$f(\theta)$: objective function
$g$: gradient
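The SGD update above can be sketched in a few lines. This is a minimal illustration, not code from the slides: the toy objective f(theta) = (theta - 3)^2, the learning rate 0.1, and the names `sgd_step` and `grad_f` are all assumptions chosen for the example.

```python
def sgd_step(theta, grad, lr=0.1):
    """One SGD update: theta_t = theta_{t-1} - lr * g_t."""
    return theta - lr * grad


def grad_f(theta):
    # Gradient of the assumed toy objective f(theta) = (theta - 3)^2.
    return 2.0 * (theta - 3.0)


theta = 0.0
for _ in range(100):
    theta = sgd_step(theta, grad_f(theta))
```

With these settings the distance to the minimizer shrinks by a factor of 0.8 per step, so `theta` ends up very close to 3.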

Adam
$g_t = \nabla_\theta f_t(\theta_{t-1})$

$\theta_t = \theta_{t-1} - \alpha \, E[g] / \sqrt{E[g^2]}$

$f(\theta)$: objective function
$g$: gradient

Adam
$\theta_t = \theta_{t-1} - \alpha \, E[g] / \sqrt{E[g^2]}$

abs()

()

1
$E[g] / \sqrt{E[g^2]}$

$E[g] / \sqrt{E[g^2]}$
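One property worth checking numerically: the ratio of the mean gradient to the root of the mean squared gradient never exceeds 1 in absolute value, so each step moves by at most about the step size regardless of the gradient's scale. A minimal sketch, assuming the expectations are estimated by plain sample means (the helper name `normalized_step` is hypothetical):

```python
import math


def normalized_step(grads):
    """Return mean(g) / sqrt(mean(g^2)) for a list of scalar gradients."""
    mean_g = sum(grads) / len(grads)
    mean_g2 = sum(g * g for g in grads) / len(grads)
    return mean_g / math.sqrt(mean_g2)


# Gradients that agree in sign give a ratio of 1, whatever their scale.
consistent = normalized_step([100.0, 100.0, 100.0])
small = normalized_step([0.01, 0.01, 0.01])

# Gradients that cancel out give a ratio of 0, so the step shrinks.
noisy = normalized_step([100.0, -100.0, 100.0, -100.0])
```

Large and small gradients yield the same ratio when they are consistent, which is why the update is insensitive to gradient scale.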



m=0
(100)
m_1 = (0 + 100) / 2 = 50
m_2 = (0 + 100 + 100) / 3 = 66.7
m_3 = (0 + 100 + 100 + 100) / 4 = 75
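The arithmetic above can be reproduced with a short sketch. It assumes the initial value 0 is counted as one sample in the average, which is what the three lines above do; the function name `running_means` is made up for the example.

```python
def running_means(initial, observations):
    """Plain averages that include the initial value as one sample."""
    total = initial
    count = 1
    means = []
    for g in observations:
        total += g
        count += 1
        means.append(total / count)
    return means


means = running_means(0.0, [100.0, 100.0, 100.0])
```

Even though every observed gradient is 100, the estimates stay below 100 because the initial 0 keeps pulling them down.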

(0)


(Exponential Moving Average)

$m_t = \beta m_{t-1} + (1 - \beta) g_t$

$m_t = (1 - \beta) \sum_{i=1}^{t} \beta^{t-i} g_i$
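The recursive and closed-form versions of the exponential moving average can be checked against each other. A minimal sketch, assuming the average starts at 0 and using an arbitrary decay rate of 0.9 and made-up gradient values; both function names are hypothetical:

```python
def ema_recursive(grads, beta):
    """Apply m_t = beta * m_{t-1} + (1 - beta) * g_t starting from m_0 = 0."""
    m = 0.0
    for g in grads:
        m = beta * m + (1.0 - beta) * g
    return m


def ema_closed_form(grads, beta):
    """Evaluate m_t = (1 - beta) * sum_{i=1..t} beta^(t-i) * g_i directly."""
    t = len(grads)
    weighted = sum(beta ** (t - i) * g for i, g in enumerate(grads, start=1))
    return (1.0 - beta) * weighted


grads = [100.0, 50.0, -20.0, 80.0]
m_rec = ema_recursive(grads, 0.9)
m_sum = ema_closed_form(grads, 0.9)
```

The two results agree up to floating-point rounding, since unrolling the recursion yields exactly the weighted sum.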



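$$
\begin{aligned}
E[m_t] &= E\left[ (1 - \beta) \sum_{i=1}^{t} \beta^{t-i} g_i \right] \\
&\approx E[g_t] (1 - \beta) \sum_{i=1}^{t} \beta^{t-i} \\
&= E[g_t] \left( \sum_{i=1}^{t} \beta^{t-i} - \sum_{i=0}^{t-1} \beta^{t-i} \right) \\
&= E[g_t] (1 - \beta^t)
\end{aligned}
$$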
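The factor (1 - beta^t) can be verified numerically. A minimal sketch, assuming a constant gradient of 100 (so the expectation is exactly 100), a decay rate of 0.9, and a moving average initialized to 0; the name `ema_history` is hypothetical:

```python
def ema_history(grads, beta):
    """Return the exponential moving average after each step, from m_0 = 0."""
    m = 0.0
    history = []
    for g in grads:
        m = beta * m + (1.0 - beta) * g
        history.append(m)
    return history


beta = 0.9
history = ema_history([100.0] * 5, beta)

# Dividing by (1 - beta^t) removes the bias toward the initial value 0.
corrected = [m / (1.0 - beta ** t) for t, m in enumerate(history, start=1)]
```

The raw average starts near 10 and creeps upward, while every corrected value equals 100 up to rounding.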

Adam
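$g_t = \nabla_\theta f_t(\theta_{t-1})$

$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$

$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$

$\hat{m}_t = m_t / (1 - \beta_1^t)$

$\hat{v}_t = v_t / (1 - \beta_2^t)$

$\theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / \sqrt{\hat{v}_t}$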
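The full update can be sketched for a scalar parameter as follows. This is an illustration under stated assumptions rather than a reference implementation: the toy objective f(theta) = (theta - 3)^2, the step size 0.1, the step count, and the function name `adam_minimize` are made up for the example. The small constant `eps` does not appear in the equations above; it is added here only to guard against division by zero.

```python
import math


def adam_minimize(grad_fn, theta, alpha=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Run Adam on a scalar parameter and return the final value."""
    m = 0.0  # moving average of the gradient, estimates E[g]
    v = 0.0  # moving average of the squared gradient, estimates E[g^2]
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction for the zero initialization of m and v.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # eps is an addition for numerical safety (assumption, see lead-in).
        theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Gradient of the assumed toy objective f(theta) = (theta - 3)^2.
theta_star = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
```

On this toy problem the iterate overshoots slightly, oscillates, and then settles near the minimizer at 3.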


Adam

$E[g] / \sqrt{E[g^2]}$