Kevin Chen
Stanford University
Abstract
Reinforcement learning is essential for training an agent to make smart decisions under uncertainty and to take small actions in order to achieve a higher overarching goal. In this project, we combined reinforcement learning and deep learning techniques to train an agent to play the game Flappy Bird. The challenge is that the agent sees only the pixels and the rewards, similar to a human player. Using just this information, it is able to play the game successfully at a human or sometimes super-human level.

Pipeline
[Figure: Pipeline. Initialize replay memory and DQN; new episode; update state (next frame) and replay memory; choose action based on ε-greedy policy; sample minibatch from replay memory and update DQN. Branch labels: if crash, if not crash.]
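The replay memory and greedy-policy steps named in the pipeline can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the names `ReplayMemory` and `epsilon_greedy`, the transition layout `(state, action, reward, next_state, crashed)`, and the use of uniform sampling are all assumptions made for the example.

```python
import random
from collections import deque


class ReplayMemory:
    """Hypothetical fixed-capacity buffer of
    (state, action, reward, next_state, crashed) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, crashed):
        self.buffer.append((state, action, reward, next_state, crashed))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement (an assumption); never ask
        # for more transitions than are currently stored.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index,
    otherwise the index of the largest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon` set to 0 the policy is purely greedy; larger values trade exploitation for exploration. How epsilon is scheduled over training is not specified here and is left to the caller.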
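Loss
\[ L_i(\theta_i) = \mathbb{E}_{s,a \sim \rho(\cdot)}\left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right] \]
\[ y_i = \mathbb{E}_{s' \sim \mathcal{E}}\left[ r + \gamma \max_{a'} Q_{\text{target}}(s', a'; \theta_{\text{target}}) \,\middle|\, s, a \right] \]

Easy    Inf  Inf  Inf  Inf  Inf
Medium  Inf  11   2    Inf  Inf
Hard    65   1    1    1    215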
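To make the loss concrete, the sketch below computes the targets y and the squared-error loss for a small minibatch in plain Python. The function names `dqn_targets` and `dqn_loss` are hypothetical, `gamma` is the usual discount factor with an arbitrary value chosen for the example, and using the bare reward as the target when the bird crashes is a common convention assumed here rather than something stated in the text.

```python
def dqn_targets(rewards, next_q_target, crashed, gamma):
    """Targets y for a minibatch.

    next_q_target[k] holds the target network's Q-values for every action
    in the next state of transition k. For crashed (terminal) transitions
    the target is just the reward (assumed convention).
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, crashed):
        targets.append(r if done else r + gamma * max(q_next))
    return targets


def dqn_loss(targets, q_taken):
    """Mean of (y - Q(s, a; theta))^2 over the minibatch, where q_taken[k]
    is the online network's Q-value for the action actually taken."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_taken)) / len(targets)


# Two transitions: one ordinary step and one crash.
y = dqn_targets(
    rewards=[1.0, -1.0],
    next_q_target=[[2.0, 4.0], [3.0, 1.0]],
    crashed=[False, True],
    gamma=0.5,
)
loss = dqn_loss(y, q_taken=[2.0, 0.0])
```

Here `y` is `[3.0, -1.0]` and `loss` is `1.0`. In a real training loop the Q-values would come from the online and target networks, and the loss would be minimized by gradient descent on the online network's parameters.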