
Deep Reinforcement Learning for Flappy Bird

Kevin Chen
Stanford University

Abstract

Reinforcement learning is essential for training an agent to make smart decisions under uncertainty and to take small actions in order to achieve a higher overarching goal. In this project, we combined reinforcement learning and deep learning techniques to train an agent to play the game Flappy Bird. The challenge is that the agent only sees the pixels and the rewards, similar to a human player. Using just this information, it is able to successfully play the game at a human or sometimes super-human level.

Pipeline

[Figure: training pipeline - initialize replay memory and DQN; start a new episode; update the state from the next frame and replay memory; choose an action based on an epsilon-greedy policy; sample a minibatch from replay memory and update the DQN; if the bird crashes, start a new episode, otherwise continue to the next frame.]
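Below is a minimal sketch of the training loop the pipeline diagram describes. The environment interface (env.reset, env.step), the network wrapper (dqn.best_action, dqn.update), and the hyperparameter values are illustrative assumptions, not the poster's actual code.

```python
import random
from collections import deque

def train(env, dqn, num_episodes=10000, batch_size=32, epsilon=0.1, memory_size=50000):
    """DQN training loop with experience replay and an epsilon-greedy policy."""
    replay_memory = deque(maxlen=memory_size)      # initialize replay memory

    for _ in range(num_episodes):                  # start a new episode
        state = env.reset()
        done = False
        while not done:
            # choose action based on an epsilon-greedy policy (0 = do nothing, 1 = flap)
            if random.random() < epsilon:
                action = random.randint(0, 1)
            else:
                action = dqn.best_action(state)

            # advance the game one frame and observe the reward
            next_state, reward, done = env.step(action)

            # store the transition, then update the state from the next frame
            replay_memory.append((state, action, reward, next_state, done))
            state = next_state

            # sample a minibatch from replay memory and update the DQN
            if len(replay_memory) >= batch_size:
                minibatch = random.sample(replay_memory, batch_size)
                dqn.update(minibatch)
```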

Feature extractor and Deep Q-Network (DQN)

[Figure: feature extractor - convert the images in the state to grayscale, downsample each image to 84x84, and stack the n most recent frames into an n-channel input; the Deep Q-Network (DQN) then outputs a Q-value for each action.]
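A possible implementation of that feature extractor, assuming the game frames arrive as RGB NumPy arrays and using OpenCV for the conversion and resize; the function names and library choice are illustrative, not taken from the poster.

```python
import numpy as np
import cv2

def preprocess_frame(frame):
    """Convert an RGB game frame to grayscale and downsample it to 84x84."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84))

def make_state(recent_frames, n=4):
    """Stack the n most recent preprocessed frames into an n-channel DQN input."""
    frames = [preprocess_frame(f) for f in recent_frames[-n:]]
    return np.stack(frames, axis=-1)   # shape: (84, 84, n)
```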


Related Work

[1] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis. Human-level control through deep reinforcement learning. Nature 518, 529-533 (2015).

[2] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

Reinforcement Learning

State: the sequence of frames and actions seen so far,
$s_t = x_1, a_1, x_2, a_2, \ldots, x_{t-1}, a_{t-1}, x_t$

Action: Flap ($a = 1$) or Do nothing ($a = 0$)

Rewards: rewardAlive = +0.1, rewardPipe = +1.0, rewardDead = -1.0

Q-learning:
$Q^*(s, a) = \mathbb{E}_{s' \sim \mathcal{E}}\left[ r + \gamma \max_{a'} Q^*(s', a') \mid s, a \right]$
$Q_{i+1}(s, a) = \mathbb{E}_{s' \sim \mathcal{E}}\left[ r + \gamma \max_{a'} Q_i(s', a') \mid s, a \right]$

Loss:
$L_i(\theta_i) = \mathbb{E}_{s, a \sim \rho(\cdot)}\left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right]$
$y_i = \mathbb{E}_{s' \sim \mathcal{E}}\left[ r + \gamma \max_{a'} Q_{\text{target}}(s', a'; \theta_{\text{target}}) \mid s, a \right]$
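This loss is the squared difference between the DQN's prediction and a bootstrapped target computed with a separate target network. A sketch of how it could be computed in PyTorch follows; the tensor shapes, the q_net/target_net modules, and the discount value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, states, actions, rewards, next_states, dones, gamma=0.99):
    """Mean squared TD error (y_i - Q(s, a; theta_i))^2 over a minibatch."""
    # Q(s, a; theta_i) for the actions that were actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # y_i = r + gamma * max_a' Q_target(s', a'; theta_target); no bootstrapping past a crash
        max_next_q = target_net(next_states).max(dim=1).values
        y = rewards + gamma * max_next_q * (1.0 - dones)
    return F.mse_loss(q_sa, y)
```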
Experimental Results

[Figure: game screens at Easy, Medium, and Hard difficulty]

Average Score

Game difficulty | Human | Baseline (flap every n) | DQN (easy) | DQN (medium) | DQN (hard)
Easy            | Inf   | Inf                     | Inf        | Inf          | Inf
Medium          | Inf   | Inf                     | 0.7        | Inf          | Inf
Hard            | 21    | 0.5                     | 0.1        | 0.6          | 82.2

Highest Score Achieved

Game difficulty | Human | Baseline (flap every n) | DQN (easy) | DQN (medium) | DQN (hard)
Easy            | Inf   | Inf                     | Inf        | Inf          | Inf
Medium          | Inf   | 11                      | 2          | Inf          | Inf
Hard            | 65    | 1                       | 1          | 1            | 215
