Exp No: 5 Flappy Bird
Date: 27.03.2025
Aim: To train an AI agent to play Flappy Bird using Deep Q-Networks (DQN).
Objective: Flappy Bird is a game where the player controls a bird that must navigate through
gaps between pipes without hitting them. The goal is to maximize the agent's survival time
and score using reinforcement learning with a deep Q-network (DQN).
Simulation Tool: The Flappy Bird environment is implemented using gym or gymnasium,
with pygame (or matplotlib, as in the program below) for visualization.
Action Space: Discrete(2)
Observation Space: Continuous (processed via a CNN in the DQN)
Import: gymnasium.make("FlappyBird-v0")
Description: The game starts with the bird in the air, where it continuously falls due to
gravity. The player (agent) can either flap (jump) or do nothing. The objective is to pass
through as many pipes as possible without colliding.
Algorithm:
1. Initialize the deep Q-network with random weights.
2. Set hyperparameters such as the learning rate (α), discount factor (γ), exploration rate (ε), and
replay memory size.
3. For each episode:
Start at the initial state.
Choose an action using an ε-greedy policy.
Perform the action and observe the next state, reward, and done status.
Store the experience (state, action, reward, next state, done) in a replay buffer.
Sample a mini-batch from the replay buffer.
Compute the target Q-value: target = reward + γ · max over a′ of Q(next_state, a′); for terminal transitions the target is just the reward (see the sketch after this list).
Update the Q-network using backpropagation.
Reduce ε over time.
Repeat until the game is over.
4. Train the DQN for multiple episodes until convergence.
5. Use the trained model to find the optimal policy.
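As a minimal sketch of the target computation in step 3 (the variable names and numbers here are illustrative, not taken from the program below):

import numpy as np

gamma = 0.95  # discount factor, matching the agent defined in the program below

# Hypothetical Q-values predicted by the target network for the next state
next_q_values = np.array([0.8, 1.4])   # Q(s', do nothing), Q(s', flap)

reward, done = 0.1, False              # one non-terminal transition
target = reward + (0.0 if done else gamma * np.max(next_q_values))
print(target)  # ≈ 1.43  (0.1 + 0.95 * 1.4)

For terminal transitions (done = True) the bootstrap term is dropped, which is exactly what the (1 - dones) factor does in the replay() method of the program below.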
Action Space: The action shape is (1,) in the range {0, 1}, indicating whether the bird should
flap or not.
0: Do nothing (bird falls)
1: Flap (bird jumps up)
Observation Space:
The observation can consist of raw pixel frames processed with convolutional neural networks
(CNNs), or of a compact feature vector (as in the program below). The input state includes:
Stacked frames for temporal information
Bird’s vertical position
Bird’s velocity
Distance to next pipe
Height of next pipe
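In the program below the observation is simplified to a four-element normalized feature vector rather than raw pixel frames; a minimal sketch of how such a state vector might look (the numeric values are illustrative):

import numpy as np

screen_width, screen_height = 100, 100     # match the custom environment below
bird_y, bird_velocity = 50, -4             # illustrative values
pipe_distance, gap_center = 48, 60

state = np.array([
    bird_y / screen_height,          # vertical position, scaled to [0, 1]
    bird_velocity / 10,              # scaled velocity
    pipe_distance / screen_width,    # horizontal distance to the next pipe
    gap_center / screen_height,      # centre of the next gap
], dtype=np.float32)
print(state)  # [ 0.5  -0.4   0.48  0.6 ]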
Rewards: The reward schedule is:
Successfully passing a pipe: +1
Collision with ground or pipe: -1 (Game Over)
Starting State:
The episode starts with the bird at an initial position with a downward velocity.
Episode End: The episode ends if any of the following happens:
The bird collides with a pipe.
The bird collides with the ground.
The bird flies too high (in some implementations).
Arguments
The render_mode argument enables visualization, and sutton_barto_reward modifies the
reward structure to match the original implementation.
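For example, with the third-party flappy-bird-gymnasium package (an assumption; the program below defines its own custom environment instead), the environment could be created roughly as follows:

import gymnasium
import flappy_bird_gymnasium  # registers FlappyBird-v0 (pip install flappy-bird-gymnasium)

env = gymnasium.make("FlappyBird-v0", render_mode="human")
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()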
Program:
import os
import random
import numpy as np
import matplotlib.pyplot as plt
from collections import deque
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
import gym
from gym import spaces
from IPython.display import clear_output, display
import time
# Suppress TensorFlow warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# Force CPU usage instead of GPU to avoid compatibility issues
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
print("TensorFlow version:", tf.__version__)
print("Running with device:", tf.config.list_physical_devices())
# Define the Flappy Bird environment
class FlappyBirdEnv(gym.Env):
    def __init__(self):
        super(FlappyBirdEnv, self).__init__()
        # Environment parameters
        self.gravity = 1
        self.bird_velocity = 0
        self.bird_position = 50
        self.pipe_gap = 40
        self.pipe_width = 10
        self.pipe_velocity = 2
        self.pipes = []
        self.screen_width = 100
        self.screen_height = 100
        self.pipe_spawn_freq = 50
        self.frames_since_last_pipe = 0
        self.score = 0
        # Define action and observation space
        self.action_space = spaces.Discrete(2)  # 0: do nothing, 1: flap
        self.observation_space = spaces.Box(low=0, high=255, shape=(4,), dtype=np.float32)
    def reset(self):
        self.bird_velocity = 0
        self.bird_position = 50
        self.pipes = [{'x': 70, 'gap_pos': random.randint(20, 80)}]
        self.frames_since_last_pipe = 0
        self.score = 0
        return self._get_state()

    def step(self, action):
        # Apply action (flap or do nothing)
        if action == 1:
            self.bird_velocity = -10
        # Update bird position
        self.bird_velocity += self.gravity
        self.bird_position += self.bird_velocity
        # Spawn new pipes
        self.frames_since_last_pipe += 1
        if self.frames_since_last_pipe >= self.pipe_spawn_freq:
            self.pipes.append({'x': self.screen_width, 'gap_pos': random.randint(20, 80)})
            self.frames_since_last_pipe = 0
        # Move pipes
        for pipe in self.pipes:
            pipe['x'] -= self.pipe_velocity
        # Remove pipes that are off-screen
        self.pipes = [pipe for pipe in self.pipes if pipe['x'] + self.pipe_width > 0]
        # Check if bird has passed a pipe
        for pipe in self.pipes:
            if self.screen_width // 5 == pipe['x'] + self.pipe_width:
                self.score += 1
        # Check for collisions
        done = False
        reward = 0.1  # Default small positive reward
        # Bird hits the ground or ceiling
        if self.bird_position <= 0 or self.bird_position >= self.screen_height:
            done = True
            reward = -10
        else:
            # Check for pipe collisions
            for pipe in self.pipes:
                if (self.screen_width // 5 >= pipe['x'] and
                        self.screen_width // 5 <= pipe['x'] + self.pipe_width):
                    if (self.bird_position <= pipe['gap_pos'] - self.pipe_gap // 2 or
                            self.bird_position >= pipe['gap_pos'] + self.pipe_gap // 2):
                        done = True
                        reward = -10
                        break
        return self._get_state(), reward, done, {'score': self.score}
    def _get_state(self):
        # Get the nearest pipe
        nearest_pipe = None
        nearest_distance = float('inf')
        for pipe in self.pipes:
            if pipe['x'] + self.pipe_width >= self.screen_width // 5:
                distance = pipe['x'] - self.screen_width // 5
                if distance < nearest_distance:
                    nearest_distance = distance
                    nearest_pipe = pipe
        if nearest_pipe is None:
            # If no pipe ahead, use default values
            horizontal_distance = self.screen_width
            gap_pos = self.screen_height // 2
        else:
            horizontal_distance = nearest_pipe['x'] - self.screen_width // 5
            gap_pos = nearest_pipe['gap_pos']
        # Normalized state:
        # [bird_y, bird_velocity, distance_to_pipe, center_of_gap]
        state = [
            self.bird_position / self.screen_height,
            self.bird_velocity / 10,
            horizontal_distance / self.screen_width,
            gap_pos / self.screen_height if nearest_pipe else 0.5
        ]
        return np.array(state, dtype=np.float32)
    def render(self):
        # Create a simple visualization using matplotlib
        plt.figure(figsize=(5, 5))
        plt.xlim(0, self.screen_width)
        plt.ylim(0, self.screen_height)
        # Draw pipes
        for pipe in self.pipes:
            # Top pipe
            plt.fill_between([pipe['x'], pipe['x'] + self.pipe_width],
                             [0, 0],
                             [pipe['gap_pos'] - self.pipe_gap // 2, pipe['gap_pos'] - self.pipe_gap // 2],
                             color='green')
            # Bottom pipe
            plt.fill_between([pipe['x'], pipe['x'] + self.pipe_width],
                             [pipe['gap_pos'] + self.pipe_gap // 2, pipe['gap_pos'] + self.pipe_gap // 2],
                             [self.screen_height, self.screen_height],
                             color='green')
        # Draw bird
        plt.scatter(self.screen_width // 5, self.bird_position, color='yellow', s=100)
        # Add score
        plt.text(5, 95, f'Score: {self.score}', fontsize=12)
        plt.title('Flappy Bird')
        plt.axis('off')
        # Display in Colab
        display(plt.gcf())
        plt.close()
# Define DQN Agent
class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95    # discount rate
        self.epsilon = 1.0   # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()
        self.target_model = self._build_model()
        self.update_target_model()

    def _build_model(self):
        # Neural Net for Deep-Q learning Model - simplified for stability
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
        return model

    def update_target_model(self):
        # copy weights from model to target_model
        self.target_model.set_weights(self.model.get_weights())

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        state_tensor = tf.convert_to_tensor(state.reshape(1, -1), dtype=tf.float32)
        act_values = self.model(state_tensor, training=False)
        return np.argmax(act_values[0])
    def replay(self, batch_size):
        if len(self.memory) < batch_size:
            return
        minibatch = random.sample(self.memory, batch_size)
        # Unpack the batch (cast to TF-friendly dtypes)
        states = np.array([transition[0] for transition in minibatch])
        actions = np.array([transition[1] for transition in minibatch], dtype=np.int32)
        rewards = np.array([transition[2] for transition in minibatch], dtype=np.float32)
        next_states = np.array([transition[3] for transition in minibatch])
        dones = np.array([transition[4] for transition in minibatch], dtype=np.float32)
        # Get current states and predict Q values
        states_tensor = tf.convert_to_tensor(states, dtype=tf.float32)
        with tf.GradientTape() as tape:
            q_values = self.model(states_tensor, training=True)
            # Select the Q values for the actions that were taken
            indices = tf.range(0, tf.shape(q_values)[0]) * tf.shape(q_values)[1] + actions
            selected_q_values = tf.gather(tf.reshape(q_values, [-1]), indices)
            # Get Q values for next states with target model
            next_states_tensor = tf.convert_to_tensor(next_states, dtype=tf.float32)
            next_q_values = self.target_model(next_states_tensor, training=False)
            # Calculate targets
            max_next_q_values = tf.reduce_max(next_q_values, axis=1)
            targets = rewards + (1 - dones) * self.gamma * max_next_q_values
            # Calculate loss
            loss = tf.keras.losses.mse(selected_q_values, targets)
        # Get gradients and update model
        grads = tape.gradient(loss, self.model.trainable_variables)
        self.model.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))
        # Decay epsilon
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    # Custom save method to avoid keras restrictions
    def save_model(self, filepath):
        self.model.save_weights(filepath)

    # Custom load method
    def load_model(self, filepath):
        self.model.load_weights(filepath)
# Training function
def train_dqn(episodes=100):
    env = FlappyBirdEnv()
    state_size = env.observation_space.shape[0]
    action_size = env.action_space.n
    agent = DQNAgent(state_size, action_size)
    batch_size = 32
    # Keep track of scores
    scores = []
    print("Starting training for", episodes, "episodes")
    for e in range(episodes):
        state = env.reset()
        total_reward = 0
        done = False
        step = 0
        while not done:
            step += 1
            # Choose action
            action = agent.act(state)
            # Take action
            next_state, reward, done, info = env.step(action)
            total_reward += reward
            # Remember the experience
            agent.remember(state, action, reward, next_state, done)
            state = next_state
            # Train through replay
            if len(agent.memory) > batch_size:
                agent.replay(batch_size)
            # Update target model occasionally
            if step % 10 == 0:
                agent.update_target_model()
            # Visualize occasionally - reduced frequency for better performance
            if e % 50 == 0 and step % 50 == 0:
                clear_output(wait=True)
                env.render()
                time.sleep(0.01)
        scores.append(info['score'])
        # Print episode stats
        avg_score = np.mean(scores[-100:]) if len(scores) >= 100 else np.mean(scores)
        print(f"Episode: {e+1}/{episodes}, Score: {info['score']}, "
              f"Avg Score: {avg_score:.2f}, Epsilon: {agent.epsilon:.3f}")
        # Save model weights occasionally
        if (e+1) % 50 == 0:
            print(f"Saving model at episode {e+1}")
            model_path = f"flappy_bird_model_ep{e+1}"
            try:
                agent.save_model(model_path)
                print(f"Successfully saved model to {model_path}")
            except Exception as ex:
                print(f"Failed to save model: {ex}")
                # Continue without saving
    # Plot learning curve
    plt.figure(figsize=(10, 6))
    plt.plot(scores)
    plt.title('Learning Curve')
    plt.xlabel('Episode')
    plt.ylabel('Score')
    plt.show()
    return agent, scores
# Function to watch trained agent play
def watch_agent_play(agent, episodes=3):
    env = FlappyBirdEnv()
    for e in range(episodes):
        state = env.reset()
        done = False
        step = 0
        while not done and step < 1000:  # Add step limit as a safeguard
            step += 1
            clear_output(wait=True)
            env.render()
            # Agent chooses action with no exploration
            state_tensor = tf.convert_to_tensor(state.reshape(1, -1), dtype=tf.float32)
            action = np.argmax(agent.model(state_tensor, training=False)[0])
            # Take action
            state, reward, done, info = env.step(action)
            time.sleep(0.05)  # Slow down for better visualization
        print(f"Episode {e+1}: Score = {info['score']}")

# Run the training
print("Starting Flappy Bird DQN training (Final fixed version for Colab)")
# Use try-except to handle potential errors gracefully
try:
    # Train agent with fewer episodes for testing - reduce to 100 for quicker results
    agent, scores = train_dqn(episodes=100)
    # Watch the trained agent play
    watch_agent_play(agent, episodes=3)
except Exception as e:
    print(f"An error occurred during execution: {e}")
Output: (Simulation Screen Shots)
Result: Using Deep Q-Networks (DQN), we successfully trained an agent to play Flappy Bird.
The neural network learned to estimate Q-values and make decisions from the state observations,
and the trained model enables the agent to flap at the right time to maximize its survival time
and score.