Exp No: 5 Flappy Bird
Date: 27.03.2025
Aim: To train an AI agent to play Flappy Bird using Deep Q-Networks (DQN).
Objective: Flappy Bird is a game where the player controls a bird that must navigate through
gaps between pipes without hitting them. The goal is to maximize the agent's survival time
and score using reinforcement learning with a deep Q-network (DQN).
Simulation Tool: The Flappy Bird environment is implemented using gym or gymnasium,
with pygame (or matplotlib, as in the program below) for visualization.
Action Space: Discrete(2)
Observation Space: Continuous (processed via a CNN in the DQN)
Import: gymnasium.make("FlappyBird-v0")
Description: The game starts with the bird in the air, where it continuously falls due to
gravity. The player (agent) can either flap (jump) or do nothing. The objective is to pass
through as many pipes as possible without colliding.
Algorithm:
1. Initialize the deep Q-network with random weights.
2. Set hyperparameters such as the learning rate (α), discount factor (γ), exploration rate (ε), and
replay memory size.
3. For each episode:
Start at the initial state.
Choose an action using an ε-greedy policy.
Perform the action and observe the next state, reward, and done status.
Store the experience (state, action, reward, next state, done) in a replay buffer.
Sample a mini-batch from the replay buffer.
Compute the target Q-value: target = reward + γ · max over a′ of Q(next_state, a′); for terminal transitions the target is just the reward (see the sketch after this list).
Update the Q-network using backpropagation.
Reduce ε over time.
Repeat until the game is over.
4. Train the DQN for multiple episodes until convergence.
5. Use the trained model to find the optimal policy.
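As a minimal sketch of the target computation in step 3 (the variable names and numbers here are illustrative, not taken from the program below):

import numpy as np

gamma = 0.95  # discount factor, matching the agent defined in the program below

# Hypothetical Q-values predicted by the target network for the next state
next_q_values = np.array([0.8, 1.4])   # Q(s', do nothing), Q(s', flap)

reward, done = 0.1, False              # one non-terminal transition
target = reward + (0.0 if done else gamma * np.max(next_q_values))
print(target)  # ≈ 1.43  (0.1 + 0.95 * 1.4)

For terminal transitions (done = True) the bootstrap term is dropped, which is exactly what the (1 - dones) factor does in the replay() method of the program below.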
Action Space: The action shape is (1,) in the range {0, 1}, indicating whether the bird should
flap or not.
0: Do nothing (bird falls)
1: Flap (bird jumps up)
Observation Space:
The observation can consist of raw pixel frames processed with convolutional neural networks
(CNNs), or of a compact feature vector (as in the program below). The input state includes:
Stacked frames for temporal information
Bird’s vertical position
Bird’s velocity
Distance to next pipe
Height of next pipe
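In the program below the observation is simplified to a four-element normalized feature vector rather than raw pixel frames; a minimal sketch of how such a state vector might look (the numeric values are illustrative):

import numpy as np

screen_width, screen_height = 100, 100     # match the custom environment below
bird_y, bird_velocity = 50, -4             # illustrative values
pipe_distance, gap_center = 48, 60

state = np.array([
    bird_y / screen_height,          # vertical position, scaled to [0, 1]
    bird_velocity / 10,              # scaled velocity
    pipe_distance / screen_width,    # horizontal distance to the next pipe
    gap_center / screen_height,      # centre of the next gap
], dtype=np.float32)
print(state)  # [ 0.5  -0.4   0.48  0.6 ]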
Rewards: The reward schedule is:
Successfully passing a pipe: +1
Collision with ground or pipe: -1 (Game Over)
Starting State:
The episode starts with the bird at an initial position with a downward velocity.
Episode End: The episode ends if any of the following happens:
The bird collides with a pipe.
The bird collides with the ground.
The bird flies too high (in some implementations).
Arguments
The render_mode argument enables visualization, and sutton_barto_reward modifies the
reward structure to match the original implementation.
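For example, with the third-party flappy-bird-gymnasium package (an assumption; the program below defines its own custom environment instead), the environment could be created roughly as follows:

import gymnasium
import flappy_bird_gymnasium  # registers FlappyBird-v0 (pip install flappy-bird-gymnasium)

env = gymnasium.make("FlappyBird-v0", render_mode="human")
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()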
Program:
import os
import random
import numpy as np
import matplotlib.pyplot as plt
from collections import deque
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
import gym
from gym import spaces
from IPython.display import clear_output, display
import time
# Suppress TensorFlow warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# Force CPU usage instead of GPU to avoid compatibility issues
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
print("TensorFlow version:", tf.__version__)
print("Running with device:", tf.config.list_physical_devices())
# Define the Flappy Bird environment
class FlappyBirdEnv(gym.Env):
    def __init__(self):
        super(FlappyBirdEnv, self).__init__()
        # Environment parameters
        self.gravity = 1
        self.bird_velocity = 0
        self.bird_position = 50
        self.pipe_gap = 40
        self.pipe_width = 10
        self.pipe_velocity = 2
        self.pipes = []
        self.screen_width = 100
        self.screen_height = 100
        self.pipe_spawn_freq = 50
        self.frames_since_last_pipe = 0
        self.score = 0
        # Define action and observation space
        self.action_space = spaces.Discrete(2)  # 0: do nothing, 1: flap
        self.observation_space = spaces.Box(low=0, high=255, shape=(4,), dtype=np.float32)
    def reset(self):
        self.bird_velocity = 0
        self.bird_position = 50
        self.pipes = [{'x': 70, 'gap_pos': random.randint(20, 80)}]
        self.frames_since_last_pipe = 0
        self.score = 0
        return self._get_state()

    def step(self, action):
        # Apply action (flap or do nothing)
        if action == 1:
            self.bird_velocity = -10
        # Update bird position
        self.bird_velocity += self.gravity
        self.bird_position += self.bird_velocity
        # Spawn new pipes
        self.frames_since_last_pipe += 1
        if self.frames_since_last_pipe >= self.pipe_spawn_freq:
            self.pipes.append({'x': self.screen_width, 'gap_pos': random.randint(20, 80)})
            self.frames_since_last_pipe = 0
        # Move pipes
        for pipe in self.pipes:
            pipe['x'] -= self.pipe_velocity
        # Remove pipes that are off-screen
        self.pipes = [pipe for pipe in self.pipes if pipe['x'] + self.pipe_width > 0]
        # Check if bird has passed a pipe
        for pipe in self.pipes:
            if self.screen_width // 5 == pipe['x'] + self.pipe_width:
                self.score += 1
        # Check for collisions
        done = False
        reward = 0.1  # Default small positive reward
        # Bird hits the ground or ceiling
        if self.bird_position <= 0 or self.bird_position >= self.screen_height:
            done = True
            reward = -10
        else:
            # Check for pipe collisions
            for pipe in self.pipes:
                if (self.screen_width // 5 >= pipe['x'] and
                        self.screen_width // 5 <= pipe['x'] + self.pipe_width):
                    if (self.bird_position <= pipe['gap_pos'] - self.pipe_gap // 2 or
                            self.bird_position >= pipe['gap_pos'] + self.pipe_gap // 2):
                        done = True
                        reward = -10
                        break
        return self._get_state(), reward, done, {'score': self.score}
    def _get_state(self):
        # Get the nearest pipe
        nearest_pipe = None
        nearest_distance = float('inf')
        for pipe in self.pipes:
            if pipe['x'] + self.pipe_width >= self.screen_width // 5:
                distance = pipe['x'] - self.screen_width // 5
                if distance < nearest_distance:
                    nearest_distance = distance
                    nearest_pipe = pipe
        if nearest_pipe is None:
            # If no pipe ahead, use default values
            horizontal_distance = self.screen_width
            gap_pos = self.screen_height // 2
        else:
            horizontal_distance = nearest_pipe['x'] - self.screen_width // 5
            gap_pos = nearest_pipe['gap_pos']
        # Normalized state:
        # [bird_y, bird_velocity, distance_to_pipe, center_of_gap]
        state = [
            self.bird_position / self.screen_height,
            self.bird_velocity / 10,
            horizontal_distance / self.screen_width,
            gap_pos / self.screen_height if nearest_pipe else 0.5
        ]
        return np.array(state, dtype=np.float32)
    def render(self):
        # Create a simple visualization using matplotlib
        plt.figure(figsize=(5, 5))
        plt.xlim(0, self.screen_width)
        plt.ylim(0, self.screen_height)
        # Draw pipes
        for pipe in self.pipes:
            # Top pipe
            plt.fill_between([pipe['x'], pipe['x'] + self.pipe_width],
                             [0, 0],
                             [pipe['gap_pos'] - self.pipe_gap // 2, pipe['gap_pos'] - self.pipe_gap // 2],
                             color='green')
            # Bottom pipe
            plt.fill_between([pipe['x'], pipe['x'] + self.pipe_width],
                             [pipe['gap_pos'] + self.pipe_gap // 2, pipe['gap_pos'] + self.pipe_gap // 2],
                             [self.screen_height, self.screen_height],
                             color='green')
        # Draw bird
        plt.scatter(self.screen_width // 5, self.bird_position, color='yellow', s=100)
        # Add score
        plt.text(5, 95, f'Score: {self.score}', fontsize=12)
        plt.title('Flappy Bird')
        plt.axis('off')
        # Display in Colab
        display(plt.gcf())
        plt.close()
# Define DQN Agent
class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95    # discount rate
        self.epsilon = 1.0   # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()
        self.target_model = self._build_model()
        self.update_target_model()

    def _build_model(self):
        # Neural Net for Deep-Q learning Model - simplified for stability
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
        return model

    def update_target_model(self):
        # copy weights from model to target_model
        self.target_model.set_weights(self.model.get_weights())

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        state_tensor = tf.convert_to_tensor(state.reshape(1, -1), dtype=tf.float32)
        act_values = self.model(state_tensor, training=False)
        return np.argmax(act_values[0])
    def replay(self, batch_size):
        if len(self.memory) < batch_size:
            return
        minibatch = random.sample(self.memory, batch_size)
        # Unpack the batch (cast to TF-friendly dtypes)
        states = np.array([transition[0] for transition in minibatch])
        actions = np.array([transition[1] for transition in minibatch], dtype=np.int32)
        rewards = np.array([transition[2] for transition in minibatch], dtype=np.float32)
        next_states = np.array([transition[3] for transition in minibatch])
        dones = np.array([transition[4] for transition in minibatch], dtype=np.float32)
        # Get current states and predict Q values
        states_tensor = tf.convert_to_tensor(states, dtype=tf.float32)
        with tf.GradientTape() as tape:
            q_values = self.model(states_tensor, training=True)
            # Select the Q values for the actions that were taken
            indices = tf.range(0, tf.shape(q_values)[0]) * tf.shape(q_values)[1] + actions
            selected_q_values = tf.gather(tf.reshape(q_values, [-1]), indices)
            # Get Q values for next states with target model
            next_states_tensor = tf.convert_to_tensor(next_states, dtype=tf.float32)
            next_q_values = self.target_model(next_states_tensor, training=False)
            # Calculate targets
            max_next_q_values = tf.reduce_max(next_q_values, axis=1)
            targets = rewards + (1 - dones) * self.gamma * max_next_q_values
            # Calculate loss
            loss = tf.keras.losses.mse(selected_q_values, targets)
        # Get gradients and update model
        grads = tape.gradient(loss, self.model.trainable_variables)
        self.model.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))
        # Decay epsilon
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    # Custom save method to avoid keras restrictions
    def save_model(self, filepath):
        self.model.save_weights(filepath)

    # Custom load method
    def load_model(self, filepath):
        self.model.load_weights(filepath)
# Training function
def train_dqn(episodes=100):
    env = FlappyBirdEnv()
    state_size = env.observation_space.shape[0]
    action_size = env.action_space.n
    agent = DQNAgent(state_size, action_size)
    batch_size = 32
    # Keep track of scores
    scores = []
    print("Starting training for", episodes, "episodes")
    for e in range(episodes):
        state = env.reset()
        total_reward = 0
        done = False
        step = 0
        while not done:
            step += 1
            # Choose action
            action = agent.act(state)
            # Take action
            next_state, reward, done, info = env.step(action)
            total_reward += reward
            # Remember the experience
            agent.remember(state, action, reward, next_state, done)
            state = next_state
            # Train through replay
            if len(agent.memory) > batch_size:
                agent.replay(batch_size)
            # Update target model occasionally
            if step % 10 == 0:
                agent.update_target_model()
            # Visualize occasionally - reduced frequency for better performance
            if e % 50 == 0 and step % 50 == 0:
                clear_output(wait=True)
                env.render()
                time.sleep(0.01)
        scores.append(info['score'])
        # Print episode stats
        avg_score = np.mean(scores[-100:]) if len(scores) >= 100 else np.mean(scores)
        print(f"Episode: {e+1}/{episodes}, Score: {info['score']}, "
              f"Avg Score: {avg_score:.2f}, Epsilon: {agent.epsilon:.3f}")
        # Save model weights occasionally
        if (e+1) % 50 == 0:
            print(f"Saving model at episode {e+1}")
            model_path = f"flappy_bird_model_ep{e+1}"
            try:
                agent.save_model(model_path)
                print(f"Successfully saved model to {model_path}")
            except Exception as ex:
                print(f"Failed to save model: {ex}")
                # Continue without saving
    # Plot learning curve
    plt.figure(figsize=(10, 6))
    plt.plot(scores)
    plt.title('Learning Curve')
    plt.xlabel('Episode')
    plt.ylabel('Score')
    plt.show()
    return agent, scores
# Function to watch trained agent play
def watch_agent_play(agent, episodes=3):
    env = FlappyBirdEnv()
    for e in range(episodes):
        state = env.reset()
        done = False
        step = 0
        while not done and step < 1000:  # Add step limit as a safeguard
            step += 1
            clear_output(wait=True)
            env.render()
            # Agent chooses action with no exploration
            state_tensor = tf.convert_to_tensor(state.reshape(1, -1), dtype=tf.float32)
            action = np.argmax(agent.model(state_tensor, training=False)[0])
            # Take action
            state, reward, done, info = env.step(action)
            time.sleep(0.05)  # Slow down for better visualization
        print(f"Episode {e+1}: Score = {info['score']}")

# Run the training
print("Starting Flappy Bird DQN training (Final fixed version for Colab)")
# Use try-except to handle potential errors gracefully
try:
    # Train agent with fewer episodes for testing - reduce to 100 for quicker results
    agent, scores = train_dqn(episodes=100)
    # Watch the trained agent play
    watch_agent_play(agent, episodes=3)
except Exception as e:
    print(f"An error occurred during execution: {e}")
Output: (Simulation Screen Shots)
Result: Using Deep Q-Networks (DQN), we successfully trained an agent to play Flappy Bird.
The neural network learned to estimate Q-values and make decisions from the state observations,
and the trained model enables the agent to flap at the right time to maximize its survival time
and score.