

IP505245 Applied AI and Control


Assignment 3, Windy gridworld
Out: 26.10.2022
In: 16.11.2022

Consider a 7×10 gridworld with a goal state G, as shown in the figure below. The agent starts in a random
initial state within the gridworld and can move in one of four directions per step. A crosswind runs
upward through the grid cells. The numbers under the gridworld (in red) indicate the strength of the wind,
which pushes the agent the corresponding number of cells upward. For example, if the agent is in the cell
to the right of the goal, the action “move left” takes it to the cell just above the goal.
Regarding the reward:
• Actions that take the agent off the grid will receive a reward of -100.
• Actions that take the agent to the goal state will receive a reward of 100.
• Otherwise, the agent will receive a constant reward of -1 per time step.
Episode termination:
• Actions that take the agent off the grid will terminate the episode.
• Actions that take the agent to the goal state will terminate the episode.
• The episode also terminates when the time step exceeds 20.

[Figure: 7×10 gridworld with the goal state G and the four movement actions; the wind strength of each column is printed in red below the grid.]

Column i:       0 1 2 3 4 5 6 7 8 9
Wind strength:  0 0 0 1 2 0 1 1 1 0

1) Implement the windy gridworld scenario as a Gym environment
• Implement the step, reset, and render functions.
• Create a script to test the developed environment, in which the agent performs a random walk in the
gridworld until episode termination (a minimal sketch follows below).
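A minimal sketch of such an environment, written against the classic Gym step API (four-tuple return from step). The goal cell GOAL, the coordinate convention ([i, j] = column, row with row 0 at the bottom), and the rule that the wind of the agent's current column is applied after each move are assumptions made here for illustration; set them to match the figure and the lecture's convention.

import numpy as np
import gym
from gym import spaces


class WindyGridworldEnv(gym.Env):
    """Windy gridworld: 10 columns (i) x 7 rows (j), wind pushes the agent upward."""

    WIND = [0, 0, 0, 1, 2, 0, 1, 1, 1, 0]       # wind strength per column i (from the figure)
    GOAL = np.array([7, 3])                      # ASSUMED goal cell [i, j]; read off the figure
    MOVES = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}  # up, down, left, right

    def __init__(self):
        self.observation_space = spaces.MultiDiscrete([10, 7])
        self.action_space = spaces.Discrete(4)
        self.state = None
        self.t = 0

    def reset(self, initial_state=None):
        # Random initial state unless a specific one is given (used in question 2).
        if initial_state is None:
            initial_state = [np.random.randint(10), np.random.randint(7)]
        self.state = np.array(initial_state)
        self.t = 0
        return self.state.copy()

    def step(self, action):
        di, dj = self.MOVES[action]
        wind = self.WIND[self.state[0]]          # assumed: wind of the current column applies
        new_state = self.state + np.array([di, dj + wind])
        self.t += 1

        if not (0 <= new_state[0] < 10 and 0 <= new_state[1] < 7):
            return new_state, -100, True, {}     # off the grid: reward -100, terminate
        self.state = new_state
        if np.array_equal(self.state, self.GOAL):
            return self.state.copy(), 100, True, {}   # goal reached: reward 100, terminate
        return self.state.copy(), -1, self.t >= 20, {}  # otherwise -1 per step, stop after 20 steps

    def render(self, mode="human"):
        grid = [["." for _ in range(10)] for _ in range(7)]
        grid[self.GOAL[1]][self.GOAL[0]] = "G"
        grid[self.state[1]][self.state[0]] = "A"
        print("\n".join(" ".join(row) for row in reversed(grid)))  # row 0 printed at the bottom


if __name__ == "__main__":
    # Random-walk test script: sample actions until the episode terminates.
    env = WindyGridworldEnv()
    obs, done = env.reset(), False
    env.render()
    while not done:
        obs, reward, done, _ = env.step(env.action_space.sample())
        print("reward:", reward)

The random-walk loop above doubles as the test script asked for in the second bullet; any specific start cell can be passed to reset for debugging.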
2) Use dynamic programming (p. 42 in Lecture 8) or the Q-learning method to
• Generate an optimal policy, so that the agent can reach the goal state (a Q-learning sketch follows
after this question).
• Test two initial cases using the generated policy:
i. Initial state [i, j] = [1, 1]
ii. Initial state [i, j] = [2, 5]
• (Optional) Assume the cells directly above and below the goal state are obstacles. Actions that take
the agent into these obstacle cells terminate the episode with a reward of -100. Redo question 2 to
find the optimal policy.
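The handout allows either dynamic programming or Q-learning; the following is a minimal tabular Q-learning sketch built on the environment class sketched under question 1. The hyperparameters (episodes, alpha, gamma, epsilon) are assumed values, not given in the handout, and terminations at the 20-step limit are treated as terminal for simplicity. The rollout at the end tests the greedy policy from the two given initial states.

import numpy as np


def q_learning(env, episodes=5000, alpha=0.5, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration."""
    Q = np.zeros((10, 7, 4))                      # Q[i, j, action]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            i, j = state
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[i, j]))
            next_state, reward, done, _ = env.step(action)
            # Bootstrap only from non-terminal (in-grid) successor states.
            target = reward
            if not done:
                target += gamma * np.max(Q[next_state[0], next_state[1]])
            Q[i, j, action] += alpha * (target - Q[i, j, action])
            state = next_state
    return np.argmax(Q, axis=-1)                  # greedy action per cell


if __name__ == "__main__":
    env = WindyGridworldEnv()                     # environment sketch from question 1
    policy = q_learning(env)
    # Roll out the greedy policy from the two given initial states.
    for start in ([1, 1], [2, 5]):
        state, done, ret = env.reset(start), False, 0
        while not done:
            state, reward, done, _ = env.step(int(policy[state[0], state[1]]))
            ret += reward
        print("start", start, "return", ret)

For the optional part, the same sketch applies once the environment's step function additionally terminates with a reward of -100 when the agent enters one of the two obstacle cells.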
