Professional Documents
Culture Documents
Version History
0.1 02/10/21 First version.
Summary
Title: Reinforcement Learning using OpenAI Gym environments
Hand-in: Programs AND a written report will need to be submitted online via
Moodle. Check the module’s Moodle page for the precise deadline.
Late policy: The coursework deadlines are absolute. Late submissions are
subject to a 5% deduction of the overall coursework mark per day.
Informal Description
The coursework consists of two tasks as described below. Your aim is to build
several reinforcement learning agents and to design, implement and run several
basic research-based experiments. You will hand-in software and a report that
discusses your work on these tasks. It is recommended that you start practicing
with several heuristics in task 1 and quickly switch over to reinforcement
learning. Heuristic based approaches can help give you a sense of the nature
and difficulty of each problem, however, in the long run, you are much more
likely to get high performing agents using reinforcement learning. Please consult
the table below for details.
Laboratory notes
You will work individually.
We need to start working hard from the very first day to make the most of
the lab sessions. In the first week you will learn the basics of Open AI Gym,
will experiment with several environments, and will even try some small
heuristics on simple control problems (e.g. cartpole).
Getting Started
Installation
Go to https://gym.openai.com/docs/
import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
env.render()
env.step(env.action_space.sample()) # take a random action
env.close()
If you want to play with the MuJoCo environments you can, but on a
personal basis. In other words, I am not setting any of the questions based
on MuJoCo. When students use MuJoCo on a personal basis, they can get a
free license for this purpose. Get a MuJoCo student license here.
In order to use the Box2D environments first do: pip install box2d-py
Test bipedal_walker-v3.py
Try to come up with some simple heuristics to keep the pole up based on
your understanding of the environment.
Example: s1cart-pole-v0-sol1.
Can you modify the algorithm so that it learns faster? Recall that if you
use any solutions from the web, you need to cite them, and you need to
state explicitly which parts you have re-used and which parts you have
innovated yourself.
Task Description
Task 1:
Task 2:
◦ Goals. Keywords: novel experiments and insights. The aim of this task
is for you to carry out some simple research on your reinforcement
learning agents. The main sub-tasks are to: (1) formulate one or more
research questions, (2) implement experiments to address these
questions, (3) run the experiments and collect the results, (4) analyze
and interpret the results, (5) document your findings.
◦ Evaluation. Run your experiments and report your results for all of the
chosen environments consistently.
◦ Demo. Show and explain the performance of your solutions, and the
results of your experiments.
Performance Evaluation
To keep evaluation and comparisons simple, and consistent with
leaderboards elsewhere, your main performance measure should be the
number of episodes required before solving the problem. In other words,
we are interested in the speed of learning. Care must be taken in being
explicit and consistent regarding what constitutes having solved the
problem. As a good reference point, you should compare with existing
leaderboards, e.g.: https://github.com/openai/gym/wiki/Leaderboard. You
are of course, allowed to use and investigate additional performance
measures if you so wish, however the “number of episodes before solving
the problem” measure is compulsory.
Assessment – Overall
Marks
Component Description Main Criteria
(50)
Evidence of understanding of the base code. Evidence of
One demo for innovation. Strong justifications and arguments. Clear
Demo 10
each task. communication.
[Demo 1: 5 marks; Demo 2: 5 marks.]
Multiple files
organized with Is the code complete? Is the code well-designed, clean, elegant,
Software 10
a clear and well commented? Is the code complex/challenging enough?
structure.
Mini-
conference
paper Are the structure, grammar and argumentation of the
paper/report good? Are the introduction, background, methods,
Report 30 summarizing results and analyses, clear, comprehensive and insightful? Does
all of the work the paper show critical and creative thinking?
done on both
tasks.
Assessment Criteria: Report/Paper
1st an excellent, well-written paper demonstrating extensive
understanding and good insight.
2:1 a comprehensive, well-written paper demonstrating thorough
understanding and some insight.
2:2 a competent report/paper demonstrating good understanding of the
implementation.
3rd an adequate report/paper covering all specified topics at a basic level
of understanding.
F an inadequate report/paper failing to cover the specified topics.
Paper Guide
The paper should contain:
[5 marks] Introduction (about 1 page). An introduction containing an
introduction to the main concepts, problem statement, overview of
background, and your main contributions.
[5 marks] Background (about 0.5 pages). A brief overview of the field and
the key papers closely related to your work.
[6 marks] Methods (about 1 page). A detailed and concise description of
how you implemented each task (e.g. algorithms and experimental
design).
[6 marks] Results (about 1 page). An overview of your key results
encompassing performance measures and other results leading to insights
about the problem and/or your solutions.
[5 marks] Discussion (about 0.5 pages). Your interpretation of the results,
your conclusions, and proposed future work.
[3 marks] References & Appendices (not included in the word count).
Note: Writing a concise report/paper is a core part of the assignment. The total
number of pages for your paper (i.e. main sections, excluding references and
Appendices) cannot exceed 4 pages (with a minimum page margin of 2.5cm on
each side), using single line spacing, a two-column format, and a minimum font
size of 12 in Times New Roman font).