
SOS – Mid Term Report

Topic : Reinforcement Learning


Mentee : Vora Jay Bhaveshbhai
Roll no. : 21d100023
Mentor : Rushi Chavda
Reinforcement Learning:
Agent: The actor operating within the environment.
Environment: The world within which the agent operates.
Action: Whatever the agent does in the environment.
Reward and observations: The feedback the agent receives, depending
on how the environment looks after an action is performed.

The most important feature distinguishing reinforcement
learning from other types of learning is that it uses training
information that evaluates the actions taken rather than instructs by
giving correct actions. This is what creates the need for active
exploration, for an explicit trial-and-error search for good behaviour.
Purely evaluative feedback indicates how good the action taken is,
but not whether it is the best or the worst action possible.
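
To make the trial-and-error idea concrete, below is a minimal sketch of
epsilon-greedy action selection on a toy two-armed bandit. The reward
probabilities and the epsilon value are made-up illustrative numbers, not
part of any library.

import random

true_reward_prob = [0.3, 0.7]  # hypothetical reward chance of each action
estimates = [0.0, 0.0]         # running estimate of each action's value
counts = [0, 0]                # how often each action was tried
epsilon = 0.1                  # fraction of steps spent exploring at random

for step in range(1000):
    if random.random() < epsilon:
        action = random.randrange(2)              # explore: random action
    else:
        action = estimates.index(max(estimates))  # exploit: best so far
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    counts[action] += 1
    # Incremental average: the feedback only evaluates the chosen action,
    # it never says which action would have been correct
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # the estimate for action 1 should approach 0.7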

Installation to start Reinforcement Learning:

Use the following commands in Windows PowerShell to install the
libraries for RL (Reinforcement Learning). I am not going into the details
of installing Python and PyCharm.
pip install stable-baselines3[extra]
pip install pyglet
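
To quickly confirm the installation worked, here is a small check (just a
sketch; the printed version numbers will depend on your machine):

import stable_baselines3
import gym

# Printing the versions confirms both libraries import cleanly
print(stable_baselines3.__version__)
print(gym.__version__)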

Cart Pole Problem:

A pole is attached by an un-actuated joint to a cart, which
moves along a frictionless 1-D track. The pole is placed upright on the
cart and the goal is to balance the pole by applying forces in the left
and right direction on the cart.
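
To see these two forces as the environment exposes them, one can inspect
the environment's action and observation spaces; this short sketch uses
the standard gym API:

import gym

env = gym.make("CartPole-v0")
print(env.action_space)       # Discrete(2): 0 pushes left, 1 pushes right
print(env.observation_space)  # Box(4,): position, velocity, angle, angular velocity
env.close()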

Guide for reading further:

All text in red is code and the rest (in black) is comment. I
have written it in such a way that, excluding the first paragraph under
each topic, the whole code can be copied into an editor. Comments are
used to make the code understandable and also to explain the work I
have done till now.

Importing libraries:
The libraries below will be used to complete the program.

import os
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy

Instantiate the environment:

I am using OpenAI Gym (the gym library) for setting up the
environment.

environment_name = "CartPole-v0"
env = gym.make(environment_name)
# loads the environment with the help of the gym library

episodes = 5
# runs the environment (in other words, plays the game) 'episodes'
# (here 5) number of times
# CartPole-v0 has a maximum episode length of 200 steps
for episode in range(1, episodes + 1):
    state = env.reset()
    # resets the environment and obtains the initial observation
    done = False
    # episode is not done yet
    score = 0
    # score for each iteration (episode)

    while not done:
        # episode not done, so keep playing it
        env.render()
        # shows the episode on screen
        action = env.action_space.sample()
        # gives a random input to the agent (for learning)
        '''
        "env.action_space.sample()" returns one of two values, namely
        0 - pushes the cart to the left
        1 - pushes the cart to the right
        '''
        n_state, reward, done, info = env.step(action)
        '''
        env.step(action) returns four values, namely:
        "n_state" - an array [Cart Position, Cart Velocity,
        Pole Angle, Pole Angular Velocity]. The episode terminates
        if the cart leaves the (-2.4, 2.4) range or the pole angle
        leaves the range (-0.2095, 0.2095) (i.e. ±12°).
        "reward" - 1.0 for every step the episode keeps running.
        "done" - tells whether the episode is done; becomes True
        when the episode terminates.
        "info" - additional info for debugging if required.
        '''
        score += reward
        # keeps track of the total score
    print('Episode:{} Score:{}'.format(episode, score))
    # prints the episode number and score
env.close()
# closes the environment

Training:

For training the RL model, we first need to make folders in the
directory where the Python program is saved. I have made the folders
"Training -> Logs".

log_path = os.path.join('Training', 'Logs')

env = gym.make(environment_name)
# instantiates the environment
env = DummyVecEnv([lambda: env])
# lambda here is an environment creation function; DummyVecEnv wraps it
model = PPO('MlpPolicy', env, verbose=1, tensorboard_log=log_path)
'''
Using the Multi-Layer Perceptron policy.
TensorBoard log files are saved in the "Logs" folder.
'''
model.learn(total_timesteps=20000)
# trains the model for 20,000 timesteps
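
evaluate_policy was imported earlier but is not used yet; saving and
evaluating the model are part of the plan below. As a preview, here is a
minimal sketch of how those steps might look with stable-baselines3 (the
folder name 'Saved Models' and file name 'PPO_CartPole' are my own
illustrative choices):

ppo_path = os.path.join('Training', 'Saved Models', 'PPO_CartPole')
model.save(ppo_path)
# saves the trained model to disk

model = PPO.load(ppo_path, env=env)
# reloads the saved model

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, render=True)
print(mean_reward, std_reward)
# evaluate_policy returns the mean and standard deviation of the reward
# over the given number of episodes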
Future Plan Of Action:
Week 1 means next week (19th July onwards). There has been a
deviation from my earlier plan of action because I want to learn
more about the application part of RL rather than the theoretical one.
Week 1: Saving the learned model
Week 2: Endsem
Week 3: Testing and evaluating the trained model, code completion
(technically).
Week 4: Changing the above code slightly to make it more readable
and effective, final report submission.
Resources Used:
1. https://www.gymlibrary.ml/environments/classic_control/cart_pole/
2. https://youtu.be/Mut_u40Sqz4
