
REINFORCEMENT LEARNING

Roopavath Jethya
Assistant Professor
CSE Department
School Of Technology
GITAM (Deemed to be) University
Hyderabad
Contents
• Introduction to Reinforcement Learning
• Game Playing [Deep Blue in Chess, IBM Watson in
Jeopardy, Google’s Deep Mind in AlphaGo]
• Agents and Environment
• Action-Value Function
• Deep Reinforcement Learning
Introduction to Machine learning
• Machine learning is a scientific discipline that is concerned with
the design and development of algorithms that allow computers to
learn based on data, such as from sensor data or databases.
• A major focus of machine learning research is to automatically
learn to recognize complex patterns and make intelligent decisions
based on data.
• Machine learning (ML) is a category of algorithms that allows
software applications to become more accurate in predicting
outcomes without being explicitly programmed. The basic premise
of machine learning is to build algorithms that can receive input
data and use statistical analysis to predict an output while updating
outputs as new data becomes available.
Machine learning types:
❑ Supervised learning : Task Driven (Classification)
❑ Unsupervised learning : Data Driven (Clustering)
❑ Reinforcement learning :
– Close to human learning.
– The algorithm learns a policy of how to act in a given environment.
– Every action has some impact in the environment, and the
environment provides rewards that guide the learning algorithm.
Introduction to Reinforcement Learning
Definition: Reinforcement Learning (RL)
• Reinforcement learning is learning what to do, how to
map situations to actions so as to maximize a numerical
reward signal. The learner is not told which actions to take,
but instead must discover which actions yield the most
reward by trying them.
• The two most important distinguishing features of
reinforcement learning are trial-and-error search and
delayed reward. RL, often described as lying between
supervised and unsupervised learning, is a technique that allows
an agent to take actions and interact with an environment
so as to maximize the total reward.
Elements of reinforcement learning

[Diagram: the agent–environment loop. Following its policy, the agent takes an action; the environment responds with a new state and a reward.]

❑ Agent: Intelligent program that learns and selects actions

❑ Environment: External conditions the agent interacts with

❑ Policy:
 Defines the agent’s behavior at a given time
 A mapping from states to actions
 May be a lookup table or a simple function
Steps for Reinforcement Learning
1. The agent observes an input state
2. An action is determined by a decision-making
function (policy)
3. The action is performed
4. The agent receives a scalar reward or reinforcement
from the environment
5. Information about the reward given for that state /
action pair is recorded
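A minimal sketch of this loop in Python. Everything here (the toy environment, the random policy, the episode length) is invented for illustration; the `reset`/`step` method names follow the common Gym-style convention rather than anything defined in these slides.

```python
import random

class CoinFlipEnv:
    """Toy environment: guess a coin flip; reward +1 for a correct guess."""
    def reset(self):
        self.t = 0
        return 0                               # a single dummy state

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == random.randint(0, 1) else 0.0
        done = self.t >= 10                    # episode ends after 10 guesses
        return 0, reward, done

def random_policy(state, actions=(0, 1)):
    """A trivial policy: choose uniformly among the allowed actions."""
    return random.choice(actions)

def run_episode(env, policy, max_steps=100):
    """One pass through the observe -> act -> reward loop described above."""
    state = env.reset()                        # 1. observe the initial state
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                 # 2. the policy chooses an action
        next_state, reward, done = env.step(action)   # 3-4. act, receive a scalar reward
        total_reward += reward                 # 5. record the reward for this state/action pair
        state = next_state
        if done:
            break
    return total_reward

print(run_episode(CoinFlipEnv(), random_policy))
```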
Elements of reinforcement learning

 Reward function:
❑ Defines the goal in an RL problem
❑ The policy is altered to achieve this goal

 Value function:
❑ The reward function indicates what is good in an immediate sense,
while a value function specifies what is good in the long run.
 The value of a state is the total amount of reward an agent can expect to
accumulate over the future, starting from that state.

 Model of the environment:
❑ Predicts (mimics) the behavior of the environment
❑ Used for planning: given the current state and an action, it predicts
the resultant next state and next reward.
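For the model element specifically, here is a minimal sketch in Python of a tabular model that memorises observed transitions and can then be queried during planning. The class and its methods are hypothetical, in the spirit of Dyna-style planning, not anything defined in the slides.

```python
class TabularModel:
    """A hypothetical model of the environment: it memorises the last observed
    (next_state, reward) for each (state, action) pair, so that during planning
    the agent can ask 'if I take this action here, what happens next?'."""

    def __init__(self):
        self.transitions = {}

    def update(self, state, action, next_state, reward):
        # Record what the real environment actually did.
        self.transitions[(state, action)] = (next_state, reward)

    def predict(self, state, action):
        # Predicted next state and reward, or None if this pair was never seen.
        return self.transitions.get((state, action))
```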
Game Playing
• Deep Blue in Chess
• IBM Watson in Jeopardy
• Google’s Deep Mind in AlphaGo
Deep Blue in Chess
Deep Blue was the first World Champion-class chess computer, built to meet one of the oldest challenges in computer science. When World Chess Champion Garry Kasparov resigned the last game of a six-game match against IBM’s Deep Blue supercomputer on 11 May 1997, his loss marked the achievement of Deep Blue’s goal.
Deep Blue’s 1996 debut, in the first Kasparov versus Deep Blue match in Philadelphia, finally eclipsed Deep Thought II. The 1996 version of Deep Blue used a new chess chip designed at IBM Research over the course of three years. A major revision of this chip took part in the historic 1997 rematch between Kasparov and Deep Blue, in which Deep Blue achieved its goal. The figure shows IBM’s Deep Blue beating world chess champion Garry Kasparov.
IBM Watson in Jeopardy
• Watson is a question-answering computer system
capable of answering questions posed in natural
language, developed in IBM's DeepQA project by
a research team led by principal investigator
David Ferrucci.
Google’s Deep Mind in AlphaGo
• AlphaGo is a computer program that plays the board game Go.
• It was developed by DeepMind Technologies, which was later
acquired by Google.
• AlphaGo versus Lee Sedol, also known as the Google DeepMind
Challenge Match, was a five-game Go match between 18-time world
champion Lee Sedol and AlphaGo, a computer Go program developed
by Google DeepMind, played in Seoul, South Korea between the 9th
and 15th of March 2016.
• A later version, AlphaGo Zero, learns using a novel form of
reinforcement learning in which it becomes its own teacher.
Agents and Environment
• Reinforcement learning is a branch of Machine learning
where we have an agent and an environment.
• The environment is nothing but a task or simulation.
• The Agent is an AI algorithm that interacts with the
environment and tries to solve it.
• Agent: An AI algorithm.
• Environment: A task/simulation which needs to be solved by the Agent.

An environment interacts with the agent by sending it its state and
a reward. Thus, the following are the steps to create an environment:
– Create a Simulation.
– Add a State vector which represents the internal state of
the Simulation.
– Add a Reward system into the Simulation.
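As a rough illustration of these three steps, here is a minimal sketch of such an environment in Python. The class name, the state, and the reward values are all invented for this example; the `reset`/`step` interface simply mimics the common Gym-style convention and is not taken from the slides.

```python
class LineWorldEnv:
    """A tiny simulation: the agent moves along a track of 5 cells and is
    rewarded for reaching the right-most cell."""

    def __init__(self, size=5):
        self.size = size
        self.position = 0                       # the state vector (here just one integer)

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action: 0 = move left, 1 = move right
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.size - 1)
        done = self.position == self.size - 1
        reward = 1.0 if done else -0.1          # the reward system: small step penalty
        return self.position, reward, done
```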
Categorizing Reinforcement Learning Agents
Value Based Agent:

● Here, the agent evaluates all the states in the state space, and the policy is
implicit: the value function tells the agent how good each action is in a
particular state, and the agent chooses the best one.

Policy Based Agent:

● Here, instead of representing a value function inside the agent, we explicitly
represent the policy. The agent searches directly for the optimal policy, which in
turn enables it to act optimally.

Actor-Critic Agent:

● This agent is both value-based and policy-based: it stores a policy (the actor)
together with an estimate of how much reward it receives from each state (the critic).
Model-Based Agent:
● Here, the agent tries to build a model of how the environment works, and
then plans to get the best possible behaviour.

Model-Free Agent:
● Here the agent doesn’t try to understand the environment, i.e. it doesn’t try
to model its dynamics. Instead we go directly to the policy and/or value
function: we just observe experience and try to figure out a policy for
behaving optimally so as to obtain the most reward possible.
There are three main approaches to implementing an RL algorithm.

1. Value-based — in a value-based reinforcement learning method, you try to
maximize a value function V(s). The main focus is to find an optimal value
(a Q-learning sketch follows this list).

2. Policy-based — in a policy-based reinforcement learning method, you try to come
up with a policy such that the action performed at each state is optimal to gain
maximum reward in the future. The main focus is to find the optimal policy.

3. Model-based — in this type of reinforcement learning, you create a virtual model
for each environment, and the agent learns to perform in that specific environment.
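As a concrete, deliberately minimal sketch of the value-based approach, the snippet below implements tabular Q-learning in Python. The environment object, the number of actions, and the hyperparameter values are assumptions made for this illustration; the `reset`/`step` interface follows the same hypothetical convention used in the earlier sketches.

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning: learn action values Q(s, a) from interaction."""
    Q = defaultdict(float)                      # Q[(state, action)] -> estimated value
    for _ in range(episodes):
        state = env.reset()
        for _ in range(200):                    # cap the episode length
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # value-based update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
            best_next = max(Q[(next_state, a)] for a in range(n_actions))
            target = reward if done else reward + gamma * best_next
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
            if done:
                break
    return Q
```

For example, `q_learning(LineWorldEnv(), n_actions=2)` would learn to always move right in the toy environment sketched earlier.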
Action-Value Function
• This will give us a way to measure “how good” it is for an agent to be in a
given state or to select a given action.
• We discussed the general idea of a game (like chess) and how an agent
in an environment can perform actions and get rewarded for those
actions.
• With all the possible actions that an agent may be able to take in all the
possible states of an environment, there are a couple of things that we
might be interested in understanding. We discuss the topics
of policies and value functions.
• First, we would probably like to know how likely it is for an agent to
take any given action from any given state. This is where the notion
of a policy comes into play.
• A policy is a function that maps a given state to probabilities of selecting
each possible action from that state. We will use the symbol π to denote
a policy.
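A minimal sketch of this idea in Python; the state names, actions, and probabilities below are invented purely for illustration.

```python
import random

# A policy π maps each state to a probability distribution over the actions
# available in that state (here just a lookup table of hypothetical values).
policy = {
    "state_A": {"left": 0.2, "right": 0.8},
    "state_B": {"left": 0.5, "right": 0.5},
}

def sample_action(pi, state):
    """Draw an action from π(· | state)."""
    actions, probs = zip(*pi[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

print(sample_action(policy, "state_A"))   # prints "right" about 80% of the time
```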

• Secondly, in addition to understanding the probability of selecting an
action, we would probably also like to know how good a given action
or a given state is for the agent.
• In terms of rewards, selecting one action over another in a given state
may increase or decrease the agent's rewards, so knowing this in advance
will probably help our agent decide which actions to take in
which states. This is where value functions become useful.
• Value functions are functions of states, or of state-action pairs, that
estimate how good it is for an agent to be in a given state, or how good it
is for the agent to perform a given action in a given state.
• Value functions are classified in two ways:
– State-Value Function
– Action-Value Function
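The slides do not write these functions out, but in the standard notation (following Sutton and Barto) they can be stated as follows, where G_t is the discounted return and γ the discount factor:

```latex
% State-value function: expected return when starting in state s and following policy pi
v_{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, G_t \mid S_t = s \,\right],
\qquad G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

% Action-value function: expected return after taking action a in state s, then following pi
q_{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\, G_t \mid S_t = s,\, A_t = a \,\right]
```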
– State-Value Function
• The figure shows an example of computing the value of one state from
the states that can follow it.
• In the figure, how do we determine the value of state A? There is a
50–50 chance of ending up in either of the next two possible states,
B or C. The value of state A is simply the sum, over all next states, of
the probability of reaching that state multiplied by the reward for
reaching it. The value of state A is 0.5.
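The figure itself is not reproduced here, so as an illustration assume the reward for reaching state B is 1 and for reaching state C is 0; the slide’s calculation then looks like this:

```python
# value(A) = P(A -> B) * reward(B) + P(A -> C) * reward(C)
transition_probs = {"B": 0.5, "C": 0.5}   # the 50-50 chance described above
rewards = {"B": 1.0, "C": 0.0}            # assumed rewards; not given in the slides

value_A = sum(transition_probs[s] * rewards[s] for s in transition_probs)
print(value_A)                            # 0.5
```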
• Action-Value Function

For example: finding the minimum-cost (shortest) path, where the value of an
action is the cost of the move plus the value of the state it leads to.
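As a small illustration of action values in a shortest-path setting, the sketch below computes, for a made-up weighted graph, the value of each node as the minimum over actions of “cost of the move plus value of the next node”. The graph, its costs, and the goal node are all invented for this example.

```python
# Hypothetical weighted graph: node -> {neighbour: cost of moving there}
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
    "D": {},                                   # the goal node
}

# value[n] = minimum remaining cost from node n to the goal
value = {n: 0.0 if n == "D" else float("inf") for n in graph}

for _ in range(len(graph)):                    # enough sweeps for this small graph
    for node, edges in graph.items():
        for neighbour, cost in edges.items():
            q = cost + value[neighbour]        # action value: step cost + value of next node
            value[node] = min(value[node], q)

print(value)   # {'A': 3.0, 'B': 2.0, 'C': 1.0, 'D': 0.0}
```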
Deep Reinforcement Learning
• Deep Reinforcement Learning (DRL), a very fast-moving field, is the
combination of Reinforcement Learning and Deep Learning.

• Combining deep learning architectures (deep neural networks) with
reinforcement learning algorithms (Q-learning, actor-critic, etc.) makes it
possible to scale to previously unsolvable problems. That is because
DRL is able to learn from raw sensors or image signals as input.
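To illustrate the idea, here is a minimal sketch (in Python, using PyTorch) of a small Q-network that maps a raw observation vector to one Q-value per action. The layer sizes, the observation dimension, and the number of actions are arbitrary choices for this example and are not taken from the slides.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """A small neural network mapping a raw observation vector to one Q-value per action."""

    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),          # one output per possible action
        )

    def forward(self, obs):
        return self.net(obs)

q_net = QNetwork()
observation = torch.randn(1, 8)                # stand-in for raw sensor/image features
q_values = q_net(observation)
greedy_action = int(torch.argmax(q_values, dim=1))   # act greedily with respect to Q
print(greedy_action)
```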
Challenges with RL
• The main challenge in reinforcement learning lies in preparing the
simulation environment, which is highly dependent on the task to be
performed.
• When the model has to go superhuman in Chess, Go or Atari games,
preparing the simulation environment is relatively simple.
• When it comes to building a model capable of driving an autonomous car,
building a realistic simulator is crucial before letting the car ride on the
street.
– The model has to figure out how to brake or avoid a collision in a safe environment,
where sacrificing even a thousand cars comes at a minimal cost.

– Transferring the model out of the training environment and into the real world is
where things get tricky.
Applications of RL
• Robotics — RL can be used for high-dimensional control
problems as well as various industrial applications.
• Text mining — RL along with an advanced contextual text
generation model can be used to develop a system that’s able to
produce highly readable summaries of long texts.
• Trade execution — Major companies in the financial industry
have been using ML algorithms to enhance trade execution and trading strategies.
• Healthcare — RL is useful for medication dosing, optimization
of treatment policies for those suffering from chronic illnesses, clinical
trials, etc.
• Games — RL is so well-known these days because it is the
mainstream algorithm used to solve different games and
sometimes achieve super-human performance.
