
Reinforcement Learning
Contents
• Introduction to Reinforcement Learning
• Game Playing [Deep Blue in Chess, IBM Watson in Jeopardy, Google
DeepMind in AlphaGo]
• Agents and Environment
• Action-Value Function
• Deep Reinforcement Learning



Introduction to Reinforcement Learning

• An approach to Artificial Intelligence
• Learning from interaction
• Goal-oriented learning
• Learning about, from, and while interacting with an external
environment
• Learning what to do—how to map situations to actions—so as to
maximize a numerical reward signal



• Definition of RL:
Reinforcement learning (RL) is a type of machine learning concerned with taking suitable
actions to maximize reward in a particular situation. It is employed by various software
systems and machines to find the best possible behavior or path to take in a specific situation.

• To reinforce means to establish or encourage a pattern of behavior



Key Features
• Learner is not told which actions to take
• Trial-and-Error search
• Possibility of delayed reward – Sacrifice short-term gains for greater
long-term gains
• The need to explore and exploit
• Considers the whole problem of a goal-directed agent interacting with
an uncertain environment



Basic concepts of RL

• A reinforcement learning setup is composed of two main components:
an agent and an environment.



1. Agent: a hypothetical entity which performs actions in an environment to gain
some reward.
2. Action (a): all the possible moves that the agent can take.
3. Environment (e): the scenario the agent has to face.
4. State (s): the current situation returned by the environment.
5. Reward (R): an immediate return sent back from the environment to evaluate the
last action taken by the agent.
6. Policy (π): the strategy that the agent employs to determine its next action based
on the current state.
7. Value (V): the expected long-term return with discount, as opposed to the
short-term reward R. Vπ(s) is defined as the expected long-term return from the
current state s under policy π.
8. Q-value or action-value (Q): similar to Value, except that it takes an extra
parameter, the current action a. Qπ(s, a) refers to the long-term return from the
current state s, taking action a under policy π.
A type sketch of these concepts follows.
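The following sketch restates these concepts as code. It is purely illustrative: the
names and types are assumptions made for this sketch, not a standard API.

```python
from typing import Callable, Dict

State = int                       # (s) the current situation returned by the environment
Action = int                      # (a) one possible move of the agent
Reward = float                    # (R) immediate return for the last action
Policy = Callable[[State], Dict[Action, float]]  # π: state -> action probabilities
StateValue = Callable[[State], float]            # Vπ(s): expected long-term return of s
ActionValue = Callable[[State, Action], float]   # Qπ(s, a): return of s, taking action a
```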



How does RL work?

• A basic reinforcement learning loop involves these steps (a minimal
sketch of the loop follows the list):

• Observation of the environment
• Deciding how to act using some strategy
• Acting accordingly
• Receiving a reward or penalty
• Learning from the experience and refining the strategy
• Iterating until an optimal strategy is found
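A minimal sketch of one pass through this loop, assuming a hypothetical environment
exposing reset() and step(action) -> (next_state, reward, done), and a Q-table mapping
each state to a dict of action -> estimated value (all names here are assumptions for
this sketch):

```python
import random

def run_episode(env, q_table, actions, epsilon=0.1):
    state = env.reset()                       # 1. observe the environment
    done, total_reward = False, 0.0
    while not done:
        # 2. decide how to act using some strategy (epsilon-greedy here)
        if random.random() < epsilon:
            action = random.choice(actions)                       # explore
        else:
            action = max(q_table[state], key=q_table[state].get)  # exploit
        # 3. act accordingly; 4. receive a reward or penalty
        next_state, reward, done = env.step(action)
        # 5. learning from the experience would update q_table here
        total_reward += reward
        state = next_state
    return total_reward                       # 6. iterate over many episodes
```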



Game Playing
• Deep Blue in Chess
• The first World Champion-class chess computer
• World Chess Champion Garry Kasparov resigned the last game of a six-game
match against IBM’s Deep Blue supercomputer on 11 May 1997

• IBM Watson in Jeopardy
• Watson is a question-answering computer system capable of answering
questions posed in natural language, developed in IBM's DeepQA project by a
research team led by principal investigator David Ferrucci.
• First computer to defeat TV game show Jeopardy! champions (Ken Jennings
and Brad Rutter).
• Google DeepMind’s AlphaGo
• AlphaGo is a computer program that plays the board game Go.
• It was developed by DeepMind Technologies, which was later acquired by Google.
• AlphaGo versus Lee Sedol, also known as the Google DeepMind Challenge
Match, was a five-game Go match between 18-time world champion Lee Sedol
and AlphaGo, a computer Go program developed by Google DeepMind, played in
Seoul, South Korea between the 9th and 15th of March 2016.
• Its successor, AlphaGo Zero, uses a novel form of reinforcement learning in
which the program becomes its own teacher.



Agents and Environment

• Reinforcement learning is a branch of machine learning in which we
have an agent and an environment.
• The environment is a task or simulation.
• The agent is an AI algorithm that interacts with the environment and
tries to solve it.



• Agent: an AI algorithm.

• Environment: a task/simulation which needs to be solved by the agent.

An environment interacts with the agent by sending it a state and a
reward. Thus, the following are the steps to create an environment
(a minimal sketch follows this list):
• Create a simulation.
• Add a state vector which represents the internal state of the
simulation.
• Add a reward system into the simulation.
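A minimal environment sketch following these three steps. The class name and the
1-D grid task are illustrative assumptions, not a real library API:

```python
class GridEnvironment:
    def __init__(self, size=5):
        self.size = size              # step 1: the simulation (a 1-D grid)
        self.position = 0             # step 2: state vector (the agent's position)

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.position = max(0, min(self.size - 1, self.position + action))
        done = self.position == self.size - 1
        reward = 1.0 if done else -0.1  # step 3: reward system (goal pays, steps cost)
        return self.position, reward, done
```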



Categorizing Reinforcement Learning Agents
Value-Based Agent:
● Here, the agent evaluates all the states in the state space, and the policy is implicit:
the value function tells the agent how good each action is in a particular state, and
the agent chooses the best one.

Policy-Based Agent:
● Here, instead of representing a value function inside the agent, we explicitly
represent the policy. The agent searches directly for the optimal policy, which in
turn enables it to act optimally.

Actor-Critic Agent:
● This agent is both value-based and policy-based. It stores both the policy (the actor)
and an estimate of how much reward it is getting from each state (the critic).



Model-Based Agent:
● Here, the agent tries to build a model of how the environment works, and then plans
with that model to obtain the best possible behaviour.

Model-Free Agent:
● Here the agent doesn’t try to model the environment, i.e. it doesn’t try to learn the
dynamics. Instead it goes directly to the policy and/or value function: it just sees
experience and tries to figure out a policy for behaving optimally to get the most
possible reward.



There are 3 main approaches when it comes to implementing an RL
algorithm (their objectives are summarized after this list).
1. Value-based: in a value-based reinforcement learning method, you
try to maximize a value function V(s). The main focus is to find an
optimal value.
2. Policy-based: in a policy-based reinforcement learning method, you
try to come up with a policy such that the action performed at each
state is optimal for gaining maximum reward in the future. The main
focus is to find the optimal policy.
3. Model-based: in this type of reinforcement learning, you create a
virtual model of the environment, and the agent learns to perform
in that specific environment.
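As a rough summary in standard RL notation (added here as a sketch; these symbols
are not defined in the slides themselves), the three approaches optimize different
objects:

```latex
% Value-based: learn the optimal value function and act greedily on it
V^{*}(s) = \max_{a} \mathbb{E}\left[ r + \gamma\, V^{*}(s') \mid s, a \right]
% Policy-based: search directly for the policy with maximum expected return
\pi^{*} = \arg\max_{\pi} \mathbb{E}_{\pi}\left[ \textstyle\sum_{t} \gamma^{t} r_{t} \right]
% Model-based: learn a model of the dynamics, then plan with it
\hat{P}(s' \mid s, a) \approx P(s' \mid s, a)
```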



Action-Value Function
• This will give us a way to measure “how good” it is for an agent to be in a given
state or to select a given action.
• We discussed the general idea of a game (like chess) and how an agent in an
environment can perform actions and get rewarded for those actions.
• With all the possible actions that an agent may be able to take in all the possible
states of an environment, there are a couple of things that we might be
interested in understanding. We discuss the topics of policies and value functions.

• First, we would probably like to know how likely it is for an agent to take any
given action from any given state. This is where the notion of a policy comes into
play.
• A policy is a function that maps a given state to probabilities of selecting each
possible action from that state. We will use the symbol π to denote a policy.

• Secondly, in addition to understanding the probability of selecting an action, we
would probably also like to know how good a given action or a given state is for
the agent.
• In terms of rewards, selecting one action over another in a given state may
increase or decrease the agent's rewards, so knowing this in advance will
probably help our agent decide which actions to take in which states.
This is where value functions become useful.
• Value functions are functions of states, or of state-action pairs, that estimate
how good it is for an agent to be in a given state, or how good it is for the agent
to perform a given action in a given state.

• Value functions are classified in two ways (defined formally below):

• State-Value Function
• Action-Value Function
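In standard notation (a sketch added here; G_t denotes the discounted return, which
the slides do not define explicitly):

```latex
% State-value function: expected return starting from state s and following pi
V_{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]
% Action-value function: expected return after taking action a in state s under pi
Q_{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right]
% with the discounted return
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```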

• State-Value Function
• The figure (not reproduced here) showed an example of moving from one state
to another and valuing states with a state-value function.
• How do we determine the value of state A? There is a 50–50 chance of ending
up in one of the 2 possible next states, either state B or C. The value of state
A is simply the sum, over the next states, of each state’s probability
multiplied by the reward for reaching that state. The value of state A is 0.5.
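For instance, assuming a reward of 1 for reaching B and 0 for reaching C (values
chosen to be consistent with the stated result, since the original figure is not
reproduced here):

```latex
V(A) = P(B)\, r_B + P(C)\, r_C = 0.5 \times 1 + 0.5 \times 0 = 0.5
```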

• Action-Value Function

• For example, finding the minimum-cost (shortest) path, as in the Q-learning
sketch below.
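A minimal tabular Q-learning sketch for such a task (an illustrative implementation,
not the slides' own; it assumes the GridEnvironment sketched earlier and illustrative
hyperparameters):

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    actions = [-1, +1]
    # Q-table: each state maps to a dict of action -> estimated long-term return
    q = defaultdict(lambda: {a: 0.0 for a in actions})
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy: explore sometimes, otherwise take the best known action
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(q[state], key=q[state].get)
            next_state, reward, done = env.step(action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            best_next = max(q[next_state].values())
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q
```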


Deep Reinforcement Learning
• Deep Reinforcement Learning (DRL), a very fast-moving field, is the combination of
Reinforcement Learning and Deep Learning.

• Combining deep learning architectures (deep neural networks) with reinforcement
learning algorithms (Q-learning, actor-critic, etc.) makes it possible to scale to
previously unsolvable problems. That is because DRL is able to learn from raw sensor
or image signals as input.
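As a minimal illustration (a sketch, not the slides' implementation; the network
dimensions and the use of PyTorch are assumptions), a deep Q-network replaces the
Q-table with a neural network mapping a raw observation vector to one Q-value per
action:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),  # one Q-value per possible action
        )

    def forward(self, obs):
        return self.net(obs)

# Greedy action selection directly from a raw input vector:
# action = QNetwork()(torch.tensor(obs, dtype=torch.float32)).argmax().item()
```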

Challenges with RL
• The main challenge in reinforcement learning lies in preparing the simulation
environment, which is highly dependent on the task to be performed.
• When the model has to go superhuman in Chess, Go or Atari games, preparing
the simulation environment is relatively simple.
• When it comes to building a model capable of driving an autonomous car,
building a realistic simulator is crucial before letting the car ride on the street.
• The model has to figure out how to brake or avoid a collision in a safe environment, where
sacrificing even a thousand (simulated) cars comes at a minimal cost.
• Transferring the model out of the training environment and into the real world is where
things get tricky.

Applications of RL
• Robotics: RL can be used for high-dimensional control problems as well as
various industrial applications.
• Text mining: RL, along with an advanced contextual text generation model,
can be used to develop a system that is able to produce highly readable
summaries of long texts.
• Trade execution: major companies in the financial industry have been using
ML algorithms to enhance trade execution and equity trading.
• Healthcare: RL is useful for medication dosing, optimization of treatment
policies for those suffering from chronic conditions, clinical trials, etc.
• Games: RL is so well-known these days because it is the mainstream
algorithm used to solve different games, sometimes achieving superhuman
performance.


