Contents
• Introduction to Reinforcement Learning
• Game Playing [Deep Blue in chess, IBM Watson in Jeopardy!, Google DeepMind's AlphaGo in Go]
• Agents and Environment
• Action-Value Function
• Deep Reinforcement Learning
Actor-Critic Agent:
● An actor-critic agent is both value-based and policy-based: it stores a policy (the actor) together with an estimate of how much reward it gets from each state (the critic).
Model-Free Agent:
● Here the agent doesn’t try to understand the environment, i.e. it doesn’t try to model its
dynamics. Instead it goes directly to the policy and/or value function: it observes experience
and tries to figure out a policy for behaving optimally so as to collect the most reward possible.
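The model-free idea above can be sketched with a single tabular Q-learning update: the agent never models the environment's dynamics, it only adjusts value estimates from an observed (state, action, reward, next state) experience. The states, actions, and step sizes below are illustrative assumptions, not from the slides.

```python
# Minimal sketch of a model-free update (tabular Q-learning) on a toy transition.
from collections import defaultdict

Q = defaultdict(float)      # action-value estimates, default 0.0
alpha, gamma = 0.5, 0.9     # learning rate and discount factor (assumed values)

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example experience: taking "right" in state "A" gives reward 1 and lands in "B".
q_update("A", "right", 1.0, "B", actions=["left", "right"])
print(Q[("A", "right")])    # 0.5 * (1.0 + 0.9 * 0.0 - 0.0) = 0.5
```

No transition model appears anywhere: the update uses only the sampled experience, which is exactly what makes the agent model-free.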
PAGE 19
• First, we would probably like to know how likely an agent is to take any
given action from any given state. This is where the notion of a policy comes into
play.
• A policy is a function that maps a given state to probabilities of selecting each
possible action from that state. We will use the symbol π to denote a policy.
PAGE 20
• Secondly, in addition to knowing the probability of selecting an action, we
would probably also like to know how good a given action or a given state is for
the agent.
• In terms of rewards, selecting one action over another in a given state may
increase or decrease the agent's rewards, so knowing this in advance will
probably help our agent decide which actions to take in which states.
This is where value functions become useful.
• Value functions are functions of states, or of state-action pairs, that estimate
how good it is for an agent to be in a given state, or how good it is for the agent
to perform a given action in a given state.
PAGE 21
• State-Value Function
• The figure shows an example of computing the value of one
state from its successor states using a value function.
• In the figure, how do we determine the value of state A?
There is a 50–50 chance of ending up in either of the next two
possible states, B or C. The value of state A is simply the sum,
over all next states, of the probability of reaching that state
multiplied by the reward for reaching it. Here the value of
state A is 0.5.
PAGE 22
• Action-Value Function
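Where a state-value function scores states, an action-value function q(s, a) scores state–action pairs, which lets the agent compare actions within a state. A tabular sketch with made-up values:

```python
# Tabular action-value estimates q(s, a); the numbers are illustrative only.
q = {
    ("A", "left"): 0.2,
    ("A", "right"): 0.8,
}

def greedy_action(state, actions):
    """Pick the action with the highest estimated action value in this state."""
    return max(actions, key=lambda a: q[(state, a)])

print(greedy_action("A", ["left", "right"]))  # right
```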
PAGE 24
Challenges with RL
• The main challenge in reinforcement learning lies in preparing the simulation
environment, which is highly dependent on the task to be performed.
• When the model has to achieve superhuman performance in chess, Go, or Atari
games, preparing the simulation environment is relatively simple.
• When it comes to building a model capable of driving an autonomous car,
building a realistic simulator is crucial before letting the car ride on the street.
• The model has to figure out how to brake or avoid a collision in a safe environment, where
sacrificing even a thousand cars comes at a minimal cost.
• Transferring the model out of the training environment and into the real world is where
things get tricky.
PAGE 25
Applications of RL
• Robotics — RL can be used for high-dimensional control problems as well as
various industrial applications.
• Text mining — RL along with an advanced contextual text generation model
can be used to develop a system that’s able to produce highly readable
summaries of long texts.
• Trade execution — Major companies in the financial industry use ML
algorithms to improve trade execution and equity trading.
• Healthcare — RL is useful for medication dosing, optimization of treatment
policies for those suffering from chronic diseases, clinical trials, etc.
• Games — RL is so well-known these days because it is the mainstream
algorithm used to solve different games and sometimes achieve super-human
performance.
PAGE 27
Deep Reinforcement Learning
• Deep Reinforcement Learning (DRL), a very fast-moving field, is the
combination of Reinforcement Learning and Deep Learning.
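The core idea of the combination is to replace the lookup table of values with a learned function approximator: a network maps a state's features to one value per action. Below is a minimal NumPy sketch with a single linear layer and random weights, an assumption used only to show the shape of the idea, not a full DQN.

```python
# A network approximating q(s, a): state features in, one value per action out.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 4, 2
W = rng.normal(size=(n_features, n_actions))  # the network's parameters

def q_values(state_features):
    """Approximate q(s, a) for every action at once."""
    return state_features @ W

state = np.array([1.0, 0.0, 0.5, -0.5])
print(q_values(state).shape)            # (2,) — one value per action
print(int(np.argmax(q_values(state))))  # index of the greedy action
```

In deep RL proper, W would be many stacked layers trained by gradient descent on updates like the Q-learning target; the table-free representation is what lets the method scale to large state spaces such as raw game pixels.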