(AUTONOMOUS)
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND
DATA SCIENCE
Unit V – Notes
(Eligibility Traces)
PREPARED BY APPROVED BY
This picture shows how the weight becomes smaller as the time (or the
number of steps n) increases.
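The decay described above can be sketched numerically. This sketch assumes the picture refers to the standard λ-return weighting, in which the n-step return receives weight (1 − λ)λ^(n−1); the function name and the value λ = 0.5 are chosen only for illustration.

```python
def lambda_return_weights(lam, max_n):
    """Weight of the n-step return for n = 1..max_n: (1 - lam) * lam**(n - 1)."""
    return [(1.0 - lam) * lam ** (n - 1) for n in range(1, max_n + 1)]


# Assumed illustration value lam = 0.5; each weight is half the previous one.
weights = lambda_return_weights(lam=0.5, max_n=5)
print(weights)  # [0.5, 0.25, 0.125, 0.0625, 0.03125]
```

A larger λ makes the weights fall off more slowly, so longer returns contribute more.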
Backward View TD
Suppose an agent is walking randomly in an environment and finds
a treasure. It then stops and looks backwards to work out which
steps led it to this treasure.
Naturally, the steps that are close to the treasure deserve more
credit for finding it than the steps that are miles away. So closer
locations are more valuable than distant ones, and thus they are
assigned bigger values.
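This is realised through a vector E called the eligibility trace.
Concretely, the eligibility trace is a function of the state, E(s), or of
the state-action pair, E(s,a), and holds a decaying record of how recently
and how often each state was visited; this record determines how much of
each update is credited to V(s).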
The notation 1(St = s) is the indicator function: it equals 1 when the
agent is in state s at time t and 0 otherwise. So the state we are
currently in receives the full increment, and as the credit is propagated
backwards it is attenuated exponentially.
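For reference, the accumulating-trace recursion in which this indicator usually appears can be written as below. The time indexing follows the common textbook convention and is an assumption about the form these notes intend.

```latex
E_0(s) = 0, \qquad
E_t(s) = \gamma \lambda \, E_{t-1}(s) + \mathbf{1}(S_t = s)
```

Each step multiplies every trace by γλ, so a state visited k steps ago has its contribution scaled by (γλ)^k.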
The eligibility trace update starts with E(s) = 0 for all states. Then,
each time we pass through a state (as a result of performing an action),
we increment E(s) for that state to boost its eligibility, and after that
we decay every trace by γλ, that is, E(s) = γλ E(s) for all s.
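The update steps above can be sketched as a small tabular routine. This is a minimal sketch of backward-view TD(λ) with accumulating traces, not a definitive implementation: the function name, the (state, reward, next_state, done) transition format, and the parameter values are all assumptions made for illustration.

```python
from collections import defaultdict


def td_lambda_episode(transitions, V, alpha=0.1, gamma=1.0, lam=0.9):
    """Apply backward-view TD(lambda) with accumulating traces to one episode.

    transitions: list of (state, reward, next_state, done) tuples (assumed format).
    V: dict mapping state -> estimated value, updated in place.
    """
    E = defaultdict(float)  # E(s) = 0 for all states at the start
    for s, r, s_next, done in transitions:
        target = r if done else r + gamma * V.get(s_next, 0.0)
        delta = target - V.get(s, 0.0)  # TD error for this step
        E[s] += 1.0                     # boost the state just visited
        for state in list(E):
            V[state] = V.get(state, 0.0) + alpha * delta * E[state]
            E[state] *= gamma * lam     # decay every trace by gamma * lambda
    return V


# Hypothetical three-step walk A -> B -> C that ends at a "treasure" (reward 1).
walk = [("A", 0, "B", False), ("B", 0, "C", False), ("C", 1, "T", True)]
V = td_lambda_episode(walk, {}, alpha=0.5, gamma=1.0, lam=0.5)
print(V)  # {'A': 0.125, 'B': 0.25, 'C': 0.5}
```

In this run the single reward at the end is shared out backwards, and the states nearer the treasure receive the larger values, which matches the intuition given earlier.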
The makespan is the total length of the schedule (that is, when all the
jobs have finished processing).
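As a small illustration of this definition, the makespan can be computed as the latest finish time over all scheduled operations. The schedule below is hypothetical (it is not the schedule from the Gantt chart in these notes), and the (job, machine, start, duration) tuple format is an assumption.

```python
def makespan(schedule):
    """Return the latest finish time over all scheduled operations.

    schedule: list of (job, machine, start, duration) tuples (assumed format).
    """
    return max(start + duration for _job, _machine, start, duration in schedule)


# Hypothetical 3-job, 2-machine schedule, invented for illustration only.
schedule = [
    ("J1", "M1", 0, 3), ("J1", "M2", 3, 2),
    ("J2", "M2", 0, 2), ("J2", "M1", 3, 4),
    ("J3", "M2", 5, 2), ("J3", "M1", 7, 2),
]
print(makespan(schedule))  # 9
```

The last operation to finish (J3 on M1, ending at time 9) determines the makespan, regardless of when the other jobs completed.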
Representation of JSS
A Gantt chart is a convenient way of visually representing a
solution of the job shop scheduling problem (JSSP).
The length of this solution is 12, which is the earliest time at which all
three jobs are complete. However, note that this is not the optimal solution!
Watkins's Q(λ)
Unlike TD(λ) or Sarsa(λ), Watkins's Q(λ) does not look ahead all the way
to the end of the episode in its backup. It only looks ahead as far as the next
exploratory action.
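One common way to realise this cut-off is to reset all eligibility traces to zero whenever the next action is exploratory (non-greedy). The sketch below shows a single tabular update under that assumption. The function name, the list-of-lists layout for Q and E, and the parameter values are illustrative, and terminal states are not handled.

```python
def watkins_q_lambda_step(Q, E, s, a, r, s_next, a_next,
                          alpha=0.1, gamma=0.9, lam=0.8):
    """One Watkins's Q(lambda) update on tabular Q and E (lists indexed [state][action]).

    a_next is the action the behaviour policy will actually take in s_next.
    Returns True if a_next is greedy, False if it is exploratory.
    """
    greedy_value = max(Q[s_next])
    a_next_is_greedy = Q[s_next][a_next] == greedy_value  # ties count as greedy
    delta = r + gamma * greedy_value - Q[s][a]  # back up toward the greedy action
    E[s][a] += 1.0
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * delta * E[i][j]
            # Decay traces after a greedy action, cut them after an exploratory one.
            E[i][j] = gamma * lam * E[i][j] if a_next_is_greedy else 0.0
    return a_next_is_greedy


# Hypothetical 3-state, 2-action table; action 1 is greedy in state 1.
Q = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
E = [[0.0, 0.0] for _ in Q]
was_greedy = watkins_q_lambda_step(Q, E, s=0, a=0, r=0.0, s_next=1, a_next=0)
print(was_greedy, E)  # exploratory next action, so every trace is reset to 0.0
```

Because the traces are wiped at the exploratory action, later TD errors cannot flow back to state-action pairs visited before it, which is the sense in which the backup stops at the next exploratory action.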