
##################
## Example 11.0: Baird counterexample
## using the Emphatic-TD algorithm on p. 304
##################

# Import packages and functions.

import numpy as np
import matplotlib.pyplot as plt

which = lambda status: np.arange(len(status))[status]
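# which() mimics R's which(): it returns the indices at which a boolean mask
# is True, e.g. which(np.array([False, True, True])) gives array([1, 2]).
# (It is defined here for convenience but not used below.)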

# parameter settings

n1 = 7 # number of states
n2 = n1 # number of working states

alpha = 0.03
#prob_eps = 0.1

discount = 0.99
#eps = 1.0e-7 # should be small enough

# Two possible actions: dashed = 0; solid = 1

action_word = np.array(["dashed", "solid"])

action = np.array([0, 1])


action_num = 2

# building environment

def step(state, move):
    # Every transition in Baird's counterexample has reward 0.
    reward = 0
    if move == 0:  # dashed action: jump uniformly to one of the six upper states
        next_state = int(np.random.choice([0, 1, 2, 3, 4, 5]))
    else:  # solid action: always go to the seventh state
        next_state = 6
    return {"next_state": next_state, "reward": reward}

def b_move(state):
    # Behavior policy b: dashed with probability 6/7, solid with probability 1/7.
    rand_num = np.random.uniform(low = 0, high = 1)
    if rand_num <= 6/7:
        move = 0  # dashed action
    else:
        move = 1  # solid action
    return move

b_action_prob = np.array([6/7, 1/7])   # behavior policy b

pi_action_prob = np.array([0.0, 1.0])  # target policy pi: always solid

# Importance-sampling ratios rho(a) = pi(a) / b(a) = [0, 7].
rho = pi_action_prob / b_action_prob

print(rho)
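# Optional sanity check (added; runs before the seed is set below, so it does
# not affect the seeded experiment): the empirical frequency of the solid
# action under b_move should be close to 1/7.
print(np.mean([b_move(0) for _ in range(100000)]))  # approx 0.143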
def v_value(state, w):
    # Linear value estimate with Baird's feature mapping:
    # v(s) = 2 * w[s] + w[7] for the six upper states, v(6) = w[6] + 2 * w[7].
    if state == 6:
        value = w[6] + 2 * w[7]
    else:
        value = 2 * w[state] + w[7]
    return value
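# Equivalent feature-matrix view (a sketch for reference only; the algorithm
# below does not use it): each state s has a feature vector x(s) such that
# v(s, w) = x(s) . w.  Row s of X below is x(s).
X = np.zeros((7, 8))
for s in range(6):
    X[s, s] = 2.0
    X[s, 7] = 1.0
X[6, 6] = 1.0
X[6, 7] = 2.0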

def grad_v(state, w):
    # Gradient of the linear value estimate, i.e. the feature vector x(s).
    grad = np.zeros(len(w))
    if state == 6:
        grad[6] = 1.0
        grad[7] = 2.0
    else:
        grad[state] = 2.0
        grad[7] = 1.0
    return grad
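# Optional consistency check (added): grad_v should reproduce the rows of X.
for s in range(7):
    assert np.allclose(grad_v(s, np.zeros(8)), X[s])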

# main loop

seed = 543
np.random.seed(seed) # Set random seed for reproducibility.

interest = [1, 1, 1, 1, 1, 1, 1]  # interest I(s) = 1 for every state

step_num = 1000  # number of steps

w = np.array([1, 1, 1, 1, 1, 1, 10, 1], dtype = float) # initial weights

w_matrix = np.zeros((step_num + 1, 8))

w_matrix[0, :] = w # initial weights

state = int(np.random.choice(range(n2))) # random initial state

# One-step Emphatic-TD (lambda = 0): the weight update is scaled by the
# emphasis M, which follows the followon-trace recursion
#   M_t = discount * rho_{t-1} * M_{t-1} + I(S_t).
M = 0.0
rho_prev = 1.0

for i in range(1, step_num + 1):
    # Choose an action from the behavior policy and take one step.
    move = b_move(state)
    step_obj = step(state, move)
    next_state = step_obj["next_state"]
    reward = step_obj["reward"]
    # Update the emphasis, the TD error, and the weights.
    M = discount * rho_prev * M + interest[state]
    delta = reward + discount * v_value(next_state, w) - v_value(state, w)
    w = w + alpha * M * rho[move] * delta * grad_v(state, w)
    # rho[0] = 0, so dashed transitions leave w unchanged and the emphasis
    # resets to I(S) on the following step.
    rho_prev = rho[move]
    w_matrix[i, :] = np.copy(w)
    state = next_state
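# Caveat (added): Figure 11.6 in the book shows the *expected* behavior of
# one-step Emphatic-TD on this problem; a single sampled run like the one
# above has very high variance, because consecutive solid transitions
# multiply the emphasis M by rho = 7, so an individual trajectory can look
# far noisier than the figure.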

print(w_matrix)  # full weight trajectory

print(w_matrix[-1, :])  # final weights

# Calculate state values.


state_value = np.zeros(7)

for i in range(7):
    state_value[i] = v_value(i, w)

print(state_value)
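# For reference: every reward in this MDP is 0, so the true value of every
# state is 0 under any policy; the printed values show how far the learned
# weights are from that solution.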

# Figure 11.6

colors = ["black", "red", "green", "blue", "yellow", "cyan", "magenta", "purple"]

linestyles = ["solid", "dashed", "dotted", "dashdot"]

plt.figure("Figure 11.6")
for j in range(8):
plt.plot(range(step_num + 1), w_matrix[:, j],
color = colors[j], linestyle = linestyles[j%4])

plt.xlabel("step")
plt.ylabel("weight value")
plt.legend(['w1', 'w2', 'w3', 'w4', 'w5', 'w6', 'w7', 'w8'], loc = "best", frameon
= False)
plt.show()

##################
