
ASSIGNMENT - 3

In [1]:

import gym
import numpy as np
import matplotlib.pyplot as plt

In [2]:

env = gym.make("FrozenLake-v1", render_mode="human", is_slippery=False)

In [3]:

env.reset()
# env.render()
# env.step(2)
Out[3]:

(0, {'prob': 1})

In [4]:
# env.step(2)

In [5]:

# env.step(2)

In [6]:
print(env.observation_space)
env.action_space

Discrete(16)
Out[6]:

Discrete(4)

In [7]:
env.P[2][2]
Out[7]:
[(1.0, 3, 0.0, False)]
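
Each entry of env.P[state][action] is a (transition probability, next state, reward, done) tuple; because the environment was created with is_slippery=False, every action leads to exactly one successor state with probability 1. A minimal sketch (not part of the original assignment) for listing all transitions out of one state -- the state index 0 is only an example, and on some gym versions the table has to be reached through env.unwrapped.P:

# inspect the transition model for a single state (state 0 is just an example)
state = 0
for action in range(env.action_space.n):
    for trans_prob, next_state, reward, done in env.P[state][action]:
        print(f"action={action}: prob={trans_prob}, "
              f"next_state={next_state}, reward={reward}, done={done}")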

In [8]:

def value_iteration(env, gamma=0.9):

    # initialize the value table with zeros
    value_table = np.zeros(env.observation_space.n)

    # set the number of iterations and the convergence threshold
    no_of_iterations = 100000
    threshold = 1e-20

    for i in range(no_of_iterations):

        # on each iteration, copy the value table to updated_value_table
        updated_value_table = np.copy(value_table)

        # compute the Q value for each action in the state
        # and update the value of the state with the maximum Q value
        for state in range(env.observation_space.n):
            Q_value = []
            for action in range(env.action_space.n):
                next_states_rewards = []
                for next_sr in env.P[state][action]:
                    trans_prob, next_state, reward_prob, _ = next_sr
                    next_states_rewards.append(
                        trans_prob * (reward_prob + gamma * updated_value_table[next_state]))

                Q_value.append(np.sum(next_states_rewards))

            value_table[state] = max(Q_value)

        # check whether we have reached convergence, i.e. whether the difference
        # between the value table and the updated value table is very small.
        # We set a threshold; if the difference is less than the threshold,
        # we break the loop and return the value function as the optimal value function.
        if np.sum(np.fabs(updated_value_table - value_table)) <= threshold:
            print('Value-iteration converged at iteration# %d.' % (i + 1))
            break

    return value_table
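
Each sweep of the inner loops is a Bellman optimality backup: V(s) <- max over actions a of the sum over successors s' of P(s'|s,a) * (R(s,a,s') + gamma * V(s')). As a quick illustration (not part of the original notebook), the backup for a single state can be written on its own; the all-zeros starting value table and the choice of state 14 below are only assumptions for the example:

# one Bellman optimality backup for a single state,
# starting from an all-zeros value table (assumption for illustration)
gamma = 0.9
value_table = np.zeros(env.observation_space.n)
state = 14  # example: the cell directly left of the goal on the 4x4 map
q_values = [sum(p * (r + gamma * value_table[s_next])
                for p, s_next, r, _ in env.P[state][action])
            for action in range(env.action_space.n)]
print("Q(s, .) =", q_values, "-> backed-up V(s) =", max(q_values))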

In [9]:
def extract_policy(value_table, gamma=0.9):

    # initialize the policy with zeros
    policy = np.zeros(env.observation_space.n)

    for state in range(env.observation_space.n):

        # initialize the Q table for a state
        Q_table = np.zeros(env.action_space.n)

        # compute the Q value for all actions in the state
        for action in range(env.action_space.n):
            for next_sr in env.P[state][action]:
                trans_prob, next_state, reward_prob, _ = next_sr
                Q_table[action] += trans_prob * (reward_prob + gamma * value_table[next_state])

        # select the action with the maximum Q value as the optimal action for the state
        policy[state] = np.argmax(Q_table)

    return policy
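
A simple sanity check (not part of the original assignment) is to roll out the extracted policy in the environment and confirm it collects the goal reward. The helper below assumes the newer gym step API that returns (obs, reward, terminated, truncated, info), which matches the reset output shown above:

def run_policy(env, policy, max_steps=100):
    # follow the greedy policy until the episode ends and return the total reward
    state, _ = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        state, reward, terminated, truncated, _ = env.step(int(policy[state]))
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward

Once optimal_policy has been computed below, run_policy(env, optimal_policy) should reach the goal on this deterministic 4x4 map and return a reward of 1.0.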

In [10]:
optimal_value_function = value_iteration(env=env, gamma=0.9)
optimal_value_function

Value-iteration converged at iteration# 7.


Out[10]:
array([0.59049, 0.6561 , 0.729 , 0.6561 , 0.6561 , 0. , 0.81 ,
0. , 0.729 , 0.81 , 0.9 , 0. , 0. , 0.9 ,
1. , 0. ])

In [11]:
optimal_policy = extract_policy(optimal_value_function, gamma=0.9)
optimal_policy
Out[11]:

array([1., 2., 1., 0., 1., 0., 1., 0., 2., 1., 1., 0., 0., 2., 2., 0.])

In [12]:
opt_pol = optimal_policy.reshape(4, 4)
print('THE OPTIMAL POLICY IS \n ', opt_pol)

THE OPTIMAL POLICY IS


[[1. 2. 1. 0.]
[1. 0. 1. 0.]
[2. 1. 1. 0.]
[0. 2. 2. 0.]]
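
For readability, the integer actions can be mapped to FrozenLake's directions (0 = Left, 1 = Down, 2 = Right, 3 = Up); a small optional sketch:

# render the policy grid with arrows instead of action indices
arrows = np.array(['<', 'v', '>', '^'])
print(arrows[optimal_policy.astype(int)].reshape(4, 4))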
