
Homework 4: Reinforcement Learning Report

109550116 楊傑宇
Part I. Experiment Results:
Taxi:

Cartpole:
DQN:

Compare:
Part II. Question Answering (50%):
1. Calculate the optimal Q-value of a given state in Taxi-v3 (the state is assigned in
google sheet), and compare with the Q-value you learned (Please screenshot the
result of the “check_max_Q” function to show the Q-value you learned). (4%)

My optimal Q-value is -0.5856821172999993
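As a rough check, this value can be reproduced as a discounted reward sum. The setup below is my assumption (discount factor 0.9 and an optimal policy that needs 12 steps for my assigned state: eleven -1 rewards for moving, then +20 for the successful drop-off); it is not stated in the assignment text.

# Assumed: gamma = 0.9; the optimal policy takes 12 steps in total.
gamma = 0.9
optimal_q = sum(gamma**t * -1 for t in range(11)) + gamma**11 * 20
print(optimal_q)  # about -0.5856821173, matching the value above up to float rounding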


2. Calculate the max Q-value of the initial state in CartPole-v0, and compare with
the Q-value you learned. (Please screenshot the result of the “check_max_Q”
function to show the Q-value you learned) (4%)

My optimal Q-value is 4.816074831364288
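As a rough sketch, the max Q-value of the initial state is the discounted sum of the +1 rewards an optimal policy collects over the 200-step cap of CartPole-v0. The discount factor below is only a placeholder; the exact number depends on the gamma actually used in training.

gamma = 0.97  # placeholder; substitute the discount factor from your own training
max_q = sum(gamma**t * 1 for t in range(200))  # equals (1 - gamma**200) / (1 - gamma)
print(max_q)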


3.
a. Why do we need to discretize the observation in Part 2? (2%)
Because the observations in CartPole are continuous values, they cannot be used directly as indices into a Q-table; we discretize each observation dimension into a finite number of bins so that every observation maps to a table entry (a minimal binning sketch follows this question).
b. How do you expect the performance will be if we increase “num_bins”?
(2%)
I expect the performance to improve, because finer bins represent the continuous state more precisely, so the Q-table can distinguish situations that a coarser discretization would merge together.
c. Is there any concern if we increase “num_bins”? (2%)
Yes: the Q-table grows with the number of bins, so it needs more memory, and the code needs more time to run because each state is visited less often and training takes longer to converge.
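A minimal binning sketch, assuming NumPy is used for the bin edges; the bounds and bin count below are my own illustrative choices, not the homework template:

import numpy as np

num_bins = 7
bins = np.linspace(-0.2095, 0.2095, num_bins - 1)  # hypothetical edges for one dimension (pole angle)

def discretize(value, bins):
    # np.digitize maps a continuous value to the index of the bin it falls into.
    return int(np.digitize(value, bins))

print(discretize(0.05, bins))  # a single integer index usable as a Q-table coordinate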
4. Which model (DQN, discretized Q learning) performs better in Cartpole-v0, and
what are the reasons? (3%)
DQN performs better. One reason is its replay buffer: my DQN stores transitions until about 1000 have been collected and only then starts learning, so each update uses a diverse batch of past experience; in addition, the neural network works directly on the continuous observations instead of a coarse discretization.
5.
a. What is the purpose of using the epsilon greedy algorithm while choosing an
action? (2%)
The epsilon-greedy approach selects the action with the highest estimated Q-value most of the time, but with probability epsilon it picks a random action instead, so the agent keeps exploring while it is still learning (a short sketch follows this question).
b. What will happen, if we don’t use the epsilon greedy algorithm in the
CartPole-v0 environment? (3%)
We cannot balance exploration and exploitation: the agent always takes the currently best-looking action, so it can get stuck repeating a suboptimal behaviour and never discover actions that would actually perform better.
c. Is it possible to achieve the same performance without the epsilon greedy
algorithm in the CartPole-v0 environment? Why or Why not? (3%)
I think it is possible, if we have another method to trade off exploration and exploitation, for example softmax (Boltzmann) action selection or adding noise to the action values.
d. Why don’t we need the epsilon greedy algorithm during the testing section?
(2%)
Because in the testing section the Q-table has already been learned, we no longer need to explore; we simply take the action with the maximum Q-value in each state.
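A minimal sketch of epsilon-greedy selection for the tabular case; the names q_table, epsilon and n_actions are my assumptions, not necessarily those in the template:

import random
import numpy as np

def choose_action(state, q_table, epsilon, n_actions):
    if random.random() < epsilon:
        return random.randrange(n_actions)      # explore: random action
    return int(np.argmax(q_table[state]))       # exploit: best learned action

# During testing the random branch is simply dropped and we always take np.argmax(q_table[state]).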
6. Why is there “with torch.no_grad():“ in the “choose_action” function in DQN?
(3%)
It disables gradient tracking, so the tensors computed inside the block are detached from the current computational graph. In choose_action we only need a forward pass to pick an action, not backpropagation, so no_grad() saves memory and computation (see the sketch below).
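A sketch of how the no_grad() context is typically placed in choose_action; the function signature below is my assumption:

import random
import torch

def choose_action(net, state, epsilon, n_actions):
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():  # forward pass only, so no computational graph is built
        q_values = net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())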
7.
a. Is it necessary to have two networks when implementing DQN? (1%)
No, DQN can be implemented with a single network, but training is usually less stable.
b. What are the advantages of having two networks? (3%)
Keeping a separate, periodically updated target network holds the bootstrap targets fixed for a while. This prevents the runaway bias that comes from bootstrapping off a constantly changing estimate from dominating the updates and causing the estimated Q-values to diverge (see the sketch after this question).
c. What are the disadvantages? (2%)
Maintaining two networks costs extra memory and computation, and because the target network only receives new weights periodically, convergence can be slower.
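A sketch of the two-network setup: the target network is just a periodically refreshed copy of the evaluation network. The layer sizes and the 100-step interval are my own illustrative choices:

import copy
import torch.nn as nn

evaluate_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))  # trained every step
target_net = copy.deepcopy(evaluate_net)                                     # frozen copy used for TD targets

def sync_target(step, interval=100):
    # Hard update: copy the evaluation weights into the target network every `interval` steps.
    if step % interval == 0:
        target_net.load_state_dict(evaluate_net.state_dict())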
8.
a. What is a replay buffer(memory)? Is it necessary to implement a replay
buffer? What are the advantages of implementing a replay buffer? (5%)
- A replay buffer stores the transitions (state, action, reward, next state) collected while executing a policy in the environment.
- Yes, it is necessary in my implementation.
- It allows more efficient use of previous experience, because each transition can be reused for several updates, and sampling random minibatches breaks the correlation between consecutive samples (a minimal buffer sketch follows this question).
b. Why do we need batch size? (3%)
Batch size controls the accuracy of the estimate of the error gradient when
training neural networks.
c. Is there any effect if we adjust the size of the replay buffer(memory) or batch
size? Please list some advantages and disadvantages. (2%)
The larger the batch size, the more training examples are used in each gradient estimate, so the estimate is more accurate; however, each update takes longer, which may slow down the code. Similarly, a larger replay buffer keeps more (and older) experience to sample from, but it uses more memory.
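A minimal replay buffer sketch built on collections.deque; the capacity and the field layout are my assumptions:

import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # old transitions are dropped once capacity is exceeded

    def insert(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random minibatch; random sampling breaks the correlation between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)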
9.
a. What is the condition that you save your neural network? (1%)
I do not use a particular condition to save the network; however, I think saving a checkpoint every 100 episodes would be a reasonable choice (a sketch follows below).
b. What are the reasons? (2%)
Saving only occasionally instead of at every step avoids unnecessary disk writes, so it keeps the training code fast.
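A sketch of periodic checkpointing with torch.save; the 100-episode interval and the file name are my assumptions:

import torch

def maybe_save(model, episode, path="dqn_checkpoint.pt", interval=100):
    # Save a checkpoint every `interval` episodes; it can be restored later with load_state_dict.
    if episode % interval == 0:
        torch.save(model.state_dict(), path)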
10. What have you learned in the homework? (2%)
In Taxi, I learned the epsilon-greedy algorithm and how to calculate Q-values. In CartPole, I learned how to discretize continuous observations. In the DQN part, besides how DQN itself operates, I also learned more about converting tensors between different shapes and data types.
