
EE675A : Introduction to Reinforcement Learning

Problem Set 3 (Q1 to Q3) solutions


04/05/2023
Lecturer: Subramanya Swamy Peruru                    Scribe: A. V. Jayanth Reddy

1 Contextual bandits
Q1: Given an over-determined system of linear equations Ay = b, where there are more equations than unknowns, assuming (A^T A)^{-1} exists, we generally use the concept of pseudo-inverse to compute ŷ = (A^T A)^{-1} A^T b as the estimate of y. Find ŷ if

    A = [  1   3 ]        [ 17 ]
        [  5   7 ] ,  b = [ 19 ]
        [ 11  13 ]        [ 23 ]
A1: To compute the pseudo-inverse estimate, we use the formula ŷ = (A^T A)^{-1} A^T b.

    A^T A = [ 147  181 ]
            [ 181  227 ]

    (A^T A)^{-1} = [  0.3734  -0.2977 ]
                   [ -0.2977   0.2418 ]

    (A^T A)^{-1} A^T = [ -0.5197  -0.2171   0.2368 ]
                       [  0.4276   0.2039  -0.1316 ]

From this, the estimate is

    ŷ = (A^T A)^{-1} A^T b = [ -7.513 ]
                             [  8.118 ]
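
As a quick numerical check (a minimal NumPy sketch, not part of the original solution), the estimate above can be reproduced from the normal equations, or more stably with np.linalg.lstsq:

    import numpy as np

    # Over-determined system from Q1: three equations, two unknowns
    A = np.array([[1.0, 3.0],
                  [5.0, 7.0],
                  [11.0, 13.0]])
    b = np.array([17.0, 19.0, 23.0])

    # Pseudo-inverse estimate: y_hat = (A^T A)^{-1} A^T b
    y_hat = np.linalg.inv(A.T @ A) @ A.T @ b
    print(y_hat)            # approx [-7.513, 8.118]

    # Equivalent (and numerically more stable) least-squares solve
    y_hat_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(y_hat_lstsq)      # same estimate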
Q2: Consider a contextual bandits scenario in which the true mean µ_a(x) = θ_a^T x of an arm a is a linear function of the context vector x. Here θ_a and x are n × 1 vectors, where n is the number of features in the context vector. Assume that we have two arms a_1 and a_2, and the samples (context, action, reward) observed by the agent in the first 6 rounds are as follows:

    ([1, 3]^T, a_1, r = 17), ([7, 13]^T, a_2, r = 2), ([5, 7]^T, a_1, r = 2),
    ([5, 3]^T, a_2, r = 1), ([11, 13]^T, a_1, r = 23), ([5, 7]^T, a_2, r = 9).

If the context seen in the 7th round is [2, 1]^T, which arm is played by the agent in that round if it uses a greedy policy?

A2: Given the samples (context, action, reward) observed by the agent in the first 6 rounds, we can estimate the parameters θ_a using linear regression. For each arm, we collect the samples in which that arm was selected and use them to estimate the corresponding parameters. Specifically, we use the least squares estimate

    θ̂_a = (X_a^T X_a)^{-1} X_a^T r_a

For a_1:

    X_{a1} = [  1   3 ]        r_{a1} = [ 17 ]
             [  5   7 ]                 [  2 ]
             [ 11  13 ]                 [ 23 ]

For a_2:

    X_{a2} = [ 7  13 ]         r_{a2} = [ 2 ]
             [ 5   3 ]                  [ 1 ]
             [ 5   7 ]                  [ 9 ]

From this we obtain the estimates θ̂_1 ≈ [-3.8, 4.6]^T and θ̂_2 ≈ [0.6, 0.0]^T.
To determine the arm to select in the 7th round using a greedy policy, we compute the estimated expected reward of each arm for the observed context x = [2, 1]^T:

    µ_1 = θ̂_1^T x ≈ 2(-3.8) + 4.6 = -3.0,        µ_2 = θ̂_2^T x ≈ 2(0.6) + 0.0 = 1.2

Since the estimated reward of arm 2 is higher than that of arm 1, the agent selects arm 2 in the 7th round under a greedy policy.
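
The same computation can be sketched in NumPy (illustrative, not part of the original notes): fit θ̂_a for each arm from the rounds in which it was played, then pick the arm with the larger predicted reward for the round-7 context.

    import numpy as np

    # Contexts and rewards from the rounds where each arm was played (Q2)
    X_a1 = np.array([[1.0, 3.0], [5.0, 7.0], [11.0, 13.0]])
    r_a1 = np.array([17.0, 2.0, 23.0])
    X_a2 = np.array([[7.0, 13.0], [5.0, 3.0], [5.0, 7.0]])
    r_a2 = np.array([2.0, 1.0, 9.0])

    def least_squares(X, r):
        # Per-arm estimate: theta_hat = (X^T X)^{-1} X^T r
        return np.linalg.inv(X.T @ X) @ X.T @ r

    theta1 = least_squares(X_a1, r_a1)               # approx [-3.82, 4.65]
    theta2 = least_squares(X_a2, r_a2)               # approx [ 0.60, 0.03]

    x_new = np.array([2.0, 1.0])                     # context in round 7
    mu = np.array([theta1 @ x_new, theta2 @ x_new])  # approx [-2.99, 1.23]
    print("greedy arm:", np.argmax(mu) + 1)          # prints 2 (arms 1-indexed here)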

2 Policy Gradient Algorithm


Q3: Consider a parametric representation of policy π given by

    π(a; {θ_b}_{b∈A}) = exp(θ_a) / Σ_{b∈A} exp(θ_b),    for a ∈ A.

Here A is the set of actions, and {θb }b∈A is the parameter vector that characterizes the policy
π. Assuming there are only two actions a1 and a2 , derive the policy gradient update for this case.

A3: In the policy gradient algorithm, the parameter update is given by

    θ_i^{(t+1)} = θ_i^{(t)} + α E[ R_t · d log π(A_t; θ) / dθ_i ],

and in practice the expectation is replaced by a single sample, i.e. the reward R_t and the action A_t actually observed at round t.

For the softmax policy, π(a_i; θ) = exp(θ_i) / Σ_{j∈A} exp(θ_j), so

    d log π(a_i; θ) / dθ_k = d( θ_i − log Σ_{j∈A} exp(θ_j) ) / dθ_k
                           = 1_{i=k} − exp(θ_k) / Σ_{j∈A} exp(θ_j)
                           = 1_{i=k} − π(a_k; θ).

With two actions we have π(a_1) = exp(θ_1) / (exp(θ_1) + exp(θ_2)) and π(a_2) = exp(θ_2) / (exp(θ_1) + exp(θ_2)).

If we picked arm 1 at round t, the update is

    θ_1^{(t+1)} = θ_1^{(t)} + α R_t (1 − π(a_1)),        θ_2^{(t+1)} = θ_2^{(t)} + α R_t (0 − π(a_2))

If we picked arm 2 at round t, the update is

    θ_2^{(t+1)} = θ_2^{(t)} + α R_t (1 − π(a_2)),        θ_1^{(t+1)} = θ_1^{(t)} + α R_t (0 − π(a_1))
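
A minimal NumPy sketch of this two-arm update (illustrative only; the step size α = 0.1 and the simulated reward distributions are placeholder assumptions, not given in the problem):

    import numpy as np

    rng = np.random.default_rng(0)
    theta = np.zeros(2)            # preferences theta_1, theta_2
    alpha = 0.1                    # step size (placeholder value)
    true_means = [1.0, 2.0]        # hypothetical mean rewards, only for the demo

    def softmax(theta):
        z = np.exp(theta - theta.max())   # subtract max for numerical stability
        return z / z.sum()

    for t in range(1000):
        pi = softmax(theta)
        a = rng.choice(2, p=pi)                   # sample an arm from pi
        reward = true_means[a] + rng.normal()     # observe a noisy reward
        # Stochastic policy-gradient step: d log pi(a)/d theta_k = 1{k=a} - pi(a_k)
        grad_log_pi = (np.arange(2) == a).astype(float) - pi
        theta = theta + alpha * reward * grad_log_pi

    print(softmax(theta))   # the preference for the higher-reward arm grows

In practice a baseline (such as the running average reward) is often subtracted from R_t to reduce the variance of this update.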
