CS282r Practical0 v2
Harvard University
CS282r Practical 0: Implementing SARSA
Getting Started
This semester, we will primarily be using Python 2.7 for CS282r.[1] Once you have Python and
NumPy installed, run the following line at your command prompt:
pip install -e git+https://github.com/dtak/cs282rl.git#egg=cs282rl
This will download our support code and install the applicable module.
Then open the example notebook:
ipython notebook p0example.ipynb
(You can also follow along in the HTML version of the notebook.) You should create a new
notebook for your responses, but feel free to use any code from the example notebook.
Practical 0 Tasks and Questions
Now it's time for you to implement SARSA. Please code it initially on your own, without
talking to others or consulting the web for implementations. Your only sources should be
the Sutton and Barto book (namely, Section 6.4 in the 2012 second edition PDF) and your
class notes.
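For reference, the tabular SARSA update can be written as below. The notation is an assumption borrowed from the second edition of Sutton and Barto ($\alpha$ is the step size, $\gamma$ the discount factor); check it against the section cited above before relying on it.

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma \, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]
```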
Use this task configuration:
task = GridWorld(
    GridWorld.samples['larger'],
    action_error_prob=.1,
    rewards={'*': 50, 'moved': -1, 'hit-wall': -1})
[1] Allen uses Enthought Canopy; Ken uses Anaconda. Both are free and provide a convenient means of
installing a number of useful Python libraries for scientific computing, including tools for generating plots
(Matplotlib) and efficient array manipulation (NumPy). Ken uses Python 3.4 day to day, so the course code
should work in both Python 2.7 and 3.4.
Problem 0
Run your SARSA learner for 50 trials. Let each trial run until it has completed 100 episodes or
5000 iterations of experience, whichever comes first. Create four plots using two different
ways of viewing time and rewards:
1. Time: Try making the x-axis either the raw iteration number (up to 5000) or the
   episode number (up to 100). When making the x-axis the number of episodes, sum the
   values of all the rewards in an episode.
2. Rewards: Try making the y-axis either the immediate reward (at that iteration or
   episode) or the cumulative reward so far that trial.
Include 95% error bars on your plots. Based on these plots, what do you think might be
good ways to report on the quality of a reinforcement learning algorithm, and why?
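One common way to draw 95% error bars is the normal approximation: the mean across trials plus or minus 1.96 standard errors. The sketch below shows that calculation under stated assumptions: `curves` is a hypothetical array with one row per trial, and every trial is assumed to have the same length (trials that stop early would need truncating or padding first). Other interval constructions, such as a bootstrap, are equally reasonable.

```python
import numpy as np

def mean_and_ci95(curves):
    """Mean curve and 95% half-width across trials (normal approximation).

    curves : array-like of shape (n_trials, n_points); hypothetical layout.
    """
    curves = np.asarray(curves, dtype=float)
    n_trials = curves.shape[0]
    mean = curves.mean(axis=0)
    # Sample standard deviation (ddof=1) divided by sqrt(n) is the standard error.
    sem = curves.std(axis=0, ddof=1) / np.sqrt(n_trials)
    return mean, 1.96 * sem

# Toy data: three trials, two time points.
mean, half_width = mean_and_ci95([[0, 2], [2, 2], [4, 2]])
```

The two results can be passed to Matplotlib as `errorbar(x, mean, yerr=half_width)`, or drawn as a shaded band with `fill_between(x, mean - half_width, mean + half_width)`.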
Problem 1
What did you learn in the process of implementing SARSA?
Once you have written up the code, you may discuss it with others and consult the web;
however, you must (1) cite the people with whom you discussed it and (2) clearly state what
insights you gained from these discussions and/or sources.
Practical 0 Deliverables
Upload your IPython notebook to the course website. Use the following naming convention:
lastname_firstname_P0.ipynb. Your notebook file should contain all information that you
would like us to review:
1. Your code
2. Plots of the cumulative and instantaneous rewards per episode and per iteration, with
   error bars (total 4 plots)
3. Your narrative response to the question in Problem 0 (What do you think might be
   good ways to report on the quality of a reinforcement learning algorithm, and why?)
4. A narrative section at the end with observations/insights based on your experience
   coding the algorithm, discussions with others, and the results
You may discuss with others when thinking about the evaluation, but the write-up should be
your own. Please include the names of anyone you worked with or talked to during the
assignment.
If for some reason you can't use the IPython notebook, package your code and a PDF of your
write-up in a ZIP archive named lastname_firstname_P0.zip. We do encourage you to try
IPython, though :)