
January 2015

Harvard University
CS282r Practical 0: Implementing SARSA

Getting Started

This semester, we will primarily be using Python 2.7 for CS282r.[1] Once you have Python and
NumPy installed, run the following line at your command prompt:

pip install -e git+https://github.com/dtak/cs282rl.git#egg=cs282rl

This will download our support code and install the applicable module.

We encourage you to work through this practical in the IPython Notebook interface, which lets
you keep code, results, and narrative in one place, easing your work and ours. To get started
with interacting with the GridWorld domain, download the p0example notebook and run:

ipython notebook p0example.ipynb

(You can also follow along in the HTML version of the notebook.) You should create a new
notebook for your responses, but feel free to use any code from the example notebook.

Practical 0 Tasks and Questions

Now it's time for you to implement SARSA. Please code it initially on your own, without
talking to others or consulting the web for implementations. Your only sources should be
the Sutton and Barto book (namely, Section 6.4 in the 2012 second edition PDF) and your
class notes.
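As a point of reference for the textbook rule, a single tabular SARSA step can be sketched as below. This is a minimal sketch under stated assumptions, not the course's implementation: the function names, the layout of `Q` as a (states x actions) NumPy array, and the default values of `alpha` and `gamma` are all illustrative and are not specified by this handout or the support code.

```python
import numpy as np


def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else a greedy one.

    Q is assumed to be a 2-D array indexed as Q[state, action] (hypothetical layout).
    rng is a numpy.random.RandomState instance.
    """
    if rng.rand() < epsilon:
        return rng.randint(Q.shape[1])
    return int(np.argmax(Q[state]))


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95, terminal=False):
    """Apply one on-policy TD update to Q in place and return the TD error.

    The target uses the action actually chosen in the next state (a_next),
    which is what distinguishes SARSA from an off-policy rule. When the
    transition ends the episode, the target is just the reward.
    """
    target = r if terminal else r + gamma * Q[s_next, a_next]
    td_error = target - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error
```

In a full learner, `a_next` would be chosen by the same behavior policy (here, `epsilon_greedy`) before the update is applied, and then reused as the action taken on the following step.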

Use this task configuration:

task = GridWorld(
    GridWorld.samples['larger'],
    action_error_prob=.1,
    rewards={'*': 50, 'moved': -1, 'hit-wall': -1})

[1] Allen uses Enthought Canopy; Ken uses Anaconda. Both are free and provide a convenient means of
installing a number of useful Python libraries for scientific computing, including tools for generating plots
(Matplotlib) and efficient array manipulation (NumPy). Ken uses Python 3.4 day to day, so the course code
should work in both Python 2.7 and 3.4.
Problem 0
Run your SARSA learner for 50 trials. Let each trial run until it has completed 100 episodes or
5000 iterations of experience, whichever comes first. Create four plots using two different
ways of viewing time and rewards:
1. Time: Try making the x-axis either the raw iteration number (up to 5000) or the
episode number (up to 100). When making the x-axis the number of episodes, sum the
values of all the rewards in an episode.
2. Rewards: Try making the y-axis either the immediate reward (at that iteration or
episode) or the cumulative reward so far that trial.
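The two views of time above can be derived from a single per-iteration log. The following is a minimal sketch, assuming each trial records one immediate reward per iteration plus a boolean flag marking the last iteration of each episode; the function name and the log format are hypothetical, not part of the support code.

```python
import numpy as np


def per_episode_returns(rewards, episode_ends):
    """Sum the per-iteration rewards within each completed episode.

    rewards: 1-D sequence of immediate rewards, one per iteration.
    episode_ends: boolean sequence of the same length, True on the final
        iteration of an episode.
    Iterations after the last completed episode are ignored.
    """
    rewards = np.asarray(rewards, dtype=float)
    ends = np.flatnonzero(np.asarray(episode_ends, dtype=bool))
    running_total = np.cumsum(rewards)
    # The return of an episode is the difference between the running totals
    # at the end of this episode and at the end of the previous one.
    totals_at_ends = np.concatenate(([0.0], running_total[ends]))
    return np.diff(totals_at_ends)


# Example log: two completed episodes followed by one unfinished step.
rewards = [-1, -1, 50, -1, 50, -1]
episode_ends = [False, False, True, False, True, False]

immediate_per_episode = per_episode_returns(rewards, episode_ends)
cumulative_per_episode = np.cumsum(immediate_per_episode)
cumulative_per_iteration = np.cumsum(rewards)
```

The same `np.cumsum` call gives the cumulative view on either axis, so the four plots differ only in which array is passed to the plotting routine.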
Include 95% error bars on your plots. Based on these plots, what do you think might be
good ways to report on the quality of a reinforcement learning algorithm, and why?
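One common way to obtain 95% error bars is to average across trials and use a normal approximation to the standard error of the mean. The sketch below assumes the results have been stacked into an array of shape (number of trials, number of points), with every trial contributing the same number of points; the function name is hypothetical, and other interval constructions (for example, a bootstrap) would be equally valid choices.

```python
import numpy as np


def mean_and_ci95(values):
    """Return the across-trial mean and the half-width of a 95% interval.

    values: array-like of shape (n_trials, n_points).
    Uses mean +/- 1.96 * standard error, a normal approximation that is
    reasonable for a few dozen trials; this is an assumption, not a
    requirement of the assignment.
    """
    values = np.asarray(values, dtype=float)
    n_trials = values.shape[0]
    mean = values.mean(axis=0)
    # ddof=1 gives the sample standard deviation across trials.
    sem = values.std(axis=0, ddof=1) / np.sqrt(n_trials)
    return mean, 1.96 * sem
```

The returned half-width can be passed as the `yerr` argument of Matplotlib's `errorbar`, with the mean as the y values. Trials that stop after different numbers of episodes would need to be aligned or truncated to a common length before stacking.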

Problem 1
What did you learn in the process of implementing SARSA?

Once you have written up the code, you may discuss with others and consult the web;
however, you must (1) cite with whom you discussed and (2) clearly state what insights you
gained from these discussions and/or sources.

Practical 0 Deliverables

Upload your IPython notebook to the course website. Use the following naming convention:
lastname_firstname_P0.ipynb. Your notebook file should contain all information that you
would like us to review:

1. Your code
2. Plots of the cumulative and instantaneous rewards per episode and per iteration, with
error bars (total 4 plots)
3. Your narrative response to the question in Problem 0 (What do you think might be
good ways to report on the quality of a reinforcement learning algorithm, and why?)
4. A narrative section at the end with observations/insights based on your experience
coding the algorithm, discussions with others, and the results

You may discuss with others when thinking about the evaluation, but the write-up should be
your own. Please include the names of anyone you worked with or talked to during the
assignment.

If for some reason you can't use the IPython notebook, package your code and a PDF of your
write-up in a ZIP archive named lastname_firstname_P0.zip. We do encourage you to try
IPython, though :)
