Large Scale Reinforcement Learning using Q-SARSA(λ) and Cascading Neural Networks

Published by: Steffen Nissen on Sep 06, 2010
Copyright: Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less

03/30/2012

pdf

text

original

 
Large Scale Reinforcement Learning using Q-SARSA(λ) and Cascading Neural Networks

M.Sc. Thesis
Steffen Nissen <lukesky@diku.dk>
October 8, 2007
Department of Computer Science, University of Copenhagen, Denmark
Abstract

This thesis explores how the novel model-free reinforcement learning algorithm Q-SARSA(λ) can be combined with the constructive neural network training algorithm Cascade 2, and how this combination can scale to the large problem of backgammon.

In order for reinforcement learning to scale to larger problem sizes, it needs to be combined with a function approximator such as an artificial neural network. Reinforcement learning has traditionally been combined with simple incremental neural network training algorithms, but more advanced training algorithms such as Cascade 2 exist that have the potential of achieving much higher performance. All of these advanced training algorithms are, however, batch algorithms, and since reinforcement learning is incremental this poses a challenge. As of now the potential of the advanced algorithms has not been fully exploited, and the few combinational methods that have been tested have failed to produce a solution that can scale to larger problems.

The standard reinforcement learning algorithms used in combination with neural networks are Q(λ) and SARSA(λ), which for this thesis have been combined to form the Q-SARSA(λ) algorithm. This algorithm has been combined with the Cascade 2 neural network training algorithm, which is especially interesting because it is a constructive algorithm that can grow a neural network by gradually adding neurons. For combining Cascade 2 and Q-SARSA(λ), two new methods have been developed: the NFQ-SARSA(λ) algorithm, which is an enhanced version of Neural Fitted Q Iteration, and the novel sliding window cache.

The sliding window cache and Cascade 2 are tested on the medium-sized mountain car and cart pole problems and on the large backgammon problem. The results from the tests show that Q-SARSA(λ) performs better than Q(λ) and SARSA(λ), and that the sliding window cache in combination with Cascade 2 and Q-SARSA(λ) performs significantly better than incrementally trained reinforcement learning. For the cart pole problem the algorithm performs especially well and learns a policy that can balance the pole for the complete 300 steps after only 300 episodes of learning, and its resulting neural network contains only one hidden neuron. This should be compared to 262 steps for the incremental algorithm after 10,000 episodes of learning. The sliding window cache scales well to the large backgammon problem and wins 78% of the games against a heuristic player, while incremental training only wins 73% of the games. The NFQ-SARSA(λ) algorithm also outperforms the incremental algorithm for the medium-sized problems, but it is not able to scale to backgammon.

The sliding window cache in combination with Cascade 2 and Q-SARSA(λ) performs better than incrementally trained reinforcement learning for both medium-sized and large problems, and it is the first combination of advanced neural network training algorithms and reinforcement learning that can scale to larger problems.
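As a rough illustration of the idea behind blending Q(λ) and SARSA(λ), the following tabular sketch mixes the off-policy Q-learning target with the on-policy SARSA target through a mixing parameter. The names `sigma`, `alpha`, and `gamma`, and the exact linear blending form, are assumptions made for illustration, not the thesis's definition; eligibility traces (the λ part) and the neural network function approximator are omitted for brevity.

```python
def q_sarsa_update(Q, s, a, r, s_next, a_next,
                   alpha=0.1, gamma=0.99, sigma=0.5):
    """One blended backup for a tabular value function Q (dict of dicts).

    sigma=1.0 reduces to the on-policy SARSA target,
    sigma=0.0 reduces to the off-policy Q-learning target.
    """
    sarsa_target = Q[s_next][a_next]        # value of the action actually taken
    q_target = max(Q[s_next].values())      # value of the greedy action
    target = r + gamma * (sigma * sarsa_target + (1 - sigma) * q_target)
    Q[s][a] += alpha * (target - Q[s][a])   # standard TD-style correction
    return Q[s][a]
```

With sigma strictly between 0 and 1 the update interpolates between the two classical algorithms, which is the intuition the abstract appeals to when it describes Q-SARSA(λ) as a combination of Q(λ) and SARSA(λ).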
