Boris Mocialov
2015
Outline
PyBrain
Alternatives
Short on RL
PyBrain RL
Examples
Source Alterations
PyBrain
- Easy to use
- Algorithms for ANN, UL, SL, RL, Evolution
- Modular
- FF/R-NN, LSTM, Deep Belief and Boltzmann Machines
ML Alternatives
- FANN (.c/.cpp)
  Fast, evolving topologies, parameters adjustable on the fly
- Encog (.java)
  Multi-threaded; SVM, ANN, GP, BN, HMM, GA
- Theano (.py)
  Number-crunching framework; tight integration with NumPy; fast;
  many sub-projects:
  - Pylearn2, Theanets [scientific]
  - Lasagne [lightweight (FF/C/R-NN), LSTM, CPU/GPU]
  - Keras [modular, minimalistic, (C/R-NN), CPU/GPU]
- Caffe (.cpp)
  Models defined separately, CPU/GPU
- Accord (.net)
  Combines with audio/video processing libraries; backprop, DBN, BM
- etc.
Short on RL
PyBrain RL
- Maze
  - Discrete states
  - Certain (deterministic) transitions
  - Q-Learning
  - Epsilon-Greedy
  - (Non-)Stationary
  - Fully observable
  - Finite horizon
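The combination of Q-learning and an epsilon-greedy policy listed above can be sketched in a few lines. This is an illustrative toy (a 1-D corridor maze with made-up constants), not PyBrain's implementation:

```python
import random

# Tabular Q-learning with an epsilon-greedy policy on a tiny corridor
# maze: discrete states 0..4, deterministic transitions, goal at 4,
# finite-horizon episodes. All names and constants are illustrative.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
N_STATES = 5
ACTIONS = (-1, +1)                     # step left / step right

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic transition; small step cost, reward 1 at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else -0.02)

def choose(state):
    """Epsilon-greedy: random action with prob. EPSILON, else greedy."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

random.seed(0)
for episode in range(200):             # each episode ends at the goal
    s = 0
    while s != N_STATES - 1:
        a = choose(s)
        s2, r = step(s, a)
        # Q-learning update: bootstrap from the best next action
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Greedy policy after learning: move right in every non-goal state
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The same update rule is what PyBrain's `Q` learner applies to its `ActionValueTable`.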
Source Alterations
- pybrain.rl.environments.mazes.maze

  class Maze(Environment, Named):
      initPos = None

      def __init__(self, topology, goal, **args):
          # fall back to a random free position when none is given
          if self.initPos is None:
              self.initPos = self._freePos()

      def _freePos(self):
          # skip punishing states when collecting free positions
          if self.punishing_states is not None:
              if (i, j) not in self.punishing_states:
                  res.append((i, j))
- pybrain.rl.environments.mazes.tasks

  class MDPMazeTask(Task):
      def getReward(self):
          if self.env.goal == self.env.perseus:
              self.env.reset()
              reward = 1
          elif (self.env.punishing_states is not None
                  and self.env.perseus in self.env.punishing_states):
              self.env.reset()
              reward = -1
          else:
              reward = -0.02
          return reward
- pybrain.rl.explorers.discrete.egreedy

  class EpsilonGreedyExplorer(DiscreteExplorer):
      # epsilon decay disabled so exploration stays constant:
      # self.epsilon *= self.decay
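Commenting out the decay keeps exploration alive for the whole run, which matters when the environment changes mid-experiment. A plain-Python sketch (not PyBrain's class; 0.3 and 0.9999 are, to my recollection, PyBrain's default epsilon and decay) shows how quickly the multiplicative decay would otherwise erode exploration:

```python
# How `self.epsilon *= self.decay` erodes the exploration rate over
# interaction steps. Illustrative values: epsilon = 0.3, decay = 0.9999.
epsilon, decay = 0.3, 0.9999
checkpoints = {}
for step in range(1, 50001):
    epsilon *= decay               # the line the alteration disables
    if step in (1, 10000, 50000):
        checkpoints[step] = epsilon
print(checkpoints)                 # epsilon shrinks exponentially
```

By step 50 000 epsilon has fallen below 0.01, so an agent would almost never explore a change introduced late in the run; with decay disabled, exploration stays at its initial rate.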
Maze Real-Time Learning Set-Up
from scipy import array
from pybrain.rl.environments.mazes import Maze, MDPMazeTask
from pybrain.rl.learners.valuebased import ActionValueTable
from pybrain.rl.learners import Q
from pybrain.rl.agents import LearningAgent
from pybrain.rl.experiments import Experiment

envmatrix = array([[1, 1, 1, 1, 1, 1, 1, 1, 1],
                   ...])
# goal plus the extra arguments of the altered Maze
# (punishing states and initial position)
env = Maze(envmatrix, (1, 7), [(1, 1)], [(1, 6)])
# create task
task = MDPMazeTask(env)
# create value table (81 states for a 9x9 maze, 4 actions) and learner
table = ActionValueTable(81, 4)
table.initialize(1.)
learner = Q()
# create agent
agent = LearningAgent(table, learner)
# create experiment
experiment = Experiment(task, agent)

for i in range(5000):
    # interact with the environment (here in batch mode)
    experiment.doInteractions(200)
    agent.learn()
    agent.reset()
    # half-way through, make the environment non-stationary
    if i == 2500:
        env.clearPunishingStates()
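Each call to `doInteractions(n)` in the loop above runs n sense-act-reward cycles. Conceptually, one cycle looks like the following sketch (the method names follow PyBrain's Experiment/Task/Agent interfaces; the `Dummy` classes are stand-ins invented for illustration):

```python
# One interaction step of a PyBrain-style Experiment: the task
# mediates between agent and environment. The Dummy classes only
# record the call order; they are not PyBrain code.
class DummyTask:
    def __init__(self):
        self.log = []
    def getObservation(self):
        self.log.append("obs")
        return (1, 1)                  # e.g. the agent's maze position
    def performAction(self, action):
        self.log.append("act")
    def getReward(self):
        self.log.append("rew")         # altered getReward() runs here
        return -0.02

class DummyAgent:
    def integrateObservation(self, obs):
        pass
    def getAction(self):
        return 0                       # exploration would be applied here
    def giveReward(self, r):
        pass                           # stored until agent.learn()

def do_interaction(task, agent):
    # the sense -> act -> reward cycle that doInteractions(n) repeats
    agent.integrateObservation(task.getObservation())
    task.performAction(agent.getAction())
    agent.giveReward(task.getReward())

task, agent = DummyTask(), DummyAgent()
do_interaction(task, agent)
print(task.log)
```

The rewards are only accumulated during interaction; the value table is updated in bulk when `agent.learn()` runs, which is why the slide calls this batch mode.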
Results