
Real-Time Machine Learning with PyBrain

Boris Mocialov

Engineering & Physical Sciences


Heriot-Watt University,
Edinburgh Centre For Robotics

2015

Outline

- PyBrain
- Alternatives
- RL in Brief
- PyBrain RL
- Examples
- Source Alterations

PyBrain

- Easy to use (minimal example below)
- Algorithms for ANN, UL, SL, RL, and evolutionary methods
- Modular
- Feed-forward/recurrent NNs, LSTM, deep belief nets, Boltzmann machines
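As a minimal illustration of the easy-to-use, modular API, the sketch below builds and trains a small feed-forward network with backpropagation using PyBrain's standard shortcuts; the layer sizes and the XOR data are illustrative only.

from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

net = buildNetwork(2, 3, 1)                  # 2 inputs, 3 hidden units, 1 output
ds = SupervisedDataSet(2, 1)                 # dataset: 2-d inputs, 1-d targets
for inp, target in [((0, 0), (0,)), ((0, 1), (1,)),
                    ((1, 0), (1,)), ((1, 1), (0,))]:
    ds.addSample(inp, target)                # XOR truth table

trainer = BackpropTrainer(net, ds)
for _ in range(100):
    trainer.train()                          # one epoch of backprop per call

print(net.activate((0, 1)))                  # query the trained network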

ML Alternatives
- FANN (C/C++)
  Fast; evolving topologies; parameters adjustable on the fly
- Encog (Java)
  Multi-threaded; SVM, ANN, GP, BN, HMM, GA
- Theano (Python)
  Number-crunching framework with tight NumPy integration; fast; many sub-projects (sketch below):
  - Pylearn, Theanets [scientific]
  - Lasagne [lightweight FF/C/R-NN, LSTM, CPU/GPU]
  - Keras [modular, minimalistic, C/R-NN, CPU/GPU]
- Caffe (C++)
  Models defined separately from the code; CPU/GPU
- Accord (.NET)
  Combines with audio/video processing libraries; backprop, DBN, BM
- etc.
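A minimal sketch of the Theano style mentioned above (assuming Theano is installed): a symbolic expression is compiled into a function that accepts and returns NumPy arrays.

import numpy as np
import theano
import theano.tensor as T

x = T.dmatrix('x')                   # symbolic input matrix
y = T.nnet.sigmoid(T.dot(x, x.T))    # symbolic expression graph
f = theano.function([x], y)          # compiled into a fast callable

print(f(np.random.rand(3, 3)))       # takes and returns NumPy arrays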
RL in Brief

- Data is spread out in the environment, and distinct states can be distinguished
- The algorithm (agent) must learn a mapping between input and output (behaviour)
- The agent must explore the environment
- The agent receives reinforcement based on state transitions (toy sketch below)
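To make these points concrete, here is a self-contained toy (not PyBrain code; all names are illustrative): a five-cell corridor in which the agent explores with random actions and is reinforced only on transitions that reach the goal cell, while a simple temporal-difference update learns state values.

import random

n_states, goal = 5, 4
value = [0.0] * n_states                    # learned mapping: state -> value
state = 0

for step in range(2000):
    action = random.choice([-1, +1])        # explore the environment
    next_state = min(max(state + action, 0), n_states - 1)
    reward = 1.0 if next_state == goal else 0.0   # reinforcement on the transition
    # temporal-difference update of the current state's value
    value[state] += 0.1 * (reward + 0.9 * value[next_state] - value[state])
    state = 0 if next_state == goal else next_state

print([round(v, 2) for v in value])         # values grow towards the goal cell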

PyBrain RL

PyBrain source: pybrain.rl.environments.mazes

PyBrain source: pybrain.rl.learners.valuebased
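A sketch of what these two packages provide, following the standard PyBrain maze tutorial; exact module contents may differ between PyBrain versions.

from pybrain.rl.environments.mazes import Maze, MDPMazeTask       # grid-world environment + MDP task
from pybrain.rl.learners.valuebased import ActionValueTable, NFQ  # tabular Q-values, neural fitted Q
from pybrain.rl.learners import Q, SARSA                          # classic value-based learners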


Examples
- Inverted pendulum (a.k.a. pole balancing)
  - Continuous states
  - Certain (deterministic) transitions
  - Neuro-fitted Q-learning (NFQ)
  - Epsilon-greedy exploration
  - Stationary
  - Fully observable
  - Finite horizon

- Maze
  - Discrete states
  - Certain (deterministic) transitions
  - Q-learning (update rule below)
  - Epsilon-greedy exploration
  - (Non-)stationary
  - Fully observable
  - Finite horizon
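For reference, the standard tabular Q-learning update used in the maze example, with learning rate \alpha and discount factor \gamma:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

With epsilon-greedy exploration the agent picks a random action with probability \epsilon and a_t = \arg\max_a Q(s_t, a) otherwise; NFQ replaces the table with a neural network trained on the same targets.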
Source Alterations
- pybrain.rl.environments.mazes.maze

  class Maze(Environment, Named):
      initPos = None

      def __init__(self, topology, goal, **args):
          # ...
          if self.initPos is None:
              self.initPos = self._freePos()

      def _freePos(self):
          # ...
          if self.punishing_states is not None:
              if (i, j) not in self.punishing_states:
                  res.append((i, j))

- pybrain.rl.environments.mazes.tasks

  class MDPMazeTask(Task):
      def getReward(self):
          if self.env.goal == self.env.perseus:
              self.env.reset()
              reward = 1
          elif (self.env.punishing_states is not None
                and self.env.perseus in self.env.punishing_states):
              self.env.reset()
              reward = -1
          else:
              reward = -0.02
          return reward

- pybrain.rl.explorers.discrete.egreedy

  class EpsilonGreedyExplorer(DiscreteExplorer):
      # epsilon decay disabled so exploration never dies out:
      # self.epsilon *= self.decay

Maze Real-Time Learning Set-Up
# imports, following the standard PyBrain maze tutorial
# (Maze and MDPMazeTask are the altered versions from the previous slide)
from numpy import array
from pybrain.rl.environments.mazes import Maze, MDPMazeTask
from pybrain.rl.learners.valuebased import ActionValueTable
from pybrain.rl.learners import Q
from pybrain.rl.agents import LearningAgent
from pybrain.rl.experiments import Experiment

envmatrix = array([[1, 1, 1, 1, 1, 1, 1, 1, 1],
                   ...])
# the extra arguments belong to the altered Maze (goal, punishing states, ...)
env = Maze(envmatrix, (1, 7), [(1, 1)], [(1, 6)])

# create task
task = MDPMazeTask(env)

# create value table and initialize with zeros
table = ActionValueTable(81, 4)
table.initialize(0.)

# create the Q-learning learner
learner = Q()

# create agent with controller (value table) and learner
agent = LearningAgent(table, learner)

# create experiment
experiment = Experiment(task, agent)

for i in range(5000):
    # interact with the environment (here in batch mode)
    experiment.doInteractions(200)
    agent.learn()
    agent.reset()

    # half-way through, remove the punishing states so the agent must adapt in real time
    if i == 2500:
        env.clearPunishingStates()

Results

[Figure: first 2500 iterations]

[Figure: second 2500 iterations, after the punishing states are removed]
