
Dream to Control

Learning Behaviors by Latent Imagination


Danijar Hafner 1,2   Timothy Lillicrap 3   Jimmy Ba 2   Mohammad Norouzi 1
1 Google Brain   2 University of Toronto   3 DeepMind

1 We introduce Dreamer

1. We present Dreamer, an RL agent that solves visual control tasks purely by latent imagination inside of a world model.
2. To efficiently learn behaviors, we backpropagate multi-step value gradients through imagined latent transitions.
3. Dreamer learns off-policy and is applicable to both continuous and discrete actions and early episode ends.
4. Dreamer exceeds top model-free agents in final performance, data-efficiency, and wall-clock time.

2 Agent Overview

3 Learning Latent Dynamics

World models are an explicit way to summarize an agent's experience and represent its knowledge about the world.
When inputs are large, predicting forward in a compact latent space lets us imagine thousands of trajectories in parallel.
Dreamer is compatible with any representation objective, e.g. reconstruction, contrastive estimation, or reward prediction.

[Figure: latent dynamics model: encode images o1, o2, o3; compute states with actions a1, a2; predict rewards r̂1, r̂2, r̂3; reconstruct ô1, ô2, ô3]
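
As a rough illustration of this panel (not the paper's exact RSSM, which also uses a stochastic state and a KL-regularized prior so it can predict forward without observations), the PyTorch sketch below encodes images, computes compact model states with a recurrent cell, and reconstructs images and predicts rewards from those states. All module names, layer sizes, and the image resolution are illustrative assumptions.

```python
# Minimal world-model sketch: encode images, compute states, predict rewards,
# reconstruct observations. Deterministic latent for brevity (the actual model
# is stochastic); every size below is an illustrative assumption.
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    def __init__(self, latent_dim=200, action_dim=6):
        super().__init__()
        # Encode 64x64 RGB observations into embeddings.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(128 * 6 * 6, latent_dim))
        # Compute the next compact state from state, action, and embedding.
        self.rnn = nn.GRUCell(latent_dim + action_dim, latent_dim)
        # Reconstruct images and predict rewards from the compact states.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 6 * 6), nn.Unflatten(1, (128, 6, 6)),
            nn.ConvTranspose2d(128, 64, 4, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2))
        self.reward_head = nn.Sequential(
            nn.Linear(latent_dim, 200), nn.ELU(), nn.Linear(200, 1))

    def loss(self, images, actions, rewards):
        # images: (B, T, 3, 64, 64), actions: (B, T, A), rewards: (B, T).
        B, T = images.shape[:2]
        embeds = self.encoder(images.flatten(0, 1)).view(B, T, -1)
        state = torch.zeros(B, embeds.shape[-1], device=images.device)
        states = []
        for t in range(T):
            state = self.rnn(torch.cat([embeds[:, t], actions[:, t]], -1), state)
            states.append(state)
        states = torch.stack(states, 1)                  # (B, T, latent_dim)
        recon = self.decoder(states.flatten(0, 1)).view(B, T, 3, 64, 64)
        pred_reward = self.reward_head(states.flatten(0, 1)).view(B, T)
        # Reconstruction is one choice of representation objective; contrastive
        # estimation or pure reward prediction could be swapped in here.
        recon_loss = ((recon - images) ** 2).mean()
        reward_loss = ((pred_reward - rewards) ** 2).mean()
        return recon_loss + reward_loss, states
```

The returned states are the compact model states from which the behavior learning in panel 4 starts its imagined trajectories.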

4 Learning Behaviors in Imagination

Dreamer learns to predict actions and values from the compact model states.
Start an imagined trajectory from each latent state of the current data batch.
The value model optimizes Bellman consistency for the current action model.
The action model maximizes values by backpropagating through latent transitions.
Reparameterized gradients are used for latent states and actions, straight-through gradients for discrete actions.
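
A PyTorch sketch of this imagination training step follows, using assumed stand-in modules (transition, reward_model, actor, critic) and illustrative hyperparameters rather than the paper's exact implementation: trajectories start from a batch of compact model states, the action model is trained by backpropagating multi-step value estimates through the reparameterized latent rollout, and the value model regresses onto the same targets. For discrete actions, the tanh-Gaussian policy would be replaced by one-hot samples with straight-through gradients.

```python
# Sketch of behavior learning in imagination. transition and reward_model stand
# in for the learned world model; all names, sizes, and hyperparameters are
# illustrative assumptions.
import torch
import torch.nn as nn

latent_dim, action_dim, horizon, gamma, lam = 200, 6, 15, 0.99, 0.95

transition = nn.Sequential(nn.Linear(latent_dim + action_dim, 200), nn.ELU(),
                           nn.Linear(200, latent_dim))        # learned dynamics
reward_model = nn.Sequential(nn.Linear(latent_dim, 200), nn.ELU(),
                             nn.Linear(200, 1))               # learned reward
actor = nn.Sequential(nn.Linear(latent_dim, 200), nn.ELU(),
                      nn.Linear(200, 2 * action_dim))         # mean and log-std
critic = nn.Sequential(nn.Linear(latent_dim, 200), nn.ELU(),
                       nn.Linear(200, 1))                     # state value
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

def policy(state):
    # Reparameterized tanh-Gaussian action, so value gradients can flow back
    # through actions and latent transitions into the action model.
    mean, log_std = actor(state).chunk(2, dim=-1)
    return torch.tanh(mean + torch.randn_like(mean) * log_std.exp())

def lambda_returns(rewards, values):
    # Exponentially weighted multi-step value targets, computed backwards.
    returns, next_value = [], values[-1]
    for t in reversed(range(len(rewards))):
        target = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_value)
        returns.append(target)
        next_value = target
    return list(reversed(returns))

def imagination_step(start_states):
    # Start an imagined trajectory from every latent state in the data batch.
    states, state = [start_states], start_states
    for _ in range(horizon):
        state = transition(torch.cat([state, policy(state)], dim=-1))
        states.append(state)
    rewards = [reward_model(s).squeeze(-1) for s in states[1:]]
    values = [critic(s).squeeze(-1) for s in states]
    returns = lambda_returns(rewards, values)

    # Action model: maximize imagined returns by backpropagating multi-step
    # value gradients through the latent transitions (only the actor steps).
    actor_loss = -torch.stack(returns).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Value model: Bellman consistency for the current action model, i.e.
    # regress onto the same return targets with gradients stopped.
    preds = [critic(s.detach()).squeeze(-1) for s in states[:-1]]
    critic_loss = ((torch.stack(preds) - torch.stack(returns).detach()) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
```

Here start_states would be the posterior states produced by the world model from a batch of replayed experience, flattened across batch and time.
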
5 Comparison to Existing Methods

20 visual control tasks posing challenges incl. contact dynamics, sparse rewards, many degrees of freedom, and 3D environments.
Dreamer outperforms the previous best D4PG agent in final performance, sample-efficiency, and wall-clock time. We use the same hyperparameters across all tasks.
Training time: 1-2 days on a single GPU.

[Bar chart: mean scores across the 20 tasks; Dreamer reaches 823 versus 786 for the best model-free agent D4PG, with lower-scoring baselines at 243 and 332]

6 Ablation Studies

Robustness to the imagination horizon
Effect of value model and action model
Choice of representation learning method

Contact: mail@danijar.com Twitter: @danijarh Project website with videos: danijar.com/dreamer
