
Mastering Atari with Discrete World Models

Danijar Hafner (1,2), Timothy Lillicrap (3), Mohammad Norouzi (1), Jimmy Ba (2)
1 Google Brain, 2 University of Toronto, 3 DeepMind

1 Introducing DreamerV2

- RL agent that learns purely from latent space predictions of a learned world model
- First world model to achieve human-level performance on the Atari 200M benchmark
- Outperforms top model-free agents with the same amount of data and computation time
- Vector of categorical latents enforces learning sparse state representations
- KL balancing scales prior vs posterior loss to encourage accurate prior dynamics

[Figure: Gamer normalized task median over 55 games at 200M frames with sticky actions.]

2 World Model Learning

- Based on the latent dynamics model of PlaNet, which predicts ahead in a compact latent space and is trained as a sequential VAE
- Each latent state is a vector of 32 categoricals with 32 classes each, trained via straight-through gradients (1 line of code)
- KL balancing scales the two terms in the KL, the posterior entropy and the prior cross entropy, to encourage accurate prior dynamics (see the sketch after this list)
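As a rough illustration of the two world-model details above, here is a minimal JAX sketch of the 32x32 categorical latents with straight-through gradients and of KL balancing. The 0.8 mixing weight and the 32x32 shape follow the paper; the function names, dummy logits, and batch size are illustrative assumptions, not the reference implementation.

```python
# Minimal JAX sketch of DreamerV2-style discrete latents: a vector of
# 32 categoricals with 32 classes, sampled with straight-through
# gradients, plus KL balancing between posterior and prior.
import jax
import jax.numpy as jnp

NUM_VARS, NUM_CLASSES = 32, 32  # vector of 32 categoricals with 32 classes


def sample_straight_through(key, logits):
  """Sample one-hot latents; gradients flow through the probabilities."""
  probs = jax.nn.softmax(logits, axis=-1)
  index = jax.random.categorical(key, logits, axis=-1)
  onehot = jax.nn.one_hot(index, logits.shape[-1])
  # Straight-through estimator: the forward pass uses the sample, the
  # backward pass uses the differentiable probabilities (the "1 line of code").
  return onehot + probs - jax.lax.stop_gradient(probs)


def categorical_kl(logits_q, logits_p):
  """KL(q || p) between categorical distributions, summed over the 32 latents."""
  log_q = jax.nn.log_softmax(logits_q, axis=-1)
  log_p = jax.nn.log_softmax(logits_p, axis=-1)
  kl = jnp.sum(jnp.exp(log_q) * (log_q - log_p), axis=-1)
  return jnp.sum(kl, axis=-1)


def balanced_kl_loss(post_logits, prior_logits, mix=0.8):
  """KL balancing: train the prior toward the posterior faster than the
  posterior is regularized toward the prior."""
  sg = jax.lax.stop_gradient
  prior_term = categorical_kl(sg(post_logits), prior_logits)  # trains the prior
  post_term = categorical_kl(post_logits, sg(prior_logits))   # regularizes the posterior
  return mix * prior_term + (1.0 - mix) * post_term


# Usage with random logits standing in for encoder / dynamics outputs.
key = jax.random.PRNGKey(0)
post_logits = jax.random.normal(key, (16, NUM_VARS, NUM_CLASSES))
prior_logits = jax.random.normal(key, (16, NUM_VARS, NUM_CLASSES))
latent = sample_straight_through(key, post_logits)  # (16, 32, 32) one-hot vectors
kl = balanced_kl_loss(post_logits, prior_logits)    # (16,) balanced KL per example
```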
3 Actor Critic Learning

- Learn neural network actor and state-value critic from imagined rollouts in the compact state space of the world model
- Compact states enable highly efficient predictions of several thousands of rollouts with a large batch size on a single GPU
- For each imagined trajectory, we compute the lambda return, which is an average of n-step returns over all horizons n
- The critic is trained to predict the lambda return via squared loss
- The actor is trained to maximize the lambda return, using both reinforce gradients and straight-through gradients (see the sketch after this list)
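The sketch below shows, under the same caveat of illustrative names and hyperparameters, how the lambda return and the two losses could be computed in JAX. The recursive lambda-return form and the squared critic loss follow the description above; the straight-through (dynamics backprop) part of the actor loss is omitted, so only the reinforce and entropy terms appear.

```python
# Minimal JAX sketch of the lambda return and the actor-critic losses on
# imagined trajectories. Rewards, values, action log-probs and entropies
# would come from the world model, the critic, and the actor respectively.
import jax
import jax.numpy as jnp


def lambda_return(rewards, values, discount=0.995, lam=0.95):
  """Recursive lambda return: an average of n-step returns over all horizons n.

  rewards: (H,) imagined rewards; values: (H + 1,) critic predictions,
  where the last entry bootstraps the end of the imagined rollout.
  """
  horizon = rewards.shape[0]
  returns = jnp.zeros_like(rewards)
  last = values[-1]
  for t in reversed(range(horizon)):
    last = rewards[t] + discount * ((1 - lam) * values[t + 1] + lam * last)
    returns = returns.at[t].set(last)
  return returns


def critic_loss(values, returns):
  """Squared error between critic predictions and the (fixed) lambda returns."""
  return jnp.mean((values[:-1] - jax.lax.stop_gradient(returns)) ** 2)


def actor_loss(log_probs, values, returns, entropy, rho=0.9, eta=1e-3):
  """Reinforce gradients plus an entropy bonus; the straight-through term
  that backpropagates the return through the dynamics is left out here."""
  advantage = jax.lax.stop_gradient(returns - values[:-1])
  reinforce = log_probs * advantage
  return -jnp.mean(rho * reinforce + eta * entropy)
```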

4 Atari Benchmark

Comparison on the established Atari 200M benchmark of 55 games with sticky actions. We consider the following score aggregations:

- Task Median, Gamer Normalized: most common in the literature, but the median ignores the scores of almost half of the games
- Task Mean, Gamer Normalized: dominated by a few games with a low gamer baseline (Crazy Climber, Video Pinball, James Bond)
- Task Mean, Record Normalized: suggested by Toromanoff et al. (2019), but a few games still contribute the most
- Task Mean, Clipped Record Norm: we recommend clipping scores to not exceed the world record, which results in a robust metric (see the sketch after this list)

DreamerV2 outperforms the top single-GPU algorithms Rainbow and IQN in all four metrics.
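A small sketch of these aggregations, assuming the usual (score - random) / (reference - random) normalization against either the human gamer or the world record; the numbers in the example are placeholders, not actual per-game baselines.

```python
# Sketch of gamer- and record-normalized Atari score aggregations.
import jax.numpy as jnp


def normalized(score, random, reference):
  """Normalize raw scores against a reference baseline (human gamer or world record)."""
  return (score - random) / (reference - random)


def gamer_norm_median(scores, randoms, gamers):
  """Task median of gamer-normalized scores (the most common aggregation)."""
  return jnp.median(normalized(scores, randoms, gamers))


def clipped_record_norm_mean(scores, randoms, records):
  """Task mean of record-normalized scores, clipped so no game exceeds the record."""
  norm = normalized(scores, randoms, records)
  return jnp.mean(jnp.minimum(norm, 1.0))


# Hypothetical three-game example; values are placeholders only.
scores = jnp.array([12000.0, 300.0, 8500.0])
randoms = jnp.array([100.0, 10.0, 500.0])
records = jnp.array([50000.0, 900.0, 20000.0])
print(clipped_record_norm_mean(scores, randoms, records))
```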
5 Ablation Study

To understand the contributions to performance, we disable key components of the world model:

- Categorical latents contribute substantially to performance
- KL balancing contributes substantially to performance
- Image gradients are essential for learning representations in DreamerV2
- Reward gradients are not necessary for learning representations

6 Related & Future Work

SimPLe: learns a large world model that predicts forward in image space, making it slow to learn the policy. The highest performance of SimPLe, achieved after 2M frames, is far behind that of a human gamer or of model-free agents. According to its authors, SimPLe was trained for fewer steps but does not improve further after that.

MuZero: learns a task-specific sequential value function similar to VPN, without leveraging image information. It uses a sophisticated MCTS tree search that would be compatible with DreamerV2, but requires an expensive distributed training setup and is closed source.

What did not work: vector of binary latents, single categorical latents, long-term entropy bonus as part of the value function, free bits, KL annealing, using only straight-through policy gradients.

Future work: multi-task transfer, long-term memory, global exploration using model uncertainty, hierarchical latents and temporal abstraction, environments with real-world visual complexity, better policy optimization using techniques from the model-free literature.

@danijarh Website with videos and code: danijar.com/dreamerv2
