You are on page 1of 86

Building Intelligent Systems with

Large Scale Deep Learning
Jeff Dean
Google Brain team
g.co/brain
Presenting the work of many people at Google
Google Brain Team Mission:
Make Machines Intelligent.
Improve People’s Lives.
How do we do this?
● Conduct long-term research (>200 papers, see g.co/brain & g.co/brain/papers)
○ Unsupervised learning of cats, Inception, word2vec, seq2seq, DeepDream, image
captioning, neural translation, Magenta, ML for robotics control, healthcare, …

● Build and open-source systems like TensorFlow (see tensorflow.org and
https://github.com/tensorflow/tensorflow)

● Collaborate with others at Google and Alphabet to get our work into
the hands of billions of people (e.g., RankBrain for Google Search, GMail
Smart Reply, Google Photos, Google speech recognition, Google Translate, Waymo, …)

● Train new researchers through internships and the Google Brain
Residency program
Main Research Areas

● General Machine Learning Algorithms and Techniques
● Computer Systems for Machine Learning
● Natural Language Understanding
● Perception
● Healthcare
● Robotics
● Music and Art Generation
Main Research Areas

● General Machine Learning Algorithms and Techniques
● Computer Systems for Machine Learning
● Natural Language Understanding
● Perception
● Healthcare
● Robotics
● Music and Art Generation
research.googleblog.com/2017/01
/the-google-brain-team-looking-ba
ck-on.html
1980s and 1990s

Accuracy neural networks

other approaches

Scale (data size, model size)
1980s and 1990s

more
Accuracy compute neural networks

other approaches

Scale (data size, model size)
Now

more
Accuracy compute neural networks

other approaches

Scale (data size, model size)
Growth of Deep Learning at Google

and many more . . . .
Directories containing model description files
Experiment Turnaround Time and Research Productivity
● Minutes, Hours:
○ Interactive research! Instant gratification!
● 1-4 days
○ Tolerable
○ Interactivity replaced by running many experiments in parallel
● 1-4 weeks
○ High value experiments only
○ Progress stalls
● >1 month
○ Don’t even try
Build the right tools

Google Confidential + Proprietary (permission granted to share within NIST)
Open, standard software for
general machine learning

Great for Deep Learning in
particular

http://tensorflow.org/ First released Nov 2015
and
Apache 2.0 license
https://github.com/tensorflow/tensorflow
TensorFlow Goals
Establish common platform for expressing machine learning ideas and systems

Make this platform the best in the world for both research and production use

Open source it so that it becomes a platform for everyone, not just Google
TensorFlow Scaling
Near-linear performance gains with each additional 8x NVIDIA® Tesla® K80
server added to the cluster
TensorFlow supports many platforms

CPU GPU

iOS Android

Raspberry Pi

1st-gen TPU Cloud TPU
TensorFlow supports many languages
Java
2013

2011

2013

2013

2010

late 2015
ML is done in many places

TensorFlow GitHub stars by GitHub user profiles w/ public locations
Source: http://jrvis.com/red-dwarf/?user=tensorflow&repo=tensorflow
TensorFlow: A Vibrant Open-Source Community
● Rapid development, many outside contributors
○ 475+ non-Google contributors to TensorFlow 1.0
○ 15,000+ commits in 15 months
○ Many community created tutorials, models, translations, and projects
■ ~7,000 GitHub repositories with ‘TensorFlow’ in the title

● Direct engagement between community and TensorFlow team
○ 5000+ Stack Overflow questions answered
○ 80+ community-submitted GitHub issues responded to weekly

● Growing use in ML classes: Toronto, Berkeley, Stanford, ...
Google Photos

[glacier]
Google Cloud Platform Confidential & Proprietary 24 24
Reuse same model for completely
different problems

Same basic model structure
trained on different data,
useful in completely different contexts

Example: given image → predict interesting pixels
www.google.com/sunroof

We have tons of vision problems

Image search, StreetView, Satellite Imagery,
Translation, Robotics, Self-driving Cars,
Computers can now see

Large implications for healthcare

Google Confidential + Proprietary (permission granted to share within NIST)
MEDICAL IMAGING
Using similar model for detecting diabetic
retinopathy in retinal images
Performance on par or slightly better than the median of 8 U.S.
board-certified ophthalmologists (F-score of 0.95 vs. 0.91).
http://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html
Computers can now see

Large implications for robotics

Google Confidential + Proprietary (permission granted to share within NIST)
Combining Vision with Robotics

“Deep Learning for Robots: Learning
from Large-Scale Interaction”, Google
Research Blog, March, 2016

“Learning Hand-Eye Coordination for
Robotic Grasping with Deep Learning
and Large-Scale Data Collection”,
Sergey Levine, Peter Pastor, Alex
Krizhevsky, & Deirdre Quillen,
Arxiv, arxiv.org/abs/1603.02199
Self-Supervised and End-to-end Pose Estimation

Confidential + Proprietary
TCN + Self-Supervision (No Labels!)

Confidential + Proprietary
Scientific Applications of ML

Google Confidential + Proprietary (permission granted to share within NIST)
Predicting Properties of Molecules
Toxic?
Message
Passing Neural Bind with a given protein?
Aspirin Net

Quantum properties.

● Chemical space is too big, so chemists often rely on virtual screening.
● Machine Learning can help search this large space.
● Molecules are graphs, nodes=atoms and edges=bonds (and other stuff)
● Message Passing Neural Nets unify and extend many neural net models
that are invariant to graph symmetries
● State of the art results predicting output of expensive quantum chemistry
calculations, but ~300,000 times faster
https://research.googleblog.com/2017/04/predicting-properties-of-molecules-with.html and
https://arxiv.org/abs/1702.05532 and https://arxiv.org/abs/1704.01212 (latter to appear in ICML 2017)
Measuring live cells with
image to image regression

“Seeing More”
Enabling technology: Image to image regression
Input True Depth Predicted Depth
Depth prediction on portrait data
Applications for camera effects
Input Saturation Defocus
Predict cellular markers
from transmission microscopy?
Human cancer cells / DIC / nuclei (blue)
and cell mask (green)
Human iPSC neurons / phase contrast /
nuclei (blue), dendrites (green), and
axons (red)
Scaling language understanding models

Google Confidential + Proprietary (permission granted to share within NIST)
Sequence-to-Sequence Model
Target sequence
[Sutskever & Vinyals & Le NIPS 2014] X Y Z Q

v

Deep LSTM
A B C D __ X Y Z

Input sequence
Sequence-to-Sequence Model: Machine Translation
Target sentence
[Sutskever & Vinyals & Le NIPS 2014] How

v

Quelle est votre taille? <EOS>

Input sentence
Sequence-to-Sequence Model: Machine Translation
Target sentence
[Sutskever & Vinyals & Le NIPS 2014] How tall

v

Quelle est votre taille? <EOS> How

Input sentence
Sequence-to-Sequence Model: Machine Translation
Target sentence
[Sutskever & Vinyals & Le NIPS 2014] How tall are

v

Quelle est votre taille? <EOS> How tall

Input sentence
Sequence-to-Sequence Model: Machine Translation
Target sentence
[Sutskever & Vinyals & Le NIPS 2014] How tall are you?

v

Quelle est votre taille? <EOS> How tall are

Input sentence
Sequence-to-Sequence Model: Machine Translation
At inference time:
Beam search to choose most probable
[Sutskever & Vinyals & Le NIPS 2014] over possible output sequences

v

Quelle est votre taille? <EOS>

Input sentence
Smart Reply
Google Research Blog
- Nov 2015
Incoming Email
Activate
Smart Reply?
Small
Feed-Forward yes/no
Neural Network
Smart Reply
Google Research Blog
- Nov 2015
Incoming Email
Activate
Smart Reply?
Small
Feed-Forward yes/no
Neural Network

Generated Replies

Deep Recurrent
Neural Network
Smart Reply
April 1, 2009: April Fool’s Day joke

Nov 5, 2015: Launched Real Product

Feb 1, 2016: >10% of mobile Inbox replies
Sequence to Sequence model
applied to Google Translate

Google Confidential + Proprietary (permission granted to share within NIST)
https://arxiv.org/abs/1609.08144
Google Neural Machine Translation Model
Y1 Y2 </s>
One
model
Encoder LSTMs SoftMax
replica:
one
machine Decoder LSTMs
Gpu8
w/ 8 Gpu8
GPUs

8 Layers
+ + +
+ + +
Gpu3

Gpu3

Gpu2

Attention Gpu2

Gpu2

Gpu1
Gpu1
<s> Y1 Y3
X3 X2 </s>
Model + Data Parallelism

Parameters
distributed across Params Params Params
many parameter
server machines
...

Many
replicas
...
Neural Machine Translation

6 perfect translation

5
human
Translation quality

4 neural (GNMT)
phrase-based (PBMT)
3

2
Closes gap between old system
1 and human-quality translation
by 58% to 87%
0
English English English Spanish French Chinese
> > > > > >
Spanish French Chinese English English English
Enables better communication
Translation model
across the world

research.googleblog.com/2016/09/a-neural-network-for-machine.html
BACKTRANSLATION
FROM JAPANESE (en->ja->en)
Phrase-Based Machine Translation (old system):
Kilimanjaro is 19,710 feet of the mountain covered with snow, and
it is said that the highest mountain in Africa. Top of the west,
“Ngaje Ngai” in the Maasai language, has been referred to as the
house of God. The top close to the west, there is a dry, frozen
carcass of a leopard. Whether the leopard had what the demand at
that altitude, there is no that nobody explained.

Google Neural Machine Translation (new system):
Kilimanjaro is a mountain of 19,710 feet covered with snow, which is
said to be the highest mountain in Africa. The summit of the west is
called “Ngaje Ngai” God ‘s house in Masai language. There is a dried
and frozen carcass of a leopard near the summit of the west. No one
can explain what the leopard was seeking at that altitude.
Automated machine learning
(“learning to learn”)

Google Confidential + Proprietary (permission granted to share within NIST)
Current:
Solution = ML expertise + data + computation
Current:
Solution = ML expertise + data + computation

Can we turn this into:
Solution = data + 100X computation

???
Early encouraging signs

Trying multiple different approaches:

(1) RL-based architecture search
(2) Model architecture evolution
(3) Learn how to optimize
Appeared in ICLR 2017

Idea: model-generating model trained via RL

(1) Generate ten models
(2) Train them for a few hours
(3) Use loss of the generated models as reinforcement
learning signal
arxiv.org/abs/1611.01578
CIFAR-10 Image Recognition Task
Penn Tree Bank Language Modeling Task
“Normal” LSTM cell

Cell discovered by
architecture search
Learn2Learn: Learn the Optimization Update Rule

Neural Optimizer Search using Reinforcement Learning,
Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc Le. To appear in ICML 2017
More computational power needed

Deep learning is transforming how we
design computers

Google Confidential + Proprietary (permission granted to share within NIST)
Special computation properties

about 1.2 1.21042
reduced
precision × about 0.6 NOT × 0.61127
ok
about 0.7 0.73989343
Special computation properties

about 1.2 1.21042
reduced
precision × about 0.6 NOT × 0.61127
ok
about 0.7 0.73989343

handful of
specific
operations
× =
Tensor Processing Unit v2

Revealed in May at Google I/O

Google-designed device for neural net training and inference

● 180 teraflops of computation, 64 GB of memory
TPU Pod
64 2nd-gen TPUs
11.5 petaflops
4 terabytes of memory
Programmed via TensorFlow
Same program will run with only minor
modifications on CPUs, GPUs, & TPUs

Will be Available through Google Cloud
Cloud TPU - virtual machine w/180 TFLOPS TPUv2 device attached
Making 1000 Cloud TPUs available for free to top researchers who are
committed to open machine learning research

We’re excited to see what researchers will do with much more computation!
g.co/tpusignup
Machine Learning in Google Cloud

Custom ML models Pre-trained ML models

Vision API Speech API Jobs API

TensorFlow Machine Learning
Engine

Natural Translation Video
Language API API Intelligence API
Machine Learning for
Higher Performance Machine Learning
Models

Google Confidential + Proprietary (permission granted to share within NIST)
Device Placement with Reinforcement Learning
Placement model (trained
via RL) gets graph as input Measured time
+ set of devices, outputs per step gives
device placement for each RL reward signal
graph node

+19.3% faster vs. expert human for NMT model +19.7% faster vs. expert human for InceptionV3

Device Placement Optimization with Reinforcement Learning,
Azalia Mirhoseini, Hieu Pham, Quoc Le, Mohammad Norouzi, Samy Bengio, Benoit Steiner, Yuefeng Zhou,
Naveen Kumar, Rasmus Larsen, and Jeff Dean, to appear in ICML 2017, arxiv.org/abs/1706.04972
Now

more
Accuracy compute neural networks

other approaches

Scale (data size, model size)
Future

more
Accuracy compute neural networks

other approaches

Scale (data size, model size)
Example queries of the future
Which of these eye
images shows Describe this video
symptoms of diabetic in Spanish
retinopathy?

Find me documents related to
reinforcement learning for
Please fetch me a cup robotics and summarize them
of tea from the kitchen in German
Conclusions
Deep neural networks are making significant strides in
speech, vision, language, search, robotics, healthcare, …

If you’re not considering how to use deep neural nets to solve
your problems, you almost certainly should be
g.co/brain
More info about our work
g.co/brain

Thanks!