Building Intelligent Systems with Large Scale Deep Learning

Jeff Dean, Google Brain team, g.co/brain

Presenting the work of many people at Google

Google Brain Team Mission:

Make Machines Intelligent. Improve People’s Lives.

How do we do this?

● Conduct long-term research (>200 papers, see g.co/brain & g.co/brain/papers)

○ Unsupervised learning of cats, Inception, word2vec, seq2seq, DeepDream, image captioning, neural translation, Magenta, ML for robotics control, healthcare, …

● Build and open-source systems like TensorFlow (see tensorflow.org and github.com/tensorflow/tensorflow)

● Collaborate with others at Google and Alphabet to get our work into products

● Train new researchers through internships and the Google Brain Residency program

Main Research Areas

● General Machine Learning Algorithms and Techniques

● Computer Systems for Machine Learning

● Natural Language Understanding

● Perception

● Healthcare

● Robotics

● Music and Art Generation


1980s and 1990s

[Chart: Accuracy vs. Scale (data size, model size); curves for “neural networks” and “other approaches”, with a “more compute” annotation.]

Now

[Chart: Accuracy vs. Scale (data size, model size); curves for “neural networks” and “other approaches”, with a “more compute” annotation.]

Growth of Deep Learning at Google

[Chart: growth over time in the number of directories containing model description files, used across many Google products “and many more”.]

Experiment Turnaround Time and Research Productivity

● Minutes, hours: interactive research! Instant gratification!

● 1-4 days: tolerable; interactivity replaced by running many experiments in parallel

● 1-4 weeks: high-value experiments only; progress stalls

● >1 month: don’t even try

Build the right tools


Google Confidential + Proprietary (permission granted to share within NIST)

http://tensorflow.org/ and https://github.com/tensorflow/tensorflow

Open, standard software for general machine learning

Great for Deep Learning in particular

First released Nov 2015

Apache 2.0 license

TensorFlow Goals

Establish common platform for expressing machine learning ideas and systems

Make this platform the best in the world for both research and production use

Open source it so that it becomes a platform for everyone, not just Google

TensorFlow Scaling

Near-linear performance gains with each additional 8x NVIDIA® Tesla® K80 server added to the cluster

TensorFlow supports many platforms

CPU, GPU, iOS, Android, Raspberry Pi, 1st-gen TPU, Cloud TPU

TensorFlow supports many languages

[Slide: logos of supported languages, including Java.]

ML is done in many places

[Map: TensorFlow GitHub stars by GitHub user profiles w/ public locations. Source: http://jrvis.com/red-dwarf/?user=tensorflow&repo=tensorflow]

TensorFlow: A Vibrant Open-Source Community

Rapid development, many outside contributors

● 475+ non-Google contributors to TensorFlow 1.0

● 15,000+ commits in 15 months

● Many community-created tutorials, models, translations, and projects

● ~7,000 GitHub repositories with ‘TensorFlow’ in the title

Direct engagement between community and TensorFlow team

● 5000+ Stack Overflow questions answered

● 80+ community-submitted GitHub issues responded to weekly

● Growing use in ML classes: Toronto, Berkeley, Stanford, …

Google Photos

Google Cloud Platform

[glacier]

Reuse same model for completely different problems

Same basic model structure trained on different data, useful in completely different contexts

Example: given image → predict interesting pixels

www.google.com/sunroof

We have tons of vision problems

Image search, StreetView, Satellite Imagery, Translation, Robotics, Self-driving Cars, …

Computers can now see

Large implications for healthcare

MEDICAL IMAGING

Using similar model for detecting diabetic retinopathy in retinal images

Performance on par or slightly better than the median of 8 U.S. board-certified ophthalmologists (F-score of 0.95 vs. 0.91).

Computers can now see

Large implications for robotics

Combining Vision with Robotics

“Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection”, Sergey Levine, Peter Pastor, Alex Krizhevsky, & Deirdre Quillen, arXiv, arxiv.org/abs/1603.02199

Self-Supervised and End-to-end Pose Estimation

TCN + Self-Supervision (No Labels!)

Scientific Applications of ML


Predicting Properties of Molecules

Aspirin: Toxic? Bind with a given protein? Quantum properties?

Message Passing Neural Net

● Chemical space is too big, so chemists often rely on virtual screening.

● Machine Learning can help search this large space.

● Molecules are graphs: nodes = atoms, edges = bonds (and other stuff).

● Message Passing Neural Nets unify and extend many neural net models that are invariant to graph symmetries.

● State-of-the-art results predicting the output of expensive quantum chemistry calculations, but ~300,000 times faster.
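The message-passing idea in the bullets above can be sketched in a few lines. This is a toy, framework-free sketch, not the paper's exact formulation: the function name, the sum-over-neighbours aggregation, and the single tanh update are illustrative choices.

```python
import numpy as np

def message_pass(node_feats, adjacency, w_msg, w_update):
    """One round of neural message passing over a molecular graph.

    node_feats: (n_atoms, d) per-atom feature vectors
    adjacency:  (n_atoms, n_atoms) 0/1 bond matrix
    """
    # Each atom sums the transformed features of its bonded neighbours...
    messages = adjacency @ (node_feats @ w_msg)
    # ...then updates its own state from its old state plus the messages.
    return np.tanh(node_feats @ w_update + messages)
```

Because messages are summed over neighbours, relabeling the atoms simply permutes the output rows the same way, which is the graph-symmetry invariance the bullets refer to.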

Measuring live cells with image-to-image regression

“Seeing More”

Enabling technology: Image-to-image regression

[Figure: Input / True Depth / Predicted Depth]

Depth prediction on portrait data


Applications for camera effects

[Figure: Input / Saturation / Defocus]

Predict cellular markers from transmission microscopy?

[Figure: Human cancer cells / DIC / nuclei (blue) and cell mask (green)]

[Figure: Human iPSC neurons / phase contrast / nuclei (blue), dendrites (green), and axons (red)]

Scaling language understanding models


Sequence-to-Sequence Model

[Sutskever & Vinyals & Le NIPS 2014]

[Diagram: a deep LSTM reads the input sequence A B C D, then emits the target sequence X Y Z Q.]

Sequence-to-Sequence Model: Machine Translation

[Sutskever & Vinyals & Le NIPS 2014]

[Diagram: the encoder reads the input sentence “Quelle est votre taille? <EOS>”; the decoder then emits the target sentence one token at a time — “How”, “tall”, “are”, “you?” — with each prediction fed back as the next input.]

Sequence-to-Sequence Model: Machine Translation

At inference time: beam search to choose the most probable over possible output sequences

[Sutskever & Vinyals & Le NIPS 2014]

[Diagram: decoding from the input sentence “Quelle est votre taille? <EOS>”.]
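The beam-search decoding described above can be sketched in toy form. Here `TABLE` is a stand-in for the decoder's softmax over next tokens, and all names are illustrative:

```python
import heapq
import math

def beam_search(start, next_tokens, beam_width, max_len, eos):
    """Keep the beam_width most probable partial sequences at each step.

    next_tokens(seq) -> list of (token, probability) continuations.
    Scores are negative log-probabilities, so smaller is better.
    """
    beams = [(0.0, [start])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos:        # finished hypotheses carry over unchanged
                candidates.append((score, seq))
                continue
            for tok, p in next_tokens(seq):
                candidates.append((score - math.log(p), seq + [tok]))
        beams = heapq.nsmallest(beam_width, candidates, key=lambda c: c[0])
        if all(seq[-1] == eos for _, seq in beams):
            break
    return beams[0][1]                # most probable completed sequence

# Toy next-token distributions standing in for the decoder's softmax.
TABLE = {
    ("<s>",): [("how", 0.6), ("what", 0.4)],
    ("<s>", "how"): [("tall", 0.9), ("big", 0.1)],
    ("<s>", "how", "tall"): [("<eos>", 1.0)],
    ("<s>", "how", "big"): [("<eos>", 1.0)],
    ("<s>", "what"): [("<eos>", 1.0)],
}
best = beam_search("<s>", lambda seq: TABLE[tuple(seq)],
                   beam_width=2, max_len=5, eos="<eos>")
# best == ["<s>", "how", "tall", "<eos>"]  (probability 0.6 × 0.9)
```

Unlike greedy decoding, the beam keeps several alternative prefixes alive, so a locally less probable token can still win once the rest of the sequence is scored.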

Smart Reply

Google Research Blog - Nov 2015

[Diagram: Incoming Email → Small Feed-Forward Neural Network (“Activate Smart Reply?” yes/no) → Deep Recurrent Neural Network → Generated Replies]

April 1, 2009: April Fool’s Day joke

Nov 5, 2015: Launched Real Product

Feb 1, 2016: >10% of mobile Inbox replies

Sequence to Sequence model applied to Google Translate


Google Neural Machine Translation Model

[Diagram: one model replica = one machine with 8 GPUs. 8 layers of encoder LSTMs and 8 layers of decoder LSTMs are split across GPUs 1-8, connected by attention; a SoftMax layer produces outputs Y1, Y2, … from inputs X1, X2, … </s>.]

Model + Data Parallelism

[Diagram: parameters distributed across many parameter server machines (“Params”); many model replicas read and update them.]
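A minimal sketch of the synchronous data-parallel update described above, with a scalar parameter and a stand-in `grad_fn`. Real parameter servers shard large tensors across machines and often apply updates asynchronously; this only shows the averaging step.

```python
import numpy as np

def data_parallel_step(params, shards, grad_fn, lr=0.1):
    """One synchronous data-parallel update: every replica computes a
    gradient on its own data shard; the parameter server averages the
    gradients and applies a single update to the shared parameters."""
    replica_grads = [grad_fn(params, shard) for shard in shards]
    avg_grad = sum(replica_grads) / len(replica_grads)
    return params - lr * avg_grad

# Toy objective: mean squared distance from a scalar parameter to the data.
grad_fn = lambda p, x: 2.0 * (p - x).mean()

data = np.arange(8.0)
p_parallel = data_parallel_step(0.0, np.split(data, 4), grad_fn)  # 4 replicas
p_single = data_parallel_step(0.0, [data], grad_fn)               # 1 replica
```

With equal-sized shards, the average of the per-replica gradients equals the full-batch gradient, so the parallel and single-replica updates coincide.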

Neural Machine Translation

[Chart: translation quality (0-6, where 6 = perfect translation) for English > Spanish, English > French, English > Chinese, Spanish > English, French > English, and Chinese > English, comparing human, neural (GNMT), and phrase-based (PBMT) translation.]

Closes gap between old system and human-quality translation by 58% to 87%

Enables better communication across the world

BACKTRANSLATION FROM JAPANESE (en->ja->en)

Phrase-Based Machine Translation (old system):

 

Kilimanjaro is 19,710 feet of the mountain covered with snow, and it is said that the highest mountain in Africa. Top of the west, “Ngaje Ngai” in the Maasai language, has been referred to as the house of God. The top close to the west, there is a dry, frozen carcass of a leopard. Whether the leopard had what the demand at that altitude, there is no that nobody explained.

Google Neural Machine Translation (new system):

Kilimanjaro is a mountain of 19,710 feet covered with snow, which is said to be the highest mountain in Africa. The summit of the west is called “Ngaje Ngai” God ‘s house in Masai language. There is a dried and frozen carcass of a leopard near the summit of the west. No one can explain what the leopard was seeking at that altitude.


Automated machine learning (“learning to learn”)


Current:

Solution = ML expertise + data + computation

Can we turn this into:

Solution = data + 100X computation

???

Early encouraging signs

Trying multiple different approaches:

(1) RL-based architecture search

(2) Model architecture evolution

(3) Learn how to optimize

Appeared in ICLR 2017

Idea: model-generating model trained via RL

(1) Generate ten models

(2) Train them for a few hours

(3) Use loss of the generated models as reinforcement learning signal

arxiv.org/abs/1611.01578
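The three-step loop above, reduced to a toy REINFORCE controller over a single architecture choice. The real controller is an RNN that emits a whole architecture description; here `reward_fn` stands in for "train the child model and measure its validation accuracy", and all names are illustrative.

```python
import math
import random

def rl_architecture_search(choices, reward_fn, iters=60, batch=10, lr=0.1):
    """REINFORCE with a batch-mean baseline over one categorical
    architecture decision (e.g. a layer width)."""
    logits = {c: 0.0 for c in choices}
    for _ in range(iters):
        z = sum(math.exp(v) for v in logits.values())
        probs = {c: math.exp(v) / z for c, v in logits.items()}
        # (1) generate a batch of candidate architectures
        sampled = random.choices(choices, weights=[probs[c] for c in choices], k=batch)
        # (2) "train" them and measure reward (stand-in function here)
        rewards = [reward_fn(s) for s in sampled]
        baseline = sum(rewards) / len(rewards)
        # (3) use the rewards as a policy-gradient signal for the controller
        for s, r in zip(sampled, rewards):
            for c in choices:
                grad = (1.0 if c == s else 0.0) - probs[c]  # d log pi / d logit
                logits[c] += lr * (r - baseline) * grad
    return max(logits, key=logits.get)

random.seed(0)
# Pretend the widest layer always validates best.
best_width = rl_architecture_search([16, 32, 64],
                                    lambda width: 1.0 if width == 64 else 0.5)
```

The controller's sampling distribution shifts toward choices whose "trained" models score above the batch average, which is the same feedback loop the slide describes, just without the hours of child-model training.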

CIFAR-10 Image Recognition Task

Penn Tree Bank Language Modeling Task

[Figure: a “normal” LSTM cell alongside the cell discovered by architecture search.]

Learn2Learn: Learn the Optimization Update Rule

Neural Optimizer Search using Reinforcement Learning, Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc Le. To appear in ICML 2017

More computational power needed

Deep learning is transforming how we design computers


Special computation properties

● Reduced precision is OK: neural nets need “about 1.2 × about 0.6 ≈ about 0.7”, NOT “1.21042 × 0.61127 = 0.73989343”

● A handful of specific operations
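The reduced-precision point can be checked directly, with numpy's float16 standing in for the low-precision arithmetic a TPU-style device uses:

```python
import numpy as np

a, b = 1.21042, 0.61127
exact = a * b                       # float64 product, ~0.73989343

# The same product in half precision: inputs and result each rounded
# to ~3 significant decimal digits — "about 1.2 × about 0.6 ≈ about 0.7".
approx = float(np.float16(a) * np.float16(b))

rel_err = abs(approx - exact) / exact   # a few parts in ten thousand
```

The answer is still "about 0.7", and that small relative error is what makes narrow multipliers (and therefore much denser arithmetic hardware) viable for neural nets.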

Tensor Processing Unit v2

Revealed in May at Google I/O

Google-designed device for neural net training and inference

● 180 teraflops of computation, 64 GB of memory

TPU Pod: 64 2nd-gen TPUs, 11.5 petaflops, 4 terabytes of memory

Programmed via TensorFlow

Same program will run with only minor modifications on CPUs, GPUs, & TPUs

Will be Available through Google Cloud

Cloud TPU - virtual machine w/180 TFLOPS TPUv2 device attached

Making 1000 Cloud TPUs available for free to top researchers who are committed to open machine learning research

We’re excited to see what researchers will do with much more computation! g.co/tpusignup

Machine Learning in Google Cloud

Custom ML models: TensorFlow, Machine Learning Engine

Pre-trained ML models: Vision API, Speech API, Jobs API, Natural Language API, Translation API, Video Intelligence API

Machine Learning for Higher Performance Machine Learning Models

Device Placement with Reinforcement Learning

Placement model (trained via RL) gets graph as input + set of devices, outputs device placement for each graph node

Measured time per step gives RL reward signal

+19.3% faster vs. expert human for NMT model

+19.7% faster vs. expert human for InceptionV3

Device Placement Optimization with Reinforcement Learning, Azalia Mirhoseini, Hieu Pham, Quoc Le, Mohammad Norouzi, Samy Bengio, Benoit Steiner, Yuefeng Zhou, Naveen Kumar, Rasmus Larsen, and Jeff Dean, to appear in ICML 2017, arxiv.org/abs/1706.04972

Now

[Chart: Accuracy vs. Scale (data size, model size); curves for “neural networks” and “other approaches”, with a “more compute” annotation.]

Future

[Chart: Accuracy vs. Scale (data size, model size); curves for “neural networks” and “other approaches”, with a “more compute” annotation.]

Example queries of the future

● Which of these eye images shows symptoms of diabetic retinopathy?

● Please fetch me a cup of tea from the kitchen

● Describe this video in Spanish

● Find me documents related to reinforcement learning for robotics and summarize them in German

Conclusions

Deep neural networks are making significant strides in speech, vision, language, search, robotics, healthcare, …

If you’re not considering how to use deep neural nets to solve your problems, you almost certainly should be

More info about our work: g.co/brain

Thanks!