
PyTorch: Researcher Edition

Adam Paszke, Sam Gross, Soumith Chintala, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Alban Desmaison, Andreas Kopf, Edward Yang, Zach Devito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy & Team
What is PyTorch?

• an ndarray library with GPU support (a NumPy alternative)
• an automatic differentiation engine (for deep learning and reinforcement learning)
• a gradient-based optimization package
• utilities (data loading, etc.)
ndarray library
• np.ndarray <-> torch.Tensor
• 200+ operations, similar to numpy

• very fast acceleration on NVIDIA GPUs
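
A tiny illustration of the Tensor API (the operations and shapes here are arbitrary, not from the slides):

import torch

x = torch.randn(5, 3)      # like np.random.randn(5, 3)
y = torch.ones(3, 4)
z = x.mm(y)                # matrix multiply, result shape (5, 4)
s = z.sum(dim=1)           # reduce along a dimension, like ndarray.sum(axis=1)
print(s.shape)             # torch.Size([5])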


ndarray / Tensor library
(side-by-side on the slide: the same array code written with NumPy and with PyTorch)
NumPy bridge
• zero memory-copy, very efficient

Seamless GPU Tensors
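
A minimal sketch of the NumPy bridge and of moving a Tensor to the GPU (the shapes are illustrative; the .cuda() call assumes an NVIDIA GPU is available):

import numpy as np
import torch

a = np.random.randn(3, 4)
t = torch.from_numpy(a)     # zero memory-copy: t shares a's memory
b = t.numpy()               # back to NumPy, still the same memory

t_gpu = t.cuda()            # seamless GPU Tensor (copies the data to the GPU)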
PyTorch Autograd
an automatic differentiation engine for deep learning and reinforcement learning

import torch
from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

i2h = torch.mm(W_x, x.t())          # records an MM node as it executes
h2h = torch.mm(W_h, prev_h.t())     # another MM node
next_h = i2h + h2h                  # Add node
next_h = next_h.tanh()              # Tanh node

next_h.backward(torch.ones(20, 1))  # seed gradient, same shape as next_h

(the graph MM, MM -> Add -> Tanh shown on the slide is built on the fly as each line runs)
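
For completeness, gradients accumulate in the .grad attribute of leaf Variables created with requires_grad=True; a minimal sketch (the requires_grad flag is an addition not shown in the slide's snippet):

import torch
from torch.autograd import Variable

W_x = Variable(torch.randn(20, 10), requires_grad=True)
x = Variable(torch.randn(1, 10))

out = torch.mm(W_x, x.t()).tanh()   # shape (20, 1)
out.backward(torch.ones(20, 1))     # seed gradient, same shape as out

print(W_x.grad.shape)               # (20, 10): d(out)/d(W_x) accumulated here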
Neural Networks
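
The code from this part of the deck is not reproduced in the extract; a minimal sketch of what a model looks like with the torch.nn package (the layer sizes are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 128)   # illustrative sizes
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)

net = Net()
out = net(torch.randn(32, 784))          # forward pass on a dummy batch
print(out.shape)                         # torch.Size([32, 10])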
Optimization package
SGD, Adagrad, RMSProp, LBFGS, etc.
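
A minimal sketch of one gradient-based update with the torch.optim package (the model, data, and learning rate are placeholders):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)                        # placeholder model
opt = optim.SGD(model.parameters(), lr=0.01)    # the package also provides Adagrad, RMSprop, LBFGS, ...

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)

opt.zero_grad()    # clear gradients from the previous step
loss.backward()    # populate .grad on the parameters
opt.step()         # apply the gradient-based update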
Talk outline
• Research workflows: what do they look like in the deep learning space?
• Pain points: how does PyTorch deal with them?
• Core philosophy of PyTorch
• Upcoming features
Research Workflows

idea / theory → literature review → design experiments → pick datasets / environments
→ implement models → training & validation → add results to paper
Work items in practice
• writing dataset loaders
• building models
• implementing the training loop
• checkpointing models
• interfacing with environments
• dealing with GPUs
• building optimizers
• building baselines

Python + PyTorch: an environment to do all of this
Writing Data Loaders
• every dataset is slightly differently formatted
• datasets have to be preprocessed and normalized differently
• you need a multithreaded data loader to feed GPUs fast enough
Writing Data Loaders
PyTorch solution:
• share data loaders across the community!
• use regular Python to write Datasets: leverage existing Python code
  (example: ParlAI)
• code in practice (see the sketch below)
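
A minimal sketch of the "code in practice" idea, assuming a toy in-memory dataset (the class name and data are illustrative, not from the slides):

import torch
from torch.utils.data import Dataset, DataLoader

class RandomVectors(Dataset):
    """Illustrative dataset: random feature/label pairs kept in memory."""
    def __init__(self, n=1000):
        self.x = torch.randn(n, 10)
        self.y = (torch.rand(n) > 0.5).long()   # binary labels

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

# num_workers > 0 feeds batches from worker processes
# (guard with `if __name__ == "__main__":` on platforms that use spawn)
loader = DataLoader(RandomVectors(), batch_size=32, shuffle=True, num_workers=2)

for features, labels in loader:    # batches arrive already collated into tensors
    pass                           # a training step would go here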
Interfacing with environments
Cars, video games, the internet, ...
• pretty much every environment provides a Python API
• natively interact with the environment directly
Debugging
• PyTorch is a Python extension
• use your favorite Python debugger
• or use the most popular debugger of all: print(foo)
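
Because execution is eager, a standard breakpoint works anywhere in model code; a small sketch using pdb (the breakpoint location is arbitrary):

import pdb
import torch

x = torch.randn(4, 4)
y = x.tanh()
pdb.set_trace()    # drops into the debugger: inspect x, y, y.shape interactively
z = y.sum()
print(z)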
Identifying bottlenecks
• PyTorch is a Python extension
• use your favorite Python profiler, e.g. line_profiler
Compilation Time
• PyTorch is written for the impatient
• absolutely no compilation time when writing your scripts whatsoever
• all core kernels are pre-compiled
Ecosystem
• use the entire Python ecosystem at your will
• including SciPy, scikit-learn, etc.
• a shared model-zoo
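
One concrete form the shared model-zoo takes is the torchvision package; a small sketch (the choice of resnet18 is arbitrary, and downloading the weights requires network access):

import torch
import torchvision.models as models

resnet = models.resnet18(pretrained=True)   # fetches pretrained weights from the model zoo
resnet.eval()

with torch.no_grad():
    out = resnet(torch.randn(1, 3, 224, 224))   # dummy image batch
print(out.shape)                                 # torch.Size([1, 1000]) ImageNet logits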
Linear style of programming
• PyTorch is an imperative / eager computational toolkit
  - not unique to PyTorch: Chainer, DyNet, MXNet-Imperative, TensorFlow-imperative, TensorFlow-eager, etc.
  - least overhead, designed with this in mind:
    20 to 30 microseconds of overhead per node creation,
    vs. several milliseconds / seconds in other options
Going through an example
The Philosophy of PyTorch
• Stay out of the way
• Cater to the impatient
• Promote linear code-flow
• Full interop with the Python ecosystem
• Be as fast as anything else


Upcoming features
Distributed PyTorch
• MPI-style distributed communication
• broadcast Tensors to other nodes
• reduce Tensors among nodes
  - for example: sum gradients among all nodes
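
A minimal sketch of what these collectives look like with the torch.distributed package (the backend and the env:// initialization are illustrative assumptions; each process is assumed to be launched with its rank and world size set in the environment):

import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo", init_method="env://")

t = torch.ones(4)
dist.broadcast(t, src=0)                    # every node receives rank 0's tensor
dist.all_reduce(t, op=dist.ReduceOp.SUM)    # e.g. sum gradients among all nodes
print(t)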


Higher order derivatives
• grad(grad(grad(grad(grad(grad(torch.norm(x)))))))
• useful to implement crazy ideas
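
A small sketch with torch.autograd.grad, where create_graph=True keeps the gradient itself differentiable (the function being differentiated is illustrative):

import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 3).sum()

g1, = torch.autograd.grad(y, x, create_graph=True)          # dy/dx = 3 * x**2
g2, = torch.autograd.grad(g1.sum(), x, create_graph=True)   # d/dx of sum(dy/dx) = 6 * x
print(g2)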
Broadcasting and Advanced Indexing
• Numpy-style broadcasting
• Numpy-style indexing that covers advanced cases
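
A small illustration of both features (shapes chosen arbitrarily):

import torch

a = torch.randn(4, 1)
b = torch.randn(1, 5)
c = a + b                       # NumPy-style broadcasting: result has shape (4, 5)

idx = torch.tensor([0, 2, 3])
rows = c[idx]                   # advanced indexing: pick rows 0, 2 and 3
positives = c[c > 0]            # boolean-mask indexing, 1-D result
print(c.shape, rows.shape, positives.shape)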
JIT Compilation
• possible in imperative frameworks
• the key idea is deferred or lazy evaluation
  - y = x + 2
  - z = y * y
  - # nothing is executed yet, but the graph is being constructed
  - print(z)  # now the entire graph is executed: z = (x+2) * (x+2)
• we can do just-in-time compilation on the graph before execution
Lazy Evaluation

import torch
from torch.autograd import Variable

x = Variable(torch.randn(1, 10))
prev_h = Variable(torch.randn(1, 20))
W_h = Variable(torch.randn(20, 20))
W_x = Variable(torch.randn(20, 10))

i2h = torch.mm(W_x, x.t())
h2h = torch.mm(W_h, prev_h.t())
next_h = i2h + h2h
next_h = next_h.tanh()
# graph (MM, MM -> Add -> Tanh) built but not actually executed

print(next_h)
# data accessed: now the entire graph is executed
Lazy Evaluation
• a little bit of time between building and executing the graph
  - use it to compile the graph just-in-time
JIT Compilation
• fuse and optimize operations
  (figure: adjacent nodes of the MM / Add / Tanh graph fused into a single operation)
JIT Compilation
• cache subgraphs: "I've seen this part of the graph before, let me pull up the compiled version from cache"
Compilation benefits
• out-of-order execution
• automatic work placement across devices and nodes (CPU, GPU 0, GPU 1 on nodes 0 and 1)
• kernel fusion (e.g. Conv2d + BatchNorm + ReLU fused into one kernel)
With ❤ from
http://pytorch.org
Released Jan 18th
100,000+ downloads
950+ community repos
8200+ user posts
245 contributors
