Subhankar Roy
email: subhankar.roy@unitn.it
where: Open Space 5, Povo 1
Reference: https://web.cs.ucdavis.edu/~yjlee/teaching/ecs289g-winter2018/Pytorch_Tutorial.pdf
Caffe
- Fast
- Automatic differentiation
Forward and backward pass of a NN
[Diagram: the input flows through the network to a prediction, which is compared with the target to produce the loss (forward pass); gradients of the loss then flow back through the layers (backward pass)]
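As a concrete sketch of what the diagram conveys (the notation here is mine, not from the slides): for a one-hidden-layer network with ReLU and squared-error loss, the two passes are:

```latex
% Forward pass: x input, y target, W_1, W_2 weights
h = W_1 x, \quad h_r = \max(0, h), \quad \hat{y} = W_2 h_r, \quad
L = \lVert \hat{y} - y \rVert^2

% Backward pass: chain rule from the loss back to the weights
\frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y), \quad
\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial \hat{y}}\, h_r^\top, \quad
\frac{\partial L}{\partial W_1} = \Bigl( W_2^\top \frac{\partial L}{\partial \hat{y}} \odot \mathbf{1}[h > 0] \Bigr) x^\top
```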
Forward and backward pass of a NN (NumPy)
[Code slide: the same network implemented with NumPy arrays, with the backward pass derived by hand; a sketch follows]
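A minimal sketch of such a NumPy implementation (sizes and learning rate are illustrative; the slides showed their own code):

```python
import numpy as np

# Two-layer network: batch of N inputs, hidden size H, squared-error loss
N, D_in, H, D_out = 64, 1000, 100, 10
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

lr = 1e-6
for t in range(500):
    # Forward pass: compute the prediction and the loss
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()

    # Backward pass: gradients derived by hand via the chain rule
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0              # gradient of the ReLU
    grad_w1 = x.T.dot(grad_h)

    # Gradient descent update
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2
```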
Forward and backward pass of a NN (PyTorch)
[Two code slides: the same network written with torch Tensors, and then the version where autograd computes the backward pass; a sketch of the autograd version follows]
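A sketch of the autograd version (again, sizes are illustrative): the forward pass is written as plain tensor operations, and loss.backward() fills the .grad attribute of every Tensor created with requires_grad=True.

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

lr = 1e-6
for t in range(500):
    # Forward pass: identical in spirit to the NumPy version
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Backward pass: autograd computes all the gradients for us
    loss.backward()

    with torch.no_grad():          # updates must not enter the graph
        w1 -= lr * w1.grad
        w2 -= lr * w2.grad
        w1.grad.zero_()            # clear gradients for the next step
        w2.grad.zero_()
```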
Computational Graphs
(The following slides are adapted from http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf)
[Figure: the computational graph of a network, from the input image through the intermediate operations down to the loss]
Deep Learning Frameworks
They need to let you:
- Easily build big computational graphs
- Automatically compute gradients in those graphs
- Run everything efficiently on the GPU
CPU vs GPU
[Slide: comparison of CPU and GPU characteristics and throughput]
Computational Graphs
A graph can be coded directly in NumPy, but there are problems:
- The code cannot run on the GPU
- The gradients have to be derived and implemented by hand
Computational Graphs
With a framework we have:
- Computational graph creation
- Automatic gradient computation
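A toy sketch of both features: the graph is built on the fly as we write ordinary tensor expressions, and backward() traverses it to compute the gradients.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

z = x * y + y ** 2   # computational graph created implicitly
z.backward()         # automatic gradient computation

print(x.grad)  # dz/dx = y      -> 3.0
print(y.grad)  # dz/dy = x + 2y -> 8.0
```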
Summary
Back to Tensors
Think of PyTorch as a deep-learning-oriented upgrade of NumPy:
- Allows operations on GPU(s)
- Contains everything you need to set up and train a network
What do we need to train a network?
- Data
- Network
- Cost function
- Update rule
What do we need to train a network? (PyTorch)
- Data: torchvision.datasets + torch.utils.data.DataLoader
- Network: torch.nn.Module
- Cost function: torch.nn.*Loss
- Update rule: torch.optim
Everything is customizable:
you can create your own version of each component
Let’s train!
Your task is to classify digits (MNIST dataset):
- Instantiate a dataloader
- MNIST is already in torchvision.datasets
- Create a simple MLP
- input-to-hidden and hidden-to-output fully connected layers (torch.nn.Linear)
- Do not forget about activation(s)
- Instantiate an optimizer
- torch.optim is your guide
- Instantiate a loss/cost function
- It is a classification task with 10 classes... what about cross-entropy?
- Put things together to implement the training and test procedure
- The evaluation metric is, of course, accuracy (= correct_predictions / number_of_samples); a sketch of this computation follows the list
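For the last point, a hedged sketch of the accuracy computation (it assumes a trained net and a test_loader built as in the following slides):

```python
import torch

correct, total = 0, 0
net.eval()                       # disable training-only behaviour
with torch.no_grad():            # no gradients needed at test time
    for images, labels in test_loader:
        outputs = net(images)
        predictions = outputs.argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
accuracy = correct / total       # correct_predictions / number_of_samples
```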
Setting up a dataset
A dataset is defined through torchvision.datasets.
Various datasets are already there (e.g. MNIST, CIFAR, ImageNet, COCO, ...).
Then we can obtain data and labels by just retrieving its elements, as sketched below:
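A minimal sketch (the root path and batch size are illustrative choices):

```python
import torchvision
from torch.utils.data import DataLoader

# Download MNIST and convert the images to tensors
transform = torchvision.transforms.ToTensor()
train_set = torchvision.datasets.MNIST(root='./data', train=True,
                                       download=True, transform=transform)

image, label = train_set[0]      # retrieving a single element

# A DataLoader yields shuffled mini-batches of (images, labels)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
for images, labels in train_loader:
    print(images.shape, labels.shape)   # [64, 1, 28, 28], [64]
    break
```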
Note:
- There exist different types of loaders and different samplers
Setting up a network
In PyTorch, a network is defined as a subclass of torch.nn.Module. We need to define:
- The initialization of the network (i.e. layers, initial values of the parameters, etc.)
- The forward pass
E.g.:
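A minimal sketch of such a subclass (the layer sizes are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim=28 * 28, hidden_dim=128, n_classes=10):
        super().__init__()
        # Initialization: define the layers (parameters get default init)
        self.fc1 = nn.Linear(in_dim, hidden_dim)     # input-to-hidden
        self.fc2 = nn.Linear(hidden_dim, n_classes)  # hidden-to-output

    def forward(self, x):
        # Forward pass: flatten the image, apply the layers and activation
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)   # raw logits; see the note below

net = MLP()
```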
Note:
- Most of these loss functions already contain the proper activation (e.g. CrossEntropyLoss contains a softmax activation), so the network should output raw logits
Setting up an optimizer
The optimizers are defined in torch.optim. The standard template for initializing an optimizer, together with a concrete example, is sketched below:
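A sketch (it assumes net = MLP() from the previous slide; the hyperparameter values are illustrative):

```python
import torch.optim as optim

# Template: optim.<Optimizer>(parameters, <hyperparameters>)
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Different optimizers take different hyperparameters, e.g.
optimizer = optim.Adam(net.parameters(), lr=1e-3)

# The parameters can be filtered, e.g. optimize only the output layer
optimizer = optim.SGD(net.fc2.parameters(), lr=0.01)
```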
Note:
- Different optimizers need different hyperparameters
- You can filter the parameters given to the optimizer
Setting up an optimizer - operations
The optimizer controls how the parameters are updated after each iteration (iteration = 1 forward pass + 1 backward pass) according to their gradients. To update the weights after the backward call, we invoke optimizer.step().
To avoid accumulating gradients, we must clear the .grad attribute of each Tensor in the graph after each iteration. This is achieved by simply calling optimizer.zero_grad(). A full iteration is sketched below:
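A sketch of one epoch of training (it assumes net, optimizer, and train_loader from the previous slides):

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
for images, labels in train_loader:
    optimizer.zero_grad()              # clear accumulated gradients
    outputs = net(images)              # forward pass
    loss = criterion(outputs, labels)
    loss.backward()                    # backward pass: fills .grad
    optimizer.step()                   # update the parameters
```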
Visualizing the results
Babysitting the training procedure by just looking at printed text is objectively boring.
We can exploit tensorboardcolab to get a nice visualization of the training curves, like this:
[Figure: TensorBoard plots of the training loss and accuracy curves]
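tensorboardcolab wraps TensorBoard for use in Colab; as a closely related sketch, the same kind of curves can be logged with torch.utils.tensorboard (the tags and log directory below are illustrative):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/mnist_mlp')   # logs go to this directory

# Inside the training/test loops:
writer.add_scalar('train/loss', loss.item(), iteration)
writer.add_scalar('test/accuracy', accuracy, epoch)

writer.close()
```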
Session 3
How to train your first neural network
Useful links
- Colab: https://colab.research.google.com
- PyTorch: https://pytorch.org/
Follow the steps we discussed before for classifying digits in the SVHN dataset
(already included in torchvision). Try to do this from scratch, starting from the rough
template and not directly from the previous solution.
Other useful tasks you may want to try
- Let/make the network overfit
- How/why does it happen?
- What if I change the number of parameters? (e.g. increase/decrease the hidden state dimension)
Tip: Save the logs and visualize (in Tensorboard) the effect of the above components in the classification
accuracy and loss curves.