
Deep Learning Lab

how to train your first neural network


Teaching Assistant

Subhankar Roy
email: subhankar.roy@unitn.it
where: Open Space 5, Povo 1

- PhD student (University of Trento and FBK)


- Working on Transfer Learning, Unsupervised Domain Adaptation, Image
Generation.
Goal of Labs
- Gaining practical experience with the theory
- Learning to use a deep learning framework, i.e. PyTorch
- Understanding how to set up and train a deep neural network for various
tasks/settings
Outline
- Google CoLab
- Overview of deep learning frameworks
- How to train my first neural network
CoLab (https://colab.research.google.com)
- Jupyter notebook environment hosted by Google

- No setup required (basically)

- Allows running code on GPU (12 hour maximum of GPU runtime)
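
Once a GPU runtime is selected (Runtime -> Change runtime type -> GPU), a minimal check, assuming PyTorch is installed in the Colab environment, could be:

    import torch

    # True if Colab gave us a GPU runtime
    print(torch.cuda.is_available())
    # Name of the device, if any
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))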


CoLab
- Understanding how to set up and train a neural network

- Practising with a deep learning framework


Session 1
Let’s try CoLab together
Deep Learning Frameworks
Deep Learning Frameworks over Time
- Imperative: Imperative-style programs perform computation as you run them
- Symbolic: define the functions first, then compile them (see the sketch below contrasting the two styles)

https://web.cs.ucdavis.edu/~yjlee/teaching/ecs289g-winter2018/Pytorch_Tutorial.pdf
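
A minimal sketch of the two styles (illustrative only; the symbolic part assumes the old TensorFlow 1.x API):

    import numpy as np

    # Imperative style: every line is executed immediately
    a = np.ones(3)
    b = a * 2          # b is computed right here
    print(b)

    # Symbolic style (TensorFlow 1.x, shown as comments):
    # a = tf.ones([3])
    # b = a * 2                      # only a node in the graph, nothing computed yet
    # with tf.Session() as sess:
    #     print(sess.run(b))         # the computation happens here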
Caffe

- Protobuf as the interface


- Protobuf is not easy to write or read
Tensorflow

- Rich set of operators


- Code is often difficult to read
Keras

- High level wrapper


- Simple and easy to use
- Difficult to customize and to write complex algorithms
Pytorch

- Flexible and easy to write


Why PyTorch?
- Python based

- Fast

- Amazingly flexible and easy to learn

- Automatic differentiation

- Dynamic graph computation


PyTorch vs TensorFlow

- Biggest difference: Static vs. dynamic computation graphs


- With a dynamic framework such as PyTorch, creating a static graph beforehand is unnecessary
Example: Linear Regression
- Tensorflow: Create optimizer before feeding data
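
The original slide code is not reproduced here; a rough TensorFlow 1.x-style sketch of the idea (graph and optimizer defined before any data is seen) might look like this:

    import numpy as np
    import tensorflow as tf   # assumes TensorFlow 1.x

    x = tf.placeholder(tf.float32, shape=[None, 1])
    y = tf.placeholder(tf.float32, shape=[None, 1])
    w = tf.Variable(tf.zeros([1, 1]))
    b = tf.Variable(tf.zeros([1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) + b - y))

    # The optimizer is part of the static graph, created before feeding data
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    x_np = np.random.rand(100, 1).astype(np.float32)
    y_np = 3 * x_np + 2

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            sess.run(train_op, feed_dict={x: x_np, y: y_np})   # data is fed only now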
Example: Linear Regression
- PyTorch: Create optimizer while feeding data
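
Again not the original slide code, but a minimal PyTorch sketch of the same task: the graph is built dynamically at each forward pass, while the data flows through it:

    import torch
    import torch.nn as nn

    x = torch.rand(100, 1)
    y = 3 * x + 2

    model = nn.Linear(1, 1)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(100):
        pred = model(x)              # the graph is created here, on the fly
        loss = criterion(pred, y)
        optimizer.zero_grad()
        loss.backward()              # gradients computed by autograd
        optimizer.step()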
What is PyTorch?
Think of PyTorch as a deep-learning-oriented upgrade of NumPy:
- Allows operations on GPU(s)
- Contains everything you need to set up and train a network

It is based on the concept of a Tensor. A Tensor is a version of NumPy's n-dimensional array which can be stored on both CPU and GPU.

Training/deploying a network is carried out as operations among tensors.


Forward and backward pass of a NN
- Input
- Output
- Fully connected params
- Activation function … and its gradient
- Target
- Loss
Forward and backward pass of a NN
Forward pass and backward pass:
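
The original equations are not in these notes; as an illustrative (not the slides') instance, take a one-hidden-layer network with activation σ and an L2 loss:

    Forward pass:
        h = \sigma(W_1 x + b_1), \quad \hat{y} = W_2 h + b_2, \quad
        L = \tfrac{1}{2} \lVert \hat{y} - y \rVert^2

    Backward pass (chain rule):
        \frac{\partial L}{\partial \hat{y}} = \hat{y} - y, \quad
        \frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial \hat{y}}\, h^\top, \quad
        \frac{\partial L}{\partial h} = W_2^\top \frac{\partial L}{\partial \hat{y}}, \quad
        \frac{\partial L}{\partial W_1} = \Big(\frac{\partial L}{\partial h} \odot \sigma'(W_1 x + b_1)\Big)\, x^\top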
Forward and backward pass of a NN (NumPy)
Forward pass and backward pass:
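
A minimal NumPy sketch of such a forward/backward pass (hand-derived gradients; the layer sizes and the sigmoid activation are illustrative choices, not the slides' original code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    np.random.seed(0)
    x, y = np.random.randn(64, 100), np.random.randn(64, 10)    # inputs and targets
    w1, w2 = np.random.randn(100, 50), np.random.randn(50, 10)  # fully connected params

    for step in range(200):
        # Forward pass
        h = sigmoid(x.dot(w1))
        y_pred = h.dot(w2)
        loss = 0.5 * np.square(y_pred - y).sum()

        # Backward pass: chain rule written out by hand
        grad_y_pred = y_pred - y
        grad_w2 = h.T.dot(grad_y_pred)
        grad_h = grad_y_pred.dot(w2.T)
        grad_w1 = x.T.dot(grad_h * h * (1 - h))   # sigmoid'(z) = h * (1 - h)

        # Gradient descent update
        w1 -= 1e-4 * grad_w1
        w2 -= 1e-4 * grad_w2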
Forward and backward pass of a NN (PyTorch)
Forward pass and backward pass:
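
The same network as a PyTorch sketch (again illustrative): the forward pass is written explicitly, while the backward pass comes for free from autograd via loss.backward():

    import torch

    x, y = torch.randn(64, 100), torch.randn(64, 10)
    w1 = torch.randn(100, 50, requires_grad=True)
    w2 = torch.randn(50, 10, requires_grad=True)

    for step in range(200):
        # Forward pass
        h = torch.sigmoid(x.mm(w1))
        y_pred = h.mm(w2)
        loss = 0.5 * (y_pred - y).pow(2).sum()

        # Backward pass: autograd fills w1.grad and w2.grad
        loss.backward()

        with torch.no_grad():             # plain SGD update, kept out of the graph
            w1 -= 1e-4 * w1.grad
            w2 -= 1e-4 * w2.grad
            w1.grad.zero_()
            w2.grad.zero_()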
Computational Graphs

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
Computational Graphs

[Figure: a large computational graph mapping the input image, through the network, to the loss]
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
Deep Learning Frameworks
They need to make it possible to:

(1) Easily build big computational graphs

(2) Easily compute gradients in computational graphs

(3) Run it all efficiently on GPU (wrap cuDNN, cuBLAS, etc)


CPU vs GPU

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
Computational Graphs

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
Computational Graphs

Problems:

- Can’t run on GPU

- Have to compute our own gradients

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
Computational Graphs

We have:

- Computational Graph creation

- Automatic gradient computation

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
Computational Graphs

We can ask TF to run on GPU

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
Computational Graphs
We have:

- Variable definition for building CG

- Forward and Backward pass.

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
Computational Graphs

We can ask PyTorch to run on GPU

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
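
A minimal sketch of what asking PyTorch to run on GPU looks like (assuming a CUDA device is available, e.g. on Colab):

    import torch
    import torch.nn as nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.randn(64, 100).to(device)       # move a tensor to the GPU
    model = nn.Linear(100, 10).to(device)     # move the model parameters as well
    out = model(x)                            # the computation now runs on the GPU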
Summary

http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf
Back to Tensors
Back to Tensors
Think of PyTorch as a deep-learning-oriented upgrade of NumPy:
- Allows operations on GPU(s)
- Contains everything you need to set up and train a network

It is based on the concept of a Tensor. A Tensor is a version of NumPy's n-dimensional array which can be stored on both CPU and GPU.

Training/deploying a network is carried out as operations among tensors.


Tensors
A multi-dimensional matrix (of Float/Byte/Long/... values).

Can be initialized from, and converted to, NumPy arrays.
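
A small illustrative sketch of these conversions:

    import numpy as np
    import torch

    a = torch.zeros(2, 3)                  # FloatTensor by default
    b = torch.tensor([1, 2, 3], dtype=torch.long)
    c = torch.from_numpy(np.eye(3))        # NumPy array -> Tensor (shares memory)
    d = c.numpy()                          # Tensor -> NumPy array (CPU tensors only)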


Tensors
Tensors are objects used to instantiate both variables and parameters
(torch.nn.Parameter).
Tensors
Tensors are objects used to instantiate both variables and parameters
(torch.nn.Parameter). In both cases they expose several fields, among which:

- .data, storing the numerical values of a Tensor


- .requires_grad, indicating whether the Tensor requires gradient computation (it can be set)
- .grad, which stores the gradient and lets you retrieve it (when the Tensor requires gradients)

In particular, .grad is a useful tool for checking the gradient flow.
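
A tiny illustrative example of these fields:

    import torch

    w = torch.randn(3, requires_grad=True)
    x = torch.ones(3)
    loss = (w * x).sum()
    loss.backward()

    print(w.data)            # numerical values of the Tensor
    print(w.requires_grad)   # True
    print(w.grad)            # d(loss)/dw = x = tensor([1., 1., 1.])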


Session 2
Tensors
Train a deep network
What do we need to train a network?
- Data
What do we need to train a network?
- Data

- Network
What do we need to train a network?
- Data

- Network

- Cost function
What do we need to train a network?
- Data

- Network

- Cost function

- Update rule
What do we need to train a network? (PyTorch)
- torchvision.datasets + torch.utils.data (.DataLoader)

- torch.nn.Module

- torch.nn.*Loss

- torch.optim
What do we need to train a network? (PyTorch)
- torchvision.datasets + torch.utils.data (.DataLoader)

- torch.nn.Module

- torch.nn.*Loss

- torch.optim

Everything is customizable:
you can create your own version of each component
Let’s train!
Your task is to classify digits (MNIST dataset):

- Instantiate a dataloader
  - MNIST is already in torchvision.datasets
- Create a simple MLP
  - input-to-hidden and hidden-to-output fully connected layers (torch.nn.Linear)
  - Do not forget about activation(s)
- Instantiate an optimizer
  - torch.optim is the guide
- Instantiate a loss/cost function
  - It is a classification task with 10 classes... what about cross entropy?
- Put things together to implement the training and test procedure
  - The evaluation metric is obviously accuracy (= correct_predictions / number_of_samples)
Setting up a dataset
A dataset is defined through torchvision.datasets.
Various datasets are already available there (e.g. MNIST, CIFAR, ImageNet, COCO, ...).

E.g. Initialize MNIST:

Initialize your custom dataset:
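
The original slide code is not in these notes; a minimal sketch of both cases, under the standard torchvision/torch APIs, could be:

    import torch
    from torchvision import datasets, transforms

    # Built-in dataset: MNIST
    mnist_train = datasets.MNIST(root="./data", train=True, download=True,
                                 transform=transforms.ToTensor())

    # Custom dataset: subclass torch.utils.data.Dataset and define __len__ and __getitem__
    class MyDataset(torch.utils.data.Dataset):
        def __init__(self, data, labels):
            self.data, self.labels = data, labels

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            return self.data[idx], self.labels[idx]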


Setting up a dataloader
A dataloader is a wrapper around the dataset that allows you to iterate over the data. It can be
defined as follows:

Then we can obtain data and labels by just retrieving its elements:
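
A sketch (not the original slide code), reusing the mnist_train dataset from the previous sketch:

    from torch.utils.data import DataLoader

    train_loader = DataLoader(mnist_train, batch_size=64, shuffle=True)

    for images, labels in train_loader:
        # e.g. torch.Size([64, 1, 28, 28]) and torch.Size([64])
        print(images.shape, labels.shape)
        break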

Note:
- There exist different types of loaders and different samplers
Setting up a network
In PyTorch, a network is defined as a subclass of torch.nn.Module. We need to
define:
- The initialization of the network (i.e. layers, initial values of the parameters, etc.)
- The forward pass

No definition of the backward pass is needed (thanks to automatic differentiation).


Setting up a network - example
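
The slide's example is not reproduced here; a minimal MLP sketch for MNIST, with illustrative layer sizes, could be:

    import torch
    import torch.nn as nn

    class MLP(nn.Module):
        def __init__(self, in_dim=28 * 28, hidden_dim=256, n_classes=10):
            super(MLP, self).__init__()
            # Initialization: define the layers (parameters get default initial values)
            self.fc1 = nn.Linear(in_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, n_classes)

        def forward(self, x):
            # Forward pass: flatten the image, hidden layer + activation, output layer
            x = x.view(x.size(0), -1)
            x = torch.relu(self.fc1(x))
            return self.fc2(x)            # raw logits (no softmax, see the loss slide)

    model = MLP()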
torch.nn
torch.nn defines all the basic components needed to build a network. For
instance, there you can find:

- Layers (nn.Linear, nn.Conv2d, nn.Dropout, nn.BatchNorm2d, ...)

- Activation functions (nn.ReLU, nn.Sigmoid, ...)

- Loss functions (nn.CrossEntropyLoss, nn.MSELoss, ...)


Setting up a cost function
From what we said before, it is pretty easy: have a look at torch.nn:

E.g. :

Note:
- Most of these loss functions already include the proper activation
(e.g. CrossEntropyLoss applies a log-softmax internally, so the network should output raw logits)
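
An illustrative example with cross entropy on raw logits:

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()
    logits = torch.randn(4, 10)             # network outputs, no softmax applied
    targets = torch.tensor([1, 0, 4, 9])    # ground-truth class indices
    loss = criterion(logits, targets)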
Setting up an optimizer
The optimizers are defined in torch.optim. The standard template for initializing an
optimizer is:

E.g.

Note:
- Different optimizers need different hyperparameters
- You can filter the parameters given to the optimizer
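
A sketch of the template (the model comes from the earlier MLP sketch; learning rates and the choice of layer to filter are illustrative):

    import torch

    # Standard template: torch.optim.<Optimizer>(parameters, **hyperparameters)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    # A different optimizer needs different hyperparameters, e.g.:
    # optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Filtering the parameters given to the optimizer, e.g. only the last layer:
    # optimizer = torch.optim.SGD(model.fc2.parameters(), lr=0.01)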
Setting up an optimizer - operations
The optimizer controls how the parameters are updated after each iteration
(iteration = 1 forward pass + 1 backward pass) with respect to their gradients. To
update the weights after the backward call, use optimizer.step().

To avoid accumulating gradients, we must free the .grad component of each Tensor in
the graph at each iteration. This can be achieved by simply calling optimizer.zero_grad().
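
Putting the pieces together, one training iteration could look like this sketch (model, criterion, optimizer and train_loader come from the sketches above):

    for images, labels in train_loader:
        optimizer.zero_grad()             # free the .grad buffers of all parameters
        logits = model(images)            # forward pass
        loss = criterion(logits, labels)
        loss.backward()                   # backward pass fills the .grad fields
        optimizer.step()                  # update the parameters using their gradients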
Visualizing the results
Babysitting the training procedure by just looking at printed text is objectively boring.
We can exploit tensorboardcolab to get a nice visualization of the training curves, like this:
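
The slides use tensorboardcolab; as a hedged alternative (not the slides' code), PyTorch's own SummaryWriter produces the same kind of curves:

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter("runs/mnist_mlp")   # log directory (illustrative name)

    # Inside the training/test loops:
    # writer.add_scalar("train/loss", loss.item(), global_step)
    # writer.add_scalar("test/accuracy", accuracy, epoch)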
Visualizing the results
Session 3
How to train your first neural network
Useful links
- Colab: https://colab.research.google.com

- PyTorch: https://pytorch.org/

- PyTorch doc: https://pytorch.org/docs/stable/index.html

- How to build an MLP with NumPy: https://github.com/Trion129/Neural-Network-using-numpy/blob/master/neuralnetwork.py
Let’s train! (by yourself)

Follow the steps we discussed before for classifying digits in the SVHN dataset
(already included in torchvision). Try to do this from scratch, starting from the rough
template and not directly from the previous solution.
Other useful tasks you may want to try
- Let/make the network overfit
  - How/why does it happen?

- Increase the performance of the network
  - How can you do that? Is 99% accuracy achievable?

- How does the gradient flow change with the optimizer?
  - Each parameter of a network has the field .grad, e.g. layer1.weight.grad
What happens if...
- I change some of the hyperparameters? (e.g. learning rate, weight decay, etc.)

- I change the optimizer? (Adam, RMSProp, etc. ...)

- I change the number of parameters? (e.g. I increase/decrease the hidden state dimension)

- I add more layers?

- I add Dropout? (torch.nn.Dropout)

Tip: Save the logs and visualize (in TensorBoard) the effect of the above changes on the classification
accuracy and loss curves.
