Guest Lecture
Oct. 2020
Neural Network Architectures
● CNN Architectures
● RNN Architectures
● Neural Network Architecture Design
Based on course materials from http://cs231n.github.io/ and the Dive into Deep Learning book: https://d2l.ai/
Recap: Neural Networks and MLP
Multilayer Perceptrons:
Multiple fully connected layers (at least input, hidden, and output layers).
Non-linear activations (sigmoid, ReLU, etc.).
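As a rough illustration of the two ingredients above, here is a minimal forward pass through a tiny MLP in plain Python. The weights are hypothetical and hand-picked (not trained) so that the network outputs |x1 - x2|; the helper names `dense`, `relu`, and `mlp_forward` are assumptions made for this sketch.

```python
def relu(v):
    """Element-wise ReLU non-linearity."""
    return [max(0.0, x) for x in v]


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mlp_forward(x, params):
    """Apply each (weights, bias) pair; ReLU after every layer but the last."""
    for i, (weights, bias) in enumerate(params):
        x = dense(x, weights, bias)
        if i < len(params) - 1:
            x = relu(x)
    return x


# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
params = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 1.0]], [0.0]),                     # output layer
]
output = mlp_forward([3.0, 1.0], params)
```

Without the ReLU between the two layers, the whole network would collapse into a single linear map and could not represent the absolute value.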
Recap: Neural Network Layers and Representation
Why so many NN architectures?
https://arxiv.org/abs/1605.07678
In theory, a network with a single hidden layer and enough neurons can
approximate any continuous function arbitrarily well.
CNN Architectures
Recap: Convolutional Neural Networks
Recap: AlexNet
https://arxiv.org/abs/1409.1556
VGG
https://arxiv.org/abs/1409.1556
If more layers are better, can we just stack an arbitrarily large number of layers?
VGG-infinity??
...
Training a 56-layer network vs. a 20-layer network
Note that this is not overfitting: the deeper network's training error is also higher.
ResNet
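This doesn’t make sense: the deeper network could copy the shallower one and leave its extra layers as identity mappings, so its training error should be no worse.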
https://arxiv.org/abs/1512.03385
Residual Block
So let’s help the network learn the identity mapping: add a skip connection so that each block outputs F(x) + x.
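A minimal sketch of the idea, assuming a block made of two small fully connected layers instead of the convolutions used in the actual ResNet. The function names and weights are hypothetical. With all-zero weights the residual branch F(x) is zero, so the block passes its (non-negative) input through unchanged.

```python
def relu(v):
    """Element-wise ReLU non-linearity."""
    return [max(0.0, x) for x in v]


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is dense -> ReLU -> dense."""
    f = dense(relu(dense(x, w1, b1)), w2, b2)
    return relu([fi + xi for fi, xi in zip(f, x)])


# All-zero weights make F(x) = 0, so the skip connection alone
# carries the input to the output.
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
x = [1.5, 0.25]
y = residual_block(x, zeros_w, zeros_b, zeros_w, zeros_b)
```

The point of the design is that "do nothing" becomes the easy default: the layers only have to learn a correction on top of the identity rather than the full mapping.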
ResNet
ResNet-50
ResNet-101
ResNet-152
DenseNet
Within a dense block, each layer receives the feature maps of all preceding layers in that block as input.
https://arxiv.org/abs/1608.06993
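The dense connectivity pattern can be sketched as follows, assuming fully connected layers on plain feature vectors rather than convolutions on feature maps. The key difference from a residual block is that earlier features are concatenated, not added. The names and weights are hypothetical.

```python
def relu(v):
    """Element-wise ReLU non-linearity."""
    return [max(0.0, x) for x in v]


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def dense_block(x, layers):
    """Each layer sees the input plus the outputs of all earlier layers."""
    features = list(x)
    for weights, bias in layers:
        new_features = relu(dense(features, weights, bias))
        features = features + new_features  # list concatenation, not addition
    return features


# Hypothetical block: 2 input features, each layer adds 1 new feature.
layers = [
    ([[1.0, 1.0]], [0.0]),       # sees 2 features
    ([[1.0, 1.0, 1.0]], [0.0]),  # sees 2 + 1 features
]
out = dense_block([1.0, 2.0], layers)
```

Note how the second layer's weight row is one entry longer than the first: the input width grows with every layer in the block.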
RNN Architectures
Sequential Data
Data distribution changes over time
Same value at a point: is it going up or going down?
Similar sound, similar meaning?
Going forward or backward?
Recurrent Neural Networks
Neural network models with recurrent connections: a layer's hidden state at one time step is fed back into the same layer at the next time step.
Recurrent Neural Network - a simple example
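A minimal sketch of such an example, assuming a scalar vanilla RNN with a tanh activation and hypothetical, hand-picked weights. Two sequences end on the same value (1.0), one rising and one falling; because the hidden state carries the history, the final states differ.

```python
import math


def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """One step: h_t = tanh(w_xh * x_t + w_hh * h_prev + b_h)."""
    return math.tanh(w_xh * x_t + w_hh * h_prev + b_h)


def run_rnn(xs, w_xh=1.0, w_hh=0.5, b_h=0.0):
    """Feed a sequence through the same cell, reusing the hidden state."""
    h = 0.0
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, w_xh, w_hh, b_h)
        states.append(h)
    return states


up = run_rnn([0.0, 0.5, 1.0])    # rising towards 1.0
down = run_rnn([2.0, 1.5, 1.0])  # falling towards 1.0
```

A feed-forward network that only sees the last value could not tell these two inputs apart.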
Gated Recurrent Units (GRU)
Learning long-term dependencies is hard!
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short-Term Memory (LSTM)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Forget gate
Store info in cell states
Update cell states
Get output
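The four steps above can be sketched as one scalar LSTM step. This assumes the standard LSTM formulation; the parameter layout (`p` mapping each gate name to an input weight, a recurrent weight, and a bias) and the values are hypothetical. The chosen biases hold the forget gate open and the input gate shut, so the cell state is preserved almost unchanged across steps.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step; `p` maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # forget gate: how much of c_prev to keep
    i = sigmoid(pre("input"))         # input gate: how much new info to store
    c_tilde = math.tanh(pre("cand"))  # candidate info to store in the cell
    c = f * c_prev + i * c_tilde      # update the cell state
    o = sigmoid(pre("output"))        # output gate
    h = o * math.tanh(c)              # get the output (new hidden state)
    return h, c


# Hypothetical parameters: forget gate near 1, input gate near 0.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "cand": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for x_t in [0.3, -0.7, 0.9]:
    h, c = lstm_step(x_t, h, c, params)
```

Because the cell state is updated additively and gated, information can survive many steps, which is what makes long-term dependencies easier to learn than in a plain RNN.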
Attention & Transformers
However, RNNs are not the default choice for temporal data anymore!
Attention Model
RNNs
With the attention mechanism as a building block, an architecture called the
Transformer was proposed.
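To make the building block concrete, here is a minimal sketch of attention, assuming the scaled dot-product form used in the Transformer. Each query is compared with every key, the scores are turned into weights with a softmax, and the output is the weighted average of the values. The function names and the tiny inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]
queries = [[5.0, 0.0]]  # points in the direction of the first key
outputs, weights = attention(queries, keys, values)
```

Every position can look at every other position in a single step, with no recurrence, which is why the computation parallelizes well across a sequence.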
Transformers show that you don’t really need a complex recurrent structure to perform
tasks on sequential data such as language.
Neural Network Architecture Design
Design a NN model for your problem
Things to consider:
Data and Neural Network Models
● Static Data
● Dynamic Data
● Unsupervised Data
Static Data - Image
Static Data - Translation invariance in images
Convolutional Networks
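A minimal 1-D sketch of why convolution suits translated patterns, assuming a hand-picked kernel and toy signals (all values hypothetical). The same kernel is slid over every position, so a shifted pattern yields the same response at a shifted location. Taking the maximum over positions then gives a value that does not depend on where the pattern sits.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation: apply the same kernel at every position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


kernel = [1.0, 2.0, 1.0]  # hypothetical pattern detector
signal = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same pattern, 2 steps later

resp = conv1d(signal, kernel)
resp_shifted = conv1d(shifted, kernel)

# Global max pooling discards the position and keeps only the strength.
pooled = max(resp)
pooled_shifted = max(resp_shifted)
```

A fully connected layer would need separate weights to detect the pattern at each position; the convolution shares one small set of weights across all of them.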
Predicting a category
Dense predictions
U-Net
Input is a graph
https://github.com/tkipf/gcn
Binary Networks
https://mohitjain.me/2018/07/14/bnn/
Knowledge Distillation
Train a smaller network (the student) to mimic the outputs of a larger, more powerful network (the teacher).
https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764
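One common way to set this up, sketched here under assumptions: the student is trained to match the teacher's temperature-softened class probabilities. The function names, logits, and temperature value are hypothetical. In practice this term is usually combined with the ordinary loss on the true labels, which is omitted here.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


teacher = [4.0, 1.0, 0.0]
good_student = [3.5, 1.2, 0.1]  # roughly agrees with the teacher
bad_student = [0.0, 1.0, 4.0]   # ranks the classes in reverse
loss_good = distillation_loss(good_student, teacher)
loss_bad = distillation_loss(bad_student, teacher)
```

The soft targets carry more information than a hard label: they also tell the student which wrong classes the teacher considers plausible.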
Neural Network Model Search
Even with a target architecture chosen, there is still a lot of flexibility in
the exact configuration of your neural network.
Hyperparameters
Learning rate: controls how large a step each update takes on the model’s weights.
Batch size: how many training examples are fed to the model at a time to compute
each gradient estimate.
Even data: a model almost always gets better with more data.
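A minimal sketch of where the first two hyperparameters enter training, assuming mini-batch SGD on a one-parameter model y = w * x with a squared-error loss. The function name, toy data, and settings are hypothetical.

```python
import random


def sgd_fit(data, learning_rate, batch_size, epochs=200, seed=0):
    """Fit y = w * x by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of mean((w * x - y)^2) with respect to w over the batch.
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad  # step size set by the learning rate
    return w


# Toy data generated from the true weight w = 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]
w_good_lr = sgd_fit(data, learning_rate=0.05, batch_size=2)
w_tiny_lr = sgd_fit(data, learning_rate=1e-5, batch_size=2)
```

With the same number of epochs, the reasonable learning rate recovers the true weight while the tiny one has barely moved from its starting point, which is why these settings are usually tuned rather than guessed.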
Tools for hyperparameter tuning
Finally...
AutoML, Neural Architecture Search, meta-learning…
https://cloud.google.com/automl/
Summary: Neural Network Architectures
● CNN Architectures
● RNN Architectures
● Neural Network Architecture Design