Suresh Jaganathan
1. What exactly is deep learning?
2. Why is it generally better than other methods on image,
speech and certain other types of data?
Answers
1. ‘Deep Learning’ means using a neural network
with several layers of nodes between input and output.
but:
3. Multilayer neural networks have been around for
25 years. What’s actually new?
[Figure: a single unit with inputs 2.7, -8.6 and 0.002, weighted by
W1 = -0.06, W2 = -2.5 and W3 = 1.4. The unit computes the weighted sum
x = (-0.06 × 2.7) + (-2.5 × -8.6) + (1.4 × 0.002) = 21.34
and passes it through an activation function to give the output f(x).]
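The arithmetic in the figure can be checked directly. A minimal sketch in plain Python, with tanh assumed as the squashing function f (the slides themselves use tanh later on):

```python
import math

# Inputs and weights from the single-unit example above
inputs  = [2.7, -8.6, 0.002]
weights = [-0.06, -2.5, 1.4]

# Weighted sum: x = (-0.06)(2.7) + (-2.5)(-8.6) + (1.4)(0.002)
x = sum(w * i for w, i in zip(weights, inputs))
print(round(x, 2))        # 21.34, as on the slide

# Squash through a nonlinearity, e.g. f(x) = tanh(x)
print(round(math.tanh(x), 4))
```

At x = 21.34 the tanh output saturates very close to 1, which is why large weighted sums all look alike after squashing.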
A dataset
Fields           class
1.4  2.7  1.9    0
3.8  3.4  3.2    0
6.4  2.8  1.7    1
4.1  0.1  0.2    0
etc …
Training the neural network
Training data
Fields           class
1.4  2.7  1.9    0
3.8  3.4  3.2    0
6.4  2.8  1.7    1
4.1  0.1  0.2    0
etc …

Initialise with random weights.
Present a training pattern, e.g. 1.4, 2.7, 1.9.
Feed it through to get the output, e.g. 0.8.
Compare with the target output (0): error 0.8.
Adjust the weights based on the error.
Present the next training pattern, e.g. 6.4, 2.8, 1.7.
Feed it through to get the output, e.g. 0.9.
Compare with the target output (1): error -0.1.
Adjust the weights based on the error.
And so on …

Repeat this thousands, maybe millions of times, each time
taking a random training instance and making slight
weight adjustments.
Algorithms for weight adjustment are designed to make
changes that will reduce the error.
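The loop just described can be sketched in a few lines. This is a minimal illustration, not the exact algorithm from the slides: a single sigmoid unit trained by stochastic gradient descent on the table's four rows (learning rate and iteration count chosen arbitrarily):

```python
import math
import random

random.seed(0)

# The dataset from the slides: three fields per row, plus a class label
data = [([1.4, 2.7, 1.9], 0),
        ([3.8, 3.4, 3.2], 0),
        ([6.4, 2.8, 1.7], 1),
        ([4.1, 0.1, 0.2], 0)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initialise with random weights (plus a bias term)
w = [random.uniform(-0.5, 0.5) for _ in range(3)]
b = random.uniform(-0.5, 0.5)

def total_error():
    return sum((t - sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)) ** 2
               for x, t in data)

before = total_error()

lr = 0.1
for _ in range(5000):
    x, t = random.choice(data)           # take a random training instance
    out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    err = t - out                        # compare with target output
    g = err * out * (1 - out)            # gradient of the squared error
    for i in range(3):                   # adjust weights based on error
        w[i] += lr * g * x[i]
    b += lr * g

after = total_error()
print(after < before)   # the tiny adjustments have reduced the error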
The decision boundary perspective …
Initial random weights give an arbitrary boundary.
Present a training instance / adjust the weights: the boundary shifts slightly.
Repeat: present a training instance / adjust the weights, again and again.
Eventually … the boundary comes to separate the classes.
The point is …
• Weight learning is done by making thousands and thousands
of tiny adjustments.
Some other points
Detail of a standard NN weight-learning algorithm:

Hidden-layer output:  H_o = tanh(I^T · W_I)
Network output:       N_o = tanh(H_o^T · W_O) = tanh(tanh(I^T · W_I)^T · W_O)
Error:                Er = Req.Output − N_o

W_I and W_O are then adjusted so as to reduce Er.
Some other points
If f(x) is non-linear, a network with 1 hidden layer can, in
theory, learn perfectly any classification problem:
a set of weights exists that can produce the targets
from the inputs.
The problem is finding those weights.
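As a concrete illustration of "a set of weights exists": below is a 1-hidden-layer network for XOR, a problem no single linear unit can solve. All weights and thresholds here are hand-picked for illustration, not learned, and a step function stands in for f(x):

```python
def step(x):
    # A crude nonlinearity: fires (1) when the weighted sum is positive
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    # Hidden layer: one unit computes OR, the other AND
    h_or  = step(1.0 * x1 + 1.0 * x2 - 0.5)
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)
    # Output unit: "OR but not AND", i.e. XOR
    return step(1.0 * h_or - 1.0 * h_and - 0.5)

print([xor_net(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

The hard part, as the slide says, is not that such weights exist but getting a learning algorithm to find them.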
Some other ‘by the way’ points
If f(x) is linear, the NN can only draw straight decision
boundaries (even if there are many layers of units)
Some other ‘by the way’ points
NNs use nonlinear f(x) so they
can draw complex boundaries,
but keep the data unchanged.
SVMs, by contrast, only draw straight lines,
but they transform the data first
so that a straight line is enough.
Feature extraction and selection
A few things you should keep in mind:
- What kind of problem are you trying to solve? e.g. classification,
regression, clustering, etc.
- Do you have a lot of data?
- Does your data have very high dimensionality?
- Is your data labelled?
Feature detectors
What is this unit doing?

Hidden-layer units become self-organised feature detectors.
[Figure: the weights from the input pixels (numbered 1, 5, 10, 15, 20, 25, …
up to 63) into one hidden unit. A unit with a strip of strong +ve weights
and low/zero weights everywhere else responds to a specific pattern
in the input image.]

What does this unit detect?
- A unit with strong +ve weights down a column of pixels: vertical lines.
- Another unit: horizontal lines.
- Another: small circles.
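The detector idea can be demonstrated numerically. In this sketch (a toy 5×5 weight mask, all values invented for illustration), a unit with strong positive weights down the middle column and zeros elsewhere responds strongly to a vertical-line image and barely to a horizontal one:

```python
# Toy 5x5 weight mask: strong +ve weights in the middle column, zero elsewhere
weights = [[1.0 if c == 2 else 0.0 for c in range(5)] for r in range(5)]

def response(image):
    # A unit's activation is just the weighted sum over all pixels
    return sum(weights[r][c] * image[r][c]
               for r in range(5) for c in range(5))

vertical   = [[1 if c == 2 else 0 for c in range(5)] for r in range(5)]
horizontal = [[1 if r == 2 else 0 for c in range(5)] for r in range(5)]

print(response(vertical))    # 5.0 : the unit "detects" the vertical line
print(response(horizontal))  # 1.0 : only the crossing pixel contributes
```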
Successive layers can learn higher-level features …
e.g. detect lines in specific positions, etc …
So: multiple layers make sense
Many-layer neural network architectures are
capable of learning the true underlying features.
But, until very recently, our weight-learning
algorithms simply did not work on multi-layer
architectures
Vanishing Gradient Problem
Dataset: MNIST digits [0, 1, 2, …, 9]
Input layer: 784 neurons [28x28 pixels]; output layer: 10 neurons.
Adding hidden layers barely improves accuracy: 96.48%, 96.90%, 96.53%
for successively deeper networks.
In a 784,30,30,30,10 network, the hidden layers learn at speeds of
roughly 0.012, 0.060 and 0.283: the earlier the hidden layer, the more
slowly it learns. The same pattern appears with 784,30,30,30,30,10.
Note: http://neuralnetworksanddeeplearning.com/chap5.html
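The effect can be seen in miniature. A backpropagated gradient picks up a factor of (weight × activation derivative) per layer; for a sigmoid the derivative is at most 0.25, so with modest weights the signal shrinks geometrically with depth. A sketch under the friendliest possible assumptions (z = 0, w = 1, values chosen for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Gradient flowing back through a chain of sigmoid units:
# each layer multiplies it by w * sigma'(z). With z = 0 and w = 1,
# sigma'(0) = 0.25 is the largest that factor can ever be.
grad = 1.0
history = []
for layer in range(5):
    z, w = 0.0, 1.0
    grad *= w * sigmoid(z) * (1 - sigmoid(z))
    history.append(grad)

print(history)  # [0.25, 0.0625, 0.015625, 0.00390625, 0.0009765625]
```

Five layers in, even this best case has shrunk the gradient by a factor of about 1000, which is why early layers learn so slowly.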
Here comes deep learning …
The new way to train multi-layer NNs: train them one layer at a time.
An auto-encoder is trained, with an absolutely standard
weight-adjustment algorithm, to reproduce the input.
The final layer is trained to predict the class based
on outputs from the previous layers.
And that’s that
• That’s the basic idea.
• There are many types of deep learning:
different kinds of autoencoder, variations on
architectures and training algorithms, etc …
Stacked Autoencoder
Auto Encoders
can be used for finding a low-dimensional representation of your
input data. Why? The visible layer is squeezed through a smaller
hidden layer, which must learn to recreate the input.
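A minimal sketch of the idea in plain Python (linear units, tied encoder/decoder weights, all sizes and rates invented for illustration): 4-dimensional inputs are squeezed through a 2-unit hidden layer, and the weights are trained to recreate the input.

```python
import random

random.seed(1)

# Toy data: 4-D points that really live on a 2-D pattern
data = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]

n_vis, n_hid = 4, 2
W = [[random.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]

def encode(x):   # visible -> hidden
    return [sum(x[i] * W[i][j] for i in range(n_vis)) for j in range(n_hid)]

def decode(h):   # hidden -> recreated input (tied weights: W transposed)
    return [sum(h[j] * W[i][j] for j in range(n_hid)) for i in range(n_vis)]

def recon_error():
    return sum(sum((xi - ri) ** 2 for xi, ri in zip(x, decode(encode(x))))
               for x in data)

before = recon_error()
lr = 0.05
for _ in range(2000):
    x = random.choice(data)
    h = encode(x)
    e = [ri - xi for ri, xi in zip(decode(h), x)]   # reconstruction error
    for i in range(n_vis):
        for j in range(n_hid):
            # W appears in both encoder and decoder, so the gradient
            # of the squared error has two terms
            gdec = e[i] * h[j]
            genc = sum(e[k] * W[k][j] for k in range(n_vis)) * x[i]
            W[i][j] -= lr * (gdec + genc)

after = recon_error()
print(after < before)   # reconstruction improves with training
```

Because the hidden layer is narrower than the input, the weights that survive training amount to a low-dimensional representation of the data.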
Deep Belief Networks
Structure : DBM = MLP
Other Deep Learning Models
Thank You