
Introduction to Deep Learning

Suresh Jaganathan
1. What exactly is deep learning?
2. Why is it generally better than other methods on image,
speech and certain other types of data?

Answers
1. 'Deep Learning' means using a neural network
with several layers of nodes between input and output.

2. The series of layers between input & output do
feature identification and processing in a series of stages,
just as our brains seem to.
But:
3. Multilayer neural networks have been around for
25 years. What's actually new?

We have always had good algorithms for learning the
weights in networks with 1 hidden layer,
but these algorithms are not good at learning the weights for
networks with more hidden layers.

What's new is: algorithms for training many-layer networks.
Neural Networks

1. A quick explanation of how neural network weights are learned;
2. the idea of feature learning ('intermediate features');
3. the 'breakthrough' – the simple trick for training deep neural networks.
[Figure: a single unit with inputs -0.06, -2.5 and 1.4, connected through weights W1, W2 and W3 to an activation f(x).]
[Figure: the same unit with concrete weights W1 = 2.7, W2 = -8.6, W3 = 0.002.
The weighted sum is x = (-0.06 × 2.7) + (-2.5 × -8.6) + (1.4 × 0.002) = 21.34, which is then passed through f(x).]
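As a minimal sketch of what a single unit computes, using the inputs and weights from the figure above; the slides leave f unspecified, so the sigmoid used here is just an illustrative assumption:

import math

def neuron_output(inputs, weights, f):
    """Weighted sum of the inputs, passed through the activation f."""
    x = sum(i * w for i, w in zip(inputs, weights))
    return x, f(x)

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

# Values from the figure: inputs -0.06, -2.5, 1.4 and weights 2.7, -8.6, 0.002
x, out = neuron_output([-0.06, -2.5, 1.4], [2.7, -8.6, 0.002], sigmoid)
print(x)    # ~21.34, the weighted sum shown on the slide
print(out)  # close to 1.0 once the activation squashes it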
A dataset

Fields (inputs)     class
1.4  2.7  1.9       0
3.8  3.4  3.2       0
6.4  2.8  1.7       1
4.1  0.1  0.2       0
etc …
Training the neural network
(using the training data above)

1. Initialise with random weights.
2. Present a training pattern: inputs 1.4, 2.7, 1.9.
3. Feed it through to get the output: 0.8.
4. Compare with the target output: target 0, error 0.8.
5. Adjust the weights based on the error.
6. Present another training pattern: inputs 6.4, 2.8, 1.7.
7. Feed it through to get the output: 0.9.
8. Compare with the target output: target 1, error -0.1.
9. Adjust the weights based on the error.
And so on ….

Repeat this thousands, maybe millions of times – each time
taking a random training instance, and making slight
weight adjustments.
Algorithms for weight adjustment are designed to make
changes that will reduce the error.
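A rough sketch of that loop for a single sigmoid unit on the dataset above. The deck does not give the update rule; the learning rate and the squared-error gradient step below are assumptions for illustration only:

import math
import random

random.seed(0)

data = [([1.4, 2.7, 1.9], 0),
        ([3.8, 3.4, 3.2], 0),
        ([6.4, 2.8, 1.7], 1),
        ([4.1, 0.1, 0.2], 0)]

weights = [random.uniform(-1, 1) for _ in range(3)]   # initialise with random weights
bias = 0.0
lr = 0.05                                             # assumed learning rate

def forward(inputs):
    x = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-x))                 # output between 0 and 1

for step in range(10000):                             # repeat thousands of times
    inputs, target = random.choice(data)              # take a random training instance
    output = forward(inputs)                          # feed it through to get output
    error = output - target                           # compare with target output
    grad = error * output * (1 - output)              # slope of the squared error w.r.t. the sum
    weights = [w - lr * grad * i for w, i in zip(weights, inputs)]   # slight weight adjustments
    bias -= lr * grad

for inputs, target in data:
    print(target, round(forward(inputs), 2))          # target vs. network output after training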
The decision boundary perspective…

[Figures: starting from initial random weights, a training instance is presented and the weights adjusted, over and over; eventually the decision boundary separates the two classes.]
Point is…

• Weight learning is done by making thousands and thousands
of tiny adjustments.

• But, by dumb luck, eventually this tends to be good enough to
learn effective classifiers for many real applications.
Some other points
Detail of a standard NN weight-learning algorithm:

  Ho = tanh(Iᵀ · WI)              (hidden-layer output)
  No = Hoᵀ · WO                   (network output)
  Er = Required Output − No       (error)

The weights WI and WO are then adjusted so as to reduce this error.
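The same quantities written out in NumPy; the layer sizes and random weights are placeholders, only the forward pass and the error follow the formulas above:

import numpy as np

rng = np.random.default_rng(42)

n_inputs, n_hidden, n_outputs = 3, 4, 1
W_I = rng.normal(size=(n_inputs, n_hidden))    # input-to-hidden weights WI
W_O = rng.normal(size=(n_hidden, n_outputs))   # hidden-to-output weights WO

I = np.array([1.4, 2.7, 1.9])                  # one training pattern from the dataset above
required_output = np.array([0.0])

H_o = np.tanh(I @ W_I)                         # hidden-layer output Ho
N_o = H_o @ W_O                                # network output No
Er = required_output - N_o                     # error Er

print(H_o, N_o, Er)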
Some other points
If f(x) is non-linear, a network with 1 hidden layer can, in
theory, learn perfectly any classification problem.
A set of weights exists that can produce the targets
from the inputs.
The problem is finding those weights.
Some other ‘by the way’ points
If f(x) is linear, the NN can only draw straight decision
boundaries (even if there are many layers of units)

Some other 'by the way' points

NNs use a nonlinear f(x) so they can draw complex boundaries,
but keep the data unchanged.

SVMs only draw straight lines, but they transform the data first,
in a way that makes that OK.
Feature extraction and selection
A few things you should keep in mind:
- What kind of problem are you trying to solve? e.g. classification,
regression, clustering, etc.
- Do you have a lot of data?
- Do your data have very high dimensionality?
- Are your data labelled?

If you are handling images, extract appropriate features.

If the feature dimension is high, try feature selection or feature
transformation to get high-quality discriminant features.
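A sketch of that last point with scikit-learn; the random dataset and the choice of keeping 10 components/features are arbitrary placeholders:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))          # 200 samples with 50-dimensional features (placeholder data)
y = rng.integers(0, 2, size=200)        # binary class labels (placeholder)

# Feature transformation: project onto the top principal components.
X_pca = PCA(n_components=10).fit_transform(X)

# Feature selection: keep the features most related to the class label.
X_sel = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

print(X_pca.shape, X_sel.shape)         # (200, 10) (200, 10)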
Feature detectors

What is this unit doing?
Hidden layer units become self-organised feature detectors

[Figure: a hidden unit connected to an image whose pixels are numbered 1, 5, 10, 15, 20, 25, … along the top row; the unit's weights are strongly positive for the top-row pixels and low/zero everywhere else.]

What does this unit detect?
What does this unit detect?

It will send a strong signal for a horizontal line in the top row,
ignoring everywhere else.
What does this unit detect?

[Figure: a unit whose weights are strongly positive for the pixels in the top-left corner and low/zero everywhere else.]

It gives a strong signal for a dark area in the top-left corner.
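A sketch of why such a unit acts as a detector: its response is a dot product between the image and its weight pattern, so an image whose top row is active lines up with the strong top-row weights. The 8×8 image size and the 0/1 weight values are illustrative assumptions:

import numpy as np

rows, cols = 8, 8
weights = np.zeros((rows, cols))
weights[0, :] = 1.0                 # strong +ve weights along the top row, low/zero elsewhere

top_line = np.zeros((rows, cols))
top_line[0, :] = 1.0                # image with a horizontal line in the top row

bottom_line = np.zeros((rows, cols))
bottom_line[-1, :] = 1.0            # the same line, but in the bottom row

def unit_response(image, weights):
    """A unit's pre-activation response: the weighted sum over all pixels."""
    return float(np.sum(image * weights))

print(unit_response(top_line, weights))     # 8.0 -> strong signal
print(unit_response(bottom_line, weights))  # 0.0 -> ignored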
What features might you expect a good NN
to learn, when trained with data like this?

- vertical lines
- horizontal lines
- small circles
Successive layers can learn higher-level features …

[Figure: first-layer units detect lines in specific positions, etc.; higher-level detectors combine them ("horizontal line", "RHS vertical line", "upper loop", etc. …).]

What does this unit detect?


So: multiple layers make sense

Many-layer neural network architectures are
capable of learning the true underlying features.

But, until very recently, our weight-learning
algorithms simply did not work on multi-layer
architectures.
Vanishing Gradient Problem

Dataset: MNIST [0, 1, 2, … 9]; 784 input neurons [28×28], 10 output neurons.
Accuracies: 96.48%, 96.90%, 96.53%.

[Figure: speed of learning of Hidden Layer I vs Hidden Layer II – a lot of variation in how rapidly the neurons in different layers learn.]

- 784, 30, 30, 30, 10 network: hidden-layer learning speeds of 0.012, 0.060, and 0.283
- 784, 30, 30, 30, 30, 10 network: 0.003, 0.017, 0.070, and 0.285

Note: http://neuralnetworksanddeeplearning.com/chap5.html
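A rough sketch (not from the slides) of why earlier layers learn more slowly: backpropagating through a stack of sigmoid layers multiplies the error signal by small factors at every layer, so gradient magnitudes tend to shrink towards the input. The layer sizes follow the 784,30,30,30,10 example; the random weights and squared-error loss are assumptions:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

sizes = [784, 30, 30, 30, 10]                      # the slide's 784,30,30,30,10 network
weights = [rng.normal(0, 1 / np.sqrt(m), (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=sizes[0])                      # one random input "image"
y = np.eye(sizes[-1])[3]                           # arbitrary one-hot target

# Forward pass, keeping every layer's activation.
acts = [x]
for W in weights:
    acts.append(sigmoid(acts[-1] @ W))

# Backward pass for the squared error, recording the gradient size at each layer.
delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
grad_size = {}
for i in reversed(range(len(weights))):
    grad_W = np.outer(acts[i], delta)              # gradient of the loss w.r.t. weights[i]
    grad_size[i + 1] = np.abs(grad_W).mean()
    if i > 0:
        delta = (delta @ weights[i].T) * acts[i] * (1 - acts[i])

for layer in sorted(grad_size):
    print(f"layer {layer}: mean |gradient| = {grad_size[layer]:.2e}")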
Here comes deep learning …

The new way to train multi-layer NNs…

[Figure: the network is trained greedily, one layer at a time –
train this layer first,
then this layer,
then this layer,
then this layer,
finally this layer.]

EACH of the (non-output) layers is trained to be an auto-encoder.
Basically, it is forced to learn good features that describe
what comes from the previous layer.
An auto-encoder is trained, with an absolutely standard
weight-adjustment algorithm, to reproduce the input.

By making this happen with (many) fewer units than the
inputs, this forces the 'hidden layer' units to become good
feature detectors.
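A minimal sketch of that idea: a network with fewer hidden units than inputs, trained by plain gradient descent to reproduce its input. The data, layer sizes, learning rate and sigmoid choice are illustrative assumptions, not the deck's recipe:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.random((500, 20))                       # 500 training vectors of 20 "pixels" (placeholder data)
n_in, n_hidden = 20, 5                          # fewer hidden units than inputs
W_enc = rng.normal(0, 0.1, (n_in, n_hidden))
W_dec = rng.normal(0, 0.1, (n_hidden, n_in))
lr = 0.1

for epoch in range(2000):
    H = sigmoid(X @ W_enc)                      # hidden 'code' for every training vector
    X_hat = H @ W_dec                           # reconstruction of the input
    err = X_hat - X                             # how far the reproduction is from the input

    # Standard gradient-descent weight adjustment on the squared reconstruction error.
    grad_dec = H.T @ err / len(X)
    grad_enc = X.T @ ((err @ W_dec.T) * H * (1 - H)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

reconstruction = sigmoid(X @ W_enc) @ W_dec
print(np.mean((reconstruction - X) ** 2))       # low error => the 5-unit code describes the input well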
Intermediate layers are each trained to be
auto-encoders (or similar).

The final layer is trained to predict the class based
on the outputs from the previous layers.
And that's that

• That's the basic idea.
• There are many types of deep learning:
different kinds of autoencoder, variations on
architectures and training algorithms, etc…
Stacked Autoencoder

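Putting the pieces together, a sketch of greedy layer-wise training of a stacked autoencoder: each layer is trained as an autoencoder on the codes of the layer below, and a final layer is then trained on the top codes to predict the class. All sizes, data, labels and the logistic final layer are placeholder assumptions:

import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=2000):
    """Train one autoencoder layer to reproduce X; return its encoder weights."""
    W_enc = rng.normal(0, 0.1, (X.shape[1], n_hidden))
    W_dec = rng.normal(0, 0.1, (n_hidden, X.shape[1]))
    for _ in range(epochs):
        H = sigmoid(X @ W_enc)
        err = H @ W_dec - X
        grad_dec = (H.T @ err) / len(X)
        grad_enc = (X.T @ ((err @ W_dec.T) * H * (1 - H))) / len(X)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc

X = rng.random((500, 20))                    # placeholder input data
y = (X[:, 0] > 0.5).astype(float)            # placeholder binary labels

# Greedy layer-wise stage: train this layer first, then the next on its codes.
W1 = train_autoencoder(X, 10)
H1 = sigmoid(X @ W1)
W2 = train_autoencoder(H1, 5)
H2 = sigmoid(H1 @ W2)

# Final layer: a logistic unit trained to predict the class from the top codes.
w_out = np.zeros(H2.shape[1])
for _ in range(2000):
    p = sigmoid(H2 @ w_out)
    w_out -= 0.1 * (H2.T @ (p - y)) / len(y)

print("training accuracy:", np.mean((sigmoid(H2 @ w_out) > 0.5) == y))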
Auto Encoders
can be used for finding a low-dimensional representation of your
input data – why?

Some features may be redundant or correlated,
resulting in wasted processing time and overfitting in your
model (too many parameters).
It is thus ideal to only include the features we need.

If your "reconstruction" of x is very accurate, that means your
low-dimensional representation is good.

The aim of an auto-encoder is to learn a compressed, distributed
representation (encoding) for a set of data.
Restricted Boltzmann Machine

[Figure: a visible layer and a hidden layer; the RBM learns to recreate its input.]
Deep Belief Networks

Structure: DBN = MLP
[Figure: input layer, hidden layer(s), output layer.]
Implications

Small Labelled Dataset

Other Deep Learning Models

• RNN – Recurrent Neural Networks
• CNN – Convolutional Neural Networks
Thank You
