Neural Networks: Representation
Non-linear hypotheses
Machine Learning
Non-linear Classification

[Figure: positive and negative examples in the (x1, x2) plane, separated by a non-linear decision boundary.]

Housing example features: x1 = size, x2 = # bedrooms, x3 = # floors, x4 = age, ...
Computer Vision: Car detection

What is this? You see a car; the computer sees a matrix of pixel intensity values.

[Figure: a training set of labeled car and non-car images.]

Testing: What is this?
[Figure: a learning algorithm is trained on two features per image, pixel 1 intensity and pixel 2 intensity; cars and "non"-cars form clusters in the (pixel 1, pixel 2) plane.]
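To see why logistic regression with hand-built polynomial features breaks down here, consider the feature count; a minimal NumPy sketch (the 50×50 image size is an assumed example):

```python
import numpy as np

# Hypothetical example: a 50x50 grayscale image becomes a feature
# vector of n = 2500 raw pixel intensities.
image = np.random.rand(50, 50)       # stand-in for a real image
x = image.flatten()                  # n = 2500 features
n = x.size

# Including all quadratic terms x_i * x_j inflates the feature count
# to n(n+1)/2, which is why hand-built polynomial features do not
# scale to images and a non-linear hypothesis is needed instead.
num_quadratic = n * (n + 1) // 2
print(n, num_quadratic)              # 2500 3126250
```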
The “one learning algorithm” hypothesis

[Figure: neural rewiring experiments in which the auditory cortex and the somatosensory cortex learn to “see” when visual input is routed to them.]
Neurons in the brain
Neural Network: Forward propagation, vectorized implementation

$$z^{(2)} = \Theta^{(1)} x, \qquad a^{(2)} = g(z^{(2)})$$

Add $a_0^{(2)} = 1$.

$$z^{(3)} = \Theta^{(2)} a^{(2)}, \qquad h_\Theta(x) = a^{(3)} = g(z^{(3)})$$
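The same computation as a minimal NumPy sketch (the three-layer architecture and the shapes are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """Vectorized forward propagation for a 3-layer network.

    Assumed shapes (illustrative): x is an n-vector, Theta1 is
    (s2, n + 1), Theta2 is (K, s2 + 1).
    """
    a1 = np.concatenate(([1.0], x))             # add bias unit x0 = 1
    z2 = Theta1 @ a1                            # z(2) = Theta(1) a(1)
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # add a0(2) = 1
    z3 = Theta2 @ a2                            # z(3) = Theta(2) a(2)
    return sigmoid(z3)                          # h(x) = a(3) = g(z(3))
```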
What this neural network is doing is learning its own features. It behaves just like logistic regression, except that rather than using the original features x1, x2, x3, it uses the new features a1, a2, a3. You can therefore end up with a better hypothesis than if you were constrained to use the raw features x1, x2, x3, or constrained to choose among polynomial terms built from them. Instead, the algorithm has the flexibility to learn whatever features it wants, computing a1, a2, a3 to feed into the last unit, which is essentially a logistic regression classifier.
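To make that concrete, here is a tiny illustrative sketch (all values are made up): the output unit applies plain logistic regression, just to the learned activations rather than to the raw features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical activations a1, a2, a3 computed by the hidden layer
# from the raw inputs x1, x2, x3 (values made up for illustration).
a = np.array([1.0, 0.3, 0.9, 0.1])        # a0 = 1 is the bias unit

# The output unit is ordinary logistic regression on those features.
theta = np.array([-1.0, 2.0, 1.5, -3.0])  # hypothetical weights
h = sigmoid(theta @ a)
```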
Other network architectures
Neural Networks: Representation
Examples and intuitions I
Machine Learning
Non-linear classification example: XOR/XNOR
$x_1, x_2$ are binary (0 or 1); $y = x_1 \text{ XOR } x_2$, or $x_1 \text{ XNOR } x_2$.

[Figure: positive and negative examples arranged in the (x1, x2) plane so that no straight line separates them.]

We'd like to learn a non-linear decision boundary that separates the positive and negative examples.
34
Simple example: AND 1.0
0 0
0 1
1 0
1 1
Andrew Ng
Example: OR function

With weights $-10, 20, 20$: $h_\Theta(x) = g(-10 + 20 x_1 + 20 x_2)$.

x1  x2 | h_Θ(x)
 0   0 | g(-10) ≈ 0
 0   1 | g(10)  ≈ 1
 1   0 | g(10)  ≈ 1
 1   1 | g(30)  ≈ 1
Negation (NOT):

With weights $10, -20$: $h_\Theta(x) = g(10 - 20 x_1)$.

x1 | h_Θ(x)
 0 | g(10)  ≈ 1
 1 | g(-10) ≈ 0
Putting it together: $x_1$ XNOR $x_2$

Hidden unit $a_1^{(2)}$ computes $x_1 \text{ AND } x_2$ (weights $-30, 20, 20$); hidden unit $a_2^{(2)}$ computes $(\text{NOT } x_1) \text{ AND } (\text{NOT } x_2)$ (weights $10, -20, -20$); the output unit ORs them (weights $-10, 20, 20$).

x1  x2 | a1(2)  a2(2) | h_Θ(x)
 0   0 |  0      1    |  1
 0   1 |  0      0    |  0
 1   0 |  0      0    |  0
 1   1 |  1      0    |  1
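A minimal NumPy sketch wiring these three units together (the weights are the ones above; function names are just for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weight vectors [bias, w1, w2] from the slides above.
AND_W    = np.array([-30.0,  20.0,  20.0])
NOT_BOTH = np.array([ 10.0, -20.0, -20.0])   # (NOT x1) AND (NOT x2)
OR_W     = np.array([-10.0,  20.0,  20.0])

def unit(w, inputs):
    """One sigmoid unit: g(w0 + w1*in1 + w2*in2)."""
    return sigmoid(w @ np.concatenate(([1.0], inputs)))

def xnor(x1, x2):
    # The hidden layer computes the two intermediate features;
    # the output layer ORs them together.
    a1 = unit(AND_W, [x1, x2])
    a2 = unit(NOT_BOTH, [x1, x2])
    return unit(OR_W, [a1, a2])

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        print(int(x1), int(x2), round(xnor(x1, x2)))
# 0 0 1 / 0 1 0 / 1 0 0 / 1 1 1 -- the XNOR truth table.
```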
Handwritten digit classification
Multiple output units: One-vs-all

Want $h_\Theta(x) \approx [1\,0\,0\,0]^T$ when pedestrian, $\approx [0\,1\,0\,0]^T$ when car, $\approx [0\,0\,1\,0]^T$ when motorcycle, etc.

Training set: $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$, where each $y^{(i)}$ is one of $[1\,0\,0\,0]^T$, $[0\,1\,0\,0]^T$, $[0\,0\,1\,0]^T$, $[0\,0\,0\,1]^T$ (pedestrian, car, motorcycle, truck).
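In code, these target vectors are just one-hot encodings of the class labels; a minimal sketch (the class names follow the slide, the helper name is illustrative):

```python
import numpy as np

classes = ["pedestrian", "car", "motorcycle", "truck"]

def one_hot(label):
    """Encode a class label as the target vector y the network trains on."""
    y = np.zeros(len(classes))
    y[classes.index(label)] = 1.0
    return y

print(one_hot("car"))        # [0. 1. 0. 0.]
```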
Neural Networks: Learning
Cost function
Machine Learning
Neural Network (Classification)

$L$ = total no. of layers in network
$s_l$ = no. of units (not counting bias unit) in layer $l$
Example (Layer 1 through Layer 4): $s_1 = 3$, $s_2 = 5$, $s_3 = 5$, $s_4 = s_L = 4$, with $L = 4$.

Binary classification: 1 output unit, $s_L = 1$.
Multi-class classification ($K$ classes): $K$ output units, $s_L = K$ ($K \ge 3$); e.g. $y$ is one of $[1\,0\,0\,0]^T$, $[0\,1\,0\,0]^T$, $[0\,0\,1\,0]^T$, $[0\,0\,0\,1]^T$ (pedestrian, car, motorcycle, truck).
Cost function

Logistic regression:
$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n} \theta_j^2$$

Neural network: instead of having just one logistic regression output unit, we may have K of them, so $h_\Theta(x) \in \mathbb{R}^K$ with $(h_\Theta(x))_k$ the $k$-th output:
$$J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)} \log (h_\Theta(x^{(i)}))_k + (1 - y_k^{(i)}) \log\bigl(1 - (h_\Theta(x^{(i)}))_k\bigr)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \bigl(\Theta_{ji}^{(l)}\bigr)^2$$
In the regularization part, after the square brackets, we must account for
multiple theta matrices. The number of columns in our current theta matrix is
equal to the number of nodes in our current layer (including the bias unit).
The number of rows in our current theta matrix is equal to the number of
nodes in the next layer (excluding the bias unit). As before with logistic
regression, we square every term.
Note:
- the double sum simply adds up the logistic regression costs calculated for
each cell in the output layer
- the triple sum simply adds up the squares of all the individual Θs in the
entire network.
- the i in the triple sum does not refer to training example i
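A minimal NumPy sketch of this cost (the shapes, names, and one-hot layout of `Y` are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Thetas, X, Y, lam):
    """Regularized neural-network cost J(Theta).

    Assumed shapes (illustrative): Thetas is a list of weight matrices,
    Theta(l) of shape (s_{l+1}, s_l + 1); X is (m, n); Y is (m, K) with
    one-hot rows.
    """
    m = X.shape[0]
    # Vectorized forward propagation over all m examples at once.
    A = X
    for Theta in Thetas:
        A = np.hstack([np.ones((m, 1)), A])      # add bias column
        A = sigmoid(A @ Theta.T)
    H = A                                        # (m, K) network outputs

    # The double sum: logistic costs over every output unit, every example.
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m

    # The triple sum: squares of all weights, excluding each bias column.
    reg = sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return cost + lam * reg / (2 * m)
```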
Neural Networks: Learning
Backpropagation algorithm
Machine Learning
Gradient computation

Need code to compute:
- $J(\Theta)$
- $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$
Gradient computation

Given one training example $(x, y)$, forward propagation:

$$a^{(1)} = x, \qquad z^{(2)} = \Theta^{(1)} a^{(1)}, \qquad a^{(2)} = g(z^{(2)}) \;(\text{add } a_0^{(2)})$$
$$z^{(3)} = \Theta^{(2)} a^{(2)}, \qquad a^{(3)} = g(z^{(3)}) \;(\text{add } a_0^{(3)})$$
$$z^{(4)} = \Theta^{(3)} a^{(3)}, \qquad a^{(4)} = h_\Theta(x) = g(z^{(4)})$$

The intuition of the backpropagation algorithm is that for each node we're going to compute a term $\delta_j^{(l)}$ that represents the error of node $j$ in layer $l$. This delta term captures our error in the activation of that neuron.
If you think of $\delta$, $a$, and $y$ as vectors, then you can come up with a vectorized implementation: $\delta^{(4)} = a^{(4)} - y$. Each of $\delta^{(4)}$, $a^{(4)}$, and $y$ is a vector whose dimension is equal to the number of output units in our network. So we've now computed the error term $\delta^{(4)}$ for our network.
Gradient computation: Backpropagation algorithm

Intuition: $\delta_j^{(l)}$ = “error” of node $j$ in layer $l$.

For each output unit (layer $L = 4$): $\delta_j^{(4)} = a_j^{(4)} - y_j$.

Equivalently, $\delta_j^{(l)}$ is the “error” of the cost for $a_j^{(l)}$ (unit $j$ in layer $l$). Formally, $\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}} \operatorname{cost}(i)$ (for $j \ge 0$), where $\operatorname{cost}(i) = y^{(i)} \log h_\Theta(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_\Theta(x^{(i)})\bigr)$.
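A minimal NumPy sketch of these delta computations for one example (shapes and names are illustrative assumptions; for the sigmoid, $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_one_example(x, y, Thetas):
    """Deltas and gradient terms for one example in an L-layer network.

    Assumed shapes (illustrative): Thetas is a list of matrices Theta(l)
    of shape (s_{l+1}, s_l + 1); x and y are 1-D arrays.
    """
    # Forward pass, keeping each layer's activation (with bias unit).
    activations = []
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))
        activations.append(a)
        a = sigmoid(Theta @ a)

    # Output-layer error: delta(L) = a(L) - y.
    delta = a - y
    grads = [None] * len(Thetas)
    for l in range(len(Thetas) - 1, -1, -1):
        a_prev = activations[l]
        grads[l] = np.outer(delta, a_prev)       # dJ/dTheta(l) term
        if l > 0:
            # delta(l) = (Theta(l))' delta(l+1) .* g'(z(l)),
            # with g'(z) = a .* (1 - a); drop the bias entry.
            delta = (Thetas[l].T @ delta) * a_prev * (1 - a_prev)
            delta = delta[1:]
    return grads
```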
Neural Networks: Learning
Putting it together
Machine Learning
Training a neural network

Pick a network architecture (connectivity pattern between neurons):
- No. of input units: dimension of features $x^{(i)}$
- No. of output units: number of classes
- Reasonable default: 1 hidden layer, or if >1 hidden layer, have the same no. of hidden units in every layer (usually the more the better)
Training a neural network

1. Randomly initialize weights.
2. Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$.
3. Implement code to compute the cost function $J(\Theta)$.
4. Implement backprop to compute the partial derivatives $\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)$ (see the sketch after this list):
   for i = 1:m
       Perform forward propagation and backpropagation using example $(x^{(i)}, y^{(i)})$
       (get activations $a^{(l)}$ and delta terms $\delta^{(l)}$ for $l = 2, \ldots, L$).
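A minimal NumPy sketch of that loop, building on the `backprop_one_example` sketch above (names and shapes remain illustrative assumptions); the bias columns are left unregularized, matching the triple sum in the cost function:

```python
import numpy as np

def nn_gradients(X, Y, Thetas, lam):
    """Accumulate gradients over all m examples (the for i = 1:m loop).

    Builds on backprop_one_example defined earlier; X is (m, n) and
    Y is (m, K) with one-hot rows (illustrative assumptions).
    """
    m = X.shape[0]
    Deltas = [np.zeros_like(Theta) for Theta in Thetas]
    for i in range(m):                     # for i = 1:m
        grads = backprop_one_example(X[i], Y[i], Thetas)
        for l in range(len(Thetas)):
            Deltas[l] += grads[l]
    # D(l) = (1/m) Delta(l) + (lambda/m) Theta(l), bias column unregularized.
    D = []
    for Theta, Delta in zip(Thetas, Deltas):
        reg = lam * Theta
        reg[:, 0] = 0.0                    # do not regularize bias weights
        D.append((Delta + reg) / m)
    return D
```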
Training a neural network

5. Use gradient checking to compare $\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)$ computed using backpropagation vs. a numerical estimate of the gradient of $J(\Theta)$. Then disable the gradient checking code.
6. Use gradient descent or an advanced optimization method with backpropagation to try to minimize $J(\Theta)$ as a function of the parameters $\Theta$.
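Step 5's numerical estimate can be sketched with a two-sided difference (the single-matrix interface here is an assumption for brevity; in practice the $\Theta$ matrices are unrolled into one parameter vector):

```python
import numpy as np

def numerical_gradient(J, Theta, eps=1e-4):
    """Two-sided numerical estimate of dJ/dTheta for gradient checking.

    J is a function taking the weight matrix and returning the scalar cost.
    """
    grad = np.zeros_like(Theta)
    it = np.nditer(Theta, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        old = Theta[idx]
        Theta[idx] = old + eps
        J_plus = J(Theta)
        Theta[idx] = old - eps
        J_minus = J(Theta)
        Theta[idx] = old                         # restore the weight
        grad[idx] = (J_plus - J_minus) / (2 * eps)
    return grad

# Usage: compare against backprop, then disable the check (it is slow).
# assert np.allclose(numerical_gradient(J, Theta), backprop_grad, atol=1e-6)
```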