Neural Networks: Representation
Non-linear hypotheses
Machine Learning
Non-linear Classification

[Figure: positive and negative examples in the (x1, x2) plane, separated by a non-linear decision boundary.]

Housing example features: x1 = size, x2 = # bedrooms, x3 = # floors, x4 = age, ...
Computer Vision: Car detection

What is this? You see a car; the computer sees a matrix of pixel intensity values.

[Figure: a training set of labeled car and non-car images.]

Testing: What is this?
[Figure: a learning algorithm is trained on two features per image, pixel 1 intensity and pixel 2 intensity; cars and "non"-cars form clusters in the (pixel 1, pixel 2) plane.]
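To see why logistic regression with hand-built polynomial features breaks down here, consider the feature count; a minimal NumPy sketch (the 50×50 image size is an assumed example):

```python
import numpy as np

# Hypothetical example: a 50x50 grayscale image becomes a feature
# vector of n = 2500 raw pixel intensities.
image = np.random.rand(50, 50)       # stand-in for a real image
x = image.flatten()                  # n = 2500 features
n = x.size

# Including all quadratic terms x_i * x_j inflates the feature count
# to n(n+1)/2, which is why hand-built polynomial features do not
# scale to images and a non-linear hypothesis is needed instead.
num_quadratic = n * (n + 1) // 2
print(n, num_quadratic)              # 2500 3126250
```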
The “one learning algorithm” hypothesis

[Figure: neural rewiring experiments in which the auditory cortex and the somatosensory cortex learn to “see” when visual input is routed to them.]
Neurons in the brain
Neural Network: Forward propagation, vectorized implementation

$$z^{(2)} = \Theta^{(1)} x, \qquad a^{(2)} = g(z^{(2)})$$

Add $a_0^{(2)} = 1$.

$$z^{(3)} = \Theta^{(2)} a^{(2)}, \qquad h_\Theta(x) = a^{(3)} = g(z^{(3)})$$
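The same computation as a minimal NumPy sketch (the three-layer architecture and the shapes are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, Theta1, Theta2):
    """Vectorized forward propagation for a 3-layer network.

    Assumed shapes (illustrative): x is an n-vector, Theta1 is
    (s2, n + 1), Theta2 is (K, s2 + 1).
    """
    a1 = np.concatenate(([1.0], x))             # add bias unit x0 = 1
    z2 = Theta1 @ a1                            # z(2) = Theta(1) a(1)
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # add a0(2) = 1
    z3 = Theta2 @ a2                            # z(3) = Theta(2) a(2)
    return sigmoid(z3)                          # h(x) = a(3) = g(z(3))
```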
What this neural network is doing is learning its own features. It behaves just like logistic regression, except that rather than using the original features x1, x2, x3, it uses the new features a1, a2, a3. You can therefore end up with a better hypothesis than if you were constrained to use the raw features x1, x2, x3, or constrained to choose among polynomial terms built from them. Instead, the algorithm has the flexibility to learn whatever features it wants, computing a1, a2, a3 to feed into the last unit, which is essentially a logistic regression classifier.
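To make that concrete, here is a tiny illustrative sketch (all values are made up): the output unit applies plain logistic regression, just to the learned activations rather than to the raw features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical activations a1, a2, a3 computed by the hidden layer
# from the raw inputs x1, x2, x3 (values made up for illustration).
a = np.array([1.0, 0.3, 0.9, 0.1])        # a0 = 1 is the bias unit

# The output unit is ordinary logistic regression on those features.
theta = np.array([-1.0, 2.0, 1.5, -3.0])  # hypothetical weights
h = sigmoid(theta @ a)
```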
Other network architectures
Neural Networks: Representation
Examples and intuitions I
Machine Learning
Non-linear classification example: XOR/XNOR
$x_1, x_2$ are binary (0 or 1); $y = x_1 \text{ XOR } x_2$, or $x_1 \text{ XNOR } x_2$.

[Figure: positive and negative examples arranged in the (x1, x2) plane so that no straight line separates them.]

We'd like to learn a non-linear decision boundary that separates the positive and negative examples.
34
Simple example: AND 1.0
0 0
0 1
1 0
1 1
Andrew Ng
Example: OR function

With weights $-10, 20, 20$: $h_\Theta(x) = g(-10 + 20 x_1 + 20 x_2)$.

x1  x2 | h_Θ(x)
 0   0 | g(-10) ≈ 0
 0   1 | g(10)  ≈ 1
 1   0 | g(10)  ≈ 1
 1   1 | g(30)  ≈ 1
Negation (NOT):

With weights $10, -20$: $h_\Theta(x) = g(10 - 20 x_1)$.

x1 | h_Θ(x)
 0 | g(10)  ≈ 1
 1 | g(-10) ≈ 0
Putting it together: $x_1$ XNOR $x_2$

Hidden unit $a_1^{(2)}$ computes $x_1 \text{ AND } x_2$ (weights $-30, 20, 20$); hidden unit $a_2^{(2)}$ computes $(\text{NOT } x_1) \text{ AND } (\text{NOT } x_2)$ (weights $10, -20, -20$); the output unit ORs them (weights $-10, 20, 20$).

x1  x2 | a1(2)  a2(2) | h_Θ(x)
 0   0 |  0      1    |  1
 0   1 |  0      0    |  0
 1   0 |  0      0    |  0
 1   1 |  1      0    |  1
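A minimal NumPy sketch wiring these three units together (the weights are the ones above; function names are just for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weight vectors [bias, w1, w2] from the slides above.
AND_W    = np.array([-30.0,  20.0,  20.0])
NOT_BOTH = np.array([ 10.0, -20.0, -20.0])   # (NOT x1) AND (NOT x2)
OR_W     = np.array([-10.0,  20.0,  20.0])

def unit(w, inputs):
    """One sigmoid unit: g(w0 + w1*in1 + w2*in2)."""
    return sigmoid(w @ np.concatenate(([1.0], inputs)))

def xnor(x1, x2):
    # The hidden layer computes the two intermediate features;
    # the output layer ORs them together.
    a1 = unit(AND_W, [x1, x2])
    a2 = unit(NOT_BOTH, [x1, x2])
    return unit(OR_W, [a1, a2])

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        print(int(x1), int(x2), round(xnor(x1, x2)))
# 0 0 1 / 0 1 0 / 1 0 0 / 1 1 1 -- the XNOR truth table.
```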
Handwritten digit classification
Multiple output units: One-vs-all

Want $h_\Theta(x) \approx [1\,0\,0\,0]^T$ when pedestrian, $\approx [0\,1\,0\,0]^T$ when car, $\approx [0\,0\,1\,0]^T$ when motorcycle, etc.

Training set: $(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})$, where each $y^{(i)}$ is one of $[1\,0\,0\,0]^T$, $[0\,1\,0\,0]^T$, $[0\,0\,1\,0]^T$, $[0\,0\,0\,1]^T$ (pedestrian, car, motorcycle, truck).
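In code, these target vectors are just one-hot encodings of the class labels; a minimal sketch (the class names follow the slide, the helper name is illustrative):

```python
import numpy as np

classes = ["pedestrian", "car", "motorcycle", "truck"]

def one_hot(label):
    """Encode a class label as the target vector y the network trains on."""
    y = np.zeros(len(classes))
    y[classes.index(label)] = 1.0
    return y

print(one_hot("car"))        # [0. 1. 0. 0.]
```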
Neural Networks: Learning
Cost function
Machine Learning
Neural Network (Classification)

$L$ = total no. of layers in network
$s_l$ = no. of units (not counting bias unit) in layer $l$
Example (Layer 1 through Layer 4): $s_1 = 3$, $s_2 = 5$, $s_3 = 5$, $s_4 = s_L = 4$, with $L = 4$.

Binary classification: 1 output unit, $s_L = 1$.
Multi-class classification ($K$ classes): $K$ output units, $s_L = K$ ($K \ge 3$); e.g. $y$ is one of $[1\,0\,0\,0]^T$, $[0\,1\,0\,0]^T$, $[0\,0\,1\,0]^T$, $[0\,0\,0\,1]^T$ (pedestrian, car, motorcycle, truck).
Cost function

Logistic regression:
$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n} \theta_j^2$$

Neural network: instead of having just one logistic regression output unit, we may have K of them, so $h_\Theta(x) \in \mathbb{R}^K$ with $(h_\Theta(x))_k$ the $k$-th output:
$$J(\Theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)} \log (h_\Theta(x^{(i)}))_k + (1 - y_k^{(i)}) \log\bigl(1 - (h_\Theta(x^{(i)}))_k\bigr)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \bigl(\Theta_{ji}^{(l)}\bigr)^2$$
In the regularization part, after the square brackets, we must account for
multiple theta matrices. The number of columns in our current theta matrix is
equal to the number of nodes in our current layer (including the bias unit).
The number of rows in our current theta matrix is equal to the number of
nodes in the next layer (excluding the bias unit). As before with logistic
regression, we square every term.
Note:
- the double sum simply adds up the logistic regression costs calculated for
each cell in the output layer
- the triple sum simply adds up the squares of all the individual Θs in the
entire network.
- the i in the triple sum does not refer to training example i
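A minimal NumPy sketch of this cost (the shapes, names, and one-hot layout of `Y` are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Thetas, X, Y, lam):
    """Regularized neural-network cost J(Theta).

    Assumed shapes (illustrative): Thetas is a list of weight matrices,
    Theta(l) of shape (s_{l+1}, s_l + 1); X is (m, n); Y is (m, K) with
    one-hot rows.
    """
    m = X.shape[0]
    # Vectorized forward propagation over all m examples at once.
    A = X
    for Theta in Thetas:
        A = np.hstack([np.ones((m, 1)), A])      # add bias column
        A = sigmoid(A @ Theta.T)
    H = A                                        # (m, K) network outputs

    # The double sum: logistic costs over every output unit, every example.
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m

    # The triple sum: squares of all weights, excluding each bias column.
    reg = sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return cost + lam * reg / (2 * m)
```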
Neural Networks: Learning
Backpropagation algorithm
Machine Learning
Gradient computation

Need code to compute:
- $J(\Theta)$
- $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$
Gradient computation

Given one training example $(x, y)$, forward propagation:

$$a^{(1)} = x, \qquad z^{(2)} = \Theta^{(1)} a^{(1)}, \qquad a^{(2)} = g(z^{(2)}) \;(\text{add } a_0^{(2)})$$
$$z^{(3)} = \Theta^{(2)} a^{(2)}, \qquad a^{(3)} = g(z^{(3)}) \;(\text{add } a_0^{(3)})$$
$$z^{(4)} = \Theta^{(3)} a^{(3)}, \qquad a^{(4)} = h_\Theta(x) = g(z^{(4)})$$

The intuition of the backpropagation algorithm is that for each node we're going to compute a term $\delta_j^{(l)}$ that represents the error of node $j$ in layer $l$. This delta term captures our error in the activation of that neuron.
If you think of $\delta$, $a$, and $y$ as vectors, then you can come up with a vectorized implementation: $\delta^{(4)} = a^{(4)} - y$. Each of $\delta^{(4)}$, $a^{(4)}$, and $y$ is a vector whose dimension is equal to the number of output units in our network. So we've now computed the error term $\delta^{(4)}$ for our network.
Gradient computation: Backpropagation algorithm

Intuition: $\delta_j^{(l)}$ = “error” of node $j$ in layer $l$.

For each output unit (layer $L = 4$): $\delta_j^{(4)} = a_j^{(4)} - y_j$.

Equivalently, $\delta_j^{(l)}$ is the “error” of the cost for $a_j^{(l)}$ (unit $j$ in layer $l$). Formally, $\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}} \operatorname{cost}(i)$ (for $j \ge 0$), where $\operatorname{cost}(i) = y^{(i)} \log h_\Theta(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_\Theta(x^{(i)})\bigr)$.
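A minimal NumPy sketch of these delta computations for one example (shapes and names are illustrative assumptions; for the sigmoid, $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_one_example(x, y, Thetas):
    """Deltas and gradient terms for one example in an L-layer network.

    Assumed shapes (illustrative): Thetas is a list of matrices Theta(l)
    of shape (s_{l+1}, s_l + 1); x and y are 1-D arrays.
    """
    # Forward pass, keeping each layer's activation (with bias unit).
    activations = []
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))
        activations.append(a)
        a = sigmoid(Theta @ a)

    # Output-layer error: delta(L) = a(L) - y.
    delta = a - y
    grads = [None] * len(Thetas)
    for l in range(len(Thetas) - 1, -1, -1):
        a_prev = activations[l]
        grads[l] = np.outer(delta, a_prev)       # dJ/dTheta(l) term
        if l > 0:
            # delta(l) = (Theta(l))' delta(l+1) .* g'(z(l)),
            # with g'(z) = a .* (1 - a); drop the bias entry.
            delta = (Thetas[l].T @ delta) * a_prev * (1 - a_prev)
            delta = delta[1:]
    return grads
```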
Neural Networks: Learning
Putting it together
Machine Learning
Training a neural network

Pick a network architecture (connectivity pattern between neurons):
- No. of input units: dimension of features $x^{(i)}$
- No. of output units: number of classes
- Reasonable default: 1 hidden layer, or if >1 hidden layer, have the same no. of hidden units in every layer (usually the more the better)
Training a neural network

1. Randomly initialize weights.
2. Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$.
3. Implement code to compute the cost function $J(\Theta)$.
4. Implement backprop to compute the partial derivatives $\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)$ (see the sketch after this list):
   for i = 1:m
       Perform forward propagation and backpropagation using example $(x^{(i)}, y^{(i)})$
       (get activations $a^{(l)}$ and delta terms $\delta^{(l)}$ for $l = 2, \ldots, L$).
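A minimal NumPy sketch of that loop, building on the `backprop_one_example` sketch above (names and shapes remain illustrative assumptions); the bias columns are left unregularized, matching the triple sum in the cost function:

```python
import numpy as np

def nn_gradients(X, Y, Thetas, lam):
    """Accumulate gradients over all m examples (the for i = 1:m loop).

    Builds on backprop_one_example defined earlier; X is (m, n) and
    Y is (m, K) with one-hot rows (illustrative assumptions).
    """
    m = X.shape[0]
    Deltas = [np.zeros_like(Theta) for Theta in Thetas]
    for i in range(m):                     # for i = 1:m
        grads = backprop_one_example(X[i], Y[i], Thetas)
        for l in range(len(Thetas)):
            Deltas[l] += grads[l]
    # D(l) = (1/m) Delta(l) + (lambda/m) Theta(l), bias column unregularized.
    D = []
    for Theta, Delta in zip(Thetas, Deltas):
        reg = lam * Theta
        reg[:, 0] = 0.0                    # do not regularize bias weights
        D.append((Delta + reg) / m)
    return D
```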
Training a neural network

5. Use gradient checking to compare $\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)$ computed using backpropagation vs. a numerical estimate of the gradient of $J(\Theta)$. Then disable the gradient checking code.
6. Use gradient descent or an advanced optimization method with backpropagation to try to minimize $J(\Theta)$ as a function of the parameters $\Theta$.
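Step 5's numerical estimate can be sketched with a two-sided difference (the single-matrix interface here is an assumption for brevity; in practice the $\Theta$ matrices are unrolled into one parameter vector):

```python
import numpy as np

def numerical_gradient(J, Theta, eps=1e-4):
    """Two-sided numerical estimate of dJ/dTheta for gradient checking.

    J is a function taking the weight matrix and returning the scalar cost.
    """
    grad = np.zeros_like(Theta)
    it = np.nditer(Theta, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        old = Theta[idx]
        Theta[idx] = old + eps
        J_plus = J(Theta)
        Theta[idx] = old - eps
        J_minus = J(Theta)
        Theta[idx] = old                         # restore the weight
        grad[idx] = (J_plus - J_minus) / (2 * eps)
    return grad

# Usage: compare against backprop, then disable the check (it is slow).
# assert np.allclose(numerical_gradient(J, Theta), backprop_grad, atol=1e-6)
```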