
Neural Networks: Representation
Non-linear hypotheses
Machine Learning
Non-linear Classification

[Plot: positive and negative examples in the (x1, x2) plane that require a non-linear decision boundary]

Example features (housing): size, # bedrooms, # floors, age
What is this?
You see this: [an image]
But the camera sees this: [a grid of pixel intensity values]
Computer Vision: Car detection

Training set: labeled images of cars ("Cars") and other objects ("Not a car")
Testing: given a new image, what is this?

Learning Algorithm

[Plot: each raw image plotted by its pixel 1 intensity vs. pixel 2 intensity; the "Cars" and "Non"-Cars examples form two classes in this space]
Learning Algorithm

Raw image: 50 x 50 pixel images → 2500 pixels (7500 if RGB)

$x = [\text{pixel 1 intensity}; \text{pixel 2 intensity}; \ldots; \text{pixel 2500 intensity}]$

Quadratic features ($x_i \times x_j$): ≈3 million features

[Plot: "Cars" vs. "Non"-Cars in pixel-intensity space]
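A quick check of that feature count (a minimal sketch; the variable names are just for illustration): with n = 2500 pixel intensities, the number of distinct quadratic terms x_i · x_j with i ≤ j is n(n + 1)/2.

n = 2500                            # 50 x 50 grayscale pixels
quadratic_terms = n * (n + 1) // 2  # all products x_i * x_j with i <= j
print(quadratic_terms)              # 3126250, i.e. roughly 3 million features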
Neural Networks: Representation
Model representation I
Machine Learning
Neural Networks

Origins: algorithms that try to mimic the brain.
Very widely used in the 80s and early 90s; popularity diminished in the late 90s.
Recent resurgence: state-of-the-art technique for many applications.
The “one learning algorithm” hypothesis

[Diagram: visual input rewired to the auditory cortex]
Auditory cortex learns to see. [Roe et al., 1992]
The “one learning algorithm” hypothesis

[Diagram: visual input rewired to the somatosensory cortex]
Somatosensory cortex learns to see. [Metin & Frost, 1989]
Neuron in the brain

[Diagram: a single neuron, with dendrites as input wires, a cell body, and an axon as the output wire]
Neurons in the brain

[Image: a network of interconnected neurons. Credit: US National Institutes of Health, National Institute on Aging]
Neuron model: Logistic unit

[Diagram: inputs $x_1, x_2, x_3$ (plus bias $x_0 = 1$) feed a single unit that outputs $h_\theta(x)$]

$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$

Sigmoid (logistic) activation function: $g(z) = \frac{1}{1 + e^{-z}}$
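A minimal NumPy sketch of this logistic unit (the particular weights and inputs below are illustrative, not from the slide):

import numpy as np

def sigmoid(z):
    # Sigmoid (logistic) activation: g(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(theta, x):
    # Single neuron: a = g(theta' * x), where x includes the bias term x0 = 1
    return sigmoid(theta @ x)

theta = np.array([-5.0, 3.0, 2.0])   # illustrative weights (bias, x1, x2)
x = np.array([1.0, 1.0, 1.0])        # x0 = 1 (bias), x1 = 1, x2 = 1
print(logistic_unit(theta, x))       # sigmoid(0) = 0.5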
Neural Network

[Diagram: Layer 1 (input layer), Layer 2 (hidden layer), Layer 3 (output layer)]
Neural Network

$a_i^{(j)}$ = “activation” of unit $i$ in layer $j$
$\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$

If the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ will be of dimension $s_{j+1} \times (s_j + 1)$.
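A small sketch of that dimension rule (the layer sizes below are made up for illustration):

# If layer j has s_j units and layer j+1 has s_{j+1} units, Theta^(j) is
# s_{j+1} x (s_j + 1); the +1 accounts for the bias unit of layer j.
s = [3, 5, 4, 1]                  # illustrative layer sizes s_1, ..., s_4
for j in range(len(s) - 1):
    rows, cols = s[j + 1], s[j] + 1
    print(f"Theta^({j + 1}) has dimension {rows} x {cols}")
# Theta^(1): 5 x 4, Theta^(2): 4 x 6, Theta^(3): 1 x 5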
Neural Networks: Representation
Model representation II
Machine Learning
Forward propagation: Vectorized implementation
Input → Hidden → Output

Forward propagation:
$z^{(2)} = \Theta^{(1)} a^{(1)}$, where $a^{(1)} = x$
$a^{(2)} = g(z^{(2)})$
Add $a_0^{(2)} = 1$.
$z^{(3)} = \Theta^{(2)} a^{(2)}$
$h_\Theta(x) = a^{(3)} = g(z^{(3)})$
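A minimal NumPy sketch of this vectorized forward pass for a 3-layer network (the layer sizes and the randomly drawn Theta1 and Theta2 are placeholders):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(x, Theta1, Theta2):
    # Vectorized forward propagation for a 3-layer network
    a1 = np.concatenate(([1.0], x))              # add bias unit a0^(1) = 1
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))    # add bias unit a0^(2) = 1
    z3 = Theta2 @ a2
    return sigmoid(z3)                           # h_Theta(x) = a^(3)

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))   # layer 1 (3 inputs + bias) -> layer 2 (3 units)
Theta2 = rng.normal(size=(1, 4))   # layer 2 (3 units + bias) -> layer 3 (1 unit)
print(forward_propagation(np.array([1.0, 0.0, 1.0]), Theta1, Theta2))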
Neural Network: learning its own features

[Diagram: Layer 1, Layer 2, Layer 3]

What this neural network is doing is just like logistic regression, except that rather than using the original features x1, x2, x3, it is using the new features a1, a2, a3.
The features a1, a2, a3 are themselves learned as functions of the input. The function mapping from layer 1 to layer 2 is determined by another set of parameters, Theta(1). So it is as if the neural network, instead of being constrained to feed the raw features x1, x2, x3 into logistic regression, gets to learn its own features a1, a2, a3 to feed into logistic regression. Depending on what parameters it chooses for Theta(1), it can learn some pretty interesting and complex features, and therefore end up with a better hypothesis than if it were constrained to use the raw features x1, x2, x3, or to choose among polynomial terms of them. Instead, the algorithm has the flexibility to learn whatever features it wants, using a1, a2, a3 to feed into this last unit, which is essentially a logistic regression.
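To make that concrete, here is a tiny sketch (the activation and weight values are made up) showing that the output unit is literally logistic regression applied to the learned features rather than to x:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a2 = np.array([1.0, 0.7, 0.2, 0.9])           # hypothetical [bias, a1, a2, a3]
theta_out = np.array([-1.0, 2.0, 0.5, 1.0])   # hypothetical output-layer weights
h = sigmoid(theta_out @ a2)                   # logistic regression on the learned features
print(h)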
Other network architectures

[Diagram: a network with Layer 1, Layer 2, Layer 3, Layer 4 (two hidden layers)]
Neural Networks: Representation
Examples and intuitions I
Machine Learning
Non-linear classification example: XOR/XNOR

$x_1$, $x_2$ are binary (0 or 1).
$y = x_1 \text{ XNOR } x_2 = \text{NOT}(x_1 \text{ XOR } x_2)$

[Plot: positive and negative examples in the (x1, x2) plane]

We'd like to learn a non-linear decision boundary that separates the positive and negative examples.
Simple example: AND

$x_1, x_2 \in \{0, 1\}$, $y = x_1 \text{ AND } x_2$

$h_\Theta(x) = g(-30 + 20 x_1 + 20 x_2)$

x1  x2  h_Theta(x)
0   0   g(-30) ≈ 0
0   1   g(-10) ≈ 0
1   0   g(-10) ≈ 0
1   1   g(10) ≈ 1
Example: OR function

$h_\Theta(x) = g(-10 + 20 x_1 + 20 x_2)$

x1  x2  h_Theta(x)
0   0   g(-10) ≈ 0
0   1   g(10) ≈ 1
1   0   g(10) ≈ 1
1   1   g(30) ≈ 1
Example: Negation (NOT)

$h_\Theta(x) = g(10 - 20 x_1)$

x1  h_Theta(x)
0   g(10) ≈ 1
1   g(-10) ≈ 0
Putting it together: $x_1$ XNOR $x_2$

Hidden layer:
$a_1^{(2)} = g(-30 + 20 x_1 + 20 x_2)$   ($x_1$ AND $x_2$)
$a_2^{(2)} = g(10 - 20 x_1 - 20 x_2)$   ((NOT $x_1$) AND (NOT $x_2$))

Output layer:
$h_\Theta(x) = g(-10 + 20 a_1^{(2)} + 20 a_2^{(2)})$   ($a_1^{(2)}$ OR $a_2^{(2)}$)

x1  x2  a1^(2)  a2^(2)  h_Theta(x)
0   0   0       1       1
0   1   0       0       0
1   0   0       0       0
1   1   1       0       1
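A minimal NumPy check of this XNOR network, using the weights from the slide above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hidden layer: row 1 computes x1 AND x2, row 2 computes (NOT x1) AND (NOT x2)
Theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Output layer: ORs the two hidden activations, giving x1 XNOR x2
Theta2 = np.array([-10.0, 20.0, 20.0])

for x1 in (0, 1):
    for x2 in (0, 1):
        a1 = np.array([1.0, x1, x2])                         # input with bias
        a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))   # hidden activations with bias
        h = sigmoid(Theta2 @ a2)
        print(x1, x2, int(round(float(h))))                  # 1 for (0,0) and (1,1), else 0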
Neural Network intuition

[Diagram: Layer 1, Layer 2, Layer 3, Layer 4]

When you have multiple layers, the second layer computes relatively simple functions of the inputs. The third layer builds on these to compute more complex functions, and the layer after that can compute even more complex functions.
Handwritten digit classification

[Demo: a neural network recognizing handwritten digits. Courtesy of Yann LeCun]
Neural Networks: Representation
Multi-class classification
Machine Learning
Multiple output units: One-vs-all

[Images: pedestrian, car, motorcycle, truck]

Want $h_\Theta(x) \approx [1;0;0;0]$ when pedestrian, $[0;1;0;0]$ when car, $[0;0;1;0]$ when motorcycle, etc.
Multiple output units: One-vs-all

Want $h_\Theta(x) \approx [1;0;0;0]$ when pedestrian, $[0;1;0;0]$ when car, $[0;0;1;0]$ when motorcycle, etc.

Training set: $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$
$y^{(i)}$ is one of $[1;0;0;0]$, $[0;1;0;0]$, $[0;0;1;0]$, $[0;0;0;1]$
(pedestrian, car, motorcycle, truck)
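A minimal sketch of this label encoding (the class list and the helper one_hot are illustrative names, not from the course code):

import numpy as np

classes = ["pedestrian", "car", "motorcycle", "truck"]   # K = 4 output units

def one_hot(label, classes):
    # Encode a class label as the K-dimensional target vector y^(i)
    y = np.zeros(len(classes))
    y[classes.index(label)] = 1.0
    return y

print(one_hot("car", classes))          # [0. 1. 0. 0.]
print(one_hot("motorcycle", classes))   # [0. 0. 1. 0.]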
Neural Networks: Learning
Cost function
Machine Learning
Neural Network (Classification)

[Diagram: Layer 1, Layer 2, Layer 3, Layer 4]

$L$ = total no. of layers in network ($L = 4$ in the diagram)
$s_l$ = no. of units (not counting bias unit) in layer $l$
Example: $s_1 = 3$, $s_2 = 5$, $s_3 = 5$, $s_4 = s_L = 4$

Binary classification: $y \in \{0, 1\}$; 1 output unit, $s_L = 1$.
Multi-class classification ($K$ classes): $y \in \mathbb{R}^K$, e.g. $[1;0;0;0]$, $[0;1;0;0]$, $[0;0;1;0]$, $[0;0;0;1]$ for pedestrian, car, motorcycle, truck; $K$ output units, $s_L = K$ (with $K \geq 3$).
Cost function

Logistic regression:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

Neural network: instead of having just one logistic regression output unit, we may have $K$ of them.
$h_\Theta(x) \in \mathbb{R}^K$, with $(h_\Theta(x))_k$ denoting the $k$-th output.

$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\left(h_\Theta(x^{(i)})\right)_k + (1 - y_k^{(i)}) \log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(\Theta_{ji}^{(l)}\right)^2$

In the regularization part, after the square brackets, we must account for
multiple theta matrices. The number of columns in our current theta matrix is
equal to the number of nodes in our current layer (including the bias unit).
The number of rows in our current theta matrix is equal to the number of
nodes in the next layer (excluding the bias unit). As before with logistic
regression, we square every term.

Note:

- the double sum simply adds up the logistic regression costs calculated for
each cell in the output layer
- the triple sum simply adds up the squares of all the individual Θs in the
entire network.
- the i in the triple sum does not refer to training example i
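A minimal NumPy sketch of that cost, under the assumption that the hypotheses for all m examples have already been computed (the argument names are placeholders):

import numpy as np

def nn_cost(H, Y, Thetas, lam):
    # H: m x K matrix of hypotheses h_Theta(x^(i)); Y: m x K one-hot labels;
    # Thetas: list of weight matrices Theta^(1), ..., Theta^(L-1); lam: lambda.
    m = Y.shape[0]
    # Double sum: logistic regression cost over every output unit of every example
    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Triple sum: squares of all weights, excluding the bias column of each Theta
    reg = sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas) * lam / (2 * m)
    return cost + reg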
Neural Networks: Learning
Backpropagation algorithm
Machine Learning
Gradient computation

Goal: minimize the cost function $J(\Theta)$ defined above.

Need code to compute:
- $J(\Theta)$
- $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$
Gradient computation

Given one training example $(x, y)$:

Forward propagation:
$a^{(1)} = x$
$z^{(2)} = \Theta^{(1)} a^{(1)}$
$a^{(2)} = g(z^{(2)})$   (add $a_0^{(2)}$)
$z^{(3)} = \Theta^{(2)} a^{(2)}$
$a^{(3)} = g(z^{(3)})$   (add $a_0^{(3)}$)
$z^{(4)} = \Theta^{(3)} a^{(3)}$
$a^{(4)} = h_\Theta(x) = g(z^{(4)})$

[Diagram: Layer 1, Layer 2, Layer 3, Layer 4]

This is our vectorized implementation of forward propagation; it allows us to compute the activation values for all of the neurons in the network.

In order to compute the derivatives, we're going to use an algorithm called backpropagation.

The intuition of the backpropagation algorithm is that for each node we compute a term $\delta_j^{(l)}$ that represents the "error" of node $j$ in layer $l$. This delta term captures the error in the activation of that neuron.

If you think of $\delta$, $a$, and $y$ as vectors, you can write a vectorized implementation of this, which is simply $\delta^{(4)} = a^{(4)} - y$.

Each of $\delta^{(4)}$, $a^{(4)}$, and $y$ is a vector whose dimension is equal to the number of output units in the network.

So we have now computed the error term $\delta^{(4)}$ for the network.
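As a concrete sketch (the activation and label values below are made up):

import numpy as np

a4 = np.array([0.9, 0.1, 0.2, 0.05])   # hypothetical output activations, K = 4
y  = np.array([1.0, 0.0, 0.0, 0.0])    # one-hot target
delta4 = a4 - y                        # "error" of each output unit
print(delta4)                          # [-0.1   0.1   0.2   0.05]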
Gradient computation: Backpropagation algorithm

Intuition: $\delta_j^{(l)}$ = "error" of node $j$ in layer $l$.

[Diagram: Layer 1, Layer 2, Layer 3, Layer 4]

For each output unit (layer L = 4):
$\delta_j^{(4)} = a_j^{(4)} - y_j$

For the hidden layers:
$\delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} .* g'(z^{(3)})$
$\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} .* g'(z^{(2)})$
where $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$, and there is no $\delta^{(1)}$ for the input layer.

Ignoring regularization ($\lambda = 0$):
$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = a_j^{(l)} \delta_i^{(l+1)}$
Backpropagation algorithm

Training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$

Set $\Delta_{ij}^{(l)} = 0$ (for all $l, i, j$).   (Used to compute $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$.)
For $i = 1$ to $m$:
    Set $a^{(1)} = x^{(i)}$
    Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \ldots, L$
    Using $y^{(i)}$, compute $\delta^{(L)} = a^{(L)} - y^{(i)}$
    Compute $\delta^{(L-1)}, \delta^{(L-2)}, \ldots, \delta^{(2)}$
    $\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}$   (vectorized: $\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$)

$D_{ij}^{(l)} := \frac{1}{m} \Delta_{ij}^{(l)} + \lambda \Theta_{ij}^{(l)}$   if $j \neq 0$
$D_{ij}^{(l)} := \frac{1}{m} \Delta_{ij}^{(l)}$   if $j = 0$

$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$
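A minimal NumPy sketch of this procedure for sigmoid units, assuming Y is one-hot encoded and the weight matrices are passed as a Python list (the function and variable names are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, Thetas, lam):
    # X: m x n inputs, Y: m x K one-hot targets, Thetas: list of weight matrices
    m = X.shape[0]
    Deltas = [np.zeros_like(T) for T in Thetas]        # accumulators, one per Theta
    for x, y in zip(X, Y):
        # Forward propagation, storing activations (with bias) for every layer
        activations = [np.concatenate(([1.0], x))]
        for T in Thetas:
            a = sigmoid(T @ activations[-1])
            activations.append(np.concatenate(([1.0], a)))
        delta = activations[-1][1:] - y                # delta^(L) = a^(L) - y
        # Backward pass: accumulate Delta^(l) += delta^(l+1) * a^(l)'
        for l in range(len(Thetas) - 1, -1, -1):
            a_l = activations[l]
            Deltas[l] += np.outer(delta, a_l)
            if l > 0:
                # delta^(l) = (Theta^(l))' delta^(l+1) .* a^(l) .* (1 - a^(l)), dropping the bias entry
                delta = ((Thetas[l].T @ delta) * a_l * (1 - a_l))[1:]
    # D^(l): averaged, regularized gradients (bias column not regularized)
    grads = []
    for T, D in zip(Thetas, Deltas):
        G = D / m
        G[:, 1:] += (lam / m) * T[:, 1:]
        grads.append(G)
    return grads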
Neural Networks: Learning
Backpropagation intuition
Machine Learning

Forward Propagation

$\delta_j^{(l)}$ = "error" of cost for $a_j^{(l)}$ (unit $j$ in layer $l$).

Formally, $\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}} \text{cost}(i)$ (for $j \geq 0$), where
$\text{cost}(i) = y^{(i)} \log h_\Theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\Theta(x^{(i)})\right)$.
Neural Networks: Learning
Putting it together
Machine Learning
Training a neural network

Pick a network architecture (connectivity pattern between neurons).

No. of input units: dimension of features
No. of output units: number of classes
Reasonable default: 1 hidden layer, or if >1 hidden layer, have the same no. of hidden units in every layer (usually the more the better)
Training a neural network

1.  Randomly initialize weights
2.  Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$
3.  Implement code to compute cost function $J(\Theta)$
4.  Implement backprop to compute partial derivatives $\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)$
    for i = 1:m
        Perform forward propagation and backpropagation using example $(x^{(i)}, y^{(i)})$
        (Get activations $a^{(l)}$ and delta terms $\delta^{(l)}$ for $l = 2, \ldots, L$).
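A minimal sketch of step 1 (symmetry-breaking random initialization); the interval half-width epsilon = 0.12 and the layer sizes are illustrative choices, not values from the slides:

import numpy as np

def random_init(s_in, s_out, epsilon=0.12):
    # Initialize a weight matrix mapping a layer with s_in units (plus bias)
    # to a layer with s_out units, with entries uniform in [-epsilon, epsilon]
    return np.random.rand(s_out, s_in + 1) * 2 * epsilon - epsilon

Theta1 = random_init(400, 25)   # e.g. 400 input features -> 25 hidden units
Theta2 = random_init(25, 10)    # 25 hidden units -> 10 output classes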
Training a neural network

5.  Use gradient checking to compare $\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)$ computed using
    backpropagation vs. using a numerical estimate of the gradient of $J(\Theta)$.
    Then disable the gradient checking code.
6.  Use gradient descent or an advanced optimization method with
    backpropagation to try to minimize $J(\Theta)$ as a function of the
    parameters $\Theta$.
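A minimal sketch of the two-sided numerical gradient estimate used in step 5 (the function name and the example cost are illustrative):

import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    # Two-sided estimate: dJ/dtheta_i ≈ (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        perturb = np.zeros_like(theta)
        perturb[i] = eps
        grad[i] = (J(theta + perturb) - J(theta - perturb)) / (2 * eps)
    return grad

# Check against a cost with a known gradient: J(theta) = sum(theta^2), gradient 2*theta
theta = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda t: np.sum(t ** 2), theta))   # approx [ 2. -4.  6.]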
