
CS 4104

APPLIED MACHINE LEARNING

Dr. Hashim Yasin


National University of Computer
and Emerging Sciences,
Faisalabad, Pakistan.
PERCEPTRON - RECAP

AND, OR Functions



AND Function

X1 X2 Y
0 0 0
0 1 0
1 0 0
1 1 1



OR Function

X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 1

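The AND and OR truth tables above are both linearly separable, so a single threshold unit can realize each of them. A minimal Python sketch (the weight and bias values are one valid choice, not taken from the slides):

def threshold_unit(x1, x2, w1, w2, w0):
    """Perceptron unit: output 1 if the weighted sum exceeds 0, else 0."""
    return int(w1 * x1 + w2 * x2 + w0 > 0)

# AND fires only when both inputs are 1; OR fires when at least one is 1.
AND = lambda x1, x2: threshold_unit(x1, x2, w1=1, w2=1, w0=-1.5)
OR = lambda x1, x2: threshold_unit(x1, x2, w1=1, w2=1, w0=-0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, AND(x1, x2), OR(x1, x2))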


XOR Function

XOR is not linearly separable: no single line separates {(0,1), (1,0)} from {(0,0), (1,1)}, so a single perceptron cannot represent it.


MULTILAYER NETWORKS
Multilayer Perceptron Architecture

The term MLP is used to describe any general feedforward network, i.e., a network with no recurrent connections.

[Figure: an input layer feeding one or more hidden layers, which feed an output layer.]



Multilayer Networks
Network topology: two hidden threshold units (o1, o2) and one output unit (o3), whose output is y; bias inputs x01 = x02 = o01 = 1.

Weights:
    w11 = w12 = 1
    w21 = w22 = 1
    w01 = -1.5    w02 = -0.5
    w13 = -1      w23 = 1    w03 = -0.5

Desired behavior:

x1  x2  o1  o2  y
0   0   0   0   0
1   0   0   1   1
0   1   0   1   1
1   1   1   1   0

Piecewise linear classification using an MLP with threshold (perceptron) units.
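These weights can be checked directly against the table; a minimal Python sketch, with variable names mirroring the slide's notation:

def step(z):
    """Hard-threshold (perceptron) activation: 1 if z > 0, else 0."""
    return int(z > 0)

def xor_net(x1, x2):
    o1 = step(1 * x1 + 1 * x2 - 1.5)     # w11 = w21 = 1, w01 = -1.5: acts as AND
    o2 = step(1 * x1 + 1 * x2 - 0.5)     # w12 = w22 = 1, w02 = -0.5: acts as OR
    return step(-1 * o1 + 1 * o2 - 0.5)  # w13 = -1, w23 = 1, w03 = -0.5

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))  # prints y = 0, 1, 1, 0 (XOR)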
Multilayer Networks

 The single perceptron can only express linear decision surfaces.

 The kind of multilayer networks learned by the back propagation algorithm are capable of expressing a rich variety of nonlinear decision surfaces.





Multilayer Networks… Example

Example:
 The speech recognition task involves distinguishing among 10 possible vowels, all spoken in the context of "h-d" (i.e., "hid," "had," "head," "hood," etc.).





Multilayer Networks

 What type of unit shall we use as the basis for constructing multilayer networks?

 Can we use the delta/gradient descent learning rule with multiple layers of linear units?
  No: multiple layers of cascaded linear units still produce only linear functions, and we prefer networks capable of representing highly nonlinear functions.

 Is the perceptron unit another possible choice?
  No: its discontinuous threshold makes it undifferentiable and hence unsuitable for gradient descent.
Multilayer Networks

Solution:
 One solution is the sigmoid unit: a unit very much like a perceptron, but based on a smoothed, differentiable threshold function.

 Like the perceptron, the sigmoid unit:
  first computes a linear combination of its inputs,
  then applies a threshold to the result. However, the threshold output is a continuous function of its input.


Multilayer Networks

 In the case of the sigmoid unit, the threshold output is a continuous function of its input.

 More precisely, the sigmoid unit computes its output o as

    o = σ(w · x),  where  σ(y) = 1 / (1 + e^(-y))

 σ is often called the sigmoid function or, alternatively, the logistic function.

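As an illustration, a minimal NumPy sketch of a sigmoid unit (the weights and inputs are made-up values, just to show the computation):

import numpy as np

def sigmoid(y):
    """Logistic function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_unit(x, w, w0):
    """Linear combination of the inputs, then the smooth threshold."""
    return sigmoid(np.dot(w, x) + w0)

x = np.array([1.0, 0.5])
w = np.array([0.8, -0.4])
print(sigmoid_unit(x, w, w0=0.1))  # a value strictly between 0 and 1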


Sigmoid Threshold Unit

[Figure: the sigmoid threshold unit, showing the inputs, their weighted sum, and σ applied to the result.]


Sigmoid Function

 The sigmoid function maps a very large input domain onto a small range of outputs, so it is often referred to as the squashing function of the unit.

 The sigmoid function has the useful property that its derivative is easily expressed in terms of its output:

    σ'(y) = σ(y) (1 - σ(y))

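This identity can be sanity-checked numerically; a small sketch comparing it against a central finite-difference estimate:

import numpy as np

sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

y = 0.7   # arbitrary test point
h = 1e-6  # finite-difference step

analytic = sigmoid(y) * (1.0 - sigmoid(y))
numeric = (sigmoid(y + h) - sigmoid(y - h)) / (2 * h)
print(analytic, numeric)  # the two values agree to many decimal places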


Sigmoid Function

 The sigmoid function's output lies between 0 and 1.



Sigmoid Function

 The term e^(-y) in the sigmoid function definition is sometimes replaced by e^(-k·y),

  where k is some positive constant that determines the steepness of the curve.

 The function tanh is also sometimes used in place of the sigmoid function.



Hyperbolic Tangent Function

 The hyperbolic tangent function tanh is:

    tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

 Its derivative can also be easily expressed in terms of tanh itself:

    tanh'(x) = 1 - tanh^2(x)

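tanh is a rescaled, recentered sigmoid, tanh(x) = 2σ(2x) - 1, which is why the two can be used interchangeably up to an affine transformation. A quick numeric check:

import numpy as np

sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

x = np.linspace(-3, 3, 7)
print(np.tanh(x))
print(2 * sigmoid(2 * x) - 1)  # identical values: tanh(x) = 2*sigmoid(2x) - 1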


[Figure: plot of tanh(x), an S-shaped curve ranging from -1 to 1.]


Sigmoid vs Tangent

Sigmoid:
 Smooth gradient, preventing "jumps" in output values.
 Output values bound between 0 and 1, normalizing the output of each neuron.

Tanh:
 Zero centered, making it easier to model inputs that have strongly negative, neutral, and strongly positive values.
 Its range is between -1 and 1.
[Figure: the sigmoid and tanh curves plotted side by side.]


BACK PROPAGATION ALGORITHM
Multilayer Network Architecture

[Figures: multilayer perceptron architecture and notation.]


The Back Propagation Algorithm

The Back Propagation algorithm has two phases:

 Forward pass: computes the 'functional signal', the feedforward propagation of input pattern signals through the network.

 Backward pass: computes the 'error signal', propagating the error backwards through the network starting at the output units (where the error is the difference between the actual and desired output values).

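To make the two phases concrete, here is a minimal sketch of one stochastic-gradient training step for a network with a single hidden layer of sigmoid units (layer sizes, learning rate, and data are made-up illustrations; bias weights are omitted for brevity):

import numpy as np

sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))  # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(3, 1))  # hidden -> output weights
eta = 0.5                                # learning rate

x = np.array([1.0, 0.0])  # one training example
t = np.array([1.0])       # its target output

# Forward pass: propagate the functional signal through the network.
h = sigmoid(x @ W1)       # hidden activations
o = sigmoid(h @ W2)       # network output

# Backward pass: propagate the error signal back from the output units.
delta_o = (t - o) * o * (1 - o)         # output-unit error term
delta_h = h * (1 - h) * (W2 @ delta_o)  # hidden-unit error terms

# Gradient-descent weight updates.
W2 += eta * np.outer(h, delta_o)
W1 += eta * np.outer(x, delta_h)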


The Back Propagation Algorithm

Conceptually: forward activity, backward error.
The Back Propagation Algorithm

 The back propagation algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections.

 It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for these outputs.

 As we are considering networks with multiple output units, we begin by redefining E to sum the errors over all of the network output units.
The Back Propagation Algorithm

    E(w) = (1/2) Σ_{d∈D} Σ_{k∈outputs} (t_kd - o_kd)²

 where D is the set of training examples, outputs is the set of output units in the network, and t_kd and o_kd are the target and output values associated with the kth output unit and training example d.

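A direct transcription of this definition in NumPy (the arrays are assumed to be laid out as examples × output units, with made-up values):

import numpy as np

def squared_error(targets, outputs):
    """E = 1/2 * sum over examples d and output units k of (t_kd - o_kd)^2."""
    return 0.5 * np.sum((targets - outputs) ** 2)

t = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
o = np.array([[0.8, 0.1, 0.2],
              [0.3, 0.6, 0.1]])
print(squared_error(t, o))  # 0.5 * 0.35 = 0.175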



Reading Material

 Artificial Intelligence: A Modern Approach
  Stuart J. Russell and Peter Norvig
  Chapter 18.

 Machine Learning
  Tom M. Mitchell
  Chapter 4.


