
CS 4104

APPLIED MACHINE LEARNING

Dr. Hashim Yasin


National University of Computer
and Emerging Sciences,
Faisalabad, Pakistan.
PERCEPTRON - RECAP

AND, OR Functions



AND Function

X1 X2 Y
0 0 0
0 1 0
1 0 0
1 1 1



OR Function

X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 1

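The AND and OR truth tables above are both linearly separable, so a single threshold unit can realize each of them. A minimal Python sketch (the weight and bias values are one valid choice, not taken from the slides):

def threshold_unit(x1, x2, w1, w2, w0):
    """Perceptron unit: output 1 if the weighted sum exceeds 0, else 0."""
    return int(w1 * x1 + w2 * x2 + w0 > 0)

# AND fires only when both inputs are 1; OR fires when at least one is 1.
AND = lambda x1, x2: threshold_unit(x1, x2, w1=1, w2=1, w0=-1.5)
OR = lambda x1, x2: threshold_unit(x1, x2, w1=1, w2=1, w0=-0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, AND(x1, x2), OR(x1, x2))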


XOR Function

XOR is not linearly separable: no single line separates {(0,1), (1,0)} from {(0,0), (1,1)}, so a single perceptron cannot represent it.


MULTILAYER NETWORKS
Multilayer Perceptron Architecture

The term MLP is used to describe any general feedforward network, i.e., a network with no recurrent connections.

[Figure: an input layer feeding one or more hidden layers, which feed an output layer.]



Multilayer Networks
Network topology: two hidden threshold units (o1, o2) and one output unit (o3), whose output is y; bias inputs x01 = x02 = o01 = 1.

Weights:
    w11 = w12 = 1
    w21 = w22 = 1
    w01 = -1.5    w02 = -0.5
    w13 = -1      w23 = 1    w03 = -0.5

Desired behavior:

x1  x2  o1  o2  y
0   0   0   0   0
1   0   0   1   1
0   1   0   1   1
1   1   1   1   0

Piecewise linear classification using an MLP with threshold (perceptron) units.
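These weights can be checked directly against the table; a minimal Python sketch, with variable names mirroring the slide's notation:

def step(z):
    """Hard-threshold (perceptron) activation: 1 if z > 0, else 0."""
    return int(z > 0)

def xor_net(x1, x2):
    o1 = step(1 * x1 + 1 * x2 - 1.5)     # w11 = w21 = 1, w01 = -1.5: acts as AND
    o2 = step(1 * x1 + 1 * x2 - 0.5)     # w12 = w22 = 1, w02 = -0.5: acts as OR
    return step(-1 * o1 + 1 * o2 - 0.5)  # w13 = -1, w23 = 1, w03 = -0.5

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))  # prints y = 0, 1, 1, 0 (XOR)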
Multilayer Networks

 The single perceptron can only express linear decision surfaces.

 The kind of multilayer networks learned by the back propagation algorithm are capable of expressing a rich variety of nonlinear decision surfaces.





Multilayer Networks… Example

Example:
 The speech recognition task involves distinguishing among 10 possible vowels, all spoken in the context of "h-d" (i.e., "hid," "had," "head," "hood," etc.).





Multilayer Networks

 What type of unit shall we use as the basis for constructing multilayer networks?

 Can we use the delta/gradient descent learning rule with multiple layers of linear units?
  No: multiple layers of cascaded linear units still produce only linear functions, and we prefer networks capable of representing highly nonlinear functions.

 Is the perceptron unit another possible choice?
  No: its discontinuous threshold makes it undifferentiable and hence unsuitable for gradient descent.
Multilayer Networks

Solution:
 One solution is the sigmoid unit: a unit very much like a perceptron, but based on a smoothed, differentiable threshold function.

 Like the perceptron, the sigmoid unit:
  first computes a linear combination of its inputs,
  then applies a threshold to the result. However, the threshold output is a continuous function of its input.


Multilayer Networks

 In the case of the sigmoid unit, the threshold output is a continuous function of its input.

 More precisely, the sigmoid unit computes its output o as

    o = σ(w · x),  where  σ(y) = 1 / (1 + e^(-y))

 σ is often called the sigmoid function or, alternatively, the logistic function.

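As an illustration, a minimal NumPy sketch of a sigmoid unit (the weights and inputs are made-up values, just to show the computation):

import numpy as np

def sigmoid(y):
    """Logistic function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

def sigmoid_unit(x, w, w0):
    """Linear combination of the inputs, then the smooth threshold."""
    return sigmoid(np.dot(w, x) + w0)

x = np.array([1.0, 0.5])
w = np.array([0.8, -0.4])
print(sigmoid_unit(x, w, w0=0.1))  # a value strictly between 0 and 1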


Sigmoid Threshold Unit

[Figure: the sigmoid threshold unit, showing the inputs, their weighted sum, and σ applied to the result.]


Sigmoid Function

 The sigmoid function maps a very large input domain onto a small range of outputs, so it is often referred to as the squashing function of the unit.

 The sigmoid function has the useful property that its derivative is easily expressed in terms of its output:

    σ'(y) = σ(y) (1 - σ(y))

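This identity can be sanity-checked numerically; a small sketch comparing it against a central finite-difference estimate:

import numpy as np

sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

y = 0.7   # arbitrary test point
h = 1e-6  # finite-difference step

analytic = sigmoid(y) * (1.0 - sigmoid(y))
numeric = (sigmoid(y + h) - sigmoid(y - h)) / (2 * h)
print(analytic, numeric)  # the two values agree to many decimal places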


Sigmoid Function

 The sigmoid function's output lies between 0 and 1.



Sigmoid Function

 The term e^(-y) in the sigmoid function definition is sometimes replaced by e^(-k·y),

  where k is some positive constant that determines the steepness of the curve.

 The function tanh is also sometimes used in place of the sigmoid function.



Hyperbolic Tangent Function

 The hyperbolic tangent function tanh is:

    tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

 Its derivative can also be easily expressed in terms of tanh itself:

    tanh'(x) = 1 - tanh^2(x)

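tanh is a rescaled, recentered sigmoid, tanh(x) = 2σ(2x) - 1, which is why the two can be used interchangeably up to an affine transformation. A quick numeric check:

import numpy as np

sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

x = np.linspace(-3, 3, 7)
print(np.tanh(x))
print(2 * sigmoid(2 * x) - 1)  # identical values: tanh(x) = 2*sigmoid(2x) - 1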


[Figure: plot of tanh(x), an S-shaped curve ranging from -1 to 1.]


Sigmoid vs Tangent

Sigmoid:
 Smooth gradient, preventing "jumps" in output values.
 Output values bound between 0 and 1, normalizing the output of each neuron.

Tanh:
 Zero centered, making it easier to model inputs that have strongly negative, neutral, and strongly positive values.
 Its range is between -1 and 1.
[Figure: the sigmoid and tanh curves plotted side by side.]


BACK PROPAGATION ALGORITHM
Multilayer Network Architecture

[Figures: multilayer perceptron architecture and notation.]


The Back Propagation Algorithm

The Back Propagation algorithm has two phases:

 Forward pass: computes the 'functional signal', the feedforward propagation of input pattern signals through the network.

 Backward pass: computes the 'error signal', propagating the error backwards through the network starting at the output units (where the error is the difference between the actual and desired output values).

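To make the two phases concrete, here is a minimal sketch of one stochastic-gradient training step for a network with a single hidden layer of sigmoid units (layer sizes, learning rate, and data are made-up illustrations; bias weights are omitted for brevity):

import numpy as np

sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))  # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(3, 1))  # hidden -> output weights
eta = 0.5                                # learning rate

x = np.array([1.0, 0.0])  # one training example
t = np.array([1.0])       # its target output

# Forward pass: propagate the functional signal through the network.
h = sigmoid(x @ W1)       # hidden activations
o = sigmoid(h @ W2)       # network output

# Backward pass: propagate the error signal back from the output units.
delta_o = (t - o) * o * (1 - o)         # output-unit error term
delta_h = h * (1 - h) * (W2 @ delta_o)  # hidden-unit error terms

# Gradient-descent weight updates.
W2 += eta * np.outer(h, delta_o)
W1 += eta * np.outer(x, delta_h)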


The Back Propagation Algorithm

Conceptually: forward activity, backward error.
The Back Propagation Algorithm

 The back propagation algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections.

 It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for these outputs.

 As we are considering networks with multiple output units, we begin by redefining E to sum the errors over all of the network output units.
The Back Propagation Algorithm

    E(w) = (1/2) Σ_{d∈D} Σ_{k∈outputs} (t_kd - o_kd)²

 where D is the set of training examples, outputs is the set of output units in the network, and t_kd and o_kd are the target and output values associated with the kth output unit and training example d.

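A direct transcription of this definition in NumPy (the arrays are assumed to be laid out as examples × output units, with made-up values):

import numpy as np

def squared_error(targets, outputs):
    """E = 1/2 * sum over examples d and output units k of (t_kd - o_kd)^2."""
    return 0.5 * np.sum((targets - outputs) ** 2)

t = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
o = np.array([[0.8, 0.1, 0.2],
              [0.3, 0.6, 0.1]])
print(squared_error(t, o))  # 0.5 * 0.35 = 0.175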



Reading Material

 Artificial Intelligence: A Modern Approach
  Stuart J. Russell and Peter Norvig
  Chapter 18.

 Machine Learning
  Tom M. Mitchell
  Chapter 4.


