
Radial Basis Function (RBF) Networks

1. They are two-layer feed-forward networks.


2. The hidden nodes implement a set of radial basis
functions (e.g. Gaussian functions).
3. The output nodes implement linear summation
functions as in an MLP.
4. The network training is divided into two stages:
first the weights from the input to hidden layer are
determined, and then the weights from the hidden
to output layer.
5. The training/learning is very fast.
6. The networks are very good at interpolation.
There is considerable evidence that neurons in the visual
cortex are tuned to local regions of the retina. They are
maximally sensitive to some specific stimulus, and their
output falls off as the presented stimulus moves away
from this "best" stimulus.
Gaussian basis functions
• Each RBF node in the hidden layer responds to input
only in some subspace of the input space. When the input
is far away from the unit's own centre µ (many radii,
i.e. standard deviations σ, away), the output of that
unit will be so small as to be negligible
• Each RBF has a receptive field, that is, an area of the
input space to which it responds
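A minimal MATLAB sketch of a single Gaussian unit (the function name and arguments are illustrative):

% One Gaussian RBF hidden unit: the response peaks at the
% centre mu and decays with distance, becoming negligible
% many standard deviations away.
function a = rbf_unit(x, mu, sigma)
    a = exp(-norm(x - mu)^2 / (2*sigma^2));
end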
BIOLOGICAL PLAUSIBILITY:
• RBF networks are more biologically plausible, since
many sensory neurons respond only to some small
subspace of the input space, and are silent in response
to all other inputs.
Implementing XOR

  0 .5
2
When mapped into the feature space (z1, z2) , the two
classes become linearly separable.
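For example, the feature-space coordinates of the four inputs can be computed in MATLAB as follows (centres as above; with σ² = 0.5 the exponent simplifies to −‖x − µ‖²):

% Map the four XOR inputs into (z1, z2) feature space
X = [0 0; 0 1; 1 0; 1 1];      % the four input patterns
mu1 = [0 0]; mu2 = [1 1];      % Gaussian centres (as above)
for p = 1:4
    z1 = exp(-norm(X(p,:) - mu1)^2);
    z2 = exp(-norm(X(p,:) - mu2)^2);
    fprintf('x = (%d,%d) -> z = (%.4f, %.4f)\n', X(p,1), X(p,2), z1, z2);
end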
Training RBF nets
Typically, the weights of the two layers are determined
separately: first find the RBF (hidden-layer) parameters,
and then find the output-layer weights
Hidden layer
– estimate the parameters of each hidden unit k (whose output depends
on the distance between the input and a stored prototype),
e.g. for a Gaussian activation function, estimate the parameters µk, σk²
– This stage involves an Unsupervised training process (no targets available)
Output layer
– set the weights (including bias weights)
– the same as training a single-layer perceptron: each unit's output depends on a
weighted sum of its inputs
– using, for example, the Delta Rule (DR), as in the sketch below
– This stage involves a Supervised training process
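A rough MATLAB sketch of the supervised stage (all values here are toy/hypothetical):

% Delta-Rule (LMS) training of the linear output layer
z = [1 .2 1; .5 .5 1; .2 1 1; .5 .5 1];   % hidden activations + bias column
t = [1; 0; 1; 0];                         % targets
w = zeros(3,1); eta = 0.1;                % initial weights, learning rate
for epoch = 1:100
    for p = 1:size(z,1)
        yhat = z(p,:) * w;                        % current linear output
        w = w + eta*(t(p) - yhat)*z(p,:)';        % Delta Rule update
    end
end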
Clustering
K-Means Approach
1. Select k multidimensional points to be the
"seeds" or initial centroids for the k clusters to be
formed. Seeds are usually selected at random.
2. Assign each observation to the cluster with the
nearest seed.
3. Update the cluster centroids once all observations
have been assigned.
4. Repeat steps 2 and 3 until the changes in the cluster
centroids are small.
5. Repeat steps 1-4 with new starting seeds; do this
3 to 5 times.
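A minimal MATLAB sketch of steps 1-4 (the function name and iteration cap are illustrative; assumes data rows in X):

% Minimal k-means: X is n-by-d data, k the number of clusters
function [mu, idx] = kmeans_sketch(X, k)
    n = size(X, 1);
    mu = X(randperm(n, k), :);                % step 1: random seeds
    for iter = 1:100
        D = zeros(n, k);
        for j = 1:k                           % squared distance to each centroid
            D(:, j) = sum((X - mu(j,:)).^2, 2);
        end
        [~, idx] = min(D, [], 2);             % step 2: assign to nearest centroid
        oldmu = mu;
        for j = 1:k                           % step 3: recompute centroids
            if any(idx == j)
                mu(j, :) = mean(X(idx == j, :), 1);
            end
        end
        if max(abs(mu(:) - oldmu(:))) < 1e-9  % step 4: stop when stable
            break
        end
    end
end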
K-Means Illustration – two dimensions
Fine Tuning
Computing the Output Weights

We want a weight matrix W such that

Target T = WX

Thus W = TX⁻¹
If the inverse exists, then the error can be minimized
If no inverse exists, then use the pseudo-inverse to
get the minimum error ('minimum-norm solution to a
linear system')
The pseudo-inverse solution is
W = TX⁺
where X⁺ = (XᵀX)⁻¹Xᵀ
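For example, in MATLAB (sizes arbitrary; pinv is the built-in Moore-Penrose pseudo-inverse):

% Solve T = W*X in the least-squares sense via the pseudo-inverse
X = rand(3, 5);          % hidden-layer outputs, one column per pattern
T = rand(1, 5);          % matching targets
W = T * pinv(X);         % W = T*X+ (minimum-norm solution)
err = norm(W*X - T)      % residual error of the fit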
RBF Performance

• An MLP performs a global mapping, meaning all inputs cause an
output, while an RBF network performs a local mapping, meaning only
inputs near a receptive field produce significant activation.
• Width parameter σ: this is often set equal to a multiple of the
average distance between the centres.
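For example (the multiple m and the use of adjacent-centre spacing are assumptions):

c = [-8 -5 -2 0 2 5 8];              % centres (from the example below)
m = 2;                               % assumed multiple
sigma = m * mean(diff(sort(c)))      % width from average centre spacing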
Function Approximation Example:
(function approximation for differently chosen
width parameters)
Target function: y = (1/2)x³ − (1/5)x² + 3x − 20
Type of Activation Function: Gaussian
Input Range: x = [-10:10]
Centers = [-8 -5 -2 0 2 5 8]
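One possible MATLAB sketch of this experiment (whether the width w plays the role of σ in exp(−(x−c)²/(2σ²)) is an assumption here; setting w to 0.2 or 200 reproduces Cases 2 and 3 below):

% RBF function approximation with fixed centres and pinv-trained weights
x = (-10:0.5:10)';                       % training inputs
y = x.^3/2 - x.^2/5 + 3*x - 20;          % target function (as above)
c = [-8 -5 -2 0 2 5 8];                  % centres
w = 6;                                   % width parameter (Case 1)
G = [exp(-(x - c).^2/(2*w^2)) ones(size(x))];   % design matrix + bias column
wts = pinv(G)*y;                         % output weights (least squares)
xt = (-15:0.1:15)';                      % wider test range
Gt = [exp(-(xt - c).^2/(2*w^2)) ones(size(xt))];
plot(x, y, 'b', xt, Gt*wts, 'r--')       % target vs network output
legend('target', 'RBF output')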
[Figure: 'Function to be Approximated' – the target function plotted for Input in [-10, 10], Output on the vertical axis]
Case 1: The width is chosen to equal 6. This way, the
receptive fields overlap, but no single neuron's function
covers the entire input space. For proper overlap,
the width parameter needs to be at least equal to the
distance between adjacent centres.
[Figure: 'Testing the RBF Network, w=6' – network output vs Input over [-15, 15]]
Case 2: The width is chosen to equal 0.2 (too
small). This causes poor generalization even
inside the training range.

[Figure: 'Testing the RBF Network, w=0.2' – network output vs Input over [-15, 15]]
Case 3: The width is chosen to be 200 (too large). This
causes each radial basis function to cover the entire
input space, so every neuron is activated for every input
value. Thus, the network cannot properly learn the
desired mapping.
[Figure: 'Testing the RBF Network, w=200' – network output vs Input over [-15, 15]]
XOR Problem
The relationship between the input and the
output of the network can be given by the
interpolation conditions

Σᵢ wᵢ exp(−‖xⱼ − µᵢ‖²) + b = dⱼ,  j = 1, …, 4

where xⱼ is an input vector and dⱼ is the associated value of the desired
output.
% Design matrix: rows = the four inputs (0,0), (0,1), (1,1), (1,0);
% columns = exp(-||x - mu||^2) for centres mu1 = (0,0), mu2 = (1,1),
% plus a bias column (note .1353 = e^-2, .3678 = e^-1)
g=[1 .1353 1;.3678 .3678 1;.1353 1 1;.3678 .3678 1];
d=[1 0 1 0]';   % desired outputs for the four inputs
w=pinv(g)*d     % least-squares weights via the pseudo-inverse
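The trained weights can then be applied to any input by recomputing the basis-function activations, e.g.:

% Evaluate the trained net at an arbitrary input
mu1 = [0 0]; mu2 = [1 1];                % centres used to build g
x = [0 1];                               % test input
z = [exp(-norm(x - mu1)^2), exp(-norm(x - mu2)^2), 1];  % activations + bias
y = z * w                                % network output (~0 for this input)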
Comparison of RBF and MLP networks
• An RBF network will usually have only one hidden layer, but an MLP
will usually have more than one.
• The hidden and output neurons of an MLP usually share the same
neuronal model, but this is not true of RBF networks.
• The hidden layer of an RBF network is nonlinear and the output
layer is linear. In an MLP both layers are nonlinear.
• The argument of the activation function in an RBF hidden neuron
is the Euclidean distance between the input vector and the centre of
the unit. In an MLP the activation function takes the dot product
(inner product) of the input vector and the synaptic weight vector.
• MLPs construct global approximations to a nonlinear input-output
mapping, but RBF networks produce local approximations (when
using an exponentially decaying function such as a Gaussian).
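This contrast in the hidden-unit argument can be made concrete with a small sketch (all values hypothetical):

% MLP hidden unit: nonlinearity applied to an inner product
x = [0.5; -1]; wvec = [1; 2]; b = 0.1;
a_mlp = tanh(wvec' * x + b)               % dot-product argument

% RBF hidden unit: nonlinearity applied to a Euclidean distance
mu = [0; 0]; sigma = 1;
a_rbf = exp(-norm(x - mu)^2/(2*sigma^2))  % distance argument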
