
Radial Basis Function (RBF) Neural Networks

for the Senior Design Project

Radial Basis Functions

• Functions, φ(r), whose output (φ) depends on the distance (r) from some center point
  – Output is large (≈ 1) for input points near the center (i.e., for small r)
  – Output falls off rapidly (→ 0) as input points move away from the center (i.e., as r increases)
• Used to form “clusters” in pattern recognition (assuming clusters exist in the pattern)

Example: Gaussian

  Φ(r) = exp(−r² / (2σ²))

[Figure: plot of the Gaussian φ(r) versus r, peaking at φ = 1 at r = 0 and falling toward 0 by r = ±2]

σ is called the radius of the basis function. (It is a measure of the width of the Gaussian pulse.)
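As a sketch (Python and the function name are my own choices, not from the slides), the Gaussian basis function can be written as:

```python
import numpy as np

def gaussian_rbf(r, sigma):
    """Gaussian RBF: ~1 for r near 0, falls rapidly toward 0 as r grows."""
    return np.exp(-r**2 / (2.0 * sigma**2))

# Reproduce the plotted shape for sigma = 0.5:
r = np.linspace(-2.0, 2.0, 9)
print(np.round(gaussian_rbf(r, sigma=0.5), 3))
```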
RBF Architecture
• RBF Neural Networks are 2-layer, feed-forward networks.
• The 1st layer (hidden) is not a traditional neural network layer.
• The function of the 1st layer is to transform a non-linearly separable set of input vectors to a linearly separable set.
• The 2nd layer is then a simple feed-forward layer (e.g., of Perceptron or ADALINE type neurons) that draws the hyperplane to separate the classes.

1st-Layer Functionality
• Example: X-OR Problem

  p1  p2  X-OR
  0   0   0
  0   1   1
  1   0   1
  1   1   0

[Figure: the four input points in the (p1, p2) plane feed Layer 1 (2 neurons), whose outputs φ1(r1) and φ2(r2) are plotted in the (φ1, φ2) plane]
1st Layer Architecture
Example: 2 Inputs, 2 Neurons

[Figure: input p = (p1, p2) feeds two 1st-layer neurons; neuron 1 computes r1 = distance from p to C1 and outputs Φ1(r1) = exp(−r1² / (2σ1²)); neuron 2 computes r2 = distance from p to C2 and outputs Φ2(r2) = exp(−r2² / (2σ2²)); the outputs φ1 and φ2 feed a Perceptron or ADALINE network (the 2nd layer)]

For each neuron in the 1st layer:
1. Compute the distance of the input “point” (p) from the center of the cluster (C) representing that neuron.
2. Find the radial basis function φ as a function of the distance r between the input and the cluster center (see the sketch below).
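A minimal sketch of this 1st-layer computation (the two-center setup matches the X-OR example on the next slide; the function and variable names are my own):

```python
import numpy as np

def rbf_layer(p, centers, sigmas):
    """For each center C_i: r_i = ||p - C_i||, phi_i = exp(-r_i^2 / (2 sigma_i^2))."""
    r = np.linalg.norm(np.asarray(p, dtype=float) - centers, axis=1)
    return np.exp(-r**2 / (2.0 * np.asarray(sigmas)**2))

centers = np.array([[0.0, 0.0], [1.0, 1.0]])            # C1, C2
sigmas = np.array([np.sqrt(0.5), np.sqrt(0.5)])         # 2*sigma^2 = 1
print(np.round(rbf_layer([0, 1], centers, sigmas), 3))  # -> [0.368 0.368]
```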
X-OR Example (Mechanics Only)
• Assume 2 neurons in Layer 1, with centers: C1 = (0, 0), C2 = (1, 1)
• Find r1 for each point p = (p1, p2) (distance to C1)
• Find r2 for each point p = (p1, p2) (distance to C2)
• Set σ: 2σ² = 1 ⇒ σ² = ½ (in this example φ1 = φ2 = φ)
• Calculate φ(r1) and φ(r2) using φ(r) = exp(−r²), as verified in the sketch below

[Figure: the four input points in the (p1, p2) plane, with the centers C1 = (0, 0) and C2 = (1, 1) marked]

  p1  p2  r1²  r2²  φ(r1)  φ(r2)
  0   0   0    2    1      .135
  0   1   1    1    .368   .368
  1   0   1    1    .368   .368
  1   1   2    0    .135   1
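The table can be reproduced with a few lines of Python (a sketch, using the φ(r) = exp(−r²) simplification above):

```python
import numpy as np

centers = np.array([[0.0, 0.0], [1.0, 1.0]])         # C1, C2
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # the four X-OR inputs

for p in points:
    r2 = np.sum((p - centers)**2, axis=1)            # r1^2, r2^2
    print(p, r2, np.round(np.exp(-r2), 3))           # phi(r1), phi(r2)
```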
X-OR Example, Continued
• Result:

  p1  p2  r1²  r2²  φ(r1)  φ(r2)
  0   0   0    2    1      .135
  0   1   1    1    .368   .368
  1   0   1    1    .368   .368
  1   1   2    0    .135   1

• Plot the resulting points (φ1, φ2): they are linearly separable. For example, the line φ1 + φ2 = 0.9 separates the classes: the two X-OR = 0 points give φ1 + φ2 ≈ 1.135, while the two X-OR = 1 points give φ1 + φ2 ≈ 0.736.

[Figure: the transformed points in the (φ1, φ2) plane; the X-OR = 0 points lie at (1, .135) and (.135, 1), and both X-OR = 1 points map to (.368, .368)]
Problems with X-OR Example
• The number of neurons in the hidden layer, the centers of the neurons, and the radius (σ) of the RBFs were assumed known
• In most pattern recognition problems, the centers for the neurons must be learned*
• Training of the RBF (i.e., finding the centers of the clusters) can be done via
  – the k-means Clustering Algorithm; or
  – the Kohonen Algorithm
  (both types of unsupervised learning)
• We will use k-means Clustering, where the number of clusters, k, is set in advance

* Since, in traditional NNs, the weights are the parameters that must be learned, the centers are sometimes called the weights of RBF neurons.
k-Means Clustering Algorithm
1. Randomly choose k points from the input training data to be the initial cluster centers.
2. For each training data point, find the distance to each of the k cluster centers.
3. Choose the closest center for each training data point, and assign the corresponding cluster number from 1 to k to the data point.
4. For each cluster i (i = 1, …, k), find the average of the training data points assigned to that cluster. Make that average the new center for the cluster.
5. Repeat steps 2–4 until the clusters have converged (i.e., until none of the training data points change clusters). A sketch follows below.
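A compact NumPy sketch of these five steps (the function name and random seeding are my own choices, not code from the slides):

```python
import numpy as np

def k_means(data, k, rng=np.random.default_rng(0)):
    """k-means clustering: iterate assign/average until no point changes cluster."""
    data = np.asarray(data, dtype=float)
    centers = data[rng.choice(len(data), size=k, replace=False)]  # step 1
    labels = np.full(len(data), -1)
    while True:
        # steps 2-3: distance of every point to every center, pick the closest
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):                    # step 5: converged
            return centers, labels
        labels = new_labels
        for i in range(k):                                        # step 4: new averages
            if np.any(labels == i):
                centers[i] = data[labels == i].mean(axis=0)

# Usage on the practice problem that follows (7 points, k = 5):
data = [[0, 5], [0, 6], [3, 3], [5, 0], [0, 0], [2, 2], [5, 5]]
centers, labels = k_means(data, k=5)
```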
Finding the Radius (σ) for the RBFs
• Usually found with the P-nearest-neighbor algorithm (often with P = 2)
• P-nearest-neighbor algorithm (a sketch follows below):
  1. For each cluster center, find the P nearest cluster centers.
  2. For each neuron/cluster, set σ = the RMS distance between the cluster center and its P nearest cluster centers.
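A minimal sketch of this procedure in Python (Euclidean distance assumed; rbf_radii is an illustrative name):

```python
import numpy as np

def rbf_radii(centers, P=2):
    """sigma_i = RMS distance from center i to its P nearest other centers."""
    centers = np.asarray(centers, dtype=float)
    d2 = np.sum((centers[:, None, :] - centers[None, :, :])**2, axis=2)
    np.fill_diagonal(d2, np.inf)            # a center is not its own neighbor
    nearest2 = np.sort(d2, axis=1)[:, :P]   # P smallest squared distances
    return np.sqrt(nearest2.mean(axis=1))   # RMS distance per center
```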
Practice Problem: k-Means Clustering, k = 5

[Figure: the 7 vectors plotted in the (p1, p2) plane, axes p1 and p2 from 0 to 6]

7 Vectors:

  Vector  p1 coord  p2 coord
  p1      0         5
  p2      0         6
  p3      3         3
  p4      5         0
  p5      0         0
  p6      2         2
  p7      5         5

Step 1: k = 5 randomly selected cluster centers: C1 = p1, …, C5 = p5
k-Means Clustering, Steps 2–4
(2 Iterations)

1st Iteration:

  i  Ci      closest training points  new average (center)
  1  (0, 5)  p1                       (0, 5)
  2  (0, 6)  p2                       (0, 6)
  3  (3, 3)  p3, p6, p7               (10/3, 10/3)
  4  (5, 0)  p4                       (5, 0)
  5  (0, 0)  p5                       (0, 0)

2nd Iteration: with C3 = (3.33, 3.33), every point is assigned to the same cluster as before (p3, p6, and p7 remain closest to C3), so no center moves and the algorithm is done.
Finding the Radius (σ) for Each Neuron/Cluster (P = 2)

  squared distances to each center:
  i  Ci(x)  Ci(y)  C1     C2     C3     C4     C5     nearest 2  MS-dst (σ²)  RMS-dst (σ)  2σ²
  1  0      5      0      1      13.89  50     25     C2, C3     7.44         2.73         14.89
  2  0      6      1      0      18.22  61     36     C1, C3     9.61         3.10         19.22
  3  3.33   3.33   13.89  18.22  0      13.89  22.22  C1, C4     13.89        3.73         27.78
  4  5      0      50     61     13.89  0      25     C3, C5     19.44        4.41         38.89
  5  0      0      25     36     22.22  25     0      C1, C3     23.61        4.86         47.22
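These values can be double-checked with a short script (a sketch; the centers come from the clustering result above):

```python
import numpy as np

centers = np.array([[0, 5], [0, 6], [10/3, 10/3], [5, 0], [0, 0]])
d2 = np.sum((centers[:, None] - centers[None, :])**2, axis=2)
np.fill_diagonal(d2, np.inf)
ms = np.sort(d2, axis=1)[:, :2].mean(axis=1)  # mean of 2 nearest squared distances
print(np.round(np.sqrt(ms), 2))               # sigma     -> [2.73 3.1  3.73 4.41 4.86]
print(np.round(2 * ms, 1))                    # 2*sigma^2 -> [14.9 19.2 27.8 38.9 47.2]
```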
1st Layer of RBF Network
for the Practice Problem
[Figure: the input p feeds 5 RBF neurons in parallel; neuron i computes ri = d(p, Ci) and outputs φi, and the five outputs feed a Perceptron or ADALINE network]

  Φ1(r1) = exp(−r1² / 14.9),  C1 = (0, 5)
  Φ2(r2) = exp(−r2² / 19.2),  C2 = (0, 6)
  Φ3(r3) = exp(−r3² / 27.8),  C3 = (10/3, 10/3)
  Φ4(r4) = exp(−r4² / 38.9),  C4 = (5, 0)
  Φ5(r5) = exp(−r5² / 47.2),  C5 = (0, 0)
Outputs from Layer 1 = Inputs for Layer 2

The 5-dimensional output vectors from Layer 1 are the input vectors for Layer 2 (see the sketch below):

  i  pi(x)  pi(y)  r1²   r2²   r3²    r4²  r5²  φ1    φ2    φ3    φ4    φ5
  1  0      5      0     1     13.89  50   25   1     0.95  0.61  0.28  0.59
  2  0      6      1     0     18.22  61   36   0.94  1     0.52  0.21  0.47
  3  3      3      13    18    0.22   13   18   0.42  0.39  0.99  0.72  0.68
  4  5      0      50    61    13.89  0    25   0.03  0.04  0.61  1     0.59
  5  0      0      25    36    22.22  25   0    0.19  0.15  0.45  0.53  1
  6  2      2      13    20    3.56   13   8    0.42  0.35  0.88  0.72  0.84
  7  5      5      25    26    5.56   25   50   0.19  0.26  0.82  0.53  0.35
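A short script reproduces this table (a sketch; the 2σ² values come from the previous slide):

```python
import numpy as np

points = np.array([[0, 5], [0, 6], [3, 3], [5, 0], [0, 0], [2, 2], [5, 5]], dtype=float)
centers = np.array([[0, 5], [0, 6], [10/3, 10/3], [5, 0], [0, 0]])
two_sigma2 = np.array([14.89, 19.22, 27.78, 38.89, 47.22])

r2 = np.sum((points[:, None] - centers[None, :])**2, axis=2)  # 7x5 squared distances
phi = np.exp(-r2 / two_sigma2)                                # 7x5 Layer-1 outputs
print(np.round(phi, 2))
```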
Layer 2: ADALINE or Perceptron
• Perceptron
  1. Initialize W, b with 0’s
  2. Apply first input vector: output a = hardlim(Wp + b)
  3. Update: Wnew = Wold + (t − a)pᵀ, bnew = bold + (t − a)
  4. Repeat steps 2 and 3 for the other input vectors, until the weights and biases converge
  5. Check: a = hardlim(Wp + b) for all inputs

• ADALINE (ADAptive Linear Neuron)
  1. Initialize W, b with 0’s
  2. Apply first input vector: output a = Wp + b
  3. Update: Wnew = Wold + (t − a)pᵀ, bnew = bold + (t − a)
  4. Repeat steps 2 and 3 for the other input vectors, until the weights and biases converge
  5. Find outputs: a = hardlims(Wp + b) for all inputs

A sketch of the Perceptron version follows below.
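A minimal Python sketch of the Perceptron procedure above, applied to the transformed X-OR data from slide 7 (function and variable names are my own):

```python
import numpy as np

def hardlim(n):
    return (n >= 0).astype(float)

def train_perceptron(P, t, epochs=50):
    """Perceptron rule from the slide: W <- W + (t - a) p^T, b <- b + (t - a)."""
    W = np.zeros(P.shape[1])
    b = 0.0
    for _ in range(epochs):               # repeat until converged
        changed = False
        for p, target in zip(P, t):
            a = hardlim(W @ p + b)        # step 2
            if a != target:               # step 3
                W += (target - a) * p
                b += (target - a)
                changed = True
        if not changed:
            break
    return W, b

# Usage on the (phi1, phi2) points from the X-OR example:
P = np.array([[1, .135], [.368, .368], [.368, .368], [.135, 1]])
t = np.array([0, 1, 1, 0])
W, b = train_perceptron(P, t)
print(W, b, hardlim(P @ W + b))           # step 5 check: outputs match t
```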
Layer 2: Perceptron or ADALINE
• Perceptron: a single multiple-input neuron. The input p is 5 × 1 (the five Layer-1 outputs), W is 1 × 5, b and a are 1 × 1, and

  a = hardlim(Wp + b)

• ADALINE: the same single-neuron structure, with transfer function f = purelin during training (a = Wp + b) and f = hardlims for classification.
