Typical Applications of NN
Pattern Classification
$\mathbf{x} \in X \subseteq \mathbb{R}^m$
$l = f(\mathbf{x}), \quad l \in C = \{1, 2, \dots, N\}$
Function Approximation
$\mathbf{x} \in X \subseteq \mathbb{R}^n$
$y = f(\mathbf{x}), \quad y \in Y \subseteq \mathbb{R}^m$
Time-Series Forecasting
$x(t) = f(x_{t-1}, x_{t-2}, x_{t-3}, \dots)$
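As an illustration of the time-series case, a minimal sketch (toy data; window length 3 to match the equation above) of recasting forecasting as supervised function approximation:

```python
import numpy as np

def make_windows(series, lag=3):
    """Recast a 1-D series as supervised pairs:
    input (x_{t-1}, x_{t-2}, x_{t-3}), target x_t."""
    X, y = [], []
    for t in range(lag, len(series)):
        X.append(series[t - lag:t][::-1])  # most recent value first
        y.append(series[t])
    return np.array(X), np.array(y)

series = np.sin(0.3 * np.arange(50))       # toy series (assumption)
X, y = make_windows(series, lag=3)
print(X.shape, y.shape)                    # (47, 3) (47,)
```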
Function Approximation
Unknown function $f: X \to Y$, with $X \subseteq \mathbb{R}^m$, $Y \subseteq \mathbb{R}^n$
Approximator $\hat{f}: X \to Y$, with $X \subseteq \mathbb{R}^m$, $Y \subseteq \mathbb{R}^n$
Supervised Learning
[Diagram: supervised learning. For each input $\mathbf{x}_i$, $i = 1, 2, \dots$, the unknown function produces the target $y_i$ and the neural network produces the estimate $\hat{y}_i$; the error $e_i = y_i - \hat{y}_i$ drives the weight updates.]
Neural Networks as
Universal Approximators
Feedforward neural networks with a single hidden layer of
sigmoidal units are capable of approximating uniformly any
continuous multivariate function, to any desired degree of
accuracy.
– Hornik, K., Stinchcombe, M., and White, H. (1989). "Multilayer
Feedforward Networks are Universal Approximators," Neural Networks,
2(5), 359-366.
Like feedforward neural networks with a single hidden layer of sigmoidal units, RBF networks can be shown to be universal approximators.
– Park, J. and Sandberg, I. W. (1991). "Universal Approximation Using Radial-Basis-Function Networks," Neural Computation, 3(2), 246-257.
– Park, J. and Sandberg, I. W. (1993). "Approximation and Radial-Basis-Function Networks," Neural Computation, 5(2), 305-316.
The Model of the Function Approximator
Linear Models
$f(\mathbf{x}) = \sum_{i=1}^{m} w_i f_i(\mathbf{x})$
where the $w_i$ are the weights and the $f_i$ are fixed basis functions.
Linear Models
[Diagram: linear model as a network. The output unit forms the linearly weighted output $y$ with weights $w_1, w_2, \dots, w_m$; the hidden units $f_1, f_2, \dots, f_m$ perform decomposition, feature extraction, and transformation of the input.]
Example: polynomial expansion
$f(x) = \sum_i w_i x^i, \quad f_i(x) = x^i, \quad i = 0, 1, 2, \dots$
Example: Fourier series
$f(x) = \sum_k w_k \exp(j 2\pi k \omega_0 x)$
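Both examples are linear models: the basis is fixed and only the weights are free. A minimal sketch with the polynomial basis (weights chosen arbitrarily for illustration):

```python
import numpy as np

# Fixed polynomial basis f_i(x) = x**i, i = 0, 1, 2, 3
basis = [lambda x, i=i: x ** i for i in range(4)]
w = np.array([1.0, -0.5, 0.25, 0.1])   # example weights (assumption)

def f(x):
    """Linear model: f(x) = sum_i w_i * f_i(x)."""
    return sum(wi * fi(x) for wi, fi in zip(w, basis))

print(f(2.0))   # 1 - 1 + 1 + 0.8 = 1.8
```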
[Diagram: single-hidden-layer network with input $\mathbf{x} = (x_1, x_2, \dots, x_n)$, hidden units $f_1, f_2, \dots, f_m$, and output weights $w_1, w_2, \dots, w_m$.]
With a sufficient number of sigmoidal hidden units, such a network can be a universal approximator:
$f(\mathbf{x}) = \sum_{i=1}^{m} w_i f_i(\mathbf{x})$
Radial Basis Function Networks as
Universal Approximators
[Diagram: RBF network with input $\mathbf{x} = (x_1, x_2, \dots, x_n)$ computing $f(\mathbf{x}) = \sum_{i=1}^{m} w_i f_i(\mathbf{x})$.]
Non-Linear Models
$f(\mathbf{x}) = \sum_{i=1}^{m} w_i f_i(\mathbf{x})$
with weights $w_i$ and basis functions $f_i$ that are both adjusted by the learning process.
The Radial Basis
Function Networks
Radial Basis Function Networks
For function approximation: the output units perform interpolation, and the hidden units perform a projection of the input.
For pattern classification: the output units represent classes, and the hidden units represent subclasses.
Network Parameters
$w_{jk}$: the weights joining the hidden and output layers. These are the weights used in forming the linear combination of the radial basis functions; they determine the relative amplitudes of the RBFs when they are combined to form the overall function.
$\|\mathbf{x} - \mathbf{u}_j\|$: the Euclidean distance between the input $\mathbf{x}$ and the prototype vector $\mathbf{u}_j$. The activation of the hidden unit is determined by this distance through the radial function $\phi$.
Typical Radial Functions
Gaussian:
$\phi(r) = e^{-r^2 / 2\sigma^2}, \quad \sigma > 0, \; r \in \mathbb{R}$
Hardy multiquadratic:
$\phi(r) = \sqrt{r^2 + c^2}, \quad c > 0, \; r \in \mathbb{R}$
Inverse multiquadratic:
$\phi(r) = \dfrac{c}{\sqrt{r^2 + c^2}}, \quad c > 0, \; r \in \mathbb{R}$
[Plot: the Gaussian $\phi(r) = e^{-r^2/2\sigma^2}$ over $r \in [-10, 10]$.]
[Plot: the inverse multiquadratic $\phi(r) = c/\sqrt{r^2 + c^2}$ over $r \in [-10, 10]$ for $c = 1, 2, 3, 4, 5$; $\phi(0) = 1$ for every $c$, and larger $c$ gives a wider function.]
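A minimal sketch of the three radial profiles as code, written directly from the definitions above (parameter values illustrative):

```python
import numpy as np

def gaussian(r, sigma=1.0):
    """phi(r) = exp(-r^2 / (2 sigma^2)), sigma > 0"""
    return np.exp(-r**2 / (2 * sigma**2))

def multiquadratic(r, c=1.0):
    """Hardy multiquadratic: phi(r) = sqrt(r^2 + c^2), c > 0"""
    return np.sqrt(r**2 + c**2)

def inverse_multiquadratic(r, c=1.0):
    """phi(r) = c / sqrt(r^2 + c^2), c > 0; phi(0) = 1 for any c"""
    return c / np.sqrt(r**2 + c**2)

r = np.linspace(-10, 10, 201)
for c in (1, 2, 3, 4, 5):      # reproduces the legend in the plot above
    values = inverse_multiquadratic(r, c)
```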
RBFNs for
Function Approximation
The idea
[Plot: an unknown function to approximate, $y$ versus $x$, with scattered training data points.]
The idea
$y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i f_i(\mathbf{x})$
[Plot: the learned function, built as a weighted sum of basis functions (kernels), overlaid on the training data.]
The idea
[Plot: the learned function $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i f_i(\mathbf{x})$ evaluated at a non-training sample; the basis-function (kernel) expansion interpolates between the training points.]
Radial Basis Function Networks as
Universal Approximators
[Diagram: RBF network with input $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and output weights $w_1, w_2, \dots, w_m$.]
$y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i f_i(\mathbf{x})$
Training set: $T = \left\{ \left( \mathbf{x}^{(k)}, y^{(k)} \right) \right\}_{k=1}^{p}$
$\min E = \frac{1}{2} \sum_{k=1}^{p} \left( y^{(k)} - f\!\left(\mathbf{x}^{(k)}\right) \right)^2 = \frac{1}{2} \sum_{k=1}^{p} \left( y^{(k)} - \sum_{i=1}^{m} w_i f_i\!\left(\mathbf{x}^{(k)}\right) \right)^2$
Learn the Optimal Weight Vector
Given the training set $T = \left\{ \left( \mathbf{x}^{(k)}, y^{(k)} \right) \right\}_{k=1}^{p}$ and the model $y = f(\mathbf{x}) = \sum_{i=1}^{m} w_i f_i(\mathbf{x})$, find the weights $w_1, w_2, \dots, w_m$ that minimize
$E = \frac{1}{2} \sum_{k=1}^{p} \left( y^{(k)} - \sum_{i=1}^{m} w_i f_i\!\left(\mathbf{x}^{(k)}\right) \right)^2$
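Since the model is linear in the weights $w_i$, this minimization is an ordinary linear least-squares problem. A minimal sketch, assuming Gaussian kernels with fixed, hand-picked centers and spread (all names and values illustrative):

```python
import numpy as np

def design_matrix(X, centers, sigma):
    """Phi[k, i] = f_i(x^(k)) = exp(-||x^(k) - u_i||^2 / (2 sigma^2))"""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma**2))

# Toy training set T = {(x^(k), y^(k))}, k = 1..p (assumption)
X = np.random.rand(50, 1)                   # p = 50 inputs in R^1
y = np.sin(2 * np.pi * X[:, 0])             # targets
centers = np.linspace(0, 1, 10)[:, None]    # m = 10 fixed centers (assumed)
Phi = design_matrix(X, centers, sigma=0.1)

# Weights minimizing E = (1/2) sum_k (y^(k) - sum_i w_i f_i(x^(k)))^2
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```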
Learning the Kernels
How to Train?
Exact RBF
The first-layer weights $\mathbf{u}$ are set to the training data: $U = X^T$. That is, the Gaussians are centered at the training data instances.
The spread is chosen as $\sigma = \dfrac{d_{\max}}{\sqrt{2N}}$, where $d_{\max}$ is the maximum Euclidean distance between any two centers, and $N$ is the number of training data points. Note that $H = N$ for this case.
The output of the $k$th output neuron is then
Single output: $y = \sum_{j=1}^{N} w_j \, \phi\!\left( \left\| \mathbf{x} - \mathbf{u}_j \right\| \right)$
Multiple outputs: $y_k = \sum_{j=1}^{N} w_{kj} \, \phi\!\left( \left\| \mathbf{x} - \mathbf{u}_j \right\| \right)$
If $\{\mathbf{x}_i\}_{i=1}^{N}$ is a distinct set of points in $d$-dimensional space, then the $N \times N$ interpolation matrix with elements $\phi_{ij} = \phi\!\left( \left\| \mathbf{x}_i - \mathbf{x}_j \right\| \right)$ is nonsingular, and hence can be inverted!
Note that the theorem is valid regardless of the value of $N$, the choice of the RBF (as long as it is an RBF), or what the data points may be, as long as they are distinct!
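Putting the pieces together, a minimal sketch of the exact-RBF construction just described: centers at the training points, the $d_{\max}$ spread heuristic, and weights from inverting the interpolation matrix (toy data assumed):

```python
import numpy as np

def exact_rbf_fit(X, y):
    """Exact interpolation: one Gaussian centered at each training point."""
    N = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    d_max = np.sqrt(d2.max())
    sigma = d_max / np.sqrt(2 * N)          # spread heuristic from the text
    Phi = np.exp(-d2 / (2 * sigma**2))      # N x N interpolation matrix
    w = np.linalg.solve(Phi, y)             # nonsingular for distinct points
    return w, sigma

def exact_rbf_predict(Xq, X, w, sigma):
    d2 = ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma**2)) @ w

X = np.random.rand(20, 2)                   # 20 distinct points in R^2 (toy)
y = np.sin(X[:, 0]) + X[:, 1]
w, sigma = exact_rbf_fit(X, y)
print(np.allclose(exact_rbf_predict(X, X, w, sigma), y))   # True: exact fit
```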
Approach 1 (Cont.)
Gaussian RBFs are localized functions, unlike the sigmoids used by MLPs.
[Figure: the same data approximated using Gaussian radial basis functions versus using sigmoidal radial basis functions.]
Exact RBF Properties
Using localized functions typically makes RBF networks more suitable for function approximation problems.
Since the first-layer weights are set to the input patterns, the second-layer weights are obtained by solving linear equations, and the spread is computed from the data, no iterative training is involved!
Guaranteed to correctly classify all training data points!
However, since we use as many receptive fields as there are data points, the solution overfits if the underlying physical process does not have that many degrees of freedom.
The importance of $\sigma$: too small a spread will also cause overfitting, while too large a spread will fail to characterize rapid changes in the signal.
Too many
Receptive Fields?
.
C (x) arg min x(n) t k (n) , k 1,2,..., M tk(n): center of kth RBF at
nth iteration
k
E ( w ) e( n ) e(n) E (w )
e( n ) z ( n) z ( n )e( n )
w( n ) w w w (n)
M
e j d j wk x j t i
i 1
G x j ti
C
x j ti
G’ represents the first derivative
of the function wrt its argument
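A minimal sketch of the hybrid scheme these updates describe, on hypothetical toy data: nearest-center assignment for the $\mathbf{t}_k$ (k-means), then the LMS rule $\mathbf{w}(n+1) = \mathbf{w}(n) + \eta\, e(n)\, \mathbf{z}(n)$ for the output weights; the learning rate $\eta$ and all sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 1))                     # toy inputs
y = np.sin(2 * np.pi * X[:, 0])              # toy targets
M, sigma, eta = 10, 0.1, 0.05                # centers, spread, learning rate

# Self-organized selection of centers: simple k-means iterations
t = X[rng.choice(len(X), M, replace=False)]  # initialize centers from data
for _ in range(20):
    k = np.argmin(((X[:, None, :] - t[None]) ** 2).sum(2), axis=1)  # C(x)
    for j in range(M):
        if np.any(k == j):
            t[j] = X[k == j].mean(axis=0)

# LMS (online gradient) update of the output weights
w = np.zeros(M)
for epoch in range(50):
    for n in rng.permutation(len(X)):
        z = np.exp(-((X[n] - t) ** 2).sum(1) / (2 * sigma**2))  # z(n)
        e = y[n] - w @ z                                        # e(n)
        w += eta * e * z                           # w(n+1) = w(n) + eta e(n) z(n)
```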