
A two-pass hybrid training algorithm for RBF networks

Ali Ekber ÖZDEMİR 1, İlyas EMİNOĞLU 2

1 Ünye Meslek Yüksek Okulu, Ordu University, TR
2 Electrical & Electronic Engineering Dept., Ondokuz Mayıs University, TR
1 eozdemir@omu.edu.tr, 2 ilyaseminoglu@hotmail.com

Abstract
This paper presents the systematic construction of a linearly weighted Gaussian radial basis function (RBF) neural network. The proposed method is a two-stage hybrid training algorithm. The first stage of the hybrid algorithm is a pre-processing unit which generates a coarsely-tuned RBF network. The second stage is a fine-tuning phase: the coarsely-tuned RBF network is optimized by using a two-pass training algorithm. In the forward pass, the output weights of the RBF network are calculated by the Levenberg-Marquardt (LM) algorithm while the rest of the parameters remain fixed. Similarly, in the backward pass, the free parameters of the basis functions (the center and width of each node) are adjusted by the gradient descent (GD) algorithm while the output weights of the RBF network remain fixed. The effectiveness of the proposed method for an RBF network is demonstrated with simulations.

1. Introduction
The simple structure of the RBF network enables learning in stages and reduces the training time, which has led to the application of such networks to many practical problems. The learning strategies used in the literature for the design of RBF networks differ from each other mainly in the determination of the centers. They can be categorized into the following groups [1].
1. Fixed Centers Assigned Randomly Among Input
Samples: In this method, which is the simplest one, the
centers are chosen randomly from the set of input training
samples.
2. Orthogonalization of Regressors: The most commonly used algorithm is orthogonal least squares (OLS) [2], which selects a suitable set of centers (regressors) from among the input training samples, but this set might not be optimal, as demonstrated in [3].
3. Supervised Selection of Centers: In this method, the centers, together with all other parameters of the RBF network (linear weights, variances), are updated using a back-propagation type of learning.
4. Input Clustering (IC): The locations of the centers are determined by a clustering algorithm applied to the input training sample vectors.
5. Input-Output Clustering (IOC): The IC method in (4) is based on the distribution of the training inputs alone. When the variation of the output within a cluster is high, the centers are instead selected based on both input and output data, or on joint input-output data, as in [1].
6. Evolutionary Algorithms: All RBF parameters are optimized by genetic algorithms according to a defined (single- or multi-objective) cost function, but this approach can be computationally expensive [4].
Several heuristic hybrid learning methods, which apply a clustering algorithm for locating the centers and subsequently a linear least squares method for the linear weights, have previously been suggested with considerable success for many applications; among them are [5], [9], [10], [17] and [19].
The general framework of the proposed hybrid two-stage structure is shown in Fig. 1.

Figure 1: General framework of the proposed two-stage hybrid training algorithm.
Each stage in Fig. 1 has a unique operational target and contributes to model construction in a sequential manner. These two stages, i) the pre-processing unit and ii) the two-pass hybrid training unit, are summarized below.
The first stage (the pre-processing unit) is the coarse-tuning stage. It determines a coarsely-tuned RBF network which has the final structure (in terms of the number of nodes) and a roughly initialized set of free parameters. The pre-processing unit behaves like a structural and parametric initialization unit. The number and the locations of the M initial centers of the RBF network are determined by using an orthogonal least squares (OLS) algorithm. Afterwards, a coarse tuning of all free parameters (centers, widths and weights) is achieved by using the Gustafson-Kessel (GK) clustering procedure.
The partition validation algorithm embedded into the GK clustering algorithm may further reduce the number of centers, since M (found by the OLS algorithm) may not be optimal. The obtained RBF network is passed to the next stage for further processing and tuning. In the literature, the use of this kind of pre-processing unit to construct an initial model is not uncommon. A pre-processing unit was first proposed by Linkens & Chen [5] to construct a normalized fuzzy system for model construction. There, a modified counter propagation network (CPN) is exploited as a pre-processor to extract a number of clusters, which can be viewed as an initial fuzzy model, from the raw data [5], [6]. The fine-tuning step is then achieved by using a back-propagation type of learning.
The pre-processing unit (OLS+GK) adopted to construct the initial RBF model in this paper is one of the four methods proposed in [7] and [8].
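To make the center-selection step of the pre-processing unit concrete, the sketch below shows a forward OLS selection in the spirit of [2]: candidate regressors (here, Gaussian nodes placed on the training inputs) are orthogonalized and greedily picked by their error-reduction ratio until the unexplained output energy falls below a tolerance. This is only an illustration; the names and parameters (gaussian_design_matrix, sigma, rho, max_centers) are assumptions of this example, and it is not the authors' pre-processing code, which additionally applies GK clustering and partition validation.

```python
import numpy as np

def gaussian_design_matrix(X, centers, sigma):
    # Phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ols_select_centers(X, d, sigma=1.0, rho=0.01, max_centers=20):
    """Forward OLS center selection in the style of [2]: greedily pick
    candidate centers whose orthogonalized regressor explains the most
    output energy, stopping when 1 - sum(err) < rho."""
    Phi = gaussian_design_matrix(X, X, sigma)   # every input is a candidate
    L = Phi.shape[0]
    selected, Q = [], []                        # chosen indices, orthogonal basis
    remaining_energy = 1.0
    for _ in range(max_centers):
        best_err, best_k, best_q = 0.0, None, None
        for k in range(L):
            if k in selected:
                continue
            q = Phi[:, k].copy()
            for qj in Q:                        # Gram-Schmidt vs chosen basis
                q -= (qj @ Phi[:, k]) / (qj @ qj) * qj
            if q @ q < 1e-12:
                continue
            err = (q @ d) ** 2 / ((q @ q) * (d @ d))  # error-reduction ratio
            if err > best_err:
                best_err, best_k, best_q = err, k, q
        if best_k is None:
            break
        selected.append(best_k)
        Q.append(best_q)
        remaining_energy -= best_err
        if remaining_energy < rho:              # termination test
            break
    return X[selected]
```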
The second stage (the two-pass hybrid training unit) is the fine-tuning stage and is presented in this paper in detail. The coarsely-tuned RBF network is optimized by using a two-pass training algorithm. In the forward pass of the computation, the output weights of the RBF network are adjusted by the Levenberg-Marquardt (LM) algorithm while the rest of the parameters remain fixed. Similarly, in the backward pass of the computation, the free parameters of the basis functions (the center and width of each node) are adjusted by the gradient descent (GD) algorithm while the output weights remain fixed. The final form of the RBF network is thus constructed efficiently through a two-pass hybrid training algorithm.

2. Two-pass hybrid training unit

As can be seen from Table 1, in the forward pass the output weights of the RBF network are adjusted by the Levenberg-Marquardt (LM) algorithm while the rest of the parameters remain fixed. Initially, the outputs of the hidden units (the node outputs) are treated as the input vector and $e_i = (d_i - y_i)$ is treated as the error vector. The weights in the output layer are then updated by the LM algorithm. In the backward pass, the free parameters of the basis functions (the center and width of each node) are adjusted by the gradient descent (GD) algorithm while the output weights (updated in the last forward pass) remain fixed. The final form of the RBF network is efficiently constructed through this two-pass algorithm. The two-pass algorithm employed in this paper is more efficient than the GD-only method presented in [7] and [8]: it requires a smaller total number of iterations.
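To make the alternation concrete, the following minimal sketch shows one possible shape of the two-pass loop. The helpers lm_update_weights and gd_update_basis correspond to the LM and GD steps of Sections 2.1 and 2.2 and are sketched there; the epoch limit and convergence test are assumptions of this example, not details taken from Table 1.

```python
import numpy as np

def rbf_outputs(X, centers, widths):
    # Gaussian node outputs phi_j(x) = exp(-||x - c_j||^2 / (2 sigma_j^2))
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))

def two_pass_train(X, d, centers, widths, weights, n_epochs=100, tol=1e-6):
    """Alternate LM (output weights) and GD (centers/widths) passes."""
    for epoch in range(n_epochs):
        Phi = rbf_outputs(X, centers, widths)
        # Forward pass: LM update of the linear output weights,
        # with centers and widths held fixed (Section 2.1).
        weights = lm_update_weights(Phi, d, weights)
        # Backward pass: GD update of centers and widths,
        # with the output weights held fixed (Section 2.2).
        centers, widths = gd_update_basis(X, d, centers, widths, weights)
        err = d - rbf_outputs(X, centers, widths) @ weights
        if 0.5 * err @ err < tol:      # cost E of equation (1)
            break
    return centers, widths, weights
```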

Table 1: Two-pass hybrid training procedure for linearly-weighted RBF networks.

2.1. Levenberg-Marquardt (LM) Algorithm
A mathematical description of the LM neural network training algorithm has been presented by Hagan and Menhaj [12]. The LM algorithm is originally an intermediate optimization algorithm between the Gauss-Newton (GN) method and the gradient descent (GD) algorithm, and it addresses the limitations of each of those techniques. By combining the positive attributes of the GN and GD algorithms, the LM algorithm constructs a hybrid optimization technique which is suitable for many real-world applications. A detailed treatment of the LM method can be found in [12], [13], [14] and [15]. The quantities used by the algorithm are:

$e = [e_1 \; e_2 \; \ldots \; e_L]$ : error vector

$w = \begin{bmatrix} a_{11} & \cdots & a_{1D} & a_{10} \\ \vdots & & \vdots & \vdots \\ a_{M1} & \cdots & a_{MD} & a_{M0} \end{bmatrix}$ : parameter vector

The Jacobian matrix can be computed as follows:

$J = \begin{bmatrix} \frac{de_1}{da_{11}} & \cdots & \frac{de_1}{da_{1D}} & \frac{de_1}{da_{10}} & \cdots & \frac{de_1}{da_{M1}} & \cdots & \frac{de_1}{da_{MD}} & \frac{de_1}{da_{M0}} \\ \vdots & & & & & & & & \vdots \\ \frac{de_L}{da_{11}} & \cdots & \frac{de_L}{da_{1D}} & \frac{de_L}{da_{10}} & \cdots & \frac{de_L}{da_{M1}} & \cdots & \frac{de_L}{da_{MD}} & \frac{de_L}{da_{M0}} \end{bmatrix}$

$H = J^T J + \mu_{LM} I$ : Hessian matrix ($\mu_{LM}$: Marquardt parameter, $I$: unit matrix)

$g = J^T e$ : gradient vector

$W^{(t+1)} = W^{(t)} - H^{-1} g$ : updating law of the free parameters

Thus, $\mu_{LM}$ is decreased after each successful step (reduction in the cost function) and is increased only when a tentative step would increase the cost function. In this way, the cost function can always be reduced at each iteration of the algorithm.
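A minimal sketch of the forward-pass LM step follows. For a linearly weighted RBF the error is linear in the output weights, so each Jacobian row is just the (negated) vector of node outputs; the schedule of multiplying or dividing $\mu_{LM}$ by 10 is a common convention assumed here, not a value given in the paper.

```python
import numpy as np

def lm_update_weights(Phi, d, weights, mu=1e-2, max_tries=10):
    """One LM step on the output weights of a linearly weighted RBF.
    Phi: L x M matrix of node outputs; d: desired outputs (length L)."""
    e = d - Phi @ weights            # error vector e_i = d_i - y_i
    J = -Phi                         # de_i/da_j = -phi_j(x_i) for linear weights
    g = J.T @ e                      # gradient vector g = J^T e
    cost = 0.5 * e @ e
    for _ in range(max_tries):
        H = J.T @ J + mu * np.eye(J.shape[1])   # H = J^T J + mu_LM I
        step = np.linalg.solve(H, g)
        trial = weights - step                  # W(t+1) = W(t) - H^{-1} g
        e_new = d - Phi @ trial
        if 0.5 * e_new @ e_new < cost:
            mu /= 10.0               # successful step: decrease mu_LM
            return trial
        mu *= 10.0                   # rejected step: increase mu_LM and retry
    return weights                   # no improving step found
```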
2.2. Gradient Descent (GD) Algorithm
The GD algorithm utilizes the cost function given in equation (1); a detailed treatment of the GD method can be found in [16]. The desired output of the RBF network is represented by $d_i$, the actual output by $y_i$, and $L$ denotes the total number of input data. The input-output data set is applied during training $N_{GD}$ times (the number of iterations), and the main goal is to minimize the total cost function given in equation (2).

$E = \frac{1}{2} \sum_{i=1}^{L} (d_i - y_i)^2$   (1)

$T = \min_{i=1,\ldots,N_{GD}} (E_i)$   (2)



The free parameters of the RBF network (widths and centers) are updated by the GD algorithm according to equation (3):

$\theta_{i+1} = \theta_i - \eta \, \frac{dE_i}{d\theta_i}$   (3)

In equation (3), $\theta_{i+1}$ is the current (updated) value of a free parameter, $\theta_i$ is its previous value, and $\eta$ denotes the learning rate for this parameter. Using equation (3), all free parameters of the basis functions are updated in such a way that the total cost function is iteratively minimized.
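The backward-pass update of equation (3) can be sketched as follows for Gaussian nodes. The gradient expressions come from differentiating equation (1) through $\phi_j(x) = \exp(-\|x - c_j\|^2 / 2\sigma_j^2)$; the learning rates play the roles of $\eta_C$ and $\eta_\sigma$ in Table 2, while the vectorized form and default values are assumptions of this example.

```python
import numpy as np

def gd_update_basis(X, d, centers, widths, weights, eta_c=0.01, eta_s=0.01):
    """One GD step (equation (3)) on the centers and widths of Gaussian
    nodes, with the output weights held fixed."""
    diff = X[:, None, :] - centers[None, :, :]          # (L, M, D)
    d2 = (diff ** 2).sum(-1)                            # squared distances
    Phi = np.exp(-d2 / (2.0 * widths ** 2))             # node outputs (L, M)
    e = d - Phi @ weights                               # error e_i = d_i - y_i
    # dE/dc_j = -sum_i e_i * w_j * phi_ij * (x_i - c_j) / sigma_j^2
    grad_c = -(e[:, None, None] * weights[None, :, None]
               * Phi[:, :, None] * diff).sum(0) / widths[:, None] ** 2
    # dE/dsigma_j = -sum_i e_i * w_j * phi_ij * ||x_i - c_j||^2 / sigma_j^3
    grad_s = -(e[:, None] * weights[None, :] * Phi * d2).sum(0) / widths ** 3
    centers = centers - eta_c * grad_c                  # theta <- theta - eta dE/dtheta
    widths = widths - eta_s * grad_s
    return centers, widths
```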

3. Experimental Results
Example 1: Box and Jenkins's gas furnace is a famous example of system identification [18]. The data consist of 296 I/O measurements of a gas furnace system: the input measurement u(t) is the gas flow rate into the furnace and the output measurement y(t) is the CO2 concentration in the outlet gas. For modeling, u(t-6), u(t-1), y(t-6) and y(t-1) are chosen as the input variables of the RBF network and y(t) is chosen as the output. The outcomes of the simulation are graphically presented in Fig. 2.

Figure 2: The original data, the modelled system and the input-output errors for the gas furnace example.
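Constructing the regressor matrix from the lagged measurements is mechanical; the snippet below shows one possible way to do it, assuming u and y are arrays holding the 296 measurements.

```python
import numpy as np

def gas_furnace_regressors(u, y):
    """Build inputs [u(t-6), u(t-1), y(t-6), y(t-1)] and target y(t).
    The first 6 samples are dropped because the deepest lag is 6."""
    t = np.arange(6, len(y))
    X = np.column_stack([u[t - 6], u[t - 1], y[t - 6], y[t - 1]])
    return X, y[t]
```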


Example 2: The last application of the proposed method is the prediction of a complex time series [10], a special function approximation problem that arises in such real-world problems as detecting arrhythmia in heartbeats. The chaotic Mackey-Glass series is generated from the delay differential equation (4), where $\tau = 17$; the first 500 data points (x(t-3), x(t-2), x(t-1), x(t) and x(t+1)) are obtained and normalized to the range [-1, 1].

$\frac{dx(t)}{dt} = \frac{0.2 \, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1 \, x(t)$   (4)
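The series can be generated numerically from equation (4); the sketch below uses a simple Euler scheme. The integration step, the constant initial history and the unit sample spacing are assumptions of this example, not settings reported in the paper.

```python
import numpy as np

def mackey_glass(n_samples=500, tau=17, dt=0.1, x0=1.2):
    """Euler integration of equation (4), sampled at unit time steps."""
    steps_per_sample = int(round(1.0 / dt))
    delay = int(round(tau / dt))
    n_steps = n_samples * steps_per_sample + delay
    x = np.empty(n_steps)
    x[:delay + 1] = x0                     # constant initial history
    for t in range(delay, n_steps - 1):
        x_tau = x[t - delay]               # delayed state x(t - tau)
        dx = 0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[t]
        x[t + 1] = x[t] + dt * dx
    series = x[delay::steps_per_sample][:n_samples]
    # Normalize to [-1, 1] as described in the text.
    return 2.0 * (series - series.min()) / (series.max() - series.min()) - 1.0
```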

After completion of the training, the outcomes are graphically presented in Fig. 3, and comparative results are given in Table 2. The parameters in Table 2 are defined as follows:
$\rho_{OLS}$ : termination parameter for the OLS algorithm
$\eta_C$, $\eta_\sigma$ : learning rates for the centers and widths, respectively
$\mu_{LM}$ : Marquardt parameter
$N_{Epoch}$ : total epoch number

Table 2: The outcomes of the two simulated examples.

Figure 3: The original data, the modelled system and the input-output errors for the Mackey-Glass time series example.


4. Conclusion
The systematic construction of a linearly-weighted Gaussian radial basis function (RBF) neural network with a two-stage hybrid training method has been presented. The first stage of the hybrid algorithm is a pre-processing unit which generates a coarsely-tuned RBF network. The second stage is a fine-tuning phase which employs a two-pass algorithm. The proposed method is compared with the ANFIS structure on two non-linear benchmarks (the Box-Jenkins gas furnace and the Mackey-Glass chaotic time series) in terms of MSE. As can be seen from Table 2, a similar level of MSE is obtained with the proposed method while using a smaller number of rules than the ANFIS structure: ANFIS gives slightly better results, but it employs more rules than the proposed method. When the GD-only algorithm is employed in the second stage, as presented in [7] and [8], the obtained MSE results are, as expected, poor compared to both the proposed method and ANFIS.

5. References
[1] Uykan Z., Güzeliş C., Çelebi M. E., and Koivo H. N., "Analysis of input-output clustering for determining centers of RBFN", IEEE Transactions on Neural Networks, 11:851-858, 2000.
[2] Chen S., Cowan C. F. N., and Grant P. M., "Orthogonal least squares learning algorithm for radial basis function networks", IEEE Transactions on Neural Networks, 2:302-309, March 1991.
[3] Sherstinsky A. and Picard R. W., "On the efficiency of the orthogonal least squares training method for radial basis function networks", IEEE Transactions on Neural Networks, 7:195-200, 1996.
[4] Buchtala O., Klimek M., and Sick B., "Evolutionary optimization of radial basis function classifiers for data mining applications", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35:928-947, 2005.
[5] Chen M.-Y. and Linkens D. A., "A systematic neuro-fuzzy modeling framework with application to material property prediction", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 31:781-790, 2001.
[6] Linkens D. A. and Chen M.-Y., "Input selection and partition validation for fuzzy modelling using neural network", Fuzzy Sets and Systems, 107:299-308, 1999.
[7] Kayhan G., Özdemir A. E., and Eminoğlu İ., "Designing Pre-Processing Units for RBF Networks, Part 1: Initial Structure Identification", to appear in International Symposium on Innovations in Intelligent Systems and Applications (INISTA'09), Trabzon, Türkiye, 2009.
[8] Özdemir A. E., Kayhan G., and Eminoğlu İ., "Designing Pre-Processing Units for RBF Networks, Part 2: Final Structure Identification and Coarse Tuning of Parameters", to appear in International Symposium on Innovations in Intelligent Systems and Applications (INISTA'09), Trabzon, Türkiye, 2009.
[9] Jang J.-S. R., "ANFIS: Adaptive-network-based fuzzy inference system", IEEE Transactions on Systems, Man, and Cybernetics, 23:665-685, 1993.
[10] Ouyang C.-S., Lee W.-J., and Lee S.-J., "A TSK-type neuro-fuzzy network approach to system modeling problems", IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(4):751-767, Aug. 2005.
[11] Lee S.-J. and Ouyang C.-S., "A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning", IEEE Transactions on Fuzzy Systems, 11(3):341-353, June 2003.
[12] Hagan M. T. and Menhaj M. B., "Training feedforward networks with the Marquardt algorithm", IEEE Transactions on Neural Networks, 5(6):989-993, Nov. 1994.
[13] Wilamowski B. M., Chen Y., and Malinowski A., "Efficient algorithm for training neural networks with one hidden layer", in International Joint Conference on Neural Networks (IJCNN '99), 1999.
[14] Kermani B. G., Schiffman S. S., and Nagle H. T., "Performance of the Levenberg-Marquardt neural network training method in electronic nose applications", Sensors and Actuators B, 110:13-22, 2005.
[15] İçer S., Kara S., and Güven A., "Comparison of multilayer perceptron training algorithms for portal venous Doppler signals in the cirrhosis disease", Expert Systems with Applications, 31:406-413, 2006.
[16] Jenison R. L. and Fissell K., "A comparison of the von Mises and Gaussian basis functions for approximating spherical acoustic scatter", IEEE Transactions on Neural Networks, 6(5):1284-1287, Sept. 1995.
[17] Staiano A., Tagliaferri R., and Pedrycz W., "Improving RBF networks performance in regression tasks by means of a supervised fuzzy clustering", Neurocomputing, 69(13-15):1570-1581, Aug. 2006.
[18] Kukolj D. and Levi E., "Identification of complex systems based on neural and Takagi-Sugeno fuzzy model", IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(1):272-282, Feb. 2004.
[19] Emami M. R., Türkşen I. B., and Goldenberg A. A., "An Improved Fuzzy Modeling Algorithm, Part I: Inference Mechanism, Part II: System Identification", NAFIPS, 1996.

