Neural Networks 39 (2013) 18–26

Contents lists available at SciVerse ScienceDirect

Neural Networks
journal homepage: www.elsevier.com/locate/neunet

Generalized classifier neural network


Buse Melis Ozyildirim a,*, Mutlu Avci b
a Department of Computer Engineering, University of Adana Science and Technology, Adana, Turkey
b Department of Computer Engineering, University of Cukurova, Adana, Turkey

article info

Article history:
Received 14 April 2012
Received in revised form 7 November 2012
Accepted 3 December 2012
Keywords:
GCNN
GRNN
PNN
Classification neural networks
Gradient descent learning

abstract
In this work, a new radial basis function based classification neural network, named the generalized
classifier neural network, is proposed.
Unlike other radial basis function based neural networks such as the generalized regression neural
network and the probabilistic neural network, the proposed network has five layers: input, pattern,
summation, normalization and output. In addition to this topological difference, the proposed network
optimizes the smoothing parameter by gradient descent and adds a diverge effect term to the summation
layer calculation. The diverge effect term supplies additional separation ability and flexibility. The
performance of the generalized classifier neural network is compared with that of the probabilistic neural
network, the multilayer perceptron and the radial basis function neural network on 9 data sets, and
with that of the generalized regression neural network on the 3 data sets that include only two classes,
in the MATLAB environment. Classification performance improves by up to 89%. The improved
classification performance demonstrates the effectiveness of the proposed neural network.
© 2012 Elsevier Ltd. All rights reserved.

1. Introduction
Pattern classification problems are important application areas
of neural networks used as learning systems (Al-Daoud, 2009;
Bartlett, 1998; Specht, 1990). Multilayer perceptrons (MLP), radial
basis functions (RBF), probabilistic neural networks (PNN), self
organizing maps (SOM), cellular neural networks (CNN), recurrent
neural networks and conic section function neural network
(CSFNN) are some of these neural networks. In addition to
classification problems, function approximation problems are
also solved with neural networks. The generalized regression neural
network (GRNN) is one of the most popular neural networks
used for function approximation. GRNN and PNN are kinds of
radial basis function neural networks (RBFNN) with one pass
learning (Al-Daoud, 2009). Although they are similar, PNN is used
for classification whereas GRNN is used for continuous function
approximation (Mosier & Jurs, 2002).
* Corresponding author.
E-mail addresses: melis.ozyildirim@gmail.com (B.M. Ozyildirim),
mavci@cu.edu.tr (M. Avci).
0893-6080/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.
doi:10.1016/j.neunet.2012.12.001

PNN, introduced by Donald F. Specht in 1990 (Specht, 1990),
has been used for various classification problems ever since (Adeli
& Panakkat, 2009; Hajmeer & Basheer, 2002; Kailun, Huijun,
& Maohua, 2010; Zhu & Hao, 2009). Since the performance of
PNN depends on the smoothing parameter and the size of the
neural network, previous works have generally focused on the
optimization of smoothing parameters and topology of neural
network (Berthold & Diamond, 1998; Mao, Tan, & Set, 2000;
Montana, 1992; Rutkowski, 2004). Genetic algorithm is one
of the optimization methods used for smoothing parameter
identification (Mao et al., 2000). Automatic topology construction
is a solution to determine the appropriate size of neural network.
In Berthold and Diamond (1998), new hidden units are added
to PNN when necessary, thus large datasets are classified with
minimum PNN topology. In addition to topology construction,
smoothing parameter optimization is provided with dynamic
decay adjustment algorithm (Berthold & Diamond, 1998). Pattern
layer neurons also affect PNN performance. In Mao et al. (2000),
an orthogonal algorithm is used to select the most representative
pattern layer neurons from training data. Affine transformations
of feature space cause problems on PNN performance. To deal
with these problems, anisotropic Gaussian is implemented in
Montana (1992). Anisotropic Gaussian form includes covariance in
exponential part and training of this method is based on genetic
coding. The studies mentioned so far assume static probability
distributions; however, some probability distributions
vary over time. In Rutkowski (2004), time-varying probabilistic
distribution problems are considered as prediction problems and
solved with an adaptive PNN structure.
B.M. Ozyildirim, M. Avci / Neural Networks 39 (2013) 18–26

GRNN, also introduced by Donald F. Specht (Specht,
1991), is based on the Nadaraya–Watson kernel (Kiyan & Yildirim,
2004). GRNN is used for many applications such as prediction,
control, medical diagnosis, engineering, speech recognition and
3D modeling (Amrouche & Rouvaen, 2006; Asad, Zhijiang, Lining,
Reza, & Fereidoun, 2007; Firat & Gungor, 2009; Kayaer & Yildirim,
2003; Kiyan & Yildirim, 2004; Popescu, Kanatas, Constantinou,
& Nafornita, 2002; Ren, Yang, Ji, & Tian, 2010; Wang & Sheng,
2010; Yildirim & Cigizoglu, 2002). Studies show that GRNN has
better function approximation performance than feedforward
networks and other statistical neural networks on some datasets
(Amrouche & Rouvaen, 2006; Firat & Gungor, 2009; Kayaer &
Yildirim, 2003; Kiyan & Yildirim, 2004; Ren et al., 2010; Wang
& Sheng, 2010; Yildirim & Cigizoglu, 2002). Although GRNN is
proposed for function approximation, some binary classification
applications exist (Kayaer & Yildirim, 2003; Kiyan & Yildirim,
2004). Large datasets lead to large and complex neural networks
and decrease the efficiency of GRNN. In addition to network
size, the smoothing parameter directly affects GRNN performance.
Determining the optimal smoothing parameter value (Ren et al., 2010)
and decreasing the pattern layer size are the major problems of
GRNN. Clustering methods such as K-means, fuzzy means and
fuzzy adaptive resonance theory are utilized for GRNN; they reduce
the number of neurons at the pattern layer by grouping data into
clusters and calculating the centroids of these clusters (Lee, Lim,
Yuen, & Lo, 2004; Specht, 1991; Zhao, Zhang, Li, & Song, 2007).
Feature extraction methods are also utilized for improving the
performance of GRNN (Erkmen & Yildirim, 2008). In Hoya and
Chambers (2001), growing and pruning processes are used for
finding optimal number of neurons at the pattern layer. At growing
step, all misclassified data are added to pattern layer iteratively
until all data are correctly classified. At pruning step, repetitive
data are removed. Smoothing parameter is also updated at both
growing and pruning processes in accordance with the maximum
distance between input and patterns, number of pattern layer
neurons and number of output layer neurons (Hoya & Chambers,
2001). Gradient descent, Quasi-Newton optimization methods
and genetic algorithm are used for optimization of smoothing
parameter (Lee et al., 2004; Masters & Land, 1997). In Tomandl
and Schober (2001), GRNN is modified to be used for any form of
data. Modified GRNN (MGRNN) uses the relative distance between
each sample instead of data. Training of MGRNN is provided with
specific error function (Tomandl & Schober, 2001). In Yoo, Sikder,
Zhou, and Zomaya (2007), GRNN is improved for high dimensional
data by linear dimensionality reduction method.
In this work, a new RBFNN based classification neural
network named as Generalized Classifier Neural Network (GCNN)
is proposed. GCNN has five layers named as input, pattern,
summation, normalization and output. For each pattern layer
neuron, a smoothing parameter is assigned. Smoothing parameters
are updated by gradient descent so that the squared error of the
winner neuron converges toward its minimum. GCNN uses target values
for each pattern layer neuron and provides regression based
effective classification. Increasing the distance among different
classes provides better classification performance. For this
purpose, a new term that amplifies the effect of the target values
by increasing the distance among classes is defined as the diverge
effect term. It is included in the summation layer calculation.
The summation layer contains two different types of neurons. A
neuron of the first type is assigned to each class, and a single
neuron of the second type is assigned for the denominator
calculation. Neurons of the first type compute the sum of the
products of the pattern layer outputs and the diverge effect terms.
The normalization layer has N neurons, where N denotes the number
of classes. In this layer, each neuron divides the output of a
first type neuron by the output of the second type neuron of the
summation layer. The output layer performs a competition among the
normalization layer neurons. Smoothing parameters are optimized
according to the squared error between the estimated value of the
winner neuron and the target value.
The proposed GCNN is tested with 9 data sets in the MATLAB
environment. These are the glass identification, Haberman's survival,
two spiral problem, lenses, balance-scale, iris, breast-cancer-wisconsin
(Bennett & Mangasarian, 1992; Mangasarian, Setiono, & Wolberg, 1990;
Mangasarian & Wolberg, 1990; Wolberg & Mangasarian, 1990), E.coli
and yeast data sets (Frank & Asuncion, 2010). Classification
performance results are compared with those of the PNN and GRNN of
the MATLAB Toolbox with varying and constant smoothing parameters,
and with the Multi Layer Perceptron (MLP) and Radial Basis Function
Neural Network (RBFNN) of the MATLAB Toolbox. According to the test
results, GCNN is proposed as a new and effective classifier neural
network.

Fig. 1. GRNN architecture.

Fig. 2. PNN architecture.

Table 1
Description of data sets.
Data set                 Attributes  Classes  Data
Glass                    10          7        214
Haberman's survival      3           2        306
Two spiral problem       2           2        328
Lenses                   4           3        24
Balance-scale            4           3        625
Iris                     4           3        150
Breast cancer wisconsin  10          2        699
E.coli                   8           8        336
Yeast                    8           10       1484

Table 2
10-fold cross validation classification performances (%). Entries written as sigma/rate give the average smoothing parameter and the classification rate; - marks entries that are not reported.
Data set                 GCNN             GRNN         PNN          GRNN         PNN           GRNN       PNN          MLP      RBF
                                          sigma = 0.3  sigma = 0.3  optimized    optimized     sigma = 1  sigma = 0.1
Glass                    0.2567/94.3925   -            52.8037      -            0.196/55.61   -          53.7383      48.130   49.0654
Haberman's survival      0.2794/66.0131   59.4771      59.4771      0.2/59.8     0.25/59.48    58.8235    61.1111      64.0523  71.2418
Two spiral problem       0.2998/89.0244   85.3659      85.3659      0.24/85.37   0.25/85.37    18.9024    85.3659      31.0976  79.2683
Lenses                   0.3/100          -            66.6667      -            0.3/66.6667   -          66.6667      70.83    75
Balance-scale            0.2997/91.5064   -            72.1154      -            0.35/72.12    -          69.7115      87.1795  73.2372
Iris                     0.2823/100       -            94           -            0.26/95.33    -          95.33        92       92
Breast cancer wisconsin  0.3/96.2751      95.4155      95.4155      0.265/95.13  0.265/95      95.7020    70.7736      96.1318  67.1920
E.coli                   0.2376/100       -            56.5476      -            0.14/77.68    -          78.5714      76.1905  71.7262
Yeast                    0.2171/100       -            11.1186      -            0.15/31.2     -          38.6792      43.7332  38.2749

Table 3
Training and test times. Entries written as a/b give training time/test time; - marks entries that are not reported.
Data set                 GCNN             GRNN         PNN          GRNN         PNN           GRNN       PNN          MLP             RBF
                                          sigma = 0.3  sigma = 0.3  optimized    optimized     sigma = 1  sigma = 0.1
Glass                    10.9110/0.0873   -            0.1888       -            0.2049        -          0.2014       2.0599/0.0607   10.1004/0.1005
Haberman's survival      11.8723/0.1334   0.1838       0.1840       0.1817       0.1844        0.1824     0.1841       1.2340/0.0620   14.3833/0.1011
Two spiral problem       14.3450/0.1955   0.2004       0.1989       0.1924       0.1987        0.1909     0.2112       1.2129/0.0606   11.0586/0.0854
Lenses                   0.1094/0.0014    -            0.1831       -            0.1915        -          0.1856       0.8263/0.0521   0.9545/0.0802
Balance-scale            61.6890/0.4979   -            0.2064       -            0.2053        -          0.2069       2.7872/0.0598   21.8738/0.0954
Iris                     3.3015/0.0382    -            0.1772       -            0.1744        -          0.1762       1.3461/0.0563   0.6813/0.0983
Breast cancer wisconsin  60.8600/0.5141   0.2174       0.2108       0.2101       0.2117        0.2169     0.2126       1.7337/0.0552   37.7400/0.0883
E.coli                   32.0276/0.3169   -            0.2001       -            0.1924        -          0.2289       3.1709/0.0637   1.9160/0.0949
Yeast                    680.5347/4.6442  -            0.2562       -            0.3951        -          0.4267       14.1513/0.0801  174.1730/0.1344

2. Fundamental approaches for GCNN
Since GCNN, GRNN and PNN are all based on the radial basis function neural network, GCNN can be considered a close relative
of GRNN and PNN. In terms of purpose, PNN and GCNN are used for
classification, whereas GRNN is used for regression. Unlike PNN,
GCNN is based on a regression methodology for effective
classification. GCNN differs from the others in its topology, its
diverge effect term and its training method: gradient descent is
used to train GCNN. GRNN, PNN and the gradient descent training
method are briefly explained in the following subsections.
2.1. Generalized regression neural networks
GRNN is proposed for function approximation purposes (Amrouche & Rouvaen, 2006); however in some works, it is applied
to classification problems (Amrouche & Rouvaen, 2006; Kayaer &
Yildirim, 2003; Kiyan & Yildirim, 2004). Its advantages are fast
learning, consistency and optimal regression with large number of
samples (Ren et al., 2010). GRNN has four layers; input, pattern,
summation and output as shown in Fig. 1 (Specht, 1991).
The input layer transmits the input vector x to the pattern
layer. The pattern layer consists of one neuron for each training
datum or for each cluster center. In this layer, the weighted
squared Euclidean distance is calculated according to (1). Any new
input applied to the network is first subtracted from the pattern
layer neuron values; then, according to the distance function,
either the squares or the absolute values of the differences are
summed and applied to the activation function. Generally, the
exponential function is used as the activation function. Results
are transferred to the summation layer. Neurons in the summation
layer add the dot product of pattern layer outputs and weights. In
Fig. 1, weights are shown by A and B; their values are determined
by the y values of the training data stored at the pattern layer.
$\sum f(x)K$ denotes the weighted outputs of the pattern layer,
where K is a constant associated with the Parzen window, and
$\sum Yf(x)K$ denotes the multiplication of pattern layer outputs
and training data output Y values. At the output layer,
$\sum Yf(x)K$ is divided by $\sum f(x)K$ to estimate the desired Y,
as given in (2) and (3) (Al-Daoud, 2009; Erkmen & Yildirim, 2008;
Specht, 1991).

$$ D_j = (x - t_j)^T (x - t_j) \tag{1} $$

$$ \hat{Y}(x) = \frac{\int_{-\infty}^{\infty} Y f(x, Y)\, dY}{\int_{-\infty}^{\infty} f(x, Y)\, dY} \tag{2} $$

$$ \hat{Y}(x) = \frac{\sum_{j=1}^{p} y_j\, e^{-\frac{D_j}{2\sigma^2}}}{\sum_{j=1}^{p} e^{-\frac{D_j}{2\sigma^2}}} \tag{3} $$

GRNN is also known as normalized RBFNN. RBF units are
probability density functions as in (3). In the GRNN structure, only
the smoothing parameter ($\sigma$), also known as bandwidth, is updated
(Kiyan & Yildirim, 2004; Tomandl & Schober, 2001). The value of
$\sigma$ is important: a smaller $\sigma$ limits the number of
effective samples, whereas a larger one extends the radius of
effective neighbors (Amrouche & Rouvaen, 2006; Ren et al., 2010).

Fig. 3. GCNN architecture.

Fig. 4. Smoothing parameters and classification rates for each fold.

2.2. Probabilistic neural networks
Specht introduced an RBFNN based classification neural network
named PNN (Specht, 1990). The PNN structure is shown in Fig. 2.
According to the figure, the input layer holds the applied input
values to be processed in the pattern layer. Each pattern unit
consists of a weight vector t and the input vector x. In this layer,
for each pattern neuron, first the dot product of t and x is
performed, then a nonlinear activation function is applied to this
product, as given in (4). Generally, the exponential function is
used as the activation function. Results are summed at the
summation layer for each pattern unit, and fA1(x) and fB1(x),
which represent Gaussian activation functions, are calculated. The
output layer is known as the decision layer (Specht, 1990).

$$ \phi(x) = e^{-\frac{(t - x)^T (t - x)}{2\sigma^2}} \tag{4} $$

PNN is a Bayes–Parzen classifier. Parzen introduced the univariate
case of the probability density function (pdf). Cacoullos extended
the pdf to the multivariate case as in (5).

$$ g(x_1, x_2, \ldots, x_m) = \frac{1}{p\,\sigma_1 \sigma_2 \cdots \sigma_m} \sum_{j=1}^{p} W\!\left(\frac{x_1 - t_{1,j}}{\sigma_1}, \frac{x_2 - t_{2,j}}{\sigma_2}, \ldots, \frac{x_m - t_{m,j}}{\sigma_m}\right) \tag{5} $$

$\sigma_1, \sigma_2, \ldots, \sigma_m$ denote standard deviations named as smoothing
parameters. $x_1, x_2, \ldots, x_m$ are input variables. W is a weighting
function with specific characteristics and p is the number of training
samples. In the case of equal smoothing parameters and a bell-shaped
Gaussian activation function, (6) is obtained. This is the most
popular case of PNN.

$$ g(x) = \frac{1}{(2\pi)^{m/2}\, \sigma^{m}\, p} \sum_{j=1}^{p} e^{-\frac{\|x - t_j\|^2}{2\sigma^2}} \tag{6} $$

where x denotes the input vector and $t_j$ refers to the jth training
vector (Hajmeer & Basheer, 2002).
2.3. Gradient descent training method
Gradient descent is an iterative first order optimization
algorithm. The purpose of this method is finding the minimum of
a differentiable function by following the opposite direction of
the gradient with a defined step size (Madsen, Nielsen, & Tingleff, 2004).
Since the purpose of neural network training is minimizing the
squared error of the system, gradient descent is a popular method
for this aim. Firstly, the squared error is calculated according to (7).

$$ e = (y - f)^2 \tag{7} $$

where y denotes the desired value, f is the output value of the
function and e is the squared error. The derivative of e is
calculated as in (8), where w is the parameter that drives e to its
minimum.

$$ \frac{\partial e}{\partial w} = -2\,(y - f)\,\frac{\partial f}{\partial w} \tag{8} $$

Finally, w is updated in the opposite direction of $\partial e / \partial w$
with step size $\eta$ as given in (9).

$$ w_{t+1} = w_t - \eta\,\frac{\partial e}{\partial w} \tag{9} $$
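As an illustration, the update rule in (7)–(9) may be sketched in Python as follows. The sketch assumes a hypothetical one-parameter model f = w x; the function names, the step size 0.1 and the sample values are assumptions made only for this example and do not come from the experiments reported in this paper.

```python
def gradient_descent_step(w, x, y, eta):
    """One update of w for the squared error e = (y - f)^2, assuming f = w * x."""
    f = w * x                      # model output (hypothetical linear model)
    de_dw = -2.0 * (y - f) * x     # chain rule: de/dw = -2 (y - f) df/dw, with df/dw = x
    return w - eta * de_dw         # step against the gradient


def fit(x, y, w0=0.0, eta=0.1, steps=50):
    """Repeat the update a fixed number of times and return the final w."""
    w = w0
    for _ in range(steps):
        w = gradient_descent_step(w, x, y, eta)
    return w
```

For x = 2 and y = 3, repeated calls move w toward 1.5, the value at which the squared error vanishes. A step size that is too large for the chosen model would make the iteration diverge, which is why the step size is treated as a tuning parameter.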

3. Generalized classifier neural network
GCNN is a new classification neural network with gradient
descent learning on the smoothing parameter and can be identified as
a new kind of RBFNN. It has five layers: input, pattern, summation,
normalization and output. The structure of GCNN is shown in Fig. 3.
The input layer transmits the applied input vector x to the pattern
layer. The pattern layer contains one neuron for each training datum.
Neurons at the pattern layer calculate the squared Euclidean distance
between the input vector x and the training data vector t as given
in (10), where p denotes the total number of training data. The output
of the pattern layer is determined by the RBF kernel activation
function, given in (11). As the GCNN classification methodology is
based on regression, it builds on a one-vs.-all discriminative
structure; therefore each training datum has N values determined by
whether or not it belongs to each class: if a training datum belongs
to the ith class then its ith value is 0.9 and the others are 0.1,
as given in (12). The values 0.9 and 0.1 are chosen to prevent the
stuck neuron problem of the learning process.

$$ dist(j) = \|x - t_j\|^2, \qquad 1 \le j \le p \tag{10} $$

$$ r(j) = e^{-\frac{dist(j)}{2\sigma^2}}, \qquad 1 \le j \le p \tag{11} $$

$$ y(j, i) = \begin{cases} 0.9 & t_j \text{ belongs to the } i\text{th class} \\ 0.1 & \text{else} \end{cases} \qquad 1 \le j \le p,\ 1 \le i \le N \tag{12} $$

The summation layer has N + 1 neurons, where N is the total
number of classes and 1 is for the neuron that obtains the denominator.
At the summation layer, GCNN uses the diverge effect term in N neurons
for better classification performance. The diverge effect term uses
the exponential form of $y(j, i) - y_{max}$, given in (13), to increase
the effect of y(j, i). The aim of using the exponential function is
providing convergence to minimal error between limits. The diverge
effect term provides two important advantages to GCNN. By increasing
the effect of y(j, i), data that belong to different classes are
separated from each other. By taking advantage of the exponential
function, the overfitting problem, from which the gradient descent
approach generally suffers, is suppressed.

$$ d(j, i) = e^{(y(j, i) - y_{max})}\, y(j, i), \qquad 1 \le i \le N \tag{13} $$

where d(j, i) denotes the diverge effect term of the jth training
datum and the ith class. $y_{max}$ is initialized with 0.9, which is
the maximum value of y(j, i), and is updated with the maximum value of
the output layer at each iteration.
At this layer, N neurons calculate the sum of the products of
diverge effect terms and pattern layer outputs as given in (14),
while the other neuron calculates the denominator in the same way as
GRNN, given in (15).

$$ u_i = \sum_{j=1}^{p} d(j, i)\, r(j), \qquad 1 \le i \le N \tag{14} $$

$$ D = \sum_{j=1}^{p} r(j) \tag{15} $$

In the normalization layer, there are N neurons, one for each
class, and the outputs of these neurons are calculated according to (16).

$$ c_i = \frac{u_i}{D}, \qquad 1 \le i \le N \tag{16} $$

Finally, at the last layer the winner decision mechanism given
in (17) selects the maximum of the normalization layer outputs.

$$ [o, id] = \max(c) \tag{17} $$

where c is the normalization layer output vector, o denotes the winner
neuron value and id denotes the winner class.
Since the smoothing parameter has an important effect on classification performance, a gradient descent based training approach is
adapted to GCNN. During the training step, each training datum at the
pattern layer is sequentially applied to the neural network. Firstly, for
each input, the squared error e is calculated as given in (18).

$$ e = (y(z, id) - c_{id})^2 \tag{18} $$

where y(z, id) represents the value of the zth training input datum for
the idth class and $c_{id}$ is the value of the winner class. Secondly, the
first derivative of the error e is calculated according to (19)–(22)
(Masters & Land, 1997).

$$ \frac{\partial e}{\partial \sigma} = -2\,[y(z, id) - c_{id}]\,\frac{\partial c_{id}}{\partial \sigma} \tag{19} $$

$$ \frac{\partial c_{id}}{\partial \sigma} = \frac{b(id) - l(id)\, c_{id}}{D} \tag{20} $$

$$ b(id) = 2 \sum_{j=1}^{p} d(j, id)\, r(j)\, \frac{dist(j)}{\sigma^3} \tag{21} $$

$$ l(id) = 2 \sum_{j=1}^{p} r(j)\, \frac{dist(j)}{\sigma^3} \tag{22} $$

The smoothing parameter is updated against the gradient of the error as in (23),
where $\eta$ is the learning step.

$$ \sigma_{new} = \sigma_{old} - \eta\,\frac{\partial e}{\partial \sigma} \tag{23} $$

The training step of GCNN is given in Algorithm 1,
where epoch denotes the number of iterations of the training algorithm,
amse denotes the acceptable mean squared error and lr
stands for the learning rate of the gradient descent method. When one of
the stopping criteria is met, the optimum smoothing parameters
for each training datum are obtained.
Algorithm 1 Training of GCNN
inputs: epoch, lr, training_input_data, amse
outputs: smoothing parameter
initialize smoothing parameter and ymax
while iteration $\le$ epoch
    for each training datum; tj
        find Euclidean distance between input
        and training data, dist(j)
        perform RBF activation function, r(j)
        for each class; i
            calculate diverge effect term,
            $d(j, i) = e^{(y(j, i) - y_{max})}\, y(j, i)$
            compute $u_i = \sum_{j=1}^{p} d(j, i)\, r(j)$ and $D = \sum_{j=1}^{p} r(j)$
            calculate normalization layer neurons values; $c_i = u_i / D$
        end-for
        find winner neuron and its value; [o, id] = max(c)
        to update diverge effect term winner neuron values are stored;
        cmax(iteration) = $c_{id}$
        calculate squared error $e = (y(z, id) - c_{id})^2$
        where z denotes zth input.
        update $\sigma$ with $\partial e / \partial \sigma$
    end-for
    ymax = max(cmax)
    increment iteration
    if $|e| \le$ amse
        stop training
end-while
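A single smoothing parameter update of the training step, following (18)–(23), may be sketched in Python as below. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: one smoothing parameter is shared by all pattern neurons, ymax is kept fixed at its initial value 0.9, and the function names are hypothetical. The derivative is the exact derivative of the kernel r(j) = exp(-dist(j) / (2 sigma^2)); if (21)–(22) are read with their leading factor 2, the result differs only by that constant, which the learning step absorbs.

```python
import math


def targets(label, n_classes):
    """One-vs-all target values y(j, i): 0.9 for the own class, 0.1 otherwise."""
    return [0.9 if i == label else 0.1 for i in range(n_classes)]


def winner_output(sigma, x, train_x, labels, n_classes, y_max=0.9):
    """Return (id, c_id, dc_id/dsigma) for input x, assuming one shared sigma."""
    dist = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_x]
    r = [math.exp(-dist_j / (2.0 * sigma ** 2)) for dist_j in dist]
    D = sum(r)
    # dD/dsigma, because dr(j)/dsigma = r(j) * dist(j) / sigma^3 for this kernel.
    l = sum(r_j * dist_j for r_j, dist_j in zip(r, dist)) / sigma ** 3
    best = None
    for i in range(n_classes):
        y = [targets(label, n_classes)[i] for label in labels]
        d = [math.exp(y_j - y_max) * y_j for y_j in y]      # diverge effect term
        u = sum(d_j * r_j for d_j, r_j in zip(d, r))
        # du_i/dsigma, the numerator derivative.
        b = sum(d_j * r_j * dist_j for d_j, r_j, dist_j in zip(d, r, dist)) / sigma ** 3
        c = u / D
        if best is None or c > best[1]:
            best = (i, c, (b - l * c) / D)                  # quotient rule for c_i = u_i / D
    return best


def train_step(sigma, x, label, train_x, labels, n_classes, lr=0.3):
    """One gradient descent update of sigma for a single training input."""
    wid, c, dc = winner_output(sigma, x, train_x, labels, n_classes)
    y = targets(label, n_classes)[wid]       # target value of the winner class
    error = (y - c) ** 2                     # squared error of the winner neuron
    de_dsigma = -2.0 * (y - c) * dc          # derivative of the squared error
    return sigma - lr * de_dsigma, error     # step against the gradient
```

A full training run would wrap `train_step` in the two loops of Algorithm 1 and refresh ymax after each pass; those parts are omitted here to keep the sketch short.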

Fig. 5. Smoothing parameters and classification rates.

Inputs of the test algorithm are the test data and the optimum smoothing parameters obtained from the training algorithm. Outputs are the estimated classes of the test data. Algorithm 2 shows the GCNN
test step.
Algorithm 2 Test of GCNN
inputs: smoothing_parameter, test_data
outputs: class
for each training datum; tj
    find Euclidean distance between test
    and training data, dist(j)
    perform RBF activation function, r(j)
    for each class; i
        calculate diverge effect term,
        $d(j, i) = e^{(y(j, i) - y_{max})}\, y(j, i)$
        compute $u_i = \sum_{j=1}^{p} d(j, i)\, r(j)$ and $D = \sum_{j=1}^{p} r(j)$
        calculate normalization layer neurons values; $c_i = u_i / D$
    end-for
    find winner neuron and its value; [o, id] = max(c)
end-for
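The test step can be illustrated with the following Python sketch of the forward pass through the five layers. It is a sketch under assumptions rather than the authors' code: a single smoothing parameter is shared by all pattern neurons, ymax is passed in as a fixed value, and the function name `gcnn_classify` is hypothetical.

```python
import math


def gcnn_classify(x, train_x, train_labels, n_classes, sigma, y_max=0.9):
    """Return (winner class, normalization layer outputs) for one input x."""
    # Pattern layer: squared Euclidean distance and RBF activation per training datum.
    dist = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_x]
    r = [math.exp(-dist_j / (2.0 * sigma ** 2)) for dist_j in dist]

    # Summation layer: one denominator neuron and N class neurons.
    D = sum(r)
    u = []
    for i in range(n_classes):
        total = 0.0
        for r_j, label in zip(r, train_labels):
            y = 0.9 if label == i else 0.1        # one-vs-all target value y(j, i)
            d = math.exp(y - y_max) * y           # diverge effect term d(j, i)
            total += d * r_j
        u.append(total)

    # Normalization layer, then the winner-takes-all output layer.
    c = [u_i / D for u_i in u]
    winner = max(range(n_classes), key=lambda i: c[i])
    return winner, c
```

With a very small smoothing parameter and an input far from every training datum, all activations can underflow to zero and D becomes zero; a practical implementation would need to guard against that case.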

4. Tests and results


GCNN is compared with PNN, MLP and RBFNN on 9 different
data sets by 10-fold cross validation. Although GRNN is
designed for function approximation, some binary classification
applications exist. Therefore, in this paper GCNN is also compared
with GRNN used as a classifier. The performance of GCNN
is compared with the MATLAB 9 Standard Neural Network Toolbox
PNN, GRNN, MLP and RBFNN implementations. The data sets are
glass identification, Haberman's survival, two spiral problem,
lenses, balance-scale, iris, breast-cancer-wisconsin, E.coli and
yeast. The numbers of attributes and classes are given in Table 1.
Table 2 shows the 10-fold cross validation test performances of,
respectively: GCNN with the average smoothing parameter values
obtained from 10-fold cross validation (represented with sigma in
the table); GRNN/PNN with GCNN's initial smoothing parameter value;
GRNN/PNN with the average optimized smoothing parameter obtained
from 10-fold cross validation (represented with sigma in the
table); GRNN/PNN with the default smoothing parameter values of the
MATLAB Toolbox; the standard MLP algorithm; and the RBFNN of the
MATLAB Toolbox.
According to Table 2, for the 9 data sets GCNN provides better classification performance than PNN by 1%–89%.
GRNN's function approximation nature allows its use for binary classification. However, since GRNN uses a weighted arithmetic mean
in its output layer, its classification performance decreases as the number of classes increases. In this paper, GCNN is therefore
compared with GRNN only for binary classification problems.
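The classification usage of GRNN can be illustrated with a short Python sketch of the weighted mean in (3). The 0/1 class coding, the threshold of 0.5 and the function names are assumptions made for this illustration; they are one common way to turn a regression estimate into a binary decision and are not taken from the MATLAB Toolbox.

```python
import math


def grnn_estimate(x, train_x, train_y, sigma):
    """Kernel-weighted arithmetic mean of the training targets, as in (3)."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)) / (2.0 * sigma ** 2))
        for t in train_x
    ]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


def grnn_binary_class(x, train_x, train_y, sigma, threshold=0.5):
    """Threshold the regression estimate to obtain a 0/1 class label."""
    return 1 if grnn_estimate(x, train_x, train_y, sigma) >= threshold else 0
```

One way to see why more classes are harder for this scheme: if classes were coded as 0, 1 and 2, an input halfway between a class 0 sample and a class 2 sample would receive the estimate 1, which is the code of a class that neither neighbor belongs to.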
According to the results, GCNN provides better classification
performance than GRNN by 0%–71%. Optimizing the smoothing
parameter both increases and decreases the performance of standard
GRNN and PNN, depending on the data set. However, GCNN provides
1%–68% better classification performance than GRNN and PNN
with optimized smoothing parameters. GCNN thus has better classification performance than both standard and optimized PNN and GRNN.
Beyond the radial basis function based neural networks,
GCNN also provides better classification performance than both MLP
and RBFNN, except on the Haberman's survival data set.
In GCNN smoothing parameter is optimized according to
training data for each fold. Fig. 4 shows smoothing parameter
values for each data set and fold.
The number of epochs and the learning step for GCNN and for the
optimized GRNN and PNN models are chosen as 10 and 0.3, respectively.
In Table 3, average training and test times are given for each of
the methods compared in Table 2.
Since GCNN has a training step, it requires more computational
time than the methods that do not include one. MLP also has a
training step but requires less training time than GCNN. This is
because GCNN has one neuron for each training datum in its hidden
layer, whereas MLP has fewer neurons. The computational times of
RBFNN and GCNN are close to each other. The difference between the
RBFNN of the MATLAB Toolbox and GCNN is that, while GCNN includes
one neuron for each training datum in the hidden layer, RBFNN adds
neurons to the hidden layer only as required.
The smoothing parameter determines the radius of effective
neighbors of a datum. If data that belong to the same class lie
close together, a smoothing parameter whose radius encapsulates
these data improves the classification performance. Fig. 5 shows
the relationships between the smoothing parameters and the
classification rates of GCNN and PNN. For binary classification
problems, GRNN performance results are also added to the figure.
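The notion of a radius of effective neighbors can be made concrete with a small Python sketch. It assumes the Gaussian kernel exp(-dist / (2 sigma^2)) used by the RBF based networks discussed here; the helper names and the cut-off `min_weight`, below which a training datum is treated as having no influence, are hypothetical choices made only for this illustration.

```python
import math


def rbf_weight(x, t, sigma):
    """Gaussian kernel weight of training datum t for input x."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, t))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def effective_neighbors(x, train_x, sigma, min_weight=0.01):
    """Indices of training data whose weight for x is at least min_weight."""
    return [j for j, t in enumerate(train_x)
            if rbf_weight(x, t, sigma) >= min_weight]


train_x = [[0.0], [0.5], [2.0]]
for sigma in (0.1, 0.3, 1.0):
    print(sigma, effective_neighbors([0.0], train_x, sigma))
```

For the input 0.0, the smoothing parameter 0.1 keeps only the datum at 0.0, the value 0.3 also reaches the datum at 0.5, and the value 1.0 reaches all three data.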
According to Fig. 5, GCNN generally has equal or better
classification performance than GRNN and PNN under different
smoothing parameter values for most of the data sets.
5. Conclusion
In this work, a new neural network for classification
purposes is proposed and a training method for it is introduced.
In the training step, smoothing parameters are updated with the
gradient descent method to reach optimal smoothing parameters for
the corresponding data set. Compared with the existing RBFNN based
GRNN and PNN, the network includes an additional layer and an
additional calculation term. GCNN has been tested with 9 popular
data sets. Classification results show that GCNN performs better
than both standard and optimized GRNN and PNN, and better than MLP
and RBFNN on all data sets except Haberman's survival. This
improvement is provided by the addition of the diverge effect term,
the smoothing parameter optimization approach and the competition
ability of the output layer of GCNN. Another reason for better
classification results than PNN is GCNN's function approximation
based structure. Unlike the fixed smoothing parameters of GRNN and
PNN, the training algorithm adjusts the smoothing parameters for
each training datum. The diverge effect term improves classification
performance by increasing the distance among data that belong to
different classes. The output layer, with its competition ability,
decides the effective update for the smoothing parameter.
The memory problem of RBF based neural networks such as GRNN
and PNN, which arises from assigning one neuron to each
training datum, still exists with GCNN. Future work is required
to overcome this large memory requirement. In order to
use GCNN effectively, initial values of the smoothing parameters
should be selected in accordance with the data sets. GCNN's
training time requirement can be seen as a problem; however,
better classification performance is achieved by optimizing the
smoothing parameter.
Finally, GCNN has three important advantages: a regression
based training method, a smoothing parameter that is adjusted
dynamically by the training method, and improved classification
performance through the diverge effect term. The test results
demonstrate the effectiveness of GCNN.
References
Adeli, H., & Panakkat, A. (2009). A probabilistic neural network for earthquake
magnitude prediction. Neural Networks, 22, 1018–1024.
Al-Daoud, E. (2009). A comparison between three neural network models for
classification problems. Journal of Artificial Intelligence, 2, 56–64.
Amrouche, A., & Rouvaen, J. M. (2006). Efficient system for speech recognition using
general regression neural network. International Journal of Computer Systems
Science and Engineering, 183–189.
Asad, B., Zhijiang, D., Lining, S., Reza, K., & Fereidoun, M. A. (2007). Fast 3D
reconstruction of ultrasonic images based on generalized regression neural
network. In World congress on medical physics and biomedical engineering.
Bartlett, P. L. (1998). The sample complexity of pattern classification with neural
networks: the size of the weights is more important than the size of the
network. IEEE Transactions on Information Theory, 44(2), 525–536.
Bennett, K. P., & Mangasarian, O. L. (1992). Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 23–34.
Berthold, M. R., & Diamond, J. (1998). Constructive training of probabilistic neural
networks. Neurocomputing, 19, 167–183.
Erkmen, B., & Yildirim, T. (2008). Improving classification performance of sonar
targets by applying general regression neural network with PCA. Expert Systems
with Applications, 35, 472–475.
Firat, M., & Gungor, M. (2009). Generalized regression neural networks and feed
forward neural networks for prediction of scour depth around bridge piers.
Advances in Engineering Software, 40, 731–737.
Frank, A., & Asuncion, A. (2010). UCI machine learning repository.
http://archieve.ics.uci.edu/ml.
Hajmeer, M., & Basheer, I. (2002). A probabilistic neural network approach for
modeling and classification of bacterial growth/no-growth data. Journal of
Microbiological Methods, 51, 217–226.
Hoya, T., & Chambers, J. A. (2001). Heuristic pattern correction scheme using
adaptively trained generalized regression neural networks. IEEE Transactions on
Neural Networks, 12, 1.
Kailun, H., Huijun, X., & Maohua, X. (2010). The application of probabilistic neural
network model in the green supply chain performance evaluation for pig
industry. In International conference on e-business and e-government.
Kayaer, K., & Yildirim, T. (2003). Medical diagnosis on pima indian diabetes using
general regression neural networks. In Artificial neural networks and neural
information processing.
Kiyan, T., & Yildirim, T. (2004). Breast cancer diagnosis using statistical neural
networks. Journal of Electrical & Electronics Engineering, 4(2), 1149–1153.
Lee, E. W. M., Lim, C. P., Yuen, R. K. K., & Lo, S. M. (2004). A hybrid neural
network model for noisy data regression. IEEE Transactions on Systems, Man, and
Cybernetics, 34(2), 951–960.
Madsen, K., Nielsen, H. B., & Tingleff, O. (2004). Methods for non-linear least
squares problems. Informatics and Mathematical Modeling Technical University
of Denmark.
Mangasarian, O. L., Setiono, R., & Wolberg, W. H. (1990). Pattern recognition via
linear programming: theory and application to medical diagnosis. In Large-scale
numerical optimization (pp. 22–30). SIAM Publications.
Mangasarian, O. L., & Wolberg, W. H. (1990). Cancer diagnosis via linear
programming. SIAM News, 23(5), 1–18.
Mao, K., Tan, K., & Set, W. (2000). Probabilistic neural-network structure
determination for pattern classification. IEEE Transactions on Neural Networks,
11(4), 1009–1016.
Masters, T., & Land, W. (1997). A new training algorithm for the general regression
neural network. In IEEE international conference on systems, man and cybernetics,
computational cybernetics and simulation. Vol. 3 (pp. 1990–1994).
Montana, D. (1992). A weighted probabilistic neural network. Advances in neural
information processing systems, 4, 1110–1117.
Mosier, P. D., & Jurs, P. C. (2002). QSAR/QSPR studies using probabilistic neural
networks and generalized regression neural networks. Journal of Chemical
Information and Computer Sciences, 42, 1460–1470.
Popescu, I., Kanatas, A., Constantinou, P., & Nafornita, I. (2002). Application of
general regression neural networks for path loss prediction. In Proceedings of
international workshop trends and recent achievements in information technology.
Ren, S., Yang, D., Ji, F., & Tian, X. (2010). Application of generalized regression neural
network in prediction of cement properties. In 2010 International conference on
computer design and applications.
Rutkowski, L. (2004). Adaptive probabilistic neural networks for pattern classification in time-varying environment. IEEE Transactions on Neural Networks, 15(4),
811–827.
Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3, 109–118.
Specht, D. F. (1991). A general regression neural network. IEEE Transactions on Neural
Networks, 2(6), 568–576.
Tomandl, D., & Schober, A. (2001). A modified general regression neural network
with new efficient training algorithms as a robust black box-tool for data
analysis. Neural Networks, 14, 1023–1034.
Wang, Z., & Sheng, H. (2010). Rainfall prediction using generalized regression
neural network: case study Zhengzhou. In 2010 International conference on
computational and information sciences.
Wolberg, W. H., & Mangasarian, O. (1990). Multisurface method of pattern
separation for medical diagnosis applied to breast cytology. Proceedings of the
National Academy of Sciences, 87, 9193–9196.
Yildirim, T., & Cigizoglu, H. K. (2002). Comparison of generalized regression neural
network and MLP performances on hydrologic data forecasting. In International
conference on neural information processing.
Yoo, P. D., Sikder, A. R., Zhou, B. B., & Zomaya, A. Y. (2007). Improved
general regression network for protein domain boundary prediction. In Sixth
international conference on bioinformatics.
Zhao, S., Zhang, J., Li, X., & Song, W. (2007). A generalized regression neural network
based on fuzzy means clustering and its application in system identification. In
International symposium on information technology convergence.
Zhu, C., & Hao, Z. (2009). Application of probabilistic neural network model in
evaluation of water quality. In International conference on environmental science
and information application technology.