JOURNAL OF COMPUTING, VOLUME 3, ISSUE 2, FEBRUARY 2011, ISSN 2151-9617
Abstract— Many previous works have studied how to improve the training efficiency of Artificial Neural Network algorithms. This paper presents a new approach to improving the training efficiency of back propagation neural network algorithms. The proposed algorithm (GDM/AG) adaptively modifies the gradient based search direction by introducing a gain parameter into the activation function. It is shown that this modification significantly enhances the computational efficiency of the training process. The proposed algorithm is generic and can be implemented in almost all gradient based optimization processes. The robustness of the proposed algorithm is shown by comparing the convergence rates and the effectiveness of gradient descent methods using the proposed method on heart disease data.

Index Terms— Back propagation, search direction, adaptive gain, effectiveness, computational efficiency.
1 INTRODUCTION
based optimization processes. This research will improve data mining techniques, particularly Artificial Neural Networks (ANN), in efficiently extracting hidden knowledge (patterns and relationships) associated with heart disease from a historical heart disease database.
3 THE PROPOSED METHOD

In this section, a novel approach for improving the training efficiency of back propagation neural network algorithms is proposed. The proposed algorithm modifies the initial search direction by changing the gain value adaptively for each node. The following subsection describes the algorithm and explores the advantages of using an adaptive gain value. Gain update expressions, as well as weight and bias update expressions for output and hidden nodes, are also proposed. These expressions have been derived using the same principles as those used in deriving the weight update expressions.
The following iterative algorithm is proposed for changing the gradient based search direction using a gain value.

Initialize the weight vector with random values and the vector of gain values with unit values. Repeat the following steps 1 and 2 on an epoch-by-epoch basis until the given error minimization criteria are satisfied.

Step 1: By introducing the gain value into the activation function, calculate the gradient of error with respect to the weights using Equation (5), and the gradient of error with respect to the gain parameter using Equation (7).

Step 2: …
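To make this two-step structure concrete, the following minimal Python sketch trains a single sigmoid output node with a gain parameter on toy data. It is an illustration under stated assumptions, not the authors' implementation: the toy data, learning rate, and epoch limit are arbitrary, and the gradient expressions are the single-output-node case of Equations (5), (7) and (8) derived below.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: 4 patterns with 3 input features each (illustrative only).
rng = np.random.default_rng(0)
X = rng.random((4, 3))
T = np.array([0.0, 1.0, 1.0, 0.0])

w = rng.standard_normal(3) * 0.5   # initialize weights with random values
c = 1.0                            # initialize gain with unit value
eta = 0.5                          # learning rate

for epoch in range(10_000):
    net = X @ w                        # net input to the node
    o = sigmoid(c * net)               # Eq. (2): activation with gain
    err = 0.5 * np.sum((T - o) ** 2)   # Eq. (1): error function
    if err < 0.05:                     # error minimization criterion
        break
    # Step 1: gradients of E w.r.t. weights and gain, specialized to a
    # single sigmoid output node (assumed form of Eqs. 5 and 7).
    delta = (T - o) * o * (1.0 - o) * c
    grad_w = X.T @ delta               # one gradient term per weight
    grad_c = np.sum(delta * net / c)   # Eq. (8) without the learning rate
    # Step 2: update weights and gain for the next epoch
    w += eta * grad_w
    c += eta * grad_c
```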
Suppose that for a particular input pattern $o^0$, the desired output is the teacher pattern $t = [t_1 \ldots t_n]^T$, and the actual output is $o_k^L$, where $L$ denotes the output layer. The error function on that pattern is defined as

$E = \frac{1}{2}\sum_k (t_k - o_k^L)^2$   (1)

Let $o_k^s$ be the activation value for the $k$th node of layer $s$, and let $o^s = [o_1^s \ldots o_n^s]^T$ be the column vector of activation values in layer $s$, with the input layer as layer 0. Let $w_{ij}^s$ be the weight value for the connecting link between the $i$th node in layer $s-1$ and the $j$th node in layer $s$, and let $w_j^s = [w_{1j}^s \ldots w_{nj}^s]^T$ be the column vector of weights from layer $s-1$ to the $j$th node of layer $s$. The net input to the $j$th node of layer $s$ is defined as $net_j^s = \langle w_j^s, o^{s-1} \rangle = \sum_k w_{j,k}^s o_k^{s-1}$, and let $net^s = [net_1^s \ldots net_n^s]^T$ be the column vector of net input values in layer $s$. The activation value for a node is given by a function of its net input and the gain parameter $c_j^s$:

$o_j^s = f(c_j^s \, net_j^s)$   (2)

where $f$ is any function with a bounded derivative. Apart from the gain parameter, this is the notation used in the standard back propagation algorithm [18].

This information is now used to derive an expression for modifying gain values for the next epoch. Most gradient based optimization methods use the following gradient descent rule:

$\Delta w_{ij}^{(n)} = -\eta^{(n)} \frac{\partial E}{\partial w_{ij}^{(n)}}$   (3)

where $\eta^{(n)}$ is the learning rate value at step $n$, and the gradient based search direction at step $n$ is $d^{(n)} = -\frac{\partial E}{\partial w_{ij}^{(n)}} = g^{(n)}$. In the proposed method, the gradient based search direction is modified by including the variation of the gain value, to yield

$d^{(n)} = -\frac{\partial E}{\partial w_{ij}^{(n)}}(c_j^{(n)}) = g^{(n)}(c_j^{(n)})$   (4)

The derivation of the procedure for calculating the gain value is based on the gradient descent algorithm. The error function as defined in Equation (1) is differentiated with respect to the weight value $w_{ij}^s$. The chain rule yields

$\frac{\partial E}{\partial w_{ij}^s} = \frac{\partial E}{\partial net^{s+1}} \cdot \frac{\partial net^{s+1}}{\partial o_j^s} \cdot \frac{\partial o_j^s}{\partial net_j^s} \cdot \frac{\partial net_j^s}{\partial w_{ij}^s}$   (5)

It should be noted that the iterative formula described in Equation (6) for calculating $\delta_j^s$ is the same as that used in the standard back propagation algorithm [18], except for the appearance of the gain value in the expression. The learning rule for calculating weight values as given in Equation (3) is derived by combining (5) and (6).
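As a reading aid for this notation, the sketch below (a construction for this rewrite, not code from the paper) performs the forward pass of Equation (2) and computes the error deltas layer by layer for a sigmoid $f$. Since Equation (6) itself falls outside the text reproduced here, the delta expressions follow the standard back propagation recursion [18] with the gain factor applied, which is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(o0, weights, gains):
    """Forward pass: o^s = f(c^s * net^s) for each layer s (Eq. 2)."""
    os, nets = [o0], []
    for W, c in zip(weights, gains):
        net = os[-1] @ W          # net_j^s = sum_k w_{j,k}^s o_k^{s-1}
        nets.append(net)
        os.append(sigmoid(c * net))
    return os, nets

def backward(t, os, nets, weights, gains):
    """Backward pass: per-layer deltas with the gain factor (assumed form)."""
    deltas = []
    # Output layer: delta^L = (t - o^L) f'(c net) c; for a sigmoid,
    # f'(c net) = o (1 - o) with o = sigmoid(c net).
    fprime = os[-1] * (1.0 - os[-1])
    deltas.append((t - os[-1]) * fprime * gains[-1])
    # Hidden layers: delta^s = (sum_k delta_k^{s+1} w_{k,j}^{s+1}) f'(c net) c
    for s in range(len(weights) - 2, -1, -1):
        fprime = os[s + 1] * (1.0 - os[s + 1])
        deltas.insert(0, (deltas[0] @ weights[s + 1].T) * fprime * gains[s])
    return deltas
```

The weight gradient of Equation (5) is then the outer product of the previous layer's activations with these deltas, for example `np.outer(os[s], deltas[s])` for the weights feeding layer $s+1$.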
In this approach, the gradient of error with respect to the gain parameter can also be calculated using the chain rule as previously described; it is easy to compute as

$\frac{\partial E}{\partial c_j^s} = \Big(\sum_k \delta_k^{s+1} w_{k,j}^{s+1}\Big) f'(c_j^s \, net_j^s) \, net_j^s$   (7)

Then the gradient descent rule for the gain value becomes

$\Delta c_j^s = \eta \, \delta_j^s \, \frac{net_j^s}{c_j^s}$   (8)

At the end of every epoch the new gain value is updated using a simple gradient based method, as given by the following formula:

$c_j^{new} = c_j^{old} + \Delta c_j^s$   (9)
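Equations (8) and (9) translate directly into a per-epoch update. A minimal sketch follows, assuming `deltas` and `nets` are the per-layer vectors produced by a backward pass like the one above:

```python
import numpy as np

def update_gains(gains, deltas, nets, eta):
    """Per-epoch gain update, Eqs. (8)-(9)."""
    new_gains = []
    for c, delta, net in zip(gains, deltas, nets):
        delta_c = eta * delta * net / c   # Eq. (8): gradient descent rule for the gain
        new_gains.append(c + delta_c)     # Eq. (9): gain value for the next epoch
    return new_gains
```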
3.2 Implementation of the proposed method with gradient descent method

… dataset (364 records). The records for each set were selected randomly in order to avoid bias. For consistency, only categorical attributes are used for the neural network models. All the medical attributes in Table 1 were transformed from numerical to categorical data. The attribute "Diagnosis" was identified as the predictable attribute, with value '1' for patients with heart disease and value '0' for patients with no heart disease. The attribute "PatientID" was used as the key; the rest were used as input attributes. It is assumed that missing, inconsistent and duplicate data have been resolved.

TABLE 1. Description of attributes

Predictable attribute:
1. Diagnosis (value 0: <50% diameter narrowing (no heart disease); value 1: >50% diameter narrowing (has heart disease))
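A sketch of this preparation in Python/pandas is shown below; only "PatientID" and "Diagnosis" are attribute names taken from the text, while "age", "chol", the records, and the bin boundaries are hypothetical stand-ins for the medical attributes of Table 1.

```python
import pandas as pd

# Hypothetical records; the real attribute list comes from Table 1.
df = pd.DataFrame({
    "PatientID": [1, 2, 3, 4],
    "age":       [29, 54, 61, 40],
    "chol":      [204, 239, 268, 199],
    "Diagnosis": [0, 1, 1, 0],    # 1 = heart disease, 0 = no heart disease
})

df = df.set_index("PatientID")    # "PatientID" is used as the key
# Transform numerical attributes into categorical ones
df["age"]  = pd.cut(df["age"],  bins=3, labels=["young", "middle", "old"])
df["chol"] = pd.cut(df["chol"], bins=2, labels=["normal", "high"])

y = df["Diagnosis"]               # predictable attribute
X = df.drop(columns="Diagnosis")  # the rest are input attributes
```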
4.2 Validating Algorithm Effectiveness

The effectiveness of each algorithm was tested using a Classification Matrix, which displays the frequency of correct and incorrect predictions by comparing the predicted values with the actual values of the predictable attribute.

Counts for traingdm:

Predicted | 0 (Actual) | 1 (Actual)
0 | 220 | 62
1 | 35 | 184

Patients with no heart disease, predicted as having no heart disease: 220 (correct)
Patients with heart disease, predicted as having no heart disease: 62 (incorrect)
Patients with heart disease, predicted as having heart disease: 184 (correct)
Patients with no heart disease, predicted as having heart disease: 35 (incorrect)

For GDM:

Patients with no heart disease, predicted as having no heart disease: 211 (correct)
Patients with heart disease, predicted as having no heart disease: 20 (incorrect)

Table 2 summarizes the results of all three algorithms. The proposed algorithm (GDM/AG) appears to be the most effective, as it has the highest percentage of correct predictions, followed by GDM and traingdm. However, traingdm appears to be the most effective for predicting patients with no heart disease (89.4%) compared to the other algorithms.
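These counts can be reproduced mechanically. A small sketch follows, using only the traingdm row recoverable above:

```python
import numpy as np

def classification_matrix(actual, predicted):
    """2x2 classification (confusion) matrix: rows = predicted class,
    columns = actual class, matching the 'Counts for traingdm' layout."""
    m = np.zeros((2, 2), dtype=int)
    for a, p in zip(actual, predicted):
        m[p, a] += 1
    return m

# The 'predicted 0' row for traingdm: 220 actual-0 patients (correct)
# and 62 actual-1 patients (incorrect).
actual    = [0] * 220 + [1] * 62
predicted = [0] * (220 + 62)
print(classification_matrix(actual, predicted))  # [[220 62], [0 0]]
```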
For each training dataset, 100 different trials were run, each with a different initial random set of weights. For each run, the number of iterations required for convergence was reported. For an experiment of 100 runs, the mean of the number of iterations, the standard deviation, and the number of failures were collected. A failure occurs when the network exceeds the maximum iteration limit: each experiment is run to one thousand iterations, except for back propagation, which is run to ten thousand iterations; otherwise it is halted and the run is reported as a failure. Networks that fail to converge are obviously excluded from the calculations of the mean and standard deviation, but are reported as failures. Convergence is achieved when the outputs of the network conform to the error criterion as compared to the desired outputs.

TABLE 3. Algorithm results: heart disease classification problem (target error = 0.05)

Algorithm | … | CPU time (s) | Generalization accuracy (%) | Mean number of epochs | Number of failures
traingdm | 2.69 x 10^-2 | 43.30 | 94.01 | 3467 | 14
GDM | 4.59 x 10^-2 | 34.60 | 94.32 | 989 | 4
GDM/AG | 3.69 x 10^-2 | 21.42 | 94.45 | 487 | 3

Table 3 shows that the proposed algorithm (GDM/AG) outperforms the other algorithms in terms of CPU time and number of epochs. The proposed algorithm required only 487 epochs and 21.4203 seconds of CPU time to achieve the target error, whereas GDM required 989 epochs and 34.5935 seconds.
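The trial protocol described above reduces to a simple loop. In this sketch, `train_once` is a hypothetical callable that returns the epoch count at convergence, or None when the iteration limit is exceeded:

```python
import numpy as np

def run_trials(train_once, n_trials=100, max_epochs=1000):
    """Collect the mean number of epochs, the standard deviation, and the
    number of failures over repeated runs with different random weights."""
    epochs, failures = [], 0
    for seed in range(n_trials):
        result = train_once(seed, max_epochs)
        if result is None:
            failures += 1     # failed runs are excluded from mean/SD
        else:
            epochs.append(result)
    mean = float(np.mean(epochs)) if epochs else float("nan")
    sd = float(np.std(epochs)) if epochs else float("nan")
    return mean, sd, failures
```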
Fig. 2. 3D plot of the results for the heart disease classification problem.

Figure 2 shows the 3D plot for the results of the heart disease classification problem. The proposed algorithm (GDM/AG) shows better results because it converges in a smaller number of epochs, as indicated by the low value of the mean. Furthermore, the number of failures for GDM/AG is lower than for the other two algorithms. This makes the GDM/AG algorithm a better choice for this problem, since it had only 3 failures in 100 different runs.

5 CONCLUSION

A novel approach for improving the training efficiency of back propagation neural network algorithms by adaptively modifying the initial search direction is presented in this paper. The proposed algorithm uses the gain value to modify the initial search direction. The proposed algorithm is generic and can be implemented in all commonly used gradient based optimization processes. Classification Matrix methods were used to evaluate the effectiveness and the convergence speed of the proposed algorithm. All three algorithms were able to extract patterns in response to the predictable state. The most effective algorithm for predicting patients who are likely to have heart disease appears to be the proposed method (GDM/AG), followed by the other two algorithms. The results showed
that the proposed algorithm is stable and has the potential to significantly enhance the computational efficiency of the training process.

ACKNOWLEDGEMENT

This work was supported by Universiti Tun Hussein Onn Malaysia (UTHM).

REFERENCES

[1] A. van Ooyen and B. Nienhuis, "Improving the convergence of the back-propagation algorithm," Neural Networks, vol. 5, pp. 465-471, 1992.
[2] M. Ahmad and F. M. A. Salam, "Supervised learning using the Cauchy energy function," International Conference on Fuzzy Logic and Neural Networks, 1992.
[3] P. Chandra and Y. Singh, "An activation function adapting training algorithm for sigmoidal feedforward networks," Neurocomputing, vol. 61, pp. 429-437, 2004.
[4] A. Krzyzak, W. Dai, and C. Y. Suen, "Classification of large set of handwritten characters using modified back propagation model," Proceedings of the International Joint Conference on Neural Networks, vol. 3, pp. 225-232, 1990.
[5] S. H. Oh, "Improving the error backpropagation algorithm with a modified error function," IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 799-803, 1997.
[6] H.-M. Lee, T.-C. Huang, and C.-M. Chen, "Learning efficiency improvement of back propagation algorithm by error saturation prevention method," IJCNN '99, vol. 3, pp. 1737-1742, 1999.
[7] S.-H. Oh and Y. Lee, "A modified error function to improve the error back-propagation algorithm for multi-layer perceptrons," ETRI Journal, vol. 17, no. 1, pp. 11-22, 1995.
[8] S. M. Shamsuddin, M. Darus, and M. N. Sulaiman, "Classification of reduction invariants with improved back propagation," IJMMS, vol. 30, no. 4, pp. 239-247, 2002.
[9] S. C. Ng et al., "Fast convergence for back propagation network with magnified gradient function," Proceedings of the International Joint Conference on Neural Networks 2003, vol. 3, pp. 1903-1908, 2003.
[10] R. A. Jacobs, "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295-307, 1988.
[11] M. K. Weir, "A method for self-determination of adaptive learning rates in back propagation," Neural Networks, vol. 4, pp. 371-379, 1991.
[12] X. H. Yu, G. A. Chen, and S. X. Cheng, "Acceleration of backpropagation learning using optimized learning rate and momentum," Electronics Letters, vol. 29, no. 14, pp. 1288-1289, 1993.
[13] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[14] R. Fletcher and M. J. D. Powell, "A rapidly convergent descent method for minimization," British Computer J., pp. 163-168, 1963.
[15] R. Fletcher and R. M. Reeves, "Function minimization by conjugate gradients," Comput. J., vol. 7, no. 2, pp. 149-160, 1964.
[16] M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems," J. Research NBS, vol. 49, p. 409, 1952.
[17] H. Y. Huang, "A unified approach to quadratically convergent algorithms for function minimization," J. Optim. Theory Appl., vol. 5, pp. 405-423, 1970.
[18] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing, vol. 1, pp. 318-362, 1986.
[19] N. M. Nawi, M. R. Ransing, and R. S. Ransing, "An improved conjugate gradient based learning algorithm for back propagation neural networks," International Journal of Computational Intelligence, vol. 4, no. 1, pp. 46-55, March 2007.
[20] C. L. Blake, UCI Machine Learning Databases, http://mlearn.ics.uci.edu/database/heart-disease/, 2004.
Nazri Mohd Nawi received his B.S. degree in Computer Science from University of Science Malaysia (USM), Penang, Malaysia, and his M.Sc. degree in Computer Science from University of Technology Malaysia (UTM), Skudai, Johor, Malaysia. He received his Ph.D. degree from the Mechanical Engineering Department, Swansea University, Wales. He is currently a senior lecturer in the Software Engineering Department at Universiti Tun Hussein Onn Malaysia (UTHM). His research interests are in optimization, data mining techniques and neural networks.

… Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), since 2001. He holds a bachelor's degree in Computer Science from Universiti Putra Malaysia (UPM) and a Master's degree in Computer Science (Information Systems) from Universiti Teknologi Malaysia (UTM). He received his Ph.D. in decision tree modeling with incomplete information in classification tasks in data mining from Universite De La Rochelle, France. His research interests include rough set theory, artificial intelligence in data mining, and knowledge discovery.