ICRTCIS 2020 Paper 57-4-5
posed weight initialization as New Weight Initialization (NWI) routines and label
them as NWI1 to mean a = 1, and NWI½ to mean a = ½.
Table 1. Function approximation problems and the associated network architecture. (I: Number
of inputs, H: Number of hidden nodes, O: Number of output nodes)
Task  Function                                                                  I   H   O
1     F1(x) = 1/((x - 0.3)^2 + 0.01) + 1/((x - 0.9)^2 + 0.04) - 6               1   20  1
      where x in (0.1, 1)
2     F2(x, y) = 3(1 - x)^2 e^(-x^2 - (y+1)^2)                                  2   15  1
      - 10(x/5 - x^3 - y^5) e^(-x^2 - y^2) - (1/3) e^(-(x+1)^2 - y^2)
      where x, y in (-3, 3)
3     F3(x, y) = e^(x sin(pi y))                                                2   10  1
      where x, y in (-1, 1)
4     F4(x, y) = (1 + sin(2x + 3y)) / (3.5 + sin(x - y))                        2   64  1
      where x, y in (-2, 2)
5     F5(x, y) = 42.659 (0.1 + x (0.05 + x^4 - 10 x^2 y^2 + 5 y^4))             2   17  1
      where x, y in (-0.5, 0.5)
6     F6(x, y) = 1.9 (1.35 + e^x sin(13 (x - 0.6)^2) e^(-y) sin(7y))            2   18  1
      where x, y in (0, 1)
7     F7(x, y) = sin(2 pi sqrt(x^2 + y^2))                                      2   24  1
      where x, y in (-1, 1)
8     F8(x1, ..., x6) = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2                      6   30  1
      + 10 x4 + 5 x5 + 0 x6
      where x1, ..., x6 in (-1, 1)
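As a concrete illustration, the eight benchmark functions of Table 1 can be coded directly. The sketch below uses NumPy; where the printed formulas are ambiguous, the constants follow the standard forms of these well-known function-approximation benchmarks, so treat them as assumptions rather than the paper's exact definitions:

```python
import numpy as np

def f1(x):
    # Two sharp peaks near x = 0.3 and x = 0.9 (the "humps" function).
    return 1.0 / ((x - 0.3) ** 2 + 0.01) + 1.0 / ((x - 0.9) ** 2 + 0.04) - 6.0

def f2(x, y):
    # MATLAB-style "peaks" surface.
    return (3 * (1 - x) ** 2 * np.exp(-x ** 2 - (y + 1) ** 2)
            - 10 * (x / 5 - x ** 3 - y ** 5) * np.exp(-x ** 2 - y ** 2)
            - np.exp(-(x + 1) ** 2 - y ** 2) / 3)

def f3(x, y):
    return np.exp(x * np.sin(np.pi * y))

def f4(x, y):
    return (1 + np.sin(2 * x + 3 * y)) / (3.5 + np.sin(x - y))

def f5(x, y):
    return 42.659 * (0.1 + x * (0.05 + x ** 4 - 10 * x ** 2 * y ** 2 + 5 * y ** 4))

def f6(x, y):
    return 1.9 * (1.35 + np.exp(x) * np.sin(13 * (x - 0.6) ** 2)
                  * np.exp(-y) * np.sin(7 * y))

def f7(x, y):
    return np.sin(2 * np.pi * np.sqrt(x ** 2 + y ** 2))

def f8(x1, x2, x3, x4, x5, x6):
    # Friedman-style additive function; x6 enters with coefficient 0,
    # i.e. it is a pure noise input.
    return (10 * np.sin(np.pi * x1 * x2) + 20 * (x3 - 0.5) ** 2
            + 10 * x4 + 5 * x5 + 0 * x6)
```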
For each learning task, a set of 1000 data points was generated from the domain
of the inputs using a uniform random number generator. These points, together with
the corresponding output values of the function, constitute the data set for the exper-
iments. Both the inputs and the outputs are scaled to the interval [-1, 1] (min-max
normalization). All experiments are conducted on these scaled variables and the re-
sults are reported for the same. The 1000-tuple data set is divided into two parts: 500
tuples constitute the training set (TRS) and the remaining 500 tuples constitute the
test data set (TES).
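The data-generation, scaling, and splitting procedure described above can be sketched as follows. The choice of F7 as the target and the random seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is arbitrary

def target(x, y):
    # Any of the Table 1 functions could stand here; F7 is used as an example.
    return np.sin(2 * np.pi * np.sqrt(x ** 2 + y ** 2))

# 1000 input points drawn uniformly from the function's domain, here (-1, 1).
X = rng.uniform(-1.0, 1.0, size=(1000, 2))
t = target(X[:, 0], X[:, 1])

def minmax_scale(a):
    # Min-max normalization onto [-1, 1], as described in the text.
    lo, hi = a.min(axis=0), a.max(axis=0)
    return 2.0 * (a - lo) / (hi - lo) - 1.0

X_scaled = minmax_scale(X)
t_scaled = minmax_scale(t)

# 500/500 split into the training set (TRS) and test set (TES).
X_trs, t_trs = X_scaled[:500], t_scaled[:500]
X_tes, t_tes = X_scaled[500:], t_scaled[500:]
```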
The architecture was fixed by exploratory experiments in which the number of
nodes in the hidden layer was varied from 1 to 100, and the smallest network that
provided a satisfactory error of training was taken as the architecture for further ex-
perimentation. The architecture summary is a part of Table 1. The exploratory exper-
iments were conducted for 500 epochs of training.
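The exploratory search for the smallest adequate hidden layer can be sketched as below. The function `training_error` is a hypothetical stand-in: a real implementation would train the I-H-O network for 500 epochs and return its training-set MSE, and the error threshold is an assumed parameter not stated in the text:

```python
def training_error(hidden_nodes, epochs=500):
    # Placeholder: a real implementation would train the network for
    # `epochs` epochs and return its MSE on the training set. A toy
    # monotonically decreasing curve is used here purely so the sketch runs.
    return 1.0 / (hidden_nodes + 1)

def smallest_adequate_network(threshold, max_hidden=100):
    # Vary the hidden layer from 1 to 100 nodes and keep the smallest
    # network whose training error is satisfactory (<= threshold).
    for h in range(1, max_hidden + 1):
        if training_error(h) <= threshold:
            return h
    return max_hidden  # fall back to the largest size tried

print(smallest_adequate_network(0.05))  # -> 19 with the toy error curve
```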
For each learning task (function approximation problem), an ensemble of 50 initial
networks is created by initializing the weights / thresholds with each of the weight
initialization routines, namely: (a) URW, (b) NWI1, and (c) NWI½. Thus, for each
problem we have 150 networks, and with 8 tasks, a total of 1200 networks is trained.
For the detailed experiments comparing the weight initialization techniques, training
is conducted for 2000 epochs.
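The construction of the 150-network ensemble for one task can be sketched as follows. Only the parameter a of the proposed routines (a = 1 for NWI1, a = ½ for NWI½) is given in this section, so both the URW interval and the NWI rule below are hypothetical stand-ins, not the paper's actual formulas:

```python
import numpy as np

rng = np.random.default_rng(1)

def urw(shape):
    # Assumed form of URW: weights/thresholds uniform on a symmetric interval.
    return rng.uniform(-1.0, 1.0, size=shape)

def make_nwi(a):
    # Hypothetical stand-in for the proposed NWI routine; only its
    # parameter a is known from the text here.
    def init(shape):
        return a * rng.uniform(-1.0, 1.0, size=shape)
    return init

initializers = {"URW": urw, "NWI1": make_nwi(1.0), "NWI1/2": make_nwi(0.5)}

# 50 initial networks per routine -> 150 networks for this task; with
# 8 tasks this gives the 1200 networks mentioned in the text. For an
# I-H-O network the parameters are an (I, H) and an (H, O) weight
# matrix plus the hidden and output threshold vectors.
I, H, O = 2, 15, 1
ensembles = {
    name: [
        {"W1": init((I, H)), "b1": init((H,)),
         "W2": init((H, O)), "b2": init((O,))}
        for _ in range(50)
    ]
    for name, init in initializers.items()
}
```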
For the measurement of the errors of training (over the training data set) and the
generalization error (over the test data set) we use the mean squared error measure
(MSE). Since for each weight initialization technique, the ensemble of networks con-
tains 50 networks (for each problem), the average of the MSEs for the ensemble is
reported as MMSE. We also report the median of the ensemble MSE as MeMSE, as
the median is deemed to be a more robust estimator of central tendency [14].
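The error measures can be stated compactly in code. The sketch below computes the per-network MSE and the ensemble-level MMSE and MeMSE; the toy five-network example illustrates why the median is the more robust summary:

```python
import numpy as np

def mse(pred, target):
    # Mean squared error over a data set (TRS or TES).
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))

def ensemble_summary(mses):
    # MMSE: mean of the per-network MSEs of the ensemble;
    # MeMSE: their median, a more robust estimate of central tendency.
    mses = np.asarray(mses, dtype=float)
    return {"MMSE": float(mses.mean()), "MeMSE": float(np.median(mses))}

# Toy usage with 5 networks instead of 50; one network trained badly.
errors = [0.10, 0.12, 0.11, 0.50, 0.09]
summary = ensemble_summary(errors)
# The outlier drags MMSE up to 0.184, while MeMSE stays at 0.11.
```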
The values of the MMSE and MeMSE allow us to compare the different initiali-
zation techniques. To assess whether the differences in the MMSE values are statisti-
cally significant, we use the one-tailed Student's t-test [15,16] at a significance level
of 0.05; similarly, to assess the statistical significance of the differences in MeMSE
for the different weight initialization techniques, we use the one-tailed Wilcoxon
rank-sum test [15,16] at a significance level of 0.05.
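Both significance tests are available in SciPy. The sketch below applies them to two hypothetical ensembles of per-network MSEs (the sample values are illustrative, not the paper's results):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Two hypothetical ensembles of 50 per-network MSEs; routine A is clearly better.
mse_a = rng.normal(loc=0.05, scale=0.01, size=50)
mse_b = rng.normal(loc=0.10, scale=0.01, size=50)

# One-tailed t-test on the means (H1: mean MSE of A < mean MSE of B).
t_stat, t_p = stats.ttest_ind(mse_a, mse_b, alternative="less")

# One-tailed Wilcoxon rank-sum test for the corresponding median comparison.
w_stat, w_p = stats.ranksums(mse_a, mse_b, alternative="less")

alpha = 0.05
print(t_p < alpha, w_p < alpha)  # both differences are significant here
```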
5 Results
The experimental results for training (over the training data set, TRS) and the general-
ization error (over the test data set, TES) are summarized in Table 2 and Table 3, re-
spectively. From these tables, it is clear that the proposed weight initialization rou-
tines have better training behavior (that is, they achieve lower values of MMSE and
MeMSE) and also have better generalization ability. From Table 2, we observe that in
6 of the tasks NWI½ performs the best during training, while in 2 tasks NWI1 has the
best performance. A similar trend is observed in the generalization error summary
(Table 3), though in this case, for F8, on the basis of MMSE, NWI1 performs better
than NWI½. A similar trend is shown on the basis of the calculated ratios. The ratios reflect