the magnitude by which networks using the proposed weight initialization techniques
reach lower training and generalization errors.

Table 2. Summary error data over the training data set (TRS), that is, training error summary.
All statistics are reported × 10^-3. The lowest values of the statistics are in bold while the highest
value of the ratio is in bold.

                  Weight Initialization Routines     Ratio (not scaled)
Task  Statistic        URW       NWI1       NWI½    URW/NWI1   URW/NWI½
F1    MMSE          0.3690     0.0358     0.1591       10.30       2.32
      MeMSE         0.1730     0.0273     0.0305        6.34       5.67
F2    MMSE         12.0943     6.1430     4.8860        1.97       2.48
      MeMSE        10.9070     5.5014     4.3913        1.98       2.48
F3    MMSE          1.8148     1.4587     1.6432        1.24       1.10
      MeMSE         1.5874     1.4225     1.4678        1.12       1.08
F4    MMSE         13.0492     3.0386     1.9804        4.29       6.59
      MeMSE        13.5603     3.1490     1.9799        4.31       6.85
F5    MMSE          7.4851     5.2311     4.7519        1.43       1.58
      MeMSE         7.4279     5.1000     4.3876        1.46       1.69
F6    MMSE          3.8722     1.9926     1.8179        1.94       2.13
      MeMSE         3.1753     1.9385     1.6385        1.64       1.94
F7    MMSE         32.3349    13.8924    11.1025        2.33       2.91
      MeMSE        28.2194    10.8099     9.5618        2.61       2.95
F8    MMSE          0.4349     0.2044     0.1582        2.13       2.75
      MeMSE         0.4086     0.2044     0.1570        2.00       2.60
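
The ratio columns are the plain quotients of the URW statistic and the corresponding NWI statistic; because the common factor of 10^-3 cancels, the ratios are reported unscaled. The following minimal sketch recomputes the F2 ratios of Table 2. The function name `improvement_ratios` is hypothetical and introduced only for illustration. Ratios recomputed from the rounded table entries may differ from the printed ones in the last digit, presumably because the printed ratios were derived from unrounded values.

```python
# F2 rows of Table 2, copied as printed (values are reported x 10^-3).
F2_TRAINING = {
    "MMSE": (12.0943, 6.1430, 4.8860),   # (URW, NWI1, NWI1/2)
    "MeMSE": (10.9070, 5.5014, 4.3913),
}


def improvement_ratios(urw, nwi1, nwi_half):
    """Return (URW/NWI1, URW/NWI1/2), rounded to two decimals.

    A ratio above 1 means the proposed routine reached a lower error than URW.
    """
    return round(urw / nwi1, 2), round(urw / nwi_half, 2)


for statistic, row in F2_TRAINING.items():
    print(statistic, improvement_ratios(*row))
```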

Table 3. Summary error data over the test data set (TES), that is, generalization error summary.
All statistics are reported × 10^-3. The lowest values of the statistics are in bold while the highest
value of the ratio is in bold.

                  Weight Initialization Routines     Ratio (not scaled)
Task  Statistic        URW       NWI1       NWI½    URW/NWI1   URW/NWI½
F1    MMSE          0.3551     0.0374     0.1539        9.49       2.31
      MeMSE         0.1783     0.0296     0.0298        6.02       5.98
F2    MMSE         14.0219     8.1246     6.7738        1.73       2.07
      MeMSE        12.2396     7.3359     5.9354        1.67       2.06
F3    MMSE          1.9591     1.5776     1.7876        1.24       1.10
      MeMSE         1.7258     1.5573     1.5886        1.11       1.09
F4    MMSE         24.3807     7.5120     5.3736        3.25       4.54
      MeMSE        25.0304     7.5633     5.5282        3.31       4.53
F5    MMSE          9.5628     7.2219     6.6908        1.32       1.43
      MeMSE         8.8931     7.0395     6.4935        1.26       1.37
F6    MMSE          5.0938     3.0535     2.8572        1.67       1.78
      MeMSE         4.3172     2.8516     2.4871        1.51       1.74
F7    MMSE         38.8125    18.9459    14.7499        2.05       2.63
      MeMSE        33.8231    15.2652    13.1461        2.22       2.57
F8    MMSE          1.3582     0.8947     0.8976        1.52       1.51
      MeMSE         1.2674     0.8882     0.8776        1.43       1.44

As described in Section 4, to test the statistical significance of the differences in
MMSE values, a one-sided t-test is conducted at a significance level of 0.05. From these
tests the following is observed:
1. For the training data set, in all 8 tasks, NWI1 is statistically better than URW.
2. For the test data set, in all 8 tasks, NWI1 is statistically better than URW.
3. For the training data set, in 7 tasks (excluding F3), NWI½ is statistically better
than URW.
4. For the test data set, in 7 tasks (excluding F3), NWI½ is statistically better than
URW.
5. In no case, for either the training or the test data set, is NWI½ or NWI1
statistically worse than URW.
6. For the training data set, NWI1 is statistically better than NWI½ for two tasks
(F1 and F3), while NWI½ is better than NWI1 in 4 tasks (F2, F4, F7 and F8).
For tasks F5 and F6, the behavior of NWI½ and NWI1 on the training data set
is statistically comparable.
7. For the test data set, NWI1 is statistically better than NWI½ for two tasks (F1
and F3), while NWI½ is better than NWI1 in 3 tasks (F2, F4 and F7). For
tasks F5, F6 and F8, the behavior of NWI½ and NWI1 on the test data set
is statistically comparable.
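
A comparison of this kind can be sketched as follows. This is a minimal illustration, not the procedure of Section 4: it assumes Welch's unequal-variance form of the two-sample t-test, whereas the variant actually used is not stated here. The helper names `t_cdf` and `welch_one_sided` and the per-network error values are hypothetical. The Student-t distribution function is obtained by numerically integrating its density with Simpson's rule, which is adequate for moderate values of |t|.

```python
import math
from statistics import mean, variance


def t_cdf(t, df, steps=2000):
    """Student-t CDF via Simpson's rule on the density (steps must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    upper = abs(t)
    h = upper / steps
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                      # integral of the density over [0, |t|]
    return 0.5 + area if t >= 0 else 0.5 - area


def welch_one_sided(a, b):
    """Return (t, p) for the alternative hypothesis mean(a) < mean(b)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, t_cdf(t, df)


# Hypothetical final MSE values of a few trained networks per routine.
mse_nwi = [1.9, 2.1, 2.0, 1.8, 2.2, 2.0]
mse_urw = [3.7, 4.1, 3.9, 3.6, 4.0, 3.8]
t_stat, p_value = welch_one_sided(mse_nwi, mse_urw)
print("NWI statistically better than URW at 0.05:", p_value < 0.05)
```

The routine with the smaller mean error is passed first, so a p-value below 0.05 supports the statement that it is statistically better at the significance level used above.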
Similarly, the comparison on the basis of Wilcoxon's rank-sum test allows us to
observe the following:
1. For the training data set, in all 8 tasks, NWI1 is statistically better than URW.
2. For the training data set, in 7 tasks (all except F3), NWI½ is statistically better
than URW.
3. For the training data set, in no task does NWI1 or NWI½ have a median that is
higher than that of URW by a statistically significant margin.
4. For the training data set, NWI½ has a statistically significantly lower median
than NWI1 in 5 tasks (F2, F4, F5, F7 and F8), while in no task does NWI1 have
a statistically significantly lower median than NWI½.
5. For the test data set, in all 8 tasks, NWI1 is statistically better than URW.
6. For the test data set, in 7 tasks (all except F3), NWI½ is statistically better than
URW.
7. For the test data set, in no task does NWI1 or NWI½ have a median that is
higher than that of URW by a statistically significant margin.
8. For the test data set, NWI½ has a statistically significantly lower median than
NWI1 in 4 tasks (F2, F4, F5 and F7), while in no task does NWI1 have a
statistically significantly lower median than NWI½.
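
The rank-sum comparison can be sketched in the same spirit. The example below is a minimal illustration under stated assumptions: it uses the large-sample normal approximation of the rank-sum statistic without a tie correction of the variance, and the function name `rank_sum_p_less` and the error values are hypothetical. The exact variant of the test used for the observations above is not specified in this section.

```python
from statistics import NormalDist


def rank_sum_p_less(x, y):
    """One-sided Wilcoxon rank-sum p-value for 'x tends to be smaller than y'.

    Normal approximation; tied values receive average ranks, but the
    variance is not corrected for ties.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = w - n1 * (n1 + 1) / 2                   # Mann-Whitney U of sample x
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return NormalDist().cdf((u - mu) / sigma)


# Hypothetical final MSE values of a few trained networks per routine.
mse_nwi_half = [1.6, 1.7, 1.5, 1.9, 1.8, 1.4, 1.65, 1.75]
mse_nwi_one = [2.0, 1.95, 2.1, 1.85, 2.2, 2.05, 1.98, 2.3]
p_value = rank_sum_p_less(mse_nwi_half, mse_nwi_one)
print("lower median is statistically significant at 0.05:", p_value < 0.05)
```

Because the statistic depends only on ranks, a few very poor training runs affect it far less than they affect a comparison of means.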
These observations allow us to conclude that both NWI1 and NWI½ are
clearly better weight initialization methods than URW. On the basis of the comparison of
MeMSE (the median being a more robust estimator of central tendency), we recommend
the usage of NWI½ as a weight initialization method.

6 Conclusions

In this paper, two weight initialization methods are proposed and compared with uniform
weight initialization in the interval (-1,1) on a set of 8 function approximation
