
demonstrated by comparing the training and generalization errors for 8 function approximation tasks. Any supervised learning task can be considered a regression problem, and the function approximation tasks are equivalent to regression tasks without the error of estimation or measurement [4]. That is, function approximation may be considered a regression problem without any error in the measurement of the independent variables and is therefore a suitable task for demonstrating the learning methodology.
The paper is organized as follows: Section 2 describes the architecture design of
the ANNs used in the experiments. Section 3 describes the weight initialization rou-
tines. Section 4 describes the design of experiments. The results are presented in Section 5, while the conclusions are presented in Section 6.

2 ANN Architecture

The ANN architecture used in this work is the feed-forward architecture without short-cut connections. That is, the nodes in the layers between the input and the output layer are connected only to node(s) of the immediately preceding and succeeding layers. The universal approximation results for feed-forward ANNs require at least one hidden layer with a sufficient number of non-linear (sigmoidal) nodes (see [5] for a survey and further references). Since the minimum number of hidden layers required for the universal approximation property is one, in this work we use networks with a single hidden layer of sigmoidal nodes. Fig. 1 shows the schematic diagram of a single hidden layer network used in this work. In general, more than one node may be present in the output layer.

Fig. 1. The schematic diagram of a single hidden layer feed-forward network with I inputs and
one output (O).

The weights connecting the input nodes to the hidden layer nodes are denoted by $\omega_{ij}$, and the thresholds of the hidden layer and output layer nodes are denoted by $\theta_i$ and $\gamma_i$, respectively. The net input to the $i$th hidden layer node (with $x_j$ being the $j$th input) is:

$n_i = \sum_{j=1}^{I} \omega_{ij} x_j + \theta_i$; for $i = 1, 2, \ldots, H$    (1)

The output of the ith hidden layer node is:


$h_i = \sigma(n_i)$; for $i = 1, 2, \ldots, H$    (2)

Where $\sigma(\cdot)$ is the activation function. The activation function used is the logistic or the log-sigmoid activation function defined as:
$\sigma(x) = \dfrac{1}{1 + e^{-x}}$    (3)

The derivative of this function is:


$\dfrac{d\sigma(x)}{dx} = \dfrac{1}{1 + e^{-x}}\left(1 - \dfrac{1}{1 + e^{-x}}\right) = \sigma(x)\,(1 - \sigma(x))$    (4)
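To make Eqs. (1)-(4) concrete, the following is a minimal NumPy sketch of the forward pass through such a single-hidden-layer network. The array names and shapes are assumptions of this example, and the output layer is taken here to be linear since its activation is not specified in this excerpt.

```python
import numpy as np

def sigmoid(x):
    """Logistic (log-sigmoid) activation of Eq. (3)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the logistic function, Eq. (4): sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def forward(x, W_hidden, theta, W_output, gamma):
    """Forward pass through a single-hidden-layer feed-forward network.

    x        : (I,)   input vector
    W_hidden : (H, I) input-to-hidden weights (omega_ij)
    theta    : (H,)   hidden layer thresholds
    W_output : (O, H) hidden-to-output weights
    gamma    : (O,)   output layer thresholds
    """
    n = W_hidden @ x + theta        # net input to the hidden nodes, Eq. (1)
    h = sigmoid(n)                  # hidden layer outputs, Eq. (2)
    return W_output @ h + gamma     # assumed linear output layer
```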

3 Weight Initialization Routines

The generally used weight initialization routine is to draw uniform random numbers from the range [-1, 1]. This weight initialization routine may be called the Unit Random Weight (URW) initialization routine. This mechanism of weight initialization forms the baseline against which the proposed weight initialization routines are validated. The proposed new weight initialization routine initializes the hidden layer node thresholds as:
$\theta_i = \dfrac{2i - 1}{H} - 1$; for $i = 1, 2, \ldots, H$    (5)

Where H is the number of hidden layer nodes. The input-to-hidden layer weights are initialized to uniform random numbers taken from the range $[-\lambda_1, \lambda_1]$, where:

$\lambda_1 = \dfrac{1}{H\sqrt{I}}$    (6)

Where H is the number of hidden layer nodes and I is the number of inputs to the network.
Similarly, for the output layer nodes, the thresholds are initialized as:
$\gamma_i = \dfrac{2i - 1}{O} - 1$; for $i = 1, 2, \ldots, O$    (7)

Where O is the number of output layer nodes. The hidden layer to output layer weights are initialized to uniform random numbers taken from the range $[-\lambda_2, \lambda_2]$, where:

$\lambda_2 = \dfrac{1}{O\sqrt{H}}$    (8)

Where H is the number of hidden layer nodes and O is the number of outputs of the network.
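As a concrete sketch of the two routines compared in this section, the NumPy code below implements the baseline URW initialization and the proposed initialization of Eqs. (5)-(8). The function names and the returned dictionary layout are assumptions of this example, and the expressions for $\lambda_1$ and $\lambda_2$ follow the reconstruction given above.

```python
import numpy as np

def urw_init(n_inputs, n_hidden, n_outputs, rng=None):
    """Baseline URW routine: all weights and thresholds uniform in [-1, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    I, H, O = n_inputs, n_hidden, n_outputs
    return {
        "W_hidden": rng.uniform(-1.0, 1.0, size=(H, I)),
        "theta":    rng.uniform(-1.0, 1.0, size=H),
        "W_output": rng.uniform(-1.0, 1.0, size=(O, H)),
        "gamma":    rng.uniform(-1.0, 1.0, size=O),
    }

def proposed_init(n_inputs, n_hidden, n_outputs, rng=None):
    """Proposed routine of Eqs. (5)-(8): evenly spaced thresholds, scaled random weights."""
    rng = np.random.default_rng() if rng is None else rng
    I, H, O = n_inputs, n_hidden, n_outputs
    theta = (2.0 * np.arange(1, H + 1) - 1.0) / H - 1.0   # Eq. (5)
    gamma = (2.0 * np.arange(1, O + 1) - 1.0) / O - 1.0   # Eq. (7)
    lam1 = 1.0 / (H * np.sqrt(I))                         # Eq. (6), as reconstructed
    lam2 = 1.0 / (O * np.sqrt(H))                         # Eq. (8), as reconstructed
    return {
        "W_hidden": rng.uniform(-lam1, lam1, size=(H, I)),
        "theta":    theta,
        "W_output": rng.uniform(-lam2, lam2, size=(O, H)),
        "gamma":    gamma,
    }
```

For H = 5, for instance, Eq. (5) as reconstructed places the hidden thresholds at -0.8, -0.4, 0.0, 0.4 and 0.8, i.e. evenly spread over the interval (-1, 1), while the random weights shrink as the layer widths grow.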
