
A proposal of neural network architecture for non-linear function approximation

Yoshiki Mizukami, Yuji Wakasa, Kanya Tanaka


Faculty of Engineering, Yamaguchi University
2-16-1 Tokiwadai, Ube, 755-8611, Japan
mizukami@eee.yamaguchi-u.ac.jp

Proceedings of the 17th International Conference on Pattern Recognition (ICPR’04)
1051-4651/04 $ 20.00 IEEE

Abstract

In this paper, a neural network architecture for non-linear function approximation is proposed. We point out three problems in non-linear function approximation with traditional neural networks: difficulty in analyzing the internal representation, lack of reproducibility in function approximation due to the random scheme for weight initialization, and insufficient generalization ability in learning without enough samples. Based on these considerations, we suggest three main improvements. The first is the design of a sigmoidal function with a localized derivative. The second is a deterministic scheme for weight initialization. The third is an updating rule for weight parameters. Simulation results show beneficial characteristics of the proposed method: low approximation error at the beginning of the iterative calculation, smooth convergence of the error, and easier analysis of the internal representation.

1. Introduction

Cybenko [1] gave a mathematical background for applying neural networks with sigmoidal functions to the problem of non-linear function approximation, and many researchers have utilized neural networks as one of the standard approximation tools. We believe, however, that there are mainly three problems to be solved in using traditional neural networks for function approximation: 1) difficulty in analyzing the internal representation, 2) no reproducibility in function approximation due to random weight initialization, and 3) insufficient generalization ability in learning without enough samples.

To deal with the above three problems, we employ an assumption of weak non-linearity on the input-output characteristic; that is, the non-linearity of the objective function is not so far from linearity. According to this assumption, our approach describes the non-linearity of the objective function based on the result of a linear approximation performed in advance.

Concerning the first problem, the radial basis function network (RBFN) [2] seems to be very promising because of its localized output function. However, as pointed out by Weigend et al. [3], RBFN requires an enormous number of hidden units as the dimensionality of the input space increases. Thus, we propose a sigmoidal function with a localized derivative. Even if only the derivative is localized, the difficulty in analyzing the internal representation can be dramatically remedied. Concerning the second problem, we propose a deterministic weight initialization based on the result of the linear approximation. Concerning the third problem, a new constraint for learning is introduced so that the local mapping of the neural network does not depart far from linearity.

Until now, neural networks have been utilized as a “black-box approximation tool” in many applications, and some attempts have been made to explain the role of hidden units in the non-linear mapping or to combine them into knowledge-based models (e.g. [4]) so that neural networks can be used as “a white- or gray-box tool.” In this paper, by employing the assumption of weak non-linearity, we propose a neural network architecture for non-linear function approximation. Simulation results show the approximation performance and the role of the hidden units by means of visual excitation maps. We discuss how to make the approximation results provided by a neural network more understandable, mainly from the viewpoint of the difficulty in analyzing the internal representation.

2. Principle

Assume that the objective non-linear function to be approximated can be described by the following equation,

$y = F(x_1, \ldots, x_N)$  (1)

and that there are $N$ input units, $M$ hidden units and one output unit in the neural network. The internal value of the $j$-th hidden unit, $u_j^{(1)}$, is the product sum of the output values of the input units, $o_i^{(0)}$, and the weights of the hidden unit, $w_{ji}^{(1)}$. After adding the bias, $\theta_j^{(1)}$, to $u_j^{(1)}$, the output value of the $j$-th hidden unit, $o_j^{(1)}$, is obtained through the non-linear output function, $g$, described in 2.1. The output value of the output unit, $o^{(2)}$, is the sum of the bias input, $\theta^{(2)}$, and the product sum of the output values of the hidden units and the weight values, $w_j^{(2)}$, that is,

$u_j^{(1)} = \sum_{i=1}^{N} w_{ji}^{(1)} o_i^{(0)}$  (2)

$o_j^{(1)} = g\left(u_j^{(1)} + \theta_j^{(1)}\right)$  (3)

$o^{(2)} = \sum_{j=1}^{M} w_j^{(2)} o_j^{(1)} + \theta^{(2)}$  (4)
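The forward computation of Eqs. (2) to (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and argument names are hypothetical, and `math.tanh` is a placeholder for the output function $g$, not the function proposed in 2.1.

```python
import math


def forward(x, w_hidden, theta_hidden, w_out, theta_out, g=math.tanh):
    """Evaluate a one-hidden-layer network for a single input vector x.

    w_hidden[j][i] is the weight from input i to hidden unit j,
    theta_hidden[j] the bias of hidden unit j, w_out[j] the weight from
    hidden unit j to the output unit, and theta_out the output bias.
    g is the hidden output function (tanh is only a placeholder).
    """
    hidden_outputs = []
    for w_j, theta_j in zip(w_hidden, theta_hidden):
        u_j = sum(w_ji * x_i for w_ji, x_i in zip(w_j, x))  # Eq. (2)
        hidden_outputs.append(g(u_j + theta_j))             # Eq. (3)
    # Eq. (4): weighted sum of the hidden outputs plus the output bias.
    return sum(w * o for w, o in zip(w_out, hidden_outputs)) + theta_out


# Two inputs, two hidden units, identity output function for easy checking.
value = forward([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                [1.0, 1.0], 0.5, g=lambda v: v)
```

Passing the output function as an argument keeps the sketch independent of the particular sigmoid, so the traditional and the proposed functions can be compared with the same code.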

2.1. sigmoidal function with localized derivative

In this study, we propose a non-linear output function with a continuous sigmoidal shape,

[…]  (5)

where we can easily notice that the derivative is localized, as shown in Fig. 1(a). The figure compares the traditional sigmoidal function, $f(x) =$ […], the proposed function, $g$, and their derivatives, where $f$ and its derivative are scaled horizontally for easy comparison. Figure 1(b) shows that the non-linear function, $g$, has a unique property: it can compose a linear mapping on the domain […] when copies of it are placed at a proper interval, $a$,

[…]  (6)

Figure 1. Proposed non-linear function $g$ and its synthesized function $G$. (a) Functions $f$, $g$ and their derivatives (curves $f(5x)$, $g(x)$, $f'(5x)$, $g'(x)$). (b) Synthesized mapping, $G$, of $g$ (curves $G(x)$, $g(x+2a)$, $g(x+a)$, $g(x)$, $g(x-a)$, $g(x-2a)$).

We explain the procedure for deriving Eq. 6. The domain […] is divided into segments of length $a$, and the following segments are obtained,

[…]  (7)

where $k$ is the integer index of each segment. On the segment of $k$, according to the characteristic of $g$, the number of $g$ giving […] is […], while the number of $g$ giving […] is $k+2$. Since the others are only […] and […], $G$ is derived as follows,

[…]  (8)

2.2. deterministic weight initialization

The initial weights are given deterministically based on the result of a linear approximation performed in advance for the objective non-linear function $F$, so that the initial mapping of the proposed method is the same as the result of the linear approximation. Assume that the result of the linear approximation for the function $F$ is […], that the input domain to be mapped is […], and that the number of hidden units is $M$. The interval value, $a$, is given as […] and the initial weights are determined as follows,

[…]  (9)

[…]  (10)

The initial mapping of the proposed network is equal to the result of the linear approximation performed in advance, as shown below,

[…]  (11)
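The two ideas of 2.1 and 2.2 can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration, since the formulas of Eqs. (5), (9) and (10) are not reproduced here: the stand-in `g` is an odd sigmoid whose derivative is a raised-cosine bump vanishing outside [-1, 1], copies are placed at unit intervals, and `init_from_linear_fit` is a hypothetical initialization chosen only so that the initial mapping equals a given linear fit $c_0 + \sum_i c_i x_i$ on the input domain.

```python
import math


def g(x):
    """Assumed stand-in for Eq. (5): odd sigmoid, constant outside [-1, 1]."""
    if x <= -1.0:
        return -0.5
    if x >= 1.0:
        return 0.5
    return 0.5 * x + math.sin(math.pi * x) / (2.0 * math.pi)


def g_prime(x):
    """Derivative of g: a raised-cosine bump that is zero for |x| >= 1."""
    return 0.5 * (1.0 + math.cos(math.pi * x)) if abs(x) < 1.0 else 0.0


def synthesized(u, k_max):
    """G(u): sum of 2*k_max + 1 copies of g placed at unit intervals.

    Because g_prime(x) + g_prime(x - 1) == 1 on [0, 1], G(u) == u
    for every u in [-k_max, k_max].
    """
    return sum(g(u - k) for k in range(-k_max, k_max + 1))


def init_from_linear_fit(c0, c, half_width, n_hidden):
    """Hypothetical initialization reproducing c0 + sum(c_i * x_i)
    on the input domain [-half_width, half_width]^N."""
    if n_hidden < 3 or n_hidden % 2 == 0:
        raise ValueError("this sketch assumes an odd n_hidden >= 3")
    k_max = (n_hidden - 1) // 2
    reach = half_width * sum(abs(ci) for ci in c)  # largest |sum(c_i * x_i)|
    scale = k_max / reach if reach > 0 else 1.0    # maps the fit into [-k_max, k_max]
    w_hidden = [[scale * ci for ci in c] for _ in range(n_hidden)]
    theta_hidden = [-float(k) for k in range(-k_max, k_max + 1)]
    w_out = [1.0 / scale] * n_hidden
    return w_hidden, theta_hidden, w_out, c0


def network(x, params):
    """Forward pass in the form of Eqs. (2)-(4) with the stand-in g."""
    w_hidden, theta_hidden, w_out, theta_out = params
    hidden = [g(sum(w * xi for w, xi in zip(row, x)) + theta)
              for row, theta in zip(w_hidden, theta_hidden)]
    return sum(w * o for w, o in zip(w_out, hidden)) + theta_out


params = init_from_linear_fit(0.5, [1.0, -2.0], half_width=1.0, n_hidden=21)
print(network((0.3, -0.7), params))  # approximately 0.5 + 0.3 + 1.4 = 2.2
```

With this choice all hidden units share the same weight vector and differ only in their biases, so each unit is active on one strip of the input domain; the same initialization is obtained on every run because no random numbers are involved.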
2.3. updating rule for weight parameters

Especially in an input domain in which the number of samples is not sufficient, it is not easy for the traditional neural network to give a proper approximation. One of the reasons for this generalization problem is that no assumption on the input-output characteristics of the objective function is employed. Thus, we apply the assumption of weak non-linearity to the updating rule for weight parameters.

The penalty function used in this work is

[…]  (12)

where the first term on the right-hand side is the traditional error measure between the training samples and the outputs of the network, the second term forces neighboring hidden units to perform similar linear calculations for the internal values, and the third term forces the output unit to sum up the outputs of the hidden units with equal gain. The parameters $\lambda$ and $\mu$ control the effect of the second and third terms, respectively. Our penalty function becomes that of traditional back-propagation by setting both $\lambda$ and $\mu$ to $0$, while it becomes that of the recursive linear approximation by setting both $\lambda$ and $\mu$ to $\infty$. The gradient descent procedure gives the following updating rules,

[…]  (13)

[…]  (14)

where $\eta^{(1)}$ and $\eta^{(2)}$ are learning parameters for the hidden and output units, respectively.

3. Simulation

We performed simulations to compare the proposed method and the traditional method. Two types of non-linear function with two inputs and one output were used as the objective function to be approximated,

Type 1: the slope direction is uniform and the slope angle changes.

[…]  (15)

Type 2: both the slope direction and the slope angle change.

[…]  (16)

where type 2 was used in a previous work for studying the application of neural networks [5]. The domains of $x_1$ and $x_2$ were set to $[-1, 1]$. Figure 2 shows the function surfaces of $F_1$ and $F_2$ with their contours.

Figure 2. Functions for simulation. (a) type 1: function $F_1$; (b) type 2: function $F_2$. (Axes: $x_1$, $x_2$.)

Figure 3. Approximation results for type 1. (a) traditional method; (b) proposed method.

Figure 4. Approximation results for type 2. (a) traditional method; (b) proposed method.

The 40,401 test samples were given by sub-sampling the function surface on a $201 \times 201$ grid, and 50 training samples were selected randomly from all the test samples. The total iteration number for learning was set to 5,000, and one training sample was used for updating the weights in one iteration. The number of hidden units was 21, both $\eta^{(1)}$ and $\eta^{(2)}$ were 0.3, and the values of $\lambda$ and $\mu$ were 0.001.
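The sampling scheme and the role of the two penalty weights can be sketched as follows. Several parts are assumptions for illustration only: the domain [-1, 1] and the 201 × 201 grid are inferred from the sample count and the figure axes, the random seed is added for repeatability, the target passed to `make_samples` is a hypothetical function rather than Eq. (15) or (16), and the quadratic form of the two extra penalty terms is a guess that follows the verbal description in 2.3, not the form of Eq. (12).

```python
import random

LAMBDA = 0.001  # weight of the neighbouring-hidden-unit term (value from the text)
MU = 0.001      # weight of the equal-output-gain term (value from the text)


def make_samples(target, n_grid=201, lo=-1.0, hi=1.0, n_train=50, seed=0):
    """Return (test, train): all grid samples and a random subset of them."""
    step = (hi - lo) / (n_grid - 1)
    axis = [lo + i * step for i in range(n_grid)]
    test = [((x1, x2), target(x1, x2)) for x1 in axis for x2 in axis]
    train = random.Random(seed).sample(test, n_train)
    return test, train


def penalty(samples, predict, w_hidden, w_out, lam=LAMBDA, mu=MU):
    """Assumed penalty: squared error plus two smoothness terms."""
    # First term: usual squared error between targets and network outputs.
    data = 0.5 * sum((t - predict(x)) ** 2 for x, t in samples)
    # Second term (assumed form): neighbouring hidden units should have
    # similar input weights, i.e. compute similar internal values.
    neighbour = 0.5 * sum((a - b) ** 2
                          for row, nxt in zip(w_hidden, w_hidden[1:])
                          for a, b in zip(row, nxt))
    # Third term (assumed form): neighbouring output weights should be equal,
    # so that hidden outputs are summed with equal gain.
    gain = 0.5 * sum((a - b) ** 2 for a, b in zip(w_out, w_out[1:]))
    return data + lam * neighbour + mu * gain


# Hypothetical target used only to exercise the sketch.
test_samples, train_samples = make_samples(lambda x1, x2: x1 + x2)
```

With both weights set to zero the value reduces to the plain squared error, which corresponds to the back-propagation limit described in 2.3; increasing them pulls neighbouring units toward identical weights and hence toward a linear mapping.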
The traditional neural network was employed for comparison, in which the sigmoidal function $f$ was used as the output function and its initial weights were given random values in […]. In this study, the approximation error was defined as the mean square error between the test samples and the outputs of the neural network.

Figure 3 shows the approximation results for $F_1$ by the traditional and proposed methods. Both of them gave proper mapping surfaces. The errors of the traditional and proposed methods were […] and […], respectively. Figure 4 shows the results for the function $F_2$. Compared with the traditional contours, the contours of the proposed method are somewhat angular. The reason seems to be that the proposed method employs the localized sigmoidal function and that a new constraint was utilized in updating the weights. The errors obtained by the traditional and proposed methods were […] and […]. These results show that the proposed method has almost the same approximation performance as the traditional method for $F_1$ and $F_2$. Figure 5 shows the changes of the approximation error during the learning period. We note that the initial error of the proposed method is much lower and that its decrease is very smooth owing to the deterministic weight initialization and the updating rule, while the error of the traditional method decreases with oscillation.

Figure 5. Changes in approximation error. (Horizontal axis: number of iterations, 0 to 5000; vertical axis: logarithmic, 0.0001 to 1; curves: traditional method, proposed method.)

We discuss the improvement for the difficulty in analyzing the internal representation. Figure 6 shows the excitation maps of the 21 hidden units, where the horizontal and vertical axes are those of $x_1$ and $x_2$, their origins are located at the centers of the squares, and the intensities correspond to the output level of the hidden units, $o_j^{(1)}$. As shown in Fig. 6(a), the excitation maps of the traditional method are very vague and difficult to understand. However, as shown in Fig. 6(b), the excitation maps of the proposed method keep good order, and it is easy to understand to which part of the domain, and how, each hidden unit contributes. To put it concretely, the obtained weights of the hidden units indicated that the 6th and 7th hidden units respond to the input domain around […], that the 15th and 16th hidden units respond to the input domain around […], and that the middle ones, from the 10th to the 12th units, respond to the input domain around […]. This ease of analysis is mainly owing to the localized sigmoidal function.

Figure 6. Excitation maps of 21 hidden units. (a) traditional method; (b) proposed method.

4. Conclusion

In this paper, we proposed a neural network architecture for non-linear function approximation. Based on the assumption of weak non-linearity, three improvements were suggested: 1) the design of a sigmoidal function with localized derivative, 2) the deterministic weight initialization, and 3) the updating rule for weight parameters. Simulation results indicated small initial error, smooth convergence of the error, and improvement for the difficulty in analyzing the internal representation. In our future research, the applications of the proposed method to other types of non-linear functions will be investigated.

This work was partially supported by JSPS 16700208.

References

[1] Cybenko, G., Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems, 2:303-314, 1989.
[2] Poggio, T. and F. Girosi, Networks for approximation and learning, Proc. of IEEE, 78:1481-1497, 1990.
[3] Weigend, A.S. et al., Predicting the future: a connectionist approach, Int. J. Neural Systems, 1(3):193-209, 1990.
[4] Oussar, Y. and G. Dreyfus, How to Be a Gray Box: The Art of Dynamic Semi-Physical Modeling, Neural Networks, 14:1161-1172, 2001.
[5] Narendra, K.S. and K. Parthasarathy, Identification and Control of Dynamical Systems Using Neural Networks, IEEE Trans. NN, 1(1):4-27, 1990.
