
Faculty of Engineering, Yamaguchi University

2-16-1 Tokiwadai, Ube, 755-8611, Japan

mizukami@eee.yamaguchi-u.ac.jp

Abstract

In this paper, a neural network architecture for non-linear function approximation is proposed. We point out problems in non-linear function approximation with traditional neural networks, that is, difficulty in analyzing the internal representation, no reproducibility in function approximation due to the random scheme for weight initialization, and insufficient generalization ability in learning without enough samples. Based on these considerations, we suggest three main improvements. The first is the design of a sigmoidal function with a localized derivative. The second is a deterministic scheme for weight initialization. The third is an updating rule for the weight parameters. Simulation results show the beneficial characteristics of the proposed method: low approximation error at the beginning of the iterative calculation, smooth convergence of the error, and relief of the difficulty in analyzing the internal representation.

1. Introduction

Cybenko [1] gave a mathematical background for applying neural networks with a sigmoidal function to the problem of non-linear function approximation, and many researchers have since utilized neural networks as one of the standard approximation tools. We believe, however, that there are mainly three problems to be solved in using a traditional neural network for function approximation: 1) difficulty in analyzing the internal representation, 2) no reproducibility in function approximation due to its random weight initialization, and 3) insufficient generalization ability in learning without enough samples.

To deal with the above three problems, we employ an assumption of weak non-linearity on the input-output characteristic: the non-linearity of the objective function is not far from linearity. According to this assumption, our approach describes the non-linearity of the objective function based on the result of a linear approximation performed in advance. The radial basis function network (RBFN) [2] seems very promising because of its localized output function. However, as pointed out by Weigend et al. [3], an RBFN requires an enormous number of hidden units as the dimensionality of the input space is increased. Thus, we propose a sigmoidal function with a localized derivative. Even if only the derivative is localized, the problem of difficulty in analyzing the internal representation can be dramatically remedied. Concerning the second problem, we propose a deterministic weight initialization based on the result of the linear approximation. Concerning the third problem, a new constraint for learning is introduced so that the local mapping of the neural network does not depart far from linearity.

Until now, neural networks have been utilized as a "black-box approximation tool" in many applications, and some attempts have been made to explain the role of hidden units in the non-linear mapping or to combine them with knowledge-based models (e.g. [4]) so that neural networks can be used as "a white- or gray-box tool." In this paper, by employing the assumption of weak non-linearity, we propose a neural network architecture for non-linear function approximation. Simulation results show the approximation performance and the role of the hidden units with visual excitation maps. We discuss how to make the approximation results provided by a neural network more understandable, mainly from the viewpoint of the difficulty in analyzing the internal representation.

2. Principle

Assume that the objective non-linear function to be approximated can be described as the following equation,

$$y = f(x_1, x_2, \ldots, x_N), \qquad (1)$$

and that there are $N$ input units, $M$ hidden units and one output unit in the neural network. The internal value of the $j$-th hidden unit, $u^{(1)}_j$, is the product sum of the output values of the input units, $x^{(0)}_i$, and the weights of the hidden unit, $w^{(1)}_{ji}$.


After adding the bias, $b^{(1)}_j$, to $u^{(1)}_j$, the output value of the $j$-th hidden unit, $x^{(1)}_j$, is obtained through the non-linear output function, $g$, described in Sec. 2.1. The output value of the output unit, $y$, is the sum of the bias input, $b^{(2)}$, and the product sum of the output values of the hidden units and the weight values, $w^{(2)}_j$, that is,

$$u^{(1)}_j = \sum_{i=1}^{N} w^{(1)}_{ji}\, x^{(0)}_i, \qquad (2)$$

$$x^{(1)}_j = g\bigl(u^{(1)}_j + b^{(1)}_j\bigr), \qquad (3)$$

$$y = \sum_{j=1}^{M} w^{(2)}_j\, x^{(1)}_j + b^{(2)}. \qquad (4)$$
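For concreteness, a minimal sketch of the forward pass of Eqs. (2)-(4) follows. The array names and the NumPy formulation are our own assumptions, and a logistic placeholder stands in for the paper's output function $g$, whose definition (Eq. (5)) is illegible in this copy.

```python
import numpy as np

def forward(x, W1, b1, w2, b2, g):
    """Forward pass of the N-input, M-hidden-unit, one-output network.

    x  : (N,)  input vector  x^(0)
    W1 : (M,N) hidden weights w^(1)_ji     b1 : (M,) hidden biases b^(1)_j
    w2 : (M,)  output weights w^(2)_j      b2 : float, bias input b^(2)
    g  : non-linear output function of Sec. 2.1
    """
    u = W1 @ x          # Eq. (2): internal values of the hidden units
    x1 = g(u + b1)      # Eq. (3): hidden outputs, bias added before g
    y = w2 @ x1 + b2    # Eq. (4): weighted sum plus the bias input
    return y

# Placeholder only; the paper's g is the localized-derivative function of Eq. (5).
logistic = lambda u: 1.0 / (1.0 + np.exp(-u))
```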

2.1. sigmoidal function with localized derivative

We design the non-linear output function, $g$, with a continuous sigmoidal shape,

    (5)

where we can easily notice that the derivative is localized (it vanishes outside a bounded interval), as shown in Fig. 1(a). The figure compares the traditional sigmoidal function, $f(x) = 1/(1 + e^{-x})$, the proposed function $g$ and their derivatives, where $f$ and its derivative are scaled horizontally (plotted as $f(5x)$ and $f'(5x)$) for easy comparison. Figure 1(b) shows that the non-linear function $g$ has a unique property: shifted copies placed at a proper interval $a$ compose a linear mapping over their common domain,

$$G(x) \;=\; \sum_{k} g(x - k a) \;=\; x. \qquad (6)$$

[Figure 1: (a) the functions $f(5x)$, $g(x)$ and their derivatives $f'(5x)$, $g'(x)$; (b) the shifted copies $g(x+2a)$, $g(x+a)$, $g(x)$, $g(x-a)$, $g(x-2a)$ and their synthesized function $G(x)$. Caption: proposed non-linear function $g$ and its synthesized function $G$.]

We explain the procedure for deriving Eq. (6). The domain is divided into segments of length $a$, so that the following segments are obtained,

    (7)

where $k$ is the integer index of each segment. On the segment $S_k$, according to the characteristic of $g$, the number of shifted copies giving one saturated value is determined by $k$, while the number giving the other is $k+2$; since the remaining copies contribute only constant values, $G$ is derived as the following,

    (8)
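Since Eq. (5) is illegible in this copy, the following sketch only demonstrates the property claimed for $g$: a continuous sigmoidal shape whose derivative vanishes outside a bounded interval, and whose shifted copies synthesize a function $G$ that is exactly linear over the covered domain, as in Fig. 1(b). The saturating ramp below is a stand-in, not the paper's definition.

```python
import numpy as np

def g(x):
    """Stand-in for the proposed function: continuous sigmoidal shape
    with a localized derivative (g'(x) = 0 for |x| > 1)."""
    return np.clip(x, -1.0, 1.0)

a = 2.0                                        # placement interval of the copies
x = np.linspace(-6.0, 6.0, 601)
G = sum(g(x - k * a) for k in range(-2, 3))    # g(x+2a) + ... + g(x-2a)

# Over the covered domain the synthesized function is exactly linear, G(x) = x.
inside = np.abs(x) <= 5.0
assert np.allclose(G[inside], x[inside])
```

Unlike the logistic function, whose derivative is positive everywhere, this stand-in contributes nothing outside its active interval, which is what makes each hidden unit responsible for one local patch of the domain.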

2.2. deterministic weight initialization

The initial weights are given deterministically, based on the result of a linear approximation performed in advance for the objective non-linear function, so that the initial mapping of the proposed method is the same as the result of the linear approximation. Assume that the result of the linear approximation for the function $f$ is given, that the input domain to be mapped is specified, and that the number of hidden units is $M$. The interval value, $a$, is given from the width of the input domain and $M$, and the initial weights are determined as the following,

    (9)

    (10)

The initial mapping of the proposed network is then equal to the result of the linear approximation performed in advance, as shown below,


    (11)
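Equations (9)-(11) are illegible in this copy, so the following sketch only realizes the stated goal under our own assumptions (the stand-in ramp $g$ above, equal output gains): fit the linear approximation first, then place the hidden units so that the initial network mapping reproduces that fit.

```python
import numpy as np

def init_from_linear_fit(X, t, M, halfwidth=1.0):
    """Deterministic weight initialization (a sketch of the idea in Sec. 2.2).

    A linear approximation y = c^T x + c0 is fitted in advance; the M hidden
    units then share the fitted slope and are shifted at a uniform interval a,
    so that (with the stand-in ramp g and unit output gains) the initial
    network mapping reproduces the linear fit.
    """
    A = np.hstack([X, np.ones((len(X), 1))])      # design matrix [x, 1]
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)  # linear approximation
    c, c0 = coef[:-1], coef[-1]

    a = 2.0 * halfwidth                            # placement interval
    W1 = np.tile(c, (M, 1))                        # all hidden units share c
    b1 = -a * (np.arange(M) - (M - 1) / 2.0)       # uniform shifts of g
    w2 = np.ones(M)                                # output unit: equal gains
    b2 = float(c0)
    return W1, b1, w2, b2
```

Within the range covered by the shifted copies, summing $g(c^\top x - ja)$ over the $M$ units reproduces $c^\top x$, so the initial output is $c^\top x + c_0$, matching the statement around Eq. (11) that the initial mapping equals the linear approximation.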

2.3. updating rule for weight parameters

Especially in input domains where the number of samples is not enough, it is not easy for the traditional neural network to give a proper approximation. One of the reasons for this generalization problem is that no assumption on the input-output characteristic of the objective function is employed. Thus, we apply the assumption of weak non-linearity to the updating rule for the weight parameters.

The penalty function, $E$, used in this work is shown as

    (12)

where the first term is the error measure between the training samples and the outputs of the network, the second term forces the neighboring hidden units to perform a similar linear calculation for the internal values, and the third term forces the output unit to sum up the outputs of the hidden units with equal gain. The $\alpha$ and $\beta$ are parameters for controlling the effect of the second and third terms, respectively. Our penalty function becomes that of traditional back-propagation by setting both $\alpha$ and $\beta$ to $0$, while it becomes that of the recursive linear approximation by setting both $\alpha$ and $\beta$ to $\infty$.
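Equation (12) itself is illegible in this copy. A penalty of the following form would be consistent with the three terms described above; this is a hedged reconstruction, not the paper's exact expression ($t$ denotes the training target and $\bar{w}^{(2)}$ the mean output weight):

```latex
% Sketch of a penalty consistent with the description of Eq. (12):
% squared output error, plus a term tying neighboring hidden units to
% a similar linear calculation, plus a term equalizing the output gains.
E = \frac{1}{2}\,(y - t)^2
  + \alpha \sum_{j=1}^{M-1} \bigl\| \mathbf{w}^{(1)}_{j+1} - \mathbf{w}^{(1)}_{j} \bigr\|^2
  + \beta  \sum_{j=1}^{M}   \bigl( w^{(2)}_{j} - \bar{w}^{(2)} \bigr)^2,
\qquad
\bar{w}^{(2)} = \frac{1}{M} \sum_{j=1}^{M} w^{(2)}_{j}.
```

With $\alpha = \beta = 0$ this reduces to the plain back-propagation error; letting both grow forces all hidden units toward the same linear calculation and the output unit toward equal gains, which is the recursive-linear limit described above.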

The gradient descent procedure gives the following updating rules,

$$w^{(1)}_{ji} \;\leftarrow\; w^{(1)}_{ji} - \eta_1 \frac{\partial E}{\partial w^{(1)}_{ji}}, \qquad (13)$$

$$w^{(2)}_{j} \;\leftarrow\; w^{(2)}_{j} - \eta_2 \frac{\partial E}{\partial w^{(2)}_{j}}, \qquad (14)$$

where $\eta_1$ and $\eta_2$ are learning parameters for the hidden and output units, respectively.

3. Simulation

We performed simulations to compare the proposed method and the traditional method. Two types of non-linear function with two inputs and one output were used as the objective functions to be approximated:

Type 1: with a uniform slope direction, the slope angle changes,

    (15)

Type 2: both the slope direction and the slope angle change,

    (16)

where type 2 was used in a previous work on the application of neural networks [5]. The domains of $x_1$ and $x_2$ were set to $[-1, 1]$. Figure 2 shows the function surfaces of the two types with their contours.

[Figure 2: surface plots of the two objective functions over $x_1, x_2 \in [-1, 1]$, with output values ranging over about $[-2, 2]$. Caption: functions for simulation.]

The 40,401 test samples were given by sub-sampling the function surface at the grid points (a 201 x 201 grid, since 201^2 = 40,401), and 50 training samples were selected randomly from the test samples. The total iteration number for learning was set to 5,000, and one training sample was used for updating the weights in each iteration. The number of hidden units was 21.


Both of the learning parameters $\eta_1$ and $\eta_2$ were 0.3, and the values of $\alpha$ and $\beta$ were 0.001. The traditional neural network was employed for comparison, in which the sigmoidal function $f$ was used as the output function and its initial weights were given with the usual random scheme. The approximation error was defined as the mean square error between the test samples and the outputs of the neural network.
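Read concretely, the data setup looks like the sketch below. The placeholder `f` stands in for the illegible type 1 and type 2 definitions, and the seed is our own arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, for reproducibility

# A 201 x 201 grid on [-1, 1]^2 gives the 40,401 test samples.
grid = np.linspace(-1.0, 1.0, 201)
X1, X2 = np.meshgrid(grid, grid)
X_test = np.column_stack([X1.ravel(), X2.ravel()])

f = lambda x1, x2: x1 + 0.5 * x2**2       # placeholder; Eqs. (15)-(16) illegible
t_test = f(X_test[:, 0], X_test[:, 1])

# 50 training samples drawn randomly from the test samples.
idx = rng.choice(len(X_test), size=50, replace=False)
X_train, t_train = X_test[idx], t_test[idx]

# Approximation error: mean square error over all test samples.
mse = lambda y_pred: np.mean((y_pred - t_test) ** 2)
```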

Figure 3 shows the approximation results for type 1 by the traditional and proposed methods. Both of them gave proper mapping surfaces. The errors of the traditional and proposed methods were [...] and [...], respectively.

[Figure 3: approximation surfaces for type 1 by (a) the traditional method and (b) the proposed method. Caption: approximation results for type 1.]

Figure 4 shows the results for the type 2 function. Compared with the traditional contours, the contours of the proposed method are somewhat angular. The reason seems to be that the proposed method employs the localized sigmoidal function and that a new constraint was utilized in updating the weights. The obtained errors by the traditional and proposed methods were [...] and [...]. These results show that the proposed method has almost the same approximation performance as the traditional method for both types.

[Figure 4: approximation surfaces for type 2 by (a) the traditional method and (b) the proposed method. Caption: approximation results for type 2.]

Figure 5 shows the changes of the approximation error over the learning period. We note that the initial error of the proposed method is much lower and its decrease is very smooth, owing to the deterministic weight initialization and the updating rule, while the traditional method gives an oscillating decrease.

[Figure 5: approximation error (log scale, roughly 0.0001 to 1) versus the number of iterations (0 to 5,000) for the traditional and proposed methods. Caption: changes in approximation error.]

We now discuss the improvement regarding the difficulty in analyzing the internal representation. Figure 6 shows the excitation maps of the 21 hidden units, where the horizontal and vertical axes are those of $x_1$ and $x_2$, the origins are located at the center of the squares, and the intensities correspond to the output level of the hidden units, $x^{(1)}_j$. As shown in Fig. 6(a), the excitation maps of the traditional method are very vague and difficult to understand. However, as shown in Fig. 6(b), the excitation maps of the proposed method keep good order, and it is easy to understand which part of the domain each hidden unit contributes to, and how. Concretely, the obtained weights of the hidden units indicated that the 6th and 7th hidden units respond to the input domain around [...], that the 15th and 16th hidden units respond to the input domain around [...], and that the middle ones, from the 10th to the 12th units, respond to the input domain around [...]. This ease of analysis is mainly owing to the localized sigmoidal function.

[Figure 6: excitation maps of the 21 hidden units for (a) the traditional method and (b) the proposed method. Caption: excitation maps of 21 hidden units.]
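Excitation maps of this kind can be reproduced in spirit by evaluating each hidden unit's output over the input grid. A minimal sketch follows, reusing the assumed names from the forward-pass sketch above; the 3 x 7 panel layout is our own choice for displaying 21 units.

```python
import numpy as np
import matplotlib.pyplot as plt

def excitation_maps(W1, b1, g, grid_size=201):
    """Render each hidden unit's output x^(1)_j over [-1, 1]^2, as in Fig. 6."""
    grid = np.linspace(-1.0, 1.0, grid_size)
    X1, X2 = np.meshgrid(grid, grid)
    X = np.column_stack([X1.ravel(), X2.ravel()])   # all grid inputs
    H = g(X @ W1.T + b1)                            # hidden outputs, one column per unit
    M = W1.shape[0]
    fig, axes = plt.subplots(3, 7, figsize=(14, 6)) # 21 units -> 3 x 7 panels
    for j, ax in enumerate(axes.ravel()[:M]):
        ax.imshow(H[:, j].reshape(grid_size, grid_size),
                  extent=[-1, 1, -1, 1], origin="lower")
        ax.set_title(f"unit {j + 1}", fontsize=8)
        ax.set_xticks([]); ax.set_yticks([])
    return fig
```

With the deterministic initialization of Sec. 2.2, each map should start as a band orthogonal to the fitted slope direction, which is what makes the ordered structure of Fig. 6(b) plausible.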

4. Conclusion

In this paper, we proposed a neural network architecture for non-linear function approximation. Based on the assumption of weak non-linearity, three improvements were suggested: 1) the design of a sigmoidal function with a localized derivative, 2) the deterministic weight initialization, and 3) the updating rule for the weight parameters. Simulation results indicated small initial error, smooth convergence of the error, and improvement of the difficulty in analyzing the internal representation. In our future research, applications of the proposed method to other types of non-linear function will be investigated.

This work was partially supported by JSPS 16700208.

References

[1] Cybenko, G., Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems, 2:303-314, 1989.

[2] Poggio, T. and F. Girosi, Networks for approximation and learning, Proc. of the IEEE, 78:1481-1497, 1990.

[3] Weigend, A.S. et al., Predicting the future: a connectionist approach, Int. J. Neural Systems, 1(3):193-209, 1990.

[4] Oussar, Y. and G. Dreyfus, How to Be a Gray Box: The Art of Dynamic Semi-Physical Modeling, Neural Networks, 14:1161-1172, 2001.

[5] Narendra, K.S. and K. Parthasarathy, Identification and control of dynamical systems using neural networks, IEEE Trans. Neural Networks, 1(1):4-27, 1990.

