② Computing the proportion of each class of samples in the training dataset, named rate_t, which is the proportion of the t-th class samples in the training dataset.
③ Deciding the number Q of samples used to train one ISGNN.
④ Selecting samples from each class randomly, according to rate_t, then combining these samples to generate a new training dataset whose size is Q.
⑤ Using the new training dataset from step ④ to train an ISGNN; an SGNT is generated.
⑥ Repeating steps ④ and ⑤. The number of iterations is decided by CN, the number of ISGNNs in the ensemble ISGNN.
Once each ISGNN of the ensemble ISGNN has been trained independently, we can predict the class of testing samples using the trained ensemble. Each testing sample is input to the ensemble ISGNN and CN classification results are obtained. If all CN ISGNNs deduce the same result for a testing sample, the final classification result is decided easily. Otherwise, we find the winner among the CN SGNTs: the winner is the SGNT with the minimum distance between the testing sample and its leaf neurons. The class of the testing sample is then decided to be the class of the winner. Finally, we can predict the class of each sample in the testing set with the ensemble ISGNN and calculate the classification precision on the testing sample set.

5. Experiments

We choose 61 sampled enterprises as experimental objects to compare FDTD performance between ensemble ISGNN and SGNN. These 61 enterprises are representative of the retail business of Qingdao city in China: their business area ranges from dozens of square meters to nearly 30,000 square meters, their registered capital ranges from hundreds of thousands of RMB to over 230 million RMB, and their staff ranges from several to several hundred people. The financial information of each enterprise, such as registered capital, business area, number of staff, total appreciation amount of tax to be paid monthly, etc., composes the attribute set of each sample. Every sample has 71 attributes as the input vector and one attribute as the output vector. The 61 enterprises are divided into two sets: the first 30 enterprises are training samples; the others are testing samples.
The precision of fraud detection is defined as follows:

Precision = (M − Err) / M × 100%    (2)

where Err is the number of samples that are wrongly classified and M is the number of testing samples. Here, M is 31.
Whether a classification result is correct is decided as follows:

if |ŷ_i − y_i| ≤ a, then the classification result for the i-th sample is correct,    (3)

where y_i is the real value of the i-th sample, expressed as 1 or 0 (1 denotes that the enterprise is legitimate, 0 denotes that it is illegitimate); ŷ_i is the output of the ensemble ISGNN for the i-th sample; and a is a constant between 0 and 0.5. Here a is 0.4.
According to section 4.2, the FDTD result using ensemble ISGNN is obtained. The FDTD result using SGNN is obtained with the algorithm proposed by Wang et al. [1]. Table 1 lists the FDTD results of ensemble ISGNN and SGNN.
In the experiment, one third of the training data was employed to train each ISGNN in the ensemble, and there are 3 ISGNNs in the ensemble ISGNN. Experimental results show that the classification precision of ensemble ISGNN is 96.7742% on the 31 testing samples; it is 3.22 points higher than that of SGNN, while the number of training samples used by ensemble ISGNN is one third that of SGNN. Ensemble ISGNN is thus efficient for FDTD. However, ensemble ISGNN consumes more time than SGNN.

Table 1. Performance evaluation of ensemble ISGNN and SGNN with the tax dataset

                  Precision (%)   Err   Number of training samples   Time consumed (sec)
Ensemble ISGNN    96.7742         1     10                           0.7110
SGNN              93.5484         2     30                           0.1875
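The subsampling scheme of steps ②–⑥ can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: `train_isgnn` is a placeholder for the actual ISGNN training procedure, and all function names are ours.

```python
import random
from collections import defaultdict

def stratified_subsample(samples, labels, Q):
    """Steps 2-4: draw about Q samples whose class proportions
    match rate_t, the per-class proportions of the full training set."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append((x, y))
    n = len(samples)
    subset = []
    for t, members in by_class.items():
        rate_t = len(members) / n           # step 2: class proportion
        k = max(1, round(Q * rate_t))       # per-class quota out of Q
        subset.extend(random.sample(members, min(k, len(members))))
    random.shuffle(subset)
    return subset

def train_ensemble(samples, labels, Q, CN, train_isgnn):
    """Steps 5-6: train CN independent ISGNNs, each on its own
    stratified random subset of size about Q."""
    return [train_isgnn(stratified_subsample(samples, labels, Q))
            for _ in range(CN)]
```

With Q one third of the 30 training samples and CN = 3, this matches the configuration reported in the experiments.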
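The prediction rule above (unanimous vote, otherwise the winner with the minimum distance between the testing sample and its leaf neurons) can likewise be sketched. The `predict`/`leaf_neurons` interface and the Euclidean distance are our assumptions about the SGNT structure, not details given in the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def classify(ensemble, x):
    """Each member is assumed to expose predict(x) -> class and
    leaf_neurons() -> list of (weight_vector, class) pairs
    (a hypothetical interface, ours not the paper's)."""
    votes = [net.predict(x) for net in ensemble]
    if len(set(votes)) == 1:          # all CN SGNTs agree
        return votes[0]
    # otherwise the winner is the SGNT whose nearest leaf neuron
    # lies closest to the testing sample
    best_class, best_dist = None, float("inf")
    for net in ensemble:
        for w, c in net.leaf_neurons():
            d = euclidean(x, w)
            if d < best_dist:
                best_dist, best_class = d, c
    return best_class
```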
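Equations (2) and (3) translate directly to code; with the values used in the experiment (M = 31, a = 0.4, Err = 1) the 96.7742% precision of Table 1 is reproduced. Function names are ours.

```python
def is_correct(y_hat, y, a=0.4):
    """Eq. (3): the i-th result is correct when the ensemble
    output lies within a of the true 0/1 label."""
    return abs(y_hat - y) <= a

def precision(y_hats, ys, a=0.4):
    """Eq. (2): Precision = (M - Err) / M * 100%."""
    M = len(ys)
    err = sum(1 for yh, y in zip(y_hats, ys) if not is_correct(yh, y, a))
    return (M - err) / M * 100.0
```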
6. Conclusion

In this paper we propose a new method using a neural network ensemble of ISGNNs to solve the problem of FDTD and give comparison results of ensemble ISGNN and SGNN. We evaluate the performance of ensemble ISGNN against SGNN on the financial data of 61 sampled enterprises. Experimental results show that ensemble ISGNN is effective: its classification precision is 3.22 points higher than that of SGNN on the 31 testing samples, and the number of training samples is one third that of SGNN. However, the classification precision of ensemble ISGNN is unsteady because the training samples are selected randomly.

Acknowledgement

This work was partially supported by the Natural Science Fund of the Department of Education of Shaanxi Province, China, under Grant No. 07JK314, and by the Science Research Fund of Xi'an University of Science and Technology.

References

[1] Wang, S.W., Li, A.G. Fraud detection in tax declaration using SGNN. Journal of Xi'an University of Science and Technology, 2004, 24(4):470-474.
[2] Inoue, H., Narihisa, H. Efficient Pruning Method for Ensemble Self-Generating Neural Networks. Journal of Systemics, Cybernetics and Informatics, 2003, 1(6):72-77.
[3] Li, A.G., Yong, H., Li, Z.H. Iteration Learning SGNN. Proc. of IEEE ICNN&B'05, Beijing, pp.1912-1916.
[4] Wen, W., Jennings, A., Liu, H. Learning a neural tree. Proc. of Int'l Joint Conf. on Neural Networks, Beijing, pp.751-756.
[5] Inoue, H., Narihisa, H. Efficiency of Self-Generating Neural Network Applied to Pattern Recognition. Mathematical and Computer Modeling, 2003, 38(11-13):1225-1232.
[6] Inoue, H., Fukunaga, Y., Narihisa, H. Efficient Hybrid Neural Network for Chaotic Time Series Prediction. IEICE Transactions on Information and Systems, 2002, J85-D-II(4):689-694.
[7] Hansen, L.K., Salamon, P. Neural network ensembles. IEEE Trans. Pattern Analysis and Machine Intelligence, 1990, 12(10):993-1001.
[8] Inoue, H., Narihisa, H. Improving Generalization Ability of Self-Generating Neural Networks through Ensemble Averaging. Proc. of 4th SICE Symposium on Decentralized Autonomous Systems, Okinawa, Japan, 2000, pp.177-180.
[9] Zhou, Z.H., Chen, S.F. Neural Network Ensemble. Chinese Journal of Computers, 2002, 25(1):1-8.