
2012 International Conference on Management Science & Engineering (19th)

September 20-22, 2012 Dallas, USA

Research on Sampling Method of Tax-checking Based on Neural Network
WANG Guang-liang
School of Management, Harbin Institute of Technology, P.R.China, 150001

Abstract: Using information technology to support tax-checking is a core component of the Golden Tax Project. Current tax-checking sampling practice is inefficient and inaccurate. To address this, and drawing on existing research on tax-checking sampling, this paper selects financial indicators for value-added tax (VAT) tax-checking samples using gradual discriminant analysis (GDA). This gives a better discriminant classifier between the "honest tax group" and the "dishonest tax group". The paper then uses the self-organizing map (SOM) neural network to build an intelligent analysis model for VAT sampling. Finally, the model is tested on actual data from 43 enterprises, and its discriminant results are compared with those of statistical analysis. The results show that the sampling effect of the BP network is remarkable.

Keywords: data mining (DM); sampling; gradual discriminant analysis (GDA); self-organizing map (SOM); tax check

1 Introduction

Traditionally, tax-checking samples are chosen through personal experience, through materials supplied by higher authorities and informants, and through other manual means. Human factors therefore weigh heavily in sampling. This introduces considerable uncertainty and cannot guarantee the fairness and transparency of the tax authorities. At the same time, the market economy has produced increasingly sophisticated tax-evasion techniques, and the tax authorities are more and more powerless against them. Tax-checking departments are therefore increasingly concerned with a related goal. Inspectors should draw on their knowledge of all kinds of tax information and taxation materials, and use manual and computer-based artificial-intelligence methods to classify, analyze, and filter out the taxpayers and withholding agents most likely to be evading tax. This would improve the scientific basis and accuracy of sampling work [1].

2 The main problems in sampling practices of tax-checking

2.1 Asymmetric information
(1) Information asymmetry among collection, management, and tax-checking within the tax authorities. As a direct result, the tax-checking stage samples from data that are of low quality and inaccurate, and that cannot reflect the basic status of taxpayers. Data utilization is low, and sampling based on inaccurate information yields low accuracy.
(2) Information asymmetry between the tax authorities and taxpayers. The enterprise information held by the tax-checking department is incomplete, and taxpayers tend to conceal their true financial information from the tax authorities. The information source is therefore inaccurate, which prevents the tax authorities from selecting samples correctly.

2.2 Inadequate sharing of information resources
First, information is not shared enough. For historical reasons, the tax authorities, customs, local tax bureaus, banks, the industrial and commercial bureau, and other departments each developed their own application systems. Second, within the tax authorities, access to information is inconvenient. Third, IT gives the tax department little support in decision-making.

2.3 Inefficient and poor sampling approach
Tax-checking management is low-tech. Electronic applications are relatively weak in internal data consistency, in large-scale data management and processing, and in statistical analysis. As a result, IT is not used to its full potential for statistical analysis and decision support [2]. Most samples are selected manually [3], mainly according to the subjective judgment of inspectors. Several things are lacking: scientific and rational procedures, strict and effective supervision and restraint mechanisms, and comprehensive analysis of tax data on the checking objects. Consequently, enterprises that should not be investigated are checked while others escape investigation, which makes law enforcement unjust and inefficient.

- 1541 -
978-1-4673-3014-5/12/$31.00 ©2012 IEEE
3 The present situation of research on tax-checking sampling

Tax-checking is an enforcement action that the tax authorities take in accordance with laws and regulations. Its purpose is to check, audit, and inspect whether taxpayers or citizens have fulfilled their tax obligations within the scope of their reports for a given period. Tax inspectors must obtain the best checking results with minimal consumption of manpower and materials. At present, sampling is an important part of tax-checking [4]. It generally refers to how inspectors select businesses that may be dishonest taxpayers from the data that companies regularly report to the tax department. Inspectors use various techniques to process and assess the financial and tax data reported by enterprises. Sampling techniques for tax-checking generally involve the choice of an index system and the choice of methods.

3.1 Index system
Index selection relies mostly on the experience of inspectors, and academics have no unified view. Some studies have addressed index selection for tax-checking samples, but the basis of selection is highly subjective. Wang Shu-ling and Wu Zhen (1997) discussed the index-selection problem for tax-checking samples and established an index system [5]. Wang Li-pin (2005) made three points about indicator systems for selecting tax-inspection targets [6]:
(1) The assessment criteria should follow the principles of adaptability, regulation, importance, complexity, simplicity, and stability.
(2) A structured pattern is an effective way to choose inspection indicators.
(3) An indicator system intended to uphold the legality of businesses' tax behavior should be built on the different features of soft and hard indicators.
Others have used the theories and methods of systems engineering to study the factors related to tax-checking and their mutual connections. They carried out qualitative and quantitative analyses and proposed more complete check-index systems [7-10]. The premise is that an enterprise may well have problems when these indicators change abnormally.

3.2 The choice of methods
Current research mostly solves this problem with discriminant analysis from mathematical statistics, including statistical discriminant analysis, the Tobit model, and logistic regression. Li Xuan-ju (1998) used difference analysis to identify tax evasion and the Tobit model to estimate it, developing a discriminant model and an estimation model of tax evasion [11]. However, the discriminant function is linear, so it has difficulty fitting the highly nonlinear relationship between financial data and tax evasion. This greatly limits its discrimination accuracy.

Ma Qing-guo and Wang Wei-hong (2002) classified honest and dishonest taxpaying enterprises with a BP network model [12]. Because the BP network is nonlinear, it has an obvious advantage on such nonlinear problems, and it discriminates significantly better than statistical discriminant analysis. That study selected 19 financial-ratio indicators [13] to construct a three-layer network. However, the BP network learns from historical samples. If the samples are incomplete or biased, training is unsatisfactory and predictions are naturally poor.

Gao Yong and Geng Xin-wei (2001) proposed a tax-checking sampling model built on five kinds of evidence [14]:
(1) arithmetic-relationship errors within and between financial reports;
(2) logical-relationship errors within and between financial reports;
(3) fluctuations of financial indicators beyond a specified alert range;
(4) fluctuations whose severity exceeds a specified average;
(5) the following relationship among associated financial indicators.
However, parts of the model can only check simple balances within and between statements, and some indicator thresholds are fixed settings. The model is therefore inflexible, and improper parameters may lead to low sampling accuracy.

4 Theoretical bases of the artificial intelligence methods on sampling

4.1 Classification discriminant on self-organizing feature map
Classification discriminant analysis starts from a certain number of samples. Appropriate financial indicators are chosen as the inputs of the neural network, and an appropriate number of output nodes is set. The self-organizing feature map network then learns each sample competitively and automatically forms different classes. Analysis of these classes determines the overall status of each one. Enterprises can then be graded by level of tax evasion, and checking resources can be allocated rationally and purposefully.

4.2 Network model establishment process
The network structure, learning algorithm, and input nodes of the self-organizing feature map should be set on the basis of the GDA results. The number of output nodes is determined by the number of samples.

Let $X \in \mathbb{R}^n$ be the input pattern vector, $W$ the weight matrix, and $Y \in \mathbb{R}^p$ the matching response of the output nodes, where $n$ and $p$ are the numbers of input and output nodes. Then $Y = W \oplus X$, where $\oplus$ denotes either the ordinary dot product or the Euclidean distance computation. The specific algorithm is as follows.

(1) Weight initialization
Let $W_j$ ($j = 1, 2, \ldots, p$) be the weight vector connecting the input nodes
to the $j$-th output node. Assign each weight a small random value, and set the iteration counter $t = 1$.

(2) For each input pattern $X_k$ ($k = 1, 2, \ldots, m$):

① Find the weight vector $W_g$ at minimum distance from $X_k$:

$$\lVert X_k - W_g \rVert = \min_{1 \le j \le p} \lVert X_k - W_j \rVert \tag{1}$$

Here the distance is Euclidean and $p$ is the number of output nodes.

② Unit $g$ is the winner, and $N(t)$ denotes its neighborhood. The weight vector of each unit in the neighborhood moves closer to $X_k$ according to the learning rule

$$\Delta W_{ij} = \alpha(t)\,\bigl[x_{ik} - W_{ij}\bigr], \qquad W_{ij} \leftarrow W_{ij} + \Delta W_{ij}, \qquad j \in N(t) \tag{2}$$

Here $\alpha(t)$ is the learning rate at cycle $t$, which decreases as training proceeds:

$$\alpha(t) = 0.2\left(1 - \frac{t}{10000}\right), \qquad t = 1, 2, 3, \ldots, z \quad (500 < z < 10000) \tag{3}$$

In these equations, $x_{ik}$ is the input to the $i$-th input node for the $k$-th sample, and $W_{ij}$ is the connection weight between the $i$-th input node and the $j$-th output node.

(3) Repeat step (2) for successive cycles $t$. The network is regarded as converged when the weights are stable.

(4) After convergence, assign each sample to a category according to the responses of the output nodes. Compute the Euclidean distance between the sample and the weight vector of every output node. If the distance is smallest for the $j$-th output node, assign the sample to the class represented by that node [15]. The resulting SOM neural network model is shown in Fig.1.

[Fig.1 The network diagram of self-organizing feature map: input layer and output layer, with output nodes j = 1, 2, ..., p]

5 Data mining in sampling of tax-checking

5.1 Choice of basic indicators of checking sample on VAT
VAT is one of the main taxes in China's tax system and accounts for a large share of revenue. Among the tax categories checked by the tax authorities, VAT evasion remains the focus of inspection [16]. Therefore, this research takes sampling for VAT checking as its example.

The principles for selecting VAT-checking indicators are: (1) timeliness; (2) comprehensiveness; (3) simplicity; (4) availability; (5) stability [17].

The relevant indicators come from inspectors' years of practical checking experience and from the indicators required by internal documents [18]. They are:
(1) the rate of the tax burden and gross profit to sales;
(2) the effective tax rate;
(3) the rate of inventory turnover;
(4) the inventory rate;
(5) indicators reflecting the financial situation, such as the asset-liability ratio and the quick ratio;
(6) indicators reflecting business results, such as return on total assets.

In summary, drawing on the related literature, the following indicators are selected as part of the basis of analysis. The basic indicators of the VAT checking sample are shown in Tab.1.

Tab.1 Basic indicators of checking sample on the VAT
Name of index | Formula
rate of the tax burden | payable VAT / sales
gross profit to sales | gross profit from sales / sales
rate of inventory turnover | selling cost / average inventory balance
effective tax rate | output VAT / (main and other operating income)
inventory rate | final inventory / cost of goods sold
asset-liability ratio | total liabilities / total assets
quick ratio | quick assets / current liabilities
return on total assets | net profit / average total assets
ratio of sales to cost | selling cost / sales
sales rate | selling expenses / sales
ratio of sales to financial expense | financial expense / sales
ratio of sales to overhead | overhead / sales

Of course, selecting an examination object with the above indicators shows only that the taxpayer may have a problem in some respects or stages, not that it definitely has one. The problem thresholds used in the indicator analysis were obtained from a large number of manual investigations, so they have limitations, and these reference data change over time. Therefore, after checking objects are selected with these indicators, further analysis is needed to choose the most effective indicators for tax-checking sampling.

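The Tab.1 formulas can be made concrete with a short sketch that turns one enterprise's statement items into the twelve indicator values. The `Statement` record, its field names, and the function `vat_indicators` are hypothetical. Reading the effective-tax-rate denominator as main plus other operating income is an assumption. This is an illustration only, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """Hypothetical statement items for one enterprise (field names are illustrative)."""
    payable_vat: float
    output_vat: float
    sales: float
    main_operating_income: float
    other_operating_income: float
    gross_profit_from_sales: float
    selling_cost: float
    average_inventory: float
    final_inventory: float
    cost_of_goods_sold: float
    total_liabilities: float
    total_assets: float
    quick_assets: float
    current_liabilities: float
    net_profit: float
    average_total_assets: float
    selling_expenses: float
    financial_expense: float
    overhead: float


def vat_indicators(s: Statement) -> dict[str, float]:
    """Compute the twelve basic VAT-checking indicators listed in Tab.1."""
    return {
        "rate of the tax burden": s.payable_vat / s.sales,
        "gross profit to sales": s.gross_profit_from_sales / s.sales,
        "rate of inventory turnover": s.selling_cost / s.average_inventory,
        # Assumed reading: output VAT over main plus other operating income.
        "effective tax rate": s.output_vat / (s.main_operating_income + s.other_operating_income),
        "inventory rate": s.final_inventory / s.cost_of_goods_sold,
        "asset-liability ratio": s.total_liabilities / s.total_assets,
        "quick ratio": s.quick_assets / s.current_liabilities,
        "return on total assets": s.net_profit / s.average_total_assets,
        "ratio of sales to cost": s.selling_cost / s.sales,
        "sales rate": s.selling_expenses / s.sales,
        "ratio of sales to financial expense": s.financial_expense / s.sales,
        "ratio of sales to overhead": s.overhead / s.sales,
    }
```

In practice, the resulting dictionary for each of the 43 enterprises would form one row of the indicator matrix that is later screened by GDA.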
5.2 Choice of sampling indicators on VAT checking based on GDA
Sampling is judged through a composite-indicator analysis of an enterprise's tax situation; in essence, it is pattern recognition of the enterprise's overall tax situation [19]. Pattern recognition classifies the objects of study according to their common characteristics or attributes. Each pattern category is a set of elements that share certain common properties.

When the measured data are real numbers, a pattern vector can be treated as a point in n-dimensional Euclidean space, and elements of the same pattern class are spread over a region of that space. Effective computation needs to retain the features most important for distinguishing between categories; this is the task of feature selection, or feature compression. Whether a feature is good or bad for a pattern category is difficult to predict completely in advance. It can only be assessed from the results of the classification and recognition system. The best discrimination process uses the observed pattern samples to find the boundaries between pattern classes and a functional rule that captures the inner structure of the data, so that new data can be discriminated efficiently [20]. In particular, the correctness of classification depends on how closely the training model matches the actual data.

This analysis shows that VAT audits can use many sampling indicators. Each indicator in the index system reflects whether an enterprise's tax situation is good or bad in a particular respect. The indicators differ in how they evaluate corporate tax, and many of them may be statistically correlated. Too many indicators increase workload and sampling complexity and reduce the operational efficiency of tax investigations. Using highly correlated indicators together with indicators of weak evaluative power also reduces accuracy.

Variable selection is therefore an important issue in tax-audit sampling analysis, and choosing appropriate indicators is the key to a good result. If the most important indicators are ignored, the selection will necessarily be poor. Conversely, too many sampling indicators make the data-mining workload very large and lower analytical accuracy. In particular, indicators with weak discriminating ability can seriously interfere with the discrimination result. In actual checking work, sampling indicators should therefore be screened according to their discriminating ability.

Variable-screening methods include the forward method, the backward method, and the stepwise discriminant method. The forward method builds up the discriminant function by selecting, one by one, the variable with the strongest discriminating ability. The backward method starts from a discriminant function containing all variables and removes, one at a time, the variable with the lowest discrimination efficiency. In both methods, a selected variable always stays in the discriminant set, and a deleted variable is always excluded. Because variables interact, the information that earlier variables provide may become negligible once new variables are selected. This naturally leads to gradual discriminant analysis (GDA).

GDA uses an "in-and-out" algorithm in which every step is checked. The variable with the strongest discriminating ability enters the discriminant equation. A variable that entered earlier may lose significance as other variables enter; if its discriminating ability is no longer strong, it should be removed from the discriminant function promptly. The calculation steps of GDA are omitted here; they were used to compute and select the GDA factors.

5.3 SOM in the application of tax-checking sample
The SOM model is used to distinguish the degree of tax leakage. For an individual taxpaying enterprise, the memory of the trained SOM network can determine its class. In recall mode, the SOM network classifies an input sample as follows. For a new sample, find the output-layer unit g whose connection weight vector is nearest to it. Unit g takes the largest activation value, 1, and the other units take the smallest activation value, 0. This indicates that the input sample belongs to the category corresponding to unit g. Let Y denote a new input sample and b the response of the output-layer nodes. The process is then expressed as:

$$b_g = \begin{cases} 1, & \lVert Y - W_g \rVert = \min_{1 \le j \le p} \lVert Y - W_j \rVert \\ 0, & \lVert Y - W_g \rVert \ne \min_{1 \le j \le p} \lVert Y - W_j \rVert \end{cases} \tag{4}$$

6 Case analyses

This study used data from an Inland Revenue Department: the financial statements and tax returns of 43 commercial enterprises. Of these, 10 enterprises engaged in dishonest tax behavior, and the other 33 are normal taxpaying enterprises. Following the indicator analysis above, two steps come before the data are processed with the neural network model. First, the ratios are standardized; second, discriminant analysis and indicator selection are carried out. The data form a 43 × 12 matrix.

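As a minimal sketch, assuming NumPy and z-score standardization, the SOM procedure of Section 4.2 and the recall rule of Eq. (4) can be written as follows. It covers steps (1) to (4) and Eqs. (1) to (3). The paper does not specify the shape of the neighborhood N(t). The sketch therefore assumes a one-dimensional chain of output nodes with a neighborhood radius that shrinks linearly. The names `standardize`, `train_som`, and `classify` are hypothetical. The study itself used the Matlab 6.5 neural network components, so this is an illustration of the technique rather than the authors' code.

```python
import numpy as np


def standardize(data: np.ndarray) -> np.ndarray:
    """Column-wise z-score standardization (an assumed choice of scaling)."""
    std = data.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (data - data.mean(axis=0)) / std


def learning_rate(t: int) -> float:
    """Learning-rate schedule of Eq. (3): alpha(t) = 0.2 * (1 - t / 10000)."""
    return 0.2 * (1.0 - t / 10000.0)


def train_som(X: np.ndarray, p: int, z: int = 1000,
              radius0: int = 2, seed: int = 0) -> np.ndarray:
    """Train a SOM with p output nodes on the m x n sample matrix X for z cycles.

    Assumption: output nodes form a 1-D chain, and N(t) holds the nodes within
    a radius that shrinks linearly from radius0 to 0 over the z cycles.
    """
    if not 500 < z < 10000:
        raise ValueError("Eq. (3) is stated for 500 < z < 10000")
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.random((p, X.shape[1]))  # step (1): small random weights
    for t in range(1, z + 1):              # step (3): repeat for t = 1..z
        alpha = learning_rate(t)
        radius = int(round(radius0 * (1 - t / z)))
        for x in X:                        # step (2): present each sample X_k
            g = int(np.argmin(np.linalg.norm(x - W, axis=1)))    # Eq. (1): winner g
            lo, hi = max(0, g - radius), min(p, g + radius + 1)  # N(t) around g
            W[lo:hi] += alpha * (x - W[lo:hi])                   # Eq. (2)
    return W


def classify(Y: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Recall rule of Eq. (4): b_g = 1 for the nearest output node g, 0 elsewhere."""
    b = np.zeros(W.shape[0], dtype=int)
    b[int(np.argmin(np.linalg.norm(Y - W, axis=1)))] = 1
    return b


if __name__ == "__main__":
    # Hypothetical stand-in data: 43 enterprises x 7 selected indicators.
    data = np.random.default_rng(42).normal(size=(43, 7))
    X = standardize(data)
    W = train_som(X, p=32, z=600)
    print(int(np.argmax(classify(X[0], W))))  # index of the winning output node
```

The sketch runs a fixed number of cycles z, as Eq. (3) allows, instead of testing whether the weights have stabilized. Reproducing the Matlab setting reported for the case study (learning rate 0.1, 200 iterations) would need a different schedule.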
6.1 Choice of index sampling on VAT checking
The sampling indicators for VAT checking are chosen with GDA. The two sample populations are the enterprises with tax problems (10) and the enterprises without tax problems (33). The candidate indicators entered into GDA to find those most effective at separating the two populations are listed in Tab.2.

Tab.2 Basic indicators of GDA
SN | index | SN | index
1 | rate of the tax burden | 7 | quick ratio
2 | gross profit to sales | 8 | return on total assets
3 | rate of inventory turnover | 9 | ratio of sales to cost
4 | effective tax rate | 10 | sales rate
5 | inventory rate | 11 | ratio of sales to financial expense
6 | asset-liability ratio | 12 | ratio of sales to overhead

6.2 Sampling analysis of VAT checking based on SOM
The data were processed with the neural network components of Matlab 6.5. The SOM network has seven input nodes, one for each selected variable, and 32 output nodes so that the samples can be fully classified. Both groups of initial weights, for the input layer and the competitive layer, were assigned randomly. After repeated experiments, a learning rate of 0.1 and 200 learning iterations were adopted. The network was trained after the data were normalized.

6.3 Analysis conclusion
Most dishonest taxpaying enterprises fall into category 4, and almost all honest taxpaying enterprises fall into category 32. This clustering method can therefore effectively distinguish tax situations. The average indicator values of the two types of enterprises obtained from clustering are shown in Tab.3.

Tab.3 Index of honest and dishonest taxpaying enterprises
index | class 4 | class 32
rate of the tax burden | 0.0286 | 0.0137
effective tax rate | 4.306 | 0.665
inventory rate | 5.636 | 5.260
quick ratio | -0.588 | 0.938
return on total assets | -1.636 | 0.117
ratio of sales to cost | 1.430 | 0.780
ratio of sales to financial expense | 0.076 | 0.043

Each index has different characteristics in the two categories, and we believe these features may serve as a basis for future tax-checking sampling. Trials show that the enterprise indicators were selected appropriately; in other words, the seven indicators reflect the actual situation of honest taxpaying enterprises. The statistics of the classification results are given in Tab.4. Analysis and calculation show that sampling accuracy is significantly improved, which greatly improves the effectiveness of tax-checking sampling.

Tab.4 Discriminant results of BP network of 43 companies
Number | Discrimination category | Actual category | Result | Percentage (%)
30 | ρ1 | ρ1 | Correct | 90.9
3 | ρ2 | ρ1 | Error | 9.1
8 | ρ2 | ρ2 | Correct | 80
2 | ρ1 | ρ2 | Error | 20
Note: ρ1 denotes the honest tax category and ρ2 the dishonest tax category.

6.4 Comparison of discriminant result with SPSS
To compare the BP neural network with statistical discriminant analysis for tax-checking sampling, discriminant analysis was also performed on the same data with SPSS. Tab.5 compares the discriminant accuracy rates for the 43 companies.

Tab.5 Discriminant accuracy rate of 43 enterprises in two methods
Case | BP number | BP accuracy rate (%) | SPSS number | SPSS accuracy rate (%)
ρ1 judged as ρ1 | 30 | 90.9 | 26 | 78.8
ρ1 judged as ρ2 | 3 | 9.1 | 7 | 21.2
ρ2 judged as ρ2 | 8 | 80 | 7 | 70
ρ2 judged as ρ1 | 2 | 20 | 3 | 30

These results reflect the strengths of artificial neural networks, which are highly nonlinear, adaptive, and self-learning. Through training samples, a network can adapt to its environment and handle changing information, and the network itself changes continuously. Its information storage is also distributed, so local damage or loss does not affect the overall result. Statistical analysis lacks these advantages. The analysis shows that the BP neural network is a considerable improvement over statistical discriminant analysis.

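The percentages in Tab.4 and Tab.5 are per-class rates. Each count is divided by the number of enterprises in its actual class (33 honest, 10 dishonest). The sketch below recomputes them from each method's four counts. It also adds an overall accuracy over all 43 enterprises, a figure the tables do not report. The helper `rates` is hypothetical.

```python
def rates(correct_honest: int, honest_as_dishonest: int,
          correct_dishonest: int, dishonest_as_honest: int) -> dict[str, float]:
    """Per-class and overall accuracy (in %) from the four confusion counts."""
    n_honest = correct_honest + honest_as_dishonest        # actual rho1 enterprises
    n_dishonest = correct_dishonest + dishonest_as_honest  # actual rho2 enterprises
    total = n_honest + n_dishonest
    return {
        "rho1 judged as rho1": 100 * correct_honest / n_honest,
        "rho1 judged as rho2": 100 * honest_as_dishonest / n_honest,
        "rho2 judged as rho2": 100 * correct_dishonest / n_dishonest,
        "rho2 judged as rho1": 100 * dishonest_as_honest / n_dishonest,
        "overall": 100 * (correct_honest + correct_dishonest) / total,
    }


bp = rates(30, 3, 8, 2)    # counts from the BP columns
spss = rates(26, 7, 7, 3)  # counts from the SPSS columns
for name, r in (("BP", bp), ("SPSS", spss)):
    print(name, {k: round(v, 1) for k, v in r.items()})
```

Rounded to one decimal, this reproduces the table percentages. It also gives overall accuracies of 38/43 ≈ 88.4% for BP and 33/43 ≈ 76.7% for SPSS.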
7 Conclusion

This paper combines practical experience with theoretical analysis and uses GDA to obtain a new index system. Practice has shown that this indicator system better reflects whether an enterprise's tax behavior is honest. At present, few studies apply the self-organizing map (SOM) network to the sampling of tax audits. This study of tax-checking sampling demonstrates the effectiveness of SOM in tax-audit practice. It further improves the efficiency of tax checking and achieves good results in the sampling process.

This study has several limitations. Index selection, and the approach to using and maintaining the indicators, are not fully considered. The accuracy and rationality of the indicators still need refinement in the design and operation of the system. As computer use in China's tax collection advances and tax information sources and processing capabilities improve, the sampling indicators should be expanded and analyzed in more depth. The selection of indicators also needs further study with artificial intelligence techniques. The sampling analysis covers only VAT checking, not other taxes. Finally, only commercial-enterprise data were used for sampling; future work could study tax-checking sampling with inspection cases from manufacturing and other industries.

References

[1] Gou Yan, Liu Yi-wen. Reasons for the low accuracy and solutions on the sampling of checking[J]. International Taxation in China, 2003(11): 61-63. (in Chinese)
[2] Yang Mo-ru. Experience and learn from foreign tax audit[J]. International Taxation in China, 2008(11): 45-47. (in Chinese)
[3] Gao Ying-cheng, Li Tian-ning, Sun Sheng-nian. Example of accuracy and related factors on the sampling of tax checking[J]. Gansu Taxation, 2003(10): 36-37. (in Chinese)
[4] Xi Qi-wan. Countermeasures and suggestions for the effectiveness of improving tax checking[J]. Taxation Research, 2010(8): 73-75. (in Chinese)
[5] Wang Shu-ling, Wu Zhen. Analysis and design of index system on computer sampling of tax checking[J]. Journal of Central University of Finance and Economics, 1997(7): 57-59. (in Chinese)
[6] Wang Li-pin. The principles and ways for building up the indicator system as the tax inspection aim[J]. Taxation and Economy, 2005(5): 28-30. (in Chinese)
[7] Erard Brian, Jonathan S Feinstein. Estimating the federal income tax gap using operational audit data[R]. Report prepared by B Erard & Associates for the IRS, 2001.
[8] Erard Brian. Compliance measurement and workload selection with operational audit data[R]. The Internal Revenue Service Research Conference, George Washington University, 2002: 11-12.
[9] Wang Qing. Objectives and mode of computer sampling system on tax checking[J]. Journal of Nanjing University of Finance and Economics, 2007(1): 69-70. (in Chinese)
[10] Chen Ying, Wu Xuan. The problems and index system selection on the sampling of tax audit selection[J]. Taxation Research, 2005(8): 80-81. (in Chinese)
[11] Li Xuan-ju. Research on model of the sampling of tax checking[J]. Financial Studies, 1998(8): 10-14. (in Chinese)
[12] Ma Qing-guo, Wang Wei-hong, Chen Jian. Research on BP application in sampling of tax-checking[J]. The Journal of Quantitative and Technical Economics, 2002(2): 98-101. (in Chinese)
[13] Liu De-jiang, Hu Wen-ping, Shi Jing. Corporate financial analysis[M]. China Economic Press, 2003: 23-25.
[14] Gao Yong, Geng Xin-wei. Mathematical model introduction on checking sampling system selection[J]. Taxation Research, 2001(7): 42-46. (in Chinese)
[15] Chen Ying. Research on techniques of tax checking sample[D]. Tianjin University Ph.D. Thesis, 2004: 34-42. (in Chinese)
[16] Wu Ming. Practical manual on VAT check[M]. Jilin People Press, 1999: 77-84. (in Chinese)
[17] Su Yan. Tax inspection[M]. Northeast Financial University Press, 1999: 60-67. (in Chinese)
[18] Jonathan S Feinstein. An econometric analysis of income tax evasion and its detection[J]. Rand Journal of Economics, 1991, 22(1): 14-35.
[19] Sun Shang-gong, Pan An-pei. Practical discriminant analysis[M]. Science Press, 1990: 33-39. (in Chinese)
[20] Erard Brian. Self-selection with measurement errors: A microeconometric analysis of the decision to seek tax assistance and its implications for tax compliance[J]. Journal of Econometrics, 1991, 81: 319-356.