Abstract— Tax fraud covers a wide spectrum of methods: denying facts, declaring false information, and conducting financial business outside the legal framework. With the development of tax systems and the large volume of data stored in them, there is a need for a tool that can process the stored data and provide users with the information obtained from it. Under current tax policy, especially value-added tax, the rate of tax fraud is increasing. Recent studies tend to use a common set of standard methods to detect tax fraud, including association rules, clustering, neural networks, decision trees, Bayesian networks, regression and genetic algorithms. Because tax databases are large, most of the studied fraud detection methods are computationally intensive. To increase the performance of fraud detection algorithms such as Bayesian networks, parallelism techniques are used in this paper. We used the parallel technologies of Microsoft .NET, parallel loops and PLINQ, on an Intel Xeon server with 16 X7755 dual-core processors and 32GB of memory. Implementation results on a real database show that a speedup of up to 9.2x is achieved.

Keywords— Data Mining, Tax Fraud Detection, Bayesian Networks, Parallelism Techniques

I. INTRODUCTION
Nowadays, available hardware systems are capable of parallel processing. Successful use of these capabilities by means of multi-thread programming, parallel algorithms and [...] about it. It is possible to use such intelligent systems to identify taxes accurately in order to increase tax remittance [3, 4, 5]. Tax is considered the central and main source of income for countries [6, 7]. Because tax specialists use traditional auditing strategies, a considerable amount of government tax income is lost [8, 9, 10]. Techniques such as data mining exist for tax fraud detection, but on large tax databases these algorithms are computationally intensive.
In this paper, parallelism techniques are applied to data mining algorithms in order to increase their performance. In other words, a parallel tax fraud detection system is presented. Implementation results on real data show that a performance improvement of 9.2x is achieved using the parallel patterns available in the .NET framework. Bayesian networks are the algorithm that was parallelized; they use conditional probability distributions to identify and predict tax fraud among taxpayers. The proposed system has two stages, training and testing. Two parallel techniques were used: .NET parallel loops and Parallel LINQ [11, 12, 13, 14].
This paper is organized as follows. Section two discusses related works and data mining algorithms. Section three presents the proposed method, section four discusses the implementation results, and section five includes conclusions.

II. RELATED WORKS
Data mining algorithms such as the multilayer feed-forward neural network, Support Vector Machine (SVM), Genetic Programming (GP), Group Method of Data Handling (GMDH), Logistic Regression (LR) and Probabilistic Neural Networks (PNN) are used in fraud detection systems. Tax data usually has a large volume, and running those algorithms on it is computationally intensive. Some related works in the field of tax fraud detection are presented in Table 1.

TABLE 1. OVERALL ANALYSIS AND EVALUATION OF TAX FRAUD RESEARCHES
Ref | Tools and techniques | Environment | Approach | Knowledge
[1] | Clementine and SPSS (non-parallel) | Canada, Chile and the [...] of Germany | Clustering and neural [...], association rules | Number and sum of the [...] claimed in the assertions
[3] | T-statistics (non-parallel) | Experimental data | Genetics and regression | Available fields in the assertions
[4] | Clementine and SPSS (non-parallel) | Information of 49 related articles | Regression and neural network | Published articles in this field

III. PROPOSED METHOD
The main purpose of the training algorithm is to discover the rules governing the various labels, based on the features of the records. Figure 1 depicts the levels of the training algorithm used in this study, in which building the model with the parallel method is the main factor helping detection.
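The two stages (training on labelled records, then testing a new record) can be illustrated with a minimal sketch. It assumes a naive Bayes simplification with four binary features and Laplace smoothing. The class name `NaiveBayesSketch`, the toy records and the independence assumption are all hypothetical and chosen for illustration; the paper does not show its own model-building code.

```csharp
using System;
using System.Linq;

public static class NaiveBayesSketch
{
    // Hypothetical training record: four binary features plus a fraud label.
    public sealed class Record
    {
        public int[] Features;   // each entry is 0 or 1
        public bool Fraud;
    }

    // Training: estimate P(Fraud) and P(feature_k = 1 | class) with Laplace smoothing.
    public static (double prior, double[] pGivenFraud, double[] pGivenClean) Train(Record[] data)
    {
        int n = data[0].Features.Length;
        var fraud = data.Where(r => r.Fraud).ToArray();
        var clean = data.Where(r => !r.Fraud).ToArray();
        double prior = (fraud.Length + 1.0) / (data.Length + 2.0);
        double[] pf = new double[n], pc = new double[n];
        for (int k = 0; k < n; k++)
        {
            pf[k] = (fraud.Count(r => r.Features[k] == 1) + 1.0) / (fraud.Length + 2.0);
            pc[k] = (clean.Count(r => r.Features[k] == 1) + 1.0) / (clean.Length + 2.0);
        }
        return (prior, pf, pc);
    }

    // Testing: posterior P(Fraud = 1 | x) by Bayes' rule, assuming independent features.
    public static double Posterior(
        (double prior, double[] pGivenFraud, double[] pGivenClean) m, int[] x)
    {
        double lf = m.prior, lc = 1.0 - m.prior;
        for (int k = 0; k < x.Length; k++)
        {
            lf *= x[k] == 1 ? m.pGivenFraud[k] : 1.0 - m.pGivenFraud[k];
            lc *= x[k] == 1 ? m.pGivenClean[k] : 1.0 - m.pGivenClean[k];
        }
        return lf / (lf + lc);
    }

    // Toy data, invented for this sketch only.
    public static Record[] ToyData()
    {
        return new[]
        {
            new Record { Features = new[] { 1, 0, 1, 1 }, Fraud = true },
            new Record { Features = new[] { 1, 0, 1, 0 }, Fraud = true },
            new Record { Features = new[] { 1, 1, 1, 1 }, Fraud = true },
            new Record { Features = new[] { 0, 1, 0, 0 }, Fraud = false },
            new Record { Features = new[] { 0, 1, 0, 1 }, Fraud = false },
            new Record { Features = new[] { 0, 0, 0, 0 }, Fraud = false },
        };
    }

    public static void Main()
    {
        var model = Train(ToyData());
        Console.WriteLine(Posterior(model, new[] { 1, 0, 1, 1 }));
    }
}
```

On this toy data a record resembling the fraud examples receives a posterior close to 1 and a record resembling the clean examples a posterior close to 0. A full Bayesian network would replace the independence assumption with explicit conditional probability tables per node.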
5) Evaluating the results of fitting the behavioral model of people committing tax fraud to all the data.
6) Returning to step 3 if the error threshold is exceeded.
7) Detecting fraud, making reports and notifying the tax departments.
Based on Figure 4, a label is assigned to each feature of the payer, and these labels are used in constructing the model. The labels are used in Formula 5. Then, according to the conditions below, we can compute the rate of financial fraud.

FIGURE 4. LABELING PARALLEL BAYESIAN LAYERS IN ORDER TO BE INSERTED IN THE FORMULAE

X = (Notification = 1, Declaration = 0, Financial Offices = 1, No surrender = 1)
P (Fraud=1 | X) = P (N=1 | A = 0, B = 1, C = 0, D = 1)
(5)

C. Parallel algorithm of tax fraud detection
Parallel thinking, analysis and design can address significant problems of today's software, and can be seen as a remedy where serial software falls short. Based on the investigations done at this stage, parallelism is possible at three levels:
1) Parallelism at the level of the loops used in the program.
2) Parallelism at the level of the queries used in the program.
3) Parallelism at the level of infrastructure, network and operating system.

D. First level: loop level parallelism
Loops in the body of the program were time consuming in serial mode; this changed completely once their structure was changed.
    for (int i = 0; i < Code_Hoze.Capacity; i++)
        for (int j = 0; j < TaxPaery.Length; j++)
            calculateMultiLevelBaysianFormules();
Using the parallelism techniques of .NET, the loops in the program were parallelized to achieve higher speed, as follows:
    // Requires: using System.Threading.Tasks;
    for (int i = 0; i < Code_Hoze.Capacity; i++)
    {
        // The inner loop over taxpayers runs its iterations concurrently.
        // Note: ret is shared by all iterations, so concurrent writes can overwrite each other.
        Parallel.For(0, TaxPaery.Length, j =>
        {
            ret = calculateMultiLevelBaysianFormules(f1Arr[j], f2Arr[j], f3Arr[j], f4Arr[j]);
        });
    }

E. Second level: queries level parallelism
The serial code at the center of the fraud detector program runs continuously to fetch the tax data from the database, process it and store it again, which is time consuming.
    for (int i = 0; i < dgvTaxPayer.RowCount; i++)
        for (int j = 0; j < ghatee_inf_RowsCount; j++)
            KernelComDo.Excute(QueryString);
Using PLINQ, query-level parallelism was implemented as in the code below, which significantly reduced the running time of the program.
    for (int i = 0; i < dgvTaxPayer.RowCount; i++)
    {
        // Cells[2].Value is an object, so convert it once before comparing strings.
        string key = Convert.ToString(dgvMoadi.Rows[i].Cells[2].Value);
        // AsParallel() turns the LINQ query into a PLINQ query that filters on several cores.
        int result1 = ghatee_inf_Rows.AsParallel()
            .Where(p => p.Field<string>("K_Parvand") == key)
            .Count();
    }

F. Third level: infrastructure level parallelism
After the tax fraud detection system was analyzed, designed and implemented, it was run on an Intel Xeon server with 16 X5570 dual-core processors and 64GB of RAM. The results are reported below.

G. Compute performance of proposed method of parallelism
In this study, some of the program's loops were parallelized in the section that extracts the Bayesian tables, along with some of the program's queries in the fraud detection section. The final program then ran with faster processing.

TABLE 2. THE RESULTS OF THE LOOP AND QUERY PARALLELISM (SECOND)
Number of taxpayers | Time in serial mode (s) | Time in loop-level parallel (s) | Time in query-level parallel (s)
1000  | 505  | 33   | 14
2000  | 539  | 45   | 18
3000  | 561  | 72   | 27
4000  | 598  | 122  | 34
5000  | 659  | 130  | 40
6000  | 691  | 155  | 59
7000  | 730  | 182  | 97
8000  | 872  | 201  | 114
9000  | 954  | 227  | 189
10000 | 1114 | 241  | 193
SUM   | 7223 | 1408 | 785

(6)

The acceleration achieved by loop parallelism can be seen in Table 2. Figure 5 also displays the speedup graph obtained from loop parallelism. It is worth mentioning that the numerical values produced by the serial and parallel models were the same; they differ only in the time consumed.
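The loop-level and query-level techniques of sections D and E, and the observation that serial and parallel runs give the same values, can be illustrated with a small self-contained sketch. The class `ParallelSketch`, the `Score` function and the sample data are hypothetical stand-ins. They are not the program's actual Bayesian computation or database rows.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // Hypothetical stand-in for the per-taxpayer Bayesian computation.
    public static double Score(int f1, int f2, int f3, int f4)
    {
        return (f1 + 2 * f2 + 3 * f3 + 4 * f4) / 10.0;
    }

    // Serial reference loop.
    public static double[] ScoreSerial(int[][] features)
    {
        var result = new double[features.Length];
        for (int j = 0; j < features.Length; j++)
            result[j] = Score(features[j][0], features[j][1], features[j][2], features[j][3]);
        return result;
    }

    // Loop-level parallelism: each iteration writes only result[j], so no locking is needed.
    public static double[] ScoreParallel(int[][] features)
    {
        var result = new double[features.Length];
        Parallel.For(0, features.Length, j =>
        {
            result[j] = Score(features[j][0], features[j][1], features[j][2], features[j][3]);
        });
        return result;
    }

    // Query-level parallelism: AsParallel() turns the LINQ query into a PLINQ query.
    public static int CountMatches(string[] fileKeys, string key)
    {
        return fileKeys.AsParallel().Where(k => k == key).Count();
    }

    public static void Main()
    {
        var rng = new Random(42);
        int[][] features = Enumerable.Range(0, 10000)
            .Select(_ => new[] { rng.Next(2), rng.Next(2), rng.Next(2), rng.Next(2) })
            .ToArray();

        // Both versions compute the same values; only the running time differs.
        bool same = ScoreSerial(features).SequenceEqual(ScoreParallel(features));
        Console.WriteLine("Serial and parallel results identical: " + same);

        string[] keys = { "A-1", "B-2", "A-1", "C-3", "A-1" };
        Console.WriteLine("Rows matching A-1: " + CountMatches(keys, "A-1"));
    }
}
```

Writing each result into its own array slot is the design choice that keeps the parallel loop free of shared mutable state. A single shared result variable would need synchronization, or it would lose values.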
The results of query-level parallelism are also shown in Table 2. To compute the speedup, the sum of the running times of all experiments in serial mode was divided by the sum of the running times of all experiments in each parallel mode (the third and fourth columns of Table 2), as in Formula 6. Finally, the overall performance ratio was computed for each method.
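As a worked illustration, the SUM row of Table 2 can be plugged into the ratio described above. This assumes Formula 6 is that ratio of summed times; the symbols S and T are notation introduced here, not taken from the paper.

```latex
S_{\text{method}} = \frac{\sum_{k} T_{\text{serial},k}}{\sum_{k} T_{\text{method},k}},
\qquad
S_{\text{loop}} = \frac{7223}{1408} \approx 5.13,
\qquad
S_{\text{query}} = \frac{7223}{785} \approx 9.20
```

The query-level value agrees with the speedup of up to 9.2x reported in the abstract.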
Final comparison of the two methods, parallelism in loops and parallelism in queries: the reason that parallelism in queries (PLINQ) outperforms parallelism in loops is that the tax fraud identification and prediction program operates at the data level and is fundamentally data-driven.

FIGURE 5. SPEEDUP LINQ OVER LOOP PARALLELISM

In this study, parallelism in loops was done using Parallel.For, while query parallelism was implemented by means of Parallel LINQ in .NET. Parallelism at the execution level was achieved through parallel processing on the available multi-core hardware and proper use of the capabilities of the multi-core processors. Using the data of 15 financial departments, 10028 experiments were conducted. Out of the 10028 taxpayers, 995 were clearly committing fraud. The output of fraud detection in this section is shown in Table 3.

TABLE 3. OUTPUT OF SYSTEM BY MEANS OF LOOP AND QUERIES PARALLELISM
Department code | Taxpayers | Forgers | Number of processed tuples of taxpayers from their 10 years history file, for identification
H-1  | 303  | 9   | 2,877,094
H-2  | 1264 | 170 | 203,091,683
H-3  | 806  | 46  | 13,038,888
H-4  | 1117 | 90  | 7,260,277
H-5  | 802  | 71  | 6,973,117
H-6  | 476  | 42  | 10,420,630
H-7  | 771  | 110 | 12,249,956
H-8  | 603  | 89  | 45,498,883
H-9  | 670  | 34  | 19,956,647
H-10 | 333  | 44  | 4,048,641
H-11 | 501  | 92  | 57,167,908
H-12 | 733  | 35  | 29,967,092
H-13 | 473  | 26  | 9,692,546
H-14 | 578  | 101 | 27,129,424
H-15 | 598  | 36  | 9,598,905
Sum: | 10028 | 995 | 458,971,690

Using the data in Table 3 and the value of the variable from the previous step, we increased the amount of prediction. The dotted area in Figure 6 shows the area of fraud prediction.

FIGURE 6. PREDICT AMOUNT OF FRAUD BY MEANS OF EXAMINING DATA

IV. CONCLUSIONS
Intelligent tax fraud detection systems identify fraud in the available tax data with a lower error rate and, depending on the tax department chosen by the user, can report the list of frauds committed in that department to the relevant specialists.
Because of the large volume of tax data, running the program in serial mode was time consuming. In this study we therefore relied on new parallelism technology and applied it to parts of the procedure to reduce the running time of the program. This system can be applied to the structure of the tax system.

REFERENCES
[1] P.C. González, J.D. Velásquez, "Characterization and detection of taxpayers with false invoices using data mining techniques," Expert Systems with Applications, vol. 40, no. 5, pp. 1427-1436, 2013.
[2] R.S. Wu, C.S. Ou, H. Lin, S.I. Chang, D.C. Yen, "Using data mining technique to enhance tax evasion detection performance," Expert Systems with Applications, vol. 39, no. 10, pp. 8769-8777, 2012.
[3] P. Ravisankar, V. Ravi, G.R. Rao, I. Bose, "Detection of financial statement fraud and feature selection using data mining techniques," Decision Support Systems, vol. 50, no. 2, pp. 491-500, 2011.
[4] E.W.T. Ngai, Y. Hu, Y.H. Wong, Y. Chen, X. Sun, "The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature," Decision Support Systems, vol. 50, no. 3, pp. 559-569, 2011.
[5] V. Ajay, D.V. Ashoka, V.N. Aradya, "Application of Data Mining Techniques for Defect Detection and Classification," in Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA), 2015.
[6] M.S. Abadeh, S. Mahmoodi, M. Taherparvar, "Application Data Mining," Niyaz Danesh Press, 2012.
[7] J. Shahrabi, V. Shakor, "Data mining Concepts," Metalon Press, 2007.
[8] A. Ahmadi, A. Mohebbi, "Business Intelligence: data mining and optimization," Amirkabir University Press, 2013.
[9] Central Michigan University, "Computational Data Mining Techniques in Automotive Insurance Fraud Detection," Data Science Journal, 2012.
[10] F. Nonyelum, "Data Mining Application in credit card fraud detection system," Journal of Engineering Science and Technology, vol. 6, 2011.
[11] Akash Verenkar, "Using .NET Parallel Programming Model to Achieve Data Parallelism in Multi-tier," Microsoft Corporation, 2015.
[12] D. Leijen, W. Schulte, S. Burckhardt, "The Design of a Task Parallel Library," Microsoft Corporation, 2015.
[13] S. Okur, D. Dig, "How do Developers Use Parallel Libraries?," MSDN Microsoft Corporation, 2015.
[14] I. Ostrovsky, "Parallel Programming in .NET 4," Parallel Computing Platform Group, 2015.