
Application of Soft Computing to Tax Fraud Detection in Small Businesses


Cao Thang$^1$, Pham Quang Toan$^2$, Eric W. Cooper$^3$ and Katsuari Kamei$^3$
$^1$ Graduate School of Science and Engineering, Ritsumeikan University, Japan
Email: thangc@spice.ci.ritsumei.ac.jp
$^2$ General Department of Taxation of Vietnam, Vietnam
Email: pqtoan@gdt.gov.vn
$^3$ College of Information Science and Engineering, Ritsumeikan University, Japan
Email: copper@is.ritsumei.ac.jp; kamei@ci.ritsumei.ac.jp
Abstract: In this paper, we present a soft computing model for
tax fraud detection in small firms and businesses. The inputs to
the model are periodical finance reports and related information
about the market and the inspected firms, and the output is an
inference of the tax fraud status. First, using fuzzy inference,
the system determines the business class to which the inspected
firm most closely belongs. Next, a Neural Network (NN) trained
on statistical data from that business class determines the fraud
status of the inspected firm. The training data for the NN are
periodical finance reports, market information of the business
class and the fraud history of the inspected firms. Finally, we
describe initial evaluations and our future work.
Keywords: decision support system, tax fraud detection, neural
networks, fuzzy inference
I. INTRODUCTION
In the open market economy, especially in developing
countries like Vietnam, small businesses make up around 70%
of the total number of businesses and they have significantly
contributed to the development of the countries. Taxes are the
main revenue source of governments and in Vietnam the
revenue from small businesses is about 20% of the total tax
revenue. When a country has a high rate of tax fraud, not only
does it have less revenue but also its economic situation is held
back, affecting the socioeconomics of the country. Large
businesses are often inspected carefully and frequently because
they are few in number but have large sales. In contrast, small
businesses are numerous and the sales of each are small, so they
are commonly not inspected regularly or carefully. In addition,
it takes the government much time to verify the finance reports
and real business activities of all small businesses, so tax
fraud easily goes undetected here. The loss from tax fraud by a
single small business is negligible, but the loss from a large
number of small businesses is considerable. Minimizing tax fraud
is always a primary challenge for governments in strengthening
and developing their socioeconomics.
Accuracy and the time required for verifying the financial
status and checking the business activities of all small
businesses play an important role in the tax fraud detection
procedure. Building a successful Decision Support System (DSS)
for tax fraud detection based on knowledge from experienced
auditors, financial experts and inspectors will help to increase
the accuracy and reduce the time and human resources needed for
the inspection process, efficiently recovering the income lost
to tax fraud. It also contributes to stabilizing the economic
situation of the country and helps to restrict negative
activities in the inspecting procedures.
II. DECISION SUPPORT SYSTEMS
In the last 50 years, the advent of the computer has greatly
stimulated developments of DSS and Expert Systems, which
perform the roles of a specialist, or assist people in carrying out
tasks requiring specific expertise. There are many domains in
which DSS has been successfully applied, such as medicine [1-2],
geology [3], chemistry [4-5] and business [6-20], including
financial fraud detection [8-9, 17-20].
According to experienced auditors, in the tax fraud
inspection process, first the scale of businesses is categorized
into some defined groups. Each group has some similarities in
sales, commodities, services and markets. Second, for each
firm, periodical finance reports are inspected, together with the
information about its market and history of business activities.
Table 1 shows a required periodical finance report of a firm.
Table 2 shows information about the market and business
history.
The tax fraud inspection process described above can be
suitably assisted with a DSS as shown in Fig.1.
Roles of functional parts in Fig. 1 are as follows:
- Inspected Finance Reports: Periodical finance reports
of the inspected firm, submitted in a predefined format.
- Fuzzy Classification: Determines the scale of inspected
firm and which business classes the firm belongs to.
- Sample Finance Reports: Typical periodical finance
reports clarified by skilled auditors.
1-4244-0569-6/06/$20.00 2006 IEEE
- Market Information: Information about market and
commodities that the firm uses in its business.
- Firm Information: Business history of the inspected
firm.
- Feature Extraction: Selects useful information and
encodes it in suitable forms so that it can be learned
well by an NN.
- Neural Network: Generalizes training data, checks
finance report to find any fraud, and then gives a
conclusion about the tax fraud status.
[Figure 1: block diagram. Blocks: Experiences of Tax Inspectors; Inspected Finance Reports; Fuzzy Classification; Sample Finance Reports, Market Information and Firm Information; Business Activities; Feature Extraction; Training Data; Neural Network; Tax Fraud Conclusions.]
Figure 1. Structure of DSS for the Tax Fraud Detection Process
TABLE I. A REPORT OF BUSINESS RESULTS OF A FIRM
Finance Code    Criteria (values are in currency)
01              1. Sum of sales and services
03              2. Deducted expenditure
10              3. Sum of plain sales and services
11              4. Prime cost
20              5. Profits from sales and services
21              6. Sales from finance activities
22              7.1. Total finance cost
23              7.2. Interest cost
24              8. Selling cost
25              9. Management cost
30              10. Plain profits from sales
31              11. Other incomes
32              12. Other costs
40              13. Other profits
50              14. Sum of profits before tax
51              15. Firm's income tax
60              16. Sum of profits after tax
Properties
420             Welfare budget
430             Sum of assets
270             Source of capital
TABLE II. INFORMATION ABOUT MARKET AND HISTORY OF BUSINESS ACTIVITIES
No.      Criteria
I        Market information
1.1      Are the commodities from the firm popular in the market?
1.2      Should the commodities be sold with special services?
1.2.a    Are special services provided by this firm?
1.2.b    Are special services provided by other firms?
1.3      Ratio of the value added to the commodities to their base prices
II       Business information and history
2.1      Does this firm have a tax fraud history?
2.2      Ratio of sales of this period to last period
2.3      Ratio of profits of this period to last period
2.4      Number of employees
III. FUZZY INFERENCE AND BUSINESS CLASSIFICATION
The classification criteria are often based on the activities
that generate the main profits. In the open market, a firm often
engages in many business activities, and the areas of these
activities may overlap. For a firm, some business activities are
more important than others. The vagueness and
complexity of these activities make it unsuitable for traditional
quantitative approaches to determine which business class a
firm belongs to. Fuzzy sets, known for their abilities to deal
with vague variables using membership functions rather than
with crisp values, have proven to be one of the most suitable
approaches to resolve this problem. They also enable
developers to use linguistic variables and build a friendly user
interface. Using fuzzy rules, a DSS can give expert-like
explanations, making it easier for users to understand results
from the DSS.
So far, some practical applications in finance have been
built based on fuzzy logic [6-9,20]. In our model, fuzzy logic
is used to determine which business class the inspected firm
belongs to, based on the business activities in which the firm
is involved.
In the Vietnamese market, there are about 4000 kinds of
products, commodities and services that are taxed, so taking all
of them into account would take too much time and effort. In the
classification step, we first use the properties of goods to
classify all commodities and services into groups. A business
group consists of business activities that can be carried out
entirely by one firm, and a firm can be active in one or several
business groups. For example, the group "computers" includes
computers and their components, the group "vegetables" consists
of vegetable products, and the group "transportation services"
comprises taxi, cargo and passenger services.
Businesses are classified into 12 defined classes: agricultural
products, agricultural services, mechanical products, wholesale
trading, retail trading, light industrial products, light
industrial services, heavy industrial products, heavy industrial
services, real estate services, construction services and
education services.
A. Fuzzy Expressions of Business Activities
Suppose that $F$ is the inspected firm, the number of business
groups is $k$, the number of business classes is $m = 12$, and
the firm has activities in $n$ groups $(n \le k)$. Let
$L = \{L_1, L_2, \ldots, L_k\}$ be the set of all business groups. Let
$L^F = \{L_1^F, L_2^F, \ldots, L_n^F\}$ be the set of the business
groups in which firm $F$ is involved $(L^F \subseteq L)$. Let
$B = \{B_1, B_2, \ldots, B_m\}$ be the set of the main business classes.
Let the following fuzzy values for $L_i$ $(i = 1, \ldots, k)$ be defined:
- $\mu_{B_j}^{L_i} \in [0, 1]$: the dominant degree of $L_i$ in $B_j$
$(j = 1, \ldots, m)$, given by experienced auditors via a survey
in advance, where:
$$\sum_{i=1}^{n} \mu_{B_j}^{L_i} = 1 \tag{1}$$
$\mu_{B_j}^{L_i} = 1$ means that the class $B_j$ contains only $L_i$,
$\mu_{B_j}^{L_i} = 0$ means that $B_j$ does not contain $L_i$, and
$0 < \mu_{B_j}^{L_i} < 1$ means that $L_i$ appears in $B_j$ with the
dominant ratio $\mu_{B_j}^{L_i}$.
The dominant degrees are introduced because a business
class may have several different groups, and each group
has a different degree of importance in each class. The
sum in Eq. (1) equals 1, which means that the maximum belief
degree of a business group in a business class is 1. This
sum is also a constraint that prevents auditors from
expressing the dominant degrees arbitrarily according to
their own views.
- $\mu_F^{L_i} \in [0, 1]$: the ratio of the business activities from $L_i$
to all business activities done by the inspected firm $F$.
$\mu_F^{L_i} = 1$ means that firm $F$ only has business activities in
$L_i$, $\mu_F^{L_i} = 0$ means that firm $F$ has no activities in $L_i$,
and $0 < \mu_F^{L_i} < 1$ means that firm $F$ has activities in groups
other than $L_i$ and the activities in $L_i$ take a ratio expressed by
$\mu_F^{L_i}$. $\mu_F^{L_i}$ also satisfies the following constraint:
$$\sum_{i=1}^{n} \mu_F^{L_i} = 1 \tag{2}$$
A firm $F$ has business activities in $n$ groups
$L^F = \{L_1^F, L_2^F, \ldots, L_n^F\}$. Each activity $L_i^F$ done by
firm $F$ has a ratio $\mu_{B_j}^{L_i}$ in class $B_j$. The degree to
which firm $F$ belongs to $B_j$ is determined by the following rule:
IF the business activities of firm $F$ are $L_1^F$ and ... and $L_n^F$
THEN $F$ belongs to $B_j$    (3)
B. Classification Inference Process
First, the certainty ratio of business activity $L_i^F$ done by
firm $F$ in class $B_j$ is evaluated as follows:
$$\mu_{L_i^F}^{B_j} = \mu_F^{L_i} \otimes \mu_{B_j}^{L_i}, \quad j = 1, \ldots, m \tag{4}$$
where $\otimes$ is a t-norm operator.
Then, the certainty value of all business activities $L^F$ done
by firm $F$ in class $B_j$ is calculated:
$$\mu_F^{B_j} = \bigoplus_{i=1}^{n} \mu_{L_i^F}^{B_j} \tag{5}$$
where $\oplus$ is a t-conorm operator.
Finally, the system finds the business class $B_x$ that has the
maximum value $\mu_F^{B_x}$ among the $m$ classes:
$$B_x = \left\{ B_y \;\middle|\; \mu_F^{B_y} = \max_{j} \mu_F^{B_j} \right\} \tag{6}$$
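The three inference steps above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function and variable names are hypothetical, the dominant degrees and the firm profile are invented for illustration, and the default operators (product as the t-norm and plain addition in the role of the t-conorm) follow the choice reported in Section V.

```python
from functools import reduce
from typing import Callable, Dict, Tuple


def t_product(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def s_sum(a: float, b: float) -> float:
    """Plain addition, used here in the role of the t-conorm."""
    return a + b


def classify_firm(
    firm_ratios: Dict[str, float],
    dominant: Dict[str, Dict[str, float]],
    t_norm: Callable[[float, float], float] = t_product,
    t_conorm: Callable[[float, float], float] = s_sum,
) -> Tuple[str, Dict[str, float]]:
    """Return the best class and the certainty of the firm in every class.

    firm_ratios: group -> share of the firm's activities (must sum to 1, Eq. 2).
    dominant:    class -> {group -> dominant degree of the group in the class}.
    """
    if abs(sum(firm_ratios.values()) - 1.0) > 1e-9:
        raise ValueError("firm activity ratios must sum to 1 (Eq. 2)")
    certainty = {}
    for cls, degrees in dominant.items():
        # Eq. (4): certainty of each activity of the firm in this class.
        per_activity = [
            t_norm(ratio, degrees.get(group, 0.0))
            for group, ratio in firm_ratios.items()
        ]
        # Eq. (5): aggregate over the firm's n activities.
        certainty[cls] = reduce(t_conorm, per_activity)
    # Eq. (6): pick the class with the maximum certainty.
    best = max(certainty, key=certainty.get)
    return best, certainty


# Hypothetical survey values and firm profile, for illustration only.
dominant = {
    "retail trading": {"computers": 0.5, "vegetables": 0.5},
    "light industrial services": {"computers": 0.2, "transportation services": 0.8},
}
firm = {"computers": 0.75, "transportation services": 0.25}
best_class, scores = classify_firm(firm, dominant)
```

With these invented numbers the firm scores 0.375 in "retail trading" and 0.35 in "light industrial services", so the first class is selected. Passing the operators as parameters keeps the sketch usable with other t-norm and t-conorm pairs.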
IV. NEURAL NETWORKS AND TAX FRAUD DETECTION
A. NN and Finance Applications
NNs are a powerful technique to help finance experts to
analyze, model and make sense of complex business data [14 -
17,20]. They enable intelligent systems to learn from
experience, examples and finance records, improving the
performance of the systems over time. Based on typical sample
finance reports and knowledge from experienced finance
experts and inspectors, NNs can generalize finance rules as
well as relations among markets, sales and profits. After
training, NNs can give reasonable advice about the likely tax
fraud status of a new firm in accordance with the periodical
finance report of the firm and its market information. Fig. 2
illustrates the steps for applying NNs to decision-making
applications in finance.
B. Finance Data for NN
An important point in the feature extraction step is to select
the right sets of input and output features. Raw data are finance
reports of firms, market and business information of business
classes, and experience of inspectors. Features should be
reasonably chosen so that NN can well generalize relations in
the training data, and from a trained NN we can get suitable
conclusions about tax fraud of a new firm based on its market
information and periodical finance reports.
1) Raw data
Raw data in finance reports are monetary values, ranging
from small values, for example "deducted VAT", to large
values, for example "Sum of sales and services". Depending
on the scale of sales, the values of a criterion in the finance
reports of different firms may differ greatly.
Raw data about the market and business history are in the form
of responses, percentages and numbers, which also have wide
ranges. The conclusion of the inspection process is "Yes",
meaning that the firm is likely to have committed tax fraud, or
"No", meaning that the firm is not likely to have committed tax
fraud.
[Figure 2: diagram. Training data (finance and market rules, socioeconomic information, finance reports, experiences of finance experts) pass through Feature Extraction to give feature vectors; these feed a three-layer Neural Network with inputs I_1 ... I_m, hidden units H_1 ... H_l, outputs O_1 ... O_p and connection weights w, which produces Decisions.]
Figure 2. Neural Networks for decision making applications in finance
2) Data Normalization
Widely used NN architectures are the Multi-Layer Perceptron
Neural Network (MLP NN) and the Radial Basis Function Neural
Network (RBF NN). For good generalization, input and output
values should be normalized. Because of the large ranges of
values in the finance reports, if we normalize the amounts of
money for items in these reports into [0,1] using the maximum
value as a standard, some small but important values may be
close to 0, for example "Other incomes" or "Interest cost"; some
values may be close to each other, for example "Sum of plain
sales and services" and "Prime cost"; and some other values may
be close to 1, for example "Source of capital" and "Sum of
assets".
To avoid the above problem when normalizing the finance report
data, and to make each item stand out, we use some important
criteria as standards and use the ratios of the other criteria
to these standards. The ratio features from the finance reports
are described in Table III.
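The difference between the two normalization schemes can be seen on a toy report. The sketch below is only illustrative: the monetary values are hypothetical and the variable names are not from the paper.

```python
# Hypothetical report values in one currency unit, named after Table I criteria.
report = {
    "Sum of sales and services": 9_000.0,
    "Prime cost": 8_100.0,
    "Profits from sales and services": 900.0,
    "Interest cost": 180.0,
    "Sum of assets": 10_000.0,
}

# Max-based scaling: divide every item by the largest value in the report.
largest = max(report.values())
max_scaled = {name: value / largest for name, value in report.items()}

# Ratio-based scaling: divide an item by a related standard criterion instead.
ratio_scaled = {
    "Prime cost / Sum of sales": (
        report["Prime cost"] / report["Sum of sales and services"]
    ),
    "Interest cost / Profits": (
        report["Interest cost"] / report["Profits from sales and services"]
    ),
}
```

Under max-based scaling the interest cost shrinks to 0.018, which is almost indistinguishable from 0, whereas its ratio to profits is 0.2 and remains clearly visible to the network.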
Yes/No features from the table of market and history of business
activities have Boolean values: Yes (true, coded by 1) and No
(false, coded by 0). The other percentages have values in [0,1],
so they are unchanged in the training data. The number of
employees ranges from the minimum number $E_{\min}$ to the
maximum $E_{\max}$ (normally $E_{\min} = 5$ and $E_{\max} = 80$ for a
small business). We code the number as a fuzzy membership
value. The membership value $\mu_F^E$ of the number of
employees $E_F$ in firm $F$ is calculated as
$$\mu_F^E = \frac{E_F - E_{\min}}{E_{\max} - E_{\min}} \tag{7}$$
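A possible encoding of the nine Table II items is sketched below. The item keys follow the numbering of Table II; the function names, the answer dictionary and the clipping of Eq. (7) to [0, 1] for out-of-range head counts are assumptions made for this sketch, not details given in the paper.

```python
from typing import Dict, List, Union

E_MIN, E_MAX = 5, 80  # typical small-business bounds quoted in the text

BOOLEAN_ITEMS = ("1.1", "1.2", "1.2.a", "1.2.b", "2.1")
RATIO_ITEMS = ("1.3", "2.2", "2.3")
ORDER = ("1.1", "1.2", "1.2.a", "1.2.b", "1.3", "2.1", "2.2", "2.3", "2.4")


def employee_membership(
    n_employees: float, e_min: float = E_MIN, e_max: float = E_MAX
) -> float:
    """Eq. (7); clipping to [0, 1] is an added assumption."""
    value = (n_employees - e_min) / (e_max - e_min)
    return min(1.0, max(0.0, value))


def encode_business_info(answers: Dict[str, Union[bool, float, int]]) -> List[float]:
    """Turn the Table II answers into a 9-element feature vector."""
    features = []
    for item in ORDER:
        value = answers[item]
        if item in BOOLEAN_ITEMS:
            features.append(1.0 if value else 0.0)  # Yes -> 1, No -> 0
        elif item in RATIO_ITEMS:
            features.append(float(value))  # ratios in [0, 1] are kept unchanged
        else:
            features.append(employee_membership(value))  # item 2.4
    return features


# Hypothetical answers for one firm.
answers = {"1.1": True, "1.2": True, "1.2.a": False, "1.2.b": True, "1.3": 0.3,
           "2.1": False, "2.2": 0.9, "2.3": 0.8, "2.4": 20}
vector = encode_business_info(answers)
```

For a firm with 20 employees, Eq. (7) gives (20 - 5) / (80 - 5) = 0.2, which becomes the last element of the vector.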
TABLE III. RATIO FEATURES FROM FINANCE REPORTS
$S_1$ = Sum of sales and services
$C_1$ = Prime cost
$P_1$ = Profits from sales and services
$M_1$ = Management cost
$P_2$ = Sum of profits before tax
No.   Ratio Features
1     $S_1$ = Sum of sales and services
2     Deducted expenditure / $S_1$
3     Sum of plain sales and services / $S_1$
4     $C_1 / S_1$
5     $P_1 / S_1$
6     Sales from finance activities / $P_1$
7     Finance cost / $P_1$
8     Interest cost / $P_1$
9     Selling cost / $C_1$
10    $M_1 / C_1$
11    Plain profits from sales / $S_1$
12    Other incomes / $M_1$
13    Other costs / $M_1$
14    Other profits / $M_1$
15    $P_2 / S_1$
16    Firm's income tax / $P_2$
17    Sum of profits after tax / $P_2$
18    Welfare budget / $S_1$
19    Sum of assets / $S_1$
20    Source of capital / $S_1$
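The 20 features of Table III could be computed from a report keyed by the finance codes of Table I roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, feature 7 is taken to use code 22 (total finance cost), and returning 0.0 when a standard is zero is a choice made here because the paper does not say how that case is handled.

```python
from typing import Dict, List


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 when the standard is zero (an assumption, not from the paper)."""
    return numerator / denominator if denominator else 0.0


def ratio_features(r: Dict[str, float]) -> List[float]:
    """Map a report keyed by Table I finance codes to the 20 Table III features."""
    s1, c1, p1, m1, p2 = r["01"], r["11"], r["20"], r["25"], r["50"]
    return [
        s1,                      # 1  Sum of sales and services (the standard itself)
        safe_div(r["03"], s1),   # 2  Deducted expenditure / S1
        safe_div(r["10"], s1),   # 3  Sum of plain sales and services / S1
        safe_div(c1, s1),        # 4  C1 / S1
        safe_div(p1, s1),        # 5  P1 / S1
        safe_div(r["21"], p1),   # 6  Sales from finance activities / P1
        safe_div(r["22"], p1),   # 7  Finance cost / P1
        safe_div(r["23"], p1),   # 8  Interest cost / P1
        safe_div(r["24"], c1),   # 9  Selling cost / C1
        safe_div(m1, c1),        # 10 M1 / C1
        safe_div(r["30"], s1),   # 11 Plain profits from sales / S1
        safe_div(r["31"], m1),   # 12 Other incomes / M1
        safe_div(r["32"], m1),   # 13 Other costs / M1
        safe_div(r["40"], m1),   # 14 Other profits / M1
        safe_div(p2, s1),        # 15 P2 / S1
        safe_div(r["51"], p2),   # 16 Firm's income tax / P2
        safe_div(r["60"], p2),   # 17 Sum of profits after tax / P2
        safe_div(r["420"], s1),  # 18 Welfare budget / S1
        safe_div(r["430"], s1),  # 19 Sum of assets / S1
        safe_div(r["270"], s1),  # 20 Source of capital / S1
    ]


# Hypothetical report, for illustration only.
sample_report = {
    "01": 1000.0, "03": 50.0, "10": 950.0, "11": 700.0, "20": 250.0,
    "21": 10.0, "22": 20.0, "23": 15.0, "24": 70.0, "25": 100.0,
    "30": 70.0, "31": 5.0, "32": 3.0, "40": 2.0, "50": 72.0,
    "51": 18.0, "60": 54.0, "420": 10.0, "430": 2000.0, "270": 2000.0,
}
features = ratio_features(sample_report)
```

Feature 1 is left as the raw value of $S_1$ because Table III lists it that way; how it is scaled before training is not specified in the text.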
V. IMPLEMENTATION
The small businesses selected are firms that have sales
lower than 10 billion VND (equivalent to 700,000 USD) per
year. The number of business groups is $k = 80$. The dominant
degree $\mu_{B_j}^{L_i}$ of business group $L_i$ in business class $B_j$ is
evaluated by experienced auditors via a survey in advance. The
ratio $\mu_F^{L_i}$ of the business activities from $L_i$ to all business
activities done by the inspected firm $F$ is evaluated by
inspectors in the inspection process. Multiplication and
addition are chosen as the t-norm and t-conorm
operations in Eqs. (4) and (5).
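As a hedged illustration of Eqs. (4) and (5) under this choice of operators, consider a hypothetical firm with activities in $n = 2$ groups. All numbers below are invented for illustration and do not come from the survey; $\mu_F^{L_i}$ denotes the firm's activity ratio in group $L_i$ and $\mu_{B_j}^{L_i}$ the dominant degree of $L_i$ in class $B_j$.

```latex
\[
\begin{aligned}
\mu_F^{L_1} &= 0.7, \quad \mu_F^{L_2} = 0.3, \qquad
\mu_{B_j}^{L_1} = 0.6, \quad \mu_{B_j}^{L_2} = 0.2 \\
\mu_F^{B_j} &= \mu_F^{L_1}\,\mu_{B_j}^{L_1} + \mu_F^{L_2}\,\mu_{B_j}^{L_2}
             = 0.7 \times 0.6 + 0.3 \times 0.2 = 0.42 + 0.06 = 0.48
\end{aligned}
\]
```

Plain addition is not bounded by 1 in general, but here the result stays in [0,1]: every dominant degree is at most 1 and the activity ratios sum to 1 by Eq. (2), so $\mu_F^{B_j} = \sum_i \mu_F^{L_i}\,\mu_{B_j}^{L_i} \le \sum_i \mu_F^{L_i} = 1$.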
There are $m = 12$ NNs corresponding to the 12 business
classes. Each NN has 3 layers as shown in Fig. 3. The inputs to
the NN are 20 features from finance reports and 9 features from
market and business information, and there is one output, where
the value 1 means that the inspected firm has committed tax
fraud and 0 means that the firm has not committed tax fraud. For
the testing data, since the output of the NN takes values in
[0,1], the threshold for deciding whether the firm has committed
fraud is initially set to 0.5. Depending on the training data in
each business class, this threshold can be changed.
By experiment, the number of neurons in the hidden layer
is chosen as one third of the number of inputs. The NNs are
back-propagation MLP NNs adopting sigmoid or hyperbolic tangent
activation functions. To accelerate training, an adaptive
learning rate and a momentum term are also used. Fig. 3 shows
one NN in the inspection procedure.
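A network of this shape can be sketched in plain Python as below. It is a minimal illustration rather than the authors' implementation: the class name, weight initialization range, learning rate and momentum value are assumptions, the hidden layer size is rounded down (the paper does not state the rounding), and the adaptive learning rate is omitted for brevity.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One-hidden-layer perceptron with sigmoid units and a single output."""

    def __init__(self, n_inputs: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_hidden = max(1, n_inputs // 3)  # one third of inputs, rounded down
        # Each row holds the weights of one unit; the last entry is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(self.n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(self.n_hidden + 1)]
        # Velocities for the momentum term.
        self.v_hidden = [[0.0] * (n_inputs + 1) for _ in range(self.n_hidden)]
        self.v_out = [0.0] * (self.n_hidden + 1)

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float], float]:
        xb = list(x) + [1.0]  # append the constant bias input
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, out

    def predict(self, x: Sequence[float]) -> float:
        return self.forward(x)[2]

    def train_step(self, x: Sequence[float], target: float,
                   lr: float = 0.1, momentum: float = 0.9) -> float:
        """One back-propagation update; returns the squared error before it."""
        xb, hb, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated.
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                        for j in range(self.n_hidden)]
        for j in range(self.n_hidden + 1):
            self.v_out[j] = momentum * self.v_out[j] - lr * delta_out * hb[j]
            self.w_out[j] += self.v_out[j]
        for j in range(self.n_hidden):
            for i in range(len(xb)):
                self.v_hidden[j][i] = (momentum * self.v_hidden[j][i]
                                       - lr * delta_hidden[j] * xb[i])
                self.w_hidden[j][i] += self.v_hidden[j][i]
        return (out - target) ** 2


net = TinyMLP(n_inputs=29)   # 20 finance features + 9 business features
sample = [0.5] * 29          # placeholder feature vector, not real data
first_error = net.train_step(sample, target=1.0)
for _ in range(200):
    last_error = net.train_step(sample, target=1.0)
```

The loop at the end is only a smoke test on a single placeholder sample; it shows that the update rule reduces the error and says nothing about detection quality on real reports.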
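[Figure 3: three-layer network. Inputs I_1, ..., I_i, ..., I_m (features from finance reports and features from business information); hidden units H_1, ..., H_j, ..., H_l; one output O (conclusion of tax fraud status, Yes/No); connection weights w_ij and w_jk.]
Figure 3. One NN in the inspection procedure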
VI. EVALUATIONS
Combining fuzzy inference and NNs provides a more
powerful and effective DSS with reasoning and generalizing
capabilities for evaluating fraud status in periodical finance
reports.
The rule (3), together with Eqs. (4), (5) and constraints (1)
and (2), is equivalent to the following fuzzy rule:
IF the ratios of business activities of firm $F$ are
($L_1^F$ is $\mu_{B_j}^{L_1}$) and ... and ($L_n^F$ is $\mu_{B_j}^{L_n}$)
THEN $F$ belongs to $B_j$ with certainty $\mu_F^{B_j}$    (8)
A DSS using rule form (8) may need thousands of
inference rules with many combinations of ratios in the premises.
Not only do they take a lot of developers' time, but revising
all of the rules also takes much effort from experienced
auditors. Our model uses only $m$ rules of form (3) with Eqs. (1),
(2), (4) and (5). After inference, the system shows a graph of
the membership degrees of firm $F$ in the $m$ classes. Experienced
auditors also confirmed that it was easy to review the knowledge
presented by the rules, and easy to understand the inference
results via the membership graph. Fig. 4 shows an example of the
graph. In this graph, the inspected firm $F$ belongs to the class
$B_9$ because most of its business activities are in the business
class $B_9$ ($\mu_F^{B_9} = 0.52$).
Figure 4. A graph of membership degrees of a firm in 12 classes
The relations in the data may differ greatly when sample
finance reports are collected from both large and small firms,
or gathered from firms in different business classes. NNs learn
the training data better if we classify business activities and
scales into classes and use one NN for each business class,
whose firms have similar business activities.
As opposed to errors or mistakes, tax fraud is not
accidental. Some firms try to make their finance reports look
reasonable when submitting them, and fraud is hidden
intentionally. The NNs may not efficiently discover the hidden
fraud if we only use features selected from the periodical
finance reports. NNs would better generalize the data and more
effectively classify fraud and non-fraud firms when we use
training features extracted from business activities and firm
information, together with the finance reports.
In the first phase, we use real data from 13 non-fraud firms
and 20 fraud firms in the heavy industrial services class as the
training and testing data of an NN. The number of training
datasets is 24, including 9 datasets from non-fraud firms and 15
datasets from fraud firms, and the number of testing datasets is
9, including 4 datasets from non-fraud firms and 5 datasets from
fraud firms. The sigmoid function is chosen as the activation
function of the NN. After 2000 iterations, the NN learned the
data with a Mean Square Error of $2 \times 10^{-1}$ on both the
training and testing data. The correct classification rate for
non-fraud firms is 3/4 or 75%, and for fraud firms is 4/5 or
80%. These correct classification rates could be improved if we
had more training and testing data for the NN. After getting
enough data, we can obtain a final evaluation of classification
and prediction of the tax fraud status.
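The per-class rates quoted above follow from thresholding the NN outputs and counting correct decisions in each class. The sketch below shows that bookkeeping; the function name is hypothetical, the outputs are invented so that they reproduce the 3/4 and 4/5 counts, and treating an output equal to the threshold as fraud is an assumption.

```python
from typing import List, Tuple


def classification_rates(
    outputs: List[float], labels: List[int], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (non-fraud rate, fraud rate): share of each class decided correctly."""
    predictions = [1 if out >= threshold else 0 for out in outputs]
    rates = []
    for cls in (0, 1):  # 0 = non-fraud, 1 = fraud
        indices = [i for i, label in enumerate(labels) if label == cls]
        correct = sum(1 for i in indices if predictions[i] == cls)
        rates.append(correct / len(indices))
    return rates[0], rates[1]


# Hypothetical NN outputs for 4 non-fraud and 5 fraud test firms.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
outputs = [0.1, 0.3, 0.2, 0.7, 0.9, 0.8, 0.6, 0.4, 0.7]
non_fraud_rate, fraud_rate = classification_rates(outputs, labels)
```

Here one non-fraud firm (output 0.7) and one fraud firm (output 0.4) fall on the wrong side of the 0.5 threshold, giving 0.75 and 0.8; moving the threshold trades one kind of error against the other.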
VII. CONCLUSION
We present a DSS model that uses fuzzy inference for
business classification and NNs for tax fraud detection within
business classes. The initial results show that this is a
promising approach for tax fraud detection applications, and it
could be applied to other finance problems due to its reasoning
and generalizing capabilities.
There are some limitations of the DSS. Because fraud is
intentionally and sophisticatedly hidden, at times it can be quite
difficult to detect, even by skilled auditors. This DSS is
developed to assist auditors in their inspection process.
Although the results from the DSS are reasonable and logical,
auditors should not depend only on the advice of the system.
They also need to consider other related information, such as
market trends or current socioeconomics, to reach final
conclusions about the tax fraud status of the inspected firms.
Our near-future work is to gather enough training data for the
NNs, to collect more experience from finance experts, and to
evaluate the results of the DSS with experienced auditors and
inspectors.
REFERENCES
[1] Cao Thang, Eric W. Cooper, Yukinobu Hoshino and Katsuari Kamei, A
Decision Support System for Rheumatic Evaluation and Treatment in
Oriental Medicine Using Fuzzy Logic and Neural Network, Lecture
Notes in Artificial Intelligence LNAI 3558, Springer-Verlag Berlin
Heidelberg, pp 399-409 (2005).
[2] Cao Thang, Eric W. Cooper, Yukinobu Hoshino, Katsuari Kamei and
Nguyen Hoang Phuong, A Proposed Model of Diagnosis and
Prescription in Oriental Medicine Using RBF Neural Networks, Journal
of Advanced Computational Intelligence and Intelligent Informatics
JACIII, Vol.10, No.4, pp. 458-464 (2006).
[3] Meksown, D.M., Wilson A.H., Automating Knowledge Acquisition for
Aerial Image Interpretation, Computer Vision Graphics (1990).
[4] Lindsay RK, Buchanan BG, Feigenbaum E A, Lederberg J., DENDRAL
- a Case study of the first expert system for scientific hypothesis
formation, Artificial Intelligence, Vol. 61 (2), pp 209-261 (1993).
[5] Razinger M., Balasubramanian K., Rerdih M. et al., Stereoisomer
generation in computer-enhanced structure elucidation, J. Chem. Inf.
Comput. Sci., Vol. 33 (6), pp 812 (1993).
[6] Nguyen Hoang Phuong, Pratit Santiprabhob, Cao Thang, Masayuki
Ando, A fuzzy consultation system for computer configurations.
Proceedings of International Conference InTech/VJFuzzy2002, pp 137-
142 (2002).
[7] Altrock C.V., Fuzzy Logic and Neurofuzzy Applications in Business and
Finance. Prentice-Hall (1996).
[8] A. Deshmukh and L. Talluru, A Rule-Based Fuzzy Reasoning System
for Assessing the Risk of Management Fraud, Int. J. of Intelligent
Systems in Accounting, Finance and Management, Vol. 7, pp 223-241
(1998).
[9] Pathak J., Vidyarthi N., Summers S. L., A fuzzy-based algorithm for
auditors to detect elements of fraud in settled insurance claims,
Managerial Auditing Journal, Vol. 20 (6), pp 632-644 (2005).
[10] Baldwin-Morgan, A. A., The impact of an expert system for audit
planning: Evidence from a case study. International Journal of Applied
Expert Systems, Vol. 2 (3), pp 159-174 (1994).
[11] Watkins, P., Eliot L., Expert Systems in Business and Finance: Issues
and Applications. John Wiley and Sons (1993).
[12] Eom, S. B., A survey of operational expert systems in business
(1980-1993). Interfaces, Vol. 26 (5), pp 50-70 (1996).
[13] Shaaf M., Ahmadi A., An artificial intelligence approach to the role of
exports in economic development of Malaysia. Atlantic Economic
Journal, December, pp 363-75 (1999).
[14] Evans O.V.D., Short-Term Currency Forecasting using Neural Networks,
ICL Systems Journal, Vol. 11 (2) , pp 117 (1997).
[15] Zhang G. and M.Y. Hu, Neural Network Forecasting of the British
Pound/US Dollar Exchange Rate, International Journal of Management
Science, Vol. 26 (4), pp 495-506 (1998).
[16] Swanson N. R., Halbert W., A model selection approach to real-time
macroeconomic forecasting using linear model and artificial neural
networks. Review of Economics and Statistics, Vol. 79 (4), pp 540-573
(1997).
[17] B.P. Green and J.H. Choi, Assessing the Risk of Management Fraud
Through Neural Network Technology, Auditing, Spring 1997, Vol. 16
(1), pp 14-28 (1997).
[18] Jianyong T., Shouju R., Wenhuang L., Xiu., Bing L., Lin L., Artificial
immune system for fraud detection, 2004 IEEE International
Conference on Systems, Man and Cybernetics, Vol. 2, pp 1407-1411
(2004).
[19] Wheeler R., Aitken S., Multiple algorithms for fraud detection,
Knowledge-Based Systems, Vol. 13 (2), pp 93-99 (2000).
[20] Suran G., and Philip T., (Eds.), Intelligent systems for finance and
business, John Wiley and Sons Inc., (1995).