A Combination of Mathematics, Statistics and Machine Learning To Detect Fraud

Md. Kowsher
Department of Applied Mathematics, Noakhali Science and Technology University,
Noakhali-3814, Bangladesh.
Email: ga.kowsher@gmail.com
Abstract: Fraud detection is a widely discussed problem in many fields, including the financial sector, banking, insurance,
and many other industries. Fraud attempts are increasing rapidly, especially with the development of technology, which makes
fraud detection more important than ever before. In this paper, we analyze fraudulent operations in a “Banking Fraud Detection”
database using a combination of mathematical, statistical, and machine learning methods, and we compare these methods. We
describe preliminary explorations that combine mathematical, statistical, and machine learning methods to identify fraudulent
banking transactions.
Keywords: Mathematics; Statistics; Machine learning; Fraud detection.
1. Introduction
If we look at daily newspapers or magazines, we regularly find news about fraud cases such as economic deception,
money laundering, and tax evasion. Banking fraud is a widespread and critical concern for financiers, analysts,
comptrollers, and observers, so tackling fraudsters is a pressing issue. In this paper we discuss several techniques,
namely Benford's Law and various classifier algorithms (Logistic Regression, Naive Bayes classifier, K-Nearest
Neighbors algorithm, Support Vector Machine, Random Forest classifier), in order to detect fraudsters and thereby
reduce economic and financial fraud. We use a “Banking Fraud Detection” database obtained from a renowned bank. The
dataset contains 184763 records, of which 653 are fraudulent.
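Because only 653 of the 184763 records are fraudulent, the two classes are highly imbalanced. The short Python sketch below uses only the counts stated in the text to compute the fraud rate and the accuracy that a trivial classifier predicting "non-fraudulent" for every record would achieve, which is a useful reference point when reading accuracy figures for this dataset.

```python
# Class balance of the "Banking Fraud Detection" dataset, using the counts stated in the text.
TOTAL_RECORDS = 184763
FRAUD_RECORDS = 653


def class_balance(total: int, fraud: int) -> tuple[float, float]:
    """Return (fraud rate, accuracy of always predicting 'non-fraudulent')."""
    fraud_rate = fraud / total
    majority_baseline = (total - fraud) / total
    return fraud_rate, majority_baseline


rate, baseline = class_balance(TOTAL_RECORDS, FRAUD_RECORDS)
print(f"fraud rate: {rate:.2%}")                  # fraud rate: 0.35%
print(f"majority-class accuracy: {baseline:.2%}")  # majority-class accuracy: 99.65%
```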
2. METHODOLOGICAL ISSUES
The sole aim of this paper is to determine whether the records in the dataset are fraudulent or non-fraudulent using
mathematics and statistics (Benford's Law) as well as machine learning (classifier algorithms / binary decision
models), and then to compare the two approaches. The same approach could also be used to detect illegal activity in
companies, industries, and society at large.
2.1. Benford’s Law
Benford's Law was first discovered by the mathematician Simon Newcomb in 1881 [3]. He noticed in logarithm tables that
numbers beginning with the digit 1 occurred more often than those beginning with 2, which in turn occurred more often
than those beginning with 3, and so on. He then derived the probability mass function of the first non-zero digit d:
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9$$
where P(d) is the probability that the first significant digit is d.

According to this equation, the digit 1 appears as the leading digit about 30% of the time, rather than the roughly
11% (1/9) that would be expected if leading digits were uniformly distributed. Similarly, the digit 2 leads about 17.6%
of the time and the digit 9 only about 4.6% of the time. Table 1 lists the computed probabilities of the digits 0 to 9
for the first four digit positions.
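The first-digit formula is easy to evaluate directly. The Python sketch below (an illustration, not the authors' code) computes P(d) for d = 1 to 9 and reproduces the percentages quoted above.

```python
import math


def benford_first_digit(d: int) -> float:
    """Benford probability that the first significant digit equals d."""
    if not 1 <= d <= 9:
        raise ValueError("the first significant digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


first_digit_probs = {d: benford_first_digit(d) for d in range(1, 10)}
for d, p in first_digit_probs.items():
    print(f"{d}: {p:.4f}")  # e.g. "1: 0.3010" ... "9: 0.0458"
```

The nine probabilities sum to exactly 1 because the terms telescope: the sum of log10((d + 1)/d) for d = 1 to 9 is log10(10) = 1.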
Table. 1. Benford's table.
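Similarly, for the second digit d_2 the probability mass function is obtained by summing over all possible first digits d_1:
$$P(d_2) = \sum_{d_1=1}^{9} \log_{10}\left(1 + \frac{1}{10 d_1 + d_2}\right), \qquad d_2 = 0, 1, 2, \ldots, 9$$
and so on for the later digits.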
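The second-digit formula generalizes to later positions by summing over every possible block of preceding digits. The Python sketch below is a hedged illustration (the function name `benford_digit` is hypothetical) that computes these probabilities and can regenerate the rows of a Benford table such as Table 1 for positions 1 to 4.

```python
import math


def benford_digit(d: int, position: int) -> float:
    """Benford probability that the significant digit at `position` (1-based) equals d."""
    if position < 1:
        raise ValueError("position must be >= 1")
    if position == 1:
        if not 1 <= d <= 9:
            raise ValueError("the first digit must be between 1 and 9")
        return math.log10(1 + 1 / d)
    if not 0 <= d <= 9:
        raise ValueError("digits after the first must be between 0 and 9")
    # Sum over every block j of preceding digits (j = 1..9 for position 2, 10..99 for position 3, ...).
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return sum(math.log10(1 + 1 / (10 * j + d)) for j in range(lo, hi))


# Print one row per digit position, as in a Benford table.
for position in range(1, 5):
    digits = range(1, 10) if position == 1 else range(10)
    row = [benford_digit(d, position) for d in digits]
    print(position, " ".join(f"{p:.4f}" for p in row))
```

For position 2 the loop over j reproduces the sum over d_1 in the formula; later positions follow the same pattern, and the distribution flattens toward uniform (0.1 per digit) as the position increases.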
However, the significance of this observation went unrecognized for about 57 years, until the physicist Frank Benford
noticed that in many real-world datasets the leading digits are distributed in a distinctive, non-uniform way that is
significant both mathematically and statistically. Benford's Law is therefore a useful mathematical and statistical
tool for detecting fraud in large datasets, because many real accounting datasets, such as economic, financial, and
banking data, tend to follow it. This makes it straightforward to separate data that conform to Benford's Law from
data that do not. We applied Benford's Law to our dataset to visualize fraud and to identify fraudsters mathematically
and statistically. Figure 1 shows our dataset fitted against Benford's Law.
Fig. 1. Fitted data with Benford's Law
In Fig. 1, potential fraud can be identified from the difference between the expected (Benford) and the observed
distributions.
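The comparison between expected and observed distributions can be sketched in Python as follows. This is a minimal sketch, assuming the analysis is run on a column of transaction amounts; the dataset's actual fields are not described here, and the amounts below are hypothetical placeholders rather than data from the paper.

```python
from collections import Counter
import math


def leading_digit(amount):
    """Return the first significant digit of a number, or None for zero, inf or nan."""
    # str() gives the shortest round-trip representation, so the first non-zero
    # digit character is the true leading digit (this also handles forms like '1e-05').
    for ch in str(abs(amount)):
        if ch in "123456789":
            return int(ch)
    return None


def benford_comparison(amounts):
    """Return ([(digit, observed, expected, difference), ...], n) for first digits."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    if not digits:
        raise ValueError("no non-zero amounts to analyse")
    n = len(digits)
    counts = Counter(digits)
    rows = []
    for d in range(1, 10):
        observed = counts[d] / n
        expected = math.log10(1 + 1 / d)  # Benford first-digit probability
        rows.append((d, observed, expected, observed - expected))
    return rows, n


# Hypothetical transaction amounts, for illustration only.
amounts = [120.5, 15.0, 1999.99, 250.0, 3.75, 1.2, 18.0, 910.0, 47.0, 1.05]
rows, n = benford_comparison(amounts)
for d, obs, exp_p, diff in rows:
    print(f"{d}: observed={obs:.3f} expected={exp_p:.3f} difference={diff:+.3f}")
```

Benford analysis is only meaningful on large samples; ten values are used here just to keep the example short. Digits with large differences are the ones a plot such as Fig. 1 would highlight.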
2.1.1. Statistical Tests
To measure how well the first two digits of the data fit Benford's Law, the z-score (z-statistic) is used. A z-score
(standard score) indicates how many standard deviations an observation lies from the mean. It is calculated as
$$z = \frac{x - \mu}{\sigma}$$
where x is the observed value, µ is the mean, and σ is the standard deviation.
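For digit analysis, the z-score is typically applied to digit proportions: the expected Benford proportion plays the role of the mean, and the standard deviation of a sample proportion, sqrt(p(1 - p)/N), plays the role of the standard deviation. The paper does not state which variant it used, so the Python sketch below assumes the proportion test with a 1/(2N) continuity correction popularized by Nigrini for Benford analysis; the function name and the input numbers are hypothetical.

```python
import math


def benford_z(observed_prop: float, expected_prop: float, n: int) -> float:
    """z-statistic for one digit's proportion against its Benford expectation.

    Assumed variant: proportion test with a 1/(2n) continuity correction,
    applied only when the correction is smaller than the absolute difference.
    """
    if n <= 0:
        raise ValueError("n must be a positive record count")
    diff = abs(observed_prop - expected_prop)
    correction = 1 / (2 * n)
    if correction < diff:
        diff -= correction
    # Standard deviation of a sample proportion under the Benford expectation.
    sigma = math.sqrt(expected_prop * (1 - expected_prop) / n)
    return diff / sigma


# Hypothetical example: 32% of 10,000 amounts start with 1 (Benford expects about 30.1%).
z = benford_z(0.32, math.log10(2), 10_000)
print(f"z = {z:.2f}")  # z = 4.12
```

A |z| above 1.96 indicates a deviation that is significant at the 5% level (two-sided), so in this hypothetical case the excess of leading 1s would be flagged for closer inspection.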
Table 2 shows the z-scores for some of the fitted distributions, together with their expected and observed values
under Benford's Law.
Table. 2. Z-score of fitted distributions.
2.2. Machine learning Classifier Algorithms

Besides Benford's Law, several machine learning classifier algorithms, namely the Naive Bayes classifier, the K-nearest
neighbors algorithm, the support vector machine, logistic regression, and the random forest classifier, are used to
detect fraud, since fraud detection is a classification problem. Among these, the Naive Bayes classifier gave the best
accuracy.
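A comparison of the five named classifiers can be sketched with scikit-learn. The paper does not describe its features, preprocessing, train/test split, or hyperparameters, so everything below is an assumption: synthetic, heavily imbalanced data from `make_classification` stands in for the bank data, features are standardized for the scale-sensitive models, and default-style hyperparameters are used. The printed numbers will not reproduce Table 3.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the bank data (assumption): about 1% positive ("fraud") class.
X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                           weights=[0.99], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

classifiers = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

accuracies = {}
for name, model in classifiers.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # mean accuracy on held-out data

for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.4f}")
```

The stratified split keeps the rare fraud class represented in both the training and test sets, which matters when positives make up only a small fraction of the records.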
2.2.1. Naive Bayes classifier

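The Naive Bayes method is based on Bayes' theorem and is particularly well suited to datasets whose inputs have high
dimensionality. Because the method is grounded in probability theory, its outputs have a direct statistical
interpretation. For a class c and a feature vector x = (x_1, ..., x_n), the Bayes classifier is
$$P(c \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid c)\,P(c)}{P(\mathbf{x})}, \qquad P(\mathbf{x} \mid c) = \prod_{i=1}^{n} P(x_i \mid c)$$
where the product form is the "naive" assumption that the features are conditionally independent given the class, and
the predicted class is the one with the largest P(c | x).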
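A minimal from-scratch sketch of a Naive Bayes classifier helps show how Bayes' theorem and the conditional-independence assumption turn into code. The paper does not say which Naive Bayes variant it used; a Gaussian likelihood for continuous features is assumed here, and the class name `SimpleGaussianNB` and the toy data are hypothetical. In practice, scikit-learn's `GaussianNB` implements the same model.

```python
import numpy as np


class SimpleGaussianNB:
    """Hypothetical minimal Gaussian Naive Bayes: P(c|x) ∝ P(c) * prod_i N(x_i; mean_ci, var_ci)."""

    def __init__(self, var_smoothing: float = 1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Small variance floor (relative to the largest feature variance) for numerical stability.
        eps = self.var_smoothing * X.var(axis=0).max()
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps
        return self

    def _joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        # log N(x_i; mean, var) for every sample, class and feature: shape (samples, classes, features).
        ll = -0.5 * (np.log(2 * np.pi * self.var_)[None, :, :]
                     + (X[:, None, :] - self.mean_[None, :, :]) ** 2 / self.var_[None, :, :])
        # Naive independence: sum the per-feature log-likelihoods, then add log P(c).
        return self.log_prior_[None, :] + ll.sum(axis=2)

    def predict_proba(self, X):
        jll = self._joint_log_likelihood(X)
        # Dividing by P(x) = sum_c P(x|c) P(c), done in log space for numerical stability.
        jll -= jll.max(axis=1, keepdims=True)
        probs = np.exp(jll)
        return probs / probs.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self._joint_log_likelihood(X), axis=1)]


# Toy usage with two well-separated classes.
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.9], [4.8, 6.1]])
y = np.array([0, 0, 0, 1, 1, 1])
model = SimpleGaussianNB().fit(X, y)
print(model.predict([[1.1, 2.0], [5.1, 6.0]]))  # [0 1]
```

Since P(x) is the same for every class, `predict` can compare the unnormalized joint log-likelihoods directly; only `predict_proba` needs the normalization.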
2.2.2. Comparison of Classification Algorithms
Table. 3. Accuracy of classifier algorithms
Fig. 2. Comparison among the classifier algorithms
As Table 3 shows, the Naive Bayes classifier achieves the highest accuracy (98.23%) among the six classification
algorithms, so we selected the Naive Bayes classifier.
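Accuracy, as reported in Table 3, is the fraction of all records classified correctly, (TP + TN) / N. On heavily imbalanced data it is often reported alongside precision and recall for the fraud class. The Python sketch below computes these metrics from a confusion matrix; the counts are hypothetical and are not results from this paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive ("fraudulent") class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical confusion-matrix counts for 1,000 transactions containing 10 frauds.
metrics = classification_metrics(tp=7, fp=20, fn=3, tn=970)
print(metrics)
```

In this hypothetical case accuracy is 0.977, yet only about 26% of the flagged transactions are actually fraudulent (precision) and 70% of the frauds are caught (recall), which accuracy alone does not reveal.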
3. CONCLUSIONS
Many fraud detection techniques have already been applied to real datasets. In this paper, we implemented a new
decision support system that combines mathematics, statistics, and machine learning to detect fraud, and we compared
these approaches.
Reference
1. Alali, F. A., & Romero, S. (2013). Benford's Law: Analyzing a decade of financial data. Journal of
Emerging Technologies in Accounting, 10(1), 1–39.
2. Alhosani, W. (2016). Anti-Money Laundering: A Comparative and Critical Analysis of the UK and UAE's
Financial Intelligence Units. Springer.
3. Newcomb, S. (1881). Note on the frequency of use of the different digits in natural numbers. American
Journal of Mathematics, 4(1), 39–40.
4. Burns, B. D. (2009). Sensitivity to statistical regularities: People (largely) follow Benford's Law. In N.
Taatgen & H. van Rijn (Eds.), Proceedings of the Thirty-First Annual Conference of the Cognitive Science
Society (pp. 2872–2877). Austin, TX: Cognitive Science Society.
5. Diekmann, A. (2007). Not the first digit! Using Benford's Law to detect fraudulent scientific data.
Journal of Applied Statistics, 34, 321–329.
6. Yee, O. S., Sagadevan, S., & Malim, N. H. A. H. (2018). Credit card fraud detection using machine
learning as data mining technique. Journal of Telecommunication, Electronic and Computer Engineering
(JTEC), 10(1-4), 23–27.

A Combination of Mathematics, Statistics and Machine Learning To Detect Fraud

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Combination of Mathematics, Statistics and Machine Learning To Detect Fraud

Uploaded by

Copyright:

Available Formats

A Combination of Mathematics, Statistics and Machine Learning

Department of Applied Mathematics, Noakhali Science and Technology University,

Keywords: Mathematics; Statistics; Machine learning; Fraud detection.

2.1. Benford’s Law

Where P (d) is the probability of d.

Table. 1. Benfors‟d table.

Similarly for the second digit the probability mass function:

P( ) =∑ ( ) , = 1, 2, 3, 4, ….. , 9 and so on.

Fig. 1. Fitted data with Benford‟s Law

Here, µ is mean and α is Standard Deviations

Table. 2. Z-score of fitted distributions.

2.2. Machine learning Classifier Algorithms

2.2.1. Naive Bayes classifier

2.2.2. Comparison of Classification Algorithms

You might also like