A Combination of Mathematics, Statistics and Machine Learning To Detect Fraud
Md. Kowsher
Noakhali-3814, Bangladesh.
Email: ga.kowsher@gmail.com
Abstract: Fraud detection is a widely discussed problem in many fields, including the financial sector, banking,
insurance and various other industries. Fraud attempts are increasing at a rapid pace, especially with the development
of technology, which makes fraud detection more important than ever before. In this paper, we analyze fraudulent
operations in a “Banking Fraud Detection” database by combining mathematical, statistical and machine learning
methods, and we compare these approaches. We describe elementary explorations with these combined methods to
identify fraudulent banking transactions.
1. INTRODUCTION
If we look at daily newspapers or magazines, we find reports of fraud cases such as economic deception, money
laundering and tax evasion. Banking fraud is a widespread and critical concern for financiers, analysts, comptrollers
and observers, so tackling fraudsters is a pressing issue. In this paper we discuss several techniques, namely
Benford's Law and various classifier algorithms (Logistic Regression, Naive Bayes classifier, K-Nearest Neighbors
algorithm, Support Vector Machine, Random Forest classifier), for detecting fraudsters so that economic and
financial fraud can be curbed. We use a “Banking Fraud Detection” database obtained from a renowned bank. The
dataset contains 184,763 records, of which 653 are fraudulent.
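The class balance implied by these counts can be checked with a few lines of Python. This is only a sketch of the arithmetic on the figures stated above; the majority-class baseline (a model that labels every record as non-fraudulent) is our illustrative addition, not part of the original analysis, and is a common reference point when reading accuracy figures on imbalanced data.

```python
# Class balance of the dataset described above (counts taken from the text).
TOTAL_RECORDS = 184_763
FRAUD_RECORDS = 653

non_fraud_records = TOTAL_RECORDS - FRAUD_RECORDS  # legitimate transactions
fraud_rate = FRAUD_RECORDS / TOTAL_RECORDS         # share of fraudulent records

# A trivial model that labels every record "non-fraud" is correct on every
# legitimate record, so its accuracy equals the non-fraud share.
majority_baseline_accuracy = non_fraud_records / TOTAL_RECORDS

print(f"non-fraud records:       {non_fraud_records}")
print(f"fraud rate:              {fraud_rate:.4%}")
print(f"majority-class accuracy: {majority_baseline_accuracy:.4%}")
```

With only about 0.35% of records fraudulent, accuracy alone says little about how many frauds a classifier actually catches.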
2. METHODOLOGICAL ISSUES
The sole aim of this paper is to distinguish fraudulent from non-fraudulent records in the dataset using mathematics
and statistics (Benford's Law) as well as machine learning (classifier algorithms / binary decision models), and then to
compare the two approaches. The same approach can also help classify illegal activity in companies, industries and
society more broadly.
Benford's Law was first discovered by the mathematician Simon Newcomb in 1881 [3]. He noticed in logarithm tables
that numbers beginning with 1 occur more often than those beginning with 2, which in turn occur more often than
those beginning with 3, and so on. He derived the probability mass function of the leading non-zero digit d as:
$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, 9$
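The formula can be evaluated directly to obtain the expected share of each leading digit. A minimal Python sketch; the function name `benford_first_digit_pmf` is illustrative, not taken from the paper.

```python
import math


def benford_first_digit_pmf():
    """Expected probability of each leading digit d = 1..9 under Benford's Law,
    P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


pmf = benford_first_digit_pmf()
for digit, prob in pmf.items():
    print(f"P({digit}) = {prob:.4f}")

# The terms telescope: sum of log10((d + 1) / d) over d = 1..9 is log10(10) = 1.
print(f"total = {sum(pmf.values()):.6f}")
```

The telescoping sum confirms that the nine probabilities form a valid distribution, with the digit 1 expected about 30.1% of the time and the digit 9 about 4.6%.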
However, the significance of this result went unrecognized for about 57 years, until the physicist Frank Benford
observed in 1938 that in real-world datasets the leading digits are distributed in a distinctive, non-uniform way that is
both mathematically and statistically significant. Benford's Law is a useful mathematical and statistical tool for
detecting fraud in very large datasets, because many real accounting datasets, such as economic, financial and
banking data, follow Benford's Law. This makes it relatively simple to separate records that conform to Benford's
Law from those that do not. We visualized fraud in our dataset via Benford's Law to uncover fraudsters
mathematically and statistically. Figure 1 shows our dataset fitted to Benford's Law.
In Fig. 1, potential fraud can be identified probabilistically from the difference between the expected and observed
distributions.
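One way to quantify the difference between expected and observed distributions is to tabulate the observed share of each leading digit next to the Benford expectation. A minimal sketch, assuming the transaction amounts are available as a list of numbers; the helper names and the sample amounts are hypothetical, and the mean absolute deviation (MAD) summary is a common choice added for illustration, not necessarily the measure used in the paper.

```python
import math
from collections import Counter


def leading_digit(x):
    """Return the first non-zero digit of |x|, or None if x is zero."""
    x = abs(x)
    if x == 0:
        return None
    # Scientific notation puts the leading digit first, e.g. 0.00723 -> '7.23...e-03'.
    return int(format(x, ".15e")[0])


def first_digit_comparison(amounts):
    """Compare observed leading-digit shares with Benford's expectation.

    Returns a list of (digit, observed, expected, difference) tuples and the
    mean absolute deviation across the nine digits."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    n = len(digits)
    counts = Counter(digits)  # missing digits count as 0
    rows = []
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts[d] / n
        rows.append((d, observed, expected, observed - expected))
    mad = sum(abs(row[3]) for row in rows) / 9
    return rows, mad


# Hypothetical transaction amounts, for illustration only.
sample = [1234.5, 187.0, 2.99, 19.99, 310.0, 1050.0, 45.0, 120.0, 9.5, 1.75]
rows, mad = first_digit_comparison(sample)
for d, obs, exp, diff in rows:
    print(f"{d}: observed {obs:.3f}  expected {exp:.3f}  difference {diff:+.3f}")
print(f"MAD = {mad:.4f}")
```

A sample this small is far too short for a meaningful test; Benford analysis is only informative on large datasets such as the one used here.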
2.1.1. Statistical Tests
To measure how well the first two digits fit Benford's Law, we use the z-score (z-statistic). A z-score (standard
score) indicates how many standard deviations an observation lies from the mean.
mean. A z-score can be calculated from the following formula.
Table 2 lists the z-scores of selected digit distributions, together with the expected and observed distributions of our
dataset fitted to Benford's Law.
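The paper's exact z-score formula is not shown here. As an illustration only, the sketch below applies one z-statistic commonly used in Benford analysis (the form popularised by Nigrini, assumed here rather than taken from the paper) to first-two-digit proportions, whose Benford expectation is $P(d_1 d_2) = \log_{10}(1 + 1/d_1 d_2)$ for $d_1 d_2 = 10, \ldots, 99$. The function names and the sample proportion are hypothetical.

```python
import math


def benford_first_two_pmf(dd):
    """Expected Benford probability that a number starts with the two digits dd (10..99):
    P(dd) = log10(1 + 1/dd)."""
    if not 10 <= dd <= 99:
        raise ValueError("dd must be between 10 and 99")
    return math.log10(1 + 1 / dd)


def benford_z(observed_prop, expected_prop, n):
    """z-statistic for one digit combination (Nigrini's form, assumed here):

        z = (|p_o - p_e| - 1/(2n)) / sqrt(p_e * (1 - p_e) / n)

    The 1/(2n) continuity correction is subtracted only when it is smaller
    than |p_o - p_e|, so z is never negative."""
    diff = abs(observed_prop - expected_prop)
    correction = 1 / (2 * n)
    if correction < diff:
        diff -= correction
    return diff / math.sqrt(expected_prop * (1 - expected_prop) / n)


# The 90 first-two-digit probabilities sum to log10(100 / 10) = 1.
total = sum(benford_first_two_pmf(dd) for dd in range(10, 100))
print(f"sum of expected proportions = {total:.6f}")

# Hypothetical example: 5.0% of 10,000 amounts start with "10".
expected_10 = benford_first_two_pmf(10)  # about 0.0414
z = benford_z(0.050, expected_10, 10_000)
print(f"expected {expected_10:.4f}, z = {z:.2f}")
# |z| > 1.96 would flag the combination at the 5% significance level (two-sided).
```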
Table 3. Accuracy of classifier algorithms
Fig. 2. Comparison among the classifier algorithms
Table 3 shows that the Naive Bayes classifier achieves 98.23% accuracy, the best among the six classification
algorithms. We therefore select the Naive Bayes classifier.
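The paper does not specify which Naive Bayes variant or which features were used. As a hedged illustration, the sketch below implements a Gaussian Naive Bayes classifier (one common variant, assumed here) in plain Python, together with the accuracy measure used to compare classifiers. The class and function names, the smoothing constant and the toy features are hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: within each class, every feature is
    modelled as an independent normal distribution."""

    def __init__(self, var_smoothing=1e-9):
        # Small constant added to each variance to avoid division by zero.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(X)
        self.params = {}
        for label, rows in groups.items():
            n_features = len(rows[0])
            means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
            variances = [
                sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + self.var_smoothing
                for j in range(n_features)
            ]
            # Store the log class prior together with per-feature means and variances.
            self.params[label] = (math.log(len(rows) / n), means, variances)
        return self

    def _log_posterior(self, row, label):
        """Unnormalised log posterior: log prior plus summed Gaussian log-likelihoods."""
        log_prior, means, variances = self.params[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    def predict(self, X):
        return [max(self.params, key=lambda label: self._log_posterior(row, label)) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: [amount, hour], label 1 = fraud.
X = [[10, 1], [12, 2], [11, 1.5], [200, 20], [220, 22], [210, 21]]
y = [0, 0, 0, 1, 1, 1]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[11, 1], [215, 21]]))
```

Working with log-probabilities avoids numerical underflow when many small per-feature likelihoods are multiplied together.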
3. CONCLUSIONS
Various fraud detection techniques have already been applied to real datasets. In this paper, a new decision support
system that combines mathematics, statistics and machine learning is implemented to detect fraud, and the approaches
are compared.
References
1. Alali, F. A., & Romero, S. (2013). Benford's Law: Analyzing a decade of financial data. Journal of Emerging
Technologies in Accounting, 10(1), 1-39.
2. Alhosani, W. (2016). Anti-Money Laundering: A Comparative and Critical Analysis of the UK and UAE's Financial
Intelligence Units. Springer.
3. Newcomb, S. (1881). Note on the frequency of use of the different digits in natural numbers. American Journal of
Mathematics, 4(1), 39-40.
4. Burns, B. D. (2009). Sensitivity to statistical regularities: People (largely) follow Benford's Law. In N. Taatgen &
H. van Rijn (Eds.), Proceedings of the Thirty-First Annual Conference of the Cognitive Science Society
(pp. 2872-2877). Austin, TX: Cognitive Science Society.
5. Diekmann, A. (2007). Not the first digit! Using Benford's Law to detect fraudulent scientific data. Journal of
Applied Statistics, 34, 321-329.
6. Yee, O. S., Sagadevan, S., & Malim, N. H. A. H. (2018). Credit card fraud detection using machine learning as data
mining technique. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 10(1-4), 23-27.