Automobile Insurance Fraud Detection An Overview
Abstract— Frauds are often committed in a highly professional way, so insurance companies sometimes fail to notice that any fraud has occurred. Unprofessional frauds also take place, but companies can identify them easily. Fraud detection has traditionally been carried out with manual techniques, but data mining combined with machine learning or deep learning algorithms automates the detection process, so frauds can be detected in a more efficient, structured, and accurate way. Machine learning algorithms also increase the speed at which information is processed. This survey compares different methods based on their performance. The algorithms work by detecting crucial patterns in historical data and recognizing them when they appear in input data. The survey provides an understanding of the different methodologies used for automobile fraud detection. Machine learning sometimes cannot detect multiple frauds that involve behavior changes. We propose a vehicle insurance fraud detection mechanism using LSTM RNN networks. LSTM is commonly used in deep learning to model time series information.

Keywords— Fraud detection, Automobile, Insurance, Random Forest, K-Means, RNN, LSTM.

I. INTRODUCTION
Fraudulent claims have been increasing with digitalization. Scammers bypass the present fraud detection systems, find loopholes, and make fraudulent claims. This causes problems for existing as well as new policyholders, because policy prices are increased to cope with the losses from fraudulent claims. It has been observed that existing systems follow a supervised approach [1].

The digitization of services has made things easier for consumers, but it has made tracing genuine claims a challenge for insurance companies. This difficulty can be reduced by building a system with cutting-edge technologies, which requires high investment in the short term. In the long term, however, such a system will be useful for smooth operation and for efficiently handling the day-to-day claims of policyholders.

Exploited accidents are those in which the damage occurred in the past or the damage is intentionally increased to claim a larger amount. Fabricated accidents are those that did not take place or for which no complaint has been registered. Staged accidents are those in which the vehicle under policy cover, or a rented vehicle, is used to represent the accident. Provoked accidents are those in which the victim is not ready to accept the fault and blames the driver of the other vehicle [2].

Fighting insurance fraud is a challenging task. Every year, companies and organizations that provide insurance suffer heavily from false insurance claims. According to records, more than 21% of vehicle insurance claims involve questionable fraud, but legal action is taken on less than 3% of the suspected fraud. Insurance fraud detection has traditionally depended mostly on surveys and expert inspection. Insurance fraud must be detected before the claim is paid, and detecting frauds manually is a tedious task. Machine learning methods can play a big role in detecting suspicious cases, which helps to minimize losses for both insurance organizations and policyholders. Good predictive methods will find faulty insurance claims [3].

Frauds are grouped into categories such as financial frauds, auto indemnity frauds, and credit card frauds. Insurance fraud is further classified into soft insurance fraud and hard insurance fraud. When insurance is claimed for an accident that has not actually occurred, it is hard insurance fraud. When part of the accident is exaggerated to claim a larger insurance payout, it is soft insurance fraud [4].

Frauds are also classified into opportunistic/professional and organized-group frauds. Organized-group frauds are comparatively fewer than opportunistic/professional frauds. Traditional methodologies are good, but they cannot be extended beyond a limit to track opportunistic/professional frauds. Opportunistic frauds are hard to track, but professional frauds are much more complex to track [5]. As stated in [6], for survival, the insurance companies are passing their losses on to consumers by increasing the policy premium. The importance to discover a structured
978-1-6654-6756-8/22/$31.00 ©2022 IEEE
2022 3rd International Conference on Intelligent Engineering and Management (ICIEM)
In classification, the majority vote of the trees is taken, while in regression the average of the trees' results is taken. Random forest is used by beginners to create a strong model. It consists of many decision trees and can handle missing values quite efficiently [7].

According to [8], it displays accurate results without any information planning, actual modelling, or demonstrating. To be explicit, it depends on decision trees. The ultimate aim is to develop various decision trees on subsets of the dataset.

Mariya Mathew [9] proposed that the estimation process is built by increasing the number of trees. The Python library Scikit-learn (sklearn) is used to train the model on a dataset from a motor insurance organisation.

According to [14], logistic regression is used to identify the relationship between more than one independent variable and a dependent variable. Many studies consider the Logistic Regression method a benchmark for further work. It relies on many mathematical techniques for processing the dataset with algorithms.

The model is developed from the claim files of insurers. Executives are interested in detailed examination. Only 0.64% of vehicle insurance files were used. For fully automatic examination, profitability is mandatory [15].

The research work discussed in this sub-section is summarized in Table II.
[11] | 1000 Samples | Labelled | Classification, prediction | Random Forest | Supervised | Classification | Highly efficient in processing larger dataset | High training time | NA
[12] | Oracle (Berger) | NA | RFM (Recency, Frequency, Monetary) Model | Random Forest | Supervised | Classification | Deals with high dimensionality | Heterogeneity of claims are not taken into account | Precision: 71%, Recall: 90%, Accuracy: NA
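As a rough illustration of how a random forest combines its trees, the sketch below aggregates per-tree outputs for one claim: a majority vote for classification and a mean for regression. The function names and the sample tree outputs are hypothetical and are not taken from the surveyed papers; training the individual trees is omitted.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the class label predicted by the most trees (majority vote)."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_outputs):
    """Return the average of the numeric outputs of all trees."""
    return mean(tree_outputs)


# Hypothetical outputs of five trees for a single insurance claim.
votes = ["fraud", "genuine", "fraud", "fraud", "genuine"]
scores = [0.9, 0.2, 0.7, 0.8, 0.4]

forest_label = aggregate_classification(votes)   # label chosen by most trees
forest_score = aggregate_regression(scores)      # mean of the tree scores
```

In a library such as Scikit-learn the same aggregation happens inside the forest estimator, so this step is normally not written by hand.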
C. Decision Tree
Decision trees represent independent and dependent properties in a tree-shaped structure and are used to handle input and output variables. The classification rules extracted from a decision tree are IF-THEN expressions whose conditions are logically ANDed [16].

A decision tree is a machine learning algorithm that expresses the relationship between independent attributes and a dependent attribute. Its applications include homeland security and customs declaration fraud.

The classification and learning steps are typically fast, but performance can sometimes decrease gradually. C5.0 is considered a marginal improvement over C4.5 [17].

Papers discussed in this sub-section are summarized in Table III.

D. K-Means
According to Janhavi Naik [18], K-means is an unsupervised machine learning algorithm. The dataset is divided into clusters, and the basic aim is to define N centroids, one for each cluster, placed far from each other. Each point of the dataset is associated with its nearest centroid; when no points remain, N new centroids are re-calculated. The binding between the dataset points and their nearest centroids is then repeated in a loop, and as a result the N centroids change their locations.

Ali Ghorbani [19] mentioned that the fraudulent cases are taken from Iranian companies. Common ways of faking losses are selected. The cases having problems with their license are compared, and the information is transferred into a CSV file.

The research work discussed in this sub-section is summarized in Table IV.
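The K-means loop (assign each point to its nearest centroid, re-calculate the centroids, repeat until they stop moving) can be sketched as follows. This is a minimal illustration in plain Python, assuming two hypothetical numeric features per claim; the function name, the sample data, and the random initialisation are assumptions and do not reproduce the setup of [18] or [19].

```python
import math
import random


def kmeans(points, n_clusters, n_iter=100, seed=0):
    """Cluster points with the standard K-means loop; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)  # initial centroids drawn from the data
    labels = []
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # centroids no longer change their locations
        centroids = new_centroids
    return centroids, labels


# Hypothetical claims described by two scaled features.
claims = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(claims, n_clusters=2)
```

Because the algorithm is unsupervised, the resulting clusters carry no fraud label by themselves; an analyst or a later supervised step has to decide which cluster looks suspicious.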
TABLE II. INSURANCE FRAUD DETECTION USING LOGISTIC REGRESSION
Ref | Dataset | Dataset Type | Tools/Technologies | Methodologies | Learning Approach | Data mining category | Merits | Demerits | Accuracy
[13] | 923 Samples | Labelled | Regularization Method | Logistic Regression | Supervised | Binary classification | Unknown records are classified very fast | Maintaining data is a tedious job | 69%
[14] | NA | Labelled | Statistical methods | Logistic Regression | Supervised | Binary classification | Solid generalization | Unknown inner structure | 95.1%
[15] | NA | Labelled | Adjuster's decision, Threshold comparison | Logistic Regression | Supervised | Binary classification | Simultaneous detection | Some points of the fraud can be missed | 47.1%
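The logistic regression entries in Table II all treat fraud detection as binary classification with a threshold on a predicted probability. The sketch below shows that idea in plain Python, fitting the weights by batch gradient descent. The two features, the sample values, and the function names are hypothetical, and the code is not the procedure used in [13], [14], or [15].

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one claim: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(x, w, b, threshold=0.5):
    """Label a claim as fraud (1) when its probability reaches the threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Hypothetical features: [claimed amount as a share of the insured value,
# number of earlier claims]; label 1 marks a fraudulent claim.
X = [[0.2, 0], [0.3, 1], [0.4, 0], [0.8, 3], [0.9, 4], [0.7, 5]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
```

Lowering the `threshold` argument flags more claims for inspection at the cost of more false alarms, which is the trade-off behind the threshold comparison mentioned for [15].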
E. RNN
RNNs are widely used in applications with sequential data, such as speech, language, and audio. The units of an RNN are arranged sequentially, and internal memory is used to process the inputs. An RNN takes longer to train than a Feed-Forward Neural Network (FFNN) and is also more difficult to train [20].

According to [21], deep learning techniques such as recurrent neural networks are used in fraud detection because they are regarded as precise algorithms. An RNN is regarded as a dynamic approach that can handle multiple transactions. A sequence classifier based on LSTM networks was introduced.

RNNs are capable of processing long-term dependencies. Unlike plain feed-forward networks, their nodes take both the present input and the hidden-node information from the previous steps [22].

The research work discussed in this sub-section is summarized in Table V.

F. Miscellaneous
According to [24], Spectral Ranking of Anomalies (SRA) is the latest unsupervised method for anomaly detection. It has high precision compared to other unsupervised methods.

Naïve Bayes is a powerful method for predicting future class values and is used for binary classification. It is called naïve because each feature is assumed to be independent of the others given the target variable. Fuzzy logic, which is suited to abundant data, is also used. It improves time complexity, is easy to implement, and achieves considerable accuracy. It has also proved reliable and has many applications in identifying fraud [25].

Claim files were randomly selected from the population of files, and a model was developed that treated all possible fraud cases. Insurance executives are interested in detailed investigations. The authors of [26] classified frauds as inner vs. outer and soft vs. hard. In the soft vs. hard distinction, soft frauds are partially true whereas hard frauds are completely fake.

The analysis concludes that the feature selection method has considerable influence on the detection of anomalies in the insurance fraud detection dataset [27].

The research work discussed in this sub-section is summarized in Table VI.
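The Naïve Bayes classifier mentioned in this sub-section can be illustrated with a small categorical example. The sketch below is a minimal plain-Python version with Laplace smoothing; the two features (whether a police report is missing, and the size of the claim), the sample rows, and the function names are hypothetical and do not come from [25].

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(class, feature_index)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
    # Number of distinct values per feature, needed for Laplace smoothing.
    n_values = [len({row[j] for row in rows}) for j in range(len(rows[0]))]
    return class_counts, value_counts, n_values


def predict_naive_bayes(row, model):
    """Return the class with the highest log-posterior for one claim."""
    class_counts, value_counts, n_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for j, value in enumerate(row):
            # Laplace-smoothed estimate of P(value | label); features are
            # treated as independent given the class (the "naive" assumption).
            numerator = value_counts[(label, j)][value] + 1
            score += math.log(numerator / (count + n_values[j]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical claims: (police report missing?, claim size).
rows = [("yes", "high"), ("yes", "high"), ("no", "low"),
        ("no", "low"), ("no", "high"), ("yes", "low")]
labels = ["fraud", "fraud", "genuine", "genuine", "genuine", "fraud"]
model = train_naive_bayes(rows, labels)
suspicious = predict_naive_bayes(("yes", "high"), model)
ordinary = predict_naive_bayes(("no", "low"), model)
```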
TABLE V. INSURANCE FRAUD DETECTION USING RNN
Ref | Dataset | Dataset Type | Tools | Methodologies | Learning approach | Data mining category | Merits | Demerits | Accuracy
[20] | Real-world data for insurance company | NA | LDA | DNN, RNN | Semi-supervised | Neural networks | More suitable for solving regression problems | NA | NA
[21] | 594643 Samples (Kaggle) | Structured | Banksim | LSTM RNN | Semi-supervised | Neural networks | Information is retained | Handling of new patterns | NA
[22] | Amazon review/real world | Structured | Tensorflow, AutoEncoder | RNN, RBM, GAN | Semi-supervised | Neural networks | Classifier trained by augmented set performed better | Failure to generalize | NA
[23] | NA | Structured/Unstructured | XGBoost | LR, GBT, RNN, GNN | Semi-supervised | Neural networks | Increased secretiveness | NA | NA
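To make the role of internal memory in the LSTM RNN models of Table V more concrete, the sketch below runs one LSTM cell over a short sequence in plain Python. It uses the standard LSTM gate equations with a single unit and scalar inputs; the weights and the input sequence are hypothetical values chosen for illustration, not trained parameters from [21] or any other surveyed work.

```python
import math


def sigmoid(z):
    """Squash a real value into (0, 1); used for the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell with scalar input and state.

    params maps each gate name ("f", "i", "o", "g") to a tuple
    (input weight, recurrent weight, bias).
    """
    def pre_activation(gate):
        w_x, w_h, bias = params[gate]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre_activation("f"))    # forget gate: share of old cell state kept
    i = sigmoid(pre_activation("i"))    # input gate: share of the candidate admitted
    o = sigmoid(pre_activation("o"))    # output gate: share of cell state exposed
    g = math.tanh(pre_activation("g"))  # candidate cell state
    c = f * c_prev + i * g              # updated cell state (long-term memory)
    h = o * math.tanh(c)                # updated hidden state passed to the next step
    return h, c


def run_sequence(xs, params):
    """Feed a sequence through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h, c


# Hypothetical weights and a hypothetical sequence of scaled claim amounts.
params = {"f": (0.5, 0.1, 1.0), "i": (0.8, 0.2, 0.0),
          "o": (0.3, 0.4, 0.0), "g": (1.2, 0.5, 0.0)}
h_final, c_final = run_sequence([0.2, 0.5, 0.9], params)
```

In practice a framework such as TensorFlow provides trained, vectorised LSTM layers; the final hidden state would then be fed to a classifier that outputs the fraud probability.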
III. CHALLENGES AND FUTURE RESEARCH DIRECTIONS
Automobile Insurance Fraud Detection is still an evolving

On the basis of the literature review in this paper, the following challenges and future research directions are discussed briefly.
implemented using the RNN model, TensorFlow, Python in due course of time.

REFERENCES
[1] Aisha Abdallah, Mohd Aizaini Maarof, Anazida Zainal, "Fraud detection system: A survey", Information Assurance and Security Research Group, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai, Malaysia, 2016.
[2] Christian Eckert, Katrin Osterrieder, "How digitalization affects insurance companies: overview and use cases of digital technologies", School of Business, Economics and Society, Friedrich-Alexander University Erlangen-Nürnberg (FAU), Lange Gasse 20, 90403 Nuremberg, Germany, 2020.
[3] Ke Nian, Haofan Zhang, Aditya Tayal, Thomas Coleman, Yuying Li, "Auto insurance fraud detection using unsupervised spectral ranking for anomaly", Cheriton School of Computer Science, Waterloo, 2016.
[4] H. Lookman Sithic, T. Balasubramanian, "Survey of insurance fraud detection using data mining techniques", International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, volume 2, issue 3, February 2013.
[5] Arezo Bodaghi, Babak Teimourpour, "The detection of professional fraud in automobile insurance using social network analysis", School of Industrial and Systems Engineering, Tarbiat Modares University, Iran, May 2018.
[6] Yibo Wang, Wei Xu, "Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud", School of Information, Renmin University of China, Beijing, 2017.
[7] Sapna Panigrahi, Bhakti Palkar, "Comparative analysis on classification algorithms of auto-insurance fraud detection based on feature selection algorithms", Department of Computer Engineering, K.J. Somaiya College of Engineering, Vidyavihar, Mumbai-77, Maharashtra, India, 2018.
[8] P. Sai Pranavi, Sheethal H D, Sharanya S Kumar, Sonika Kariappa, Swathi B H, "Analysis of vehicle insurance data to detect fraud using machine learning", CSE, VCE, volume 8, issue VII, July 2020.
[9] Mariya Mathew, Nimitha M Kunjumon, Ria Maria Lalji, Kency Susan Skariah, Dr Jeyakrishnan V, "Motor insurance claim processing and detection of fraudulent claims using machine learning", Department of Computer Science and Engineering, Saintgits College of Engineering, Kottayam, Kerala-686536, India, vol. 13, no. 3, 2020.
[10] G. Kowshalya, Dr. M. Nandhini, "Predicting fraudulent claims in automobile insurance", Department of Computer Science, Government Arts College, Udumalpet, 2018.
[11] Yaqi Li, Chun Yan, Wei Liu, Maozhen Li, "Research and application of random forest model in mining automobile insurance fraud", SSSU, Qingdao, 2016.
[12] Johannes Stephen Kalwihura, Rajasvaran Logeswaran, "Auto-insurance fraud detection: a behavioral feature engineering", AUTI, Malaysia, 2020.
[13] Santosh Kumar Majhi, Subho Bhatachharya, Rosy Pradhan and Shubhra Biswal, "Fuzzy clustering using salp swarm algorithm for automobile insurance fraud detection", Department of Computer Science and Engineering, Veer Surendra Sai University of Technology, Burla, Odisha, India, 2019.
[14] Anuj Sharma, Prabin Kumar Panigrahi, "A review of financial accounting fraud detection based on data mining techniques", Information Systems Area, Indian Institute of Management, Indore, India, 2012.
[15] El Bachir Belhadji, Georges Dionne and Faouzi Tarkhani, "A model for the detection of insurance fraud", The Geneva Papers on Risk and Insurance, vol. 25, no. 4, October 2000.
[16] G. Ganesh Sundarkumar, Vadlamani Ravi, V. Siddeshwar, "One-class support vector machine based undersampling: application to churn prediction and insurance fraud detection", School of Computer and Information Sciences, University of Hyderabad, Hyderabad - 500046, India, 2015.
[17] Rekha Bhowmik, "Detecting auto insurance fraud by data mining techniques", CS, UT Dallas, volume 2, no. 4, April 2011.
[18] Ms Janhavi Naik, Dr J.A Laxminarayana, "Designing hybrid model for fraud detection in insurance", Computer Department, Goa College of Engineering, India, 2017.
[19] Ali Ghorbani and Sara Farzai, "Fraud detection in automobile insurance using a data mining based approach", MU, Iran, 2018.
[20] Ahmet Murat Ozbayoglu, Mehmet Ugur Gudelek, Omer Berat Sezer, "Deep learning for financial applications: a survey", Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey.
[21] Ibtissam Benchaji, Samira Douzi, and Bouabid El Ouahidi, "Credit card fraud detection model based on LSTM recurrent neural networks", Faculty of Sciences IPSS, University Mohammed V, Rabat, Morocco, Journal of Advances in Information Technology, vol. 12, no. 2, May 2021.
[22] Saptarshi Sengupta, Sanchita Basak, Pallabi Saikia, Sayak Paul, Vasilios Tsalavoutis, Frederick Ditliac Atiah, Vadlamani Ravi and Richard Alan Peters II, "A review of deep learning with special emphasis on architectures, applications and recent trends".
[23] Xiaoqian Zhu, Xiang Ao, Zidi Qin, Yanpeng Chang, Yang Liu, Qing He, and Jianping L, "Intelligent financial fraud detection practices in post-pandemic era", KLIP, ICT, China.
[24] Z. Shaeiri, S. J. Kazemitabar, "Fast unsupervised automobile insurance fraud detection based on spectral ranking of anomalies", SON Corporate Group, Tehran, Iran, Department of Electrical and Computer Engineering, Babol Noshirvani University of Technology, Babol, Iran, 2020.
[25] K. Supraja, S.J. Saritha, "Robust fuzzy rule based technique to detect frauds in vehicle insurance", Computer Science and Engineering, JNTUACEP, Pulivendula, A.P, India, 2017.
[26] Stijn Viaene and Guido Dedene, "Insurance fraud: issues and challenges", The Geneva Papers on Risk and Insurance, vol. 29, no. 2, April 2004.
[27] Tessy Badriyah, Lailul Rahmaniah, Iwan Syarif, "Nearest neighbour and statistics method based for detecting fraud in auto insurance", Informatics and Computer Engineering Department, Electronics Engineering Polytechnic Institute of Surabaya (EEPIS), Surabaya, Indonesia, 2018.
[28] Kavya Priya M, Anusha Y G, Amrutha T, Harsha R, Harshitha M R, "Auto insurance fraud detection", Department of Computer Science and Engineering, Maharaja Institute of Technology, Mysore, Karnataka, vol. 9, issue 7, July 2020.
[29] Clifton Phua, Vincent Lee, Kate Smith and Ross Gayler, "A comprehensive survey of data mining-based fraud detection research", School of Business Systems, Faculty of Information Technology, Monash University, Clayton Campus, Wellington Road, Clayton, Victoria 3800, Australia.
[30] Manuel Artís, Mercedes Ayuso, Montserrat Guillén, "Detection of automobile insurance fraud with discrete choice models and misclassified claims", The Journal of Risk and Insurance, 2002.
[31] Wei Xu, Shenhnan Wang, Dailing Zhang, Bo Yang, "Random subspace based neural network ensemble for insurance fraud detection", School of Information, Renmin University of China and Key Laboratory of Data Engineering and Knowledge Engineering, Ministry, 2011.
[32] Sharmila Subudhi, Suvasini Panigrahi, "Use of optimized fuzzy c-means clustering and supervised classifiers for automobile insurance fraud detection", Department of Computer Science and Engineering & IT, Veer Surendra Sai University of Technology, Burla, Odisha, India, 2017.