Application of Naïve Bayes Classification in Fraud Detection
NAÏVE BAYES CLASSIFICATION IN FRAUD DETECTION IN AUTOMOBILE INSURANCE
INTRODUCTION TO NAÏVE BAYES CLASSIFICATION
• Naive Bayes is a classification algorithm for binary (two-class) and multi-class classification problems. The technique is easiest to understand when described using binary or categorical input values.
• A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
• Despite its simplicity, the Naive Bayesian classifier often does surprisingly well and is widely used because it often outperforms more sophisticated classification methods.
Suppose we are building a classifier that says whether a text is about sports or not. Our training data has 5 sentences:
• Now, which tag does the sentence “A very close game” belong to?
• Since Naive Bayes is a probabilistic classifier, we want to calculate the probability that the sentence “A very close game” is Sports and the probability that it is Not Sports. Then we take the larger one. Written mathematically, what we want is the probability that the tag of a sentence is Sports given that the sentence is “A very close game”.
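The comparison described above can be sketched as a tiny word-count Naive Bayes in Python. The five training sentences below are stand-ins (the slide's actual training sentences are not reproduced in the text), and add-one smoothing keeps unseen words from zeroing out a score:

```python
from collections import Counter

# Stand-in training set: five labelled sentences (hypothetical, for illustration)
train = [
    ("a great game", "Sports"),
    ("the election was over", "Not Sports"),
    ("very clean match", "Sports"),
    ("a clean but forgettable game", "Sports"),
    ("it was a close election", "Not Sports"),
]

word_counts = {"Sports": Counter(), "Not Sports": Counter()}
class_counts = Counter()
for text, label in train:
    word_counts[label].update(text.split())
    class_counts[label] += 1

vocab = {w for c in word_counts.values() for w in c}

def score(text, label):
    # P(label) * product over words of P(word | label), with add-one smoothing
    p = class_counts[label] / sum(class_counts.values())
    total = sum(word_counts[label].values())
    for w in text.split():
        p *= (word_counts[label][w] + 1) / (total + len(vocab))
    return p

sentence = "a very close game"
prediction = max(word_counts, key=lambda lb: score(sentence, lb))  # -> "Sports"
```

Because both class scores share the same denominator P(sentence), comparing the unnormalised products is enough to pick the larger posterior.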
APPLICATIONS
• Real-time prediction: Naive Bayes is an eager learning classifier, and it is fast, so it can be used for making predictions in real time.
• Multi-class prediction: the algorithm is also well known for multi-class prediction; we can predict the probability of each class of the target variable.
• Text classification / spam filtering / sentiment analysis: Naive Bayes classifiers are widely used in text classification (due to good results on multi-class problems and the independence assumption) and have a higher success rate compared with many other algorithms.
• Recommendation systems: a Naive Bayes classifier and collaborative filtering together build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource.
QUICK INTRODUCTION TO BAYES' THEOREM
• Bayes' theorem converts the results from your test (E) into the real probability of the event (H).
• Likelihood: chance of a positive test (E) given that you had cancer (H).
• Posterior: chance of having cancer (H) given a positive test (E).
• E.g., consider a cancer-testing scenario:
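A worked sketch of the scenario; the prevalence, sensitivity, and false-positive rate below are assumed for illustration, not taken from the slide:

```python
# All three numbers are assumed for illustration; they are not from the slide.
p_cancer = 0.01              # prior P(H): disease prevalence
p_pos_given_cancer = 0.90    # likelihood P(E | H): test sensitivity
p_pos_given_healthy = 0.08   # false-positive rate P(E | not H)

# Total probability of a positive test, then Bayes' theorem for the posterior
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)
posterior = p_pos_given_cancer * p_cancer / p_pos  # P(H | E), roughly 0.10
```

Even with a fairly accurate test, the low prior drags the posterior down to about 10%, which is the point of the example.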
HOW ARE NUMERIC VARIABLES HANDLED?
• Discretization: categorizing values into any arbitrary number of buckets is possible as well.
• Probability density estimation: the other option is to use the distribution of the numerical variable to get a good estimate of the frequency, e.g., by assuming a normal distribution for numerical variables.
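The discretization option can be sketched as a simple binning function; the cut points below are hypothetical:

```python
def discretize(x, edges):
    # Map a numeric value to a bucket index: the number of cut points
    # it meets or exceeds (len(edges) + 1 buckets in total)
    return sum(x >= e for e in edges)

# Hypothetical cut points for an annual-premium feature
edges = [800.0, 1200.0, 1600.0]
bucket = discretize(1257.19, edges)  # -> 2 (third bucket)
```

Once binned, the numeric feature is treated exactly like a categorical one, with one conditional probability per bucket.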
WHAT IS A LAPLACE ESTIMATE?
• In statistics, Laplace smoothing is a technique to smooth categorical data.
• Laplace smoothing is introduced to solve the problem of zero probability.
• By applying this method, the prior probability and conditional probability can be written as
  P(y) = (N_y + λ) / (N + Kλ)   and   P(a_j | y) = (N_{y,a_j} + λ) / (N_y + Aλ),
  where K denotes the number of different values in y and A denotes the number of different values in a_j. Usually λ in the formula equals 1.
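A minimal sketch of the smoothed conditional probability; the toy feature values and labels are invented for illustration:

```python
def laplace_conditional(feature_vals, labels, value, label, lam=1.0):
    # P(a_j = value | y = label) = (N_{y,a_j} + lambda) / (N_y + A * lambda),
    # where A is the number of different values the feature a_j can take
    A = len(set(feature_vals))
    n_y = sum(1 for lb in labels if lb == label)
    n_y_a = sum(1 for v, lb in zip(feature_vals, labels)
                if lb == label and v == value)
    return (n_y_a + lam) / (n_y + A * lam)

# Toy data (invented): "Parked Car" never occurs with label "Yes",
# yet its smoothed probability stays above zero
vals = ["Collision", "Collision", "Parked Car", "Theft"]
labs = ["Yes", "No", "No", "Yes"]
p = laplace_conditional(vals, labs, "Parked Car", "Yes")  # -> 0.2
```

Without smoothing, this probability would be 0 and would wipe out the entire product for the "Yes" class.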
APPLICATION OF NAÏVE BAYES IN FRAUD DETECTION
DATASET
• The data is obtained from a US-based Allstate Insurance Company.
• It contains 24 variables and 9,459 data points.
• It shows customer demographics such as age and education level.
• It also gives details about the policies and whether the customers have claimed fraudulently or not.

Numeric variables: Months_as_customer, Age, Policy_number, Policy_bind_date, Policy_state, Policy_deductable, Policy_annual_premium, Auto_year, Insured_education_level, Incident_date, Incident_hour_of_the_day, Number_of_vehicles_involved, Policy_csl, Bodily_injuries

Categorical variables: Insured_sex, Insured_occupation, Insured_hobbies, Incident_type, Collision_type, Incident_severity, Incident_state, Witnesses, Fraud_reported, Auto_make
DATASET SNIPPET
EXPLORATORY DATA ANALYSIS
[Chart: fraud reported by incident severity — Major Damage, Minor Damage, Total Loss, Trivial Damage]
[Chart: fraud reported by automobile make — Acura, Audi, BMW, Chevrolet, Dodge, Ford, Honda, Jeep, Mercedes, Nissan, Saab, Suburu, Toyota, Volkswagen]
[Chart: state-wise distribution of the policyholders by policy state — IL, IN, OH]
FEATURE SELECTION

Mean Decrease in Gini is the average (mean) of a variable's total decrease in node impurity, weighted by the proportion of samples reaching that node in each individual decision tree in the random forest. A higher Mean Decrease in Gini indicates higher variable importance.

Variable                      Mean Decrease in Gini
incident_severity             777.5752
total_claim_amount            391.104
policy_state                  308.2135
policy_annual_premium         290.7139
incident_type                 270.8983
witnesses                     241.7709
auto_make                     213.525
incident_hour_of_the_day      209.2444
policy_deductable             186.9934
authorities_contacted         142.3512
age                           95.22258
bodily_injuries               80.65723
insured_education_level       80.35778
incident_state                73.91043
number_of_vehicles_involved   64.79794
months_as_customer            59.02295
insured_sex                   38.15112
ANALYSIS OF CATEGORICAL VARIABLES

A-priori probabilities:
0: 0.7581936    1: 0.2418064

Conditional probabilities:

incident_type               Yes         No
Multi-vehicle Collision     0.4051792   0.459712
Parked Car                  0.099203    0.034353
Single Vehicle Collision    0.376294    0.472204
Vehicle Theft               0.119322    0.033728

policy_state    No          Yes
IL              0.3481468   0.312965
IN              0.3045846   0.315622
OH              0.3472686   0.371413

incident_severity   Yes        No
Major Damage        0.146812   0.684572
Minor Damage        0.417928   0.148657
Total Loss          0.319123   0.143035
Trivial Damage      0.116135   0.023735
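Combining the a-priori and conditional probabilities gives a class score for a single claim. This is a sketch, assuming the "Yes" columns correspond to fraud reported and ignoring all remaining features:

```python
# Priors and conditionals copied from the tables above; the "Yes" columns
# are assumed to be the fraud-reported class
prior = {"No": 0.7581936, "Yes": 0.2418064}
cond = {
    "incident_type=Multi-vehicle Collision": {"Yes": 0.4051792, "No": 0.459712},
    "policy_state=IL": {"Yes": 0.312965, "No": 0.3481468},
    "incident_severity=Major Damage": {"Yes": 0.146812, "No": 0.684572},
}

# Unnormalised Naive Bayes score: prior times each conditional
score = {c: prior[c] for c in ("No", "Yes")}
for probs in cond.values():
    for c in score:
        score[c] *= probs[c]

# Normalise the two scores into a posterior probability of fraud
posterior_yes = score["Yes"] / (score["Yes"] + score["No"])
```

The full classifier multiplies in one such conditional per feature before normalising.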
VISUAL REPRESENTATION
ANALYSIS OF NUMERIC VARIABLES

Variable                 No (mean / sd)         Yes (mean / sd)
policy_annual_premium    1257.1943 / 239.956    1249.014 / 254.1597
months_as_customer       202.8082 / 113.842     207.2072 / 119.5043
total_claim_amount       50128.24 / 27649.15    60141.19 / 20856.31
age                      38.91692 / 8.992596    39.02604 / 9.652652
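A minimal sketch of how these class-conditional means and standard deviations feed into the classifier, assuming the normal-distribution option described earlier; the claim amount is hypothetical:

```python
import math

def normal_pdf(x, mean, sd):
    # Normal density used as the class-conditional likelihood P(x | class)
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

# Class-conditional mean / sd for total_claim_amount, from the table above
stats = {"No": (50128.24, 27649.15), "Yes": (60141.19, 20856.31)}

claim = 65000.0  # hypothetical claim amount
likelihood = {c: normal_pdf(claim, m, s) for c, (m, s) in stats.items()}
```

For this amount the "Yes" density is higher, so the numeric feature nudges the posterior toward fraud; in the full classifier each such density is multiplied into the running product alongside the categorical conditionals.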
Predicted posterior probabilities (actual fraud_reported class, P(No), P(Yes), predicted class p1):
train.fraud_reported No Yes p1
No 0.9960421 0.003957948 No
No 0.9302784 0.069721614 No
No 0.9513506 0.048649379 No
No 0.9954772 0.004522754 No
No 0.8428649 0.157135117 No
No 0.9843237 0.015676306 No
No 0.9993648 0.000635164 No
No 0.9993648 0.000635164 No
No 0.6912763 0.308723717 No
No 0.9541085 0.045891528 No
Yes 0.1191065 0.880893519 Yes
Sensitivity: 0.8853
Specificity: 0.5996
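The slides report only the final rates; a minimal sketch of how sensitivity and specificity are computed from a confusion matrix, with counts invented purely for illustration:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    # Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical confusion-matrix counts (not from the slides)
sens, spec = sensitivity_specificity(tp=1702, fn=221, tn=274, fp=183)
```

Sensitivity measures how often the positive class is caught; specificity measures how often the negative class is correctly left alone.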
CONCLUSION
From the corresponding analysis and outputs of this research, the following conclusions
can be made:
PROS
• Simple, fast, and very effective
• Does well with noisy and missing data
• Requires relatively few examples for training, but also works well with very large numbers of examples
• Easy to obtain the estimated probability for a prediction

CONS
• Relies on an often-faulty assumption of equally important and independent features
• Not ideal for datasets with large numbers of numeric features
• Estimated probabilities are less reliable than the predicted classes
REFERENCES
• https://machinelearningmastery.com/naive-bayes-for-machine-learning/
• https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/
• https://www.cs.cmu.edu/~./awm/tutorials/prob_and_naive_bayes.pdf