
APPLICATION OF NAÏVE BAYES CLASSIFICATION IN FRAUD DETECTION IN AUTOMOBILE INSURANCE

INTRODUCTION TO NAÏVE BAYES CLASSIFICATION:
• Naive Bayes is a classification algorithm for binary (two-
class) and multi-class classification problems. The technique
is easiest to understand when described using binary or
categorical input values.
• A Naive Bayesian model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
• Despite its simplicity, the Naive Bayesian classifier often
does surprisingly well and is widely used because it often
outperforms more sophisticated classification methods. 
EXAMPLE

Suppose we are building a classifier that says whether a text is about sports or not. Our training data has 5 sentences:

Text                              Tag (Class)
“A great game”                    Sports
“The election was over”           Not sports
“Very clean match”                Sports
“A clean but forgettable game”    Sports
“It was a close election”         Not sports
• Now, which tag does the sentence “A very close game” belong to?
• Since Naive Bayes is a probabilistic classifier, we want to calculate the probability that the sentence “A very close game” is Sports and the probability that it is Not sports, and then take the larger one. Written mathematically, what we want is the probability that the tag of a sentence is Sports given that the sentence is “A very close game”. A worked sketch of this calculation follows.
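As a minimal R sketch of that calculation, assuming a bag-of-words model with Laplace smoothing (λ = 1); the helper names below are illustrative, not from any source named in this deck:

```r
# Training data from the slide above
train <- data.frame(
  text = c("a great game", "the election was over", "very clean match",
           "a clean but forgettable game", "it was a close election"),
  tag  = c("Sports", "Not sports", "Sports", "Sports", "Not sports"),
  stringsAsFactors = FALSE
)

vocab    <- unique(unlist(strsplit(train$text, " ")))  # 14 distinct words
words_of <- function(tag) unlist(strsplit(train$text[train$tag == tag], " "))

# Laplace-smoothed likelihood: P(word | tag) = (count + 1) / (n_words + |vocab|)
word_prob <- function(word, tag) {
  w <- words_of(tag)
  (sum(w == word) + 1) / (length(w) + length(vocab))
}

# Score a sentence for a tag: P(tag) * product over words of P(word | tag)
score <- function(sentence, tag) {
  mean(train$tag == tag) *
    prod(sapply(strsplit(sentence, " ")[[1]], word_prob, tag = tag))
}

score("a very close game", "Sports")      # ~2.8e-05
score("a very close game", "Not sports")  # ~5.7e-06 -> tagged as Sports
```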
APPLICATIONS
• Real-time Prediction: Naive Bayes is an eager learning classifier and it is fast, so it can be used for making predictions in real time.
• Multi-class Prediction: This algorithm is also well known for multi-class prediction; it can predict the probability of each class of the target variable.
• Text Classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are widely used in text classification (due to strong results in multi-class problems and the independence assumption) and often have a higher success rate than other algorithms.
• Recommendation Systems: A Naive Bayes classifier and collaborative filtering together build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource or not.
QUICK INTRODUCTION TO BAYES’ THEOREM
• Bayes’ theorem converts the results from your test (E) into the real probability of the event (H).
• E.g., consider a cancer-testing scenario:
  Prior: chance of having cancer (H)
  Likelihood: chance of a positive test (E) given that you had cancer (H)
  Posterior: chance of having cancer (H) given a positive test (E)
  Marginalization: chance of a positive test (E)
A numeric sketch of this scenario follows the table below.
BUZZWORD          MEANING
Evidence          Some symptom, or other thing you can observe
Prior             Probability of H being true. This is the prior knowledge.
Likelihood        The probability of E being true, given that H is true
Posterior         The probability of H being true, given that E is true
Marginalization   The probability of E being true
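As a numeric sketch of the cancer-testing scenario in R (the rates below are illustrative assumptions, not figures from this deck):

```r
prior      <- 0.01  # P(H): prior chance of having cancer (assumed)
likelihood <- 0.90  # P(E | H): chance of a positive test given cancer (assumed)
false_pos  <- 0.09  # P(E | not H): chance of a false positive (assumed)

marginal  <- likelihood * prior + false_pos * (1 - prior)  # P(E): marginalization
posterior <- likelihood * prior / marginal                 # P(H | E): Bayes' theorem
posterior  # ~0.092: even after a positive test, cancer remains fairly unlikely
```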
ASSUMPTIONS OF NAÏVE BAYES ALGORITHM
• The Naive Bayesian classifier is based on Bayes’ theorem with the assumption of independence between predictors.
• This assumption is called class conditional independence.
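In symbols, class conditional independence lets the posterior factor into one term per predictor (the standard Naïve Bayes formulation, stated here for completeness):

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y)\,\prod_{i=1}^{n} P(x_i \mid y)
```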
REPRESENTATION USED BY NAIVE BAYES MODELS:
• The representation for naive Bayes is probabilities.
• A list of probabilities is stored to file for a learned naive Bayes model. This includes:
 Class Probabilities: the probabilities of each class in the training dataset.
 Conditional Probabilities: the conditional probabilities of each input value given each class value.
STEPS FOR COMPUTING NAÏVE BAYES ALGORITHM:
In the case of a single feature, the Naive Bayes classifier calculates the probability of an event in the following steps:
• Step 1: Calculate the prior probability for the given class labels.
• Step 2: Find the likelihood probability of each attribute for each class.
• Step 3: Put these values into Bayes’ formula and calculate the posterior probability.
• Step 4: See which class has the higher posterior probability; the input belongs to that class.
A single-feature sketch of these steps follows.
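A minimal R sketch of the four steps with a single feature, using the a-priori probabilities reported later in this deck; the two likelihood values are illustrative assumptions, not the deck's fitted values:

```r
prior_no  <- 0.7581936  # Step 1: P(No fraud), from the a-priori slide below
prior_yes <- 0.2418064  # Step 1: P(Yes fraud)
lik_no    <- 0.15       # Step 2 (assumed): P(Major Damage | No)
lik_yes   <- 0.68       # Step 2 (assumed): P(Major Damage | Yes)

# Step 3: Bayes' formula
evidence <- lik_no * prior_no + lik_yes * prior_yes  # P(Major Damage)
post_yes <- lik_yes * prior_yes / evidence           # P(Yes | Major Damage)

post_yes  # ~0.59 -> Step 4: the higher posterior, so classify as fraudulent
```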
HOW ARE NUMERIC VARIABLES HANDLED?
• Discretization: One can transform continuous features into discrete features by categorizing different values into discrete buckets. For instance, a continuous feature can be binarized by treating all values that exceed a threshold as “Large” and all values that don’t as “Small”. Of course, more fine-grained discretization that categorizes values into any number of buckets is possible as well.
• Probability Density Estimation: The other option is to use the distribution of the numeric variable to estimate the likelihood, e.g. by assuming a normal (Gaussian) distribution for numeric variables, as in the sketch below.
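A sketch of the density option in R, plugging in the per-class mean and standard deviation reported for total_claim_amount later in this deck:

```r
x <- 55000  # a claim amount to score (illustrative value)

# Gaussian densities play the role of the categorical conditional probabilities
lik_no  <- dnorm(x, mean = 50128.24, sd = 27649.15)  # under class "No"
lik_yes <- dnorm(x, mean = 60141.19, sd = 20856.31)  # under class "Yes"
c(No = lik_no, Yes = lik_yes)
```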
WHAT IS A LAPLACE ESTIMATE?
• In statistics, Laplace smoothing is a technique to smooth categorical data.
• Laplace smoothing is introduced to solve the problem of zero probability.
• By applying this method, the prior probability and conditional probability can be written as:

  \hat{P}(y = c) = \frac{N_c + \lambda}{N + K\lambda}, \qquad \hat{P}(a_j = v \mid y = c) = \frac{N_{c,v} + \lambda}{N_c + A\lambda}

  where K denotes the number of different values in y and A denotes the number of different values in a_j. Usually λ in the formula equals 1. A small sketch follows.
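A hand-rolled R sketch of the smoothed estimate above (the function name and counts are illustrative):

```r
# (count + lambda) / (total + levels * lambda), with `levels` playing the role
# of K for the prior or A for a conditional probability
laplace_prob <- function(count, total, levels, lambda = 1) {
  (count + lambda) / (total + levels * lambda)
}

# A value never observed with a class still gets a nonzero probability:
laplace_prob(count = 0, total = 120, levels = 4)  # 1/124 instead of 0
```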
APPLICATION OF NAÏVE BAYES IN FRAUD DETECTION

DATASET
 The data is obtained from a US-based Allstate insurance company.
 It contains 24 variables and 9459 data points.
 It shows customer demographics like age, education level, etc.
 It also gives details about policies and whether the customers have claimed fraudulently or not.

Numeric Variables               Categorical Variables
Months_as_customer              Insured_sex
Age                             Insured_occupation
Policy_number                   Insured_hobbies
Policy_bind_date                Incident_type
Policy_state                    Collision_type
Policy_deductable               Incident_severity
Policy_annual_premium           Incident_state
Auto_year                       Witnesses
Insured_education_level         Fraud_reported
Incident_date                   Auto_make
Incident_hour_of_the_day
Number_of_vehicles_involved
Policy_csl
Bodily_injuries
DATASET SNIPPET
EXPLORATORY DATA ANALYSIS
[Charts: Fraud Reported by Incident Severity (Major Damage, Minor Damage, Total Loss, Trivial Damage); Fraud Reported by Auto Make; Fraud Reported by Incident Type; Fraud Reported by Policy State (IL, IN, OH); state-wise distribution of the policyholders; distributions of Total Claim Amount and Policy Annual Premium]

OBJECTIVE
To analyse the data for Allstate Insurance Company and classify the policyholders as Fraudulent or Not Fraudulent.
ANALYSIS
1. Factorisation of the variables: certain variables like “Number of vehicles involved” and “Bodily injuries”, which were categorical in nature, were converted into factors.
2. The data was then divided into train and test sets with a split ratio of 80:20 (see the sketch below).
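A minimal R sketch of these two steps; `claims` is an assumed name for the loaded dataset, and the seed is illustrative:

```r
# 1. Factorisation of the categorical-in-nature variables
claims$number_of_vehicles_involved <- factor(claims$number_of_vehicles_involved)
claims$bodily_injuries             <- factor(claims$bodily_injuries)
claims$fraud_reported              <- factor(claims$fraud_reported)

# 2. 80:20 train/test split
set.seed(42)
idx   <- sample(nrow(claims), size = 0.8 * nrow(claims))
train <- claims[idx, ]
test  <- claims[-idx, ]
```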
FEATURE SELECTION

Mean Decrease in Gini is the average (mean) of a variable’s total decrease in node impurity, weighted by the proportion of samples reaching that node in each individual decision tree in the random forest. A higher Mean Decrease in Gini indicates higher variable importance. A sketch of how this ranking is produced follows the table.

Variables                      Mean Decrease Gini
incident_severity              777.5752
total_claim_amount             391.104
policy_state                   308.2135
policy_annual_premium          290.7139
incident_type                  270.8983
witnesses                      241.7709
auto_make                      213.525
incident_hour_of_the_day       209.2444
policy_deductable              186.9934
authorities_contacted          142.3512
age                            95.22258
bodily_injuries                80.65723
insured_education_level        80.35778
incident_state                 73.91043
number_of_vehicles_involved    64.79794
months_as_customer             59.02295
insured_sex                    38.15112
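A sketch of how such a ranking can be produced with the randomForest package in R (an assumed tool choice; the deck does not name its software):

```r
library(randomForest)

set.seed(42)                                          # illustrative seed
rf  <- randomForest(fraud_reported ~ ., data = train)
imp <- importance(rf)                                 # MeanDecreaseGini column
imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), , drop = FALSE]
```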
ANALYSIS OF CATEGORICAL VARIABLES

A-priori Probabilities
0           1
0.7581936   0.2418064

Conditional Probabilities

incident_type              Yes         No
Multi-vehicle Collision    0.4051792   0.459712
Parked Car                 0.099203    0.034353
Single Vehicle Collision   0.376294    0.472204
Vehicle Theft              0.119322    0.033728

policy_state   No          Yes
IL             0.3481468   0.312965
IN             0.3045846   0.315622
OH             0.3472686   0.371413

incident_severity   Yes        No
Major Damage        0.146812   0.684572
Minor Damage        0.417928   0.148657
Total Loss          0.319123   0.143035
Trivial Damage      0.116135   0.023735
VISUAL REPRESENTATION
ANALYSIS OF NUMERIC VARIABLES:
policy_annual_premium No Yes
mean 1257.1943 1249.014

sd 239.956 254.1597

months_as_customer No Yes
mean 202.8082 207.2072
sd 113.842 119.5043

total_claim_amount No Yes
mean 50128.24 60141.19
sd 27649.15 20856.31

age No Yes
mean 38.91692 39.02604
sd 8.992596 9.652652
MODEL FITTING

Posterior probabilities on the test data (actual class, P(No), P(Yes), and predicted class p1):

train.fraud_reported   No          Yes           p1
No                     0.9960421   0.003957948   No
No                     0.9302784   0.069721614   No
No                     0.9513506   0.048649379   No
No                     0.9954772   0.004522754   No
No                     0.8428649   0.157135117   No
No                     0.9843237   0.015676306   No
No                     0.9993648   0.000635164   No
No                     0.9993648   0.000635164   No
No                     0.6912763   0.308723717   No
No                     0.9541085   0.045891528   No
Yes                    0.1191065   0.880893519   Yes
Yes                    0.9173722   0.082627772   No
No                     0.7843711   0.215628927   No
No                     0.8708181   0.129181892   No
Yes                    0.3525399   0.647460076   Yes
No                     0.9260462   0.073953799   No
No                     0.9843237   0.015676306   No
No                     0.5456276   0.454372433   No
…                      …           …             …
No                     0.9664259   0.03357412    No
No                     0.9514347   0.048565279   No
No                     0.9688303   0.031169717   No
No                     0.9960421   0.003957948   No
Yes                    0.9173722   0.082627772   No
No                     0.9260462   0.073953799   No
No                     0.3591227   0.640877264   Yes

A sketch of this fitting step follows.
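A sketch of the fitting step using e1071's naiveBayes in R (an assumed tool choice, though its printed output matches the a-priori and conditional tables shown earlier):

```r
library(e1071)

model <- naiveBayes(fraud_reported ~ ., data = train)

post <- predict(model, test, type = "raw")    # posterior P(No), P(Yes) per row
pred <- predict(model, test, type = "class")  # predicted label, as in column p1
head(data.frame(actual = test$fraud_reported, post, p1 = pred))
```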
MEASURES OF ACCURACY
Inferences from the confusion matrix created (a computational sketch follows):

 Accuracy of the model: 0.8084

 Sensitivity: 0.8853

 Specificity: 0.5996
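A base-R sketch of how these measures come out of the confusion matrix, treating “No” as the positive class (an assumption consistent with the reported sensitivity and specificity):

```r
cm <- table(predicted = pred, actual = test$fraud_reported)

sum(diag(cm)) / sum(cm)               # accuracy    (reported: 0.8084)
cm["No", "No"]   / sum(cm[, "No"])    # sensitivity (reported: 0.8853)
cm["Yes", "Yes"] / sum(cm[, "Yes"])   # specificity (reported: 0.5996)
```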
CONCLUSION
From the corresponding analysis and outputs of this research, the following conclusions can be made:

• The accuracy of the Naïve Bayes model is 0.8084.
• The probability density functions of the numeric variables turn out to follow a Gaussian distribution.
• Naïve Bayes is not sensitive to scaling, as we take into account only the probability values.
• Incident Severity is the most important attribute, with a Mean Decrease in Gini of 777.5752.
LIMITATIONS AND FUTURE SCOPE:

 Although Bayes classifiers are simple to implement, logistic regression or other discriminative methods often learn more accurately.

 The accuracy of the model can be tested at different values of the Laplace estimator, and a more precise model can be identified.

 The numerical variables can be encoded, and further conditional probabilities can be derived for them.
PROS AND CONS OF NAÏVE BAYES ALGORITHM:

PROS                                                 CONS
Simple, fast, and very effective                     Relies on an often-faulty assumption of equally important and independent features
Does well with noisy and missing data                Not ideal for datasets with large numbers of numeric features
Requires relatively few examples for training,       Estimated probabilities are less reliable than the predicted classes
but also works well with very large numbers
of examples
Easy to obtain the estimated probability for
a prediction
REFERENCES
• https://machinelearningmastery.com/naive-bayes-for-machine-learning/
• https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/
• https://www.cs.cmu.edu/~./awm/tutorials/prob_and_naive_bayes.pdf
