MANAGEMENT
(An Autonomous Institute affiliated to SPPU)
AY 2020-21
MINI PROJECT – Machine Learning
Credit Cards Fraud Detection System
Third Year VI Semester
GROUP: 15
Abstract
Introduction
Literature Review
Proposed Technique
Performance Metrics and Experimental results
Scope of future application
Conclusions
References
Introduction
ML Mini Project- Credit Cards Fraud Detection System
‘Fraud’ in credit card transactions is the unauthorized and unwanted
use of an account by someone other than the owner of that account.
In other words, credit card fraud is a case where a person uses someone
else’s credit card for personal reasons while the owner and the card
issuing authorities are unaware that the card is being used.
Necessary prevention measures can be taken to stop this abuse, and the
behaviour behind such fraudulent practices can be studied to minimize it
and protect against similar occurrences in the future.
Fraud detection involves monitoring the activities of
populations of users in order to estimate, detect or avoid
objectionable behaviour, which consists of fraud, intrusion and
defaulting.
This is a highly relevant problem that demands the attention of
communities such as machine learning and data science, where
the solution to this problem can be automated.
This problem is particularly challenging from a learning perspective
because of factors such as class imbalance: valid transactions far
outnumber fraudulent ones. In addition, transaction patterns
often change their statistical properties over time. These are not
the only challenges in implementing a real-world fraud detection
system, however. In real-world deployments, the massive stream of
payment requests is quickly scanned by automatic tools that determine
which transactions to authorize.
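The effect of class imbalance on a naive metric can be shown in a few lines of Python. This is a minimal sketch on made-up labels (the 995 valid / 5 fraudulent split is an illustrative assumption, not this project's data, and `majority_baseline_accuracy` is a hypothetical helper): a model that always predicts "valid" never catches a fraud, yet its accuracy looks excellent.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a trivial model that always predicts the most frequent class."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels)


# Illustrative labels only: 0 = valid, 1 = fraudulent.
labels = [0] * 995 + [1] * 5
print(majority_baseline_accuracy(labels))  # 0.995
```

A score of 99.5% here says nothing about fraud detection, which is why class-specific measures such as sensitivity are needed alongside accuracy.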
Machine learning algorithms are employed to analyse all the
authorized transactions and report the suspicious ones.
These reports are investigated by professionals who contact the
cardholders to confirm if the transaction was genuine or fraudulent.
The investigators' feedback is returned to the automated system,
where it is used to train and update the algorithm, eventually
improving fraud-detection performance over time.
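The alert, investigation and feedback loop described above can be sketched as one function. This is a hedged illustration, not the project's code: `feedback_round`, `score` and `investigate` are hypothetical names, the "investigator" is a stand-in function, and the retraining step itself is left out; the sketch only shows how verified labels flow back into the training data.

```python
def feedback_round(score, transactions, investigate, k, training_data):
    """One round of the alert / investigation / feedback loop.

    score:         returns a fraud score for a transaction (higher = more suspicious)
    investigate:   stands in for a human investigator; returns 1 (fraud) or 0 (genuine)
    k:             number of alerts the investigators can handle in this round
    training_data: list that collects (transaction, verified_label) pairs
    """
    # Rank the authorized transactions and report the k most suspicious ones.
    alerts = sorted(transactions, key=score, reverse=True)[:k]
    # Investigators verify each alert; the confirmed labels are stored so the
    # model can later be retrained on them.
    for tx in alerts:
        training_data.append((tx, investigate(tx)))
    return alerts


# Toy usage: score by amount, and pretend amounts above 1000 are confirmed fraud.
txs = [{"id": 1, "amount": 20.0}, {"id": 2, "amount": 950.0}, {"id": 3, "amount": 5000.0}]
training_data = []
alerts = feedback_round(
    score=lambda tx: tx["amount"],
    transactions=txs,
    investigate=lambda tx: int(tx["amount"] > 1000),
    k=2,
    training_data=training_data,
)
print([tx["id"] for tx in alerts])            # [3, 2]
print([label for _, label in training_data])  # [1, 0]
```

Capping the alerts at `k` reflects that investigators can only check a limited number of cases per round, so the ranking quality of the score matters most at the top of the list.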
Literature Review
1. The Uncertain Case of Credit Card Fraud Detection:
Uncertainty is inherent in many real-time event-driven applications.
Credit card fraud detection is a typical uncertain domain, where
potential fraud incidents must be detected in real time and tagged
before the transaction has been accepted or denied. The authors present
extensions to the IBM Proactive Technology Online (PROTON) open
source tool to cope with uncertainty. The inclusion of uncertainty
aspects impacts all levels of the architecture and logic of an event
processing engine. The extensions implemented in PROTON include
new built-in attributes and functions, support for new types of
operands, and support for event processing patterns that handle
uncertainty. The new capabilities were implemented as building
blocks and basic primitives in the complex event processing
programmatic language. This enables event-driven applications with
uncertainty aspects from different domains to be implemented in
a generic manner. A first application was devised in the domain of
credit card fraud detection. The authors' preliminary results are
encouraging, showing potential benefits that stem from incorporating
uncertainty aspects into credit card fraud detection [1]. (Authors:
Fabiana Fournier, Ivo Carriea, Inna Skarbovsky)
Proposed Technique
Algorithm steps:
Step 1: Read the dataset.
Step 2: Random sampling is applied to the dataset to balance the
classes.
Step 3: Divide the dataset into two parts, i.e., a training dataset and
a test dataset.
Step 4: Feature selection is applied for the proposed models.
Step 5: Accuracy and other performance metrics are calculated to
compare the efficiency of the different algorithms.
Step 6: Select the best algorithm for the given dataset based on
efficiency.
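Steps 2 and 3 above can be sketched with the standard library alone. The report does not say which kind of random sampling was used, so this sketch assumes random undersampling (keep every fraud row, sample an equal number of valid rows); `undersample`, `split_train_test`, the 70/30 split and the row layout are all hypothetical choices for illustration.

```python
import random


def undersample(rows, seed=0):
    """Step 2, assumed here to be random undersampling: keep all fraud rows
    (label 1) plus an equally sized random sample of valid rows (label 0)."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r["label"] == 1]
    valid = [r for r in rows if r["label"] == 0]
    balanced = fraud + rng.sample(valid, len(fraud))
    rng.shuffle(balanced)
    return balanced


def split_train_test(rows, test_fraction=0.3, seed=0):
    """Step 3: shuffle a copy of the rows and split them into train and test sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: 990 valid and 10 fraudulent transactions.
rows = [{"amount": float(i), "label": 0} for i in range(990)]
rows += [{"amount": float(i), "label": 1} for i in range(10)]

balanced = undersample(rows)
train, test = split_train_test(balanced)
print(len(balanced), len(train), len(test))  # 20 14 6
```

Undersampling discards most valid transactions, which is cheap but throws data away; oversampling the fraud class is the usual alternative when the dataset is small.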
1. Logistic Regression:
Logistic regression is a classification algorithm used to predict a
binary outcome (1 / 0, Yes / No, True / False) from a given set of
independent variables. Dummy variables are used to represent binary /
categorical values. Logistic regression can be viewed as a special case
of linear regression in which the outcome is categorical: the log of the
odds is used as the dependent variable, and the model predicts the
probability that an event occurs by fitting the data to a logistic
function, such as:
$$O = \frac{e^{I_0 + I_1 x}}{1 + e^{I_0 + I_1 x}}$$
Where,
O is the predicted output
I0 is the bias or intercept term
I1 is the coefficient for the single input value (x).
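The formula above, and the idea of "fitting data to a logistic function", can be made concrete with a small sketch. It assumes a single input x and fits I0 and I1 by plain batch gradient descent on the log-loss; `logistic_output`, `fit`, the learning rate and the toy data are hypothetical, and library implementations use more robust optimizers.

```python
import math


def logistic_output(i0, i1, x):
    """O = e^(I0 + I1*x) / (1 + e^(I0 + I1*x)), the predicted probability of class 1."""
    z = i0 + i1 * x
    # Same value as e^z / (1 + e^z); the branch avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit(xs, ys, lr=0.1, epochs=2000):
    """Estimate I0 and I1 by batch gradient descent on the mean log-loss."""
    i0 = i1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = logistic_output(i0, i1, x) - y  # derivative of the log-loss w.r.t. z
            g0 += err
            g1 += err * x
        i0 -= lr * g0 / n
        i1 -= lr * g1 / n
    return i0, i1


# Toy data (an assumption for illustration): one input, labels 0/1.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]
i0, i1 = fit(xs, ys)
print(logistic_output(0.0, 1.0, 0.0))  # 0.5
print(logistic_output(i0, i1, -2.0) < 0.5 < logistic_output(i0, i1, 2.0))  # True
```

Taking ln(O / (1 - O)) of the output recovers I0 + I1*x, which is the "log of odds" that the linear part of the model describes.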
3. Random Forest:
Random forest is a tree-based algorithm that builds several decision
trees and combines their outputs to improve the generalization
ability of the model. This method of combining trees is known as an
ensemble method.
Ensembling combines weak learners (the individual trees) to produce a
strong learner. Random forest can be used to solve both regression
and classification problems. In regression problems, the dependent
variable is continuous; in classification problems, the dependent
variable is categorical.
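The ensemble idea can be illustrated with a deliberately simplified sketch, not the model used in this project: each "tree" here is a depth-1 decision tree (a stump) trained on a bootstrap sample with a randomly chosen feature subset, and the forest predicts by majority vote. Real random forests grow much deeper trees; in practice one would normally use a library implementation such as scikit-learn's `RandomForestClassifier`. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels, default=0):
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, features):
    """Depth-1 tree: the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                left_label = majority(left)
                best = (score, f, t, left_label, majority(right, left_label))
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train each weak learner on a bootstrap sample using a random feature subset."""
    rng = random.Random(seed)
    all_features = list(range(len(X[0])))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        features = rng.sample(all_features, n_features)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return trees


def predict(trees, row):
    """Combine the weak learners' outputs by majority vote."""
    return majority([tree(row) for tree in trees])


# Toy data (illustrative only): two numeric features, labels 0 = valid, 1 = fraud.
X = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50],
     [11, 110], [12, 120], [13, 130], [14, 140], [15, 150]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [1, 10]), predict(forest, [15, 150]))
```

Bootstrap sampling and random feature selection make the individual learners differ from one another, and the vote averages out their individual mistakes; that diversity is what turns weak learners into a stronger combined model.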
Accuracy:
Accuracy is calculated as the number of correct predictions (A + B)
divided by the total number of samples in the dataset (C + D).
Equivalently, it equals 1 − error rate.
$$\text{Accuracy} = \frac{A + B}{C + D}$$
Where,
A = True Positives
B = True Negatives
C = total actual Positives
D = total actual Negatives
Error rate:
Error rate is calculated as the number of incorrect predictions (E + F)
divided by the total number of samples in the dataset (C + D).
$$\text{Error rate} = \frac{E + F}{C + D}$$
Where,
E = False Positives, F = False Negatives
C = total actual Positives, D = total actual Negatives
Sensitivity:
Sensitivity (also called recall or true positive rate) is calculated as
the number of correct positive predictions (A) divided by the total
number of actual positives (C).
$$\text{Sensitivity} = \frac{A}{C}$$
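Specificity:
Specificity is calculated as the number of correct negative predictions
(B) divided by the total number of actual negatives (D).
$$\text{Specificity} = \frac{B}{D}$$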
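The metrics listed above can all be computed from the four confusion-matrix counts. This sketch treats 1 (fraud) as the positive class and uses made-up labels; `confusion_counts` and `metrics` are hypothetical helper names, and the code assumes both classes appear in `y_true` so that C and D are non-zero.

```python
def confusion_counts(y_true, y_pred):
    """Return (A, B, E, F) = (true positives, true negatives,
    false positives, false negatives), treating 1 (fraud) as positive."""
    pairs = list(zip(y_true, y_pred))
    a = sum(1 for t, p in pairs if t == 1 and p == 1)
    b = sum(1 for t, p in pairs if t == 0 and p == 0)
    e = sum(1 for t, p in pairs if t == 0 and p == 1)
    f = sum(1 for t, p in pairs if t == 1 and p == 0)
    return a, b, e, f


def metrics(y_true, y_pred):
    """Accuracy, error rate, sensitivity and specificity.
    Assumes both classes occur in y_true (otherwise C or D is zero)."""
    a, b, e, f = confusion_counts(y_true, y_pred)
    c = a + f  # C: all actual positives
    d = b + e  # D: all actual negatives
    return {
        "accuracy": (a + b) / (c + d),
        "error_rate": (e + f) / (c + d),
        "sensitivity": a / c,
        "specificity": b / d,
    }


# Made-up labels: 4 fraud cases, 6 valid ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(metrics(y_true, y_pred))
# {'accuracy': 0.8, 'error_rate': 0.2, 'sensitivity': 0.75, 'specificity': 0.8333333333333334}
```

Here accuracy and error rate sum to 1, as the definitions require, while sensitivity (0.75) shows separately that one of the four fraud cases was missed.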
Future Scope
While we could not reach our goal of 100% accuracy in fraud
detection, we did end up creating a system that can, with enough time
and data, get very close to that goal.
As with any such project, there is some room for improvement here.
The modular nature of this project allows multiple algorithms to be
integrated as modules whose results can be combined to increase the
accuracy of the final result.
The model can be improved further by adding more algorithms to it.
However, the output of each new algorithm must be in the same format
as the others. Once that condition is satisfied, the modules are easy
to add, as done in the code. This provides a great degree of modularity
and versatility to the project.
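The "same output format" condition can be pictured as a shared function signature. In this hedged sketch the assumed common format is a fraud probability in [0, 1], and the modules are combined by averaging (soft voting); `combine`, `amount_module`, `night_module` and the threshold are hypothetical and stand in for trained models rather than reproducing the project's code.

```python
from typing import Callable, Dict, List

# Assumed common output format: every module maps a transaction to a
# fraud probability in [0, 1].
Transaction = Dict[str, float]
Module = Callable[[Transaction], float]


def combine(modules: List[Module], tx: Transaction, threshold: float = 0.5) -> bool:
    """Average the modules' fraud probabilities (soft voting) and flag the
    transaction as fraudulent if the average reaches the threshold."""
    scores = [module(tx) for module in modules]
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("every module must return a probability in [0, 1]")
    return sum(scores) / len(scores) >= threshold


# Hypothetical modules standing in for trained models.
def amount_module(tx: Transaction) -> float:
    return 0.9 if tx["amount"] > 1000 else 0.1


def night_module(tx: Transaction) -> float:
    return 0.7 if tx["hour"] < 6 else 0.2


modules: List[Module] = [amount_module, night_module]
print(combine(modules, {"amount": 2500.0, "hour": 3.0}))  # True
print(combine(modules, {"amount": 50.0, "hour": 14.0}))   # False
```

Under this design, adding another algorithm amounts to appending one more function with the same signature to `modules`; the range check enforces the shared format at run time.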
More room for improvement can be found in the dataset. As
demonstrated before, the precision of the algorithms increases as
the size of the dataset increases. Hence, more data is likely to make
the model more accurate in detecting fraud and to reduce the number
of false positives. However, this requires official support from the
banks themselves.
Conclusion