Batch 31

CREDIT CARD FRAUD DETECTION SYSTEM
A PROJECT REPORT
Submitted by
NITHIN M (312417104061)
SABHARAM M (312417104083)
in partial fulfilment for the requirement of award of the degree
of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
St. JOSEPH’S INSTITUTE OF TECHNOLOGY
CHENNAI - 119
ANNA UNIVERSITY: CHENNAI 600 025
APRIL-2021
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this project report “CREDIT CARD FRAUD DETECTION
SYSTEM” is the bonafide work of “SABHARAM M (312417104083) and
NITHIN M (312417104061)” who carried out the project under my supervision
SIGNATURE
Dr. J. DAFNI ROSE, M.E., Ph.D.,
Professor,
Head of the Department,
Computer Science and Engineering,
St. Joseph’s Institute of Technology,
Old Mamallapuram Road,
Chennai - 600 119.

SIGNATURE
Mrs. V. NISHA JENIPHER, M.E., (Ph.D.),
Assistant Professor,
Computer Science and Engineering,
St. Joseph’s Institute of Technology,
Old Mamallapuram Road,
Chennai - 600 119.
ACKNOWLEDGEMENT
We also take this opportunity to thank our honourable Chairman Dr. B. Babu
Manoharan, M.A., M.B.A., Ph.D. for the guidance he offered during our tenure in
this institution.
We extend our heartfelt gratitude to our honourable Director Mrs. B. Jessie
Priya, M.Com., for providing us with the required resources to carry out this project.
We express our deep gratitude to our honourable CEO Mr. B. Sashi Sekar,
M.Sc (INTL.Business) for the constant guidance and support for our project.
We are indebted to our Principal Dr. P. Ravichandran, M.Tech., Ph.D. for
granting us permission to undertake this project.
Our earnest gratitude to our Head of the Department Dr. J. Dafni Rose, M.E.,
Ph.D., for her commendable support and encouragement for the completion of the
project with perfection.
We express our profound gratitude to our guide Mr. J. Nisha Jenipher, M.E.,
(Ph.D.), for his guidance, constant encouragement, immense help and valuable advice
for the completion of this project.
We wish to convey our sincere thanks to all the teaching and non-teaching
staff of the department of COMPUTER SCIENCE AND ENGINEERING without
whose co-operation this venture would not have been a success.
CERTIFICATE OF EVALUATION
College Name: St. JOSEPH’S INSTITUTE OF TECHNOLOGY
Branch: COMPUTER SCIENCE AND ENGINEERING
Semester: VIII
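Sl.No | Name of the Students | Title of the Project | Name of the Supervisor with designation
1 | NITHIN M (312416104061) | Credit Card Fraud Detection System | Mr. J. Nisha Jenipher, M.E., (Ph.D.), Assistant Professor, St. Joseph’s Institute Of Technology
2 | SABHARAM M (312416104102) | Credit Card Fraud Detection System | Mr. J. Nisha Jenipher, M.E., (Ph.D.), Assistant Professor, St. Joseph’s Institute Of Technology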
The reports of the project work submitted by the above students in partial
fulfilment for the award of Bachelor of Engineering Degree in Computer Science
and Engineering of Anna University were evaluated and confirmed to be reports
of the work done by above students.
Submitted for project review and viva voce exam held on ________________
(INTERNAL EXAMINER) (EXTERNAL EXAMINER)
ABSTRACT
Machine learning is the field of study that gives computers the ability to learn without
being explicitly programmed. It focuses on the development of computer programs that can
access data and use it to learn for themselves. A model is trained on one dataset and tested
on new data to get the desired result, and this process is repeated many times to improve the
learning process. Machine learning enables computers to handle tasks that are otherwise done
by humans by teaching a computer system to make accurate predictions when fed data. This
project uses a dataset of transactions from a European company to measure how accurately
fraudulent and genuine transactions can be identified. The existing system uses a machine
learning algorithm called logistic regression. This technique showed an accuracy of 99.962%
on genuine data and 79.065% on fraud data, and the overall accuracy of the algorithm was
reported as 50%. To improve on the accuracy of the existing system, another algorithm,
Random Forest, was taken into consideration. This algorithm is stable when processing large
datasets and continues to give reliable accuracy values when new data is added or when values
are missing from the dataset. It showed an accuracy of 99.988% on genuine data and 42.683% on
fraud data. Various libraries are imported, and the dataset is collected and loaded using the
pandas library. The data is explored to check that numerical values are defined, and data
imbalance is checked and corrected if needed. The data is split and plotted to calculate the
percentages of fraudulent and valid transactions, and features that are relevant to each other
are correlated for prediction. The purpose of credit card fraud is to obtain goods without
paying or to obtain unauthorized funds from an account. An e-commerce website is used as an
example for detecting the number of fraudulent transactions made on a customer’s account:
products that were paid for but not bought by the customer are identified and recognized as
fraudulent transactions.
Credit Card Fraud Detection
TABLE OF CONTENTS
CHAPTER NO    TITLE
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
1.1 Overview
1.2 Problem Statement
1.3 Existing System
1.4 Proposed System
2. LITERATURE SURVEY
3. SYSTEM DESIGN
3.1 System Requirements
3.2 UML Flow Diagrams
3.2.1 Use Case Diagram of Credit card Fraud Detection
3.2.2 Sequence Diagram of Credit card Fraud Detection
3.2.3 Activity Diagram of Credit card Fraud Detection
3.2.4 Component Diagram of Credit card Fraud Detection
3.2.6 Deployment Diagram of Credit card Fraud Detection
3.2.7 Package Diagram of Credit card Fraud Detection
4. SYSTEM ARCHITECTURE
4.1 Architectural Design
4.2 List of Modules
4.2.1 Preprocessing module
4.2.2 Machine Learning Module
4.2.3 Data exploration Module
5. SYSTEM IMPLEMENTATION
5.1 System Description
5.2 Pseudo code for Random Forest Algorithm
5.4 System Accuracy
6. RESULTS AND CODING
6.1 Sample Code
6.1.1 ML Model Code
6.1.2 Dataset File
6.2 Screenshots
7. CONCLUSION AND FUTURE WORK
7.1 Conclusion
7.2 Future Work
REFERENCES
LIST OF FIGURES
FIGURE NO    NAME OF THE FIGURE
3.1 Use Case Diagram
3.2 Sequence Diagram
3.3 Activity Diagram
3.4 Component Diagram
3.5 Collaboration Diagram
3.6 Deployment Diagram
3.7 Package Diagram
4.1 System Architecture Diagram
CHAPTER 1
INTRODUCTION
The objective of this project is to analyze the given data set and identify the
percentage of fraudulent transactions in it. Nowadays, as technology has improved,
fraud has been rising rapidly. In the banking sector, fraudulent activity involving
credit cards has increased. The main task of our model is to make accurate
predictions when data is fed to it. Using machine learning, we analyze and
summarize the frauds in credit card transactions.
1.1 OVERVIEW
A huge number of credit card transactions take place in the real world.
However, there are also third parties who monitor cardholders’ activities and put
people in trouble. This has been happening not only now but for about twenty years,
and with the help of the latest technology, fraudulent activity has increased
rapidly in areas such as online shopping and marketing. To detect these types of
fraudulent activity, this project implements a detection model.
1.2 PROBLEM STATEMENT
The Credit Card Fraud Detection Problem includes modeling past credit
card transactions with the knowledge of the ones that turned out to be fraud. This
model is then used to identify whether a new transaction is fraudulent or not. Our
aim here is to detect 100% of the fraudulent transactions while minimizing the
incorrect fraud classifications.
The main challenges involved in credit card fraud detection are:
• Enormous data is processed every day, and the model built must be fast
enough to respond to the scam in time.
• Imbalanced data: most of the transactions (99.8%) are not fraudulent, which
makes it really hard to detect the fraudulent ones.
• Data availability, as the data is mostly private.
• Misclassified data can be another major issue, as not every fraudulent
transaction is caught and reported.
• Adaptive techniques are used against the model by the scammers.
1.3 EXISTING SYSTEM
1.3.1 INPUT DATA SET
The data set is based on real-life transactional data from a large European
company, and personal details in the data are kept confidential. The accuracy of
the existing algorithm on this data is around 50%.
The data is highly skewed, consisting of 492 frauds in a total of 284,807
observations, so fraud cases make up only 0.172% of the data. This skew reflects
the low number of fraudulent transactions. The dataset consists of numerical
values from 28 features transformed by Principal Component Analysis (PCA),
namely V1 to V28. Furthermore, no metadata about the original features is
provided, so pre-analysis or feature study could not be done. There are no missing
or null values in the dataset.
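The skew described above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the two counts quoted in the text, and the function names are hypothetical. It also shows why plain accuracy is a weak measure on such data, since a trivial model that labels every transaction as genuine already scores very high.

```python
# Counts quoted in the text above.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def fraud_percentage(fraud: int, total: int) -> float:
    """Share of fraudulent transactions, in percent."""
    return 100.0 * fraud / total


def majority_baseline_accuracy(fraud: int, total: int) -> float:
    """Accuracy (in percent) of a trivial model that labels everything genuine."""
    return 100.0 * (total - fraud) / total


print("Fraud share (%):", fraud_percentage(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS))
print("Accuracy of 'always genuine' (%):",
      majority_baseline_accuracy(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS))
```

Because the trivial baseline is already close to 100%, per-class results such as recall on the fraud class say more about a model than overall accuracy does.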
1.3.2 APPLYING VARIOUS ALGORITHMS
The existing system uses a machine learning algorithm called logistic regression.
This technique showed an accuracy of 99.962% on genuine data and 79.065% on
fraud data. The overall accuracy of the algorithm was reported as 50%.
One case study on credit card fraud detection applied data normalization
before cluster analysis. Its results, obtained using cluster analysis and
artificial neural networks for fraud detection, showed that clustering attributes
can minimize the number of neuronal inputs.
This research was based on unsupervised learning. Its significance was in finding
new methods for fraud detection and increasing the accuracy of results. The data
set is based on real-life transactional data from a large European company, and
personal details in the data are kept confidential. The accuracy of the algorithm
is around 50%.
Another study set out to find an algorithm that reduces the cost measure. The
result obtained was an improvement of 23%, and the algorithm found was Bayes
minimum risk.
1.4 PROPOSED SYSTEM
In this project we designed a model to detect fraudulent activity in credit
card transactions. The system provides most of the essential features required
to distinguish fraudulent from legitimate transactions.
With the upsurge of machine learning, artificial intelligence and other
relevant fields of information technology, it becomes feasible to automate this
process and to save much of the labor that is put into detecting fraudulent
credit card activity.
Various libraries are imported, and the dataset is collected and loaded using
the pandas library. The data is explored to check that numerical values are
defined, and data imbalance is checked and corrected if needed. The data is split
and plotted to calculate the percentages of fraudulent and valid transactions,
and features that are relevant to each other are correlated for prediction.
To improve on the accuracy of the existing system, another algorithm,
Random Forest, was taken into consideration. This algorithm is stable when
processing large datasets and continues to give reliable accuracy values when new
data is added or when values are missing from the dataset. It showed an accuracy
of 99.988% on genuine data and 42.683% on fraud data.
The Random Forest algorithm has been found to provide a good estimate
of the generalization error and to be resistant to overfitting.
The performance of the techniques is evaluated based on accuracy,
sensitivity, specificity and precision. Processing of some of the attributes then
identifies the fraud and provides a graphical model visualization.
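The four evaluation measures named above can all be derived from the counts in a confusion matrix. The following is a minimal sketch, assuming fraud is the positive class. The function name and the example counts are made up for illustration and are not results from this project.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, sensitivity, specificity and precision from counts.

    tp: frauds flagged as fraud        fn: frauds missed
    tn: genuine passed as genuine      fp: genuine flagged as fraud
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity (recall): share of actual frauds that were caught.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of genuine transactions correctly passed.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Precision: share of fraud alerts that were really fraud.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
example = classification_metrics(tp=80, tn=9890, fp=10, fn=20)
for name, value in example.items():
    print(name, round(value, 4))
```

On imbalanced data the accuracy and specificity values stay high even when a noticeable share of frauds is missed, which is why sensitivity and precision are reported alongside them.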
CHAPTER 2
LITERATURE SURVEY
Devi Meenakshi proposed using the Random Forest algorithm to detect the
percentage of fraudulent transactions made in a customer’s account more precisely
and accurately than other algorithms such as Logistic Regression and Naive Bayes.
With the proposed scheme, the accuracy of detecting fraud can be improved. The
classification process of the random forest algorithm is used to analyze the data
set and the user’s current dataset. Processing of some of the attributes
identifies the fraud and provides a graphical model visualization, and finally the
accuracy of the result data is optimized.
The challenges noted are as follows. Enormous data is processed every day, and
the model built must be fast enough to respond to the scam in time. The data is
imbalanced: most of the transactions (99.8%) are not fraudulent, which makes it
really hard to detect the fraudulent ones. Data availability is limited, as the
data is mostly private. Misclassified data can be another major issue, as not
every fraudulent transaction is caught and reported. Deep learning is suggested
as a way to learn from the dataset automatically.
John O. Awoyemi, Adebayo O. Adewunmi and Samuel A. Oluwadare
investigated the performance of naive Bayes, k-nearest neighbor and logistic
regression on highly skewed credit card fraud data. The dataset of credit card
transactions is sourced from European cardholders and contains 284,807
transactions. A hybrid technique of under-sampling and oversampling is carried
out on the skewed data. The three techniques are applied to the raw and
preprocessed data. The work is implemented in Python. The results show optimal
accuracies for the naive Bayes, k-nearest neighbor and logistic regression
classifiers of 97.92%, 97.69% and 54.86% respectively. The comparative results
show that k-nearest neighbor performs better than the naive Bayes and logistic
regression techniques. Credit card fraud detection, which is a data mining
problem, becomes challenging for two major reasons: first, the profiles of
normal and fraudulent behaviors change constantly, and secondly, credit card fraud
data sets are highly skewed.
Expected future areas of research include examining meta-classifiers and
meta-learning approaches for handling highly imbalanced credit card fraud data.
The effects of other sampling approaches can also be investigated.
S P Maniraj proposed a model that is used to recognize whether a new
transaction is fraudulent or not. The objective is to detect 100% of the
fraudulent transactions while minimizing incorrect fraud classifications. The
work focuses on analyzing and pre-processing data sets as well as deploying
multiple anomaly detection algorithms, such as Local Outlier Factor and Isolation
Forest, on the PCA-transformed credit card transaction data. The code prints out
the number of false positives it detected and compares it with the actual values.
This is used to calculate the accuracy score and precision of the algorithms. The
problem is characterized by factors such as class imbalance: valid transactions
far outnumber fraudulent ones. Also, the transaction patterns often change their
statistical properties over the course of time. The model can be further improved
by adding more algorithms to it. However, the output of these algorithms needs to
be in the same format as the others. Once that condition is satisfied, the
modules are easy to add, as done in the code. This provides a great degree of
modularity and versatility to the project.
CHAPTER 3
SYSTEM DESIGN
3.1 SYSTEM REQUIREMENTS
3.1.1 SOFTWARE REQUIREMENTS
• Python (processing of the dataset)
• Spyder - IDE of the Anaconda Navigator application (to use the Random Forest
machine learning algorithm)
• OS - Windows 10 (64 bit)
3.1.2 HARDWARE REQUIREMENTS
• Processor - Intel
• RAM - 4 GB
• Hard Disk - 930 GB
• Laptop (or) PC - Acer
3.2 UML Flow Diagrams
3.2.1 Use Case Diagram of Credit card Fraud Detection
Use case diagrams are used for high-level requirement analysis of a
system. When the requirements of a system are analysed, the functionalities
are captured in use cases. Use cases are therefore the system functionalities
written in an organized manner. The second element relevant to use cases is
the actors. An actor can be defined as something that interacts with the
system. Actors can be human users, internal applications or external
applications. Use case diagrams are used to gather the requirements of a
system, including internal and external influences. These requirements are
mostly design requirements. Hence, when a system is analyzed to gather
its functionalities, use cases are prepared and actors are identified.
3.2.2 Sequence Diagram of Credit card Fraud Detection
UML sequence diagrams model the flow of logic within the system in
a visual manner, making it possible to both document and validate the logic, and
are commonly used for both analysis and design purposes.
The various actions that take place in the application are shown in the
figure in their correct sequence. Sequence diagrams are the most popular UML
diagrams for dynamic modeling.
3.2.3 Activity Diagram of Credit card Fraud Detection
An activity is a particular operation of the system. Activity diagrams are
suitable for modeling the activity flow of the system. They are used not only for
visualizing the dynamic nature of a system but also for constructing the
executable system by using forward and reverse engineering techniques. The only
thing missing from an activity diagram is the message part. An application can
have multiple systems. An activity diagram also captures these systems and
describes the flow from one system to another.
This specific usage is not available in other diagrams. These systems can
be databases, external queues, or any other system.
3.2.4 Component Diagram of Credit card Fraud Detection
A component diagram displays the structural relationship of the components
of a software system. Component diagrams are mostly used when working with complex
systems that have many components such as sensor nodes, cluster head and base
station. A component diagram does not describe the functionality of the system;
it describes the components used to provide that functionality. Components
communicate with each other using interfaces. The interfaces are linked using
connectors. The figure below shows a component diagram.
3.2.5 Collaboration Diagram of Credit card Fraud Detection
The next interaction diagram is the collaboration diagram. It shows the object
organization. In a collaboration diagram the method call sequence is indicated by
a numbering technique. The numbers indicate the order in which the methods are
called one after another.
The method calls are similar to those of a sequence diagram. The
difference is that the sequence diagram does not describe the object organization,
whereas the collaboration diagram does.
3.2.6 Deployment Diagram of Credit card Fraud Detection
A deployment diagram shows the hardware of the system and the
software running on that hardware. Deployment diagrams are useful when a software
solution is deployed across multiple machines such as sensor nodes, cluster head and
base station, each having a unique configuration. The figure represents the
deployment diagram for the developed application.
The deployment diagram in the figure shows how modules such as AFT
Algorithm, Hash Algorithm, Transaction and Database are deployed in the system.
3.2.7 Package Diagram of Credit card Fraud Detection
Package diagrams are used to reflect the organization of packages and
their elements. When used to represent class elements, package diagrams provide
a visualization of the namespaces.
Package diagrams are used to structure high-level system elements.
They can be used to simplify complex class diagrams by grouping classes into
packages. A package is a collection of logically related UML elements.
Packages are depicted as file folders and can be used on any of
the UML diagrams. The figure represents the package diagram for the developed
application, which shows how the elements are logically related.
3.2.8 Flow Chart of Credit card Fraud Detection
Figure 3.1 Flow Chart diagram of ML model

The above diagram explains the methodology of various models such as
Logistic Regression, Naive Bayes and Random Forest.
CHAPTER 4
SYSTEM ARCHITECTURE
4.1 Architectural Design
The Credit Card Fraud Detection Problem includes modeling past credit
card transactions with the knowledge of the ones that turned out to be fraud. This
model is then used to identify whether a new transaction is fraudulent or not. Our
aim here is to detect 100% of the fraudulent transactions while minimizing the
incorrect fraud classifications.
4.2 List of Modules
The first module imports the libraries needed to load and work with the data:
numpy, pandas, matplotlib and seaborn.
Secondly, the dataset is loaded. It is stored in matrix form using numpy, and
manipulation of the dataset is done in pandas. This is called data analysis. Using
matplotlib, a bar chart is drawn to show the numerical values for each
transaction, and seaborn is used to make graphical representations of the data.
Thirdly, the dataset is explored using the head() function.
Fourthly, imbalanced data is balanced using SMOTE (Synthetic Minority
Over-sampling Technique) in order to get efficient output.
Next, the dataset is split into a train dataset and a test dataset before
applying the machine learning model.
Next, a correlation matrix is drawn using the seaborn library to identify
visually which columns depend on each other.
Next, the machine learning model is applied to the train dataset so that it
learns to predict fraudulent transactions as 1 and genuine transactions as 0 in
the Class column.
Finally, the results are visualised using a confusion matrix showing the True
Positive, True Negative, False Positive and False Negative values. Parameters
such as Accuracy, Precision, Recall and F1-score are also computed.
Comparing the parameters, the Random Forest algorithm shows greater
accuracy in identifying fraudulent transactions.
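The balancing step mentioned above can be illustrated with a simplified sketch of the idea behind SMOTE: new minority-class points are created by interpolating between an existing fraud sample and one of its nearest fraud neighbours. This is a teaching sketch in plain Python, not the implementation used in the project. The function name, the parameter defaults and the sample points are all hypothetical. In practice a tested implementation such as the SMOTE class in the imbalanced-learn package would normally be used.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of the base point among the other minority points
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, between 0 and 1
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature fraud samples, used only for illustration.
fraud_points = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_points = smote_like_oversample(fraud_points, n_new=6)
print(len(new_points), "synthetic fraud samples created")
```

Oversampling of this kind should be applied to the training split only, so that the test set keeps the original class distribution.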
CHAPTER 5
SYSTEM IMPLEMENTATION
In this chapter, pseudo code is shown for the implementation of the Random Forest
model, which is used to obtain better prediction accuracy.
5.1 SYSTEM DESCRIPTION
• Among the machine learning models, the Random Forest model is used in
this project.
• Random Forest is a robust machine learning model that can be used for
a variety of tasks including regression and classification.
• This model is made up of a large number of small decision trees, called
estimators, each of which produces its own prediction.
• This model is used because it predicts the percentage of fraudulent
transactions in the given data set more accurately than the other models.
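The idea that a forest is built from many small trees whose predictions are combined can be sketched in plain Python. The example below is a deliberately simplified illustration and not the scikit-learn algorithm: each "tree" is a one-split decision stump trained on a bootstrap sample and a randomly chosen feature, and the forest returns the majority vote. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Find the threshold and label orientation with the fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        for left_label in (0, 1):
            preds = [left_label if row[feature] <= threshold else 1 - left_label
                     for row in X]
            errors = sum(p != t for p, t in zip(preds, y))
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return feature, threshold, left_label


def stump_predict(stump, row):
    feature, threshold, left_label = stump
    return left_label if row[feature] <= threshold else 1 - left_label


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        feature = rng.randrange(len(X[0]))        # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def forest_predict(forest, row):
    """Majority vote over the individual tree predictions."""
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data: label 1 stands for fraud, label 0 for genuine.
X = [[0.1, 1.0], [0.2, 1.2], [0.3, 0.9], [0.9, 3.0], [1.0, 3.2], [1.1, 2.9]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print("prediction for a low-valued row:", forest_predict(forest, [0.05, 0.5]))
print("prediction for a high-valued row:", forest_predict(forest, [2.0, 5.0]))
```

A single stump trained on an unlucky bootstrap sample can vote wrongly, but the majority vote smooths out such individual errors, which is the property that makes the ensemble more stable than any one tree.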
5.2 PSEUDO CODE FOR RANDOM FOREST
Import dataset from drive
Import numpy
Import sklearn
Import pandas
From matplotlib import pyplot
Import seaborn
#Import dataset
Import dataset
From sklearn.model_selection import train_test_split to split the data into train and test sets
Split the data into train and test sets
From sklearn.ensemble import RandomForestClassifier
#Create an object
Obj = RandomForestClassifier()
Using this object, fit the training data to the random forest model
#Test the data
Now test the data using the predict() method
The predicted data is in y_pred and the original is in the variable y_test
#Data visualization
From sklearn.metrics import confusion_matrix
Using the confusion matrix, display the accuracy score
CHAPTER 6
RESULTS AND CODING
6.1 SAMPLE CODE
6.1.1 ML MODEL CODE

# Mount Google Drive (this script is meant to run in Google Colab)
from google.colab import drive
drive.mount('/content/drive/')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import f1_score, matthews_corrcoef, confusion_matrix

# Load the dataset and take a first look at it
data = pd.read_csv('/content/drive/MyDrive/Data Set/creditcard.csv')
print(data.head())
print(data.shape)
print(data.describe())

# Separate fraud and valid transactions and measure the imbalance
fraud = data[data['Class'] == 1]
valid = data[data['Class'] == 0]
outlierFraction = len(fraud) / float(len(valid))
print(outlierFraction)
print('Fraud Cases: {}'.format(len(fraud)))
print('Valid Transactions: {}'.format(len(valid)))
print('Amount details of the fraudulent transaction')
print(fraud.Amount.describe())
print('Details of valid transaction')
print(valid.Amount.describe())

# Plot the correlation matrix as a heat map
corrmat = data.corr()
fig = plt.figure(figsize=(12, 9))
sns.heatmap(corrmat, vmax=.8, square=True)
plt.show()

# Separate the features (X) from the target column (Y)
X = data.drop(['Class'], axis=1)
Y = data["Class"]
print(X.shape)
print(Y.shape)
xData = X.values
yData = Y.values

# Hold out 20% of the rows for testing
xTrain, xTest, yTrain, yTest = train_test_split(
    xData, yData, test_size=0.2, random_state=42)

# Train the Random Forest model and predict on the test set
rfc = RandomForestClassifier()
rfc.fit(xTrain, yTrain)
yPred = rfc.predict(xTest)

# Evaluate the predictions
n_outliers = len(fraud)
n_errors = (yPred != yTest).sum()
print("The model used is Random Forest classifier")
acc = accuracy_score(yTest, yPred)
print("The accuracy is {}".format(acc))
prec = precision_score(yTest, yPred)
print("The precision is {}".format(prec))
rec = recall_score(yTest, yPred)
print("The recall is {}".format(rec))
f1 = f1_score(yTest, yPred)
print("The F1-Score is {}".format(f1))
MCC = matthews_corrcoef(yTest, yPred)
print("The Matthews correlation coefficient is {}".format(MCC))

# Plot the confusion matrix
LABELS = ['Normal', 'Fraud']
conf_matrix = confusion_matrix(yTest, yPred)
plt.figure(figsize=(12, 12))
sns.heatmap(conf_matrix, xticklabels=LABELS,
            yticklabels=LABELS, annot=True, fmt="d")
plt.title("Confusion matrix")
plt.ylabel('True class')
plt.xlabel('Predicted class')
plt.show()
6.1.2 DATASET FILE

The dataset file used in this project contains nearly 20,000 records and was
uploaded to the following Google Drive link:
https://drive.google.com/drive/folders/14nFYduRY24jtdMSICJ3ieM0OiiiiDjaB?usp=sharing
6.2 SCREENSHOTS
6.2.1 HEAT MAP
6.2.2 CONFUSION MATRIX
CHAPTER 7
CONCLUSION AND FUTURE WORK
7.1 Conclusion
Credit card fraud detection has emerged as a major solution to the credit card fraud
problem in the electronic payment sector. We developed a novel method for fraud
detection in which customers are grouped based on their transactions and behavioral
patterns are extracted to develop a profile for every cardholder. Different
classifiers are then applied to three different groups, and rating scores are generated
for every type of classifier. These dynamic changes in parameters allow the system to
adapt to new cardholders’ transaction behaviors in a timely manner. This is followed by
a feedback mechanism to solve the problem of concept drift. We observed that the
Matthews Correlation Coefficient (MCC) was the better parameter for dealing with an
imbalanced dataset. MCC was not the only solution: by applying SMOTE, we tried
balancing the dataset and found that the classifiers performed better than before.
Another way of handling an imbalanced dataset is to use one-class classifiers such as
one-class SVM. We finally observed that logistic regression, decision tree and random
forest are the algorithms that gave better results.
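The remark about the Matthews Correlation Coefficient can be made concrete with its standard formula, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is an illustration with made-up counts. The function name is hypothetical, and returning 0 when the denominator is zero is a common convention that is assumed here.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical test set: 1000 transactions, 2 of them fraudulent.
# A model that labels everything genuine looks excellent by accuracy alone.
tp, tn, fp, fn = 0, 998, 0, 2
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("accuracy:", accuracy)
print("MCC:", mcc_from_counts(tp, tn, fp, fn))
```

The high accuracy hides the fact that no fraud was detected, while the MCC of zero exposes it. This is the sense in which MCC is the more informative parameter on an imbalanced dataset.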
7.2 Future Work
The machine learning algorithm called Random Forest is used to detect fraudulent
transactions accurately. In this algorithm, the model is trained on a dataset and
then tested on a new dataset. This is a static approach, because learning is done
from the training dataset. To improve this further, deep learning algorithms can be
used so that the model learns on its own. Front-end development can also be
included by creating a user-friendly interface that helps users themselves to scan
their card and identify fraud on it.
REFERENCES
[1] Jiang, Changjun et al. “Credit Card Fraud Detection: A Novel Approach Using
Aggregation Strategy and Feedback Mechanism.” IEEE Internet of Things Journal 5
(2018): 3637-3647.
[2] Pumsirirat, A. and Yan, L. (2018). Credit Card Fraud Detection using Deep Learning
based on Auto-Encoder and Restricted Boltzmann Machine. International Journal of
Advanced Computer Science and Applications, 9(1).
[3] Mohammed, Emad, and Behrouz Far. “Supervised Machine Learning Algorithms for
Credit Card Fraudulent Transaction Detection: A Comparative Study.” IEEE Annals of
the History of Computing, IEEE, 1 July 2018,
doi.ieeecomputersociety.org/10.1109/IRI.2018.00025.
[4] S. Akila and U. Srinivasulu Reddy, “Cost-sensitive Risk Induced Bayesian Inference
Bagging (RIBIB) for credit card fraud detection,” Journal of Computational Science, vol.
27, pp. 247–254, Jul. 2018, doi: 10.1016/j.jocs.2018.06.009.
[5] A. M. Ozbayoglu, M. U. Gudelek, and O. B. Sezer, “Deep learning for financial
applications: A survey,” Applied Soft Computing, vol. 93, p. 106384, Aug. 2020, doi:
10.1016/j.asoc.2020.106384.
[6] Y. Jin, R. M. Rejesus, and B. B. Little, “Binary choice models for rare events data: a
crop insurance fraud application,” Applied Economics, vol. 37, no. 7, pp. 841–848, Apr.
2005, doi: 10.1080/0003684042000337433.
[7] Fabiana Fournier, Ivo Correia, Inna Skarbovsky, The Uncertain Case of Credit Card
Fraud Detection, The 9th ACM International Conference on Distributed Event Based
Systems (DEBS15) 2015.
[8] Yashvi Jain, Namrata Tiwari, Shripriya Dubey, Sarika Jain, A Comparative Analysis of
Various Credit Card Fraud Detection Techniques, Blue Eyes Intelligence Engineering
And Sciences Publications 2019.
[9] Dinesh L. Talekar, K. P. Adhiya, Credit Card Fraud Detection System-A Survey,
International Journal of Modern Engineering Research (IJMER) 2014.
[10] Raj S.B.E., Portia A.A., Analysis on credit card fraud detection methods, Computer,
Communication and Electrical Technology International Conference on (ICCCET)
(2011), 152-156.
[11] Jain R., Gour B., Dubey S., A hybrid approach for credit card fraud detection using
rough set and decision tree technique, International Journal of Computer Applications
139(10) (2016).
[12] Dermala N., Agrawal A.N., Credit card fraud detection using SVM and Reduction of
false alarms, International Journal of Innovations in Engineering and Technology (IJIET)
7(2) (2016).
