
Credit Card Fraud Detection

Project Report Submitted in partial fulfillment of the requirements
for the award of the degree of

MASTER OF SCIENCE IN COMPUTER SCIENCE

Submitted by
JEGADEESH.J
(Reg No.20CSEE12)

Under the guidance of


Dr. P.B. Pankajavalli, MCA., M.Phil., ME (CSE)., Ph.D.,
Assistant Professor
DEPARTMENT OF COMPUTER SCIENCE
BHARATHIAR UNIVERSITY
COIMBATORE – 641 046
MAY 2022
BONAFIDE CERTIFICATE

I hereby declare that this project entitled “CREDIT CARD FRAUD
DETECTION”, submitted to Bharathiar University in partial
fulfillment of the requirements for the award of the Degree of Master
of Science in Computer Science, is a record of original work done by
JEGADEESH.J during his period of study in the Department of
Computer Science, Bharathiar University, Coimbatore. This project
work has not formed the basis for the award of any Degree / Diploma /
Associateship / Fellowship or other similar title to any candidate of any
University.

Submitted for the Bharathiar University Viva-Voce Examination held on

Signature of the Supervisor          Signature of the Head of Department

Internal Examiner                    External Examiner
DECLARATION

This is to certify that JEGADEESH. J has successfully
completed the project entitled “CREDIT CARD FRAUD
DETECTION”, to be submitted to Bharathiar University,
Coimbatore in partial fulfillment of the requirements for the
award of the Degree of MASTER OF SCIENCE IN
COMPUTER SCIENCE. This report is a record of the
original field work done during the period of study in
Bharathiar University, Coimbatore.

Place: Coimbatore                    Signature of the Candidate
Date:                                JEGADEESH. J
ACKNOWLEDGEMENT
First of all, I record my sincere gratitude to the Almighty for his
blessing for the successful completion of this project.

My deep sense of gratitude and sincere thanks to Dr.
E.CHANDRA, Professor & Head, Department of Computer
Science, School of Computer Science and Engineering,
Bharathiar University, for giving me the opportunity to
complete my project successfully.

I am extremely thankful to my project guide,
Dr.P.B.Pankajavalli, Assistant Professor, Department of
Computer Science, School of Computer Science and
Engineering, Bharathiar University, for the immense guidance,
valuable suggestions and support given to me in the completion of
this project. Her encouragement has given me the urge to do
my best and I would like to express my sincere gratitude to her.

I express my sincere thanks to the project faculty for the kind
cooperation and the information provided
to complete this project.

I take this opportunity to thank my parents and friends for their
support, contribution and motivation, which helped me a lot to
complete this project successfully.
ABSTRACT

Credit card fraud has become a growing problem. The resulting financial losses
greatly affect individual people using credit cards as well as merchants and banks.
Machine learning is considered one of the most successful techniques to identify fraud.
This project, entitled CREDIT CARD FRAUD DETECTION, reviews different
fraud detection techniques using machine learning and compares them using
performance measures such as accuracy, precision and specificity. The project also
proposes an FDS (Fraud Detection System) which uses supervised Logistic Regression
and Decision Tree models. The proposed system aims to increase the accuracy of
detecting fraud in credit cards. Further, the proposed system uses a learning-to-rank
approach to rank the alerts and also addresses the problem of concept drift in fraud
detection.
CONTENT

DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT

INTRODUCTION
1.1 CREDIT CARD FRAUD TYPES
1.2 REALTIME AND NEAR REAL TIME FRAUD DETECTION SYSTEM
1.3 SALIENT FEATURES OF THE SYSTEM
1.3.1 PYTHON
1.3.2 ANACONDA
1.3.3 JUPYTER NOTEBOOK

BACKGROUND STUDY
2.1 EXISTING SYSTEM
2.2 PROPOSED SYSTEM

SELECTION OF THE ORGANIZATION

PROBLEM FORMULATION
4.1 MAIN OBJECTIVE
4.2 CONFIGURATION SUPPORT
4.3 INSTALLATION INSTRUCTIONS

SYSTEM ANALYSIS AND DESIGN
5.1 FACT FINDING
5.2 FEASIBILITY ANALYSIS
5.3 INPUT DESIGN
5.4 OUTPUT DESIGN
5.5 MENU DESIGN
CHAPTER-I
INTRODUCTION
1. INTRODUCTION
Credit card fraud is a major problem in which payment cards such as credit cards
are used as illegal sources of funds in transactions. Fraud is an illegal way to obtain
goods and funds. The goal of such illegal transactions might be to get products without
paying or to gain unauthorized funds from an account. Identifying such fraud is
difficult, and undetected fraud puts businesses and business organizations at risk. Here
the fraud detection system (FDS) monitors all the approved transactions and raises
alerts for the most suspicious ones. An investigator verifies these alerts and provides
the FDS with feedback on whether each transaction was authorized or fraudulent.
Verifying all the alerts every day is a time-consuming and costly process. Hence the
investigator is able to verify only a few alerts each day. The rest of the transactions
remain unchecked until the customer identifies them and reports them as fraud. Also,
the techniques used for fraud and the cardholders' spending behavior change over time.
This change in credit card transactions is called concept drift. Hence most of the time
it is difficult to identify credit card fraud. Machine learning is considered one of the
most successful techniques for fraud identification. It uses classification and regression
approaches for recognizing fraud in credit cards. Many learning algorithms have been
presented for fraud detection in credit cards, including logistic regression and
decision trees. This project examines the performance of these algorithms based
on their ability to classify whether a transaction was authorized or fraudulent,
and then compares them. The comparison is made using the performance measures
accuracy, specificity and precision. The results showed that the logistic regression and
decision tree algorithms gave better accuracy and precision than other techniques.
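
The three performance measures named above can all be computed from the counts of a confusion matrix. The following is a minimal sketch rather than the project's own code; the function name `classification_metrics` and the example counts are illustrative assumptions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision and specificity from confusion-matrix counts.

    tp / fn: fraudulent transactions that were flagged / missed.
    tn / fp: legitimate transactions that were passed / wrongly flagged.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {"accuracy": accuracy, "precision": precision, "specificity": specificity}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=80, fp=20, tn=9880, fn=20)
print(metrics)
```

Because fraud is rare, a classifier that labels every transaction as legitimate would still score a very high accuracy, which is why precision and specificity are reported alongside it.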

1.1 Credit card fraud types


Credit card frauds are categorized in various ways in the literature. One proposal
differentiates frauds with respect to the fraudsters’ strategies, splitting them into
application frauds and behavioral frauds. In application frauds the fraudsters apply
for a credit card with a false ID, whereas in behavioral frauds the fraudsters find a
way to obtain the cardholder’s credentials in order to use a pre-existing credit card.
Another splits the fraudulent transactions into six categories with respect to the
fraudulent process: frauds from lost or stolen cards, frauds from counterfeit cards,
online frauds, bankruptcy frauds, merchant frauds and frauds from cards that were
stolen during the shipping process. A third splits the fraudulent transactions into
three categories: card related frauds, merchant related frauds and Internet frauds.
Other authors go further in this direction and propose to split fraud only into
face-to-face (card present) fraud and e-commerce (card not present) fraud. Their
argument for this strict and simple classification is that the overlap between
categories may weaken fraud detection approaches.
1.2 Real time and near real time fraud detection systems
In practice, credit card fraud detection is a process that happens in two steps: the
blocking time, which aims to detect fraudulent transactions, and the checking time,
which aims to detect fraudulent cards. These two steps are very different in terms of
time constraints: the blocking time must be short since the cardholder is waiting for
the transaction to be accepted, whereas the checking time can be longer. The blocking
time corresponds to real time fraud detection. It aims to authorize or reject the
transactions. This process must be fast and very precise. Indeed, it must block only
transactions that are very likely to be fraudulent, otherwise it would hinder the
cardholder's buying routine. Most of the transactions blocked by real time fraud
detection systems are fraudulent; on the other hand, a lot of fraudulent
transactions are not detected. The priority of the real time fraud detection system is
precision, not recall. For speed, it is made of expert rules based on simple,
short term aggregations of the cardholder's past transactions. The checking
time corresponds to near real time fraud detection. In the hours following the
authorized transaction, the near real time fraud detection system combines expert
rules based on long term aggregations and machine learning algorithms to fire alerts
on credit cards that present suspicious transactions. Afterwards, alerts are usually
checked by expert investigators, or automatic SMS messages are sent to cardholders
inviting them to verify their recent transactions and block the card if needed. The
cardholder is sometimes called to label all the recent transactions in order to
determine when the fraud started.
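
An expert rule based on a short term aggregation can be sketched as follows. This is an illustration only: the function name `should_block`, the one-hour window and both thresholds are hypothetical values, not rules taken from any real fraud detection system.

```python
from datetime import datetime, timedelta


def should_block(history, new_tx, window=timedelta(hours=1),
                 max_count=5, max_total=2000.0):
    """Blocking-time rule: block only when recent activity is clearly abnormal.

    history: list of (timestamp, amount) tuples for the card's past transactions.
    new_tx:  (timestamp, amount) of the transaction awaiting authorization.
    The window and both thresholds are hypothetical values.
    """
    now, amount = new_tx
    # Short term aggregation: only transactions inside the window count.
    recent = [amt for ts, amt in history if now - window <= ts <= now]
    count = len(recent) + 1          # include the new transaction
    total = sum(recent) + amount
    return count > max_count or total > max_total


t0 = datetime(2022, 5, 1, 12, 0)
history = [(t0 - timedelta(minutes=10 * i), 50.0) for i in range(1, 4)]
print(should_block(history, (t0, 40.0)))     # small purchase
print(should_block(history, (t0, 5000.0)))   # very large purchase
```

Setting the thresholds high keeps the rule precise at the cost of recall, which matches the priority stated above for the blocking time.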

1.3 SALIENT FEATURES OF THE SYSTEM


1.3.1 PYTHON
Python is an interpreted, object-oriented, high-level programming language with
dynamic semantics. Its high-level built-in data structures, combined with dynamic
typing and dynamic binding, make it very attractive for Rapid Application
Development, as well as for use as a scripting or glue language to connect existing
components together. Python's simple, easy to learn syntax emphasizes readability
and therefore reduces the cost of program maintenance. Python supports modules
and packages, which encourages program modularity and code reuse. The Python
interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed.
PYTHON FEATURES
Python has few keywords, a simple structure, and a clearly defined syntax. Python
code is easy to read, and its source code is fairly easy to maintain. The bulk of
Python's library is portable and cross-platform compatible on UNIX, Windows, and
Macintosh. Python supports an interactive mode which allows interactive testing and
debugging of snippets of code.

PORTABLE
Python can run on a wide variety of hardware platforms and has the same interface
on all platforms.
EXTENDABLE
Low-level modules can be added to the Python interpreter. These modules enable
programmers to add to or customize their tools to be more efficient.
DATABASES
Python provides interfaces to all major commercial databases.
GUI PROGRAMMING
Python supports GUI applications that can be created and ported to many system
calls, libraries and window systems, such as Windows MFC, Macintosh, and the X
Window System of Unix.
SCALABLE
Python provides a better structure and support for large programs than shell
scripting.
OBJECT-ORIENTED APPROACH
One of the key aspects of Python is its object-oriented approach. Python supports
classes and object encapsulation, which helps keep large programs maintainable
in the long run.
HIGHLY DYNAMIC
Python is dynamically typed. There is no need to declare the type of a variable
during coding, which saves development time.
EXTENSIVE ARRAY OF LIBRARIES
Python comes with a large standard library whose modules can be imported and
used in any program.
OPEN SOURCE AND FREE
Python is an open-source programming language, which means that anyone can
contribute to its development. Python is free to download and use on any
operating system, such as Windows, Mac or Linux.
1.3.2 ANACONDA
Anaconda is a free and open-source distribution of the Python and R
programming languages for scientific computing (data science, machine learning
applications, large-scale data processing, predictive analytics, etc.) that aims to
simplify package management and deployment. Package versions are managed by the
package management system conda. The Anaconda distribution includes data-science
packages suitable for Windows, Linux, and Mac OS.
Anaconda Navigator is a desktop graphical user interface (GUI) included in the
Anaconda distribution that allows users to launch applications and manage conda
packages, environments and channels without using command-line commands.
Navigator can search for packages on Anaconda Cloud or in a local Anaconda
Repository, install them in an environment, run the packages and update them. It is
available for Windows, Mac OS and Linux.
1.3.3 JUPYTER NOTEBOOK
"Jupyter" is a loose acronym meaning Julia, Python, and R. These programming
languages were the first target languages of the Jupyter application.
As a server-client application, the Jupyter Notebook App allows you to edit and run
your notebooks via a web browser. The application can be executed on a PC without
Internet access, or it can be installed on a remote server, where it can be accessed
through the Internet. A kernel is a program that runs and introspects the user’s code.
The Jupyter Notebook App has a kernel for Python code.
"Notebook" or "Notebook documents" denote documents that contain both code
and rich text elements, such as figures, links and equations. Because of the mix of code
and text elements, these documents are the ideal place to bring together an analysis
description and its results, and can be executed to perform the data analysis in real time.

CHAPTER- II
BACKGROUND STUDY
2.1 EXISTING SYSTEM

In the existing system, large financial losses greatly affected individual people
using credit cards as well as merchants and banks. An unauthorized person could
easily make a transaction in place of the authorized cardholder.
The goal of such illegal transactions might be to get products without paying
or to gain unauthorized funds from an account.

Some of the drawbacks are:

● Lack of technologies to detect fraudulent transactions.
● Financial loss during credit card and debit card transactions.
● No security guarantee for legitimate transactions.
● A huge amount of financial loss is caused by illegal credit card
transactions.

2.2 PROPOSED SYSTEM


Today modern society uses credit cards for a variety of reasons. Correspondingly,
fraud in credit card transactions has been growing in recent years. Each year, a
huge amount of financial loss is caused by illegal credit card
transactions. Fraud may occur in a variety of different forms.
Therefore there is a need to solve the issues of fraud detection in credit cards.
Additionally, with the development of new technologies criminals find new ways to
commit fraud. To overcome this problem, the proposed system for fraud detection in
credit card transactions is designed using ML techniques that provide
investigators with a small number of reliable fraud alerts.

2.2.1 ADVANTAGES OF THE PROPOSED SYSTEM


● By using Logistic Regression and Decision Tree, fraud detection for credit
card and debit card transactions can be handled successfully.
● Fraudulent activity during credit card/debit card transactions is
reduced.
● Legitimate money transactions proceed without interference from
unauthorized parties.

CHAPTER-III
3.1 MAIN OBJECTIVE
The use of credit cards to perform financial transactions at banks or other
institutions is common in light of currently available technology. Online
payments (or any other online transactions) bring benefits to companies and individuals
in terms of the convenience, speed, and flexibility of performing daily duties. Earlier
work presented a statistical analysis of the usage of credit cards over five years,
which reflected the heavy dependency on credit cards by both people and organizations.
To take advantage of advanced technologies, companies try to use advanced techniques
to provide high quality services to customers. Automation can be seen as the best
solution for attracting more customers and consequently collecting more financial gain.
The process of converting a manual system to a fully automatic one, as found in smart
cities, is not without risk.

3.2 METHODOLOGY

We gathered the data from the Kaggle website. Models are trained and tested on the
dataset using the following techniques: logistic regression, random forests with
decision trees, XGBoost and isolation forest; the results are evaluated with a
confusion matrix. If our algorithm is applied in bank credit card fraud detection
systems, the probability that a transaction is fraudulent can be predicted soon after
the credit card transaction occurs. Thereafter a series of anti-fraud
strategies can be adopted to protect banks from great losses and reduce risks.

3.3 CONFIGURATION SUPPORT


HARDWARE CONFIGURATION
Processor        : Intel Core i3
RAM Capacity     : 4 GB
Hard Disk        : 90 GB
Mouse            : Logical Optical Mouse
Keyboard         : Logitech 107 Keys
Monitor          : 15.6 inch
Motherboard      : Intel
Speed            : 3.3 GHz

SOFTWARE CONFIGURATION
Operating System : Windows 10
Front End        : Python
Middleware       : Anaconda
Back End         : CSV

CHAPTER-IV
SYSTEM ANALYSIS AND DESIGN
4.1 FEASIBILITY STUDY

A feasibility analysis is used to determine the viability of an idea, such as ensuring a
project is legally and technically feasible as well as economically justifiable.
A feasibility study lets the developer foresee the future of the project and the
usefulness of the system proposal in terms of its workability, its impact on the
organization, its ability to meet user needs and its effective use of resources. Thus,
when a new application is proposed it normally goes through a feasibility study before
it is approved for development.

The key considerations involved in the feasibility analysis are:

● TECHNICAL FEASIBILITY
● OPERATIONAL FEASIBILITY

TECHNICAL FEASIBILITY

This phase focuses on the technical resources available to the organization. It helps
organizations determine whether the technical resources meet capacity and whether the
ideas can be converted into a working system model. Technical feasibility also
involves the evaluation of the hardware, software, and other technical requirements of
the proposed system.

OPERATIONAL FEASIBILITY

This phase involves undertaking a study to analyse and determine how well the
organization’s needs can be met by completing the project. Operational feasibility
study also examines how a project plan satisfies the requirements that are needed for
the phase of system development.

4.2 INPUT DESIGN


The credit card transactions are given to machine learning algorithms as an input.
Algorithms such as the isolation forest are used to do anomaly detection. The user is
notified that a fraudulent transaction has occurred, and can block the card to prevent
further financial loss to the cardholder as well as the credit card company.
Input design will consider the following steps:
● The dataset should be given as input.
● The dataset should be arranged.
● Methods for preparing input validations.
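
The input validation step can be sketched as below. This is a minimal illustration, not the project's code: the column names `Amount` and `Class` are assumptions about the dataset layout, and the inline sample rows are made up.

```python
import csv
import io


def load_valid_rows(csv_file):
    """Read transactions and keep only rows that pass basic validation.

    The column names "Amount" and "Class" are assumed, not taken from the report.
    Returns (valid_rows, rejected_count).
    """
    valid, rejected = [], 0
    for row in csv.DictReader(csv_file):
        try:
            amount = float(row["Amount"])
            label = int(row["Class"])
        except (KeyError, TypeError, ValueError):
            rejected += 1        # missing column, empty cell, or non-numeric value
            continue
        if amount < 0 or label not in (0, 1):
            rejected += 1        # out-of-range value
            continue
        valid.append({"Amount": amount, "Class": label})
    return valid, rejected


# Made-up sample standing in for the real CSV file.
sample = io.StringIO("Amount,Class\n12.5,0\n,0\n99.0,1\n-3,0\nabc,1\n7,2\n")
rows, bad = load_valid_rows(sample)
print(len(rows), "valid rows,", bad, "rejected")
```

With a real file, the `io.StringIO` object would be replaced by an opened file handle.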
4.3 OUTPUT DESIGN
A quality output is one which meets the requirements of the user and presents
the information clearly. In output design, it is determined how the information is to be
displayed for immediate need.
Designing computer output should proceed in an organized, well thought out
manner; the right output must be developed while ensuring that each output element is
designed so that the user can use the system easily and effectively.
4.4 DATABASE DESIGN
This phase covers the attributes of the dataset, which are maintained in a database
table. The dataset is divided into two parts, namely the train dataset and the test dataset.
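
One way to derive the train and test datasets is a stratified split, which keeps the share of fraudulent records the same in both parts. The sketch below is an assumption about how such a split could be done; the 80/20 ratio, the `Class` label key and the function name are illustrative, not taken from the project.

```python
import random


def stratified_split(records, label_key="Class", test_fraction=0.2, seed=42):
    """Split records into train and test sets, preserving the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    train, test = [], []
    for group in by_class.values():
        rng.shuffle(group)
        # Keep at least one record of every class in the test set.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Made-up data: 5 fraudulent records among 100.
data = [{"id": i, "Class": 1 if i < 5 else 0} for i in range(100)]
train, test = stratified_split(data)
print(len(train), len(test))
```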

4.5 DATA FLOW DIAGRAM


Data flow diagrams (DFDs) are used to graphically represent the flow of data in a
business information system. A DFD describes the processes that are involved in a
system to transfer data from the input to the file storage and report generation. Data
flow diagrams can be divided into logical and physical. The logical data flow diagram
describes the flow of data through a system to perform certain functionality of a
business. The physical data flow diagram describes the implementation of the logical
data flow. A DFD graphically represents the functions, or processes, which capture,
manipulate, store, and distribute data between a system and its environment and
between components of a system. The visual representation makes it a good
communication tool between the user and the system designer. The objective of a DFD is
to show the scope and boundaries of a system. The DFD is also called a data flow graph
or bubble chart. It can be manual, automated, or a combination of both. It shows how
data enters and leaves the system, what changes the information, and where data is
stored.

Data Flow Diagram:

[Figure: data flow diagram with the nodes Dataset, Data Preprocessing, Data
Exploration, Data Virtualization, Data Modeling (Decision Tree, Logistic Regression),
Prediction, Fraud alert, Feedback and Performance Analysis.]

CHAPTER-V
SYSTEM DEVELOPMENT
5. DESCRIPTION OF MODULES
● DATA PREPROCESSING
● DATA EXPLORATION
● DATA VIRTUALIZATION
● DATA MODELING
5.1 DATA PREPROCESSING
In this module the selected data is formatted, cleaned and sampled. The data
preprocessing steps include the following:
● Formatting: The selected data may not be in a suitable format. The
data may be in a file format when we would like it in a relational database, or vice
versa.
● Cleaning: Removing or fixing missing data is called cleaning. The dataset
may contain records which are incomplete or have null values. Such
records need to be removed.
● Sampling: As the number of frauds in the dataset is far smaller than the overall
number of transactions, the class distribution in credit card transactions is
unbalanced. Hence a sampling method is used to solve this issue.
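
The Cleaning and Sampling steps above can be sketched as follows. The report does not say which sampling method was used, so random undersampling of the majority class is shown here purely as an assumption; the function names and the `Class` label key are also illustrative.

```python
import random


def clean(records):
    """Drop records that contain a missing (None or empty-string) value."""
    return [r for r in records
            if all(v is not None and v != "" for v in r.values())]


def undersample(records, label_key="Class", seed=0):
    """Randomly undersample every class down to the size of the rarest class."""
    rng = random.Random(seed)
    groups = {}
    for r in records:
        groups.setdefault(r[label_key], []).append(r)
    n = min(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, n))
    rng.shuffle(balanced)
    return balanced


# Made-up records: eight legitimate, three fraudulent, two with missing amounts.
raw = ([{"Amount": 10.0, "Class": 0}] * 8
       + [{"Amount": None, "Class": 0},
          {"Amount": 5.0, "Class": 1},
          {"Amount": "", "Class": 1},
          {"Amount": 7.0, "Class": 1}])
balanced = undersample(clean(raw))
print(len(balanced), "records after cleaning and undersampling")
```

Undersampling discards legitimate transactions, so oversampling the fraud class is a common alternative when the dataset is small.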
5.2 DATA EXPLORATION

Data exploration is an informative search used by data consumers to form a true
analysis from the information gathered. After having a look at the dataset,
certain information about the data was explored. The dataset as collected contains
records that are not unique. In this module, duplicates are handled so that the
records in the dataset are unique.
5.3 DATA VIRTUALIZATION

Data virtualization is a logical data layer that integrates all enterprise data siloed
across the disparate systems, manages the unified data for centralized security and
governance, and delivers it to business users in real time.

5.4 DATA MODELING

In the data modeling module, machine learning algorithms were used to
predict fraudulent transactions. The Logistic Regression and Decision Tree algorithms
were used for this prediction. The user provides the ML algorithm with a dataset that
includes desired inputs and outputs, and the algorithm finds a method to determine how
to arrive at those results.

5.4.1 Logistic Regression algorithm

Linear regression is a supervised learning algorithm. It fits a statistical model and
gives optimal results when the relationships between the independent variables and the
dependent variable are almost linear. Logistic regression is a supervised
classification method that returns the probability of a binary dependent
variable predicted from the independent variables of the dataset; that is, logistic
regression predicts the probability of an outcome which has two values: zero or
one, no or yes, false or true. Logistic regression has similarities to linear
regression, but whereas linear regression yields a straight line, logistic regression
yields a curve. The prediction is based on one or several predictors (independent
variables), and logistic regression produces logistic curves whose values lie between
zero and one.

5.4.2 Decision Tree algorithm


The Decision Tree algorithm is a supervised learning algorithm. It captures the
correlations and relationships in the available data. This algorithm is used to
make predictions with an increased accuracy rate. A decision tree is an algorithm that
uses a tree-like graph or model of decisions and their possible outcomes to predict the
final decision; it uses conditional control statements. Decision tree learning is a
method for approximating discrete-valued target functions, in which the learned
function is represented by a decision tree.
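
Each internal node of a decision tree is a conditional test on one feature. The sketch below shows how a single such test can be chosen by minimizing the Gini impurity, which is one common splitting criterion; the report does not state which criterion was used, and the function names and example values are illustrative assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the node is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best 'value <= threshold' test."""
    best = (None, gini(labels))
    for t in sorted(set(values))[:-1]:      # the largest value cannot split the data
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Made-up transaction amounts with fraud labels.
amounts = [5, 10, 20, 500, 700, 900]
is_fraud = [0, 0, 0, 1, 1, 1]
print(best_split(amounts, is_fraud))
```

A full tree applies the same search recursively to the left and right subsets until the nodes are pure or a depth limit is reached.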

5.5 PREDICTION
The Credit Card Fraud Detection Problem includes modeling past credit
card transactions with the knowledge of the ones that turned out to be fraud. This
model is then used to identify whether a new transaction is fraudulent or not.
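
Once a model assigns each new transaction a fraud probability, the most suspicious transactions can be presented to investigators first. The sketch below simply sorts by score; it is not a learning-to-rank method, and the function name, threshold and transaction ids are hypothetical.

```python
def top_alerts(scores, k=3, threshold=0.5):
    """Return up to k (transaction_id, score) pairs, most suspicious first.

    scores: dict mapping a transaction id to a fraud probability in [0, 1].
    Only transactions scoring at or above `threshold` raise an alert.
    """
    flagged = [(tx, p) for tx, p in scores.items() if p >= threshold]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged[:k]


# Hypothetical model outputs for five new transactions.
scores = {"t1": 0.10, "t2": 0.97, "t3": 0.55, "t4": 0.81, "t5": 0.60}
print(top_alerts(scores, k=2))
```

Limiting the list to `k` alerts reflects the fact that an investigator can verify only a few alerts each day.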

CHAPTER – VI
SYSTEM TESTING
System testing is the stage of implementation that is aimed at ensuring that the
system works accurately and efficiently before live operation commences. Testing is
vital to the success of the system. System testing makes a logical assumption that if all
the parts of the system are correct, then the goal will be successfully achieved. System
testing involves user training, system testing and successful running of the developed
system. The user tests the developed system and changes are made as per their
needs. The testing phase involves testing the developed system using various kinds
of data. While testing, errors are noted and corrections are made. The corrections
are also noted for future use.

6.2 UNIT TESTING:

Unit testing focuses verification effort on the smallest unit of software design,
the software component or module. Using the component level design description as a
guide, control paths are tested to uncover errors within the boundary of the module. The
relative complexity of tests and the errors those tests uncover is limited by the
constrained scope established for unit testing. The unit test focuses on the internal
processing logic and data structures within the boundaries of a component. This is
normally considered an adjunct to the coding step. The design of unit tests can be
performed before coding begins.
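
A unit test for one small component might look like the sketch below, written with Python's standard `unittest` module. The `precision` function under test is a hypothetical helper used only for illustration, not a module from this project.

```python
import unittest


def precision(tp, fp):
    """Share of raised alerts that were truly fraudulent."""
    return tp / (tp + fp) if (tp + fp) else 0.0


class PrecisionTest(unittest.TestCase):
    def test_typical_counts(self):
        self.assertAlmostEqual(precision(tp=8, fp=2), 0.8)

    def test_no_alerts_raised(self):
        # Boundary case: the function must not divide by zero.
        self.assertEqual(precision(tp=0, fp=0), 0.0)


# Run the tests programmatically so the script does not exit afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrecisionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method checks one path through the component, including the boundary case where no alerts were raised.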

6.3 BLACK BOX TESTING

Black box testing, also called behavioral testing, focuses on the functional
requirements of the software. This testing enables us to derive sets of input conditions
that exercise all functional requirements for a program. This technique focuses on the
information domain of the software, deriving test cases by partitioning the input and
output domains of a program.
6.4 WHITE BOX TESTING

White box testing, also called glass box testing, is a test case design method that uses
the control structures described as part of component level design to derive test cases.
These test cases are derived to ensure that all statements in the program have been
executed at least once during testing and that all logical conditions have been exercised.

6.5 INTEGRATION TESTING

Integration testing is a systematic technique for constructing the software
architecture while conducting tests to uncover errors associated with interfacing.
Top-down integration testing is an incremental approach to construction of the software
architecture. Modules are integrated by moving downward through the control hierarchy,
beginning with the main control module. Bottom-up integration testing begins the
construction and testing with atomic modules. Because components are integrated from
the bottom up, processing required for components subordinate to a given level is
always available.

6.6 VALIDATION TESTING

Validation testing begins at the culmination of integration testing, when
individual components have been exercised and the software is completely assembled as a
package. The testing focuses on user visible actions and user recognizable output from
the system. For each test condition, either the function or performance
characteristic conforms to the specification, or a deviation or error is uncovered. The
alpha test is conducted at the developer's site by end users, while the beta test is
conducted at end-user sites.

CHAPTER-VII
CONCLUSION & FUTURE ENHANCEMENT
CONCLUSION
Decision Tree and Logistic Regression algorithms were used in developing four
fraud detection models to classify a transaction as fraudulent or legitimate. The
Decision Tree algorithm was used to make predictions with an increased accuracy rate.
The Logistic Regression algorithm implements a statistical model that shows optimal
results when the relationships between the independent variables and the dependent
variable are almost linear. The results also showed that there is no data mining
technique that is universally better than others. Performance improvement could be
achieved through developing a fraud detection model using a combination of different
data mining techniques.
FUTURE ENHANCEMENT
Advances in technology give criminals increasingly powerful tools to commit
fraud, especially using credit cards or internet bots. To combat the evolving face of
fraud, researchers are developing increasingly sophisticated tools, with algorithms and
data structures capable of handling large-scale complex data analysis and storage. This
system is capable of providing most of the essential features required to distinguish
fraudulent from legitimate transactions. As technology changes, it becomes difficult to
track the behavior and pattern of fraudulent transactions. In future, the accuracy of
detecting fraud in credit cards can be increased further. Further, the proposed system
uses a learning-to-rank approach to rank the alerts and also addresses
the problem of concept drift in fraud detection.


SCREENSHOTS
IMPORTING REQUIRED LIBRARIES
DISPLAYING INFORMATION ABOUT THE DATASET
CALCULATING THE MATHEMATICAL DESCRIPTIONS OF THE DATASET
HISTOGRAM PLOTS FOR EACH COLUMN
DISTRIBUTION PLOT
NUMBER OF FRAUD AND NON-FRAUD USERS
FINDING THE CORRELATION BETWEEN THE FEATURES
FEATURE EXTRACTION
CORRELATION VALUES
SPLITTING FEATURES AND LABEL
LOGISTIC REGRESSION MODEL
DECISION TREE CLASSIFIER MODEL
