It is important to note that the contribution of each feature to the fraud score is
determined by the machine learning model, driven by the training set, and not by a
fraud analyst. In the context of card fraud, if the use of cards for fraudulent activities
is high, the fraud weighting of a credit card transaction will also be high. Conversely,
if fraudulent activity decreases, the contribution level will decrease accordingly.
These models learn from the data without explicit programming, unlike manual
review processes.
it prunes the trees by setting a stopping criterion for node splits, a concept that will be
explored in more detail later in this study.
CHAPTER-2
LITERATURE SURVEY
Abstract: This project focuses on real-world credit card fraud detection, addressing
the substantial increase in fraudulent activities accompanying the growth of credit
card transactions. The challenge lies in the absence of the cardholder during
purchases, which makes it difficult for merchants to verify the authenticity of a
transaction. Efficient fraud detection systems have therefore become crucial for
minimizing losses. The proposed scheme uses the Random Forest algorithm to
improve fraud detection accuracy. The classification process analyzes the training
dataset together with the user's current dataset to improve the accuracy of the results.
Performance is evaluated using accuracy, sensitivity, specificity, and precision. The
attributes are then processed to detect fraud, and the results are visualized graphically.
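The four evaluation metrics named in this abstract can all be derived from the counts of a confusion matrix. The following is a minimal sketch in plain Python, assuming labels of 1 for fraud and 0 for non-fraud; the function name and the sample labels are illustrative and are not taken from the surveyed work.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and precision.

    Assumes binary labels: 1 = fraud (positive), 0 = non-fraud (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed

    def ratio(numerator, denominator):
        # Guard against empty classes instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Illustrative labels only (not real transaction data).
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity measures how many fraudulent transactions are caught, while specificity measures how many legitimate transactions are correctly passed.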
Abstract: Credit card fraud is a prevalent issue in the developed world,
particularly with the increasing popularity of e-commerce sites. This form of fraud is
usually identified by detecting fraudulent transactions. As credit card theft becomes
more common, with fraudsters using stolen credit card information for illicit purposes,
tracking fraudulent online transactions becomes essential. This study employs various
methods and introduces a random forest algorithm to identify suspicious transactions.
The algorithm is based on supervised learning and classifies the dataset using decision
trees to improve the consistency of the proposed scheme.
CHAPTER-3
SYSTEM ANALYSIS
3.1 Existing System
In the existing system, a case study on credit card fraud detection is conducted. Data
normalization is applied before Cluster Analysis, and the results from using Cluster
Analysis and Artificial Neural Networks show that clustering attributes can minimize
neuronal inputs. The research, based on unsupervised learning, focuses on finding
new methods for fraud detection to enhance accuracy. The dataset used in this study is
derived from real-life transactional data from a large European company, with
personal details kept confidential. The algorithm's accuracy is approximately 50%,
aiming to reduce cost measures. The proposed algorithm is Bayes minimum risk, and
some disadvantages include the introduction of a new collative comparison measure
and the use of a cost-sensitive method based on Bayes minimum risk.
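The text above names Bayes minimum risk without describing it. As a hedged illustration, the usual form of this decision rule picks the action with the lower expected cost given a predicted fraud probability. The sketch below assumes that general formulation; the cost values and the function name are hypothetical and are not taken from the existing system.

```python
def bayes_minimum_risk(p_fraud, cost_fp, cost_fn, cost_tp=0.0, cost_tn=0.0):
    """Return 1 (flag as fraud) or 0 (let through), whichever has lower expected cost.

    p_fraud  : estimated probability that the transaction is fraudulent
    cost_fp  : cost of flagging a legitimate transaction (assumed value)
    cost_fn  : cost of missing a fraudulent transaction (assumed value)
    """
    # Expected cost if we flag the transaction as fraud.
    risk_flag = cost_tp * p_fraud + cost_fp * (1.0 - p_fraud)
    # Expected cost if we let the transaction through.
    risk_pass = cost_fn * p_fraud + cost_tn * (1.0 - p_fraud)
    return 1 if risk_flag < risk_pass else 0


# Hypothetical costs: a false alarm costs 5 units, a missed fraud costs 100 units.
print(bayes_minimum_risk(0.10, cost_fp=5.0, cost_fn=100.0))  # flag: 4.5 < 10.0
print(bayes_minimum_risk(0.01, cost_fp=5.0, cost_fn=100.0))  # pass: 4.95 > 1.0
```

Under these assumed costs a transaction is flagged even at a 10% fraud probability, because a missed fraud is far more expensive than a false alarm.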
Disadvantages:
Advantages:
2. The algorithm efficiently handles the 'amount' feature (transaction amount) and the
'class' feature (target class for binary classification, with values 1 for fraud and 0 for
non-fraud).
CHAPTER-4
SYSTEM REQUIREMENTS
Requirement Analysis
The requirement analysis examined the design of a few existing applications in order
to make this application more user-friendly. Emphasis was placed on keeping
navigation between screens well ordered while minimizing the amount of user input.
To improve accessibility, the application was designed to work in most browsers.
Requirement Specification
Functional Requirements:
Software Requirements:
For developing the application, the following software requirements are identified:
1. Python
1. Windows
1. Python
3. RAM: 4 GB.
CHAPTER-5
SYSTEM STUDY
5.1 Feasibility Study
In this phase, the feasibility of the project is analyzed, and a business proposal is
presented with a general plan and cost estimates. The feasibility study ensures that the
proposed system is viable and won't burden the company. Understanding the major
requirements for the system is crucial during system analysis.
Economic Feasibility
This study assesses the economic impact of the system on the organization. The
budgetary constraints and the funds available for research and development are
considered. The developed system must stay within the budget, and this was achieved
by utilizing freely available technologies, with only customized products requiring
purchase.
Technical Feasibility
The technical feasibility study examines the technical requirements of the system. The
system should not place high demands on available technical resources, ensuring that
it has modest requirements. Excessive demands on the client's technical resources are
to be avoided, and the developed system should necessitate minimal or no changes for
implementation.
Social Feasibility
This aspect assesses the level of acceptance of the system by users. User acceptance is
crucial, and the process of training users to efficiently use the system is considered.
Users should not feel threatened but should see the system as a necessity. Training
methods should instill confidence in users, encouraging constructive criticism, as they
are the final users of the system. The success of social feasibility relies on effective
user education and familiarity with the system.
CHAPTER-6
SYSTEM ARCHITECTURE
6.1 Data Flow Diagram
4. Use Case Diagrams: Model the functionality of a system using actors and use
cases.
8. Activity Diagrams: Illustrate the dynamic nature of a system by modeling the flow
of control.
6.2.3 Class Diagram
In software engineering, a class diagram in the Unified Modeling Language (UML) is
a type of static structure diagram that describes the structure of a system. It shows the
system's classes, their attributes, operations (or methods), and the relationships among
the classes. The class diagram explains which class contains information.
6.2.5 Collaboration Diagram
CHAPTER-7
INPUT AND OUTPUT DESIGN
7.1 Input Design
Input design serves as the connection between the information system and the
user. It involves specifying procedures for data preparation and the necessary steps to
transform transaction data into a usable form for processing. This transformation can
occur through data reading from written or printed documents or direct data entry by
individuals. The design of input focuses on minimizing input requirements,
controlling errors, avoiding delays, eliminating unnecessary steps, and keeping the
process simple. It also emphasizes security, ease of use, and privacy retention. Input
design considers:
3. Validity Checks: Input design ensures that entered data is valid. Screens with
appropriate messages are designed to guide users and avoid confusion. The
objective is to create an input layout that is easy to follow.
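A validity check of this kind can be sketched as follows. This is only an illustration: it validates a single transaction amount typed by the user and returns a message suitable for display on the input screen. The function name and the message wording are assumptions, not part of the implemented application.

```python
import math


def validate_amount(raw):
    """Validate a user-entered transaction amount.

    Returns (value, None) when the input is valid,
    or (None, message) with a message to show the user.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None, "Amount must be a number."
    if not math.isfinite(value):
        return None, "Amount must be a finite number."
    if value < 0:
        return None, "Amount cannot be negative."
    return value, None


for entry in ["12.50", "abc", "-3", None]:
    value, error = validate_amount(entry)
    print(repr(entry), "->", value if error is None else error)
```

Returning a message instead of raising an exception keeps the check easy to connect to a form, where the message can be shown next to the offending field.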
immediate needs and in hard copy format. It is the most direct source of information
for users and plays a crucial role in user decision-making.
Objectives:
The output form of an information system should achieve one or more of the
following objectives:
Trigger an action.
Confirm an action.
CHAPTER-8
IMPLEMENTATION
8.1 Modules
This project comprises four modules:
1. Data Collection
2. Data Pre-processing
3. Feature Extraction
4. Evaluation Model
is used to evaluate the models. Popular machine learning algorithms, such as Random
Forest, are employed for text classification tasks.
CHAPTER-9
SOFTWARE ENVIRONMENT
9.1 Python
9.1.1 History of Python:
The programming language Python has its roots in the programming language
ABC, developed at CWI (Centrum Wiskunde & Informatica) in Amsterdam,
Netherlands. Guido van Rossum, who worked on the ABC project in the early 1980s,
was influenced by his experiences and frustrations with ABC when conceptualizing
Python. He aimed to design a simple scripting language with some of ABC's strengths
and without its problems. Python was born in the late 1980s, featuring a basic syntax,
indentation for statement grouping, and powerful data types.
4. Simple and Easy: Python is known for its simplicity, ease of learning, and
reduced coding effort compared to languages like Java.
7. Free and Open-Source: Python is freely available, and its source code can be
modified and distributed.
8. Portable: Code written in Python can run on different platforms without
modification, following the "Write Once Run Anywhere" (WORA) principle.
3. Python is for Everyone: Python code can run on any machine, making it
versatile for web apps, data analysis, machine learning, automation, web
scraping, games, and visualizations.
Unsupervised Learning: Utilizes unlabeled data to find underlying structures
through factor and cluster analysis models.
2. Time and Resources: ML needs time and substantial resources to learn and
develop accurate and relevant algorithms.
4. High Error-Susceptibility: ML, while autonomous, is prone to errors,
especially if trained with biased datasets, leading to irrelevant predictions.
Python has seen several version updates over the years, and installing it can be
confusing for a beginner. The steps in this section install Python 3.7.4, which was the
latest release when these steps were written. Note: Python 3.7.4 cannot be used on
Windows XP or earlier devices.
Before starting the installation, check your system requirements. The Python build
you download must match your system type, that is, your operating system and
processor. The system used here runs a 64-bit Windows operating system, so the steps
below install Python 3.7.4 on a 64-bit Windows 7 device.
The steps to install Python on Windows 10, 8 and 7 are divided into 4 parts to make
them easier to follow.
Step 1: Go to the official site to download and install Python using Google Chrome or
any other web browser, or open the following link: https://www.python.org
Now, check for the latest and the correct version for your operating system.
Step 3: You can either select the yellow Download Python 3.7.4 button for Windows,
or scroll further down and click the download link for the version you need. Here, we
are downloading Python 3.7.4 for Windows, the most recent version at the time of
writing.
Step 4: Scroll down the page until you find the Files option.
Step 5: Here you see the different builds of Python listed by operating system.
To download Windows 32-bit Python, you can select any one from the three
options: Windows x86 embeddable zip file, Windows x86 executable installer,
or Windows x86 web-based installer.
To download Windows 64-bit Python, you can select any one from the three
options: Windows x86-64 embeddable zip file, Windows x86-64 executable
installer, or Windows x86-64 web-based installer.
Here we will install the Windows x86-64 web-based installer. This completes the
first part, choosing which version of Python to download. We now move on to
the second part, the installation.
Note: To know the changes or updates that are made in the version you can
click on the ReleaseNote Option.
Installation of Python
Step 1: Go to the Downloads folder and open the downloaded Python installer to carry
out the installation process.
Step 2: Before you click on Install Now, make sure to tick Add Python 3.7 to PATH.
Step 3: Click on Install Now. After the installation is successful, click on Close.
With the above three steps, you have successfully and correctly installed Python. Now
it is time to verify the installation.
Step 4: Test whether Python is correctly installed. At the command prompt, type
python -V and press Enter.
Step 5: The command prints the installed version: Python 3.7.4
Note: If an earlier version of Python is already installed, you must first uninstall it
and then install the new one.
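The installed version can also be checked from inside the interpreter. The short sketch below uses the standard sys module; the helper name meets_minimum and the minimum of 3.7 are assumptions chosen to match the version installed above.

```python
import sys


def meets_minimum(version_info, minimum=(3, 7)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Print the version string of the running interpreter, e.g. "3.7.4".
print(sys.version.split()[0])
# Report whether this interpreter satisfies the assumed minimum version.
print(meets_minimum(sys.version_info))
```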
Step 3: Click on IDLE (Python 3.7 64-bit) to launch the program.
Step 4: To work in IDLE you must first save the file. Click on File, then click on
Save.
Step 5: Name the file and set the save-as type to Python files. Click on SAVE. Here
the file is named Hey World.
CHAPTER-10
SOFTWARE TESTING
10.1 System Test
The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way to
check the functionality of components, sub-assemblies, assemblies, and/or a finished
product. It is the process of exercising software with the intent of ensuring that the
software system meets its requirements and user expectations and does not fail in an
unacceptable manner. There are various types of tests. Each test type addresses a
specific testing requirement.
Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce valid outputs.
All decision branches and internal code flow should be validated. It is the testing of
individual software units of the application. It is done after the completion of an
individual unit before integration. This is a structural testing that relies on knowledge
of its construction and is invasive. Unit tests perform basic tests at the component
level and test a specific business process, application, and/or system configuration.
Unit tests ensure that each unique path of a business process performs accurately to
the documented specifications and contains clearly defined inputs and expected
results.
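A unit test of this kind can be sketched with Python's built-in unittest module. The function label_transaction and its threshold of 0.5 are hypothetical stand-ins for a unit of the application; they are used here only to show how each decision branch gets its own test case with a defined input and expected result.

```python
import unittest


def label_transaction(score, threshold=0.5):
    """Hypothetical unit under test: turn a fraud score into a label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return "fraud" if score >= threshold else "clean"


class LabelTransactionTest(unittest.TestCase):
    def test_high_score_is_fraud(self):
        self.assertEqual(label_transaction(0.9), "fraud")

    def test_low_score_is_clean(self):
        self.assertEqual(label_transaction(0.1), "clean")

    def test_threshold_boundary_is_fraud(self):
        # The boundary value exercises the >= branch explicitly.
        self.assertEqual(label_transaction(0.5), "fraud")

    def test_invalid_score_is_rejected(self):
        with self.assertRaises(ValueError):
            label_transaction(1.5)


# Load and run the test case explicitly so the script keeps control afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LabelTransactionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```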
Integration Testing
Functional Test
System Test
System testing ensures that the entire software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system
testing is the configuration-oriented system integration test. System testing is based
on process descriptions and flows, emphasizing pre-driven process links and
integration points.
White Box Testing is testing in which the software tester has knowledge of
the inner workings, structure, and language of the software, or at least its purpose. It
is used to test areas that cannot be reached from a black box level.
Black Box Testing is testing the software without any knowledge of the inner
workings, structure, or language of the module being tested. Black box tests, like
most other kinds of tests, must be written from a definitive source document, such as
a specification or requirements document. The software under test is treated as a
black box: you cannot “see” into it. The test provides inputs and checks the outputs
without considering how the software works.
Acceptance Testing
CHAPTER-11
RESULT AND DISCUSSION
In this project, we use the Random Forest classifier from Python's scikit-learn library,
which builds its trees with the CART algorithm, to detect fraudulent transactions in a
credit card dataset. We downloaded this dataset from the Kaggle website using the
following URL:
Dataset URL: https://www.kaggle.com/mlg-ulb/creditcardfraud
Using the 'CreditCardFraud.csv' file, we train the Random Forest algorithm and then
upload a test data file. The trained Random Forest model is applied to this test data to
predict whether each transaction is normal or fraudulent. The uploaded test data contains
only transaction data, with no class label; the application predicts the label and displays
the result.
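The difference between the labeled training file and the unlabeled test file can be sketched with the standard csv module. The rows below are made-up values, and only one anonymized feature column (V1) is shown for brevity; the column names Time, Amount and Class are assumed to match the Kaggle file, and load_rows is a hypothetical helper rather than the application's actual loader.

```python
import csv
import io

# Made-up sample rows; a real run would open the CSV files from disk instead.
TRAIN_CSV = """Time,V1,Amount,Class
0,-1.36,149.62,0
1,1.19,2.69,0
406,-2.31,0.00,1
"""

TEST_CSV = """Time,V1,Amount
472,-3.04,529.00
"""


def load_rows(text, label_column=None):
    """Parse CSV text into numeric feature rows and, if present, labels."""
    features, labels = [], []
    for record in csv.DictReader(io.StringIO(text)):
        if label_column is not None:
            # Remove the class label so it is not used as an input feature.
            labels.append(int(record.pop(label_column)))
        features.append([float(value) for value in record.values()])
    return features, labels


X_train, y_train = load_rows(TRAIN_CSV, label_column="Class")
X_test, y_test = load_rows(TEST_CSV)  # no label column in the test data

print(len(X_train), "training rows with labels", y_train)
print(len(X_test), "test rows, labels to be predicted")
```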
Random forests are a supervised learning algorithm that can be used for both classification
and regression. They are flexible and easy to use. A forest is composed of trees, and in
general the more trees it has, the more robust the forest is. Random forests build decision
trees on randomly selected data samples, get a prediction from each tree, and select the
final answer by majority voting. They also provide a good indicator of feature
importance. Python's scikit-learn library implements its decision trees with an optimized
version of the CART algorithm and provides a random forest classifier.
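The sample-then-vote idea described above can be illustrated in plain Python. This is a simplified sketch and not the scikit-learn implementation used in the project: each "tree" here is a single-split stump on one randomly chosen feature, whereas CART grows deeper trees using an impurity measure. All function names and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Fit a one-split tree: choose the threshold with the fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for left_label in (0, 1):
            errors = sum(
                (left_label if row[feature] <= threshold else 1 - left_label) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return feature, threshold, left_label


def predict_stump(stump, row):
    feature, threshold, left_label = stump
    return left_label if row[feature] <= threshold else 1 - left_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(
            train_stump([rows[i] for i in sample], [labels[i] for i in sample], feature)
        )
    return forest


def predict_forest(forest, row):
    """Return the majority vote of all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data: two features, label 1 = fraud, 0 = normal (illustrative values only).
rows = [(1.0, 0.2), (1.5, 0.1), (2.0, 0.3), (1.2, 0.4),
        (9.0, 0.9), (8.5, 0.8), (9.5, 0.7), (8.0, 0.95)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = train_forest(rows, labels)
print(predict_forest(forest, (0.9, 0.05)))
print(predict_forest(forest, (9.2, 0.85)))
```

Because every tree sees a different bootstrap sample, a single poorly trained tree is outvoted by the rest, which is the source of the robustness mentioned above.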
Screen Shots
To run the project, double-click the 'run.bat' file to get the following screen:
In the above screen, click on the 'Upload Credit Card Dataset' button to upload the dataset.
After the dataset is uploaded, the following screen appears:
Now click on 'Generate Train & Test Model' to generate a training model for the Random
Forest Classifier.
In the above screen, after generating the model, we can see the total records available in the
dataset and then how many records the application is using for training and testing. Now click
on the 'Run Random Forest Algorithm' button to generate the Random Forest model on train
and test data.
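The train and test record counts shown by the application come from splitting the dataset. The split ratio used by the application is not stated in this report, so the sketch below assumes a 70/30 split purely for illustration; train_test_split_rows is a hypothetical helper written with the standard library.

```python
import random


def train_test_split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows reproducibly and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for dataset records; real rows would hold transaction features.
records = list(range(1000))
train, test = train_test_split_rows(records)

print("Total records:", len(records))
print("Training records:", len(train))
print("Testing records:", len(test))
```

Fixing the seed makes the split repeatable, so the reported accuracy can be reproduced between runs.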
In the above screen, we can see that Random Forest achieves 99.78% accuracy when the
model is built and evaluated on the train and test data. Now click on the 'Detect Fraud From
Test Data' button to upload test data and predict whether it contains normal or fraud
transactions.
In the above screen, the test dataset is being uploaded; after the upload, we get the
prediction details below.
In the above screen, beside each test record, the application displays whether the
transaction contains clean or fraud signatures. Now click on the 'Clean & Fraud
Transaction Detection Graph' button to see the total test transactions with clean and fraud
signatures in graphical format.
In the above graph, we can see the total test data and the number of normal and fraud
transactions detected. The x-axis represents the transaction type, and the y-axis
represents the count of clean and fraud transactions.
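The counts behind such a graph can be computed directly from the predicted labels. The sketch below assumes predictions of 1 for fraud and 0 for clean and prints a simple text bar chart instead of the application's graphical window; the function name and the sample predictions are illustrative.

```python
from collections import Counter


def count_by_type(predictions):
    """Count predicted labels, mapping 1 to 'fraud' and anything else to 'clean'."""
    counts = Counter("fraud" if p == 1 else "clean" for p in predictions)
    # Always report both types, even when one of them has a count of zero.
    return {"clean": counts["clean"], "fraud": counts["fraud"]}


# Illustrative predictions only (not output of the real model).
predictions = [0, 0, 1, 0, 0, 0, 1, 0]
counts = count_by_type(predictions)

# One bar per transaction type; bar length is the transaction count.
for label, n in counts.items():
    print(f"{label:<6}{'#' * n} {n}")
```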
CHAPTER-12
CONCLUSION
CHAPTER-13
REFERENCES/BIBLIOGRAPHY