
CONTENT

TITLE PAGE NO.


CHAPTER-1 INTRODUCTION 1
CHAPTER-2 LITERATURE SURVEY 2
CHAPTER-3 SYSTEM ANALYSIS 3
3.1 Existing System 3
3.2 Proposed System 3
CHAPTER-4 SYSTEM REQUIREMENTS 5
4.1 Hardware Requirements 5
4.2 Software Requirements 5
CHAPTER-5 SYSTEM STUDY 6
5.1 Feasibility Study 6
5.2 Feasibility Analysis 6
CHAPTER-6 SYSTEM ARCHITECTURE 7
6.1 Data Flow Diagram 7
6.2 UML Diagrams 7
6.2.1 Types of UML Diagrams 7
6.2.2 Use Case Diagram 8
6.2.3 Class Diagram 9
6.2.4 Sequence Diagram 9
6.2.5 Collaboration Diagram 10
CHAPTER-7 INPUT AND OUTPUT DESIGN 11
7.1 Input Design 11
7.2 Output Design 11
CHAPTER-8 IMPLEMENTATION 13
8.1 Modules 13
8.1.1 Data Collection 13
8.1.2 Data Pre-processing 13
8.1.3 Feature Extraction 13
8.1.4 Evaluation Model 14
CHAPTER-9 SOFTWARE ENVIRONMENT 15
9.1 Python 15
9.1.1 History of Python 15
9.1.2 Advantages of Python 15
9.1.3 Advantages of Python Over Other Languages 16
9.1.4 Disadvantages of Python 16
9.2 Machine Learning 16
9.2.1 Types of Machine Learning 16
9.2.2 Advantages of Machine Learning 17
9.2.3 Disadvantages of Machine Learning 17
9.3 Modules Used in Project 18
9.4 How to Install Python on Windows and Mac 18
CHAPTER-10 SOFTWARE TESTING 24
10.1 System Test 24
10.2 Types of Tests 24
CHAPTER-11 RESULT AND DISCUSSION 26
11.1 Output Screens 26
CHAPTER-12 CONCLUSION 32
CHAPTER-13 REFERENCES/BIBLIOGRAPHY 33
LIST OF FIGURES
FIG NO. DESCRIPTION PAGE NO.
Figure 1 System Architecture 7
Figure 2 Use Case Diagram 8
Figure 3 Class Diagram 9
Figure 4 Sequence Diagram 9
Figure 5 Collaboration Diagram 10
CHAPTER-1
INTRODUCTION
Various techniques for detecting fraudulent activities in credit card
transactions have been considered by researchers. These methods involve the
development of models based on artificial intelligence, data mining, fuzzy logic, and
machine learning. Credit card fraud detection is a challenging yet prevalent problem.
In our proposed system, we have implemented credit card fraud detection using
machine learning, leveraging the advancements in machine learning techniques.

Machine learning has proven to be a successful approach for fraud detection, especially given the large amount of data transferred during online transactions. This
results in a binary outcome: genuine or fraudulent. In our approach, fraudulent
datasets are used to construct features, which include data points such as the
customer's age and account value, as well as the origin of the credit card. These
features, numbering in the hundreds, contribute to varying extents to the fraud
probability.

It's important to note that the contribution level of each feature to the fraud
score is determined by the artificial intelligence of the machine, driven by the training
set, and not by a fraud analyst. In the context of card fraud, if the use of cards for
fraudulent activities is high, the fraud weighting of a credit card transaction will also
be high. Conversely, if fraudulent activity decreases, the contribution level will
decrease accordingly. These models self-learn without explicit programming, as seen
in manual review processes.

Credit card fraud detection using machine learning involves deploying classification and regression algorithms. In this system, a supervised learning algorithm, specifically the Random Forest algorithm, is used to classify fraudulent card transactions, whether conducted online or offline. Random Forest is an ensemble extension of the Decision Tree algorithm, known for better efficiency and accuracy than many other machine learning algorithms.

Random Forest addresses the correlation issue by selecting only a subsample of the feature space at each split, aiming to make the trees de-correlated. Additionally, it prunes the trees by setting a stopping criterion for node splits, a concept that will be explored in more detail later in this study.
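As an illustrative sketch of these two mechanisms, the example below uses scikit-learn's RandomForestClassifier on synthetic data; the dataset, parameter values, and feature counts are assumptions for demonstration, not the project's actual configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for an imbalanced transactions dataset (illustrative only).
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.95, 0.05], random_state=42)

# max_features limits each split to a random subsample of the feature
# space (the de-correlation mechanism); min_samples_split acts as a
# stopping criterion that keeps trees from growing without bound.
clf = RandomForestClassifier(n_estimators=100,
                             max_features="sqrt",
                             min_samples_split=10,
                             random_state=42)
clf.fit(X, y)
print(round(clf.score(X, y), 3))
```

Here "sqrt" asks each split to consider only the square root of the total number of features, which is the mechanism that de-correlates the trees.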

CHAPTER-2
LITERATURE SURVEY

1. Title: CREDIT CARD FRAUD DETECTION USING RANDOM FOREST

Authors: Devi Meenakshi, Janani, Gayathri, Mrs. Indira.

Abstract: This project focuses on real-world credit card fraud detection, addressing
the substantial increase in fraudulent activities accompanying the growth of credit
card transactions. The challenge lies in the absence of the cardholder during
purchases, making it difficult for merchants to verify the authenticity of the
transaction. The implementation of efficient fraud detection systems has become
crucial for minimizing losses. The proposed scheme utilizes the Random Forest
algorithm to enhance fraud detection accuracy. The classification process involves analyzing the historical dataset and the user's current dataset to optimize the accuracy of the results. Technique performance is evaluated with metrics such as accuracy, sensitivity, specificity, and precision. Processing the attributes identifies fraud and provides graphical model visualization.

2. Title: CREDIT CARD FRAUD DETECTION

Authors: Teena Varma, Mahesh Poojari, Jobin Joseph, Ainsley Cardozo

Abstract: Credit card fraud prevention is a prevalent issue in the developed world,
particularly with the increasing popularity of e-commerce sites. This form of fraud is
often identified through fraudulent transactions. As credit card theft becomes more
common, with fraudsters using stolen credit card information for illicit purposes,
tracking online fraud transactions becomes essential. This study employs various
methods and introduces a random forest algorithm to identify suspicious transactions.
The algorithm is based on supervised learning, classifying the dataset using decision-
making processes to enhance the proposed scheme's consistency.

CHAPTER-3
SYSTEM ANALYSIS
3.1 Existing System
In the existing system, a case study on credit card fraud detection is conducted. Data
normalization is applied before Cluster Analysis, and the results from using Cluster
Analysis and Artificial Neural Networks show that clustering attributes can minimize
neuronal inputs. The research, based on unsupervised learning, focuses on finding
new methods for fraud detection to enhance accuracy. The dataset used in this study is
derived from real-life transactional data from a large European company, with
personal details kept confidential. The algorithm's accuracy is approximately 50%, with the aim of reducing cost measures. The algorithm used is Bayes minimum risk, whose drawbacks are listed below.

Disadvantages:

1. Introduction of a new collative comparison measure representing gains and losses in fraud detection.

2. A cost-sensitive method based on Bayes minimum risk is presented using the proposed cost measure.

3.2 Proposed System


In the proposed system, the random forest algorithm is applied for the classification
of credit card datasets. Random Forest, an algorithm for classification and regression,
is essentially a collection of decision tree classifiers. It addresses the overfitting issue
present in decision trees by randomly sampling a subset of the training set for each
individual tree. Each node then splits on a feature selected from a random subset of
the full feature set. Random Forest is advantageous for large datasets with many
features, ensuring fast training and resistance to overfitting.

Advantages:

1. Random Forest naturally ranks the importance of variables in a regression or classification problem.

2. The algorithm efficiently handles the 'amount' feature (transaction amount) and the 'class' feature (the binary target, with value 1 for fraud and 0 for non-fraud).

CHAPTER-4
SYSTEM REQUIREMENTS
Requirement Analysis

The project involved analyzing the design of a few applications to enhance user-
friendliness. Emphasis was placed on maintaining well-ordered navigations between
screens while minimizing user input. To improve accessibility, the application was
designed to be browser-compatible, ensuring compatibility with most browsers.

Requirement Specification

Functional Requirements:

Graphical User Interface (GUI) for user interaction.

Software Requirements:

For developing the application, the following software requirements are identified:

1. Python

Operating Systems Supported:

1. Windows

Technologies and Languages Used:

1. Python

4.1 Hardware Requirements

1. System: INTEL i5.

2. Hard Disk: 500 GB.

3. RAM: 4 GB.

4.2 Software Requirements


 Operating System: Windows.
 Coding Language: Python

CHAPTER-5
SYSTEM STUDY
5.1 Feasibility Study
In this phase, the feasibility of the project is analyzed, and a business proposal is
presented with a general plan and cost estimates. The feasibility study ensures that the
proposed system is viable and won't burden the company. Understanding the major
requirements for the system is crucial during system analysis.

5.2 Feasibility Analysis


Three key considerations are involved in the feasibility analysis:

Economic Feasibility

This study assesses the economic impact of the system on the organization. The
budgetary constraints and the funds available for research and development are
considered. The developed system must stay within the budget, and this was achieved
by utilizing freely available technologies, with only customized products requiring
purchase.

Technical Feasibility

The technical feasibility study examines the technical requirements of the system. The
system should not place high demands on available technical resources, ensuring that
it has modest requirements. Excessive demands on the client's technical resources are
to be avoided, and the developed system should necessitate minimal or no changes for
implementation.

Social Feasibility

This aspect assesses the level of acceptance of the system by users. User acceptance is
crucial, and the process of training users to efficiently use the system is considered.
Users should not feel threatened but should see the system as a necessity. Training
methods should instill confidence in users, encouraging constructive criticism, as they
are the final users of the system. The success of social feasibility relies on effective
user education and familiarity with the system.

CHAPTER-6
SYSTEM ARCHITECTURE
6.1 Data Flow Diagram

Figure 1 System Architecture

6.2 UML Diagrams


UML (Unified Modeling Language) Diagrams are a standardized way of visualizing
and documenting the design of a system. There are three main classifications of UML
diagrams:

1. Behavior Diagrams: Depict behavioral features of a system or business process, including activity, state machine, and use case diagrams, as well as interaction diagrams.

2. Interaction Diagrams: Emphasize object interactions and include communication, interaction overview, sequence, and timing diagrams.

3. Structure Diagrams: Depict elements of a specification that are independent of time, including class, composite structure, component, deployment, object, and package diagrams.

6.2.1 Types of UML Diagrams


1. Class Diagrams: Describe the static structure of a system.

2. Package Diagrams: Organize elements into related groups to minimize dependencies.

3. Object Diagrams: Describe the static structure of a system at a particular time.

4. Use Case Diagrams: Model the functionality of a system using actors and use
cases.

5. Sequence Diagrams: Describe interactions among classes over time.

6. Collaboration Diagrams: Represent interactions between objects as a series of sequenced messages.

7. State Chart Diagrams: Describe the dynamic behavior of a system in response to external stimuli.

8. Activity Diagrams: Illustrate the dynamic nature of a system by modeling the flow
of control.

9. Component Diagrams: Describe the organization of physical software components.

10. Deployment Diagrams: Depict the physical resources in a system.

6.2.2 Use Case Diagram


A use case diagram presents a graphical overview of the functionality
provided by a system in terms of actors, their goals (use cases), and dependencies
between those use cases. It shows what system functions are performed for which
actor and depicts the roles of actors in the system.

Figure 2 Use Case Diagram

6.2.3 Class Diagram
In software engineering, a class diagram in the Unified Modeling Language (UML) is
a type of static structure diagram that describes the structure of a system. It shows the
system's classes, their attributes, operations (or methods), and the relationships among
the classes. The class diagram explains which class contains information.

Figure 3 Class Diagram

6.2.4 Sequence Diagram


A sequence diagram in Unified Modeling Language (UML) is a kind of interaction
diagram that shows how processes operate with one another and in what order. It is a
construct of a Message Sequence Chart. Sequence diagrams are sometimes called
event diagrams, event scenarios, and timing diagrams.

Figure 4 Sequence Diagram

6.2.5 Collaboration Diagram

Figure 5 Collaboration Diagram

CHAPTER-7
INPUT AND OUTPUT DESIGN
7.1 Input Design
Input design serves as the connection between the information system and the
user. It involves specifying procedures for data preparation and the necessary steps to
transform transaction data into a usable form for processing. This transformation can
occur through data reading from written or printed documents or direct data entry by
individuals. The design of input focuses on minimizing input requirements,
controlling errors, avoiding delays, eliminating unnecessary steps, and keeping the
process simple. It also emphasizes security, ease of use, and privacy retention. Input
design considers:

 What data should be given as input?

 How should the data be arranged or coded?

 Dialogs to guide operating personnel in providing input.

 Methods for input validation and error handling procedures.


Objectives:
1. Error Prevention: Input design aims to prevent errors in the data input process
and guide management in obtaining accurate information from the
computerized system.

2. User-Friendly Interface: It involves creating user-friendly screens for data entry to handle large volumes of data. The goal is to make data entry easier and error-free, providing facilities for data manipulation and record viewing.

3. Validity Checks: Input design ensures that entered data is valid. Screens with
appropriate messages are designed to guide users and avoid confusion. The
objective is to create an input layout that is easy to follow.

7.2 Output Design


Quality output meets the end user's requirements and presents information
clearly. In any system, the results of processing are communicated through outputs to
users and other systems. Output design determines how information is displayed for immediate needs and in hard copy format. It is the most direct source of information for users and plays a crucial role in user decision-making.

1. Organized Design Process: Designing computer output should follow an organized, well-thought-out approach. The right output needs to be developed, ensuring that each output element is user-friendly and effective.

2. Identification of Specific Output Needs: Output design involves identifying the specific outputs needed to meet system requirements.

3. Presentation Methods: Selection of methods for presenting information in a format that is easily understandable.

4. Document Creation: Creation of documents, reports, or other formats containing information produced by the system.

Objectives:
The output form of an information system should achieve one or more of the
following objectives:

 Convey information about past activities, current status, or projections of the future.

 Signal important events, opportunities, problems, or warnings.

 Trigger an action.

 Confirm an action.

CHAPTER-8
IMPLEMENTATION
8.1 Modules
This project comprises four modules:

1. Data Collection

2. Data Pre-processing

3. Feature Extraction

4. Evaluation Model

8.1.1 Data Collection


The data used in this project consists of records gathered from credit card transactions. This step involves selecting a subset of all available data for analysis. Machine learning problems typically start with labeled data, where the target answers are already known.

8.1.2 Data Pre-processing


Organize your selected data through formatting, cleaning, and sampling. Three
common data pre-processing steps include:

 Formatting: Ensure the data is in a suitable format for analysis, such as converting it from a relational database to a flat file.

 Cleaning: Remove or fix missing data instances, incomplete entries, or sensitive information.

 Sampling: Take a representative sample of the data to reduce computational and memory requirements.
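The cleaning and sampling steps above can be sketched with pandas. The mini-table below is a hypothetical stand-in for the raw transaction records, not the project's actual file:

```python
import pandas as pd

# Hypothetical mini-table standing in for raw transaction records.
raw = pd.DataFrame({
    "amount": [12.5, None, 830.0, 44.2, 7.1],
    "class":  [0, 0, 1, 0, 0],
})

# Cleaning: drop rows with missing or incomplete data.
clean = raw.dropna()

# Sampling: draw a reproducible subset to cut compute and memory cost.
sample = clean.sample(frac=0.8, random_state=1)
print(len(raw), len(clean), len(sample))
```

In a real pipeline the `raw` frame would come from `pd.read_csv` on the formatted flat file, and the sampling fraction would depend on the available memory.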

8.1.3 Feature Extraction


Feature extraction involves attribute reduction, transforming attributes into linear combinations of the original ones. The models are trained using a classifier algorithm, utilizing the classify module from the Natural Language Toolkit library in Python. The labeled dataset is employed for training, while the remaining labeled data is used to evaluate the models. Popular machine learning algorithms, such as Random Forest, are employed for the classification task.
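The "linear combinations of the original attributes" idea described above is what Principal Component Analysis (PCA) computes; scikit-learn's PCA can serve as a sketch of this step. The data shape and component count below are assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # 200 transactions, 10 raw attributes

# Each extracted feature is a linear combination of the 10 originals.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)
```

Notably, the anonymized V1-V28 columns of the public Kaggle credit card dataset are themselves PCA components of the confidential raw attributes.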

8.1.4 Evaluation Model


Model evaluation is crucial in the model development process. It helps
identify the best model representing the data and its future performance. Evaluating
model performance with training data alone is insufficient, as it can lead to
overoptimistic and overfitted models. Two methods for evaluating models in data
science are Hold-Out and Cross-Validation, both using a test set unseen by the model.
The performance of each classification model is estimated from its average accuracy across evaluation runs, and the results are visualized through graphs. Accuracy, defined as the percentage of correct predictions on the test data, is calculated by dividing the number of correct predictions by the total number of predictions.
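Both evaluation methods named above can be sketched with scikit-learn; the synthetic dataset and split sizes are assumptions for demonstration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=600, random_state=7)

# Hold-Out: reserve a test set the model never sees during training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)
model = RandomForestClassifier(n_estimators=50, random_state=7).fit(X_tr, y_tr)

# Accuracy = correct predictions / total predictions on the test set.
holdout_acc = accuracy_score(y_te, model.predict(X_te))

# Cross-Validation: average accuracy over several train/test folds.
cv_acc = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=7), X, y, cv=5
).mean()
print(round(holdout_acc, 3), round(cv_acc, 3))
```

Evaluating only on the held-out portion is what guards against the overoptimistic, overfitted estimates that training-set accuracy would give.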

CHAPTER-9
SOFTWARE ENVIRONMENT
9.1 Python
9.1.1 History of Python:
The programming language Python has its roots in the programming language
ABC, developed at CWI (Centrum Wiskunde & Informatica) in Amsterdam,
Netherlands. Guido van Rossum, who worked on the ABC project in the early 1980s,
was influenced by his experiences and frustrations with ABC when conceptualizing
Python. He aimed to design a simple scripting language with some of ABC's strengths
and without its problems. Python was born in the late 1980s, featuring a basic syntax,
indentation for statement grouping, and powerful data types.

9.1.2 Advantages of Python:


1. Extensive Libraries: Python comes with a wide range of libraries for various
purposes, eliminating the need to write extensive code manually.

2. Extensible: Python can be extended to other languages, allowing the incorporation of code in languages like C++ or C.

3. Embeddable: Python can be embedded in the source code of other languages, such as C++, providing scripting capabilities.

4. Simple and Easy: Python is known for its simplicity, ease of learning, and
reduced coding effort compared to languages like Java.

5. Readable: Python's syntax is readable and resembles English, aiding comprehension. It uses indentation instead of curly braces for block structure.

6. Object-Oriented: Python supports both procedural and object-oriented programming paradigms, promoting code reusability and real-world modeling.

7. Free and Open-Source: Python is freely available, and its source code can be
modified and distributed.

8. Portable: Code written in Python can run on different platforms without
modification, following the "Write Once Run Anywhere" (WORA) principle.

9. Interpreted: Python is an interpreted language, making debugging easier than in compiled languages.

9.1.3 Advantages of Python Over Other Languages:


1. Less Coding: Python requires less code for various tasks compared to
other languages, with a rich standard library.

2. Affordable: Being free, Python is accessible to individuals, small companies, and large organizations, with strong community support.

3. Python is for Everyone: Python code can run on any machine, making it
versatile for web apps, data analysis, machine learning, automation, web
scraping, games, and visualizations.

9.1.4 Disadvantages of Python:


1. Speed Limitations: Python's interpreted nature can result in slower
execution, impacting speed-sensitive projects.

2. Weak in Mobile Computing and Browsers: Python is less common on the client side and is seldom used for smartphone-based applications.

3. Design Restrictions: Dynamic typing can lead to run-time errors, as variables are not explicitly declared.

9.2 Machine Learning


Before delving into specific methods, it is important to understand machine learning's role as a means of building models of data. Machine learning involves building mathematical models with tunable parameters that adapt to observed data; such models can then predict and explain aspects of new data based on what was learned from previous data.

9.2.1 Types of Machine Learning:


 Supervised Learning: Involves learning from labeled training data, using classification and regression models.

 Unsupervised Learning: Utilizes unlabeled data to find underlying structures through factor and cluster analysis models.

 Semi-supervised Learning: Combines unlabeled data with a small amount of labeled data for improved accuracy at a lower cost.

 Reinforcement Learning: Learns optimal actions through trial and error, deciding the next action based on the current state to maximize future rewards.

9.2.2 Advantages of Machine Learning:

1. Identifies Trends and Patterns: ML reviews large datasets, discovering trends and patterns not easily apparent to humans.

2. No Human Intervention Needed (Automation): ML allows machines to learn, make predictions, and improve algorithms without constant human supervision.

3. Continuous Improvement: ML algorithms improve in accuracy and efficiency as they gain experience, making better decisions over time.

4. Handling Multi-dimensional and Multi-variety Data: ML excels in managing multi-dimensional and multi-variety data in dynamic or uncertain environments.

5. Wide Applications: ML finds applications in diverse fields, delivering personalized experiences to customers and targeting the right audience.

9.2.3 Disadvantages of Machine Learning:

1. Data Acquisition: ML requires massive, inclusive, unbiased, and high-quality datasets for effective training.

2. Time and Resources: ML needs time and substantial resources to learn and develop accurate and relevant algorithms.

3. Interpretation of Results: Accurate interpretation of results generated by ML algorithms can be challenging, requiring careful algorithm selection.

4. High Error-Susceptibility: ML, while autonomous, is prone to errors, especially if trained with biased datasets, leading to irrelevant predictions.

9.3 Modules Used in Project:

1. TensorFlow: A free and open-source software library for dataflow and differentiable programming, used for machine learning applications.

2. NumPy: A general-purpose array-processing package providing high-performance multidimensional array objects for scientific computing.

3. Pandas: An open-source Python library for high-performance data manipulation and analysis, facilitating data processing in various domains.

4. Matplotlib: A Python 2D plotting library for generating publication-quality figures, charts, and graphs.

5. Scikit-learn: A Python library providing supervised and unsupervised learning algorithms for academic and commercial use.
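Three of the modules above typically work together in this kind of project; a minimal sketch of that pipeline, with entirely made-up transaction values:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# NumPy array -> Pandas frame -> scikit-learn model: the typical pipeline.
data = np.array([[10.0, 0], [2500.0, 1], [15.5, 0], [990.0, 1]])
df = pd.DataFrame(data, columns=["amount", "class"])

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(df[["amount"]], df["class"])
prediction = clf.predict(pd.DataFrame({"amount": [2000.0]}))
print(prediction)
```

Matplotlib would then be used to plot results, and TensorFlow is listed for deep-learning variants of the same workflow.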

9.4 How to Install Python on Windows and Mac:


To install Python, follow these steps:

1. Visit the official site: https://www.python.org.

2. Download the correct version based on your system requirements.

3. Install the downloaded version, considering your operating system and


processor type.

Python has received several updates over the years, and installing it can be confusing for beginners, so this section walks through the process. The version used here is Python 3.7.4, i.e. Python 3. Note: Python version 3.7.4 cannot be used on Windows XP or earlier devices.

Before starting the installation, you need to know your system requirements. Choose the Python version that matches your system type, i.e. your operating system and processor. The example system here is a Windows 64-bit operating system, so the steps below install Python 3.7.4 on a Windows 7 device.

The steps for installing Python on Windows 10, 8, and 7 are divided into four parts for clarity.

Download the Correct version into the system

Step 1: Go to the official site, https://www.python.org, to download and install Python using Google Chrome or any other web browser.

Now, check for the latest version appropriate for your operating system.

Step 2: Click on the Download tab.

Step 3: Either select the yellow Download Python 3.7.4 button, or scroll further down and click the download link for the version you need. Here, we download the most recent Python version for Windows, 3.7.4.

Step 4: Scroll down the page until you find the Files option.

Step 5: Here you see a different version of Python along with the operating system.

 To download 32-bit Python for Windows, select one of the three options: Windows x86 embeddable zip file, Windows x86 executable installer, or Windows x86 web-based installer.

 To download 64-bit Python for Windows, select one of the three options: Windows x86-64 embeddable zip file, Windows x86-64 executable installer, or Windows x86-64 web-based installer.

 Here we will use the Windows x86-64 web-based installer. This completes the first part, choosing which version of Python to download; the second part is the installation itself.

 Note: To see the changes or updates made in a version, click the Release Note option.

Installation of Python

Step 1: Go to Download and Open the downloaded Python version to carry out the
installation process.

Step 2: Before you click Install Now, make sure to tick Add Python 3.7 to PATH.

Step 3: Click on Install Now. After the installation succeeds, click Close.

With these three steps, Python is installed successfully and correctly. Now it is time to verify the installation.

Note: The installation process might take a couple of minutes.

Verify the Python Installation

Step 1: Click on Start

Step 2: In the Windows Run Command, type “cmd”.

Step 3: Open the Command prompt option.

Step 4: Test whether Python is correctly installed: type python -V and press Enter.

Step 5: The output shows the installed version, here 3.7.4.

Note: If an earlier version of Python is already installed, you must first uninstall it and then install the new one.

Check how the Python IDLE works

Step 1: Click on Start

Step 2: In the Windows Run command, type “python idle”.

Step 3: Click on IDLE (Python 3.7 64-bit) and launch the program

Step 4: To start working in IDLE, you must first save the file: click File > Save.

Step 5: Name the file and set the save-as type to Python files, then click Save. Here the file is named Hey World.

Step 6: Now enter a statement, for example a print statement, and run it.

CHAPTER-10
SOFTWARE TESTING
10.1 System Test
The purpose of testing is to discover errors. Testing is the process of trying to
discover every conceivable fault or weakness in a work product. It provides a way to
check the functionality of components, sub-assemblies, assemblies, and/or a finished
product. It is the process of exercising software with the intent of ensuring that the
software system meets its requirements and user expectations and does not fail in an
unacceptable manner. There are various types of tests. Each test type addresses a
specific testing requirement.

10.2 Types of Tests


Unit Testing

Unit testing involves the design of test cases that validate that the internal
program logic is functioning properly, and that program inputs produce valid outputs.
All decision branches and internal code flow should be validated. It is the testing of
individual software units of the application. It is done after the completion of an
individual unit before integration. This is a structural testing that relies on knowledge
of its construction and is invasive. Unit tests perform basic tests at the component
level and test a specific business process, application, and/or system configuration.
Unit tests ensure that each unique path of a business process performs accurately to
the documented specifications and contains clearly defined inputs and expected
results.
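As an illustration of the unit-testing idea above, here is a minimal sketch using Python's unittest module; the `is_suspicious` helper and its threshold are hypothetical, not part of the project's code:

```python
import unittest

def is_suspicious(amount, limit=1000.0):
    """Hypothetical helper: flag a transaction above a fixed limit."""
    return amount > limit

class TestIsSuspicious(unittest.TestCase):
    # Validate both decision branches, as unit testing requires.
    def test_flags_large_amount(self):
        self.assertTrue(is_suspicious(5000.0))

    def test_accepts_small_amount(self):
        self.assertFalse(is_suspicious(10.0))

if __name__ == "__main__":
    unittest.main(exit=False)
```

Each test exercises one path through the unit, so together they cover every decision branch of the helper.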

Integration Testing

Integration tests are designed to test integrated software components to


determine if they actually run as one program. Testing is event-driven and is more
concerned with the basic outcome of screens or fields. Integration tests demonstrate
that although the components were individually satisfactory, as shown by successful
unit testing, the combination of components is correct and consistent. Integration
testing is specifically aimed at exposing the problems that arise from the combination
of components.

Functional Test

Functional tests provide systematic demonstrations that functions tested are


available as specified by the business and technical requirements, system
documentation, and user manuals. Functional testing is centered on the following
items:

 Valid Input: identified classes of valid input must be accepted.
 Invalid Input: identified classes of invalid input must be rejected.
 Functions: identified functions must be exercised.
 Output: identified classes of application outputs must be exercised.
 Systems/Procedures: interfacing systems or procedures must be invoked.

Organization and preparation of functional tests focus on requirements, key functions, or special test cases. In addition, systematic coverage pertaining to identified business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified, and the effective value of current tests is determined.

System Test

System testing ensures that the entire software system meets requirements. It
tests a configuration to ensure known and predictable results. An example of system
testing is the configuration-oriented system integration test. System testing is based
on process descriptions and flows, emphasizing pre-driven process links and
integration points.

White Box Testing

White Box Testing is a testing in which the software tester has knowledge of
the inner workings, structure, and language of the software, or at least its purpose. It
is used to test areas that cannot be reached from a black box level.

Black Box Testing

Black Box Testing is testing the software without any knowledge of the inner
workings, structure, or language of the module being tested. Black box tests, like
most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under
test is treated as a black box. You cannot “see” into it. The test provides inputs and
responds to outputs without considering how the software works.

Acceptance Testing

User Acceptance Testing is a critical phase of any project and requires significant participation by the end user. It also ensures that the system meets the functional requirements.

CHAPTER-11
RESULT AND DISCUSSION

11.1 Output Screens


Employing Random Forest & CART Methods for Credit Card Fraud Analysis

In this project, we use Python's Random Forest classifier, which is built from CART
decision trees, to detect fraudulent transactions in a credit card dataset. We downloaded
this dataset from the Kaggle website at the following URL:

Dataset URL: https://www.kaggle.com/mlg-ulb/creditcardfraud

Using the 'CreditCardFraud.csv' file, we train the Random Forest algorithm and then
upload a test data file. This test data is applied to the trained Random Forest model to
predict whether it contains normal or fraudulent transaction signatures. The uploaded
test data contains only transaction features, with no class label; the application predicts
the label and reports the result.
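The train-then-predict flow above can be sketched as follows. This is a minimal, self-contained sketch: a small synthetic dataset stands in for 'CreditCardFraud.csv', and the feature columns and labelling rule are illustrative, not the project's actual data.

```python
# Sketch of the flow: train a random forest on labelled transactions,
# then predict labels for unlabelled test data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))            # stand-in transaction features
y_train = (X_train[:, 0] > 1.2).astype(int)    # 1 = fraud, 0 = normal (toy rule)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Test data carries no class label; the model supplies the prediction.
X_test = rng.normal(size=(5, 4))
predictions = model.predict(X_test)
for label in predictions:
    print("fraud" if label == 1 else "normal")
```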

Random forest is a supervised learning algorithm that can be used for both classification
and regression. It is a flexible and easy-to-use algorithm. A forest is comprised of
trees, and the more trees it has, the more robust the forest is. Random forests create decision
trees on randomly selected data samples, get predictions from each tree, and select the best
solution by means of voting. They also provide a good indicator of feature importance.
Python's scikit-learn library includes CART-based decision trees and a random forest
classifier.
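The feature-importance indicator mentioned above is exposed directly by scikit-learn's classifier. A small sketch on synthetic data (the dataset here is generated, not the credit card data):

```python
# Feature importance from a fitted random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6,
                           n_informative=3, random_state=42)
forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Importances sum to 1; higher values mark more useful features.
for i, score in enumerate(forest.feature_importances_):
    print(f"feature {i}: {score:.3f}")
```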

Random forests have a variety of applications, such as recommendation engines, image
classification, and feature selection. They can be used to classify loyal loan applicants,
identify fraudulent activity, and predict diseases. They lie at the base of the Boruta
algorithm, which selects important features in a dataset.

Screen Shots

To run the project, double click on the 'run.bat' file to get the following screen:

In the above screen, click on the 'Upload Credit Card Dataset' button to upload the dataset.
After the dataset is uploaded, the below screen appears:

Now click on 'Generate Train & Test Model' to generate a training model for the Random
Forest Classifier.

In the above screen, after generating the model, we can see the total records available in the
dataset and how many records the application is using for training and testing. Now click
on the 'Run Random Forest Algorithm' button to generate the Random Forest model on the
train and test data.
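The train/test model generation step above amounts to splitting the records and reporting the counts. A minimal sketch, assuming an 80/20 split (the record count and split ratio here are illustrative):

```python
# Split the records and report the counts, as the application's screen does.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)   # stand-in for transaction records
y = np.zeros(1000, dtype=int)        # stand-in labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

print("Total records :", len(X))        # 1000
print("Training set  :", len(X_train))  # 800
print("Test set      :", len(X_test))   # 200
```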
In the above screen, we can see that Random Forest achieves 99.78% accuracy while
building the model on the train and test data. Now click on the 'Detect Fraud From Test
Data' button to upload test data and predict whether it contains normal or fraudulent
transactions.
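The accuracy figure above is simply the fraction of test labels the model predicts correctly, as computed by scikit-learn's `accuracy_score`. The labels in this sketch are illustrative, not the project's actual results:

```python
# Accuracy = correctly predicted labels / total labels.
from sklearn.metrics import accuracy_score

y_true = [0, 0, 1, 0, 1, 0, 0, 0, 1, 0]   # actual labels (1 = fraud)
y_pred = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]   # model output; one fraud case missed

acc = accuracy_score(y_true, y_pred)
print(f"Accuracy: {acc * 100:.2f}%")  # 90.00%
```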

In the above screen, we upload the test dataset; after uploading, we get the below
prediction details.

In the above screen, beside each test record, the application displays whether the
transaction carries a clean or a fraud signature. Now click on the 'Clean & Fraud
Transaction Detection Graph' button to see the total test transactions with clean and fraud
signatures in graphical format.

In the above graph, we can see the total test data and the number of normal and fraud
transactions detected: the x-axis represents the transaction type, and the y-axis
represents the count of clean and fraud transactions.
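The graph above can be sketched by counting the predicted labels and drawing a bar chart. The prediction list below is illustrative:

```python
# Count predicted labels and draw a bar chart: type on the x-axis,
# count on the y-axis.
from collections import Counter
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

predictions = ["clean", "clean", "fraud", "clean", "fraud", "clean"]
counts = Counter(predictions)

plt.bar(list(counts.keys()), list(counts.values()), color=["green", "red"])
plt.xlabel("Transaction type")
plt.ylabel("Count")
plt.title("Clean & Fraud Transaction Detection")
plt.savefig("fraud_graph.png")
print(counts["clean"], counts["fraud"])  # 4 2
```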

CHAPTER-12
CONCLUSION

The Random Forest algorithm demonstrates improved performance with a larger
amount of training data, although there is a trade-off with speed during testing and
application. The application could benefit from the implementation of more pre-processing
techniques. On the other hand, the SVM algorithm still faces challenges related to
imbalanced datasets; while the results achieved by SVM are significant, they could be
further improved with additional data preprocessing.

CHAPTER-13
REFERENCES/BIBLIOGRAPHY

1. Sudhamathy G, "Credit Risk Analysis and Prediction Modelling of Bank Loans
Using R," vol. 8, no. 5, pp. 1954-1966.
2. Li Changjian, Hu Peng, "Credit Risk Assessment for Rural Credit Cooperatives
Based on Improved Neural Network," International Conference on Smart Grid and
Electrical Automation, vol. 60, no. 3, pp. 227-230, 2017.
3. Wei Sun, Chen-Guang Yang, Jian-Xun Qi, "Credit Risk Assessment in Commercial
Banks Based on Support Vector Machines," vol. 6, pp. 2430-2433, 2006.
4. Amlan Kundu, Suvasini Panigrahi, Shamik Sural, "BLAST-SSAHA Hybridization
for Credit Card Fraud Detection," vol. 6, no. 4, pp. 309-315, 2009.
5. Y. Sahin, E. Duman, "Detecting Credit Card Fraud by Decision Trees and Support
Vector Machines," Proceedings of the International Multi-Conference of Engineers
and Computer Scientists, vol. I, 2011.
6. Sitaram Patel, Sunita Gond, "Supervised Machine (SVM) Learning for Credit Card
Fraud Detection," International Journal of Engineering Trends and Technology,
vol. 8, no. 3, pp. 137-140, 2014.
7. Snehal Patil, Harshada Somavanshi, Jyoti Gaikwad, Amruta Deshmane, Rinku
Badgujar, "Credit Card Fraud Detection Using Decision Tree Induction Algorithm,"
International Journal of Computer Science and Mobile Computing, vol. 4, no. 4,
pp. 92-95, April 2015.
8. Dahee Choi, Kyungho Lee, "Machine Learning Based Approach to Financial Fraud
Detection Process in Mobile Payment System," vol. 5, no. 4, pp. 12-24, December
2017.
