CHAPTER THREE

RESEARCH METHODOLOGY

3.1 Introduction

This chapter discusses the system analysis, tools, and methods employed in this research. The
methodology guides the systematic approach we take to collecting data, preparing it, selecting
models, and evaluating them. We aim to build a robust and accurate system capable of
identifying and stopping fraudulent transactions by applying machine learning to credit card
transaction data, thereby enhancing security and trust in digital payments. To gain insights,
validate our findings, and contribute to the field of fraud detection, we follow a rigorous
process throughout the study.

3.2 Research Methodology

3.2.1 Dataset

In this research, we use a dataset of credit card transactions made by European cardholders
over two days in September 2013. The dataset contains 284,807 transactions in total, of which
0.172% are fraudulent. It has 30 features: V1 to V28, Time, and Amount. All the attributes in
the dataset are numerical. The last column represents the class (type of transaction), where a
value of 1 denotes a fraudulent transaction and a value of 0 denotes a legitimate one. The
features V1 to V28 are not named for data security and integrity reasons (kaggle.com, 2021). To
address the class imbalance, we applied the Synthetic Minority Oversampling Technique (SMOTE)
in the data preprocessing phase of the proposed framework in Fig. 1. SMOTE works by selecting
minority-class samples that are close to each other in the feature space, drawing a line between
them, and creating a new minority-class instance at a point along that line.
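
The interpolation step described above can be illustrated with a minimal sketch. It is not the
implementation used in this study: the function name smote_sketch, the parameter k, and the toy
data are assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        # Pick a minority sample at random.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to it.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (values are made up for illustration).
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_sketch(fraud, n_new=6, k=2)
```

Each synthetic point lies on a segment joining two existing minority samples, so it stays inside
the region already occupied by the minority class. Oversampling of this kind is normally applied
to the training split only, so that the test data remain untouched.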

3.3 Method of Data Collection

Primary and secondary data collection are the two approaches used to gather information for
research or analysis.

Primary data collection involves gathering original information directly from the source or
through direct engagement with respondents. This strategy enables researchers to gather firsthand
knowledge that is specific to their study aims. There are several methods for gathering primary
data, including:

a. Questionnaires and Surveys: Researchers create structured questionnaires or surveys to obtain
information from people or groups. These can be administered face to face, by phone, by mail, or
through internet platforms.

b. Interviews: In interviews, the researcher and the respondent interact directly. They can take
place in person, over the phone, or via video conferencing. Structured interviews (with preset
questions), semi-structured interviews (providing flexibility), and unstructured interviews (more
conversational) are all options.

c. Observations: Researchers observe and document natural behaviors, acts, or occurrences. This
strategy can be used to collect data about human behavior, interactions, or occurrences without
requiring direct involvement.

d. Experiments: Experiments entail manipulating variables to see how they affect the outcome.
Researchers control the variables and gather data in order to draw conclusions about
cause-and-effect relationships.

e. Focus Groups: A focus group is a small group of people who meet in a controlled setting to
discuss a given issue. This strategy aids in comprehending the participants' ideas, perceptions,
and experiences.

Data collection entails gathering relevant information on a given subject of research. The
data for this research was acquired from a secondary source, namely the dataset available in
the Kaggle repository.

3.4 Research Approach


In this work, the Random Forest (RF) method is employed as the fitness method within the Genetic
Algorithm (GA). RF is used because it reduces the over-fitting problem that is sometimes seen
with standard Decision Trees (DTs). RF also performs well with both continuous and categorical
variables, and it is reported to perform well on datasets with class imbalance. In addition,
because RF is a rule-based technique, data normalization is not necessary (Khalilia et al.,
2011). Other tree-based ML methods, such as Extra-Trees and Extreme Gradient Boosting (Abhishek,
2020; Chen et al., 2015), are alternatives to RF.

The fitness method is defined as a function that receives a candidate solution (a feature vector)
and determines whether it is fit or not. The measure of fitness is the accuracy that a particular
feature vector yields in the testing process of the RF method within the GA. Algorithm 1 provides
more details about the implementation of RF in the GA.
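
The interaction between the GA and its fitness method can be sketched as follows. This is a
minimal illustration, not a reproduction of Algorithm 1: the function names, the selection,
crossover, and mutation settings, and the stand-in fitness function are all assumptions. To keep
the sketch runnable without external libraries, toy_fitness replaces the RF; in the approach
described above, the fitness function would instead train an RF on the selected features and
return its test accuracy.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximises fitness(mask)."""
    rng = random.Random(seed)
    # Initial population: random feature masks (1 = feature kept).
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]        # selection: keep the fitter half
        children = [ranked[0][:]]               # elitism: best mask survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best


# Stand-in fitness (hypothetical): rewards masks that keep features 0, 3 and 7
# and penalises extra features. An RF-based fitness would return test accuracy.
USEFUL = {0, 3, 7}


def toy_fitness(mask):
    kept = {i for i, g in enumerate(mask) if g}
    return len(kept & USEFUL) - 0.1 * len(kept - USEFUL)


best_mask = genetic_feature_selection(n_features=10, fitness=toy_fitness)
```

Because the fitness function is passed in as an argument, the search loop does not depend on the
classifier, and an RF-based scorer can be substituted without changing the GA itself.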

The architecture of the proposed methodology is depicted in Fig. 3.


Architecture of the proposed framework

3.6 Tools and Techniques


References

1. The Credit card fraud [Online]. https://www.kaggle.com/mlg-ulb/creditcardfraud
2. Dornadula VN, Geetha S. Credit card fraud detection using machine learning algorithms.
Proc Comput Sci. 2019;165:631–41.
3. Campus K. Credit card fraud detection using machine learning models and collating
machine learning models. Int J Pure Appl Math. 2018;118(20):825–38.
4. Varmedja D, Karanovic M, Sladojevic S, Arsenovic M, Anderla A. Credit card fraud
detection-machine learning methods. In: 18th international symposium INFOTEH-JAHORINA
(INFOTEH); 2019. p. 1–5.
5. Awoyemi JO, Adetunmbi AO, Oluwadare SA. Credit card fraud detection using machine
learning techniques: a comparative analysis. In: International conference on computer
networks and Information (ICCNI); 2017. p. 1–9.
6. Guo S, Liu Y, Chen R, Sun X, Wang X. Improved SMOTE algorithm to deal with
imbalanced activity classes in smart homes. Neural Process Lett. 2019;50(2):1503–26.
7. Khalilia M, Chakraborty S, Popescu M. Predicting disease risks from highly
imbalanced data using random forest. BMC Med Inf Decis Mak. 2011;11(1):1–13.
8. Abhishek L. Optical character recognition using ensemble of SVM, MLP and extra
trees classifier. In: International conference for emerging technology (INCET).
IEEE; 2020. p. 1–4.
9. Chen T, He T, Benesty M, Khotilovich V, Tang Y, Cho H. Xgboost: extreme
gradient boosting. R package version 0.4-2. 2015;1(4):1–4.
