
Evaluating Random Forest, XGBoost, and Artificial Neural Network in Detecting Credit Card Fraud


James Martin, Vincent Febrien, Laurensius Fabian Novandito, Henry Lucky, Derwin Suhartono
Computer Science Department, School of Computer Science
Bina Nusantara University, Jakarta, Indonesia
james.martin@binus.ac.id, vincent.febrien@binus.ac.id, laurensius.fabian@binus.ac.id, henry.lucky@binus.ac.id, dsuhartono@binus.edu

Abstract—Credit card usage is constantly increasing due to the ease of making transactions in online systems. This high volume of usage has led many fraudsters to commit credit card fraud to exploit online payment systems. Because credit card usage is so high, detecting fraud with human resources alone may not be effective. Hence, the aim of this paper is to propose and evaluate several models for detecting credit card fraud, namely Random Forest, XGBoost, and Artificial Neural Network. In the gathered dataset, we found that the fraud and non-fraud classes are highly imbalanced. Hence, we applied two sampling methods, Random Undersampling and SMOTE. Based on the results, all models in this research perform quite well without the help of any sampling method. Furthermore, Random Undersampling was found to degrade performance in detecting credit card fraud. Moreover, models trained with SMOTE show relatively balanced precision and recall.

Keywords—Credit Card Fraud, Machine Learning, Sampling, Random Forest, XGBoost, Artificial Neural Network

I. INTRODUCTION

The popularity of credit card payments is continuously increasing as people move further toward online shopping and as technology develops. Most people use card-not-present transactions through online payment systems to pay in e-commerce or other transactions [1]. This integration between credit cards and e-payment platforms brings convenience to many people in their daily lives. For this reason, credit card usage has spread all over the world, and credit cards have gradually become the most popular and efficient means of payment [2]. It is claimed that these high usage levels may lure fraudsters into committing more criminal activities, including credit card fraud [3].

Fraud is defined as any activity that involves deceiving another for the purpose of obtaining a gain such as money or property [4]. With fraud being so common, there are numerous ways to categorize it, and credit card fraud is one of them. In general, there are two types of credit card fraud: application fraud and behavior fraud [5]. Application fraud occurs when a fraudster uses false information or other people's information to apply for a new credit card. Behavior fraud, on the other hand, is committed by stealing legitimate card details to make purchases. According to [6], account falsification, that is, obtaining other people's personal details to get a legitimate card that can afterwards be duplicated and used without the cardholder's permission, is the most common type of card fraud. In contrast, stealing a credit card is a minor type of fraud.

Once fraudsters obtain personal details from a cardholder, they use them to create a fraudulent card and carry out illegal transactions. Such incidents frustrate many cardholders, who were not able to report the theft, loss, or fraudulent use of their card [7]. It is estimated that card fraud accounts for about 0.055 percent of all credit card accounts worldwide [6]. Statistics from [8] suggest that credit card fraud may cause losses of about $1 billion, a figure expected to exceed $3 billion in the future. Losses of this magnitude pose a threat to many enterprises and public institutions, which must face a growing presence of fraudulent activities and, at the same time, find automatic systems capable of detecting fraud. Automatic systems are far more effective here, because humans are unable to detect fraudulent patterns in transaction datasets that often contain a vast number of samples with many labels and online updates [7].

Although credit cards serve the needs of many people, they also have drawbacks, including security loopholes that irresponsible people exploit for their own benefit at the expense of others. The resulting economic loss is felt by both the client and the bank. For this reason, a model capable of detecting fraud is important to avoid unintended economic consequences [6].

There are many types of algorithms used for fraud detection. Most studies that conduct research on

XXX-X-XXXX-XXXX-X/XX/$XX.00 ©20XX IEEE


detecting credit card fraud commonly use a neural network model as the main algorithm [9][10][11][12]. Other researchers used ensemble and other classical models, for instance Random Forest, Logistic Regression, and Decision Tree [13][14]. Although many algorithms are capable of detecting credit card fraud, some may not perform well when the data distribution is imbalanced. In general, an imbalanced dataset is a factor that affects model performance in detection tasks [7]. Hence, some studies apply sampling to maintain model performance by balancing the number of samples per class, for instance the SMOTE algorithm for oversampling the dataset [14]. All the methods and models mentioned above are based on machine learning algorithms.

This paper aims to conduct an experimental comparison of several machine learning models and techniques for detecting credit card fraud. Since many algorithms are capable of this task, this experiment implements only three models: Random Forest, Extreme Gradient Boosting (XGBoost), and Artificial Neural Networks. Moreover, to deal with class imbalance, the experiment begins by analyzing the dataset, followed by applying sampling methods where required. The results are reported in terms of each model's precision, recall, and AUC-PR in detecting credit card fraud. Where sampling is performed, the results also include an evaluation of whether the sampling method improves the model's performance.

The rest of this paper is organized as follows. Section II discusses relevant studies on fraud detection. Section III describes the research methodology used in the current study. Section IV discusses the results and analysis of this study. Section V concludes the paper.

II. RELATED WORKS

Researchers have taken many approaches to credit card fraud detection, and numerous machine learning algorithms have been developed to counter credit card fraud. This section reviews several related studies to contextualize the present work. As there are many state-of-the-art methods, this section is divided into four parts: research on simple machine learning, ensemble learning, deep learning, and methods of handling imbalanced datasets.

A. Simple Machine Learning Approaches

An experiment by [15] shows that the K-Nearest Neighbor (KNN) model outperformed the other models in prediction, with an accuracy of 99.4%. They handled the minority class with the Adaptive Synthetic Sampling Approach for Imbalanced Learning (ADASYN) oversampling technique. According to [16], One-Class Support Vector Machine (OCSVM) and the T2 control chart can be effective in detecting credit card fraud. Both methods learn the characteristics of a single class and determine whether new observations are abnormal. The results show that OCSVM outperforms the T2 control chart, with an accuracy of 96.6%, an FPR of 8.5%, and an F-score of 100%. However, the T2 control chart is still reliable in detecting credit card fraud. These results are also supported by [17], who compared various learning methods and found that support vector machine achieves the highest accuracy in addressing problems such as Risky MCC, Unknown web address, ISO-Response Code, and Transaction above $100, and that it can reach a higher level of performance with resampling.

Research by [18] compared several machine learning algorithms, such as Decision Tree, K-NN, Logistic Regression, Random Forest, and Naïve Bayes, and found that Decision Tree gives the best result in detecting fraudulent transactions. However, this method needs to be assisted by unsupervised methods so that its results and performance meet the researchers' expectations.

B. Ensemble Learning Approaches

Ensemble methods have been shown to be more accurate in solving real-world problems than a single classifier [13][19]. The research by [13] introduced two types of random forest for the credit card fraud problem. The first is a random forest that uses a Manhattan distance function to compute the centers of the different classes of data. The second is a random forest that uses the minimum Gini impurity criterion to split values. The main difference between these two algorithms is the way nodes are split, and both have their own advantages and drawbacks: the first model is faster in calculation but slower in distributing the data, whereas the second is slower in calculation but faster in distributing the data.

Another study by [20] used an ensemble algorithm with Fuzzy Rough Nearest Neighbor (FRNN) and Sequential Minimal Optimization (SMO) as the base classifiers, coupled with Logistic Regression (LR). This combination of classification models achieved significant and promising results in terms of detection rate, false alarm rate, specificity, positive predictive value, F-measure, ROC curves, and AUC. In their experiments, the detection rate on the Australian credit approval dataset is 85.95% with an AUC of 0.8555, and the detection rate on the German credit approval dataset is 76.3% with an AUC of 0.6795, showing that the algorithm can produce effective detection.

Another ensemble method combines Random Forest and a Feed Forward Network [21]. Both methods have disadvantages, so they complement each other to generate high-quality predictions: the neural network correctly identifies fraudulent transactions, while Random Forest correctly recognizes normal transactions. As a result, this method outperforms other popular machine learning models, with a precision of 85.85% and an accuracy of 99.95%.

Moreover, a study by [19] proposed a novel framework, OXGBoost, an XGBoost model that uses the hyperparameter optimizer RandomizedSearchCV to help the model fit the data without a sampling technique. The results indicated that the model performed well compared to an XGBoost model that used a sampling method. Another study by [22] used an optimized light gradient boosting approach (OLightGBM). This approach used Bayesian-based hyperparameter optimization to tune model parameters such as the number of leaves per tree, the maximum tree depth, and the learning rate. The experimental results observed
that the proposed approach scored higher in terms of AUC, accuracy, recall, and precision than several classification algorithms, such as logistic regression and SVM.

Another study combining data science and machine learning was conducted by [23] to find algorithms that can track and detect all fraudulent credit card transactions. They compared Isolation Forest and Local Outlier Factor and found that both have similar accuracy, but Isolation Forest gives better precision, recall, and F1-score.

Another study [24] shows that time variables have a significant impact on the performance of machine learning algorithms. The researchers reported that Naïve Bayes, SVM, and Decision Tree perform better, with a higher AUC, on the dataset with time variables than on the dataset without them, whereas AdaBoost and Bagging show a slight performance increase on the dataset without time variables. They also stated that future work is needed to handle imbalanced datasets to improve efficiency.

To provide more insight into suitable algorithms for fraud detection, [25] compared Random Forest and AdaBoost and found that they perform similarly, especially in accuracy. However, even with the same accuracy, Random Forest still performs better in terms of precision, recall, and F1-score.

Another study by [26] further showed that Random Forest gives better results than other classifiers. They compared Random Forest, Tree Classifiers, ANN, Support Vector Machine, Naïve Bayes, Logistic Regression, and Gradient Boosting to find the most efficient machine learning method for detecting fraudulent transactions, aiming for the highest efficiency among all tested methods. To achieve this, the classifier is supported by feedback from future transactions, so that its efficiency can improve over time as more data is fed to it.

C. Deep Learning Approaches

Various fraud detection models have been developed to help reduce fraudulent activities. For example, a recent study by [10] proposed a deep learning-based approach. After comparing it with other machine learning algorithms such as KNN and support vector machine, they found that an artificial neural network (ANN), with an accuracy of up to 100%, gives the best accuracy for credit card fraud detection.

A novel framework by [11] that combined gradient boosted trees with neural networks showed satisfactory performance on tabular data. It starts by creating a sequential structure of layers with input and output layers, where the weights are determined by the feature importance of a gradient boosted tree. Moreover, each layer in the model is connected to a gradient boosted tree. During training, forward and backward propagation ensure that the weights of all layers are updated by gradient descent, and training is repeated until the model reaches an optimal result. As for performance, the model achieved a training accuracy of 69.8% and a testing accuracy of 71.33% on the German credit card fraud dataset.

Another novel framework by [9] used deviation networks to determine whether an object is an anomaly by learning an anomaly score with a Gaussian prior and a Z-score-based deviation loss. According to this research, deviation networks achieve the best AUC-ROC and AUC-PR compared to other anomaly detection algorithms. The authors state that the framework can be improved significantly by randomly sampling the negative and positive feature sets. Overall, it performs well in detecting anomalies such as credit card fraud.

Furthermore, a study by [27] proposed a multilayer perceptron (MLP) model for detecting credit card fraud and tried several configurations to find an optimal model. The MLP with 2 hidden layers of 1000 neurons and ReLU activation obtains the highest F1-score, 87%. The highest precision, 96%, is achieved by the MLP with 2 hidden layers of 10 neurons using either ReLU or tanh activation. The highest recall, about 83%, is achieved by the MLP with 3 hidden layers of 100 neurons and logistic activation. Because the imbalanced data was not handled, the model failed to detect some credit card fraud.

The study in [28] compares a sequence learner, Long Short-Term Memory (LSTM), with a static learner, Random Forest. LSTM is a special type of Recurrent Neural Network that processes information as a sequence of data entering, being stored in, and leaving the network. The results show that the sequence learner (LSTM) strongly improves fraud detection accuracy, especially on offline transactions. Manual feature aggregation strategies could benefit both sequential and non-sequential learning approaches.

D. Methods of Handling Imbalanced Dataset

A study by [29] combined Random Undersampling with a Gaussian Mixture Model to balance the data distribution. According to [29], undersampling performs better than oversampling in handling imbalanced datasets because oversampling may easily cause overfitting. Moreover, instead of sampling at random, the study proposed an undersampling method based on a Gaussian Mixture Model. The study used 16 public datasets, including the credit card transaction dataset, and found that the undersampled dataset achieves good AUC and F1 values with the decision tree model. Moreover, a study by [30] used a clustering algorithm that combines the nearest neighbors of the minority class to create balanced classes. The results show that the average AUC of random forest based on partitioning and clustering is about 0.965, while the average AUC of random forest based on undersampling is about 0.947. Therefore, the proposed technique improved the performance of the Random Forest model.

Another study by [31] on a highly imbalanced dataset showed that unsupervised machine learning handles data skewness better than supervised methods. The researchers compared unsupervised and supervised algorithms, and the unsupervised algorithms gave the best classification results. During the test, there were cases in which the classifier detected no true positives and no true negatives.
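Plain random undersampling is the baseline that [29] replaces with a Gaussian-Mixture-based selection and that [30] compares against, and it is also one of the two sampling methods evaluated later in this paper. A minimal sketch follows, assuming the data fits in memory as Python lists; `random_undersample` is a hypothetical helper name for illustration, not a library API and not the implementation used in the paper.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest one.

    Hypothetical helper: X is a list of feature rows, y the matching class labels.
    """
    rng = random.Random(seed)
    n_min = min(Counter(y).values())          # size of the minority class
    rows_by_class = defaultdict(list)
    for i, label in enumerate(y):
        rows_by_class[label].append(i)
    keep = []
    for idx in rows_by_class.values():
        # Minority rows are all kept; larger classes are sampled without replacement.
        keep.extend(idx if len(idx) == n_min else rng.sample(idx, n_min))
    keep.sort()                               # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


X = list(range(100))
y = [1] * 5 + [0] * 95                        # 5 "fraud" rows, 95 "non-fraud" rows
X_bal, y_bal = random_undersample(X, y, seed=1)
print(len(X_bal), y_bal.count(0), y_bal.count(1))  # 10 5 5
```

Every minority row survives, while most majority rows are discarded; this loss of majority-class data is the trade-off that informed undersampling methods such as [29] and [30] try to reduce.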
In data preprocessing, [32] made a slight modification before training. First, they used a clustering method to group cardholders by their transaction amounts. Then, each transaction was assigned to its respective group using the Sliding-Window method. Furthermore, the classifier performed better after SMOTE (Synthetic Minority Over-Sampling Technique) was applied to the dataset. In dealing with the imbalanced dataset, the Matthews Correlation Coefficient performed more effectively than one-class classifiers. Another way to handle imbalanced datasets, proposed by [33], is Generative Adversarial Networks. These deep learning models can improve classifier performance by creating a large number of minority-class examples that are used to rebalance the dataset. Since credit card datasets often do not include detailed sequential information, [34] proposes history-based features using Hidden Markov Models (HMMs) to solve this problem. They generate eight new HMM-based features and optimize their parameters from three perspectives for modelling a sequence of transactions. In the end, the 8 trained HMMs model 4 types of behavior. With this approach, a wide range of sequential information can be included in the dataset.

In conclusion, each machine learning algorithm has its own advantages and disadvantages in detecting credit card fraud. Some researchers also created their own modeling frameworks to detect fraud more accurately, and most of these frameworks use more than one machine learning model. Since most of the related works used ensemble learning and deep learning, we decided to compare two ensemble learning models, Random Forest and XGBoost, and one deep learning model, ANN, in detecting credit card fraud. XGBoost can improve weak learner performance better than some other boosting algorithms [35]. Moreover, both oversampling and undersampling significantly change how models train on imbalanced datasets [36]. Therefore, we decided to examine three different models combined with two different sampling methods for dealing with the imbalanced dataset.

III. METHODOLOGY

This section describes the methodology used in this research to compare several machine learning algorithms, namely Random Forest, XGBoost, and Artificial Neural Networks (ANN), in detecting credit card fraud. All models in this research are trained on the same preprocessed dataset. The workflow consists of data gathering, data preprocessing, undersampling and oversampling, modeling, and model evaluation, as shown in Figure 1. Each step of the methodology is explained in the following sections.

Fig. 1. Workflow of the research

A. Data Gathering

The dataset used in this research was obtained from the Kaggle website, where it was collected and analyzed by the Machine Learning Group of ULB (Université Libre de Bruxelles) in research on big data mining. The dataset contains credit card transactions made by European cardholders in September 2013. Of the 284,807 transactions, 492 are fraudulent and 284,315 are non-fraudulent. The dataset is therefore highly imbalanced, with fraud accounting for 0.172% of all transactions, so the class imbalance problem must be taken into consideration.

The dataset contains only numerical values, and each transaction consists of 31 features: Time, Amount, Class, and the remaining 28 features, which are components obtained from a PCA (principal component analysis) transformation. The Time feature contains the number of seconds elapsed between each transaction and the first transaction in the dataset. The Amount feature is the transaction amount. The Class feature is the target variable, taking the value 1 for fraud and 0 otherwise. For the PCA features, the original information is not provided by the publisher of the dataset. Moreover, the dataset contains no null values in any feature.

B. Preprocessing Data

This section explains the methods used to preprocess the data so that the models can perform well in training. The methods below were applied sequentially in this research, followed by their results:

a. Feature Scaling

TABLE I. DATASET DESCRIPTION (BEFORE SCALING)

Statistic | Time | Amount | V1 | V2 | V3 | ...
mean | 94813.86 | 88.35 | 0.00 | 0.00 | 0.00 | ...
std | 47488.1 | 250.12 | 1.96 | 1.65 | 1.52 | ...
min | 0.00 | 0.00 | -56.41 | -72.72 | -48.33 | ...
max | 172792.0 | 25691.16 | 2.45 | 22.06 | 9.38 | ...
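As a worked check of the scale gap in Table I, the following sketch applies the standardization defined by formulas (1) to (3) in the next paragraph. It uses only the Python standard library; `standardize` and `scaled_range` are hypothetical helper names for illustration, and the population standard deviation (dividing by N, as in formula (3)) is assumed, which is also what scikit-learn's `StandardScaler` computes.

```python
import math


def standardize(values):
    """Standardize one feature column: z = (x - mean) / std, with std dividing by N."""
    n = len(values)
    mu = sum(values) / n                                        # formula (2)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)  # formula (3)
    return [(x - mu) / sigma for x in values]                   # formula (1)


# (mean, std, min, max) of the two unscaled features, copied from Table I.
TABLE_I = {
    "Time": (94813.86, 47488.1, 0.00, 172792.0),
    "Amount": (88.35, 250.12, 0.00, 25691.16),
}


def scaled_range(feature):
    """Map a feature's Table I minimum and maximum through formula (1)."""
    mu, sigma, lo, hi = TABLE_I[feature]
    return (lo - mu) / sigma, (hi - mu) / sigma


if __name__ == "__main__":
    for name in TABLE_I:
        lo, hi = scaled_range(name)
        print(f"{name}: min {lo:.2f}, max {hi:.2f}")
    # Time: min -2.00, max 1.64
    # Amount: min -0.35, max 102.36
```

Standardization fixes each column's mean and spread but not its shape: Time ends up in roughly [-2.00, 1.64], while Amount, whose Table I maximum is far above its mean, still reaches about 102 standard deviations above its mean after scaling.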
Feature scaling is a strategy used to rescale independent features into a common range. As shown in Table I, the values between
the transformed features (V1, V2, V3, …) and the non-transformed features, such as Time and Amount, differ greatly in range. These differences may cause problems for certain machine learning models by affecting their performance in training; for instance, an unscaled dataset may cause slow learning in an ANN. Therefore, we decided to scale the Time and Amount features using the standard scaler, commonly known as standardization. Standardization rescales a feature to a mean of 0 and a standard deviation of 1. The result of the feature scaling can be seen in Table II. The standardization formulas are as follows:

$$ z = \frac{x - \mu}{\sigma} \tag{1} $$

$$ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{2} $$

$$ \sigma = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \mu \right)^2 } \tag{3} $$

Formula (1) computes the standardized value z of a feature value x from the feature's mean μ and standard deviation σ. Formula (2) computes the mean over the N samples, and formula (3) computes the standard deviation.

TABLE II. DATASET DESCRIPTION (AFTER SCALING)

Statistic | Time | Amount | V1 | V2 | V3 | ...
mean | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ...
std | 1.00 | 1.00 | 1.96 | 1.65 | 1.52 | ...
min | -2.00 | -0.35 | -56.41 | -72.72 | -48.33 | ...
max | 1.64 | 102.36 | 2.45 | 22.06 | 9.38 | ...

b. Data Splitting and Distribution

Separating data into training and testing sets is an essential part of evaluating machine learning models. In this research, the majority of the dataset is used for training and the remainder for testing: the training dataset contains 70% of the original dataset, and the testing dataset contains the remaining 30%. Because the fraud class is highly imbalanced, its distribution must be considered. Therefore, stratified sampling is performed during the split to ensure that both datasets have the same fraud-class distribution. The result of the data splitting and distribution can be seen in Table III.

TABLE III. RESULT OF THE DATA SPLITTING AND DISTRIBUTION

Dataset | Fraud | Non-fraud | Total | Fraud rate (Fraud/Total)
Training | 344 | 199,020 | 199,364 | 0.00172
Testing | 148 | 85,295 | 85,443 | 0.00173

Based on Table III, the fraud rates of the training and testing datasets are almost equal, so the data splitting distributed the fraud class well across both datasets. Having an equal class distribution in each dataset can help models learn the characteristics of both non-fraud and fraud transactions [37].

C. Random Sampling

A variety of techniques exist in practice for dealing with imbalanced datasets. In this research, we use random sampling to handle the imbalanced dataset. There are two types of random sampling: undersampling and oversampling. Undersampling reduces the number of majority-class samples, while oversampling produces more minority-class instances, either synthetically or by resampling existing instances. According to Table III, the fraud class is the minority and the non-fraud class is the majority.

Following several studies that applied random sampling to the training dataset [14][29], we apply sampling to the training dataset only. The testing dataset is not resampled, since we want it to keep its original distribution. The sampling methods used in this research and their results are described below:

a. Random Undersampling

Random Undersampling randomly removes a subset of the majority class to balance the majority and minority classes of the dataset [38]. The majority class is reduced to the size of the minority class. As a result, the minority and majority classes in the training dataset each contain 343 samples.

b. SMOTE (Synthetic Minority Oversampling Technique)

Besides Random Undersampling, we use the oversampling method SMOTE (Synthetic Minority Oversampling Technique). SMOTE interpolates between neighboring minority-class instances, increasing the size of the minority class by creating new minority examples in their neighborhood. The minority class is increased to the size of the majority class. As a result, the minority and majority classes in the training dataset each contain 199,020 samples.

D. Modelling

This section explains how the models used in this research, namely Random Forest, XGBoost, and ANN, are built. As described in the previous section, the training dataset is sampled using Random Undersampling and SMOTE. Hence, to evaluate the sampling methods on each model, we
decided to use three different training datasets, which are the going to set several parameters which can be seen in Table
original dataset, undersampled dataset, and SMOTE dataset for V. The parameters setting is applied to all the XGBoost
the model training. After all models are trained, the result of models.
their performance will be evaluated in the next section.
TABLE V. PARAMETER SETTINGS ON XGBOOST
a. Random Forest
Parameter form Value
Random forest (RF) methodology is used to address two Number of trees 200
main classes of problems: to construct a prediction rule in Learning rate 0.001
a supervised learning problem and to assess and rank Gamma 0
variables with respect to their ability to predict the
Maximum depth of a tree 6
response [39]. It uses the classification and regression
Minimum sum of instance weight in child 1
method based on the large number of decision tree
aggregation. There are 3 variants of Random Forest,
which are: c. ANN (Artificial Neural Network)

• How individual trees are constructed. This section involves building the ANN model in
detecting credit card fraud. ANN (Artificial Neural
• Based on the procedure used to generate
Network) is a model that consists of several connected
modified data sets on each tree.
layers where each layer contains a certain purpose. There
• The aggregation of each prediction on each tree
are 3 typical layers in ANN, such as input layer, hidden
to produce unique consensus prediction.
layer, and output layer. The input layer is the beginning
For the Random Forest that we implemented, each tree uses the decrease of Gini impurity (DGI) as its splitting criterion and selects the splitting predictor from a randomly selected subset of predictors; the predictions of all trees are then aggregated through majority voting.

Furthermore, RF provides an out-of-bag (OOB) error. The OOB error is the average error rate obtained when each observation in the data set is predicted only by the trees for which it is OOB, that is, the trees whose bootstrap samples did not include it. This validation can be a good estimator of the error to be expected on independent data.

The Random Forest implementation is taken from the ensemble learning module provided by the Sklearn library. To improve the model's performance, we set the parameters listed in Table IV. These parameter settings are applied to all Random Forest models.

TABLE IV. PARAMETER SETTINGS ON RANDOM FOREST
Parameter | Value
Number of trees | 200
Maximum depth of the tree | 15
Split criterion | Gini
Minimum number of samples to split | 2
Minimum number of samples to be a leaf node | 1
Maximum leaf nodes | None
Maximum samples | None
Number of jobs run in parallel | None

b. XGBoost

This section involves building the XGBoost model for detecting credit card fraud. XGBoost (Extreme Gradient Boosting) is an ensemble learning method that implements several machine learning algorithms under the gradient boosting framework [40]. The implementation of the model is taken from the XGBoost library. To improve the model's performance, we are

layer for the workflow for ANN that is used to receive variable input. The hidden layer is the layer located between the input and output layers and is used to transform values according to its assigned activation function. The output layer is the last layer, which produces the result of the ANN.

The implementation of ANN starts with deciding which layers will be used in the network. The detailed list of layers in the ANN is provided, in order, in Table VI. The model also uses the Adam optimizer to update the weights of the ANN, and the loss function used in this model is binary cross-entropy. Moreover, the entire ANN is implemented with the Keras module provided by the TensorFlow library, and the layers are stacked using the Keras Sequential model.

TABLE VI. LIST OF LAYERS IN ANN
Layer | Description
Input layer | Receives the training dataset
Dense layer (ReLU activation) | Outputs 300 units with ReLU activation
Batch normalization | Normalizes values
Dropout | Randomly sets inputs to 0 at a rate of 0.3 (prevents overfitting)
Dense layer (ReLU activation) | Outputs 300 units with ReLU activation
Batch normalization | Normalizes values
Dropout | Randomly sets inputs to 0 at a rate of 0.3 (prevents overfitting)
Dense layer (Sigmoid activation) | Output layer with sigmoid activation (value between 0 and 1)

After the model is compiled, it can be fitted directly to the training dataset. To improve the model's performance, several training parameters need to be assigned so that the model trains effectively. Moreover, we also decided to use early stopping in the ANN.
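The early-stopping rule applied to the ANN can be illustrated with a small, framework-free sketch that replays it over a list of per-epoch training losses. This is not the authors' code. It assumes the rule behaves like Keras's `tf.keras.callbacks.EarlyStopping` with `monitor="loss"`, `min_delta=0.001`, `patience=5`, and `restore_best_weights=True`. That mapping is our reading of the settings in Table VII; the paper does not state the exact callback arguments. The helper `early_stopping_epoch` and the loss values are hypothetical.

```python
from typing import List, Tuple


def early_stopping_epoch(
    losses: List[float], min_delta: float = 0.001, patience: int = 5
) -> Tuple[int, int]:
    """Hypothetical helper: replay early stopping over per-epoch training losses.

    Returns (stop_epoch, best_epoch), both 0-indexed. An epoch counts as an
    improvement only if its loss beats the best loss so far by more than
    min_delta (assumed to mirror Keras EarlyStopping in "min" mode).
    Training stops once `patience` consecutive epochs pass without such an
    improvement. best_epoch is the last epoch that counted as an improvement,
    i.e. the weights that would be restored. If the loss keeps improving,
    training runs through the last epoch.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # consecutive epochs without a sufficient improvement
    for epoch, loss in enumerate(losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop early
    return len(losses) - 1, best_epoch  # ran all epochs


if __name__ == "__main__":
    # Hypothetical loss curve: improves until epoch 2, then changes by < 0.001.
    losses = [1.0, 0.8, 0.7, 0.6995, 0.6994, 0.6993, 0.6992, 0.6991, 0.5]
    print(early_stopping_epoch(losses))  # (7, 2)
```

In this hypothetical run, the loss stops improving by more than 0.001 after epoch 2. Training therefore halts at epoch 7 and keeps the epoch-2 weights, even though later epochs lower the loss by tiny amounts. Monitoring a validation loss (`monitor="val_loss"`) instead of the training loss is the more common choice when the goal is specifically to detect overfitting.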
Early stopping, which prevents the model from overfitting, is activated when the model does not reduce the training loss over a certain number of epochs. If training is stopped early, the model that achieved the lowest loss is returned. The parameter settings are provided in Table VII.

TABLE VII. PARAMETER SETTINGS ON ANN
Parameter | Value
Batch size | 2100
Number of epochs | 100
Minimum improvement | 0.001
Number of epochs to wait if there is no improvement | 5

IV. RESULTS AND DISCUSSION

This section presents the results of each model combined with different sampling methods for dealing with the imbalanced dataset. Each model was trained and evaluated on the credit card fraud dataset. The results are followed by a discussion comparing the performance of the models.

Because the dataset is imbalanced, the evaluation metrics need to be discussed first. As suggested in [36], precision-recall metrics are better suited to measuring model performance when the classes are imbalanced, since they are claimed to be more sensitive to class imbalance. Hence, precision, recall, and AUC-PR (the area under the precision-recall curve) are used as the evaluation metrics in this research.

TABLE VIII. PERFORMANCE OF ALL MODELS
Model | Precision (%) | Recall (%) | AUC-PR (%)
RF | 92.79 | 69.59 | 81.05
RF + RU | 7.08 | 85.81 | 46.28
RF + SMOTE | 72.56 | 80.41 | 76.33
XGB | 81.67 | 66.22 | 73.80
XGB + RU | 3.67 | 87.16 | 45.25
XGB + SMOTE | 13.34 | 81.76 | 47.39
ANN | 90.08 | 73.65 | 79.93
ANN + RU | 6.72 | 85.81 | 34.12
ANN + SMOTE | 82.22 | 75.00 | 77.92
RF: Random Forest
XGB: XGBoost
ANN: Artificial Neural Network
RU: Random Undersampling
SMOTE: Synthetic Minority Oversampling Technique

Table VIII shows the precision, recall, and AUC-PR scores of all models described in the previous section. Random Forest without any sampling method achieved the highest precision and AUC-PR, at 92.79 and 81.05, respectively. On the other hand, XGBoost with Random Undersampling achieved the highest recall, at 87.16.

Models trained with Random Undersampling have poor precision and decent recall. This means that their positive (fraud) predictions are unreliable, while the negative class (non-fraud) is still predicted well. These models catch most fraudulent transactions, but with a precision below 8%, more than 92% of the transactions they flag as fraud are actually legitimate. Therefore, Random Undersampling is not an effective method for credit card fraud detection.

Furthermore, the models trained with SMOTE achieve balanced precision and recall scores for Random Forest and ANN. On the other hand, XGBoost trained with SMOTE shows poor precision. The XGBoost model overfits due to the excessive number of samples generated by SMOTE. Hence, the use of SMOTE may need to be considered separately for each model.

V. CONCLUSIONS

This paper presented the results of Random Forest, XGBoost, and ANN in detecting credit card fraud, together with two sampling methods for dealing with the imbalanced dataset: Random Undersampling and SMOTE. Overall, the models still perform quite well without any sampling method. Random Undersampling may not be appropriate for improving model performance, since it randomly removes majority-class samples and can therefore discard potentially useful information. SMOTE may be suitable for models that are not too sensitive to overfitting; the results show that Random Forest and ANN achieve a balance between precision and recall with it. Nevertheless, the models could still be improved with other methods and different sets of parameters, so more research is needed. Moreover, the proposed sampling methods may not fully improve model performance in detecting credit card fraud. Therefore, finding an effective method for dealing with imbalanced datasets remains a topic for future study.

REFERENCES
[1] S. Gupta and R. Johari, “A new framework for credit card transactions involving mutual authentication between cardholder and merchant,” in Proceedings - 2011 International Conference on Communication Systems and Network Technologies, CSNT 2011, 2011, pp. 22–26. doi: 10.1109/CSNT.2011.12.
[2] J. M. Pavía, E. J. Veres-Ferrer, and G. Foix-Escura, “Credit card incidents and control systems,” International Journal of Information Management, vol. 32, no. 6, pp. 501–503, 2012, doi: 10.1016/j.ijinfomgt.2012.03.003.
[3] A. Somasundaram and S. Reddy, “Parallel and incremental credit card fraud detection model to handle concept drift and data imbalance,” Neural Computing and Applications, vol. 31, pp. 3–14, 2019, doi: 10.1007/s00521-018-3633-8.
[4] I. Sadgali, N. Sael, and F. Benabbou, “Performance of machine learning techniques in the detection of financial frauds,” in Procedia Computer Science, 2019, vol. 148, pp. 45–54. doi: 10.1016/j.procs.2019.01.007.
[5] S. Xuan, G. Liu, Z. Li, L. Zheng, S. Wang, and C. Jiang, “Random forest for credit card fraud detection,” in 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), 2018, pp. 1–6. doi: 10.1109/ICNSC.2018.8361343.
[6] J. M. Pavía, E. J. Veres-Ferrer, and G. Foix-Escura, “Credit card incidents and control systems,” International Journal of Information Management, vol. 32, no. 6, pp. 501–503, 2012, doi: 10.1016/j.ijinfomgt.2012.03.003.
[7] A. Dal Pozzolo, O. Caelen, Y. A. le Borgne, S. Waterschoot, and G. Bontempi, “Learned lessons in credit card fraud detection from a practitioner perspective,” Expert Systems with Applications, vol. 41, no. 10, pp. 4915–4928, 2014, doi: 10.1016/j.eswa.2014.02.026.
[8] ACFE, “The 2007 Fraud Examiners Manual,” ACFE, 2007.
[9] G. Pang, C. Shen, and A. van den Hengel, “Deep anomaly detection with deviation networks,” in Proceedings of the ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, Jul. 2019, pp. 353–362. doi: 10.1145/3292500.3330871.
[10] A. RB and S. K. KR, “Credit card fraud detection using artificial neural network,” Global Transitions Proceedings, vol. 2, no. 1, pp. 35–41, Jun. 2021, doi: 10.1016/j.gltp.2021.01.006.
[11] T. Sarkar, “XBNet: An Extremely Boosted Neural Network,” Jun. 2021, [Online]. Available: http://arxiv.org/abs/2106.05239
[12] K. Fu, D. Cheng, Y. Tu, and L. Zhang, “Credit Card Fraud Detection Using Convolutional Neural Networks,” in Neural Information Processing, 2016, pp. 483–490.
[13] S. Xuan, G. Liu, Z. Li, L. Zheng, S. Wang, and C. Jiang, “Random forest for credit card fraud detection,” in ICNSC 2018 - 15th IEEE International Conference on Networking, Sensing and Control, 2018, pp. 1–6. doi: 10.1109/ICNSC.2018.8361343.
[14] V. N. Dornadula and S. Geetha, “Credit Card Fraud Detection using Machine Learning Algorithms,” in Procedia Computer Science, 2019, vol. 165, pp. 631–641. doi: 10.1016/j.procs.2020.01.057.
[15] K. Gupka and V. Mall, “Comparative Analysis of Classification Techniques for Credit Card Fraud Detection,” International Research Journal of Computer Science, 2022, doi: 10.26562/irjcs.2022.v0902.00.
[16] P. H. Tran, K. P. Tran, T. T. Huong, C. Heuchenne, P. HienTran, and T. M. H. Le, “Real Time Data-Driven Approaches for Credit Card Fraud Detection,” in Proceedings of the 2018 International Conference on E-Business and Applications, 2018, pp. 6–9. doi: 10.1145/3194188.3194196.
[17] A. Thennakoon, C. Bhagyani, S. Premadasa, S. Mihiranga, and N. Kuruwitaarachchi, “Real-time Credit Card Fraud Detection Using Machine Learning,” 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 488–493, 2019.
[18] S. Khatri, A. Arora, and A. P. Agrawal, “Supervised Machine Learning Algorithms for Credit Card Fraud Detection: A Comparison,” 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 680–683, 2020.
[19] C. V. Priscilla and D. P. Prabha, “Influence of optimizing xgboost to handle class imbalance in credit card fraud detection,” Proceedings of the 3rd International Conference on Smart Systems and Inventive Technology, ICSSIT 2020, pp. 1309–1315, 2020, doi: 10.1109/ICSSIT48917.2020.9214206.
[20] A. S. Hussein, R. S. Khairy, S. M. Mohamed Najeeb, and H. T. Salim ALRikabi, “Credit Card Fraud Detection Using Fuzzy Rough Nearest Neighbor and Sequential Minimal Optimization with Logistic Regression,” International Journal of Interactive Mobile Technologies, vol. 15, no. 5, pp. 24–42, 2021, doi: 10.3991/ijim.v15i05.17173.
[21] I. Sohony, R. Pratap, and U. Nambiar, “Ensemble Learning for Credit Card Fraud Detection,” in Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 2018, pp. 289–294. doi: 10.1145/3152494.3156815.
[22] A. A. Taha and S. J. Malebary, “An Intelligent Approach to Credit Card Fraud Detection Using an Optimized Light Gradient Boosting Machine,” IEEE Access, vol. 8, pp. 25579–25587, 2020, doi: 10.1109/ACCESS.2020.2971354.
[23] S. P. Maniraj, A. Saini, S. Ahmed, and S. D. Sarkar, “Credit Card Fraud Detection using Machine Learning and Data Science,” International Journal of Engineering Research & Technology (IJERT), vol. 8, no. 9, 2019.
[24] Rajora et al., “A Comparative Study of Machine Learning Techniques for Credit Card Fraud Detection Based on Time Variance,” pp. 1958–1963, 2018.
[25] R. Sailusha, V. Gnaneswar, R. Ramesh, and G. R. Rao, “Credit Card Fraud Detection Using Machine Learning,” 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 1264–1270, 2020.
[26] N. Trivedi, S. Simaiya, Kumar Lilhore, and S. Sharma, “An Efficient Credit Card Fraud Detection Model Based on Machine Learning Methods,” MATTER: International Journal of Science and Technology, 2020.
[27] P. Th. Ib. Sk. Sm. M, “Credit Card Fraud Detection Using Deep Learning Technique,” Proceedings - 2018 4th International Conference on Advances in Computing, Communication and Automation, ICACCA 2018, pp. 32–36, 2018, doi: 10.1109/RAICS51191.2020.9332497.
[28] J. Jurgovsky et al., “Sequence classification for credit-card fraud detection,” Expert Systems with Applications, vol. 100, pp. 234–245, Jun. 2018, doi: 10.1016/j.eswa.2018.01.037.
[29] F. Zhang, G. Liu, Z. Li, C. Yan, and C. Jiang, “GMM-based Undersampling and Its Application for Credit Card Fraud Detection,” 2020. [Online]. Available: http://www.ieee.org/publications
[30] H. Wang, P. Zhu, X. Zou, and S. Qin, “An ensemble learning framework for credit card fraud detection based on training set partitioning and clustering,” Proceedings - 2018 IEEE SmartWorld, Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People and Smart City Innovations, SmartWorld/UIC/ATC/ScalCom/CBDCo, pp. 94–98, 2018, doi: 10.1109/SmartWorld.2018.00051.
[31] S. Mittal and S. Tyagi, “Performance Evaluation of Machine Learning Algorithms for Credit Card Fraud Detection,” 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 320–324, 2019.
[32] V. N. Dornadula and S. Geetha, “Credit Card Fraud Detection using Machine Learning Algorithms,” in Procedia Computer Science, 2019, vol. 165, pp. 631–641. doi: 10.1016/j.procs.2020.01.057.
[33] U. Fiore, A. de Santis, F. Perla, P. Zanetti, and F. Palmieri, “Using generative adversarial networks for improving classification effectiveness in credit card fraud detection,” Information Sciences, vol. 479, pp. 448–455, 2019, doi: 10.1016/j.ins.2017.12.030.
[34] Y. Lucas et al., “Multiple Perspectives HMM-Based Feature Engineering for Credit Card Fraud Detection,” in Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, 2019, pp. 1359–1361. doi: 10.1145/3297280.3297586.
[35] K. Divakar, “Performance Evaluation of Credit Card Fraud Transactions using Boosting Algorithms,” 2019.
[36] S. Makki, Z. Assaghir, Y. Taher, R. Haque, M. S. Hacid, and H. Zeineddine, “An Experimental Study With Imbalanced Classification Approaches for Credit Card Fraud Detection,” IEEE Access, vol. 7, pp. 93010–93022, 2019, doi: 10.1109/ACCESS.2019.2927266.
[37] Y. Sahin and E. Duman, “Detecting credit card fraud by ANN and logistic regression,” in 2011 International Symposium on Innovations in Intelligent Systems and Applications, Jun. 2011, pp. 315–319. doi: 10.1109/INISTA.2011.5946108.
[38] T. Hasanin and T. Khoshgoftaar, “The Effects of Random Undersampling with Simulated Class Imbalance for Big Data,” in 2018 IEEE International Conference on Information Reuse and Integration (IRI), 2018, pp. 70–79. doi: 10.1109/IRI.2018.00018.
[39] A. L. Boulesteix, S. Janitza, J. Kruppa, and I. R. König, “Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 6, pp. 493–507, Nov. 2012, doi: 10.1002/widm.1072.
[40] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Aug. 2016, vol. 13-17-August-2016, pp. 785–794. doi: 10.1145/2939672.2939785.
