Professional Documents
Culture Documents
The Behavioral Sign of Account Theft: Realizing Online Payment Fraud Alert
Cheng Wang1,2,3∗
1
Department of Computer Science, Tongji University, Shanghai, China
2
Key Laboratory of Embedded System and Service Computing, Ministry of Education, Shanghai, China
3
Shanghai Institute of Intelligent Science and Technology, Tongji University
cwang@tongji.edu.cn
4611
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)
Special Track on AI in FinTech
30 100
Fraud detection methods based on machine learning and
25
80 artificial intelligence techniques are becoming increasingly
Percentage (/%)
Percentage (/%)
20
60 popular. Supervised learning techniques make use of user-
15 s’ behavior data to create a classification model to deter-
40
10 mine whether a user’s ongoing behavior deviates from the
5
20 global behavior [Nami and Shajari, 2018]. From the point
0 0
of view the fraudster, Jing et al. [Jing et al., 2019] have
(0,0.5] (0.5,3] (3,6] (6,12] (12,24](24,48] (0,0.5] (0.5,3] (3,6] (6,12] (12,24](24,48]
Time Interval(/h)
( 4 8 , ∞ )
4612
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)
Special Track on AI in FinTech
4613
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)
Special Track on AI in FinTech
Min(Interval) ... Sum(Amt) Avg(Amt) ... Face Password Fingerprint Mer7 Mer12 Addr
2 ... 880.85 176.17 ... 0.2 0.4 0.2 0.4 0.2 25
Updating Data Preprocessing Feature Aggregation operations. When the risk value is higher than a predefined
A Completed
Sequence & Discrete Features
Transaction threshold, the system will immediately lock the account and
Window Feature Selection Continuous Features
Other Features
disable its following-up transactions.
Historical Transaction Sequence
4614
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)
Special Track on AI in FinTech
104 103
# accounts
103
102 4.4 Comparison of Benchmark Classifiers
102
We compare the performance of four popular classification
101
101 models, including XGBoost (XGB), Random Forest (RF), L-
100 0 100 0
ogistic Regression (LR), and Deep Neural Network (DNN).
10 101 102 103 104 105 10 101 102 103
In the DNN model, the sigmoid function is used as activation
# transactions # fraudulent transactions
function and there are 3 hidden layers in addition to the input
and output layers; the neurons at the three hidden layers are
Figure 5: The imbalanced distributions of transactions and fraudu-
lent transactions.
20, 30 and 20, respectively. We set the window size w = 2
and repeat this process 3 times. Table 3 shows the average
precision, recall and F1-score on testing data. Note that we
size of transaction sequence windows, denoted as w, is a crit- set F P R = 0.001, F P R = 0.0005 and F P R = 0.0001, re-
ical parameter. It not only decides how much historical in- spectively. We learn that XGBoost not only performs better,
formation of an account is used, but also determines the start but shows superior stability and robustness. Then, we choose
point of the risk prediction task. XGBoost as the classifier of benchmark interim method.
For each account, we set a fixed sequence window size w. 4.5 Impact of Window Size on Prediction
Then, for each user, we only start predicting the risk of ac-
count after the user completes its w − th transaction. For ex- We need to analyze the impact of the transaction sequence
ample, if we set w = 3, we can predict the risk of an account window size on the perdition performance. We increase the
only after it generates three transactions. window size w from 1 to 5, and compare the performance of
corresponding models. The size of transaction sequence win-
4.3 Evaluation Metrics dow decides when to start the risk evaluation. For example,
when w = 1, we can predict the risk of an account after it
In general, there are many metrics to evaluate the per- generates one transaction; but when w = 5, we cannot make
formance of binary classifiers, such as AUC (area Under a prediction until the account finishes five transactions. In
the ROC curve) score, F-measure, and KS (Kolmogorov- order to analyze the impact of window size reasonably, our
Smirnov) score. However, these metrics cannot directly re- prediction starts from the sixth transaction for each account
flect the economic influence of models, especially in the case in the testing data.
of imbalanced data. With the consideration of practical us- Figure 6 shows that the window size has a significant im-
age, we use precision, recall, false positive rate and F1-score pact on performance. The recall and F1-score of all models
as evaluation metrics. As a classification model often out- increase with the increase of window size given fixed F P R.
puts the numerical probability of a transaction being fraudu- The reason lies in that larger window size ensures more be-
lent, we need to set a threshold to determine whether fraud havioral information from account holders. In practice, we
occurs. The threshold actually provides a tradeoff between can choose a properly large window size, but for our test-
precision and recall, and may be different for different clas- ing data, when the window size increases by 1, almost 20
sifiers. We adopt false positive rate (FPR) as a principle for thousand transactions are made unpredictable. For example,
choosing thresholds. when w = 5, there are about 0.1 million transactions that
With a fixed value of F P R, we can get different threshold- cannot be predicted in our testing data. As an explanation,
s for different classifiers. We can also calculate their recall, most accounts have only a few transactions. Therefore, in our
precision and F1-score under the fixed FPR. This can be used experiment, we do not consider window size larger than 5.
for model selection. For models with the same F P R value,
large values of the chosen metrics suggest good model per- 4.6 Impact of Class Ratio for Prediction
formance. In our online payment scenario, F P R must be We compare the effect of class ratio CR on the performance
smaller than 0.001, otherwise the model makes no sense. of classifiers. In our experiment, we increase the class ra-
4615
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)
Special Track on AI in FinTech
F1-score
XGBoost
F1-score
0.8
Recall
RandomForest
Recall
F1-score
F1-score
0.5 RansomForest 0.3
Recall
0.4
Recall
tio CR from 10 to 70, and set the window size w = 2. Merchant historical behaviour features. Similar to the
We repeat this process 3 times, Figure 7 presents the aver- users’ historical behaviour feature extraction, we extract 9 s-
age recall, F1-score of testing data when F P R = 0.001 and tatistical features related to merchants.
F P R = 0.0001, respectively, where the x-axis is class ratio. User-merchant features. We extract 2 features related to
From Figure 7, we obtain that when F P R = 0.001, the user-merchant, including the number of transactions per user
change of CR has no significant impact on all models. How- in different merchants and the average amount per transac-
ever, if we decrease the F P R to 0.0001, the performance tion.
of XGBoost has hardly changed with the increase of CR , Ongoing payment features. To extract features of an on-
but the performance of Random Forest is decreased great- going transaction, we digitize its non-numeric fields, and cal-
ly. Although the change of CR has little effect on Logistic culate the difference between the ongoing transaction and its
Regression and DNN, their performance is very poor when previous transaction, such as the time intervals and amount
F P R = 0.0001. differences. We extract 12 features related to ongoing trans-
action in all.
4.7 Comparison with Interim Detection Figure 8 illustrates the comparison of the receiver operat-
In this part, we compare our ex-ante fraud detection method ing characteristic (ROC) between two methods. Although the
with the traditional interim fraud detection method. As stated interim fraud detection method outperforms ours, the advan-
before, we adopt XGBoost as the classifier in both methods. tage is very limited. The strong point of our method is that
Activation detection on the fly is widely-used fraud detec- it can prevent fraudulent transactions from occurring, while
tion method. Different from ours, the interim fraud detection still keeping very good performance. This enables financial
method examines every currently ongoing transaction of an platforms to inform users of account risks in advance, and
account by comparing it with the account’s historical transac- remind them to protect their accounts timely.
tions. Once the currently ongoing transaction is determined
to be fraudulent, the system will immediately terminate the 5 Enhanced Anti-Fraud Scheme
transaction. In order to train an effective interim fraud detec- As the fraud prevention and fraud detection are the two most-
tion model, it is important to extract useful features with data ly used schemes to combat fraud in practice, we design a hy-
and business experience. The feature extraction process for brid anti-fraud system that combines both methods.
interim fraud detection is described as follows:
5.1 Hybrid System Architecture
User historical behaviour features. We use the most re-
cent month and day as the time interval to calculate the pay- As depicted in Figure 9, the enhanced scheme consists of two
ment behaviour of users in these intervals. We extract a set modules as follows:
of user-behaviour-related features, including the statistics of Ex-ante risk prediction module. When an account finish-
transaction volume, the statistics of transaction amount, the es a transaction xt , the system immediately uses the ex-ante
trading period and the statistics of other historical trading risk prediction method to predict the risk of account. If the
characteristics. risk is too high, the system will immediately lock the account
4616
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)
Special Track on AI in FinTech
1.0
Ex-ante Risk Prediction Module Interim Fraud Detection Module
0.9 Terminating the Ongoing Transaction
True Positive Rate
96.5 1.8 40
and prohibit all subsequent transactions. Otherwise, the sys- 96.0
Recall
FPR 1.6 35
tem will allow the occurring of next transaction, say xt+1 . 1.4
Percentage (%)
30
95.5
Recall (%)
FPR (%)
1.2 25
95.0 1.0 20
Interim fraud detection module. If the ex-ante module 94.5 0.8
15
does not lock the account and the account is making a new 0.6
94.0 10
0.4
transaction xt+1 , the system will use the interim fraud detec- 93.5 0.2 5
tion module to check the ongoing transaction xt+1 . If xt+1 is 93.0 0.0 0
0.0 0.2 0.4 0.6 0.8 1.0 (0,1] (1,5] (5,10] (10,60] (60,300](300,+inf)
nated immediately and the account will be locked. Otherwise, Figure 10: Recall and FPR Figure 11: The cumulative distribu-
it will wait for the ongoing transaction until it completes, up- with different thresholds. tion of predicted frauds.
date the transaction sequence of the account, and re-estimate
its fraudulent risk using the ex-ante risk prediction module. 6 Conclusion and Future Work
Different from most existing studies which usually aimed to
5.2 Anti-Fraud Performance Evaluation
design fraud detection methods of interim patterns, this work
We have applied the system to the bank’s one-month B2C takes a different point of view to investigate whether transac-
transaction data. There are 1028437 transaction records tion fraud can be detected in an ex-ante manner in an online
in this month, including 1003539 normal transactions and payment service. We have obtained an insightful finding that
24898 fraudulent transactions. In the ex-ante risk prediction there is really a relation between a user’s historical paymen-
module, we set w = 2. The core of our anti-fraud system t behavior and his/her account compromise. Based on this
is to select an appropriate risk threshold (RT) for the ex-ante finding, we can predict a fraudulent transaction before its oc-
risk prediction model. Figure 10 presents the recall and F P R currence without on-going transaction behaviors which are
with different RT, from which we can observe that the small- necessary for any interim fraud detection methods. Based on
er the risk threshold, the larger the recall and F P R. It means a real-world dataset, it is validated that this ex-ante fraud pre-
that we can prevent and detect more fraudulent transactions at diction can prevent more than 80% of fraudulent transaction-
the cost of interrupting more legitimate ones. When RT = 1, s before their actual occurrences. Moreover, utilizing their
it means that we only use the interim fraud detection method, complementary effects, we have designed a real-time fraud
without predicting the risk of accounts ahead of time. Al- prevention and detection system by combining the ex-ante
though it interrupts very few legitimate transactions, many and interim methods to further improve the effectiveness.
fraudulent transactions cannot be detected. For future work, we will solve the cold-start problem of
In practical applications, it is necessary to ensure that the risk prediction for some accounts whose historical transaction
false positive rate is less than 0.1%. Next, we present the re- count is smaller than the size of transaction sequence win-
sults on the performance of ex-ante prediction module, inter- dows. In addition, it is also interesting to investigate how to
im detection module, and the integrated system, respectively, overcome the concept drift problem [Lu et al., 2019] in risk
when RT = 0.75: prediction.
• Our real-time fraud prevention and detection integrated
system can detect 94.46% of fraudulent transactions with less Acknowledgments
than 0.09% of legitimate transactions interrupted. The author sincerely thanks Pengfei Shu for his great effort-
• It is remarkable that 80.42% of fraudulent transactions s on this work, and thanks the anonymous referees for their
can be prevented by the ex-ante module with less than 0.04% helpful comments and suggestions. The research of author
of legitimate transactions interrupted, leaving only 14% of is partially supported by the National Natural Science Foun-
fraudulent transactions to be detected when they are ongoing. dation of China (NSFC) under Grants 61972287, the Major
• Figure 11 shows the distribution of fraudulent transac- Project of Ministry of Industry and Information Technology
tions over time intervals that the system can predict ahead of of China under Grant [2018] 282, and Municipal Human Re-
time. We observe that nearly 70% of fraudulent transactions sources Development Program for Outstanding Young Tal-
can be predicted more than 5 seconds ahead of occurring. ents in Shanghai.
4617
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)
Special Track on AI in FinTech
4618