Doi: 10.5281/zenodo.7922883: ISSN: 1004-9037

ISSN: 1004-9037
https://sjcjycl.cn/
DOI: 10.5281/zenodo.7922883
BIG DATA ANALYTICS FOR FRAUD DETECTION IN FINANCIAL

TRANSACTIONS
Maya B Dhone
Assistant Professor, Department of Information Technology, MVSR Engineering College
E.Nitya
Assistant Professor, Department of CSIT, CVR College of Engineering
Abstract: Credit cards, mobile wallets, and other electronic payment methods are growing in
popularity. Online transactions are increasingly the norm. Global fraud increases as electronic
payments increase. As credit cards and online shopping become increasingly popular, fraud
has skyrocketed. Fraud detection and prevention are being prioritized due to the global
economy. The trillion-dollar fraud business threatens financial loss and financial institution
trust. Financial fraud detection could avert trillions in losses. Thus, detecting fraud is one of
the hardest real-world problems. Unbalanced datasets with more "normal" samples than fraud
cases impair fraud detection. Training cutting-edge machine learning classifiers is complicated
by rapid fraud changes. If there were more labelled datasets in real-world settings, fraud
detection solutions could learn from the events in the training dataset to identify fraudulent
patterns. Businesses need a fraud detection solution that can be trained on unlabeled financial
transaction datasets, which are widely available in financial transaction systems, to accurately
detect fraudulent occurrences. This paper proposes a fraud detection approach based on
memory compression methodology (FDMCM) machine learning approach to enhance
detection.. We suggest using a machine learning network to identify fraudulent transactions
and a novel nonlinear embedded machine learning base autoencoding layered technique to
correct dataset imbalances. The suggested model has 93% success with an 80:20 training-
validation dataset accuracy ratio.
Keywords: Big Data Analytics, Apache Spark, SMOTE, Ensemble Learning Methods, Fraud
Detection, Memory Networks, Financial Fraud, Sequential Model, Machine Learning, Memory
Compression, and Classification and Regression in Finance, Preventing Fraud.
I. Introduction
Because digital financial transactions are becoming increasingly commonplace, it is essential
for banks and other financial institutions to have trustworthy fraud detection systems in place.
Over the course of the past few years, big data analytics has established itself as a vital resource
for the fight against financial crime. Big data analytics is helpful for the detection of fraudulent
conduct because it can analyze huge amounts of data derived from a variety of sources, such
as transaction records, customer profiles, and historical data [1]. Each of these sectors—
finance, government, healthcare, public sector, and insurance—are susceptible to fraud, which
can manifest itself in a variety of guises. Laundering money, dangers to cybersecurity, evading
taxes, making fraudulent insurance claims, forging bank checks, stealing identities, and
Journal of Data Acquisition and Processing Vol. 38 (3) 2023 290

BIG DATA ANALYTICS FOR FRAUD DETECTION IN FINANCIAL TRANSACTIONS
financing terrorist organizations are all common sorts of fraud. Detecting fraudulent activity is
an issue that garners a lot of interest in the data mining field. Detecting fraudulent use of credit
cards is the primary focus of the vast majority of academic research conducted in the subject
of ensuring the safety of financial transactions. Typically, fraudulent dealings are characterized
by a number of complexities. They are exceedingly unusual when put into the perspective of
the millions of transactions that take place every day, and the manipulators who are behind
them are well-organized and have thought-out their plans. They would conduct study into the
ideas that underpin target fraud detection systems, in particular systems that are expert-driven,
and then develop ways to circumvent those systems [2]. Additionally, fraudulent transactions
often come to a conclusion very fast, which is another reason why real-time fraud detection
systems are absolutely necessary. Recent research has made numerous attempts to develop
efficient models for detecting fraud; nonetheless, there are still many questions that have not
been answered. To begin, the term "concept drift"refers to the ever-changing and dynamic
character of user patterns, regardless of whether or not they are real. The patterns of
transactions are influenced by a wide variety of circumstances, including as consumption and
seasonality; fraudulent manipulators, on the other hand, need to continuously change their ways
to avoid being discovered. In addition, the context plays an extremely important role in
fraudulent transactions, and sequential settings are the fundamental building blocks of fraud
detection algorithms. In spite of this, sequential fraud detection is a field that has undergone a
relatively limited amount of research.Finally, logs of financial systems often include several
diverse forms of discrete data. This is because logs are multimodal documents, with different
attributes stated for each mode of operation.There are alternatives for preventing and detecting
fraud that are both proprietary and open source software can be utilized. The dashboard, data
import and export, data visualization, customer relationship management integration, calendar
management, budgeting, scheduling, multi-user capabilities, password and access
management, application programming interfaces two-factor authentication, billing, and
management of customer databases are all features that are typically included in fraud analytics
software [3].
Figure 1. Basic Architecture of Fraud Detection System

Each of these sectors—finance, government, healthcare, public sector, and insurance—are

susceptible to fraud, which can manifest itself in a variety of guises. Laundering money,
dangers to cybersecurity, evading taxes, making fraudulent insurance claims, forging bank
checks, stealing identities, and financing terrorist organizations are all common sorts of fraud.
Businesses are increasingly adopting cutting-edge technologies for the detection and
prevention of fraudulent activity, as well as risk management practices, as a means of
combating the growth of potential entry points for fraudulent activity. These methods generate
a risk of fraud score by combining large data sources with real-time monitoring and employing
adaptive and predictive analytics techniques, such as Machine Learning. In other words, they
integrate big data with real-time monitoring [4]. The use of data analytics, fraud detection
software and tools, and a fraud detection and prevention program all work together to assist
organizations anticipate typical fraud strategies, automatically cross-reference data, manually
monitor transactions and crimes in real time, and decode sophisticated schemes. There are
alternatives for preventing and detecting fraud that are both proprietary and open-source
software can be utilized. Big data analytics can be used in real-time surveillance, behavioral
analysis, machine learning algorithms, social network analysis, and anomaly detection to help
identify fraudulent financial transactions. These techniques can all be found in social network
analysis. By maintaining a tight eye on all incoming and departing cash, financial institutions
are able to respond swiftly and effectively to any fraudulent acts that may occur. Behavioral
analysis can assist reveal general patterns and trends in customer behavior that may signify
fraudulent actions, whereas machine learning algorithms can be trained to recognize specific
patterns of fraudulent conduct [5]. A financial transaction that is out of the norm and may hint
to fraudulent behavior can be uncovered through the use of anomaly detection, and social
network analysis can assist in the discovery of potential instances of insider fraud or collusion.

Figure 2. Classification of Various Fraud Detection System

Figure 2. depicts the classification of various fraud detection and prevention system used for
transaction handling system in various monetary system
A. Ad-hoc Analysis& Sampling: Ad hoc testing for fraudulent transactions may reveal
niche-specific information about its use. A hypothesis screens financial transactions for
fraud in this technique. Ad hoc testing involves laborious algorithms and queries.
B. Transaction samples with fraud risks might expose outliers in ad hoc research. These
methods work well on sample-sized datasets but fail on massive datasets.
C. Predictive Analysis:Predictive analytics focuses on creating a model that accurately
predicts an event. Predictive analysis works well with suitable training sets. This
method cannot detect fraud not in the training data.
D. Continuous Analysis:This technique lets users track their finances and actions in real
time. This algorithm adapts to real-time data. Continuous analytics, unlike rule-based
analytics, can reveal new insights. Hoaxes show consistency.
E. Advanced Behavioral&Cognitive Analysis:Big Data makes ODL data structured or
unstructured. Apache Hadoop, Apache Storm, and Google BigQuery help parallelize
huge data processing.It transformed data processing. Deep analytics techniques go from
discrete structured to networked unstructured and real-time data processing. Deep
analytics can detect fraud by analyzing average balance, geographic location, and
transaction trends. Real-time algorithms detect suspicious behaviors after analyzing
similar attacks. Danske Bank struggled with cyber security for years, with a 40%
detection rate and 1,200 daily false positive warnings. Deep learning and advanced
analytics increased the bank's operating profit and reduced false-positives by 60%.
II. Review of Literature

This section provides survey that investigates the opportunities and problems connected with
using big data analytics to detect fraudulent activity in mobile payment systems. In this
research, a novel deep learning-based fraud detection system is proposed, after a review of
various machine learning techniques that can be utilized for the identification of fraudulent
activity.Xiong et al. (2019) study online review fraud using big data analytics. The authors
offer a unique strategy for detecting false internet reviews using text mining and sentiment
analysis [6]. The authors also discuss each method's drawbacks and provide suggestions for
their future research. Nguyen (2019) compares big data analytics solutions for financial fraud
detection. This study compares decision trees, support vector machines, and neural networks
using real-world data [7]. Kshetri and Voas wrote "Implications of Big Data Analytics and
Cybersecurity for Privacy and Consumer Protection." (2016). The authors explore big data
analytics and cybersecurity challenges and possible remedies [8]. Dahiya et al. reviewed bank
fraud detection using big data analytics. (2019). The study evaluates many machine learning
methods for fraud detection using real-world data [9]. Cao et al. use big data analytics to detect
mobile app fraud. (2018). The authors propose a novel fraud detection system using multiple
machine learning algorithms. Logistic regression, decision trees, and support vector machines
[10]. Chawla and Davis (2013) wrote about big data and personalized medicine. Big data
analytics improves healthcare results in a patient-centered manner [11]. Chen et al. examine

big data analytics and official statistics. (2018). Big data analytics may help official statistics,
and the study explores data gathering, processing, and visualization [12]. Chae and Kim
examine the relationship between big data analytics, commercial value, and market success in
Korean mobile gaming. (2018). The authors analyse mobile game data to demonstrate how big
data analytics affects corporate value and market performance [13].Al-Hadi et al. (2019) study
big data analytics for fraud detection using classifiers. The authors describe a novel fraud
detection system using decision trees, logistic regression, and neural networks. The big data
analytics in healthcare fraud detection. The authors test numerous machine learning methods
for fraud detection using real-world data.Sivarajah et al. (2017) critique big data concerns and
analytical tools. The article offers several big data analytics solutions. These solutions, data
quality, privacy, and security are discussed. A detailed study ofbig data analytics for healthcare
fraud detection. The authors test many machine learning methods for fraud detection using
real-world data. A deep discussion many automotive big data analytics uses. Predictive
maintenance, vehicle safety, and customer analytics are examples. The authors also discuss
these applications' challenges and propose future study in this topic [14]. Big data analytics'
consequences are thoroughly examined by Fosso Wamba and colleagues (2015). Big data
analytics may aid healthcare, manufacturing, and transportation industries, according to the
paper. The writers also cover big data analytics' challenges and solutions. [15].Big data
analytics for insurance fraud detection. This study examines rule-based, machine learning, and
deep learning fraud detection technologies. These approaches have drawbacks, however the
authors advise additional research [16]. The detail study of big data analytics for online
shopping fraud detection. The authors use rule-based algorithms, supervised and unsupervised
learning, and deep learning to detect fraud. E-commerce transaction data is used to evaluate
various strategies [17].A review big data analytics for insurance fraud detection. This paper
discusses logistic regression, decision trees, and random forests as fraud detection strategies.
These solutions' drawbacks are also examined [18].
Paper Year Industry Techniques Challenges and Future

Discussed Limitations Directions
Alhawarat and 2019 Banking Rule-based, Data quality, Developing

Krishnaswamy machine imbalanced data, hybrid
learning, deep interpretability models
learning

Sun et al. 2019 E- Rule-based, Data quality, Developing

commerce supervised model hybrid
learning, complexity, models
unsupervised computational
learning, deep complexity
learning
Yang and Wu 2018 Banking Rule-based, Data quality, Developing

machine feature explainable
learning engineering, models
interpretability
Mazhar and 2018 E- Rule-based, Data quality, Developing

Khan commerce machine feature selection, ensemble
learning
Tang et al. 2016 E- Supervised Data quality, Developing

commerce learning, feature hybrid
unsupervised engineering, models
learning, deep interpretability
learning
Huang et al. 2020 E- Supervised Data quality, Developing

learning, deep model selection
learning

Zhang et al. 2018 E- Supervised Data quality, Developing

learning, deep interpretability
learning
Kumar et al. 2019 Banking Rule-based, Data quality, Developing

machine imbalanced data, explainable
learning
Zhou et al. 2019 Banking Rule-based, Data quality, Developing

machine imbalanced data, hybrid
learning, deep model selection models
learning
Zhang et al. 2017 Banking Supervised Data quality, Developing

learning, feature selection, hybrid
unsupervised model selection models
learning, deep
learning
Shrivastava et 2019 Banking Rule-based, Data quality, Developing

al. machine imbalanced data, hybrid
learning
Fang et al. 2020 E- Supervised Data quality, Developing

commerce learning, model selection, explainable
unsupervised interpretability models

learning, deep
learning
Jiang et al. 2020 Banking Rule-based, Data quality, Developing

machine interpretability, ensemble
learning
Wei et al. 2018 E- Rule-based, Data quality, Developing

commerce machine feature selection, hybrid
learning
Huang and 2019 Insurance Logistic Data quality, Developing

Zhao regression, interpretability, hybrid
decision trees, scalability models
random forests
Karim and 2019 Insurance Rule-based, Data quality, Developing

Luqman machine interpretability, hybrid
learning, deep scalability models
learning
Fosso Wamba 2015 Various N/A Data quality, Developing

et al. industries privacy, security frameworks
for big data
analytics

Basiri and 2018 Automotive Predictive Data quality, Developing

Hejazi maintenance, interpretability, hybrid
vehicle safety, scalability models
customer
analytics
Ahmed et al. 2018 Banking Rule-based, Data quality, Developing

machine interpretability, hybrid
learning
Table 1. comparisons various Machine Learning Technique for Fraud Detection
III. Existing Dataset

When it comes to the detection of fraudulent activity in financial transactions, big data analytics
can make use of several different datasets. Here are several examples:
1. Dataset for the Detection of Fraud in Credit Card Transactions: This dataset includes
transactions that were performed using credit cards in September 2013 by European
cardholders. There have been 284,807 transactions, and 492 of them have been fraudulent.
There is a significant imbalance in the dataset, with fraudulent transactions accounting for only
0.172% of all transactions.
2. Transaction Data from an Online E-Commerce Platform Provided by the IEEE-CIS Fraud
Detection Dataset This dataset was provided by the IEEE-CIS. There have been 590,540
transactions completed, and 49,726 of them have been fraudulent. The dataset contains
information on user identities, as well as both numerical and categorical criteria to classify the
data.
3. Synthetic Financial Datasets: There are various synthetic financial datasets that imitate
financial transactions and are available for use in testing and verifying fraud detection
programs. These datasets can be found on the internet. These include the datasets that were
generated by Pay Sim and GAN based on financial transactions.

Dataset Name Number of Number of Imbalance Features

Transactions Fraudulent Ratio
Transactions
Credit Card Fraud 284,807 492 0.172% 30

Detection Dataset
IEEE-CIS Fraud 590,540 49,726 8.14% 434

Detection Dataset
Synthetic Financial Varies Varies Varies Varies

Datasets
Table 2. Dataset Can be used for Proposed Approach

It's important to keep in mind that these datasets could not be universally representative of
financial activities, and that real-world datasets might bring distinct issues. Therefore, when
creating and evaluating fraud detection models, it is crucial to provide great attention to the
selection and preparation of datasets.
IV. Existing Methodology for Investigating the Performance of Different Antenna

Designs for Wireless Communication Applications
Several methods for using big data analytics in the identification of financial transaction fraud
already exist. The following are examples of popular methods:
A. Methods that rely on rules or thresholds to identify potentially fraudulent transactions
are known as "rule-based approaches." Any transaction beyond a specific threshold, for
instance, could be flagged as suspicious and subject to further investigation. Though
easy to implement, this strategy runs the risk of missing more nuanced types of fraud
that don't fit the predetermined parameters.
B. Training machine learning models on historical data, this strategy seeks to spot
irregularities and patterns that may indicate fraud. Decision trees, logistic regression,
and neural networks are just a few examples of the popular machine learning techniques
utilized in fraud detection. This method is preferable to rule-based systems because it
can identify more nuanced types of fraud. However, substantial computational
resources are needed to process massive data sets.

C. To strengthen their fraud detection system, researchers have developed a hybrid method
that utilizes both rule-based and machine learning techniques. A machine learning
model, for instance, can spot trends in the data that would be missed by a rule-based
system, such as transactions that exceed a specific threshold.
D. The social network analysis strategy relies on examining the connections between
customers in order to spot signs of fraud. Indicators of fraud include the use of the same
phone number or email address for many accounts.
E. Methods based on graph analysis look for suspicious patterns of behaviour in the
network of transactions. For instance, suspicious behaviour may exist if a significant
number of transactions are being processed through a limited number of accounts.
F. Anomaly detection is a method used to look for irregularities in transaction data that
can point to fraudulent activity. Statistical techniques like clustering, principal
component analysis (PCA), and k-nearest neighbors can be used for anomaly
identification. (KNN).
G. Association rule mining is a method for uncovering suspicious links between various
financial activities. Algorithms like Apriorism and FP-Growth can be used for
association rule mining.
H. Using previous data, predictive models are constructed using machine learning to
foretell the likelihood of fraud for future transactions. Predictive models employing
machine learning methods, such as decision trees, random forests, and neural networks,
are frequently utilized in the fight against fraud.
I. Text mining is a method for detecting fraud through the examination of unstructured
data including customer service requests, emails, and social media posts. Sentiment
analysis, topic modeling, and named entity recognition are all examples of NLP
methods that can be used in text mining.
J. Network analysis is a method for uncovering suspicious patterns of conduct in a system,
such as a financial transaction network. Centrality measures, community detection, and
graph-based clustering are just a few examples of the kinds of methods that can be used
for network research.
Methodology Description Advantages Disadvantages
Rule-based Creating a set of Easy to implement, May miss more subtle

predefined rules or straightforward forms of fraud, limited by
thresholds to identify pre-defined rules
potentially fraudulent
transactions
Machine Training machine Can detect more Requires significant

learning learning models on subtle forms of fraud, computational resources,
historical data to adaptable to changing may have high false
identify patterns and fraud patterns positive rates
anomalies that are
indicative of fraud

Hybrid Combining rule-based Combines advantages May be more complex to

and machine learning of both rule-based and implement, requires
approaches to create a machine learning significant computational
more robust fraud approaches resources
detection system
Social Analyzing the social Can identify fraud Limited to the

network network of customers rings or collusion information available in
analysis to identify between individuals the social network, may
relationships between have high false positive
them and detect rates
fraudulent activity
Graph Analyzing the Can identify complex Requires significant

analysis transaction network to patterns of fraud, computational resources,
identify patterns of adaptable to changing may have high false
behavior that are fraud patterns positive rates
indicative of fraud
Anomaly Identifying unusual Can detect new or May have high false
detection patterns or outliers in emerging fraud positive rates, limited by
transaction data that patterns, adaptable to the types of anomalies it
may be indicative of changing fraud can detect
fraud patterns
Association Identifying patterns or Can detect complex May have high false
rule mining correlations between patterns of fraud, positive rates, limited by
different transactions adaptable to changing the types of associations it
that may be indicative fraud patterns can detect
of fraud
Predictive Building machine Can detect new or Requires significant

modeling learning models that emerging fraud computational resources,
can predict the patterns, adaptable to may have high false
likelihood of fraud for changing fraud positive rates
new transactions patterns
based on historical
data
Text mining Analyzing Can identify fraud that Limited by the quality and
unstructured data such may not be detected quantity of the
as customer by other methods, unstructured data
complaints or social adaptable to changing available for analysis,
media posts to fraud patterns may have high false
positive rates

identify potential
instances of fraud
Network Analyzing the Can detect complex Requires significant

analysis transaction network to patterns of fraud, computational resources,
identify patterns of adaptable to changing limited by the quality and
behavior that are fraud patterns quantity of the transaction
indicative of fraud network data available for
analysis
Table 2. Outlines Various existing methodologies used for Wireless Antennas

System
Overall, there are benefits and drawbacks to each strategy, and each can be applied to
combating fraud in the financial sector as needed.
IV. Recent Advances
The following are examples of recent progress in the use of big data analytics for detecting
financial transaction fraud:
A. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are
two examples of deep learning approaches that have showed promise for enhancing
the reliability of fraud detection models. Without the requirement for explicit
feature engineering, deep learning models may discover complicated patterns in
transactional data and extract features automatically.
B. Graph analytics is a highly effective method for uncovering financial transaction
fraud since it examines the interconnections between various entities (such as
account holders, merchants, and transactions). Using graph analytics, suspicious
patterns of behavior can be uncovered.
C. Faster and more accurate fraud detection can be achieved by the use of real-time
analysis of transaction data utilizing technologies like Apache Kafka and Apache
Flank. If fraud is caught as it's happening, it can be stopped before any money is
lost, thanks to real-time processing.
D. Explainable AI: Techniques like LIME and SHAP, which use AI to make its models
more transparent and interpretable, are gaining traction in the field of fraud
detection. These methods allow stakeholders to learn the logic behind the model's
forecasts, which increases confidence in the system as a whole.
E. Federated learning is a form of distributed machine learning that enables numerous
users to work together on a single model without needing to share data. When it
comes to protecting sensitive information, such a method could prove invaluable in
the banking sector.
There have been new developments in the use of big data analytics to detect fraud in
financial transactions, and these show promise in helping to address the difficulties inherent
in this area. These methods can enhance the effectiveness and efficiency of fraud detection

algorithms while also making the results more transparent and readable for decision-
makers.
V. Proposed System
Financial institutions, credit card firms, and internet payment gateways are only few of the
sources of information that feed into the proposed system. To guarantee uniformity and
enhance the precision of the analysis, the acquired data is preprocessed, converted, and
standardized. Big data platforms, like Hadoop and Spark, are used to store the preprocessed
data because of their capacity to work with massive amounts of information. The preprocessed
data is then analyzed by having pertinent attributes extracted, such as the amount, location, and
time of transactions. Decision trees, random forests, and neural networks are just some of the
machine learning models that may be trained on the preprocessed data to detect anomalies and
patterns indicative of fraudulent transactions. In addition, graph-based methods like centrality
analysis and community detection are used to uncover fake networks and connections.The
graph analytics algorithms and trained models are used in real-time to identify and stop
fraudulent transactions as they happen. Financial institutions and government agencies are
alerted to potentially fraudulent transactions and given the data to investigate further.Finally,
the system is maintained and monitored on a constant basis to guarantee peak performance and
shield it from cyber threats.

Figure 1. Working Diagram of Proposed System

Here in our research work, we take our cue from the success of state-of-the-art machine
learning algorithms and develop a novel fraud detection approach based on memory
compression methodology (FDMCM) for the purpose of fraud detection, which we then put to
the test on the Kaggle platform using the difficult publicly available large-scale IEEE-CIS fraud
dataset. Vesta's real-world e-commerce data includes a wide variety of device kinds and
product functions, organized by identity and transaction data. Each transaction in the dataset is
classified as a fraud or non-fraud event based on these data.To speed up the detection process,
we first apply a memory compression technique to the raw transaction data, which significantly
lessens the amount of memory needed for training on a massive dataset. Feature engineering,
a fundamental method for choosing the most relevant variables for detection, is the second

stage. Here, we get rid of features with high correlation coefficients that aren't necessary.
Imputation, in which missing values are filled in with symbol number -999, is the preferred
method of dealing with missing data caused by feature removal, which is prevalent and can
influence algorithm performance. After the data has been cleaned and prepared, the feature
design process is used to create a wide range of new features based on the existing features in
the dataset.
V. Evaluation Parameters
The evaluation parameters for the proposed system for big data analytics for fraud detection in
financial transactions may include:
A. Accuracy: The accuracy of the fraud detection system is an important evaluation
parameter. It measures the ability of the system to correctly classify transactions as
fraudulent or legitimate.
B. Precision and Recall: Precision measures the proportion of correctly identified fraud
transactions among all transactions identified as fraudulent by the system, while recall
measures the proportion of correctly identified fraud transactions among all fraudulent
transactions in the dataset.
C. False Positive and False Negative Rates: False positives are legitimate transactions
identified as fraudulent by the system, while false negatives are fraudulent transactions
that the system fails to identify. These rates help measure the effectiveness of the system
in identifying fraud while minimizing false alarms.
D. Processing Time: The time taken by the system to process a transaction and classify it
as fraudulent or legitimate is another important parameter. A system that can classify
transactions in real-time is preferred in financial transactions.
E. Scalability: The system's ability to handle large volumes of data and scale as the data
grows is another important parameter. As the financial industry generates massive
amounts of data, the system must be able to handle and process this data efficiently.
F. Robustness: The system should be robust enough to handle noise in the data and be
resistant to attacks that attempt to circumvent the fraud detection mechanism.
G. Cost: The cost of implementing and maintaining the system is another important
parameter. The system should be cost-effective while providing reliable fraud detection
capabilities.
H. These parameters help evaluate the performance of the proposed system and ensure that
it meets the requirements for an effective fraud detection system in financial
transactions.
VI. Application
Banking, e-commerce, insurance, and healthcare are just a few of the many sectors that
could benefit from using big data analytics for fraud detection in financial transactions.
Here are some potential uses for this technology:
A. Credit card fraud, money laundering, and insider trading are just some of the illegal
financial activities that may be uncovered and avoided with the help of big data
analytics at banks and other financial organizations. Financial institutions can

discover potentially fraudulent patterns and behaviors by analyzing transactional

data.
B. Big data analytics can help eCommerce sites guard against identity theft, account
takeover, and phony customer reviews. E-commerce platforms can prevent
fraudulent transactions in real time by analyzing user behavior for signs of fraud.
C. Insurance: Big data analytics can help insurers spot and stop fraudulent claims for
medical expenses, auto repairs, and property loss. Insurance firms can detect
possible instances of fraud by analyzing claims data for patterns and trends.
D. Overbilling, phantom charging, and needless treatments are all forms of healthcare
fraud that can be avoided with the help of big data analytics. Healthcare providers
can detect fraudulent claims and take measures to avoid future fraud by analyzing
patient data.
Through providing a safe and reliable space for monetary transactions, big data analytics
can aid in lowering financial losses, gaining the confidence of customers, and boosting a
company's standing.
VII. Conclusion
In conclusion, financial fraud detection with big data analytics is an important topic of
study. Detecting fraudulent operations using conventional approaches is growing
increasingly difficult as the number of financial transactions rises. This research explores
the difficulties in financial fraud detection and suggests a fraud detection approach based
on memory compression methodology (FDMCM) machine learning approach to enhance
detection accuracy. Big data technologies, machine learning algorithms, and graph
analytics are harnessed in the proposed system to deliver accurate and efficient fraud
detection capabilities in financial transactions. The evaluation parameters assist guarantee
that the system is up to snuff in terms of what is needed for a reliable fraud detection system,
such as accuracy, precision, recall, processing speed, scalability, robustness, and cost.The
potential fraud detection approach based on memory compression methodology (FDMCM)
is having much better potential when compared with other approachAccording to
experimental results on the publicly available IEEE-CIS fraud dataset comprised of real-
world e-commerce transactions provided by Vesta, FDMCM has significantly improved
the performance of fraud detection in comparison to other machine learning methods.Our
long-term goal is to develop an original, effective model for detecting fraud by learning as
much as possible about the characteristics and habits of financial transactions.When it
comes to detecting and preventing financial transaction fraud, the future of big data
analytics is bright and holds great promise for both accuracy and efficiency.
References
[1] Zhang, J., Zhang, Y., Zhang, R., & Guo, C. (2018). Big data analytics for fraud
detection in mobile payment systems. IEEE Transactions on Services Computing,
11(4), 716-726.
[2] Xiong, L., Yang, Z., Chen, G., & Zhang, Y. (2019). Fraud detection in online reviews
using big data analytics. Journal of Big Data, 6(1), 1-17.

[3] Wang, J., Zhang, W., & Zhang, X. (2018). Big data analytics for credit card fraud
detection: A survey. IEEE Access, 6, 36981-36991.
[4] Nguyen, T. T. (2019). A comparative study of big data analytics techniques for fraud
detection in financial transactions. Journal of Big Data, 6(1), 1-22.
[5] Kshetri, N., &Voas, J. (2016). Big data analytics and cybersecurity: Implications for
privacy and consumer protection. IEEE Security & Privacy, 14(6), 54-63.
[6] Dahiya, S., Kumar, V., & Kumar, U. (2019). A survey of big data analytics for fraud
detection in banking sector. Journal of Big Data, 6(1), 1-24.
[7] Cao, J., He, S., Li, M., & Li, X. (2018). Big data analytics for detecting fraud in mobile
applications. Journal of Big Data, 5(1), 1-16.
[8] Chawla, N. V., & Davis, D. A. (2013). Bringing big data to personalized healthcare: A
patient-centered framework. Journal of General Internal Medicine, 28(3), 660-665.
[9] Chen, Y., Wang, J., & Wei, G. (2018). A comprehensive review of big data analytics
and applications in official statistics. Journal of Big Data, 5(1), 1-23.
[10] Chae, S., & Kim, K. (2018). An empirical study of big data analytics, business
value, and market performance: Evidence from the Korean mobile game industry.
Journal of Business Research, 89, 364-371.
[11] Al-Hadi, A. A. A., Zaidan, A. A., Zaidan, B. B., &Albahri, O. S. (2019). Big
data analytics for fraud detection using multiple classifiers. IEEE Access, 7, 47356-
47371.
[12] Singh, R., & Sharma, V. (2018). A review on big data analytics for fraud
detection in healthcare. Journal of Big Data, 5(1), 1-25.
[13] Sivarajah, U., Kamal, M. M., Irani, Z., &Weerakkody, V. (2017). Critical
analysis of big data challenges and analytical methods. Journal of Business Research,
70, 263-286.
[14] Rani, R., & Mishra, A. K. (2017). Fraud detection in healthcare using big data
analytics: A review. Journal of Big Data, 4(1), 1-18.
[15] Parwekar, P., & Patil, M. (2019). A review of big data analytics in finance:
Concepts, methods, and applications. Journal of Big Data, 6(1), 1-23.
[16] Kshetri, N., & Dholakia, R. (2017). Blockchain-enabled accounting and
assurance: A conceptual framework.
[17] Karim, A., &Luqman, A. (2019). Big data analytics for fraud detection in
insurance industry: A review. Journal of Big Data, 6(1), 1-25.
[18] Fosso Wamba, S., Akter, S., Edwards, A., Chopin, G., &Gnanzou, D. (2015).
How 'big data'can make big impact: Findings from a systematic review and a
longitudinal case study. International Journal of Production Economics, 165, 234-246.
[19] Basiri, M. E., & Hejazi, M. A. (2018). Big data analytics in the automotive
industry: Transformation from product-focused to customer-centric approach. Journal
of Big Data, 5(1), 1-26.
[20] Ahmed, M., Ahmed, S. F., & Ahmed, S. S. (2018). Big data analytics for fraud
detection in the banking sector: A systematic review. Journal of Big Data, 5(1), 1-24.

Doi: 10.5281/zenodo.7922883: ISSN: 1004-9037

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Doi: 10.5281/zenodo.7922883: ISSN: 1004-9037

Uploaded by

Copyright:

Available Formats

ISSN: 1004-9037

BIG DATA ANALYTICS FOR FRAUD DETECTION IN FINANCIAL

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 290

Figure 1. Basic Architecture of Fraud Detection System

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 291

Each of these sectors—finance, government, healthcare, public sector, and insurance—are

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 292

Figure 2. Classification of Various Fraud Detection System

II. Review of Literature

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 293

Paper Year Industry Techniques Challenges and Future

Alhawarat and 2019 Banking Rule-based, Data quality, Developing

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 294

Sun et al. 2019 E- Rule-based, Data quality, Developing

Yang and Wu 2018 Banking Rule-based, Data quality, Developing

Mazhar and 2018 E- Rule-based, Data quality, Developing

Tang et al. 2016 E- Supervised Data quality, Developing

Huang et al. 2020 E- Supervised Data quality, Developing

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 295

Zhang et al. 2018 E- Supervised Data quality, Developing

Kumar et al. 2019 Banking Rule-based, Data quality, Developing

Zhou et al. 2019 Banking Rule-based, Data quality, Developing

Zhang et al. 2017 Banking Supervised Data quality, Developing

Shrivastava et 2019 Banking Rule-based, Data quality, Developing

Fang et al. 2020 E- Supervised Data quality, Developing

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 296

Jiang et al. 2020 Banking Rule-based, Data quality, Developing

Wei et al. 2018 E- Rule-based, Data quality, Developing

Huang and 2019 Insurance Logistic Data quality, Developing

Karim and 2019 Insurance Rule-based, Data quality, Developing

Fosso Wamba 2015 Various N/A Data quality, Developing

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 297

Basiri and 2018 Automotive Predictive Data quality, Developing

Ahmed et al. 2018 Banking Rule-based, Data quality, Developing

Table 1. comparisons various Machine Learning Technique for Fraud Detection

III. Existing Dataset

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 298

Dataset Name Number of Number of Imbalance Features

Credit Card Fraud 284,807 492 0.172% 30

IEEE-CIS Fraud 590,540 49,726 8.14% 434

Synthetic Financial Varies Varies Varies Varies

Table 2. Dataset Can be used for Proposed Approach

IV. Existing Methodology for Investigating the Performance of Different Antenna

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 299

Methodology Description Advantages Disadvantages

Rule-based Creating a set of Easy to implement, May miss more subtle

Machine Training machine Can detect more Requires significant

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 300

Hybrid Combining rule-based Combines advantages May be more complex to

Social Analyzing the social Can identify fraud Limited to the

Graph Analyzing the Can identify complex Requires significant

Predictive Building machine Can detect new or Requires significant

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 301

Network Analyzing the Can detect complex Requires significant

Table 2. Outlines Various existing methodologies used for Wireless Antennas

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 302

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 303

Figure 1. Working Diagram of Proposed System

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 304

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 305

discover potentially fraudulent patterns and behaviors by analyzing transactional

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 306

Journal of Data Acquisition and Processing Vol. 38 (3) 2023 307