Abstract—This paper is primarily a literature review of past research and publications on the changes in banking and the uses of Big Data in transforming the traditional banking model into a digital bank. It also highlights the Hadoop (HDFS) system framework infrastructure that is used to conduct Big Data Analytics, enabling banks to extract insights from their databases.

Keywords— Big Data, Big Data Analytics, Banking, Hadoop (HDFS)

I. INTRODUCTION

This paper is divided into six sections, starting with an introduction to Big Data and the banking transformation phases, which are defined by the adoption of technologies. Next, the dimensions of Digital Banking are discussed, followed by the challenges, opportunities and benefits of using Big Data Analytics. The Hadoop HDFS system, which is the primary technology enabler of Big Data Analytics, is then detailed. The primary objective of this paper is to provide a comprehensive assessment of Big Data Analytics in Digital Banking. The motivation for this research is the scarcity of resources that provide, in a single paper, a comprehensive Big Data overview together with the technical framework for Big Data Analytics used in banking.

II. LITERATURE REVIEW

Big Data is synonymous with the letter 'V', as the keywords used to define its character are Volume, Variety, Velocity, Veracity and Value. For clarity, Volume refers to the vast amount of data generated every second. Variety refers to the types of data generated. Velocity refers to the speed at which new data is generated and transmitted. Veracity represents the degree of reliability of the data. The final keyword, Value, is deemed the most important and refers to the business value that can be extracted from the data. [1] [2]

Big data contains structured, unstructured and semi-structured data. As a result, data quality and correctness are less controllable, which complicates the entire Big Data Analytics process. Some data may lose Value over time, as timeliness is also a key factor in data extraction. This dimension was introduced as 'Decay' [2]. It measures how the value of data decays over time, potentially exponentially, in the era of Big Data.

The evolution of Big Data and Data Analytics has undergone three phases, summarized into three periods [2]:

Big Data 1.0 (1994-2005) coincides with the advent of e-commerce, when content was primarily provided by company websites and user-generated content was minimal. Data analytics consisted mainly of three techniques:
- web usage mining, where web browsing patterns were analyzed;
- web structure mining, which analyzes the structure of a website;
- web content mining, whereby useful content is extracted from webpages.

Big Data 2.0 (2005-2014) was driven by Web 2.0 and the social media phenomenon. Web technologies reached a point of maturity where users were able to interact with websites and provide their own content. The data explosion started, with greater variety, such as images, text and audio, being shared on social platforms. Given this increase in data, social media analytics were conducted. These analytics allowed analysts to interpret human behavior on social sites and provided insights and conclusions on users' interests, web browsing patterns, friend lists, professions and opinions. This enabled customer relationship marketing campaigns for targeted customers.

Big Data 3.0 (2015-now) evolves with the introduction of Internet of Things (IoT) devices and applications. The data contributed by IoT applications mainly takes the form of videos, images, audio and text with live streaming, creating massive volumes of data transmitted over the internet. This trend leads to streaming analytics. Streaming analytics differs from social analytics because it involves real-time event analysis to discover patterns of interest in the data generated or collected.

With the advent of Big Data, Big Data Analytics is naturally used to mine insights from the data collected, using machine learning algorithms, visual analytical tools and big data frameworks.

From this analysis, the exponential generation of data was due to the improvement of technologies, user content generation and the devices developed at each stage. At the same time, banking services were also adapting to the market with the introduction of new technologies.

Banking data is known to have all the characteristics of Big Data. This makes banking one of the most valuable industries in which to employ Big Data Analytics (BDA). Technological advances have changed the landscape of banking worldwide. Even so, BDA has yet to be fully implemented in banks due to a list of challenges, as banking is a highly regulated industry with a legacy of privacy rights and concerns.

TABLE 1: DIMENSIONS OF DIGITAL BANK
Dimension | Information
Customer/Sale/Service | Predominantly the pillar of the digital banking framework. Holistic Customer Relationship Management (CRM) is applied and integrated with an omnichannel platform to engage customers and collect their financial needs and data.
Regulator/Other Banks | Comprises seamless communication with the regulator and other banks in relation to regulatory requirements, fraud and banking transactions with other banks.
Internal | Comprises various analytic applications of measurement for managing the risks faced in everyday business operations, such as credit risk, market risk and operational risk.
Technology | Relates to the core banking system, internet and mobile banking, omnichannel data warehouse, data lake and service architecture.
Data | Refers to the quality of the data captured by the bank and deals with data governance and quality management.
Business Process Reengineering (BPR) | Refers to the continuous improvement in redesigning and engineering business processes to be more efficient and friendly to customers.
Analytics | Key dimension driving profitability for the bank through three levels of analytics, i.e., (i) descriptive, (ii) predictive and (iii) prescriptive, to optimize the bank's efficient use of resources.
People | The manpower requirement and training, mainly the need for business banking specialists, data scientists, analysts, data engineers and software engineers.

Banks from various countries adopt technologies into their business models at different paces. An article from the Massachusetts Institute of Technology has classified banks into three 'waves' or categories:
- First Wave companies are categorized as 'incrementalists'. These companies add digital technologies incrementally to existing operations, either as an overlay or a minor extension.
- Second Wave companies are called 'digital hybrids'. They frequently take advantage of front-end systems to better market to and connect with consumers, but they lack middle-office infrastructure and have legacy back-end systems. Most banks belong to this category and are known as hybrid banks. These 'digital hybrids' still maintain centralized databases, cloud-based storage and primitive user data protocols.
- Third Wave banks are labeled 'digital natives'. These are the Digital Banks of the Future (DBF). DBF take advantage of technologies and design the business around the needs of the natives, the under-50 crowd that grew up with computers in their daily lives. They also employ a mobile-first strategy that drives easy and rapid adoption by millennials through seamless integration into their lives.

DBF are expected to provide a holistic and customizable customer experience with an end-to-end digital journey. This journey runs from application to the Know Your Customer (KYC) process, where new customers are on-boarded using biometrics for identification. By adopting the mobile-first strategy, DBF offer electronic credit cards, foreign exchange services, access to P2P payments and lending opportunities. [3]

Figure 1: Dimension of Digital Banking
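The Analytics dimension in Table 1 names three levels of analytics: descriptive, predictive and prescriptive. A small sketch can contrast them.

All of the following are hypothetical illustrations, not part of the framework described in this paper:
- the monthly deposit figures;
- the least-squares linear trend used as the predictive step;
- the candidate actions, with their assumed uplift and cost, used in the prescriptive step.

```python
from statistics import mean

# Hypothetical monthly deposit balances (in millions) for one branch.
balances = [120.0, 124.0, 129.0, 131.0, 136.0, 140.0]


def descriptive(series):
    """Descriptive analytics: summarize what has already happened."""
    return {"mean": mean(series), "min": min(series), "max": max(series)}


def predictive(series):
    """Predictive analytics: forecast the next value from a least-squares linear trend."""
    xs = range(len(series))
    x_bar, y_bar = mean(xs), mean(series)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(series)  # trend value at the next time step


def prescriptive(forecast, target, actions):
    """Prescriptive analytics: choose the cheapest action whose assumed uplift meets the target."""
    feasible = [a for a in actions if forecast + a["uplift"] >= target]
    return min(feasible, key=lambda a: a["cost"]) if feasible else None


# Hypothetical candidate actions with an assumed balance uplift and cost.
actions = [
    {"name": "no action", "uplift": 0.0, "cost": 0.0},
    {"name": "targeted deposit campaign", "uplift": 5.0, "cost": 1.0},
    {"name": "rate promotion", "uplift": 10.0, "cost": 3.0},
]

summary = descriptive(balances)
forecast = predictive(balances)
choice = prescriptive(forecast, target=150.0, actions=actions)
print(summary, round(forecast, 1), choice["name"] if choice else None)
```

With these made-up figures the trend forecast is 143.8. Only the rate promotion lifts it past the 150.0 target, so it is the action selected. Real prescriptive analytics would typically use optimization or simulation over many constraints rather than a single threshold.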
Figure 2: Big Data Analytics Capability

Expert:
1. There is no defined data structure/Informal, conflicting and dispersed data,
2. Poor data governance/Basic data reporting using mainly spreadsheet-based tools,
3. Pockets of reporting and analysis capability/Dispersed talent,
4. Preliminary analytics strategy.

Proficient:
1. Data available for existing and potential customers/Most data are still unstructured and internal,
2. Use of some statistical and forecasting tools/Coherent procedures for data management,
3. Well-defined recruitment process to attract analytics talent/Budget for analytics training,
4. Analytics is used to understand issues and develop data-based options across the business.

Central Banks, which are the core governance of the banking industry, also cited challenges in the quest to employ BDA [9]:
- the cost of state-of-the-art infrastructure;
- data security;
- the veracity and quality of data;
- data privacy;
- the cost of hiring talent.

The research cited from Capgemini's report lists the following impediments to Big Data analytics [6]:
- too many "silos", meaning data is not pooled for the benefit of the entire organization;
- the time taken to analyse large data sets;
- a shortage of skilled people for data analysis;
- big data not being viewed sufficiently strategically by senior management;
- unstructured content in big data being too difficult to interpret;
- the high cost of storing and analysing large data sets;
- big data sets being too complex to collect and store.

A research article on Big Data in the Finance and Insurance Sectors named several key constraints [10]:
- old culture and infrastructures;
- lack of skills;
- data 'actionability', the ability of banks to take action on the available data;
- data security and privacy.
Hence, the challenges cited in this research are similar in theme and support the requirements of the Big Data Capability Model. Only when an organization is able to overcome these challenges will they turn into a competitive strength.

C. Big Data Analytics Opportunities for Banking

In one research study, the benefits of BDA include the use of IoT in banking. Sensor data from IoT applications embedded in devices such as smartwatches is used for identification and for tracking fraudulent transactions.

Another opportunity derived from BDA is the development of chatbots, which employ Natural Language Processing to extract and leverage user feedback data. Chatbots in banking are efficient and are deployed as:
- HR assistants;
- Market Intelligence assistants;
- Workflow assistants;
- Social Media Channel assistants;
- most commonly, Customer Service assistants.

The study also found that 80% of global financial institutions regard chatbots as a golden opportunity to enhance productivity. [7]

Figure 3 gives an example of the types of analytics conducted in banks to gain insight into customers and prospects based on the data collected. The integrated CRM dashboard enables banks to capture all streams of data: spending patterns, customer income, savings balance, available credit, loans and social media participation. With this data, banks can determine the level of risk and provide credit scores to similar customer segments. [4]

Figure 3: Big Data Analytic in Banking
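Grouping customers into similar segments, as described for Figure 3, is commonly done with k-means clustering. A minimal sketch of plain k-means (Lloyd's algorithm) in pure Python follows.

The following are all assumptions made for illustration:
- the two features (monthly income and monthly spending);
- the nine customers;
- k = 3;
- the deterministic farthest-first initialization.

Attaching a risk level or credit-score band to each resulting segment is not shown.

```python
import math

# Hypothetical customers: (monthly income, monthly spending), in thousands.
customers = [
    (2.0, 1.8), (2.2, 2.0), (1.9, 1.7),
    (8.0, 2.0), (8.5, 2.5), (7.8, 1.9),
    (5.0, 4.5), (5.2, 4.8), (4.9, 4.4),
]


def farthest_first(points, k):
    """Deterministic initialization: start from the first point, then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate assignment and update steps until centroids stop moving."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


centroids, labels = kmeans(customers, k=3)
print(labels)
```

With these values the three groups of three customers come out as separate segments, labelled [0, 0, 0, 1, 1, 1, 2, 2, 2]. Random or k-means++ initialization is more common in practice. The deterministic start is used here only so the result is reproducible.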
E. Benefit of Big Data Analytics
TABLE 2: TYPES OF ANALYTICS
strategies.
Application | Techniques | Countries/data source | Purpose
Customer satisfaction | k-means clustering, classification (NN) | Spain | Make the most strategic investment in maintaining and enhancing customer satisfaction.
Customer development and customization | classification (DT, NN, NB, LR, SVM), k-means clustering | Portugal, Turkey, China, Taiwan, UCI Repository | Strategic banking via direct marketing, targeted marketing, product cross/up selling.
Customer retention and acquisition | classification (DT, NN, LR, SVM), ARM, k-means clustering | EU, China, Nigeria, Croatia, Bangladesh | Customer churn prediction and prevention, attracting potential customers and strategic future service design.
Other advanced supports | classification (NN, DT, SVM), k-means clustering | Nigeria, Turkey, Canada, ASEAN, Islamic banks, BRICS, US | Branch strategy, bank efficiency evaluation, deposit pricing, early warning.

BDA may bring many benefits to an organization such as a bank. One of the key pillars supporting BDA is big data technology: its tools and underlying architecture.

F. Big Data Analytics Technology and Architecture

There are many distributed processing tools and frameworks for big data, such as Hadoop, Apache Spark, ClickHouse, Elasticsearch, Splunk Free, Hive, Storm, Apache Samza, Apache Flink and Apache Heron [17]. However, Hadoop is one of the most popular frameworks to date. It is reliable and scalable for distributed computation and, above all, open-source with strong technical support [18]. It can manage all aspects of Big Data, such as volume, velocity and variety, by storing and processing data across a cluster of nodes. [19] Other key features that make Hadoop suitable for processing big data include flexibility, cost effectiveness, fault tolerance, scalability and robustness. Hadoop also supports real-time processing through ecosystem components, although MapReduce itself is batch-oriented.

FIGURE 5: NODES IN HADOOP

Name Node: The name node functions as the master node. It should be a high-end system that can survive hardware faults. Its functions are performed by the NameNode daemon. The JobTracker is the separate MapReduce (Hadoop 1.x) master daemon, which often runs on the same machine. The functions of the name node include:
• Maintaining the namespace of stored files, i.e., keeping the metadata of file blocks and their locations
• Maintaining an index of the cluster configuration
• Directing data nodes to execute low-level operations
• Recording the changes that take place in the cluster
• Taking care of the replication of data blocks
• Receiving the heartbeat of each data node to check whether it is alive. When a data node fails, the name node assigns the task to another data node depending on data block availability, location, overhead, etc.

Data Node: The data node serves as the slave node. It can be any commodity hardware, and a crash will not create any problem, because replication avoids any damage associated with a data node failure.

The operations of a data node are performed by the DataNode daemon. The TaskTracker is the separate MapReduce (Hadoop 1.x) worker daemon that typically runs alongside it. The functions of a data node include:
• Performing low-level read/write operations on blocks
• Implementing replication
• Forwarding data to other data nodes in a pipeline and sending heartbeats to the name node
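The name node and data node duties listed above can be made concrete with a toy, single-process simulation. These duties are heartbeats, block metadata, replication and pipelined writes.

This is not HDFS code and uses no Hadoop API. The following are all hypothetical simplifications:
- the class and method names;
- the 10-second timeout;
- the "least-loaded node" placement rule.

Real HDFS uses rack-aware placement and configurable heartbeat settings. When a failure is detected, the sketch restores the replication factor by copying the lost replicas from a surviving copy onto live nodes.

```python
class DataNode:
    """Toy data node: stores block contents and forwards writes along a pipeline."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write(self, block_id, data, pipeline):
        # Low-level write: store the block locally, then forward it to the next node.
        self.blocks[block_id] = data
        if pipeline:
            pipeline[0].write(block_id, data, pipeline[1:])


class NameNode:
    """Toy name node: keeps block metadata and tracks data-node heartbeats."""

    def __init__(self, replication=3, timeout=10.0):
        self.replication = replication
        self.timeout = timeout    # seconds of silence before a data node is declared dead
        self.nodes = {}           # name -> DataNode
        self.last_heartbeat = {}  # name -> time of the last heartbeat received
        self.locations = {}       # block_id -> set of data-node names (metadata only)

    def register(self, node, now):
        self.nodes[node.name] = node
        self.last_heartbeat[node.name] = now

    def heartbeat(self, name, now):
        self.last_heartbeat[name] = now

    def write_block(self, block_id, data):
        # Simplified placement: the least-loaded nodes, ties broken by name.
        targets = sorted(self.nodes.values(), key=lambda n: (len(n.blocks), n.name))
        targets = targets[: self.replication]
        targets[0].write(block_id, data, targets[1:])  # pipelined replication
        self.locations[block_id] = {n.name for n in targets}

    def check_liveness(self, now):
        """Declare silent nodes dead and re-replicate under-replicated blocks."""
        dead = [n for n, t in self.last_heartbeat.items() if now - t > self.timeout]
        for name in dead:
            del self.nodes[name]
            del self.last_heartbeat[name]
            for holders in self.locations.values():
                holders.discard(name)
        for block_id, holders in self.locations.items():
            missing = self.replication - len(holders)
            if missing <= 0 or not holders:
                continue  # fully replicated, or no surviving replica to copy from
            source = self.nodes[min(holders)]
            for name in sorted(n for n in self.nodes if n not in holders)[:missing]:
                self.nodes[name].write(block_id, source.blocks[block_id], [])
                holders.add(name)
        return dead


name_node = NameNode(replication=3, timeout=10.0)
for i in range(4):
    name_node.register(DataNode(f"dn{i}"), now=0.0)
name_node.write_block("blk_1", b"transactions-part-1")  # stored on dn0, dn1, dn2
for name in ("dn1", "dn2", "dn3"):
    name_node.heartbeat(name, now=12.0)  # dn0 stays silent
dead = name_node.check_liveness(now=15.0)
print(dead, sorted(name_node.locations["blk_1"]))
```

In this run dn0 misses its heartbeats and is declared dead. blk_1 drops to two replicas, and the name node copies it from dn1 onto dn3, leaving the block on dn1, dn2 and dn3.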
Figure 6 shows the layers of the Hadoop cluster structure that can be used in big data processing.

Figure 6: Hadoop Cluster Structure

runs first and is used to filter, transform or parse data through parallel processing in a cluster of nodes. Its output is used as the input for 'Reduce', which summarizes the data from the Map output. [22] The Map phase processes each record sequentially and independently on every node and generates intermediate key-value pairs. The Reduce phase processes and merges all the intermediate values to give the final output, again in the form of key-value pairs. [19]
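The Map and Reduce phases described above can be imitated in a single Python process.

In Hadoop, map and reduce tasks run in parallel on different nodes. Between them, the framework itself groups and sorts the intermediate pairs by key (the shuffle). The transaction-total example and the function names below are hypothetical and only mirror that data flow.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical transaction records in the form "customer_id,amount".
records = [
    "C001,250.00",
    "C002,99.50",
    "C001,100.00",
    "C003,42.00",
    "C002,0.50",
]


def map_phase(record):
    """Map: parse one record independently and emit an intermediate (key, value) pair."""
    customer_id, amount = record.split(",")
    yield customer_id, float(amount)


def shuffle(pairs):
    """Shuffle: group intermediate values by key (Hadoop performs this step itself)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all intermediate values for one key into a final (key, value) pair."""
    return key, sum(values)


intermediate = chain.from_iterable(map_phase(r) for r in records)
result = dict(reduce_phase(k, v) for k, v in sorted(shuffle(intermediate).items()))
print(result)
```

Each call to `map_phase` depends only on its own record, which is what lets Hadoop spread the Map work across nodes. `reduce_phase` only sees values that share a key, so different keys can be reduced in parallel as well.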
supports Machine Learning (MLlib), SQL queries (Spark SQL), streaming data for online processing and graph algorithms (GraphX). Cluster management in Apache Spark can be performed in three different ways, namely standalone, Hadoop YARN (Yet Another Resource Negotiator) and Mesos. Spark can access the local file system or a distributed file system such as HDFS.

Figure 10: Overall Bank Risk Management Model.
Figure 11: Bank Big Data Architecture

the adoption of a Big Data strategy and mitigate the issues upfront. The author hopes that readers can apply the key steps listed below in their organizations, according to their requirements, as a draft framework for Big Data adoption.

V. CONCLUSION
[10] Hussain, K., & Prieto, E. (2016). Big data in the finance and insurance sectors. In New Horizons for a Data-Driven Economy (pp. 209-223). Springer, Cham.
[11] Liu, X., Montgomery, A. L., & Srinivasan, K. (2016). Optimizing Bank Overdraft Fees with Big Data.
[12] Shrivastava, A. (2018). Usage of Machine Learning In Business Industries and Its Significant Impact. International Journal of Scientific Research in Science and Technology, 4(8).
[13] Bhuvana, M., Thirumagal, P. G., & Vasantha, S. (2016). Big Data Analytics-A Leveraging Technology for Indian Commercial Banks. Indian Journal of Science and Technology, 9(32), 1-5.
[14] Srivastava, U., & Gopalkrishnan, S. (2015). Impact of big data analytics on banking sector: Learning for Indian banks. Procedia Computer Science, 50, 643-652.
[15] Latib, M. A., Ismail, S. A., Yusop, O. M., Magalingam, P., & Azmi, A. (2018, May). Analysing Log Files For Web Intrusion Investigation Using Hadoop. In Proceedings of the 7th International Conference on Software and Information Engineering (pp. 12-21).
[16] Hassani, H., Huang, X., & Silva, E. (2018). Digitalisation and big data mining in banking. Big Data and Cognitive Computing, 2(3), 18.
[17] Ilyukha, V. (2019). 10 Best Big Data Tools for 2020. Jelvix. https://jelvix.com/blog/top-5-big-data-frameworks
[18] Apache Software Foundation. Apache Hadoop. https://hadoop.apache.org/
[19] Jain, A., & Bhatnagar, V. (2016). Crime data analysis using pig with Hadoop. Procedia Computer Science, 78(C), 571-578.
[20] Mohan, L., & Sudheep Elayidom, M. (2016). A Novel Big Data Approach to Classify Bank Customers-Solution by Combining PIG, R and Hadoop. International Journal of Information Technology and Computer Science (IJITCS), 8(9), 81-90.
[21] Sapozhnikova, M. Y., Gayanova, M. M., Vulfin, A. M., Chuykov, A. V., & Nikonov, A. V. (2018). Processing of big data in the transaction monitoring systems. In Информационные технологии и нанотехнологии [Information Technology and Nanotechnology] (pp. 2526-2533).
[22] Beakta, R. (2015). Big Data And Hadoop: A Review Paper. International Journal of Computer Science & Information Technology, 2(2), 13-15.
[23] Etaiwi, W., Biltawi, M., & Naymat, G. (2017). Evaluation of classification algorithms for banking customer's behavior under Apache Spark Data Processing System. Procedia Computer Science, 113, 559-564.
[24] Ma, S., Wang, H., Xu, B., Xiao, H., Xie, F., Dai, H. N., ... & Wang, T. (2018, October). Banking Comprehensive Risk Management System Based on Big Data Architecture of Hybrid Processing Engines and Databases. In 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) (pp. 1844-1851). IEEE.