
Big Data Analytics in Digital Banking

Matthias Chin Wooi Hoong


Razak Faculty of Technology and
Informatics
Universiti Teknologi Malaysia
cwhmatthias2@graduate.utm.my

Abstract—This paper is primarily a literature review of past research and publications on the changes in banking and the use of Big Data in transforming the traditional banking model into a digital bank. It also highlights the Hadoop (HDFS) framework infrastructure that is used to conduct Big Data Analytics, enabling banks to extract insights from their databases.

Keywords—Big Data, Big Data Analytics, Banking, Hadoop (HDFS)

I. INTRODUCTION

This paper is divided into six sections, starting with an introduction to Big Data and the banking transformation phases defined by the adoption of technologies. Next, the dimensions of digital banking are discussed, followed by the challenges, opportunities and benefits of using Big Data Analytics. The Hadoop HDFS system, which is the primary technology enabler of Big Data Analytics, is then detailed. The primary objective of this paper is to provide a comprehensive assessment of Big Data Analytics in digital banking. The motivation for this research is the scarcity of resources that provide a comprehensive Big Data overview together with the technical framework for Big Data Analytics used in banking in a single paper.

II. LITERATURE REVIEW

Big Data is synonymous with the letter 'V', with a few keywords used to define its character: Volume, Variety, Velocity, Veracity and Value. For clarity, Volume refers to the vast amount of data generated every second. Variety refers to the type of data generated. Velocity refers to the speed at which new data is generated and transmitted. Veracity represents the degree of reliability of the data. The final keyword, Value, is deemed the most important and refers to the business value that can be extracted from the data. [1] [2]

With the presence of structured, unstructured and semi-structured data in big data, the lower controllability of data quality and correctness complicates the entire Big Data Analytics process. Some data may also lose Value over time, as timeliness is a key factor in data extraction. This dimension was introduced as 'Decay' [2], which measures the exponential decay of data value over time in the era of Big Data.

The evolution of Big Data and Data Analytics has undergone three phases, summarized into three periods: [2]

Big Data 1.0 (1994-2005) coincides with the advent of e-commerce in 1994, when content was primarily provided by company websites and user-generated content was minimal. Data analytics consisted mainly of web usage mining, in which web browsing patterns were analyzed; web structure mining, which analyses the structure of websites; and web content mining, whereby useful content is extracted from web pages.

Big Data 2.0 (2005-2014) was driven by Web 2.0 and the social media phenomenon. Web technologies reached the point of maturity where users were able to interact with websites and provide their own content. A data explosion began, with greater variety such as images, text and audio being shared on social platforms. Social media analytics were conducted on this growing volume of data, allowing analysts to interpret human behavior from social sites and to draw insights and conclusions on users' interests, web browsing patterns, friends lists, professions and opinions. This enabled customer relationship marketing campaigns for targeted customers.

Big Data 3.0 (2015-now) evolved with the introduction of Internet of Things (IoT) devices and applications. The main contributions of data from IoT applications are in the form of videos, images, audio and text with live streaming, which creates massive volumes of data being transmitted over the internet. This trend leads to streaming analytics, which differs from social analytics in that it involves real-time event analysis to discover patterns of interest in the data generated or collected.

With the advent of Big Data, Big Data Analytics is naturally used to mine insights from the data collected, using machine learning algorithms, visual analytical tools and big data frameworks.

From the analysis, the exponential generation of data was due to the improvement of technologies, user content generation and the devices developed at each stage. At the same time, banking services were also adapting to the market with the introduction of new technologies.

Banking data is known to have all the characteristics of Big Data, which makes banking one of the most valuable industries in which to employ Big Data Analytics. Yet even with the advancement of technology that has changed the banking landscape worldwide, BDA has yet to be fully implemented in banks due to a list of challenges, as banking is a highly regulated industry with a legacy of privacy rights and concerns.

Banks from various countries have adopted technologies into their business models at different paces. An article from the Massachusetts Institute of Technology classified banks into three 'waves' or categories. The First Wave companies are categorized as 'incrementalists': these companies add digital technologies incrementally to existing operations, either as an overlay or as a minor extension. The Second Wave companies are called 'digital hybrids', as they frequently take advantage of front-end systems to better market to and connect with consumers, but lack middle-office infrastructure and retain legacy back-end systems. Most banks belong to this category, known as hybrid banks. These 'digital hybrids' still maintain centralized databases, cloud-based storage and primitive user data protocols. The Third Wave banks are labeled 'digital natives'. These are the Digital Banks of the Future (DBF). A DBF takes advantage of technologies and designs the business around the needs of the natives, the 50-and-under crowd that grew up with computers in their daily lives. It also employs a mobile-first strategy that drives easy and rapid adoption by millennials through seamless integration into their lives.

DBFs are expected to provide a holistic and customizable customer experience, with an end-to-end digital journey from application through the Know Your Customer (KYC) process for on-boarding new customers using biometrics for identification. Through the mobile-first strategy, a DBF offers electronic credit cards, foreign exchange services, access to P2P payments, and lending opportunities. [3]

Figure 1: Dimensions of Digital Banking

In general, there are eight dimensions in digital banking, similar to traditional banks [4]. The key difference between the digital banking business model and traditional banks is the collection and use of data in decision making, processes, operations, the development of banking products, and the delivery channels of products and services to customers, as further explained in Table 1.

TABLE 1: DIMENSIONS OF A DIGITAL BANK

Customer/Sale/Service: Predominantly the pillar of the digital banking framework. Holistic Customer Relationship Management (CRM) is applied and integrated with an omnichannel platform to engage customers and collect their financial needs and data.
Regulator/Other Banks: Comprises seamless communication with the regulator and other banks in relation to regulatory requirements, fraud, and banking transactions with other banks.
Internal: Comprises the various analytic applications used to measure and manage the risks faced in everyday business operations, such as credit risk, market risk and operational risk.
Technology: Relates to the core banking system, internet and mobile banking, the omnichannel data warehouse, data lake and service architecture.
Data: Refers to the quality of the data captured by the bank and deals with data governance and quality management.
Business Process Reengineering (BPR): Refers to the continuous improvement in redesigning and re-engineering business processes to be more efficient and customer-friendly.
Analytics: The key dimension for driving profitability for the bank, through three levels of analytics, i.e. (i) descriptive, (ii) predictive and (iii) prescriptive, to optimize the bank's efficient use of resources.
People: The manpower requirements and training; mainly the need for business banking specialists, data scientists, analysts, data engineers and software engineers.

As explained in the dimensions of digital banking, the Big Data Analytics capability of a bank is subject to the fulfilment of the structure in Figure 2. [5]

Figure 2: Big Data Analytics Capability

Digital banks can only perform well and deliver tangible results from the invested resources with a balance of the tangible, human and intangible elements depicted in Figure 2. Any missing element will inhibit a digital bank from reaching its desired big data analytics objectives.

A. Big Data Adoption Maturity Model

To measure the maturity of Big Data analytics adoption in banks, one research study proposed using the findings of Capgemini Consulting (2014), which defines three levels of analytics maturity in an organization and their characteristics. This enables banks to assess their current status in the Big Data adoption strategy against the desired outcome and the transformation needed. [6]

Beginner:

1. No defined data structure / informal, conflicting and dispersed data,
2. Poor data governance / basic data reporting using mainly spreadsheet-based tools,
3. Pockets of reporting and analysis capability / dispersed talent,
4. Preliminary analytics strategy.

Proficient:

1. Data available for existing and potential customers / most data still unstructured and internal,
2. Use of some statistical and forecasting tools / coherent procedures for data management,
3. Well-defined recruitment process to attract analytics talent / budget for analytics training,
4. Analytics is used to understand issues and develop data-based options across the business.

Expert:

1. Internal, external and social media data are merged to build an integrated and structured dataset,
2. Established, robust master data management framework for structured and unstructured data sets,
3. Existence of an analytics centre of excellence to promote best practice / strategic partnerships for supplementary analytics skills,
4. Full executive sponsorship of analytics.

B. Big Data Analytics Challenges in Banking

Transforming a traditional bank into a digital bank, or setting up a standalone digital bank, has its own challenges and opportunities. Among the challenges researchers found are issues related to legal and regulatory requirements, privacy, security, data quality, organizational mindset, data visualization, data integrity, inefficient data management, lack of capability in back-end systems, regulatory reporting, general ledger transformation, and compliance. [4]

In another research study, good data quality, analytical savviness in the organization, qualified and well-trained managers, participation of the business departments, executive support in the use of BDA, a phased implementation approach, and change management planning for a mindset change across the organization were cited as the key challenges. [7]

A study of 42 commercial banks and 38 fintech companies in Kenya found that the most common challenges were integrating legacy systems with new technologies, working with different data types, and poor data quality. [8]

Central banks, which provide the core governance of the banking industry, also cited challenges ranging from the cost of state-of-the-art infrastructure, data security, the veracity and quality of data, and data privacy, to the cost of hiring talent in the quest to employ BDA. [9]

The research citing Capgemini's report lists impediments to Big Data analytics ranging from too many "silos" (data is not pooled for the benefit of the entire organization), the time taken to analyse large data sets, a shortage of skilled people for data analysis, big data not being viewed sufficiently strategically by senior management, unstructured content in big data being too difficult to interpret, and the high cost of storing and analysing large data sets, to big data sets being too complex to collect and store. [6]

In a research article on Big Data in the finance and insurance sectors, the researchers mentioned old cultures and infrastructures, lack of skills, data 'actionability' (the ability of banks to take action on the data available), and data security and privacy as key constraints. [10]

Hence it can be summarized that the challenges cited across these studies are similar in theme and support the Big Data Capability Model requirements. It is only when an organization is able to overcome these challenges that they turn into a competitive strength.

C. Big Data Analytics Opportunities for Banking

In one research study, the benefits of BDA include the use of IoT in banking, where sensor data from IoT applications embedded in devices such as smartwatches is used for identification and for tracking fraudulent transactions. Another opportunity derived from BDA is the development of chatbots, which employ Natural Language Processing to extract and leverage user feedback data. Chatbots in banking are efficient and are deployed as HR assistants, market intelligence assistants, workflow assistants, social media channel assistants and, most commonly, customer service assistants. The study also found that 80% of global financial institutions regard chatbots as a golden opportunity to enhance productivity. [7]

Big Data analytics also improves the quality of economic analysis and research, for example through new methods of measuring economic indicators, prices, labor market conditions, the housing market, business sentiment, and so on. BDA enables timelier publication of official data by bridging the time lag before official statistics are issued, and it can produce new types of statistical forecasting to complement 'traditional' statistical data sets through the use of social network analysis such as sentiment analysis. [9]

Other research proposed the use of BDA to optimize bank overdraft fees [11]; to deploy robo-advisors targeted at millennials, who do not need a physical advisor to feel comfortable investing; for algorithmic trading, fraud detection, and loan and insurance writing [12]; for risk management, fraud management and customer segmentation [13]; for risk management processes [6]; and for customer satisfaction and product cross-selling based on sentiment analysis [14].

D. Big Data – Types of Analytics in Banks

BDA for banking and businesses can primarily be divided into two categories: risk-based and revenue-driven analytics. Risk-based analytics aim to minimize the risk of losses, whether financial or reputational, while revenue-driven analytics focus on creating new products by understanding market demand in order to maximize the return on investment. The source data can be obtained internally or externally. Internal data come from branches, analyst reports, Automated Teller Machines (ATM), bank call centers, clients' historical transactions, and internet or mobile banking service logs. External sources include regulatory bodies' data, trading data, financial data, reports about competitors, and social media data. [6]

Figure 3 gives an example of the types of analytics conducted in banks to gain insight into customers and prospects based on the data collected. The integrated dashboard from CRM enables banks to capture all streams of data, from spending patterns, customer income, savings balances, available credit and loans to social media participation, and to determine the level of risk and assign credit scores to similar customer segments. [4]

Figure 3: Big Data Analytics in Banking

In one study, the researchers view BDA from the perspective of the types of analytics and categorize it into six distinct types that can be conducted in a bank, as presented in Figure 4 [4]. A short summary with reference to Figure 4 is provided in Table 2.

Figure 4: Types of Bank Analytics
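Several of the opportunities above, notably customer satisfaction and cross-selling driven by sentiment analysis [14], rest on fairly simple text classification. The following is a minimal, illustrative sketch (not taken from any of the cited studies) of how customer feedback could be scored for sentiment with scikit-learn in Python; the sample messages, labels and the cross-sell rule are invented for demonstration only.

    # Illustrative only: a tiny sentiment classifier over customer feedback.
    # Real deployments would use much larger labelled corpora or pre-trained NLP models.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical labelled feedback: 1 = positive sentiment, 0 = negative sentiment.
    feedback = [
        "love the new mobile app, transfers are instant",
        "great service at the branch today",
        "app keeps crashing when I pay my bills",
        "fees are too high and support never answers",
    ]
    labels = [1, 1, 0, 0]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(feedback, labels)

    # Score fresh feedback and flag satisfied customers as cross-sell candidates.
    new_messages = ["very happy with the savings account", "terrible experience with the card"]
    for msg, pred in zip(new_messages, model.predict(new_messages)):
        print(msg, "->", "cross-sell candidate" if pred == 1 else "follow up / retention")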

TABLE 2: TYPES OF ANALYTICS

Customer Analytics
• Produces a 360º view of the customer in a dashboard and derives insights for marketing, social media, channel, collection and recovery analytics, etc.
• The most explored algorithm is the Genetic Algorithm (GA).

Fraud Analytics
• Covers online and offline fraud via internet banking, ATM, mobile banking and credit cards. Perpetrators can be insiders, outsiders or a collusion of both.
• Social network analytics and text analytics are primarily used for such efforts. Other algorithms used for credit card risk include Support Vector Machines (SVM), Decision Trees (DT) and Self-Organizing Maps (SOM). Insurance fraud is checked with a logistic model, while financial statement fraud is analyzed using Neural Networks (NN), DT, SVM and GA.

Risk Analytics
• Comprises all quantifiable risks, such as credit risk, operational risk and market risk, predicted using data-driven models.
• An example algorithm is the wavelet-based Generalized Optimal Wavelet Decomposing Algorithm (GOWDA), which forecasts volatility in equities in an efficient manner.

Operational Analytics
• Covers all operational issues in a bank, ranging from transactions to the assessment of the bank's growth, performance and profitability, solvency, productivity and liquidity, etc.

Security Analytics
• Includes vulnerability analysis, intrusion detection, anomaly detection, spam and malware detection, DDoS detection, SQL injection attacks and advanced persistent threat prediction.
• For example, one study showed that analysing log files on a big data platform is useful for detecting web intrusions. [15]

HR Analytics
• Includes all insights needed to assess possible attrition, hiring and remuneration.
• Social network analytics is an example of the methods used to provide the insights needed.

E. Benefit of Big Data Analytics

From the various types of Big Data Analytics that can be conducted in banks, coupled with machine learning algorithms, one research study summarized the benefits as follows [12]:

1. Customer Lifetime Value prediction
2. Predictive Maintenance
3. Eliminating Manual Data Entry
4. Detecting Spam
5. Product Recommendation
6. Increasing Customer Satisfaction
7. Financial Analysis
8. Image Recognition
9. Improving Cyber Security

Another group of researchers studied over 100 data mining applications in banking published since 2013 and categorized the value creation into four main sectors, namely security and fraud detection, risk management and investment banking, CRM, and other advanced operational support. The research also listed the key machine learning techniques applied in data mining and the challenges experienced in the banks. The findings conclude that BDA truly delivers value to organizations such as banks, as presented in Table 3. [17]

TABLE 3: SUMMARY OF DATA MINING APPLICATIONS IN BANKING SINCE 2013

Sector: Security and fraud detection
Key techniques: classification (DT, NN, SVM, NB), k-means clustering, ARM
Regions/datasets: Australia, Latin America, Greece, Germany, Belgium, UCI Repository
Purposes: identifying phishing, fraud, money laundering and credit card fraud; security trends of mobile/online/traditional banking.

Sector: Risk management and investment banking
Key techniques: classification (DT, NN, SVM, NB, LR), k-means clustering
Regions/datasets: UCI Repository, international datasets, Australia, Iran, Indonesia, China, Germany, Taiwan, US, Canada
Purposes: credit scoring, credit granting, risk management for peer-to-peer lending.

Sector: CRM, customer profiling and knowledge
Key techniques: classification (DT, NN), k-means clustering
Regions/datasets: Jamaica
Purposes: efficiently build accurate customer profiles.

Sector: CRM, customer segmentation
Key techniques: k-means clustering
Regions/datasets: Iran
Purposes: provide sufficient customer segmentation and conduct customer-centric business strategies.

Sector: CRM, customer satisfaction
Key techniques: k-means clustering, classification (NN)
Regions/datasets: Spain
Purposes: make the most strategic investment in maintaining and enhancing customer satisfaction.

Sector: CRM, customer development and customization
Key techniques: classification (DT, NN, NB, LR, SVM), k-means clustering
Regions/datasets: Portugal, Turkey, China, Taiwan, UCI Repository
Purposes: strategic banking via direct marketing, targeted marketing and product cross-/up-selling.

Sector: CRM, customer retention and acquisition
Key techniques: classification (DT, NN, LR, SVM), ARM, k-means clustering
Regions/datasets: EU, China, Nigeria, Croatia, Bangladesh
Purposes: customer churn prediction and prevention, attracting potential customers, and strategic future service design.

Sector: Other advanced supports
Key techniques: classification (NN, DT, SVM), k-means clustering
Regions/datasets: Nigeria, Turkey, Canada, ASEAN, Islamic banks, BRICS, US
Purposes: branch strategy, bank efficiency evaluation, deposit pricing, early warning.

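Tables 2 and 3 repeatedly name decision trees, SVMs and Naïve Bayes as the classification workhorses for fraud and credit risk. As a purely illustrative sketch, not drawn from any of the cited studies, the following Python example trains a decision-tree fraud classifier on invented transaction features; the feature set, thresholds and data are all hypothetical.

    # Illustrative sketch: decision-tree fraud scoring on synthetic transaction features.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(42)
    n = 5000
    # Invented features: amount, is_foreign, hour_of_day, txns_last_24h
    X = np.column_stack([
        rng.exponential(120, n),   # transaction amount
        rng.integers(0, 2, n),     # foreign-transaction flag
        rng.integers(0, 24, n),    # hour of day
        rng.poisson(3, n),         # transactions in the last 24 hours
    ])
    # Toy ground truth: large foreign night-time transactions are flagged as fraud.
    y = ((X[:, 0] > 400) & (X[:, 1] == 1) & ((X[:, 2] < 6) | (X[:, 2] > 22))).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    clf = DecisionTreeClassifier(max_depth=4, class_weight="balanced").fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te), zero_division=0))
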
BDA can bring many benefits to an organization such as a bank. One of the key pillars supporting BDA is the underlying big data technology, tools and architecture.

F. Big Data Analytics Technology and Architecture

There are many tools and frameworks for distributed processing of big data, such as Hadoop, Apache Spark, ClickHouse, Elasticsearch, Splunk, Hive, Storm, Apache Samza, Apache Flink and Apache Heron [17]. However, Hadoop remains one of the most popular frameworks to date for being reliable and scalable in distributed computation, and primarily because it is open source with strong technical support [18]. It can manage all aspects of Big Data, such as volume, velocity and variety, by storing and processing data across a cluster of nodes [19]. Other key features that make Hadoop suitable for processing big data include flexibility, cost effectiveness, fault tolerance, scalability, robustness and real-time processing.

The major components of Hadoop are the Hadoop Distributed File System (HDFS) with its name node and data nodes, as elaborated below and depicted in Figure 5. [20]

FIGURE 5: NODES IN HADOOP

Name Node: The name node functions as the master node. It should be a high-end system that can survive hardware faults, since it holds the metadata for the whole file system. (In Hadoop's original MapReduce v1 architecture, the master node also hosts the Job Tracker daemon, which schedules jobs across the cluster.) The functions of the name node include:

• Maintaining the namespace of the files stored, i.e. keeping the metadata of file blocks and their locations,
• Maintaining an index of the cluster configuration,
• Directing data nodes to execute low-level operations,
• Recording the changes that take place in the cluster,
• Taking care of the replication of data blocks,
• Receiving the heartbeat of each data node to check whether it is alive; in case of a data node failure, the name node assigns the task to another data node depending on data block availability, location, overhead, etc.

Data Node: A data node serves as a slave node. Data nodes can be any commodity hardware, since the crash of a single node does not cause a problem; replication avoids any damage associated with a data node failure. (In MapReduce v1 deployments, the slave nodes also run the Task Tracker daemon, which executes the tasks assigned by the Job Tracker.) The functions of a data node include:

• Performing low-level read/write operations on blocks,
• Implementing the replication of blocks,
• Forwarding data to other data nodes, pipelining the data, and sending heartbeats to the name node.
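The name node / data node split described above is easiest to see from the client side. The following is a minimal sketch, assuming a configured Hadoop client on the PATH and a hypothetical /bank/transactions directory (neither comes from the cited work), that uses the standard hdfs command-line tools from Python to load a file and ask the name node how its blocks and replicas are laid out.

    # Minimal sketch: interact with HDFS via the standard `hdfs` CLI.
    # Assumes a configured Hadoop client and cluster; paths are hypothetical.
    import subprocess

    def run(cmd):
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

    local_file = "transactions_2020.csv"   # hypothetical local export
    hdfs_dir = "/bank/transactions"        # hypothetical HDFS directory

    run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir])            # name node records the new path
    run(["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir])  # file is split into blocks on data nodes
    run(["hdfs", "dfs", "-ls", hdfs_dir])                     # listing served from name node metadata

    # fsck asks the name node which blocks exist, where their replicas live,
    # and whether the replication target is met.
    run(["hdfs", "fsck", hdfs_dir, "-files", "-blocks", "-locations"])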

Figure 6 shows the layers of the Hadoop cluster structure that can be used in big data processing.

Figure 6: Hadoop Cluster Structure

Table 4 summarizes the tools in the Hadoop ecosystem, and the function of each, that can be implemented for big data analytics in banking for transaction monitoring, as proposed by [21].

TABLE 4: HADOOP FILE SYSTEM TOOLS & FUNCTIONS

Store: HDFS (distributed file system); Cassandra (NoSQL database management system)
Cluster resource management: YARN (operating system for big data applications)
Data processing: Spark (engine for big data processing)
Machine learning: Spark-sklearn (scikit-learn Python library integrated in Apache Spark for exploratory data analysis); Sparkling Water (H2O library integrated in Apache Spark for machine learning in the Hadoop system); TensorFlow (TensorFlow library integrated in Apache Spark for deep learning in the Hadoop system)
Coordination: ZooKeeper (a high-performance coordination service for distributed applications)
Data access: Hive (a data warehouse infrastructure that provides data summarization and ad hoc querying); Pig (a high-level data-flow language and execution framework for parallel computation)
Data collection: Sqoop (application for transferring data between relational databases and Hadoop); Flume (application for collecting, aggregating and moving unstructured data into Hadoop)
Workflow: Oozie (workflow scheduler for managing Hadoop jobs)
Monitoring: Hue (web interface to monitor Hadoop)

The Hadoop common module contains the utilities and libraries needed by the other Hadoop modules. HDFS, as its name suggests, stores data distributed over the commodity machines present in a cluster. Cluster management is handled by the YARN platform, which allocates the resources in the cluster and schedules users' applications. MapReduce provides two key functions, 'Map' and 'Reduce', for computational tasks in the distributed processing of big data. The Map function generally runs first and is used to filter, transform or parse data through parallel processing across the cluster of nodes, and its output is used as the input for Reduce, which summarizes the data from the Map output [22]. The Map phase processes each record sequentially and independently on every node and generates intermediate key-value pairs; the Reduce phase processes and merges all the intermediate values to give the final output, again in the form of key-value pairs [19].

Figure 7: Analysis Framework

Figure 7 is a proposed framework for handling big data using Hadoop. Big data can be loaded into HDFS using Apache Sqoop for structured data or Apache Flume for unstructured data; HDFS can store very large files across multiple nodes in a cluster. It is reliable, as it replicates data across the nodes and hence theoretically does not need RAID (Redundant Array of Independent Disks) storage. Storage, access and modification jobs are handled by two different services, the Job Tracker (master) and the Task Trackers (slaves): the Job Tracker schedules MapReduce jobs to the Task Trackers, which know the location of the data. The data is then processed using MapReduce. At the final analysis stage, the processed dataset containing meaningful information is queried. The analysis can be written directly as MapReduce code, but this requires solid programming skills. As an alternative, Pig Latin, the scripting language of the Pig tool, can be used instead. Pig is built to run MapReduce programs on top of Hadoop. The benefit of using Pig is that far fewer lines of code are required, about 5% of the equivalent MapReduce program, although execution takes about 50% more time. Although slower, it is overall more productive for engineers and analysts in the coding phase. The study by the researchers demonstrated the use of this framework. [19]

Apache Spark is another distributed computing technology that scales horizontally across a cluster for fast and efficient computation. Spark provides its computational framework on top of the Hadoop MapReduce model, subsuming interactive database queries and online processing through streaming. The key attribute of Spark is its in-memory computation, which reduces the read/write latency of intermediate data during processing. Spark is capable of handling different workloads, such as iterative code, batch programs, interactive queries and streaming data, and it is faster than Hadoop because of its in-memory efficiency in read/write computation during execution. The Spark core engine supports APIs in Java, Scala, R and Python, and it also supports 'Map' and 'Reduce' functions. In addition, Spark supports machine learning (MLlib), SQL queries (Spark SQL), streaming data for online processing, and graph algorithms (GraphX). Cluster management in Apache Spark can be performed in three different ways, namely standalone, Hadoop YARN (Yet Another Resource Negotiator) and Mesos. Local file systems or distributed file systems such as HDFS can be accessed using Spark.
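To make the MapReduce and Spark discussion concrete, the following is a minimal PySpark sketch, not taken from the cited studies: the HDFS path and column names are assumptions. It reads transaction records, applies a map-style transformation, caches the intermediate result in memory (the behaviour the section attributes to Spark), and then performs a reduce-style aggregation by key.

    # Minimal PySpark sketch of the map/reduce pattern with in-memory caching.
    # The HDFS path and schema below are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("txn-aggregation").getOrCreate()

    # "Map" side: read and transform each record independently.
    txns = (spark.read.option("header", True)
            .csv("hdfs:///bank/transactions/transactions_2020.csv")
            .withColumn("amount", F.col("amount").cast("double")))

    txns.cache()  # keep the parsed data in memory for the queries below

    # "Reduce" side: shuffle by key and aggregate.
    per_account = (txns.groupBy("account_id")
                   .agg(F.count("*").alias("txn_count"),
                        F.sum("amount").alias("total_amount")))

    per_account.orderBy(F.desc("total_amount")).show(10)
    spark.stop()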

Apache Spark was used in one research study for the classification of banking customers' behaviour using two techniques, namely Naïve Bayes (NB) and Support Vector Machine (SVM). The research found NB to perform better than SVM on their dataset in terms of precision, recall and F-measure. [23]

The methodology applied and the pre-processing steps are illustrated in Figure 8 and Figure 9 respectively.

Figure 8: Methodology Steps

Figure 9: Pre-processing Steps
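As a rough, illustrative analogue of the kind of Spark-based classification compared in [23], the sketch below trains a Naïve Bayes model with Spark's ML API; the input path, feature columns and label are assumptions, not the dataset used in the study.

    # Illustrative Spark ML sketch: Naive Bayes on banking-customer features.
    # Path, columns and labels are invented for demonstration.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import NaiveBayes
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator

    spark = SparkSession.builder.appName("customer-behaviour-nb").getOrCreate()

    df = (spark.read.option("header", True).option("inferSchema", True)
          .csv("hdfs:///bank/customers/behaviour.csv"))   # assumed file

    # Assumed non-negative numeric columns (multinomial NB requires non-negative features)
    features = ["age", "balance", "num_products", "monthly_logins"]
    assembler = VectorAssembler(inputCols=features, outputCol="features")
    data = assembler.transform(df).select("features", "label")  # 'label' assumed to be 0/1

    train, test = data.randomSplit([0.8, 0.2], seed=7)
    model = NaiveBayes(featuresCol="features", labelCol="label").fit(train)

    preds = model.transform(test)
    f1 = MulticlassClassificationEvaluator(labelCol="label", metricName="f1").evaluate(preds)
    print("F1 on held-out data:", f1)
    spark.stop()
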
Similar to the research by [23], another study was conducted using Hadoop, Pig and R for the classification of a bank's customers based on risk criteria. Similar pre-processing steps were employed on Hadoop HDFS, but the researchers used Pig Latin to query the pre-processed data for insights into the dataset before implementing machine learning algorithms in R to classify the customers. In their research, the K-means clustering technique was used to obtain three clusters, ranging from low through medium to high risk. The study concluded by affirming that Hadoop is a useful tool for extracting insights from big data. [20]
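The study above [20] performed its three-cluster risk segmentation in R. A rough Python analogue with scikit-learn, using invented customer features purely for illustration, might look like this:

    # Rough analogue of the three-cluster risk segmentation described in [20].
    # The original study used R; this sketch uses scikit-learn with invented features.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Invented customer features: outstanding balance, missed payments, utilisation ratio.
    X = np.column_stack([
        rng.gamma(2.0, 5000.0, 1000),
        rng.poisson(1.0, 1000),
        rng.beta(2.0, 5.0, 1000),
    ])

    X_scaled = StandardScaler().fit_transform(X)
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

    # Order clusters by mean scaled profile so the labels read low/medium/high risk.
    order = np.argsort(kmeans.cluster_centers_.sum(axis=1))
    names = {cluster: label for cluster, label in zip(order, ["low", "medium", "high"])}
    for cluster_id, count in zip(*np.unique(kmeans.labels_, return_counts=True)):
        print(f"{names[cluster_id]} risk cluster: {count} customers")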

Rapid changes in the banking industry, in particular in risk assessment, have transformed the simple risk management model into a comprehensive risk management model. This motivated a group of researchers to propose a big data architecture of hybrid processing engines and databases. The framework uses the Hadoop ecosystem with ETL and Spark processing engines, together with massively parallel processing (MPP) databases, transactional databases and HDFS. [24] To illustrate the comprehensive risk management model, the researchers mapped the bank's overall risk management model as depicted in Figure 10.

Figure 10: Overall Bank Risk Management Model

The comprehensive risk management model is only part of the bank's organizational systems. Hence the researchers also produced the bank's Big Data Architecture for a holistic view of the technology used.

The bank's Big Data Architecture is divided into four layers. The Data Application Layer uses the Data Service Layer interface to build data analysis application systems; it is mainly used for interaction services between risk applications and big data platforms, and is subsequently upgraded to a full-line data service platform to achieve decoupling between downstream applications and big data platforms. The Data Service Layer includes three types of databases: MPP, transactional databases and the Hadoop Distributed File System (HDFS). The Data Exchange Layer aims to enable two-way exchanges with business systems, bank outlets and external systems. The Data Access Layer accesses bank-internal and external data; internal data includes major business system transaction data (such as the core and loan systems), image data (such as business credentials), system logs, etc. [24] The research provided a detailed architecture design, a comparison of the types and uses of processing engines, and an explanation of each data layer's connections and workflow. Figure 11 shows the primary design used for discussion in the research.

Figure 11: Bank Big Data Architecture

In summary, various case studies with different objectives using Hadoop infrastructure were reviewed, explaining the different process flows and architectures used for Big Data Analytics. One common finding from the cases studied is that the use of Big Data Analytics provided new insights, which validates Value as one of the dimensions of big data.

III. METHODOLOGY

The methodology used in this paper includes the use of previously published reports, journals and websites related to the core focus on banking, big data analytics and big data frameworks to collate the information. Subsequently, key findings from the journals are referenced and extracted to provide a comprehensive overview of Big Data Analytics in banking.

IV. DISCUSSION

The reviews provided a comprehensive insight into Big Data, Big Data Analytics, the infrastructure and tools used for data mining, and the key techniques applied. The findings can therefore serve as a base reference for users currently planning to roll out Big Data projects or to improve the current technical architecture in their respective organizations.

The challenges in Big Data highlighted earlier could also be used to measure or prepare for the potential challenges in the adoption of a Big Data strategy and to mitigate the issues upfront. The author hopes that users can apply the key points listed below in their organizations, according to their requirements, as a draft framework for Big Data adoption.

1. Big Data Adoption Maturity Assessment
2. List of Data Point Sources
3. List of Product Opportunities
4. Challenges that need to be mitigated
5. Big Data Analytics framework based on Hadoop

V. CONCLUSION

The author has tried to synthesize the knowledge from the available literature on the Big Data Analytics framework and the overall environment for the banking industry, which is evolving at a very fast rate due to external push factors.

The expected contribution of this proposed outline is for new big data practitioners, management and bankers to have an overview of Big Data Analytics practice and the potential that big data technologies can offer to the organization. Proposals for further research include a comparison of Big Data maturity models among banks at a domestic, regional or global level, a gap analysis, and an evaluation of the benefits and costs of adopting a Big Data culture and technologies.

REFERENCES

[1] IBM Big Data & Analytics Hub, http://www.ibmbigdatahub.com/blog/why-only-one-5-vs-big-data-really-matters.
[2] Lee, I. (2017). Big data: Dimensions, evolution, impacts, and challenges. Business Horizons, 60(3), 293-303.
[3] Lipton, A., Shrier, D., & Pentland, A. (2016). Digital banking manifesto: The end of banks? Massachusetts Institute of Technology.
[4] Siddiqui, A. A., & Qureshi, R. (2017). Big data in banking: Opportunities and challenges post demonetisation in India. IOSR Journal of Computer Engineering (IOSR-JCE), 2278-0661.
[5] Gupta, M., & George, J. F. (2016). Toward the development of a big data analytics capability. Information & Management, 53(8), 1049-1064.
[6] Lackovic, I. D., Kovsca, V., & Vincek, Z. L. (2016). Framework for big data usage in risk management process in banking institutions. In Central European Conference on Information and Intelligent Systems (p. 49). Faculty of Organization and Informatics Varazdin.
[7] Ravi, V., & Kamaruddin, S. (2017, December). Big data analytics enabled smart financial services: Opportunities and challenges. In International Conference on Big Data Analytics (pp. 15-39). Springer, Cham.
[8] Ndambo, D. (2016). Big Data Analytics and Competitive Advantage of Commercial Banks and Insurance Companies in Nairobi, Kenya (Doctoral dissertation, University of Nairobi).
[9] Wibisono, O., Ari, H. D., Widjanarti, A., Zulen, A. A., & Tissot, B. The use of big data analytics and artificial intelligence in central banking.
[10] Hussain, K., & Prieto, E. (2016). Big data in the finance and insurance sectors. In New Horizons for a Data-Driven Economy (pp. 209-223). Springer, Cham.
[11] Liu, X., Montgomery, A. L., & Srinivasan, K. (2016). Optimizing bank overdraft fees with big data.
[12] Shrivastava, A. (2018). Usage of machine learning in business industries and its significant impact. International Journal of Scientific Research in Science and Technology, 4(8).
[13] Bhuvana, M., Thirumagal, P. G., & Vasantha, S. (2016). Big data analytics - a leveraging technology for Indian commercial banks. Indian Journal of Science and Technology, 9(32), 1-5.
[14] Srivastava, U., & Gopalkrishnan, S. (2015). Impact of big data analytics on banking sector: Learning for Indian banks. Procedia Computer Science, 50, 643-652.
[15] Latib, M. A., Ismail, S. A., Yusop, O. M., Magalingam, P., & Azmi, A. (2018, May). Analysing log files for web intrusion investigation using Hadoop. In Proceedings of the 7th International Conference on Software and Information Engineering (pp. 12-21).
[16] Hassani, H., Huang, X., & Silva, E. (2018). Digitalisation and big data mining in banking. Big Data and Cognitive Computing, 2(3), 18.
[17] Ilyukha, V. (2019). 10 best big data tools for 2020. Jelvix. https://jelvix.com/blog/top-5-big-data-frameworks
[18] Apache Software Foundation. Apache Hadoop. https://hadoop.apache.org/
[19] Jain, A., & Bhatnagar, V. (2016). Crime data analysis using pig with Hadoop. Procedia Computer Science, 78(C), 571-578.
[20] Mohan, L., & Sudheep Elayidom, M. (2016). A novel big data approach to classify bank customers - solution by combining PIG, R and Hadoop. International Journal of Information Technology and Computer Science (IJITCS), 8(9), 81-90.
[21] Sapozhnikova, M. Y., Gayanova, M. M., Vulfin, A. M., Chuykov, A. V., & Nikonov, A. V. (2018). Processing of big data in the transaction monitoring systems. In Информационные технологии и нанотехнологии (pp. 2526-2533).
[22] Beakta, R. (2015). Big data and Hadoop: A review paper. International Journal of Computer Science & Information Technology, 2(2), 13-15.
[23] Etaiwi, W., Biltawi, M., & Naymat, G. (2017). Evaluation of classification algorithms for banking customer's behavior under Apache Spark data processing system. Procedia Computer Science, 113, 559-564.
[24] Ma, S., Wang, H., Xu, B., Xiao, H., Xie, F., Dai, H. N., ... & Wang, T. (2018, October). Banking comprehensive risk management system based on big data architecture of hybrid processing engines and databases. In 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) (pp. 1844-1851). IEEE.
