Analytics On Banking Transactions Document

ANALYTICS ON BANKING TRANSACTIONS
Mini Project work submitted to Avinashilingam Institute for Home Science and Higher
Education for Women in partial fulfillment for the degree of
MASTER OF SCIENCE IN COMPUTER APPLICATIONS
SUBMITTED BY
B.DHARANI (16PCA002)
Under the Guidance of

Mrs. Dr. G. SUDHAMATHY MCA., Ph.D.
Assistant Professor Department of Computer Science
AVINASHILINGAM INSTITUTE FOR HOME SCIENCE AND HIGHER EDUCATION

FOR WOMEN
FACULTY OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
COIMBATORE-641043
Signature of the supervisor Signature of the Co-ordinator
Signature of the Head of the Department Signature of the External Examiner

DECLARATION
I hereby declare that the project entitled “ANALYTICS ON BANKING
TRANSACTIONS” is a record of the original work done by B. DHARANI under the guidance
of Dr. G. SUDHAMATHY, MCA., Ph.D., Assistant Professor, Department of Computer Science,
Faculty of Science, Avinashilingam Institute for Home Science and Higher Education for
Women, in partial fulfillment for the degree of Master of Science in Computer Applications,
and that this project work has not formed the basis for any Degree/Diploma/Associateship.
PLACE:
DATE:
Signature of the Candidate

ACKNOWLEDGEMENT
I would like to express my sincere thanks to God Almighty for the constant grace
that he has shown upon me.
I am very grateful to Shri. Dr. P. R. Krishnakumar, Chancellor, Avinashilingam
Institute for Home Science and Higher Education for Women, Coimbatore, for his support and
encouragement during the course of my study.
I heartily thank Dr. Premavathy Vijayan, M.Sc., M.Ed., Dip.Spl.Edn., M.Phil.,
Ph.D., Vice Chancellor for extending all resources that facilitated the conduct of the present
study.
I express my humble gratitude to Dr. A. Kowsalya, Registrar, Avinashilingam Institute
for Home Science and Higher Education for Women, Coimbatore, for providing all facilities
necessary for the study.
I am also thankful to Dr. A. Parvathi, M.Sc., Dip.Ed., M.Phil., Ph.D., Dean Faculty of
Science, for granting the facility required.
I wish to place on record my deep sense of gratitude to Dr. V. Radha, M.Sc., M.Phil.,
Ph.D., Professor and Head, Department of Computer Science, for providing all the facilities to
complete the project.
I owe a great deal of gratitude to my esteemed guide Dr. G. Sudhamathy, M.C.A., Ph.D.,
Assistant Professor, Department of Computer Science, for her tremendous assistance
and well-timed support towards the success of my project.
I am very grateful to all the other Faculty members of our Department for their valuable guidance
throughout the course of the project. Last but not least, I wish to express my deep sense of gratitude and
sincere thanks to my Parents and Friends for their support during the course of my project.
ABSTRACT
The big data revolution of the 21st century has found a resonance
with banking firms, considering the valuable data they have been storing for many decades. This
data has now unlocked secrets of money movements, helped prevent major disasters and thefts,
and helped banks understand consumer behaviour. Banks reap the most benefits from big data as they can now
extract good information quickly and easily from their data and convert it into meaningful
benefits for themselves and their customers.
Banks internationally are beginning to harness the power of data in order to derive
utility across various spheres of their functioning, ranging from sentiment analysis, product cross
selling, regulatory compliance management, reputational risk management, financial crime
management and much more. Indian banks are catching up with their international counterparts;
however, a lot of scope remains.
This paper aims to capture how big data analytics is being successfully used in the banking
sector, with respect to the following aspects:
1. Spending pattern of customers
2. Channel usages
3. Customer Segmentation and Profiling
4. Product Cross Selling based on the profiling to increase hit rate
5. Sentiment and feedback analysis
6. Security and fraud management

CONTENTS
S. NO. PARTICULARS
1 INTRODUCTION
1.1. ABOUT THE PROJECT
2 SYSTEM CONFIGURATION
2.1. HARDWARE SPECIFICATION
2.2. SOFTWARE SPECIFICATION
2.3. ABOUT THE SOFTWARE
3 SYSTEM STUDY AND ANALYSIS
3.1. EXISTING SYSTEM
3.2. PROPOSED SYSTEM
4 SYSTEM DESIGN
4.1. INPUT DESIGN
4.2. OUTPUT DESIGN
5 SYSTEM DEVELOPMENT
5.1. MODULES
5.2. MODULE DESCRIPTION
6 SYSTEM TESTING AND IMPLEMENTATION
6.1. SYSTEM TESTING
6.2. SYSTEM IMPLEMENTATION
7 CONCLUSION
8 SCOPE OF FUTURE ENHANCEMENT
9 BIBLIOGRAPHY
10 APPENDIX
DATA SETS
SCREEN SHOTS
CHAPTER 1
INTRODUCTION
1.1. ABOUT THE PROJECT

The big data revolution of the 21st century has found a resonance with
financial service firms, considering the valuable data they have been storing for many decades.
This data has now unlocked secrets of money movements, helped prevent major disasters and
thefts, and helped firms understand consumer behavior. Banks reap the most benefits from Data Analytics as
they can now extract good information quickly and easily from their data and convert it into
meaningful benefits for themselves and their customers. Financial firms are looking to
apply Data Analytics in areas ranging from front-office risk management to back-office trade
operations.
This paper aims to capture how big data analytics is being successfully used in the banking
sector, with respect to the following aspects:
1. Spending pattern of customers
2. Channel usages
3. Customer Segmentation and Profiling
4. Product Cross Selling based on the profiling to increase hit rate
5. Sentiment and feedback analysis
6. Security and fraud management
CHAPTER 2
SYSTEM CONFIGURATION
2.1. HARDWARE SPECIFICATION

• Hard Disk : 232.88 GB
• Monitor : 21" Color with VGA card support
• RAM : 2.00 GB
• Processor : AMD A4 PRO-3340B with Radeon HD Graphics
• Processor speed : 2.20 GHz
• System Type : 64-bit Operating System
2.2. SOFTWARE SPECIFICATION

• Operating System : Windows 8.1 Pro
• Front End : R Programming
• Back End : MS Excel, CSV Files
2.3. ABOUT THE SOFTWARE
FRONT END: ABOUT R

R is a programming language and software environment for statistical computing and
graphics. The R language is widely used among statisticians and data miners for developing
statistical software and data analysis. Polls, surveys of data miners, and studies of scholarly
literature databases show that R's popularity has increased substantially in recent years.
R is an implementation of the S programming language combined with lexical scoping
semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. There are
some important differences, but much of the code written for S runs unaltered in R. R was created by
Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently
developed by the R Development Core Team, of which Chambers is a member. R is named
partly after the first names of the first two R authors and partly as a play on the name of S.
R is a GNU project. The source code for the R software environment is written primarily in
C, Fortran, and R. R is freely available under the GNU General Public License, and
pre-compiled binary versions are provided for various operating systems. R uses a command line
interface; there are also several graphical front-ends for it.
The main advantages of R are:

• R is a free, open source language
• R is cross-platform compatible
• R is among the most advanced statistical programming languages
• R produces outstanding graphical output
• R is flexible and fun
• R is extremely comprehensive
• R interfaces easily with other programming languages
• R can handle large data sets stored in flat files
• R can handle unstructured data
R IS
• An open source programming language
• Data handling and storage: numeric, textual
• Matrix algebra
• Hash tables and regular expressions
• High-level data analytic and statistical functions
• Graphics
• A programming language with loops, branching, subroutines and object-oriented features
R IS NOT
• A database, but it connects to DBMSs
• A point-and-click user interface, but it connects to Java and Tcl/Tk
• A fast language interpreter (it can be very slow), but it allows calling your own C / C++ code
• A spreadsheet view of data, but it connects to Excel / MS Office
• Backed by professional / commercial support
Thus, R is an integrated suite of software facilities for data manipulation, calculation and
graphical display.
R includes:
• An effective data handling and storage facility,
• A suite of operators for calculations on arrays, in particular matrices,
• A large, coherent, integrated collection of intermediate tools for data analysis,
• Graphical facilities for data analysis and display either onscreen or on hardcopy, and
• A well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
There are several ways to work with R:
• With the R console GUI;
• With the RStudio IDE;
• With the Tinn-R editor and the R console;
• From one of the other IDEs such as JGR;
• From a command line interface (CLI);
• From the ESS (Emacs Speaks Statistics) module of the Emacs editor.
GRAPHICS IN R
• R is a powerful environment for data visualization
• Integrated graphics
• Good quality of graphics
• Full control over graphics
• Graphs can be reproduced
• A huge number of R packages for graphics are available
• R graphs can be viewed on screen
• R graphs can be easily interpreted for results
BASE GRAPHICS PLOTTING FUNCTIONS

• plot - x-y plotting
• barplot - bar plots
• boxplot - box & whisker plot
• hist - histogram
• pie - pie charts
• dotchart - dot plots
• image, heatmap, contour, persp - functions to generate image-like plots
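A minimal sketch of a few of the base plotting functions, using small made-up vectors. The values and variable names are illustrative assumptions and are not taken from the project data.

```r
# Illustrative data only: made-up monthly transaction counts and amounts
counts  <- c(Jan = 12, Feb = 18, Mar = 9, Apr = 15)
amounts <- c(250, 400, 150, 320, 610, 90, 275, 330)

pdf(NULL)                      # null device: nothing is written to disk
par(mfrow = c(2, 2))           # 2 x 2 grid of plots
plot(seq_along(amounts), amounts, type = "b",
     xlab = "Transaction", ylab = "Amount")        # x-y plot
barplot(counts, main = "Transactions per month")   # bar plot
boxplot(amounts, main = "Amount spread")           # box and whisker plot
h <- hist(amounts, main = "Amount distribution")   # histogram; also returns the bin counts
dev.off()
```

In an interactive session the `pdf(NULL)` and `dev.off()` lines can be dropped so that the plots appear in the graphics window.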
R DATA FORMATS
In R, a data frame is used for storing data tables. Its columns are vectors that must all have
the same length but need not be of the same type. For instance, you can combine in one data frame a logical, a
character and a numeric vector.
• CSV: Comma Separated Values (text file) is the file type used for data analysis in this project,
but data can also be imported from various other formats.
• Matrix format (using Microsoft Excel)
• Table format dataset
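The data frame and CSV points above can be sketched as follows. The column names and values are illustrative assumptions; `text =` stands in for a real file path.

```r
# Columns of different types, all of length 3 (illustrative values)
accounts <- data.frame(
  active  = c(TRUE, FALSE, TRUE),          # logical
  holder  = c("A", "B", "C"),              # character
  balance = c(1200.50, 310.00, 9875.25),   # numeric
  stringsAsFactors = FALSE
)
str(accounts)

# Reading CSV content into a data frame; text = replaces a file name here
csv_text <- "account_id,amount\nA1,100\nA2,250.5\n"
from_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)
```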
BACKEND: MS EXCEL / CSV FILES
Microsoft Excel is a software program produced by Microsoft Corp. that allows users to
organize, format and calculate data with formulas using a spreadsheet system. This software is
part of the Microsoft Office suite and is compatible with other applications in the suite.
Excel is a commercial spreadsheet application produced and distributed by Microsoft for
Microsoft Windows and Mac OS X. It features the ability to perform basic calculations, use
graphing tools, create pivot tables and write macros in a macro programming language.
Like every spreadsheet, Excel uses a collection of cells
arranged into rows and columns to organize and manipulate data. It can also display data as charts,
histograms and line graphs.
The Features of Microsoft Excel are

• Multi-threaded recalculation (MTR) for commonly used functions
• Improved pivot tables
• More conditional formatting options
• Additional image editing capabilities
• In-cell charts called sparklines
• Ability to preview before pasting
• Ability to customize the Ribbon
• Many new formulas, most highly specialized to improve accuracy
CHAPTER 3
SYSTEM STUDY AND ANALYSIS
3.1. EXISTING SYSTEM

In the existing system, the transactions are captured and stored in the database. No
descriptive or predictive analytics is done to understand customer behavior and take
further action accordingly.
Drawbacks
• Lack of fraudulent action detection
• Only transactional queries done
• Customer behavior not understood
• No insight found from the data
• Large history data retained with no use
3.2. PROPOSED SYSTEM

The aim of the proposed system is to capture how data analytics is being successfully
used in the banking sector, with respect to the following aspects:
• Spending pattern of customers
• Channel usages
• Customer Segmentation and Profiling
• Product Cross Selling based on the profiling to increase hit rate
• Sentiment and feedback analysis
• Security and fraud management
Advantages
• Fraudulent activities are identified
• Customer Feedback Analysis done
• Spending and Earning Patterns of Account Holders Understood
• Greater efficiency
• Better Service
• Minimum time required
CHAPTER 4
SYSTEM DESIGN
4.1. INPUT DESIGN

Input design is the process of converting a user-oriented description of inputs to a computer-based
specification. It includes collecting the required data and grouping similar or
related data. Input design is the part of overall system design that requires very careful
attention. MS Excel and CSV files are used as input files for processing.
The banking transactions dataset used as input consists of the fields Transaction Id,
Account Id, Date, Transaction Type, Operation, Amount and Balance. This dataset consists of 32408
records for a period of 1 year from 01/01/2010 to 31/12/2010. This dataset, in the Comma
Separated Values file format, is loaded into the data analytics tool R. The Banking Feedback
Dataset consists of the fields Month, Service Quality, Service Speed and Solution to Queries.
Each of the three rating fields takes a value from 1 to 5. This dataset consists of the feedback of the customers
who visited the bank over a period of 1 year from 01/01/2010 to 31/12/2010.
4.2. OUTPUT DESIGN

The most important feature of an information system is its output. Without reliable and
expected results, a user may feel that the entire system is a failure. Any output from a
computerized system must communicate the result of data processing to the
users. The graphs and the output data in text format are
the final outputs of this project. The graphs can be viewed in the R GUI, and the text data can be
transferred to an Excel file for further analysis. The objectives of output design are
accuracy, efficiency and security of the data stored thereafter.
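One way the text output could be handed over to Excel is through a CSV file. The sketch below assumes a small made-up summary table and a temporary file path; neither comes from the project itself.

```r
# Illustrative summary table (made-up values)
summary_out <- data.frame(month = 1:3, net_amount = c(1500, -200, 730))

out_file <- tempfile(fileext = ".csv")   # hypothetical output location
write.csv(summary_out, out_file, row.names = FALSE)

# Read the file back to confirm the round trip
check <- read.csv(out_file)
```

A CSV file written this way opens directly in Excel.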
CHAPTER 5
SYSTEM DEVELOPMENT
System Development is a series of operations performed to manipulate data to produce
output from a computer system.
5.1. MODULES
The modules in this project are:
1) Data Loading
2) Data Preparation
3) Data Analysis
a) Customer Satisfaction Index Analysis
b) Net Credit Transactions Count
c) Net Credit Transactions Amount
d) Net Debit Transactions Count
e) Net Debit Transactions Amount
f) Net Transactions Amount
g) Transaction Types and Operations
h) Spending and Credit Patterns of Customers
5.2. MODULE DESCRIPTION
DATA LOADING
The Banking Transactions Dataset consists of the fields Transaction Id, Account Id, Date,
Transaction Type, Operation, Amount and Balance. This dataset consists of 32408 records for a
period of 1 year from 01/01/2010 to 31/12/2010. This dataset, in the Comma Separated Values
file format, is loaded into the data analytics tool R. The Banking Feedback Dataset consists of
the fields Month, Service Quality, Service Speed and Solution to Queries. Each of the three
rating fields takes a value from 1 to 5. This dataset consists of the feedback of the customers who visited the bank
over a period of 1 year from 01/01/2010 to 31/12/2010.
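A minimal sketch of the loading step. The column names, operation names and the three sample rows are assumptions made for illustration; the real datasets would be read from their CSV files instead of from inline text.

```r
# Stand-in for the transactions CSV; column names and rows are assumptions
bank_csv <- "transaction_id,account_id,date,type,operation,amount,balance
1,xxxx-xxxx-xxxx-xx11,05/01/2010,CREDIT,CASH,5000,15000
2,xxxx-xxxx-xxxx-xx11,17/01/2010,DEBIT,ATM,2000,13000
3,xxxx-xxxx-xxxx-xx12,03/02/2010,CREDIT,TRANSFER,7500,9500"

bank <- read.csv(text = bank_csv, stringsAsFactors = FALSE)
bank$date  <- as.Date(bank$date, format = "%d/%m/%Y")   # dates are day/month/year
bank$month <- as.integer(format(bank$date, "%m"))        # month number, 1 to 12

# Stand-in for the feedback CSV; ratings are on a 1 to 5 scale
feedback_csv <- "Month,Service Quality,Service Speed,Solution to Queries
1,4,3,5
1,5,4,4
2,3,3,4"
feedback <- read.csv(text = feedback_csv)   # header spaces become dots, e.g. Service.Quality
```

With files on disk, `read.csv(text = ...)` would be replaced by `read.csv("path/to/file.csv")`, where the path is whatever the dataset is saved as.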
DATA PREPARATION
The loaded datasets are summarized by month and account ID for further
processing. This summarized data set is then divided into debit count, credit count, debit
amount and credit amount transactions. The earnings and the spending of each customer are
segregated. The minimum and maximum earnings and spending of each customer in every
month are extracted.
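The preparation code itself is not listed in this report. The following is a sketch of how such a summary could be built with base R, assuming a `bank` data frame with `month`, `account_id`, `type` and `amount` columns and made-up values.

```r
# Illustrative transactions (made-up values; column names are assumptions)
bank <- data.frame(
  month      = c(1, 1, 1, 2, 2),
  account_id = c("A", "A", "A", "A", "B"),
  type       = c("CREDIT", "CREDIT", "DEBIT", "DEBIT", "CREDIT"),
  amount     = c(100, 300, 50, 75, 500)
)

# Number of transactions per month, account and type
counts <- aggregate(amount ~ month + account_id + type, data = bank, FUN = length)
names(counts)[names(counts) == "amount"] <- "count"

# Total amount per month, account and type
totals <- aggregate(amount ~ month + account_id + type, data = bank, FUN = sum)

summarydata <- merge(counts, totals, by = c("month", "account_id", "type"))

# Split into credit and debit summaries
credits <- summarydata[summarydata$type == "CREDIT", ]
debits  <- summarydata[summarydata$type == "DEBIT", ]
```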
DATA ANALYSIS
a) Customer Satisfaction Index Analysis
The customer feedback is analyzed based on the ratings given on the three
aspects listed below for each month of the year 2010.
1) Is the customer happy with the quality of service?
2) Is the customer happy with the speed of service?
3) Are customer queries addressed effectively?
# Set up an empty plot whose axes span the months and the CSI value range
xrange <- summarycsi$month
yrange <- summarycsi$range
plot(xrange, yrange, type = "n", xlab = "Month", ylab = "CSI value")
colors <- rainbow(3)
# Draw one line per feedback aspect
lines(summarycsi$month, summarycsi$Service.Quality, type = "b", col = colors[1], lwd = 2)
lines(summarycsi$month, summarycsi$Service.Speed, type = "b", col = colors[2], lwd = 2)
lines(summarycsi$month, summarycsi$Solution.to.Queries, type = "b", col = colors[3], lwd = 2)
title("Customer Feedback")
# One legend entry per line
csilegend <- c("Service Quality", "Service Speed", "Solution to Queries")
legend("topleft", csilegend, cex = 0.7, fill = colors)
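How the monthly summary is computed is not shown in this report. A sketch under the assumption that it holds the mean rating per month for each aspect, using made-up ratings:

```r
# Illustrative feedback records (made-up ratings on the 1 to 5 scale)
feedback <- data.frame(
  Month               = c(1, 1, 2, 2),
  Service.Quality     = c(4, 5, 3, 4),
  Service.Speed       = c(3, 4, 3, 5),
  Solution.to.Queries = c(5, 4, 4, 4)
)

# Mean rating per month for each aspect (the use of the mean is an assumption)
summarycsi <- aggregate(cbind(Service.Quality, Service.Speed, Solution.to.Queries) ~ Month,
                        data = feedback, FUN = mean)
names(summarycsi)[names(summarycsi) == "Month"] <- "month"
```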
b) Net Credit Transactions Count Per Month Per Account
The Bank Transaction data is aggregated on the Month, Account Id and
Transaction Type to find the count of net credit
transactions. The visual graphs for the “Net Credit Transactions Count” are
generated. This is done to understand the customer segmentation and for customer
profiling.
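library(sqldf)    # SQL queries on data frames
library(lattice)  # barchart()
# Select the monthly credit transaction counts for each account
query <- "SELECT month, account_id, count FROM summarydata WHERE type = 'CREDIT'"
summarydatacr <- sqldf(query)
barchart(count ~ month, groups = account_id, data = summarydatacr, auto.key = list(columns = 3),
         main = "Net Credit Transactions Count", xlab = "Month", ylab = "Transaction Count")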
c) Net Credit Transactions Amount Per Month Per Account
The Bank Transaction data is aggregated on the Month, Account Id and
Transaction Type to find the amount of net
credit transactions. The visual graphs for the “Net Credit Transactions
Amount” are generated. This is done to understand the customer segmentation and for
customer profiling.
# Select the monthly credit amounts; the columns are renamed for the later join with debits
query <- "SELECT month month1, account_id account_id1, amount amount1
          FROM summarydata WHERE type = 'CREDIT'"
summarydatacramt <- sqldf(query)
barchart(amount1 ~ month1, groups = account_id1, data = summarydatacramt,
         auto.key = list(columns = 3), main = "Net Credit Transactions Amount",
         xlab = "Month", ylab = "Transaction Amount (in Rs.)")
d) Net Debit Transactions Count Per Month Per Account
The Bank Transaction data is aggregated on the Month, Account Id and
Transaction Type to find the count of net debit
transactions. The visual graphs for the “Net Debit Transactions Count” are
generated. This is done to understand the customer segmentation and for customer
profiling.
# Select the monthly debit transaction counts for each account
query <- "SELECT month, account_id, count FROM summarydata WHERE type = 'DEBIT'"
summarydatadb <- sqldf(query)
barchart(count ~ month, groups = account_id, data = summarydatadb, auto.key = list(columns = 3),
         main = "Net Debit Transactions Count", xlab = "Month", ylab = "Transaction Count")
e) Net Debit Transactions Amount Per Month Per Account
The Bank Transaction data is aggregated on the Month, Account Id and
Transaction Type to find the amount of net
debit transactions. The visual graphs for the “Net Debit Transactions Amount”
are generated. This is done to understand the customer segmentation and for customer
profiling.
# Select the monthly debit amounts; the columns are renamed for the later join with credits
query <- "SELECT month month2, account_id account_id2, amount amount2
          FROM summarydata WHERE type = 'DEBIT'"
summarydatadbamt <- sqldf(query)
barchart(amount2 ~ month2, groups = account_id2, data = summarydatadbamt,
         auto.key = list(columns = 3), main = "Net Debit Transactions Amount",
         xlab = "Month", ylab = "Transaction Amount (in Rs.)")
f) Net Transactions Amount
Analysis of historical transactions and the consumption capacity of customers, coupled
with behavioral analysis, can help reveal a potential threat to the system as well as
uncover frauds that might have happened in the past. Any unusual count or amount of
transactions in a particular account in a particular month can be an indication of fraudulent
activity. The Bank Transaction data is aggregated on the Month, Account Id and
Transaction Type to find the amount of debit and credit transactions. The visual graphs
for the “Net Transaction Amount” are generated.
# Join credit and debit amounts per month and account, and compute the net amount
query <- "SELECT month1, account_id1, amount1, amount2, (amount1 - amount2) 'netamt'
          FROM summarydatacramt, summarydatadbamt
          WHERE month1 = month2 AND account_id1 = account_id2"
summarydatanetamt <- sqldf(query)
barchart(netamt/1000 ~ month1, groups = account_id1, data = summarydatanetamt,
         auto.key = list(columns = 3), main = "Net Transactions Amount", xlab = "Month",
         ylab = "Transaction Amount (in Rs. 1000)", ylim = c(0, 5000))
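The report identifies unusual amounts visually from the bar chart. As one possible numeric complement, the sketch below flags months whose net amount lies far from the account's mean. The z-score rule, the threshold of 2 and the data values are all assumptions for illustration, not the method used in this project.

```r
# Illustrative monthly net amounts for one account (made-up values)
net <- data.frame(
  month  = 1:12,
  netamt = c(100, 120, 95, 110, 105, 900, 98, 115, 102, 108, 97, 111)
)

# Flag months whose amount lies more than 2 standard deviations from the mean.
# The threshold of 2 is an assumption chosen for illustration.
z <- (net$netamt - mean(net$netamt)) / sd(net$netamt)
net$unusual <- abs(z) > 2

net[net$unusual, ]   # rows that may deserve a closer look
```

A flagged month is only a candidate for review; a large value can inflate the standard deviation and hide smaller anomalies, so a robust measure such as the median absolute deviation may be preferable on real data.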
g) Transaction Type and Operation Per Account
The data is analyzed to find the count of transactions and the amount of
transactions for each Operation Type. This is done for every
month and for every account. The results are presented as bar charts for visual analysis.
This helps in studying the channel usage. The visual graphs for the “Transaction Type
and Operation Type” are generated.
# Total amount and number of transactions per account and operation
query <- "SELECT account_id, operation, sum(amount) 'amount', count(*) 'count'
          FROM bank GROUP BY account_id, operation ORDER BY account_id, operation"
summaryoperation <- sqldf(query)
barchart(count ~ account_id, groups = operation, data = summaryoperation,
         auto.key = list(columns = 3), main = "Transactions Count for Operation Type",
         xlab = "account_id", ylab = "Transaction Count")
barchart(amount/100000 ~ account_id, groups = operation, data = summaryoperation,
         auto.key = list(columns = 3), main = "Transactions Amount for Operation Type",
         xlab = "account_id", ylab = "Transaction Amount (in Rs. 100000)", ylim = c(0, 300))
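The same per-account, per-operation summary can also be sketched with base R cross-tabulation, which makes each account's channel share easy to read. The data values and operation names below are assumptions for illustration.

```r
# Illustrative transactions (made-up values; operation names are assumptions)
bank <- data.frame(
  account_id = c("A", "A", "A", "B", "B"),
  operation  = c("ATM", "ATM", "TRANSFER", "CASH", "ATM"),
  amount     = c(200, 300, 1000, 150, 400)
)

# Transaction count and total amount for each account and operation
op_count  <- xtabs(~ account_id + operation, data = bank)
op_amount <- xtabs(amount ~ account_id + operation, data = bank)

# Share of each account's transactions made through each channel (rows sum to 1)
op_share <- prop.table(op_count, margin = 1)
```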
h) Spending and Credit Patterns of Customers
The data is analyzed to find the earning and spending patterns of the various
account holders and how the patterns change over the months. This is done for product
selling based on the customer profiling.
# Smallest and largest transaction per month, account and transaction type
query <- "SELECT account_id, month, type, min(amount) 'minamount', max(amount) 'maxamount'
          FROM bank GROUP BY month, account_id, type ORDER BY month, account_id, type"
summaryminmax <- sqldf(query)
# Largest credit per account and month
query <- "SELECT account_id, month, maxamount FROM summaryminmax WHERE type = 'CREDIT'"
earning <- sqldf(query)
# Largest debit per account and month, negated so that spending is shown below zero
query <- "SELECT account_id 'accid', month 'mnth', -maxamount 'minamount'
          FROM summaryminmax WHERE type = 'DEBIT'"
spending <- sqldf(query)
# Combine earning and spending per account and month
query <- "SELECT account_id, month, minamount, maxamount FROM earning, spending
          WHERE account_id = accid AND month = mnth"
spendingpattern <- sqldf(query)
# Extract the pattern for a single account
query <- "SELECT month 'month11', minamount 'minamount11', maxamount 'maxamount11'
          FROM spendingpattern WHERE account_id = 'xxxx-xxxx-xxxx-xx11' "
spendingpattern11 <- sqldf(query)
# (The corresponding queries for the accounts ending in 12, 13 and 14 are not reproduced here.)
# Join the per-account patterns by month and scale the amounts to units of Rs. 10000
query <- "SELECT month11 'month', minamount11/10000 'minamount11',
          maxamount11/10000 'maxamount11', minamount12/10000 'minamount12',
          maxamount14/10000 'maxamount14' FROM spendingpattern11, spendingpattern12,
          spendingpattern13, spendingpattern14 WHERE month11 = month12 AND month11 =
          month13 AND month11 = month14 "
spendingpattern <- sqldf(query)
# Helper column used only to set the y-axis range of the empty plot
spendingpattern$range <- sample(-10:10, 12, replace = TRUE)
# Drop rows holding specific values; grepl() keeps every row when nothing matches,
# whereas x[-grep(...), ] would drop all rows in that case
spendingpattern <- spendingpattern[!grepl("-220.1002", spendingpattern$minamount12, fixed = TRUE), ]
spendingpattern <- spendingpattern[!grepl("-20.1001", spendingpattern$minamount12, fixed = TRUE), ]
spendingpattern <- spendingpattern[!grepl("420.1003", spendingpattern$maxamount14, fixed = TRUE), ]
# Two panels: the plot on the left and the legend on the right
layout(matrix(c(1, 2), nrow = 1), widths = c(0.7, 0.3))
par(mar = c(5, 4, 4, 2) + 0.1)
spendingpattern$month <- as.numeric(spendingpattern$month)
xrange <- spendingpattern$month
yrange <- spendingpattern$range
plot(xrange, yrange, type = "n", xlab = "Month", ylab = "Amount (in Rs. 10000)")
colors <- rainbow(8)
lines(spendingpattern$month, spendingpattern$minamount11, type = "o", col = colors[1], lwd = 2)
lines(spendingpattern$month, spendingpattern$maxamount11, type = "o", col = colors[2], lwd = 2)
# (The lines() calls for the remaining six series, colors[3] to colors[8], are not reproduced here.)
title("Spending and Credit Pattern - Customer Behaviour")
splegend <- c("Min of xxxx-xxxx-xxxx-xx11", "Max of xxxx-xxxx-xxxx-xx11",
              "Min of xxxx-xxxx-xxxx-xx12", "Max of xxxx-xxxx-xxxx-xx12",
              "Min of xxxx-xxxx-xxxx-xx13", "Max of xxxx-xxxx-xxxx-xx13",
              "Min of xxxx-xxxx-xxxx-xx14", "Max of xxxx-xxxx-xxxx-xx14")
par(mar = c(0, 0, 0, 0))
plot.new()
legend("right", splegend, cex = 0.7, fill = colors)
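The core of this module, the largest credit and largest debit per account and month, can be sketched in base R without SQL. The data values are made up and the column names are assumptions; spending is negated so that it plots below zero, mirroring the queries above.

```r
# Illustrative transactions (made-up values; column names are assumptions)
bank <- data.frame(
  account_id = c("A", "A", "A", "A", "B", "B"),
  month      = c(1, 1, 1, 1, 1, 1),
  type       = c("CREDIT", "CREDIT", "DEBIT", "DEBIT", "CREDIT", "DEBIT"),
  amount     = c(1000, 4000, 500, 2500, 3000, 700)
)

# Largest credit (earning) and largest debit (spending) per account and month
earning  <- aggregate(amount ~ account_id + month,
                      data = bank[bank$type == "CREDIT", ], FUN = max)
spending <- aggregate(amount ~ account_id + month,
                      data = bank[bank$type == "DEBIT", ], FUN = max)
names(earning)[3]  <- "maxamount"
names(spending)[3] <- "minamount"
spending$minamount <- -spending$minamount   # negate so spending falls below zero

# One row per account and month with both values
pattern <- merge(earning, spending, by = c("account_id", "month"))
```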
CHAPTER 6
SYSTEM TESTING AND IMPLEMENTATION
6.1. SYSTEM TESTING

Testing of the system was done to ensure its integrity.
Testing, the last stage of software development, is vital for the success of the project
and serves several purposes. Testing is done for each module. After
testing all the modules, the modules are integrated and the final system is tested with
test data specially designed to show that the system will operate successfully under all
conditions.
Procedure-level testing is performed first. Improper inputs are given, and the errors that occur
are noted and eliminated. System testing is thus both a confirmation and an opportunity to show
the user that the system works. The final step involves validation testing, which determines whether the
software functions as expected. This is the final step in the system life cycle. Here the
tested, error-free system is implemented in a real-life environment and the necessary changes are made. In this project, the
entire banking transactions dataset is used for system testing and the individual datasets are
used for unit testing. The system was found to be error free and ready for use by any user
other than the developer.
6.2. SYSTEM IMPLEMENTATION
Implementation is the process of installing the software into the system so that it can be
provided with original data to process. The implementation phase starts only after the successful
completion of the testing phase, in which the above tests are carried out. System
implementation is an activity that continues throughout the development phase. It is the process
of bringing a developed system into operational use. An implementation plan provides for test
plans, equipment installation, and a plan for converting from the old system to the new system.
CHAPTER 7
CONCLUSION
Data analytics is now being implemented across various spheres of the banking sector. It is
helping banks deliver better services to their customers, both internal and external, and
is also helping them improve their active and passive security systems. In this project,
we saw how customer sentiments are captured and used to assess the functioning of the bank. We
carried out transactional analysis and observed how banks today use the spending patterns of their
customers, analyze consumer behavior based on channel usage and consumption patterns,
segment consumers according to these attributes, and identify potential
customers for selling financial products.
CHAPTER 8
SCOPE OF FUTURE ENHANCEMENT
This study can be extended by quantifying the financial and non-financial
benefits that banks reap after implementing Data Analytics and by predicting the resulting
improvements in the financial statements of the bank. This work can also be extended to cover the
various data mining techniques that banks can use to improve the quality of analysis.
CHAPTER 9
BIBLIOGRAPHY
REFERENCES
1. How can financial services industry unlock the value of big data.
PricewaterhouseCoopers; 2013.
2. Big Data: The next big thing. Nasscom; 2012.
3. David Floyer. Financial Comparison of Big Data MPP Solution and Data Warehouse
Appliance; 2013.
4. Oracle Financial Services. Initial Steps on the Journey through Big Data for Financial
Services Institutions; 2012.
5. Pivotal Case Study - China CITIC Bank. Driving Revenue and Reducing Risk; 2013.
6. Business Software themes for 2014. Wells Fargo Securities. Equity Research; 2014.
CHAPTER 10
APPENDIX
DATA SETS
BANKING TRANSACTIONS DATASET
BANKING FEEDBACK DATASET
SCREEN SHOTS
CUSTOMER SATISFACTION INDEX ANALYSIS
NET CREDIT TRANSACTIONS COUNT
NET CREDIT TRANSACTIONS AMOUNT
NET DEBIT TRANSACTIONS COUNT
NET DEBIT TRANSACTIONS AMOUNT
TRANSACTION TYPES AND OPERATIONS
NET TRANSACTIONS AMOUNT
SPENDING AND CREDIT PATTERNS OF CUSTOMERS
