Mini Project work submitted to Avinashilingam Institute for Home Science and Higher
Education for Women in partial fulfillment for the degree of
SUBMITTED BY
B.DHARANI (16PCA002)
TRANSACTIONS” is a record of the original work done by B. DHARANI under the guidance
of Dr. G. SUDHAMATHY, MCA., Ph.D., Assistant Professor, Department of Computer Science,
Faculty of Science, Avinashilingam Institute for Home Science and Higher Education for
Women, in partial fulfillment for the degree of Master of Science in Computer Applications,
and this project work has not formed the basis for any Degree/Diploma/Associateship.
PLACE:
DATE:
I would like to express my sincere thanks to God Almighty for the constant grace
that he has shown upon me.
I am very grateful to Shri. Dr. P. R. Krishnakumar, Chancellor, Avinashilingam
Institute for Home Science and Higher Education for Women, Coimbatore, for his support and
encouragement during the course of my study.
I heartily thank Dr. Premavathy Vijayan, M.Sc., M.Ed., Dip.Spl.Edn., M.Phil.,
Ph.D., Vice Chancellor for extending all resources that facilitated the conduct of the present
study.
I express my humble gratitude to Dr. A. Kowsalya, Registrar, Avinashilingam Institute
for Home Science and Higher Education for Women, Coimbatore, for providing all facilities
necessary for the study.
I am also thankful to Dr. A. Parvathi, M.Sc., Dip.Ed., M.Phil., Ph.D., Dean Faculty of
Science, for granting the facility required.
I wish to place on record my deep sense of gratitude to Dr. V. Radha, M.Sc., M.Phil.,
Ph.D., Professor and Head, Department of Computer Science, for providing all the facilities to
complete the project.
I owe a great deal of gratitude to my esteemed guide Dr. G. Sudhamathy, M.C.A., Ph.D.,
Assistant Professor, Department of Computer Science, for the tremendous assistance
and well-timed support that led to the success of my project.
I am very grateful to all the other Faculty members of our Department for their valuable guidance
throughout the course of the project. Last but not least, I wish to express my deep sense of gratitude and
sincere thanks to my parents and friends for their support during the course of my project.
ABSTRACT
The big data revolution of the 21st century has found a resonance
with banking firms, considering the valuable data they have been storing for many decades. This
data has now unlocked secrets of money movements, helped prevent major disasters and thefts,
and helped in understanding consumer behaviour. Banks stand to benefit greatly from big data, as they can now
extract good information quickly and easily from their data and convert it into meaningful
Banks internationally are beginning to harness the power of data in order to derive
utility across various spheres of their functioning, ranging from sentiment analysis, product cross
management and much more. Indian banks are catching up with their international counterparts;
This paper aims to capture how big data analytics is being successfully used in banking
2. Channel usages
1 INTRODUCTION
1.1. ABOUT THE PROJECT
2 SYSTEM CONFIGURATION
2.1. HARDWARE SPECIFICATION
2.2. SOFTWARE SPECIFICATION
2.3. ABOUT THE SOFTWARE
3 SYSTEM STUDY AND ANALYSIS
3.1. EXISTING SYSTEM
3.2. PROPOSED SYSTEM
4 SYSTEM DESIGN
4.1. INPUT DESIGN
4.2. OUTPUT DESIGN
5 SYSTEM DEVELOPMENT
5.1. MODULES
5.2. MODULE DESCRIPTION
6 SYSTEM TESTING AND IMPLEMENTATION
6.1. SYSTEM TESTING
6.2. SYSTEM IMPLEMENTATION
7 CONCLUSION
8 SCOPE OF FUTURE ENHANCEMENT
9 BIBLIOGRAPHY
10 APPENDIX
DATA SET
SCREEN SHOTS
CHAPTER 1
INTRODUCTION
2. Channel usages
CHAPTER 2
SYSTEM CONFIGURATION
2.3. ABOUT THE SOFTWARE
R is a GNU project. The source code for the R software environment is written primarily in
C, Fortran, and R. R is freely available under the GNU General Public License, and
precompiled binary versions are provided for various operating systems. R uses a command line
interface; there are also several graphical front-ends for it.
R IS NOT
A database, but it connects to DBMSs
A point-and-click user interface, but it connects to Java and Tcl/Tk
A fast language interpreter (it can be very slow), but it allows calls to your own C / C++ code
A spreadsheet view of data, but it connects to Excel / MS Office
A product backed by professional / commercial support
Thus, R is an integrated suite of software facilities for data manipulation, calculation and
graphical display.
R includes:
An effective data handling and storage facility,
A suite of operators for calculations on arrays, in particular matrices,
A large, coherent, integrated collection of intermediate tools for data analysis,
Graphical facilities for data analysis and display either on screen or on hardcopy, and
A well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
There are several ways to work with R:
With the R console GUI;
With the RStudio IDE;
With the Tinn-R editor and the R console;
From one of the other IDEs such as JGR;
From a command line R interface (CLI);
From the ESS (Emacs Speaks Statistics) module of the Emacs editor.
GRAPHICS IN R
R is a powerful environment for data visualization
Integrated graphics
Good quality of graphics
Full control over graphics
Can be reproduced
Huge number of R packages for graphics are available
R Graphs can be viewed on screen
R Graphs can be easily interpreted for results
R DATA FORMATS
In R, a data frame is used for storing data tables. Its columns do not have to be of the same
type, but they must be vectors of the same length. For instance, you can combine in one data
frame a logical, a character and a numeric vector.
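The point about mixed column types can be illustrated with a minimal sketch in base R. The vector names and values below are invented for illustration and are not taken from the project datasets.

```r
# Three vectors of different types but equal length (illustrative values)
flag  <- c(TRUE, FALSE, TRUE)            # logical
label <- c("cash", "card", "transfer")   # character
value <- c(120.5, 80, 300)               # numeric

# Combine them column-wise into one data frame
accounts <- data.frame(flag, label, value, stringsAsFactors = FALSE)

str(accounts)                  # three rows, three columns
col_types <- sapply(accounts, class)
print(col_types)               # each column keeps its own type
```

Passing vectors of unequal lengths that cannot be recycled to a common length would make data.frame() stop with an error, which is why the equal-length requirement matters.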
CSV: Comma Separated Values (a plain text file) is a commonly used file type for data analysis
in RStudio, but data can also be imported from various other formats.
Matrix format (Using Microsoft Excel)
Table format dataset
Microsoft Excel is a software program produced by Microsoft Corp. that allows users to
organize, format and calculate data with formulas using a spreadsheet system. This software is
part of the Microsoft Office suite and is compatible with other applications in the suite.
Excel has the same basic features as every spreadsheet: it uses a collection of cells
arranged into rows and columns to organize and manipulate data. It can also display data as charts,
histograms and line graphs.
CHAPTER 3
SYSTEM STUDY AND ANALYSIS
Drawbacks
• Lack of fraudulent action detection
• Only transactional queries done
• Customer behavior not understood
• No insight found from the data
• Large history data retained with no use
Advantages
CHAPTER-4
SYSTEM DESIGN
The first input, the Banking Transactions Dataset, consists of the fields Transaction Id,
Account Id, Date, Transaction Type, Operation, Amount and Balance. This dataset consists of 32408
records for a period of 1 year from 01/01/2010 to 31/12/2010. This dataset, in the Comma
Separated Values file format, is loaded into the data analytics tool R. The second input, the Banking Feedback
Dataset, consists of the fields Month, Service Quality, Service Speed and Solution to Queries.
The values of the rating fields range from 1 to 5. This dataset consists of the feedback of the customers
who visited the bank over a period of 1 year from 01/01/2010 to 31/12/2010.
CHAPTER-5
SYSTEM DEVELOPMENT
5.1. MODULES
The modules in this project are:
1) Data Loading
2) Data Preparation
3) Data Analysis
5.2. MODULE DESCRIPTION
DATA LOADING
The Banking Transactions Dataset consists of the fields Transaction Id, Account Id, Date,
Transaction Type, Operation, Amount and Balance. This dataset consists of 32408 records for a
period of 1 year from 01/01/2010 to 31/12/2010. This dataset, in the Comma Separated Values
file format, is loaded into the data analytics tool R. The Banking Feedback Dataset consists of
the fields Month, Service Quality, Service Speed and Solution to Queries. The values of the
rating fields range from 1 to 5. This dataset consists of the feedback of the customers who visited the bank
over a period of 1 year from 01/01/2010 to 31/12/2010.
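The loading step described above might look like the following minimal sketch. The column names, the three sample rows and the temporary file are assumptions made so the example runs on its own; the project's actual file and headers are not shown in this report.

```r
# Hypothetical sample rows standing in for the real transactions file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "transaction_id,account_id,date,type,operation,amount,balance",
  "1,101,01/01/2010,CREDIT,CASH DEPOSIT,5000,5000",
  "2,101,15/01/2010,DEBIT,CASH WITHDRAWAL,1200,3800",
  "3,102,03/02/2010,CREDIT,TRANSFER,700,700"
), csv_path)

# Load the CSV file into a data frame
bank <- read.csv(csv_path, stringsAsFactors = FALSE)

# Parse the day/month/year dates and derive the month number
bank$date  <- as.Date(bank$date, format = "%d/%m/%Y")
bank$month <- as.integer(format(bank$date, "%m"))

str(bank)
```

Deriving a month column at load time is one convenient way to support the per-month summaries used later; it is a design choice of this sketch, not something the report states.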
DATA PREPARATION
The loaded datasets are summarized by month and account id for further
processing. This summarized dataset is further divided into debit count, credit count, debit
amount and credit amount transactions. The earnings and the spending of each customer are
segregated. The minimum and maximum earnings and spending of each customer in every
month are extracted.
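The summarization described above can be sketched with base R's aggregate(). The sample rows are invented, and the column names (count, amount, minamount, maxamount) are assumed to mirror the names that appear in the queries later in this chapter.

```r
# Invented sample transactions; column names are assumptions
bank <- data.frame(
  month      = c(1, 1, 1, 1, 2, 2),
  account_id = c(101, 101, 101, 102, 101, 101),
  type       = c("CREDIT", "CREDIT", "DEBIT", "CREDIT", "DEBIT", "DEBIT"),
  amount     = c(5000, 2000, 1200, 700, 300, 900),
  stringsAsFactors = FALSE
)

# One row per month, account and type with count, total, min and max
summarydata <- aggregate(
  amount ~ month + account_id + type, data = bank,
  FUN = function(x) c(count = length(x), amount = sum(x),
                      minamount = min(x), maxamount = max(x))
)
# aggregate() returns the four statistics as a matrix column; flatten it
summarydata <- do.call(data.frame, summarydata)
names(summarydata) <- c("month", "account_id", "type",
                        "count", "amount", "minamount", "maxamount")

# Segregate earnings (credits) from spending (debits)
credits <- summarydata[summarydata$type == "CREDIT", ]
debits  <- summarydata[summarydata$type == "DEBIT", ]
print(summarydata)
```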
DATA ANALYSIS
The customer feedback is analyzed based on the ratings given for three
aspects (Service Quality, Service Speed and Solution to Queries) over the period of 1 year, that is, for each month in the year 2010.
xrange <- (summarycsi$month)
colors = rainbow(3)
title('Customer Feedback')
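The fragments above appear to come from a line chart of the monthly feedback ratings. A self-contained sketch of such a chart is given here; the rating values are invented, the column names quality, speed and solution are hypothetical, and the null PDF device is used only so the code also runs without a display.

```r
# Invented monthly average ratings on the 1 to 5 scale
summarycsi <- data.frame(
  month    = 1:12,
  quality  = c(3, 4, 4, 3, 4, 5, 4, 4, 5, 4, 4, 5),
  speed    = c(2, 3, 3, 3, 4, 4, 3, 4, 4, 4, 5, 4),
  solution = c(3, 3, 4, 4, 4, 4, 5, 4, 4, 5, 5, 5)
)

pdf(NULL)                          # null device: nothing is written to disk
xrange  <- range(summarycsi$month) # x-axis limits
yrange  <- c(1, 5)                 # ratings run from 1 to 5
colors  <- rainbow(3)              # one colour per aspect
aspects <- c("quality", "speed", "solution")

# Empty frame first, then one line per aspect
plot(xrange, yrange, type = "n", xlab = "Month", ylab = "Rating (1 to 5)")
for (i in seq_along(aspects)) {
  lines(summarycsi$month, summarycsi[[aspects[i]]],
        type = "o", col = colors[i], lwd = 2)
}
title("Customer Feedback")
legend("bottomright",
       legend = c("Service Quality", "Service Speed", "Solution to Queries"),
       col = colors, lwd = 2)
dev.off()
```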
query <- "SELECT month, account_id, count FROM summarydata WHERE type = 'CREDIT'"
c) Net Credit Transactions Amount Per Month Per Account
The bank transaction data is aggregated by Month, Account Id and
Transaction Type, and the total amount of the credit transactions in each group is
computed. Graphs of the “Net Credit Transactions Amount” are then generated. This
supports customer segmentation and customer profiling.
query <- "SELECT month month1, account_id account_id1, amount amount1 FROM summarydata WHERE type = 'CREDIT'"
query <- "SELECT month, account_id, count FROM summarydata WHERE type = 'DEBIT'"
e) Net Debit Transactions Amount Per Month Per Account
The bank transaction data is aggregated by Month, Account Id and
Transaction Type, and the total amount of the debit transactions in each group is
computed. Graphs of the “Net Debit Transactions Amount” are then generated. This
supports customer segmentation and customer profiling.
query <- "SELECT month month2, account_id account_id2, amount amount2 FROM summarydata WHERE type = 'DEBIT'"
query <- "SELECT month1, account_id1, amount1, amount2, (amount1 - amount2) 'netamt' FROM summarydatacramt, summarydatadbamt WHERE month1 = month2 AND account_id1 = account_id2"
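The net amount query joins the credit and debit summaries on month and account. The same join can be sketched in base R with merge(), shown here as an alternative to the SQL form rather than as the project's own code; the table contents are invented, while the table and column names follow the query.

```r
# Invented credit and debit summaries, named as in the query
summarydatacramt <- data.frame(
  month1      = c(1, 1, 2),
  account_id1 = c(101, 102, 101),
  amount1     = c(7000, 700, 2500)
)
summarydatadbamt <- data.frame(
  month2      = c(1, 2),
  account_id2 = c(101, 101),
  amount2     = c(1200, 1200)
)

# Inner join on month and account, then credit minus debit
netdata <- merge(summarydatacramt, summarydatadbamt,
                 by.x = c("month1", "account_id1"),
                 by.y = c("month2", "account_id2"))
netdata$netamt <- netdata$amount1 - netdata$amount2
print(netdata)
```

Like the query, this is an inner join, so an account with only credits or only debits in a month (account 102 in this sample) drops out of the result. Passing all = TRUE to merge() would keep such rows with NA in the missing amount.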
g) Transaction Type and Operation Per Account
The data is analyzed to find the count of transactions and the amount of
transactions for each Operation Type. This is done for every
month and for every account. The results are presented as bar charts for visual analysis,
which helps in studying channel usage. Graphs for the “Transaction Type
and Operation Type” are generated.
query <- "SELECT account_id, operation, sum(amount) 'amount', count(*) 'count' FROM bank GROUP BY account_id,operation ORDER BY account_id, operation"
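For comparison, the grouping performed by this query can be sketched in base R without SQL. The sample rows and operation labels are invented; only the column names account_id, operation, amount and count come from the query.

```r
# Invented sample transactions
bank <- data.frame(
  account_id = c(101, 101, 101, 102),
  operation  = c("CASH", "CASH", "TRANSFER", "CASH"),
  amount     = c(500, 300, 1000, 200),
  stringsAsFactors = FALSE
)

# Total amount and number of transactions per account and operation
amt <- aggregate(amount ~ account_id + operation, data = bank, FUN = sum)
cnt <- aggregate(amount ~ account_id + operation, data = bank, FUN = length)
names(cnt)[3] <- "count"

# Combine both measures and order as in the query
channel <- merge(amt, cnt, by = c("account_id", "operation"))
channel <- channel[order(channel$account_id, channel$operation), ]

# Bar chart of channel usage for one account (null device, no file written)
one <- channel[channel$account_id == 101, ]
pdf(NULL)
barplot(one$amount, names.arg = one$operation,
        xlab = "Operation", ylab = "Amount", main = "Account 101")
dev.off()
```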
query <- "SELECT account_id, month, maxamount FROM summaryminmax WHERE type = 'CREDIT'"
query <- "SELECT account_id 'accid', month 'mnth', -maxamount 'minamount' FROM summaryminmax WHERE type = 'DEBIT'"
query <- "SELECT account_id, month, minamount, maxamount FROM earning, spending WHERE account_id = accid AND month = mnth"
spendingpattern <- sqldf(query)
plot(xrange, yrange, type = "n", xlab = "Month", ylab = "Amount (in Rs. 10000)")
colors = rainbow(8)
lines(spendingpattern$month , spendingpattern$maxamount13, type = "o", col = colors[6], lwd = 2)
par(mar=c(0, 0, 0, 0))
plot.new()
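The queries and plotting calls above combine the largest credit and the largest debit per month into a spending pattern chart. A simplified, self-contained sketch for a single account follows; it uses merge() in place of sqldf, invented amounts, and a null PDF device, so it illustrates the idea rather than reproducing the project's full multi-account script.

```r
# Invented monthly maxima for one account
summaryminmax <- data.frame(
  account_id = rep(101, 6),
  month      = rep(1:3, each = 2),
  type       = rep(c("CREDIT", "DEBIT"), times = 3),
  maxamount  = c(5000, 1200, 4000, 2500, 6000, 900),
  stringsAsFactors = FALSE
)

keep     <- c("account_id", "month", "maxamount")
earning  <- summaryminmax[summaryminmax$type == "CREDIT", keep]
spending <- summaryminmax[summaryminmax$type == "DEBIT", keep]

# Largest debit stored as a negative value, as the -maxamount alias does
spending$minamount <- -spending$maxamount
spending$maxamount <- NULL

spendingpattern <- merge(earning, spending, by = c("account_id", "month"))

pdf(NULL)                                   # null device, no file written
xrange <- range(spendingpattern$month)
yrange <- range(c(spendingpattern$minamount,
                  spendingpattern$maxamount)) / 10000
colors <- rainbow(2)
plot(xrange, yrange, type = "n", xlab = "Month", ylab = "Amount (in Rs. 10000)")
lines(spendingpattern$month, spendingpattern$maxamount / 10000,
      type = "o", col = colors[1], lwd = 2)  # earnings
lines(spendingpattern$month, spendingpattern$minamount / 10000,
      type = "o", col = colors[2], lwd = 2)  # spending, below zero
dev.off()
```

Plotting spending as negative values puts earnings above and spending below the zero line, which makes the two series easy to tell apart on one axis.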
CHAPTER-6
SYSTEM TESTING AND IMPLEMENTATION
Procedure-level testing is carried out first. Improper inputs are given, and the errors that occur
are noted and eliminated. System testing is thus both a confirmation and an opportunity to show
the user that the system works. The final step involves validation testing, which determines whether the
software functions as expected. This is the final step in the system life cycle. Here we deploy the
tested, error-free system into a real-life environment and make any necessary changes. In our project the
entire banking transactions dataset is used for system testing, and the individual datasets are
used for unit testing. The system is now found to be error free and ready for use by any user
other than the developer.
Implementation is the process of installing the software into the system so that it can be
provided with original data to process. The implementation phase starts only after the successful
completion of the testing phase, in which the above tests are carried out. System
implementation is an activity that continues throughout the development phase. It is the process
of bringing a developed system into operational use. An implementation provides for test
plans, equipment installation, and a plan for converting from the old system to the new system.
CHAPTER-7
CONCLUSION
Data analytics is now being implemented across various spheres of the banking sector. It is
helping banks deliver better services to their customers, both internal and external, and
is also helping them improve their active and passive security systems. In this project,
we saw how customer sentiments are captured and used to assess the functioning of the bank. We
have done transactional analysis and observed how banks today use the spending patterns of their
customers, study consumer behaviour based on channel usage and consumption patterns,
segment consumers depending upon the aforementioned attributes, and identify potential
CHAPTER-8
SCOPE OF FUTURE ENHANCEMENT
This study can be extended to quantify the financial and non-financial
benefits reaped by the bank after implementing data analytics, and to predict the
improvements in the financial statements of the bank. This work can also be extended to cover the
various data mining techniques that banks can use to improve the quality of analysis.
CHAPTER-9
BIBLIOGRAPHY
REFERENCES
1. How can financial services industry unlock the value of big data. PricewaterhouseCoopers; 2013.
2. Big Data: The next big thing. Nasscom; 2012.
3. David Floyer. Financial Comparison of Big Data MPP Solution and Data Warehouse Appliance; 2013.
4. Oracle Financial Services. Initial Steps on the Journey through Big Data for Financial Services Institutions; 2012.
5. Pivotal Case Study: China CITIC Bank. Driving Revenue and Reducing Risk; 2013.
6. Business Software themes for 2014. Wells Fargo Securities, Equity Research; 2014.
CHAPTER-10
APPENDIX
DATA SETS
BANKING TRANSACTIONS DATASET
CUSTOMER SATISFACTION INDEX ANALYSIS
NET CREDIT TRANSACTIONS AMOUNT
NET DEBIT TRANSACTIONS AMOUNT
NET TRANSACTIONS AMOUNT
SPENDING AND CREDIT PATTERNS OF CUSTOMERS