Customer Churn Analysis in Banking Sector Using Data Mining Techniques

Vol 8. No. 3 – September, 2015
African Journal of Computing & ICT
© 2015 Afr J Comp & ICT – All Rights Reserved - ISSN 2006-1781
www.ajocict.net
A. O. Oyeniyi & A.B. Adeyemo
Department of Computer Science,
University of Ibadan
Ibadan, Nigeria
aoyeniyi1@gmail.com, sesanadeyemo@gmail.com
ABSTRACT
Customer churn has become a major problem within the customer-centred banking industry, and banks have always tried to track
customer interaction with the company in order to detect early warning signs in customers' behaviour, such as reduced
transactions and account dormancy, and to take steps to prevent churn. This paper presents a data mining model that can be used
to predict which customers are most likely to churn (or switch banks). The study used real-life customer records provided by a
major Nigerian bank. The raw data was cleaned, pre-processed and then analysed using WEKA, a data mining software tool for
knowledge analysis. Simple K-Means was used for the clustering phase, while a rule-based algorithm, JRip, was used for the rule
generation phase. The results obtained showed that the methods used can determine patterns in customer behaviour and help
banks to identify likely churners and hence develop customer retention modalities.
Keywords: Customer, banking, data mining, churn analysis, WEKA, retention models & K-means.
African Journal of Computing & ICT Reference Format:

A. O. Oyeniyi & A.B. Adeyemo (2015): Customer Churn Analysis In Banking Sector Using Data Mining Techniques.
Afr J. of Comp & ICTs. Vol 8, No. 3. Pp 165-174.
1. INTRODUCTION
The regulatory framework within which financial institutions and insurance firms operate requires their interaction with customers to be tracked, recorded and stored in Customer Relationship Management (CRM) databases, and the information to be mined in a way that improves customer relations, increases average revenue per user (ARPU) and decreases the churn rate. According to [23], churn has an equal or greater impact on Customer Lifetime Value (CLTV) when compared to one of the most regarded Key Performance Indicators (KPIs), Average Revenue Per User (ARPU). As one of the biggest destructors of enterprise value, it has become one of the top issues for the banking industry. Customer churn prediction is aimed at determining which customers are at risk of leaving, and whether such customers are worth retaining.

Churn, or customer attrition, is a term adopted to define the movement of customers from one provider to another [15], and it is also regarded as the annual turnover of the market base. It costs five (5) times more to acquire new customers than to retain existing ones, as companies often spend a fortune on advertisement to acquire new customers [17]. Therefore banks now need to shift their attention from customer acquisition to customer retention, and provide accurate churn prediction models and effective churn prevention strategies as added customer retention solutions [24]. As [18] also observed, better products, convenience and lower fees are not enough to prevent customer churn.

The banking industry needs to intensify its campaign to deliver more efficient, customer-focused and innovative offerings to reconnect with its customers. The problem of churn analysis is not peculiar to the banking industry. Churning is an important problem that has been studied across several areas of interest [16], such as mobile and telephony [2]; [3]; [17], insurance [27], and healthcare [9]; [4]. Other sectors where the customer churn problem has been analysed include online social networks [20]; [1], and the retail banking industry [8]; [12]; [18].

1.1 Data Mining
Data mining is an important component of every CRM framework. It facilitates the analysis of business problems, the preparation of data requirements, and the building, validation and evaluation of models for business problems [32]. The data mining process and algorithms enable firms to search for and discover hidden patterns and correlations among data, and to extract relevant knowledge buried in corporate data warehouses, in order to gain a broader understanding of the business. Data mining uses sophisticated statistical data search algorithms to discover hidden patterns and relationships, and extracts knowledge buried in corporate data warehouses or in information that visitors have left about their experience. Most of this can lead to improvements in the understanding and use of the data, in order to detect significant patterns and rules underlying consumers' behaviour.
Data mining involves four tasks: classification, clustering, regression and association learning. These are classified into two types of data mining: verification-oriented (where the system verifies the user's hypothesis) and discovery-oriented (where the system finds new rules and patterns autonomously). The data mining process complements other data analysis techniques such as statistics, on-line analytical processing (OLAP), spreadsheets, and basic data access.

1.2 Data Mining Techniques
Generally, there are two types of data mining tasks: descriptive data mining tasks, which describe the general properties of the existing data, and predictive data mining tasks, which attempt to make predictions based on available data. Data mining applications can use different kinds of parameters to examine the data. They include association (patterns where one event is connected to another event), sequence or path analysis (patterns where one event leads to another event), classification (identification of new patterns with predefined targets) and clustering (grouping of identical or similar objects) [14]. Decision tree is a symbolic learning technique that organizes information extracted from a training dataset in a hierarchical structure composed of nodes and ramifications. The tree-like output of a decision tree makes it easy to understand and interpret, making it one of the most widely used data mining algorithms in many domains, such as supplier selection and email user churn analysis [13]. It is capable of building models based on numerical and categorical datasets. Decision tree is also used for classification patterns or piecewise functions.

Cluster analysis is an approach by which a set of instances (without a predefined class attribute) is grouped into several clusters based on information found in the data that describes the objects and their relationships [30]. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. While in classification the classes are defined prior to building the model, cluster analysis divides the data based on their similarities. There are different types of clustering from different points of view. The most common division is into two types, partitional and hierarchical methods. Partitional clustering is a simple division of a set of data objects into non-overlapping segments such that each data object is in exactly one segment; if clusters are permitted to have sub-clusters, the result is hierarchical clustering.

1.3 Model Performance Evaluation
The performance of machine learning algorithms is typically evaluated using predictive accuracy. However, this is not appropriate when the data is imbalanced and/or the costs of different errors vary remarkably [6]. The performance evaluation of any classifier involves a certain level of trade-off between the true positive (TP) rate and the true negative (TN) rate, and the same applies to recall and precision. Precision, Recall and F-Measure are commonly used in information retrieval as performance measures [5].
1.4 Related Works

Table 1 presents a listing of some of the techniques used in some of the related work on churn prediction in the banking domain.
Table 1: Summary of Related Work on Churn Prediction in Banking

Researchers | Data Mining Techniques | Domain
Wang, G., Liu, L., Peng, Y., Nie, G., Kou, G., & Shi, Y. [29] | MCDM, BayesNet, Logistic Regression, J48, PART, CART, Naïve Bayes, Random Forests, IBK, SMO, Kstar, BFTree | Banking
Chiang, D.-A., Wang, Y.-F., Lee, S.-L., & Lin, C. J. [7] | Association Rules (Goal-Oriented Sequential Pattern) | Network Banking
Shao, J., Xiu, L., & Wenhuang, L. [26] | Boosting Schemes (Real AdaBoost, Gentle AdaBoost & Modest AdaBoost) | Commercial Banking
Naveen, N., Ravi, V., & Kumar, D. A. [21] | fuzzyARTMAP (Neural Network Architecture) | Banking (Credit Cards)
Zhao, Y., Li, B., Li, X., Liu, W., & Ren [34] | Support Vector Machine | Banking
Dudyala, A. K., & Ravi, V. [10] | Ensemble System (Multilayer Perceptron Neural Network, Logistic Regression (LR), Decision Tree (J48), Random Forest (RF), Radial Basis Function (RBF) Network & Support Vector Machine (SVM)) | Banking
Farquad, M. A., Ravi, V., & Raju, S. B. [11] | Support Vector Machine (SVM), Naïve Bayes Tree (NBTree), SVM+NBTree Hybrid | Banking (Credit Cards)
Teemu, M., Ahola, J., & Nousiainen, S. [28] | Logistic Regression | Retail Banking
Zhao, J., & Dang, X.-H. [33] | Support Vector Machine (SVM) | Wireless Industry
Nie, G., Wang, G., Zhang, P., Tian, Y., & Shi, Y. [25] | Logistic Regression | Banking (Credit Cards)
2. MATERIALS AND METHODS
The methodology used for this study is illustrated diagrammatically in Figure 1.
Figure 1: Research methodology flowchart
2.1 Data Acquisition

The dataset used for this study for customer churn prediction was acquired from a major Nigerian bank. The raw data was
extracted from the bank's customer relationship management database and transactional data warehouse, which contained more
than 1,048,576 customer records described with over 11 attributes. Attributes such as customer name, account number, record
start and record closed descriptors, which do not affect customer churn prediction and/or tend to violate the privacy and
confidentiality of the customer records, were identified and removed. Also, attributes with a lot of missing values were
removed because it was difficult to recreate values for omitted attributes such as date of birth (recorded as
age), customer type and account type.
The input attributes descriptors used were those for:

1. Customer demographics: the geographic and population data of a given customer, or information about a group living
in a particular area;
2. Account level: the billing system, including charges; and
3. Customer behaviour: any behaviour related to a customer's bank account.
Table 2 presents the descriptors. The customer demographics category included age and gender. In
the customer behaviour category, the descriptors used were dormancy date (dormcDate) and account balance (balance).
Table 2: Customer's Banking Records Data attributes

Attribute | Data type | Description
Age | Numeric | Customer's age at the study time
Gender | Male or Female (Nominal) | Customer's state of being male, female
Balance | Naira currency (Numeric) | Average balance in naira
dormStatus | Yes or No (Nominal) | State of account: active or inactive
The customer records had different account types, such as individual and corporate account types. The corporate account type had a lot of missing values for customer demographics such as date of birth and gender. According to [22], customer demographics have been widely used to differentiate how one segment of customers differs from another. In determining customer churn or switching in banking, customer demographics such as age and job type were shown to have an effect on customers switching banks. Other studies proposed the inclusion of additional demographic characteristics such as gender, race and occupation, as they have a great impact on customers' switching behaviour in the banking industry. Hence the corporate account types were not used in the study.

2.2 Data Preparation

Data preparation tasks transform acquired datasets to remove noise, inconsistencies, incoherence, bias and redundancies. The data preparation tasks include table, record and attribute selection, as well as transformation and elimination of data for modelling, and can be performed multiple times, in no prescribed order. A preliminary diagnosis is conducted on the datasets to gain an insight into their properties by scaling or standardizing the data to reduce the level of dispersion between the variables in the datasets [19]. The dataset was pre-processed in the Waikato Environment for Knowledge Analysis (WEKA) to clean, transform and establish relationships between the input variables and the output variables.

The data preparation phase in this study covers all the activities needed to construct the final datasets from the raw sample data obtained from the Nigerian bank. For the classification stage, the dataset was divided into a training dataset and a test validation dataset. The training dataset was used in constructing models for analysing the customer churn problem. The constructed model was validated using the test validation dataset. Performance evaluation of the applied data mining techniques was carried out to test the goodness of fit and adequacy of the constructed models in customer churn and non-churn prediction and analysis. The outcome of the model validation and performance evaluation was the ability of the applied model to accurately predict churn and non-churn customers.

2.3 WEKA Machine Learning Workbench

The Waikato Environment for Knowledge Analysis, otherwise known as WEKA, is a collection of machine learning algorithms for data mining tasks, which can either be applied directly to a desired dataset or invoked from within Java code. The WEKA machine learning workbench was developed by the Machine Learning Group at the University of Waikato, New Zealand, and is distributed under the GNU General Public License. Embedded within the WEKA workbench are a variety of machine learning tools, such as data pre-processing and visualisation tools, classification and regression techniques, and clustering and association rule mining techniques, which are well suited for developing new models and machine learning schemes. The WEKA machine learning workbench has its own file format, known as Attribute-Relation File Format (.arff), for converting and pre-processing datasets for analysis and evaluation. It also accepts datasets saved in a comma-separated value file format (.csv), using the Load Converter function to convert (.csv) to (.arff), and can connect to and load datasets into WEKA from a database or a website (URL). Figure 2 shows part of the bank dataset in attribute-relation file format (.arff).
@relation bank_data
@attribute age {}
@attribute gender {1,2}
@attribute lastTxnDate nominal
@attribute dormcDate nominal
@attribute balance numeric
@attribute jobType {1}
@attribute custType {1}
@attribute dormtStatus {1,0}
@data
, Male, 11/11/2014, 10/05/2016, 14133, Health& MedicalServices, INDIVIDUAL, N
, Female, , 05/09/2015, 31000, Health& MedicalServices, INDIVIDUAL, N
, Female, 31/10/2014, 01/11/2015, 51018.86, Health& MedicalServices, INDIVIDUAL, N
Figure 2: Bank dataset in an attribute-relation file format
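As an illustration of the file layout described above, the following minimal sketch writes a small table to ARFF text. It is not the conversion route used in the study (which relied on WEKA's own converter); the helper `to_arff`, the sample rows and the choice of the four attributes are assumptions made for the example. It follows the general ARFF conventions of an `@relation` line, one `@attribute` line per column, an `@data` section, and `?` for a missing value.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes: list of (name, kind) pairs, where kind is the string
                "numeric" or a list of allowed nominal values.
    rows:       list of lists; None marks a missing value, written as '?'.
    """
    lines = ["@relation %s" % relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append("@attribute %s %s" % (name, kind))
        else:
            lines.append("@attribute %s {%s}" % (name, ",".join(kind)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical rows using the four attributes retained for the study.
attributes = [
    ("age", "numeric"),
    ("gender", ["Male", "Female"]),
    ("balance", "numeric"),
    ("dormtStatus", ["Y", "N"]),
]
rows = [
    [57, "Male", 6265.51, "Y"],
    [None, "Female", 31000, "N"],  # age is missing for this record
]
arff_text = to_arff("bank_data", attributes, rows)
print(arff_text)
```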
3.0 RESULTS AND DISCUSSION

In this study churn prediction was modelled using the K-Means clustering technique and Repeated Incremental Pruning to Produce
Error Reduction (RIPPER), implemented in WEKA as the JRip algorithm. The prepared data was used to generate clusters with similar
attributes and to generate rule sets using the JRip algorithm. The raw banking dataset acquired consisted of 1,048,576 customer
records with 11 attribute descriptors. After rigorous data cleaning and transformation, the records considered for final
analysis consisted of 4958 customer banking records with 8 attribute descriptors, of which 500 customer records with four
attribute descriptors were used for the study.
For the classifier that was used, the dataset was divided into training, testing and cross-validation datasets using percentage split
and k-fold cross-validation, to avoid training and testing on the same data, which could lead to false results. For the percentage
split, 66% of the data was used for training and the remaining 34% for testing. 10-fold stratified cross-validation was also used.
The sample data in Excel (.CSV) format is presented in Figure 3.
Figure 3: Sample data in Excel (.CSV) Format
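The two evaluation schemes mentioned above, a 66%/34% percentage split and 10-fold stratified cross-validation, can be sketched as follows. This is a plain-Python illustration and not WEKA's implementation; the function names, the fixed seed and the 150/350 class mix are assumptions made for the example.

```python
import random


def percentage_split(records, train_fraction=0.66, seed=1):
    """Shuffle a copy of the records and cut it into (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def stratified_folds(records, label_of, k=10, seed=1):
    """Deal records into k folds so each fold keeps the class proportions."""
    by_class = {}
    for rec in records:
        by_class.setdefault(label_of(rec), []).append(rec)
    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for rec in members:  # round-robin keeps fold sizes balanced
            folds[position % k].append(rec)
            position += 1
    return folds


def cross_validation_pairs(folds):
    """Yield (train, test) pairs: each fold is the test set exactly once."""
    for i, test_fold in enumerate(folds):
        train = [rec for j, fold in enumerate(folds) if j != i for rec in fold]
        yield train, test_fold


# Hypothetical sample: 500 records, 150 dormant (Y) and 350 active (N).
records = [{"id": i, "dormtStatus": "Y" if i < 150 else "N"} for i in range(500)]
train_set, test_set = percentage_split(records)
folds = stratified_folds(records, label_of=lambda r: r["dormtStatus"])
pairs = list(cross_validation_pairs(folds))
print(len(train_set), len(test_set), [len(f) for f in folds])
```

Stratification matters here because dormant accounts are the minority class; without it, some folds could contain very few churners and give unstable estimates.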
3.2 Cluster Analysis

Because the segment of the banking sector used in this study is non-contract based, it is important to give an appropriate definition of churn prior to building the prediction model. In almost all studies reviewed, bank customers are those customers who had a relationship with the bank. Consequently, "churn" in such a condition could be defined as the termination of the contract from the customer's side, or not reactivating an account after it has gone into dormancy. However, in the banks there is no contract between the bank and its customers. A customer, through marketing or of their own will, can simply decide to open an individual or corporate account and automatically becomes a customer. On the other hand, customers can stop operating their accounts with the bank at any time and become churners without leaving an immediate trace. This implies that churn in such cases happens with no tracking point, such as the closing of an account or an inactive account, which makes it difficult to recognize churners.

For example, consider a customer database consisting of customers with different transactional activities: some perform daily or weekly activity on their account, either by walking into the banking hall or by using the online banking platform, and some do not perform any transactional activity on their account. If a churner is defined as "a person who has not used his/her account for 3 months", then a considerable number of the customers who use their account occasionally, for instance every 2 months, would be mistakenly considered churners. If a longer time span is used for the prediction period and a churner is defined as "a person who hasn't used his/her account for a year", the model may not be able to recognize the real churners. This will increase the number of False Negatives (FN) and False Positives (FP) and consequently lower the model's accuracy.

In the first stage of the empirical analysis, the K-Means clustering technique was applied to the training set in WEKA. Different numbers of clusters were generated (that is, 2 clusters, 3 clusters, 5 clusters, etc.) in order to choose the number of clusters that best models the problem. It was discovered that the result which gave five clusters was better than the others. In the second stage of the analysis, the chosen clusters were analyzed using the RIPPER (JRip) classification algorithm. The clustering algorithm was therefore used to partition the customers into groups with similar characteristics based on their attributes, as shown in Table 3.
Table 3: Characteristics of 5 initially extracted clusters from the customer’s database

Cluster | Instances (%) | Age | Gender | Balance | DormtStatus
0 | 183 (37%) | 57 | Male | 6265.5100 | Y
1 | 26 (5%) | 50 | Female | 25783.1569 | N
2 | 117 (23%) | 62 | Male | 18139.4779 | N
3 | 76 (15%) | 32 | Female | 10601.4605 | N
4 | 98 (20%) | 34 | Male | 20712.7888 | N
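The clustering step that leads to a summary like Table 3 can be illustrated with a small K-Means sketch. This is not WEKA's SimpleKMeans: the six (age, balance) points are hypothetical, the columns are standardized first so that balance does not dominate the distance, and a deterministic farthest-first initialisation is swapped in for random seeding purely to make the example repeatable.

```python
import math


def standardize(points):
    """Rescale each column to zero mean and unit standard deviation."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(p, means, stds))
            for p in points]


def k_means(points, k, max_iterations=100):
    """Lloyd's algorithm; returns (cluster index per point, centroids)."""
    # Farthest-first initialisation (an assumption for determinism).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return assignment, centroids


# Hypothetical (age, balance in NGN) pairs: three younger customers with
# higher balances followed by three older customers with lower balances.
customers = [(30, 20000), (32, 21000), (34, 19000),
             (60, 6000), (62, 6500), (58, 5500)]
assignment, centroids = k_means(standardize(customers), k=2)
print(assignment)
```

Running the same procedure for several values of k and comparing the resulting groupings mirrors the way the number of clusters was chosen in the study.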
Table 3 shows that the behaviour of Female customers was clustered in groups 1 and 3, while that of Male customers was clustered in groups 0, 2 and 4. The transactional characteristics of the customers in each of the groups are as follows:
a. Cluster 0: consists of 183 cluster instances, representing 37 percent of the records. These are Male bank customers in their late 50's with an operating account balance above 6000 NGN, who are at high risk of churn based on their account dormancy status, identified as being Inactive;
b. Cluster 1: consists of 26 cluster instances, representing 5 percent of the records. These are Female bank customers in their early 50's with an operating account balance above 25000 NGN, who are at no risk of churn based on their account dormancy status, identified as being Active;
c. Cluster 2: consists of 117 cluster instances, representing 23 percent of the records. These are Male bank customers in their early 60's with an operating account balance above 18000 NGN, who are at no risk of churn based on their account dormancy status, identified as being Active;
d. Cluster 3: consists of 76 cluster instances, representing 15 percent of the records. These are Female bank customers in their early 30's with an operating account balance above 10000 NGN, who are at no risk of churn based on their account dormancy status, identified as being Active; and
e. Cluster 4: consists of 98 cluster instances, representing 20 percent of the records. These are Male bank customers in their early 30's with an operating account balance above 20000 NGN, who are at no risk of churn based on their account dormancy status, identified as being Active.

One interesting way to examine the data in these clusters is to inspect the clusters visually through Visualize cluster assignments in WEKA. In order to show how the clusters are grouped in terms of Age and Dormancy Status, the chart was plotted with Age (Num) on the X axis, Dormancy Status (Num) on the Y axis, and the Color set to Cluster (Nom), with the "Jitter" turned up completely to artificially scatter the plot points for enhanced visualization, as shown in Figure 4.
Figure 4: Cluster Visual Inspection of Age(Num) against DormtStatus(Num)
3.3 RIPPER (JRip) Classification Analysis

In order to evaluate the developed model, the RIPPER (JRip) rule-based classification algorithm was used to generate sets of rules
and performance evaluation metrics that define the goodness of fit of the developed model. At this stage of the empirical analysis,
two criteria based on the confusion matrix are used. Criterion one is the Accuracy rate, represented by equation 5, which
identifies the percentage of the total number of predictions that were correctly classified; criterion two is the Actual
Churners' Rate, which identifies the percentage of churners that were correctly identified. However, to provide a more robust and
better indication of how well the developed classification-through-clustering model will perform when applied to new data,
10-fold cross-validation was applied. Figure 5 shows the WEKA Knowledge Flowchart for the JRip classifier algorithm.
Figure 5: JRip algorithm Knowledge Flowchart
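For reference, the two criteria described above can be written out using the usual confusion-matrix definitions, with churners taken as the positive class. The paper's own numbered equation is not reproduced in this text, so the forms below are the standard ones and are assumed to match it.

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\qquad
\text{Actual Churners' Rate} = \frac{TP}{TP + FN}
```

Here $TP$ and $TN$ are the correctly classified churners and non-churners, $FP$ are non-churners predicted as churners, and $FN$ are churners predicted as non-churners.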
Using 10-fold cross-validation, the dataset is split into 10 mutually exclusive subsets, and each subset in turn is used as the test set
while the remaining subsets are used as the training set. This procedure is repeated 10 times, where one fold is used for pruning and
the other folds are used in growing the rules, examples of which are shown in Table 5.
Table 5: RIPPER (JRip) classifier algorithm rule set

# | Rule
1 | (age >= 62) and (balance <= 1357.282557) => dormtStatus=Y (111.0/18.0)
2 | (balance <= 22923.25813) and (balance >= 1169.263145) and (gender = Male) and (age >= 34) and (age <= 45) => dormtStatus=Y (28.0/6.0)
3 | => dormtStatus=N (361.0/85.0)
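The rule set in Table 5 is an ordered decision list: rules are tried from top to bottom and the first one that matches decides the class, with the last rule acting as the default. The sketch below encodes the three rules as given. The record layout and function name are illustrative, and the pair of numbers after each rule is read here, following the usual WEKA convention, as (instances covered / instances misclassified); that reading is an assumption.

```python
# Each entry: (condition, predicted dormtStatus, covered, misclassified).
RULES = [
    (lambda r: r["age"] >= 62 and r["balance"] <= 1357.282557,
     "Y", 111, 18),
    (lambda r: 1169.263145 <= r["balance"] <= 22923.25813
               and r["gender"] == "Male" and 34 <= r["age"] <= 45,
     "Y", 28, 6),
    (lambda r: True,  # default rule: everything not matched above
     "N", 361, 85),
]


def predict_dormt_status(record):
    """Return the label of the first rule whose condition matches."""
    for condition, label, _covered, _errors in RULES:
        if condition(record):
            return label


# Totals implied by the counts printed with the rules.
covered = sum(rule[2] for rule in RULES)
errors = sum(rule[3] for rule in RULES)
correct_fraction = (covered - errors) / covered

print(predict_dormt_status({"age": 65, "balance": 1000.0, "gender": "Female"}))
print(predict_dormt_status({"age": 40, "balance": 5000.0, "gender": "Male"}))
print(predict_dormt_status({"age": 40, "balance": 5000.0, "gender": "Female"}))
print(covered, errors, correct_fraction)
```

Under that reading, the three rules together cover all 500 records used in the study and misclassify 109 of them on the data they were built from; this is a figure for the fitted rule set, not a cross-validated estimate.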
4. CONCLUSION
According to [24], customer churn analysis has become a major concern in almost every industry that offers products and
services. In this study a data mining model that can be used to predict which customers are most likely to churn or switch their
banks was developed. The model used K-Means clustering in the first stage and a rule-based algorithm (JRip) in the second stage.
The model was developed using case data from one of the major banks in Nigeria. The developed model can provide banks with
useful knowledge regarding customer transactional behavior, help banks to identify likely churners and hence develop customer
retention modalities. Bank customer databases contain a lot of information which, if appropriately analysed, can further enhance
banking practices and operations. Future work will focus on fraud detection, marketing, and the enhancement of banking
operations and practices.
REFERENCES
[1] Abbasimehr, H., Setak, M., & Soroor, J. (2013). A framework for identification of high-value customers by including social network based variables for churn prediction using neuro-fuzzy techniques. International Journal of Production Research, 51(4), 1279-1294.
[2] Ahna, J.-H., Hana, S.-P., & Leeb, Y.-S. (2006). Customer churn analysis: Churn determinants and mediation effects of partial defection in the Korean mobile telecommunications service industry. Telecommunications Policy, 30(10), 552-568.
[3] Bose, I., & Chen, X. (2009). Hybrid Models Using Unsupervised Clustering for Prediction of Customer Churn. Journal of Organizational Computing & Electronic Commerce, 19(2), 133-151.
[4] Buettgens, M., Nichols, A., & Dorn, S. (2012). Churning under the ACA and state policy options for mitigation. Washington (DC): Urban Institute.
[5] Chao, C., Liaw, A., & Breiman, L. (2004). Using random forests to learn imbalanced data. University of California, Berkeley.
[6] Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 321-357.
[7] Chiang, D.-A., Wang, Y.-F., Lee, S.-L., & Lin, C. J. (2003). Goal-oriented sequential pattern for network banking churn analysis. Expert Systems with Applications, 25(3), 293-302.
[8] Chitra, K., & B.Subashini. (2011). Customer Retention in Banking Sector using Predictive Data Mining Technique. ICIT 2011 The 5th International Conference on Information Technology.
[9] Deprez, R., Buterbaugh, C., Stabler, H., & Leonhard, E. (2011). Issues and Policies Around "Churning" In Health Insurance Plans: Development of the Vermont Exchange Design. University of New England, UNE Center for Community & Public Health (CCPH). Portland ME 04103: CCPH.
[10] Dudyala, A. K., & Ravi, V. (2008). Predicting credit card customer churn in banks using data mining. International Journal of Data Analysis Techniques and Strategies, 1(1), 4-28.
[11] Farquad, M. A., Ravi, V., & Raju, S. B. (2009). Data mining using rules extracted from SVM: an application to churn prediction in bank credit cards. In Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (pp. 390-397). Springer Berlin Heidelberg.
[12] Gorgoglione, M., & Panniello, U. (2011). Beyond Customer Churn: Generating Personalized Actions to Retain Customers in a Retail Bank by a Recommender System Approach. Journal of Intelligent Learning Systems and Applications, 3(2), 90-102.
[13] Guangli, N., Rowe, W., Zhang, L., Tian, Y., & Shi, Y. (2011). Credit card churn forecasting by logistic regression and decision tree. Expert Systems with Applications, 15273-15285.
[14] Gupta, S., Kumar, D., & Sharma, A. (2011). Data mining classification techniques applied for breast cancer diagnosis and prognosis. Indian Journal of Computer Science and Engineering (IJCSE), 188-193.
[15] Hadden, J. (2008). A Customer Profiling Methodology for Churn Prediction. PhD Dissertation, Manufacturing Department, Cranfield University, UK.
[16] Kawale, J., Pal, A., & Srivastava, J. (2009). Churn Prediction in MMORPGs: A Social Influence Based Approach. In Computational Science and Engineering, 2009. CSE'09. International Conference on. 4, pp. 423-428. IEEE.
[17] Khan, A. A., Jamwal, S., & M.M.Sepehri. (2010). Applying Data Mining to Customer Churn Prediction in an Internet Service Provider. International Journal of Computer Applications, 9(7), 8-14.
[18] Klimontowicz, M. (2014). Customer-Centricity Evolution as a Foundation of Bank's Competitive Strategy. Journal of Economics and Management, 16.
[19] Li, X., Nsofor, G. C., & Song, L. (2009). A comparative analysis of predictive data mining techniques. International Journal of Rapid Manufacturing, 1(2), 50-172.
[20] Long, X., Yin, W., An, L., Ni, H., Huang, L., Luo, Q., & Chen, Y. (2012). Churn Analysis of Online Social Network Users Using Data Mining Techniques. In Proceedings of the International MultiConference of Engineers and Computer Scientists. 1. Hong Kong: IMECS 2012.
[21] Naveen, N., Ravi, V., & Kumar, D. A. (2009). Application of fuzzyARTMAP for churn prediction in bank credit cards. International Journal of Information and Decision Sciences, 4(1), 428-444.
[22] Ndung'u, A. W. (2013). Modeling of Churn Behavior of Bank Customers Using Logistic Regression. Kenyatta University of Agriculture and Technology.
[23] NG Data. (2014). Predicting & Preventing Banking Customer Churn by Unlocking Big Data. NG Data. Retrieved October 15, 2014, from http://www.ngdata.com/
[24] Nguyen, E. H. (2011). Customer Churn Prediction for the Icelandic Mobile Telephony Market. 60 ECTS Thesis, University of Iceland, Engineering & Natural Sciences, Sigillum.
[25] Nie, G., Wang, G., Zhang, P., Tian, Y., & Shi, Y. (2009). Finding the hidden pattern of credit card holder's churn: A case of China. In Computational Science–ICCS 2009 (pp. 561-569). Springer Berlin Heidelberg.
[26] Shao, J., Xiu, L., & Wenhuang, L. (2007). The Application of AdaBoost in Customer Churn Prediction. In Service Systems and Service Management, 2007 International Conference on (pp. 1-6). IEEE.
[27] Soeini, R. A., & Rodpysh, K. V. (2012). Applying Data Mining to Insurance Customer Churn Management. In International Proceedings of Computer Science and Information Technology (IACSIT Hong Kong Conferences). 30, pp. 82-92. Singapore: IACSIT Press.
[28] Teemu, M., Ahola, J., & Nousiainen, S. (2006). Customer churn prediction - a case study in retail banking. In ECML/PKDD 2006 Workshop on Practical Data Mining: Applications, Experiences and Challenges (pp. 13-19). Berlin: SAS.
[29] Wang, G., Liu, L., Peng, Y., Nie, G., Kou, G., & Shi, Y. (2010). Predicting credit card holder churn in banks of China using data mining and MCDM. In 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT). 3, pp. 215-218. IEEE.
[30] Weia, C. P., & Chiub, I. T. (2002). Turning telecommunications call details to churn prediction: a data mining approach. Expert Systems with Applications, 23(2), 103-112.
[31] Xie, Y., Li, X., Ngai, E., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(6), 5445-5449.
[32] Xu, S., & Qiu, M. (2008). A Privacy Preserved Data Mining Framework for Customer Relationship Management. Journal of Relationship Marketing, 7(3), 309-321.
[33] Zhao, J., & Dang, X. H. (2008). Bank Customer Churn Prediction Based on Support Vector Machine: Taking a Commercial Bank's VIP Customer Churn as the Example. Wireless Communications, Networking and Mobile Computing, 2008. WiCOM '08. 4th International Conference on (pp. 1-4). Dalian: IEEE.
[34] Zhao, Y., Li, B., Li, X., Liu, W., & Ren, S. (2005). Customer churn prediction using improved one-class support vector machine. In Advanced data mining and applications (pp. 300-306). Springer Berlin Heidelberg.

Authors' Biographies

Dr. A. B. Adeyemo is a Lecturer at the Computer Science Department, University of Ibadan, Oyo State, Nigeria. He has a PhD, M.Tech and PGD in Computer Science from the Federal University of Technology Akure and a B.Sc (Hons) in Engineering Physics from the University of Ife (now Obafemi Awolowo University), Ile-Ife. His research interests are in Data Mining, Computer Networking and Internet Computing. Email: sesanadeyemo@gmail.com

Oluwaseyi Oyeniyi is a student of the University of Ibadan, Oyo State, Nigeria. She holds a B.Sc and M.Sc in Computer Science, with professional certifications and training in CCNA, ITIL, PMP and CISA. She has banking experience with a deep understanding of the core banking applications Oracle FCUBS v.12.0 and Oracle FCDB.