RFM analysis for decision support in e-banking area
VASILIS AGGELIS
WINBANK
PIRAEUS BANK
Athens
GREECE
AggelisV@winbank.gr

DIMITRIS CHRISTODOULAKIS
Computer Engineering and Informatics Department
University of Patras
Patras
GREECE
dxri@ceid.upatras.gr
Abstract: The introduction of data mining methods in the banking area, given the nature and sensitivity of bank data, can already be considered of great assistance to banks in prediction, forecasting and decision support. For decision making, it is very important for a bank to know (a) customer profitability and the grouping of customers according to this parameter and (b) the association rules between the products and services it offers, in order to support its decisions more effectively. The object of this paper is to demonstrate that keeping track of customer groups according to their profitability and discovering association rules between products and services are of major importance for the bank's decision support.

Key-words: Data Mining, Decision Support, Association Rules, RFM analysis, Clustering
1 Introduction
RFM (Recency, Frequency, Monetary) analysis [5] is a three-dimensional way of classifying, or ranking, customers to determine the top 20%, or best, customers. It is based on the 80/20 principle that 20% of customers bring in 80% of revenue.

In order to group customers and perform analysis, a customer segmentation model known as the pyramid model [4] is used. The pyramid model groups customers by the revenue they generate into the categories shown in Figure 1. These categories or value segments are then used in a variety of analytics. The advantage of this approach is that it focuses the analytics on categories and terminology that are immediately meaningful to the business.

The pyramid model has proven extremely useful to companies, financial organisations and banks. Indicatively, some issues that can be improved by the use of the model follow:
 
• Decision making.
• Future revenue forecast.
• Customer profitability.
• Predictions concerning the alteration of customers' position in the pyramid.
• Understanding the reasons for these alterations.
• Retention of the most important customers.
• Stimulation of inactive customers.

Fig. 1 – Pyramid model

Essentially, RFM analysis suggests that a customer exhibiting a high RFM score should normally conduct more transactions and result in higher profit for the bank.

RFM analysis [3, 7, 8, 9] can nowadays be conducted using Data Mining methods such as clustering. These methods contribute to the more efficient determination and exploitation of RFM analysis results.

Determination of association rules concerning bank data is a challenging and demanding task since:
 
 
• The volume of bank data is enormous. Therefore the data should be adequately prepared by the data miner before the final step of applying the method.
• The objective must be clearly set from the beginning. In many cases, not having a clear aim leads to erroneous results or to no results at all.
• Good knowledge of the data is a prerequisite not only for the data miner but also for the final analyst (manager). Otherwise wrong results will be produced by the data miner and unreliable conclusions will be drawn by the manager.
• Not all rules are of interest. The analyst should recognize the rules that are powerful for decision making.

The challenge of the whole process arises from the fact that the relations established are not easily observed without the use of data mining methods.

The increase in electronic transactions in recent years has been quite rapid. E-banking nowadays offers a complete range of products and services, enabling not only individual customers but also corporate customers to conduct their transactions easily and securely. The ability to discover rules between different electronic services is therefore of great significance to a bank.

Identification of such rules offers advantages such as the following:
 
• Description and establishment of the relationships between different types of electronic transactions.
• Electronic services become more familiar to the public, since specific groups of customers that use specific payment methods are approached.
• The customer approach is well designed, with a higher possibility of successful engagement.
• Improvement of the bank services already offered is classified according to those used more frequently.
• Reconsideration of the usefulness of products exhibiting little or no contribution to the rules.

Of particular interest is the verification of the obtained rules using other data mining methods to assure their accuracy.

In the present paper, the RFM scoring of active e-banking users is studied along with the ranking of these users according to the pyramid model. This study is also concerned with the identification of rules between several different ways of payment in each customer group. The software used is SPSS Clementine 7.0. A description of various clustering techniques and algorithms, as well as the basic features of association rules, follows in section 2, while section 3 describes the calculation of the RFM scoring of active e-banking users and the process of investigating association rules. Section 4 contains experimental results derived from the data set of section 3, and section 5 contains the main conclusions of this work, together with the impact of our model and suggestions for future work in this area.
2 Clustering and Association Rules Basics
2.1 Clustering techniques
Clustering techniques [2, 6] fall into a group of undirected data mining tools. The goal of undirected data mining is to discover structure in the data as a whole. There is no target variable to be predicted, thus no distinction is made between independent and dependent variables.

Clustering techniques are used for combining observed examples into clusters (groups) that satisfy two main criteria:
 
• each group or cluster is homogeneous; examples that belong to the same group are similar to each other.
• each group or cluster should be different from other clusters, that is, examples that belong to one cluster should be different from the examples of other clusters.

Depending on the clustering technique, clusters can be expressed in different ways:

• identified clusters may be exclusive, so that any example belongs to only one cluster.
• they may be overlapping; an example may belong to several clusters.
• they may be probabilistic, whereby an example belongs to each cluster with a certain probability.
• clusters might have a hierarchical structure, with a crude division of examples at the highest level of the hierarchy, which is then refined into sub-clusters at lower levels.
2.2 K-means Algorithm
K-means [1, 2, 6, 11] is one of the simplest clustering algorithms. The algorithm takes as input a predefined number of clusters, which is the k in its name. Mean stands for an average: the average location of all the members of a particular cluster.

When dealing with clustering techniques, the notion of a high-dimensional space must be adopted, that is, a space in which the orthogonal dimensions are the attributes of the table of analysed data. The value of each attribute of an example represents the distance of the example from the origin along the corresponding attribute axis. Of course, in order to use this geometry efficiently, the values in the data set must all be numeric and should be normalized in order to allow fair computation of the overall distances in a multi-attribute space.

The K-means algorithm is a simple, iterative procedure in which a crucial concept is that of the centroid. A centroid is an artificial point in the space of records that represents the average location of a particular cluster. The coordinates of this point are the averages of the attribute values of all examples that belong to the cluster. The steps of the K-means algorithm are given in Figure 2.
 
1. Randomly select k points (they can also be examples) to be the seeds for the centroids of the k clusters.
2. Assign each example to the centroid closest to the example, forming in this way k exclusive clusters of examples.
3. Calculate new centroids of the clusters. For that purpose, average all attribute values of the examples belonging to the same cluster (centroid).
4. Check whether the cluster centroids have changed their "coordinates". If yes, start again from step 2. If not, cluster detection is finished and all examples have their cluster memberships defined.

Fig. 2 – K-means algorithm

Usually this iterative procedure of redefining centroids and reassigning the examples to clusters needs only a few iterations to converge.
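
The four steps of Figure 2 can be illustrated with a minimal Python sketch. It is not the Clementine implementation; the function names `normalize` and `k_means`, the min-max scaling, the Euclidean distance, the fixed random seed and the toy data are all assumptions made for illustration.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def normalize(rows):
    """Min-max scale each attribute to [0, 1] (an assumed normalization)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def k_means(rows, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k examples at random as the initial centroids.
    centroids = rng.sample(rows, k)
    for _ in range(max_iter):
        # Step 2: assign every example to its closest centroid.
        labels = [min(range(k), key=lambda j: dist(row, centroids[j])) for row in rows]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-attribute data with two well-separated groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 9.0), (8.2, 9.1), (7.9, 8.8)]
labels, centroids = k_means(normalize(data), k=2)
```

On data this well separated, the first three examples end up in one cluster and the last three in the other; which cluster receives label 0 depends on the random seeds.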
2.3 Two Step Cluster
The Two Step cluster analysis [10] can be used to cluster the data set into distinct groups in case these groups are initially unknown. Similar to the K-means algorithm, Two Step Cluster models do not use a target field. Instead of trying to predict an outcome, Two Step Cluster tries to uncover patterns in the set of input fields. Records are grouped so that records within a group or cluster tend to be similar to each other, while being dissimilar to records in other groups.

Two Step Cluster is a two-step clustering method. The first step makes a single pass through the data, during which it compresses the raw input data into a manageable set of subclusters. The second step uses a hierarchical clustering method to progressively merge the subclusters into larger and larger clusters, without requiring another pass through the data. Hierarchical clustering has the advantage of not requiring the number of clusters to be selected ahead of time. Many hierarchical clustering methods start with individual records as starting clusters, and merge them recursively to produce ever larger clusters. Though such approaches often break down with large amounts of data, Two Step's initial pre-clustering makes hierarchical clustering fast even for large data sets.
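
The two-pass idea can be conveyed with a much-simplified Python sketch. It is not the Two Step algorithm as implemented in Clementine: the distance threshold, the (count, sums) summaries, the centroid-distance merging rule, the function names and the toy data are all assumptions, and the number of final clusters is passed in explicitly here purely for simplicity.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(summary):
    """Centroid of a subcluster stored as (count, per-attribute sums)."""
    n, sums = summary
    return tuple(s / n for s in sums)


def absorb(a, b):
    """Combine two (count, sums) summaries into one."""
    return (a[0] + b[0], tuple(x + y for x, y in zip(a[1], b[1])))


def pre_cluster(rows, threshold):
    """Step 1: a single pass that compresses the records into subcluster summaries."""
    summaries = []
    for row in rows:
        best = min(
            range(len(summaries)),
            key=lambda i: dist(row, centroid(summaries[i])),
            default=None,
        )
        if best is not None and dist(row, centroid(summaries[best])) <= threshold:
            summaries[best] = absorb(summaries[best], (1, tuple(row)))
        else:
            summaries.append((1, tuple(row)))  # start a new subcluster
    return summaries


def merge(summaries, n_clusters):
    """Step 2: repeatedly merge the two closest subclusters, using summaries only."""
    clusters = list(summaries)
    while len(clusters) > n_clusters:
        _, i, j = min(
            (dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = absorb(clusters[i], clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical records: two tight groups and one outlier.
records = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (5.0, 5.0)]
subclusters = pre_cluster(records, threshold=0.3)
final = merge(subclusters, n_clusters=2)
```

Because the second step works only on the compact summaries, its cost depends on the number of subclusters rather than on the number of raw records, which is the point made in the paragraph above.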
2.4 Association Rules
A rule consists of a left-hand side proposition (antecedent) and a right-hand side (consequent) [2]. Both sides consist of Boolean statements. The rule states that if the left-hand side is true, then the right-hand side is also true. A probabilistic rule modifies this definition so that the right-hand side is true with probability p, given that the left-hand side is true.

A formal definition of an association rule [6, 13] is given below.

Definition. An association rule is a rule of the form $X \Rightarrow Y$, where $X$ and $Y$ are predicates or sets of items.

As the number of produced associations might be huge, and not all the discovered associations are meaningful, two probability measures, called support and confidence, are introduced to discard the less frequent associations in the database. The support is the joint probability of finding X and Y in the same group; the confidence is the conditional probability of finding Y in a group, having found X. Formal definitions of support and confidence [6] are given below.
Definition. Given an itemset pattern $X$, its frequency $fr(X)$ is the number of cases in the data that satisfy $X$. Support is the frequency $fr(X \wedge Y)$. Confidence is the fraction of rows that satisfy $Y$ among those rows that satisfy $X$,

$$c(X \Rightarrow Y) = \frac{fr(X \wedge Y)}{fr(X)}$$
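
The definitions of frequency, support and confidence can be checked on a small example. The sketch below is only illustrative: the function names and the transaction data (with made-up payment types) are assumptions, and each row is represented as a Python set of items.

```python
def fr(rows, items):
    """Frequency: number of rows that contain every item in `items`."""
    items = set(items)
    return sum(1 for row in rows if items <= row)


def support(rows, x, y):
    """Support of X => Y: the frequency fr(X and Y)."""
    return fr(rows, set(x) | set(y))


def confidence(rows, x, y):
    """Confidence of X => Y: fr(X and Y) / fr(X); 0.0 when no row satisfies X."""
    fx = fr(rows, x)
    return support(rows, x, y) / fx if fx else 0.0


# Hypothetical customers, each described by the payment types they used.
transactions = [
    {"funds_transfer", "bill_payment"},
    {"funds_transfer"},
    {"funds_transfer", "bill_payment", "tax_payment"},
    {"tax_payment"},
]

s = support(transactions, {"funds_transfer"}, {"bill_payment"})
c = confidence(transactions, {"funds_transfer"}, {"bill_payment"})
```

Here three rows satisfy X = {funds_transfer} and two of them also satisfy Y = {bill_payment}, so the support is 2 and the confidence is 2/3.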
In terms of conditional probability notation, the empirical accuracy of an association rule can be viewed as a maximum likelihood (frequency-based)