
Information Sciences 172 (2005) 215–240

www.elsevier.com/locate/ins

Temporal analysis of clusters of supermarket customers:
conventional versus interval set approach
Pawan Lingras a,*, Mofreh Hogo b, Miroslav Snorek b, Chad West c
a Department of Mathematics and Computer Science, Saint Mary's University,
Halifax, NS, Canada B3H 3C3
b Department of Computer Science and Engineering, Faculty of Electrical Engineering,
Czech Technical University, Karlovo Nam. 13, 121 35 Prague 2, Czech Republic
c IBM Toronto Software Development Laboratory, 8200 Warden Ave, Markham,
ON, Canada L6G 1C7

Received 14 May 2004; accepted 28 December 2004

Abstract

Temporal data mining is the application of data mining techniques to data that takes
the time dimension into account. This paper studies changes in cluster characteristics of
supermarket customers over a 24 week period. Such an analysis can be useful for
formulating marketing strategies. Marketing managers may want to focus on specific
groups of customers. Therefore they may need to understand the migrations of the
customers from one group to another group. The marketing strategies may depend on the
desirability of these cluster migrations. The temporal analysis presented here is based on
conventional and modified Kohonen self organizing maps (SOM). The modified
Kohonen SOM creates interval set representations of clusters using properties of rough
sets. A description of an experimental design for temporal cluster migration studies

* Corresponding author. Tel.: +1 902 420 5798; fax: +1 902 420 5035.
E-mail address: pawan.lingras@smu.ca (P. Lingras).

0020-0255/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.ins.2004.12.007

including data cleaning, data abstraction, data segmentation, and data sorting, is
provided. The paper compares conventional and non-conventional (interval set) clustering
techniques, as well as temporal and non-temporal analysis of customer loyalty. The
interval set clustering is shown to provide an interesting dimension to such a temporal
analysis.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Temporal data mining; Rough set theory; Modified Kohonen SOM; Loyalty

1. Introduction

Temporal data mining is fundamental to many domains including market
analysis, financial applications, Web, medicine, and computer security
[3,4,13]. There is an increasing interest in data mining techniques that
accommodate temporal features (i.e., take into account the time dimension). If the
analysis is based on current or latest snapshots, it is difficult to understand
the evolution or changing nature of the facility. The facility in question may
be a supermarket, a website, a highway, or a service dealership. The temporal
analysis may also reveal cyclical changes. These evolutionary and cyclical
changes may be helpful for planning of resources. Certain groups of users of
the facility may be interesting because of the dynamics of their behavior. For
example, a user may be a cyclical user of the facility, or her usage may be either
on the increase or decrease. Such dynamics can be revealed through temporal
data mining. Much of the early temporal data mining work was related to the
use and analysis of temporal sequences of raw data
[1,2,4,15,17,28–30,33,34,37,38]. There is a growing body of recent work that
analyzes the results of data mining over a period of time [6–8,11,32,35,36].
Aggarwal [1] and Kifer et al. [15] worked with the analysis of changes in data
streams. Dong et al. [6] listed some of the research problems with mining changes
from data streams. They described implications of these changes in relation to
classification and clustering. Ganti et al. [7] described a framework for measuring
changes in data characteristics that encompasses a wide range of data mining
techniques such as frequent item sets, classifiers, and clustering. Ha et al. [8]
studied the changing nature of data mining results such as segmentation and
classification from the customer relationship management (CRM) aspect. They
provided a variety of measures to quantify the desirability of these changes.
The implications of changes in data mining results to CRM were also
investigated by West et al. [36]. They attempted to identify customer attrition by
studying changes in cluster memberships. Hogo et al. [11] demonstrated that
the analysis of clusters from different time periods can also reveal additional
information about usages of a web site. Song et al. [32] studied association
rules in the context of temporal changes in data streams. Wang et al. [35] studied
the problem of mining changes in classifications with temporal changes in data.
The present study focuses on the changing nature of conventional and
interval set representations of supermarket clusters.
Clustering methods seek out a special type of structure, namely, grouping
tendencies in the data. In this regard, they are not as general as other
approaches, but can provide valuable information when local aggregation of
the data is suspected. Clustering groups together users or data items with
similar characteristics. As opposed to classification, the grouping process in
clustering is unsupervised. The actual categorization of objects, even for a sample,
is not known. The clustering process is an important step in establishing object
profiles. The objects can be web users, web documents, customers, or facilities
such as highways. Clustering in data mining faces several additional challenges
compared to conventional clustering applications [5,9,10,12,14]. The clusters
tend to have fuzzy boundaries. Instead of an object precisely belonging to a
cluster, it may be assigned a degree of fuzzy membership to one or more
clusters. There is a likelihood that an object may be a candidate for more than one
cluster. Joshi and Krishnapuram [14] argued that clustering operations in data
mining involve modeling an unknown number of overlapping sets. Lingras
[18,19,21–23] proposed three different approaches for unsupervised creation
of rough or interval set representations of clusters: evolutionary, statistical,
and neural. Lingras [18] described how a rough set theoretic clustering scheme
could be represented using a rough set genome. The objective of the genetic
algorithms (GAs) was to minimize the within-group error. Lingras [18] provided
a formulation of within-group error for rough set based clustering. The resulting
GAs were used to evolve interval clustering of highway sections. Lingras [19]
applied the unsupervised rough set clustering based on GAs for
grouping web users of a first year university course. He hypothesized that there
are three types of visitors: studious, crammers, and workers. Studious visitors
download notes from the site regularly. Crammers download most of the notes
before an exam. Workers come to the site to finish assigned work such as lab
and class assignments. Generally, the boundaries of these clusters will not be
precise. Lingras [19] illustrated the feasibility of rough set clustering for
developing user profiles on the web. However, the clustering process based on GAs
seemed too computationally expensive to scale to larger datasets. The K-means
algorithm is one of the most popular statistical techniques for conventional
clustering. Lingras and West [23] provided a theoretical and experimental
analysis of a modified K-means clustering based on the properties of rough sets.
The modified K-means approach is also suitable for large datasets. It was used
to create interval set representations of clusters of web users as well as a large
set of supermarket users. The Kohonen neural network or self-organizing map
[12,16] is another popular clustering technique. The Kohonen network is
desirable in some applications due to its adaptive capabilities. Lingras et al. [21,22]
introduced interval set clustering, using a modification of the Kohonen
self-organizing maps, based on rough set theory. Both the modified statistical
and neural network based approaches used the properties of rough sets [26,27]
for assigning objects to lower and upper bounds of clusters.
Hogo et al. [11] applied cluster analysis to a sliding window of a time series.
It was shown to be a useful method for grouping related temporal patterns that
are dispersed along the time series. The interval set clustering obtained
by the modified Kohonen SOM was shown to be useful in the temporal
analysis of clustering results. Hogo et al.'s analysis was applied to visitors to an
academic website. Because of issues related to the identification of web visitors,
such as anonymity and protection of privacy, it was not possible to keep track of
web users from one visit to the next. Therefore, the analysis was necessarily
general. Nevertheless, the analysis pointed out some of the characteristics of
web visitors, which would have remained hidden without the temporal
analysis. This paper applies a similar temporal analysis to migrations between clusters
of supermarket customers. The data used in the study was supplied by a
national Canadian supermarket chain. Data consisted of transactional records
from a store in a rural setting. The data was collected over a 24 week period.
The customers could be tracked if they used a loyalty card. The acceptance of
the card was around 80%, i.e. 80% of the transactions used the loyalty
card. The study period was divided into six four-week periods. Both
conventional and rough set clustering were applied to these periods. The clustering
results from these six periods were compared.
The rest of this paper is organized as follows. Section 2 includes a review of
rough set theory, the conventional Kohonen SOM, and the modified Kohonen
SOM based on the properties of rough sets. Section 3 reviews some of the
existing attempts at mining changes and states the problem of temporal changes in
clustering. Section 4 describes the study data and experiment design, including
data cleaning, data abstraction, and data segmentation. The results and discus-
sion are provided in Section 5. Section 6 considers customer relationship man-
agement and directions for future research. Conclusions of this work are
presented in Section 7.

2. Overview of theory and clustering algorithms

2.1. Rough set theory

Pawlak proposed the notion of rough set theory [26,27]. This section
provides a brief summary of the concepts from rough set theory essential for
introducing the Kohonen rough set theoretic algorithm. Let \(U\) denote the universe
(a finite ordinary set), and let \(R \subseteq U \times U\) be an equivalence
(indiscernibility) relation on \(U\). The pair \(A = (U, R)\) is called an
approximation space. The equivalence relation \(R\) partitions the set \(U\) into
disjoint subsets. Such a partition of the universe is denoted by
\(U/R = \{E_1, E_2, \ldots, E_n\}\), where \(E_i\) is an equivalence
class of \(R\). If two elements \(u, v \in U\) belong to the same equivalence class
\(E \in U/R\), we say that \(u\) and \(v\) are indistinguishable. The equivalence
classes of \(R\) are called the elementary or atomic sets in the approximation space
\(A = (U, R)\). The union of one or more elementary sets is called a composed
set in \(A\). The empty set \(\emptyset\) is also considered a special composed set.
\(\mathrm{Com}(A)\) denotes the family of all composed sets. Since it is not possible
to differentiate the elements within the same equivalence class, one may not be able
to obtain a precise representation for an arbitrary set \(X \subseteq U\) in terms of
elementary sets in \(A\). Instead, its lower and upper bounds may represent the set
\(X\). The lower bound \(\underline{A}(X)\) is the union of all the elementary sets
that are subsets of \(X\) (elements in the lower bound of \(X\) definitely belong to
\(X\)). The upper bound \(\overline{A}(X)\) is the union of all the elementary sets
that have a non-empty intersection with \(X\) (elements in the upper bound of \(X\)
may or may not belong to \(X\)). The pair \((\underline{A}(X), \overline{A}(X))\) is
the representation of an ordinary set \(X\) in the approximation
space \(A = (U, R)\), or simply the rough set of \(X\). Fig. 1 illustrates the lower and
upper approximation using rough set theory. It can be verified that for any
subsets \(X, Y \subseteq U\), the following eight lemmas hold [21,22,31–34].
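\[ \underline{A}(X \cap Y) = \underline{A}(X) \cap \underline{A}(Y) \tag{L1} \]

\[ \underline{A}(X \cup Y) \supseteq \underline{A}(X) \cup \underline{A}(Y) \tag{L2} \]

\[ \overline{A}(X \cap Y) \subseteq \overline{A}(X) \cap \overline{A}(Y) \tag{L3} \]

\[ \overline{A}(X \cup Y) = \overline{A}(X) \cup \overline{A}(Y) \tag{L4} \]

\[ \underline{A}(U - X) = U - \overline{A}(X); \quad \overline{A}(U - X) = U - \underline{A}(X) \tag{L5} \]

\[ X \subseteq Y \Rightarrow \underline{A}(X) \subseteq \underline{A}(Y), \ \overline{A}(X) \subseteq \overline{A}(Y) \tag{L6} \]

\[ \underline{A}(U) = \overline{A}(U) = U \tag{L7} \]

\[ \underline{A}(\emptyset) = \overline{A}(\emptyset) = \emptyset \tag{L8} \]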

Fig. 1. Rough sets. (Figure labels: an equivalence class, lower approximation,
actual set, upper approximation.)
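
The definitions of the lower and upper bounds can be illustrated with a small sketch. The function name `approximations` and the six-element universe below are hypothetical and chosen only for illustration; the partition plays the role of U/R.

```python
def approximations(partition, X):
    """Return the (lower, upper) approximations of X for a partition of U."""
    X = set(X)
    lower, upper = set(), set()
    for block in partition:
        block = set(block)
        if block <= X:      # elementary set entirely contained in X
            lower |= block
        if block & X:       # elementary set with a non-empty intersection with X
            upper |= block
    return lower, upper


# Hypothetical universe {1, ..., 6} with three equivalence classes.
partition = [{1, 2}, {3, 4}, {5, 6}]
lower, upper = approximations(partition, {1, 2, 3})
print(lower, upper)
```

In this example the class {3, 4} overlaps the set without being contained in it, so its elements appear only in the upper bound.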



2.2. Conventional Kohonen SOM

This section briefly describes the conventional Kohonen SOM [16]. The
architecture is illustrated in Fig. 2. Unsupervised learning using the Kohonen
rule [16] uses a competitive learning approach. In competitive learning, the
output neurons compete with each other. The winner output neuron has an output
of one; the rest of the output neurons have outputs of zero. Competitive
learning is suitable for classifying a given pattern into exactly one of several
mutually exclusive clusters. The network is used to group patterns represented by
m-dimensional vectors into k groups. The network consists of two layers. The first
layer is called the input layer and the second layer is called the Kohonen layer.
The network receives the input vector for a given pattern. If the pattern belongs
to the ith group, then the ith neuron in the Kohonen layer has an output value of
one and the other Kohonen layer neurons have output values of zero. Each
connection is assigned a weight \(w_{ij}\). The weights of all the connections to a
Kohonen layer neuron make up an m-dimensional weight vector \(w_i\). The weight
vector \(w_i\) for a Kohonen layer neuron is the vector representation of the group
corresponding to that neuron. For any input vector \(v\), the network compares
the input with the weight vector for a group using a measure such as \(d(w_i, v)\):
\[ d(w_i, v) = \frac{\sum_{j=1}^{m} (w_{ij} - v_j)^2}{m} \tag{1} \]
The pattern \(v\) belongs to the group with the minimum value for \(d(w_i, v)\). The
Kohonen neural network generates the clusters through a learning process as
follows: Initially, the network connections are assigned somewhat arbitrary
weights. The training set of input vectors is presented to the network several
times. For each iteration, the weight vector \(w_i\) for a group that is closest to
the pattern \(v\) is modified using the equation

Fig. 2. Conventional Kohonen SOM. (Figure shows an input layer connected to an
output layer in which the winning neuron outputs 1 and the others output 0.)



\[ w_i^{new} = w_i^{old} + \alpha(t)\,(v - w_i^{old}) \tag{2} \]

where \(\alpha(t)\) is a learning factor which starts with a high value at the beginning of
the training process and is gradually reduced as a function of time.
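
A single competitive learning step, combining the distance measure of Eq. (1) with the update of Eq. (2), might be sketched as follows. This is a minimal illustration using NumPy, not the implementation used in the study; the function name `train_step`, the two-cluster weights, and the learning factor of 0.5 are hypothetical.

```python
import numpy as np


def train_step(weights, v, alpha):
    """Update the winning weight vector in place and return the winner's index.

    weights: (k, m) array with one weight vector per Kohonen layer neuron.
    v:       (m,) input vector.
    alpha:   current value of the learning factor.
    """
    d = np.mean((weights - v) ** 2, axis=1)           # Eq. (1) for every group
    winner = int(np.argmin(d))                        # group closest to v
    weights[winner] += alpha * (v - weights[winner])  # Eq. (2)
    return winner


weights = np.array([[0.0, 0.0], [1.0, 1.0]])  # k = 2 groups, m = 2 inputs
winner = train_step(weights, np.array([0.8, 1.0]), alpha=0.5)
print(winner, weights)
```

Only the winner moves toward the input; the other weight vectors are left unchanged, which reflects the mutually exclusive nature of the conventional clusters.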

2.3. Kohonen SOM based on the properties of rough set theory

Rough sets were proposed using equivalence relations. However, it is
possible to define a pair of upper and lower bounds
\((\underline{A}(X), \overline{A}(X))\) or a
rough set for every set \(X \subseteq U\) as long as the properties specified by Pawlak
[26,27] are satisfied. Yao et al. [39–42] described various generalizations of
rough sets by relaxing the assumptions of an underlying equivalence
relation. Skowron and Stepaniuk [31] discuss a similar generalization of rough
set theory.
If one adopts a more restrictive view of rough set theory, the rough sets
developed in this paper may have to be looked upon as interval sets. Lingras
[18] proposed unsupervised rough set clustering based on genetic algorithms
to create the interval sets of clusters for web users. Lingras and West [23]
proposed an adaptation of the K-means algorithm based on rough set theory for
interval set clustering of web users. This paper uses some of the concepts from
Lingras and West [23] to create intervals of clusters using the Kohonen
self-organizing maps. Let us consider a hypothetical classification scheme
\(U/P = \{X_1, X_2, \ldots, X_k\}\), which partitions the set \(U\) based on certain
criteria. The actual values of \(X_i\) are not known. The classification of web users
is an example of such a hypothetical classification scheme. Depending on the
predominant usage, a set of web visitors can be classified as crammers, workers,
or studious. However, the actual sets corresponding to each one of these
classes are not known. Let us assume that due to insufficient knowledge it is not
possible to precisely describe the sets \(X_i\), \(1 \le i \le k\), in the partition. It is
possible to define each set \(X_i \in U/P\) using its lower and upper bounds
\((\underline{A}(X_i), \overline{A}(X_i))\) based on the available information. In this
study, the available information consists of web access logs. Since vectors represent
the objects and clusters in the Kohonen rough set clustering algorithm, we will use
vector representations, \(v\) for an object and \(x_i\) for cluster \(X_i\). We are
considering the upper and lower bounds of only a few subsets of \(U\). Therefore, it is
not possible to verify all the properties of rough sets [26,27]. However, the family of
upper and lower bounds of \(x_i \in U/P\) is required to follow some of the basic
rough set properties such as:

(P1) An object \(v\) can be part of at most one lower bound.

(P2) \(v \in \underline{A}(x_i) \Rightarrow v \in \overline{A}(x_i)\).

(P3) An object \(v\) is not part of any lower bound \(\Leftrightarrow\) \(v\) belongs to
two or more upper bounds.
Properties (P1)–(P3) can be obtained from the properties of rough sets and
the fact that \(X_i \cap X_j = \emptyset\), \(i \ne j\). It is important to note that
(P1), (P2), and (P3) are not necessarily independent or complete. However,
enumerating them will be helpful in understanding the rough set adaptation of the
Kohonen neural networks.
Incorporating rough sets into the Kohonen algorithm requires the addition
of the concept of lower and upper bounds in the equations, which are used
for updating the weights of the winners. The Kohonen rough set architecture
is similar to the conventional Kohonen architecture. It consists of two layers,
an input layer and the Kohonen rough set layer (rough set output layer).
These two layers are fully connected. Each input layer neuron has a feed for-
ward connection to each output layer neuron. Fig. 3 illustrates the Kohonen
rough set neural network architecture for a one-dimensional case. A neuron
in the Kohonen layer consists of two parts, a lower neuron and an upper neu-
ron. The lower neuron has an output of 1, if an object belongs to the lower
bound of the cluster. Similarly, a membership in the upper bound of the clus-
ter will result in an output of 1 from the upper neuron. Since an object
belonging to the lower bound of a cluster also belongs to its upper bound,
when the lower neuron has an output of 1, the upper neuron also has an
output of 1. However, membership in the upper bound of a cluster does not
necessarily imply membership in its lower bound. Therefore, the upper neuron
contains the lower neuron.
The interval clustering provides good results, if initial weights are obtained
by running the conventional Kohonen learning.

Fig. 3. Modified Kohonen SOM based on properties of rough sets. (Figure shows an
input layer connected to a Kohonen rough set layer in which each of the cluster
neurons C1, C2, C3 has a lower part contained in an upper part.)



Step 1. Initialize weights from the m inputs to the k output nodes using the
conventional Kohonen algorithm.
Step 2. For each object vector \(v\), let \(d(v, x_i)\) be the distance between itself and
the weight vector \(x_i\) of cluster \(X_i\).
Step 3. The next step in the modification of the Kohonen algorithm for obtaining
rough sets is to design criteria to determine whether an object
belongs to the upper or lower bounds of a cluster. Let
\(d(v, x_i) = \min_{1 \le j \le k} d(v, x_j)\). The ratio \(d(v, x_i)/d(v, x_j)\),
\(1 \le j \le k\), is used to determine the membership of \(v\) as follows.
Let \(T = \{j : d(v, x_i)/d(v, x_j) \le \text{threshold},\ i \ne j\}\).
If \(T \ne \emptyset\), \(v \in \overline{A}(x_i)\) and \(v \in \overline{A}(x_j)\),
\(\forall j \in T\). Furthermore, \(v\) is not part of
any lower bound. The above criterion guarantees that property (P3)
is satisfied. The weight vectors \(x_i\) and \(x_j\) are modified as:
\[ x_i^{new} = x_i^{old} + \alpha_{upper}(t)\,(v - x_i^{old}) \]
\[ x_j^{new} = x_j^{old} + \alpha_{upper}(t)\,(v - x_j^{old}). \]
Otherwise, if \(T = \emptyset\), \(v \in \underline{A}(x_i)\) such that \(d(v, x_i)\) is
the minimum for \(1 \le i \le k\). In addition, by property (P2),
\(v \in \overline{A}(x_i)\). The weight vector \(x_i\) is modified as
\[ x_i^{new} = x_i^{old} + \alpha_{lower}(t)\,(v - x_i^{old}). \]
Usually, \(\alpha_{lower}(t) > \alpha_{upper}(t)\).
Step 4. If the clustering remains unchanged from the previous iteration, stop.
Otherwise, go to step 2 as long as a maximum number of iterations is
not reached.

It can be easily verified that the above algorithm preserves properties
(P1)–(P3).
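
Steps 2 and 3 for a single object can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code. The function name `rough_step` and the sample weights are hypothetical, and the distance is assumed to be the mean squared difference of Eq. (1). The text states the membership test as ratio ≤ threshold. Because the numerator is the distance to the nearest cluster, the ratio never exceeds 1 and approaches 1 when another cluster is almost as close, so the sketch assumes that the intended reading is that cluster j joins T when the ratio is at least the threshold. That reading is an assumption and is marked as such in the code.

```python
import numpy as np


def rough_step(x, v, threshold, a_lower, a_upper):
    """Assign object v and update the cluster weight vectors x (k by m) in place.

    Returns (lower, uppers): the index of the cluster whose lower bound
    contains v (None if there is none) and the sorted indices of the
    clusters whose upper bounds contain v.
    """
    d = np.mean((x - v) ** 2, axis=1)   # distance to every cluster
    i = int(np.argmin(d))               # nearest cluster
    # ASSUMPTION: j is a near-tie when d(v, x_i) / d(v, x_j) >= threshold,
    # written without a division so that zero distances are handled.
    T = [j for j in range(len(x)) if j != i and d[i] >= threshold * d[j]]
    if T:
        members = sorted([i] + T)
        for j in members:               # v lies only in upper bounds
            x[j] += a_upper * (v - x[j])
        return None, members
    x[i] += a_lower * (v - x[i])        # v lies in the lower bound of x_i
    return i, [i]


x = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
boundary = rough_step(x, np.array([0.5, 0.5]), 0.7, 0.01, 0.005)
print(boundary)
```

By construction the function returns either one lower bound together with the same single upper bound, or no lower bound and at least two upper bounds, which is how properties (P1)–(P3) show up in the sketch.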
Kohonen's learning law with a fixed learning rate does not converge.
Convergence requires the sum of the infinite sequence of learning rates to be
infinite, while the sum of squared learning rates must be finite [16, p. 34].
Convergence to a local optimum can be obtained as the training time goes
to infinity if the learning rate is reduced in a suitable manner as described
above [16]. Similar comments can be made regarding the proposed
modification of the Kohonen algorithm. However, the application of the Kohonen
algorithm and the proposed modifications have shown that reasonable or
stable clustering is obtained with approximately 100 iterations.
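
The two conditions on the learning rate can be written compactly. As an illustration only, the harmonic schedule \(\alpha(t) = \alpha_0 / t\) (a choice made here for the example, not one prescribed by the text) satisfies both:

```latex
\sum_{t=1}^{\infty} \alpha(t) = \infty,
\qquad
\sum_{t=1}^{\infty} \alpha(t)^2 < \infty;
\qquad
\text{e.g. } \alpha(t) = \frac{\alpha_0}{t}:
\quad
\sum_{t=1}^{\infty} \frac{\alpha_0}{t} = \infty,
\quad
\sum_{t=1}^{\infty} \frac{\alpha_0^2}{t^2} = \frac{\pi^2 \alpha_0^2}{6}.
```

A constant positive rate fails the second condition, which is consistent with the statement that a fixed learning rate does not converge.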

3. Mining clustering changes in data streams

While earlier temporal data mining used time-series data for traditional data
mining tasks, there is an increasing interest in studying changes in the data and
corresponding analysis over a period of time [1,6–8,11,15,32,35,36]. Aggarwal
[1] discussed the concept of velocity density estimation to understand,
visualize and determine trends in the evolution of fast data streams. Kifer et al. [15]
proposed a method that can guarantee reliability of detected changes in data
streams, and also can provide descriptions and quantification of these changes.
Dong et al. [6] emphasized the importance of mining of changes, described
some of the research problems, and identified inherent challenges. They sketched
some of the preliminary results with respect to classification and clustering.
Ganti et al. [7] developed a framework for quantifying the difference between
two datasets in terms of the models they induce. The models that are included in
their study include popular data mining techniques such as frequent item sets,
classifiers, and clustering. Song et al. [32] also focused on implications of
changes to data mining results; in particular, their study addressed association
rules. Another study by Wang et al. [35] also examined changes in data mining
results dealing with classifications. Ha et al. [8] stated the importance of the
problem of the changing nature of data mining results in customer relationship
management (CRM). Their formulation included a study of shifts in
segmentation and classification. They provided a variety of measures to quantify the
desirability of these changes. A related study by West et al. [36] attempted to
determine customer attrition by studying the changing nature of clusters. A study
conducted by Hogo et al. [11] showed that a study of changes in clusters over
time can reveal potentially useful information about usages of a web site.
The review of some of the existing literature described above supports the
importance of studying the changes in data mining results over time. The focus
of this study is clustering. There are two aspects to the study of changes in clus-
tering: study of changes to aggregate clustering, and study of changing cluster
memberships of individual objects.
Changes to aggregate clustering include shrinking or expanding of certain
clusters. If the desirable clusters are expanding, it is a positive indication to
marketing managers. They can further analyze the cause of these changes with
a view to facilitating such changes. Expansion of undesirable clusters, on the
other hand, can be used to take preventive measures. The study of changing
nature of cluster memberships of individuals is useful for target marketing.
The interval clusters studied in this paper may be particularly useful as they
will be able to provide early indications of these changes. The following sec-
tions describe experimental results of conventional and interval set representa-
tion of clusters of supermarket customers over a 24 week period.
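
The two aspects of change described above, aggregate cluster sizes and individual migrations, can be illustrated with a short sketch. The function names, the customer IDs, and the cluster labels "A", "B", and "C" are hypothetical; each dictionary maps a customer to its cluster in one period.

```python
from collections import Counter


def cluster_sizes(assignment):
    """Aggregate view: number of customers in each cluster for one period."""
    return Counter(assignment.values())


def migration_counts(before, after):
    """Individual view: count moves from each cluster to each cluster.

    Customers that do not appear in both periods are ignored.
    """
    return Counter((before[c], after[c]) for c in before if c in after)


period1 = {"c1": "A", "c2": "B", "c3": "C", "c4": "B"}
period2 = {"c1": "A", "c2": "A", "c3": "C", "c4": "C"}

sizes1, sizes2 = cluster_sizes(period1), cluster_sizes(period2)
moves = migration_counts(period1, period2)
print(sizes1, sizes2, moves)
```

Comparing `sizes1` with `sizes2` shows which clusters are expanding or shrinking, while `moves` identifies the individual migrations that a target marketing effort might act on.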

4. Study data and design of experiments

The data was collected from a rural area supermarket, which is part of a
national Canadian chain of stores. The data collection spanned a 24-week
period. It includes customer information about spending, visits, product
categories shopped, and other transactional data.
Poor quality data always leads to poor quality results. Therefore, data
quality is a fundamental issue in data mining, not only because distorted data
means distorted results, but also because many of the interesting or unusual
patterns discovered may be directly due to corrupt data. Data preparation
constitutes a significant effort before mining can be applied. Data preparation in this
study consisted of four steps: the first was data cleaning, the second was data
abstraction, the third was data segmentation, and the fourth was the sorting of
data segments. The inputs of this preparation phase were the supermarket
transaction records and the outputs were the abstraction records that include
the customer ID, followed by eight time series variables, four of them
for the weekly visits and the other four for the weekly spending. The details
of the dataset are shown in Table 1.

4.1. Data cleaning and missing values problem

There were no missing values in the data obtained from the supermarket.
The objective of the study was to find the interval set clustering based
on the customers' spending potential and loyalty. Customer visit patterns were
used as an indication of loyalty and spending patterns were used as an
indication of spending potential. Commercial customers such as restaurants and
small stores tend to buy large volumes of products. Their inclusion with regular
households can skew the analysis. Stores also maintain a stack of cards that are
used by the cashiers to help regular customers (who may have forgotten their
cards) take advantage of special offers. Again, inclusion of such cards with the
regular households is not advisable. It was decided to include only those
customers who visited six or fewer times per week and spent $1000 or less per week.
This helped us eliminate commercial customers and store cards used for
multiple households.
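
The cleaning rule can be expressed as a simple filter. The sketch below assumes that the limits apply to every individual week rather than to a weekly average, which the text does not specify; the function name and sample values are hypothetical.

```python
def is_regular_household(weekly_visits, weekly_spending,
                         max_visits=6, max_spending=1000.0):
    """Return True if no week exceeds the visit limit or the spending limit.

    ASSUMPTION: the limits are checked week by week, not on an average.
    """
    return (max(weekly_visits) <= max_visits
            and max(weekly_spending) <= max_spending)


household = is_regular_household([1, 2, 0, 3], [50.0, 120.0, 0.0, 80.0])
store_card = is_regular_household([7, 9, 8, 7], [300.0, 450.0, 380.0, 410.0])
restaurant = is_regular_household([2, 2, 2, 2], [1500.0, 900.0, 1200.0, 1100.0])
print(household, store_card, restaurant)
```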

4.2. Data abstraction

The process of data abstraction depends on the goal of the study, and the
contents of the data itself. The available data consisted of transactions for

Table 1
Transactional data set
Size of input transactions | Number of customers | Number of customers after cleaning | Number of four-week records after cleaning
3,691,611 | 22,448 | 22,240 | 133,440

all the items sold by a single store. As mentioned before, one of the goals of the
study was to cluster the customers using visit patterns (loyalty) and spending
patterns (spending potential). Data summarization consisted of creating one
record per household consisting of the number of visits and amount of spend-
ing per week from the transactions records. The abstract representation of the
customer consisted of a vector containing the customer ID (the household number)
followed by 52 fields: 26 weekly visits and 26 weekly spending amounts.
These records were further segmented as described in the following section.

4.3. Data segmentation

The primary goal of this study was to find the temporal migrations of
customers between clusters. In order to study the temporal changes, the abstracted data
records were segmented based on time units. The time unit was chosen to be
approximately a month, that is, four weeks. This is based on an analysis of
product purchases that suggests that a typical customer will buy all the necessary
household products over the span of one month. Given 26 weeks, this
segmentation resulted in six segments of four weeks each. The last two weeks of data
were discarded. The data for each month was stored in a separate data file. Each
data file consisted of a four-week representation of customer loyalty and spending
potential. Each record in the files consisted of the customer ID, followed by an
eight-element time series vector made up of four weekly visits and four weekly
spending amounts.
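
The segmentation of one abstracted record can be sketched as follows. The function name `segment` and the record layout (ID, list of 26 weekly visits, list of 26 weekly spending amounts) are assumptions made for this illustration, and the sample values are synthetic.

```python
def segment(record, weeks_per_segment=4, n_segments=6):
    """Split one 26-week record into six four-week records.

    Weeks beyond n_segments * weeks_per_segment (the last two) are dropped.
    """
    customer_id, visits, spending = record
    segments = []
    for s in range(n_segments):
        lo = s * weeks_per_segment
        hi = lo + weeks_per_segment
        segments.append((customer_id, visits[lo:hi], spending[lo:hi]))
    return segments


# Synthetic record: week w has w visits and spending of 10 * w.
record = ("h001", list(range(26)), [10.0 * w for w in range(26)])
segments = segment(record)
print(len(segments), segments[0], segments[-1])
```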

4.4. Sorting data segment

The last phase in the data preparation consisted of sorting the time series
values for each data segment in ascending order. As mentioned before, usually a
customer buys most of the necessary products in a given month. However,
the actual timing of purchases may vary from customer to customer. It is
possible that customers with similar profiles may spend different amounts in a given
week. However, if the values are sorted, the differences between these customers
may vanish. For example, four weeks' spending of a customer may be $100, $70,
$50, and $40. Another customer may spend $40, $70, $100, and $50 in those
four weeks. If the two time series were compared with each other, the two
customers may seem to have completely different profiles. If the time series values
were sorted, the two customers would have identical patterns. Therefore, both the
visit patterns and spending patterns were sorted [20,21].
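
The spending example above can be reproduced directly. The helper name `sort_segment` is hypothetical; it sorts the weekly values of one segment in ascending order.

```python
def sort_segment(values):
    """Return the weekly values of one four-week segment in ascending order."""
    return sorted(values)


first_customer = sort_segment([100, 70, 50, 40])
second_customer = sort_segment([40, 70, 100, 50])
print(first_customer, second_customer)
```

Both calls yield the same vector, so the two customers become indistinguishable to a distance-based clustering once the values are sorted.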

4.5. Experimental design

Previous analysis [20,22,24] and discussions with a market analyst
indicated that five groups are sufficient to represent important categories of
customers. The previous studies had labeled these categories as: G1 for loyal big
spenders, G2 for loyal moderate spenders, G3 for semi-loyal moderate
spenders, G4 for semi-loyal potentially big spenders, and G5 for infrequent
customers.
The modified Kohonen SOM consisted of eight input neurons
corresponding to the four weekly visits and four weekly spending amounts. The average
spending per visit was $25. The ratio of 25:1 in favour of spending would have weighted
the clustering more towards spending than visits. Lingras and Adams [20]
experimented with various weighting schemes. Based on their experience, the
values were scaled to make spending only twice as important as visits (as
opposed to 25 times more important with raw values). In other words, the
spending amounts were normalized to values between 0 and 2, while visits were
normalized to values between 0 and 1. The rough set output layer consisted
of five rough set neurons (each with its lower and upper component)
corresponding to the five categories described earlier. After experimenting with a
range of values, the threshold was set at 0.7, \(\alpha_{lower}(t)\) was chosen to be 0.01,
and 0.005 was used as the value of \(\alpha_{upper}(t)\).
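
One way to obtain the stated ranges is to divide by the largest admissible value. The sketch below assumes the cleaning limits of six visits and $1000 per week as those maxima; the text gives only the target ranges, so the scaling constants and the function name `scale_record` are assumptions.

```python
def scale_record(visits, spending, max_visits=6, max_spending=1000.0):
    """Build the eight-element input vector for one four-week record.

    ASSUMPTION: visits are divided by max_visits (range 0 to 1) and
    spending by max_spending and then doubled (range 0 to 2).
    """
    scaled_visits = [v / max_visits for v in visits]
    scaled_spending = [2.0 * s / max_spending for s in spending]
    return scaled_visits + scaled_spending


vector = scale_record([0, 1, 3, 6], [0.0, 25.0, 500.0, 1000.0])
print(vector)
```

With this scaling the largest possible spending component is twice the largest possible visit component, which matches the intent of making spending twice as important as visits.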

5. Results and discussion

The results obtained from this work are discussed in the following three
steps:

(a) Non-temporal clustering analysis using the conventional Kohonen SOM.
(b) Non-temporal clustering analysis using the modified Kohonen SOM based
on the properties of rough sets, and comparison of results with the
conventional Kohonen SOM.
(c) Temporal clustering analysis using the modified Kohonen SOM.

5.1. Non-temporal clustering analysis using conventional Kohonen SOM

In the non-temporal analysis, each four-week (or monthly) segment was
treated as a separate object. That means for each customer there were six
monthly records. This made sure that the same grouping was applied to all
months. The monthly variations could later be compared with this combined
group. Fig. 4 provides a pictorial view of the spending and visit patterns for
the five clusters obtained using the conventional self-organizing maps as well
as their cardinalities. The clusters can be described according to the loyalty
and spending potential of the customers as follows.

2.5 250

2 200

1.5 150

1 100

0.5 50

0 0
V1 V2 V3 V4 S1 S2 S3 S4
(a) (b)

G1 G2 G3 G4 G5

Fig. 4. Non-temporal cluster analysis using conventional Kohonen SOM over a four-week period:
(a) non-temporal visiting behaviour and (b) non-temporal spending behavior.

Loyal big spenders (G1). This group consists of the largest spenders. They
are frequent visitors and seem to be very loyal to the store. Obviously,
the store would like to encourage continued patronage from such a
group.
Loyal moderate spenders (G2). Even though the maximum spending for these
customers was smaller than that of G4, their spending patterns were the most
stable among all the groups. The total number of visits was comparable to G1.
These customers may be the most loyal among all the groups. They are not big
spenders like the customers from G1 and G4. Additional analysis shows
that these customers receive higher discounts, indicating a value-conscious
nature.
Semi-loyal moderate spenders (G3). These customers are similar to those from
G5. However, their spending and visits over 24 weeks indicate that these
customers visit more frequently and spend a little more than those from G5. It
is also possible that they do not always use the supermarket card.
Semi-loyal potentially big spenders (G4). In terms of maximum amount spent,
this group is comparable to the first group. Based on this observation alone,
one may categorize these customers as the second most loyal customers.
However, the weekly patterns indicate that for 1-2 weeks these customers tended
to stay away from the store. The supermarket may not be attracting a significant
portion of purchases from these customers. More incentives to increase
patronage from these customers may be worthwhile.
Infrequent customers (G5). Customers from this group are the least loyal to the
store among all the groups. They seem to have visited the store only once or
twice during the 24 weeks. The spending levels were very limited as well. It
is also possible that some of these customers do not use the supermarket card
on a regular basis.

5.2. Non-temporal clustering analysis using modified Kohonen SOM based on the properties of rough sets

Fig. 5 shows the visit and spending patterns for the lower bounds, upper
bounds, and boundary regions of the five clusters obtained using the modified
Kohonen SOM. The patterns for the five clusters are similar to the ones
obtained from the conventional Kohonen SOM. However, there are subtle
differences. For example, the visit patterns for the lower bounds seem to
separate loyal groups G1 and G2 from semi-loyal groups G3 and G4 more clearly
than the conventional clustering. On the other hand, the distinctions for upper
bounds are less clear, and the boundary regions seem to have the least
distinction between loyal and semi-loyal customers. Fig. 6 shows a comparison
of sizes of conventional clusters and the corresponding interval
representations. The sizes of conventional clusters are in the range between
lower and upper bounds. This fact lends further credibility to the
appropriateness of interval clustering using the modified Kohonen SOM.

Fig. 5. Non-temporal cluster analysis using modified Kohonen SOM over a four-week period:
(a) lower bound visiting behaviour; (b) lower bound spending behaviour; (c) upper bound visiting
behaviour; (d) upper bound spending behaviour; (e) visiting behaviour in the boundary region and
(f) spending behaviour in the boundary region.

5.3. Temporal clustering analysis using modified Kohonen SOM

Table 2 shows the detailed breakdown of the sizes of lower bounds, upper
bounds, and boundary regions of the five groups for six time periods. Each
period is four weeks, or approximately one month, long. These periods include
exactly the same customers. Therefore, changes in the clustering from one
period to another suggest temporal changes in customer shopping behaviour. The
variations in the clusters can be better understood by studying their temporal
patterns. In order to eliminate the differences in scale between the various
clusters, the sizes of sets were divided by the average set size. Figs. 7 and 8
show the variations in the ratios Size/AverageSize for lower bounds, upper
bounds, and the boundary region. If the ratio varies in a small range around 1,
the group sizes are relatively stable. In all three cases, sizes of group G5
(infrequent customers) were most stable, followed by G3 (semi-loyal moderate
spenders). The sizes of loyal moderate spenders (G2) were also relatively
stable. The loyal and semi-loyal big spender groups, G1 and G4, showed the most
variation in sizes. In particular, the semi-loyal big spenders fluctuated
significantly. Big spenders are important
Fig. 6. Comparisons of sizes of crisp clusters with their interval representations (lower bound,
crisp, and upper bound sizes for groups 1-5).

Table 2
Temporal cluster analysis using modified Kohonen SOM

Class  Total region size  Month
                          M-1     M-2     M-3     M-4     M-5     M-6
(a) Sizes of lower bounds during six periods
G1     3283               423     635     630     508     537     550
G2     10,605             1602    1811    1767    1811    1773    1841
G3     23,352             3753    3904    4034    3789    3751    4121
G4     5038               748     802     1075    822     740     851
G5     81,562             14,276  13,385  13,162  13,766  13,778  13,195
(b) Sizes of upper bounds during six periods
G1     4935               615     919     946     760     840     855
G2     15,045             2234    2605    2477    2548    2524    2657
G3     30,744             4899    5215    5170    5005    5033    5422
G4     6966               1015    1171    1474    1081    1069    1156
G5     85,634             14,947  14,093  13,789  14,434  14,488  13,887
(c) Sizes of boundary regions during six periods
G1     1652               192     284     316     252     303     305
G2     4440               632     794     710     737     751     816
G3     7392               1146    1311    1136    1216    1282    1301
G4     1928               267     369     399     259     329     305
G5     4072               671     708     627     668     706     692

to the supermarket. A more detailed study of buying patterns of these groups
may be helpful in increasing sales. It should perhaps be mentioned that the
sizes of boundary regions tended to fluctuate more than the lower and upper
bounds for all the groups. The patterns of sizes of boundary regions as
percentages of lower bounds also provide an indication of the stability of
group memberships. The loyal big-spender group had consistently high
percentages in the boundary regions, followed by the semi-loyal potentially big
spenders. The customers in the boundary regions of these two big spender groups
can be important targets for special marketing campaigns. The use of boundary
regions and temporal analysis of interval clusters seems to have shed
additional light on the grouping of customers, which otherwise would not have
been possible based on static crisp clustering. The cluster memberships can be
used to study the changing nature of customers' shopping behaviour. As an
example, the following section shows how studying changes in clustering can be
used to identify customer attrition.
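
The two ratios discussed in this section can be recomputed directly from Table 2. The sketch below uses the lower bound and boundary region rows for G1, G4, and G5 as printed in the table; the helper names and the use of the range (maximum minus minimum) as a stability summary are assumptions made for illustration, not part of the original analysis.

```python
# Sizes copied from Table 2 (six four-week periods, M-1 to M-6).
lower = {
    "G1": [423, 635, 630, 508, 537, 550],
    "G4": [748, 802, 1075, 822, 740, 851],
    "G5": [14276, 13385, 13162, 13766, 13778, 13195],
}
boundary = {
    "G1": [192, 284, 316, 252, 303, 305],
    "G4": [267, 369, 399, 259, 329, 305],
    "G5": [671, 708, 627, 668, 706, 692],
}


def relative_sizes(sizes):
    """Divide each period's size by the average size over all periods."""
    average = sum(sizes) / len(sizes)
    return [size / average for size in sizes]


def boundary_share(boundary_sizes, lower_sizes):
    """Boundary region size as a fraction of the lower bound, per period."""
    return [b / l for b, l in zip(boundary_sizes, lower_sizes)]


def spread(ratios):
    """Range of the ratios; a small value suggests a stable group size."""
    return max(ratios) - min(ratios)


for group in lower:
    ratios = relative_sizes(lower[group])
    shares = boundary_share(boundary[group], lower[group])
    print(group, f"spread={spread(ratios):.2f}",
          "boundary/lower=" + ", ".join(f"{s:.0%}" for s in shares))
```

On these rows the lower bound ratios for G5 stay within a range of roughly 0.08 around 1, while G1 and G4 span roughly 0.39 and 0.40, which agrees with the stability ordering described in the text.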

5.4. Temporal analysis of attrition based on cluster membership

Retaining customers is less expensive than acquiring new ones. Retailers can
learn from catalog marketers' customer retention practices [25]. This is
important for two reasons: retaining the most valuable customers and upgrading

Fig. 7. Temporal changes in interval clustering (Size/Average Size by period for G1-G5):
(a) relative changes in the lower bound; (b) relative changes in the upper bound and (c) relative
changes in the boundary region.

Fig. 8. Temporal changes in the boundary regions as percentages of lower bound (Boundary/Lower
by period for G1-G5).

less valuable customers into higher value ones [33]. Categorizing customers
across three parameters, namely recency of purchase, frequency of purchase, and
monetary value of transactions (RFM), aids in segmenting customers. The clusters
obtained in this study based on visits and spending correspond to the frequency
and monetary aspects of an RFM model. The recency component can be
included in the analysis through temporal analysis of cluster memberships. This
section shows how temporal analysis of cluster memberships can help in
identifying attrition.
Attrition is a growing problem in today's increasingly competitive supermarket
industry. It is essential for companies to identify attrition among their
customer base, so that a better understanding of the influencing factors can be
achieved. This will give companies an opportunity to minimize attrition in
the future. Data mining provides useful tools for the analysis of attrition,
including clustering, to better understand factors affecting attrition among
current customers, and prediction, to identify customers at risk before they
leave, allowing an opportunity to take preventative actions.
The conventional definition of attrition is based on sustained absence from
the store. However, reduced patronage may often be a prelude to attrition.
West et al. [36] provided an improved analysis of attrition based on cluster
memberships. After experimenting with a variety of fictitious lost and
continuing customers, West et al. proposed a cluster membership variation measure
Xn
attrition Di;i1 K
k i1 k i1
C
i1

where n is the number of periods (in this case, n = 6) and K represents the
number of clusters (in the present study, K = 5). The value of the cluster
membership in a specific period i is represented as k_i. D_{i,i+1} is the
change in the cluster membership from period i to period i + 1. C is simply a
constant value that shifts the final position of the threshold between lost and
continuing customers. In this study, the value of C was set to 1 and the
threshold was 0. The change in cluster membership over all of the periods is
represented by attrition. If attrition is less than or equal to 0, then the
store has lost the customer.
Applying West et al.'s approach, we can identify continuing and lost customers
from our data sets. Figs. 9 and 10 show the graphs for two randomly chosen
customers from the cluster migration table. One of the customers was chosen
from the continuing group and the other from the lost group.
Fig. 9 depicts a customer who showed a sharp increase in loyalty over the
first 4 weeks, followed by stable behaviour over the rest of the periods. From
this graph, it can be concluded that this customer did not show attrition over
the periods under study. Fig. 10, on the other hand, depicts a customer who did
show attrition. Originally the customer was not loyal to the store. This was
followed by a period of moderate to high loyalty, and finally by a period in
the least loyal cluster. This behaviour would be typical of a customer who
decided to try the store for a while, beginning in Period 2, and then after a
short time chose not to stay with the store.
Figs. 11 and 12 show the average spending and visits for fifty randomly chosen
customers from each group. Fig. 12 clearly shows a declining trend for the
customers in the lost group over the course of the 24 weeks. Fig. 11, on the
other hand, shows much more stable spending and visit behaviour, which would be
typical of customers who are not reducing their shopping frequency.

Fig. 9. Temporal changes in cluster memberships for a continuing customer.

Fig. 10. Temporal changes in cluster memberships for a lost customer.

6. Future research and customer relationship management

The temporal data also helps in predicting future purchases of customers.
The best ranked customers are those who score highly on all three RFM
parameters: recency, frequency, and monetary value [33]. Catalog marketers
realized that the response rate for their mailers was always higher for the best
customers. The RFM classification, pioneered by catalog marketers, can be
replicated in other retail businesses. The combined score is a good indicator
of customer interest, which is valuable information. Linking RFM with LTV
(Life Time Value: the expected profit contribution by a customer as long as
he/she remains a patron) helps retailers track the trend of customer behaviour
over a period of time, that is, the percentage of customers over or under
pre-specified activity levels for each parameter [33]. While temporal study of
conventional clustering helped us identify attrition of customers, interval set
representations of customers can provide us with advance warning of potential
transition from a more desirable cluster to a less desirable cluster. The data
used in this study, which spans 24 weeks, revealed many aspects of temporal
analysis of cluster memberships. However, this data is not sufficient for
predictions based on historical values of cluster memberships. Further research
will involve collection of data from subsequent time periods to construct and
test models for predicting customer behaviour based on interval and conventional
representations of clusters. A number of recent studies can be very useful for
quantifying the changes [7,8,36]. West et al.'s [36] attrition detection method
can be extended to the interval set representation of clusters. Similarly, the
deviation measure proposed by Ganti et al. [7] can be used for more generic
quantification of changes. It will be interesting to study how the deviation
measure enables us

Fig. 11. Average spending and visits patterns for 50 continuing customers (weekly visits and
weekly spending over weeks 1-24).

to alert the marketing manager of potential changes in an interval set based
clustering. The state transition probability matrix proposed by Ha et al. [8]
is of particular interest. The transition probabilities can be enhanced by the
use of intervals of clusters. Customer relationship management implications of
the changes in interval set clustering will be a major future research direction.
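
As a rough illustration of the transition probability idea, the sketch below estimates a transition matrix from per-period crisp cluster labels. It is a generic frequency estimate and not necessarily the procedure of Ha et al. [8]; the function name and the two membership sequences are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def transition_matrix(memberships, clusters):
    """Estimate P[a][b], the share of customers in cluster a in one period
    that are found in cluster b in the next period, pooled over periods."""
    counts = Counter()
    for sequence in memberships:
        # Pair each period's label with the label of the following period.
        for current, following in zip(sequence, sequence[1:]):
            counts[(current, following)] += 1
    matrix = {}
    for a in clusters:
        row_total = sum(counts[(a, b)] for b in clusters)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in clusters
        }
    return matrix


clusters = ["G1", "G2", "G3", "G4", "G5"]
# Hypothetical membership sequences over six periods for two customers.
memberships = [
    ["G5", "G3", "G3", "G3", "G3", "G3"],
    ["G5", "G2", "G1", "G2", "G5", "G5"],
]
P = transition_matrix(memberships, clusters)
```

Rows for clusters that never occur as a starting state are left at zero rather than guessed. Extending the estimate to interval sets would require a rule for customers in boundary regions, which is left open here.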

7. Conclusion

This work demonstrates the need for non-conventional clustering techniques
to categorize supermarket customers based on their loyalty and spending
potential. The work presents the applications of the conventional and modified
Kohonen SOMs for non-temporal clustering. The comparison between the

Fig. 12. Average spending and visits patterns for 50 lost customers (weekly visits and weekly
spending over weeks 1-24).

interval set clusters and the crisp clusters shows that the interval set
approach yields clusters that are similar to the conventional crisp clusters.
The interval clusters can provide more information through the boundary regions
of customers with ambivalent behaviour. The interval clustering was applied to
six sequential periods. Each period was four weeks, or approximately one month,
long. The temporal clustering analysis of customers discovered some interesting
patterns. For example, the two big spender groups tended to fluctuate more
than the other groups. Boundary regions fluctuated more than the lower and
upper bounds. The big spender groups also had consistently higher percentages
of customers in boundary regions. The paper also reported results from
identification of attrition based on temporal analysis of cluster memberships.
The analysis clearly indicates that further investigations into detailed
temporal buying patterns of the big spenders in general, and boundary regions of
these groups in particular, may be helpful in designing special promotional
campaigns for increasing sales. The results of such investigations, including studies

of individual customer migrations among clusters, will appear in future
publications.

References

[1] C.C. Aggarwal, A framework for change diagnosis of data streams, in: Proceedings of ACM SIGMOD Conference, 2003, pp. 575-586.
[2] C. Antunes, A. Oliveira, Temporal data mining: an overview, in: Proceedings of KDD 2001 Workshop on Temporal Data Mining, http://www.acm.org/sigkdd/kdd2001/Workshops/ano.pdf, 2001.
[3] I.V. Cadez, D. Heckerman, C. Meek, P. Smyth, S. White, Model-based clustering and visualization of navigation patterns on a Web site, Journal of Data Mining and Knowledge Discovery 7 (4) (2003). http://www.datalab.uci.edu/papers/webcanvas.pdf.
[4] J. Clifford, A. Tuzhilin, Recent Advances in Temporal Databases, Springer-Verlag, Berlin, 1995.
[5] H.A. do Prado, P.M. Engel, H.C. Filho, Rough clustering: an alternative to finding meaningful clusters by using the reducts from a dataset, in: J. Alpigini, J. Peters, A. Skowron, N. Zhong (Eds.), Rough Sets and Current Trends in Computing (RSCTC'02), Lecture Notes in Artificial Intelligence, 2475, Springer-Verlag, Berlin, 2002.
[6] G. Dong, J. Han, L.V.S. Lakshmanan, J. Pei, H. Wang, P.S. Yu, Online mining of changes from data streams: research problems and preliminary results, in: Proceedings of the 2003 ACM SIGMOD Workshop on Management and Processing of Data Streams, 2003.
[7] V. Ganti, J. Gehrke, R. Ramakrishnan, A framework for measuring changes in data characteristics, Journal of Computer and System Sciences 64 (2002) 542-578.
[8] S.-H. Ha, S.-M. Bae, S.-C. Park, Customer's time-variant purchase behavior and corresponding marketing strategies: an online retailer's case, Computer and Industrial Engineering 43 (4) (2002) 801-820.
[9] S. Hirano, X. Sun, S. Tsumoto, Comparison of clustering methods for clinical databases, Information Sciences 159 (3-4) (2004) 155-165.
[10] S. Hirano, S. Tsumoto, Rough clustering and its application to medicine, Information Sciences 124 (2002) 125-137.
[11] M. Hogo, M. Snorek, P. Lingras, Temporal web usage mining, in: Proceedings of 2003 IEEE/WIC International Conference on Web Intelligence, 2003, pp. 450-453.
[12] H. Jin, W. Shum, K. Leung, M. Wong, Expanding self-organizing map for data visualization and cluster analysis, Information Sciences 163 (1-3) (2004) 157-173.
[13] T. Joachims, R. Armstrong, D. Freitag, T. Mitchell, Webwatcher: a learning apprentice for the world wide web, in: Proceedings of AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, 1995.
[14] A. Joshi, R. Krishnapuram, Robust fuzzy clustering methods to support web mining, in: Proceedings of the Workshop on Data Mining and Knowledge Discovery, SIGMOD'98, 1998, pp. 15/1-15/8.
[15] D. Kifer, S. Ben-David, J. Gehrke, Detecting change in data streams, in: Proceedings of the 30th VLDB Conference, 2004, pp. 180-191.
[16] T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1988.
[17] M. Koprulu, N. Cicekli, A. Yazici, Spatio-temporal querying in video databases, Information Sciences 160 (1-4) (2004) 131-152.
[18] P. Lingras, Unsupervised rough set classification using GAs, Journal of Intelligent Information Systems 16 (3) (2001) 215-228.
[19] P. Lingras, Rough set clustering for web mining, in: Proceedings of 2002 IEEE International Conference on Fuzzy Systems, 2002.
[20] P. Lingras, G. Adams, Selection of time-series for clustering supermarket customers, Technical Report 2002_006, Department of Mathematics and Computing Science, Saint Mary's University, Halifax, NS, Canada, 2002.
[21] P. Lingras, M. Hogo, M. Snorek, Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets, Web Intelligence and Agent Systems: An International Journal 2 (3) 217-225.
[22] P. Lingras, M. Hogo, M. Snorek, B. Leonard, Clustering supermarket customers using rough set based Kohonen networks, in: Proceedings of Fourteenth International Symposium on Methodologies for Intelligent Systems, Notes in Artificial Intelligence Series, 2871, Springer, Berlin, 2003, pp. 169-173.
[23] P. Lingras, C. West, Interval set clustering of Web users with rough K-means, Journal of Intelligent Information Systems 23 (1) (2004) 5-16.
[24] P. Lingras, L. Young, Multi-criteria time-series based clustering of supermarket customers using Kohonen networks, in: Proceedings of the 2001 International Conference on Artificial Intelligence (IC-AI2001), vol. I, 2001, pp. 158-164.
[25] J. Novo, Drilling Down: Turning Customer Data into Profits with a Spreadsheet, Booklocker.com Inc, Saint Petersburg, FL, 2004.
[26] Z. Pawlak, Rough sets, International Journal of Information and Computer Science 11 (1982) 145-172.
[27] Z. Pawlak, Rough classification, International Journal of Man-Machine Studies 20 (1984) 469-483.
[28] W. Pedrycz, A. Gacek, Temporal granulation and its application to signal analysis, Information Sciences 143 (1-4) (2002) 47-71.
[29] L. Rodríguez, H. Ogata, Y. Yano, TVOO: A temporal versioned object-oriented data model, Information Sciences 114 (1-4) (1999) 281-300.
[30] M. Shyu, C. Haruechaiyasak, S. Chen, Category cluster discovery from distributed WWW directories, Information Sciences 155 (3-4) (2003) 181-197.
[31] A. Skowron, J. Stepaniuk, Information granules in distributed environment, in: N. Zhong, A. Skowron, S. Ohsuga (Eds.), New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, Lecture Notes in Artificial Intelligence, 1711, Springer-Verlag, Berlin, 1999, pp. 357-365.
[32] H.S. Song, J.K. Kim, S.H. Kim, Mining the change of customer behavior in an internet shopping mall, Expert Systems with Applications 21 (3) (2001) 157-168.
[33] K. Suresh, Retailing Concepts and Cases, ICFAI University Press, Hyderabad, India, 2002.
[34] S. Tomic, S. Vrbsky, T. Camp, A new measure of temporal consistency for derived objects in real-time database systems, Information Sciences 124 (1-4) (2000) 139-152.
[35] K. Wang, S. Zhou, A.W.-C. Fu, J.X. Yu, Mining changes of classification by correspondence tracing, in: Proceedings of the 2003 SIAM International Conference on Data Mining (SDM 2003), San Francisco, CA, 2003.
[36] C. West, A. Jain, P. Lingras, B. Leonard, Supermarket customer attrition analysis based on cluster membership patterns, in: Proceedings of the First Indian International Conference on Artificial Intelligence, 2003, pp. 1132-1140.
[37] T. Yamasaki, Y. Kataoka, K. Kameyama, K. Nakano, Neural networks handling sequential patterns, Information Sciences 159 (3-4) (2004) 141-154.
[38] X. Yao, Research issues in spatio-temporal data mining, http://www.ucgis.org/Visualization/whitepapers/Yao-KDVIS2003.pdf, 2003.
[39] Y. Yao, Relational interpretations of neighborhood operators and rough set approximation operators, Information Sciences 111 (1-4) (1998) 239-259.
[40] Y. Yao, Constructive and algebraic methods of the theory of rough sets, Information Sciences 109 (1-4) (1998) 21-47.
[41] Y. Yao, A comparative study of fuzzy sets and rough sets, Information Sciences 109 (1-4) (1998) 227-242.
[42] Y.Y. Yao, X. Li, T.Y. Lin, Q. Liu, Representation and classification of rough set models, in: Proceedings of Third International Workshop on Rough Sets and Soft Computing, 1994, pp. 630-637.
