www.elsevier.com/locate/ins
Abstract
Temporal data mining is the application of data mining techniques to data that takes the time dimension into account. This paper studies changes in cluster characteristics of supermarket customers over a 24 week period. Such an analysis can be useful for formulating marketing strategies. Marketing managers may want to focus on specific groups of customers. Therefore, they may need to understand the migrations of customers from one group to another. The marketing strategies may depend on the desirability of these cluster migrations. The temporal analysis presented here is based on conventional and modified Kohonen self-organizing maps (SOM). The modified Kohonen SOM creates interval set representations of clusters using properties of rough sets. A description of an experimental design for temporal cluster migration studies, including data cleaning, data abstraction, data segmentation, and data sorting, is provided. The paper compares conventional and non-conventional (interval set) clustering techniques, as well as temporal and non-temporal analysis of customer loyalty. The interval set clustering is shown to provide an interesting dimension to such a temporal analysis.
© 2005 Elsevier Inc. All rights reserved.

* Corresponding author. Tel.: +1 902 420 5798; fax: +1 902 420 5035.
E-mail address: pawan.lingras@smu.ca (P. Lingras).
0020-0255/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.ins.2004.12.007
P. Lingras et al. / Information Sciences 172 (2005) 215–240
Keywords: Temporal data mining; Rough set theory; Modified Kohonen SOM; Loyalty
1. Introduction
self-organizing maps, based on rough set theory. Both the modified statistical and neural network based approaches used the properties of rough sets [26,27] for assigning objects to lower and upper bounds of clusters.
Hogo et al. [11] applied cluster analysis to a sliding window of a time series. It was shown to be a useful method for grouping related temporal patterns that are dispersed along the time series. The interval set clustering obtained by the modified Kohonen SOM was shown to be useful in the temporal analysis of clustering results. Hogo et al.'s analysis was applied to visitors to an academic website. Because of issues related to the identification of web visitors, such as anonymity and protection of privacy, it was not possible to keep track of web users from one visit to the next. Therefore, the analysis was necessarily general. Nevertheless, the analysis pointed out some characteristics of web visitors that would have remained hidden without the temporal analysis. This paper applies a similar temporal analysis to migrations between clusters of supermarket customers. The data used in the study was supplied by a national Canadian supermarket chain. It consisted of transactional records from a store in a rural setting, collected over a 24 week period. Customers could be tracked if they used a loyalty card. The acceptance of the card was around 80%, i.e. 80% of the transactions used the loyalty card. The study period was divided into six four-week periods. Both conventional and rough set clustering were applied to these periods, and the clustering results from the six periods were compared.
The rest of this paper is organized as follows. Section 2 reviews rough set theory, the conventional Kohonen SOM, and the modified Kohonen SOM based on the properties of rough sets. Section 3 reviews some of the existing attempts at mining changes and states the problem of temporal changes in clustering. Section 4 describes the study data and experimental design, including data cleaning, data abstraction, and data segmentation. The results and discussion are provided in Section 5. Section 6 considers customer relationship management and directions for future research. Conclusions of this work are presented in Section 7.
Pawlak proposed the notion of rough set theory [26,27]. This section provides a brief summary of the concepts from rough set theory essential for introducing the Kohonen rough set theoretic algorithm. Let $U$ denote the universe (a finite ordinary set), and let $R \subseteq U \times U$ be an equivalence (indiscernibility) relation on $U$. The pair $A = (U, R)$ is called an approximation space. The equivalence relation $R$ partitions the set $U$ into disjoint subsets. Such a partition of
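$$\underline{A}(X \cup Y) \supseteq \underline{A}(X) \cup \underline{A}(Y) \qquad \text{(L2)}$$
$$\underline{A}(X \cap Y) = \underline{A}(X) \cap \underline{A}(Y) \qquad \text{(L3)}$$
$$\overline{A}(X \cup Y) = \overline{A}(X) \cup \overline{A}(Y) \qquad \text{(L4)}$$
$$\underline{A}(\neg X) = \neg \overline{A}(X), \quad \overline{A}(\neg X) = \neg \underline{A}(X) \qquad \text{(L5)}$$
$$X \subseteq Y \Rightarrow \underline{A}(X) \subseteq \underline{A}(Y), \quad \overline{A}(X) \subseteq \overline{A}(Y) \qquad \text{(L6)}$$
$$\underline{A}(U) = \overline{A}(U) = U \qquad \text{(L7)}$$
$$\underline{A}(\emptyset) = \overline{A}(\emptyset) = \emptyset \qquad \text{(L8)}$$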
[Figure: rough set approximation of a set, showing an equivalence class, the lower approximation, the actual set, and the upper approximation.]
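To make the lower and upper approximations concrete, the following Python sketch computes them for a toy partition of $U$. The function name `approximations` and the example universe are hypothetical illustrations, not part of the original paper; the sketch simply applies the usual definitions (an equivalence class enters the lower approximation if it lies inside $X$, and the upper approximation if it overlaps $X$).

```python
from typing import Hashable, Iterable, Set, Tuple


def approximations(
    partition: Iterable[Set[Hashable]], target: Set[Hashable]
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Return the (lower, upper) approximations of `target`.

    `partition` is the collection of equivalence classes induced by R on U.
    """
    lower: Set[Hashable] = set()
    upper: Set[Hashable] = set()
    for eq_class in partition:
        if eq_class <= target:   # class lies entirely inside X
            lower |= eq_class
        if eq_class & target:    # class overlaps X
            upper |= eq_class
    return lower, upper


# Hypothetical universe U = {1, ..., 8} split into four equivalence classes.
U = set(range(1, 9))
partition = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]
X = {1, 2, 3}
lower, upper = approximations(partition, X)
print(sorted(lower), sorted(upper))  # [1, 2] [1, 2, 3, 4]
```

Here the class {3, 4} straddles $X$, so it enters only the upper approximation; it forms the boundary region. The same function can be used to spot-check properties such as (L5), (L7), and (L8) on small examples.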
This section briefly describes the conventional Kohonen SOM [16]. The architecture is illustrated in Fig. 2. Unsupervised learning using the Kohonen rule [16] uses a competitive learning approach. In competitive learning, the output neurons compete with each other. The winning output neuron has an output of one, and the rest of the output neurons have outputs of zero. Competitive learning is suitable for classifying a given pattern into exactly one of several mutually exclusive clusters. The network is used to group patterns represented by m-dimensional vectors into k groups. The network consists of two layers. The first layer is called the input layer and the second layer is called the Kohonen layer. The network receives the input vector for a given pattern. If the pattern belongs to the ith group, then the ith neuron in the Kohonen layer has an output value of one and the other Kohonen layer neurons have output values of zero. Each connection is assigned a weight $w_{ij}$. The weights of all the connections to a Kohonen layer neuron make up an m-dimensional weight vector $w_i$. The weight vector $w_i$ for a Kohonen layer neuron is the vector representation of the group corresponding to that neuron. For any input vector $v$, the network compares the input with the weight vector for a group using a measure such as $d(w_i, v)$:

$$d(w_i, v) = \frac{\sum_{j=1}^{m} (w_{ij} - v_j)^2}{m} \qquad (1)$$
The pattern $v$ belongs to the group with the minimum value of $d(w_i, v)$. The Kohonen neural network generates the clusters through a learning process as follows. Initially, the network connections are assigned somewhat arbitrary weights. The training set of input vectors is presented to the network several times. In each iteration, the weight vector $w_i$ of the group that is closest to the pattern $v$ is modified using the equation

$$w_i^{new} = w_i^{old} + \alpha(t)\,(v - w_i^{old}) \qquad (2)$$

where $\alpha(t)$ is a learning factor that starts with a high value at the beginning of the training process and is gradually reduced as a function of time.

[Fig. 2. Conventional Kohonen SOM architecture: an input layer connected to an output (Kohonen) layer; the winning output neuron outputs 1 and the others output 0.]
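The competitive learning procedure of Eqs. (1) and (2) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the caller supplies the initial weights, no neighbourhood function is used, and the schedule $\alpha(t) = \alpha_0/(1+t)$ is an assumed choice of a learning factor that decreases with time. The names `train_kohonen` and `assign` are hypothetical.

```python
from typing import List

Vector = List[float]


def distance(w: Vector, v: Vector) -> float:
    """Eq. (1): mean squared difference between weight vector w and input v."""
    return sum((wj - vj) ** 2 for wj, vj in zip(w, v)) / len(v)


def assign(weights: List[Vector], v: Vector) -> int:
    """Index of the winning Kohonen neuron (the closest weight vector)."""
    return min(range(len(weights)), key=lambda i: distance(weights[i], v))


def train_kohonen(data: List[Vector], initial: List[Vector],
                  epochs: int = 100, alpha0: float = 0.5) -> List[Vector]:
    weights = [list(w) for w in initial]   # copy; the caller's list is untouched
    for t in range(epochs):
        alpha = alpha0 / (1 + t)           # assumed decreasing learning factor
        for v in data:
            i = assign(weights, v)         # winner outputs 1, all others 0
            # Eq. (2): move only the winner towards the input pattern.
            weights[i] = [wj + alpha * (vj - wj) for wj, vj in zip(weights[i], v)]
    return weights


data = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [1.0, 0.9], [0.9, 1.0], [0.95, 0.95]]
weights = train_kohonen(data, initial=[[0.2, 0.2], [0.8, 0.8]])
print([assign(weights, v) for v in data])  # [0, 0, 0, 1, 1, 1]
```

Taking a square root of the distance would not change which neuron wins, so the choice between a squared and a root-mean-square measure does not affect the clustering in this sketch.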
(P2) $v \in \underline{A}(x_i) \Rightarrow v \in \overline{A}(x_i)$
(P3) $v$ is not part of any lower bound $\Leftrightarrow$ $v$ belongs to two or more upper bounds.
Properties (P1)–(P3) can be obtained from the properties of rough sets and the fact that $X_i \cap X_j = \emptyset$ for $i \neq j$. It is important to note that (P1), (P2), and (P3) are not necessarily independent or complete; however, enumerating them will be helpful in understanding the rough set adaptation of the Kohonen neural networks.
Incorporating rough sets into the Kohonen algorithm requires adding the concepts of lower and upper bounds to the equations used for updating the weights of the winners. The Kohonen rough set architecture is similar to the conventional Kohonen architecture. It consists of two layers, an input layer and the Kohonen rough set layer (rough set output layer). These two layers are fully connected: each input layer neuron has a feed-forward connection to each output layer neuron. Fig. 3 illustrates the Kohonen rough set neural network architecture for a one-dimensional case. A neuron in the Kohonen layer consists of two parts, a lower neuron and an upper neuron. The lower neuron has an output of 1 if an object belongs to the lower bound of the cluster. Similarly, membership in the upper bound of the cluster results in an output of 1 from the upper neuron. Since an object belonging to the lower bound of a cluster also belongs to its upper bound, when the lower neuron has an output of 1, the upper neuron also has an output of 1. However, membership in the upper bound of a cluster does not necessarily imply membership in its lower bound. Therefore, the upper neuron contains the lower neuron.
The interval clustering provides good results if the initial weights are obtained by running the conventional Kohonen learning.
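[Fig. 3. Kohonen rough set architecture: an input layer fully connected to a Kohonen rough set layer with clusters C1, C2, and C3, each consisting of a lower neuron contained in an upper neuron.]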
Step 1. Initialize the weights from the m inputs to the k output nodes using the conventional Kohonen algorithm.
Step 2. For each object vector $v$, let $d(v, x_i)$ be the distance between $v$ and the weight vector $x_i$ of cluster $X_i$.
Step 3. The next step in the modification of the Kohonen algorithm for obtaining rough sets is to design criteria that determine whether an object belongs to the upper or lower bounds of a cluster. Let $d(v, x_i) = \min_{1 \le j \le k} d(v, x_j)$. The ratio $d(v, x_i)/d(v, x_j)$, $1 \le j \le k$, is used to determine the membership of $v$ as follows. Since $d(v, x_i)$ is the minimum distance, each ratio is at most 1, and a ratio close to 1 means that $v$ is almost as close to $x_j$ as to $x_i$.
Let $T = \{j : d(v, x_i)/d(v, x_j) \ge \text{threshold},\ i \ne j\}$.
If $T \ne \emptyset$, then $v \in \overline{A}(x_i)$ and $v \in \overline{A}(x_j)$ for all $j \in T$. Furthermore, $v$ is not part of any lower bound. The above criterion guarantees that property (P3) is satisfied. The weight vectors $x_i$ and $x_j$ are modified as:

$$x_i^{new} = x_i^{old} + \alpha_{upper}(t)\,(v - x_i^{old})$$
$$x_j^{new} = x_j^{old} + \alpha_{upper}(t)\,(v - x_j^{old})$$
It can be easily verified that the above algorithm preserves properties (P1)–(P3).
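Steps 2 and 3 can be sketched for a single object as follows. This is a hypothetical Python illustration, not the authors' code. It uses the distance of Eq. (1), takes the parameter values reported later in the paper (threshold 0.7, $\alpha_{lower} = 0.01$, $\alpha_{upper} = 0.005$) as defaults, and reads the membership test as ratio ≥ threshold, because $d(v, x_i)$ is the smallest distance and every ratio is therefore at most 1; a tie at zero distance is treated as a ratio of 1. The branch for $T = \emptyset$ (v joins the lower bound, and hence the upper bound, of the nearest cluster, whose weights are updated with $\alpha_{lower}$) is an assumption: it is not spelled out in the steps above, but it is the case implied by (P1)–(P3) and by the use of $\alpha_{lower}$.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def distance(x: Vector, v: Vector) -> float:
    """Eq. (1): mean squared difference between weight vector x and object v."""
    return sum((xj - vj) ** 2 for xj, vj in zip(x, v)) / len(v)


def move(x: Vector, v: Vector, alpha: float) -> Vector:
    """x_new = x_old + alpha * (v - x_old)."""
    return [xj + alpha * (vj - xj) for xj, vj in zip(x, v)]


def rough_step(weights: List[Vector], v: Vector, threshold: float = 0.7,
               alpha_lower: float = 0.01, alpha_upper: float = 0.005
               ) -> Tuple[Optional[int], List[int]]:
    """Assign v to cluster bounds and update `weights` in place.

    Returns (index of the lower bound containing v, or None; upper-bound indices).
    """
    d = [distance(x, v) for x in weights]
    i = min(range(len(weights)), key=lambda j: d[j])   # nearest cluster

    def ratio(j: int) -> float:
        # d[i] <= d[j], so the ratio is at most 1; d[j] == 0 means a tie.
        return 1.0 if d[j] == 0 else d[i] / d[j]

    T = [j for j in range(len(weights)) if j != i and ratio(j) >= threshold]
    if T:
        # v is in the upper bounds of x_i and of every x_j in T, and in no lower bound.
        for j in [i] + T:
            weights[j] = move(weights[j], v, alpha_upper)
        return None, [i] + T
    # Assumed branch (T empty): v is in the lower and upper bound of x_i only.
    weights[i] = move(weights[i], v, alpha_lower)
    return i, [i]


w = [[0.0, 0.0], [1.0, 1.0]]
print(rough_step(w, [0.5, 0.5]))  # (None, [0, 1]): equidistant, boundary object
print(rough_step(w, [0.1, 0.1]))  # (0, [0]): clearly closer to the first cluster
```

A full run would initialize `weights` with the conventional Kohonen algorithm (Step 1) and call `rough_step` for every object over many iterations.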
Kohonen's learning law with a fixed learning rate does not converge. Convergence requires the sum of the infinite sequence of learning rates to be infinite, while the sum of the squared learning rates must be finite [16, p. 34]. Convergence to a local optimum can be obtained as the training time goes to infinity if the learning rate is reduced in a suitable manner, as described above [16]. Similar comments can be made regarding the proposed modification of the Kohonen algorithm. However, applications of the Kohonen algorithm and of the proposed modification have shown that reasonable, stable clustering is obtained after approximately 100 iterations.
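Written out, the two learning-rate conditions cited from [16] are given below. As a worked example (not taken from the paper), the schedule $\alpha(t) = \alpha_0/t$ with $\alpha_0 > 0$ satisfies both, because the harmonic series diverges while the series of $1/t^2$ converges to $\pi^2/6$; a fixed rate $\alpha(t) = \alpha_0$ violates the second condition.

$$\sum_{t=1}^{\infty} \alpha(t) = \infty, \qquad \sum_{t=1}^{\infty} \alpha(t)^2 < \infty$$
$$\alpha(t) = \frac{\alpha_0}{t}: \quad \sum_{t=1}^{\infty} \frac{\alpha_0}{t} = \infty, \qquad \sum_{t=1}^{\infty} \frac{\alpha_0^2}{t^2} = \frac{\alpha_0^2 \pi^2}{6} < \infty$$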
While earlier temporal data mining used time-series data for traditional data
mining tasks, there is an increasing interest in studying changes in the data and
The data was collected from a rural area supermarket, which is part of a
national Canadian chain of stores. The data collection spanned a 24 weeks
There were no missing values in the data obtained from the supermarket. The objective of the study was to find an interval set clustering based on the customers' spending potential and loyalty. Customer visit patterns were used as an indication of loyalty, and spending patterns were used as an indication of spending potential. Commercial customers such as restaurants and small stores tend to buy large volumes of products, and their inclusion with regular households can skew the analysis. Stores also maintain a stack of cards that are used by the cashiers to help regular customers (who may have forgotten their cards) take advantage of special offers. Again, inclusion of such cards with the regular households is not advisable. It was therefore decided to include only those customers who visited six or fewer times per week and spent $1000 or less per week. This helped eliminate commercial customers and store cards used for multiple households.
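A minimal Python sketch of this cleaning rule follows. It assumes, since the paper does not say, that the weekly limits must hold in every week of the study for a card to be kept, and that weekly data are already available as (visits, spending) pairs; the function `clean` and the sample card IDs are hypothetical.

```python
from typing import Dict, List, Tuple

Week = Tuple[int, float]  # (number of visits, spending in dollars) for one week

MAX_VISITS_PER_WEEK = 6
MAX_SPENDING_PER_WEEK = 1000.0


def clean(cards: Dict[str, List[Week]]) -> Dict[str, List[Week]]:
    """Keep only cards that never exceed the weekly visit and spending limits."""
    return {
        card: weeks
        for card, weeks in cards.items()
        if all(visits <= MAX_VISITS_PER_WEEK and spending <= MAX_SPENDING_PER_WEEK
               for visits, spending in weeks)
    }


sample = {
    "household": [(2, 85.0), (1, 40.0)],
    "restaurant": [(9, 2400.0), (8, 1900.0)],  # commercial customer
    "store_card": [(14, 600.0), (12, 700.0)],  # shared cashier card
}
print(sorted(clean(sample)))  # ['household']
```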
The process of data abstraction depends on the goal of the study and the contents of the data itself. The available data consisted of transactions for all the items sold by a single store. As mentioned before, one of the goals of the study was to cluster the customers using visit patterns (loyalty) and spending patterns (spending potential). Data summarization consisted of creating, from the transaction records, one record per household containing the number of visits and the amount of spending per week. The abstract representation of a customer was a vector consisting of the customer ID, which is the house number, followed by 52 fields: 26 weekly visits and 26 weekly spending amounts. These records were further segmented as described in the following section.

Table 1
Transactional data set
Size of input transactions | Number of customers | Number of customers after cleaning | Number of four-week records after cleaning
3,691,611 | 22,448 | 22,240 | 133,440
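The summarization step can be sketched in Python as below. The transaction layout (customer ID, week index 0–25, amount) and the rule that each transaction counts as one visit are assumptions made for illustration; the function name `abstract_records` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

N_WEEKS = 26
Transaction = Tuple[str, int, float]  # (customer ID, week index 0..25, amount in $)


def abstract_records(transactions: Iterable[Transaction],
                     n_weeks: int = N_WEEKS) -> Dict[str, List[float]]:
    """One record per customer: n_weeks visit counts, then n_weeks spending totals."""
    visits: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_weeks)
    spending: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_weeks)
    for customer, week, amount in transactions:
        visits[customer][week] += 1        # assumption: one transaction = one visit
        spending[customer][week] += amount
    return {c: visits[c] + spending[c] for c in visits}


tx = [("A", 0, 30.0), ("A", 0, 12.5), ("A", 3, 50.0), ("B", 25, 20.0)]
records = abstract_records(tx)
print(len(records["A"]), records["A"][0], records["A"][26])  # 52 2.0 42.5
```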
The primary goal of this study was to find the temporal migrations of customers between clusters. In order to study the temporal changes, the abstracted data records were segmented based on time units. The time unit was chosen to be approximately a month, that is, four weeks. This is based on an analysis of product purchases suggesting that a typical customer buys all the necessary household products over the span of one month. Given 26 weeks, this segmentation resulted in six segments of four weeks each; the last two weeks of data were discarded. The data for each month was stored in a separate data file. Each data file consisted of a four-week representation of customer loyalty and spending potential. Each record in the files consisted of the customer ID followed by an eight-element time-series vector: four weekly visits and four weekly spending amounts.
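Assuming the 52-field layout described in the previous section (26 weekly visits followed by 26 weekly spending amounts), the segmentation into six four-week periods could look like the following Python sketch; `segment` is a hypothetical name.

```python
from typing import List

N_WEEKS = 26
WEEKS_PER_SEGMENT = 4
N_SEGMENTS = 6  # 6 * 4 = 24 weeks are used; the last two weeks are discarded


def segment(record: List[float]) -> List[List[float]]:
    """Split one 52-field record into six 8-element vectors (4 visits, 4 spending)."""
    visits, spending = record[:N_WEEKS], record[N_WEEKS:]
    segments = []
    for s in range(N_SEGMENTS):
        start, end = s * WEEKS_PER_SEGMENT, (s + 1) * WEEKS_PER_SEGMENT
        segments.append(visits[start:end] + spending[start:end])
    return segments


record = [float(w) for w in range(N_WEEKS)] + [100.0 + w for w in range(N_WEEKS)]
periods = segment(record)
print(len(periods), periods[-1])  # 6 [20.0, 21.0, 22.0, 23.0, 120.0, 121.0, 122.0, 123.0]
```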
The last phase in the data preparation consisted of sorting the time-series values for each data segment in ascending order. As mentioned before, a customer usually buys most of the necessary products in a given month. However, the actual timing of purchases may vary from customer to customer. Customers with similar profiles may spend different amounts in a given week, but if the values are sorted, the differences between these customers may vanish. For example, a customer's spending over four weeks may be $100, $70, $50, and $40, while another customer may spend $40, $70, $100, and $50 in those four weeks. If the two time series were compared directly, the two customers might seem to have completely different profiles. If the time-series values are sorted, the two customers have identical patterns. Therefore, both the visit patterns and the spending patterns were sorted [20,21].
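The effect of sorting on the example above can be checked with a short Python sketch. It assumes the eight-element segment layout (four visits, then four spending amounts) and sorts the two halves separately; the visit counts are invented for illustration, and `sort_segment` is a hypothetical name.

```python
from typing import List


def sort_segment(seg: List[float]) -> List[float]:
    """Sort the visit half and the spending half of a segment independently."""
    half = len(seg) // 2
    return sorted(seg[:half]) + sorted(seg[half:])


# Spending values from the example in the text; visit counts are made up.
first = [2, 1, 1, 1, 100, 70, 50, 40]
second = [1, 1, 2, 1, 40, 70, 100, 50]
print(sort_segment(first))                          # [1, 1, 1, 2, 40, 50, 70, 100]
print(sort_segment(first) == sort_segment(second))  # True
```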
tomers. The previous studies had labeled these categories as: G1 for loyal big
spenders, G2 for loyal moderate spenders, G3 for semi-loyal moderate spend-
ers, G4 for semi-loyal potentially big spenders, and G5 for infrequent
customers.
The modified Kohonen SOM consisted of eight input neurons corresponding to the four weekly visits and the four weekly spending amounts. The average spending per visit was $25. With raw values, this 25:1 ratio would have weighted the clustering far more heavily toward spending than toward visits. Lingras and Adams [20] experimented with various weighting schemes. Based on their experience, the values were scaled to make spending only twice as important as visits (as opposed to 25 times more important with raw values). In other words, the spending amounts were normalized to values between 0 and 2, while visits were normalized to values between 0 and 1. The rough set output layer consisted of five rough set neurons (each with its lower and upper component) corresponding to the five categories described earlier. After experimenting with a range of values, the threshold was set at 0.7, $\alpha_{lower}(t)$ was chosen to be 0.01, and 0.005 was used as the value of $\alpha_{upper}(t)$.
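One way to realize this scaling is sketched below in Python. The paper gives only the target ranges; dividing by the largest observed value of each kind across all segments is an assumed normalization, and `scale` is a hypothetical name.

```python
from typing import List

VISIT_RANGE = 1.0     # visits are mapped into [0, 1]
SPENDING_RANGE = 2.0  # spending is mapped into [0, 2]: twice the weight of visits


def scale(segments: List[List[float]]) -> List[List[float]]:
    """Scale 8-element segments (4 visits, 4 spending) by the observed maxima."""
    max_visits = max(max(s[:4]) for s in segments) or 1.0    # avoid division by 0
    max_spending = max(max(s[4:]) for s in segments) or 1.0
    return [
        [x / max_visits * VISIT_RANGE for x in s[:4]]
        + [x / max_spending * SPENDING_RANGE for x in s[4:]]
        for s in segments
    ]


raw = [[1, 1, 2, 3, 25.0, 40.0, 60.0, 100.0],
       [0, 1, 1, 6, 10.0, 20.0, 50.0, 200.0]]
scaled = scale(raw)
print(scaled[1][3], scaled[1][7], scaled[0][7])  # 1.0 2.0 1.0
```

Other normalizations (for example, scaling by a high percentile to limit the influence of outliers) would fit the same target ranges; the paper does not specify which was used.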
The results obtained from this work are discussed in the following three
steps:
Fig. 4. Non-temporal cluster analysis using conventional Kohonen SOM over a four-week period: (a) non-temporal visiting behaviour and (b) non-temporal spending behaviour. [Panels: (a) weekly visits V1–V4 and (b) weekly spending S1–S4 for groups G1–G5.]
Loyal big spenders (G1). This group consists of the largest spenders. They are frequent visitors and seem to be very loyal to the store. Obviously, the store would like to encourage continued patronage from such a group.

Loyal moderate spenders (G2). Even though the maximum spending of these customers was smaller than that of G4, their spending patterns were the most stable among all the groups. The total number of visits was comparable to that of G1. These customers may be the most loyal among all the groups. They are not big spenders like the customers from G1 and G4. Additional analysis shows that these customers receive higher discounts, indicating a value-conscious nature.

Semi-loyal moderate spenders (G3). These customers are similar to those from G5. However, their spending and visits over 24 weeks indicate that these customers visit more frequently and spend a little more than those from G5. It is also possible that they do not always use the supermarket card.

Semi-loyal potentially big spenders (G4). In terms of the maximum amount spent, this group is comparable to the first group. Based on this observation alone, one may categorize these customers as the second most loyal customers. However, the weekly patterns indicate that for 12 weeks these customers tended to stay away from the store. The supermarket may not be attracting a significant portion of the purchases from these customers. More incentives to increase patronage from these customers may be worthwhile.

Infrequent customers (G5). Customers from this group are the least loyal to the store among all the groups. They seem to have visited the store only once or twice during the 24 weeks. Their spending levels were very limited as well. It is also possible that some of these customers do not use the supermarket card on a regular basis.
Fig. 5 shows the visit and spending patterns for the lower bounds, upper bounds, and boundary regions of the five clusters obtained using the modified Kohonen SOM. The patterns for the five clusters are similar to the ones obtained from the conventional Kohonen SOM. However, there are subtle differences. For example, the visit patterns for the lower bounds seem to separate the loyal groups G1 and G2 from the semi-loyal groups G3 and G4 more clearly than the conventional clustering does. On the other hand, the distinctions for upper bounds are less clear, and the boundary regions seem to have the least distinction between loyal and semi-loyal customers. Fig. 6 shows a comparison of the sizes of conventional clusters and the corresponding interval representations. The sizes of conventional clusters lie between those of the lower and upper bounds. This fact lends further credibility to the appropriateness of interval clustering using the modified Kohonen SOM.

Fig. 5. Non-temporal cluster analysis using modified Kohonen SOM over a four-week period: (a) lower bound visiting behavior; (b) lower bound spending behavior; (c) upper bound visiting behavior; (d) upper bound spending behavior; (e) visiting behavior in the boundary (Bnd) region and (f) spending behavior in the boundary region. [Panels plot visits V1–V4 and spending S1–S4 for groups G1–G5.]
Table 2 shows the detailed breakdown of the sizes of lower bounds, upper bounds, and boundary regions of the five groups for six time periods. Each period is four weeks, or approximately one month, long. These periods include exactly the same customers. Therefore, changes in the clustering from one period to another suggest temporal changes in customer shopping behaviour. The variations in the clusters can be better understood by studying their temporal patterns. To eliminate the differences in scale between the clusters, the sizes of the sets were divided by the average set size. Figs. 7 and 8 show the variations in the ratio Size/AverageSize for lower bounds, upper bounds, and the boundary region. If the ratio varies in a small range around 1, the group sizes are relatively stable. In all three cases, the sizes of group G5 (infrequent customers) were the most stable, followed by G3 (semi-loyal moderate spenders). The sizes of the loyal moderate spenders (G2) were also relatively stable. The loyal and semi-loyal big spender groups, G1 and G4, showed the most variation in size. In particular, the semi-loyal big spenders fluctuated significantly. Big spenders are important

[Fig. 6. Sizes of the lower bounds, conventional (crisp) clusters, and upper bounds for groups 1–5.]
Table 2
Temporal cluster analysis using modified Kohonen SOM
Class | Total region size | M-1 | M-2 | M-3 | M-4 | M-5 | M-6
(a) Sizes of lower bounds during six periods
G1 | 3283 | 423 | 635 | 630 | 508 | 537 | 550
G2 | 10,605 | 1602 | 1811 | 1767 | 1811 | 1773 | 1841
G3 | 23,352 | 3753 | 3904 | 4034 | 3789 | 3751 | 4121
G4 | 5038 | 748 | 802 | 1075 | 822 | 740 | 851
G5 | 81,562 | 14,276 | 13,385 | 13,162 | 13,766 | 13,778 | 13,195
(b) Sizes of upper bounds during six periods
G1 | 4935 | 615 | 919 | 946 | 760 | 840 | 855
G2 | 15,045 | 2234 | 2605 | 2477 | 2548 | 2524 | 2657
G3 | 30,744 | 4899 | 5215 | 5170 | 5005 | 5033 | 5422
G4 | 6966 | 1015 | 1171 | 1474 | 1081 | 1069 | 1156
G5 | 85,634 | 14,947 | 14,093 | 13,789 | 14,434 | 14,488 | 13,887
(c) Sizes of boundary regions during six periods
G1 | 1652 | 192 | 284 | 316 | 252 | 303 | 305
G2 | 4440 | 632 | 794 | 710 | 737 | 751 | 816
G3 | 7392 | 1146 | 1311 | 1136 | 1216 | 1282 | 1301
G4 | 1928 | 267 | 369 | 399 | 259 | 329 | 305
G5 | 4072 | 671 | 708 | 627 | 668 | 706 | 692
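The Size/AverageSize ratios discussed above, and the boundary-to-lower-bound percentages, can be recomputed from Table 2. The Python sketch below uses the lower-bound and boundary-region rows of the table; summarizing stability by the range (max minus min) of the ratios is an assumed statistic, and the helper names are hypothetical.

```python
from typing import Dict, List

# Table 2(a): lower-bound sizes for months M-1 .. M-6.
LOWER: Dict[str, List[int]] = {
    "G1": [423, 635, 630, 508, 537, 550],
    "G2": [1602, 1811, 1767, 1811, 1773, 1841],
    "G3": [3753, 3904, 4034, 3789, 3751, 4121],
    "G4": [748, 802, 1075, 822, 740, 851],
    "G5": [14276, 13385, 13162, 13766, 13778, 13195],
}
# Table 2(c): boundary-region sizes for months M-1 .. M-6.
BOUNDARY: Dict[str, List[int]] = {
    "G1": [192, 284, 316, 252, 303, 305],
    "G2": [632, 794, 710, 737, 751, 816],
    "G3": [1146, 1311, 1136, 1216, 1282, 1301],
    "G4": [267, 369, 399, 259, 329, 305],
    "G5": [671, 708, 627, 668, 706, 692],
}


def relative_sizes(sizes: List[int]) -> List[float]:
    """Size / AverageSize for each period."""
    average = sum(sizes) / len(sizes)
    return [s / average for s in sizes]


def spread(sizes: List[int]) -> float:
    """Range of the relative sizes; a small value means a stable group."""
    ratios = relative_sizes(sizes)
    return max(ratios) - min(ratios)


def boundary_percent(group: str) -> List[float]:
    """Boundary region size as a percentage of the lower bound size."""
    return [100 * b / low for b, low in zip(BOUNDARY[group], LOWER[group])]


for group in LOWER:
    print(group, round(spread(LOWER[group]), 3),
          [round(p, 1) for p in boundary_percent(group)])
```

Ordering the groups by the spread of their lower-bound ratios gives G5, G3, G2, G1, G4, which matches the stability ranking described in the text.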
Fig. 7. Temporal changes in interval clustering: (a) relative changes in the lower bound; (b) relative changes in the upper bound and (c) relative changes in the boundary region. [Panels plot Size/Average Size (0.6–1.3) against periods 1–6 for groups G1–G5.]

[Figure: ratio of boundary region size to lower bound size (Boundary/Lower, 0–60%) plotted against periods 1–6 for groups G1–G5.]

Retaining existing customers is less expensive than acquiring new ones. Retailers can learn from catalog marketers' customer retention practices [25]. This is important for two reasons: retaining the most valuable customers and upgrading less valuable customers into higher value ones [33]. Categorizing customers along three parameters, recency of purchase, frequency of purchase, and monetary value of transactions (RFM), aids in segmenting customers. The clusters obtained in this study based on visits and spending correspond to the frequency and monetary aspects of an RFM model. The recency component can be included in the analysis through temporal analysis of cluster memberships. This section shows how temporal analysis of cluster memberships can help in identifying attrition.
Attrition is a growing problem in today's increasingly competitive supermarket industry. It is essential for companies to identify attrition among their customer base, so that a better understanding of the influencing factors can be achieved. This gives companies an opportunity to minimize attrition in the future. Data mining provides useful tools for the analysis of attrition, including clustering, to better understand the factors affecting attrition among current customers, and prediction, to identify customers at risk before they leave, allowing an opportunity to take preventive actions.
The conventional definition of attrition is based on sustained absence from the store. However, reduced patronage is often a prelude to attrition. West et al. [36] provided an improved analysis of attrition based on cluster memberships. After experimenting with a variety of fictitious lost and continuing customers, West et al. proposed a cluster membership variation measure:
$$\text{attrition} = \sum_{i=1}^{n} D_{i,i-1} \cdots$$
[Figure: two panels plotting cluster membership (0 or 1) against weeks 1–6.]
Fig. 11. Average spending and visits patterns for 50 continuing customers. [Panels: average weekly visits and average weekly spending plotted against weeks.]
7. Conclusion
Fig. 12. Average spending and visits patterns for 50 lost customers. [Panels: average weekly visits and average weekly spending plotted against weeks.]
interval set clusters and the crisp clusters shows that the interval set clusters are similar to the conventional crisp clusters. The interval clusters can provide more information through the boundary regions of customers with ambivalent behaviour. The interval clustering was applied to six sequential periods. Each period was four weeks, or approximately one month, long. The temporal clustering analysis of customers discovered some interesting patterns. For example, the two big spender groups tended to fluctuate more than the other groups. Boundary regions fluctuated more than the lower and upper bounds. The big spender groups also had consistently higher percentages of customers in boundary regions. The paper also reported results from the identification of attrition based on temporal analysis of cluster memberships. The analysis indicates that further investigations into the detailed temporal buying patterns of the big spenders in general, and the boundary regions of these groups in particular, may be helpful in designing special promotional campaigns for increasing sales. The results of such investigations, including studies
References
[1] C.C. Aggarwal, A framework for change diagnosis of data streams, in: Proceedings of ACM SIGMOD Conference, 2003, pp. 575–586.
[2] C. Antunes, A. Oliveira, Temporal data mining: an overview, in: Proceedings of KDD 2001 Workshop on Temporal Data Mining, http://www.acm.org/sigkdd/kdd2001/Workshops/ano.pdf, 2001.
[3] I.V. Cadez, D. Heckerman, C. Meek, P. Smyth, S. White, Model-based clustering and visualization of navigation patterns on a Web site, Journal of Data Mining and Knowledge Discovery 7 (4) (2003). http://www.datalab.uci.edu/papers/webcanvas.pdf.
[4] J. Clifford, A. Tuzhilin, Recent Advances in Temporal Databases, Springer-Verlag, Berlin, 1995.
[5] H.A. do Prado, P.M. Engel, H.C. Filho, Rough clustering: an alternative to finding meaningful clusters by using the reducts from a dataset, in: J. Alpigini, J. Peters, A. Skowron, N. Zhong (Eds.), Rough Sets and Current Trends in Computing (RSCTC'02), Lecture Notes in Artificial Intelligence, 2475, Springer-Verlag, Berlin, 2002.
[6] G. Dong, J. Han, L.V.S. Lakshmanan, J. Pei, H. Wang, P.S. Yu, Online mining of changes from data streams: research problems and preliminary results, in: Proceedings of the 2003 ACM SIGMOD Workshop on Management and Processing of Data Streams, 2003.
[7] V. Ganti, J. Gehrke, R. Ramakrishnan, A framework for measuring changes in data characteristics, Journal of Computer and System Sciences 64 (2002) 542–578.
[8] S.-H. Ha, S.-M. Bae, S.-C. Park, Customer's time-variant purchase behavior and corresponding marketing strategies: an online retailer's case, Computer and Industrial Engineering 43 (4) (2002) 801–820.
[9] S. Hirano, X. Sun, S. Tsumoto, Comparison of clustering methods for clinical databases, Information Sciences 159 (3–4) (2004) 155–165.
[10] S. Hirano, S. Tsumoto, Rough clustering and its application to medicine, Information Sciences 124 (2002) 125–137.
[11] M. Hogo, M. Snorek, P. Lingras, Temporal web usage mining, in: Proceedings of 2003 IEEE/WIC International Conference on Web Intelligence, 2003, pp. 450–453.
[12] H. Jin, W. Shum, K. Leung, M. Wong, Expanding self-organizing map for data visualization and cluster analysis, Information Sciences 163 (1–3) (2004) 157–173.
[13] T. Joachims, R. Armstrong, D. Freitag, T. Mitchell, Webwatcher: a learning apprentice for the world wide web, in: Proceedings of AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, 1995.
[14] A. Joshi, R. Krishnapuram, Robust fuzzy clustering methods to support web mining, in: Proceedings of the Workshop on Data Mining and Knowledge Discovery, SIGMOD '98, 1998, pp. 15/1–15/8.
[15] D. Kifer, S. Ben-David, J. Gehrke, Detecting change in data streams, in: Proceedings of the 30th VLDB Conference, 2004, pp. 180–191.
[16] T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1988.
[17] M. Koprulu, N. Cicekli, A. Yazici, Spatio-temporal querying in video databases, Information Sciences 160 (1–4) (2004) 131–152.
[18] P. Lingras, Unsupervised rough set classification using GAs, Journal of Intelligent Information Systems 16 (3) (2001) 215–228.
[19] P. Lingras, Rough set clustering for web mining, in: Proceedings of 2002 IEEE International Conference on Fuzzy Systems, 2002.
[20] P. Lingras, G. Adams, Selection of time-series for clustering supermarket customers, Technical Report 2002_006, Department of Mathematics and Computing Science, Saint Mary's University, Halifax, NS, Canada, 2002.
[21] P. Lingras, M. Hogo, M. Snorek, Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets, Web Intelligence and Agent Systems: An International Journal 2 (3) 217–225.
[22] P. Lingras, M. Hogo, M. Snorek, B. Leonard, Clustering supermarket customers using rough set based Kohonen networks, in: Proceedings of Fourteenth International Symposium on Methodologies for Intelligent Systems, Notes in Artificial Intelligence Series, 2871, Springer, Berlin, 2003, pp. 169–173.
[23] P. Lingras, C. West, Interval set clustering of Web users with rough K-means, Journal of Intelligent Information Systems 23 (1) (2004) 5–16.
[24] P. Lingras, L. Young, Multi-criteria time-series based clustering of supermarket customers using Kohonen networks, in: Proceedings of the 2001 International Conference on Artificial Intelligence (IC-AI2001), vol. I, 2001, pp. 158–164.
[25] J. Novo, Drilling Down: Turning Customer Data into Profits with a Spreadsheet, Booklocker.com Inc, Saint Petersburg, FL, 2004.
[26] Z. Pawlak, Rough sets, International Journal of Information and Computer Science 11 (1982) 145–172.
[27] Z. Pawlak, Rough classification, International Journal of Man–Machine Studies 20 (1984) 469–483.
[28] W. Pedrycz, A. Gacek, Temporal granulation and its application to signal analysis, Information Sciences 143 (1–4) (2002) 47–71.
[29] L. Rodríguez, H. Ogata, Y. Yano, TVOO: A temporal versioned object-oriented data model, Information Sciences 114 (1–4) (1999) 281–300.
[30] M. Shyu, C. Haruechaiyasak, S. Chen, Category cluster discovery from distributed WWW directories, Information Sciences 155 (3–4) (2003) 181–197.
[31] A. Skowron, J. Stepaniuk, Information granules in distributed environment, in: N. Zhong, A. Skowron, S. Ohsuga (Eds.), New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, Lecture Notes in Artificial Intelligence, 1711, Springer-Verlag, Berlin, 1999, pp. 357–365.
[32] H.S. Song, J.K. Kim, S.H. Kim, Mining the change of customer behavior in an internet shopping mall, Expert Systems with Applications 21 (3) (2001) 157–168.
[33] K. Suresh, Retailing Concepts and Cases, ICFAI University Press, Hyderabad, India, 2002.
[34] S. Tomic, S. Vrbsky, T. Camp, A new measure of temporal consistency for derived objects in real-time database systems, Information Sciences 124 (1–4) (2000) 139–152.
[35] K. Wang, S. Zhou, A.W.-C. Fu, J.X. Yu, Mining changes of classification by correspondence tracing, in: Proceedings of the 2003 SIAM International Conference on Data Mining (SDM 2003), San Francisco, CA, 2003.
[36] C. West, A. Jain, P. Lingras, B. Leonard, Supermarket customer attrition analysis based on cluster membership patterns, in: Proceedings of the First Indian International Conference on Artificial Intelligence, 2003, pp. 1132–1140.
[37] T. Yamasaki, Y. Kataoka, K. Kameyama, K. Nakano, Neural networks handling sequential patterns, Information Sciences 159 (3–4) (2004) 141–154.
[38] X. Yao, Research issues in spatio-temporal data mining, http://www.ucgis.org/Visualization/whitepapers/Yao-KDVIS2003.pdf, 2003.
[39] Y. Yao, Relational interpretations of neighborhood operators and rough set approximation operators, Information Sciences 111 (1–4) (1998) 239–259.
[40] Y. Yao, Constructive and algebraic methods of the theory of rough sets, Information Sciences 109 (1–4) (1998) 21–47.
[41] Y. Yao, A comparative study of fuzzy sets and rough sets, Information Sciences 109 (1–4) (1998) 227–242.
[42] Y.Y. Yao, X. Li, T.Y. Lin, Q. Liu, Representation and classification of rough set models, in: Proceedings of Third International Workshop on Rough Sets and Soft Computing, 1994, pp. 630–637.