

Efficient Mining of Weighted Association Rules (WAR)

Wei Wang, Jiong Yang, Philip S. Yu
IBM Watson Research Center
ww1@us.ibm.com, jiyang@us.ibm.com, psyu@us.ibm.com

ABSTRACT
In this paper, we extend the traditional association rule problem by allowing a weight to be associated with each item in a transaction, to reflect the interest/intensity of the item within the transaction. This in turn provides us with an opportunity to associate a weight parameter with each item in the resulting association rule. We call it a weighted association rule (WAR). WAR not only improves the confidence of the rules, but also provides a mechanism to do more effective target marketing by identifying or segmenting customers based on their potential degree of loyalty or volume of purchases. Our approach mines WARs by first ignoring the weights and finding the frequent itemsets (via a traditional frequent itemset discovery algorithm), and then introducing the weights during rule generation. Experimental results show that our approach not only yields shorter average execution times, but also produces higher quality results than the generalization of previously known methods for quantitative association rules.

Categories and Subject Descriptors
H.2.8 [Information Systems]: Database Management - database applications

General Terms
Weighted association rules, Ordered shrinkage

1. INTRODUCTION
Association rule discovery has been an active research topic in recent years. However, traditional association rules focus on binary attributes. This model only considers whether an item is present in a transaction, but does not take into account the weight/intensity of an item within the transaction. For example, one customer may purchase 10 bottles of soda and 5 bags of snacks, while another purchases 4 bottles of soda and 1 bag of snacks. These two transactions are treated the same in the conventional association rule approach. This could lead to the loss of some vital information. Assume, for example, that if a customer buys more than 7 bottles of soda, he is likely to purchase 3 or more bags of snacks. Otherwise, this purchase tendency is not strong. The traditional association rule cannot express this type of relationship. With this knowledge, the supermarket manager may set up a promotion such as "if a customer buys 8 bottles of soda, he gets two free bags of snacks."

In this paper, we first extend the traditional association rule problem by allowing a weight to be associated with each item in a transaction, to reflect the interest/intensity of each item within the transaction. In turn, this provides us with an opportunity to associate a weight parameter with each item in a resulting association rule. We call them weighted association rules (WAR). For example, soda[4, 6] → snack[3, 5] is a weighted association rule indicating that if a customer purchases between 4 and 6 bottles of soda, he is likely to purchase 3 to 5 bags of snacks. Thus WARs not only improve the confidence of the rules, but also provide a mechanism for more effective target marketing by identifying or segmenting customers based on their potential degree of loyalty or volume of purchases.

Previous work dealing with numerical attributes includes the quantitative association rule approach and the optimized association rule approach. These approaches are not designed for weighted association rules [4]. In the problem we study in this paper, there can be a very large number of items, and every item has a numerical attribute, although only a small fraction of the items are present in any one transaction. Thus, in our approach, the frequent itemsets are first discovered (without considering weights), and then the weighted association rules for each frequent itemset are generated. Our goal is to segment the weight domain of each item in the itemset so that rules with higher confidence can be discovered. Moreover, the specified weight interval of each attribute in a weighted association rule should also coincide with the natural distribution of the data and with human intuition. In most cases, only the weight interval combinations that represent a significant number of transactions are interesting. Our goal can therefore be transformed into finding highly populated regions, and we use another metric, density, for this purpose. The weight domain space of each frequent itemset is partitioned into fine grids, and a density threshold is used to separate the regions where transactions concentrate from the rest. WARs can be identified based on these "dense" regions. The contributions of this paper are summarized as follows.
- A new class of association rule problems, WAR, is proposed.

- Due to the nature of this problem, the mining process is accomplished by a twofold approach: first generating frequent itemsets and then deriving WARs from each frequent itemset.

- During the WAR derivation process,
  - the concept of density is employed to separate transaction-concentrated regions from the rest;
  - an efficient ordered shrinkage algorithm is proposed to derive WARs from a high-density region through shrinkages that meet the confidence requirement.

The remainder of this paper is organized as follows. The problem is formulated in Section 2, while Section 3 outlines the general approach. Sections 4 and 5 present the space partition and WAR generation, respectively. Section 6 shows the experimental results. A conclusion is drawn in Section 7.
2. PROBLEM FORMULATION
Let ℐ = {i1, i2, ..., iM} be a set of items. A pair ⟨x, w⟩ is called a weighted item, where x ∈ ℐ is an item and w is a non-negative integer, the weight associated with x. A transaction is a set of weighted items. For example, T1 = {⟨fashion, 15⟩, ⟨sports, 10⟩} and T2 = {⟨fashion, 20⟩, ⟨book, 5⟩} are two transactions. An interval weighted item is a triple ⟨x, l, u⟩, denoting that the weight associated with the item x is within the range [l, u], where l and u are non-negative integers and l ≤ u. Note that we can always view a weighted item ⟨x, w⟩ as a special case of the interval weighted item ⟨x, l, u⟩ with w = l = u; therefore, we will use the term weighted item for both if no ambiguity occurs. Given two weighted items I1 = ⟨x1, l1, u1⟩ and I2 = ⟨x2, l2, u2⟩, we call I1 a generalization of I2 (and I2 a specialization of I1) iff x1 = x2 and l1 ≤ l2 ≤ u2 ≤ u1. For example, ⟨fashion, 10, 20⟩ is a specialization of ⟨fashion, 10, 25⟩. Note that any weighted item ⟨x, l, u⟩ can be viewed as a specialization of the item x.

We use the term weighted itemset to represent a set of weighted items. Let item(X) denote the set of items involved in a weighted itemset X, i.e., item(X) = {x | x ∈ ℐ, ⟨x, l, u⟩ ∈ X}. Given two weighted itemsets X1 and X2, X1 is a specialization of X2 (and X2 a generalization of X1) iff item(X1) = item(X2) and each weighted item in X1 is a specialization of a weighted item in X2. For instance, {⟨fashion, 10, 20⟩, ⟨book, 5, 7⟩} is a specialization of {⟨fashion, 10, 20⟩, ⟨book, 5, 10⟩}. Given a transaction T and a weighted item ⟨x, l, u⟩, we say that T supports this weighted item iff there exists a weighted item ⟨x, w⟩ ∈ T such that ⟨x, w⟩ is a specialization of ⟨x, l, u⟩. Similarly, a transaction T supports a weighted itemset X iff T supports each individual weighted item in X. For instance, if X = {⟨fashion, 10, 20⟩, ⟨book, 5, 10⟩}, then T2 supports X while T1 does not. Given a weighted itemset X and a set of transactions, referred to as ℛ, we say X has support s in ℛ iff s% of the transactions in ℛ support X. Note that the support of a weighted itemset is always less than or equal to the support of any of its generalizations.

A weighted association rule (WAR) is an implication X → Y, where X and Y are two weighted itemsets and item(X) ∩ item(Y) = ∅. A transaction is said to support a WAR X → Y iff it supports the weighted itemset X ∪ Y; in turn, we define the support of the WAR as the support of X ∪ Y. In addition, we say that the WAR X → Y holds in the transaction set ℛ with confidence c iff c% of the transactions in ℛ that support X also support Y. In other words, the confidence of the WAR is the ratio of the support of X ∪ Y to the support of X. The density of a WAR is defined as the ratio of the actual support of the WAR to the "expected" support of the WAR; we elaborate on the density definition in Section 5. In this paper, we assume that Y contains only one weighted item, for the sake of simplicity.

Given a transaction set ℛ, our objective is to find the set of weighted association rules (WARs) whose support, confidence, and density are greater than or equal to some user-specified minimum support (referred to as minsup), minimum confidence (referred to as minconf), and minimum density (referred to as ρd). Since there could be a huge number of qualified WARs, in this paper we aim at mining maximum WARs. A qualified WAR X → Y is a maximum WAR if for any generalization X′ of X and Y′ of Y, where X′ ≠ X and Y′ ≠ Y, none of X′ → Y, X → Y′, and X′ → Y′ is a qualified WAR.
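These definitions translate directly into code. The following Python sketch (an illustration only, not the authors' implementation) models a transaction as a dict from items to weights and a weighted itemset as a dict from items to weight intervals; the example reproduces T1, T2, and X from above.

```python
def supports(transaction, weighted_itemset):
    """True iff the transaction supports every weighted item <x, l, u>."""
    for item, (low, up) in weighted_itemset.items():
        w = transaction.get(item)
        if w is None or not (low <= w <= up):
            return False
    return True

def support(transactions, weighted_itemset):
    """Fraction of transactions supporting the weighted itemset."""
    hits = sum(supports(t, weighted_itemset) for t in transactions)
    return hits / len(transactions)

def confidence(transactions, body, head):
    """conf(X -> Y) = sup(X u Y) / sup(X); item(X) and item(Y) are disjoint."""
    union = {**body, **head}
    sup_x = support(transactions, body)
    return support(transactions, union) / sup_x if sup_x > 0 else 0.0

# T2 supports X = {<fashion,10,20>, <book,5,10>} while T1 does not.
t1 = {"fashion": 15, "sports": 10}
t2 = {"fashion": 20, "book": 5}
x = {"fashion": (10, 20), "book": (5, 10)}
print(supports(t1, x), supports(t2, x))                                # False True
print(confidence([t1, t2], {"fashion": (10, 20)}, {"book": (5, 10)}))  # 0.5
```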
3. GENERAL APPROACH
Clearly, there can be a huge number of potential WARs, due to the numerical nature of the weights. Efficient pruning of such a huge search space becomes a crucial task in the mining process. Unlike [3], we design a twofold approach based on an observation made in Section 2: the support of a weighted itemset is always less than or equal to the support of any of its generalizations. This indicates that, for any weighted itemset I, its support is always less than or equal to the support of item(I). This suggests that we can first calculate the frequent itemsets (without considering the weights) and then examine the weight factor of each frequent itemset to generate the WARs. A frequent (weighted) itemset is a (weighted) itemset whose support is larger than or equal to the threshold minsup. Thus, we employ a twofold approach: (1) generate the frequent itemsets, ignoring the weight associated with each item in the transaction set; (2) for each frequent itemset, find the WAR(s) that meet the support, confidence, and density thresholds.

Since the first phase is mainly frequent itemset counting (as in traditional association rule mining), we do not elaborate on it in this paper. After obtaining the set of frequent itemsets, referred to as F, we examine them to generate the weighted association rules. Given an itemset I of cardinality n, the domain of the weights of all its items forms an n-dimensional space, where each dimension corresponds to the weight of one item. (For simplicity, we assume that the weight range on each dimension is the same.) Each specialization of I corresponds to an n-dimensional (rectangular) box within this space. Our objective is to find the maximum box(es) such that support, confidence, and density are satisfied. To facilitate this process, we discretize the space into a set of grids and divide the second phase further:

1. Space partition and counter generation: the goal is to identify, for each frequent itemset, the (dense) grids that satisfy the density requirement. An efficient pruning technique is provided to reduce the number of grids that need to be evaluated and maintained.

2. WAR generation: the goal is to generate the maximum WARs from the dense boxes enclosing adjacent dense grids. As the dense boxes generally do not satisfy the confidence requirement, an ordered shrinkage approach is developed to shrink the dense boxes in an orderly way until the confidence requirement is met.
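To make the decomposition concrete, here is a minimal Python sketch of the two phases (again an illustration only, not the authors' C implementation): phase 1 is a plain levelwise computation in the spirit of [1] that discards the weights, and phase 2 is a stub that the next two sections fill in.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Phase 1: levelwise frequent itemset mining that ignores weights.
    Each transaction (a dict item -> weight) is reduced to its item set.
    Returns a dict {frozenset of items: support}."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {k: c / n for k, c in counts.items() if c / n >= minsup}
    frequent = dict(current)
    size = 2
    while current:
        # join step: candidates are unions of two frequent (size-1)-itemsets
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == size}
        sups = {c: sum(c <= basket for basket in baskets) / n
                for c in candidates}
        current = {c: s for c, s in sups.items() if s >= minsup}
        frequent.update(current)
        size += 1
    return frequent

def mine_wars(transactions, minsup, minconf, rho_d):
    """Phase 2 (stub): for each frequent itemset, partition its weight space
    and derive the maximum WARs -- the subject of Sections 4 and 5."""
    for itemset, sup in frequent_itemsets(transactions, minsup).items():
        ...  # space partition (Section 4) and WAR generation (Section 5)
```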
4. SPACE PARTITION
The density concept is introduced to develop effective pruning techniques for identifying candidate boxes for WAR mining. We want to keep the additional parameters that need to be specified by the user to a minimum; the intent is to introduce one density threshold that is applicable to all grids, regardless of the number of dimensions. A straightforward partitioning method is to divide each dimension into a fixed number of partitions. However, under this approach, as the number of dimensions increases, the number of grids grows exponentially while the average density of each grid drops rapidly. This implies that different density thresholds would be needed for grids of different dimensionalities. We hence use an alternative approach that keeps the total number of grids N fixed regardless of the number of dimensions. Another advantage of this partitioning method is that we can easily control the number of counters by varying the value of N, which gives us the capability to take full advantage of the available storage space dynamically.

The grid is taken as the granularity of our WAR mining process. Any n-dimensional box within this space that is the union of a set of adjacent grids and is rectangular in shape uniquely corresponds to a specialization of I. Thus, in the remainder of this paper, we use the terms "box" and "weighted itemset" interchangeably if no ambiguity occurs. Let α be the average number of transactions that support a grid; we have α = |ℛ| / N. The density of a grid is defined as the ratio of the support of this grid to α. Intuitively, density can be viewed as an indication of the relative concentration of transactions within the space. A grid is dense iff its density is above a threshold ρd, where ρd > 1 is a small real number. In order to limit the search space, we only investigate dense grids and the regions formed by them. Unlike a box, a region does not have to be of rectangular shape. The motivation behind this is that we only want to report WARs that represent significant patterns in the data; therefore, a range is not included in a WAR if there is not enough evidence (density) to support it. A dense region is the union of a set of adjacent dense grids. Each box within a dense region, referred to as a dense box, is a candidate frequent weighted itemset. The volume of a box is defined as the number of distinct grids it contains.
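A minimal sketch of the grid counting for one frequent itemset follows. The split into m = round(N^(1/n)) equal-width base intervals per dimension is our assumption of one plausible way to keep the total grid count near N; it is not spelled out in the text above.

```python
from collections import Counter

def dense_grids(weight_vectors, total_transactions, N, rho_d, wmax):
    """Identify the dense grids for one frequent n-itemset.  weight_vectors
    holds one weight tuple per transaction supporting the itemset; wmax is
    the (common) upper end of the weight range on each dimension."""
    n = len(weight_vectors[0])
    m = max(1, round(N ** (1.0 / n)))        # base intervals per dimension
    width = wmax / m                         # equal-width base intervals (assumed)
    counts = Counter(
        tuple(min(int(w / width), m - 1) for w in vec)
        for vec in weight_vectors
    )
    alpha = total_transactions / (m ** n)    # alpha = |R| / N
    # a grid is dense iff support(grid) / alpha exceeds the threshold rho_d
    return {grid for grid, count in counts.items() if count / alpha > rho_d}

# Example: a 2-itemset with N = 100 grids gives a 10 x 10 partition.
vectors = [(4.0, 3.0), (4.5, 3.2), (4.2, 2.9), (9.5, 0.5)]
print(dense_grids(vectors, total_transactions=100, N=100,
                  rho_d=1.5, wmax=10.0))    # {(4, 3)}
```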
Counter Pruning
Although this approach needs only one density threshold for pruning, it does create additional complexities for the pruning algorithms, because the projection of an n-dimensional grid onto an (n-1)-dimensional subspace does not align with the grid boundaries of the (n-1)-dimensional subspace. In fact, such a projection is always of a looser granularity than the grid in the (n-1)-dimensional space, as can easily be observed in Figure 1: the projection of a grid in Figure 1(1) overlaps with 9 grids in Figure 1(2). The levelwise pruning in [2] can still be adopted with one modification: given an n-dimensional grid g, if the support of the union of the (n-1)-dimensional grids that cover the projection of g is below ρd × α, then g cannot be a dense grid. For example, if the support of the shaded area in Figure 1(2) is below ρd × α, then the grid in Figure 1(1) cannot be dense. After finding all dense grids, the dense regions, which serve as the basis for WAR generation, can easily be identified by a depth-first traversal through neighboring dense grids, similar to [2].

[Figure 1: Partitioning and Counter Pruning. (1) A grid in the 3-dimensional weight space over fashion, sports, and books; (2) its projection onto the 2-dimensional fashion-sports subspace, overlapping 9 of the finer grids (shaded).]
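The pruning test itself can be sketched as follows, under the equal-width partition assumed above: the n-dimensional space uses m_hi base intervals per dimension and the (n-1)-dimensional subspace uses m_lo > m_hi, so the projection of a grid overlaps several finer subspace cells. The index arithmetic is our assumption, not taken from the text.

```python
from itertools import product

def may_be_dense(grid, drop_dim, m_hi, m_lo, sub_counts, rho_d, alpha):
    """Levelwise pruning test: if the summed support of all subspace grids
    overlapping the projection of `grid` (dimension `drop_dim` removed) is
    below rho_d * alpha, then `grid` cannot be dense.  sub_counts maps
    (n-1)-dimensional grid indices to their support counters."""
    projection = tuple(c for i, c in enumerate(grid) if i != drop_dim)
    ranges = []
    for c in projection:
        lo = (c * m_lo) // m_hi              # first overlapping finer cell
        hi = -((-(c + 1) * m_lo) // m_hi)    # ceil((c+1) * m_lo / m_hi)
        ranges.append(range(lo, min(hi, m_lo)))
    covered = sum(sub_counts.get(cell, 0) for cell in product(*ranges))
    return covered >= rho_d * alpha
```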
5. WAR GENERATION
Given an n-itemset I = {i1, i2, ..., in}, there are potentially n different WAR formats, because each item might serve as the right-hand side. In order to search for the range associated with each item, we examine each dense region sequentially. Given a dense region, if its support is below the required minsup threshold, we simply discard it, since no qualified WAR can be generated from it. Otherwise, we start from the minimum bounding box of the dense region, i.e., the box with the smallest volume that contains the dense region. In general, such a box may not itself satisfy the confidence requirement, but it may contain dense box(es) corresponding to qualified WAR(s). Thus, we choose to shrink the minimum bounding box towards the maximum WARs. An alternative algorithm would be to pick a grid and grow towards the maximum WARs. Since the maximum WARs usually have large volumes, shrinking towards a maximum WAR is generally more efficient than growing towards it. To the best of our knowledge, this is the first algorithm to use shrinking instead of growing in mining association rules or clustering. A shrinkage is defined as the action that reduces the span of a box over one dimension by exactly one base interval. At each step, a candidate box is chosen, and a shrinkage is performed on it to generate a new dense box that may serve as a candidate for further shrinkage at a later step. The process ends when the newly generated dense box meets the confidence requirement or all remaining candidate boxes fail to meet the support requirement. Without loss of generality, we assume that there is only one dense region. Figure 2(a) shows a dense region whose minimum bounding box is shown in Figure 2(b), where the dark shaded grids and the light shaded grids are dense grids and non-dense grids, respectively. For each item, the range of its weight can be shrunk in two directions: increasing the lower bound or decreasing the upper bound. Thus, there are 2n different shrinkages applicable to a weighted n-itemset, and 2n new weighted n-itemsets can be generated. Since these new weighted itemsets are produced by one shrinkage from the original weighted itemset, we call them the immediate specializations of the original weighted itemset; the original weighted itemset is an immediate generalization of these new weighted itemsets. Figure 2(c) shows the six different directions in which to shrink a weighted 3-itemset and their corresponding outcomes.

[Figure 2: Region shrinkage. (a) The original dense region; (b) its minimum bounding box; (c) the six different ways to shrink the box of a 3-itemset, outcomes (1)-(6); (d) multiple shrinkage paths from box (1) to box (2).]
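Representing a box as a tuple of (low, up) base-interval index pairs (our representation, chosen for the sketch), the 2n immediate specializations can be enumerated as follows.

```python
def immediate_specializations(box):
    """Generate the immediate specializations of an n-dimensional box: each
    shrinkage narrows one dimension by exactly one base interval, either by
    raising the lower bound or by lowering the upper bound."""
    for d, (low, up) in enumerate(box):
        if low < up:                      # the span must stay non-empty
            yield box[:d] + ((low + 1, up),) + box[d + 1:]   # raise lower bound
            yield box[:d] + ((low, up - 1),) + box[d + 1:]   # lower upper bound

# A 3-dimensional box has (up to) six immediate specializations, matching
# the six shrinking directions of Figure 2(c).
box = ((0, 4), (1, 3), (2, 5))
print(sum(1 for _ in immediate_specializations(box)))   # 6
```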
Clearly, each dense box can be reached by a set of shrinkages from the minimum bounding box of the dense region. Assuming that we can only perform one shrinkage on one candidate box at a time, the entire process can be viewed as a sequence of operations (Bj, Hj) (j = 1, 2, ...), where Bj is a candidate box and Hj is a shrinkage along some direction. Let Θj be the set that includes the minimum bounding box of the dense region, B1, and all boxes generated via the operations (B1, H1), ..., (Bj-1, Hj-1). Θj thus represents the set of candidate boxes (for further shrinking) available before the jth operation; clearly, Bj ∈ Θj at each step. The sequence stops when all WARs are generated. For the box Bj visited at the jth step, there exists a subsequence of operations (Bj1, Hj1), (Bj2, Hj2), ..., (Bjr, Hjr) such that Bj1 is the minimum bounding box of the dense region and Bjk is the box produced by performing shrinkage Hj(k-1) on Bj(k-1), where 1 < k ≤ r and 1 ≤ j1 < j2 < ... < jr < j. Hj1, Hj2, ..., Hjr is a shrinkage path to Bj from the minimum bounding box of the dense region. Note that, at each step, there are multiple candidate boxes in Θj and different shrinking directions to choose from. As a result, different strategies for picking the candidate box and the shrinking direction can produce different sequences of operations, and hence could require different numbers of operations before all necessary WARs are generated. Therefore, the efficiency of an algorithm depends on the amount of time consumed by each operation and on the number of operations needed.

One possible approach is the brute force algorithm. Let Θ′j be Θj excluding those boxes that satisfy either of the following conditions: (1) all of its immediate specializations are in Θj, or (2) it is a maximum WAR. At each step, the brute force algorithm randomly picks a box from Θ′j, and it terminates when Θ′j = ∅. The computational complexity of the brute force algorithm is O((2n)^(N^(1/n))), so this approach is indeed inefficient. Intuitively, the inefficiency is caused by the fact that a box may be visited multiple times via different shrinkage paths, which lengthens the operation sequence. For example, there are two paths to reach the box in Figure 2(d)(2) from the box in Figure 2(d)(1): one is "reducing the upper bound of dimension 2" followed by "reducing the upper bound of dimension 1", while the other is "reducing the upper bound of dimension 1" followed by "reducing the upper bound of dimension 2". The set of shrinkages on the two paths is the same; the only difference is that the two paths adopt different permutations of these shrinkages. In general, if it takes b shrinkages to reach box B′ from B (where B′ is enclosed within B), then there exist b! different shrinkage paths. In order to eliminate this redundancy in weighted itemset generation, we introduce an ordered shrinkage technique, which guarantees that each weighted itemset is generated exactly once.
Ordered Shrinkage
We first choose a permutation, referred to as Ω, of the 2n different shrinking directions, and we retain this order during the entire process. For example, the six shrinking directions in Figure 2(c) can be ordered as: increasing the lower bound of dimension 1, decreasing the upper bound of dimension 1, increasing the lower bound of dimension 2, decreasing the upper bound of dimension 2, increasing the lower bound of dimension 3, and decreasing the upper bound of dimension 3. Then, during the shrinking process, a shrinkage in the kth direction of Ω can be performed on a box B only if no shrinkage in a direction after the kth one in Ω has been performed to generate B. We call such a shrinkage a valid shrinkage. For instance, if we take the box in Figure 2(c)(4) as the candidate to generate new weighted itemsets, then, according to the order picked above, three shrinking directions can be applied: decreasing the upper bound of dimension 2, increasing the lower bound of dimension 3, and decreasing the upper bound of dimension 3. As a result, the only valid shrinkage path from Figure 2(d)(1) to Figure 2(d)(2) is "reducing the upper bound of dimension 1" followed by "reducing the upper bound of dimension 2". This ordered shrinkage approach completely eliminates the redundant box generation of the brute force approach by providing each possible weighted itemset with a unique shrinkage path from the root. The computational complexity of ordered shrinkage for one itemset is O(N²). The detailed algorithm is given in [4].
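The ordered shrinkage search can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the algorithm of [4]: the density bookkeeping, the restriction to dense boxes, and the maximality check against generalizations are omitted, and the support and confidence evaluations over the transactions are abstracted into callbacks.

```python
def ordered_shrinkage(mbb, support_fn, conf_fn, minsup, minconf):
    """Search one dense region for qualified boxes by ordered shrinkage.
    A box is a tuple of (low, up) base-interval index pairs.  Direction 2d
    raises the lower bound of dimension d; direction 2d+1 lowers its upper
    bound.  Ordering rule: a shrinkage in direction k may only be applied to
    a box generated exclusively with directions <= k, so every box is
    reached by exactly one valid shrinkage path."""
    results = []
    stack = [(mbb, 0)]              # (box, smallest direction still allowed)
    while stack:
        box, k_min = stack.pop()
        if support_fn(box) < minsup:
            continue                # support shrinks monotonically: prune
        if conf_fn(box) >= minconf:
            results.append(box)     # qualified: keep it and stop shrinking
            continue
        for k in range(k_min, 2 * len(box)):
            d, side = divmod(k, 2)  # direction k acts on dimension d
            low, up = box[d]
            if low >= up:
                continue            # a single base interval cannot shrink
            span = (low + 1, up) if side == 0 else (low, up - 1)
            stack.append((box[:d] + (span,) + box[d + 1:], k))
    return results
```

Because each child records the direction that produced it and only allows later (or equal) directions afterwards, the direction sequence along any path is non-decreasing; the b! permutations of a given shrinkage set collapse to the single sorted one, which is the uniqueness property claimed above.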
[Figure 3: WAR vs. QAR. (a) Execution time vs. minsup, log-scale Y-axis; (b) overall recall vs. minsup; (c) overall accuracy vs. minsup. Each panel compares the WAR and QAR approaches.]
6. EXPERIMENTAL RESULTS
We implemented our algorithm in C and executed it on an IBM AIX RS6000 machine with a 333 MHz CPU. The data was placed on a disk with approximately 8 MB/s throughput. To analyze the performance of our proposed approach, we generated the input data in a manner similar to [1]. The data set contains 1 million transactions over 10,000 different items. The number of items in a transaction is clustered around a mean of 15, and a few transactions have many items. The volumes of the frequent weighted itemsets are also clustered, around a mean of 500. Refer to [4] for further information on data generation and additional experiments.

The quantitative association rule (QAR) approach has been proposed for general numerical attribute values associated with every database record. Even though we are investigating a different scenario, QAR is the only other approach we know of that can be generalized to mine WARs; therefore, we compare our approach with the QAR approach. In essence, the quantitative association rule approach [3] first quantizes the domain of every item and maps it into binary items. Since it is unknown at the beginning which items will appear in an itemset, all attribute domains have to be partitioned. If a numerical attribute is partitioned into λ intervals, then these intervals are mapped into (λ² + λ)/2 items, because each of the original intervals and each of the combined intervals is treated as a different item. For the synthetic dataset generated above, if the weight domain of each weighted item is partitioned into 20 intervals, then there are over 2 million binary items for the QAR approach. The QAR approach utilizes a parameter maxsup to reduce the resulting set of rules. It is clear that maxsup can improve the performance, but it could also degrade the quality of the results significantly. To analyze the performance and quality of the two approaches, we set λ = 20 and maxsup = minsup + 0.2 for QAR. For the WAR approach, the density threshold ρd and the total number of grids N for each itemset are set to 1.5 and 1 million, respectively.
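The item blow-up is easy to verify numerically; the snippet below just checks the arithmetic of the mapping described above.

```python
# With lambda = 20 intervals per attribute, QAR maps each numerical attribute
# to (lambda**2 + lambda) / 2 = 210 binary items (every base interval plus
# every combined interval).  Over the 10,000 items of the synthetic dataset
# this yields 2,100,000 binary items, the "over 2 million" figure above.
lam, items = 20, 10_000
per_attribute = (lam * lam + lam) // 2
print(per_attribute, per_attribute * items)   # 210 2100000
```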
The quality of the resulting rules is analyzed via two measurements: recall and accuracy. Since the datasets are generated artificially, we know the set R of all existing rules. First, we compute the volume of each rule r ∈ R, denoted by V(r). For any rule set R′, the recall (respectively, accuracy) of R′ with respect to R is computed as follows. For each r ∈ R (r′ ∈ R′), find the rule r′ ∈ R′ (r ∈ R) such that the overlap of r and r′ is the largest, i.e., V(r ∩ r′) is the largest. Next, the individual recall (accuracy) of r′ (r) with respect to r (r′) is computed as V(r ∩ r′) / V(r) (respectively, V(r ∩ r′) / V(r′)). The overall recall (accuracy) of R′ with respect to R is the average of the individual recalls (accuracies).
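The two measurements can be sketched in Python as follows, under the assumption (ours, for the sketch) that each rule is compared as the box over its weight intervals and that rules over different item sets have zero overlap; interval_overlap plays the role of V(r ∩ r′).

```python
def volume(rule):
    """V(r): product of the interval lengths of a rule {item: (l, u)}."""
    v = 1
    for (l, u) in rule.values():
        v *= (u - l + 1)
    return v

def interval_overlap(r1, r2):
    """V(r1 n r2), taken as 0 when the rules cover different item sets."""
    if set(r1) != set(r2):
        return 0
    vol = 1
    for item, (l1, u1) in r1.items():
        l2, u2 = r2[item]
        width = min(u1, u2) - max(l1, l2) + 1
        if width <= 0:
            return 0
        vol *= width
    return vol

def overall_recall(true_rules, found_rules):
    """Average over r in R of max over r' of V(r n r') / V(r); calling
    overall_recall(found_rules, true_rules) gives the overall accuracy."""
    if not true_rules:
        return 0.0
    best = [
        max((interval_overlap(r, rp) for rp in found_rules), default=0) / volume(r)
        for r in true_rules
    ]
    return sum(best) / len(best)
```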
Figure 3(a) shows the average execution time of the two approaches with respect to minsup. WAR is on average a couple of orders of magnitude faster than QAR, even though the execution time of both approaches increases exponentially as minsup decreases; this is because the complexity is approximately linear in the number of counters, which usually grows exponentially with smaller minsup. (Note that the Y-axis is in log scale.) Figure 3(b) shows the recall of the two algorithms with respect to the minimum support threshold. WAR has significantly higher recall than QAR because a significant number of important rules are pruned by the maxsup threshold in QAR. Finally, we examine the accuracy of the two approaches. The weight domain of each item in WAR is partitioned into much finer grids than in QAR for the 2-, 3-, and 4-itemsets, where most rules exist. As a result, with the finer partition, the WAR approach produces much more accurate rules than the QAR approach, as shown in Figure 3(c).

7. CONCLUSION
In this paper, we have investigated a new class of interesting problems: weighted association rules, which have wide implications. We proposed a twofold approach, in which the frequent itemsets are first generated (without considering weights) and the maximum WARs are then derived using an "ordered" shrinkage approach. Experimental results show that our proposed approach not only outperforms a direct generalization of previous work by an order of magnitude, but also produces better quality results.

8. REFERENCES
[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. Proc. 20th VLDB, 1994.
[2] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. Proc. ACM SIGMOD, 1998.
[3] R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. Proc. ACM SIGMOD, 1996.
[4] W. Wang, J. Yang, and P. S. Yu. Efficient mining of weighted association rules (WAR). IBM Research Report RC 21692 (97734), March 2000.