Professional Documents
Culture Documents
Karam Gouda and Mohammed J. Zaki
Computer
Science & Communication Engg. Dept., Kyushu University, Japan
Computer Science Dept., Rensselaer Polytechnic Institute, USA
Email: kgouda@csce.kyushu-u.ac.jp, zaki@cs.rpi.edu
ACD{T,W} ACT{W} ACW ADT ADW ATW CDT{W} CDW CTW DTW 3
ACDTW 5
Figure 3. Subset/Backtrack Search Tree (min sup= 3): Circles indicate maximal sets and the infrequent sets have been
crossed out. Due to the downward closure property of support (i.e., all subsets of a frequent itemset must be frequent) the frequent
itemsets form a (shown with the bold line), such that all frequent itemsets lie above the border, while all infrequent itemsets
lie below it. Since MFI determine the border, it is straightforward to obtain FI in a single database scan of MFI is known.
// 7 is an output parameter
LMFI-backtrack( R T R 7 R! )
ing the maximal set Q VY T , GenMax skips QUVT . After finding
Q@WUY T all the remaining nodes with prefix Q are pruned, and so 1. for each "$# T
on. The pruned branches are shown with dashed arrows, indicating 2. 3&% M'( O";Z
that a large part of the search tree is pruned away 3.
4.
+9&% MPO, -=,0# T and , 12"=Z
if &% ( +4&% has a superset in 7 *
Theorem 1 (Correctness) MFI-backtrack returns all and
only the maximal frequent itemsets in the given database.
5.
6. *
7.
return //subsequent branches pruned!
7 &% M7
T &% = FI-combine (3&% R+4&% )
mined maximal pattern. Were it not for this pruning, Gen- Figure 6. Mining MFI with Progressive Focus-
Max would essentially default to a depth-first exploration of ing (* indicates a new line not in MFI-backtrack)
the search tree. Before creating the combine set for the next
pass, in line 4 in Figure 4, GenMax check if ?>
A>
is contained within a previously found maximal set. If yes,
then the entire subtree rooted at A> and including the el-
The +
log !- time bounds reported in [8] for dy-
namic subset testing do not assume anything about the se-
ements of the possible set are pruned. If no, then a new quence of operations performed. In contrast, we have full
extension is required. Another superset check is required at knowledge of how GenMax generates maximal sets; we
line 8, when /?> has no frequent extension, i.e., when the use this observation to substantially speed up the subset
combine set 1 ?> is empty. Even though ?> is a leaf node checking process. The main idea is to progressively nar-
with no extensions it may be subsumed by some maximal row down the maximal itemsets of interest as recursive calls
set, and this case is not caught by the check in line 4 above. are made. In other words, we construct for each invocation
The major challenge in the design of GenMax is how to of MFI-backtrack a list of local maximal frequent itemsets,
perform this subset checking in the current set of maximal J . This list contains the maximal sets that can po-
patterns in an efficient manner. If we were to naively imple- tentially be supersets of candidates that are to be generated
ment and perform this search two times on an ever expand- from the itemset / . The only such maximal sets are those
ing set of maximal patterns MFI, and during each recursive that contain all items in / . This way, instead of check-
ing if ?>
DA> is contained in the full current MFI, we
call of backtracking, we would be spending a prohibitive
amount of time just performing subset checks. Each search check only in J = – the local set of relevant maximal
would take +5F BF - time in the worst case, where MFI itemsets. This technique, that we call progressive focusing,
is the current, growing set of maximal patterns. Note that is extremely powerful in narrowing the search to only the
some of the best algorithms for dynamic subset testing run most relevant maximal itemsets, making superset checking
in amortized time + log !- per operation in a sequence
of operations [8] (for us
+ - ). In dense do-
practical on dense datasets.
Figure 6 shows the pseudo-code for GenMax that incor-
main we have thousands to millions of maximal frequent porates this optimization (the code for the first two opti-
itemsets, and the number of subset checking operations per- mizations is not show to avoid clutter). Before each in-
formed would be at least that much. Can we do better? vocation of LMFI-backtrack a new J ?> is created,
consisting of those maximal sets in the current J 6 that
The answer is, yes! Firstly, we observe that the two sub-
set checks (one on line 4 and the other on line 8) can be contain the item ; (see line 10). Any new maximal itemsets
easily reduced to only one check. Since ?>
?> is a su- from a recursive call are incorporated in the current J
perset of ?> , in our implementation we do superset check at line 12.
only for =?>
?> . While testing this set, we store the
maximum position, say , at which an item in 6A>
DA>
Frequency Testing Optimization
is not found in a maximal set . In other words,
all items before are subsumed by some maximal set. For
the superset test for /?> , we check if F /?> 9F . If yes,
So far GenMax, as described, is independent of the data
format used. The techniques can be integrated into any of
the existing methods for mining maximal patterns. We now
?> is non-maximal. If no, we add it to MFI. present some data format specific optimizations for fast fre-
The second observation is that performing superset quency computations.
checking during each recursive call can be redundant. For GenMax uses a vertical database format, where we have
example, suppose that the cardinality of the possible set available for each item its tidset, the set of all transaction
?> is . Then potentially, MFI-backtrack makes re- tids where it occurs. The vertical representation has the fol-
dundant subset checks, if the current MFI has not changed lowing major advantages over the horizontal layout: Firstly,
during these consecutive calls. To avoid such redun- computing the support of itemsets is simpler and faster with
dancy, a simple check status flag is used. If the flag is false, the vertical layout since it involves only the intersections
no superset check is performed. Before each recursive call of tidsets (or compressed bit-vectors if the vertical format
the flag is false; it becomes true whenever 1 ?> is empty, is stored as bitmaps [4]). Secondly, with the vertical lay-
which indicates that we have reached a leaf, and have to out, there is an automatic “reduction” of the database be-
backtrack. fore each scan in that only those itemsets that are relevant
to the following scan of the mining process are accessed // Can &% combine with other items in TD ?
FI-diffset-combine( &% R+4&% )
from disk. Thirdly, the vertical format is more versatile in 1. T M
supporting various search strategies, including breadth-first, 2. for each ,0#$+9&%
depth-first or some other hybrid search. 3. , M ,
4. if level == 0 then , M 3&%
,
// Can &% combine with other items in T
? 5. else , M , &%
FI-tidset-combine( &% R+9&% ) 6. if , min sup
1. T M 7. TPM T:(/O, Z
2. for each ,0#$+9&% 8. return T
3.* , M',
4.* , M &% , Figure 8. FI-combine: Diffset Propagation
5.* if , min sup
6. TPM T:(/O, Z of computing the tidset of as shown in line 4 in Figure 7.
7. return T That is +F -@
+F;- +F - . The support of is now given
as CD+ -3
CD+F;-UF + -F . At subsequent levels, we have
Figure 7. FI-combine Using Tidset Intersec- available the diffsets for each element in the combine list. In
tions (* indicates a new line not in FI-combine) this case +F# -U
+F -D +; - , but the support is still given
as CD+ -
CD+; - F
+ -F . Figure 8 shows the pseudo-code
Let’s consider how the FI-combine (see Figure 2) routine for computing the combine sets using diffsets.
works, where the frequency of an extension is tested. Each
item ; in 1) actually represents the itemset =
=; and
stores the associated tidset for the itemset
/; . For the
GenMax:
1. Compute J and J
initial invocation, since is empty, the tidset for each item 3. Compute BJ +; - for each item ; J
; in 1 is identical to the tidset, +F;- , of item ; . Before line 4. Sort J@ (decreasing in BJ +; - , increasing in CD+F;- )
3 is called in FI-combine, we intersect the tidset of the ele-
ment A> (i.e., +F
=; - ) with the tidset of element (i.e.,
5.
6. LMFI-backtrack( 5J
^ ) //use diffsets
+F
/ - ). If the cardinality of the resulting intersection 7. return MFI
is above minimum support, the extension with is frequent, Figure 9. The GenMax Algorithm
and
the new intersection result, is added to the combine
set for the next level. Figure 7 shows the pseudo-code for Final GenMax Algorithm The complete GenMax algo-
FI-tidset-combine using this tidset intersection based sup- rithm is shown in Figure 9, which ties in all the optimiza-
port counting. tions mentioned above. GenMax assumes that the input
In Charm [11] we first introduced two new properties of dataset is in the vertical tidset format. First GenMax com-
itemset-tidset pairs which can be used to further increase the putes the set of frequent items and the frequent 2-itemsets,
performance. Consider the items ; and in 1 . If during using a vertical-to-horizontal recovery method [10]. This
intersection in line 4 in Figure 7, we discover that +; - – or information is used to reorder the items in the initial com-
equivalently + /?> - – is a subset of or equal to +F;- , then bine list to minimize the search tree size that is generated.
we do not add to the combine set, since in this case, ; GenMax uses the progressive focusing technique of LMFI-
always occurs along with . Instead of adding to the backtrack, combined with diffset propagation of FI-diffset-
combine set, we add it to =A> . This optimization was also combine to produce the exact set of all maximal frequent
used in Mafia [4] under the name PEP. itemsets, MFI.
Diffsets Propagation Despite the many advantages of the 4 Experimental Results
vertical format, when the tidset cardinality gets very large
(e.g., for very frequent items) the intersection time starts Past work has demonstrated that DepthProject [1] is
to become inordinately large. Furthermore, the size of in- faster than MaxMiner [3], and the latest paper shows that
termediate tidsets generated for frequent patterns can also Mafia [4] consistently beats DepthProject. In out experi-
become very large to fit into main memory. GenMax uses a mental study below, we retain MaxMiner for baseline com-
new format called diffsets [10] for fast frequency testing. parison. At the same time, MaxMiner shows good perfor-
The main idea of diffsets is to avoid storing the entire mance on some datasets, which were not used in previous
tidset of each element in the combine set. Instead we keep studies. We use Mafia as the current state-of-the-art method
track of only the differences between the tidset of itemset and show how GenMax compares against it.
and the tidset of an element ; in the combine set (which All our experiments were performed on a 400MHz Pen-
actually denotes
/; ). These differences in tids are tium PC with 256MB of memory, running RedHat Linux
stored in what we call the diffset, which is a difference of 6.0. For comparison we used the original source or ob-
two tidsets at the root level or a difference of two diffsets at ject code for MaxMiner [3] and MAFIA [4], provided to
later levels. Furthermore, these differences are propagated us by their authors. Timings in the figures are based on total
all the way from a node to its children starting from the root. wall-clock time, and include all preprocessing costs (such
In an extensive study [10], we showed that diffsets are very as horizontal-to-vertical conversion in GenMax and Mafia).
short compared to their tidsets counterparts, and are highly The times reported also include the program output. We
effective in improving the running time of vertical methods. believe our setup reflects realistic testing conditions (as op-
We describe next how they are used in GenMax, with the posed to some previous studies which report only the CPU
help of an example. At level 0, we have available the tidsets time or may not include output cost).
for each item in J . When we invoke FI-combine at this Benchmark Datasets: We chose several real and syn-
level, we compute the diffset of
, denoted as +F - instead thetic datasets for testing the performance of the the al-
Database I AL R MPL the y-axis is in log-scale, this appears linear) faithfully fol-
chess 76 37 3,196 23 (20%) lowing the growth of MFI with lowering minimum sup-
connect 130 43 67,557 31 (2.5%) port, as shown in the top center and right figures. MafiaPP
mushroom
pumsb*
pumsb
120
7117
7117
23
50
74
8,124
49,046
49,046
22 (0.025%)
43 (2.5%)
27 (40%)
shows super-exponential growth and suffers from an ap-
proximately + F 9F - overhead in pruning non-maximal
T10I4D100K 1000 10 100,000 13 (0.01%) sets and thus becomes impractical when MFI becomes too
T40I10D100K 1000 40 100,000 25 (0.1%) large, i.e., at low supports.
On pumsb, we find that GenMax is the fastest, having a
the number of items, Q
Figure 10. Database Characteristics: denotes
the average length of a record,
slight edge over Mafia. It is about 2 times faster than Mafi-
aPP. We observed that the post-pruning routine in MafiaPP
the number of records, and 7+ the maximum pattern works well till around +"8 - maximal itemsets. Since at
length at the given min sup. 60% min sup we had around that many sets, the overhead
of post-processing was not significant. With lower support
the post-pruning cost becomes significant, so much so that
gorithms, shown in Table 10. The real datasets have been we could not run MafiaPP beyond 50% minimum support.
used previously in the evaluation of maximal patterns [1, 3, MaxMiner is significantly slower on pumsb; a factor of 10
4]. Typically, these real datasets are very dense, i.e., they times slower then both GenMax and Mafia.
produce many long frequent itemsets even for high val- Type I results substantiate the claim that GenMax is an
ues of support. The table shows the length of the longest highly efficient method to mine the exact MFI. It is as fast
maximal pattern (at the lowest minimum support used in as Mafia on pumsb and within a factor of 2 on chess. Mafia,
our experiments) for the different datasets. For exam- on the other hand is very effective in mining a superset of
ple on pumsb*, the longest pattern was of length 43 (any the MFI. Post-pruning, in general, is not a good idea, and
method that mines all frequent patterns will be impracti- GenMax beats MafiaPP with a wide margin (over 100 times
cal for such long patterns). We also chose two synthetic better in some cases, e.g., chess at 20%). On Type I data
datasets, which have been used as benchmarks for testing MaxMiner is noncompetitive.
methods that mine all frequent patterns. Previous maxi-
mal set mining algorithms have not been tested on these Type II Datasets: Connect and Pumsb*
datasets, which are sparser compared to the real sets. All Type II datasets, as shown in Figure 12 are characterized
these datasets are publicly available from IBM Almaden by a left-skewed distribution of the maximal frequent pat-
(www.almaden.ibm.com/cs/quest/demos.html). terns, i.e., there is a relatively gradual increase with a sharp
While conducting experiments comparing the 3 different drop in the number of maximal patterns. The mean pattern
algorithms, we observed that the performance can vary sig- length is also longer than in Type I datasets; it is around 16
nificantly depending on the dataset characteristics. We were or 17. The MFI cardinality is also drastically smaller than
able to classify our benchmark datasets into four classes FI cardinality; by a factor of 8 or more (in contrast, for
based on the distribution of the maximal frequent patterns. Type I data, the reduction was only 8 ).
The main performance trend for both Type II datasets
Type I Datasets: Chess and Pumsb is that Mafia is the best till the support is very low, at
Figure 11 shows the performance of the three algorithms which point there is a cross-over and GenMax outperforms
on chess and pumsb. These Type I datasets are character- Mafia. MafiaPP continues to be favorable for higher sup-
ized by a symmetric distribution of the maximal frequent ports, but once again beyond a point post-pruning costs start
patterns (leftmost graph). Looking at the mean of the curve, to dominate. MafiaPP could not be run beyond the plotted
we can observe that for these datasets most of the maximal points. MaxMiner remains noncompetitive (about 10 times
patterns are relatively short (average length 11 for chess and slower). The initial start-up time for Mafia for creating the
10 for pumsb). The MFI cardinality figures on top center bit-vectors is responsible for the high offset at 50% support
and right, show that for the support values shown, the MFI on pumsb*. GenMax appears to exhibit a more graceful
is 2 orders of magnitude smaller than all frequent itemsets. increase in running time than Mafia.
Compare the total execution time for the different algo-
rithms on these datasets (center and rightmost graphs). We Type III Datasets: T10I4 and T40I10
use two different variants of Mafia. The first one, labeled As depicted in Figure 13, Type III datasets – the two
Mafia, does not return the exact maximal frequent set, rather synthetic ones – are characterized by an exponentially de-
it returns a superset of all maximal patterns. The second caying distribution of the maximal frequent patterns. Ex-
variant, labeled MafiaPP, uses an option to eliminate non- cept for a few maximal sets of size one, the vast majority of
maximal sets in a post-processing (PP) step. Both GenMax maximal patterns are of length two! After that the number
and MaxMiner return the exact MFI. of longer patterns drops exponentially. The mean pattern
On chess we find that Mafia (without PP) is the fastest length is very short compared to Type I or Type II datasets;
if one is willing to live with a superset of the MFI. Mafia it is around 4-6. MFI cardinality is not much smaller than
is about 10 times faster than MaxMiner. However, notice the cardinality of all frequent patterns. The difference is
how the running time of MafiaPP grows if one tries to find only a factor of 10 compared to a factor of 100 for Type I
the exact MFI in a post-pruning step. GenMax, though and a factor of 10,000 for Type II.
slower than Mafia is significantly faster than MafiaPP and Comparing the running times we observe that MaxMiner
is about 5 times better than MaxMiner. All methods, except is the best method for this type of data. The breadth-first
MafiaPP, show an exponential growth in running time (since or level-wise search strategy used in MaxMiner is ideal for
maximal itemset distribution chess pumsb
7000
chess(40%) 10000 100000
pumsb(60%) MaxMiner MaxMiner
6000
MafiaPP MafiaPP
1000 GenMax 10000 GenMax
1000 1 10
0
0.1 1
0 5 10 15 20 25 70 65 60 55 50 45 40 35 30 25 20 100 90 80 70 60 50 40
Length Minimum Support (%) Minimum Support (%)
Figure 11. Type I Datasets (chess and pumsb)
maximal itemset distribution connect pumsb*
4000
connect(20%) 10000 10000
3500 pumsb*(7.5%) MaxMiner MaxMiner
3000 MafiaPP MafiaPP
GenMax GenMax
Total Time (sec)
2000
1500 100 100
1000
500
10 10
0
-500
-1000 1 1
0 5 10 15 20 25 30 35 40 100 90 80 70 60 50 40 30 20 10 0 50 45 40 35 30 25 20 15 10 5 0
Length Minimum Support (%) Minimum Support (%)
Figure 12. Type II Datasets (connect and pumsb*)
maximal itemset distribution T10I4D100K T40I10D100K
35000
T10(0.025%) 10000 10000
T40(0.7125%) MaxMiner MaxMiner
30000 MafiaPP MafiaPP
GenMax GenMax
Total Time (sec)
20000
100
15000
10000
100
10
5000
0 1 10
0 2 4 6 8 10 12 14 16 18 0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2
Length Minimum Support (%) Minimum Support (%)
Figure 13. Type III Datasets (T10 and T40)
maximal itemset distribution mushroom
30000
mushroom(0.1%) 100
mushroom(0.075%)
25000
Total Time (sec)
20000 10
Frequency
15000
MaxMiner
10000 1 MafiaPP
GenMax
5000 GenMax’
Mafia
0 0.1
0 5 10 15 20 25 10 1 0.1 0.01
Length Minimum Support (%)
Figure 14. Type IV Dataset (mushroom)
very bushy search trees, and when the average maximal pat- to get their frequency. On the other hand vertical meth-
tern length is small. Horizontal methods are better equipped ods spend much time in performing intersections on long
to cope with the quadratic blowup in the number of fre- item tidsets or bit-vectors. GenMax gets around this prob-
quent 2-itemsets since one can use array based counting lem by using the horizontal format for computing frequent
2-itemsets (denoted J@ ), but it still has to spend time per- methods, MaxMiner is the best for mining Type III distri-
forming +5F J@BF - pairwise tidset intersections.
Mafia on the other hand performs +5F J F - intersections,
butions. On the remaining types, Mafia is the best method
if one is satisfied with a superset of the MFI. For very low
where J is the set of frequent items. The overhead cost is supports on Type II data, Mafia loses its edge. Post-pruning
enough to render Mafia noncompetitive on Type III data. On non-maximal patterns typically has high overhead. It works
T10 Mafia can be 20 or more times slower than MaxMiner. only for high support values, and MafiaPP cannot be run be-
GenMax exhibits relatively good performance, and it is yond a certain minimum support value. GenMax integrates
about 10 times better than Mafia and 2 to 3 times worse pruning of non-maximal itemsets in the process of mining
than MaxMiner. On T40, the gap between GenMax/Mafia using the novel progressive focusing technique, along with
and MaxMiner is smaller since there are longer maximal other optimizations for superset checking; GenMax is the
patterns. MaxMiner is 2 times better than GenMax and 5 best method for mining the exact MFI.
times better than Mafia. Since the MFI cardinality is not Our work opens up some important avenues of future
too large MafiaPP has almost the time as Mafia for high work. The IBM synthetic dataset generator appears to be
supports. Once again MafiaPP could not be run for lower too restrictive. It produces Type III MFI distributions. We
support values. It is clear that, in general, post-pruning is plan to develop a new generator that the users can use to
not a good idea; the overhead is too much to cope with. produce various kinds of MFI distributions. This will help
provide a common testbed against which new algorithms
Type IV Dataset: Mushroom can be benchmarked. Knowing the conditions under which
Mushroom exhibits a very unique MFI distribution. a method works well or does not work well is an impor-
Plotting MFI cardinality by length, we observe in Figure 14 tant step in developing new solutions. In contrast to pre-
that the number of maximal patterns remains small until vious studies we were able to isolate these conditions for
length 19. Then there is a sudden explosion of maximal pat- the different algorithms. For example, we were able to im-
terns at length 20, followed by another sharp drop at length prove the performance of GenMax’ to match MaxMiner on
21. The vast majority of maximal itemsets are of length 20. mushroom dataset. Another obvious avenue of improving
The average transaction length for mushroom is 23 (see Ta- GenMax and Mafia is to efficiently handle Type III data. It
ble 10), thus a maximal pattern spans almost a full transac- seems possible to combine the strengths of the three meth-
tion. The total MFI cardinality is about 1000 times smaller ods into a single hybrid algorithm that uses the horizontal
than all frequent itemsets. format when required and uses bit-vectors/diffsets or per-
On Type IV data, Mafia performs the best. MafiaPP and haps bit-vectors of diffsets in other cases or in combination.
MaxMiner are comparable at lower supports. This data We plan to investigate this in the future.
is the worst for GenMax, which is 2 times slower than
MaxMiner and 4 times slower than Mafia. In Type IV Acknowledgments We would like to thank Roberto Bayardo for
data, a smaller itemset is part of many maximal itemsets providing us the MaxMiner algorithm and Johannes Gehrke for
(of length 20 in case of mushroom); this renders our pro- the MAFIA algorithm.
gressive focusing technique less effective. To perform max-
imality checking one has to test against a large set of maxi- References
mal itemsets; we found that GenMax spends half its time in
[1] R. Agrawal, C. Aggarwal, and V. Prasad. Depth First Gener-
maximality checking. Recognizing this helped us improve
the progressive focusing using an optimized intersection- ation of Long Patterns. In ACM SIGKDD Conf., Aug. 2000.
[2] R. Agrawal, et al. Fast discovery of association rules. In
based method (as opposed to the original list based ap- Advances in Knowledge Discovery and Data Mining, AAAI
proach). This variant, labeled GenMax’, was able to cut
Press, 1996.
down the execution time by half. GenMax’ runs in the same [3] R. J. Bayardo. Efficiently mining long patterns from
time as MaxMiner and MafiaPP. databases. In ACM SIGMOD Conf., June 1998.
[4] D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: a maximal
5 Conclusions frequent itemset algorithm for transactional databases. In
This is one of the first papers to comprehensively com- Intl. Conf. on Data Engineering, Apr. 2001.
[5] D. Gunopulos, H. Mannila, and S. Saluja. Discovering all
pare recent maximal pattern mining algorithms under realis-
tic assumptions. Our timings are based on wall-clock time, the most specific sentences by randomized algorithms. In
we included all pre-processing costs, and also cost of out- Intl. Conf. on Database Theory, Jan. 1997.
[6] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without
putting all the maximal itemsets (written to a file). We were
candidate generation. In ACM SIGMOD Conf., May 2000.
able to distinguish four different types of MFI distributions [7] D.-I. Lin and Z. M. Kedem. Pincer-search: A new algorithm
in our benchmark testbed. We believe these distributions for discovering the maximum frequent set. In Intl. Conf. Ex-
to be fairly representative of what one might see in prac- tending Database Technology, Mar. 1998.
tice, since they span both real and synthetic datasets. Type [8] D. Yellin. An algorithm for dynamic subset and intersection
I is a normal MFI distribution with not too long maximal testing. Theoretical Computer Science, 129:397–406, 1994.
patterns, Type II is a left-skewed distributions, with longer [9] M. J. Zaki. Generating non-redundant association rules. In
maximal patterns, Type III is an exponential decay distri- ACM SIGKDD Conf., Aug. 2000.
bution, with extremely short maximal patterns, and finally [10] M. J. Zaki and K. Gouda. Fast vertical mining using Diffsets.
Type IV is an extreme left-skewed distribution, with very TR 01-1, CS Dept., RPI, Mar. 2001.
large average maximal pattern length. [11] M. J. Zaki and C.-J. Hsiao. C H ARM: An efficient algorithm
We noted that different algorithms perform well under for closed association rule mining. TR 99-10, CS Dept., RPI,
different distributions. We conclude that among the current Oct. 1999.