
Association Rule Mining

also outperforms the discussed existing structures in terms of efficiency, effectiveness, and practicality. Our structure is termed Support-Ordered Trie Itemset (SOTrieIT, pronounced “so-try-it”). It is a dual-level support-ordered trie data structure used to store pertinent itemset information to speed up the discovery of frequent itemsets.

As its construction is carried out before actual mining, it can be viewed as a preprocessing step. For every transaction that arrives, 1-itemsets and 2-itemsets are first extracted from it. For each itemset, the SOTrieIT will be traversed in order to locate the node that stores its support count. Support counts of 1-itemsets and 2-itemsets are stored in first-level and second-level nodes, respectively. The traversal of the SOTrieIT thus requires at most two redirections, which makes it very fast. At any point in time, the SOTrieIT contains the support counts of all 1-itemsets and 2-itemsets that appear in all the transactions. It will then be sorted level-wise from left to right according to the support counts of the nodes in descending order.

Figure 3 shows a SOTrieIT constructed from the database in Table 1. The bracketed number beside an item is its support count. Hence, the support count of itemset {AB} is 2. Notice that the nodes are ordered by support counts in a level-wise descending order.

In algorithms such as FP-growth that use a similar data structure to store itemset information, the structure must be rebuilt to accommodate updates to the universal itemset. The SOTrieIT can be easily updated to accommodate the new changes. If a node for a new item in the universal itemset does not exist, it will be created and inserted into the SOTrieIT accordingly. If an item is removed from the universal itemset, all nodes containing that item need only be removed, and the rest of the nodes would still be valid.

Unlike the trie structure of Amir et al. (1999), the SOTrieIT is ordered by support count (which speeds up mining) and does not require the powersets of transactions (which reduces construction time). The main weakness of the SOTrieIT is that it can only discover frequent 1-itemsets and 2-itemsets; its main strength is its speed in discovering them. They can be found promptly because there is no need to scan the database. In addition, the search (depth first) can be stopped at a particular level the moment a node representing a nonfrequent itemset is found, because the nodes are all support ordered.

Another advantage of the SOTrieIT, compared with all previously discussed structures, is that it can be constructed online, meaning that each time a new transaction arrives, the SOTrieIT can be incrementally updated. This feature is possible because the SOTrieIT is constructed without the need to know the support threshold; it is support independent. All 1-itemsets and 2-itemsets in the database are used to update the SOTrieIT regardless of their support counts. To conserve storage space, existing trie structures such as the FP-tree have to use thresholds to keep their sizes manageable; thus, when new transactions arrive, they have to be reconstructed, because the support counts of itemsets will have changed.

Finally, the SOTrieIT requires far less storage space than a trie or Patricia trie because it is only two levels deep and can be easily stored in both memory and files. Although this causes some input/output (I/O) overheads, it is insignificant as shown in our extensive experiments. We have designed several algorithms to

Figure 3. A SOTrieIT structure

ROOT
  C(4): D(1)
  A(3): C(3), B(2), D(1)
  B(3): C(3), D(1)
  D(1)
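
The construction and update steps described in the text can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the class layout, the method names (`add_transaction`, `remove_item`, `ordered`) and the alphabetical tie-break between equal counts are all assumptions. Table 1 is not reproduced in this section, so the four transactions used here are a hypothetical database chosen only because they yield the counts shown in Figure 3.

```python
from itertools import combinations


class SOTrieIT:
    """Two-level trie of support counts (illustrative sketch, not the authors' code)."""

    def __init__(self):
        self.first = {}   # item -> support count of the 1-itemset {item}
        self.second = {}  # item -> {later item -> support count of the 2-itemset}

    def add_transaction(self, transaction):
        """Update the counts of every 1-itemset and 2-itemset in one transaction."""
        items = sorted(set(transaction))
        for item in items:
            self.first[item] = self.first.get(item, 0) + 1
        # Each pair (a, b) has a < b, so a 2-itemset lives under exactly one parent.
        for a, b in combinations(items, 2):
            children = self.second.setdefault(a, {})
            children[b] = children.get(b, 0) + 1

    def remove_item(self, item):
        """Drop every node that contains `item`; all other counts stay valid."""
        self.first.pop(item, None)
        self.second.pop(item, None)
        for children in self.second.values():
            children.pop(item, None)

    def ordered(self):
        """Return both levels sorted by descending support count.

        Ties are broken alphabetically, which is an assumption of this sketch.
        """
        def by_support(counts):
            return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

        return [(item, count, by_support(self.second.get(item, {})))
                for item, count in by_support(self.first)]


# Hypothetical database, chosen to be consistent with the counts in Figure 3.
trie = SOTrieIT()
for transaction in ["AC", "BC", "ABC", "ABCD"]:
    trie.add_transaction(transaction)  # no support threshold is needed here

for item, count, children in trie.ordered():
    print(item, count, children)
```

No support threshold appears anywhere in the update path, which is what allows the structure to be maintained online as transactions arrive. Counts are kept in plain dictionaries and sorted on demand; keeping the nodes physically sorted after each update would be an equally valid design.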


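The early-stopping search described in the text can be sketched as follows. The counts are those of Figure 3; the grouping of second-level nodes under their parents is inferred from the descending order within each group and from the stated support count of {AB}. The function name `frequent_itemsets`, the nested-list representation and the use of an absolute count as the support threshold are assumptions of this sketch.

```python
# Figure 3 as nested lists: (item, support count, second-level nodes).
# Every level is already sorted by descending support count.
FIGURE_3 = [
    ("C", 4, [("D", 1)]),
    ("A", 3, [("C", 3), ("B", 2), ("D", 1)]),
    ("B", 3, [("C", 3), ("D", 1)]),
    ("D", 1, []),
]


def frequent_itemsets(trie, min_support):
    """Depth-first search that leaves a level at the first nonfrequent node."""
    found = []
    for item, count, children in trie:
        if count < min_support:
            break  # later first-level nodes have equal or lower counts
        found.append((item, count))
        for other, pair_count in children:
            if pair_count < min_support:
                break  # the remaining siblings cannot be frequent either
            found.append(("".join(sorted((item, other))), pair_count))
    return found


print(frequent_itemsets(FIGURE_3, 2))
```

Because each level is support ordered, the search never visits a node that follows a nonfrequent one, and it reads only the stored counts rather than scanning the database.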
