High-utility itemset mining is a popular data mining problem that considers utility factors, such as quantity
and unit profit of items, besides the frequency measure from the transactional database. It helps to find the most
valuable and profitable products/items that are difficult to track by using only the frequent itemsets. An item
might have a high profit value but be rare in the transactional database, and yet have tremendous importance.
While there are many existing algorithms to find high-utility itemsets (HUIs) that generate comparatively
large candidate sets, our main focus is on significantly reducing the computation time with the introduction
of new pruning strategies. The designed pruning strategies help to reduce the visitation of unnecessary nodes
in the search space, which reduces the time required by the algorithm. In this article, two new stricter upper
bounds are designed to reduce the computation time by refraining from visiting unnecessary nodes of an
itemset. Thus, the search space of the potential HUIs can be greatly reduced, and the execution time of
the mining procedure can be improved. The proposed strategies can also significantly minimize the transaction
database generated on each node. Experimental results showed that the designed algorithm with two pruning
strategies outperforms the state-of-the-art algorithms for mining the required HUIs in terms of runtime and
number of revised candidates. The designed algorithm also outperforms the state-of-the-art approach in terms
of memory usage. Moreover, a multi-thread concept is also discussed to further handle the problem of big datasets.
CCS Concepts: • Information systems → Association rules; Data analytics; • Computing methodolo-
gies → Knowledge representation and reasoning;
Additional Key Words and Phrases: HUIM, high-utility itemset, pruning strategy, multiple threads
ACM Reference format:
Jimmy Ming-Tai Wu, Jerry Chun-Wei Lin, and Ashish Tamrakar. 2019. High-Utility Itemset Mining with
Effective Pruning Strategies. ACM Trans. Knowl. Discov. Data 13, 6, Article 58 (October 2019), 22 pages.
https://doi.org/10.1145/3363571
1 INTRODUCTION
The main challenge in data mining has always been finding the meaningful information from huge
datasets. A data mining technique to discover interesting, unexpected, and useful patterns of data
Authors’ addresses: J. M.-T. Wu, College of Computer Science and Engineering, Shandong University of Science and
Technology, 579 Qianwangang Rd, Qingdao, Shandong 266590, China; email: wmt@wmt35.idv.tw; J. C.-W. Lin (corre-
sponding author), Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Nor-
way University of Applied Sciences, Inndalsveien 28, Bergen 5063, Norway; email: jerrylin@ieee.org; A. Tamrakar, De-
partment of Computer Science, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV 89154; email:
ashish.tamrakar@unlv.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2019 Association for Computing Machinery.
1556-4681/2019/10-ART58 $15.00
https://doi.org/10.1145/3363571
ACM Transactions on Knowledge Discovery from Data, Vol. 13, No. 6, Article 58. Publication date: October 2019.
58:2 J. M.-T. Wu et al.
from a huge database is called pattern mining. Previously, most research focused on Frequent
Itemset Mining (FIM) and Association Rule Mining (ARM). These pattern mining algorithms are
the traditional methods of finding the frequent set of patterns for locating the set of itemsets in
which the frequency of each itemset is no less than the minimum support count (threshold) [7].
Apriori was firstly proposed for FIM where the database is scanned multiple times to identify the
frequent individual items/patterns and highlight the actual information and rules [1]. FP-Growth
was then proposed to overcome the limitation of the Apriori algorithm. It discovers all the fre-
quent patterns with only two database scans [19]. The FP-Growth is based on a compressed FP-
tree structure for mining the set of frequent itemsets by performing the recursive method. Thus,
the number of unpromising candidates and the computational cost can be reduced significantly.
Although FIM/ARM discovers frequently occurring itemsets, it is likely to miss the itemsets that
have an unexpectedly high importance on profit rather than the quantity. For example, the sale of
milk and bread occurs frequently among the transactions in the dataset while the sale of caviar
seems to be very rare and might not be reflected in the outcome of FIM/ARM. Therefore, it is nec-
essary to consider both profit and quantity of the itemsets for mining the useful and profitable
itemsets. Consequently, the concept of high-utility itemset mining (HUIM) was developed.
To discover the useful and profitable itemsets from huge transactional datasets, HUIM [6, 47–
49, 54] has been one of the most significant research topics. HUIM considers both internal and
external utilities [47] to obtain the set of profitable itemsets. Internal utility is represented by the
quantity of an item and external utility is represented by the profit of an item. A minimum high
utility count (threshold) is used to decide whether an itemset is a high utility itemset (HUI). Re-
cently, a lot of research has been carried out in the field of HUIM [17, 29, 30]. Liu et al. proposed a
two-phase model that computes the Transaction Weighted Utility (TWU) of an itemset and con-
siders the Transaction-Weighted Downward Closure (TWDC) property to find the set of HUIs [33].
This algorithm generates a large number of candidates in the level-wise manner to find the re-
quired HUIs. However, it requires a lot of computation time and memory usage to process a large
number of unpromising candidates. Several improved methods have been proposed to reduce the
unpromising candidate sets [41, 42].
Liu et al. and Zida et al. respectively proposed two approaches to find the HUIs without candidate
generation [32, 56]. In [32], an efficient utility-list (UL) structure was presented to efficiently keep
the potential itemsets into the link-list structure. It uses the simple join operation to generate the
k-itemsets, which outperforms the traditional Apriori-like and Tree-like approaches. Liu et al. then
presented the D2HUP algorithm [31] to use a single phase without generating candidates. A novel
data structure is used to compute a tighter upper bound for pruning the unpromising candidates to
figure out the actual HUIs. Zida et al. [56] then proposed the EFIM approach to find HUIs. It is an
effective algorithm to quickly find all the HUIs in a dataset. In EFIM, a search tree is generated based
on upper bounds, such as sub-tree utility and local utility, to reduce the number of itemsets that
need to be estimated. In this article, two new upper bounds are proposed to reduce the search
space of EFIM. Although EFIM outperforms the previous studies of HUIM, it still does not address
the mining performance problem especially for the sparse datasets. Thus, it is necessary to present
the efficient pruning strategies for further improvements. The major contributions of this article
are described below:
(1) Two well-defined upper bounds are proposed to reduce the size of the unpromising candi-
dates. Thus, it can reduce the runtime and improve the mining performance as compared
to EFIM.
(2) The proposed approach generates fewer branches of the search space than EFIM, which
effectively reduces memory usage in a huge dataset.
(3) The proposed approach also discusses a multiple-thread framework, which can be used
to handle large-scale datasets and can be applied in the MapReduce environment.
(4) Experimental results show that the proposed method estimated fewer candidate itemsets
than the state-of-the-art D2HUP and EFIM algorithms.
The remainder of this article is organized as follows: In Section 2, the related work and pre-
liminaries are described. In Section 3, the proposed algorithm is described in detail. An illustrated
example is shown in Section 4. The experimental results and discussion are given in Section 5.
Finally, conclusions of this article are given in Section 6.
2 BACKGROUND
2.1 Related Work
Data mining technology [5, 7, 11, 14, 15, 18–20, 24] is used to find the potential and implicit in-
formation from a very large dataset, and the most common tasks are FIM and ARM.
The initial breakthrough came when Agrawal and Srikant [1] proposed a method named Apriori.
However, to further improve the mining performance, Han et al. [19] proposed the FP-Growth al-
gorithm with a tree-structure named FP-tree. FIM does not emphasize the importance of items
or their quantities. Therefore, there was a need for weighted FIM (WTI-FWI) [44, 52, 55]. These
methods focus on weight and give importance to the items.
HUIM [3, 10, 34, 36, 41, 45, 47–49] considers the importance of the item quantities (internal
utility) and profit value (external utility). Many applications for HUIs have already been proposed.
They prove that the field of HUIM has important commercial value. These applications include
website click-stream analysis [4, 23, 39], cross-marketing in retail store commercial value [9, 25,
26], mobile commerce environment [37, 38], gene regulation [57], and bio-medical applications.
The initial concept of HUIM was proposed by Yao et al. [48], but it was computationally expensive since
the "combinatorial explosion" problem can easily occur. Liu et al. [33] proposed a two-phase algorithm,
which was based on an Apriori-like approach to find the set of HUIs using multiple database
scans. However, this algorithm generated a large number of candidate sets on each level, and it
caused high computational cost for mining the promising HUIs. Erwin et al. [10] then presented a
CTU-Mine algorithm to discover the HUIs based on the pattern-growth approach.
HUP-Prune [2] was designed to extract the high-utility patterns but it still needed multiple
database scans to find the required HUIs. The Incremental High-Utility Patterns (IHUP) [4] was
also designed to mine the HUIs incrementally and interactively. Although the IHUP avoids the
generate-and-test approach, it still produces a large number of candidates in Phase 1. Lin
et al. [27] developed the HUP-tree structure to keep the necessary information by integrating
the FP-tree-like structure and the TWU methodology. Tseng et al. then respectively presented the
UP-Growth [43] and UP-Growth+ [41] algorithms to efficiently prune the unpromising candidates
early and mine the HUIs based on the designed UP-tree. The above algorithms are mostly based
on the Apriori-like and Tree-like structure for mining the set of HUIs. Many studies focused on
developing efficient pruning strategies for mining HUIs, and some of them are still in
progress [8, 17, 22, 30, 35, 40].
Recently, Liu et al. [32] proposed the mining of HUIs without candidate generation. A utility-
list was used to store the information about utilities for itemsets. These utility-lists also helped
to prune the unnecessary candidates. However, this algorithm used a large amount of memory
for the utility list of each itemset. Liu et al. developed the D2HUP algorithm [31] relying on a
single-phase approach without generating candidates. A novel data structure is also designed to
estimate a tighter upper bound for pruning the unpromising candidates to figure out the actual
HUIs. FHM [12] is an enhanced version of HUI-Miner, which utilizes a novel pruning strategy
named EUCP (Estimated Utility Co-occurrence Pruning) to reduce the costly join operations of
utility-lists. Krishnamoorthy [21] employed several pruning strategies to improve the performance
of the HUI-Miner algorithm. A partitioned utility list (PUL), which maintains utility information at the
partition level, is also designed to keep the necessary information for improving the mining performance.
For further pruning the search space, Zida et al. [56] also used the concept of utility lists and
proposed two upper bounds called sub-tree utility and local utility. These bounds are described
in the following sections. This method also uses the fast utility counting technique to reduce the
memory usage. Additionally, Yun et al. [53] proposed the mining of HUIs for an incremental dataset
environment. This method can effectively update the set of HUIs when new transactions come
into the dataset. Ryan et al. [36] then presented an SIQ algorithm with two pruning strategies
for efficiently mining the HUIs. Ryang and Yun [34] then presented an effective algorithm called
SHU-Growth, which is used to mine the HUIs from the stream environment. The summary of the
famous HUIM algorithms is shown in Table 1.
Several extensions of HUIM are also studied. For example, average-utility itemset mining [28,
50] takes the length of the itemset into account when evaluating the average utility of the itemset. Yun et al. then
also designed the MPM method [50] to mine the high average-utility itemset from the stream
situation. Mining HUIs from stream data [52] is also a challenging topic since it needs to keep
enough information in a time-window for mining the complete HUIs. How to efficiently update
the discovered HUIs in an incremental database is also an important task [16, 53]. Moreover, top-k
HUIM [45] has emerged as an interesting topic in recent years, which is used to find the top-k
HUIs instead of mining the whole HUIs.
2.2 Preliminaries
To efficiently mine the set of HUIs, it is necessary to estimate the lower- and upper-bound values
of an itemset and reduce the size of the candidates for obtaining the actual HUIs. Hence, applying
efficient pruning rules to reduce the number of candidate sets plays a vital role in improving
the performance of the algorithm. Therefore, this study aims to construct a novel tree structure to
generate candidate sets efficiently, and to apply the proposed pruning strategies to significantly
reduce the unnecessary candidate sets in large datasets. Moreover, a multiple-thread framework is considered
to speed up the mining performance for handling the large-scale dataset. Preliminaries are then
defined as follows:
Suppose the finite set of m unique items is I = {i1, i2, ..., im}, and the quantitative database with
a set of transactions is D = {T1, T2, ..., Tn}. Each transaction Tq ∈ D, where 1 ≤ q ≤ n, has a unique
identifier called its TID. Each item ij is associated with a purchase quantity, which is the internal
utility, and with a unit profit, which is the external utility. Internal and external utilities are denoted
by q(ij, Td) and pft(ij), respectively. A set of k unique items X = {i1, i2, ..., ik}, where X ⊆ I, is
said to be a k-itemset; k is the length of the itemset; an itemset X is in transaction Tq if X ⊆ Tq;
and a minimum high-utility threshold δ is defined.
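Although Definitions 2.1–2.3 are not part of this excerpt, the examples below rely on the standard HUIM utility definitions, u(ij, Tq) = q(ij, Tq) × pft(ij) and u(X, Tq) = Σ_{ij ∈ X} u(ij, Tq). A minimal sketch, with hypothetical quantities and unit profits (not the paper's running example):

```python
# Standard HUIM utility definitions (assumed; Definitions 2.1-2.3 are not
# in this excerpt): u(i, T) = q(i, T) * pft(i), u(X, T) = sum over i in X.
pft = {"A": 4, "B": 3, "C": 2}   # external utility (unit profit), hypothetical
T = {"A": 2, "C": 5}             # transaction: item -> purchase quantity

def u_item(i, T):
    return T[i] * pft[i]

def u_itemset(X, T):
    # defined only when X is contained in T; 0 otherwise for simplicity
    if not set(X) <= T.keys():
        return 0
    return sum(u_item(i, T) for i in X)

print(u_item("A", T), u_itemset({"A", "C"}, T))   # 8 18
```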
Definition 2.4. The transaction utility (TU) of a transaction Tq, denoted by TU(Tq), is defined as
follows:
TU(Tq) = Σ_{ij ∈ Tq} u(ij, Tq). (4)
From the illustrative example, the TU of transaction T4 is calculated as TU(T4) = u(C, T4) + u(D, T4) + u(E, T4) = 10 + 28 + 20 = 58. Similarly, the TUs of the other trans-
actions are TU(T1) = 28, TU(T2) = 49, TU(T3) = 47, TU(T5) = 54, and TU(T6) = 46. The total utility is calculated from Equa-
tion (5) as TU = 28 + 49 + 47 + 58 + 54 + 46 = 282. The TWU of the itemset {A, B} is calculated
from Equation (6) as TWU({A, B}) = TU(T1) + TU(T2) + TU(T5) + TU(T6) = 28 + 49 + 54 + 46 =
177. The itemset {A, B} has TWU({A, B}) ≥ TU × δ and is therefore an HTWUI. Similarly, the item-
set {C, D} has u({C, D}) < TU × δ, and it is thus not an HUI.
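The TU and TWU arithmetic above can be sketched directly from the definitions; the database, profits, and threshold below are hypothetical, not the paper's Tables 2 and 3:

```python
pft = {"A": 4, "B": 3, "C": 2}            # hypothetical unit profits
D = {1: {"A": 1, "B": 2},                 # TU = 4 + 6 = 10
     2: {"A": 2, "C": 1},                 # TU = 8 + 2 = 10
     3: {"B": 1, "C": 3}}                 # TU = 3 + 6 = 9

def tu(T):
    # transaction utility: sum of quantity * profit over the transaction
    return sum(q * pft[i] for i, q in T.items())

def twu(X, D):
    # TWU: sum of TU over the transactions that contain itemset X
    return sum(tu(T) for T in D.values() if set(X) <= T.keys())

total = sum(tu(T) for T in D.values())    # total utility of the database
delta = 0.3                               # hypothetical threshold ratio
# {A} is a 1-HTWUI iff TWU({A}) >= total * delta
print(twu({"A"}, D), total, twu({"A"}, D) >= total * delta)   # 20 29 True
```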
Definition 2.9. The total ordering, denoted by ≺, is the ordering of items in the increasing order
of TWU in the transactions. For example, the TWU of each item in D is shown in Table 4. The
increasing order of items in terms of TWU is as follows: F ≺ C ≺ B ≺ D ≺ E ≺ A.
Definition 2.10. The revised transaction, denoted by RT , is said to be a transaction in which all
items that have TW U < TU × δ are removed and the remaining items are sorted in an increasing
order of TW U . The items that are removed from the transactions are considered as unpromising
items.
From the given illustrative example, after removing the unpromising items and arranging the
remaining items in an increasing order of TW U , database D is shown in Table 5.
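The revision step of Definition 2.10 can be sketched as a small filter-and-sort; the TWU table and threshold below are hypothetical:

```python
def revise(T, twu_table, threshold):
    """Drop unpromising items (TWU < threshold) and sort the rest in
    increasing TWU order, per Definition 2.10. Inputs are hypothetical."""
    kept = [i for i in T if twu_table[i] >= threshold]
    return sorted(kept, key=lambda i: twu_table[i])

twu_table = {"F": 5, "C": 20, "B": 25, "D": 30, "E": 35, "A": 40}
print(revise(["A", "F", "C", "E"], twu_table, 10))   # ['C', 'E', 'A']
```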
Definition 2.11. The remaining utility of an itemset X in transaction T, denoted by rem(X, T), is defined as
follows:
rem(X, T) = Σ_{ij ∈ T ∧ z ≺ ij, ∀z ∈ X} u(ij, T). (9)
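In code, rem(X, T) simply sums the utilities of the items of T ranked after every item of X in the total order; the order and per-item utilities below are hypothetical:

```python
ORDER = ["C", "B", "D", "E", "A"]   # an assumed increasing TWU order
RANK = {i: r for r, i in enumerate(ORDER)}

def rem(X, T_util):
    """T_util maps each item of a revised transaction to its utility u(i, T).
    Sum the utilities of items ranked after every item of X."""
    cut = max(RANK[i] for i in X)
    return sum(u for i, u in T_util.items() if RANK[i] > cut)

T5 = {"B": 12, "D": 14, "E": 12, "A": 16}   # hypothetical utilities
print(rem({"B", "D"}, T5))   # E + A = 12 + 16 = 28
```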
The projected transaction merging is the method of merging k identical projected transactions
(γT) into a single transaction whose per-item utility is the sum over the merged transactions:
u(ij, γTM) = Σ_{p=1..k} u(ij, γTp),
where k is the number of identical projected transactions.
From the illustrative example in Table 5, considering γ = {B}, γ-D obtains the projected trans-
actions (D, A) from T1, (E, A) from T2, (D, E, A) from T5, and (D, E, A) from T6. The projected
transactions from T5 and T6 are merged to form a single projected transaction in the γ-D database.
That is, the new projected database will have (D, A), (E, A), and (D, E, A).
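Projection plus merging can be sketched as grouping identical projected item sequences and summing their per-item utilities; the revised transactions below are hypothetical:

```python
from collections import Counter, defaultdict

ORDER = ["C", "B", "D", "E", "A"]   # assumed increasing TWU order
RANK = {i: r for r, i in enumerate(ORDER)}

def project_and_merge(gamma, db):
    """db: list of revised transactions as item -> utility dicts.
    Keep transactions containing gamma, retain only items after gamma
    in the order, and merge identical projections by summing utilities."""
    cut = max(RANK[i] for i in gamma)
    merged = defaultdict(Counter)
    for T in db:
        if not set(gamma) <= T.keys():
            continue
        proj = {i: u for i, u in T.items() if RANK[i] > cut}
        key = tuple(sorted(proj, key=RANK.get))
        merged[key].update(proj)          # identical keys merge here
    return {k: dict(v) for k, v in merged.items()}

db = [{"B": 12, "D": 14, "A": 16},
      {"B": 6, "E": 2, "A": 24},
      {"B": 9, "D": 7, "E": 12, "A": 12},
      {"B": 3, "D": 5, "E": 4, "A": 8}]
out = project_and_merge({"B"}, db)
# The two (D, E, A) projections merge into one with summed utilities.
print(out[("D", "E", "A")])   # {'D': 12, 'E': 16, 'A': 20}
```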
Definition 2.14. The sub-tree utility of an itemset γ and an item x that can extend γ, denoted by
subU(γ, x), is defined as follows:
subU(γ, x) = Σ_{(γ∪{x}) ⊆ T} [ u(γ, T) + u(x, T) + Σ_{ij ∈ T ∧ ij ∈ Ex(γ∪{x})} u(ij, T) ]. (10)
This sub-tree utility is one of the pruning strategies used to reduce the search space. If subU (γ , x ) <
TU × δ , then the itemset γ ∪ {x } and the following nodes (itemsets) can be pruned. As shown in
the illustrative example in Figure 1, if subU (∅, D) is less than TU × δ , then the following itemsets
{D, E}, {D, A}, and {D, E, A} can be pruned.
Definition 2.15. The local utility of an itemset γ and an item x, denoted by locU(γ, x), is defined as follows:
locU(γ, x) = Σ_{(γ∪{x}) ⊆ T} [ u(γ, T) + rem(γ, T) ]. (11)
Obviously, the local utility is always no smaller than the sub-tree utility for an itemset. However, it
prunes in a broader way: if locU(γ, x) < TU × δ, then all of the branches
following itemset γ with item x can be pruned. If locU ({C}, E) is less than TU × δ , then the nodes
after itemsets {C, B, E}, {C, D, E}, and {C, E} do not need to be estimated, as shown in the illustrative
example in Figure 1.
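Both bounds can be sketched over a projected database. The transactions and u(γ, T) values below are hypothetical, and the code simplifies Equations (10) and (11) by treating every item ranked after x as a possible extension, without the Ex(·) machinery:

```python
ORDER = ["C", "B", "D", "E", "A"]   # assumed increasing TWU order
RANK = {i: r for r, i in enumerate(ORDER)}

def bounds(gamma, x, db, u_gamma):
    """Compute (subU, locU) for extending itemset gamma with item x.
    db: revised transactions as item -> utility dicts;
    u_gamma: u(gamma, T) for each transaction, aligned with db."""
    sub = loc = 0
    cut = max(RANK[g] for g in gamma)
    for T, ug in zip(db, u_gamma):
        if not (set(gamma) | {x}) <= T.keys():
            continue                 # T does not contain gamma with x
        loc += ug + sum(u for i, u in T.items() if RANK[i] > cut)
        sub += ug + T[x] + sum(u for i, u in T.items()
                               if RANK[i] > RANK[x])
    return sub, loc

db = [{"C": 10, "B": 21, "E": 6, "A": 12},
      {"C": 30, "A": 16},
      {"C": 10, "D": 28, "E": 20}]
sub, loc = bounds({"C"}, "E", db, [10, 30, 10])
print(sub, loc)   # 58 107 -- subU is never larger than locU
```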
3 PROPOSED ALGORITHM
This section describes the proposed algorithm, HUIM with Pruning Strategies (HUI-PR), for
determining the set of HUIs. It consists of the following: (1) the construction of itemsets in a tree
structure, (2) the node selection rule, (3) the pruning strategies, and (4) the main algorithm to find
the set of HUIs. The proposed method is an extended approach for EFIM, which is the state-of-the-
art approach. We then propose a stricter upper bound for HUIs and enhance the abilities of the
pruning strategies. In addition, the proposed method also provides a multiple thread framework
to run in a parallel environment.
For the construction of itemsets in a tree structure, high-transaction weighted utilization item-
sets with one item in each itemset (1-HTWUIs) are prepared. The TWU of each item is used to
find 1-HTWUIs, which must be higher than the threshold. This helps to reduce the number of
unnecessary branches to traverse in the tree by pruning. The unpromising items are removed
from each transaction by scanning the database. After the removal of unpromising items, any
empty transactions in the database are removed and the items in each transaction are sorted
based on the total ordering as described in Definition 2.9. In the node selection rule, how the node
is traversed is explained in detail to find the itemset. A new strict sub-tree is constructed in the
recursive method and the nodes are visited based on the node selection rule. The pruning of the
nodes using strict local utility will be explained. These pruning rules help in avoiding unnecessary
traversing once an itemset is no longer feasible. Details are described below:
The following items, f i(X), are used to keep the items that can still be attached to the current itemset and
to generate the new candidate itemsets for estimation. The definition of strict local utility (slocU) is
given here. The concept of strict local utility is very similar to the traditional local utility, but it prunes
some of the overestimated utilities. It includes the remaining utility of the estimated itemset X. A
Both remaining utility and strict remaining utility are used to estimate the potential utility for
the following itemsets from the current itemset in the searching tree structure. Thus, if an item
does not exist in f i (X ), then it is impossible to provide any utility to the following candidate
itemsets. Strict remaining utility resolves this problem and prunes the overestimated utility from
the remaining utility. Therefore, it can obtain a smaller upper bound for HUI mining and effectively
reduce the number of candidate itemsets. An illustrative example is given in Table 5; assume
that the current itemset is {B, D}. It is easy to calculate the remaining utility of itemset {B, D} from
T5 and T6 . In T5 , there are 6 × 2 from E and 4 × 4 from A. In the same way, there are 1 × 2 from E
and 6 × 4 from A in T6 . However, if f i ({B, D}) is {A}, this means that the process will not consider
the item E combined with the itemset {B, D} as a new candidate itemset. Therefore, the utility from
E does not need to be accumulated in the remaining utility. Thus, the strict remaining
utility of itemset {B, D} is 4 × 4 = 16 in T5 and 6 × 4 = 24 in T6. The strict remaining utility is then used
to define a new upper bound, called the strict local utility, as follows:
slocU(γ, x) = Σ_{(γ∪{x}) ⊆ T} [ u(γ, T) + srem(γ, T) ], (14)
where a transaction T contributes 0 to the sum if there is no item y ∈ T such that i ≺ y for all
i ∈ γ and y ≺ x.
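The srem arithmetic of the {B, D} example can be reproduced with a small sketch; the E and A quantities and profits follow the 6 × 2 / 4 × 4 figures in the text, while the other quantities are hypothetical:

```python
ORDER = ["C", "B", "D", "E", "A"]   # increasing TWU order from the example
RANK = {i: r for r, i in enumerate(ORDER)}
pft = {"E": 2, "A": 4}              # unit profits implied by the example

def srem(X, T_qty, fi):
    """Strict remaining utility: like rem(X, T), but only items after X
    that also appear in fi(X) may contribute."""
    cut = max(RANK[i] for i in X)
    return sum(q * pft[i] for i, q in T_qty.items()
               if RANK[i] > cut and i in fi)

T5 = {"B": 3, "D": 1, "E": 6, "A": 4}   # quantities (E: 6x2, A: 4x4)
T6 = {"B": 2, "D": 1, "E": 1, "A": 6}   # quantities (E: 1x2, A: 6x4)
fi = {"A"}                              # E is no longer a following item

print(srem({"B", "D"}, T5, fi), srem({"B", "D"}, T6, fi))   # 16 24
```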
The concept of strict local utility is very similar to local utility. There are two different improve-
ments to further reduce the estimated utility for candidate HUIs. The first is using strict remaining
utility to replace remaining utility in each related transaction. The second is that if there is no item
existing between γ and x by the increasing order of TW U , then this transaction can be ignored.
There is a simple example here to explain the second improvement. Assume γ is {C, B} and x is
E in Equation (14). slocU(γ, x) is used to decide whether to estimate the itemset {C, B, D} and the
following itemsets. Consider T2 in Table 5 (it is {C, B, E, A}); it includes the itemset {C, B} and the item
E. However, it cannot provide any utility to the itemset {C, B, D} and its following itemsets, since
there is no item between the itemset {C, B} and E in T2. The strict local utility therefore ignores transaction T2
directly. The general theorem and its proof are described below:
Theorem 3.1. Assume the increasing order O of TWU is i1 ≺ i2 ≺ ··· ≺ in and that slocU(γ, x) is less
than the utility threshold. Then, for any z ∈ f i(γ), the itemset I = {ip1, ip2, ..., ipn, z, x} and its
following itemsets are definitely not high-utility itemsets, where γ = {ip1, ip2, ..., ipn} and the
order of the items in I is sorted by O.
Proof.
u(I) = Σ_{I ⊆ T} u(I, T) = Σ_{γ∪{z}∪{x} ⊆ T} (u(γ, T) + u(z, T) + u(x, T))
≤ Σ_{ipn ≺ y ≺ x} Σ_{γ∪{y}∪{x} ⊆ T} (u(γ, T) + u(y, T) + u(x, T))
≤ Σ_{(γ∪{x}) ⊆ T} [ u(γ, T) + srem(γ, T) ]
= slocU(γ, x),
where a transaction T contributes 0 to the last sum if there is no item y ∈ T with i ≺ y for all i ∈ γ and y ≺ x.
Therefore, if slocU(γ, x) < threshold, u(I) is absolutely less than the threshold. It should be noted
that the formula in the second line is also larger than the utility of the following itemsets of I.
Proof. Due to Theorem 3.1, if an item does not exist in f i(γ), it will not be considered to
form a candidate itemset. ssubU(γ, x) accumulates all of the utility of γ, of x, and of the items after x
in the order O that exist in f i(γ). Therefore, ssubU(γ, x) is an upper bound of u(I). In other words, if
ssubU(γ, x) is less than the threshold, then I and its following itemsets are not high-utility itemsets.
4 Scan D to remove each item i ∉ f i(∅) from D and delete empty transactions;
5 Sort each transaction in D according to ≺;
6 Calculate the sub-tree utility subU(∅, i) for each item i ∈ f i(∅) by scanning D;
7 The next items for ∅, ni(∅) = {i | i ∈ f i(∅) ∧ subU(∅, i) ≥ minutil};
8 return RecursiveSearch(∅, D, ni(∅), f i(∅), minutil);
ALGORITHM 2: RecursiveSearch
Input: An itemset α; the projected database of α, α–D; the next items of α, ni(α); the following items
of α, f i(α); and the minimal threshold, minutil.
Output: The set of high-utility itemsets that are extended from α.
1 Set HUIα = ∅;
2 for each item i ∈ ni(α) do
3   β = α ∪ {i};
4   Scan α–D to calculate u(β) and create the projected database of β, β–D;
5   if u(β) ≥ minutil then
6     HUIα = HUIα ∪ {β};
7   end
8   if β–D ≠ ∅ then
9     Calculate ssubU(β, z) and slocU(β, z) for each item z ∈ f i(α) by scanning β–D;
10    ni(β) = {z ∈ f i(α) | ssubU(β, z) ≥ minutil};
11    f i(β) = {z ∈ f i(α) | slocU(β, z) ≥ minutil};
12    HUIα = HUIα ∪ RecursiveSearch(β, β–D, ni(β), f i(β), minutil);
13  end
14 end
15 return HUIα;
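The overall recursion of Algorithm 2 can be sketched as follows. This is a simplified stand-in rather than HUI-PR itself: items are processed in lexicographic order instead of TWU order, and a plain "utility of β plus utilities of items after x" bound replaces ssubU and slocU.

```python
# A runnable sketch of the RecursiveSearch control flow in Algorithm 2.
# Assumptions: lexicographic processing order and a generic upper bound,
# not the paper's exact ssubU/slocU pruning.

def mine(db, pft, minutil):
    items = sorted({i for T in db for i in T})
    util = [{i: q * pft[i] for i, q in T.items()} for T in db]

    def u(X):
        # exact utility of itemset X over the whole database
        return sum(sum(T[i] for i in X)
                   for T in util if set(X) <= T.keys())

    def upper(X, x):
        # sound upper bound on u(X + {x}) and all of its extensions
        tot = 0
        for T in util:
            if set(X) | {x} <= T.keys():
                tot += sum(v for i, v in T.items()
                           if i in X or i == x or i > x)
        return tot

    huis = {}

    def search(alpha, candidates):
        for k, x in enumerate(candidates):
            if upper(alpha, x) < minutil:
                continue                       # prune this whole branch
            beta = alpha + (x,)
            val = u(beta)
            if val >= minutil:
                huis[beta] = val
            search(beta, candidates[k + 1:])   # recurse on later items only

    search((), tuple(items))
    return huis

db = [{"a": 2, "b": 1}, {"a": 1, "b": 3}, {"b": 2, "c": 4}]
pft = {"a": 5, "b": 2, "c": 1}
print(mine(db, pft, 12))   # {('a',): 15, ('a', 'b'): 23, ('b',): 12}
```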
4 if i ≠ n − 1 then
5   stopTID = round(interval × (i + 1)) − 1;
6 else
7   stopTID = size − 1;
8 end
9 for (j = startTID; j ≤ stopTID; ++j) do
10   t = D.get(j);
11   for each item m in t do
12     locU(m) = locU(m) + transU(t);
13   end
14 end
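The TID partitioning above can be sketched with Python threads. The per-thread counters and the final merge are assumptions, since the surrounding pseudocode (thread creation and result combination) is not shown in this excerpt:

```python
import threading
from collections import Counter

def partition(size, n):
    """TID ranges per the pseudocode: thread i covers
    [round(interval*i), round(interval*(i+1)) - 1], last thread to size-1."""
    interval = size / n
    ranges = []
    for i in range(n):
        start = round(interval * i)
        stop = round(interval * (i + 1)) - 1 if i != n - 1 else size - 1
        ranges.append((start, stop))
    return ranges

def parallel_local_utility(db, trans_util, n=2):
    parts = [Counter() for _ in range(n)]    # one counter per thread

    def work(i, start, stop):
        for j in range(start, stop + 1):
            for item in db[j]:
                parts[i][item] += trans_util[j]   # locU(m) += transU(t)

    threads = [threading.Thread(target=work, args=(i, s, e))
               for i, (s, e) in enumerate(partition(len(db), n))]
    for t in threads: t.start()
    for t in threads: t.join()
    return sum(parts, Counter())             # merge per-thread counters

db = [["A", "B"], ["B", "C"], ["A", "C"], ["C"]]
tu = [10, 7, 5, 3]
lu = parallel_local_utility(db, tu)
print(lu["A"], lu["B"], lu["C"])   # 15 17 15
```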
4 AN ILLUSTRATED EXAMPLE
In this section, an example for revealing HUIs from the quantitative database D in Table 2 and the
profit table in Table 3 is given. We assume the user-specified threshold minutil = 48. The process
is described below.
(1) Input file
In the beginning, the proposed HUI-PR performs a pre-process step to load the input
database into memory and obtain a profile for this database. Then, the process can obtain
a table of the transaction weighted utilities (these are also the local-utility values of ∅ with
each item) in Table 4. Next, an increasing order of TWU (without the unpromising items),
C ≺ B ≺ D ≺ E ≺ A, and a database without unpromising items, with the items arranged in
increasing order of TWU, can be obtained in Table 5.
(2) Initial stage
(ALGORITHM 1 lines 1–7)
locU(∅, C) = 154, locU(∅, B) = 177, locU(∅, D) = 186, locU(∅, E) = 207, locU(∅, A) = 224.
subU(∅, C) = 153, subU(∅, B) = 167, subU(∅, D) = 149, subU(∅, E) = 92, subU(∅, A) = 80.
f i (∅) = {C, B, D, E, A}, ni (∅) = {C, B, D, E, A}.
The projected database of ∅, ∅-D (item : utility), is as follows:
B : 12, D : 14, E : 12, A : 16
B : 6, D : 14, E : 2, A : 24
C : 10, B : 21, E : 6, A : 12
B : 9, D : 7, A : 12
C : 30, A : 16
C : 10, D : 28, E : 20
(ALGORITHM 1 line 8)
Then, perform RecursiveSearch(∅, ∅-D, ni(∅), f i(∅), minutil) to search for HUIs.
(3) Estimate ∅ ∪ {C} (perform RecursiveSearch(∅, ∅-D, ni(∅), f i(∅), minutil); C is the first one
in ni(∅), (ALGORITHM 2 line 2))
(ALGORITHM 2 line 4)
The projected database of {C}, {C}-D (item : utility) is as follows:
B : 21, E : 6, A : 12
A : 16
D : 28, E : 20
(ALGORITHM 2 lines 5–7)
u({C}) = 48; {C} is not an HUI.
slocU ({C}, B) = 0, slocU ({C}, D) = 0,
slocU ({C}, E) = 107, slocU ({C}, A) = 49,
ssubU ({C}, B) = 49, ssubU ({C}, D) = 58,
ssubU ({C}, E) = 58, ssubU ({C}, A) = 68.
(ALGORITHM 2 lines 10–11)
f i ({C}) = {E, A}, ni ({C}) = {B, D, E, A}.
(4) Estimate {C} ∪ {B} (ALGORITHM 2 line 2)
(ALGORITHM 2 line 4)
The projected database of {C, B}, {C, B}-D (item : utility) is as follows:
E : 6, A : 12
(ALGORITHM 2 lines 5–7)
u({C, B}) = 31; {C, B} is not an HUI.
slocU ({C, B}, E) = 0, slocU ({C, B}, A) = 49,
ssubU ({C, B}, E) = 49, ssubU ({C, B}, A) = 43.
(ALGORITHM 2 lines 10–11)
f i ({C, B}) = {A}, ni ({C, B}) = {E}.
(5) Estimate {C, B} ∪ {E} (ALGORITHM 2 line 2)
(ALGORITHM 2 line 4)
The projected database of {C, B, E}, {C, B, E}-D (item : utility) is:
A : 12
(ALGORITHM 2 lines 5–7)
u({C, B, E}) = 37; {C, B, E} is not an HUI.
slocU ({C, B, E}, A) = 0, ssubU ({C, B, E}, A) = 49.
(ALGORITHM 2 lines 10–11)
f i ({C, B, E}) = ∅, ni ({C, B, E}) = {A}.
(6) Estimate {C, B, E} ∪ {A} (ALGORITHM 2 line 2)
(ALGORITHM 2 lines 5–7)
u({C, B, E, A}) = 49; {C, B, E, A} is an HUI.
HUIs ← {C, B, E, A}.
(7) Estimate {C} ∪ {D} (ALGORITHM 2 line 2)
(ALGORITHM 2 line 4)
The projected database of {C, D}, {C, D}-D (item : utility) is:
E : 20
(ALGORITHM 2 lines 5–7)
u({C, D}) = 38; {C, D} is not an HUI.
slocU ({C, D}, E) = 0, slocU ({C, D}, A) = 0,
ssubU ({C, D}, E) = 58, ssubU ({C, D}, A) = 0.
(ALGORITHM 2 lines 10–11)
f i({C, D}) = ∅, ni({C, D}) = {E}.
(8) Estimate {C, D} ∪ {E} (ALGORITHM 2 line 2)
u({C, D, E}) = 58; {C, D, E} is an HUI.
5 EXPERIMENTAL RESULTS
Experiments for the proposed HUI-PR, the state-of-the-art D2HUP [31], and EFIM algorithms [56]
were performed to find high utility itemsets from several datasets. To compare the algorithms,
Chess
Threshold   EFIM    D2HUP    HUI-PR   HUI-PR∗
0.25        1,390   11,279   1,390    1,390
0.255         966    8,844     965      964
0.26          704    7,027     703      701
0.265         518    5,610     518      515
0.27          392    4,520     392      390

Mushroom
Threshold   EFIM    D2HUP    HUI-PR   HUI-PR∗
0.14          395    1,819     395      395
0.1425        296    1,231     296      296
0.145         213    1,055     213      213
0.1475        148    1,044     148      148
0.15           90      984      89       88

Connect
Threshold   EFIM    D2HUP    HUI-PR   HUI-PR∗
0.289       3,026   32,334   3,026    3,021
0.291       2,378   27,463   2,378    2,375
0.293       1,889   23,223   1,889    1,889
0.295       1,535   19,925   1,535    1,533
0.297       1,307   17,507   1,307    1,305

Accidents
Threshold   EFIM    D2HUP    HUI-PR   HUI-PR∗
0.131       1,387    7,009   1,386    1,386
0.134       1,096    5,969   1,095    1,095
0.137         868    5,073     868      868
0.14          670    4,274     670      670
0.143         548    3,691     548      548

Retail
Threshold   EFIM    D2HUP    HUI-PR   HUI-PR∗
0.003         425    2,388     425      425
0.004         174    1,198     174      174
0.005          95      819      95       95
0.006          59      576      59       59
0.007          47      419      47       47

Footmart
Threshold   EFIM    D2HUP    HUI-PR   HUI-PR∗
0.0011      1,542    5,185   1,542    1,542
0.0012      1,524    2,945   1,524    1,524
0.0013      1,496    1,980   1,495    1,495
0.0014      1,455    1,671   1,455    1,455
0.0015      1,383    1,580   1,383    1,383
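Assuming the tabulated values are candidate counts (this section notes that D2HUP always estimates more candidates than the other algorithms), the relative reduction can be read off directly. For example, on chess at threshold 0.25:

```python
# Candidate counts read from the table (chess, threshold 0.25).
d2hup, hui_pr = 11_279, 1_390

reduction = 1 - hui_pr / d2hup
print(f"HUI-PR evaluates {reduction:.1%} fewer candidates than D2HUP")
```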
the experiments measured run time, the number of candidate itemsets, the number of transaction occurrences, and the number of upper-bound calculations. The experiments were implemented in Java and executed on a personal computer with 8 GB of 1,867 MHz DDR3 memory, a 2.7 GHz Intel Core i5 CPU, running macOS High Sierra.
The real-world datasets [13] were used in the experiments to compare the designed HUI-PR with the D2HUP and EFIM algorithms. Table 6 shows the characteristics of the datasets, where #|D|, #|I|, AvgLen, MaxLen, and Type denote the total number of transactions, the number of distinct items, the average transaction length, the maximum transaction length, and the dataset type, respectively. For each dataset, the experiment was conducted 100 times and the results were averaged.
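The repeat-and-average protocol above can be sketched as follows; the miner passed in is a trivial stand-in, not the actual HUI-PR implementation:

```python
import time
import statistics

def average_runtime(mine, dataset, runs=100):
    """Run the miner `runs` times on `dataset` and average the wall-clock
    time, mirroring the repeat-and-average protocol described above."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        mine(dataset)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)

# Usage with a trivial stand-in miner:
avg = average_runtime(lambda db: sorted(db), list(range(1000)), runs=5)
```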
In Figure 3, HUI-PR performed much better than the EFIM and D2HUP algorithms. This shows that the proposed improvements were effective in reducing the search time for mining HUIs in a dataset. In general, D2HUP performed better on sparse datasets, because on such datasets the pruning strategies of EFIM and HUI-PR could not prune candidate itemsets effectively and much time was wasted scanning the dataset. HUI-PR was also more sensitive than EFIM to these subtle differences. It is worth noting that the run times of HUI-PR∗ were much longer than those of HUI-PR, and sometimes even longer than those of EFIM. This is because the strict remaining utility requires an intersection operation to tighten the utility bound, which costs more time than it saves. However, it does obtain more accurate upper bounds for HUIs. HUI-PR∗ is discussed further in the following sections.
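The intersection operation mentioned above can be pictured as follows. This is a plausible sketch of the idea, not the paper's exact definition: the strict variant intersects the remaining items of a transaction with the set of still-promising extension items before summing, which tightens the bound at the cost of the extra intersection. The item order and transaction are illustrative assumptions.

```python
# Illustrative transaction (item -> utility) and an assumed item order.
ORDER = ["C", "B", "D", "E", "A"]
rank = {it: i for i, it in enumerate(ORDER)}

def remaining_utility(tx, itemset):
    """Classic bound: utility of ALL items ranked after the itemset."""
    cutoff = max(rank[i] for i in itemset)
    return sum(u for i, u in tx.items() if rank[i] > cutoff)

def strict_remaining_utility(tx, itemset, promising):
    """Stricter variant: intersect the remaining items with the set of
    still-promising extension items before summing (hence the extra
    intersection cost discussed above)."""
    cutoff = max(rank[i] for i in itemset)
    rest = {i for i in tx if rank[i] > cutoff} & promising
    return sum(tx[i] for i in rest)

tx = {"C": 3, "D": 6, "E": 20, "A": 11}
print(remaining_utility(tx, {"C", "D"}))                # E and A counted
print(strict_remaining_utility(tx, {"C", "D"}, {"E"}))  # only E counted
```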
Table 9. The Times of the Upper Bounds Calculation on the Six Real Datasets
D2HUP always estimates far more candidates during its process than HUI-PR and EFIM, and the proposed algorithms spend less time finding all the HUIs in a dataset. Figure 4 shows the memory usage of EFIM and HUI-PR in the experiments. It shows that the proposed HUI-PR uses less memory in a real running environment. The proposed HUI-PR algorithm is clearly better suited to dense datasets, where a strict upper bound avoids calculating many overestimated itemsets. On sparse datasets, the effect of the proposed HUI-PR on memory usage is not evident, because the differences between the itemsets' utilities are large and the itemsets are easily separated and classified. In the experimental results, the number of upper-bound calculations was reduced by thousands, and the reduction exceeded 8% on chess, connect, and accidents compared with the traditional EFIM method. Thus, the proposed method can save considerable memory on dense datasets. However, as shown in the previous section, HUI-PR∗ did not show the advantage of its more accurate upper bounds in the runtime experiments. This is because EFIM and HUI-PR use the projected-database method to compress and reformat the input dataset; a smaller dataset reduces the advantage of HUI-PR∗ while exposing the computational cost of the strict upper bound. The same issue appears in the multi-threaded version of HUI-PR.
There are several reasons for this situation. First, creating a thread is expensive, and the computer must assign additional resources to each new one. Second, a multi-threaded program must handle critical sections, and keeping the shared data consistent always costs considerable time. Moreover, if the dataset is not large-scale, multi-threading cannot demonstrate the benefit of parallel computation, since in the experiments D2HUP, EFIM, and HUI-PR all applied the projected-dataset process to decrease the size of the dataset. On the "connect" dataset, HUI-PR(2) eventually performed better than HUI-PR, since more transactions and items were processed and the effect of the projected-dataset process is not obvious on "connect". We therefore conclude that the multi-threaded program can help speed up mining performance, especially on large-scale datasets without the projection operation.
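The multi-threaded variant discussed above partitions the search at the first level of the enumeration tree. A minimal sketch using a thread pool follows; the per-branch miner is a trivial stand-in, as a real implementation would project the database on the branch item and recurse:

```python
from concurrent.futures import ThreadPoolExecutor

def mine_subtree(prefix_item, database, min_util):
    """Stand-in for exploring one first-level branch of the search tree;
    a real miner would project the database on `prefix_item` and recurse."""
    total = sum(tx.get(prefix_item, 0) for tx in database)
    return [(prefix_item, total)] if total >= min_util else []

def mine_parallel(items, database, min_util, threads=2):
    # Each first-level item becomes an independent task, as in HUI-PR(2).
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(
            lambda i: mine_subtree(i, database, min_util), items))
    return [hui for branch in results for hui in branch]

db = [{"A": 5, "B": 2}, {"A": 7}, {"B": 4, "C": 1}]
huis = mine_parallel(["A", "B", "C"], db, min_util=6)
```

Because the branches share only read access to the database, no locking is needed here; contention arises only when merging the per-branch results.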
6 CONCLUSION
In this article, we have proposed a novel HUI mining approach called HUI-PR to reduce the search space while finding the set of high-utility itemsets. HUI-PR introduces two new upper bounds and reduces the number of candidate itemsets, which avoids spending computation time on unnecessary itemsets compared to previous works. Mathematically, the proposed method provides more accurate upper bounds and helps the process find HUIs effectively. However, we also found that the proposed upper bounds and parallel computation could not show their advantages on a small-scale dataset. The traditional EFIM loads all of the data into memory and duplicates several modified versions of the original dataset. This can indeed increase the performance of EFIM and HUI-PR, but it is impractical in a real application. Real-world datasets are often extremely large; they usually cannot be loaded into memory and are not allowed to be modified. In this case, performing the projected-dataset process is impossible. However, we have already shown that the proposed new upper bounds and parallel computation for finding HUIs are useful. In the future, we will extend the designed multi-threaded approach to a cloud computing model (such as a MapReduce framework) to reveal HUIs in large-scale datasets.
REFERENCES
[1] Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast algorithms for mining association rules. In International Con-
ference on Very Large Data Bases, Vol. 1215. 487–499.
[2] Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, and Young-Koo Lee. 2009. An efficient
candidate pruning technique for high utility pattern mining. In The Pacific-Asia Conference on Knowledge Discovery
and Data Mining. ACM, 749–756.
[3] Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, and Young-Koo Lee. 2009. Efficient
tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data
Engineering 21, 12 (2009), 1708–1721.
[4] Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, and Young-Koo Lee. 2009. Efficient
tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data
Engineering 21, 12 (2009), 1708–1721.
[5] Brock Barber and Howard J. Hamilton. 2003. Extracting share frequent itemsets with infrequent subsets. Data Mining
and Knowledge Discovery 7, 2 (2003), 153–185.
[6] Raymond Chan, Qiang Yang, and Yi-Dong Shen. 2003. Mining high utility itemsets. In International Conference on
Data Mining. IEEE, 19–26.
[7] Ming-Syan Chen, Jiawei Han, and Philip S. Yu. 1996. Data mining: An overview from a database perspective. IEEE
Transactions on Knowledge and Data Engineering 8, 6 (1996), 866–883.
[8] Chun-Jung Chu, Vincent S. Tseng, and Tyne Liang. 2009. An efficient algorithm for mining high utility itemsets with
negative item values in large databases. Applied Mathematics and Computation 215, 2 (2009), 767–778.
[9] Alva Erwin, Raj P. Gopalan, and N. R. Achuthan. 2008. Efficient mining of high utility itemsets from large datasets.
In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 554–561.
[10] Alva Erwin, Raj P. Gopalan, and N. R. Achuthan. 2007. CTU-Mine: An efficient high utility itemset mining algorithm
using the pattern growth approach. In The International Conference on Computer and Information Technology. 71–76.
[11] Philippe Fournier-Viger, Jerry Chun-Wei Lin, Rage Uday Kiran, Yun-Sing Koh, and Rincy Thomas. 2017. A survey of
sequential pattern mining. Data Science and Pattern Recognition 1, 1 (2017), 54–77.
[12] Philippe Fournier-Viger, Cheng-Wei Wu, Souleymane Zida, and Vincent S. Tseng. 2014. FHM: Faster high-utility item-
set mining using estimated utility co-occurrence pruning. In International Symposium on Methodologies for Intelligent
Systems. Troels Andreasen, Henning Christiansen, Juan-Carlos Cubero, and Zbigniew W. Raś (Eds.), Springer, 83–92.
[13] Bart Goethals. 2012. Frequent itemset mining dataset repository. Retrieved from http://fimi.ua.ac.be/data.
[14] Wensheng Gan, Jerry Chun-Wei Lin, Han-Chieh Chao, and Justin Zhan. 2017. Data mining in distributed environ-
ment: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 7, 6 (2017), e1216.
[15] Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Tzung-Pei Hong, and Hamido Fujita.
2018. A survey of incremental high-utility itemset mining. Wiley Interdisciplinary Reviews: Data Mining and Knowl-
edge Discovery 8, 2 (2018), e1242.
[16] Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, Tzung-Pei Hong, and Hamido Fujita.
2018. A survey of incremental high-utility itemset mining. WIRES Data Mining and Knowledge Discovery 8, 2 (2018),
e1242.
[17] Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, and Philip S. Yu. 2019. HUOPM:
High-utility occupancy pattern mining. IEEE Transactions on Cybernetics (2019), 1–14.
[18] Wensheng Gan, Jerry Chun-Wei Lin, Philippe Fournier-Viger, Han-Chieh Chao, and Philip S. Yu. 2019. A survey of
parallel sequential pattern mining. ACM Transactions on Knowledge Discovery from Data 13, 3 (2019), 25.
[19] Jiawei Han, Jian Pei, and Yiwen Yin. 2000. Mining frequent patterns without candidate generation. In ACM Sigmod
Record, Vol. 29. ACM, 1–12.
[20] Tzung-Pei Hong, Jimmy Ming-Tai Wu, Yan-Kang Li, and Chun-Hao Chen. 2018. Generalizing concept-drift patterns
for fuzzy association rules. Journal of Network Intelligence 3, 2 (2018), 126–137.
[21] Srikumar Krishnamoorthy. 2015. Pruning strategies for mining high utility itemsets. Expert Systems with Applications
42, 5 (2015), 2371–2381.
[22] Guo-Cheng Lan, Tzung-Pei Hong, and Vincent S. Tseng. 2014. An efficient projection-based indexing approach for
mining high utility itemsets. Knowledge and Information Systems 38, 1 (2014), 85–107.
[23] Hua-Fu Li, Hsin-Yun Huang, Yi-Cheng Chen, Yu-Jiun Liu, and Suh-Yin Lee. 2008. Fast and memory efficient mining
of high utility itemsets in data streams. In IEEE International Conference on Data Mining. IEEE, 881–886.
[24] Yu-Chiang Li, Jieh-Shan Yeh, and Chin-Chen Chang. 2005. Direct candidates generation: A novel algorithm for discov-
ering complete share-frequent itemsets. In The International Conference on Fuzzy Systems and Knowledge Discovery,
Lipo Wang and Yaochu Jin (Eds.). Springer, 551–560.
[25] Yu-Chiang Li, Jieh-Shan Yeh, and Chin-Chen Chang. 2005. Direct candidates generation: A novel algorithm for dis-
covering complete share-frequent itemsets. In International Conference on Fuzzy Systems and Knowledge Discovery.
Springer, 551–560.
[26] Yu-Chiang Li, Jieh-Shan Yeh, and Chin-Chen Chang. 2008. Isolated items discarding strategy for discovering high
utility itemsets. Data & Knowledge Engineering 64, 1 (2008), 198–217.
[27] Chun-Wei Lin, Tzung-Pei Hong, and Wen-Hsiang Lu. 2011. An effective tree structure for mining high utility itemsets.
Expert Systems with Applications 38, 6 (2011), 7419–7424.
[28] Jerry Chun-Wei Lin, Shifeng Ren, Philippe Fournier-Viger, Tzung-Pei Hong, Ja-Hwung Su, and Bay Vo. 2017. A fast
algorithm for mining high average-utility itemsets. Applied Intelligence 47, 2 (2017), 331–346.
[29] Jerry Chun-Wei Lin, Shifeng Ren, Philippe Fournier-Viger, Jeng-Shyan Pan, and Tzung-Pei Hong. 2019. Efficiently
updating the discovered high average-utility itemsets with transaction insertion. Engineering Applications of Artificial
Intelligence 72, C (2019), 136–149.
[30] Jerry Chun-Wei Lin, Lu Yang, Philippe Fournier-Viger, and Tzung-Pei Hong. 2019. Mining of skyline patterns by
considering both frequent and utility constraints. Engineering Applications of Artificial Intelligence 77 (2019), 229–
238.
[31] Junqiang Liu, Ke Wang, and Benjamin C. M. Fung. 2012. Direct discovery of high utility itemsets without candidate
generation. In The International Conference on Data Mining. IEEE, 984–989.
[32] Mengchi Liu and Junfeng Qu. 2012. Mining high utility itemsets without candidate generation. In The International Conference on Information and Knowledge Management. ACM, 55–64.
[33] Ying Liu, Wei-keng Liao, and Alok Choudhary. 2005. A two-phase algorithm for fast discovery of high utility itemsets.
In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 689–695.
[34] Heungmo Ryang and Unil Yun. 2016. High utility pattern mining over data streams with sliding window technique.
Expert Systems with Applications 57 (2016), 214–231.
[35] Heungmo Ryang and Unil Yun. 2017. Indexed list-based high utility pattern mining with utility upper-bound reduc-
tion and pattern combination techniques. Knowledge and Information Systems 51, 2 (2017), 627–659.
[36] Heungmo Ryang, Unil Yun, and Keun Ho Ryu. 2016. Fast algorithm for high utility pattern mining with the sum of
item quantities. Intelligent Data Analysis 20, 2 (2016), 395–415.
[37] Bai-En Shie, Hui-Fang Hsiao, and Vincent S. Tseng. 2013. Efficient algorithms for discovering high utility user be-
havior patterns in mobile commerce environments. Knowledge and Information Systems 37, 2 (2013), 363–387.
[38] Bai-En Shie, Hui-Fang Hsiao, Vincent S. Tseng, and Philip S. Yu. 2011. Mining high utility mobile sequential patterns in mobile commerce environments. In International Conference on Database Systems for Advanced Applications. Springer, 224–238.
[39] Bai-En Shie, Vincent S. Tseng, and Philip S. Yu. 2010. Online mining of temporal maximal utility itemsets from data
streams. In ACM Symposium on Applied Computing. ACM, 1622–1626.
[40] Wei Song, Yu Liu, and Jinhong Li. 2014. BAHUI: Fast and memory efficient mining of high utility itemsets based on
bitmap. International Journal of Data Warehousing and Mining 10, 1 (2014), 1–15.
[41] Vincent S. Tseng, Bai-En Shie, Cheng-Wei Wu, and Philip S. Yu. 2012. Efficient algorithms for mining high utility
itemsets from transactional databases. IEEE Transactions on Knowledge and Data Engineering 25, 8 (2012), 1772–1786.
[42] Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu. 2010. UP-Growth: An efficient algorithm for high
utility itemset mining. In ACM International Conference on Knowledge Discovery and Data Mining. ACM, 253–262.
[43] Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu. 2010. UP-Growth: An efficient algorithm for high
utility itemset mining. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM,
253–262.
[44] Bay Vo, Frans Coenen, and Bac Le. 2013. A new method for mining frequent weighted itemsets based on WIT-trees.
Expert Systems with Applications 40, 4 (2013), 1256–1264.
[45] Cheng Wei Wu, Bai-En Shie, Vincent S. Tseng, and Philip S. Yu. 2012. Mining top-k high utility itemsets. In The
International Conference on Knowledge Discovery and Data Mining. ACM, 78–86.
[46] Jimmy Ming-Tai Wu, Justin Zhan, and Jerry Chun-Wei Lin. 2017. An ACO-based approach to mine high-utility item-
sets. Knowledge-Based Systems 116 (2017), 102–113.
[47] Hong Yao and Howard J. Hamilton. 2006. Mining itemset utilities from transaction databases. Data & Knowledge
Engineering 59, 3 (2006), 603–626.
[48] Hong Yao, Howard J. Hamilton, and Cory J. Butz. 2004. A foundational approach to mining itemset utilities from
databases. In SIAM International Conference on Data Mining. SIAM, 482–486.
[49] Show-Jane Yen and Yue-Shi Lee. 2007. Mining high utility quantitative association rules. In International Conference
on Data Warehousing and Knowledge Discovery. Springer, 283–292.
[50] Unil Yun, Donggyu Kim, Eunchul Yoon, and Hamido Fujita. 2018. Damped window based high average utility pattern
mining over data streams. Knowledge-Based Systems 144 (2018), 188–205.
[51] Unil Yun, Gangin Lee, and Keun Ho Ryu. 2014. Mining maximal frequent patterns by considering weight conditions
over data streams. Knowledge-Based Systems 55 (2014), 49–65.
[52] Unil Yun, Gangin Lee, and Eunchul Yoon. 2017. Efficient high utility pattern mining for establishing manufacturing
plans with sliding window control. IEEE Transactions on Industrial Electronics 64, 9 (2017), 7239–7249.
[53] Unil Yun, Heungmo Ryang, Gangin Lee, and Hamido Fujita. 2017. An efficient algorithm for mining high utility
patterns from incremental databases with one database scan. Knowledge-Based Systems 124 (2017), 188–206.
[54] Unil Yun, Heungmo Ryang, and Keun Ho Ryu. 2014. High utility itemset mining with techniques for reducing over-
estimated utilities and pruning candidates. Expert Systems with Applications 41, 8 (2014), 3861–3878.
[55] Unil Yun and Keun Ho Ryu. 2013. Efficient mining of maximal correlated weight frequent patterns. Intelligent Data
Analysis 17, 5 (2013), 917–939.
[56] Souleymane Zida, Philippe Fournier-Viger, Jerry Chun-Wei Lin, Cheng-Wei Wu, and Vincent S. Tseng. 2015. EFIM:
A highly efficient algorithm for high-utility itemset mining. In The International Conference on Artificial Intelligence,
Grigori Sidorov and Sofia N. Galicia-Haro (Eds.). Springer, 530–546.
[57] Morteza Zihayat, Heidar Davoudi, and Aijun An. 2017. Mining significant high utility gene regulation sequential
patterns. BMC Systems Biology 11, 6 (2017), 109.