This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2018.2819162, IEEE Access

unacceptable, similar to the problem of FIM applied to social networks or large bioinformatics datasets [5].

To deal with the performance bottleneck of exact approaches, bio-inspired algorithms have been applied to HUIM. For example, the genetic algorithm (GA) has been used to mine HUIs by Kannimuthu and Premalatha [14], and particle swarm optimization (PSO) [21], [22] has recently been applied to the mining of HUIs. These existing HUIM algorithms based on bio-inspired computing follow the traditional routines of the original GA and PSO algorithms. That is, the optimal values of one population are maintained in the next population. However, HUIM is different from problems in which there are relatively few best values: all itemsets with utilities no lower than the minimum threshold must be discovered. Because the distribution of HUIs is not even, searching with the best values from the previous population as targets may mean that some results are missed within a certain number of iterations.

To solve this problem, we propose a novel bio-inspired-algorithm-based HUIM framework (Bio-HUIF) to discover HUIs. In this framework, rather than choosing only those HUIs with the highest utility values in the current population, roulette wheel selection is applied to all the discovered HUIs to determine the initial target of the next population. Based on Bio-HUIF, three HUIM algorithms are proposed: Bio-HUIF-GA, Bio-HUIF-PSO, and Bio-HUIF-BA. These employ GA, PSO, and the bat algorithm (BA), respectively. For each algorithm, every discovered HUI could be chosen as the initial target of the next population according to the ratio of its utility to the total utilities of all discovered HUIs.

The major contributions of this work are summarized as follows.

First, a novel framework for HUIM is proposed based on bio-inspired algorithms. The strategy of selecting discovered HUIs probabilistically, instead of maintaining the best values from population to population, improves the diversity of solutions within a limited number of iterations.

Second, under the proposed framework, three new algorithms are proposed based on GA, PSO, and BA, respectively. Besides the standard concepts of the three bio-inspired algorithms, we use the strategies of bitmap database representation, promising encoding vector checking, and bit difference sets to accelerate the process of HUI discovery.

Third, extensive experiments have been conducted on real datasets to validate the performance of the three algorithms. The results show that the proposed approach outperforms existing bio-inspired HUIM algorithms in terms of efficiency, the number of discovered HUIs, and convergence speed.

The remainder of this paper is organized as follows. In Section II, we describe the problem of HUIM and introduce some related work. The proposed framework is presented in Section III. The three algorithms based on Bio-HUIF are explained in Sections IV–VI, respectively. Experimental results are presented and analyzed in Section VII. Finally, we draw our conclusions in Section VIII.

II. PROBLEM STATEMENT AND RELATED WORK

This section briefly describes the problem of HUIM before reviewing related work in the area of HUIM and bio-inspired algorithms for itemset mining.

A. PROBLEM OF HUIM

Let I = {i1, i2, …, iM} be a finite set of items. Then, set X ⊆ I is called an itemset; an itemset containing k items is called a k-itemset. Let D = {T1, T2, …, TN} be a transaction database. Each transaction Ti ∈ D, with unique identifier tid, is a subset of I.

The internal utility q(ip, Td) represents the quantity of item ip in transaction Td. The external utility p(ip) is the unit profit value of item ip. The utility of item ip in transaction Td is defined as u(ip, Td) = p(ip) × q(ip, Td).

The utility of itemset X in transaction Td is defined as

$$u(X, T_d) = \sum_{i_p \in X \wedge X \subseteq T_d} u(i_p, T_d) \tag{1}$$

The utility of itemset X in D is defined as

$$u(X) = \sum_{X \subseteq T_d \wedge T_d \in D} u(X, T_d) \tag{2}$$

The transaction utility (TU) of transaction Td is defined as TU(Td) = u(Td, Td).

To perform HUIM, the minimum utility threshold δ, specified by the user, is defined as a percentage of the total TU values of the database, whereas the minimum utility value is defined as

$$min\_util = \delta \times \sum_{T_d \in D} TU(T_d) \tag{3}$$

An itemset X is called an HUI if u(X) ≥ min_util. Given a transaction database D, the task of HUIM is to determine all itemsets that have utilities no less than min_util.

The transaction-weighted utilization (TWU) of itemset X [23] is the sum of the transaction utilities of all the transactions containing X, which is defined as

$$TWU(X) = \sum_{X \subseteq T_d \wedge T_d \in D} TU(T_d) \tag{4}$$

X is a high transaction-weighted utilization itemset (HTWUI) if TWU(X) ≥ min_util; otherwise, X is a low transaction-weighted utilization itemset (LTWUI). An HTWUI/LTWUI with k items is called a k-HTWUI/k-LTWUI.

Consider the transaction database in Table 1 and the profit table in Table 2. For convenience, we write an itemset {c, e} as ce. In the example database, the utility of item e in transaction T1 is u(e, T1) = 1 × 6 = 6, the utility of itemset ce in transaction T1 is u(ce, T1) = u(c, T1) + u(e, T1) = 18 + 6 = 24, and the utility of itemset ce in the transaction database is u(ce) = u(ce, T1) + u(ce, T3) + u(ce, T5) + u(ce, T8) + u(ce, T10) = 24 + 7 + 16 + 31 + 14 = 92. Given min_util = 115, as u(ce) <
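The utility, TU, min_util, and TWU definitions of this section can be illustrated with a minimal Python sketch. The three-transaction database and profit table below are hypothetical toy data chosen for this illustration; they are not Table 1 or Table 2 of the paper, and the function names are assumptions rather than part of any published implementation.

```python
from typing import Dict, Iterable, List

# Hypothetical toy data (NOT the paper's Table 1 / Table 2).
# Each transaction maps an item to its purchased quantity q(i_p, T_d).
DATABASE: List[Dict[str, int]] = [
    {"a": 1, "c": 2, "e": 1},  # T1
    {"b": 3, "c": 1},          # T2
    {"a": 2, "c": 1, "e": 2},  # T3
]
# Unit profit p(i_p) of each item (external utility).
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 3, "e": 1}


def item_utility(item: str, t: Dict[str, int]) -> int:
    """u(i_p, T_d) = p(i_p) * q(i_p, T_d)."""
    return PROFIT[item] * t[item]


def utility_in_transaction(x: Iterable[str], t: Dict[str, int]) -> int:
    """Eq. (1): utility of itemset X in T_d; 0 if X is not contained in T_d."""
    items = set(x)
    if not items.issubset(t):
        return 0
    return sum(item_utility(i, t) for i in items)


def itemset_utility(x: Iterable[str], db: List[Dict[str, int]]) -> int:
    """Eq. (2): sum of u(X, T_d) over all transactions containing X."""
    return sum(utility_in_transaction(x, t) for t in db)


def transaction_utility(t: Dict[str, int]) -> int:
    """TU(T_d) = u(T_d, T_d)."""
    return utility_in_transaction(t.keys(), t)


def min_util(db: List[Dict[str, int]], delta: float) -> float:
    """Eq. (3): delta times the total transaction utility of the database."""
    return delta * sum(transaction_utility(t) for t in db)


def twu(x: Iterable[str], db: List[Dict[str, int]]) -> int:
    """Eq. (4): sum of TU(T_d) over all transactions containing X."""
    items = set(x)
    return sum(transaction_utility(t) for t in db if items.issubset(t))


threshold = min_util(DATABASE, 0.5)        # 0.5 * (12 + 9 + 15) = 18.0
print(itemset_utility({"c", "e"}, DATABASE))  # 7 + 5 = 12, below the threshold
print(twu({"c", "e"}, DATABASE))              # 12 + 15 = 27, so ce is an HTWUI
print(itemset_utility({"a", "c"}, DATABASE))  # 11 + 13 = 24, so ac is an HUI
```

On this toy data, ce is an HTWUI but not an HUI, which shows that TWU(X) is an upper bound on u(X) rather than the utility itself.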

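The roulette wheel selection described earlier for Bio-HUIF, in which each discovered HUI is chosen with probability equal to the ratio of its utility to the total utility of all discovered HUIs, might be sketched as follows. This is only an illustration under stated assumptions: the function name `roulette_pick`, the string labels for itemsets, and the utility values are hypothetical, and the paper's own procedure may differ in detail.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, List


def roulette_pick(hui_utilities: Dict[str, float], r: float) -> str:
    """Return the HUI whose cumulative utility share covers r, with r in [0, 1).

    An HUI with utility u is selected with probability u / (total utility).
    """
    itemsets: List[str] = list(hui_utilities)
    cumulative = list(accumulate(hui_utilities[x] for x in itemsets))
    total = cumulative[-1]
    index = bisect_right(cumulative, r * total)
    # Guard against floating-point rounding pushing the index past the end.
    return itemsets[min(index, len(itemsets) - 1)]


# Hypothetical discovered HUIs and their utilities (illustrative values only).
discovered = {"ac": 50.0, "bd": 30.0, "e": 20.0}

# Deterministic picks for fixed positions on the wheel.
print(roulette_pick(discovered, 0.1))  # ac (covers [0.0, 0.5))
print(roulette_pick(discovered, 0.6))  # bd (covers [0.5, 0.8))
print(roulette_pick(discovered, 0.9))  # e  (covers [0.8, 1.0))

# In practice r would be drawn at random for each new population.
target = roulette_pick(discovered, random.random())
```

Passing r explicitly keeps the function deterministic and easy to check. The standard library call `random.choices(itemsets, weights=utilities)` gives the same proportional behavior in one line.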

2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
