Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Download
Standard view
Full view
of .
Save to My Library
Look up keyword
Like this
1Activity
0 of .
Results for:
No results containing your search query
P. 1
Finding Fuzzy Locally Frequent Itemsets

Finding Fuzzy Locally Frequent Itemsets

Ratings: (0)|Views: 22 |Likes:
Published by ijcsis
The problem of mining temporal association rules from temporal dataset is to find association between items that hold within certain time intervals but not throughout the dataset. This involves finding frequent sets that are frequent at certain time intervals and then association rules among the items present in the frequent sets. In fuzzy temporal datasets as the time of transaction is imprecise, we may find set of items that are frequent in certain fuzzy time intervals. We call these as fuzzy locally frequent sets and the corresponding associated association rules as fuzzy local association rules. These association rules cannot be discovered in the usual way because of fuzziness involved in temporal features. In this paper, we propose a modification to the well-known A-priori algorithm to compute fuzzy locally frequent sets. Finally we have shown manually with the help of an example that the algorithm works.
The problem of mining temporal association rules from temporal dataset is to find association between items that hold within certain time intervals but not throughout the dataset. This involves finding frequent sets that are frequent at certain time intervals and then association rules among the items present in the frequent sets. In fuzzy temporal datasets as the time of transaction is imprecise, we may find set of items that are frequent in certain fuzzy time intervals. We call these as fuzzy locally frequent sets and the corresponding associated association rules as fuzzy local association rules. These association rules cannot be discovered in the usual way because of fuzziness involved in temporal features. In this paper, we propose a modification to the well-known A-priori algorithm to compute fuzzy locally frequent sets. Finally we have shown manually with the help of an example that the algorithm works.

More info:

Published by: ijcsis on Feb 15, 2011
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less

02/15/2011

pdf

text

original

 
(IJCSIS) International Journal of Computer Science and Information Security,Vol. 9, No. 1, January 2011
Finding Fuzzy Locally Frequent Itemsets
Fokrul Alom Mazarbhuiya
Department of Computer ScienceCollege of Computer ScienceKing Khalid UniversityAbha, Kingdom of Saudi Arabiae-mail: fokrul_2005@yahoo.com
Md. Ekramul Hamid
Department of Network EngineeringCollege of Computer ScienceKing Khalid UniversityAbha, Kingdom of Saudi Arabiae-mail: ekram_hamid@yahoo.com
 
 Abstract
— The problem of mining temporal association rulesfrom temporal dataset is to find association between items thathold within certain time intervals but not throughout the dataset.This involves finding frequent sets that are frequent at certaintime intervals and then association rules among the items presentin the frequent sets. In fuzzy temporal datasets as the time of transaction is imprecise, we may find set of items that arefrequent in certain fuzzy time intervals. We call these as fuzzylocally frequent sets and the corresponding associated associationrules as fuzzy local association rules. These association rulescannot be discovered in the usual way because of fuzzinessinvolved in temporal features. In this paper, we propose amodification to the well-known A-priori algorithm to computefuzzy locally frequent sets. Finally we have shown manually withthe help of an example that the algorithm works.
 Keywords-
 
Temporal Data mining, Fuzzy number, Fuzzy time- stamp, Core length of a fuzzy interval, Fuzzified interval 
I.
 
I
NTRODUCTION
 The problem of mining association rules has been definedinitially [15] by R. Agarwal
et al
for application in large supermarkets. Large supermarkets have large collection of records of daily sales. Analyzing the buying patterns of the buyers willhelp in taking typical business decisions such as what to put onsale, how to put the materials on the shelves, how to plan forfuture purchase etc.Mining for association rules between items in temporaldatabases has been described as an important data-miningproblem. Transaction data are normally temporal. The marketbasket transaction is an example of this type.In this paper we consider datasets, which are fuzzytemporal i.e. the time in which a transaction has taken place isimprecise or approximate and is attached to the transactions. Inlarge volumes of such data, some hidden information orrelation ship among the items may be there which cannot beextracted because of some fuzziness in the temporal features.Also the case may be that some association rules may hold incertain fuzzy time period but not throughout the dataset. Forfinding such association rules we need to find itemsets that arefrequent at certain time period, which will obviously beimprecise due to the fact that the time of each transaction isfuzzy. We call such frequent sets fuzzy locally frequent overfuzzy time interval. From these fuzzy locally frequent sets,associations among the items can be obtained. It is shownmanually with the help of an example that the algorithm givesthe required result. Although it is assumed here that the fuzzytime stamps are having similar membership functions, weclaim that the same algorithm with slight modification can beapplied to the database having dissimilar fuzzy time stamps.In section II we give a brief discussion on the works relatedto our work. In section III we describe the definitions, termsand notations used in this paper. In section IV, we give thealgorithm proposed in this paper for mining fuzzy locallyfrequent sets. In section V, we explain the algorithm with asmall dataset and display the results. We conclude withconclusion and lines for future work in section VI. In the lastsection we give some references.II.
 
R
ELATED WORKS
 The problem of discovery of association rules was firstformulated by Agrawal
et al
in 1993. Given a set
 I 
, of itemsand a large collection
 D
of transactions involving the items,the problem is to find relationships among the items i.e. thepresence of various items in the transactions. A transaction
is said to support an item if that item is present in
. Atransaction
is said to support an itemset if 
supports eachof the items present in the itemset. An association rule isan expression of the form
 X 
 
 
where
 X 
and
are subsetsof the itemset
 I 
. The rule holds with confidence
τ
if 
τ
% of the transaction in
 D
that supports
 X 
also supports
. Therule has support
σ
if 
σ
% of the transactions supports
 X 
 
 
.A method for the discovery of association rules was givenin [15], which is known as the A priori algorithm. This wasthen followed by subsequent refinements, generalizations,extensions and improvements. As the number of associationrules generated is too large, attempts were made to extractthe useful rules ([13], [16]) from the large set of discoveredassociation rules. Attempts are also made to make theprocess of discovery of rules faster ([12], [14]).Generalized association rules ([9], [17]) and Quantitativeassociation rules ([18]) were later on defined andalgorithms were developed for the discovery of these rules.A hashed based technique is used in [11] to improve therule mining process of the A priori algorithm.Temporal Data Mining is now an important extensionof conventional datamining and has recently been able toattract more people to work in this area. By taking intoaccount the time aspect, more interesting patterns that aretime dependent can be extracted. There are mainly twobroad directions of temporal data mining [7]. One concernsthe discovery of causal relationships among temporally
139http://sites.google.com/site/ijcsis/ISSN 1947-5500
 
(IJCSIS) International Journal of Computer Science and Information Security,Vol. 9, No. 1, January 2011
oriented events. Ordered events form sequences and thecause of an event always occur before it. The otherconcerns the discovery of similar patterns within the sametime sequence or among different time sequences. Theunderlying problem is to find frequent sequential patternsin the temporal databases. The name sequence mining isnormally used for the underlying problem. In [8] theproblem of recognizing frequent episodes in an eventsequence is discussed where an episode is defined as acollection of events that occur during time intervals of aspecific size.The association rule discovery process is also extendedto incorporate temporal aspects. In temporal associationrules each rule has associated with it a time interval inwhich the rule holds. The problems associated are to findvalid time periods during which association rules hold, thediscovery of possible periodicities that association ruleshave and the discovery of association rules with temporalfeatures. In [10], [19], [20] and [21], the problem of temporal data mining is addressed and techniques andalgorithms have been developed for this. In [10] analgorithm for the discovery of temporal association rules isdescribed. In [2], two algorithms are proposed for thediscovery of temporal rules that display regular cyclicvariations where the time interval is specified by user todivide the data into disjoint segments like months, weeks,days etc. Similar works were done in [6] and [22]incorporating multiple granularities of time intervals (e.g.first working day of every month) from which both cyclicand user defined calendar patterns can be achieved. In [1],the method of finding locally and periodically frequent setsand periodic association rules are discussed which is animprovement of other methods in the sense that itdynamically extract all the rules along with the intervalswhere the rules hold. In ([23], [24]) fuzzy calendric datamining and fuzzy temporal data mining is discussed whereuser specified ill-defined fuzzy temporal and calendricpatterns are extracted from temporal data.Our approach is different from the above approaches.We are considering the fact that the time of transactions arenot precise rather they are fuzzy numbers and some itemsare seasonal or appear frequently in the transactions forcertain ill-defined periods only i.e. summer, winter, etc.They appear in the transactions for a short time and thendisappear for a long time. After this they may againreappear for a certain period and this process may repeat.For these itemsets the support cannot be calculated in theusual way ([1], [10]), it has to be computed by the methoddefined in section 3B. These items may lead to interestingassociation rules over fuzzy time intervals. In this paperwe calculate the support values of these sets locally in a
α
-cut of a fuzzy time interval where a fuzzy time intervalrepresents a particular season in which the itemset isappearing frequently and if they are frequent in the fuzzytime interval under consideration then we call these setsfuzzy locally frequent sets. The large fuzzy time gap inwhich they do not appear is not counted. As mentioned inthe previous paragraph similarly works were also done in[23], [24] but in non-fuzzy temporal data. But in all thesemethods they discuss the association rule mining of non-fuzzy temporal data. Our approach although little bitsimilar to the work of [1], is different from others in thesense that it discovers association rules from fuzzytemporal data and finds the association rules along withtheir fuzzy time intervals over which the rules holdautomatically.III.
 
PROBLEM
 
DEFINITION
 
 A. Some Definition related to Fuzziness
Let
 E 
be the universe of discourse. A fuzzy set
 A
in
ischaracterized by a membership function
 A
(
 x
) lying in [0,1].
 A
(
 x
) for
 x
 
 
 E 
represents the grade of membership of x in
 A
.Thus a fuzzy set
 A
is defined as
 A
={(
 x
,
 A(x
)),
 x
 
 
 E 
}A fuzzy set
 A
is said to be normal if 
 A
(
 x
)
=
1
 
for at least one
 x
∈ 
 
E
 
An
α
-cut of a fuzzy set is an ordinary set of elements withmembership grade greater than or equal to a threshold
α 
, 0
 
≤  
 
α 
 
≤  
 
1. Thus a
α 
-cut
 A
α 
of a fuzzy set
 A
is characterized by
 A
α 
=
{
 x
∈ 
 E; A
(
 x
)
 
≥α 
} [see
e.g.
[4]]A fuzzy set is said to be convex if all its
α 
-cuts are convexsets [see
e.g.
[5]].A fuzzy number is a convex normalized fuzzy set
A
definedon the real line
R
such that
 
1.
 
there exists an
x
0
 
 
 R
such that
 A
(
 x
0
)
 
=1, and2.
 
 A
(
 x
)
 
is piecewise continuous.A fuzzy number is denoted by [
a, b, c
] with
a < b < c
 where
 A
(
a
)
= A
(
c
)
=
0 and
 A
(
b
)
=
1.
 A
(
 x
) for all
 x
 
[
a, b
] isknown as left reference function and
 A
(
 x
) for
x
 
[
b, c
] isknown as the right reference function. Thus a fuzzy numbercan be thought of as containing the real numbers within someinterval to varying degrees. The
α 
-cut of the fuzzy number [
1
-a, t 
1
 , t 
1
+a
] is a closed interval [
1
+
(
α 
-
1)
.a, t 
1
+
(1
-
α 
)
.a
].Fuzzy intervals are special fuzzy numbers satisfying thefollowing.1.
 
there exists an interval
 
[
a, b
]
 
 R
such that
 A
(
 x
0
)
=
1for all
 x
0
[
a, b
], and2.
 
 A
(
 x
)
 
is piecewise continuous.A fuzzy interval can be thought of as a fuzzy number with aflat region. A fuzzy interval
 A
is denoted by
 A
 
=
[
a, b, c, d 
]with
a < b < c < d 
where
 A
(
a
)
=
 
 A
(
)
=
0 and
 A
(
 x
)
=
1 for all
 x
 
[
b, c
].
 A
(
 x
) for all
 x
 
[
a, b
] is known as left referencefunction and
 A
(
 x
) for
x
 
[
c, d 
] is known as the right referencefunction. The left reference function is non-decreasing and theright reference function is non-increasing [see
e.g.
[3]].
140http://sites.google.com/site/ijcsis/ISSN 1947-5500
 
(IJCSIS) International Journal of Computer Science and Information Security,Vol. 9, No. 1, January 2011
Similarly the
α 
-cut of the fuzzy interval [
1
-a, t 
1
 , t 
2
 , t 
2
+a
] isa closed interval [
1
+
(
α 
-
1)
.a, t 
2
+
(1
-
α 
)
.a
].The core of a fuzzy number
 A
is the set of elements of 
 A
having membership value one i.e.Core(
 A
) = {(
 x
,
 A
(
 x
);
 A
(
 x
) = 1}For every fuzzy set
 A
,
 A
=
U
]1,0[
α α 
 A
where
α
 A
(
 x
) =
α 
.
α 
 A
(
 x
), and
α 
 A
is a special fuzzy set [4]For any two fuzzy sets
 A
and
 B
and for all
α 
[0, 1],i)
 
α
(
 A
 B
) =
α
 A
α
 B
ii)
 
α
(
 A
 B
) =
α
 A
α
 B
 For any two fuzzy numbers
 A
and
 B
, we say themembership functions
 A
(
 x
) and
 B
(
 x
) are similar to each other if the slope of the left reference function of 
 A
(
 x
) is equal to thethat of 
 B
(
 x
) and the slope of right reference of 
 A
(
 x
) is equal thatof 
 B
(
 x
). Obviously for any two fuzzy numbers
 A
and
 B
havingsimilar membership functions
 
α
 A
 
=
α
 B
,
∀α∈
[0, 1]
 B. Some Definition related to Fuzzy Locally Frequent set 
Let T = <
o
 , t 
1
 ,…………
> be a sequence of imprecise or fuzzytime stamps over which a linear ordering < is defined where
i
 <
 j
means
i
denotes the core of a fuzzy time which is earlierthan the core of another fuzzy time stamp
 j.
For the sake of convenience, we assume that all the fuzzy time stamps arehaving similar membership functions. Let
 I 
denote a finite setof items and the transaction database
 D
is a collection of transactions where each transaction has a part which is a subsetof the itemset
 I 
and the other part is a fuzzy time-stampindicating the approximate time in which the transaction hadtaken place. We assume that
 D
is ordered in the ascendingorder of the core of fuzzy time stamps. For fuzzy time intervalswe always consider a fuzzy closed intervals of the form [
1
-a t 
1
 ,
 
2
 , t 
2
+a
] for some real number
a
. We say that a transaction is inthe fuzzy time interval [
1
-a, t 
1
 ,
 
2
 , t 
2
+a
] if the
α 
-cut of the fuzzytime stamp of the transaction is contained in
α
-cut of [
1
-a, t 
1
 ,
 
2
 , t 
2
+a
] for some user’s specified value of 
α
.We define the local support of an itemset in a fuzzy timeinterval [
1
-a, t 
1
 ,
 
2
 ,
2
+a
] as the ratio of the number of transactions in the time interval [
1
+
(
α 
-
1)
.a, t 
2
+
(1
-
α 
)
.a
]containing the itemset to the total number of transactions in[
1
+
(
α 
-
1)
.a, t 
2
+
(1
-
α 
)
.a
] for the whole data base
 D
for a givenvalue of 
α 
. We use the notation (
 X 
) todenote the support of the itemset
 X 
in the fuzzy time interval[
],,,[
2211
aa
Sup
+
1
-a, t 
1
 ,
 
2
 , t 
2
+a
]. Given a threshold
σ
we say that an itemset
 is frequent in the fuzzy time interval [
1
-a, t 
1
 ,
 
2
 ,
2
+a
] if (
 X 
)
(
σ
 /100)* tc where tc denotes the totalnumber of transactions in
 D
that are in the fuzzy time interval[
],,,[
2211
aa
Sup
+
1
-a, t 
1
 ,
 
2
 , t 
2
+a
]. We say that an association rule
 X 
⇒ 
, where
 X 
and
are item sets holds in the time interval [
1
-a, t 
1
 ,
 
2
 , t 
2
+a
]if and only if given threshold
τ
,
0.100 / )( / )(
],,,[],,,[
22112211
τ 
++
 X Sup X Sup
aaaa
 and
 X 
is frequent in [
1
-a, t 
1
 ,
 
2
 ,
2
+a
]. In this case we saythat the confidence of the rule is
τ
.For each locally frequent item set we keep a list of fuzzytime intervals in which the set is frequent where each fuzzyinterval is represented as [
start-a, start, end, end+a
] where
start 
gives the approximate starting time of the time intervaland
end 
gives the approximate ending time of the time-interval.
end 
start 
gives the length of the core of the fuzzy timeinterval. For a given value of 
α 
of two intervals [
start 
1
-a,
start 
1
,
end 
1
,
end 
1
+a] and [
start 
2
-a,
start 
2
,
end 
2
,
end 
2
+a] are non-overlapping if their
α 
-cuts are non-overlapping.IV.
 
PROPOSED ALGORITHM
 
 A. Generating Fuzzy Locally Frequent Sets
While constructing locally frequent sets, with each locallyfrequent set a list of fuzzy time-intervals is maintained in whichthe set is frequent. Two user’s specified thresholds
α 
and
minthd 
are used for this. During the execution of the algorithmwhile making a pass through the database, if for a particularitemset the
α 
-cut of its current fuzzy time-stamp, [
α 
 Lcurrent,
α 
 Rcurrent 
] and the
α 
-cut, [
α 
 Llastseen,
α 
 Rlastseen
] of its
 
fuzzytime, when it was last seen overlap then the current transactionis included in the current time-interval under considerationwhich is extended with replacement of 
α 
 Rlastseen
by
α 
 Rcurrent 
; otherwise a new time-interval is started with
α 
 Lcurrent 
as the starting point. The support count of the itemset in the previous time interval is checked to see whether it isfrequent in that interval or not and if it is so then it is fuzzifiedand added to the list maintained for that set. Also for the fuzzzylocally frequent sets over fuzzy time intervals, a minimum corelength of the fuzzy period is given by the user as
minthd 
andfuzzy time intervals of core length greater than or equal to thisvalue are only kept. If 
minthd 
is not used than an itemappearing once in the whole database will also become locallyfrequent a over fuzzy point of time.Procedure to compute
 L
1
, the set of all fuzzy locally frequentitem sets of size 1.For each item while going through the database we alwayskeeps an
α 
-cut
α 
lastseen
which is [
α 
 Llastseen,
α 
 Rlastseen
] thatcorresponds to the fuzzy time stamp when the item was lastseen. When an item is found in a transaction and the fuzzytime-stamp is
tm
and if its
α 
-cut
α 
tm
=[
α 
 Ltm,
α 
 Rtm
] has emptyintersection with [
α 
 Llastseen,
α 
 Rlastseen
], then a new timeinterval is started by setting
start 
of the new time interval as
α 
 Ltm
and
end 
of the previous time interval as
α 
 Rlastseen
. Theprevious time interval is fuzzified provided the support of theitem is greater than
min-sup
. The fuzzified interval is thenadded to the list maintained for that item provided that theduration of the core is greater than
minthd 
. Otherwise
α 
 Rlastseen
is set to
α 
 Rtm
, the counters maintained for counting
141http://sites.google.com/site/ijcsis/ISSN 1947-5500

You're Reading a Free Preview

Download
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->