
Published by ijcsis

Some real-life data are associated with the duration of events rather than with point events. The most common example is data from the cellular industry, where each transaction is associated with a time interval. Mining maximal fuzzy intervals from such data allows the user to group transactions with similar behavior together. Earlier works were devoted to mining frequent as well as maximal frequent non-fuzzy intervals. We propose here a method of mining maximal dense fuzzy intervals, where the density of an interval is quite similar to the frequency of an interval.


(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 2, February 2011

Mining Maximal Dense Intervals from Temporal Interval Data

F. A. Mazarbhuiya (1), M. A. Khaleel (1), A. K. Mahanta (2), H. K. Baruah (2)

(1) Dept. of Computer Science, College of Computer Science, King Khalid University, Abha, Saudi Arabia
Email: {fokrul_2005, khaleel_dm}@yahoo.com

(2) Department of Computer Science, Gauhati University, India
Email: anjanagu@yahoo.co.in, hemanta_bh@yahoo.com

Abstract

Some real-life data are associated with the duration of events instead of point events. The most common example of such data is data of the cellular industry, where each transaction is associated with a time interval. Mining maximal fuzzy intervals from such data allows the user to group the transactions with similar behavior together. Earlier works were devoted to mining frequent as well as maximal frequent non-fuzzy intervals. We propose here a method of mining maximal dense fuzzy intervals, where the density of an interval is quite similar to the frequency of an interval.

Keywords: Frequent intervals, Maximal frequent intervals, Density of a fuzzy interval, Minimum density, Contribution (vote) of a transaction on a fuzzy interval, Join of two fuzzy intervals.

I. INTRODUCTION

Among the various types of data mining applications, the analysis of transactional data has been considered important. One important extension of this mining problem is to include a temporal dimension. Most of the earlier works in this area do not take the time factor into account. By taking the time aspect into account, more interesting time-dependent patterns can be extracted. Recently, data mining in temporal data sets has arisen as an important data mining problem [2], [10].

Many real-life problems are associated with duration events instead of point events. In this paper we consider such datasets, i.e. datasets having time intervals. Such datasets are called temporal interval datasets. A record in such data typically consists of the starting time and ending time (or the length of the transaction) in addition to other fields. In [5] an algorithm for mining maximal frequent intervals from such data sets has been given.

In practice, however, people mostly make statements using vague terms like "early morning" or "late evening" instead of mentioning strict time intervals. There is no strict boundary separating early morning from morning. To represent such vague terms, fuzzy sets are required. In this paper we discuss the problem of mining dense intervals using a fuzzy concept. The objective of this paper is threefold. First, we propose the definition of the density of a fuzzy interval over a transactional dataset in which each transaction is associated with a time duration. Secondly, we define a join operation on fuzzy intervals, and lastly we propose an algorithm to mine maximal dense fuzzy intervals. We define the amount of contribution (also called vote) of a transaction $t$ associated with the time interval $[t_1, t_2]$ for a given fuzzy interval $A$ as the ratio of the area bounded by the membership function $A(x)$ (associated with the fuzzy interval) and the real line included within the interval $[t_1, t_2]$ to the total area covered by $A(x)$ and the real line. If the average of the votes of all the transactions for a fuzzy interval $A$ exceeds a pre-defined threshold, then the fuzzy interval is called a dense fuzzy interval. Similarly, a dense fuzzy interval is maximal if no dense fuzzy interval contains it. The well-known A-priori algorithm cannot be used here directly, as the downward and upward closure properties of frequent sets do not hold in this case (this is shown with an example). We propose a variation of the A-priori algorithm that works in this situation and gives us the maximal dense fuzzy intervals.

II. RELATED WORKS

One of the very useful extensions of conventional data mining is temporal data mining. In recent times it has attracted many researchers. By considering the time dimension in the conventional data mining problem, more interesting patterns that are time dependent can be extracted. There are mainly two broad directions of temporal data mining [7]. One concerns the discovery of causal relationships among temporally oriented events. Ordered events form sequences, and the cause of an event always occurs before it. The other concerns the discovery of similar patterns within the same time sequence or among different time sequences. The underlying problem is to find frequent sequential patterns in temporal databases.

Wong et al. [9] introduced the fuzzy concept into association rule mining to deal with quantitative attributes. Quantitative attributes are normally handled by partitioning the attribute domains and then combining adjacent partitions [8]. Although this method can solve problems introduced by finite domain, it causes the sharp boundary problem. To soften the effect of sharp boundaries, fuzzy sets are used. Here each quantitative attribute is associated with several fuzzy sets. A fuzzy association rule looks like "if $X$ is $A$ then $Y$ is $B$", where $X$ and $Y$ are attributes and $A$ and $B$ are fuzzy sets which describe $X$ and $Y$ respectively. Prade et al. [6] defined the support and confidence of a fuzzy association rule.

http://sites.google.com/site/ijcsis/ ISSN 1947-5500


In [2], Rossi and Ale extended the well-known A-priori algorithm for mining association rules to temporal data and described a technique to find interesting patterns in the data that are time bounded.

In [5], the problem of mining maximal frequent intervals is discussed. The authors define a maximal frequent interval as an interval that is frequent, meaning that it is present in a sufficient number of transactions, and that is contained in no other frequent interval. Using a pre-fix traversal algorithm, the maximal frequent intervals were found, and it was also found experimentally that the pre-order traversal algorithm outperforms the A-priori based algorithm.

Our approach is different from the above approaches. We take into account the fact that intervals of time are of a fuzzy nature. We calculate the density of the fuzzy intervals in a particular transactional dataset whose transactions are associated with (non-fuzzy) time intervals, as described in the next section. We first compute the dense fuzzy time intervals using a user-defined minimum density value and then apply a join operation to neighboring intervals to find the maximal dense fuzzy intervals. The fuzzy intervals and their membership functions are provided by domain experts.

III. PROBLEM DEFINITION

A. Some basic definitions related to fuzziness

Let $E$ be the universe of discourse. A fuzzy set $A$ in $E$ is characterized by a membership function $A(x)$ lying in $[0, 1]$. $A(x)$ for $x \in E$ represents the grade of membership of $x$ in $A$. Thus a fuzzy set $A$ is defined as

$$A = \{(x, A(x)),\ x \in E\}$$

A fuzzy set $A$ is said to be normal if $A(x) = 1$ for at least one $x \in E$.

An $\alpha$-cut of a fuzzy set is an ordinary set of elements with membership grade greater than or equal to a threshold $\alpha$, $0 \le \alpha \le 1$. Thus an $\alpha$-cut $A_\alpha$ of a fuzzy set $A$ is characterized by

$$A_\alpha = \{x \in E;\ A(x) \ge \alpha\}$$

[see e.g. [3]].

A fuzzy set is said to be convex if all its $\alpha$-cuts are convex sets.

A fuzzy number is a convex normalized fuzzy set $A$ defined on the real line $R$ such that

1. there exists an $x_0 \in R$ such that $A(x_0) = 1$, and
2. $A(x)$ is piecewise continuous.

Thus a fuzzy number can be thought of as containing the real numbers within some interval to varying degrees.

Fuzzy intervals are special fuzzy numbers satisfying the following.

1. there exists an interval $[a, b] \subset R$ such that $A(x_0) = 1$ for all $x_0 \in [a, b]$, and
2. $A(x)$ is piecewise continuous.

A fuzzy interval can be thought of as a fuzzy number with a flat region. A fuzzy interval $A$ is denoted by $A = [a, b, c, d]$ with $a < b < c < d$, where $A(a) = A(d) = 0$ and $A(x) = 1$ for all $x \in [b, c]$. $A(x)$ for $x \in [a, b]$ is known as the left reference function and $A(x)$ for $x \in [c, d]$ is known as the right reference function. The left reference function is non-decreasing and the right reference function is non-increasing [see e.g. [4]]. The area of a fuzzy interval is defined as the area bounded by the membership function of the fuzzy interval and the real line.

B. Contribution (vote) of a transaction to a fuzzy interval

We define the vote of a transaction $t$ associated with the time interval $[t', t'']$ for the fuzzy interval $A = [a, b, c, d]$ as follows:

$$\mathrm{vote}^{A}_{t} = \frac{\int_{t'}^{t''} A(x)\,dx}{\int_{a}^{d} A(x)\,dx}$$

where $A(x)$ is the membership function associated with the fuzzy interval. Here $\int_{t'}^{t''} A(x)\,dx$ is the portion of the area bounded by $A(x)$ and the real line included in the time interval $[t', t'']$, and $\int_{a}^{d} A(x)\,dx$ is the total area bounded by $A(x)$ and the real line.

Obviously $\mathrm{vote}^{A}_{t}$ lies in $[0, 1]$. If $A \subseteq [t', t'']$, then $\mathrm{vote}^{A}_{t} = 1$, and if $A \cap [t', t''] = \Phi$, then $\mathrm{vote}^{A}_{t} = 0$.
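As a rough illustration of the vote, the sketch below evaluates both integrals in closed form. It assumes linear reference functions, so that the integral of $A(x)$ is piecewise quadratic; the helper names `cumulative_area` and `vote` are hypothetical.

```python
from typing import Tuple

# Fuzzy interval [a, b, c, d] with a < b < c < d (hypothetical type alias).
Trapezoid = Tuple[float, float, float, float]


def cumulative_area(fi: Trapezoid, x: float) -> float:
    """Integral of A from minus infinity up to x (linear reference functions)."""
    a, b, c, d = fi
    if x <= a:
        return 0.0
    if x <= b:
        return (x - a) ** 2 / (2.0 * (b - a))      # rising edge
    left = (b - a) / 2.0                           # whole rising edge
    if x <= c:
        return left + (x - b)                      # flat region
    flat = c - b
    right = (d - c) / 2.0                          # whole falling edge
    if x <= d:
        return left + flat + right - (d - x) ** 2 / (2.0 * (d - c))
    return left + flat + right


def vote(fi: Trapezoid, t_start: float, t_end: float) -> float:
    """Share of the area under A that lies inside [t_start, t_end]."""
    total = cumulative_area(fi, fi[3])
    covered = cumulative_area(fi, t_end) - cumulative_area(fi, t_start)
    return covered / total
```

For example, `vote((1, 3, 4, 6), 1, 6)` covers the whole support and gives 1, while `vote((1, 3, 4, 6), 6, 7)` lies entirely outside it and gives 0.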

C. Density of a fuzzy time interval in a data set

The density of a fuzzy interval over a given temporal interval dataset $D$ is computed by summing up the votes of all the transactions of $D$ for the corresponding fuzzy time interval and dividing the sum by the total number of transactions in $D$. Each record contributes a vote, which falls in $[0, 1]$.

$$\mathrm{density}^{A}_{D} = \frac{\sum_{t \in D} \mathrm{vote}^{A}_{t}}{|D|}$$

A fuzzy interval is dense if its density is more than a user-specified threshold called min_density.
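The density computation can be sketched for an arbitrary membership function by integrating numerically. This is an illustrative approximation (a midpoint rule with a fixed number of steps), not the authors' procedure; `integrate`, `density`, `is_dense` and the `steps` parameter are hypothetical names.

```python
from typing import Callable, Sequence, Tuple

Interval = Tuple[float, float]  # crisp time interval (t_start, t_end)


def integrate(f: Callable[[float], float], lo: float, hi: float,
              steps: int = 2000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def density(mu: Callable[[float], float], support: Interval,
            transactions: Sequence[Interval], steps: int = 2000) -> float:
    """Average vote of all transactions for the fuzzy interval with membership mu."""
    if not transactions:
        raise ValueError("the dataset must contain at least one transaction")
    a, d = support
    total_area = integrate(mu, a, d, steps)
    vote_sum = 0.0
    for t_start, t_end in transactions:
        lo, hi = max(a, t_start), min(d, t_end)  # clip to the support [a, d]
        vote_sum += integrate(mu, lo, hi, steps) / total_area
    return vote_sum / len(transactions)


def is_dense(value: float, min_density: float) -> bool:
    """Dense means strictly more than the user-specified threshold."""
    return value > min_density
```

Passing the membership function as a callable keeps the sketch usable when the reference functions supplied by a domain expert are not linear, at the cost of a small numerical error.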

D. Join of two fuzzy intervals

The fuzzy intervals are given by the user as input. Two fuzzy intervals $A$ and $B$ are called neighbors, or adjacent to each other, if $\mathrm{supp}(A \cap B) \ne \Phi$, where $\mathrm{supp}(A \cap B) = \{x;\ (A \cap B)(x) > 0\}$ [see e.g. [4]]. We assume that the input fuzzy intervals are such that, if the intervals are arranged in ascending order of their starting times, each fuzzy interval has a unique left neighbor and a unique right neighbor. Let $A = [a_1, b_1, c_1, d_1]$ and $B = [a_2, b_2, c_2, d_2]$ be two adjacent fuzzy intervals. Without loss of generality we can assume that $a_1 < a_2$. We also assume that for any two adjacent fuzzy intervals such as $A$ and $B$ above, $c_1 = a_2$ and $d_1 = b_2$, and that $A(x) = 1 - B(x)$ for $c_1 \le x \le d_1$. This assumption is natural, since otherwise some points would be given more emphasis and some less. The join of $A$ and $B$, denoted by $A \wedge B$, is defined as

$$A \wedge B = [a_1, b_1, c_2, d_2]$$


where

$$(A \wedge B)(x) = \begin{cases} A(x), & a_1 \le x \le b_1 \\ A(x) + B(x) = 1, & b_1 \le x \le c_2 \\ B(x), & c_2 \le x \le d_2 \end{cases}$$

To explain the joining operation, we again consider two fuzzy intervals $[a_1, b_1, c_1, d_1]$ and $[a_2, b_2, c_2, d_2]$ whose membership functions are shown in Fig. 1. Here $c_1 = a_2$ and $b_2 = d_1$. Any point between $c_1$ and $d_1$ has a membership value of $A(x)$ with respect to $A$, and with respect to $B$ it has a membership value of $B(x) = 1 - A(x)$, so that $A(x) + B(x) = 1$. Thus the joined fuzzy interval is $[a_1, b_1, c_2, d_2]$ (shown in Fig. 2).

[Figure: membership functions of the two fuzzy intervals, with vertex labels A, B, C, D, E, F, G, H over the axis points $a_1$, $b_1$, $c_1 = a_2$, $d_1 = b_2$, $c_2$, $d_2$]

Fig 1: Join of two fuzzy intervals

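[Figure: membership function of the joined fuzzy interval, with vertex labels A, B, G, H over the axis points $a_1$, $b_1$, $c_2$, $d_2$]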

Fig 2: Joined interval
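The join can be sketched in a few lines of Python. The adjacency test below checks only the endpoint conditions $c_1 = a_2$ and $d_1 = b_2$; the extra requirement $A(x) = 1 - B(x)$ on the overlap is assumed to hold, as it does automatically when both reference functions on the overlap are linear. The names `are_adjacent` and `join` are hypothetical.

```python
import math
from typing import Tuple

# Fuzzy interval [a, b, c, d] with a < b < c < d (hypothetical type alias).
Trapezoid = Tuple[float, float, float, float]


def are_adjacent(left: Trapezoid, right: Trapezoid) -> bool:
    """Endpoint conditions for adjacency: c1 = a2 and d1 = b2."""
    return math.isclose(left[2], right[0]) and math.isclose(left[3], right[1])


def join(left: Trapezoid, right: Trapezoid) -> Trapezoid:
    """Join [a1, b1, c1, d1] and [a2, b2, c2, d2] into [a1, b1, c2, d2]."""
    if not are_adjacent(left, right):
        raise ValueError("only adjacent fuzzy intervals can be joined")
    return (left[0], left[1], right[2], right[3])
```

For instance, joining `(1, 3, 4, 6)` with `(4, 6, 7, 9)` yields `(1, 3, 7, 9)`.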

A dense fuzzy interval is maximal if no superset of it is dense. However, a subset of it may not be dense, because the downward and upward closure properties for dense sets may not hold in this case.

E. Theorem

The join of two fuzzy intervals is not dense if both of the fuzzy intervals are not dense, and is dense if at least one of the fuzzy intervals is dense.

Proof. To prove the above result we consider a data set $D$ with 8 transactions. The time intervals associated with the transactions are shown below.

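| Transaction id | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$ | $t_6$ | $t_7$ | $t_8$ |
|---|---|---|---|---|---|---|---|---|
| Time interval $[t_i, t_j]$ | [1,3] | [1,6] | [3,6] | [2,6] | [5,7] | [6,7] | [1,2] | [5,7] |

Table 1: Transaction dataset

Consider the fuzzy intervals $A = [1, 3, 4, 6]$ and $B = [4, 6, 7, 9]$, where the membership functions of $A$ and $B$ are respectively

$$A(x) = \begin{cases} 0, & x \le 1 \text{ or } x \ge 6 \\ (x - 1)/2, & 1 \le x \le 3 \\ 1, & 3 \le x \le 4 \\ (6 - x)/2, & 4 \le x \le 6 \end{cases}$$

and

$$B(x) = \begin{cases} 0, & x \le 4 \text{ or } x \ge 9 \\ (x - 4)/2, & 4 \le x \le 6 \\ 1, & 6 \le x \le 7 \\ (9 - x)/2, & 7 \le x \le 9 \end{cases}$$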

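The votes of the eight transactions for $A$ are

$$\mathrm{vote}^{A}_{t_1} = \frac{\int_{1}^{3} A(x)\,dx}{\int_{1}^{6} A(x)\,dx} = \frac{1}{3}$$

$$\mathrm{vote}^{A}_{t_2} = \frac{\int_{1}^{6} A(x)\,dx}{\int_{1}^{6} A(x)\,dx} = 1$$

$$\mathrm{vote}^{A}_{t_3} = \frac{\int_{3}^{6} A(x)\,dx}{\int_{1}^{6} A(x)\,dx} = \frac{2}{3}$$

$$\mathrm{vote}^{A}_{t_4} = \frac{\int_{2}^{6} A(x)\,dx}{\int_{1}^{6} A(x)\,dx} = \frac{2.75}{3}$$

$$\mathrm{vote}^{A}_{t_5} = \frac{\int_{5}^{7} A(x)\,dx}{\int_{1}^{6} A(x)\,dx} = \frac{0.25}{3}$$

$$\mathrm{vote}^{A}_{t_6} = \frac{\int_{6}^{7} A(x)\,dx}{\int_{1}^{6} A(x)\,dx} = 0$$

$$\mathrm{vote}^{A}_{t_7} = \frac{\int_{1}^{2} A(x)\,dx}{\int_{1}^{6} A(x)\,dx} = \frac{0.25}{3}$$

$$\mathrm{vote}^{A}_{t_8} = \frac{\int_{5}^{7} A(x)\,dx}{\int_{1}^{6} A(x)\,dx} = \frac{0.25}{3}$$

Therefore,

$$\mathrm{Density}(A) = \frac{\mathrm{vote}^{A}_{t_1} + \mathrm{vote}^{A}_{t_2} + \mathrm{vote}^{A}_{t_3} + \mathrm{vote}^{A}_{t_4} + \mathrm{vote}^{A}_{t_5} + \mathrm{vote}^{A}_{t_6} + \mathrm{vote}^{A}_{t_7} + \mathrm{vote}^{A}_{t_8}}{8} \approx \frac{3.1667}{8} \approx 0.3958$$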
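As a check on the least obvious of these values, the vote of $t_4 = [2, 6]$ for $A$ can be worked out from the membership function of $A$ given above. The intermediate steps below are not spelled out in the text and are offered only as a verification:

```latex
\begin{align*}
\int_{1}^{6} A(x)\,dx
  &= \int_{1}^{3} \frac{x-1}{2}\,dx + \int_{3}^{4} 1\,dx + \int_{4}^{6} \frac{6-x}{2}\,dx
   = 1 + 1 + 1 = 3, \\
\int_{2}^{6} A(x)\,dx
  &= \left[ \frac{(x-1)^2}{4} \right]_{2}^{3} + 1 + 1
   = \frac{3}{4} + 2 = 2.75, \\
\mathrm{vote}^{A}_{t_4}
  &= \frac{2.75}{3} \approx 0.9167 .
\end{align*}
```

The same total area of 3 is the denominator of every vote for $A$, and the eight numerators sum to 9.5, which gives $9.5 / (3 \cdot 8) \approx 0.3958$.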

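For the fuzzy interval $B$, the votes are computed in the same way:

$$\mathrm{vote}^{B}_{t_1} = \frac{\int_{1}^{3} B(x)\,dx}{\int_{4}^{9} B(x)\,dx} = 0$$

$$\mathrm{vote}^{B}_{t_2} = \frac{\int_{1}^{6} B(x)\,dx}{\int_{4}^{9} B(x)\,dx} = \frac{1}{3}$$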
