
Mining weighted sequential patterns in a sequence database with a time-interval weight

Author:
Joong Hyuk Chang
DCIT, Daegu University,

ABSTRACT


Weighted sequential pattern mining aims to find more interesting
sequential patterns by taking into account the different significance of
each data element in a sequence database.

OUTLINE

• ABSTRACT
• KEYWORDS
• APPLICATIONS
• INTRODUCTION
• RELATED WORK
• PROBLEM DEFINITION
• TiWS patterns
• Mining TiWS patterns in a large database
• EXPERIMENTAL RESULTS
• CONCLUSION

KEY WORDS

• Sequence database
• Time-interval sequence database
• Sequential pattern mining
• Weighted sequential pattern
• TiWS pattern
• TiWS support

KEY WORDS

• Sequence database: a database of sequences, each consisting of ordered elements or events.
• Time-interval sequence database: a sequence database in which each sequence has an associated time stamp list.
• Sequential pattern mining: given a set of sequences and a support threshold, find the complete set of frequent subsequences.
  E.g. given the support threshold min_sup = 2, <(ab)c> is a sequential pattern.
• Weighted sequential pattern mining: sequential pattern mining over the weighted sequences in a sequence database.

APPLICATIONS

• Applications of sequential pattern mining
– Customer purchase pattern analysis
  • First buy a computer, then a CD-ROM, and then a digital camera, within 3 months.
– Medical treatments, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, etc.
– Web access pattern analysis
– Telephone calling patterns, Weblog click streams
– DNA sequence and gene structure analysis

INTRODUCTION

• Sequential pattern mining aims to discover interesting patterns in a sequence database.
• Example:

[Customer_A]:
Laser printer    Jan
Scanner          Feb
CD burner        Mar

[Customer_B]:
Laser printer    Jan
Scanner          Jun
CD burner        Sep

• Both customers buy the same items in the same order, but the time intervals between their purchases differ. Support counting alone treats the two sequences identically.

RELATED WORK

• To improve the usefulness of mining results in real-world applications, weighted pattern mining has been studied in both association rule mining and sequential pattern mining. Most weighted pattern mining algorithms require pre-assigned weights, which are generally derived from quantitative information and the importance of items in the application.
• Sequential pattern mining research covers general sequential pattern mining, closed and maximal sequential pattern mining, etc.
• Among these algorithms, SPADE and PrefixSpan are known to be efficient in terms of processing time.

PROBLEM DEFINITION

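• Let $I = \{i_1, i_2, \ldots, i_n\}$ be the set of all items.
• A sequence $S = \langle s_1, s_2, \ldots, s_l \rangle$ is an ordered list of itemsets, where each $s_j$ is an itemset with $s_j \subseteq I$.
• Its time stamp list is $TS(S) = \langle t_1, t_2, \ldots, t_l \rangle$, where $t_{j-1} < t_j$.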

SID   Sequence                Time stamp list
10    <a,(abc),(ac),d>        <0,1,2,3>
20    <(ad),c,(bc),(ae)>      <1,2,3,4>
30    <(ad),(bc),(df)>        <1,3,5>
40    <a,(abc),d>             <2,3,4>
Subsequence vs. super sequence

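• Given two sequences $\alpha = \langle a_1, a_2, \ldots, a_n \rangle$ and $\beta = \langle b_1, b_2, \ldots, b_m \rangle$:
• $\alpha$ is called a subsequence of $\beta$, denoted $\alpha \subseteq \beta$, if there exist integers $1 \le j_1 < j_2 < \cdots < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_n \subseteq b_{j_n}$.
• In that case $\beta$ is a super sequence of $\alpha$.
• E.g. α = <(ab), d> is a subsequence of β = <(abc), (de)>, since (ab) ⊆ (abc) and d ⊆ (de).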
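The subsequence test can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: the function name `is_subsequence` and the encoding of an itemset as a string of single-character items are assumptions made for brevity.

```python
def is_subsequence(alpha, beta):
    """Return True if alpha is a subsequence of beta.

    Each sequence is a list of itemsets; an itemset is written as a
    string of single-character items, e.g. "abc" for (abc).
    Matching each itemset of alpha to the earliest possible itemset of
    beta is sufficient to decide whether valid indices j1 < j2 < ... exist.
    """
    j = 0
    for a in alpha:
        # Advance until some itemset of beta contains every item of a.
        while j < len(beta) and not set(a) <= set(beta[j]):
            j += 1
        if j == len(beta):
            return False
        j += 1  # the next itemset of alpha must match strictly later
    return True


alpha = ["ab", "d"]      # <(ab), d>
beta = ["abc", "de"]     # <(abc), (de)>
print(is_subsequence(alpha, beta))
print(is_subsequence(beta, alpha))
```

The first call checks the slide's example; the second shows that the relation is not symmetric.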

TiWS-patterns

• Time interval between a pair of itemsets
• Time-interval weight of a pair of itemsets
• Time-interval weight of a sequence
  1. Strength of a pair of itemsets

Time interval between a pair of itemsets

Definition 1 (time interval between a pair of itemsets):
Let $S = \langle s_1, s_2, s_3, \ldots, s_n \rangle$ be a sequence and $TS(S) = \langle t_1, t_2, t_3, \ldots, t_n \rangle$ its time stamp list.
The time interval between $s_i$ and $s_j$ is
$TI_{ij} = t_j - t_i$, where $1 \le i < j \le n$.
• There exist $\frac{n(n-1)}{2}$ pairs of itemsets, where $n$ is the number of itemsets in the sequence.

Possible pairs of itemsets for SID 10:

1st itemset   2nd itemset   Time interval
a             (abc)         1
a             (ac)          2
a             d             3
(abc)         (ac)          1
(abc)         d             2
(ac)          d             1
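The pair table for SID 10 can be reproduced with a short Python sketch. The function name `time_intervals` and the string encoding of itemsets are illustrative assumptions, not part of the slides.

```python
from itertools import combinations


def time_intervals(sequence, timestamps):
    """List (s_i, s_j, t_j - t_i) for every pair of itemsets with i < j."""
    return [
        (sequence[i], sequence[j], timestamps[j] - timestamps[i])
        for i, j in combinations(range(len(sequence)), 2)
    ]


# SID 10 from the example database: <a,(abc),(ac),d> with <0,1,2,3>
pairs = time_intervals(["a", "abc", "ac", "d"], [0, 1, 2, 3])
for first, second, interval in pairs:
    print(first, second, interval)
```

A sequence of n itemsets yields n(n-1)/2 rows, so the four itemsets of SID 10 give six pairs.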
Time-interval weight of a pair of itemsets

• Three weight functions are defined:
• WF_1, general scale weighting: $W_g(TI_{ij}) = \delta^{TI_{ij}/u} = \delta^{(t_j - t_i)/u}$
• WF_2, log scale weighting: $W_l(TI_{ij}) = \delta^{\log_2(1 + TI_{ij}/u)} = \delta^{\log_2(1 + (t_j - t_i)/u)}$
• WF_3, general scale weighting with a ceiling: $W_c(TI_{ij}) = \delta^{\lceil TI_{ij} \rceil / u} = \delta^{\lceil t_j - t_i \rceil / u}$
• Here $u$ ($u > 0$) is the size of the unit time, and $\delta$ ($0 < \delta < 1$) is the base that determines the amount of weight reduction per unit time.
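The three weight functions can be sketched as follows, reading δ as a base raised to a power (the slide calls δ "the base"). The function names and the default values `u=1.0` and `delta=0.9` are illustrative assumptions. In `wf_ceiling` the ceiling is applied to the time interval before dividing by u, which follows the slide text as written and is an assumption about the intended grouping.

```python
import math


def wf_general(ti, u=1.0, delta=0.9):
    """WF_1, general scale weighting: delta ** (TI / u)."""
    return delta ** (ti / u)


def wf_log(ti, u=1.0, delta=0.9):
    """WF_2, log scale weighting: delta ** log2(1 + TI / u)."""
    return delta ** math.log2(1 + ti / u)


def wf_ceiling(ti, u=1.0, delta=0.9):
    """WF_3, general scale weighting with a ceiling.

    Assumption: ceil(TI) / u, as the slide writes it.
    """
    return delta ** (math.ceil(ti) / u)


for ti in (0, 1, 2, 3):
    print(ti, wf_general(ti), wf_log(ti), wf_ceiling(ti))
```

All three functions return 1 for a zero interval and decrease as the interval grows; the log scale variant decays more slowly for long intervals.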

Time-interval weight of a sequence

• Definition 2:
• Strength of a pair of itemsets: $ST_{ij} = \mathrm{length}(s_i) \times \mathrm{length}(s_j)$.
• Time-interval weight of a sequence.
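The strength of each pair of itemsets can be sketched as below. This assumes that length(s) means the number of items in itemset s; the function name `pair_strengths` and the string encoding of itemsets are illustrative. The sketch covers only the strength, since the formula that combines strengths and pair weights into the weight of a sequence does not appear in the slide text.

```python
from itertools import combinations


def pair_strengths(sequence):
    """Return {(i, j): length(s_i) * length(s_j)} for all i < j (0-based).

    Assumption: the length of an itemset is its number of items.
    """
    return {
        (i, j): len(sequence[i]) * len(sequence[j])
        for i, j in combinations(range(len(sequence)), 2)
    }


# SID 10: <a,(abc),(ac),d>
for (i, j), strength in pair_strengths(["a", "abc", "ac", "d"]).items():
    print(i, j, strength)
```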

TiWS-Support

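• Definition 3 (TiW-support of a sequence):
• The TiW-support of a sequence X in SDB, TiW-Supp(X), is the total weight of the sequences in SDB that contain X divided by the total weight of all sequences in SDB:

$\text{TiW-Supp}(X) = \dfrac{\sum_{S \in SDB,\ X \subseteq S} W(S)}{\sum_{S \in SDB} W(S)}$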
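A minimal sketch of the support computation, reading the definition as "total weight of the sequences that contain X, divided by the total weight of all sequences". The sequence weights are taken as given inputs; the values used here are made-up numbers for illustration, not results from the slides. The names `contains` and `tiw_support` are hypothetical.

```python
def contains(sequence, pattern):
    """True if pattern is a subsequence of sequence (itemsets as strings)."""
    j = 0
    for p in pattern:
        while j < len(sequence) and not set(p) <= set(sequence[j]):
            j += 1
        if j == len(sequence):
            return False
        j += 1
    return True


def tiw_support(pattern, sdb, weights):
    """Weight of sequences containing the pattern over the total weight."""
    total = sum(weights.values())
    matched = sum(w for sid, w in weights.items() if contains(sdb[sid], pattern))
    return matched / total


# Example database from the problem definition.
sdb = {
    10: ["a", "abc", "ac", "d"],
    20: ["ad", "c", "bc", "ae"],
    30: ["ad", "bc", "df"],
    40: ["a", "abc", "d"],
}
# Hypothetical sequence weights, chosen only to make the arithmetic easy.
weights = {10: 0.5, 20: 0.25, 30: 0.75, 40: 0.5}

print(tiw_support(["a", "d"], sdb, weights))   # <a, d>
print(tiw_support(["ab", "c"], sdb, weights))  # <(ab), c>
```

With these weights, <a, d> is contained in sequences 10, 30 and 40, so its support is 1.75 / 2.0.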

TiWS-Support

• Definition 6 (TiWS pattern):
• Given a support threshold minSupport (0 < minSupport ≤ 1), a sequence X is a time-interval weighted sequential pattern (TiWS pattern) if TiW-Supp(X) is no less than the threshold, i.e. TiW-Supp(X) ≥ minSupport.

Anti-monotone property of TiWS-support

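• Let A and B be sequences in an SDB, with B a super sequence of A. Since A ⊆ B, every sequence in SDB that contains B also contains A, so the total weight of the sequences containing A is never less than the total weight of the sequences containing B.
• Accordingly, the following holds: TiW-Supp(A) ≥ TiW-Supp(B).
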
Mining TiWS patterns in a large sequence database

Input: SDB, minSupport, and a time-interval weighting function

psTiWS
• Scan SDB once.
• For each sequence S, call GetWeight(S).
• Find each time-interval weighted frequent item α such that TiW-Supp(α) ≥ minSupport.
• For each time-interval weighted frequent item α, output α and call Span(α, l, S|α).

Procedure GetWeight(S)
• Input: S = <s1, s2, s3, ..., sn> and TS(S) = <t1, t2, t3, ..., tn>
• Output: W(S)

Procedure Span(α, l, S|α)
• Input: a TiWS pattern α (initially a single item), l, and S|α
• Output: the extended TiWS patterns α', with l+1 and S|α'

Output: the complete set of TiWS patterns
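The first step of the procedure (one scan of SDB to find the time-interval weighted frequent items) can be sketched as below. This is a simplified illustration, not the psTiWS algorithm itself: the recursive Span step is omitted, and the sequence weights that GetWeight would produce are passed in as hypothetical values. The name `frequent_items` is also an assumption.

```python
from collections import defaultdict


def frequent_items(sdb, weights, min_support):
    """Return {item: TiW-support} for items whose support >= min_support.

    sdb maps a sequence id to a list of itemsets (strings of items);
    weights maps a sequence id to its precomputed weight W(S).
    """
    total = sum(weights.values())
    item_weight = defaultdict(float)
    for sid, sequence in sdb.items():
        # Count each item once per sequence, however often it occurs.
        items = set().union(*map(set, sequence))
        for item in items:
            item_weight[item] += weights[sid]
    return {
        item: w / total
        for item, w in sorted(item_weight.items())
        if w / total >= min_support
    }


sdb = {
    10: ["a", "abc", "ac", "d"],
    20: ["ad", "c", "bc", "ae"],
    30: ["ad", "bc", "df"],
    40: ["a", "abc", "d"],
}
# Hypothetical weights standing in for the output of GetWeight(S).
weights = {10: 0.5, 20: 0.25, 30: 0.75, 40: 0.5}

print(frequent_items(sdb, weights, min_support=0.3))
```

Each surviving item would then seed the pattern growth step, which the anti-monotone property makes safe: no super sequence of a discarded item can reach the threshold.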


CONCLUSION

• A process for obtaining the weight of a sequence is proposed.
• A new framework is developed for mining weighted sequential patterns.
