Software and Techniques
BUAN6324/MIS6324
Spring, 2016 Gregory G. MacDonald, PhD
(Lecture #6b)
Agenda
Project selection and expectations
No paper
PowerPoint presentation, ~10-15 minutes
Describe the dataset, what you did in the data prep phase,
the methods considered, challenges encountered, model
evaluation, and insights.
Sequence Data
A sequence is an ordered list of elements; each element is
made up of one or more events (items).
The object generating the sequence could be a customer,
sensor, server, etc.
Sequence Definition
Let's check

Object   Timestamp   Elements
A        10          {1}, {2,3}, {4}
A        20          {5}
A        30          {6}, {7}
B        10          {4,5,6}, {1}

The sequence {1}, {2,3}, {4} has 3 elements and 4 events (items).
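The counting rule can be sketched in a few lines of Python. This is only an illustration: representing a sequence as a list of sets is an assumption, and the helper names count_elements and count_events are hypothetical.

```python
# A sequence is modeled as an ordered list of elements;
# each element is a set of events (items). This representation is an assumption.
sequence = [{1}, {2, 3}, {4}]


def count_elements(seq):
    """Number of elements (item sets) in the sequence."""
    return len(seq)


def count_events(seq):
    """Total number of events (items) across all elements."""
    return sum(len(element) for element in seq)


print(count_elements(sequence), count_events(sequence))  # prints: 3 4
```

The same helpers applied to {4,5,6}, {1} give 2 elements and 4 events.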
Sequence Examples
How many elements? How many events (items)?
Subsequence Definition
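As a minimal sketch, assuming the standard definition (a sequence t is a subsequence of s if each element of t is a subset of a distinct element of s, with the order of elements preserved), containment can be checked as follows. The function name is_subsequence and the list-of-sets representation are illustrative assumptions, and no timing constraints are applied here.

```python
def is_subsequence(t, s):
    """Return True if sequence t is contained in sequence s.

    Both arguments are lists of sets. Each element of t must be a subset
    of some element of s, and the matches must appear in the same order.
    """
    pos = 0
    for element in t:
        # Greedily take the earliest element of s (at or after pos)
        # that contains every item of the current element of t.
        while pos < len(s) and not set(element) <= set(s[pos]):
            pos += 1
        if pos == len(s):
            return False
        pos += 1  # each element of s can be matched at most once
    return True


s = [{1}, {2, 3}, {4}]
print(is_subsequence([{2}, {4}], s))  # True
print(is_subsequence([{4}, {2}], s))  # False: order is violated
print(is_subsequence([{1, 2}], s))    # False: 1 and 2 are in different elements
```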
The Task
Extracting Sequential
Patterns
Note
* The number of candidate subsequences is substantially
greater than the number of candidate itemsets (from market
basket analysis).
* An item can appear at most once in an itemset, but an event
can appear more than once in a sequence. Given two items a and b,
only one 2-itemset can be generated, but there are many
candidate 2-subsequences: <{a,b}>, <{a},{b}>, <{b},{a}>,
<{a},{a}>, <{b},{b}>.
* Order matters in a subsequence but not in an itemset: {1,2}
and {2,1} refer to the same itemset, but <{1},{2}> and <{2},{1}>
are different subsequences (temporal consideration).
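To make the size difference concrete, the candidates for two items can be enumerated directly. This is a rough sketch under one assumption: a "2-subsequence" here means a sequence containing exactly two events in total. The variable names are illustrative.

```python
from itertools import combinations, product

items = ["a", "b"]

# Candidate 2-itemsets: unordered pairs of distinct items.
itemsets = [frozenset(pair) for pair in combinations(items, 2)]

# Candidate 2-subsequences (two events in total) come in two shapes:
# one element holding two distinct items ...
one_element = [(frozenset(pair),) for pair in combinations(items, 2)]
# ... or two ordered single-item elements, where an item may repeat.
two_elements = [
    (frozenset([x]), frozenset([y])) for x, y in product(items, repeat=2)
]
candidates = one_element + two_elements

print(len(itemsets), len(candidates))  # prints: 1 5
```

With n items the same construction gives n(n-1)/2 candidate 2-itemsets but n(n-1)/2 + n^2 candidate 2-subsequences.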
Timing Constraints
Maxspan: the maximum allowed time difference between the
latest and the earliest occurrences of events in the entire
subsequence
* Affects the support count
* A large maxspan may detect spurious patterns
Maxgap: the maximum time difference between two consecutive
elements of a subsequence (not events/items but elements)
Mingap: the minimum time difference between two consecutive
elements of a subsequence (again, not events/items but
elements)
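The three constraints can be checked against the timestamps of the elements matched by one occurrence of a subsequence. The sketch below is illustrative only: the function name satisfies_timing is hypothetical, and treating a difference exactly equal to a limit as acceptable is an assumption (some textbooks and tools use strict inequalities instead).

```python
def satisfies_timing(timestamps, maxspan=None, mingap=None, maxgap=None):
    """Check one occurrence of a subsequence against timing constraints.

    timestamps: times of the matched elements, in sequence order.
    A constraint set to None is not enforced.
    """
    if not timestamps:
        return True
    # Maxspan: latest minus earliest occurrence over the whole subsequence.
    if maxspan is not None and timestamps[-1] - timestamps[0] > maxspan:
        return False
    # Maxgap and mingap apply to consecutive elements, not to single events.
    for earlier, later in zip(timestamps, timestamps[1:]):
        gap = later - earlier
        if maxgap is not None and gap > maxgap:
            return False
        if mingap is not None and gap < mingap:
            return False
    return True


print(satisfies_timing([10, 20, 30], maxspan=20))         # True
print(satisfies_timing([10, 20, 30], maxspan=15))         # False
print(satisfies_timing([10, 20, 30], maxgap=5))           # False
print(satisfies_timing([10, 30], mingap=15, maxgap=25))   # True
```

An occurrence that fails any constraint does not count toward support, which is why maxspan affects the support count.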
Quiz!
R package: arulesSequences
Mid-Term Review
See eLearning midterm guide