
Business Intelligence Software and Techniques
BUAN6324/MIS6324
Spring 2016, Gregory G. MacDonald, PhD
(Lecture #6b)

Agenda
Project selection and expectations
No paper
PowerPoint presentation, about 10-15 minutes
Describe the dataset, what you did in the data prep phase, methods considered, challenges encountered, model evaluation, and insights.

Sequential Pattern Mining


Mid-Term Review

Basis for this lecture:
http://www-users.cs.umn.edu/~kumar/dmbook/ch7.pdf

Sequential Pattern Mining


Considers a sequence of transactions over time rather than a single shopping trip, which adds a temporal dimension.
Patterns may consist of elements that are not contiguous in time. Extracted patterns can potentially be used to predict future events.

Sequence Data
A sequence is an ordered list of elements; each element is made up of one or more events (items).

The object that generates a sequence could be a customer, sensor, server, etc.

Sequence Definition

Let's check:

Object  Timestamp  Elements
A       10         {1},{2,3},{4}
A       20         {5}
A       30         {6},{7}
B       10         {4,5,6},{1}

The first row (object A, timestamp 10) has 3 elements and 4 events (items).
The length of the A sequence is 6 (6 elements), with 7 events.
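
The counts above can be checked with a short sketch. It assumes each sequence is stored as a Python list of sets (one set per element) and that object A's sequence is formed by concatenating its rows in timestamp order. The names `sequences`, `count_elements`, and `count_events` are illustrative only.

```python
# Each sequence is an ordered list of elements; each element is a set of events (items).
# Assumption: object A's rows are concatenated in timestamp order (10, 20, 30).
sequences = {
    "A": [{1}, {2, 3}, {4}, {5}, {6}, {7}],
    "B": [{4, 5, 6}, {1}],
}


def count_elements(sequence):
    """Length of a sequence = number of elements it contains."""
    return len(sequence)


def count_events(sequence):
    """Total number of events (items) summed over all elements."""
    return sum(len(element) for element in sequence)


if __name__ == "__main__":
    for name, seq in sequences.items():
        print(name, "elements:", count_elements(seq), "events:", count_events(seq))
```

Under these assumptions, object A comes out at 6 elements and 7 events, matching the slide, and object B at 2 elements and 4 events.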

Sequence Examples

How many elements? How many events (items)?

Subsequence Definition
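
The slide body is not reproduced here, so the following is only a sketch under the usual textbook definition: a sequence t is a subsequence of s if each element of t is a subset of a distinct element of s, with the order of elements preserved. The function names `is_subsequence` and `support` are hypothetical, and sequences are assumed to be lists of sets.

```python
def is_subsequence(candidate, sequence):
    """Return True if `candidate` is contained in `sequence`.

    Each candidate element must be a subset of some element of `sequence`,
    and the matched elements must appear in the same order (greedy scan).
    """
    pos = 0
    for element in candidate:
        # Advance to the next element of `sequence` that contains `element`.
        while pos < len(sequence) and not set(element) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # later candidate elements must match strictly later positions
    return True


def support(candidate, database):
    """Fraction of data sequences in `database` that contain `candidate`."""
    hits = sum(is_subsequence(candidate, seq) for seq in database)
    return hits / len(database)


if __name__ == "__main__":
    seq_a = [{1}, {2, 3}, {4}, {5}, {6}, {7}]
    seq_b = [{4, 5, 6}, {1}]
    print(is_subsequence([{2}, {6}], seq_a))   # order preserved
    print(is_subsequence([{6}, {2}], seq_a))   # order violated
    print(support([{4}, {1}], [seq_a, seq_b]))
```

For example, <{2},{6}> is contained in the sequence <{1},{2,3},{4},{5},{6},{7}>, while <{6},{2}> is not, because the order of elements matters.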

The Task


Extracting Sequential Patterns

Note
* The number of candidate subsequences is substantially greater than the number of candidate itemsets (from market basket analysis).
* An item can appear at most once in an itemset, but an event can appear more than once in a sequence. Given two items a and b, only one candidate 2-itemset {a,b} can be generated, but there are many candidate 2-subsequences: <{a,b}>, <{a},{b}>, <{b},{a}>, <{a},{a}>, <{b},{b}>.
* Order matters in a subsequence but not in an itemset: {1,2} and {2,1} refer to the same itemset, but <{1},{2}> and <{2},{1}> are different subsequences (temporal consideration).
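
To make the difference in candidate counts concrete, here is a small enumeration sketch. It assumes items inside a single element are unordered (so {a,b} and {b,a} are the same element) and that an event may repeat across elements. The function names are illustrative, not part of any library.

```python
from itertools import combinations, product


def candidate_2_itemsets(items):
    """All unordered pairs of distinct items."""
    return [frozenset(pair) for pair in combinations(sorted(items), 2)]


def candidate_2_sequences(items):
    """All sequences containing exactly two events.

    Either both events sit in one element (unordered, distinct items),
    or they sit in two consecutive elements (ordered, repeats allowed).
    """
    items = sorted(items)
    one_element = [(frozenset(pair),) for pair in combinations(items, 2)]
    two_elements = [
        (frozenset([first]), frozenset([second]))
        for first, second in product(items, repeat=2)
    ]
    return one_element + two_elements


if __name__ == "__main__":
    items = {"a", "b"}
    print(len(candidate_2_itemsets(items)), "candidate 2-itemset(s)")
    for seq in candidate_2_sequences(items):
        print("<" + ", ".join("{" + ",".join(sorted(e)) + "}" for e in seq) + ">")
```

With n items this gives n(n-1)/2 candidate 2-itemsets but n(n-1)/2 + n^2 candidate 2-sequences, so the gap widens quickly as n grows.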

Sequence Generation (GSP)


Timing Constraints
Maxspan: the maximum allowed time difference between the latest and the earliest occurrences of events in the entire subsequence
* Affects the support count
* A large maxspan may detect spurious patterns
Maxgap: the maximum time difference between two consecutive elements of a subsequence (elements, not events/items)
Mingap: the minimum time difference between two consecutive elements of a subsequence (again, elements, not events/items)
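
The three constraints can be sketched as a check on the timestamps at which the elements of a subsequence were matched. This is a simplified illustration: it assumes one timestamp per matched element, given in increasing order, and it assumes the gap between consecutive elements must be strictly greater than mingap and at most maxgap. The function name `satisfies_timing` is hypothetical.

```python
def satisfies_timing(times, maxspan=None, mingap=None, maxgap=None):
    """Check timing constraints for one occurrence of a subsequence.

    `times` holds the timestamp of each matched element, in order.
    A constraint set to None is not enforced.
    """
    # maxspan: latest minus earliest occurrence over the whole subsequence
    if maxspan is not None and times[-1] - times[0] > maxspan:
        return False
    # mingap / maxgap: difference between consecutive elements
    for earlier, later in zip(times, times[1:]):
        gap = later - earlier
        if maxgap is not None and gap > maxgap:
            return False
        if mingap is not None and gap <= mingap:  # assumption: strict lower bound
            return False
    return True


if __name__ == "__main__":
    occurrence = [1, 3, 6]  # timestamps of three matched elements
    print(satisfies_timing(occurrence, maxspan=5, mingap=1, maxgap=3))
    print(satisfies_timing(occurrence, maxspan=4))
```

An occurrence that fails any constraint does not count toward the support of the pattern, which is why these settings affect the support count.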


Quiz!


R package arulesSequences

Mid-Term Review
See eLearning midterm guide
