You are on page 1of 14

Sequence Analysis

Athira P- AM.BU.P2MBA20029

1
Agenda
• Predictive analytics
• Association
• Sequential data base
• Sequential pattern mining
• Algorithm
• GSP Py
• Applications
2
Predictive analytics
• Data mining
Data Mining is a process used by organizations to extract specific data from huge databases
to solve business problems. It primarily turns raw data into useful information. Builds a
model to identify a pattern among the attributes presented in the data set of customers.
• Categories
Prediction
Association
Clustering

3
Association
Finding frequent patterns, associations, correlations, or causal structures
among sets of items or objects in transaction databases, relational databases,
and other information repositories.

• Link analysis
• Sequence analysis

4
Sequence data base
• Customer shopping sequences
Purchases laptop first, then digital camera, and then smart phone in 6 months
• Medical treatments, natural disasters..
• Stocks and markets
• Biological sequences: DNA, Protein..
• Soft ware engineering: Program execution..

5
Sequential pattern mining

• In Sequential pattern mining relationships between objects are examined


in terms of their order of occurrence to identify associations over time.
• Transaction database, sequence data base and time-series data base
• Gapped and non-gapped sequential patterns
• An example of a sequential pattern is “Customers who buy a Canon
digital camera are likely to buy an HP colour printer within a month.”

6
• Sequential pattern mining: Given a set of sequences, find a complete set
of frequent sub sequences (satisfying the min_sup threshold)

A sequence data base • A sequence: (ef) (ab) (df) c b


SID Sequence
10 <a(abc)(ac)d(cf)> • An element(event) may contain set of items
20 <(ad)c(bc)(ae)> Items within an element is unordered, usually ordered
30 <(ef)(ab)(df)cb> alphabetically
40 <eg(af)cbc>
• <a(bc)dc> is a sub sequence of <a(abc)
(ac)d(cf)>
• Given a support threshold min_sup =2, <(ab)c> is a sequential pattern
7
Sequential pattern mining algorithms

• Algorithm requirements: Efficient, Scalable, Finding complete set, Incorporating


various kinds of user-specific constraints
• The Apriori property: if a subsequence s1 is infrequent, none of its super-sequences
can be frequent
• Representative algorithms
• GSP(Generalized sequential patterns)
• SPADE
• PrefixSpan
• Clospan 8
Apriori algorithm - Example

9
GSP Py
• Generalized Sequence Pattern (GSP) algorithm in Python
• Install it with pip:
pip install gsppy
• To use it in a project, import it and use the GSP class.
from gsppy.gsp import GSP
• It is assumed that your transactions are a sequence of sequences representing
items in baskets.
10
transactions = [ ['Bread', 'Milk'], ['Bread', 'Diaper', 'Beer', 'Eggs'],
['Milk', 'Diaper', 'Beer', 'Coke'], ['Bread', 'Milk', 'Diaper', 'Beer'],
['Bread', 'Milk', 'Diaper', 'Coke'] ]

• Init the class to prepare the transactions and to find patterns in baskets that occur over the
support threshold (count):

result = GSP(transactions).search(0.3)

11
Applications
• Sales transactions
• Credit card transactions
• Banking services
• Insurance service products
• Telecommunication services
• Medical records

12
References
• https://www.youtube.com/watch?v=GhEteXWNIXc
• http://hanj.cs.illinois.edu/cs412/bk3/7_sequential_pattern_mining.pdf
• SEQUENTIAL DATA MINING FOR BUSINESS STATISTIC ANALYSIS: IJCSET –
Volume 3, Issue 2 – February 2017.
• Sequential Pattern Mining – Approaches and Algorithms: ACM Journal Name, Vol. V,
No. N, M 20YY, Pages 1–46.
• https://www.javatpoint.com/data-mining

13
Thank you

14

You might also like