Athira P - AM.BU.P2MBA20029
Agenda
• Predictive analytics
• Association
• Sequence database
• Sequential pattern mining
• Algorithm
• GSP Py
• Applications
Predictive analytics
• Data mining
Data mining is a process used by organizations to extract useful patterns from large databases
to solve business problems. It turns raw data into useful information, for example by building
a model that identifies patterns among the attributes in a customer data set.
• Categories
Prediction
Association
Clustering
Association
Finding frequent patterns, associations, correlations, or causal structures
among sets of items or objects in transaction databases, relational databases,
and other information repositories.
• Link analysis
• Sequence analysis
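To make "frequent patterns" and "associations" concrete, here is a minimal Python sketch that counts how many baskets contain each pair of items and computes the confidence of a rule such as Beer -> Diaper. It reuses the example baskets from the GSP Py slide and ignores the order of items inside a basket. The function names `frequent_pairs` and `confidence` and the 0.6 support threshold are illustrative assumptions, not part of any library.

```python
from collections import Counter
from itertools import combinations

# Example baskets (from the GSP Py slide); order inside a basket is ignored here.
transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
    ["Bread", "Milk", "Diaper", "Beer"],
    ["Bread", "Milk", "Diaper", "Coke"],
]


def frequent_pairs(baskets, min_support):
    """Return {pair: count} for item pairs contained in at least a
    `min_support` fraction of the baskets (hypothetical helper)."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c for pair, c in counts.items() if c / n >= min_support}


def confidence(baskets, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    with_lhs = [b for b in baskets if lhs in b]
    with_both = [b for b in with_lhs if rhs in b]
    return len(with_both) / len(with_lhs) if with_lhs else 0.0


print(frequent_pairs(transactions, 0.6))
print(confidence(transactions, "Beer", "Diaper"))
```

Every basket that contains Beer also contains Diaper, so that rule has confidence 1.0, while Diaper -> Beer holds in only 3 of 4 baskets. Sequence analysis differs in that the order of events also matters.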
Sequence database
• Customer shopping sequences
A customer buys a laptop first, then a digital camera, and then a smartphone within 6 months
• Medical treatments, natural disasters, etc.
• Stocks and markets
• Biological sequences: DNA, proteins, etc.
• Software engineering: program execution sequences
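A sequence database can be represented as a mapping from each customer to a time-ordered list of purchase events, where each event is the set of items bought together. The sketch below uses hypothetical customers and month indices to check the laptop, digital camera, smartphone example: it asks whether the three purchases occur in that order (other purchases may come in between) within a 6-month window. The data layout and the `occurs_within` helper are assumptions made for illustration.

```python
# Hypothetical sequence database: customer ID -> list of (month, itemset) events.
sequence_db = {
    "c1": [(0, {"laptop"}), (2, {"digital camera", "memory card"}), (5, {"smartphone"})],
    "c2": [(0, {"smartphone"}), (1, {"laptop"})],
    "c3": [(0, {"laptop"}), (4, {"digital camera"}), (9, {"smartphone"})],
}


def occurs_within(events, pattern, max_months):
    """True if the items of `pattern` are bought in this order (other purchases
    may occur in between) and the whole match spans at most `max_months`."""
    events = sorted(events, key=lambda e: e[0])  # order events by time
    for i, (start, items) in enumerate(events):
        if pattern[0] not in items:
            continue
        k = 1  # number of pattern items matched so far
        for month, later_items in events[i + 1:]:
            if k == len(pattern) or month - start > max_months:
                break
            # Greedily take the earliest later event that has the next item.
            if pattern[k] in later_items:
                k += 1
        if k == len(pattern):
            return True
    return False


pattern = ["laptop", "digital camera", "smartphone"]
matches = [cid for cid, ev in sequence_db.items() if occurs_within(ev, pattern, 6)]
print(matches)
```

Only customer c1 matches: c2 buys in the wrong order, and c3 takes 9 months, which exceeds the window.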
Sequential pattern mining
• Sequential pattern mining: given a set of sequences, find the complete set of frequent
subsequences, i.e. those whose support (the number of sequences that contain them) meets the min_sup threshold
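The definition above rests on support counting. A minimal sketch, assuming each sequence is a list of itemsets (Python sets) and that a pattern is contained in a sequence when each of its itemsets is a subset of a distinct element of the sequence, in the same order. The database and the helper names are hypothetical, and min_sup is an absolute count here.

```python
def is_subsequence(pattern, sequence):
    """True if every itemset of `pattern` is a subset of a distinct element of
    `sequence`, with the matched elements appearing in the same order."""
    i = 0  # index of the next pattern itemset to match
    for element in sequence:
        # Greedily match the earliest element that contains the current itemset.
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


def support(pattern, database):
    """Number of sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


# Hypothetical sequence database: one list of itemsets per customer.
database = [
    [{"laptop"}, {"camera", "bag"}, {"smartphone"}],
    [{"laptop", "mouse"}, {"smartphone"}],
    [{"camera"}, {"laptop"}, {"smartphone", "case"}],
    [{"smartphone"}, {"laptop"}],
]
min_sup = 2  # absolute count

for pattern in ([{"laptop"}, {"smartphone"}],
                [{"camera"}, {"smartphone"}],
                [{"smartphone"}, {"laptop"}]):
    s = support(pattern, database)
    print(pattern, s, "frequent" if s >= min_sup else "infrequent")
```

Greedy earliest matching is enough here: matching each itemset as early as possible never rules out a match for the remaining itemsets.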
GSP Py
• Implementation of the Generalized Sequential Pattern (GSP) algorithm in Python
• Install it with pip:
pip install gsppy
• To use it in a project, import it and use the GSP class.
from gsppy.gsp import GSP
• The transactions are expected to be a list of sequences, where each sequence lists the
items in one basket.
transactions = [ ['Bread', 'Milk'], ['Bread', 'Diaper', 'Beer', 'Eggs'],
['Milk', 'Diaper', 'Beer', 'Coke'], ['Bread', 'Milk', 'Diaper', 'Beer'],
['Bread', 'Milk', 'Diaper', 'Coke'] ]
• Initialize the class with the transactions and call search() to find the patterns whose support
reaches the given threshold, expressed as a fraction of the transactions (0.3 = 30%):
result = GSP(transactions).search(0.3)
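The GSP class hides the search loop. As a rough illustration of what a GSP-style search does, the sketch below implements a simplified level-wise version in plain Python: frequent length-k sequences are joined into length-(k+1) candidates, candidates with an infrequent subsequence are pruned, and the survivors are counted against the transactions. It assumes each transaction is an ordered list of single items, that a pattern matches when its items appear in that order with gaps allowed, and that min_support is a fraction of the transactions. This is not gsppy's implementation, and gsppy's matching rules and output format may differ.

```python
def contains(transaction, pattern):
    """True if the items of `pattern` appear in `transaction` in order (gaps allowed)."""
    remaining = iter(transaction)
    # `item in remaining` consumes the iterator up to the match, enforcing order.
    return all(item in remaining for item in pattern)


def gsp(transactions, min_support):
    """Simplified GSP sketch: result[k - 1] maps each frequent length-k
    pattern (a tuple of items) to its support count."""
    n = len(transactions)

    def count_frequent(candidates):
        counts = {c: sum(contains(t, c) for t in transactions) for c in candidates}
        return {c: k for c, k in counts.items() if k / n >= min_support}

    items = sorted({item for t in transactions for item in t})
    level = count_frequent([(item,) for item in items])
    result = []
    while level:
        result.append(level)
        candidates = set()
        for s1 in level:
            for s2 in level:
                # Join step: s1 without its first item must equal s2 without its last.
                if s1[1:] == s2[:-1]:
                    cand = s1 + s2[-1:]
                    # Prune step: every subsequence with one item dropped must be frequent.
                    if all(cand[:i] + cand[i + 1:] in level for i in range(len(cand))):
                        candidates.add(cand)
        level = count_frequent(sorted(candidates))
    return result


transactions = [
    ["Bread", "Milk"],
    ["Bread", "Diaper", "Beer", "Eggs"],
    ["Milk", "Diaper", "Beer", "Coke"],
    ["Bread", "Milk", "Diaper", "Beer"],
    ["Bread", "Milk", "Diaper", "Coke"],
]
for length, patterns in enumerate(gsp(transactions, 0.3), start=1):
    print(length, patterns)
```

With these five baskets and a threshold of 0.3 (at least 2 of 5 transactions), the sketch drops Eggs at the first level and stops at length 3, reporting patterns such as ('Bread', 'Milk', 'Diaper'). The prune step relies on the Apriori property: if a sequence is frequent, every subsequence of it must be frequent too.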
Applications
• Sales transactions
• Credit card transactions
• Banking services
• Insurance service products
• Telecommunication services
• Medical records
Thank you