
Introduction to Market Basket Analysis in Python - Practical Business P... http://pbpython.com/market-basket-analysis.html

1 de 12 01/12/2017 10:35 a. m.

import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

df = pd.read_excel('http://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail
df.head()


df['Description'] = df['Description'].str.strip()
df.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
df['InvoiceNo'] = df['InvoiceNo'].astype('str')
df = df[~df['InvoiceNo'].str.contains('C')]

basket = (df[df['Country'] == "France"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))


def encode_units(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

basket_sets = basket.applymap(encode_units)
basket_sets.drop('POSTAGE', inplace=True, axis=1)
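To make the pivot and encoding steps above concrete, here is a minimal sketch on a tiny, made-up set of transactions. The `toy` data and the `toy_basket` / `toy_sets` names are hypothetical and not part of the original dataset. The sketch also swaps in a vectorized comparison, `(toy_basket > 0).astype(int)`, in place of `applymap(encode_units)`; for non-negative integer quantities the two should give the same 0/1 result.

```python
import pandas as pd

# Hypothetical toy transactions; column names mirror the ones used above.
toy = pd.DataFrame({
    'InvoiceNo':   ['1', '1', '2', '2', '3'],
    'Description': ['TEA', 'MUG', 'TEA', 'TEA', 'MUG'],
    'Quantity':    [2, 1, 3, 1, 6],
})

# One row per invoice, one column per product, summed quantities as values.
toy_basket = (toy
              .groupby(['InvoiceNo', 'Description'])['Quantity']
              .sum().unstack().reset_index().fillna(0)
              .set_index('InvoiceNo'))

# Vectorized alternative to applymap(encode_units): 1 if bought, else 0.
toy_sets = (toy_basket > 0).astype(int)

print(toy_basket)
print(toy_sets)
```

Invoice `2` lists TEA twice, so the `groupby(...).sum()` step collapses those rows into a single quantity before the table is one-hot encoded.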

frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)

rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)


rules.head()
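The `rules` table includes columns such as `confidence` and `lift`. As a rough illustration of what those numbers mean, the sketch below computes support, confidence, and lift by hand for one rule, using the standard textbook definitions on a made-up list of baskets. The `transactions` data and the `support` / `rule_metrics` helpers are hypothetical and are not how mlxtend computes its results internally.

```python
# Hypothetical toy baskets: each set is the items on one invoice.
transactions = [
    {'TEA', 'MUG'},
    {'TEA', 'MUG'},
    {'TEA'},
    {'MUG', 'SPOON'},
    {'SPOON'},
]

def support(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent))
    confidence = both / support(antecedent)   # P(consequent | antecedent)
    lift = confidence / support(consequent)   # > 1 means bought together more than chance
    return both, confidence, lift

sup, conf, lift = rule_metrics({'TEA'}, {'MUG'})
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift of exactly 1 would mean the two items appear together about as often as independent purchases would predict, which is why `min_threshold=1` on `lift` is a permissive starting filter.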



rules[(rules['lift'] >= 6) &
      (rules['confidence'] >= 0.8)]


basket['ALARM CLOCK BAKELIKE GREEN'].sum()

340.0

basket['ALARM CLOCK BAKELIKE RED'].sum()

316.0

basket2 = (df[df['Country'] == "Germany"]
           .groupby(['InvoiceNo', 'Description'])['Quantity']
           .sum().unstack().reset_index().fillna(0)
           .set_index('InvoiceNo'))

basket_sets2 = basket2.applymap(encode_units)
basket_sets2.drop('POSTAGE', inplace=True, axis=1)
frequent_itemsets2 = apriori(basket_sets2, min_support=0.05, use_colnames=True)
rules2 = association_rules(frequent_itemsets2, metric="lift", min_threshold=1)

rules2[(rules2['lift'] >= 4) &
       (rules2['confidence'] >= 0.5)]


42 Comments

Jarad Collier 2 months ago


This is a phenomenal post! I coded an R solution because the apriori algorithm doesn't seem to be
extensively ported to Python. I wish mlxtend had more of the features that R does, such as
removing redundant rules and plotting a network flow graph (example on this page:
http://www.kdnuggets.com/20...). MLxtend, although minimal, is the only one I know about so
far. Does anyone know of others with more depth for market basket analysis, like R?

Chris Moffitt (Mod), replying to Jarad Collier, 2 months ago


I do not know of other implementations with more depth in Python, but I do know that
Sebastian, the maintainer of mlxtend, has made several improvements to this function
based on feedback/requests from others who have read this thread. I encourage you to
reach out to him with ideas (and even better, with code)!

Ahmed Askar a day ago


Phenomenal resource, thank you.
Is there a way to look into rules with frequent itemsets of more than 2 items?

Ahmed Askar, replying to Ahmed Askar, 14 hours ago


I'm going to answer my own question. It does support itemsets of more than 2 items in the
antecedants and consequents columns. Thank you again.

Kailash Gopalan 23 days ago


Hi Chris,
Are there any good graph visualizations you know of for the analysis described in the post?
Kailash

John Nguyen 24 days ago


So when I remove the country filter to be like this:

basket = (df.groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
