Market Basket Analysis

Wake Forest University | School of Business
Marketing Analytics
BAN 6065
Professor Jia Li
Spring 2021
Last Time
• Competitor Analytics
– Perceptual Mapping
This Week’s Agenda
• Product Analytics
– Recommendation System
– Synchronous Lecture: Market Basket Analysis

– Asynchronous Lecture: Collaborative Filtering
Customers tend to buy things together…
What is Market Basket Analysis (MBA)?
• Relationships through associations
Market-Basket (transactions)
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Examples of discovered patterns:
• {Bread} → {Milk}
• {Diaper} → {?}
• {Milk, Bread} → {?}
• In summary, we want to know, for a shopping basket, “which product is likely to go with which other product?”
A popular “data mining legend”

• When grocery shoppers bought diapers on Thursdays and Saturdays, they also tended to buy beer.
– The rationale for this result is that young families often “stock up for a weekend at home with their babies.”
– The legend then states that Wal-Mart increased profits on Fridays by placing alcoholic beverages in the baby section of the store.
• The story is untrue!
Why Do We Care?
• Product placement
• Whole Foods: next to flowers are birthday cards
• Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of
also purchasing one of three types of candy bars (Forbes)
• Recommendations
• Amazon.com: as you are looking at HDTVs, you might also want HDMI
cables
• Bundling
• E.g., travel “packages” – flight, hotel, car
• Other Applications
• Price discrimination
• Website / catalog design
• Fraud detection (multiple suspicious insurance claims)
• Medical complications (based on combinations of treatments)
Association Rules
• Definition
• Given a transactional database (a set of transactions), find rules that predict the occurrence of an item based on the occurrences of other items in the database.
• Implication means co-occurrence, not causality
Rule Format
• IF {set of items} ⇒ THEN {set of items}
• Example: If {diapers} ⇒ then {beer}
• “IF” part: Antecedent or Body of the rule
• “THEN” part: Consequent or Head of the rule
• “Item set” = the items (e.g., products) comprising the antecedent or consequent
• Antecedent and consequent are disjoint (i.e., have no items in common)
Many Rules are Possible
Consider the example below:
Market-Basket (transactions)
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Transaction 2 supports several rules, such as
– “If bread, then diaper”
– “If beer, then diaper”
– “If bread and beer, then eggs”
– + many more …
A brute-force approach can be prohibitively expensive
It can be shown that if a dataset contains n items, the total # of possible rules is: $R = 3^n - 2^{n+1} + 1$
e.g., if there are 100 products, then there are about 5.153775e+47 possible rules to explore
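The rule count can be checked numerically. The sketch below is only an illustration, not part of the original slides: it reads the formula as 3^n − 2^(n+1) + 1 and compares it with a direct enumeration for small n, where each item is placed in the antecedent, the consequent, or neither. The function names are hypothetical.

```python
from itertools import product


def rule_count(n: int) -> int:
    """Closed-form number of rules over n items (non-empty, disjoint sides)."""
    return 3**n - 2**(n + 1) + 1


def rule_count_brute_force(n: int) -> int:
    """Count the same rules by explicit enumeration; feasible only for small n."""
    count = 0
    # Each item goes to the antecedent, the consequent, or is left out.
    for assignment in product(("antecedent", "consequent", "unused"), repeat=n):
        if "antecedent" in assignment and "consequent" in assignment:
            count += 1
    return count


# The two counts agree for small n, and n = 100 reproduces the slide's figure.
for n in range(1, 8):
    assert rule_count(n) == rule_count_brute_force(n)
print(f"{rule_count(100):.6e}")  # 5.153775e+47
```

Even at n = 20 the enumeration already visits about 3.5 billion assignments, which is why the brute-force route is not practical for a real product catalog.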
Frequent Item Sets
• Ideally, we want to create all possible combinations of items
• Problem: computation time grows exponentially as # of items increases
• Solution: consider only “frequent item sets”
• Criterion for “frequent”: support
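One way to restrict the search to frequent item sets is a level-wise search in the spirit of the Apriori algorithm. The sketch below is a simplified illustration under stated assumptions: it uses the five-basket table shown earlier in the slides, a minimum support of 0.6 chosen only for this example, and a hypothetical function name. It is not the implementation used by any particular package.

```python
from itertools import combinations

# Toy transactions copied from the five-basket example in the slides.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def frequent_itemsets(transactions, min_support):
    """Return {item set: support} for every item set meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= basket for basket in transactions) / n

    items = sorted(set().union(*transactions))
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: a candidate with an infrequent (k-1)-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


result = frequent_itemsets(TRANSACTIONS, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what saves time: a three-item set such as {Bread, Diaper, Beer} is never counted against the data, because its subset {Bread, Beer} already failed the support threshold.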
Support
$$\text{support} = \frac{\text{\# of transactions containing all items in antecedent and consequent}}{\text{\# of transactions in database}}$$
TID  Items
1    Bread, Milk, Diapers
2    Bread, Diapers
3    Bread, Beer
4    Milk, Eggs, Shoes
• Support for {bread} ⇒ {diapers} is 2/4
• In other words, 50% of transactions include this pair of items
• Support quantifies the significance of the co-occurrence of the items involved in a rule
• In practice, we only care about item sets with strong enough support
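The support calculation can be sketched in a few lines. This is a minimal illustration using the four-transaction table from this slide; the function name `support` and the use of Python sets are assumptions made for the example.

```python
# The four baskets from the slide, one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk", "Diapers"},
    {"Bread", "Diapers"},
    {"Bread", "Beer"},
    {"Milk", "Eggs", "Shoes"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


# Items of the rule {bread} => {diapers}, antecedent and consequent combined.
print(support({"Bread", "Diapers"}, TRANSACTIONS))  # 0.5
```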
Exercise on Support
• What is the support of {white}?

• What is the support of {red, white}?
Measuring the Performance of Association Rules
• Criterion for performance: confidence
$$\text{confidence} = \frac{\text{\# of transactions containing items in antecedent and consequent}}{\text{\# of transactions containing items in antecedent}}$$
TID  Items
1    Bread, Milk, Diapers
2    Bread, Diapers
3    Bread, Beer
4    Milk, Eggs, Shoes
• Confidence for {bread} ⇒ {diapers} is 2/3
• In other words: given that the basket contains bread, there is a probability of 2/3 that the same basket also contains diapers
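Confidence can be computed by counting transactions, as in this minimal sketch. It reuses the four-transaction table from the support slide; the helper names `count` and `confidence` are assumptions made for the example.

```python
# The four baskets from the slides, one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk", "Diapers"},
    {"Bread", "Diapers"},
    {"Bread", "Beer"},
    {"Milk", "Eggs", "Shoes"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions)


def confidence(antecedent, consequent, transactions):
    """Share of antecedent-containing transactions that also contain the consequent."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    n_both = count(set(antecedent) | set(consequent), transactions)
    return n_both / n_antecedent


# {bread} => {diapers}: bread is in 3 baskets, 2 of which also hold diapers.
print(round(confidence({"Bread"}, {"Diapers"}, TRANSACTIONS), 4))  # 0.6667
```

Note that confidence is not symmetric: swapping antecedent and consequent changes the denominator, so the two directions of a rule generally score differently.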
Exercise on Confidence
TID  Items
1    Bread, Milk, Diapers
2    Bread, Diapers
3    Bread, Beer
4    Milk, Eggs, Shoes
• Confidence for {diapers} ⇒ {bread} is ?
• Confidence for {red, white} ⇒ {green} is ?
Valid Association Rules
• A rule has to meet a minimum support and a minimum confidence.
• Both thresholds are determined by the decision maker.
• Why do we need both thresholds?
• Example of a rule with high confidence & low support:
• A cell phone company database contains all call destinations for each account
• {Germany} ⇒ {France, Belgium} with confidence = 100%
• Support is only 1 of 100K accounts, so the rule is too rare to act on.
Valid Association Rules
• Suppose we use:
• Minimum support: 50%
• Minimum confidence: 50%
• Check Support:
TID  Items
1    Bread, Milk, Diapers
2    Bread, Diapers
3    Bread, Beer
4    Milk, Eggs, Shoes
Frequent Pattern   Support
{Bread}            75%
{Milk}             50%
{Diapers}          50%
{Bread, Diapers}   50%
• Check Rules – only two survive:
1. Bread ⇒ Diapers (Support = 50%, Confidence = 66.67%)
2. Diapers ⇒ Bread (Support = 50%, Confidence = 100%)
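The two-threshold check can be reproduced with a short brute-force sketch. It is meant only for this four-transaction toy table (real tools prune the search instead of listing every item set), and the function names are assumptions made for the example.

```python
from itertools import combinations

# The four baskets from the slide, one set of items per transaction.
TRANSACTIONS = [
    {"Bread", "Milk", "Diapers"},
    {"Bread", "Diapers"},
    {"Bread", "Beer"},
    {"Milk", "Eggs", "Shoes"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(set(itemset) <= basket for basket in transactions) / len(transactions)


def valid_rules(transactions, min_support, min_confidence):
    """List (antecedent, consequent, support, confidence) for rules meeting both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s == 0 or s < min_support:
                continue  # item set is not frequent, so no rule from it can be valid
            # Split the frequent item set into every antecedent/consequent pair.
            for k in range(1, size):
                for antecedent in combinations(combo, k):
                    consequent = sorted(set(combo) - set(antecedent))
                    c = s / support(antecedent, transactions)
                    if c >= min_confidence:
                        rules.append((list(antecedent), consequent, s, c))
    return rules


for antecedent, consequent, s, c in valid_rules(TRANSACTIONS, 0.5, 0.5):
    print(antecedent, "=>", consequent, f"support={s:.2%}", f"confidence={c:.2%}")
```

With both thresholds at 50%, the loop prints the same two rules listed above, Bread ⇒ Diapers and Diapers ⇒ Bread.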
Is a “valid rule” always a “good rule”?
Consider: Tea ⇒ Coffee
          Coffee   NOT Coffee   Total
Tea       15       5            20
NOT Tea   75       5            80
Total     90       10           100
• Confidence = #(Coffee and Tea)/#(Tea) = 15/20 = 75%
• i.e., the probability that someone who has bought tea will also buy coffee is 75%.
• Seems good?
Caveat About Confidence
Consider: Tea ⇒ Coffee
          Coffee   NOT Coffee   Total
Tea       15       5            20
NOT Tea   75       5            80
Total     90       10           100
• Recall that Confidence = #(Coffee and Tea)/#(Tea) = 15/20 = 75%
• But, P(Coffee) = #(Coffee)/100 = 90/100 = 90%
• i.e., the probability that someone would have bought coffee anyway is 90%
• So, given that tea has been bought, the probability of buying coffee has dropped from 90% to 75%.
• Although confidence is high, the rule is misleading!
• In fact, the confidence of “NOT Tea ⇒ Coffee” is 75/80 = 93.75%
Another Performance Measure: Lift

• The lift of a rule X ⇒ Y measures how much more likely the consequent Y is given the antecedent X than it is overall
$$\text{lift} = \frac{\text{confidence of the rule}}{\text{support of the consequent}} = \frac{P(Y \mid X)}{P(Y)}$$
Tea ⇒ Coffee
          Coffee   NOT Coffee   Total
Tea       15       5            20
NOT Tea   75       5            80
Total     90       10           100
• Confidence is 75%
• Support of Coffee is 90%
• Lift = 0.75/0.9 = 0.833 < 1 ⇒ this rule is worse than not having any rule.
More on Lift
• Another example: {diapers} ⇒ {beer}
• # of customers in database: 1000
• # of customers buying diapers: 200
• # of customers buying beer: 50
• # of customers buying diapers & beer: 20
• Confidence: 20/200 = 0.1 (or 10%)

• Support of consequent: 50/1000 = 0.05 (or 5%)
• Lift: 0.1/0.05 = 2 > 1 ⇒ This rule says that a customer who buys diapers is twice as likely to also buy beer as a customer about whom we know nothing.
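Both lift examples can be recomputed directly from the raw counts. The sketch below is a minimal illustration; the function name `lift` and its argument names are assumptions made for the example.

```python
def lift(n_total, n_antecedent, n_consequent, n_both):
    """Lift of a rule from raw counts: confidence / support of the consequent."""
    confidence = n_both / n_antecedent          # P(Y | X)
    consequent_support = n_consequent / n_total  # P(Y)
    return confidence / consequent_support


# Tea => Coffee: 100 customers, 20 bought tea, 90 bought coffee, 15 bought both.
tea_coffee = lift(n_total=100, n_antecedent=20, n_consequent=90, n_both=15)

# Diapers => Beer: 1000 customers, 200 bought diapers, 50 bought beer, 20 bought both.
diapers_beer = lift(n_total=1000, n_antecedent=200, n_consequent=50, n_both=20)

print(round(tea_coffee, 3), round(diapers_beer, 3))  # 0.833 2.0
```

A lift below 1 (tea and coffee) means the antecedent makes the consequent less likely than its baseline, while a lift above 1 (diapers and beer) means it makes the consequent more likely.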
Exercise on Lift
Lift for {red, white} ⇒ {green} is ?
Data Exercise
> install.packages("arules")
> library(arules)
# A grocery transaction dataset
> data("Groceries")
> summary(Groceries)
# inspect: display associations and transactions in readable form
> inspect(head(Groceries, 3))
# Mine rules with minimum support 1% and minimum confidence 30%
> ar <- apriori(Groceries, parameter=list(supp=0.01, conf=0.3,
+                                         target="rules"))
Data Exercise
> inspect(subset(ar, lift>3))
Visualizing Association Rules

> install.packages("arulesViz")
> library(arulesViz)
> ar5 <- head(sort(ar, by="lift"), 5)

> plot(ar5, method="graph", control=list(type="items"))
Things to Do
• Group Assignment 2
– Due March 7 at 11:59pm EST
• Help Session
– March 5 9:30am – 11:00am EST