
Wake Forest University | School of Business

Marketing Analytics
BAN 6065
Professor Jia Li
Spring 2021

Last Time
• Competitor Analytics
– Perceptual Mapping

This Week’s Agenda

• Product Analytics
– Recommendation System

– Synchronous Lecture: Market Basket Analysis


– Asynchronous Lecture: Collaborative Filtering

Customers tend to buy things together…
What is Market Basket Analysis (MBA)?
• Relationships through associations

Market-Basket (transactions)
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Examples of discovered patterns:
• {Bread} → {Milk}
• {Diaper} → {?}
• {Milk, Bread} → {?}

• In summary, we want to know, for a shopping basket, “what product is likely to go with what other product?”
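To make the question “{Diaper} → {?}” concrete, here is a minimal base-R sketch that stores the five baskets as a list and counts how often each other item appears in the baskets that contain Diaper. The names `baskets` and `co_occurrence` are illustrative, not part of the course materials.

```r
# The five example baskets (TID 1-5), one character vector per transaction
baskets <- list(
  c("Bread", "Milk"),
  c("Bread", "Diaper", "Beer", "Eggs"),
  c("Milk", "Diaper", "Beer", "Coke"),
  c("Bread", "Milk", "Diaper", "Beer"),
  c("Bread", "Milk", "Diaper", "Coke")
)

# For the baskets that contain `item`, count how often every other item appears
co_occurrence <- function(baskets, item) {
  with_item <- Filter(function(b) item %in% b, baskets)
  others <- unlist(lapply(with_item, setdiff, item))
  sort(table(others), decreasing = TRUE)
}

co_occurrence(baskets, "Diaper")
```

In these five baskets, Beer, Bread and Milk each appear in 3 of the 4 diaper baskets (Coke in 2, Eggs in 1), so raw counts alone do not single out one partner item; support, confidence and lift make that comparison more precise.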

A popular “data mining legend”
• When grocery shoppers bought diapers on Thursdays and Saturdays, they also tended to buy beer.
– The rationale for this result is that young families often “stock up for a weekend at home with their babies.”
– The legend then states that Wal-Mart increased profits on Fridays by placing alcoholic beverages in the baby section of the store.
• The story is untrue!
Why Do We Care?
• Product placement
– Whole Foods: next to flowers are birthday cards
– Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars (Forbes)

• Recommendations
– Amazon.com: as you are looking at HDTVs, you might also want HDMI cables

• Bundling
– E.g., travel “packages”: flight, hotel, car

• Other Applications
– Price discrimination
– Website / catalog design
– Fraud detection (multiple suspicious insurance claims)
– Medical complications (based on combinations of treatments)

Association Rules
• Definition
– Given a transactional database (set of transactions), find rules that predict the occurrence of an item based on the occurrences of other items in the database.

• Implication means co-occurrence, not causality
Rule Format
• IF {set of items} ⇒ THEN {set of items}
• Example: If {diapers} ⇒ then {beer}

• “IF” part: Antecedent or Body of the rule
• “THEN” part: Consequent or Head of the rule
• “Item set” = the items (e.g., products) comprising the antecedent or consequent
• Antecedent and consequent are disjoint (i.e., have no items in common)

Many Rules are Possible
Consider the example below:

Market-Basket (transactions)
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Transaction 2 supports several rules, such as
– “If bread, then diaper”
– “If beer, then diaper”
– “If bread and beer, then eggs”
– + many more …
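The count grows quickly even for a single basket. The minimal base-R sketch below lists every rule that transaction 2 supports by assigning each of its items to the antecedent, the consequent, or neither, and keeping only assignments where both sides are non-empty (so the two sides are automatically disjoint). The helpers `enumerate_rules` and `format_rule` are hypothetical names for this illustration.

```r
# Every rule X => Y that one basket supports: X and Y are non-empty,
# disjoint subsets of the basket's items
enumerate_rules <- function(items) {
  n <- length(items)
  rules <- list()
  # Encode each item's role in base 3: 0 = unused, 1 = antecedent, 2 = consequent
  for (code in 0:(3^n - 1)) {
    role <- integer(n)
    x <- code
    for (i in seq_len(n)) {
      role[i] <- x %% 3
      x <- x %/% 3
    }
    if (any(role == 1) && any(role == 2)) {
      rules[[length(rules) + 1]] <- list(lhs = items[role == 1], rhs = items[role == 2])
    }
  }
  rules
}

# Human-readable "{lhs} => {rhs}" label for one rule
format_rule <- function(rule) {
  paste0("{", paste(rule$lhs, collapse = ", "), "} => {",
         paste(rule$rhs, collapse = ", "), "}")
}

t2_rules <- enumerate_rules(c("Bread", "Diaper", "Beer", "Eggs"))
length(t2_rules)  # 50
head(vapply(t2_rules, format_rule, character(1)), 3)
```

With four items, transaction 2 alone supports 50 distinct rules, including the three quoted above.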

A brute-force approach can be prohibitively expensive
• It can be shown that if a dataset contains n items, the total number of possible rules is:
$$R = 3^n - 2^{n+1} + 1$$
• E.g., if there are 100 products, there are about 5.153775e+47 possible rules to explore
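Why the formula holds: each item can go to the antecedent, the consequent, or neither ($3^n$ assignments); subtracting the $2^n$ assignments with an empty antecedent and the $2^n$ with an empty consequent removes the “neither side used” assignment twice, so one is added back. A short base-R check (function names are illustrative) compares the closed form with brute-force counting for small n and evaluates it at n = 100:

```r
# Closed-form number of candidate rules over n items
n_rules <- function(n) 3^n - 2^(n + 1) + 1

# Brute force for small n: give each item one of three roles
# (0 = unused, 1 = antecedent, 2 = consequent) and count assignments
# in which both the antecedent and the consequent are non-empty
n_rules_brute <- function(n) {
  roles <- as.matrix(expand.grid(rep(list(0:2), n)))
  sum(apply(roles, 1, function(r) any(r == 1) && any(r == 2)))
}

sapply(1:6, n_rules)        # 0 2 12 50 180 602
sapply(1:6, n_rules_brute)  # 0 2 12 50 180 602
n_rules(100)                # 5.153775e+47
```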
Frequent Item Sets
• Ideally, we want to create all possible combinations of items
• Problem: computation time grows exponentially as the # of items increases
• Solution: consider only “frequent item sets”
• Criterion for “frequent”: support
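One standard way to exploit this criterion is the idea behind the Apriori algorithm: an item set can only be frequent if all of its subsets are frequent, so larger candidates are grown only from sets that already passed the support threshold. The sketch below is a simplified base-R illustration of that level-wise search on the five-basket example, not the arules implementation; `itemset_support` and `frequent_itemsets` are hypothetical helper names, support is taken as the share of baskets containing the item set, and the 50% threshold is chosen only for illustration.

```r
# The five example baskets (TID 1-5)
baskets <- list(
  c("Bread", "Milk"),
  c("Bread", "Diaper", "Beer", "Eggs"),
  c("Milk", "Diaper", "Beer", "Coke"),
  c("Bread", "Milk", "Diaper", "Beer"),
  c("Bread", "Milk", "Diaper", "Coke")
)

# Support of an item set: share of baskets containing all of its items
itemset_support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Level-wise search: candidates of size k + 1 are grown only from
# frequent sets of size k, so infrequent branches are never explored
frequent_itemsets <- function(baskets, min_support) {
  items <- sort(unique(unlist(baskets)))
  level <- as.list(items)  # 1-item candidates
  result <- list()
  while (length(level) > 0) {
    sup <- vapply(level, itemset_support, numeric(1), baskets = baskets)
    frequent <- level[sup >= min_support]
    result <- c(result, Map(function(s, v) list(items = s, support = v),
                            frequent, sup[sup >= min_support]))
    # Extend each frequent set only with items that sort after its last item,
    # so every candidate is generated at most once
    level <- unlist(lapply(frequent, function(s) {
      lapply(items[items > s[length(s)]], function(i) c(s, i))
    }), recursive = FALSE)
  }
  result
}

fi <- frequent_itemsets(baskets, min_support = 0.5)
for (f in fi) {
  cat(sprintf("{%s}  support = %.1f\n", paste(f$items, collapse = ", "), f$support))
}
```

At a 50% threshold this keeps the single items Beer, Bread, Diaper and Milk plus the pairs {Beer, Diaper}, {Bread, Diaper}, {Bread, Milk} and {Diaper, Milk}; no three-item set appears in 3 of the 5 baskets, so the search stops there.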

Support
$$\text{support} = \frac{\#\text{ of transactions containing all items in antecedent and consequent}}{\#\text{ of transactions in database}}$$

TID  Items
1    Bread, Milk, Diapers
2    Bread, Diapers
3    Bread, Beer
4    Milk, Eggs, Shoes

• Support for {bread} ⇒ {diapers} is 2/4
• In other words, 50% of transactions include this pair of items

• Support quantifies how common the co-occurrence of the items involved in a rule is
• In practice, we only care about item sets with sufficiently high support
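A minimal base-R sketch of the support formula on the four-transaction table; the names `transactions` and `rule_support` are illustrative:

```r
# The four example transactions (TID 1-4)
transactions <- list(
  c("Bread", "Milk", "Diapers"),
  c("Bread", "Diapers"),
  c("Bread", "Beer"),
  c("Milk", "Eggs", "Shoes")
)

# Support of lhs => rhs: share of transactions that contain every item of both sides
rule_support <- function(lhs, rhs, transactions) {
  hits <- vapply(transactions, function(t) all(c(lhs, rhs) %in% t), logical(1))
  sum(hits) / length(transactions)
}

rule_support("Bread", "Diapers", transactions)  # 0.5, i.e. 2 of 4 transactions
```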
Exercise on Support

• What is the support of {white}?


• What is the support of {red, white}?


Measuring the Performance of Association Rules
• Criterion for performance: confidence

$$\text{confidence} = \frac{\#\text{ of transactions containing all items in antecedent and consequent}}{\#\text{ of transactions containing all items in antecedent}}$$

TID  Items
1    Bread, Milk, Diapers
2    Bread, Diapers
3    Bread, Beer
4    Milk, Eggs, Shoes

• Confidence for {bread} ⇒ {diapers} is 2/3
• In other words: given that a basket contains bread, there is a probability of 2/3 that the same basket also contains diapers
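The same calculation in code: confidence is the count of transactions containing antecedent and consequent together, divided by the count containing the antecedent alone. A minimal base-R sketch on the same four transactions (helper names are illustrative):

```r
# The four example transactions (TID 1-4)
transactions <- list(
  c("Bread", "Milk", "Diapers"),
  c("Bread", "Diapers"),
  c("Bread", "Beer"),
  c("Milk", "Eggs", "Shoes")
)

# TRUE for each transaction that contains all of `items`
contains_all <- function(items, transactions) {
  vapply(transactions, function(t) all(items %in% t), logical(1))
}

# Confidence of lhs => rhs: among transactions with the antecedent,
# the share that also contain the consequent
rule_confidence <- function(lhs, rhs, transactions) {
  sum(contains_all(c(lhs, rhs), transactions)) / sum(contains_all(lhs, transactions))
}

rule_confidence("Bread", "Diapers", transactions)  # 0.6666667, i.e. 2/3
```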
Exercise on Confidence
TID  Items
1    Bread, Milk, Diapers
2    Bread, Diapers
3    Bread, Beer
4    Milk, Eggs, Shoes

• Confidence for {diapers} ⇒ {bread} is ?
• Confidence for {red, white} ⇒ {green} is ?

Valid Association Rules
• A rule has to meet a minimum support and a minimum confidence.
• Both thresholds are determined by the decision maker.

• Why do we need both thresholds?

• Example of a rule with high confidence & low support:
– A cell phone company database contains all call destinations for each account
– {Germany} ⇒ {France, Belgium} with confidence = 100%
– But support is only 1 of 100K accounts.
Valid Association Rules
• Suppose we use:
– Minimum support: 50%
– Minimum confidence: 50%

• Check support:

TID  Items
1    Bread, Milk, Diapers
2    Bread, Diapers
3    Bread, Beer
4    Milk, Eggs, Shoes

Frequent Pattern    Support
{Bread}             75%
{Milk}              50%
{Diapers}           50%
{Bread, Diapers}    50%

• Check rules (only two survive):
1. Bread ⇒ Diapers (Support = 50%, Confidence = 66.67%)
2. Diapers ⇒ Bread (Support = 50%, Confidence = 100%)
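To see both thresholds applied together, a minimal base-R sketch can score every one-item ⇒ one-item rule on these four transactions and keep only those meeting 50% support and 50% confidence. Restricting each side to a single item is a simplifying assumption for this sketch; it loses nothing here because no three-item set reaches 50% support. Variable and helper names are illustrative.

```r
# The four example transactions (TID 1-4)
transactions <- list(
  c("Bread", "Milk", "Diapers"),
  c("Bread", "Diapers"),
  c("Bread", "Beer"),
  c("Milk", "Eggs", "Shoes")
)
min_support <- 0.5
min_confidence <- 0.5

# TRUE for each transaction that contains all of `items`
contains_all <- function(items) {
  vapply(transactions, function(t) all(items %in% t), logical(1))
}

# Candidate rules with one item on each side (a simplification for this example)
items <- sort(unique(unlist(transactions)))
rules <- expand.grid(lhs = items, rhs = items, stringsAsFactors = FALSE)
rules <- rules[rules$lhs != rules$rhs, ]

rules$support <- mapply(function(l, r) mean(contains_all(c(l, r))),
                        rules$lhs, rules$rhs, USE.NAMES = FALSE)
rules$confidence <- mapply(function(l, r) sum(contains_all(c(l, r))) / sum(contains_all(l)),
                           rules$lhs, rules$rhs, USE.NAMES = FALSE)

# A rule is valid only if it clears BOTH thresholds
valid <- rules[rules$support >= min_support & rules$confidence >= min_confidence, ]
valid
```

The two surviving rows are Bread ⇒ Diapers and Diapers ⇒ Bread, matching the slide.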

Is a “valid rule” always a “good rule”?
Consider the rule Tea ⇒ Coffee:

           Coffee   NOT Coffee   Total
Tea          15          5         20
NOT Tea      75          5         80
Total        90         10        100

• Confidence = #(Coffee and Tea)/#(Tea) = 15/20 = 75%
– i.e., the probability that someone who has bought tea will also buy coffee is 75%.
• Seems good?
Caveat About Confidence
Rule: Tea ⇒ Coffee (counts from the Tea/Coffee table above)
• Recall that Confidence = #(Coffee and Tea)/#(Tea) = 15/20 = 75%
• But, P(Coffee) = #(Coffee)/100 = 90/100 = 90%
– i.e., the probability that any customer buys coffee is 90%
– So, given that tea has been bought, the probability of buying coffee has dropped.
– Although confidence is high, the rule is misleading!
• In fact, the confidence of “NOT Tea ⇒ Coffee” is 75/80 = 93.75%

Another Performance Measure: Lift
• The lift of a rule X ⇒ Y measures how much more likely the consequent Y is, given the antecedent X, than it is in general:

$$\text{lift} = \frac{\text{confidence of the rule}}{\text{support of the consequent}} = \frac{P(Y \mid X)}{P(Y)}$$

• For Tea ⇒ Coffee (counts from the Tea/Coffee table above):
– Confidence is 75%
– Support of Coffee is 90%
– Lift = 0.75/0.9 = 0.833 < 1 ⇒ this rule is worse than not having any rule.
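A minimal base-R sketch that recomputes these numbers directly from the 2 × 2 table of counts (variable names are illustrative):

```r
# Counts from the Tea/Coffee table (100 customers)
counts <- matrix(c(15, 5,
                   75, 5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Tea", "NOT Tea"), c("Coffee", "NOT Coffee")))

n <- sum(counts)                                              # 100 customers
confidence <- counts["Tea", "Coffee"] / sum(counts["Tea", ])  # P(Coffee | Tea) = 15/20
support_coffee <- sum(counts[, "Coffee"]) / n                 # P(Coffee) = 90/100
lift <- confidence / support_coffee                           # 0.75 / 0.9

round(c(confidence = confidence, support_coffee = support_coffee, lift = lift), 3)
lift < 1  # TRUE: tea buyers are less likely than average to buy coffee
```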
More on Lift
• Another example: {diapers} ⇒ {beer}
– # of customers in database: 1000
– # of customers buying diapers: 200
– # of customers buying beer: 50
– # of customers buying diapers & beer: 20

• Confidence: 20/200 = 0.1 (or 10%)
• Support of consequent: 50/1000 = 0.05 (or 5%)
• Lift: 0.1/0.05 = 2 > 1 ⇒ This rule says that, if a customer buys diapers, she/he is twice as likely to also buy beer (compared with her/his chance of buying beer if we know nothing about her/him).
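The arithmetic above in base R, with one addition: lift can equivalently be written as P(X and Y) / (P(X) P(Y)), which shows that it is symmetric in antecedent and consequent. A minimal sketch (variable names are illustrative):

```r
n_customers <- 1000
n_diapers   <- 200
n_beer      <- 50
n_both      <- 20

confidence <- n_both / n_diapers    # P(beer | diapers) = 0.1
support_y  <- n_beer / n_customers  # P(beer) = 0.05
lift       <- confidence / support_y

# Equivalent symmetric form: P(diapers and beer) / (P(diapers) * P(beer))
lift_sym <- (n_both / n_customers) /
  ((n_diapers / n_customers) * (n_beer / n_customers))

lift                       # 2
all.equal(lift, lift_sym)  # TRUE
```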

Exercise on Lift
• Lift for {red, white} ⇒ {green} is ?
Data Exercise
install.packages("arules")
library(arules)

# A grocery transaction dataset
data("Groceries")
summary(Groceries)

# inspect: display associations and transactions in readable form
inspect(head(Groceries, 3))

# mine rules with minimum support 0.01 and minimum confidence 0.3
ar <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.3, target = "rules"))

# keep only the rules with lift greater than 3
inspect(subset(ar, lift > 3))

Visualizing Association Rules
install.packages("arulesViz")
library(arulesViz)

# top 5 rules by lift
ar5 <- head(sort(ar, by = "lift"), 5)

# graph-based visualization of these five rules
plot(ar5, method = "graph", control = list(type = "items"))
Things to Do
• Group Assignment 2
– Due March 7 at 11:59pm EST

• Help Session
– March 5 9:30am – 11:00am EST
