
Business Analytics

Today's Objective

Association Mining (Unsupervised Learning)

Lift Value Calculation, Market Basket Analysis (Support and Confidence), Supermarket Design, Association Mining (Apriori)
Indian Institute of Management (IIM), Rohtak
Let's take the case of a baby and her family dog.

She knows and identifies this dog.

A few weeks later, a family friend brings along a dog and tries to play with the baby.
The baby has not seen this dog before, but she recognizes that many of its features (two ears, eyes, walking on four legs) are like those of her pet dog. She identifies the new animal as a dog. This is unsupervised learning: you are not taught, but you learn from the data (in this case, data about a dog).

Basic Rule
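A 'rule' is something like this:
If a basket contains Bread and Butter, then it also contains Milk.
Any such rule has two associated measures:
1. Confidence: when the 'if' part is true, how often is the 'then' part also true? This is the same as accuracy.
$$\text{Confidence}(A \Rightarrow B) = \frac{\#\text{ transactions containing both } A \text{ and } B}{\#\text{ transactions containing } A}$$
2. Coverage or support: what fraction of the database contains both the 'if' and the 'then' parts?
$$\text{Support}(A \Rightarrow B) = \frac{\#\text{ transactions containing both } A \text{ and } B}{\#\text{ transactions in the database}}$$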

Rule Measures: Support & Confidence
Transaction ID   Items Bought
1                Trouser, Shirt, Jacket
2                Trouser, Jacket
3                Trouser, Jeans
4                Shirt, Sweatshirt
If the minimum support is 50%, then {Trouser, Jacket} is the only 2-itemset that satisfies the minimum support.
Frequent Itemset      Support
{Trouser}             75%
{Shirt}               50%
{Jacket}              50%
{Trouser, Jacket}     50%

If the minimum confidence is 50%, both rules that can be generated from this 2-itemset qualify:
Trouser => Jacket   Support = 50%, Confidence = 66.7% (2 of the 3 baskets with Trouser also contain Jacket)
Jacket => Trouser   Support = 50%, Confidence = 100% (both baskets with Jacket also contain Trouser)
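The support and confidence figures above can be checked with a few lines of Python. This is a minimal sketch: `support` and `confidence` are illustrative helper names (not a library API), and the four transactions are copied from the table above.

```python
from itertools import combinations

# The four transactions from the table above.
transactions = [
    {"Trouser", "Shirt", "Jacket"},
    {"Trouser", "Jacket"},
    {"Trouser", "Jeans"},
    {"Shirt", "Sweatshirt"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(A and B) / support(A): how often A => B holds when A is in the basket."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

min_support, min_confidence = 0.5, 0.5
items = sorted(set().union(*transactions))

# Keep the 2-itemsets that meet minimum support, then test both rules each one generates.
for a, b in combinations(items, 2):
    s = support({a, b}, transactions)
    if s < min_support:
        continue
    for lhs, rhs in ((a, b), (b, a)):
        c = confidence({lhs}, {rhs}, transactions)
        if c >= min_confidence:
            print(f"{lhs} => {rhs}: support={s:.0%}, confidence={c:.1%}")
```

Running it prints `Jacket => Trouser: support=50%, confidence=100.0%` and `Trouser => Jacket: support=50%, confidence=66.7%`, matching the two rules listed above.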
Market Basket Analysis
Lift
Lift (x => y) measures the 'interestingness' of a rule: how much more (or less) likely item y is to be purchased when item x is purchased, compared with how often y is purchased overall. Unlike confidence (x => y), lift takes into account the popularity of item y.
$$\text{Lift}(x \Rightarrow y) = \frac{\text{Support}(x \,\&\, y)}{\text{Support}(x) \times \text{Support}(y)}$$

• Lift (x => y) = 1 means that there is no correlation within the itemset.
• Lift (x => y) > 1 means that there is a positive correlation within the itemset, i.e., products x and y are more likely to be bought together than independence would predict.
• Lift (x => y) < 1 means that there is a negative correlation within the itemset, i.e., products x and y are less likely to be bought together than independence would predict.
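A quick numeric check of the lift formula, written as a minimal Python sketch on the four Trouser/Shirt/Jacket transactions from the support and confidence example (the helper names `support` and `lift` are hypothetical):

```python
# The four transactions from the Trouser/Shirt/Jacket example.
transactions = [
    {"Trouser", "Shirt", "Jacket"},
    {"Trouser", "Jacket"},
    {"Trouser", "Jeans"},
    {"Shirt", "Sweatshirt"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def lift(x, y):
    """Support(x & y) / (Support(x) * Support(y))."""
    return support({x, y}) / (support({x}) * support({y}))

print(round(lift("Trouser", "Jacket"), 3))  # 0.5 / (0.75 * 0.5) -> 1.333 (positive association)
print(round(lift("Trouser", "Shirt"), 3))   # 0.25 / (0.75 * 0.5) -> 0.667 (negative association)
```

Because the formula is symmetric in x and y, Lift(x => y) always equals Lift(y => x), even though the two confidences for Trouser and Jacket (66.7% and 100%) differ.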


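For the superstore data, the lift for meat and vegetables is computed as:

$$\text{Lift}(A \rightarrow B) = \frac{\text{Confidence}(A \rightarrow B)}{\text{Support}(B)}$$

This agrees with the lift formula above, since Confidence(A → B) = Support(A & B) / Support(A).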

Grehasthi Grocery store

Computing Lift for Two Products
A two-way product lift is simply a lift involving two products, and it can easily be computed in Excel. The idea generalizes to lifts involving more than two items or other transaction attributes (such as the day of the week). To practice computing lift, you'll use the superstore transaction data in the file marketbasket.xls. The accompanying figure shows a subset of the data. The day of the week is denoted by 1 = Monday, 2 = Tuesday, …, 7 = Sunday. For example, the first transaction represents a person who bought vegetables, meat, and milk on a Friday (see the Excel file).
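The Excel workbook compares the actual number of baskets containing both products with the number expected if the products were bought independently (total transactions × fraction with x × fraction with y). A minimal Python sketch of that calculation follows; the `data` dictionary is made-up illustrative data, not the contents of marketbasket.xls, and `two_way_lift` is a hypothetical helper name.

```python
# Hypothetical 0/1 purchase columns (1 = product in the basket), one entry per transaction.
# Made-up numbers for illustration only; NOT the contents of marketbasket.xls.
data = {
    "vegetables": [1, 1, 0, 1, 0, 1, 0, 0],
    "meat":       [1, 0, 0, 1, 0, 1, 1, 0],
    "milk":       [1, 1, 1, 0, 0, 0, 1, 1],
}

def two_way_lift(data, x, y):
    """Actual count of baskets with both x and y, divided by the count expected under independence."""
    n = len(data[x])
    actual = sum(a * b for a, b in zip(data[x], data[y]))   # baskets containing both products
    frac_x = sum(data[x]) / n                                # fraction of baskets with x
    frac_y = sum(data[y]) / n                                # fraction of baskets with y
    predicted = n * frac_x * frac_y                          # expected count if x and y were independent
    return actual / predicted if predicted else 1.0          # assumed: neutral lift if a product never sells

print(two_way_lift(data, "vegetables", "meat"))   # 3 / (8 * 0.5 * 0.5) = 1.5
print(two_way_lift(data, "vegetables", "milk"))   # 2 / (8 * 0.5 * 0.625) = 0.8
```

Dividing the actual count by the predicted count gives the same value as Support(x & y) / (Support(x) × Support(y)), because both numerator and denominator are just scaled by the number of transactions.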

Use the file marketbasketothreeway.xlsx.


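In cell N13, enter =VLOOKUP(N12,$J$9:$K$14,2), which converts the product-class index in N12 into the corresponding product name.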

Similarly, in cell O13, enter =VLOOKUP(O12,$J$9:$K$14,2).

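In cell M17, enter =COUNTIF(day_week,K17), which counts the transactions whose day of the week (in the range named day_week) equals the day number in K17.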

You can use the same concept to compute, for the superstore data, the lift of an arbitrary combination of two products and a day of the week.

Complete the following steps:

1. In cell Q14, use the array formula
=SUM((INDIRECT(P13)=$P$14)*(INDIRECT(N13)=1)*(INDIRECT(O13)=1))
to compute the actual number of transactions involving vegetables and baby goods on Friday. (In versions of Excel without dynamic arrays, confirm an array formula with Ctrl+Shift+Enter.) This formula computes three arrays:
■ An array containing a 1 if the day of the week matches the number in P14 (here a 5) and a 0 otherwise.
■ An array containing a 1 if the vegetables column contains a 1 and a 0 otherwise.
■ An array containing a 1 if the baby column contains a 1 and a 0 otherwise.
Multiplying the three arrays element by element gives a 1 only for transactions that meet all three conditions, so SUM counts exactly those transactions.
In cell R14, compute the predicted number of transactions involving baby goods and vegetables purchased on Friday with the following formula:
=IF(N13<>O13,VLOOKUP(N13,K9:L14,2,FALSE)*L7*VLOOKUP(O13,K9:L14,2,FALSE)*VLOOKUP(P14,K17:L23,2),0)

If you enter the same product class twice, this formula yields 0. Otherwise, it multiplies (total number of transactions) * (fraction of vegetable transactions) * (fraction of baby transactions) * (fraction of Friday transactions). This gives the predicted number of Friday transactions involving vegetables and baby goods, assuming the two purchases and the day of the week are independent.
Finally, in cell S14, compute the lift with the formula =IF(R14=0,1,Q14/R14). When the predicted count is 0 (the same product class entered twice), the formula reports a neutral lift of 1 instead of dividing by zero.
The lift for vegetables and baby goods on Friday is 0.85. This means that on Fridays, vegetables and baby goods are bought together about 15% less often than expected under independence.
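The three Excel cells Q14 (actual count), R14 (predicted count) and S14 (lift) can be mirrored in Python as a sketch. Assumptions: the day and product columns below are made up (not the workbook data), products are stored as 0/1 lists keyed by hypothetical names, and `three_way_lift` is an illustrative helper, not part of any library.

```python
# Hypothetical data for 10 transactions (made up; NOT the workbook data).
days = [5, 5, 1, 5, 2, 5, 3, 5, 4, 1]          # 1 = Monday ... 7 = Sunday
data = {
    "vegetables": [1, 1, 0, 1, 1, 0, 1, 0, 0, 1],
    "baby":       [1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
}

def three_way_lift(data, days, x, y, day):
    n = len(days)
    # Actual count (like Q14): multiply three 0/1 conditions element by element and add them up.
    actual = sum((d == day) * a * b for d, a, b in zip(days, data[x], data[y]))
    # Predicted count (like R14): 0 when the same product is entered twice, otherwise
    # n * fraction(x) * fraction(y) * fraction(day), the count expected under independence.
    if x == y:
        predicted = 0.0
    else:
        predicted = n * (sum(data[x]) / n) * (sum(data[y]) / n) * (days.count(day) / n)
    # Lift (like S14): =IF(R14=0,1,Q14/R14)
    return 1.0 if predicted == 0 else actual / predicted

print(round(three_way_lift(data, days, "vegetables", "baby", 5), 2))   # 2 / 1.5 -> 1.33
print(three_way_lift(data, days, "baby", "baby", 5))                    # same product twice -> 1.0
```

On this toy data the Friday lift for vegetables and baby goods comes out above 1, whereas the workbook's real data gives 0.85; the sketch illustrates the mechanics, not the result.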

Optimizing the Three-Way Lift
In an actual situation with many products, there would be a huge number of three-way lifts. For example, with 1,000 products, there are on the order of 1,000³ = 1 billion three-way combinations (about 166 million distinct sets of three different products). Despite this, a retailer is often interested in finding the largest three-way lifts. Intelligent use of the Evolutionary Solver can ease this task. To illustrate the basic idea, you can use the Evolutionary Solver to determine the combination of two products and a day of the week with the maximum lift.

Use the Evolutionary Solver with the changing cells being the day of the week (cell P14) and the indices of the product classes (cells N12 and O12). Cells N12 and O12 are linked through lookup tables to cells N13:O13. For instance, a 1 in cell N12 makes N13 display vegetables.

Excel Solver

The goal of optimization is to find values of the design variables that maximize or minimize an objective function, possibly subject to constraints on the design variables.

Load the Solver Add-in
To load the Solver add-in, execute the following steps.
1. On the File tab, click Options.
2. Under Add-ins, select Solver Add-in and click the Go button.
3. Check Solver Add-in and click OK.
4. You can find the Solver on the Data tab, in the Analyse group.

Set the objective to maximize the lift (S14). Constrain N12 and O12 (product classes) to be integers between 1 and 6, and P14 (day of the week) to be an integer between 1 and 7.
Add the constraint Q14 >= 20 to ensure you count only combinations that occur a reasonable number of times.
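With six product classes and seven days there are only a few hundred combinations, so for a problem this small an exhaustive search is a simple cross-check on what the Evolutionary Solver finds. The sketch below is an illustration under stated assumptions: the data are made up, `best_three_way_lift` is a hypothetical helper, and `min_count` plays the role of the Q14 >= 20 constraint (set low here because the toy data has only 10 transactions).

```python
from itertools import combinations

# Hypothetical data (made up; NOT the workbook data): 10 transactions, days 1 = Monday ... 7 = Sunday.
days = [5, 5, 1, 5, 2, 5, 3, 5, 4, 1]
data = {
    "vegetables": [1, 1, 0, 1, 1, 0, 1, 0, 0, 1],
    "baby":       [1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    "dvd":        [0, 0, 1, 0, 0, 0, 1, 1, 1, 1],
}

def best_three_way_lift(data, days, min_count):
    """Search every (product, product, day) for the largest lift whose actual count >= min_count."""
    n = len(days)
    best = None
    for x, y in combinations(sorted(data), 2):   # unordered pairs of distinct products (lift is symmetric)
        for day in range(1, 8):
            actual = sum((d == day) * a * b for d, a, b in zip(days, data[x], data[y]))
            if actual == 0 or actual < min_count:   # plays the role of the Q14 >= 20 constraint
                continue
            predicted = n * (sum(data[x]) / n) * (sum(data[y]) / n) * (days.count(day) / n)
            lift = actual / predicted
            if best is None or lift > best[0]:
                best = (lift, x, y, day)
    return best   # (lift, product, product, day), or None if nothing meets min_count

print(best_three_way_lift(data, days, min_count=2))   # baby + vegetables on day 5
print(best_three_way_lift(data, days, min_count=1))   # a single basket wins: baby + dvd on day 3, lift 4.0
```

Lowering `min_count` to 1 lets a combination seen in a single basket win with a lift of 4.0, which is exactly the kind of noisy result the minimum-count constraint is meant to exclude.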

The three-way lift shown on the previous slide indicates that roughly 6.32 times as many people as expected under an independence assumption buy DVDs and baby goods on Thursday. This suggests that on Thursdays, placing DVDs (often an impulse purchase) in the baby section is likely to increase profits.

As you learned at the beginning of this class, handbags and makeup are often purchased together. This suggests that, to maximize revenues, a store should be laid out so that products with high lift are placed near each other. Given a lift matrix for different product categories, you can use the Evolutionary Solver to assign product categories to locations so as to maximize the total lift of adjacent product categories. To illustrate the idea, consider a grocery store that stocks the six product categories shown in the file marketlayout.xlsx. Rows 8 through 13 show the two-way lifts.
In the cell range G16:I17 (colored yellow), determine the locations of the product categories that maximize the lifts of adjacent product categories. Row 17 (G17:I17) holds locations A1, A2, A3 and row 16 (G16:I16) holds B1, B2, B3, as the cell formulas for the adjacent lifts confirm. Assume that customers can travel only in a north–south or east–west direction. This assumption is reasonable because any two store aisles are either parallel or perpendicular to each other. This implies, for example, that location A1 is adjacent to A2 and B1, whereas location A2 is adjacent to B2, A1, and A3. Enter a trial assignment of product categories to locations in cell range G16:I17.
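To display the product category assigned to a location, use a lookup such as =VLOOKUP(G16,$E$8:$F$13,2,FALSE), which returns the entry in column F that matches the category number in G16.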

In cell range G21:G26, compute the lift for adjacent products for each location. For example, in cell G21 the formula =INDEX(lifts,G17,G16)+INDEX(lifts,G17,H17) adds (lift for products assigned to A1 and B1) + (lift for products assigned to A1 and A2). This gives the total lift for products adjacent to A1. In cell G27, the formula =SUM(G21:G26) calculates the total lift generated by adjacent product categories. Because each adjacent pair is counted once from each of its two locations, this total is twice the sum of the pairwise lifts; the doubling affects every layout equally, so it does not change which layout is best.
Similarly, the lift for products adjacent to A2 is
=INDEX(lifts,H17,G17)+INDEX(lifts,H17,H16)+INDEX(lifts,H17,I17)
and the lift for products adjacent to A3 is
=INDEX(lifts,I17,I16)+INDEX(lifts,I17,H17)

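Add a "no repetition" constraint: each of the six product categories must be assigned to exactly one location (in Solver, for example, an AllDifferent constraint on G16:I17).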

Use the Solver window to find the store layout that maximizes the total lift for adjacent product categories.
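For six categories there are only 6! = 720 possible layouts, so the layout problem can also be checked by brute force. A minimal Python sketch, assuming a 2 × 3 grid with A1..A3 in one row and B1..B3 in the other, categories numbered 0 to 5, and a made-up symmetric lift matrix `LIFTS` (not the values in marketlayout.xlsx). The hypothetical `total_adjacent_lift` mirrors the G21:G27 calculation, and trying every permutation enforces the no-repetition rule.

```python
from itertools import permutations

# Hypothetical symmetric two-way lift matrix for categories 0..5
# (illustrative numbers only, NOT the values in marketlayout.xlsx).
LIFTS = [
    [1.0, 1.6, 0.8, 1.1, 0.9, 1.0],
    [1.6, 1.0, 1.2, 0.7, 1.3, 0.9],
    [0.8, 1.2, 1.0, 1.5, 0.6, 1.1],
    [1.1, 0.7, 1.5, 1.0, 1.4, 0.8],
    [0.9, 1.3, 0.6, 1.4, 1.0, 1.7],
    [1.0, 0.9, 1.1, 0.8, 1.7, 1.0],
]
ROWS, COLS = 2, 3   # row 0 = A1..A3, row 1 = B1..B3

def neighbors(r, c):
    """North-south and east-west neighbours of grid cell (r, c)."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < ROWS and 0 <= cc < COLS:
            yield rr, cc

def total_adjacent_lift(layout):
    """layout[r][c] is the category at a location; like G21:G27, each adjacent pair counts twice."""
    return sum(
        LIFTS[layout[r][c]][layout[rr][cc]]
        for r in range(ROWS) for c in range(COLS)
        for rr, cc in neighbors(r, c)
    )

def best_layout():
    """Try every assignment of the 6 categories to the 6 locations (no category repeated)."""
    best = None
    for perm in permutations(range(ROWS * COLS)):
        layout = [perm[:COLS], perm[COLS:]]
        score = total_adjacent_lift(layout)
        if best is None or score > best[0]:
            best = (score, layout)
    return best

print(best_layout())
```

Exhaustive search is only practical because the store has six categories; with many more locations the number of permutations explodes, which is why the slides turn to the Evolutionary Solver.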

Association Rule Procedure

The Apriori Procedures
The Apriori method is an influential method for mining frequent itemsets.

Key Concepts:
• Frequent Itemsets: the sets of items that have at least the minimum support (denoted by $L_i$ for the frequent $i$-itemsets).
• Join Operation: to find $L_k$, a set of candidate $k$-itemsets is generated by joining $L_{k-1}$ with itself.
• Apriori Property: every subset of a frequent itemset must also be frequent.
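A compact Python sketch of the level-wise Apriori procedure, applied to the four Trouser/Shirt/Jacket transactions from earlier in this section with a minimum support of 50%. It is an illustration under simplifying assumptions: candidates are formed from unions of pairs of frequent (k−1)-itemsets that have size k (a simpler variant of the textbook prefix-based join) and then pruned with the Apriori property. `apriori` is a hypothetical helper name.

```python
from itertools import combinations

transactions = [
    {"Trouser", "Shirt", "Jacket"},
    {"Trouser", "Jacket"},
    {"Trouser", "Jeans"},
    {"Shirt", "Sweatshirt"},
]

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # L1: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

frequent_itemsets = apriori(transactions, min_support=0.5)
for itemset, s in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{s:.0%}")
```

It returns exactly the four frequent itemsets in the support table above: {Trouser} 75%, {Shirt} 50%, {Jacket} 50% and {Trouser, Jacket} 50%.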
Thank you!