Association Rules

2 Data Mining Techniques So Far
- Chapter 5: Statistics
- Chapter 6: Decision Trees
- Chapter 7: Neural Networks
- Chapter 8: Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering

3 Questions related to Market Basket

4 What can be inferred?
- I purchase diapers
- I purchase a new car
- I purchase OTC cough medicine
- I purchase a prescription medication
- I don't show up for class

5 Market Basket Analysis
Retail: each customer purchases a different set of products, in different quantities, at different times.
MBA uses this information to:
- Identify who customers are (not by name)
- Understand why they make certain purchases
- Gain insight about its merchandise (products):
  - Fast and slow movers
  - Products which are purchased together
  - Products which might benefit from promotion
- Take action:
  - Store layouts
  - Which products to put on special, promote, or offer coupons for
- Combining all of this with a customer loyalty card makes it even more valuable

6 Association Rules
- The DM technique most closely allied with Market Basket Analysis
- AR can be generated automatically
- AR represent patterns in the data without a specified target variable
- A good example of undirected data mining
- Whether the patterns make sense is up to humanoids (us!)

7 Association Rules Apply Elsewhere
- Items purchased on a credit card, such as rental cars and hotel rooms, provide insight into the next product that customers are likely to purchase.
- Optional services purchased by telecommunications customers (call waiting, call forwarding, DSL, speed call, and so on) help determine how to bundle these services together to maximize revenue.
- Banking services used by retail customers (money market accounts, CDs, investment services, car loans, and so on) identify customers likely to want other services.
- Unusual combinations of insurance claims can be a sign of fraud and can spark further investigation.
- Medical patient histories can give indications of likely complications based on certain combinations of treatments.

8 Market Basket Analysis Drill-Down
- MBA is a set of techniques, Association Rules being the most common, that focus on point-of-sale (p-o-s) transaction data
- 3 types of market basket data (p-o-s data):
  - Customers
  - Orders (basic purchase data, also called baskets or item sets)
  - Items (merchandise/services purchased)

9 Typical Data Structure (Relational Database)
Transaction Data
Lots of questions can be answered:
- Avg # of orders/customer
- Avg # of unique items/order
- Avg # of items/order
- For a product:
  - What % of customers have purchased it
  - Avg # of orders/customer that include it
  - Avg quantity of it purchased/order
Visualization is extremely helpful (next slide).

10 Combining data
These measures give broad insight into the business. In some cases, there are few repeat customers, so the average number of orders per customer is close to 1. This suggests a business opportunity to increase the number of sales per customer. Or, the number of products per order may be close to 1, suggesting an opportunity for cross-selling while the order is being placed. It can be useful to compare these measures to each other.

11 Questions about ...
- Sales Order Characteristics
- Item Popularity
- Tracking Marketing Interventions
- Clustering Products by Usage

12 Sales Order Characteristics
Customer purchases have additional interesting characteristics. For instance, the average order size varies by time and region.
For Web purchases and mail-order transactions, additional information may also be gathered at the point of sale:
- Did the order use gift wrap?
- Is the order going to the same address as the billing address?
- Did the purchaser accept or decline a particular cross-sell offer?

13 Item popularity
- What is the most common item found on a one-item order?
- What is the most common item found on a multi-item order?
- What is the most common item for repeat customer purchases?
- How has ordering of an item changed over time?
- How does the ordering of an item vary geographically?
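
The summary measures listed on the Typical Data Structure slide can be computed directly from order-line records. The sketch below is a minimal illustration in Python and is not part of the original deck: the `ORDER_LINES` sample data, the tuple layout, and the function names `basket_summary` and `product_summary` are all hypothetical. It also assumes that "orders per customer that include it" means an average over the customers who bought the product.

```python
from collections import defaultdict

# Hypothetical order-line records: (customer_id, order_id, item, quantity).
ORDER_LINES = [
    ("C1", "O1", "Beer", 2),
    ("C1", "O1", "Diaper", 1),
    ("C1", "O2", "Beer", 1),
    ("C2", "O3", "Milk", 1),
    ("C2", "O3", "Chocolate", 3),
    ("C3", "O4", "Beer", 6),
]


def basket_summary(order_lines):
    """Overall averages: orders per customer and items per order."""
    orders_by_customer = defaultdict(set)
    items_by_order = defaultdict(set)
    quantity_by_order = defaultdict(int)
    for customer, order, item, quantity in order_lines:
        orders_by_customer[customer].add(order)
        items_by_order[order].add(item)
        quantity_by_order[order] += quantity
    n_orders = len(items_by_order)
    return {
        "orders_per_customer": n_orders / len(orders_by_customer),
        "unique_items_per_order": sum(len(s) for s in items_by_order.values()) / n_orders,
        "items_per_order": sum(quantity_by_order.values()) / n_orders,
    }


def product_summary(order_lines, product):
    """Per-product measures; assumes the product appears at least once."""
    customers = {c for c, _, _, _ in order_lines}
    buyers = {c for c, _, item, _ in order_lines if item == product}
    orders_with = {o for _, o, item, _ in order_lines if item == product}
    quantity = sum(q for _, _, item, q in order_lines if item == product)
    return {
        "pct_customers": 100 * len(buyers) / len(customers),
        "orders_per_buyer": len(orders_with) / len(buyers),
        "avg_quantity_per_order": quantity / len(orders_with),
    }


if __name__ == "__main__":
    print(basket_summary(ORDER_LINES))
    print(product_summary(ORDER_LINES, "Beer"))
```

In a real setting these numbers would more likely come from SQL aggregates over the customer, order, and item tables; the in-memory version is only meant to make the definitions concrete.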

14 Tracking Marketing Interventions
- Including marketing interventions along with the product sales over time makes it possible to see the effect of the interventions.
- Prior to the intervention, sales are hovering at 50 units / week. After the intervention, they peak at 7-8 times that amount.
- A challenge in answering this question is determining whether the additional sales are incremental or are made by customers who would purchase the product anyway at some later time.
- We can also look at the number of baskets containing the item. If the number of customers is not increasing, there is evidence that existing customers are simply stocking up on the item at a lower cost.

15 Clustering Products by Usage
- What groups of products often appear together? Such groups of products are very useful for making recommendations to customers: customers who have purchased some of the products may be interested in the rest of them.
- A lot of information is available about products. In addition to the product hierarchy, such information includes the color of clothes, whether food is low calorie, whether a poster includes a frame, and so on.
- Questions:
  - Do diet products tend to sell together?
  - Are customers purchasing similar colors of clothing at the same time?
  - Do customers who purchase framed posters also buy other products?

16 Pivoting for Cluster Algorithms

17 Association Rules
- Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars
- Customers who purchase maintenance agreements are very likely to purchase large appliances
- When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaners
So what?

18 Famous Rules: Beer & Diapers

19 Famous Rules: Beer & Diapers
WHY?
- Beer drinkers do not want to interrupt their enjoyment of televised sports, so they buy diapers to reduce trips to the bathroom. No, that's not it.
- Families with young children are preparing for the weekend.
What can a retailer do with this information?
- Put the beer and diapers close together, so that when one is purchased, customers remember to buy the other.
- Put them as far apart as possible, so that customers have the opportunity to buy yet more items along the way.
- Put higher-margin diapers a bit closer to the beer, although mixing baby products and alcohol would probably be unseemly.

20 Association Rules
Examples:
- If buy Diaper Then buy Beer: shoppers who buy Diaper are very likely to buy Beer.
- If buy Beer, Diaper Then buy Cheese, Chocolate: shoppers who buy Beer and Diaper are likely to buy Cheese and Chocolate.
For a frequent itemset {Diaper, Beer}, is Diaper promoting the purchase of Beer, or is Beer increasing the chance of a Diaper purchase? We need directions.

21 Association Rules
Rule format: If {set of items} Then {set of items}
- Example: If {Diaper, Baby Food} Then {Beer, Wine}
- LHS implies RHS, written LHS → RHS (LHS = "Left Hand Side", RHS = "Right Hand Side")
An association rule is valid if it satisfies some evaluation measures.

22 Association Rules
Association rule types:
- Actionable Rules: contain high-quality, actionable information
- Trivial Rules: information already well known by those familiar with the business. Results from market basket analysis may simply be measuring the success of previous marketing campaigns.
- Inexplicable Rules: have no explanation and do not suggest action
Trivial and Inexplicable Rules occur most often.

23 Rule Evaluation
Transaction No.  Item 1     Item 2     Item 3
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Wine
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer
...
Milk & Wine co-occur, but only 2 out of 200K transactions contain these items.

24 Rule Evaluation - Support
Support: the frequency with which the items in LHS and RHS co-occur.

$$\text{Support} = \frac{\text{No. of transactions containing items in LHS and RHS}}{\text{Total no. of transactions in the dataset}}$$

Transaction No.  Item 1     Item 2     Item 3
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer

E.g., the support of the {Diaper} → {Beer} rule is 3/5: 60% of the transactions contain both items.

25 Rule Evaluation - Confidence
Is Beer leading to a Diaper purchase, or Diaper leading to a Beer purchase?
- Among the transactions with Diaper, 100% have Beer: P(Beer|Diaper) = 100%
- Among the transactions with Beer, 75% have Diaper: P(Diaper|Beer) = 75%

$$\text{Confidence} = \frac{\text{No. of transactions containing both LHS and RHS}}{\text{No. of transactions containing LHS}}$$

Transaction No.  Item 1     Item 2     Item 3
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer

- Confidence for {Diaper} → {Beer}: 3/3. When Diaper is purchased, the likelihood of a Beer purchase is 100%.
- Confidence for {Beer} → {Diaper}: 3/4. When Beer is purchased, the likelihood of a Diaper purchase is 75%.
So {Diaper} → {Beer} is the more important rule according to confidence.

26 Rule Evaluation - Lift
Transaction No.  Item 1  Item 2     Item 3     Item 4
100              Beer    Diaper     Chocolate
101              Milk    Chocolate  Shampoo
102              Beer    Milk       Vodka      Chocolate
103              Beer    Milk       Diaper     Chocolate
104              Milk    Diaper     Beer

What are the support and confidence for the rule {Chocolate} → {Milk}?
- Support = 3/5
- Confidence = 3/4
Very high support and confidence. Does Chocolate really lead to a Milk purchase?
No! Milk occurs in 4 out of 5 transactions, so Chocolate is actually decreasing the chance of a Milk purchase: 3/4 < 4/5, i.e. P(Milk|Chocolate) < P(Milk).
Lift = (3/4)/(4/5) = 0.9375 < 1
- When lift > 1, the rule is better at predicting the result than guessing.
- When lift < 1, the rule is doing worse than informed guessing, and the negative rule (If LHS Then NOT RHS) does better than guessing.

27 Rule Evaluation - Lift (cont.)
Lift measures how much more likely the RHS is given the LHS than the RHS on its own.

$$\text{Lift} = \frac{\text{confidence of the rule}}{\text{probability of the RHS}} = \frac{P(\text{RHS} \mid \text{LHS})}{P(\text{RHS})}$$

Example: {Diaper} → {Beer}
- Total number of customers in database: 1000
- No. of customers buying Diaper: 200
- No. of customers buying Beer: 50
- No. of customers buying Diaper & Beer: 20
- Probability of Beer = 50/1000 (5%)
- Confidence = 20/200 (10%)
- Lift = 10%/5% = 2
Lift higher than 1 implies people have a higher chance of buying Beer when they buy Diaper.
Lift lower than 1 implies people have a lower chance of buying Milk when they buy Chocolate.

28 Rule Evaluation - Practical Impact
- Most methods for extracting association rules find too many trivial rules. Most are either obvious or uninteresting.
  - Example: If Maternity Ward then patient is a woman. Confidence 100%, support 100%
- Need to screen for rules that are of particular interest and significance.
  - Actionable: keep only rules that can be acted upon.
  - Interestingness: various measures of how surprising or unexpected a rule is.
  - Example: a rule is interesting if it contradicts what is currently known (e.g., it contradicts a rule that was previously discovered).

29 Creating Association Rules
1. Choosing the right set of items
2. Generating rules by deciphering the counts in the co-occurrence matrix
3. Overcoming the practical limits imposed by thousands or tens of thousands of unique items

30 Creating Association Rules

31 Creating Association Rules
Choosing the right set of items
- Within a grocery store where there are tens of thousands of products on the shelves, a frozen pizza might be considered an item for analysis purposes, regardless of its toppings (extra cheese, pepperoni, or mushrooms), its crust (extra thick, whole wheat, or white), or its size.
- On the other hand, the manager of frozen foods or a chain of pizza restaurants may be very interested in the particular combinations of toppings that are ordered.

32 Creating Association Rules
Choosing the right set of items
- What level of the product hierarchy is the right one to use?
- Market basket analysis produces the best results when the items occur in roughly the same number of transactions in the data. This helps prevent rules from being dominated by the most common items.
- Product hierarchies can help here. Roll up rare items to higher levels in the hierarchy, so they become more frequent. More common items may not have to be rolled up at all.

33 Creating Association Rules
Generating rules by deciphering the counts in the co-occurrence matrix
- Rule form: if condition, then result.
- "if Barbie doll, then candy bar" means: if a customer purchases a Barbie doll, then the customer is also expected to purchase a candy bar.
- Saying that the rule "if B and C then A" has a confidence of 0.33 is equivalent to saying that when B and C appear in a transaction, there is a 33 percent chance that A also appears in it.

34 Creating Association Rules
Overcoming the practical limits imposed by thousands or tens of thousands of unique items
1. Generate the co-occurrence matrix for single items: if OJ then soda
2. Generate the co-occurrence matrix for two items: if OJ and Milk then soda
3. Generate the co-occurrence matrix for three items: if OJ and Milk and Window Cleaner then soda
4. And so on

35 Algorithm to Extract Association Rules
- The standard algorithm: Apriori
  Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
- The Association Rules problem was defined as: generate all association rules that have support greater than the user-specified minimum support and confidence greater than the user-specified minimum confidence.
- The base algorithm uses support and confidence, but we can also use lift to rank the rules discovered by Apriori.
- The algorithm performs an efficient search over the data to find all such rules.

36 Finding Association Rules from Data
The association rule discovery problem is decomposed into two sub-problems:
1. Find all sets of items (itemsets) whose support is above the minimum support. These are called frequent itemsets or large itemsets.
2. From each frequent itemset, generate rules whose confidence is above the minimum confidence.
   - Given a large itemset Y, where X is a subset of Y
   - Calculate the confidence of the rule X → (Y - X)
   - If its confidence is above the minimum confidence, then X → (Y - X) is an association rule we are looking for.

37 Example
A data set with 5 transactions. Minimum support = 40%, minimum confidence = 80%.

Transaction No.  Item 1     Item 2     Item 3
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer

Phase 1: Find all frequent itemsets
- {Beer} (support = 80%), {Diaper} (60%), {Chocolate} (40%)
- {Beer, Diaper} (60%)
Phase 2: Generate rules from the frequent itemsets
- Beer → Diaper (conf. 3/4 = 75%)
- Diaper → Beer (conf. 3/3 = 100%)
Only Diaper → Beer meets the 80% minimum confidence.

38 Phase 1: Finding all frequent itemsets
How to perform an efficient search of all frequent itemsets?
- A naive way is to calculate the support for every possible itemset. There are 2^N possible itemsets given N items: impossible to do for large N!
- Need a smart method: frequent itemsets of size n contain itemsets of size n-1 that must also be frequent.
  - Example: if {diaper, beer} is frequent, then {diaper} and {beer} are each frequent as well.
- This means that if an itemset is not frequent (e.g., {wine}), then no itemset that includes wine, such as {wine, beer}, can be frequent either.
- We therefore first find all itemsets of size 1 that are frequent. Then we try to expand these by counting the frequency of all itemsets of size 2 that are built from frequent itemsets of size 1.
  - Example: if {wine} is not frequent, we need not try to find out whether {wine, beer} is frequent. But if both {wine} and {beer} were frequent, then it is possible (though not guaranteed) that {wine, beer} is also frequent.
- Then take only the itemsets of size 2 that are frequent, and try to expand those, etc.

39 Phase 2: Generating Association Rules
Assume {Milk, Bread, Butter} is a frequent itemset.
- Using the items contained in the itemset, list all possible rules:
  - {Milk} → {Bread, Butter}
  - {Bread} → {Milk, Butter}
  - {Butter} → {Milk, Bread}
  - {Milk, Bread} → {Butter}
  - {Milk, Butter} → {Bread}
  - {Bread, Butter} → {Milk}
- Calculate the confidence of each rule
- Pick the rules with confidence above the minimum confidence
Confidence of {Milk} → {Bread, Butter}:

$$\frac{\text{Support}\{\text{Milk, Bread, Butter}\}}{\text{Support}\{\text{Milk}\}} = \frac{\text{No. of transactions that support } \{\text{Milk, Bread, Butter}\}}{\text{No. of transactions that support } \{\text{Milk}\}}$$

40 Agrawal (94)'s Apriori Algorithm - An Example
Itemsets with a support count below 2 are dropped at each step, as the tables show.

Transactions
T-ID  Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan → C1            L1
Itemset  sup             Itemset  sup
{A}      2               {A}      2
{B}      3               {B}      3
{C}      3               {C}      3
{D}      1               {E}      3
{E}      3

C2 (candidates)    2nd scan → C2       L2
Itemset            Itemset  sup        Itemset  sup
{A, B}             {A, B}   1          {A, C}   2
{A, C}             {A, C}   2          {B, C}   2
{A, E}             {A, E}   1          {B, E}   3
{B, C}             {B, C}   2          {C, E}   2
{B, E}             {B, E}   3
{C, E}             {C, E}   2

C3           3rd scan → L3
Itemset      Itemset    sup
{B, C, E}    {B, C, E}  2

{A, B, C}, {A, C, E}? They are not candidates: {A, B} and {A, E} are not frequent.

41 The number of combinations with n items is proportional to the number of items raised to the nth power, a number that gets very large, very fast.

42 Final Thought on Association Rules: The Problem of Lots of Data
- Fast food restaurant: could have 100 items on its menu
  - How many combinations are there with 3 different menu items? 161,700!
- Supermarket: 10,000 or more unique items
  - 50 million 2-item combinations
  - Over 100 billion 3-item combinations (about 167 billion for exactly 10,000 items)
How to reduce data:
- Use of product hierarchies (groupings)
- Pruning: reducing the number of items and combinations of items being considered at each step
- Minimum support pruning requires that a rule hold on a minimum number of transactions. If there are one million transactions and the minimum support is 1%, then only rules supported by at least 10,000 transactions are of interest.
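
The counts quoted on this slide can be checked with a few lines of Python. This is only an illustrative sketch and not part of the original deck: the helper names `n_item_combinations` and `min_support_count` are hypothetical, while `math.comb` is the standard-library binomial coefficient (Python 3.8 or later).

```python
from math import ceil, comb


def n_item_combinations(n_items, size):
    """Number of distinct itemsets of `size` drawn from `n_items` unique items."""
    return comb(n_items, size)


def min_support_count(n_transactions, min_support_pct):
    """Smallest transaction count that satisfies a minimum support given in percent."""
    return ceil(n_transactions * min_support_pct / 100)


if __name__ == "__main__":
    # Fast food menu with 100 items, combinations of 3 different items.
    print(n_item_combinations(100, 3))
    # Supermarket with 10,000 unique items, 2-item and 3-item combinations.
    print(n_item_combinations(10_000, 2))
    print(n_item_combinations(10_000, 3))
    # One million transactions with a 1% minimum support.
    print(min_support_count(1_000_000, 1))
```

The 2-item figure comes out just under 50 million and the 3-item figure at roughly 167 billion, which is why candidate pruning matters long before the data itself becomes the bottleneck.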
Finally, note that the number of transactions in a given time period could also be huge (and hence expensive to analyze).

43 Using Association Rules to Compare Stores
Example: compare sales at store openings versus existing stores:
1. Gather data for a specific period (such as 2 weeks) from store openings. Augment each of the transactions in this data with a virtual item saying that the transaction is from a store opening.
2. Gather about the same amount of data from existing stores. Here you might use a sample across all existing stores, or you might take all the data from stores in comparable locations. Augment the transactions in this data with a virtual item saying that the transaction is from an existing store.
3. Apply market basket analysis to find association rules in each set.
4. Pay particular attention to association rules containing the virtual items.

44 Dissociation Rules
- Form: if A and not B, then C
- Dissociation rules can be generated by a simple adaptation of the basic market basket analysis algorithm: introduce a new inverted item (NOT A) for each original item.
- Downsides to including the new items:
  - Doubling the number of items seriously degrades performance.
  - The size of a typical transaction grows because it now includes inverted items.
  - The frequency of the inverted items tends to be much larger than the frequency of the original items. So minimum support constraints tend to produce rules in which all items are inverted, such as "if NOT A and NOT B then NOT C". These rules are less likely to be actionable.

45 Sequential Analysis Using Association Rules
Association rules find things that happen at the same time: what items are purchased at a given time. The next natural question concerns sequences of events and what they mean.
Examples:
- New homeowners purchase shower curtains before purchasing furniture.
- Customers who purchase new lawnmowers are very likely to purchase a new garden hose in the following 6 weeks.
- When a customer goes into a bank branch and asks for an account reconciliation, there is a good chance that he or she will close all his or her accounts.
In order to consider time-series analyses on your customers, there has to be some way of identifying customers. Without a way of tracking individual customers, there is no way to analyze their behavior over time.

46 Sequential Patterns
Instead of finding associations between items in a single transaction, find associations between items across related transactions over time.

Customer ID  Transaction Date  Item 1                 Item 2
AA           2/2/2001          Laptop                 Case
AA           1/13/2002         Wireless network card  Router
BB           4/5/2002          laptop                 iPaq
BB           8/10/2002         Wireless network card  Router

Sequence: {Laptop}, {Wireless Card, Router}
A sequence has to satisfy some predetermined minimum support.

47 Exercise 1 by hand
Given the following list of transactions, do the following:
1) Find all the frequent itemsets (minimum support 40%)
2) Find all the association rules (minimum confidence 70%)
3) For the discovered association rules, calculate the lift

Transaction No.  Item 1  Item 2     Item 3     Item 4
100              Beer    Diaper     Chocolate
101              Milk    Chocolate  Shampoo
102              Beer    Soap       Vodka
103              Beer    Cheese     Wine
104              Milk    Diaper     Beer       Chocolate

48 RapidMiner Practice
- To see: RapidMiner Tutorial example 2 / 26
- To practice: do the exercise presented in the tutorial using the file Iris.ioo.

49 Exercise 1 using RapidMiner
- Take the Beer.xls file and find the association rules
- First process the data into the right format (Beer1.xls)

50 RapidMiner Practice
- To see: Training Videos\05 - Akhtar Fareed - RapidMinerTutorial\RapidMiner Tutorial (part 9_9) Association Rules
- To practice: do the exercises presented in the movie using the file BalanceScale.xls.

51 RapidMiner Practice
Data Preprocessing
- Bank.xls → Bank.ioo: save as .ioo format
Process design
- Take a look at the .ioo file and its attributes / variables
- Process the attributes using Select Attributes (rules can only handle categorical data types)
Association Rules
- Find association rules using the operators FP-Growth, then Create Association Rules
- Read and interpret the results
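
As a companion to the by-hand and RapidMiner exercises, the two-phase procedure from the earlier slides (find frequent itemsets level by level, then split each one into rules) can be sketched in plain Python. This is a minimal teaching sketch, not the RapidMiner implementation and not the optimized algorithm from the Agrawal and Srikant paper: the function names `support`, `frequent_itemsets`, and `association_rules` are hypothetical, supports are kept as floating-point fractions, and the data is the Exercise 1 table. Running it is one way to check hand-worked answers.

```python
from itertools import combinations

# Transactions from the Exercise 1 slide (transaction numbers 100-104).
TRANSACTIONS = [
    {"Beer", "Diaper", "Chocolate"},
    {"Milk", "Chocolate", "Shampoo"},
    {"Beer", "Soap", "Vodka"},
    {"Beer", "Cheese", "Wine"},
    {"Milk", "Diaper", "Beer", "Chocolate"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-(k+1) candidates are built only from
    frequent itemsets of size k (the Apriori principle)."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        # Count the candidates and keep those meeting the minimum support.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has a k-subset which is not frequent.
        joined = {a | b for a, b in combinations(list(level), 2)
                  if len(a | b) == len(a) + 1}
        current = [c for c in joined
                   if all(frozenset(s) in level
                          for s in combinations(c, len(c) - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Split each frequent itemset Y into X -> (Y - X); keep confident rules."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                rhs = itemset - lhs
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    lift = confidence / frequent[rhs]
                    rules.append((lhs, rhs, sup, confidence, lift))
    return rules


if __name__ == "__main__":
    freq = frequent_itemsets(TRANSACTIONS, min_support=0.4)
    for lhs, rhs, sup, conf, lift in association_rules(freq, min_confidence=0.7):
        print(sorted(lhs), "->", sorted(rhs),
              f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Looking up `frequent[lhs]` and `frequent[rhs]` is safe because every subset of a frequent itemset is itself frequent. Thresholds that fall exactly on a support or confidence value can be sensitive to floating-point rounding, so a more careful version might work with integer counts throughout.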