
# Association Rule

Yudi Agusta, PhD
Data Warehouse dan Data Mining, Lecture 12

Lecture's Structure
- Definition and Benefit
- Method
- Apriori Algorithm
- Support for an Item Set
- Confidence for an Association Rule
- Support for an Association Rule
- Extended Rule Selection

Definition
- Market Basket Analysis: finds regularities in the shopping behavior of customers of supermarkets, mail-order companies, and the like
- Tries to find sets of products that are frequently bought together
- The resulting information is expressed in the form of rules


Benefit
- Rules can be used to increase the number of items sold, for instance by appropriately arranging the products on the shelves of a supermarket (associated products may be placed adjacent to each other)
- Rules can be used to plan promotions
- Rules can be used to prepare the availability of products on the shelves

Association Rule
Example: if a customer buys wine and bread, he often buys cheese, too.
An association rule expresses an association between sets of items/products. It can be applied to items/products in:
- a supermarket
- a mail-order company
- special equipment options of a car
- optional services offered by telecommunication companies
- etc.

Association Rule
- The goal is not merely to find rules, but to find rules that are "good", "expressive", and "reliable"
- Standard measures of a good rule: the support and the confidence of the rule
- Main problem: there are very many possible rules (consider, for example, all the products in a supermarket), so an efficient algorithm is needed to inspect the possible rules; the sketch below quantifies how quickly their number grows
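To see why brute force fails, here is a minimal Python sketch that counts the candidate rules for n items. It assumes the standard counting convention in which a rule has a non-empty antecedent and a non-empty, disjoint consequent; the function name `possible_rules` is just for illustration.

```python
# Each item can go into the antecedent, the consequent, or neither:
# 3**n assignments in total. Subtracting the assignments with an empty
# antecedent (2**n), those with an empty consequent (2**n), and adding
# back the doubly-empty case gives 3**n - 2**(n + 1) + 1 candidate rules.
def possible_rules(n: int) -> int:
    return 3**n - 2**(n + 1) + 1

for n in (5, 10, 20, 100):
    print(f"{n:>3} items -> {possible_rules(n):,} possible rules")
```

Already at 20 items the count is in the billions, so enumerating and testing every rule directly is hopeless.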


Apriori Algorithm
- Developed by Agrawal et al. in 1993
- Restricts the search space: only a subset of all rules is checked, while trying not to miss important rules
- Some terms:
  - Support
  - Confidence
  - Item set: a group of products, such as {bread, wine, cheese}
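Below is a minimal sketch of the frequent-item-set phase of the Apriori idea, not the original implementation by Agrawal et al.; the `apriori` function, the example baskets, and the `min_support` value are illustrative. The search space is restricted via the Apriori property: every subset of a frequent item set is itself frequent, so candidates with an infrequent subset can be pruned without counting them.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Sketch of Apriori frequent-item-set mining.

    transactions: list of sets of items; min_support: fraction in [0, 1].
    Returns {frozenset: support} for every frequent item set found.
    """
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]): s
                for i in items
                if (s := support(frozenset([i]))) >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-sets into k-set candidates, then prune
        # every candidate that has an infrequent (k-1)-subset.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = {c: s for c in candidates
                    if (s := support(c)) >= min_support}
        result.update(frequent)
        k += 1
    return result

baskets = [{"bread", "wine", "cheese"}, {"bread", "cheese"},
           {"bread", "milk"}, {"wine", "cheese"}]
print(apriori(baskets, min_support=0.5))
```

Rules are then generated from the frequent item sets and filtered by confidence, as described on the following slides.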

Support of an Item Set
Let T be the set of all transactions under consideration.
Example: the set of all baskets of products bought by the customers of a supermarket.

The support of an item set S is the percentage of all transactions in T that contain S, given by the formula:
Support(S) = (|U| / |T|) * 100%
where U is the set of all transactions that contain all the items in S, and |U| and |T| are the numbers of elements in U and T.

Example
A customer buys the set X = {milk, bread, apples, wine, sausages, cheese, onions, potatoes}.
For the item set S = {bread, wine, cheese}, S is obviously a subset of X, hence this transaction belongs to U.
If there are 318 customers and 242 of them buy such a set X, or a similar one that contains S, then support(S) = 242/318 * 100% ≈ 76.1%.
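A small sketch of the support computation under the definition above; the example baskets are made up, and only the 242/318 figure comes from the slide.

```python
def support(itemset, transactions):
    """Support(S) = |U| / |T| * 100%, with U the transactions containing S."""
    u = [t for t in transactions if set(itemset) <= set(t)]
    return len(u) / len(transactions) * 100

# The slide's figures: 242 of 318 baskets contain S = {bread, wine, cheese}.
print(f"{242 / 318 * 100:.1f}%")  # -> 76.1%

baskets = [{"milk", "bread", "wine", "cheese"}, {"bread", "wine"},
           {"bread", "wine", "cheese", "onions"}]
print(support({"bread", "wine", "cheese"}, baskets))  # 2 of 3 -> ~66.7
```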


Confidence of an Association Rule
- A measure used by Agrawal et al. (1993) to evaluate association rules
- Intuitively, it is the number of cases in which the rule is correct relative to the number of cases in which it is applicable
- The confidence of a rule R = "A AND B THEN C" is the support of the set of all items that appear in the rule, divided by the support of the antecedent of the rule:
Confidence(R) = (Support({A, B, C}) / Support({A, B})) * 100%
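A direct translation of this formula into Python; a sketch, with baskets made up for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence(R) = Support(antecedent + consequent) / Support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions) * 100

baskets = [{"wine", "bread", "cheese"}, {"wine", "bread"},
           {"wine", "cheese"}, {"bread"}]
# "wine AND bread THEN cheese": applicable in 2 baskets, correct in 1.
print(confidence({"wine", "bread"}, {"cheese"}, baskets))  # -> 50.0
```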

Example
Let R = "wine AND bread THEN cheese". If a customer buys wine and bread, then the rule is applicable, and it says that he/she can be expected to buy cheese: the customer may or may not actually buy cheese, and thus the rule may or may not be correct. If he/she does not buy wine, does not buy bread, or buys neither, then the rule is not applicable and says nothing about this customer.

Rule Confidence
Sets a limit on how good a rule is at predicting an event. In the last example, the rule confidence is the number of correct predictions divided by the number of all predictions, i.e. by the number of cases in which the rule is applicable, expressed as a percentage. In some programs, the minimum rule confidence is set to 80% for a rule to be considered a good rule, as sketched below.
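A sketch of how such a minimum-confidence threshold could be applied; the rule list and its tuple format are hypothetical, standing in for the output of the mining step.

```python
# Hypothetical mined rules as (antecedent, consequent, confidence) tuples.
rules = [
    (("wine", "bread"), ("cheese",), 0.83),
    (("cheese",), ("bread",), 0.62),
    (("milk",), ("onions",), 0.15),
]

MIN_CONFIDENCE = 0.80  # the 80% threshold mentioned above

good_rules = [r for r in rules if r[2] >= MIN_CONFIDENCE]
print(good_rules)  # only ('wine', 'bread') -> ('cheese',) survives
```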


Support of an Association Rule
Example: "A AND B THEN C"

According to Agrawal et al. (1993), it is the support of the antecedent of the rule: the support of this association rule is Support({A, B}).

According to Borgelt (2002), it is the support of all items appearing in the rule: the support of this association rule is Support({A, B, C}).
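Both conventions are easy to state in code. A sketch, with illustrative baskets:

```python
def support(itemset, transactions):
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

def rule_support_agrawal(antecedent, consequent, transactions):
    # Agrawal et al. (1993): support of the antecedent, {A, B}.
    return support(antecedent, transactions)

def rule_support_borgelt(antecedent, consequent, transactions):
    # Borgelt (2002): support of all items in the rule, {A, B, C}.
    return support(set(antecedent) | set(consequent), transactions)

baskets = [{"wine", "bread", "cheese"}, {"wine", "bread"}, {"cheese"}]
print(rule_support_agrawal({"wine", "bread"}, {"cheese"}, baskets))  # 2/3
print(rule_support_borgelt({"wine", "bread"}, {"cheese"}, baskets))  # 1/3
```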

Extended Rule Selection
Even good rules are often not interesting.
Example 1:

Example 2:

Extended Rule Selection
An interesting rule has to have a rule confidence that differs significantly from the confidence of its antecedent rules. Comparing a good rule with its antecedent rules, including the rule without any antecedent (the prior), is therefore important. This comparison can be performed using:
- Absolute Confidence Difference to Prior
- Difference of Confidence Quotient to 1
- Absolute Difference of Improvement Value to 1
- Information Difference to Prior
- Normalised Chi² Measure


Absolute Confidence Difference to Prior
The absolute value of the difference between the confidence of a rule and the confidence of its prior (the same rule without the antecedent).
Example:
- confidence(THEN bread) = 60% (the prior)
- confidence(cheese THEN bread) = 62%
- absolute confidence difference to prior = |62% - 60%| = 2%

A threshold relative to the prior confidence can be set, for example 20%. An 'interesting' rule then has to have a confidence either less than (60 - 20)% = 40% or greater than (60 + 20)% = 80%, i.e. its absolute confidence difference to the prior must exceed 20%. The rule above, at 2%, is therefore not interesting.
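A sketch of this selection criterion, using the slide's numbers; the function name is illustrative.

```python
def is_interesting_abs_diff(conf, prior_conf, d):
    """Interesting if |conf - prior| exceeds the threshold d."""
    return abs(conf - prior_conf) > d

# Slide example: prior (THEN bread) = 60%, rule (cheese THEN bread) = 62%.
print(is_interesting_abs_diff(0.62, 0.60, 0.20))  # False: |0.02| <= 0.20
# With d = 20%, the confidence must fall below 40% or above 80%.
print(is_interesting_abs_diff(0.35, 0.60, 0.20))  # True
```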

Difference of Confidence Quotient to 1
Here the quotient of the rule confidence and the prior confidence (or its inverse, whichever is smaller) is compared to 1. For the last example, if the threshold is set to 20%, then:
- the confidence of an interesting rule is less than (1 - 20%) * 60% = 0.8 * 60% = 48%, or
- the confidence of an interesting rule is greater than 60% / (1 - 20%) = 60% / 0.8 = 75%

The difference from the previous measure is that the allowed deviation scales with the prior confidence.
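A sketch of this criterion, assuming the formulation 1 - min(c/p, p/c) > d, which reproduces the 48% and 75% bounds above; the 60% prior and 20% threshold come from the example, while the test confidences are made up.

```python
def is_interesting_quotient(conf, prior_conf, d):
    """Difference of confidence quotient to 1:
    interesting if 1 - min(c/p, p/c) exceeds the threshold d."""
    q = min(conf / prior_conf, prior_conf / conf)
    return 1 - q > d

# Prior = 60%, d = 20% -> interesting below 48% or above 75%.
print(is_interesting_quotient(0.47, 0.60, 0.20))  # True  (below 48%)
print(is_interesting_quotient(0.62, 0.60, 0.20))  # False (inside the band)
print(is_interesting_quotient(0.76, 0.60, 0.20))  # True  (above 75%)
```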

Absolute Difference of Improvement Value to 1
The improvement value is the quotient of the rule confidence and the prior confidence. For the last example, if the threshold is set to 20%, then:
- the confidence of an interesting rule is less than (1 - 20%) * 60% = 0.8 * 60% = 48%, or
- the confidence of an interesting rule is greater than (1 + 20%) * 60% = 1.2 * 60% = 72%

Note that, unlike confidence itself, the improvement value can be greater than 100%.
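A sketch of this criterion, assuming improvement = confidence / prior confidence and the test |improvement - 1| > d, which reproduces the 48% and 72% bounds above; the test confidences are made up.

```python
def is_interesting_improvement(conf, prior_conf, d):
    """Interesting if the improvement value conf/prior deviates from 1
    by more than d, i.e. conf < (1-d)*prior or conf > (1+d)*prior."""
    return abs(conf / prior_conf - 1) > d

# Prior = 60%, d = 20% -> interesting below 48% or above 72%.
print(is_interesting_improvement(0.62, 0.60, 0.20))  # False
print(is_interesting_improvement(0.73, 0.60, 0.20))  # True (above 72%)
```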


Information Difference to Prior
This is the information gain criterion used in decision tree learners such as C4.5 to select the split attributes. The idea: without any further information about other items in the set, we have a certain prior probability distribution for, say, "bread" versus "no bread"; in the last example, P(bread) = 60% and P(no bread) = 40%.

The entropy of a probability distribution is a lower bound on the average number of yes/no questions that have to be asked in order to determine the actual value.

Information Difference to Prior
From the last example, we also need to calculate the entropies of P(* | cheese) and P(* | no cheese), where P(* | cheese) is the probability of buying * given that cheese is bought, and P(* | no cheese) is the probability of buying * given that cheese is not bought. The difference between the entropy of the prior distribution and the expected value of the entropies of the posterior distributions is the information difference to prior.
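A sketch of the computation. The prior P(bread) = 60% comes from the earlier example; the conditional probabilities and the weights P(cheese) = P(no cheese) = 50% are made-up figures, since the slide's own numbers were not preserved.

```python
from math import log2

def entropy(ps):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in ps if p > 0)

def information_difference(prior, posteriors, weights):
    """Entropy of the prior minus the expected entropy of the
    posterior distributions (the information gain criterion)."""
    expected = sum(w * entropy(post) for w, post in zip(weights, posteriors))
    return entropy(prior) - expected

prior = (0.6, 0.4)                     # P(bread), P(no bread)
posteriors = [(0.8, 0.2), (0.4, 0.6)]  # P(* | cheese), P(* | no cheese): made up
weights = [0.5, 0.5]                   # P(cheese), P(no cheese): made up
print(information_difference(prior, posteriors, weights))  # ~0.12 bits
```

A large information difference means that knowing whether cheese was bought tells us a lot about whether bread will be bought, which is exactly what makes the rule interesting.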