
Lec.5.

Introduction to
Advanced Analytics - Theory and
Methods
Association Rules

2
Association Rules
Association rules are another unsupervised learning
method. No “prediction” is performed; instead, the method
is used to discover relationships and find similarities
among large sets of data items.
Example questions:
- Which of my products tend to be purchased together?
- What do other people who like this product tend to
buy, watch, or click on among the other products we
offer?
3
Association Rules
• Rules show how frequently an item set occurs across
the transactions.
• Discover "interesting" relationships among
variables in a large database
Rules of the form "If X is observed, then Y is
also observed"
The definition of "interesting" varies with the
algorithm used for discovery

4
- Association rules are the most common technique used
in market basket analysis, which focuses on point-of-sale
(p-o-s) transaction data
- 3 types of market basket data (p-o-s data)
* Customers
* Orders (basic purchase data)
* Items (merchandise or goods/services purchased)
- Association rules (AR) can be generated automatically
- AR represent patterns in the data without a specified
target variable
5
Rule Induction: Sequential Covering
Algorithm
• Steps:
- Rules are learned one at a time
- Each time a rule is learned, the tuples covered by
that rule are removed
- The process is repeated on the remaining tuples
until a termination condition is met
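
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the helper `learn_one_rule` is a hypothetical greedy learner that only tries single attribute = value tests, the `min_accuracy` threshold stands in for the unspecified termination condition, and the four-row data set is made up for the example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Rule = Dict[str, str]  # IF part: attribute -> required value


def covers(rule: Rule, row: Row) -> bool:
    """True when the row satisfies every attribute test in the rule."""
    return all(row.get(attr) == val for attr, val in rule.items())


def learn_one_rule(rows: List[Row], target: str, positive: str) -> Rule:
    """Hypothetical learner: best single attribute=value test by accuracy."""
    best_rule: Rule = {}
    best_key: Tuple[float, int] = (-1.0, 0)
    for row in rows:
        for attr, val in row.items():
            if attr == target:
                continue
            rule = {attr: val}
            covered = [r for r in rows if covers(rule, r)]
            correct = sum(1 for r in covered if r[target] == positive)
            # Rank by accuracy first, then by number of tuples covered.
            key = (correct / len(covered), len(covered))
            if key > best_key:
                best_rule, best_key = rule, key
    return best_rule


def sequential_covering(rows: List[Row], target: str, positive: str,
                        min_accuracy: float = 0.6) -> List[Rule]:
    """Learn rules one at a time, removing covered tuples after each rule."""
    rules: List[Rule] = []
    remaining = list(rows)
    while any(r[target] == positive for r in remaining):
        rule = learn_one_rule(remaining, target, positive)
        covered = [r for r in remaining if covers(rule, r)]
        accuracy = sum(1 for r in covered if r[target] == positive) / len(covered)
        if accuracy < min_accuracy:  # assumed termination condition
            break
        rules.append(rule)
        remaining = [r for r in remaining if not covers(rule, r)]
    return rules


# Made-up training tuples for illustration only.
rows = [
    {"age": "youth", "student": "yes", "buy_computer": "yes"},
    {"age": "youth", "student": "no", "buy_computer": "no"},
    {"age": "senior", "student": "yes", "buy_computer": "yes"},
    {"age": "senior", "student": "no", "buy_computer": "no"},
]
learned_rules = sequential_covering(rows, "buy_computer", "yes")
```

On this toy data a single rule (student = yes) covers every positive tuple, so the loop stops after one pass because no positive tuples remain.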
6
Sequential Covering Algorithm

[Figure: the data items, partitioned into the examples covered by Rule 1, Rule 2, and Rule 3]

7
How to Generate Association
Rules?
Using IF-THEN Rules for Classification
• Represent the knowledge in the form of IF-THEN rules
R: IF age = youth AND student = yes
THEN buy computer = yes
- IF part: rule precondition
- THEN part: rule consequent
• A rule can be assessed by its coverage and accuracy
• From a class-labeled training data set D:
- Let n_covers = # of samples covered by rule R
- Let n_correct = # of samples correctly classified by R
coverage(R) = n_covers / |D|
accuracy(R) = n_correct / n_covers
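
To make the two measures concrete, here is a small Python sketch that counts the covered and correctly classified samples for the rule R above. The function name `coverage_and_accuracy`, the attribute key `buy_computer`, and the five-row data set D are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def matches(conditions: Dict[str, str], row: Row) -> bool:
    """True when the row has the required value for every listed attribute."""
    return all(row.get(attr) == val for attr, val in conditions.items())


def coverage_and_accuracy(rows: List[Row], precondition: Dict[str, str],
                          consequent: Dict[str, str]) -> Tuple[float, float]:
    """Return (coverage, accuracy) of the rule IF precondition THEN consequent."""
    covered = [r for r in rows if matches(precondition, r)]
    n_covers = len(covered)
    n_correct = sum(1 for r in covered if matches(consequent, r))
    coverage = n_covers / len(rows)
    # A rule that covers nothing has no defined accuracy; report 0.0 here.
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy


# Made-up class-labeled data set D with |D| = 5.
D = [
    {"age": "youth", "student": "yes", "buy_computer": "yes"},
    {"age": "youth", "student": "yes", "buy_computer": "no"},
    {"age": "youth", "student": "no", "buy_computer": "no"},
    {"age": "senior", "student": "yes", "buy_computer": "yes"},
    {"age": "middle", "student": "no", "buy_computer": "yes"},
]
coverage, accuracy = coverage_and_accuracy(
    D,
    precondition={"age": "youth", "student": "yes"},
    consequent={"buy_computer": "yes"},
)
```

Here R covers 2 of the 5 samples and classifies 1 of those 2 correctly, so coverage(R) = 2/5 and accuracy(R) = 1/2.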

9
Traditional rules VS. Association
rules
• Traditional rules usually limit the consequent of a rule
to a single attribute.

• Association rule generators allow the consequent of
a rule to contain one or several attribute values.

10
Example

• A typical application is market basket analysis,
where the goal is to determine those items likely
to be purchased by a customer during a
shopping experience.

• The output of the market basket analysis is a set
of associations about customer-purchase behavior.

• The association rules are used to help determine
product marketing strategies.
11
Example

• Are there any interesting relationships to be found in
customer purchasing trends among these grocery store
products?
- Milk
- Cheese
- Bread
- Eggs

12
Possible associations include the following:

• If customers purchase milk they also purchase bread.
• If customers purchase bread they also purchase milk.
• If customers purchase milk and eggs they also
purchase cheese and bread.
• If customers purchase milk, cheese, and eggs they
also purchase bread.
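
The four associations listed are only a sample of what is possible. As a rough sketch (the function `candidate_rules` and the tuple representation are assumptions for illustration, not something the slides define), the code below enumerates every candidate rule over the four products by picking an item set of two or more products and splitting it into a non-empty IF part and a non-empty THEN part.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

RulePair = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (IF items, THEN items)


def candidate_rules(items: Iterable[str]) -> List[RulePair]:
    """Enumerate every split of every item set of size >= 2 into IF/THEN parts."""
    ordered = sorted(items)
    rules: List[RulePair] = []
    for size in range(2, len(ordered) + 1):
        for itemset in combinations(ordered, size):
            # Every non-empty proper subset can serve as the IF part.
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    rules.append((antecedent, consequent))
    return rules


rules = candidate_rules(["milk", "cheese", "bread", "eggs"])
n_rules = len(rules)
```

Even with four products there are 50 candidate rules, which is why measures for ranking and filtering rules are needed.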
13
Confidence
• Analyzing the first rule leads to a natural
question: “How likely is a milk purchase to
lead to a bread purchase?”

• To answer this question, a rule has an
associated confidence, which in our case is
the conditional probability of a bread
purchase given a milk purchase.

14
Rule Confidence
Given a rule of the form “If A then B”, rule
confidence is the conditional probability that B is
true when A is known to be true.

If customers purchase milk they also purchase bread

e.g. if 1000 transactions involve the purchase of
milk, and 200 of these transactions also contain a
bread purchase, then the confidence of a bread
purchase given a milk purchase is 200/1000 = 20%.
Rule Confidence
The second rule differs from the first rule in
the domain, i.e. its confidence value is for a milk
purchase given a bread purchase.

If customers purchase bread they also purchase milk.

e.g. if 2000 transactions involve the purchase of
bread, and 200 of these transactions also
contain a milk purchase, then the confidence of a
milk purchase given a bread purchase is
200/2000 = 10%.
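
As a worked check of the two examples, write n(X) for the number of transactions that contain X (this notation is an assumption introduced here, not taken from the slides). The same 200 shared transactions are divided by a different count in each direction, which is why the two rules have different confidence values:

```latex
\begin{align*}
\mathrm{confidence}(\text{milk} \Rightarrow \text{bread})
  &= \frac{n(\text{milk} \wedge \text{bread})}{n(\text{milk})}
   = \frac{200}{1000} = 20\% \\
\mathrm{confidence}(\text{bread} \Rightarrow \text{milk})
  &= \frac{n(\text{milk} \wedge \text{bread})}{n(\text{bread})}
   = \frac{200}{2000} = 10\%
\end{align*}
```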
Rule Support (minimum support)
• The support of a rule is the percentage of
instances (transactions) in the database
that contain all items listed in that
association rule. Minimum support is the
smallest such percentage a rule must reach
to be kept.
Basic Rule Measures: Support and Confidence

Support: denotes the frequency of the rule within the
transactions. A high value means that the rule
involves a large part of the database.

Confidence: denotes the percentage of transactions
containing A that also contain B. It is an estimate
of the conditional probability of B given A.
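
Both measures can be computed directly from a list of transactions. The sketch below is illustrative only: the function names `support` and `confidence` and the six baskets are made up for the example and do not come from the lecture.

```python
from typing import Iterable, List, Set


def count_containing(transactions: List[Set[str]], items: Iterable[str]) -> int:
    """Number of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= t)


def support(transactions: List[Set[str]], items: Iterable[str]) -> float:
    """Fraction of all transactions that contain every listed item."""
    return count_containing(transactions, items) / len(transactions)


def confidence(transactions: List[Set[str]], a: Iterable[str],
               b: Iterable[str]) -> float:
    """Fraction of the transactions containing A that also contain B."""
    n_a = count_containing(transactions, a)
    n_both = count_containing(transactions, set(a) | set(b))
    return n_both / n_a if n_a else 0.0


# Six made-up baskets for illustration.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "cheese"},
    {"bread", "eggs"},
    {"milk", "bread", "cheese", "eggs"},
    {"bread"},
]
support_milk_bread = support(baskets, ["milk", "bread"])
conf_milk_to_bread = confidence(baskets, ["milk"], ["bread"])
conf_bread_to_milk = confidence(baskets, ["bread"], ["milk"])
```

In this toy data milk and bread appear together in 3 of 6 baskets (support 50%), milk appears in 4 baskets and bread in 5, so the rule "milk then bread" has confidence 3/4 while "bread then milk" has confidence 3/5. Support is symmetric in the items; confidence is not.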
