Professional Documents
Culture Documents
○
ASSOCIATION RULES 309
Put simply, association rules, or <iffinity analysis, constitute a study of "what goes
with what." This method is also called market basket analysis because it originated
with the study of customer transactions databases to determine dependencies
between purchases of different items. Association rules are heavily used in retail
for learning about items that are purchased together, but they are also useful
in other fields. For example, a medical researcher might want to learn what
symptoms go with what confirmed diagnoses. In law, word combinations that
appear too often might indicate plagiarism.
In Stock.
SoleI by DeIIa_ and FlMiliad by AmIWlll.
This item cIoeo noI ship to Tlipel City, Taiwln; Repubfic of Chino. Please
chede other sellers "\\be) may ship warnationaly. Learn mora
Color: White
TABLE 14.1
{ } { }
�
=
� 3� − 2�+1 + 1
312 ASSOCIATION RULES AND COLLABORATIVE FILTERING
{ }
4
100 × 10
= 40%
� (� − 1)
ASSOCIATION RULES 313
Confidence = .
�
=� ∣ .
�
314 ASSOCIATION RULES AND COLLABORATIVE FILTERING
� =� � ,
� �
=�
�
= .
= .
{ }
{ }
ASSOCIATION RULES 315
TABLE 14.2
>
TABLE 14.3
{ }
{ }
{ }
{ }
{ }
{ }
{ }
{ }
{ }
{ }
{ }
{ }
{ }
316 ASSOCIATION RULES AND COLLABORATIVE FILTERING
{ }
{ }
{ }
{ }
{ }⇒{ }
{ }
{ }
{ }⇒{ }
{ }
{ }
{ }⇒{ }
{ }
{ }
{ }⇒{ }
{ }
{ }
{ }⇒{ }
{ }
{ }
{ }⇒{ }
{ }
{
}
ASSOCIATION RULES 317
FIGURE 14.2
318 ASSOCIATION RULES AND COLLABORATIVE FILTERING
TABLE 14.4
(�&�)
{ }
ASSOCIATION RULES 319
TABLE 14.5
(�&�)
→
→
→
→
→
→
→
→
→
320 ASSOCIATION RULES AND COLLABORATIVE FILTERING
TABLE 14.6
ASSOCIATION RULES 321
FIGURE 14.3
322 ASSOCIATION RULES AND COLLABORATIVE FILTERING
down by looking for rows that share the same support.) This does not mean that
the rules are not useful. On the contrary, it can reduce the number of itemsets
to be considered for possible action from a business perspective.
3
14.2 COLLABORATIVE FILTERING
3
This section copyright ○
c 2016 Statistics.com, Galit Shmueli, and Peter Bruce.
COLLABORATIVE FILTERING 323
�1 �2 ⋯ ��
�1 �1,1 �1,2 ⋯ �1,�
�2 �2,1 �2,2 ⋯ �2,�
⋮
�� ��,1 ��,2 ⋯ ��,�
TABLE 14.7
� � 1 , �2 , … , � � � �1 , �2 , … , ��
�� � �
� �
��,� ��
(�� , �� , ��,� )
324 ASSOCIATION RULES AND COLLABORATIVE FILTERING
MovielD
Customer ID 1 5 8 17 18 28 30 44 48
30878 4 1 3 3 4 5
124105 4
822109 5
823519 3 1 4 4 5
885013 4 5
893988 3 4 4
1248029 3 2 4 3
1503895 4
1842128 4 3
2238063 3
TABLE 14.8 SAMPLE OF RECORDS FROM THE NETFUX PRIZE CONTEST, FOR A SUBSET
OF 10 CUSTOMERS AND 9 MOVIES
This is an example where converting the rating information into a binary matrix
of rated/unrated proved to be useful.
1. Find users who are most similar to the user of interest (neighbors). This
is done by comparing the preference of our user to the preferences of
other users.
2. Considering only the iteIm that the user has not yet purchased, recom-
mend the ones that are most preferred by the user's neighbors.
'''The BellKor 2008 Solution to the NetBix Prize," Bell, R. M., Koren, Y., and Volinsky, C.,
www.netflixprize.com/assetslProgressPrize200RJ3ellKor.pd£
COLLABORATIVE FILTERING 325
(�30878 , �823519 )
(4 − 3.333)(3 − 3.4) + (3 − 3.333)(4 − 3.4) + (4 − 3.333)(5 − 3.4)
=√ √
(4 − 3.333)2 + (3 − 3.333)2 + (4 − 3.333)2 (3 − 3.4)2 + (4 − 3.4)2 + (5 − 3.4)2
= 0.6∕1.75 = 0.34.
326 ASSOCIATION RULES AND COLLABORATIVE FILTERING
4×3+3×4+4×5
(�30878 , �823519 ) = √ √
42 + 32 + 42 32 + 42 + 52
= 44∕45.277 = 0.972.
�
COLLABORATIVE FILTERING 327
�1 = 3.7 �5 = 3
(4 − 3.7)(1 − 3) + (4 − 3.7)(5 − 3)
(�1 , �5 ) = √ √ = 0.
(4 − 3.7)2 + (4 − 3.7)2 (1 − 3)2 + (5 − 3)2
328 ASSOCIATION RULES AND COLLABORATIVE FILTERING
the "long tail" of user preferences, while association rules look for the
"head." This difference ha, implications for the data needed: association
rules require data on a very large number of "basket," (transactions) in order
to find a sufficient number of baskets that contain certain combinations of
items. In contrast, collaborative filtering does not require many "baskets,"
but it does require data on as many items as possible for many users.
Also association rules operate at the basket level (our database can include
multiple transactions for each user), while collaborative filtering operates at
the user level.
Because association rules produce generic, impersonal rules (association-
based recommendations such as Amazon's "Frequently Bought Together"
display the same recommendations to all users searching for a specific item),
they can be used for setting common strategies such as product placement
in a store or sequencing of diagnostic tests in hospitals. In contrast, col-
laborative filtering generates user-specific recommendations (e.g., Amazon's
"Customers Who Bought This Item Also Bought ... ") and is therefore a
tool designed for personalization.
Transactional data vs. user data Association rules provide recommen-
dations of items based on their co-purchase with other items in many trans-
actions Ibaskets. In contrast, collaborative filtering provides recommendations
of items based on their co-purchase or co-rating by even a small number
of other users. Considering distinct baskets is useful when the same items
are purchased over and over again (e.g., in grocery shopping). Considering
distinct users is useful when each item is typically purchased/rated once (e.g.,
purchases of books, music, and movies).
Binary data and rating. data A"ociation rules treat item, a., binary data
(1 = purchase, 0 = nonpurchase), whereas collaborative filtering can operate
on either binary data or on numerical ratings.
Two or more item. In association rules, the antecedent and consequent
can each include one or more items (e.g., IF milk THEN cookies and
cornflakes). Hence a recommendation might be a bundle of the item of
interest with multiple items ("buy milk, cookies, and cornflakes and receive
10% discount"). In contrast, in collaborative filtering similarity is measured
between pairs of items or pairs of users. A recommendation will therefore
be either for a single item (the most popular item purchased by people like
you, which you haven't purchased), or for multiple single items that do not
necessarily relate to each other (the top two most popular items purcha,ed
by people like you, which you haven't purchased).
14.3 SUMMARY
Association rules (aL,o called market basket analysis) and collaborative filtering
are unsupervised methods for deducing associations between purchased items
from databases of transactions. Association rules search for generic rules about
items that are purchased together. The main advantage of this method is that it
generates dear, simple rules of the form "IF X is purchased, THEN Y is also
likely to be purchased." The method is very transparent and easy to understand.
The process of creating association rules is two-staged. First, a set of candidate
rules based on frequent itemsets is generated (the Apriori algorithm being the
most popular rule generating algorithm). Then from these candidate rules, the
rules that indicate the strongest association between items are selected. We use
the measures of support and confIdence to evaluate the uncertainty in a rule.
The user also specilies minimal support and confIdence values to be used in the
rule generation and selection process. A third measure, the Jill ratio, compares
the efficiency of the rule to detect a real association compared to a random
combination.
7 If the rule is "IF milk, THEN cookies and cornflakes," then the association rules would recommend
cookies and cornflakes to a milk purchaser, while item-based collaborative filtering would recommend
the most popular single item purchased with milk.
SUMMARY 331
One shortcoming of association rules is the profusion of rules that are gen-
erated. There is therefore a need for ways to reduce these to a small set of
useful and strong rules. An important nonautomated method to condense the
information involves examining the rules for uninformative and trivial rules as
well as for rules that share the same support. Another issue that needs to be kept
in mind is that rare combinations tend to be ignored because they do not meet
the minimum support requirement. For this reason it is better to have items that
are approximately equally frequent in the data. This can be achieved by using
higher-level hierarchies as the items. An example is to use types of books rather
than titles of individual books in deriving association rules from a database of
bookstore transactions.
Collaborative ftltering is a popular technique used in online recommendation
systems. It is based on the relationship between items formed by users who acted
similarly on an item, such as purchasing or rating an item highly. User-based
collaborative flltering operates on data on item-user combinations, calculates
the similarities between users, and provides personalized recommendations to
users. An important component for the success of collaborative fIltering is that
users provide feedback about the recommendations provided and have sufficient
information on each item. One disadvantage of collaborative fIltering methods
is that they cannot generate recommendations for new users or new items.
Also, with a huge number of users, user-based collaborative fIltering becomes
computationally challenging, and alternatives such as item-based methods or
dimension reduction are popularly used.
332 ASSOCIATION RULES AND COLLABORATIVE FILTERING
TABLE 14.9
Course
Topics.xls
TABLE 14.10
PROBLEMS 333
Cosmetics.xls
Cosmet
ics.xls
TABLE 14.11
334 ASSOCIATION RULES AND COLLABORATIVE FILTERING
RowlD Confidence" Ante<edent 11'1 Consequent leI Support for A Support fore Support for A & e Lift Katio
50 80.52 8rush6 & Concnler Nai l Pol ish & Bronzer 77 103 62 3.9OB712647
51 60.19 Nll il Poli sh & Bronzer Brushes & Concealer 103 77 62 3.908712647
46 Bl.58 Nll il Polish & Concea ler & Bron%er Brushes 76 110 62 3.7081S3971
54 56.36 8rush6 Nai l Pol ish & Concealer & Bronzer 110 76 62 3.708133971
29 76.36 Brush6 Nail Pol ish & Bronzer 110 103 B4 3.706972639
27 Bl.55 Nll il Polish & Bronzer Brushes 103 110 84 3.706972639
49 73.B1 8rush6 & Bronter Nail Pol ish & Concea ler B4 109 62 3.385757973
52 56.88 Na il Polish & Concellier Brushes & Bronzer 109 84 62 3.385757973
15 70.64 Nll il Polish & Concnler Brushes 109 110 77 3.211009174
17 70.00 8rush6 Nai l Pol ish & Concea ler 110 109 77 3.211009174
6 67.07 Blush & Na il Polish Brushes B2 110 55 3.048780488
7 SO.OO Brush6 Blush & Na il Polis" 110 82 55 3.048780488
TABLE 14.12