
Apriori Algorithm Review for Finals

SE 157B, Spring Semester 2007


Professor Lee
By
Gaurang Negandhi

Overview
 Definition of Apriori Algorithm
 Steps to perform Apriori Algorithm
 Apriori Algorithm Examples
 Pseudo Code for Apriori Algorithm
 Apriori Advantages/Disadvantages
 References

Definition of Apriori Algorithm
 In computer science and data mining,
Apriori is a classic algorithm for learning
association rules.
 Apriori is designed to operate on databases
containing transactions (for example,
collections of items bought by customers, or
records of the pages visited on a website).
 The algorithm attempts to find itemsets that
occur in at least a minimum number C
(the cutoff, or support threshold) of the
transactions.
Definition (contd.)
 Apriori uses a "bottom up" approach, where
frequent subsets are extended one item at a
time (a step known as candidate generation),
and groups of candidates are tested against
the data.
 The algorithm terminates when no further
successful extensions are found.
 Apriori uses breadth-first search and a
hash tree structure to count candidate
itemsets efficiently.
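The candidate generation step can be sketched in Python as a join followed by a prune. This is a minimal illustration, not the slides' own code: the function name `generate_candidates` is an assumption, and plain Python sets stand in for the hash tree mentioned above. The input is the set of frequent 2-itemsets from the numeric example later in these slides.

```python
from itertools import combinations

def generate_candidates(frequent, k):
    """Build size-k candidates from the frequent (k-1)-itemsets.

    Join: union pairs of frequent (k-1)-itemsets that give exactly k items.
    Prune: drop any candidate with a (k-1)-subset that is not frequent.
    """
    frequent = set(frequent)
    candidates = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            if len(union) != k:
                continue
            # Keep the candidate only if every (k-1)-subset is frequent.
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates

# Frequent 2-itemsets taken from the numeric example in these slides.
L2 = {frozenset(p) for p in [(1, 3), (2, 3), (2, 5), (3, 5)]}
C3 = generate_candidates(L2, 3)
print(sorted(sorted(c) for c in C3))
```

Here {1 2 3} and {1 3 5} are pruned because {1 2} and {1 5} are not frequent, so only {2 3 5} remains to be counted against the data.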
Steps to Perform Apriori
Algorithm

Apriori Algorithm Examples
Problem Decomposition
Transaction ID   Items Bought
1                Shoes, Shirt, Jacket
2                Shoes, Jacket
3                Shoes, Jeans
4                Shirt, Sweatshirt
If the minimum support is 50%, then {Shoes, Jacket} is the only
2-itemset that satisfies the minimum support.
Frequent Itemset   Support
{Shoes}            75%
{Shirt}            50%
{Jacket}           50%
{Shoes, Jacket}    50%
If the minimum confidence is 50%, then the two rules generated from the
2-itemset {Shoes, Jacket}, both with confidence above 50%, are:

Shoes ⇒ Jacket Support=50%, Confidence=66.7%
Jacket ⇒ Shoes Support=50%, Confidence=100%
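The support and confidence figures above can be checked with a short Python sketch. The helper names `support` and `confidence` are assumptions for illustration. Support is the fraction of transactions containing the itemset, and the confidence of A ⇒ B is support(A ∪ B) / support(A).

```python
# The four transactions from the table above.
transactions = [
    {"Shoes", "Shirt", "Jacket"},
    {"Shoes", "Jacket"},
    {"Shoes", "Jeans"},
    {"Shirt", "Sweatshirt"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"Shoes", "Jacket"}, transactions))       # 2 of 4 transactions
print(confidence({"Shoes"}, {"Jacket"}, transactions))  # 2 of the 3 Shoes transactions
print(confidence({"Jacket"}, {"Shoes"}, transactions))  # 2 of the 2 Jacket transactions
```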
The Apriori Algorithm: Example
Min support = 50%

Database D
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D to count C1
C1 itemset   sup.
{1}          2
{2}          3
{3}          3
{4}          1
{5}          3

L1 itemset   sup.
{1}          2
{2}          3
{3}          3
{5}          3

C2 itemset (generated from L1)
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D to count C2
C2 itemset   sup
{1 2}        1
{1 3}        2
{1 5}        1
{2 3}        2
{2 5}        3
{3 5}        2

L2 itemset   sup
{1 3}        2
{2 3}        2
{2 5}        3
{3 5}        2

C3 itemset (generated from L2)
{2 3 5}

Scan D to count C3
L3 itemset   sup
{2 3 5}      2

Pseudo Code for Apriori
Algorithm
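A minimal Python sketch of the level-wise procedure described in these slides: generate candidates, scan the database to count them, keep those that meet the minimum support, and repeat until no candidates survive. The function name `apriori` and the choice to pass `min_support` as a fraction of the transaction count are assumptions for illustration. Plain dictionaries stand in for the hash tree. The input is Database D from the numeric example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset meeting min_support.

    min_support is a fraction of the number of transactions (an assumption).
    """
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    def count(candidates):
        # One pass over the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: every single item is a candidate.
    level = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join frequent (k-1)-itemsets, then prune any candidate
        # that has a (k-1)-subset which is not frequent.
        candidates = {
            a | b for a in prev for b in prev
            if len(a | b) == k
            and all(frozenset(s) in prev for s in combinations(a | b, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent

# Database D from the example, with a 50% minimum support.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, 0.5)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

Each call to `count` corresponds to one "Scan D" step in the example, which is why the number of database scans grows with the size of the largest frequent itemset.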

Apriori
Advantages/Disadvantages
 Advantages
 Uses the large (frequent) itemset property: every subset of a frequent itemset is also frequent
 Easily parallelized
 Easy to implement
 Disadvantages
 Assumes the transaction database is memory
resident.
 Requires many database scans.

Summary
 Association rules are a widely applied data mining
approach.
 Association Rules are derived from frequent
itemsets.
 The Apriori algorithm is an efficient algorithm for
finding all frequent itemsets.
 The Apriori algorithm implements level-wise search
using the frequent itemset property.
 The Apriori algorithm can be further optimized.
 There are many measures for association rules.

References
 Agrawal R, Imielinski T, Swami AN. "Mining Association Rules
between Sets of Items in Large Databases." SIGMOD. June
1993, 22 (2):207-16.
 Agrawal R, Srikant R. "Fast Algorithms for Mining Association
Rules." VLDB. Sep 12-15 1994, Chile, 487-99,
ISBN 1-55860-153-8.
 Mannila H, Toivonen H, Verkamo AI. "Efficient algorithms for
discovering association rules." AAAI Workshop on Knowledge
Discovery in Databases (SIGKDD). July 1994, Seattle, 181-92.
 Implementation of the algorithm in C#
 Retrieved from "http://en.wikipedia.org/wiki/Apriori_algorithm"
