Apriori Algorithm Review

for Finals.

SE 157B, Spring Semester 2007


Professor Lee
By
Gaurang Negandhi

Overview
Definition of Apriori Algorithm
Steps to perform Apriori Algorithm
Apriori Algorithm Examples
Pseudo Code for Apriori Algorithm
Apriori Advantages/Disadvantages
References

Definition of Apriori Algorithm
In computer science and data mining,
Apriori is a classic algorithm for learning
association rules.
Apriori is designed to operate on databases
containing transactions (for example,
collections of items bought by customers,
or details of website visits).
The algorithm attempts to find itemsets
that appear in at least a minimum
number C (the cutoff, or support
threshold) of the transactions.

Definition (contd.)
Apriori uses a "bottom up" approach,
where frequent subsets are extended
one item at a time (a step known as
candidate generation), and groups of
candidates are tested against the data.
The algorithm terminates when no
further successful extensions are found.
Apriori uses breadth-first search and a
hash tree structure to count candidate
item sets efficiently.
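
The candidate generation step described above can be sketched in Python. This is a minimal illustration, not the slides' own code: the function name generate_candidates is hypothetical, and it uses plain Python sets rather than the hash tree mentioned above. The input is the set of frequent 2-itemsets from the second worked example in these slides.

```python
from itertools import combinations

def generate_candidates(frequent_prev, k):
    """Build size-k candidates from the frequent (k-1)-itemsets."""
    frequent_prev = set(frequent_prev)
    # Join step: union pairs of frequent (k-1)-itemsets whose union has k items.
    joined = {a | b for a in frequent_prev for b in frequent_prev if len(a | b) == k}
    # Prune step: keep a candidate only if every (k-1)-subset is itself frequent.
    return {
        c for c in joined
        if all(frozenset(s) in frequent_prev for s in combinations(c, k - 1))
    }

# Frequent 2-itemsets L2 taken from the worked example with database D.
l2 = {frozenset(p) for p in [(1, 3), (2, 3), (2, 5), (3, 5)]}
c3 = generate_candidates(l2, 3)
print(c3)
```

The join also produces {1 2 3} and {1 3 5}, but the prune step discards them because {1 2} and {1 5} are not frequent, so only {2 3 5} has to be counted against the data.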
Steps to Perform Apriori Algorithm

Apriori Algorithm Examples
Problem 1 Decomposition

Transaction ID   Items Bought
1                Shoes, Shirt, Jacket
2                Shoes, Jacket
3                Shoes, Jeans
4                Shirt, Sweatshirt

If the minimum support is 50%, then {Shoes, Jacket} is the only
2-itemset that satisfies the minimum support.

Frequent Itemset   Support
{Shoes}            75%
{Shirt}            50%
{Jacket}           50%
{Shoes, Jacket}    50%

If the minimum confidence is 50%, then the only two rules generated from this
2-itemset that have confidence greater than 50% are:

Shoes → Jacket   Support=50%, Confidence=66%
Jacket → Shoes   Support=50%, Confidence=100%
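
The support and confidence figures in this example can be checked with a short Python sketch. The helper names support and confidence are illustrative assumptions, not part of the slides. Support is taken as the fraction of transactions containing an itemset, and confidence of a rule X → Y as support(X ∪ Y) / support(X).

```python
# The four transactions from the example above.
transactions = [
    {"Shoes", "Shirt", "Jacket"},
    {"Shoes", "Jacket"},
    {"Shoes", "Jeans"},
    {"Shirt", "Sweatshirt"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent union consequent) divided by support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

shoes_jacket = support({"Shoes", "Jacket"})
shoes_to_jacket = confidence({"Shoes"}, {"Jacket"})
jacket_to_shoes = confidence({"Jacket"}, {"Shoes"})
print(shoes_jacket, shoes_to_jacket, jacket_to_shoes)
```

Shoes appears in three of the four transactions and {Shoes, Jacket} in two, so the confidence of Shoes → Jacket is 2/3 (listed as 66% above), while every transaction containing Jacket also contains Shoes, giving 100%.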
The Apriori Algorithm Example
Min support = 50%

Database D
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D: C1
itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1
itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2 (candidates generated from L1)
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D: C2
itemset   sup
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2
itemset   sup
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3 (candidates generated from L2)
itemset
{2 3 5}

Scan D: L3
itemset   sup
{2 3 5}   2
Pseudo Code for Apriori Algorithm
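
As a hedged illustration of the level-wise loop, here is a minimal Python sketch. It is an assumption-laden reconstruction based on the definition earlier in these slides, not the pseudo code originally shown here. The function name apriori and its parameters are hypothetical, and it counts candidates with a simple nested loop instead of a hash tree.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    min_support is a fraction of the number of transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)

    def count(candidates):
        # One scan of the data per level: count each candidate's occurrences.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: every single item is a candidate.
    level = count({frozenset([i]) for t in transactions for i in t})
    frequent = {}
    k = 2
    while level:  # stop when no extension is frequent
        frequent.update(level)
        prev = set(level)
        # Join frequent (k-1)-itemsets, then prune by the subset property.
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)
        k += 1
    return frequent

# Database D from the worked example, with minimum support 50%.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, 0.5)
for itemset, sup in sorted(result.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), sup)
```

Run on database D, the sketch scans the data once per level and reproduces L1, L2 and L3 from the worked example, ending when no 4-item candidate can be formed.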

Apriori Advantages/Disadvantages
Advantages
Uses large itemset property
Easily parallelized
Easy to implement
Disadvantages
Assumes transaction database is
memory resident.
Requires many database scans.

Summary
Association rules are a widely applied data
mining approach.
Association rules are derived from frequent
itemsets.
The Apriori algorithm is an efficient algorithm
for finding all frequent itemsets.
The Apriori algorithm implements a level-wise
search using the frequent itemset property.
The Apriori algorithm can be further
optimized.
There are many measures for association rules.

References
Agrawal R, Imielinski T, Swami AN. "Mining Association
Rules between Sets of Items in Large Databases."
SIGMOD. June 1993, 22(2):207-16.
Agrawal R, Srikant R. "Fast Algorithms for Mining
Association Rules." VLDB. Sep 12-15 1994, Chile, 487-99,
ISBN 1-55860-153-8.
Mannila H, Toivonen H, Verkamo AI. "Efficient algorithms
for discovering association rules." AAAI Workshop on
Knowledge Discovery in Databases (SIGKDD). July 1994,
Seattle, 181-92.
Implementation of the algorithm in C#
Retrieved from
"http://en.wikipedia.org/wiki/Apriori_algorithm"
