
Data Mining

Frequent Patterns

By:
Muhammad Haleem
Overview
Road Map
Frequent pattern
Association analysis
Single-dimensional
Multi-dimensional
Market basket analysis
Apriori Algorithm
Imagine that you are a sales manager at AllElectronics, and you are talking to a customer who recently bought a PC and a digital camera from the store.
What should you recommend to the customer next?
Information about which products your customers frequently purchase after buying a PC and then a digital camera would be very helpful in making your recommendation.
Frequent patterns and association rules are the
knowledge that you want to mine in such a scenario.
Frequent pattern
Frequent patterns are patterns (e.g., itemsets, subsequences, or substructures) that appear frequently in a data set. For example, a set of items, such as milk and bread, that appears frequently together in a transaction data set is a frequent itemset.
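To make "appears frequently" concrete, here is a minimal Python sketch that counts how many transactions contain a given itemset (its support count) and compares that count with a minimum support count. The transactions, the helper name `support_count`, and the threshold of 2 are hypothetical, chosen only for illustration.

```python
# Hypothetical transactions; each transaction is a set of items.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

MIN_SUPPORT_COUNT = 2  # hypothetical threshold

count = support_count({"milk", "bread"}, transactions)
print(count, count >= MIN_SUPPORT_COUNT)  # prints "2 True"
```

With this threshold, {milk, bread} would count as a frequent itemset in this toy data.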
A subsequence, such as buying first a PC, then a digital camera, and then a memory card, is a (frequent) sequential pattern if it occurs frequently in a shopping history database.
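A sequential pattern differs from an itemset in that order matters. The sketch below, assuming a small set of hypothetical time-ordered shopping histories, counts how many customers bought a PC, then a digital camera, then a memory card, allowing other purchases in between. The data and the function name `contains_subsequence` are illustrative assumptions.

```python
# Hypothetical shopping histories: one time-ordered list of purchases per customer.
histories = {
    "c1": ["PC", "digital camera", "memory card"],
    "c2": ["PC", "printer", "digital camera", "tripod", "memory card"],
    "c3": ["digital camera", "PC", "memory card"],
}

def contains_subsequence(history, pattern):
    """True if the items of `pattern` occur in `history` in the same order (gaps allowed)."""
    it = iter(history)
    # `item in it` consumes the iterator up to and including the match,
    # so each later item must appear after the previous one.
    return all(item in it for item in pattern)

pattern = ["PC", "digital camera", "memory card"]
support = sum(contains_subsequence(h, pattern) for h in histories.values())
print(support)  # prints 2 (c1 and c2; c3 bought the camera before the PC)
```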
A substructure can refer to different structural forms, such as subgraphs and subtrees, which may be combined with itemsets or subsequences.
If a substructure occurs frequently, it is called a (frequent) structured pattern.
Finding frequent patterns plays an essential role in mining interesting relationships among data.
Moreover, it helps in data classification, clustering,
and other data mining tasks.
Thus, frequent pattern mining has become an
important data mining task and a focused theme in
data mining research.
Association analysis
Association analysis derives two kinds of association rules:
Single-dimensional association rules: rules that involve a single attribute (predicate).
Multidimensional association rules: rules that involve multiple attributes (predicates).
Single-dimensional association rule
Suppose that, as a marketing manager at AllElectronics, you want to know which items are frequently purchased together by customers.
An example of such a rule, mined from the AllElectronics transactional database, is
buys(X, “computer”) ⇒ buys(X, “software”) [support = 7%, confidence = 50%]
where X is a variable representing a customer.
A confidence, or certainty, of 50% means that if a customer buys a computer, there is a 50% chance that the customer will buy software as well.
A support of 7% means that 7% of all the transactions under analysis show that computer and software are purchased together.
This association rule involves a single attribute or predicate
(i.e., buys) that repeats. Association rules that contain a single
predicate are referred to as single-dimensional association
rules.
Dropping the predicate notation, the rule can be written simply as “computer ⇒ software [7%, 50%]”.
Multidimensional association rule
A data mining system may find association rules like
age(X, “20..29”) ∧ income(X, “40K..49K”) ⇒ buys(X, “laptop”) [support = 2%, confidence = 60%]
The rule indicates that, of the AllElectronics customers under study, 2% are 20 to 29 years old with an annual income of $40,000 to $49,000 and have purchased a laptop (computer) at AllElectronics.
There is a 60% probability that a customer in this age and income group will purchase a laptop at AllElectronics.
Note that this is an association involving more than
one attribute or predicate (i.e., age, income, and buys).
Based on the terminology used in multidimensional
databases, where each attribute is referred to as a
dimension, the above rule is referred to as a
multidimensional association rule.
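In a multidimensional rule, each predicate tests a different attribute of the same customer. The sketch below evaluates the age/income/laptop rule over a few hypothetical customer records; the record layout and the reading of "40K..49K" as $40,000 to $49,000 inclusive (as stated in the text) are assumptions.

```python
# Hypothetical customer records: one dict of attribute values per customer.
customers = [
    {"age": 25, "income": 45_000, "buys": {"laptop"}},
    {"age": 27, "income": 42_000, "buys": {"printer"}},
    {"age": 22, "income": 48_000, "buys": {"laptop", "software"}},
    {"age": 35, "income": 45_000, "buys": {"laptop"}},
    {"age": 29, "income": 60_000, "buys": {"laptop"}},
]

def antecedent(c):
    """age(X, "20..29") ∧ income(X, "40K..49K"), ranges taken as inclusive."""
    return 20 <= c["age"] <= 29 and 40_000 <= c["income"] <= 49_000

def consequent(c):
    """buys(X, "laptop")"""
    return "laptop" in c["buys"]

n_ante = sum(1 for c in customers if antecedent(c))
n_both = sum(1 for c in customers if antecedent(c) and consequent(c))
support = n_both / len(customers)  # fraction of all customers matching every predicate
confidence = n_both / n_ante       # fraction of the age/income group that buys a laptop
print(f"support={support:.0%}, confidence={confidence:.0%}")
# prints "support=40%, confidence=67%"
```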
Market basket analysis
A typical example of frequent item set mining is market
basket analysis.
This process analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets”.
The discovery of these associations can help retailers
develop marketing strategies by gaining insight into which
items are frequently purchased together by customers.
For instance, if customers are buying milk, how likely are they to also buy bread (and what kind of bread) on the same trip?
Suppose, as manager of an AllElectronics branch, you would like to learn more about the buying habits of your customers.
Specifically, you wonder, “Which groups or sets of items are customers likely to purchase on a given trip to the store?”
To answer your question, market basket analysis may be performed on the retail data of customer transactions at your store.
You can then use the results to plan marketing or advertising strategies, or in the design of a new catalog.
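One simple way to start answering "which items are bought together?" is to count how often each pair of items co-occurs in the same basket and keep the pairs that reach a minimum count. This brute-force sketch uses hypothetical baskets and a hypothetical threshold of 2; it looks only at pairs, whereas the Apriori algorithm extends the idea to itemsets of any size.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one list of items per customer transaction.
baskets = [
    ["computer", "antivirus software", "printer"],
    ["computer", "antivirus software"],
    ["printer", "paper"],
    ["computer", "printer"],
    ["computer", "antivirus software", "paper"],
]

pair_counts = Counter()
for basket in baskets:
    # set() ignores repeated items in a basket; sorted() gives each pair one canonical order.
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

MIN_COUNT = 2  # hypothetical threshold
frequent_pairs = {p: n for p, n in pair_counts.items() if n >= MIN_COUNT}
for pair, n in sorted(frequent_pairs.items(), key=lambda kv: -kv[1]):
    print(pair, n)
# prints ('antivirus software', 'computer') 3, then ('computer', 'printer') 2
```

The number of pairs grows quadratically with the number of distinct items, which is one reason larger itemsets call for the candidate pruning used by Apriori.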
Market basket analysis may also help you design different store layouts. In one strategy, items that are frequently purchased together can be placed in proximity to further encourage the combined sale of such items.
If customers who purchase computers also tend to buy
antivirus software at the same time, then placing the
hardware display close to the software display may help
increase the sales of both items.
In an alternative strategy, placing hardware and software
at opposite ends of the store may entice customers who
purchase such items to pick up other items along the way.
For instance, after deciding on an expensive computer,
a customer may observe security systems for sale
while heading toward the software display to purchase
antivirus software, and may decide to purchase a home
security system as well.
Market basket analysis can also help retailers plan
which items to put on sale at reduced prices. If
customers tend to purchase computers and printers
together, then having a sale on printers may encourage
the sale of printers as well as computers.
The Apriori Algorithm—An Example
Supmin = 2 (minimum support count)

Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan: C1 (candidate 1-itemsets with support counts)
Itemset   sup
{A}       2
{B}       3
{C}       3
{D}       1
{E}       3

L1 (frequent 1-itemsets, sup ≥ 2)
Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3

C2 (candidate 2-itemsets generated from L1)
{A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan: C2 with support counts
Itemset   sup
{A, B}    1
{A, C}    2
{A, E}    1
{B, C}    2
{B, E}    3
{C, E}    2

L2 (frequent 2-itemsets, sup ≥ 2)
Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2

C3 (candidate 3-itemsets)
{B, C, E}

3rd scan: L3
Itemset     sup
{B, C, E}   2
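The example keeps only one candidate in C3 because of Apriori's join and prune steps: two frequent k-itemsets are joined when they share their first k-1 items (here, their first item), and a joined candidate is discarded if any of its k-subsets is not frequent. A minimal sketch of that candidate-generation step, assuming itemsets are represented as sorted tuples; the function name `apriori_gen` is a label chosen here.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Generate candidate (k+1)-itemsets from frequent k-itemsets (join + prune).

    Itemsets are sorted tuples, so two itemsets are joined when they agree
    on all but their last item.
    """
    frequent_k = sorted(frequent_k)
    frequent_set = set(frequent_k)
    candidates = []
    for i, a in enumerate(frequent_k):
        for b in frequent_k[i + 1:]:
            if a[:-1] == b[:-1]:              # join step: same (k-1)-prefix
                cand = a + (b[-1],)
                # prune step: every k-subset of the candidate must be frequent
                if all(sub in frequent_set for sub in combinations(cand, len(a))):
                    candidates.append(cand)
    return candidates

L2 = [("A", "C"), ("B", "C"), ("B", "E"), ("C", "E")]
print(apriori_gen(L2))  # prints [('B', 'C', 'E')]
```

Applied to L1 from the example, the same function returns all six 2-itemsets listed in C2, because every 1-subset of those pairs is frequent.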
The Apriori Algorithm—Example from your book
(frequent 2-itemset with minimum support=2)
The Apriori Algorithm (Pseudo-code)
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;
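The pseudocode above can be turned into a short, runnable Python sketch. It is not an optimized implementation: candidates are generated by taking unions of pairs of frequent k-itemsets (a simpler, less efficient join than the prefix-based one), followed by the usual pruning of candidates that have an infrequent k-subset, and support is counted with one pass over the transactions per level. Run on the TDB from the example with Supmin = 2, it reproduces L1, L2, and L3.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets (itemsets are frozensets)."""
    transactions = [frozenset(t) for t in transactions]

    # L1 = {frequent items}
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(L)

    k = 1
    while L:  # for (k = 1; Lk != ∅; k++)
        # Ck+1: unions of two frequent k-itemsets that have size k+1,
        # pruned if any k-subset is not in Lk.
        prev = list(L)
        C = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k + 1 and all(
                    frozenset(sub) in L for sub in combinations(union, k)
                ):
                    C.add(union)
        # One scan: increment the count of every candidate contained in t.
        counts = dict.fromkeys(C, 0)
        for t in transactions:
            for c in C:
                if c <= t:
                    counts[c] += 1
        # Lk+1 = candidates in Ck+1 with min_support
        L = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(L)
        k += 1
    return frequent

# The TDB from the example: Tid -> items (each letter is one item).
TDB = {10: "ACD", 20: "BCE", 30: "ABCE", 40: "BE"}
result = apriori(TDB.values(), min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The printed itemsets match the L1, L2, and L3 tables of the example: four frequent 1-itemsets, four frequent 2-itemsets, and {B, C, E} with support count 2.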
