Example:
• For instance, if customers are buying milk, how likely are they to also buy bread on the same trip?
• Such information can lead to increased sales by helping retailers do selective
marketing and plan their shelf space.
Association rules:
• Buys(X, ”Milk”) ⇒ Buys(X, ”Bread”) [support = 75%, confidence = 100%]
• support(A⇒B) = P(A∪B)
• confidence(A⇒B) = P(B|A)
• If the relative support of an itemset I satisfies a prespecified minimum support threshold, then I is a frequent itemset.
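As a quick sketch, both measures can be computed directly from their definitions. The transaction list below is a hypothetical toy database, not data from these notes:

```python
# Hypothetical toy transaction database (not from the notes).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"milk", "eggs"},
    {"milk", "bread", "eggs"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(antecedent, consequent, db):
    """confidence(A => B) = P(B|A) = support(A and B) / support(A)."""
    return support(antecedent | consequent, db) / support(antecedent, db)

print(support({"milk", "bread"}, transactions))       # 3 of 4 -> 0.75
print(confidence({"milk"}, {"bread"}, transactions))  # 0.75 / 1.0 -> 0.75
```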
In general, association rule mining can be viewed as a two-step process:
1. Find all frequent itemsets: By definition, each of these itemsets will occur at least as
frequently as a predetermined minimum support count, min support.
2. Generate strong association rules from the frequent itemsets: By definition, these
rules must satisfy minimum support and minimum confidence.
3c) Compute all the frequent item sets using Apriori algorithm for the given data where
min-sup = 2.
Ans:
Therefore the itemsets in L1, L2, and L3 are the frequent itemsets.
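Since the question's transaction table is not reproduced in this extract, the level-wise search can be sketched on a hypothetical database with an absolute min_sup of 2:

```python
from itertools import combinations

# Hypothetical transaction database (stand-in for the missing table in 3c).
transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"},
    {"A", "B", "D"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]

def apriori(db, min_sup):
    """Return {frozenset: count} for all frequent itemsets (absolute min_sup)."""
    items = {i for t in db for i in t}
    level = {frozenset([i]) for i in items}   # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Count support of each candidate by scanning the database.
        counts = {c: sum(c <= t for t in db) for c in level}
        lk = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(lk)
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates,
        # prune step: keep only candidates whose k-subsets are all frequent.
        k += 1
        level = set()
        for a, b in combinations(list(lk), 2):
            cand = a | b
            if len(cand) == k and all(frozenset(s) in lk
                                      for s in combinations(cand, k - 1)):
                level.add(cand)
    return frequent

freq = apriori(transactions, 2)
print(sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))))
```

On this data the search terminates after L3 = {A,B,C} and {A,B,E}, each with count 2.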
4a) Define Support of an association rule.
Ans: Support measures how frequently an itemset appears in the dataset.
support(A⇒B) = P(A∪B)
• Buys(X, ”Milk”) ⇒ Buys(X, ”Bread”) [support = 75%, confidence = 100%]
4d) Explain in detail about multilevel association rules.
Ans: Association rules generated from mining data at multiple levels of abstraction are called
multiple-level or multilevel association rules. Multilevel association rules can be mined
efficiently using concept hierarchies under a support-confidence framework.
i) Using uniform minimum support for all levels (referred to as uniform
support):
• The same minimum support threshold is used when mining at each level of abstraction.
• For example, a minimum support threshold of 5% is used throughout (e.g., for mining
from “computer” down to “laptop computer”). Both “computer” and “laptop computer”
are found to be frequent, while “desktop computer” is not.
Ans: The Apriori algorithm is used to identify the frequent itemsets in a dataset and to
generate association rules based on those itemsets.
5b) Give a note on Closed Frequent Itemsets.
Ans: Closed Frequent Itemset: An itemset is closed if none of its immediate supersets has
the same support as that of the itemset.
Ex: support({A,B}) = 3, support({A,C}) = 3, support({A}) = 4. Let us consider {A,B} and {A,C},
the immediate supersets of {A}; both have a lower support count than {A}. Therefore {A} is a closed itemset.
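A minimal sketch of this check, with supports hard-coded to match the example (plus one hypothetical extra itemset, {A,B,C}, added to show a non-closed case):

```python
# Supports from the example above, plus a hypothetical superset {A,B,C} = 3.
supports = {
    frozenset({"A"}): 4,
    frozenset({"A", "B"}): 3,
    frozenset({"A", "C"}): 3,
    frozenset({"A", "B", "C"}): 3,  # hypothetical, to illustrate non-closure
}

def is_closed(itemset, supports):
    """Closed iff no proper superset has the same support.
    (A superset can never have higher support than its subset.)"""
    s = supports[itemset]
    return all(sup < s for other, sup in supports.items() if itemset < other)

print(is_closed(frozenset({"A"}), supports))       # True: supersets all have lower support
print(is_closed(frozenset({"A", "B"}), supports))  # False: {A,B,C} has the same support
```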
Let us consider a graph h with an edge set E(h) and a vertex set V(h), and a subgraph
isomorphism from h to h’ such that h is a subgraph of h’. A label function is a function
that maps each edge or vertex to a label. Let F = {H1, H2, H3, …, Hn} be a labeled-graph
dataset, and let s(h) denote the support of h, that is, the percentage of graphs in F in
which h occurs as a subgraph. A frequent graph is one whose support is no less than the
minimum support threshold, denoted min_support.
Steps in finding frequent subgraphs:
There are two steps in finding frequent subgraphs.
The first step is to generate frequent substructure candidates.
The second step is to compute the support of each candidate. The first step must be
optimized and enhanced because the second step requires subgraph isomorphism testing,
which is NP-complete and therefore computationally expensive.
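The support-counting step can be illustrated with a deliberate simplification: each graph below is represented as a set of labeled edges, and "h occurs in H" is approximated by edge-set containment. Real frequent-subgraph mining requires subgraph isomorphism testing, which this hypothetical sketch sidesteps:

```python
# Hypothetical labeled-graph dataset: each graph is a set of labeled edges,
# an edge being a (label_u, label_v) pair. This is a simplification; true
# subgraph occurrence needs isomorphism testing, which is NP-complete.
F = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("a", "b")},
    {("b", "c"), ("c", "d")},
]

def support(h, dataset):
    """Percentage of graphs in the dataset whose edge set contains h."""
    return sum(h <= H for H in dataset) / len(dataset)

print(support({("a", "b")}, F))              # 3 of 4 graphs -> 0.75
print(support({("a", "b"), ("b", "c")}, F))  # 2 of 4 graphs -> 0.5
```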
Ans: Sequential pattern mining is the mining of frequently occurring ordered events or
subsequences as patterns. An example of a sequential pattern: users who purchase a Canon
digital camera are likely to also purchase an HP color printer within a month.
For retail data, sequential patterns are useful for shelf placement and promotions.
This industry, as well as telecommunications and other businesses, can also use sequential
patterns for targeted marketing, customer retention, and many other tasks.
There are several areas in which sequential patterns can be used such as Web access pattern
analysis, weather prediction, production processes, and web intrusion detection.
Given a set of sequences, where each sequence consists of an ordered list of events (or
elements) and each event consists of a set of items, and given a user-specified minimum
support threshold min_sup, sequential pattern mining discovers all frequent subsequences,
i.e., the subsequences whose occurrence frequency in the set of sequences is no less than min_sup.
2. 200 <(ad)c(bcd)(abe)>
3. 300 <(ef)(ab)(def)cb>
4. 400 <eg(adf)cbc>
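The containment test behind this definition can be sketched on a hypothetical sequence database (each element of a sequence is a set of items; a candidate is contained if its elements map, in order, onto elements of the data sequence as subsets):

```python
# Hypothetical sequence database; each sequence is a list of item sets.
db = [
    [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
]

def is_subsequence(sub, seq):
    """True if the elements of `sub` match, in order, elements of `seq`,
    each candidate element being a subset of the matched element.
    Greedy left-to-right matching suffices for this containment test."""
    i = 0
    for element in seq:
        if i < len(sub) and sub[i] <= element:
            i += 1
    return i == len(sub)

def support(sub, db):
    """Number of data sequences that contain the candidate subsequence."""
    return sum(is_subsequence(sub, s) for s in db)

print(support([{"a"}, {"b"}], db))  # <a b> occurs in all 4 sequences
```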