• If the bin width is larger, for example a bin width of 24 for age, the
following rules will emerge as interesting
Approach:
– Withhold the target variable from the rest of the data
– Apply existing frequent itemset generation on the rest of the data
– For each frequent itemset, compute the descriptive statistics for
the corresponding target variable
• Frequent itemset becomes a rule by introducing the target
variable as rule consequent
– Apply statistical test to determine interestingness of the rule
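A minimal sketch of the first three steps of this approach. The records, item names, and the `target_stats` helper are hypothetical, and a brute-force enumeration stands in for a real frequent itemset algorithm such as Apriori:

```python
from itertools import combinations
from statistics import mean, stdev

# Hypothetical toy records: (categorical items, withheld numeric target).
records = [
    ({"Browser=Mozilla", "Buy=Yes"}, 22),
    ({"Browser=Mozilla", "Buy=Yes"}, 24),
    ({"Browser=Mozilla", "Buy=Yes"}, 23),
    ({"Browser=Mozilla", "Buy=No"}, 35),
    ({"Browser=IE", "Buy=Yes"}, 31),
    ({"Browser=IE", "Buy=No"}, 40),
]

def target_stats(records, min_count=2, max_size=2):
    """Map each frequent itemset to (count, mean, std) of the target."""
    all_items = sorted(set().union(*(items for items, _ in records)))
    stats = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(all_items, size):
            itemset = frozenset(candidate)
            # Target values of the records that contain the whole itemset.
            targets = [t for items, t in records if itemset <= items]
            if len(targets) >= min_count:
                spread = stdev(targets) if len(targets) > 1 else 0.0
                stats[itemset] = (len(targets), mean(targets), spread)
    return stats

stats = target_stats(records)
for itemset, (count, mu, sigma) in stats.items():
    print(sorted(itemset), count, round(mu, 2), round(sigma, 2))
```

Each entry in `stats` corresponds to a candidate rule whose consequent is the summary of the target variable. The statistical test in the last step then decides whether that rule is interesting.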
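\[
Z = \frac{\mu' - \mu - \Delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
  = \frac{30 - 23 - 5}{\sqrt{\dfrac{3.5^2}{50} + \dfrac{6.5^2}{250}}} = 3.11
\]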
– For 1-sided test at 95% confidence level, critical Z-value for rejecting
null hypothesis is 1.64.
– Since Z is greater than 1.64, r is an interesting rule
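The arithmetic behind this test can be checked with a short sketch. The function and parameter names are assumptions. The surviving text does not say which standard deviation and sample size belong to which mean, but the variance term is symmetric in the two samples, so the result does not depend on that choice:

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(mean_a, mean_b, delta, s1, n1, s2, n2):
    """Z for testing whether mean_a exceeds mean_b by more than delta."""
    return (mean_a - mean_b - delta) / sqrt(s1**2 / n1 + s2**2 / n2)

# Values quoted in the example: means 30 and 23, threshold 5,
# standard deviations 3.5 and 6.5, sample sizes 50 and 250.
z = z_statistic(30, 23, 5, s1=3.5, n1=50, s2=6.5, n2=250)

# One-sided critical value at the 95% confidence level.
critical = NormalDist().inv_cdf(0.95)

print(round(z, 2), round(critical, 2), z > critical)
```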
TID W1 W2 W3 W4 W5
D1 2 2 0 0 1
D2 0 0 1 2 2
D3 2 3 0 0 0
D4 0 0 1 0 1
D5 1 1 1 0 2
• Data contains only continuous attributes of the same “type”
– e.g., frequency of words in a document
Word counts:
TID W1 W2 W3 W4 W5
D1 2 2 0 0 1
D2 0 0 1 2 2
D3 2 3 0 0 0
D4 0 0 1 0 1
D5 1 1 1 0 2
Normalize (divide each count by its column total, so that each word's column sums to 1):
TID W1 W2 W3 W4 W5
D1 0.40 0.33 0.00 0.00 0.17
D2 0.00 0.00 0.33 1.00 0.33
D3 0.40 0.50 0.00 0.00 0.00
D4 0.00 0.00 0.33 0.00 0.17
D5 0.20 0.17 0.33 0.00 0.33
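The normalization step can be reproduced with a short sketch. The `normalize_columns` helper is a hypothetical name; it divides each count by the total for that word across all documents, which matches the values in the table:

```python
# Word counts per document, in the order W1..W5.
counts = {
    "D1": [2, 2, 0, 0, 1],
    "D2": [0, 0, 1, 2, 2],
    "D3": [2, 3, 0, 0, 0],
    "D4": [0, 0, 1, 0, 1],
    "D5": [1, 1, 1, 0, 2],
}

def normalize_columns(counts):
    """Divide every count by its column (word) total."""
    n_words = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_words)]
    return {
        doc: [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_words)]
        for doc, row in counts.items()
    }

normalized = normalize_columns(counts)
for doc, row in normalized.items():
    print(doc, [f"{value:.2f}" for value in row])
```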
BITS Pilani, Hyderabad Campus
Compute word associations
TID W1 W2 W3 W4 W5
D1 0.40 0.33 0.00 0.00 0.17
D2 0.00 0.00 0.33 1.00 0.33
D3 0.40 0.50 0.00 0.00 0.00
D4 0.00 0.00 0.33 0.00 0.17
D5 0.20 0.17 0.33 0.00 0.33
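Support(W1, W2), averaging the normalized frequencies in each document = (0.40+0.33)/2 + 0 + (0.40+0.50)/2 + 0 + (0.20+0.17)/2 = 1
Because each word's column sums to 1, averaging gives a support of 1 for every set of words.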
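A quick check of the averaging-based support, as a sketch. The `average_support` helper is a hypothetical name, and exact fractions are used instead of the rounded table values so the sums come out exactly:

```python
from fractions import Fraction
from itertools import combinations

# Word counts per document, in the order W1..W5.
counts = {
    "D1": [2, 2, 0, 0, 1],
    "D2": [0, 0, 1, 2, 2],
    "D3": [2, 3, 0, 0, 0],
    "D4": [0, 0, 1, 0, 1],
    "D5": [1, 1, 1, 0, 2],
}

# Column totals, then exact normalized frequencies (each column sums to 1).
totals = [sum(row[j] for row in counts.values()) for j in range(5)]
normalized = {
    doc: [Fraction(c, t) for c, t in zip(row, totals)]
    for doc, row in counts.items()
}

def average_support(word_indices):
    """Sum over documents of the mean normalized frequency of the words."""
    return sum(
        sum(row[j] for j in word_indices) / len(word_indices)
        for row in normalized.values()
    )

# Support of every pair of words under the averaging definition.
pair_supports = {pair: average_support(pair) for pair in combinations(range(5), 2)}
print(set(pair_supports.values()))
```

Every pair ends up with the same support, so an averaging-based definition cannot separate frequent word sets from infrequent ones.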
TID W1 W2 W3 W4 W5
D1 0.40 0.33 0.00 0.00 0.17
D2 0.00 0.00 0.33 1.00 0.33
D3 0.40 0.50 0.00 0.00 0.00
D4 0.00 0.00 0.33 0.00 0.17
D5 0.20 0.17 0.33 0.00 0.33
Example:
Sup(W1, W2, W3) = 0 + 0 + 0 + 0 + 0.17 = 0.17
Example:
Sup(W1) = 0.4 + 0 + 0.4 + 0 + 0.2 = 1
Sup(W1, W2) = 0.33 + 0 + 0.4 + 0 + 0.17 = 0.9
Sup(W1, W2, W3) = 0 + 0 + 0 + 0 + 0.17 = 0.17
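The three values in this example are consistent with taking, in each document, the minimum normalized frequency among the words of the itemset and summing over documents. A minimal sketch under that reading (the `min_support` helper is a hypothetical name; the table holds the rounded values shown on the slide):

```python
# Normalized word frequencies as shown in the table (rounded to 2 decimals).
normalized = {
    "D1": {"W1": 0.40, "W2": 0.33, "W3": 0.00, "W4": 0.00, "W5": 0.17},
    "D2": {"W1": 0.00, "W2": 0.00, "W3": 0.33, "W4": 1.00, "W5": 0.33},
    "D3": {"W1": 0.40, "W2": 0.50, "W3": 0.00, "W4": 0.00, "W5": 0.00},
    "D4": {"W1": 0.00, "W2": 0.00, "W3": 0.33, "W4": 0.00, "W5": 0.17},
    "D5": {"W1": 0.20, "W2": 0.17, "W3": 0.33, "W4": 0.00, "W5": 0.33},
}

def min_support(itemset, table):
    """Sum over documents of the smallest frequency among the itemset's words."""
    return sum(min(row[word] for word in itemset) for row in table.values())

sup_1 = min_support(["W1"], normalized)
sup_12 = min_support(["W1", "W2"], normalized)
sup_123 = min_support(["W1", "W2", "W3"], normalized)

print(round(sup_1, 2), round(sup_12, 2), round(sup_123, 2))
```

Adding a word can only lower or keep the per-document minimum, so support never increases as the itemset grows. That is the anti-monotone property that Apriori-style pruning relies on.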
Mining Multi-Dimensional Association
• Single-dimensional rules:
• buys(X, “milk”) ⇒ buys(X, “bread”)
• Multi-dimensional rules: ≥ 2 dimensions or predicates
– Inter-dimension assoc. rules (no repeated predicates)
• age(X, “19-25”) ∧ occupation(X, “student”) ⇒ buys(X, “coke”)
– Hybrid-dimension assoc. rules (repeated predicates)
• age(X, “19-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”)
• Categorical Attributes: finite number of possible values, no ordering
among values—data cube approach
• Quantitative Attributes: Numeric, implicit ordering among values—
discretization, clustering, and gradient approaches
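The distinction between single-dimensional, inter-dimension, and hybrid-dimension rules depends only on which predicates a rule uses and whether any predicate repeats. A small sketch, where the `classify_rule` helper and the (predicate, value) tuple encoding are assumptions made for illustration:

```python
def classify_rule(antecedent, consequent):
    """Classify a rule given as lists of (predicate, value) pairs."""
    predicates = [name for name, _ in antecedent + consequent]
    distinct = set(predicates)
    if len(distinct) == 1:
        return "single-dimensional"   # one predicate, e.g. only buys
    if len(distinct) == len(predicates):
        return "inter-dimension"      # several predicates, none repeated
    return "hybrid-dimension"         # several predicates, some repeated

rules = [
    ([("buys", "milk")], [("buys", "bread")]),
    ([("age", "19-25"), ("occupation", "student")], [("buys", "coke")]),
    ([("age", "19-25"), ("buys", "popcorn")], [("buys", "coke")]),
]
labels = [classify_rule(lhs, rhs) for lhs, rhs in rules]
print(labels)
```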