
BITS Pilani, Hyderabad Campus
Dr. Aruna Malapati, Asst Professor, Department of CSIS

Extended Association Rule Mining


Today's Learning Objective

• Generate quantitative association rules when the items are categorical or continuous



Continuous and Categorical Attributes

How to apply the association analysis formulation to non-asymmetric binary variables?

Session Id  Country    Session Length (sec)  Web Pages Viewed  Gender  Browser Type  Buy
1           USA        982                   8                 Male    IE            No
2           China      811                   10                Female  Netscape      No
3           USA        2125                  45                Female  Mozilla       Yes
4           Germany    596                   4                 Male    IE            Yes
5           Australia  123                   9                 Male    Mozilla       No
…           …          …                     …                 …       …             …

Example of Association Rule:

{Number of Pages ∈ [5,10) ∧ Browser=Mozilla} → {Buy = No}



Handling Categorical Attributes

• Transform each categorical attribute into asymmetric binary variables (see the sketch below)
• Introduce a new "item" for each distinct attribute-value pair
  – Example: replace the Browser Type attribute with
    • Browser Type = Internet Explorer
    • Browser Type = Mozilla
    • Browser Type = Netscape
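
A minimal sketch of this binarization, assuming pandas is available (the toy column mirrors the session table above):

    # Binarize a categorical attribute into one asymmetric binary
    # "item" per distinct attribute-value pair.
    import pandas as pd

    sessions = pd.DataFrame(
        {"Browser Type": ["IE", "Netscape", "Mozilla", "IE", "Mozilla"]}
    )

    items = pd.get_dummies(sessions["Browser Type"], prefix="Browser Type", dtype=int)
    print(items.columns.tolist())
    # ['Browser Type_IE', 'Browser Type_Mozilla', 'Browser Type_Netscape']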



Handling Categorical Attributes

• Potential issues
  – What if an attribute has many possible values?
    • Example: the Country attribute has more than 200 possible values
    • Many of the attribute values may have very low support
    • Potential solution: aggregate the low-support attribute values (sketched below)
  – What if the distribution of attribute values is highly skewed?
    • Example: 95% of the visitors have Buy = No
    • Most of the items will be associated with the (Buy=No) item
    • Potential solution: drop the highly frequent items
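
A hedged sketch of the aggregation step, assuming pandas; the country values and the support threshold of 2 are illustrative:

    import pandas as pd

    countries = pd.Series(["USA", "China", "USA", "Germany", "Australia", "Fiji"])

    # Aggregate low-support attribute values into a single "Other" item.
    counts = countries.value_counts()
    rare = counts[counts < 2].index
    print(countries.where(~countries.isin(rare), "Other").tolist())
    # ['USA', 'Other', 'USA', 'Other', 'Other', 'Other']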



Handling Continuous Attributes

• Mining continuous attributes may reveal interesting rules, e.g., users whose annual income is more than $120K belong to the 45–60 age group
• Association rules that contain continuous attributes are known as quantitative association rules
• Different methods:
  – Discretization-based
  – Statistics-based
  – Non-discretization-based
    • min-Apriori



Discretization-based Methods

Discretization groups adjacent values of a continuous attribute into a finite number of intervals.



Discretization-based Methods

• A key parameter in attribute discretization is the number of intervals used to partition each attribute.
• It is usually provided by users in one of the following forms:
  – Bin width
  – Bin frequency
  – Number of clusters
• The choice matters. Suppose rules must meet support = 5% and confidence = 65%: if the bin width is large (e.g., a width of 24 for Age), coarse intervals can make overly broad rules emerge as interesting, while very narrow bins can split a genuine rule across intervals so that it loses support (a binning sketch follows).
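
A minimal illustration of the two most common parameterizations, assuming pandas; the age values are invented:

    import pandas as pd

    ages = pd.Series([19, 23, 25, 31, 38, 44, 52, 60, 67])

    # Equal-width binning: every interval spans the same range of values.
    by_width = pd.cut(ages, bins=4)

    # Equal-frequency binning: every interval holds roughly the same count.
    by_freq = pd.qcut(ages, q=4)

    print(by_width.value_counts().sort_index())
    print(by_freq.value_counts().sort_index())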



Statistics-based Methods

• Quantitative association rules can be used to infer the statistical properties of a population.
• The rule consequent consists of a continuous variable, characterized by its statistics:
  – mean, median, standard deviation, etc.



Statistics-based Methods

Approach:
– Withhold the target variable from the rest of the data
– Apply existing frequent itemset generation to the rest of the data
– For each frequent itemset, compute the descriptive statistics of the corresponding target variable
  • A frequent itemset becomes a rule by introducing the target variable as the rule consequent
– Apply a statistical test to determine the interestingness of the rule (a sketch of the first steps follows)
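
A minimal sketch of the first steps, assuming pandas; the item columns and Age values are a toy example, and frequent itemset generation itself is taken as given:

    import pandas as pd

    # Hypothetical transactions: binary items plus a withheld continuous target.
    df = pd.DataFrame({
        "Browser=Mozilla": [1, 0, 1, 1, 0],
        "Buy=Yes":         [1, 1, 1, 0, 0],
        "Age":             [21, 25, 24, 40, 35],
    })

    # For one already-found frequent itemset, compute descriptive statistics
    # of the target over the segment of transactions it covers.
    itemset = ["Browser=Mozilla", "Buy=Yes"]
    covered = df[(df[itemset] == 1).all(axis=1)]["Age"]
    print(covered.mean(), covered.std())  # candidate rule: itemset -> Age: mean=...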



Statistics-based Methods

• How to determine whether an association rule is interesting?
  – Compare the statistics for the segment of the population covered by the rule versus the segment not covered by it:
    • A → B: μ versus A → ¬B: μ′
  – Statistical hypothesis testing:
    • Null hypothesis H0: μ′ = μ + Δ
    • Alternative hypothesis H1: μ′ > μ + Δ
    • Test statistic: Z = (μ′ − μ − Δ) / √(s₁²/n₁ + s₂²/n₂)
    • Z has zero mean and unit variance under the null hypothesis



Statistics-based Methods

Example:
r: {Browser=Mozilla ∧ Buy=Yes} → Age: μ = 23
– The rule is interesting if the difference between μ and μ′ is greater than 5 years (i.e., Δ = 5)
– For r, suppose n₁ = 50, s₁ = 3.5
– For r′ (complement): n₂ = 250, μ′ = 30, s₂ = 6.5

Z = (μ′ − μ − Δ) / √(s₁²/n₁ + s₂²/n₂) = (30 − 23 − 5) / √(3.5²/50 + 6.5²/250) = 3.11

– For a one-sided test at the 95% confidence level, the critical Z-value for rejecting the null hypothesis is 1.64.
– Since Z > 1.64, r is an interesting rule.
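
The arithmetic can be verified with a few lines of Python (values taken from the example above):

    import math

    mu, mu_prime, delta = 23.0, 30.0, 5.0  # means for r and its complement, margin
    n1, s1 = 50, 3.5                        # segment covered by the rule
    n2, s2 = 250, 6.5                       # segment not covered by the rule

    z = (mu_prime - mu - delta) / math.sqrt(s1**2 / n1 + s2**2 / n2)
    print(round(z, 2))  # 3.11 > 1.64, so reject H0: the rule is interesting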



Mining Multiple-Level Association Rules

• Items often form hierarchies
  – For example: Heritage 2% milk → Britannia white wheat bread
• Flexible support settings
  – Items at lower levels of the hierarchy are expected to have lower support
• Exploration of shared multi-level mining (Agrawal & Srikant @ VLDB'95, Han & Fu @ VLDB'95)

Uniform support vs. reduced support:

Level    Item [support]                 Uniform min_sup   Reduced min_sup
1        Milk [10%]                     5%                5%
2        2% Milk [6%], Skim Milk [4%]   5%                3%

Under uniform support, Skim Milk (4%) fails the 5% threshold; under reduced support, the lower Level-2 threshold (3%) keeps it.


Multi-level Association: Flexible Support and Redundancy Filtering

• Flexible min-support thresholds: some items are more valuable but less frequent
  – Use non-uniform, group-based min-support
  – E.g., {diamond, watch, camera}: 0.05%; {bread, milk}: 5%; …
• Redundancy filtering: some rules may be redundant due to "ancestor" relationships between items
  – milk → wheat bread [support = 8%, confidence = 70%]
  – 2% milk → wheat bread [support = 2%, confidence = 72%]
  – The first rule is an ancestor of the second
• A rule is redundant if its support is close to the "expected" value derived from its ancestor (a worked check follows)
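
To make the "expected" value concrete, here is a hedged arithmetic check; the 25% share of 2% milk among milk transactions is an assumed figure for illustration, not from the slides:

    # Hypothetical redundancy check: the expected support of a descendant rule
    # is the ancestor's support scaled by the descendant item's share.
    ancestor_support = 0.08   # milk -> wheat bread
    descendant_share = 0.25   # assumed: 2% milk is 25% of milk transactions

    expected_support = ancestor_support * descendant_share  # ≈ 0.02
    observed_support = 0.02                                 # 2% milk -> wheat bread
    # Observed support matches the expected value, so the specialized rule adds
    # no information beyond its ancestor and can be filtered as redundant.
    print(round(expected_support, 2), observed_support)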



Non-Discretization Methods

• In some applications it is interesting to find associations among the continuous attributes themselves rather than among discrete intervals.
  – E.g., word associations in documents.

TID  W1  W2  W3  W4  W5
D1   2   2   0   0   1
D2   0   0   1   2   2
D3   2   3   0   0   0
D4   0   0   1   0   1
D5   1   1   1   0   2

• The data contains only continuous attributes of the same "type"
  – e.g., frequency of words in a document



Non-Discretization Methods

• How to determine the support of a word?
  – If we simply sum up its frequency, the support count will be greater than the total number of documents!
• Normalize the word vectors, e.g., using the L1 norm, so that each word has support equal to 1.0 (see the sketch below)

TID  W1  W2  W3  W4  W5
D1   2   2   0   0   1
D2   0   0   1   2   2
D3   2   3   0   0   0
D4   0   0   1   0   1
D5   1   1   1   0   2

Normalized (each word's column divided by its column sum):

TID  W1    W2    W3    W4    W5
D1   0.40  0.33  0.00  0.00  0.17
D2   0.00  0.00  0.33  1.00  0.33
D3   0.40  0.50  0.00  0.00  0.00
D4   0.00  0.00  0.33  0.00  0.17
D5   0.20  0.17  0.33  0.00  0.33
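
A minimal sketch of the normalization step, assuming pandas; the matrix is copied from the slide:

    import pandas as pd

    # Document-term counts from the slide.
    counts = pd.DataFrame(
        [[2, 2, 0, 0, 1],
         [0, 0, 1, 2, 2],
         [2, 3, 0, 0, 0],
         [0, 0, 1, 0, 1],
         [1, 1, 1, 0, 2]],
        index=["D1", "D2", "D3", "D4", "D5"],
        columns=["W1", "W2", "W3", "W4", "W5"],
    )

    # L1-normalize each word (column) so its frequencies sum to 1.
    normalized = counts / counts.sum(axis=0)
    print(normalized.round(2))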
Compute Word Associations

TID  W1    W2    W3    W4    W5
D1   0.40  0.33  0.00  0.00  0.17
D2   0.00  0.00  0.33  1.00  0.33
D3   0.40  0.50  0.00  0.00  0.00
D4   0.00  0.00  0.33  0.00  0.17
D5   0.20  0.17  0.33  0.00  0.33

Averaging the normalized frequencies per document gives:
Support(W1, W2) = (0.40+0.33)/2 + 0 + (0.40+0.50)/2 + 0 + (0.20+0.17)/2 = 1

• Since every word's frequencies are normalized to sum to 1, this average-based support equals 1 for any itemset.
• Hence every itemset would be frequent, so the support measure must be modified.



Min-Apriori

New definition of support:

sup(C) = Σ_{i ∈ T} min_{j ∈ C} D(i, j)

where T is the set of transactions (documents) and D(i, j) is the normalized frequency of word j in document i.

TID  W1    W2    W3    W4    W5
D1   0.40  0.33  0.00  0.00  0.17
D2   0.00  0.00  0.33  1.00  0.33
D3   0.40  0.50  0.00  0.00  0.00
D4   0.00  0.00  0.33  0.00  0.17
D5   0.20  0.17  0.33  0.00  0.33

Example (a code sketch follows):
Sup(W1, W2, W3) = 0 + 0 + 0 + 0 + 0.17 = 0.17
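
A minimal implementation sketch of this support measure, assuming pandas; it reproduces the example above and also previews the anti-monotone property shown on the next slide:

    import pandas as pd

    normalized = pd.DataFrame(
        [[0.40, 0.33, 0.00, 0.00, 0.17],
         [0.00, 0.00, 0.33, 1.00, 0.33],
         [0.40, 0.50, 0.00, 0.00, 0.00],
         [0.00, 0.00, 0.33, 0.00, 0.17],
         [0.20, 0.17, 0.33, 0.00, 0.33]],
        index=["D1", "D2", "D3", "D4", "D5"],
        columns=["W1", "W2", "W3", "W4", "W5"],
    )

    def min_support(itemset):
        """Min-Apriori support: sum over documents of the per-row minimum."""
        return normalized[list(itemset)].min(axis=1).sum()

    print(round(min_support(["W1"]), 2))              # 1.0
    print(round(min_support(["W1", "W2"]), 2))        # 0.9
    print(round(min_support(["W1", "W2", "W3"]), 2))  # 0.17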



Anti-monotone Property of Support

TID  W1    W2    W3    W4    W5
D1   0.40  0.33  0.00  0.00  0.17
D2   0.00  0.00  0.33  1.00  0.33
D3   0.40  0.50  0.00  0.00  0.00
D4   0.00  0.00  0.33  0.00  0.17
D5   0.20  0.17  0.33  0.00  0.33

Example:
Sup(W1) = 0.4 + 0 + 0.4 + 0 + 0.2 = 1
Sup(W1, W2) = 0.33 + 0 + 0.4 + 0 + 0.17 = 0.9
Sup(W1, W2, W3) = 0 + 0 + 0 + 0 + 0.17 = 0.17

Support never increases as items are added (1 ≥ 0.9 ≥ 0.17), so the min-based support is anti-monotone and Apriori-style pruning still applies.
Mining Multi-Dimensional Association

• Single-dimensional rules:
  – buys(X, "milk") ⇒ buys(X, "bread")
• Multi-dimensional rules: ≥ 2 dimensions or predicates
  – Inter-dimension assoc. rules (no repeated predicates)
    • age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
  – Hybrid-dimension assoc. rules (repeated predicates)
    • age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
• Categorical attributes: finite number of possible values, no ordering among values (data cube approach)
• Quantitative attributes: numeric, implicit ordering among values (discretization, clustering, and gradient approaches)



Take Home Message

• Categorical attributes are handled by creating an item for each attribute value and binarizing the data.
• Continuous attributes can be handled by discretizing them into intervals.
• Statistics-based methods help infer the properties of a population; the interestingness measure is modified to a hypothesis test on the consequent's statistics.
• Non-discretization methods handle continuous attributes of the same type by modifying the support measure.

