In sampling:
p: the support of Z in the database
n: the sample size
m: the number of transactions in the sample that contain all items of Z (the item set we are checking)
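As a rough illustration of these three quantities, the sketch below computes p on a toy database and m on a random sample of size n. The database contents, the helper names `support` and `sample_counts`, and the choice to sample with replacement are all assumptions made for the example, not something taken from the slides.

```python
import random

def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        raise ValueError("need at least one transaction")
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def sample_counts(transactions, itemset, n, seed=0):
    """Draw n transactions with replacement and return (m, n).

    m is the number of sampled transactions that contain the item set.
    Sampling with replacement is an assumption of this sketch.
    """
    rng = random.Random(seed)
    itemset = set(itemset)
    sample = [rng.choice(transactions) for _ in range(n)]
    m = sum(1 for t in sample if itemset <= set(t))
    return m, n

# Toy database, made up for illustration only.
database = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "d"},
]
Z = {"a", "b"}

p = support(database, Z)                    # exact support of Z in the database
m, n = sample_counts(database, Z, n=1000)   # counts observed in the sample
print(f"p = {p:.2f}, m/n = {m / n:.3f}")
```

The sample frequency m/n serves as the estimate of p; the rest of the notes is about how far apart the two can be.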
To extend this from a single item set to all item sets we need to use the union bound.
For this we lower the threshold, so that we have a lower chance of missing item sets in
the sample.
The amount by which we lower the threshold is denoted by mu (μ).
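A minimal sketch of what lowering the threshold by mu does in practice, assuming the candidate item sets are given up front. The function name `frequent_in_sample`, the toy sample, and the numbers are hypothetical.

```python
def frequent_in_sample(sample, candidates, min_support, mu):
    """Return the candidates whose sample frequency reaches min_support - mu."""
    lowered = min_support - mu          # the lowered threshold
    n = len(sample)
    result = []
    for z in candidates:
        z = frozenset(z)
        m = sum(1 for t in sample if z <= set(t))
        if m / n >= lowered:
            result.append(z)
    return result

# Toy sample of 10 transactions: "a" occurs 6 times, "b" 4 times, "c" twice.
sample = (
    [{"a", "b"}] * 3
    + [{"a"}] * 3
    + [{"b", "c"}]
    + [{"c"}]
    + [{"d"}] * 2
)
candidates = [{"a"}, {"b"}, {"c"}]

strict = frequent_in_sample(sample, candidates, min_support=0.5, mu=0.0)
relaxed = frequent_in_sample(sample, candidates, min_support=0.5, mu=0.15)
print(strict)   # only {"a"} reaches 0.5
print(relaxed)  # {"a"} and {"b"} reach the lowered threshold 0.35
```

With mu = 0 the item set {"b"} is dropped even though its true support could still be above the threshold; lowering the threshold keeps it as a candidate to verify against the full database.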
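The probability that m/n is epsilon or more away from p is at most e^{-2\epsilon^2 n} (for a deviation in one direction; twice that for a deviation in either direction).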
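The bound can be turned around to get a sample size. The sketch below assumes we want the failure probability to be at most some delta, and that the union bound is applied over a known number of item sets; `delta`, `num_itemsets`, and the function names are assumptions of the example, not notation from the slides.

```python
import math

def tail_bound(eps, n):
    """The bound e^{-2 eps^2 n} for a single item set."""
    return math.exp(-2 * eps**2 * n)

def sample_size(eps, delta, num_itemsets=1):
    """Smallest n with num_itemsets * e^{-2 eps^2 n} <= delta.

    Multiplying by num_itemsets is the union bound: the chance that
    any of the item sets deviates is at most the sum of the individual
    chances.
    """
    if eps <= 0 or not (0 < delta < 1) or num_itemsets < 1:
        raise ValueError("need eps > 0, 0 < delta < 1, num_itemsets >= 1")
    return math.ceil(math.log(num_itemsets / delta) / (2 * eps**2))

n_single = sample_size(eps=0.01, delta=0.01)
n_many = sample_size(eps=0.01, delta=0.01, num_itemsets=1000)
print(n_single, n_many)
```

Note that the number of item sets only enters through a logarithm, so covering many item sets at once costs far less than a proportional increase in sample size.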
We can still miss frequent item sets even with lowered threshold.
Can we add these to the frequent item sets?
A theorem in the slides is devoted to this.
Observation.
In computing the sample size,
p was the probability that a random transaction supports item set Z.
That is, 1 if Z is a subset of the transaction.
We first look at finite cases, where there are not infinitely many decimals, so
integers.
We will