
# ε-net & VC-dimension

Hirak Sarkar

Preprint submitted to Elsevier May 30, 2013

## 1. Introduction

Consider the problem of picking representatives of special interest groups from a population: we want to collect a set of people that contains at least one member of each group. We further assume that only large groups matter, so small groups are ignored; realistically, that is a fair assumption. Let $n$ be the size of the population and fix $0 \le \epsilon \le 1$; a group counts as large if it contains at least an $\epsilon$ fraction of the population. We can also require that the representatives reflect the sizes of the groups. Formally, if $m$ is the size of a subset ($m \ge \epsilon n$), then the number of representatives $r(m)$ chosen from it should satisfy

$$\frac{r(m)}{r(n)} \approx \frac{m}{n},$$

and we want $r$ to be as small as possible. This problem is very similar to the Hitting Set Problem: we want to hit a family of subsets with the minimum number of points, except that small subsets are not considered. Sometimes we only care about a bound on the number of points in the hitting set. Figure 1 depicts the situation where the base set is a set of points in the real plane and the subsets are rectangular ranges.

A randomized algorithm quickly gives a bound on the number of points to choose. Let us build the sample by choosing each point independently with probability $p = \frac{\log n}{\epsilon n}$. The probability that we miss a particular large subset, that is, the failure probability, is at most

$$(1 - p)^{\epsilon n} = \left(1 - \frac{\log n}{\epsilon n}\right)^{\epsilon n} \le \frac{1}{n}. \tag{1}$$

The expected sample size is $np = \frac{\log n}{\epsilon}$. It follows that some sample of about that size would be a hitting set, provided a union bound over the large subsets keeps the total failure probability below 1.

## 2. Definitions

In this section we formally define range spaces and the related notation that is used throughout.
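Before turning to the formal definitions, the sampling bound from the introduction can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, the points are taken to be the integers $0, \dots, n-1$, and the natural logarithm is assumed for $\log n$.

```python
import math
import random


def inclusion_probability(n, eps):
    """p = log(n) / (eps * n), capped at 1 (natural log is an assumption)."""
    return min(1.0, math.log(n) / (eps * n))


def miss_probability(n, eps):
    """Chance that a fixed subset of size ceil(eps * n) receives no sampled point."""
    p = inclusion_probability(n, eps)
    return (1.0 - p) ** math.ceil(eps * n)


def random_sample(n, eps, rng):
    """Keep each of the points 0..n-1 independently with probability p."""
    p = inclusion_probability(n, eps)
    return [i for i in range(n) if rng.random() < p]


n, eps = 1000, 0.1
# Compare the miss probability of one large subset with the 1/n bound.
print(miss_probability(n, eps), "<=", 1.0 / n)
# Compare the realised sample size with its expectation log(n) / eps.
print(len(random_sample(n, eps, random.Random(0))), "sampled, expected", math.log(n) / eps)
```

The cap at 1 only matters for very small $n$, where $\frac{\log n}{\epsilon n}$ would otherwise exceed 1.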

Figure 1: Hitting Set, panels (a) and (b).

### 2.1. Range Space

A range space $S = (X, R)$ is a tuple where $X$ is a set of points and $R$ is a family of subsets of $X$. For example, $X$ can be a set of points on the real line and $R$ the family of real intervals. We further define the restricted range space: if $Y \subseteq X$ then $S|_Y = (Y, R|_Y)$, where $R|_Y = \{Y \cap r \mid r \in R\}$.

Dual Range Space: If $S = (X, R)$ is a (primal) range space then $S^* = (R, \{X_p \mid p \in X\})$ is the dual range space, where $X_p = \{r \in R \mid p \in r\}$.

### 2.2. VC-dimension

We say that a finite subset of points $Y$ is shattered if $R|_Y = 2^Y$. For example, for points on a line with real intervals as ranges, any set of two points can be shattered. So even if we cannot shatter all possible subsets, we can shatter some of them. From that framework we can define the VC-dimension of a range space $S = (X, R)$: the VC-dimension of $S$ is $\delta$ if no $Y$ with $|Y| > \delta$ can be shattered and there exists a subset of size $\delta$ that can be shattered. Note that the VC-dimension may well be infinite. An example is the set of points in the real plane with all possible convex subsets of points as ranges.

Shatter Function: For a range space $S = (X, R)$, the shatter function is $\pi_S(m) = \max_{B \subset X,\ |B| = m} \left| R|_B \right|$. The shattering dimension of $S$ is the smallest $d$ such that $\pi_S(m)$ is $O(m^d)$ for all $m$; thus we can bound the number of subsets by some polynomial.
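The interval example can be checked by brute force. The sketch below is illustrative only, with hypothetical function names. It relies on the fact that a closed interval meets a sorted set of points on the line in a contiguous run, and that every contiguous run (and the empty set) is realised by some interval.

```python
def interval_traces(points):
    """All subsets of `points` of the form Y ∩ [a, b], i.e. R|Y for intervals."""
    ys = sorted(set(points))
    traces = {frozenset()}  # an interval that avoids every point
    for i in range(len(ys)):
        for j in range(i, len(ys)):
            traces.add(frozenset(ys[i:j + 1]))  # the run ys[i..j]
    return traces


def is_shattered(points):
    """Y is shattered when R|Y contains all 2^|Y| subsets of Y."""
    return len(interval_traces(points)) == 2 ** len(set(points))


print(is_shattered([0.0, 1.0]))        # two points on a line
print(is_shattered([0.0, 1.0, 2.0]))   # three points: {0.0, 2.0} is not a trace
print([len(interval_traces(range(m))) for m in range(1, 6)])  # shatter function values
```

For $m$ points the count is $1 + m(m+1)/2$, which is $O(m^2)$, so under these assumptions intervals on a line have VC-dimension 2 and shattering dimension 2.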

## 3. VC-dimension & Sauer's Lemma

Lemma 3.1. If $S = (X, R)$ has VC-dimension $\delta$, then $|R|$ is bounded by $O(n^\delta)$ for $|X| = n$. More precisely, it is bounded by

$$G_\delta(n) = \sum_{i=0}^{\delta} \binom{n}{i}.$$

We can further write it as a recursive formula,

$$G_\delta(n) = G_{\delta-1}(n-1) + G_\delta(n-1).$$

The proof follows by induction.

Lemma 3.2. If $S = (X, R)$ has shattering dimension $d$, then the VC-dimension of $S$ is $O(d \log d)$.

Proof. If $N \subseteq X$ is the largest set shattered by $S$, then $|N| = \delta$, and all $2^\delta$ subsets of $N$ appear in $R|_N$. Since the shatter function is $O(m^d)$, we get $2^\delta \le c\,\delta^d$ for some constant $c$. Hence the result follows.

The properties also lead to some observations about the dual of a range space.

Observation 3.3. If $S$ has VC-dimension $\delta$ then $S^*$ has VC-dimension $\delta^* < 2^{\delta+1}$.

Right away we can put a bound on the dual shattering dimension.

## 4. ε-net & Algorithm for Geometric Set Cover

When we say that we are in search of a set that contains at least one element from each of the large subsets, we are formally referring to an ε-net. To solve the set cover problem and to show the application we need to define it. In a range space $S = (X, R)$, a subset $N \subseteq X$ is an ε-net for $X$ if, for any range $r \in R$ with $m(r) \ge \epsilon$, $N$ contains at least one point of $r \cap X$, where $m(r) = \frac{|r \cap X|}{|X|}$.
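The ε-net condition translates directly into a brute-force test on a finite instance. The following is a minimal sketch with hypothetical names, assuming ranges are given as Python sets and the net is a subset of the point set.

```python
def measure(r, points):
    """m(r) = |r ∩ X| / |X| for a finite point set X."""
    return len(set(r) & set(points)) / len(points)


def is_eps_net(net, points, ranges, eps):
    """True when every range of measure at least eps contains a point of the net."""
    net = set(net)
    return all(net & set(r) for r in ranges if measure(r, points) >= eps)


points = list(range(10))
ranges = [{0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}, {9}]   # the last range is small

print(is_eps_net({2, 7}, points, ranges, eps=0.5))  # both large ranges are hit
print(is_eps_net({2}, points, ranges, eps=0.5))     # {5, ..., 9} is missed
```

The small range `{9}` has measure 0.1 and is ignored for $\epsilon = 0.5$, which is exactly the sense in which small subsets are not considered.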

### 4.1. Construction

To construct the ε-net we have to collect points that are part of large ranges. If the net contains $|N|$ points, then for any range $r \in R$ we want

$$\left| \frac{|N \cap r|}{|N|} - \frac{|r \cap X|}{|X|} \right| \le \phi,$$

where $\phi$ is a constant. This way of choosing the sample leads us to the ε-net theorem, which is as follows.

Theorem 1. Let $S = (X, R)$ be a range space of VC-dimension $\delta$ and let $x$ be a finite subset of $X$. For $0 \le \epsilon \le 1$ and $\phi \le 1$, let $N$ be a subset obtained by $m$ independent draws from $x$, where

$$m \ge \max\left\{ \frac{4}{\epsilon} \log \frac{4}{\phi},\ \frac{8\delta}{\epsilon} \log \frac{16}{\epsilon} \right\};$$

then $N$ is an ε-net with probability greater than $(1 - \phi)$.

This is an independent draw without replacement.

Here $\phi$ is the failure probability, so the required $m$ grows as $\phi$ decreases, in proportion to $\log \frac{1}{\phi}$. The larger the VC-dimension, the more points we need in the net. If instead of the VC-dimension we use the shattering dimension of $S$, say $d$, then the sample size should be at least $O\left(\frac{d}{\epsilon} \log \frac{d}{\epsilon}\right)$. We will need this result while designing and proving the algorithm that solves the covering problem.

Suppose the elements of $X$ are weighted, i.e. there is a weight function $w : x \to \mathbb{R}^+$. For a range $r \in R$ we know that $w(r) = \sum_{y \in r} w(y)$. If $W$ is the total weight of the set then we are interested in hitting "large" weighted ranges, where large sets are those having weight more than $\epsilon W$. We can similarly argue that the ordinary sampling has to be replaced by weighted sampling; the $\epsilon$ factor applies there as well.

## 5. Application to Geometric Set Cover

The application of ε-nets is most vivid when we use them to solve the family of covering problems. To design the algorithm for set cover, let us first state the set cover problem in a modified form. In place of elements we use points that are to be covered. We will call it Geometric Set Cover. To state it formally, $S = (X, R)$ has a finite VC-dimension, where $X$ is a set of $n$ points and $R$ is the set of $m$ ranges. We want to choose a subset $R' \subseteq R$, as small as possible, such that $\forall x \in X,\ \exists r \in R',\ x \in r$.

We are choosing the ranges, so we have to work in the dual plane of the problem: if $S$ is the original problem we have to work on $S^*$. We assume that the shattering dimension of the range space $S^*$ is $\delta^*$. The algorithm repeatedly selects an ε-net for some $\epsilon$ and tests whether it is a cover. We check whether we have chosen an actual ε-net or not. If we are successful, then a brute-force check can tell us whether that ε-net is also a cover. The steps of the algorithm are as follows.

Algorithm 1: Geometric Set Covering

    Data: dual range space S*, ε, shattering dimension δ*
    Result: a geometric set cover
    Flag = 0
    repeat
        choose a (weighted) ε-net Y of size O((δ*/ε) log(δ*/ε))
        if Y is an ε-net then
            if Y is a cover then
                report the cover
                Flag = 1
            else
                Y is not a cover, so there exists a point p ∈ X that is not covered
                let R_p = {r ∈ R | p ∈ r}
                if w(R_p) ≤ εW then double the weight of R_p
            end
        else
            continue
        end
    until Flag == 1

As we do not know the size of the optimal set cover, we have to guess it: we can try a series of numbers 1, 2, 4, ... and test whether we find the cover or not.

One quick observation follows from the algorithm.

Observation 5.1. Every time we double the weight, we increase the total weight by no more than a factor of $(1 + \epsilon)$. Therefore $W_i \le (1 + \epsilon) W_{i-1}$.

At each step we also increase the weight of the optimal set, since at least one range of the optimal cover contains the uncovered point $p$. The number of iterations therefore depends on the size of the optimal set cover and the number of ranges. It can be shown that the number of doublings is bounded by $O\left(k \log \frac{m}{k}\right)$, where $m$ is the number of ranges and $k$ is the size of the optimal set cover. Finally, we get rid of the $\log n$ factor.
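The reweighting loop of Algorithm 1 can be sketched on a small finite instance. This is an illustrative simplification rather than the algorithm as analysed in the text: the ε-net is replaced by a plain weighted random sample of ranges, and the names `find_cover`, `sample_size` and `max_rounds` are hypothetical parameters introduced for the example.

```python
import random


def find_cover(points, ranges, eps, sample_size, rng, max_rounds=1000):
    """Reweighting sketch: `ranges` is a list of sets; returns range indices or None."""
    weights = [1.0] * len(ranges)
    for _ in range(max_rounds):
        # Weighted sample of ranges (with replacement), standing in for the eps-net.
        picked = set(rng.choices(range(len(ranges)), weights=weights, k=sample_size))
        covered = set().union(*(ranges[i] for i in picked))
        missing = [p for p in points if p not in covered]
        if not missing:
            return sorted(picked)  # the sample is a cover
        p = missing[0]
        r_p = [i for i, r in enumerate(ranges) if p in r]  # ranges containing p
        if sum(weights[i] for i in r_p) <= eps * sum(weights):
            for i in r_p:
                weights[i] *= 2.0  # double the weight of R_p
        # Otherwise R_p was heavy and the sample missed it, so the sample
        # was not an eps-net; draw again without changing the weights.
    return None


points = [0, 1, 2, 3]
ranges = [{0, 1}, {2, 3}, {1, 2}]
print(find_cover(points, ranges, eps=0.5, sample_size=2, rng=random.Random(0)))
```

Here `sample_size` plays the role of the guessed cover size: with two draws, the only possible cover in this instance consists of the first two ranges.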