
Prof. Prakash Panangaden, COMP 599

Student: Johannes Brustle, ID: 260580221
Due March 7th, 2016

Assignment 3
Question 1.
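Let $x_0$ be the origin and $x_i = e_i$ the $i$-th standard basis vector.
We will show that the set $S = \bigcup_{i=0}^{n} \{x_i\}$ can be shattered by
linear threshold functions, where a point $x$ is labelled positive iff
$\vec{w} \cdot x \geq b$. Given $U \subseteq S$, let $\chi_U$ be the
characteristic function of $U$ mapping to $\{-1, 1\}$. Now set
$\vec{w} = (\chi_U(x_1), \ldots, \chi_U(x_n))$ and $b = -\chi_U(x_0)/2$.
For $x_i$, $1 \leq i \leq n$, $\vec{w} \cdot x_i = 1 \geq b$ if $x_i \in U$
and $\vec{w} \cdot x_i = -1 < b$ if $x_i \notin U$. Also,
$\vec{w} \cdot x_0 = 0 < b = 1/2$ if $x_0 \notin U$ and
$\vec{w} \cdot x_0 = 0 \geq b = -1/2$ if $x_0 \in U$. That is, $S$ can be
shattered by the class of linear threshold functions.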
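The construction can be checked by brute force for small $n$. The sketch below is only an illustration: the function name `shatters_origin_and_basis` is hypothetical, and it assumes the convention that a point is labelled positive iff $\vec{w} \cdot x \geq b$.

```python
from itertools import product

def shatters_origin_and_basis(n):
    """Check that every labelling of {0, e_1, ..., e_n} is realised by the
    weights w = (chi(x_1), ..., chi(x_n)) and threshold b = -chi(x_0) / 2."""
    # points[0] is the origin, points[i] is e_i for 1 <= i <= n
    points = [tuple(0 for _ in range(n))]
    points += [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
    for labels in product([-1, 1], repeat=n + 1):
        # labels[i] plays the role of chi_U(x_i)
        w = labels[1:]
        b = -labels[0] / 2
        for x, label in zip(points, labels):
            dot = sum(wi * xi for wi, xi in zip(w, x))
            predicted = 1 if dot >= b else -1
            if predicted != label:
                return False
    return True

print(shatters_origin_and_basis(4))
```

The loop enumerates all $2^{n+1}$ subsets $U$, so it is only practical for small $n$.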
Question 2.
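1. Let $x_0 < x_1 < \cdots < x_{2k}$ be $2k + 1$ distinct points on the real
line, and call this set $X$. Consider the subset $S = \{x_i : i \text{ is even}\}$,
so $|S| = k + 1$. A union of closed intervals that picks out exactly $S$ must
contain every $x_{2l}$, $0 \leq l \leq k$, but no two elements of $S$ can be in
the same closed interval, since the point $x_{2l+1}$ lying between $x_{2l}$ and
$x_{2l+2}$ is not in $S$. This requires $k + 1$ intervals. That is, the VC
dimension stated in the question is $< 2k + 1$.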
Now let $Y = X \setminus \{x_{2k}\}$, so $|Y| = 2k$, and let $U \subseteq Y$.
Then consider the closed intervals:
$L = \{[x_i, x_j] : (x_s \in U \ \forall s,\ i \leq s \leq j) \wedge (i = 0 \vee x_{i-1} \notin U) \wedge (j = 2k-1 \vee x_{j+1} \notin U)\}$.
Elements of $L$ are disjoint closed intervals that together contain all points
of $U$ and no point of $Y \setminus U$. Suppose $|L| > k$. By definition of $L$,
there must be at least one point of $Y \setminus U$ between each pair of
consecutive intervals of $L$, hence at least $k$ such points. Also, every
interval of $L$ contains at least one point of $U$, giving more than $k$ points
of $U$. But this implies $|Y| > k + k$, a contradiction. Hence $|L| \leq k$, so
$Y$ is shattered and the VC dimension is $2k$.
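The counting argument can be sanity-checked numerically. The following is a minimal sketch with hypothetical helper names (`count_runs`, `max_runs`). It encodes a subset $U$ of sorted points as a list of booleans and counts the maximal runs of selected points, which is $|L|$ in the notation above.

```python
from itertools import product

def count_runs(labels):
    """Number of maximal runs of True, i.e. |L| for the subset U encoded by labels."""
    runs = 0
    previous = False
    for current in labels:
        if current and not previous:
            runs += 1
        previous = current
    return runs

def max_runs(num_points):
    """Largest number of intervals needed by any subset of num_points sorted points."""
    return max(count_runs(labels)
               for labels in product([False, True], repeat=num_points))

k = 3
print(max_runs(2 * k))      # intervals needed in the worst case for 2k points
print(max_runs(2 * k + 1))  # intervals needed in the worst case for 2k + 1 points
```

For $2k$ points the worst case needs $k$ intervals, while for $2k + 1$ points the alternating subset needs $k + 1$, matching both halves of the argument.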
2. Let $X = \{e_1, \ldots, e_n, -e_1, \ldots, -e_n\}$ in $\mathbb{R}^n$. Let $I$
and $J$ be two arbitrary subsets of $[n]$, and consider
$S = \{e_i : i \in I\} \cup \{-e_j : j \in J\} \subseteq X$.
Define the hyper-rectangle $R = [a_1, b_1] \times \cdots \times [a_n, b_n]$,
where $a_j = -1$ if $j \in J$, else $a_j = 0$. Similarly, $b_i = 1$ if
$i \in I$, else $b_i = 0$. Then $R$ contains all points of $S$ and no points of
$X \setminus S$. Hence the VC dimension is at least $2n$.
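Now suppose we are given $Y \subseteq \mathbb{R}^n$, $Y = \{y_1, \ldots, y_{2n+1}\}$.
Let $a_i^{\min} = \min\{(y_j)_i : 1 \leq j \leq 2n + 1\}$ be the lower bound of
the $i$-th coordinate over $Y$, and
$b_i^{\max} = \max\{(y_j)_i : 1 \leq j \leq 2n + 1\}$ the upper bound of the
$i$-th coordinate over $Y$. These $2n$ extremes are attained by at most $2n$
points of $Y$. Since there are $2n + 1$ points, $\exists y_l$ such that
$a_i^{\min}$ and $b_i^{\max}$ are the same for $Y$ and $Y \setminus \{y_l\}$,
for all $i$. Any hyper-rectangle containing $Y \setminus \{y_l\}$ contains
$[a_1^{\min}, b_1^{\max}] \times \cdots \times [a_n^{\min}, b_n^{\max}]$ and
therefore also $y_l$, so $\nexists R$, hyper-rectangle, such that
$R \cap Y = Y \setminus \{y_l\}$. This proves the VC dimension is $< 2n + 1$.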
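Both halves of the hyper-rectangle argument can be illustrated in code. This is a sketch under stated assumptions: the helper names (`in_box`, `construction_shatters`, `redundant_point`) are hypothetical, and the sample points in the usage lines are made up for illustration.

```python
from itertools import product

def in_box(point, lower, upper):
    """True iff point lies in the axis-aligned box [lower_1, upper_1] x ... ."""
    return all(lo <= c <= hi for c, lo, hi in zip(point, lower, upper))

def construction_shatters(n):
    """Check the rectangle built from (I, J) on X = {e_i} and {-e_i}."""
    basis = [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]
    negated = [tuple(-c for c in e) for e in basis]
    for in_I, in_J in product(product([False, True], repeat=n), repeat=2):
        lower = [-1 if in_J[j] else 0 for j in range(n)]
        upper = [1 if in_I[i] else 0 for i in range(n)]
        for i in range(n):
            if in_box(basis[i], lower, upper) != in_I[i]:
                return False
            if in_box(negated[i], lower, upper) != in_J[i]:
                return False
    return True

def redundant_point(points):
    """Index of a point whose removal leaves the bounding box unchanged, or None."""
    def bounding_box(pts):
        return [min(c) for c in zip(*pts)], [max(c) for c in zip(*pts)]
    full = bounding_box(points)
    for l in range(len(points)):
        rest = points[:l] + points[l + 1:]
        if bounding_box(rest) == full:
            return l
    return None

print(construction_shatters(3))
# Five illustrative points in the plane (n = 2, so 2n + 1 = 5).
print(redundant_point([(0, 0), (1, 3), (2, 1), (3, 2), (1, 1)]))
```

Any index returned by `redundant_point` identifies a $y_l$ that every rectangle containing the remaining points must also contain, which is the obstruction used in the upper bound.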
Question 3.
Note that since the only update rule in the Perceptron algorithm is
$\vec{w}_{t+1} \leftarrow \vec{w}_t + y_t \vec{x}_t$, and $y_t$ and $\vec{x}_t$
are integer $\forall t$ in our case, $\vec{w}$ will be integer. Hence suppose
$\vec{w} = (w_1, \ldots, w_n)$ is an integer valued linear separator with
respect to $S$, meaning the Perceptron algorithm makes no mistakes.
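This means $\vec{w} \cdot \vec{x}_1 \geq 0 \Rightarrow w_1 \geq 0$.
Next, $\vec{w} \cdot \vec{x}_2 < 0 \Rightarrow \vec{w} \cdot \vec{x}_2 \leq -1 \Rightarrow w_1 \cdot 1 + w_2 \cdot (-1) \leq -1 \Rightarrow w_1 + 1 \leq w_2 \Rightarrow w_2 \geq 1$.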
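By induction $w_t \geq 0$ $\forall t$. The base case was just covered.
Suppose the claim holds for $\{1, \ldots, t-1\}$. If $t$ is odd, then
$0 \leq \vec{w} \cdot \vec{x}_t = -\sum_{k=1}^{t-1} w_k + w_t \Rightarrow w_t \geq 0$.
If $t$ is even, then
$-1 \geq \vec{w} \cdot \vec{x}_t = \sum_{k=1}^{t-1} w_k - w_t \Rightarrow w_t \geq 1$.
More precisely, since we now know $\vec{w}$ is nonnegative, from above we get
that in fact $w_t \geq \sum_{k=1}^{t-1} w_k$ $\forall t$.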
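Claim: $w_t \geq 2^{t-3}$ for $t \geq 2$. Do this by induction on $t$. Base
case: $w_2 \geq 1 \geq 2^{2-3} = 1/2$.
Suppose the claim holds for $\{2, \ldots, t-1\}$. Then, using $w_1 \geq 0$,
$w_t \geq \sum_{k=1}^{t-1} w_k \geq \sum_{k=2}^{t-1} 2^{k-3} = \sum_{s=0}^{t-3} 2^{s-1} = 2^{t-3} - 1/2$.
Since $w_t$ and $2^{t-3}$ are both integers for $t \geq 3$, this gives
$w_t \geq 2^{t-3}$.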
Hence $w_n \geq 2^{n-3}$. Recall that we start with $\vec{w} = 0$ in the
Perceptron algorithm and the update rule is
$\vec{w}_{t+1} \leftarrow \vec{w}_t + y_t \vec{x}_t$, so $w_n$ changes by at
most 1, and only if a mistake was made. Conclude that a lower bound for the
total number of mistakes made is $2^{n-3}$.
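The bound can be observed empirically for small $n$. The sketch below rests on an assumption: the example sequence is not restated in this write-up, so it is inferred from the inequalities above as $\vec{x}_t = (-1)^t (1, \ldots, 1, -1, 0, \ldots, 0)$ with $t - 1$ leading ones, labelled positive iff $t$ is odd. The function names are hypothetical, and the prediction rule is taken to be positive iff $\vec{w} \cdot \vec{x} \geq 0$.

```python
def hard_sequence(n):
    """Assumed examples: x_t = (-1)^t * (1, ..., 1, -1, 0, ..., 0), label +1 iff t is odd."""
    examples = []
    for t in range(1, n + 1):
        sign = 1 if t % 2 == 0 else -1
        x = [sign] * (t - 1) + [-sign] + [0] * (n - t)
        y = 1 if t % 2 == 1 else -1
        examples.append((x, y))
    return examples

def perceptron_mistakes(examples, n):
    """Cycle through the examples until a full pass makes no mistakes.

    Returns the total number of mistakes and the final weight vector."""
    w = [0] * n
    mistakes = 0
    changed = True
    while changed:
        changed = False
        for x, y in examples:
            dot = sum(wi * xi for wi, xi in zip(w, x))
            predicted = 1 if dot >= 0 else -1
            if predicted != y:
                # Perceptron update: w <- w + y * x
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                changed = True
    return mistakes, w

n = 6
mistakes, w = perceptron_mistakes(hard_sequence(n), n)
print(mistakes, w, 2 ** (n - 3))
```

The loop terminates because the assumed sequence is linearly separable, for instance by $w_t = 2^{t-1}$. Running time grows quickly with $n$, so small values are advisable.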
Question 4.
1. Let $C$ be the target concept, $|C| = r$. If $\vec{x}$ is a positive
example, at least one element of $C$ outputs 1; if $\vec{x}$ is a negative
example, all elements of $C$ output 0. Let $W_C = \{w_{c_1}, \ldots, w_{c_r}\}$
be the weights corresponding to $C$.
Hence our algorithm, on a negative example, leaves $W_C$ unchanged, and on
a positive example for which our algorithm predicts negative, $\exists i$,
$1 \leq i \leq r$, such that $w_{c_i} \leftarrow 2 w_{c_i}$. Moreover, assume
for some $j$, $w_{c_j} \geq n$. If, on a positive example, the corresponding
element of $C$ outputs 1, then $\vec{w} \cdot \vec{x} \geq n$ and no updates
are done. That is, no weights in $W_C$ that are $\geq n$ are updated.
Also, weights in $W_C$ are never decreased, as this would mean a corresponding
element of the target concept output 1 on a negative example.

All of the above implies, since $\vec{w}$ starts out at $(1, \ldots, 1)$ and
$2^k < n \Rightarrow k < \log n$, that any weight in $W_C$ is updated at most
$(\log n) + 1$ times. Conclude that, since every time a mistake occurs on a
positive example at least one element of $W_C$ is updated, after
$r((\log n) + 1)$ such mistakes all elements of $C$ have weight $\geq n$, and
so no more mistakes can be made on positive examples.
2. Now consider the total sum of all weights at time $t$:
$S_t = \sum_{i=1}^{n} w_i$.
Let the number of mistakes coming from positive examples since time $t = 0$
be $P_t$. Similarly, let the number of mistakes coming from negative examples
since time $t = 0$ be $N_t$.
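If $P_j$ increases by one at some time $j$ $\Rightarrow$
$\vec{w} \cdot \vec{x} < n$ $\Rightarrow$ $S_j$ increases by at most $n$. If
$N_i$ increases by one at some time $i$ $\Rightarrow$
$\vec{w} \cdot \vec{x} \geq n$ $\Rightarrow$ $S_i$ decreases by at least $n/2$.
Note that $S_0 = n$, and $S_t > 0$ $\forall t$. Thus
$0 < S_t \leq n + P_t \cdot n - N_t \cdot (n/2)$ $\Rightarrow$
$N_t < 2 + 2P_t$.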
3. Total number of mistakes is $M_t = N_t + P_t$. From 1. we know
$P_t \leq r((\log n) + 1)$ $\forall t$, and 2. implies
$N_t < 2 + 2r((\log n) + 1)$ $\forall t$. Hence conclude that
$M_t < 2 + 3r((\log n) + 1)$ $\forall t$.
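The three bounds can be checked on a simulated run. The sketch below assumes the variant of the algorithm implied by the argument above: weights start at 1, the prediction is positive iff $\vec{w} \cdot \vec{x} \geq n$, active weights are doubled on a mistaken positive example and halved on a mistaken negative example, and $\log$ is taken base 2. The function name, the choice of relevant variables and the random examples are all illustrative.

```python
import math
import random

def winnow_mistakes(n, relevant, examples):
    """Run the assumed doubling/halving algorithm on a disjunction of the
    variables in `relevant`; return (positive mistakes, negative mistakes)."""
    w = [1.0] * n
    positive_mistakes = 0
    negative_mistakes = 0
    for x in examples:
        label = any(x[i] for i in relevant)
        total = sum(wi for wi, xi in zip(w, x) if xi)
        predicted = total >= n
        if label and not predicted:
            # Mistake on a positive example: double the active weights.
            w = [2 * wi if xi else wi for wi, xi in zip(w, x)]
            positive_mistakes += 1
        elif predicted and not label:
            # Mistake on a negative example: halve the active weights.
            w = [wi / 2 if xi else wi for wi, xi in zip(w, x)]
            negative_mistakes += 1
    return positive_mistakes, negative_mistakes

n = 16
relevant = [0, 5, 9]            # illustrative target disjunction, r = 3
rng = random.Random(0)
examples = [[rng.randint(0, 1) for _ in range(n)] for _ in range(2000)]
P, N = winnow_mistakes(n, relevant, examples)
bound = len(relevant) * (math.log2(n) + 1)
print(P, N, P <= bound, N < 2 + 2 * P, P + N < 2 + 3 * bound)
```

Because the weights only ever change by factors of 2, the floating point arithmetic here is exact for these sizes.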