Assignment 2

18.657.
Fall 2105
Rigollet October 18, 2015
Problem set #2 (due Wed., October 21)
Should be typed in LATEX
Problem 1. Rademacher Complexities and beyond

Let F be a class of functions from X to IR and let X1 , . . . , Xn be iid copies of a
random variable X ∈ X . Moreover, let σ1 , . . . , σn be n i.i.d. Rad(1/2) random variables
and let g1 , . . . , gn be n i.i.d. N(0, 1). Assume that all these random variables are mutually
independent.
1. Prove the desymmetrization inequality:

1 Xn 1 Xn
h i h i
IE sup σi f (Xi ) − IE[f (X)] ≤ 2IE sup f (Xi ) − IE[f (X)]

f ∈F n i=1 f ∈F n i=1
2. Prove the Rademacher/Gaussian process comparison inequality

h Xn i rπ h Xn i
IE sup σi f (Xi ) ≤ IE sup gi f (Xi )
f ∈F i=1 2 f ∈F i=1
n
h 1 X i
Define Rn (F ) = IE sup σ f (X i . Let F and G be two set of functions from X to
)

i
n

f ∈F i=1
IR and recall that F + G = {f + g : f ∈ F , g ∈ G}.
3. Let h ∈ IRX be a given function and define F + h = {f + h : f ∈ F }. Show that
khk∞
Rn (F + {h}) ≤ Rn (F ) + √ ,
n
where khk∞ = supx∈X |h(x)|.
4. Let F1 , . . . , Fk be k sets of functions from X to IR. Show that

k
X
Rn (F1 + · · · , Fk ) ≤ Rn (Fj ) .
j=1
5. Show that this inequality derived in 4. is in fact an equality when the Fj s are the
same.
page 1 of 4
Problem 2. Covering and packing
Definition: A set P ⊂ T is called an ε-packing of the metric space (T, d) if d(f, g) > ε
for every f, g ∈ P , f =
6 g. The largest cardinality of an ε-packing of (T, d) is called
the packing number of (T, d):

D(T, d, ε) = sup card(P ) : P is an ε packing of (T, d)
Recall that N(T, d, ε) denotes the ε-covering number of (T, d).
1. Show that
D(T, d, 2ε) ≤ N(T, d, ε) ≤ D(T, d, ε)
Let M be an n × m random matrix with entries that are i.i.d Rad(1/2) entries. We
are interested in its operator norm
kMk = sup u⊤ Mv .
u∈IRn : |u|2 ≤1
v∈IRm : |v|2 ≤1
2. Show that
kMk ≤ 2 max u⊤ Mv ,
u∈Nn
v∈Nm
where Nn and Nm are 41 -nets of the unit Euclidean balls of IRn and IRm respectively.
3. Conclude that √ √
IEkMk ≤ C m+ n .
Problem 3. Chaining
Let F be the class of all nondecreasing functions from [0, 1] to [0, 1].
1. Show that for any x = (x1 , . . . , xn ) ∈ [0, 1]n , the covering number of (F , dx∞) satisfy:
N(F , dx∞ , ε) ≤ n2/ε .
2. Using the chaining bound, show that

r
log n
Rn (F ) ≤ C
n
page 2 of 4
3. Show that there is indeed a strict improvement over the bound obtained using the
theorem in section 5.2.1
Problem 4. Kernel ridge regression

Consider the regression model:
Yi = f (xi ) + ξi , , i = 1, . . . , n
where x1 , . . . , xn are fixed design points in IRd , ξ = (ξ1 , . . . , ξn ) ∼ N (0, Σ) ∈ IRn with
known covariance matrix Σ ≻ 0 and f : IRd → IR is an unknown regression function.
Let W be an RKHS on IRd with reproducing kernel k. Define Y = (Y1 , . . . , Yn )⊤ and
g = [g(x1 ), . . . , g(xn )]⊤ for any function g. Define the estimator fˆ of f by
fˆ = argmin ψ(Y − g) + µkgk2W

g∈W
where k · kW denotes the Hilbert norm on W , ψ(x) = x⊤ Σ−1/2 x and µ > 0 is a tuning
parameter to be chosen later.
1. Prove the representer theorem, i.e., that there exists a vector θ ∈ IRn such that
n
fˆ(x) =
X
θi k(xi , x) , for any x ∈ IRd
i=1
2. Prove that the vector f̂ = [fˆ(x1 ), . . . , f(x

ˆ n )]⊤ satisfies
(KΣ−1/2 + µIn )f̂ = KΣ−1/2 Y ,
where In is the identity matrix of IRn and K denotes the symmetric n × n matrix
with elements Ki,j = k(xi , xj ).
3. Prove that the following inequality holds

n
2
1 X 2
ψ(f − f̂) ≤ inf ψ(f − g) + 2µkgkW + Zi k(xi , ·) W ,
g∈W µ i=1
where Z1 , . . . , Zn are iid N (0, 1).
4. Conclude that
2
1
IEψ(f − f̂) ≤ inf ψ(f − g) + 2µkg kW + Tr(K) ,
g∈W µ
where Tr(K) denotes the trace of K.
page 3 of 4
5. Assume now that k is the Gaussian kernel:
′ 2
k(x, x′ ) = e−|x−x |2
Show that there exists a choice of µ for which

√
IEψ(f − f̂) ≤ 2kf kW 2n .
page 4 of 4
MIT OpenCourseWare
http://ocw.mit.edu
18.657 Mathematics of Machine Learning

Fall 2015
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Assignment 2

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Assignment 2

Uploaded by

Copyright:

Available Formats

18.657.

Problem set #2 (due Wed., October 21)

Should be typed in LATEX

Problem 1. Rademacher Complexities and beyond

1. Prove the desymmetrization inequality:

2. Prove the Rademacher/Gaussian process comparison inequality

4. Let F1 , . . . , Fk be k sets of functions from X to IR. Show that

Recall that N(T, d, ε) denotes the ε-covering number of (T, d).

N(F , dx∞ , ε) ≤ n2/ε .

2. Using the chaining bound, show that

Problem 4. Kernel ridge regression

fˆ = argmin ψ(Y − g) + µkgk2W

2. Prove that the vector f̂ = [fˆ(x1 ), . . . , f(x

(KΣ−1/2 + µIn )f̂ = KΣ−1/2 Y ,

3. Prove that the following inequality holds

where Z1 , . . . , Zn are iid N (0, 1).

Show that there exists a choice of µ for which

18.657 Mathematics of Machine Learning

You might also like