
CSE 5523: Machine Learning Theory Fall 2017

Homework 1
Instructor: Raef Bassily Due on: Mon Sep 18

Instructions and Notes


• Show your reasoning for whatever results you provide.

• You can use known results from the literature, but you need to cite your source.
• Notation: For any positive integer $k$, the notation $[k]$ is used to denote $\{1, \ldots, k\}$. For $d$-dimensional vectors $x = (x_1, \ldots, x_d)$, $y = (y_1, \ldots, y_d)$, the notation $\langle x, y \rangle$ is used to denote the inner product between $x$ and $y$; that is, $\langle x, y \rangle = \sum_{i=1}^{d} x_i y_i$. Also, $\|x\|_p$ will denote the $L_p$ norm of a vector $x$, i.e., $\|x\|_p = \left( |x_1|^p + \ldots + |x_d|^p \right)^{1/p}$.
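As a quick numerical illustration of this notation (a minimal sketch, assuming NumPy; the vectors below are arbitrary examples):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 1.0, -1.0])

inner = np.dot(x, y)                    # <x, y> = sum_{i=1}^{d} x_i * y_i
l1_norm = np.linalg.norm(x, ord=1)      # ||x||_1 = |x_1| + ... + |x_d|
l2_norm = np.linalg.norm(x, ord=2)      # ||x||_2 = (|x_1|^2 + ... + |x_d|^2)^(1/2)

print(inner, l1_norm, l2_norm)
```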

Problem 1 [5 points]
Consider random variables $X$ and $Y$. Let $U = \min(X, Y)$ and $V = \max(X, Y)$.

1. Suppose $\mathbb{E}[X] = \mathbb{E}[Y] = 1$. What is $\mathbb{E}[U + V]$? Suppose, moreover, that $X$ and $Y$ are independent. What is $\mathbb{E}[UV]$?

2. Suppose $X$ and $Y$ are identically distributed and have a continuous joint density function. What are $\Pr[U = X]$ and $\Pr[U = Y]$?
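If you want to sanity-check your answers empirically, here is a minimal Monte Carlo sketch (assuming NumPy; the Exponential(1) distribution is an arbitrary illustrative choice with mean 1, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1_000_000

# Illustrative choice: independent X, Y with E[X] = E[Y] = 1.
X = rng.exponential(scale=1.0, size=n_samples)
Y = rng.exponential(scale=1.0, size=n_samples)

U = np.minimum(X, Y)
V = np.maximum(X, Y)

print("estimate of E[U + V]:", np.mean(U + V))
print("estimate of E[U V]  :", np.mean(U * V))
print("estimate of P[U = X]:", np.mean(U == X))
```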

Problem 2 [10 points]


Suppose that $x_1, \ldots, x_T$ are uniformly and independently drawn from $\{-1, 1\}^n$ (i.e., for each $i \in [T]$, $x_i = (x_{i,1}, \ldots, x_{i,n})$ is a vector of $n$ independent random variables that take values $\pm 1$ with equal probability).
1. Find an upper bound (the tighter, the better) on the probability of the following event:

$\exists\, i, j \in [T],\ i \neq j, \text{ such that } \langle x_i, x_j \rangle > n/10.$

2. Suppose $T$ is an even number; that is, $T = 2M$ for some positive integer $M$. Suppose $n \geq 100$. For each $k \in [M]$, let $w_k \triangleq \langle x_{2k-1}, x_{2k} \rangle$. Let $\widehat{w} \triangleq \mathrm{median}(w_1, \ldots, w_M)$. Show that $\Pr[\widehat{w} > n/10] \leq e^{-2c^2 M}$, where $c = \frac{1}{2} - e^{-2}$.
[See the hints section below for a hint on Part 2.]
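As an optional empirical look at the quantities in this problem, here is a minimal simulation sketch (assuming NumPy; the values of n and T below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 100, 20                 # arbitrary illustrative sizes; T = 2M
M = T // 2

# Draw x_1, ..., x_T uniformly and independently from {-1, 1}^n.
x = rng.choice([-1, 1], size=(T, n))

# w_k = <x_{2k-1}, x_{2k}> for the M disjoint pairs (0-based indexing below).
w = np.array([x[2 * k] @ x[2 * k + 1] for k in range(M)])

w_hat = np.median(w)
print("pairwise inner products w_1..w_M:", w)
print("median w_hat:", w_hat, " exceeds n/10?", w_hat > n / 10)
```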

Problem 3 [5 points]
Let $\ell : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a real-valued function that takes two real variables as inputs. For each $z \in \mathbb{R}$, let $\widehat{w}(z) \in \arg\min_{w \in \mathbb{R}} \ell(w, z)$; that is, $\widehat{w}(z)$ is a minimizer of $\ell(\cdot, z)$ (assume such a minimizer always exists). Let $Z$ be a real-valued random variable. Let $w^* \in \arg\min_{w \in \mathbb{R}} \mathbb{E}[\ell(w, Z)]$. Prove that $\mathbb{E}[\ell(\widehat{w}(Z), Z)] \leq \mathbb{E}[\ell(w^*, Z)]$.
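To build intuition for the two quantities being compared, here is a minimal numerical sketch with a hypothetical choice of loss (the squared loss; assuming NumPy, and not part of the problem statement):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(loc=2.0, scale=1.0, size=200_000)   # illustrative distribution for Z

def loss(w, z):
    # Hypothetical loss for illustration: l(w, z) = (w - z)^2.
    return (w - z) ** 2

w_hat = Z              # for squared loss, the per-instance minimizer is w_hat(z) = z
w_star = np.mean(Z)    # for squared loss, E[l(w, Z)] is minimized at w = E[Z]

print("E[l(w_hat(Z), Z)]:", np.mean(loss(w_hat, Z)))
print("E[l(w_star, Z)]  :", np.mean(loss(w_star, Z)))
```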
Problem 4 [10 points]
Let $x^{(1)}, \ldots, x^{(n)}$ be a fixed (deterministic) collection of $n$ points in $\{-1, 1\}^d$; that is, for each $i \in [n]$, $x^{(i)} = (x^{(i)}_1, \ldots, x^{(i)}_d) \in \{-1, +1\}^d$. For each $i \in [n]$, $x^{(i)}$ is independently randomized and we obtain a noisy version $z^{(i)}$ as follows: for each coordinate $z^{(i)}_j$, $j \in [d]$,
$$z^{(i)}_j = \begin{cases} \dfrac{x^{(i)}_j}{2\epsilon} & \text{with probability } \frac{1}{2} + \epsilon; \\[6pt] -\dfrac{x^{(i)}_j}{2\epsilon} & \text{with probability } \frac{1}{2} - \epsilon, \end{cases}$$

for some fixed $\epsilon \in (0, 1/2)$. Also, all coordinates are randomized independently of each other. Let $y = (y_1, \ldots, y_d)$ be a fixed (deterministic) point in $\{-1, +1\}^d$. Let $\delta \in (0, 1)$.

1. Find a value for $\alpha$ (expressed as a function of $\delta$, $n$, $d$, $\epsilon$) such that
   $$\frac{1}{n} \left| \sum_{i=1}^{n} \langle z^{(i)}, y \rangle - \sum_{i=1}^{n} \langle x^{(i)}, y \rangle \right| \leq \alpha \quad \text{with probability at least } 1 - \delta.$$

(Note: find the smallest ↵ you can get; that is, use the tightest concentration inequality among those
we studied that is applicable to this scenario.)
2. Let $\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z^{(i)}$. Find a value for $\alpha$ (expressed as a function of $\delta$, $n$, $d$, $\epsilon$) such that
   $$\|\bar{z} - \mathbb{E}[\bar{z}]\|_2 \leq \alpha \quad \text{with probability at least } 1 - \delta.$$

[See the hints section below for a hint on Part 2.]
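A minimal sketch of the noising mechanism defined in this problem, assuming NumPy (the values of n, d, and epsilon below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 50, 10, 0.1              # arbitrary illustrative values

# Fixed (deterministic) points x^(1), ..., x^(n) in {-1, +1}^d.
x = rng.choice([-1, 1], size=(n, d))

# Each coordinate keeps its sign with probability 1/2 + eps and flips it otherwise,
# and is scaled by 1/(2*eps); this makes each coordinate of z unbiased: E[z] = x.
signs = np.where(rng.random(size=(n, d)) < 0.5 + eps, 1.0, -1.0)
z = signs * x / (2 * eps)

z_bar = z.mean(axis=0)
print("x_bar (= E[z_bar]):", x.mean(axis=0))
print("z_bar             :", z_bar)
```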

Hints
• Problem 2 - Part 2:
– Define $E_k = \mathbb{1}[w_k > n/10]$, $k \in [M]$. Note that $E_1, \ldots, E_M$ is a sequence of random variables. What can you say about them? For any $k \in [M]$, show that $\mathbb{E}[E_k] \leq e^{-2}$ (use the fact that $n \geq 100$). Now, think about what the event $\widehat{w} > n/10$ implies about $\sum_{k=1}^{M} E_k$. Now that you have an event in terms of $\sum_{k=1}^{M} E_k$, finalize your proof by applying the Chernoff-Hoeffding bound.

• Problem 4 - Part 2:
– You may need to use McDiarmid's inequality with $f(z^{(1)}, \ldots, z^{(n)}) = \|\bar{z} - \mathbb{E}[\bar{z}]\|_2$ being the function of bounded sensitivity. In the version of McDiarmid's inequality stated in class, the function was defined over $n$ scalar random variables. One can state the exact same bound for functions defined over $n$ $d$-dimensional vectors. In this part of the problem, you may need to consider $f(z^{(1)}, \ldots, z^{(n)}) = \|\bar{z} - \mathbb{E}[\bar{z}]\|_2$ as a function of the $n$ vectors $z^{(1)}, \ldots, z^{(n)}$.
– To complete your proof, you may need to use the fact that $\mathbb{E}\left[\|\bar{z} - \mathbb{E}[\bar{z}]\|_2\right] \leq \sqrt{\mathbb{E}\left[\|\bar{z} - \mathbb{E}[\bar{z}]\|_2^2\right]}$. You can then explicitly evaluate the quantity on the right-hand side (in terms of $\epsilon$, $d$, and $n$). Use this together with McDiarmid's theorem to obtain a bound on the probability of the event in question.
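To get a feel for the quantity $\|\bar{z} - \mathbb{E}[\bar{z}]\|_2$ being bounded in Problem 4, Part 2, here is a minimal Monte Carlo sketch (assuming NumPy; the values of n, d, epsilon, and the number of trials are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps, trials = 200, 20, 0.1, 2000     # arbitrary illustrative values

x = rng.choice([-1, 1], size=(n, d))       # fixed points; E[z_bar] = x.mean(axis=0)
x_bar = x.mean(axis=0)

deviations = np.empty(trials)
for t in range(trials):
    signs = np.where(rng.random(size=(n, d)) < 0.5 + eps, 1.0, -1.0)
    z_bar = (signs * x / (2 * eps)).mean(axis=0)
    deviations[t] = np.linalg.norm(z_bar - x_bar, ord=2)

print("empirical E[ ||z_bar - E[z_bar]||_2 ]:", deviations.mean())
print("empirical 95th percentile            :", np.quantile(deviations, 0.95))
```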
