
C4B

Machine Learning I
A. Zisserman, Hilary Term 2009

1. Given the following training data for a {0, 1} binary classifier:

   (x_1 = 0.8, y_1 = 1), (x_2 = 0.4, y_2 = 0), (x_3 = 0.6, y_3 = 1)

   determine the output of a K Nearest Neighbour (K-NN) classifier for all points on the interval 0 ≤ x ≤ 1 using:

   (a) 1-NN
   (b) 3-NN

2. A regression algorithm is defined using the mean of the K Nearest Neighbours of a test point. Determine the output on the interval 0 ≤ x ≤ 1 using the training data in question (1) for K = 2. (A sketch implementation of questions 1 and 2 follows question 3.)

3. Two students are working on a machine-learning approach to spam detection. Each student has their own set of 100 labeled emails, 90% of which are used for training and 10% for validating the model. Student A runs a K-NN classification algorithm and reports 80% accuracy on her validation set. Student B experiments with over 100 different learning algorithms, training each one on his training set and recording the accuracy on the validation set. His best formulation achieves 90% accuracy. Whose algorithm would you pick for protecting a corporate network from spam? Why?
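Not part of the original sheet: a minimal Python sketch for checking the answers to questions 1 and 2. The data and the values of K come from the sheet; the function names and the grid of test points are illustrative choices.

```python
# 1-D K-NN classification (question 1) and regression (question 2).
# Ties in distance are broken arbitrarily by sort order.

X = [0.8, 0.4, 0.6]   # training inputs from question 1
Y = [1, 0, 1]         # training labels from question 1

def nearest_labels(x, k):
    """Labels of the k training points nearest to x."""
    return [y for _, y in sorted(zip(X, Y), key=lambda p: abs(p[0] - x))[:k]]

def knn_classify(x, k):
    """Majority vote over the k nearest {0, 1} labels."""
    return int(sum(nearest_labels(x, k)) > k / 2)

def knn_regress(x, k):
    """Mean of the k nearest targets."""
    return sum(nearest_labels(x, k)) / k

# Tabulate the outputs on a grid over 0 <= x <= 1.
for x in [i / 20 for i in range(21)]:
    print(f"x={x:.2f}  1-NN={knn_classify(x, 1)}  "
          f"3-NN={knn_classify(x, 3)}  K=2 mean={knn_regress(x, 2):.1f}")
```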

4. Are the following sets of points linearly separable?

   (a) S_1: (1, 2, 0), (2, 4, 0), (3, 1, 0)
       S_2: (2, 4, 1), (1, 5, 1), (5, 0, 1)

   (b) S_1: (1, 2), (2, 4), (3, 1)
       S_2: (2, 4), (1, 5), (5, 0)

   (c) Describe how the convex hulls of the sets can be used to determine whether they are linearly separable.

5. For a linear SVM, f(x) = w^T x + b, show:

   (a) that the value of b can be computed from w and one support vector;

   (b) that the vector w in the primal cost function can be expressed as w = \sum_{i=1}^{N} \alpha_i x_i, where the x_i are the training data. (Hint: start by expressing w = \sum_{i=1}^{N} \alpha_i x_i + w_\perp, where w_\perp is orthogonal to x_i for all i.)

6. Show that if k_1(x, x') and k_2(x, x') are both valid kernels, then so is k_1(x, x') + k_2(x, x'). (Hint: start from the properties of the Gram matrix K_i associated with k_i(x, x').)

7. K-NN can be used in a transformed feature space x -> \phi(x) using the kernel trick. By expanding ||\phi(x) - \phi(x')||^2, show that the distance between two points can be written in terms of kernels. (A numerical illustration of questions 6 and 7 is sketched below.)
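Not part of the original sheet: a small numerical illustration for questions 6 and 7. The linear and RBF kernels, the sample points, and the helper names are all arbitrary choices for the sketch; the identity expanded in the last function is the one question 7 asks for.

```python
import numpy as np

def k1(x, xp):
    """Linear kernel (an arbitrary choice of valid kernel)."""
    return float(np.dot(x, xp))

def k2(x, xp, gamma=0.5):
    """RBF kernel (another arbitrary valid kernel)."""
    return float(np.exp(-gamma * np.sum((x - xp) ** 2)))

X = [np.array(p, dtype=float) for p in [(1, 2), (2, 4), (3, 1)]]

# Question 6: the Gram matrix of k1 + k2 is the sum of two positive
# semi-definite Gram matrices, so it is itself PSD; its eigenvalues
# should all be >= 0 (up to round-off).
K = np.array([[k1(a, b) + k2(a, b) for b in X] for a in X])
print(np.linalg.eigvalsh(K))

# Question 7: squared distance in feature space via the kernel trick,
#   ||phi(x) - phi(x')||^2 = k(x, x) - 2 k(x, x') + k(x', x')
def kernel_dist2(k, x, xp):
    return k(x, x) - 2 * k(x, xp) + k(xp, xp)

print(kernel_dist2(k2, X[0], X[1]))
```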
8. (a) Show that for the logistic sigmoid function \sigma(z),

       \frac{d\sigma}{dz} = \sigma(1 - \sigma)

   (b) The negative log-likelihood for logistic regression training is

       L(w) = -\sum_{i=1}^{n} \left[ y_i \log \sigma(w^T x_i) + (1 - y_i) \log\left(1 - \sigma(w^T x_i)\right) \right]

       Show that its gradient has the simple form

       \frac{dL}{dw} = -\sum_{i=1}^{n} \left( y_i - \sigma(w^T x_i) \right) x_i
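Not part of the original sheet: a numerical check for question 8, verifying the sigmoid derivative identity and the stated gradient by finite differences. The data, weights, and step size are made up for illustration.

```python
import numpy as np

def sigma(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

eps = 1e-6

# (a) d(sigma)/dz = sigma(1 - sigma), checked at an arbitrary point.
z = 0.3
numeric = (sigma(z + eps) - sigma(z - eps)) / (2 * eps)
print(numeric, sigma(z) * (1 - sigma(z)))   # should agree closely

# (b) Gradient of the negative log-likelihood on made-up data.
X = np.array([[0.8, 1.0], [0.4, 1.0], [0.6, 1.0]])  # inputs (bias appended)
y = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -0.2])

def L(w):
    s = sigma(X @ w)
    return -np.sum(y * np.log(s) + (1 - y) * np.log(1 - s))

analytic = -X.T @ (y - sigma(X @ w))   # -sum_i (y_i - sigma(w^T x_i)) x_i
numeric = np.array([(L(w + eps * e) - L(w - eps * e)) / (2 * eps)
                    for e in np.eye(len(w))])
print(analytic, numeric)               # should agree closely
```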