
CSE/ECE 343/543: Machine Learning

Assignment-3A
Max Marks: 25    Due Date: 11:59 PM, Nov. 16, 2019

Instructions

• Keep collaboration to high-level discussions. Copying/plagiarism will be dealt with
strictly.

• Start early and solve the problems yourself. Some of these questions may be asked in
quizzes/exams.

• Use Google Colab to write and train your models. You can follow this tutorial to get
started with Google Colab. You can upload your data to Google Drive and access it
from there.

• Late submission penalty: As per course policy.

• Submission guidelines: Please refer to the post on Google Classroom. We will strictly
follow them.

THEORY QUESTIONS

1. (5 points) The KL divergence between two discrete probability distributions P and Q
is given by KL(P‖Q) = Σ_z P(z) log[P(z)/Q(z)]. Show that the cross-entropy loss used for
multi-class classification is the same as the KL divergence under certain assumptions
of the posterior distribution P (y|x), where y are the class labels, while x are the data
samples. What are these assumptions?
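
A quick numerical sanity check (not part of the required derivation) is sketched below: it compares the cross-entropy of a one-hot target with KL(P‖Q) when P is taken to be a deterministic (one-hot) posterior. The number of classes, the true label, and the predicted distribution Q are illustrative assumptions.

```python
import numpy as np

# Illustrative setup (assumed values): 4 classes, true label is class 2.
num_classes = 4
true_label = 2

# P: assumed one-hot (deterministic) posterior placing all mass on the true label.
P = np.zeros(num_classes)
P[true_label] = 1.0

# Q: some predicted class distribution (e.g., a model's softmax output).
Q = np.array([0.1, 0.2, 0.6, 0.1])

# Cross-entropy loss with a one-hot target reduces to -log Q[true_label].
cross_entropy = -np.log(Q[true_label])

# KL(P || Q) = sum_z P(z) * log(P(z) / Q(z)); terms with P(z) = 0 contribute nothing.
kl = sum(p * np.log(p / q) for p, q in zip(P, Q) if p > 0)

print(cross_entropy, kl)  # both equal -log(0.6), about 0.511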

2. (10 points) Consider the following neural network and the two activation functions:

Figure 1: Neural network with neurons N1, N2, and N3.

• A1: Signed Sigmoid function S(x) = sign[σ(x) − 0.5] = sign[1/(1 + exp(−x)) − 0.5]
• A2: Linear activation function L(x) = Cx
a) (5 points) What should the activation function (A1 or A2) be for each neuron (N1,
N2 and N3) so that the output of the neural network is the binary logistic
regression classifier Ŷ = arg max_y P(Y = y|X), where
P(Y = 1|X) = exp(β1X1 + β2X2) / (1 + exp(β1X1 + β2X2)) and
P(Y = −1|X) = 1 / (1 + exp(β1X1 + β2X2))?

b) (5 points) Derive the values of β1 and β2 in terms of the weights w0, ..., w6.
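
Before working part a) on paper, it can help to verify numerically that the signed sigmoid A1 applied to a linear score reproduces the arg-max decision rule of the logistic posterior above. The sketch below does this for assumed values of β1, β2 and a few illustrative input points; it does not reveal the mapping to the weights w0, ..., w6.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def signed_sigmoid(z):
    # A1 from the assignment: sign[sigma(z) - 0.5]
    return np.sign(sigmoid(z) - 0.5)

# Assumed (illustrative) parameters and inputs.
beta1, beta2 = 1.5, -0.7
X = np.array([[0.3, 2.0],
              [-1.0, 0.5],
              [2.0, 1.0]])

for x1, x2 in X:
    z = beta1 * x1 + beta2 * x2
    p_pos = np.exp(z) / (1.0 + np.exp(z))   # P(Y = 1 | X)
    p_neg = 1.0 / (1.0 + np.exp(z))         # P(Y = -1 | X)
    argmax_label = 1 if p_pos > p_neg else -1
    print(argmax_label, int(signed_sigmoid(z)))  # the two columns agree
```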

3. (10 points) Consider the set of binary functions over d boolean (binary) variables:

F := {f : {0, 1}^d → {0, 1}}    (1)

Prove that any function in F can be represented by a neural network with a single hidden
layer.
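
One standard construction worth experimenting with (sketched below as an illustration, not necessarily the intended proof) allocates one threshold hidden unit per input pattern on which f equals 1, with an output unit that ORs the hidden activations. The function f and the choice d = 3 are arbitrary assumptions for the demo.

```python
import itertools
import numpy as np

d = 3

# An arbitrary example boolean function of d variables: parity of the bits.
def f(x):
    return int(sum(x) % 2 == 1)

def step(z):
    # Threshold activation: 1 if z > 0 else 0.
    return 1 if z > 0 else 0

# One hidden unit per input pattern p with f(p) = 1; each unit fires only when x == p.
positives = [np.array(p) for p in itertools.product([0, 1], repeat=d) if f(p) == 1]

def network(x):
    x = np.array(x)
    hidden = []
    for p in positives:
        w = np.where(p == 1, 1, -1)   # +1 on p's ones, -1 elsewhere
        b = p.sum() - 0.5             # threshold just below the perfect-match score
        hidden.append(step(w @ x - b))
    # Output unit is an OR of the hidden units, itself a threshold unit.
    return step(sum(hidden) - 0.5)

# The single-hidden-layer network reproduces f on all 2^d inputs.
print(all(network(x) == f(x) for x in itertools.product([0, 1], repeat=d)))
```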

