
CS 615 - Deep Learning

Assignment 2 - Objective Functions and Gradients


Winter 2022

Introduction
In this assignment we’ll implement our output/objective modules and add gradient computation
to each of our modules.

Allowable Libraries/Functions
Recall that you cannot use any ML functions to do the training or evaluation for you. Using basic
statistical and linear algebra functions like mean, std, cov, etc. is fine, but using ones like train is
not. Using any ML-related functions may result in a zero for the programming component. In
general, use the “spirit of the assignment” (where we’re implementing things from scratch) as your
guide, but if you want clarification on whether you can use a particular function, DM the professor on Slack.

Grading
Do not modify the public interfaces of any code skeleton given to you. Class and
variable names should be exactly the same as the skeleton code provided, and no default
parameters should be added or removed.

Theory                                                        18 pts
Implementation of non-input, non-objective gradient methods  20 pts
Implementation of objective layers                            40 pts
Tests on non-input, non-objective gradient methods            10 pts
Tests on objective layers’ loss computations                   4 pts
Tests on objective layers’ gradient computations               8 pts
TOTAL                                                         100 pts

Table 1: Grading Rubric

1 Theory

1. (8 points) Given h = [1 2 3 4] as an input, compute the gradients of the output with respect
to this input for:

(a) A ReLU layer:
ReLU passes a positive input through unchanged and outputs 0 otherwise, so its derivative is 1
for positive inputs and 0 elsewhere. These derivatives form the diagonal of the Jacobian, and
since every entry of h is positive, the result is the identity matrix. Therefore,
Y = [ 1 0 0 0
      0 1 0 0
      0 0 1 0
      0 0 0 1 ]
(b) A Softmax layer:
On the diagonal, where i = j, the gradient is g_i(z)(1 − g_i(z)), and off the diagonal, where
i ≠ j, it is −g_i(z) g_j(z), where g(z) is the softmax output. For h = [1 2 3 4],
g(z) ≈ [0.0321 0.0871 0.2369 0.6439]. Therefore,
Y = [  0.03103085 −0.00279373 −0.00759413 −0.02064299
      −0.00279373  0.07955019 −0.02064299 −0.05611347
      −0.00759413 −0.02064299  0.18076935 −0.15253222
      −0.02064299 −0.05611347 −0.15253222  0.22928869 ]

(c) A Sigmoid layer:
With g(z) = 1/(1 + e^(−z)), the gradient is a diagonal matrix whose entries are g(z)(1 − g(z))
evaluated at each input. Therefore,
Y = [ 0.19661193 0.         0.         0.
      0.         0.10499359 0.         0.
      0.         0.         0.04517666 0.
      0.         0.         0.         0.01766271 ]

(d) A Tanh layer:
With tanh(z) = (e^z − e^(−z))/(e^z + e^(−z)), the gradient is a diagonal matrix whose entries
are 1 − tanh(z)^2 evaluated at each input. Therefore,
Y = [ 0.41997434 0.         0.         0.
      0.         0.07065082 0.         0.
      0.         0.         0.00986604 0.
      0.         0.         0.         0.00134095 ]
(A short numerical check of the Jacobians in this question is sketched after Question 6.)
 
2. (2 points) Given h = [1 2 3 4] as an input, compute the gradient of the output of a fully
connected layer with respect to this input if the fully connected layer has weights of
W = [ 1 2
      3 4
      5 6
      7 8 ]
and biases b = [−1 2].

Answer: The gradient of the output of a fully connected layer with respect to its input is the
transpose of the weight matrix (W^T). Therefore,
Y = [ 1 3 5 7
      2 4 6 8 ]

3. (2 points) Given a target value of y = 0 and an estimated value of ŷ = 0.2, compute the loss
for:

(a) A squared error objective function: (y − ŷ)^2
= (0 − 0.2)^2
Therefore, Y = 0.04000000000000001

(b) A log loss (negative log likelihood) objective function: −((y ∗ ln(ŷ)) + ((1 − y) ∗ ln(1 − ŷ)))
= −((0 ∗ ln(0.2)) + ((1 − 0) ∗ ln(1 − 0.2)))
Therefore, Y = 0.2231435513142097

4. (1 point) Given a target distribution of y = [1, 0, 0] and an estimated distribution of
ŷ = [0.2, 0.2, 0.6], compute the cross entropy loss.

Answer: Cross Entropy Loss = −Σ_{k=1}^{K} y_k ln(ŷ_k)
= −(1 ∗ ln(0.2) + 0 ∗ ln(0.2) + 0 ∗ ln(0.6)) = −ln(0.2)
Therefore, Y = 1.6094379124341003

5. (4 points) Given a target value of y = 0 and an estimated value of ŷ = 0.2, compute the gradient
of the following objective functions with regards to their input, ŷ:

(a) A squared error objective function: −2(y − ŷ)
= −2(0 − 0.2)
Therefore, Y = 0.4

(b) A log loss (negative log likelihood) objective function: −(y − ŷ)/(ŷ(1 − ŷ))
= −(0 − 0.2)/(0.2(1 − 0.2))
Therefore, Y = 1.2499999999999998

6. (1 point) Given a target distribution of y = [1, 0, 0] and an estimated distribution of
ŷ = [0.2, 0.2, 0.6], compute the gradient of the cross entropy loss function with regard to the
input distribution ŷ.

Answer: Gradient of cross entropy loss function = −y/ŷ
Therefore, Y = [−5. −0. −0.]
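
For reference, here is a small numerical check of the Question 1 Jacobians (a sketch using
NumPy; it is not part of the required submission and the variable names are my own):

import numpy as np

h = np.array([1.0, 2.0, 3.0, 4.0])

# ReLU: elementwise, so the Jacobian is diagonal with 1 wherever h > 0.
relu_grad = np.diag((h > 0).astype(float))

# Softmax: Jacobian is diag(g) - g g^T, where g = softmax(h).
g = np.exp(h) / np.exp(h).sum()
softmax_grad = np.diag(g) - np.outer(g, g)

# Sigmoid: diagonal Jacobian with entries s(h) * (1 - s(h)).
s = 1.0 / (1.0 + np.exp(-h))
sigmoid_grad = np.diag(s * (1.0 - s))

# Tanh: diagonal Jacobian with entries 1 - tanh(h)^2.
tanh_grad = np.diag(1.0 - np.tanh(h) ** 2)

print(relu_grad, softmax_grad, sigmoid_grad, tanh_grad, sep="\n\n")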

2 Adding Gradient Methods
To each of your non-input modules from HW1 (FullyConnectedLayer, ReLuLayer, SoftmaxLayer,
TanhLayer, and SigmoidLayer), add a gradient method that computes and returns the gradient of
the most recent output of the layer with respect to its most recent input (both of which should
have been stored in the parent class and updated in the forward method).
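
As a rough illustration only (this is a sketch, not the required implementation; the attribute
names prev_in and prev_out are placeholders of my own rather than the HW1 parent-class API),
a gradient method for an elementwise activation such as the sigmoid might look like:

import numpy as np

class SigmoidLayer:
    # In the actual assignment this class would extend the HW1 parent class
    # and use its stored previous input/output instead of these placeholders.
    def forward(self, data_in):
        self.prev_in = np.asarray(data_in, dtype=float)
        self.prev_out = 1.0 / (1.0 + np.exp(-self.prev_in))
        return self.prev_out

    def gradient(self):
        # Elementwise activation => diagonal Jacobian of the most recent
        # output with respect to the most recent input: g(z) * (1 - g(z)).
        g = self.prev_out.flatten()
        return np.diag(g * (1.0 - g))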

3 Objective Layers
Next, let’s implement a module for each of our objective functions. These modules should implement
(at least) two methods:

• eval - This method takes two explicit parameters, the target value and the incoming/estimated
value, and computes and returns the loss (as a single float value) according to the module’s
objective function.

• gradient - This method takes the same two explicit parameters as the eval method and computes
and returns the gradient of the objective function using those parameters.

Implement these for the following objective functions:

• Least Squares as LeastSquares

• Log Loss (negative log likelihood) as LogLoss

• Cross Entropy as CrossEntropy

Your public interface is:


class XXX():
    def eval(self, y, yhat):
        #TODO

    def gradient(self, y, yhat):
        #TODO

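For instance, a minimal sketch of the LeastSquares module under this interface might be (the
choice to average rather than sum over elements is an assumption; follow whatever convention
the course slides use):

import numpy as np

class LeastSquares():
    def eval(self, y, yhat):
        # Squared-error loss, returned as a single float.
        return float(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2))

    def gradient(self, y, yhat):
        # Gradient of (y - yhat)^2 with respect to yhat.
        return -2 * (np.asarray(y) - np.asarray(yhat))
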
4 Testing the gradient methods
For each non-input, non-objective layer (FullyConnectedLayer (w/ two outputs), ReLuLayer,
SoftmaxLayer, TanhLayer, and SigmoidLayer):
1. Instantiate the layer (for reproducibility, seed your random number generator to zero prior to
running your tests).
2. Forward propagate through the layer using the input provided below.
3. Run your gradient method to get the gradient of the output of the layer with respect to its
input.
Here’s the input (single observation) to each of these layers:
 
h = [1 2 3 4]
In your report provide the gradient returned by each layer. (A possible driver outline for these
tests is sketched at the end of this section.)

• Gradient of Fully Connected Layer:


 
Y = [ 9.76270079e−06  2.05526752e−05 −1.52690401e−05 −1.24825577e−05
      4.30378733e−05  8.97663660e−06  2.91788226e−05  7.83546002e−05 ]

• Gradient of ReLU Activation Layer:


 
Y = [ 1 0 0 0
      0 1 0 0
      0 0 1 0
      0 0 0 1 ]

• Gradient of Softmax Activation Layer:


 
Y = [  0.03103085 −0.00279373 −0.00759413 −0.02064299
      −0.00279373  0.07955019 −0.02064299 −0.05611347
      −0.00759413 −0.02064299  0.18076935 −0.15253222
      −0.02064299 −0.05611347 −0.15253222  0.22928869 ]

• Gradient of Tanh Activation Layer:


 
Y = [ 0.58781675 0.         0.         0.
      0.         0.44338254 0.         0.
      0.         0.         0.4231454  0.
      0.         0.         0.         0.42040353 ]

• Gradient of Sigmoid Activation Layer:


 
Y = [ 0.21936186 0.         0.         0.
      0.         0.20715623 0.         0.
      0.         0.         0.20087901 0.
      0.         0.         0.         0.19824029 ]

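A possible outline of the test driver for this part (a sketch only; the constructor signature
FullyConnectedLayer(4, 2) and the import comments below are assumptions based on the HW1
skeleton and your own file layout):

import numpy as np
# The layer classes are assumed importable from your own HW1/HW2 modules, e.g.:
# from FullyConnectedLayer import FullyConnectedLayer
# from ReLuLayer import ReLuLayer
# ... and likewise for SoftmaxLayer, TanhLayer, SigmoidLayer.

np.random.seed(0)  # seed before instantiating, for reproducibility

h = np.array([[1, 2, 3, 4]])

layers = [FullyConnectedLayer(4, 2), ReLuLayer(), SoftmaxLayer(),
          TanhLayer(), SigmoidLayer()]

for layer in layers:
    layer.forward(h)                # forward propagate the provided observation
    print(type(layer).__name__)
    print(layer.gradient())         # gradient of the output w.r.t. the input
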
5 Testing the Objective Layers
Finally we’ll test the objective layers. For each objective module you’ll:

• Instantiate the layer/module

• Evaluate the objective function using the provided estimate and target value(s).

• Compute (and return) the gradient of the objective function given the estimated and target
value(s).

For this you’ll use the following target (y) and estimated (ŷ) values for the least squares and log loss
objective functions:

y = 0
ŷ = 0.2

and the following for the cross-entropy objective function:

y = [1 0 0]
ŷ = [0.2 0.2 0.6]
In your report provide the evaluation of the objective function and the gradient returned by each of
these output layers. (A possible driver sketch for these tests follows the results below.)

• Least Squares

Evaluation: 0.04000000000000001
Gradient: 0.4

• Log Loss(Negative Log Likelihood)

Evaluation: 0.2231435513142097
Gradient: 1.2499999999999998

• Cross Entropy

Evaluation: 1.6094379124341003

Gradient: [−5. −0. −0.]
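
A corresponding driver sketch for these objective-layer tests (again an assumption-laden
outline; the import comments and file layout are hypothetical):

import numpy as np
# Objective classes are assumed importable from your own modules, e.g.:
# from LeastSquares import LeastSquares
# from LogLoss import LogLoss
# from CrossEntropy import CrossEntropy

cases = [(LeastSquares(), 0.0, 0.2),
         (LogLoss(), 0.0, 0.2),
         (CrossEntropy(), np.array([1.0, 0.0, 0.0]), np.array([0.2, 0.2, 0.6]))]

for objective, y, yhat in cases:
    print(type(objective).__name__)
    print("loss:", objective.eval(y, yhat))          # single float loss value
    print("gradient:", objective.gradient(y, yhat))  # gradient w.r.t. yhat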

Submission
For your submission, upload to Blackboard a single zip file containing:

1. PDF Writeup

2. Source Code

3. readme.txt file

The readme.txt file should contain information on how to run your code to reproduce results for each
part of the assignment.

The PDF document should contain the following:

1. Part 1: Your solutions to the theory questions

2. Parts 2-3: Nothing

3. Part 4: The gradient of the output of each layer with respect to its input, where the provided
h is the input.

4. Part 5: The loss of each objective layer using the provided y and ŷ as well as the gradient of
the objective functions, with regards to their input (ŷ).
