
6COM1044: Neural Networks and Machine Learning

Dr Shabnam Kadir
Neural Networks and Deep Learning Tutorial 1

1. Try tinkering with deep neural networks here: https://playground.tensorflow.org/


2. Try some of the tutorials here: https://colab.research.google.com/notebooks/welcome.ipynb.
3. Load the MNIST dataset into your Jupyter notebook using TensorFlow. Plot a few digits.
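One minimal way to do this (a sketch, assuming TensorFlow is installed; Keras bundles a copy of MNIST):

    import matplotlib.pyplot as plt
    import tensorflow as tf

    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

    fig, axes = plt.subplots(1, 5, figsize=(10, 2))
    for i, ax in enumerate(axes):
        ax.imshow(x_train[i], cmap="gray")  # each image is a 28x28 uint8 array
        ax.set_title(int(y_train[i]))       # the label is the digit itself
        ax.axis("off")
    plt.show()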
4. (a) Any two successive layers in a feed-forward neural network form a bipartite graph, true or false?
(b) Do the layers need to be fully connected?
5. Use matplotlib to plot the following activation functions and their derivatives (you may use the
%matplotlib inline magic to plot within a Jupyter notebook); a starter sketch follows the list.
(i) Linear: $y = h$.
(ii) Sigmoid (logistic): $y = \frac{1}{1 + e^{-h}}$.
(iii) Tanh: $g(h) = \frac{e^h - e^{-h}}{e^h + e^{-h}}$.
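A starter sketch, assuming numpy and matplotlib are available (the grid range $[-5, 5]$ is an arbitrary choice):

    import numpy as np
    import matplotlib.pyplot as plt

    h = np.linspace(-5, 5, 200)
    sig = 1 / (1 + np.exp(-h))

    # (function, derivative) pairs for each activation
    curves = {
        "linear":  (h,          np.ones_like(h)),    # y = h,        y' = 1
        "sigmoid": (sig,        sig * (1 - sig)),    # y' = y(1 - y)
        "tanh":    (np.tanh(h), 1 - np.tanh(h)**2),  # g' = 1 - g^2 (Question 6)
    }

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    for ax, (name, (y, dy)) in zip(axes, curves.items()):
        ax.plot(h, y, label=name)
        ax.plot(h, dy, label="derivative")
        ax.set_title(name)
        ax.legend()
    plt.show()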

6. Show that for $g(h) = \frac{e^h - e^{-h}}{e^h + e^{-h}}$, $g'(h) = 1 - g(h)^2$.
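Before proving the identity algebraically, you can sanity-check it numerically (a sketch; np.gradient gives a finite-difference estimate, so expect a small residual):

    import numpy as np

    h = np.linspace(-4, 4, 1001)
    g = np.tanh(h)
    numerical = np.gradient(g, h)  # finite-difference estimate of g'(h)
    analytic = 1 - g**2
    print(np.max(np.abs(numerical - analytic)))  # should be close to 0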
7. For the following loss functions $J(y)$, compute $\frac{\partial J}{\partial y}$:
(i) $J = \frac{1}{2}(y - y^*)^2$
(ii) $J = y^* \log(y) + (1 - y^*) \log(1 - y)$
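Once you have the answers by hand, a symbolic check is straightforward (a sketch, assuming sympy is installed; ystar stands in for $y^*$):

    import sympy as sp

    y, ystar = sp.symbols("y ystar")
    J1 = sp.Rational(1, 2) * (y - ystar)**2
    J2 = ystar * sp.log(y) + (1 - ystar) * sp.log(1 - y)
    print(sp.diff(J1, y))               # compare with your answer to (i)
    print(sp.simplify(sp.diff(J2, y)))  # compare with your answer to (ii)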

8. Given the following functions,
• $J = \cos(u)$
• $u = u_1 + u_2$
• $u_1 = \sin(t)$
• $u_2 = 3t$
• $t = x^2$
(a) Write $J$ in terms of $x$.
(b) Find $\frac{\partial J}{\partial x}$ using the Chain Rule.
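As in Question 7, sympy can confirm the Chain Rule result (a sketch):

    import sympy as sp

    x = sp.symbols("x")
    t = x**2
    J = sp.cos(sp.sin(t) + 3 * t)  # J in terms of x, as in part (a)
    print(sp.diff(J, x))           # compare with your answer to (b)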

9. (Logistic regression) Given the following functions,
• $J = y^* \log(y) + (1 - y^*) \log(1 - y)$
• $y = \frac{1}{1 + e^{-a}}$
• $a = \sum_{j=0}^{D} \theta_j x_j$
(a) Draw a neural network that might represent this situation.
(b) Compute $\frac{\partial J}{\partial y}$, $\frac{\partial J}{\partial a}$, $\frac{\partial J}{\partial \theta_j}$, $\frac{\partial J}{\partial x_j}$ using the Chain Rule.
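A finite-difference check is a good way to verify your expression for $\frac{\partial J}{\partial \theta_j}$ in part (b); in this sketch, D, theta, x and y_star are made-up toy values:

    import numpy as np

    rng = np.random.default_rng(0)
    D = 4
    theta = rng.normal(size=D + 1)          # theta_0, ..., theta_D
    x = np.append(1.0, rng.normal(size=D))  # taking x_0 = 1
    y_star = 1.0

    def J(theta):
        y = 1 / (1 + np.exp(-(theta @ x)))
        return y_star * np.log(y) + (1 - y_star) * np.log(1 - y)

    j, eps = 2, 1e-6
    e_j = np.eye(D + 1)[j]
    numeric = (J(theta + eps * e_j) - J(theta - eps * e_j)) / (2 * eps)
    print(numeric)  # compare with your analytic dJ/dtheta_j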
10. You would like to design a feed-forward neural network that can drive a car. You have available some
training data consisting of greyscale 64 × 64 pixel images. The training labels $y = (\theta, v)$ give the
human driver's steering wheel angle in degrees (or radians) and the human driver's speed in km/h.
(a) How many units might you include in your input layer?
(b) Let $x$ be the training image (input) vector (which should be compatible with your answer to (a))
with a 1 component appended at the end (why?). Assume there is a hidden layer with 2048 units
and an output layer of 2 units (one for steering angle, one for speed). You decide to use the ReLU
activation function for the hidden units and no activation function for either the outputs or the
inputs. Calculate the number of parameters (weights) in this network. (Make sure you remember
the bias terms.)
(c) You train your network with the cost function $J = \frac{1}{2}|y - z|^2$, where $z \in \mathbb{R}^2$ is the output vector.
Assume the following notation:
• $r(\gamma) = \max\{0, \gamma\}$ is the ReLU activation function, and $r(v)$ is $r(\cdot)$ applied component-wise
to a vector.
• Assume all vectors are column vectors.
• $g$ is the vector of hidden unit values before the ReLU activation functions are applied, and
$h = r(g)$ is the vector of hidden unit values after they are applied (we append a 1-component
to the end of $h$).
• $V$ is the weight matrix mapping the input layer to the hidden layer; $g = Vx$.
• $W$ is the weight matrix mapping the hidden layer to the output layer; $z = Wh$.
Derive an expression for $\frac{\partial J}{\partial W_{ij}}$.
(d) Write $\frac{\partial J}{\partial W}$ as an outer product of two vectors. ($\frac{\partial J}{\partial W}$ is a matrix with the same dimensions as $W$
(why?).)
(e) Derive $\frac{\partial J}{\partial V_{ij}}$.
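To make the dimensions concrete, here is a sketch of the forward pass with random weights; the sizes follow from the question (64 × 64 = 4096 pixels plus the appended 1, 2048 hidden units, 2 outputs):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.append(rng.random(64 * 64), 1.0)   # image vector with a 1 appended
    V = rng.normal(size=(2048, 64 * 64 + 1))  # input -> hidden weights
    W = rng.normal(size=(2, 2048 + 1))        # hidden (+ appended 1) -> output weights

    g = V @ x                                 # hidden values before ReLU
    h = np.append(np.maximum(0, g), 1.0)      # h = r(g), then append the 1-component
    z = W @ h                                 # output vector (steering angle, speed)

    print(V.size + W.size)                    # total weight count, for part (b)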

11. Let the output of a neuron be given by:

$y(x) := \sigma\left(\sum_{i=0}^{N} w_i x_i\right),$

with $\sigma(z) := \tanh(z)$.

Let the training set be $(x^\mu, t^\mu)$ for $\mu = 1, \ldots, M$, where $x^\mu$ is the input vector and $t^\mu$ is the desired target
output. Let the error function be:

$E := \frac{1}{M} \sum_{\mu=1}^{M} \frac{1}{2}\left(y(x^\mu) - t^\mu\right)^2 \equiv \frac{1}{M} \sum_{\mu=1}^{M} E^\mu.$

Derive an incremental learning rule from $E$ using the gradient descent method applied separately to each
training example. (Hint: $\Delta w_i^\mu := -\eta \frac{\partial E^\mu}{\partial w_i}$, where $E^\mu = \frac{1}{2}(y(x^\mu) - t^\mu)^2$; use the expression derived in
Question 6.)
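A numerical sketch of the resulting rule on made-up toy data (the learning rate, weights and targets below are arbitrary choices; the update uses $\sigma'(z) = 1 - \sigma(z)^2$ from Question 6):

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, eta = 3, 50, 0.1
    X = np.hstack([np.ones((M, 1)), rng.normal(size=(M, N))])  # x_0 = 1 for every example
    w_true = np.array([0.5, -1.0, 2.0, 0.3])
    t = np.tanh(X @ w_true)                                    # synthetic targets
    w = np.zeros(N + 1)

    for epoch in range(200):
        for mu in range(M):                                    # one example at a time
            y = np.tanh(w @ X[mu])
            # dE^mu/dw_i = (y - t^mu) * (1 - y^2) * x_i^mu
            w -= eta * (y - t[mu]) * (1 - y**2) * X[mu]

    print(w)  # should be close to w_true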

