
ML4NLU Page 1 Matr.Nr.

Machine Learning for Natural Language Understanding


Exercise WS 2023/2024
Mandatory Exercise, Due: 04.02.2024
(Upload solutions via Moodle test)
Good luck!

Multiple Choice 10 Points

Are the following statements true or false? (Mark each statement True or False.)

1. The sigmoid function hθ(x) is smooth and symmetric at x = 0.
2. BERT is a language model based on RNNs.
3. A Multilayer Perceptron encodes a simple linear discriminant function.
4. Every continuous function can be approximated arbitrarily closely by a multi-layer Artificial Neural Network.
5. GPT-4 is known for its language generation abilities.
6. RNNs capture long-term dependencies.
7. Hyperparameters should be tuned on the validation set.
8. An overfitted model performs well on unknown data.
9. The classification of unbalanced data is measured best with error and accuracy.
10. The harmonic mean of precision and recall is called the F-measure.

Aspects of Machine Learning Models 10 Points

1) Use pseudocode to fill in the steps (1 to 4) in such a way that the model goes through the
training process. Stopping criteria can be ignored. Assign and reuse variables if needed. (5 Points)
Algorithm 1: Generic machine learning model training
input : batches = {samples, targets}, learning rate = λ, parameters = Θ, loss = MSE,
model = NN
init model(parameters);
for batch in batches do
1
2
3
4
end
return model;
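Steps 1 to 4 typically correspond to forward pass, loss computation, gradient computation, and parameter update. A minimal runnable sketch of that loop, assuming a tiny linear model with hand-derived MSE gradients (the model, learning rate, and epoch count are illustrative choices, not the exercise's expected answer):

```python
# Generic training loop: forward pass, loss, gradients, parameter update.
# The linear model y = w*x + b and its MSE gradients are illustrative assumptions.

def train(batches, lr=0.05, epochs=500):
    w, b = 0.0, 0.0                                       # init model(parameters)
    for _ in range(epochs):
        for samples, targets in batches:                  # for batch in batches
            preds = [w * x + b for x in samples]          # 1: forward pass
            errs = [p - t for p, t in zip(preds, targets)]
            mse = sum(e * e for e in errs) / len(errs)    # 2: compute loss (MSE)
            grad_w = 2 * sum(e * x for e, x in zip(errs, samples)) / len(errs)  # 3: gradients
            grad_b = 2 * sum(errs) / len(errs)
            w -= lr * grad_w                              # 4: update parameters
            b -= lr * grad_b
    return w, b

# Usage: recover y = 2x + 1 from a single batch of four samples.
w, b = train([([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])])
```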

2) The following table contains predicted values from a simplified linear model (Ŷi = β₀ + xi )
and their true (i.e. expected) counterparts. Show the calculation of the MSE for this model! How
should β₀ be updated to minimize the MSE? (5 Points)

Predicted Expected
2 1
3 2
5 4
8 7
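The MSE definition can be evaluated directly on the table's values; this sketch only mechanizes the arithmetic (the intercept shift in the last line is a numerical check, not the graded derivation):

```python
# Evaluate MSE = (1/n) * sum((predicted - expected)^2) on the table above.
preds = [2, 3, 5, 8]
expected = [1, 2, 4, 7]
mse = sum((p, ) and (p - t) ** 2 for p, t in zip(preds, expected)) / len(preds)

# Every prediction is exactly 1 too high, so shifting the intercept down by 1
# should drive the MSE to zero:
mse_shifted = sum((p - 1 - t) ** 2 for p, t in zip(preds, expected)) / len(preds)
```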

Neural Networks I 10 Points

1) Which types of neural networks do you know and for which tasks are they typically used? (2
Points)

2) Explain what distinguishes a Long Short-Term Memory model (LSTM) from a conventional
Recurrent Neural Network (RNN). (3 Points)

3) Name at least three NLP tasks for which an LSTM is suitable! (3 Points)

4) Describe how Transformers handle sequential information. (2 Points)



Neural Networks II 10 Points

1) Explain the terms overfitting and underfitting! When can they each occur? (2 Points)

2) Explain the differences between parameters and hyperparameters in a machine learning model.
(3 Points)

3) Take a look at the following Neural Network:

[Figure: a feed-forward network with an input layer (x0, x1), a hidden layer of two neurons
(biases b = −2 and b = 0.5), and an output neuron (bias b = −3); edge weights as drawn in the
original figure.]
Show that the network correctly classifies the following data. Assume sgn as the activation function. (5 Points)

sgn(x) := +1, if x > 0; −1, if x ≤ 0

x0   x1   Class
 2    1    1
-1    2    1
-3    2   -1
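The check can be mechanized by implementing sgn and the forward pass. The helper names below are ours, and the weights passed in are placeholders that must be replaced by the values read off the figure:

```python
def sgn(x):
    # +1 if x > 0, -1 if x <= 0, as defined in the exercise.
    return 1 if x > 0 else -1

def forward(x0, x1, w_hidden, b_hidden, w_out, b_out):
    # One hidden layer with sgn activations, then a single sgn output neuron.
    # w_hidden: list of (w0, w1) per hidden neuron; b_hidden: matching biases.
    hidden = [sgn(w0 * x0 + w1 * x1 + b)
              for (w0, w1), b in zip(w_hidden, b_hidden)]
    return sgn(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Usage with placeholder weights (replace with the figure's values), checking
# one row of the table:
pred = forward(2, 1, [(1, 0), (0, 1)], [0, 0], [1, 1], -1)
```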

Language Models 10 Points

1) Name and describe a task usually used for the pretraining of language models (e.g. BERT). (2
Points)

2) What are positional embeddings and why are they used in the context of Transformer models? (2 Points)
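One common instantiation is the fixed sinusoidal encoding from the original Transformer architecture; a minimal sketch, using only the standard library (the function name and arguments are our own):

```python
import math

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encodings:
    #   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    #   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    # Each position gets a unique pattern the model can use to infer order.
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

These vectors are added to the token embeddings before the first attention layer, which is why self-attention, itself order-invariant, can still distinguish "dog bites man" from "man bites dog".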

3) Name at least four downstream tasks at token or text level and briefly explain them. (4 Points)

4) Discuss where even the largest language models reach their limits! (2 Points)

Benchmarks 10 Points

G(y, n) := [(y1 , . . . , yn ), (y2 , . . . , yn+1 ), . . . , (y|y|−n+1 , . . . , y|y| )]   (1)

C(g, y, n) := |[ g′ | g′ ∈ G(y, n), g′ = g ]|   (2)

P (ŷ, y, n) := ( Σ_{g ∈ G(ŷ,n)} min(C(g, ŷ, n), C(g, y, n)) ) / ( Σ_{g ∈ G(ŷ,n)} C(g, ŷ, n) )   (3)

ŷ = [a cat is on the mat]


y = [a dog is on the couch]

1) Calculate the uni-/bi-grams for G(ŷ, 1), G(y, 1), G(ŷ, 2), G(y, 2). (4 Points)

2) Calculate the uni-/bi-gram-precision for P (ŷ, y, 1) and P (ŷ, y, 2). (6 Points)
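The definitions in (1) to (3) translate directly into code; this sketch (function names are ours) computes the n-gram lists and the clipped n-gram precision so the hand calculation can be cross-checked:

```python
from collections import Counter

def G(y, n):
    # Eq. (1): all n-grams of y, in order.
    return [tuple(y[i:i + n]) for i in range(len(y) - n + 1)]

def precision(y_hat, y, n):
    # Eq. (3): clipped n-gram precision. Each candidate n-gram counts at most
    # as often as it occurs in the reference (Eq. (2) gives the counts).
    c_hat, c_ref = Counter(G(y_hat, n)), Counter(G(y, n))
    overlap = sum(min(c, c_ref[g]) for g, c in c_hat.items())
    return overlap / sum(c_hat.values())

y_hat = "a cat is on the mat".split()
y = "a dog is on the couch".split()
```

Applying `G` and `precision` to ŷ and y above reproduces the uni- and bi-gram lists asked for in 1) and the two precision values asked for in 2).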
