This assessment consists of exam-style questions and you should answer as you
would in an exam. You cannot copy or paraphrase text or material from other sources
and present this as your own work. Your exam answers should be entirely your own
work without unacknowledged input from others. If you are in any doubt, you should
clearly acknowledge the origin of any material, text passages or ideas presented (e.g.
through references). You must not co-operate with any other person when completing
the exam, which must be entirely your own work. You must not share any information
about the exam with another person (e.g. another student) or act on any such
information you may receive. Any attempt to do so will be dealt with under the
University's Policy for Good Academic Practice and may result in severe sanctions.
You must submit your completed assessment on MMS within three hours of
downloading the exam. Assuming you have revised the module contents beforehand,
answering the questions should take no more than three hours.
1. Classification
Two binary classification models C1 and C2 are trained on a particular dataset,
and then evaluated on a separate small dataset containing 7 cases. Each row in
the table below shows the actual class for that test case (0 or 1), and the scores
produced by each classifier for that test case.
(a) For each classifier, calculate the following for the test data:
• the confusion matrix [2 marks]
• the precision [1 mark]
• the recall [1 mark]
• the F1 measure [1 mark]
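As a study aid, the quantities listed in part (a) can be sketched in code. The scores table did not survive in this copy, so the labels and scores below are hypothetical placeholders, and the 0.5 decision threshold is an assumption, since the question does not state one.

```python
# Hypothetical test data: NOT the values from the exam's table.
actual = [1, 0, 1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8]
THRESHOLD = 0.5  # assumed decision threshold


def confusion_counts(actual, scores, threshold):
    """Return (tp, fp, fn, tn), predicting class 1 when score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(actual, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1; each is taken as 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


tp, fp, fn, tn = confusion_counts(actual, scores, THRESHOLD)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

The four counts are the cells of the confusion matrix; the other three measures follow directly from them.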
(b) For each classifier, sketch the ROC curve, including points on the curve
corresponding to threshold values of 0, 0.2, 0.4, 0.6, 0.8 and 1.
[8 marks]
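A minimal sketch of how the ROC points in part (b) could be obtained: for each threshold, count true and false positives and convert them to rates. The data are hypothetical placeholders (the exam's table is not reproduced here), and treating a score equal to the threshold as a positive prediction is an assumed convention.

```python
# Hypothetical test data: NOT the values from the exam's table.
actual = [1, 0, 1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8]
thresholds = [0, 0.2, 0.4, 0.6, 0.8, 1]  # the values listed in the question


def roc_point(actual, scores, threshold):
    """Return (FPR, TPR), predicting class 1 when score >= threshold."""
    tp = sum(1 for y, s in zip(actual, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(actual, scores) if y == 0 and s >= threshold)
    positives = sum(actual)
    negatives = len(actual) - positives
    return fp / negatives, tp / positives


roc_points = [roc_point(actual, scores, t) for t in thresholds]
for t, (fpr, tpr) in zip(thresholds, roc_points):
    print(f"threshold={t}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Plotting TPR against FPR and joining the points in threshold order gives the curve.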
[4 marks]
(d) Given the following F1 and recall curves for a new classification model C3,
describe (using words, via a sketch, or both) how the precision varies with
threshold.
[3 marks]
[Figure: Recall and F1 curves for C3 plotted against Threshold; both axes run from 0 to 1.]
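For part (d), precision can be recovered from the two plotted quantities, because F1 is the harmonic mean of precision and recall. Writing P for precision and R for recall (shorthand introduced here, not taken from the paper), the standard definition rearranges as follows:

```latex
F_1 = \frac{2PR}{P + R}
\quad\Longrightarrow\quad
F_1 (P + R) = 2PR
\quad\Longrightarrow\quad
P = \frac{F_1 \, R}{2R - F_1}, \qquad 2R > F_1 .
```

Reading R and F1 off the curves at a given threshold and substituting them gives the precision at that threshold.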
2. Modelling
(a) The plot below shows observations of attribute y for various values of
attribute x. Both axes have a linear scale. A polynomial regression model is
to be fitted to this data. Suggest which model degree is likely to give the
best results, and explain your reasoning.
[3 marks]
[4 marks]
(c) A regression model is to be fitted to a data set containing attributes x1..x9.
The scatter plots below show the relationships between each x attribute and
the attribute to be predicted, y. All scales are linear.
For each x attribute, explain whether you would include it in the model,
and if so, whether you would perform any additional processing based on
that attribute before fitting the model.
[6 marks]
(d) The table below shows a sample from a data set on grocery shopping habits.
A model predicting the attribute weekly_spend is to be fitted. For each of the
following model types, explain which attributes you would include in the
model, and any additional processing that would be necessary:
[7 marks]
3. Ensemble models
[2 marks]
(b) Explain how a random forest using bagging operates, and how it is able to
give better predictive performance than an individual decision tree.
[4 marks]
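The mechanics of bagging can be illustrated with a deliberately simplified sketch. Each model is trained on a bootstrap sample (drawn with replacement) and the ensemble predicts by majority vote. Two simplifications are assumptions made for brevity: the base learners are one-split stumps rather than full decision trees, and each stump is given a single randomly chosen feature, whereas a random forest normally draws a random feature subset at every split. The toy data are hypothetical.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy data: two numeric features, binary label.
X = [(1.0, 5.0), (2.0, 4.0), (3.0, 7.0), (6.0, 1.0), (7.0, 3.0), (8.0, 2.0)]
y = [0, 0, 0, 1, 1, 1]


def fit_stump(X, y, feature):
    """Fit a one-split tree on a single feature, minimising training errors."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if row[feature] <= threshold else right) != label
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return feature, threshold, left, right


def predict_stump(stump, row):
    feature, threshold, left, right = stump
    return left if row[feature] <= threshold else right


def fit_bagged_stumps(X, y, n_trees=25):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    forest = []
    n = len(X)
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]  # sample with replacement
        feature = random.randrange(len(X[0]))          # random feature choice
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


forest = fit_bagged_stumps(X, y)
predictions = [predict_forest(forest, row) for row in X]
print(predictions)
```

Because each stump sees a different resample and feature, their errors are only partly correlated, so the vote tends to have lower variance than any single stump.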
(c) Three separate classifiers C1, C2 and C3 have been trained to predict
whether an image contains a cat or a dog. These three classifiers will then
be aggregated into an ensemble. Each classifier has predicted the following
probabilities for seven test cases:
Actual      C1            C2            C3
         Cat   Dog     Cat   Dog     Cat   Dog
Cat      0.6   0.4     0.8   0.2     0.9   0.1
Cat      0.4   0.6     0.6   0.4     0.8   0.2
Cat      0.7   0.3     0.4   0.6     0.7   0.3
Dog      0.6   0.4     0.2   0.8     0.6   0.4
Dog      0.4   0.6     0.3   0.7     0.7   0.3
Dog      0.6   0.4     0.8   0.2     0.6   0.4
Cat      0.9   0.1     0.4   0.6     0.9   0.1
Assuming a decision threshold of 0.5, calculate for the test data set:
[4 marks]
[2 marks]
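The sub-parts of this question are missing from this copy, so the following is only a sketch of two common ways to aggregate the three classifiers at a 0.5 threshold. Hard voting (majority of predicted labels) and soft voting (threshold applied to the mean probability) are assumed here and may not be the quantities the paper asked for. The probabilities are transcribed from the table in the question; the helper names are illustrative.

```python
# P(cat) for each classifier, transcribed from the question; P(dog) = 1 - P(cat).
actual = ["Cat", "Cat", "Cat", "Dog", "Dog", "Dog", "Cat"]
p_cat = {
    "C1": [0.6, 0.4, 0.7, 0.6, 0.4, 0.6, 0.9],
    "C2": [0.8, 0.6, 0.4, 0.2, 0.3, 0.8, 0.4],
    "C3": [0.9, 0.8, 0.7, 0.6, 0.7, 0.6, 0.9],
}
THRESHOLD = 0.5


def label(p):
    """Predict Cat when P(cat) exceeds the threshold, otherwise Dog."""
    return "Cat" if p > THRESHOLD else "Dog"


def n_correct(predicted):
    """Number of test cases where the prediction matches the actual class."""
    return sum(p == a for p, a in zip(predicted, actual))


# Predictions of each classifier on its own.
individual = {name: [label(p) for p in probs] for name, probs in p_cat.items()}

# Hard voting: at least two of the three classifiers must say Cat.
hard_vote = []
for i in range(len(actual)):
    cat_votes = sum(individual[name][i] == "Cat" for name in p_cat)
    hard_vote.append("Cat" if cat_votes >= 2 else "Dog")

# Soft voting: threshold the mean P(cat) across the classifiers.
soft_vote = [
    label(sum(probs[i] for probs in p_cat.values()) / len(p_cat))
    for i in range(len(actual))
]

for name, preds in individual.items():
    print(name, n_correct(preds), "of", len(actual), "correct")
print("hard vote", n_correct(hard_vote), "of", len(actual), "correct")
print("soft vote", n_correct(soft_vote), "of", len(actual), "correct")
```

Comparing the individual counts with the two ensemble counts shows how aggregation can change accuracy on this test set.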
(e) The diagram below shows a model fitted to a training set as the first stage
of a gradient boost regression model. Sketch the following:
You can show these on a single diagram or multiple diagrams, as you wish.
[8 marks]
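The diagram and the list of items to sketch are not reproduced in this copy, so the following only illustrates the general shape of a second gradient boosting stage for squared-error regression: compute the residuals of the first model, fit a new weak learner to those residuals, and add its output to the first model's predictions. The data, the constant first-stage model, the stump learner and the learning rate are all assumptions made for illustration.

```python
# Hypothetical 1-D training data: NOT the points from the exam's diagram.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.5, 1.2, 3.8, 4.1, 4.4]
LEARNING_RATE = 1.0  # assumed; values below 1 shrink each stage's contribution


def fit_stump(xs, targets):
    """Regression stump: choose the split with least squared error, predict each side's mean."""
    best = None
    for split in xs[:-1]:  # xs is sorted, so both sides are always non-empty
        left = [t for x, t in zip(xs, targets) if x <= split]
        right = [t for x, t in zip(xs, targets) if x > split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x <= split else right_mean


def squared_error(predicted):
    return sum((y - p) ** 2 for y, p in zip(ys, predicted))


# Stage 1: a constant model (the mean of y), assumed as the initial fit.
stage1 = sum(ys) / len(ys)
predictions = [stage1 for _ in xs]

# The residuals of stage 1 become the targets for stage 2.
residuals = [y - p for y, p in zip(ys, predictions)]
stage2 = fit_stump(xs, residuals)

# Combined model: stage 1 plus the scaled stage-2 correction.
combined = [p + LEARNING_RATE * stage2(x) for x, p in zip(xs, predictions)]

print("stage 1 squared error:", round(squared_error(predictions), 3))
print("combined squared error:", round(squared_error(combined), 3))
```

Further stages repeat the same step, each one fitting the residuals left by the combined model so far.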