Supervised Machine Learning

1. Select and critique an appropriate algorithm for building a model using supervised learning.
Your answer should include either a description or pseudo-code of how the algorithm works,
input and output limitations, pros, cons, hyper-parameters that require optimisation and types
of suitable applications.

2.

3. A 10-fold cross validation is performed on four data sets using three machine learning
models: Random Forest, Logistic Regression, and k-Nearest Neighbour. The training error
results are presented in the table below. Analyse the results and discuss the use of
cross-validation as a standard approach for validating the accuracy of a predictive model.
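
For reference, the 10-fold procedure named in question 3 can be sketched as follows. This is a minimal illustration in plain Python, not a model answer: the helper names `k_fold_indices` and `cross_validate`, and the `fit` and `error` callables, are hypothetical and are not taken from the question paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(fit, error, X, y, k=10, seed=0):
    """Return the mean held-out error over k folds.

    fit(X_train, y_train) -> model            (hypothetical callable)
    error(model, X_test, y_test) -> float     (hypothetical callable)
    """
    errors = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors.append(error(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(errors) / len(errors)
```

Each sample is held out exactly once, so the averaged figure estimates error on unseen data rather than error on the data the model was fitted to.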

4. Describe and critique cross validation as the gold standard for evaluating the training
performance of a model. (7 Marks)

5. Describe and critique cross validation as the gold standard for evaluating the training
performance of a model. Include a discussion on the use of cross validation on Spark.

6. What is meant by ‘overfitting’ a model? (2 Marks)

7. Good feature selection results in attributes or features that are highly correlated with the
class, yet uncorrelated with each other. Propose and discuss the application of a commonly
applied method of feature selection. (6 Marks)
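
The principle stated in question 7 (features highly correlated with the class, yet uncorrelated with each other) is the one scored by the correlation-based feature selection (CFS) merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The sketch below is one possible illustration, assuming numeric features and Pearson correlation; the function names `pearson` and `cfs_merit` are hypothetical, and other methods of feature selection would be equally valid answers.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def cfs_merit(features, target):
    """CFS merit of a feature subset: high when features track the target
    but carry little redundant information about each other."""
    k = len(features)
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    # Mean absolute feature-feature correlation (0 for a single feature).
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

Under this score, adding an exact copy of a feature leaves the merit unchanged, while adding an uncorrelated but equally predictive feature raises it.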

8. Deep Learning is based on a multi-layer feed-forward artificial neural network that is trained
with stochastic gradient descent using backpropagation. “Deep learning is the strongest
machine learning method”. Discuss the validity of this statement in the context of
advantages, disadvantages and use cases of the method. (10 Marks)

9. The newest and shiniest member of the data science algorithm toolbox is deep learning.
   i. Propose an application for deep learning. (3 Marks)
   ii. Discuss the similarities between deep learning and traditional function fitting
   algorithms like multiple linear regression and logistic regression. (6 Marks)

10. Identify two layer types used in deep learning algorithms, listing a commonly applied use
case for each. (4 Marks)

11. Most of the data mining models deployed in production applications are ensemble models.
In an executive board analogy, having a board with diverse and independent members
makes statistical sense. Propose four methods of achieving diversity in the base models
making up an ensemble.

12. Discuss the similarities between deep learning and traditional function fitting algorithms like
multiple linear regression and logistic regression. (12 Marks).

13. Training a model using deep learning applies multiple layers of representation of data.
Identify three differences between convolutional and recurrent layers. Your answer should
identify at least one difference under any three of the headings: description; input, output;
pros, cons, use cases.
