
DEDAN KIMATHI UNIVERSITY OF TECHNOLOGY

University Examinations 2020/2021

FOURTH YEAR SEMESTER ONE EXAMINATION FOR THE DEGREE OF BACHELOR OF
SCIENCE IN COMPUTER SCIENCE

CCS 4102: MACHINE LEARNING

DATE: AUGUST 2020 TIME: 2 HOURS

Instructions: Answer Question 1 and Any Other Two.

Question 1: (30 marks)

a) Define and explain the circumstances that may require one to either Normalize or
Standardize the data in a dataset. [4 marks]
b) The goal of resampling methods is to make the best use of your training data in
order to accurately estimate the performance of a model on new, unseen data.
You can use either a train and test split of your data or k-fold cross-validation
to resample your data. Describe scenarios where each of the techniques
can be used. [4 marks]
c) Describe the following algorithm evaluation metrics and provide their Python
functions.
i) Classification Accuracy [2 marks]
ii) Mean Absolute Error [2 marks]
d) Using equations, differentiate the Statistical and Computer Science
perspectives on learning [4 marks]
e) Give any one benefit and any one limitation of parametric and non-parametric
algorithms [4 marks]
f) Provide any two supervised machine learning algorithms for regression
problems [2 marks]
g) Differentiate Classification from Regression Machine Learning problems
[2 marks]
h) Using k-Nearest Neighbours and Support Vector Machine algorithms
discuss the term Bias-Variance Trade-Off [5 marks]
i) Differentiate over-fitting from under-fitting of a machine learning
algorithm [1 mark]
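
As an illustration for Question 1(c), the two metrics can be sketched in plain Python. The function names and sample values below are hypothetical and chosen only for illustration; scikit-learn provides equivalent functions, accuracy_score and mean_absolute_error, in sklearn.metrics.

```python
def classification_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (not taken from the examination paper).
acc = classification_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
mae = mean_absolute_error([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print("accuracy:", acc)
print("mean absolute error:", mae)
```

Accuracy applies to classification problems (predicted labels), while mean absolute error applies to regression problems (predicted numeric values).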

Question 2 (15 marks)

Suppose you have a CSV file called iris.data.CSV that has the following features:
sepal-length, sepal-width, petal-length, petal-width and class, and that you have
already loaded all the necessary libraries in the Python programming interface of
your choice. Provide code snippets to perform the following tasks.
a) Load the dataset [3 marks]
b) Print dimension and the first 30 rows of the dataset [2 marks]
c) Display both box-and-whisker plots and histograms of each
attribute in the dataset [4 marks]
d) Create a validation dataset using the split-out validation technique.
Use a validation dataset size of 20% and a seed value of 7 [6 marks]

Question 3 (15 marks)

Use the following dataset to answer the questions that follow.


X Y
1 1
2 3
4 3
3 2
5 5

a) Using a simple linear regression model of the form

y = B0 + B1*x

compute the values of the coefficients B0 and B1 [8 marks]
b) Make predictions [2 marks]
c) Estimate the Root Mean Square Error of the prediction [5 marks]
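
The calculation asked for in Question 3 can be sketched as follows. This is a minimal sketch that assumes the usual least-squares estimates, B1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and B0 = mean(y) - B1 * mean(x), and that predictions are made for the same X values given in the table. The variable names are illustrative.

```python
from math import sqrt

# Dataset from the question.
xs = [1, 2, 4, 3, 5]
ys = [1, 3, 3, 2, 5]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Slope: covariance of x and y divided by the variance of x
# (the common 1/n factors cancel).
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
# Intercept: the fitted line passes through (mean_x, mean_y).
b0 = mean_y - b1 * mean_x

# Predictions for the training inputs.
predictions = [b0 + b1 * x for x in xs]

# Root mean square error between predictions and observed values.
rmse = sqrt(sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys))

print("B0 =", round(b0, 3), "B1 =", round(b1, 3))
print("predictions =", [round(p, 3) for p in predictions])
print("RMSE =", round(rmse, 3))
```

With this data the sketch gives B1 of about 0.8, B0 of about 0.4 and an RMSE of about 0.693.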

Question 4 (15 marks)

Use the following dataset, which describes two categorical input variables
and a class variable with two possible outputs, to answer the questions that follow.
Use the Naïve Bayes algorithm.

Weather Car Class
Sunny Working Go-out
Rainy Broken Go-out
Sunny Working Go-out
Sunny Working Go-out
Sunny Working Go-out
Rainy Broken Stay-home
Rainy Broken Stay-home
Sunny Working Stay-home
Sunny Broken Stay-home
Rainy Broken Stay-home

a) Compute the class probabilities [2 marks]
b) Compute the conditional probabilities [8 marks]
c) Make predictions [5 marks]
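
The steps in Question 4 can be sketched in plain Python. This is a minimal sketch under stated assumptions: the capitalisation of the table values is normalised to lower case, probabilities are estimated as simple relative frequencies with no smoothing, and the instances passed to predict are hypothetical because the question does not specify which instances to classify. The function names are illustrative.

```python
from collections import Counter

# Dataset from the question as (weather, car, class), lower-cased.
rows = [
    ("sunny", "working", "go-out"),
    ("rainy", "broken", "go-out"),
    ("sunny", "working", "go-out"),
    ("sunny", "working", "go-out"),
    ("sunny", "working", "go-out"),
    ("rainy", "broken", "stay-home"),
    ("rainy", "broken", "stay-home"),
    ("sunny", "working", "stay-home"),
    ("sunny", "broken", "stay-home"),
    ("rainy", "broken", "stay-home"),
]

# Class probabilities: P(class) = count(class) / total rows.
class_counts = Counter(label for _, _, label in rows)
class_probs = {label: n / len(rows) for label, n in class_counts.items()}


def conditional(feature_index, value, label):
    """P(feature = value | class = label) as a relative frequency."""
    matches = sum(
        1 for r in rows if r[2] == label and r[feature_index] == value
    )
    return matches / class_counts[label]


def predict(weather, car):
    """Return the class with the largest unnormalised posterior score."""
    scores = {
        label: conditional(0, weather, label)
        * conditional(1, car, label)
        * class_probs[label]
        for label in class_probs
    }
    return max(scores, key=scores.get), scores


# Hypothetical query instances.
for weather, car in [("sunny", "working"), ("rainy", "broken")]:
    label, scores = predict(weather, car)
    print(weather, car, "->", label, scores)
```

The scores are proportional to the posterior probabilities; dividing each by their sum would give normalised probabilities.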
