
AIN SHAMS UNIVERSITY

FACULTY OF ENGINEERING
Department of Computer and Systems Engineering
4th Year, Electrical Engineering

Spring 2020 Examination Date: 04.08.2020 Time Allowed: 2 Hours

CSE 465: Selected Topics in Systems Engineering

Lecturer: Prof. Hazem M. Abbas Total Marks: 90 Page: 01

Exam consists of FOUR questions in TWO pages [An OPEN-BOOK EXAM]

Question 1: Decision Trees & Bayes Classifiers (25 Marks)

A. (15 Marks) The dataset in the table will be used to learn a decision tree for predicting whether a mushroom is
edible or not based on its shape, color and odor.
1. What is the entropy H(Edible | Odor = 1 or Odor = 3)?

2. Which attribute should be chosen for the root of the tree?

3. Draw the full decision tree that would be learned for this data

4. Suppose we have the following validation set (Shape, Color, Odor, Edible):

   Shape  Color  Odor  Edible
   C      B      2     No
   D      B      2     No
   C      W      2     Yes

   What will be the training set error and validation set error of the tree?
   Express your answer as the number of examples that would be misclassified.
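For checking the entropy computation by hand, a minimal Python sketch may help. The mini-dataset below is purely hypothetical (the exam's training table is not reproduced in this text); only the attribute names Odor and Edible are taken from the question.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(rows, condition, label_key):
    """H(label | condition): restrict rows to the condition, then take the
    entropy of the label column on that subset."""
    selected = [r[label_key] for r in rows if condition(r)]
    return entropy(selected)

# Hypothetical rows for illustration only -- substitute the exam's table:
rows = [
    {"Odor": 1, "Edible": "Yes"},
    {"Odor": 1, "Edible": "No"},
    {"Odor": 3, "Edible": "No"},
    {"Odor": 2, "Edible": "Yes"},
]
h_cond = conditional_entropy(rows, lambda r: r["Odor"] in (1, 3), "Edible")
```

The same `entropy` helper can then score each attribute's information gain when choosing the root in part 2.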

B. (10 Marks) Consider a one-dimensional pattern space (the x-axis) and two classes (ω1 and ω2 ) with densities

p(x|ω1 ) = 0.5e−|x−m1 | and p(x|ω2 ) = e−|x−m2 |

1. Let m1 = 0, m2 = 2 and the decision regions R1 = {x | x ≤ 1} and R2 = {x | x > 1}. Compute the probabilities
of error ε1 and ε2 . Sketch a figure.
2. How should one place the decision border between R1 and R2 in order to have ε1 = ε2 ?
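A numeric check of part 1 is straightforward: the error probabilities are the tail masses of each density falling on the wrong side of the border at x = 1. The sketch below uses the densities exactly as stated in the question and truncates the infinite tails at ±50, where they are negligible.

```python
import math

def integrate(f, a, b, n=200000):
    """Midpoint-rule numeric integration of f on [a, b]."""
    step = (b - a) / n
    return sum(f(a + (i + 0.5) * step) for i in range(n)) * step

m1, m2 = 0.0, 2.0
p1 = lambda x: 0.5 * math.exp(-abs(x - m1))   # p(x | w1)
p2 = lambda x: math.exp(-abs(x - m2))         # p(x | w2), as given

# eps1: mass of class 1 landing in R2 = {x > 1}
# eps2: mass of class 2 landing in R1 = {x <= 1}
eps1 = integrate(p1, 1.0, 50.0)
eps2 = integrate(p2, -50.0, 1.0)
```

Comparing `eps1` and `eps2` against your closed-form exponentials is a quick sanity check before sketching the figure.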

Question 2: SVM (20 Marks)

1. Consider building an SVM for the following two-class 2-dimensional training data, with two classes indicated
by circles (o) and crosses (x).
o class: (−1, 3)T , (0, 2)T , (0, 1)T , (0, 0)T
x class: (1, 5)T , (1, 6)T , (3, 3)T

(a) Plot the training points and, by inspection, draw a linear classifier that separates the data with maximum
margin. Identify the support vectors.
(b) Use the primal formulation with the support vectors to find the parameters of the linear SVM, h(x) =
wT x + b.
(c) Assume that more data points will be added to both classes. State when these new data would change
the solution found in (b). Motivate your answer.
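Once a candidate separator has been read off the plot, it can be verified numerically. The parameters below are an assumption to be checked, not the given answer: a boundary of the form x1 + x2 = 4, scaled so the closest points sit at h(x) = ±1. The code only confirms the margin conditions y·h(x) ≥ 1; proving it is the maximum-margin solution still requires the primal argument of part (b).

```python
o_points = [(-1, 3), (0, 2), (0, 1), (0, 0)]   # labeled y = -1
x_points = [(1, 5), (1, 6), (3, 3)]            # labeled y = +1

def h(w, b, p):
    """Linear discriminant h(x) = w^T x + b in 2-D."""
    return w[0] * p[0] + w[1] * p[1] + b

def separates_with_margin(w, b):
    """True if every point satisfies y * h(x) >= 1 (functional margin 1)."""
    ok_o = all(-h(w, b, p) >= 1 for p in o_points)
    ok_x = all(h(w, b, p) >= 1 for p in x_points)
    return ok_o and ok_x

# Candidate from inspection (an assumption to verify):
w, b = (0.5, 0.5), -2.0
```

Points where |h(x)| = 1 exactly are the support-vector candidates; for part (c), new points only affect the solution if they violate y·h(x) ≥ 1 under the current (w, b).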

Page 1/2. Examiners: Prof. Hazem Abbas & Prof. Mahmoud Khalil


Question 3: Clustering (25 Marks)

A. (15 Marks) The one-dimensional data points, {−2.2, −2.0, −0.3, 0.1, 0.2, 0.4, 1.6, 1.7, 1.9, 2.0}, are to be
clustered as described below. For each part of the problem, assume that the Euclidean distance between the data
points will be used as a dissimilarity measure.

1. Use hierarchical agglomerative clustering with single linkage to cluster the data. Draw a dendrogram to
illustrate your clustering and include a vertical axis with numerical labels indicating the height of each
parental node in the dendrogram.
2. Repeat part (1) using hierarchical agglomerative clustering with complete linkage.
3. Comment on the two results.

B. (10 Marks) Consider the application of the k-means clustering algorithm to the one-dimensional data set
D = {0, 1, 5, 8, 14, 16} for k = 3 clusters.

1. Start with the three cluster means: m1 (0) = 2, m2 (0) = 6 and m3 (0) = 9. What are the values of the means
at the next iteration?
2. What are the final cluster means, after convergence of the algorithm?
3. For your final clusters, to which cluster does the point x = 3 belong? To which cluster does x = 11 belong?
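A hand computation of the iterations can be checked with a short Lloyd's-algorithm sketch; the helper name `kmeans_1d` is ours, not from the exam:

```python
def kmeans_1d(data, means, iters=100):
    """Lloyd's algorithm on 1-D data; returns the means after at most
    `iters` assignment/update steps, stopping early on convergence."""
    for _ in range(iters):
        clusters = [[] for _ in means]
        for x in data:
            idx = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[idx].append(x)
        # Empty clusters keep their old mean.
        new = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new == means:
            break
        means = new
    return means

D = [0, 1, 5, 8, 14, 16]
one_step = kmeans_1d(D, [2.0, 6.0, 9.0], iters=1)   # part 1
final = kmeans_1d(D, [2.0, 6.0, 9.0])               # part 2
```

For part 3, assign a query point to the final mean nearest to it, exactly as in the assignment step above.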

Question 4: Linear & Logistic Regression (20 Marks)

A. (10 Marks) Let x1 , x2 , · · · , xn be independent non-negative integers from a Poisson distribution with the
expectation value E[x] = λ. This corresponds to a discrete distribution p(x|λ) = λx e−λ /x!, x ≥ 0 when
E[x] = var[x] = λ.
Find the maximum likelihood (ML) estimate for the parameter λ. Is it unbiased?
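The usual route, sketched for reference, is to maximize the log-likelihood of the n independent observations:

```latex
\ell(\lambda) = \ln \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}
             = \Big(\sum_{i=1}^{n} x_i\Big) \ln\lambda \;-\; n\lambda \;-\; \sum_{i=1}^{n} \ln(x_i!)
\qquad
\frac{d\ell}{d\lambda} = \frac{1}{\lambda}\sum_{i=1}^{n} x_i - n = 0
\;\Longrightarrow\;
\hat{\lambda}_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
```

The second derivative is negative for λ > 0, so this is a maximum; and since E[x̄] = (1/n)Σ E[x_i] = λ, unbiasedness follows directly.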

B. (10 Marks) Logistic regression is named after the log-odds of success defined as

ln [ P(Y = 1|X = x) / P(Y = 0|X = x) ]

Show that the log-odds of success is a linear function of x.
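As a sketch, assuming the standard logistic model P(Y = 1 | X = x) = 1/(1 + e^{-(wᵀx + b)}), the complement is P(Y = 0 | X = x) = e^{-(wᵀx + b)}/(1 + e^{-(wᵀx + b)}), and the shared denominator cancels in the ratio:

```latex
\ln \frac{P(Y=1 \mid X=x)}{P(Y=0 \mid X=x)}
  = \ln \frac{1/\bigl(1 + e^{-(w^{T}x + b)}\bigr)}
             {e^{-(w^{T}x + b)}/\bigl(1 + e^{-(w^{T}x + b)}\bigr)}
  = \ln e^{\,w^{T}x + b}
  = w^{T}x + b
```

which is linear in x, as required.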

