
Homework 2 – Handwriting

Deadline: 2023/11/13 12:00


Grading Policy
In the handwriting assignment, you need to provide detailed derivations. Partial credit will be given for
correct reasoning. Please write your answers and derivations on A4 paper and hand in the homework
in class on 2023/11/13. No late submissions will be accepted.

Problems
1. In Principal Components Analysis, given a vector x and its low-dimensional projection y defined by
$y = w^T x$, please answer the following questions:
(a) (5%) Explain the connection between w and the covariance matrix of y (use math).
(b) (10%) Assume the covariance matrix of x is
$$\begin{pmatrix} 16 & 0 & 2 \\ 0 & 25 & 0 \\ 2 & 0 & 4 \end{pmatrix}$$
and we set k = 1 in PCA; please determine the matrix w.
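
For 1(b), a quick numerical sanity check (not a substitute for the hand derivation) is to eigendecompose the given matrix with NumPy, assuming it is available; with k = 1, w is the eigenvector belonging to the largest eigenvalue:

```python
import numpy as np

# Covariance matrix of x from problem 1(b)
cov = np.array([[16.0, 0.0, 2.0],
                [0.0, 25.0, 0.0],
                [2.0, 0.0, 4.0]])

# eigh returns eigenvalues in ascending order for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(cov)

# With k = 1, w is the eigenvector of the largest eigenvalue
w = eigvecs[:, -1]
print("largest eigenvalue:", eigvals[-1])
print("w:", w)
```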
2. (5%) Define a multivariate Bernoulli mixture, where inputs are binary, and derive the EM
equations. (You can find some hints in Section 5.7.)
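
As a starting point for problem 2, the standard form of a multivariate Bernoulli mixture over $D$ binary inputs $x_d \in \{0, 1\}$, with component parameters $p_{kd} = P(x_d = 1 \mid k)$, is:

```latex
p(x \mid k) = \prod_{d=1}^{D} p_{kd}^{\,x_d} (1 - p_{kd})^{1 - x_d},
\qquad
p(x) = \sum_{k=1}^{K} \pi_k \, p(x \mid k)
```

The EM derivation then follows the usual pattern: posterior responsibilities in the E-step, and maximization over $\pi_k$ and $p_{kd}$ in the M-step.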
3. (5%) Generalize the Gini index and the misclassification error to K > 2 classes. Generalize the
misclassification error to risk, taking a loss function into account.
Hint: the Gini index is in Equation 9.5, the misclassification error in Equation 9.6, and risk in
Equation 3.7.
4. (5%) Show that the derivative of the softmax, $y_i = \frac{\exp(a_i)}{\sum_j \exp(a_j)}$, is
$\frac{\partial y_i}{\partial a_j} = y_i(\delta_{ij} - y_j)$, where $\delta_{ij}$ is 1 if i = j and 0 otherwise.
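
A finite-difference check of this identity (again only a sanity check, not the requested derivation) can be written in a few lines of NumPy:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())  # subtract max for numerical stability
    return e / e.sum()

a = np.array([0.5, -1.0, 2.0])
y = softmax(a)

# Analytic Jacobian: dy_i/da_j = y_i * (delta_ij - y_j)
analytic = np.diag(y) - np.outer(y, y)

# Numerical Jacobian via central differences
eps = 1e-6
numeric = np.zeros((3, 3))
for j in range(3):
    da = np.zeros(3)
    da[j] = eps
    numeric[:, j] = (softmax(a + da) - softmax(a - da)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # expect True
```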
5. (5%) With K = 2, show that using two softmax outputs is equivalent to using one sigmoid output.
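
The key identity behind problem 5 is that a two-class softmax depends only on the difference of its inputs; a minimal numerical check (not the requested proof):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a1, a2 = 1.3, -0.4
p_softmax = np.exp(a1) / (np.exp(a1) + np.exp(a2))
p_sigmoid = sigmoid(a1 - a2)
print(np.isclose(p_softmax, p_sigmoid))  # expect True
```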
6. (5%) Consider a density model given by a mixture distribution
$$p(x) = \sum_{k=1}^{K} \pi_k \, p(x \mid k)$$
and suppose that we partition the vector x into two parts so that $x = (x_a, x_b)$. Show that the
conditional density $p(x_b \mid x_a)$ is itself a mixture distribution, and find expressions for the mixing
coefficients and for the component densities.
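
A useful first step for problem 6 (a hint, not the full derivation) is to write the conditional via Bayes' rule and expand the joint with the mixture:

```latex
p(x_b \mid x_a) = \frac{p(x_a, x_b)}{p(x_a)}
               = \frac{\sum_{k=1}^{K} \pi_k \, p(x_a, x_b \mid k)}{p(x_a)}
```

It remains to rearrange each summand so that a factor of the form $p(x_b \mid x_a, k)$ appears, and to read off the new mixing coefficients.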
7. (5%) Consider the following data set for a binary class problem:

If we use entropy to measure impurity, calculate the total impurity after splitting on A and B
separately. Which attribute would the decision tree induction algorithm choose?
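
The computation in problem 7 is mechanical once the class counts in each branch are tallied; a small helper may be useful (the counts below are placeholders for illustration, not the table from the problem):

```python
import numpy as np

def entropy(counts):
    """Entropy (base 2) of a class-count vector."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return float(-(p * np.log2(p)).sum())

def split_impurity(branches):
    """Weighted entropy after a split; branches is a list of class-count vectors."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * entropy(b) for b in branches)

# Hypothetical counts for illustration only: each branch lists (#class+, #class-)
print(split_impurity([(4, 1), (1, 4)]))  # impurity after splitting on some attribute
```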
8. (15%) Consider the following training data:

x1 = (2, 3), x2 = (3, 2), x3 = (4, 4), x4 = (4, 2), x5 = (6, 4), x6 = (6, 3), x7 = (7, 2), x8 = (8, 3)
Let yA = −1, yB = +1 be the class indicators for the two classes:
A = {x1, x2, x3, x4}, B = {x5, x6, x7, x8}
(a) Using only a plot of the points above, specify which of the points should be identified as
support vectors.
(b) Draw the maximum-margin line that separates the classes (you don't have to do any
computations here). Write down the normalized normal vector w ∈ ℝ² of the separating
line and the offset parameter b ∈ ℝ.
(c) Consider the decision rule H(x) = ⟨w, x⟩ + b. Explain how this equation classifies
points on either side of the line. Determine the class for the points x9 = (3, 4), x10 = (7, 4),
and x11 = (5, 5).
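
One way to check your hand-drawn answer for problem 8 (a sketch assuming scikit-learn is available; the problem itself asks for a geometric argument) is to fit an approximately hard-margin linear SVM to the eight points:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2, 3], [3, 2], [4, 4], [4, 2],
              [6, 4], [6, 3], [7, 2], [8, 3]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

# A very large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("support vectors:\n", clf.support_vectors_)
print("w:", clf.coef_[0], " b:", clf.intercept_[0])

# Evaluate H(x) = <w, x> + b for the query points from part (c)
queries = np.array([[3, 4], [7, 4], [5, 5]])
print("H(x):", clf.decision_function(queries))
```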
Homework 2 – Programming
Deadline: 2023/11/13 23:30
Grading policy
The programming section of the homework consists of an implementation and a report. In the
implementation, you have to complete the blocks enclosed by "PLEASE WRITE HERE"
comments. If your submitted .ipynb file cannot be run successfully due to modifications
outside of those blocks, some points will be deducted. In the report,
answers are expected to state your observations, followed by explanations. Writing in
English or Chinese is acceptable.
Please upload your .ipynb file and report, named hw2_StudentID.ipynb and hw2_StudentID.pdf,
to eeclass. No late submissions will be accepted.
Implementation
In this part you need to train a Support Vector Machine (SVM) and a Decision Tree classifier to
classify the 3 classes of iris plant in the Iris dataset (the Iris dataset is provided with the HW materials).
The Iris dataset contains 3 classes of 50 instances each, where each class refers to a type of iris
plant. Each instance has 4 attributes: 1: sepal length, 2: sepal width, 3: petal length, 4: petal width (all in
cm), plus 5: the class (Iris Setosa, Iris Versicolour, or Iris Virginica).
1. (3%) Correctly implement the Decision Tree classifier, with a tree diagram plot.
2. (3%) Correctly implement the SVM, with a decision boundary plot. (A minimal sketch of both follows below.)
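
A minimal sketch of the two classifiers on Iris, assuming scikit-learn's bundled copy of the dataset and using only the two petal features so the SVM boundary can be drawn in 2-D (the actual notebook blocks may differ):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.svm import SVC
from sklearn.inspection import DecisionBoundaryDisplay

iris = load_iris()
X, y = iris.data[:, 2:4], iris.target  # petal length/width only, for 2-D plots

# Decision tree with a tree-diagram plot
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
plot_tree(tree, feature_names=iris.feature_names[2:4],
          class_names=iris.target_names, filled=True)
plt.show()

# SVM with a decision-boundary plot
svm = SVC(kernel="rbf", C=1, gamma=1).fit(X, y)
DecisionBoundaryDisplay.from_estimator(svm, X, response_method="predict", alpha=0.4)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
plt.show()
```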
Report
1. (7%) Implement the Decision Tree classifier with a tree diagram plot for depth parameters
2, 3, and 4 respectively; report the difference in performance (in terms of accuracy), state
which depth works better, and explain why. (You can analyze this through the confusion matrix
or the tree diagram.)
2. (5%) Measure the performance (accuracy) of the SVM classifier for Gamma values 1, 100,
and 500, with the regularization parameter C fixed at 1. (A compact sweep pattern for Q2–Q5 is sketched after this list.)
3. (5%) Plot the decision boundary for the conditions in Q2, i.e., Gamma = 1, 100, and 500 with
C = 1, and explain what you observe. Which setting is better, and why? (You can analyze this
through the confusion matrix or the changes in the decision boundary.)
4. (5%) Measure the performance (accuracy) of the SVM classifier for C values 1, 100, and
500, with Gamma fixed at 1.
5. (5%) Plot the decision boundary for the conditions in Q4, i.e., C = 1, 100, and 500 with
Gamma = 1, and explain which setting is better and why. (You can analyze this through the
confusion matrix or the changes in the decision boundary.)
6. (7%) Compare the performance of the two algorithms (Decision Tree and SVM) on the Iris
dataset and give some insight into the pros and cons of using one algorithm over the other,
and why.
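
For the sweeps in report questions 2–5, a compact pattern may help (a sketch assuming scikit-learn and a held-out test split; the split parameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0,
                                          stratify=y)

# Q2: vary gamma with C fixed at 1
for gamma in (1, 100, 500):
    acc = SVC(C=1, gamma=gamma).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"gamma={gamma:>3}, C=1 -> accuracy {acc:.3f}")

# Q4: vary C with gamma fixed at 1
for C in (1, 100, 500):
    acc = SVC(C=C, gamma=1).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"C={C:>3}, gamma=1 -> accuracy {acc:.3f}")
```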
