
ISE 233 HW #1 – Solutions

Due 11:59 PM on March 1st, 2021 (Monday)

Problem 1 (Linear Regression and Gradient Descent):

Part (a): What is the loss function for linear regression? Describe by words and formula.
Solution: The loss function for linear regression is the sum of squared errors (or mean squared error): the average squared difference between the predictions and the true values,
J(θ) = (1/2m) Σᵢ (ŷᵢ − yᵢ)²,
where m is the number of samples, ŷᵢ = θ₀ + θ₁xᵢ is the prediction, and yᵢ is the true value.

Part (b): Why would we use an iterative algorithm for the linear regression problem?
Solution: A closed-form equation can be used for the simple linear regression problem, but it is
hard to use for the multiple linear regression problem, so an iterative algorithm such as
gradient descent is used instead.
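For the small dataset used in Problem 2 below, the closed-form (normal-equation) solution can be sketched in a few lines of NumPy; this is an illustrative check, not part of the assigned solution:

```python
import numpy as np

# Normal equation: theta = (X^T X)^{-1} X^T y, solved without an explicit inverse.
# Data matches Problem 2: x = [1, 2, 3], y = [1.5, 2, 2.5].
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column of ones for the intercept
y = np.array([1.5, 2.0, 2.5])

theta = np.linalg.solve(X.T @ X, X.T @ y)  # [theta_0, theta_1]
print(theta)  # the points lie exactly on y = 1 + 0.5x
```

With many features, forming and solving the normal equations becomes expensive, which is when the iterative approach pays off.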

Part (c): What can happen if the learning rate is too high or too low?
Solution: If the learning rate is too small, gradient descent takes a long time to converge
and can be very slow.
If the learning rate is too high, gradient descent can overshoot the minimum: you might jump
across the valley and end up on the other side, possibly even higher up than you were
before. So the algorithm may fail to converge, or may even diverge.

Part (d): How does the gradient descent algorithm update the θ's?

Solution: Each parameter is moved a small step in the direction opposite to the gradient of the loss:
θⱼ_new = θⱼ_old − α ∂J/∂θⱼ,
where α is the learning rate.

Problem 2 (Linear Regression and Gradient Descent): You are given a vector of
measurements x and true values y:

x = [1, 2, 3]ᵀ    y = [1.5, 2, 2.5]ᵀ

Part (a): Plot y and x as points.

Solution: Scatter plot of the three points (1, 1.5), (2, 2), (3, 2.5). [figure omitted]

Part (b): If we start with θ₀ = 0 and θ₁ = 0, what is the initial value for the loss function?

Solution: (1/2m) Σᵢ (yᵢ − ŷᵢ)² = 1/(2·3) · (1.5² + 2² + 2.5²) = 2.083
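As a quick check of this value, assuming θ₀ = θ₁ = 0 so that every prediction is zero:

```python
# With theta_0 = theta_1 = 0, every prediction is 0, so the loss reduces to
# (1 / (2m)) * sum(y_i^2).
y = [1.5, 2.0, 2.5]
m = len(y)
loss = sum(yi ** 2 for yi in y) / (2 * m)
print(round(loss, 3))  # 2.083
```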

Part (c): Compute the next estimate of θ₀ and θ₁ after 1 iteration of gradient descent.
Solution: Selecting the step size α = 0.1:

θ₀_new = θ₀_old − α · (1/m) Σ(θ₀ + θ₁xᵢ − yᵢ) = 0 − (0.1/3) · (−1.5 − 2 − 2.5) = 0.2
θ₁_new = θ₁_old − α · (1/m) Σ(θ₀ + θ₁xᵢ − yᵢ)xᵢ = 0 − (0.1/3) · (−1.5·1 − 2·2 − 2.5·3) = 0.43

ŷ = 0.2 + 0.43x
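The single gradient-descent step above can be reproduced in a short sketch:

```python
# One gradient-descent step from theta_0 = theta_1 = 0 with alpha = 0.1.
x = [1.0, 2.0, 3.0]
y = [1.5, 2.0, 2.5]
m = len(x)
alpha = 0.1
theta0, theta1 = 0.0, 0.0

grad0 = sum((theta0 + theta1 * xi) - yi for xi, yi in zip(x, y)) / m
grad1 = sum(((theta0 + theta1 * xi) - yi) * xi for xi, yi in zip(x, y)) / m
theta0 -= alpha * grad0  # 0 - 0.1 * (-2.0)   = 0.2
theta1 -= alpha * grad1  # 0 - 0.1 * (-13/3) ~= 0.433
print(theta0, theta1)
```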

Problem 3 (Data Concepts): What are a data object, a data label, and a data attribute? Describe
in words and give an example.
Solution: A data object represents an entity (also called a sample, example, instance, data point, or
object); e.g., a customer record in a sales database. A data label is the tag or target value attached to
a data object/sample; e.g., "spam" or "not spam" for an email. A data attribute is a data field
representing a characteristic or feature of a data object; e.g., a customer's age.

Problem 4 (Box Plots): Consider the following data.

1 1 2 2 4 6 6.8 7.2 8 8.3 9 10 10 11.5

Part (a): What is Q1, Q3, median, smallest value, largest value, and IQR?
Solution (1st method):
Q1 = (2 + 2)/2 = 2; Median = (6.8 + 7.2)/2 = 7; Q3 = (9 + 10)/2 = 9.5; Max = 11.5; Min = 1; IQR = Q3 − Q1 = 7.5
Part (b): Use box plots to identify if there is any outlier.
Solution: The 1.5·IQR fences are Q1 − 1.5·IQR = 2 − 11.25 = −9.25 and Q3 + 1.5·IQR = 9.5 + 11.25 = 20.75. All data fall inside the fences (min = 1, max = 11.5), so there is no outlier.
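The outlier check can be sketched in code. Quartile conventions vary between libraries, so this uses the Q1 and Q3 values from Part (a) directly rather than recomputing them:

```python
# 1.5*IQR outlier rule, using Q1 = 2 and Q3 = 9.5 from Part (a).
data = [1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5]
q1, q3 = 2.0, 9.5
iqr = q3 - q1                 # 7.5
lower = q1 - 1.5 * iqr        # -9.25
upper = q3 + 1.5 * iqr        # 20.75
outliers = [v for v in data if v < lower or v > upper]
print(outliers)  # [] -> no outliers
```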
Problem 5 (Data Dissimilarity): Consider the following data matrix, and define a 3-by-3
data dissimilarity matrix.

Object Number   Attribute 1   Attribute 2   Attribute 3
1               A             Excellent     45
2               B             Fair          22
3               C             Good          64
Solution:
This is mixed-type data. For a mixed-type data matrix, the difference on a numeric attribute
equals |x_if − x_jf| / (max − min); the difference on a nominal/binary attribute equals 1 if the
values of the two objects differ and 0 otherwise; the difference on an ordinal attribute is computed
the same way as a numeric attribute after mapping the ranks onto [0, 1] (Fair = 0, Good = 0.5,
Excellent = 1). Finally, the overall dissimilarity is the average over all attributes. Therefore, the
dissimilarity matrix is:

0
0.85   0
0.65   0.83   0
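The averaging over the three attribute types can be sketched directly; the ordinal mapping Fair = 0, Good = 0.5, Excellent = 1 follows the standard (rank − 1)/(M − 1) rescaling:

```python
# Mixed-type dissimilarity for the three objects in Problem 5.
nominal = ["A", "B", "C"]                       # Attribute 1 (nominal)
ordinal_scale = {"Fair": 0.0, "Good": 0.5, "Excellent": 1.0}
ordvals = [ordinal_scale[v] for v in ["Excellent", "Fair", "Good"]]  # Attribute 2
numeric = [45.0, 22.0, 64.0]                    # Attribute 3
rng = max(numeric) - min(numeric)               # 64 - 22 = 42

def d(i, j):
    d_nom = 0.0 if nominal[i] == nominal[j] else 1.0
    d_ord = abs(ordvals[i] - ordvals[j])
    d_num = abs(numeric[i] - numeric[j]) / rng
    return (d_nom + d_ord + d_num) / 3.0        # average over the 3 attributes

print(round(d(1, 0), 2), round(d(2, 0), 2), round(d(2, 1), 2))  # 0.85 0.65 0.83
```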

Problem 6 (Principal Component Analysis): Consider the following data matrix. Use
PCA to reduce the dimension by 1.

Object Number   Attribute 1   Attribute 2
1               90            60
2               90            90
3               60            60
4               60            60
5               30            30

Part (a): Compute the mean of every attribute, and the mean adjusted data matrix.
Mean of attribute 1=66
Mean of attribute 2=60
Mean adjusted data matrix=
24 0
24 30
-6 0
-6 0
-36 -30

Part (b): Compute the covariance matrix.


Covariance matrix = [630  450]
                    [450  450]
(sample variance and sample covariance, computed with divisor n − 1 = 4)
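This matrix can be verified with NumPy, whose `np.cov` uses the same n − 1 divisor by default:

```python
import numpy as np

# Sample covariance matrix of the Problem 6 data (columns are attributes).
X = np.array([[90, 60], [90, 90], [60, 60], [60, 60], [30, 30]], dtype=float)
C = np.cov(X, rowvar=False)  # divisor n - 1 = 4 by default
print(C)  # [[630. 450.] [450. 450.]]
```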

Part (c): Compute the eigenvalues and eigenvectors of the covariance matrix.
Solution:
λ₁ = 998.9, v₁ = [1.22, 1]ᵀ; after normalizing the eigenvector,
v₁ = [1.22/√(1.22² + 1²), 1/√(1.22² + 1²)]ᵀ = [0.77339, 0.63393]ᵀ
λ₂ = 81.1, v₂ = [−0.82, 1]ᵀ; after normalizing the eigenvector,
v₂ = [−0.82/√(0.82² + 1²), 1/√(0.82² + 1²)]ᵀ = [−0.63408, 0.77327]ᵀ
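The eigen-decomposition can be checked with NumPy; `np.linalg.eigh` returns eigenvalues in ascending order, and eigenvector signs may differ from the hand computation:

```python
import numpy as np

# Eigenvalues/eigenvectors of the Problem 6 covariance matrix.
C = np.array([[630.0, 450.0], [450.0, 450.0]])
vals, vecs = np.linalg.eigh(C)
print(vals)        # approx [81.1, 998.9], ascending order
print(vecs[:, 1])  # top eigenvector, approx +/-[0.773, 0.634]
```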

Part (d): Pick the top one principal component.


Solution: Pick λ₁ = 998.9, v₁ = [0.77339, 0.63393]ᵀ

Part (e): Derive the new data.

Solution: Project the mean-adjusted data onto the chosen principal component:
new data = (mean-adjusted data matrix) · v₁ = [18.56, 37.58, −4.64, −4.64, −46.86]ᵀ
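Part (e) asks for the projection of the mean-adjusted data onto the top principal component; a quick NumPy check, using v₁ = [0.77339, 0.63393] from Part (c):

```python
import numpy as np

# Project the mean-adjusted Problem 6 data onto the top eigenvector v1.
B = np.array([[24, 0], [24, 30], [-6, 0], [-6, 0], [-36, -30]], dtype=float)
v1 = np.array([0.77339, 0.63393])
new_data = B @ v1
print(np.round(new_data, 2))  # approx [ 18.56  37.58  -4.64  -4.64 -46.86]
```

Note that the projected values sum to (approximately) zero, as expected for mean-centered data.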
