
INDIAN INSTITUTE OF TECHNOLOGY, KHARAGPUR

Department of Industrial Engineering and Management


Class Test 4

Subject Number: IM31202 Subject Name: Statistical Learning with Applications


Full Marks: 40 Time: 1 hour Date: 06.04.2023

Instructions: 1. Attempt all questions.


2. Maximum marks are shown against each question.
3. Answers should be short and to the point.

1. a. What is the difference between the maximal margin classifier and the support vector classifier? (5)

In the maximal margin classifier, it is assumed that the two classes of the response can be
perfectly separated by a hyperplane. A natural choice is the maximal margin hyperplane (also
known as the optimal separating hyperplane), which is the separating hyperplane that is
farthest from the training observations.

In many cases no separating hyperplane exists, and so there is no maximal margin classifier.
The support vector classifier, sometimes called a soft margin classifier, allows some
observations to be on the incorrect side of the margin, or even the incorrect side of the
hyperplane, rather than seeking the largest possible margin such that every observation is not
only on the correct side of the hyperplane but also on the correct side of the margin.

b. Explain the model form used for the support vector classifier. How is it different from the
maximal margin classifier? (5)

Model form of SVC

The support vector classifier is the solution to the optimization problem

maximize M over 𝛽0 , 𝛽1 , . . . , 𝛽𝑝 , 𝜖1 , . . . , 𝜖𝑛 , M   (9.12)
subject to ∑𝑗 𝛽𝑗² = 1,   (9.13)
𝑦𝑖 (𝛽0 + 𝛽1 𝑥𝑖1 + · · · + 𝛽𝑝 𝑥𝑖𝑝 ) ≥ M(1 − 𝜖𝑖 ),   (9.14)
𝜖𝑖 ≥ 0, ∑𝑖 𝜖𝑖 ≤ C.   (9.15)

C is a nonnegative tuning parameter. As in (9.11), M is the width of the margin; we seek to
make this quantity as large as possible. In (9.14), 𝜖1 , . . . , 𝜖𝑛 are slack variables that allow
individual observations to be on the wrong side of the margin or the hyperplane; we will
explain them in greater detail momentarily. Once we have solved (9.12)–(9.15), we
classify a test observation 𝑥∗ as before, by simply determining on which side of the
hyperplane it lies. That is, we classify the test observation based on the
sign of 𝑓(𝑥∗) = 𝛽0 + 𝛽1 𝑥1∗ + · · · + 𝛽𝑝 𝑥𝑝∗.
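As a minimal sketch of this decision rule (the coefficient values below are made up for illustration, not fitted to any data in this test), the predicted class depends only on the sign of the linear function:

import numpy as np

# Hypothetical fitted coefficients (beta0_hat, beta_hat); illustrative values only
beta0 = -1.0
beta = np.array([0.8, -0.5, 1.2])

def classify(x_star):
    """Classify a test observation by the sign of f(x*) = beta0 + beta^T x*."""
    f = beta0 + beta @ x_star
    return +1 if f > 0 else -1

# The predicted class depends only on which side of the hyperplane x* lies
print(classify(np.array([1.0, 2.0, 0.5])))   # -> -1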

If ϵi = 0 then the ith observation is on the correct side of the margin, as we saw in Section 9.1.4.
If ϵi > 0 then the ith observation is on the wrong side of the margin, and we say that the ith
observation has violated the margin. If ϵi > 1 then it is on the wrong side of the hyperplane.

In (9.15), C bounds the sum of the ϵi’s, and so it determines the number and severity of the
violations to the margin (and to the hyperplane) that we will tolerate. We can think of C as a
budget for the amount that the margin can be violated by the n observations. If C = 0 then there
is no budget for violations to the margin, and it must be the case that ϵ1 = · · · = ϵn = 0, in
which case (9.12)–(9.15) simply amounts to the maximal margin hyperplane optimization problem.
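A short scikit-learn sketch (not part of the original answer) illustrates this tolerance for margin violations on overlapping classes. Note that scikit-learn's C parameter penalizes the slack variables, so it acts roughly as the inverse of the budget C in (9.15): a large value tolerates few violations, a small value gives a wide, tolerant margin.

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy two-class data; cluster_std = 2.0 makes the classes overlap, so no
# separating hyperplane exists and the maximal margin classifier is undefined.
X, y = make_blobs(n_samples=60, centers=2, cluster_std=2.0, random_state=0)

# scikit-learn's C is a penalty on the slack variables (roughly the inverse
# of the budget in (9.15)); the number of support vectors shrinks as the
# tolerance for margin violations shrinks.
for c in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=c).fit(X, y)
    print(f"C = {c:>6}: {clf.n_support_.sum()} support vectors")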

2. What is bagging in the context of decision trees? Explain how it can be applied in regression
and classification problems. (5)

3. From the dataset below, determine what proportion of the variability is explained by the first
principal component. (10)

X1 4 8 13 7
X2 11 4 5 14

𝑋𝑖′ = (𝑋𝑖 − 𝜇𝑖 ), with 𝜇1 = 8 and 𝜇2 = 8.5:
X1' -4 0 5 -1
X2' 2.5 -4.5 -3.5 5.5

The sample covariance matrix of the centered data is

Σ𝑋′ = ( 𝜎²𝑋1′          𝐶𝑜𝑣(𝑋1′, 𝑋2′)
        𝐶𝑜𝑣(𝑋1′, 𝑋2′)   𝜎²𝑋2′        )

     = ( 14   -11
         -11   23 )

where 𝜎²𝑋1′ = ∑𝑖(𝑋1𝑖′)² / (𝑛 − 1) = 42/3 = 14, 𝜎²𝑋2′ = ∑𝑖(𝑋2𝑖′)² / (𝑛 − 1) = 69/3 = 23, and
𝐶𝑜𝑣(𝑋1′, 𝑋2′) = ∑𝑖 𝑋1𝑖′ 𝑋2𝑖′ / (𝑛 − 1) = -33/3 = -11, with 𝑛 = 4 observations (𝑛 − 1 = 3).

Eigenvalues of Σ𝑋′ (roots of λ² − 37λ + 201 = 0): 30.385 and 6.615

PVE by 1st PC = 30.385/(30.385+6.615) = 0.821 or 82.1%
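The arithmetic above can be checked with a short NumPy sketch (not part of the original solution), which recomputes the covariance matrix, its eigenvalues, and the PVE:

import numpy as np

# Data from question 3 (rows = observations, columns = X1, X2)
X = np.array([[4.0, 11.0],
              [8.0,  4.0],
              [13.0, 5.0],
              [7.0, 14.0]])

Xc = X - X.mean(axis=0)            # center each variable (mu1 = 8, mu2 = 8.5)
S = np.cov(Xc, rowvar=False)       # sample covariance matrix, divides by n - 1 = 3
eigvals = np.linalg.eigvalsh(S)    # eigenvalues of the 2x2 covariance matrix

pve = eigvals.max() / eigvals.sum()
print(S)        # [[ 14. -11.] [-11.  23.]]
print(eigvals)  # approx. [ 6.615 30.385]
print(pve)      # approx. 0.821 -> first PC explains about 82.1% of the variability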

4. What are the problems of the Gradient Descent algorithm in the context of neural networks? (5)

5. How does the Adagrad algorithm mitigate the issues in Stochastic Gradient Descent? (5)

6. Explain how convolution and pooling layers work in a CNN. (5)
