Answer Key
Question 1. Write down the cubic model form of a regression spline with 3 knots at 𝛼1, 𝛼2
and 𝛼3. (5)
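Answer: Using the truncated power basis, the cubic regression spline with knots at $\alpha_1$, $\alpha_2$ and $\alpha_3$ is
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \alpha_1)_+^3 + \beta_5 (x - \alpha_2)_+^3 + \beta_6 (x - \alpha_3)_+^3$
where $(x - \alpha)_+^3$ equals $(x - \alpha)^3$ for $x > \alpha$ and 0 otherwise.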
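As a minimal sketch, assuming the truncated power basis (one common way to write a cubic regression spline), the row of basis values for a single x can be built as follows. The function name `cubic_spline_basis` and the numeric knot values are illustrative assumptions, not part of the question.

```python
def cubic_spline_basis(x, knots):
    """Return the truncated power basis row [1, x, x^2, x^3, (x - k)^3_+, ...]."""
    row = [1.0, x, x ** 2, x ** 3]
    for k in knots:
        # (x - k)^3_+ is zero to the left of the knot and cubic to the right
        row.append(max(x - k, 0.0) ** 3)
    return row


# Hypothetical knot locations standing in for alpha_1, alpha_2, alpha_3
knots = [1.0, 2.0, 3.0]
row = cubic_spline_basis(2.5, knots)
print(len(row))  # 7 basis functions: 4 polynomial terms plus 1 per knot
```

Each knot contributes exactly one extra coefficient, so a cubic spline with 3 knots has 7 coefficients in total.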
Question 2. Write down all the constraints needed to be put on the piecewise cubic model to
convert it to a natural cubic spline. Explain the necessity of all the constraints. (5 + 5)
Answer: To convert a piecewise cubic model to a natural cubic spline, the following
constraints need to be applied: (5+5)
1. Continuity: The function must be continuous at all points where two cubic polynomials
meet. This means that the function values on either side of the knots should be equal.
2. Continuity of the first derivative: The first derivative of the function should also be
continuous at all join points. This means that the slopes of the cubic polynomials on
either side of the knots should be equal.
3. Continuity of the second derivative: The second derivative of the function should
also be continuous at all join points. This means that the curvature of the cubic
polynomials on either side of the knots should be equal.
4. Linearity at the boundary: The function is required to be linear beyond the boundary
knots. Because there are usually few data points near the boundaries, an unconstrained
cubic fit has high variance there. The linearity constraint stabilizes the fit in those regions.
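The effect of these constraints on model flexibility can be illustrated by counting free parameters. The sketch below assumes the usual convention that the linearity constraints apply in the two regions outside the outermost knots, and the function names are hypothetical.

```python
def piecewise_cubic_params(num_knots):
    # K knots split the range into K + 1 regions, each with 4 coefficients
    return 4 * (num_knots + 1)


def cubic_spline_df(num_knots):
    # Continuity of f, f' and f'' removes 3 free parameters at each knot
    return piecewise_cubic_params(num_knots) - 3 * num_knots


def natural_cubic_spline_df(num_knots):
    # Linearity in each boundary region forces the quadratic and cubic
    # coefficients to zero there: 2 constraints per side, 4 in total
    return cubic_spline_df(num_knots) - 4


for k in (3, 5):
    print(k, piecewise_cubic_params(k), cubic_spline_df(k), natural_cubic_spline_df(k))
```

With 3 knots the count drops from 16 (unconstrained piecewise cubic) to 7 (cubic spline) to 3 (natural cubic spline), which is why each constraint reduces variance.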
Question 3. What function is optimized to generate the smoothing spline? What is the necessity
to use the roughness penalty component? (5)
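Answer: The criterion to be minimized is
$\sum_{i=1}^{n} (y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt$
where λ is a nonnegative tuning parameter. The first term is the residual sum of squares and
the second term is the roughness penalty. The function g that minimizes this criterion is
known as a smoothing spline.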
The main purpose of the roughness penalty is to prevent overfitting, which occurs when a
model fits too closely to the training data and fails to generalize well to new data.
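To make the trade-off concrete, here is a rough numerical sketch of a criterion of the form residual sum of squares plus λ times the integrated squared second derivative. It assumes equally spaced x values and approximates g'' by central second differences. The function name and the toy data are illustrative assumptions only.

```python
def penalized_rss(xs, ys, g_vals, lam):
    """Residual sum of squares plus lam times a discrete roughness penalty.

    Assumes xs are equally spaced and g_vals[i] approximates g(xs[i]).
    """
    h = xs[1] - xs[0]
    rss = sum((y - g) ** 2 for y, g in zip(ys, g_vals))
    # Central second difference approximates g''(x_i) at interior points
    second = [
        (g_vals[i - 1] - 2 * g_vals[i] + g_vals[i + 1]) / h ** 2
        for i in range(1, len(g_vals) - 1)
    ]
    # Riemann sum approximating the integral of g''(t)^2
    roughness = sum(d ** 2 for d in second) * h
    return rss + lam * roughness


# Hypothetical toy data: a zig-zag response
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 0.0, 1.0, 0.0]
wiggly = ys            # interpolates every point
flat = [0.4] * 5       # constant fit at the mean of ys

for lam in (0.0, 1.0):
    print(lam, penalized_rss(xs, ys, wiggly, lam), penalized_rss(xs, ys, flat, lam))
```

With λ = 0 the interpolating fit has the lower criterion value, while with λ = 1 the smooth fit is preferred, which is the overfitting control the penalty provides.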
Question 4. In spite of being the optimal classifier, what are the disadvantages of Bayes
classifier? (5)
Question 5. Consider the 10 companies listed in the table below. For each company, we have
information on whether charges were filed against it, whether it is a small or large company, and
whether (after investigation) it turned out to be fraudulent (F) or truthful (T) in financial
reporting.
A “small” company has just been “charged” with fraudulent financial reporting. Using the naïve
Bayes classification technique, predict the class of the company (T or F). Estimate the
probability that this company is fraudulent. (10 + 5)
Solution: (a) Naïve Bayes classification technique, for predicting the class of the company
(T or F) (10)
y1 corresponds to the class ‘truthful’, and y2 corresponds to the class ‘fraudulent’. Therefore,
M = 2, N = 10
$P(y_1) = \frac{N_1}{N} = \frac{6}{10} = 0.6$
$P(y_2) = \frac{N_2}{N} = \frac{4}{10} = 0.4$
$V_{x_1}$: {yes, no} = $\{v_{1x_1}, v_{2x_1}\}$; $d_1 = 2$
$V_{x_2}$: {small, large} = $\{v_{1x_2}, v_{2x_2}\}$; $d_2 = 2$
Value $v_{lx_j}$        Count $N_{qv_{lx_j}}$
                        truthful (q = 1)    fraudulent (q = 2)
$v_{1x_1}$: yes         1                   3
$v_{2x_1}$: no          5                   1
$v_{1x_2}$: small       4                   1
$v_{2x_2}$: large       2                   3
We consider an instance from the given dataset (the same procedure applies for a data tuple
not in the given dataset (unseen instance)):
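$P(x_1 \mid y_1) = \frac{N_{1v_{1x_1}}}{N_1} = \frac{1}{6}$
$P(x_1 \mid y_2) = \frac{N_{2v_{1x_1}}}{N_2} = \frac{3}{4}$
$P(x_2 \mid y_1) = \frac{N_{1v_{1x_2}}}{N_1} = \frac{4}{6}$
$P(x_2 \mid y_2) = \frac{N_{2v_{1x_2}}}{N_2} = \frac{1}{4}$
$P(x \mid y_1) = P(x_1 \mid y_1) \times P(x_2 \mid y_1) = \frac{1}{6} \times \frac{4}{6} = \frac{1}{9}$
$P(x \mid y_2) = P(x_1 \mid y_2) \times P(x_2 \mid y_2) = \frac{3}{4} \times \frac{1}{4} = \frac{3}{16}$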
$P(x \mid y_1) P(y_1) = \frac{1}{9} \times \frac{6}{10} = \frac{1}{15}$
$P(x \mid y_2) P(y_2) = \frac{3}{16} \times \frac{4}{10} = \frac{3}{40}$
$y_{NB} = \arg\max_{q} P(x \mid y_q) P(y_q)$
Since $\frac{3}{40} > \frac{1}{15}$, this gives $q = 2$.
Therefore, for the pattern x = {yes, small}, the predicted class is ‘fraudulent.’
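The prediction can be double-checked with exact fractions. The sketch below hard-codes the class counts and value counts from the count table. The feature names `charged` and `size` and the function name are labels chosen here for illustration.

```python
from fractions import Fraction

# Class counts and per-class value counts copied from the count table
class_counts = {"truthful": 6, "fraudulent": 4}
value_counts = {
    "charged": {"yes": {"truthful": 1, "fraudulent": 3},
                "no": {"truthful": 5, "fraudulent": 1}},
    "size": {"small": {"truthful": 4, "fraudulent": 1},
             "large": {"truthful": 2, "fraudulent": 3}},
}


def naive_bayes_scores(instance):
    """Return P(x | y_q) * P(y_q) for each class, as exact fractions."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_q in class_counts.items():
        score = Fraction(n_q, total)  # prior P(y_q)
        for feature, value in instance.items():
            # Conditional P(x_j | y_q) estimated as a relative frequency
            score *= Fraction(value_counts[feature][value][label], n_q)
        scores[label] = score
    return scores


scores = naive_bayes_scores({"charged": "yes", "size": "small"})
prediction = max(scores, key=scores.get)
print(scores["truthful"], scores["fraudulent"], prediction)  # 1/15 3/40 fraudulent
```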
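(b) Probability that the company is fraudulent (5)
By the law of total probability,
$P(x) = P(x \mid y_1) P(y_1) + P(x \mid y_2) P(y_2) = \frac{1}{9} \times \frac{6}{10} + \frac{3}{16} \times \frac{4}{10} = \frac{17}{120}$
$P(y_2 \mid x) = \frac{P(x \mid y_2) P(y_2)}{P(x)}$
$P(y_2 \mid x) = \frac{\frac{3}{16} \times \frac{4}{10}}{\frac{17}{120}} = \frac{9}{17} = 0.5294$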
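The normalization step is Bayes' rule applied to the two joint values 1/15 and 3/40 computed earlier. A minimal sketch, with a hypothetical helper name `posterior`:

```python
from fractions import Fraction


def posterior(joint_scores):
    """Normalise the P(x | y_q) P(y_q) values so they sum to 1 (Bayes' rule)."""
    evidence = sum(joint_scores.values())  # P(x) by total probability
    return {label: s / evidence for label, s in joint_scores.items()}


# Joint values P(x | y_q) P(y_q) taken from the worked solution
joint = {"truthful": Fraction(1, 15), "fraudulent": Fraction(3, 40)}
post = posterior(joint)
print(post["fraudulent"])         # 9/17
print(float(post["fraudulent"]))  # roughly 0.5294
```

The two posteriors, 8/17 and 9/17, sum to 1, and the larger one agrees with the predicted class.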