
SLA Class Test 2

Answer Key

Question 1. Write down the cubic model form of a regression spline with 3 knots at $\alpha_1$, $\alpha_2$ and $\alpha_3$. (5)

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$

Answer: A cubic spline with K knots can be modelled as (5)

$y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_{K+3} b_{K+3}(x_i) + \varepsilon_i$

for an appropriate choice of basis functions $b_1, b_2, \ldots, b_{K+3}$. The model can then be fit using least squares.
The most direct way to represent a cubic spline is to start off with a basis for a cubic polynomial, namely $x, x^2, x^3$, and then add one truncated power basis function per knot.
A truncated power basis function is defined as

$h(x, \alpha) = (x - \alpha)^3_+ = \begin{cases} (x - \alpha)^3 & \text{if } x > \alpha \\ 0 & \text{otherwise} \end{cases}$

where $\alpha$ is the knot.
In order to fit a cubic spline to a data set with 3 (= K) knots, we perform least squares regression with an intercept and 6 (= 3 + K) predictors of the form $X, X^2, X^3, h(X, \alpha_1), h(X, \alpha_2), h(X, \alpha_3)$, where $\alpha_1, \alpha_2, \alpha_3$ are the knots.
A cubic spline with 3 knots can be modelled as

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \beta_4 h(x_i, \alpha_1) + \beta_5 h(x_i, \alpha_2) + \beta_6 h(x_i, \alpha_3) + \varepsilon_i$

where

$h(x, \alpha_i) = (x - \alpha_i)^3_+ = \begin{cases} (x - \alpha_i)^3 & \text{if } x > \alpha_i \\ 0 & \text{otherwise} \end{cases}$
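As a quick check, this construction can be carried out directly with least squares. A minimal NumPy sketch (the knot locations and simulated data below are illustrative, not from the question):

```python
import numpy as np

def truncated_power_basis(x, knots):
    """Design matrix for a cubic spline: 1, x, x^2, x^3, plus one
    truncated power basis column (x - alpha_k)^3_+ per knot."""
    cols = [np.ones_like(x), x, x**2, x**3]
    for a in knots:
        cols.append(np.where(x > a, (x - a)**3, 0.0))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

knots = [2.5, 5.0, 7.5]                       # alpha_1, alpha_2, alpha_3
X = truncated_power_basis(x, knots)           # intercept + 6 predictors
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
y_hat = X @ beta
print(X.shape)  # (200, 7): intercept plus 3 + K = 6 basis functions
```

With K = 3 knots the design matrix has 7 columns, matching the intercept plus 6 predictors described above.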

Question 2. Write down all the constraints that must be imposed on the piecewise cubic model to convert it to a natural cubic spline. Explain the necessity of each constraint. (5 + 5)

Answer: To convert a piecewise cubic model to a natural cubic spline, the following
constraints need to be applied: (5+5)
1. Continuity: The function must be continuous at all points where two cubic polynomials
meet. This means that the function values on either side of the knots should be equal.
2. Continuity of the first derivative: The first derivative of the function should also be
continuous at all join points. This means that the slopes of the cubic polynomials on
either side of the knots should be equal.

3. Continuity of the second derivative: The second derivative of the function should
also be continuous at all join points. This means that the curvature of the cubic
polynomials on either side of the knots should be equal.
4. Linearity at the boundary: Since the number of data points near the boundary is usually small, the variance of the fit increases there. A linearity constraint beyond the boundary knots therefore ensures stability.

The above constraints are required for a natural cubic spline.


The derivative constraints ensure that the resulting natural cubic spline is smooth, with no sharp changes at the knots. The continuity constraint ensures that the spline has no jumps at the knots, which is important for interpolation and curve fitting. Continuity of the first derivative ensures that the slope of the spline varies smoothly, which is important for applications such as motion planning or trajectory planning. Continuity of the second derivative ensures that the curvature of the spline varies smoothly, which is important for applications such as curve smoothing or path smoothing.
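The boundary-linearity constraint can be observed numerically: a "natural" boundary condition forces the second derivative to zero at the end knots, so the spline is linear beyond them. A small sketch, assuming SciPy is available (the data points are illustrative):

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.sin(x)

# bc_type='natural' imposes g''(x) = 0 at both boundary knots,
# which is exactly the linearity-at-boundary constraint.
spline = CubicSpline(x, y, bc_type='natural')

print(spline(x[0], 2))   # second derivative at left boundary: ~0
print(spline(x[-1], 2))  # second derivative at right boundary: ~0
```

Continuity of the spline and of its first and second derivatives at interior knots holds by construction for any cubic spline; the natural variant adds only the two boundary conditions shown.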

Question 3. What function is optimized to generate the smoothing spline? What is the necessity
to use the roughness penalty component? (5)

Answer: A natural approach is to find the function g that minimizes - (5)


$\sum_{i=1}^{n} \left(y_i - g(x_i)\right)^2 + \lambda \int g''(t)^2 \, dt$

where $\lambda$ is a nonnegative tuning parameter. The function g that minimizes this criterion is known as a smoothing spline.
The roughness penalty $\lambda \int g''(t)^2 \, dt$ is necessary to prevent overfitting: without it, the residual sum of squares alone is minimized by any function that interpolates every training point exactly, which fits the training data too closely and fails to generalize to new data.
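A discrete analogue of this criterion makes the role of $\lambda$ concrete: replace the integral of $g''(t)^2$ by a sum of squared second differences and solve the resulting penalized least-squares problem in closed form. A minimal NumPy sketch (the grid, noise level, and $\lambda$ values are illustrative):

```python
import numpy as np

def smooth(y, lam):
    """Discrete smoothing-spline analogue: minimize
    ||y - g||^2 + lam * ||D2 g||^2, where D2 takes second differences."""
    n = y.size
    # Second-difference matrix, shape (n-2, n): each row is [1, -2, 1].
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Closed-form minimizer: (I + lam * D2^T D2) g = y
    return np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 2 * np.pi, 50)) + rng.normal(scale=0.3, size=50)

g_rough = smooth(y, lam=0.0)     # no penalty: reproduces the noisy data
g_smooth = smooth(y, lam=100.0)  # heavy penalty: much smoother curve

# Roughness (sum of squared second differences) drops as lambda grows.
rough = lambda g: np.sum(np.diff(g, 2) ** 2)
print(rough(g_rough) > rough(g_smooth))  # True
```

At $\lambda = 0$ the minimizer interpolates the data exactly; as $\lambda \to \infty$ the second differences are forced toward zero and the solution approaches a straight line.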

Question 4. In spite of being the optimal classifier, what are the disadvantages of the Bayes classifier? (5)

Answer: Disadvantages of the Bayes classifier - (5)

• It requires prior knowledge of the prior probabilities $P(y_q)$ and the class-conditional likelihoods $P(x \mid y_q)$.
• In real-world problems, these probabilities are not known in advance.
• The conditional densities can be parameterized only when the probabilistic structure of the problem is known.
• In most pattern recognition problems, the assumption that the probability structure is known is not valid.
• Classical parametric models are unimodal, whereas multimodal densities are found in many real problems.

Question 5. Consider the 10 companies listed in below table. For each company, we have
information on whether charges were filed against it, whether it is a small or large company, and
whether (after investigation) it turned out to be fraudulent (F) or truthful (T) in financial
reporting.
A "small" company has just been "charged" with fraudulent financial reporting. Using the naïve Bayes classification technique, predict the class of the company (T or F). Estimate the probability of this company being fraudulent. (10 + 5)

Company s(i)    x1: charges filed    x2: company size    y: status


1 yes small truthful
2 no small truthful
3 no large truthful
4 no large truthful
5 no small truthful
6 no small truthful
7 yes small fraudulent
8 yes large fraudulent
9 no large fraudulent
10 yes large fraudulent

Solution: (a) Naïve Bayes classification technique for predicting the class of the company (T or F). (10)

Company s(i)    x1: charges filed    x2: company size    status    y
1 yes small truthful y1
2 no small truthful y1
3 no large truthful y1
4 no large truthful y1
5 no small truthful y1
6 no small truthful y1
7 yes small fraudulent y2
8 yes large fraudulent y2
9 no large fraudulent y2
10 yes large fraudulent y2

y1 corresponds to the class ‘truthful’, and y2 corresponds to the class ‘fraudulent’. Therefore,
M = 2, N = 10

$P(y_1) = \frac{N_1}{N} = \frac{6}{10} = 0.6$

$P(y_2) = \frac{N_2}{N} = \frac{4}{10} = 0.4$

$V^{x_1}$: {yes, no} = $\{v_1^{x_1}, v_2^{x_1}\}$; $d_1 = 2$

$V^{x_2}$: {small, large} = $\{v_1^{x_2}, v_2^{x_2}\}$; $d_2 = 2$

The count table generated from the data is given below.

Table: Number of training samples, $N_q(v_l^{x_j})$, of class q having value $v_l^{x_j}$

Value $v_l^{x_j}$     truthful (q = 1)    fraudulent (q = 2)
$v_1^{x_1}$: yes             1                    3
$v_2^{x_1}$: no              5                    1
$v_1^{x_2}$: small           4                    1
$v_2^{x_2}$: large           2                    3

We consider an instance from the given dataset (the same procedure applies to an unseen data tuple not in the dataset):

x: {yes, small} = {x1, x2}

In the discretized domain, 'yes' corresponds to $v_1^{x_1}$ and 'small' corresponds to $v_1^{x_2}$.

$P(x_1 \mid y_1) = \frac{N_1(v_1^{x_1})}{N_1} = \frac{1}{6}$

$P(x_1 \mid y_2) = \frac{N_2(v_1^{x_1})}{N_2} = \frac{3}{4}$

$P(x_2 \mid y_1) = \frac{N_1(v_1^{x_2})}{N_1} = \frac{4}{6}$

$P(x_2 \mid y_2) = \frac{N_2(v_1^{x_2})}{N_2} = \frac{1}{4}$

$P(x \mid y_1) = P(x_1 \mid y_1) \cdot P(x_2 \mid y_1) = \frac{1}{6} \cdot \frac{4}{6} = \frac{1}{9}$

$P(x \mid y_2) = P(x_1 \mid y_2) \cdot P(x_2 \mid y_2) = \frac{3}{4} \cdot \frac{1}{4} = \frac{3}{16}$

$P(x \mid y_1)\,P(y_1) = \frac{1}{9} \cdot \frac{6}{10} = \frac{1}{15}$

$P(x \mid y_2)\,P(y_2) = \frac{3}{16} \cdot \frac{4}{10} = \frac{3}{40}$

$y_{NB} = \arg\max_q P(x \mid y_q)\, P(y_q)$

Since $\frac{3}{40} > \frac{1}{15}$, this gives q = 2.
Therefore, for the pattern x = {yes, small}, the predicted class is ‘fraudulent.’

(b) Probability of this company being fraudulent. (5)


$P(y_2 \mid x) = \frac{P(x \mid y_2)\, P(y_2)}{P(x)}$

$P(x) = P(x \mid y_1)\,P(y_1) + P(x \mid y_2)\,P(y_2) = \frac{1}{9} \cdot \frac{6}{10} + \frac{3}{16} \cdot \frac{4}{10} = \frac{17}{120}$

$P(y_2 \mid x) = \frac{\frac{3}{16} \cdot \frac{4}{10}}{\frac{17}{120}} = \frac{9}{17} = 0.5294$

Therefore, the probability of this company being fraudulent is 9/17 ≈ 0.53.
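The whole calculation can be verified with exact rational arithmetic. A short Python sketch that reproduces the counts from the table in the question:

```python
from fractions import Fraction

# Training data from the question: (charges_filed, size, status)
data = [
    ("yes", "small", "truthful"),   ("no", "small", "truthful"),
    ("no", "large", "truthful"),    ("no", "large", "truthful"),
    ("no", "small", "truthful"),    ("no", "small", "truthful"),
    ("yes", "small", "fraudulent"), ("yes", "large", "fraudulent"),
    ("no", "large", "fraudulent"),  ("yes", "large", "fraudulent"),
]

def posterior(x1, x2):
    """Naive Bayes posterior P(y_q | x) for x = {x1, x2}."""
    scores = {}
    for c in ("truthful", "fraudulent"):
        rows = [r for r in data if r[2] == c]
        prior = Fraction(len(rows), len(data))                 # P(y_q)
        p_x1 = Fraction(sum(r[0] == x1 for r in rows), len(rows))
        p_x2 = Fraction(sum(r[1] == x2 for r in rows), len(rows))
        scores[c] = prior * p_x1 * p_x2                        # P(x|y_q) P(y_q)
    total = sum(scores.values())                               # P(x)
    return {c: s / total for c, s in scores.items()}

post = posterior("yes", "small")
print(post["fraudulent"])        # 9/17
print(max(post, key=post.get))   # fraudulent
```

Using `Fraction` keeps every intermediate value exact, so the result 9/17 matches the hand calculation with no rounding.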
