(a) Linear Models
(b) Support Vector Machines
(c) Decision Trees
(d) Artificial Neural Networks
Sol. (c)
5. (1 mark) What happens when your model complexity (such as interaction terms in linear
regression, order of polynomial in SVM, etc.) increases?
(a) Model Bias increases
(b) Model Bias decreases
(c) Variance of the model increases
(d) Variance of the model decreases
Sol. (b) and (c)
As model complexity increases, the model can fit the training data more closely, so bias decreases while variance increases; this is the classic bias-variance trade-off.
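As an illustrative aside (not part of the original solution), here is a minimal Python sketch of this trade-off, assuming numpy and scikit-learn are available and using a made-up sine-plus-noise dataset:

# Sketch: refit polynomials of increasing degree on many resampled training
# sets drawn from the same noisy process. Average training error (a proxy
# for bias) falls with degree, while the spread of the fitted curves across
# datasets (variance) grows.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 50).reshape(-1, 1)

for degree in (1, 3, 9):
    train_errors, fits = [], []
    for _ in range(100):
        x = rng.uniform(0, 1, 30).reshape(-1, 1)
        y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 30)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x, y)
        train_errors.append(np.mean((model.predict(x) - y) ** 2))
        fits.append(model.predict(x_grid))
    print(degree, np.mean(train_errors), np.var(np.array(fits), axis=0).mean())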
6. (1 mark) Suppose we want an RL agent to learn to play the game of golf. For training purposes,
we make use of a golf simulator program. Assume that the original reward distribution gives
a reward of +10 when the golf ball is hit into the hole and -1 for all other transitions. To aid
the agent’s learning process, we propose to give an additional reward of +3 whenever the ball
is within a 1 metre radius of the hole. Is this additional reward a good idea or not? Why?
(a) Yes. The additional reward will help speed up learning.
(b) Yes. Getting the ball to within a metre of the hole is like a sub-goal and hence, should
be rewarded.
(c) No. The additional reward may actually hinder learning.
(d) No. It violates the idea that a goal must be outside the agent’s direct control.
Sol. (c)
In this specific case, the additional reward will be detrimental to the learning process: the agent can learn to accumulate reward by keeping the ball within the 1 metre radius indefinitely rather than actually hitting the ball into the hole.
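A quick worked comparison (a sketch, assuming undiscounted returns and that the +3 bonus is paid on every time step the ball stays within the 1 metre radius):

\[
\text{Return(keep the ball in the circle for } k \text{ steps)} = k(-1 + 3) = 2k
\]
\[
\text{Return(hole out immediately)} = +10
\]

For any k > 5 the looping policy already beats holing out, and its return grows without bound with k, so the shaped reward makes the degenerate policy optimal.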
7. (2 marks) You want to toss a fair coin a number of times and obtain the probability of getting
heads by taking a simple average. What is the estimated number of times you’ll have to toss
the coin to make sure that your estimated probability is within 10% of the actual probability,
at least 90% of the time?
(a) 400*ln(20)
(b) 800*ln(20)
(c) 200*ln(20)
Sol. (c)
Since you’re given that the coin is fair, p = 0.5 is the mean of your tosses in the limit of infinitely many tosses. Hence, the margin of error is 10% of 0.5, i.e., 0.05. Now, using the Chernoff-Hoeffding bound, we can obtain the required number of trials as follows.
\[
\Pr\left(\left|\bar{X} - 0.5\right| \geq 0.05\right) \leq 2e^{-2(0.05)^2 n} \leq 0.1
\]
\[
e^{-2(0.05)^2 n} \leq 0.05
\]
\[
-2(0.05)^2 n \leq \ln\left(\frac{1}{20}\right)
\]
\[
n \geq \frac{1}{2(0.05)^2}\ln(20) = \frac{400}{2}\ln(20)
\]
\[
\implies n \geq 200\ln(20)
\]
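As an empirical sanity check (not part of the original solution; the number of repeated experiments is illustrative), this Python sketch tosses a fair coin n = ⌈200 ln(20)⌉ times per experiment and measures how often the sample mean lands within 0.05 of 0.5. Since Hoeffding's bound is conservative, the observed fraction should comfortably exceed the guaranteed 90%:

import math
import random

n = math.ceil(200 * math.log(20))  # ~600 tosses per experiment
experiments = 10_000
within = 0
for _ in range(experiments):
    heads = sum(random.random() < 0.5 for _ in range(n))
    if abs(heads / n - 0.5) < 0.05:
        within += 1
print(f"n = {n}, fraction within 0.05 of 0.5: {within / experiments:.3f}")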
8. (1 mark) A new phone, the E-Corp X1, has been announced, and it is what you’ve been waiting
for all along. You decide to read the reviews before buying it. From past experience, you’ve
figured out that good reviews mean that the product is good 90% of the time, and bad reviews
mean that it is bad 70% of the time. Upon glancing through the reviews section, you find
that the X1 has been reviewed 1269 times and only 127 of those were bad reviews. What is
the probability that, if you order the X1, it is a bad phone?
(a) 0.1362
(b) 0.160
(c) 0.840
(d) 0.773
Sol. (b)
For the solution, let’s use the following abbreviations.
• BP - Bad Phone
• GP - Good Phone
• GR - Good Review
• BR - Bad Review
From the given data, Pr(BP|BR) = 0.7 and Pr(GP|GR) = 0.9. Using this, Pr(BP|GR) =
1 − Pr(GP|GR) = 0.1.
Hence, by the law of total probability, weighting each review type by its observed fraction,
\[
\Pr(BP) = \Pr(BR)\,\Pr(BP \mid BR) + \Pr(GR)\,\Pr(BP \mid GR) = \frac{127}{1269}(0.7) + \frac{1142}{1269}(0.1) \approx (0.1)(0.7) + (0.9)(0.1) = 0.16
\]
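The same computation as a few lines of Python (a sketch; the variable names are mine, not from the question):

# Total probability that the phone is bad, weighting each review type by
# its observed fraction among the 1269 reviews.
p_bad_given_bad_review = 0.7
p_bad_given_good_review = 1 - 0.9   # = 0.1
p_bad_review = 127 / 1269           # ~0.10
p_good_review = 1142 / 1269         # ~0.90

p_bad_phone = (p_bad_review * p_bad_given_bad_review
               + p_good_review * p_bad_given_good_review)
print(round(p_bad_phone, 3))        # -> 0.16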
9. (1 mark) You face a particularly challenging RL problem, where the reward distribution keeps
changing with time. In order to gain maximum reward in this scenario, does it make sense to
stop exploration or continue exploration?
(a) Stop exploration
(b) Continue exploration
Sol. (b)
Ideally, we would like to continue exploring, since this allows us to adapt to the changing
reward distribution.
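A minimal sketch of why continued exploration helps (the setup is my own illustration, not from the question: a two-armed bandit whose true means drift by a random walk, tracked with constant-step-size epsilon-greedy):

import random

means = [0.0, 1.0]         # true arm means, which drift over time
estimates = [0.0, 0.0]     # agent's running value estimates
epsilon, alpha = 0.1, 0.1  # exploration rate, constant step size

for t in range(10_000):
    means[0] += random.gauss(0, 0.01)   # the reward distribution keeps changing
    means[1] += random.gauss(0, 0.01)
    if random.random() < epsilon:       # keep exploring...
        a = random.randrange(2)
    else:                               # ...otherwise act greedily
        a = max(range(2), key=lambda i: estimates[i])
    reward = random.gauss(means[a], 1.0)
    estimates[a] += alpha * (reward - estimates[a])  # track the moving target

print("true means:", [round(m, 2) for m in means])
print("estimates :", [round(e, 2) for e in estimates])

With epsilon = 0, the agent would lock onto whichever arm looked best early on and never notice when the other arm's mean drifts past it.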