
DS 432 – Assignment I

Program: B.Tech. Discipline: CSE/ECE


Semester: August–December 2020 Time: 3 Hours
Course Code: DS 432 Total Pages: 7
Course Name: Predictive Modeling for Data Science Max. Marks: 20

Instructions:
1. Read the instructions carefully.
2. Attempt all questions.
3. Use of calculator and/or R/Python is allowed. (Code output not required.)
4. Use of lecture notes, books, and other internet resources is allowed.
5. Any discussion or otherwise inappropriate communication between examinees will be dealt with
severely.
6. Each question is worth 1 point. The exam has 20 questions.
7. Write your answers on plain A4 paper, scan, and upload the merged PDF in Moodle
(https://exam.niituniversity.in). Answer five questions per page – total 4 pages.
Separate each answer with a line as per the discussed format.
8. For each question, 0.5 marks will be awarded for the correct option (A, B, C, D, or E).
The explanation that follows your choice earns 0.5 marks if correct, 0.25 marks if
partially correct, and 0 marks if incorrect. If your option (A, B, C, D, or E) is incorrect,
0 marks will be awarded for the question regardless of the explanation that follows.
9. Do not forget to write your name and enrollment number on each page of your submission.
Write the answers in sequence, and make sure your handwriting is legible. No email
submissions, please, and do not wait until the last minute to submit.
10. By uploading the answer (in Moodle) you acknowledge that you did not discuss
any aspect of this exam with anyone other than the instructor, that you neither
gave nor received any unauthorized assistance on this exam, and that the work
submitted is entirely your own.

Answer questions 1 and 2 using the R output given below.

Call:
lm(formula = Y ~ X, data = Regr)

Residuals:
1 2 3 4 5 6 7
0.55769 -0.65385 -0.86538 0.34615 1.13462 -0.07692 -0.44231

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.4423 0.4757 19.851 5.99e-06 ***
X -1.7885 0.2887 -6.194 0.0016 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7869 on 5 degrees of freedom


Multiple R-squared: 0.8847, Adjusted R-squared: 0.8617
F-statistic: 38.37 on 1 and 5 DF, p-value: 0.0016
1. The estimated regression equation for the full model is
(A) ŷ = 9.4423X − 1.7885 + 0.2887
(B) ŷ = 9.4423 − 0.8847X
(C) ŷ = −9.4423 − 1.7885X
(D) ŷ = 9.4423X + 1.7885
(E) None of the above.
2. The predicted value ŷ(1) is equal to
(A) -1.7885
(B) 11.2308
(C) 7.6538
(D) 9.4423
(E) None of the above.
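Since the instructions permit R/Python, here is a minimal Python sketch of the computation these two questions involve: reading the fitted line off the lm() output and evaluating it at a given X. The only inputs assumed are the intercept and slope estimates printed above.

```python
# Sketch: turning the printed lm() coefficients into a fitted line and a
# prediction.  The two estimates below are copied from the R output above;
# nothing else about the data set "Regr" is assumed.

b0 = 9.4423   # (Intercept) estimate
b1 = -1.7885  # slope estimate for X

def y_hat(x):
    """Fitted value under the estimated line y-hat = b0 + b1 * x."""
    return b0 + b1 * x

print(round(y_hat(1), 4))  # prediction at X = 1
```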
3. Suppose that {Zt} is a white noise process with a sample size of n = 100. If we performed a
simulation to study the sampling variation of r1, the lag-one sample autocorrelation, about
95% of our estimates r1 would fall between
(A) -0.025 and 0.025
(B) -0.05 and 0.05
(C) -0.1 and 0.1
(D) -0.2 and 0.2

(E) None of the above.
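A small simulation of the kind the question describes can be sketched in Python. For white noise, r1 is approximately N(0, 1/n), so about 95% of estimates should fall within ±2/√n; the simulation below just checks that empirically (the sample size and replication count are illustrative choices).

```python
import numpy as np

# Sketch: sampling variation of the lag-1 autocorrelation r1 for white
# noise with n = 100.  For white noise, r1 ~ approx N(0, 1/n), so about
# 95% of estimates fall within +/- 2/sqrt(n).

rng = np.random.default_rng(0)
n, reps = 100, 2000

def r1(z):
    """Lag-1 sample autocorrelation of the series z."""
    d = z - z.mean()
    return np.sum(d[:-1] * d[1:]) / np.sum(d * d)

estimates = np.array([r1(rng.standard_normal(n)) for _ in range(reps)])
band = 2 / np.sqrt(n)                      # half-width of the ~95% band
coverage = np.mean(np.abs(estimates) <= band)
print(band, round(coverage, 2))            # coverage should be near 0.95
```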
4. What is the difference between strict stationarity and weak stationarity?
(A) Strict stationarity requires that the mean function and autocovariance function be free of
time t. Weak stationarity does not.
(B) Strict stationarity is required to guarantee that MMSE forecasts are unbiased (in ARIMA
models). These forecasts may not be unbiased under weak stationarity.
(C) Strict stationarity is a stronger form of stationarity that does not rely on large-sample
theory.
(D) Strict stationarity refers to characteristics involving joint probability distributions. Weak
stationarity refers only to conditions placed on the mean and autocovariance function.
(E) None of the above.
5. In an analysis, we have determined that
• The Dickey-Fuller unit root test for the series {Xt } does not reject a unit root.
• The ACF for the series {Xt } has a very, very slow decay.
• The PACF for the differences {∇Xt } has significant spikes at lags 1 and 2 (and is
negligible at higher lags).
Which model is most consistent with these observations?
(A) IMA(1,1)
(B) ARI(2,1)
(C) ARIMA(2,2,2)
(D) IMA(2,2)
(E) None of the above.
6. Which of the following processes is stationary?
(A) MA(1) process with θ = −1.4
(B) Xt = 12.3 + 1.1Xt−1 + Zt
(C) IMA(1,1)
(D) Xt = β0 + β1 t + Zt
(E) None of the above.
7. Consider the time series model Xt = 0.8Xt−1 + 0.09Xt−2 + Zt − 0.01Zt−2 . Determine whether
the model is stationary and/or invertible.
(A) Stationary but not invertible.
(B) Not stationary but invertible.
(C) Stationary and invertible.
(D) Neither stationary nor invertible.

(E) None of the above.
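The standard check for this kind of question is to locate the roots of the AR and MA characteristic polynomials: the process is stationary if all roots of φ(z) lie outside the unit circle, and invertible if the same holds for θ(z). A hedged Python sketch using numpy.roots:

```python
import numpy as np

# Sketch: stationarity/invertibility check for
#   X_t = 0.8 X_{t-1} + 0.09 X_{t-2} + Z_t - 0.01 Z_{t-2}
# via the roots of
#   phi(z)   = 1 - 0.8 z - 0.09 z^2   (AR side)
#   theta(z) = 1 - 0.01 z^2           (MA side)

ar_roots = np.roots([-0.09, -0.8, 1.0])  # coefficients, highest power first
ma_roots = np.roots([-0.01, 0.0, 1.0])

stationary = np.all(np.abs(ar_roots) > 1)  # all AR roots outside unit circle?
invertible = np.all(np.abs(ma_roots) > 1)  # all MA roots outside unit circle?
print(ar_roots, ma_roots, stationary, invertible)
```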
8. Here is the R output from fitting an ARIMA(1,0,0) model to a data set ts.sim11.
> arima(ts.sim11, order=c(1,0,0))

Call:
arima(x = ts.sim11, order = c(1, 0, 0))

Coefficients:
ar1 intercept
0.5782 0.0146
s.e. 0.0810 0.2167

sigma^2 estimated as 0.8582: log likelihood = -134.45, aic = 272.91


The last observed value of the data set is X100 = −1.6387. Using the fitted AR(1) model, the
(estimated) MMSE forecast for X101 is approximately equal to
(A) -0.941
(B) -0.841
(C) -0.741
(D) -0.641
(E) None of the above.
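One detail worth remembering here is that R's arima() reports the process *mean* under the label "intercept", so the one-step MMSE forecast is X̂t(1) = μ̂ + φ̂(Xt − μ̂). A Python sketch using only the estimates printed above:

```python
# Sketch: one-step MMSE forecast from the fitted AR(1).  R's arima()
# labels the process mean "intercept", so the forecast is
#   X-hat_t(1) = mu + phi * (X_t - mu).
# The three numbers below are taken from the question; nothing else about
# ts.sim11 is assumed.

phi = 0.5782      # ar1 estimate from the output above
mu = 0.0146       # "intercept" (process mean) from the output above
x_last = -1.6387  # last observed value X_100

forecast = mu + phi * (x_last - mu)
print(round(forecast, 3))
```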
9. For polynomial regression, which one of these structural assumptions most affects the
trade-off between underfitting and overfitting?
(A) The polynomial degree.
(B) Whether we learn the weights by matrix inversion or gradient descent.
(C) The assumed variance of the Gaussian noise.
(D) The use of a constant-term unit input.
(E) None of the above.
10. The relationship between number of beers consumed (x) and blood alcohol content (y) was
studied in 16 male college students by using least squares regression. The following regression
equation was obtained from this study: ŷ = −0.0127 + 0.0180x. Another guy, named Buddy,
has the regression equation written on a scrap of paper in his pocket. Buddy goes out drinking
and has 4 beers. He calculates that he is under the legal limit (say, 0.08), so he decides to drive
to another bar. Unfortunately, Buddy gets pulled over and confidently submits to a road-side
blood alcohol test. He scores a blood alcohol of 0.085 and gets himself arrested. Obviously,
Buddy did not know about residual variation. Buddy’s residual is:
(A) +0.0257
(B) -0.0257
(C) +0.005

(D) -0.005
(E) None of the above.
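Buddy's situation boils down to one line of arithmetic: the residual is the observed value minus the fitted value at x = 4. A minimal Python sketch, using only the numbers given in the question:

```python
# Sketch: Buddy's residual is observed minus fitted, where the fitted
# value comes from y-hat = -0.0127 + 0.0180 x at x = 4 beers.

fitted = -0.0127 + 0.0180 * 4   # predicted blood alcohol content
observed = 0.085                # road-side test result
residual = observed - fitted
print(round(fitted, 4), round(residual, 4))
```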
11. What did we discover about the method of moments procedure when estimating parameters
in ARIMA models?
(A) The procedure gives reliable results when the sample size n > 100.
(B) The procedure gives unbiased estimates.
(C) The procedure should not be used when models include AR components.
(D) The procedure should not be used when models include MA components.
(E) None of the above.
12. Which statement about MMSE forecasts in stationary ARMA models is true?
(A) If X̂t (l) is the MMSE forecast of ln(Xt+l ), then eX̂t (l) is the MMSE forecast of Xt+l .
(B) As the lead time l increases, X̂t (l) will approach the process mean E(Xt ) = µ.
(C) As the lead time l increases, V(X̂t (l)) will approach the process variance V(Xt ) = γ0 .
(D) All of the above are true.
(E) None of the above are true.
13. An observed time series displays a clear upward linear trend. We fit a straight line regression
model to remove this trend, and we notice that the residuals from the straight line fit are
stationary in the mean level. What should we do next?
(A) Search for a stationary ARMA process to model the residuals.
(B) Perform a Shapiro-Wilk test.
(C) Calculate the first differences of the residuals and then consider fitting another regression
model to them.
(D) Perform a t-test for the straight line slope estimate.
(E) None of the above.
14. Let V(X) = 1, V(Y ) = 2, and Cov(X, Y ) = 3, then the value of α that minimizes V(αX +
(1 − α)Y ) is
(A) 0.1
(B) 0.2
(C) 0.4
(D) 0.5
(E) None of the above.
15. Consider the time series data given below.
Time 1 2 3 4 5 6
Xt 2 3 7 4 8 11

Now suppose you would like to fit an AR(2) model to the above data set. Then the MoM
estimates of φ̂1 and φ̂2 are given by
(A) φ̂1 = 0.2 and φ̂2 = 1.7
(B) φ̂1 = 1.7 and φ̂2 = 0.2
(C) φ̂1 = −0.2 and φ̂2 = 1.7
(D) φ̂1 = 0.2 and φ̂2 = −1.7
(E) None of the above.
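The mechanics of the method-of-moments fit can be sketched in Python: compute the sample autocorrelations r1, r2 and solve the Yule-Walker equations r1 = φ1 + φ2·r1 and r2 = φ1·r1 + φ2 for (φ1, φ2). This sketch only illustrates the procedure; it does not assert which option is intended.

```python
import numpy as np

# Sketch: Yule-Walker (method-of-moments) AR(2) estimates for the six
# observations in the question.

x = np.array([2.0, 3.0, 7.0, 4.0, 8.0, 11.0])
d = x - x.mean()

def r(k):
    """Lag-k sample autocorrelation."""
    return np.sum(d[:-k] * d[k:]) / np.sum(d * d)

r1, r2 = r(1), r(2)
# Solve [1 r1; r1 1] [phi1; phi2] = [r1; r2].
phi1, phi2 = np.linalg.solve([[1.0, r1], [r1, 1.0]], [r1, r2])
print(round(phi1, 3), round(phi2, 3))
```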
16. Traditionally, when we have a real-valued input attribute during decision-tree training we
consider a binary split according to whether the attribute is above or below some threshold.
Suppose someone suggested that instead we should have a multiway split with one branch
for each of the distinct values of the attribute. Which of the following is the single biggest
problem with that suggestion?
(A) It is computationally expensive.
(B) It would probably result in a decision tree that scores badly on the training set and a
test set.
(C) It would probably result in a decision tree that scores well on the training set but badly
on a test set.
(D) It would probably result in a decision tree that scores badly on the training set but well
on a test set.
(E) None of the above.
17. A hypothesis test that can be used for model comparison in linear regression is
(A) F -test.
(B) t-test.
(C) χ2 -test.
(D) partial F -test.
(E) None of the above.
18. A statistic that measures the change in the fitted regression coefficients when an observation
is dropped from the regression analysis is
(A) Cook’s distance.
(B) Hat matrix.
(C) Influential point.
(D) Leverage point.
(E) None of the above.
19. In linear regression, the tendency in data sets for several unusual observations to cluster
together so that attempts to identify one observation at a time fail is called

(A) Masking effect.
(B) Collinearity effect.
(C) Interaction effect.
(D) Linear restriction.
(E) None of the above.
20. Averaging the output of multiple decision trees helps ______
(A) increase bias.
(B) increase variance.
(C) decrease bias.
(D) decrease variance.
(E) None of the above.
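The idea behind this question can be illustrated with a toy simulation: when B (approximately uncorrelated) noisy predictors of the same target are averaged, the variance of the average is roughly σ²/B. The sketch below uses Gaussian noise as a stand-in for the trees' predictions; the sizes are illustrative choices.

```python
import numpy as np

# Sketch: averaging many approximately independent predictors reduces
# variance.  Each "tree" is modeled as the true value (0) plus noise;
# the variance of the average of B such predictors is about sigma^2 / B.

rng = np.random.default_rng(1)
sigma, B, reps = 1.0, 25, 5000

single = rng.normal(0.0, sigma, size=reps)               # one noisy predictor
averaged = rng.normal(0.0, sigma, size=(reps, B)).mean(axis=1)

print(round(single.var(), 2), round(averaged.var(), 2))
```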
