
Stat 331 Applied Linear Models – Assignment 1

Due on May 29th (Friday) at 11 pm on Crowdmark. If a question involves R, use R Markdown to
write your solution and show your R code and output.

1. Under the simple linear regression model, let $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ denote the fitted value, and
$r_i = y_i - \hat{y}_i$ denote the corresponding residual. We want to show that
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^n r_i^2}{n-2}$$
is an unbiased estimator of $\sigma^2$. Prove the following:
(a) Show that $\sum_{i=1}^n r_i^2 = s_{yy} - \frac{s_{xy}^2}{s_{xx}}$, where $s_{yy} = \sum_{i=1}^n (y_i - \bar{y})^2$ and $s_{xy} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$.

(b) Show that $E(s_{yy}) = (n-1)\sigma^2 + \beta_1^2 s_{xx}$ and $E(s_{xy}^2) = \beta_1^2 s_{xx}^2 + \sigma^2 s_{xx}$.

(c) Show that $E\left(\sum_{i=1}^n r_i^2\right) = (n-2)\sigma^2$.
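
Before attempting the proofs, it can help to confirm the identity in part (a) numerically. The following is a minimal sketch on simulated data; the sample size, coefficients, and seed are illustrative assumptions and are not part of the assignment. A numerical check of this kind does not replace the proof.

```r
# Simulated data: all values here are illustrative assumptions.
set.seed(1)
n <- 30
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)

fit <- lm(y ~ x)
r <- resid(fit)                      # residuals r_i = y_i - yhat_i

sxx <- sum((x - mean(x))^2)
syy <- sum((y - mean(y))^2)
sxy <- sum((x - mean(x)) * (y - mean(y)))

lhs <- sum(r^2)                      # residual sum of squares
rhs <- syy - sxy^2 / sxx             # right-hand side of the identity in (a)
all.equal(lhs, rhs)                  # the two sides should agree up to rounding

sigma2_hat <- lhs / (n - 2)          # the estimator defined in the question
all.equal(sigma2_hat, summary(fit)$sigma^2)  # lm() reports the same estimate
```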

2. We look at weekly reports on the box office ticket sales for plays on Broadway in New York.
We shall consider the data for a particular week (referred to below as the current week). The
data are in the form of the gross box office results for the current week and the gross box
office results for the previous week. The data are available on the Learn website in the file
a1q2.csv.
Fit the simple linear regression model to the data: yi = β0 + β1 xi + εi , where yi is the gross
box office result for the ith play in the current week (in $) and xi is the gross box office result
for the ith play in the previous week (in $). Complete the following:

(a) Plot the data. Does a simple linear regression model seem appropriate? Explain.

(b) Fit the simple linear regression model, provide the equation of the fitted line.

(c) Find a 95% confidence interval for the slope of the regression model, β1 . Is 1 a plausible
value for β1 ? Give a reason to support your answer.

(d) Test the hypothesis (at a 5% significance level) H0 : β0 = 10000 vs. Ha : β0 ≠ 10000.
Provide a p-value and draw a conclusion.

(e) Suppose play A had $400,000 in gross box office the previous week. Use the fitted
regression model to estimate the gross box office result for play A in the current week (in
$). Find a 95% prediction interval for the gross box office result for play A in the current
week (in $). Is $450,000 a feasible value for the gross box office result of play A in the
current week? Give a reason to support your answer.
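
The R calls relevant to parts (b) through (e) can be sketched on simulated data. The data frame `box` and its column names `previous` and `current` are hypothetical stand-ins, since the actual headers of a1q2.csv are not given here, and the simulated coefficients are arbitrary. Treat this as one possible outline rather than the expected solution.

```r
# Simulated stand-in for the box office data. The column names `previous`
# and `current` are hypothetical, not the actual headers of a1q2.csv.
set.seed(2)
previous <- runif(18, 1e5, 1e6)
current  <- 7000 + 0.98 * previous + rnorm(18, sd = 2e4)
box <- data.frame(previous, current)

fit <- lm(current ~ previous, data = box)
coef(fit)                                   # fitted intercept and slope

# 95% confidence interval for the slope
ci_slope <- confint(fit, "previous", level = 0.95)

# Two-sided t test of H0: beta0 = 10000
est    <- coef(summary(fit))["(Intercept)", "Estimate"]
se     <- coef(summary(fit))["(Intercept)", "Std. Error"]
t_stat <- (est - 10000) / se
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# 95% prediction interval at a previous-week gross of $400,000
pred <- predict(fit, newdata = data.frame(previous = 400000),
                interval = "prediction", level = 0.95)
```

Note that `summary(fit)` reports a test of β0 = 0 by default, which is why the t statistic for β0 = 10000 is computed by hand above.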

3. Consider a simple linear regression (SLR) model: yi = β0 + β1 xi + εi , where the εi are iid N(0, 1),
i = 1, 2, . . . , 20. Suppose β0 = 1, β1 = 2, and xi = i, i = 1, . . . , 20. Let β̂0 and β̂1 denote the
least squares estimators of β0 and β1 . Answer the following questions.

(a) ŷ8 , the 8th fitted value, can be expressed as a linear combination of the yi's. Determine the
coefficients in this linear combination and plot them against the corresponding xi's.
Comment on the pattern of the coefficients, such as which xi values receive large coefficients.

(b) Under our SLR model, compute the value of Var(β̂1).

(c) A set of y values is recorded in "a1q3.txt" (available on Learn). Based on these observed
responses and without assuming σ is known, compute β̂1, estimate its variance, and
construct a 95% confidence interval for β1. You can read the data into R using the
following command:

y <- read.table("/your_file_directory/a1q3.txt")$V1

(d) To see that β̂1 is random but unbiased, we can simulate samples according to the
population model. For a single dataset, generate the data (yi's) according to our SLR
model and compute β̂1. Repeat this process 5000 times, each time recording the value
of β̂1 and its corresponding 95% confidence interval, as in part (c). What are
the sample mean and variance of your β̂1 values? What percentage of the confidence
intervals include the true value of β1? For your result to be reproducible,
use the R command set.seed(20200511) once before running the simulation. To generate
20 independent random variables that follow the standard normal distribution, we can
use

e <- rnorm(20)
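
The mechanics behind parts (a), (b), and (d) can be sketched as follows. The weights use the standard identity ŷ8 = ȳ + β̂1 (x8 − x̄), and the variance uses Var(β̂1) = σ²/sxx with σ² = 1. The seed and the replicate count `B` below are illustrative assumptions chosen to keep the sketch quick; the assignment itself specifies set.seed(20200511) and 5000 repetitions.

```r
x <- 1:20
n <- length(x)
sxx <- sum((x - mean(x))^2)

# Weights w_i such that yhat_8 = sum(w_i * y_i):
#   w_i = 1/n + (x_8 - xbar) * (x_i - xbar) / sxx
w <- 1 / n + (x[8] - mean(x)) * (x - mean(x)) / sxx
# plot(x, w) would display the pattern of the weights

var_b1 <- 1 / sxx                 # Var(beta1_hat) = sigma^2 / sxx, sigma^2 = 1

# Reduced simulation: seed and B are illustrative, not the assignment's values.
set.seed(123)
B <- 1000
b1 <- numeric(B)
covered <- logical(B)
for (b in seq_len(B)) {
  y <- 1 + 2 * x + rnorm(n)       # beta0 = 1, beta1 = 2, standard normal errors
  fit <- lm(y ~ x)
  b1[b] <- coef(fit)[["x"]]
  ci <- confint(fit, "x", level = 0.95)
  covered[b] <- ci[1] <= 2 && 2 <= ci[2]   # does the interval contain beta1?
}
c(mean = mean(b1), var = var(b1), coverage = mean(covered))
```

The sample variance of the simulated β̂1 values can be compared with `var_b1`, and the coverage proportion with the nominal 95% level.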
