RA Assignment 5

1. A certain response y is measured at 2-min time intervals. At time 0, a “treatment” is
applied, and it is hypothesized that this causes a “jump” in the response curve, as shown
in the graph.

The data are given in the following table. Set α = 0.05.

Time y
−5 5
−3 7
−1 10
1 14
3 17
5 18

(a) Formulate a suitable model for this problem, and test the significance of the jump.
(b) Assume that the slopes before and after the change are not the same. How can you
model this situation? Perform the appropriate tests.

2. An experiment was conducted to model Y with five explanatory variables X1, X2, X3,
X4 and X5. We desire an equation relating Y to the other variables. The following
table gives SSE for all possible subset models. The total sum of squares is equal to 5.0634
and the number of observations is equal to 20.

Find the best model by each of Cp, forward selection, backward selection and stepwise
selection. Write down in detail how the best model is obtained. Choose critical values
for both ENTRY and STAY to be 2 (i.e., the resulting F -test statistic (F -value) is to be
compared with 2).

Hints to Assignment 5

1. In this question, the response y is clearly the dependent variable, and the independent
variables include the time t (a quantitative variable) and the categorical variable
describing whether the measurement is made before or after the treatment. Note that the
treatment is applied at time 0. As a result, the model equation can be formulated as

Y = β0 + β1 t + β2 z + ε,

where

\[
z := \begin{cases}
1, & \text{if the measurement is made after the treatment (i.e., } t \ge 0\text{)}, \\
0, & \text{otherwise.}
\end{cases}
\]

E.g., the first measurement of these variables gives

y1 = 5, t1 = −5, z1 = 0,

while the fourth measurement indicates

y4 = 14, t4 = 1, z4 = 1.

Now, testing the significance of the treatment (or equivalently, the jump in the graph)
means to test if β2 = 0.
Part (b) is similar to the example on page 128 of our slides for Chapter 3.
If you have found in part (a) that the treatment is insignificant, then you can use

Y = β0 + β1 t + β3 tz + ε

and test whether β3 = 0 to check the common slope assumption; otherwise, if part (a)
says that the treatment is significant, then you should use

Y = β0 + β1 t + β2 z + β3 tz + ε

and test if β3 = 0.
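The tests described in this hint can be checked numerically. The following is a minimal sketch in plain Python, not a prescribed solution: the helper names `fit_sse` and `partial_f` are hypothetical, the fit solves the normal equations directly, and each test is carried out as a full-versus-reduced F-test (equivalent to the t-test on the single coefficient being tested).

```python
def fit_sse(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, SSE)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    sse = sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
              for i in range(n))
    return beta, sse


def partial_f(sse_reduced, sse_full, df_full):
    """F statistic for one extra regressor; df_full = n - (number of parameters)."""
    return (sse_reduced - sse_full) / (sse_full / df_full)


# Data from the table in Question 1.
t = [-5, -3, -1, 1, 3, 5]
y = [5, 7, 10, 14, 17, 18]
z = [1 if ti >= 0 else 0 for ti in t]      # indicator: after the treatment
tz = [ti * zi for ti, zi in zip(t, z)]     # interaction term t*z
n = len(y)

# Part (a): Y = b0 + b1 t + b2 z, test H0: b2 = 0 against the straight line.
_, sse_line = fit_sse([t], y)
beta_a, sse_jump = fit_sse([t, z], y)
f_jump = partial_f(sse_line, sse_jump, n - 3)

# Part (b), if the jump is kept: Y = b0 + b1 t + b2 z + b3 tz, test H0: b3 = 0.
beta_b, sse_both = fit_sse([t, z, tz], y)
f_slope_with_jump = partial_f(sse_jump, sse_both, n - 4)

# Part (b), if the jump is dropped: Y = b0 + b1 t + b3 tz, test H0: b3 = 0.
_, sse_slope_only = fit_sse([t, tz], y)
f_slope_no_jump = partial_f(sse_line, sse_slope_only, n - 3)

print(f"(a) b2 = {beta_a[2]:.4f}, F = {f_jump:.4f} on (1, {n - 3}) df")
print(f"(b) with jump: b3 = {beta_b[3]:.4f}, F = {f_slope_with_jump:.4f} on (1, {n - 4}) df")
print(f"(b) without jump: F = {f_slope_no_jump:.4f} on (1, {n - 3}) df")
```

Each F value is to be compared with the tabled critical value of the F distribution at α = 0.05 with the degrees of freedom printed next to it.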

2. Here we show you how to carry out the first step of these sequential methods. You should
mimic these procedures and complete the remaining parts.
The crucial point is that the “full model” used to compute the F -statistic should be
understood in a relative sense, and it is not necessarily the model including all variables.
In sequential methods, when we decide whether to delete or add a variable, say xi , the
full model is always the model including xi (plus the variables already entered or not
yet deleted at this stage), while the reduced model is the one without this variable.
E.g., in forward selection, starting with the model with only the intercept, we have now
arrived at the model “Intercept+x1 +x2 ”, and we would like to compute the F -statistic
by adding x4 . The full model at this stage is “Intercept+x1 +x2 + x4 ” and the reduced
one is “Intercept+x1 +x2 ”. As another example, in backward elimination, suppose we
are now at the model with “Intercept+x1 +x3 + x6 ” and considering the F -statistic by
deleting x6 ; in this case, the full model at this stage is “Intercept+x1 +x3 + x6 ” and the
reduced one is “Intercept+x1 +x3 ”.

(a) Forward selection: The test statistic is

\[
F := \frac{\mathrm{SSE}_{H_0} - \mathrm{SSE}_{H_A}}{\mathrm{SSE}_{H_A}/(n - p_A - 1)}
   = \frac{\mathrm{SSE}_{H_0}}{\mathrm{SSE}_{H_A}/(n - p_A - 1)} - (n - p_A - 1),
\]
where H0 is the null hypothesis (corresponding to the reduced model), HA is the
alternative hypothesis (corresponding to the “full model”) and pA is the number of
regressors in the “full model”.
• Step 0: Intercept is entered. Note that the model involving only the intercept
has regression sum of squares RSS = 0 and SSE = TSS = 5.0634.
• Step 1:
\[
\begin{aligned}
x_1:\quad & F = \frac{5.0634}{2.0338/(20-2)} - 20 + 2 = 26.813 > 2 \\
x_2:\quad & F = \frac{5.0634}{5.0219/(20-2)} - 20 + 2 = 0.149 < 2 \\
x_3:\quad & F = \frac{5.0634}{1.5370/(20-2)} - 20 + 2 = 41.298 > 2 \quad \text{(most significant)} \\
x_4:\quad & F = \frac{5.0634}{2.5044/(20-2)} - 20 + 2 = 18.392 > 2 \\
x_5:\quad & F = \frac{5.0634}{1.5563/(20-2)} - 20 + 2 = 40.563 > 2
\end{aligned}
\]
Conclusion: x3 is entered.
• ······
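The arithmetic of Step 1 above can be reproduced with a few lines of Python. This is only a sketch: the names `partial_f`, `f_enter` and `sse_one` are hypothetical, and the SSE values are the ones quoted in the Step 1 calculations.

```python
def partial_f(sse_reduced, sse_full, n, p_full):
    """F statistic in the second form used above; p_full = regressors in the full model."""
    df = n - p_full - 1
    return sse_reduced / (sse_full / df) - df


n = 20
f_enter = 2.0              # ENTRY threshold from the question
sse_intercept = 5.0634     # intercept-only model: SSE = TSS

# SSE of each one-variable model, as used in Step 1.
sse_one = {"x1": 2.0338, "x2": 5.0219, "x3": 1.5370, "x4": 2.5044, "x5": 1.5563}

f_values = {v: partial_f(sse_intercept, s, n, 1) for v, s in sse_one.items()}
best = max(f_values, key=f_values.get)
# A variable enters only if its F value exceeds the ENTRY threshold.
entered = best if f_values[best] > f_enter else None

for v, f in f_values.items():
    print(f"{v}: F = {f:.3f}")
print("entered:", entered)
```

The later steps follow the same pattern, with the current model as the reduced model and `p_full` increased by one each time a variable is entered.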
(b) Backward elimination: The test statistic is

\[
F := \frac{\mathrm{SSE}_{H_0} - \mathrm{SSE}_{H_A}}{\mathrm{SSE}_{H_A}/(n - p_A - 1)}
   = \frac{\mathrm{SSE}_{H_0}}{\mathrm{SSE}_{H_A}/(n - p_A - 1)} - (n - p_A - 1),
\]
where H0 is the null hypothesis (corresponding to the reduced model), HA is the
alternative hypothesis (corresponding to the “full model”) and pA is the number of
regressors in the “full model”.
• Step 0: All variables are entered.
• Step 1:
\[
\begin{aligned}
x_1:\quad & F = \frac{0.9653}{0.9652/(20-6)} - 20 + 6 = 1.450 \times 10^{-3} < 2 \quad \text{(most insignificant)} \\
x_2:\quad & F = \frac{1.0388}{0.9652/(20-6)} - 20 + 6 = 1.068 < 2 \\
x_3:\quad & F = \frac{1.1565}{0.9652/(20-6)} - 20 + 6 = 2.775 > 2 \\
x_4:\quad & F = \frac{0.9871}{0.9652/(20-6)} - 20 + 6 = 0.318 < 2 \\
x_5:\quad & F = \frac{1.2199}{0.9652/(20-6)} - 20 + 6 = 3.694 > 2
\end{aligned}
\]
Conclusion: x1 is removed.
• ······
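Step 1 of backward elimination can be checked in the same way. The sketch below uses hypothetical names (`partial_f`, `f_stay`, `sse_without`) and only the SSE values that appear in the Step 1 calculations: 0.9652 for the model with all five variables, and the SSE of each four-variable model obtained by deleting one variable.

```python
def partial_f(sse_reduced, sse_full, n, p_full):
    """F statistic for deleting one variable; p_full = regressors in the full model."""
    df = n - p_full - 1
    return (sse_reduced - sse_full) / (sse_full / df)


n = 20
f_stay = 2.0           # STAY threshold from the question
sse_full = 0.9652      # model with x1, ..., x5

# SSE of the model with the named variable deleted, as used in Step 1.
sse_without = {"x1": 0.9653, "x2": 1.0388, "x3": 1.1565, "x4": 0.9871, "x5": 1.2199}

f_values = {v: partial_f(s, sse_full, n, 5) for v, s in sse_without.items()}
weakest = min(f_values, key=f_values.get)
# The least significant variable is removed only if its F value is below STAY.
removed = weakest if f_values[weakest] < f_stay else None

for v, f in f_values.items():
    print(f"{v}: F = {f:.4f}")
print("removed:", removed)
```

In the next step the four-variable model without the removed variable plays the role of the full model, so `p_full` becomes 4.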
(c) The stepwise search method is a combination of the forward and backward
selection methods: after each variable is entered, the variables already in the model
are re-tested for removal. The F -statistic is computed in the same way as above.
