Time y
−5 5
−3 7
−1 10
1 14
3 17
5 18
(a) Formulate a suitable model for this problem, and test the significance of the jump.
(b) Assume that the slopes before and after the change are not the same. How can you
model this situation? Perform the appropriate tests.
Find the best model using each of Cp, forward selection, backward selection, and stepwise
selection. Describe in detail how you arrive at the best model. Use a critical value of 2
for both ENTRY and STAY (i.e., each resulting F-test statistic (F-value) is to be
compared with 2).
Hints to Assignment 5
1. In this question, the response y is the dependent variable, and the independent
variables are the time t (a quantitative variable) and a categorical variable
describing whether the measurement is made before or after the treatment. Note that the
treatment is applied at time 0. As a result, the model equation can be formulated as
Y = β0 + β1 t + β2 z + ε,
where
z := 1, if the measurement is made after the treatment (i.e., t ≥ 0),
z := 0, otherwise.
For example, the first and fourth observations are coded as
y1 = 5, t1 = −5, z1 = 0,
y4 = 14, t4 = 1, z4 = 1.
Now, testing the significance of the treatment (or equivalently, the jump in the graph)
means to test if β2 = 0.
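The test of β2 = 0 can be carried out as a partial F-test that compares the model with z against the model without it. The following is a minimal sketch using only the Python standard library and the six observations from the table; the helper name `fit` and the variable names are illustrative assumptions, not part of the course material. The resulting F-value is to be compared with the critical value of the F distribution on (1, 3) degrees of freedom at your chosen level.

```python
# Data from the assignment table: time t and response y.
t = [-5, -3, -1, 1, 3, 5]
y = [5, 7, 10, 14, 17, 18]
z = [1 if ti >= 0 else 0 for ti in t]  # indicator: 1 after the treatment


def fit(columns, y):
    """Least-squares fit of y on the given columns (hypothetical helper).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    and returns (SSE, list of coefficients).
    """
    p = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(u * yi for u, yi in zip(columns[i], y))] for i in range(p)]
    for k in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[piv] = a[piv], a[k]
        for r in range(p):
            if r != k:
                f = a[r][k] / a[k][k]
                a[r] = [x - f * w for x, w in zip(a[r], a[k])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    resid = [yi - sum(b * c[i] for b, c in zip(beta, columns))
             for i, yi in enumerate(y)]
    return sum(e * e for e in resid), beta


ones = [1.0] * len(t)
sse_full, beta = fit([ones, t, z], y)  # Y = b0 + b1 t + b2 z
sse_red, _ = fit([ones, t], y)         # Y = b0 + b1 t (i.e., b2 = 0)
df_full = len(y) - 3                   # n minus number of parameters

# Partial F-statistic for H0: beta2 = 0 (one parameter dropped).
f_jump = (sse_red - sse_full) / (sse_full / df_full)
print("estimates (b0, b1, b2):", beta)
print("F for H0: beta2 = 0:", f_jump, "on (1, %d) df" % df_full)
```

Because only one parameter is tested, this F-value equals the square of the usual t-statistic for β2.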
Part (b) is similar to the example on page 128 of our slides for Chapter 3.
If you have found in part (a) that the treatment is insignificant, then you can use
Y = β0 + β1 t + β3 tz + ε
and test whether β3 = 0 to check the common slope assumption; otherwise, if part (a)
says that the treatment is significant, then you should use
Y = β0 + β1 t + β2 z + β3 tz + ε
and test if β3 = 0.
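To see why β3 measures the difference in slopes, it may help to write out the mean response of the model with all three terms separately for the two periods. This is only a restatement of the model equation above, using the same symbols:

```latex
\begin{align*}
  z = 0 \ (t < 0):\quad   & E[Y] = \beta_0 + \beta_1 t, \\
  z = 1 \ (t \ge 0):\quad & E[Y] = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, t .
\end{align*}
```

Here β2 is the jump at t = 0 and β3 is the change in slope after the treatment, so β3 = 0 corresponds to a common slope. In the model without the β2 z term, the two lines share the intercept β0 and therefore meet at t = 0.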
2. Here we show you how to carry out the first step of these sequential methods. You should
mimic these procedures and complete the remaining parts.
The crucial point is that the “full model” used to compute the F -statistic should be
understood in a relative sense, and it is not necessarily the model including all variables.
In sequential methods, when we decide whether to delete or add a variable, say xi , the
full model is always the model including xi (plus the variables already entered or not
yet deleted at this stage), while the reduced model is the one without this variable.
E.g., in forward selection, starting with the model with only the intercept, we have now
arrived at the model “Intercept+x1 +x2 ”, and we would like to compute the F -statistic
by adding x4 . The full model at this stage is “Intercept+x1 +x2 + x4 ” and the reduced
one is “Intercept+x1 +x2 ”. As another example, in backward elimination, suppose we
are now at the model with “Intercept+x1 +x3 + x6 ” and considering the F -statistic by
deleting x6 ; in this case, the full model at this stage is “Intercept+x1 +x3 + x6 ” and the
reduced one is “Intercept+x1 +x3 ”.
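In symbols, the F-statistic for adding or deleting a single variable can be written as follows. The notation is an assumption introduced here for illustration: SSE_R and SSE_F denote the residual sums of squares of the reduced and full models at the current stage, n is the number of observations, and p_F is the number of parameters (including the intercept) in the full model.

```latex
F \;=\; \frac{(\mathrm{SSE}_R - \mathrm{SSE}_F)/1}{\mathrm{SSE}_F/(n - p_F)}
  \;=\; \frac{\mathrm{SSE}_R}{\mathrm{SSE}_F/(n - p_F)} - (n - p_F)
```

In the forward selection example, SSE_R belongs to "Intercept+x1+x2", SSE_F to "Intercept+x1+x2+x4", and p_F = 4. In the backward elimination example, SSE_F belongs to "Intercept+x1+x3+x6", SSE_R to "Intercept+x1+x3", and again p_F = 4.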
• Step 1:
x1: F = 0.9653 / [0.9652/(20 − 6)] − 20 + 6 = 1.450 × 10^−3 < 2 (most insignificant)
x2: F = 1.0388 / [0.9652/(20 − 6)] − 20 + 6 = 1.068 < 2
x3: F = 1.1565 / [0.9652/(20 − 6)] − 20 + 6 = 2.775 > 2
x4: F = 0.9871 / [0.9652/(20 − 6)] − 20 + 6 = 0.318 < 2
x5: F = 1.2199 / [0.9652/(20 − 6)] − 20 + 6 = 3.694 > 2
Conclusion: x1 is removed.
• ······
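The arithmetic of Step 1 can be checked with a few lines of Python. This sketch assumes, as the numbers suggest, that 0.9652 is the SSE of the current model with 6 parameters fitted to n = 20 observations and that each numerator is the SSE after deleting the named variable; the variable names are illustrative. The same pattern applies to the later steps once the corresponding SSE values are available.

```python
# SSE values copied from Step 1; their interpretation is an assumption
# inferred from the arithmetic shown there.
n = 20              # number of observations
p_full = 6          # parameters in the current model, as implied by "20 - 6"
sse_full = 0.9652   # SSE of the current (full) model
sse_without = {     # SSE of the reduced model after deleting each variable
    "x1": 0.9653,
    "x2": 1.0388,
    "x3": 1.1565,
    "x4": 0.9871,
    "x5": 1.2199,
}
STAY = 2.0          # critical value given in the assignment

mse_full = sse_full / (n - p_full)
# Partial F-statistic for deleting each variable from the current model.
f_values = {name: (s - sse_full) / mse_full for name, s in sse_without.items()}

for name, f in f_values.items():
    print(f"{name}: F = {f:.4g}")

# Backward elimination removes the variable with the smallest F,
# provided that F falls below the STAY threshold.
weakest = min(f_values, key=f_values.get)
removed = weakest if f_values[weakest] < STAY else None
print("removed:", removed)
```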
(c) The stepwise search method is actually a combination of the forward and backward
selection methods, and the F-statistic is computed in the same way as above.