Model Answers Part b.26 2013
We used the Durbin-Watson test, which assumes that the model errors
follow the autoregressive model of order 1:
$$\varepsilon_i = \rho\,\varepsilon_{i-1} + u_i,$$
where the $u_i$ are independent, normal and have constant variance. The
null hypothesis of the DW test is that $\rho = 0$. However, the actual
test statistic is approximately $DW = 2(1 - \hat{\rho})$, thus ranging
between 0 and 4. If $DW \approx 2$ then we can assume independence,
$DW \approx 4$ is equivalent to negative autocorrelation, and
$DW \approx 0$ is equivalent to positive autocorrelation. The test
provides bounds depending on the confidence level and the number of
explanatory variables which give a meaning to the $\approx$ sign.
Describe the five measures of model quality given above, and provide
the geometric form of the model suggested by all measures. [6 marks]
CV: Cross-validation. Data are split into training set and test set.
Goodness of fit is measured by the prediction error of the model
fitted to the training set as it predicts the test set. Once again,
we want to minimise this error. The commonly accepted form is
the 10-fold cross-validation.
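The splitting and prediction steps can be sketched as follows. This is a minimal Python illustration, assuming a simple linear regression with one predictor, folds formed by taking every k-th observation (in practice the data would usually be shuffled first), and mean squared error as the measure of prediction error. The data and the helper names `fit_line` and `k_fold_cv_mse` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def k_fold_cv_mse(xs, ys, k=10):
    """Mean squared prediction error over k folds.

    Fold f holds out observations f, f + k, f + 2k, ... as the test set
    and fits the model to the remaining observations (the training set).
    """
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        intercept, slope = fit_line(train_x, train_y)
        for i in test_idx:
            prediction = intercept + slope * xs[i]
            squared_errors.append((ys[i] - prediction) ** 2)
    return sum(squared_errors) / n


# Hypothetical data (assumption): a straight line plus a small alternating error.
xs = [float(i) for i in range(30)]
ys = [2.0 + 0.5 * x + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]

cv_error = k_fold_cv_mse(xs, ys, k=10)
print(cv_error)
```

When comparing candidate models, the one with the smaller cross-validated error would be preferred under this criterion.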
lm(BP ~ years + weight).
3. We continue with the preferred model and check for outliers using the
R function influence.measures. The output is given below.
Note: qf(0.5,3,36)=0.887
Describe each of the measures given, and when the associated value
becomes critical. For each point identify which measure becomes critical
(if any). Are any of the points influential? [6 marks]
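The first six values are based on the leave-one-out principle, indicating
how coefficients and fit change when an observation is removed from
the analysis. The first three columns show the effects on the
coefficients ($\beta$) and are given by
$$\mathrm{dfb}.j = \frac{b_j - b_{j[i]}}{\mathrm{se}(b_j)},$$
where $b_{j[i]}$ is the estimate of the $j$th coefficient with
observation $i$ removed.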
Observations for which the absolute value of dfb.j exceeds 1 are
considered influential to this coefficient. The fourth column looks at
the fitted values:
$$\mathrm{dffit} = \sum_{j=1}^{n} \frac{\hat{y}_j - \hat{y}_{j[i]}}{\mathrm{se}(\hat{y}_j)}.$$
dffit becomes critical if its absolute value exceeds
$3\sqrt{(k+1)/(n-k-1)} = 3\sqrt{3/36} = 0.866$.
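The leave-one-out idea behind dfb.j can be sketched by refitting the model once per removed observation. This is a minimal Python illustration for the slope of a simple linear regression, assuming the standard error is taken from the full fit as in the formula for dfb.j above. R's own scaling may differ in detail, and the data and the helper names `fit_slope` and `dfb_slope` are hypothetical.

```python
import math


def fit_slope(xs, ys):
    """Slope of y = a + b * x and its standard error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, se_slope


def dfb_slope(xs, ys):
    """(b - b[i]) / se(b) for each observation i, by explicit refitting."""
    slope, se_slope = fit_slope(xs, ys)
    values = []
    for i in range(len(xs)):
        slope_without_i, _ = fit_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        values.append((slope - slope_without_i) / se_slope)
    return values


# Hypothetical data (assumption): the last observation lies far above the line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 20.0]

dfb = dfb_slope(xs, ys)
influential = [i for i, value in enumerate(dfb) if abs(value) > 1]
print(influential)
```

Only observations whose removal shifts the slope by more than one standard error are flagged, which matches the cut-off of 1 used for dfb.j.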
The fifth column is the covariance ratio, which measures the change
in the standard errors of the estimated coefficients and becomes critical
when it is smaller than $1 - 3(k+1)/n = 0.769$ or larger than
$1 + 3(k+1)/n = 1.231$.
The sixth column is Cook's D, which measures the overall change in
coefficients and is problematic if it exceeds the median of the
F-distribution with $k + 1$ and $n - k - 1$ degrees of freedom, in this
case the given value 0.887.
The final column contains the hat values, which are the diagonal values
of the hat matrix. These values measure the influence that the $j$th
observation has on the $j$th fitted value. A value is considered extreme
if it exceeds $3(k+1)/n = 0.231$. Observations exceeding this bound are
considered influential.
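The cut-offs quoted in this answer can be reproduced and applied to a row of output as follows. This is a minimal Python sketch, assuming n = 39 and k = 2, values inferred from the arithmetic above (3/36 and 0.231) rather than stated explicitly. The example row and its column names are hypothetical, and the Cook's D bound is passed in as the value given in the question.

```python
import math


def influence_cutoffs(n, k, cook_median):
    """Cut-offs as quoted in the answer; k is the number of explanatory variables."""
    p = k + 1  # number of coefficients, including the intercept
    return {
        "dfb": 1.0,
        "dffit": 3 * math.sqrt(p / (n - k - 1)),
        "cov.r.low": 1 - 3 * p / n,
        "cov.r.high": 1 + 3 * p / n,
        "cook.d": cook_median,
        "hat": 3 * p / n,
    }


def critical_measures(row, cutoffs):
    """Names of the measures in `row` that cross their cut-off."""
    flagged = []
    for name, value in row.items():
        if name.startswith("dfb"):
            critical = abs(value) > cutoffs["dfb"]
        elif name == "dffit":
            critical = abs(value) > cutoffs["dffit"]
        elif name == "cov.r":
            critical = value < cutoffs["cov.r.low"] or value > cutoffs["cov.r.high"]
        else:  # "cook.d" and "hat" are one-sided upper bounds
            critical = value > cutoffs[name]
        if critical:
            flagged.append(name)
    return flagged


# n and k are inferred from the worked numbers in the answer (assumption).
cutoffs = influence_cutoffs(n=39, k=2, cook_median=0.887)

# Hypothetical row of influence output, invented for illustration.
example_row = {"dfb.years": 0.20, "dfb.weight": -0.35, "dffit": -0.95,
               "cov.r": 1.05, "cook.d": 0.12, "hat": 0.25}
flagged = critical_measures(example_row, cutoffs)
print(flagged)
```

Applying the same check to every row of the output gives, for each point, the list of measures that become critical.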