
Tutorial Sheet Answers Week 8

GOODNESS OF FIT
Exercise 1
(a) r = sqrt(0.6595) = 0.81. Strong positive correlation: as the number of cheques increases, the
cost increases.
(b) 66% of the variation in bank charges can be explained by the different number of cheques
processed.
(c) Average bank charges are £18.68 fixed charge plus £0.60 per cheque.
(d) £138
(e) £257
(f) Knotty Knitwear, because it involves interpolation.
(g) The one with the largest positive residual.
(h) The one with the largest negative residual.

Exercise 2
Mileage  Price  Fitted  Residual  Residual^2
     19   3250  2879.3     370.7   137418.49
     41   2650  2300.7     349.3   122010.49
     33   2100  2511.1    -411.1   169003.21
     59   1650  1827.3    -177.3    31435.29
     96   1250   854.2     395.8   156657.64
     67   1100  1616.9    -516.9   267185.61

Mean price = 2000
SSRes = 883710.73
SSTot = 3490000
n - 2 = 4
s^2 = SSRes / (n - 2) = 220927.7
Correlation coefficient, r = -0.864 => strong inverse correlation between mileage and price.
Coefficient of determination, r^2 = 75% => 75% of the variation in prices can be explained by the
mileage.
Standard error about the line, s = 470 => the predicted price is roughly £470 away from the actual
price on average.
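
The summary figures above can be reproduced from the table. The following is a minimal sketch in Python, assuming the fitted values are taken as given from the table rather than re-estimated; the variable names are illustrative only.

```python
import math

# Prices and fitted values copied from the table above.
price = [3250, 2650, 2100, 1650, 1250, 1100]
fitted = [2879.3, 2300.7, 2511.1, 1827.3, 854.2, 1616.9]

n = len(price)
mean_price = sum(price) / n

# Residual = actual price minus fitted price.
residuals = [p - f for p, f in zip(price, fitted)]

ss_res = sum(e ** 2 for e in residuals)              # unexplained variation
ss_tot = sum((p - mean_price) ** 2 for p in price)   # total variation

r_squared = 1 - ss_res / ss_tot       # coefficient of determination
s_squared = ss_res / (n - 2)          # residual variance
s = math.sqrt(s_squared)              # standard error about the line
r = -math.sqrt(r_squared)             # negative root: price falls as mileage rises

print(f"SSRes = {ss_res:.2f}, SSTot = {ss_tot:.0f}")
print(f"r^2 = {r_squared:.3f}, r = {r:.3f}, s = {s:.1f}")
```

The negative root is chosen for r because the fitted prices decrease as mileage increases.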

Random scatter, so a linear model seems to be appropriate.

So, overall, the model appears to be a good fit.

Exercise 3
(a) Correlation is not causal. The size of the town is the main factor here.
(b) The relationship is probably non-linear.
(c) There are only a few points, so the correlation has probably come about by chance.
(d) Both variables have increased gradually over the past 50 years, not necessarily because they
are connected in any way. Alternatively, there may be a third factor, such as increased
opportunities for women.

PREDICTION

1. r = +√0.855 = 0.925 (the positive root, because the slope is positive)
Strong direct correlation between bacteria per unit volume and time.

2. At the start of the experiment the bacteria density is predicted to be -69.9, which is
clearly impossible and casts doubt on the validity of the model.

Every hour the number of bacteria increases by 88 per unit volume on average.

3. n = 25 points, df = n - 2 = 23, t(95%) = 2.069, s.e. = 7.566

95% CI is 87.982 ± 2.069 x 7.566 = 87.982 ± 15.654, i.e. 72.328 to 103.636
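
The interval arithmetic can be checked with a short Python sketch. The t value is entered by hand as quoted in the answer, not looked up by the code, and the variable names are illustrative.

```python
# Slope estimate and its standard error, as quoted in the answer.
slope = 87.982
se_slope = 7.566

n = 25
df = n - 2          # degrees of freedom for simple linear regression
t_crit = 2.069      # two-sided 95% t value for 23 df, taken from the answer

margin = t_crit * se_slope
lower = slope - margin
upper = slope + margin

print(f"df = {df}, margin = {margin:.3f}")
print(f"95% CI for the slope: {lower:.3f} to {upper:.3f}")
```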

4. It has a large positive residual, so the number of bacteria was far greater than expected at
that time.

5. (a) 5 hours: -69.9 + 88 x 5 = 370.1

(b) 10 hours: -69.9 + 88 x 10 = 810.1

6. The 5-hour prediction involves interpolation, so it should be quite reliable.

The 10-hour prediction involves extrapolation well beyond the 6 hours observed and is very unreliable.
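
As a rough illustration of answers 5 and 6, the sketch below evaluates the fitted line and flags predictions that fall outside the observed time range. The upper limit of 6 hours comes from the answer above; the lower limit of 0 hours is an assumption, and the function names are hypothetical.

```python
INTERCEPT = -69.9   # fitted intercept from the answers
SLOPE = 88.0        # fitted slope, bacteria per unit volume per hour

MIN_OBSERVED_HOURS = 0.0   # assumption: observations start at time zero
MAX_OBSERVED_HOURS = 6.0   # the answers state that 6 hours were observed


def predict_bacteria(hours):
    """Predicted bacteria per unit volume from the linear model."""
    return INTERCEPT + SLOPE * hours


def is_extrapolation(hours):
    """True when the time lies outside the observed range."""
    return not (MIN_OBSERVED_HOURS <= hours <= MAX_OBSERVED_HOURS)


for hours in (5, 10):
    kind = "extrapolation" if is_extrapolation(hours) else "interpolation"
    print(f"{hours} hours: {predict_bacteria(hours):.1f} ({kind})")
```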

7. There is a clear pattern, suggesting that the relationship is actually non-linear.

8. If the growth is exponential then taking logs should reduce it to a linear form

9. R-squared is much higher for Logbacteria.


The residual plot for Logbacteria shows a random scatter.
None of the data are flagged as unusual in the Logbacteria analysis.
Note that we cannot compare S values because they are not on the same scale.

10. x = 5 hours: predicted Logbacteria = 5.90681 (from Logbacteria = 3.25 + 0.531 x, coefficients shown rounded)

x = 10 hours: predicted Logbacteria = 8.56298

11. Need to anti-log the limits.


95% confident that in repeated experiments of this type the number of bacteria per
unit volume after 5 hours would be between (e^5.878=) 357.0 and 378.4 on average.

12. Need to anti-log the limits.


95% confident that in an experiment of this type the number of bacteria per unit volume
after 10 hours would be between 4620.7 and 5929.2.
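
To illustrate the anti-logging step in answers 11 and 12, the sketch below converts the predicted Logbacteria values from answer 10 back to the original scale and compares them with the quoted limits. It assumes natural logarithms, which matches the use of e in answer 11; the variable names are illustrative.

```python
import math

# Predicted values on the log scale, from answer 10.
log_pred_5h = 5.90681
log_pred_10h = 8.56298

# Anti-log (exponentiate) to return to bacteria per unit volume.
pred_5h = math.exp(log_pred_5h)
pred_10h = math.exp(log_pred_10h)

# Limits quoted in answers 11 and 12, already on the original scale.
limits_5h = (357.0, 378.4)
limits_10h = (4620.7, 5929.2)

# Taking logs of the limits recovers their values on the log scale.
log_limits_5h = tuple(math.log(v) for v in limits_5h)

print(f"5 hours: {pred_5h:.1f}, limits {limits_5h}")
print(f"10 hours: {pred_10h:.1f}, limits {limits_10h}")
print(f"log-scale limits at 5 hours: {log_limits_5h[0]:.3f} to {log_limits_5h[1]:.3f}")
```

Because the exponential is a non-linear transformation, the back-transformed point prediction is not the midpoint of the back-transformed limits.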

13. Bacteria = exp(3.25 + 0.531 Time) = exp(3.25) x exp(0.531)^Time

= 25.79 x (1.70)^Time

14. Number of bacteria per unit volume is multiplied by a factor of 1.70 every hour on
average, i.e. it increases by 70% per hour on average.
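
The back-transformation in answers 13 and 14 can be checked numerically. This is a minimal sketch using the rounded coefficients quoted above; the function name is hypothetical.

```python
import math

a = 3.25    # intercept of the Logbacteria model
b = 0.531   # slope of the Logbacteria model, per hour

initial = math.exp(a)           # multiplicative constant, about 25.79
hourly_factor = math.exp(b)     # growth factor per hour, about 1.70
percent_growth = (hourly_factor - 1) * 100


def predict_bacteria(hours):
    """Exponential model written as initial * factor ** hours."""
    return initial * hourly_factor ** hours


print(f"Bacteria = {initial:.2f} x ({hourly_factor:.2f})^Time")
print(f"Growth per hour: {percent_growth:.0f}%")
```

Writing the model as a constant times a factor raised to the power of time is what makes the slope readable as a percentage increase per hour.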

TESTING MEANS

1. z=-2.29 < -1.6449 Yes

2. mean=6.5 s=1.031, s.e.=0.344, 95% CI (5.71, 7.29)

t=-1.46, P=0.18, df=8 No.
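
The standard error and confidence interval in answer 2 can be reproduced as follows. This sketch assumes n = 9, inferred from df = 8 for a one-sample t procedure, and uses the standard table value 2.306 for the two-sided 95% t with 8 df; neither figure is stated explicitly in the answer.

```python
import math

mean = 6.5
s = 1.031
df = 8
n = df + 1        # assumption: one-sample t, so n = df + 1
t_crit = 2.306    # assumption: two-sided 95% t value for 8 df, from tables

se = s / math.sqrt(n)          # standard error of the mean
lower = mean - t_crit * se
upper = mean + t_crit * se

print(f"s.e. = {se:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```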

3. t(8) = 0.62 < 2.31, so No. Suspect that it is not a random sample.

4. z=2.90 Process should be adjusted.
