RESIDUALS
A residual is the vertical distance between a data point and a proposed line of best fit.
Click on the icon to experiment with finding the ‘line of best fit’
by minimising the sum of the squares of the residuals.
Write down the function that you find which minimises the sum of the squares of the residuals.
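The quantity being minimised can be computed directly. The short Python sketch below is illustrative only; the helper names are hypothetical, and the three points are borrowed from Example 4 purely as sample data. It calculates the residual y − (mx + c) for each point and adds up their squares for two candidate lines. The line with the smaller total is the better fit in the least squares sense.

```python
# Sum of squared residuals for a candidate line y = m*x + c.
# Assumption: the sample points are those used in Example 4.

def residuals(points, m, c):
    """Return the residuals y - (m*x + c): signed vertical distances to the line."""
    return [y - (m * x + c) for x, y in points]


def sum_of_squared_residuals(points, m, c):
    """Add up the squares of the residuals; the least squares line minimises this."""
    return sum(r * r for r in residuals(points, m, c))


points = [(1, 3), (3, 5), (5, 6)]

# Compare two candidate lines: the smaller total is the better least squares fit.
for m, c in [(1.0, 2.0), (0.75, 2.42)]:
    total = sum_of_squared_residuals(points, m, c)
    print(f"y = {m}x + {c}: sum of squared residuals = {total:.4f}")
```

Squaring the residuals stops positive and negative residuals from cancelling, and it penalises large residuals more heavily than small ones.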
IB_03
cyan black
Z:\...\IBBK3_18\584IB318.CDR
Wed Jul 21 09:22:40 2004
Color profile: Disabled
Composite Default screen
Example 4
Use the formulae for calculating m and c for the line of best fit through (1, 3),
(3, 5) and (5, 6).
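 x     y    xy    x²
 1     3     3     1
 3     5    15     9
 5     6    30    25
 9    14    48    35   (totals)

So, $\sum x = 9$, $\sum y = 14$, $\sum xy = 48$, $\sum x^2 = 35$ and $n = 3$.

$$s_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 48 - \frac{9 \times 14}{3} = 6$$

$$s_x^2 = \sum x^2 - \frac{(\sum x)^2}{n} = 35 - \frac{9^2}{3} = 8$$

$$\bar{x} = \frac{\sum x}{n} = \frac{9}{3} = 3 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{14}{3}$$

So, using $y - \bar{y} = \dfrac{s_{xy}}{s_x^2}(x - \bar{x})$, we get

$$y - \frac{14}{3} = \frac{6}{8}(x - 3)$$

$$y - 4.67 \approx 0.75x - 2.25$$

$$y \approx 0.75x - 2.25 + 4.67$$

$$y \approx 0.75x + 2.42$$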
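The working in Example 4 can be reproduced with a short Python sketch. It uses the same definitions, $s_{xy} = \sum xy - (\sum x)(\sum y)/n$ and $s_x^2 = \sum x^2 - (\sum x)^2/n$, and exact fractions so that no rounding happens until the end. The function name is hypothetical, and the sketch assumes integer coordinates, because `Fraction(a, n)` requires integer (rational) arguments.

```python
from fractions import Fraction


def least_squares_line(points):
    """Return (slope, intercept) of the least squares line of y on x.

    Uses s_xy = Σxy - (Σx)(Σy)/n and s_x² = Σx² - (Σx)²/n as in Example 4.
    Assumes integer coordinates so that Fraction keeps every step exact.
    """
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)

    s_xy = sum_xy - Fraction(sum_x * sum_y, n)
    s_x2 = sum_x2 - Fraction(sum_x ** 2, n)
    x_bar = Fraction(sum_x, n)
    y_bar = Fraction(sum_y, n)

    slope = s_xy / s_x2
    # y - ȳ = slope (x - x̄) rearranges to y = slope x + (ȳ - slope x̄)
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares_line([(1, 3), (3, 5), (5, 6)])
print(slope, intercept)                                      # 3/4 29/12
print(f"y ≈ {float(slope):.2f}x + {float(intercept):.2f}")   # y ≈ 0.75x + 2.42
```

The exact intercept is 29/12 ≈ 2.4167, which agrees with the rounded value 2.42 found by hand.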
From this point onwards we will use technology to find the least squares regression line.
We can find the least squares regression line using:
• a computer package
• a graphics calculator
• a computer spreadsheet
To do this consider the tabled data:
x 1 2 3 4 5 6 7
y 5 8 10 13 16 18 20
Click on FINISH.
You should now have a graph showing the 7 points.
Step 4: Place the mouse pointer on one of the points and click the right mouse button once.
INTERPOLATION / EXTRAPOLATION
The two variables in the following scatterplot are the mass of a platypus (independent variable
plotted on the x-axis) and the length of the same platypus (dependent variable plotted on the
y-axis) for 14 different animals.
The data was collected in an experiment to discover if there was a relationship between the
length and mass of these animals.
There is a very high positive correlation between the variables, and the line of best fit is
determined to be y ≈ 0.087x + 46.1 cm.
However, it would be dangerous to predict that for a mass of 800 grams the extension would
be 0.087 × 800 + 46.1 = 115.7 cm, because we may have exceeded the elastic limit of the
spring somewhere between x = 500 grams and x = 800 grams, meaning that the spring
becomes permanently stretched more than predicted by the graph.
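To separate interpolation from extrapolation, compare each input with the range of x-values actually observed before trusting a prediction. The Python sketch below is a minimal illustration with hypothetical helper names. It assumes that the observed masses ran from 0 g to about 500 g: the 500 g upper limit is inferred from the passage above, and the lower limit of 0 is purely illustrative.

```python
def predict(x, slope=0.087, intercept=46.1):
    """Predicted value from the line of best fit y ≈ 0.087x + 46.1."""
    return slope * x + intercept


def is_interpolation(x, x_min, x_max):
    """True if x lies inside the range of x-values that were actually observed."""
    return x_min <= x <= x_max


# Assumption: observed masses span roughly 0 g to 500 g (hypothetical range).
X_MIN, X_MAX = 0, 500

for mass in (300, 800):
    if is_interpolation(mass, X_MIN, X_MAX):
        kind = "interpolation"
    else:
        kind = "extrapolation (treat with caution)"
    print(f"mass {mass} g -> predicted {predict(mass):.1f} cm, {kind}")
```

The check only reports whether a value falls outside the data. It cannot tell you whether the linear relationship still holds there, and the elastic limit of the spring is exactly the kind of change that the fitted line cannot anticipate.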
A further example could be the world record for the long
jump prior to the Mexico City Olympic Games of 1968.
A steady, regular increase in the world record over the
previous 30 years had been recorded. However, due to the
high altitude and a perfect jump, the USA competitor Bob
Beamon shattered the record by a huge amount, not in
keeping with previous increases.
Example 5
The table below shows the sales for Hancock’s Electronics, which was established in late 1998.
EXERCISE 18C
1 Recall the tread depth data of car tyres after travelling thousands of kilometres:
kilometres (X thousand)   14    17    24    34    35    37    38    39
tread depth (Y mm)        5.7   6.5   4.0   3.0   1.9   2.7   1.9   2.3
2 Recall the restaurateur’s data for the number of diners in March and the temperature at
noon.
Temperature (X °C) 23 25 28 30 30 27 25 28 32 31 33 29 27
Number of diners (Y ) 57 64 62 75 69 58 61 78 80 67 84 73 76
5 The rate of a chemical reaction in a certain plant depends on the number of frost-free
days experienced by the plant over a year which, in turn, depends on altitude. The higher
the altitude, the greater the chance of frost. The following table shows the rate of the
chemical reaction, R, as a function of the number of frost-free days, n.
6 The yield (Y kg) of pumpkins on a farm depends on the quantity of fertiliser (X g/m²).
The following table shows corresponding X and Y values.
X   4     13    20    26    30    35    50
Y   1.8   2.9   3.8   4.2   4.7   5.7   4.4
a Draw a scatterplot of the data and identify an outlier.
b Calculate the correlation coefficient:
i with the outlier included ii without the outlier.
c Calculate the equation of the least squares regression line:
i with the outlier included ii without the outlier.
d If you wish to estimate the yield when 15 g/m² are used, which regression line from
c should be used?
e Can you explain what may have caused the outlier?
7 Find the least squares regression line for y on x if:
a $\bar{x} = 6.12$, $\bar{y} = 5.94$, $s_{xy} = -4.28$, $s_x = 2.32$
b $\bar{x} = 21.6$, $\bar{y} = 45.9$, $s_{xy} = 12.28$, $s_x = 8.77$
8 $n = 6$, $\sum x = 61$, $\sum y = 89$, $\sum xy = 1108$, $\sum (x - \bar{x})^2 = 138$ and $\sum (y - \bar{y})^2 = 284$
a Find i the mean of X ii the mean of Y.
b Find i the standard deviation of X ii the standard deviation of Y.
c Find the covariance of X and Y .
d Find the least squares regression model for y on x.
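Exercises 7 and 8 build the regression line from summary statistics using $y - \bar{y} = \dfrac{s_{xy}}{s_x^2}(x - \bar{x})$. A minimal Python sketch of that step follows. The function name is hypothetical, and it assumes that $s_{xy}$ (covariance) and $s_x$ (standard deviation) were computed with the same divisor. Under that assumption the divisor cancels in the ratio, so the slope is the same whether both use n or both use n − 1.

```python
import math


def line_from_summary(x_bar, y_bar, s_xy, s_x):
    """Least squares line of y on x from summary statistics.

    Uses y - ȳ = (s_xy / s_x²)(x - x̄), where s_xy is the covariance and
    s_x the standard deviation of X (both computed with the same divisor).
    Returns (slope, intercept) for y = slope * x + intercept.
    """
    slope = s_xy / s_x ** 2
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Check against Example 4: dividing s_xy = 6 and s_x² = 8 by n = 3 gives
# covariance 2 and variance 8/3, which should reproduce y ≈ 0.75x + 2.42.
slope, intercept = line_from_summary(3, 14 / 3, 2, math.sqrt(8 / 3))
print(f"y ≈ {slope:.2f}x + {intercept:.2f}")   # y ≈ 0.75x + 2.42
```

You can then apply the same function to the values given in Exercise 7.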
What to do:
1 Amy and Lee are two wine judges. They are considering six red wines: A, B, C, D, E
and F. They taste each wine and put them in order of enjoyment from 1 (best) to 6
(worst), and the results of their judging are shown in the following table:

Wine          A   B   C   D   E   F
Amy’s order   3   1   6   2   4   5
Lee’s order   6   5   2   1   3   4

Notice that for wine A, d = 6 − 3 = 3 and for wine C, d = 6 − 2 = 4.
a Find Spearman’s rank order correlation coefficient for the wine tasting data.
b Comment on the degree of agreement between their rankings of the wine.
c What is the significance of the sign of t?
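To check part a, you can use the following Python sketch. This part of the worksheet does not state the formula, so the sketch assumes the standard no-ties form of Spearman's coefficient, $r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, where d is the difference between the two ranks given to each wine. Because d is squared, it does not matter which ranking is subtracted from which, and this matches the notes above (for example, d = 6 − 3 for wine A and d = 6 − 2 for wine C). The function name is hypothetical.

```python
def spearman_rank_correlation(rank_a, rank_b):
    """Spearman's rank order correlation for two rankings with no ties:
    r_s = 1 - 6 Σd² / (n (n² - 1)), where d is the difference in ranks."""
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must have the same length")
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Rankings for wines A to F, taken from the table above.
amy = [3, 1, 6, 2, 4, 5]
lee = [6, 5, 2, 1, 3, 4]
print(round(spearman_rank_correlation(amy, lee), 3))
```

If some items were tied, the formula above would no longer be exact, and the ranks would need to be averaged first.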