
Scatter graphs

Eg Bob thinks that height and IQ are connected. He collects data on 5 people:

Height (cm)   167   173   164   170   175
IQ            120   126   118   124   129

a) Plot a scatter graph of this data and draw a line of best fit.
b) Use your line to estimate the height of someone with an IQ of 116.

Try to put an equal number of points either side of the line.

The line can be used as a model of the function connecting the 2 variables.

b) Height ≈ 162 cm
Types of correlation
The relationship between 2 variables is called the correlation.
You must be able to recognise the different types of correlation:

If the line of best fit would have a positive gradient, the correlation is positive.
If the line of best fit would have a negative gradient, the correlation is negative.
If the line of best fit is always close to the points, the correlation is strong.
If the points are sometimes far from the line of best fit, the correlation is weak.
Product-Moment Correlation Coefficient
In S1, you use a numerical calculation to measure correlation between 2 variables

This is called the Product-Moment Correlation Coefficient, denoted by the letter r


r = -1: perfect negative correlation
r = 0: no correlation
r = 1: perfect positive correlation

WB14 The scatter diagrams below were drawn by a student.


[Scatter diagrams: Diagram A (y against x), Diagram B (v against u), Diagram C (t against s)]

The student calculated the value of the product moment correlation coefficient
for each of the sets of data. The values were:
0.68    -0.79    0.08
Write down, with a reason, which value corresponds to which scatter diagram.
Answer: 0.68 is Diagram C, -0.79 is Diagram A, 0.08 is Diagram B.
WB15 Students in Mr Brawn’s exercise class have to do press-ups and sit-ups.
The number of press-ups x and the number of sit-ups y done by a random sample
of 8 students are summarised below.

$\sum x = 272$, $\sum x^2 = 10164$, $\sum xy = 11222$, $\sum y = 320$, $\sum y^2 = 13464$

(a) Evaluate Sxx, Syy and Sxy.


(b) Calculate, to 3 decimal places, the product moment correlation coefficient
between x and y.
(c) Give an interpretation of your coefficient.

a) $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 10164 - \frac{272^2}{8} = 916$
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 13464 - \frac{320^2}{8} = 664$
$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 11222 - \frac{272 \times 320}{8} = 342$
b) $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{342}{\sqrt{916 \times 664}} = 0.439$
c) There is positive correlation: pupils who are able to do more press-ups
also tend to do more sit-ups
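
The WB15 working can be checked with a short Python sketch. The function names `summary_s_values` and `pmcc` are illustrative choices, not part of the notes; the formulas are the ones used in parts (a) and (b).

```python
from math import sqrt


def summary_s_values(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (Sxx, Syy, Sxy) from the summary totals of n data pairs."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


def pmcc(s_xx, s_yy, s_xy):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    return s_xy / sqrt(s_xx * s_yy)


# Summary totals quoted in WB15 (8 students).
s_xx, s_yy, s_xy = summary_s_values(
    n=8, sum_x=272, sum_y=320, sum_x2=10164, sum_y2=13464, sum_xy=11222
)
r = pmcc(s_xx, s_yy, s_xy)
print(s_xx, s_yy, s_xy, round(r, 3))
```

Working from the summary totals rather than the raw data mirrors how the question is set, so each intermediate value can be compared with the written solution.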
Coding
You can code raw data in order to reduce the numbers you work with.
WB16 A company owns two petrol stations P and Q along a main road. Total daily
sales in the same week for P (£p) and for Q (£q) are summarised in the table below.

Day          p      q      x       y
Monday     4760   5380    3.95   10.4
Tuesday    5395   4460   10.3     1.2
Wednesday  5840   4640   14.75    3
Thursday   4650   5450    2.85   11.1
Friday     5365   4340   10       0
Saturday   4990   5550    6.25   12.1
Sunday     4365   5840    0      15

When these data are coded using $x = \frac{p - 4365}{100}$ and $y = \frac{q - 4340}{100}$,
$\sum x = 48.1$, $\sum y = 52.8$, $\sum x^2 = 486.44$, $\sum y^2 = 613.22$, $\sum xy = 204.95$

Coding has no effect on the product moment correlation coefficient.
You are unlikely to be asked to code data, but should be aware of how to.

(a) Calculate Sxy, Sxx and Syy.
(b) Calculate, to 3 significant figures, the value of the product moment
correlation coefficient between x and y.
(c)(i) Write down the value of the product moment correlation coefficient between p
and q.
(ii) Give an interpretation of this value.

a) $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 486.44 - \frac{48.1^2}{7} = 155.92...$
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 613.22 - \frac{52.8^2}{7} = 214.95...$
$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 204.95 - \frac{48.1 \times 52.8}{7} = -157.86...$

Use the memory functions on your calculator to store these exact values.

b) $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-157.86...}{\sqrt{155.92... \times 214.95...}} = -0.862$
c)(i) The same, -0.862. Coding does not affect the PMCC.
(ii) As one station gains customers, the other loses them, and vice versa.
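
A minimal Python sketch, using the WB16 sales figures, illustrates why coding leaves the PMCC unchanged. The helper name `pmcc` is an illustrative choice; the coding rule is the one given in the question.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / sqrt(s_xx * s_yy)


# Daily sales for stations P and Q, Monday to Sunday.
p = [4760, 5395, 5840, 4650, 5365, 4990, 4365]
q = [5380, 4460, 4640, 5450, 4340, 5550, 5840]

# Coding from the question: x = (p - 4365)/100, y = (q - 4340)/100.
x = [(value - 4365) / 100 for value in p]
y = [(value - 4340) / 100 for value in q]

r_coded = pmcc(x, y)
r_raw = pmcc(p, q)
print(round(r_coded, 3), round(r_raw, 3))
```

Both calls should agree to rounding error, because subtracting a constant and dividing by a positive constant rescales Sxy and the square root of Sxx Syy by the same factor.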


Now try Ex6E, Q7,9
The least squares regression line
If there is a correlation between 2 variables, it is often useful to find the
equation of the line of best fit. Using Sxx and Sxy, it is possible to obtain this

Known as the least squares regression line of y on x, the equation is given by:

$y = a + bx$ where $b = \frac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$

and, from previously,

$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$, $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$, $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$

The above definitions, except for $\bar{x}$ and $\bar{y}$, are given on the formula sheet,
but you must be able to recognise which variable is x and which is y, even when
different letters are used.
WB17 A manufacturer stores drums of chemicals. During storage, evaporation
takes place. A random sample of 10 drums was taken and the time in storage, x
weeks, and the evaporation loss, y ml, are shown in the table below.
x 3 5 6 8 10 12 13 15 16 18
y 36 50 53 61 69 79 82 90 88 96
(a) On the grid below, draw a scatter diagram to represent these data.

b) Give a reason to support fitting a regression model of the form y = a + bx to these data.

The points form a reasonably straight line.

c) Find, to 2 decimal places, the value of a and the value of b.
(You may use $\sum x^2 = 1352$, $\sum y^2 = 53112$ and $\sum xy = 8354$.)
Recall $\bar{x} = \frac{\sum x}{n}$ and $\bar{y} = \frac{\sum y}{n}$.
d) Give an interpretation of the value of b.
e) Using your model, predict the amount of evaporation that would take place after
(i) 19 weeks, (ii) 35 weeks.

$\sum x = 106$, $\bar{x} = 10.6$, $\sum y = 704$, $\bar{y} = 70.4$

c) $S_{xy} = 8354 - \frac{106 \times 704}{10} = 891.6$, $S_{xx} = 1352 - \frac{106^2}{10} = 228.4$
$b = \frac{891.6}{228.4}$, $a = 70.4 - \frac{891.6}{228.4} \times 10.6 = \frac{16571}{571}$
To 2dp, a = 29.02, b = 3.90, so $y = 29.02 + 3.90x$
d) For every week in storage, about 4ml evaporates (b is the gradient of the regression line).
e) (i) $x = 19 \Rightarrow y = 103.12$
(ii) $x = 35 \Rightarrow y = 165.52$

The regression coefficient of y on x is $b = \frac{S_{xy}}{S_{xx}}$
Least squares regression line of y on x is y = a + bx where $a = \bar{y} - b\bar{x}$
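
The regression calculation for the drum data can be reproduced from the raw table with a short Python sketch. The function name `regression_line` is an illustrative choice; it applies the formulas for b and a quoted in these notes.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = s_xy / s_xx
    a = sum(ys) / n - b * sum(xs) / n
    return a, b


# Time in storage (weeks) and evaporation loss (ml) for the 10 drums.
weeks = [3, 5, 6, 8, 10, 12, 13, 15, 16, 18]
loss = [36, 50, 53, 61, 69, 79, 82, 90, 88, 96]

a, b = regression_line(weeks, loss)
a_2dp, b_2dp = round(a, 2), round(b, 2)
print(f"y = {a_2dp} + {b_2dp}x")

# Predictions using the coefficients rounded to 2 decimal places.
for x in (19, 35):
    print(x, round(a_2dp + b_2dp * x, 2))
```

The predictions use the rounded coefficients, which matches the written working; using the unrounded a and b would give slightly different values.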
WB17 An office has the heating switched on at 7.00 a.m. each morning.
On a particular day, the temperature of the office, t °C, was recorded m minutes
after 7.00 a.m. The results are shown in the table below.

m    0     10    20     30     40     50
t    6.0   8.9   11.8   13.5   15.3   16.1

(a) Calculate the exact values of Smt and Smm.

If necessary, calculate the sums manually:
$\sum m = 150$, $\sum t = 71.6$, $\sum m^2 = 5500$, $\sum mt = 2147$

$S_{mt} = \sum mt - \frac{\sum m \sum t}{n} = 2147 - \frac{150 \times 71.6}{6} = 357$
$S_{mm} = \sum m^2 - \frac{(\sum m)^2}{n} = 5500 - \frac{150^2}{6} = 1750$
(b) Calculate the equation of the regression line of t on m in the form t = a + bm.
(c) Use your equation to estimate the value of t at 7.35 a.m.
(d) State, giving a reason, whether or not you would use the regression equation in
(b) to estimate the temperature
(i) at 9.00 a.m. that day, (ii) at 7.15 a.m. one month later.

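b) $b = \frac{S_{mt}}{S_{mm}} = \frac{357}{1750} = 0.204$
$\bar{t} = \frac{\sum t}{n} = \frac{71.6}{6}$, $\bar{m} = \frac{\sum m}{n} = \frac{150}{6}$
$a = \bar{t} - b\bar{m} = \frac{41}{6}$
$t = \frac{41}{6} + 0.204m$
c) $m = 35 \Rightarrow t = 13.97$°C
d)(i) No. Using the model, when m = 120 (ie 9.00 a.m.), t = 31°C, which is higher
than you would expect any heating system to go to. The regression line is not
valid this far outside the range of the data.
d)(ii) Yes. m is reset to 0 at 7.00 a.m. each day, so 7.15 a.m. is m = 15.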

The regression coefficient of y on x is $b = \frac{S_{xy}}{S_{xx}} = \frac{S_{mt}}{S_{mm}}$
Least squares regression line of y on x is y = a + bx where $a = \bar{y} - b\bar{x}$
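
Since part (a) asks for exact values, one way to check the office-heating working is with Python's `fractions` module, which avoids rounding. This is only a sketch; the variable names and the `predict` helper are illustrative choices.

```python
from fractions import Fraction

# Minutes after 7.00 a.m. and office temperature in degrees C.
m = [0, 10, 20, 30, 40, 50]
t = [Fraction(s) for s in ("6.0", "8.9", "11.8", "13.5", "15.3", "16.1")]
n = len(m)

# Exact Smt and Smm from the definitions.
s_mt = sum(mi * ti for mi, ti in zip(m, t)) - sum(m) * sum(t) / n
s_mm = sum(mi * mi for mi in m) - Fraction(sum(m) ** 2, n)

# Regression of t on m: t = a + b*m.
b = s_mt / s_mm
a = sum(t) / n - b * Fraction(sum(m), n)


def predict(minutes):
    """Temperature predicted by the regression line at the given minutes."""
    return a + b * minutes


print(s_mt, s_mm, a, b)
print(float(predict(35)))   # 7.35 a.m., inside the data range 0 to 50
print(float(predict(120)))  # 9.00 a.m., far outside the data range
```

Keeping a and b as fractions makes it easy to compare with the exact forms in the written solution before converting to decimals for the predictions.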
Interpreting a regression line
Eg The weekly growth, g, in mm, of a banana is plotted against the
amount of fertiliser used, f, in ml. A scatter graph is made of the results:

[Scatter graph of g against f; the f axis runs from 0 to 25]

The regression line is calculated as $g = 4.3 + 2.6f$

Using the regression line within the range of known data is called interpolation.
Using the regression line outside the range of known data is called extrapolation.

a) Give an interpretation of the coefficients 4.3 and 2.6
A banana given no fertiliser will have a weekly growth of 4.3mm.
For every extra ml of fertiliser used, weekly growth increases by 2.6mm.

b) Use the regression line to predict the growth of a banana given:
i) 10ml of fertiliser: $4.3 + 2.6 \times 10 = 30.3$mm
ii) 30ml of fertiliser: $4.3 + 2.6 \times 30 = 82.3$mm

c) Comment on the reliability of each prediction
The prediction for 10ml is interpolated and so more reliable.
The prediction for 30ml is extrapolated and so may not be reliable.

Now try Ex7D
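
The distinction between interpolation and extrapolation can be sketched in Python. The function name `predict_growth` is illustrative, and the data range of 0 to 25 ml is an assumption read from the f axis of the graph, since the raw data are not given.

```python
def predict_growth(f_ml, f_min=0.0, f_max=25.0):
    """Predict weekly growth (mm) from g = 4.3 + 2.6f and label the prediction.

    f_min and f_max are an ASSUMED range of known data, taken from the
    graph's f axis; they are not stated in the notes.
    """
    growth = 4.3 + 2.6 * f_ml
    kind = "interpolation" if f_min <= f_ml <= f_max else "extrapolation"
    return growth, kind


for fertiliser in (10, 30):
    growth, kind = predict_growth(fertiliser)
    print(f"{fertiliser} ml: {growth:.1f} mm ({kind})")
```

A prediction labelled as extrapolation is not necessarily wrong, but there is no data to confirm that the linear relationship continues beyond the observed range.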
