
National Junior College Mathematics Department 2016

National Junior College


2015 – 2016 H2 Mathematics
Correlation and Regression [approx. 3 lessons] Tutorial

Basic Mastery Questions

1. Sketch a scatter diagram that might be expected when x and y are related approximately as
given in each of the cases (A), (B) and (C) below. In each case your diagram should include 6
points, approximately equally spaced with respect to x, and with all x- and y-values positive.
The letters a, b, c, d, e and f represent constants.
(A) $y = a + bx^2$, where a is positive and b is negative,
(B) $y = c + d\ln x$, where c is positive and d is negative,
(C) $y = e + \frac{f}{x}$, where e is positive and f is negative. [GCE2013/II/10 (part)]
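Before sketching, it can help to tabulate six points for each model and look at how y changes from one point to the next. The sketch below is a minimal Python illustration. The constants a = 40, b = −1, c = 10, d = −3, e = 10 and f = −6 are hypothetical values chosen only to satisfy the stated signs and keep every y positive for x = 1, …, 6; they are not part of the question.

```python
import math

# x-values: six equally spaced positive values.
xs = [1, 2, 3, 4, 5, 6]

# Hypothetical constants (only their signs come from the question).
ya = [40 - 1 * x**2 for x in xs]             # (A) y = a + b x^2, a = 40, b = -1
yb = [10 - 3 * math.log(x) for x in xs]      # (B) y = c + d ln x, c = 10, d = -3
yc = [10 - 6 / x for x in xs]                # (C) y = e + f/x,   e = 10, f = -6


def diffs(ys):
    """Successive differences y[k+1] - y[k]: the sign gives the direction,
    the size shows whether the change speeds up or slows down."""
    return [b - a for a, b in zip(ys, ys[1:])]


for label, ys in [("A", ya), ("B", yb), ("C", yc)]:
    print(label, "y:", [round(y, 2) for y in ys])
    print(label, "differences:", [round(d, 3) for d in diffs(ys)])
```

For any constants with these signs, the pattern is the same: (A) decreases at an increasing rate, (B) decreases at a decreasing rate, and (C) increases at a decreasing rate.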

2. The diagrams below show the two regression lines (y on x and x on y) for three different sets
of bivariate data. The scales along the two axes are the same for each diagram.

[Figure: diagrams (i), (ii) and (iii), each with axes x and y, origin O, two regression lines and a marked point P.]

The equations of the regression lines for one of the sets of bivariate data have been incorrectly
obtained. Identify which of (i), (ii) and (iii) is the corresponding scatter diagram, and justify
your answer clearly.

For the other two sets of bivariate data, explain, in each case, what the diagram tells us about
the correlation between the variables x and y. What does the point P represent? Indicate in the
diagrams the y on x and x on y lines.

3. Ten students sat for a practical test and a theoretical test for one of their subjects, Physics.
Their marks out of 10 are recorded in the following table.

Practical test (x) 8 6 10 8 5 6 8 10 7 7


Theoretical test (y) 6 7 8 6 7 4 9 10 5 8

Draw a scatter diagram for the pairs of marks.

Find, in any form, the equation of the regression line of


(i) y on x, and
(ii) x on y.

Calculate the product moment correlation coefficient for the data.

A student was absent from the theoretical test but obtained a mark of 6 in the practical test.
Use the appropriate regression line to estimate a mark in the theoretical test for this student.
Comment on the reliability of this estimate.

2015 – 2016 / H2 Maths / Correlation and Regression Page 1 of 6


Practice Questions

1. An experiment with certain swimming animals was carried out in order to investigate how the
speed at which they swam depended on the angle through which their hind feet moved. The
angle θ degrees through which the hind feet moved was measured, together with the
swimming speed v m s⁻¹. The results are given in the table.

θ 87 92 96 97 98 101 110 114 115 115 116 123 133


v 0.35 0.30 0.50 0.40 0.25 0.45 0.60 0.55 0.55 0.65 0.50 0.70 0.75

(i) State, giving a reason, which of the least squares regression lines, θ on v or v on θ,
should be used to express a possible linear relation between v and θ.

(ii) Calculate the equation of the line chosen in part (i), giving the values of the coefficients
to a suitable degree of accuracy.

(iii) Interpret, in context, the value of the gradient of the regression line in (ii). By
considering the value of the v-intercept of the regression line, comment on the
suitability of a linear model for the relationship between v and θ, for values of θ
beyond the given data range.

(iv) Find the product moment correlation coefficient for this set of data. If the swimming
speeds were inaccurately measured and each measurement of v is to increase by 0.05,
what is the effect on the product moment correlation coefficient? Justify your answer.

2. A random sample of eight pairs of values of x and y is given in the table below.

i        1   2   3   4   5   6   7   8
$x_i$   10  11  12  11  17  14  19  $x_8$
$y_i$    9   8   7   6   5   4   1  $y_8$

(i) It is given that the regression lines y on x and x on y for this set of data have equations

$y = -\frac{7}{10}x + \frac{151}{10}$ and $x = -\frac{7}{6}y + 20$

respectively. Find the values of $x_8$ and $y_8$.

(ii) Let $Y_i$ be the value obtained by substituting $x_i$ into the equation of the regression line
of y on x, for i = 1, 2, …, 8, i.e. $Y_1 = -\frac{7}{10}x_1 + \frac{151}{10}$, $Y_2 = -\frac{7}{10}x_2 + \frac{151}{10}$, …. Find the value of
$\sum_{i=1}^{8} (y_i - Y_i)^2$.

(iii) Hence state an inequality that must be satisfied by $\sum_{i=1}^{8} (y_i - a - bx_i)^2$ for any real
constants a and b. Justify your answer clearly.
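The three parts can be checked with exact rational arithmetic using Python's `fractions.Fraction`. This is a sketch that takes the regression lines as y = −(7/10)x + 151/10 and x = −(7/6)y + 20. Both have negative gradients, which is consistent with y falling as x rises in the table. Since both lines pass through (x̄, ȳ), solving them simultaneously gives the means, and hence the missing values. The last step compares the fitted line's sum of squared residuals with a few other, arbitrarily chosen (a, b) pairs.

```python
from fractions import Fraction

x7 = [10, 11, 12, 11, 17, 14, 19]   # x_1 .. x_7
y7 = [9, 8, 7, 6, 5, 4, 1]          # y_1 .. y_7

# y on x: y = a + b x;  x on y: x = c + d y
b, a = Fraction(-7, 10), Fraction(151, 10)
d, c = Fraction(-7, 6), Fraction(20)

# (i) Both lines pass through (x_bar, y_bar): solve them simultaneously.
x_bar = (c + d * a) / (1 - b * d)
y_bar = a + b * x_bar
x8 = 8 * x_bar - sum(x7)
y8 = 8 * y_bar - sum(y7)

xs = x7 + [x8]
ys = y7 + [y8]


def sse(a0, b0):
    """Sum of squared residuals: sum of (y_i - a0 - b0 x_i)^2."""
    return sum((yi - a0 - b0 * xi) ** 2 for xi, yi in zip(xs, ys))


# (ii) Y_i are the fitted values from the y on x line.
sse_fitted = sse(a, b)

# (iii) The least-squares line minimises this sum, so other lines give at least as much.
others = [sse(Fraction(15), Fraction(-7, 10)),
          sse(Fraction(151, 10), Fraction(-1, 2)),
          sse(Fraction(0), Fraction(0))]

print("means:", x_bar, y_bar)
print("x8, y8:", x8, y8)
print("sum of squared residuals:", sse_fitted, "=", float(sse_fitted))
```

Fractions keep every intermediate value exact, so the means come out as whole numbers rather than as floating-point approximations.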



3. With the implementation of a new bus fare system, Jasmine wanted to find out how the bus
fares were decided for different bus journeys. She identified 12 common locations and used a
map to measure the straight-line distance, x km, of each location from her home. She also
measured the road distance, y km, of each location from her home and the corresponding bus
fare, s cents. The data are shown below.

Location A B C D E F G H I J K L
x 7.7 3.0 24.1 13.2 9.3 9.0 10.4 3.5 17.6 4.5 2.0 2.5
y 8.8 3.3 28.0 16.1 9.4 8.9 12.5 15.8 22.5 5.0 2.2 2.8
s 121 81 181 149 125 121 137 149 173 91 71 71

(i) By considering the values of x and y, explain why Location F should be omitted from
any further analysis. State, with a reason, another location that should be omitted.

Omit the data for the two locations in part (i).

(ii) Use a suitable regression line to give an estimate of the straight-line distance when the
road distance is 20.0 km.

(iii) Draw a scatter diagram of s against y. State, with a reason, which of the following
models is more appropriate to describe the relationship between y and s:

Model I: $s = a + by^2$,
Model II: $s = a + b\ln y$.

(iv) Using the more appropriate model found in part (iii), calculate the equation of the
corresponding regression line.

(v) Estimate the road distance travelled if the bus fare is 170 cents. Comment on the
reliability of this estimate. [HCI/2010/Prelims/II/Q12 (modified)]

4. The table below shows the maximum temperature and the sales of cold soft drinks made between
1130 hrs and 1430 hrs by a shop in a Central Business District on nine Tuesdays.
Temperature, t (°C) 29.4 30.5 36.6 31.1 32.5 33.4 33.8 34.8 35.1
Daily sales, s ($) 100 170 64 186 220 236 244 252 254
(i) Draw the scatter diagram for these values, labelling the axes clearly.

(ii) Identify a pair of values of s and t which should be regarded as an outlier. Give a
possible reason for the occurrence of this pair of values.

(iii) Omitting the outlier, find, correct to 4 decimal places, the value of the product moment
correlation coefficient between
(a) t and s,
(b) $\frac{1}{t}$ and s.
(iv) Use your answers to parts (i) and (iii) to explain which of $s = a + bt$ or $s = c + \frac{d}{t}$ is the
better model, and find the equation of the appropriate regression line for the better
model. [NJC/2011/Promos/Q10(b) (modified)]



5. The table below gives the values of a set of bivariate data comprising ten observations of x
and y.
x 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
y 13.1 12.9 12.5 11.7 10.9 9.9 8.9 7.7 6.0 4.1

(i) Sketch the scatter diagram and determine the value of the product moment correlation
coefficient between y and x.

(ii) Determine which of the following is the best model for this set of data, justifying your
choice clearly.

(A) $y = ax + b$ (B) $y = cx^2 + d$ (C) y e x f

(iii) Find the equation of the least-squares regression line of your selected best model in part
(ii). Use your equation to estimate the value of y when x = 3.8. Comment on the
reliability of the estimation. [NJC/2012/Prelims/II/Q6]

6. A medical officer wishes to investigate a patient’s walking speed s km/h and his heart-beat
rate h beats per minute (bpm). The data are shown below:

s 1 1.5 2 2.5 3 3.5 4 4.5 5


h 60 63 66 75 86 99 150 110 130

(i) Sketch a scatter plot of the above data.

(ii) One of the values of h appears to be incorrect. Indicate the corresponding point on your
diagram by labelling it P.

Omit P for the remainder of this question.

(iii) Calculate the product moment correlation coefficient for this set of data. Use the
equation of an appropriate regression line to predict the value of s when h = 100,
justifying your choice of regression line.

It is suggested to use one of the following two models instead:

Model (I): $h = a + bs^2$,
Model (II): $h = a + be^{s}$,

where a and b are real constants.

(iv) Determine which of the two models is a better choice, giving a reason for your answer.

(v) Suppose a new data pair $(\bar{s}, \bar{h})$ is added to the table above, where $\bar{s}$ and $\bar{h}$ are the
patient’s sample mean walking speed (in km/h) and his sample mean heart-beat rate (in
bpm) respectively, based on the data above. Without any calculations, explain whether
the equation of the regression line you have obtained in part (iii) would change.
[NJC/2015/Prelims/II/Q12]



7. A scientist wishes to investigate the rate at which mould grows on a slice of expired bread. He
conducts an experiment to measure the area covered by mould on a slice of expired bread
over a span of 2 weeks and records his findings in the table below.

Day, t                                  0     2     6    10    13
Area covered by mould, x (in cm²)     1.5    18    75    94    99

(i) Calculate, correct to 4 decimal places, the product moment correlation coefficient for
this set of data.

(ii) Explain why the value you have obtained in part (i) does not necessarily imply that a
linear model is suitable for this set of data.

After carrying out some work, the scientist theorises that a model of the form

$\ln\left(\frac{A}{x} - 1\right) = a + bt$,

for some real constants A, a and b, may be a good fit for this set of data. He tests his theory by
calculating the product moment correlation coefficient (denoted by r) between t and
$\ln\left(\frac{A}{x} - 1\right)$ for a few possible values of A, and records his findings in the table below.

A 100 101 102


r –0.983563 –0.975623 –0.969018

(iii) Calculate the value of r for A = 100, giving your answer correct to 6 decimal places.

(iv) Which of 100, 101, and 102 is the most appropriate value of A? Justify your answer.

(v) Using the most appropriate value of A in part (iv), find the values of a and b, and use
these values to estimate the least number of complete days needed for the mould to
cover an area of 50 cm².

(vi) Suggest what the value of A represents in the context of this question.



Correlation and Regression Tutorial – Numerical Answers

Basic Mastery Questions

3. y on x line: y = 2.41 + 0.612x ; x on y line: x = 4 + 0.5y;

Estimate of y = 6, NOT reliable

Practice Questions
1. (ii) $v = 0.00995\theta - 0.565$   (iv) 0.878

2. (i) $x_8 = 10$, $y_8 = 8$   (ii) 8.8

3. (ii) 16.7 km   (iv) $s = 26.0 + 45.2\ln y$   (v) 24.1 km

4. (iii) r = –0.9635   (iv) $s = 999.59 - \frac{25701}{t}$

5. (i) r = –0.973   (iii) $y = -0.359x^2 + 13.2$; estimate of y when x = 3.8 is 8.04

6. (iii) s = 3.67

7. (i) r = 0.9592 (iii) r = –0.983563 (v) a = 3.37, b = –0.632; 6 days

