
EXTENSION MATERIAL

5 Useful statistical techniques for the project
Try this worksheet after you have completed Exercise 5H.

Pearson’s product-moment correlation coefficient, r
In your project you will be expected to show that you know how to find the
correlation coefficient and, if necessary, the equation of the regression line
without using your GDC.
The formula for Pearson’s product-moment correlation coefficient, r, for two sets of data, x and y, is:
➔ $$r = \frac{s_{xy}}{s_x s_y}$$
You will be expected to use this formula to enhance your project.

where $s_{xy}$ is the covariance and $s_x$ and $s_y$ are the standard deviations of x and y respectively.
$$s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} \quad\text{or}\quad \frac{\sum xy}{n} - \frac{\sum x}{n}\times\frac{\sum y}{n}$$

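$$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad\text{or}\quad \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$

$$s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} \quad\text{or}\quad \sqrt{\frac{\sum y^2}{n} - \bar{y}^2}$$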
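The summation forms above translate directly into a short calculation. The Python sketch below is a minimal illustration of that, assuming the divide-by-n definitions of $s_{xy}$, $s_x$ and $s_y$ given here. The function name `pearson_r` is hypothetical. It uses the data from Example 1 below as a check. It is a way to verify hand working, not a replacement for it.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the divide-by-n summation formulas (hypothetical helper)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("x and y must be non-empty and the same length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: sum(xy)/n - (sum(x)/n) * (sum(y)/n)
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    # Standard deviations: sqrt(sum(x^2)/n - mean^2)
    s_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    s_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    return s_xy / (s_x * s_y)


# Example 1: training hours per week (x) and race time in minutes (y)
training = [21, 5, 6, 3, 9, 12, 8, 25, 10, 6]
race_time = [14.2, 16.1, 16.2, 18.4, 15.9, 15.3, 14.8, 13.8, 14.1, 16]
print(round(pearson_r(training, race_time), 3))  # -0.767
```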

EXAMPLE 1

Ten students train for a race. The table shows the number of hours per week that
they train and the time, in minutes, that it takes them to complete the race.
Training (hours per week) 21 5 6 3 9 12 8 25 10 6
Race time (minutes) 14.2 16.1 16.2 18.4 15.9 15.3 14.8 13.8 14.1 16

Find the correlation coefficient, r, and comment on your result.

Answer
There are 10 pieces of data, so n = 10
x y xy
21 14.2 298.2
5 16.1 80.5
6 16.2 97.2
3 18.4 55.2
9 15.9 143.1
12 15.3 183.6
8 14.8 118.4
25 13.8 345
10 14.1 141
6 16 96

∑ x = 105 ∑ y = 154.8 ∑ xy = 1558.2


© Oxford University Press 2012: this may be reproduced for class use solely for the purchaser’s institute Extension worksheet 1

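$$s_{xy} = \frac{\sum xy}{n} - \frac{\sum x}{n}\times\frac{\sum y}{n} = \frac{1558.2}{10} - \frac{105}{10}\times\frac{154.8}{10} = -6.72$$
Use your GDC to calculate the standard deviations of x and y (see Chapter 12 for guidance on using your GDC): $s_x = 6.77$ and $s_y = 1.294$.
$$r = \frac{s_{xy}}{s_x \times s_y} = \frac{-6.72}{6.77 \times 1.294} = -0.767$$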
There is a fairly strong negative correlation.
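Many GDCs display two standard deviations. One divides by n and the other divides by n − 1. The exact labels depend on the model, so check your manual. The formulas in this worksheet divide by n throughout. The Python sketch below uses the Example 1 data to illustrate that r does not depend on the divisor, provided the covariance and both standard deviations use the same one. The helper `r_with_divisor` is hypothetical.

```python
import math


def r_with_divisor(xs, ys, divisor):
    """Pearson's r where the covariance and both standard deviations share one divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / divisor
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / divisor)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / divisor)
    return s_xy / (s_x * s_y)


training = [21, 5, 6, 3, 9, 12, 8, 25, 10, 6]
race_time = [14.2, 16.1, 16.2, 18.4, 15.9, 15.3, 14.8, 13.8, 14.1, 16]
n = len(training)
print(round(r_with_divisor(training, race_time, n), 3))      # -0.767
print(round(r_with_divisor(training, race_time, n - 1), 3))  # -0.767
```

The divisor cancels between the numerator and the denominator. If you mix divisors, for example by dividing the covariance by n but using standard deviations that divide by n − 1, r is multiplied by (n − 1)/n. Here that would give about −0.690 instead of −0.767.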

Equation of the regression line


The formula for the regression line of y on x is
➔ $$(y - \bar{y}) = \frac{s_{xy}}{(s_x)^2}(x - \bar{x})$$
where $\bar{x}$ and $\bar{y}$ are the means of x and y, $s_x$ is the standard deviation of x and $s_{xy}$ is the covariance.
Only calculate the equation of the regression line if there is a moderate or strong correlation.
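Expanding $(y - \bar{y}) = \frac{s_{xy}}{(s_x)^2}(x - \bar{x})$ gives $y = mx + c$, with gradient $m = s_{xy}/(s_x)^2$ and intercept $c = \bar{y} - m\bar{x}$. Below is a minimal Python sketch of that step. It assumes the same divide-by-n definitions, and the name `regression_line` is hypothetical. It is checked against the gymnast data from Exercise 1, question 1.

```python
def regression_line(xs, ys):
    """Return (gradient, intercept) of the regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2  # this is (s_x)^2
    gradient = s_xy / var_x
    # Rearranging (y - ybar) = gradient * (x - xbar) into y = gradient * x + intercept
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


heights = [145, 152, 149, 156, 162, 148, 155, 163, 151, 168, 157, 154]
weights = [44, 50, 48, 54, 58, 47, 53, 59, 50, 63, 55, 52]
m, c = regression_line(heights, weights)
print(round(m, 3), round(c, 1))  # 0.807 -72.4
```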

EXAMPLE 2

The table shows the weight of a puppy and the weight of its mother.
Weight of puppy, x 0.52 0.98 1.21 1.05 0.25 1.54
Weight of mother, y 4.98 6.23 9.23 7.10 3.55 11.4
a Calculate the correlation coefficient, r.
b Find the equation of the regression line.

Answer
a x y x×y
0.52 4.98 2.5896
0.98 6.23 6.1054
1.21 9.23 11.1683
1.05 7.10 7.455
0.25 3.55 0.8875
1.54 11.4 17.556

∑ x = 5.55 ∑ y = 42.49 ∑ xy = 45.7618

$$s_{xy} = \frac{\sum xy}{n} - \frac{\sum x}{n}\times\frac{\sum y}{n} = \frac{45.7618}{6} - \frac{5.55}{6}\times\frac{42.49}{6} = 1.076425$$
Use your GDC to calculate the values of $s_x$ and $s_y$: $s_x = 0.4277$ and $s_y = 2.61$.
$$r = \frac{s_{xy}}{s_x \times s_y} = \frac{1.076425}{0.4277 \times 2.61} = 0.964 \text{ (3 sf)}$$
This is a very strong, positive correlation. (Always comment on the correlation.)
b Use $(y - \bar{y}) = \frac{s_{xy}}{(s_x)^2}(x - \bar{x})$. The equation of the regression line is
$$\left(y - \frac{42.49}{6}\right) = \frac{1.076425}{0.4277^2}\left(x - \frac{5.55}{6}\right)$$
$$y - 7.082 = 5.884x - 5.443$$
Rearrange:
$$y = 5.88x + 1.64$$
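A small slip in a column total carries through every later step, so it is worth re-adding the totals before you use them. The Python sketch below is only a checking aid, not part of the project requirements. It recomputes ∑x, ∑y, ∑xy, r and the regression line from the Example 2 table, using the divide-by-n formulas above.

```python
import math

puppy = [0.52, 0.98, 1.21, 1.05, 0.25, 1.54]   # x: weight of puppy
mother = [4.98, 6.23, 9.23, 7.10, 3.55, 11.4]  # y: weight of mother

n = len(puppy)
sum_x = sum(puppy)
sum_y = sum(mother)
sum_xy = sum(x * y for x, y in zip(puppy, mother))
print(round(sum_x, 2), round(sum_y, 2), round(sum_xy, 4))  # 5.55 42.49 45.7618

mean_x, mean_y = sum_x / n, sum_y / n
s_xy = sum_xy / n - mean_x * mean_y
var_x = sum(x * x for x in puppy) / n - mean_x ** 2   # (s_x)^2
var_y = sum(y * y for y in mother) / n - mean_y ** 2  # (s_y)^2
r = s_xy / math.sqrt(var_x * var_y)
gradient = s_xy / var_x
intercept = mean_y - gradient * mean_x
print(round(r, 3), round(gradient, 2), round(intercept, 2))  # 0.964 5.88 1.64
```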


Exercise 1
1 The table shows the heights and weights of 12 gymnasts.

Height, x cm 145 152 149 156 162 148 155 163 151 168 157 154
Weight, y kg 44 50 48 54 58 47 53 59 50 63 55 52
a Find the correlation coefficient, r.
b Calculate the equation of the line of regression.
c Use your line to predict the weight of a gymnast who is 160 cm tall.

2 The table shows the number of hours that each of ten students study before a test
and their test result.
Number of hours, x 1 1.5 3 4.5 2 0.5 2 5 4.5 3.5
Test score, % 51 48 73 76 55 45 61 83 81 72
a Find the correlation coefficient, r.
b Calculate the equation of the line of regression.
c Use your line to predict the score of a student who studied for 2.5 hours.

3 Lucy worked in a cafe. She recorded the number of cups of coffee sold each day and
the weather. Her results for a two-week period are shown below.
Outside temperature, x 19° 21° 18° 17° 22° 19° 20° 21° 24° 25° 23° 22° 23° 19°
Number of cups, y 210 195 225 250 182 205 198 201 154 139 155 185 172 197
a Find the correlation coefficient, r.
b Calculate the equation of the line of regression.

4 Johnny surveyed 14 students to find out the distance that they lived from school and
how long it took them to get to school. His results are in the table.
Distance, x km 2 0.5 3.2 1.6 0.2 2.8 1.2 4.8 0.7 3.6 8.2 4.7 1.5 1.9
Time, y minutes 24 10 20 21 6 28 12 35 8 35 50 48 9 19
a Find the correlation coefficient, r.
b Calculate the equation of the line of regression.

Yates’s continuity correction


If you are performing a 2 test on a 2 × 2 matrix then you need to apply
Yates’s continuity correction for the test to be valid.
fo is the observed frequency
➔ When you apply Yates’s continuity correction the formula for the 2 test is and
2
F 2 ¦ (| f o  f e |0.5) fe is the expected frequency
f e
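The corrected formula can be sketched in a few lines of Python. This is a minimal illustration, assuming the table is entered row by row as `[[a, b], [c, d]]`. The function name `yates_chi_squared` is hypothetical. The data is Kiki's table from the example below.

```python
def yates_chi_squared(observed):
    """Chi-squared test statistic with Yates's continuity correction.

    observed is a 2 x 2 table entered row by row, e.g. [[a, b], [c, d]].
    """
    if len(observed) != 2 or any(len(row) != 2 for row in observed):
        raise ValueError("this sketch only handles 2 x 2 tables")
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            f_o = observed[i][j]                               # observed frequency
            f_e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            statistic += (abs(f_o - f_e) - 0.5) ** 2 / f_e
    return statistic


# Rows: male, female; columns: breakfast, no breakfast
kiki = [[75, 21], [49, 35]]
print(round(yates_chi_squared(kiki), 2))  # 7.29
```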

EXAMPLE 3

For her Maths Studies Project Kiki is investigating the effects of eating breakfast.
She wants to test whether eating breakfast is independent of gender.
Her data is summarized in the table.
Breakfast No breakfast Totals
Male 75 21 96
Female 49 35 84
Totals 124 56 180
Her hypotheses are
H0: Eating breakfast is independent of gender.
H1: Eating breakfast is not independent of gender.
Perform a 2 test at the 5% significance level to test these hypotheses.


Answer
Expected values
Use the totals of the rows and columns to work out the expected values.
Male, breakfast: $\frac{96}{180}\times\frac{124}{180}\times 180 = 66.13$
Male, no breakfast: $\frac{96}{180}\times\frac{56}{180}\times 180 = 29.87$
Female, breakfast: $\frac{84}{180}\times\frac{124}{180}\times 180 = 57.87$
Female, no breakfast: $\frac{84}{180}\times\frac{56}{180}\times 180 = 26.13$
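The form $\frac{96}{180}\times\frac{124}{180}\times 180$ simplifies to row total × column total ÷ grand total. Below is a minimal Python sketch of that rule and of the degrees-of-freedom formula. It assumes the observed table is entered as a list of rows, and the helper names are hypothetical. `fractions.Fraction` keeps the values exact, which shows that 66.13 is really the recurring decimal 66.1333. Keep the unrounded values when you calculate χ².

```python
from fractions import Fraction


def expected_frequencies(observed):
    """Expected frequency of each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[Fraction(r * c, grand_total) for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """(number of rows - 1) x (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Kiki's data: rows are male, female; columns are breakfast, no breakfast
kiki = [[75, 21], [49, 35]]
for row in expected_frequencies(kiki):
    print([round(float(f_e), 2) for f_e in row])
print(degrees_of_freedom(kiki))
# [66.13, 29.87]
# [57.87, 26.13]
# 1
```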
Degrees of freedom = (2 – 1)(2 – 1) = 1
This is a 2 × 2 matrix, so use Yates’s continuity correction to find the χ² test statistic.
$$\chi^2 = \frac{(|75 - 66.133| - 0.5)^2}{66.133} + \frac{(|21 - 29.867| - 0.5)^2}{29.867} + \frac{(|49 - 57.867| - 0.5)^2}{57.867} + \frac{(|35 - 26.133| - 0.5)^2}{26.133}$$
$$= \frac{8.367^2}{66.133} + \frac{8.367^2}{29.867} + \frac{8.367^2}{57.867} + \frac{8.367^2}{26.133}$$
$$= 7.29$$
Looking at the table of critical values for the 2 distribution, the critical
value at the 5% significance level for 1 degree of freedom is 3.841.
7.29 > 3.841 therefore we reject the null hypothesis.
Eating breakfast is not independent of gender.
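The value 3.841 in the table is the point at which a χ² distribution with 1 degree of freedom leaves 5% in the upper tail. The Python sketch below checks this with the standard-library function `math.erfc`. It uses the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable. This shortcut only applies to 1 degree of freedom, and in the project you would still quote the table value. The helper name is hypothetical.

```python
import math


def chi2_df1_p_value(statistic):
    """P(X > statistic) for a chi-squared variable X with 1 degree of freedom.

    With 1 degree of freedom X = Z**2 for a standard normal Z, so
    P(X > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


print(round(chi2_df1_p_value(3.841), 3))  # 0.05: the 5% critical value
print(round(chi2_df1_p_value(7.29), 4))   # about 0.007, well below 0.05
```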

Exercise 2
1 Max collected information on the number of students who took Art or ICT. He decided to perform a χ² test at the 5% significance level to see if subject choice was independent of gender. Here is his data:

Art ICT Total


Male 5 11 16
Female 8 6 14
Total 13 17 30

a Write down the null and alternative hypotheses.


b Write down the number of degrees of freedom.
c Calculate the 2 test statistic.
d Write down, with a reason, if you accept or reject the null hypothesis.
2 Belinda wanted to find out whether the average grade that a Grade 12 student achieved
was related to the number of hours that they spent on social networks. Her data is given
in the table. She decides to perform a 2 test at the 5% significance level to test her data.

< 5 hours ≥ 5 hours Total


< 4 (out of 7) 6 10 16
≥ 4 (out of 7) 11 9 20
Total 17 19 36

a Write down the null and alternative hypotheses.


b Write down the degrees of freedom.
c Calculate the 2 test statistic.
d Write down, with a reason, if you accept or reject the null hypothesis.


3 Kohei conducted an investigation to find out if there was a connection between eating carrots and eyesight. His data is given in the table. He decided to perform a χ² test at the 5% significance level to test his data.
Wears glasses/contacts Does not wear glasses/contacts Total
Eats carrots 13 18 31
Does not eat carrots 7 9 16
Total 20 27 47

a Write down the null and alternative hypotheses.


b Write down the degrees of freedom.
c Calculate the 2 test statistic.
d Write down, with a reason, if you accept or reject the null hypothesis.
4 Tamara is conducting a survey to find out if the eye colours of children
are the same as their mothers’. She collects this data.

Blue/green (children) Brown (children) Total
Blue/green (mother) 23 6 29
Brown (mother) 7 21 28
Total 30 27 57

She decides to perform a 2 test at the 5% significance level to test her data.
a Write down the null and alternative hypotheses.
b Write down the degrees of freedom.
c Calculate the 2 test statistic.
d Write down, with a reason, if you accept or reject the null hypothesis.


Chapter 5 extension worked solutions


Exercise 1
1 a x y xy
145 44 6380
152 50 7600
149 48 7152
156 54 8424
162 58 9396
148 47 6956
155 53 8215
163 59 9617
151 50 7550
168 63 10584
157 55 8635
154 52 8008
Total 1860 633 98517

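$$s_{xy} = \frac{\sum xy}{n} - \frac{\sum x}{n}\times\frac{\sum y}{n} = \frac{98517}{12} - \frac{1860}{12}\times\frac{633}{12} = 33.5$$
Using the GDC to calculate the standard deviations for x and y, we get $s_x = 6.442$ and $s_y = 5.214$, so
$$r = \frac{s_{xy}}{s_x \times s_y} = \frac{33.5}{6.442 \times 5.214} = 0.997$$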

This is a strong, positive correlation.


b The equation of the regression line is:
$$\left(y - \frac{633}{12}\right) = \frac{33.5}{6.442^2}\left(x - \frac{1860}{12}\right)$$
$$y = 0.807x - 72.4$$
c The weight of a gymnast 160 cm tall is
y = 0.807 × 160 − 72.4 = 56.7 kg
2 a x y xy
1 51 51
1.5 48 72
3 73 219
4.5 76 342
2 55 110
0.5 45 22.5
2 61 122
5 83 415
4.5 81 364.5
3.5 72 252
Total 27.5 645 1970

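$$s_{xy} = \frac{\sum xy}{n} - \frac{\sum x}{n}\times\frac{\sum y}{n} = \frac{1970}{10} - \frac{27.5}{10}\times\frac{645}{10} = 19.625$$
Using the GDC to calculate the standard deviations for x and y, we get $s_x = 1.504$ and $s_y = 13.463$, so
$$r = \frac{s_{xy}}{s_x \times s_y} = \frac{19.625}{1.504 \times 13.463} = 0.969$$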

This is a strong, positive correlation.


b The equation of the regression line is:
$$\left(y - \frac{645}{10}\right) = \frac{19.625}{1.504^2}\left(x - \frac{27.5}{10}\right)$$
$$y = 8.67x + 40.6$$
c The score of a student who studied for 2.5 hours is
y = 8.67 × 2.5 + 40.6 = 62.3, so a score of about 62%.
3 a x y xy
19 210 3990
21 195 4095
18 225 4050
17 250 4250
22 182 4004
19 205 3895
20 198 3960
21 201 4221
24 154 3696
25 139 3475
23 155 3565
22 185 4070
23 172 3956
19 197 3743
Total 293 2668 54970

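$$s_{xy} = \frac{\sum xy}{n} - \frac{\sum x}{n}\times\frac{\sum y}{n} = \frac{54970}{14} - \frac{293}{14}\times\frac{2668}{14} = -61.959$$
Using the GDC to calculate the standard deviations for x and y, we get $s_x = 2.2824$ and $s_y = 28.334$, so
$$r = \frac{s_{xy}}{s_x \times s_y} = \frac{-61.959}{2.2824 \times 28.334} = -0.958$$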

This is a strong, negative correlation.


b The equation of the regression line is:
$$\left(y - \frac{2668}{14}\right) = \frac{-61.959}{2.2824^2}\left(x - \frac{293}{14}\right)$$
$$y = -11.9x + 439.5$$
4 a x y xy
2 24 48
0.5 10 5
3.2 20 64
1.6 21 33.6
0.2 6 1.2
2.8 28 78.4
1.2 12 14.4
4.8 35 168
0.7 8 5.6
3.6 35 126
8.2 50 410
4.7 48 225.6
1.5 9 13.5
1.9 19 36.1
Total 36.9 325 1229.4

$$s_{xy} = \frac{\sum xy}{n} - \frac{\sum x}{n}\times\frac{\sum y}{n} = \frac{1229.4}{14} - \frac{36.9}{14}\times\frac{325}{14} = 26.628$$


Using the GDC to calculate the standard deviations for x and y, we get $s_x = 2.0838$ and $s_y = 13.878$, so
$$r = \frac{s_{xy}}{s_x \times s_y} = \frac{26.628}{2.0838 \times 13.878} = 0.921$$
This is a strong, positive correlation.
b The equation of the regression line is:
$$\left(y - \frac{325}{14}\right) = \frac{26.628}{2.0838^2}\left(x - \frac{36.9}{14}\right)$$
$$y = 6.13x + 7.05$$
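To check all four Exercise 1 answers in one go, the Python sketch below applies the same divide-by-n formulas to each data table. It returns r, the gradient and the intercept of the regression line of y on x. The helper name `correlation_and_line` is hypothetical. The data is copied from the question tables.

```python
import math


def correlation_and_line(xs, ys):
    """Return (r, gradient, intercept) using the divide-by-n formulas in this worksheet."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2  # (s_x)^2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2  # (s_y)^2
    r = s_xy / math.sqrt(var_x * var_y)
    gradient = s_xy / var_x
    intercept = mean_y - gradient * mean_x
    return r, gradient, intercept


# (x values, y values) for each question, copied from the Exercise 1 tables
exercise_1 = {
    1: ([145, 152, 149, 156, 162, 148, 155, 163, 151, 168, 157, 154],
        [44, 50, 48, 54, 58, 47, 53, 59, 50, 63, 55, 52]),
    2: ([1, 1.5, 3, 4.5, 2, 0.5, 2, 5, 4.5, 3.5],
        [51, 48, 73, 76, 55, 45, 61, 83, 81, 72]),
    3: ([19, 21, 18, 17, 22, 19, 20, 21, 24, 25, 23, 22, 23, 19],
        [210, 195, 225, 250, 182, 205, 198, 201, 154, 139, 155, 185, 172, 197]),
    4: ([2, 0.5, 3.2, 1.6, 0.2, 2.8, 1.2, 4.8, 0.7, 3.6, 8.2, 4.7, 1.5, 1.9],
        [24, 10, 20, 21, 6, 28, 12, 35, 8, 35, 50, 48, 9, 19]),
}

for question, (xs, ys) in exercise_1.items():
    r, m, c = correlation_and_line(xs, ys)
    print(f"Question {question}: r = {r:.3f}, gradient = {m:.4g}, intercept = {c:.4g}")
```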

Exercise 2
1 a H0: Choice of subject is independent of gender.
H1: Choice of subject is not independent of gender.
b dof = (2 − 1)(2 − 1) = 1
c Expected values
Male, Art: $\frac{16}{30}\times\frac{13}{30}\times 30 = 6.93$
Male, ICT: $\frac{16}{30}\times\frac{17}{30}\times 30 = 9.07$
Female, Art: $\frac{14}{30}\times\frac{13}{30}\times 30 = 6.07$
Female, ICT: $\frac{14}{30}\times\frac{17}{30}\times 30 = 7.93$

$$\chi^2 = \frac{(|5 - 6.933| - 0.5)^2}{6.933} + \frac{(|11 - 9.067| - 0.5)^2}{9.067} + \frac{(|8 - 6.067| - 0.5)^2}{6.067} + \frac{(|6 - 7.933| - 0.5)^2}{7.933}$$
$$= \frac{1.433^2}{6.933} + \frac{1.433^2}{9.067} + \frac{1.433^2}{6.067} + \frac{1.433^2}{7.933} = 1.12$$

d In the table of critical values for the χ² distribution, the critical value at the 5% significance level for 1 degree of freedom is 3.841.
1.12 < 3.841 therefore we accept the null hypothesis.
Choice of subject is independent of gender.

2 a H0: Grade is independent of the number of hours spent on social networks.


H1: Grade is not independent of the number of hours spent on social networks.
b dof = 1
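c Expected values
Grade < 4, < 5 hours: $\frac{16}{36}\times\frac{17}{36}\times 36 = 7.56$
Grade < 4, ≥ 5 hours: $\frac{16}{36}\times\frac{19}{36}\times 36 = 8.44$
Grade ≥ 4, < 5 hours: $\frac{20}{36}\times\frac{17}{36}\times 36 = 9.44$
Grade ≥ 4, ≥ 5 hours: $\frac{20}{36}\times\frac{19}{36}\times 36 = 10.56$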

$$\chi^2 = \frac{(|6 - 7.556| - 0.5)^2}{7.556} + \frac{(|10 - 8.444| - 0.5)^2}{8.444} + \frac{(|11 - 9.444| - 0.5)^2}{9.444} + \frac{(|9 - 10.556| - 0.5)^2}{10.556}$$
$$= \frac{1.056^2}{7.556} + \frac{1.056^2}{8.444} + \frac{1.056^2}{9.444} + \frac{1.056^2}{10.556} = 0.503$$
d In the table of critical values for the χ² distribution, the critical value at the 5% significance level for 1 degree of freedom is 3.841.
0.503 < 3.841, therefore we accept the null hypothesis.
Grade is independent of the number of hours spent on social networks.


3 a H0: eyesight is independent of eating carrots.


H1: eyesight is not independent of eating carrots.
b dof = 1
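c Expected values
Eats carrots, wears glasses/contacts: $\frac{31}{47}\times\frac{20}{47}\times 47 = 13.19$
Eats carrots, does not wear glasses/contacts: $\frac{31}{47}\times\frac{27}{47}\times 47 = 17.81$
Does not eat carrots, wears glasses/contacts: $\frac{16}{47}\times\frac{20}{47}\times 47 = 6.81$
Does not eat carrots, does not wear glasses/contacts: $\frac{16}{47}\times\frac{27}{47}\times 47 = 9.19$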
$$\chi^2 = \frac{(|13 - 13.1915| - 0.5)^2}{13.1915} + \frac{(|18 - 17.8085| - 0.5)^2}{17.8085} + \frac{(|7 - 6.8085| - 0.5)^2}{6.8085} + \frac{(|9 - 9.1915| - 0.5)^2}{9.1915}$$
$$= \frac{(-0.3085)^2}{13.1915} + \frac{(-0.3085)^2}{17.8085} + \frac{(-0.3085)^2}{6.8085} + \frac{(-0.3085)^2}{9.1915} = 0.0369$$
d In the table of critical values for the χ² distribution, the critical value at the 5% significance level for 1 degree of freedom is 3.841.
0.0369 < 3.841, therefore we accept the null hypothesis.
Eyesight is independent of eating carrots.

4 a H0: colour of eyes of children is independent of colour of eyes of mother.


H1: colour of eyes of children is not independent of colour of eyes of mother.
b dof = 1
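c Expected values
Blue/green (mother), blue/green (children): $\frac{29}{57}\times\frac{30}{57}\times 57 = 15.26$
Blue/green (mother), brown (children): $\frac{29}{57}\times\frac{27}{57}\times 57 = 13.74$
Brown (mother), blue/green (children): $\frac{28}{57}\times\frac{30}{57}\times 57 = 14.74$
Brown (mother), brown (children): $\frac{28}{57}\times\frac{27}{57}\times 57 = 13.26$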
$$\chi^2 = \frac{(|23 - 15.263| - 0.5)^2}{15.263} + \frac{(|6 - 13.737| - 0.5)^2}{13.737} + \frac{(|7 - 14.737| - 0.5)^2}{14.737} + \frac{(|21 - 13.263| - 0.5)^2}{13.263}$$
$$= \frac{7.237^2}{15.263} + \frac{7.237^2}{13.737} + \frac{7.237^2}{14.737} + \frac{7.237^2}{13.263} = 14.7$$

d In the table of critical values for the χ² distribution, the critical value at the 5% significance level for 1 degree of freedom is 3.841.
14.7 > 3.841, therefore we reject the null hypothesis.
Colour of eyes of children is not independent of colour of eyes of mother.
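Rounding the expected values too early can change the last figure of χ², so an independent check is useful. In a 2 × 2 table with entries a, b, c, d and grand total N, every cell has the same |f_o − f_e| = |ad − bc|/N. The Yates-corrected statistic therefore simplifies to $N(|ad - bc| - N/2)^2 / (R_1 R_2 C_1 C_2)$, where $R_1, R_2$ are the row totals and $C_1, C_2$ are the column totals. The Python sketch below uses this shortcut to recompute all four Exercise 2 statistics. The helper name is hypothetical, and `fractions.Fraction` keeps the arithmetic exact. This is only a checking aid: in the project you still need to show the full working.

```python
from fractions import Fraction


def yates_shortcut(table):
    """Yates-corrected chi-squared for [[a, b], [c, d]] via the 2 x 2 shortcut

    N * (|ad - bc| - N/2)**2 / (row1 * row2 * col1 * col2),
    which equals the sum of (|f_o - f_e| - 0.5)**2 / f_e over the four cells.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - Fraction(n, 2)) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator  # exact Fraction, no rounding along the way


CRITICAL_VALUE = 3.841  # 5% significance level, 1 degree of freedom (from the table)

# Observed tables from Exercise 2, entered row by row as in the questions
exercise_2 = {
    1: [[5, 11], [8, 6]],
    2: [[6, 10], [11, 9]],
    3: [[13, 18], [7, 9]],
    4: [[23, 6], [7, 21]],
}

for question, table in exercise_2.items():
    statistic = yates_shortcut(table)
    decision = "reject H0" if statistic > CRITICAL_VALUE else "accept H0"
    print(f"Question {question}: chi-squared = {float(statistic):.3g}, {decision}")
```

In question 3, |f_o − f_e| ≈ 0.19 is less than 0.5, so |ad − bc| − N/2 is negative. Squaring still gives a positive contribution, which matches the worked solution.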
