
Revision Pearson, Spearman and Chi Square test [104 marks]


As part of a study into healthy lifestyles, Jing visited Surrey Hills University. Jing
recorded a person’s position in the university and how frequently they ate a salad.
Results are shown in the table.

Jing conducted a χ² test for independence at a 5% level of significance.

1a. State the null hypothesis. [1 mark]

Markscheme
number of salad meals per week is independent of a person’s position in the
university A1
Note: Accept “not associated” instead of independent.
[1 mark]
1b. Calculate the p-value for this test. [2 marks]

Markscheme
0.0201 (0.0201118…) A2
[2 marks]

1c. State, giving a reason, whether the null hypothesis should be accepted. [2 marks]

Markscheme
0.0201 < 0.05 R1
the null hypothesis is rejected A1

Note: Award (R1) for a correct comparison of their p-value to the test
level, award (A1) for the correct interpretation from that comparison.
Do not award (R0)(A1).
[2 marks]
The Malvern Aquatic Center hosted a 3 metre spring board diving event. The
judges, Stan and Minsun awarded 8 competitors a score out of 10. The raw data is
collated in the following table.

2a. Write down the value of the Pearson’s product–moment correlation [2 marks]
coefficient, r.

Markscheme
0.909 (0.909181…) A2
[2 marks]

2b. Using the value of r, interpret the relationship between Stan’s score [2 marks]
and Minsun’s score.
Markscheme
(very) strong and positive A1A1
Note: Award A1 for (very) strong A1 for positive.
[2 marks]

2c. Write down the equation of the regression line y on x. [2 marks]

Markscheme
y = 1.14x + 0.578 (y = 1.14033 … x + 0.578183 …) A1A1
Note: Award A1 for 1.14x, A1 for 0.578. Award a maximum of A1A0 if the
answer is not an equation in the form y = mx + c.
[2 marks]
2d. Use your regression equation from part (c) to estimate Minsun’s score [2 marks]
when Stan awards a perfect 10.

Markscheme
1.14 × 10 + 0.578 M1
12.0 (11.9814…) A1
[2 marks]
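
The substitution in 2d, and the reason 2e calls the estimate unreliable, can be checked numerically. The following is a minimal Python sketch using the unrounded coefficients quoted in the markscheme for 2c; the function name and the MAX_SCORE constant are illustrative names, not part of the markscheme.

```python
# Regression line y on x from part 2c (unrounded coefficients from the markscheme).
SLOPE = 1.14033
INTERCEPT = 0.578183
MAX_SCORE = 10  # judges award a score out of 10


def predict_minsun(stan_score: float) -> float:
    """Estimate Minsun's score from Stan's score using y = mx + c."""
    return SLOPE * stan_score + INTERCEPT


estimate = predict_minsun(10)
print(round(estimate, 1))    # 12.0
print(estimate > MAX_SCORE)  # True: the estimate is not a possible score
```

An estimate above the maximum possible score is one sign that the line is being used outside the range where it makes sense.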

2e. State whether this estimate is reliable. Justify your answer. [2 marks]
Markscheme
no the estimate is not reliable A1
outside the known data range R1
OR
a score greater than 10 is not possible R1
Note: Do not award A1R0.
[2 marks]
The Commissioner for the event would like to find the Spearman’s rank correlation
coefficient.

2f. Copy and complete the information in the following table. [2 marks]
Markscheme
A1A1

Note: Award A1 for correct ranks for Stan. Award A1 for correct ranks for
Minsun.
[2 marks]

2g. Find the value of the Spearman’s rank correlation coefficient, rₛ. [2 marks]

Markscheme
0.933 (0.932673…) A2
[2 marks]
2h. Comment on the result obtained for rₛ. [2 marks]

Markscheme
Stan and Minsun strongly agree on the ranking of competitors. A1A1
Note: Award A1 for “strongly agree”, A1 for reference to a rank order.
[2 marks]

2i. The Commissioner believes Minsun’s score for competitor G is too high [1 mark]
and so decreases the score from 9.5 to 9.1.
Explain why the value of the Spearman’s rank correlation coefficient rₛ does not
change.

Markscheme
decreasing the score to 9.1, does not change the rank of competitor G A1
[1 mark]
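
The argument in 2i, that rₛ depends only on the ranks, can be illustrated with a short Python sketch. The scores below are hypothetical (the exam's table is not reproduced here), and the helper names ranks, pearson and spearman are illustrative. Spearman's coefficient is computed as Pearson's coefficient applied to the ranks, with tied values given the average of their ranks.

```python
def ranks(values):
    """Rank from highest (rank 1) to lowest, averaging the ranks of tied values."""
    ordered = sorted(values, reverse=True)
    result = []
    for v in values:
        first = ordered.index(v) + 1  # best rank held by this value
        count = ordered.count(v)      # number of entries tied at this value
        result.append(first + (count - 1) / 2)
    return result


def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical scores for five competitors (NOT the data from the exam table).
judge_a = [9.0, 8.5, 7.0, 6.5, 9.5]
judge_b = [8.0, 9.0, 7.5, 6.0, 9.5]

before = spearman(judge_a, judge_b)

# Lower the top score from 9.5 to 9.1: it is still the highest, so its rank is unchanged.
judge_b_adjusted = judge_b.copy()
judge_b_adjusted[4] = 9.1
after = spearman(judge_a, judge_b_adjusted)

print(before == after)  # True
```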

A calculator generates a random sequence of digits. A sample of 200 digits is
randomly selected from the first 100 000 digits of the sequence. The following
table gives the number of times each digit occurs in this sample.
It is claimed that all digits have the same probability of appearing in the
sequence.

3a. Test this claim at the 5% level of significance. [7 marks]


Markscheme
H0: The sequence contains equal numbers of each digit. (A1)
H1: The sequence does not contain equal numbers of each digit. (A1)
χ²calc = (9 + 1 + 25 + 1 + 25 + 49 + 1 + 9 + 4 + 16)/20 = 7 (M1)(A1)
The number of degrees of freedom is 9. (A1)
χ²(0.95; 9) = 16.919 (A1)

χ²calc < 16.919. Hence H0 is accepted. (A1)


[7 marks]
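
The arithmetic of the goodness-of-fit statistic in 3a can be reproduced with a few lines of Python. This sketch starts from the squared deviations (observed − expected)² listed in the markscheme, since the observed counts themselves are not reproduced here; the variable names are illustrative.

```python
# Squared deviations (observed - expected)^2 for the ten digits, as listed in the markscheme.
squared_deviations = [9, 1, 25, 1, 25, 49, 1, 9, 4, 16]

expected = 200 / 10  # 200 digits, 10 equally likely digits, so 20 of each
chi2_calc = sum(squared_deviations) / expected  # same expected value in every cell
degrees_of_freedom = len(squared_deviations) - 1
critical_value = 16.919  # 5% critical value for 9 degrees of freedom (from the markscheme)

print(chi2_calc)                   # 7.0
print(degrees_of_freedom)          # 9
print(chi2_calc < critical_value)  # True: H0 is not rejected
```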

3b. Explain what is meant by the 5% level of significance. [2 marks]

Markscheme
The probability of rejecting H0 when it is true (A1)
is 0.05. (A1)
Note: Award (A1)(A1) for “the probability of a type I error is 0.05.”
[2 marks]
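
The meaning of the 5% level in 3b can also be seen by simulation. The sketch below is a rough illustration rather than part of the markscheme: it assumes digits that really are uniformly distributed (so H0 is true), repeats the test from 3a many times, and counts how often H0 is wrongly rejected. The function names, number of trials and seed are arbitrary choices.

```python
import random


def chi2_uniform_digits(sample):
    """Chi-squared goodness-of-fit statistic against equally likely digits 0-9."""
    expected = len(sample) / 10
    counts = [sample.count(digit) for digit in range(10)]
    return sum((obs - expected) ** 2 / expected for obs in counts)


def type_one_error_rate(trials=2000, sample_size=200, critical=16.919, seed=1):
    """Proportion of samples, drawn with H0 true, for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.randrange(10) for _ in range(sample_size)]
        if chi2_uniform_digits(sample) > critical:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(rate)  # expected to be close to 0.05
```

The proportion should land near 0.05, which is the probability of a type I error that the significance level fixes.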
Charles wants to measure the strength of the relationship between the price of a
house and its distance from the city centre where he lives. He chooses houses of
a similar size and plots a graph of price, P (in thousands of dollars) against
distance from the city centre, d (km).

4a. Explain why it is not appropriate to use Pearson’s product moment [1 mark]
correlation coefficient to measure the strength of the relationship
between P and d.

Markscheme
the data is not linear R1
[1 mark]

4b. Explain why it is appropriate to use Spearman’s rank correlation [1 mark]
coefficient to measure the strength of the relationship between P and d.
Markscheme
the data is (monotonically) decreasing. R1
[1 mark]
The data from the graph is shown in the table.

4c. Calculate Spearman’s rank correlation coefficient for this data. [6 marks]
Markscheme
assign ranks M1
average equal prices M1

A1A1

rₛ = −0.991 (Note: condone rₛ = 0.991) A2


[6 marks]

4d. State what conclusion Charles can make from the answer in part (c). [1 mark]

Markscheme
There is a strong, negative relationship between the price of a house and its
distance from the city centre. R1
[1 mark]

On 90 journeys to his office, Isaac noted whether or not it rained. He also recorded
his journey time to the office, and classified each journey as short, medium or
long.
Of the 90 journeys to the office, there were 3 short journeys when it rained, 22
medium journeys when it rained, and 15 long journeys when it rained. There were
also 14 short journeys when it did not rain.
Isaac carried out a χ² test at the 5% level of significance on these data, looking at
the weather and the types of journeys.

5a. Write down H0 , the null hypothesis for this test. [1 mark]

Markscheme
* This question is from an exam for a previous syllabus, and may contain
minor differences in marking or structure.
type of journey and whether it rained are independent (A1) (C1)
Note: Accept “there is no association” or “not dependent”. Do not accept “not
related” or “not correlated”. Accept equivalent terms for ‘type of journey’.

[1 mark]
5b. Find the expected number of short trips when it rained. [3 marks]

Markscheme
17/90 × 40/90 × 90 OR (17 × 40)/90 (A1)(M1)

Note: Award (A1) for 17 or 40 seen. Award (M1) for 17/90 × 40/90 × 90 OR (17 × 40)/90 seen.

7.56 (7.55555…, 68/9) (A1) (C3)

[3 marks]
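
The expected frequency in 5b is (row total × column total) / grand total. A minimal Python sketch using only the counts stated in the question follows; the variable names are illustrative, and Fraction is used so that the exact value 68/9 is visible.

```python
from fractions import Fraction

# Counts stated in the question.
rained = {"short": 3, "medium": 22, "long": 15}
short_no_rain = 14
total_journeys = 90

row_total_rain = sum(rained.values())              # journeys when it rained: 40
col_total_short = rained["short"] + short_no_rain  # short journeys: 17

# Expected frequency under independence: row total x column total / grand total.
expected_short_rain = Fraction(row_total_rain * col_total_short, total_journeys)

print(expected_short_rain)                   # 68/9
print(round(float(expected_short_rain), 2))  # 7.56
```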

5c. The p-value for this test is 0.0206. [2 marks]
State the conclusion to Isaac’s test. Justify your reasoning.

Markscheme
reject (do not accept) H0 (A1)
OR
type of journey and whether it rained are not independent (A1)

Note: Follow through from part (a) for their phrasing of the null hypothesis.

0.0206 < 0.05 (R1) (C2)

Note: A comparison must be seen, either numerically or in words (e.g. p-value
< significance level). Do not award (R0)(A1).

[2 marks]
Lucy sells hot chocolate drinks at her snack bar and has noticed that she sells
more hot chocolates on cooler days. On six different days, she records the
maximum daily temperature, T , measured in degrees centigrade, and the number
of hot chocolates sold, H . The results are shown in the following table.

The relationship between H and T can be modelled by the regression line with
equation H = aT + b.

6a. Find the value of a and of b. [3 marks]

Markscheme
valid approach (M1)
eg correct value for a or b (or for r or r² = 0.962839 seen in (ii))
a = −9.84636, b = 221.592
a = −9.85, b = 222 A1A1 N3
[3 marks]
6b. Write down the correlation coefficient. [1 mark]

Markscheme
−0.981244
r = −0.981 A1 N1
[1 mark]

6c. Using the regression equation, estimate the number of hot chocolates [2 marks]
that Lucy will sell on a day when the maximum temperature is 12°C.

Markscheme
correct substitution into their equation (A1)
eg −9.85 × 12 + 222
103.435 (103.8 from 3sf)
103 (hot chocolates) A1 N2
[2 marks]
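
The two values quoted in 6c (103.435 and "103.8 from 3sf") differ only because of when the coefficients are rounded. The Python sketch below compares the two substitutions, using the coefficient values printed in the markscheme; the variable names are illustrative.

```python
# Coefficients of H = aT + b as quoted in the markscheme for 6a.
a_full, b_full = -9.84636, 221.592  # unrounded calculator values
a_3sf, b_3sf = -9.85, 222           # the same values rounded to 3 significant figures

T = 12  # maximum temperature in degrees centigrade

h_full = a_full * T + b_full  # substitution with unrounded coefficients
h_3sf = a_3sf * T + b_3sf     # substitution with 3 s.f. coefficients

print(round(h_full, 1))  # 103.4
print(round(h_3sf, 1))   # 103.8
```

Carrying the unrounded coefficients through the calculation avoids the drift introduced by rounding before substituting.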

Don took part in a project investigating wind speed, x km h⁻¹, and the time, y
minutes, to fully charge a solar powered robot.
The investigation was carried out six times. The results are recorded in the table.

7a. On graph paper, draw a scatter diagram to show the results of Don’s [4 marks]
investigation. Use a scale of 1 cm to represent 2 units on the x-axis, and
1 cm to represent 5 units on the y-axis.

Markscheme
* This question is from an exam for a previous syllabus, and may contain
minor differences in marking or structure.

(A4)

Note: Award (A1) for correct scales and labels.
Award (A3) for all six points correctly plotted.
Award (A2) for four or five points correctly plotted.
Award (A1) for two or three points correctly plotted.
Award at most (A0)(A3) if axes reversed.
If graph paper is not used, award at most (A1)(A0)(A0)(A0).

[4 marks]
7b. Calculate x̄, the mean wind speed. [1 mark]

Markscheme
19 (km h⁻¹) (A1)

[1 mark]

7c. Calculate ȳ, the mean time to fully charge the robot. [1 mark]

Markscheme
32 (minutes) (A1)

[1 mark]

M is the point with coordinates (x̄, ȳ).

7d. Plot and label the point M on your scatter diagram. [2 marks]
Markscheme
point in correct position, labelled M (A1)(ft)(A1)
Note: Award (A1)(ft) for point plotted in correct position, (A1) for point
labelled M. Follow through from their part (b).

[2 marks]

7e. Calculate r, Pearson’s product–moment correlation coefficient. [2 marks]

Markscheme
(r =) 0.944 (0.943733…) (G2)
Note: Award (G1) for 0.943 (incorrect rounding).
[2 marks]
7f. Describe the correlation between the wind speed and the time to fully [2 marks]
charge the robot.

Markscheme
(very) strong positive correlation (A1)(ft)(A1)(ft)

Note: Award (A1)(ft) for (very) strong. Award (A1)(ft) for positive. Follow
through from their part (d)(i). If there is no answer to part (d)(i), award at most
(A0)(A1) for a correct direction.

[2 marks]

7g. Write down the equation of the regression line y on x, in the form [2 marks]
y = mx + c.
Markscheme
y = 0.465x + 23.2 (y = 0.465020…x + 23.1646…) (A1)(A1)(G2)
Note: Award (A1) for 0.465x. Award (A1) for 23.2. If the answer is not an
equation, award at most (A1)(A0).

[2 marks]

7h. Draw this regression line on your scatter diagram. [2 marks]

Markscheme
regression line through their M (A1)(ft)
regression line through their (0, 23.2) (A1)(ft)

Note: Award a maximum of (A1)(A0) if the line is not straight/ruler not used.
Award (A0)(A0) if the points are connected.
Follow through from their point M in part (b) and their y-intercept in part (e)(i).
If M is not plotted or labelled, then follow through from part (b).

[2 marks]
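
The markscheme for 7h expects the line to pass through M and through the y-intercept. That a least-squares regression line of y on x passes through the mean point (x̄, ȳ) can be checked with the values from 7b, 7c and 7g. The following Python sketch does so; the variable names are illustrative.

```python
# Regression line y = mx + c from part 7g (unrounded values from the markscheme).
m, c = 0.465020, 23.1646

# Mean point M from parts 7b and 7c.
x_bar, y_bar = 19, 32

y_at_mean = m * x_bar + c  # value of the line at x = x_bar

print(abs(y_at_mean - y_bar) < 0.001)  # True: the line passes through M

# Two convenient points for drawing the line with a ruler.
points_to_plot = [(0, round(c, 1)), (x_bar, y_bar)]
print(points_to_plot)  # [(0, 23.2), (19, 32)]
```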

7i. Hence or otherwise estimate the charging time when the wind speed is [2 marks]
27 km h⁻¹.
Markscheme
(y =) 0.465020…(27) + 23.1646… (M1)

Note: Award (M1) for correct substitution into their regression equation.

35.7 (minutes) (35.7201…) (A1)(ft)(G2)

Note: Follow through from their equation in part (e)(i).

OR
an attempt to use their regression line to find the y value at x = 27

Note: Award (M1) for an indication of using their regression line. This must be
illustrated by vertical and horizontal lines or marks at the correct place(s) on
their scatter diagram.

35.7 (minutes) (A1)(ft)

Note: Follow through from part (e)(ii).

[2 marks]

7j. Don concluded from his investigation: “There is no causation between [1 mark]
wind speed and the time to fully charge the robot”.
In the context of the question, briefly explain the meaning of “no causation”.
Markscheme
wind speed does not cause a change in the time to charge (the robot) (A1)

Note: Award (A1) for a statement that communicates the meaning of a non-
causal relationship between the two variables.

[1 mark]

A scientist measures the concentration of dissolved oxygen, in milligrams per litre
(y), in a river. She takes 10 readings at different temperatures, measured in
degrees Celsius (x).
The results are shown in the table.

It is believed that the concentration of dissolved oxygen in the river varies linearly
with the temperature.

8a. For these data, find Pearson’s product-moment correlation coefficient, r. [2 marks]

Markscheme
−0.974 (−0.973745…) (A2)
Note: Award (A1) for an answer of 0.974 (minus sign omitted). Award (A1)
for an answer of −0.973 (incorrect rounding).
[2 marks]
8b. For these data, find the equation of the regression line y on x. [2 marks]

Markscheme
y = −0.365x + 17.9 (y = −0.365032…x + 17.9418…) (A1)(A1) (C4)
Note: Award (A1) for −0.365x, (A1) for 17.9. Award at most (A1)(A0) if not
an equation or if the values are reversed (eg y = 17.9x −0.365).
[2 marks]

8c. Using the equation of the regression line, estimate the concentration of [2 marks]
dissolved oxygen in the river when the temperature is 18 °C.

Markscheme
y = −0.365032… × 18 + 17.9418… (M1)
Note: Award (M1) for correctly substituting 18 into their part (a)(ii).
= 11.4 (11.3712…) (A1)(ft) (C2)
Note: Follow through from part (a)(ii).
[2 marks]
A survey was carried out to investigate the relationship between a person’s age in
years ( a ) and the number of hours they watch television per week (h ). The
scatter diagram represents the results of the survey.

The mean age of the people surveyed was 50.

For these results, the equation of the regression line h on a is h = 0.22a + 15.

9a. Find the mean number of hours that the people surveyed watch [2 marks]
television per week.
Markscheme
* This question is from an exam for a previous syllabus, and may contain
minor differences in marking or structure.
0.22(50) + 15 (M1)

Note: Award (M1) for correct substitution of 50 into equation of the
regression line.

(=) 26 (A1) (C2)


OR
655/25 (M1)

Note: Award (M1) for correctly summing the h values of the points, and
dividing by 25.

(=) 26.2 (A1) (C2)


[2 marks]

9b. Draw the regression line on the scatter diagram. [2 marks]

Markscheme
line through (50, 26 ± 1) and (0, 15) (A1)(ft)(A1) (C2)

Note: Award (A1)(ft) for a straight line through (50, their h̄ ), and (A1) for the
line intercepting the y-axis at (0, 15); this may need to be extrapolated.
Follow through from part (a). Award at most (A0)(A1) if the line is not drawn
with a ruler.

[2 marks]
9c. By placing a tick (✔) in the correct box, determine which of the following [1 mark]
statements is true:

Markscheme
(A1) (C1)

Note: Award (A0) if more than one tick (✔) is seen.

[1 mark]

9d. Diogo is 18 years old. Give a reason why the regression line should not be [1 mark]
used to estimate the number of hours Diogo watches television per week.

Markscheme
18 is less than the lowest age in the survey OR extrapolation. (A1) (C1)

Note: Accept equivalent statements .

[1 mark]
A group of 800 students answered 40 questions on a category of their choice out
of History, Science and Literature.
For each student the category and the number of correct answers, N , was
recorded. The results obtained are represented in the following table.

10a. State whether N is a discrete or a continuous variable. [1 mark]

Markscheme
* This question is from an exam for a previous syllabus, and may contain
minor differences in marking or structure.
discrete (A1)
[1 mark]

10b. Write down, for N , the modal class; [1 mark]

Markscheme
11 ⩽ N ⩽ 20 (A1)
[1 mark]
10c. Write down, for N , the mid-interval value of the modal class. [1 mark]

Markscheme
15.5 (A1)(ft)

Note: Follow through from part (b)(i).

[1 mark]

10d. Use your graphic display calculator to estimate the mean of N ; [2 marks]

Markscheme
21.2 (21.2125) (G2)
[2 marks]
10e. Use your graphic display calculator to estimate the standard deviation of [1 mark]
N.

Markscheme
9.60 (9.60428 …) (G1)
[1 mark]

A χ² test at the 5% significance level is carried out on the results. The critical
value for this test is 12.592.

10f. Find the expected frequency of students choosing the Science category [2 marks]
and obtaining 31 to 40 correct answers.
Markscheme
260/800 × 157/800 × 800 OR (260 × 157)/800 (M1)

Note: Award (M1) for correct substitution into expected frequency
formula.

= 51.0 (51.025) (A1)(G2)


[2 marks]

10g. Write down the null hypothesis for this test; [1 mark]

Markscheme
choice of category and number of correct answers are independent (A1)

Notes: Accept “no association” between (choice of) category and number
of correct answers. Do not accept “not related” or “not correlated” or
“influenced”.

[1 mark]

10h. Write down the number of degrees of freedom. [1 mark]


Markscheme
6 (A1)
[1 mark]

10i. Write down the p-value for the test; [1 mark]

Markscheme
0.0644 (0.0644123 …) (G1)
[1 mark]

10j. Write down the χ² statistic. [2 marks]

Markscheme
11.9 (11.8924 …) (G2)
[2 marks]
10k. State the result of the test. Give a reason for your answer. [2 marks]

Markscheme
the null hypothesis is not rejected (the null hypothesis is accepted) (A1)(ft)
OR
(choice of) category and number of correct answers are independent
(A1)(ft)
as 11.9 < 12.592 OR 0.0644 > 0.05 (R1)

Notes: Award (R1) for a correct comparison of either their χ² statistic to
the χ² critical value or their p-value to the significance level. Award (A1)(ft)
from that comparison.
Follow through from part (f). Do not award (A1)(ft)(R0).

[2 marks]
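
The numerical steps in 10f, 10h and 10k can be gathered into one short Python sketch. The table is not reproduced here, so the 3 × 4 layout (three categories by four classes of N) is an assumption chosen to be consistent with the 6 degrees of freedom in the markscheme. All other numbers are taken from the markscheme, and the function and variable names are illustrative.

```python
def degrees_of_freedom(rows: int, cols: int) -> int:
    """Degrees of freedom for a chi-squared test of independence."""
    return (rows - 1) * (cols - 1)


# ASSUMPTION: 3 categories (History, Science, Literature) by 4 classes of N.
df = degrees_of_freedom(3, 4)

# Expected frequency from 10f: row total x column total / grand total.
expected_science_31_40 = 260 * 157 / 800

# Values from the markscheme for 10i, 10j and the critical value given in the question.
chi2_stat, critical_value = 11.8924, 12.592
p_value, alpha = 0.0644123, 0.05

reject_by_statistic = chi2_stat > critical_value
reject_by_p_value = p_value < alpha

print(df)                      # 6
print(expected_science_31_40)  # 51.025
print(reject_by_statistic)     # False
print(reject_by_p_value)       # False
```

Both comparisons lead to the same decision here: the null hypothesis is not rejected.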
Jim heated a liquid until it boiled. He measured the temperature of the liquid as it
cooled. The following table shows its temperature, d degrees Celsius, t minutes
after it boiled.

11a. Write down the independent variable. [1 mark]

Markscheme
t A1 N1
[1 mark]

11b. Write down the boiling temperature of the liquid. [1 mark]

Markscheme
105 A1 N1
[1 mark]

Jim believes that the relationship between d and t can be modelled by a linear
regression equation.

11c. Jim describes the correlation as very strong. Circle the value below [2 marks]
which best represents the correlation coefficient.
0.992 0.251 0 − 0.251 − 0.992
Markscheme
−0.992 A2 N2
[2 marks]

11d. Jim’s model is d = −2.24t + 105, for 0 ⩽ t ⩽ 20. Use his model to [2 marks]
predict the decrease in temperature for any 2 minute interval.

Markscheme
valid approach (M1)
eg dd/dt = −2.24; 2 × 2.24, 2 × −2.24, d(2) = −2 × 2.24 × 105,
finding d(t₂) − d(t₁) where t₂ = t₁ + 2
4.48 (degrees) A1 N2

Notes: Award no marks for answers that directly use the table to find the
decrease in temperature for 2 minutes eg (105 − 98.4)/2 = 3.3.

[2 marks]
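
Because Jim's model in 11d is linear, the decrease over a 2 minute interval is the same wherever the interval starts. The Python sketch below checks this for every whole-minute start time that keeps the interval inside 0 ⩽ t ⩽ 20; the function name and the choice of whole-minute start times are illustrative.

```python
def d(t: float) -> float:
    """Jim's model for the temperature (degrees Celsius) t minutes after boiling."""
    return -2.24 * t + 105


# Decrease over [t1, t1 + 2] for each whole-minute start time with t1 + 2 <= 20.
drops = [d(t1) - d(t1 + 2) for t1 in range(0, 19)]

print(round(drops[0], 2))                         # 4.48
print(all(abs(x - 4.48) < 1e-9 for x in drops))   # True: the same for every interval
```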

© International Baccalaureate Organization 2021


International Baccalaureate® - Baccalauréat International® - Bachillerato Internacional®

Printed for BRITISH SCHS MONTEVIDEO
