
Revision sheet 2 - Chapter 19 [158 marks]

1. [Maximum mark: 24] 21M.3.AHL.TZ2.1


Juliet is a sociologist who wants to investigate if income affects happiness
amongst doctors. This question asks you to review Juliet’s methods and
conclusions.

Juliet obtained a list of email addresses of doctors who work in her city. She
contacted them and asked them to fill in an anonymous questionnaire.
Participants were asked to state their annual income and to respond to a set of
questions. The responses were used to determine a happiness score out of 100. Of the
415 doctors on the list, 11 replied.

Juliet’s results are summarized in the following table.

For the remaining ten responses in the table, Juliet calculates the mean
happiness score to be 52.5.

Juliet decides to carry out a hypothesis test on the correlation coefficient to
investigate whether increased annual income is associated with greater
happiness.
Juliet wants to create a model to predict how changing annual income might
affect happiness scores. To do this, she assumes that annual income in dollars, X,
is the independent variable and the happiness score, Y , is the dependent
variable.

She first considers a linear model of the form

Y = aX + b .

Juliet then considers a quadratic model of the form

Y = cX² + dX + e.

After presenting the results of her investigation, a colleague questions whether
Juliet’s sample is representative of all doctors in the city.

A report states that the mean annual income of doctors in the city is $80 000.
Juliet decides to carry out a test to determine whether her sample could
realistically be taken from a population with a mean of $80 000.

(a.i) Describe one way in which Juliet could improve the reliability
of her investigation. [1]

Markscheme

Any one from: R1

increase sample size / increase response rate / repeat process
check whether sample is representative
test-retest participants or do a parallel test
use a stratified sample
use a random sample

Note: Do not condone:
Ask different types of doctor
Ask for proof of income
Ask for proof of being a doctor
Remove anonymity
Remove response K.

[1 mark]

(a.ii) Describe one criticism that can be made about the validity of
Juliet’s investigation. [1]

Markscheme

Any one from: R1

non-random sampling means a subset of population might be responding
self-reported happiness is not the same as happiness
happiness is not a constant / cannot be quantified / is difficult to measure
income might include external sources
Juliet is only sampling doctors in her city
correlation does not imply causation
sample might be biased

Note: Do not condone the following common but vague responses unless
they make a clear link to validity:
Sample size is too small
Result is not generalizable
There may be other variables Juliet is ignoring
Sample might not be representative

[1 mark]
(b) Juliet classifies response K as an outlier and removes it from the
data. Suggest one possible justification for her decision to
remove it. [1]

Markscheme

because the income is very different / implausible / clearly contrived R1

Note: Answers must explicitly reference "income" to get credit.

[1 mark]

(c.i) Calculate the mean annual income for these remaining
responses. [2]

Markscheme

($) 90 200 (M1)A1

[2 marks]

(c.ii) Determine the value of r, Pearson’s product-moment
correlation coefficient, for these remaining responses. [2]

Markscheme

r = 0.558 (0.557723…) A2

[2 marks]
(d.i) State why the hypothesis test should be one-tailed. [1]

Markscheme

EITHER
only looking for change in one direction R1

OR
only looking for greater happiness with greater income R1

OR
only looking for evidence of positive correlation R1

[1 mark]

(d.ii) State the null and alternative hypotheses for this test. [2]

Markscheme

H0 : ρ = 0; H1 : ρ > 0 A1A1

Note: Award A1 for ρ seen (do not accept r), A1 for both correct hypotheses,
using their ρ or r. Accept an equivalent statement in words, however
reference to “correlation for the population” or “association for the
population” must be explicit for the first A1 to be awarded.

Watch out for a null hypothesis in words similar to “Annual income is not
associated with greater happiness”. This is effectively saying ρ ≤ 0 and
should not be condoned.

[2 marks]

(d.iii) The critical value for this test, at the 5% significance level, is
0.549. Juliet assumes that the population is bivariate normal.
Determine whether there is significant evidence of a positive
correlation between annual income and happiness. Justify your
answer. [2]

Markscheme

METHOD 1 – using critical value of r

0.558 > 0.549 (0.557723… > 0.549) R1

(therefore significant evidence of) a positive correlation A1

Note: Do not award R0A1.

METHOD 2 – using p-value

0.0469 < 0.05 (0.0469463… < 0.05) A1

Note: Follow through from their r-value from part (c)(ii).

(therefore significant evidence of) a positive correlation A1

Note: Do not award A0A1.

[2 marks]
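
The critical value 0.549 can be cross-checked with the t distribution. Assuming the population is bivariate normal and ρ = 0, the statistic t = r√(n − 2)/√(1 − r²) follows a t distribution with n − 2 degrees of freedom, so comparing r with a critical r is equivalent to comparing t with a critical t. The minimal sketch below assumes n = 10 and takes t = 1.860 as the one-tailed 5% table value for 8 degrees of freedom; that value is an assumption taken from standard tables, not given in the question.

```python
import math

def r_to_t(r: float, n: int) -> float:
    """Convert a sample correlation r from n pairs into a t statistic (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_to_r(t: float, n: int) -> float:
    """Invert r_to_t: the value of r that produces the t statistic t with n pairs."""
    return t / math.sqrt(n - 2 + t * t)

n = 10            # ten responses remain after removing K
r = 0.557723      # value from part (c)(ii)
t_crit = 1.860    # assumed one-tailed 5% critical value of t with 8 df (from tables)

t_stat = r_to_t(r, n)
r_crit = t_to_r(t_crit, n)
print(round(t_stat, 3), round(r_crit, 3))  # t > t_crit exactly when r > r_crit
```

Both conversions are increasing functions for positive values, so r > 0.549 and t > 1.860 lead to the same conclusion.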

(e.i) Use Juliet’s data to find the value of a and of b. [1]

Markscheme

a = 0.000126 (0.000125842…), b = 41.1 (41.1490…) A1


[1 mark]

(e.ii) Interpret, referring to income and happiness, what the value of
a represents. [1]

Markscheme

EITHER
the amount the happiness score increases for every $1 increase in (annual)
income A1

OR
rate of change of happiness with respect to (annual) income A1

Note: Accept equivalent responses e.g. an increase of 1.26 in happiness for
every $10 000 increase in salary.

[1 mark]

(e.iii) Find the value of c, of d and of e. [1]

Markscheme

c = −2.06 × 10⁻⁹ (−2.06191… × 10⁻⁹),

d = 7.05 × 10⁻⁴ (7.05272… × 10⁻⁴),

e = 12.6 (12.5878…) A1

[1 mark]
(e.iv) Find the coefficient of determination for each of the two
models she considers. [2]

Markscheme

for quadratic model: R² = 0.659 (0.659145…) A1

for linear model: R² = 0.311 (0.311056…) A1

Note: Follow through from their r value from part (c)(ii).

[2 marks]
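
For a straight line fitted by least squares, the coefficient of determination is the square of Pearson's r, which is why the linear-model value follows through from part (c)(ii). The quadratic R² cannot be recovered this way; it has to come from the quadratic fit itself. A small check using only the values quoted above:

```python
r = 0.557723                     # Pearson's r from part (c)(ii)
r2_linear = r ** 2               # R^2 for the least-squares line Y = aX + b
r2_quadratic = 0.659145          # quadratic-model R^2 quoted in the markscheme

print(round(r2_linear, 3))       # 0.311
print(r2_quadratic > r2_linear)  # True
```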

(e.v) Hence compare the two models. [1]

Markscheme

EITHER
quadratic model is a better fit to the data / more accurate A1

OR
quadratic model explains a higher proportion of the variance A1

[1 mark]

(e.vi) Juliet decides to use the coefficient of determination to choose
between these two models.

Comment on the validity of her decision. [1]

Markscheme
EITHER
not valid, R² is not a useful measure to compare models with different
numbers of parameters A1

OR
not valid, quadratic model will always have a better fit than a linear model A1

Note: Accept any other sensible critique of the validity of the method. Do
not accept any answers which focus on the conclusion rather than the
method of model selection.

[1 mark]

(f.i) State the name of the test which Juliet should use. [1]

Markscheme

(single sample) t-test A1

[1 mark]

(f.ii) State the null and alternative hypotheses for this test. [1]

Markscheme

EITHER

H0 : μ = 80 000; H1 : μ ≠ 80 000 A1

OR
H0 : (sample is drawn from a population where) the population mean is
$80 000

H1 : the population mean is not $80 000 A1

Note: Do not allow FT from an incorrect test in part (f)(i) other than a z-test.

[1 mark]

(f.iii) Perform the test, using a 5% significance level, and state your
conclusion in context. [3]

Markscheme

p = 0.610 (0.610322…) A1

Note: For a z-test follow through from part (f)(i), either 0.578 (from biased
estimate of variance) or 0.598 (from unbiased estimate of variance).

0.610 > 0.05 R1

EITHER

no (significant) evidence that mean differs from $80 000 A1

OR

the sample could plausibly have been drawn from the quoted population A1
Note: Allow R1FTA1FT from an incorrect p-value, but the final A1 must still be
in the context of the original research question.

[3 marks]
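
The single-sample t-test in part (f) compares the sample mean with the claimed population mean using t = (x̄ − μ0)/(s/√n), where s is the unbiased sample standard deviation, and the statistic has n − 1 degrees of freedom. Juliet's income data are not reproduced in this sheet, so the sketch below only defines the statistic; the function name is hypothetical and the numbers in the call are made-up toy values, not her responses.

```python
import math
import statistics

def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # unbiased (divide by n - 1) standard deviation
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

# Toy values only, to show the call; they are not Juliet's data.
t, df = one_sample_t([2.0, 4.0, 6.0], mu0=2.0)
```

The p-value would then be read from the t distribution with df degrees of freedom. Swapping `statistics.stdev` for `statistics.pstdev` gives the biased-variance version mentioned in the note on the z-test.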
2. [Maximum mark: 6] 20N.1.SL.TZ0.T_10
On 90 journeys to his office, Isaac noted whether or not it rained. He also
recorded his journey time to the office, and classified each journey as short,
medium or long.

Of the 90 journeys to the office, there were 3 short journeys when it rained, 22
medium journeys when it rained, and 15 long journeys when it rained. There
were also 14 short journeys when it did not rain.

Isaac carried out a χ² test at the 5% level of significance on these data, looking
at the weather and the types of journeys.

(a) Write down H0, the null hypothesis for this test. [1]

Markscheme

* This question is from an exam for a previous syllabus, and may contain
minor differences in marking or structure.

type of journey and whether it rained are independent (A1) (C1)

Note: Accept “there is no association” or “not dependent”. Do not accept
“not related” or “not correlated”. Accept equivalent terms for ‘type of
journey’.

[1 mark]

(b) Find the expected number of short trips when it rained. [3]

Markscheme

(17/90) × (40/90) × 90 OR (17 × 40)/90 (A1)(M1)

Note: Award (A1) for 17 or 40 seen. Award (M1) for (17/90) × (40/90) × 90 OR
(17 × 40)/90 seen.

7.56 (7.55555…, 68/9) (A1) (C3)

[3 marks]
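
Under the null hypothesis of independence, each expected count is (row total × column total)/grand total. Here 17 = 3 + 14 short journeys and 40 = 3 + 22 + 15 rainy journeys. A minimal sketch using exact fractions (the function name is illustrative):

```python
from fractions import Fraction

def expected_count(row_total: int, col_total: int, grand_total: int) -> Fraction:
    """Expected cell count under independence: row total x column total / grand total."""
    return Fraction(row_total * col_total, grand_total)

short_when_raining = expected_count(17, 40, 90)  # 17 short journeys, 40 rainy, 90 in total
print(short_when_raining, float(short_when_raining))
```

The same call with 69, 67 and 150 gives the 30.82 children expected to choose shrimp in question 4(c).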

(c) The p-value for this test is 0.0206.

State the conclusion to Isaac’s test. Justify your reasoning. [2]

Markscheme

reject (do not accept) H0 (A1)

OR

type of journey and whether it rained are not independent (A1)

Note: Follow through from part (a) for their phrasing of the null hypothesis.

0.0206 < 0.05 (R1) (C2)

Note: A comparison must be seen, either numerically or in words (e.g. p-value
< significance level). Do not award (R0)(A1).

[2 marks]
3. [Maximum mark: 19] 20N.2.SL.TZ0.T_1
Don took part in a project investigating wind speed, x km h⁻¹, and the time, y
minutes, to fully charge a solar powered robot.

The investigation was carried out six times. The results are recorded in the table.

M is the point with coordinates (x̄, ȳ).

(a) On graph paper, draw a scatter diagram to show the results of
Don’s investigation. Use a scale of 1 cm to represent 2 units on
the x-axis, and 1 cm to represent 5 units on the y-axis. [4]

Markscheme

* This question is from an exam for a previous syllabus, and may contain
minor differences in marking or structure.

(A4)
Note: Award (A1) for correct scales and labels.
Award (A3) for all six points correctly plotted.
Award (A2) for four or five points correctly plotted.
Award (A1) for two or three points correctly plotted.
Award at most (A0)(A3) if axes reversed.
If graph paper is not used, award at most (A1)(A0)(A0)(A0).

[4 marks]

(b.i) Calculate x̄, the mean wind speed. [1]

Markscheme

19 (km h⁻¹) (A1)

[1 mark]

(b.ii) Calculate ȳ, the mean time to fully charge the robot. [1]

Markscheme

32 (minutes) (A1)

[1 mark]

(c) Plot and label the point M on your scatter diagram. [2]

Markscheme

point in correct position, labelled M (A1)(ft)(A1)

Note: Award (A1)(ft) for point plotted in correct position, (A1) for point
labelled M. Follow through from their part (b).

[2 marks]

(d.i) Calculate r, Pearson’s product–moment correlation coefficient. [2]

Markscheme

(r =) 0.944 (0.943733…) (G2)

Note: Award (G1) for 0.943 (incorrect rounding).

[2 marks]

(d.ii) Describe the correlation between the wind speed and the time
to fully charge the robot. [2]

Markscheme

(very) strong positive correlation (A1)(ft)(A1)(ft)

Note: Award (A1)(ft) for (very) strong. Award (A1)(ft) for positive. Follow through
from their part (d)(i). If there is no answer to part (d)(i), award at most (A0)(A1)
for a correct direction.

[2 marks]

(e.i) Write down the equation of the regression line y on x, in the
form y = mx + c. [2]

Markscheme
y = 0.465x + 23.2 (y = 0.465020…x + 23.1646…) (A1)(A1)(G2)

Note: Award (A1) for 0.465x. Award (A1) for 23.2. If the answer is not an
equation, award at most (A1)(A0).

[2 marks]
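
A graphic display calculator finds the line y = mx + c by least squares: with Sxx = ∑(x − x̄)², Syy = ∑(y − ȳ)² and Sxy = ∑(x − x̄)(y − ȳ), the gradient is m = Sxy/Sxx, the intercept is c = ȳ − m x̄, and r = Sxy/√(Sxx Syy). Because c = ȳ − m x̄, the regression line always passes through the mean point M(x̄, ȳ), which is what part (e)(ii) checks. Don's six readings are not reproduced in this sheet, so the minimal sketch below uses hypothetical toy data.

```python
import math

def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Return (m, c, r) for the regression line y = mx + c and Pearson's r."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = ybar - m * xbar            # this choice forces the line through (xbar, ybar)
    r = sxy / math.sqrt(sxx * syy)
    return m, c, r

# Hypothetical toy data, not Don's measurements.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 4.0, 8.0]
m, c, r = least_squares(xs, ys)
```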

(e.ii) Draw this regression line on your scatter diagram. [2]

Markscheme

regression line through their M (A1)(ft)

regression line through their (0, 23.2) (A1)(ft)

Note: Award a maximum of (A1)(A0) if the line is not straight/ruler not used.
Award (A0)(A0) if the points are connected.
Follow through from their point M in part (b) and their y-intercept in part
(e)(i).
If M is not plotted or labelled, then follow through from part (b).

[2 marks]

(e.iii) Hence or otherwise estimate the charging time when the wind
speed is 27 km h⁻¹. [2]

Markscheme

(y =) 0.465020…(27) + 23.1646… (M1)

Note: Award (M1) for correct substitution into their regression equation.
35.7 (minutes) (35.7201…) (A1)(ft)(G2)

Note: Follow through from their equation in part (e)(i).

OR

an attempt to use their regression line to find the y value at x = 27

Note: Award (M1) for an indication of using their regression line. This must
be illustrated by vertical and horizontal lines or marks at the correct place(s)
on their scatter diagram.

35.7 (minutes) (A1)(ft)

Note: Follow through from part (e)(ii).

[2 marks]
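
The substitution in part (e)(iii) uses the unrounded calculator coefficients. Substituting the 3 s.f. values instead shifts the estimate, as this small check shows:

```python
m_full, c_full = 0.465020, 23.1646   # unrounded coefficients from part (e)(i)
m_3sf, c_3sf = 0.465, 23.2           # the same coefficients rounded to 3 s.f.

x = 27                               # wind speed in km/h
y_full = m_full * x + c_full         # about 35.72 minutes
y_3sf = m_3sf * x + c_3sf            # 35.755 minutes
```

The rounded coefficients give 35.755, which rounds to 35.8 rather than 35.7; keeping full precision until the final step avoids this drift.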

(f) Don concluded from his investigation: “There is no causation
between wind speed and the time to fully charge the robot”.

In the context of the question, briefly explain the meaning of
“no causation”. [1]

Markscheme

wind speed does not cause a change in the time to charge (the robot) (A1)

Note: Award (A1) for a statement that communicates the meaning of a non-causal
relationship between the two variables.

[1 mark]
4. [Maximum mark: 15] 19N.2.SL.TZ0.T_1
Casanova restaurant offers a set menu where a customer chooses one of the
following meals: pasta, fish or shrimp.

The manager surveyed 150 customers and recorded the customer’s age and
chosen meal. The data is shown in the following table.

A χ² test was performed at the 10% significance level. The critical value for this
test is 4.605.

Write down

A customer is selected at random.

(a) State H0, the null hypothesis for this test. [1]

Markscheme

(H0 :) choice of meal is independent of age (or equivalent) (A1)

Note: Accept "not associated" or "not dependent" instead of independent.
In lieu of "age", accept an equivalent alternative such as "being a child or
adult".

[1 mark]

(b) Write down the number of degrees of freedom. [1]


Markscheme

2 (A1)

[1 mark]

(c) Show that the expected number of children who chose shrimp
is 31, correct to two significant figures. [2]

Markscheme

(69/150) × (67/150) × 150 OR (69 × 67)/150 (M1)

Note: Award (M1) for correct substitution into expected frequency formula.

30.82 (30.8) (A1)

31 (AG)

Note: Both an unrounded answer that rounds to the given answer and
rounded answer must be seen for the (A1) to be awarded.

[2 marks]

(d.i) the χ² statistic. [2]

Markscheme

(χ²calc =) 2.66 (2.657537…) (G2)

[2 marks]

(d.ii) the p-value. [1]

Markscheme
(p-value =) 0.265 (0.264803…) (G1)

Note: Award (G0)(G2) if the χ² statistic is missing or incorrect and the p-value
is correct.

[1 mark]

(e) State the conclusion for this test. Give a reason for your answer. [2]

Markscheme

0.265 > 0.10 OR 2.66 < 4.605 (R1)(ft)

the null hypothesis is not rejected (A1)(ft)

OR

the choice of meal is independent of age (or equivalent) (A1)(ft)

Note: Award (R1)(ft) for a correct comparison of either their χ² statistic to
the χ² critical value or their p-value to the significance level.

Condone “accept” in place of “not reject”.

Follow through from parts (a) and (d).

Do not award (A1)(ft)(R0).

[2 marks]
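
A contingency table with r rows and c columns gives (r − 1)(c − 1) degrees of freedom, so both this 2 × 3 table and the one in question 6 have 2. With exactly 2 degrees of freedom the χ² distribution has the closed-form tail P(χ² > x) = e^(−x/2), so the critical value at level α is −2 ln α. This shortcut holds only for 2 degrees of freedom, but it is enough to check the values quoted here and in question 6:

```python
import math

def degrees_of_freedom(rows: int, cols: int) -> int:
    """Degrees of freedom for a chi-squared test of independence."""
    return (rows - 1) * (cols - 1)

def chi2_df2_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-squared statistic with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)

def chi2_df2_critical(alpha: float) -> float:
    """Critical value of a chi-squared test with 2 degrees of freedom at level alpha."""
    return -2 * math.log(alpha)

print(degrees_of_freedom(2, 3))     # 2
print(chi2_df2_pvalue(2.657537))    # question 4: about 0.265
print(chi2_df2_critical(0.10))      # question 4: about 4.605
print(chi2_df2_pvalue(8.68507))     # question 6: about 0.0130
print(chi2_df2_critical(0.05))      # question 6: about 5.99
```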

(f.i) Calculate the probability that the customer is an adult. [2]

Markscheme

81/150 (27/50, 0.54, 54%) (A1)(A1)(G2)

Note: Award (A1) for numerator, (A1) for denominator.


[2 marks]

(f.ii) Calculate the probability that the customer is an adult or that
the customer chose shrimp. [2]

Markscheme

116/150 (58/75, 0.773, 0.773333…, 77.3%) (A1)(A1)(G2)

Note: Award (A1) for numerator, (A1) for denominator.

[2 marks]

(f.iii) Given that the customer is a child, calculate the probability that
they chose pasta or fish. [2]

Markscheme

34/69 (0.493, 0.492753…, 49.3%) (A1)(A1)(G2)

Note: Award (A1) for numerator, (A1) for denominator.

[2 marks]
5. [Maximum mark: 6] 19M.2.SL.TZ2.S_1
A group of 7 adult men wanted to see if there was a relationship between their
Body Mass Index (BMI) and their waist size. Their waist sizes, in centimetres, were
recorded and their BMI calculated. The following table shows the results.

The relationship between x and y can be modelled by the regression equation
y = ax + b.

(a.i) Write down the value of a and of b. [3]

Markscheme

valid approach (M1)

eg correct value for a or b (or for correct r or r² = 0.955631 seen in (ii))

0.141120, 11.1424

a = 0.141, b = 11.1 A1A1 N3

[3 marks]

(a.ii) Find the correlation coefficient. [1]

Markscheme

0.977563

r = 0.978 A1 N1

[1 mark]
(b) Use the regression equation to estimate the BMI of an adult
man whose waist size is 95 cm. [2]

Markscheme

correct substitution into their regression equation (A1)

eg 0.141(95) + 11.1

24.5488

24.5 A1 N2

[2 marks]
6. [Maximum mark: 13] 19M.2.SL.TZ2.T_1
Sila High School has 110 students. They each take exactly one language class
from a choice of English, Spanish or Chinese. The following table shows the
number of female and male students in the three different language classes.

A χ² test was carried out at the 5% significance level to analyse the relationship
between gender and student choice of language class.

Use your graphic display calculator to write down

The critical value at the 5 % significance level for this test is 5.99.

One student is chosen at random from this school.

Another student is chosen at random from this school.

(a) Write down the null hypothesis, H0 , for this test. [1]

Markscheme

* This question is from an exam for a previous syllabus, and may contain
minor differences in marking or structure.

(H0:) (choice of ) language is independent of gender (A1)

Note: Accept “there is no association between language (choice) and
gender”. Accept “language (choice) is not dependent on gender”. Do not
accept “not related” or “not correlated” or “not influenced”.
[1 mark]

(b) State the number of degrees of freedom. [1]

Markscheme

2 (AG)

[1 mark]

(c.i) the expected frequency of female students who chose to take
the Chinese class. [1]

Markscheme

16.4 (16.4181…) (G1)

[1 mark]

(d) State whether or not H0 should be rejected. Justify your
statement. [2]

Markscheme

(we) reject the null hypothesis (A1)(ft)

8.68507… > 5.99 (R1)(ft)

Note: Follow through from part (c)(ii). Accept “do not accept” in place of
“reject.” Do not award (A1)(ft)(R0).

OR

(we) reject the null hypothesis (A1)

0.0130034 < 0.05 (R1)


Note: Accept “do not accept” in place of “reject.” Do not award (A1)(ft)(R0).

[2 marks]

(e.i) Find the probability that the student does not take the Spanish
class. [2]

Markscheme

88/110 (4/5, 0.8, 80%) (A1)(A1)(G2)

Note: Award (A1) for correct numerator, (A1) for correct denominator.

[2 marks]

(e.ii) Find the probability that neither of the two students take the
Spanish class. [3]

Markscheme

(88/110) × (87/109) (M1)(M1)

Note: Award (M1) for multiplying two fractions. Award (M1) for multiplying
their correct fractions.

OR

(46/110)(45/109) + 2(46/110)(42/109) + (42/110)(41/109) (M1)(M1)

Note: Award (M1) for correct products; (M1) for adding 4 products.

0.639 (0.638532…, 348/545, 63.9%) (A1)(ft)(G2)

Note: Follow through from their answer to part (e)(i).


[3 marks]

(e.iii) Find the probability that at least one of the two students is
female. [3]

Markscheme

1 − (67/110) × (66/109) (M1)(M1)

Note: Award (M1) for multiplying two correct fractions. Award (M1) for
subtracting their product of two fractions from 1.

OR

(43/110) × (42/109) + (43/110) × (67/109) + (67/110) × (43/109) (M1)(M1)

Note: Award (M1) for correct products; (M1) for adding three products.

0.631 (0.631192…, 63.1%, 344/545) (A1)(G2)

[3 marks]
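
Because the second student is chosen from the 109 students who remain, both probabilities multiply two fractions with different denominators. Exact fractions reproduce the markscheme values; the counts come from the working above (88 = 46 + 42 students outside the Spanish class, 67 male students):

```python
from fractions import Fraction

TOTAL = 110
NOT_SPANISH = 88   # 46 + 42 students in the two non-Spanish classes
MALE = 67          # so 110 - 67 = 43 students are female

# Two students chosen without replacement: the second denominator is 109.
neither_spanish = Fraction(NOT_SPANISH, TOTAL) * Fraction(NOT_SPANISH - 1, TOTAL - 1)
at_least_one_female = 1 - Fraction(MALE, TOTAL) * Fraction(MALE - 1, TOTAL - 1)

print(neither_spanish, float(neither_spanish))          # 348/545, about 0.639
print(at_least_one_female, float(at_least_one_female))  # 344/545, about 0.631
```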
7. [Maximum mark: 9] 18N.2.SL.TZ0.T_1
The marks obtained by nine Mathematical Studies SL students in their projects
(x) and their final IB examination scores (y) were recorded. These data were used
to determine whether the project mark is a good predictor of the examination
score. The results are shown in the table.

The equation of the regression line y on x is y = mx + c.

A tenth student, Jerome, obtained a project mark of 17.

(a.ii) Use your graphic display calculator to write down ȳ, the mean
examination score. [1]

Markscheme

54 (G1)

[1 mark]

(a.iii) Use your graphic display calculator to write down r, Pearson’s
product–moment correlation coefficient. [2]

Markscheme

0.5 (G2)

[2 marks]
(b.i) Find the exact value of m and of c for these data. [2]

Markscheme

m = 0.875, c = 41.75 (m = 7/8, c = 167/4) (A1)(A1)

Note: Award (A1) for 0.875 seen. Award (A1) for 41.75 seen. If 41.75 is rounded
to 41.8 do not award (A1).

[2 marks]

(c.i) Use the regression line y on x to estimate Jerome’s examination
score. [2]

Markscheme

y = 0.875(17) + 41.75 (M1)

Note: Award (M1) for correct substitution into their regression line.

= 56.6 (56.625) (A1)(ft)(G2)

Note: Follow through from part (b)(i).

[2 marks]

(c.ii) Justify whether it is valid to use the regression line y on x to
estimate Jerome’s examination score. [2]

Markscheme
the estimate is valid (A1)

since this is interpolation and the correlation coefficient is large enough (R1)

OR

the estimate is not valid (A1)

since the correlation coefficient is not large enough (R1)

Note: Do not award (A1)(R0). The (R1) may be awarded for reasoning based
on strength of correlation, but do not accept “correlation coefficient is not
strong enough” or “correlation is not large enough”.

Award (A0)(R0) for this method if no numerical answer to part (a)(iii) is seen.

[2 marks]
8. [Maximum mark: 21] 18N.3.AHL.TZ0.Hsp_3
Mr Sailor owns a fish farm and he claims that the weights of the fish in one of his
lakes have a mean of 550 grams and standard deviation of 8 grams.

Assume that the weights of the fish are normally distributed and that Mr Sailor’s
claim is true.

Kathy is suspicious of Mr Sailor’s claim about the mean and standard deviation of
the weights of the fish. She collects a random sample of fish from this lake whose
weights are shown in the following table.

Using these data, test at the 5% significance level the null hypothesis
H0 : μ = 550 against the alternative hypothesis H1 : μ < 550, where μ grams
is the population mean weight.

Kathy decides to use the same fish sample to test at the 5% significance level
whether or not there is a positive association between the weights and the
lengths of the fish in the lake. The following table shows the lengths of the fish in
the sample. The lengths of the fish can be assumed to be normally distributed.

(a.i) Find the probability that a fish from this lake will have a weight
of more than 560 grams. [2]

Markscheme

Note: Accept all answers that round to the correct 2sf answer in (a), (b) and
(c) but not in (d).

X ∼ N(550, 8²) (M1)

P(X > 560) = 0.10564… = 0.106 A1

[2 marks]

(a.ii) The maximum weight a hand net can hold is 6 kg. Find the
probability that a catch of 11 fish can be carried in the hand net. [4]

Markscheme

Note: Accept all answers that round to the correct 2sf answer in (a), (b) and
(c) but not in (d).

Xi ∼ N(550, 8²), i = 1, …, 11

let Y = ∑ Xi (sum over i = 1, …, 11)

E(Y) = 11 × 550 (6050) A1

Var(Y) = 11 × 8² (704) (M1)A1

P (Y ⩽ 6000) = 0.02975 … = 0.0298 A1

[4 marks]
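
Both parts of (a) reduce to normal cumulative probabilities. For part (a)(ii), the total of 11 weights is treated as a sum of independent N(550, 8²) variables, so it is N(11 × 550, 11 × 8²). A minimal sketch with the standard library's `statistics.NormalDist` (the independence of the fish is an assumption implicit in the working above):

```python
import math
from statistics import NormalDist

mu, sigma = 550, 8                 # claimed mean and standard deviation, in grams
weight = NormalDist(mu, sigma)

p_heavy = 1 - weight.cdf(560)      # part (a)(i): P(X > 560)

# Part (a)(ii): total weight of 11 fish, assuming the weights are independent.
n = 11
total = NormalDist(n * mu, math.sqrt(n * sigma ** 2))  # N(6050, 704)
p_fits = total.cdf(6000)           # 6 kg = 6000 g
```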

(b.i) State the distribution of your test statistic, including the
parameter. [2]

Markscheme
Note: Accept all answers that round to the correct 2sf answer in (a), (b) and
(c) but not in (d).

t distribution with 7 degrees of freedom A1A1

[2 marks]

(b.ii) Find the p-value for the test. [2]

Markscheme

Note: Accept all answers that round to the correct 2sf answer in (a), (b) and
(c) but not in (d).

p = 0.25779…= 0.258 A2

[2 marks]

(b.iii) State the conclusion of the test, justifying your answer. [2]

Markscheme

Note: Accept all answers that round to the correct 2sf answer in (a), (b) and
(c) but not in (d).

p > 0.05 R1

therefore we conclude that there is no evidence to reject H0 A1


Note: FT their p-value.

Note: Only award A1 if R1 awarded.

[2 marks]

(c.i) State suitable hypotheses for the test. [1]

Markscheme

Note: Accept all answers that round to the correct 2sf answer in (a), (b) and
(c) but not in (d).

H0 : ρ = 0 , H1 : ρ > 0 A1

Note: Do not accept r in place of ρ.

[1 mark]

(c.ii) Find the product-moment correlation coefficient r. [2]

Markscheme

Note: Accept all answers that round to the correct 2sf answer in (a), (b) and
(c) but not in (d).

r = 0.782 A2
[2 marks]

(c.iii) State the p-value and interpret it in this context. [3]

Markscheme

Note: Accept all answers that round to the correct 2sf answer in (a), (b) and
(c) but not in (d).

0.01095… = 0.0110 A1

since 0.0110 < 0.05 R1

there is positive association between weight and length A1

Note: FT their p-value.

Note: Only award A1 if R1 awarded.

Note: Conclusion must be in context.

[3 marks]

(d) Use an appropriate regression line to estimate the weight of a
fish with length 360 mm. [3]

Markscheme

Note: Accept all answers that round to the correct 2sf answer in (a), (b) and
(c) but not in (d).

regression line of y (weight) on x (length) is (M1)

y = 0.8267…x + 255.96… (A1)

x = 360 gives y = 554 A1

Note: Award M1A0A0 for the wrong regression line, that is y = 0.7393…x –
51.62….

[3 marks]
9. [Maximum mark: 6] 18M.1.SL.TZ1.T_4
A scientist measures the concentration of dissolved oxygen, in milligrams per
litre (y), in a river. She takes 10 readings at different temperatures, measured in
degrees Celsius (x).

The results are shown in the table.

It is believed that the concentration of dissolved oxygen in the river varies
linearly with the temperature.

(a.i) For these data, find Pearson’s product-moment correlation
coefficient, r. [2]

Markscheme

−0.974 (−0.973745…) (A2)

Note: Award (A1) for an answer of 0.974 (minus sign omitted). Award (A1) for
an answer of −0.973 (incorrect rounding).

[2 marks]

(a.ii) For these data, find the equation of the regression line y on x. [2]

Markscheme

y = −0.365x + 17.9 (y = −0.365032…x + 17.9418…) (A1)(A1) (C4)

Note: Award (A1) for −0.365x, (A1) for 17.9. Award at most (A1)(A0) if not an
equation or if the values are reversed (eg y = 17.9x −0.365).

[2 marks]
(b) Using the equation of the regression line, estimate the
concentration of dissolved oxygen in the river when the
temperature is 18 °C. [2]

Markscheme

y = −0.365032… × 18 + 17.9418… (M1)

Note: Award (M1) for correctly substituting 18 into their part (a)(ii).

= 11.4 (11.3712…) (A1)(ft) (C2)

Note: Follow through from part (a)(ii).

[2 marks]
10. [Maximum mark: 6] 18M.1.SL.TZ2.T_1
The following scatter diagram shows the scores obtained by seven students in
their mathematics test, m, and their physics test, p.

The mean point, M, for these data is (40, 16).

(a) Plot and label the point M(m̄, p̄ ) on the scatter diagram. [2]

Markscheme

* This question is from an exam for a previous syllabus, and may contain
minor differences in marking or structure.
(A1)(A1) (C2)

Note: Award (A1) for mean point plotted and (A1) for labelled M.

[2 marks]

(b) Draw the line of best fit, by eye, on the scatter diagram. [2]

Markscheme

straight line through their mean point crossing the p-axis at 5 ± 2 (A1)(ft)(A1)(ft) (C2)

Note: Award (A1)(ft) for a straight line through their mean point. Award (A1)(ft)
for a correct p-intercept if line is extended.

[2 marks]

(c) Using your line of best fit, estimate the physics test score for a
student with a score of 20 in their mathematics test. [2]

Markscheme
point on line where m = 20 identified and an attempt to identify y-coordinate (M1)

10.5 (A1)(ft) (C2)

Note: Follow through from their line in part (b).

[2 marks]
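
A line drawn by eye is not unique, but a line through M(40, 16) that meets the p-axis at 5 (the centre of the accepted 5 ± 2 range) has gradient (16 − 5)/40 = 0.275, and reading it at m = 20 gives the 10.5 in the markscheme. The intercept of 5 is an assumption standing in for a hand-drawn line, and the helper name is illustrative:

```python
from typing import Callable

def line_through(p1: tuple[float, float], p2: tuple[float, float]) -> Callable[[float], float]:
    """Return the straight line through two points as a function of x."""
    (x1, y1), (x2, y2) = p1, p2
    gradient = (y2 - y1) / (x2 - x1)
    return lambda x: y1 + gradient * (x - x1)

# Assumed p-intercept of 5 (centre of 5 ± 2) and the mean point M(40, 16).
best_fit = line_through((0.0, 5.0), (40.0, 16.0))
estimate = best_fit(20.0)  # physics score predicted for a mathematics score of 20
```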
11. [Maximum mark: 13] 18M.2.SL.TZ1.S_8
The following table shows values of ln x and ln y.

The relationship between ln x and ln y can be modelled by the regression
equation ln y = a ln x + b.

(a) Find the value of a and of b. [3]

Markscheme

* This question is from an exam for a previous syllabus, and may contain
minor differences in marking or structure.

valid approach (M1)

eg one correct value

−0.453620, 6.14210

a = −0.454, b = 6.14 A1A1 N3

[3 marks]

(b) Use the regression equation to estimate the value of y when x =
3.57. [3]

Markscheme

correct substitution (A1)

eg −0.454 ln 3.57 + 6.14

correct working (A1)

eg ln y = 5.56484

261.083 (260.409 from 3sf)

y = 261, (y = 260 from 3sf) A1 N3

Note: If no working shown, award N1 for 5.56484.
If no working shown, award N2 for ln y = 5.56484.

[3 marks]

(c) The relationship between x and y can be modelled using the
formula y = kx^n, where k ≠ 0, n ≠ 0, n ≠ 1.

By expressing ln y in terms of ln x, find the value of n and of k. [7]

Markscheme

METHOD 1

valid approach for expressing ln y in terms of ln x (M1)

eg ln y = ln(kx^n), ln(kx^n) = a ln x + b

correct application of addition rule for logs (A1)

eg ln k + ln(x^n)

correct application of exponent rule for logs A1

eg ln k + n ln x

comparing one term with regression equation (check FT) (M1)

eg n = a, b = ln k

correct working for k (A1)

eg ln k = 6.14210, k = e^6.14210, 465.030

n = −0.454, k = 465 (464 from 3sf) A1A1 N2N2

METHOD 2

valid approach (M1)

eg e^(ln y) = e^(a ln x + b)

correct use of exponent laws for e^(a ln x + b) (A1)

eg e^(a ln x) × e^b

correct application of exponent rule for a ln x (A1)

eg ln(x^a)

correct equation in y A1

eg y = x^a × e^b

comparing one term with equation of model (check FT) (M1)

eg k = e^b, n = a

465.030

n = −0.454, k = 465 (464 from 3sf) A1A1 N2N2

METHOD 3

valid approach for expressing ln y in terms of ln x (seen anywhere) (M1)

eg ln y = ln(kx^n), ln(kx^n) = a ln x + b

correct application of exponent rule for logs (seen anywhere) (A1)

eg ln(x^a) + b

correct working for b (seen anywhere) (A1)

eg b = ln(e^b)

correct application of addition rule for logs A1

eg ln(e^b x^a)

comparing one term with equation of model (check FT) (M1)

eg k = e^b, n = a

465.030

n = −0.454, k = 465 (464 from 3sf) A1A1 N2N2

[7 marks]
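
Exponentiating ln y = a ln x + b gives y = e^b × x^a, so n = a and k = e^b. The short check below uses the unrounded coefficients from part (a) and confirms that the log form used in part (b) and the power form give the same estimate at x = 3.57:

```python
import math

a, b = -0.453620, 6.14210   # regression coefficients from part (a), unrounded

# ln y = a ln x + b  is equivalent to  y = e^b * x^a, so n = a and k = e^b.
n = a
k = math.exp(b)

x = 3.57
y_from_logs = math.exp(a * math.log(x) + b)  # part (b): undo the logarithm
y_from_power = k * x ** n                    # same value from the power model
```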
12. [Maximum mark: 6] 18M.2.SL.TZ2.S_1
The following table shows the mean weight, y kg , of children who are x years old.

The relationship between the variables is modelled by the regression line with
equation y = ax + b.

(a.i) Find the value of a and of b. [3]

Markscheme

valid approach (M1)

eg correct value for a or b (or for r seen in (ii))

a = 1.91966 b = 7.97717

a = 1.92, b = 7.98 A1A1 N3

[3 marks]

(a.ii) Write down the correlation coefficient. [1]

Markscheme

0.984674

r = 0.985 A1 N1

[1 mark]

(b) Use your equation to estimate the mean weight of a child that is
1.95 years old. [2]
Markscheme

correct substitution into their equation (A1)


eg 1.92 × 1.95 + 7.98

11.7205

11.7 (kg) A1 N2

[2 marks]
13. [Maximum mark: 14] 17N.2.SL.TZ0.S_8
Adam is a beekeeper who collected data about monthly honey production in his
bee hives. The data for six of his hives is shown in the following table.

The relationship between the variables is modelled by the regression line with
equation P = aN + b.

Adam has 200 hives in total. He collects data on the monthly honey production
of all the hives. This data is shown in the following cumulative frequency graph.

Adam’s hives are labelled as low, regular or high production, as defined in the
following table.
Adam knows that 128 of his hives have a regular production.

(a) Write down the value of a and of b. [3]

Markscheme

* This question is from an exam for a previous syllabus, and may contain
minor differences in marking or structure.

evidence of setup (M1)

eg correct value for a or b

a = 6.96103, b = −454.805

a = 6.96, b = −455 (accept 6.96x − 455) A1A1 N3

[3 marks]

(b) Use this regression line to estimate the monthly honey


production from a hive that has 270 bees. [2]

Markscheme

substituting N = 270 into their equation (M1)

eg 6.96(270) − 455

1424.67

P = 1420 (g) A1 N2

[2 marks]
(c) Write down the number of low production hives. [1]

Markscheme

40 (hives) A1 N1

[1 mark]

(d.i) Find the value of k; [3]

Markscheme

valid approach (M1)

eg 128 + 40

168 hives have a production less than k (A1)

k = 1640 A1 N3

[3 marks]

(d.ii) Find the number of hives that have a high production. [2]

Markscheme

valid approach (M1)

eg 200 − 168

32 (hives) A1 N2

[2 marks]

(e) Adam decides to increase the number of bees in each low
production hive. Research suggests that there is a probability of
0.75 that a low production hive becomes a regular production
hive. Calculate the probability that 30 low production hives
become regular production hives. [3]

Markscheme

recognize binomial distribution (seen anywhere) (M1)

eg X ∼ B(n, p), C(n, r) p^r (1 − p)^(n−r)

correct values (A1)

eg n = 40 (check FT) and p = 0.75 and r = 30, C(40, 30) 0.75^30 (1 − 0.75)^10

0.144364

0.144 A1 N2

[3 marks]
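
The binomial probability P(X = r) = C(n, r) p^r (1 − p)^(n−r) can be evaluated directly with the standard library's `math.comb`. This sketch assumes, as the markscheme does, that the 40 low production hives change independently, each with probability 0.75:

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# 40 low production hives, each becoming regular with probability 0.75.
prob_30 = binomial_pmf(30, 40, 0.75)
```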

© International Baccalaureate Organization, 2023
