
Revision sheet 1 - Chapter 19 [246 marks]

1. [Maximum mark: 5] SPM.1.SL.TZ0.6


As part of a study into healthy lifestyles, Jing visited Surrey Hills University. Jing
recorded a person’s position in the university and how frequently they ate a
salad. Results are shown in the table.

Jing conducted a $\chi^2$ test for independence at a 5% level of significance.

(a) State the null hypothesis. [1]

(b) Calculate the p-value for this test. [2]

(c) State, giving a reason, whether the null hypothesis should be
accepted. [2]
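For part (b), the $\chi^2$ statistic compares each observed count with the count expected under $H_0$, (row total × column total) / grand total, and has (rows - 1)(columns - 1) degrees of freedom. The Python sketch below is one way to reproduce the GDC calculation, assuming the table is passed in as a list of rows; Jing's table is not reproduced here, so the counts in the usage line are hypothetical. The helper `chi2_sf` is an assumption of this sketch: it evaluates the upper tail with a series for the regularized incomplete gamma function instead of calling a statistics library.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), from the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)   # next series term z^n / (a(a+1)...(a+n))
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_independence(table):
    """Return (statistic, degrees of freedom, p-value) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical 2 x 2 table, for illustration only
stat, df, p = chi2_independence([[20, 10], [10, 20]])
# H0 (independence) is rejected at the 5% level when p < 0.05
```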
2. [Maximum mark: 7] SPM.1.AHL.TZ0.12
Product research leads a company to believe that the revenue (R)
made by selling its goods at a price (p) can be modelled by the
equation

$R(p) = cpe^{dp}, \quad c, d \in \mathbb{R}$

There are two competing models, A and B, with different values for the
parameters c and d.

Model A has c = 3, d = −0.5 and model B has c = 2.5, d = −0.6.

The company experiments by selling the goods at three different prices
in three similar areas and the results are shown in the following table.

The company will choose the model with the smallest value for the
sum of square residuals.

Determine which model the company chose. [7]
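The sum of square residuals for each model is the sum of (observed R - model R)² over the three prices. A minimal Python sketch, assuming the observed prices and revenues are supplied as two lists; the company's table is not reproduced, so `revenue`, `ssr` and `best_model` are hypothetical helper names for illustration only.

```python
import math

def revenue(p, c, d):
    """Model R(p) = c * p * e^(d p)."""
    return c * p * math.exp(d * p)

def ssr(prices, revenues, c, d):
    """Sum of squared residuals between observed revenues and the model."""
    return sum((r - revenue(p, c, d)) ** 2 for p, r in zip(prices, revenues))

# Parameter values (c, d) taken from the question
MODELS = {"A": (3, -0.5), "B": (2.5, -0.6)}

def best_model(prices, revenues):
    """Name of the model with the smallest sum of squared residuals."""
    return min(MODELS, key=lambda name: ssr(prices, revenues, *MODELS[name]))
```

Called with the three price and revenue pairs from the table, `best_model` returns whichever of A or B has the smaller sum.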


3. [Maximum mark: 17] SPM.2.SL.TZ0.3
The Malvern Aquatic Center hosted a 3 metre springboard diving event. The
judges, Stan and Minsun, awarded 8 competitors a score out of 10. The raw data is
collated in the following table.

(a.i) Write down the value of the Pearson’s product–moment
correlation coefficient, r. [2]

(a.ii) Using the value of r, interpret the relationship between Stan’s
score and Minsun’s score. [2]

(b) Write down the equation of the regression line y on x. [2]

(c.i) Use your regression equation from part (b) to estimate Minsun’s
score when Stan awards a perfect 10. [2]

(c.ii) State whether this estimate is reliable. Justify your answer. [2]

The Commissioner for the event would like to find the Spearman’s rank
correlation coefficient.

(d) Copy and complete the information in the following table. [2]

(e.i) Find the value of the Spearman’s rank correlation coefficient, $r_s$. [2]

(e.ii) Comment on the result obtained for $r_s$. [2]

(f) The Commissioner believes Minsun’s score for competitor G is
too high and so decreases the score from 9.5 to 9.1.

Explain why the value of the Spearman’s rank correlation
coefficient $r_s$ does not change. [1]
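Spearman's $r_s$ is Pearson's correlation coefficient applied to the ranks of the data, with tied values given the mean of the ranks they occupy. The sketch below uses hypothetical helper names, and because the judges' scores are not reproduced, the scores in the usage lines are invented. It also illustrates part (f): changing a score without changing its position in the order leaves every rank, and hence $r_s$, unchanged. Ranking from smallest to largest rather than largest to smallest does not affect $r_s$, provided both variables are ranked the same way.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's r_s: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores, for illustration only
x = [6.0, 9.0, 8.0, 7.0]
y = [7.0, 9.5, 8.0, 6.5]
y_changed = [7.0, 9.1, 8.0, 6.5]  # 9.5 lowered to 9.1: still the highest score
```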
4. [Maximum mark: 6] EXN.1.SL.TZ0.3
The weights of apples on a tree can be modelled by a normal distribution with a
mean of 85 grams and a standard deviation of 7.5 grams.

(a) Find the probability that an apple from the tree has a weight
greater than 90 grams. [2]

Samples of apples are taken from two trees, A and B, in different parts of the
orchard.

The data is shown in the table below.

The owner of the orchard wants to know whether the mean weight of the apples
from tree A ($\mu_A$) is greater than the mean weight of the apples from tree B ($\mu_B$),
so sets up the following test:

$H_0: \mu_A = \mu_B$ and $H_1: \mu_A > \mu_B$

(b.i) Find the p-value for the owner’s test. [2]


(b.ii) The test is performed at the 5% significance level.

State the conclusion of the test, giving a reason for your answer. [2]
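Part (a) can be checked with Python's standard-library `statistics.NormalDist`, using only the mean and standard deviation given in the question; part (b) needs the data for trees A and B, which is not reproduced here.

```python
from statistics import NormalDist

# Weight of an apple, X ~ N(85, 7.5^2), as stated in the question
weights = NormalDist(mu=85, sigma=7.5)
p_over_90 = 1 - weights.cdf(90)  # P(X > 90)
```

This evaluates to roughly 0.252.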
5. [Maximum mark: 9] EXM.1.SL.TZ0.11
A calculator generates a random sequence of digits. A sample of 200 digits is
randomly selected from the first 100 000 digits of the sequence. The following
table gives the number of times each digit occurs in this sample.

It is claimed that all digits have the same probability of appearing in the
sequence.

(a) Test this claim at the 5% level of significance. [7]


(b) Explain what is meant by the 5% level of significance. [2]
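Under the claim, each of the 10 digits has probability 1/10, so each expected frequency is 200/10 = 20, and with no parameters estimated there are 10 - 1 = 9 degrees of freedom. The sketch below uses a hypothetical function name and leaves the observed counts as an argument, since the table is not reproduced. Its statistic would then be compared with the 5% critical value of $\chi^2_9$, about 16.92, or converted to a p-value on a GDC.

```python
def uniform_gof(observed):
    """Chi-squared statistic and degrees of freedom for
    H0: all categories are equally likely."""
    total = sum(observed)
    k = len(observed)
    expected = total / k  # the same expected frequency in every category
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, k - 1
```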
6. [Maximum mark: 6] EXM.1.SL.TZ0.2
Kayla wants to measure the extent to which two judges in a gymnastics
competition are in agreement. Each judge has ranked the seven competitors, as
shown in the table, where 1 is the highest ranking and 7 is the lowest.

(a) Calculate Spearman’s rank correlation coefficient for this data. [5]

(b) State what conclusion Kayla can make from the answer in part
(a). [1]
7. [Maximum mark: 9] EXM.1.SL.TZ0.3
Charles wants to measure the strength of the relationship between the price of a
house and its distance from the city centre where he lives. He chooses houses of
a similar size and plots a graph of price, P (in thousands of dollars) against
distance from the city centre, d (km).

The data from the graph is shown in the table.

(a) Explain why it is not appropriate to use Pearson’s product
moment correlation coefficient to measure the strength of the
relationship between P and d. [1]

(b) Explain why it is appropriate to use Spearman’s rank correlation
coefficient to measure the strength of the relationship between
P and d. [1]

(c) Calculate Spearman’s rank correlation coefficient for this data. [6]

(d) State what conclusion Charles can make from the answer in
part (c). [1]
8. [Maximum mark: 11] EXM.1.SL.TZ0.8
In an effort to study the level of intelligence of students entering college, a
psychologist collected data from 4000 students who were given a standard test.
The predictive norms for this particular test were computed from a very large
population of scores having a normal distribution with mean 100 and standard
deviation of 10. The psychologist wishes to determine whether the 4000 test
scores he obtained also came from a normal distribution with mean 100 and
standard deviation 10. He prepared the following table (expected frequencies
are rounded to the nearest integer):

(a) Copy and complete the table, showing how you arrived at your
answers. [5]
(b) Test the hypothesis at the 5% level of significance. [6]
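Each expected frequency in part (a) is 4000 × P(lower < X ≤ upper) for X ~ N(100, 10²). A sketch using `statistics.NormalDist`, assuming the class boundaries are given as a list that starts at -∞ and ends at +∞ so that the classes cover every score; the psychologist's actual classes are not reproduced, so the boundaries in the usage are hypothetical, as is the function name.

```python
from statistics import NormalDist

def normal_expected(boundaries, total, mu=100, sigma=10):
    """Expected frequencies for the classes between consecutive boundaries,
    under N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    cdfs = [dist.cdf(b) for b in boundaries]
    return [total * (hi - lo) for lo, hi in zip(cdfs, cdfs[1:])]

inf = float("inf")
# Hypothetical class boundaries, for illustration only
expected = normal_expected([-inf, 80, 90, 100, 110, 120, inf], 4000)
```

Because μ and σ are specified rather than estimated, the degrees of freedom are the number of classes (after merging any with expected frequency below 5) minus 1.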
9. [Maximum mark: 6] EXM.1.AHL.TZ0.19
A company sends a group of employees on a training course. Afterwards, they
survey these employees to gather data on the effectiveness of the training. In
order to test the reliability of the survey, they design two sets of similar
questions, which are given to the employees one week apart.

The questions in the survey were grouped in different sections. The mean scores
of the employees on the first section of each survey are given in the table.

(a) State the name of this test for reliability. [1]

(b) State a possible disadvantage of using this test for reliability. [1]

(c) Calculate Pearson’s product moment correlation coefficient for
this data. [2]
(d) Hence determine, with a reason, if the survey is reliable. [2]
10. [Maximum mark: 5] EXM.1.AHL.TZ0.20
As part of the selection process for an engineering course at a particular
university, applicants are given an exam in mathematics. This year the university
has produced a new exam and they want to test if it is a valid indicator of future
performance, before giving it to applicants. They randomly select 8 students in
their first year of the engineering course and give them the exam. They compare
the exam scores with their results in the engineering course.

The results of the 8 students are shown in the table.

(a) State the name of this test for validity. [1]

(b) Calculate Pearson’s product moment correlation coefficient for
this data. [2]

(c) Hence determine, with a reason, if the new exam is a valid
indicator of future performance. [2]
11. [Maximum mark: 10] EXM.1.AHL.TZ0.55
Eggs at a farm are sold in boxes of six. Each egg is either brown or white. The
owner believes that the number of brown eggs in a box can be modelled by a
binomial distribution. He examines 100 boxes and obtains the following data.

(a.i) Calculate the mean number of brown eggs in a box. [1]

(a.ii) Hence estimate p, the probability that a randomly chosen egg is
brown. [1]

(b) By calculating an appropriate $\chi^2$ statistic, test, at the 5%
significance level, whether or not the binomial distribution
gives a good fit to these data. [8]
12. [Maximum mark: 18] EXM.1.AHL.TZ0.56
A zoologist believes that the number of eggs laid in the Spring by female birds of
a certain breed follows a Poisson law. She observes 100 birds during this period
and she produces the following table.

The zoologist wishes to determine whether or not a Poisson law provides a
suitable model.

(a) Calculate the mean number of eggs laid by these birds. [2]

(b.i) Write down appropriate hypotheses. [2]


(b.ii) Carry out a test at the 1% significance level, and state your
conclusion. [14]
13. [Maximum mark: 14] EXM.1.AHL.TZ0.58
The numbers of telephone calls received by a helpline over 80 one-minute
periods are summarized in the table below.

(a) Find the exact value of the mean of this distribution. [2]

(b) Test, at the 5% level of significance, whether or not the data can
be modelled by a Poisson distribution. [12]
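For a Poisson goodness of fit test, the expected frequencies are 80 × P(X = k) with λ equal to the sample mean from part (a), plus a final "X ≥ k" class so that the probabilities sum to 1. A minimal sketch with a hypothetical function name; the helpline table and its mean are not reproduced, so λ is left as a parameter.

```python
import math

def poisson_expected(lam, total, max_k):
    """Expected frequencies for X = 0, 1, ..., max_k - 1 and a final
    'X >= max_k' class, for X ~ Po(lam)."""
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_k)]
    probs.append(1 - sum(probs))  # upper tail, so the classes cover all outcomes
    return [total * p for p in probs]
```

Because λ is estimated from the data, the degrees of freedom are the number of classes after merging any with expected frequency below 5, minus 1, minus 1 more for the estimated parameter.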
14. [Maximum mark: 13] EXM.2.SL.TZ0.5
A pharmaceutical company has developed a new drug to decrease cholesterol.
The final stage of testing the new drug is to compare it to their current drug. They
have 150 volunteers, all recently diagnosed with high cholesterol, from which
they want to select a sample of size 18. They require as close as possible 20% of
the sample to be below the age of 30, 30% to be between the ages of 30 and 50
and 50% to be over the age of 50.

(a) State the name for this type of sampling technique. [1]

(b) Calculate the number of volunteers in the sample under the age
of 30. [3]

Half of the 18 volunteers are given the current drug and half are given the new
drug. After six months each volunteer has their cholesterol level measured and
the decrease during the six months is shown in the table.

Calculate the mean decrease in cholesterol for

(c.i) The new drug. [1]

(c.ii) The current drug. [1]

The company uses a t-test, at the 1% significance level, to determine if the new
drug is more effective at decreasing cholesterol.

(d) State an assumption that the company is making, in order to
use a t-test. [1]

(e) State the hypotheses for this t-test. [1]

(f) Find the p-value for this t-test. [3]

(g) State the conclusion of this test, in context, giving a reason. [2]
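For part (b), each age group's share of the 18 places is 18 × its required proportion, rounded to a whole number: 18 × 0.2 = 3.6, so 4 volunteers under 30. A short Python check, with a hypothetical function name:

```python
def stratum_sizes(sample_size, proportions):
    """Round each stratum's share of the sample to the nearest whole number.
    Note: Python's round() sends exact halves to the nearest even integer."""
    return [round(sample_size * p) for p in proportions]

# Proportions from the question: under 30, between 30 and 50, over 50
sizes = stratum_sizes(18, [0.20, 0.30, 0.50])
```

Rounding each stratum separately happens to give 4 + 5 + 9 = 18 here; with other proportions the rounded sizes may not add up to the sample size, and one stratum would need adjusting by hand.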
15. [Maximum mark: 16] EXM.2.SL.TZ0.6
Jim writes a computer program to generate 500 values of a variable Z. He obtains
the following table from his results.

(a) Use a chi-squared goodness of fit test to investigate whether or
not, at the 5% level of significance, the N(0, 1) distribution can
be used to model these results. [12]

In this situation, state briefly what is meant by

(b.i) a Type I error. [2]

(b.ii) a Type II error. [2]


16. [Maximum mark: 10] EXM.2.AHL.TZ0.12
The curve y = f(x) is shown in the graph, for 0 ⩽ x ⩽ 10.

The curve y = f(x) passes through the following points.

It is required to find the area bounded by the curve, the x-axis, the y-axis and the
line x = 10.

(a) Use the trapezoidal rule to find an estimate for the area. [3]

One possible model for the curve y = f(x) is a cubic function.

(b.i) Use all the coordinates in the table to find the equation of the
least squares cubic regression curve. [3]

(b.ii) Write down the coefficient of determination. [1]

(c.i) Write down an expression for the area enclosed by the cubic
regression curve, the x-axis, the y-axis and the line x = 10. [1]

(c.ii) Find the value of this area. [2]
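The trapezoidal rule in part (a) sums (x₁ - x₀)(y₀ + y₁)/2 over consecutive points, and the area in part (c) is the definite integral of the fitted cubic from 0 to 10. A sketch with hypothetical function names; the points from the table and the regression coefficients from part (b.i) are not reproduced, so they are left as arguments.

```python
def trapezoid(xs, ys):
    """Trapezoidal-rule estimate of the area under the points (xs[i], ys[i]);
    the x-values need not be equally spaced."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

def cubic_area(a, b, c, d, lower=0.0, upper=10.0):
    """Exact integral of a x^3 + b x^2 + c x + d from lower to upper."""
    def antiderivative(x):
        return a * x**4 / 4 + b * x**3 / 3 + c * x**2 / 2 + d * x
    return antiderivative(upper) - antiderivative(lower)
```

For instance, for y = x² sampled at x = 0, 1, ..., 10 the trapezoidal estimate is 335, slightly above the exact value 1000/3, because the curve is concave up.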


17. [Maximum mark: 12] EXM.2.AHL.TZ0.24
The hens on a farm lay either white or brown eggs. The eggs are put into boxes
of six. The farmer claims that the number of brown eggs in a box can be
modelled by the binomial distribution, B(6, p). By inspecting the contents of 150
boxes of eggs she obtains the following data.

(a) Show that this data leads to an estimated value of p = 0.4. [1]

(b) Stating null and alternative hypotheses, carry out an
appropriate test at the 5% level to decide whether the farmer’s
claim can be justified. [11]
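For part (a), the estimate is p̂ = x̄ / 6, where x̄ is the mean number of brown eggs per box; the expected frequencies for part (b) are then 150 × P(X = k) for X ~ B(6, 0.4). A sketch with hypothetical function names; the farmer's frequency table is not reproduced.

```python
import math

def estimate_p(frequencies, n):
    """frequencies[k] = number of boxes containing k brown eggs (k = 0..n).
    Returns the sample mean divided by n."""
    boxes = sum(frequencies)
    mean = sum(k * f for k, f in enumerate(frequencies)) / boxes
    return mean / n

def binomial_expected(n, p, total):
    """Expected number of boxes with k = 0..n brown eggs under B(n, p)."""
    return [total * math.comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(n + 1)]

expected = binomial_expected(6, 0.4, 150)
```

With p = 0.4 the expected count for 6 brown eggs is 150 × 0.4⁶ ≈ 0.61, so that class would normally be merged with its neighbour before testing; since p was estimated, the degrees of freedom are the number of classes after merging minus 2.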
18. [Maximum mark: 27] EXM.2.AHL.TZ0.28
The random variable X is thought to follow a binomial distribution B(4, p). In
order to investigate this belief, a random sample of 100 observations on X was
taken with the following results.

(a.i) State suitable hypotheses for testing this belief. [1]

(a.ii) Calculate the mean of these data and hence estimate the value
of p. [5]

(a.iii) Calculate an appropriate value of $\chi^2$ and state your conclusion,
using a 1% significance level. [13]

An automatic machine is used to fill bottles of water. The amount delivered, Y
ml, may be assumed to be normally distributed with mean μ ml and standard
deviation 8 ml. Initially, the machine is adjusted so that the value of μ is 500. In
order to check that the value of μ remains equal to 500, a random sample of 10
bottles is selected at regular intervals, and the mean amount of water, ȳ, in these
bottles is calculated. The following hypotheses are set up.

H0: μ = 500; H1: μ ≠ 500

The critical region is defined to be (ȳ < 495) ∪ (ȳ > 505).


(b.i) Find the significance level of this procedure. [5]
(b.ii) Some time later, the actual value of μ is 503. Find the
probability of a Type II error. [3]
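Under the stated model the sample mean satisfies ȳ ~ N(μ, 8²/10), so the significance level is P(ȳ in the critical region | μ = 500), and the Type II error probability when μ = 503 is P(495 ≤ ȳ ≤ 505 | μ = 503). A sketch using only the values given in the question; `prob_in_critical_region` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 8, 10
se = sigma / sqrt(n)  # standard deviation of the sample mean

def prob_in_critical_region(mu):
    """P(ybar < 495 or ybar > 505) when the true mean is mu."""
    ybar = NormalDist(mu, se)
    return ybar.cdf(495) + (1 - ybar.cdf(505))

alpha = prob_in_critical_region(500)     # significance level, part (b.i)
beta = 1 - prob_in_critical_region(503)  # Type II error when mu = 503, part (b.ii)
```

This gives a significance level of roughly 0.048 and a Type II error probability of roughly 0.785.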
19. [Maximum mark: 34] EXM.3.AHL.TZ0.8
This question explores methods to analyse the scores in an exam.

A random sample of 149 scores for a university exam are given in the table.

(a.i) Find an unbiased estimate for the population mean. [1]

(a.ii) Find an unbiased estimate for the population variance. [2]

The university wants to know if the scores follow a normal distribution, with the
mean and variance found in part (a).

The expected frequencies are given in the table.

(b) Show that the expected frequency for 20 < x ≤ 4 is 31.5 correct
to 1 decimal place. [3]

(c) Perform a suitable test, at the 5% significance level, to
determine if the scores follow a normal distribution, with the
mean and variance found in part (a). You should clearly state
your hypotheses, the degrees of freedom, the p-value and your
conclusion. [8]

The university assigns a pass grade to students whose scores are in the top 80%.

(d) Use the normal distribution model to find the score required to
pass. [2]

The university also wants to know if the exam is gender neutral. They obtain
random samples of scores for male and female students. The mean, sample
variance and sample size are shown in the table.

(e) Perform a suitable test, at the 5% significance level, to
determine if there is a difference between the mean scores of
males and females. You should clearly state your hypotheses,
the p-value and your conclusion. [6]

The university awards a distinction to students who achieve high scores in the
exam. Typically, 15% of students achieve a distinction. A new exam is trialed with
a random selection of students on the course. 5 out of 20 students achieve a
distinction.

(f) Perform a suitable test, at the 5% significance level, to
determine if it is easier to achieve a distinction on the new
exam. You should clearly state your hypotheses, the critical
region and your conclusion. [6]

A different exam is trialed with 16 students. Let p be the proportion of students
achieving a distinction. It is desired to test the hypotheses

$H_0: p = 0.15$ against $H_1: p > 0.15$

It is decided to reject the null hypothesis if the number of students achieving a
distinction is greater than 3.

(g.i) Find the probability of making a Type I error. [3]

(g.ii) Given that p = 0.2, find the probability of making a Type II
error. [3]
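Parts (f) and (g) use the binomial distribution under $H_0$: p = 0.15. The sketch below, with hypothetical function names, finds the smallest c with P(X ≥ c) ≤ 0.05, which gives the critical region X ≥ c for part (f) with n = 20, and evaluates the two error probabilities for the 16-student trial, where $H_0$ is rejected when X ≥ 4.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p); returns 0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_critical_value(n, p0, level=0.05):
    """Smallest c with P(X >= c) <= level under H0, i.e. critical region X >= c."""
    c = 0
    while 1 - binom_cdf(c - 1, n, p0) > level:
        c += 1
    return c

alpha = 1 - binom_cdf(3, 16, 0.15)  # reject H0 when X > 3 although p = 0.15
beta = binom_cdf(3, 16, 0.2)        # fail to reject (X <= 3) although p = 0.2
```

Under these assumptions the critical region for part (f) is X ≥ 7, P(Type I error) ≈ 0.210 and P(Type II error | p = 0.2) ≈ 0.598.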
20. [Maximum mark: 5] 21M.1.SL.TZ1.6
Arriane has geese on her farm. She claims the mean weight of eggs from her
black geese is less than the mean weight of eggs from her white geese.

She recorded the weights of eggs, in grams, from a random selection of geese.
The data is shown in the table.

In order to test her claim, Arriane performs a t-test at a 10% level of significance.
It is assumed that the weights of eggs are normally distributed and the samples
have equal variances.

(a) State, in words, the null hypothesis. [1]

(b) Calculate the p-value for this test. [2]

(c) State whether the result of the test supports Arriane’s claim.
Justify your reasoning. [2]
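Because equal variances are assumed, the test uses the pooled two-sample t statistic with n₁ + n₂ - 2 degrees of freedom. A sketch of that statistic from summary values, with a hypothetical function name; Arriane's data are not reproduced, so the usage line uses invented numbers. The one-tailed p-value, P(T ≤ t) when group 1 is the black geese, would still be read from a GDC or t-distribution table.

```python
from math import sqrt

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic assuming equal population variances.
    var1 and var2 are unbiased sample variances (divisor n - 1)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Invented summary statistics, for illustration only
t, df = pooled_t(10, 4, 5, 12, 4, 5)
```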
21. [Maximum mark: 6] 21M.1.SL.TZ2.11
A newspaper vendor in Singapore is trying to predict how many copies of The
Straits Times they will sell. The vendor forms a model to predict the number of
copies sold each weekday. According to this model, they expect the same
number of copies will be sold each day.

To test the model, they record the number of copies sold each weekday during a
particular week. This data is shown in the table.

(a) Find an estimate for how many copies the vendor expects to
sell each day. [1]

A goodness of fit test at the 5% significance level is used on this data to
determine whether the vendor’s model is suitable.

The critical value for the test is 9.49 and the hypotheses are

H0: The data satisfies the model.
H1: The data does not satisfy the model.

(b.i) Write down the degrees of freedom for this test. [1]

(b.ii) Write down the conclusion to the test. Give a reason for your
answer. [4]
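The vendor's model gives the same expected frequency each day, the weekly total divided by 5, and with 5 categories and no estimated parameters there are 5 - 1 = 4 degrees of freedom. For an even number of degrees of freedom the $\chi^2$ upper tail has a closed form, which can confirm that 9.49 is the 5% critical value; `chi2_sf_even_df` is a hypothetical helper name.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-squared(df) with df even:
    exp(-x/2) * sum over i < df/2 of (x/2)^i / i!."""
    if df % 2:
        raise ValueError("df must be even for this closed form")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

categories = 5       # Monday to Friday
df = categories - 1  # no parameters estimated from the data
tail = chi2_sf_even_df(9.49, df)  # close to 0.05
```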
© International Baccalaureate Organization, 2023
