
Oberoi International School By Yogi

Statistics Module-1
MYP Standard and Extended-Differentiated


1.

At the end of a school day, the Headmaster conducted a survey asking students in how
many classes they had used the internet.

The data is shown in the following table.

(a) State whether the data is discrete or continuous.

[1]

The mean number of classes in which a student used the internet is 2.

(b) Find the value of 𝑘.

[4]

(c) It was not possible to ask every person in the school, so the Headmaster arranged
the student names in alphabetical order and then asked every 10th person on the list.

Identify the sampling technique used in the survey.

[1]


2.

A food scientist measures the weights of 760 potatoes taken from a single field and
the distribution of the weights is shown by the cumulative frequency curve below.

(a) Find the number of potatoes in the sample with a weight of more than 200 grams.

[2]

(b.i) Find the median weight.

[1]

(b.ii) Find the lower quartile.

[1]

(b.iii) Find the upper quartile.

[1]

(c) The weight of the smallest potato in the sample is 20 grams and the weight of the largest
is 400 grams.

Use the scale shown below to draw a box and whisker diagram showing the distribution of the
weights of the potatoes. You may assume there are no outliers.

[2]


3.

A school consists of 740 students divided into 5 grade levels. The numbers of students in each
grade are shown in the table below.

The Principal of the school wishes to select a sample of 25 students. She wishes to ensure that,
as closely as possible, the proportion of the students from each grade in the sample is the same
as the proportions in the school.

(a) Calculate the number of grade 12 students who should be in the sample.

[3]

(b) The Principal selects the students for the sample by asking those who took part in a
previous survey if they would like to take part in another. She takes the first of those who
reply positively, up to the maximum needed for the sample.

State which two of the sampling methods listed below best describe the method used.

Stratified, Quota, Convenience, Systematic, Simple random

[2]

4.

In a school, 200 students solved a problem in a mathematics competition. Their times to solve
the problem were recorded and the following cumulative frequency graph was produced.


(a) Use the graph to find


(a.i) the median time;

[1]

(a.ii) the lower quartile;

[1]

(a.iii) the upper quartile;

[1]

(a.iv) the interquartile range.

[1]

Cedric took 14 seconds to solve the problem.

(b) Determine whether Cedric’s time is an outlier.

[3]


5.

The following frequency distribution table shows the test grades for a group of students.

Grade       1   2   3   4   5   6   7
Frequency   1   4   7   9   𝑝   9   4

For this distribution, the mean grade is 4.5.

(a) Write down the total number of students in terms of 𝑝.

[1]

(b) Calculate the value of 𝑝.

[3]
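
One way to check an answer to part (b) is a short Python sketch. It assumes the frequencies read from the table above, treats 𝑝 as a non-negative whole number, and searches an arbitrary range of 0 to 100; the names `known`, `mean_grade` and `solutions` are illustrative only.

```python
from fractions import Fraction

# Frequencies from the table; grade 5 has the unknown frequency p.
known = {1: 1, 2: 4, 3: 7, 4: 9, 6: 9, 7: 4}
target_mean = Fraction(9, 2)  # the stated mean grade, 4.5


def mean_grade(p):
    """Exact mean grade when grade 5 has frequency p."""
    freqs = dict(known)
    freqs[5] = p
    total_students = sum(freqs.values())  # equals 34 + p
    total_grades = sum(grade * f for grade, f in freqs.items())
    return Fraction(total_grades, total_students)


# Assumption: p is a whole number no larger than 100.
solutions = [p for p in range(0, 101) if mean_grade(p) == target_mean]
print(solutions)
```

Exact fractions avoid rounding problems when comparing the computed mean with 4.5.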

6.

In the first month of a reforestation program, the town of Neerim plants 85 trees. Each
subsequent month the number of trees planted will increase by an additional 30 trees.

The number of trees to be planted in each of the first three months are shown in the following
table.

(a) Find the number of trees to be planted in the 15th month.

(b) Find the total number of trees to be planted in the first 15 months.

[2]

(c) Find the mean number of trees planted per month during the first 15 months.
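
The three parts of this question can be checked with a short Python sketch. It assumes the monthly totals form an arithmetic sequence with first term 85 and common difference 30, which is one reading of the description above; the variable names are illustrative.

```python
first_term = 85         # trees planted in month 1
common_difference = 30  # assumed increase from one month to the next
n_months = 15


def trees_in_month(k):
    """k-th term of the arithmetic sequence (k starts at 1)."""
    return first_term + (k - 1) * common_difference


monthly = [trees_in_month(k) for k in range(1, n_months + 1)]
trees_month_15 = monthly[-1]           # part (a)
total_trees = sum(monthly)             # part (b)
mean_trees = total_trees / n_months    # part (c)

print(trees_month_15, total_trees, mean_trees)
```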

7.

Elsie, a librarian, wants to investigate the length of time, 𝑇 minutes, that people spent in her
library on a particular day.

(a) State whether the variable 𝑇 is discrete or continuous.

[1]

Elsie’s data for 160 people who visited the library on that particular day is shown in the following
table.

(b) Find the value of 𝑘.

[2]

(c.i) Write down the modal class.

[1]

(c.ii) Write down the mid-interval value for this class.

[1]

(d) Use Elsie’s data to calculate an estimate of the mean time that people spent in the library.

[2]

(e) Using the table, write down the maximum possible number of people who spent 35 minutes
or less in the library on that day.

[1]

Elsie assumes her data to be representative of future visitors to the library.

(f) Find the probability a visitor spends at least 60 minutes in the library.

[2]

The following box and whisker diagram shows the times, in minutes, that the 160 visitors spent
in the library.

(g) Write down the median time spent in the library.

[1]

(h) Find the interquartile range.

[2]

(i) Hence show that the longest time that a person spent in the library is not an outlier.

[3]


Elsie believes the box and whisker diagram indicates that the times spent in the library are not
normally distributed.

(j) Identify one feature of the box and whisker diagram which might support Elsie’s belief.

[1]

8.

A college runs a mathematics course in the morning. Scores for a test from this class are shown
below.

25 33 51 62 63 63 70 74 79 79 81 88 90 90 98

For these data, the lower quartile is 62 and the upper quartile is 88.

(a) Show that the test score of 25 would not be considered an outlier.

[3]
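
A minimal Python sketch of the check in part (a), assuming the usual convention that a value is an outlier when it lies more than 1.5 × IQR below the lower quartile or above the upper quartile. The quartiles are taken from the question rather than recomputed.

```python
scores = [25, 33, 51, 62, 63, 63, 70, 74, 79, 79, 81, 88, 90, 90, 98]
lower_quartile = 62  # given in the question
upper_quartile = 88  # given in the question

iqr = upper_quartile - lower_quartile
# Assumed outlier rule: below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
lower_fence = lower_quartile - 1.5 * iqr
upper_fence = upper_quartile + 1.5 * iqr

lowest_is_outlier = min(scores) < lower_fence
print(iqr, lower_fence, lowest_is_outlier)
```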

The box and whisker diagram showing these scores is given below.

Test scores

Another mathematics class is run by the college during the evening. A box and whisker diagram
showing the scores from this class for the same test is given below.

Test scores

A researcher reviews the box and whisker diagrams and believes that the evening
class performed better than the morning class.

(b) With reference to the box and whisker diagrams, state one aspect that may support the
researcher’s opinion and one aspect that may counter it.

[2]

9.

Mackenzie conducted an experiment on the reaction times of teenagers. The results of the
experiment are displayed in the following cumulative frequency graph.

Use the graph to estimate the

(a.i) median reaction time.

[1]

(a.ii) interquartile range of the reaction times.

[3]

(b) Find the estimated number of teenagers who have a reaction time greater than 0.4 seconds.

[2]

(c) Determine the 90th percentile of the reaction times from the cumulative frequency graph.

[2]

Mackenzie created the cumulative frequency graph using the following grouped frequency table.

(d.i) Write down the value of 𝑎.

[1]

(d.ii) Write down the value of 𝑏.

[1]

(e) Write down the modal class from the table.

[1]

(f) Use your graphic display calculator to find an estimate of the mean reaction time.

[2]

Upon completion of the experiment, Mackenzie realized that some values were
grouped incorrectly in the frequency table. Some reaction times recorded in the interval
0 < 𝑡 ≤ 0.2 should have been recorded in the interval 0.2 < 𝑡 ≤ 0.4.

(g) Suggest how, if at all, the estimated mean and estimated median reaction times will change if
the errors are corrected. Justify your response.

[4]

10.

A group of 120 students sat a history exam. The cumulative frequency graph shows the scores
obtained by the students.


(a) Find the median of the scores obtained.

[1]

The students were awarded a grade from 1 to 5, depending on the score obtained in the
exam. The number of students receiving each grade is shown in the following table.

(b) Find an expression for 𝑎 in terms of 𝑏.

[2]

The mean grade for these students is 3.65.

(c.i) Find the number of students who obtained a grade 5.

[3]

(c.ii) Find the minimum score needed to obtain a grade 5.

[2]

11.

Deb used a thermometer to record the maximum daily temperature over ten consecutive
days. Her results, in degrees Celsius (°C), are shown below.

14, 15, 14, 11, 10, 9, 14, 15, 16, 13

For this data set, find the value of

(a) the mode.

[1]

(b) the mean.

[2]

(c) the standard deviation.

[1]
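
The three statistics can be checked with Python's standard `statistics` module. This sketch assumes the population standard deviation (`pstdev`) is the one intended, as is usual when a data set is treated as complete; if the sample standard deviation is wanted, `stdev` would be used instead.

```python
from statistics import mean, mode, pstdev

temperatures = [14, 15, 14, 11, 10, 9, 14, 15, 16, 13]

modal_temperature = mode(temperatures)    # part (a)
mean_temperature = mean(temperatures)     # part (b)
# Assumption: population standard deviation, dividing by n.
std_temperature = pstdev(temperatures)    # part (c)

print(modal_temperature, mean_temperature, round(std_temperature, 2))
```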

12.

The number of sick days taken by each employee in a company during a year was
recorded. The data was organized in a box and whisker diagram as shown below:

For this data, write down

(a.i) the minimum number of sick days taken during the year.

[1]

(a.ii) the lower quartile.

[1]

(a.iii) the median.

[1]

(b) Paul claims that this box and whisker diagram can be used to infer that the percentage
of employees who took fewer than six sick days is smaller than the percentage of
employees who took more than eleven sick days.

State whether Paul is correct. Justify your answer.

[2]

13.

Hafizah harvested 49 mangoes from her farm. The weights of the mangoes, 𝑤, in grams, are
shown in the following grouped frequency table.

(a) Write down the modal group for these data.

[1]

(b) Use your graphic display calculator to find an estimate of the standard deviation of
the weights of mangoes from this harvest.

[2]

(c) On the grid below, draw a histogram for the data in the table.

[3]


14.

Chicken eggs are classified by grade (4, 5, 6, 7 or 8), based on weight. A mixed carton contains
12 eggs and could include eggs from any grade. As part of a science project, Rocky buys 9
mixed cartons and sorts the eggs according to their weight.

(a) State whether the weight of the eggs is a continuous or discrete variable.

[1]

(b) Write down the modal grade of the eggs.

[1]

(c) Use your graphic display calculator to find an estimate for the standard deviation of
the weight of the eggs.

[2]

(d) The mean weight of these eggs is 64.9 grams, correct to three significant figures.

Use the table and your answer to part (c) to find the smallest possible number of eggs that
could be within one standard deviation of the mean.

[2]


15.

Stephen was invited to perform a piano recital. In preparation for the event, Stephen
recorded the amount of time, in minutes, that he rehearsed each day for the piano recital.

Stephen rehearsed for 32 days and data for all these days is displayed in the following
box-and-whisker diagram.

(a) Write down the median rehearsal time.

[1]

Stephen states that he rehearsed on each of the 32 days.

(b) State whether Stephen is correct. Give a reason for your answer.

[2]

(c) On 𝑘 days, Stephen practiced for exactly 24 minutes.

Find the possible values of 𝑘.

[3]

16.

University students were surveyed and asked how many hours, ℎ, they worked each month.
The results are shown in the following table.

Use the table to find the following values.

(a.i) 𝑝.

[1]

(a.ii) 𝑞.

[1]

The first five class intervals, indicated in the table, have been used to draw part of a cumulative
frequency curve as shown.


(b) On the same grid, complete the cumulative frequency curve for these data.

[2]

(c) Use the cumulative frequency curve to find an estimate for the number of students who
worked at most 35 hours per month.

[2]

17.

A health inspector analysed the amount of sugar in 500 different snacks prepared in various
school cafeterias. The collected data are shown in the following box-and-whisker diagram.

Amount of sugar per snack in grams

(c) The health inspector visits two school cafeterias. She inspects the same number of meals at
each cafeteria. The data is shown in the following box-and-whisker diagrams.


Meals prepared in the school cafeterias are required to have less than 10 grams of sugar.

State, giving a reason, which school cafeteria has more meals that do not meet the
requirement.

[2]

18.

Each month the number of days of rain in Cardiff is recorded.


The following data was collected over a period of 10 months.

11 13 8 11 8 7 8 14 x 15

For these data the median number of days of rain per month is 10.

(a) Find the value of x.

[2]

(b.i) Find the standard deviation.

[2]

(b.ii) Find the interquartile range.

[2]
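
A possible check for part (a) is to try each candidate value of x and keep those that give a median of 10. The sketch below assumes x is a whole number of days between 0 and 31; the name `candidates` is illustrative.

```python
from statistics import median

# The nine known monthly values; the tenth value is the unknown x.
known_days = [11, 13, 8, 11, 8, 7, 8, 14, 15]

# Assumption: x is a whole number of rainy days in a month, 0 to 31.
candidates = [x for x in range(0, 32) if median(known_days + [x]) == 10]
print(candidates)
```

With ten values the median is the mean of the 5th and 6th ordered values, which is why only a value between 8 and 11 can work.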

19.

The following box-and-whisker plot shows the number of text messages sent by students in a
school on a particular day.

(a) Find the value of the interquartile range.

[2]

(b) One student sent k text messages, where k > 11. Given that k is an outlier, find the least
value of k.

[4]


Regression
1.

The Malvern Aquatic Center hosted a 3-metre springboard diving event. The judges, Stan
and Minsun, awarded 8 competitors a score out of 10. The raw data are collated in the following
table.

(a.i) Write down the value of the Pearson’s product–moment correlation coefficient, 𝑟.

[2]

(a.ii) Using the value of 𝑟, interpret the relationship between Stan’s score and Minsun’s score.

[2]

(a.iii) Write down the equation of the regression line 𝑦 on 𝑥.

[2]

(a.iv) Use your regression equation from part (a.iii) to estimate Minsun’s score when Stan awards
a perfect 10.

[2]

(a.v) State whether this estimate is reliable. Justify your answer.

[2]


2.

The water temperature (𝑇) in Lake Windermere is measured on the first day of eight consecutive
months (𝑚) from January to August (months 1 to 8) and the results are shown below. The value
for May (month 5) has been accidentally deleted.

(a) Assuming the data follows a linear model for this period, find the regression line of 𝑇 on 𝑚 for
the remaining data.

[2]

(b) Use your line to find an estimate for the water temperature on the first day of May.

[2]

(c.i) Explain why your line should not be used to estimate the value of 𝑚 at which the
temperature is 10.0 °C.

[1]

(c.ii) Explain in context why your line should not be used to predict the value for December
(month 12).

[1]

(d) State a more appropriate model for the water temperature in the lake over an extended
period of time. You are not expected to calculate any parameters.

[1]

3.

The random variables (𝑋, 𝑌) follow a bivariate normal distribution with product-moment
correlation coefficient 𝜌. The values of six random observations of (𝑋, 𝑌) are shown in the table.

𝑥 6.3 4.1 5.6 9.2 7.8 8.2

𝑦 9.2 4.9 8.9 10.3 8.9 9.8

(b) Determine the value of the product-moment correlation coefficient, 𝑟, of the sample.

[1]
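
The sample correlation coefficient can be computed directly from its definition. The sketch below uses only the six observations in the table; the function name `pearson_r` is illustrative and not taken from any library.

```python
from math import sqrt

x = [6.3, 4.1, 5.6, 9.2, 7.8, 8.2]
y = [9.2, 4.9, 8.9, 10.3, 8.9, 9.8]


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(x, y)
print(round(r, 3))
```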

4.

The mean annual temperatures for Earth, recorded at fifty-year intervals, are shown in the table.

Year (𝑥)              1708   1758   1808   1858   1908   1958   2008
Temperature °C (𝑦)    8.73   9.22   9.10   9.12   9.13   9.45   9.76

Tami creates a linear model for this data by finding the equation of the straight line passing
through the points with coordinates (1708, 8.73) and (1958, 9.45).

(a) Calculate the gradient of the straight line that passes through these two points.

[2]


(b.i) Interpret the meaning of the gradient in the context of the question.

[1]

(b.ii) State appropriate units for the gradient.

[1]

(c) Find the equation of this line giving your answer in the form 𝑦 = 𝑚𝑥 + 𝑐.

[2]

(d) Use Tami’s model to estimate the mean annual temperature in the year 2000.

[2]
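
Tami's two-point model in parts (a), (c) and (d) can be sketched in Python as follows. Only the two points named in the question are used; the variable names are illustrative.

```python
# The two points Tami uses: (year, mean annual temperature in °C).
x1, y1 = 1708, 8.73
x2, y2 = 1958, 9.45

gradient = (y2 - y1) / (x2 - x1)   # part (a): °C per year
intercept = y1 - gradient * x1     # part (c): c in y = mx + c


def tami_model(year):
    """Temperature predicted by the straight line through the two points."""
    return gradient * year + intercept


estimate_2000 = tami_model(2000)   # part (d)
print(gradient, intercept, estimate_2000)
```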

Thandizo uses linear regression to obtain a model for the data.

(e.i) Find the equation of the regression line 𝑦 on 𝑥.

[2]

(e.ii) Find the value of 𝑟, the Pearson’s product-moment correlation coefficient.

[1]

(f) Use Thandizo’s model to estimate the mean annual temperature in the year 2000.

[2]
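
Thandizo's least-squares model in parts (e) and (f) can be reproduced without a calculator. This sketch applies the standard least-squares formulas to the seven data points in the table; the names `slope`, `intercept` and `estimate_2000` are illustrative.

```python
from math import sqrt

years = [1708, 1758, 1808, 1858, 1908, 1958, 2008]
temps = [8.73, 9.22, 9.10, 9.12, 9.13, 9.45, 9.76]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(temps) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
s_xx = sum((x - mean_x) ** 2 for x in years)
s_yy = sum((y - mean_y) ** 2 for y in temps)

slope = s_xy / s_xx                  # m in y = mx + c, part (e.i)
intercept = mean_y - slope * mean_x  # c in y = mx + c, part (e.i)
r = s_xy / sqrt(s_xx * s_yy)         # part (e.ii)

estimate_2000 = slope * 2000 + intercept  # part (f)
print(slope, intercept, r, estimate_2000)
```

The year 2000 lies inside the range of the recorded years, so this estimate is an interpolation.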

Thandizo uses his regression line to predict the year when the mean annual temperature will first
exceed 15 °C.

(g) State two reasons why Thandizo’s prediction may not be valid.

[2]

6.

This question is about modelling the spread of a computer virus to predict the number of
computers in a city which will be infected by the virus.

A systems analyst defines the following variables in a model:

• 𝑡 is the number of days since the first computer was infected by the virus.

• 𝑄(𝑡) is the total number of computers that have been infected up to and including day 𝑡.

The following data were collected:

(a.i) Find the equation of the regression line of 𝑄(𝑡) on 𝑡.

[2]

(a.ii) Write down the value of 𝑟, Pearson’s product-moment correlation coefficient.

[1]


7.

Eduardo believes that there is a linear relationship between the age of a male runner and
the time it takes them to run 5000 metres.

To test this, he recorded the age, 𝑥 years, and the time, 𝑡 minutes, for eight males in a single
5000 m race. His results are presented in the following table and scatter diagram.

(a) For this data, find the value of the Pearson’s product-moment correlation coefficient, 𝑟.

[2]

Eduardo looked in a sports science textbook. He found that the following information about
𝑟 was appropriate for athletic performance.

(b) Comment on your answer to part (a), using the information that Eduardo found.

[1]

(c) Write down the equation of the regression line of 𝑡 on 𝑥, in the form 𝑡 = 𝑎𝑥 + 𝑏.

[1]

(d) A 57-year-old male also ran in the 5000 m race.

Use the equation of the regression line to estimate the time he took to complete the
5000 m race.

[2]

9.

As part of his mathematics exploration about classic books, Jason investigated the time taken by
students in his school to read the book The Old Man and the Sea. He collected his data by
stopping and asking students in the school corridor, until he reached his target of 10 students
from each of the literature classes in his school.

(a) State which of the two sampling methods, systematic or quota, Jason has used.

[1]

Jason constructed the following box and whisker diagram to show the number of hours students
in the sample took to read this book.

(b) Write down the median time to read the book.

[1]

(c) Calculate the interquartile range.

[2]

Mackenzie, a member of the sample, took 25 hours to read the novel. Jason believes
Mackenzie’s time is not an outlier.

(d) Determine whether Jason is correct. Support your reasoning.

[4]

For each student interviewed, Jason recorded the time taken to read The Old Man and the Sea
(𝑥), measured in hours, and paired this with their percentage score on the final exam (𝑦). These
data are represented on the scatter diagram.

(e) Describe the correlation.

[1]

Jason correctly calculates the equation of the regression line 𝑦 on 𝑥 for these students to be

𝑦 = −1.54𝑥 + 98.8.

He uses the equation to estimate the percentage score on the final exam for a student who read
the book in 1.5 hours.

(f) Find the percentage score calculated by Jason.

[2]
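
The substitution in part (f) can be checked with a few lines of Python. The coefficients come from the regression line stated above; the function name `predicted_score` is illustrative.

```python
def predicted_score(hours):
    """Exam score predicted by the stated line y = -1.54x + 98.8."""
    return -1.54 * hours + 98.8


jason_estimate = predicted_score(1.5)
print(round(jason_estimate, 2))
```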

(g) State whether it is valid to use the regression line 𝑦 on 𝑥 for Jason’s estimate. Give a reason
for your answer.

[2]

10.

Juliet is a sociologist who wants to investigate if income affects happiness
amongst doctors. This question asks you to review Juliet’s methods and conclusions.

Juliet obtained a list of email addresses of doctors who work in her city. She contacted them and
asked them to fill in an anonymous questionnaire. Participants were asked to state their annual
income and to respond to a set of questions. The responses were used to determine a
happiness score out of 100. Of the 415 doctors on the list, 11 replied.

(a.i) Describe one way in which Juliet could improve the reliability of her investigation.

[1]

(a.ii) Describe one criticism that can be made about the validity of Juliet’s investigation.

[1]

Juliet’s results are summarized in the following table.


(b) Juliet classifies response K as an outlier and removes it from the data. Suggest one possible
justification for her decision to remove it.

[1]

For the remaining ten responses in the table, Juliet calculates the mean happiness score to be
52.5.

(c.i) Calculate the mean annual income for these remaining responses.

[2]

(c.ii) Determine the value of 𝑟, Pearson’s product-moment correlation coefficient, for these
remaining responses.

[2]

11.

Lucy sells hot chocolate drinks at her snack bar and has noticed that she sells more hot
chocolates on cooler days. On six different days, she records the maximum daily temperature, 𝑇,
measured in degrees centigrade, and the number of hot chocolates sold, 𝐻. The results are
shown in the following table.

The relationship between 𝐻 and 𝑇 can be modelled by the regression line with equation
𝐻 = 𝑎𝑇 + 𝑏.

(a.i) Find the value of 𝑎 and of 𝑏.

[3]

(a.ii) Write down the correlation coefficient.

[1]

(b) Using the regression equation, estimate the number of hot chocolates that Lucy will sell on a
day when the maximum temperature is 12 °C.

[2]


12.

Don took part in a project investigating wind speed, 𝑥 km h⁻¹, and the time, 𝑦 minutes, to fully
charge a solar powered robot.

The investigation was carried out six times. The results are recorded in the table.

(a) On graph paper, draw a scatter diagram to show the results of Don’s investigation. Use a
scale of 1 cm to represent 2 units on the 𝑥-axis, and 1 cm to represent 5 units on the
𝑦-axis.

[4]

(b.i) Calculate 𝑥̄, the mean wind speed.

[1]

(b.ii) Calculate 𝑦̄, the mean time to fully charge the robot.

[1]

M is the point with coordinates (𝑥̄, 𝑦̄).

(c) Plot and label the point M on your scatter diagram.

[2]

(d.i) Calculate 𝑟, Pearson’s product–moment correlation coefficient.

[2]

(d.ii) Describe the correlation between the wind speed and the time to fully charge the robot.

[2]

(e.i) Write down the equation of the regression line 𝑦 on 𝑥, in the form 𝑦 = 𝑚𝑥 + 𝑐.

[2]

(e.ii) Draw this regression line on your scatter diagram.

[2]

(e.iii) Hence or otherwise estimate the charging time when the wind speed is 27 km h⁻¹.

[2]

(f) Don concluded from his investigation: “There is no causation between wind speed and the
time to fully charge the robot”.

In the context of the question, briefly explain the meaning of “no causation”.

[1]

13.

Galois Airways has flights from Hong Kong International Airport to different destinations. The
following table shows the distance, 𝑥 kilometres, between Hong Kong and the
different destinations and the corresponding airfare, 𝑦, in Hong Kong dollars (HKD).

The Pearson’s product–moment correlation coefficient for this data is 0.948, correct to
three significant figures.

(a) Use your graphic display calculator to find the equation of the regression line 𝑦 on 𝑥.

[2]

The distance from Hong Kong to Tokyo is 2900 km.

(b) Use your regression equation to estimate the cost of a flight from Hong Kong to Tokyo with
Galois Airways.

[2]

(c) Explain why it is valid to use the regression equation to estimate the airfare between Hong
Kong and Tokyo.

[2]

15. 19M.2.SL.TZ1.T_1

A healthy human body temperature is 37.0 °C. Eight people were medically examined and the
difference in their body temperature (°C), from 37.0 °C, was recorded. Their heartbeat (beats per
minute) was also recorded.

(b.i) Write down, for this set of data, the mean temperature difference from 37 °C, 𝑥̄.

[1]

(b.ii) Write down, for this set of data, the mean number of heartbeats per minute, 𝑦̄.

[1]

(c) Plot and label the point M(𝑥̄, 𝑦̄) on the scatter diagram.

[2]

(d.i) Use your graphic display calculator to find the Pearson’s product–moment correlation
coefficient, 𝑟.

[2]

(d.ii) Hence describe the correlation between temperature difference from 37 °C and heartbeat.

[2]

(f) Draw the regression line 𝑦 on 𝑥 on the scatter diagram.

[2]

16.

A group of 7 adult men wanted to see if there was a relationship between their Body Mass Index
(BMI) and their waist size. Their waist sizes, in centimetres, were recorded and their BMI
calculated. The following table shows the results.

The relationship between 𝑥 and 𝑦 can be modelled by the regression equation 𝑦 = 𝑎𝑥 + 𝑏.

(a.i) Write down the value of 𝑎 and of 𝑏.

[3]

(a.ii) Find the correlation coefficient.

[1]

(b) Use the regression equation to estimate the BMI of an adult man whose waist size is 95 cm.

[2]

17.

The following table shows the hand lengths and the heights of five athletes on a sports team.

The relationship between x and y can be modelled by the regression line with equation
y = ax + b.

(b) Another athlete on this sports team has a hand length of 21.5 cm. Use the regression
equation to estimate the height of this athlete.

[2]

18. 18N.2.SL.TZ0.T_1

The marks obtained by nine Mathematical Studies SL students in their projects (x) and their final
IB examination scores (y) were recorded. These data were used to determine whether the
project mark is a good predictor of the examination score. The results are shown in the table.

(a.ii) Use your graphic display calculator to write down 𝑦̄, the mean examination score.

[1]

(a.iii) Use your graphic display calculator to write down r, Pearson’s product–moment correlation
coefficient.

[2]

The equation of the regression line y on x is y = mx + c.

(b.i) Find the exact value of m and of c for these data.

[2]

A tenth student, Jerome, obtained a project mark of 17.

(c.i) Use the regression line y on x to estimate Jerome’s examination score.

[2]


(c.ii) Justify whether it is valid to use the regression line y on x to estimate Jerome’s examination
score.

[2]

19.

A scientist measures the concentration of dissolved oxygen, in milligrams per litre (y), in a river.
She takes 10 readings at different temperatures, measured in degrees Celsius (x).

The results are shown in the table.

It is believed that the concentration of dissolved oxygen in the river varies linearly with the
temperature.

(a.i) For these data, find Pearson’s product-moment correlation coefficient, r.

[2]

(a.ii) For these data, find the equation of the regression line y on x.

[2]

(b) Using the equation of the regression line, estimate the concentration of dissolved oxygen in
the river when the temperature is 18 °C.

[2]
