
ADVANCED GCE UNIT

4767/01

MATHEMATICS (MEI)
Statistics 2 MONDAY 21 MAY 2007
Additional Materials: Answer booklet (8 pages) Graph paper MEI Examination Formulae and Tables (MF2)

Morning Time: 1 hour 30 minutes

INSTRUCTIONS TO CANDIDATES

Write your name, centre number and candidate number in the spaces provided on the answer booklet. Answer all the questions. You are permitted to use a graphical calculator in this paper. Final answers should be given to a degree of accuracy appropriate to the context.

INFORMATION FOR CANDIDATES

The number of marks is given in brackets [ ] at the end of each question or part question. The total number of marks for this paper is 72.

ADVICE TO CANDIDATES

Read each question carefully and make sure you know what you have to do before starting your answer. You are advised that an answer may receive no marks unless you show sufficient detail of the working to indicate that a correct method is being used.

This document consists of 4 printed pages.


© OCR 2007 [L/102/2657] OCR is an exempt Charity


1

The random variable X represents the time taken in minutes for a haircut at a barber's shop. X is Normally distributed with mean 11 and standard deviation 3.

(i) Find P(X < 10). [4]

(ii) Find the probability that exactly 3 out of 8 randomly selected haircuts take less than 10 minutes. [3]

(iii) Use a suitable approximating distribution to find the probability that at least 50 out of 100 randomly selected haircuts take less than 10 minutes. [4]

A new hairdresser joins the shop. The shop manager suspects that she takes longer on average than the other staff to do a haircut. In order to test this, the manager records the time taken for 25 randomly selected cuts by the new hairdresser. The mean time for these cuts is 12.34 minutes. You should assume that the time taken by the new hairdresser is Normally distributed with standard deviation 3 minutes.
(iv) Write down suitable null and alternative hypotheses for the test. [3]

(v) Carry out the test at the 5% level. [5]

2

A medical student is trying to estimate the birth weight of babies using pre-natal scan images. The actual weights, x kg, and the estimated weights, y kg, of ten randomly selected babies are given in the table below.

x   2.61  2.73  2.87  2.96  3.05  3.14  3.17  3.24  3.76  4.10
y   3.2   2.6   3.5   3.1   2.8   2.7   3.4   3.3   4.4   4.1

(i) Calculate the value of Spearman's rank correlation coefficient. [5]

(ii) Carry out a hypothesis test at the 5% level to determine whether there is positive association between the student's estimates and the actual birth weights of babies in the underlying population. [5]

(iii) Calculate the value of the product moment correlation coefficient of the sample. You may use the following summary statistics in your calculations:

$\sum x = 31.63$, $\sum y = 33.1$, $\sum x^2 = 101.92$, $\sum y^2 = 112.61$, $\sum xy = 106.51$. [5]

(iv) Explain why, if the underlying population has a bivariate Normal distribution, it would be preferable to carry out a hypothesis test based on the product moment correlation coefficient.

Comment briefly on the significance of the product moment correlation coefficient in relation to that of Spearman's rank correlation coefficient. [4]


3

The number of calls received at an office per 5 minutes is modelled by a Poisson distribution with mean 3.2.

(i) Find the probability of
(A) exactly one call in a 5-minute period,
(B) at least 6 calls in a 5-minute period. [4]

(ii) Find the probability of
(A) exactly one call in a 1-minute period,
(B) exactly one call in each of five successive 1-minute periods. [4]

(iii) Use a suitable approximating distribution to find the probability of at most 45 calls in a period of 1 hour. [4]

Two assumptions required for a Poisson distribution to be a suitable model are that calls arrive at a uniform average rate, independently of each other.

(iv) Comment briefly on the validity of each of these assumptions if the office is
(A) the enquiry department of a bank,
(B) a police emergency control room. [4]

4

The sexes and ages of a random sample of 300 runners taking part in marathons are classified as follows.

Observed         Male    Female   Row totals
Under 40         70      54       124
40–49            76      36       112
50 and over      52      12       64
Column totals    198     102      300

(i) Carry out a test at the 5% significance level to examine whether there is any association between age group and sex. State carefully your null and alternative hypotheses. Your working should include a table showing the contributions of each cell to the test statistic. [10]

(ii) Does your analysis support the suggestion that women are less likely than men to enter marathons as they get older? Justify your answer. [3]

For marathons in general, on average 3% of runners are 'Female, 50 and over'. The random variable X represents the number of 'Female, 50 and over' runners in a random sample of size 300.

(iii) Use a suitable approximating distribution to find $P(X \ge 12)$. [5]


Permission to reproduce items where third-party owned material protected by copyright is included has been sought and cleared where possible. Every reasonable effort has been made by the publisher (OCR) to trace copyright holders, but if any items requiring clearance have unwittingly been included, the publisher will be pleased to make amends at the earliest possible opportunity. OCR is part of the Cambridge Assessment Group. Cambridge Assessment is the brand name of University of Cambridge Local Examinations Syndicate (UCLES), which is itself a department of the University of Cambridge.
OCR 2007 4767/01 Jun07

Mark Scheme 4767 June 2007

Question 1

(i) $X \sim N(11, 3^2)$

$P(X < 10) = P\left(Z < \frac{10 - 11}{3}\right) = P(Z < -0.333) = \Phi(-0.333) = 1 - \Phi(0.333) = 1 - 0.6304 = 0.3696$

M1 for standardizing
M1 for use of tables with their z-value
M1 dep for correct tail
A1 CAO (must include use of differences)
[4]

(ii) $P(\text{3 of 8 less than ten}) = \binom{8}{3} \times 0.3696^3 \times 0.6304^5 = 0.2815$

M1 for coefficient
M1 for $0.3696^3 \times 0.6304^5$
A1 FT (min 2 s.f.)
[3]

(iii) $\mu = np = 100 \times 0.3696 = 36.96$
$\sigma^2 = npq = 100 \times 0.3696 \times 0.6304 = 23.30$
$Y \sim N(36.96, 23.30)$

$P(Y \ge 50) = P\left(Z > \frac{49.5 - 36.96}{\sqrt{23.30}}\right) = P(Z > 2.598) = 1 - \Phi(2.598) = 1 - 0.9953 = 0.0047$

M1 for Normal approximation with correct (FT) parameters
B1 for continuity corr.
M1 for standardizing and using correct tail
A1 CAO (FT 50.5 or omitted CC)
[4]

(iv) $H_0: \mu = 11$; $H_1: \mu > 11$
where $\mu$ denotes the mean time taken by the new hairdresser.

B1 for H0, as seen.
B1 for H1, as seen.
B1 for definition of $\mu$
[3]
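The table-based values in parts (i) to (iii) can be cross-checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the mark scheme. Because it evaluates $\Phi$ exactly rather than from tables at z = 0.333, the results may differ from the quoted values in the fourth decimal place.

```python
from math import comb, sqrt
from statistics import NormalDist

# (i) X ~ N(11, 3^2): probability that a haircut takes less than 10 minutes
haircut = NormalDist(mu=11, sigma=3)
p_less_10 = haircut.cdf(10)

# (ii) Binomial probability of exactly 3 successes in 8 trials
p_3_of_8 = comb(8, 3) * p_less_10**3 * (1 - p_less_10)**5

# (iii) Normal approximation to B(100, p) with a continuity correction:
# P(Y >= 50) is approximated by P(Y > 49.5)
n = 100
mean = n * p_less_10
variance = n * p_less_10 * (1 - p_less_10)
approx = NormalDist(mu=mean, sigma=sqrt(variance))
p_at_least_50 = 1 - approx.cdf(49.5)

print(round(p_less_10, 4), round(p_3_of_8, 4), round(p_at_least_50, 4))
```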

(v) Test statistic $= \frac{12.34 - 11}{3/\sqrt{25}} = \frac{1.34}{0.6} = 2.23$

5% level 1-tailed critical value of z = 1.645
2.23 > 1.645, so significant.
There is sufficient evidence to reject H0.
It is reasonable to conclude that the new hairdresser does take longer on average than other staff.

M1 must include $\sqrt{25}$
A1 (FT their $\mu$)
B1 for 1.645
M1 for sensible comparison leading to a conclusion
A1 for conclusion in words in context (FT their $\mu$)
[5]

[Total 19]
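The one-tailed z-test in part (v) can be sketched as follows. This is an illustrative check rather than part of the mark scheme; the variable names are assumptions, and the p-value is an extra quantity that the mark scheme does not require.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 11        # mean under H0 (minutes)
sigma = 3       # known standard deviation (minutes)
n = 25          # sample size
xbar = 12.34    # observed sample mean (minutes)

# Test statistic uses the standard error sigma / sqrt(n), not sigma alone
z = (xbar - mu0) / (sigma / sqrt(n))

# Upper-tail critical value at the 5% level for H1: mu > 11
critical = NormalDist().inv_cdf(0.95)

# One-tailed p-value (not asked for in the question)
p_value = 1 - NormalDist().cdf(z)

reject_h0 = z > critical
print(round(z, 2), round(critical, 3), reject_h0)
```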

Question 2

(i)

x        2.61  2.73  2.87  2.96  3.05  3.14  3.17  3.24  3.76  4.10
y        3.2   2.6   3.5   3.1   2.8   2.7   3.4   3.3   4.4   4.1
Rank x   10    9     8     7     6     5     4     3     2     1
Rank y   6     10    3     7     8     9     4     5     1     2
d        4     -1    5     0     -2    -4    0     -2    1     -1
d^2      16    1     25    0     4     16    0     4     1     1

$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 68}{10 \times 99} = 0.588$ (to 3 s.f.) [allow 0.59 to 2 s.f.]

M1 for ranking (allow all ranks reversed)
M1 for $d^2$
A1 for $\sum d^2 = 68$
M1 for method for $r_s$
A1 f.t. for $|r_s| < 1$
NB No ranking scores zero
[5]

(ii) H0: no association between x and y
H1: positive association between x and y

Looking for positive association (one-tail test): critical value at 5% level is 0.5636.
Since 0.588 > 0.5636, there is sufficient evidence to reject H0, i.e. conclude that there is positive association between true weight x and estimated weight y.

B1 for H0, in context.
B1 for H1, in context.
NB H0 H1 not ito $\rho$
B1 for 0.5636
M1 for sensible comparison with c.v., provided $|r_s| < 1$
A1 for conclusion in words & in context, f.t. their $r_s$ and sensible cv
[5]

(iii) $\sum x = 31.63$, $\sum y = 33.1$, $\sum x^2 = 101.92$, $\sum y^2 = 112.61$, $\sum xy = 106.51$.

$S_{xy} = \sum xy - \frac{1}{n} \sum x \sum y = 106.51 - \frac{1}{10} \times 31.63 \times 33.1 = 1.8147$

$S_{xx} = \sum x^2 - \frac{1}{n} \left(\sum x\right)^2 = 101.92 - \frac{1}{10} \times 31.63^2 = 1.8743$

$S_{yy} = \sum y^2 - \frac{1}{n} \left(\sum y\right)^2 = 112.61 - \frac{1}{10} \times 33.1^2 = 3.049$

$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{1.8147}{\sqrt{1.8743 \times 3.049}} = 0.759$

M1 for method for $S_{xy}$
M1 for method for at least one of $S_{xx}$ or $S_{yy}$
A1 for at least one of $S_{xy}$, $S_{xx}$, $S_{yy}$ correct.
M1 for structure of r
A1 (awrt 0.76)
[5]
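Both correlation coefficients can be reproduced with a short script. This is a sketch under stated assumptions: the helper `ranks` is hypothetical, ranks the largest value as 1 (matching the table in part (i)) and assumes no tied values, which holds for this data set. The product moment coefficient is computed from the rounded summary statistics given in the question, so it matches the mark scheme rather than the unrounded raw data.

```python
from math import sqrt

x = [2.61, 2.73, 2.87, 2.96, 3.05, 3.14, 3.17, 3.24, 3.76, 4.10]
y = [3.2, 2.6, 3.5, 3.1, 2.8, 2.7, 3.4, 3.3, 4.4, 4.1]
n = len(x)

def ranks(values):
    """Return ranks with 1 for the largest value; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

# Spearman's rank correlation coefficient from the squared rank differences
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
r_s = 1 - 6 * sum_d2 / (n * (n**2 - 1))

# Product moment correlation coefficient from the given summary statistics
sum_x, sum_y = 31.63, 33.1
sum_x2, sum_y2, sum_xy = 101.92, 112.61, 106.51
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
r = s_xy / sqrt(s_xx * s_yy)

print(sum_d2, round(r_s, 3), round(r, 3))
```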

(iv) Use of the PMCC is better since it takes into account not just the ranking but the actual value of the weights. Thus it has more information than Spearman's and will therefore provide a more discriminatory test.

Critical value for rho = 0.5494
PMCC is very highly significant whereas Spearman's is only just significant.

E1 for has values, not just ranks
E1 for contains more information. Allow alternatives.
B1 for a cv
E1 dep
[4]

[Total 19]

Question 3

(i) (A) $P(X = 1) = 0.1712 - 0.0408 = 0.1304$

OR $P(X = 1) = e^{-3.2} \times \frac{3.2^1}{1!} = 0.1304$

M1 for tables
A1 (2 s.f. WWW)

(B) $P(X \ge 6) = 1 - P(X \le 5) = 1 - 0.8946 = 0.1054$

M1
A1
[4]

(ii) (A) $\lambda = 3.2 \div 5 = 0.64$

$P(X = 1) = e^{-0.64} \times \frac{0.64^1}{1!} = 0.3375$

B1 for mean (SOI)
M1 for probability
A1

(B) P(exactly one in each of 5 mins) $= 0.3375^5 = 0.004379$

B1 (FT to at least 2 s.f.)
[4]
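The Poisson probabilities in parts (i) and (ii) can be checked directly from the probability mass function. The helpers `poisson_pmf` and `poisson_cdf` below are hypothetical names written for this sketch; they are not taken from the mark scheme or from any library.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam_5min = 3.2                 # mean calls per 5 minutes
lam_1min = lam_5min / 5        # mean calls per minute

p_one_in_5min = poisson_pmf(1, lam_5min)           # (i)(A)
p_at_least_6 = 1 - poisson_cdf(5, lam_5min)        # (i)(B)
p_one_in_1min = poisson_pmf(1, lam_1min)           # (ii)(A)
p_one_in_each_of_5 = p_one_in_1min**5              # (ii)(B), independent minutes

print(round(p_one_in_5min, 4), round(p_at_least_6, 4),
      round(p_one_in_1min, 4), round(p_one_in_each_of_5, 6))
```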

(iii) Mean no. of calls in 1 hour = 12 × 3.2 = 38.4
Using Normal approx. to the Poisson, $X \sim N(38.4, 38.4)$

$P(X \le 45.5) = P\left(Z \le \frac{45.5 - 38.4}{\sqrt{38.4}}\right) = P(Z \le 1.146) = \Phi(1.146) = 0.874$ (3 s.f.)

B1 for Normal approx. with correct parameters (SOI)
B1 for continuity corr.
M1 for probability using correct tail
A1 CAO (but FT 44.5 or omitted CC)
[4]
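As a rough check of part (iii), the Normal approximation can be evaluated with the standard library. This is only a sketch with illustrative variable names; it applies the same continuity correction (45.5) as the mark scheme.

```python
from math import sqrt
from statistics import NormalDist

lam_hour = 12 * 3.2    # twelve 5-minute periods in one hour

# Poisson(38.4) approximated by N(38.4, 38.4): variance equals the mean
approx = NormalDist(mu=lam_hour, sigma=sqrt(lam_hour))

# "At most 45 calls" becomes X <= 45.5 after the continuity correction
z = (45.5 - lam_hour) / sqrt(lam_hour)
p_at_most_45 = approx.cdf(45.5)

print(round(z, 3), round(p_at_most_45, 3))
```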

(iv) (A) Suitable arguments for/against each assumption: E1, E1
(B) Suitable arguments for/against each assumption: E1, E1
[4]

[Total 16]

Question 4

(i) H0: no association between age group and sex;
H1: some association between age group and sex.

B1 (in context)

Expected         Male    Female   Row totals
Under 40         81.84   42.16    124
40–49            73.92   38.08    112
50 and over      42.24   21.76    64
Column totals    198     102      300

M1 A1 for expected values (to 2 d.p.)

Contribution to test statistic
                 Male    Female
Under 40         1.713   3.325
40–49            0.059   0.114
50 and over      2.255   4.378

$X^2 = 11.84$

M1 for valid attempt at $(O - E)^2 / E$
M1 dep for summation
A1 CAO for $X^2$
[6]

Refer to $\chi^2_2$
Critical value at 5% level = 5.991
Result is significant.
There is some association between age group and sex.
NB if H0 H1 reversed, or 'correlation' mentioned, do not award first B1 or final E1

B1 for 2 deg of f
B1 CAO for cv
B1 dep on their cv & $X^2$
E1 (conclusion in context)
[4]
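The expected frequencies, cell contributions and test statistic in part (i) can be reproduced as below. This is a minimal sketch with illustrative names. It relies on the fact that, for 2 degrees of freedom only, the chi-squared upper tail probability is $e^{-x/2}$, so the 5% critical value is $-2\ln 0.05$; other degrees of freedom would need tables or a statistics library.

```python
from math import exp, log

# Observed counts: rows are age groups, columns are (Male, Female)
observed = [[70, 54],    # Under 40
            [76, 36],    # 40-49
            [52, 12]]    # 50 and over

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count = row total * column total / grand total
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Contribution of each cell: (O - E)^2 / E
contributions = [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
                 for o_row, e_row in zip(observed, expected)]
x2 = sum(sum(row) for row in contributions)

dof = (len(observed) - 1) * (len(observed[0]) - 1)
critical = -2 * log(0.05)     # valid for 2 degrees of freedom only
p_value = exp(-x2 / 2)        # valid for 2 degrees of freedom only

print(round(x2, 2), dof, round(critical, 3), x2 > critical)
```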

(ii) The analysis suggests that there are more females in the under 40 age group and less in the 50 and over age group than would be expected if there were no association. The reverse is true for males. Thus these data do support the suggestion.

E1
E1
E1 dep (on at least one of the previous E1s)
[3]

(iii) Binomial(300, 0.03) soi
n = 300, p = 0.03 so

EITHER: use Poisson approximation to Binomial with $\lambda = np = 9$
Using tables: $P(X \ge 12) = 1 - P(X \le 11) = 1 - 0.8030 = 0.197$

OR: use Normal approximation N(9, 8.73)
$P(X > 11.5) = P\left(Z > \frac{11.5 - 9}{\sqrt{8.73}}\right) = P(Z > 0.846) = 1 - 0.8012 = 0.199$

B1 CAO
EITHER: B1 for Poisson; B1 dep for Poisson(9); M1 for using tables to find $1 - P(X \le 11)$; A1
OR: B1 for Normal; B1 dep for parameters; M1 for using tables with correct tail (cc not required for M1); A1
[5]

[Total 18]
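The two approximations accepted in Question 4 part (iii) can be compared numerically. The sketch below uses only the standard library, with illustrative variable names; the exact Binomial tail is included purely for comparison and is not something the mark scheme asks for.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist

n, p = 300, 0.03
lam = n * p                      # Poisson mean, np = 9
variance = n * p * (1 - p)       # Binomial variance, npq = 8.73

# Poisson approximation: P(X >= 12) = 1 - P(X <= 11)
p_poisson = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(12))

# Normal approximation with continuity correction: P(X > 11.5)
p_normal = 1 - NormalDist(mu=lam, sigma=sqrt(variance)).cdf(11.5)

# Exact Binomial tail, for comparison only
p_exact = 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(12))

print(round(p_poisson, 3), round(p_normal, 3), round(p_exact, 3))
```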

Report on the Units taken in June 2007


4767: Statistics 2

General Comments

As with previous years, the majority of candidates were well prepared for this examination. Candidates are improving in their ability to carry out hypothesis tests, using correct notation and suitably thorough explanation. Most demonstrate good understanding of the Normal distribution; very few candidates use incorrect tail-probabilities in probability calculations compared with previous years. Marks for explanation and interpretation continue to be elusive to even the most able candidates.
Comments on Individual Questions

Section A

1 (i) Well answered. Many candidates lost marks through inappropriate use of continuity corrections. Most managed to calculate a probability using the correct tail of the Normal distribution.

(ii) Well answered. A few candidates omitted the binomial coefficient. Some found three eighths of their previous answer. Otherwise, most gained full marks.

(iii) The majority of candidates gained at least 3 of the 4 marks available. Many lost a single mark through inaccurate use of Normal tables, failure to use a continuity correction or using the continuity correction, 50.5. A small number attempted to use a Poisson approximation, gaining no credit.

(iv) Most candidates obtained two marks for providing correct hypotheses in terms of $\mu$. The mark for defining $\mu$ proved harder to obtain. Many made no attempt to define $\mu$ at all; some of those who did seemed unable to relate $\mu$ to the new hairdresser. As with previous years, this mark still proves to be rarely given.

(v) Well answered. A variety of approaches were seen, the most common being as outlined in the mark scheme. A small number of students were penalised heavily for treating the sample mean as a single observation, thus avoiding use of the standard error $3/\sqrt{25}$. Most candidates obtained at least 4 of the 5 available marks. A few lost the final mark through failing to answer in context. In such questions, the concluding statement should always refer to the context in which the question is set.

2 (i) Well answered. Most achieved full marks. Some candidates made mistakes with ranking or with calculating $d^2$, thus losing at least one mark. A number of candidates omitted the 6 from their calculation of $r_s$. Those failing to use ranks scored no marks on this part of the question.

(ii) Most candidates are now describing their hypotheses in tests for association, as outlined in the specification. Many failed to give their hypotheses in context, as required; in this particular question, 'between x and y' was sufficient. Several lost a mark for omitting the word 'positive' from their alternative hypothesis; a further mark was lost if 'positive' was omitted from their conclusion. In the remainder of the question, most scored full marks, but marks were lost for failing to provide a conclusion in context.

(iii) Well answered, with most candidates scoring full marks.

(iv) Poorly done. Many answers merely repeated the wording given in the question without actually explaining why the pmcc test is preferable. Many candidates appeared not to realise that two explanations were required in this part of the question. For the second explanation, very few managed to refer to a critical value; most answers simply compared the values of the correlation coefficients with each other.

3 (i) A Most candidates scored full marks. A small number misinterpreted the question, finding P(X = 5) instead of P(X = 1).

(i) B Most candidates scored full marks. A small number used $1 - P(X \le 6)$, losing both marks.

(ii) A Most candidates scored full marks.

(ii) B Well answered. Some candidates misinterpreted the question and found P(X = 1), using B(5, 0.3375).

(iii) Well answered. Common mistakes involved incorrect, or omitted, continuity corrections. Most candidates worked to an acceptable degree of accuracy.

(iv) A & B In answering questions such as this one, candidates should aim to provide a decision together with a reason to support it. Many candidates provided indecisive comments. Other candidates merely stated that calls would (or would not) arrive independently and at a uniform average rate, making no attempt to interpret what this meant. It is clear that most candidates have a poor understanding of what is meant by 'uniform average rate'.

4 (i) Well answered. In stating hypotheses, some candidates lost a mark for failing to provide context. Calculations of expected frequencies were handled accurately, on the whole, leading to full marks for the test statistic; however, some candidates lost an accuracy mark through premature approximation. Most candidates had little trouble picking up the final 4 marks, although a significant number thought they should carry out a two-tailed test and were, consequently, penalised. A small number mentioned correlation in their conclusions.

(ii) It proved difficult for candidates to obtain full marks for this part of the question. Better attempts saw candidates comparing observed and expected frequencies. Those who referred to the contributions to the test statistic tended to write nonsense unless they demonstrated an appreciation of the difference between positive and negative contributions.

(iii) Well answered, with most gaining full marks. The Poisson approximation proved more popular and successful than the Normal approximation. Of those using the Normal approximation, several applied incorrect continuity corrections.
