Cambridge
International AS & A Level
Further
Mathematics
Further Probability
& Statistics
John du Feu
Series editor: Roger Porkess
HODDER EDUCATION
AN HACHETTE UK COMPANY

Questions from the Cambridge International AS & A Level Further Mathematics papers are reproduced by permission of Cambridge Assessment International Education. Unless otherwise acknowledged, the questions, example answers, and comments that appear in this book were written by the author.
Cambridge Assessment International Education bears no responsibility for the example answers to questions taken from its past question papers which are contained in this publication.
IGCSE® is a registered trademark.
Every effort has been made to trace copyright and acknowledge ownership. The publishers will be glad to make suitable arrangements with any copyright holders whom it has not been possible to contact.
Hachette UK's policy is to use papers that are natural, renewable and recyclable products and made from wood grown in sustainable forests. The logging and manufacturing processes are expected to conform to the environmental regulations of the country of origin.
Much of the material in this book was published originally as part of the MEI Structured Mathematics series. It has been carefully adapted for the Cambridge International AS & A Level Further Mathematics syllabus.
The original MEI author team for Statistics comprised Alec Cryer, Michael Davies, Anthony Eccles, Bob Francis, Gerald Goodall, Alan Graham, Nigel Green, Liam Hennessy, Roger Porkess and Charlie Stripp.
© Roger Porkess and John du Feu 2018
First published in 2018 by
Hodder Education, an Hachette UK company
Carmelite House, 50 Victoria Embankment
London EC4Y 0DZ
All rights reserved. Apart from any use permitted under UK copyright law, no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or held within any information storage and retrieval system, without permission in writing from the publisher or under licence from the Copyright Licensing Agency Limited. Further details of such licences (for reprographic reproduction) may be obtained from the Copyright Licensing Agency Limited.
Typeset by Integra Software Services Pvt. Ltd., Pondicherry, India
Printed in Italy
A catalogue record for this title is available from the British Library.

Contents
Introduction
How to use this book
The Cambridge International AS & A Level Further Mathematics 9231 syllabus
1 Continuous random variables
1.1 Piecewise definition of a probability density function
1.2 The expectation and variance of a function of X
1.3 The cumulative distribution function
1.4 Finding the PDF of a function of a continuous random variable
2 Inference using normal and t-distributions
2.1 Interpreting sample data using the t-distribution
2.2 Using the t-distribution for paired samples
2.3 Hypothesis testing on a sample mean using the t-distribution
2.4 Using the t-distribution with two samples
2.5 Comparison between paired and 2-sample t-tests
2.6 Testing for a non-zero value of the difference of two means
2.7 Hypothesis tests and confidence intervals
2.8 Using the normal distribution with two samples
2.9 Tests with large samples
3 Chi-squared tests
3.1 The chi-squared test for a contingency table
3.2 Goodness of fit tests
4 Non-parametric tests
4.1 Single-sample non-parametric tests
4.2 Paired-sample non-parametric tests
4.3 The Wilcoxon rank-sum test
5 Probability generating functions
5.1 Probabilities defined by a probability generating function
5.2 Expectation and variance
5.3 The sum of independent random variables
5.4 The PGFs for some standard discrete probability distributions
Index

Introduction
This is one of a series of four books supporting the Cambridge International AS & A Level Further Mathematics 9231 syllabus for examination from 2020. It is preceded by five books supporting Cambridge International AS & A Level Mathematics 9709. The five chapters in this book cover the further probability and statistics required for the Paper 4 examination. This part of the series also contains two books for further pure mathematics and one book for further mechanics.
These books are based on the highly successful series for the Mathematics in Education and Industry (MEI) syllabus in the UK but they have been redesigned and revised for Cambridge International students; where appropriate, new material has been written and the exercises contain many past Cambridge International examination questions. An overview of the units making up the Cambridge International syllabus is given in the following pages.
Throughout the series, the emphasis is on understanding the mathematics as well as routine calculations. The various exercises provide plenty of scope for practising basic techniques; they also contain many typical examination-style questions.
The original MEI author team would like to thank John du Feu who has carried out the extensive task of presenting their work in a suitable form for Cambridge International students and for his many original contributions. They would also like to thank Cambridge Assessment International Education for its detailed advice in preparing the books and for permission to use many past examination questions.
Roger Porkess
Series editor

How to use this book
The structure of the book
This book has been endorsed by Cambridge Assessment International Education. It is listed as an endorsed textbook for students taking the Cambridge International AS & A Level Further Mathematics 9231 syllabus. The Further Probability & Statistics syllabus content is covered comprehensively and is presented across five chapters, offering a structured route through the course.
The book is written on the assumption that you have covered and understood the work in the Cambridge International AS & A Level Mathematics 9709 syllabus, including the probability and statistics content. The following icon is used to indicate material that is not directly on the syllabus.
There are places where the book goes beyond the requirements of the syllabus to show how the ideas can be taken further or where fundamental underpinning work is explored. Such work is marked as extension.
Each chapter is broken down into several sections, with each section covering a single topic. Topics are introduced through explanations, with key terms picked out in red. These are reinforced with plentiful worked examples, punctuated with commentary, to demonstrate methods and illustrate application of the mathematics under discussion.
Regular exercises allow you to apply what you have learned. They offer a large variety of practice and higher-order question types that map to the key concepts of the Cambridge International syllabus. Look out for the following icons.
Problem-solving questions will help you to develop the ability to analyse problems, recognise how to represent different situations mathematically, identify and interpret relevant information, and select appropriate methods.
Modelling questions provide you with an introduction to the important skill of mathematical modelling. In this, you take an everyday or workplace situation, or one that arises in your other subjects, and present it in a form that allows you to apply mathematics.
Communication and proof questions encourage you to become a more fluent mathematician, giving you scope to communicate your work with clear, logical arguments and to justify your results.
Exercises also include questions from real Cambridge Assessment International Education past papers, so that you can become familiar with the types of questions you are likely to meet in formal assessments.
Answers to exercise questions, excluding long explanations and proofs, are available online at www.hoddereducation.com/cambridgeextras, so you can check your work. It is important, however, that you have a go at answering the questions before looking up the answers if you are to understand the mathematics fully.
In addition to the exercises, a range of additional features is included to enhance your learning.
ACTIVITY
Activities invite you to do some work for yourself, typically to introduce you to ideas that are then going to be taken further. In some places, activities are also used to follow up work that has just been covered.
In real life, it is often the case that as well as analysing a situation or problem, you also need to carry out some investigative work. This allows you to check whether your proposed approach is likely to be fruitful or to work at all, and whether it can be extended. Such opportunities are marked as investigations.
Other helpful features include the following.
This symbol highlights points it will benefit you to discuss with your teacher or fellow students, to encourage deeper exploration and mathematical communication. If you are working on your own, there are answers available online at www.hoddereducation.com/cambridgeextras.
This is a warning sign. It is used where a common mistake, misunderstanding or tricky point is being described, to prevent you from making the same error.
A variety of notes are included to offer advice or spark your interest:
Note
Notes expand on the topic under consideration and explore the deeper lessons that emerge from what has just been done.
Historical note
Historical notes offer interesting background information about famous mathematicians or results to engage you in this fascinating field.
Technology note
Although graphical calculators and computers are not permitted in the examinations for this Cambridge International syllabus, we have included Technology notes to indicate places where working with them can be helpful for learning and for teaching.
Finally, each chapter ends with the key points covered, plus a list of the learning outcomes that summarise what you have learned in a form that is closely related to the syllabus.
Digital support
Comprehensive online support for this book, including further questions, is available by subscription to MEI's Integral® online teaching and learning platform for AS & A Level Mathematics and Further Mathematics, integralmaths.org. This online platform provides extensive, high-quality resources, including printable materials, innovative interactive activities, and formative and summative assessments. Our eTextbooks link seamlessly with Integral, allowing you to move with ease between corresponding topics in the eTextbooks and Integral.
MEI's Integral® material has not been through the Cambridge International endorsement process.

The Cambridge International AS & A Level Further Mathematics 9231 syllabus
The syllabus content is assessed over four examination papers.

Paper 1: Further Pure Mathematics 1
• 60% of the AS Level; 30% of the A Level
• Compulsory for AS and A Level

Paper 2: Further Pure Mathematics 2
• 2 hours
• 30% of the A Level
• Compulsory for A Level; not a route to AS Level

Paper 3: Further Mechanics
• 1 hour 30 minutes
• 40% of the AS Level; 20% of the A Level
• Offered as part of AS; compulsory for A Level

Paper 4: Further Probability & Statistics
• 1 hour 30 minutes
• 40% of the AS Level; 20% of the A Level
• Offered as part of AS; compulsory for A Level

The following diagram illustrates the permitted combinations for AS Level and A Level.

AS Level Further Mathematics
Paper 1 and Paper 3: Further Pure Mathematics 1 and Further Mechanics
Paper 1 and Paper 4: Further Pure Mathematics 1 and Further Probability & Statistics

A Level Further Mathematics
Papers 1, 2, 3 and 4: Further Pure Mathematics 1 and 2, Further Mechanics and Further Probability & Statistics

Prior knowledge
It is expected that learners will have studied the majority of the Cambridge International AS & A Level Mathematics 9709 syllabus content before studying Cambridge International AS & A Level Further Mathematics 9231.
The prior knowledge required for each Further Mathematics component is shown in the following table.

9231 Paper 1: Further Pure Mathematics 1 | 9709 Papers 1 and 3
9231 Paper 2: Further Pure Mathematics 2 | 9709 Papers 1 and 3
9231 Paper 3: Further Mechanics | 9709 Papers 1, 3 and 4
9231 Paper 4: Further Probability & Statistics | 9709 Papers 1, 3, 5 and 6

For Paper 4: Further Probability & Statistics, knowledge of Cambridge International AS & A Level Mathematics 9709 Papers 5 and 6: Probability & Statistics subject content is assumed.
Command words
The table below includes command words used in the assessment for this syllabus. The use of the command word will relate to the subject context.

Calculate | work out from given facts, figures or information
Deduce | conclude from available information
Derive | obtain something (expression/equation/value) from another by a sequence of logical steps
Describe | state the points of a topic / give characteristics and main features
Determine | establish with certainty
Evaluate | judge or calculate the quality, importance, amount, or value of something
Explain | set out purposes or reasons / make the relationships between things evident / provide why and/or how and support with relevant evidence
Identify | name/select/recognise
Interpret | identify meaning or significance in relation to the context
Justify | support a case with evidence/argument
Prove | confirm the truth of the given statement using a chain of logical mathematical reasoning
Show (that) | provide structured evidence that leads to a given result
Sketch | make a simple freehand drawing showing the key features
State | express in clear terms
Verify | confirm a given statement/result is true

Key concepts
Key concepts are essential ideas that help students develop a deep understanding of mathematics.
The key concepts are:
Problem solving
Mathematics is fundamentally problem solving and representing systems and models in different ways. These include:
» Algebra: this is an essential tool which supports and expresses mathematical reasoning and provides a means to generalise across a number of contexts.
» Geometrical techniques: algebraic representations also describe a spatial relationship, which gives us a new way to understand a situation.
» Calculus: this is a fundamental element which describes change in dynamic situations and underlines the links between functions and graphs.
» Mechanical models: these explain and predict how particles and objects move or remain stable under the influence of forces.
» Statistical methods: these are used to quantify and model aspects of the world around us. Probability theory predicts how chance events might proceed, and whether assumptions about chance are justified by evidence.
Communication
Mathematical proof and reasoning is expressed using algebra and notation so that others can follow each line of reasoning and confirm its completeness and accuracy. Mathematical notation is universal. Each solution is structured, but proof and problem solving also invite creative and original thinking.
Mathematical modelling
Mathematical modelling can be applied to many different situations and problems, leading to predictions and solutions. A variety of mathematical content areas and techniques may be required to create the model. Once the model has been created and applied, the results can be interpreted to give predictions and information about the real world.
These key concepts are reinforced in the different question types included in this book: Problem-solving, Communication and proof, and Modelling.

The control of large numbers is possible, and like unto that of small numbers, if we subdivide them.
Sun Tzu, 'The Art of War' (544 BC-496 BC)

Continuous random variables
You will recall having met probability density functions (PDFs) for continuous random variables in A Level Mathematics. To find probabilities using a probability density function f(x) you need to integrate the function between the limits you are using.
» as part of an experiment you are measuring temperatures in Celsius but then need to convert them to Fahrenheit: F = 1.8C + 32
» you are measuring the lengths of the sides of square pieces of material and deducing their areas: \( A = L^2 \)
» you are estimating the ages, A years, of hedgerows by counting the number, n, of types of shrubs and trees in 30 m lengths: A = 100n - 50.
In fact, in any situation where you are entering the value of a random variable into a formula, the outcome will be another random variable that is a function of the one you entered. Under these circumstances you may need to find the expectation and variance of such a function of a random variable.
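For a discrete random variable, X, in which the value \( x_i \) occurs with probability \( p_i \), the expectation and variance of a function g(X) are given by
\[ E(g(X)) = \sum g(x_i)\,p_i \]
\[ \operatorname{Var}(g(X)) = \sum (g(x_i))^2\,p_i - \{E(g(X))\}^2 \]
The equivalent results for a continuous random variable, X, with PDF f(x) are
\[ E(g(X)) = \int_{\text{all } x} g(x)\,f(x)\,dx \]
\[ \operatorname{Var}(g(X)) = \int_{\text{all } x} (g(x))^2\,f(x)\,dx - \{E(g(X))\}^2 \]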
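A numerical check can make the two integral results concrete. The sketch below is illustrative only: the helper names `simpson`, `expectation` and `variance` are hypothetical, the density f(x) = x/50 on 0 ≤ x ≤ 10 and the function g(x) = 3x + 4 are assumptions chosen for the illustration, and Simpson's rule stands in for exact integration.

```python
def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def expectation(g, f, a, b):
    """E(g(X)) = integral of g(x) f(x) dx over the range [a, b] of X."""
    return simpson(lambda x: g(x) * f(x), a, b)


def variance(g, f, a, b):
    """Var(g(X)) = integral of g(x)^2 f(x) dx minus [E(g(X))]^2."""
    mean = expectation(g, f, a, b)
    return simpson(lambda x: g(x) ** 2 * f(x), a, b) - mean ** 2


def f(x):
    """Assumed PDF for this sketch: f(x) = x/50 on 0 <= x <= 10."""
    return x / 50


def g(x):
    """Assumed function of X for this sketch: g(x) = 3x + 4."""
    return 3 * x + 4


e_g = expectation(g, f, 0, 10)
var_g = variance(g, f, 0, 10)
print(round(e_g, 6), round(var_g, 6))
```

For this choice of f and g the two printed values should be very close to 24 and 50, since both integrands are polynomials of degree three or less and Simpson's rule is exact for those apart from rounding.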
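You may find it helpful to think of the function g(X) as a new variable, say, Y.

Example 1.1
The continuous random variable X has PDF f(x) where
\[ f(x) = \begin{cases} kx & \text{for } 0 \le x \le 2 \\ 4k - kx & \text{for } 2 < x \le 4 \\ 0 & \text{otherwise.} \end{cases} \]
(i) Find the value of the constant k.
(ii) Sketch y = f(x).
(iii) Find P(1 ≤ X ≤ 3.5).
The continuous random variable Y = X².
(iv) Find E(Y).

Solution
(i)
\[ \int_0^2 kx\,dx + \int_2^4 (4k - kx)\,dx = 1 \]
\[ \left[\frac{kx^2}{2}\right]_0^2 + \left[4kx - \frac{kx^2}{2}\right]_2^4 = 1 \]
\[ (2k - 0) + k[(16 - 8) - (8 - 2)] = 1 \]
\[ 4k = 1, \quad k = \tfrac{1}{4} \]
(ii) Figure 1.2
(iii)
\[ P(1 \le X \le 3.5) = \int_1^2 \frac{x}{4}\,dx + \int_2^{3.5} \left(1 - \frac{x}{4}\right) dx \]
\[ = \left[\frac{x^2}{8}\right]_1^2 + \left[x - \frac{x^2}{8}\right]_2^{3.5} \]
\[ = \frac{3}{8} + \frac{15}{32} = \frac{27}{32} = 0.84375 \]
(iv)
\[ E(Y) = E(X^2) = \int_0^2 x^2 \cdot \frac{x}{4}\,dx + \int_2^4 x^2\left(1 - \frac{x}{4}\right) dx \]
\[ = \left[\frac{x^4}{16}\right]_0^2 + \left[\frac{x^3}{3} - \frac{x^4}{16}\right]_2^4 \]
\[ = (1 - 0) + \left[\left(\frac{64}{3} - 16\right) - \left(\frac{8}{3} - 1\right)\right] = 4\tfrac{2}{3} \]

Example 1.2
The continuous random variable X has PDF f(x) given by
\[ f(x) = \begin{cases} \dfrac{x}{50} & \text{for } 0 \le x \le 10 \\ 0 & \text{otherwise.} \end{cases} \]
(i) Find E(3X + 4).
(ii) Find 3E(X) + 4.
(iii) Find Var(3X + 4).
(iv) Verify that Var(3X + 4) = 3² Var(X).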
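Solution
(i) Here you are using \( E(g(X)) = \int g(x) f(x)\,dx \).
\[ E(3X + 4) = \int_0^{10} (3x + 4)\frac{x}{50}\,dx = \int_0^{10} \frac{1}{50}(3x^2 + 4x)\,dx \]
\[ = \left[\frac{x^3}{50} + \frac{2x^2}{50}\right]_0^{10} = 20 + 4 = 24 \]
(ii)
\[ 3E(X) + 4 = 3\int_0^{10} x \cdot \frac{x}{50}\,dx + 4 = \left[\frac{x^3}{50}\right]_0^{10} + 4 = 20 + 4 = 24 \]
Notice here that E(3X + 4) = 24 = 3E(X) + 4.
(iii) To find Var(3X + 4), use \( \operatorname{Var}(g(X)) = \int (g(x))^2 f(x)\,dx - [E(g(X))]^2 \).
\[ \operatorname{Var}(3X + 4) = \int_0^{10} (3x + 4)^2 \frac{x}{50}\,dx - 24^2 \]
\[ = \int_0^{10} \frac{1}{50}(9x^3 + 24x^2 + 16x)\,dx - 576 \]
\[ = \frac{1}{50}\left[\frac{9x^4}{4} + 8x^3 + 8x^2\right]_0^{10} - 576 \]
\[ = \frac{1}{50} \times 31\,300 - 576 = 50 \]
(iv) Here you are using \( \operatorname{Var}(X) = E(X^2) - [E(X)]^2 \).
\[ E(X^2) = \int_0^{10} x^2 \cdot \frac{x}{50}\,dx = \left[\frac{x^4}{200}\right]_0^{10} = 50 \]
\[ E(X) = \int_0^{10} x \cdot \frac{x}{50}\,dx = \left[\frac{x^3}{150}\right]_0^{10} = 6\tfrac{2}{3} \]
\[ \operatorname{Var}(X) = 50 - \left(6\tfrac{2}{3}\right)^2 = 5\tfrac{5}{9} \]
From part (iii), Var(3X + 4) = 50.
So Var(3X + 4) = 3² Var(X) as required, since \( 9 \times 5\tfrac{5}{9} = 50 \).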
... model to predict the flow of runners, and particularly their finishing times. We are offering a prize of $150 for the best such model submitted.

An entrant for the competition proposes a model in which a runner's time, X hours, is a continuous random variable with PDF
\[ f(x) = \begin{cases} \dfrac{4}{27}(x - 1)(4 - x)^2 & \text{for } 1 \le x \le 4 \\ 0 & \text{otherwise.} \end{cases} \]
It is possible to use definite integration to find the cumulative distribution function, but this causes a problem as you cannot use the same letter for both a limit of the integral and the variable of integration. So you would have to change the variable of integration, which is a dummy variable as it does not appear in the final answer, to a different letter.
You would not be correct to write down an expression like
\[ F(x) = \int_1^x \frac{4}{27}(x - 1)(4 - x)^2\,dx \qquad \text{(INCORRECT)} \]
since x would then be both a limit of the integral and the variable used within it. To overcome this problem you use a dummy variable, say u, so that F(x) is now written
\[ F(x) = \int_1^x \frac{4}{27}(u - 1)(4 - u)^2\,du \qquad \text{(CORRECT)} \]
\[ F(x) = \frac{1}{27}x^4 - \frac{4}{9}x^3 + \frac{16}{9}x^2 - \frac{64}{27}x + 1 \quad \text{for } 1 \le x \le 4. \]
To find the proportion of runners finishing by any time, substitute that value for x; so, when x = 2,
\[ F(2) = \frac{16}{27} - \frac{32}{9} + \frac{64}{9} - \frac{128}{27} + 1 = 0.41 \text{ to two decimal places.} \]
Here is the complete table, with all the values worked out.

1.00 | 0.00 | 0.00
1.25 | 0.04 | [illegible]
1.50 | 0.13 | [illegible]
1.75 | 0.26 | [illegible]
2.00 | 0.41 | 0.49
2.25 | 0.55 | 0.57
2.50 | 0.69 | 0.75
3.00 | 0.89 | [illegible]
3.50 | 0.98 | 0.99
4.00 | 1.00 | 1.00
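The values of F(x) for the model can be reproduced with a short calculation. The following is a minimal sketch, assuming the entrant's model is f(x) = (4/27)(x - 1)(4 - x)² for 1 ≤ x ≤ 4; the function names `F` and `F_numeric` are hypothetical. It evaluates the CDF both from the expanded polynomial and by numerical integration over a dummy variable u, which is kept distinct from the upper limit x.

```python
def F(x):
    """CDF of the assumed model f(x) = (4/27)(x - 1)(4 - x)^2 on 1 <= x <= 4."""
    if x < 1:
        return 0.0
    if x > 4:
        return 1.0
    return x**4 / 27 - 4 * x**3 / 9 + 16 * x**2 / 9 - 64 * x / 27 + 1


def F_numeric(x, n=2000):
    """The same CDF by midpoint-rule integration over the dummy variable u."""
    x = min(max(x, 1.0), 4.0)      # clamp x to the range of the model
    h = (x - 1) / n                # width of each strip between 1 and x
    total = 0.0
    for i in range(n):
        u = 1 + (i + 0.5) * h      # u is the variable of integration, not x
        total += (4 / 27) * (u - 1) * (4 - u) ** 2 * h
    return total


for t in [1.00, 1.25, 1.50, 1.75, 2.00, 2.25, 2.50, 3.00, 3.50, 4.00]:
    print(f"{t:.2f}  {F(t):.2f}  {F_numeric(t):.2f}")
```

Keeping u inside the loop and x only as the upper limit mirrors the point made in the text: the letter used for the variable of integration never appears in the final value.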
Notice the distinctive shape of the curves of these functions (Figure 1.5), sometimes called an ogive.
Figure 1.5
Note
You have probably met this shape already.
If you were on the organising committee, what more might you look for in a model?
Properties of the cumulative distribution function, F(x)
The graphs on the next page, Figure 1.6, show the probability density function f(x) and the cumulative distribution function F(x) of a typical continuous random variable X.
You will see that the values of the random variable always lie between a and b.
Figure 1.6
These graphs illustrate a number of general results for cumulative distribution functions.
1 F(x) = 0 for x ≤ a, the lower limit of x.
The probability of X taking a value less than or equal to a is zero; the value of X must be greater than or equal to a.
2 F(x) = 1 for x ≥ b, the upper limit of x. X cannot take values greater than b.
3 P(c ≤ X ≤ d) = F(d) - F(c)
P(c ≤ X ≤ d) = P(X ≤ d) - P(X ≤ c)
This is very useful when finding probabilities from a PDF or a CDF.
Figure 1.7
4 The median, m, satisfies the equation F(m) = 0.5.
P(X ≤ m) = 0.5 by definition of the median.
Figure 1.8
5 \( f(x) = \dfrac{d}{dx}F(x) = F'(x) \)
Since you integrate f(x) to obtain F(x), the reverse must also be true: differentiating F(x) gives f(x).
6 F(x) is a continuous function: the graph of y = F(x) has no gaps.
Notes
1 Notice the use of lower and upper case letters here. The probability density function is denoted by the lower case f whereas the cumulative distribution function is given the upper case F.
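Example 1.4
A machine saws planks of wood to a nominal length. The continuous random variable X represents the error in millimetres of the actual length of a plank coming off the machine. The variable X has PDF f(x) where
\[ f(x) = \begin{cases} \dfrac{10 - x}{50} & \text{for } 0 \le x \le 10 \\ 0 & \text{otherwise.} \end{cases} \]
The cumulative distribution function is
\[ F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \dfrac{x}{5} - \dfrac{x^2}{100} & \text{for } 0 \le x \le 10 \\ 1 & \text{for } x > 10. \end{cases} \]
The graph of F(x) is shown in Figure 1.10.
Figure 1.10
(iv) P(2 ≤ X ≤ 7) = F(7) - F(2)
= 0.91 - 0.36
= 0.55
(v) The median value of X is found by solving the equation F(m) = 0.5:
\[ \frac{1}{5}m - \frac{1}{100}m^2 = 0.5. \]
This is rearranged to give
\[ m^2 - 20m + 50 = 0 \]
\[ m = \frac{20 \pm \sqrt{20^2 - 4 \times 50}}{2} \]
m = 2.93 (or 17.07, outside the domain for X).
The median error is 2.93 mm.
(vi) The customer rejects those planks for which 8 ≤ X ≤ 10.
P(8 ≤ X ≤ 10) = F(10) - F(8)
= 1 - 0.96
= 0.04
so 4% of planks are rejected.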
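The three results in this example can be checked directly from the CDF. This is a small sketch under a stated assumption: the PDF is taken to be f(x) = (10 - x)/50 on 0 ≤ x ≤ 10, which is the form consistent with the values F(2) = 0.36, F(7) = 0.91 and F(8) = 0.96 quoted in the solution. The names `F`, `p_2_to_7`, `median` and `rejected` are hypothetical.

```python
import math


def F(x):
    """CDF for the assumed PDF f(x) = (10 - x)/50 on 0 <= x <= 10."""
    if x < 0:
        return 0.0
    if x > 10:
        return 1.0
    return x / 5 - x**2 / 100


# Property 3: P(c <= X <= d) = F(d) - F(c)
p_2_to_7 = F(7) - F(2)

# Property 4: the median solves F(m) = 0.5, i.e. m^2 - 20m + 50 = 0.
# Only the smaller root lies inside the domain 0 <= x <= 10.
median = (20 - math.sqrt(20**2 - 4 * 50)) / 2

# Proportion of planks with error between 8 mm and 10 mm
rejected = F(10) - F(8)

print(round(p_2_to_7, 2), round(median, 2), round(rejected, 2))
```

Solving the quadratic exactly, rather than searching numerically, works here because F is itself a quadratic on the domain; for a less convenient CDF a bisection search on F(m) - 0.5 would be one alternative.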
Example 1.5
The PDF of a continuous random variable X is given by
s for0 12.(ii) The graph of F(x) is shown in Figure 1.12.
Figure 1.12
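Example 1.6
The continuous random variable X has cumulative distribution function F(x) given by
\[ F(x) = \begin{cases} 0 & \text{for } x < 2 \\ \dfrac{x}{4} - \dfrac{1}{2} & \text{for } 2 \le x \le 6 \\ 1 & \text{for } x > 6. \end{cases} \]
Find the PDF f(x).
Solution
\[ f(x) = \frac{d}{dx}F(x) \]
\[ f(x) = \begin{cases} \dfrac{d}{dx}F(x) = 0 & \text{for } x < 2 \\ \dfrac{d}{dx}F(x) = \dfrac{1}{4} & \text{for } 2 \le x \le 6 \\ \dfrac{d}{dx}F(x) = 0 & \text{for } x > 6. \end{cases} \]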
1.4 Finding the PDF of a function of a continuous random variable
The cumulative distribution function provides you with a stepping stone between the PDF of a continuous random variable and that of a function of that variable. Example 1.7 shows how it is done.
Example 1.7
A company makes metal boxes to order. The basic process consists of cutting four squares off the corners of a sheet of metal, which is then folded and welded along the joins. Consequently, for every box there are four square offcuts of waste metal.
Figure 1.13
The company is looking for ways to cut costs and the designers wonder if anything can be done with these square pieces. They decide in the first place to investigate the distribution of their sizes. A survey of the large pile in their scrap area shows that they vary in length up to a maximum of 2 decimetres.
It is suggested their lengths in decimetres can be modelled as a continuous random variable L with probability density function
\[ f(l) = \begin{cases} \dfrac{3}{16}(4 - l^2) & \text{for } 0 \le l \le 2 \\ 0 & \text{otherwise.} \end{cases} \]
Assume this model to be accurate.
(i) Find the cumulative distribution function for the length of a square.
(ii) Hence derive the cumulative distribution function for the area of a square.
(iii) Find the PDF for the area of a square.
(iv) Sketch the graphs of the probability density functions and the cumulative distribution functions of the length and the area.
(v) Find the mean area of the square offcuts when making a box.

Solution
(i) The CDF is
\[ F(l) = \int_0^l \frac{3}{16}(4 - u^2)\,du = \left[\frac{3u}{4} - \frac{u^3}{16}\right]_0^l = \frac{3l}{4} - \frac{l^3}{16} \quad \text{for } 0 \le l \le 2. \]
Note the use of u as a dummy variable.
(ii) The area, A, of a square is \( A = L^2 \), so \( H(a) = P(A \le a) = P(L \le \sqrt{a}) = F(\sqrt{a}) \).
Since \( F(l) = \dfrac{3l}{4} - \dfrac{l^3}{16} \) for 0 ≤ l ≤ 2,
\[ H(a) = \begin{cases} 0 & \text{for } a < 0 \\ \dfrac{3a^{1/2}}{4} - \dfrac{a^{3/2}}{16} & \text{for } 0 \le a \le 4 \\ 1 & \text{for } a > 4. \end{cases} \]
(iii) The PDF for the area of a square is found by differentiating H(a).
\[ h(a) = \begin{cases} \dfrac{3}{8a^{1/2}} - \dfrac{3a^{1/2}}{32} & \text{for } 0 < a \le 4 \\ 0 & \text{otherwise.} \end{cases} \]
(iv) The graphs of the PDFs of the length and the area are shown in Figure 1.14.
Figure 1.14
(v)
\[ \text{Mean} = E(A) = \int_0^4 a\,h(a)\,da = \int_0^4 \left(\frac{3a^{1/2}}{8} - \frac{3a^{3/2}}{32}\right) da = \left[\frac{a^{3/2}}{4} - \frac{3a^{5/2}}{80}\right]_0^4 = 0.8 \]
This could also have been found as the mean of a function of a continuous random variable, using the general result
\[ E[g(X)] = \int g(x)\,f(x)\,dx \]
where x is the length (not the area) of one of the squares. In this case, \( g(x) = x^2 \), \( f(x) = \tfrac{3}{16}(4 - x^2) \) and 0 ≤ x ≤ 2, giving
\[ E[g(X)] = \int_0^2 x^2 \cdot \frac{3}{16}(4 - x^2)\,dx = 0.8 \]
i.e. the same answer.
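The agreement between the two routes to the mean area can be confirmed numerically. The sketch below is a hedged illustration, assuming the model PDF for the length is f(l) = (3/16)(4 - l²) on 0 ≤ l ≤ 2 and that the PDF of the area is h(a) = 3/(8√a) - 3√a/32 on 0 < a ≤ 4. The helper `simpson` and the other names are hypothetical.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def f(l):
    """Assumed PDF of the length L: (3/16)(4 - l^2) on 0 <= l <= 2."""
    return 3 * (4 - l**2) / 16


def a_times_h(a):
    """a * h(a) written out so that it can be evaluated safely at a = 0."""
    return 3 * math.sqrt(a) / 8 - 3 * a**1.5 / 32


# Route 1: integrate a * h(a) over the range of the area, 0 to 4.
mean_from_area_pdf = simpson(a_times_h, 0, 4)

# Route 2: treat the area as g(L) = L^2 and integrate g(l) f(l) over 0 to 2.
mean_from_length_pdf = simpson(lambda l: l**2 * f(l), 0, 2)

print(round(mean_from_area_pdf, 3), round(mean_from_length_pdf, 3))
```

The product a·h(a) is coded in simplified form because h(a) itself is unbounded as a approaches 0; the simplification avoids a division by zero at the lower limit.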
WMS 1 continuous random variable has PDF Aa) where
yal? OSes
= lo otherwise.
lil Find BQ9,
lil Find the cumulative distribution function, F(a).
lil) Find PO = x= 2) using
fa) ee)
tb) ax)
and shout your answer isthe same by each method.
2 The continuous random variable U has PDF flu) where
iu for Sus 8
Of Shem
[il Find the value of &
lil Sketeh fl.
(ill Find FQ.
(iv) Sketch the graph of F().
3. A continuous random variable X has PDF fls) where
xt for text
cafe
Ui Find the ae oF
(i) Find Fe
lil) Find the median of X
Ui) Find the mode of X
‘The continuous random variable X has PDF fx) given by
&
£(@)=]@+0"
0 otherwise,
fors=0
where kis a constant
li) Show that k = 3, and find the cumulative distribution fiction,
Ui) Pid abo he vue of: sock ae C=) ~ Z
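The continuous random variable X has CDF given by
\[ F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 2x - x^2 & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1. \end{cases} \]
(i) Find P(X > 0.5).
(ii) Find the value of q such that \( P(X < q) = \tfrac{1}{4} \).
(iii) Find the PDF f(x) of X and sketch its graph.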
he coins nom able has PDR) hea by
ioms) e053
w-fr? ee
case
Show tat b= yan find the we of (8) and Y()
Find the cumulative distribution function for X,and verify by calculation
that the median value of Xis between 1.04 and 1.05.
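A continuous random variable X has PDF f(x) where
\[ f(x) = \begin{cases} 6x(1 - x) & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases} \]
Find the mean of X and show that the variance of X is 0.05.
Show that F(x), the probability that X ≤ x (for any value of x between 0 and 1), satisfies
\[ F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 3x^2 - 2x^3 & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1. \end{cases} \]
Use this result to show that \( P(|X - \mu| < \sigma^2) = 0.1495 \).
What would this probability be if, instead, X were normally distributed?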
‘The continous random vatable X has PDF (x) given by
{e for OSS 1
f(x),
0 otherwise"
(i) Find Fes)
‘The contnnonn rnd varale Vi given by Y= 30, The cumulative
isin incon oF Ys denoted HG 1
(ii) Find HQ).
(iii) Find hy).
liv) Find P(X < 0.5).
(w) Find POY < 05)
The continuous random viable X has PDE Qe) given by
; =po ford 0.
(v)_ Find J(2) where J(2) i the cumulative distribution faction of Z.
(wi) Find
vi) Find PZ < 3)
‘The continous random vacable X has PDF ff) given by
02 for30 beats
pcerene|taner
2G fte)de
va(a)= Joereyas—[E007
the mean of Xs the fr which
J tonae and f11e)40=05
‘he mode of Xi che valor which ) ha gress
mile
‘proba deity Scton maybe di peeve
2 IEIN] va mein of Xen the expectation and vain of Xe
£(@(%0) = Joinex
var(a(X))= Ji@te)*A(=) d= —(B(60)"
2 is he probaity density faction ofX then the cursive
disrbation finaion (CDF) of Xis Fa) where
Bp) athe ei
fey= re
for the median, m, Fin) = 05.
4 Given that fla is the probability density function of X, you can find
that ofa related variable (eg. ¥ = X?.
‘To do this you need fist to find the cumulative distribution function
lof X and use this to find that of Y, You can then differentiate the
ccuntlative distribution function of Yt find fly), the probability
density function of ¥."Now that you have finished this chapter, you should be able to
Tos probity sty Fanction chen bs dean ae
fen aren en eo
esr erag aye tote eee
re ce ere etn ae eee
2(X) is a fiction of X
se the general result Var(e(X)) = J (e(s))" A(x) de—[E(@(X))}°
ee et cea Ceca ae
random vibe X and gies foci of
unlettnd aod ww the aor Borweea the pot deziey
fineion (PD and the culate dab ncn (CDF)
sw: aPDF or CDF'ts enlace pobabiis
ea EDRera CDF io eae the median another percentiles
eee
09 @ 0 uojouny @ 0 Jag ayy Bupuly |
Inference using normal and t-distributions

Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.
R. A. Fisher (1890-1962)

2.1 Interpreting sample data using the t-distribution
Students find new bat
Two students and a lecturer have found their way into the textbooks. On a recent field trip they discovered a small colony of a previously unknown bat living in a cave.
'Somewhere in Northern India' is all that Shakila Mahadavan, 20, would say about its location. 'We don't want the general public disturbing the bats or worse still catching them for specimens,' she explained.
The other two members of the group, lecturer Jaswinder Pal and 21-year-old Vijay Kumar, showed lots of photographs of the bats as well as pages of measurements that they had gently made on the few bats they had caught before releasing them back into their cave.

The measurements included the weights of eight bats which were identified as adult males:
156, 132, 160, 142, 145, 138, 151, 144
From these figures, the team want to estimate the mean weight of an adult male bat, and 95% confidence limits for their figure.
It is clear from the newspaper report that these are the only measurements available. All that is known about the parent population is what can be inferred from these eight measurements. You know neither the mean nor the standard deviation of the parent population, but you can estimate both.
The mean is estimated to be the same as the sample mean:
\[ \frac{156 + 132 + 160 + 142 + 145 + 138 + 151 + 144}{8} = 146. \]
When it comes to estimating the standard deviation, start by finding the sample variance
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]
and then take the square root to find the standard deviation, s.
The use of (n - 1) as divisor illustrates the important concept of degrees of freedom.
The deviation is the difference (+ or -) of the value from the mean. In this example the mean is 146.
The deviations of the eight numbers are as follows.
156 - 146 = 10
132 - 146 = -14
160 - 146 = 14
142 - 146 = -4
145 - 146 = -1
138 - 146 = -8
151 - 146 = 5
144 - 146 = -2
These eight deviations are not independent: they must add up to zero because of the way the mean is calculated. This means that when you have worked out the first seven deviations it is inevitable that the final one has the value it does (in this case -2). Only seven values of the deviation are independent and, in general, only (n - 1) out of the n deviations from the sample mean are independent.
Consequently, there are n - 1 free variables in this situation. The number of free variables within a system is called the degrees of freedom and denoted by ν.
You need to know the degrees of freedom in many situations where you are calculating confidence intervals or conducting hypothesis tests.
A particular value of the sample variance is denoted by s², the associated random variable by S².
So the sample variance is worked out using divisor (n - 1). The resulting value is very useful because it is an unbiased estimate of the parent population variance.
In the case of the bats, the estimated population variance is
\[ s^2 = \frac{100 + 196 + 196 + 16 + 1 + 64 + 25 + 4}{7} = \frac{602}{7} = 86 \]
The numbers on the top line, 100, 196 and so on, are the squares of the deviations.
The corresponding value of the standard deviation is
\[ s = \sqrt{86} = 9.27. \]
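The arithmetic for the bat data can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the divisor (n - 1). This is a brief sketch; the variable names are chosen for illustration only.

```python
import statistics

# Weights of the eight adult male bats, as listed in the text
weights = [156, 132, 160, 142, 145, 138, 151, 144]

mean = statistics.mean(weights)

# Deviations from the sample mean; they must sum to zero,
# which is why only n - 1 of them are free to vary.
deviations = [w - mean for w in weights]

# statistics.variance and statistics.stdev divide by n - 1
sample_variance = statistics.variance(weights)
sample_sd = statistics.stdev(weights)
degrees_of_freedom = len(weights) - 1

print(mean, sum(deviations), sample_variance, round(sample_sd, 2), degrees_of_freedom)
```

Using `statistics.pvariance` instead would divide by n and give the variance of these eight values treated as a complete population, which is not the unbiased estimate described above.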
Calculating the confidence intervals
[Returning to the prablem of estimating the mean weight ofthe bats, you
‘now know that
Fad, 2 = 86,
[Before starting on further calculations, there ae some important and related
points to notice
41 This is a small sample. It would have been much beter if they had
managed to catch and weigh more chan eight bats.
=9.27 and y=8-1=7
2 The true parent standard deviation, σ, is unknown and, consequently, the standard deviation of the sampling distribution given by the central limit theorem, $\sigma/\sqrt{n}$, is also unknown.
3 In situations where the sample is small and the parent standard deviation or variance is unknown, there is little more that can be done unless you can assume that the parent population is normal. (In this case that is a reasonable assumption, the bats being a naturally occurring population.) If you can assume normality, then you may use the t-distribution, estimating the value of σ from your sample.
4 It is possible to test whether a set of data could reasonably have been taken from a normal distribution by using normal probability graph paper. The method involves making a cumulative frequency table and plotting points on a graph with specially chosen axes. If the graph obtained is approximately a straight line, then the data could plausibly have been drawn from a normal population. Otherwise a normal population is unlikely.
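The idea behind normal probability paper in point 4 can be imitated numerically. The sketch below is not the graph-paper method itself: it assumes the plotting positions (i − 0.5)/n, uses a hypothetical sample, and measures straightness by the correlation between the sorted data and the matching standard normal quantiles.

```python
from statistics import NormalDist

def normal_scores(n):
    """Standard normal quantiles at the plotting positions (i - 0.5) / n."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation coefficient, written out in full."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical sample, for illustration only.
sample = sorted([12.1, 9.8, 11.4, 10.6, 13.0, 10.1, 11.9, 11.0])
scores = normal_scores(len(sample))
r = correlation(scores, sample)
print(round(r, 3))   # a value close to 1 suggests a roughly straight plot
```

Plotting `sample` against `scores` gives the same picture as the graph paper; with a sample this small, only a badly curved plot would count as evidence against normality.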
The t-distribution looks very like the normal distribution. Its exact shape depends on the number of degrees of freedom, ν, and, indeed, for large values of ν it is little different from it. The larger the value of ν, the closer the t-distribution is to the normal. Figure 2.1 shows the normal distribution and t-distributions with ν = 2 and ν = 10.
▲ Figure 2.1
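The comparison in Figure 2.1 can be reproduced numerically. The sketch below uses the standard formula for the density of the t-distribution, which is not derived in this chapter, and evaluates it alongside the standard normal density.

```python
import math

def t_pdf(x, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large nu.
    log_coeff = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coeff = math.exp(log_coeff) / math.sqrt(nu * math.pi)
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Columns: x, t with nu = 2, t with nu = 10, standard normal.
for x in (0.0, 1.0, 2.0, 3.0):
    print(x, round(t_pdf(x, 2), 4), round(t_pdf(x, 10), 4), round(normal_pdf(x), 4))
```

The t-distributions are lower at the centre and higher in the tails than the normal, and the gap narrows as ν increases.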
Historical note
William S. Gosset was born in Canterbury, UK, in 1876. After studying both mathematics and chemistry at Oxford, he joined the Guinness breweries in Dublin as a scientist. He found that an immense amount of statistical data was available, relating the brewing methods and the quality of the ingredients, particularly barley and hops, to the finished product. Much of this data took the form of samples, and Gosset developed techniques to handle them, including the discovery of the t-
▲ Figure 2.3
(ii) The confidence interval does not contain 95 minutes (the time taken by the train). Therefore there is sufficient evidence to suggest that the journey time by bus is different from that by train, and that it is, in fact, less.
2.2 Using the t-distribution for paired samples
The ideas developed in the last few pages can also be used in constructing confidence intervals for the difference in the means of paired data. This is shown in the next example.
In an experiment on group behaviour, 12 subjects were each asked to hold one arm out horizontally while supporting a 2 kg weight, under two conditions:
» while together in a group
» while alone with the experimenter.
The times, in seconds, for which they were able to support the weight under the two conditions were recorded as follows.
Subject      A    B    C    D    E    F    G    H    I    J    K    L
Group        …   71   72   53   71   48   85   72   82   54   70   73
Alone        …   72   81   35   56   39    …   66   38   60   74   52
Difference   …   −1   −9   18   15    9    …    6   44   −6   −4   21

The 'true difference' means the difference in the times in the population as a whole. You cannot know with certainty the differences in the population but you can infer from the sample a confidence interval for the population.
Find a 90% confidence interval for the true difference between 'group' times and 'alone' times. You may assume that the differences are normally distributed. Does your result provide evidence that there is any difference in the times in the population as a whole?
Solution
The sample comprises the 12 differences.
Mean: $\bar{d} = 10.67$
Standard deviation: $s \approx 15.2$
Degrees of freedom: $\nu = 12 - 1 = 11$
Given the assumption that the differences are normally distributed, you may use the t-distribution.
For ν = 11, the two-tailed critical value from the t-distribution at the 10% level of significance is 1.796.
The 90% symmetrical confidence interval for the mean difference between the 'group' and 'alone' times is
$$\bar{d} - 1.796 \times \frac{s}{\sqrt{12}} \quad \text{to} \quad \bar{d} + 1.796 \times \frac{s}{\sqrt{12}}$$
that is, 2.77 to 18.57.
Since the confidence interval does not contain zero, there is evidence that there is a difference in the times in the population as a whole.
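The steps of this solution can be sketched as a small Python function. The data in the sketch are hypothetical (four pairs, not the twelve subjects above), and the critical value is supplied by the caller from t tables rather than computed.

```python
import math
from statistics import mean, stdev

def paired_confidence_interval(first, second, t_critical):
    """Confidence interval for the mean difference of paired observations.

    t_critical is the two-tailed critical value read from t tables for
    len(first) - 1 degrees of freedom; it is supplied by the caller.
    """
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    d_bar = mean(differences)
    s = stdev(differences)            # uses divisor n - 1
    half_width = t_critical * s / math.sqrt(n)
    return d_bar - half_width, d_bar + half_width

# Hypothetical paired data (not the data of the example above).
group = [52, 61, 58, 70]
alone = [50, 57, 52, 62]
# Two-tailed 10% critical value for nu = 3, from t tables.
low, high = paired_confidence_interval(group, alone, t_critical=2.353)
print(round(low, 2), round(high, 2))   # 1.96 8.04
```

Only the differences enter the calculation, so the large variation between subjects has no effect on the width of the interval.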
2.3 Hypothesis testing on a sample mean using the t-distribution
At the beginning of this chapter, you met the t-distribution. This is the distribution of a sample mean when the parent population is normally distributed but the standard deviation of the parent population is unknown and has to be estimated using the sample standard deviation, s. In addition to finding a confidence interval, you can also carry out a t-test (a hypothesis test based on the t-distribution).
Tests are being carried out on a new drug designed to relieve the symptoms of the common cold. One of the tests is to investigate whether the drug has any effect on the number of hours that people sleep.
The drug is given in tablet form one evening to a random sample of 16 people who have colds. The number of hours they sleep may be assumed to be normally distributed and is recorded as follows.
…    6.7  3.3  7.2  …    …    …    …
6.4  6.9  7.0  7.8  6.7  7.2  7.6  7.9
There is also a large control group of people who have colds but are not given the drug. The mean number of hours they sleep is 6.6.
(i) Use these data to set up a 95% confidence interval for the mean length of time somebody with a cold sleeps after taking the tablet.
(ii) Carry out a test, at the 1% significance level, of the hypothesis that the new drug has an effect on the number of hours a person sleeps.
Solution
(i) For the given data, n = 16, ν = 16 − 1 = 15, $\bar{x}$ = 7.094, s = 1.276.
For a 95% confidence interval, with ν = 15, k = 2.131 (from tables).
The confidence limits are given by
$$\bar{x} \pm k\frac{s}{\sqrt{n}} = 7.094 \pm 2.131 \times \frac{1.276}{\sqrt{16}}$$
So the 95% confidence interval for μ is 6.41 to 7.77.
▲ Figure 2.4
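As a check on part (i), the confidence limits can be computed from the summary statistics. This sketch takes the value of k from tables, as the solution does; the function name is illustrative only.

```python
import math

def t_confidence_interval(x_bar, s, n, k):
    """Limits x_bar ± k * s / sqrt(n); k comes from t tables for n - 1 d.f."""
    half_width = k * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

low, high = t_confidence_interval(x_bar=7.094, s=1.276, n=16, k=2.131)
print(round(low, 2), round(high, 2))   # 6.41 7.77
```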
(ii) H₀: There is no change in the mean number of hours sleep, μ = 6.6.
H₁: There is a change in the mean number of hours sleep, μ ≠ 6.6.
Two-tailed test at the 1% significance level.
For this sample, n = 16, ν = 16 − 1 = 15, $\bar{x}$ = 7.094, s = 1.276.
The critical value for t, for ν = 15, at the 1% significance level, is found from tables to be 2.947.
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{7.094 - 6.6}{1.276/\sqrt{16}} = 1.55$$
The test statistic is 1.55. This is to be compared with 2.947, the critical value for the 1% significance level.
Since 1.55 < 2.947, there is no reason at the 1% significance level to reject the null hypothesis.
There is insufficient evidence to suggest that the mean number of hours sleep is different when people take the drug.
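The test in part (ii) can be sketched in the same way. The critical value 2.947 is taken from tables, as in the solution, and the function name is illustrative.

```python
import math

def one_sample_t(x_bar, mu_0, s, n):
    """Test statistic (x_bar - mu_0) / (s / sqrt(n))."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

t = one_sample_t(x_bar=7.094, mu_0=6.6, s=1.276, n=16)
critical_value = 2.947      # two-tailed, 1% level, nu = 15, from tables

# Two-tailed test: reject H0 only if |t| exceeds the critical value.
reject_null = abs(t) > critical_value
print(round(t, 2), reject_null)   # 1.55 False
```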
If you have access to a computer, you can use a spreadsheet (see Figure 2.5) to do all of the calculations for this test using the following steps.
1 Enter the data. In this case into cells B2 to B17.
2 Use the spreadsheet functions provided by your spreadsheet, for example =AVERAGE and =STDEV, to find the mean and sample standard deviation.
3 Calculate the t value using the formula =(B18-6.6)/(B19/SQRT(16)).
4 Use the spreadsheet function provided by your spreadsheet, for example =T.INV.2T(0.01,15), to calculate the critical value.
5 You can also find the p-value using the spreadsheet function provided by your spreadsheet, for example =T.DIST.2T(B20,15).
In this case, the mean and standard deviation are in cells B18 and B19.
▲ Figure 2.5
1 The mean of a random sample of seven observations of a normally distributed random variable X is 132.6. Based on these seven observations, an unbiased estimate of the parent population variance σ² is 148.84.
(i) Explain why an estimate of the standard error is given by 4.61.
(ii) Show that a 95% confidence interval for the mean μ of X is 121.3 to 143.9.
2 The weights in grams of six beetles of a particular species are as follows.
123 97 118 104 12 124
(i) Calculate the sample mean and show that an estimate of the sample variance is 1.291.
(ii) Show that a 90% confidence interval for the mean μ of X is 10.32 to 12.18.
3 An aptitude test for entrance to university is designed to produce scores that may be modelled by the normal distribution. In early testing, 15 students from the appropriate age group are given the test. Their scores (out of 500) are as follows.
321445219378 4072895
2764636565298
(i) Use these data to estimate the mean and standard deviation to be expected for students taking this test.
(ii) Construct a 95% confidence interval for the mean.
4 A fruit farmer has a large number of almond trees, all of the same variety and of the same age. One year, he wishes to estimate the mean yield of his trees. He collects all the almonds from eight trees and records the following weights (in kg).
cr a
(i) Use these data to estimate the mean and standard deviation of the yields of all the farmer's trees.
(ii) Construct a 95% confidence interval for the mean yield.
(iii) What statistical assumption is required for your procedure to be valid?
(iv) How might you select a sample of eight trees from those growing in a large field?
5 A forensic scientist is trying to decide whether a man accused of fraud could have written a particular letter. As part of the investigation she looks at the lengths of sentences used in the letter. She finds them to have the following numbers of words.
7 18 2 4 18 16 14 16 16 2 2 19
(i) Use these data to estimate the mean and standard deviation of the lengths of sentences used by the letter writer.
(ii) Construct a 90% confidence interval for the mean length of the letter writer's sentences.
(iii) What assumptions have you made to obtain your answer?
(iv) A sample of sentences written by the accused has mean length 26 words. Does this mean he is innocent?
6 A large company is investigating the number of incoming telephone calls to its exchange in order to determine how many telephone lines it should have. On Sundays very few calls are received because the office is closed. During March one year, the number of calls received each day was recorded, as follows.
655 | 661 | 599
‘08 | 650 | 901
[a [sar [oe [rar [ose To
[saw | or | ia | 706 | eo
a
(i) What day of the week was 1 March?
(ii) Which of the data do you consider relevant to the company's research and why?
(iii) Construct a 95% confidence interval for the number of incoming calls per weekday.
(iv) Your calculation is criticised on the grounds that your data are discrete and so the underlying distribution cannot possibly be normal. How would you respond to this criticism?
7 A tyre company is trying out a new tread pattern, which it is hoped will result in the tyres giving greater distance. In a pilot experiment, 12 tyres are tested; the mileages (×1000 miles) at which they are condemned are as follows.
6 BN BE © 0 B81 2 6 B B
(i) Construct a 95% confidence interval for the mean distance that a tyre travels before being condemned.
(ii) What assumptions, statistical and practical, are required for your answer to part (i) to be valid?
8 A large fishing boat made a catch of 500 mackerel from a shoal. The total mass of the catch was 320 kg. The standard deviation of the mass of individual mackerel is known to be 0.06 kg.
(i) Find a 99% confidence interval for the mean mass of a mackerel in the shoal.
An individual fisherman caught ten mackerel from the same shoal. These had masses (in kg) of
1.04 0.94 0.92 0.85 0.85 0.70 0.68 0.62 0.61 0.59
(ii) From these data only, use your calculator to estimate the mean and standard deviation of the masses of mackerel in the shoal.
(iii) Assuming the masses of mackerel are normally distributed, use your results from part (ii) to find another 99% confidence interval for the mean mass of a mackerel in the shoal.
(iv) Give two statistical reasons why you would use the first limits you calculated in preference to the second limits.
9 A new computerised job-matching system has been developed that finds suitably skilled applicants to fit notified vacancies. It is hoped that this will reduce unemployment rates, and a trial of the system is conducted. The unemployment rates in each area just before the introduction of the system and after one month of its operation are recorded in the table below.
s[e[?
46 [na [77
Ty?
[ae
ZI EH
(i) Find a 90% confidence interval for the true difference between the two rates of unemployment.
(ii) Does your confidence interval provide evidence that there is a difference in the rates after the new system is introduced?
(iii) Do you think the assumptions required to construct the confidence interval are justified here?
10 Two timekeepers at an athletics track are being compared. They each time the nine sprints one afternoon.
(i) Find a 99% confidence interval for the true difference between the times recorded by the two timers. The times they record are listed below.
Sprint     1      2      3      4      5      6      7      8      9
Timer 1    9.68   10.01  9.62   21.90  20.70  20.90  42.30  43.91  43.96
Timer 2    9.66   9.99   9.44   22.00  20.82  20.58  42.39  44.27  44.22
(ii) Do you think that the two timers are equivalent on average?
(iii) Are the assumptions appropriate for a confidence interval based on the t-distribution justified in this case?
11 Fourteen marked rats were timed twice as they ran through a maze. In one condition, they had just been fed; in the other they were hungry.
(i) Find a 95% confidence interval for the true difference between the rats' times when they are fed and when they are hungry. The data below give the rats' times in each condition.
Rat   A   B   C   D   E   F   G   H   I   J   K   L   M   N
      30  31  25  23  50  26  14  27  31  39  38  39  44  30
      29  18  14  27  37  34  15  22  29  18  20  10  30  52
(ii) Do you think the assumptions required to construct the confidence interval are justified here?
(iii) Half of the rats were made to run the maze first when hungry and half ran it first when fed. Why did the experimenter do this?
12 Cans of oil are supposed to contain at least 330 ml. Salma thinks that the average content is less than this. She measures the contents of a random sample of ten cans, with the following results (in ml).
326.5  331.2  327.9  329.8  330.4
327.6  329.3  330.3  …  …
(i) What distributional assumption is necessary in order to carry out a t-test to check Salma's suspicion?
(ii) Carry out this test at the 5% significance level.
(iii) If Salma picked 10 cans from a single box of 24, would the test have still been valid? Explain your answer.
13 In a certain country, past research showed that in the average married couple, the man was 7 cm taller than his wife. A sociologist believes that, with changing roles, people are now choosing marriage partners nearer their own height. She has measured the heights of 12 couples. Her results are shown in the table below.
Couple         1      2      3      4      5      6
Husband (cm)   198.2  174.2  192.8  163.6  183.2  …
Wife (cm)      178.4  165.1  191.9  156.3  178.7  163.0

Couple         7      8      9      10     11     12
Husband (cm)   180.6  173.3  166.8  171.9  175.5  …
Wife (cm)      …      170.2  164.2  166.2  176.9  158.8
(i) Test at the 5% significance level the hypothesis that men are on average less than 7 cm taller than their wives.
(ii) Explain clearly what assumptions you make in this case.
14 The speed v at which a javelin is thrown by an athlete is measured in km h⁻¹. The results for 10 randomly chosen throws are summarised by
Σv = 1108, Σ(v − v̄)² = 3339,
where v̄ is the sample mean.
(i) Stating any necessary assumption, calculate a 99% confidence interval for the mean speed of a throw.
The results for a further 5 randomly chosen throws are now combined with the above results. It is found that the sample variance is smaller than that used in part (i).
(ii) State, with reasons, whether a 95% confidence interval calculated from the combined 15 results will be wider or less wide than that found in part (i).
Cambridge International AS & A Level Further Mathematics
9231 Paper 23 Q7 November 2012
15 In a crossword competition the times, x minutes, taken by a random sample of 6 entrants to complete a crossword are summarised as follows.
Yee2i09 Dome =i812
‘rice cen rt hc wn
jist, Cladus 99 outdenosineral
‘acon fiat th tnd even of ia pelos een 2
pee mines Find eb nae mpl setae woul lad 295%
eSatncewecrl for of wath nS mes
Sate ena A deer Dae
stat ge HO ie
16 A random sample of 8 observations of a normal random variable X gave the following summarised data, where x̄ denotes the sample mean.
Dros Lom
Test, at the 5% significance level, whether the population mean of X is greater than 45.
Calculate a 95% confidence interval for the population mean of X.
Cambridge International AS & A Level Further Mathematics
9251 Paper 21 QD fe 2012
17 A random sample of 10 observations of a normally distributed random variable X gave the following summarised data, where x̄ denotes the sample mean.
Yee04 Leena)
Test, at the 10% significance level, whether the population mean of X is less than 7.5.
Cambridge International AS & A Level Further Mathematics
9231 Paper 21 Q7 November 2013
18 A random sample of 8 sunflower plants is taken from the large number grown by a gardener, and the heights of the plants are measured. A 95% confidence interval for the population mean, μ metres, is calculated from the sample data as 1.17 < μ < 2.03. Given that the height of a sunflower plant is denoted by x metres, find the values of Σx and Σx² for this sample of 8 plants.
Cambridge International AS & A Level Further Mathematics
9231 Paper 21 Q7 June 2015

2.4 Using the t-distribution with two samples
Have you noticed how time often seems to pass more slowly after lunch?
If time passes more slowly, one minute of real time should seem longer, so if you ask people to estimate when a minute appears to have elapsed, the real time elapsed will be less.
You could ask the question: 'Will the mean real time elapsed when one minute appears to have elapsed be less after lunch than before?'
In this example you are interested, not in what the mean value of a random variable is, but in what the difference between the mean values is in two different situations. Statistical problems giving rise to different versions of this general question are the topic of the next few sections.
Lunch time
Find a group of volunteers and approach them before lunch. Give each of them a starting signal and ask them to say when one minute has elapsed. Record the real time elapsed. You will then need to find a second group of volunteers to approach after lunch. You will need reasonably large groups to get useful results.
1 What are the advantages and disadvantages of using separate groups of people for the before-lunch and after-lunch times?
2 What are the advantages and disadvantages of instead conducting an experiment in which the same people are asked before and after lunch and only the difference in their real times recorded?
The volunteers in research projects are called subjects and 'before lunch' and 'after lunch' are the two conditions in which testing occurs. An experiment such as the one described above, where a different group of subjects is tested in each of the two conditions, is called an unpaired design; this is in contrast to a paired design you met earlier in this chapter, where the same set of subjects is tested in both conditions.
The members of a maths class were asked one morning to check the time shown by their watches, then look away and, when they estimated that a minute had elapsed, to check their watches again to see how long had in fact elapsed. The same procedure was followed with another class from the same year group, that afternoon. The back-to-back stem-and-leaf diagram below shows the results.
Morning class          Afternoon class
(24 students)          (22 students)
             | 2 | 8
           2 | 3 | 4
         556 | 3 | 6
             | 4 | 440
          78 | 4 | 999665
    11333344 | 5 | 22111
     5557777 | 5 | 8775
         144 | 6 | 3
Key: | 6 | 3 represents 63 seconds
▲ Figure 2.6
This experiment gives a set of data with which you could investigate the question asked at the start of this section. This experiment has an unpaired design: two separate groups of subjects are used in the two conditions.
This section uses the data given in Figure 2.6 to work through the process of hypothesis testing in the context of an unpaired design. If you have carried out your own investigation, you might find it helpful to repeat the calculations using your data.
You cannot look here at the difference between a before-lunch and an after-lunch time for a particular person, but you can look at the difference between the mean before-lunch time and the mean after-lunch time. In fact, you can make the following hypotheses:
H₀: There is no difference between the mean of people's estimates of one minute before and after lunch.
H₁: After lunch, the mean of people's estimates of one minute tends to be shorter than before lunch.
You can then use as your sample statistic the difference between the before-lunch sample mean and the after-lunch sample mean. You need to calculate the distribution of this sample statistic on the assumption that the null hypothesis is true.
The test statistic and its distribution
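Assume that each before-lunch estimate is an independent random variable $X_i$ ($i = 1, \ldots, 24$) with the normal distribution $N(\mu_x, \sigma_x^2)$ and each after-lunch estimate is an independent random variable $Y_j$ ($j = 1, \ldots, 22$) with normal distribution $N(\mu_y, \sigma_y^2)$. You are also making the assumption that each X is independent of each Y. This is a plausible assumption; it merely requires the independence of the two samples taken.
Recall that the mean of a sample of size n from a normal distribution $N(\mu, \sigma^2)$ has distribution
$$N\left(\mu, \frac{\sigma^2}{n}\right)$$
In this case, therefore, the mean of the 24 before-lunch estimates has distribution
$$\bar{X} \sim N\left(\mu_x, \frac{\sigma_x^2}{24}\right)$$
and, similarly, the mean of the 22 after-lunch estimates has distribution
$$\bar{Y} \sim N\left(\mu_y, \frac{\sigma_y^2}{22}\right)$$
Next you need a result that if X has distribution $N(\mu_1, \sigma_1^2)$ and Y has distribution $N(\mu_2, \sigma_2^2)$, with X and Y independent, then (X − Y) has distribution
$$N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$$
Here, the distribution of the difference of the two sample means is therefore
$$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y, \frac{\sigma_x^2}{24} + \frac{\sigma_y^2}{22}\right)$$
The null hypothesis then states that both means are equal, i.e. $\mu_x = \mu_y$. If the null hypothesis is true,
$$\bar{X} - \bar{Y} \sim N\left(0, \frac{\sigma_x^2}{24} + \frac{\sigma_y^2}{22}\right)$$
Unfortunately you do not know $\sigma_x^2$ or $\sigma_y^2$, so you will want, as you have in earlier work, to replace these unknown values with sample estimates. It might seem most natural to use sample estimates for the two unknown variances, but in fact it turns out that it is hard to make any progress in calculating the distribution. This chapter, therefore, only deals with the case where you can assume that the variances in the two conditions are equal; here this means that $\sigma_x^2 = \sigma_y^2 = \sigma^2$, so that the before- and after-lunch estimates have the same variance.
So that
$$\bar{X} - \bar{Y} \sim N\left(0, \sigma^2\left(\frac{1}{24} + \frac{1}{22}\right)\right)$$
or
$$\frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\frac{1}{24} + \frac{1}{22}}} \sim N(0, 1) \qquad ①$$
To estimate σ, you can use the pooled variance estimator from the two samples,
$$S^2 = \frac{(24 - 1)S_x^2 + (22 - 1)S_y^2}{24 + 22 - 2}$$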
where $S_x^2$ and $S_y^2$ are the usual unbiased sample estimators of the population variance. The test statistic
$$\frac{\bar{X} - \bar{Y}}{S\sqrt{\frac{1}{24} + \frac{1}{22}}}$$
which is obtained from ① by replacing the value of σ² with its estimator S², then has a t-distribution, with degrees of freedom equal to that in the pooled variance estimate: (24 + 22 − 2) = 44.
Carrying out the t-test for an unpaired sample
In the example
$\bar{x} = 51.542$, $s_x = 8.797$; $\bar{y} = 47.909$, $s_y = 8.574$
so
$$s^2 = \frac{23 \times 8.797^2 + 21 \times 8.574^2}{44}, \qquad s = 8.691$$
and the value of the test statistic is
$$\frac{51.542 - 47.909}{8.691\sqrt{\frac{1}{24} + \frac{1}{22}}} = 1.416$$
Note: The process described here, a t-test for the difference of two means with unpaired samples, is often called a 2-sample t-test.
The critical region for a one-tailed test in the case of 44 degrees of freedom, at the 5% significance level, is t > 1.680 (this value does not appear in the tables, but can be obtained by interpolation from the values given for 30 and 50 degrees of freedom). Since 1.416 < 1.680, you accept the null hypothesis at this significance level: there is no good evidence that the before-lunch mean and the after-lunch mean of the population as a whole are different.
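The pooled calculation above can be sketched in Python from the summary statistics. The function name is illustrative, and the critical value 1.680 is the interpolated table value quoted in the text rather than something the code computes.

```python
import math

def pooled_t(mean_x, s_x, n_x, mean_y, s_y, n_y):
    """Pooled standard deviation and 2-sample t statistic (equal variances)."""
    df = n_x + n_y - 2
    pooled_var = ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / df
    s = math.sqrt(pooled_var)
    t = (mean_x - mean_y) / (s * math.sqrt(1 / n_x + 1 / n_y))
    return s, t, df

s, t, df = pooled_t(51.542, 8.797, 24, 47.909, 8.574, 22)
print(round(s, 3), round(t, 3), df)    # 8.691 1.416 44
print(t > 1.680)                       # False, so the null hypothesis is accepted
```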
Experiments using a 2-sample t-test
1 Use a reaction timer to decide whether males and females have the same reaction times, or whether older people have slower reactions than young people (you can choose the definition of 'older' to suit the sample you have available), or whether … have quicker reactions.
You can use a ruler as a reaction timer: hold it vertically with the zero mark downwards while the subject holds their thumb and forefinger 2 cm apart at the zero mark of the ruler. You drop the ruler without warning and your subject tries to catch it between thumb and forefinger. The distance, d, in millimetres, through which the ruler has fallen before it is caught can be used to measure the reaction time, t, in seconds, using the formula
$$t = \sqrt{\frac{d}{4900}}$$
(from $d = \frac{1}{2}gt^2$ with g = 9800 mm s⁻²).
2 Are students studying A Level maths better at mental arithmetic than those taking other A Levels? You will need to devise a mental arithmetic test (do you want to test speed or accuracy?) and administer it to a group of A Level maths students and a group of students taking other A Levels. Do not be disappointed by the result. You can adapt this test to suit your prejudices: are A Level geography students better at naming capitals of foreign countries? Are A Level English students better at spelling?
3 Two groups of subjects are each given lists of 25 words. Both groups must run down the list as quickly as possible. Those in the first group tick the words that are in capital letters. (You should make sure that about half of the words, placed randomly in the list, are in capital letters.) The second group ticks the words that rhyme with a target word that you give them. (Make sure that about half of the words, placed randomly in the list, do rhyme with this word.) You then ask the subjects each to write down as many words as they can remember from the list: do not tell the subjects in advance that they will have to do this. Test the hypothesis that the subjects who have looked for rhymes remember more of the words than those who looked for capital letters. Why would it be difficult to run this experiment with a paired design?
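For the ruler-drop timer in experiment 1, the conversion from distance fallen to reaction time can be sketched as below. It assumes the ruler falls freely from rest, so that d = ½gt², and takes g = 9.8 m s⁻²; both are assumptions of this sketch, and the function name is illustrative.

```python
import math

G_MM_PER_S2 = 9800.0   # assumed g = 9.8 m/s^2, expressed in mm/s^2

def reaction_time(distance_mm):
    """Reaction time in seconds from d = (1/2) g t^2, i.e. t = sqrt(2d / g)."""
    return math.sqrt(2 * distance_mm / G_MM_PER_S2)

# Reaction times for a few catch distances in millimetres.
for d in (100, 150, 200):
    print(d, round(reaction_time(d), 3))
```

Because t depends on the square root of d, doubling the distance fallen increases the reaction time by a factor of about 1.4, not 2.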
Assumptions for the 2-sample t-test
The assumptions needed for the 2-sample t-test are quite severe.
1 The two samples must be independent random samples of the populations involved.
Strictly, this requires every possible sample to have an equal probability of being chosen. If you simply picked a group of volunteers, it would, therefore, probably not be a random sample. However, this method is very close to the method often used by academic psychologists when choosing their samples. The hope in choosing a random sample is that the effects of all the irrelevant differences between members of the population that influence the variables you are testing will average out.
2 The random variables measured in the two conditions must:
(i) be normally distributed
(ii) have equal variances in the two conditions.
Are these assumptions justified? The only information you have to help you decide is the two samples: the stem-and-leaf diagram for the data of the example is shown again below.
Morning class          Afternoon class
(24 students)          (22 students)
             | 2 | 8
           2 | 3 | 4
         556 | 3 | 6
             | 4 | 440
          78 | 4 | 999665
    11333344 | 5 | 22111
     5557777 | 5 | 8775
         144 | 6 | 3
Key: | 6 | 3 represents 63 seconds
▲ Figure 2.7
At first sight, the distributions here do not look much like samples from a normal distribution: they are rather obviously negatively skewed. Neither is it clear that they would have come from populations of the same variance. However, these are relatively small samples and it would be unwise to draw any firm conclusions from them about the population distribution from which they are drawn.
> Do you think the assumptions made in the 2-sample t-test are justified in the case of the experiment you carried out?
The underlying logic of hypothesis testing
When you construct the sampling distribution of a test statistic you use:
» a model for the distribution of the random variables involved
» the value given to a parameter of this distribution by the null hypothesis.
In the time-estimation example the construction of the sampling distribution depends on:
» people's estimates of one minute before and after lunch being independent and distributed normally, with a common variance
» the null hypothesis that the difference between the means of their estimates before and after lunch is zero.
The alternative hypothesis, that the difference between these means is greater than zero, gives an alternative range of possible values for the parameter of the distribution, but assumes the same model for the random variables involved.
In the example it was determined (by using pre-calculated tables, in fact), that if
» the model for the random variables was correct
and
» the null hypothesis were true
then a test statistic greater than 1.680 would only arise in a random sample 5% of the time. (The significance level is 5%.)
In the example earlier in this section, you obtained a value of 1.416 and, since this is less than 1.680, the null hypothesis was accepted. Suppose, instead, that you had obtained a value greater than 1.680; say, for example, 1.832. In that case, there would be three possible explanations.
Explanation A
» The model is correct.
» The null hypothesis is false, because the mean difference in before- and after-lunch times is greater than zero.
Explanation B
» The model is correct.
» The null hypothesis is true (or false because the mean difference in before- and after-lunch times is actually less than zero).
However, the sample selected happens to give a value of the test statistic greater than 1.680. The probability of this happening is 0.05 (the significance level) if the null hypothesis is true, or less if the mean difference in before- and after-lunch times is actually less than zero.
Explanation C
» The model is incorrect, because the sampling method does not produce independent estimates for each subject, or because the estimates are not distributed normally in the population, or do not have a common variance.
» The null hypothesis is true or false.
In this case you have no idea how likely it is that the test statistic will have any value at all.
The hypothesis testing methodology is:
» to assume that explanation C is not the case
» to observe that if explanation B was the case then the results obtained would be very unlikely
» and therefore to accept that explanation A is the case.
Thus you reject the null hypothesis and accept the alternative.
However, you should always be aware that the logic that leads you to this conclusion on the basis of the evidence in the sample depends on the correctness of your sampling and distributional assumptions.
2.5 Comparison between paired and 2-sample t-tests
The table below shows summarised data from the experiment you have just been analysing, together with data gathered from a paired experiment using a single sample of twelve people. Each was asked, both before and after lunch, to estimate one minute in the same way as described for the unpaired design.
The test statistic for the paired experiment is 1.829, with 11 degrees of freedom and a critical value of 1.796, so that here the null hypothesis is rejected.
Why do you reject the null hypothesis in the paired case where the sample size is considerably smaller, which, all other things being equal, would usually lead to a less decisive test, as reflected in the larger critical value?
You can see why the opposite appears to have happened if you look at how the test statistics for the two cases are calculated.
Paired: $\dfrac{\bar{d} - 0}{s_d/\sqrt{12}} = 1.829$
Unpaired: $\dfrac{51.542 - 47.909}{8.691\sqrt{\frac{1}{24} + \frac{1}{22}}} = 1.416$
The test statistics for the paired and unpaired calculations have very similar numerators, but the standard error in the denominator is considerably larger in the unpaired calculation, despite the larger sample size in that case.
The crucial point is that there is, for all sorts of reasons, considerable variation amongst people in their reaction times and lunch is only one, relatively small, effect amongst many. Some people will tend to make short estimates in both conditions and some long estimates in both conditions, though in both cases the effect of lunch may be the same.
The paired design enables you to take this into account in a way that the unpaired design cannot because of the way the standard error is estimated.
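The effect on the standard error can be seen with a small numerical sketch. The data below are hypothetical, chosen so that subjects differ widely from one another while the change between conditions is small and consistent; they are not taken from the lunch experiment.

```python
import math
from statistics import mean, stdev

# Hypothetical estimates: subjects differ a lot from one another,
# but each one is 2 or 3 units lower in the second condition.
before = [40, 50, 60, 70]
after = [38, 47, 58, 67]
n = len(before)

# Paired analysis: work with the within-subject differences.
diffs = [b - a for b, a in zip(before, after)]
se_paired = stdev(diffs) / math.sqrt(n)
t_paired = mean(diffs) / se_paired

# Unpaired analysis: pooled variance from the two samples.
pooled_var = ((n - 1) * stdev(before) ** 2 + (n - 1) * stdev(after) ** 2) / (2 * n - 2)
se_unpaired = math.sqrt(pooled_var * (1 / n + 1 / n))
t_unpaired = (mean(before) - mean(after)) / se_unpaired

print(round(se_paired, 3), round(t_paired, 2))       # 0.289 8.66
print(round(se_unpaired, 3), round(t_unpaired, 2))   # 9.042 0.28
```

Both analyses have the same numerator, 2.5, but the unpaired standard error includes the variation between subjects and so is about thirty times larger here.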
Using paired and 2-sample t-tests
It is a characteristic of research by social scientists that they are looking for a small average difference between the values of a particular variable in two different conditions, but that subjects show very substantial variation in the values of this variable within both conditions. In these situations, a 2-sample t-test is not usually very helpful, as it will require a very large sample size to discriminate between the null hypothesis of no difference between the means in the two conditions and the true situation where there is a small difference. Considerable ingenuity is therefore employed in attempting to match subjects so that a paired test can be used to eliminate some of the variation between them and the small difference between the two experimental conditions is not swamped.
In the paired experiment, you used the same subject in each of two
conditions, but this is not necessary. In fact, having taken part in one
experimental condition sometimes makes it impossible to take part in the
second.
For example, if you wish to test the effect on children's intelligence of an
upbringing in families from two different social classes, you could not use the
same child and bring it up twice, nor would a 2-sample t-test be suitable in
this case: the variation in intelligence caused by other factors would swamp
the effect you are looking for.
One possibility is to find pairs of identical twins who are being adopted
at birth and are assigned to adoptive parents of different social classes:
these constitute matched pairs of subjects and you could use a t-test on
the differences between the intelligences of the twins from the two types
of family. Notice that here the matching is perfect in the sense that both
children have identical genetic endowments: the belief implicit in this
experiment is that heredity is a major cause of variation in intelligence and
this effect will be cancelled out by the matching process. Of course, there
will be many differences between the adoptive families other than class, and
it is possible that the variations in intelligence induced by these differences
in upbringing will still swamp the effect being examined. Ideally, you would
want to find identical twins being assigned to families differing only in their
social class, but it is unlikely that you would find enough, if any, examples of
this to conduct the test!
2.6 Testing for a non-zero value of the
difference of two means
You have now used the t-test to examine the null hypothesis that two
different conditions produce the same mean value of some random variable.
The method can also be used in a more general way to test null hypotheses
that suggest that the mean of a random variable, $X$, differs by a given amount
in the two conditions.
Hypothesis: for some given value of $\delta$
$H_0$: The difference between the mean values of $X$ in condition 1 and
condition 2 is $\delta$.
Sample: Two sets of observations of $X$, one set in each condition.
Let $X_1$ and $X_2$ be the random variables in the two conditions and $n_1$ and $n_2$ be
the number of observations under each condition.
Use these values to calculate the sample means $\bar{X}_1$ and $\bar{X}_2$ and the unbiased
pooled-sample estimator, $S^2$, of the population variance.
Then, with $\delta$ the difference asserted by the null hypothesis,
$$\frac{(\bar{X}_1 - \bar{X}_2) - \delta}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$$
provided that the random variable $X$ is distributed normally in the population,
with the same variance in each condition, and that the null hypothesis is true.
The manufacturers of a dieting compound claim that the use of their product
as part of a calorie-counting diet leads to an average extra weight loss of at
least five kilograms in a period of months. An experiment has been carried
out by a consumers' group that doubts this claim.
The hypotheses are:
$H_0$: The mean extra weight loss in a period of months from adding the
dieting compound to a calorie-counting diet is five kilograms.
$H_1$: The mean extra weight loss in a period of months from adding the
dieting compound to a calorie-counting diet is less than five kilograms.
The assumptions are that the weight loss in a period of months from a
calorie-counting diet, with or without the dieting compound, is a normally
distributed random variable and that the addition of the dieting compound
to the diet does not affect the variance of this random variable.
Thirty-six dieters used the dieting compound with their diets; their weight losses
$x_i$ ($i = 1, \ldots, 36$) in kilograms are summarised by the figures
$\sum x_i = 409.22$, $\sum x_i^2 = 6102.3$
Sixty-two dieters followed the same calorie-counting procedure, but did not
use the dieting compound; their weight losses $y_j$ ($j = 1, \ldots, 62$) in kilograms
are summarised by the figures
$\sum y_j = 571.64$, $\sum y_j^2 = 5616.50$
Solution
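These data give:
$\bar{x} = 11.37$, $\bar{y} = 9.22$
so that the pooled-sample estimate of the common variance is
$$s^2 = \frac{\left(\sum x_i^2 - 36\bar{x}^2\right) + \left(\sum y_j^2 - 62\bar{y}^2\right)}{36 + 62 - 2}$$
giving $s = 4.326$.
The test statistic is
$$\frac{(11.37 - 9.22) - 5}{4.326\sqrt{\frac{1}{36} + \frac{1}{62}}} = -3.144$$
and there are (size of sample 1 + size of sample 2 - 2) = (36 + 62 - 2) = 96
degrees of freedom.
The critical region for a one-tailed test with 96 degrees of freedom at
the 5% significance level is $t < -1.661$ using interpolation and so, since
$-3.144 < -1.661$, the null hypothesis is rejected in favour of the alternative
that the average extra weight loss is not as great as five kilograms.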
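The arithmetic of this example can be checked with a short sketch. It assumes the summary values quoted in the solution (sample means 11.37 and 9.22, pooled estimate s = 4.326, hypothesised difference 5) and the tabulated critical value -1.661; the function name `pooled_t_from_summary` is illustrative, not from the text.

```python
import math

def pooled_t_from_summary(mean1, mean2, s, n1, n2, delta=0.0):
    """t statistic for H0: mu1 - mu2 = delta, given the pooled estimate s.

    Returns the statistic and its degrees of freedom (n1 + n2 - 2).
    """
    t = (mean1 - mean2 - delta) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Summary values as quoted in the worked example (assumed read correctly).
t, dof = pooled_t_from_summary(11.37, 9.22, 4.326, 36, 62, delta=5.0)

# One-tailed 5% critical value for 96 degrees of freedom, from the text.
CRITICAL = -1.661
reject = t < CRITICAL
print(f"t = {t:.3f} on {dof} degrees of freedom, reject H0: {reject}")
```

Setting `delta=0.0` recovers the ordinary 2-sample t-test for equal means, so the same function covers both cases.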
2.7 Hypothesis tests and confidence
intervals
There is a very close relationship between hypothesis tests and confidence
intervals, which should be clearly understood.
A hypothesis test suggests a value for an unknown population parameter
(the null hypothesis), and then accepts this value if a test statistic lies in a
particular range (that is, lies outside the critical region). However, the critical
region depends on the hypothesised population parameter, so you can reverse
this process. Thus, for a given value of the test statistic, you can determine
the range of values for the population parameter that would be accepted by
the test if they were offered as null hypotheses. This is called the confidence
interval for the population parameter.
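For instance, in the case where you take an independent random sample of
size $n$ from a normal distribution to test the hypotheses:
$H_0$: Population mean $= \mu$
$H_1$: (a) Population mean $\neq \mu$
or (b) Population mean $> \mu$
or (c) Population mean $< \mu$
The test statistic is
$$\frac{\bar{x} - \mu}{s/\sqrt{n}}$$
and you accept the null hypothesis at the $\alpha$% significance level if
$$\text{(a) } -\tau_\alpha < \frac{\bar{x} - \mu}{s/\sqrt{n}} < \tau_\alpha \quad \text{or (b) } \frac{\bar{x} - \mu}{s/\sqrt{n}} < t_\alpha \quad \text{or (c) } \frac{\bar{x} - \mu}{s/\sqrt{n}} > -t_\alpha$$
where $t_\alpha$, $\tau_\alpha$ are the one- and two-tailed critical values respectively for the
t-distribution with $n - 1$ degrees of freedom at the $\alpha$% level.
Alternatively, for a given value of $\bar{x}$ you can view these inequalities as
constraining the range of values of $\mu$ which would be accepted by the test if
they were offered as null hypotheses, and rearranging them gives the $(100 - \alpha)$%
confidence intervals:
$$\text{(a) } \bar{x} - \tau_\alpha \frac{s}{\sqrt{n}} < \mu < \bar{x} + \tau_\alpha \frac{s}{\sqrt{n}} \quad \text{or (b) } \mu > \bar{x} - t_\alpha \frac{s}{\sqrt{n}} \quad \text{or (c) } \mu < \bar{x} + t_\alpha \frac{s}{\sqrt{n}}$$
Note: Usually two-sided confidence intervals are used, as in (a).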
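The link between the two-tailed test and the two-sided interval can be illustrated with a minimal sketch. The sample below is hypothetical, and the critical value 2.306 (two-tailed 5%, 8 degrees of freedom) is taken from standard t tables; the names `accepted` and `TAU` are illustrative.

```python
import math
from statistics import mean, stdev

# Hypothetical sample of n = 9 observations (not from the text).
sample = [10.2, 9.8, 11.1, 10.5, 9.4, 10.9, 10.0, 10.7, 9.9]
n = len(sample)
xbar = mean(sample)
se = stdev(sample) / math.sqrt(n)  # estimated standard error of the mean

# Two-tailed 5% critical value for t with n - 1 = 8 degrees of freedom.
TAU = 2.306

def accepted(mu0):
    """True if H0: population mean = mu0 is accepted by the two-tailed test."""
    return abs((xbar - mu0) / se) < TAU

# Rearranging the acceptance inequality gives the 95% confidence interval.
lower, upper = xbar - TAU * se, xbar + TAU * se

print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
print(accepted(xbar), accepted(lower - 0.01), accepted(upper + 0.01))
```

Every hypothesised mean strictly inside the interval is accepted by the test, and every value outside it is rejected, which is exactly the sense in which the interval is the test turned around.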
Confidence intervals for the difference of two means
from unpaired samples
Two runners are being considered for a place in a team. They have each
recently competed in several races, though not against each other. Their times
(in seconds) were as shown in the table below.
472 | 518 | 481 | 479 | #90 | 482 |
ws | a4 | 483 | wt | 476
You can model the first and second runners' times with variables $T_1$ and $T_2$
with distributions $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively. You are describing
their running times as normally distributed with different means and a
common variance. The different means reflect differences in the runners'
underlying ability; the random variability comes from factors such as the
influence of other runners and weather conditions for which the effects in
the different races are independent.
Because you are interested in the difference in the runners' underlying
abilities, you are looking for a confidence interval for the difference between
$\mu_1$ and $\mu_2$.
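The sample means of the runners' times have distributions
$$\bar{T}_1 \sim N\left(\mu_1, \frac{\sigma^2}{n_1}\right) \text{ and } \bar{T}_2 \sim N\left(\mu_2, \frac{\sigma^2}{n_2}\right)$$
where $n_1$ and $n_2$ are the numbers of races recorded for each runner, so that the distribution of their difference is
$$(\bar{T}_1 - \bar{T}_2) \sim N\left(\mu_1 - \mu_2, \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$
The standardised variable
$$\frac{(\bar{T}_1 - \bar{T}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
then has an $N(0, 1)$ distribution.
If you replace $\sigma^2$ with its unbiased sample estimator,
$$S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
where $S_1^2$ and $S_2^2$ are the unbiased sample estimators of the variance from the two separate samples, then, finally,
$$D = \frac{(\bar{T}_1 - \bar{T}_2) - (\mu_1 - \mu_2)}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
has a t-distribution with $n_1 + n_2 - 2 = 10$ degrees of freedom.
The critical value for the t-distribution with 10 degrees of freedom at the 5%
significance level is 2.228, so that $D$ lies between $-2.228$ and $+2.228$ in 95%
of samples; that is, a 95% confidence interval for $(\mu_1 - \mu_2)$ is defined by:
$$-2.228 < D < +2.228$$
This can be rearranged as
$$(\bar{t}_1 - \bar{t}_2) - 2.228\,s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{t}_1 - \bar{t}_2) + 2.228\,s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
In general, let
» $X_1$ and $X_2$ be distributed normally in the population with a common variance
» $X_1$ and $X_2$ be independent of each other in the population
» $s^2$ be the pooled-sample estimate of the common population variance of
$X_1$ and $X_2$
» $\tau$ be the two-sided $\alpha$% critical value for the t-distribution with $(n_1 + n_2 - 2)$
degrees of freedom.
Then a $(100 - \alpha)$% confidence interval for the difference in the means $\mu_1$ and
$\mu_2$ of $X_1$ and $X_2$ is given by
$$\bar{x}_1 - \bar{x}_2 - \tau s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < \bar{x}_1 - \bar{x}_2 + \tau s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
where $\bar{x}_1$ and $\bar{x}_2$ are the means of samples of sizes $n_1$ and $n_2$.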
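The general result can be sketched as a small function. The race times below are hypothetical (the table in the text is not used), chosen only so that the two samples together give 10 degrees of freedom and the critical value 2.228 quoted above applies; the function name is illustrative.

```python
import math
from statistics import mean, variance

def pooled_confidence_interval(x1, x2, tau):
    """Confidence interval for mu1 - mu2 from two unpaired samples.

    tau is the two-sided critical value of the t-distribution with
    len(x1) + len(x2) - 2 degrees of freedom, looked up by the caller.
    """
    n1, n2 = len(x1), len(x2)
    # Pooled-sample estimate of the common variance.
    s2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    half_width = tau * math.sqrt(s2 * (1 / n1 + 1 / n2))
    diff = mean(x1) - mean(x2)
    return diff - half_width, diff + half_width

# Hypothetical race times in seconds, invented for illustration.
runner_1 = [47.5, 48.2, 47.9, 48.8, 47.1, 48.4, 48.0]
runner_2 = [48.9, 49.3, 48.1, 49.0, 48.6]

low, high = pooled_confidence_interval(runner_1, runner_2, tau=2.228)
print(f"95% confidence interval for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

The critical value is passed in rather than computed so that the sketch needs only the standard library; in practice it would come from t tables or a statistics package.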
In questions 1-3, you are expected to make a sensible choice of significance
level for the hypothesis tests involved. Remember that the 5% level is
conventional in scientific contexts.
1 A species of finch has subspecies on two different Galapagos Islands.
The weights of a sample of finches from each island are listed below:
fatela[alelalale|wlelatal
(i) Is there evidence that the finches on Daphne Major are heavier on
average than those on Daphne Minor?
(ii) What assumptions do you need to make? Are they reasonable?
2 Two groups of subjects are asked to volunteer for a psychology
experiment. One group is told that they will be paid $1 for participating;
the other that they will be paid $20. The experiment consists of a rather
dull task that must be repeated for one hour: subjects are then asked to
rate how interesting the task was, on a scale from 1 to 10, with 10 being
the most interesting.
(i) Test, using a 2-sample t-test, the hypothesis that the task was found
more interesting by those who were paid less.
The ratings of the two groups were:
(ii) State and comment on the assumptions you are making in order to
carry out this test.
(iii) Could you devise a paired design for this experiment?
ee ee
on 0
The results he finds (measuring incomes in $ per week) are summarised
in this table.
(i) Show, using a 2-sample t-test, that the hypothesis that those staying
on at school have higher incomes at age 24 is rejected, on the evidence
of this sample. What assumptions are you making for a 2-sample
t-test to be appropriate? How plausible are they?
(ii) What other differences between the two groups inevitably exist that
might explain this unexpected result? How could you design an
experiment to eliminate this effect?
In questions 4-6, you need to decide whether the data are from an
experiment with a paired or an unpaired design. You are expected to make
a sensible choice of significance level for the hypothesis tests involved.
Remember that the 5% level is conventional in scientific contexts.
4 Amongst all praying mantises, females are on average 7 centimetres
longer than males. A new variety of mantis has been bred, the insects of
which are supposed to be more nearly equal in size.
Test the hypothesis that the difference between male and female average
lengths is less than 7 centimetres, using the lengths in centimetres of the
sample of twelve males and twelve females shown below. State clearly the
assumptions you are making in your test.
ia [isa [19 | 3 [a7 | 130
wi | 202 [42 | 62 | 169 | a8
22 | 242 | 4 | a4 | 132 | 218
wos | 235 [168 | 119 132
5 In fact, the data given in question 4 are paired: each male mantis is paired
with its mate in the following way.
T]2[3]4
ia] [a9 [63
maz [ou | 124 [254
7[#19 [0
10.1 | 202 | 142 | 62
103] 235 [108 [119
36
87 [130
132 [214
n [2
io] 88
155 [132
(i) Test the hypothesis that male mantises are on average less than 7
centimetres smaller than their mates.
(ii) Explain clearly what assumptions you make in this case, and how
these assumptions differ from those you made in question 4.
(iii) Why are the results you obtain different in this case from those you
found in question 4?
6 It is known from many studies that the best current post-operative
treatment reduces stays in hospital after major operations, compared
with untreated patients, by an average of 6.2 days. A new treatment is
proposed, with the hypothesis that this new treatment will reduce stays
in hospital by more than 6.2 days on average, and a trial is conducted on
two groups of patients who have just undergone major operations. The
results are shown below.
Test the hypothesis given, clearly stating the assumptions you are making.
In questions 7-9, you need to decide whether the data are from an
experiment with a paired or an unpaired design.
7 The masses, in grams, of nine hens' eggs and eight duck eggs are
recorded below.
ala] s[a[a]»[ «es @
elapse pelos [spa
(i) Construct a 95% confidence interval for the difference in mean mass
between the hens' eggs and the duck eggs.
(ii) State the assumptions you are making in constructing this
confidence interval.
8 A group of rowers and a group of chess players have their resting pulse
rates measured. These data are shown below.
70 fz ]
é[a[7[~lal als
uz 9} | 92 | 79 pee | as }
Construct a one-sided 95% confidence interval, giving an upper limit for
the extent to which the mean resting pulse rate of chess players exceeds
that of rowers.
9 The amount, $p$, of infestation of maize fields by root nematodes, in grams
of the pest per square metre, is measured in randomly chosen square
metre areas on 33 maize farms. Some of the farms have sprayed the
crops with a new pesticide. The measurements are summarised in the
table below.
1490862
Construct a 90% confidence interval for the difference in mean
infection between the sprayed and unsprayed crops.
10 Fish of a certain species live in two separate lakes, A and B. A zoologist
claims that the mean length of fish in A is greater than the mean length
of fish in B. To test his claim, he catches a random sample of 8 fish from
A and a random sample of 6 fish from B. The lengths of the 8 fish from
A, in appropriate units, are as follows.
153° 120 151 2B HB
Assuming a normal distribution, find a 95% confidence interval for the
mean length of fish in A.
The lengths of the 6 fish from B, in the same units, are as follows.
150 107 136 4 116 126
Stating any assumptions that you make, test at the 5% significance level
whether the mean length of fish in A is greater than the mean length of
fish in B.
Calculate a 95% confidence interval for the difference in the mean
lengths of fish from A and from B.
Cambridge International AS & A Level Further Mathematics
9231 Paper 22 Q11 November 2014
2.8 Using the normal distribution
with two samples
In studying 2-sample t-tests, you had to make the assumption that the
variance in the population of the random variable you were sampling was
the same in both conditions. You then estimated this common variance from
the two samples. However, there are some situations in which you know
the variance of the whole population and you can use this information in a
hypothesis test or in constructing a confidence interval.
For instance, it may be that, before the ability of the maths class to estimate
one minute was tested (see page 43), extensive tests were conducted that
determined that, in the school population as a whole, students' estimated
minutes are normally distributed and have a standard deviation of 7.42
seconds. You are testing the hypotheses:
$H_0$: There is no difference between the mean of people's estimates of
one minute before and after lunch.
$H_1$: After lunch, the mean of people's estimates of one minute tends to be
shorter than before lunch.
But you can now make the assumption that people's estimates of one minute
are normally distributed with standard deviation 7.42.
The null hypothesis implies that before-lunch and after-lunch estimates have
distributions $N(\mu, 7.42^2)$ where $\mu$ is the common mean asserted by the null
hypothesis. With this assumption, the mean of the 24 before-lunch estimates
has distribution
$$\bar{X} \sim N\left(\mu, \frac{7.42^2}{24}\right)$$
and the mean of the 22 after-lunch estimates has distribution
$$\bar{Y} \sim N\left(\mu, \frac{7.42^2}{22}\right)$$
The distribution of the difference of the two sample means is therefore
$$\bar{X} - \bar{Y} \sim N\left(0, \frac{7.42^2}{24} + \frac{7.42^2}{22}\right)$$
In Probability & Statistics 2 you conducted hypothesis tests with the normal
distribution: if $X$ has distribution $N(\mu, \sigma^2)$, where the variance $\sigma^2$ is known, then
the test statistic has the standard normal distribution, $N(0, 1)$. The test
statistic here is
$$\frac{\bar{X} - \bar{Y}}{\sqrt{\frac{7.42^2}{24} + \frac{7.42^2}{22}}}$$
With the data used in the example on page 43, $\bar{x} = 51.542$ and $\bar{y} = 47.909$,
so the test statistic has the value
$$\frac{51.542 - 47.909}{7.42\sqrt{\frac{1}{24} + \frac{1}{22}}} = 1.659$$
The critical region for a one-tailed test at the 5% significance level for the
standard normal distribution is $z > 1.645$. In this case, since $1.659 > 1.645$,
you reject the null hypothesis and accept the alternative, that the after-lunch
times are shorter than the before-lunch times.
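This calculation can be reproduced with a short sketch using only the Python standard library. It assumes the values used in this section (known standard deviation 7.42, sample sizes 24 and 22) together with the sample means 51.542 and 47.909 from the earlier comparison of test statistics; the variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma = 7.42                   # known population standard deviation (seconds)
n_before, n_after = 24, 22     # sample sizes
mean_before, mean_after = 51.542, 47.909  # sample means

# Standard error of the difference of the two sample means.
se = sigma * math.sqrt(1 / n_before + 1 / n_after)
z = (mean_before - mean_after) / se

# One-tailed test at the 5% level: critical region z > 1.645.
reject = z > 1.645
p_value = 1 - NormalDist().cdf(z)  # upper-tail probability under N(0, 1)

print(f"z = {z:.3f}, reject H0: {reject}")
```

The p-value is included only as a cross-check on the critical-region decision; it falls just below 0.05, consistent with a test statistic that only narrowly exceeds 1.645.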
Different known variances for the two samples
Alternatively, you might know separately the variances of the populations
from which each sample was drawn, where these need not be the same.
Suppose there are two machines in a factory. The first is a high-accuracy
machine, which produces bolts with radii that are normally distributed with
standard deviation 0.052mm. The second is a lower-accuracy machine,
producing washers with internal radii that are normally distributed with
standard deviation 0.172mm. Both machines are adjustable to produce
components with different radii, but today they are supposed to be set so that
the high-accuracy machine produces bolts with radii 2mm smaller than the
internal radii of the washers produced by the low-accuracy machine.
To check whether the setting is correct, a sample of components is taken
from each machine, and the radius of each measured. The results are shown in
the following table.
wosz [toxz | 998 [1009 [ 1057 [1049 | 110
You are testing the hypotheses:
$H_0$: The mean radius of the bolts being produced is 2mm less than the mean
internal radius of the washers being produced.
$H_1$: The mean radius of the bolts and the mean internal radius of the washers
being produced do not differ by 2mm.
You can assume that the radii of the components being produced by each
machine are normally distributed with the standard deviations given above.
If $X_1$ denotes the internal radius of the washer, and $X_2$ the radius of a bolt,
what is the distribution of the sample statistic $\bar{X}_1 - \bar{X}_2$?