100% found this document useful (2 votes)
4K views192 pages

Further Statistics

Uploaded by

Ada 孔
Copyright
© © All Rights Reserved
Cambridge International AS & A Level Further Mathematics
Further Probability & Statistics
John du Feu
Series editor: Roger Porkess
Hodder Education, an Hachette UK company

Questions from the Cambridge International AS & A Level Further Mathematics papers are reproduced by permission of Cambridge Assessment International Education. Unless otherwise acknowledged, the questions, example answers and comments that appear in this book were written by the author. Cambridge Assessment International Education bears no responsibility for the example answers to questions taken from its past question papers which are contained in this publication.

IGCSE is a registered trademark.

The publishers would like to thank those who have given permission to reproduce photographs in this book. Every effort has been made to trace copyright and acknowledge ownership. The publishers will be glad to make suitable arrangements with any copyright holders whom it has not been possible to contact.

Hachette UK's policy is to use papers that are natural, renewable and recyclable products and made from wood grown in sustainable forests. The logging and manufacturing processes are expected to conform to the environmental regulations of the country of origin.

Much of the material in this book was published originally as part of the MEI Structured Mathematics series. It has been carefully adapted for the Cambridge International AS & A Level Further Mathematics syllabus. The original MEI author team for statistics comprised Alec Cryer, Michael Davies, Anthony Eccles, Bob Francis, Gerald Goodall, Alan Graham, Nigel Green, Liam Hennessy, Roger Porkess and Charlie Stripp.
© Roger Porkess and John du Feu 2018
First published in 2018 by Hodder Education, an Hachette UK company, Carmelite House, 50 Victoria Embankment, London.

All rights reserved. Apart from any use permitted under UK copyright law, no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or held within any information storage and retrieval system, without permission in writing from the publisher or under licence from the Copyright Licensing Agency Limited. Further details of such licences (for reprographic reproduction) may be obtained from the Copyright Licensing Agency Limited.

Cover photo by Shutterstock. Typeset by Integra Software Services Pvt. Ltd., Pondicherry, India. Printed in Italy. A catalogue record for this title is available from the British Library.

Contents

Introduction
How to use this book
The Cambridge International AS & A Level Further Mathematics 9231 syllabus

1 Continuous random variables
1.1 Piecewise definition of a probability density function
1.2 The expectation and variance of a function of X
1.3 The cumulative distribution function
1.4 Finding the PDF of a function of a continuous random variable

2 Inference using normal and t-distributions
2.1 Interpreting sample data using the t-distribution
2.2 Using the t-distribution for paired samples
2.3 Hypothesis testing on a sample mean using the t-distribution
2.4 Using the t-distribution with two samples
2.5 Comparison between paired and 2-sample t-tests
2.6 Testing for a non-zero value of the difference of two means
2.7 Hypothesis tests and confidence intervals
2.8 Using the normal distribution with two samples
2.9 Tests with large samples

3 Chi-squared tests
3.1 The chi-squared test for a contingency table
3.2 Goodness of fit tests

4 Non-parametric tests
4.1 Single-sample non-parametric tests
4.2 Paired-sample non-parametric tests
4.3 The Wilcoxon rank-sum test

5 Probability generating functions
5.1 Probabilities defined by a probability generating function
5.2 Expectation and variance
5.3 The sum of independent random variables
5.4 The PGFs for some standard discrete probability distributions

Index

Introduction

This is one of a series of four books supporting the Cambridge International AS & A Level Further Mathematics 9231 syllabus for examination from 2020. It is preceded by five books supporting Cambridge International AS & A Level Mathematics 9709. The five chapters in this book cover the further probability and statistics required for the Paper 4 examination. This part of the series also contains two books for further pure mathematics and one book for further mechanics.

These books are based on the highly successful series for the Mathematics in Education and Industry (MEI) syllabus in the UK but they have been redesigned and revised for Cambridge International students; where appropriate, new material has been written and the exercises contain many past Cambridge International examination questions. An overview of the units making up the Cambridge International syllabus is given in the following pages.

Throughout the series, the emphasis is on understanding the mathematics as well as routine calculations. The various exercises provide plenty of scope for practising basic techniques; they also contain many typical examination-style questions.

The original MEI author team would like to thank John du Feu who has carried out the extensive task of presenting their work in a suitable form for Cambridge International students and for his many original contributions. They would also like to thank Cambridge Assessment International Education for its detailed advice in preparing the books and for permission to use many past examination questions.

Roger Porkess
Series editor

How to use this book

The structure of the book

This book has been endorsed by Cambridge Assessment International Education.
It is listed as an endorsed textbook for students taking the Cambridge International AS & A Level Further Mathematics 9231 syllabus. The Further Probability & Statistics syllabus content is covered comprehensively and is presented across five chapters, offering a structured route through the course.

The book is written on the assumption that you have covered and understood the work in the Cambridge International AS & A Level Mathematics 9709 syllabus, including the probability and statistics content. The following icon is used to indicate material that is not directly on the syllabus.

- Extension: there are places where the book goes beyond the requirements of the syllabus to show how the ideas can be taken further or where fundamental underpinning work is explored. Such work is marked as extension.

Each chapter is broken down into several sections, with each section covering a single topic. Topics are introduced through explanations, with key terms picked out in red. These are reinforced with plentiful worked examples, punctuated with commentary, to demonstrate methods and illustrate application of the mathematics under discussion.

Regular exercises allow you to apply what you have learned. They offer a large variety of practice and higher-order question types that map to the key concepts of the Cambridge International syllabus. Look out for the following icons.

- Problem-solving questions will help you to develop the ability to analyse problems, recognise how to represent different situations mathematically, identify and interpret relevant information, and select appropriate methods.
- Modelling questions provide you with an introduction to the important skill of mathematical modelling. In this, you take an everyday or workplace situation, or one that arises in your other subjects, and present it in a form that allows you to apply mathematics.
- Communication and proof questions encourage you to become a more fluent mathematician, giving you scope to communicate your work with clear, logical arguments and to justify your results.

Exercises also include questions from real Cambridge Assessment International Education past papers, so that you can become familiar with the types of questions you are likely to meet in formal assessments.

Answers to exercise questions, excluding long explanations and proofs, are available online at www.hoddereducation.com/cambridgeextras, so you can check your work. It is important, however, that you have a go at answering the questions before looking up the answers if you are to understand the mathematics fully.

In addition to the exercises, a range of additional features are included to enhance your learning.

ACTIVITY
Activities invite you to do some work for yourself, typically to introduce you to ideas that are then going to be taken further. In some places, activities are also used to follow up work that has just been covered.

In real life, it is often the case that as well as analysing a situation or problem, you also need to carry out some investigative work. This allows you to check whether your proposed approach is likely to be fruitful or to work at all, and whether it can be extended. Such opportunities are marked as investigations.

Other helpful features include the following.

- Discussion point: this symbol highlights points it will benefit you to discuss with your teacher or fellow students, to encourage deeper exploration and mathematical communication. If you are working on your own, there are answers available online at www.hoddereducation.com/cambridgeextras.
- Warning sign: this is used where a common mistake, misunderstanding or tricky point is being described to prevent you from making the same error.

A variety of notes are included to offer advice or spark your interest:

Note
Notes expand on the topic under consideration and explore the deeper lessons that emerge from what has just been done.

Historical note
Historical notes offer interesting background information about famous mathematicians or results to engage you in this fascinating field.

Technology note
Although graphical calculators and computers are not permitted in the examinations for this Cambridge International syllabus, we have included Technology notes to indicate places where working with them can be helpful for learning and for teaching.

Finally, each chapter ends with the key points covered, plus a list of the learning outcomes that summarise what you have learned in a form that is closely related to the syllabus.

Digital support

Comprehensive online support for this book, including further questions, is available by subscription to MEI's Integral online teaching and learning platform for AS & A Level Mathematics and Further Mathematics, integralmaths.org. This online platform provides extensive, high-quality resources, including printable materials, innovative interactive activities, and formative and summative assessments. Our eTextbooks link seamlessly with Integral, allowing you to move with ease between corresponding topics in the eTextbooks and Integral.

MEI's Integral material has not been through the Cambridge International endorsement process.

The Cambridge International AS & A Level Further Mathematics 9231 syllabus

The syllabus content is assessed over four examination papers.
Paper 1: Further Pure Mathematics 1
- 60% of the AS Level; 30% of the A Level
- Compulsory for AS and A Level

Paper 2: Further Pure Mathematics 2
- 2 hours
- 30% of the A Level
- Compulsory for A Level; not a route to AS Level

Paper 3: Further Mechanics
- 1 hour 30 minutes
- 40% of the AS Level; 20% of the A Level
- Offered as part of AS; compulsory for A Level

Paper 4: Further Probability & Statistics
- 1 hour 30 minutes
- 40% of the AS Level; 20% of the A Level
- Offered as part of AS; compulsory for A Level

The following diagram illustrates the permitted combinations for AS Level and A Level.

AS Level Further Mathematics:
- Paper 1 and Paper 3 (Further Pure Mathematics 1 and Further Mechanics), or
- Paper 1 and Paper 4 (Further Pure Mathematics 1 and Further Probability & Statistics)

A Level Further Mathematics:
- Papers 1, 2, 3 and 4 (Further Pure Mathematics 1 and 2, Further Mechanics and Further Probability & Statistics)

Prior knowledge

It is expected that learners will have studied the majority of the Cambridge International AS & A Level Mathematics 9709 syllabus content before studying Cambridge International AS & A Level Further Mathematics 9231.

The prior knowledge required for each Further Mathematics component is shown in the following table.

Component in AS & A Level Further Mathematics (9231) | Prior knowledge required from AS & A Level Mathematics (9709)
9231 Paper 1: Further Pure Mathematics 1 | 9709 Papers 1 and 3
9231 Paper 2: Further Pure Mathematics 2 | 9709 Papers 1 and 3
9231 Paper 3: Further Mechanics | 9709 Papers 1, 3 and 4
9231 Paper 4: Further Probability & Statistics | 9709 Papers 1, 3, 5 and 6

For Paper 4: Further Probability & Statistics, knowledge of Cambridge International AS & A Level Mathematics 9709 Papers 5 and 6: Probability & Statistics subject content is assumed.

Command words

The table below includes command words used in the assessment for this syllabus. The use of the command word will relate to the subject context.

Command word | What it means
Calculate | work out from given facts, figures or information
Deduce | conclude from available information
Derive | obtain something (expression/equation/value) from another by a sequence of logical steps
Describe | state the points of a topic / give characteristics and main features
Determine | establish with certainty
Evaluate | judge or calculate the quality, importance, amount, or value of something
Explain | set out purposes or reasons / make the relationships between things evident / provide why and/or how and support with relevant evidence
Identify | name/select/recognise
Interpret | identify meaning or significance in relation to the context
Justify | support a case with evidence/argument
Prove | confirm the truth of the given statement using a chain of logical mathematical reasoning
Show (that) | provide structured evidence that leads to a given result
Sketch | make a simple freehand drawing showing the key features
State | express in clear terms
Verify | confirm a given statement/result is true

Key concepts

Key concepts are essential ideas that help students develop a deep understanding of mathematics. The key concepts are:

Problem solving
Mathematics is fundamentally problem solving and representing systems and models in different ways. These include:
- Algebra: this is an essential tool which supports and expresses mathematical reasoning and provides a means to generalise across a number of contexts.
- Geometrical techniques: algebraic representations also describe a spatial relationship, which gives us a new way to understand a situation.
- Calculus: this is a fundamental element which describes change in dynamic situations and underlines the links between functions and graphs.
- Mechanical models: these explain and predict how particles and objects move or remain stable under the influence of forces.
- Statistical methods: these are used to quantify and model aspects of the world around us. Probability theory predicts how chance events might proceed, and whether assumptions about chance are justified by evidence.

Communication
Mathematical proof and reasoning is expressed using algebra and notation so that others can follow each line of reasoning and confirm its completeness and accuracy. Mathematical notation is universal. Each solution is structured, but proof and problem solving also invite creative and original thinking.

Mathematical modelling
Mathematical modelling can be applied to many different situations and problems, leading to predictions and solutions. A variety of mathematical content areas and techniques may be required to create the model. Once the model has been created and applied, the results can be interpreted to give predictions and information about the real world.

These key concepts are reinforced in the different question types included in this book: Problem-solving, Communication and proof, and Modelling.

The control of large numbers is possible, and like unto that of small numbers, if we subdivide them.
Sun Tzu, 'The Art of War' (544 BC-496 BC)

1 Continuous random variables

You will recall having met probability density functions (PDFs) for continuous random variables in A Level Mathematics. To find probabilities using a probability density function $f(x)$ you need to integrate the function between the limits you are using.

Sometimes the quantity you are interested in is a function of the random variable you measure. For example:
- as part of an experiment you are measuring temperatures in Celsius but then need to convert them to Fahrenheit: $F = 1.8C + 32$
- you are measuring the lengths of the sides of square pieces of material and deducing their areas: $A = L^2$
- you are estimating the ages, $A$ years, of hedgerows by counting the number, $n$, of types of shrubs and trees in 30 m lengths: $A = 100n - 50$.
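The situations in the list above can be mimicked numerically. The following is a minimal Python sketch using made-up measurements (the values in `celsius` and `sides` are hypothetical, not data from the book, and the function names are illustrative). It shows that applying a formula to each measured value produces a new set of values with its own mean.

```python
from statistics import mean

# Hypothetical measurements, chosen only for illustration.
celsius = [18.0, 20.0, 22.0, 24.0]   # temperatures C
sides = [1.0, 2.0, 3.0]              # side lengths L of square pieces


def to_fahrenheit(c):
    """Linear function of the measured value: F = 1.8C + 32."""
    return 1.8 * c + 32


def area(side):
    """Non-linear function of the measured value: A = L^2."""
    return side ** 2


fahrenheit = [to_fahrenheit(c) for c in celsius]
areas = [area(s) for s in sides]

mean_f = mean(fahrenheit)                 # mean of the converted values
f_of_mean = to_fahrenheit(mean(celsius))  # conversion of the mean
mean_area = mean(areas)                   # mean of the areas
area_of_mean = area(mean(sides))          # area of the mean side length

print(mean_f, f_of_mean)        # agree up to rounding: the function is linear
print(mean_area, area_of_mean)  # differ: the function is not linear
```

For the linear conversion the mean of the converted values matches the converted mean, but for the area it does not, which is why a function of a random variable needs its own expectation and variance calculation.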
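In fact, in any situation where you are entering the value of a random variable into a formula, the outcome will be another random variable that is a function of the one you entered. Under these circumstances you may need to find the expectation and variance of such a function of a random variable.

For a discrete random variable, $X$, in which the value $x_i$ occurs with probability $p_i$, the expectation and variance of a function $g(X)$ are given by

$$E(g(X)) = \sum g(x_i)\,p_i$$

$$\operatorname{Var}(g(X)) = \sum (g(x_i))^2\,p_i - \{E(g(X))\}^2$$

The equivalent results for a continuous random variable, $X$, with PDF $f(x)$ are

$$E(g(X)) = \int g(x)\,f(x)\,dx$$

$$\operatorname{Var}(g(X)) = \int (g(x))^2 f(x)\,dx - \{E(g(X))\}^2$$

You may find it helpful to think of the function $g(X)$ as a new variable, say, $Y$.

Example

The continuous random variable $X$ has PDF $f(x)$ where

$$f(x) = \begin{cases} kx & \text{for } 0 \le x \le 2 \\ 4k - kx & \text{for } 2 < x \le 4 \\ 0 & \text{otherwise.} \end{cases}$$

(i) Find the value of the constant $k$.
(ii) Sketch $y = f(x)$.
(iii) Find $P(1 \le X \le 3.5)$.

The continuous random variable $Y = X^2$.

(iv) Find $E(Y)$.

Solution

(i) The total area under the PDF is 1:

$$\int_0^2 kx\,dx + \int_2^4 (4k - kx)\,dx = 1$$

$$\left[\frac{kx^2}{2}\right]_0^2 + \left[4kx - \frac{kx^2}{2}\right]_2^4 = 1$$

$$(2k - 0) + k\left[(16 - 8) - (8 - 2)\right] = 1$$

$$4k = 1, \quad k = \tfrac{1}{4}$$

(ii) Figure 1.2: sketch of $y = f(x)$.

(iii)
$$P(1 \le X \le 3.5) = \int_1^2 \tfrac{1}{4}x\,dx + \int_2^{3.5} \left(1 - \tfrac{1}{4}x\right)dx = \left[\frac{x^2}{8}\right]_1^2 + \left[x - \frac{x^2}{8}\right]_2^{3.5} = \frac{3}{8} + \frac{15}{32} = \frac{27}{32} = 0.84375$$

(iv)
$$E(Y) = E(X^2) = \int_0^2 x^2 \cdot \tfrac{1}{4}x\,dx + \int_2^4 x^2\left(1 - \tfrac{1}{4}x\right)dx = \left[\frac{x^4}{16}\right]_0^2 + \left[\frac{x^3}{3} - \frac{x^4}{16}\right]_2^4 = (1 - 0) + \left[\left(\tfrac{64}{3} - 16\right) - \left(\tfrac{8}{3} - 1\right)\right] = 4\tfrac{2}{3}$$

Example

The continuous random variable $X$ has PDF $f(x)$ given by

$$f(x) = \begin{cases} \dfrac{x}{50} & \text{for } 0 \le x \le 10 \\ 0 & \text{otherwise.} \end{cases}$$

(i) Find $E(3X + 4)$.
(ii) Find $3E(X) + 4$.
(iii) Find $\operatorname{Var}(3X + 4)$.
(iv) Verify that $\operatorname{Var}(3X + 4) = 3^2 \operatorname{Var}(X)$.

Solution

(i) Here you are using $E(g(X)) = \int g(x) f(x)\,dx$.

$$E(3X + 4) = \int_0^{10} (3x + 4)\frac{x}{50}\,dx = \int_0^{10} \frac{1}{50}(3x^2 + 4x)\,dx = \left[\frac{1}{50}(x^3 + 2x^2)\right]_0^{10} = 20 + 4 = 24$$

(ii)
$$3E(X) + 4 = 3\int_0^{10} x \cdot \frac{x}{50}\,dx + 4 = 3\left[\frac{x^3}{150}\right]_0^{10} + 4 = 20 + 4 = 24$$

Notice here that $E(3X + 4) = 24 = 3E(X) + 4$.

(iii) To find $\operatorname{Var}(3X + 4)$, use $\operatorname{Var}(g(X)) = \int (g(x))^2 f(x)\,dx - \{E(g(X))\}^2$. You multiply out the brackets and then multiply by $\frac{x}{50}$.

$$\operatorname{Var}(3X + 4) = \int_0^{10} (3x + 4)^2 \frac{x}{50}\,dx - 24^2 = \int_0^{10} \frac{1}{50}(9x^3 + 24x^2 + 16x)\,dx - 576$$

$$= \frac{1}{50}\left[\frac{9x^4}{4} + 8x^3 + 8x^2\right]_0^{10} - 576 = \frac{1}{50} \times 31\,300 - 576 = 50$$

(iv) Here you are using $\operatorname{Var}(X) = E(X^2) - [E(X)]^2$.

$$E(X^2) = \int_0^{10} x^2 \cdot \frac{x}{50}\,dx = \left[\frac{x^4}{200}\right]_0^{10} = 50, \qquad E(X) = 6\tfrac{2}{3}$$

$$\operatorname{Var}(X) = 50 - \left(6\tfrac{2}{3}\right)^2 = \frac{50}{9}$$

From part (iii), $\operatorname{Var}(3X + 4) = 50 = 9 \times \frac{50}{9}$.
So $\operatorname{Var}(3X + 4) = 3^2 \operatorname{Var}(X)$ as required.

1.3 The cumulative distribution function

... model to predict the flow of runners, and particularly their finishing times. We are offering a prize of $150 for the best such model submitted.

An entrant for the competition proposes a model in which a runner's time, $X$ hours, is a continuous random variable with PDF

$$f(x) = \frac{4}{27}(x - 1)(4 - x)^2 \quad \text{for } 1 \le x \le 4.$$

To find the proportion of runners finishing by any time $x$, you need the cumulative distribution function $F(x) = P(X \le x)$. It is possible to use definite integration, but this causes a problem as you cannot use the same letter for both a limit of the integral and the variable of integration. You would not be correct to write down the expression

$$F(x) = \int_1^x \frac{4}{27}(x - 1)(4 - x)^2\,dx \qquad \text{INCORRECT}$$

since $x$ would then be both a limit of the integral and the variable used within it. To overcome this problem you change the variable of integration to a different letter, say $u$. This is a dummy variable as it does not appear in the final answer, so that $F(x)$ is now written

$$F(x) = \int_1^x \frac{4}{27}(u - 1)(4 - u)^2\,du \qquad \text{CORRECT}$$

$$= \frac{x^4}{27} - \frac{4x^3}{9} + \frac{16x^2}{9} - \frac{64x}{27} + 1.$$

To find the proportion of runners finishing by any time, substitute that value for $x$; so, when $x = 2$,

$$F(2) = \frac{11}{27} = 0.41 \text{ to two decimal places.}$$

Here is the complete table, with all the values worked out.

Time (hours) | Model, F(x) | [column heading illegible]
1.00 | 0.00 | 0.00
1.25 | 0.04 | [illegible]
1.50 | 0.13 | [illegible]
1.75 | 0.26 | [illegible]
2.00 | 0.41 | 0.49
2.25 | 0.55 | 0.57
2.50 | 0.69 | 0.75
3.00 | 0.89 | [illegible]
3.50 | 0.98 | 0.99
4.00 | 1.00 | 1.00

Notice the distinctive shape of the curves of these functions (Figure 1.5), sometimes called an ogive.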
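The values of F(x) for the proposed model can be checked numerically. The sketch below assumes the model PDF f(x) = (4/27)(x - 1)(4 - x)^2 on 1 <= x <= 4 and the closed form obtained by integrating it with a dummy variable; the function names `pdf`, `cdf` and `cdf_numeric` are illustrative, and the midpoint rule is used only as an independent cross-check.

```python
def pdf(u):
    """Model PDF f(u) = (4/27)(u - 1)(4 - u)^2 on 1 <= u <= 4, zero elsewhere."""
    if 1 <= u <= 4:
        return 4 / 27 * (u - 1) * (4 - u) ** 2
    return 0.0


def cdf(x):
    """F(x): the integral of the PDF from 1 to x, written in closed form."""
    if x < 1:
        return 0.0
    if x > 4:
        return 1.0
    return x**4 / 27 - 4 * x**3 / 9 + 16 * x**2 / 9 - 64 * x / 27 + 1


def cdf_numeric(x, steps=10_000):
    """Midpoint-rule approximation to the same integral, as a cross-check."""
    if x <= 1:
        return 0.0
    upper = min(x, 4.0)
    width = (upper - 1) / steps
    # The loop variable plays the role of the dummy variable u.
    return sum(pdf(1 + (i + 0.5) * width) for i in range(steps)) * width


for t in [1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 3.0, 3.5, 4.0]:
    print(f"{t:.2f}  {cdf(t):.2f}")
```

Keeping the upper limit `x` and the integration variable as separate names in `cdf_numeric` mirrors the point about dummy variables: the limit is an input, while the variable of integration disappears from the result.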
Figure 1.5: graph of $F(x)$ against $x$ (hours).

Note
You have probably met this shape already.

If you were on the organising committee, what more might you look for in a model?

Properties of the cumulative distribution function, F(x)

The graphs in Figure 1.6 show the probability density function $f(x)$ and the cumulative distribution function $F(x)$ of a typical continuous random variable $X$. You will see that the values of the random variable always lie between $a$ and $b$.

Figure 1.6: graphs of $f(x)$ and $F(x)$.

These graphs illustrate a number of general results for cumulative distribution functions.

1 $F(x) = 0$ for $x \le a$, the lower limit of $x$.
The probability of $X$ taking a value less than or equal to $a$ is zero; the value of $X$ must be greater than or equal to $a$.

2 $F(x) = 1$ for $x \ge b$, the upper limit of $x$.
$X$ cannot take values greater than $b$.

3 $P(c \le X \le d) = F(d) - F(c)$
$P(c \le X \le d) = P(X \le d) - P(X \le c)$
This is very useful when finding probabilities from a PDF or a CDF.

Figure 1.7

4 The median, $m$, satisfies the equation $F(m) = 0.5$.
$P(X \le m) = 0.5$ by definition of the median.

Figure 1.8

5 $f(x) = \dfrac{d}{dx}F(x) = F'(x)$
Since you integrate $f(x)$ to obtain $F(x)$, the reverse must also be true: differentiating $F(x)$ gives $f(x)$.

6 $F(x)$ is a continuous function: the graph of $y = F(x)$ has no gaps.

Notes
1 Notice the use of lower and upper case letters here. The probability density function is denoted by the lower case $f$ whereas the cumulative distribution function is given the upper case $F$.

Example

A machine saws planks of wood to a nominal length. The continuous random variable $X$ represents the error in millimetres of the actual length of a plank coming off the machine. The variable $X$ has PDF $f(x)$ where

$$f(x) = \begin{cases} \dfrac{10 - x}{50} & \text{for } 0 \le x \le 10 \\ 0 & \text{otherwise.} \end{cases}$$

Integrating the PDF gives the cumulative distribution function

$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \dfrac{x}{5} - \dfrac{x^2}{100} & \text{for } 0 \le x \le 10 \\ 1 & \text{for } x > 10. \end{cases}$$

The graph of $F(x)$ is shown in Figure 1.10.
Figure 1.10: graph of $F(x)$.

(iv) $P(2 \le X \le 7) = F(7) - F(2) = 0.91 - 0.36 = 0.55$

(v) The median value of $X$ is found by solving the equation $F(m) = 0.5$:

$$\frac{m}{5} - \frac{m^2}{100} = 0.5.$$

This is rearranged to give

$$m^2 - 20m + 50 = 0$$

$$m = \frac{20 \pm \sqrt{20^2 - 4 \times 50}}{2}$$

$m = 2.93$ (or 17.07, outside the domain for $X$).
The median error is 2.93 mm.

(vi) The customer rejects those planks for which $8 \le X \le 10$.

$$P(8 \le X \le 10) = F(10) - F(8) = 1 - 0.96 = 0.04$$

so 4% of planks are rejected.

Example

The PDF of a continuous random variable $X$ is given by [definition illegible].

(ii) The graph of $F(x)$ is shown in Figure 1.12.

Figure 1.12

Example

The continuous random variable $X$ has cumulative distribution function $F(x)$ given by

$$F(x) = \begin{cases} 0 & \text{for } x < 2 \\ \dfrac{x}{4} - \dfrac{1}{2} & \text{for } 2 \le x \le 6 \\ 1 & \text{for } x > 6. \end{cases}$$

Find the PDF $f(x)$.

Solution

$$f(x) = \frac{d}{dx}F(x)$$

$$f(x) = \begin{cases} 0 & \text{for } x < 2 \\ \dfrac{1}{4} & \text{for } 2 \le x \le 6 \\ 0 & \text{for } x > 6. \end{cases}$$

1.4 Finding the PDF of a function of a continuous random variable

The cumulative distribution function provides you with a stepping stone between the PDF of a continuous random variable and that of a function of that variable. Example 1.7 shows how it is done.

Example 1.7

A company makes metal boxes to order. The basic process consists of cutting four squares off the corners of a sheet of metal, which is then folded and welded along the joins. Consequently, for every box there are four square offcuts of waste metal.

Figure 1.13

The company is looking for ways to cut costs and the designers wonder if anything can be done with these square pieces. They decide in the first place to investigate the distribution of their sizes. A survey of the large pile in their scrap area shows that they vary in length up to a maximum of 2 decimetres.

It is suggested their lengths in decimetres can be modelled as a continuous random variable $L$ with probability density function

$$f(l) = \begin{cases} \dfrac{3}{16}(4 - l^2) & \text{for } 0 \le l \le 2 \\ 0 & \text{otherwise.} \end{cases}$$

Assume this model to be accurate.

(i) Find the cumulative distribution function for the length of a square.
(ii) Hence derive the cumulative distribution function for the area of a square.
(iii) Find the PDF for the area of a square.
(iv) Sketch the graphs of the probability density functions and the cumulative distribution functions of the length and the area.
(v) Find the mean area of the square offcuts when making a box.

Solution

(i) The CDF is

$$F(l) = \int_0^l \frac{3}{16}(4 - u^2)\,du = \frac{3l}{4} - \frac{l^3}{16} \quad \text{for } 0 \le l \le 2.$$

Note the use of $u$ as a dummy variable.

(ii) The area is $A = L^2$. Writing $H(a)$ for the CDF of the area,

$$H(a) = P(A \le a) = P(L^2 \le a) = P(L \le \sqrt{a}) = F(\sqrt{a}).$$

Since $F(l) = \dfrac{3l}{4} - \dfrac{l^3}{16}$ for $0 \le l \le 2$,

$$H(a) = \begin{cases} 0 & \text{for } a < 0 \\ \dfrac{3a^{1/2}}{4} - \dfrac{a^{3/2}}{16} & \text{for } 0 \le a \le 4 \\ 1 & \text{for } a > 4. \end{cases}$$

(iii) The PDF for the area of a square is found by differentiating $H(a)$:

$$h(a) = \begin{cases} \dfrac{3}{8a^{1/2}} - \dfrac{3a^{1/2}}{32} & \text{for } 0 < a \le 4 \\ 0 & \text{otherwise.} \end{cases}$$

(iv) The graphs of the PDFs of the length and the area are shown in Figure 1.14, and those of the CDFs in the figure that follows it.

Figure 1.14: (a) $f(l)$, (b) $h(a)$.

(v)
$$\text{Mean} = E(A) = \int_0^4 a\,h(a)\,da = \left[\frac{a^{3/2}}{4} - \frac{3a^{5/2}}{80}\right]_0^4 = 2 - 1.2 = 0.8$$

This could also have been found as the mean of a function of a continuous random variable, using the general result

$$E[g(X)] = \int g(x)\,f(x)\,dx$$

where $x$ is the length (not the area) of one of the squares. In this case, $g(x) = x^2$, $f(x) = \frac{3}{16}(4 - x^2)$ and $0 \le x \le 2$, giving

$$E[g(X)] = \int_0^2 x^2 \cdot \frac{3}{16}(4 - x^2)\,dx = 0.8,$$

the same answer.

Exercise

1 A continuous random variable $X$ has PDF $f(x)$ where $f(x)$ = [illegible].
(i) Find $E(X)$.
(ii) Find the cumulative distribution function, $F(x)$.
(iii) Find $P(0 \le X \le 2)$ using
(a) $F(x)$
(b) $f(x)$
and show your answer is the same by each method.

2 The continuous random variable $U$ has PDF $f(u)$ where

$$f(u) = \begin{cases} ku & \text{for } 5 \le u \le 8 \\ 0 & \text{otherwise.} \end{cases}$$

(i) Find the value of $k$.
(ii) Sketch $f(u)$.
(iii) Find $F(u)$.
(iv) Sketch the graph of $F(u)$.

3 A continuous random variable $X$ has PDF $f(x)$ where $f(x)$ = [illegible].
(i) Find the value of $c$.
(ii) Find $F(x)$.
(iii) Find the median of $X$.
(iv) Find the mode of $X$.

4 The continuous random variable $X$ has PDF $f(x)$ given by

$$f(x) = \begin{cases} \dfrac{k}{(x + 1)^4} & \text{for } x \ge 0 \\ 0 & \text{otherwise,} \end{cases}$$

where $k$ is a constant.
(i) Show that $k = 3$, and find the cumulative distribution function.
(ii) Find also the value of $x$ such that $P(X < x)$ = [illegible].

5 The continuous random variable $X$ has CDF given by

$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 2x - x^2 & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1. \end{cases}$$

(i) Find $P(X > 0.5)$.
(ii) Find the value of $q$ such that $P(X < q) = \frac{1}{4}$.
(iii) Find the PDF $f(x)$ of $X$ and sketch its graph.

6 The continuous random variable $X$ has PDF $f(x)$ given by [illegible].
Show that [illegible], and find the values of $E(X)$ and $\operatorname{Var}(X)$.
Find the cumulative distribution function for $X$, and verify by calculation that the median value of $X$ is between 1.04 and 1.05.

7 A continuous random variable $X$ has PDF $f(x)$ where

$$f(x) = \begin{cases} 6x(1 - x) & \text{for } 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$

Find the mean of $X$ and show that the variance of $X$ is 0.05.
Show that $F(x)$, the probability that $X \le x$ (for any value of $x$ between 0 and 1), satisfies

$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 3x^2 - 2x^3 & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1. \end{cases}$$

Use this result to show that $P(|X - \mu| < \sigma^2) = 0.1495$.
What would this probability be if, instead, $X$ were normally distributed?

8 The continuous random variable $X$ has PDF $f(x)$ given by [illegible] for $0 \le x \le 1$, and 0 otherwise.
(i) Find $F(x)$.
The continuous random variable $Y$ is given by $Y$ = [illegible]. The cumulative distribution function of $Y$ is denoted $H(y)$.
(ii) Find $H(y)$.
(iii) Find $h(y)$.
(iv) Find $P(X < 0.5)$.
(v) Find $P(Y < 0.5)$.

9 The continuous random variable $X$ has PDF $f(x)$ given by [illegible].
(v) Find $J(z)$ where $J(z)$ is the cumulative distribution function of $Z$.
(vi) Find [illegible].
(vii) Find $P(Z < 3)$.

10 The continuous random variable $X$ has PDF $f(x)$ given by [illegible].

KEY POINTS

1 If $X$ is a continuous random variable with PDF $f(x)$:
- $E(X) = \int x\,f(x)\,dx$
- $\operatorname{Var}(X) = \int x^2 f(x)\,dx - [E(X)]^2$
- the median of $X$ is the value $m$ for which $\int_{-\infty}^{m} f(x)\,dx = 0.5$ and $\int_{m}^{\infty} f(x)\,dx = 0.5$
- the mode of $X$ is the value for which $f(x)$ has its greatest magnitude
- a probability density function may be defined piecewise.

2 If $g(X)$ is a function of $X$ then the expectation and variance of $g(X)$ are

$$E(g(X)) = \int g(x)\,f(x)\,dx$$

$$\operatorname{Var}(g(X)) = \int (g(x))^2 f(x)\,dx - (E(g(X)))^2$$

3 If $f(x)$ is the probability density function of $X$ then the cumulative distribution function (CDF) of $X$ is $F(x)$ where
- $F(x) = \int_a^x f(u)\,du$, where $a$ is the lower limit of $X$
- $f(x) = \dfrac{d}{dx}F(x)$
- for the median, $m$, $F(m) = 0.5$.

4 Given that $f(x)$ is the probability density function of $X$, you can find that of a related variable (e.g. $Y = X^2$). To do this you need first to find the cumulative distribution function of $X$ and use this to find that of $Y$. You can then differentiate the cumulative distribution function of $Y$ to find $f(y)$, the probability density function of $Y$.

LEARNING OUTCOMES

Now that you have finished this chapter, you should be able to
- use a probability density function which may be defined piecewise
- use the general result $E(g(X)) = \int g(x)\,f(x)\,dx$, where $f(x)$ is the probability density function of the continuous random variable $X$ and $g(X)$ is a function of $X$
- use the general result $\operatorname{Var}(g(X)) = \int (g(x))^2 f(x)\,dx - [E(g(X))]^2$
- understand and use the relationship between the probability density function (PDF) and the cumulative distribution function (CDF)
- use a PDF or CDF to evaluate probabilities
- use a PDF or a CDF to calculate the median and other percentiles
- find the PDF of a function of a continuous random variable, using the CDF.

2 Inference using normal and t-distributions

Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.
R.A. Fisher (1890-1962)

2.1 Interpreting sample data using the t-distribution

Students find new bat

Two students and a lecturer have found their way into the textbooks. On a recent field trip they discovered a small colony of a previously unknown bat living in a cave.

'Somewhere in Northern India' is all that Shakila Mahadavan, 20, would say about its location. 'We don't want the general public disturbing the bats or worse still catching them for specimens,' she explained.

The other two members of the group, lecturer Jaswinder Pal and 21-year-old Vijay Kumar, showed lots of photographs of the bats as well as pages of measurements that they had gently made on the few bats they had caught before releasing them back into their cave.

The measurements included the weights of the eight bats that were caught: 156, 132, 160, 142, 145, 138, 151 and 144. From these the group want to estimate the mean weight of the bats, giving confidence intervals for it. To do this they need estimates of both the mean and the standard deviation of the parent population.

The mean is estimated to be the same as the sample mean:

$$\bar{x} = \frac{156 + 132 + 160 + 142 + 145 + 138 + 151 + 144}{8} = 146.$$

When it comes to estimating the standard deviation, start by finding the sample variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

and then take the square root to find the standard deviation, $s$.

The use of $(n - 1)$ as divisor illustrates the important concept of degrees of freedom.

Note: You need to know the degrees of freedom in many situations where you are calculating confidence intervals or conducting hypothesis tests.

The deviations of the eight numbers are as follows. (The deviation is the difference of the value from the mean. In this example the mean is 146.)

156 - 146 = 10
132 - 146 = -14
160 - 146 = 14
142 - 146 = -4
145 - 146 = -1
138 - 146 = -8
151 - 146 = 5
144 - 146 = -2

These eight deviations are not independent: they must add up to zero because of the way the mean is calculated. This means that when you have worked out the first seven deviations it is inevitable that the final one has the value it does (in this case -2).
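The dependence between the deviations can be seen directly in a short calculation. The following Python sketch uses the eight bat weights 156, 132, 160, 142, 145, 138, 151 and 144; the variable names are illustrative.

```python
from math import sqrt

weights = [156, 132, 160, 142, 145, 138, 151, 144]  # the eight bat weights
n = len(weights)

sample_mean = sum(weights) / n
deviations = [w - sample_mean for w in weights]

# The deviations always sum to zero, so the last one is fixed by the other seven.
last_from_first_seven = -sum(deviations[:-1])

# Sample variance with divisor (n - 1), the number of independent deviations.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)
sample_sd = sqrt(sample_variance)

print(sample_mean)            # 146.0
print(deviations)
print(last_from_first_seven)  # -2.0
print(sample_variance)        # 86.0
print(round(sample_sd, 2))    # 9.27
```

Because `last_from_first_seven` reproduces the eighth deviation exactly, only seven of the eight deviations carry independent information, which is the reason for dividing by 7 rather than 8.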
Only seven values of the deviation are independent and, in general, only $(n - 1)$ out of the $n$ deviations from the sample mean are independent. Consequently, there are $n - 1$ free variables in this situation. The number of free variables within a system is called the degrees of freedom and denoted by $\nu$.

So the sample variance is worked out using divisor $(n - 1)$. The resulting value is very useful because it is an unbiased estimate of the parent population variance.

Note: A particular value of the sample variance is denoted by $s^2$ and the associated random variable by $S^2$.

In the case of the bats, the estimated population variance is

$$s^2 = \frac{100 + 196 + 196 + 16 + 1 + 64 + 25 + 4}{7} = 86$$

and the corresponding value of the standard deviation is $s = \sqrt{86} = 9.27$. The numbers on the top line, 100, 196 and so on, are the squares of the deviations.

Calculating the confidence intervals

Returning to the problem of estimating the mean weight of the bats, you now know that

$$\bar{x} = 146, \quad s^2 = 86, \quad s = 9.27 \quad \text{and} \quad \nu = 8 - 1 = 7.$$

Before starting on further calculations, there are some important and related points to notice.

1 This is a small sample. It would have been much better if they had managed to catch and weigh more than eight bats.

2 The true parent standard deviation, $\sigma$, is unknown and, consequently, the standard deviation of the sampling distribution given by the central limit theorem, $\dfrac{\sigma}{\sqrt{n}}$, is also unknown.

3 In situations where the sample is small and the parent standard deviation or variance is unknown, there is little more that can be done unless you can assume that the parent population is normal. (In this case that is a reasonable assumption, the bats being a naturally occurring population.) If you can assume normality, then you may use the t-distribution, estimating the value of $\sigma$ from your sample.

4 It is possible to test whether a set of data could reasonably have been taken from a normal distribution by using normal probability graph paper.
The method involves making a cumulative frequency table and plotting points on a graph with specially chosen axes. If the graph obtained is approximately a straight line, then the data could plausibly have been drawn from a normal population. Otherwise a normal population is unlikely.

The t-distribution looks very like the normal distribution. Its exact shape depends on the number of degrees of freedom, $\nu$, and, indeed, for large values of $\nu$ it is little different from it. The larger the value of $\nu$, the closer the t-distribution is to the normal. Figure 2.1 shows the normal distribution and t-distributions with $\nu = 2$ and $\nu = 10$.

Figure 2.1

Historical note

William S. Gosset was born in Canterbury, UK, in 1876. After studying both mathematics and chemistry at Oxford, he joined the Guinness breweries in Dublin as a scientist. He found that an immense amount of statistical data was available, relating the brewing methods and the quality of the ingredients, particularly barley and hops, to the finished product. Much of this data took the form of samples, and Gosset developed techniques to handle them, including the discovery of the t-distribution.

Figure 2.3

(i) The confidence interval does not contain 95 minutes (the time taken by the train). Therefore there is sufficient evidence to suggest that the journey time by bus is different from that by train, and that it is, in fact, less.

2.2 Using the t-distribution for paired samples

The ideas developed in the last few pages can also be used in constructing confidence intervals for the difference in the means of paired data. This is shown in the next example.

In an experiment on group behaviour, 12 subjects were each asked to hold one arm out horizontally while supporting a 2 kg weight, under two conditions:
» while together in a group
» while alone with the experimenter.

The times, in seconds, for which they were able to support the weight under the two conditions were recorded as follows.
Subject:     A   B   C   D   E   F   G   H   I   J   K   L
Group:       …   71  72  53  71  48  85  72  82  54  70  73
Alone:       …   72  81  35  56  39  …   66  38  60  74  52
Difference:  …   -1  -9  18  15  9   …   6   44  -6  -4  21

Find a 90% confidence interval for the true difference between 'group' times and 'alone' times. You may assume that the differences are normally distributed. Does your result provide evidence that there is any difference in the times in the population as a whole?

The 'true difference' means the difference in the times in the population as a whole. You cannot know with certainty the differences in the population, but you can infer from the sample a confidence interval for the population.

Solution

The sample comprises the 12 differences. From these you calculate the mean, $\bar{d}$, and the standard deviation, $s$; the number of degrees of freedom is $\nu = 12 - 1 = 11$.

Given the assumption that the differences are normally distributed, you may use the t-distribution. For $\nu = 11$, the two-tailed critical value from the t-distribution at the 10% level of significance is 1.796.

The 90% symmetrical confidence interval for the mean difference between the 'group' and 'alone' times is

$\bar{d} - 1.796 \times \dfrac{s}{\sqrt{12}}$ to $\bar{d} + 1.796 \times \dfrac{s}{\sqrt{12}}$, that is 2.77 to 18.57.

Since the confidence interval does not contain zero, there is evidence that there is a difference in the times in the population as a whole.

2.3 Hypothesis testing on a sample mean using the t-distribution

At the beginning of this chapter, you met the t-distribution. This is the distribution of a sample mean when the parent population is normally distributed but the standard deviation of the parent population is unknown and has to be estimated using the sample standard deviation, $s$. In addition to finding a confidence interval, you can also carry out a t-test (a hypothesis test based on the t-distribution).

Tests are being carried out on a new drug designed to relieve the symptoms of the common cold. One of the tests is to investigate whether the drug has any effect on the number of hours that people sleep.

The drug is given in tablet form one evening to a random sample of 16 people who have colds.
The number of hours they sleep may be assumed to be normally distributed and is recorded as follows.

…   6.7  3.3  7.2  …   …   …   …
6.4  6.9  7.0  7.8  6.7  7.2  7.6  7.9

There is also a large control group of people who have colds but are not given the drug. The mean number of hours they sleep is 6.6.

(i) Use these data to set up a 95% confidence interval for the mean length of time somebody with a cold sleeps after taking the tablet.
(ii) Carry out a test, at the 1% significance level, of the hypothesis that the new drug has an effect on the number of hours a person sleeps.

Solution

(i) For the given data, $n = 16$, $\nu = 16 - 1 = 15$, $\bar{x} = 7.094$, $s = 1.276$.

For a 95% confidence interval, with $\nu = 15$, $k = 2.131$ (from tables).

The confidence limits are given by

$\bar{x} \pm k\dfrac{s}{\sqrt{n}} = 7.094 \pm 2.131 \times \dfrac{1.276}{\sqrt{16}}.$

So the 95% confidence interval for $\mu$ is 6.41 to 7.77.

Figure 2.4

(ii) $H_0$: There is no change in the mean number of hours sleep. $\mu = 6.6$
$H_1$: There is a change in the mean number of hours sleep. $\mu \neq 6.6$
Two-tailed test at the 1% significance level.

For this sample, $n = 16$, $\nu = 16 - 1 = 15$, $\bar{x} = 7.094$, $s = 1.276$.

The critical value for $t$ for $\nu = 15$, at the 1% significance level, is found from tables to be 2.947.

The test statistic is

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{7.094 - 6.6}{1.276/\sqrt{16}} = 1.55.$

This is to be compared with 2.947, the critical value for the 1% significance level. Since 1.55 < 2.947, there is no reason at the 1% significance level to reject the null hypothesis. There is insufficient evidence to suggest that the mean number of hours sleep is different when people take the drug.

If you have access to a computer, you can use a spreadsheet (see Figure 2.5) to do all of the calculations for this test using the following steps.

1 Enter the data (in this case into cells B2 to B17).
2
Use the spreadsheet functions provided by your spreadsheet, for example =AVERAGE and =STDEV, to find the mean and sample standard deviation.
3 Calculate the t value using a formula of the form =(B18-6.6)/(B19/SQRT(16)), where B18 and B19 are the cells holding the mean and the standard deviation.
4 Use the spreadsheet function provided by your spreadsheet, for example =T.INV.2T(0.01,15), to calculate the critical value.
5 You can also find the p-value using the spreadsheet function provided by your spreadsheet, for example =T.DIST.2T applied to the t value with 15 degrees of freedom.

Figure 2.5

1 The mean of a random sample of seven observations of a normally distributed random variable X is 132.6. Based on these seven observations, an unbiased estimate of the parent population variance $\sigma^2$ is 148.84.
(i) Explain why an estimate of the standard error is given by 4.61.
(ii) Show that a 95% confidence interval for the mean $\mu$ of X is 121.3 to 143.9.

2 The weights in grams of six beetles of a particular species are as follows.
12.3  9.7  11.8  10.4  …  12.4
(i) Calculate the sample mean and show that an estimate of the sample variance is 1.291.
(ii) Show that a 90% confidence interval for the mean $\mu$ is 10.32 to 12.18.

3 An aptitude test for entrance to university is designed to produce scores that may be modelled by the normal distribution. In early testing, 15 students from the appropriate age group are given the test. Their scores (out of 500) are as follows.
321445219378 4072895 2764636565298
(i) Use these data to estimate the mean and standard deviation to be expected for students taking this test.
(ii) Construct a 95% confidence interval for the mean.

4 A fruit farmer has a large number of almond trees, all of the same variety and of the same age. One year, he wishes to estimate the mean yield of his trees. He collects all the almonds from eight trees and records the following weights (in kg).
(i) Use these data to estimate the mean and standard deviation of the yields of all the farmer's trees.
(ii) Construct a 95% confidence interval for the mean yield.
(iii) What statistical assumption is required for your procedure to be valid?
(iv) How might you select a sample of eight trees from those growing in a large field?

5 A forensic scientist is trying to decide whether a man accused of fraud could have written a particular letter. As part of the investigation she looks at the lengths of sentences used in the letter. She finds them to have the following numbers of words.
7 18 2 4 18 16 14 16 16 2 2 19
(i) Use these data to estimate the mean and standard deviation of the lengths of sentences used by the letter writer.
(ii) Construct a 90% confidence interval for the mean length of the letter writer's sentences.
(iii) What assumptions have you made to obtain your answer?
(iv) A sample of sentences written by the accused has mean length 26 words. Does this mean he is innocent?

6 A large company is investigating the number of incoming telephone calls at its exchange in order to determine how many telephone lines it should have. On Sundays very few calls are received because the office is closed. During March one year, the number of calls received each day was recorded, as follows.
655 | 661 | 599 ‘08 | 650 | 901 [a [sar [oe [rar [ose To [saw | or | ia | 706 | eo a
(i) What day of the week was 1 March?
(ii) Which of the data do you consider relevant to the company's research and why?
(iii) Construct a 95% confidence interval for the number of incoming calls per weekday.
(iv) Your calculation is criticised on the grounds that your data are discrete and so the underlying distribution cannot possibly be normal. How would you respond to this criticism?

7 A tyre company is trying out a new tread pattern, which it is hoped will result in the tyres giving greater distance.
In a pilot experiment, 12 tyres are tested; the mileages (×1000 miles) at which they are condemned are as follows.
6 BN BE © 0 B81 2 6 B B
(i) Construct a 95% confidence interval for the mean distance that a tyre travels before being condemned.
(ii) What assumptions, statistical and practical, are required for your answer to part (i) to be valid?

8 A large fishing-boat made a catch of 500 mackerel from a shoal. The total mass of the catch was 320 kg. The standard deviation of the mass of individual mackerel is known to be 0.06 kg.
(i) Find a 99% confidence interval for the mean mass of a mackerel in the shoal.
An individual fisherman caught ten mackerel from the same shoal. These had masses (in kg) of
1.04  0.94  0.92  0.85  0.85  0.70  0.68  0.62  0.61  0.59
(ii) From these data only, use your calculator to estimate the mean and standard deviation of the masses of mackerel in the shoal.
(iii) Assuming the masses of mackerel are normally distributed, use your results from part (ii) to find another 99% confidence interval for the mean mass of a mackerel in the shoal.
(iv) Give two statistical reasons why you would use the first limits you calculated in preference to the second limits.

9 A new computerised job-matching system has been developed that finds suitably skilled applicants to fit notified vacancies. It is hoped that this will reduce unemployment rates, and a trial of the system is conducted. The unemployment rates in each area just before the introduction of the system and after one month of its operation are recorded in the table below.
s[e[? 46 [na [77 Ty? [ae ZI EH
(i) Find a 90% confidence interval for the true difference between the two rates of unemployment.
(ii) Does your confidence interval provide evidence that there is a difference in the rates after the new system is introduced?
(iii) Do you think the assumptions required to construct the confidence interval are justified here?
10 Two timekeepers at an athletics track are being compared. They each time the nine sprints one afternoon. The times they record are listed below.

Race:     1     2      3     4      5      6      7      8      9
Timer 1:  9.68  10.01  9.62  21.90  20.70  20.90  42.30  43.91  43.96
Timer 2:  9.66  9.99   9.44  22.00  20.82  20.58  42.39  44.27  44.22

(i) Find a 99% confidence interval for the true difference between the times recorded by the two timers.
(ii) Do you think that the two timers are equivalent on average?
(iii) Are the assumptions appropriate for a confidence interval based on the t-distribution justified in this case?

11 Fourteen marked rats were timed twice as they ran through a maze. In one condition, they had just been fed; in the other they were hungry. The data below give the rats' times in each condition.

Rat:  A   B   C   D   E   F   G   H   I   J   K   L   M   N
      30  31  25  23  50  26  14  27  31  39  38  39  44  30
      29  18  14  27  37  34  15  22  29  18  20  10  30  52

(i) Find a 95% confidence interval for the true difference between the rats' times when they are fed and when they are hungry.
(ii) Do you think the assumptions required to construct the confidence interval are justified here?
(iii) Half of the rats were made to run the maze first when hungry and half ran it first when fed. Why did the experimenter do this?

12 Cans of oil are supposed to contain at least 330 ml. Salma thinks that the average content is less than this. She measures the contents of a random sample of ten cans, with the following results (in ml).
326.5  331.2  327.9  329.8  330.4  327.6  329.3  …  …  …
(i) What distributional assumption is necessary in order to carry out a t-test to check Salma's suspicion?
(ii) Carry out this test at the 5% significance level.
(iii) If Salma picked 10 cans from a single box of 24, would the test have still been valid? Explain your answer.

13 In a certain country, past research showed that in the average married couple, the man was 7 cm taller than his wife.
A sociologist believes that, with changing roles, people are now choosing marriage partners nearer their own height. She has measured the heights of 12 couples. Her results are shown in the table below.

Couple:  1      2      3      4      5      6
Man:     198.2  174.2  192.8  163.6  183.2  …
Woman:   178.4  165.1  191.9  156.3  178.7  163.0

Couple:  7      8      9      10     11     12
Man:     180.6  173.3  166.8  171.9  175.5  …
Woman:   …      170.2  164.2  166.2  176.9  158.8

(i) Test at the 5% significance level the hypothesis that men are on average less than 7 cm taller than their wives.
(ii) Explain clearly what assumptions you make in this case.

14 The speed $v$ at which a javelin is thrown by an athlete is measured in km h$^{-1}$. The results for 10 randomly chosen throws are summarised by
$\sum v = 1108$, $\sum (v - \bar{v})^2 = 3339$,
where $\bar{v}$ is the sample mean.
(i) Stating any necessary assumption, calculate a 99% confidence interval for the mean speed of a throw.
The results for a further 5 randomly chosen throws are now combined with the above results. It is found that the sample variance is smaller than that used in part (i).
(ii) State, with reasons, whether a 95% confidence interval calculated from the combined 15 results will be wider or less wide than that found in part (i).
Cambridge International AS & A Level Further Mathematics 9231 Paper 23 Q7 November 2012

15 In a crossword competition the times, $x$ minutes, taken by a random sample of 6 entrants to complete a crossword are summarised as follows.
$\sum x = \ldots$, $\sum (x - \bar{x})^2 = \ldots$
The time taken to complete a crossword is normally distributed. Calculate a confidence interval for the population mean time. Assume now that the standard deviation of the population is known. Find the smallest sample size that would lead to a 95% confidence interval for the mean of width at most … minutes.

16 A random sample of 8 observations of a normal random variable X gave the following summarised data, where $\bar{x}$ denotes the sample mean.
$\sum x = \ldots$, $\sum (x - \bar{x})^2 = \ldots$
Test, at the 5% significance level, whether the population mean of X is greater than 45.
Calculate a 95% confidence interval for the population mean of X.
Cambridge International AS & A Level Further Mathematics 9231 Paper 21 June 2012

17 A random sample of 10 observations of a normally distributed random variable X gave the following summarised data, where $\bar{x}$ denotes the sample mean.
$\sum x = \ldots$, $\sum (x - \bar{x})^2 = \ldots$
Test, at the 10% significance level, whether the population mean of X is less than 7.5.
Cambridge International AS & A Level Further Mathematics 9231 Paper 21 Q7 November 2013

18 A random sample of 8 sunflower plants is taken from the large number grown by a gardener, and the heights of the plants are measured. A 95% confidence interval for the population mean, $\mu$ metres, is calculated from the sample data as $1.17 < \mu < 2.03$. Given that the height of a sunflower plant is denoted by $x$ metres, find the values of $\sum x$ and $\sum x^2$ for this sample of 8 plants.
Cambridge International AS & A Level Further Mathematics 9231 Paper 21 Q7 June 2015

2.4 Using the t-distribution with two samples

Have you noticed how time often seems to pass more slowly after lunch? If time passes more slowly, one minute of real time should seem longer, so if you ask people to estimate when a minute appears to have elapsed, the real time elapsed will be less.

You could ask the question: 'Will the mean real time elapsed when one minute appears to have elapsed be less after lunch than before?'

In this example you are interested, not in what the mean value of a random variable is, but in what the difference between the mean values is in two different situations. Statistical problems giving rise to different versions of this general question are the topic of the next few sections.

Lunch time

Find a group of volunteers and approach them before lunch. Give each of them a starting signal and ask them to say when one minute has elapsed. Record the real time elapsed.
You will then need to find a second group of volunteers to approach after lunch. You will need reasonably large groups to get useful results.

1 What are the advantages and disadvantages of using separate groups of people for the before-lunch and after-lunch times?
2 What are the advantages and disadvantages of instead conducting an experiment in which the same people are asked before and after lunch and only the difference in their real times recorded?

The volunteers in research projects are called subjects, and 'before lunch' and 'after lunch' are the two conditions in which testing occurs. An experiment such as the one described above, where a different group of subjects is tested in each of the two conditions, is called an unpaired design; this is in contrast to a paired design you met earlier in this chapter, where the same set of subjects is tested in both conditions.

The members of a maths class were asked one morning to check the time shown by their watches, then look away and, when they estimated that a minute had elapsed, to check their watches again to see how long had in fact elapsed. The same procedure was followed with another class from the same year group, that afternoon. The back-to-back stem-and-leaf diagram below shows the results.

   Morning class           Afternoon class
   (24 students)           (22 students)
                  | 2 | 8
                2 | 3 | 1
            6 5 5 | 3 | 6
                  | 4 | 0 4 4
              8 7 | 4 | 5 6 6 9 9 9
  4 4 3 3 3 3 1 1 | 5 | 1 1 1 2 2
    7 7 7 7 5 5 5 | 5 | 5 7 7 8
            4 4 1 | 6 | 3

6 | 3 represents 63 seconds

Figure 2.6

This experiment gives a set of data with which you could investigate the question asked at the start of this section. This experiment has an unpaired design: two separate groups of subjects are used in the two conditions.

This section uses the data given in Figure 2.6 to work through the process of hypothesis testing in the context of an unpaired design.
If you have carried out your own investigation, you might find it helpful to repeat the calculations using your data.

You cannot look here at the difference between a before-lunch and an after-lunch time for a particular person, but you can look at the difference between the mean before-lunch time and the mean after-lunch time. In fact, you can make the following hypotheses:

$H_0$: There is no difference between the mean of people's estimates of one minute before and after lunch.
$H_1$: After lunch, the mean of people's estimates of one minute tends to be shorter than before lunch.

You can then use as your sample statistic the difference between the before-lunch sample mean and the after-lunch sample mean. You need to calculate the distribution of this sample statistic on the assumption that the null hypothesis is true.

The test statistic and its distribution

Assume that each before-lunch estimate is an independent random variable $X_i$ $(i = 1, \ldots, 24)$ with the normal distribution $N(\mu_x, \sigma_x^2)$ and each after-lunch estimate is an independent random variable $Y_j$ $(j = 1, \ldots, 22)$ with normal distribution $N(\mu_y, \sigma_y^2)$. You are also making the assumption that each $X$ is independent of each $Y$. This is a plausible assumption; it merely requires the independence of the two samples taken.

Recall that the mean of a sample of size $n$ from a normal distribution $N(\mu, \sigma^2)$ has distribution

$N\left(\mu, \dfrac{\sigma^2}{n}\right).$

In this case, therefore, the mean of the 24 before-lunch estimates has distribution

$\bar{X} \sim N\left(\mu_x, \dfrac{\sigma_x^2}{24}\right)$

and, in the same way, the mean of the 22 after-lunch estimates has distribution

$\bar{Y} \sim N\left(\mu_y, \dfrac{\sigma_y^2}{22}\right).$

Next you need a result that if $X$ has distribution $N(\mu_x, \sigma_x^2)$ and $Y$ has distribution $N(\mu_y, \sigma_y^2)$ then $(X - Y)$ has distribution $N(\mu_x - \mu_y, \sigma_x^2 + \sigma_y^2)$.

Here, the distribution of the difference of the two sample means is therefore

$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y, \dfrac{\sigma_x^2}{24} + \dfrac{\sigma_y^2}{22}\right).$

The null hypothesis then states that both means are equal, that is $\mu_x = \mu_y$, so if the null hypothesis is true

$\bar{X} - \bar{Y} \sim N\left(0, \dfrac{\sigma_x^2}{24} + \dfrac{\sigma_y^2}{22}\right).$

Unfortunately you do not know $\sigma_x^2$ or $\sigma_y^2$, so you will want, as you have in earlier work, to replace these unknown values with sample estimates. It might seem most natural to have sample estimates for the two unknown
variances, but in fact it turns out that it is then hard to make any progress in calculating the distribution. This chapter, therefore, only deals with the case where you can assume that the variances in the two conditions are equal; here this means that $\sigma_x^2 = \sigma_y^2 = \sigma^2$, so that the before- and after-lunch estimates have the same variance. It follows that

$\bar{X} - \bar{Y} \sim N\left(0, \sigma^2\left(\dfrac{1}{24} + \dfrac{1}{22}\right)\right)$

so that

$\dfrac{\bar{X} - \bar{Y}}{\sigma\sqrt{\dfrac{1}{24} + \dfrac{1}{22}}} \sim N(0, 1). \qquad (*)$

To estimate $\sigma^2$, you can use the pooled variance estimator from the two samples,

$S^2 = \dfrac{(24 - 1)S_x^2 + (22 - 1)S_y^2}{24 + 22 - 2}$

where $S_x^2$ and $S_y^2$ are the usual unbiased sample estimators of the population variance. The test statistic

$\dfrac{\bar{X} - \bar{Y}}{S\sqrt{\dfrac{1}{24} + \dfrac{1}{22}}}$

which is obtained from $(*)$ by replacing the value of $\sigma^2$ with its estimator $S^2$, then has a t-distribution, with degrees of freedom equal to that in the pooled variance estimate: $(24 + 22 - 2) = 44$.

Carrying out the t-test for an unpaired sample

In the example $\bar{x} = 51.542$, $s_x = 8.797$; $\bar{y} = 47.909$, $s_y = 8.574$, so that

$s = \sqrt{\dfrac{23 \times 8.797^2 + 21 \times 8.574^2}{44}} = 8.691$

and the value of the test statistic is

$\dfrac{51.542 - 47.909}{8.691\sqrt{\dfrac{1}{24} + \dfrac{1}{22}}} = 1.416.$

Note
The process described here is the test for the difference of two means with unpaired samples; it is often called a 2-sample t-test.

The critical region for a one-tailed test in the case of 44 degrees of freedom, at the 5% significance level, is $t > 1.680$ (this value does not appear in the tables, but can be obtained by interpolation from the values given for 30 and 50 degrees of freedom). Since 1.416 < 1.680, these results lead you to accept the null hypothesis at this significance level: there is no good evidence that the before-lunch mean and the after-lunch mean of the population as a whole are different.

Experiments using a 2-sample test

1 Use a reaction timer to decide whether males and females have the same reaction times, or whether older people have slower reactions than young people (you can choose the definition of 'older' to suit the sample you have available), or whether squash players have quicker reactions. You can use a ruler as a reaction timer: hold it vertically with the zero mark downwards while the subject holds their thumb and forefinger 2 cm apart at the zero mark of the ruler. You drop the ruler without warning
and your subject tries to catch it between thumb and forefinger. The distance, $d$, through which the ruler has fallen before it is caught can be used to measure the reaction time, $t$, in seconds using the formula $t = \sqrt{\dfrac{2d}{g}}$, where $g$ is the acceleration due to gravity expressed in the same length units as $d$.

2 Are students studying A Level maths better at mental arithmetic than those taking other A Levels? You will need to devise a mental arithmetic test (do you want to test speed or accuracy?) and administer it to a group of A Level maths students and a group of students taking other A Levels. Do not be disappointed by the result! You can adapt this test to suit your prejudices: are A Level geography students better at naming capitals of foreign countries? Are A Level English students better at spelling?

3 Two groups of subjects are each given lists of 25 words. Both groups must run down the list as quickly as possible. Those in the first group tick the words that are in capital letters. (You should make sure that about half of the words, placed randomly in the list, are in capital letters.) The second group ticks the words that rhyme with a target word that you give them. (Make sure that about half of the words, placed randomly in the list, do rhyme with this word.) You then ask the subjects each to write down as many words as they can remember from the list: do not tell the subjects in advance that they will have to do this. Test the hypothesis that the subjects who have looked for rhymes remember more of the words than those who looked for capital letters. Why would it be difficult to run this experiment with a paired design?

Assumptions for the 2-sample t-test

The assumptions needed for the 2-sample t-test are quite severe.

1 The two samples must be independent random samples of the populations involved. Strictly this requires every possible sample to have an equal probability of being chosen. If you simply picked a group of volunteers, it would, therefore, probably not be a random sample.
However, this method is very close to the method often used by academic psychologists when choosing their samples. The hope in choosing a random sample is that the effects of all the irrelevant differences between members of the population that influence the variables you are testing will average out.

2 The random variables measured in the two conditions must:
(i) be normally distributed
(ii) have equal variances in the two conditions.

Are these assumptions justified? The only information you have to help you decide is the two samples: the stem-and-leaf diagram for the data of the example is shown again below.

   Morning class           Afternoon class
   (24 students)           (22 students)
                  | 2 | 8
                2 | 3 | 1
            6 5 5 | 3 | 6
                  | 4 | 0 4 4
              8 7 | 4 | 5 6 6 9 9 9
  4 4 3 3 3 3 1 1 | 5 | 1 1 1 2 2
    7 7 7 7 5 5 5 | 5 | 5 7 7 8
            4 4 1 | 6 | 3

6 | 3 represents 63 seconds

Figure 2.7

At first sight, the distributions here do not look much like samples from a normal distribution: they are rather obviously negatively skewed. Neither is it clear that they would have come from populations of the same variance. However, these are relatively small samples and it would be unwise to draw any firm conclusions from them about the population distribution from which they are drawn.

» Do you think the assumptions made in the 2-sample test are justified in the case of the experiment you carried out?
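A rough numerical look at these two assumptions is possible with the standard library alone. The Python sketch below is illustrative rather than part of the method described: the two lists are an assumed transcription of the stem-and-leaf diagram (chosen because it reproduces the summary statistics quoted in the worked example), and the helper name `skewness` is hypothetical.

```python
from statistics import mean, stdev

# Times in seconds read from the back-to-back stem-and-leaf diagram
# (an assumed transcription of the figure).
morning = [32, 35, 35, 36, 47, 48,
           51, 51, 53, 53, 53, 53, 54, 54,
           55, 55, 55, 57, 57, 57, 57,
           61, 64, 64]
afternoon = [28, 31, 36, 40, 44, 44,
             45, 46, 46, 49, 49, 49,
             51, 51, 51, 52, 52,
             55, 57, 57, 58, 63]


def skewness(data):
    """Moment coefficient of skewness, m3 / m2**1.5 (negative: longer lower tail)."""
    n = len(data)
    centre = mean(data)
    m2 = sum((x - centre) ** 2 for x in data) / n
    m3 = sum((x - centre) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


for label, sample in (("morning", morning), ("afternoon", afternoon)):
    # Sample size, mean, unbiased standard deviation and skewness.
    print(label, len(sample), round(mean(sample), 3),
          round(stdev(sample), 3), round(skewness(sample), 2))

# Ratio of the two unbiased sample variances; a value near 1 is
# consistent with (but does not prove) a common population variance.
variance_ratio = stdev(morning) ** 2 / stdev(afternoon) ** 2
print("variance ratio =", round(variance_ratio, 2))
```

With samples this small, neither the skewness coefficient nor the variance ratio settles the question; they only put numbers on what the diagram suggests.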
The underlying logic of hypothesis testing

When you construct the sampling distribution of a test statistic you use
» a model for the distribution of the random variables involved
» the value given to a parameter of this distribution by the null hypothesis.

In the time-estimation example the construction of the sampling distribution depends on:
» people's estimates of one minute before and after lunch being independent and distributed normally, with a common variance
» the null hypothesis that the difference between the means of their estimates before and after lunch is zero.

The alternative hypothesis, that the difference between these means is greater than zero, gives an alternative range of possible values for the parameter of the distribution, but assumes the same model for the random variables involved.

In the example it was determined (by using pre-calculated tables, in fact) that if
» the model for the random variables was correct and
» the null hypothesis were true
then a test statistic greater than 1.680 would only arise in a random sample 5% of the time. (The significance level is 5%.)

In the example earlier in this section, you obtained a value of 1.416 and, since this is less than 1.680, the null hypothesis was accepted. Suppose, instead, that you had obtained a value greater than 1.680; say, for example, 1.832. In that case, there would be three possible explanations.

Explanation A
» The model is correct.
» The null hypothesis is false, because the mean difference in before- and after-lunch times is greater than zero.

Explanation B
» The model is correct.
» The null hypothesis is true (or false because the mean difference in before- and after-lunch times is actually less than zero). However, the sample selected happens to give a value of the test statistic greater than 1.680.
The probability of this happening is 0.05 (the significance level) if the null hypothesis is true, or less if the mean difference in before- and after-lunch times is actually less than zero.

Explanation C
» The model is incorrect, because the sampling method does not produce independent estimates for each subject, or because the estimates are not distributed normally in the population, or do not have a common variance.
» The null hypothesis is true or false. In this case you have no idea how likely it is that the test statistic will have any value at all.

The hypothesis testing methodology is:
» to assume that explanation C is not the case
» to observe that if explanation B was the case then the results obtained would be very unlikely
» and therefore to accept that explanation A is the case.

Thus you reject the null hypothesis and accept the alternative.

However, you should always be aware that the logic that leads you to this conclusion on the basis of the evidence in the sample depends on the correctness of your sampling and distributional assumptions.

2.5 Comparison between paired and 2-sample t-tests

The table below shows summarised data from the experiment you have just been analysing, together with data gathered from a paired experiment using a single sample of twelve people. Each was asked, both before and after lunch, to estimate one minute in the same way as described for the unpaired design.

The test statistic for the paired experiment is 1.829, with 11 degrees of freedom and a critical value of 1.796, so that here the null hypothesis is rejected.

Why do you reject the null hypothesis in the paired case where the sample size is considerably smaller, which, all other things being equal, would usually lead to a less decisive test, as reflected in the larger critical value?
You can see why the opposite appears to have happened if you look at how the test statistics for the two cases are calculated.

Paired: $t = \dfrac{\bar{d}}{s_d/\sqrt{12}} = 1.829$, where $\bar{d}$ and $s_d$ are the mean and standard deviation of the twelve differences.

Unpaired: $t = \dfrac{51.542 - 47.909}{8.691\sqrt{\dfrac{1}{24} + \dfrac{1}{22}}} = 1.416$

The test statistics for the paired and unpaired calculations have very similar numerators, but the standard error in the denominator is considerably larger in the unpaired calculation, despite the larger sample size in that case.

Note
The crucial point is that there is, for all sorts of reasons, considerable variation amongst people in their reaction times, and lunch is only one, relatively small, effect amongst many. Some people will tend to make short estimates in both conditions and some long estimates in both conditions, though in both cases the effect of lunch may be the same. The paired design enables you to take this into account in a way that the unpaired design cannot, because of the way the standard error is estimated.

Using paired and 2-sample t-tests

It is a characteristic of research by social scientists that they are looking for a small average difference between the values of a particular variable in two different conditions, but that subjects show very substantial variation in the values of this variable within both conditions. In these situations, a 2-sample t-test is not usually very helpful, as it will require a very large sample size to discriminate between the null hypothesis of no difference between the means in the two conditions and the true situation where there is a small difference.

Considerable ingenuity is therefore employed in attempting to match subjects so that a paired test can be used to eliminate some of the variation between them and the small difference between the two experimental conditions is not swamped.

In the paired experiment, you used the same subject in each of two conditions, but this is not necessary. In fact, having taken part in one experimental condition sometimes makes it impossible to take part in the second.
For example, if you wish to test the effect on children's intelligence of an upbringing in families from two different social classes, you could not use the same child and bring it up twice, nor would a 2-sample t-test be suitable in this case: the variation in intelligence caused by other factors would swamp the effect you are looking for.

One possibility is to find pairs of identical twins who are being adopted at birth and are assigned to adoptive parents of different social classes: these constitute matched pairs of subjects and you could use a t-test on the differences between the intelligences of the twins from the two types of family. Notice that here the matching is perfect in the sense that both children have identical genetic endowments: the belief implicit in this experiment is that heredity is a major cause of variation in intelligence and this effect will be cancelled out by the matching process. Of course, there will be many differences between the adoptive families other than class, and it is possible that the variations in intelligence induced by these differences in upbringing will still swamp the effect being examined. Ideally, you would want to find identical twins being assigned to families differing only in their social class, but it is unlikely that you would find enough, if any, examples of this to conduct the test!

2.6 Testing for a non-zero value of the difference of two means

You have now used the t-test to examine the null hypothesis that two different conditions produce the same mean value of some random variable.

The method can also be used in a more general way to test null hypotheses that suggest that the mean of a random variable, X, differs by a given amount in the two conditions.
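One way to sketch this generalisation in code is to subtract the hypothesised difference from the difference of the sample means before dividing by the pooled standard error. The Python function below is an illustration only: the name `pooled_t_statistic` and the argument `delta` are assumptions made for this sketch, the equal-variance assumption is built in, and the numbers are the summary statistics quoted for the time-estimation example.

```python
from math import sqrt


def pooled_t_statistic(mean_1, sd_1, n_1, mean_2, sd_2, n_2, delta=0.0):
    """Return (t, degrees of freedom) for H0: mu_1 - mu_2 = delta.

    Assumes both populations are normal with a common variance, so the
    two unbiased sample variances are pooled.
    """
    dof = n_1 + n_2 - 2
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / dof
    standard_error = sqrt(pooled_variance * (1 / n_1 + 1 / n_2))
    return (mean_1 - mean_2 - delta) / standard_error, dof


# Summary statistics quoted for the time-estimation example (delta = 0).
t, dof = pooled_t_statistic(51.542, 8.797, 24, 47.909, 8.574, 22)
print(round(t, 3), dof)

# Hypothetical null value of 2 seconds, purely to show how a non-zero
# delta shifts the numerator while the standard error is unchanged.
t_shifted, _ = pooled_t_statistic(51.542, 8.797, 24, 47.909, 8.574, 22, delta=2.0)
print(round(t_shifted, 3))
```

Setting `delta=0` recovers the ordinary 2-sample statistic, so the same function covers both the earlier test and the more general one.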
Hypothesis: for some given value of $\delta$

$H_0$: The difference between the mean values of $X$ in condition 1 and condition 2 is $\delta$.

Answers to exercises are available at www.hoddereducation.com/cambridgeextras

Sample: Two sets of observations of $X$, one set in each condition. Let $X_1$ and $X_2$ be the random variables in the two conditions and $n_1$ and $n_2$ be the numbers of observations under each condition. Use these values to calculate the sample means $\bar{X}_1$ and $\bar{X}_2$ and the unbiased pooled-sample estimator $S^2$ of the population variance.

Then, provided that the random variable $X$ is distributed normally in the population, with the same variance in each condition, and that the null hypothesis is true, the test statistic

$$\frac{(\bar{X}_1 - \bar{X}_2) - \delta}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

has a $t$-distribution with $n_1 + n_2 - 2$ degrees of freedom.

The manufacturers of a dieting compound claim that the use of their product as part of a calorie-counting diet leads to an average extra weight loss of at least five kilograms in a period of months. An experiment has been carried out by a consumers' group that doubts this claim.

The hypotheses are:

$H_0$: The mean extra weight loss in a period of months from adding the dieting compound to a calorie-counting diet is five kilograms.

$H_1$: The mean extra weight loss in a period of months from adding the dieting compound to a calorie-counting diet is less than five kilograms.

The assumptions are that the weight loss in a period of months from a calorie-counting diet, with or without the dieting compound, is a normally distributed random variable and that the addition of the dieting compound to the diet does not affect the variance of this random variable.

Thirty-six dieters used the dieting compound with their diets; their weight losses $x_i$ ($i = 1, \ldots, 36$) in kilograms are summarised by the figures $\sum x_i$ and $\sum x_i^2$ [values illegible].

Sixty-two dieters followed the same calorie-counting procedure, but did not use the dieting compound; their weight losses $y_i$ ($i = 1, \ldots, 62$) in kilograms are summarised by the figures $\sum y_i$ and $\sum y_i^2$ [values illegible].

Solution

These data give the sample means and standard deviations, and from them the pooled estimate $s$ [figures partly illegible as extracted: 1.37, 6433, 9.22, 326].

The test statistic is

$$\frac{(\bar{x} - \bar{y}) - 5}{s\sqrt{\frac{1}{36} + \frac{1}{62}}} = -3.144$$

and there are (size of sample 1 + size of sample 2 - 2) = (36 + 62 - 2) = 96 degrees of freedom.

The critical region for a one-tailed test with 96 degrees of freedom at the 5% significance level is $t < -1.661$, using interpolation, and so, since $-3.144 < -1.661$, the null hypothesis is rejected in favour of the alternative that the average extra weight loss is not as great as five kilograms.

2.7 Hypothesis tests and confidence intervals

There is a very close relationship between hypothesis tests and confidence intervals, which should be clearly understood. A hypothesis test suggests a value for an unknown population parameter (the null hypothesis), and then accepts this value if a test statistic lies in a particular range (that is, lies outside the critical region). However, the critical region depends on the hypothesised population parameter, so you can reverse this process. Thus, for a given value of the test statistic, you can determine the range of values for the population parameter that would be accepted by the test if they were offered as null hypotheses. This is called the confidence interval for the population parameter.

For instance, in the case where you take an independent random sample of size $n$ from a normal distribution to test the hypotheses:

$H_0$: Population mean $= \mu_0$

$H_1$: (a) Population mean $\neq \mu_0$ or (b) Population mean $> \mu_0$ or (c) Population mean $< \mu_0$

the test statistic is

$$\frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

and you accept the null hypothesis at the $\alpha$% significance level if

(a) $-\tau_\alpha \le \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \le \tau_\alpha$

(b) $\dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \le t_\alpha$

(c) $\dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \ge -t_\alpha$

where $t_\alpha$ and $\tau_\alpha$ are the one- and two-tailed critical values respectively for the $t$-distribution with $n - 1$ degrees of freedom at the $\alpha$% level.
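The link between acceptance by a two-tailed test and membership of a confidence interval can be checked numerically. The block below is a minimal sketch: the sample is hypothetical, the function names are assumptions, and the critical value 2.228 (two-tailed, 5%, 10 degrees of freedom) is taken from tables rather than computed.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """One-sample t statistic for H0: population mean = mu0."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))


def accepted(sample, mu0, crit):
    """True if the two-tailed test does not reject mu0."""
    return abs(t_statistic(sample, mu0)) <= crit


def two_sided_interval(sample, crit):
    """Interval of all mu0 values that the two-tailed test would accept."""
    n = len(sample)
    half_width = crit * statistics.stdev(sample) / math.sqrt(n)
    centre = statistics.mean(sample)
    return centre - half_width, centre + half_width


# Hypothetical sample of size 11, so there are 10 degrees of freedom.
sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5, 10.0, 10.3, 9.7]
CRIT = 2.228  # two-tailed 5% critical value for t with 10 degrees of freedom

low, high = two_sided_interval(sample, CRIT)
for mu0 in [9.5, 9.9, 10.1, 10.4, 10.8]:
    print(mu0, accepted(sample, mu0, CRIT), low <= mu0 <= high)
```

For every candidate value the two printed booleans agree, which is the sense in which the confidence interval collects the null hypotheses that the test would accept.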
Alternatively, for a given value of $\bar{x}$ you can view these inequalities as constraining the range of values of $\mu_0$ which would be accepted by the test if they were offered as null hypotheses, and rearranging them gives the $(100 - \alpha)$% confidence intervals:

(a) $\bar{x} - \tau_\alpha \dfrac{s}{\sqrt{n}} \le \mu_0 \le \bar{x} + \tau_\alpha \dfrac{s}{\sqrt{n}}$

(b) $\mu_0 \ge \bar{x} - t_\alpha \dfrac{s}{\sqrt{n}}$

(c) $\mu_0 \le \bar{x} + t_\alpha \dfrac{s}{\sqrt{n}}$

Note: Usually two-sided confidence intervals are used, as in (a).

Confidence intervals for the difference of two means from unpaired samples

Two runners are being considered for a place in a team. They have each recently competed in several races, though not against each other. Their times (in seconds) were as shown in the table below.

[Table of the two runners' times, partly illegible as extracted: 472 | 518 | 481 | 479 | #90 | 482 | ws | a4 | 483 | wt | 476]

You can model the first and second runners' times with variables $T_1$ and $T_2$ with distributions $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively. You are describing their running times as normally distributed with different means and a common variance. The different means reflect differences in the runners' underlying ability; the random variability comes from factors such as the influence of other runners and weather conditions, for which the effects in the different races are independent.

Because you are interested in the difference in the runners' underlying abilities, you are looking for a confidence interval for the difference between $\mu_1$ and $\mu_2$.

With $n_1$ and $n_2$ races timed for the two runners, the sample means of the runners' times have distributions

$$\bar{T}_1 \sim N\left(\mu_1, \frac{\sigma^2}{n_1}\right) \quad \text{and} \quad \bar{T}_2 \sim N\left(\mu_2, \frac{\sigma^2}{n_2}\right)$$

so that the distribution of their difference is

$$\bar{T}_1 - \bar{T}_2 \sim N\left(\mu_1 - \mu_2, \ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right).$$

The standardised variable

$$\frac{(\bar{T}_1 - \bar{T}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

then has an $N(0, 1)$ distribution. If you replace $\sigma^2$ with its unbiased sample estimator

$$S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

where $S_1^2$ and $S_2^2$ are the unbiased sample estimators of the variance from the two separate samples, then, finally,

$$D = \frac{(\bar{T}_1 - \bar{T}_2) - (\mu_1 - \mu_2)}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

has a $t$-distribution with $n_1 + n_2 - 2 = 10$ degrees of freedom.

The critical value for the $t$-distribution with 10 degrees of freedom at the 5% significance level is 2.228, so that $D$ lies between $-2.228$ and $+2.228$ in 95% of samples; that is, a 95% confidence interval for $(\mu_1 - \mu_2)$ is defined by:

$$-2.228 < D < +2.228.$$

This can be rearranged as

$$(\bar{t}_1 - \bar{t}_2) - 2.228\,s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{t}_1 - \bar{t}_2) + 2.228\,s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

More generally, let:

» $\bar{x}_1$ and $\bar{x}_2$ be the means of samples of sizes $n_1$ and $n_2$ of $X_1$ and $X_2$
» $X_1$ and $X_2$ be distributed normally in the population with a common variance
» $X_1$ and $X_2$ be independent of each other in the population
» $s^2$ be the pooled-sample estimate of the common population variance of $X_1$ and $X_2$
» $\tau_\alpha$ be the two-sided $\alpha$% critical value for the $t$-distribution with $(n_1 + n_2 - 2)$ degrees of freedom.

Then a $(100 - \alpha)$% confidence interval for the difference in the means $\mu_1$ and $\mu_2$ of $X_1$ and $X_2$ is given by

$$\bar{x}_1 - \bar{x}_2 - \tau_\alpha s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < \bar{x}_1 - \bar{x}_2 + \tau_\alpha s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

In questions 1-3, you are expected to make a sensible choice of significance level for the hypothesis tests involved. Remember that the 5% level is conventional in scientific contexts.

1 A species of finch has subspecies on two different Galapagos Islands. The weights of a sample of finches from each island are listed below.

[Table of weights illegible]

(i) Is there evidence that the finches on Daphne Major are heavier on average than those on Daphne Minor?
(ii) What assumptions do you need to make? Are they reasonable?

2 Two groups of subjects are asked to volunteer for a psychology experiment. One group is told that they will be paid $1 for participating; the other that they will be paid $20. The experiment consists of a rather dull task that must be repeated for one hour: subjects are then asked to rate how interesting the task was, on a scale from 1 to 10, with 10 being the most interesting.

(i) Test, using a 2-sample t-test, the hypothesis that the task was found more interesting by those who were paid less.
The ratings of the two groups were:

[Table of ratings missing]

(ii) State and comment on the assumptions you are making in order to carry out this test.
(iii) Could you devise a paired design for this experiment?

3 [Opening of question illegible] The results he finds (measuring incomes in $ per week) are summarised in this table.

[Table of incomes missing]

(i) Show, using a 2-sample t-test, that the hypothesis that those staying on at school have higher incomes at age 24 is rejected, on the evidence of this sample. What assumptions are you making for a 2-sample t-test to be appropriate? How plausible are they?
(ii) What other difference between the two groups inevitably exists that might explain this unexpected result? How could you design an experiment to eliminate this effect?

In questions 4-6, you need to decide whether the data are from an experiment with a paired or an unpaired design. You are expected to make a sensible choice of significance level for the hypothesis tests involved. Remember that the 5% level is conventional in scientific contexts.

4 Amongst all praying mantises, females are on average 7 centimetres longer than males. A new variety of mantis has been bred, the insects of which are supposed to be more nearly equal in size.

Test the hypothesis that the difference between male and female average lengths is less than 7 centimetres, using the lengths in centimetres of the sample of twelve males and twelve females shown below. State clearly the assumptions you are making in your test.

[Table of lengths, partly illegible as extracted: ia [isa [19 | 3 [a7 | 130 wi | 202 [42 | 62 | 169 | a8 22 | 242 | 4 | a4 | 132 | 218 wos | 235 [168 | 119 132]
5 In fact the data given in question 4 are paired: each male mantis is paired with its mate in the following way.

[Table of paired lengths, partly illegible as extracted: T]2[3]4 ia] [a9 [63 maz [ou | 124 [254 7[#19 [0 10.1 | 202 | 142 | 62 103] 235 [108 [119 36 87 [130 132 [214 n [2 io] 88 155 [132]

(i) Test the hypothesis that male mantises are on average less than 7 centimetres smaller than their mates.
(ii) Explain clearly what assumptions you make in this case, and how these assumptions differ from those you made in question 4.
(iii) Why are the results you obtain different in this case from those you found in question 4?

6 It is known from many studies that the best current post-operative treatment reduces stays in hospital after major operations, compared with untreated patients, by an average of 6.2 days. A new treatment is proposed, with the hypothesis that this new treatment will reduce stays in hospital by more than 6.2 days on average, and a trial is conducted on two groups of patients who have just undergone major operations. The results are shown below.

[Table of results missing]

Test the hypothesis given, clearly stating the assumptions you are making.

In questions 7-9, you need to decide whether the data are from an experiment with a paired or an unpaired design.

7 The masses, in grams, of nine hens' eggs and eight duck eggs are recorded below.

[Table of masses illegible]

(i) Construct a 95% confidence interval for the difference in mean mass of hens' eggs and duck eggs.
(ii) State the assumptions you are making in constructing this confidence interval.

8 A group of rowers and a group of chess players have their resting pulse rates measured. These data are shown below.

[Table of pulse rates, partly illegible as extracted: 70 fz ] é[a[7[~lal als uz 9} | 92 | 79 pee | as }]

Construct a one-sided 95% confidence interval, giving an upper limit for the extent to which the mean resting pulse rate of chess players exceeds that of rowers.

9 The amount, p, of infestation of maize fields by root nematodes, in grams of the pest per square metre, is measured in randomly chosen square metre areas on 33 maize farms.
Some of the farms have sprayed the crops with a new pesticide. The measurements are summarised in the table below.

[Table of summary measurements, illegible as extracted: 1490862]

Construct a 90% confidence interval for the difference in mean infection between the sprayed and unsprayed crops.

10 Fish of a certain species live in two separate lakes, A and B. A zoologist claims that the mean length of fish in A is greater than the mean length of fish in B. To test his claim, he catches a random sample of 8 fish from A and a random sample of 6 fish from B. The lengths of the 8 fish from A, in appropriate units, are as follows.

[Lengths partly illegible as extracted: 153° 120 151 2B HB]

Assuming a normal distribution, find a 95% confidence interval for the mean length of fish in A.

The lengths of the 6 fish from B, in the same units, are as follows.

[Lengths partly illegible as extracted: 150 107 136 4 116 126]

Stating any assumptions that you make, test at the 5% significance level whether the mean length of fish in A is greater than the mean length of fish in B.

Calculate a 95% confidence interval for the difference in the mean lengths of fish from A and from B.

Cambridge International AS & A Level Further Mathematics 9231 Paper 22 Q11 November 2014

2.8 Using the normal distribution with two samples

In studying 2-sample t-tests, you had to make the assumption that the variance in the population of the random variable you were sampling was the same in both conditions. You then estimated this common variance from the two samples. However, there are some situations in which you know the variance of the whole population and you can use this information in a hypothesis test or in constructing a confidence interval.
For instance, it may be that, before the ability of the maths class to estimate one minute was tested (see page 43), extensive tests were conducted that determined that, in the school population as a whole, students' estimated minutes are normally distributed and have a standard deviation of 7.42 seconds.

You are testing the hypotheses:

$H_0$: There is no difference between the mean of people's estimates of one minute before and after lunch.

$H_1$: After lunch, the mean of people's estimates of one minute tends to be shorter than before lunch.

But you can now make the assumption that people's estimates of one minute are normally distributed with standard deviation 7.42.

The null hypothesis implies that before-lunch and after-lunch estimates have distributions $N(\mu, 7.42^2)$, where $\mu$ is the common mean asserted by the null hypothesis. With this assumption, the mean of the 24 before-lunch estimates has distribution

$$\bar{X} \sim N\left(\mu, \frac{7.42^2}{24}\right)$$

and the mean of the 22 after-lunch estimates has distribution

$$\bar{Y} \sim N\left(\mu, \frac{7.42^2}{22}\right).$$

The distribution of the difference of the two sample means is therefore

$$\bar{X} - \bar{Y} \sim N\left(0, \ \frac{7.42^2}{24} + \frac{7.42^2}{22}\right).$$

In Probability & Statistics 2 you conducted hypothesis tests with the normal distribution: if a statistic has a normal distribution whose variance is known, then the standardised test statistic has the standard normal distribution, $N(0, 1)$. The test statistic here is

$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{7.42^2}{24} + \dfrac{7.42^2}{22}}}.$$

With the data used in the example on page 43, $\bar{x} = 51.542$ and $\bar{y} = 47.909$, so the test statistic has the value

$$\frac{51.542 - 47.909}{7.42\sqrt{\frac{1}{24} + \frac{1}{22}}} = 1.659.$$

The critical region for a one-tailed test at the 5% significance level for the standard normal distribution is $z > 1.645$. In this case, since $1.659 > 1.645$, you reject the null hypothesis and accept the alternative, that the after-lunch times are shorter than the before-lunch times.

Different known variances for the two samples

Alternatively, you might know separately the variances of the populations from which each sample was drawn, where these need not be the same.
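Whether the two known standard deviations are equal or different, the standardised difference of the sample means takes the same form. The block below is a minimal sketch, assuming the sample means 51.542 and 47.909 from the before-lunch and after-lunch estimates. The function name and the `delta` parameter (a hypothesised difference, zero here) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def z_for_difference(mean1, mean2, sd1, n1, sd2, n2, delta=0.0):
    """z statistic for H0: mu1 - mu2 = delta, with known standard deviations."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - delta) / standard_error


# Common known standard deviation of 7.42 seconds for both samples.
z = z_for_difference(51.542, 47.909, 7.42, 24, 7.42, 22)

# One-tailed 5% critical value of the standard normal distribution.
critical = NormalDist().inv_cdf(0.95)

print("z =", round(z, 3))
print("critical value =", round(critical, 3))
print("reject H0:", z > critical)
```

With these inputs the statistic agrees with the value 1.659 found above and only just exceeds the critical value. For populations with different known variances, the same function is called with two different values for `sd1` and `sd2`.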
Suppose there are two machines in a factory. The first is a high-accuracy machine, which produces bolts with radii that are normally distributed with standard deviation 0.052 mm. The second is a lower-accuracy machine, producing washers with internal radii that are normally distributed with standard deviation 0.172 mm. Both machines are adjustable to produce components with different radii, but today they are supposed to be set so that the high-accuracy machine produces bolts with radii 2 mm smaller than the internal radii of the washers produced by the low-accuracy machine.

To check whether the setting is correct, a sample of components is taken from each machine, and the radius of each measured. The results are shown in the following table.

[Table of radii, partly illegible as extracted: wosz [toxz | 998 [1009 [ 1057 [1049 | 110]

You are testing the hypotheses:

$H_0$: The mean radius of the bolts being produced is 2 mm less than the mean internal radius of the washers being produced.

$H_1$: The mean radius of the bolts and the mean internal radius of the washers being produced do not differ by 2 mm.

You can assume that the radii of the components being produced by each machine are normally distributed with the standard deviations given above.

If $X_w$ denotes the internal radius of a washer, and $X_b$ the radius of a bolt, what is the distribution of the sample statistic $\bar{X}_w - \bar{X}_b$?
