STATISTICS
Problems and Solutions
Second Edition
E E BASSETT, J M BREMNER, B J T MORGAN
University of Kent at Canterbury, UK
I T JOLLIFFE
University of Aberdeen, UK
B JONES
SmithKline Beecham Pharmaceuticals, UK
P M NORTH
ASRU Ltd., UK
World Scientific
Singapore · New Jersey · London · Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

First published in Great Britain 1986 by Edward Arnold (Publishers) Ltd, 41 Bedford Square, London WC1B 3DQ
Copyright © E. E. Bassett, J. M. Bremner, I. T. Jolliffe, B. Jones, B. J. T. Morgan, P. M. North 1986
First published 2000

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

STATISTICS: Problems and Solutions
Copyright © 2000 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 981-02-4293-X
ISBN 981-02-4321-9 (pbk)

Printed in Singapore by Uto-Print
Preface

The idea for this book arose from a short course the authors have been giving at the University of Kent annually since 1974, to an audience consisting mainly of schoolteachers. The course has several aims, but one of the most important is to give the participants guided experience in solving problems in statistics. We have found that many people have a passing acquaintance with the various concepts encountered in a typical introductory statistics course for those with a basic knowledge of calculus: the sort of course which in Britain might be given to sixth-form mathematicians or to first year undergraduates. By and large, though, people usually have little experience in applying the concepts in practical examples, and hence lack confidence. These attributes of experience and confidence are vital to teachers and students alike, and since they can be gained most effectively by working through examples, we feel that a book of worked solutions to problems, containing many notes and comments, will be found to be of value.

A quick glance through the pages of this book will reveal the authors' approach. When statistics is compared with, say, mechanics as an examination subject, we find that the algebraic manipulations are relatively straightforward; the real difficulty, by contrast, comes from determining which techniques to apply to which problem. Because of this, straightforward solutions to problems are of limited value unless they contain a discussion of why the technique used is the most appropriate one. We have therefore deliberately elaborated solutions into more than bald 'model answers', and have in almost all cases provided extensive notes to the solutions, which we hope will add to the value of the solutions themselves. Some notes give alternative methods of solution, while others present solutions to related problems. In this way we aim to give readers an insight into how professional statisticians view and deal with the sorts of problems set as questions in public examinations.

Since most readers of this book will have interests in solving problems set in public or professional examinations at A-level or first year undergraduate level, we have devised the problems after perusing many examination questions set at this level. The problems are all original, and we have of course taken care to respect the copyright in published examination questions, although we have in many (but not all) cases ensured that they are realistic in standard and style by keeping actual questions in mind while devising them. In one respect, however, the problems included in this book are often not representative of examination questions: that of length. Occasionally, when we have felt it desirable to explore the links between related statistical techniques, we have extended a problem to well beyond acceptable examination length, and we have felt it helpful to do this. But in general we have, in any case, preferred not to restrict ourselves to 'standard' length problems.

Since this book is not designed as a standard textbook, we have in general not included straightforward expository material, even if a fair proportion of examination questions require 'bookwork' answers; such answers can be obtained most effectively through reading a good textbook, and we have framed most of the problems so as to omit these matters. Bookwork sections of questions have typically been excluded. Occasionally, though, inclusion of a 'bookwork' section of a problem has enabled us, at the same time, to widen horizons somewhat by discussing techniques slightly beyond most syllabuses. Some relatively difficult problems (or parts of problems) have been marked with asterisks; readers may prefer to postpone working through these parts until they have become familiar with the more straightforward material.
We feel that it will often be from examining the notes that the reader will discover the relationships between the various statistical techniques which will give the confidence needed to tackle new problems. The fact that statistical problems are so interdependent emphasises the unity of the subject: it is not just a collection of isolated techniques, but a comprehensive approach to the analysis of data. Thus, while the problems have been grouped according to the topic of primary interest, the problems devised for this book often do not fall neatly into single categories; many problems have facets which take them out of their section, and sometimes indeed into a different chapter.

A textbook is designed to be read from beginning to end, in sequence; we expect readers, by contrast, to dip into this book. It is not intended to be the book which cannot be put down, but we hope it is the one which, when picked up, helps to resolve difficulties about how a technique is carried out and when it is appropriate. The only caution we would give a reader who does wish to dip into the book is that the introductions to individual chapters and sections should normally be read along with the problems, since notation and terminology are usually introduced there. Consequently, there are several occasions in early chapters in which reference is made to later material for further details, or for related material. Some of the notes are more advanced in standard; as with the more difficult problems, these have been marked with asterisks.

The most basic division in Statistics is that between probability and inference, and accordingly we use such a division in this book. One can describe the distinction between these two areas in a variety of ways. Perhaps the simplest is to note that problems of inference start with sample information and aim to make statements about the groups or populations from which the samples were randomly drawn: we can regard inference as a form of generalisation. Problems in probability theory, by contrast, describe the background, state where the haphazard (or random) features occur, and require one to calculate the probability of some event of interest. Probability is thus the reverse process of arguing from general knowledge to special cases: a process of deduction. Since making deductions from axioms is the basic procedure of pure mathematics, probability is essentially a branch of mathematics, that branch dealing with problems with random or haphazard elements.

The link between probability and statistical inference is just as clear-cut as is the distinction between them. In inference, typically, we have incomplete information about a population: we have knowledge about part of a population, and wish to infer what we can about that population. A straightforward, if informal, way of starting this process is to construct a histogram from the sample; in doing this we aim to discover from what distribution the data might have come. Having completed such a preliminary analysis, and perhaps having performed a goodness-of-fit test, we may assume that a variable has, say, a normal distribution, but with the parameters of that distribution unknown; in such a case the inferences are then required only about those parameters. Other examples of inference are found in problems of regression and correlation, in which a random sample gives partial information about the relationship between two variables. Since inference attempts to make statements about entire populations on the basis of the partial information given by a sample, these statements cannot be both definitive and guaranteed as true; the most we can do is to couch our conclusion in terms of probability. It follows that, if we test two fertilisers on a sample of crops, for example, we cannot be certain that the one which appears to be the better will always give superior results. Thus the statistician cannot do without results in probability theory in making inferences.

We have taken a commonsense approach to the matter of the accuracy to which calculations are carried out. Our aim is almost always to ensure that the result of an entire sequence of calculations is presented to a reasonable number of significant digits, and is accurate to the number of digits quoted. (There are one or two exceptions to this rule, for cases in which results are conventionally quoted to a fixed accuracy only.) Since tables of Student's t-distribution on 10 degrees of freedom, for example, are generally given to three or four significant digits, it is pointless calculating a corresponding statistic to any greater accuracy; but it would be rather risky to be much less accurate in our calculations.

In most cases we have presented some intermediate values in a sequence of calculations, as a help to the reader wishing to follow the calculations through. Our practice in this book has been to present any intermediate steps to a reasonable accuracy, but subsequent calculations are based not on the value presented but on a more accurate one: it is basically as if one was using a calculator which held the intermediate value to 8 significant digits, and one recorded a value only to a few significant digits as an aide-mémoire. A difficulty in doing this is that if such a value is presented only to the number of significant digits in the initial or final values, the sequence of calculations may not appear to be correct. (As an unrealistically simple example, suppose we were evaluating (10/3.2) × 3.2 to 2 significant digits, without using the obvious method of cancelling. Dividing 10 by 3.2 gives 3.125, which we would present as 3.1; we would then have to write 3.1 × 3.2 = 10, which does not appear correct, even though the final answer, based on the more accurate intermediate value, is indeed 10.)

A student completely unfamiliar with a subject does not start to study it by reading a book of worked examples. Accordingly, we have written for those who have read a first-level textbook such as A Basic Course in Statistics, Second Edition, by G. M. Clarke and D. Cooke (Edward Arnold, 1983), and will therefore be familiar with the terminology and most of the concepts of elementary probability and statistics. We do not set out to teach systematically matters like conditional probability, use of normal tables, histograms, t tests and correlation (just to take a single example from each chapter). We assume that the reader has encountered these, and either knows what is involved or has a favourite textbook in which to look them up.

Different authors have contributed different sections to this book, but we have shared in the production of all parts of the volume. Inevitably, some authors have contributed more than others, and in particular the task of overall editorial responsibility was mainly undertaken by one of us (E.E.B.). The book was typed at a computer terminal, using the troff system, without which the successive revisions of the material would have been very much more difficult. The production of the book owes a great deal to the magnificent service offered by the Computing Laboratory at the University of Kent, and especially the Electronic Publishing Research Unit established by Mrs Heather Brown, and the authors are most grateful for the provision of these facilities.

Finally, no book is ever the fruit of the effort and imagination of the authors alone. In particular, we warmly acknowledge the extensive questioning directed at us by the participants on our courses over the years, without whom we would not have embarked on the task of producing full solutions to all these problems. While we naturally accept full responsibility for any errors that may remain, we are grateful to all whose perceptive comments have helped to sharpen our own ideas about suitable ways to present and solve these problems.

Eryl Bassett
J. Michael Bremner
Ian Jolliffe
Byron Jones
Byron J. T. Morgan
Philip M. North

Canterbury, June 1986
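The preface's remark about intermediate accuracy can be seen in a few lines of Python (a modern aside of ours, not part of the original text; the numbers are those of the simple example above):

```python
# Full accuracy is carried internally: (10 / 3.2) * 3.2 recovers 10.
intermediate = 10 / 3.2              # 3.125, held to full machine precision
print(round(intermediate, 1))        # 3.1 -- the recorded aide-memoire only
print(round(intermediate * 3.2, 4))  # 10.0 -- based on the accurate value

# Re-using the recorded 2-significant-digit value instead propagates the
# rounding error, and the sequence no longer appears to check out:
print(round(3.1 * 3.2, 4))           # 9.92, not 10
```

This is exactly the behaviour of a calculator that holds intermediate results to many digits while the user records only a few.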
Preface to the 2000 reprint

Since this book was first published in 1986, many aspects of statistical work have moved on. In particular, the availability of computer packages to undertake tedious calculations has become much greater, and the packages have become more sophisticated and easy to use. In our view, these very desirable developments do not reduce the need for a book like the present one. Our approach aims to improve a student's understanding of statistics, and to help students appreciate which techniques might be appropriate for any particular problem. Naturally, this approach is quite complementary to the need for computing assistance.

In 1986, we viewed the material as particularly suitable for students taking courses at about the level of A-level examinations in England and Wales, as well as for first-year undergraduates at university, and for those teaching or advising them. This level corresponds to that of calculus-based introductory courses in Probability and Statistics in many other parts of the world. Developments since then suggest to us that the material should be all the more suitable for first-year undergraduates or those taking the Higher Certificate examinations of the Royal Statistical Society.

We remarked in the original preface that we were writing for those who have read a first-level textbook such as that by G. M. Clarke and D. Cooke. We are pleased to see that that book has now reached its fourth edition, and are happy to continue recommending it; we give a full reference below.

The authors wish to express their gratitude to Sunil Nair and Dr Lu Jitan of World Scientific Publishing for the energy and enthusiasm they have shown in arranging for this book to be once again in print. Naturally, we continue to accept responsibility for any errors which remain.

Eryl Bassett
J. Michael Bremner
Ian Jolliffe
Byron Jones
Byron J. T. Morgan
Philip M. North

November 1999

Reference: A Basic Course in Statistics, Fourth Edition, by G. M. Clarke and D. Cooke (Arnold, London, 1998).
Contents

Preface

1 Probability and Random Variables
1A Probability
  1A.1 The Wellington boots
  1A.2 Dealing four cards
  1A.3 The school assembly
  1A.4 The second card
  1A.5 A birthday coincidence
  1A.6 Winning at craps
  1A.7 The fisherman
  1A.8 Testing lightbulbs
  1A.9 Designing a navigation system
  1A.10 The sports club
  1A.11 Playing the fruit machine
1B Random Variables
  1B.1 The highest score in a dice game
  1B.2 The total score from two dice
  1B.3 The game of darts
  1B.4 The randomised response experiment
  1B.5 Finding a probability density function
  1B.6 Positioning the pointer
  1B.7 Finding the mode and median of a distribution
  1B.8 Buffon's needle
  1B.9 Transforming a random variable
  1B.10 Length of a chord of a circle
  1B.11 Selecting balls from an urn

2 Probability Distributions
2A Discrete Distributions
  2A.1 The squash match
  2A.2 Sampling incoming batches
  2A.3 Sampling for defective items
  2A.4 The music recital
  2A.5 Marking a multiple-choice test
  2A.6 The insect breeding experiment
  2A.7 The telephone exchange
  2A.8 Random sultanas in scones
  2A.9 Relationship between binomial and Poisson distributions
  2A.10 Tossing a coin until 'heads' appears
  2A.11 Estimating the size of a fish population
  2A.12 Collecting cigarette cards
2B Continuous Distributions
  2B.1 The slippery pole
  2B.2 Mixing batches of pesticide
  2B.3 Tolerances for metal cylinders
  2B.4 The number of matches in a box
  2B.5 Fitting plungers into holes
  2B.6 Bird wingspans
  2B.7 Distributions related to the Normal
  2B.8 Failures of belt drives
  2B.9 Guaranteed life of a machine
  2B.10 Watch the birdie
  2B.11 Breaking a rod at random
2C Simulating Random Variables
  2C.1 Simulating random variables
  2C.2 Using dice to obtain random numbers
  2C.3 Ants on flower heads

3 Data Summarisation and Goodness-of-Fit
3A Data Summarisation
  3A.1 The weights of club members
  3A.2 Histogram for catches of fish
  3A.3 Distribution of examination marks
  3A.4 Consumption of cigarettes
3B Goodness-of-Fit
  3B.1 Numbers of sixes in dice throws
  3B.2 The occurrences of thunderstorms
  3B.3 The distribution of I.Q.
  3B.4 Frontal breadths of skulls
  3B.5 Rotating bends
  3B.6 Distribution of accidents in a factory

4 Inference
4A One Sample: Normal Distribution
  4A.1 Lengths of manufactured bolts
  4A.2 Lifetimes of electrical components
  4A.3 The diameters of marbles
  4A.4 Variability in weight of bottled fruit
  4A.5 Changes in weight of manufactured product
  4A.6 Estimating the mean and variance
4B Two Samples: Normal Distribution
  4B.1 Weights of tins of peas
  4B.2 Experimental teaching of arithmetic
  4B.3 Milk yield of cows
  4B.4 Speeding up mathematical tasks
  4B.5 Peeling potatoes faster
  4B.6 Estimating a ratio
4C Binomial and Poisson Distributions
  4C.1 The pink and white flowers
  4C.2 Distinguishing between brands of cheese
  4C.3 Women investors in building societies
  4C.4 Are telephone numbers random?
  4C.5 Smoking in public places
  4C.6 Smoking habits of history and statistics teachers
  4C.7 Ignition problems in cars
  4C.8 Modifying an accident black spot
4D Other Problems
  4D.1 Sampling bags of apples
  4D.2 Two discrete random variables
  4D.3 Mean square error of estimators
  4D.4 Arrivals at a petrol station
  4D.5 Confidence intervals for an exponential distribution
  4D.6 Glue sniffers
  4D.7 Quality control for bolts
  4D.8 A control chart for mass-produced articles

5 Analysis of Structured Data
5A Regression and Correlation
  5A.1 The numbers of pensioners
  5A.2 Graduate unemployment
  5A.3 Regression for several sets of data
  5A.4 Inflation, money supply and dysentery
  5A.5 Correlation between height and weight
  5A.6 Judging a beauty contest
5B Analysis of Variance
  5B.1 Relationship between quality and temperature
  5B.2 The rubber content of guayule plants
  5B.3 The quality of electronic components
5C Contingency Tables
  5C.1 The value of rainfall forecasts
  5C.2 A survey of exercise habits
  5C.3 Sex ratios of insects
5D Time Series
  5D.1 Turning points in a time series
  5D.2 The electricity consumption of a Canadian household

Index
1 Probability and Random Variables

As Professor Sir John Kingman remarked in a review lecture in 1984 on the 150th anniversary of the founding of the Royal Statistical Society, 'The theory of probability lies at the root of all statistical theory'. The remark was challenged and led to an interesting discussion, and indeed it is to a substantial extent the growth of less formal, graphically based, techniques in more advanced areas which made the above remark controversial. But it is unquestionable that, at an elementary level, almost all statistical analyses depend for their validity on results in probability. (Exceptions are graphical methods: construction of histograms and so on.) It is natural, therefore, that methods based upon probability theory appear throughout this book. From Chapter 3 onwards, probability will be just a means to an end: a tool to enable us to make valid statistical inferences. But in this chapter and the next we examine it more for its own sake.

The first section of this chapter is devoted to a discussion of problems involving the manipulation of events and their corresponding probabilities. Later, in Section 1B, we cover problems involving random variables and their distributions, paving the way for the coverage of well-known probability distributions in Chapter 2.

1A Probability

The theory of probability has, as its central feature, the concept of a repeatable random experiment, that is, an experiment the outcome of which is uncertain. The set of all possible outcomes of an experiment is known as its sample space (another term which is sometimes used is possibility space), and subsets of this are called events. In many problems, the aim will be to calculate the probability associated with some event of interest; the problems set in this section generally require a numerical answer, and for a calculation to be possible one must be able to assign a probability numerically to each of the outcomes of the experiment concerned. Very often the sample space will have an easily recognised structure, and one can then exploit this structure mathematically to calculate any required probabilities. The most common structures of this sort occur when a sample space is a set of integers, or a set of real numbers, and these are examined in Section 1B. In the present section we deal with problems in which we have no such advantage. The tools used in this section, therefore, are the general ones of probability theory, applicable in all cases. In particular we will need to use the addition and multiplication laws, and the law of total probability and Bayes' Theorem.
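The laws just mentioned can be exercised on a small worked example; the two-urn setting below is an illustration of ours, not one of the problems in this book, and exact fractions are used so that the laws can be checked by hand:

```python
from fractions import Fraction

# Urn 1 holds 3 red and 2 black balls; urn 2 holds 1 red and 4 black.
# An urn is chosen by tossing a fair coin, and one ball is drawn from it.
p_urn1 = Fraction(1, 2)
p_urn2 = Fraction(1, 2)
p_red_given_urn1 = Fraction(3, 5)
p_red_given_urn2 = Fraction(1, 5)

# Law of total probability (with the multiplication law supplying each
# joint term): Pr(red) = sum over urns of Pr(urn) * Pr(red | urn).
p_red = p_urn1 * p_red_given_urn1 + p_urn2 * p_red_given_urn2

# Bayes' Theorem: Pr(urn 1 | red) = Pr(urn 1) * Pr(red | urn 1) / Pr(red).
p_urn1_given_red = p_urn1 * p_red_given_urn1 / p_red

print(p_red)             # 2/5
print(p_urn1_given_red)  # 3/4
```

Observing a red ball thus raises the probability that urn 1 was chosen from 1/2 to 3/4, which is the typical pattern of a Bayes' Theorem calculation.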
Some problems may relate to situations possessing sufficient symmetry for the assumption of 'equally likely outcomes' to be appropriate; obvious examples are simple games using dice or playing cards, and calculation of probabilities then reduces simply to a question of counting outcomes. For other problems such an assumption may not be appropriate, and we must be given, directly or indirectly, values of the probabilities of various events. The first six problems in this section are of the former type, while the remaining five are of the latter.

Whereas a typical problem in another area will fall into one of a range of standard categories, so that a standard method of solution will be appropriate, this is less often true of probability problems. Probability problems are, unfortunately, often found difficult. Indeed, it is frequently the case that there are two or more possible approaches to the solution of a problem, and that none of these is clearly 'best'; the choice of an approach is often largely a matter of personal taste. We have, therefore, presented more than one solution to several of the problems in this section, to demonstrate the range of different approaches that can be used. Another reason for presenting multiple solutions is that they illustrate a method of checking solutions (a familiar experience of those struggling with probability problems is that of reaching what turns out to be the wrong answer through an apparently flawless argument). If two solutions, using different methods, produce the same answer there is some hope that it may be correct! The luckless examination candidate is, of course, unlikely to have time for this.

We use a natural and conventional notation. The probability of an event will be denoted by Pr(xxxx), where 'xxxx' represents a verbal statement defining the event. A notation as simple as this suffices for some cases, but when the description of events is complex we resort to notation for the events themselves. Thus we may specify that an event will be denoted by a symbol (a capital letter, perhaps with a subscript, e.g. E1, ..., En), and we will then write Pr(E1), say, for its probability. Finally, we denote the conditional probability of A given B by the standard notation Pr(A | B).

1A.1 The Wellington boots

Three young children are attempting to retrieve their Wellington boots from a cupboard; the cupboard contains no other boots. Each selects two boots at random.
(a) Find the probability that each obtains the correct pair of boots.
(b) Find the probability that each obtains a pair of boots (not necessarily the correct pair).
(c) Find the probability that at least one obtains the correct pair of boots.
(d) Now suppose that two pairs of boots are red, and one pair is yellow, and that the children select boots of the correct colour and put them on (without distinguishing between left and right). Find the probability that all three children end up wearing their own boots correctly.

Solution

For the first three parts of the question we shall adopt a sample space consisting of all possible allocations of two boots to each of the three children. There are 6!/(2! 2! 2!) = 90 such allocations, each with probability 1/90.

(a) Since only one allocation is correct, the probability that each child obtains the correct pair of boots is 1/90.

(b) If we denote the three children by A, B and C, and let, for example, CAB stand for the outcome in which A obtains the boots belonging to C, B obtains those belonging to A, and C obtains those belonging to B, we note that there are six outcomes in which each child obtains a pair, viz. ABC, ACB, BAC, BCA, CAB, CBA. Thus the probability of this event is 6/90 = 1/15.

(c) We consider first allocations such that A obtains the correct pair of boots, while B and C do not. To count these, consider allocations of boots to B and C: there are 4!/(2! 2!) = 6 of these, of which all but one result in B and C obtaining incorrect pairs of boots. Thus there are 5 allocations with the property that only A obtains the correct boots; similarly there are 5 allocations such that only B obtains the correct boots, and 5 such that only C does. Adding to these the single allocation such that they all obtain the correct boots, we find that there are 16 allocations which result in at least one child obtaining the correct boots, so that the required probability is 16/90 = 8/45.

(d) We may consider the red and yellow boots separately, and may, without loss of generality, take it that A is the owner of the yellow boots. Dealing first with these, the probability that the child with the yellow boots puts them on correctly is 1/2. For the red boots, we consider a sample space consisting of all 4! possible allocations of the boots to the feet of the children concerned (in this part of the question the order is important). Only one such allocation is correct, so that the probability that the red-booted children are correctly shod is 1/24. Finally, assuming independence between the allocation of red and yellow boots, we obtain the probability that all three children are correctly shod as (1/2) × (1/24) = 1/48.

Notes

(1) The words 'at random' are frequently encountered in questions such as this. Strictly speaking, they imply only that the outcome cannot be predicted in advance. The interpretation we have chosen here is much stronger, implying that each of the 90 possible outcomes has the same chance of occurring. Such an assumption of 'equally likely outcomes' is in line with everyday usage in statements such as 'Winning Premium Bond numbers are chosen at random'; it is the natural one in the absence of any further information, and we shall frequently make it in our solutions to the problems which follow.

(2) It is possible to solve parts (a)-(c) on the basis of a sample space in which all 6! possible ordered allocations of the boots to the children are equally likely. For part (a), for example, there are 8 (= 2³) outcomes in which all three children obtain the correct boots, since each child can obtain his (or her) boots in two possible orders; this gives 8/720 = 1/90, as before.

(3) Some readers may prefer to tackle this problem by dividing each part into stages and using the multiplication laws of probability. We now present a solution along these lines.

(a) We may imagine A having first choice of boots, followed by B and finally C. If we adopt a notation in which, for example, A1 denotes the event that the first boot chosen by A belongs to A, we require the probability of the event A1∩A2∩B1∩B2∩C1∩C2. This probability may be expressed as
Pr(A1) Pr(A2 | A1) Pr(B1 | A1∩A2) Pr(B2 | A1∩A2∩B1)
(we have omitted two further conditional probabilities in order to shorten the above expression: they relate to C's 'choice', and are both equal to 1 since C has no choice at all). Now Pr(A1) = 2/6 = 1/3, since A has initially 6 boots to choose from, of which 2 are correct, and Pr(A2 | A1) = 1/5, since at this stage A has 5 boots to choose from, of which 1 is correct. Similarly, Pr(B1 | A1∩A2) = 2/4 = 1/2, and Pr(B2 | A1∩A2∩B1) = 1/3. Multiplying these probabilities, we obtain the solution (1/3) × (1/5) × (1/2) × (1/3) = 1/90.

(b) If we let MA stand for the event that A obtains a matching pair of boots, and similarly for B and C, we require Pr(MA∩MB∩MC) = Pr(MA) Pr(MB | MA) Pr(MC | MA∩MB). Now Pr(MA) is just the probability that the second boot selected by A matches the first, i.e. 1/5; similarly Pr(MB | MA) = 1/3, and the third component Pr(MC | MA∩MB) is equal to 1. Multiplying these probabilities, we obtain the solution 1/15, as before.

(c) We now let A denote the event that A obtains the correct pair of boots, and define B and C similarly; we then require Pr(A∪B∪C). Using the general addition law for three events, this probability may be expanded as
Pr(A) + Pr(B) + Pr(C) − Pr(A∩B) − Pr(A∩C) − Pr(B∩C) + Pr(A∩B∩C).
We have Pr(A) = Pr(A1) Pr(A2 | A1) = (1/3) × (1/5) = 1/15, and Pr(B) and Pr(C) have the same value. The four remaining terms are all the same, and equal to Pr(A∩B∩C) (since, if A and B obtain the correct boots, so also must C); this probability has already been shown, in part (a), to be 1/90. Thus
Pr(A∪B∪C) = 3 × (1/15) − 3 × (1/90) + 1/90 = 16/90 = 8/45, as before.

(d) We shall use a notation in which, for example, LA will stand for the event that the boot worn by A on his (or her) left foot is the correct one. Taking A, as before, to be the owner of the yellow boots, we have Pr(LA∩RA) = Pr(LA) Pr(RA | LA) = (1/2) × 1 = 1/2. Turning to the red boots, we obtain
Pr(LB∩RB∩LC∩RC) = Pr(LB) × Pr(RB | LB) × Pr(LC | LB∩RB) × Pr(RC | LB∩RB∩LC) = (1/4) × (1/3) × (1/2) × 1 = 1/24.
Multiplying these two probabilities, we obtain the solution 1/48, as before.
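The counting arguments in this problem are small enough to verify exhaustively by machine, in the spirit of the computational checks now easy to carry out. The Python sketch below (a check of ours, not part of the original solution) runs through all 6! orderings of the six boots; each of the 90 allocations in the sample space used above corresponds to the same number of orderings, so the proportions agree:

```python
from fractions import Fraction
from itertools import permutations

# Boots are labelled by owner (A, B, C) and foot (L, R).
boots = ['AL', 'AR', 'BL', 'BR', 'CL', 'CR']
own = [frozenset({'AL', 'AR'}), frozenset({'BL', 'BR'}), frozenset({'CL', 'CR'})]

total = correct_all = all_pairs = at_least_one = 0
for perm in permutations(boots):
    # Children A, B, C receive boots 0-1, 2-3 and 4-5 respectively;
    # the order within a child's pair is ignored.
    shares = [frozenset(perm[0:2]), frozenset(perm[2:4]), frozenset(perm[4:6])]
    total += 1
    matches = sum(s == o for s, o in zip(shares, own))
    correct_all += (matches == 3)
    at_least_one += (matches >= 1)
    # A share is a (possibly wrong) pair if both boots have the same owner.
    all_pairs += all(len({b[0] for b in s}) == 1 for s in shares)

print(Fraction(correct_all, total))   # 1/90, part (a)
print(Fraction(all_pairs, total))     # 1/15, part (b)
print(Fraction(at_least_one, total))  # 8/45, part (c)

# Part (d): the yellow pair goes on its owner's two feet in 2 ways, and the
# four red boots on the four remaining feet in 4! ways; exactly one of the
# 2 * 24 combined outcomes is fully correct.
print(Fraction(1, 2 * 24))            # 1/48
```

Agreement between this brute-force count and the arguments above is exactly the kind of independent check recommended in the introduction to this section.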
1A.2 Dealing four cards
Four cards are dealt from a standard pack of 52 cards. Find
(i) the probability that all four are spades;
(ii) the probability that two or fewer are spades;
(iii) the probability that all four are spades, given that the first two are spades;
(iv) the probability that spades and hearts alternate.

Solution
We shall solve this problem using three different approaches. Method 1 is probably the most straightforward and appealing.

Method 1
Let Si denote the event that the i th card dealt is a spade, and Hi the event that it is a heart. The required probabilities are then obtained as follows.

(i) Pr(S1∩S2∩S3∩S4) = Pr(S1)Pr(S2 | S1)Pr(S3 | S1∩S2)Pr(S4 | S1∩S2∩S3) = 13/52 × 12/51 × 11/50 × 10/49 = 11/4165.

(ii) Rather than sum the probabilities of the three events 'no spades', 'one spade' and 'two spades', it will be simpler to adopt the standard device of obtaining the probability of the event complementary to that specified in the problem, and then subtracting from 1. Thus

Pr(two or fewer are spades) = 1 − Pr(three are spades) − Pr(all four are spades).

Since we have already found one of these probabilities in the solution to part (i), we need only the following (writing Ē for the complement of an event E):

Pr(three are spades) = Pr(S1∩S2∩S3∩S̄4) + Pr(S1∩S2∩S̄3∩S4) + Pr(S1∩S̄2∩S3∩S4) + Pr(S̄1∩S2∩S3∩S4).

The first of these four probabilities is

Pr(S1)Pr(S2 | S1)Pr(S3 | S1∩S2)Pr(S̄4 | S1∩S2∩S3) = 13/52 × 12/51 × 11/50 × 39/49,

and the remaining three probabilities are

13/52 × 12/51 × 39/50 × 11/49, 13/52 × 39/51 × 12/50 × 11/49 and 39/52 × 13/51 × 12/50 × 11/49

respectively, all of which have the same value. Combining these results with the solution to part (i), we obtain

Pr(two or fewer are spades) = 1 − [(13×12×11×10) + (4×13×12×11×39)]/(52×51×50×49) = 19912/20825.

(iii) In order to obtain the required conditional probability, we must find the probability that the first two cards are spades. This is Pr(S1)Pr(S2 | S1) = 13/52 × 12/51 = 1/17. Dividing the probability obtained in part (i) by this quantity, we obtain the solution 11/245.

(iv) The required probability is composed of two parts: spades and hearts may alternate with a spade in the first position, or with a heart in the first position. The first of these events has probability

Pr(S1∩H2∩S3∩H4) = Pr(S1)Pr(H2 | S1)Pr(S3 | S1∩H2)Pr(H4 | S1∩H2∩S3) = 13/52 × 13/51 × 12/50 × 12/49,

and the second, Pr(H1∩S2∩H3∩S4), is found, similarly, to have the same value. The required probability is thus

2 × 13² × 12²/(52×51×50×49) = 156/20825.

Method 2
In this method we base our solutions on the enumeration of appropriate outcomes in a sample space consisting of all possible hands of four cards, taking the order in which the cards are dealt into account. This sample space consists of 52×51×50×49 outcomes, all equally likely.

(i) The number of different hands consisting of four spades is 13×12×11×10. Hence the probability that a hand consists entirely of spades is (13×12×11×10)/(52×51×50×49), as before.

(ii) As in Method 1, it is simpler to count the number of hands with three or four spades and subtract. The number of different hands consisting of three spades followed by one card which is not a spade is 13×12×11×39. Considering the three other positions in which the card which is not a spade can appear, we see that the number of relevant hands is the same for each. Hence the total number of hands containing three or four spades is (13×12×11×10) + (4×13×12×11×39), from which we can deduce the same answer as that obtained by Method 1.

(iii) The number of hands in which the first two cards are spades is 13×12×50×49, so that the corresponding probability is (13×12×50×49)/(52×51×50×49). Dividing the answer to (i) by this quantity, we obtain the solution 11/245, as before.

(iv) The number of possible hands in which spades and hearts alternate, with a spade in the first position, is 13×13×12×12, and the number with a heart in the first position is the same. We thus obtain the same probability as was obtained by Method 1.
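Before turning to the third approach, the four answers can be cross-checked with exact rational arithmetic. The sketch below is our addition, not part of the original text; it reproduces the Method 1 chains of conditional probabilities.

```python
from fractions import Fraction as F

def chain(*steps):
    """Multiply a chain of (numerator, denominator) conditional probabilities."""
    out = F(1)
    for num, den in steps:
        out *= F(num, den)
    return out

# (i) all four spades
p_i = chain((13, 52), (12, 51), (11, 50), (10, 49))

# (ii) two or fewer spades: complement of 'three spades' and 'four spades';
# the four 'three spades' terms are equal, hence the factor 4
p_three = 4 * chain((13, 52), (12, 51), (11, 50), (39, 49))
p_ii = 1 - p_three - p_i

# (iii) all four spades, given that the first two are spades
p_iii = p_i / chain((13, 52), (12, 51))

# (iv) spades and hearts alternate (spade first or heart first)
p_iv = 2 * chain((13, 52), (13, 51), (12, 50), (12, 49))

print(p_i, p_ii, p_iii, p_iv)   # 11/4165 19912/20825 11/245 156/20825
```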
Method 3
In this method (which can be used only for parts (i) and (ii) of the problem) we again consider a sample space of equally likely four-card hands, but do not take the order in which the cards are dealt into account. There are thus C(52,4) outcomes in the sample space.

(i) There are C(13,4) hands consisting of four spades. The probability of such a hand is C(13,4)/C(52,4) which, after appropriate cancellations, reduces to the solution already obtained.

(ii) To count the number of hands containing three spades and one card of another suit, we note that there are 39 possibilities for the latter card, and that each of these may be combined with any of the C(13,3) possible combinations of three spades. The total number of such hands is thus 39×C(13,3) and, dividing by C(52,4), adding the probability of four spades and subtracting from 1, we obtain the required result.

Notes
(1) In Method 1, the introduction of an appropriate notation for the events of interest makes it possible to write out a fairly concise, yet clear, solution. Simply writing down Pr(SSSS), for example, without explaining what this means, is not enough, and the inadequacy of such a notation becomes more evident when we start writing out a solution involving meaningless terms like Pr(S | SS). While one might prefer a solution where the events are described verbally, this becomes somewhat lengthy. Consider, for example, part (i):

Pr(all four cards are spades) = Pr(first card is a spade) × Pr(second card is a spade | first card is a spade) × Pr(third card is a spade | first two cards are spades) × Pr(fourth card is a spade | first three cards are spades) = 13/52 × 12/51 × 11/50 × 10/49.

(See Note 5 to Problem 1A.10 and Note 1 to Problem 1B.3 for a similar point.)

(2) In solving part (ii) by Method 1 it is tempting to obtain Pr(S1∩S2∩S3∩S̄4) and then multiply by four, on the grounds that there are four possible positions for the card which is not a spade. In the present case note that, although the four components which are summed to obtain the probability of three spades are equal, the probabilities which are multiplied to obtain them are not. Here the argument works! However, although the argument is valid for similar questions involving the rolling of dice, or the random selection of cards with replacement (and is fundamental to the derivation of the binomial distribution), it is not in general valid in situations where selection is performed without replacement. A slightly more complicated argument, such as that we have given, is therefore necessary. (See Note 2 to Problem 1B.3 for a discussion of similar points, and for mention of a problem for which the simpler argument does not give the correct answer.)
(3) When we are dealing with a sample space in which all outcomes are equally likely, we may use the fact that

Pr(A | B) = n(A∩B)/n(B),

where n denotes the number of outcomes in the indicated event. Thus, in solving part (iii) by Method 2, we have n(S1∩S2∩S3∩S4) = 13×12×11×10 and n(S1∩S2) = 13×12×50×49, so that the required conditional probability is, as before, (11×10)/(50×49).

(4) As a further variant on the Method 2 solution to part (iii), we may consider a reduced sample space, conditioning on the first two cards being a particular pair of spades (the three and the queen, say). Given this, there are 50×49 possibilities for the next two cards dealt, all equally likely, of which 11×10 consist of two spades; it follows that the corresponding probability is (11×10)/(50×49). Since the probability obtained does not depend on which pair we are conditioning on, it is also the probability conditional on the first two cards being any pair of spades, leading to the same result as before.

(5) While the approach of Method 1 is probably the simplest for this problem, that of Method 3 can be more easily extended to cover similar problems where larger numbers are involved. For example, the probability that, in a hand consisting of ten cards, six are spades is C(13,6)×C(39,4)/C(52,10). We are dealing here with the hypergeometric distribution, which is discussed in greater detail in Problem 2A.

1A.3 The school assembly
At the morning assembly, five schoolchildren, Alan, Barbara, Clare, Daniel and Edward, sit down in a row along with five other children (whose names need not concern us here). If the children arrange themselves at random, find the probabilities of the following events.
(a) Alan and Barbara sit together.
(b) Clare, Daniel and Edward sit together.
(c) Clare, Daniel and Edward sit together but Alan and Barbara sit apart.
(d) Daniel sits between Clare and Edward (but not necessarily adjacent to either of them).

Solution
There are at least two ways of solving this problem. One is based on enumerating the seating arrangements corresponding to the events of interest, there being 10! equally likely arrangements in all. A second approach involves the multiplication of appropriate probabilities and conditional probabilities. We shall solve all four parts of the problem using the first approach (referred to as Method 1 below), and parts (a), (b) and (d) using the second (Method 2).
Method 1
(a) To count the number of possible arrangements here, we consider permutations of nine objects, one of which consists of A(lan) and B(arbara) sitting together, the other eight being the remaining children. The number of possible permutations of nine objects is 9! so that, after allowing for the fact that A may be sitting to the left, or to the right, of B, the total number of arrangements in which A and B sit together is 2×9!. The required probability is thus 2×9!/10! = 1/5.

(b) As in the solution to part (a), we consider permutations of eight objects: one consisting of C(lare), D(aniel) and E(dward) sitting together, the other seven being the remaining children. Since there are 3! possible internal arrangements of the group of three children, and 8! ways of arranging the eight objects in order, the required probability is 3!×8!/10! = 1/15.

(c) Direct enumeration is difficult here. We note, however, that we already know the probability that C, D and E sit together so that, if we can find the probability that they sit together and A and B sit together, we may obtain the required probability by subtraction. Formally (here, with a slight misuse of notation, we use AB to denote the event that A and B sit together and CDE the event that C, D and E sit together), the required probability is Pr(CDE) − Pr(AB∩CDE). We find Pr(AB∩CDE) by considering the 7! possible permutations of seven objects, one consisting of A and B, one consisting of C, D and E, and the other five being the remaining children. Taking into account the 2 possible orders for A and B and the 3! possible orders for C, D and E, we obtain

Pr(AB∩CDE) = 2×3!×7!/10! = 1/60.

The solution to this part of the problem is therefore 1/15 − 1/60 = 3/60 = 1/20.

(d) Consider first the seven other children. They can be arranged in 10×9×8×⋯×4 (= 10!/3!) ways. For each such arrangement, there are 2 ways of arranging C, D and E in the remaining seats with D sitting between C and E. Thus the solution to this part is 2×(10!/3!)/10! = 1/3.

Method 2
(a) If we consider the position of A, there are two different situations: when A is in an end position, and when A is in an internal position. The required probability is thus

Pr(A in end position)Pr(B next to A | A in end position) + Pr(A in internal position)Pr(B next to A | A in internal position) = 2/10 × 1/9 + 8/10 × 2/9 = 1/5.

(The conditional probabilities are obtained by regarding the nine possible positions for B, given that of A, as equally likely.)

(b) Let us consider first the event that C, D and E sit together in that order. For this to be true, C must be in one of the positions 1, 2, …, 8. If C is in position i (1 ≤ i ≤ 8), the conditional probability that C, D and E sit together in that order is

Pr(D in position i+1 | C in position i) × Pr(E in position i+2 | C in position i, D in position i+1) = 1/9 × 1/8.

Thus

Pr(C, D and E sit together in that order) = Pr(C in one of positions 1, …, 8) × 1/9 × 1/8 = 8/10 × 1/9 × 1/8 = 1/90.

Finally, since there are 6 (i.e. 3!) possible orders for C, D and E, these being equally likely, the required probability is 6 × 1/90 = 1/15.

(c) A solution to this part using Method 2, although possible, is rather complicated, and is not provided here.

(d) We solve this part by first working conditionally on D's position, which must be one of 2, 3, …, 9. If D sits in position i, the conditional probability that C sits to the left of him and E sits to the right of him is (i−1)/9 × (10−i)/8; the conditional probability that C sits to the right of him and E sits to the left has the same value. Thus the probability we require is

Σ (i from 2 to 9) Pr(D in position i) × Pr(C and E on opposite sides of D | D in position i) = 2(1×8 + 2×7 + ⋯ + 8×1)/720 = 240/720 = 1/3.

Notes
(1) We remarked in Note 5 to Problem 1A.2 that an approach based on splitting the experiment into stages (cf. Method 2 for this problem, and Method 1 for Problem 1A.2) may suggest itself as the most natural approach for many problems, while it may turn out that some problems require us to enumerate outcomes in the sample space of the whole experiment. Although the latter approach may not appeal, because of the complicated combinatorial arguments that it can involve, it can sometimes prove (as in this problem) more powerful than the former.

(2) A simple solution to part (d) follows from the observation that there are six possible orders in which C, D and E can sit, these being equally likely. Since two of these orders are such that D sits between C and E, the required probability is 2/6 = 1/3.
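The four probabilities can also be confirmed by brute force. The following sketch, our addition rather than part of the original text, enumerates the 10×9×8×7×6 equally likely assignments of seats to the five named children; the positions of the five unnamed children cancel out of every probability.

```python
from fractions import Fraction as F
from itertools import permutations

counts = {'a': 0, 'b': 0, 'c': 0, 'd': 0}
total = 0

# (a, b, c, d, e) are the seats of Alan, Barbara, Clare, Daniel, Edward
# among positions 0..9.
for a, b, c, d, e in permutations(range(10), 5):
    total += 1
    ab_together = abs(a - b) == 1
    # three distinct seats are consecutive exactly when their span is 2
    cde_together = max(c, d, e) - min(c, d, e) == 2
    if ab_together:
        counts['a'] += 1
    if cde_together:
        counts['b'] += 1
    if cde_together and not ab_together:
        counts['c'] += 1
    if c < d < e or e < d < c:
        counts['d'] += 1

probs = {k: F(v, total) for k, v in counts.items()}
for k in 'abcd':
    print(k, probs[k])   # a 1/5, b 1/15, c 1/20, d 1/3
```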
1A.4 The second card
Two cards are dealt from a pack. Find the probability that the second card dealt is a heart.

Solution
The answer is clearly 1/4, since all four suits must be equally likely. If this argument appears treacherous in its simplicity, the argument given in the following paragraph may be preferred.

If we let Hi (i = 1, 2) denote the event that the i th card dealt is a heart then, using the law of total probability (writing H̄1 for the complement of H1),

Pr(H2) = Pr(H1)Pr(H2 | H1) + Pr(H̄1)Pr(H2 | H̄1) = 13/52 × 12/51 + 39/52 × 13/51 = 1/4.

Notes
(1) Newcomers to the study of probability theory frequently find this question confusing. A common answer when the question is put to a class is 'It depends on what the first card is', but what is asked for is not a conditional probability, but the unconditional probability that the second card is a heart. The conditional probabilities appear in the second of the given solutions, where the unconditional probability is obtained as their weighted average.

(2) Even students who are clear about what is being asked for are reluctant to trust the first method of solution, preferring the second approach. This approach will not, however, extend easily to similar problems involving many cards. For example, suppose we require the probability that, when ten cards are dealt, the last one is a heart. While it is again clear that this is 1/4, the sceptical will find that the previous approach has become rather cumbersome, involving the enumeration of 2⁹ possibilities. They may, however, make use of an argument based on considering all of the 52×51×⋯×43 possible hands of 10 cards (taking the order of dealing into account) to be equally likely. To count the number of such hands in which the final card is a heart we note that, for each of the 13 possibilities for the final card, there are 51×50×⋯×43 different possibilities for the preceding nine. Thus the probability that the last card is a heart is

(13×51×50×⋯×43)/(52×51×50×⋯×43) = 13/52 = 1/4.

1A.5 A birthday coincidence
The two authors of a letter to The Times, published in September 1984, wrote as follows: 'Whilst we were travelling by train back from work with two other friends, we happened to discover that the four of us had birthdays on three successive days, two being on the same day. Using our limited arithmetic and electronic calculators we have worked out the odds of this rare occurrence as being approximately 1,350,000 to 1.'
Verify the correctness of the stated odds.
Solution
As with several of the problems in this section, we shall present more than one solution, illustrating different possible approaches. In presenting each solution we shall refer to the four passengers as P1, P2, P3 and P4.

Method 1
We shall consider the four passengers in order, and introduce the following notation, relating to Pi. Ai will denote the occurrence of a birthday adjacent to, i.e. one day before or after, one of the birthdays of those previously considered; Ci will denote the occurrence of a birthday close to, i.e. two days before or after, one of the birthdays of those previously considered; the event that Pi has the same birthday as one of those previously considered will be denoted by Si; and Bi will stand for the event that Pi's birthday falls between that of P1 and the 'close' birthday of another passenger.

The event which aroused the interest of the letter-writers then occurs if the events relating to P2, P3 and P4 are either (i) two A's and an S, in any order, or (ii) a C, a B and an S, with the C preceding the B but the S in any position.

To obtain the probability of (i), we sum the probabilities of the three events S2∩A3∩A4, A2∩S3∩A4 and A2∩A3∩S4. The first of these three events has probability

Pr(S2)Pr(A3 | S2)Pr(A4 | S2∩A3) = 1/365 × 2/365 × 2/365 = 4/365³.

Similarly, the second and third events have probabilities

Pr(A2)Pr(S3 | A2)Pr(A4 | A2∩S3) = 2/365 × 2/365 × 2/365 = 8/365³

and

Pr(A2)Pr(A3 | A2)Pr(S4 | A2∩A3) = 2/365 × 2/365 × 3/365 = 12/365³

respectively. Totalling these, we find that the probability corresponding to (i) is 24/365³.

Turning now to (ii), we obtain the required probability as

Pr(S2∩C3∩B4) + Pr(C2∩S3∩B4) + Pr(C2∩B3∩S4) = (1/365 × 2/365 × 1/365) + (2/365 × 2/365 × 1/365) + (2/365 × 1/365 × 3/365) = 12/365³.

Finally, we add the probabilities corresponding to (i) and (ii) to obtain the result 36/365³, giving odds against the observed coincidence of about 1 350 752 to 1.

Method 2
For the event of interest to occur it must be the case that two of the passengers have the same birthday, while the remaining passengers have birthdays which, along with the common birthday, fall on three consecutive dates. It will be sufficient to consider the case when P1 and P2 have the same birthday; when we have found the corresponding probability, symmetry considerations will enable us to obtain the solution to the problem by multiplying by 6 (since this is the number of distinct pairs amongst the four passengers).

The probability that P1 and P2 have the same birthday is 1/365. Given this, the probability that P1, P3 and P4 have birthdays on consecutive dates is obtained by adding the probabilities of two events. First, P3's birthday may be one day before or after that of P1, and P4's birthday must then occur immediately before or after the pair of dates occupied by the birthdays of P1 and P3: the associated probability is 2/365 × 2/365 = 4/365². Second, P3's birthday may be two days before or after that of P1, in which case P4's birthday must occur between the two dates: the probability of this is 2/365 × 1/365 = 2/365². Thus the probability that P1, P3 and P4 have birthdays on consecutive dates is 6/365², and so the probability that all four passengers have birthdays on three consecutive dates, with P1 and P2 having the same birthday, is 1/365 × 6/365² = 6/365³. Finally, the answer to the problem is obtained by multiplying this probability by 6, giving 36/365³ as before.

Method 3
Another possible approach is along lines fairly similar to those followed in Method 1, but considers first the probability that the birthdays of the four passengers fall on three different dates, leaving consideration of the problem of consecutive dates until the end of the solution. For i = 2, 3, 4, let Di stand for the event that Pi's birthday is different from those considered earlier. The probability that the four birthdays fall on three different days is then

Pr(D̄2∩D3∩D4) + Pr(D2∩D̄3∩D4) + Pr(D2∩D3∩D̄4) = (1/365 × 364/365 × 363/365) + (364/365 × 2/365 × 363/365) + (364/365 × 363/365 × 3/365) = 6×364×363/365³.

The probability we have just obtained is composed of equal components corresponding to each of the possible combinations of three days, but only 365 such combinations are of three consecutive days. The probability that the four have birthdays on three consecutive days is thus obtained by multiplying the probability obtained in the previous paragraph by 365/C(365,3) = 6/(364×363). Thus

[6×364×363/365³] × [6/(364×363)] = 36/365³,

and we obtain the same result as before.

Method 4
We consider a sample space of outcomes of the form (b1, b2, b3, b4), where bi denotes the birthday of passenger Pi: there are 365⁴ such outcomes, all equally likely. To solve the problem we must count the number of outcomes having the required property. This can be done by considering a particular set of three consecutive days and then multiplying by 365. Consider one such set: the number of possible allocations of passengers to the three birthdays, with a given day duplicated, is 4!/2 = 12, and there are three possibilities to consider, depending on whether the duplicated birthday is on the first, second or third day. The total number of outcomes with the required property is therefore 365×3×12 and, dividing by 365⁴, we obtain the same result as before.
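The figure of 36/365³ (equivalently, 365×3×12 outcomes out of 365⁴) can be checked by exact arithmetic, and the counting argument can be rehearsed on a small scale. The sketch below is our addition, not part of the original text; the small-scale check uses a 'year' of 10 days and, for simplicity, counts only non-wrapping runs of three consecutive days.

```python
from fractions import Fraction as F
from itertools import product

# Exact value: 365 triples of consecutive days, and 3 * (4!/2) = 36 ways
# to spread four people over a given triple so that every day is used.
p = F(365 * 36, 365**4)
odds_against = (1 - p) / p
print(p, round(float(odds_against)))   # 36/48627125 1350752

# Small-scale check: for a 'year' of n days (no wrap-around), direct
# enumeration of all n**4 outcomes should give (n - 2) * 36 favourable ones.
n = 10
hits = sum(
    1 for days in product(range(n), repeat=4)
    if len(set(days)) == 3 and max(days) - min(days) == 2
)
print(hits, (n - 2) * 36)   # 288 288
```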
Notes
(1) In our solution we ignore leap years and do not take account of the fact that birthdays are not uniformly distributed throughout the year. To allow for these two factors would complicate matters immensely.

(2) The solution using Method 1 is, in a sense, the least elegant of those presented. The method depends on the systematic enumeration of all outcomes favourable to the required event. There are other methods of enumerating all favourable outcomes. One is by means of a tree diagram, such as that sketched in Figure 1.1. Construction of the diagram begins with the observation that, for the event of interest to occur, we must have one of A2, C2 and S2 (B2 being meaningless); if we have A2, then we must have A3 or S3, and so on. (For further comments on the role of tree diagrams in solving probability problems, see Note 6 to Problem 1A.10.)

Figure 1.1 Tree diagram for events in Problem 1A.5

Another approach, closely related to that described above, is to consider all outcomes of the form D2∩E3∩F4 (or, to use a suitable contracted notation, DEF), where each of D, E and F is one of A, B, C and S. There are 64 such outcomes, which we may imagine listed in some systematic way, for example in alphabetical order: AAA, AAB, AAC, AAS, and so on. Of the outcomes in this list some (such as the first) do not contribute to the event of interest, while others (such as the second) are meaningless or impossible. Eliminating these, we find that the outcomes remaining are AAS, ASA, SAA, CBS, CSB and SCB. This method is somewhat laborious, but represents possibly the most reliable approach.

(3) Checking solutions to probability problems can be difficult; as we have remarked elsewhere, one is often uneasy until two different solutions to a problem yield the same answer. Since we have presented four solutions which agree with one another and with the odds quoted by the authors of the letter to The Times, we feel happy that our solution is correct. It is, however, only fair to confess that, while working on this problem, one of us also produced a number of incorrect solutions, and that we were, at one stage, somewhat confused by the fact that a check that we applied to a correct solution seemed to fail. It is sometimes useful to consider a small-scale version of a problem, since the entire sample space can then be easily enumerated; in this case we considered the same problem for a 'year' of 3 days, for which the sample space consists of 81 outcomes. Unfortunately the methods we have used in our solution then lead to a probability greater than 1, which is clearly ridiculous. On reflection, we realised that the arguments we have used all break down for such a short 'year': for example, a 'close' birthday is also an 'adjacent' birthday. The arguments are, however, valid for a 'year' of 4 days.

(4) Letters of the type quoted above appear, from time to time, in the press: generally the smallness of the probability seems to be taken as a 'surprise index', a measure of the remarkable nature of the event.
It is instructive to consider how this 'surprise index' ought to be assessed. To judge how surprised one ought to be by a claim, one must consider the probability, not only of the event itself, but also of any other event which would justify a similar claim. If a letter were submitted to The Times to the effect that four people had birthdays on January 12, February 23, April 24 and September 6, one imagines that the editor might not find space for it, yet the probability of this event, 4!/365⁴, is far smaller than that referred to in this problem. It is this fact which rules out a claim based on the four dates specified above. In the case of the four passengers, a letter might also have been sent in the event of (i) all birthdays on at most two consecutive days, (ii) three birthdays on one day, (iii) all birthdays on the same day of the month (but in different months), and so on. To calculate a useful 'surprise index' one has not just to calculate the probability of the event observed, but to add the probabilities of all other events of the same type, but even more 'unusual'. Even after allowing for this, it is likely that the observed event will still seem quite unusual. This is, of course, the sort of calculation required in finding a significance level in hypothesis testing, and we shall return to this topic in Chapter 4.

1A.6 Winning at craps
In the game of craps, the player throws two dice. On his first throw he wins if the total is 7 or 11, and loses if it is 2, 3 or 12. If his total on the first throw is not one of these figures, he continues to throw the dice until he either repeats the total of his first throw (in which case he wins) or obtains a total of 7 (in which case he loses). Find the probability that he wins.

Solution
The probability distribution of the total score on two dice is obtained as the solution to Problem 1B.2. Dealing first with the possibility that the player wins on his first throw, we see that this has probability

Pr(score = 7) + Pr(score = 11) = 6/36 + 2/36 = 8/36.

We now have to deal with the remaining possibilities. First, consider the case when the first throw results in a 4: this has probability 3/36 which, for the moment, we shall denote by w; similarly, the probability of a score of 7, which is 6/36, will be denoted by l. The introduction of this notation will enable us to deal with other possibilities for the first throw merely by changing the values of w and l. (Although it might seem natural to simplify the fractions involved here, it will be more convenient to retain the common denominator 36 in all probabilities until later in the solution.)

Given that the first throw results in a 4, the probability of a win at the i th subsequent throw (i = 1, 2, 3, …) is (1 − w − l)^(i−1) w (since i − 1 throws which result in neither a 4 nor a 7 have to be followed by a final throw which results in a 4). Summing these probabilities, we find that the probability of a win, given that the first throw results in a score of 4, is w/(w + l).
The contribution to the total probability of a win associated with the first score being a 4 is then

Pr(first score is 4) × Pr(win | first score is 4) = w × w/(w + l) = 3/36 × 3/(3 + 6) = 1/36.

Since the probability of a score of 10 is also 3/36, the contribution to the total probability of winning associated with that initial score is the same as the above. We may deal similarly with initial scores of 5 and 9 (with w = 4/36) and with initial scores of 6 and 8 (with w = 5/36). Combining all these results, we obtain the probability of winning as

8/36 + 2×(3/36)×(3/9) + 2×(4/36)×(4/10) + 2×(5/36)×(5/11) = 244/495 = 0.493 (to three decimal places).

Notes
(1) It is interesting to see how close the probability of winning (0.493) is to one half. If the game is played in a casino, there is a small advantage to the house, so that it makes a steady profit; players, however, perceiving the odds to be close to even, are happy to play the game. Instances of gambling odds very close to even have been the subject of interest from the earliest days of the mathematical study of probability theory. In the seventeenth century the Chevalier de Méré, a French gambler, approached the mathematician Pascal with a number of problems. One of these concerned his observation that, while he had a better than even chance of throwing at least one six in four throws of a die, his chance of throwing at least one double six in twenty-four throws of a pair of dice was rather less than even. The probabilities of success in these two ventures can be seen to be

1 − (5/6)⁴ = 0.518 and 1 − (35/36)²⁴ = 0.491

respectively.

(2) The fact that the probability of obtaining a 4 before a 7 (or, more generally, of obtaining a winning score before a losing score) takes the simple form w/(w + l), coupled with the observation that this is just the conditional probability of scoring 4, given that the score is either 4 or 7, suggests an alternative argument which does not involve the summation of series. The basis of this argument is that we condition on the number of throws until the game terminates, and use the law of total probability, forming a weighted average of the conditional probabilities obtained. (Although we have not indicated it explicitly, all probabilities in what follows are conditional on the first throw resulting in a 4.) Thus

Pr(player wins) = Σ (over k) Pr(game terminates at throw k) × Pr(player wins | game terminates at throw k).

At this stage it appears as if this method, like that used in the main solution, involves the summation of series, but we can avoid this when we note that the conditional probability of winning appearing on the right hand side does not depend on the value of k on which we are conditioning: it is

Pr(score is 4 | score is 4 or 7) = w/(w + l).

But then the right hand side of the above expression becomes

w/(w + l) × Σ (over k) Pr(game terminates at throw k) = w/(w + l),

the sum being 1 since it is the sum of the probability function of the length of the game; the unconditional probability of winning must therefore have the same value. (Strictly speaking, the need to sum a series has not disappeared.)
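The whole calculation is easily mechanised with exact fractions. The following sketch is our addition, not part of the original text; it follows the w/(w + l) argument above and also checks the two de Méré figures from Note (1).

```python
from fractions import Fraction as F
from itertools import product

# Number of ways to roll each total with two dice (denominator 36).
ways = {t: 0 for t in range(2, 13)}
for d1, d2 in product(range(1, 7), repeat=2):
    ways[d1 + d2] += 1

p_win = F(ways[7] + ways[11], 36)      # immediate win on 7 or 11
p_seven = F(ways[7], 36)               # a later 7 loses
for point in (4, 5, 6, 8, 9, 10):      # the possible first scores ('points')
    w = F(ways[point], 36)
    p_win += w * w / (w + p_seven)     # weight w times the w/(w + l) chance

print(p_win)   # 244/495, i.e. about 0.493

# de Mere's two ventures, to three decimal places
print(round(1 - (5/6)**4, 3), round(1 - (35/36)**24, 3))   # 0.518 0.491
```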
1A.7 The fisherman
Each Sunday a fisherman visits one of three possible locations near his home: he goes to the sea with probability 1/2, to a river with probability 1/4, or to a lake with probability 1/4. If he goes to the sea there is an 80% chance that he will catch fish; corresponding figures for the river and the lake are 40% and 60% respectively.
(a) Find the probability that, on a given Sunday, he catches fish.
(b) Find the probability that he catches fish on at least two of three consecutive Sundays.
(c) If, on a particular Sunday, he comes home without catching anything, where is it most likely that he has been?
(d) His friend, who also goes fishing every Sunday, chooses among the three locations with equal probabilities. Find the probability that the two fishermen will meet at least once in the next two weekends.
(Any assumptions you make in solving this problem should be clearly stated.)

Solution
We shall use the following notation:
S: he goes to the sea; R: he goes to the river; L: he goes to the lake; F: he catches fish.
The given information may then be written as

Pr(S) = 1/2, Pr(R) = 1/4, Pr(L) = 1/4, Pr(F | S) = 4/5, Pr(F | R) = 2/5, Pr(F | L) = 3/5.

(a) Using the law of total probability,

Pr(F) = Pr(S)Pr(F | S) + Pr(R)Pr(F | R) + Pr(L)Pr(F | L) = (1/2 × 4/5) + (1/4 × 2/5) + (1/4 × 3/5) = 13/20.

(b) It follows from the solution to part (a) that the number of Sundays on which he catches fish is a random variable X having the binomial distribution B(3, 13/20). The required probability is thus

Pr(X = 2) + Pr(X = 3) = 3 × (13/20)² × (7/20) + (13/20)³ = 2873/4000.

(c) From (a), Pr(F̄) = 1 − Pr(F) = 7/20. Hence

Pr(S | F̄) = Pr(S)Pr(F̄ | S)/Pr(F̄) = (1/2 × 1/5)/(7/20) = 2/7.

Similarly, Pr(R | F̄) = (1/4 × 3/5)/(7/20) = 3/7 and Pr(L | F̄) = (1/4 × 2/5)/(7/20) = 2/7. So it is most likely that he has been to the river.

(d) If, now, we let S1 denote the event that the first fisherman goes to the sea and S2 the event that the second goes to the sea, and define R1, R2, L1, L2 similarly, the probability that they meet on a given Sunday is

Pr(S1)Pr(S2) + Pr(R1)Pr(R2) + Pr(L1)Pr(L2) = (1/2 × 1/3) + (1/4 × 1/3) + (1/4 × 1/3) = 1/3.

The probability that they fail to meet over two weekends is then (1 − 1/3)² = 4/9, so that the probability that they meet at least once is 1 − 4/9 = 5/9.

The assumptions made in solving this problem are, in parts (b) and (d), that the fisherman chooses independently in different weeks and, in part (d), that the two fishermen choose independently of one another. (There is a further assumption in part (d), namely that they meet if they both go to the same location.)
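All four answers can be reproduced with exact fractions. The sketch below is ours, not the book's; it mirrors the total-probability, binomial and Bayes steps above.

```python
from fractions import Fraction as F

prior = {'sea': F(1, 2), 'river': F(1, 4), 'lake': F(1, 4)}
p_fish = {'sea': F(4, 5), 'river': F(2, 5), 'lake': F(3, 5)}

# (a) law of total probability
pF = sum(prior[x] * p_fish[x] for x in prior)

# (b) X ~ B(3, 13/20); Pr(X = 2) + Pr(X = 3)
p_b = 3 * pF**2 * (1 - pF) + pF**3

# (c) Bayes' Theorem, conditioning on no fish being caught
post = {x: prior[x] * (1 - p_fish[x]) / (1 - pF) for x in prior}

# (d) the friend chooses uniformly, so any location is matched w.p. 1/3
p_meet = sum(prior[x] * F(1, 3) for x in prior)
p_d = 1 - (1 - p_meet)**2

print(pF, p_b, post['river'], p_d)   # 13/20 2873/4000 3/7 5/9
```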
Notes
(1) In (c) we are using Bayes' Theorem, although this is, to some extent, disguised by the fact that the denominator, Pr(F'), is obtained from the answer to part (a).
(2) We can simplify the answer to (d) by observing that the probability that they meet on a given Sunday is just the probability that the second fisherman chooses the same location as the first, namely 1/3. This argument applies only because the second chooses each of the three locations with the same probability.

1A.8 Testing lightbulbs

The stock of a warehouse consists of boxes of high, medium and low quality lightbulbs in respective proportions 1 : 2 : 2. The probabilities of bulbs of the three types being unsatisfactory are 0.0, 0.1 and 0.2 respectively. If a box is chosen at random and two bulbs in it are tested and found to be satisfactory, what is the probability that it contains bulbs (i) of high quality, (ii) of medium quality, (iii) of low quality?

Solution
We shall adopt the following notation for the events of interest:
H: the box chosen contains high quality bulbs; M: the box chosen contains medium quality bulbs; L: the box chosen contains low quality bulbs; S: the two bulbs tested are found to be satisfactory.
The given information concerning the proportions of boxes of the three types may be written in the following form:
Pr(H) = 0.2, Pr(M) = 0.4, Pr(L) = 0.4.
From the information on quality, we deduce that
Pr(S | H) = 1.0² = 1.0, Pr(S | M) = 0.9² = 0.81, Pr(S | L) = 0.8² = 0.64.
We may now use Bayes' Theorem. Probability (i) is
Pr(H | S) = Pr(H)Pr(S | H) / {Pr(H)Pr(S | H) + Pr(M)Pr(S | M) + Pr(L)Pr(S | L)}
= (0.2 × 1.0) / {(0.2 × 1.0) + (0.4 × 0.81) + (0.4 × 0.64)}
= 0.2 / (0.2 + 0.324 + 0.256) = 0.2/0.78 = 0.256.
Probabilities (ii) and (iii) are obtained similarly. For (ii), we find Pr(M | S) = 0.324/0.78 = 0.415 and, for (iii), Pr(L | S) = 0.256/0.78 = 0.328.
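The Bayes computation can be checked in the same exact-arithmetic style (again a hypothetical Python sketch, not part of the book's solution):

```python
from fractions import Fraction as F

# Prior box proportions 1:2:2 and per-bulb satisfactory probabilities
prior = {"high": F(1, 5), "medium": F(2, 5), "low": F(2, 5)}
p_ok = {"high": F(1), "medium": F(9, 10), "low": F(8, 10)}

# Likelihood of the evidence: two bulbs tested independently, both satisfactory
likelihood = {k: p_ok[k] ** 2 for k in prior}

# Bayes' theorem
evidence = sum(prior[k] * likelihood[k] for k in prior)   # 0.78
posterior = {k: prior[k] * likelihood[k] / evidence for k in prior}
```

Because the arithmetic is exact, the three posterior probabilities sum to exactly 1, with no trace of the 0.999 rounding artefact discussed in the notes.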
Notes
(1) This is a fairly standard type of problem involving the use of Bayes' Theorem, made slightly more complicated by the fact that two bulbs are being tested.
(2) The three probabilities obtained should, of course, add to 1. The fact that their total appears to be 0.999 is due to the rounding errors created by recording the calculated values to three decimal places. Note the computational simplification which results from writing down the values (0.2, 0.324 and 0.256) of the numerators of Pr(H | S), Pr(M | S) and Pr(L | S) in the course of finding probability (i): probabilities (ii) and (iii) then follow almost immediately, without the need to 'start afresh'.

1A.9 Designing a navigation system

A spacecraft navigation system consists of components of two types, A and B, connected in a series-parallel arrangement, as in Figure 1.2. The system will function as long as at least one component of each type functions. The numbers of components of the two types may be varied in order to adjust the probability of failure of the navigation system. The probability that a type A component will function correctly throughout the planned lifetime of the spacecraft is 0.9; the corresponding probability for a component of type B is 0.8. Components may be assumed to fail independently of one another.
(a) Show that a requirement that the probability of failure of the entire system should be less than 1% may be met by using three components of each type (as in the diagram).
(b) If components of type A weigh 1 kg, while those of type B weigh 2 kg, what is the composition of the most reliable navigation system weighing 11 kg or less?

[Figure 1.2: Illustration of a series-parallel system]

Solution
(a) Suppose the system consists of a components of type A and b components of type B. Then
Pr(system functions) = Pr(at least one type A component functions) × Pr(at least one type B component functions)
= {1 − Pr(all type A components fail)} × {1 − Pr(all type B components fail)}
= (1 − 0.1^a)(1 − 0.2^b).
If a = b = 3, this equals (1 − 0.1³)(1 − 0.2³) = 0.999 × 0.992 = 0.991 and, since this exceeds 0.99, the requirement of part (a) is met.

(b) In order to solve this part, we could examine all systems weighing 11 kg; there are five systems in all. (There is clearly no point in considering systems weighing less, since any such system can be improved upon by adding components of type A.) We can, however, reduce the number of possibilities that we have to investigate if we observe that the system considered in part (a) weighs less than 11 kg and, therefore, that the system we are seeking must exceed it in reliability, i.e. must have reliability exceeding 0.991. It follows that a and b must satisfy
1 − 0.1^a ≥ 0.991, i.e. a ≥ 3, and 1 − 0.2^b ≥ 0.991, i.e. b ≥ 3.
This leaves just two possibilities to be considered: a = 5, b = 3, with reliability (1 − 0.1⁵)(1 − 0.2³) ≈ 0.992, and a = 3, b = 4, with reliability (1 − 0.1³)(1 − 0.2⁴) ≈ 0.997, so that the best configuration weighing 11 kg or less consists of three components of type A and four of type B.

Note
Part (b) is an example of a knapsack problem, in which integer variables (here a and b) must be chosen to maximise some quantity (here the system reliability) while satisfying constraints imposed by available resources (weight in this case). Although small-scale problems of this type can be solved relatively easily, since there are relatively few possibilities to consider, the solution of larger problems requires a more sophisticated approach.
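For a problem this small, the brute-force search dismissed above as unnecessary is itself a one-liner; a Python sketch (the names `candidates`, `reliability` and `best` are ours) that enumerates every feasible system:

```python
# Exhaustive search over all systems weighing at most 11 kg
# (type A: 1 kg each, reliability 1 - 0.1**a; type B: 2 kg each, 1 - 0.2**b)
candidates = [(a, b) for a in range(1, 12) for b in range(1, 6) if a + 2 * b <= 11]

def reliability(ab):
    a, b = ab
    return (1 - 0.1**a) * (1 - 0.2**b)

best = max(candidates, key=reliability)   # expected: (3, 4)
```

For larger knapsack instances one would replace this enumeration with dynamic programming or branch-and-bound, as the note suggests.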
1A.10 The sports club

Three quarters of the members of a sports club are adults, and one quarter are children. Three quarters of the adults, and three fifths of the children, are male. Half the adult males, and a third of the adult females, use the swimming pool at the club; the corresponding proportion for children of either sex is four fifths.
(a) Find the probability that a member of the club uses the swimming pool.
(b) Find the probability that a member of the club who uses the swimming pool is male.
(c) Find the probability that a member of the club is female.
(d) Find the probability that a member of the club who uses the swimming pool is female.
(e) Find the probability that a male user of the swimming pool is a child.
(f) Find the probability that a member of the club who does not use the swimming pool is either female or an adult.

Solution
In our solution we shall let A denote the event that a member of the club is an adult, and C the event that he or she is a child. We shall let M and F stand, respectively, for the events that a member is male or female, and S for the event that he or she uses the swimming pool; for any event E, E' will denote its complement. The given information may then be summarised as follows:
Pr(A) = 3/4, Pr(C) = 1/4,
Pr(M | A) = 3/4, Pr(F | A) = 1/4, Pr(M | C) = 3/5, Pr(F | C) = 2/5,
Pr(S | A∩M) = 1/2, Pr(S | A∩F) = 1/3, Pr(S | C∩M) = Pr(S | C∩F) = Pr(S | C) = 4/5.

(a) Pr(S) = Pr(A∩M∩S) + Pr(A∩F∩S) + Pr(C∩S)
= Pr(A)Pr(M | A)Pr(S | A∩M) + Pr(A)Pr(F | A)Pr(S | A∩F) + Pr(C)Pr(S | C)
= (3/4 × 3/4 × 1/2) + (3/4 × 1/4 × 1/3) + (1/4 × 4/5)
= 9/32 + 1/16 + 1/5 = 87/160.

(b) The required probability is Pr(M | S) = Pr(M∩S)/Pr(S). Now Pr(S) has been obtained as the answer to part (a), and
Pr(M∩S) = Pr(A∩M∩S) + Pr(C∩M∩S) = 9/32 + Pr(C)Pr(M | C)Pr(S | M∩C)
= 9/32 + (1/4 × 3/5 × 4/5) = 9/32 + 3/25 = 321/800,
so
Pr(M | S) = (321/800)/(87/160) = 0.738.

(c) Pr(F) = Pr(A)Pr(F | A) + Pr(C)Pr(F | C) = (3/4 × 1/4) + (1/4 × 2/5) = 3/16 + 1/10 = 23/80 = 0.288.

(d) We could proceed here as in the solution to part (b), but can save work by making use of the solution to that part of the question, since the required probability is simply
Pr(F | S) = 1 − Pr(M | S) = 1 − 0.738 = 0.262.

(e) The numerator and denominator in the expression for the required probability, Pr(C | M∩S) = Pr(C∩M∩S)/Pr(M∩S), have already appeared in the solution to part (b): the denominator is 321/800, and the numerator is 3/25 = 96/800. Substituting these values, we obtain
Pr(C | M∩S) = 96/321 = 0.299.

(f) The required probability is
Pr(A∪F | S') = Pr{(A∪F)∩S'}/Pr(S').
The denominator in this expression is easily obtained from the solution to part (a):
Pr(S') = 1 − Pr(S) = 73/160.
To obtain the value of the numerator, it is simplest to split the event of interest into three mutually exclusive components, and to sum their probabilities. We obtain
Pr{(A∪F)∩S'} = Pr(A∩M∩S') + Pr(A∩F∩S') + Pr(C∩F∩S')
= Pr(A)Pr(M | A)Pr(S' | A∩M) + Pr(A)Pr(F | A)Pr(S' | A∩F) + Pr(C)Pr(F | C)Pr(S' | C∩F)
= (3/4 × 3/4 × 1/2) + (3/4 × 1/4 × 2/3) + (1/4 × 2/5 × 1/5)
= 9/32 + 1/8 + 1/50 = 341/800.
We now obtain the required probability:
Pr(A∪F | S') = (341/800)/(73/160) = 341/365 = 0.934.
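All six answers are functions of the probabilities of the eight mutually exclusive (age, sex, swims) cells, so they can be verified by enumerating those cells; a Python sketch under that decomposition (all names ours):

```python
from fractions import Fraction as F
from itertools import product

# Split the membership into the eight (age, sex, swims) cells
p_age = {"adult": F(3, 4), "child": F(1, 4)}
p_male = {"adult": F(3, 4), "child": F(3, 5)}
p_swim = {("adult", "M"): F(1, 2), ("adult", "F"): F(1, 3),
          ("child", "M"): F(4, 5), ("child", "F"): F(4, 5)}

cells = {}
for age, sex, swims in product(p_age, "MF", (True, False)):
    p = p_age[age] * (p_male[age] if sex == "M" else 1 - p_male[age])
    p *= p_swim[(age, sex)] if swims else 1 - p_swim[(age, sex)]
    cells[(age, sex, swims)] = p

def prob(pred):
    """Total probability of the cells satisfying a predicate."""
    return sum(p for cell, p in cells.items() if pred(*cell))

p_s = prob(lambda age, sex, swims: swims)                              # (a)
p_m_given_s = prob(lambda age, sex, swims: swims and sex == "M") / p_s  # (b)
```

Conditional probabilities such as parts (b), (e) and (f) are then just ratios of two `prob(...)` calls.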
Notes
(1) For the sake of accuracy, we have obtained the answer to each part in fractional form, converting to a decimal at the end. The disadvantages of earlier conversion may be illustrated by considering part (b): using Pr(M∩S) = 0.401 and Pr(S) = 0.544, we obtain the inaccurate result Pr(M | S) = 0.737.
(2) Although the statement of this problem does not include the words 'randomly selected', our answer assumes, in effect, that these words have been inserted after the words 'Find the probability that a' in each part. (As discussed in Note 1 to Problem 1A.1, the word 'random' has a rather specific meaning here.) In this problem the assumption of random selection is the only reasonable one to make; it is not uncommon to encounter examination questions stated in a somewhat loose manner, and to have to make suitable assumptions in order to reach a solution. The assumption might not be valid in real life: consider, for example, answering part (a) when the answer is to relate to a member visiting the club on a morning when the swimming pool is closed.
(3) In our solutions to later parts we have, in many cases, made use of results appearing in the solutions to earlier parts. In solving examination questions, it is important to bear in mind that work done in solving earlier parts of a question may pave the way for solving later parts. It is important, however, to guard against spending too much time looking for such 'shortcuts', or attempting to use them at all costs: sometimes, even if earlier results can be used, it is easier to start afresh.
(4) There is a simpler solution to part (f). Rather than find the required conditional probability directly, we seek the conditional probability of the complementary event (that the member is a male child). This is
Pr(C∩M | S') = Pr(C∩M∩S')/Pr(S').
Now the denominator in this expression is 73/160, as before, and the numerator is
Pr(C)Pr(M | C)Pr(S' | C∩M) = 1/4 × 3/5 × 1/5 = 3/100 = 24/800.
Substituting these values we obtain Pr(C∩M | S') = 0.066, and hence
Pr(A∪F | S') = 1 − 0.066 = 0.934,
as obtained before.
(5) This is a rather unappealing problem, not unlike some examination questions that we have seen; hence its inclusion. It is unappealing not because the concepts involved are particularly advanced, but because of the complicated nature of the manipulations required and, possibly, because it is difficult to imagine why anyone would be interested in the more intricate questions asked. The solution we have presented is fairly concise and, due to the use of set-theoretic notation, of a fairly abstract mathematical nature. (See Note 1 to Problem 1A.2 and Note 1 to Problem 1B.3 for further discussion of the rôle of set-theoretic notation in the solution of problems in probability.) The arguments could be presented verbally, but the result would become very lengthy. Thus, for example, we could begin the solution to part (a) in the following manner: 'A swimmer can be either an adult or a child, so we have to add the probabilities of an adult swimmer and a child swimmer. We need not take the sex of the child into account in finding the latter probability, since the probability that a child is a swimmer does not depend on sex, but we do have to take account of sex in finding the probability of an adult swimmer: this probability is then expressed as the sum of the probabilities of an adult male swimmer and an adult female swimmer.' So far, we have dealt only with the first line of our solution, and there is a lot more to go! The advantages of our more concise approach in presenting a solution are evident; but, like many arguments which are, apparently, of a fairly abstract mathematical nature, the solution can be understood more easily by bearing in mind, while studying it (particularly if it seems rather mystifying), that it represents an argument along the lines we have expanded in words above for the initial section, and by mentally performing similar expansions on other sections.
(6) One way of getting to grips with a problem such as this is by means of a rough tree diagram. Readers will notice that we do not make much use of tree diagrams in this chapter. This reflects our view that, while tree diagrams have undoubted advantages in the teaching of probability and in the early stages of solving a complicated problem, they do not always constitute the best method of presenting the final solution. A tree diagram on its own is not enough: it has to be supplemented by sufficient verbal explanation to establish conventions, define notation and the meaning of the various events involved, and to relate the diagram to the problem and its solution. With many problems it turns out, after a tree diagram has been produced, that only a small section of the diagram is needed for the solution. For these reasons we have, in solving a number of problems for which a tree diagram might have been considered, preferred to present a verbal solution (we include in this category solutions in which many of the 'words' are mathematical symbols).
We present, in Figure 1.3, a tree diagram relating to this problem. The notation adopted is that already described, and the conventions followed are that the probabilities in brackets alongside the branches of the tree are appropriate conditional probabilities, whereas those appearing, unbracketed, at the nodes are the unconditional probabilities of the corresponding combinations of events. A solution of the problem based on this diagram would be completed by identifying, for each part, the probabilities to be extracted from the diagram, and performing, with appropriate explanation, the necessary operations on these probabilities. Since this would repeat much of what we have already done, we shall not do this; we have, however, indicated on the diagram where the probabilities required for each part can be found, and the reader can thus relate the diagram to our earlier solution.

It will be noted that most of the probabilities on the diagram are used in solving at least one part of the problem, so that not much of the work involved in the construction of the diagram is 'wasted'. The situation would have been somewhat different for a problem consisting of only a few of the six parts of the present problem; and it is interesting to note that, while our original solution to part (a) involves the addition of three probabilities, a solution based on a tree diagram would involve the addition of four and, quite possibly, the calculation of several others.

[Figure 1.3: Tree diagram for events in Problem 1A.10]

1A.11 Playing the fruit machine

A fruit machine takes 10p coins. At each play it pays out 30p with probability 1/4; no other payouts are possible.
(a) Find the probability that a player who starts with 20p and plays until her funds are exhausted plays at least six times.
(b) Find the probability that a player who still has some money left after playing five times has at least three more plays.
Solution
(a) We shall represent possible sequences of results by sequences of letters, letting W stand for a win and L for a loss. To find the probability of at least six plays, we shall find the probability of five or fewer and subtract from 1. There are three possibilities to be considered: LL, LWLLL and WLLLL. Assuming independence of successive plays, the corresponding probabilities are (3/4)², (1/4)(3/4)⁴ and (1/4)(3/4)⁴. Thus the probability that she plays fewer than six times is
(3/4)² + 2 × (1/4) × (3/4)⁴ = 0.721,
and it follows that the probability that she plays at least six times is 1 − 0.721 = 0.279.

(b) What we require is
Pr(she plays at least nine times | she plays at least six times)
= Pr(she plays at least nine times) / Pr(she plays at least six times).
The denominator has been obtained already. To obtain the numerator, we proceed as before, finding the probability that she plays at most eight times and subtracting from 1. In addition to the three possibilities considered above, there are seven other possibilities which result in eight or fewer plays in all. These are LWWLLLLL, WWLLLLLL, WLWLLLLL, LWLWLLLL, WLLWLLLL, LWLLWLLL and WLLLWLLL, each having probability (1/4)²(3/4)⁶. The probability of eight or fewer plays in all is therefore
(3/4)² + 2 × (1/4)(3/4)⁴ + 7 × (1/4)²(3/4)⁶ ≈ 0.799.
The required conditional probability is thus
(1 − 0.799)/0.279 = 0.721.

Note
The numerator in the conditional probability obtained in the solution to part (b) should be, strictly speaking, the probability that the player plays at least six times and plays at least nine times. Since the latter implies the former, the probability reduces to that of the latter.
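Since the exact answers rest on a careful enumeration of win/loss sequences, a simulation is a useful cross-check; a seeded Python sketch of our own construction, with all sums expressed in 10p units:

```python
import random

def plays_until_broke(rng, p_win=0.25, start=2, cost=1, payout=3):
    """One session at the machine: play until funds are exhausted."""
    funds, plays = start, 0
    while funds >= cost:
        funds -= cost
        plays += 1
        if rng.random() < p_win:
            funds += payout
    return plays

rng = random.Random(0)
n = 100_000
results = [plays_until_broke(rng) for _ in range(n)]

p_six_plus = sum(r >= 6 for r in results) / n              # part (a), ~0.279
p_nine_given_six = (sum(r >= 9 for r in results)
                    / sum(r >= 6 for r in results))        # part (b), ~0.721
```

The expected loss per play is 0.25 units, so every session terminates with probability 1 and the loop needs no artificial cap.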
1B Random Variables

In this section we move on to problems in which the sample space consists of the integers, or the real line, or a subset of either. In such cases the experiment will have an outcome which is a number, which varies from one repetition of the experiment to another, and is called, naturally enough, a random variable. The problems in this section concentrate on the distributions of random variables and their properties. Methods of investigating these differ according to the type of random variable; the most common types are discrete and continuous random variables. Both can have their distribution described by a cumulative distribution function (sometimes abbreviated to distribution function, or c.d.f.); other functions which will be examined are the probability function (or probability mass function, p.f. or p.m.f.) for discrete random variables, and the probability density function (or density function, p.d.f.) for continuous random variables.

The notation used will extend that used in Section 1A. Random variables are commonly denoted by capital letters, typically those near the end of the alphabet: we write, for example, Pr(X = x), where X denotes a random variable and x denotes a value which X might take. So we might wish to calculate Pr(X = 3) or Pr(X ≥ 2). The cumulative distribution function of a random variable X will be denoted by F_X(x); we define F_X(x) = Pr(X ≤ x), whether X be discrete or continuous. If X is discrete, its probability function is Pr(X = x), as a function of x, and it is sometimes convenient to denote this by p_X(x); if X is a continuous random variable, its probability density function will be denoted by f_X(x). Note that it is the subscript which indicates the identity of the random variable; occasionally the subscript will be omitted when the context makes it unnecessary.

The problems here also deal with the most important summarising measures for distributions, based on the concept of expectation; in particular, one is often required to calculate the mean and variance of a distribution. We denote the expectation of a random variable X simply by E(X). The concept of expectation extends to functions of random variables: the expectation of g(X) will be denoted by E[g(X)]. (The shape of any brackets used carries no significance.) In particular, for the special case g(X) = {X − E(X)}², we obtain the variance of X, denoted here by Var(X), so that Var(X) = E[{X − E(X)}²].

Finally, we note that the reader may find problems in later chapters which are closely related to those in this section; besides those in the following chapter, Problems 4D.1, 4D.2 and 5D.2 are worth mentioning in this connection.

1B.1 The highest score in a game of dice

(a) A random variable X is defined as the larger of the scores obtained in two throws of an unbiased, six-sided die. Show that
Pr(X = x) = (2x − 1)/36, x = 1, 2, ..., 6.
(b) A random variable Y is defined as the highest score obtained in k independent throws of an unbiased, six-sided die. Find an expression for the probability function of Y.

Solution
(a) Let S1 and S2 denote the two scores, so that X = max(S1, S2). Now
Pr(Si = j) = 1/6, i = 1, 2; j = 1, 2, ..., 6,
so that
Pr(Si ≤ x) = x/6, i = 1, 2; x = 1, 2, ..., 6.
The two events {X ≤ x} and {S1 ≤ x} ∩ {S2 ≤ x} are clearly equivalent, and therefore, by independence,
Pr(X ≤ x) = Pr(S1 ≤ x)Pr(S2 ≤ x) = (x/6)² = x²/36.
But
Pr(X = x) = Pr(X ≤ x) − Pr(X ≤ x − 1) = {x² − (x − 1)²}/36 = (2x − 1)/36, x = 1, 2, ..., 6,
as required.

(b) Let S1, S2, ..., Sk denote the k scores, so that Y = max(S1, S2, ..., Sk). By an extension of the argument used in part (a),
Pr(Y ≤ y) = Pr(S1 ≤ y)Pr(S2 ≤ y) ... Pr(Sk ≤ y) = (y/6)^k,
and so
Pr(Y = y) = Pr(Y ≤ y) − Pr(Y ≤ y − 1) = (y/6)^k − {(y − 1)/6}^k, y = 1, 2, ..., 6.
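The formula for the probability function of Y can be confirmed by brute force for small k; a Python sketch (the function name `pf_max` is ours):

```python
from fractions import Fraction as F
from itertools import product

# Brute-force probability function of the maximum of k fair dice
def pf_max(k):
    counts = {y: 0 for y in range(1, 7)}
    for throw in product(range(1, 7), repeat=k):
        counts[max(throw)] += 1
    return {y: F(c, 6**k) for y, c in counts.items()}
```

For k = 2 this reproduces (2x − 1)/36, and for general k it agrees with (y/6)^k − {(y − 1)/6}^k.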
Notes
(1) An alternative solution to part (a) is as follows. Denoting by A the event that S1 = x and S2 ≤ x, and by B the event that S1 ≤ x and S2 = x, we see that the event {X = x} is just A ∪ B, so we obtain
Pr(X = x) = Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).
But Pr(A) = Pr(S1 = x)Pr(S2 ≤ x) = (1/6)(x/6) = x/36, and by symmetry this is also Pr(B). Further, the event A ∩ B is just the event that S1 = x and S2 = x, and so has probability 1/36. We thus find
Pr(X = x) = x/36 + x/36 − 1/36 = (2x − 1)/36,
the result above for the case k = 2. While this method is more direct, and therefore seems more natural, it does not extend easily from the case k = 2 to the general case of part (b).
(2) In problems dealing with the largest of a set of random variables, it is often much simpler to find the cumulative distribution function than to find the probability function directly. (The latter can, of course, be calculated once the cumulative distribution function is known.) A similar approach may be adopted for the smallest of a set of random variables.
(3) The cumulative distribution function is also particularly useful when one needs to find the distribution of a random variable defined as a function of another. This technique is discussed in Problems 1B.10 and 1B.11.
(4) In part (b) we would expect that, as k → ∞, Pr(Y = 6) → 1 and Pr(Y = y) → 0 for y < 6. This is indeed the case, since Pr(Y ≤ y) = (y/6)^k → 0 as k → ∞ for y ≤ 5.

1B.2 The total score from two dice

Two unbiased dice are thrown simultaneously, and the sum of the scores on their uppermost faces is recorded. What is the distribution of this quantity, and what are its mean and variance?

Solution
The simplest approach is to denote the scores on the faces shown by the two dice by X and Y, and to write Z = X + Y for the total, whose distribution is required. Clearly the range of the random variable Z is the set of integers from 2 to 12 inclusive, and the probability distribution of Z is found by obtaining the set of probabilities Pr(Z = z), z = 2, 3, ..., 12.
Starting with the event Z = 2, which occurs only if X = 1 and Y = 1, we see by independence that
Pr(Z = 2) = Pr(X = 1 ∩ Y = 1) = Pr(X = 1)Pr(Y = 1) = 1/6 × 1/6 = 1/36.
Now the event Z = 3 will occur if X = 1 and Y = 2 or if X = 2 and Y = 1, two mutually exclusive events, and so, by the addition law of probability,
Pr(Z = 3) = {Pr(X = 1)×Pr(Y = 2)} + {Pr(X = 2)×Pr(Y = 1)} = 1/36 + 1/36 = 2/36,
the products appearing because events relating to X are independent of those relating to Y. Extending the argument further, we find three possible patterns of values, (1, 3), (2, 2) and (3, 1), contributing to the event Z = 4, and three patterns contributing to Z = 10, so we obtain
Pr(Z = 4) = Pr(Z = 10) = 3/36;
similarly
Pr(Z = 5) = Pr(Z = 9) = 4/36 and Pr(Z = 6) = Pr(Z = 8) = 5/36,
since for each of these there are, respectively, four and five patterns of values of X and Y contributing to the events concerned. Essentially the same reasoning gives
Pr(Z = 11) = 2/36 and Pr(Z = 12) = 1/36.
Finally we show in detail the calculation for Pr(Z = 7). The event Z = 7 occurs when, whatever value x is taken by X, Y takes the value 7 − x. So we obtain, summing over x = 1, 2, ..., 6,
Pr(Z = 7) = Σ Pr(X = x ∩ Y = 7 − x) = Σ Pr(X = x)Pr(Y = 7 − x), by independence,
= 6 × 1/36 = 6/36,
and we note in passing that the other cases could also have been dealt with in this more rigorous way. We thus obtain the probability distribution of the random variable Z, which can most conveniently be expressed as a table:

z            2     3     4     5     6     7     8     9     10    11    12
Pr(Z = z)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

We now require the mean and variance of the distribution of Z. The mean can in fact be found easily by symmetry, but in principle the calculation is as follows:
E(Z) = Σ z Pr(Z = z), summed over z = 2, 3, ..., 12,
= (1/36){(2×1) + (3×2) + (4×3) + (5×4) + (6×5) + (7×6) + (8×5) + (9×4) + (10×3) + (11×2) + (12×1)} = 252/36 = 7.
To obtain the variance of Z we first obtain E(Z²):
E(Z²) = (1/36){(2²×1) + (3²×2) + (4²×3) + (5²×4) + (6²×5) + (7²×6) + (8²×5) + (9²×4) + (10²×3) + (11²×2) + (12²×1)} = 1974/36.
We now obtain the variance of Z simply, as
Var(Z) = E(Z²) − {E(Z)}² = 1974/36 − 49 = 210/36 = 35/6 ≈ 5.833.

Notes
(1) The method used in the solution to obtain the variance of Z is, generally, the best one to use, but occasionally some feature of a problem makes another approach slightly easier, and in fact this is the case here. There are two alternative methods, both of which are slightly quicker than that used in the solution. The first uses the fact that the distribution of Z is symmetrical about E(Z) = 7, so that Var(Z) can be calculated easily from the formula usually used only as a definition, viz. Var(Z) = E[{Z − E(Z)}²]. Noting that {Z − 7}² takes only the six values 0, 1, 4, 9, 16 and 25, we obtain
Var(Z) = (1/36){(6×0) + (10×1) + (8×4) + (6×9) + (4×16) + (2×25)} = 210/36 = 35/6,
as before.
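The table, mean and variance can all be reproduced by enumerating the 36 equally likely outcomes; a short Python sketch in exact arithmetic (names ours):

```python
from fractions import Fraction as F
from itertools import product

# The 6 x 6 grid of equally likely outcomes, summarised by the total Z
pz = {}
for x, y in product(range(1, 7), repeat=2):
    pz[x + y] = pz.get(x + y, F(0)) + F(1, 36)

mean = sum(z * p for z, p in pz.items())                  # 7
var = sum(z * z * p for z, p in pz.items()) - mean**2     # 35/6
```

This is exactly the pictorial grid-of-points method described for this problem, carried out by machine.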
Solution We have to find the probability distribution of X . Find the mean and standard deviation of the number of red balls drawn. 5 and 6. The first ball drawn may be. second is white) We may similarly obtain the probabilities of the other two outcomes as . which is equal to Pr(first ball is white) x Pr(second ball is white 1 first ball is white) X Pr(third ball is white 1 first two balls are white) Turning to Pr(X = l). A ball is drawn a t random. and hence the probability distribution. (2) The solution is presented in symbolic terms. 2 and Var(W) = 35n 7n 12 Note that in order to obtain these results we d o not have to find the probability distribution of W . we find that there are three outcomes to consider. for W . indeed this would be very difficult. and to each point there is attached a probability (& to each) and a value of Z (the sum of the values for X and Y). with probability attached to each of the six values 1. T h e sample space for the experiment will then consist of the 36 outcomes corresponding to these points. as well as expectations.30 Probability and Random Variables 18. 1B. We begin by finding Pr(X = 0). and it is replaced along with another ball of the same colour. The fact that for independent random variables variances can simply be added. red. and not just two. The first of these outcomes has probability Pr(first ball is red) x Pr(second ball is white I first ball is red) xPr(third ball is white 1 first is red. 3. E(W) = .3 Selecting balls from an urn An urn contains three red and five white balls. One could alternatively have presented the same method pictorially. or the first may be white. and the second and third white.3 since X and Y have the same distribution. 2. means that we can immediately write down. The advantage of this method is that it extends easily to the sum of any number of dice. Summing the probabilities of outcomes with any particular value for Z will then give the probability of that value of Z. 
1B.3 Selecting balls from an urn

An urn contains three red and five white balls. A ball is drawn at random, its colour is noted, and it is replaced along with another ball of the same colour. This process is repeated until three balls have been drawn. Find the mean and standard deviation of the number of red balls drawn.

Solution
We have to find the probability distribution of X, the number of red balls drawn. We begin by finding Pr(X = 0), which is equal to
Pr(first ball is white) × Pr(second ball is white | first ball is white) × Pr(third ball is white | first two balls are white)
= 5/8 × 6/9 × 7/10 = 7/24.
Turning to Pr(X = 1), we find that there are three outcomes to consider: the first ball drawn may be red, and the second and third white; or the first may be white, followed by a red and another white; or two white balls may be followed by a red one. The first of these outcomes has probability
Pr(first ball is red) × Pr(second ball is white | first ball is red) × Pr(third ball is white | first is red, second is white)
= 3/8 × 5/9 × 6/10 = 1/8.
We may similarly obtain the probabilities of the other two outcomes as 1/8 each; summing the probabilities we have obtained, we find that
Pr(X = 1) = 3/8.
The next value of X, 2, is dealt with similarly. If we let Ri denote the event that the i th ball drawn is red, and Wi the event that it is white, we may write
Pr(X = 2) = Pr(W1)Pr(R2 | W1)Pr(R3 | W1∩R2) + Pr(R1)Pr(W2 | R1)Pr(R3 | R1∩W2) + Pr(R1)Pr(R2 | R1)Pr(W3 | R1∩R2),
and each of the three terms is found to equal 60/720 = 1/12, so that
Pr(X = 2) = 1/4.
Finally,
Pr(X = 3) = Pr(R1)Pr(R2 | R1)Pr(R3 | R1∩R2) = 3/8 × 4/9 × 5/10 = 1/12.
Having obtained the probability distribution of X, it is now a simple matter to obtain its mean and standard deviation. (The replacement of the previously obtained values of the probability function of X by values expressed relative to a common denominator of 24 simplifies the computations somewhat.) Calculations are as follows:

x            0     1     2     3
Pr(X = x)   7/24  9/24  6/24  2/24

E(X) = {(0×7) + (1×9) + (2×6) + (3×2)}/24 = 27/24 = 9/8;
E(X²) = {(0×7) + (1×9) + (4×6) + (9×2)}/24 = 51/24 = 17/8.
Thus
Var(X) = 17/8 − (9/8)² = 55/64,
and hence the standard deviation of X is √(55/64) = 0.927.

Notes
(1) The above solution, if written out in full without recourse to words like 'similarly', is rather lengthy. The use of set-theoretic notation would enable us to produce a rather shorter solution. We may still decide to keep the length of the solution eventually presented within bounds by not writing out everything in detail, but the use of a notation such as this may help a lot when we are producing the detailed rough work that precedes a final solution. (See Note 6 to Problem 1A.10 for a discussion of the rôle of tree diagrams in the solution of problems in probability.)
(2) Notice that the three probabilities that are summed to obtain Pr(X = 1) are all the same, and the same applies to the case X = 2. This suggests that we could get away with calculating just one component and multiplying by three (on the grounds that the red ball could be drawn first, second or third). An approach on these lines would, in general, require justification; the dangers of assuming that it will work are illustrated by a variant of the present problem in which each ball is replaced along with one of the other colour: for this variant the approach of 'multiplying by three' will not work. (See Note 1 to Problem 1A.2 and Note 5 to Problem 1A.10 for discussions of similar points.)
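The distribution of X, and the fact noted above that every ordering with the same number of reds has the same probability, can be confirmed exactly by enumerating the eight colour sequences; a Python sketch (the helper name `pf_reds` is ours):

```python
from fractions import Fraction as F
from itertools import product

# Exact distribution of the number of reds in three draws from the urn
# (3 red, 5 white; each drawn ball replaced plus one more of its colour)
def pf_reds(draws=3, red=3, white=5):
    pf = {k: F(0) for k in range(draws + 1)}
    for seq in product("RW", repeat=draws):
        r, w, p = red, white, F(1)
        for c in seq:
            if c == "R":
                p *= F(r, r + w)
                r += 1
            else:
                p *= F(w, r + w)
                w += 1
        pf[seq.count("R")] += p
    return pf

pf = pf_reds()
mean = sum(k * p for k, p in pf.items())                  # 9/8
var = sum(k * k * p for k, p in pf.items()) - mean**2     # 55/64
```

Changing two lines (incrementing the opposite colour's count) gives the variant urn scheme mentioned above, for which the 'multiply by three' shortcut visibly fails.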
(See also Note 2 to Problem 1A.2.)

(2) Notice that the three probabilities that are summed to obtain Pr(X = 1) are all the same. The same applies to the case X = 2. This suggests that we could get away with calculating just one component, and multiplying by three (on the grounds that the red ball could be drawn first, second, or third). An approach on these lines would, however, require justification; the dangers of assuming that it will work are illustrated by a variant of the present problem in which each ball is replaced along with one of the other colour: for this variant the approach of 'multiplying by three' will not work.

(3) One might consider using a tree diagram to solve this problem. (See Note 6 to Problem 1A.2 and Note 5 to Problem 1A.10 for discussions of the role of tree diagrams in the solution of problems in probability.) Since all branches of the tree would have to be explored in the course of solving the problem, there would not be the objection that there sometimes is on grounds of wasted effort.

1B.4 The game of darts

Three players A, B and C take turns (in the specified order) to throw a dart at a dartboard; the first to hit the bull wins. In one throw, the probability that A hits the bull is 1/10, and the corresponding probabilities for B and C are 1/9 and 1/8 respectively.

(a) Show that the three players are equally likely to win.
(b) Find the expected number of throws in the game.

Solution

(a) The probability that A wins in the first round is clearly 1/10. To obtain the probability that A wins in the second round we note that, if this is to happen, all three players must miss in the first round; the probability of this event is (1 - 1/10) x (1 - 1/9) x (1 - 1/8) = 7/10. Thus

Pr(A wins in round 2) = Pr(all three miss in round 1) x Pr(A hits in round 2 | all three miss in round 1) = (7/10) x (1/10).

Similarly

Pr(A wins in round 3) = Pr(all three miss in rounds 1 and 2) x Pr(A hits in round 3 | all three miss in rounds 1 and 2) = (7/10)^2 x (1/10),

and, in general, introducing the symbol q to represent the probability 7/10 in the mathematical manipulations which follow,

Pr(A wins in round i) = (1/10) q^(i-1), for i = 1, 2, ....

Summing over i, the probability that A wins is thus (1/10)(1 + q + q^2 + ...) = (1/10)/(1 - q), and substituting 7/10 for q we find the probability to be 1/3.
Turning our attention to B, we see that the probability that B wins in the first round is Pr(A misses) x Pr(B hits) = (1 - 1/10) x (1/9) = 1/10. As with A, we continue to find that the probability that B wins in round i is (1/10) q^(i-1), and hence, summing over i, the probability that B wins is also 1/3. Finally, the probability that C wins is 1/3. (Alternatively, this result may be obtained using arguments similar to those above - the probability that C wins in round i is found to be the same as for the other two players - or, more simply, by subtraction.)

(b) From the results obtained in solving part (a), we see that the number of throws T in the game has the probability function

Pr(T = i) = 1/10, i = 1, 2, 3;
Pr(T = i) = (1/10)q, i = 4, 5, 6;
Pr(T = i) = (1/10)q^2, i = 7, 8, 9;

and so on. To demonstrate that the probability function of T sums to 1, we note that the sum is 3 x (1/10) x (1 + q + q^2 + q^3 + ...) = p/(1 - q), where p = 3/10 = 1 - q, so that the sum is indeed 1. Hence

E(T) = (1/10){(1+2+3) + (4+5+6)q + (7+8+9)q^2 + ...} = (3/10)(2 + 5q + 8q^2 + ...).

Evaluating this series (see Note 2) and substituting the value 7/10 for q, we find E(T) = 9.

Notes

(1) We are, throughout our solution, assuming that the results of different throws are independent.

(2) Some readers may wonder how we arrived at our evaluation of the series encountered in seeking E(T). In finding the expectation and variance of a discrete probability distribution one frequently finds that the series to be summed are closely related to the series which one encounters in demonstrating that the probability function of the distribution sums to 1. Thus we encounter binomial series when dealing with the binomial distribution, exponential series when dealing with the Poisson distribution, and, when dealing with the geometric distribution (see Problem 2A.10), binomial expansions of negative powers of 1 - q; it is possible that, in seeking E(T), we may find other binomial expansions of powers of 1 - q useful. Some links between the present distribution and the geometric distribution are explored briefly in Note 5.
The series we require to evaluate in finding E(T) has coefficients (2, 5, 8, 11, ...) in arithmetic progression, with a common difference of 3. The series to be summed is

S = 2 + 5q + 8q^2 + 11q^3 + ....

Noting that qS = 2q + 5q^2 + 8q^3 + ... and subtracting, we obtain

(1 - q)S = 2 + 3q + 3q^2 + 3q^3 + ...;

the difference between these series is a series with all coefficients (after the first) equal. We may now recognise that the right-hand side of this equation is equal to 2 + 3q(1 - q)^(-1); alternatively, we may repeat the process of multiplying by q and subtracting, obtaining

(1 - q)^2 S = 2 + q.

Substituting the value 7/10 for q, we obtain S = (2 + q)/(1 - q)^2 = 2.7/0.09 = 30, and hence E(T) = (3/10)S = 9.

(3) There is another way of solving part (a), which some may prefer. The basis of the method is that we condition on the fact that the game finishes in round k (k = 1, 2, ...). If the game has not ended before round k, the conditional probabilities that A, B and C win, given that the game ends in this round, are all 1/3; since these do not depend on k, they are also the unconditional probabilities.

(4) Another way of solving the problem is through the use of recurrence relationships. As our first illustration of this approach, we shall find the probability that A wins - call this pA. With probability 7/10 all three players miss in the first round, and the game then effectively 'restarts' after the first three throws. Then

pA = Pr(A wins in round 1) + Pr(A wins in a subsequent round)
   = 1/10 + Pr(all three miss in round 1) x Pr(A wins in a subsequent round | all three miss in round 1)
   = 1/10 + (7/10)pA.

We have now obtained a recurrence relationship for pA which, on rearrangement, gives the equation (3/10)pA = 1/10, with the solution pA = 1/3.

A similar approach is also possible with part (b). If we let E denote the expected number of throws in the game, then

E = (1/10 x 1) + (1/10 x 2) + (1/10 x 3) + (7/10)(3 + E).

(This follows from the fact that the number of throws in the game takes each of the values 1, 2 and 3 with probability 1/10, and otherwise, with probability 7/10, equals 3 plus the number of throws in the restarted game.) The above recurrence relationship simplifies to (3/10)E = 27/10, leading to E = 9.

(5) In solving part (b) we could have made use of the fact that the number of rounds in the game (including the final, possibly incomplete, round) has a geometric distribution with parameter p = 3/10. The expected number of rounds in the game is therefore 10/3. Now there will be three throws in all but the last round, while the expected number of throws in the last round is 2, since the number of throws in the last round takes each of the values 1, 2 and 3 with equal probability. Thus E(T) = (10/3 - 1) x 3 + 2 = 9. Finally, we remark that the method of Note 4 may be used to find the mean of the geometric distribution, as an alternative to the more usual method involving the summation of series.
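The two answers - each player wins with probability 1/3, and E(T) = 9 - can also be checked by simulation. A sketch (Python; ours, not part of the original text) plays the game many times with the stated hit probabilities:

```python
import random

# Hit probabilities for A, B and C, as given in the problem
HIT = [1 / 10, 1 / 9, 1 / 8]

def play(rng):
    """Play one game; return (index of winner, total number of throws)."""
    throws = 0
    while True:
        for player, p in enumerate(HIT):
            throws += 1
            if rng.random() < p:
                return player, throws

rng = random.Random(1)
games = [play(rng) for _ in range(200_000)]
p_a_wins = sum(1 for w, _ in games if w == 0) / len(games)
mean_throws = sum(t for _, t in games) / len(games)
print(round(p_a_wins, 3), round(mean_throws, 2))  # near 1/3 and 9
```

With 200 000 games the estimates should agree with the exact values to about two decimal places.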
1B.5 The randomised response experiment

Let θ denote the probability that a randomly sampled individual in some population voted Conservative in the last General Election. In a particular type of 'randomised response' experiment, each of a random sample of individuals from this population responds 'True' or 'False' to one of the following two statements.

(a) I voted Conservative at the last General Election.
(b) I did not vote Conservative at the last General Election.

A randomising device is used to ensure that the probabilities of responding to (a) and (b) are p and 1 - p respectively, where p is known and 0 < p < 1. If λ is the probability that an individual responds 'True', write down an expression for λ in terms of p and θ.

For a group of mathematics teachers attending a statistics course, 24 out of 43 responded 'True' in an experiment in which p was fixed at 0.3. Use these figures to estimate θ.

Solution

In a rudimentary but obvious notation, we obtain

Pr('True') = Pr('True' and question is (a)) + Pr('True' and question is (b))
           = p Pr('True' to (a)) + (1 - p) Pr('True' to (b)),

since the only people who respond 'True' to (a) are those who did vote Conservative, and similarly only those who did not vote Conservative respond 'True' to (b). Hence

λ = pθ + (1 - p)(1 - θ) = (2p - 1)θ + 1 - p.    (*)

We can estimate λ by λ̂ = 24/43. Substituting this, and p = 0.3, in equation (*) above and solving for θ gives θ̂ = 0.355 as an estimate of θ.

Notes

(1) This problem indicates a way of obtaining estimates of proportions of populations taking part in an 'embarrassing' activity - exceeding speed limits regularly, for example, or smoking at school outside the staff room - while still making it clear to subjects that their true position will remain private. The randomising device used could simply be a pack of cards, which the subject can take and shuffle as much as desired before selecting a card, or may be a bag of beads of different colours. The only important property of the randomising procedure is, of course, that it should be impossible (and clear to the subject that it is impossible) for the interviewer to know whether statement (a) or (b) was chosen by the subject. As long as p is not too close to 0 or 1, the individual response of a subject does not give a strong indication of the true activity of that subject. Ideally the value of p is chosen so as to maximise the precision of the estimator of θ resulting from the experiment; in practice this means that p is chosen to be as close to 0 or 1 as is possible, subject to the restriction above. We see from equation (*) that the method cannot work if p is set at 0.5.
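Inverting equation (*) for the data given is a one-line calculation; the sketch below (Python; ours, not part of the original text) also reproduces the normal-approximation comparison discussed in the notes, using the figures quoted there (X = X1 + X2 with X1 ~ B(21, 0.3), X2 ~ B(30, 0.7), observed X = 35):

```python
from math import sqrt

def estimate_theta(n_true, n, p):
    """Invert lambda = (2p - 1)*theta + (1 - p) using the sample proportion."""
    lam_hat = n_true / n
    return (lam_hat - (1 - p)) / (2 * p - 1)

theta_hat = estimate_theta(24, 43, 0.3)
print(round(theta_hat, 3))  # about 0.355

# Normal approximation from the notes: mean and variance of X = X1 + X2
mean_x = 21 * 0.3 + 30 * 0.7            # 27.3
var_x = 21 * 0.3 * 0.7 + 30 * 0.7 * 0.3  # 10.71
z = (35 - 0.5 - mean_x) / sqrt(var_x)    # continuity correction, X discrete
print(round(z, 2))  # about 2.20
```

The estimator is unstable when p is near 0.5, where the denominator 2p - 1 vanishes - the algebraic counterpart of the remark that the method cannot work at p = 0.5.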
(2) From Note 1, we see that it is important that the rules of the experiment be clear to the subjects. A preliminary experiment with the mathematics teachers mentioned in the problem produced (from the 51 then taking part in the experiment) an estimate λ̂ = 35/51, which gives θ̂ = 0.034. But in addition to performing the randomised response experiment, the participants also wrote down on a slip how they had voted, and this gave a direct estimate of θ of 21/51 = 0.41. One can use this latter figure to derive a confidence interval for θ; a 95% interval of (0.27, 0.55) is obtained by the first method shown in Section 4C. We see that the value 0.034 is very far away from this, which, in a crude way, suggests that the technique might not have been adequately explained to the teachers. (The value 0.355 obtained in a later experiment is, of course, quite consistent with the confidence interval above.)

The two estimates of θ found above can be compared more formally, as follows. For the 21 teachers who admitted to voting Conservative, the probability of responding 'True' in the randomised response experiment is 0.3; for the other 30 teachers, the probability of responding 'True' is 0.7. So if X is a random variable denoting the total number responding 'True', we can write X = X1 + X2, where X1 and X2 are independent binomially distributed random variables with distributions B(21, 0.3) and B(30, 0.7) respectively. Hence

E(X) = (21 x 0.3) + (30 x 0.7) = 27.3

and

Var(X) = (21 x 0.3 x 0.7) + (30 x 0.7 x 0.3) = 10.71,

so that, using a normal approximation to the distribution of X, we find that, approximately, X ~ N(27.3, 10.71). The observed value of X from the randomised response experiment was 35, and the corresponding standardised normal deviate z is

z = (35 - 0.5 - 27.3)/√10.71 = 2.20,

using a continuity correction since X is discrete. This value of z therefore shows a significant difference at the 5% level but not at the 1% level.

(3) There are many variations on the basic randomised response technique, in which statement (b) above is replaced by an 'innocent' statement, concerning, for example, the timing of the subject's birthday. (See Problem 4D.6 for a different version.) When the probability of replying 'True' to the innocent statement is known, the procedure is very much as in the present problem. (If this probability is unknown, one can undertake the randomised response experiment twice, using different values of p, which will give two equations to be solved for the two unknown probabilities.)

1B.6 Finding a probability density function

A continuous random variable X, with mean unity, has probability density function fX(x) given by

fX(x) = a(b - x)^2, 0 ≤ x ≤ b,
fX(x) = 0, elsewhere.

Find the values of a and b.

Solution

In order to solve for the two unknown values a and b, we need two equations involving a and b. These equations are obtained from the two items of information that we are given, as follows.

(i) The function fX(x) is a probability density function.
(ii) The mean of X is unity.
Since fX(x) is a probability density function, the integral of a(b - x)^2 over (0, b) must equal 1, and so

a[-(b - x)^3/3], evaluated between 0 and b, = ab^3/3 = 1,

resulting in the equation ab^3 = 3. Since the mean of X is unity, the integral of a x(b - x)^2 over (0, b) must also equal 1. Splitting terms by writing x = (x - b) + b gives the sum of the integrals of a(b - x)^2 (x - b) and of a b(b - x)^2, each over (0, b), and these integrals are evaluated as above to yield

-ab^4/4 + ab^4/3 = ab^4/12 = 1,

which simplifies to give ab^4 = 12. This is to be solved together with the equation ab^3 = 3 obtained earlier; dividing one equation by the other, we now see that b = 4 and a = 3/64.

Note

In some respects this problem might be thought rather unrealistic, in that one rarely finds a random variable with a quadratic density function in nature. (Such random variables are, however, of value in simulation, where they are used as a basis for generating other distributions.) If a histogram of observations on some random variable showed symmetry, one would naturally consider the well-known distributions, and in particular the normal distribution; but if the range was restricted, one might perhaps consider using a quadratic density function as a rough approximation. The most direct value in this problem resides mainly in the reinforcement it can give, using only elementary calculus, to the result that all probability density functions integrate to 1 over the relevant range, and to the formula ∫ x fX(x) dx for the expectation of a random variable X.

1B.7 Positioning the pointer

In a psychological experiment investigating how individuals change their assessments of probability in the light of data, subjects have to indicate their probabilities for certain events by moving a pointer on a scale between 0 and 1. Consider a subject who is unable to give any worthwhile assessment of a particular probability, and decides to place the pointer randomly, with each point in (0, 1) equally likely to be the point chosen. What is the probability that the ratio of the resulting shorter segment to the longer segment is less than 1/3?
Solution

[Figure 1.4: Position of pointer on the (0, 1) scale]

Let u denote the distance from 0 to the pointer (see Figure 1.4). From the symmetry of the problem it is sufficient to consider the case U < 1/2; we thus require

Pr(U/(1 - U) < 1/3) = Pr(U < 1/4).

The required probability is therefore Pr(U < 1/4) + Pr(U > 3/4) = 1/2.

Notes

(1) An alternative way of obtaining the solution is as follows. The event of interest occurs when U < 1/4 or, by symmetry, when U > 3/4; each of these events has probability 1/4, and they are mutually exclusive, so the required probability is 1/2.

(2) Experiments on the assessment and revision of probabilities are easily carried out and can be revealing. Suppose a bag of 10 balls is used, 7 of the balls being red and 3 black. Before drawing any balls from the bag, a subject gives an estimate of the probability that a ball drawn at random is red. The subject may then successively select balls at random, observe their colour and replace them in the bag, each time revising his or her estimate of the probability of drawing a red ball. Experiments of this kind have shown that subjects behave conservatively in comparison with the predictions of Bayes' Theorem.

(3) A random variable with a uniform distribution, conditioned as in this problem, remains a uniform random variable. This result is sometimes useful in the area of simulation. For example, suppose we want to simulate a random variable X with the binomial distribution B(2, 1/2). One way of doing this would be to select two independent random variables, U1 and U2, both uniformly distributed over the range (0, 1). We then set X = I1 + I2, where Ii = 1 if Ui > 1/2 and 0 otherwise, for i = 1, 2. For each value of X we need to generate two uniform random variables, and quite often generating such variables is a time-consuming feature of simulation. We now see that we do not need to select a new value for U2, but rather we can reuse U1: if U1 < 1/2, then 2U1 is uniformly distributed over (0, 1), and may thus be used in place of U2; similarly, if U1 > 1/2, then 2(1 - U1) may be used. We can verify this as follows. For 0 ≤ x ≤ 1,

Pr(2U1 < x | U1 < 0.5) = Pr(U1 < x/2)/Pr(U1 < 0.5) = (x/2)/(1/2) = x,

since, 0 ≤ x/2 ≤ 1/2 here, so that U1 < x/2 implies U1 < 1/2. Thus, conditionally, 2U1 is uniform on (0, 1), as required. As an illustration, suppose that U1 = 0.8442. Since U1 > 0.5 we set I1 = 1. Then 1 - U1 < 0.5, and in place of U2 we can take 2(1 - U1) = 0.3116; since 0.3116 < 0.5 we set I2 = 0, and so obtain the simulated value of X = I1 + I2 = 1.

1B.8 Finding the mode and median of a distribution

A continuous random variable X has probability density function fX(x) given by

fX(x) = k(2 - x)(x - 5), 2 ≤ x ≤ 5,
fX(x) = 0, elsewhere.

Find the value of k, and hence deduce the mean and variance of X. What are the values of the mode and median of the distribution of X?

Solution

Since fX(x) is a probability density function, the area under it must be unity, i.e. ∫ fX(x) dx = 1, where the integral is taken over the range of the random variable X. We can clearly use this result to evaluate k, as follows:

k ∫ (-x^2 + 7x - 10) dx = 1, the integral being taken over (2, 5),

and this gives k(9/2) = 1, i.e. k = 2/9.

The mean of X is defined as

E(X) = ∫ x fX(x) dx = (2/9) ∫ x(-x^2 + 7x - 10) dx = (2/9)[-x^4/4 + 7x^3/3 - 5x^2], evaluated between 2 and 5, = 7/2.

The variance of X may be written as Var(X) = E(X^2) - {E(X)}^2. We therefore require

E(X^2) = ∫ x^2 fX(x) dx = (2/9) ∫ x^2(-x^2 + 7x - 10) dx = (2/9)[-x^5/5 + 7x^4/4 - 10x^3/3], evaluated between 2 and 5, = 127/10.
This leads to

Var(X) = 127/10 - 49/4 = 9/20.

The mode of the distribution is given by the value of x which maximises fX(x) over the range 2 ≤ x ≤ 5. Now

d fX(x)/dx = k(-2x + 7),

and so d fX(x)/dx = 0 when x = 7/2. (Differentiating again shows that d^2 fX(x)/dx^2 = -2k; this is always negative, providing confirmation that the stationary value is a maximum.) Thus the mode of the distribution is at x = 7/2.

In order to determine the median, M, we need to solve the equation

(2/9) ∫ (2 - x)(x - 5) dx = 1/2, the integral being taken over (2, M),

from the definition of the median. This equation reduces to the cubic

4M^3 - 42M^2 + 120M - 77 = 0.    (*)

The density function in this problem is parabolic, and so symmetrical (see Note 2); we recollect that M = 7/2 should therefore be a root, and so we may factorise the cubic as

(2M - 7)(2M^2 - 14M + 11) = 0.

The equation 2M^2 - 14M + 11 = 0 has roots 0.902 and 6.098, both of which are unacceptable since we must have 2 ≤ M ≤ 5. The median is therefore given by M = 7/2.

Notes

(1) This problem is useful in indicating the type of manipulation needed to calculate the population mean, variance, mode and median for a simple probability density function. We see also the value of a rough graph, in shedding light on otherwise purely algebraic manipulations which might well have produced the wrong answer from an arithmetic error.

(2) In this example the mean, mode and median all coincide at x = 7/2. The reason for this result is simply that the probability density function fX(x), being parabolic, is symmetrical, as shown in Figure 1.5. Had this graph been drawn initially then the common identity of the mean, mode and median at x = 7/2 would have been spotted instantly. Indeed, had this fact not been known then the root M = 7/2 of the cubic equation (*) would not have been obvious.
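The values k = 2/9, mean 7/2, variance 9/20 and median 7/2 can all be confirmed by crude numerical integration. A sketch (Python; ours, not part of the original text), using the midpoint rule over (2, 5):

```python
k = 2 / 9

def f(x):
    # density k(2 - x)(x - 5), positive on (2, 5)
    return k * (2 - x) * (x - 5)

n = 100_000
h = 3 / n
xs = [2 + (i + 0.5) * h for i in range(n)]  # midpoints of the subintervals

total = sum(f(x) for x in xs) * h                    # should be 1
mean = sum(x * f(x) for x in xs) * h                 # should be 3.5
var = sum(x * x * f(x) for x in xs) * h - mean ** 2  # should be 0.45

# median: accumulate area until it first reaches 1/2
area, i = 0.0, 0
while area < 0.5:
    area += f(xs[i]) * h
    i += 1
median = xs[i - 1]                                   # should be near 3.5

print(round(total, 4), round(mean, 4), round(var, 4), round(median, 3))
```

The midpoint rule is more than accurate enough here, since the integrands are low-degree polynomials.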
(3) As FX(x) is a continuous, strictly increasing function of x in the range 2 ≤ x ≤ 5, the equation for the median can only have one root for x in this range. Once the root x = 7/2 is known then, strictly, no further analysis is necessary to find the median.

[Figure 1.5: Probability density function for Problem 1B.8]

1B.9 Buffon's needle

Two infinitely long parallel lines are a distance a apart, and a needle of length l (l < a) is dropped and spun so as to fall with its midpoint equally likely to lie at any point between the lines, and equally likely to point in any direction. Show that the probability that the needle crosses one of the lines is 2l/(πa).

Solution

This complex problem involves two jointly distributed random variables: D, the distance of the midpoint from one of the lines, and Θ, the angle made by the needle with the lines. From the conditions stated, D is uniformly distributed between 0 and a while, independent of D, Θ is uniformly distributed between 0 and π. The simplest solution follows from a conditional argument. Suppose that, on a particular throw of the needle, Θ takes the value θ. Then, given this, the needle will cross a line if its midpoint lies closer than (l/2) sin θ to either of the two lines, i.e.

Pr(A | Θ = θ) = (l sin θ)/a,

where A is the event that the needle crosses a line. Hence

Pr(A) = ∫ Pr(A | Θ = θ) fΘ(θ) dθ = ∫ (l sin θ)/(aπ) dθ, the integrals being taken over (0, π),    (*)

= 2l/(πa).
Notes

(1) The argument in equation (*) is a form (strictly, an extension) of the law of total probability. This probability could have been arrived at by examining the joint probability density function of D and Θ and integrating it over the relevant region; but the double integral involved is avoided by the conditional argument. Indeed it often simplifies an analysis to look for a random variable on which one can condition.

(2) This problem, also discussed in Problems 1A.4 and 1A.7, amongst others, is given the name of Buffon's needle, and the corresponding experiment can be used to estimate π. Suppose that a needle is thrown randomly n times onto the grid, and let X be the number of occasions on which a line is crossed. Then (under reasonable assumptions) X will be binomially distributed with index n and parameter 2l/(πa). The fraction X/n will then give an estimate of this probability, and one can transform this in a simple way to give an estimate 2ln/(aX) for π. (There is an obvious problem if no 'successes' occur; we do not deal with this here, beyond commenting that by choosing l, a and n appropriately such an eventuality can be made extremely unlikely.) Results from an experiment can then be used in a variety of ways to produce an estimate of π. For example, different members of a group may try the experiment with different values of r, where r = l/a, and comparison of results will then throw light on the properties of the various estimators. The experiment thus offers many opportunities for class discussion, and this in turn can give a basis for a discussion of concepts such as efficiency of estimators.

(3) The experiment can be extended and modified in various ways, of which we mention just a few.

(i) The double grid. Suppose that the needle falls randomly onto a grid of squares of size a x a. Clearly there are now (for a needle of length l ≤ a) three possibilities: that the needle falls entirely within a square (probability 1 - (4r - r^2)/π), that it crosses exactly one line (probability 2(2r - r^2)/π), and that it crosses two lines (probability r^2/π). Since these probabilities are all straightforward functions of π, corresponding inferences can be drawn about π. For example, one might equate the proportion crossing two lines to r^2/π, or the proportion crossing at least one line (i.e. not lying entirely within a square) to (4r - r^2)/π. This can again lead on to discussions of efficiency of estimation, and, in particular, optimum choice of r, which we leave to the reader to develop.

(ii) Needles of length greater than a. When the length l of the needle exceeds the distance a between lines the situation is naturally more complicated. But in one respect it remains very simple. The quantity 2l/(πa) can no longer be a probability, but for a long needle it represents the mean number of lines crossed. To see this, note first that, if in the original solution we let X denote the number of lines crossed by the needle, then plainly X can be only 0 or 1, and

E(X) = 0 x Pr(X = 0) + 1 x Pr(X = 1) = Pr(A) = 2l/(πa).

Now suppose a long needle has bands painted across it so that the lengths l1, l2, ..., lm of the m individual sections are all less than a. If we denote by Xi the number of lines (0 or 1) crossed by the section of length li, and if the entire needle crosses X lines, then X = X1 + X2 + ... + Xm, and so

E(X) = E(X1) + E(X2) + ... + E(Xm) = 2(l1 + l2 + ... + lm)/(πa) = 2l/(πa).

(iii) Curved needles. In our discussion of long needles above we made no use of the customary property of needles of being straight. Each individual section of the needle has to be straight, but there is no requirement that the whole needle shares this property. Further, we are quite entitled to have the individual sections of infinitesimal length, so that by a limiting argument the needle can be of any (two-dimensional!) shape. An interesting special case, which we leave to the reader to develop, is that in which the needle is in the shape of a circle.

(4) The basic single and double grid experiments have been used by the authors in courses at the University of Kent over several years. In the single grid experiment, results over three years' courses for teachers gave 2487 lines crossed out of 3890 trials in which l = a; this gives an estimate of the probability that a line is crossed of 0.639, and a corresponding estimate of π of 3.128. In the double grid experiment, over two years, there were 3110 trials, of which 153 resulted in the needle lying entirely within the square, 1032 had both lines crossed, and in the remaining trials only one of the lines was crossed. The estimate of π based on the proportion crossing at least one line gives 3.155. Bearing in mind the link with the binomial distribution, we see that the results above can be manipulated to give tests of hypotheses (or, more likely, confidence intervals) for the probabilities of the various relevant events, along the lines shown in Problems 4C.2 and 4C.3. It must be said that the results have not always been regarded as convincing proof that the value of π can safely be left in the care of statisticians, although an unscrupulous operator could select results so as to show that good estimates can be produced.
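The single grid experiment is easy to imitate by computer. The following sketch (Python; ours, not part of the original text) throws a needle of length l = a at random and estimates π as 2ln/(aX):

```python
import math
import random

def buffon_trial(l, a, rng):
    """One throw: midpoint distance D ~ U(0, a), angle Theta ~ U(0, pi)."""
    d = rng.uniform(0, a)
    theta = rng.uniform(0, math.pi)
    half = (l / 2) * math.sin(theta)
    # the needle crosses a line if its midpoint lies within half of either line
    return d < half or (a - d) < half

rng = random.Random(7)
l = a = 1.0
n = 200_000
crossings = sum(buffon_trial(l, a, rng) for _ in range(n))
p_hat = crossings / n                     # estimates 2l/(pi*a) = 2/pi
pi_hat = 2 * l * n / (a * crossings)
print(round(p_hat, 3), round(pi_hat, 3))  # near 0.637 and pi
```

As the note above observes, the estimator is only as good as the binomial sampling error in X allows; with 200 000 throws the estimate of π is typically good to about two decimal places.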
1B.10 Transforming a random variable

Find the mean and variance of the continuous random variable X with probability density function given by

fX(x) = 3x^(-4), x ≥ 1,
fX(x) = 0, otherwise.

A new random variable Y is defined by the relation Y = X^(-1). Find the cumulative distribution function of Y, and hence derive its probability density function. Verify that, for X and Y as given above, E(XY) ≠ E(X)E(Y), and explain why this result holds in this particular example.

Solution

By definition,

E(X) = ∫ x fX(x) dx = 3 ∫ x^(-3) dx = 3/2,

and

E(X^2) = 3 ∫ x^(-2) dx = 3,

the integrals being taken over (1, ∞); so Var(X) = E(X^2) - {E(X)}^2 = 3 - 9/4 = 3/4.

The cumulative distribution function of Y is given by FY(y) = Pr(Y ≤ y). Since Y = X^(-1), the two events {Y ≤ y} and {X ≥ y^(-1)} are identical, and so, for 0 ≤ y ≤ 1,

FY(y) = Pr(X ≥ y^(-1)) = 3 ∫ x^(-4) dx, taken over (y^(-1), ∞), = y^3.

Differentiating, the probability density function of Y is fY(y) = 3y^2, for 0 ≤ y ≤ 1.

We have already found E(X) to be 3/2, and now we obtain

E(Y) = ∫ y fY(y) dy = ∫ 3y^3 dy = 3/4, over (0, 1),

so that E(X)E(Y) = 9/8. But XY = 1 always, so E(XY) = 1 ≠ E(X)E(Y). Y is a monotonically decreasing function of X, so that X and Y are clearly correlated - negatively. Had we found that E(XY) = E(X)E(Y), then the covariance between X and Y would have been zero, as would the correlation, which is certainly not the case.

Notes

(1) When dealing with unusual density functions, and also when deriving new ones, it is often useful to check that (as required for a density function) they integrate to unity. Here

3 ∫ x^(-4) dx over (1, ∞) = [-x^(-3)] = 1,

and ∫ 3y^2 dy over (0, 1) = 1, as one would expect.

(2) It is useful to remember that the probability density function of a new random variable is often readily obtained by first deriving its cumulative distribution function and then differentiating, as was done here. Another example is given in Problem 1B.11.

(3) Notwithstanding the comment in Note 2, there is another, quicker, method of deriving the density function of a random variable defined (as in this problem) through a transformation. The method is available only when the function Y = g(X) defining the transformation is strictly monotonic, i.e. its derivative always takes the same sign. When this happens it can be shown that, given the density function fX(x) of X and the transformation y = g(x),

fY(y) = fX(x) |dx/dy|,

where we express the right hand side in terms of y by solving the equation y = g(x) for x and substituting for x. For the current problem, fX(x) = 3x^(-4) and the transformation is y = x^(-1), so that x = y^(-1), dx/dy = -y^(-2), and substituting y^(-1) for x gives fY(y) = 3y^4 x y^(-2) = 3y^2, as before. (An exception is discussed in the notes to a later problem.)

(4) In problems of this type it is almost always far simpler to use the expression Var(X) = E(X^2) - {E(X)}^2 when calculating a variance, rather than the alternative and equivalent form Var(X) = E[{X - E(X)}^2]. Some further comments on calculating variances are made in the notes to Section 2A.
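The moments just found, and the failure of E(XY) = E(X)E(Y), can be checked by sampling X through inversion of its cumulative distribution function FX(x) = 1 - x^(-3). A sketch (Python; ours, not part of the original text):

```python
import random

rng = random.Random(3)

def sample_x():
    # F_X(x) = 1 - x^(-3) on x >= 1; inverting gives x = (1 - u)^(-1/3)
    u = rng.random()
    return (1 - u) ** (-1 / 3)

n = 200_000
xs = [sample_x() for _ in range(n)]
ys = [1 / x for x in xs]  # Y = 1/X

mean_x = sum(xs) / n                               # near 3/2
mean_y = sum(ys) / n                               # near 3/4
mean_xy = sum(x * y for x, y in zip(xs, ys)) / n   # exactly 1, since XY = 1
print(round(mean_x, 3), round(mean_y, 3), round(mean_x * mean_y, 3))
```

The product of the sample means settles near 9/8, while the sample mean of XY is identically 1, illustrating the negative correlation between X and Y.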
1B.11 Length of a chord of a circle

In Figure 1.6, OP is a radius of a circle with centre O and radius r. A point Q is chosen on OP in such a way that all points on OP have the same chance of being chosen. The chord passing through Q at right angles to OP meets the circle at points A and B. Find the probability density function of the length AB, and show that the probability that AB is greater than r is 0.866.

[Figure 1.6]

Solution

Since the point Q is randomly positioned, the distances OQ and AB are both random variables, and we denote them by X and Z respectively. We are given that X is uniformly distributed on the range (0, r), and require the distribution of Z. The random variable Z is, of course, a function of X: by Pythagoras' Theorem we see that x^2 + (z/2)^2 = r^2, so that

Z = 2√(r^2 - X^2).

In problems of this sort the simplest approach is to obtain first the distribution function FZ(z) of Z. Now

FZ(z) = Pr(Z ≤ z) = Pr(2√(r^2 - X^2) ≤ z),

so the event {Z ≤ z} can be written as {X ≥ √(r^2 - z^2/4)}. Since these events are identical, they must have the same probability. We now use the fact that X is equally likely to take any value in the range (0, r). The probability that it lies between √(r^2 - z^2/4) and r is thus

FZ(z) = {r - √(r^2 - z^2/4)}/r = 1 - √(r^2 - z^2/4)/r, 0 ≤ z ≤ 2r.

To obtain the probability density function of Z we merely differentiate. Hence

fZ(z) = d FZ(z)/dz = z/{2r √(4r^2 - z^2)}, 0 ≤ z ≤ 2r,

and fZ(z) is 0 elsewhere. To evaluate Pr(Z > r) it is simplest to write

Pr(Z > r) = 1 - Pr(Z ≤ r) = 1 - FZ(r) = √(r^2 - r^2/4)/r = √3/2 = 0.866,

as required.

Notes

(1) Like Problem 1B.10, this problem deals with the transformation of random variables. Some questions of this type are readily answered by working directly with the probability density function; in other cases, as here, it is simpler to work with the cumulative distribution function, as is demonstrated in Note 3 to Problem 1B.10. This problem is not an easy one; in particular, attempts to solve it by working with trigonometric functions of the angle AOP could result in difficulties.

(2) The answer to the second part of the problem could have been obtained by integrating the density function of Z over the range from r to 2r, but this was not necessary here.

(3) In the solution we used an intuitive method to evaluate Pr(X ≥ √(r^2 - z^2/4)). There is, of course, a rigorous approach available, which some readers may prefer. Since X is uniformly distributed on the range (0, r), its density function is given by

fX(x) = 1/r, 0 ≤ x ≤ r,

and is 0 elsewhere. Thus X has cumulative distribution function FX(x) = x/r, 0 ≤ x ≤ r, and

Pr(X ≥ √(r^2 - z^2/4)) = FX(r) - FX(√(r^2 - z^2/4)),

as found in the solution.
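A quick simulation confirms the value √3/2 = 0.866 for Pr(AB > r). A sketch (Python; ours, not part of the original text):

```python
import math
import random

rng = random.Random(11)
r = 1.0
n = 200_000
count_gt_r = 0
for _ in range(n):
    x = rng.uniform(0, r)                 # position of Q, uniform on OP
    z = 2 * math.sqrt(r * r - x * x)      # chord length AB by Pythagoras
    if z > r:
        count_gt_r += 1
print(round(count_gt_r / n, 3))  # near sqrt(3)/2 = 0.866
```

Note that this is the 'random point on a radius' chord model; other random-chord models (the classical Bertrand paradox) give different answers.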
2 Probability Distributions

In this chapter we make the short jump from the general examination of random variables and their distributions to dealing with some of the most important probability distributions. The chapter will concentrate particularly on the binomial, Poisson and normal distributions, although, naturally, we also give some discussion to others.

A feature of the problems in this chapter is that many involve two or more distributions, rather than each involving, perhaps, just a single distribution. Such problems are by no means artificial: as we shall see, one of the skills of the specialist in probability theory is to appreciate the opportunities to save effort in complex problems through breaking them down into manageable components.

It is convenient to deal here with a few small points of notation for these distributions. We shall use the notation 'X ~ B(n, p)' to denote that a random variable X has the binomial distribution with index n and parameter p. Similarly, if a random variable Y has a Poisson distribution with mean μ we shall write Y ~ Poisson(μ). For convenience, we defer a full description of normal distribution notation until the introduction to Section 2B, but note briefly here that we use the virtually standard notation N(μ, σ^2) for the distribution and Φ(z) for the cumulative distribution function of the standardised normal distribution N(0, 1).
and when it is suitable to use a n approximation. each with the same probability o a ‘success’. then the distribution must be Poisson. when it is appropriate to use them. the index of a binomial distribution is the number of ‘trials’. 2A Discrete Distributions For discrete random variables. if that is appropriate and seems useful. A feature o the problems in this chapter is that many involve two or more distributions. just a single distribution. particularly. Naturally we also give some discussion to others. a2) for the distribution and @ ( z ) for the cumulative distribution function of the standardised normal distribution N ( 0 . that if the binomial distribution cannot be used in a problem involving a discrete random variable. (When using a normal distribution approximation we shall also. It is convenient to deal here with a few small points of notation for these distributions.
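As a concrete companion to this notation, the two probability functions used throughout the section can be sketched in a few lines of Python; the parameter values chosen here are arbitrary illustrations, not values from any particular problem:

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Pr(X = x) when X ~ B(n, p): x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, mu):
    """Pr(Y = x) when Y ~ Poisson(mu)."""
    return exp(-mu) * mu**x / factorial(x)

# Each set of probabilities sums to 1 (the Poisson sum is over all x >= 0,
# so a long finite sum is used as a stand-in here).
print(sum(binomial_pmf(x, 10, 0.3) for x in range(11)))
print(sum(poisson_pmf(x, 4.0) for x in range(100)))
```

Both sums print 1 to within floating-point rounding, confirming that each function is a genuine probability distribution.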
2A.1 The squash match

Paul and Eric are playing squash, and Paul is determined to win at least two games. Unfortunately his chance of winning any one game is only 1/4, and this chance remains constant however many games he plays against Eric. The players agree to play 5 games and, if Paul has won at least two by then, play ceases. Otherwise Paul persuades Eric to play a further 5 games with him. What is the probability (i) that only 5 games are played, and Paul wins at least two of them; (ii) that 10 games have to be played, and Paul wins at least two?

Solution

We denote the number of games Paul wins out of the first five games by X1, and the number out of the second five (if played) by X2. Then X1 ~ B(5, 1/4), and X2 will have the same distribution.

(i) We require Pr(X1 ≥ 2), and this is most simply evaluated as

Pr(X1 ≥ 2) = 1 − Pr(X1 = 0) − Pr(X1 = 1)
           = 1 − (3/4)^5 − 5 × (1/4) × (3/4)^4
           = 1 − 243/1024 − 405/1024 = 376/1024 = 0.367.

(ii) We now require the probability that ten games are played, and that Paul wins at least two; we denote this event by S. From the conditions stated, X1 must be 0 or 1, and X1 + X2 must be two or more, so, by the addition law of probability for mutually exclusive events,

Pr(S) = Pr(X1 = 0 ∩ X2 ≥ 2) + Pr(X1 = 1 ∩ X2 ≥ 1).

Now each of these events is itself the intersection of two independent events, to which the multiplication law can be applied. Hence

Pr(S) = Pr(X1 = 0)Pr(X2 ≥ 2) + Pr(X1 = 1)Pr(X2 ≥ 1)
      = (243/1024)(376/1024) + (405/1024)(781/1024) = 0.389.

Note

The binomial distribution could be used in the solution because it was clear that the first five games could be regarded as five independent 'trials', in each of which there were just two possible results, 'success' for Paul and 'failure', with the same probability of success on each trial. (Trials of this type are often termed 'Bernoulli trials'.) Under these circumstances the total number of successes is known to have a binomial distribution.
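The arithmetic in this solution can be reproduced exactly with Python's fractions module; a sketch (exact fractions avoid any rounding until the final step):

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, x, p):
    # Pr(X = x) for X ~ B(n, p); exact when p is a Fraction
    return comb(n, x) * p**x * (1 - p)**(n - x)

p = Fraction(1, 4)                                             # Paul's chance per game
tail = lambda k: sum(binom_pmf(5, x, p) for x in range(k, 6))  # Pr(X >= k)

prob_i = tail(2)                                               # part (i)
# part (ii): ten games played and Paul wins at least two overall
prob_ii = binom_pmf(5, 0, p) * tail(2) + binom_pmf(5, 1, p) * tail(1)

print(prob_i, round(float(prob_i), 3))   # 47/128 0.367
print(round(float(prob_ii), 3))
```

The first line of output is 376/1024 in lowest terms; the second is the part (ii) answer.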
2A.2 Sampling incoming batches

Show that the binomial distribution with index n and parameter p has mean np and variance np(1 − p).

A company taking delivery of a large batch of manufactured articles accepts the batch if either (a) a random sample of 6 articles from the batch contains not more than one defective article, or (b) a random sample of 6 contains two defective articles, and a second random sample of 6 is taken and found to contain no defectives. If 20% of the articles in the batch are actually defective, what is the probability that the company will accept the delivered batch?

Solution

If X ~ B(n, p), then

Pr(X = x) = C(n, x) p^x (1 − p)^(n−x),  x = 0, 1, ..., n,

where C(n, x) denotes the binomial coefficient n!/{x!(n − x)!}. From the definition of expectation, we thus obtain

E(X) = Σ (x = 0 to n) x C(n, x) p^x (1 − p)^(n−x).

But x C(n, x) = n C(n−1, x−1) for x = 1, 2, ..., n, and the initial term (x = 0) in the summation is zero. Hence

E(X) = np Σ (x = 1 to n) C(n−1, x−1) p^(x−1) (1 − p)^(n−x) = np,

since substituting y = x − 1 in the summation shows it to be the sum of probabilities in the B(n−1, p) distribution, i.e. just the binomial expansion of {(1 − p) + p}^(n−1), which is 1.

To obtain Var(X), we first use an extension of the method above to obtain E{X(X − 1)}. We find

E{X(X − 1)} = Σ (x = 0 to n) x(x − 1) C(n, x) p^x (1 − p)^(n−x)
            = n(n − 1)p² Σ (x = 2 to n) C(n−2, x−2) p^(x−2) (1 − p)^(n−x) = n(n − 1)p²,

since using the substitution y = x − 2 shows the summation again to be 1. We can now deduce an expression for Var(X) as

Var(X) = E(X²) − {E(X)}² = E{X(X − 1)} + E(X) − {E(X)}²
       = n(n − 1)p² + np − n²p² = np(1 − p).

Thus the expectation of X is np and its variance is np(1 − p).
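The algebra above can be checked numerically: summing x Pr(X = x) and x² Pr(X = x) over the whole distribution should reproduce np and np(1 − p). A quick sketch, using n = 6 and p = 0.2 (the values that appear in the numerical part of this problem):

```python
from math import comb

n, p = 6, 0.2
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mean = sum(x * f for x, f in zip(range(n + 1), pmf))
var = sum(x * x * f for x, f in zip(range(n + 1), pmf)) - mean**2
print(mean, var)   # np = 1.2 and np(1-p) = 0.96, up to floating-point rounding
```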
In the numerical part of the problem we denote the numbers of defectives in the first and second samples by X1 and X2 respectively. These random variables are independent, and both have the binomial distribution with index n = 6 and parameter p = 0.2, where p is the probability that an item is defective. A batch will be accepted if X1 ≤ 1, or if X1 = 2 and X2 = 0. Thus, by the addition law of probability for mutually exclusive events,

Pr(batch is accepted) = Pr((X1 ≤ 1) ∪ (X1 = 2 ∩ X2 = 0))
                      = Pr(X1 ≤ 1) + Pr(X1 = 2 ∩ X2 = 0).

Now

Pr(X1 ≤ 1) = Pr(X1 = 0) + Pr(X1 = 1) = (0.8)^6 + {6 × (0.2) × (0.8)^5} = 0.6554.

We also obtain, by independence of X1 and X2,

Pr(X1 = 2 ∩ X2 = 0) = Pr(X1 = 2) × Pr(X2 = 0) = 0.2458 × 0.2621 = 0.0644.

Therefore

Pr(batch is accepted) = 0.6554 + 0.0644 = 0.720.

Notes

(1) In the solution we obtained Var(X) by first finding an expression for E{X(X − 1)}; the value of this is seen easily because of the cancellation that was possible in the binomial coefficient, so that the device used matches the structure of the problem. Use of a device of this kind is far from obvious in advance, and in general the only advice that can be given is to look at the mathematical structure of the probability function and see if there is any feature which can be exploited. This is done, for example, in Note 1 to Problem 1B.2, in which the function (X − 7)² is used. See also the negative binomial distribution, for which the variance is most easily found through the formula Var(X) = E{X(X+1)} − E(X) − {E(X)}².

(2) In answering the numerical part of the problem it is necessary to assume from the statement in the problem that the batch is 'large', so that the proportion of defective articles remains constant even after sampling some of the articles without replacement; this allows us to use the binomial distribution. Strictly speaking, when items are taken from a finite population without replacement, the proportion with some attribute changes as sampling proceeds, and the distribution of the number of 'successes' is no longer binomial, but hypergeometric. (See Problem 2A.11 for a discussion of this distribution.)

(3) 'Success' and 'failure' are only really standard labels for the two possible outcomes of a Bernoulli trial. In fact 'success' may sometimes constitute what we might more readily think of as failure: in the above example, we have taken 'success' to correspond to a defective article.
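The acceptance probability can be checked in a few lines; a sketch following the same two-stage sampling rule, with p = 0.2 as in the problem:

```python
from math import comb

def binom_pmf(n, x, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

p = 0.2
pr_first_ok = binom_pmf(6, 0, p) + binom_pmf(6, 1, p)       # Pr(X1 <= 1)
pr_second_chance = binom_pmf(6, 2, p) * binom_pmf(6, 0, p)  # Pr(X1 = 2) Pr(X2 = 0)
print(round(pr_first_ok + pr_second_chance, 3))             # 0.72
```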
(4) The technique of probability generating functions, though more advanced, can be used profitably in theoretical work on the binomial and other discrete distributions, and we outline some of its main properties here. The p.g.f. of a random variable X taking the values 0, 1, 2, ... is defined as the function

G_X(t) = E(t^X).

(a) Expanding G_X(t) as a power series in t, the coefficient of t^x is Pr(X = x); hence the name of the function. A discrete distribution can therefore be described completely by its probability generating function (or p.g.f.), just as it can by the more usual probability function.

(b) The moments (in particular the mean and variance) of X can be found by differentiating G_X(t) with respect to t, and then evaluating the derivatives at t = 1. Specifically,

E(X) = G′_X(1) and E{X(X−1)} = G″_X(1),

so that

Var(X) = G″_X(1) + G′_X(1) − {G′_X(1)}².

(c) The p.g.f. of the sum of any number of independent random variables is the product of their individual p.g.f.s.

Applying the technique to the total number of successes X in n independent trials, in each of which the probability of success is p, we define random variables Y1, Y2, ..., Yn, with Yi being 1 if trial i is a success and 0 if not. Denoting the p.g.f. of Yi by G_i(t), we find

G_i(t) = (1 − p)t^0 + p t^1 = (1 − p) + pt,

and, since X = Y1 + Y2 + ... + Yn, property (c) above gives

G_X(t) = {(1 − p) + pt}^n.

Note that by expanding this p.g.f. in powers of t we obtain, easily, as the term in t^x, the probability function of the binomial distribution. We can also use the result to obtain the mean and variance of X. Differentiating, we find

G′_X(t) = n{(1 − p) + pt}^(n−1) p,

and substituting t = 1 gives, immediately, E(X) = np. Differentiating again, we find that the second derivative is n(n − 1){(1 − p) + pt}^(n−2) p², so that E{X(X−1)} = G″_X(1) = n(n − 1)p², and thus

Var(X) = G″_X(1) + G′_X(1) − {G′_X(1)}² = n(n − 1)p² + np − (np)² = np(1 − p).
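Properties (a) and (b) can be illustrated by building {(1 − p) + pt}^n as an explicit polynomial and reading off its coefficients; a sketch with the arbitrary choice n = 6, p = 0.2:

```python
from math import comb

def poly_mult(a, b):
    """Multiply two polynomials held as coefficient lists, constant term first."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

n, p = 6, 0.2
gx = [1.0]
for _ in range(n):                 # G_X(t) = {(1-p) + p t}^n, built factor by factor
    gx = poly_mult(gx, [1 - p, p])

# property (a): the coefficient of t^x is the binomial probability Pr(X = x)
for x in range(n + 1):
    assert abs(gx[x] - comb(n, x) * p**x * (1 - p)**(n - x)) < 1e-12

# property (b): moments from the derivatives of G_X at t = 1
g1 = sum(x * c for x, c in enumerate(gx))            # G'_X(1) = E(X)
g2 = sum(x * (x - 1) * c for x, c in enumerate(gx))  # G''_X(1) = E{X(X-1)}
print(g1, g2 + g1 - g1**2)   # np = 1.2 and np(1-p) = 0.96, up to rounding
```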
2A.3 Sampling for defective items

A machine produces articles of which an average of 10% are defective. Find an approximate value for the probability that a random sample of 500 of these articles contains more than 25 which are defective. What, approximately, is the probability that the sample contains fewer than 60 defectives?

Solution

The number of defectives, X, in the random sample of size 500 has the binomial distribution with n = 500 and p = 0.1. To obtain an approximate value for the probability Pr(X > 25) we use the normal approximation to this distribution: that is, we use the normal distribution with the same mean and variance. We have

mean = np = 500 × 0.1 = 50,  variance = np(1 − p) = 500 × 0.1 × 0.9 = 45.

Hence we use the N(50, 45) distribution or, equivalently, transform to Z = (X − 50)/√45 and use the standardised normal distribution N(0, 1). Since we are approximating a discrete distribution (binomial) by a continuous distribution (normal), we include a continuity correction. From tables of the normal distribution we find

Pr(X ≤ 25) ≈ Φ((25.5 − 50)/6.7082) = Φ(−3.652) = 0.0001.

But we require Pr(X > 25), given by

Pr(X > 25) = 1 − Pr(X ≤ 25) = 1 − 0.0001 = 0.9999.

Similarly, for Pr(X < 60) we require

Φ((59.5 − 50)/6.7082) = Φ(1.416) = 0.9216.

The probability that there are fewer than 60 defectives is thus approximately 0.922.

Notes

(1) As stated in the solution, the distribution of the number of defectives in the sample is binomial: for each of the 500 articles separately, whether or not it is found to be defective can be thought to constitute a Bernoulli trial, independent of all the others. Since the number of articles in the sample is large, we use a normal approximation; the words 'approximate' and 'approximately' in the statement of the problem indicate that this is what is required. A rough check on whether or not the normal distribution is likely to provide a good approximation to a particular binomial distribution can be carried out by seeing how much of the tails of the normal distribution is outside the range of the binomial distribution (a binomially distributed random variable has a finite range, whereas that of a normally distributed random variable is infinite). In the above case, 0 is closer to the mean (50) than 500 is, but even this is more than 7 standard deviations away from the mean. Since much less than 1% of the normal distribution is contained beyond 3 standard deviations from the mean, the indication here is that the normal approximation should be very good, and it can safely be used. A useful rule of thumb is that the approximation is reasonable if both np and n(1 − p) exceed 5; note, however, that the normal distribution can provide a good approximation to the binomial distribution even when this is not the case. When p ≠ 1/2 the binomial distribution is not symmetrical, and the further p is from 1/2, the larger will be the value of n required for the approximation to be a good one.

(2) A continuity correction is used because a discrete distribution is being approximated by a continuous distribution. For a random variable with a continuous distribution, the probability that the variable takes some specified value exactly is zero; this is clearly not the case for a random variable which has a discrete distribution. To see what is happening here, one can think of approximating the probability that X = 25 by the area under the probability density function curve for N(50, 45) between the values 24.5 and 25.5. Similarly the probability that X = 24 is approximated by the area under the curve between 23.5 and 24.5 and, continuing in this way, the probability that X ≤ 25 is approximated by the entire area to the left of 25.5.

(3) The final probability in the solution has been rounded to 3 decimal places: linear interpolation in the tables of the normal distribution was used to obtain the fourth figure, and the calculated probability is only an approximation anyway.
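Φ can be evaluated without tables via the error function in Python's math module, so both approximations above are easy to check; a sketch:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sd = 50.0, sqrt(45.0)                # N(50, 45) approximation to B(500, 0.1)
pr_more_25 = 1 - Phi((25.5 - mu) / sd)   # continuity-corrected Pr(X > 25)
pr_less_60 = Phi((59.5 - mu) / sd)       # continuity-corrected Pr(X < 60)
print(round(pr_more_25, 4), round(pr_less_60, 3))   # 0.9999 0.922
```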
2A.4 The music recital

Regular music recitals are held in a small hall with seating for an audience of 98 people. The booking office staff find that, on average, 3% of people who book for a recital fail to turn up, and adopt a policy of selling up to 100 tickets for any recital. What is the probability that for a recital for which 100 tickets have been sold, everyone who turns up has a seat?

Solution

The number X of ticket-holders who fail to turn up for the recital has a binomial distribution with index n = 100 and parameter p = 0.03, making the assumption that people who have booked act independently with regard to attendance or non-attendance. We need to find the probability that two or more people fail to turn up (when there will be a seat for everyone). Since n is large and p is small we may approximate the binomial probabilities using the Poisson distribution with the same mean, np = 3. Hence the required probability is (approximately)

Pr(X ≥ 2) = 1 − Pr(X = 0) − Pr(X = 1) = 1 − e^(−3) − 3e^(−3) = 1 − 4e^(−3) = 1 − 0.199 = 0.801.

Notes

(1) We are told that, on average, 3% of people who have booked for recitals fail to turn up, and we have to make the assumption that this rate of non-attendance stays constant over time when computing the required probability. In practice this may not be the case as, for example, different times of the year or the varying fame of performers may affect the attendance rates. We are also, in using the binomial distribution, making the assumption that people who have booked act independently. It is most unlikely that this assumption is valid, since many will book in small groups, and it will often be the case that the failure of one member of a group to attend will be associated with the non-attendance of some or all of the other members of that group. Thus a solution based on the binomial distribution can only be approximate. It is clear, though, that such a solution is the best possible given the information at our disposal.

(2) The use of the Poisson approximation to the binomial distribution is less necessary now than it was when the only computational aids generally available were mathematical tables and mechanical calculators. With a modern scientific calculator it is (almost) a simple matter to calculate the required binomial probabilities exactly, i.e. to find

Pr(X ≥ 2) = 1 − Pr(X = 0) − Pr(X = 1) = 1 − (0.97)^100 − 100 × (0.03) × (0.97)^99 = 0.805.

The reason we say that the direct calculation of binomial probabilities is almost a simple matter is that, if n is large, calculation of the necessary binomial coefficients by substituting the values of n and x in n!/{x!(n − x)!} will fail because some of the factorials involved will overflow the calculator. This may be avoided by appropriate cancellation before entering any figures into the calculator, or by calculating probabilities sequentially, beginning with the value of Pr(X = 0) and using the recurrence relationship

Pr(X = x) = {(n − x + 1)/x} {p/(1 − p)} Pr(X = x − 1),

as is done for the Poisson distribution in Problem 2A.8.

(3) Confusion often arises as to which approximation to the binomial distribution to use when n is large. The two possibilities are the Poisson and normal distributions. The Poisson distribution is, mathematically, the limiting form of B(n, p) as n → ∞ and p → 0 in such a way that the product np remains constant; the present problem gives an example where n is large and p is small, and hence the Poisson approximation is appropriate. The normal distribution approximation also depends mathematically on n tending to infinity, but note the rule of thumb given in Note 1 to Problem 2A.3 that it is likely to be reasonable if the products np and n(1 − p) are both greater than 5. (It is also worth noting that the normal distribution can be derived as a limiting form of the Poisson distribution as its single parameter, the mean, tends to infinity, so that the normal distribution can be used to approximate the Poisson distribution for large μ; we thus see how the Poisson approximation to the binomial distribution might in turn lead to a normal approximation.) Although we see from this discussion that it is clearly the Poisson distribution which is appropriate to the present problem, it makes little difference numerically which approximation is used: if we use the normal approximation we find the answer to be 0.810 rather than 0.801. This does not alter the fact that, from the structure of the problem, it is the Poisson, not the normal, distribution which provides the appropriate approximation.
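The closeness of the Poisson approximation here is easy to confirm; a sketch comparing it with the exact binomial value:

```python
from math import exp

n, p = 100, 0.03
lam = n * p                                          # Poisson mean, np = 3

exact = 1 - (1 - p)**n - n * p * (1 - p)**(n - 1)    # binomial Pr(X >= 2)
approx = 1 - exp(-lam) * (1 + lam)                   # Poisson  Pr(X >= 2)
print(round(exact, 3), round(approx, 3))             # 0.805 0.801
```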
2A.5 Marking a multiple-choice test

(a) A multiple-choice test paper contains 50 questions; for each question three answers are given, one of which is correct. If the examination is marked simply by giving one mark per correct answer, what should the pass mark be if the probability that a completely ignorant candidate passes is to be approximately 1%?

(b) Suppose now that the examination is marked by awarding two marks per correct answer, but deducting one mark for every incorrect answer. If an ignorant candidate attempts every question, what is the expectation and variance of the candidate's total score?

(c) Consider the position of a candidate when the scoring system is as in part (b), but with only one mark for a correct answer, and when the pass mark is 28. The candidate has revised half the syllabus thoroughly, and finds that he is certain of the correct answers to half the questions. To gain the extra 3 marks needed to pass he decides to guess at the answers to a few more questions. Would the probability of passing be greater if he picked just three more questions hoping to get them all correct, or if he guessed at five questions?

Solution

(a) Let the number of correct answers given by an ignorant candidate be denoted by X. The two incorrect answers to any question are designed to be plausible, so that an ignorant candidate could be expected to pick an answer quite at random; the distribution of X is therefore B(50, 1/3), and for this distribution a normal approximation is acceptable. So we have, approximately,

X ~ N(50 × 1/3, 50 × 1/3 × 2/3), i.e. X ~ N(50/3, 100/9).

We are required to find the value x such that Pr(X ≥ x) ≈ 0.01, and so we have to solve for x in the equation

Φ((x − 50/3)/(10/3)) = 0.99.

Now Φ(k) = 0.99 implies that k = 2.3263, so we conclude that

x = 50/3 + 2.3263 × (10/3) ≈ 24.4.

In practice this obviously means that the pass mark should be set at 25.

(b) As in part (a), X ~ B(50, 1/3). Since X is the number of correct answers and 50 − X the number of incorrect answers, the score S of the candidate who attempts every question is a random variable such that

S = 2X + (−1)(50 − X) = 3X − 50.

We see therefore that E(S) = 3E(X) − 50 and Var(S) = 9Var(X). From standard properties of the binomial distribution used in part (a), E(X) = 50/3 and Var(X) = 100/9, so we reach

E(S) = 0 and Var(S) = 100.
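With a computer the exact binomial tail can be used in place of the normal approximation of part (a); a sketch confirming that 25 is the smallest pass mark bringing an ignorant candidate's pass probability down to roughly 1%:

```python
from math import comb

n = 50
def tail(k):
    """Pr(X >= k) for X ~ B(50, 1/3), computed from the exact pmf."""
    return sum(comb(n, x) * (1/3)**x * (2/3)**(n - x) for x in range(k, n + 1))

for k in (23, 24, 25, 26):
    print(k, round(tail(k), 4))
```

The printed tail probabilities fall with k, dropping to about the 1% level at k = 25.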
(c) With one mark per correct answer and half the syllabus known, the candidate has 25 marks in the bag, with up to 25 further questions available from which to secure the extra 3 marks. (The problem with guessing at many of these 25 further questions is that he is twice as likely to guess incorrectly, and be penalised, as to get an answer right.) If three further questions are attempted, all must be correct in order to pass, and the probability p3 of this is

p3 = (1/3)^3 = 9/243.

If five questions are attempted, then at least four must be correct to secure the three marks needed, and the probability p5 of this is

p5 = (1/3)^5 + 5 × (1/3)^4 × (2/3) = 11/3^5 = 11/243.

Since p3 = 9/243 < 11/243 = p5, we see that the better strategy is to try five further questions.

Notes

(1) While the numbers have been kept simple, the setup is a moderately realistic one, and scoring systems of the type described are in use. They do, though, offer candidates considerable incentive to revise, since guesswork is most unlikely to be rewarded. In practice, of course, candidates are usually not guessing quite at random, and if the scoring scheme is known to them they have interesting strategic choices. For example, to maximise the total expected score, a candidate should attempt a question if he assesses the chance of answering it correctly at 1/3 or more under the scheme in (b), but under the system in (c) only if the chance exceeds 1/2. Other possibilities include allowing candidates to check two (or, perhaps, even more) answers to the same question, and gain reduced credit if the correct answer is one of those checked.

(2) In part (c) the statement of the problem allows us to restrict attention to just 3 or 5 extra questions. In practice one would wish to calculate the optimum number of extra questions to choose. A little reasoning shows that choosing four questions is not sensible, since the candidate can then pass only by answering all four correctly, a more difficult task than guessing three out of three; similarly, choosing six questions is worse than choosing five. If seven extra questions are attempted, the candidate passes when five or more are correct, and a little calculation shows that the probability p7 of this is 11/3^5, so that this strategy is just as good as choosing five extra questions. If eight or nine questions are tried, the probability of success from nine questions is 835/3^9, which is smaller than p5 and p7, although trying nine questions is a better strategy than trying eight. Trying more than seven questions is therefore not advisable.
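The search over strategies in Note (2) can be automated. Under the part (c) scheme, a candidate guessing k questions with c of them correct gains 2c − k net marks, so he passes when 2c − k ≥ 3; a sketch:

```python
from math import ceil, comb

def pass_prob(k):
    """Probability of gaining at least 3 net marks from k random guesses."""
    c_min = ceil((k + 3) / 2)          # need 2c - k >= 3 correct guesses
    return sum(comb(k, c) * (1/3)**c * (2/3)**(k - c)
               for c in range(c_min, k + 1))

for k in range(3, 10):
    print(k, round(pass_prob(k), 4))   # maxima at k = 5 and k = 7 (both 11/243)
```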
2A.6 The insect breeding experiment

(a) Two sets of n independent trials are performed, independently of each other, and each trial results in either success or failure, the probability of success being p1 in the first set of trials and p2 in the second set. Show that the probability P of obtaining x1 successes in the first set and x2 successes in the second set is given by

P = K p1^x1 (1 − p1)^(n−x1) p2^x2 (1 − p2)^(n−x2),

where K depends only on n, x1 and x2. If p1 = p and p2 = p², find an expression for log P and show that, for given values of n, x1 and x2, log P has a maximum value when p is such that

(x1 + 2x2) − (n − x1)p − 3np² = 0.

(b) An insect breeding experiment was conducted in two sections, in each of which 100 insects of a particular species were raised. In the first section a proportion p was expected to have a certain colour variation and in the second section the proportion with this colour variation was expected to be p², but the value of p was not known. In the event there were 22 insects in the first section which possessed the colour variation, and 7 in the second section. Find the value of p which maximises the probability of this result.

Solution

(a) The number of successes in the first set of n independent trials has a binomial distribution, and the probability of obtaining x1 successes is

C(n, x1) p1^x1 (1 − p1)^(n−x1).

Similarly, the probability of obtaining x2 successes in the second set of n independent trials is given by the binomial distribution as

C(n, x2) p2^x2 (1 − p2)^(n−x2).

The two sets of n independent trials are also independent of each other, so that the two probabilities above should be multiplied together to yield the probability of the composite event described in the statement of the problem. The required probability is therefore

P = K p1^x1 (1 − p1)^(n−x1) p2^x2 (1 − p2)^(n−x2), where K = C(n, x1) C(n, x2);

clearly K depends only on n, x1 and x2.

Putting p1 = p and p2 = p² in the expression for P, we obtain

P = K p^(x1+2x2) (1 − p)^(n−x1) (1 − p²)^(n−x2)

and, taking logarithms,

log P = log K + (x1 + 2x2) log p + (n − x1) log(1 − p) + (n − x2) log(1 − p²).

To maximise log P, we differentiate with respect to p to obtain

d log P/dp = (x1 + 2x2)/p − (n − x1)/(1 − p) − 2p(n − x2)/(1 − p²).

Equating this to zero leads to the equation

(x1 + 2x2)(1 − p²) − (n − x1)p(1 + p) − 2(n − x2)p² = 0,

or, equivalently,

(x1 + 2x2) − (n − x1)p − 3np² = 0.

A local maximum value of log P is then given when the value of p is one of the roots of this quadratic equation. Denoting the left hand side of the equation by g(p), we see that g(0) > 0 and that g(1) < 0, and also that the coefficient of p² is negative. It follows that the roots of the equation g(p) = 0 are real, and that one lies in the range (0, 1) and the other is negative. (There are special cases when x1 = x2 = 0 and when they both equal n.) Differentiation shows that the second derivative of log P is negative whenever p > 0, so the root of the equation g(p) = 0 in (0, 1) gives a maximum for log P.
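As a numerical check on this result, the data of part (b) below (n = 100, x1 = 22, x2 = 7) can be fed into log P and the maximum located by a simple grid search; a sketch:

```python
from math import comb, log

n, x1, x2 = 100, 22, 7    # the counts from part (b)

def log_P(p):
    K = comb(n, x1) * comb(n, x2)
    return (log(K) + (x1 + 2 * x2) * log(p)
            + (n - x1) * log(1 - p) + (n - x2) * log(1 - p * p))

best = max((i / 10000 for i in range(1, 10000)), key=log_P)
print(best)   # 0.24, the root of (x1 + 2x2) - (n - x1)p - 3np^2 = 0 in (0, 1)
```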
(b) We note that the situation in the insect breeding experiment is precisely that which was defined in part (a), but with the particular values n = 100, x1 = 22 and x2 = 7 specified. We therefore substitute these values in the equation for p, and obtain

36 − 78p − 300p² = 0, or 50p² + 13p − 6 = 0.

This factorises to give (25p − 6)(2p + 1) = 0, which has solutions p = 6/25 and p = −1/2. Since p is a probability, between 0 and 1, the root that we require is p = 6/25 = 0.24; the general result given in part (a) shows that this root maximises the value of log P.

Notes

(1) The second, numerical, part of the problem follows directly from the first, algebraic, part. When answering examination questions, candidates often fail to recognise the relevance of the first part of a question to the second part.

(2) Although it is not necessary to realise it in order to solve the problem, the method being used here to find the value of p maximising log P is the method of maximum likelihood estimation. The model for the insect breeding experiment depends on a parameter p, which we wish to estimate, and the method proceeds by finding that value of p for which the probability that the values actually observed occur is maximised. The method of maximum likelihood is one of the most commonly used methods of estimation in statistics (see Problems 2A.11 and 4D.4 for other examples of its use). Others are the method of moments and the minimum chi-squared method; another is the method of least squares (see Note 4 to Problem 5A.1 for its use in the context of linear regression).

(3) In applying the method of maximum likelihood to a problem like the present one, we usually find that the quantity to be maximised is a product of several terms (themselves probabilities or probability densities). It is usually easier to find the maximum of a sum of terms rather than that of a product, and a simple way to convert the product to a sum is to take logarithms. Since log P is a monotonic function of P, if we require the value of p maximising P, this can equivalently, and more easily, be done by finding the value maximising log P. This is what was done in the solution above.

2A.7 The telephone exchange

A telephone exchange receives, on average, 5 calls per minute. Find the probability (i) that in a 1-minute period no calls are received; (ii) that in a 2-minute period fewer than 4 calls are received; (iii) that in a 20-minute period no more than 102 calls are received; (iv) that out of five separate 1-minute periods there are exactly four in which 2 or more calls are received.
Solution

(i) We assume that the number of incoming calls in one minute has the Poisson distribution with mean 5; that is, if X is the number of incoming calls in one minute,

Pr(X = x) = e^(−5) 5^x / x!,  x = 0, 1, 2, ....

Then Pr(X = 0) = e^(−5) = 0.007.

(ii) The number of calls in a 2-minute period, Y, has the Poisson distribution with mean 10. Hence

Pr(Y < 4) = Pr(Y = 0, 1, 2 or 3) = e^(−10) (1 + 10 + 10²/2! + 10³/3!) = 0.010.

(iii) The number of calls, W, in 20 minutes has the Poisson distribution with mean 100. We require Pr(W ≤ 102) and, since the mean of the distribution is large, we are able to use a normal approximation. Since the mean and variance of the Poisson distribution are the same, the appropriate normal distribution is N(100, 10²). Using a continuity correction, we find

Pr(W ≤ 102) ≈ Φ((102.5 − 100)/10) = Φ(0.25) = 0.599.

(iv) In any 1-minute period chosen at random the probability of 2 or more calls being received is

1 − Pr(0 or 1 calls) = 1 − e^(−5)(1 + 5) = 0.9596.

Hence, using the binomial distribution B(5, 0.9596), the probability that exactly four out of five 1-minute periods contain 2 or more calls is

5 × (0.9596)^4 × (1 − 0.9596) = 0.171.

Notes

(1) Although it is not explicitly stated in the problem, the intention is that we should assume that the number of telephone calls in one minute has a Poisson distribution. This arises if we assume that calls are distributed at random over time, with no tendency either to arrive in groups or to be evenly spaced, and that there is no change over time in the rate at which calls are received. A mathematically precise formulation of these assumptions is provided by the Poisson process, a model for completely random occurrences in time, with the following properties: (i) events relating to non-overlapping intervals of time are statistically independent; (ii) in any small time interval (t, t + δt), the probability of an occurrence in the interval is proportional to the length of the interval; (iii) in any small time interval (t, t + δt), the probability of two or more occurrences is proportional to (δt)², and so is negligible. While we have given the exact definition for completeness, the important fact is simply that, when occurrences are completely haphazard in time, the number of these occurrences in a fixed time is a random variable with a Poisson distribution. It is this which justifies use of the Poisson distribution in so many cases involving accidents, infrequent occurrences, etc. Similar arguments, with 'space' replacing 'time', justify the appropriateness of the Poisson distribution in other contexts (see, for example, Problem 2A.8); in many applications, but particularly in the field of ecology, the Poisson distribution is often used to describe the distribution of organisms in two- or three-dimensional space. Alternative distributions (if the organisms are not distributed randomly) can be described as regular or clustered, and interest often centres on fitting one of the so-called contagious distributions to clustered, clumped or aggregated data; such distributions all have the property that the variance is greater than the mean. An example of a contagious distribution is the negative binomial distribution (see Problem 2A.10). An application of the Poisson process to the arrival of customers at a queue can be found in Problem 4D.2.

(2) In part (iv) we need to compute probabilities from two distributions (Poisson and binomial), and to combine them: the probability p of 'success' in a Bernoulli trial is calculated from some standard distribution, and then used in the calculation of a binomial probability. It is not uncommon to find problems of this type.
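All four answers can be reproduced in a few lines; a sketch, in which Φ is obtained from the error function and the Poisson sum uses the recurrence between successive terms:

```python
from math import erf, exp, sqrt

def Phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def poisson_cdf(k, mu):
    """Pr(X <= k) for X ~ Poisson(mu), summing terms via the recurrence."""
    term, total = exp(-mu), 0.0
    for x in range(k + 1):
        total += term
        term *= mu / (x + 1)
    return total

p_i = exp(-5)                      # (i)   no calls in 1 minute
p_ii = poisson_cdf(3, 10)          # (ii)  fewer than 4 calls in 2 minutes
p_iii = Phi((102.5 - 100) / 10)    # (iii) normal approximation, continuity corrected
q = 1 - 6 * exp(-5)                # Pr(2 or more calls in a given minute)
p_iv = 5 * q**4 * (1 - q)          # (iv)  B(5, q), exactly four 'successes'
print(round(p_i, 3), round(p_ii, 3), round(p_iii, 3), round(p_iv, 3))
# 0.007 0.01 0.599 0.171
```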
2A.8 Random sultanas in scones

(a) Show that for a Poisson distribution the mean is equal to the variance.

(b) In a bakery 3600 sultanas are added to a mixture which is subsequently divided up to make 1200 fruit scones. Assuming that the number of sultanas in each scone follows a Poisson distribution, estimate

(i) the number of scones which will be without a sultana,
(ii) the number with 5 or more sultanas.

Solution

(a) If X ~ Poisson(μ), then Pr(X = x) = e^(−μ) μ^x / x!, x = 0, 1, 2, ..., and so

E(X) = Σ_{x=0}^∞ x Pr(X = x) = Σ_{x=1}^∞ x e^(−μ) μ^x / x! = μ e^(−μ) Σ_{x=1}^∞ μ^(x−1)/(x − 1)! = μ,

since substituting y = x − 1 in the summation shows it to be just the expansion of e^μ in powers of μ.

For Var(X), we first consider E{X(X − 1)}. This is seen to be

E{X(X − 1)} = Σ_{x=0}^∞ x(x − 1) e^(−μ) μ^x / x! = Σ_{x=2}^∞ x(x − 1) e^(−μ) μ^x / x! = μ^2 e^(−μ) Σ_{x=2}^∞ μ^(x−2)/(x − 2)! = μ^2,

this time using a substitution y = x − 2 in the summation. We now find

Var(X) = E{X(X − 1)} + E(X) − {E(X)}^2 = μ^2 + μ − μ^2 = μ.

We see therefore that for the Poisson distribution the mean and variance are equal.

(b) The mean of the Poisson distribution is simply the average number of sultanas per scone, which is 3600/1200 = 3. Hence, if X is the number of sultanas in a scone,

Pr(X = x) = 3^x e^(−3)/x!, x = 0, 1, 2, ... .

(i) Substituting x = 0 in this expression, we obtain the probability that any one scone contains no sultanas as e^(−3) = 0.04979. To estimate the number of scones without a sultana we need to multiply this probability by the total number of scones. Hence, since 1200 × 0.04979 = 59.7, the estimated number of scones without a sultana is 60.
(ii) Now

Pr(5 or more sultanas) = 1 − Σ_{x=0}^{4} Pr(X = x).

Computing the required probabilities sequentially, using the recurrence relationship Pr(X = x) = (3/x) Pr(X = x − 1), we obtain

Pr(X = 1) = 3 Pr(X = 0) = 0.1494,
Pr(X = 2) = (3/2) Pr(X = 1) = 0.2240,
Pr(X = 3) = Pr(X = 2) = 0.2240,
Pr(X = 4) = (3/4) Pr(X = 3) = 0.1680.

Hence Σ_{x=0}^{4} Pr(X = x) = 0.8153, and so Pr(5 or more sultanas) = 1 − 0.8153 = 0.1847. The estimated number of scones with 5 or more sultanas is therefore 1200 × 0.1847 = 221.6, i.e. 222 to the nearest whole number.

Notes

(1) In this problem the use of the Poisson distribution depends on an assumption that the sultanas are 'randomly' distributed through the mixture. For a detailed discussion of what the word 'randomly' means in this context, see Note 2 to Problem 2A.7.

(2) To find values of the Poisson probability function in solving this problem, we have made use of a simple relationship between successive values. A potential hazard here, though, is that of compounding rounding errors. Care needs to be taken to retain a sufficiently large number of decimal places in the working to achieve the degree of accuracy ultimately desired. The probability that X = 1 was recorded as 0.1494, while Pr(X = 2) was recorded as 0.2240; but (3/2) × 0.1494 = 0.2241, yet in fact both figures given in the solution are accurate to four decimal places. Better still, if a suitable calculator is available, is to carry out the sequence of calculations entirely within the machine, writing down any results needed but basing subsequent calculations on the more accurate value still held in the calculator.

(3) Note that Pr(X = 2) = Pr(X = 3) in this example. This illustrates a property of the Poisson distribution which holds whenever the mean μ is an integer, namely that Pr(X = μ − 1) = Pr(X = μ). When μ is not an integer, then if μ > 1 the successive probabilities increase progressively to a unique maximum (corresponding to the largest integer less than μ), and then decrease monotonically, while if μ < 1 they simply decrease monotonically.

(4) In calculating Var(X) in part (a) we used the device of first calculating E{X(X − 1)} rather than E(X^2) directly; the factor x(x − 1) cancels neatly against x!, which makes the summation straightforward. Such an approach can frequently simplify calculations when we are dealing with discrete distributions.
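The sequential calculation above is easily mechanised. The following Python sketch, a modern addition to the text, applies the recurrence Pr(X = x) = (μ/x) Pr(X = x − 1) to the scone data and reproduces the estimates 60 and 222.

```python
from math import exp

n_scones, n_sultanas = 1200, 3600
mu = n_sultanas / n_scones            # mean number of sultanas per scone = 3

# Pr(X = 0), ..., Pr(X = 4) via the recurrence Pr(X = x) = (mu/x) Pr(X = x-1)
p = [exp(-mu)]
for x in range(1, 5):
    p.append(p[-1] * mu / x)

empty = n_scones * p[0]               # expected scones without a sultana
five_plus = n_scones * (1.0 - sum(p)) # expected scones with 5 or more
print(round(p[0], 5), round(empty), round(five_plus))  # prints: 0.04979 60 222
```

Because the machine works to full precision throughout, the rounding hazard discussed in Note (2) does not arise here.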
(5) The problem of obtaining the mean and variance of a Poisson distribution can also be tackled using the probability generating function. (The function is defined generally, and some of its properties are given, in Note 4 to Problem 2A.2.) If a random variable X has a Poisson distribution with mean μ, it has p.g.f. G_X(t) given by

G_X(t) = E(t^X) = Σ_{x=0}^∞ t^x Pr(X = x) = e^{μ(t−1)}.

In the present problem we require the mean and variance of X, and these can be found by differentiation. For the mean we require G′_X(1). Differentiating, we obtain G′_X(t) = μ e^{μ(t−1)}, and evaluating this at t = 1 gives E(X) = μ. Differentiating again, we find the second derivative to be μ^2 e^{μ(t−1)}, which equals μ^2 when t = 1, so we obtain

Var(X) = G″_X(1) + G′_X(1) − {G′_X(1)}^2 = μ^2 + μ − μ^2 = μ = E(X).

2A.9 Relationship between binomial and Poisson distributions

Independent random variables X and Y have Poisson distributions with means μ1 and μ2 respectively, and the random variable Z is defined as Z = X + Y.

(a) Show that the distribution of Z is a Poisson distribution.
(b) Show that, conditional upon the event Z = z, the distribution of X is binomial, and find its index and parameter.

Solution

(a) The most direct method evaluates Pr(Z = z), for values of z ≥ 0, by splitting the event {Z = z} into mutually exclusive component events whose probabilities can then be added together. Now Z = z if X = 0 and Y = z, or if X = 1 and Y = z − 1, or if X = 2 and Y = z − 2, and so on, ending with the event that X = z and Y = 0. These z + 1 components are mutually exclusive, and each is the intersection of two independent events, for example {X = 1} and {Y = z − 1}. So, by the addition law of probability,

Pr(Z = z) = Σ_{x=0}^{z} Pr(X = x ∩ Y = z − x),

and, by independence, Pr(X = x ∩ Y = z − x) = Pr(X = x) Pr(Y = z − x), so that

Pr(Z = z) = Σ_{x=0}^{z} [e^{−μ1} μ1^x / x!][e^{−μ2} μ2^{z−x} / (z − x)!].

Rearranging the terms in this expression gives

Pr(Z = z) = e^{−(μ1+μ2)} Σ_{x=0}^{z} μ1^x μ2^{z−x} / {x!(z − x)!} = [e^{−(μ1+μ2)} / z!] Σ_{x=0}^{z} C(z, x) μ1^x μ2^{z−x} = e^{−(μ1+μ2)} (μ1 + μ2)^z / z!,

since the sum in this expression is just the binomial expansion of (μ1 + μ2)^z. Hence Z has the Poisson distribution with mean μ1 + μ2.

(b) To find the conditional distribution of X given the event Z = z, we simply need to evaluate Pr(X = x | Z = z), using the ordinary formula for conditional probability:

Pr(X = x | Z = z) = Pr(X = x ∩ Z = z) / Pr(Z = z).

The denominator was found in part (a), and the numerator can be re-expressed as Pr(X = x ∩ Y = z − x). The advantage of so doing is that X and Z are not independent, but X and Y are, so that the joint probability can be evaluated as a product. We thus obtain

Pr(X = x | Z = z) = [e^{−μ1} μ1^x / x!][e^{−μ2} μ2^{z−x} / (z − x)!] ÷ [e^{−(μ1+μ2)} (μ1 + μ2)^z / z!] = C(z, x) p^x (1 − p)^{z−x},

where, for clarity, we have written p = μ1/(μ1 + μ2); i.e., conditional upon Z = z, X has the binomial B(z, μ1/(μ1 + μ2)) distribution.

Notes

(1) This rather tricky problem is included partly to provide a basis for a later one, in Section 4C. (While the present problem is purely an exercise in manipulating probabilities, its application is quite practical, providing us with a two-sample test for use with data from Poisson distributions.) The problem also shows how closely interlinked the binomial and Poisson distributions are; it is not just the case that one can sometimes be used as an approximation to the other.
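Both results of the problem can be confirmed numerically. The Python sketch below is an illustration added to the text; the means 2 and 3 are chosen arbitrarily, not taken from the problem. It checks the convolution against the Poisson(μ1 + μ2) probabilities, and the conditional probabilities against the binomial form.

```python
from math import exp, factorial, comb

def pois(x, mu):
    """Poisson probability function Pr(X = x)."""
    return exp(-mu) * mu**x / factorial(x)

mu1, mu2 = 2.0, 3.0     # illustrative means, not from the problem
z = 4

# (a) Pr(Z = z) by direct convolution, against the Poisson(mu1 + mu2) value
conv = sum(pois(x, mu1) * pois(z - x, mu2) for x in range(z + 1))
direct = pois(z, mu1 + mu2)

# (b) Pr(X = x | Z = z) against the binomial B(z, mu1/(mu1 + mu2)) value
p = mu1 / (mu1 + mu2)
for x in range(z + 1):
    cond = pois(x, mu1) * pois(z - x, mu2) / direct
    binom = comb(z, x) * p**x * (1 - p)**(z - x)
    assert abs(cond - binom) < 1e-12

print(round(conv, 6), round(direct, 6))
```

The two printed values agree, and the assertion inside the loop verifies the binomial form for every value of x from 0 to z.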
(2) As in Problem 2A.8, probability generating functions (p.g.f.s) can be used to advantage here, since the aim in part (a) is to find the distribution of the sum of two independent random variables. (Further details of definition, etc., can be found in Note 4 to Problem 2A.2.) If a random variable W has a Poisson distribution with mean λ, it has p.g.f. G_W(t) given by

G_W(t) = E(t^W) = Σ_{w=0}^∞ t^w Pr(W = w) = e^{λ(t−1)}.

Now X and Y both have Poisson distributions, so G_X(t) = e^{μ1(t−1)} and G_Y(t) = e^{μ2(t−1)}. But Z = X + Y, so, by the multiplicative property of p.g.f.s,

G_Z(t) = G_X(t) × G_Y(t) = e^{(μ1+μ2)(t−1)},

and from the form of this function of t we see that Z must have the Poisson distribution with mean μ1 + μ2. This argument is only slightly shorter than the more direct one used in the solution. But it is much more powerful, since it extends immediately to finding the distribution of the sum of three or more independent random variables: all one has to do is to multiply together the appropriate number of p.g.f.s, and the product will be seen to be of the required form. Conversely, any function of t of this form will be the p.g.f. of a random variable with a Poisson distribution, and whatever replaces μ1 + μ2 in the formula will be the mean of its distribution. It follows naturally that the sum of any number of independent Poisson random variables will itself have a Poisson distribution.

2A.10 Tossing a coin until 'heads' appears

(a) In a series of independent tosses of a coin, with probability p of the coin landing 'heads' and, correspondingly, probability 1 − p of it landing 'tails', obtain an expression for Pr(X = x), where X is a random variable denoting the number of tosses until the first head appears.

(b) If m is a positive integer, obtain an expression for Pr(Y = y), where Y is a random variable denoting the number of tosses until the m th head appears.

Solution

(a) As an illustration, consider the case x = 5, when we observe:

Toss number  1 2 3 4 5
Result       T T T T H

Since the tosses are independent, we obtain Pr(X = 5) = (1 − p)^4 p, and, in general,

Pr(X = x) = (1 − p)^{x−1} p, x = 1, 2, 3, ... .

We say that the random variable X has a geometric distribution, with parameter p.

(b) In this case, to obtain the result Y = y, we must have a sequence of tosses of the following form:

Toss number  1 2 3 4 5 ... y
Result       T T H H T ... H

The first y − 1 tosses result in m − 1 heads and y − m tails, with these heads and tails arranged in any order, and the y th toss is a head. Thus the sequence TTHHT...H illustrated corresponds to just one of the C(y−1, m−1) possible orderings for the first y − 1 tosses; all such orderings contain m − 1 heads and y − m tails, and thus have the same probability. The probability that the y th toss is a head is simply p, and the probability of obtaining m − 1 heads in the first y − 1 tosses is found from the binomial B(y − 1, p) distribution. We see therefore that

Pr(Y = y) = C(y−1, m−1) p^m (1 − p)^{y−m}, y = m, m + 1, m + 2, ... .

The random variable Y is said to have a negative binomial distribution, with parameters m and p. We note that when m = 1 we have the special case of the geometric distribution found in part (a).

Notes

(1) We may readily check that, for the geometric distribution, the probability function sums to 1, since

Σ_{x=1}^∞ (1 − p)^{x−1} p = p / {1 − (1 − p)} = 1.

The reader may care to perform the same check for the negative binomial distribution.

(2) By considering the number of heads in n + m independent tosses of a coin we can readily verify the following result, which connects negative binomial and binomial random variables:

Pr(Y ≤ n + m) = Pr(Z ≥ m),    (*)

for n = 0, 1, 2, ..., where Y has the negative binomial distribution given in the problem and Z is a binomial random variable with the B(n + m, p) distribution. The event {Z ≥ m} occurs if there are at least m heads in the first n + m tosses, which implies that the number of tosses Y until the m th head must be no larger than n + m. The two events {Y ≤ n + m} and {Z ≥ m} are thus identical, and so have the same probability, so that equation (*) holds.

It is instructive (and far more difficult) to prove equation (*) using algebra. After cancelling p^m, the coefficient of (1 − p)^i on the left hand side is simply C(m+i−1, m−1), while the corresponding coefficient on the right hand side is more complicated; the equality of the two expressions follows from considering the coefficient of w^i on both sides of a binomial identity in a dummy variable w.
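Equation (*) is easily confirmed numerically, which is a useful check before attempting the algebraic proof. The Python sketch below is an addition to the text; the values m = 3 and p = 0.4 are arbitrary choices for illustration.

```python
from math import comb

def negbin_cdf(nm, m, p):
    """Pr(Y <= nm), where Y = number of tosses needed for the m-th head."""
    return sum(comb(y - 1, m - 1) * p**m * (1 - p)**(y - m)
               for y in range(m, nm + 1))

def binom_tail(nm, m, p):
    """Pr(Z >= m), where Z = number of heads in nm tosses."""
    return sum(comb(nm, j) * p**j * (1 - p)**(nm - j)
               for j in range(m, nm + 1))

m, p = 3, 0.4                      # illustrative values, not from the text
for n in range(0, 10):
    lhs = negbin_cdf(n + m, m, p)
    rhs = binom_tail(n + m, m, p)
    assert abs(lhs - rhs) < 1e-12
print("identity (*) verified for n = 0, ..., 9")
```

The case n = 0 is a quick sanity check by hand: both sides reduce to p^m, the probability that the first m tosses are all heads.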
2A.11 Estimating the size of a fish population

A lake contains n fish. A sample of m fish (m < n) is caught; the fish are marked, and all are then returned to the lake. Later, a second sample of k fish (k < n) is caught, and X, the number of marked fish in the second sample, is recorded. Making reasonable assumptions about the way in which the sampling would be carried out, find an expression for Pr(X = x). Viewing Pr(X = x) as a function g(n) of n, examine the ratio g(n)/g(n − 1) and hence obtain the value of n for which g(n) is a maximum.

Solution

In this solution we assume that marking the fish does not affect them in any way, so that marked fish mingle freely with others, and are no more and no less likely to be caught again in the second sample. We assume also that the population is stable, so that there are no births, deaths or migrations during the investigation.

Since k fish are caught in the second sample, this sample can be chosen in C(n, k) different ways. Our assumption about the sampling is that all these ways are equally likely, so that each of the possible samples has probability 1/C(n, k) of being the one selected. We now need to count the number of ways of selecting the second sample in which X = x. Now if X = x, then the number of unmarked fish in the second sample is k − x. We can choose the x marked fish in C(m, x) ways, and for each of those ways we can choose the k − x unmarked fish in C(n−m, k−x) ways. Thus the total number of ways of selecting the second sample such that X = x is C(m, x) C(n−m, k−x), and so

Pr(X = x) = C(m, x) C(n−m, k−x) / C(n, k),    max(0, m + k − n) ≤ x ≤ min(k, m).

The bounds shown for x are needed to ensure that the binomial coefficients are valid, and they also correspond to the practical constraints of the problem.

The function g(n) is simply Pr(X = x) viewed as a function of n, which will often be unknown in practice. Substituting n − 1 for n gives

g(n−1) = C(m, x) C(n−1−m, k−x) / C(n−1, k),

so that, after cancelling factorials,

g(n)/g(n−1) = [(n − m)(n − k)] / [n(n − m − k + x)] = q, say.

Now q > 1, i.e. g(n) > g(n − 1), precisely when (n − m)(n − k) > n(n − m − k + x), i.e. when

n^2 − nm − nk + mk > n^2 − nm − nk + nx,

i.e. when mk > nx. Suppose first that x > 0; in this case q > 1 whenever n < mk/x, so that g(n) > g(n − 1) for n < mk/x, while, for n > mk/x, g(n) < g(n − 1). The function g(n) therefore takes its maximum value when n = [mk/x], the integer part of mk/x. If x = 0 then q > 1 for all values of n, so there is no finite value of n for which g(n) is a maximum.

Notes

(1) The distribution of X is called the hypergeometric distribution, on account of the form of its probability generating function, which is a hypergeometric function. In sampling from a finite population, when one counts the number X of sample members with some attribute, the distribution of X is hypergeometric when the sampling is done without replacement (the usual practice). Had the second sample of fish been selected with replacement, the distribution of X would have been the binomial distribution B(k, m/n), since in this case the k 'trials' are independent, each with the same probability m/n of 'success', i.e. of obtaining a marked fish. When k is small relative to n, the difference between sampling with and without replacement will be slight, and the hypergeometric distribution can be approximated by the binomial distribution above. Approximation by the normal distribution may also be possible.

(2) If we wish to obtain an estimate of an unknown population size n (and in practice this is often the aim of such an investigation), it is intuitively sensible to select that value of n maximising Pr(X = x), that is, the probability of occurrence of what has occurred. The resulting estimator, [mk/x], is an example of what is termed a maximum likelihood estimator. As it happens, the estimator also follows from simply equating the proportions of marked fish in the lake and in the second sample:

m/n = x/k.

In practice, as well as calculating a point estimate for n we would also obtain a confidence interval, but we omit this here for reasons of space.

(3) To illustrate the ideas involved in the method of maximum likelihood, we use a small-scale example. Suppose that in the first sample 5 fish were caught and marked, and that another 3 were caught in the second sample, of which 2 had been marked; we then have m = 5, k = 3 and x = 2, and we wish to estimate n, the total number of fish in the lake. Let us suppose for the moment that there are only two possible values for n, viz. n = 6 and n = 14. Now the function g(n) gives the probability of observing x = 2 in the second sample, i.e. it gives the probability of occurrence of what has occurred. For these two values we can easily calculate g(n), and we find that g(6) = 1/2 and g(14) = 45/182. Since g(6) ≈ 2g(14), the value n = 6 is more plausible than is n = 14, since what we have observed to occur is distinctly less likely in the latter case. Hence, with only these two values from which to choose, the case for picking n = 6 would seem a strong one.

In practice, of course, we are not restricted to just a couple of possible values for n, and the argument above suggests that a sensible way of estimating n is to pick that value for which g(n), the probability of occurrence of what has been observed, is at a maximum. The method is then the method of maximum likelihood, and the function g(n) is termed the likelihood function; the value of n for which the likelihood is maximised is called the maximum likelihood estimate of n. In the case illustrated above, a little calculation shows that g(6) = 1/2, g(7) = 4/7, g(8) = 15/28 and g(9) = 10/21. The maximum value is at 7, which is therefore the maximum likelihood estimate of n; this agrees with [mk/x] = [5 × 3/2] = 7.

(4) An interesting light can be shed on this method of estimating population sizes by simulating the method, using the same known values of n, m and k each time. A histogram could be drawn to summarise the resulting sample of estimates of n, and its location and spread, relative to the (known) value of n, can be examined. Typically one finds that the spread can be quite large, unless very large values of m and k are used. For example, suppose that n = 1000 (although in practice we would not know this) and that m = 100, so that in fact 10% of the fish are marked, and that we select k = 50 fish in the second sample. Then we might observe x = 7 marked fish, in which event the estimate of n would be 714. A useful, and practical, exercise can result from trying such a sampling experiment in, say, a school playground. It would, at any rate, provide a refreshing alternative to the dreary routine of taking the register twice daily.

(5) Procedures of this type are known as capture-recapture experiments, and are employed in practice; ways of increasing precision, such as conducting the sampling more than twice, are usually incorporated. There are always practical problems to overcome, and the basic assumptions of the method, mentioned above, may not be valid in practice. For example, fish with different histories of capture may have different 'catchability', and one may also have to take into account the fact that the fish population will typically not be stable, as assumed in this problem.
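The likelihood calculations in the notes above can be reproduced in a few lines. The Python sketch below, a modern addition to the text, evaluates g(n) for the small-scale example m = 5, k = 3, x = 2 and locates its maximum by a simple grid search.

```python
from math import comb

def g(n, m, k, x):
    """Likelihood Pr(X = x) of the capture-recapture model, as a function of n."""
    return comb(m, x) * comb(n - m, k - x) / comb(n, k)

m, k, x = 5, 3, 2                       # the small-scale example in the notes
likelihood = {n: g(n, m, k, x) for n in range(m, 31)}
n_hat = max(likelihood, key=likelihood.get)
print(n_hat, likelihood[6], round(likelihood[7], 4))  # prints: 7 0.5 0.5714
```

The grid search recovers n_hat = 7, in agreement with the closed-form maximum likelihood estimate [mk/x] = [15/2] = 7, and the values g(6) = 1/2 and g(7) = 4/7 quoted in Note (3).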
2A.12 Collecting cigarette cards

Cigarette packets used to contain cards representing such things as flowers, film stars or football teams, and collectors would aim to obtain a complete set. A complete set would comprise n different cards. Assuming that each packet bought is equally likely to contain any one of the n cards, obtain expressions for the mean and variance of the number of packets which must be purchased until a complete set of cards is obtained.

Solution

Let Z denote the total number of packets purchased until the complete set of cards is obtained. Clearly, the first packet bought would contain a new card which would start the collection. The next card obtained might simply duplicate the first card, or it might be a new card, and add to the collection. Let X_i denote the number of packets needed to secure the i th new card, once i − 1 have been obtained; thus X_2 is the number of packets (after the very first) until the second card is added to the collection, X_3 is the number of extra packets until the third new card, and so on. Then we can write

Z = 1 + X_2 + X_3 + ... + X_n,

in which the Xs denote independent random variables. Hence

E(Z) = 1 + Σ_{i=2}^n E(X_i) and, by independence, Var(Z) = Σ_{i=2}^n Var(X_i).

We now obtain the expectation and variance of X_i. Once i − 1 different cards have been obtained, the probability, for each packet bought, is (i − 1)/n that the card it contains is an old one and, correspondingly, (n − i + 1)/n that the card is new. Since, for fixed i, the probability of 'success' (i.e. a packet with a new card) is constant, and the 'trials' are independent, X_i, the number of trials until the first success, has the geometric distribution with parameter (n − i + 1)/n. (This distribution is discussed in more detail in Problem 2A.10.) We thus see that

Pr(X_2 = k) = (1/n)^{k−1} ((n−1)/n), Pr(X_3 = k) = (2/n)^{k−1} ((n−2)/n),

and, in general,

Pr(X_i = k) = ((i−1)/n)^{k−1} ((n−i+1)/n), k = 1, 2, ... .

Now if a random variable X has a geometric distribution such that Pr(X = k) = (1 − p)^{k−1} p, k = 1, 2, ..., then

E(X) = 1/p    (1)

and

Var(X) = (1 − p)/p^2.    (2)

Hence we obtain

E(X_i) = n/(n − i + 1) and Var(X_i) = [(i − 1)/n] × [n/(n − i + 1)]^2 = n(i − 1)/(n − i + 1)^2.

Thus, from equations (1) and (2) above,

E(Z) = 1 + n Σ_{i=2}^{n} (n − i + 1)^{−1} and Var(Z) = n Σ_{i=2}^{n} (i − 1)/(n − i + 1)^2.

These expressions can be written more simply as

E(Z) = 1 + n Σ_{j=1}^{n−1} 1/j and Var(Z) = n Σ_{j=1}^{n−1} (n − j)/j^2.

Notes

(1) As an illustration of these results we consider the case n = 6, for which we obtain the following expectations and variances:

E(X_2) = 6/5, Var(X_2) = 6/25; E(X_3) = 3/2, Var(X_3) = 3/4; E(X_4) = 2, Var(X_4) = 2; E(X_5) = 3, Var(X_5) = 6; E(X_6) = 6, Var(X_6) = 30.

Hence the expectation of Z is given by

E(Z) = 1 + 6/5 + 3/2 + 2 + 3 + 6 = 14.7,

and its variance is given by

Var(Z) = 0.24 + 0.75 + 2 + 6 + 30 = 38.99.

We can see from this information alone that the distribution of Z is likely to be skew, since the minimum possible value, 6, is just 1.4 standard deviations below the mean.

(2) Note that the solution is greatly facilitated by considering Z as the sum of independent random variables. The novelty in the problem lies in the fact that the Xs, while independent, are not identically distributed. The problem is solved by straightforward, though repeated, use of the geometric distribution.

(3) When n = 10, the distribution of Z forms the basis of what is termed the coupon-collector test for randomness of a set of decimal digits, each of the ten digits playing the rôle of a card.

(4) Calculation of the mean and variance of Z is, of course, only the first step in an analysis. One can, naturally, go further and derive the distribution of Z, but this is more difficult. We present below an outline of how we proceed in the important case n = 10, which (see Note 3) is used to test decimal digits for randomness. We have the equation

Z = 1 + Σ_{i=2}^{10} X_i

relating Z to the independent random variables X_2, ..., X_10. Because of independence, the p.g.f. of Z (see Note 4 to Problem 2A.2) is the product of the p.g.f.s of the component random variables; these can, of course, be derived from first principles if necessary, using suitable summations. After some algebra we find that Z has p.g.f. G_Z(t) given by

G_Z(t) = (9!/10^9) t^{10} / Π_{i=1}^{9} (1 − it/10),

which may be written in partial fraction form. This can be expanded as a power series in t, and in this way it can be shown that the coefficient of t^j in this series will be the probability that Z = j.
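The expressions for E(Z) and Var(Z) are easily evaluated for any n. The Python sketch below, added for illustration, reproduces the n = 6 figures of Note (1).

```python
def coupon_mean_var(n):
    """Mean and variance of the number of packets needed for a full set of n cards.
    X_i, the wait for the i-th new card, is geometric with parameter (n-i+1)/n."""
    mean = 1 + sum(n / (n - i + 1) for i in range(2, n + 1))
    var = sum(n * (i - 1) / (n - i + 1) ** 2 for i in range(2, n + 1))
    return mean, var

mean6, var6 = coupon_mean_var(6)
print(round(mean6, 2), round(var6, 2))  # prints: 14.7 38.99
```

The same function applied with n = 10 gives the mean and variance relevant to the coupon-collector test mentioned in Note (3).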
2B Continuous Distributions

In this section we concentrate principally on the normal distribution, reflecting its central rôle in statistics. A matter of key importance is that of learning how to read tables of the standardised normal distribution. Since this is not a conventional textbook we do not include these tables here, but recommend the reader to get to know the layout in several books of statistical tables, so as to appreciate the variety of ways in which the standard information can be presented. Two good sets of tables are those by D. V. Lindley and W. F. Scott (New Cambridge Elementary Statistical Tables, Cambridge University Press, 1984) and by H. R. Neave (Elementary Statistics Tables, George Allen & Unwin, 1981).

Readers will be familiar with the standardisation property of the normal distribution: that if X ~ N(μ, σ²), then (X − μ)/σ ~ N(0, 1), in the customary notational system in which N(μ, σ²) represents the normal distribution with mean μ and variance σ². A frequently-used piece of notation in this section, and indeed elsewhere, is Φ(z), the probability that Z ≤ z, where Z ~ N(0, 1) (read 'Z has the normal distribution with mean 0 and variance 1').

2B.1 The slippery pole

A boy is trying to climb a slippery pole and finds that he can climb to a height of at least 1.700 m nine times out of ten attempts, and to a height of at least 1.850 m once in five attempts. Assuming that the heights he can reach in various attempts form a normal distribution, calculate the mean and standard deviation of the distribution. Calculate also the height that the boy can expect to exceed once in one thousand attempts.

Solution

If the boy reaches a height of at least 1.700 m nine times out of ten attempts, there is a probability of 0.1 of his failing to reach a height of 1.700 m on any one attempt. Similarly, there is a probability of 0.2 that the height he reaches on any one attempt is at least 1.850 m. These probabilities are represented by the shaded areas in Figure 2.1.

Figure 2.1 P.d.f. of normal distribution for Problem 2B.1

From tables, the standardised normal deviate corresponding to an upper tail area of 0.1 (i.e. a cumulative probability of 0.9) is 1.2817; similarly, the standardised normal deviate corresponding to an upper tail area of 0.2 (cumulative probability of 0.8) is 0.8418.

If we denote the unknown mean of the normal distribution by μ and the unknown standard deviation by σ, we can write down two equations for μ and σ. The information given in the question allows us to write

(μ − 1.700)/σ = 1.2817,    (1)
(1.850 − μ)/σ = 0.8418.    (2)

Adding (1) and (2) gives 0.15/σ = 2.1235, so that σ = 0.0706 m. Substituting for σ in (1) or (2) gives μ = 1.7905 m.

In order to calculate the height the boy can expect to exceed once in one thousand attempts, we find from tables of the normal distribution that the standardised normal deviate corresponding to a tail area of 0.001 is 3.0902. Hence, if x is the required height,

(x − μ)/σ = 3.0902,

giving x = 1.7905 + 3.0902 × 0.0706 = 2.009 m.

Notes

(1) An alternative, more concise, way of writing out the derivation of the simultaneous equations (1) and (2) is as follows. Suppose Φ(z) is the distribution function of the standard normal distribution N(0, 1), and that Φ⁻¹(y) is the value of z for which Φ(z) = y. Then the information given in the question allows us to write

(μ − 1.700)/σ = Φ⁻¹(0.9) = 1.2817 and (1.850 − μ)/σ = Φ⁻¹(0.8) = 0.8418,

so that, in the same notation, the final part of the solution uses (x − μ)/σ = Φ⁻¹(0.999) = 3.0902.

(2) In problems of this type it is always a good idea to draw a sketch diagram, similar to that shown in Figure 2.1, in order to visualise exactly what is required. One can naturally only draw an accurate diagram once one has calculated μ and σ. However, examining the figures given shows that 1.70 must lie below the mean, and 1.85 above it: a sketch diagram showing a normal distribution and the positions of 1.70 and 1.85 in relation to the mean will help to check for gross errors in arithmetic in the solution. (In Figure 2.1 the curve is drawn and calibrated accurately, so that in this case the shaded region on the right hand side has area exactly 0.2.)
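The pair of simultaneous equations in Problem 2B.1 can be solved mechanically. The Python sketch below is a modern addition to the text; it uses exact standard normal quantiles from the standard library in place of four-figure tables, and agrees with the solution to the accuracy quoted.

```python
from statistics import NormalDist

z90 = NormalDist().inv_cdf(0.90)   # deviate exceeded with probability 0.1
z80 = NormalDist().inv_cdf(0.80)   # deviate exceeded with probability 0.2

# (mu - 1.700)/sigma = z90  and  (1.850 - mu)/sigma = z80; adding the two:
sigma = (1.850 - 1.700) / (z90 + z80)
mu = 1.700 + z90 * sigma

# Height exceeded once in a thousand attempts:
z999 = NormalDist().inv_cdf(0.999)
height = mu + z999 * sigma
print(round(mu, 4), round(sigma, 4), round(height, 3))  # prints: 1.7905 0.0706 2.009
```

The inverse distribution function inv_cdf plays exactly the part of Φ⁻¹ in Note (1).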
(ii) the percentage of cylinders produced whose diameters fall outside the specified limits.01 cm.771) = 0. if we were to mix the batches in unequal proportions w and 1 .8 If C1 is the concentration of the first batch and C2 is the concentration of the second batch (both in parts per million). so that C = wC1 + (1 .000 001 2. with corresponding tail area 1.0607. we consider the standardised normal 38 deviate = 4. and hence standard deviation 1.3 Tolerances for metal cylinders In a manufacturing process circular metal cylinders are being produced as components of a certain product. correct to 4 1. z For the proportion which now exceeds the permitted maximum concentration.3 = 2.w)C.5)2 + (1.) Note The factor in the expression for the mean of C is easy to remember. the standardised normal deviate is = 3.72 Probability Distributions Solution For the first part.0038.60 cm. From tables. we would obtain E(C) = w E(C1) + (1 . four are usable and one fails to meet the specifications. 2B.w ) ~ v ~ ~ ( c Z ) . ~ 12 . Generalising the problem.. independently of each other. (In fact. For a cylinder which is produced to be usable. . If the batches are chosen at random. (iii) the percentage of cylinders that cannot be used. T h e process is running in such a way that the cylinders have lengths which are normally distributed about a mean of 8.5)’} = 1. its length must be between 8.5 the corresponding tail area is 0. with corresponding tail area 1 .771. the diameters are normally distributed about a mean of 1. with standard deviation 0. then the resulting concentration is C = i ( C l + C 2 ) . To find the proportion falling below 3 parts per million. (iv) the chance that in a sample of five cylinders taken at random.667.0607 1 . Note that in no circumstances would one average the standard deviations in this type of problem.65 cm. C1 and C2 being independent.(D(3.w .54 cm with standard deviation 0.714) = 0.a(4. This arises because a variance.0001.125.0000. 
the standardised normal deviate of interest is 2B. and its diameter between 1.45 cm and 8.05 cm while. which is therefore the proportion of batches which exceed the maximum concentration. being the average squared deviation of an observation from its mean. Find (i) the percentage of cylinders produced whose lengths fall outside the specified limits.55 cm and 1.57 cm. cP(4.714) zz 0. but var(c) = w2Var(C1) + (1 . then C has a normal distribution with mean l ( 8 + 8) = 8 and variance f{(1. but many students find difficulty with the corresponding factor f in the variance. independently.714. 1.w)E(C2).0607 decimal places. is measured in the square of the original units. which is the required proportion.
2. Then the event that a cylinder chosen at random from those produced fails to meet the specifications is simply A UB .70 8.e.65 .8.1.0012. or 2.60 8.1.a ( 3 ) = 0.01 From tables.40 8. i. (ii) We now need to find the total area in the tails of the distribution corresponding to standardised normal deviates 1.20) = 0.05 To obtain the tail areas we first of all find standardised normal deviates corresponding to the given limits.0012 = 0.54 and s.75 X Figure 2.0498 + 0.a(1.0139 and @(1230) = 1 .54 = 2.d. Hence Pr(AUB) = 0. The specified Limits are 8.e.98%.0359) = 4.01 and 0. the percentage of cylinders that will not meet the specifications is 7.20. (iii) Let A be the event that a cylinder chosen at random fails to meet the specified limits for length.45 8.2B.27%. and Pr(AUB) = Pr(A) 1.2 Normal distribution with mean 8.Pr(AflB). by independence.57 = 3 0.3 Continuous Distributions 73 Solution (i) We need to calculate appropriate tail areas of the distribution of lengths of cylinders.45 and 8.a(2. these are 1 .0. 0 . 0.001 35 and 1 . Now. Pr(A n B ) = Pr(A)Pr(B) = 0.0498~0. Therefore the percentage of cylinders whose lengths fail to meet the specifications is lOO(0. and let B be the corresponding event for diameter. the required areas are 1 .0241 . i.55 .50 8.65 8.  + Pr(B) . These are 8.60 .35 8.0241= 0.57 = 2.a ( 2 ) = 0.L Y r 8 $ 6 'vr 4 w z K d 2 0 8. N(8.54. Hence the percentage of cylinders whose diameters fail to meet the specifications is lOO(0.55 8.41%.and the required tail areas are shown as the shaded areas in Figure 2.65. 0.0727.022 75.022 75)%.80) = 0.052).0139 + 0. 0.0359.05 and From tables of the normal distribution.00135 + 0. .
(iv) Using the binomial distribution with index n = 5 and parameter p = 0.9273 (where p is the probability that a cylinder meets the specifications), we find that the probability that 4 out of 5 cylinders meet the specifications is
(5 choose 4)(0.9273)⁴(0.0727) = 0.269.

Notes
(1) The first two parts of this problem are standard exercises in computing probabilities from the normal distribution. The third part then uses these in an exercise in combining probabilities of different events, which are not mutually exclusive — a common form of problem. Finally, part (iv) uses the probabilities obtained from the normal distribution in an application of the binomial distribution.
(2) There is a slightly shorter way of solving part (iii) of the problem. Using the notation of that part, the event whose probability is required is the complement of Ā ∩ B̄, so that, by independence,
Pr(A ∪ B) = 1 − Pr(Ā ∩ B̄) = 1 − Pr(Ā)Pr(B̄) = 1 − (1 − 0.0498)(1 − 0.0241) = 0.0727.

74 Probability Distributions

2B.4 The number of matches in a box
A manufacturer of matches claims that boxes contain, on average, 49 matches. On checking output over a substantial period, the manager finds that 3% of boxes contain fewer than 46 but 25% contain 51 or more. Making reasonable assumptions, calculate the mean and standard deviation of the number of matches per box. Do you feel that the manufacturer's claim is plausible?

Solution
The problem contains no statement of the distribution of the number of matches in a box. The random variable is clearly discrete, but no conventional model seems very plausible. It might be reasonable to assume a fairly symmetrical distribution, and the natural approach to the problem would be to make use of the normal distribution. Following this approach, noting that the random variable is discrete so that a continuity correction is needed, and letting μ and σ denote the mean and standard deviation, we have
Φ((45.5 − μ)/σ) = 0.03 and 1 − Φ((50.5 − μ)/σ) = 0.25,
giving two simultaneous equations in the two unknowns. From normal tables, Φ(k1) = 0.03 gives k1 = −1.88, while Φ(k2) = 0.75 gives k2 = 0.67. We thus solve
μ − 1.88σ = 45.5 and μ + 0.67σ = 50.5,
and these give μ = 49.19, σ = 1.96. Since μ > 49, we naturally do not complain about the manufacturer's claim.
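The pair of simultaneous equations can be solved mechanically; a sketch in Python (the exact normal quantiles are used here, so the results differ slightly from the four-figure table values in the solution):

```python
from statistics import NormalDist

# Quantiles of the standard normal: Phi(k1) = 0.03, Phi(k2) = 0.75
k1 = NormalDist().inv_cdf(0.03)   # about -1.88
k2 = NormalDist().inv_cdf(0.75)   # about  0.67

# Solve mu + k1*sigma = 45.5 and mu + k2*sigma = 50.5
sigma = (50.5 - 45.5) / (k2 - k1)
mu = 45.5 - k1 * sigma

print(round(mu, 2), round(sigma, 2))  # close to the tabulated mu = 49.19, sigma = 1.96
```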
Notes
(1) In many examples one estimates a population mean or standard deviation; here the problem is, rather, to calculate them. This is because the wording of the problem implies that it is not just sample information that is available. Had the problem referred to, say, results from 200 randomly selected boxes, the problem would have been one of inference. The two problems differ in principle only in that here we have a discrete random variable, and therefore need to use a continuity correction.
(2) The final part of the problem is worded somewhat vaguely, and offers literal-minded mathematicians an opportunity to quibble: since the calculation does not give μ exactly 49, a claim that the average is 49 is not literally correct. But the conclusion drawn in the solution is nonetheless the only reasonable one to draw in the real world.
(3) The comment in the solution about conventional (discrete) models reflects the statistician's natural approach to a problem like the present one: one runs through a catalogue of plausible models, trying to imagine independent trials with constant probability of success (binomial), number of events in a fixed period (Poisson), and so on. These two are plainly implausible here. Indeed, in real life the statistician might encounter a problem like this through consultation with the manufacturer, and would then have the opportunity to observe the manufacturing process and use the experience to develop an appropriate model.
(4) A manipulation similar to that used here is considered in greater detail in a later problem.

2B.5 Fitting plungers into holes
A random variable X has a N(μX, σX²) distribution. Independently, a random variable Y has a N(μY, σY²) distribution. What is the distribution of X − Y?
In the manufacture of a certain mass-produced article, a circular plunger has to be fitted inside a circular cylinder. It is known that the manufacturing process is adjusted so that the diameters of plunger and cylinder are normally distributed, with mean and standard deviation as follows:
External diameter of plunger: mean 99.7 mm, standard deviation 0.20 mm.
Internal diameter of cylinder: mean 100.2 mm, standard deviation 0.15 mm.
If components are selected at random for assembly, what proportion of plungers will not fit?

2B.5 Continuous Distributions 75

Solution
A random variable which is a linear combination of a pair of independently and normally distributed random variables also has a normal distribution, whose mean and variance can be found from the usual rules for sums and differences of independent random variables. Here we note that
E(X − Y) = E(X) − E(Y) = μX − μY, and Var(X − Y) = Var(X) + Var(Y) = σX² + σY²,
and therefore find that X − Y ~ N(μX − μY, σX² + σY²).
To answer the numerical part we note that a plunger will not fit into a cylinder if the external diameter of the plunger is greater than the internal diameter of the cylinder. Denoting the internal diameter of the cylinder by X and the external diameter of the plunger by Y (both in millimetres), we find that μX = 100.2, μY = 99.7, σX = 0.15 and σY = 0.20. Hence X − Y has a normal distribution with mean 0.5 mm and variance 0.15² + 0.20² = 0.0625 mm², so that the standard deviation is 0.25 mm.
We now need Pr(X − Y < 0). To obtain this we use the standardised normal deviate
(0 − 0.5)/0.25 = −2.
From tables of the standardised normal distribution, we find the cumulative probability corresponding to this value to be 1 − Φ(2) = 1 − 0.97725 = 0.02275. This is therefore the proportion of plungers that will not fit inside the cylinders.

76 Probability Distributions

Notes
(1) The first part of the problem requires no more than general results on the expectation and variance of linear combinations of independent random variables, together with the knowledge that linear combinations of normal random variables remain normal. The former is easy, but the latter result is quite difficult to demonstrate by direct methods; it is shown easily using the method of moment generating functions. For a random variable X, the moment generating function (or m.g.f.) is defined as M_X(s) = E(e^(sX)), which can be evaluated as
M_X(s) = ∫ e^(sx) f_X(x) dx,
where f_X(x) is the probability density function of X. (The m.g.f. has properties rather similar to those of the probability generating function (see Section 2A); it can also be used for a discrete random variable, and an analogous formula is used.) In particular, the function can be used to obtain the distribution of the sum of independent random variables. If we write W = U + V, where U and V are independent random variables, then
M_W(s) = E(e^(sW)) = E(e^(sU) e^(sV)) = E(e^(sU))E(e^(sV)) = M_U(s)M_V(s),
where independence of U and V is used to justify writing the expectation of the product e^(sU)e^(sV) as the product of expectations.
We need first to know the form of the moment generating function of N(μ, σ²): algebraic manipulation gives the result
M_X(s) = e^(μs + ½σ²s²),
where the integral is evaluated by taking the terms in the exponent together and 'completing the square'.
We now use this result to evaluate the distribution of X − Y, noting that we require the difference between, not the sum of, the random variables. We therefore identify U with X and V with −Y, and obtain
M_{−Y}(s) = E(e^(−sY)) = M_Y(−s) = e^(−μY s + ½σY²s²).
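The numerical answer can be reproduced directly from the distribution of the difference; a sketch in Python:

```python
from statistics import NormalDist

# X: internal diameter of cylinder, Y: external diameter of plunger (mm).
# X - Y is normal with mean 100.2 - 99.7 = 0.5 and sd sqrt(0.15**2 + 0.20**2) = 0.25.
gap = NormalDist(mu=0.5, sigma=0.25)

p_no_fit = gap.cdf(0.0)   # Pr(X - Y < 0)
print(round(p_no_fit, 5))  # 0.02275
```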
Since W = U + V has moment generating function M_U(s)M_V(s), X − Y has moment generating function
e^((μX − μY)s + ½(σX² + σY²)s²),
and by comparing this expression with the general formula above for the m.g.f. of N(μ, σ²) we see that W ~ N(μX − μY, σX² + σY²).
(2) In problems like this the concept of 'fitting' can be misleading. What is intended by the problem (which is typical of questions of this nature) is simply whether the plunger can be inserted inside the cylinder, with no implication as to whether the fit is tight or loose. (However, it is not at all difficult to adapt the problem to cope with an extra requirement that the internal diameter of the cylinder should not be more than, say, 1 mm greater than the external diameter of the plunger. The proportion of plungers that fit is then just Pr(0 ≤ X − Y ≤ 1), which is easily found to be Φ(2) − Φ(−2) = 0.954.)
(3) It is worth, perhaps, emphasising that the numerical part of the problem requires the use of the result of the theoretical part.

2B.6 Bird wingspans
The wingspans of the females of a certain species of bird of prey form a normal distribution with mean 168.75 cm and standard deviation 6.5 cm. The wingspans of the males of the species are normally distributed with mean 162.5 cm and standard deviation 6 cm. What is the probability that, if a male and female are taken at random, the male has a larger wingspan than the female?

2B.6 Continuous Distributions 77

Solution
If the wingspan of a female is denoted by X and that of a male by Y, then X − Y has a normal distribution with mean 168.75 − 162.50 = 6.25 and variance (6.5)² + 6² = 78.25, and hence standard deviation 8.846. We now need to find Pr(X − Y < 0), the shaded area shown in Figure 2.3. The corresponding value in the standardised normal distribution is
(0 − 6.25)/8.846 = −0.707.
We require the cumulative probability corresponding to this standardised normal variate; from tables of the normal distribution we obtain
1 − Φ(0.707) = 1 − 0.760 = 0.240.

[Figure 2.3: Normal distribution with mean 6.25, with the area below 0 shaded.]
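The wingspan probability can be checked in the same way as the plunger problem; a sketch in Python:

```python
from statistics import NormalDist

# X: female wingspan, Y: male wingspan (cm); X - Y is normal.
diff = NormalDist(mu=168.75 - 162.5, sigma=(6.5**2 + 6**2) ** 0.5)

p_male_larger = diff.cdf(0.0)   # Pr(X - Y < 0), i.e. male larger than female
print(round(p_male_larger, 3))  # about 0.240
```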
The only point to note is that the problem needs to be tackled by considering the distribution of the difference X . Write down functions of the X s which have the distributions below.5. . although in fact it is based on a genuine examination question. l ) is has the x: distribution. (c) Y 3 = i=l x: random variables has the tdistribution on 50 degrees of freedom. . (A more detailed discussion of this topic can be found in Problem 2B. Xz.7 Note This is a standard problem on the application of the normal distribution. the ratio of two independent x2 random variables. in any case.50 Solution (a) Y 1 = X I + X2 + x3 + X4 . say) of degrees of freedom has the tdistribution with Y degrees of freedom. xf.2/48 (d) i=51 98 ‘4 = 2Xiz/50 i1 50 has the Fdistribution with 48 and 50 degrees of freedom. Notes (1) This problem might be.) 2B.Y between the two random variables.78 Probability Distributions 2B. we recall that differences between independent normal random variables are themselves normally distributed. By definition. serve to illustrate the basic results concerned with sampling distributions related to the normal. These . its variance is the sum of the variances (since the X s are independent) and the sum of normal random variables is normal. justifying the results you use.4).x2. l ) random variable to the square root of an independent x2 random variable divided by its number ( u . t and F . 2 X. has the Fdistribution with those numbers of degrees of freedom.N(O. .Xlm are all independent and all have the normal distribution with mean 0 and standard deviation 1. . By definition. It does. the square of N ( 0 .are crucial to a full understanding of the basis of many common procedures in statistical inference. and the sum of n independent By definition. the ratio of an N ( 0 . each divided by the corresponding number of degrees of freedom. thought unlikely to appear in a public examination. 10 (b) Y2 = EX: has the . (a) N ( O ? 4) (b) x:0 (c) (d) 150 F48. 
.=I xfOdistribution.7 Distributions related to the normal Random variables XI. The mean of Y 1 is the sum of the means. and that the variance of the difference between two independent random variables is the sum of their variances.
each divided by the appropriate number of degrees of freedom. a result we shall make use of in a regression problem.v./5l and s2 = x(xi . random variable is Fl. the variance is increased by a factor of the square of the multiplier and the distribution remains N( k.)2/48 2 (Xi . also has the i =1 & distribution.2B. Y .u2). we could use 100 E ( X i . This is a s e a l case of the result that. (This is .) A n alternative for part (c) also comes from consideration of normal samples.F)* . i i=l where 2 variables. . a2u2).2. X51 and x b is the mean of the remaining 49 random 2B. 1 51 An alternative to Y4 comes from an Ftest for comparing two sample variances. The machine will stop if either belt fails. . where i=l 11 x 11 = x X i / l l . and the failures of the belts are independent. we see that Y ] is distributed as F with 1 and 50 degrees of freedom. . for example. Y. is the mean of X I . A t a trivial level. = E(X. But there are. for n = 51 we will have 50 degrees of freedom. renumbering the X s is obviously acceptable. of different lengths. if X UX + 6 N( a p + 6 . .. so set x 51 1 = cU.x ) 2 / 5 0 . For part (a). with mean times Q and 2a. x:   For part (b). o a . then normal.x .8 Failures of belt drives (a) A n exponential distribution has probability density function Show that the mean of the distribution is a and the variance is a2. (b) A machine contains two belt drives. Solution (a) The mean is given by r c a E(X) = s x f (x)& a = JxLez’adx. = ? .8 Continuous Distributions 79 For example. squaring Y3 in part (c) gives a random variable whose numerator X& has the distribution. several other correct solutions. a natural function to consider because of its link with the sample variance. Show that the chance that the machine is still operating after a time a from the start is ep3R. Thus. Generalising. = 2 1 has the required distribution. The most natural t statistic is where x is a sample mean and s the corresponding sample standard deviation. 
These have times to failure which are exponentially distributed. (2) The solution to this problem is. we see that the square of a t . of course. The numerator and denominator would be of the general form of Y.X )2/50 y. not unique. Problem 5A. as well. Noting the definition in part (d). and we note some of these below. since the mean remains zero.
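The constructions in Problem 2B.7 can be checked empirically by simulation; a minimal sketch in Python, comparing only the sample means of the χ²₁₀ and t₅₀ constructions against their theoretical values (10 and 0), not their full distributions:

```python
import math
import random

random.seed(3)

def chi2(n):
    """Sum of n squared independent N(0, 1) draws: one chi-squared(n) value."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(n))

N = 20_000
mean_chi2_10 = sum(chi2(10) for _ in range(N)) / N
mean_t_50 = sum(random.gauss(0, 1) / math.sqrt(chi2(50) / 50) for _ in range(N)) / N

# chi-squared(10) has mean 10; t(50) has mean 0
print(round(mean_chi2_10, 2), round(mean_t_50, 2))
```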
Integrating by parts,
E(X) = [−x e^(−x/a)]_0^∞ + ∫_0^∞ e^(−x/a) dx = [−a e^(−x/a)]_0^∞ = a,
as required. To obtain the variance we first find
E(X²) = ∫_0^∞ x² f(x) dx = ∫_0^∞ x² (1/a)e^(−x/a) dx.
Integrating by parts gives
E(X²) = 2∫_0^∞ x e^(−x/a) dx = 2a ∫_0^∞ x (1/a)e^(−x/a) dx = 2a E(X) = 2a².
We now obtain
Var(X) = E(X²) − {E(X)}² = 2a² − a² = a².
(b) We require the probability that both belts have failure times greater than a, which is, by independence, the product
Pr(first belt has failure time > a) × Pr(second belt has failure time > a).
For the first belt,
Pr(failure time > a) = ∫_a^∞ (1/a)e^(−x/a) dx = e^(−1),
and, for the second belt, the corresponding probability is e^(−1/2). Hence the probability that the machine continues to operate after time a is e^(−1) × e^(−1/2) = e^(−3/2), as required.

80 Probability Distributions

Notes
(1) Although the general expressions for E(X) and E(X²) involve integrals evaluated between limits −∞ and ∞, the range of values for which the probability density function is nonzero is more restrictive than this for many distributions. In this example f(x) = 0 for x < 0; hence the limits for the integrals are 0 and ∞.
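The result of part (b) can be checked by simulation; a sketch in Python (the mean a is set to 1 for convenience):

```python
import math
import random

random.seed(1)
a = 1.0          # mean life of the first belt; the second belt has mean 2a
n = 200_000

# The machine survives time a only if both belts survive; expovariate takes a rate.
survived = sum(
    1 for _ in range(n)
    if random.expovariate(1 / a) > a and random.expovariate(1 / (2 * a)) > a
)

print(round(survived / n, 3), round(math.exp(-1.5), 3))  # estimate vs exact, both near 0.223
```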
* (2) The moment generating function, described in Note 1 to Problem 2B.5, can be found very easily for the exponential distribution. By definition,
M_X(s) = E(e^(sX)) = ∫_0^∞ e^(sx) f(x) dx = 1/(1 − as), for s < a^(−1).
As could be expected from its name, the moment generating function can be used to obtain the moments of probability distributions. Specifically, the mean is given by E(X) = M′_X(0), and the second moment E(X²) is given by M″_X(0). For the exponential distribution we thus obtain E(X) = a and E(X²) = 2a², a slightly shorter method than the direct one given in the solution.

2B.9 Guaranteed life of a machine
The lifetime in hours of a certain component of a machine has the continuous probability density function
f(x) = (1/1000)e^(−x/1000), x > 0.
The machine contains five similar components, acting independently, the lifetime of each having the above distribution. If a component does fail then the replacement used is of particularly high quality and will certainly last for the 1000 hours. The makers are considering offering a guarantee that not more than two of the original components will have to be replaced during the first 1000 hours of use. Find the probability that such a guarantee would be violated.

2B.9 Continuous Distributions 81

Solution
For each component, the probability that the component wears out within the first 1000 hours of use is given by
Pr(lifetime < 1000 hours) = ∫_0^1000 (1/1000)e^(−x/1000) dx = 1 − e^(−1) = 0.6321.
Since there are 5 components, assuming that the components wear out independently, the number X which do wear out is binomially distributed with index (i.e. n) 5 and parameter (i.e. p) 0.6321. We require the probability that X exceeds 2, i.e.
Pr(X = 3) + Pr(X = 4) + Pr(X = 5) = (5×4)/(1×2) (0.6321)³(0.3679)² + 5(0.6321)⁴(0.3679) + (0.6321)⁵
= 0.3418 + 0.2937 + 0.1009 = 0.736.

82 Probability Distributions

Notes
(1) The first part of the solution is a direct application of the exponential distribution, which is commonly used as a waiting time distribution, or a time-to-failure distribution.
(2) The second part of the solution makes use of the binomial distribution. As in several other problems discussed in this chapter, calculations involving another distribution have to be used first to obtain the value of the binomial parameter p.
(3) Despite the mean lifetime being 1000 hours, the probability that more than two components fail during this period is quite high, at 0.736. It would plainly be very silly of the makers to offer the proposed guarantee! The reason for the probability being as high as this is simply that the exponential distribution is very skew; although the mean lifetime is 1000 hours, the probability that an individual lifetime is less than this is as high as 0.6321. Had the distribution been symmetrical, the probability that an individual lifetime would be less than 1000 hours would have been ½, and the distribution of X would thus have been B(5, ½). It is easy to show that for this distribution Pr(X > 2) = ½, distinctly less than 0.74, although still too high a value to contemplate offering a guarantee.
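The binomial calculation above can be reproduced exactly; a sketch in Python:

```python
from math import comb, exp

p = 1 - exp(-1)   # chance an original component wears out within 1000 hours
q = 1 - p

# Pr(X > 2) for X ~ B(5, p): the probability the proposed guarantee is violated
p_violate = sum(comb(5, k) * p**k * q**(5 - k) for k in range(3, 6))
print(round(p_violate, 3))  # 0.736
```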
* (4) Although in the statement of the problem it was made clear (twice!) that failures of replacement components could be ignored, in practice this is an important consideration. It can be shown that, for any single original component (and any replacements needed for it), the number of failures in 1000 hours has the Poisson distribution with mean 1; since there are five components, acting independently, the total number of failures Y will have the Poisson distribution with mean 5, so that
Pr(Y > 2) = 1 − e^(−5) − 5e^(−5) − (5²/2!)e^(−5) = 0.875.
The theory underlying the analysis is that of the Poisson process, described in Note 1 to Problem 2A.7. In such a process, 'occurrences' (here failures) take place randomly in time, and the total number of these occurrences in a fixed time has a Poisson distribution (and the interval from some starting-point to an occurrence has an exponential distribution, as given above), although to solve the problem one does not have to know this.

2B.10 Watch the birdie
A nature reserve has one bird of an uncommon species which frequents the large expanse of reed beds there. It is found that if a birdwatcher arrives at the path adjoining the reed beds, the probability that he or she is still waiting there to see the bird a time y later, where y is measured in minutes, is
(1/3)e^(−y/2) + (2/3)e^(−y/4).
Find the probability that a birdwatcher spends between 2 and 4 minutes waiting to see the bird, and the mean and variance of the time that he or she waits there.

Solution
Let Y denote the time (in minutes) that the birdwatcher waits on the path. Then we require Pr(2 ≤ Y ≤ 4), and are given that, for y ≥ 0,
Pr(Y > y) = (1/3)e^(−y/2) + (2/3)e^(−y/4).
Hence
Pr(2 ≤ Y ≤ 4) = Pr(Y > 2) − Pr(Y > 4) = {(1/3)e^(−1) + (2/3)e^(−1/2)} − {(1/3)e^(−2) + (2/3)e^(−1)} = 0.2366.
If the distribution function of Y is F_Y(y), then F_Y(y) = 1 − Pr(Y > y), and so the density function f_Y(y) is given by
f_Y(y) = (1/6)(e^(−y/2) + e^(−y/4)), y ≥ 0.
To find the mean and variance of Y we use the direct approach based on its probability density function. The mean waiting time is given by
E(Y) = (1/6)∫_0^∞ y(e^(−y/2) + e^(−y/4)) dy,
and integrating by parts gives E(Y) = (1/6)(4 + 16) = 10/3. To obtain Var(Y) we first obtain E(Y²), given by
E(Y²) = (1/6)∫_0^∞ y²(e^(−y/2) + e^(−y/4)) dy.
We integrate by parts twice, and obtain E(Y²) = (1/6)(16 + 128) = 24. The variance of Y is thus given by
Var(Y) = E(Y²) − {E(Y)}² = 24 − 100/9 = 116/9.

2B.10 Continuous Distributions 83

Notes
(1) This problem is a straightforward application of the exponential distribution — commonly used as a waiting time distribution, and one often finds questions of this kind set in such a context — once it is realised that the expression given in the statement of the problem is not the probability density function. The wording does in fact make this clear, but it is our experience that many students try to answer the problem thinking that what is given is the probability density function.
(2) The problem concerns a mixture of exponential distributions, of the kind seen in Problems 2B.8 and 2B.9: one with mean 2 and one with mean 4 minutes, in proportions 1/3 and 2/3 respectively. Such a mixture of distributions has been found very useful in analysing periods of employment, for example, within the civil service.
* (3) Sometimes the density function of a random variable can be expressed as a mixture of density functions corresponding to distributions of known mean and variance. In this event there is an alternative and simpler method of finding the mean and variance, based on the component distributions. We have here
f_Y(y) = (1/3)(½e^(−y/2)) + (2/3)(¼e^(−y/4)), y ≥ 0,  (*)
and can thus write f_Y(y) as a mixture of exponential density functions. We now make use of the results given in Problem 2B.8 that, for a random variable X with density function a^(−1)e^(−x/a), E(X) = a and E(X²) = 2a². Employing this result here, and substituting a = 2 and a = 4 in turn, we obtain
E(Y) = (1/3)(2) + (2/3)(4) = 10/3, as before,
and
E(Y²) = (1/3)(2×2²) + (2/3)(2×4²) = 24,
thus giving Var(Y) = E(Y²) − {E(Y)}² = 24 − 100/9 = 116/9, as before.
* (4) If Y is a continuous random variable, taking values on the range (0, ∞) only, and with cumulative distribution function F_Y(y), then integration by parts readily reveals that
E(Y) = ∫_0^∞ (1 − F_Y(y)) dy.
This result allows one to obtain E(Y) directly from the cumulative distribution function, without needing to construct the density function f_Y(y). Employing this result here, we find
E(Y) = ∫_0^∞ Pr(Y > y) dy = (1/3)(2) + (2/3)(4) = 10/3, as before.
* (5) It is important to distinguish between mixtures of density functions and mixtures of random variables. Let X1 and X2 be independent random variables with density functions
f₁(x) = ½e^(−x/2) and f₂(x) = ¼e^(−x/4), for x ≥ 0,
and f₁(x) = f₂(x) = 0 for x < 0. Then Y, as above, has a density function which is a mixture of the densities of X1 and X2, in proportions 1/3 and 2/3 respectively, as in equation (*). But if we define a random variable W = (1/3)X1 + (2/3)X2, the density of W is not given by equation (*).
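The birdwatcher calculations can be reproduced numerically from the mixture representation; a sketch in Python:

```python
import math

def surv(y):
    # Pr(Y > y) as given in the problem: a 1/3 : 2/3 mixture of exponentials
    return (1 / 3) * math.exp(-y / 2) + (2 / 3) * math.exp(-y / 4)

p_2_to_4 = surv(2) - surv(4)
mean = (1 / 3) * 2 + (2 / 3) * 4             # mixture of component means 2 and 4
second_moment = (1 / 3) * 8 + (2 / 3) * 32   # E(X^2) = 2a^2 for each component
var = second_moment - mean**2

print(round(p_2_to_4, 4), round(mean, 4), round(var, 4))  # 0.2366 3.3333 12.8889
```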
g.d.5 for some details of their definition and use).d.11 Continuous Distributions 85 The distribution of W is most easily found by using moment generating functions (m. Find the cumulative distribution function F r b ) of Y. Find the mean and variance of Z1 and Z2. Show that the expected value of Y is In.3 s ) 3  = 1x3 12 2 s + +".f. o s y Sf. Hence.g. We have.f. The m.$ s ) ( f . for suitable values of s . 1 Breaking a rod at random B1 (a) A rod of length 21 is broken into two parts at a point whose position is random. Let X be the length of the o smailor part.fs take a simple form. But X1 and X2 are exponentially distributed random variables. for s C $.g. and find Var(Y). and find the expectation of X . Write down the probability density function (p. 11 2'4 (f . .fs. and their m.fs of : exponential distributions (seeNote 2 to Problem 2B.. Similarly we obtain and so. otherwise. (b) Two rods.) of X .g. or otherwise. (c) For the situation described in part (b). are independently broken in the manner described above. let Z1 be the sum of the lengths of the two smallest parts and let 22 be the sum of the lengths of the smallest and largest of the four parts. show that Y has p.' 18 8 ' 3  using partial fractions.2B.8) and we thus obtain 2 . each of length 21. Let Y be the length of the shortest of the four parts thus obtained. for example. given by fro?) = (y'' 2. in the sen= that the point is equally likely t be anywhere on the rod. M w ( s ) of W is defined as +XI + +az M w ( s ) = E(e") = E(e 1 = E(e3 )E(e : ' x z ). see Note 1 to Problem 2B. But this is the form of a mixture (with weights f and ) of m.f.
Solution
(a) The breakpoint is uniformly distributed along the rod, so that the length X of the smaller part is itself uniformly distributed on the interval [0, l]. Its probability density function is thus
f_X(x) = 1/l, 0 ≤ x ≤ l, and 0 otherwise.
The expectation of X can be written down directly as l/2, because of the symmetry of the p.d.f.; alternatively, it can be derived from
E(X) = ∫_0^l x f_X(x) dx = l/2.
(b) If X1 and X2 are the lengths of the shorter parts of the two rods, then Y = min(X1, X2). We require F_Y(y), the cumulative distribution function of Y, given by
F_Y(y) = Pr(Y ≤ y) = 1 − Pr(Y > y),
and note that the event (Y > y) will occur if and only if X1 > y and X2 > y. So, since the two rods are broken independently,
F_Y(y) = 1 − Pr(X1 > y ∩ X2 > y) = 1 − Pr(X1 > y)Pr(X2 > y).
Now, for i = 1, 2,
Pr(Xi > y) = ∫_y^l f_X(x) dx = (l − y)/l, 0 ≤ y ≤ l,
and therefore
F_Y(y) = 0 for y < 0; 1 − (1 − y/l)² for 0 ≤ y ≤ l; 1 for y > l.
Differentiating F_Y(y) with respect to y to obtain the p.d.f. f_Y(y) gives 2(l − y)/l², 0 ≤ y ≤ l, as required. The expected value of Y is given by
E(Y) = ∫_0^l y f_Y(y) dy = ∫_0^l 2y(l − y)/l² dy,
i.e.
E(Y) = (2/l²)[ly²/2 − y³/3]_0^l = l − 2l/3 = l/3.

2B.11 Continuous Distributions 87

Now Var(Y) = E(Y²) − {E(Y)}², and since
E(Y²) = ∫_0^l 2y²(l − y)/l² dy = l²/6,
we obtain Var(Y) = l²/6 − l²/9 = l²/18.
(c) If X1 and X2 are as defined in part (b), then Z1 = X1 + X2, so
E(Z1) = E(X1) + E(X2) = l/2 + l/2 = l,
since X1 and X2 have the same distribution as the random variable X defined in part (a). The rods are broken independently, so that
Var(Z1) = Var(X1) + Var(X2) = 2Var(X).
But
E(X²) = ∫_0^l x²/l dx = l²/3,
and so Var(X) = E(X²) − {E(X)}² = l²/3 − l²/4 = l²/12. Hence Var(Z1) = l²/6.
To obtain the distribution of Z2 we note that the shortest and longest parts must come from the same rod, so that necessarily Z2 = 2l, a constant, since each rod was of length 2l. Hence E(Z2) = 2l and Var(Z2) = 0.

Notes
(1) The terms expected value, expectation and mean are all used in the statement of the problem and in the solution. They all, of course, have the same meaning.
(2) The results found in the solution to this problem can be used to deduce the mean lengths of the four parts. Let X(i) denote the length of the ith smallest part, for i = 1, 2, 3 and 4. Then from part (b) of the solution we find that E(X(1)) = E(Y) = l/3, while from part (c) we obtain
E(X(1) + X(2)) = E(Z1) = l,
so that E(X(2)) = l − l/3 = 2l/3. Further, since each rod was of length 2l, we have that X(1) + X(4) = X(2) + X(3) = 2l, so that E(X(3)) = 2l − 2l/3 = 4l/3 and E(X(4)) = 2l − l/3 = 5l/3. The mean values of the lengths of the four parts are thus in the ratio 1 : 2 : 4 : 5.
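The value E(Y) = l/3 can be checked by simulation; a sketch in Python (taking l = 1, so that each rod has length 2):

```python
import random

random.seed(7)
L = 1.0        # half-length: each rod has length 2L; the short piece is uniform on [0, L]
n = 100_000

total_y = 0.0
for _ in range(n):
    x1 = min(u := random.uniform(0, 2 * L), 2 * L - u)  # shorter part of rod 1
    x2 = min(v := random.uniform(0, 2 * L), 2 * L - v)  # shorter part of rod 2
    total_y += min(x1, x2)                              # shortest of the four parts

print(round(total_y / n, 3))   # close to L/3 = 0.333
```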
2C Simulating Random Variables
The understanding of random variables and probability distributions is often enhanced by simulation, a valuable technique whose power is often underestimated. It has become very widely used since the advent of computers. A central requirement for serious work is that one must have a reliable source of random digits. Almost all computers have a facility for producing random numbers, through the RND function in BASIC and equivalent functions in other programming languages. (Note that some versions of BASIC for microcomputers use slightly different forms, for example RND(1).) The problems that follow consider simple ways of generating uniform random numbers, go on to investigate how these may be used to simulate random variables with distributions such as the Poisson and normal, and consider efficient ways of simulating random variables from a variety of probability distributions.

2C.1 Simulating random variables
Suppose a continuous random variable with a uniform distribution over the range (0, 1) is observed to take the value 0.8438. Use this value to generate an observation from each of the following distributions:
(i) the uniform distribution with range (1, 3);
(ii) the Poisson distribution with mean 2;
(iii) the normal distribution with mean 10 and variance 25.

Solution
(i) If U has a uniform distribution on the range (0, 1), then X = 1 + 2U has a uniform distribution on the range (1, 3). We are told that the uniform random variable takes the value 0.8438; therefore we set X = 1 + (2 × 0.8438) = 2.6876.
(ii) We can construct the cumulative distribution function of a random variable X having a Poisson distribution with mean 2 as follows:
Pr(X = 0) = e^(−2) = 0.1353, so Pr(X ≤ 0) = 0.1353;
Pr(X = 1) = 2e^(−2) = 0.2707, so Pr(X ≤ 1) = 0.4060;
Pr(X = 2) = 4e^(−2)/2! = 0.2707, so Pr(X ≤ 2) = 0.6767;
Pr(X = 3) = 8e^(−2)/3! = 0.1804, so Pr(X ≤ 3) = 0.8571;
Pr(X = 4) = 16e^(−2)/4! = 0.0902, so Pr(X ≤ 4) = 0.9473;
and so on. Since 0.6767 < 0.8438 ≤ 0.8571, we set X = 3. (See Note 1 for a full justification of this procedure.)
(iii) As in part (ii), we again use the cumulative distribution function, but now of a continuous random variable Z with the standardised normal distribution. From tables of the standardised normal distribution function, the value z for which Pr(Z ≤ z) = 0.8438 is z = 1.010, which we may use as a value taken by Z. Now if Z ~ N(0, 1), then X = 10 + 5Z has the N(10, 25) distribution; hence in order to obtain a value for the random variable X we simply set X = 10 + (5 × 1.010) = 15.05.

Notes
(1) Part (ii) provides a particular example of a general method for simulating discrete random variables. (A further example is given in Problem 2C.3.)
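All three parts of the solution can be reproduced mechanically from the single uniform value; a sketch in Python:

```python
import math
from statistics import NormalDist

u = 0.8438  # the given uniform (0, 1) observation

# (i) uniform on (1, 3)
x_unif = 1 + 2 * u

# (ii) Poisson with mean 2: smallest x whose cumulative probability reaches u
x_pois, cdf = 0, math.exp(-2)
while cdf < u:
    x_pois += 1
    cdf += math.exp(-2) * 2**x_pois / math.factorial(x_pois)

# (iii) N(10, 25) via the standard normal quantile
x_norm = 10 + 5 * NormalDist().inv_cdf(u)

print(round(x_unif, 4), x_pois, round(x_norm, 2))  # 2.6876 3 15.05
```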
If a random variable X has probability function Pr(X = i) = p_i, i = 0, 1, 2, ..., then we can simulate values of X by first simulating a value u for a random variable U uniformly distributed on the range (0, 1), and then following the rules below:

if u ≤ p_0, set X = 0;  (1)

otherwise, set X equal to the smallest i for which u ≤ p_0 + p_1 + ... + p_i.  (2)

We operate these rules by first testing result (1). Then, if necessary, we apply the tests in (2) in sequence, first with i = 1, then i = 2, and so on, until we find a value of i for which (2) is satisfied. It is easy to see why the rules enable us to simulate X correctly: by implementing the rules, we would set X = 3 if and only if

p_0 + p_1 + p_2 < u ≤ p_0 + p_1 + p_2 + p_3,

and since U is uniformly distributed over the range (0, 1) the probability of this event is simply the length of this interval, i.e. Pr(X = 3) = p_3; in general Pr(X = i) = p_i, as required. In many cases this procedure will not be very efficient. (For example, when p_0 is very small, the test u < p_0 will usually fail, and so on.) In such cases it is not difficult to devise ways of improving efficiency, and the reader might like to consider how this might be done.

Once again, we have used a particular example of a general rule, this time for simulating a continuous random variable: the rule used in the solution, in part (iii). To simulate a continuous random variable X with cumulative distribution function F_X(x), we obtain a simulated value u for a random variable U uniformly distributed on the range (0, 1), and simulate X by the value x found by solving the equation F_X(x) = u for x.

We now explain why this rule works. Since F_X(x) is the cumulative distribution function of a continuous random variable, it is a monotonic increasing function of x, and so the two events {X ≤ x} and {F_X(X) ≤ F_X(x)} are equivalent and have the same probability. Thus

Pr(X ≤ x) = Pr{F_X(X) ≤ F_X(x)}.

As U is uniformly distributed over (0, 1), by definition Pr{U ≤ F_X(x)} = F_X(x) = Pr(X ≤ x), and from the previous displayed equation we see that

Pr{U ≤ F_X(x)} = Pr{F_X(X) ≤ F_X(x)}.

This demonstrates that U and F_X(X) have the same distribution, as required. So we see that, to simulate a value for X, we simply obtain a value for U and then solve the equation F_X(X) = U for X.
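The two general rules just described can be sketched in code. The function names below are our own; the discrete rule performs the sequential tests (1) and (2), and the continuous rule solves F_X(x) = u, illustrated for the exponential distribution, where the equation can be solved explicitly.

```python
import math
import random

def simulate_discrete(probs, u=None):
    """Sequential tests (1) and (2): return the smallest i with u <= p0 + ... + pi."""
    if u is None:
        u = random.random()  # a value of U, uniform on (0, 1)
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if u <= cumulative:
            return i
    return len(probs) - 1  # guards against rounding error in the final test

def simulate_exponential(lam, u=None):
    """Continuous rule: solve F(x) = 1 - exp(-lam*x) = u for x."""
    if u is None:
        u = random.random()
    return -math.log(1.0 - u) / lam
```

With the Poisson(2) probabilities of the previous problem and u = 0.7, the discrete rule returns 3, in agreement with the worked solution.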
2C.2 Using dice to obtain random numbers

In a series of ten tosses of two distinguishable dice A and B, the following faces are uppermost (the face for A appearing first in each bracket):

(2,2) (1,3) (2,6) (1,6) (4,2) (6,4) (4,5) (5,1) (3,4) (6,3)

Explain how you would use the dice to generate uniformly distributed random numbers in the range 0000 to 9999, and illustrate your explanation by obtaining a four-digit random number from the data. Use this random number to obtain a random observation from the binomial distribution with index 3 and parameter 0.3, and a random observation from the exponential distribution with parameter 1.

Solution

There are six possible outcomes for each of A and B, and hence 36 (equally likely) outcomes in all. If we reject outcomes in which A and B show the same face, i.e. outcomes of the form (i,i), then we are left with 30 outcomes, which can be used to generate decimal digits as follows.

Outcomes             Digit
(1,2) (1,3) (1,4)      0
(1,5) (1,6) (2,1)      1
(2,3) (2,4) (2,5)      2
(2,6) (3,1) (3,2)      3
(3,4) (3,5) (3,6)      4
(4,1) (4,2) (4,3)      5
(4,5) (4,6) (5,1)      6
(5,2) (5,3) (5,4)      7
(5,6) (6,1) (6,2)      8
(6,3) (6,4) (6,5)      9

Thus, for example, we would choose the digit 5 if we obtain from the dice one of the three outcomes (4,1), (4,2), (4,3). Given that pairs of the form (i,i) are rejected, the probability of each retained outcome is (1/36)/(30/36) = 1/30, so that each digit has probability 3 × 1/30 = 1/10, as required.

The pairs given thus result in the sequence of digits 0, 3, 1, 5, 9, 6, 6, 4, 9, the pair (2,2) being rejected. If we take these digits in fours, we obtain the required numbers, viz. 0315 and 9664 (the final digit, 9, is not needed).

Using the first of the four-digit numbers, and treating it as a uniformly distributed random number on the range (0, 1) by preceding it by a decimal point, we obtain u = 0.0315. To simulate an observation on a random variable X with the B(3, 0.3) distribution, we note that

Pr(X = i) = (3Ci)(0.3)^i (0.7)^(3-i),  i = 0, 1, 2, 3,

so that, for example, Pr(X = 0) = (0.7)^3 = 0.343. We see that 0 < 0.0315 ≤ 0.343, so we take X = 0 as the simulated value of X.

To simulate an exponentially distributed random variable Y with parameter λ, given the value u of a uniformly distributed random variable on the range (0, 1), we set

Y = -(1/λ) log_e u.

Taking 9664 as the second four-digit number, again preceded by a decimal point, gives u = 0.9664, and with λ = 1 we obtain Y = -log_e(0.9664) = 0.0342.
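The dice scheme can be sketched in a few lines. This is our illustration, assuming the three-pairs-per-digit allocation is the lexicographic one tabulated in the solution; as noted there, any allocation with exactly three pairs per digit would serve equally well.

```python
import random

# Lexicographic listing of the 30 ordered non-double pairs; three consecutive
# pairs are allocated to each decimal digit, matching the table in the solution.
PAIRS = [(a, b) for a in range(1, 7) for b in range(1, 7) if a != b]
DIGIT = {pair: i // 3 for i, pair in enumerate(PAIRS)}

def toss_digit():
    """Toss the two dice, rejecting doubles, and return the resulting digit 0-9."""
    while True:
        a, b = random.randint(1, 6), random.randint(1, 6)
        if a != b:
            return DIGIT[(a, b)]

def four_digit_uniform():
    """Combine four dice digits into a number in [0.0000, 0.9999]."""
    return sum(toss_digit() * 10 ** (3 - k) for k in range(4)) / 10000
```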
Notes

(1) An alternative, but more wasteful, way of utilising the dice would be to return the result for a die as 0 or 1 according as the face shown is one of 1, 2, 3 or one of 4, 5, 6. The resulting binary digits could then be taken four at a time and treated as binary numbers, so that, for example, 0110 would be 6, 1000 would be 8, and so on. Rejecting values greater than 9 would then result in a sequence of equiprobable digits in the range 0-9. This approach is wasteful when dice are used, but it could be considered if one were using coins instead, by treating 'heads' as 1 and 'tails' as 0. The fact that we can simulate uniform random numbers in two ways draws attention to the fact that the solution given is not unique; indeed, we could have used any way of allocating the 30 results to the 10 digits, as long as just three were assigned to each.

(2) The methods used to simulate the binomial and exponential random variables are as discussed in the notes to Problem 2C.1. For the exponential distribution, with density function f_Y(y) = λe^{-λy}, we obtain

F_Y(y) = ∫ from 0 to y of f_Y(x) dx = 1 - e^{-λy},

and so we could set U = 1 - e^{-λY} to give

Y = -(1/λ) log_e (1 - U),

where U is, as usual, uniformly distributed on the range (0, 1). But we now note that if U is uniformly distributed on (0, 1), then so is (1 - U), and hence we can replace (1 - U) by U in the expression above without affecting the distribution of Y. We can thus use

Y = -(1/λ) log_e U,

as in the solution; since it involves slightly less arithmetic, this is to be preferred.

(3) A random variable with the binomial B(3, 0.3) distribution can be simulated directly, using the basic definition of such a random variable as the number of successes in 3 independent trials, in each of which the probability of success is 0.3. With the dice, we could regard the occurrence of any one of nine designated outcomes (out of the 30 retained outcomes) as a success, and the occurrence of any other pattern as a failure; with this assignment, the probability of a success is 9/30 = 0.3. With a suitable choice of the nine outcomes, the first three retained tosses shown in the statement of the problem give three failures, so the value 0 is recorded, while the next three (excluding the rejected result (2,2)) give two successes, so the value 2 is recorded as the next simulated value of the binomial random variable. This method of simulating Bernoulli trials (whether using dice or computer-generated random numbers) is always available for simulating binomial random variables. It is quite efficient for small values of n, but since the arithmetic (or computer time) required is proportional to n it is less attractive for large values of n; and, as we have noted, this is commonly the case in simulation.
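Note (3)'s method of building a binomial variate from Bernoulli trials can be sketched as follows. The helper name is ours; as remarked above, the cost is proportional to n.

```python
import random

def binomial_via_bernoulli(n, p, rng=random.random):
    """Number of successes in n independent Bernoulli(p) trials.
    Each trial costs one uniform variate, so work is proportional to n."""
    return sum(1 for _ in range(n) if rng() < p)
```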
2C.3 Ants on flower heads

The number of ants on a flower head is a random variable having the following distribution.

Number of ants   0        1        2        3        4        5 or more
Probability      0.2050   0.3162   0.2804   0.1562   0.0422   0

(a) Use the following observations on a random variable with a uniform distribution on (0, 1) to draw a random sample of size four from the distribution of ants:

0.49368,  0.60282,  0.45581,  0.15774.

(b) Explain how you would use simulation to estimate the probability that a plant will have 3 or more ants, if the number of flower heads on a plant is a random variable having the Poisson distribution with mean 5.

Solution

(a) If X is a random variable denoting the number of ants on a flower head, then the cumulative distribution function of X is given by the following table.

i            0        1        2        3        4
Pr(X ≤ i)    0.2050   0.5212   0.8016   0.9578   1.0000

If a random variable U is uniformly distributed on the range (0, 1), then Pr(0.0000 < U ≤ 0.2050) = 0.2050 = Pr(X = 0), Pr(0.2050 < U ≤ 0.5212) = 0.3162 = Pr(X = 1), and so on. We can therefore use the general algorithm for simulating discrete random variables (see Note 1 to Problem 2C.1). We see that 0.2050 < 0.49368 ≤ 0.5212, resulting in X = 1; 0.5212 < 0.60282 ≤ 0.8016, resulting in X = 2; 0.2050 < 0.45581 ≤ 0.5212, resulting in X = 1; and 0.0000 < 0.15774 ≤ 0.2050, resulting in X = 0. The required sample thus consists of the four values 1, 2, 1, 0.

(b) Let Y denote the total number of ants on a randomly selected plant, and let Z denote the number of flower heads on that plant. We are given that

Pr(Z = i) = e^-5 5^i / i!,  i = 0, 1, 2, ....

If we write p = Pr(Y ≥ 3), then we need to estimate p using simulation. We can do this by simulating values of Y for n plants (where n is chosen beforehand), and counting the number of times, r say, for which the simulated value of Y is 3 or more. The estimate of p will then be r/n.
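Part (a) can be checked mechanically. The sketch below is ours: it inverts the tabulated c.d.f. at each given value of u.

```python
# The c.d.f. table for the number of ants on a flower head, as in part (a).
CDF = [(0, 0.2050), (1, 0.5212), (2, 0.8016), (3, 0.9578), (4, 1.0000)]

def ants_from_uniform(u):
    """Return the smallest value whose cumulative probability is at least u."""
    for value, cum in CDF:
        if u <= cum:
            return value

sample = [ants_from_uniform(u) for u in (0.49368, 0.60282, 0.45581, 0.15774)]
print(sample)  # [1, 2, 1, 0]
```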
In outline, given a supply of independent random variables uniformly distributed on the range (0, 1), the simulation proceeds as follows.

(i) Set r = 0 and nc = 1.
(ii) Simulate Z, denoting the simulated value by z.
(iii) If z = 0, set y = 0. (If z = 0, then there are no flower heads on the plant, and consequently the plant will not support any ants.)
(iv) If z > 0, simulate z independent values x1, x2, ..., xz from the distribution of X, and form the sum y = Σ xi.
(v) If y ≥ 3, increase r by 1.
(vi) Increase nc by 1.
(vii) If nc ≤ n, return to step (ii). Otherwise stop, and record r/n as the estimate of p.

Notes

(1) Part (a) is a straightforward example of the use of a cumulative distribution function to simulate a discrete random variable.

(2) The distribution of Y, in part (b), is an example of what is termed a compound distribution, since

Pr(Y = i) = Pr(X1 + X2 + ... + XZ = i),

where Z itself is a random variable. The distribution of such compound random variables can be found analytically using the technique of probability generating functions. (See Note 4 to Problem 2A.2 for a definition and discussion of the properties of these functions.) If we define G_X(t) = E(t^X), G_Y(t) = E(t^Y) and G_Z(t) = E(t^Z) as the p.g.f.s of X, Y and Z respectively, then the p.g.f. of Y can be shown to be given in general by the result

G_Y(t) = G_Z{G_X(t)},

so that here

G_Y(t) = exp[5{G_X(t) - 1}],

where by definition

G_X(t) = 0.2050 + 0.3162t + 0.2804t² + 0.1562t³ + 0.0422t⁴.

We can in fact proceed further with this analysis. If we let q_i = Pr(Y ≤ i), and let Q(θ) = Σ θ^i q_i for 0 < θ < 1, then

Q(θ) = Σ_i θ^i Σ_{j ≤ i} Pr(Y = j).

Reversing the order of summation gives

Q(θ) = Σ_j Pr(Y = j) Σ_{i ≥ j} θ^i = Σ_j Pr(Y = j) θ^j/(1 - θ),

since 0 < θ < 1. We thus obtain

Q(θ) = (1 - θ)^-1 G_Y(θ),

so that Pr(Y ≤ 2) is the coefficient of θ² in the power series expansion of (1 - θ)^-1 G_Y(θ). After some algebra, we find in the present case that Pr(Y ≥ 3) = 0.902.
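Steps (i) to (vii) above can be sketched as a short program. This is our illustration, not part of the original solution; the Poisson variate is itself simulated by the inverse-c.d.f. method, which is adequate for a mean as small as 5.

```python
import math
import random

ANT_CDF = [(0, 0.2050), (1, 0.5212), (2, 0.8016), (3, 0.9578), (4, 1.0000)]

def simulate_ants():
    """One value from the ants-per-flower-head distribution, via its c.d.f."""
    u = random.random()
    for value, cum in ANT_CDF:
        if u <= cum:
            return value

def poisson(mean):
    """Inverse-c.d.f. simulation of a Poisson variate (fine for small means)."""
    u = random.random()
    i, p = 0, math.exp(-mean)
    cum = p
    while u > cum:
        i += 1
        p *= mean / i
        cum += p
    return i

def estimate_p(n):
    """Steps (i)-(vii): proportion of n simulated plants carrying 3 or more ants."""
    r = 0
    for _ in range(n):
        z = poisson(5)                               # number of flower heads
        y = sum(simulate_ants() for _ in range(z))   # total ants; 0 if z == 0
        if y >= 3:
            r += 1
    return r / n
```

With a large n the estimate settles near the analytic value 0.902 found in Note (2).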
(3) An analytic solution to a problem, such as that found in Note 2 above, is preferable to one found by simulation. An analytic solution is exact, and since it is in algebraic form it can be adapted easily: for example, if in the present problem the mean number of flower heads per plant changes from 5 to 6, the only change in the analytic solution is to alter 5 to 6 in the expression for Q(θ) above. By contrast, the solution a simulation provides can, of course, only be an approximate one; although, when an analytic solution is difficult to obtain, the latter may well be easier.

If, in the n simulations, a value of Y of 3 or more is obtained R times, then R is a binomially distributed random variable, R ~ B(n, p), so that

Var(R/n) = p(1 - p)/n ≤ 1/(4n).

So, if we require the accuracy of estimation of p from the simulation to be such that, say, the standard error of R/n does not exceed ε, it suffices to choose n such that 1/(4n) ≤ ε², i.e. to take n ≥ 1/(4ε²), which could result in a very large value for n. The simulation could then be very time-consuming.

(4) Some examples of estimating Pr(Y ≥ 3) by simulation, using the method described in the solution, are given in the table below. For each value of n, three simulations were performed.

n        Estimates of Pr(Y ≥ 3) using n simulations
10       1.0      0.9      0.7
100      0.91     0.88     0.94
1000     0.882    0.886    0.908
10000    0.9025   0.9079   0.9033
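The sample-size requirement n ≥ 1/(4ε²) from Note (3) can be wrapped in a one-line helper (the name is ours):

```python
import math

def n_for_standard_error(eps):
    """Smallest n with 1/(4n) <= eps**2, guaranteeing Var(R/n) <= eps**2
    whatever the unknown p, since p(1 - p) <= 1/4."""
    return math.ceil(1.0 / (4.0 * eps * eps))
```

For instance, a standard error of at most 0.01 already requires 2500 simulations, which illustrates why the simulation can be very time-consuming.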
3 Data Summarisation and Goodness-of-Fit

With this chapter we move away from probability and consider part of the process of statistical inference. When one obtains data from a randomly selected sample, the observations are usually of interest not in themselves, but rather in what they can tell us about the population from which they were randomly selected. In many practical problems, a central assumption will be that the data form a random sample from some named distribution; we call the process of matching up distribution to data the 'fitting' of the distribution, and we are naturally concerned to check the quality, or goodness, of the fit.

The precise technique used (a t-test, or a confidence interval for a binomial parameter, for example) will depend, naturally, on circumstances, but there are some general principles applying to most problems. The most basic of these is that, before undertaking the main analysis, one should wherever possible examine the raw data, in order to discover whether any assumptions which are crucial to that analysis seem likely to be satisfied. A rule of thumb stemming from this is that plotting the data is a good way to start any data analysis. Accordingly we discuss some plotting methods in the first of the two sections of this chapter. In the second section, we carry the process a stage further, in order to present some rather more sophisticated ways of checking assumptions.

We have just one cautionary remark. Although it is good practice always to carry out these preliminary analyses, we recognise that it will not always be feasible to do so in the artificial conditions of, say, a three-hour examination. Similarly, for reasons of space we have not usually been able to include plots and goodness-of-fit tests for data analysed in Chapters 4 and 5.

3A Data Summarisation

In this short section we present some problems involving the calculation of summary statistics, for example means and variances. We discuss also the corresponding problems of graphical presentation of data. The data may be given in grouped form, or the individual observations may be available, and we consider ways of dealing with both types.
3A.1 The weights of club members

The members of a sports club, 60 male adults, had their weights recorded. The weights, in pounds, are given in the table below.

171 131 165 153 165 160 165 174 149 155
144 139 153 147 140 132 163 149 154 155
154 149 157 145 158 160 149 169 158 147
160 140 147 160 149 158 149 156 152 169
148 150 149 156 148 160 161 171 138 174
131 136 149 167 150 153 144 154 142 144

Construct a cumulative frequency table for these weights, using classes of width 5 lb, starting at 129.5 lb. Hence draw a cumulative frequency graph, and use this to find the median and semi-interquartile range. Use the grouped frequency table to calculate the mean and standard deviation, and compare them with the values obtained using the original, ungrouped, data.

Solution

The number of values falling into each of the classes is given in the table below, together with the cumulative frequencies.

Weight (lb)      Frequency   Cumulative frequency
129.5-134.5          3            3
134.5-139.5          3            6
139.5-144.5          6           12
144.5-149.5         14           26
149.5-154.5          9           35
154.5-159.5          8           43
159.5-164.5          7           50
164.5-169.5          6           56
169.5-174.5          4           60

To obtain the cumulative frequency graph, each cumulative frequency is plotted against the upper boundary of the corresponding class interval. The points plotted are then connected by straight lines. The cumulative frequency graph is shown in Figure 3.1.

Figure 3.1 Cumulative frequency graph for data on weights

The quartiles, usually denoted by Q1, Q2 and Q3, are the values which divide the total frequency into quarters. The second quartile, Q2, is, of course, the median. To obtain the value of Q1, the point on the vertical axis corresponding to n/4 is first found, where n = 60. The point on the graph corresponding to this value is then projected down onto the horizontal axis; this latter value is Q1. The values for Q2 and Q3 are found similarly, using n/2 and 3n/4 respectively. For the data, using the graph, we find that Q1 ≈ 146 lb, Q2 ≈ 152 lb and Q3 ≈ 161 lb. The semi-interquartile range is (1/2)(Q3 - Q1) = (1/2)(161 - 146) = 7.5 lb.

To calculate the mean and standard deviation we require the class midpoints, i.e. 132, 137, 142, 147, 152, 157, 162, 167, 172. To make the calculations slightly easier we recode these values using x' = (x - 152)/5, where x is the original value and x' the coded value. The coded midpoints are -4, -3, -2, -1, 0, 1, 2, 3, 4. The mean of the coded data is then

{3×(-4) + 3×(-3) + 6×(-2) + 14×(-1) + 9×0 + 8×1 + 7×2 + 6×3 + 4×4}/60 = 9/60 = 0.15.

Converting back to the original scale shows the mean to be 152 + 5×0.15 = 152.75 lb. The variance of the coded data is

s'² = {3×(-4)² + 3×(-3)² + ... + 6×3² + 4×4² - 9²/60}/60 = 4.43,

where the term 9² is the square of the sum of the coded values. The variance in the original scale is then 5² × s'² = 110.69, and the standard deviation is the square root of this value, 10.52 lb.

To find the mean and variance of the 60 original observations it is convenient to subtract 100 temporarily from each value. With this informal coding we find the mean of the raw (i.e. ungrouped) data to be

(71 + 31 + 65 + ... + 42 + 44)/60 = 3175/60 = 52.92,

so that the mean weight is 152.92 lb. Using the same coding, we find the variance of the set of data to be

(71² + 31² + ... + 44² - 3175²/60)/60 = 106.48;

subtracting 100 does not affect the variance, and the standard deviation is the square root of this value, 10.32 lb. To compare the summary statistics obtained from raw and grouped data, we need only note that the agreement is very good.

Notes

(1) Alternative ways of coding the class midpoints are possible, for example x' = (x - 132)/5. Coding is always optional, and arbitrary; it is done for convenience only, and will not affect the result (once this is transformed to the original scale).

(2) Clearly, if the original data values are available, it is always good practice to make use of the full information available: any summary statistics such as the mean and standard deviation should be calculated from them rather than from a corresponding grouped frequency table.

(3) The frequency distribution is slightly skewed, and this is reflected in the mean having a slightly higher value than the median. (A fuller discussion of skewness is given in the Notes to Problem 3A.3.)
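The grouped-data calculation can be verified in a few lines. This sketch is ours; it uses the same coding x' = (x - 152)/5 and the divisor n, as in the solution.

```python
# Grouped frequency table from the solution: class midpoints and frequencies.
midpoints = [132, 137, 142, 147, 152, 157, 162, 167, 172]
freqs     = [3, 3, 6, 14, 9, 8, 7, 6, 4]

n = sum(freqs)
coded = [(x - 152) / 5 for x in midpoints]            # coded midpoints -4 ... 4
s1 = sum(f * c for f, c in zip(freqs, coded))         # sum of coded values
s2 = sum(f * c * c for f, c in zip(freqs, coded))     # sum of squared coded values

mean = 152 + 5 * (s1 / n)                             # back on the original scale
variance = 25 * (s2 - s1 * s1 / n) / n                # divisor n, as in the text

print(round(mean, 2), round(variance, 2))  # 152.75 110.69
```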
(4) Fifty percent of the data lie between the values of Q1 and Q3. If the data given here were normally distributed, then 50% of the data would lie within a range of 0.6745 of a standard deviation on either side of the mean. Since here the standard deviation is 10.32 lb, this range becomes 6.96 lb. This value is slightly smaller than the semi-interquartile range, on account of the asymmetry of the distribution.

(5) The values of Q1, Q2 and Q3 could have been obtained directly from the table of cumulative frequencies without drawing the graph. For example, if the class which contains the median value is referred to as the median class, the median may be calculated as

Q2 = L + {(n/2 - f)/m} × c,

where L is the lower boundary of the median class, m is its frequency, f is the cumulative frequency up to its lower boundary and c is its width. For the data here we have

Q2 = 149.5 + {(30 - 26)/9} × 5 = 151.72.

If n is odd the same formulae are used. The calculation of Q1 and Q3 can be done similarly.

(6) There is no reason why we should restrict our attention to dividing the distribution into four equal parts. The nine values which divide the total frequency into ten equal parts are called deciles, and the ninety-nine values which divide the distribution into one hundred equal parts are called percentiles. In general these cut points are called quantiles.

(7) In calculating the variance in this problem, the divisor was taken as n, rather than n - 1. The value n - 1 is used when the data are considered as a sample from a larger population and the aim is to obtain an estimate of the true (but unknown) population variance. When all that is required is to summarise the data in hand, one can use, within limits, any divisor, and the divisor n is generally used. In a report one must, of course, say which divisor was used, so that readers can convert to any alternatives they may prefer.

(8) It is convenient to place here a rather lengthy note on the vexed question of 'n versus n - 1'. The basic problem can be posed in its most confusing way in the form: 'We are given a set x1, x2, ..., xn of values. What is the variance of x1, x2, ..., xn?' Since we will be recommending different answers to the question according to what sort of values the xs are, our first thesis is that one should never put the question that way: never think in terms of the 'variance of a set of values'. We recall that in probability theory the term 'variance' has a precise definition: when we have a random variable X, its variance Var(X) is

Var(X) = E[{X - E(X)}²].

At an elementary level the facts of life are rather different. Statistics is then not an activity, but a subject with a syllabus which is taught and examined. The syllabus may well reflect the view of an examiner, who may well not be professionally involved with statistics and thus may have a partial view, and who may unfortunately also have a bee in his bonnet about a 'correct' answer. The painful difficulty is, of course, that what is correct to one examiner may not be to another, and textbooks, naturally, may adopt different conventions. (The discussion in Note 2 of Problem 4D.3 gives an example of an uncommon one.) In this awkward situation the reader will appreciate that we are not going to pretend to produce definitive answers to the vexed question; our case is that no answer could be definitive. But we will attempt to clear some confusion by giving some positive answers, by nailing a few theses to the door, as it were, and by drawing distinctions between similar, but not identical, situations. The truth is that in advanced work the divisor to be used is nearly, if not quite, always n - 1.
We shall now consider four ways in which these values x1, x2, ..., xn can be viewed, and examine the concept of variance for each.

(i) The values have been selected from some larger set or population of values. In these circumstances the xs are clearly a sample, and the interest in the xs lies in what they reveal about the larger set, rather than in themselves. The corrected sum of squares is Σ(xi - x̄)², and a measure of spread will be found by dividing this either by n or by n - 1. The resulting quantity, with divisor n - 1, is called the sample variance; in statistical work this term almost always connotes a divisor of n - 1. Use of n - 1 has the advantage that the resulting estimator of the variance of the population is unbiased. To use any other divisor would require corresponding changes to be made to formulae (such as those used in the t-tests) and possibly also to statistical tables. Despite its arbitrariness, use of n - 1 is so common that to fail to use it would be counted a mistake.

(ii) The values form a complete set or population, from which one member is to be chosen at random, with each of the n members having the same chance of being the one chosen. If we decide to denote by X the value of the population member chosen, then X will be a random variable with a distribution such that

Pr(X = xi) = 1/n,  i = 1, 2, ..., n.

The variance of X can then be obtained from the usual definitional expression Var(X) = E[{X - E(X)}²], and since

E(X) = (1/n) Σ xi = x̄,

it follows that

Var(X) = (1/n) Σ (xi - x̄)².

In this case, therefore, the divisor n is indicated, and the variance has its usual role in relation to a discrete random variable and its probability distribution.

(iii) The values form a complete set, and one wishes to provide a summary measure of their spread, without any implications as to sampling from them. There are arguments for both divisors here; the corrected sum of squares, and the two measures obtained from it by division, are of course both worth considering as measures. One reason for preferring the divisor n is easiest to see by means of an example. If we have a set of 10 values x1, ..., x10 whose corrected sum of squares is 900, then our measure of spread will be 90 if we divide by n, or 100 if we divide by n - 1 = 9. Now suppose that the values we have are exactly duplicated, so that our set now consists of 20 values. Most people would argue that the spread of the 20 values is just the same as for the set of 10. The corrected sum of squares will now be 1800. If we use n as divisor, then our measure of spread is 1800/20 = 90, as before; but using n - 1 as divisor, i.e. 19, gives 1800/19, or about 94.7, compared with 100 previously. The divisor n is generally felt to be better here, and there is nothing to suggest that concepts based on probability have any relevance.

(iv) The values form a complete set or population, from which a random sample will be selected, without replacement. Here use of n - 1 does have advantages: when a random sample is selected without replacement (the usual case when the population is finite), the sample variance will be used to estimate the spread of the population values, and it will conveniently give an unbiased estimator only if the smaller divisor is consistently used, both for sample and population.
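The duplication argument in case (iii) is easy to check numerically. The data set below is hypothetical, chosen only for illustration: doubling it leaves the divisor-n measure unchanged but shrinks the divisor-(n - 1) measure.

```python
def spread(values, divisor_offset=0):
    """Corrected sum of squares divided by n (offset 0) or n - 1 (offset 1)."""
    n = len(values)
    mean = sum(values) / n
    css = sum((x - mean) ** 2 for x in values)  # corrected sum of squares
    return css / (n - divisor_offset)

data = [2, 4, 4, 4, 5, 5, 7, 9, 3, 7]  # hypothetical set of 10 values
doubled = data * 2                      # the same values, exactly duplicated

assert spread(doubled) == spread(data)        # divisor n: spread unchanged
assert spread(doubled, 1) < spread(data, 1)   # divisor n - 1: spread decreases
```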
It seems then that in the case of a sample there is no real difficulty: one uses the sample variance formula, involving n - 1. In the case of a population, or complete set, both divisors have their advantages. By and large, those arguments supporting the use of n often prevail in elementary discussions, and that divisor is commonly recommended for the 'variance of a population' in first-level examinations. But the reader will have noticed that we draw this conclusion with some reluctance, since it overlooks more sophisticated arguments and therefore gives a dogmatic prescription which is in fact based on an incomplete assessment.

3A.2 Histogram for catches of fish

A keen angler kept a record of the weight of each of his last 51 catches of fish, recorded to the nearest 0.1 kg and grouped into classes of unequal widths. Draw a histogram for the data, and use it to calculate the modal class.

Solution

Before the histogram can be drawn, we need to modify the class boundaries given in the problem. This is because the data are continuous, but only recorded to the nearest 0.1 kg. Thus a recorded weight of 0.4 kg represents an actual weight somewhere in the range from 0.35 kg to 0.45 kg, and similarly for the other classes: a class whose recorded weights run from 1.0 to 1.2, for example, has true boundaries 0.95 and 1.25. The only exception is in the first class, since a recorded weight of 0.0 clearly represents a weight in the range (0, 0.05 kg).
In a histogram we draw a rectangle for each class so that the area of the rectangle is proportional to the frequency in that class. With equal class widths, the heights of the rectangles can be taken equal to the observed frequencies, and the appropriate areas are easily constructed. But when the classes have unequal widths, as here, the heights of the rectangles can no longer be taken equal to the observed frequencies, and rescaling is necessary. For example, the classes with boundaries 0.95-1.25 and 1.25-1.75 both have frequencies of 8, but the class widths are 0.3 and 0.5 respectively; so, if in a histogram the height for the second of these classes is 8, that for the first must be (0.5/0.3) × 8 = 13.33. We therefore rescale the frequencies in all classes whose width is not 0.5, and, by analogy with probability density, we refer to a rescaled frequency as a frequency density (here a frequency density per 0.5 kg). The histogram constructed from the frequency densities is shown in Figure 3.2. From the histogram we see that the modal class is (0.95, 1.25).

Figure 3.2 Frequency densities: total catch of fish (kg)

Notes

(1) As the weights are recorded to the nearest tenth of a kilogram, the class boundaries are taken as 0.45, 0.95, 1.25, 1.75, 2.25, etc. This reflects the continuous nature of the data and the need to ensure that no data value could possibly belong to two different classes. In the present problem we could have gone further and specified, for example, the second class as 'above 0.45 but not above 0.95', but this would have been unnecessarily cumbersome. (For a continuous measurement, the chance that an observation is precisely 0.45 is negligible, so there is no confusion in specifying this value as a dividing point between two adjacent classes.)

(2) With continuous data, it is usual to quote the modal class. Sometimes, however, one is expected to calculate a single value for a mode when data are presented in frequency table form. The usual method is a graphical one, and has some theoretical backing when the histogram is constructed with classes of equal width. The basis of the method is that one approximates the histogram, in the region of its maximum, by a quadratic curve. If one were to draw a parabola through three points, that is, the midpoint of the horizontal bar of the histogram for the modal class and the corresponding points for the classes on either side of the mode, then the maximum of this quadratic could be considered to be an approximation to the mode for the set of data. (A similar justification, valid under the same condition of equal-width classes, can be given by constructing a quadratic such that the areas under the curve are equal to the areas under the histogram for the three relevant classes.) The location of the maximum can be found very simply by drawing two straight lines and finding their intersection. The required lines are always drawn diagonally across the modal class: one joins the left boundary of the modal class, at the height of the rectangle to its left, to the right boundary, at the height of the modal rectangle; the other joins the left boundary, at the height of the modal rectangle, to the right boundary, at the height of the rectangle to its right. The intersection of these two lines then indicates the position (although not the height) of the maximum of the quadratic. In the present case, however, class widths are not all the same, and the method is therefore not applicable.
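For equal-width classes, the intersection of the two diagonal lines in Note (2) reduces to the formula mode ≈ L + c(f_m - f_l)/{(f_m - f_l) + (f_m - f_r)}, where f_m, f_l and f_r are the frequencies of the modal class and of its left and right neighbours. A sketch (ours), illustrated with the equal-width weights data of Problem 3A.1, whose modal class is 144.5-149.5:

```python
def mode_estimate(lower, width, f_mode, f_left, f_right):
    """Graphical mode estimate for equal-width classes: the x-coordinate of the
    intersection of the two diagonals drawn across the modal rectangle."""
    d1 = f_mode - f_left    # excess over the class to the left
    d2 = f_mode - f_right   # excess over the class to the right
    return lower + width * d1 / (d1 + d2)

# Weights data of Problem 3A.1: modal class 144.5-149.5 (f = 14),
# neighbouring frequencies 6 (left) and 9 (right).
print(round(mode_estimate(144.5, 5, 14, 6, 9), 2))
```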
3A.3 Distribution of examination marks

The following table shows the number of candidates who scored 0, 1, 2, ..., 10 marks for a particular question in an examination.

Mark (x)            0   1   2    3    4   5   6   7   8   9   10
No. of Candidates   8   10  49   112  98  86  54  37  28  12  6

Calculate the mean, median and mode of the distribution of marks. What feature of the distribution is suggested by the fact that the mean is greater than the median?

Solution

(a) The mean is the straightforward average of the 8 + 10 + ... + 6 = 500 marks, that is,

[(8 x 0) + (10 x 1) + (49 x 2) + ... + (6 x 10)] / 500 = 2241/500 = 4.48 marks.

(b) The median of n values, put in order of magnitude, is the middle value. It is the value ranked in position (n + 1)/2 if n is odd, and is the average of the two middle observations, those ranked n/2 and (n/2) + 1, when n is even. In this problem n = 500, and so the median is the average of the 250th and 251st observations in order of magnitude. These are both 4, so the median is also 4.

(c) The mode is that value which occurs with the greatest frequency, and so is clearly 3 here.

(d) The mean, 4.48, is greater than the median, 4, which suggests that the distribution is positively skewed: the values to the right of the median are more spread out than are those on the left. This extra spread on the right cannot affect the median, but it does cause the mean to be greater.

Notes

(1) If a distribution is precisely symmetrical, the mean and median will coincide. By contrast, when the distribution is asymmetrical and has a relatively long tail to the right (like, for example, the distribution of personal incomes), it is said to be positively skewed. In a similar way a negatively skewed distribution has a relatively long tail to the left. Any skewness is often apparent if a bar chart (or, for continuous data, a histogram) is drawn.

(2) When a distribution is skewed, the median is usually a valuable measure of location. This is because it is unaffected, unlike the mean, by a small number of particularly large (or small) data values, which could be judged not to be of significance in assessing the general location of the distribution. (For example, giving the median income of employed persons in a town will usually tell more about that town's prosperity than giving the mean income.) The mode is another possible measure of location, but suffers the serious disadvantages that it is not necessarily unique, and is not strictly relevant to sample data from a continuous distribution (but see also Problem 3A.2). Indeed, an over-rigorous attitude to samples from continuous distributions might suggest that, since the n observations will all be different, there will be n modes!

(3) There are several ways of measuring the skewness in a set of sample data. Two which are fairly easy to calculate are

(i) skewness = (mean - mode) / standard deviation

and

(ii) skewness = 3(mean - median) / standard deviation.
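The arithmetic in this problem is easily checked by machine. The short Python sketch below (ours, not part of the original text) reproduces the mean, median and mode of the marks data, together with the two skewness measures just defined:

```python
import math

# Sketch (not from the book): summary measures for the marks data of
# Problem 3A.3, computed directly from the frequency table.
marks = list(range(11))
freqs = [8, 10, 49, 112, 98, 86, 54, 37, 28, 12, 6]
n = sum(freqs)                                   # 500 candidates

mean = sum(x * f for x, f in zip(marks, freqs)) / n

# Median: with n even, average the values ranked n/2 and n/2 + 1.
expanded = [x for x, f in zip(marks, freqs) for _ in range(f)]
median = (expanded[n // 2 - 1] + expanded[n // 2]) / 2

# Mode: the mark with the greatest frequency.
mode = marks[freqs.index(max(freqs))]

sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(marks, freqs)) / n)
skew_i = (mean - mode) / sd               # (mean - mode) / s.d.
skew_ii = 3 * (mean - median) / sd        # 3(mean - median) / s.d.

print(round(mean, 2), median, mode)       # 4.48 4.0 3
print(round(sd, 2), round(skew_i, 2), round(skew_ii, 2))   # 2.0 0.74 0.72
```

The output agrees with the values quoted in the text: mean 4.48, median 4, mode 3, standard deviation 2.00, and skewness measures 0.74 and 0.72.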
For the data given in the problem, the standard deviation is 2.00, and the two measures of skewness are, therefore, (i) 0.74 and (ii) 0.72. Both measures make use of the fact that, while in a symmetrical distribution with a single mode the mean, median and mode coincide, as asymmetry grows the three gradually drift apart. However, the three do not always behave as these formulae suggest, as can be seen by examining the data below.

Mark (x)            0   1   2    3    4    5    6   7   8   9   10
No. of Candidates   7   23  88   105  110  114  88  68  36  20  4

It is straightforward to show that the mean, median and mode are 4.58, 4 and 5 respectively. Formula (i) will therefore give a negative value, while formula (ii) gives a positive value. It is apparent from a bar chart of the data that the skewness is positive rather than negative, as might be expected. Overall, then, it is a most rough-and-ready rule that equates the numerators of the two expressions above.

(4) In practice, neither of these two formulae for skewness is often used. Formula (ii) would generally be preferred to (i) because it uses the median rather than the mode; as mentioned in Note 2, the mode may not be unique. But these two measures are not ideal, and one would, by analogy with case (ii) in Note 8 to Problem 3A.1, base a measure of the skewness of observations x1, x2, ..., xn on the third moment formula

(1/n) Σ (xi - x̄)³.

Just as the location of the distribution of some random variable X can be measured by E(X), and its spread by the variance E[{X - E(X)}²], which is based on the second moment, so a measure based on the third moment E[{X - E(X)}³] can be used to measure skewness. One can go further and examine higher moments; in particular, a measure of kurtosis, the peakedness in the centre of the distribution, is based on the fourth moment E[{X - E(X)}⁴]. The measures just mentioned are moments of probability distributions, and are useful in characterising the shape of unusual distributions; the technique extends in a natural way when the population is finite. They link up with the moment generating function, which, expanded as a power series, can be used to calculate moments conveniently.

3A.4 Consumption of cigarettes

The table below gives data purporting to show the percentages of men and women whose daily consumption of cigarettes lies in various ranges. Draw appropriate pie charts to illustrate the data and bring out the principal features.

Number of cigarettes   Percentage of
                       men    women
1-5                    3      5
6-10                   5      6
11-15                  6      7
16-20                  7      6
21-30                  11     7
31-40                  10     5
over 40                8      4
Solution

The production of a pie chart is a fairly trivial matter, and we will not insult the reader by dwelling on the details of arithmetic. The principle is, of course, that the angle assigned to each category, for men and women separately, is in proportion to the number in that category. There is just one point to notice: the percentages for men add up to 50, those for women to 40 (the remainder being non-smokers), and so the pie charts are constructed so that the total areas are in the corresponding proportions, with the radius of the chart for men being about 1.12 times that for women. The charts are presented in Figure 3.3.

Figure 3.3 Pie charts: numbers of cigarettes smoked daily. (Key: 1-5, 6-10, 11-15, 16-20, 21-30, 31-40, over 40; men and women shown separately.)

The principal feature, shown both by the table and the charts, is that more men than women smoke, and that the consumption of cigarettes by male smokers is generally higher than that of women smokers.

Notes

(1) We should emphasise that the figures used above are fictional, chosen so that a few points could be made without excessive calculation.

(2) A decision as to the appropriateness of a particular pie chart is by no means as trivial as the production of the chart itself. The pie charts shown above are the most basic ones, but one could argue for several alternatives, each with its advantages. In the current case there are several alternative ways of presenting the data.

(i) The ranges used are unequal, and many statisticians would be happy to sacrifice the detail within the narrower ranges so as to have all the categories of equal size. (The final category is awkward, of course, but leaving it as it is would not cause much difficulty.)

(ii) Making the pies of different sizes does bring out the fact that more men than women smoke. But if the aim of the display is to compare individual men and women with other members of their own sexes, as in 'I smoke less than most men, but you smoke more than most women', then equal-sized pie charts would present the information more effectively.

(iii) A very strong case could be made out for including non-smokers in the pie charts. If this were done, exactly half the pie for men would be assigned to the 'non-smoker' category, with the other categories reduced in proportion; for women similar calculations would be needed. In this case the pies would have equal sizes (unless one wished to be very finicky and take account of the different numbers of males and females in the population). (But see also Note 4.)

(3) Yet another pair of pie charts seems possible if we examine the data from the viewpoint of the tobacco companies, wishing to see where the bulk of their trade comes from. If we argue that those who smoke between 1 and 5 a day will smoke 3 a day on average, and similarly for the other categories (8, 13, 18, 25.5, 35.5 and (a guess) 45.5), then, for every 100 people, the total numbers of cigarettes smoked daily can be found by extending the original table as follows.

Range      Percentage of      Number smoked by
           men    women       men       women
none       50     60          0         0
1-5        3      5           9         15
6-10       5      6           40        48
11-15      6      7           78        91
16-20      7      6           126       108
21-30      11     7           280.5     178.5
31-40      10     5           355       177.5
over 40    8      4           364       182
Total      100    100         1252.5    800.5

Pie charts could be constructed for the data in the last two columns of the table; note that the chart for men would now have an area larger than that for women by a factor of 1252.5/800.5, since that many more cigarettes are consumed by men than by women. The ratio of the radii would thus be about 1.25, while for the charts in Figure 3.3 the ratio is about 1.12.

(4) Other facets of the data could be brought out by use of other techniques. Although 7% of women smoke between 11 and 15 cigarettes a day, while only 4% smoke more than 40, more cigarettes in total will be sold for the 4% than for the 7%; the use of pie charts will obscure this rather important point. Certainly the production of cumulative frequency polygons or graphs, or histograms, would be useful, and indeed it is arguable that, since the categories are ordered and form a natural sequence, such displays would be more appropriate than pie charts.
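Since pie areas are to be proportional to the totals they represent, the radii must be in the ratio of the square roots of those totals. A small Python sketch (ours; the helper name is our own) confirms the two radius ratios quoted above:

```python
import math

# Sketch: if pie areas are proportional to totals, radii are proportional
# to the square roots of the totals.
def radius_ratio(total_a, total_b):
    return math.sqrt(total_a / total_b)

# Smokers as percentages of each sex (Figure 3.3): 50% of men, 40% of women.
print(round(radius_ratio(50, 40), 2))          # 1.12

# Cigarettes smoked daily (Note 3): 1252.5 by men, 800.5 by women.
print(round(radius_ratio(1252.5, 800.5), 2))   # 1.25
```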
3B Goodness-of-Fit

Most practical problems in statistical inference have as a basic model that one or more random samples are selected from some distribution or distributions, usually the normal. In some problems the random samples are observed directly. In more complex cases one may have only indirect observations; for example, in regression problems the observations on the dependent variable involve a systematic (linear) component and a random component, and it is the set of random components that are taken to come from some distribution. In either case an analysis such as, for example, a confidence interval for a normal variance, or a two-sample t-test, will be called for. The statistical analysis resulting from any assumed model naturally depends on that model: if the model is incorrect then so, at least in detail, will be the conclusions. So while it may be acceptable as an academic exercise to assume that observations form a random sample from some distribution, making such an assumption without adequate evidence would be contrary to good scientific practice, and therefore also contrary to good statistical practice.
It follows that we must devise methods of testing assumptions as to distributional shape. There are many such tests, both formal and informal. At simplest, one can construct a histogram and assess by eye whether it is of roughly the required shape; there are also more sophisticated methods, discussed in the problems below.

3B.1 Numbers of sixes in dice throws

Two dice were thrown 180 times, and at each throw the number X of sixes was recorded, with the following results.

Number of sixes (x)   0     1    2   Total
Frequency (fx)        105   70   5   180

(a) Test the hypothesis that the distribution of X is binomial, with the probability p of obtaining a six given by p = 1/6.

(b) Explain how the test would be modified when the hypothesis to be tested is that X has a binomial distribution, but with parameter p unspecified.

Solution

(a) The appropriate test here is a χ² goodness-of-fit test, and the null hypothesis is

H0: X ~ B(2, 1/6).

The test statistic C has the form

C = Σ (fx - ex)²/ex,

where fx is the observed frequency of x sixes, and ex is the corresponding 'expected' frequency when H0 is true, given by ex = npx = 180px, with px denoting Pr(X = x). Under H0,

p0 = (5/6)² = 25/36,   p1 = 2(1/6)(5/6) = 10/36,   p2 = (1/6)² = 1/36,

so that e0 = 125, e1 = 50 and e2 = 5. Thus

C = (105 - 125)²/125 + (70 - 50)²/50 + (5 - 5)²/5 = 11.2.

Under H0 the test statistic C has approximately a χ² distribution with 2 degrees of freedom, and H0 will be rejected for large values of C. The upper 1% point of χ²₂ is 9.21, so we reject H0 at the 1% significance level.

(b) The null hypothesis is now that X has a binomial distribution, but p is unspecified. The test statistic has the same form as in part (a), except that ex is now given by ex = np̂x = 180p̂x, where p̂x is an estimate of the probability that X = x, based on the binomial distribution with parameter

p̂ = (total number of sixes observed) / (total number of throws of dice).

The test statistic C still has, approximately, a χ² distribution, but the number of degrees of freedom is reduced by 1.
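For readers who wish to check the arithmetic of part (a), the following Python sketch (ours, not part of the original solution) computes the expected frequencies and the statistic C:

```python
# Sketch: the chi-squared goodness-of-fit statistic for part (a) of
# Problem 3B.1, with p = 1/6 fully specified.
observed = [105, 70, 5]          # numbers of throws with 0, 1, 2 sixes
n, p = 180, 1 / 6

# Binomial(2, 1/6) probabilities for x = 0, 1, 2.
probs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
expected = [n * q for q in probs]            # 125, 50, 5

C = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
print(round(C, 1))   # 11.2, above the upper 1% point (9.21) of chi-squared(2)
```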
Notes

(1) The number of degrees of freedom associated with the test statistic is, in general, k - 1, where k is the number of different classes or values into which the range of the random variable X is divided. However, k - 1 is only the appropriate number when the distribution of X is fully specified, as in part (a). When parameters have to be estimated, as in part (b), the degrees of freedom are reduced by the number of parameters estimated.

(2) Calculations for the modified test are not specifically asked for in part (b) of the problem, but they are given here for completeness. We have p̂ = 80/360 = 2/9, and calculate ex = np̂x, with p̂x obtained from the binomial probability function with p replaced by 2/9. Hence e0 = 180 x 49/81 = 108.9, e1 = 180 x 28/81 = 62.2 and e2 = 180 x 4/81 = 8.9, and thus

C = (105 - 108.9)²/108.9 + (70 - 62.2)²/62.2 + (5 - 8.9)²/8.9 = 2.81.

This value of C is to be compared with the χ² distribution on 1 degree of freedom. The 5% point of the χ²₁ distribution is 3.84, so in this case the null hypothesis will not be rejected at any of the conventional significance levels.

(3) The χ² goodness-of-fit test generally rejects a null hypothesis only for large values of C; the null hypothesis is rejected if the value lies far enough into the upper tail. However, if it is suspected that the data are 'fixed' or fraudulent, then the lower tail of χ² may also be of interest, since small values imply an exceptionally close fit, with observed frequencies very close to the expected frequencies. Thus a very small value of C may lead to suspicions that the data had been made up rather than actually observed. (A fairly well known example is the pioneering experiment of Gregor Mendel, on inheritance of colour in sweet pea plants. The observed proportions with particular colours are so close to the theoretical ones that it is generally believed today that the experimenter must have given the data a bit of a helping hand.)

(4) In this problem the notation C is used for the χ² goodness-of-fit test statistic, rather than X², which is used in most other problems. In the past χ² was often used as the symbol for the statistic as well as for the distribution it takes, but this may cause confusion and is rarely done nowadays. The notation X² is quite often found; but in the present problem the binomial random variable was already denoted by the very commonly used X, and use of X² as well would have been confusing!

3B.2 The occurrences of thunderstorms

The table below gives the number of thunderstorms reported in a particular summer month by 100 meteorological stations.

No. of thunderstorms (x)                        0    1    2    3    4   5
Number of stations reporting x thunderstorms    22   37   20   13   6   2

(a) Test whether these data may be reasonably regarded as conforming to a Poisson distribution.

(b) The average number of thunderstorms per month throughout the year is 1.0. Test whether the data above are well fitted by a Poisson distribution with mean 1.0.

(c) The binomial distribution with n = 5, p = 0.3 provides a good fit to the above data. Without further calculation, state why, nevertheless, this binomial model is inappropriate.
Solution

As in Problem 3B.1, the appropriate test for each of parts (a) and (b) is a χ² goodness-of-fit test. The difference between the two parts is that in part (b) the distribution is completely specified, whereas in part (a) there is one parameter to be estimated.

(a) The null hypothesis H0 is that the random variable X, the number of thunderstorms, has a Poisson distribution with probability function p(x) = e^(-μ)μ^x/x!, x = 0, 1, 2, ..., with μ unspecified. The data are given in the form of observed frequencies fi; the 'expected' frequencies ei required for the goodness-of-fit test statistic are calculated from ei = np̂i = 100p̂i, where p̂i is obtained from the probability function above with μ replaced by its estimate, the sample mean x̄ = 1.5. Hence

e1 = 100e^(-1.5) = 22.3,
e2 = 100 x 1.5 e^(-1.5) = 33.5,
e3 = 100 x (1.5²/2!) e^(-1.5) = 25.1,
e4 = 100 x (1.5³/3!) e^(-1.5) = 12.6,
e5 = 100 x (1.5⁴/4!) e^(-1.5) = 4.7,
e6 = 100 - e1 - e2 - e3 - e4 - e5 = 1.8,

corresponding to x = 0, 1, 2, 3, 4 and 5 or more (see Note 1). If any of the eis is smaller than 5, it is usual to combine adjacent classes until all eis exceed 5 (see Note 3). This will be achieved in the present example if the last two classes are combined, so that there are now 5 classes, and the new final class corresponds to x ≥ 4, with expected frequency 6.5. The χ² goodness-of-fit test statistic C can now be computed as

C = Σ (fi - ei)²/ei
  = (22 - 22.3)²/22.3 + (37 - 33.5)²/33.5 + (20 - 25.1)²/25.1 + (13 - 12.6)²/12.6 + (8 - 6.5)²/6.5
  = 1.76.

The test statistic has, approximately, a χ² distribution with 3 degrees of freedom (see Note 1 of Problem 3B.1). The computed value is well below the upper 5% point of χ²₃, which is 7.81, so the Poisson distribution provides a good fit to the present data.

(b) The null hypothesis is now that X has a Poisson distribution with mean μ = 1.0. In the χ² goodness-of-fit test statistic the fis are unchanged, but the eis are calculated from the probability function for the Poisson distribution with mean 1.0, giving

e1 = 100e^(-1.0) = 36.8,   e2 = 36.8,   e3 = 18.4,   e4 = 100 - e1 - e2 - e3 = 8.0.

This time it seems appropriate to combine the last three classes, so that the new final class corresponds to x ≥ 3, with expected frequency 8.0 and observed frequency 13 + 6 + 2 = 21. Thus

C = (22 - 36.8)²/36.8 + (37 - 36.8)²/36.8 + (20 - 18.4)²/18.4 + (21 - 8.0)²/8.0 = 27.2.

The test statistic again has, approximately, a χ² distribution with 3 degrees of freedom; one degree of freedom was lost compared to part (a) because there is one fewer class, but one was gained because there are no unspecified parameters to estimate. The upper 1% point of χ²₃ is 11.34, so we conclude that a Poisson distribution with mean 1.0 is not a good fit to the data.

(c) A binomial distribution with n = 5 defines a random variable which can take only the values 0, 1, 2, 3, 4, 5. But X, the number of thunderstorms per month, can clearly exceed 5, so the binomial distribution cannot be appropriate for X.
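The calculation in part (a) can be sketched in Python (ours, not part of the original solution). Note that exact arithmetic gives about 1.74; the value 1.76 in the solution arises from rounding each ei to one decimal place before computing C:

```python
import math

# Sketch: part (a) of Problem 3B.2, fitting a Poisson distribution with
# the mean estimated from the data and combining the last two classes.
observed = [22, 37, 20, 13, 6, 2]       # stations reporting 0..5 storms
n = sum(observed)                        # 100 stations
mean = sum(x * f for x, f in enumerate(observed)) / n    # 1.5

# Expected frequencies for x = 0..3, with x >= 4 as the remainder.
poisson = [n * math.exp(-mean) * mean ** x / math.factorial(x)
           for x in range(4)]
expected = poisson + [n - sum(poisson)]
obs = observed[:4] + [observed[4] + observed[5]]         # combine x >= 4

C = sum((f - e) ** 2 / e for f, e in zip(obs, expected))
print(round(C, 2))   # about 1.74, well below the 5% point (7.81) of chi-squared(3)
```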
Notes

(1) The observed frequencies actually correspond to x = 0, 1, 2, 3, 4 and 5. But values of x greater than 5 can occur (they have non-zero probabilities if a Poisson distribution is assumed), so values x ≥ 6 must be included in some class. It is therefore convenient to consider the final observed frequency as corresponding to x ≥ 5 and, in part (a) after combining classes, to x ≥ 4. Strictly, in a χ² goodness-of-fit test the set of all possible values for X is divided into classes, and the final class here is open-ended.

(2) Confusion sometimes arises concerning whether 'expected' frequencies, ei, should be rounded to the nearest integer. The eis need not be integers, in the same way that expectations of random variables need not be integers for integer-valued random variables (for example, the expected value of the uppermost face when a fair die is thrown is 3.5). Indeed, rounding the eis to the nearest integer will tend to worsen the χ² approximation to the distribution of Σ(fi - ei)²/ei. Equally, there is no need to calculate the eis to a high degree of precision; in practice, one decimal place is often felt to be sufficient.

(3) The distribution of Σ(fi - ei)²/ei is only approximately χ², but the approximation is good provided that none of the eis is too small. The best known 'rule of thumb' is that none of the eis should be less than 5: if this is not so for the classes as chosen initially, then adjacent classes should be combined until ei ≥ 5 for all classes. This procedure has been adopted in the present example, but it is really more stringent than is necessary; the χ² approximation will still be a good one if all eis are greater than 1 and only a small proportion are less than 5. Thus, in part (b) of the problem, if only the last two classes are combined, rather than the last three, then the last two expected frequencies become 6.1 and 1.9, and a χ² approximation is still reasonable. The value of the test statistic becomes 33.5, and there are now 4 degrees of freedom. The upper 1% point of χ²₄ is 13.28, so the conclusion that the Poisson distribution with mean 1.0 does not provide a good fit is at least as clear-cut as before.

(4) It is not really necessary to look up tables of the χ² distribution in part (a). It is useful to remember that the mean of a χ² random variable with v degrees of freedom is v, and its variance is 2v. If the observed value of the test statistic is less than v, or if it exceeds v by no more than about √(2v), then it is certainly not necessary to consult χ² tables in order to discover that the data provide no significant evidence of lack of fit to the postulated distribution. (This follows because the χ²ᵥ distribution can be approximated by a normal distribution with mean v and variance 2v; in practice the approximation is a close one for v greater than about 100.)

(5) The aim of a goodness-of-fit test is to determine whether a particular distribution (here the Poisson) can be regarded as a plausible model for the data. What we do is to set up and test a null hypothesis that the observations form a random sample from the specified distribution; the natural alternative hypothesis is that the observations form a random sample from some other distribution. The present problem illustrates that a good fit of a distribution to a set of data is no guarantee that the distribution is a sensible one. Parts (a) and (b) together show that, although Poisson distributions may fit individual months' frequencies of thunderstorms, a common Poisson distribution cannot be fitted to all months of the year: the means of such distributions vary from month to month. The physical explanation for this result is probably that thunderstorms are more frequent in the summer than the winter.
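The rule of thumb in Note (4) can be expressed as a tiny helper (ours; the function name is our own invention):

```python
import math

# Sketch of the rule in Note (4): a chi-squared variable with v degrees of
# freedom has mean v and variance 2v, so a statistic no larger than about
# v + sqrt(2v) is clearly not significant.
def clearly_not_significant(statistic, v):
    return statistic <= v + math.sqrt(2 * v)

print(clearly_not_significant(1.76, 3))    # True: part (a) of Problem 3B.2
print(clearly_not_significant(27.2, 3))    # False: part (b) of Problem 3B.2
```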
The χ² goodness-of-fit test is a general-purpose test, which aims to reject the null hypothesis whenever the data come from another distribution. It is comprehensive, in the sense that, given a large enough sample, it can distinguish between any null hypothesis distribution and any other distribution. But while this is clearly desirable, it overlooks the fact that, usually, some alternative distributions are much more likely than others. In the present case, a likely alternative to the Poisson distribution is one with a different (usually larger) variance. (See, for example, the discussion of contagious distributions in Note 1 to Problem 2A.)

Now one well-known feature of the Poisson distribution is the equality of mean and variance, and it follows that one might consider as a test statistic the sample index of dispersion, defined as I = s²/x̄. The null hypothesis would then be accepted if I were close to 1, and rejected otherwise. Rather than work directly with the statistic I, we carry out this test in practice by calculating

C = (n - 1)s²/x̄,

where n is the sample size and x1, x2, ..., xn are the observations; under the null hypothesis C has, approximately, a χ²ₙ₋₁ distribution, and the test is therefore called the χ² index of dispersion test. This cannot be regarded as a comprehensive, general-purpose test, since it cannot be used unless the null distribution is Poisson; nor can it be expected to discriminate between the Poisson and another distribution for which the mean and variance are the same. Specialised tests have also been constructed for other distributions (and in particular for the normal distribution); these generally outperform the χ² goodness-of-fit test for the distribution concerned. (The problem of testing for independence in contingency tables has similar features; we discuss these in Note 1 to Problem 5C.)

Applying the test to the data in the present problem, we find that the sum of the 100 observations is 150 and the sum of their squares is 380. Thus, in a natural notation,

(n - 1)s² = 380 - 150²/100 = 155,   x̄ = 150/100 = 1.5.

Hence C = 155/1.5 = 103.3. This is very close to the centre of the χ² distribution on 99 degrees of freedom (see Note 4), and we therefore accept the null hypothesis that the data come from a Poisson distribution.
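The index-of-dispersion calculation can be sketched in Python (ours, not part of the original text), using only the summary figures quoted:

```python
# Sketch: the chi-squared index-of-dispersion test, from the summary
# figures given in the text (n = 100, sum 150, sum of squares 380).
n = 100
sum_x = 150.0
sum_x2 = 380.0

mean = sum_x / n                         # 1.5
ss = sum_x2 - sum_x ** 2 / n             # (n - 1) s^2 = 155
C = ss / mean                            # approximately chi-squared(99)

print(round(C, 1))   # 103.3, close to the centre (99) of chi-squared(99)
```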
3B.3 The distribution of I.Q.

Measurements of I.Q. were made for a random sample of 200 grammar school children, with the results given below. Test whether a normal distribution gives a satisfactory fit to the data.

I.Q.      No. of children     I.Q.      No. of children
80-84     1                   125-129   8
85-89     3                   130-134   2
90-94     16                  135-139   2
95-99     33                  140-144   1
100-104   44                  145-149   2
105-109   31                  150-154   0
110-114   26                  155-159   2
115-119   20                  160-164   1
120-124   8
Solution

In order to use the χ² goodness-of-fit test, 'expected' frequencies are required for all categories. We denote the observed frequency in class i by fi and the corresponding expected frequency by ei. We are fitting a normal distribution, but neither the mean μ nor the variance σ² is specified, so we estimate μ and σ² from the data using x̄ and s², the sample mean and variance, calculated by assuming that each observation is situated at the midpoint of its class. With this assumption, x̄ = 107.58 and s² = 165.38.

To find ei one must estimate the probability pi of an observation falling in class i; then ei = np̂i, where p̂i is the estimate of pi, obtained by calculating the probability that a normal random variable with mean 107.58 and variance 165.38 takes a value between the lower and upper boundaries of that class. For example, consider the fourth class, which has lower and upper boundaries 94.5 and 99.5. The required probability is

p̂4 = Φ[(99.5 - 107.58)/√165.38] - Φ[(94.5 - 107.58)/√165.38]
    = Φ(-0.628) - Φ(-1.017)
    = 0.2650 - 0.1546 = 0.1104,

where Φ(·) is the cumulative distribution function of the standardised normal distribution, whose values are obtained from tables. Hence e4 = 200p̂4 = 22.1. This calculation must be repeated for all the other classes (see Note 2).

We find that the eis for the last six classes are (rounded to one decimal place) 2.3, 0.9, 0.3, 0.1, 0.0 and 0.0 respectively, which are rather small. These classes are therefore combined (see Note 3 to Problem 3B.2), reducing the number of classes to 12. The values of fi and ei for these twelve classes are as follows.

Class     fi   ei      Class      fi   ei
≤84       1    7.3     110-114    26   29.1
85-89     3    8.7     115-119    20   23.7
90-94     16   14.9    120-124    8    16.6
95-99     33   22.1    125-129    8    10.0
100-104   44   28.1    130-134    2    5.2
105-109   31   30.8    ≥135       8    3.6

The χ² statistic X² is

X² = Σ (fi - ei)²/ei,

and under the null hypothesis that the data come from a normal distribution it has, approximately, a χ² distribution with 9 degrees of freedom (9 = no. of classes - 1 - 2, since two parameters, μ and σ², were estimated). The value of X² is 36.7, which is well above any of the usual percentage points for χ²₉; the 1% point is 21.67. We therefore conclude that the normal distribution does not provide a good fit to these data. (A possible explanation is given in Note 4.)
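The calculation of an expected frequency can be checked in Python using the error function; the sketch below (ours, not part of the original solution) reproduces e4 for the fourth class:

```python
import math

# Sketch: the expected frequency for one class in Problem 3B.3, using the
# fitted normal distribution (mean 107.58, variance 165.38).
def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, mean, var = 200, 107.58, 165.38
sd = math.sqrt(var)

# Fourth class: boundaries 94.5 and 99.5.
p4 = phi((99.5 - mean) / sd) - phi((94.5 - mean) / sd)
e4 = n * p4
print(round(e4, 1))   # about 22.1
```

Repeating this for every class boundary reproduces the whole ei column of the table above.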
Notes

(1) There are several points to be made regarding the choice of classes and class boundaries. Frequently the only data given are the numbers of observations in each of a set of predetermined classes, but sometimes individual observations are given, and classes must be chosen as part of the solution.

When classes are already given, there are two possible slight complications: class boundaries may not be uniquely defined, and the first or last class might be open-ended. Since I.Q. is continuous and could, in theory, take any value between 84 and 85, it is natural to take the boundary midway between 84 and 85, at 84.5, assuming that each I.Q. has been recorded to the nearest whole number; this is done in the solution here. For other types of data a different rule might be appropriate for deciding where to put the class boundary. For example, for 'age last birthday' and classes 15-19, 20-24, and so on, the boundary would be at 20, rather than at 19.5.

When classes have to be chosen as part of a problem, class boundaries should be chosen so that (a) it is easy to assign observations to classes, and (b) it is easy to look up, or calculate, probabilities, and hence obtain expected frequencies, for each class. Requirements (a) and (b) may sometimes be difficult to attain simultaneously, so that some trade-off, using common sense, will be necessary. For moderate-sized data sets there should be as many classes as possible, subject to none of the eis being too small, but there is little point in using more than about twenty classes (only possible with large data sets) unless a very sensitive test is required. As with the choice of classes for a histogram, convenience and common sense play a large part in the choice.

(2) For the first class, the estimated probability has been calculated as that of an I.Q. less than or equal to 84.5; that is, the class has been treated as open-ended, as if it had been specified as simply 'I.Q. ≤ 84'. We could have estimated instead the probability of an I.Q. falling between 79.5 and 84.5, treating the class strictly as 80-84. In theory there is no reason why this should not be done; it causes no problems in determining ei for the class, though it would make the calculation of the mean and variance (if needed) ambiguous. In practice it is usual to take the classes as given and to treat the first and last classes as open-ended when calculating the eis.

(3) I.Q. tests are generally constructed so that the score has a known mean μ (which is usually 100) and known variance σ² for the population as a whole, so we might have been required to test the fit of a normal distribution with specified mean and variance. The calculation would proceed as before, with μ and σ² replacing x̄ and s², but the test statistic would have two additional degrees of freedom. Another possibility is to assume normality and test hypotheses concerning μ and σ². (A better, more informative, way of arranging these analyses would be first to test the fit of a normal distribution and then, if the fit were satisfactory, to test hypotheses concerning μ and σ².) Similar considerations apply to the tests for specific binomial and Poisson distributions discussed in Problems 3B.1 and 3B.2.

(4) Given the context of the data, it is hardly surprising that a normal distribution does not give a good fit. Fitting a normal distribution implies symmetry, but the sample consists of children selected for a grammar school, and one would expect that there would be some cut-off value below which there were very few I.Q. values amongst such children. This would lead to a non-symmetric distribution for I.Q., whose upper tail is much longer than the lower tail, and this is indeed what is observed. Even without the formal test of significance, the pattern of the differences (fi - ei) gives rise to strong suspicions that the normal distribution is not a good fit: the first two differences are negative, the next four positive, the following five negative, with finally one further positive difference. The surplus in the uppermost class would also be consistent with two or three observations being recorded incorrectly.

3B.4 Frontal breadths of skulls

An anthropologist collected details of 462 skulls of Burmese tribesmen. The data below give the frequencies of occurrence of different values of 'frontal breadths' (measured to the nearest mm) for the skulls. Construct a probability plot for the data. Use your plot to estimate the mean and standard deviation of the distribution, and discuss briefly whether you feel that the data could have been randomly drawn from a normal distribution.
3B.4 Goodness-of-Fit 113

Range     Frequency   Cumulative Frequency
87-88         5              5
89-90         9             14
91-92        26             40
93-94        34             74
95-96        59            133
97-98        68            201
99-100       80            281
101-102      64            345
103-104      47            392
105-106      42            434
107-108      12            446
109-110       5            451
111-112       3            454
113-114       4            458
115-116       4            462

Solution
In a probability plot the data values are plotted on the arithmetic scale, while the cumulative relative frequencies, all expressed as percentages, are plotted on the transformed scale. For grouped data, as here, the data values used are the class boundary points, and against these are plotted the corresponding cumulative relative frequencies. (Note that the two extreme points (86.5, 0%) and (116.5, 100%) cannot be plotted.)

The probability plot for the data is shown in Figure 3.4. For clarity we have not drawn a straight line through the points on the plot, but have indicated by small circles two points through which such a line might go. If the sample (which is, of course, pretty large) were in fact a random sample from a normal distribution, we would expect the probability plot to give virtually a straight line. In fact the plot is, by and large, pretty straight, although the top few points do seem to deviate rather systematically from the line given by the rest. One might reasonably conclude that in the extreme upper tail (about the top 3%) the data depart slightly from the normal shape; this would be consistent with two or three observations being recorded near 115 (exaggeration!) rather than near 110. But the departure is not great.
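The same construction can be sketched numerically, using the standard-library normal distribution for the probability-paper transform. The least-squares line here stands in for the line fitted by eye in the solution; as the notes below observe, the deviating top points pull the fitted slope a little above the eye-fitted value.

```python
from statistics import NormalDist

# Upper class boundaries (data recorded to the nearest mm, so the class
# "87-88" ends at 88.5) and cumulative frequencies from the table; the
# point at 116.5 corresponds to 100% and cannot be plotted, so is omitted.
upper = [86.5 + 2 * k for k in range(1, 15)]          # 88.5, 90.5, ..., 114.5
cum = [5, 14, 40, 74, 133, 201, 281, 345, 392, 434, 446, 451, 454, 458]
n = 462

# Probability-paper transform: inverse normal CDF of the cumulative
# relative frequency.
z = [NormalDist().inv_cdf(c / n) for c in cum]

# Least-squares line x = a + b*z: the intercept estimates the mean and
# the slope estimates the standard deviation.
zbar = sum(z) / len(z)
xbar = sum(upper) / len(upper)
b = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, upper)) / \
    sum((zi - zbar) ** 2 for zi in z)
a = xbar - b * zbar
print(round(a, 1), round(b, 1))   # intercept near the mean, slope near the s.d.
```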
So in general one would feel that the normal distribution fits the data rather well.

[Figure 3.4 Probability plot: skull data from Problem 3B.4. (Based on 'Chartwell' probability paper, produced by H. W. Peel & Company Limited.)]

As noted above, we have not superimposed a line on the points in Figure 3.4, but have indicated where a line through most of the points might lie. This 'line' cuts the 50% line at about the point 99.8, and similarly meets the 5% and 95% lines at points 91.6 and 107.8 respectively. The difference between these is an estimate of 2 × 1.645σ, where σ is the standard deviation of the distribution. We thus find estimates of the mean and standard deviation of the distribution to be 99.8 and 4.92 respectively.

114 Data Summarisation and Goodness-of-Fit 3B.4

Notes
(1) The basis of the transformation used for the probability scale on probability paper can be expressed very precisely in mathematical terms. For our purposes it may be more helpful to think of it as follows, in terms of an 'infinite' sample: a sample so large that a histogram (with correspondingly narrow class intervals) would take precisely the shape of a normal distribution density function. A probability plot drawn for such an 'infinite' sample would give an exact straight line.
Conversely, if one had an 'infinite' sample from some other distribution with a fairly smooth density function, one would expect the probability plot to show as a fairly smooth curve, whereas an 'infinite' normal sample shows as a straight line. The question of normality or non-normality thus boils down to assessing whether the plot more nearly resembles points haphazardly scattered about a straight line or about a smooth curve. With random data, of course, points are necessarily randomly positioned, and one has to decide whether the 'crooked' appearance of a real plot is due to non-normality or to chance fluctuations; one must therefore not draw very sweeping conclusions on the basis of the exact positions of one or two points.

(2) In the solution we used the slope and intercept of the probability plot to provide estimates of the parameters μ and σ of the normal distribution. The basis of this method is as follows. If one had a perfect straight line, from an 'infinite' sample, then obviously the 50% point would give the population mean μ; and since 90% of the distribution (from the 5% point to the 95% point) lies in the range μ − 1.645σ to μ + 1.645σ, one could calculate σ from the values corresponding to the 5% and 95% points — a simple division by 3.29 thus gives the estimate of σ. Of course, any two points will suffice to determine the slope of a straight line, so one could use values other than 5% and 95% if it were more convenient, with, naturally, a corresponding change to the divisor. For a real sample, the simplest method of estimating these parameters is to draw a 'good' line through the set of points, estimate μ by the 50% point, and use the distance between the values corresponding to the 5% and 95% points as an estimate of 2 × 1.645σ. As yet, there is no sound, objective, way to do this. Obviously any such informally produced estimates cannot be expected to be as efficient as the standard x̄ and s (in the usual notation), but they do offer speed and simplicity.

(3) The presentation of the data in the statement of the problem is partly helpful and partly distinctly unhelpful. The inclusion of the Cumulative Frequency column is a help; typically one would have to calculate this column from the preceding one. The unhelpful part is the first column, which appears to exclude values between 88 and 89, between 90 and 91, etc. The difficulty is resolved by looking above the table, where we find that the data are measured to the nearest millimetre; thus the class described as 93-94 in fact contains those skulls with breadths between 92.5 and 94.5.
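The read-off method of Note (2) is simple arithmetic; the three values below are those recoverable from the solution text (the 50%, 5% and 95% points of the eye-fitted line).

```python
# Estimates from an eye-fitted line on normal probability paper: the 50%
# point estimates the mean; the 5% and 95% points lie at mu -/+ 1.645*sigma,
# so their distance is 3.29*sigma.
p50, p5, p95 = 99.8, 91.6, 107.8   # values read from the plot in the solution

mean_est = p50
sd_est = (p95 - p5) / 3.29         # 2 * 1.645 = 3.29

print(mean_est, round(sd_est, 2))  # 99.8 4.92
```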
(4) A probability plot is, in essence, a way of testing the goodness of fit of a distribution to some data, if not a very formal one. The same job could have been done by use of a χ² test, as in Problem 3B.3; correspondingly, a probability plot could have been done for the I.Q. data of that problem.

3B.5 Goodness-of-Fit 115

3B.5 Rotating bends
A set of 15 observations was made on the number of million cycles to failure of certain industrial components known as 'rotating bends'. It was thought that the logarithms of these quantities might be normally distributed; these logarithms are given below (to base 10, with 1 added for convenience):

0.017  0.233  0.301  0.342  0.357  0.519  0.562  0.653  0.690  0.845  0.892  0.964  0.978  0.987  1.944

Produce a probability plot for the data, and discuss briefly whether you feel that the underlying distribution might be a normal distribution.
Solution
When producing a probability plot for a set of individual observations, one starts by rearranging the sample values in ascending order of magnitude. One then plots the observations, against appropriate quantities on the probability scale, on the linear axis of the probability paper. The most natural method might seem to be to plot observation i (in ascending order of magnitude) against i/n (expressed, of course, as a percentage), since i/n is the proportion of the sample lying at or below the value taken by observation i; these quantities are rather like the cumulative relative frequencies of Problem 3B.4. This is unsatisfactory, however, for a variety of reasons. (One rather basic reason is that the smallest observation can be plotted (against 1/n) but not the largest, since n/n, i.e. 100%, is off the edge of the paper.) A small adjustment overcomes this particular problem. A common practice, which we shall follow here, is to plot observation i (i = 1, ..., n) against i/(n+1) (expressed as a percentage). For the current data n = 15, so the smallest observation (in the current sample) is plotted against 1/16, or 6.25%, the next against 12.5%, and so on, the largest being plotted against 15/16, or 93.75%.

The resulting plot is shown in Figure 3.5.

[Figure 3.5 Probability plot: rotating bend data of Problem 3B.5. (Based on 'Chartwell' probability paper, produced by H. W. Peel & Company Limited.)]

116 Data Summarisation and Goodness-of-Fit 3B.5

We find the points to be arranged roughly in a straight line,
and conclude that there is no strong suggestion that the data do not come from a normal distribution.

Notes
(1) The technique used here is in essence the same as that for grouped data (see Problem 3B.4), except that we make use of all the information available and thus plot every point individually; the underlying ideas, however, like others following, are somewhat different.
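The plotting positions used in the solution are quickly generated; for n = 15 they run from 6.25% up to 93.75% in equal steps.

```python
# Plotting positions i/(n+1), as percentages, for a probability plot of
# n individual observations -- here the n = 15 rotating-bend sample.
n = 15
positions = [100 * i / (n + 1) for i in range(1, n + 1)]
print(positions[0], positions[1], positions[-1])  # 6.25 12.5 93.75
```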
3B.6 Goodness-of-Fit 117

(2) The use of i/(n+1) as the 'plotting position' also has some theoretical backing. This is rather hard to summarise, but we can indicate a key result. Suppose we select a random sample from the uniform distribution on the range (0, 1) and plot the values obtained as points on a line of unit length. The n points divide the line into n + 1 regions. The key result is that each of these regions has mean length 1/(n+1); on average, that is, the n points divide the line into n + 1 equal parts, so that the average position of observation i is i/(n+1), and it is this which gives i/(n+1) its theoretical support. (While we have presented this result for the special case of the uniform distribution, there is a generalisation covering any continuous distribution.)

(3) For small data sets, one cannot expect a plot to be decisive. A random sample of size 15 is still very small (though the plot suggested in Note 5 is decisive enough), and generally one needs at least 20 observations before one can have much confidence that a plot can distinguish normality from non-normality. (This point could usefully be explored with a class, with different members producing probability plots using samples of different sizes.)
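The key result quoted in Note (2) — that the i-th smallest of n uniform values has mean i/(n+1) — can be checked by a short simulation:

```python
import random

# Simulate the order statistics of n uniform(0,1) values and compare
# their average positions with i/(n+1).
random.seed(1)
n, reps = 4, 100_000
totals = [0.0] * n
for _ in range(reps):
    sample = sorted(random.random() for _ in range(n))
    for i, v in enumerate(sample):
        totals[i] += v

means = [t / reps for t in totals]
expected = [i / (n + 1) for i in range(1, n + 1)]   # 0.2, 0.4, 0.6, 0.8
print([round(m, 2) for m in means], expected)
```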
(4) The Notes to Problem 3B.4 are also relevant here. In particular, one can use the probability plot in Figure 3.5 to estimate the parameters of the normal distribution. The plotting position used for the transformed scale of the probability plot is fairly arbitrary. As noted above, one can object to the position i/n on grounds of lack of symmetry; such a requirement restricts one, in essence, to plotting positions of the form (i − b)/(n + 1 − 2b), but there is no general agreement on the best value for b. Popular choices are b = 0, b = 1/2, and in some cases (though we cannot give a justification here) b = 3/8. The plots from all these positions are, in practice, virtually indistinguishable, so one's judgement as to the straightness or otherwise of a set of points is not really likely to be affected.

(5) The reader will note that the data presented are the logarithms of the number of million cycles to failure of the components, and may wish to take antilogarithms and thus discover whether the original data could be regarded as normal. Clearly, this will be an instructive exercise, showing clearly that the normal distribution does not fit the original data.

(6) In many problems in this book, techniques have been used which depend for their validity on the data being normally distributed. One must not, of course, expect that with only a few observations available one will be able to decide firmly whether or not the underlying distribution is normal; and in practice many techniques discussed later, though assuming normality, are still reasonably effective whenever the distribution is not too obviously non-normal. Before proceeding with such analyses, most statisticians would routinely obtain a probability plot, or some equivalent test of normality; a probability plot is very useful in reassuring a statistician when normality is a fair assumption.

3B.6 Distribution of accidents in a factory
The table shows the number of recorded accidents in each week in a large factory, over a period of 100 weeks.

No. of accidents (x):               0    1    2    3    4    5    6    7 or more
No. of weeks with x accidents (f):  5   15   21   24   17   11    5    2

Use probability paper (i) to verify that a Poisson distribution provides a good fit for the data; (ii) to estimate the mean of the distribution; (iii) to estimate the probability that a particular future week will have more than 8 accidents, assuming that the same Poisson distribution remains valid in the future.
Solution
(i) The data are plotted on Poisson probability paper in Figure 3.6. The points are plotted on the curved lines which correspond to different values, c, of the random variable X. The position of each plotted point is determined by an estimate of Pr(X ≥ c), which is calculated as the proportion of the data taking values of c or more; the left-hand scale on the paper gives this exceedance probability. The idea is that if a Poisson distribution is appropriate then the plotted points will lie close to a straight line parallel to the vertical axis. In the present example the points do indeed lie close to such a line, with no obvious systematic deviation, so we conclude that the Poisson distribution provides a good fit to the data. The assessment of whether a Poisson distribution is a good fit, based on such a plot, is, of course, subjective.

118 Data Summarisation and Goodness-of-Fit 3B.6

[Figure 3.6 Poisson probability plot from accident data (Problem 3B.6). (Based on 'Chartwell' probability paper, produced by H. W. Peel & Company Limited.)]
(ii) If exact Poisson probabilities are plotted on Poisson probability paper, then they lie on a vertical line whose intercept on the horizontal axis is the mean of the distribution, as indicated on the right-hand scale of the paper. Thus the intercept of the fitted line provides an estimate of the mean. Of course, it is impossible to be very precise when fitting the line by eye; for the present example, the estimate is somewhere between 2.9 and 3.0. (Counting the class '7 or more accidents' as '7 accidents', the sample mean number of accidents in a week is 2.96.)

(iii) As stated above, the left-hand scale on Poisson probability paper gives the exceedance probabilities for a Poisson random variable with mean given on the horizontal scale. Observing where the line c = 9 intersects the plotted vertical straight line, and reading off on the left-hand scale, gives an estimate of the probability of more than eight accidents. The estimate is 0.003.

Note
The present problem illustrates the use of one of the less well-known types of probability paper, that for the Poisson distribution. Because such paper is relatively rare, we have included notes at the appropriate points in the solution, rather than gathering them together.
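The two numerical answers can be checked directly from the data, computing the sample mean and then the Poisson exceedance probability that the paper reads off graphically ('7 or more' counted as 7, as in the solution):

```python
import math

# Accident data: numbers of weeks with x accidents, x = 0, 1, ..., 7.
freq = [5, 15, 21, 24, 17, 11, 5, 2]
n = sum(freq)                                   # 100 weeks
mean = sum(x * f for x, f in enumerate(freq)) / n
print(mean)                                     # 2.96

# Pr(more than 8 accidents) for a Poisson variable with this mean.
lam = mean
p_le_8 = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(9))
print(round(1 - p_le_8, 3))                     # 0.003
```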
4 Inference

To most statisticians inference lies at the heart of the subject. The process of inference, of drawing conclusions from sample data about the entire, unknown, group from which the sample was randomly drawn, is in essence the universal process of scientific discovery, and shares its obvious importance. Applications of inference are naturally frequent in scientific areas, but not only in those areas: inference is used in opinion polling and in any form of sample enquiry, in educational research, and indeed in a vast range of applications.

In this book we cover many different types of problem in which inferences are required. In this chapter we concentrate on relatively straightforward cases involving the binomial, Poisson and normal distributions, and do not enter into the specialised types of inference discussed in Chapters 3 and 5. But even with this restriction the scope is still wide. Thus we may be working with a single random sample, or may be comparing results from two samples; we may (in the case of the normal distribution, particularly) be working with an extra, but unobservable, parameter, or may have no such difficulty. In each of these cases an inference may be required either in the form of a test of an appropriate hypothesis or in the form of the production of a confidence interval.

We have divided this chapter into four sections. The first two deal with the normal distribution: Section 4A deals with problems involving a single sample and Section 4B with two-sample problems. Section 4C deals with discrete data problems, and in particular those involving the binomial and Poisson distributions. Finally, in Section 4D we present a variety of problems not falling into any of the earlier categories.
Some types of problem, for example regression, analysis of variance and goodness-of-fit testing problems, are dealt with in other chapters. The inferences may be required either in the form of confidence intervals or in the form of tests of hypotheses, and in the latter case we may need one-tailed or two-tailed tests; but it is important to keep in mind that all are particular facets of inference.

4A One Sample: Normal Distribution
In this first section we confine attention to the case in which a single random sample is selected from a normal distribution, and inferences are required about any unknown parameters. Using the conventional notation N(μ, σ²), the possible problems are those of making inferences about μ when σ is known (of very limited applicability in practice, but useful as a stepping-stone on the way to dealing with more realistic cases) and of making inferences about one or both of the parameters when neither is known. (The case where μ is known but σ is not rarely occurs in practice.)
4A.1 One Sample: Normal Distribution 121

Notation for inferences relating to normal distributions is virtually standardised, and it is convenient to note our conventions here. X̄ will always be a sample mean, and s² denotes a sample variance (using the divisor n − 1). For the standardised normal distribution, we denote the point above which a proportion α lies (sometimes known as the 100α percentage point, or written as the 100α% point) by z_α; thus z0.05 is the point above which 5% of N(0, 1) lies, and a simple and convenient notation is z0.05 = 1.645, with z0.01 for the 1% point, z0.001 for the 0.1% point, and so on. The symbol z seems to be emerging as a near-standard, and we shall be using it freely here and elsewhere.

A similar notation will be used for the other distributions, except that t and χ² have a parameter, the number of 'degrees of freedom', attached to them, and the F-distribution has two such parameters. Notation for percentage points of the t, χ² and F distributions is not as standardised; we use subscripts to denote the numbers of degrees of freedom, so that, for example, the 5% point of the t-distribution on 8 degrees of freedom will be written t8,0.05 (its value is 1.860), the 1% point of the χ² distribution on 10 degrees of freedom (which takes the value 23.21) will be written χ²10,0.01, and the 0.1% point of the F-distribution on 3 and 7 degrees of freedom (18.77) will be denoted by F3,7,0.001. When working algebraically, we modify the notation in a natural way, as in t(n−1),α/2, the point in the t-distribution on n − 1 degrees of freedom above which a proportion α/2 lies; note that, for clarity, when the number of degrees of freedom is a function it will sometimes be placed in brackets. Further examples follow.

4A.1 Lengths of manufactured bolts
Bolts are manufactured with a nominal length of 5 cm. A random sample of 10 bolts is taken from a box containing a large number of bolts, and their lengths (in cm) are found to be

5.13  5.29  5.36  5.39  5.45  5.52  5.68  5.71  5.77  5.82

It is known from past experience that the variance of the lengths of such bolts is 0.05 cm².
(a) Find 95% confidence limits for the mean length, stating clearly any assumptions made in deriving the limits.
(b) Without doing a formal test of a hypothesis, discuss whether μ = 5 cm is a plausible hypothesis, given the result in part (a).
(c) By looking only at the number of bolts, out of 10, whose lengths are greater than 5 cm, construct a formal test of the null hypothesis H0: μ = 5 cm against the alternative H1: μ ≠ 5 cm.
Solution
(a) We make the following assumptions:
(i) The variance, σ², of the lengths of the bolts is known, and is 0.05 cm².
(ii) The observations are normally distributed, so that the sample mean X̄ has the distribution N(μ, σ²/n).
With these assumptions, 95% confidence limits are given by X̄ ± 1.96σ/√n. For the present data, x̄ = 5.512, σ² = 0.05 and n = 10, so the limits are 5.512 ± 1.96√(0.05/10), i.e. 5.512 ± 0.139, or 5.373 and 5.651. (Another way of expressing this is to say that (5.373, 5.651) is a 95% confidence interval for μ.)

(b) The value μ = 5 cm is a long way outside the 95% confidence limits for μ, and therefore, given the result in part (a), μ = 5 cm does not seem to be a plausible value for the parameter μ.
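The arithmetic of part (a) is a one-line calculation:

```python
import math

# 95% confidence limits for the mean bolt length: known variance 0.05,
# n = 10, sample mean 5.512; limits are xbar +/- 1.96 * sigma / sqrt(n).
xbar, var, n, z = 5.512, 0.05, 10, 1.96

half_width = z * math.sqrt(var / n)
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 3), round(upper, 3))   # 5.373 5.651
```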
(c) Each bolt is independently chosen, and each has the same probability of having a length greater than 5 cm. Thus the number Y of bolts longer than 5 cm is a binomial random variable with the number of 'trials' equal to 10. The probability of 'success' depends upon μ: if μ = 5 cm and the distribution is symmetric (though not necessarily normal) then this probability is ½. Thus, making the assumption of symmetry, the null hypothesis H0: μ = 5 cm is equivalent to a hypothesis about the binomial parameter. The test statistic is simply Y; the observed value is y = 10, and the probability of obtaining a value of Y as extreme as the observed value is

Pr(Y = 10) + Pr(Y = 0) = (½)¹⁰ + (½)¹⁰ = (½)⁹ = 0.002.

Since the probability found is less than 0.05, H0 is rejected at the 5% level; indeed H0 would be rejected at the 1% level (also, of course, at the 5% and 10% levels) but not at the 0.1% level.

122 Inference 4A.1

Notes
(1) The statement of the problem clearly implies that the variance is to be treated as known (i.e. that assumption (i) should be made). Assumption (ii), that observations are normally distributed, is needed because the sample size used is rather small. From the Central Limit Theorem, the sample mean is approximately normally distributed, regardless (almost!) of the distribution of the individual observations; however, the result is asymptotic, and cannot be assumed to be effective when the sample size is as small as 10.
Suppose now that σ had not been given, and so had to be treated as unknown rather than as a fixed, known, constant. Then the expression X̄ ± z_α/2 σ/√n for the confidence limits would need to be replaced by X̄ ± t(n−1),α/2 s/√n, where s² is the sample variance and t(n−1),α/2 is the value exceeded with probability α/2 by a random variable with the t-distribution on n − 1 degrees of freedom. In the present example,

s² = (1/(n−1)) { Σxi² − (Σxi)²/n } = (1/9) (304.2874 − (55.12)²/10) = 0.05177,

and t9,0.025 = 2.26, so the limits become 5.512 ± 2.26√(0.05177/10), or 5.349 and 5.675. The value of s² is very similar to the value specified earlier for σ², but the limits for μ are wider, since t-distributions have longer tails than does N(0, 1). This reflects the additional uncertainty arising from the fact that σ is unknown.

(2) There is an equivalence between tests of hypotheses and confidence intervals. A test of H0: μ = μ0 against H1: μ ≠ μ0 will have as test statistic Z = (X̄ − μ0)/(σ/√n), and H0 will be rejected in a test of size (or significance level) α if the value z taken by Z is such that |z| ≥ z_α/2. But z will be in this rejection region if, and only if, μ0 is outside the corresponding confidence interval for μ with confidence coefficient 1 − α. In the present example, since μ0 = 5 cm is outside the 95% confidence interval found in part (a), it follows that H0: μ = 5 cm will be rejected at the 5% level; in fact z = (5.512 − 5.000)/√(0.05/10) = 7.24, so that H0 will in this case be rejected also at very much smaller significance levels.
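The binomial probability computed in part (c) — all ten bolts longer than 5 cm, two-sided under p = ½ — checks out directly:

```python
from math import comb

# Sign-test probability for part (c): the chance, under p = 1/2, of a
# result as extreme as y = 10 out of n = 10 (i.e. Y = 10 or Y = 0).
n = 10
p_extreme = (comb(n, n) + comb(n, 0)) * 0.5 ** n
print(round(p_extreme, 3))   # 0.002
```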
4A.2 One Sample: Normal Distribution 123

(3) The solution given to part (c) is not the only acceptable one, since in principle other types of test could be constructed. The test used in the solution is, in fact, known as the sign test, and is a nonparametric test, relying on an assumption of symmetry rather than a specific distributional assumption such as normality. Since the test converts the hypothesis into one concerning p, the probability that an observation is greater than 5 cm, one is really testing the hypothesis that the population median is 5 cm; when the distribution is symmetric, the mean and median are, of course, identical, while without the assumption of symmetry little could be done.
4A.2 Lifetimes of electrical components
Consider a confidence interval, for the mean of a normal distribution with known variance σ², based on a random sample of n observations. How does the width of the interval change (i) as n is increased, (ii) as σ² is increased, (iii) as α is decreased? The standard deviation of the lifetime of a certain type of electrical component is 144 hours. How large a sample of the components must be taken to be (a) 95%, (b) 99% confident that the error in the estimated mean lifetime of such components will not exceed (i) 15 hours, (ii) 20 hours?

Solution
The confidence interval has endpoints X̄ ± z_α/2 σ/√n, so its width is 2 z_α/2 σ/√n. Thus (i) as n increases, keeping σ² and α fixed, the width decreases and is proportional to 1/√n; (ii) as σ² increases, keeping n and α fixed, the width increases and is proportional to σ; (iii) as α decreases, keeping n and σ² fixed, the width increases, since the width is proportional to z_α/2 and z_α/2 increases as α decreases. These three results are all intuitively reasonable: larger sample sizes are needed to achieve higher degrees of confidence, and also to obtain narrower confidence intervals.

We now assume that lifetimes are normally distributed, and that the term 'error in the estimated mean' is interpreted as the half-width, w, of an appropriate confidence interval. (Other interpretations are discussed in Note 4.) Thus w = z_α/2 σ/√n, and we require w ≤ w0, where w0 = 15, 20 in (i), (ii) respectively. Now w ≤ w0 implies z_α/2 σ/√n ≤ w0, or n ≥ (z_α/2 σ/w0)². We are given that σ = 144, and from tables of the normal distribution we find z_α/2 = 1.96 and 2.58 in (a) and (b) respectively. Thus we have, for case (a)(i), n ≥ (1.96 × 144/15)², and similarly in the three other cases, rounding up to the nearest integer (see Note 6).
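The four sample sizes follow by running the formula n ≥ (z σ/w0)² over both confidence levels and both half-width limits, rounding up as Note 6 requires:

```python
import math

# Required sample sizes n >= (z * sigma / w0)^2, rounded up to the
# nearest integer, for sigma = 144 hours.
sigma = 144
sizes = {}
for z in (1.96, 2.58):        # 95% and 99% confidence
    for w0 in (15, 20):       # half-width limits, in hours
        sizes[(z, w0)] = math.ceil((z * sigma / w0) ** 2)

print(sizes)   # {(1.96, 15): 355, (1.96, 20): 200, (2.58, 15): 614, (2.58, 20): 346}
```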
Notes
(1) The assumption of normality is almost certainly intended here, even though distributions of lifetimes are often positively skewed and hence not normal. No alternative distribution seems very natural; and in any case, with the large values of n found in the solution, the mean of the observations will be approximately normally distributed (by the Central Limit Theorem) even if individual observations are not, so the solution will provide a guide to the size of sample needed. There are also possibilities for transforming data before an analysis to reduce skewness: see, for example, the data in Problems 3B.5 and 4B.4, where logarithms of the original values were used.

(2) In the solution we assumed normality of the observations, although general results can be obtained from Tchebychev's inequality, which is valid for any distribution and rests on no specific distributional assumptions. The inequality states that, for any random variable Y, Pr(|Y − E(Y)| > k√var(Y)) < 1/k², where k is any positive constant. Since X̄ has mean μ and variance σ²/n, kσ/√n can be interpreted as the 'error in the estimated mean'. For the probability to be greater than or equal to 0.95, put 1/k² = 0.05, so that k² = 20, i.e. k = 4.47. Then kσ/√n ≤ w0 implies that n ≥ k²σ²/w0² = 20σ²/w0², so that, for case (a)(i), n ≥ 20 × (144)²/15² = 1843.2. Because Tchebychev's inequality holds for any distribution, the confidence interval which it gives for μ is generally much wider than a corresponding confidence interval based on specific distributional assumptions; hence the sample size needed to achieve a given width has to be much greater when it is based on Tchebychev's inequality.
(3) Another interpretation of the problem is that we are required to find the minimum value of n such that Pr(|X̄ − μ| ≤ w0) ≥ 1 − α. Since X̄ − μ ~ N(0, σ²/n), Pr(|X̄ − μ| ≤ w0) = Pr(|Z| ≤ w0√n/σ), where Z ~ N(0, 1); for this probability to be greater than or equal to 1 − α we need w0√n/σ ≥ z_α/2, so we have the same answer as before.

(4) Other interpretations of 'error in the estimated mean' include the full width of a confidence interval for the mean, and the standard error of the mean. The full width of the interval is 2w, so for that interpretation the required sample sizes will be 2², or 4, times those for the half-interval; the standard error of the mean is σ/√n = w/z_α/2, so for that interpretation the required sample sizes will be (1/z_α/2)² times those for w.

(5) If the variance, σ², is not known, but an estimate s² is available based on an initial (pilot) sample of size m, then n ≥ (t(m−1),α/2 s/w0)² will provide a guide to the size of sample needed in order to provide a confidence interval of half-width less than or equal to w0 (unless n ≤ m, in which case no further observations are needed). Usually the estimate of σ² used in the confidence interval will be based on all the n observations taken, and will be different from s², so the estimated value for n will be approximate; but it is more likely to overestimate than to underestimate, since t(m−1),α/2 > z_α/2.
4A.3 One Sample: Normal Distribution 125

(6) For case (a)(i), (z_α/2 σ/w0)² = 354.04. But n must be an integer, since it is a sample size; if n = 354, then w is (very slightly) greater than w0, so we need n ≥ 355 in order to obtain w ≤ w0. Similar considerations apply in the three other cases.

4A.3 The diameters of marbles
A machine produces marbles whose diameters are normally distributed with mean 12.00 mm. After modification of the machine, the diameters of a random sample of 105 marbles produced were found to have mean 12.010 mm and standard deviation 0.050 mm. Would you conclude that the modification has affected the mean diameter?

Solution
Since the population standard deviation is unknown, a z-test cannot be used, and a t-test is appropriate. The t-statistic is defined as t = (x̄ − μ0)/(s/√n), so here

t = (12.010 − 12.000)/(0.050/√105) = 2.049,

and this must be compared with percentage points of the t-distribution on n − 1, i.e. 104, degrees of freedom. The null hypothesis is that the modification has not affected the mean diameter, i.e. that μ = 12.00 mm; the alternative is two-sided, i.e. that μ ≠ 12.00 mm, so a two-tailed test is needed. The two-tailed 5% point of t104 being 1.98, the result is just significant at the 5% level, and we conclude that the mean diameter has been affected. Such a level is commonly regarded as providing moderate, if not strong, evidence against a null hypothesis.
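The t-statistic is quickly confirmed from the three summary figures:

```python
import math

# One-sample t statistic for the marble data:
# n = 105, xbar = 12.010, s = 0.050, H0: mu = 12.000.
n, xbar, s, mu0 = 105, 12.010, 0.050, 12.000

t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 3))   # 2.049
```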
Notes
(1) Had the problem referred to the modification increasing rather than affecting the mean diameter, the alternative hypothesis would have been one-sided, i.e. that μ > 12.00 mm, and a one-tailed test would have been required. The t-statistic is still 2.049, but the 5% critical value is now 1.66. The conclusion is the same; the hypothesis is still rejected at the 5% level but not at the 1% level. Similar considerations apply, of course, at other significance levels.

(2) With 104 degrees of freedom, in this case, as in others when the sample size exceeds about 30, the t-statistic can be considered as approximately distributed as N(0, 1), so that the more easily memorable percentage points of N(0, 1), e.g. 1.96 for a two-tailed 5% point, may be used as an approximation.

(3) The problem might easily have asked for a 95% confidence interval to be given for μ, instead of for a test of a hypothesis. The confidence interval argument would have started from the fact that

Pr(X̄ − 1.98 s/√n ≤ μ ≤ X̄ + 1.98 s/√n) = 0.95.
such an interpretation would be wrong.980. . where s2 is the sample variance.) (4) Although it is correct to conclude. the critical value.) ( 5 ) The use by statisticians of the word ‘significant’ gives rise to much confusion. and interest centres on the variance u2. This question will often be of interest. though.050mm.010. say N ( p . Would you report to the manager that the new machine has a better performance? Solution With data of this sort we assume that the weights are independently normally distributed.00 from the 95% twosided interval is merely a reexpression of the statement that on a 5% twotailed test the hypothesis p = 12. amount.H I : u2 < 02.00 seems to come just on the edge of this confidence interval. for which we have HO: u2 = 08 = 15.010mm and s. Had one of the sample values been only a little smaller. of course. we have %Ti 9713 and 2: = 9435347.1 . 953. and would be judged highly significant. n 4A. 975. . The difficulty lies in the fact that the word has slightly different technical and nontechnical meanings. 12. X + 1 .049 is only just larger than 1. The mean p is unknown. o r ‘significant’. a hypothesis test cannot tell us by how much the mean for A exceeds that for B . denoting the 10 sample members by xl.22 . 977.X ) 2 = 1110.90.020). n = 105 gives the answer. 9 8 s / f i ) and substituting X = 12.4 Variability i weight of bottled fruit The manager of a bottling plant is anxious to reduce the variability in net weight of fruit bottled. The true meaning of ‘significant’.050. u2). and the net weights (in grams) in randomly selected bottles (all of the same nominal weight) are 987. From the ‘significance test’ one might decide that ‘the mean yield for A is significantly higher than that for B’. note. 0. in this context. one reason for using large samples is to enable us to detect very small differences. as in the solution.1.1. As we have seen. or important.98. 
is simply that the observed result signijies that the null hypothesis is felt to be implausible and is thus rejected. most practising statisticians would take note of the fact that the value 2. that we have not said by how much it is greater. This is.d. To focus ideas we consider the example of a crop experiment in which the aim is to compare the effects of two fertilisers A and B . A new machine is introduced. the difference between the sample means is sufficiently large to indicate to us that the true mean for A is greater than that for B . Indeed. if we were to work to unrealistic accuracy. 966. Now the natural interpretation of this is that A is better than B by an amount which is ‘significant’. and statisticians are. and the complementary technique of confidence intervals aims to answer it. averse to basing decisions on evidence as slender as this.805. 955. 9 8 s / f i . viz.00. and hence we cannot say that A is better by an important. The null hypothesis value 12.. generally. Unfortunately. and hence = Z(xi . However. In other words.4 The 95% confidence interval for p is thus (X . we would find the lower limit of the confidence interval to be just above 12. 981. (See also Note 2 to Problem 4A. T h e exclusion (if only just!) of this point p = 12. 967. Over a long period the standard deviation has been 15.2gm. For tests concerning the variance of a normal distribution the appropriate statistic is C = ( n l)s2/u$ . s = 0.98. applying each fertiliser to a random sample of plants and using a t test to compare the sample mean yields. the result might not have been significant at the 5% level. so that A would be likely to give much better results than B . and not unnaturally these can be confused. . the value of t from the combined sample of 210 would now be 2. (12. which reflects the fact that the tstatistic in the solution was very close to the critical value 1.000. 
In such cases one might recommend that further evidence be obtained with a view to clarifying the matter. (nl)s2.126 Inference 4A. x r o . Under the null hypothesis .00 is rejected. 972. (Note that if a second random sample of 105 also had mean 12. that the mean diameter has been affected. so C = 4.
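The arithmetic above can be checked with a short Python sketch (Python, and the helper name t_statistic, are ours rather than the book's; the percentage point 1.98 is the table value quoted in the solution):

```python
from math import sqrt

# One-sample t-test from summary statistics, as in Problem 4A.3.
def t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))

t = t_statistic(12.010, 12.000, 0.050, 105)
print(round(t, 2))   # 2.05, just beyond the two-tailed 5% point 1.98

# The matching 95% confidence interval of Note (3):
half_width = 1.98 * 0.050 / sqrt(105)
print(round(12.010 - half_width, 4), round(12.010 + half_width, 4))
# lower limit just above 12.0000, as Note (3) observes
```
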
4A.4 Variability in weight of bottled fruit

The manager of a bottling plant is anxious to reduce the variability in net weight of fruit bottled. Over a long period the standard deviation has been 15.2 gm. A new machine is introduced, and the net weights (in grams) in randomly selected bottles (all of the same nominal weight) are

987, 966, 955, 977, 981, 967, 975, 980, 953, 972.

Would you report to the manager that the new machine has a better performance?

Solution

With data of this sort we assume that the weights are independently normally distributed, say N(μ, σ²). The mean μ is unknown, and interest centres on the variance σ². For tests concerning the variance of a normal distribution the appropriate statistic is

C = (n − 1)s²/σ₀²,

where s² is the sample variance. Here we have H₀: σ² = σ₀² = 15.2² and H₁: σ² < σ₀², a one-sided alternative, since we wish to accept the suggestion that the new machine has a better performance only if the sample standard deviation is sufficiently small. Denoting the 10 sample members by x₁, x₂, ..., x₁₀, we have Σxᵢ = 9713 and Σxᵢ² = 9 435 347, and hence

(n − 1)s² = Σ(xᵢ − x̄)² = Σxᵢ² − (Σxᵢ)²/n = 1110.1,

so C = 1110.1/15.2² = 4.80. Under the null hypothesis, C has the χ² distribution with 9 degrees of freedom. In this case the lower tail of the distribution is needed, and the lower 5% point is 3.33. Since C = 4.80 > 3.33, the null hypothesis that the standard deviation is unchanged at 15.2 gm cannot be rejected, so we would not report to the manager that the new machine is better.

Notes

(1) The χ² distribution is asymmetric, and many people find using its percentage points tricky. In working out what values of χ² constitute the lower tail, it may be helpful to remember that E(C) = ν, where ν is the number of degrees of freedom. So in the case being discussed, χ² with 9 degrees of freedom, values of C above 9 are towards the upper tail, and values less than 9 towards the lower tail. (A fuller discussion of this point is given in Note 4 to Problem 3B.)

(2) Suppose the problem had asked for a two-sided 95% confidence interval for σ². To obtain a confidence interval from first principles, one starts with a pivotal function, i.e. a function of the sample members and the parameter of interest alone, whose distribution is known. In the present case C = (n − 1)s²/σ² is such a function, and its distribution is indeed known: χ² with 9 degrees of freedom. Using tables of the χ² distribution, we can thus write

Pr(2.70 ≤ (n − 1)s²/σ² ≤ 19.02) = 0.95,

with equal tail areas being omitted at each end of the range. The confidence interval for σ² is thus

((n − 1)s²/19.02, (n − 1)s²/2.70),

and since (n − 1)s² = 1110.1, this leads to the interval (58.36, 411.15). The corresponding 95% confidence interval for σ, the standard deviation, is found by simply taking square roots, giving (7.64, 20.28). (Another example is given in Problem 4B.3.)

(3) With many modern calculators it is quite feasible to work directly with raw data (as we did above) when obtaining sums and sums of squares, but one must naturally be careful to avoid errors resulting from truncation, particularly when finding the square of a number with many significant digits. For the data above, we might well have coded by subtracting 950, and indeed this would have been very desirable had the data been analysed without using a calculator.
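The test statistic and the intervals of Note (2) can be reproduced with a brief Python sketch (the code is ours; the χ² points 3.33, 19.02 and 2.70 are the table values quoted above):

```python
# Chi-squared test for a normal variance, following Problem 4A.4.
data = [987, 966, 955, 977, 981, 967, 975, 980, 953, 972]
n = len(data)
ss = sum(x * x for x in data) - sum(data) ** 2 / n   # (n-1)s^2
C = ss / 15.2 ** 2                                   # test statistic
print(round(ss, 1), round(C, 2))                     # 1110.1 4.8

# C exceeds the lower 5% point 3.33 of chi-squared on 9 df, so H0
# is not rejected.  Two-sided 95% interval for the variance (Note 2):
print(round(ss / 19.02, 2), round(ss / 2.70, 2))     # 58.36 411.15
```
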
4A.5 Changes in weight of manufactured product

(a) For a certain type of manufactured product, the weight is expected to be 350 gm, and the standard deviation is known to be 20 gm. The mean weight of 9 randomly selected items was 362 gm. Was there evidence of a change in weight, and if so at what level of significance?

(b) The process was changed, but with the standard deviation remaining the same. Nine randomly selected items had weights (in grams) of

327, 331, 350, 397, 368, 367, 359, 374, 385.

Is there now evidence that the mean weight has been increased?

(c) For the data in part (b), would it be reasonable to suggest that the standard deviation has probably changed?

Solution

(a) The population standard deviation (σ, say) is known to be 20 gm, so that a t-test is unnecessary. We test the null hypothesis that the mean μ is 350 against a two-sided alternative, and calculate

z = (x̄ − μ₀)/(σ/√n) = (362 − 350)/(20/√9) = 1.80.

This lies between the familiar percentage points 1.645 and 1.96, so the result is significant at the 10% level only. We conclude that the evidence for a change is only very slight.

(b) The natural estimate of μ is provided again by the sample mean x̄. Here n = 9 and (coding the observations by subtracting 300 from each) the sum of the observations is 558 and the sum of squares is 38 894, so x̄ is again 362. The test statistic z is therefore still 1.80, but the wording of the problem shows that this time a one-tailed test is needed. The 5% point is therefore 1.645; the observed value exceeds this, and we would conclude that there is reasonable evidence for an increase in mean weight.

(c) We now need to test the hypothesis σ = 20 against a two-sided alternative, a change having occurred. The test of a general null hypothesis σ = σ₀ is based on the statistic C = (n − 1)s²/σ₀², which under this null hypothesis has a χ² distribution on n − 1 degrees of freedom. Using the coded data as in part (b),

(n − 1)s² = 38 894 − 558²/9 = 4298.

The test statistic C is thus 4298/20² = 10.745, to be compared with tables of χ² on 8 degrees of freedom. Since the lower and upper 10% points of χ²₈ are 3.49 and 13.36, the value observed lies comfortably in the central portion of the distribution. We do not, therefore, reject the null hypothesis σ = 20, and we conclude that the evidence does not suggest that the standard deviation has probably changed.

Notes

(1) All the analyses undertaken in this problem assume that the observations are a random sample from a normal distribution. Such assumptions are commonly left unstated, but it is really a good idea always to make assumptions explicit. It would not be possible to assess randomness without information as to how sampling was conducted; normality can in principle be tested informally using a probability plot, or more formally using a goodness-of-fit test such as the χ² test, though with a sample of size only 9 it would be almost impossible to determine whether the underlying distribution is normal. The problems in Section 3B show how such tests can be done.

(2) Notwithstanding the comments in Note 1, z-tests and t-tests are not very sensitive to failure of an assumption of normality, in the sense that if the distribution is actually not normal, but of a not utterly dissimilar shape, the size (i.e. the significance level) of a test will be close to the nominal size. This result follows from the Central Limit Theorem. By contrast, tests on variances are distinctly more sensitive to non-normality, and F-tests (for comparing two sample variances) are extremely sensitive.

(3) This problem is based on a real examination question. But the analysis required in part (b), assuming, as in the formal solution here, that the standard deviation was unaltered, falls short of normal statistical practice. It is hard to imagine circumstances in which one could be so sure that the standard deviation was unaltered that one would wish to analyse data without first checking the assumption. In practice one would usually avoid such an assumption by using a t-test.
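The calculations for all three parts can be sketched in a few lines of Python (the code is ours; the coded values below are the nine weights of part (b) minus 300, matching the sums 558 and 38 894 used in the solution):

```python
from math import sqrt

# z-test with known standard deviation, as in parts (a) and (b) above.
sigma, n, mu0, xbar = 20, 9, 350, 362
z = (xbar - mu0) / (sigma / sqrt(n))
print(round(z, 2))   # 1.8: between 1.645 and 1.96

# Part (c): chi-squared statistic for testing sigma = 20,
# using the weights coded by subtracting 300.
coded = [27, 31, 50, 97, 68, 67, 59, 74, 85]
ss = sum(d * d for d in coded) - sum(coded) ** 2 / len(coded)
print(round(ss), round(ss / sigma ** 2, 3))   # 4298 10.745
```
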
4A.6 Estimating the mean and variance

(a) Discuss the two expressions

sₙ² = (1/n) Σ (xᵢ − x̄)²  and  s² = (1/(n − 1)) Σ (xᵢ − x̄)²,

both of which are used to measure the spread of a set of observations x₁, x₂, ..., xₙ.

(b) A random sample of n observations is taken from a distribution; the sum of the observations is t₁, and the sum of the squares of the observations is t₂. Explain how to estimate the mean and the variance of the distribution from which the random sample was taken.

(c) Given the random sample described in part (b), write down expressions (based on t₁ and t₂) for estimates of the mean and variance of the mean of a further, independent, random sample of size m from the original distribution.

(d) Given that n = 25, t₁ = 400 and t₂ = 8800, construct a 99% confidence interval for the mean of the distribution, and use it to test whether or not this mean could be 20.

Solution

(a) Given a set of n observations x₁, x₂, ..., xₙ, the quantity sₙ² = (1/n) Σ (xᵢ − x̄)² provides a suitable measure if we simply wish to summarise the 'spread' of the observations. It can be thought of as the variance of the discrete probability distribution which assigns probability 1/n to each of the values x₁, x₂, ..., xₙ. If, however, the n observations are a random sample from some larger (possibly infinite, possibly hypothetical) population or distribution, then we may want to use the sample to make inferences about the variance, σ², of that population or distribution, and then s² = (1/(n − 1)) Σ (xᵢ − x̄)² is usually preferred as an estimate. There are various reasons for this preference (and some reasons, too, why sₙ² might be used), but the main one is that s² is obtained from an unbiased estimator for σ², whereas sₙ², as well as being a summary measure for the sample, is not. Of course, if n is large, it makes very little practical difference whether s² or sₙ² is used.

(b) The obvious estimate of the mean, μ, of the distribution is the sample mean x̄ = t₁/n. Similarly, the obvious estimate of the variance σ² is s², which can be written as

s² = (t₂ − t₁²/n)/(n − 1).

(c) If the original distribution has mean μ and variance σ², then the sample mean of a random sample of size m has a distribution with mean μ and variance σ²/m. The obvious estimates of these quantities are, from (b),

t₁/n  and  (t₂ − t₁²/n)/(m(n − 1)).

(d) Now x̄ = t₁/n = 400/25 = 16.0, and s² = (8800 − 400²/25)/24 = 100.0, so s = 10. Assuming that the distribution is normal, confidence limits for the mean are of the form x̄ ± t₍ₙ₋₁₎,0.005 s/√n. For n = 25 we have t₂₄,0.005 = 2.80, so the limits of the 99% confidence interval are 16.0 ± 2.80 × 10/√25, i.e. 16.0 ± 5.6. Thus the required interval is (10.4, 21.6). This interval includes the value 20, so on the basis of this confidence interval the hypothesis that μ = 20 would be accepted at the 1% level.

Notes

(1) No assumptions are asked for, or spelled out, in part (d). Either the assumption of normality is intended, or perhaps it could be assumed that n = 25 is large enough for the sample mean to be normally distributed, regardless of the distribution of individual observations. In the latter case we might replace t₍ₙ₋₁₎,α/2 by z_α/2; in either case the expression x̄ ± z_α/2 s/√n might be used for the confidence interval.

(2) The value 20 hypothesised for μ is not very far inside the interval. If a 90% interval is constructed, rather than a 99% interval, then, using t₂₄,0.05 = 1.71, the interval is (12.58, 19.42), which does not include 20. Thus the hypothesis μ = 20 would have been rejected at the 10% level. (See the Notes to Problem 4A.3, amongst other places, for further discussion of the links between confidence intervals and tests of hypotheses.)
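Part (d), together with the 90% interval of Note (2), can be checked with a small Python sketch (the function name mean_ci is ours; 2.80 and 1.711 are the table points of t on 24 degrees of freedom quoted above):

```python
from math import sqrt

# Confidence interval for the mean from the summary statistics t1, t2,
# as in Problem 4A.6(d).
def mean_ci(t1, t2, n, t_point):
    xbar = t1 / n
    s2 = (t2 - t1 ** 2 / n) / (n - 1)
    h = t_point * sqrt(s2 / n)
    return xbar - h, xbar + h

lo, hi = mean_ci(400, 8800, 25, 2.80)
print(round(lo, 1), round(hi, 1))   # 10.4 21.6 (99% interval)
lo, hi = mean_ci(400, 8800, 25, 1.711)
print(round(lo, 2), round(hi, 2))   # 12.58 19.42 (90% interval)
```
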
4B Two Samples: Normal Distribution

We now move on to the naturally more complex case where we have observations from two normal distributions, and the object is to compare corresponding parameters of the two distributions. In many respects the analyses of means are, mathematically, straightforward extensions of single-sample procedures, but there are also techniques needed for two-sample problems which do not derive directly from the single-sample case. For example, when comparing the variances of two samples the appropriate analysis requires a test based on the F-distribution.

4B.1 Weights of tins of peas

(a) On a particular day a random sample of 12 tins of peas is taken from the output of a canning factory, and their contents are weighed. The mean and standard deviation of weight for the sample are 301.8 gm and 1.8 gm respectively. Find 99% confidence limits for the mean weight of peas in tins produced by the factory on the day in question.

(b) On the following day a further random sample of 12 tins is taken, and the mean and standard deviation of contents for this sample are 302.8 gm and 1.6 gm respectively. Assuming that the variances of the weights are the same on the two days, show that a 95% confidence interval for the difference between mean weights on the two days includes zero.

(c) Assume now that the samples on both days are from the same population. Find a 99% confidence interval for the mean weight of tins in that population, based on both samples.

Solution

(a) The confidence limits for the mean net weight of tins produced on the first day have the general form x̄ ± t₍ₙ₋₁₎,0.005 s/√n, as was given in part (d) of the solution to Problem 4A.6. For the present data x̄₁ = 301.8, s₁ = 1.8, n₁ = 12 and t₁₁,0.005 = 3.106, so the limits are 301.8 ± 3.106 × 1.8/√12, which become 300.19 gm and 303.41 gm.

(b) The phrasing of this part of the question differs from that of part (a), in that it asks for a confidence interval rather than confidence limits; but confidence limits are simply the endpoints of a confidence interval. Making the suggested assumption of equal variances for the populations of tins produced on the two days, the confidence interval for the difference in means between the two days has the general form

(x̄₂ − x̄₁) ± t₍ₙ₁₊ₙ₂₋₂₎,α/2 √( sₚ²(1/n₁ + 1/n₂) ),

where sₚ² is the pooled variance for the two samples (see Note 3), and the remaining notation is conventional. For the present data x̄₂ = 302.8, s₂ = 1.6, n₂ = 12 and t₂₂,0.025 = 2.074, while sₚ² = 2.90, so the required interval has endpoints

1.0 ± 2.074 × √( 2.90 × (1/12 + 1/12) ),

i.e. 1.0 ± 1.44. The interval is therefore (−0.44, 2.44), which includes zero. (An alternative, simpler, interval is discussed in Note 2.)

(c) Since the confidence interval for the difference between mean weights includes zero, the assumption specified, that the samples come from a common population, seems not unreasonable. Given two independent random samples of sizes n₁ and n₂, with sample means x̄₁ and x̄₂ and with sample variances s₁² and s₂², the mean and variance of the combined sample are given by

x̄ = (n₁x̄₁ + n₂x̄₂)/(n₁ + n₂)

and

s² = [ (n₁ − 1)s₁² + (n₂ − 1)s₂² + n₁n₂(x̄₁ − x̄₂)²/(n₁ + n₂) ] / (n₁ + n₂ − 1).

(See Problem 4B.3 for a fuller discussion of these formulae.) Here

x̄ = [ (12 × 301.8) + (12 × 302.8) ] / 24 = 302.3  and  s² = 3.03.

The required confidence interval has the same form as that in part (a), now taking the values just calculated for the combined sample of 24 observations. Thus the interval has endpoints x̄ ± t₂₃,0.005 s/√24, and since t₂₃,0.005 = 2.807 these become 302.3 ± 2.807 × √(3.03/24), i.e. 302.3 ± 1.00. The interval is therefore (301.30, 303.30).

Notes

(1) In each part of this problem, a confidence interval or pair of confidence limits is required, with confidence coefficient 1 − α. In fact, any interval of the form

( x̄ − t₍ₙ₋₁₎,α₁ s/√n, x̄ + t₍ₙ₋₁₎,α₂ s/√n ),

where α₁ + α₂ = α, has the same coverage probability (1 − α). The most usual choices of α₁, α₂ are (i) α₁ = α₂ = α/2, leading to the usual (two-sided) confidence interval; (ii) α₁ = α, α₂ = 0, leading to a lower confidence limit, and a corresponding one-sided confidence interval with no upper bound; and (iii) α₁ = 0, α₂ = α, leading to an upper confidence limit, and a corresponding one-sided confidence interval with no lower bound. Occasionally a single confidence limit, or equivalently a one-sided confidence interval, is wanted, and the current problem illustrates a situation where this might be considered appropriate. Suppose that the tins are labelled as weighing 300 gm net, and that it is a legal requirement for the mean weight of contents of all tins to be at least 300 gm. The management will then be more interested in a lower limit for the mean weight than in an upper limit (though, of course, if the mean weight is too far above 300 gm, profits will be eroded). A lower limit, with confidence coefficient 1 − α, is x̄ − t₍ₙ₋₁₎,α s/√n, so that a 99% lower confidence limit for the mean weight, based on the first sample, is 301.8 − (2.718 × 1.8/√12), or 300.39 gm. The corresponding one-sided confidence interval consists of all values greater than 300.39 gm. Although this type of interval is substantially different from the two-sided 99% interval, in particular because it has no upper limit, it still has the same probability of covering the true mean weight. To find one-sided limits, all that is needed is to look at one, rather than both, of the two-sided limits, but to replace α/2 by α in the cut-off point of the appropriate distribution (z, t, etc.). One-sided limits, and intervals, can readily be found in other situations (e.g. intervals for variances, differences between means, binomial parameters, etc.) by simple modifications of the expressions for two-sided intervals. Finally, the equivalence between (two-sided) confidence intervals and (two-tailed) tests of hypotheses was discussed in Note 2 to Problem 4A.6; there is similarly an equivalence between one-sided confidence intervals and one-tailed tests of hypotheses. A test of H₀: μ = μ₀ against H₁: μ > μ₀ (or H₁: μ < μ₀) will reject H₀ at significance level α if and only if μ₀ is below (above) the one-sided lower (upper) confidence limit for μ.

(2) There is an unspoken assumption, underlying all the confidence intervals in the problem, that weights are normally distributed. For the confidence interval in part (b) to be valid, one must also assume that the variances for the two populations of interest are equal. In the present case s₁² and s₂² are close enough for a formal test to be unnecessary but, for completeness, we perform the test; if the assumption of equal variances is in doubt, it can be tested using the F-test (see the Note to Problem 4B.3). The ratio of sample variances is easily seen to be 1.8²/1.6² = 1.27 and, since F₁₁,₁₁,0.05 = 2.82 > 1.27, we would accept the null hypothesis that the two variances are equal. (The result is not even significant at the 10% level, on a two-tailed test.) When the variances are not equal there is, in general, no single standard form for a confidence interval for the difference in means, unless n₁ and n₂ are large. In that event an interval whose confidence coefficient is approximately 1 − α is given by

(x̄₂ − x̄₁) ± z_α/2 √( s₁²/n₁ + s₂²/n₂ ),

using a natural notation. The reasoning is that, for large enough samples, the sample variances will be good estimates of the true variances, so the expression is an approximation to

(x̄₂ − x̄₁) ± z_α/2 √( σ₁²/n₁ + σ₂²/n₂ ),

which gives a confidence interval with confidence coefficient exactly 1 − α when the variances σ₁² and σ₂² are known and normality can be assumed. (Because of the Central Limit Theorem, the latter expression is still approximately valid in the absence of normality.) To illustrate its use for the current data, the interval has endpoints

1.0 ± 1.96 × √( (1.8)²/12 + (1.6)²/12 ),

i.e. 1.0 ± 1.96 × 0.695, leading to the interval (−0.36, 2.36). The sample sizes are really too small here to guarantee a good approximation but, because s₁² and s₂² are very similar, the interval given by the approximation is not too different, numerically, from that found earlier; the only difference between the two is the replacement of t₂₂,0.025 = 2.074 by z₀.₀₂₅ = 1.96, which makes the latter interval somewhat narrower than it should be.

(3) The so-called pooled variance sₚ², as used in part (b), is given for the two samples by

sₚ² = [ (n₁ − 1)s₁² + (n₂ − 1)s₂² ] / (n₁ + n₂ − 2),

which reduces to ½(s₁² + s₂²) when n₁ = n₂. Note that sₚ² is, in general, not the same as the quantity s² used in part (c), the sample variance which would be obtained if we treated all n₁ + n₂ observations as coming from a single population. In part (c) a confidence interval could alternatively have been constructed using sₚ² instead of s²; such an interval will generally be slightly wider, since the interval based on s² uses the additional information that the two samples are drawn from populations with the same mean, and hence gains a degree of freedom. In the present example sₚ² and s² are very close, so the two intervals differ only slightly.
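The pooled-variance interval of part (b) can be sketched in Python as follows (the function name pooled_ci is ours; 2.074 is the 2.5% point of t on 22 degrees of freedom quoted in the solution, and the summary figures are those of the problem as reconstructed above):

```python
from math import sqrt

# Pooled-variance confidence interval for a difference of two means,
# as in Problem 4B.1(b).
def pooled_ci(x1, s1, n1, x2, s2, n2, t_point):
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    h = t_point * sqrt(sp2 * (1 / n1 + 1 / n2))
    return (x2 - x1) - h, (x2 - x1) + h

lo, hi = pooled_ci(301.8, 1.8, 12, 302.8, 1.6, 12, 2.074)
print(round(lo, 2), round(hi, 2))   # the interval includes zero
```
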
4B.2 Experimental teaching of arithmetic

In a study of a new method of teaching arithmetic, claimed to improve on conventional techniques, 400 children were divided at random into two groups A and B, of sizes 250 and 150 respectively. Those in group A were taught by conventional methods, while those in group B were taught using the experimental method. After completion of the course all the pupils were given the same test paper, and the group scores were as follows.

Group A: mean 67.8, variance 60.4.
Group B: mean 70.2, variance 55.6.

Is there any evidence that method B is effective? State carefully your null hypothesis, alternative hypothesis and conclusion.

Solution

We use a natural notation, with a subscript A or B indicating the group concerned. The nA = 250 observations on method A will be assumed to be independent N(μA, σA²), and similarly the nB = 150 on B are N(μB, σB²). The null hypothesis will be that the two methods are equally effective, i.e. we have H₀: μA = μB. Since B is an experimental method, a natural alternative to consider is whether it improves on the standard method, so we choose the one-sided alternative hypothesis H₁: μA < μB.

Since the population variances σA² and σB² are unknown, a natural test to consider is a two-sample t-test, which depends for its validity on equality of these two variances; here the sample variances (though clearly not the means) are very close, so such an assumption is very reasonable. The appropriate test statistic for the hypothesis H₀ is then

t = (x̄A − x̄B) / √( s²(1/nA + 1/nB) ),

where s², the pooled estimate of the common population variance, is given by

s² = [ (nA − 1)sA² + (nB − 1)sB² ] / (nA + nB − 2),

and the statistic t has the t-distribution on nA + nB − 2 degrees of freedom.
Straightforward calculation now gives
s² = (249 × 60.4 + 149 × 55.6)/398 = 58.60,

so that

t = (67.8 − 70.2) / √( 58.60 × (1/250 + 1/150) ) = −3.036.
The t-distribution on 398 degrees of freedom is, for all practical purposes, the same as N(0, 1). The calculated value of |t| is 3.036; for a one-tailed test at the 5% level the critical value is 1.645, and the corresponding critical values for 1% and 0.1% tests are 2.326 and 3.090. We therefore reject the hypothesis at the 1% level (if not quite at the 0.1% level), and conclude that there is strong evidence that method B improves on its competitor.

Notes

(1) The plausibility of the assumption that σA² = σB² can be judged formally by setting up that assumption as a hypothesis and testing it. The test for equality of variances of two normal distributions is an F-test with test statistic sA²/sB², here equal to 1.086. This has to be compared with percentage points of the F-distribution on 249 and 149 degrees of freedom, using a two-tailed test (unless there is some reason to do otherwise, not hinted at in the statement of the present problem). Unfortunately, few published tables deal in such high numbers of degrees of freedom, but one can sometimes get a lower bound for a percentage point by looking at the '∞' row or column of tables of F. So, for example, we find that the 10% point (on a two-tailed test) of F∞,₁₅₀ is 1.223, and the corresponding point of F₂₄₉,₁₄₉ will be larger than this. Since the observed ratio is only 1.086, the hypothesis of equality of σA² and σB² cannot be rejected. (It is possible to be more precise than this, since there are formulae, quoted in some books of tables, giving approximate percentage points for use when the numbers of degrees of freedom are large.)

(2) In the solution we assumed the data to be normally distributed when deriving the t-test. In practice, when samples are as large as they are here, the Central Limit Theorem gives assurance that the sample means have distributions adequately approximated by normal distributions, unless the distribution of the individual observations is very odd.

(3) While arguments based on the t-test are logically sound, there is an alternative two-sample test for equality of normal means. This is available when the samples are large (i.e. the theory justifying it is asymptotic) but it does have the advantage of not requiring the assumption that σA² = σB². The equivalent confidence interval procedure is described in Note 2 to Problem 4B.1.
(4) In the special case in which nA = nB = n, say, the test of the null hypothesis μA = μB is carried out by comparing the statistic

(x̄A − x̄B) / (s √(2/n))

with percentage points of the t-distribution on 2n − 2 degrees of freedom. Now if a two-tailed test is required, one can perform the test by comparing |x̄A − x̄B| with t s√(2/n), where t represents the appropriate percentage point of the t-distribution. What we are doing is, therefore, to compare the observed difference between the sample means with

t_ν s √(2/n),

where ν is the number of degrees of freedom, here 2n − 2. The quantity displayed is known as the Least Significant Difference, and an equivalent quantity is used in Problems 5B.1 and 5B.3.
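The two-sample t-test of the solution can be reproduced from the summary statistics with a short Python sketch (the function name two_sample_t is ours, not the book's):

```python
from math import sqrt

# Two-sample pooled t-statistic from summary statistics, as in the
# solution of Problem 4B.2.
def two_sample_t(x1, v1, n1, x2, v2, n2):
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))

t = two_sample_t(67.8, 60.4, 250, 70.2, 55.6, 150)
print(round(t, 3))   # -3.036, beyond the one-tailed 1% point -2.326
```
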
4B.3 Milk yield of cows
(a) Two independent random samples of sizes n₁, n₂ are available, with sample means x̄₁, x̄₂ and sample variances s₁², s₂² respectively. If the two samples are combined to give a single sample of size n₁ + n₂, with sample mean x̄ and variance s², show that

x̄ = (n₁x̄₁ + n₂x̄₂)/(n₁ + n₂)

and

s² = [ (n₁ − 1)s₁² + (n₂ − 1)s₂² + n₁n₂(x̄₁ − x̄₂)²/(n₁ + n₂) ] / (n₁ + n₂ − 1).

(b) A random sample of 10 dairy cows is taken from a large herd at Farm A, and the weekly milk yield, in kg, is recorded for each cow in the sample. Similar measurements are made for a random sample of 15 cows from a large herd at Farm B. The sample mean and sample variance for Farm A are 142 kg and 440 kg², whereas the sample mean and sample variance for the combined sample of 25 cows are 158 kg and 816 kg² respectively. Calculate the mean and variance for the sample of cows from Farm B.

(c) Find a 95% confidence interval for the variance of milk yield at Farm A.

(d) Repeat part (c) for Farm B, and discuss carefully, without doing any further formal analysis, whether the variances at the two farms could be equal.
Solution

(a) For the sample mean, we have

x̄ = (sum of all n₁ + n₂ observations)/(n₁ + n₂)
  = ( Σ x₁ᵢ + Σ x₂ᵢ )/(n₁ + n₂), with obvious notation,
  = (n₁x̄₁ + n₂x̄₂)/(n₁ + n₂).

To derive the formula for the sample variance we make use of an identity for the sum of squares of any set of observations y₁, y₂, ..., yₙ about any constant c, viz.

Σ (yᵢ − c)² = Σ (yᵢ − ȳ)² + n(ȳ − c)²,   (*)

where ȳ is the sample mean of the n observations. From first principles, the sample variance of the n₁ + n₂ observations is given by

s² = {sum of squares about the common mean x̄}/(n₁ + n₂ − 1).

Now, from equation (*), we have for the first sample

Σ (x₁ᵢ − x̄)² = Σ (x₁ᵢ − x̄₁)² + n₁(x̄₁ − x̄)² = (n₁ − 1)s₁² + n₁(x̄₁ − x̄)²,

and similarly for the second sample

Σ (x₂ᵢ − x̄)² = (n₂ − 1)s₂² + n₂(x̄₂ − x̄)².

Hence

(n₁ + n₂ − 1)s² = (n₁ − 1)s₁² + (n₂ − 1)s₂² + n₁(x̄₁ − x̄)² + n₂(x̄₂ − x̄)².

Now, since

x̄ = (n₁x̄₁ + n₂x̄₂)/(n₁ + n₂),

we obtain

x̄₁ − x̄ = n₂(x̄₁ − x̄₂)/(n₁ + n₂),

and similarly

x̄₂ − x̄ = n₁(x̄₂ − x̄₁)/(n₁ + n₂).

Combining all these we reach the required result,

(n₁ + n₂ − 1)s² = (n₁ − 1)s₁² + (n₂ − 1)s₂² + n₁n₂(x̄₁ − x̄₂)²/(n₁ + n₂).
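The combining formulae just derived can be verified numerically with a Python sketch (the two small samples below are illustrative data of our own, not from the problem):

```python
import statistics as st

# Numerical check of the part (a) formulae: pool two samples and
# compare with direct computation on the combined data.
s1 = [1.0, 2.0, 4.0, 7.0]
s2 = [3.0, 5.0, 6.0, 8.0, 9.0]
n1, n2 = len(s1), len(s2)
m1, m2 = st.mean(s1), st.mean(s2)
v1, v2 = st.variance(s1), st.variance(s2)

m = (n1 * m1 + n2 * m2) / (n1 + n2)
v = ((n1 - 1) * v1 + (n2 - 1) * v2
     + n1 * n2 * (m1 - m2) ** 2 / (n1 + n2)) / (n1 + n2 - 1)

assert abs(m - st.mean(s1 + s2)) < 1e-9
assert abs(v - st.variance(s1 + s2)) < 1e-9
print(round(m, 6), round(v, 6))   # 5.0 7.5
```
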
(b) This is simply an application of part (a). For the means, we have (in kg)

x̄₂ = [ (n₁ + n₂)x̄ − n₁x̄₁ ] / n₂ = (25 × 158 − 10 × 142)/15 = 168.67 kg.

For the variances,

s₂² = [ 24 × 816 − 9 × 440 − (10 × 15/25)(142 − 168.67)² ] / 14
    = (19584 − 3960 − 4266.67)/14
    = 811.24 kg².
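The part (b) recovery of the Farm B summaries can be sketched in Python (the variable names are ours; the figures are those given in the problem):

```python
# Recovering the Farm B mean and variance from the combined summaries,
# by rearranging the part (a) formulae, as in Problem 4B.3(b).
n1, m1, v1 = 10, 142, 440      # Farm A: size, mean (kg), variance (kg^2)
n, m, v = 25, 158, 816         # combined sample of 25 cows
n2 = n - n1
m2 = (n * m - n1 * m1) / n2
v2 = ((n - 1) * v - (n1 - 1) * v1
      - n1 * n2 * (m1 - m2) ** 2 / n) / (n2 - 1)
print(round(m2, 2), round(v2, 2))   # 168.67 811.24
```
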
(c) Confidence intervals for a variance are based on the χ² distribution; for example, the interval for σ₁² has the form

( (n₁ − 1)s₁²/χ²₍ₙ₁₋₁₎,α/2 , (n₁ − 1)s₁²/χ²₍ₙ₁₋₁₎,1−α/2 ),

where χ²₍ₙ₁₋₁₎,α/2 and χ²₍ₙ₁₋₁₎,1−α/2 are the upper and lower ½α points, respectively, of the χ² distribution with (n₁ − 1) degrees of freedom. For a 95% interval, χ²₉,0.025 = 19.02 and χ²₉,0.975 = 2.70, from standard statistical tables, so the interval is

( 9 × 440/19.02, 9 × 440/2.70 ) = (208.20, 1466.67).
(d) The required interval is

( (n₂ − 1)s₂²/χ²₍ₙ₂₋₁₎,0.025 , (n₂ − 1)s₂²/χ²₍ₙ₂₋₁₎,0.975 ),

and from a table of the χ² distribution we find that χ²₁₄,0.025 = 26.12 and χ²₁₄,0.975 = 5.63, so the interval is

( 14 × 811.24/26.12, 14 × 811.24/5.63 ) = (434.81, 2017.29).

There is a considerable amount of overlap between the intervals for σ₁² and σ₂², so it seems plausible that the variances for the two farms could be the same.
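Both intervals can be computed with the same small Python helper (the function name var_ci is ours; the χ² points are the table values quoted in parts (c) and (d)):

```python
# 95% confidence intervals for the two farm variances, as in
# Problem 4B.3, parts (c) and (d).
def var_ci(n, s2, chi_upper, chi_lower):
    return (n - 1) * s2 / chi_upper, (n - 1) * s2 / chi_lower

a_lo, a_hi = var_ci(10, 440, 19.02, 2.70)     # Farm A
b_lo, b_hi = var_ci(15, 811.24, 26.12, 5.63)  # Farm B
print(round(a_lo, 2), round(a_hi, 2))   # 208.2 1466.67
print(round(b_lo, 2), round(b_hi, 2))   # 434.81 2017.29
```
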
* Note
Despite the conclusion above, it is not necessarily the case that overlap between confidence intervals implies that equality of the two parameters is plausible. Consider, for example, the normal means $\mu_1$ and $\mu_2$ when the corresponding variances $\sigma_1^2$, $\sigma_2^2$ are known. A confidence interval for $\mu_1 - \mu_2$ has half-width $z_{\alpha/2}(\sigma_1^2/n_1 + \sigma_2^2/n_2)^{1/2}$ (cf. Note 2 to Problem 4B.1), so equality of $\mu_1$ and $\mu_2$ is implausible if

$$|\bar{x}_1 - \bar{x}_2| > z_{\alpha/2}\bigl(\sigma_1^2/n_1 + \sigma_2^2/n_2\bigr)^{1/2}.$$

However, individual intervals for $\mu_1$, $\mu_2$ will overlap if

$$|\bar{x}_1 - \bar{x}_2| < z_{\alpha/2}\,\sigma_1/\sqrt{n_1} + z_{\alpha/2}\,\sigma_2/\sqrt{n_2}.$$

Now $z_{\alpha/2}(\sigma_1/\sqrt{n_1} + \sigma_2/\sqrt{n_2}) \geq z_{\alpha/2}(\sigma_1^2/n_1 + \sigma_2^2/n_2)^{1/2}$. (In the simplest case, when $\sigma_1 = \sigma_2$ and $n_1 = n_2$, the ratio of $(\sigma_1/\sqrt{n_1} + \sigma_2/\sqrt{n_2})$ to $(\sigma_1^2/n_1 + \sigma_2^2/n_2)^{1/2}$ is $\sqrt{2}$.) We see, therefore, that overlap between intervals can occur when $\mu_1 = \mu_2$ is implausible.

The formal test of $H_0$: $\sigma_1^2 = \sigma_2^2$ was not required here, but it is based on the $F$-distribution; $H_0$ is rejected in favour of $H_1$: $\sigma_1^2 \neq \sigma_2^2$ if

$$\frac{s_1^2}{s_2^2} > F_{(n_1-1),(n_2-1),\,\alpha/2} \quad\text{or}\quad \frac{s_2^2}{s_1^2} > F_{(n_2-1),(n_1-1),\,\alpha/2}.$$

(Since these percentage points of $F$ are always greater than 1, we need only consider whichever of $s_1^2/s_2^2$ and $s_2^2/s_1^2$ is greater than one, and compare it with the relevant $F$-distribution.) In the present example, $s_2^2/s_1^2 = 1.84$ and $F_{14,9,\,0.025} = 3.80$, so $H_0$ would be accepted. This test can, in fact, be adapted to give a confidence interval for $\sigma_1^2/\sigma_2^2$ of the form

$$\left(\frac{s_1^2/s_2^2}{F_{(n_1-1),(n_2-1),\,\alpha/2}},\ \frac{s_1^2}{s_2^2}\,F_{(n_2-1),(n_1-1),\,\alpha/2}\right).$$

But $F_{9,14,\,0.025} = 3.21$, so the interval is

$$\left(\frac{1}{1.84\times3.21},\ \frac{3.80}{1.84}\right) = (0.17,\ 2.07),$$
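The variance-ratio interval just quoted can be reproduced directly. The sketch below is not part of the original text; the $F$ percentage points 3.80 and 3.21 are hardcoded from the tables cited above.

```python
# A check (not from the original text) of the variance-ratio interval in the
# Note, using the F percentage points quoted there.
s1sq, s2sq = 440.0, 811.24
F_14_9, F_9_14 = 3.80, 3.21   # upper 2.5% points of F on (14,9) and (9,14) df

ratio = s1sq / s2sq           # about 1/1.84
lower = ratio / F_9_14
upper = ratio * F_14_9

assert abs(lower - 0.17) < 0.005
assert abs(upper - 2.07) < 0.02

# The null value sigma1^2/sigma2^2 = 1 is comfortably inside the interval.
assert lower < 1.0 < upper
```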
4B.4 Speeding up mathematical tasks

Eight schoolchildren, chosen at random from the first year of a large school, were given, without prior warning, a mathematical task, and the time taken (in minutes) by each child to complete the task was recorded. The following day the children were instructed how to perform such tasks efficiently, and a week later they were tested again on a similar task. Once again, the time taken to complete the task was recorded for each child, and the results were as follows.

Time taken (minutes)

Child                1   2   3   4   5   6   7   8
Before instruction  26  20  17  21  23  24  21  18
After instruction   19  14  13  16  19  18  16  17

(a) Find a 90% confidence interval for the mean time taken by first year children (i) before instruction, and (ii) after instruction, assuming that times are normally distributed.

(b) Find a 90% confidence interval for the mean difference between times before and after instruction, for first year children.

(c) Approximately how many children would have been needed in the sample in order to achieve a confidence interval in part (b) whose total width is 2 minutes?

Solution

(a) The familiar formula $\bar{x} \pm t_{(n-1),\,0.05}\, s/\sqrt{n}$ gives the required confidence intervals. We shall use a subscript 1 to denote measurements made before instruction and subscript 2 for the later ones.

Before instruction, $\sum_{i=1}^{8} x_{1i} = 170$ and $\sum_{i=1}^{8} x_{1i}^2 = 3676$, so that $\bar{x}_1 = 21.25$. Since $t_{7,\,0.05} = 1.895$, the 90% confidence limits for the mean time before instruction are $21.25 \pm 2.02$, i.e. the 90% confidence interval is (19.23, 23.27). After instruction, $\sum_{i=1}^{8} x_{2i} = 132$ and $\sum_{i=1}^{8} x_{2i}^2 = 2212$, and the corresponding limits become $16.5 \pm 1.48$, so the 90% confidence interval is (15.02, 17.98). (Coding the data would simplify the arithmetic a little; see Problem 3A.1.)

(b) To obtain an interval for the difference between means on the two occasions, we use the fact that the data are paired. The interval is then

$$\bar{d} \pm t_{7,\,0.05}\, s_d/\sqrt{8},$$
where $\bar{d}$ is the mean difference for the sample, and $s_d^2$ is the sample variance of the individual differences. The individual differences are

7, 6, 4, 5, 4, 6, 5, 1,

so $\sum_{i=1}^{8} d_i = 38$ and $\sum_{i=1}^{8} d_i^2 = 204$. Hence

$$\bar{d} = \frac{38}{8} = 4.75 \;(= 21.25 - 16.5 = \bar{x}_1 - \bar{x}_2), \quad\text{and}\quad s_d^2 = \frac{204 - 38^2/8}{7} = 3.357.$$
The 90% confidence limits for the mean difference (or, equivalently, the difference between means) are therefore
$$4.75 \pm 1.895\sqrt{3.357/8} = 4.75 \pm 1.23.$$
The 90% confidence interval is thus (3.52, 5.98).

(c) The width of the confidence interval in part (b) is $2\,t_{(n-1),\,0.05}\, s_d/\sqrt{n}$, and so it is proportional to $1/\sqrt{n}$. If we took a larger sample, then $t_{(n-1),\,0.05}$ would decrease, and $s_d$ would change, but to get an approximate idea of the sample size needed to achieve a given width, we will keep these quantities fixed. A sample of size 8 gives an interval of width 2.46, so to get the width down to 2 (minutes), we will need a sample size of approximately $8\times(2.46)^2/(2)^2 = 12.1$, and rounding up gives a required sample size of 13 children.
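The sample-size approximation in part (c) is a one-line calculation. The sketch below is not part of the original text; it simply exploits the width's proportionality to $1/\sqrt{n}$.

```python
# A check (not from the original text) of the sample-size approximation in
# part (c): the interval width is proportional to 1/sqrt(n), so scale from
# the observed width 2.46 at n = 8 to the target width of 2 minutes.
width8, target = 2.46, 2.0
n = 8 * (width8 / target) ** 2

assert abs(n - 12.1) < 0.01   # rounding up gives 13 children
```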
Notes

(1) The expression for a confidence interval for a difference between two means used in Problem 4B.1 would not be appropriate here, because it does not take into account the correlation between the times for each child (those who were fastest before instruction also tend to be fastest after instruction). Because of this correlation, the quantity $s_p^2(1/n_1 + 1/n_2)$ overestimates the variance of $\bar{X}_1 - \bar{X}_2$, hence leading to an interval that is wider than necessary, as we shall now verify numerically. The erroneous interval is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{(n_1+n_2-2),\,0.05}\; s_p\sqrt{1/n_1 + 1/n_2}.$$
Here the erroneous 90% confidence interval has endpoints

$$4.75 \pm 1.761\sqrt{6.96\left(\frac{1}{8} + \frac{1}{8}\right)},$$

and the interval is thus (2.43, 7.07). We see that the failure to take into account the paired nature of the data leads to a 'confidence interval' whose width is almost twice as great as that of the correct one. (A related matter is discussed in Note 1 to Problem 4B.5.)

(2) As with Problem 4B.1, the situation in this problem is one where a one-sided confidence interval might be of interest. Suppose, for example, that the instruction period involves expensive equipment. A lower confidence limit for the mean improvement in speed provided by the instruction might then be required before deciding on the purchase of expensive equipment.
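Both intervals of Note 1 can be reproduced from the raw times. The sketch below is not part of the original text; the $t$ percentage points (1.895 on 7 d.f., 1.761 on 14 d.f.) are hardcoded from tables, as in the text.

```python
# A check (not from the original text) of the correct paired interval and the
# erroneous two-independent-samples interval of Note 1.
import math
import statistics

before = [26, 20, 17, 21, 23, 24, 21, 18]
after = [19, 14, 13, 16, 19, 18, 16, 17]
d = [b - a for b, a in zip(before, after)]

# Correct (paired) 90% interval, using t_{7,0.05} = 1.895 from tables.
half = 1.895 * statistics.stdev(d) / math.sqrt(8)
paired = (statistics.mean(d) - half, statistics.mean(d) + half)

# Erroneous (unpaired) 90% interval, using t_{14,0.05} = 1.761 from tables.
sp2 = (7 * statistics.variance(before) + 7 * statistics.variance(after)) / 14
half2 = 1.761 * math.sqrt(sp2 * (1 / 8 + 1 / 8))
diff = statistics.mean(before) - statistics.mean(after)
erroneous = (diff - half2, diff + half2)

assert abs(paired[0] - 3.52) < 0.01 and abs(paired[1] - 5.98) < 0.01
assert abs(erroneous[0] - 2.43) < 0.01 and abs(erroneous[1] - 7.07) < 0.01
```

The erroneous interval's width (about 4.65) is indeed nearly twice the correct width (about 2.46).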
*
As before, there is an equivalence between one-sided (two-sided) confidence intervals for differences between means and corresponding one-tailed (two-tailed) tests of hypotheses regarding differences in means. When differences between means (or between binomial proportions; see Problems 4C.5 and 4C.6) are of interest, it is much more common to be asked to set up a test of the hypothesis of no difference between means (or proportions), rather than to construct a confidence interval. The latter procedure is nevertheless a useful one, and it can be used to test the hypothesis of equal means (proportions) simply by seeing whether or not zero is contained in the interval.

(3) If we wished to get a slightly better estimate of the required sample size, we could take account of the variation in $t_{(n-1),\,0.05}$ as $n$ varies, although it will only decrease from 1.895 to 1.645 as $n$ goes from 8 to infinity, a relatively small amount of change compared with the possible variations in $s_d$ as the sample size is increased. However, there is little we can do to take account of change in $s_d$, so we consider it to be fixed. The calculations in part (c) suggest a sample size of 13, but since the value of $t$ should be less than 1.895, it may be possible to achieve the required width with a smaller sample size. In fact, for $n = 12$, the width is
$$2\,t_{11,\,0.05}\times s_d/\sqrt{12} = 2\times1.796\times\sqrt{3.357/12} = 1.90.$$
For n = 11, the width is
$$2\times1.812\times\sqrt{3.357/11} = 2.002,$$
so the required width of 2 minutes is achieved, by these (still approximate) calculations, when $n = 12$, but it is not quite achieved for $n = 11$. (It will, of course, be achieved for $n = 11$ if $s_d$ for the larger sample is slightly smaller than that of the original sample. Conversely, it may not be achieved for larger $n$ if $s_d$ increases.)
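The refinement just described can be checked numerically. The sketch below is not part of the original text; the $t$ points $t_{11,0.05} = 1.796$ and $t_{10,0.05} = 1.812$ are hardcoded from tables.

```python
# A check (not from the original text) of the Note 3 refinement, using the
# t percentage points quoted there.
import math

sd = math.sqrt(3.357)   # sample s.d. of the differences

w12 = 2 * 1.796 * sd / math.sqrt(12)   # width at n = 12
w11 = 2 * 1.812 * sd / math.sqrt(11)   # width at n = 11

assert w12 < 2.0 < w11          # width 2 achieved at n = 12, not at n = 11
assert abs(w12 - 1.90) < 0.01
assert abs(w11 - 2.002) < 0.01
```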
4B.5 Peeling potatoes faster An experiment was conducted to compare the performance of two potato peelers, and in particular to discover whether the typical user might be able to peel potatoes faster with one rather than the other. Ten volunteers were used, and each was given both peelers for a period before the experiment in order to gain familiarity with them. In the experiment the volunteer subjects used both peelers on standardised amounts of potatoes, and then repeated the experiment with the peelers used in the opposite order, so as to eliminate any effect due to ordering. The table below gives, for each subject, the mean of the natural logarithms of time (in seconds) needed to complete the tasks.
Subject    1     2     3     4     5     6     7     8     9     10
Peeler A   2.34  2.79  1.91  2.60  2.03  1.80  1.81  2.00  1.98  2.30
Peeler B   2.33  2.76  1.91  2.62  2.01  1.77  1.81  1.99  1.97  2.26
Use Student's $t$-test, at the 5% level of significance, to test whether the peelers differ in their efficiency.
Solution
For data of this type a paired-sample $t$-test is required. We therefore form the 10 differences $d_1, \ldots, d_{10}$ between the results for peelers A and B, viz.

0.01, 0.03, 0.00, -0.02, 0.02, 0.03, 0.00, 0.01, 0.01, 0.04.
Treating them as a random sample from $N(\mu, \sigma^2)$, we test the null hypothesis $\mu = 0$ (with $\sigma^2$ unknown). In a natural notation, we thus compare

$$t = \frac{\bar{d}}{s/\sqrt{10}}$$

with the two-tailed 5% points of Student's $t$-distribution on 9 degrees of freedom, i.e. with $\pm2.262$. Since $\bar{d} = 0.013$ and $s = 0.0177$, the value of $t$ is 2.327. We thus reject the null hypothesis, and conclude that the peelers differ in their mean level of efficiency.

Notes
* (1) For a paired-sample $t$-test to be justified, the differences must be a random sample from a normal distribution. A simple model which leads to the required condition is that

result = effect of subject + effect of peeler + random error,

as long as the random errors are themselves normally distributed. This model is in fact just a restatement of the standard model leading to a two-way analysis of variance, discussed in Problem 5B.3. In fact the paired-sample $t$-test is just a special case of two-way analysis of variance, though it is customarily justified using a more intuitive approach.

(2) The problem asks for a test that the two peelers differ. This does indicate, of course, that a two-tailed test is required, but is, as a hypothesis, too vague to use as a null hypothesis. (There is no way in which such a hypothesis could be formally disproved.)

(3) The paired-sample test is used because the 20 observations arose through both peelers being used by the same 10 volunteers. Had there been 20 volunteers, 10 randomly allocated to peeler A and the remaining 10 to peeler B, the two-sample $t$-test (i.e. the test for two independent samples, as discussed in Problem 4B.2) would have been needed.
(4) If one peeler were more efficient than the other, one would expect it to have the effect of reducing the time required by a fraction rather than by a constant. Similarly, a good subject might do the job in half the time, rather than in 10 seconds less. For these reasons, it seems implausible for a model such as that in Note 1 to apply to the times themselves, but if one takes logarithms the model might well be a reasonable one.
There are many other circumstances in which one feels instinctively that a multiplicative rather than an additive model is likely to be more appropriate; the analysis then almost invariably starts by taking logarithms. This is particularly true in the area of economics, in which many quantities (inflation, wage claims, discounts) are naturally discussed in terms of percentages or proportions.

(5) The data used in this problem are recorded to 3 significant digits. But the test used is based on the differences $d_1, d_2, \ldots, d_{10}$, and through cancellation these are only available to one significant digit. In practice most statisticians would argue that in these circumstances the $t$-statistic could not be calculated at all accurately, and that the results from a test ought therefore to be regarded with some caution. In the solution the value of $t$ was found to be 2.327, and the appropriate percentage point was 2.262. These values are really too close for us to feel confident in rejecting the null hypothesis at the 5% level. Even if we were to reject the hypothesis, there remains the fact that the observed differences are very small. The discussion in Note 5 to Problem 4A.3 of the distinction between statistical and practical significance is of considerable relevance here.
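The $t$-statistic in the Solution can be reproduced from the ten differences. This sketch is not part of the original text; note the minus sign on the fourth difference (subject 4 was faster with peeler B).

```python
# A check (not from the original text) of the paired t-test for the peeler data.
import math
import statistics

d = [0.01, 0.03, 0.00, -0.02, 0.02, 0.03, 0.00, 0.01, 0.01, 0.04]

t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

assert abs(statistics.mean(d) - 0.013) < 1e-9
assert abs(statistics.stdev(d) - 0.0177) < 0.0001
assert abs(t - 2.327) < 0.005   # just beyond the two-tailed 5% point, 2.262
```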
*
4B.6 Estimating a ratio
Independent normal random variables $X$ and $Y$ have means $\mu$ and $\rho\mu$ respectively, and both have variance 1. The ratio $\rho$ of their means is of particular interest. Show that the distribution of $Y - \rho X$ depends upon $\rho$, but not upon $\mu$. A single observation is made on each of $X$ and $Y$, resulting in values $x$ and $y$ respectively. Use your result to test the hypothesis $\rho = 1$ against an alternative $\rho \neq 1$ at the 5% level when

(i) $x = 0.3$ and $y = 2.7$,
(ii) $x = 0.3$ and $y = 1.5$.
Solution

We are given that $X \sim N(\mu, 1)$ and $Y \sim N(\rho\mu, 1)$ independently. We thus obtain

$$E(Y - \rho X) = E(Y) - \rho E(X) = \rho\mu - \rho\mu = 0$$

and, by independence,

$$\mathrm{Var}(Y - \rho X) = \mathrm{Var}(Y) + \rho^2\,\mathrm{Var}(X) = 1 + \rho^2.$$

Since linear combinations of normally distributed random variables are themselves normally distributed, we obtain

$$Y - \rho X \sim N(0,\ 1+\rho^2),$$

depending on $\rho$ alone, as required.

To test a hypothesis we examine the sampling distribution of an appropriate statistic, when the hypothesis is taken as true. The statistic $Y - \rho X$ is appropriate, since the unwanted parameter $\mu$ is eliminated, and under the null hypothesis that $\rho = 1$

$$Y - X \sim N(0, 2),$$

so that

$$\frac{1}{\sqrt{2}}(Y - X) \sim N(0, 1).$$

For case (i) the observed value of $Y - X$ is $y - x = 2.4$, and $2.4/\sqrt{2} = 1.697$, which we must compare with the well-known percentage points of $N(0,1)$, using a two-tailed test. Since $-1.96 < 1.697 < +1.96$, we do not reject the hypothesis that $\rho = 1$ on the 5% two-tailed test, despite the observed ratio $2.7/0.3$ of 9.0. We would naturally expect from this that for case (ii) the hypothesis will again not be rejected, and so it turns out. Now $y - x$ is 1.2, with a corresponding standardised deviate of $1.2/\sqrt{2}$, which is clearly not outside the range $(-1.96, +1.96)$.
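Both cases, and the confidence-interval quadratic examined in the Notes to this problem, can be checked numerically. The sketch below is not part of the original text; it treats the 95% confidence set for $\rho$ as the set of $\rho_0$ with $(y - \rho_0 x)^2 \leq 1.96^2(1 + \rho_0^2)$.

```python
# Numerical checks (not from the original text) for problem 4B.6.
import math

def z_stat(x, y):
    """Standardised statistic (Y - X)/sqrt(2) under rho = 1."""
    return (y - x) / math.sqrt(2)

assert abs(z_stat(0.3, 2.7) - 1.697) < 0.001   # case (i): not significant
assert abs(z_stat(0.3, 1.5) - 0.849) < 0.001   # case (ii): not significant

def ci_quadratic_roots(x, y, zsq=1.96 ** 2):
    """Roots of rho0^2 (x^2 - zsq) - 2*x*y*rho0 + (y^2 - zsq) = 0.

    When |x| < 1.96 the leading coefficient is negative, so the confidence
    set is the real line EXCLUDING the interval between the roots.
    """
    a, b, c = x * x - zsq, -2 * x * y, y * y - zsq
    disc = b * b - 4 * a * c
    if disc < 0:
        return None   # no real roots: every rho0 lies in the confidence set
    r1 = (-b - math.sqrt(disc)) / (2 * a)
    r2 = (-b + math.sqrt(disc)) / (2 * a)
    return min(r1, r2), max(r1, r2)

# Case (i): the 95% confidence set excludes roughly (-1.20, 0.77).
lo, hi = ci_quadratic_roots(0.3, 2.7)
assert abs(lo - (-1.20)) < 0.005 and abs(hi - 0.77) < 0.005

# Case (ii): no real roots, so the 95% confidence set is the whole real line.
assert ci_quadratic_roots(0.3, 1.5) is None
```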
Notes

(1) This relatively unfamiliar piece of theory gives valuable insights. Note that the problem involves two unknown parameters, $\mu$ and $\rho$; interest lies in one, $\rho$, only, while $\mu$ is an example of what is actually termed a nuisance parameter. In order to make inferences about the parameter of interest, $\mu$ must be eliminated from the problem. This is done in general by finding a so-called pivotal function: that is, a function of the observations and the parameter of interest whose sampling distribution does not depend on any parameter. In the present case $Y - \rho X$ has this property. (See Note 2 to Problem 4A.4 for a related discussion.)

Another, much more familiar, problem can also be set in the same framework: the problem of inference about a normal mean when the variance $\sigma^2$ is unknown. The mean of the distribution is now the parameter of interest, while $\sigma$ is the nuisance parameter. As is well known, the problem is solved by use of Student's $t$-test. With $\sigma$ unknown, the obvious statistic cannot be calculated, and we use instead the pivotal function

$$\frac{\bar{X} - \mu}{S/\sqrt{n}},$$

whose distribution (Student's $t$ on $n-1$ degrees of freedom) does not depend on the unknown parameter $\sigma$, and so overcomes the difficulty.

In the present problem we were required to test the hypothesis $\rho = 1$, and we can generalise this to test the hypothesis $\rho = \rho_0$, using an obvious notation. The appropriate test statistic is now $Y - \rho_0 X$, and when $\rho = \rho_0$ the distribution of the statistic is $N(0,\ 1+\rho_0^2)$. Denoting the observed value of $Y - \rho_0 X$ by $y - \rho_0 x$, we see that we must compare

$$\frac{y - \rho_0 x}{\sqrt{1 + \rho_0^2}}$$

with percentage points of $N(0, 1)$, i.e. with $\pm1.96$ for a 5% test.

(2) The problem also gives an unusual but most valuable insight into the meaning of confidence intervals. Of several correct interpretations of confidence intervals, it is convenient to use here the link with hypothesis tests: a 95% confidence interval is that interval including all values of the parameter held to be consistent with the data (i.e. not rejected) on a 5% test, and similarly for other levels of confidence. Accordingly, a value $\rho_0$ for $\rho$ lies in the 95% confidence interval if

$$\left|\frac{y - \rho_0 x}{\sqrt{1+\rho_0^2}}\right| \leq 1.96.$$

The 95% confidence interval for $\rho$ thus includes all those values $\rho_0$ satisfying $(y - \rho_0 x)^2 \leq 3.8416(1 + \rho_0^2)$, or

$$\rho_0^2(x^2 - 3.8416) - 2xy\rho_0 + (y^2 - 3.8416) \leq 0.$$

(For any other confidence coefficient, one merely changes the coefficient 1.96, and hence 3.8416.) One now obtains the confidence interval simply by solving a quadratic equation, but it is important to note that the confidence interval is not necessarily the interval between the two roots: if $|x| < 1.96$ the coefficient of $\rho_0^2$ is negative, so values between the roots will be the only ones not in the confidence interval! This happens, for example, when $x = 0.3$ and $y = 2.7$, when the quadratic inequality reduces to

$$3.7516\rho_0^2 + 1.62\rho_0 - 3.4484 \geq 0,$$

so in case (i) here the 95% confidence interval for $\rho$ is the entire real line except for the interval $(-1.20,\ 0.77)$. Note that the value $\rho = 1$ lies inside the confidence interval, as is required, since the hypothesis $\rho = 1$ was not rejected.

One would have expected the same feature to occur in case (ii), but in fact this case is still more awkward. When $x = 0.3$ and $y = 1.5$, the quadratic inequality reduces to

$$3.7516\rho_0^2 + 0.9\rho_0 + 1.5916 \geq 0.$$

The corresponding quadratic equation has no real roots, so the inequality is satisfied for all real values $\rho_0$. It follows that the 95% confidence interval for $\rho$ is $(-\infty, \infty)$: the information available from $x$ and $y$ does not enable us to exclude any real value from the confidence interval as being incompatible with the data, a finding which gives the amateur and the experienced statistician alike rather a jolt.

A simple (and, frankly, too rough-and-ready) argument to justify this looks at $x$ and $y$ separately. From $x = 0.3$, a 95% confidence interval for $\mu = E(X)$ is $(-1.66,\ 2.26)$; a similar interval for $\rho\mu = E(Y)$ is $(-0.46,\ 3.46)$. Since both intervals include positive and negative values, and since in particular the interval for $E(X)$ contains 0, it is easy to see that it is quite plausible for $\rho$ to be small or large, positive or negative; it is clear that it would be hard to sustain a claim that any particular value for $\rho = E(Y)/E(X)$ was incompatible with the two observations.

A common but mistaken view of a confidence interval is that the entire natural range for a parameter ($-\infty$ to $\infty$ in the case of $\rho$ here) gives a 100% confidence interval for that parameter, so that $(-\infty, \infty)$ can only be a 100% confidence interval, while a lesser degree of confidence will be attached to any narrower interval. The paradox arises because it is easy to slip into the error of imagining that every 95% confidence interval has a 95% chance of covering the true value of the parameter concerned. We ought, however, to add the comment that this paradoxical, not to say perverse, behaviour of confidence intervals noted above has not escaped the attention of academic statisticians, and that it has been the subject of lively controversy. We have perhaps taken the reader quite far enough into uncharted territory already.

4C Binomial and Poisson Distributions

While calculations for tests and intervals are generally simplest for normal distributions, the problems are no less important when other distributions are involved, most particularly the binomial and Poisson. (In fact these distributions can be especially instructive, particularly when the numbers involved are kept small and convenient.) A difficulty with the binomial and Poisson distributions is that, with increasing sample size, calculations quickly become tiresome, especially when finding confidence intervals. Approximations are then of great value, and in particular the normal distribution can provide a very good approximation. Use of the normal distribution is illustrated several times in this section.

4C.1 The pink and white flowers

A seed manufacturer claims that in a particular variety sold by him there will be one white flower for every three pink flowers. You buy a packet and plant the contents, obtaining 21 pink and 3 white flowers. Do you accept the manufacturer's claim?

Solution

The 'claim' that 75% of flowers will be pink is simply a hypothesis that $p$, the probability of a pink flower, is 0.75. The number $X$ of pink flowers arising from the planting of 24 seeds is binomially distributed, so we are required to examine whether 21 pink flowers is 'extreme' in the context of $B(24,\ 0.75)$. The significance level of the result is just the probability of observing a result at least as extreme. Now the probability of observing 21 or more is

$$\binom{24}{21}(0.75)^{21}(0.25)^3 + \binom{24}{22}(0.75)^{22}(0.25)^2 + \binom{24}{23}(0.75)^{23}(0.25) + (0.75)^{24},$$

or 0.1150. This would be the significance level on a one-tailed test, but here a two-tailed test is needed; the two-tailed test will have a significance level roughly double 0.115, so there is no question of rejecting the hypothesis $p = 0.75$. We accordingly accept the manufacturer's claim.
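The exact binomial tail probabilities for this problem are small enough to compute directly. This sketch is not part of the original text; it uses only the standard library.

```python
# A check (not from the original text) of the exact binomial tails for 4C.1.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

upper = sum(binom_pmf(k, 24, 0.75) for k in range(21, 25))  # Pr(X >= 21)
lower = sum(binom_pmf(k, 24, 0.75) for k in range(0, 16))   # Pr(X <= 15)

assert abs(upper - 0.1150) < 0.0005
assert abs(lower - 0.1213) < 0.0005
```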
Notes

(1) Two-tailed tests for discrete data are difficult, as they are with non-symmetrical sampling distributions generally, since it is hard to judge when a result is as extreme as the one observed. In the current case a two-tailed test is needed, since there is nothing in the problem to suggest that an alternative $p > 0.75$ is intended. Had a significance level been required, the most reasonable solution would have been to treat the (binomial) distribution as symmetrical about $x = 18$, and to regard results of 15 or fewer, in the other tail, as at least as extreme as 21. (Note 1 to Problem 2A.3 provides some justification for this.) We would then calculate $\Pr(X \leq 15) + \Pr(X \geq 21)$ as the significance level. In fact $\Pr(X \leq 15) = 0.1213$, so overall the significance level is $0.1150 + 0.1213 = 0.2363$, and the difficulty just mentioned is avoided because there is no question of significance even on a one-tailed test.

(2) For the $B(24,\ 0.75)$ distribution a normal approximation is not unreasonable, and its accuracy can be judged here, since the probabilities involved can be calculated exactly. Using a continuity correction, we have

$$\Pr(X \geq 21) \approx 1 - \Phi\!\left(\frac{20.5 - 18}{\sqrt{9/2}}\right) = 1 - \Phi(1.1785) = 0.1193.$$

Note that the approximate value 0.1193 lies between the two exact values for the two tails, 0.1150 and 0.1213.

(3) A 3:1 ratio of pink to white flowers arises from Mendelian theory when colour is determined by a single gene with two alleles, pink being dominant over white, and when cross-breeding occurs in a particular way. At an elementary level, genetic applications offer practical interest to the student of probability: to obtain numerical results, rather than just algebraic ones, gives some satisfaction, but these are generally only available when one can identify equally likely outcomes for the experiment involved, and such outcomes can rarely be identified outside trivial games of chance, rather dry problems of sampling, and problems in genetics. Probability theory is frequently used by geneticists; it is nowadays being used to advantage in genetic counselling, for example when parents with a family history of some hereditary disease contemplate having a child.

4C.2 Distinguishing between brands of cheese

It is claimed that 90% of men cannot tell the difference between two different brands of Cheddar cheese, but of the members of a random sample of 500 men, 72 could distinguish between them. Is the claim justified?

Solution

Let $p$ be the proportion of the male population who can distinguish, and let $X$ be the number in the sample who can. Reasonable assumptions indicate that $X$ has the binomial distribution with index $n = 500$ and parameter $p$. The null hypothesis is that $p = 0.1$, and under this hypothesis $E(X) = 50$, whereas in fact 72 were observed.

The significance level of the test is $\Pr(X \geq 72)$, where this probability is calculated under the assumption that the null hypothesis is true. For $n = 500$, $p = 0.1$, the normal approximation to the binomial distribution is quite adequate, so we treat $X$ as if $X \sim N(50, 45)$. Using the continuity correction, we thus calculate

$$z = \frac{71.5 - 50}{\sqrt{45}} = 3.205,$$

and we require the significance level of such a value. Consulting a normal table, we find that the value is above the 0.1% point (3.09) on a one-tailed test, and we therefore confidently reject the claim.
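The approximate test for this problem, and the exact small-sample version discussed in the Notes, both fit in a few stdlib lines. This sketch is not part of the original text.

```python
# Checks (not from the original text) for problem 4C.2.
from math import comb, sqrt, erf

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# n = 500: normal approximation with continuity correction.
z = (71.5 - 50) / sqrt(45)
assert abs(z - 3.205) < 0.001
assert 1 - phi(z) < 0.001        # beyond the one-tailed 0.1% point

# n = 10 with 3 able to distinguish: exact tail probability Pr(X >= 3).
p_tail = 1 - sum(comb(10, k) * 0.1 ** k * 0.9 ** (10 - k) for k in range(3))
assert abs(p_tail - 0.070) < 0.001
```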
Notes

(1) The 'reasonable' assumptions referred to in the solution are that the results from different sample members are independent, each having the same probability of being able to distinguish between the cheeses. These conditions are assured once we are told that a random sample has been selected, although if one were undertaking a real investigation there are more questions to be asked, for example about just what the target population is.

(2) In the solution, a one-tailed test was used, though the wording of the question does not unambiguously indicate the need for this. It seems reasonable to argue, however, that the claim is really that at least 90% cannot tell the difference, since no-one would contemplate rejecting the claim if it turned out that 95% cannot distinguish. Indeed, another way of expressing the hypotheses involved in this question is to say that one is testing $H_0$: $p \leq 0.1$ against $H_A$: $p > 0.1$.

(3) Use of the normal approximation to the binomial distribution is standard when the numbers involved are of the order of magnitude of those in this problem. (Notes 1 and 2 to Problem 2A.3 give further discussion of the approximation.) Had the numbers involved been smaller, the logic of the test would have been the same, but use of the normal distribution would not have been justified. Had the sample been of size 10, with 3 being able to distinguish (clearly unrealistically small numbers, but adequate for displaying principles), then $X \sim B(10,\ p)$ and the significance level would have been $\Pr(X \geq 3)$. The probability is most easily calculated as

$$1 - \Pr(X=0) - \Pr(X=1) - \Pr(X=2) = 1 - (0.9)^{10} - 10(0.9)^9(0.1) - 45(0.9)^8(0.1)^2 = 0.070.$$

Such a result would therefore have been significant at the 10% level but not at the 5% level, and would not have been regarded as providing strong evidence against the claim.

4C.3 Women investors in building societies

A building society wishes to estimate the proportion, $p$, of its savings account holders who are women. Records are scanned for a sample of 80 accounts and it is found that 22 are held by women.

(a) Find an approximate two-sided 90% confidence interval for $p$, stating clearly any assumptions which are necessary for the approximation to be valid.

(b) Ideally, the society would like to estimate $p$ with a precision of $\pm0.02$ (in the sense that a 90% confidence interval should have half-width no greater than 0.02). How large a sample size would be necessary to achieve this precision?
Solution

(a) We use the normal approximation to the binomial distribution: if $X$ is the number of women in a sample of size $n$, and $\hat{p} = X/n$, then, with the usual notation, $\hat{p} \sim N(p,\ p(1-p)/n)$, approximately. The two assumptions which are necessary for this result are as follows.

(i) $X$ has a binomial distribution. This implies that all accounts have the same chance of being chosen, and also that the probability of an account-holder being female does not depend on the sex of the other account-holders in the sample. This can only be approximately true for a finite population of account-holders, but is a reasonable approximation provided that the sample is only a small fraction of the population, for which we need the sample to be random. (Further discussion of this point can be found in Note 1 to Problem 2A.)

(ii) The sample size is large enough for the normal approximation to be an adequate approximation to the binomial, unless $p$ is close to zero or one. A value of 80 is certainly large enough.

Returning to the approximate result $\hat{p} \sim N(p,\ p(1-p)/n)$, it follows that

$$\Pr\left(-z_{\alpha/2} \leq \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \leq z_{\alpha/2}\right) \approx 1 - \alpha.$$

This leads to two possible ways of obtaining an approximate confidence interval for $p$ (cf. Problem 4D for the corresponding possibilities for a Poisson parameter). Replacing $p$ by $\hat{p}$ in the denominator of the left-hand side of the inequality leads to an approximate confidence interval for $p$, with confidence coefficient $1-\alpha$, with endpoints

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}.$$

In the present example, $n = 80$, $\hat{p} = 22/80 = 0.275$ and $z_{\alpha/2} = z_{0.05} = 1.645$ for a 90% interval, so the limits are

$$0.275 \pm 1.645\{0.275\times0.725/80\}^{1/2} = 0.275 \pm 0.082,$$

i.e. the interval is (0.193, 0.357).

A second possibility is to square the inequality, obtaining the quadratic inequality

$$p^2(n + z_{\alpha/2}^2) - p(2n\hat{p} + z_{\alpha/2}^2) + n\hat{p}^2 \leq 0.$$

Since the coefficient of $p^2$ is positive, the inequality will be satisfied for values of $p$ lying between the two roots of the corresponding quadratic equation, and these roots will therefore give the endpoints of an approximate confidence interval for $p$. For the present problem the quadratic equation becomes

$$82.706p^2 - 46.706p + 6.05 = 0,$$

and the roots of this equation are 0.201 and 0.363, i.e. the interval is (0.201, 0.363). As predicted, the interval is similar to that given by the cruder approximation, but note that it is not centred on the sample proportion $\hat{p}$: the lower limit is closer to $\hat{p}$ than is the upper limit. This is standard behaviour for $\hat{p} < \frac{1}{2}$, and the phenomenon becomes more extreme as $\hat{p}$ approaches zero; conversely, for $\hat{p} > \frac{1}{2}$ the upper limit is closer to $\hat{p}$ than is the lower limit. Given the large sample size, the results for the confidence interval are unlikely to be very different from those obtained by the cruder approximation given above.

(b) The half-width of the interval based on the cruder approximation is

$$w = 1.645\sqrt{0.275\times0.725/n},$$

and so $w \leq 0.02$ if

$$n \geq \left(\frac{1.645}{0.02}\right)^2 \times 0.275\times0.725 = 1348.8.$$

Thus, rounding up to the nearest integer, we need $n \geq 1349$, so a further 1269 accounts need to be examined in addition to the 80 in the initial sample.

Notes

(1) The alternative expression for the confidence interval given in part (a) is really too complicated to be used to estimate $n$ in part (b). The two expressions will, in this example, give very similar numerical values for $n$, unless $\hat{p}$ is very close to 0 or 1. In any case, a very precise value for $n$ is not usually required in practice in this sort of problem. The main objective is usually to get an approximate (perhaps conservative) idea of the sample size required to achieve the desired precision of estimation. A decision can then be made as to whether (i) the indicated value of $n$ is reasonable; or (ii) the indicated value of $n$ is too large to be feasible (which may be the case in this example), so that a lower degree of precision must be accepted, or a different method of sampling adopted; or (iii) the indicated value of $n$ is so small that a larger value is feasible, and hence greater precision can be achieved.

It should also be noted that, having determined $n$, and taken a total sample of size $n$, the estimate one obtains from the whole sample will generally be different from that used to determine $n$. The confidence interval based on the whole sample will therefore not have exactly the required width. (It could be narrower or wider.) A conservative approach is to round up $n$ to the nearest 10, 50 or 100, or whatever is convenient, rather than just to the nearest integer.
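Both interval constructions and both sample-size calculations for this problem can be verified numerically. This sketch is not part of the original text; it uses only the standard library.

```python
# Checks (not from the original text) for problem 4C.3.
from math import sqrt

n, x, z = 80, 22, 1.645
p_hat = x / n                                   # 0.275

# Crude interval: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n).
half = z * sqrt(p_hat * (1 - p_hat) / n)
assert abs(p_hat - half - 0.193) < 0.001
assert abs(p_hat + half - 0.357) < 0.001

# Quadratic version: roots of p^2(n + z^2) - p(2n p_hat + z^2) + n p_hat^2 = 0.
a, b, c = n + z * z, -(2 * n * p_hat + z * z), n * p_hat ** 2
disc = sqrt(b * b - 4 * a * c)
lo, hi = (-b - disc) / (2 * a), (-b + disc) / (2 * a)
assert abs(lo - 0.201) < 0.001 and abs(hi - 0.363) < 0.001

# Sample size for half-width 0.02: with p_hat, and conservatively with 1/4.
n_est = (z / 0.02) ** 2 * p_hat * (1 - p_hat)   # about 1348.8 -> 1349
n_cons = (z / 0.02) ** 2 * 0.25                 # about 1691.3 -> 1692
assert abs(n_est - 1348.8) < 0.2
assert abs(n_cons - 1691.3) < 0.2
```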
4C.4 Are telephone numbers random?

In an experiment to test whether final digits of telephone numbers are random, a random sample of 40 digits is selected from a large telephone directory. The hypothesis of randomness will be accepted if there are at least 17 but not more than 22 even digits in the sample, and rejected otherwise. Using a suitable approximation, find the probability of a Type I error.

A friend of yours claims that in fact more telephone numbers are odd than even. How would you check whether this is true, using a 5% test based on a random sample of size 50?

Solution

A binomial distribution is appropriate here: if X denotes the number of even digits, then under the hypothesis of randomness X is a B(40, ½) random variable. Since a Type I error is to reject a correct null hypothesis, the probability of doing so is 1 − Pr(17 ≤ X ≤ 22), where Pr(17 ≤ X ≤ 22) is the probability of accepting the null hypothesis that p = ½ when it is true. Using a normal approximation with continuity correction, with μ = np = 20 and σ = √(np(1−p)) = √10, we obtain

Pr(17 ≤ X ≤ 22) ≈ Φ((22.5 − μ)/σ) − Φ((16.5 − μ)/σ) = Φ(0.79) − Φ(−1.11) = 0.6517.

The probability of a Type I error is therefore 1 − 0.6517 = 0.3483.

When investigating the friend's claim, p = ½ is again selected as the null hypothesis, but the alternative is now that p < ½. Accordingly, we would reject the hypothesis only when X is unusually small for a B(50, ½) random variable. We need to calculate the value of x such that Pr(X ≤ x) ≈ 0.05. Recalling that X is in fact a discrete random variable, and remembering the continuity correction, we require

(x + ½ − 25)/√12.5 = −1.645,

giving x = 18.68. At the 5% level, with this one-sided alternative, we would thus reject the null hypothesis that p = ½ only if we observed 18 or fewer even digits (see Note 2).

Notes

(1) In the second part of the problem, the wording seems to hint that the hypothesis to test is that p < ½. In deciding on the most appropriate null hypothesis in any problem, it is often helpful to recall that the null hypothesis can never be proved to be true; it is 'accepted' if it is consistent with the data, but that is all. Accepting a hypothesis is not the same as being convinced of its truth, since there might well be (and, indeed, always are) other hypotheses also consistent with the data. In the present case the aim is really to see whether the hypothesis p = ½ (strictly, p ≥ ½) is rejected as being inconsistent with the data, so that p < ½ is the only remaining possibility. Accordingly, p = ½ is selected as the null hypothesis, with a one-sided alternative.

(2) In the solution to the second part of the problem we rounded 18.68 down to 18 rather than to the nearest integer. This was done because the precise definition of a 5% test requires the probability of rejecting the null hypothesis when true to be no more than 5%: the normal approximation suggests that rejecting when there are 19 or fewer even digits would give a probability of error slightly greater than the nominal 5%, and this would be unacceptable, while rejecting only on 18 or fewer even digits implies a significance level nearer 3%. In fact, use of exact calculations for the B(50, ½) distribution rather than the normal approximation would have shown that Pr(X ≤ 18) = 0.0325 and Pr(X ≤ 19) = 0.0595, so that if one rejected the null hypothesis on 19 even digits one would be using a significance level of about 6%. When the distributions involved are continuous, one can generally choose the critical values so that the probability of a Type I error is exactly 5%, but this is, of course, not possible with discrete data.

4C.5 Smoking in public places

A sample survey was conducted to determine the opinion of a large electorate on a new policy concerning smoking in public places. A random sample of 200 men and 230 women was used, and 130 of the men and 130 of the women agreed with the policy. Discuss whether the two following hypotheses are tenable: (i) that 50% of all women in the electorate agree with the policy; (ii) that the proportion of men agreeing exceeds that of women.

Solution

It is natural, and reasonable, to assume a binomial model for this problem. In part (i) we take X, the number of women agreeing, to be binomial with n = 230 and unknown probability p_w, and test the hypothesis p_w = 0.50. The numbers involved are clearly large enough for a normal approximation to be used, and, noting that the observed value of X is 130 and incorporating a continuity correction, we calculate

z = (130 − ½ − 230×0.50)/√(230×0.50×0.50) = 1.91.

For a two-tailed 5% test the critical value is the ubiquitous 1.96, so we do not reject the hypothesis that the proportion is 50%; we find the hypothesis that 50% of women agree with the policy to be tenable.

For part (ii) we need formally to test the hypothesis p_m = p_w against the alternative p_m > p_w, where, using a natural notation, p_m and p_w are the proportions of men and of women agreeing. Estimates of these probabilities are given by p̂_m = 130/200 = 0.65 and p̂_w = 130/230 = 0.565. Under the null hypothesis that p_m = p_w = p, say, the relevant distributional result is, approximately,

p̂_m − p̂_w ~ N(0, p(1−p)[1/200 + 1/230]).

To obtain a test we thus need to estimate the common probability p. Since, overall, 260 out of 430 agree, the appropriate estimate is p̄ = 260/430. We thus calculate as a test statistic

z = (0.65 − 0.565)/√(p̄(1−p̄)[1/200 + 1/230]) = 1.794.

This value is to be compared with percentage points of N(0, 1); for a one-tailed test at the 5% level the appropriate percentage point is 1.645. Since 1.794 > 1.645 we reject, at the 5% level, the null hypothesis that the proportions are the same, and conclude that there is reasonable evidence that the proportion of men agreeing exceeds that of women.

Notes

(1) In part (i) of the question there is no mention of any alternative hypothesis. It would be excessively arbitrary to use a one-tailed test (which tail would one use?), so a two-tailed test is the only acceptable possibility.

(2) The wording of part (ii) is rather odd. Since p̂_m > p̂_w, the hypothesis stated is obviously tenable; one could hardly use the data given to refute a claim that more men than women agree with the policy. Taken literally, it means that no hypothesis inconsistent with the one stated is tenable. But the test performed above, of equality of the two proportions against a one-sided alternative, is probably what an examiner would want.

(3) Two-sample binomial tests are somewhat tricky. It is reasonable to argue that, with 430 observations, the estimate p̄ of p will be an accurate one, so that replacing p by p̄ in the variance should have little effect. (One might be forgiven for thinking that the distribution of the test statistic might be changed from normal to Student's t when, as here, p is replaced by a sample estimate in the denominator, but in fact this does not happen.) The mathematical result that justifies this procedure is based on asymptotic theory, and while it is fairly easy to see why the distribution of p̂_m − p̂_w, suitably standardised, is approximately N(0, 1), it is less straightforward to show that this still holds when p is replaced by an estimate. Two-sample tests for binomial parameters can also be conducted using the technique of contingency tables, discussed in Section 5C. The related topic of confidence intervals for problems with two binomial samples is treated in Problem 4C.6.
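The test in part (ii) is easy to reproduce numerically. The following Python sketch is ours, not part of the original problem, and the function name is invented; it computes the pooled two-sample z statistic described above:

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Two-sample z statistic for H0: p1 = p2, using the pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # common estimate under the null hypothesis
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = pooled_z(130, 200, 130, 230)
print(round(z, 3))                     # 1.794, beyond the one-tailed 5% point 1.645
```

Note that the pooled estimate p̄ enters only through the standard error, as discussed in Note 3.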
4C.6 Smoking habits of history and statistics teachers

In a large survey it is found that 40% of schoolteachers are smokers. Small random samples are taken of teachers of various subjects, and it is found that of 20 teachers of statistics four are smokers, whereas there are eleven smokers in a sample of 25 history teachers.

(a) Suppose that 40% of all statistics teachers smoke. Calculate, using a suitable approximation, the probability that 4 or fewer are smokers in a random sample of 20 such teachers.

(b) Find an approximate 95% confidence interval for the overall proportion of statistics teachers who smoke.

(c) Without doing formal tests of hypotheses, use the results of (a) and (b) to discuss whether the proportion of smokers among statistics teachers is likely to be the same as that for all teachers.

(d) Find an approximate 95% confidence interval for the difference between the proportions of smokers among history teachers and among statistics teachers, using the information from the samples described above.

Solution

(a) The number of smokers in the sample, X, is a binomial random variable with n = 20 trials and probability of success p = 0.4. The distribution of X is approximately normal with mean np = 8 and variance np(1−p) = 4.8, so that, incorporating a continuity correction,

Pr(X ≤ 4) ≈ Pr(Z ≤ (4.5 − 8)/√4.8) = Pr(Z ≤ −1.60) = 0.055,

where Z is a random variable having the standard normal distribution.

(b) If p̂ = X/n, then p̂ ~ N(p, p(1−p)/n) approximately, so that an approximate confidence interval for p has endpoints

p̂ ± z_{α/2}√(p̂(1−p̂)/n),

where z_{α/2} is an appropriate normal distribution percentage point; z_{α/2} = 1.96 for a 95% interval. From the information given we obtain p̂ = 4/20 = 0.2, so the interval has endpoints

0.2 ± 1.96√(0.2×0.8/20) = 0.2 ± 0.175,

i.e. the 95% confidence interval for p is (0.025, 0.375).

(c) The solution to part (a) shows that, if the proportion of statistics teachers who smoke is 0.4, the probability of observing as few as four smokers in a random sample of 20 statistics teachers is rather small. From part (b), the approximate 95% confidence interval for p fails to include 0.4, though only just. (The improved approximation described in Note 2 gives an interval which just includes 0.4.) Thus the results of parts (a) and (b) both suggest, although not conclusively, that the proportion of smokers among statistics teachers may be different from that for all teachers.

(d) An approximate 95% confidence interval for the difference in proportions of smokers for the two types of teacher has endpoints

p̂₁ − p̂₂ ± z_{α/2}√(p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂),

where q̂₁ = 1 − p̂₁, q̂₂ = 1 − p̂₂, and n₁ and n₂ are the sample sizes. For the given samples, p̂₁ = 11/25 = 0.44, p̂₂ = 4/20 = 0.2, n₁ = 25 and n₂ = 20, and for the required 95% interval z_{α/2} = 1.96. The endpoints are thus

0.24 ± 1.96√(0.009 86 + 0.008 00) = 0.24 ± 0.262,

so that the 95% confidence interval for the difference between the proportions in the two groups is (−0.022, 0.502).

Notes

(1) In part (a), the binomial probability could be calculated exactly, as

Pr(X ≤ 4) = Σ_{x=0}^{4} (20 choose x)(0.4)^x(0.6)^{20−x},

or looked up in published tables of the binomial cumulative distribution function. The value is in fact 0.051, which is not too different from that given by the normal approximation. Note also that a continuity correction has been incorporated into the approximate expression given in the solution.

(2) The lower limit of the approximate confidence interval in (b) is rather close to zero, which is a signal that the approximation may not be particularly good. If we use the slightly better approximation described in Problem 4C.3, we find the endpoints of the interval to be the roots of the quadratic equation in p

(1 + z²_{α/2}/n)p² − (2p̂ + z²_{α/2}/n)p + p̂² = 0.

Substituting n = 20, p̂ = 0.2 and z_{α/2} = 1.96, we find the roots to be 0.080 and 0.416, so the interval is (0.080, 0.416). Note that the lower limit is further from zero, but so is the upper limit; note also that the overall width is somewhat smaller than that of the poorer approximate interval used in part (b).

(3) Although it was not asked for, the formal test of H₀: p = p₀ against H₁: p ≠ p₀ (or against H₁: p > p₀ or H₁: p < p₀) is based on the test statistic

z = (p̂ − p₀)/√(p₀(1−p₀)/n),

which has an approximate N(0, 1) distribution under H₀. With p̂ = 0.2, p₀ = 0.4, q₀ = 1 − p₀ = 0.6 and n = 20, the value of the test statistic is −1.83. Without a continuity correction, the probability in part (a) would be approximated by Pr(Z ≤ −1.83) = 0.034, which is substantially less accurate. The test statistic here can also be modified, and indeed improved, by incorporating a continuity correction, although this seems to be rarely done; the corrected value, −1.60, is identical to the value of the z statistic used, with a continuity correction, in the solution to part (a).

(4) When making inferences about differences between two binomial parameters p₁ and p₂, it is much more common to test H₀: p₁ = p₂ than to find a confidence interval for p₁ − p₂. With the standard null hypothesis H₀: p₁ = p₂, it is usual to estimate p₁ and p₂ by a common estimate, the overall proportion of 'successes' in the two samples combined (see the solution to Problem 4C.5); the variance of p̂₁ − p̂₂ is then estimated using this common estimate. In interval estimation, by contrast, Var(p̂₁ − p̂₂) is estimated by p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂. Because of the different expressions for the variances in the two situations, there is not an exact equivalence between interval estimation and hypothesis testing (as in inference regarding a single binomial parameter, but unlike most inference involving means, differences of means, variances or ratios of variances). Note, similarly, that for the test of the hypothesis p = p₀ the distribution of p̂ is calculated under H₀, in which case Var(p̂) is known and is equal to p₀(1−p₀)/n, whereas in the approximate confidence interval the variance of p̂ is estimated by p̂(1−p̂)/n. In neither case, therefore, is the test exactly equivalent to the approximate confidence interval given in the solution to (b).
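The two interval calculations, in part (b) and in Note 2, can be verified directly. The Python sketch below is ours and the function names are invented; it computes the cruder interval and the improved interval obtained as the roots of the quadratic of Note 2:

```python
import math

def wald_ci(x, n, z=1.96):
    """The cruder interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p = x / n
    h = z * math.sqrt(p * (1 - p) / n)
    return p - h, p + h

def score_ci(x, n, z=1.96):
    """Roots of (p_hat - p)**2 = z**2 * p * (1 - p) / n, the improved interval."""
    p = x / n
    a = 1 + z * z / n
    b = -(2 * p + z * z / n)
    c = p * p
    d = math.sqrt(b * b - 4 * a * c)
    return (-b - d) / (2 * a), (-b + d) / (2 * a)

print(wald_ci(4, 20))    # roughly (0.025, 0.375)
print(score_ci(4, 20))   # roots near 0.08 and 0.416
```

For x = 4, n = 20 these give approximately (0.025, 0.375) and (0.081, 0.416) respectively, in agreement with the solution (the lower root rounds to 0.080 at the accuracy quoted in Note 2).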
4C.7 Ignition problems in cars

(a) A new car model has a design fault, as a result of which 10% of production cars have a particular ignition problem. Engineers have come up with a modification which they hope will end the difficulty, and a pilot run of 30 cars is manufactured using the modified design. State and briefly justify a distribution you might consider using to model the number of cars X in this pilot run with the ignition problem.

(b) Suppose that the modification has no effect on the problem, and that p is still 0.1. Find Pr(X = 0) and Pr(X = 1). If in fact no cars have the problem, would the manufacturer be justified in concluding that the change has definitely improved the situation?

(c) Suppose now that the problem occurs in only 1% of cars manufactured in the modified way. Estimate the probability that in the first 500 cars there are no more than two still with the problem.

(d) Later, the manufacturer claims that as a result of the modification the problem occurs in only 0.5% of cars. An independent survey of 500 cars finds that 6 have the problem. Is the manufacturer's claim reasonable?

Solution

(a) The binomial distribution is the natural one to use in this context, since it arises when one counts the number of 'successes' in n independent 'trials', all with the same probability p of success. It would be justified here if the modified design is consistently used in the pilot run, and if the results for different cars are independent. The distribution of X is thus B(30, 0.1).

(b) We now assume X to have the binomial distribution with n = 30 and p = 0.1, so that

Pr(X = 0) = (0.9)³⁰ = 0.0424, and Pr(X = 1) = 30(0.1)(0.9)²⁹ = 0.1413.

The problem faced by the manufacturer is simply one of hypothesis testing. The null hypothesis is that p = 0.1, and the alternative hypothesis is one-sided, since the manufacturer is said to be wanting to conclude that there has been an improvement. The result obtained, X = 0, is the most extreme possible, so the 'tail area' equivalent is simply its probability, 0.042. Since this is less than 0.05 the result is significant at the 5% level, which could be regarded as plausible evidence supporting an improvement; but it hardly justifies a conclusion that the change has 'definitely' improved the situation.

(c) In 500 cars (assuming independence) the number X having trouble will be distributed B(500, 0.01). The required probability could, in principle, be calculated exactly from that distribution, but (in the usual notation) with n large and p very small a Poisson approximation is justified, so we can take X as Poisson(500×0.01), i.e. Poisson(5). The probability required is, thus,

Pr(X ≤ 2) ≈ e⁻⁵(1 + 5 + 5²/2!) = 0.125.

(d) We conduct the test by comparing the observation made, 6, with the null hypothesis distribution of X. Under the manufacturer's claim, X is distributed B(500, 0.005); as in part (c), the distribution is adequately approximated by the Poisson distribution with the same mean, 500×0.005 = 2.5. The alternative of interest is again one-sided, namely that the problem occurs more often than claimed. We thus calculate the probability of a result at least as extreme as 6:

Pr(X ≥ 6) = 1 − Pr(X ≤ 5) ≈ 1 − e^(−2.5)(1 + 2.5 + 2.5²/2! + 2.5³/3! + 2.5⁴/4! + 2.5⁵/5!) = 1 − 0.9580 = 0.0420.

Since this is less than 0.05 the result is significant at the 5% level, providing plausible evidence to contradict the manufacturer's claim.
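The binomial and Poisson calculations in parts (c) and (d) can be compared directly. The Python sketch below is ours and the function names are invented; it evaluates the two cumulative distribution functions from first principles:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * sum(lam ** i / math.factorial(i) for i in range(k + 1))

# part (c): P(X <= 2) for B(500, 0.01), against the Poisson(5) approximation
print(round(binom_cdf(2, 500, 0.01), 3), round(poisson_cdf(2, 5.0), 3))  # 0.123 0.125
# part (d): P(X >= 6) under the claimed rate, via the Poisson(2.5) approximation
print(round(1 - poisson_cdf(5, 2.5), 3))                                 # 0.042
```

The exact binomial value 0.123 against the approximation 0.125 illustrates how close the Poisson approximation is here.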
Notes

(1) Parts (a) and (c) of this problem are exercises in distribution theory, and could have been placed in Chapter 2. Parts (b) and (d) also require the calculation of probabilities for standard distributions, and show how such calculations appear in the context of tests of hypotheses.

(2) In part (d) we were required to test a hypothesis concerning a binomial parameter. It is not very common to find a Poisson approximation used in connection with such a test, but it is in fact the best way to proceed when the values of n and p (in the standard notation) are such that the approximation is valid.

(3) In parts (c) and (d) the Poisson distribution is used as an approximation. In fact it is not too awkward to calculate the various probabilities exactly; for example, the exact result in part (c) is 0.123. The reader will note, though, that the approximation is a very close one. As mentioned in Note 2 to Problem 2A.4, such approximations are less valuable now than they once were.

4C.8 Modifying an accident black spot

(a) At an accident black spot junction, road accidents generally occur at a rate of 5 per month. After road modifications, there is 1 accident in the first month. Does this provide evidence that the modifications have reduced the risk of accidents?

(b) One member of the local district council examines the detailed records, and finds that in the two months immediately before the modifications there were 10 accidents, compared with the fact that there was 1 accident in the following month. After the modifications, the mean number of accidents in a month is unknown. On the basis of this information alone, would you conclude that there has been an improvement?

Solution

(a) Since accidents can be presumed to occur at random, the Poisson is the appropriate distribution for Y, the number of accidents in a month. Our hypothesis is, then, that Y ~ Poisson(5), and to test it against the one-sided alternative of a reduced accident rate we simply require

Pr(Y ≤ 1) = e⁻⁵ + 5e⁻⁵ = 6e⁻⁵ = 0.040.

Since this is less than 0.05 the result is significant at the 5% level, and there is therefore some reasonable evidence for an improvement in safety at the spot.

(b) We now work on the basis of information about only three months. The model on which analysis is based is that if X is the number of accidents in the two months before modification, and Y the number in the following month, then X ~ Poisson(2μ₁) and Y ~ Poisson(μ₂), where μ₁ and μ₂ are the monthly accident rates before and after respectively. We require to test H₀: μ₁ = μ₂ against H₁: μ₁ > μ₂, on the basis of observing X to be 10 and Y to be 1.

A two-sample test for the Poisson distribution is performed by working conditionally on the total number of accidents, X + Y = 11. Conditionally on X + Y = 11, the distribution of X is binomial with index 11 and parameter p given by

p = 2μ₁/(2μ₁ + μ₂).

(A theoretical justification of this result is provided in the solution to Problem 2A.9.) Under the null hypothesis that μ₁ = μ₂ we have p = ⅔, while, under the alternative, p > ⅔. The problem thus reduces to assessing an observation of 10 as coming from B(11, ⅔). On a one-tailed test, we obtain

Pr(X ≥ 10) = 11(⅔)¹⁰(⅓) + (⅔)¹¹ = 0.0751.

The significance probability is thus about 7.5%, so the result is not significant at the 5% level, and we would not conclude, on the basis of the information available, that there has been an improvement.

Notes

(1) Writers are often rather casual when dealing with randomness, and the reader may have noticed that the solution to part (a) simply presumes randomness of occurrence of accidents, and hence a Poisson distribution for Y. The theoretical basis for this is the Poisson process, a model for completely haphazard occurrences in time, given in Problem 2A.7.

(2) There are several points in the wording of the question meriting some discussion. Trivially, one would query use of the word 'month', since months are of different lengths. Less trivially, one might wish in practice to take account of seasonal traffic loads and patterns, and perhaps weather and light conditions; in a popular summer holiday area, for example, it would be silly to compare accident rates in summer with those out of season. In practice statisticians do not analyse data in isolation, and in this case matters of definition would obviously be cleared up in discussion with the practitioner involved. In a similar way, one would in practice query use of data for the period immediately following the modification: warning signs are usually erected, and generally drivers approach recently modified junctions with especial care. It is customary to wait for a while, until it is felt that motorists have become used to the new layout, and then make a comparison, as in part (b), between comparable periods.

(3) The test used in part (b) is simple to apply but quite advanced in concept. Intuitively, one can reason that if accidents occur at the same rate before and after, then each of the 11 accidents is twice as likely to take place in the two months before as in the one month after modification; in other words, the chance of any particular accident being before the modification is ⅔. Since this holds separately for each of the 11, a binomial model results.

(4) The reader will have noticed that, despite similar information being given in parts (a) and (b) (in each case, a reduction in accident rate from 5 per month to 1 in one month), the conclusions drawn in the solutions to those parts are different. This happens because the information in part (a) is in fact much more precise than that in part (b): in part (a) we are given that μ₁ = 5, while in the latter case we have just two months' figures, and could derive from them only a (fairly wide) confidence interval for μ₁. We have, in effect, a particular case of a general finding, that tests will be more powerful, and confidence intervals narrower, when a parameter is given than when it has to be estimated from a random sample. (A similar illustration is given by inferences about the mean of a normal distribution: when σ² is unknown, confidence intervals are usually longer than corresponding ones when σ² is known; see Note 1 to Problem 4A.) The wording of part (a) was devised to illustrate this.
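The conditional binomial calculation in part (b) can be confirmed numerically. The Python sketch below is ours and the function name is invented:

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 10 of the 11 accidents fell in the two 'before' months;
# under H0 the conditional distribution of that count is B(11, 2/3)
p_value = binom_tail(10, 11, 2 / 3)
print(round(p_value, 4))   # 0.0751
```

This reproduces the significance probability 0.0751 found above.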
4D Other Problems

Finally, we include some inference problems which do not fall naturally into the categories earlier in this chapter, nor into the discussion of structured data in the next chapter. The problems are basically of three types. Some deal with distributions other than the normal, binomial and Poisson, which were picked out for attention earlier. Others discuss general principles, raising issues such as the value of having unbiased estimators and the meaning of confidence intervals; these are included mainly to encourage examination of basic statistical concepts, which should always be subject to constructive criticism and not just taken for granted. We also include two problems on methods of quality control, a topic of considerable practical importance, but often overlooked in elementary treatments of statistics.

4D.1 Sampling bags of apples

On a supermarket shelf there are ten 1½ kg bags of apples. Three of these bags contain 11 apples, six bags contain 10 apples, and one bag contains 9 apples. Suppose that a sample of three bags is selected randomly, without replacement, and let T be the total number of apples in the three selected bags.

(a) By considering all possible samples of three bags, obtain the sampling distribution of T.
(b) Show that the sample mean X̄ = ⅓T is an unbiased estimator for μ, the mean number of apples per bag in all ten bags.
(c) Find the variance of T, and hence the standard error of X̄.
(d) Consider the situation where three bags are selected at random with replacement. Show that, in this case, the variance of X̄ is 0.12.

Solution

(a) There are (10 choose 3) = 10!/(3! 7!) = 120 ways of choosing three bags from ten, and under random sampling all are equally likely. Listing these possibilities gives the following table (see Note 1).

Number of bags containing x apples          Number of ways of
x = 9    x = 10    x = 11                   choosing such a sample    Value of T
1        2         0                        15                        29
0        3         0                        20                        30
1        1         1                        18                        30
0        2         1                        45                        31
1        0         2                        3                         31
0        1         2                        18                        32
0        0         3                        1                         33
                                            120

Adding the probabilities for each value of T, we find that the sampling distribution of T has probability function p(t) = Pr(T = t) as in the table below.

t       29       30       31       32       33
p(t)    15/120   38/120   48/120   18/120   1/120
(b) The mean number of apples in all ten bags is

μ = {(1×9) + (6×10) + (3×11)}/10 = 10.2.

While we could calculate E(T) directly, our calculations will be simplified, particularly in part (c) below, if we transform T to, say, S = T − 30. Then E(T) = E(S) + 30, and

E(S) = Σ_{s=−1}^{3} s Pr(S = s) = (1/120)(−15 + 48 + 36 + 3) = 72/120 = 3/5 = 0.6.

Hence E(T) = 30.6, so that

E(⅓T) = ⅓E(T) = 10.2;

thus ⅓T is an unbiased estimator for μ.

(c) Since T = S + 30, Var(T) = Var(S), and it will be much simpler to calculate Var(S). Now

Var(S) = E(S²) − {E(S)}²,

and

E(S²) = Σ_{s=−1}^{3} s² Pr(S = s) = (1/120)(15 + 48 + 72 + 9) = 144/120 = 1.2.

It follows that

Var(T) = Var(S) = 1.2 − (0.6)² = 0.84.

The variance of X̄ is therefore

Var(X̄) = (1/9)Var(T) = 0.84/9 = 7/75,

so that the standard error of X̄ is √(7/75) = 0.3055.
(d) When sampling with replacement, the variance of the sample mean is σ²/n, where σ² is the variance of a single observation and n is the sample size. Let X denote a single observation. Then X has the probability function given by the table below.

x           9       10      11
Pr(X = x)   1/10    6/10    3/10
We therefore obtain E(X) = 10.2 = μ, and

E(X²) = Σ_{x=9}^{11} x² Pr(X = x) = (1/10)(81 + 600 + 363) = 104.4.

Hence σ² = 104.4 − (10.2)² = 104.4 − 104.04 = 0.36, so that σ²/n = 0.36/3 = 0.12.
Notes

(1) The entries in the table giving the sampling distribution of T are found by combinatorial arguments. For example, the combination (1, 1, 1) can occur in

(1 choose 1)(6 choose 1)(3 choose 1) = 18

ways. Similarly, the combination (0, 2, 1) can occur in

(6 choose 2)(3 choose 1) = 45

ways, and so on.

(2) There is an alternative way of tackling part (d) which, however, takes much longer. In sampling with replacement there are 10³ = 1000 possible samples, each of which is equally likely to be the one chosen. Just as in the solution to part (a), one can construct a table listing the possible ways in which samples can arise, together with their probabilities of occurrence. The same reasoning as in that part of the solution can then be used to obtain Var(T), and hence Var(X̄).

(3) In the case of sampling with replacement we found the variance of X̄ to be 0.12. The standard error of X̄ is just the square root, i.e. 0.3464, which is larger than the value 0.3055 found for the corresponding case where sampling was done without replacement. It is natural to expect this, and indeed it will always occur, since when sampled items are not replaced more information is likely to be gained about the population.
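The sampling distributions in this problem are small enough to enumerate completely, which provides a direct check on parts (a) to (d). The Python sketch below is ours; it enumerates the 120 unordered samples and, following the approach of Note 2, the 1000 ordered samples with replacement:

```python
from itertools import combinations, product
from statistics import mean

bags = [9] + [10] * 6 + [11] * 3        # apples per bag, as in the problem

# without replacement: all C(10, 3) = 120 samples of three bags
totals = [sum(s) for s in combinations(bags, 3)]
mu_T = mean(totals)
var_T = mean([t * t for t in totals]) - mu_T ** 2
print(mu_T, round(var_T, 2), round(var_T / 9, 4))   # 30.6 0.84 0.0933

# with replacement: all 10**3 = 1000 ordered samples (the approach of Note 2)
totals_wr = [sum(s) for s in product(bags, repeat=3)]
mu_wr = mean(totals_wr)
var_xbar_wr = (mean([t * t for t in totals_wr]) - mu_wr ** 2) / 9
print(round(var_xbar_wr, 2))                        # 0.12
```

This reproduces E(T) = 30.6, Var(T) = 0.84, Var(X̄) = 7/75 without replacement, and Var(X̄) = 0.12 with replacement.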
4D.2 Two discrete random variables

A discrete random variable Y takes the values −1, 0 and 1 with probabilities ½θ, 1 − θ and ½θ respectively. Let Y₁ and Y₂ be two independent random variables, each with the same distribution as Y.

(a) List the possible values of {Y₁, Y₂} that may arise and calculate the probability of each. Verify that your probabilities sum to unity.
(b) By calculating the value of (Y₂ − Y₁)² for each possible pair {Y₁, Y₂}, determine the sampling distribution of (Y₂ − Y₁)².
(c) Show that X = ½(Y₂ − Y₁)² is an unbiased estimator for θ, and find Var(X) as a function of θ.
(d) Now suppose that Y₁, Y₂, …, Yₙ are n independent observations on the random variable Y. Since θ is the probability that Y is not zero, a possible estimator for θ is the proportion, Z, of nonzero values among Y₁, Y₂, …, Yₙ. Write down the mean and variance of Z, and state, giving your reasons, which of X and Z you would prefer as an estimator for θ when n = 2.

Solution
(a) The table below gives the joint probability for each possible value taken by the pair of random variables {Y₁, Y₂}. The probabilities Pr(Y₁ = y₁, Y₂ = y₂) are given by the product Pr(Y₁ = y₁)Pr(Y₂ = y₂), for y₁ = −1, 0, 1 and y₂ = −1, 0, 1, because Y₁ and Y₂ are independent.

            y₂ = −1      y₂ = 0       y₂ = 1
y₁ = −1     ¼θ²          ½θ(1−θ)      ¼θ²
y₁ = 0      ½θ(1−θ)      (1−θ)²       ½θ(1−θ)
y₁ = 1      ¼θ²          ½θ(1−θ)      ¼θ²

The sum of these nine probabilities is

{4 × ¼θ²} + {4 × ½θ(1−θ)} + {1 × (1−θ)²} = θ² + 2θ(1−θ) + (1−θ)² = {θ + (1−θ)}² = 1² = 1,

so that the probabilities sum to unity as required.
(b) Let U = (Y₂ − Y₁)²; a table of values of U for each of the possible values of {Y₁, Y₂} is given below.

            y₂ = −1    y₂ = 0    y₂ = 1
y₁ = −1     0          1         4
y₁ = 0      1          0         1
y₁ = 1      4          1         0

We see from this table that U can take just three possible values, 0, 1 and 4, and by combining the information from the two tables we can evaluate their probabilities. We see first that

Pr(U = 0) = ¼θ² + (1−θ)² + ¼θ² = ½θ² + (1−θ)² = 1 − 2θ + (3/2)θ².

Similarly we obtain

Pr(U = 1) = 4 × {½θ(1−θ)} = 2θ(1−θ),

and

Pr(U = 4) = 2 × {¼θ²} = ½θ².

The sampling distribution of U is thus as shown in the following table.

u           0                   1           4
Pr(U = u)   1 − 2θ + (3/2)θ²    2θ(1−θ)     ½θ²
(c) We have X = ½U, so obtain directly E(X) = ½E(U) and Var(X) = ¼Var(U). From the sampling distribution of U we find

E(U) = 0 × {1 − 2θ + (3/2)θ²} + 1 × {2θ(1−θ)} + 4 × {½θ²} = 2θ(1−θ) + 2θ² = 2θ.

Hence E(X) = ½(2θ) = θ, so X is an unbiased estimator for θ.

To obtain the variance of U (and hence that of X), we first find E(U²), as follows.

E(U²) = 0² × {1 − 2θ + (3/2)θ²} + 1² × {2θ(1−θ)} + 4² × {½θ²} = 2θ(1−θ) + 8θ² = 2θ + 6θ².

We thus obtain

Var(U) = E(U²) − {E(U)}² = 2θ + 6θ² − (2θ)² = 2θ + 2θ² = 2θ(1 + θ).

Hence Var(X) = ¼Var(U) = ½θ(1 + θ).
(d) In a sample of n independent observations Y₁, Y₂, …, Yₙ on the random variable Y, each of the n has the same probability θ of being nonzero. Hence V, the number of the {Yᵢ} which are nonzero, is a binomial random variable with index n and parameter, i.e. probability of 'success', θ. Thus E(V) = nθ and Var(V) = nθ(1 − θ). Hence, if Z = V/n, E(Z) = θ and Var(Z) = θ(1 − θ)/n. Therefore Z is an unbiased estimator for θ, for any positive integer value of n, and, in particular, when n = 2, Var(Z) = ½θ(1 − θ). Since θ > 0, Var(Z) = ½θ(1 − θ) < ½θ(1 + θ) = Var(X). We see then that X and Z are both unbiased estimators, but Z has the smaller variance, and on this basis Z would be preferred to X as an estimator for θ.
Notes
(1) In part (c) we found the mean and variance of X in terms of those of U, and then worked with the sampling distribution of U. We could, alternatively, have found the sampling distribution of X and then obtained E(X) and Var(X) directly from it. Since X = ½U, its sampling distribution is as given in the following table.

x           0                   ½           2
Pr(X = x)   1 − 2θ + (3/2)θ²    2θ(1−θ)     ½θ²
(2) Part (d) is somewhat open-ended, since in the statement of the problem no indication is given as to the basis of preference between X and Z. The obvious measures which one would consider using as criteria are bias and variance, and since both estimators are unbiased, Z is preferred to X because of its smaller variance. But in some circumstances one may have to choose between one estimator which is unbiased but has fairly large variance and another which is slightly biased but has smaller variance, and in such cases the choice may well be a difficult one. Problem 4D.3 contains relevant material.
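The algebra of part (c) can be checked by enumerating the nine pairs {Y₁, Y₂} for any particular value of θ. The Python sketch below is ours and the function name is invented:

```python
from itertools import product

def moments_of_x(theta):
    """E(X) and Var(X) for X = (Y2 - Y1)**2 / 2, by enumerating the nine pairs."""
    pr = {-1: theta / 2, 0: 1 - theta, 1: theta / 2}
    ex = exx = 0.0
    for y1, y2 in product(pr, repeat=2):
        p = pr[y1] * pr[y2]
        x = (y2 - y1) ** 2 / 2
        ex += p * x
        exx += p * x * x
    return ex, exx - ex ** 2

ex, vx = moments_of_x(0.3)
print(round(ex, 4), round(vx, 4))   # 0.3 0.195, i.e. theta and theta * (1 + theta) / 2
```

For θ = 0.3 this returns E(X) = 0.3 and Var(X) = 0.195, matching E(X) = θ and Var(X) = ½θ(1 + θ).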
4D.3 Mean square error of estimators
Explain what the following statements mean: (i) Y is a statistic; (ii) the statistic Y is an unbiased estimator of θ.
Suppose now that a statistic Y is an unbiased estimator of a parameter θ and has variance kθ². If one defines the mean square error MSE(X) of any estimator X of θ by
MSE(X) = E{(X − θ)²},
calculate MSE(cY), where c is some constant, and find the value of c for which MSE(cY) is a minimum.

Solution
(i) Since a statistic is a function of observed random variables, Y must be such a function. (ii) The random variable Y has expectation θ.
We are now given that E(Y) = θ, Var(Y) = kθ², and require the mean square error of cY. From the expression given for the mean square error (MSE) of any estimator X,
MSE(X) = E{(X − θ)²} = E(X² − 2θX + θ²) = E(X²) − 2θE(X) + θ² = Var(X) + {E(X)}² − 2θE(X) + θ².
Substituting cY for X, we obtain
MSE(cY) = c²kθ² + (cθ)² − 2θ·cθ + θ² = θ²(c²k + c² − 2c + 1) = mθ², say.
To obtain the value of c for which this mean square error is a minimum, straightforward differentiation gives
dm/dc = 2kc + 2c − 2,
and setting this to zero gives c = (k + 1)⁻¹. (The second derivative is positive for all values of c, so the stationary value is a minimum.)
Notes
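As a quick numerical check (not part of the original text), one can evaluate m(c) = c²k + c² − 2c + 1 on a fine grid and confirm that the minimum falls at c = 1/(k + 1); k = 0.5 below is an arbitrary illustrative value.

```python
k = 0.5  # illustrative value of Var(Y)/theta^2

def m(c):
    # m(c) = MSE(cY) / theta^2, minimised (per the solution) at c = 1/(k + 1)
    return c * c * k + c * c - 2 * c + 1

cs = [i / 10000 for i in range(20001)]  # grid on [0, 2]
c_best = min(cs, key=m)

print(c_best, 1 / (k + 1))
```

For k = 0.5 the grid minimum lands at c ≈ 2/3, in agreement with the formula.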
(1) The problem centres on use of expectation and variance formulae, and helps to remind us of a variety of useful results. One of these is analogous to the result in mechanics that the moment of inertia of a body about a point is least when that point is the centre of mass. In probability and statistics this is translated into results that E{(X − θ)²} and Σ(xᵢ − θ)² are minimised when θ is the mean; i.e. θ = E(X) in the first case and θ = x̄, the sample mean, in the second. These results are related to the following useful identity:
E{(X − θ)²} = Var(X) + {E(X) − θ}².
In the case of the current problem this result could have been quoted to give
MSE(X) = Var(X) + {E(X) − θ}²,
sometimes paraphrased as: 'MSE equals variance plus bias squared'.
(2) In the context of estimation the theory in this problem is useful in drawing attention to the possibility of using biased estimators to advantage. Elementary textbooks occasionally hint that bias is undesirable, yet in practice almost all statisticians would prefer a slightly biased estimator with small variance to an unbiased estimator with large variance; a sensible general criterion is that of mean square error.
The most straightforward example to which this theory applies is the estimation of the variance σ² of a normal distribution from a random sample X₁, X₂, . . . , Xₙ. Since
E{Σᵢ₌₁ⁿ (Xᵢ − X̄)²} = (n − 1)σ²,
the usual estimate is s², the corrected sum of squares divided by n − 1. The choice of n − 1 as divisor may give rise to worries, but is justified, of course, on grounds of bias. But if we remove the requirement of unbiasedness other possibilities open up. When sampling is from a normal distribution the random variable Σ(Xᵢ − X̄)²/σ² has a χ² distribution on n − 1 degrees of freedom, which has expectation (n − 1) and variance 2(n − 1). It follows that the usual estimator of σ² is unbiased, and has variance 2σ⁴/(n − 1). If we now convert to the notation of the problem, θ = σ² and k = 2/(n − 1). The solution shows that the mean square error will be a minimum for c = (k + 1)⁻¹ = (n − 1)/(n + 1). The rather surprising finding is that instead of producing an unbiased estimator by using the divisor (n − 1), one might consider using (n + 1), which would minimise mean square error. (But it is important to add that such an estimator should be used with caution; for example, it would be quite invalid to use it in the formula for a t-test, or indeed use it in place of s² in any other standard formula!)
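The note's claim — that for normal data the divisor n + 1 gives a smaller mean square error for σ² than the unbiased divisor n − 1 — can be illustrated by simulation. The sketch below is not from the original text; n, σ² and the seed are arbitrary illustrative choices.

```python
import random

random.seed(2)
n, sigma2, reps = 10, 4.0, 100_000  # illustrative values

mse_unbiased = mse_plus1 = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    css = sum((x - xbar) ** 2 for x in xs)  # corrected sum of squares
    mse_unbiased += (css / (n - 1) - sigma2) ** 2  # divisor n - 1
    mse_plus1 += (css / (n + 1) - sigma2) ** 2     # divisor n + 1

mse_unbiased /= reps
mse_plus1 /= reps
print(mse_unbiased, mse_plus1)
```

Theory gives the two mean square errors as 2σ⁴/(n − 1) and 2σ⁴/(n + 1) respectively, so the second is always the smaller; the simulation reproduces this ordering.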
4D.4 Arrivals at a petrol station
A small petrol station has a single pump. Customers arrive one at a time, their inter-arrival times independently distributed with density function
f(t) = λe^(−λt),  t ≥ 0,
for some unknown λ > 0. The times taken to serve customers are also independent and follow a normal distribution with unknown mean and variance.
(a) Discuss how you might estimate λ, and the parameters of the normal distribution, given observations on service times for n customers, together with the n − 1 intervals between their arrivals.
(b) Show that, in a fixed time interval of length T, starting immediately after an arrival, the probability of no further arrivals is e^(−λT).
(c) It can be shown that the number of arrivals in any time interval of length T has a Poisson distribution with mean λT. Suggest how this result might be used to give another method for estimating λ.
(d) In a random sample of 1-hour periods the numbers of arrivals were
12,12,15,18,11,12,14,17,10,14.
Find a 95% confidence interval for the mean number of arrivals per hour, based on a normal approximation to the Poisson distribution.
Solution
(a) For estimation of λ, we record a sample of inter-arrival times, and calculate the sample mean, ȳ, which will give a suitable estimate for the mean of the distribution, μ. Now, the relation between μ and λ is given by
μ = ∫₀^∞ t λe^(−λt) dt = [−te^(−λt)]₀^∞ + ∫₀^∞ e^(−λt) dt = 1/λ,
integrating by parts.
Since λ = 1/μ and ȳ is an appropriate estimate for μ, we could estimate λ by 1/ȳ. (There are other approaches to estimating λ based, like the present approach, on inter-arrival times, but this seems to be the most obvious.) Estimation of the parameters of the normal distribution is relatively straightforward. A sample of service times is available, and the sample mean, x̄, and the sample variance, s², can be calculated easily. These will then provide estimates for the mean and variance of the normal distribution.
(b) The probability that there are no arrivals in an interval of length T, starting immediately after an arrival takes place, is given by
Pr(inter-arrival time > T) = ∫_T^∞ f(t) dt = ∫_T^∞ λe^(−λt) dt = [−e^(−λt)]_T^∞,
which is e^(−λT), as required. (In fact the result holds for any time interval of length T, not necessarily starting immediately after an arrival.)
(c) Suppose that the number of arrivals is counted for n non-overlapping intervals each of length T. These then form a random sample of size n from a distribution with mean λT. Thus the sample mean, x̄, will provide an estimate of λT, so that a suitable estimate of λ is x̄/T. In fact the intervals used for counting need not all be of the same length, though they should not overlap. With unequal intervals the form of an appropriate estimate for λ will be somewhat more complicated, namely
λ̂ = Σᵢ xᵢ / Σᵢ Tᵢ,
where Tᵢ is the length of the ith interval and xᵢ is the number of arrivals in that interval.
(d) Suppose now that λ is the mean number of arrivals in one hour, and that X₁, X₂, . . . , Xₙ form a random sample of numbers of arrivals, with sample mean X̄. Since the mean and variance of a Poisson distribution are the same, E(X̄) = λ and Var(X̄) = λ/n. Using a normal approximation to the distribution of X̄ we have, approximately, X̄ ~ N(λ, λ/n). Hence
(X̄ − λ)/√(λ/n) ~ N(0, 1), approximately,
and so
Pr(X̄ − z_{α/2}√(λ/n) ≤ λ ≤ X̄ + z_{α/2}√(λ/n)) ≈ 1 − α,
where z_{α/2} is, as usual, a percentage point of N(0, 1). Thus
replacing λ by an estimator gives as an approximate interval X̄ ± z_{α/2}√(X̄/n), with z_{α/2} = 1.96 for a 95% interval. For the given data the sample mean is 13.5, so the interval has endpoints 13.5 ± 1.96√(13.5/10), i.e. 13.5 ± 2.28, and the 95% confidence interval for λ is (11.22, 15.78).

Notes
(1) The method used in the solution to part (d) is not the only possible one. It parallels the first of two methods used in the solution to Problem 4C.3 for the corresponding binomial distribution problem. In that problem an alternative solution was shown, and in the case of the Poisson distribution here we give below two other ways in which a normal approximation can be used to obtain an approximate confidence interval for λ.
One possibility is to replace √(λ/n) by s/√n, where s² is the sample variance and equals 6.72 for the present data. This gives the approximate interval X̄ ± z_{α/2}s/√n, since we are essentially treating our sample of 10 observations as coming from a normal distribution with mean λ and using the distribution of X̄ to find a confidence interval for λ. The interval has endpoints 13.5 ± 1.61, so that the 95% confidence interval is (11.89, 15.11). It would, in fact, be more appropriate here to use t₉;₀.₀₂₅ = 2.262 rather than z₀.₀₂₅ = 1.96; in this event the interval becomes (11.65, 15.35).
Another possibility, which gives a closer approximation, is to square the inequality |X̄ − λ|/√(λ/n) ≤ z_{α/2}, which holds with probability approximately 1 − α, to give a quadratic inequality in λ:
n(X̄ − λ)² ≤ z²_{α/2}λ, or nλ² − λ(z²_{α/2} + 2nX̄) + nX̄² ≤ 0.
But the probability is 1 − α that the inequality is satisfied, i.e. that λ lies between the two roots λ₁ and λ₂ of the equation
nλ² − λ(z²_{α/2} + 2nX̄) + nX̄² = 0,
so that (λ₁, λ₂) gives a confidence interval for λ, with confidence coefficient 1 − α. For the present example the roots of this quadratic are 11.41 and 15.98, so the interval is (11.41, 15.98).
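The three approximate intervals for λ can be reproduced numerically; the sketch below (not part of the original text) computes each in turn from the ten counts.

```python
import math

counts = [12, 12, 15, 18, 11, 12, 14, 17, 10, 14]
n = len(counts)
xbar = sum(counts) / n                                # 13.5
s2 = sum((c - xbar) ** 2 for c in counts) / (n - 1)   # sample variance, 6.72
z = 1.96

# 1. Normal approximation, Var estimated by xbar (Poisson: mean = variance)
h1 = z * math.sqrt(xbar / n)
ci1 = (xbar - h1, xbar + h1)

# 2. Normal approximation, Var estimated by the sample variance s^2
h2 = z * math.sqrt(s2 / n)
ci2 = (xbar - h2, xbar + h2)

# 3. Quadratic method: roots of n*t^2 - t*(z^2 + 2*n*xbar) + n*xbar^2 = 0
b, c = -(z * z + 2 * n * xbar), n * xbar * xbar
disc = math.sqrt(b * b - 4 * n * c)
ci3 = ((-b - disc) / (2 * n), (-b + disc) / (2 * n))

for lo, hi in (ci1, ci2, ci3):
    print(round(lo, 2), round(hi, 2))
```

The three printed intervals match those quoted in the solution and note.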
Comparing the three approximations, we see that the first and the third are reasonably close, with the latter interval being slightly wider. The second approximation is the narrowest; this is because for the present data the sample variance is substantially smaller than would be expected, given the mean and a Poisson assumption. If the Poisson assumption is valid then this second approximate interval is probably misleadingly precise.
(2) If a random variable X has a Poisson distribution, then Var(X) = E(X) = λ. Because of this, it is not at first clear that, when one has a random sample from a Poisson distribution, one should use the sample mean rather than the sample variance to estimate the parameter λ. In fact there are several cogent reasons for doing this. Some lie outside the scope of this book, but we show here that the sample mean gives the maximum likelihood estimate of λ. (See Problems 2A.6 and 2A.11 for a definition and discussion of the maximum likelihood method of estimation.)
Suppose we obtain a random sample of size n from the Poisson distribution with mean (and variance) equal to λ, and let the observations be denoted by x₁, x₂, . . . , xₙ. Then the probability l that these values x₁, x₂, . . . , xₙ are observed is given by
l = Π e^(−λ)λ^(xᵢ)/xᵢ! = e^(−nλ)λ^(Σxᵢ)/Π xᵢ!,
where the sum and product are both over the range i = 1, 2, . . . , n. To find the maximum likelihood estimate of λ we treat l as a function of λ, and find the value of λ at which it takes its maximum value. We could do this by differentiating l with respect to λ, but in fact it is easier, and equivalent, to differentiate log l. We find
log l = −nλ + Σxᵢ log λ − log(Π xᵢ!),
so that
d log l/dλ = −n + Σxᵢ/λ,
and setting this to zero gives λ̂ = Σxᵢ/n. The maximum likelihood estimate of λ is thus x̄, the sample mean.
(3) In this problem the arrival pattern of customers at the queue is what is termed a Poisson process. Such a process was defined formally in Note 1 to Problem 2A.7. We see here, from part (b), that in a Poisson process the gap between consecutive events (a continuous random variable) has an exponential distribution, while the number of events in a fixed time (a discrete random variable) has a Poisson distribution.

4D.5 Confidence intervals for an exponential distribution
A continuous random variable X with an exponential distribution has probability density function
f(x) = (1/λ')e^(−x/λ') for x > 0, and f(x) = 0 elsewhere.
The mean and variance of this distribution are known to be λ' and λ'² respectively.
(a) Find, in terms of λ', numbers a and b such that Pr(X < a) = Pr(X > b) = 0.05, so that Pr(a ≤ X ≤ b) = 0.90; hence write down a 90% confidence interval for λ' based on a single observation on the random variable X. Deduce that Pr(0.334X ≤ λ' ≤ 19.496X) = 0.90.
(b) Let X̄ denote the mean of a random sample of 81 observations on X. State the approximate distribution of X̄, and hence show that
Pr(0.8212X̄ ≤ λ' ≤ 1.2784X̄) ≈ 0.95.
(c) In a random sample of 81 observations on a random variable Y, the sample mean is 10.2 and the sample standard deviation is 10.6. Find a 95% confidence interval for σ², the variance of Y, assuming that Y has a normal distribution. If Y has in fact the same distribution as X above, use the probability statement in part (b) to find an approximate 95% confidence interval for σ².

Solution
(a) The first of the required numbers, a, is such that ∫₀^a f(x)dx = 0.05. Now
∫₀^a (1/λ')e^(−x/λ') dx = [−e^(−x/λ')]₀^a = 1 − e^(−a/λ') = 0.05,
so that a = −λ' log_e(0.95) = 0.0513λ'. Similarly, b is such that ∫_b^∞ f(x)dx = 0.05, i.e. e^(−b/λ') = 0.05, so b = −λ' log_e(0.05) = 2.996λ'. Hence
Pr(0.0513λ' ≤ X ≤ 2.996λ') = 0.90.
The inequalities are now manipulated so as to leave λ' in the middle: we can express Pr(X < a) as Pr(λ' > X/0.0513), and, looking at the other tail, Pr(X > b) as Pr(λ' < X/2.996). We thus obtain
Pr(0.334X ≤ λ' ≤ 19.496X) = 0.90,
so the random interval (0.334X, 19.496X) has a 90% chance of covering the parameter λ', and is therefore a 90% confidence interval for λ'.
(b) For large samples, from the Central Limit Theorem, a sample mean has approximately a normal distribution, regardless (almost) of the distribution of the individual observations. Here the individual observations have mean λ' and variance λ'², so X̄ is approximately N(λ', λ'²/81). Thus, substituting n = 81,
Pr(λ' − 1.96λ'/9 ≤ X̄ ≤ λ' + 1.96λ'/9) ≈ 0.95, i.e. Pr(0.7822λ' ≤ X̄ ≤ 1.2178λ') ≈ 0.95.
The inequality in the brackets is now manipulated so as to leave λ' in the middle, and we thus obtain
Pr(0.8212X̄ ≤ λ' ≤ 1.2784X̄) ≈ 0.95.
(c) Assuming that Y has a normal distribution, (n − 1)s²/σ² has a χ² distribution with (n − 1) degrees of freedom, and a confidence interval for σ² is given by
((n − 1)s²/χ²ₙ₋₁;₀.₀₂₅, (n − 1)s²/χ²ₙ₋₁;₀.₉₇₅).
Now χ²₈₀;₀.₀₂₅ = 106.63 and χ²₈₀;₀.₉₇₅ = 57.15, and (n − 1)s² = 80 × 10.6² = 8988.8, so the interval is (84.3, 157.3).
Finally, we are given that Y has in fact the same (exponential) distribution as X, so that Var(Y) = σ² = λ'². Using the result found in part (b), and simply squaring terms in the inequality (all of which are necessarily positive), we have
Pr((0.8212X̄)² ≤ λ'² ≤ (1.2784X̄)²) ≈ 0.95.
Substituting x̄ = 10.2 gives for this interval {(0.8212 × 10.2)², (1.2784 × 10.2)²}, i.e. (70.16, 170.04), a somewhat wider interval than that based on the normal assumption for Y.
Given that the overall proportion of sampled pupils who answered ‘Yes’ was 0. i.p ) / n 1 and Var(p2) = p(1 . then g ( X ) will not.) will be satisfied if i. from standard results. If the pupil tossed a head the question to be answered was ‘Is your birthday in April?’. Thus. estimate the proportion of pupils in the school who sniff glue. be an unbiased estimator of g(8). E(d) = so that d is an unbiased estimator of p . except when g ( X ) is a linear function of X or. in general. if .e. This part of the problem is concerned with a special case of this general result.p ) . whereas on a tail the question was ‘Do you sniff glue?’. so Var(p^l) = p(1 . if X is an unbiased estimator of 8.170 Inference 4. so that {E(S)}* < E(S2) = a2.p ) and Var(X2) = n Q ( l . so are for independent random variablcs we have a*. if X is a random variable. It is known that the proportion of pupils with April birthdays is 0. Solution (a) In general. This will be. then E&(X)} # g{E(X)}. D6 (c) In order to estimate the proportion of pupils in a secondary school who were glue sniffers. “1 4 “l”2 or . without divulging the outcome to the interviewer. Further.p ) l n 2 . Because and By the additive properties of variance the two experiments are independent. when X is a constant. to answer ‘Yes’ or ‘No’ to one of two questions. But v a r ( s ) = E(s*){E(s)}~. (b) Quoting results for binomial distributions. Consider the variance of S : Var(S ) = E[{S E(S )j2].e. depending on the outcome of the toss. In this special case the result can be proved directly without much trouble. Var(X1) = ti 1p (1 . it was decided to interview a random sample of pupils. greater than zero unless S is a constant. On such a sensitive subject a direct question would be unlikely to elicit a truthful answer. and g ( X ) is a function of X . E(X1) = n p and E(X2) = n p . E(S) < u.08. where X = S2 and g(X) = fi. trivially.11+”2 < . We thus find that E(id and hence that = W i d = Pt +{E(dl) + E(d2)) = +(P + P ) = P . 
Hence the required condition Var( a ) < Var( a. so each pupil in the sample was asked to toss a fair coin and.1.
and has other desirable properties. consider d = wp 1 + ( 1 . Then Pr('Yes') = Pr(birthday in April) Pr(Head) + Pr(sniffs glue) Pr(Tai1) 1 . for any other value of w .w)*var(&. the result is an intuitively reasonable estimator of p . "2 1 3 "1 To show that thereis an unbiased estimator of p of the form given with a smaller variance than p .08 = 0.05.e. where 0 5 w 5 1.. so that d is seen to be unbiased.e. But the estimator corresponds to w = so that V a r ( p ) < V a r ( a ) unless w = i. i.08. since d l and d 2 are independent. and solve the equation for p . D6 Other Problems 171 This condition reduces to " 1 < 3 n 2 . V a r ( a ) < Var(a2) if and so the two conditions quoted both hoId if nZ/nl < 3. Thus We see from this that V a r ( p ) depends on w only through the first term in the square bracket. Then E(d) = w E ( d 1 ) + (1 .w ) p 2 . a + 0. 0. which contains a squared component and is therefore at a minimum when w' = n I/(n + n?). (In fact it is unbiascd.5. i. where A l and A2 are respectively the events that the first and second questions arc answered. <<3. or n l / n 2 < 3.4.06.05) 8 0.x. i. Similarly.0. . Var( d ) will be greater. (c) The probability that a pupil returns a 'Yes' answer is Pr('Yes' ~ A 1 ) Pr(A 1) + Pr('Yes' 1 A2) Pr(A2).) We thus solve 0.06. Let the proportion of pupils who sniff glue be p .1 + p x 1 = e + 1 10 2 2 2 20' If we equate this expression to the observed proportion answering 'Yes'. 0. or 6%.w ) E ( i 2 ) = wp + (1w)p =p . = = 2 ~ ( 0 . For its variance we have var(d)= w2Var(il) + ( 1 . Thus the proportion of glue sniffers at the school is estimated to be 0. unless n l = n2.
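The inversion in part (c) — from the observed 'Yes' proportion back to an estimate of p — can be sketched as a one-line function (not part of the original text; the function name is ours).

```python
def estimate_p(prop_yes, p_innocuous=0.1):
    # Randomised response with a fair coin:
    # Pr(Yes) = 0.5 * p_innocuous + 0.5 * p, so p = 2 * Pr(Yes) - p_innocuous
    return 2 * prop_yes - p_innocuous

print(estimate_p(0.08))
```

For the observed proportion 0.08 and an innocuous-question probability of 0.1, this returns the estimate 0.06, i.e. 6%.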
Notes
(1) All three parts of this problem are difficult; in particular, parts (a) and (c) are non-standard and somewhat open-ended. However, questions similar to all three parts have appeared as parts of (separate) public examination questions in the U.K.
(2) There is an alternative derivation of the value of w which, in part (b), minimises Var(p̃). This is in some respects a more natural approach, since to many people differentiation is the most straightforward way of identifying a minimum. If we differentiate the expression for Var(p̃) we obtain
d/dw {w²/n₁ + (1 − w)²/n₂} = 2w/n₁ − 2(1 − w)/n₂.
Equating this to zero gives 2w/n₁ = 2(1 − w)/n₂, i.e. n₂w = n₁ − n₁w. The stationary value for w is therefore w = n₁/(n₁ + n₂), as expected. We can differentiate again to obtain 2/n₁ + 2/n₂, which is necessarily positive, so the stationary value is a minimum.
(3) The final part of the question is open-ended in several respects, but the solution given is the most natural one. A less formal way of looking at the problem is to argue that since the two questions have the same chance of being answered, the overall probability of a 'Yes' answer is a simple average of the two individual probabilities. Setting this average equal to the observed proportion giving a 'Yes' answer, we obtain the equation ½(0.1 + p) = 0.08, whose solution is, as before, p = 0.06.
(4) The interviewing technique discussed in part (c) is known as the randomised response technique. Another version of the technique is described in Problem 1B.

4D.7 Quality control for bolts
A machine produces bolts, and for each bolt there is a probability p of it being defective, results for different bolts being independent. A large batch of the machine's production is inspected by a customer in order to determine whether the batch should be purchased. In the inspection, 10 bolts are selected at random and examined: if none is defective the batch is accepted, and if three or more are defective it is rejected. If only one or two are defective, a further sample of 10 is selected, and the batch is accepted if, in total, there are no more than two defectives.
(a) What is the probability of a decision being made at the first stage?
(b) Find the probability that the batch is accepted (i) if p = 0.05, (ii) if p = 0.15.

Solution
(a) A decision is made at the first stage unless the number of defectives is 1 or 2. Let B₀, B₁, B₂ and B₃ denote, respectively, 0, 1, 2 and 3 or more defectives at the first stage. Then
Pr(B₀) = (1 − p)¹⁰, Pr(B₁) = 10p(1 − p)⁹, Pr(B₂) = 45p²(1 − p)⁸,
and Pr(B₃) could if needed be found by subtraction. The required probability is thus
1 − Pr(B₁) − Pr(B₂) = 1 − 10p(1 − p)⁹ − 45p²(1 − p)⁸.
(b) It is easiest to work algebraically in the first instance, and to let A denote overall acceptance. (We denote the corresponding events for the second stage by B₀*, B₁*, B₂* and B₃*.) Clearly B₀, B₁, B₂ and B₃ are mutually exclusive and exhaustive, so the conditions for the law of total probability are satisfied, and we have
Pr(A) = Σᵢ Pr(Bᵢ)Pr(A | Bᵢ).
The conditional probabilities are quite easily found. Trivially Pr(A | B₀) = 1 and Pr(A | B₃) = 0. If B₁ occurs, then acceptance follows if the second sample contains no more than 1 defective, i.e. Pr(A | B₁) = Pr(B₀*) + Pr(B₁*). Similarly Pr(A | B₂) = Pr(B₀*). Since Pr(Bᵢ*) = Pr(Bᵢ), we thus obtain
Pr(A) = Pr(B₀) + Pr(B₁){Pr(B₀) + Pr(B₁)} + Pr(B₂)Pr(B₀) = (1 − p)¹⁰{1 + 10p(1 − p)⁹ + 145p²(1 − p)⁸}.
Substituting p = 0.05 gives Pr(A) = 0.93, while when p = 0.15 the probability becomes 0.44.

Notes
(1) While the problem contains relatively straightforward numbers, the set-up described is quite realistic, in that a double sampling scheme of this type is quite frequently employed in quality control work, and can offer significant savings in effort over the simpler single sampling schemes. (Often the sample sizes will be somewhat larger, with a further 15 if a second stage is needed, and in many cases one will be interested in rather smaller values for p.) The probability of batch acceptance, treated as a function of the proportion p of defectives, is termed the operating characteristic or O.C. of the procedure, and in practice an important aim is to achieve a satisfactory O.C. without excessive cost; for example, one might wish to compare the O.C. for the current sampling scheme with that for a scheme which samples only once. Considerable research effort has gone into devising suitable schemes for all situations, and tables containing details of these schemes are published by various bodies concerned with manufacturing standards, for example the British Standards Institution.
(2) In the current case, and indeed in most elementary problems involving ideas of quality control, items sampled are classified only as defective or otherwise; the resulting sampling schemes are described as sampling by attributes. In many practical cases items will be acceptable if a measurement lies within certain limits, often called tolerance limits, and if such a measurement is made it offers opportunities for using extra information in a sampling scheme. Such schemes are described as sampling by variables.
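The closed-form expression for Pr(A) is easy to check numerically; the sketch below (not from the original text) evaluates the double sampling scheme directly from the first-stage outcome probabilities.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def accept_prob(p):
    # First sample of 10: accept on 0 defectives, reject on 3 or more;
    # on 1 or 2, take a second sample of 10 and accept if the total
    # number of defectives is at most 2.
    b = [binom_pmf(k, 10, p) for k in range(3)]
    return b[0] + b[1] * (b[0] + b[1]) + b[2] * b[0]

print(round(accept_prob(0.05), 2), round(accept_prob(0.15), 2))  # → 0.93 0.44
```

The two printed values reproduce the answers to part (b), and expanding the return expression recovers the factorised form (1 − p)¹⁰{1 + 10p(1 − p)⁹ + 145p²(1 − p)⁸}.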
4D.8 A control chart for mass-produced articles
Articles are mass-produced to a specified width of 0.15 cm. In order to check that the production process is running satisfactorily, random samples of five articles are taken at regular time intervals, and the widths measured. The following are the means and ranges of the widths in 20 consecutive samples, where the widths are measured from the specified value in units of 0.0001 cm. (Recall that the measurements quoted are the discrepancies between the actual widths and the specified value, in units of 0.0001 cm.)

[Table: sample number, sample mean and sample range for each of the 20 samples.]

(If the range of a random sample of size 5 from a normal population is r, then an estimate of the standard deviation is given by multiplying r by 0.4299.) Using these results, convert the average range into an estimate of the standard deviation of the widths of the articles in the population of manufactured articles. Hence draw a control chart for means, showing both 2.5% control limits (warning limits) and 0.1% control limits (action limits). Describe how this chart could be used to determine whether or not the manufacturing process is under control. What would your conclusion be for the data provided?

Solution
From the values of the 20 sample ranges given, the mean is 289/20 = 14.45 in working units, or 14.45 × 10⁻⁴ cm. The average range of samples of size 5 (assuming that they are taken from a normal population) provides an estimate of the population standard deviation equal to 0.4299 × 14.45 = 6.212 × 10⁻⁴ cm. Hence an estimate of the standard error of the mean of a sample of size 5 is 6.212/√5 = 2.778, in the working units.
Now, even if the process were under control, we would expect, on average, one in 40 of the sample means to fall outside the 2.5% (warning) limits. From tables of the standardised normal distribution, Φ(2.242) = 0.9875, so that the 2.5% control limits are at a distance 2.242 × 2.778 = 6.23 either side of zero. The 0.1% control limits (action limits) are obtained by scanning tables of the standardised normal distribution to find the value x such that Φ(x) = 0.9995; we obtain x = 3.29, and conclude that the control limits are 3.29 × 2.778 = 9.14 either side of zero. The control chart for means is therefore as shown in Figure 4.1, together with the data given in the problem.

[Figure 4.1 Control chart for Problem 4D.8, showing the 20 sample means with warning limits and action limits.]

Inspecting the diagram, we see that just one mean (from sample 18) falls outside the 2.5% limits, and none outside the 0.1% control limits. Even under control, one mean in 40 would, on average, fall outside the warning limits, and here one out of 20 so far has done so. This, therefore, provides some warning that the process might be getting out of control, but the evidence is far from conclusive, and no immediate action would be taken. One should also examine the pattern of the points in Figure 4.1 over time to check, for example, whether the means have an upward or downward trend, rather than fluctuating randomly.

Notes
(1) Multiplying factors for converting sample ranges to estimates of population standard deviations, for various sample sizes, are tabulated in some books of statistical tables. One such source, from which the value 0.4299 was obtained, is Elementary Statistics Tables, by H. R. Neave.
(2) A similar control chart could also be constructed for the ranges, to provide an additional check on the process. If this is done for the data provided, the conclusions are much the same as for the means.
(3) We note at the end of the solution that the pattern of the points should be examined to test for a trend or some other regular (perhaps cyclic) variation. Such trends, or other obvious patterns, do not seem to be discernible here. The techniques of time series analysis provide us with appropriate tests; in particular the statistic S described in Problem 5D.2 can be used to test for a trend here.
(4) In Figure 4.1 the points representing the sample means are joined by straight lines. This is an optional feature of a control chart; including these lines can make any trend or cyclic pattern easier to see, but we have seen charts, especially when the number of points plotted is large, where joining the points by lines obscures details and is thus a hindrance.
In that chapter we discussed problems in which one or two random samples had been selected from some distribution. little in common between the various sections of this chapter. The majority of examination questions on correlation. of course. and either could be applied baldly to many data sets. preferring to introduce the individual sections at greater length. Both apply to situations in which measurements are recorded on two variables. and for some reason this is particularly true of correlation. The other is that sitiations where correlation coefficients are of interest to the statistician typically involve. paves the way for the more useful r test. they are rather similar. one or several samples from multivariate distributions. Mathematically. all that is required is familiarity with computational techniques and a few mechanical procedures for hypothesis testing and interval estimation. and the problems are thus slightly more advanced than most of those in Chapter 4. by contrast. x and y say. for example. at the level we are considering. the data in each of the sections of this chapter present a welldefined structure.
In many respects the same is true of the regression problems to be found here, although in statistical practice one is usually faced with many possible explanatory variables; in this book we confine ourselves to cases in which there is a single explanatory (or 'independent') variable. For example, as this note was being written, we received a report of a survey into the annual income of statisticians in the United Kingdom. In that survey annual income was the dependent variable, and age, sex, location (in London or elsewhere), period in current job, number of jobs held, and qualifications were, amongst others, the explanatory variables, the aim being to determine which, if any, help to explain the observed variability in the dependent variable y.

We noted above that the two techniques are similar mathematically. Since they share the same basic calculations, it is natural that we should present these using a common notation. We suppose that the data available consist of n pairs of observations, (x1, y1), (x2, y2), ..., (xn, yn). It follows that the basic calculations required from a sample of pairs (xi, yi), i = 1, 2, ..., n, are the sums Σxi and Σyi, the sums of squares Σxi² and Σyi², and the sum of products Σxi yi, where in each case the sum is over the range from i = 1 to n. The three major quantities needed are the corrected sums of squares and products, which we denote by S_xx, S_yy and S_xy. The corrected sums of squares S_xx and S_yy are the familiar quantities which appear when we calculate a sample variance. The former is defined as

    S_xx = Σ (xi - x̄)²,

but is almost always calculated as

    S_xx = Σ xi² - (Σ xi)²/n,

the former being the definition and the latter being more suitable for calculation. The expressions for S_yy are similar, while the corrected sum of products is

    S_xy = Σ (xi - x̄)(yi - ȳ) = Σ xi yi - (Σ xi)(Σ yi)/n.

In the first few problems below we give the calculations in detail, with each step shown and with explicit limits on summations. But such detail does get tedious, and in later problems we omit some intermediate details of calculations and streamline the notation somewhat: Σ xi yi can in many circumstances be shortened to Σxy without loss of clarity.

5A.1 The numbers of pensioners

The data below show, for the years given, the numbers (to the nearest thousand) receiving U.K. retirement pensions.

    Year                                      1966   1971   1975   1976   1977   1978   1979
    No. of retirement pensioners (thousands)  6679   7677   8321   8510   8637   8785   8937

Plot the data and fit an appropriate least-squares regression line, drawing the line on your graph. Is there convincing evidence that the number of pensioners is increasing? Use your results to estimate the expected number of pensioners in 1979, 1981, 1985 and 2000, and comment on your findings.
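The corrected-sums route to a least-squares line can be sketched in a few lines of Python. This is an illustration, not part of the original text; it uses the coded pensioner data of Problem 5A.1 (x = year - 1970, y = (thousands of pensioners - 8000)/100).

```python
# Least-squares fit via corrected sums of squares and products.
x = [-4, 1, 5, 6, 7, 8, 9]
y = [-13.21, -3.23, 3.21, 5.10, 6.37, 7.85, 9.37]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
# Corrected sum of squares and corrected sum of products.
sxx = sum(xi * xi for xi in x) - sum_x ** 2 / n
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n

beta = sxy / sxx                       # slope
alpha = sum_y / n - beta * sum_x / n   # intercept

print(round(beta, 3), round(alpha, 3))  # prints 1.729 -5.693
```

The same two formulas are used throughout this section; only the data change.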
Solution

The required plot is shown in Figure 5.1. To calculate the least-squares regression line for numbers like the ones above it is simplest to use some form of coding. An arbitrary but fairly sensible coding scheme subtracts 1970 from the year and, for the number of pensioners (itself in thousands), subtracts 8000 and divides the result by 100. The data are then as follows.

    Year (x)         -4      1      5     6     7     8     9
    Pensioners (y)  -13.21  -3.23  3.21  5.10  6.37  7.85  9.37

The following basic calculations start the analysis:

    Σxi = 32.00, Σxi² = 272.00, Σyi = 15.46, Σyi² = 411.247, Σxi yi = 287.98,

each sum running from i = 1 to 7. We next obtain the corrected sums of squares and products:

    S_xx = 272.00 - 32.00²/7 = 125.714,
    S_yy = 411.247 - 15.46²/7 = 377.103,
    S_xy = 287.98 - (32.00 × 15.46)/7 = 217.306.

The estimated regression line has gradient β̂ = S_xy/S_xx = 217.306/125.714 = 1.729, and its intercept is given by α̂ = ȳ - β̂x̄ = 15.46/7 - (1.729 × 32.00/7) = -5.693. The line y = -5.693 + 1.729x is shown in Figure 5.1, together with the data given in the problem.

[Figure 5.1 Data on retirement pensioners, 1965-1980 (vertical axis: 6500-9000 thousand), with the least squares line indicated in the key.]

We are asked to consider evidence that the number of pensioners is increasing, and to do this we examine the null hypothesis that it is stationary, that is, that the slope β of the true regression line is zero. Under this null hypothesis E(β̂) = 0 and Var(β̂) = σ²/S_xx, where σ² is the error variance. We note that σ² is estimated by

    s² = (S_yy - S_xy²/S_xx)/(n - 2) = (377.103 - 217.306²/125.714)/5 = 0.2950,

and we test the hypothesis that β = 0 by comparing

    β̂ / √(s²/S_xx) = 1.729 / √(0.2950/125.714) = 35.7

with critical values of the t-distribution on 5 degrees of freedom. The 0.1% critical value, for a one-tailed test, is 5.89, and the evidence that the number of pensioners is increasing is therefore compelling.

Predictions for y are required for values of x coded as 9, 11, 15 and 30. Substituting in the regression equation gives, for each value of x, the values 9.864, 13.321, 20.235 and 46.164 for coded values of y, and 8986, 9332, 10023 and 12616 for the values in the original scale.

Notes

(1) In many cases there can arise legitimate doubts about whether correlation or regression is the more appropriate technique. Correlation is valid when both measurements are on random variables, and is thus inappropriate here: when one of the variables is time, no such doubt is possible.

(2) Most statisticians would feel that the relationship between y and x is not adequately described by a straight line with random scatter: it appears from the plot that the growth in the number of pensioners is not linear, but that a curved relationship is indicated. The number of pensioners grew steadily, but the growth started to tail away before 1979. A natural comment to make is that the results show the dangers of extrapolation: it might be reasonable to try to predict for a year or two ahead, but predictions for further ahead are increasingly implausible. Thus all the predictions seem likely to exaggerate the actual number of pensioners, the later ones grossly so. One would couple these findings with the curvature evident in the plot, although a satisfactory test would require far more observations than have been given here.

A common way of proceeding is to test the adequacy of the linear model

    y = α + βx + ε,

where ε represents a random error term, by comparing the fit of this model with that of the quadratic regression model

    y = α + βx + γx² + ε.

We do not have the space here to discuss fully how to fit the latter model, but readers will see that the best-fitting parabola

    y = -5.217 + 1.872x - 0.0291x²

gives a very much closer fit to the data.

A simple way of assessing the fit of an equation is to calculate the differences between the observed and fitted values, usually termed residuals. In the table below we show, for each value of x, the observed value of y, together with the predicted, or fitted, values given by the two equations, linear and quadratic. (All quantities are coded as in the solution.) These discrepancies, the residuals, are also given in the table and are plotted in Figure 5.2. The plot of the residuals from the linear equation has a distinctly curved shape, and in itself indicates that such a model is inadequate; by contrast, the residuals from the quadratic model appear haphazard. (Note 2 to Problem 5D.2 shows how one could approach the testing of these residuals for randomness.)

    x    Observed y   Fitted (linear)   Fitted (quadratic)   Residual (linear)   Residual (quadratic)
    -4    -13.21        -12.608           -13.170              -0.602               -0.040
     1     -3.23         -3.965            -3.375               0.735                0.145
     5      3.21          2.949             3.414               0.261               -0.204
     6      5.10          4.678             4.966               0.422                0.134
     7      6.37          6.407             6.459              -0.037               -0.089
     8      7.85          8.135             7.894              -0.285               -0.044
     9      9.37          9.864             9.272              -0.494                0.098

[Figure 5.2 Residuals from the linear and quadratic fits plotted against x.]

(3) The least squares estimates of α and β are obtained by choosing them to minimise the expression

    Σ (yi - α - βxi)²,

the sum running from i = 1 to n, and the best-fitting line then has the slope and intercept given by the expressions used in the solution. Since the estimates are found by minimising the sum of squares of the discrepancies, they are naturally known as least squares estimates. The method is widely used in regression and analysis of variance problems; it was used in Note 2 to fit the second-degree model, by choosing α, β and γ to minimise

    Σ (yi - α - βxi - γxi²)².
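The linear-versus-quadratic comparison of Note 2 can be reproduced numerically. The sketch below (not part of the original text) uses numpy, whose polyfit routine minimises the same least squares criterion as the hand calculation:

```python
import numpy as np

# Coded pensioner data from Problem 5A.1.
x = np.array([-4, 1, 5, 6, 7, 8, 9], dtype=float)
y = np.array([-13.21, -3.23, 3.21, 5.10, 6.37, 7.85, 9.37])

lin = np.polyfit(x, y, 1)    # [slope, intercept]
quad = np.polyfit(x, y, 2)   # [gamma, beta, alpha]; close to -0.0291, 1.872, -5.217

res_lin = y - np.polyval(lin, x)
res_quad = y - np.polyval(quad, x)

# The quadratic residuals are far smaller and show no systematic trend.
print(np.round(lin, 3), np.round(quad, 4))
print(round(float(np.sum(res_lin ** 2)), 3), round(float(np.sum(res_quad ** 2)), 3))
```

Plotting the two residual vectors against x reproduces the curved and haphazard patterns described in the note.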
5A.2 Graduate unemployment

It has been suggested that the unemployment rate for graduates of U.K. universities may depend on the proportion of Arts students in those universities. A random sample of 9 universities (excluding any with medical students) gave the following results.

    Percentage of Arts students          …
    Percentage of graduates unemployed   …

Do you conclude that the suggestion is justified?

Solution

Since the suggestion that unemployment depends on subject balance within a university implies that we have one explanatory and one dependent variable, regression is the appropriate technique to use. We employ a natural notation, with arts percentage denoted by x and percentage unemployed by y. Figure 5.3 shows a plot of the data, in which there is a clear indication of a relationship between the two variables.

[Figure 5.3 Unemployment percentage data of Problem 5A.2: percentage of graduates unemployed plotted against percentage of Arts students (0-70%).]

Calculation of sums, sums of squares and the sum of products gives the following:

    n = 9, Σx = 504.1, Σy = 182.1, Σx² = 30162.5, Σy² = 3830.09, Σxy = 10596.61,

where the sums are taken over the nine universities. We first calculate the corrected sums of squares and products:

    S_xy = Σxy - (Σx)(Σy)/9 = 397.166 and S_xx = Σx² - (Σx)²/9 = 1931.79.

We can now proceed to estimate the gradient β in the regression line y = α + βx. The estimate of β is

    β̂ = S_xy/S_xx = 397.166/1931.79 = 0.2056.

To determine whether the dependent variable y is related to the explanatory variable x we need to test the hypothesis H0: β = 0, and to do this we need to specify the statistical model more precisely. Employing the model

    E(Yi) = α + βxi, Var(Yi) = σ², i = 1, 2, ..., n,

with the different Yi s independently normally distributed, we obtain β̂ ~ N(β, σ²/S_xx). The unknown variance σ² is estimated conventionally by s², where

    s² = (S_yy - S_xy²/S_xx)/(n - 2) = 9.134.

By analogy with inference about the mean of a single random sample, we then test the hypothesis β = 0 by comparing the statistic

    (β̂ - 0)/√(s²/S_xx) = 0.2056/√(9.134/1931.79) = 2.99,

in which β is replaced by the null hypothesis value 0, with percentage points of the t-distribution on n - 2 = 7 degrees of freedom. The 5% and 1% critical values for a two-tailed test are 2.36 and 3.50 respectively. The result is thus significant at the 5% level, though not at the more stringent 1%. Rejecting the hypothesis that β = 0, we conclude, therefore, that the proportion of Arts students is relevant to the unemployment rate.

Notes

(1) The mathematical aspects of the solution above have been written out in some detail, but the practical problems of interpretation in problems like these are often subtle and easy to overlook. In the present case there are many questionable elements, including the following: (i) normality, with the different Ys independently normally distributed: is it plausible? (ii) linearity of relationship: how plausible is that? (iii) is it reasonable to judge the relationship from records of 9 of the 18 universities without medical schools? (iv) would there not be many more possible explanations for variations in unemployment rates which do not directly involve the proportion of Arts students?

(2) The calculation of s² above is a little cumbersome, and the presentation of an Analysis of Variance table often helps to formalise the process.
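The slope test just performed can be packaged as a small function working directly from the corrected sums. This is an illustrative sketch, not part of the original text; the figures are those quoted in the solution above.

```python
import math

def slope_t(beta_hat, s2, sxx):
    """t statistic for H0: beta = 0, given the estimated slope, the
    residual variance estimate s2 and the corrected sum of squares of x."""
    return beta_hat / math.sqrt(s2 / sxx)

# Figures from the graduate-unemployment solution (n - 2 = 7 df).
t = slope_t(0.2056, 9.134, 1931.79)
print(round(t, 2))  # about 2.99, between the 5% (2.36) and 1% (3.50) points
```

The same function applies to any simple linear regression once β̂, s² and S_xx are in hand.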
The Analysis of Variance table also illustrates the links regression has with one-way and two-way analysis of variance. (Section 5B deals with these topics.) For the present case the table is as follows.

    Source       Degrees of Freedom   Sum of Squares   Mean Square
    Regression           1                81.66           81.66
    Residual             7                63.94            9.134
    Total                8               145.60

The estimate s² of σ² is then the Residual Mean Square, 9.134, and the test of the hypothesis β = 0 can alternatively be performed by taking the ratio 81.66/9.134 = 8.94 and comparing it with tables of the F-distribution on 1 and 7 degrees of freedom. We see that the tests in the solution and in this note are identical, and are not competitors: in this case the t-statistic is 2.99, whose square is 8.94. (Mathematically, the square of a random variable with Student's t-distribution has an F-distribution, as we see in Note 1 to Problem 2B.)

(3) The wording of the problem, with its use of 'depend', justifies the employment of a two-tailed test. Since 'everybody knows' that Science graduates are more employable than Arts graduates (a point hinted at by the exclusion of universities with medical schools), it is at least arguable that one is not really interested in a negative regression coefficient, and that a one-tailed test could be justified. Yet these matters are in many cases not all that clear cut. A sensible approach to a problem of this sort is to try to take a reasonable view of the intention of the problem, and then to justify the decision taken. But it is important to remember that it is the intention of the problem, and not the appearance of the data, which matters when choosing between a one-tail and a two-tail test: it is bad practice to decide which test to perform after looking at the data.

(4) While it is not required for this particular problem, it is worth noting that once β̂ has been obtained one can straightforwardly calculate α̂ = ȳ - β̂x̄ = 8.72, so that the regression line is y = 8.72 + 0.2056x.

5A.3 Regression for several sets of data

The data below are claimed to relate to four experiments, which were conducted with the objective of investigating the effect of the value of a variable x on the associated value of a variable y. In each experiment the same values of x were used: thus only one column of x-values is given, while there are four columns of y-values, one for each experiment.

    x     10.0   8.0   13.0   9.0   11.0   14.0   6.0   4.0   12.0   7.0   5.0
    y1    …
    y2    …
    y3    …
    y4    …

For each set of data: (i) obtain the equation of the least squares regression line of y on x; (ii) plot the data, and comment on the results.
Solution

(i) Our basic calculations for the first set of data are as follows:

    Σxi = 99.0, Σyi = 230.45, Σxi² = 1001.0, Σxi yi = 2104.85,

each sum running from i = 1 to 11. Corrected sums of squares and products are then

    S_xx = 1001.0 - 99.0²/11 = 110.0 and S_xy = 2104.85 - (99.0 × 230.45)/11 = 30.80.

We continue to find estimates of the slope β and intercept α of the regression line as follows:

    β̂ = 30.80/110.0 = 0.28 and α̂ = 230.45/11 - (0.28 × 99.0/11) = 18.43.

When we turn to the other three data sets, we find that the results of the basic calculations, and therefore of the whole analysis, are the same: each set yields the fitted line y = 18.43 + 0.28x.

(ii) Plots of the four data sets are presented in Figure 5.4.

[Figure 5.4 Plots for data of Problem 5A.3: (a) y1 against x, (b) y2 against x, (c) y3 against x, (d) y4 against x.]

The plot of the first data set reveals nothing untoward. The plot of the second data set indicates a strong curvilinear relationship: a possible analysis for this data set would be one in which the regression of y on x is quadratic (see Note 2 to Problem 5A.1); in such a case the method of least squares, as we have used it to fit a straight line, is inappropriate. In a plot of the third data set all but one of the points lie close to a straight line, the other being far above the line. Such an observation (an outlier) merits our attention: in this case the best approach might be to re-analyse the data with this observation deleted (see Note 3). The appearance of the fourth data set suggests that the variability of y increases with the corresponding value of x; the method of least squares is again inappropriate here, since it attaches equal weight to all observations in fitting the regression line.

The data in this question have been artificially constructed to illustrate the point that an unthinking analysis of data can, through the application of inappropriate methods, lead to conclusions which are seriously in error. When contemplating any statistical analysis, it is important to check that the assumptions underlying that analysis are, at least approximately, valid for the data in hand. For bivariate data, where an analysis based on correlation or regression techniques is contemplated, a simple plot is an important preliminary to (or part of) the analysis.

Notes

(1) The fact that all four experiments yield results leading to exactly the same analysis is clearly suspicious. It can be instructive to conduct a class exercise in which students are each given one of the four data sets and asked to fit a regression line. Comparing numerical results they should find that their answers agree exactly, but when they plot the data they find that they have been analysing different sets, and that some have been doing an inappropriate analysis. Thus they learn, through experience, the value of plotting data before any further analysis. (We are referring here, of course, to 'real life', rather than to the more artificial process of answering examination questions.) Three of the data sets in this problem are based on sets originally presented by F. J. Anscombe in an article published in 1973 in The American Statistician. These have been modified so that the basic calculations for regression yield precisely the same results for each data set, whereas results for the original data sets, while in close agreement, were not identical.

(2) The problem could have asked for a more detailed analysis, involving questions such as those posed in Problems 5A.1 and 5A.2, in which plots, unless specifically asked for, are usually something of a luxury. It is worth remarking, however, that if we had asked for any further analysis along these lines the results would have been the same for all four data sets; in view of the artificial nature of the data, we did not feel this was justified. The same would apply if we were to set a problem involving the production of (product moment) correlation coefficients for the four data sets.

(3) The problem of outliers is encountered in all areas of statistics, and not just in regression. It is not good practice for a statistician to discard observations just because they 'don't look right'; but when outlying observations are present, including them in a standard analysis can render that analysis meaningless. We frequently resolve this difficulty not by discarding any outliers, but by reporting separately on them and then analysing and reporting on the remaining data in the usual way. Before this is done, however, each offending observation should be scrutinised to see if, for example, there has been a simple error in transcribing a figure (in which case the error can be corrected and the data re-analysed), or to see if the experimenter has any explanation for the discrepancy (in which case the observation might be omitted from further formal analysis). Sometimes such scrutiny reveals nothing, in which case there may be no alternative to deleting (and reporting separately on) the offending observation(s). If this course of action has to be considered, informal judgments as to whether suspicious-looking observations are outliers or not may be supplanted by formal statistical tests designed for this purpose. In the case of the third data set in this problem the picture is sufficiently clear for no formal test to be necessary, as can easily be seen by superimposing the regression line y = 18.43 + 0.28x on a scatterplot of y3 against x.
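The point made in Note 1 can be checked directly with the original Anscombe (1973) values as commonly quoted from his article (these are not the modified figures used in this problem, so the lines below come out at roughly y = 3.0 + 0.5x rather than y = 18.43 + 0.28x):

```python
# Anscombe's quartet: sets 1-3 share the x column; set 4 has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

def fit(x, y):
    """Least-squares slope and intercept, rounded to two decimals."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    b = sxy / sxx
    return round(b, 2), round(sum(y) / n - b * sum(x) / n, 2)

for x, y in [(x123, y1), (x123, y2), (x123, y3), (x4, y4)]:
    print(fit(x, y))   # each prints (0.5, 3.0)
```

Plotting the four sets, as in the class exercise of Note 1, shows how different they really are despite the identical fits.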
(4) A fifth data set which gives precisely the same results as the four in this problem has ten y-values, all corresponding to x = 8, and an eleventh y-value of 23.75 corresponding to x = 19. (Note that there is no suggestion that a standard regression analysis is inappropriate for this data set; though it is less well suited to a class exercise of the type described in Note 1, it illustrates another important statistical point.) The observation corresponding to x = 19 is influential: whereas even a quite large change in any of the other y-values would have only a slight effect on the fitted regression line, a change in this value would noticeably alter the fitted line. Thus the accuracy of our estimates depends crucially on the reliability of a single observation, and this would generally be regarded as an unsatisfactory state of affairs. Our objection here is to the design of the experiment giving rise to the data; indeed, a further objection to the design is that it does not permit us to detect situations, such as those illustrated by the second and fourth data sets, in which a standard analysis would be inappropriate.

5A.4 Inflation, money supply and dysentery

The table below gives the values of the following quantities over the years 1967-1975.

x: The annual inflation rate in the U.K. (measured as a percentage).
y: The excess money supply two years previously (measured as a percentage).
z: The number (in thousands) of cases of dysentery reported in Scotland during the previous year.

    Year   1967   1968   1969   1970   1971   1972   1973   1974   1975
    x       2.5    4.7    5.4    6.4    9.4    7.1    9.2   16.1   24.2
    y       …
    z       …

(a) Calculate the product moment correlation coefficient between variables x and y, and demonstrate that the value you have obtained differs significantly from zero.
(b) Do the same for variables x and z.
(c) These data were presented in letters which appeared in the correspondence columns of The Times in April 1977. The writer of one letter suggested that the values of x and y demonstrated clearly the effect of the money supply on inflation, while the writers of a later letter maintained that those of x and z showed clearly the effectiveness of Scottish dysentery in keeping prices down. Comment on these claims.
Solution

(a) The basic calculations

    Σx = 85.0, Σy = 84.5, Σx² = 1166.72, Σy² = 1335.43, Σxy = 1174.86

lead to corrected sums of squares and products S_xx = 363.94, S_yy = 542.07 and S_xy = 376.80. The correlation coefficient between x and y is therefore

    r_xy = 376.80 / √(363.94 × 542.07) = 0.848.

To demonstrate that this value differs significantly from zero we note, if tables of critical values of the correlation coefficient are available, that the critical value based on a sample of size 9, for a two-sided test at the 1% significance level, is 0.7977. Since our observed value of r_xy exceeds this critical value it may be judged to be highly significant. If tables of critical values for the correlation coefficient are not available, we calculate

    t = 0.848 √7 / √(1 - 0.848²) = 4.23.

Comparing this value with critical values of the t-distribution with 7 degrees of freedom, we find that it exceeds the (two-tailed) 1% point (3.50); again, testing at the 1% significance level, we reject the hypothesis of zero correlation in the underlying population.

(b) Our basic calculations follow the same pattern as that followed in the solution to part (a); some quantities have been calculated already and details will not be repeated here. New quantities required are Σz = 31.2, Σz² = 121.38 and Σxz = 234.38, leading (along with results obtained earlier) to S_zz = 13.55 and S_xz = -60.80. Recalling that S_xx = 363.94, we find that the correlation coefficient between x and z is

    r_xz = -60.80 / √(363.94 × 13.55) = -0.866.

Once again, if appropriate tables are available, we note that our observed value of r_xz, its magnitude exceeding the critical value 0.7977, may be judged to be highly significant. Alternatively, we calculate

    t = -0.866 √7 / √(1 - 0.866²) = -4.58;

since |t| > 3.50, we judge our observed correlation to be highly significant.

(c) In linking the rate of inflation to the reported prevalence of dysentery in Scotland, the authors of the second letter, while not necessarily disagreeing with the suggestion of a link between the money supply and the rate of inflation, were making the point that the conclusion of the earlier letter, based on data such as these, should be treated with extreme caution. They produced a statistical argument leading to a conclusion which was clearly ridiculous, to draw attention to the fact that the misuse of statistical techniques (and correlation in particular) can lead to 'proofs' of erroneous hypotheses. The fundamental error in arguments leading to the conclusion that 'y causes inflation' (where 'y' represents any possible cause, not necessarily the particular quantity discussed above) is that the demonstration of association between y and inflation does not imply that y in fact causes inflation. We are dealing with observational, rather than experimental, data, and it is possible (and, indeed, likely) that both y and the rate of inflation are affected by other factors.

Notes

(1) In correlation studies we treat the two variables symmetrically, and all that we can demonstrate is an association. Here we are tempted to interpret the association in terms such as 'y causes inflation', rather than 'inflation causes y', merely because the observations of y precede the corresponding observations of the inflation rate.
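The t-test for a correlation coefficient used in the solution can be sketched as a one-line function (illustrative Python, not part of the original text):

```python
import math

def corr_t(r, n):
    """t statistic on n - 2 degrees of freedom for testing a zero
    population correlation, given the sample coefficient r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Part (a): inflation against money supply, n = 9.
print(round(corr_t(0.848, 9), 2))   # about 4.23, beyond the 1% point 3.50
```

Applying the same function to r = -0.866 reproduces the part (b) statistic of about -4.58.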
(2) Examples of 'spurious correlation' like the above frequently arise because there is an underlying time trend in the variation of the two quantities under observation. That this is the case in this example is easily seen by, for example, plotting the variables against time. The presence of a time trend means that correlation techniques are inappropriate for the data of this problem, since we do not have a random sample from a bivariate distribution; regression techniques (with the rate of inflation as the response variable) would be more appropriate, since they do not require a random sample from a bivariate distribution (see Note 1 to Problem 5A.1). Time, indeed, provides a better 'explanation' of variations in the inflation rate than either of the other variables: the correlation coefficient between the rate of inflation and the year is 0.875. (Calculating a correlation coefficient between the rate of inflation and time is also, strictly speaking, invalid, since we again do not have a random sample from a bivariate distribution. We are, however, using the correlation coefficient here merely as a numerical measure on the basis of which to compare apparent strengths of relationships. Unfortunately, public examination questions which ask the candidate to make inappropriate use of correlation coefficients are quite common, and the same problems arise in other questions where the observations are gathered through time; this fact justifies our inclusion of the problem in this book.)

(3) The two tests for significance of the correlation coefficient used in our solution are equivalent. A partial demonstration of their equivalence is possible if we consider, for example, a value of r equal to the 1% critical value, 0.7977. This transforms to a t-value of

    t = 0.7977 √7 / √(1 - 0.7977²) = 3.4997

which, to the accuracy we are justified in claiming given that the value we are transforming is accurate to four significant figures, agrees with the critical value obtained from t-tables. A test based on the z-transformation of the correlation coefficient is also possible; however, since it is based on an approximation, the exact tests that we have used are preferred. See Note 2 to Problem 5A.5 for a comparison between the exact and approximate tests.

(4) We have seen in Note 2 that correlation techniques are, strictly, inappropriate here, and the aim of regression is to explain variation in one quantity in terms of another. The close links between the techniques of regression and correlation may nevertheless be illustrated by applying regression techniques to the data of this problem. If we fit a regression line with y as the dependent variable and x as the explanatory variable, we obtain (as in Problems 5A.1 and 5A.2) β̂ = 1.035 and s² = 21.71. A test of the null hypothesis β = 0 may then be based on a t-statistic with the value

    1.035 / √(21.71/363.94) = 4.24

which, to the accuracy we are justified in claiming, is the value obtained in the solution to part (a).

5A.5 Correlation between height and weight

The heights (in cm) and weights (in kg) of thirteen male and ten female students are given below:

Male students
    Height:  183, 170, 178, 173, 175, 178, 185, 190, 182, 192, 173, 175, 188
    Weight:  …

Female students
    Height:  170, 165, 168, 163, 160, 155, 168, 160, 160, 157
    Weight:  …

(a) Obtain 95% confidence intervals for the correlation coefficients between height and weight for the male and female students.
(b) Is there any evidence in the data to suggest that the two correlation coefficients differ?
Solution

(a) We shall let h denote height and w weight. Starting with the male students, standard calculations lead to

    Σh = 2342.0, Σw = 889.0, Σh² = 422522.0, Σw² = 61538.75, Σhw = 160527.0,

and hence to S_hh = 601.69, S_ww = 744.83 and S_hw = 370.23. The sample correlation coefficient between height and weight for the male students is therefore

    r_m = 370.23 / √(601.69 × 744.83) = 0.553.

To obtain a confidence interval for the correlation coefficient ρ_m between height and weight for male students, we use Fisher's z-transformation,

    z = ½ log_e ((1 + r)/(1 - r)).

The transformed value of r_m is z_m = 0.623. The distribution of z is approximately normal with variance 1/(n - 3), where n is the sample size, so a 95% confidence interval for the transformed value of ρ_m has endpoints

    0.623 ± 1.96/√10, i.e. 0.623 - 0.620 = 0.003 and 0.623 + 0.620 = 1.243.

Applying the inverse transformation to the upper limit we find, as the upper limit of our confidence interval for ρ_m, the value 0.846; applying the same transformation to the lower limit, we find that the transformed value is, to the accuracy quoted, the same as the untransformed value (since the z-transformation has very little effect on values close to zero). Our 95% confidence interval for ρ_m is therefore (0.003, 0.846).

Turning now to the female students, we proceed as we did for the male students. We obtain

    Σh = 1626.0, Σw = 577.5, Σh² = 264616.0, Σw² = 33946.6, Σhw = 94022.6,

and hence S_hh = 228.4, S_ww = 596.0 and S_hw = 121.125. The sample correlation coefficient between height and weight for the female students is therefore

    r_f = 121.125 / √(228.4 × 596.0) = 0.328.

The transformed value of r_f is z_f = 0.340, and the 95% confidence interval for the transformed correlation coefficient has endpoints 0.340 ± 1.96/√7, i.e. -0.400 and 1.081. The corresponding interval for ρ_f is (-0.380, 0.794).

(b) A test of the equality of ρ_m and ρ_f may be based on the difference between the observed values of z_m and z_f: if the two population correlation coefficients are the same, the distribution of z_m - z_f is approximately N(0, 1/10 + 1/7). A test of the null hypothesis that the two population coefficients are equal may therefore be performed by comparing the statistic

    (z_m - z_f) / √(1/10 + 1/7) = (0.623 - 0.340)/0.493 = 0.574

with critical values of the standard normal distribution. Since the observed value of this statistic falls well short of the 5% critical value of 1.96, there is no reason to reject the null hypothesis.
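The z-transformation interval used above can be sketched in Python (an illustration, not part of the original text; math.atanh and math.tanh implement the transformation and its inverse):

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a population correlation,
    via Fisher's z-transformation; z is ~ N(z(rho), 1/(n - 3))."""
    z = math.atanh(r)                  # z = (1/2) log((1 + r)/(1 - r))
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

lo, hi = fisher_ci(0.553, 13)          # male students
print(round(lo, 3), round(hi, 3))      # close to (0.003, 0.846)
```

Calling fisher_ci(0.328, 10) similarly reproduces the female students' interval of roughly (-0.380, 0.794).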
Notes

(1) Two points about the z-transformation are worth mentioning. The first is the obvious one that, despite the notation which is in almost universal use, the distribution that is used to approximate that of z is not the standard normal distribution N(0,1). The second is that, although we have indicated in the solution to this problem how the z-transformation and its inverse can be computed on a calculator with exponential and logarithmic functions, some readers may prefer to use tables of the two transformations. It is the common practice for such tables to deal only with positive values of r and z, since negative values may be dealt with using the easily verifiable properties z(-r) = -z(r) and r(-z) = -r(z).

(2) The fact that the confidence interval obtained for ρ_m does not include the value zero indicates that, at the 5% significance level, the null hypothesis ρ_m = 0 should be rejected, but only just, since the lower limit of the interval lies quite close to zero. Performing an exact test, we reach the same conclusion: the appropriate critical value for the correlation coefficient obtained from tables is 0.5529, and the observed coefficient exceeds it by only a narrow margin.

(3) Since the (product moment) correlation coefficient for male students exceeds the 5% critical value by only a narrow margin, and since our method based on the z-transformation is only approximate, it might be interesting to investigate the accuracy of the approximation by comparing with the results of an exact test. We can investigate this point by finding the value of r which would, for a sample of size 13, have been judged 'just significant' at the 5% level by an approximate test based on the z-transformation. This would correspond to a value of |z| of 1.9600/√10 = 0.6198, which transforms to a value of |r| of 0.5510, quite close to the exact critical value quoted above. This suggests that, even for quite small sample sizes, methods involving the z-transformation may provide a fairly good approximation to the exact distribution, at least when the underlying correlation coefficient is close to zero.

(4) If, for example, we calculate Spearman's rank correlation coefficient for the male students, we obtain the value 0.397 (without adjustment for ties), or 0.386 (adjusting for ties). Since the 5% critical value for Spearman's coefficient, based on a sample of size 13, is 0.560, the observed value is not significant, whereas the product moment coefficient is (just) significant. The difference between the results of these two tests illustrates the loss of information resulting from the replacement of the actual values of height and weight by ranks.

5A.6 Judging a beauty contest

The twelve contestants in a beauty contest are judged by two judges, each of whom ranks the contestants in order. The results are as follows:

    Contestant   1    2    3    4    5    6    7    8    9    10   11   12
    Judge A      8    2    1    12   9=   5=   5=   9=   3    9=   7    4
    Judge B      8=   2=   1    10   6    7    2=   8=   4    5    12   11

Calculate a rank correlation coefficient between the two judges' results and perform an appropriate significance test. Comment on your results.
Solution

Since the statement of the problem does not specify which rank correlation coefficient is to be used, we present two solutions, one involving Spearman's coefficient and one involving Kendall's.

Spearman's coefficient

Letting a and b respectively represent the ranks assigned by judges A and B, Spearman's coefficient can be calculated as the product moment correlation coefficient between the a's and b's. (The tied ranks 2=, 5=, 8= and 9= are replaced by the values 2.5, 5.5, 8.5 and 10.0 respectively.) Performing the usual calculations, we obtain S_aa = 140.5, S_bb = 142.0 and S_ab = 74.25. Spearman's rank correlation coefficient is therefore

    r_S = 74.25 / sqrt(140.5 x 142.0) = 0.526.

An alternative method of calculating r_S is to make use of the formula

    r_S = 1 - 6 (sum of d_i^2) / (n(n^2 - 1)),

where n denotes the number of contestants and, for i = 1, 2, ..., n, d_i denotes the difference between the ranks a_i and b_i assigned to the ith contestant. This formula is exact if there are no ties in either ranking, but is only approximate otherwise; the approximation is, however, fairly good unless there are many ties. For our data, we obtain the values below.

    a_i    8.0   2.0   1.0  12.0  10.0   5.5   5.5  10.0   3.0  10.0   7.0   4.0
    b_i    8.5   2.5   1.0  10.0   6.0   7.0   2.5   8.5   4.0   5.0  12.0  11.0
    d_i   -0.5  -0.5   0.0   2.0   4.0  -1.5   3.0   1.5  -1.0   5.0  -5.0  -7.0

We now find that the sum of the d_i^2 is 134.0, so that

    r_S = 1 - (6 x 134.0)/(12 x 143) = 0.531.

As can be seen, the approximation performs well for this example.

To assess the significance of this value, we consult tables for Spearman's coefficient, and find that the 5% and 10% critical values for a sample of size 12 are 0.5874 and 0.5035 respectively. (Although we are performing a two-tailed test, there is a fairly strong argument in favour of a one-tailed test - see Note 1.) Our observed value, therefore, differs from zero at the 10%, but not at the 5%, level of significance. There is thus a fairly strong indication that the two judges tend to agree (as contestants would hope), but the evidence is hardly conclusive: the calculated correlation coefficient reflects the fact (obvious on inspecting the data) that their agreement is far from total.

Kendall's coefficient

To calculate this coefficient, we consider all n(n - 1)/2 pairs of contestants. For each pair we score +1 if the two judges rank the two contestants in the same way, and -1 if they are ranked differently; if the two contestants are awarded the same rank by either or both of the judges the score is 0. If the total of the scores obtained is S, Kendall's coefficient r_K is defined as the ratio

    r_K = S / sqrt(N_A N_B).
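The Spearman calculation above can be checked numerically. A minimal sketch (Python, not part of the original solution), computing both the product moment form and the sum-of-d-squared approximation from the tie-adjusted ranks:

```python
import math

# Tie-adjusted ranks (2= -> 2.5, 5= -> 5.5, 8= -> 8.5, 9= -> 10).
a = [8, 2, 1, 12, 10, 5.5, 5.5, 10, 3, 10, 7, 4]     # Judge A
b = [8.5, 2.5, 1, 10, 6, 7, 2.5, 8.5, 4, 5, 12, 11]  # Judge B
n = len(a)

def corr(x, y):
    """Product moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r_exact = corr(a, b)                       # exact even in the presence of ties

d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
r_approx = 1 - 6 * d2 / (n * (n * n - 1))  # exact only when there are no ties

print(round(r_exact, 3), d2, round(r_approx, 3))
```

Both values exceed the 10% critical value 0.5035 but fall short of the 5% value 0.5874, in line with the conclusion reached in the solution.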
Here N_A and N_B are the numbers of pairs ranked distinctly by A and B respectively. If there were no ties in the judges' rankings both N_A and N_B would be equal to n(n - 1)/2, i.e. to 66 in this problem. As it is, the members of four pairs are tied by A (one pair having the rank 5.5, while three pairs have the rank 10), and there are two pairs tied by B. Thus N_A = 66 - 4 = 62, and N_B = 66 - 2 = 64.

One way of calculating S involves first rearranging the observations from left to right so that the ranks awarded by one of the judges (it does not matter which) are in non-decreasing order. We then work through the contestants from left to right, considering the ranks awarded by the other judge. For each contestant we consider the ranks appearing to the right of his or her position in the list, excluding those corresponding to contestants tied with the current contestant by the first judge. We score +1 for higher ranks, and -1 for lower ranks; noting the contribution to the total score, we move on to the next contestant. If we choose Judge A as the first judge, the results are as follows.

    Judge A   1    2    3    4    5.5  5.5   7    8    10   10   10   12
    Judge B   1    2.5  4    11   7    2.5   12   8.5  8.5  6    5    10
    Score     11   9    7   -6    2    6    -5   -1    1    1    1    0

(It may be in order to explain how some of the above scores are obtained. The first is obtained by considering all ranks to the right of the '1' in the 'Judge B' row; since all eleven of these exceed 1, the score of 11 is obtained. The eighth score is obtained by considering the four ranks appearing to the right of the first '8.5' in the row of Judge B's ranks. Of these, one exceeds 8.5, one is tied with it, and two are less than this value, so that the score is 1 - 2 = -1. The ninth is obtained by considering the three ranks appearing to the right of the second '8.5', but excluding the ranks 6 and 5 due to ties in Judge A's ranking: this leaves just the rank 10, leading to a score of 1.)

Totalling the scores obtained, we find that S = 26; thus

    r_K = 26 / sqrt(62 x 64) = 0.413.

Comparing the value of r_K with tables, we find that it differs significantly from zero at the 10% level, but not at the 5% level (the appropriate critical values are 0.3939 and 0.4545 respectively). Our conclusions are thus similar to those reached through the calculation of r_S.

As with Spearman's coefficient, there is an alternative method of calculation for r_K which is only approximate in the presence of ties, but which is often used unless there are many ties. The (slight) simplification in this method is that we ignore ties in calculating N_A and N_B, so that both have the value n(n - 1)/2, i.e. 66 in this problem. With this simplification, we have

    r_K = 26/66 = 0.394.

We note that the effect of ignoring ties, though greater than in the case of Spearman's coefficient, is slight.
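The pair-counting above is mechanical and easy to script. A minimal sketch (Python, not part of the original solution) that recomputes S, N_A, N_B and r_K for the contest data, together with the large-sample normal approximation (variance 2(2n+5)/(9n(n-1))) discussed in the notes:

```python
import math
from itertools import combinations

def sign(x):
    return (x > 0) - (x < 0)

# Tie-adjusted ranks from the two judges (5= -> 5.5, 9= -> 10, etc.).
a = [8, 2, 1, 12, 10, 5.5, 5.5, 10, 3, 10, 7, 4]     # Judge A
b = [8.5, 2.5, 1, 10, 6, 7, 2.5, 8.5, 4, 5, 12, 11]  # Judge B

n = len(a)
pairs = list(combinations(range(n), 2))

# Score +1 for concordant pairs, -1 for discordant, 0 for any tie.
S = sum(sign(a[j] - a[i]) * sign(b[j] - b[i]) for i, j in pairs)

# Numbers of pairs ranked distinctly by each judge.
NA = sum(1 for i, j in pairs if a[i] != a[j])
NB = sum(1 for i, j in pairs if b[i] != b[j])

r_K = S / math.sqrt(NA * NB)
print(S, NA, NB, round(r_K, 3))   # 26 62 64 0.413

# Under no association r_K is approximately normal with mean 0 and
# variance 2(2n+5)/(9n(n-1)); 1.6449 gives the approximate 10% point.
sd = math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
print(round(1.6449 * sd, 3))      # cf. the exact 10% point 0.3939
```

The value 26/66 = 0.394 of the tie-ignoring simplification is obtained by replacing both N_A and N_B with len(pairs).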
Notes

(1) Although in most hypothesis testing problems it is fairly clear whether a one- or two-tailed test is appropriate (and a fairly good rule is to use the latter unless the wording of the problem indicates that departures from the null hypothesis in a particular direction are of interest), there are some problems where the situation is rather less clear-cut. In such situations our view would be that either type of test would be valid, provided that some justification is given for the choice. In this problem there is a strong argument in favour of a test which rejects the null hypothesis for large positive values of the correlation coefficient, since such values indicate some agreement between the two judges, and this is what we would hope to detect. Large negative values indicate systematic disagreement, which could lead us to suspect that one or other of the judges is worse than useless! Despite this argument, and despite the absence of any wording which suggests a one-tailed test, we have chosen to perform a two-tailed test, on the grounds that, if there were systematic disagreement, we would be interested in detecting and, if possible, explaining it. (Note that there is nothing in the statement of the problem to tell us whether a (numerically) high rank corresponds to a good or a bad candidate. It is, of course, not necessary to know this provided that the two judges are following the same convention in ranking the candidates. If they did not, we would expect to observe results suggesting a degree of systematic disagreement.)

(2) The presence of ties in the rankings causes two problems. The first is that computational formulae which do not correct for ties lead to results which are slightly inaccurate. This can, of course, be overcome (as in our solution) by using the correct formulae: this does not add much to the work involved in the case of Kendall's coefficient, and need not add much in the case of Spearman's coefficient if a suitable calculator is available. The second problem resulting from the presence of ties in the rankings concerns the tables of critical values for the two coefficients. These are based on the distributions which arise if the ranks awarded by each judge are independent random permutations of the integers 1, 2, ..., n. The tables are thus, strictly speaking, inappropriate in the presence of ties. There is no totally satisfactory way round this problem, and all we can do is hope that the presence of relatively few ties has only a slight effect on the accuracy of the tabulated values.

(3) If tables for testing the significance of rank correlation coefficients are not available, approximations may be used. For Spearman's coefficient it is possible to use tables for the product moment correlation coefficient (or, equivalently, to transform the value of r_S to a value which may be checked against t-tables). The approximation is fairly good even for quite small values of n: for n = 12 we can compare the exact 5% critical value used in our solution (0.5874) with the corresponding value for the product moment correlation coefficient (0.5760). For Kendall's coefficient a normal approximation can be used: in the absence of association, the distribution of r_K is approximately normal with mean zero and variance 2(2n + 5)/9n(n - 1). For a good approximation, we require a value of n rather larger than we have here: the exact 10% critical value (0.3939) may be compared with the approximation 1.6449 x sqrt(2(2n + 5)/9n(n - 1)) = 0.3635.

(4) A concise expression for the quantity S which appears in the definition of Kendall's coefficient is

    S = sum over pairs i < j of sign(a_j - a_i) sign(b_j - b_i),

where

    sign(x) = 1 if x > 0,  0 if x = 0,  -1 if x < 0.

It is possible to derive a similar expression for r_S. This shows that Kendall's coefficient is similar to Spearman's coefficient, just as Spearman's coefficient is a version of the product moment correlation coefficient which replaces actual data values by ranks, but makes rather less use of the information provided by the ranks.
5B Analysis of Variance

This topic is mainly covered at undergraduate level. At its simplest, the analysis of variance can be viewed as little more than an extension of the two-sample t-test to more than two samples. But the topic is much richer than that. It does have a very wide application, especially in industrial and agricultural research, where it can play a very prominent role, and is therefore of particular value through the insights it offers into the practical applications of statistics.

Its main application is to what are called comparative experiments. Roughly speaking, these are experiments in which we may compare, for example, the yields of two or more types of crop, or the fuel consumption of different models of car, or measures of hand-eye coordination of children of different ages. In each of these cases, we obtain experimental data, and, by and large, the analysis always proceeds by measuring the total variability in the data (by the corrected sum of squares of all the data values), and then allocating this to the various possible sources of that variability. In this way one can judge the importance or otherwise of the various sources.

Of course, one would not in practice conduct an experiment solely to compare yields of a few types of crop. In all commercial farming, fertilisers are applied to crops, and experiments would be performed to obtain information simultaneously about crops and fertilisers. Two-way analysis of variance (and, indeed, generalisations which we do not cover here) shows the power of statistical methods of conducting and analysing experiments so as to draw conclusions simultaneously about two types of variation. The method was devised by Sir Ronald A. Fisher, and is just one of his fundamental contributions to statistics. The idea of experimenting in this way was considered rather unsound when Fisher first put it forward, but is now recognised as being one of the most important contributions of statistics to scientific experimentation.

5B.1 Relationship between quality and temperature

An industrial plant was maintained at different temperatures on four successive days, and on each day three samples were taken from the process and analysed for quality. In each case a score was awarded in arbitrary units, and the resulting scores are given in the table below.

                        Temperature (Day)
    Sample   100°C (1)   120°C (2)   140°C (3)   160°C (4)
      1         41          54          50          38
      2         44          56          52          36
      3         48          53          48          41

Is there evidence to suggest that average score depends on temperature? If so, determine between which temperatures there are significant differences at the 5% level, and draw appropriate conclusions.

Solution

Although the data appear in the form of a two-way table, the three scores in each column are, in fact, simply repeated (and presumed independent) samples at the same temperature. Hence a one-way analysis of variance model is appropriate, and we aim to partition the total variability of the data, as measured by the corrected sum of squares about the sample mean, into two components: a 'between temperatures' component and a 'within temperatures' component. This is done in the Analysis of Variance table below; the necessary calculations are given separately at the end of the solution.
    Source                  Degrees of Freedom   Sum of Squares   Mean Square   F ratio
    Between temperatures            3                434.25         144.75       23.16
    Within temperatures             8                 50.00           6.25
    Total                          11                484.25

The calculated F ratio, 144.75/6.25 = 23.16, is now compared with the tabulated upper percentage points of the F-distribution on 3 and 8 degrees of freedom. At the 1% significance level the tabulated value is 7.59. The result is thus highly significant, giving very strong evidence that temperature affects score.

To discover more about the precise way in which score is affected by temperature, we can calculate the so-called Least Significant Difference (LSD) at the 5% level. We obtain

    LSD = 2.306 x sqrt(2 x 6.25/3) = 4.71;

in this formula 2.306 is the upper 2.5% point of the t-distribution on 8 degrees of freedom, 6.25 is the mean square within temperatures, and 3 is the number of observations at each temperature. The sample means for the four temperatures are now compared to see which pairs differ by more than the LSD. The means are (in ascending order of temperature) 44.33, 54.33, 50.00 and 38.33. We see then that the means for temperatures 120° and 140° are not significantly different at the 5% level, but all other pairs are significantly different at that level. We conclude that the mean score is highest for the two intermediate temperatures, and in particular the means for 100° and 160° are lower than the other two.

Calculations

For problems in the Analysis of Variance the calculations are rather more lengthy than in most others in this book, and it is convenient to set out the working leading to the Analysis of Variance table here. A satisfactory sequence of calculations is presented below.

(i) Grand total, GT = 41 + 44 + ... + 36 + 41 = 561; number of observations, n = 12.
(ii) Correction factor, CF = (GT)^2/n = 561^2/12 = 26226.75.
(iii) Totals for temperatures: T1 = 133, T2 = 163, T3 = 150, T4 = 115. Numbers of observations: n1 = n2 = n3 = n4 = 3.
(iv) Uncorrected total sum of squares, USS = 41^2 + 44^2 + ... + 36^2 + 41^2 = 26711.
(v) (Corrected) total sum of squares, CSS = USS - CF = 26711 - 26226.75 = 484.25.
(vi) Between temperatures sum of squares, SSB = T1^2/n1 + T2^2/n2 + T3^2/n3 + T4^2/n4 - CF = (133^2 + 163^2 + 150^2 + 115^2)/3 - 26226.75 = 26661 - 26226.75 = 434.25.
(vii) Within temperatures sum of squares = CSS - SSB = 484.25 - 434.25 = 50.00.
(viii) Degrees of freedom: Total = n - 1 = 11; Between temperatures = 4 - 1 = 3; Within temperatures = 11 - 3 = 8.
(ix) Mean square = sum of squares / degrees of freedom.
(x) F ratio = 144.75/6.25 = 23.16.
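The calculation sequence (i)-(x) translates directly into code. A minimal sketch (Python, not part of the original solution), reproducing the ANOVA table and the LSD for the temperature data; the value 2.306 is the tabulated t-point used in the text:

```python
import math

groups = [
    [41, 44, 48],   # 100 degrees C
    [54, 56, 53],   # 120 degrees C
    [50, 52, 48],   # 140 degrees C
    [38, 36, 41],   # 160 degrees C
]

data = [x for g in groups for x in g]
n = len(data)                          # 12
CF = sum(data) ** 2 / n                # correction factor, 26226.75
USS = sum(x * x for x in data)         # uncorrected total SS, 26711
CSS = USS - CF                         # total SS, 484.25
SSB = sum(sum(g) ** 2 / len(g) for g in groups) - CF  # between, 434.25
SSW = CSS - SSB                        # within, 50.00

df_between = len(groups) - 1           # 3
df_within = n - len(groups)            # 8
MSB, MSW = SSB / df_between, SSW / df_within
F = MSB / MSW
print(round(F, 2))                     # 23.16

# Least Significant Difference at the 5% level; 2.306 is the upper
# 2.5% point of t on 8 degrees of freedom (from tables).
LSD = 2.306 * math.sqrt(2 * MSW / 3)
means = [sum(g) / len(g) for g in groups]
print(round(LSD, 2), [round(m, 2) for m in means])   # 4.71 [44.33, 54.33, 50.0, 38.33]
```

Any pair of means differing by more than the LSD of 4.71 is declared significantly different at the 5% level, which is exactly the comparison made in the solution.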
Notes

(1) As in most statistical analyses, the data are assumed to satisfy some theoretical model, and this should, of course, be checked in practice. In this problem we assume that the three scores observed at each temperature are a random sample from a normal distribution with a particular mean and variance. We may denote the mean of the distribution of scores at temperature 100° by mu_1, the mean at temperature 120° by mu_2, etc. Then, in effect, we have made 3 independent observations on each of the distributions N(mu_1, sigma^2), N(mu_2, sigma^2), N(mu_3, sigma^2) and N(mu_4, sigma^2). Notice that the variance, sigma^2, is assumed to be the same in all four distributions; this is a very important assumption. The null hypothesis that we adopt is that mu_1 = mu_2 = mu_3 = mu_4, and we wish to test if this is true rather than the alternative hypothesis that at least two of these means are different. If the null hypothesis is true then the calculated F ratio ought to have an F-distribution.

(2) The formula for the Least Significant Difference at level alpha is given by

    LSD = t x sqrt(2 s^2/N),

where s^2 is the within temperatures mean square, v is its number of degrees of freedom, t is the appropriate t-distribution percentage point on v degrees of freedom, and N is the number of data values used to compute each of the means being compared. The method derives from the conventional two-sample t-test for the difference between two means; a short discussion is given in Note 4 to Problem 4B.2.

(3) The LSD is a convenient and equivalent way of performing the six possible t-tests that would be required to compare each pair of means. The LSD method, however, ignores the fact that six hypothesis tests are being performed, not one, and so the significance level of 5% is an underestimate of the true chance of rejecting the null hypothesis when it is true. More advanced statistical techniques are available for overcoming this difficulty.

(4) Although the LSD method is a legitimate one to use here, a better analysis can be obtained by making use of the fact that the four temperatures can be placed in a natural order, from low to high, and consequently there is a definite and intentional structure amongst the four temperatures. This structure has not been exploited in the given solution. The underlying assumption that might have been made when planning the experiment was that score is likely to exhibit a roughly quadratic relationship with temperature: that is, if mean score were denoted by y and temperature by t then the relationship would be y = a + bt + ct^2. With four equally spaced temperatures, there are simple formulae for estimating a, b and c. Presumably the aim of the experiment was to discover which temperature is likely to maximise the mean score; if a, b and c could be estimated, the maximum of this function, if it existed, could be obtained using calculus.

(5) In an experiment of the type described it is important that the conditions under which the industrial process is run are identical (except for temperature change) on each of the different days. Otherwise it is impossible to decide whether the differences between the four means are due to the temperature change or due to a change in the day. If the choice of day was thought to affect the score then a more involved experiment would be required.

5B.2 The rubber content of guayule plants

A random sample of 50 plants was selected from a field containing one-year-old plants of a variety of guayule, a plant species yielding rubber. Of the 50, 25 were classified as Normal (N), 14 were classified as Off-type (O) and 11 as Aberrant (A). The data on the following page show the percentage rubber content from each plant.
[Data table: percentage rubber content for each of the 25 Normal, 14 Off-type and 11 Aberrant plants.]

(a) Test the null hypothesis that there is no difference in mean rubber content between the three types of plant.

(b) If the means of the distributions of each type of plant are denoted by mu_N, mu_O and mu_A respectively, test the following particular null hypotheses: (i) mu_N = mu_A; (ii) mu_O = (mu_N + mu_A)/2.

Solution

The data are classified in one way only and so the appropriate statistical model is that for a one-way analysis of variance. The Analysis of Variance table follows, the necessary calculations being laid out at the end of the solution.

    Source          Degrees of Freedom   Sum of Squares   Mean Square   F ratio
    Between types           2               15.5457         7.7728       14.04
    Within types           47               26.0182         0.5536
    Total                  49               41.5639

(a) The F ratio for testing the null hypothesis that mu_N = mu_O = mu_A is 7.7728/0.5536 = 14.04 on 2 and 47 degrees of freedom. The tabulated upper 1% point of the F-distribution on these degrees of freedom is, by interpolation, approximately 5.1. (Statistical tables are often less detailed for degrees of freedom higher than 40.) The F ratio is highly significant, giving overwhelming evidence of a difference between the three types of plant.

(b) (i) To test the null hypothesis that mu_N = mu_A we express the hypothesis as mu_N - mu_A = 0 and use a t-test. If xbar_N and xbar_A are the observed means of the two types and s^2 is the within types mean square, the t-statistic is

    t = (xbar_N - xbar_A) / sqrt(s^2 (1/n_N + 1/n_A)),

and has the t-distribution on 47 degrees of freedom. For the data in this problem, t is given by

    t = (6.864 - 6.462) / sqrt(0.5536 (1/25 + 1/11)) = 1.49.

The 2.5% point of the t-distribution on 47 degrees of freedom is, by interpolation, 2.01. On a two-tailed test at the 5% level, we therefore find no evidence to reject the null hypothesis.

(ii) To test the hypothesis that mu_O = (mu_N + mu_A)/2, we again use a t-test. Using the sample mean for Off-type plants, the appropriate statistic is

    t = (xbar_O - (xbar_N + xbar_A)/2) / sqrt(s^2 (1/n_O + 1/(4 n_N) + 1/(4 n_A))),

which we must also compare with the t-distribution on 47 degrees of freedom. For the given data we find

    t = (5.550 - (6.864 + 6.462)/2) / sqrt(0.5536 (1/14 + 1/100 + 1/44)) = -4.64.

The tabulated 0.5% point is, again by interpolation, approximately 2.68, so the hypothesis will be rejected at the 1% level on a two-tailed test. There is convincing evidence that mu_O is not equal to (mu_N + mu_A)/2.

Overall, our conclusion is that the mean rubber contents of Normal and Aberrant plants are very similar, but that they differ from that of Off-type plants.

Calculations

(i) Grand total, GT = 320.39; number of observations, n = 50.
(ii) Correction factor, CF = (GT)^2/n = 320.39^2/50 = 2052.9950.
(iii) Totals for types of plant: T_N = 171.61, T_O = 77.70, T_A = 71.08. Numbers of observations: n_N = 25, n_O = 14, n_A = 11.
(iv) Uncorrected total sum of squares, USS = 2094.5589.
(v) (Corrected) total sum of squares, CSS = USS - CF = 2094.5589 - 2052.9950 = 41.5639.
(vi) Between types sum of squares, SSB = T_N^2/n_N + T_O^2/n_O + T_A^2/n_A - CF = 15.5457.
(vii) Within types sum of squares = CSS - SSB = 41.5639 - 15.5457 = 26.0182.
(viii) Degrees of freedom: Total = n - 1 = 49; Between types = 3 - 1 = 2; Within types = 49 - 2 = 47.
(ix) Mean square = sum of squares / degrees of freedom.
(x) F ratio = 7.7728/0.5536 = 14.04.

Notes

(1) The tests used in the solution require several conditions to be met for their validity. These are that the observations are independent and normally distributed, all with the same variance; the measurements on the three types of plant form three independent random samples. Under the null hypothesis in part (a), the values obtained from the 50 plants are a random sample from N(mu, sigma^2), where mu is the common value of mu_N, mu_O and mu_A, and sigma^2 is the common variance of the observations. Under the alternative, the samples come from normal distributions with different means but with the same variance. (Strictly, the alternative hypothesis is just that the means are not all the same; under the alternative, the means are not necessarily all different.)
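The standard errors used in part (b) extend to any linear combination of the three sample means: if Y = a xbar_N + b xbar_O + c xbar_A then Var(Y) = sigma^2 (a^2/n_N + b^2/n_O + c^2/n_A). A minimal sketch (Python, not part of the original solution) of this computation, using the within-types mean square 0.5536 as the estimate of sigma^2:

```python
import math

def contrast_se(s2, coeffs, ns):
    """Estimated standard error of sum(c_i * xbar_i), where the i-th
    sample has n_i observations and s2 estimates the common variance."""
    return math.sqrt(s2 * sum(c * c / n for c, n in zip(coeffs, ns)))

s2 = 0.5536           # within types mean square
ns = (25, 14, 11)     # sample sizes for N, O, A

# Part (b)(i): mu_N - mu_A, coefficients (1, 0, -1).
print(round(contrast_se(s2, (1, 0, -1), ns), 4))       # 0.2692

# Part (b)(ii): mu_O - (mu_N + mu_A)/2, coefficients (-0.5, 1, -0.5).
print(round(contrast_se(s2, (-0.5, 1, -0.5), ns), 4))  # 0.2401
```

Dividing the observed contrasts 6.864 - 6.462 and 5.550 - 6.663 by these standard errors reproduces the t-statistics 1.49 and -4.64 quoted in the solution.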
variable. and test for differences between operators and between machines. that for a twoway analysis of variance. Solution In this experiment there are two main sources of variability: differences between operators and differences between machines. as here. A r test can be generated to test such hypotheses as a p N + b p o + c p A = 0. replacing u2 by s 2 gives us a statistic with a 1 distribution. there are only two null hypotheses of interest. 5B. under the hypothesis a p~ zero. in part (b).1 we used the Least Significant Difference method. but have not used it here.f and b = 1. The Analysis of Variance table follows. with each operator using each of the five machines for a whole shift. Operator Machine C D 53 55 56 44 93 95 96 88 A 56 64 62 51 B 92 83 80 78 E 68 62 62 69 1 2 3 4 Carry out an appropriate analysis of variance. its use is rather more complicated when. In the usual way. a = c = . with 100 corresponding to perfect quality and 0 to useless material. an estimate of a pLN b po+ c p ~ viz.3 Analysis of Vnriririce 199 (2) In the solution to Problem 5B. b and c . Y .) * (3) The rtest is useful when testing more complex null hypotheses than those such as pA = p. treating this as a random + . therefore. as numerator.say. The quality of the components having been quite variable. Now. differences between operators or both. and because the second of these does not involve a simple comparison of two mean values. for any given constants a . or p~ = PA. (In fact. where p is some specified value. The production was observed for each shift. in effect. an experiment was conducted to determine whether the variability was caused by differences between machines.5B. the sample sizes are unequal. + b po + c pc ” = 0. The LSD can be used when the aim is to make comparisons between pairs of means. This is mainly because. The statistical model that is appropriate is. we find that Var(Y) = ___ “N a202 2 + b+ a2 “0 c2m2 “A “N “0 “A Since. 
The quality of the components produced during any shift was assessed by quality control inspectors on a scale of 0 to 100.3 The quality of electronic components In a factory manufacturing electronic components there are four machine operators and five machines producing similar items. a XN+ b TO+ c FA. Y is normally distributed and has mean when the hypothesis is true. . In these cases the test is conducted by using. In part (b)(ii) we used. The results were as given in the following table. the calculations being presented at the end of the solution.
    Source      Degrees of Freedom   Sums of Squares   Mean Squares   F ratios
    Operators           3                 129.75           43.25         1.99
    Machines            4                4754.30         1188.57        54.75
    Residual           12                 260.50           21.71
    Total              19                5144.55

The F ratio for testing the null hypothesis that there are no differences between operators is 43.25/21.71 = 1.99 on 3 and 12 degrees of freedom. This value is compared with upper percentage points of the F-distribution; the 5% point is 3.49, comfortably above the observed value. Hence this null hypothesis cannot be rejected at the 5% level. We cannot, therefore, conclude that the variability in component quality is a result of differences between the operators.

In a similar way the F ratio for testing the null hypothesis of no differences between the machines is 1188.57/21.71 = 54.75. This value is highly significant, since it far exceeds 9.63, the 0.1% point of the F-distribution on 4 and 12 degrees of freedom. We conclude that there is overwhelming evidence to reject this null hypothesis. In other words, quality does vary from one machine to another.

Calculations

(i) Grand total, GT = 1407; number of observations, n = 20.
(ii) Correction factor, CF = (GT)^2/n = 1407^2/20 = 98982.45.
(iii) Totals for operators: O1 = 362, O2 = 359, O3 = 356, O4 = 330.
(iv) Totals for machines: M_A = 233, M_B = 333, M_C = 208, M_D = 372, M_E = 261.
(v) Number of operators, n_o = 4; number of machines, n_m = 5.
(vi) Uncorrected total sum of squares, USS = 56^2 + 92^2 + ... + 69^2 = 104127.
(vii) (Corrected) total sum of squares, CSS = USS - CF = 104127 - 98982.45 = 5144.55.
(viii) Sum of squares between operators, OSS = (O1^2 + O2^2 + O3^2 + O4^2)/n_m - CF = (362^2 + 359^2 + 356^2 + 330^2)/5 - 98982.45 = 129.75.
(ix) Sum of squares between machines, MSS = (M_A^2 + M_B^2 + M_C^2 + M_D^2 + M_E^2)/n_o - CF = (233^2 + 333^2 + 208^2 + 372^2 + 261^2)/4 - 98982.45 = 4754.30.
(x) Residual sum of squares, RSS = CSS - OSS - MSS = 5144.55 - 129.75 - 4754.30 = 260.50.
(xi) Degrees of freedom: Total = n - 1 = 19; Between operators = n_o - 1 = 3; Between machines = n_m - 1 = 4; Residual = 19 - 3 - 4 = 12.
(xii) Mean square = sum of squares / degrees of freedom.
(xiii) F ratios: 43.25/21.71 = 1.99 and 1188.57/21.71 = 54.75.
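The two-way decomposition can be checked numerically. A minimal sketch (Python, not part of the original solution), reproducing the sums of squares, F ratios and machine-mean LSD for the operator-by-machine data:

```python
table = [
    [56, 92, 53, 93, 68],   # operator 1, machines A to E
    [64, 83, 55, 95, 62],   # operator 2
    [62, 80, 56, 96, 62],   # operator 3
    [51, 78, 44, 88, 69],   # operator 4
]
rows, cols = len(table), len(table[0])
n = rows * cols
data = [x for row in table for x in row]

CF = sum(data) ** 2 / n                                # 98982.45
CSS = sum(x * x for x in data) - CF                    # total SS, 5144.55
OSS = sum(sum(row) ** 2 / cols for row in table) - CF  # operators, 129.75
col_totals = [sum(col) for col in zip(*table)]
MSS = sum(t * t / rows for t in col_totals) - CF       # machines, 4754.30
RSS = CSS - OSS - MSS                                  # residual, 260.50

df_resid = (rows - 1) * (cols - 1)                     # 12
F_op = (OSS / (rows - 1)) / (RSS / df_resid)
F_mach = (MSS / (cols - 1)) / (RSS / df_resid)
print(round(F_op, 2), round(F_mach, 2))                # 1.99 54.75

# LSD for machine means at the 5% level; 2.179 is the upper 2.5%
# point of t on 12 degrees of freedom (rounded to 2.18 in the text).
LSD = 2.179 * (2 * (RSS / df_resid) / rows) ** 0.5
print(round(LSD, 2), [t / rows for t in col_totals])   # 7.18 [58.25, 83.25, 52.0, 93.0, 65.25]
```

Comparing the machine means against the LSD of 7.18 reproduces the pairwise conclusions drawn in the notes to this problem.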
Notes

(1) The statistical model for data such as these is that the observed data value can be expressed as m + a + b + e, where m is the overall mean quality level, a is the effect on mean quality caused by using a particular operator, b is the effect on mean quality caused by using a particular machine, and e is a random variable which is normally distributed with mean zero and variance sigma^2. Note that all data are assumed to have a common variance sigma^2. Such a model is usually referred to as additive. (There are techniques for checking these assumptions, and the analysis can be modified if necessary.)

(2) The data are discrete and are restricted to lie in the range 0-100. Clearly, therefore, they cannot be exactly normally distributed. However, data such as those considered here are often approximately normal, and this is usually good enough for practical purposes.

(3) If it is necessary to decide which particular machines differ, the Least Significant Difference (LSD) method can be used. (See Problem 5B.1 for another example of the use of the LSD, and Problem 4B.2 for a note on the theory.) The formula for the LSD at the 100 alpha % significance level is

    LSD = t x sqrt(2 s^2/n_0),

where s^2 is the Residual Mean Square, v is the corresponding number of degrees of freedom, t is the appropriate percentage point of the t-distribution on v degrees of freedom, and n_0 is the number of data values used to compute each of the means being compared. In this problem the LSD, calculated at the 5% significance level, is

    LSD = 2.18 x sqrt(2 x 21.71/4) = 7.18.

The means of the data collected on each machine are, respectively, 58.25, 83.25, 52.00, 93.00 and 65.25. The difference between each pair of means is now compared with the LSD: if the difference between any two means is greater than the LSD, then those means are significantly different at the 5% level. Here all means are significantly different except for those of the pairs (A, C) and (A, E), which have differences of 6.25 and 7.00 respectively. Of these three machines, machine A has production quality intermediate between those of C and E. The overall conclusion would now be that machines B and D have a higher quality product than do the other three machines.

(4) As in Problem 5B.1, we must be aware that when using the LSD we are, in fact, performing a series of separate t-tests, ten in this case. The 5% significance level should therefore be interpreted with caution. One alternative here would be to use a 1% significance level and obtain better protection against rejecting the null hypothesis when it is true. The only change in the calculations would be to use the t-value for a 1% test. The LSD would then change to 10.06, and at this level the means of machines B and D are now also not significantly different.

(5) Two-way data often exhibit what is termed interaction, or non-additivity. In the present example this would imply that the difference in mean quality between two machines depends on which operators are using the machines. To test for non-additivity each operator would need to use each machine for at least two shifts. This would give repeat quality values on each combination of operator and machine, and a model allowing for interaction could be used.
5C Contingency Tables

While one undertakes the analysis of relationships between variables by the techniques of regression and correlation, the topic of contingency tables deals with the analysis of data showing the relationship between attributes. For example, a topic of current interest is whether there is any relationship between presence of heart disease and amount of exercise taken; or one might wish to examine whether cirrhosis of the liver and alcohol consumption are related. Thus we may wonder to what extent certain social habits (for example, smoking) are related to age, or to sex (or, indeed, to both). In practice, most contingency tables are many-dimensional: in an investigation of the relationship between heart disease and exercise, individuals would also need to be categorised by such factors as age, sex, nationality and type of diet, and techniques have been devised recently to analyse such tables.

The basic method of data presentation is straightforward. If we have two types of attribute, one with two categories (male, female) and one with three (low, moderate, high), we place each individual into the appropriate one of the six possible categories, thus forming what we call a (2 x 3) contingency table of frequencies. Of course, as will be seen in the following problems, these numbers and attributes can be generalised.

5C.1 The value of rainfall forecasts

Forecasts of rainfall are made in three categories, namely 'no rain', 'light rain' and 'heavy rain'. The table below summarises the incidence of each type of forecast and the observed rainfall for a sample of 141 forecasts.

[Table of observed frequencies: rows give observed rainfall (no rain, light rain, heavy rain) and columns give forecast rainfall. Entries visible in the original include 23, 31 and 26; the 'heavy rain' row total is 38, the 'no rain' forecast column total is 78, and the grand total is 141.]

(a) Find the 'expected' frequencies for each combination of forecast rainfall and observed rainfall under the hypothesis, H0, that there is no association between the forecast and the observed rainfall.

(b) Use a test based on the chi-squared distribution to decide whether or not H0 is plausible.

Solution

(a) We denote the 'expected' frequency of observation for class i of observed rainfall and class j of forecast rainfall by e_ij. Then e_ij = n p_i p_j, where n is the total number of forecasts, p_i is the observed proportion, and hence the estimated probability, of occasions with observed rainfall of type i, and p_j is similarly the observed proportion of forecasts of type j. For example,

    e_31 = 141 x (38/141) x (78/141) = 21.0

(to one decimal place, which is all that is needed here). The other e_ij's are obtained similarly. Calculating the e_ij for the data in the problem gives the table of expected frequencies; among its entries are the values 7.0, 10.0 and 21.0.
(b) The form of the test statistic is very similar to that for χ² goodness-of-fit tests, as discussed in Chapter 3:

    X² = Σ_i Σ_j (f_ij − e_ij)²/e_ij,

where the f_ij's are the observed frequencies. Under H₀ the test statistic X² has, approximately, a χ² distribution with (r−1)(c−1) degrees of freedom, where r and c are the numbers of rows and columns in the table. In the present example r = c = 3, so that there are 4 degrees of freedom, and H₀ is rejected only for large values of X². The upper 10% and 5% points for χ²₄ are 7.78 and 9.49; the value of X² obtained for these data lies between these points, so H₀ would be rejected at the 10%, but not the 5%, significance level. There is thus some evidence, although it is not very strong, in favour of an association between the forecast and the actual weather. In the present case, it is interesting to note that, when heavy rain is forecast, the observed frequency of no rain is greater than its expected frequency, so that, if anything, the association is the reverse of the one to be expected.

Notes
(1) In the solution to part (a) we used the expression e_ij = n p̂_i p̂_j. As can be seen from the calculation of e₃₁, cancellation is possible, and a simple general formula for e_ij is

    e_ij = (total for row i × total for column j) / grand total,

so that, as an example, the formula produces the value e₃₁ = (38 × 78)/141 = 21.0.

(2) The conclusion to part (b) was that the data contained some slight evidence against H₀, i.e. of association between the forecasts and the observed rainfall. In practice, the statistician would now go on, in consultation with specialists in the topic, to consider what form any association between these categories might take.

5C.2 A survey of exercise habits

In a survey, 1000 individuals were asked how often they took exercise in the form of cycling. There were three possible responses, namely 'never', 'occasionally' and 'frequently'. A similar question was asked with respect to walking, with the same three possible responses. Of the 1000 individuals, 200 never cycled and 125 never walked, whereas 700 cycled occasionally and 150 walked frequently. In all, 75 said that they frequently walked and cycled, 35 never cycled but walked frequently, 10 never walked but cycled frequently, and 100 never walked and never cycled.

(a) Arrange the data in the form of a (3×3) contingency table, and state how many of the 1000 individuals both cycle and walk occasionally.
(b) Apply a χ² test to the (3×3) table, stating clearly the hypothesis which it is designed to test.
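The simple general formula e_ij = (row total × column total)/grand total is easy to mechanise. In the sketch below, only the totals 38, 78 and 141 come from the rainfall example (they reproduce e₃₁ = 21.0); the remaining marginal totals are placeholders invented purely so that the margins sum correctly.

```python
# Expected cell frequencies under H0 (no association):
# e_ij = (row total_i) * (column total_j) / grand total.
def expected_frequencies(row_totals, col_totals):
    n = sum(row_totals)
    assert n == sum(col_totals), "marginal totals must agree"
    return [[r * c / n for c in col_totals] for r in row_totals]

row_totals = [65, 38, 38]   # hypothetical, except the 38 used in e31
col_totals = [78, 32, 31]   # hypothetical, except the 78 used in e31
e = expected_frequencies(row_totals, col_totals)
e31 = e[1][0]               # row total 38 crossed with column total 78
```

Note that the expected frequencies always reproduce the observed margins, so their grand total equals n.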
Solution
(a) The (3×3) table is given below. The entries marked by an asterisk are given in the problem, and the remaining entries follow by subtraction. Hence the number of individuals who both cycle and walk occasionally is 645.

                              Cycling
Walking         Never   Occasionally   Frequently   Total
Never            100*        15            10*        125*
Occasionally      65        645            15         725
Frequently        35*        40            75*        150*
Total            200*       700*          100        1000*

(b) The usual χ² statistic is

    X² = Σ_i Σ_j (f_ij − e_ij)²/e_ij,

where the f_ij's are the entries in the table of observed frequencies constructed in part (a) and the e_ij's are the corresponding expected frequencies under the null hypothesis. The statistic is designed to test the null hypothesis that the variables defining the margins of the table (in this case the amounts of cycling and walking) are independent, against a general alternative, and H₀ is rejected only for large values of X². The e_ij's are calculated as in Problem 5C.1; for example,

    e₂₃ = n p̂₂. p̂.₃ = 1000 × (725/1000) × (100/1000) = 72.5.

A complete table of e_ij's is given below.

                              Cycling
Walking         Never   Occasionally   Frequently   Total
Never            25.0       87.5          12.5        125
Occasionally    145.0      507.5          72.5        725
Frequently       30.0      105.0          15.0        150
Total           200         700           100        1000

The test statistic is

    X² = (100 − 25.0)²/25.0 + (15 − 87.5)²/87.5 + ... + (75 − 15.0)²/15.0 = 693.64.

Under the null hypothesis of independence, the test statistic X² has a χ² distribution with (r − 1)(c − 1) = 4 degrees of freedom. The value of X², 693.64, greatly exceeds any of the usual percentage points for χ²₄. We therefore conclude that amounts of walking and cycling are not independent.

Notes
(1) The χ² test for contingency tables, as with the χ² goodness-of-fit test, can detect any sort of alternative to the null hypothesis, no matter what form this might take. Thus the χ² test will have some chance of detecting any dependence between the two variables. This is true, for example, when the categories of the two variables are ordered and, in particular, when it is anticipated that they will increase or decrease together. However, in some circumstances only one particular type of dependence is of major interest as an alternative to independence. This would happen in the present example if it were thought that the alternative to independence is that amounts of exercise in the two activities increase together. We would probably not, for example, consider an alternative where those who occasionally cycled rarely walked, but those in the other two categories, who 'never cycled' or 'frequently cycled', walked more than average.
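The whole of part (b) can be reproduced in a few lines: build the observed table of part (a), form the expected frequencies from the margins, and accumulate the χ² statistic. The numbers below are exactly those of the exercise-habits problem.

```python
# chi-squared test for the 3x3 cycling/walking table.
# Rows: walking (never, occasionally, frequently); columns: cycling (same).
observed = [
    [100, 15, 10],    # never walked
    [65, 645, 15],    # walked occasionally (645 found by subtraction)
    [35, 40, 75],     # walked frequently
]
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
n = sum(row_tot)

# e_ij = row total * column total / grand total, as in Problem 5C.1.
expected = [[r * c / n for c in col_tot] for r in row_tot]
x2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
         for i in range(3) for j in range(3))
df = (3 - 1) * (3 - 1)
```

The computed statistic agrees with the value 693.64 quoted in the solution, far beyond any percentage point of χ²₄.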
Such an alternative would, however, be detected by a χ² test. If the range of plausible alternatives is restricted, then there may be other tests which are better able to detect them (are more powerful) than the χ² test. This increased power is obtained by losing the ability to detect alternatives outside the limited range of interest. In principle, one ought to decide on the test to be used before collecting the data. Note, however, that it is not acceptable to 'snoop' through the data, trying to discover a plausible alternative hypothesis, and then choose the test most likely to reject the null hypothesis in favour of that alternative. (A similar point is made in the discussion of one of the problems of Section 5A, in relation to choice of a one-tailed or two-tailed test in a different context.)

(2) It is really not necessary to calculate the value of the χ² statistic precisely in this example, unless an exact significance level (or p-value) is required. The first term in the expression for the statistic is 225, which on its own greatly exceeds any of the usual χ² percentage points.

5C.3 Sex ratios of insects

(a) In a contingency table with r rows and 2 columns the observations in row i are n_i1 and n_i2, with row totals n_i. = n_i1 + n_i2, i = 1, 2, ..., r. The column totals are n.1 and n.2, and n, the total number of observations, satisfies n = n.1 + n.2. Show that the χ² statistic for testing independence of the variables defining the rows and columns of the table can be written as

    X² = (n²/(n.1 n.2)) Σ_i (n_i1 − n_i. n.1/n)²/n_i. .

Hence, or otherwise, show that it can be expressed as

    X² = Σ_i n_i. (p̂_i − p̂)²/(p̂ q̂),

where p̂_i = n_i1/n_i., p̂ = n.1/n and q̂ = 1 − p̂.

(b) Samples of a certain species of insect were collected from two different locations, and the number of females was observed. There were 44 females in the 100 insects collected at the first location, and 86 females out of 200 insects at the second location. Test whether the proportions of females differ between the two locations.

(c) A further sample of 200 insects is taken at a third location and found to contain 110 females. Consider the data from all three locations simultaneously, and test whether there are differences in the proportions of females between the three locations.

Solution
(a) The χ² statistic is given in general by

    X² = Σ_i Σ_j (n_ij − e_ij)²/e_ij, say, where e_ij = n_i. n.j/n.

Now n_i2 − n_i. n.2/n = n_i2 − n_i.(n − n.1)/n = −(n_i1 − n_i. n.1/n), so the two cells in row i contribute the same squared quantity (n_i1 − n_i. n.1/n)². Since

    1/e_i1 + 1/e_i2 = n/(n_i. n.1) + n/(n_i. n.2) = n²/(n_i. n.1 n.2),

we obtain the required expression. Writing n_i1 − n_i. n.1/n = n_i.(p̂_i − p̂), and noting that p̂ q̂ = n.1 n.2/n², we thus obtain the second expression.
(b) This can be viewed either as a test of equality of two binomial parameters or as a (2×2) contingency table. These two viewpoints lead to apparently different, but in fact equivalent, test statistics, and we will give both methods, for completeness.

Method 1
Let p₁ and p₂ be the proportions of female insects in the populations at the two locations. We then wish to test H₀: p₁ = p₂ against H₁: p₁ ≠ p₂. If p̂₁ and p̂₂ are the sample proportions of females, then using the normal approximation to the binomial distribution we have, approximately,

    p̂₁ − p̂₂ ~ N(p₁ − p₂, p₁q₁/n₁ + p₂q₂/n₂),

where q_i = 1 − p_i and n_i is the number of observations (or trials) in sample i. Under H₀, where p is the common value of p₁ and p₂ under this hypothesis and q = 1 − p, it follows that, approximately,

    Z = (p̂₁ − p̂₂)/√(p̂q̂(1/n₁ + 1/n₂)) ~ N(0, 1),

and Z can be used as our test statistic. The value of p is of course unknown, but the obvious estimate is the proportion of females in the two samples together, i.e. the estimated proportion of females when H₀ is true.

For the given data, n₁ = 100, n₂ = 200, p̂₁ = 0.440, p̂₂ = 0.430, and p̂ = 130/300 = 0.433. Hence

    Z = (0.440 − 0.430)/√(0.433 × 0.567 × (1/100 + 1/200)) = 0.010/0.061 ≈ 0.16.

Since this value is to be compared with percentage points of N(0, 1), e.g. ±1.96 for a two-tailed test of size 5%, we conclude that there is no evidence of any difference between p₁ and p₂. This might have been expected, given the obvious closeness of the two sample proportions p̂₁ and p̂₂.

Method 2
Expressing the data as a contingency table we obtain the following.

           Location 1   Location 2   Total
Female         44           86        130
Male           56          114        170
Total         100          200        300

The {e_ij} needed in the χ² test statistic X² are given by

    e₁₁ = (130 × 100)/300 = 43⅓ and e₁₂ = (130 × 200)/300 = 86⅔,

and similarly e₂₁ = 56⅔ and e₂₂ = 113⅓. (Since the numbers in the 'Total' row and column are given, the last three of these could have been calculated by subtraction.) The χ² statistic X² is therefore

    X² = (44 − 43⅓)²/43⅓ + (86 − 86⅔)²/86⅔ + (56 − 56⅔)²/56⅔ + (114 − 113⅓)²/113⅓ = 0.027.

This is, in fact, simply the square of Z calculated above. Since Z ~ N(0, 1) approximately, it follows that Z² ~ χ²₁ approximately, so we have here separate confirmation that the statistic X² for the χ² test does indeed have, approximately, a χ² distribution. It is clear, as with Method 1 above, that the statistic provides no evidence to contradict the hypothesis of independence between location and proportion of females.
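The equivalence of the two methods is easy to check numerically: the pooled-proportion Z statistic of Method 1, squared, equals the χ² statistic of Method 2. The sketch below uses the insect counts given in the problem.

```python
from math import sqrt

# Method 1: two-sample z statistic with the pooled proportion estimate.
n1, f1 = 100, 44          # location 1: insects sampled, females observed
n2, f2 = 200, 86          # location 2
p1, p2 = f1 / n1, f2 / n2
p = (f1 + f2) / (n1 + n2)                      # pooled estimate under H0
z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Method 2: chi-squared statistic for the corresponding 2x2 table.
obs = [[f1, f2], [n1 - f1, n2 - f2]]
row_tot = [sum(r) for r in obs]
col_tot = [sum(c) for c in zip(*obs)]
n = n1 + n2
x2 = sum((obs[i][j] - row_tot[i] * col_tot[j] / n) ** 2
         / (row_tot[i] * col_tot[j] / n)
         for i in range(2) for j in range(2))
```

Up to rounding error, z**2 and x2 agree exactly, illustrating the algebraic identity established in part (a).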
(c) We now have a (2×3) contingency table, as follows.

                  Location
Sex          1      2      3    Total
Female      44     86    110     240
Male        56    114     90     260
Total      100    200    200     500

The expected frequency e₁₁ of females in location 1, assuming independence between location and proportion of females, is e₁₁ = (240 × 100)/500 = 48, and the remaining expected numbers can be calculated similarly (with some alternatively obtained by subtraction) to give the following table of expected frequencies.

                  Location
Sex          1      2      3    Total
Female      48     96     96     240
Male        52    104    104     260
Total      100    200    200     500

The χ² test statistic X² then takes the value

    X² = (44 − 48)²/48 + (86 − 96)²/96 + (110 − 96)²/96 + (56 − 52)²/52 + (114 − 104)²/104 + (90 − 104)²/104 = 6.57.

The number of degrees of freedom is (r − 1) × (c − 1) = 1 × 2 = 2, and the 5% and 2½% points of χ²₂ are 5.99 and 7.38 respectively. There is thus evidence at the 5% level, but not at the 2½% level, that the proportions of females in the three locations are different.

Notes
(1) Algebraic manipulations similar to those in part (a) of the solution show the equivalence of Z² and X². There is yet another expression for X² which is available for (2×2) tables and is simpler to calculate than the more general expressions. Substituting e_ij = n_i. n.j/n into the general expression, and some algebraic manipulation, quickly leads to the alternative expression

    X² = n(n₁₁n₂₂ − n₁₂n₂₁)² / (n₁. n₂. n.₁ n.₂),

where n₁., n₂., n.₁ and n.₂ are the marginal totals of the table, and n is the grand total. It is usually quoted in the alternative notation

    X² = n(ad − bc)² / ((a + b)(c + d)(a + c)(b + d)),

where a, b, c and d are the four observed frequencies n₁₁, n₁₂, n₂₁ and n₂₂ respectively. In the present example the alternative expression gives the same value, 0.027, as before.
(2) The distribution of the test statistic calculated for contingency tables is only approximately the χ² distribution. As with goodness-of-fit tests, the main restriction needed to ensure a good approximation is that none of the e_ij's is too small, where 'small' takes on roughly the same meaning as for goodness-of-fit tests. For (2×2) tables the approximation can be improved by replacing (f_ij − e_ij)² in the test statistic by (|f_ij − e_ij| − ½)². This modification is termed Yates' correction, and is a special case of a continuity correction. Continuity corrections are appropriate whenever a discrete distribution is approximated by a continuous distribution (a similar point is made in Note 3 to one of the problems of Section 3B): the exact distribution of the χ² test statistic is discrete, whereas of course the χ² distribution is continuous. Without the correction there is a tendency for H₀ to be rejected slightly too often, whereas the opposite is the case (the test being said to become conservative) when the correction is used; if the marginal totals are not very large, the correction in fact tends to 'overcorrect'. Continuity corrections are, in theory, also desirable for larger tables, but they are less likely to make much difference to the value of the test statistic and they do not take a simple form, so are generally ignored.

For (2×2) tables it is possible to calculate the exact distribution of the test statistic, rather than relying on the χ² approximation. The exact distribution is related to the hypergeometric distribution, which is the distribution of any one of the individual cell values. (See Problem 2A.7 for a discussion of this distribution.) Calculating the exact distribution becomes very tedious when the marginal totals are at all sizeable, which is in any case when the χ² approximation will be a good one.

(3) Given the conclusions in parts (b) and (c) it seems likely that locations 1 and 2 both differ from location 3, but not from one another. If this alternative were of interest, rather than the very general alternative of there being some difference between the three locations, a more powerful test is available by combining the first two columns of the contingency table. If this is done we reach the table below (with expected frequencies in parentheses).

           Locations 1 and 2    Location 3    Total
Female        130 (144.0)       110 (96.0)     240
Male          170 (156.0)        90 (104.0)    260
Total         300               200            500

The test statistic now takes the value

    14² (1/144 + 1/96 + 1/156 + 1/104) = 6.54,

or, incorporating Yates' correction,

    13.5² (1/144 + 1/96 + 1/156 + 1/104) = 6.08.

These values lie between the 2½% (5.02) and 1% (6.63) critical values of χ²₁, so that the evidence is moderate, but still not conclusive. The present test is, in theory, somewhat more powerful than the general one presented earlier. Even for (2×2) tables, Yates' correction will often make very little difference to the result.

(4) In the solution to part (b), we followed the standard, and intuitively sensible, practice of replacing p₁ and p₂ by a common estimate p̂ rather than by separate estimates p̂₁ and p̂₂ when the null hypothesis is H₀: p₁ = p₂. However, some authors use separate estimators, and such use cannot be ruled out as completely unacceptable.
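The combined 2×2 comparison of Note (3), with and without Yates' correction, can be checked directly. The table below is the one given in the note (locations 1 and 2 combined against location 3).

```python
# 2x2 test of locations {1,2} versus location 3, with and without Yates'
# correction: replace (f - e)^2 by (|f - e| - 1/2)^2 in each cell.
obs = [[130, 110],   # females: locations 1 and 2 combined, location 3
       [170, 90]]    # males
row_tot = [sum(r) for r in obs]
col_tot = [sum(c) for c in zip(*obs)]
n = sum(row_tot)

def chi2(observed, yates=False):
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = row_tot[i] * col_tot[j] / n
            d = abs(observed[i][j] - e)
            if yates:
                d -= 0.5        # continuity correction
            total += d * d / e
    return total

x2_plain = chi2(obs)
x2_yates = chi2(obs, yates=True)
```

Both values fall between the 2½% (5.02) and 1% (6.63) points of χ²₁, and the correction changes the statistic only slightly, as the note remarks.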
5D Time Series

One of the types of statistical information which comes most readily to the public eye is that contained in regular government publications: most governments publish monthly a great number of economic and social statistics, of which unemployment figures and balance of trade figures are perhaps the most prominent. These statistics, giving the successive values taken by some variable at more or less regular intervals of time, are known as time series. While the discussion above has concentrated on the prominent governmental area, applications of time series analysis occur much more widely. For example, meteorological data have been kept regularly over many years, so that the analysis of time series in climatology is a standard technique. Another example is found in the area of dendrochronology, the dating of wooden objects through the analysis of the widths of successive tree-rings; these tree-rings are formed by the annual growth of bark on a tree, and the width reflects (amongst other things) rainfall, so that matching of rings from different trees through the methods of time series is a practical method. These are just two examples, but there are many more.

Time series analysis is, however, rather a difficult subject, and seldom encountered in examinations at the level we have been considering in this book. We end with two representative, but rather different, time series problems, as an introduction to this interesting topic.

5D.1 The electricity consumption of a Canadian household

(a) In the analysis of a time series, explain what are meant by the terms trend, seasonal variation and random variation.

(b) For a period of 19 weeks, from mid-November until the end of March, the weekly electricity consumption of a Canadian household was recorded, and the results are given below. Plot the data on a graph.

Week                  1   2   3   4   5   6   7   8   9  10
Units of electricity 65  73  72  68  73  86  79  87  82  82

Week                 11  12  13  14  15  16  17  18  19
Units of electricity 74  75  73  67  67  62  60  63  62

(c) Calculate an appropriate moving average in order to smooth the series, and plot its values on the graph.

(d) Plot, separately, the remainder left when the moving average is subtracted from the observed series.

(e) Discuss, with reasons, what you think might happen to electricity consumption during the next 20 weeks after the end of the series given in (b), and discuss how the graphs in (c) and (d) are related to the ideas of trend, seasonal variation and random variation.
Solution
(a) A traditional approach to the analysis of time series is to imagine that an observed series is the sum (or perhaps the product) of three separate parts.

(i) A slowly-varying, 'smooth' part of the series which represents the long-term trend.
(ii) A periodic component which repeats itself regularly. This seasonal variation usually has a period of one year, and is caused by the different nature of environmental or human behaviour at different times of year. The term seasonal variation is, however, sometimes used to represent other periodic behaviour with a fixed period, for example diurnal variation.
(iii) Random variation is essentially what remains after trend and seasonal variation have been removed from a series. It is not necessarily random in the strict sense that successive values are independent; indeed, this random component of a series can include, for example, behaviour which is almost periodic in nature.

(b) The required graph is shown in Figure 5.5, on which the moving averages calculated in part (c) are also plotted. [Figure 5.5: units of electricity (roughly 55 to 85) plotted against week, with moving average (ii) superimposed.]

(c) There are several moving averages which could be used. If we denote the series by x_t, t = 1, 2, ..., 19, and denote the moving average by y_t, then we could consider using the following:

(i) a simple three-point moving average, y_t = ⅓(x_{t−1} + x_t + x_{t+1}), t = 2, 3, ..., 18;
(ii) a simple five-point moving average, y_t = ⅕(x_{t−2} + x_{t−1} + x_t + x_{t+1} + x_{t+2}), t = 3, 4, ..., 17;
(iii) a three-point moving average with weights which decrease linearly on either side of t, y_t = ¼(x_{t−1} + 2x_t + x_{t+1}), t = 2, 3, ..., 18;
(iv) a five-point moving average with linearly decreasing weights.

These are by no means the only possibilities, but any of them would be adequate for the present problem. A moving average is generally based on an odd number of time points, since using an even number will result in an expression for y_t for a fractional value of t; this would be inconvenient when one is trying to compare x_t and y_t.

Only one of the above moving averages is required here, but for illustration we calculate two of them, (ii) and (iii). These will give, respectively, the most drastic and the least drastic smoothing of the series, among the four listed. The values are as follows.

Week         2      3      4      5      6      7      8      9     10
y_t (ii)    —     70.2   74.4   75.6   78.6   81.4   83.2   80.8   80.0
y_t (iii)  70.75  71.25  70.25  75.00  81.00  82.75  83.75  83.25  80.00

Week        11     12     13     14     15     16     17     18
y_t (ii)   77.2   74.2   71.2   68.8   65.8   63.8   62.8    —
y_t (iii)  76.25  74.25  72.00  68.50  65.75  62.75  61.25  62.00

Both these moving averages are plotted on the graph in Figure 5.5.

(d) The remainder series, e_t = x_t − y_t, for the two moving averages are as follows.

Week         2      3      4      5      6      7      8      9     10
e_t (ii)    —     1.8   −6.4   −2.6    7.4   −2.4    3.8    1.2    2.0
e_t (iii)  2.25   0.75  −2.25  −2.00   5.00  −3.75   3.25  −1.25   2.00

Week        11     12     13     14     15     16     17     18
e_t (ii)  −3.2    0.8    1.8   −1.8    1.2   −1.8   −2.8     —
e_t (iii) −2.25   0.75   1.00  −1.50   1.25  −0.75  −1.25   1.00

The plots shown in Figure 5.6 of these two series of {e_t} are very similar, although the second, which smooths less drastically than the first, follows the original series more closely and so tends to have smaller values of |e_t|.

Moving averages may be used to estimate the trend in a series, since they give a smooth version of the series and trend is defined as the 'smooth part' of any series. From the plots it can be seen that the two moving averages give similar, though certainly not identical, pictures. In this example the 'trend' is in reality part of a seasonal cycle, with maximum electricity consumption occurring in December and January. However, to identify a separate seasonal part in a series, it is necessary to have two or more complete cycles; in the present case, where only part of a cycle is available, any seasonal variation will appear as part of the trend. The remainder series, e_t, therefore estimates the extent of random variation. In the present example, there is some evidence of a pattern among successive values of e_t, in that an increase in e_t at point t has a tendency to be followed by a decrease, and vice versa. However, with such a short series the evidence is not conclusive, and it is possible that e_t is a series of independent, identically distributed random variables (i.e. is a random series in the strict sense of the word).

[Figure 5.6: Plots of remainder series for Problem 5D.1.]
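The moving averages and remainder series tabulated above can be generated mechanically from the 19 weekly observations. The sketch below implements the simple five-point average (ii) and the weighted three-point average (iii).

```python
# Weighted moving averages and remainder series e_t = x_t - y_t.
x = [65, 73, 72, 68, 73, 86, 79, 87, 82, 82,
     74, 75, 73, 67, 67, 62, 60, 63, 62]      # weeks 1..19

def moving_average(series, weights):
    half = len(weights) // 2
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, series[t - half:t + half + 1]))
            / total
            for t in range(half, len(series) - half)]

y5 = moving_average(x, [1, 1, 1, 1, 1])   # (ii): simple five-point, weeks 3..17
y3 = moving_average(x, [1, 2, 1])         # (iii): weighted three-point, weeks 2..18

e5 = [v - y for v, y in zip(x[2:17], y5)]  # remainders for (ii)
e3 = [v - y for v, y in zip(x[1:18], y3)]  # remainders for (iii)
```

The first and last values (70.2 and 62.8 for the five-point average, 70.75 and 62.0 for the weighted three-point average) match the tables in the solution.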
(e) It is always dangerous to extrapolate (forecast) a series without additional information, apart from the observed values of the series, of what might happen in the future. Typically, the graph is ambiguous: the consumption could be part of the way through a long decline, or the decline may be showing signs of levelling off. From the context of the problem, however, it is likely that electricity consumption will continue to fall for another month or two as heating and lighting requirements decline. There could also be a secondary peak in the middle of summer, because of additional refrigeration requirements, whereas in late summer (towards the end of the 20-week forecast period) the consumption will probably begin to rise again.

Note
The choice of a moving average to use is rather subjective, and the main consideration is the amount of smoothing which is required. Typically, a moving average will become smoother if (a) the number of terms used in the average is increased, and (b) the weights are all equal, not decreasing either side of t. In the present example, moving average (ii) gives the smoothest result of the four averages mentioned, and (iii) gives the least smooth.

5D.2 Turning points in a time series

A time series consists of observations x₁, x₂, ..., x_n taken on a random variable at equally-spaced points of time. If, of three consecutive observations x_{t−1}, x_t and x_{t+1}, the middle observation x_t is the smallest or largest of the three, then there is said to be a turning-point in the series at time t. Let S be the total number of turning-points in the series of n observations.

(a) For the case n = 4, suppose that x₁, x₂, x₃ and x₄ take the values 1, 2, 3 and 4 in some order. Write down all the permutations of these four numbers, and for each permutation write down the value which S would take if x₁ were the first number, x₂ the second, and so on. If all permutations are equally likely, what is the probability function of S?

(b) Now suppose that x₁, x₂, ..., x_n are independent observations, all from the same continuous probability distribution. Show that the expected value of S is ⅔(n − 2).
Solution
(a) There are 24 permutations of the numbers 1, 2, 3 and 4. These are set out in the table below, together with the associated values taken by S.

Permutation  S     Permutation  S     Permutation  S     Permutation  S
1234         0     2134         1     3124         1     4123         1
1243         1     2143         2     3142         2     4132         2
1324         2     2314         2     3214         1     4213         1
1342         1     2341         1     3241         2     4231         2
1423         2     2413         2     3412         2     4312         1
1432         1     2431         1     3421         1     4321         0

Amongst the 24 permutations, the frequency of 'S = 0' is 2, that of 'S = 1' is 12 and that of 'S = 2' is 10. Since the permutations are given to be equally likely, each has probability 1/24, and the probability distribution of S is thus as in the table below.

s           0      1      2
Pr(S = s)  1/12   1/2    5/12

(b) The relative positions of the xs when arranged in order of magnitude determine the value of S; the actual values taken by the xs are irrelevant to S. Also, because the distribution of the xs is continuous, ties (i.e. instances of equality between two or more xs) can be assumed to have probability zero. Define S_t, for t = 2, 3, ..., n − 1, as 1 if there is a turning-point at time t, and as 0 otherwise. Consider now three consecutive observations x_{t−1}, x_t and x_{t+1}. These can be arranged in order of magnitude in 3! = 6 ways, and because the observations are independent and identically distributed, each of the six orderings has the same probability. For four of these, viz. those in which x_t is the largest or the smallest of the three, there is a turning-point at time t, and for the other two there is not, so we obtain

    Pr(turning-point at t) = 4/6 = 2/3.

Now S = S₂ + S₃ + ... + S_{n−1}. Then

    E(S) = Σ_{t=2}^{n−1} E(S_t),

and since E(S_t) = 0 × Pr(S_t = 0) + 1 × Pr(S_t = 1) = 2/3, we obtain E(S) = ⅔(n − 2).
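The enumeration in part (a) can be checked by brute force: count turning-points over all 24 permutations of 1, 2, 3, 4 and tally the frequencies of S. The mean of S over the equally likely permutations also agrees with the formula ⅔(n − 2) of part (b).

```python
from itertools import permutations

# Count turning-points: x_t is a turning-point when it is the strict
# maximum or minimum of the triple (x_{t-1}, x_t, x_{t+1}).
def turning_points(seq):
    return sum(1 for t in range(1, len(seq) - 1)
               if (seq[t] > seq[t - 1] and seq[t] > seq[t + 1])
               or (seq[t] < seq[t - 1] and seq[t] < seq[t + 1]))

# Part (a): enumerate all 24 permutations of 1,2,3,4 and tally S.
counts = {0: 0, 1: 0, 2: 0}
for perm in permutations(range(1, 5)):
    counts[turning_points(perm)] += 1

# Part (b), checked for n = 4: mean of S should equal (2/3)(n - 2).
mean_s = sum(s * c for s, c in counts.items()) / 24
```

The tally reproduces the frequencies 2, 12 and 10 for S = 0, 1 and 2, and mean_s equals 4/3 = ⅔(4 − 2).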
Notes
(1) This problem is concerned with finding a probability distribution and an expectation, and could therefore have been included in Section 1B. However, it is presented here in the context of the analysis of a series of observations in time, i.e. a time series, and so is relevant to the present section.

(2) The result E(S) = ΣE(S_t), used in solving part (b), does not require S₂, S₃, ... to be independently distributed and, indeed, they are not. This lack of independence means that we could not use a similar argument to obtain Var(S).

(3) Turning-points provide one way of detecting whether or not a time series is 'purely random', in the sense that all observations are independent and identically distributed. For most structured time series, the number of turning-points will be less than would be expected for a purely random series. Given the distribution of S for a purely random series (found here for n = 4), a test of the hypothesis of randomness can therefore be carried out which will reject the hypothesis for particularly small values of S. The alternatives to randomness are often a steadily increasing or decreasing trend, cyclic behaviour or some other structure underlying the series.

(4) The test based upon turning-points, like several other tests for randomness, is capable of detecting many different forms of structure in a time series. However, if one particular type of structure, such as a linear trend, is of particular interest, other, more specialised, tests will usually have greater power. Such a test could also be used in connection with a regression analysis. One of the assumptions in regression is that the 'random errors' are independent and identically distributed. One can test this by examining the residuals, i.e. the discrepancies between the observed values and the predicted values. (See Problem 5A.1 for an example.) Plotting these residuals against values of time or some other potential explanatory variable can be used to test the 'randomness' of the errors.
4C.11 cigarette cards. 48 and hypergeometric distribution. 64 and Poisson distribution. 109. 1A. 15 children sitting in a row. 103 control. 42. 4C. 52.46 inappropriate use. class. brands of. 206 use in double sampling. 3B.3 compound distribution. 4. suggestions for.12 cigarette consumption. 3A. 19 lifetimes of electrical. 32 Central Limit Theorem. 2B. time series in.6 exact and approximate tests. 127. 33. 1B. length of. 105. see correlation. 1A.4. 125–7. 152. 4A.2 quality of electronic. 2A. 202 for contingency table. 2A.2 Chevalier de Mere. 2A. 74 components of systems in series–parallel arrangement. 5C. 206 Poisson approximation to. 155 simulation of. 178 coefficient. tossing a. 38. 28. 218 Index binomial distribution. 130–1. 128. 1A. histogram with unequal. 173 use in sign test.3 Canadian household. 4A. ignition problems in. 2A. 4A. 1A. 75 normal approximation to. 36. 137–8 degrees of freedom. 151.1. 108–9. 83 class. 209 statistic.7.8 distribution. 74. 35 cars.3.5 bivariate probability distribution. wellington. 2A.
209 correlation. 139. 147. genetic. 198. 188 choice of tails. 48. 2B. 202 Yates’ correction. 187 misuse. 105. 4C.4. 210 density frequency. 134 in analysis of variance. 209 continuous random variable. 168–9 for π. 165–6 for proportion. 2B.1. 25. 5C. 36. standardised normal. 4B. 44.6 Spearman’s. 43 for Poisson mean. 82 and expectation.2.3. 4B. Chevalier. 179. 32. 48. winning at the game of. 4A. 113. 191–3 two-sample test. 96–7. 140 between normal means. 4B.45 Kendall’s. 202 dendrochronology.6 counselling. see probability density function dependent variable. 140 interpretation. 176. 75. 46.12. 204 χ² test. 98 decimal places. 180. 4C.45 plotting of. 107.2.2 use in simulation. 195 Yates’. 138 paradoxical behaviour. 2C. effect of confidence coefficient. 204. 13. 125–6.6 for correlation coefficient. 58. 213 darts. 1B. 89 expectation and variance. 58. game of. 100 function. 1B. 146 coupon-collector test.6. 200 in regression. 98 deciles. 24. 1B. 195. 149. 4A. 127. 121. 199. 52.4 between proportions. 4B. 1B. 146.3. 209 contagious distributions. 2C. 58–9.3 dice problems.3 degrees of freedom. 113 curvilinear regression. 149 effect of variance. 110 contingency table. 159. 96. 36. 200 in contingency table. 13 and binomial distribution. 3B. 89 in simulation. 4A. 74–5. 178.6 interpretation.5 for exponential mean.6 cubic equation. 133 least significant. use of. 144–5 overlapping. 4A.8 corrected sum of squares. 95. 4A. 182. 52 defective items. milk yield of. 5A. 36. 4A. 176–7. 105. 2B. 4C. 71 diameters of marbles. 105 snooping. 4B. 184.3 presentation of ordered.2 effect of sample size. 197. 78. 140–1. 72. 187–8 ties. 
192 in paired data. 138. 179 cyclic behaviour in time series. 206 alternative tests. 5A. 52. 73 diagrams in solutions. 44. 4A. 146–7 factor.3 confidence limits. 136. 84 as monotonic function. 4D.2 difference between random variables. 184 de Mere. Chevalier. 4B.3 dangers of extrapolation. 99. 186 deviate. 40 cumulative distribution function. 189 z transformation. 83 contrasts in analysis of variance. 70 simulation. 175 thoughtless analysis. 18. 209 continuity correction. 198. 181. 74. 4C. 187 correction continuity.3 craps. 195–6. 205 trends in. 5A. 113 paired. 178 grouped. 201
index of. 1A. 3. 186 randomised response. 124. 42 double sampling. 83 enumeration of outcomes. quality of. 84 and expected value. 215 uniform distribution. 106. 38. 212 see also confidence interval. maximum likelihood estimation events. estimation biased. 201 estimation. 167 applications. 167 for discrete random variable. see cumulative distribution function divisor for sample variance.2 expectation of random variable. 174 type I. 89. 207 normal distribution. 2B. 165. 28 exercise habits.2 skewness. dangers of. 72. 110 and cumulative distribution function. 26. 5C. 28. 168 mixed. mutually. 98. 40 digit. 34. 58 economics. 68 linear combination. 13 equal variances. 8. 71. 110 distribution. 171 use of remainder series. 37. 114–5. 2C. 49 difference. 172 mean square error of. 4D. 81 normal approximation for mean. 4B. 72 normal distribution. 2B. 42. 6 complementary. 28. 170. finding mode by. 194 design of. 4D. 35. 146. 179 exponential distribution.8.8 sum. 214 joint distribution. 35 employment data. 95 examination questions.1. 159.4.9 of population size. 169 simulation. 74 independent. 28. 159.11 of ratio. 159–60. 83 and variance. 4D.12.4 simulation. 162. 36. 4C. 5A. 164 double integration. 35. 164 efficiency of. 28 examination marks. 2A.1 electronic components. 144 simultaneous. 11. 1A. assumption of. 14. 213
82 extrapolation. 4D. random. 202. 155 normal approximation. 87 binomial distribution. 142 efficiency of estimators. 111 small values.4 ecology. methods based on systematic. 170–2 suggestions for class. see under names of particular distributions distribution function. 23.7 duration of a game.11 comparative. 29. 5D. 170 Poisson distribution. 75–6 effect of scaling.4.2 electricity consumption of a Canadian household. 18.10 moment generating function. 220 Index differentiation. 1B. 74 error mean square. 7. 90 equations cubic. 42.6 precision. 25. 182 minimum variance. 16. 22 exclusive events. 43 from probability plot. 25. 182. 4D. 43 elections. 214 geometric distribution. civil service. 155 normal approximation. 87 binomial distribution. 142 efficiency of estimators. 111 small values. 29. 5D. voting in general. 18. 95 fitting distributions to data. 165. 150. 129. 
109.4 Poisson distribution. 4C. 144. 26–7. 5C. 142. S O . Sir R. 1B. 14. dangers of. weather. 4C. σ² unknown. 100 frequency graph.1 odds close to evens. 170 of two random variables. 28. 17 independent random variables.7 income of statisticians. cumulative relative.13 normal mean. 2B. 1B. cumulative. 152. 3B. 15 probability of winning. 100 hypergeometric distribution. 18890 fishing. 4B. 122.6.4 gaps in Poisson process. use of. A. 179 fitting distributions to data. 194 Fisher’s z transformation in correlation.8 F-distribution. 3A. 15 probability of winning. 4B. 5B.5 generating function.6 goodness-of-fit tests. 33. see moment generating function. 178.1 forecasting in time series. 43 power.10.11. 179 fitting plungers into holes. 96.11 glue sniffers.2 graph. cumulative frequency.2.7. 113. 130–1.2. Index 221 factory.4 fruit machine. 5A. 105 unequal class widths.5 histogram. cumulative frequency.9 guayule plants. 5C.3. 4A.1 F-ratio in analysis of variance. 128 see also entries for one- and two-sample tests. 161 of an unbiased estimator. 113 frequency density. 200 in contingency table. 4A. 197 frequency. 4D. choice of.11 F-test for equality of variances. 33 geometrical probability. cumulative. 5A.1 Fisher. Sir R.. 142. 105 graphical methods. 2A. accidents in. pink and white. 202 height and weight. 101 see also plots grouped data. 33 highest score in dice. cumulative. 205 conservative. 105 frequency table. 123 in analysis of variance. 4C. 96. frontal breadths of skulls. 146 geometric distribution. 215 randomness in regression. 3A. 145–7. relationship between. problems involving. 155 in analysis of variance. 209 and binomial distribution. 177 independence of components of systems. 154–5 choice of tails. 96. 209. 143–4 normal variance. σ² known. 135 in regression. 105 frequency polygon. 128. 68 geometric series. 215 ratio of means. 165 general elections. voting in. 189 binomial parameter. 182 hypothesis test and confidence interval. 192–3. 209 correlation coefficient. 131. 106 in regression. 142
goodnessoffit tests. 113 guaranteed life of a machine.5 normal mean. 182 hypothesis test and confidence interval.4 7r.5 flower heads. correlation between.7. 205. 183 in twoway analysis of variance.2. 183. 3B. multiplication law for. 2A. 3B. 4C. 2C. 196 in goodnessoffit. annual. and for particular tests and distributions ignition problems in cars. voting in.2 fitted values in regression. 125. probability generating function genetic counselling. 96 frontal breadths of skulls. 146 geometric distribution. 215 randomness in regression. rubber content of. 4A. 68 geometric series. 142 . 179. 19 independent events.4 for finding a mode. 87 independent samples and paired samples. 130. 215 ratio of means. 66 hypothesis. 141. 1257. 3A. sum of. 3B. 170 see also under names ofparticular functions game duration. 34. 213 forecasts. lA. plotting. 95 fitting of parabola.12. 4A. 135 see also Ftest Fisher. 188 for median. 38.4 of randomness. test for. 106 in regression.
174 see also expectation. 146 MtrC. 184 lifetimes of electrical components. 135. 56. 4D. 201 in analysis of variance. 2B. 103 larger of two random variables. 414. 72. 196 and type I error.6 insects. 4A. 2. 56.1. 123 Mendel. 2B. Chevalier de. 5C. 142. 3. 96. method of.1 manufactured product. 5A.3 joint probability density function. sample mean mean square error. distribution of. 11.1 derivation. 201 least squares.3 milk yield of cows. 151. 184 Poisson. 95. 1B. normal mean. 186 insect breeding experiment. 57. 195. speeding up. 5D. 2A.3 means.1 multiplechoice test. 103 method of.4 mutually exclusive events. 163 massproduced items. 43 inflation.8 Lindley.3.4. 176 and probability. V. 91 exponential distribution. 115. 57.6 Kent.8 of a random sample. 76. 2B. 9 for independent events.9. 109. 2B. 5. see confidence interval tolerance. 5A. 90 I.2 lightbulbs. 163 minimum variance estimator..3 interaction in twoway model. 101 mode of a distribution.3 marking a multiplechoice test. 201 interquartile range. 2B. testing. 114 standard error of. 28 . 3B. 43 Kingman. 2A. 201 in regression. 15960. l largest of a set of random variables. C . 124 inference.4 mathematical tasks.222 Index independent variable. diameters of. 84 modal class. 4B. 2B. 163 moments. 102 logarithms. guaranteed life of a .3 mass. University of. 57 money supply. 42 joint probability function. 142 machine.5 multiplication law of probability. 97. 40 models. Prof. 66.. 1 knapsack problem. 2A. sex ratios of. 51. marking a . examination.10 mixtures of probability density functions. vi. 155 in analysis of variance. number of. 161 jointly distributed random variables.4.4 influential observations in regression. 48. 110 inequality quadratic.5 marks. 124. l B . 1B.4 moving average. 172 mixed exponential distribution. 120. 156 moment generating function.5 maximum likelihood estimation. F. 85 moment of inertia. measure of. 745. 834 of random variables. D.8 matches in a box.8 median. 47 of dispersion. 28.2.102. 167 mean. 
105.2 Kendall’s correlation coefficient. 356. 48. 12.4 mathematics teachers. 180. 60 use of differentiation. 96. 73 for conditional probabilities. 2C.8 limits confidence. 128 estimation from plot. 173 warning and action. 199. control chart for. Sir J. 17 multiplicative model. changes in weight of.4. 15 metal cylinders.3 minimum mean square error. 3 . 1. 4. 177 index of binomial distribution. 5A. 1B. lcngths of. 196. 79 location.56. 102 calculation. . 20 kurtosis. 142 music recital.4. Gregor. 96.9 manufactured bolts. 4. control chart for. 4D. 4D. centre of. use of.3 distribution of.8. 96 intersection of events. 27 Least Significant Difference. 4D. tolerances for. 4A. 148 Tchebychev’s. Q. 81. 196.5 marbles. 4D. 1A. 1B. 100102 of Poisson distribution. 70 linear combinations of normal random variables. 3 inversion method of simulation. 58.
2B. problems involving.2 odds. 36. 2B.12 operating characteristic. 4A. 4B.46. 175 negative binomial distribution. 1B. 70. 183 oneway analysis of variance.1 expectation and variance. 1657 rank correlation coefficient. 59. . 179. fitting. 1856 paired data. 3B. 64 nonadditivity in twoway model. equally likely. 2B. 2C. 121 regression. fitting of. 98 nature reserve. 71 square of. 128. mixing batches of. 3B. 23. 4C. 4D. 2B. 1434 playing card problems. 1434 ratio. presentation of.4 twosample test. 142 ttest. 15 onesample tests for binomial distribution. Blake. 124. 127 percentiles. 214 personal probability.2 moment generating function. 4B. 176 partial fractions. 5A. 127. 4B. 157. 47 nuisance. 2A. 3B.4 hypothesis test.9 pictorial solution. 177 use of. 2A.6 twosample tests. 78 standardised. R . 4B.4 exponential mean. 1401 onetailed and twotailed tests. 121 for Fdistribution.1 simultaneous equations. 1457.5 pointer. 1256.3. 4B. 4C. 135 for x2. inference from Buffon's needle. 130 see also normal mean. 4C. 104 pink and white flowers. 6. 30 pie chart. 123 nonsymmetric distributions.3. 1B. 20. 98 occurrences in time. 135 notation percentage points. 38 pesticide. 153. 3A.2 petrol station. 130. 152. 156 Occurrences of thunderstorms. 35 plots.2. 90 outliers in regression. 170. 120 ordered data. 74. use of.4 difference. positioning a . 1.1. 2B.l pivotal function. arrivals at a. treatment of.2 normal variance confidence interval. 193 t distribution. 179 parabolic probability density function. 1A.4. tests for. 2A. variance of set of. 173 opinion polling. 4A.56. 174 normal mean confidence interval. random. 142. 76 plotting.1 percentage p i n t s . 85 Pascal. H. 174 related distributions. 157. 169 Poisson distribution. 111 goodnessoffit.55.2.8 1test. 4B. 4B. 130 cumulative distribution function. 4D.10 and binomial distribution. 95 in regression. 143.3. 1689 estimation.5 comparison with independent samples. 1A. 4B.1 fitting. 105 ordering.1.35 linear combinations. 
201 non-parametric tests. 36. 4A. 1A.7 .3 outcomes. equally likely. 15 one-sample tests. 4B.5. 2B. 5B. 38. 1A. 14651. 122. 4B.4. 109 discrete distribution. 4C.5. 4D. normal variance one-sided confidence interval. 40 parameter binomial distribution. 2B. 128. 205 in correlation. 131. 4A. 4B. 98 permutations. 35. 4B.1. 2A. 125. 98
88 numbers. 3 presentation of ordered data.7 average. 2C. 182. 25. 2A. 1A. 2C. 39.4.11 in binomial distribution. 53. 98 quartiles. 3B.6. 209. 25.78. 55 random series. 82. value of. 110 fitting. 1A. 2. 3B. 116 possibility space.10. 162 function of. problems involving. 44 joint. 9.1 quality control.2. corrected sum of. 60 models for. 42 mixtures of.224 Index Poisson distribution. peeling faster.79 and binomial distribution. 173 multiplication law. 96. 1656 expectation and variance. 1B. 4C. 171.5 power of a test. 105 presentation of solution to probability problem. 178.8 probability generating function. 167 onesample test.11 positions. 45 quadratic equation.4 products. 35. 114 interpretation. wording of examination. 21. 2A. 170 simulation of. 29. 2C. 4C. 4D. 73 assessment of. 39 evaluating constant. 25. 51 probability mass function. 6063 use for moments. 28. 60 approximation to binomial. 86 see also under names o particular f probability function. 17 law of total. 48.6 justification.13 transformation of. 170 confidence interval for. 184.46 normal distribution. 38 sequential calculation. 5B. 501.3 quantiles.9 and exponential distribution. 157. 6 probability addition law. 3B. 69 Poisson distribution. 171 prediction from regression line.6 confidence interval for difference. 33 see also under nurnes of particular distributions probability generating function. 58. 75 normal approximation. 170 distributions . 35. 187 proportion estimation of. 4D. 3B.6 simulation. 105.10.1. estimating size of. 179 relationship. 1B. 2C. 84 use of cumulative distribution function.1 random digit. 1B. 8. see probability function probability paper. 83. 156 sample. 102 selection at. 8 probability density function. 3. 17. 196 quality. 18. 10 conditional.l3 occurrences in time. 128. 26.1 Premium Bonds. 39. 2A. 148 regression.7 Pythagoras’ theorem. 157 precision of estimators.6.4 rainfall forecasts. 73 personal. 205. 603 probability plot. 115. 42.46 estimating moments.45 Poisson distribution. 
215 practice. good statistical. 177.4 pooled variance. 22. 1 potatoes. 5A. 47 expectation and variance.7. 4. 2A. 42. choice of plotting. 33. 1B. 5A.2. 3B. 144 inequality. 45.4. 1656 maximum likelihood estimate. 22 queueing problems. 155 confidence interval for mean. 3B. 98 questions. 128 product moment correlation coefficient. 167 applications.8 Poisson process. 18. 267.6 probability plotting. 1B. 4C. 82. 215 random variable continuous. turning points. 28. 97. 38. 4D. 4C. 58.6 see also binomial distribution psychological experiment.8 geometrical. 58 mode.3 twosample test. 82 and Poisson process. 5B. 5C. 131. 70 discrete. 1334 population. 7. 60 subtraction. relationship with temperature. 1B.
36. 4A. 4B. 4D.2 largest. 2B. Index 225 random variables covariance. 179 distribution of slope. 4C. 84 smaller of two. 5B. 170. 66. 176.1. 212 replacement. sampling with and without. 2B. 1B. 1B. 5A. 4D. 7.5 rounding errors. 2A. 5A. 19–20 remainder series. 4D.2. 6. 86 sum of. 42. 3B. 4C.4 testing residuals for.8 conversion to standard deviation.2 sampling methods.1 binomial. 2B. 4D.2 sample location. 4B. 78 function of two random variables. 79 sampling for defective items. see least significant difference. 62–3. 4C. 8. 2B. 215 of telephone numbers. 140–1. sampling with and without. 34 see also time series series–parallel system. 1516 samples.8 interquartile. 161 sample variance. 4D. 22. 4B. 78. 149 standard deviation. variance sample space. 1A. 145.3 school register experiment. 15. 99. 60 rubber content of guayule plants. 27 rank correlation. 33 techniques for summing. 170 see also mean. 215 sum of Poisson. 1B.1 randomised response experiment. estimating a. 4D.6 Scott. W. F. 179 relative frequency. 72. 170. 1A. 99. 5D.3 sampling incoming batches. effect on expectation and variance. 1B.7 two-stage. 79 mixture of normal. 7.1 robustness of hypothesis tests.2.9 random variation in a time series. 2A.12. 147 significant difference. 19. 1.7 sampling distribution. cumulative. 173 double. 60 series geometric. 2A.2. 3. 53. 4B.7 bags of apples. 1B.12. 3.1 residual mean square. 31 several sets of data. 69.2. 4C. 55. 70 seasonal variation. combining. 16 regression. 5A.4. 179 retirement pensioners. numbers of.6 ratio. 1B. 4D. 113 reliability of systems. 1B. 96 of random sample from normal distribution. 174 of random variable.2. 1. 185 curvilinear. 179 range average.2.11 of model. 2A.3 sign test. 14.6 randomness hypothesis tests for. 5D. 102 range. random sultanas in. 1 linear combinations of normal. 2A. 2B. 185–6 residuals. 66. 75 semi-interquartile range. 1B. 5C. 2A. 4D. 1 total from two dice. use of. 5A.56. 67 scones. 2B.3. 4D.1 selection at random. 60 reduced sample space. 4D.10 difference between. 15–16 samples. 96 sequential calculation of probabilities. 4D.1 scaling. 178 erroneous use. 182 outliers. 87. 34. 72 school assembly. 149. 4D. 2B.13 and correlation. 188 analysis of variance table. 183 residuals in regression. 4D.3 sampling assumption of random. 173 by attributes. 5D. 182 fitted values. 4D.3 sex ratios of insects. 179 estimation of variance. 181. 184 interpretation. 128 rotating bends. 173 by variables. 122–3 significance level. see least significant difference
1B.1 scaling.6 recurrence relationships. 1B. cumulative. 173 by variables. 5D. 128 rotating bends. 181. 72 school assembly. 4D. combining. 16 regression. 5A.4. 179 retirement pensioners. 67 scones. 66.5.3 sex ratios of insects.3 sampling assumption of random. 1B. 96 sequential calculation of probabilities. numbers of.6 ratio. 174 size.13 and correlation. 188 analysis of variance table.. 87. 183 residuals in regression. 1223 significance level. 4D. 5A. 161 jointly distributed. 179 estimation of variance. 140 function of. 178. 14. 178 erroneous use. 182 fitted values. 75 semiinterquartile range. l B . 2B. 1B. 5C. 2A. 1856 residuals. 1823 class exercise.
36. 32 Tchebychev’s inequality. 4B. interpretation of. 4B. 15960. 182. see sample space spacecraft. 1A. 10. 37. 157 statistical tables. 70 standardised normal deviate. 129 spurious correlations.l normal distribution. 13 pictorial.l Poisson distribution. 2A. 38.1 test couponcollector. measures of. reliability of.5 sign. 147 by enumerating appropriate outcomes. 122. 4C. see also analysis of variance. 8 suggestions for class experiment. 1913 Times.1 standard error of sample mean. 87 expectation and variance. 3B. 124 tdistribution. 215 Poisson distribution. 122. definition.12. 5 methods for checking.9 sums of squares and products in analysis of variance. 5C. 170. 2B. 86 smoking habits of statistics and history teachers. 124 skewness. 120 statistical practice.8 estimation for normal distribution.3 exponential distribution. 29. 70 tables of normal distribution.7. 2C. 99.1. 1A. 74 probability density function. 2C.4 sports club. 5B. 2A. random. good.10 symmetrical distribution assumption of. 102. 4D. assumptions made in.3 exponential distribution. distribution of. 2C. 75 statisticians. 195. 67. The. use of method of. books of statistical. 30 presentation of. 92 table.4 temperature. 90 inversion method. 205 solution. 123 systems. lA. 42 sultanas in scones. contingency table tables. 125 relationship with Fdistribution. navigation system for. 187 surprise index. 141 degrees of freedom. 1778.l simultaneous equations. 70.9 Spearman’s correlation coefficient. 1B. 82 skulls. 2A. 97. 97. 40 symmetry. 2C. occurrences of. 4C. randomness of. 177 subtraction of probabilities. 3A. 4C. 723 statistic. relationship between quality and. 187 8 squash match.6 in public places.2 telephone exchange. 198. 78 in correlation. 124. 126. cumulative frequency. 199 space. 36. 3.10 spread. 170 estimation from probability plot. 71. 79 teaching of arithmetic. 2C. cautionary. 208 smaller of two random variables.2 comparison with analytic method. 2A. books of. 
187 normal approximation to.5 smoothing of a time series.1.6 speeding up mathematical tasks. consultation with client.226 Index significant result. 94 discrete distributions. 2C. frontal breadths of. 128. 28.2 ties in rank correlation. 91 binomial distribution. data. 9 standard deviation calculation of.l slope of regression line. 2112 snooping. 200 in regression. 186 . 1920 table. problems with. 71 sources of variability. 17882 small expected frequencies. 2B. 69 multiple choice. annual income of. place of. 142 simulation. 70 statistician.7 numbers. 38.13 Bernoulli trials. 2A. 2C. 128 skew distribution. null hypothesis for.3 uniform distribution. 174 standardisation of normal distribution.2. 105. 184. 29. treatment of. 174 tales. 11.1. 96 conversion of range to. use of.8 sum of independent random variables distribution. 13 survey of exercise habits. 3B. 96 table lookup method.1 stages. 1223 see also hypothesis test thunderstorms. 6 use of diagrams in. 163 statistical inference. experimental.2 swimming pool. arguments based on. 5A. 2A.4 slippery pole. 2C. 170 for normal distribution. 114 expectation of sample. 2C. 74 size of test. 2C.
6. law of.1 uniform random numbers.7 two-tailed tests. 5A.4 variable dependent. 173 variance. 2B. 5C. 142 two-sample. 86. 198. 99. 131. 175. 5B. 18. 4B.3 and sample size. structure of. 209 z-test. use of. 10 wellington boots. 29.5 paired sample. 49 divisor. 1B.8 watch the birdie. 1A.2 uniform distribution. see Bernoulli trials t-test in analysis of variance. 24. 2B. sampling by. 149 widths. 43 unthinking analysis of data. 4A.2 total sum of squares. 1B.8. 134. 31.3 Type I error. 3 University of Kent. 4D. identification of. bird. 13.8 t-test. 4A. 1A. 14 urn problems. 5A.3 for Poisson distribution. 2B.8. graduate. 44 value. 4A.2. 4C. 2C. 4A.3. 5C. 83. 48. 170 treatments. 1B. 4B.2 for normal variances. 23 voting in general elections. 4A.6 women investors in building societies.5 warning limits. 208 confidence intervals. 38. see one-tailed tests two-way analysis of variance.4. 4C. 4B.45. 170 Poisson distribution. 131 normal approximation to binomial. 164 binomial distribution. 4C. 2C. 173 total score from two dice. 1B. 195. 4B. 42. 72 exponential distribution. 2B. 4A. 201 unbiased estimator. 29. 184 unusual events. 44. 4C.10. 199 variability in weight of bottled fruit. 100 wingspans. 1 width of confidence interval. histogram with unequal class. 52 probability density functions. 5C.2. 4A. 5A. 167 geometric distribution. 172 mean square error of. 173 total probability. 4D.4. 164 effect of scaling. 50. 4B.2 two-sample tests effect of unequal variances. 4B. 194 two-stage sampling. 5D. 194 turning points in a time series. 87 value of rainfall forecasts. 129. 162–3. 170 unemployment. 164. Fisher’s. 1B.4. 4C.lO. 22 Yates’ correction for 2 × 2 contingency table. 140 of linear combination. 3A. 1A. 163 of difference. 206 for normal means. 4D. 188 in regression. 4C.1 weight. 5D. 167. 133–4 see also analysis of variance. 4C.1 variability. 2C. 4B.1 trials. Index 227 tins of peas. weights of. 53 χ² approximation. 35.3 validity of arguments checking assumptions. normal variance verbal arguments. 8. 133 for correlation coefficients. 21. 196. 2C. 4B. 200 transformation of random variables. 198–9 in correlation. 2B. expected. 110 pooled. 35–6. 36.3 wording of examination questions.
196 tree diagrams. 162 biased estimation. l A .5 z transformation in correlation. 159.5. problems involving. disadvantage of. 170 unemployment. 356. sources of.5 weighted average of probabilities. 2B. 53 x2 approximation. 2A. 18890 . 110 pooled. 35.3 validity of arguments checking assumptions. normal variance verbal arguments. 8. 133 for correlation coefficients. 21. 196.7. 4B. 200 transformation of random variables. 1989 in correlation. 2B. 171.1. 1B. expected.13 union of events. 4C. 1401. 32 trend. 68 identities for.4.4. 36.3.3 wording of examination questions.