
ENGLISH LANGUAGE BOOK SOCIETY

Teach Yourself

STATISTICS

Richard Goodman

SOME OTHER ELBS LOW-PRICED EDITIONS

The 'Practical Books' Series

Abbott, P.: TEACH YOURSELF ALGEBRA (EUP)
Abbott, P.: TEACH YOURSELF CALCULUS (EUP)
Abbott, P.: TEACH YOURSELF GEOMETRY (EUP)
Abbott, P.: TEACH YOURSELF MECHANICS (EUP)
Abbott, P.: TEACH YOURSELF TRIGONOMETRY (EUP)
Cousins, D.: TEACH YOURSELF BOOK-KEEPING (EUP)
Hersee, E. H. W.: A SIMPLE APPROACH TO ELECTRONIC COMPUTERS (Blackie)
Hood, S. M.: TEACH YOURSELF MECHANICAL DRAUGHTSMANSHIP (EUP)
Ireson, R.: THE PENGUIN CAR HANDBOOK (Penguin)
Pitman College: TEACH YOURSELF TYPEWRITING (EUP)
Snodgrass, B.: TEACH YOURSELF THE SLIDE RULE (EUP)
Taylor, H. M. and Mears, A. G.: THE RIGHT WAY TO CONDUCT MEETINGS, CONFERENCES AND DISCUSSIONS (Elliot)
Wilman, C. W.: TEACH YOURSELF ELECTRICITY (EUP)

The 'Textbook' Series

Allen, R. G. D.: MATHEMATICAL ANALYSIS FOR ECONOMISTS (Macmillan)
Bailey, N. T. J.: STATISTICAL METHODS IN BIOLOGY (EUP)
Hill, A. Bradford: PRINCIPLES OF MEDICAL STATISTICS (Lancet)

TEACH YOURSELF

STATISTICS

By

RICHARD GOODMAN, M.A., B.Sc.

Head of Department of Computing, Cybernetics and Management, Brighton College of Technology

THE ENGLISH LANGUAGE BOOK SOCIETY
and
THE ENGLISH UNIVERSITIES PRESS LTD
London

First printed 1957. Reprinted 1960, 1963, 1965, 1966, 1969 and 1970. E.L.B.S. edition first published 1965. Second E.L.B.S. printing 1967. Third E.L.B.S. printing 1969. Fourth E.L.B.S. printing (metricated edition) 1972.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

ISBN 0 340 05906 0

PRINTED IN GREAT BRITAIN

PREFACE TO SECOND IMPRESSION

"It is all very well to say that the world of reality should be kept separate and distinct from the world of mathematics. In the trivial operations of daily life, you may be able to keep clear of the concepts of mathematics, but once you begin to touch science, the dangerous contact is established."
Professor John L. Synge (Science—Sense and Nonsense)

Of recent years, many people, often with little mathematical training, have had to teach themselves some Statistics. Few, probably, have found it easy. One can, of course, try to learn some of the basic routines and tests as rules of thumb, hoping that whatever situation may arise will be a text-book one to which the appropriate text-book test will obligingly apply. The trouble is that such situations are rare, and very soon one begins to wish that one had learned a little about the actual nature of the fundamental tests.

It is the aim of this book to help those who have to teach themselves some Statistics to an understanding of some of the fundamental ideas and mathematics involved. Statistics, at least from one point of view, is a branch of applied mathematics; so, even in an elementary treatment of the subject, some mathematics is essential. The standard of mathematics assumed is not high. Whenever possible, references to Mr. Abbott's books in this series, especially Teach Yourself Calculus, have been given, and the reader is strongly advised to follow them up. Once that groundwork has been acquired, problems of application are the more readily and successfully tackled. Occasionally the algebra may appear to be a little tedious, but it is not difficult.

But Statistics is a dynamic, developing science. New techniques, new methods of analysis are constantly arising and influencing even the foundations. The reader is urged to bear this fact in mind all the time. For this reason, and for reasons of space, the concentration has been on fundamentals. Where this has not been possible and new ground has been broken, a note has been added at the end of the appropriate chapter. Continuous bivariate distributions, including the normal bivariate distribution, which involve double integrals, have been treated in an appendix.

In case the notation should present difficulties, particularly when reading Chapters VI and IX, a list of a few mathematical symbols and their meanings follows that appendix.

A set of Exercises concludes each chapter, but the student is urged also to tackle those provided in some of the books listed on page 239. Specially important, as the most useful collection of exercises at present available, is that provided by the two parts of Elementary Statistical Exercises issued by, and obtainable from, the Department of Statistics, University College, London. To be adequately equipped to tackle such exercises the student is recommended to have by him: (1) Chambers's Shorter Six-Figure Mathematical Tables by the late Dr. L. J. Comrie (W. & R. Chambers), and (2) Lanchester Short Statistical Tables by G. R. Braithwaite and C. D. Titmus (English Universities Press). A desk calculator is not essential, but if the student can possibly obtain one he should certainly do so. Of the hand-operated models, the Madas 10R is recommended.

My thanks are due for their encouragement and suggestions to Mr. Alec Bishop, the Rev. Liam Grimley, Durrell, Moore, Chaffer, Leonard Cutts and the late Dr. J. Wishart; to my sister, Miss Nancy Goodman, I am immensely grateful; and to the staff of the English Universities Press and to the printers I wish to express my appreciation of the care they have bestowed upon this book.

BRIGHTON, 1960. R. G.

PREFACE TO 1965 IMPRESSION

In this impression a determined effort has been made to eliminate residual mistakes and misprints, and wherever possible the necessary corrections have been made. In this I have had much encouragement and invaluable assistance; to all those who have drawn my attention to mistakes and errors, especially Dr. J. van de Geer, I express my gratitude.

ACKNOWLEDGEMENTS

The Author wishes to express his thanks to the following for so kindly and readily allowing him to use quotations, examples and tables: Sir R. A. Fisher and Messrs. Oliver and Boyd; Professor A. C. Aitken and Messrs. Oliver and Boyd; Professor C. E. Weatherburn and the Cambridge University Press; Professor G. W. Snedecor and the Collegiate Press, Ames, Iowa; Professor M. G. Kendall and Messrs. Charles Griffin; Professor J. L. Synge and Messrs. Jonathan Cape; Rider and Messrs. John Wiley and Sons, Inc.; Paradine and Rivett and the English Universities Press; Brookes and Dick and Messrs. Heinemann; the Rand Corporation, California, and the Free Press, Glencoe, Illinois; Mr. Rosenbaum; the Association of Incorporated Statisticians; the Royal Statistical Society; the Institute of Actuaries; the Directorate of Army Health and the Editors of the Journal of the Royal Statistical Society; the American Statistical Association; and the Senate of the University of London. Detailed acknowledgements are given in the text.

PREFACE TO 1970 EDITION

Following the death of Richard Goodman in 1966 I have been entrusted with the task of preparing the new edition of this excellent little book. Obsolete or obsolescent units have been changed to metric units, some misprints have been corrected and the setting improved in a few places.

C. G. LAMBE

CONTENTS

PREFACE

I. A FIRST LOOK AROUND
Statistics and Statistics—Descriptive Statistics—Samples and Populations—Statistical Models, Statistical Distributions—Tests of Significance.

II. FREQUENCIES AND PROBABILITIES
Frequency Tables, Histograms, Frequency Polygons—Cumulative Frequency Diagrams—Samples and Statistics—Mode and Median—The Mean—Measures of Spread—Moments of a Frequency Distribution—Relative Frequency Distributions—Relative Frequencies and Probabilities—Elementary Probability Mathematics—Continuous Distributions—Moments of a Continuous Probability Distribution—Expectation—Probability Generating Functions—Corrections for Grouping—Note to Chapter Two—Use of Desk Calculator.

III. STATISTICAL MODELS, I: THE BINOMIAL DISTRIBUTION
Tossing a Penny—Generating Function of Binomial Probabilities—Binomial Recursion Formula—Some Properties of the Distribution—Moment Generating Functions—Fitting a Binomial Distribution—"So Many or More"—Mathematical Note to Chapter—Gamma and Beta Functions.

IV. STATISTICAL MODELS, II: THE POISSON DISTRIBUTION: STATISTICAL RARITY
On Printer's Errors—The Poisson Model—Some Properties of the Distribution—Approximation to Binomial Distribution—Poisson Probability Chart—The Negative Binomial Distribution.

V. STATISTICAL MODELS, III: THE NORMAL DISTRIBUTION
Continuous Distributions—From the Binomial to the Normal Distribution—The Normal Probability Function—Some Properties of the Normal Distribution—Binomial, Poisson, Normal.

VI. MORE VARIATES THAN ONE: BIVARIATE DISTRIBUTIONS, REGRESSION AND CORRELATION
Two Variates—Correlation Tables—Scatter Diagrams, Correlation Surfaces—Moments of a Bivariate Distribution—Regression—Linear Regression and the Correlation Coefficient—Standard Error of Estimate—Rank Correlation—Kendall's Coefficient—Coefficient of Concordance—Polynomial Regression—Least Squares and Moments—Correlation Ratios—Multivariate Regression—Multiple Correlation—Partial Correlation—Mathematical Note to Chapter—Determinants.

VII. SAMPLE AND POPULATION, I: SOME FUNDAMENTALS OF SAMPLING THEORY
Inferences and Significance—What Do We Mean by "Random"?—Random Sampling—Ticket Sampling—Random Sampling Numbers—The Distribution of Sample Statistics—Distribution of the Sample Mean—The Distribution of x̄ when the Population Sampled is Normal—Sampling without Replacement—Distribution of the Sample Variance.

VIII. SAMPLE AND POPULATION, II: t, z, AND F
The t-distribution—Confidence Limits—Other Applications of the t-distribution—The Variance-ratio, F.

IX. TESTING REGRESSION AND CORRELATION
The Correlation Coefficient Again—Testing a Regression Coefficient—Relation between t-, z- and F-distributions—The Distribution of r—Combining Estimates of ρ—Testing a Correlation Ratio—Linear or Non-linear Regression?

X. CHI-SQUARE AND ITS USES
Curve-fitting—The Chi-Square Distribution—Properties of Chi-Square—Some Examples of the Application of Chi-Square—Independence, Homogeneity and Contingency Tables—Homogeneity Test for 2 × k Table—h × k Table—Correction for Continuity—Confidence Limits of the Variance of a Normal Population.

XI. ANALYSIS OF VARIANCE
The Problem Stated—One Criterion of Classification—Two Criteria of Classification—Three Criteria of Classification—The Meaning of "Interaction"—Latin Squares—Making Latin Squares.

APPENDIX: CONTINUOUS BIVARIATE DISTRIBUTIONS
Stereograms.

SOME MATHEMATICAL SYMBOLS

SUGGESTIONS FOR FURTHER READING


CHAPTER ONE

INTRODUCTORY: A FIRST LOOK AROUND

1.1. Statistics and Statistics. Most of us have some idea of what the word statistics means. We should probably say that it has something to do with tables of figures, diagrams and graphs in economic and scientific publications, with population-census returns, cricket averages, football pools, life insurance, "intelligence" tests, public-opinion polls, with the cost of living, with production planning and quality control in industry and with a host of other seemingly unrelated matters of concern or unconcern. Our answer would be on the right lines. We might even point out that there seem to be at least two uses of the word: the plural use, when the word denotes some systematic collection of numerical data about some topic or topics; the singular use, when the word denotes a somewhat specialised human activity concerned with the collection, ordering, analysis and interpretation of such data, and with the general principles involved in this activity.

Nor should we be unduly upset if, to start with, we seem a little vague. Statisticians themselves disagree about the definition of the word: over a hundred definitions have been listed (W. F. Willcox, Revue de l'Institut International de Statistique, vol. 3, 1935, p. 288), and there are many others. One of the greatest of British statisticians, M. G. Kendall, has given his definition as follows: "Statistics is the branch of scientific method which deals with the data obtained by counting or measuring the properties of populations of natural phenomena. In this definition 'natural phenomena' includes all the happenings of the external world, whether human or not" (Advanced Theory of Statistics, vol. 1, p. 2).

Statistics, as a science, is, of course, like all sciences, not merely descriptive: it is concerned with action. In his 1952 Presidential Address to the American Statistical Association, Wickens remarked: "Statistics of a sort can, however, be traced back to ancient times, but they have flowered since the industrial revolution, and especially since World War I."

"Beginning in the 19th century, statistical records were developed to describe the society of that era, and to throw light on its economic and social problems. No doubt they influenced the course of men's thinking then and, in some instances, may have led to new policies and new laws, but primarily their uses were descriptive. Increasingly, in the 20th century, statistics have been used to settle problems and to determine courses of action. Scientific experiments turn upon statistics. New products are developed and tested by statistical means. Quality control tests now change the production lines of industrial enterprises. Experiments are designed, samples selected, controls exercised; ultimately, statistics are collected and analysed with reference to decisions that must be made, judgments that entail action" ("Statistical Agencies of the Federal Government, 1948", quoted by S. S. Wilks in "Undergraduate Statistical Education", Journal of the American Statistical Association, vol. 46, No. 253, March 1951). Two other U.S. writers, F. C. Mills and C. D. Long, have stressed: "In high degree the emphasis in the work of the statistician has shifted from this backward-looking process to current affairs and to proposed future operations and their consequences . . . in private enterprise" ("Statistics and the Public Interest," Journal of the American Statistical Association, vol. 48, No. 261, March 1953, pp. 1-2).

Let us suppose that you, the reader, have been appointed to be one of the M.C.C. Selectors responsible for picking an English cricket team to tour Australia. Clearly, it would be necessary to start by collecting information about the play of a group of "possibles", as batsmen and bowlers. (For the moment, we shall not consider how we have chosen these.) We might begin by noting down each man's score in successive innings and by collecting bowling figures. Our collection of figures would tell us quite a lot, though by no means everything, about the batsmen and bowlers. The sequence of numbers set down against each batsman's name would tell us something about his run-scoring ability. It would not, however, tell us much, if anything, about his style—that, for instance, on one occasion he scored a boundary with a superlative cover-drive, while, on another, although he scored a boundary, his stroke was certainly not superlative.

Although, in the common-sense use of measure, such a difference is not measurable, it may be that we are primarily interested in a batsman's style rather than his score. Assume, then, that, together with our fellow Selectors, we have agreed upon some standard enabling us to label any particular stroke first-class or not-first-class; we can then assign the number 1 to a first-class stroke and the number 0 to one that is not. Any particular innings of a given batsman would then be described by a sequence of 0's and 1's, like 00010011101111010111110000000. Likewise, the list of number-pairs (5 wickets, 63 runs; 0 wickets, 41 runs; and so on) against a bowler's name would tell us something about his bowling over a certain period. This, too, would constitute a set of statistical observations. Such lists of classified numbers are the raw stuff of statistics.

Any aspect of whatever it may be we are interested in that is countable or measurable can be expressed numerically, and so may be represented by a variable taking on values from a range of values. Let x be the number of runs scored by a particular batsman in a given innings of a given season. Then x is a variable which can take any positive integral value, including zero. Because the smallest, non-zero, difference between any two possible values of x is a finite amount (1 in this case), x is a discrete or discontinuous variable. Not all variables are, however, of this kind. Let x now denote the average number of runs scored off a particular bowler per wicket taken by him on a given occasion his side fielded in a specified season. Then x may take on values like 12.6, 27.897, 3.333333 . . .; theoretically x can range over the entire set of positive rational numbers.¹ Again, let x denote the number of metres run by a particular fielder on a given occasion in the field; now x can take on as value any positive real number,² not merely any positive rational (for instance, if a fielder walks round a circle of radius 1 m, he traverses 2π m, and π, while being a real number, is not rational). Thus the variable can vary continuously in value and is called a continuous variable.

¹ A RATIONAL NUMBER is any number that can be expressed in the form r/s, where r and s are integers and s is positive.
² A REAL NUMBER is any number that can be expressed as a terminating or unterminating decimal. Thus any rational number is a real number, but not all real numbers are rational. See T. Dantzig, Number, the Language of Science, or G. H. Hardy, A Course of Pure Mathematics.

Thus the systematic collection of numerical data about a set of objects, for a particular purpose, requires statistical (in the singular) treatment (ordering, tabulating, summarising, etc.); indeed, the systematic collection of statistics (plural) is the first phase of Statistics (singular). (WARNING: Thus far, we have been using the term statistics (plural) in the sense in which it is often used in everyday conversation, to denote a set of quantitatively measured data. Among statisticians themselves this use is almost obsolete: the plural use of statistics is confined to the plural of the word statistic, a rather technical word we shall explain later.)

1.2. Descriptive Statistics. When we have gathered together a large collection of numerical data about our players, we must begin to "sort them out". Indeed, in the very process of collecting we have already behaved statistically (in the singular), for we shall have been forced to develop some system to avoid utter chaos. How do we go about sorting out our data? First, we display them so that the salient features of the collection are quickly discernible. This involves tabulating them according to certain convenient and well-established principles, simultaneously presenting them in some simple, unambiguous diagrammatic or graphical form. Secondly, we summarise the information contained in the data, so ordered and displayed, maybe by a single number or, more usually, a set of numbers (e.g., for a batsman: total number of runs scored, number of innings, times not out, highest score, average number of runs scored per completed innings). We might want to compare the average (arithmetic mean score) of each one of our "possible" batsmen with the overall average of all the batsmen in the group. In an attempt to assess consistency, we might try to arrive at some figure which, in addition to the mean score, would tell us how a batsman's scores in different innings are spread about his mean. Which of these "summarising numbers" we use depends upon the question we are trying to answer about the subject to which the data relate. In doing such things we should be engaged in descriptive statistics—ordering and summarising a given set of numerical data, without direct reference to any inferences that may be drawn therefrom.

1.3. Samples and Populations. The data we have collected, ordered, displayed and summarised will tell us much that we want to know about our group of possible Test players.
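The two "summarising numbers" just described can be computed directly. The sketch below is illustrative only and not from the book: the scores are invented, and the spread figure used is the standard deviation, a measure the book develops in Chapter Two.

```python
# Illustrative sketch: summarising a batsman's scores by a mean and a
# measure of spread about that mean. The scores are invented data.

scores = [12, 45, 0, 78, 33, 21, 56, 9, 40, 36]

n = len(scores)
mean = sum(scores) / n  # arithmetic mean score

# Standard deviation: root of the mean squared deviation from the mean.
variance = sum((x - mean) ** 2 for x in scores) / n
std_dev = variance ** 0.5

print(mean)     # 33.0
print(std_dev)  # roughly 22.4: scores are widely spread about the mean
```

Two batsmen with the same mean but different standard deviations would be distinguished by the second number: the smaller it is, the more consistent the batsman.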

They may, indeed, help us to pick the touring party. Indirectly, of course, they will also tell us something about the state of first-class cricket in England as a whole. But because the group of "possibles" we have been studying is a special group selected from the entire population, as we call it, of first-class cricketers, we begin to suspect that the picture we obtain from our group will not be truly representative of that population. Ideally, to obtain a really reliable picture we should need all the scores, bowling averages, etc., of every player in every first-class county. But this would be certainly impracticable, if not theoretically impossible. To obtain a more nearly representative picture, we may choose, at random from the population of all first-class players, a number of samples. So we are forced back to the idea of drawing conclusions about the population from the information presented in samples—a procedure known as statistical inference. In this way we meet with one of the central ideas of the science of statistics—that of sampling a population.

To fix our ideas, let us set down some definitions:

POPULATION (or UNIVERSE): the total set of items (actual or possible) defined by some characteristic of those items, e.g. the population of all first-class cricketers in the year 1956; the population of all possible selections of three cards from a pack of 52; the population of all actual and possible measurements of the length of a given rod. A population, note, need not be a population of what, in everyday language, we call individuals: we speak, statistically, of a population of scores or of lengths. Such a population may have a finite number of elements, or may be so large that the number of its elements will always exceed any number, no matter how large; in this latter case, we call the population infinite.

SAMPLE: Any finite set of items drawn from a population. In the case of a finite population, the whole population may be a sample of itself; in the case of an infinite population this is, of course, impossible.

RANDOM SAMPLE: A sample from a given population, each element of which has an "equal chance" of being drawn. Let us say at once that this definition is open to serious objection, for when we think about what exactly we mean by "equal chance", we begin to suspect that it, in its turn, may involve the very idea of randomness we are trying to define.

There are various methods by which we may obtain a random sample from a given population. The common method of drawing names from a hat occurs to us at once; we need not worry about the details here. This suffices to emphasise the very important point that the adjective random actually qualifies the method of selecting the sample items from the population, rather than designating some property of the aggregate of elements of the sample discovered after the sample has been drawn. That section of Statistics concerned with methods of drawing samples from populations for statistical inference is called Sampling Statistics.

Switching to a new field, assume that from all National Servicemen born in a certain year, 1933, we draw a random sample, or a number of samples, of 200. What can we infer about the distribution of height in the population from that in the sample? And how accurate will any such inference be? We repeat that unless we examine all the elements of a population we cannot be certain that any conclusion about the population, based on the sample-data, will be 100% accurate. We are "absolutely certain" that a statement is "100% true" only in the case of statements of a rather restricted kind, like: I was born after my father was born; a black cat is black; one plus one equals two. Such statements, called tautologies by logicians, are sometimes of considerable interest because, although they are in fact disguised definitions, none is a statement about the world, even though, like the third example given, it may at first sight appear to be so. We may be "pretty confident" that any statement saying something remotely significant about the world is a probability-statement. The great mass of our knowledge is probability-knowledge.

Central, therefore, in this problem of inference from sample to population, is the concept of probability. Indeed, probability theory is the foundation of all statistical theory that is not purely descriptive. It is for this reason, perhaps, that Statistics is important: "The characteristic which distinguishes the present-day professional statistician is his interest and skill in the measurement of the fallibility of conclusions" (G. W. Snedecor, "On a Unique Feature of Statistics", Presidential Address to the American Statistical Association, December 1948; Journal of the American Statistical Association, vol. 44, No. 245, March 1949).
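The hat method has an exact electronic analogue. The sketch below is illustrative and not from the book; the population of player names is invented, and the point to notice is that randomness lies in the method of drawing, not in any property of the drawn sample itself.

```python
# Illustrative sketch: drawing a random sample from a finite population,
# the electronic analogue of drawing names from a hat.

import random

# An invented population of 100 hypothetical first-class players.
population = [f"player_{i}" for i in range(1, 101)]

random.seed(42)  # fixed seed so the draw can be repeated exactly

# Draw 11 players without replacement; every element of the population
# has an equal chance of appearing in the sample.
sample = random.sample(population, 11)

print(sample)
```

Repeating the call (with a different seed) gives a different sample from the same population, which is exactly the situation of "a number of samples" discussed in the text.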

Unfortunately it is not possible to give a simple, universally acceptable definition of probability. For the moment, we assume that we know roughly what the word means; in the next chapter we shall try to clarify our ideas a little.

1.4. Statistical Models, Statistical Distributions. Our variable here is height in centimetres of National Servicemen born in 1933. In a sample of 200 of all National Servicemen born in 1933, we shall find that there are so many with heights of 153 cm and under, so many with heights exceeding 153 cm but not exceeding 156 cm, and so on; the variable is distributed over the 200-strong sample in a definite manner. We may thus set up a table giving the frequency of the variable in each interval for all the intervals into which the entire range of the variable is divided, the number of values of the variable falling within a specified interval being called the frequency of the variable in that interval. We thereby obtain the frequency-distribution of the variable in the sample. We now define:

VARIATE: A variable possessing a frequency distribution is usually called a variate by statisticians (a more precise definition is given in 2.13).

SAMPLE STATISTIC: A number characterising some aspect of the sample distribution of a variate, e.g. sample mean, sample range, and so on.

POPULATION PARAMETER: A number characterising some aspect of the distribution of a variate in a population, e.g. population mean, population range.

Using these new terms, we may now formulate the question raised above in this way: How can we obtain estimates of population parameters from sample statistics, and how accurate will such estimates be?

When we examine actual populations we find that the variate tends to be approximately distributed over the population in a relatively small number of ways. Corresponding to each of these ways, we set up an ideal distribution which will serve as a model for the type. These are our standard distributions (just as the equation ax² + bx + c = 0 is the standard quadratic equation, to which we refer in the course of solving actual quadratic equations). Each is defined by means of a mathematical function called a frequency-function (or, in the case of a distribution of a continuous variate, a probability-density-function), in which the population parameters appear
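Forming such a frequency table is a mechanical counting exercise. The sketch below is illustrative and not from the book: the twenty heights are invented, and the 3-cm intervals mirror the ones named in the text (153 cm and under; over 153 up to 156; and so on).

```python
# Illustrative sketch: forming the frequency-distribution of height in a
# sample by counting values in 3-cm intervals. The heights are invented.

heights = [151, 158, 162, 155, 149, 160, 171, 166, 158, 163,
           157, 152, 168, 161, 159, 154, 165, 170, 156, 162]

# (low, high) means "exceeding low but not exceeding high";
# None marks an open end of the range.
bins = [(None, 153), (153, 156), (156, 159), (159, 162),
        (162, 165), (165, 168), (168, 171), (171, None)]

frequency = {}
for low, high in bins:
    label = (f"<= {high}" if low is None
             else f"> {low}" if high is None
             else f"{low + 1}-{high}")
    frequency[label] = sum(
        1 for h in heights
        if (low is None or h > low) and (high is None or h <= high)
    )

print(frequency)  # the frequencies sum to the sample size
```

The dictionary printed is exactly a frequency table: each interval paired with the number of sample values falling within it.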

as controllable constants in the function, which, when varied, serve to distinguish the various specific cases of the general functional form. (In the function f(x, a, b, c) = ax² + bx + c, the constants a, b, c are parameters: when we give them different values, we obtain different cases of the same functional form.)

Once we have defined a standard distribution, we can sample theoretically the corresponding ideal population. It is then frequently possible to work out the manner in which any specific statistic will vary, as a result of the random-sampling process, with the size of the sample drawn. In other words, we obtain a new distribution which tells us the manner in which the statistic in question varies over the set of all possible samples from the parent population. Such distributions we call Sampling Distributions. For example, we shall find that the mean value of the mean values of all possible samples from a population is itself the mean value of the variate in the population, and that the mean of a particular sample has only a specific probability of differing by more than a stated amount from that value. In this way, providing our model is appropriate, we are able to decide how best to estimate a parameter from the sample data and how to assess the accuracy of such an estimate. Thus we are led to the idea of statistics, or functions of statistics, as estimators of population parameters, and to the closely related idea of confidence limits—limits, established from the sample data with the help of our knowledge of some model distribution, outside which a particular parameter will lie with a probability of only "one in so many".

1.5. Tests of Significance. Very often we are not primarily concerned with the values of population parameters as such. Instead, we may want to decide whether or not a certain assumption about a population is likely to be untenable in the light of the evidence provided by a sample or set of samples from that population. This is a very common type of problem occurring in very many different forms: Is it reasonable to assume, on the basis of the data provided by certain samples, that a certain modification to a process of manufacturing electric-light bulbs will effectively reduce the percentage of defectives by 10%? Samples of the eggs of the common tern are taken from two widely separated nesting sites: is it reasonable to assume, on the evidence of these samples, that there is no difference between the mean lengths of the eggs laid by birds in the two localities?
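The statement that the mean of all sample means equals the population mean can be checked exhaustively for a small finite population. The sketch below is illustrative and not from the book; the five-element population is invented, and every possible sample of size 2 is enumerated.

```python
# Illustrative check: for a small finite population, the mean of the
# means of ALL possible samples of size 2 equals the population mean.

from itertools import combinations

population = [2, 4, 6, 8, 10]  # invented five-element population
pop_mean = sum(population) / len(population)

# Every possible sample of 2 distinct elements (10 samples in all),
# and the mean of each.
sample_means = [sum(s) / 2 for s in combinations(population, 2)]
mean_of_means = sum(sample_means) / len(sample_means)

print(pop_mean, mean_of_means)  # prints 6.0 6.0: the two agree
```

Individual sample means range from 3.0 to 9.0, which illustrates the other half of the text's remark: any one sample mean may differ from the population mean, and only the distribution of those means is centred on it.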

Consider the light-bulb problem. Previous sampling may have led us to suspect that the number of defective light bulbs in a given unit of output, say 10,000 bulbs, is too high for the good name of the firm. Technical experts have suggested a certain modification to the production process. This modification has been agreed upon, introduced, and production has been resumed. Has the modification been successful? The assumption that the modification has reduced the number of defectives by 10% is a hypothesis about the distribution of defectives in the population of bulbs. The hypothesis to be tested is a NULL HYPOTHESIS, so called because it is to be nullified if the evidence of random sampling from the population specified by the hypothesis is "unfavourable" to that hypothesis.

We decide what shall be considered "unfavourable" or "not unfavourable" by choosing a level of significance (level of probability). The level of probability chosen is essentially arbitrary: it is up to us to fix the dividing line between "unfavourable" and "not unfavourable". In practice, the levels chosen are frequently the 0.05 and 0.01 levels, but any level may be chosen.

Suppose we know, from our knowledge of the appropriate model distribution, that, if our hypothesis is true, the probability of obtaining 5 or more defectives in a random sample of, say, 100 bulbs will be 1 in 20. If the hypothesis is true, then, we should expect not more than 1 sample in every 20 to contain 5 or more defectives. If, however, when we draw a random sample of 100 bulbs from each 10,000 bulbs produced, we find that 3 samples in 20 contain 5 or more defectives, an event improbable at the level of probability chosen (in this case 1/20, or 0.05) has occurred. We shall have reason, therefore, to suspect our hypothesis: that the modification has reduced the number of defectives by 10%. Such tests are TESTS OF SIGNIFICANCE.

We might well have made the test more exacting and asked of our model what number of defectives in a sample of the size specified is likely to be attained or exceeded with a probability of 0.01: in other words, what is the value of m such that we should expect only 1 sample in 100 to contain m or more defectives? Had we chosen this 0.01 level and found that more than 1 in 100 samples contained this number of defectives or more, a very improbable event would have occurred, and we should be justified in very strongly suspecting the hypothesis.
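The arithmetic of such a test can be sketched in code. The text has not yet specified the model distribution, so the sketch below assumes, purely for illustration, a binomial model in which each of the 100 bulbs in a sample is independently defective with a hypothesised rate (the rate 0.02 used here is an invented figure, not one from the text); critical_value then finds the smallest count m of defectives for which P(X ≥ m) does not exceed the chosen level of significance.

```python
from math import comb

def binom_tail(n, p, m):
    """P(X >= m) for X ~ Binomial(n, p): the chance of m or more defectives."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

def critical_value(n, p, alpha):
    """Smallest m with P(X >= m) <= alpha: a count of defectives attained or
    exceeded with probability at most alpha if the hypothesis is true."""
    m = 0
    while binom_tail(n, p, m) > alpha:
        m += 1
    return m

# Critical counts for samples of 100 bulbs at the two customary levels,
# under the assumed (hypothetical) 2% defective rate:
m_05 = critical_value(100, 0.02, 0.05)
m_01 = critical_value(100, 0.02, 0.01)
```

Observing m_01 or more defectives would, in the text's words, justify "very strongly suspecting" the hypothesis; the more exacting 0.01 level always demands at least as large a count as the 0.05 level.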

It is always dangerous to rely too much on the evidence of a single experiment, test or sample. For, as we shall see, although each test in a series of tests may yield a non-significant result, it is quite possible that, when we pool the data of the tests and subject this pooled data to the test, the result may actually be significant at the level selected. Again, although a significant result is evidence for suspecting the hypothesis under test, a non-significant result is no evidence of its truth: the alternative is not one between "Guilty" and "Not guilty", "False" or "True", but, rather, between "Guilty" and "Not proven", "False" and "Not disproved".

Problems such as those we have rapidly reviewed are, then, a few of the many with which statistics is concerned. Statistics is a tool, in the hands of the scientist, technologist, economist and all who are confronted with the job of taking decisions on the basis of probability statements, a very powerful tool. But any tool is limited in its uses, and all tools may be misused. The more we know about a tool, of the principles underlying its operation, the better equipped we are both to employ it effectively and to detect occasions of its misuse and abuse. Here we attempt to make a small start on the job of understanding.

CHAPTER TWO

FREQUENCIES AND PROBABILITIES

2.1. Frequency Tables, Histograms and Frequency Polygons. I take six pennies, drop them into a bag, shake them well and empty the bag so that the pennies fall on to the table. I note the number of heads shown, and repeat the experiment. When I have carried out the routine 200 times, I make out the following table showing upon how many occasions out of the 200 all the pennies showed tails (no heads), one out of the six pennies showed a head, two pennies showed heads, and so on:

Number of Heads (H) .    0    1    2    3    4    5    6   Total
Frequency (f) . . .      2   19   46   62   47   20    4    200

Such a table summarises the result of the experiment, and is called a Frequency Table. It tells at a glance the number of times (frequency) a variable quantity, called a variate (in this case H, the number of heads shown in a single emptying of the bag), takes a specified value in a given total number of occasions (the total frequency, 200). The variate here is discontinuous, or discrete, being capable of taking certain values only in the range of its variation. But not all variates are of this kind. Consider the following frequency table showing the distribution of length of 200 metal bars:

Length (L), cm . .      30   31   32   33   34   35   36   37   38   39   Total
Frequency (f) . .        4    8   23   35   62   44   18    4    1    1    200

The variate (L) here could have taken any value between, say, 29.5000 . . . and 39.4999 . . . cm; it is, in other words, a continuous variate. But for convenience, and because no measurement is ever "exact", the lengths have been measured correct to the nearest centimetre: all bars having lengths in the range 34.5000 . . . to 35.4999 . . . cm, for instance, are included in the 35-cm class. In other words, the frequencies have been grouped into classes corresponding to equal subranges of

the variate, each labelled with the value of the mid-point of the class interval. Thus, in a sense, this second distribution may also be regarded as a discrete distribution, especially as the frequencies have been grouped. But although the variate, L, could have taken any value in its range, in practice a distribution of observed frequencies only covers a finite number of values of the variate, although this number may at times be very large.

[Fig. 2.1(a). Frequency diagram for the penny experiment: frequency against number of heads, 0 to 6, with the frequency polygon superimposed.]

How do we display such distributions diagrammatically?

(a) The Histogram. On a horizontal axis mark out a number of intervals, usually of equal length, corresponding to the values taken by the variate. The mid-point of each such interval is labelled with the value of the variate to which it corresponds. Then, upon each interval as base, erect a rectangle the area of

which is proportional to the frequency of occurrence of that particular value of the variate. In this way we obtain a diagram built up of cells, called, from the Greek for cell, a histogram. The area of each cell measures the frequency of occurrence of the variate in the interval upon which it is based; the area of all the cells taken together measures the total frequency. Figs. 2.1(a) and (b) show the histograms for our two distributions. It should be noted that, if, as is often the case, all the intervals are of equal length, the height of each cell serves, as it were, as a measure of the corresponding frequency; but this is, we emphasise, only accidental.

[Fig. 2.1(b). Histogram of the bar-length distribution: frequency against bar length in centimetres (to nearest cm), 30 to 39.]

It should be noted also that, whereas in the case of

the first distribution each class-interval represents a single value of the variate, in the case of the second distribution each class value represents in fact a range of values, the mid-point of which is used to denote the interval.

(b) The Frequency Polygon. Alternatively, at the mid-point of each interval, erect an ordinate proportional in length to the frequency of the variate in that interval. Now join together by straight-line segments the upper terminal points of neighbouring ordinates. The figure so obtained is a frequency polygon (see Fig. 2.1(a)).

2.2. Cumulative Frequency Diagrams. We may be interested more in the frequency with which a variate takes values equal to or less than some stated value than in the frequency with which it takes individual values. Thus, for example, we may want to show diagrammatically the frequencies with which our six pennies showed three heads or less, or five heads or less. To do this we set up a cumulative frequency diagram in either of the two following ways:

(a) On each interval corresponding to the different values of the variate, set up a rectangle of area proportional to the combined frequency of the variate in that interval and in all those corresponding to lower values of the variate. Thus, in our pennies example, on the "0" interval set up a rectangle of area 2; on the "1" interval, a rectangle of area 2 + 19 = 21; on the "2" interval, a rectangle of area 2 + 19 + 46 = 67; and so on. The area of the rectangle set up on the last interval will measure the total frequency of the distribution.

(b) Alternatively, at the mid-point of each interval, erect an ordinate measuring the "accumulated" frequency up to and including that value of the variate. Join the upper end-points of neighbouring ordinates. The resulting figure is a cumulative frequency polygon (see Fig. 2.2).

Exercise: Draw a cumulative frequency histogram and polygon for the data given in the second table in 2.1.

2.3. Samples and Statistics. Theoretically, the experiment with the six pennies could have been continued indefinitely. Consequently the actual results obtained may be regarded as those of but one sample of 200 throws from an indefinitely large population of samples of that size. Had we performed
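The running totals used in both constructions (2, 2 + 19 = 21, 2 + 19 + 46 = 67, and so on) can be generated mechanically; a minimal sketch, using the penny frequencies of 2.1:

```python
from itertools import accumulate

# Frequencies of H = 0, 1, ..., 6 heads in 200 emptyings of the bag (table in 2.1)
penny_freqs = [2, 19, 46, 62, 47, 20, 4]

# Cumulative frequencies: the rectangle areas in method (a),
# the ordinates of the cumulative polygon in method (b)
cum_freqs = list(accumulate(penny_freqs))
```

The final entry is 200, the total frequency, just as the rectangle on the last interval measures the total frequency of the distribution.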

the experiment again, emptying the bag another 200 times, we should have obtained a somewhat different frequency distribution, that of another sample of 200 throws. Had we made 350 throws, we should have obtained yet another distribution, this time of a sample of 350.

[Fig. 2.2. Cumulative frequency diagram for the penny experiment: accumulated frequency against number of heads.]

We therefore require some method of describing frequency distributions in a concentrated fashion, some method of summarising their salient features by, say, a set of "descriptive

numbers". As we said in Chapter One, we call these "descriptive numbers" statistics when they describe the frequency distribution exhibited by a sample, and parameters when they describe that of a population. For the present, we concern ourselves with samples only.

2.4. Mode and Median. The MODE of a frequency distribution is that value of the variate for which the frequency is a maximum. The mode of the first distribution in 2.1 is H = 3 and the modal frequency is 62; the mode of the second distribution is L = 34 cm and the modal frequency is also 62. Many distributions are unimodal, that is, there is only one value of the variate in its total range for which the frequency is a maximum; but there are distributions showing two or more modes (bimodal, multimodal distributions).

The MEDIAN is that value of the variate which divides the total frequency in the whole range into two equal parts. In our coin-throwing experiment both the 100th and 101st throws, when the throws are considered as arranged in order of magnitude of variate-value, fall in the "3" class; since the variate can only take integral values between 0 and 6, the median value is 3 and the class H = 3 is the median class. Had our total frequency been 201, the 101st throw would itself have given the median value.

If we think of the bars of the second distribution as arranged in order of increasing length, the 100th and 101st bars fall in the 34-cm group, and L = 34 is accordingly the median value; had the 100th bar fallen in the 34-cm group and the 101st bar in the 35-cm group, we should say that the median value was between 34 and 35 cm, and the median group could easily have been found. In the bar-length distribution, however, the frequencies are grouped into classes, and it is possible for a bar to have any length in the range. To find an approximation to the median length, suppose that the median group is the 34-cm group (33.5–34.5 cm), that the cumulative frequency up to and including the 33-cm group is 98, and that the frequency of the 34-cm group is 14. The median cumulative frequency is 100. Let the median value be Lm. Then the difference between the median value and the value of the lower end-point of the 34-cm interval will be Lm − 33.5.
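The interpolation being set up here is easily expressed as a small function; the figures below are the hypothetical ones used in the text (cumulative frequency 98 below the group, group frequency 14, half the total frequency 100).

```python
def grouped_median(lower, width, cum_below, group_freq, half_total):
    """Linear interpolation of the median within its class interval:
    (Lm - lower) / width = (half_total - cum_below) / group_freq."""
    return lower + width * (half_total - cum_below) / group_freq

# The 34-cm group runs from 33.5 to 34.5 cm, and N/2 = 100 for the 200 bars
Lm = grouped_median(33.5, 1.0, 98, 14, 100)
```

This returns roughly 33.64, i.e. 33.6 correct to 1 d.p., the value found in the text.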

The difference between the median cumulative frequency and the cumulative frequency at the lower end-point of the interval is 100 − 98, while the frequency in the 34-cm class is 14. Consequently it is reasonable to write

(Lm − 33.5)/(34.5 − 33.5) = (100 − 98)/14, i.e. Lm = 33.5 + 0.143 = 33.6, correct to 1 d.p.

Alternatively, the median value may be estimated graphically by using the cumulative frequency polygon for the distribution. The last ordinate in such a diagram measures the total frequency of the distribution. Divide this ordinate into 100 equal parts and label them accordingly. Through the 50th division draw a horizontal line to meet the polygon; then through this point of intersection draw a vertical line to cut the axis of variate values. The point where it cuts this axis gives an approximation to the median value.

Exercise: Find the median value for the distribution whose frequency polygon was drawn as an exercise in 2.2.

2.5. The Mean. More important than either of the two preceding statistics is the arithmetic mean, or, briefly, the mean, x̄, of the distribution. If the variate x takes the values xᵢ with frequencies fᵢ respectively (i = 1, 2, 3, . . . k), the mean is defined by

Nx̄ = Σ fᵢxᵢ, where Σ fᵢ = N, the total frequency . . . (2.5.1)

Thus the mean number of heads shown in our penny-distribution is given by

200H̄ = 2 × 0 + 19 × 1 + 46 × 2 + 62 × 3 + 47 × 4 + 20 × 5 + 4 × 6 = 609, or H̄ = 3.045.

Frequently much arithmetic may be avoided by using a working mean. We use this method to calculate the mean of our second distribution in 2.1. Examining the frequency table, we see that the mean will lie somewhere in the region of 34 cm. Consequently, let x = L − 34. Then Σ fᵢLᵢ = Σ fᵢ(xᵢ + 34) = Σ fᵢxᵢ + 34 Σ fᵢ, so that, on

dividing by Σ fᵢ = 200, L̄ = x̄ + 34. We therefore set up the following table:

L . . . .     30   31   32   33   34   35   36   37   38   39
f . . . .      4    8   23   35   62   44   18    4    1    1   (Total 200)
x = L − 34 .  −4   −3   −2   −1    0    1    2    3    4    5
fx . . . .   −16  −24  −46  −35    0   44   36   12    4    5

Σ fx = 101 − 121 = −20, whence x̄ = −20/200 = −0.1 and L̄ = x̄ + 34 = 33.9 cm.

The reader will notice that, since in the histogram of a distribution the frequency fᵢ in the xᵢ class is represented by the area of the corresponding rectangular cell, Nx̄ = Σ fᵢxᵢ is the first moment of the area of the histogram about x = 0. The mean, x̄, of a distribution is, therefore, the abscissa, or x-co-ordinate, of the centroid of the area of the histogram (see Abbott, Teach Yourself Calculus, Chapter XVII).

2.6. Measures of Spread. In addition to statistics of position or of central tendency (mean, mode and median) we require statistics to measure the degree to which the sample values of the variate cluster about their mean or spread from it. Two distributions may have the same mean value, same mode and same median, but differ from each other according as the values of the variate cluster closely around the mean or are spread widely on either side. The RANGE, the difference between the greatest and least values taken by the variate, is, of course, such a statistic; but two distributions having the same mean and range may yet differ radically in their "spread". If we refer back to the method of finding the median graphically from a cumulative frequency polygon, we see that this

method can also be used to find the value of the variate below which any given percentage of the distribution lies. Such values are called PERCENTILES, the p-percentile being that value of the variate which divides the total frequency in the ratio p : 100 − p. Thus the median is the 50th percentile. The 25th percentile, the median and the 75th percentile quarter the distribution; they are, therefore, known as QUARTILES. The difference between the 75th and 25th percentiles is the inter-quartile range. This and the semi-interquartile range are often useful as practical measures of spread, for the smaller the inter-quartile range the more closely the distribution clusters about the median.

More important theoretically as a measure of spread, however, is a statistic called the VARIANCE, s², given by

Ns² = Σ fᵢ(xᵢ − x̄)² . . . (2.6.1)

In words, the variance is the mean squared deviation from the mean of the sample values of the variate; s, commonly called the standard deviation of the distribution, is accordingly the root mean square deviation. If we think in terms of moments of area of the histogram of the distribution, Ns² is the second moment of area about the vertical axis through the centroid of the histogram, and s, therefore, corresponds to the radius of gyration of the histogram about this axis (see Abbott, Teach Yourself Calculus, Chapter XVII).

Expanding the right-hand side of (2.6.1), we have

Ns² = Σ fᵢ(xᵢ² − 2x̄xᵢ + x̄²) = Σ fᵢxᵢ² − 2x̄ Σ fᵢxᵢ + Nx̄².

But Nx̄ = Σ fᵢxᵢ, and so

s² = (Σ fᵢxᵢ²)/N − x̄² . . . (2.6.2)

When a desk calculator is used, this is better formulated as

s² = [N Σ fᵢxᵢ² − (Σ fᵢxᵢ)²]/N² . . . (2.6.3)
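Formulas (2.6.1) and (2.6.2) can be checked against one another on the bar-length data; a sketch:

```python
lengths = [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
freqs = [4, 8, 23, 35, 62, 44, 18, 4, 1, 1]

N = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, lengths)) / N

# (2.6.1): variance as the mean squared deviation from the mean
var_direct = sum(f * (x - mean) ** 2 for f, x in zip(freqs, lengths)) / N

# (2.6.2): mean of the squares minus the square of the mean
var_short = sum(f * x * x for f, x in zip(freqs, lengths)) / N - mean ** 2
```

Both give s² = 2.27 (whence s ≈ 1.51 cm), agreeing with the working-mean calculation carried out in the text.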

Now let m be any value of the variate, and consider the sum of the squared deviations from m. We have

Ns² = Σ fᵢ(xᵢ − x̄)² = Σ fᵢ((xᵢ − m) − (x̄ − m))²
    = Σ fᵢ(xᵢ − m)² − 2(x̄ − m) Σ fᵢ(xᵢ − m) + N(x̄ − m)²
    = Σ fᵢ(xᵢ − m)² − N(x̄ − m)²,

or

Σ fᵢ(xᵢ − m)² = Ns² + N(x̄ − m)² . . . (2.6.2(a))

This equation shows how we may calculate the variance using a working mean, m. It is, in fact, the analogue of the so-called Parallel Axis Theorem for Moments of Inertia (see Abbott, Teach Yourself Calculus, p. 313). It also shows that the sum of the squared deviations from the true mean is always less than the sum of the squared deviations from any other value.

We now calculate the variance of the distribution of bar-lengths, the mean of which we have already found. We extend the table set out in 2.5 as follows:

Distribution of bar-lengths in 200 bars (working mean = 34 cm)

L . . . .     30   31   32   33   34   35   36   37   38   39
f . . . .      4    8   23   35   62   44   18    4    1    1   (N = 200)
x = L − 34 .  −4   −3   −2   −1    0    1    2    3    4    5
x² . . . .    16    9    4    1    0    1    4    9   16   25
fx . . . .   −16  −24  −46  −35    0   44   36   12    4    5   (Σ fx = −20)
fx² . . . .   64   72   92   35    0   44   72   36   16   25   (Σ fx² = 456)

From 2.6.2(a), taking m = 34,

s² = Σ f(L − 34)²/N − (L̄ − 34)² = 456/200 − (−0.1)² = 2.28 − 0.01 = 2.27,

and s = 1.51 cm. We shall see later (2.15) that in the case of grouped distributions of a continuous variate (as here), a small correction to the variance is necessary to compensate for the fact that the frequencies are grouped. The variance of the present distribution, so corrected, is 2.20 correct to 2 d.p.

2.7. Moments of a Frequency Distribution. The quantity Ns² is the second moment of the distribution about its mean. We shall find it useful to extend this idea and define the higher moments of such distributions. For the present we shall assume that we are dealing with finite sample distributions of a discrete variate. The rth moment of a distribution about its mean, or the rth mean-moment, mᵣ, is defined by

Nmᵣ = Σ fᵢ(xᵢ − x̄)ʳ . . . (2.7.1)

where the variate x takes the k values xᵢ (i = 1, 2, . . . k) with frequencies fᵢ, Σ fᵢ = N, the sample size or total frequency, and x̄ is the sample mean. The rth moment about x = 0, mᵣ′, is, likewise, defined to be

Nmᵣ′ = Σ fᵢxᵢʳ . . . (2.7.2)

Expanding the right-hand side of (2.7.1) by the binomial theorem, and combining the last two terms of the expansion, we have

mᵣ = mᵣ′ − rC1 m₁′m′ᵣ₋₁ + rC2 (m₁′)²m′ᵣ₋₂ − . . . + (−1)ʳ⁻¹(r − 1)(m₁′)ʳ . . . (2.7.3)

where rCs is the binomial coefficient r(r − 1)(r − 2) . . . (r − s + 1)/s!, or r!/s!(r − s)!. In particular,

m₁ = 0; m₂ = m₂′ − (m₁′)²; m₃ = m₃′ − 3m₁′m₂′ + 2(m₁′)³;
m₄ = m₄′ − 4m₁′m₃′ + 6(m₁′)²m₂′ − 3(m₁′)⁴ . . . (2.7.4)
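The relations (2.7.4) may be checked numerically: compute the moments about x = 0 and confirm that the stated combinations reproduce the mean-moments computed directly from definition (2.7.1). A sketch on the penny data of 2.1:

```python
freqs = [2, 19, 46, 62, 47, 20, 4]   # frequencies of H = 0..6 heads (table in 2.1)
values = list(range(7))
N = sum(freqs)

def raw_moment(r):
    """m'_r: the r-th moment of the distribution about x = 0."""
    return sum(f * x**r for f, x in zip(freqs, values)) / N

def mean_moment(r):
    """m_r: the r-th moment about the mean, straight from definition (2.7.1)."""
    xbar = raw_moment(1)
    return sum(f * (x - xbar)**r for f, x in zip(freqs, values)) / N

m1p, m2p, m3p = raw_moment(1), raw_moment(2), raw_moment(3)
m2_from_raw = m2p - m1p**2                    # (2.7.4)
m3_from_raw = m3p - 3*m1p*m2p + 2*m1p**3      # (2.7.4)
```

Here m1p is the mean found earlier, 3.045, and the two routes to m₂ and m₃ agree.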

2.8. Relative Frequency Distributions. When we have a large sample of observations, some at least of the class-frequencies will also be great, and it is often convenient to reduce the frequency distribution to a relative-frequency distribution. If the variate x takes the value xᵢ with frequency fᵢ, and the total frequency is Σ fᵢ = N, the relative frequency of the value xᵢ is Fᵢ = fᵢ/N, so that Σ Fᵢ = 1. It follows at once that

x̄ = Σ Fᵢxᵢ and s² = Σ Fᵢ(xᵢ − x̄)² = Σ Fᵢxᵢ² − (Σ Fᵢxᵢ)²,

so that, when we are dealing with relative frequencies, the mean is simply the first moment of the distribution about x = 0 and the variance is simply the second moment about the mean. Correspondingly, the total area of the cells in a relative-frequency histogram is unity.

2.9. Relative Frequencies and Probabilities. Directly we begin to speak of "relative frequencies" we are on the threshold of probability theory, the foundation of statistical analysis. Suppose that, having witnessed the measurement of the length of each of the two hundred metal bars we have been talking about, we are asked to predict the length of the 201st bar. If we take into account the information provided by the first 200 bars, we shall probably argue something like this: we notice that, of the 200 bars already measured, 35 were in the 33-cm class, 62 were in the 34-cm class and 44 were in the 35-cm class; of all the lengths, the three occurring most frequently in the sample are 33, 34 and 35 cm. The next two hundred bars are not likely to reproduce this distribution exactly; but if the 200 bars already measured are anything like a representative sample of the total batch of bars, it is reasonable to assume that the distribution of length of the next 200 will not be radically different from that of the first 200. Clearly, then, if we have to plump for any one length, we shall choose the 34-cm class, the class with the highest relative frequency.

Suppose now that we are asked to estimate the probability that the 201st bar will have a length falling in the 34-cm class. In reply we should probably say that, if a numerical measure of that probability has to be given, the best we can do is to give the fraction 62/200, for this is the relative frequency of the 34-cm

class in the available sample of 200. Moreover, provided each sample of 200 bars is really representative of the population of all the bars available, we have no reason to expect that the relative frequency of this class in the next sample will be greatly different. Assume now that the next sample of 200 has been drawn and that the relative frequency of the 34-cm class in this sample is f₀. The relative frequency of this class in the combined distribution of 400 bars will be the total number of 34-cm bars among the 400, divided by 400. In assessing the probability that the 401st bar will fall in the 34-cm class, we should, presumably, use this latest figure. In so doing we are actually implying that, as sampling is continued, the relative frequency of an occurrence tends to some unique limit, which is "the probability" of the occurrence we are trying to estimate. There can be little doubt that it was somewhat in this way that the concept of "the probability of an event E given conditions C" arose.
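The pooling step just described can be sketched directly; the second sample's count below is a hypothetical figure, since the text gives only the first sample.

```python
# First sample (from 2.1): 62 of 200 bars fell in the 34-cm class
count_1, n_1 = 62, 200
# Hypothetical second sample of 200 bars with 58 bars in the 34-cm class
count_2, n_2 = 58, 200

rf_first = count_1 / n_1                          # relative frequency in sample 1
rf_combined = (count_1 + count_2) / (n_1 + n_2)   # pooled over the 400 bars
```

It is this successively updated relative frequency that, on the view described above, tends to "the probability" as sampling continues.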
But there are difficulties in the way of developing a "relative-frequency" definition of that probability. In the first place, we made the explicit assumption that the sampling was random; yet the generally accepted definition of a random selection from a population is that it is one made from a population all the items of which have an equal probability of being selected, so that our definition of the probability of a particular event E, given a set of conditions C, itself depends on knowing what you mean by the probability of another event E′, given another set of conditions C′! Secondly, have we any grounds for assuming that in two different sampling sequences, made under exactly similar conditions, the relative frequency of the 34-cm class would tend to exactly the same limit?

What is the chance that when you toss a penny it will "show heads"? More precisely, what is the probability of a head in a single toss of a penny? In trying to answer this question, we might argue as follows: there are only two possible outcomes (assuming that the penny does not land standing on its edge!): one is that it will land showing a head, the other that it will show a tail. If the coin is perfectly symmetrical and there is no significant change in the method of tossing, a head or a tail is equally likely; there is then one chance in two that a head will show. The required measure of the probability of a head in a single throw of the penny is then 1 in 2, or 1/2.

But even this "geometrical" line of argument is open to criticism. Once again that haunting "equally likely" has cropped up.¹ How do we know that the coin is "perfectly symmetrical", i.e. unbiased? Surely the only way to test whether it is or not is to make a sequence of tosses to find out whether the relative frequency of a head ultimately tends to equality with the relative frequency of a tail. So we are back again at the relative frequency position! But, apart from the suspicion of circularity, how are we to estimate the probability of a head in a single throw, assuming that the coin is asymmetrical, i.e. is, in fact, biased, without recourse to some relative-frequency experiment? A battered coin can have a very complicated geometry! On the other hand, the "perfectly symmetrical" coin and the "completely unbiased" die do not exist except as conceptual models which we set up for the purposes of exploration and analysis. And the very process of setting up such models entails assigning precise probability-measures: in saying that a coin is "perfectly symmetrical" we are in fact assigning the value 1/2 to the probability of a head, and to that of a tail, in a single throw; while the very term "completely unbiased" six-faced die is only another way of saying that the probabilities of throwing a 1, 2, 3, 4, 5 or 6 in a single throw are each equal to 1/6. Correspondingly, in any mathematical model we set up we assign certain probability-numbers, pᵢ, to the events Eᵢ of the model: we automatically postulate a probability distribution for the Eᵢ's in setting up the model.

The matter is too complicated for full discussion here, but sufficient has been said for us to adopt the following definitions:

Definition 1: If the single occurrence of a set of circumstances, C, can give rise to m mutually exclusive events Eᵢ (i = 1, 2, . . . m), and if a reasonably large number, n, of actual occurrences of C are observed to give rise to f₁ occurrences of E₁, f₂ occurrences of E₂, and so on (where necessarily Σ fᵢ = n), then the probability of Eᵢ, p(Eᵢ|C), is in this situation measured, with a margin of error, by the relative frequency fᵢ/n (0 ≤ fᵢ/n ≤ 1), it being assumed that, as n increases, each empirical frequency tends to stabilise. Correspondingly, to each Eᵢ we assign a probability-number pᵢ such that, for all i, 0 ≤ pᵢ ≤ 1 and Σ pᵢ = 1.

¹ Professor Aitken has said: "Every definition which is not pure abstraction must appeal somewhere to intuition or experience by using some such verbal counter as 'point', 'straight line' or 'equally likely', under stigma of seeming to commit a circle in definition" (Statistical Mathematics, p. 11).
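Definition 1's working assumption, that empirical relative frequencies "tend to stabilise" as n increases, can be illustrated by simulating tosses of an ideally symmetrical coin (a simulation stands in for the physical experiment, of course; the pseudo-random generator is itself only a model of randomness):

```python
import random

def head_frequency(n_tosses, seed=7):
    """Relative frequency of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.randint(0, 1) for _ in range(n_tosses))
    return heads / n_tosses

# Relative frequency of a head at increasing numbers of tosses
rel_freqs = [head_frequency(n) for n in (100, 10_000, 200_000)]
```

At the larger sample sizes the figure settles close to 1/2, which is just the behaviour the definition postulates.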

2.10. Elementary Probability Mathematics.

Definition 2: An event E₂ is said to be dependent on an event E₁ if the occurrence of E₁ affects the probability of the occurrence of E₂; if the occurrence of E₁ does not affect the probability of E₂, the latter is independent of E₁. If E₂ is independent of E₁ and E₁ is independent of E₂, the events E₁ and E₂ are said to be independent. (In two successive throws of a coin, the occurrence of a head in the first throw does not influence the outcome of the second throw, and thus the two events are independent; in contrast, the second of two successive shots at shove-halfpenny is usually dependent on the first.) On the other hand, if the occurrence of E₁ precludes the occurrence of E₂ and the occurrence of E₂ precludes the occurrence of E₁, the two events are said to be mutually exclusive. (In a single throw of a coin, the event "Heads" and the event "Tails" are mutually exclusive.)

Addition Law: If the two events E₁ and E₂ are mutually exclusive and p(E₁|C) = p₁ and p(E₂|C) = p₂, then the probability of the event "either E₁ or E₂", which we write p(E₁ + E₂|C), is given by

p(E₁ + E₂|C) = p₁ + p₂ . . . (2.10.1)

For suppose that in a large number, n, of occurrences of C, the event E₁ is observed to occur f₁ times and E₂ is observed to occur f₂ times; then the event "either E₁ or E₂" will have occurred f₁ + f₂ times, with a relative frequency (f₁ + f₂)/n = f₁/n + f₂/n. The law then follows from the postulated correspondence between probabilities and relative frequencies. 2.10.1 is the law of addition of probabilities for mutually exclusive events.

It follows that the probability of the non-occurrence of any event E, written p(Ē|C), where Ē is often called the event complementary to E, is given by

p(Ē|C) = 1 − p(E|C) . . . (2.10.2)

If p(E|C) is p and p(Ē|C) is q, then, in the case of complementary events, p + q = 1.

More generally, if we have n mutually exclusive events Eᵢ with probabilities pᵢ (i = 1, . . . n), 2.10.1 also gives immediately the probability of "either E₁ or E₂ or E₃ or . . . or Eₙ" as

p(E₁ + E₂ + E₃ + . . . + Eₙ|C) = p₁ + p₂ + p₃ + . . . + pₙ = Σ pᵢ . . . (2.10.3)
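The addition law and the complementary-event rule can be checked by direct enumeration over a small, fully specified model; the sketch below assigns the probability 1/6 to each face of an ideal die (exact fractions avoid any rounding):

```python
from fractions import Fraction

faces = range(1, 7)
p = {face: Fraction(1, 6) for face in faces}   # a "completely unbiased" die

# Mutually exclusive events: "throw a 1" and "throw a 2" cannot both occur,
# so by (2.10.1) the probability of "either a 1 or a 2" is the sum
p_one_or_two = p[1] + p[2]

# Complementary event (2.10.2): P(not throwing a 1) = 1 - P(throwing a 1)
p_not_one = 1 - p[1]
```

The probabilities of the six mutually exclusive, exhaustive faces sum to 1, as Definition 1 requires.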

Multiplication Law: Group some of the events Eᵢ (i = 1 to n) together in a set S₁, group some of the remaining events into a set S₂, and continue in this way until all the Eᵢ's are exhausted, each Eᵢ occurring once only. Each Sᵢ may be considered as a new event, and the events Sᵢ are mutually exclusive and together exhaust the set of original Eᵢ's. Now, using the same method, group the Eᵢ's into a different set of events Tⱼ, also exhausting the Eᵢ's: we now have another set of mutually exclusive events. In general, any Sᵢ and any Tⱼ will have some of the Eᵢ's in common; denote the Eᵢ's common to Sᵢ and Tⱼ by Sᵢ.Tⱼ, and let the probability of the event Sᵢ.Tⱼ be p(Sᵢ.Tⱼ) = pᵢⱼ. Let p(Sᵢ) = pᵢ. and p(Tⱼ) = p.ⱼ. If we consider the events Sᵢ.Tⱼ for which i is fixed but j varies, it is clear that they are mutually exclusive and together make up Sᵢ; hence pᵢ. = Σⱼ pᵢⱼ and, likewise, p.ⱼ = Σᵢ pᵢⱼ, while Σ pᵢⱼ = 1, taken over all values of i and j.

Now every pᵢⱼ/pᵢ. is essentially positive, and Σⱼ(pᵢⱼ/pᵢ.) = (Σⱼ pᵢⱼ)/pᵢ. = 1. Hence it would appear that the quantities pᵢⱼ/pᵢ. are also probability-numbers. To see that this is in fact the case, consider the identity

pᵢⱼ = pᵢ.(pᵢⱼ/pᵢ.) . . . (2.10.4)

The corresponding frequency identity is fᵢⱼ = fᵢ.(fᵢⱼ/fᵢ.); but fᵢⱼ/fᵢ. is the relative frequency with which the event Tⱼ occurs on those occasions on which Sᵢ occurs. Consequently pᵢⱼ/pᵢ. is the conditional probability of Tⱼ given Sᵢ, written p(Tⱼ|Sᵢ), i.e. the probability with which Tⱼ occurs in the occurrences of the event Sᵢ. Consequently

p(Sᵢ.Tⱼ) = p(Sᵢ) . p(Tⱼ|Sᵢ) . . . (2.10.5)

This is the probability multiplication law. Referring to Definition 2, if Tⱼ and Sᵢ are independent, p(Tⱼ|Sᵢ) = p(Tⱼ) = p.ⱼ, and

p(Sᵢ.Tⱼ) = pᵢ. p.ⱼ . . . (2.10.5a)

In general, if k independent events Eᵢ have probabilities p(Eᵢ|Cᵢ) = pᵢ, the probability that they all occur in a context situation C is

p(E₁.E₂ . . . Eₖ|C) = p₁p₂ . . . pₖ = Π p(Eᵢ|Cᵢ) . . . (2.10.5b)

or E1 occurs and Et does not. E2) = pls.pn) + pn =pl + p2Pu Problem : If from n unlike objects.p12) + (pt . in how many ways can these r objects be ordered or arranged ? Imagine r places set out in a row. and there are therefore n — 1 ways of filling the second place. occurs ? Let P(EX) = plt p(Ei) = p2 and p{Er . or Et occurs and E1 does not. there will be (n — r + 1) ways of filling the last place. .) — p(Et . Consequently p(Et. Es). Example : If the events El and E2 are neither independent nor mutually exclusive. n(n — 1)(« — 2) . consequently. The events are. is given by p(Et + E2) = (p. (» — > + 1). . We may write this " P r = » ! / ( * . What is the probability that the balls drawn will be alternately white and black ? The probability of drawing a white ball at the first draw is f . then. the probability that the third ball drawn will be white is. . We may fill the first place in any one of n ways. therefore. One ball is drawn at a time. p(E1.r)\ Problem : In how many ways may r objects be picked from n objects regardless of order ? Let this number be x. Now x times the number of ways in which r objects can be ordered among themselves will be the number of . Now the probability that E1 occurs is the sum of the probabilities that both E1 and E2 occur and that Et occurs and E2 does not. £ ± £ £ • 6 * . since there are 5 white balls in the 9 to be drawn from. 6 = 36. or 3 and 6. the required probability that at least one of the two events occurs. 4 and 5. the probability of then drawing a black ball is f . 5 and 4. there are n — 1 objects left. Therefore the required probability is or Example : A bag contains 5 white balls and 4 black balls. f and so on. The required probability is the probability of one of three mutually exclusive events—either both occur. E2). E2) = />(£. E2) — p2 — p12. The total number of ways in which the r objects may be ordered is. E2) = pl — pl2. 2 * . E2) and p(Et . If we throw any one of these we cannot throw any of the rest. 
of which 4 are black. what is the probability that at least one of El and E. since there are now 8 balls to draw from. being the sum of the probabilities p(El. Likewise.5 2 4 * 31 2 * l — 126 l 0*8*7* 5 * . The required probability is. r objects are selected. then. The number of possible outcomes is 6 . p(E1.FREQUENCIES AND PROBABILITIES 37 Example : What is the probability of throwing exactly 9 with two true dice ? To score 9 we must throw 6 and 3. Having filled the first place. Arguing on these lines. mutually exclusive. Therefore.
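Both worked examples lend themselves to direct enumeration, since every outcome is equally likely. A Python sketch (the enumeration method is ours, not the text's) confirms the two answers, 1/9 for the dice and 1/126 for the alternating draws:

```python
from itertools import product, permutations
from fractions import Fraction

# Two true dice: enumerate all 36 equally likely outcomes, count sums of 9.
throws = list(product(range(1, 7), repeat=2))
p_nine = Fraction(sum(1 for a, b in throws if a + b == 9), len(throws))

# 5 white and 4 black balls drawn one at a time without replacement:
# every distinct colour order is equally likely, so count the alternating ones.
orders = set(permutations("WWWWWBBBB"))        # 126 distinct colour orders
alternating = [o for o in orders
               if all(o[i] != o[i + 1] for i in range(len(o) - 1))]
p_alternate = Fraction(len(alternating), len(orders))
```

Only one of the 126 orders (W B W B W B W B W) alternates throughout, in agreement with the product of conditional probabilities computed in the text.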

. the second in (n — %) !/m2! (» — Mj — wa)! ways and so on. 2. the term on the right of the last denominator is. imagining now the supply to be inexhaustible. we greatly increase the accuracy with which we measure. the product of all these terms : nI (n — %)! (n — — n2 — j)! ! (n — %)! ' n2! (n — n1 — n2) 1" ' ' n^! (n — nt — »a — . continue measuring bar a f t e r bar. Example : In how many different orders can a row of coins be placed using 1 shilling. i=1 apparently. we shall be able progressively to reduce the range of our class-intervals. 1 penny and 6 halfpennies ? If we treat the halfpennies as all different the number of arrangements is 9 ! But it is possible to arrange the halfpennies in 6 ! different ways. then. all of which we now consider to be equivalent. the required number of ways is k n\ —. . 2 L's and 2 U's. Putting n = 1. . . + n2 + n3 + . n l j »t/ 4=1 II n^. The number of ways r objects may be arranged among themselves is clearly r\ (by the previous problem). Problem : In how many ways can n objects be divided into one group of n1 objects. . . j . If. a second group of w2 objects and so on. Continuous Distributions. But there are 2 C's. consequently the number of ways is 8!/2! 2! 21 = 7!. there being k groups in all? We have «. simultaneously. — u%) ! k Since S = n. The total number of ways is. Consequently the required number of different orders is 9! /6! = 504.38 STATISTICS arrangements of r objects selected from n objects. . we find 0! = 1. since the indistinguishable C's may be arranged in 2! ways and so also the L's and U's. + = Now the first group may be chosen in n\/n1\ (n — n^l ways. which must be taken as the definition of the symbol 0! Consequently.11. 0! Has this symbol a meaning? We have (n — 1)! = n[\n.n^l k Example : In how many ways can all the letters of the word CALCULUS be arranged ? If all the letters were unlike there would be 8! ways. 1 sixpence. Let us return to the bars of metal we were measuring and. 
Since length is a continuous variable. Hence x = n\jr\(n — r)\ This number is usually denoted by the symbol nCr or by {^j.
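The counting rules of this section are easily mechanised. A short Python sketch (the function names are ours) implements nPr, nCr and the repeated-letters rule, and reproduces the CALCULUS and coin-row counts:

```python
from math import factorial

def nPr(n, r):
    """Ordered arrangements of r objects from n unlike objects: n!/(n - r)!"""
    return factorial(n) // factorial(n - r)

def nCr(n, r):
    """Unordered selections: nPr(n, r) divided by the r! orderings of each set."""
    return factorial(n) // (factorial(r) * factorial(n - r))

def arrangements(word):
    """Arrangements of letters allowing for repeats: n!/(k1! k2! ...)."""
    total = factorial(len(word))
    for letter in set(word):
        total //= factorial(letter_count := word.count(letter))
    return total
```

For example, arrangements("CALCULUS") gives 8!/(2! 2! 2!) = 5040, and a row of 9 coins of which 6 are indistinguishable halfpennies can be ordered in 9!/6! = 504 ways, as in the text.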

on the relative-frequency method of estimating probability. but this. so too we describe continuous probability distributions. 2. therefore. none of our class intervals. in the limit. A/j is the relative frequency of the variate in the interval ± centred a t x = xi. In the case of a continuous variate. I <f>(x)dx = 1. moments of the distribution about some specified value of the variate. no matter how small their range. y = <f>(x). = A/i/A^j. I t follows t h a t the probability that x lies within an interval a < x ^ b is given by P(a < x < 6) = J cj>(x)dx a and. the frequency of each interval will increase indefinitely. in the limit. Thus. such t h a t the relative frequency with which the variate # lies within an interval # i \dx will be given by ydx = <f>(x)dx. and. in any finite range of the variate.FREQUENCIES AND PROBABILITIES 39 if we go on measuring indefinitely. relative frequency diagram for grouped frequencies of a continuous variate is transformed into the population probability curve of t h a t continuous variate. our simple. it is measured by the ordinate a t x of the probability curve y = <f>(x). is the probability. t h a t can be taken is infinite . If.say. Thus. the mean of the distribution is the first moment about x = 0. And. Moments of a Continuous Probability Distribution. We therefore confine ourselves to speaking of the probability dp(x) t h a t x lies in an interval x ± \dx. defining <j>(x) to be zero at any point outside the range of r + x. theoretically. then. the variate x. clearly. since x must lie somewhere — co within its range. Just as we described a sample-frequency distribution by means of. the height of the relative-frequency histogram cell based on this interval. Yet. will be given by y. it is not impossible t h a t # == Xi. Following the convention that Greek letters . yt. will be vacant. dp(x) that * will lie between x ± \dx. the probability t h a t * = would appear to be zero. for instance. In this way we have dp(x) = <j>(x)dx. 
say. it is meaningless to speak of the probability that this variate. where </>(x) is called the probability density and defines the particular distribution of x . For the number of possible values. we shall have a continuous relative frequency curve. On the contrary. shall take a specified value Xi.12. x.

. (jls = 0. In order to compare the skewness of two distributions.1) and for the rth moment about x = 0. I t m a y happen t h a t a mode. .'r. . and negatively Jshaped if dyjdx is everywhere positive (the " tail " of the distribution being towards the positive or negative side respectively). = (x — \j. A Ki-shaped curve occurs if d*yjdx2 is everywhere positive and dyjdx = 0 a t some interior point of the range (see Fig. 1 If the range of * is finite.i ' W * ) * * = f + *24>(x)dx . if the curve y = <j>(x) is symmetrical about x = f i / . and it. dy/dx2 < 0 (see Abbott. and the curve is accordingly called negatively skew. and the rth moment about the mean.12.2) In particular the second moment about the mean is the population variance. given in the case of a continuous curve by dyjdx = 0. (2. for instance. however. The curve is then often J -shaped. if. .3) or or2 ^ = n. the curve is called positively skew. = f ( x ~ V . does not exist.3(a)) The probability curve y = <j>(x) may be symmetrical about its central ordinate. (2. we write the first moment about x = 0. (2. • + » Mi' = f J— C O x<f>{x)dx* .<(*x')* •>-«. 2. 88).' . the mode lies to the left of the mean. (2. or it may be " skew ".( Bl ')» .12.1')'<j)(x)dx -«.00 and.12). it is necessary to have some measure of skewness which will not depend upon the particular units used. +<*> /• + » x<f>{x)dx.12.40 40 S T A T I S T I C S denote population' parameters while the corresponding Roman letters denote the corresponding sample statistics. p. there will be a long tail on the negative side. Teach Yourself Calculus. (j.12. I t is more in keeping with the use of the moments of a distribution to describe t h a t distribution t h a t we should use +« > / (x — fi\)3(l>(x) dx. we define <f>(x) to be zero for all values of x outside this range. One such measure (Karl Pearson's) is given b y : (Mean — Mode)/(Standard Deviation). / . and we write • P. If it is uni-modal and the mode lies to the right of the mean. 
positively J-shaped if dy/dx is everywhere negative.
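The moments of a continuous distribution defined above can be approximated by direct numerical integration. The following Python sketch is ours, not the text's: it uses the exponential density exp(-x) as a convenient example (mean 1, variance 1, third mean-moment 2, hence positively skew), and the trapezoidal rule with an arbitrary truncation point and step count.

```python
from math import exp

def moment(phi, r, about=0.0, lo=0.0, hi=40.0, steps=20000):
    """r-th moment of the density phi about a given point, trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * (x - about) ** r * phi(x)
    return total * h

phi = lambda x: exp(-x)            # exponential density on (0, infinity)
mean = moment(phi, 1)              # first moment about x = 0  -> 1
var = moment(phi, 2, about=mean)   # second mean-moment (variance) -> 1
mu3 = moment(phi, 3, about=mean)   # third mean-moment -> 2, i.e. positive skew
```

The positive third mean-moment confirms the long tail on the positive side, as the text's discussion of skewness leads one to expect.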

s is positive. the cubes of the positive values of x together FIG. If. M—X) = <f>{X). +0 0 sequently. 2. by making the transformation X = x — jx/. i. in this case. therefore. are greater than those of the negative values and.. if the curve is negatively skew.—Types of Distribution. On the other hand.e.12. and. (i.FREQUENCIES AND PROBABILITIES 41 This is easily shown by transferring to the mean as origin. the cubes of the negative values axe greater than those of the . / X3<f>(X)dX = 0. If the curve is symmetrical about X = 0. but (-Xy<j>(-X) = -X3<j>(X). then |x3 = j X3tf>(X)dX. the curve —Q O is positively skew. con. now.

Xx + P(X%) . This measure. (2. denoted by p1( is the conventional measure of skewness. The corresponding sample moments may.13. X i + . . .13. This definition may be generalised to define T H E E X P E C T A T I O N o r A C O N T I N U O U S F U N C T I O N O F x.22. denoted by . this distribution is used as a standard. the probability that x is less or equal to k is at least theoretically. Definition : EXPECTATION OF A VARIATE: (1) When x is a discrete variate which may take the mutually exclusive values x{(i = 1. . a random variable (or chance variable). is defined to be £(x) = / . and ji 3 is negative. if. x. x i= 1 (2. . In the case of the normal distribution.42 STATISTICS positive values. with respective probabilities fi(Xi).13. . as it is commonly called. be used to measure the skewness and kurtosis of frequency distributions. = 3 . the expectation of x. We can now make that definition a little more precise. is a variable such that. The square of this quantity.(x)dx . . of course.+» C O x<f. n) and no others. X2 . for any given number ft. is given by £(X) = p(Xy) .e. + p ( X i ) . calculable. or in principle. i. In the previous chapter we roughly defined a variate as a variable possessing a frequency distribution. 3.2) where <f>(x) is the probability-density defining the distribution of x. To ensure independence of units employed. 6{x).. is given by |x4/(j. then.13.82. as follows : £(6(*)) = [+ 6(x). (2. Expectation.<l>(x)dx .3) . the quantity j32 — 3 measures what is called excess of kurtosis. . +p(Xn) • Xn or 6(x) = or £ p(Xi) . J C . Definition: A VARIATE or. . a variate is defined by its associated probability distribution. 8(#). Thus Pj = p32/n23We use the fourth mean-moment to measure the degree to which a given distribution is flattened at its centre (kurtosis). 2. . it is necessary to divide by a 3 . 2.1) (2) When x is a continuous variate the expectation of x.

Then. of length 2a. Weatherburn. Let pdx. with constant of 1 failure success. + kqk-! + . be the probability that the point P is taken at a distance x ± \dx from A. The concept of expectation arose from gambling. = qp( 1 + 2q + 3 ? a . Therefore the required expectation is 1 . 1 — p. finally. pp.. some or all of the x's may be negative. if these are the only amounts you have a chance of winning. q*p + . . Show that the expected value of the area of the rectangle AP . your chance of winning £xn is pn . .8a3/3)/2a = 2a a /3 . + k . . £xt on nic occasions. q*p + .43 providing the integral has a finite value (see Abbott.2a area is the £(x(2a -x))= x(2a . . we have rla / pdx = 1 or p = 112a o The area of the rectangle AP . your expectation is FREQUENCIES AND PROBABILITIES This is. q*p + 3 .. PB is x(2a — x). and. E. in fact. PB is 2a 2 /3 (C. N-y<o Lj = l -1 Example : Show that the expectation of the number of failures preceding the first success in an indefinite series of independent trials. whereis qp. the success q = The probabilityprobability ofand then is qjp. i. finally. the probability of winning £xi.. T . A First Course in Mathematical Statistics). .. the mean amount won (and.e. For suppose t h a t in N " goes " you win £x1 on nx occasions. 2 failures „ „ qqp = q2p. where p is constant. Therefore the expected value of . remember. all positions of the point being equally likely. But when N tends to infinity. Suppose that your chance of winning a sum of money £xx is pv t h a t of winning £x2 is p2 and so on until. the limit of the average sum won if you were to go on gambling indefinitely. losses!) is k k Z niXi/N = S (ni/N)x{. dx = (4a3 . k failures . qp + 2 . „ q*p. £x2 on n2 occasions and so on. . Thus 6(x) = limit f~ S niX{/N~\.)= qpj{\ _ q)t = iPlP2 = IIPExample : A point P is taken at random in a line AB. 1=1 t=i ni/N tends to pi. Teach Yourself Calculus. then. since P is somewhere in AB. 227-232).x) .
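Both expectations worked above can be checked by computation. A Python sketch (the truncation point, sample size and seed are our choices, not the text's): the geometric series gives q/p failures before the first success, and simulation of a uniformly placed point P reproduces E(AP . PB) = 2a^2/3.

```python
import random

# Expected failures before the first success: sum of k * q^k * p, truncated
# far enough that the tail is negligible; theory gives q/p = 3 for p = 1/4.
p = 0.25
q = 1 - p
expected_failures = sum(k * q ** k * p for k in range(2000))

# P uniform on a line AB of length 2a: estimate E[x(2a - x)] by simulation
# and compare with the exact value 2a^2/3.
a = 1.0
random.seed(1)
trials = 200000
total = 0.0
for _ in range(trials):
    x = random.uniform(0, 2 * a)
    total += x * (2 * a - x)
area_estimate = total / trials
```

With a = 1 the exact expectation is 2/3, and the simulated average settles close to it.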

2. k). for x). . call it the Probability Generating Function for x in t h a t situation. say. . Whether x be a discrete or continuous variate. (i = 1. Definition. The most important property of generating functions lies in the fact t h a t : When x and y are independent. When a head is thrown let . in this situation. Suppose t h a t A is a T discrete variate taking the values X{. a Jp. Probability Generating Functions. whenever t h e situation supposed occurs. qx. (2. when they are continuous variates. (2.14. . Let t h e probability of throwing a head with t h e | p in a single throw be px and t h a t of a tail. obtaining a function of t. respectively.f. (i — 1. I t follows from our definition of the expectation of a function of x t h a t the expectation of t h e function tx of x is : 8(P) = pxt*> + p2t*. . . G(t). + . Such a function would.14.14. Let t h e corresponding probabilities for t h e other coins be p2. The corresponding expression when x is a continuous variate with probability density <j>(x) is G(t) =${tz) = [ •'-CO H(x)Ax . . . expand it in a series of powers of t and read off the coefficient of Pi.3) is the Probability Generating Function for x (p. q3. so worn t h a t they are unsymmetrical. .14. x takes the value x%. in fact. to find the probability that. . Let Us take three coins. with respective probabilities pi. k).g. discrete variates. generate the probabilities pi with which x takes the values xt. the function. . Let us now assume t h a t it is possible to sum the series on the right-hand side of this equation. defined by G(t) = 8 ( f ) . We.2) which is clearly a function of t. .44 STATISTICS 2. a l p and a 2p. say. . . . often. so to speak. + pit*< + . we have only to bring out G(t).1) The coefficient of Pt on the right-hand side is precisely the probability t h a t x takes the value Xi. G(t). q2 and p3. and. . the product of the generating function for * and t h a t for y is the generating function for the new variate (x + y). 
Now if we can keep this function by us. + pkFk (2. 2. therefore.
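The most important property quoted above — that the product of the generating functions of independent variates is the generating function of their sum — can be seen concretely by representing a p.g.f. as a list of coefficients and multiplying two such lists (a convolution). The representation below is our own sketch:

```python
def pgf_multiply(pa, pb):
    """Multiply two PGFs given as coefficient lists (index = variate value).
    The product is the PGF of the sum of the two independent variates."""
    out = [0.0] * (len(pa) + len(pb) - 1)
    for i, a in enumerate(pa):
        for j, b in enumerate(pb):
            out[i + j] += a * b
    return out

coin = [0.5, 0.5]                        # q + pt for one fair coin (0 or 1 head)
two_coins = pgf_multiply(coin, coin)     # coefficients of (q + pt)^2
three_coins = pgf_multiply(two_coins, coin)
```

Reading off the coefficient of t^x then gives the probability of x heads, exactly as described in the text.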

g. The p. t h a t of t. if our variate x is now t h e points scored. Corrections for Groupings.g. t h a t of i 2 is the probability of 2 heads and 1 tail (/» = tW + W ) . the p. Suppose it is tossed thrice. (?i + PMqi + PMi» + PJ) = ?i?2?3 + {pMs + qip*q3 + ISUPsV + (PiPtf» + pi92pa + <lip2ps)ti + PiPiPa*3We recognise immediately t h a t the coefficient of t3 (x = 3 indicates 3 heads when the coins are thrown together) is the probability of exactly 3 heads. then the p. for x. (it-3 + . t h e number of heads shown in the n throws. I t is reasonable to say. the values we obtain will in general be inaccurate. 7 and 15. If now we select any one of t h e coins.f. the probability of scoring 5 is pl and t h a t of scoring —3 is p2. the probability of 1 head and 2 tails (t = W ) and t h a t of t° (t = i°a) + W ) .-. and the situations Si are all independent. a n d toss it n times. or grouped. . t h a t : If Gt(<) is the p. the Jp.15. that the only possible scores are —9. we distort the true distribution of x. distribution. say. is (qx + pxt)n. q2t" + pj. with respective probabilities f. so t h a t P — i — q. for the score will be = (£)3*~9(1 + i!8)3 = + 3<8 + 31" + t2i) 1 ls = it~' + i ^ + W + ¥ This shows. for t h e Jp is qj" + p-fi-.'s: (?!<• + P^Wtf this may be written + P^Wzt" + p3n.f. the probability of all 3 coins showing tails. is G(t) = Gx(t). for x in the situation S. G2(t).f. and t h a t for the 2p q3t° + p3t\ Consider t h e product of these p. i n a single throw.g.g.If.FREQUENCIES AND PROBABILITIES 45 the variate x take the value 1 and when a tail is thrown t h e value 0. however. Consequently.g. In the last case. The p. if we calculate the moments of the distribution from the distorted. —1. the generating function is ( < ? + P\ti)n. we score in such a way t h a t each time a head is thrown we gain 5 points and each time a tail turns up we lose 3. as the reader should confirm by other methods. G3(t) 2.f. for x in the compound situation S ^ S j . 
t h a t for t h e lp. therefore.1.f.g.f. let the coin be symmetrical. f. . for. When we group all the values of a continuous variate x lying between xt ± into a single class and treat them as all being exactly Xi.
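The scoring example — three tosses of a symmetrical coin, gaining 5 points for a head and losing 3 for a tail — can be confirmed by enumerating the eight equally likely outcomes. A Python sketch:

```python
from fractions import Fraction
from itertools import product

# Score +5 per head, -3 per tail, over three tosses of a fair coin.
eighth = Fraction(1, 2) ** 3
score_dist = {}
for tosses in product([5, -3], repeat=3):
    s = sum(tosses)
    score_dist[s] = score_dist.get(s, Fraction(0)) + eighth
```

The only possible scores come out as -9, -1, 7 and 15, with probabilities 1/8, 3/8, 3/8 and 1/8 — the coefficients of the generating function (1/8)(t^-9 + 3t^-1 + 3t^7 + t^15) given in the text.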

Using the transformation X( = (x{ — x^jjd. xn. and + (» . in a n y particular case. .. + (» . necessarily improve matters. E v e n then t h e y do not counteract the distortion completely. J-shaped) the calculated first moment need not be corrected. say. the frequency diagram of the following distribution: Height (cm) . often simultaneously. X. Draw 150 153 156 159 162 165 168 171 174 2 0 15 29 25 12 10 4 3 (L.1 )«. the terminal frequencies for the range are small (i. nor do they. so that Xl = 0. = 1. Statistical Mathematics. = (i . although they tend to do so on the average. These corrections are known as Sheppard's corrections./„. b u t the calculated variance should be reduced by an amount equal to h*/12. where * . ./ a + 2». . = . . the latter being accumulated in the product register. Sheppard's correction should not be applied (see Aitken. Measurements are made of the heights of 100 children. X..1)/.1). it is often convenient to take as working mean the lowest of the variate values occurring and to rescale using the (constant) interval length as unit. the distribution is not. . ./j + l. = O. These quantities are easily evaluated.ft + l 2 . + i-i S ftX? = 0.X. Let the given variate values be xx. pp 40-41). pp./. = xn — x„-i = d. . S f. Xn = (n . for example. then. . xt. . N O T E TO C H A P T E R Two When a desk calculator is used to compute the moments of a grouped distribution. this a d j u s t m e n t makes a difference of less t h a n in the estimate of the s t a n d a r d deviation. If. If the variate is essentially discrete.1).U. If. X(. . . EXERCISES ON CHAPTER TWO 1.e. 44-47).) Calculate the mean and the standard deviation. and m a y be applied only under certain conditions.4 6 STATISTICS and so corrections m u s t be applied to counteract the distortion due to grouping. in one continuous machine operation./. the former in the counting register. . Statistical Mathematics. = * » — * . for example. however. 
This transformation also enables higher moments to be computed by a simple method of repeated summation (see Aitken. where h is the length of each class-interval. Frequency . h is less t h a n one-third of the calculated s t a n d a r d deviation. . = 2./„ + 2. — * . .
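The coding transformation and Sheppard's correction can be applied together to the data of Exercise 1. The Python sketch below treats the tabulated heights as class marks with interval d = 3 cm (our reading of the table); it appears to reproduce the answers quoted in the Solutions (mean 161.7 cm, standard deviation 4.72 cm once the grouping correction h^2/12 is applied).

```python
from math import sqrt

# Exercise 1: heights (class marks, cm) of 100 children, class interval d = 3.
marks = [150, 153, 156, 159, 162, 165, 168, 171, 174]
freqs = [2, 0, 15, 29, 25, 12, 10, 4, 3]
d = 3
N = sum(freqs)                                   # 100

# Coding: X_i = (x_i - x_1)/d, taking the lowest class mark as working mean.
X = [(m - marks[0]) // d for m in marks]
s1 = sum(f * x for f, x in zip(freqs, X))        # first summation
s2 = sum(f * x * x for f, x in zip(freqs, X))    # second summation

mean = marks[0] + d * s1 / N                     # decoded mean
var_coded = s2 / N - (s1 / N) ** 2               # variance in coded units
sd_raw = d * sqrt(var_coded)                     # uncorrected s.d.
sd_sheppard = d * sqrt(var_coded - 1 / 12)       # Sheppard's correction: -h^2/12
```

The correction reduces the estimate from about 4.80 to about 4.72 cm, an illustration of the adjustment described in the section above.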

) 5. and from it estimate both the median value and the standard deviation.FREQUENCIES AND PROBABILITIES 47 2. 4. Draw a cumulative frequency polygon for the data of Question 1. 3. A and B. The following frequency tables are obtained: TOTAL . Find the mean of each of the distributions: Wife's age (at last birthday) 15-19 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55% Distribution of Wives with Husbands a ged: 25-29 45-49 0-8 27-1 01 57-8 0-7 12-5 3-0 1-5 9-7 0-2 29-9 01 44-3 — 10-3 2-0 — 1000 100-0 (L. This component is obtained in bulk from two sources. In the manufacture of a certain scientific instrument great importance is attached to the life of a particular critical component.U. Price in £ 3-34 303 3-15 3-42 3-93 3-22 3-62 3-67 3-65 3-99 3'58 3-39 3-55 3-53 3-34 3-55 3-42 3-55 3-26 3-60 3-40 2-63 2-96 2-80 3-28 3-25 312 3-00 3-26 3-20 3-44 3-17 3-70 3-21 310 3-11 3-38 2-84 3-21 3-17 310 3-34 3-07 3-92 3-23 3-27 3-55 323 3-24 3-26 3-35 3-45 3-21 3-24 3-74 3-32 Estimate the median and quartile price. Construct a cumulative frequency diagram from: Price of an Electrical Fitting in 56 Shops. and in the course of inspection the lives of 1.000 of the components from each source are determined. and give a measure of dispersion of the distribution.

2. £(y).030-1. From a bag containing 6 red balls. In how many different ways can 3 letters out of 25 different letters be arranged if any letter may be used once. 1. 80.75 Probability of surviving 5 years p0 80 85 90 95 px p2 p% pt (I. sketch the curve and find the mean and variance.120 No.080-1. What is the probability that 2 particular chairs are not occupied? (L.48 STATISTICS Source A. twice or three times ? If two and not more than two different letters are used ? (L. Life (hours). (I. 1.) 8. (iv) none will die between ages 90 and 95.060-1.S.060-1. 161-7. if x and y are independent. (L.050 1.060 1. 6 white balls and 6 blue balls. 85 and 90 respectively (i) all will attain age 95. and that. of components. (R. IJ x <f>(x) = e~ . Median: ^3-28.100-1. Show that if is also independent of the property Es. derive expressions for the probabilities that. 159-6 cm. Solutions 1.040-1.040 1. 339 136 25 20 130 350 Examine the effectiveness of the measures of dispersion with which you are familiar for comparing the dispersions of the two distributions.) 6. A property is known to be independent of a property £ ? .U. 4-72 cm. 40 96 364 372 85 43 Source B.040 1. . Life (hours).) 11. 5 men are sitting each on a chair. Median value. has a range from 0 to co.020-1.A. of components.090 No. £(x + y) = £(x) + £(y).000-1. of a property (Ei + E3) and of a property (E2ES). 12. £(xy) = £(x).U. A probability curve. 3. Calculate the probability that the number of white balls drawn will exceed the number of red balls by at least two. Find also the third moment about the mean.U. y = <f>{x). (ii) all will die before attaining age 95.080 1.100 1.080 1. Exact age . of four men aged exactly 75.) 9. In a room containing 7 chairs. semi-interquartile range £0-16.050-1. (iii) at least one will survive 10 years.080-1.) 10.040-1. Using the symbols given in the table below.A.060 1. 12 balls are simultaneously drawn at random.S. Prove that if x and y are discrete variates.070 1.020 1.070-1.) 7.

1/21.yy Clearly x + y may take nm values. + i the sum of the probabilities that * takes the value x{ when y takes any one of the possible values ys. .FREQUENCIES AND PROBABILITIES 49 4. We have then u ii < j But Svn = it. 5. the probability that xy takes the value x<y1 is ptPj. . 10. 2 . . E2) = p(E1. say.b+ f ) . 25 x 24 x 3. ftj = 2. . Let x take the values x{ with probabilities pt. 12.P o P S P z ^ i .E2) * = ^a^s). 45-14. 6. 1. (i = 1. (iii) p0p. Mean. 1 8. Let El occur « . (ii) [1 -PoPiPiPAW -PiPMl 1 -/>J. . m). »j tj Summing first over j. Interquartile Range: (A) 27.y. . Full solution-.(i) PaP\PiP% . » ) and let y take the values yt with probabilities P)• (. E S E X ^31 times. 2 . Also let ir. 23 12 ~t~ ~f~ w123 n n2 + w12 + »23 + »j123 ». w12 times.£(y)] = £(y) S ptx. (B) 22. + S Z t t ^ . If now x and y are independent. = £{y) . + X P. (A) 120. (B) 46. 7. £(xy) = 2[ptX. Standard deviation : (A) 21. £ 2 or Es n0 times. . and none of -Et. £(x) = £{x) . and this is pt. £(y).) = S S ^ a r . i i The reader should now prove these two theorems for the case when both x and y are continuous variates.x. Therefore £(xy) = VZpiPixlys = SS (p. = £(x) + £(y). E 2 3 w23 times. Range-. E1E2E3 M123 times. (B) 60. Q _2_ 2 8 .. Accordingly 11. 26-89. E2 -(. A = [c — a + e)j(d . Then the conditions of independence are: p(E1) = p{E1. Likewise = P. w _ «12 + ». [l . E2 n 2 times. 1. . + m12 + nl3 + n23 + «123 _ %2S W 23 + W123 Now if ajb = cjd = elf = X. + «. i.PsY\. (I>Y ).Hence £(x + y) = Zp. Variance. times. + ^ + p2p% + pzpt-.e.^) .j be the probability that x + y takes the values x{ -j. . + .7 = 1 . 1 + wi. times out of n. Full solution-. ' j £(x + y) = (x( + y. w. 253 .

6. E) denote an event the outcome of which is either the occurrence of a certain event E or its non-occurrence. given C. and ysWe may arrive at these figures by a different route. and 0 heads will be 1. Let C(E. 4. Hence.CHAPTER THREE STATISTICAL MODELS I : T H E BINOMIAL DISTRIBUTION 3. We have p = q — Since each toss is independent of every 50 . we may rewrite the outcome of 2 tosses as 1 H H or 2HT or 1 T T and t h a t of 3 tosses as 1 H H H or 3 H H T or 3 H T T or 1 TTT Writing H H H as H 3 . Tossing a Penny. and that of E is q. Now suppose t h a t the probability of E happening. 1 and the corresponding relative frequencies. we have outcome of 2 tosses : either 1H2 or 2 H 1 T 1 or IT 2 „ 3 „ either 1H 3 or 3H 2 T» or 3H!T 2 or IT 3 By analogy. and assume t h a t the constant probability of obtaining a head in each throw is equal to that of obtaining a tail. Consequently the outcome of two tosses will be either H H or H T or T H or TT The outcome of 3 tosses may be written down as either H H H or H H T or H T H or T H H or H T T or T H T or T T H or TTT If now we disregard the order in which H and T occur. and is independent of the outcome of the first toss. 2 heads. We toss again. and the respective frequencies of 4 heads.. p = q = The outcome of a single toss is either a head (H) or a tail (T). and the outcome of this second toss is again either H or T. denoted by E. -fg. 4. is p. ^ . in 4 tosses we shall have : either 1H 4 or 4H 3 T! or 6H 2 T 2 or 4 H l T 3 or IT* In 4 tosses then there are 1 + 4 + 6 + 4 + 1 = 16 = 2* possible outcomes. We ask what is the probability t h a t in n occurrences of C there will be exactly * occurrences of E ? Once again we toss a penny. 3 heads.1. H H T as H 2 T \ etc.

+ p"t* . T 4 are the successive terms in the expansion of (H + T)(H + T)(H + T)(H + T). we have 1 = Pn(0) + Pn( 1) + Pn(2) + . . + />„(») . 4H 3 T 1 .2. Looking once more at the possible outcomes of 4 tosses. of (H + T) 4 Now... denoting the probability of exactly x E's in n occurrences of the context-event C(E. . + pn{x) + . 4H 1 T 3 . where p is the constant probability of the event E and q is t h a t of E. for instance) is (-J-) (-J-) (1 — — = TJ.e. is 6 x = f. ILL 51 other toss. (q + pt)n = pn(0) + pn(l)t + pn(2)t* + . the proba b i l i t y ^ ! E occurring exactly x times in n occurrences of C(E. And we notice t h a t H 4 .3) If now we put 2 = 1 .STATISTICAL MODELS. Generating Function of Binomial Probabilities. Suppose now we require to know the probability of exactly 47 heads and 53 tails in 100 tosses. in the general case. And since the number of different order-arrangements of # £ ' s and (n — x) E's is n\/x\(n — x) ! or (^"j. Hence..2. E) by pn(x). the relative frequency we obtained before. + pn{x)tx + . H 4 . for instance. 6H 2 T 2 .2. Is there some comparatively simple way of obtaining this probability without. and p + q = 1. setting out all the possible arrangements of 47 heads and 53 tails in 100 tosses ? 3. . (. . by the Binomial Theorem. the probability of exactly x E's is (fjpzqn-z n (3. (3. . the probability of throwing 2 heads and 2 tails.2. i. . say. . the probability of four heads. irrespective of order. .Next. E) in some specified order is pxqn~x.1) Now the expansion of (q + pt) is. therefore the probability of obtaining H 2 T 2 . + pn(n)tv.q + pt)n = qn + ( j ) r 1 ^ + (g)?"-8^2 + • •• (3. . in some specified order (HHTT. but we may arrange 2H and 2T in a group of 4 in 6 different ways (HHTT H T T H T T H H T H H T T H T H HTHT) .2) + {^jq ~ p t n x x x + . we recall that there are sixteen possible results—either H 4 or 4 different arrangements of H 3 T J or 6 different arrangements of H 2 T 2 or 4 different arrangements of H ^ 3 or T 4 . 
in four tosses is (1/2)(1/2)(1/2)(1/2) = (1/2)^4 = 1/16.

and this is only to be expected, for the right-hand side is the probability of either 0 or 1 or 2 or . . . or n E's in n occurrences of C, which is clearly 1. Thus (q + pt)^n is the Probability Generating Function for this particular distribution, which, because of its method of generation, is called the BINOMIAL DISTRIBUTION.

3.3. Some Properties of the Binomial Distribution. We may say, then, that: If the probability, p, of an event E is constant for all occurrences of its context event C, the outcome of which is either E or its non-occurrence, then the probability of exactly x E's in n occurrences of C is given by the coefficient of t^x in the expansion of (q + pt)^n, i.e., by

p_n(x) = (n!/x!(n - x)!) p^x q^(n-x) . . . (3.3.1)

3.4. Binomial Recursion Formula. Replacing x by x + 1 in (3.3.1),

p_n(x + 1) = (n!/(x + 1)!(n - x - 1)!) p^(x+1) q^(n-x-1).

Hence

p_n(x + 1) = ((n - x)/(x + 1)) . (p/q) . p_n(x) . . . (3.4.1)

which for a given q and n is quickly calculated. Now p_n(0) = q^n. Let us calculate the binomial probabilities for p = 1/10, n = 5. Then

p_5(0) = (9/10)^5 = 0.59049
p_5(1) = (5/1)(1/9)(0.59049) = 0.32805
p_5(2) = (4/2)(1/9)(0.32805) = 0.07290
p_5(3) = (3/3)(1/9)(0.07290) = 0.00810
p_5(4) = (2/4)(1/9)(0.00810) = 0.00045
p_5(5) = (1/5)(1/9)(0.00045) = 0.00001
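The recursion can be checked in exact arithmetic. A Python sketch (our implementation) that reproduces the p = 1/10, n = 5 table:

```python
from fractions import Fraction

def binomial_probs(n, p):
    """All p_n(x), x = 0..n, built up from p_n(0) = q^n by the recursion
    p_n(x + 1) = ((n - x)/(x + 1)) * (p/q) * p_n(x)."""
    q = 1 - p
    probs = [q ** n]
    for x in range(n):
        probs.append(probs[-1] * Fraction(n - x, x + 1) * (p / q))
    return probs

probs = binomial_probs(5, Fraction(1, 10))
# 0.59049, 0.32805, 0.07290, 0.00810, 0.00045, 0.00001
```

Because exact fractions are used, the probabilities sum to exactly 1, as the generating-function argument requires.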


4 (C). t h a t rpn(x) increases with * v n + 1 ' until x > p(n + 1) — 1. I t will be seen (Fig. although for a given value of p the skewness decreases as n increases.> rp. this means t h a t plt(x) is a m a x i m u m when x = 4.1 as —.e. i.4 (c)) t h a t .4 shows histograms of the Binomial distribution for different values of p and n. 3. The distribution is unimodal. Moment Generating Functions. B u t since x can only take integral values. Otherwise the distribution is skew. whatever the value of n.54 STATISTICS of course (why?). <0 irt © O 00 S? o CM to U) e< i W O o 4 * to 5S CO C0 W O O o 3 0 1 2 3 5 6 FIG. p u t t i n g q = 1 — p. whenever p = q = the histogram is symmetrical. Taking p = n = 16. From (3. so long x + 1 q * 4.1) we see t h a t pn{x + 1) will be greater t h a n pn(x) so long as > 1. We see. n = 6). unless pn is small. ple{x) increases until * > 17/4 — 1 = 3-25. 3. 3. Can we find functions which generate the moments of a distribution in a manner similar t o t h a t in which the probability generating function generates the probabilities of a variate in a certain set of .—.5. 3.. Fig. then.—Binomial Distribution (p — ^ = q.3.

• • • • (3.1). is consequently the coefficient of t'jr\ in the expansion of M(t). (3. 3. (j..f.5. . and putting t = 0. of t h a t distribution. + xfF/r! i=l ^P(xi) + [ Z p(xi)x^ t + [ S p(xi)xi^ + . in general. We see then that. . . . + [J = 1 +|V</1! + (V72! + • • .. (3. pron viding the sum S p{x()e"i' exists—which it will always do when n is finite—the Moment-generating Function.14. . .1) The rth moment about x — 0.2) For a continuous variate with probability-density <j>(x).)) t*/2l S (p(Xi)( 1 + Xit + XiH*/2\ + . G{t). of a distribution by e' and call the function of t so formed M(t).2. (i = 1. and. n) +. Assuming now t h a t we may differentiate both sides of (3.5. by e'.)e*<«.STATISTICAL MODELS.. + (Vr/r! + £(*<)*<•] P/H + . 2./. M(t) = £(e") = •'+00 Compare with 2.5. ^ = ^ = M <"(0) . . and M(t) is the Moment-generating Function required.g. . . . M(t). I 55 circumstances? Let us replace the t in the p. e*4>(x)dx . .5. Then M{t) = G(e>) = = = = 2 p(x. G(t). for a given distribution is obtained by replacing t in the probabilitygenerating function.2a) .

Hence M(t) = (q + pe')n.5. Consequently._„ = [npe'(q + ^ < ) n " 1 ] 1 = 0 ' which.6) . X. since (q + p) = 1. then.5.5) or. gives Hi' = np Likewise. is given by ( 2 = ( j — (n/)2 = np + n(n — 1 )pi — n2p2 = X X' np( 1 — p) = npq . (3. and Mm(t) = «-»«Af(<) . however. the variance of the distribution. (V = M"(0) = = (q + pe')n\ 4-^e')"" = o 1 (3. and transfer the origin to x = m.4) The mean of the Binomial distribution is np and the variance is npq. the second moment about the mean.5.5. Generally. is given by M = M ' ( 0) = [ | { ( ? + /*•)">]./ = M'(0) = m. generating function. is given by Hr = [ j j ^ w W ] = M m^rm- • (3. Exercise : Show that « = np[(n — 1)(» — 2)p 2 + 3(n — 1 )p + 1]. If now we assume t h a t [/.3) = [»M? 2 2! + = 0 1)£ « (? + ^ ' ^ " L ^ o = np + n(n — l)p1 = [Aj' -f n(n — 1 )p2 Therefore. (3. while the rth mean-moment of the distribution. measuring the variate from this new origin and calling the new variate so formed. g(e") = S(es' + ">) = eMS(ex'). we have x = X + m. I t follows t h a t the generating function for moments about any line x = a is obtained by multiplying M(t) by e -" . the mean Hi'. G(t) = (? + pt)n. we are more interested in the meanmoments (moments about the mean of a distribution) t h a n in those about x = 0. (jt2. if Mm(t) denotes the mean-moment M(t) = e""Mm{t).56 STATISTICS In the particular case of the Binomial Distribution. Consequently. say.

6. We have n = 50. that the mean is np.STATISTICAL M O D E L S . No. ILL 57 Exercise : Show by direct differentiation that for the binomial distribution : (i) MI = Mm'(0) = {«—•(? + M " } ] ( _ o = 0 (it) M = M„"(0) = a {"-""(? + ^ ' ) n > ] 1 .50 Using the method of 3. Thus _ 0. If. the reader should verify that the estimated frequencies are those given in the following table : No.19 + 4. So far we h a v e considered Binomial distributions for which the probability of E in n occurrences of C has been known in advance.1 np = 50p = loo which gives p — 0-04 and the mean = 2. of 12 27 29 19 8 0 1 2 3 4 5 4 6 1 7 0 Estimated No.12 + 1. of headless matches per box.4. we find the mean of the observed distribution. of boxes Estimated No.4 + 6.27 + 2. then. of headless matches per box Observed boxes No. 0 = n Pl 3. With this value of p. I n practice this rarely happens.29 + 3. of boxes 0 12 1 27 2 29 3 19 4 8 5 4 6 1 7 0 Total 100 Let us assume that the distribution of headless matches per box over 100 boxes is binomial. Consider the following example : Worked Example : The distribution of headless matches per box of 50 in a total of 100 boxes is given in the following table : No. of boxes (nearest integer) 12-97 26-99 27-50 18-30 8-95 3-43 107 0-28 13 27 28 18 9 3 1 0 . the binomial distribution of frequencies of headless matches per box over 100 boxes is given by 100(0-96 + 0-04*). We remember.8 + 5. Fitting a Binomial Distribution. we can estimate p. but we have no a priori value for p. however.

The " fit " is seen to be quite good. If now we compare the variance of the observed distribution with that of the theoretical, estimated, distribution, we find, the mean being 2, that:

  (a) variance of observed distribution
      = [12(−2)² + 27(−1)² + 19(1)² + 8(2)² + 4(3)² + 1(4)²]/100 = 1·78

  (b) variance of theoretical, estimated distribution = npq = 50 × 0·04 × 0·96 = 1·92

3.7. " So Many or More ". Usually we are more interested in the probability of at least so many occurrences of an event, E, in n occasions of C(E, Ē), than in the probability of exactly so many occurrences of E. Thus our match-manufacturer would probably not worry unduly about how many boxes in a batch of 100 contained exactly 4 headless matchsticks, but he might well be concerned to estimate the probable number of boxes containing 4 or more headless sticks. The probability of four or more headless matchsticks per box of 50 is denoted by P₅₀(x ≥ 4).

Let P_n(x ≥ k) denote the probability of k or more occurrences of E in n C(E, Ē), and let P_n(x < k) denote the probability of less than k such occurrences. Since Σ p_n(x) = 1, summed over all x,

  P_n(x ≥ k) = 1 − P_n(x < k) . . . (3.7.1)

Now P₅₀(x ≥ 4) = p₅₀(4) + p₅₀(5) + . . ., while

  P₅₀(x ≥ 4) = 1 − Σ p₅₀(x) (summed from x = 0 to 3) = 1 − P₅₀(x < 4)

When k is small and n not too large, this formula is useful, for the calculation of P_n(x < k) is then not too tedious, and the successive values p_n(0), p_n(1), . . . p_n(k − 1) may be evaluated directly using the formula (3.3.1). When n is large and k is large, however, the evaluation of a single binomial probability is tiresome enough, let alone that of the sum of several such probabilities. We may overcome this difficulty by using the facts that: when n is large and p is small, the binomial distribution approximates to the Poisson distribution (see next chapter); when n is large but p is not small, it approximates to the normal distribution (see Chapter Five).
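The complement formula (3.7.1) is easy to apply by machine; the following sketch (not part of the original text) computes the match-manufacturer's probability:

```python
from math import comb

# Illustrative computation of the "so many or more" probability via the
# complement: P_50(x >= 4) = 1 - [p(0) + p(1) + p(2) + p(3)].
def binom_pmf(n, p, x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def at_least(n, p, k):
    return 1.0 - sum(binom_pmf(n, p, x) for x in range(k))

p_four_or_more = at_least(50, 0.04, 4)   # about 0.14
```

So roughly 14 boxes in every hundred would be expected to contain four or more headless matches.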

In such situations we use the properties of these distributions to approximate P_n(x ≥ k). Alternatively, we may use what is known as the Incomplete Beta Function Ratio, usually denoted by I_p(k, n − k + 1), which gives the value of P_n(x ≥ k), for

  P_n(x ≥ k) = I_p(k, n − k + 1) . . . (3.7.2)

(See Note at end of this chapter.) Tables of the Incomplete B-Function Ratio, edited by Karl Pearson, are published by the Biometrika Office, University College, London.

MATHEMATICAL NOTE TO CHAPTER THREE

A. The Gamma Function. If n is positive, the infinite integral ∫₀^∞ x^(n−1) exp (−x)dx has a finite value. It is clearly a function of n and is called the Gamma Function; we write

  Γ(n) = ∫₀^∞ x^(n−1) exp (−x)dx . . . (3.A.1)

We have immediately

  Γ(1) = ∫₀^∞ exp (−x)dx = 1 . . . (3.A.2)

Integrating (3.A.1) by parts, we have

  Γ(n) = [−x^(n−1) exp (−x)]₀^∞ + (n − 1)∫₀^∞ x^(n−2) exp (−x)dx = (n − 1)Γ(n − 1), if n > 1 . . . (3.A.3)

If in (3.A.1) we put x = X², then, writing dx = 2X dX, we have

  Γ(n) = 2∫₀^∞ X^(2n−1) exp (−X²)dX . . . (3.A.4)

and so we have an alternative definition of Γ(n). (See P. Abbott, Teach Yourself Calculus, pp. 188 et seq. and pp. 227 et seq.)

Applying this formula to the case where n is a positive integer, we have

  Γ(n) = (n − 1)(n − 2) . . . 2 . 1 . Γ(1) = (n − 1)! . . . (3.A.5)

B. The Beta Function. Next consider the integral ∫₀¹ x^(m−1)(1 − x)^(n−1)dx. If m and n are positive, this integral is finite and is a function of m and n. We call this function the Beta Function and write

  B(m, n) = ∫₀¹ x^(m−1)(1 − x)^(n−1)dx . . . (3.B.1)

Clearly, B(1, 1) = 1. Now put z = 1 − x; then dz = −dx and

  B(m, n) = ∫₀¹ z^(n−1)(1 − z)^(m−1)dz = B(n, m) . . . (3.B.2)

Thus we see that the Beta Function is symmetrical in m and n. Now make another substitution, x = sin²φ, so that dx = 2 sin φ cos φ dφ. When x = 0, φ = 0, and when x = 1, φ = π/2. Thus

  B(m, n) = 2∫₀^(π/2) sin^(2m−1)φ cos^(2n−1)φ dφ . . . (3.B.3)

a useful alternative form. It follows at once that

  B(½, ½) = 2∫₀^(π/2) dφ = π . . . (3.B.4)

C. Relation between Gamma and Beta Functions. It can be shown that B(m, n) and Γ(m) and Γ(n) are related by the formula

  B(m, n) = Γ(m)Γ(n)/Γ(m + n) . . . (3.C.1)

which immediately displays the symmetry of the B-function in m and n. It follows that, since B(½, ½) = π and Γ(1) = 1,

  (Γ(½))² = π, or Γ(½) = √π . . . (3.C.2)

(Abbott, op. cit., p. 225.)
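For integer arguments, the relation (3.C.1) reduces to B(m, n) = (m − 1)!(n − 1)!/(m + n − 1)!, which can be checked numerically. The sketch below (an illustration, not the book's method) approximates the Beta integral by the trapezoidal rule:

```python
from math import factorial

# Numerical sketch of B(m, n) = Γ(m)Γ(n)/Γ(m+n) for integer arguments,
# where Γ(k) = (k-1)!. The Beta integral is approximated by the
# trapezoidal rule on [0, 1].
def beta_numeric(m, n, steps=100000):
    f = lambda x: x**(m - 1) * (1.0 - x)**(n - 1)
    h = 1.0 / steps
    return h * (0.5 * (f(0.0) + f(1.0)) + sum(f(i * h) for i in range(1, steps)))

def beta_exact(m, n):
    return factorial(m - 1) * factorial(n - 1) / factorial(m + n - 1)

approx, exact = beta_numeric(3, 5), beta_exact(3, 5)
```

For B(3, 5) the two values agree to many decimal places (2!·4!/7! = 1/105).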

But, using (3.A.4), we have

  Γ(½) = 2∫₀^∞ exp (−x²)dx

Consequently

  ∫₀^∞ exp (−x²)dx = √π/2 . . . (3.C.3)

a result we shall need later.

D. The Incomplete B-Function Ratio. We have seen that if the function x^(m−1)(1 − x)^(n−1) is integrated over the range (0, 1) the result is a function of m and n. Suppose now that we integrate over only a portion of the range, from 0 to t, say. Then

  B_t(m, n) = ∫₀^t x^(m−1)(1 − x)^(n−1)dx . . . (3.D.1)

is a function of t, m and n and is called the Incomplete B-Function. Dividing this incomplete B-function by the complete B-function gives us another function of t, m and n, called the Incomplete B-Function Ratio, to which we referred in 3.7. We denote it by I_t(m, n) and write

  I_t(m, n) = B_t(m, n)/B(m, n) . . . (3.D.2)

If, moreover, m and n are integers, and we put t = p, m = k, and n = n − k + 1, we have, using (3.C.1),

  I_p(k, n − k + 1) = [n!/((k − 1)!(n − k)!)] ∫₀^p x^(k−1)(1 − x)^(n−k)dx

Integrating by parts and putting q = 1 − p,

we find, term by term,

  I_p(k, n − k + 1) = C(n, k)p^k q^(n−k) + C(n, k + 1)p^(k+1)q^(n−k−1) + . . . + p^n
                    = p_n(k) + p_n(k + 1) + . . . + p_n(n) = P_n(x ≥ k)

The Tables of the Incomplete B-Function Ratio referred to in 3.7 give the values of I_t(m, n) for 0 < n ≤ 50 and n ≤ m ≤ 50, the values of t being given in steps of 0·01. Thus if n > m we cannot use the tables directly to evaluate I_t(m, n). In this case we can make use of the simple relation

  I_t(m, n) = 1 − I_(1−t)(n, m) . . . (3.D.3)

which is easily proved as follows. Writing x = 1 − X,

  I_t(m, n) = [(m + n − 1)!/((m − 1)!(n − 1)!)] ∫₀^t x^(m−1)(1 − x)^(n−1)dx
            = [(m + n − 1)!/((m − 1)!(n − 1)!)] [∫₀¹ X^(n−1)(1 − X)^(m−1)dX − ∫₀^(1−t) X^(n−1)(1 − X)^(m−1)dX]
            = 1 − I_(1−t)(n, m)

EXERCISES ON CHAPTER III

1. Calculate, correct to four decimal places, the binomial probabilities for p = ¼, n = 8. Calculate the mean and variance.

2. If on the average rain falls on twelve days in every thirty, find the probability (i) that the first four days of a given week will be fine, and the remainder wet; (ii) that rain will fall on just three days of a given week.

3. In a book of values of a certain function there is one error on the average in m entries. Prove that when r values are turned up at random (with the possibility that any value may be selected more than once), the chance of all being accurate is (m − 1)/r times as great as that of having only one error included in the selection. Find r/m in terms of m in order that there may be a nine to one chance that the selection is free from any errors. (L.U.)

In this case prove that as a very large set of tabulated values approaches perfection in accuracy, r increases to a limiting value of nearly 10·5% of the size of m. (L.U.)

4. Calculate the value of p (0 < p < 1) if the ratio of the probability of an event happening exactly r times in n trials to the probability of the event happening exactly n − r times in n trials is independent of n. (L.U.)

5. Show that a measure of the skewness of the binomial distribution is given by (q − p)/(npq)^½ and its kurtosis is 3 + (1 − 6pq)/npq.

6. Table 7.1, page 136, gives 500 random digits grouped in 100 groups of 5 digits. Let the digits 0, 1, 2, 3 be each taken as indicating a success, S, and the digits 4, 5, 6, 7, 8, 9 as indicating failure. Thus, working along the rows of the table, the first five random digits, 2 8 9 8 6, give one S in that group. Count the number of S in each 5-digit group and form a frequency table giving the number of groups with 0, 1, 2, . . . S's. The theoretical distribution will be given by 100(6/10 + 4t/10)⁵. Calculate the theoretical frequencies and compare these with those actually obtained. Repeat, taking 0, 1, 2 as indicating an S and 3, 4, 5, 6, 7, 8, 9 a failure. Repeat using the columns of Table 7.1.

Solutions

1. 0·1001, 0·2670, 0·3115, 0·2076, 0·0865, 0·0231, 0·0038, 0·0004, 0·0000; m = 2, σ² = 1·5.
2. (i) 0·00829; (ii) 0·2903.
3. r/m = (1/m) log 0·9/log [(m − 1)/m].
4. p = ½.
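As an aside (an illustrative sketch, not part of the text), the identity P_n(x ≥ k) = I_p(k, n − k + 1) of (3.7.2), which underlies the Pearson tables, can be checked numerically by integrating the incomplete Beta function with the trapezoidal rule:

```python
from math import comb, factorial

# Numerical check of P_n(x >= k) = I_p(k, n-k+1), with the incomplete
# Beta integral approximated by the trapezoidal rule (not the book's tables).
def binomial_tail(n, p, k):
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

def inc_beta_ratio(k, n, p, steps=100000):
    m = n - k + 1                       # second Beta parameter
    f = lambda x: x**(k - 1) * (1.0 - x)**(m - 1)
    h = p / steps
    integral = h * (0.5 * (f(0.0) + f(p)) + sum(f(i * h) for i in range(1, steps)))
    complete_beta = factorial(k - 1) * factorial(m - 1) / factorial(k + m - 1)
    return integral / complete_beta

lhs = inc_beta_ratio(4, 50, 0.04)
rhs = binomial_tail(50, 0.04, 4)
```

Both routes give the match-manufacturer's P₅₀(x ≥ 4) of roughly 0·14.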

CHAPTER FOUR

STATISTICAL MODELS, II: THE POISSON DISTRIBUTION: STATISTICAL RARITY

4.1. On Printer's Errors. I am correcting the page-proofs of a book. After having corrected some 50 pages, I find that, on the average, there are 2 errors per 5 pages. How do I set about estimating the percentage of pages in the whole book with 0, 1, 2, 3 . . . errors?

To use the Binomial distribution, which we might otherwise have used, we need to know not merely the number of times an event E, whose probability we wish to estimate, has occurred, but also n, the total number of occasions upon which the event both did and did not occur. But in our present problem, although we know the number of times an error has occurred, it is clearly ridiculous to ask how many times an error could have been made on one page but was not.

Here is a similar problem: A small mass of a radioactive substance is so placed that each emission of a particle causes a flash on a specially prepared screen. The number of flashes in a given time-interval is recorded, and the mean number of flashes over a specified number of such intervals is found. On the assumption that the disintegration of any particular atom is purely fortuitous, what is the chance of observing some specified number of flashes in one time-interval?

Both these problems arise from situations in which the number of occasions upon which an event, E, could or could not have occurred in a fixed interval of time or space is, to all intents and purposes, infinite, although E in fact is found to have occurred only a finite number of times, N_m, in the N equal intervals sampled, N being finite.

4.2. The Poisson Model. We can use the Binomial distribution as a model only when we can assign values to p and to n. We could have used, as an estimate of p, the ratio N_m/Nn_i, in the event of n_i being known and finite. But in such cases as we are now discussing this is not possible: n = Nn_i is indefinitely large, and the ratio N_m/Nn_i is now meaningless. We are therefore faced with the

task of so modifying our old model that we can circumvent difficulties of this kind.

Our clue is this: if the number, N_m, of occurrences of E is finite in N fixed intervals (of time, length, area, volume) and the n of the Binomial expansion (= Nn_i here) is very, very large, then p (= N_m/Nn_i) must be very small, i.e. the event is relatively rare. We ask, then, what happens to the Binomial distribution under the conditions that (1) n tends to infinity, but (2) np remains finite?

The probability-generating function of the Binomial distribution is (q + pt)^n. Put np = m; then p = m/n and q = 1 − m/n. The p.g.f., then, becomes (1 + m(t − 1)/n)^n, and this (see Abbott, Teach Yourself Calculus, p. 127) tends to the limit

  e^(m(t−1))

as n tends to infinity. This limit-distribution is called the Poisson Distribution. We have

  e^(m(t−1)) = e^(−m) . e^(mt) = e^(−m)(1 + mt/1! + m²t²/2! + . . . + m^r t^r/r! + . . .)

The probability of exactly x occurrences of a statistically rare event in an interval of a stated length will then be the coefficient of t^x in this series; thus

  p(x, m) = e^(−m) m^x/x! . . . (4.2.1)

We note at once: (i) that it is theoretically possible for any number of events to occur in an interval; and (ii) that the probability of either 0 or 1 or 2 or . . . occurrences of the event in the interval is

  e^(−m)(1 + m/1! + m²/2! + . . .) = e^(−m) . e^m = 1

as we should expect.

4.3. Some Properties of the Poisson Distribution. (a) What exactly does the m in (4.2.1) signify? Since we have derived the Poisson distribution from the Binomial by putting p = m/n and letting n tend to infinity, we shall obtain the Poisson mean and the Poisson variance by operating in the same way on the Binomial mean, np, and the Binomial variance, npq. Thus we have

  Poisson Mean = Limit np = Limit n . m/n = m . . . (4.3.1)

  Poisson Variance = Limit npq = Limit n . (m/n)(1 − m/n) = m . . . (4.3.2)

Thus we see that the m in (4.2.1) is the value of both the mean and the variance of the new distribution.

(b) Higher moments of the distribution may be worked out by using 3.5. Since G(t) = e^(m(t−1)), it follows that the moment-generating function is given by

  M(t) = exp [m(e^t − 1)] . . . (4.3.3)

and the mean-moment generating function by

  M_m(t) = e^(−mt) exp [m(e^t − 1)] . . . (4.3.4)

We have, for example,

  μ₂′ = M″(0) = [d²/dt² exp (m(e^t − 1))]_(t=0) = [M(t)me^t + me^t M′(t)]_(t=0) = mM(0) + mM′(0) = m(m + 1)

(c) To ease the work of calculating Poisson probabilities, we note that

  p(x + 1, m) = e^(−m) m^(x+1)/(x + 1)! = p(x, m) . m/(x + 1), where p(0, m) = e^(−m) . . . (4.3.5)

For convenience we give the following table:

TABLE 4.2. Values of e^(−m)

  m:      0·01   0·02   0·03   0·04   0·05   0·06   0·07   0·08   0·09
  e^(−m): 0·9900 0·9802 0·9704 0·9608 0·9512 0·9418 0·9324 0·9231 0·9139

  m:      0·1    0·2    0·3    0·4    0·5    0·6    0·7    0·8    0·9
  e^(−m): 0·9048 0·8187 0·7408 0·6703 0·6065 0·5488 0·4966 0·4493 0·4066

  m:      1·0    2·0    3·0    4·0    5·0    6·0    7·0    8·0    9·0
  e^(−m): 0·3679 0·1353 0·0498 0·0183 0·0067 0·0025 0·0009 0·0003 0·0001

NOTE: Since e^(−(x+y+z)) = e^(−x) . e^(−y) . e^(−z), if we have to use this table we see that, for example, e^(−5·23) = 0·0067 × 0·8187 × 0·9704 = 0·0053.
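The recurrence (4.3.5) is how Poisson probabilities are best computed in practice; a short sketch (illustrative, not from the book):

```python
from math import exp

# Sketch of the recurrence (4.3.5): p(0) = e^-m and
# p(x+1) = p(x) * m / (x + 1), which avoids factorials entirely.
def poisson_probs(m, up_to):
    probs = [exp(-m)]
    for x in range(up_to):
        probs.append(probs[-1] * m / (x + 1))
    return probs

probs = poisson_probs(0.4, 4)   # the proof-correcting example's mean
```

With m = 0·4 the first two values are 0·6703 and 0·2681, matching Table 4.2 and the worked example below.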

(d) Fig. 4.1 shows probability polygons for m = 0·1 and m = 3·0. It will be seen that when m < 1 the polygon is positively J-shaped but, when m > 1, it becomes positively skew, tending towards symmetry as m assumes larger and larger values. From (4.3.5), it follows that, for m > 1, p(x) increases with x while x < m − 1 and, thereafter, decreases.

4.4. Worked Examples

1. Consider the proof-correcting problem discussed at the beginning of this chapter. The mean, m, is here 0·4. Consequently, using Table 4.2 and formula (4.3.5), the probabilities of 0, 1, 2, 3 . . . errors per page based on the 50 pages sampled are:

  X:    0      1      2      3      4
  P(X): 0·6703 0·2681 0·0536 0·0071 0·0007

Thus, in 100 pages, we should expect 67 pages with 0 errors, 27 pages with 1 error, 5 pages with 2 errors and 1 page with 3 errors.

2. This is a classical example, but sufficiently entertaining to bear repetition. Bortkewitsch collected data on the number of deaths from kicks from a horse in 10 Prussian Army Corps over a period of 20 years. It is assumed that relevant conditions remained

sufficiently stable over this period for the probability of being kicked to death to remain constant. His figures are:

  Actual deaths per corps:  0    1   2   3   4   5   Total
  Observed frequency:       109  65  22  3   1   0     200

The mean of the sample, m, is (1 × 65 + 2 × 22 + 3 × 3 + 4 × 1)/200 = 0·61. Using the Poisson model with m = 0·61, the estimated frequency of x deaths per Corps in 200 corps-years will be given by

  f(x) = 200 e^(−0·61) (0·61)^x/x!

Using Table 4.2 and (4.3.5),

  p(0) = e^(−0·61) = e^(−0·6) × e^(−0·01) = 0·5488 × 0·9900 = 0·5433 and f(0) = 108·66
  p(1) = 0·61 × 0·5433 = 0·3314 and f(1) = 66·28
  p(2) = (0·61/2) × 0·3314 = 0·1011 and f(2) = 20·22
  p(3) = (0·61/3) × 0·1011 = 0·0206 and f(3) = 4·12
  p(4) = (0·61/4) × 0·0206 = 0·0031 and f(4) = 0·62
  p(5) = (0·61/5) × 0·0031 = 0·0004 and f(5) = 0·08

The " fit " is good.

4.5. Approximation to Binomial Distribution. Being the limit of the Binomial distribution when p becomes very small (the event is rare) and n tends to infinity, the Poisson distribution may be expected to provide a useful approximation to such a Binomial distribution. Moreover, it is much easier to calculate e^(−m) m^x/x! than it is to calculate C(n, x)p^x q^(n−x). Suppose we have a consignment of 1,000 cartons, each carton containing 100 electric light bulbs. Sampling reveals an average of 1 bulb per 100 defective.
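The horse-kick fit above can be sketched in code (an illustration using the figures quoted in the text):

```python
from math import exp

# Sketch reproducing the horse-kick fit: m is estimated from the sample
# mean, then expected frequencies 200 * e^-m * m^x / x! are generated
# via the recurrence of (4.3.5).
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1, 5: 0}
total = sum(observed.values())                        # 200 corps-years
m = sum(x * f for x, f in observed.items()) / total   # 0.61

fitted, p = [], exp(-m)
for x in range(6):
    fitted.append(total * p)
    p = p * m / (x + 1)
```

The resulting expected frequencies (108·7, 66·3, 20·2, . . .) agree with those worked by hand above.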

If we use the Binomial model, with p = 1/100, q = 99/100 and n = 100, the probability of x defectives in 100 bulbs will be given by

  p₁₀₀(x) = C(100, x)(1/100)^x (99/100)^(100−x), and p₁₀₀(x + 1) = [(100 − x)/(x + 1)] . (1/99) . p₁₀₀(x)

Since the occurrence of a defective bulb is a rare event, we may use the Poisson model. In this case m = np = 1, and so

  p(x, 1) = e^(−1)/x!, while p(x + 1)/p(x) = 1/(x + 1)

The following table results:

  No. of defectives per 100:  0      1      2      3     4     5     6
  Binomial model:             36·64  37·01  18·51  6·11  1·49  0·29  0·05
  Poisson model:              36·79  36·79  18·40  6·13  1·53  0·31  0·05

The reader should check these figures as an exercise.

4.6. The Poisson Probability Chart. As we saw when discussing Binomial probabilities, we frequently require to know the probability of so many or more occurrences of an event. It follows from (4.2.1) that the probability of k or more events in any interval, when the mean number of occurrences in a sample set of such intervals is m, will be given by

  P(x ≥ k, m) = Σ (x = k to ∞) e^(−m) m^x/x! . . . (4.6.1)

To avoid having to calculate successive terms of this series, we use the Poisson Probability Chart of Fig. 4.2. Along the horizontal axis are values of m, while along the vertical axis are values of P(x ≥ k, m); across the chart are a series of curves corresponding to values of k = 1, 2, 3, . . . If we want to find the probability that a given batch of 100 bulbs shall contain 2 or more defectives, we run our eye up the line m = 1 until it intersects the curve k = 2; moving horizontally to the left, we find the required probability marked on the vertical

[Fig. 4.2. The Poisson Probability Chart: values of P(x ≥ k, m) plotted against m for k = 1, 2, 3, . . .]

axis—in this case 0·26. This means that of the 1,000 batches of 100 bulbs about 260 will contain 2 or more defectives. We could have arrived at this result by recalling that the probability of 2 or more defectives is

  P(x ≥ 2, 1) = 1 − (p(0) + p(1)) = 1 − (0·3679 + 0·3679) = 0·2642

We may also use the chart to find approximately the probability of an exact number of defectives, 2, say. We have already used the chart to find P(x ≥ 2, 1) = 0·26. In a similar way we find that P(x ≥ 3, 1) = 0·08, approximately. Therefore, p(2) = 0·26 − 0·08 = 0·18, approximately. The calculated value is, as we have seen, 0·1840.

4.7. The Negative Binomial Distribution. We have derived the Poisson distribution from the Binomial, and a necessary condition for the Binomial distribution to hold is that the probability, p, of an event E shall remain constant for all occurrences of its context-event C. Thus this condition must also hold for the Poisson distribution. But it does not follow that, if a set of observed frequencies is fairly closely approximated by some Poisson series, p is in fact constant, although, in certain circumstances, this may be a not unreasonable inference. If, however, it is known that p is not constant in its context C, another distribution, known as the Negative Binomial distribution, may provide an even closer " fit ".

Suppose we have a Binomial distribution for which the variance, npq, is greater than the mean, np. Then q must be greater than 1 and, since p = 1 − q, p must be negative. But np being positive, n must be negative also. Writing n = −N and p = −P, the p.g.f. for such a distribution will be

  G(t) = (q − Pt)^(−N)

The trouble about this type of distribution lies in the interpretation, for we have defined probability in such a way that its measure must always be a number lying between 0 and 1 and, so, essentially positive. Again, since n is the number of context-events, how can it possibly be negative?

Any detailed discussion of this problem is beyond our scope, but the following points may be noted: (1) It is often found that observed frequency distributions are represented by negative binomials, and in some cases that this should be the case can be theoretically justified


(G. U. Yule, Journal of the Royal Statistical Society, vol. 73, pp. 26 et seq.). (2) In many cases, if two or more Poisson series are combined term by term, a negative binomial results. (3) Frequency distributions with variance greater than the mean often arise when the probability p does not remain constant.

We conclude this section with an example of a kind fairly common in bacteriology, where, although a Poisson distribution might reasonably be expected to hold, a negative binomial gives a better fit. The following table gives the number of yeast cells in 400 squares of a haemacytometer:

  Number of cells:  0    1    2   3   4  5  Total
  Frequency:        213  128  37  18  3  1    400

The mean is 0·68 and the variance 0·81, both correct to two decimal places. Putting np = 0·68 and npq = 0·81, we have q = 1·19 and, so, p = −0·19 and n = −3·59. The p.g.f. is thus (1·19 − 0·19t)^(−3·59). Hence

  p(x + 1) = [(3·59 + x)/(x + 1)] . (0·19/1·19) . p(x)

Calculating these probabilities and comparing them with those obtained from a Poisson model with m = 0·68, we have:

  No. of cells:        0    1    2   3   4  5
  Observed frequency:  213  128  37  18  3  1
  Negative binomial:   214  123  45  13  4  1
  Poisson:             203  138  47  11  2  0
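The negative-binomial column can be reproduced from the recurrence just given (an illustrative sketch using the moment estimates quoted in the text):

```python
# Sketch of the yeast-cell fit via the recurrence implied by the p.g.f.
# (1.19 - 0.19t)^(-3.59): p(0) = 1.19^(-3.59) and
# p(x+1) = p(x) * (3.59 + x)/(x + 1) * (0.19/1.19).
Q, P, N = 1.19, 0.19, 3.59   # moment estimates from the text
total = 400

probs = [Q ** (-N)]
for x in range(5):
    probs.append(probs[-1] * (N + x) / (x + 1) * (P / Q))

fitted = [total * p for p in probs]   # compare with 214, 123, 45, 13, 4, 1
```

Rounding the computed values reproduces the table's negative-binomial row.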


EXERCISES ON CHAPTER FOUR

1. Rutherford and Geiger counted the number of alpha-particles emitted from a disc in 2,608 periods of 7·5 seconds duration. The frequencies are given below:

  Number per period:  0   1    2    3    4    5    6    7    8   9   10  11  12  13  14
  Frequency:          57  203  383  525  532  408  273  139  45  27  10  4   2   0   0

Show that the mean of the distribution is 3·870 and compare the relative frequencies with the corresponding probabilities of the " fitted " Poisson distribution.

2. If on the average m particles are emitted from a piece of radioactive material in 1 second, what is the probability that there will be a lapse of t seconds between 2 consecutive emissions?

3. A car-hire firm has two cars, which it hires out by the day. The number of demands for a car on each day is distributed as a Poisson distribution with mean 1·5. Calculate the proportion of days on which neither of the cars is used, and the proportion of days on which some demand is refused. If each car is used an equal amount, on what proportion of days is a given one of the cars not in use? What proportion of demands has to be refused? (R.S.S.)

4. Show that the sum of two Poisson variates is itself a Poisson variate with mean equal to the sum of the separate means.

5. Pearson and Morel (Ann. Eugenics, Vol. 1, 1925) give the following table showing the number of boys at given ages possessing 0, 1, 2, 3 . . . defective teeth:

  Number of teeth     Central ages (years)
  affected:           7½   9½   10½  11½  13½   Total
  0                   12   16   27   61   67    183
  1                   4    14   13   47   69    147
  2                   6    23   28   43   50    150
  3                   4    11   20   35   41    111

  (table continued)
  Number of teeth     Central ages (years)
  affected:           7½   9½   10½  11½  13½   Total
  4                   7    21   14   28   22    92
  5                   5    15   7    15   10    52
  6                   3    16   7    20   8     54
  7                   4    5    3    5    2     19
  8                   4    11   5    5    7     32
  9                   1    6    1    2    2     12
  10                  1    4    0    1    2     8
  11                  1    2    0    2    0     5
  12                  0    1    0    1    2     4
  Totals              52   145  125  265  282   869

So far our model distributions have been those of a discrete variate. 1954. 117. we m a y call " measurables ". Journal of the Royal Statistical Society. P u t rather crudely : up till now we have been concerned with the distribution of " countables ". by S. 1951 ". Rosenbaum (Directorate of Army Health). or continuous variates.CHAPTER STATISTICAL III : T H E N O R M A L FIVE M O D E L S D I S T R I B U T I O N 5.1 (Adapted from " Heights and Weights of the Army Intake. Series A. vol. Table 5.) Number Height (cm) 100 150-153 294 153-156 881 156-159 2 172 159-162 4 368 162-165 7 173 165-168 9 627 168-171 171-174 10 549 174-177 9 422 177-180 6 886 180-183 4 103 183-186 1996 186-189 787 189-192 264 192-195 81 Total 58 703 75 .1. equally crudely.1 shows the distribution of heights of National Servicemen born in 1933 (mostly aged about 18 years 3 months). Continuous Distributions. Now we must consider distributions of what. Part 3. TABLE 5.

I t was first discovered by de Moitfre in 1753 as the limiting form of the binomial. have been taken. had the total number of men been indefinitely increased. we cannot reduce t h e interval indefinitely. t h e smaller. continuous parent population of measurable items. In practice. By increasing the size of our sample and reducing the class interval. however. lies within some specified interval. for instance. In such a case. But. we make the steps of our histogram narrower and shallower. of course. Kendall has commented as follows : " The normal distribution has had a curious history. . measure the " true value " of any quantity. In his book The Advanced Theory of Statistics. whatever it is. then we can conceive of a position such that. G. but was apparently forgotten and rediscovered later in the eighteenth century by workers engaged in investigating the theory of probability and the theory of errors. If. and we come t o t h e idea of a continuous distribution. Such a distribution is essentially ideal. below a certain limit. t h a t which we are attempting to " measure " no longer exhibits " definite " boundaries. on certain plausible hypotheses. the most we can ever say is t h a t " the height " of a certain man. but there will always be an interval. One of the most important continuous distributions is t h a t to which the distribution of heights in Table 5. We then say t h a t our variate (height in this case) varies continuously. in fact. however. The discovery t h a t errors ought. we idealise the situation and ignore the existence of such a lower limit. .7 6 STATISTICS Smaller class intervals could. We never. lying within t h a t interval. but we may regard any actual finite sample of measured quantities as a sample from such an infinite. there will always be at least one value of our variate. some of the classes might well have been null or empty. we can conceive of an infinite population of heights such that. For instance. of course. . 
there will always be at least one height lying within each interval. . M. t h a t interval will be. I t is called the Normal distribution.1 approximates. for. of course. to be distributed normally led to a general belief t h a t they were so distributed. The smaller our unit. no matter how small our interval. there is theoretically no reason why any of these classes should have been null. t h a t all measurement is approximate. No matter how fine our unit of measurement. where the relative frequency of the variate varies continuously as the variate itself varies continuously over its range. The fact remains. no matter how small we chose our class intervals.

yx = pn(x). Consequently. The relative-frequency of any one value of the variate is then represented b y t h e area of the corresponding cell. .3. 5. ILL 77 Vestiges of this dogma are still found in textbooks. the relative-frequency of the value x of our variate will be yx.. Denoting the increase in yx corresponding t o an increase of Ax (= 1) by Ay x . From the Binomial to the Normal Distribution. We take the equal class-intervals t o be unit intervals and the mid-point of each interval to correspond to the appropriate value of the variate. particularly in the theory of sampling. For these and other reasons . using (3. I t is in fact found t h a t many of the distributions arising in t h a t theory are either normal or sufficiently close to normality to permit satisfactory approximations by the use of the normal distribution. . we may write = yx(yx+ Jyx i) The relative-frequency of the value x of a binomial variate is precisely the probability of exactly x occurrences of an event E in n occurrences of C(E. the corresponding increase in the relative frequency is yx+ t — yx. Let the height of t h e cell corresponding to the value x of the variate be yx. Consider two adjacent cells of a Binomial relative-frequency histogram (Fig. I. I t was found in t h e latter half of the nineteenth century t h a t the frequency distributions occurring in practice are rarely of the normal type and it seemed t h a t t h e normal distribution was due to be discarded as a representation of natural phenomena.2.1). . .e. When the value of t h e variate changes from x to x + 1. pp. 5. the normal distribution is pre-eminent among distributions of statistical theory " (vol. since the class-intervals are unit intervals. Then. B u t as the importance of the distribution declined in the observational sphere it grew in the theoretical.STATISTICAL MODELS. . and. £). . 131-2).2). i.

our new variate will be X = x — np.2. Therefore. Thus we may put x = X + np. pn(X) = pn(x) and so yx = yx. and if we transfer our origin of coordinates to the mean value of our variate. Y=Y(X) FIG. Moreover. . 5.78 STATISTICS The Binomial mean is np. where X is the deviation of the original variate from its mean value. while AX being still unity is equal to Ax.

2.1 lq) Now let Y = Y(X) be the equation of a suitable continuous curve fitted to the vertices of the Binomial relative frequency polygon. we may. therefore.. For positive and negative integral values of X {X = 0. YAX = Y (•. For values of X not positive or negative integers (5. But we have not yet reached the continuous distribution to which the binomial distribution tends as n increases indefinitely and as the class intervals diminish indefinitely. ±2. In y = — In (1 + Xjnp) .4) ? where y o is the value of Y at X — 0.X2/2n*p* (5. Y 0 is the .5) is meaningless. 332) and neglect powers of Xjnp greater than the second. p.4) then becomes In (Y/Y„) = ^ or (X/np . i.2.5) Y = Y„ exp ( . Clearly.STATISTICAL MODELS. (5. ± 3 . as it stands. p. since we have not evaluated Y0.X/q + In Y 0 .X/q = X*/npq (5. ILL 79 But when n is large and p is not of the order 1 jn. Over the greater portion of the curve Y = Y(X). with which the variate assumes the value X.. expand In (1 + X/np) in terms of Xjnp (see Abbott. Since we are approximating. the formula. is valueless. Teach Yourself Calculus. And. Then we may put dY^AY = r 1 n y dX AX AX \-q{\ + X/np) qJ or F S ^ ^ r r i ^ - 1 ] (5 23) - Integrating from X = 0 to X = X (see Abbott.2.e. or probability. X will be small compared with np.• AX = 1).2. the value of Y a t the mean of the distribution. both p and 1 are small compared with np.X'j2npq) Let us be quite sure what this equation says. tit. op. 158). in any case. Thus Ayxi&X J] + X/np)q . ± 1 . thus In (1 + Xjnp) = X/np . anyway.) it gives us the approximate relative-frequency. . .X*/2n2p*) .

To evaluate Y₀, the probability that X assumes the value 0, i.e. that x assumes its mean value np, we use the approximation for N!, named after Stirling,

   N! ≈ √(2πN) · N^N · e^{−N},

which improves as N becomes large (see Nicholson, Fundamentals and Techniques of Mathematics for Scientists, p. 363). Now

   Y₀ = [n!/((np)! (n − np)!)] · p^{np} · q^{n−np}   (5.2.6)

Using Stirling's approximation for n!, (np)! and (nq)! in (5.2.6), the reader will verify that, with some easy simplification, this gives Y₀ ≈ 1/(2πnpq)^{1/2}. Thus

   YΔX = (1/√(2πnpq)) exp (−X²/2npq)   (5.2.7)

This is the probability that X lies in a particular unit interval, corresponding to unit interval for x. We now replace X in (5.2.7) by zn^{1/2}, i.e. we put z = Xn^{−1/2}. This has the effect of replacing the unit class-interval of the X-distribution by one of n^{−1/2} for the new z-distribution: the probability that X lies in a particular unit interval will be the probability that z lies in the z-interval, Δz = n^{−1/2}, corresponding to that particular X-interval. Calling this probability Δp(z), we have

   Δp(z) ≈ (1/√(2πnpq)) exp (−z²/2pq) · n^{1/2} · Δz = (1/√(2πpq)) exp (−z²/2pq) · Δz   (5.2.8)

As n increases, Δz, the interval for z, diminishes, and as n tends to infinity we approach a continuous distribution.

5.3. The Normal Probability Function. Now npq was the variance of the original Binomial distribution; since z = Xn^{−1/2}, the variance of the new continuous distribution of z will be pq. Calling this σ², we have the probability density of the new distribution

   φ(x) = (1/σ√(2π)) exp (−x²/2σ²)   (5.3.1)

This is the Normal probability density function, defining the Normal distribution, the continuous distribution to which the binomial distribution approximates when n is very large.
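The approximation (5.2.7) can be checked directly against the exact binomial probability. The values n = 400, p = 0.5 below are merely illustrative choices (large n, with p not of order 1/n), not figures from the text:

```python
import math

def binom_pmf(n, p, x):
    # Exact binomial probability p_n(x) = C(n, x) p^x q^(n-x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def normal_approx(n, p, x):
    # Y * dX of (5.2.7): the fitted normal ordinate at X = x - np, unit interval
    q = 1 - p
    X = x - n * p
    return math.exp(-X**2 / (2 * n * p * q)) / math.sqrt(2 * math.pi * n * p * q)

n, p = 400, 0.5      # illustrative: large n, p not of order 1/n
x = 210              # X = 10, small compared with np = 200
exact = binom_pmf(n, p, x)
approx = normal_approx(n, p, x)
```

For such values the two agree to well within one per cent, as the derivation leads one to expect.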

5.4. Some Properties of the Normal Distribution.

(a) The range of a normally distributed variate with zero mean and variance σ² is from −∞ to +∞.

(b) When the mean of the distribution is at x = μ, the p.d.f. is

   φ(x) = (1/σ√(2π)) exp (−(x − μ)²/2σ²)   (5.4.1)

(c) The equation of the probability curve referred to its mean as origin is y = φ(x + μ) = ψ(x). This curve is symmetrical (since ψ(−x) = ψ(x)), unimodal (mode, median and mean coincide) and such that y decreases rapidly as the numerical value of x increases.

(d) It follows from the symmetry of the distribution that the mean-moments of odd order are all zero: since ψ(x) is an even function of x, x^{2r+1} ψ(x) is an odd function and, consequently,

   μ_{2r+1} = ∫ x^{2r+1} ψ(x) dx = 0, the integral being taken from −∞ to +∞.

(e) The mean-moment generating function is

   M_m(t) = E(e^{tx}) = ∫ e^{tx} ψ(x) dx = (1/σ√(2π)) ∫ exp [tx − x²/2σ²] dx.

Completing the square, tx − x²/2σ² = −(x − σ²t)²/2σ² + ½σ²t², so that

   M_m(t) = exp (½σ²t²) · (1/σ√(2π)) ∫ exp (−y²/2σ²) dy, where y = x − σ²t.

But (1/σ√(2π)) ∫ exp (−x²/2σ²) dx, taken over the whole range, is the total area under the probability curve, i.e. the probability that x shall take a value somewhere between −∞ and +∞, which is certain; hence it is equal to 1, and

   M_m(t) = exp (½σ²t²)   (5.4.2)

Since the coefficients of all the odd powers of t in the expansion of this function are zero, we see again that the mean-moments of odd order are all zero. To find the mean-moments of even order, we first find the coefficient of t^{2r}: expanding, exp (½σ²t²) = Σ (½σ²)^r t^{2r}/r!, so the coefficient of t^{2r} is (½σ²)^r/r!, and therefore

   μ_{2r} = (2r)! σ^{2r}/(2^r r!)   (5.4.3)

We also have the useful recurrence relation

   μ_{2r} = (2r − 1)σ² μ_{2r−2}   (5.4.4)

In particular, μ₂ = σ² and μ₄ = 3σ⁴.
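The moment results μ₂ = σ², μ₃ = 0, μ₄ = 3σ⁴ are easily confirmed by brute-force numerical integration of x^r ψ(x); the sketch below (with an arbitrary σ = 1.5) is merely a check, not part of the derivation:

```python
import math

def mean_moment(r, sigma, n=20000, lim=10.0):
    # mu_r = integral of x^r * psi(x) dx, psi the zero-mean normal density
    # with s.d. sigma, approximated by the midpoint rule over +/- 10 sigma
    h = 2 * lim * sigma / n
    total = 0.0
    for k in range(n):
        x = -lim * sigma + (k + 0.5) * h
        total += x**r * math.exp(-x * x / (2 * sigma * sigma)) * h
    return total / (sigma * math.sqrt(2 * math.pi))

sigma = 1.5
mu2 = mean_moment(2, sigma)
mu3 = mean_moment(3, sigma)
mu4 = mean_moment(4, sigma)
```

The odd moment vanishes by the symmetry argument of (d); the even moments reproduce (5.4.4) with r = 1 and r = 2.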

(f) Since the area under the probability curve between x = 0 and x = X is the probability that the variate x will assume some value between these two values, measured from its mean,

   P(0 < x ≤ X) = (1/σ√(2π)) ∫₀^X exp (−x²/2σ²) dx   (5.4.5)

Now, referring to Fig. 5.4.1, we see that

   (1/σ√(2π)) ∫_X^∞ exp (−x²/2σ²) dx = P(x > X), and also (1/σ√(2π)) ∫₀^∞ exp (−x²/2σ²) dx = ½.

Therefore

   P(x > X) = ½ − P(0 < x ≤ X)   (5.4.6)

If, however, we are concerned only with the absolute value of the variate, then

   P(|x| > X > 0) = 1 − 2P(0 < x ≤ X)   (5.4.7)

This is frequently the case. Consider the following, hypothetical, problem:

In a factory producing ball-bearings a sample of each day's production is taken, and from this sample the mean diameter and the standard deviation of the day's output are estimated. The mean is in fact the specified diameter, say 0.5 cm; the standard deviation is 0.0001 cm. A bearing whose diameter falls outside the range 0.5 ± 0.0002 cm is considered substandard. What percentage of the day's output may be expected to be substandard?

This is what is generally called a two-tail problem, for we are concerned with the probability of a bearing having a diameter which deviates from the mean by more than 0.0002 cm above the mean or by more than 0.0002 cm below the mean.

.e. •0239 •0636 •1026 •1406 •1772 <2123 •2454 •2764 •3051 •3315 •3554 •3770 •3962 •4131 •4279 •4406 •4516 •4608 •4686 •4750 •4803 <4846 •4881 •4909 •4931 •4948 •4961 •4971 •4979 •4986 3-6 •4998 7. 4 . •0159 •0199 •0557 •0596 •0948 •0987 •1331 •1368 •1700 •1736 •2054 •2088 •2389 •2422 •2704 •2734 •2995 •3023 •3264 •3289 •3508 •3631 •3729 •3749 •3925 •3944 •4099 •4116 •4251 •4265 •4382 •4394 •4495 •4605 •4591 •4599 •4671 •4678 •4738 •4744 •4793 •4798 -4838 -4842 •4875 . •0279 •0676 •1064 •1443 •1808 •2167 •2486 •2794 •3078 •3340 •3577 •3790 •3980 •4147 •4292 •4418 •4525 •4616 <4693 •4766 •4808 •4850 •4884 •4911 •4932 •4949 •4962 •4972 •4980 •4985 3-7 •4999 8. equal) areas under Normal curve between —00 and . . 4 .8) Table Area under the Normal Curve: = exp ( mdt P(t <T) T (= X/a).4.0 .4. •0120 •0617 •0910 •1293 •1664 •2019 •2357 •2673 •2967 •3238 •3485 •3708 •3907 •4082 •4236 •4370 •4485 •4582 •4664 •4732 •4788 •4834 •4871 •4901 •4925 •4943 •4967 •4968 •4977 •4983 3-3 >4995 4. 5 ) again.0 0 0 2 cm and between + 0-0002 cm and +00. Then ( 5 .84 STATISTICS the mean and of a bearing having a diameter which deviates by more t h a n 0-0002 cm from t h e mean below the mean. This will be given by 1 — 2P(0 < x < 0-0002). 5. How. •0359 •0753 •1141 •1517 •1879 •2224 •2549 •2852 •3133 <3389 •3621 <3830 •4015 •4177 •4319 •4441 •4645 •4633 •4706 •4767 •4817 •4857 •4890 •4916 •4936 •4952 •4964 •4974 •4981 •4986 3-9 •5000 T. •0040 •0438 •0832 •1217 •1591 •1960 •2291 •2611 •2910 •3186 •3438 •3665 •3869 •4049 •4207 •4345 •4463 >4564 <4649 •4719 •4778 •4826 •4866 •4896 •4920 •4940 •4955 •4966 •4976 •4982 31 •4990 2. 0-0 0-1 0-2 0-3 0-4 0-5 0-6 0-7 0-8 0-9 1-0 1-1 1-2 1-3 1-4 1-5 1-6 1-7 1-8 1-9 2-0 2-1 2-2 2-3 2-4 2-5 2-6 2-7 2-8 2-9 0. do we find probabilities such as P ( 0 < * < 0-0002)? Consider ( 5 . then. 4 8 7 8 •4904 •4906 •4927 •4929 •4945 •4946 -4959 •4960 >4969 •4970 •4977 •4978 •4984 •4984 3-5 3-4 •4997 •4998 . i. 
•0319 •0714 •1103 •1480 •1844 •2190 •2518 •2823 •3106 •3365 •3599 •3810 •3997 •4162 •4306 •4430 <4636 <4625 •4699 •4762 •4812 •4854 •4887 >4913 •4934 •4951 •4963 •4973 •4980 •4986 3-8 •4999 9. Let us take the standard deviation of the distribution as the unit for a new variate and write t = x/o. 5 ) becomes P ( 0 < i < T) = — L j ^ exp (-< 2 /2)rf< where T = X/a. (5. we are concerned with finding the two (in this case. 5. •0080 •0478 •0871 •1255 •1628 >1985 •2324 •2642 •2939 •3212 •3461 •3686 •3888 •4066 •4222 •4367 •4474 •4673 <4656 •4726 <4783 <4830 •4868 •4898 •4922 •4941 •4966 •4967 •4976 <4983 3-2 -4993 3. •0000 •0398 •0793 •1179 •1654 •1915 •2257 •2580 •2881 •3159 •3413 •3643 •3849 •4032 •4192 •4332 •4452 -4554 •4641 •4713 •4772 •4821 •4861 •4893 •4918 •4938 •4953 •4965 •4974 •4981 3-0 •4987 1. 6.

The integral

   (1/√(2π)) ∫₀^T exp (−t²/2) dt,

frequently called the probability integral, is a function of T. It cannot, however, be evaluated in finite form; but, if we expand the integrand and integrate term by term, the integral can be computed to any degree of accuracy required. We have

   exp (−t²/2) = 1 − t²/2 + t⁴/(4 · 2!) − t⁶/(8 · 3!) + . . .

and, therefore,

   (1/√(2π)) ∫₀^T exp (−t²/2) dt = (1/√(2π)) (T − T³/6 + T⁵/40 − T⁷/336 + . . .)

This method is satisfactory when T ≤ 1; for larger values of T we use what is called an asymptotic expansion (see Whittaker and Watson, Modern Analysis, Ch. 8, or Nicholson, Fundamentals and Techniques of Mathematics for Scientists, Ch. 14). Integrating successively by parts,

   ∫_T^∞ exp (−t²/2) dt = exp (−T²/2) [1/T − 1/T³ + 1 · 3/T⁵ − 1 · 3 · 5/T⁷ + . . .]

Table 5.4 gives values of the probability integral for T = 0 to T = 3.9.

Worked Example: Find the value of (1/√(2π)) ∫₀^{0.500} exp (−t²/2) dt correct to four decimal places, using four terms of the expansion of the integrand.

Now T = 0.50000; T³ = 0.12500 and T³/6 = 0.02083; T⁵ = 0.03125 and T⁵/40 = 0.00078; T⁷ = 0.00781 and T⁷/336 = 0.00002. Taking 1/√(2π) = 0.39894, we have

   (1/√(2π)) ∫₀^{0.500} exp (−t²/2) dt ≈ 0.39894 × (0.50000 − 0.02083 + 0.00078 − 0.00002) ≈ 0.1914,

which should be compared with the value of 0.1915 given in Table 5.4.
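The term-by-term integration above is easy to mechanise. The general term of the series is (−1)^k T^(2k+1) / (2^k k! (2k+1)), and the result can be checked against the error function, since P(0 < t ≤ T) = ½ erf(T/√2):

```python
import math

def prob_integral_series(T, terms=4):
    # (1/sqrt(2*pi)) * integral of exp(-t^2/2) from 0 to T, by the expansion
    # T - T^3/6 + T^5/40 - T^7/336 + ...
    s = 0.0
    for k in range(terms):
        s += (-1)**k * T**(2 * k + 1) / (2**k * math.factorial(k) * (2 * k + 1))
    return s / math.sqrt(2 * math.pi)

# The same area expressed through the error function
exact = 0.5 * math.erf(0.5 / math.sqrt(2))
approx = prob_integral_series(0.5)
```

With four terms at T = 0.5 the truncation error is already below 10^-5, in line with the worked example; as the text warns, for T much beyond 1 more terms (or the asymptotic expansion) are needed.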

Exercise: Find (1/√(2π)) ∫₀^{0.800} exp (−t²/2) dt correct to four decimal places and check the result with the value given in Table 5.4.

We may now return to our ball-bearing problem. The standard deviation of the sample was 0.0001 cm. Since we have to find P(x > 0.0002), X = 0.0002 and T = 0.0002/0.0001 = 2. Entering Table 5.4 at T = 2, we find that the area between the mean ordinate and that for T = 2 is 0.4772. Hence the probability that the diameter of a bearing will lie between 0.5 and 0.5002 cm is 0.4772, and therefore the probability that the diameter will exceed 0.5002 cm is 0.5 − 0.4772 = 0.0228. Likewise, by symmetry, the probability of a bearing with a diameter less than 0.4998 cm will also be 0.0228.
Hence the probability that the diameter of a bearing will lie outside the tolerance limits will be 0.0456. This means that we should expect, on the data available, just over 4½% of the bearings produced on the day in question to be substandard.

(g) If we pick at random a value of a variate known to be distributed normally about zero mean with variance σ², what is the probability that this random value will deviate from the mean by more than σ, or 2σ, or 3σ? Entering Table 5.4 at T = 1.00, we find that the area between the mean ordinate and that for T = 1.00 is 0.3413. This is the probability that the random value of the variate will lie between 0 and σ. Since the Normal distribution is symmetrical, the probability that it will lie between 0 and −σ is also 0.3413. Thus the probability of its lying between −σ and +σ is 2 × 0.3413 = 0.6826, and consequently the probability that it will deviate from the mean by more than σ in either direction is 1 − 0.6826 = 0.3174. Similarly, the probability that a random value will lie between −2σ and +2σ is 0.9544; this means that the probability that it will lie outside this range is 0.0456, or only about 1/22 of a normally distributed population deviates from the mean by more than 2σ. Likewise, as the reader may ascertain for himself, the probability of a deviation greater than 3σ is only 0.0027.

(h) Suppose now that we were to plot values of the integral against the value of X. This is not possible over the full range, −∞ to +∞; but since we have just found that deviations of more than 3σ from the mean are very rare, we can confine our attention to that section of the range lying between −3.9σ and +3.9σ, the range covered by Table 5.4.
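The two-tail calculation for the ball-bearing problem takes only a few lines when Table 5.4 is replaced by the error function:

```python
import math

def phi(T):
    # Area under the standard normal curve from 0 to T (the entries of Table 5.4)
    return 0.5 * math.erf(T / math.sqrt(2))

mean, sd = 0.5, 0.0001       # cm, as in the ball-bearing example
tol = 0.0002                 # deviation beyond which a bearing is substandard
T = tol / sd                 # = 2 standardised units
p_outside = 1 - 2 * phi(T)   # the two-tail probability (5.4.7)
```

The exact value is 0.04550; the figure 0.0456 in the text comes from the four-figure table entry 0.4772.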

The function F(X) defined by

   F(X) = (1/σ√(2π)) ∫_{−∞}^{X} exp (−x²/2σ²) dx   (5.4.9)

is called the Normal distribution function. From (5.4.9) it follows that the value of the ordinate of its graph at X is equal to the area under the curve of the probability density function φ(x) from −∞ to X. Clearly

   F(X) = ½ + P(0 < x ≤ X)   (5.4.10)

The graph of a typical F(X), a cumulative probability curve for the Normal distribution, is shown in Fig. 5.4.2. If, however, we plot F(X) against X on probability graph paper (Fig. 5.4.3), the resultant cumulative probability curve is a straight line. There is nothing strange in this, for probability paper is deliberately designed to ensure that this will happen!

5.5. Binomial, Poisson, Normal. When n is large the Binomial distribution tends towards the Normal distribution with mean at the Binomial mean value and variance equal to that of the discrete distribution. Furthermore, when the mean of the Poisson distribution (also a discrete distribution) is very large, the Poisson probability polygon tends towards symmetry and, as we have also seen, the Poisson distribution also tends to normality.
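The "nothing strange" remark can be made concrete: probability paper's vertical axis is ruled in the inverse-normal (probit) scale, so that F(X) plotted in that scale is exactly the straight line (X − μ)/σ. A minimal sketch, using the ball-bearing mean and standard deviation and a bisection inverse (both choices mine, for illustration):

```python
import math

def F(x, mu=0.0, sigma=1.0):
    # The Normal distribution function (5.4.9)
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def probit(p, lo=-10.0, hi=10.0):
    # Inverse of the standard normal F, by bisection; probability paper's
    # vertical axis is ruled in exactly this scale
    for _ in range(80):
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu, sigma = 0.5, 0.0001          # the ball-bearing mean and s.d.
xs = [0.4998, 0.4999, 0.5000, 0.5001, 0.5002]
ys = [probit(F(x, mu, sigma)) for x in xs]
# In probit units every plotted point has ordinate (x - mu) / sigma:
# the cumulative curve has become a straight line
```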

[Fig. 5.4.3.—Cumulative Probability Curves on Normal Probability Paper: vertical scale from 0.005 to 0.999, horizontal scale from −3σ to +3σ.]

Worked Example: Use the Normal distribution to find approximately the frequency of exactly 5 successes in 100 trials, the probability of a success in each trial being p = 0.1.

The mean of the binomial distribution is np = 10 and the variance, npq, = 9. The standard deviation is, therefore, 3.

The binomial class-interval for 5 will then correspond to the interval −5.5 to −4.5 of the normal distribution (referred to its mean as origin) and, dividing by σ = 3, in standardised units, to −1.83 to −1.50. Owing to the symmetry of the normal curve, we may disregard the negative signs. Entering Table 5.4 at 1.50 and 1.83, we read 0.4332 and 0.4664 respectively. Hence the probability of 5 successes is approximately 0.4664 − 0.4332 = 0.0332. Thus in 100 trials the frequency of 5 successes will be approximately 3.32. The reader should verify for himself that direct calculation of the binomial frequency gives 3.39, while the frequency obtained from the Poisson series with m = 10 gives 3.78.

5.6. Three Examples.

Example 1: To fit a Normal curve to the distribution given in Table 5.1. The reader should work each step himself, following carefully the directives given.

First Treatment: (1) Draw the frequency histogram for the data. (2) Calculate the Mean and Standard Deviation. It will be found that, correcting for grouping, the mean is 172.34 cm and the standard deviation 6.60 cm. (3) The normal curve corresponding to these values for the mean and standard deviation is drawn using Table 5.5. For instance, at 150 and 153 the deviations from the mean are, in standardised units, 3.385 and 2.93; from Table 5.5 the corresponding ordinates are 0.0013 and 0.0055, and these, multiplied by 3/σ (since 3 is the class-interval), give 0.0006 and 0.0025. Multiplying by the total frequency 58 703, we find ordinates 35.2 and 146.8 at 150 and 153 respectively. Proceeding in this way, we plot the required curve.
(4) We now calculate the theoretical frequency for each class-interval in order to compare it with the corresponding observed frequency. For each class-interval, work out the deviation, from the mean, of each of the boundary values. For instance, the boundary values of the first interval are 150 and 153, whose deviations from the mean are 22.34 and 19.34; owing to the symmetry of the normal curve, we may disregard the negative signs, and in standardised units these are 3.385σ and 2.93σ. The area under the normal curve from the mean to the right-hand class-boundary (2.93σ) is, from Table 5.4, 0.4981, while the area from the mean to the left-hand boundary (3.385σ) is 0.4997. The difference between these two values, multiplied by the total frequency, 0.0016 × 58 703 = 93.9, is our theoretical frequency for this interval; the corresponding observed frequency is 100. The reader should complete the calculations and draw the theoretical frequency polygon.
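The Worked Example preceding Example 1 (the frequency of exactly 5 successes in 100 trials) is itself a good check on this table-based routine; with the error function in place of Table 5.4:

```python
import math

def table_area(T):
    # Area from the mean ordinate to T, as read from Table 5.4
    return 0.5 * math.erf(T / math.sqrt(2))

n, p = 100, 0.1
mean, sd = n * p, math.sqrt(n * p * (1 - p))   # 10 and 3
# The class-interval for exactly 5 successes runs from 4.5 to 5.5,
# i.e. from 1.83 to 1.50 standardised units below the mean
p_approx = table_area((mean - 4.5) / sd) - table_area((mean - 5.5) / sd)
p_exact = math.comb(n, 5) * p**5 * (1 - p)**(n - 5)
```

The approximation reproduces the text's 0.0332 (to table accuracy), against an exact binomial value of 0.0339.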

Second Treatment: (1) Draw the cumulative frequency polygon for the data of Table 5.1. (2) Draw the theoretical cumulative normal frequency curve with mean 172.34 and s.d. 6.60. This is done using Table 5.4 as follows. To find the ordinate of the cumulative frequency curve at, say, 153, the lower end-point of the interval 153–156, we have to find the area under the normal curve from −∞ to X = 153. The deviation from the mean is −19.34. But the area under the normal curve between X = 0 and X = −19.34 is, by symmetry, equal to the area under the curve between X = 0 and X = +19.34, and the required area is therefore ½ − (area under curve between the mean, 172.34, and the ordinate at 153).

Table 5.5. Ordinates of Normal Curve Multiplied by Standard Deviation: values of (1/√(2π)) exp (−T²/2) for T (= X/σ) from 0.00 to 3.99 at intervals of 0.01; divide each value by σ to obtain the ordinate y. [The values run from 0.3989 at T = 0 to 0.0001 at T = 3.9; the full table is not reproduced here.]

Dividing 19.34 by 6.60, we obtain 2.93, and, entering Table 5.4 at this value, we read 0.4983. The required ordinate of the cumulative frequency curve is then given by 58 703 × (0.5000 − 0.4983) = 99.8. The reader should calculate the other ordinates in a similar manner and complete the curve.

(3) If now we mark upon the vertical axis percentage cumulative frequencies (with 58 703 as 100%), we can find the position of the median and other percentiles. (Median: 172.3 cm; quartiles: 168 and 177 cm; deciles: 164 and 181 cm.)
We conclude this chapter with two other typical problems in the treatment of which we make use of some of the properties of the Normal distribution.

Example 2: To find, using probability graph paper, approximate values for the mean and standard deviation of an observed frequency distribution which is approximately normal. (L.U.)

Treatment: When plotted on probability graph paper, the cumulative frequency curve of a normal distribution is a straight line. If, therefore, we draw the cumulative relative-frequency polygon of an observed distribution on such paper and find that it is approximately a straight line, we may assume that the distribution is approximately normal. We next draw the straight line to which the polygon appears to approximate. Then, working with this "fitted" line:

(a) since, for the normal distribution, mean and median coincide and the median is the 50th percentile, if we find the 50th percentile we shall have a graphical estimate of the mean of the observed distribution;

(b) the area under the normal curve between −∞ and μ + σ is 0.5000 + 0.3413 = 0.8413. Thus 84.13% of the area under the normal curve lies to the left of the ordinate at μ + σ, so that the 84.13 percentile corresponds to a deviation of +σ from the mean. If, then, from the fitted cumulative frequency line we find the position of the 84th percentile, the difference between this and the mean will give us an estimate of σ for the observed distribution. Likewise, the difference between the 16th percentile and the median will also be an estimate of σ.

Example 3: The frequency distribution f(x) is obtained from the normal distribution N(t) = (1/√(2π)) exp (−½t²) by means of the equations

   (i) ∫_{−∞}^{t} N(t) dt = ∫_{1}^{x} f(x) dx and (ii) t = a log (x − 1).

If exp (1/a²) = 4, show that the median of f(x) is 2, the mean is 3 and the mode is 1.25. (Note that as x → 1, t → −∞, and as x → +∞, t → +∞.)
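Example 2's percentile recipe can be sketched numerically. Here, instead of reading a drawn line, we invert the exact normal distribution function by bisection (the function names and the use of Example 1's fitted values 172.34 and 6.60 are my choices for illustration):

```python
import math

def F(x, mu, sigma):
    # Cumulative normal distribution function (5.4.9)
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def percentile(pct, mu, sigma, lo=0.0, hi=400.0):
    # The point below which pct% of the area lies, found by bisection
    target = pct / 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if F(mid, mu, sigma) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu, sigma = 172.34, 6.60                 # Example 1's fitted values
mean_est = percentile(50, mu, sigma)     # (a): the median estimates the mean
sd_est = percentile(84.13, mu, sigma) - mean_est   # (b): 84.13th percentile minus mean
```

The estimates recover the mean exactly and the standard deviation to within the rounding of "84.13".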

.. Fit a normal curve to the distribution of lengths of metal bars given in 2. we have N(t) = ^ or / W*r = \±^ /(.)*] exp | = f(x) . 155. x = 2. 9. 21.e. The wages of 1..e. The class frequencies. 2. from the lowest class to the highest. J e /(*) = ae-l'Nit) (<• + J ) ] = ./ 2i But x = 1 + x^J —0 0 (1 + e'l")N(t)dl = 1 + ^ L J —0 0 I ^'("-D dt dt i. Hence r+00 i /•+» .Hf + 2t/a)] • ~ [' + If then ^l^L1 = 0. They are grouped in 15 classes with a common class interval of lOp. 131. by 0 = a log (x — 1). 6. 65. * = 1 + e»/»* = 1 + (4)» = 3. 90. exp [ . 17. 52. —O O Hence the median value of f(x) N(t)dt. since log 1 = 0. we must have ax a Thus x — 1 = e »' = £ or * = 1-25. 173. which defines the modal value.1. Show that the mean . are 6. x = / Xf(x)dx. 35. 75.000 employees range from 45p to ^1-95. i.92 /OO ^+0O STATISTICS Jl f(x)dx = I N{t)dt = 1.e. 48. Differentiating (i) with respect to t. 117. is given by j f(x)dx — J = j 1 — to i. EXERCISES ON CHAPTER FIVE 1.

000 Assume that this distribution of incomes.) 4.5 8 . 113-5. 113-1. 140-5. What is the actual probability ? Solutions 3. is linked with the normal distribution by the relationship I N(t)dt = J f{x)dx. 140-8. Assuming the distribution of values to be normal. (Weatherburn. 0-0519.000 1. .000. f(x).S. Number of individual incomes in different ranges of net income assessed in 1945-46 : Range of Income after tax (x).000-2. 4. 11-3.STATISTICAL MODELS. 6. 13. 6.000 2. 26-0. 25-3. Show that (3a for a normal distribution is equal to 3.S. ± 6-6 ohms.6. 48-1. 79-0. £ 150-500 500-1.000 137. and find the number of incomes between £250 and £500.000 and over Total Number of Incomes. find what tolerance limits should be put on the resistance to secure that no more than . ILL 93 wage is £1-2006 and the standard deviation 26-26p. (R.) 3. to two significant figures. 48-0. 11-5 and 6-7. where t = a log (x — 150) + b. number of incomes between £250 and £500 is 2-5 x 10s. 150 Obtain estimates for a and b from the data. showing that the class frequencies per thousand of the normal distribution are approximately 6-7. If p — A.000 652. ' 0 0 of the resistors will fail to meet the tolerances. (2-48). See Example 3 of 5. 151-0.500 14.2 . Fit a normal distribution. 5.175. A machine makes electrical resistors having a mean resistance of 50 ohms with a standard deviation of 2 ohms. use the normal distribution to estimate the probability of obtaining less than five or more than 15 successes in 50 trials. Mathematical Statistics. a = 0-71(5) 6 = . 79-5. 0-0503.500 35.

y cm. We could have discussed the distribution of weight among them. y* *3 *t Vi . Denote t h e number of men in this class-rectangle by fy. the group (x it yj) for short.. How do we tabulate such a distribution ? To each National Serviceman in the total of 58.. = 3.2. R E G R E S S I O N AND CORRELATION 6. corresponds to t h a t of those men whose weights are in the 58-5-63-0 kg weight class and whose heights are in the 164 cm height class. Such a table is called a correlation table. a n d / .. A general scheme is given in Table 6. Correlation Tables. the correlation Table becomes : x y x*.) group in Table 6.. b u t there will be some in t h e same weight-group and t h e same height-group. Vi .. for instance. In the last chapter we discussed the distribution of height among 58. yt is the mid-value of the _/th y-array.703 National Servicemen born in 1933 who entered the Army in 1951. B u t had we considered the distribution of both height and weight. Each row 94 . *B ViV The (xt. we should have had a bivariate distribution. there corresponds a pair of numbers. x kg say. . .CHAPTER SIX MORE VARIATES THAN O N E : BIVARIATE DISTRIBUTIONS.1. In either case. 6.703. others will occupy the same height-group (the y} group. his weight.2..2. say) but different weight groups.2. Some men will be classed together in t h e same weight-group (call it the xi group) b u t will be in different height-groups. (ii) If the data is not grouped—to each value of x corresponds but one value of y and to each y corresponds but one value of x.1.879.2. the distribution of one measurable characteristic of the population or sample. y. Let us group the heights in 5-cm intervals and the weights in 4-5-kg intervals. a joint distribution of two variates. the distribution would have been univariate. The joint distribution may then be tabulated as in Table 6.2. NOTE : (i) x{ is the mid-value of the class-interval of the ith *-array. and his height. Two Variates.

1 . c S • a C 1 .C r-t O O M l lH C iO » " O H I NH^OOffi05C5HLOO«OlCH i i j | i HWWNl^^^OlOMH | C W C C iH O 5 O O | | | (j N^COH MNMOONCOMH 1 1 J 1 II lOOOOJW 1 1 II 11 < Cf N < H O O COIOOOCOOO^CQ | H .1 1 1 1 1 1 1 1 1 1 II I >0010010 l Ol O O W j ^ s / oOo d O^ clc rO^ NO l OiO l O Ofl O OO^ ^ ^ ^N i s t iH oo^f lO HH rfT^T^lOlOOtOt^r-OOOOOlOSOJiHHHHiH o / // 00 HHHH . .95 •o o C O e s •S e fe. 1 .< i C H 3 O 1 1 1 C C! O < ' 1 1 1 ' I S g S S S 1 0 * II II 1 II II 1 I I 1 1 II 1 1 II 1 1 II 11 1 1 1 1 * S a « O CO ^O ^ s -o 2 j: s 2 I < o ss s tr. . < / 5 I £ U 0> 31 C O 3 d aj sn VI > >o -c n - as a>i — i O J o 00) I — 00 C 5 l> T X Ia> C O to O S >a ia C 5 T H rH a> Ci — iO C O 11111 i 1 1 .1 1 11 I I 1 1 1 I I I | WHNOCI^WHHH | | \ I I I I I 1 OICNOM^ONIOMH 1 [ I 1 M M NIO^^NH M M TT. 1 1 1 N««H H«ONOOOOIOOT|(I>®HM« || Hf ^ oo co I-h | i | cq W O © M O « H i-T co co i-h 1 i r H IH | | R OS Y S — 'C ii e 0 CJ W 1 -o. .C • 3 <D 1 1« 1 1 I I 1 I I 1 1 I I aw H « E H •2 fr M g t3 t. . £ 3s (3 K ai ® ^ "-i W . •S < 3 S M§ ^ -S t*ONCOOl>OXt*T(iHH i i t iH 03 t> C t. > W+. +J ^ a.1 . . CO I> ^ t.

A pair of rectangular axes is taken. On each of these rectangles we erect a right prism of volume proportional to the occurrence-frequency of the value-pair represented by t h e rectangle in question. The main disadvantage of this method of display is t h a t it is not well suited t o the representation of grouped data. we make a scatter-diagram.3. a scatter-diagram is very often suggestive of directions along which further investigation may prove fruitful. X Correlation Table for Grouped Data Vt Vz fu TOTALS ft. Stereograms. Alternatively. K y VI fu tn hx h y> u..) To represent a grouped bivariate distribution in three dimensions. This is a prismogram or stereogram. AU /»• h» AuN Xp TOTALS f'n /•i A f..2. So t o every pair of values (xi. Thus each row (or column) gives a univariate frequency distribution. 6. we have a scatter-diagram. h A. TABLE 6. (Figs. we may erect a line perpendicular to the horizontal . In this way we obtain a surface composed of horizontal rectangular planes.3. How do we display such a distribution ? If we confine ourselves to two dimensions. (6). 6. Scatter Diagrams. (c) and (d). at the centre of each class rectangle.. mark off on mutually perpendicular axes in a horizontal plane the class-intervals of the two variates. .. for it is difficult t o exhibit a number of coincident points ! Nevertheless.. A row or column is often called an array : the xe array. for example.96 STATISTICS and each column tabulates the distribution of one of the variates for a given value of the other. is t h a t row of y-values for which x = xs.1 (a). .. fi 2 fzt h *. the abscissae being values of one of the variates. the ordinates those of the other. corresponding in three dimensions to the histogram in two. We thus obtain a network of class-rectangles.* /as As fa fp8 . If we plot these points.2. f. yj) there will correspond a point in the plane of the axes.

2.3. we obtain the three-dimensional analogue of the two-dimensional frequency polygon.3. which represents t h e continuous bivariate probability distribution in the parent population. 6. Thus if we cut the surface with a horizontal plane we obtain a contour of the surface corresponding to the particular frequency (or probability) represented by the height of the plane above the plane of t h e variate axes. also have their disadvantages.—Frequency Contours of Bivariate Distribution. Three-dimensional figures. Now. and we frequently find it convenient t o return to two-dimensional diagrams representing sections through the three-dimensional surface. Fig. If we then join up all t h e points so obtained by straight lines.2 shows frequency contours of ten. continuous surface—the correlation surface— _l 150 1 155 1 160 1 165 —i 170 1 175 1 180 1 185 1 190 1 — 195 HEIGHT IN CENTIMETRES FIG. however.MORE VARIATES THAN ONE IO105 plane proportional in length to the frequency of t h e variates in that class-rectangle. if we regard our distribution as a sample from a continuous bivariate population parent distribution. we can also think of a relative-frequency prismogram as a rough sample approximation t o t h a t ideal. 6. a hundred and a .

. I t also shows mean weights at each height and mean heights at each weight.1) ' i t where i = 1.xi*.2.1. . . (Xi* + ZxXi + x') since 1 2 = 0.)*<. and. however. we have nt10 ' = x (6. .98 STATISTICS thousand m e n ' i n the groups of Table 6. *' I n particular. 2. .2. y = 0 for the distribution of Table 6. . 3. • • • + /w*pW mrS' = ^ j 2 2 f i j X i r y f . A brief treatment of continuous bivariate distributions is given in the Appendix.Xi. . the mean value of x in the sample.4. the total frequency of the value Xi. the second JSI j j JSI ( about the origiji.2) and. Moments of a Bivariate Distribution. q and N= 2 2 f y . we have Nmla' = 2 S fyxi.. m 20 ' = i 2 fi. (Xi + *)» = ^ 2 fi. + /(. m 1 0 ' = i — £ fi. we cut the surface by a plane corresponding t o a given value of one of the variates. 3 .p. the toted frequency of the distribution. « or Writing 2 fa= fi.3) Again. 2. 6. j= 1. . = I 2/. .s + .4.X. We define the moment of order r in x and s in y about x = 0. summing for j. we obtain the frequency (or probability) curve of the other variate for t h a t given value of the first. .2 as follows: Nm™' =fux1ry1s +fi2x1ry2" + . . . max' = y (6. i ) Nmw' = S { f a + fa + . (6. Denoting this •iV mean by x. We confine ourselves here to the discussion of bivariate distributions with both variates discrete. .4. likewise.4. If. Writing Xi = Xi — x. mt<s' = 2 2 fyx? = i 2 fi. moment of x Y) ' Vi ~ V. . + Aix/yi* + A A W + • • • • • • + faxi'yf + .

Denoting the variance of x by s_x² and var (y) by s_y², we have

  m_20' = s_x² + x̄² = m_20 + (m_10')² . . . (6.4.4)

and, similarly,

  m_02' = s_y² + ȳ² = m_02 + (m_01')² . . . (6.4.5)

where, of course, m_20 and m_02 are the moments of order 2, 0 and 0, 2 about the mean. Now consider m_11'. Writing Y_j = y_j − ȳ,

  m_11' = (1/N) Σ_i Σ_j f_ij (X_i + x̄)(Y_j + ȳ)
        = (1/N) Σ_i Σ_j (f_ij X_i Y_j + x̄ f_ij Y_j + ȳ f_ij X_i + f_ij x̄ȳ)
        = (1/N) Σ_i Σ_j f_ij X_i Y_j + x̄ȳ.

The quantity (1/N) Σ_i Σ_j f_ij X_i Y_j is called the covariance of x and y and is variously denoted by s_xy or cov (x, y). We may therefore write m_11' = m_11 + m_10' · m_01', or

  cov (x, y) = s_xy = m_11' − m_10' · m_01' = m_11' − x̄ȳ . . . (6.4.6)

6.5. Regression. When we examine the data provided by a sample from a bivariate population, one of the things we wish to ascertain is whether there is any evidence of association between the variates of the sample and whether, if such an association is apparent, it warrants the inference that a corresponding association exists between the variates in the population, at least when our sample is of sufficient size. We also wish to know what type of association, if any, exists. Frequently a scatter-diagram of the sample data provides a clue. In Fig. 6.3.1 (a) and (b) we have scatter diagrams of samples from populations in which the variates are linearly related. If there is a fairly well-defined locus of maximum "dot-density" in the diagram, and if, when we increase the sample size, this locus "condenses", as it were, more and more to a curve, we may reasonably suspect this curve to be the smudged reflection of a functional relationship between the variates in the population, the smudging resulting from the hazards of random sampling. If, however, the dots do not appear to cluster around or condense towards some fairly definitely indicated curve, and yet are not distributed at random all over the range of the
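Identity (6.4.6) can be checked numerically. A minimal sketch, again on an invented frequency table (not data from the text):

```python
# Numerical check of (6.4.6): cov(x, y) = m'_11 - m'_10 * m'_01.
freq = {(1, 1): 2, (1, 2): 3, (2, 1): 1, (2, 2): 4}   # invented f_ij table
N = sum(freq.values())

def m(r, s):
    """Moment of order r in x and s in y about the origin."""
    return sum(f * x**r * y**s for (x, y), f in freq.items()) / N

x_bar, y_bar = m(1, 0), m(0, 1)

# Covariance computed directly from deviations about the means ...
cov_direct = sum(f * (x - x_bar) * (y - y_bar)
                 for (x, y), f in freq.items()) / N
# ... and via the moment identity (6.4.6).
cov_moment = m(1, 1) - x_bar * y_bar
```

The two computations agree to rounding error, as the algebra above guarantees.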

[Fig. 6.3.1 (a). Scatter Diagram (I): linear regression, r positive. Fig. 6.3.1 (b). Scatter Diagram (II): linear regression, r negative.]

[Fig. 6.3.1 (c). Scatter Diagram (III). Fig. 6.3.1 (d). Scatter Diagram (IV).]

sample, occupying rather a fairly well-limited region, we then say that the chances are that the variates are statistically related: while we cannot assume a functional relationship, the variates do nevertheless tend to vary together in a rough sort of way, perhaps as a result of the operation of unknown or unspecified factors (as, for example, in 6.3.1 (c)). It may be, however, that the scatter-diagram is such that the dots are pretty uniformly distributed over the whole of the sample range and exhibit no tendency to cluster around a curve or to occupy a limited region. In this case we may suspect that there is no association between the variates, which, if this were so, would be called statistically independent (Fig. 6.3.1 (d)).

We cannot rest content with such a purely qualitative test, however, and must devise a more sensitive, analytical technique. Let the mean of the y-array corresponding to the value x = x_i be ȳ_i, and the mean of the x-array corresponding to y = y_j be x̄_j. (In diagrams it is customary to denote the means of x-arrays by small circles and those of y-arrays by small crosses.) Now it is reasonable to assume that if there is some tendency for x and y to vary together, functionally or statistically, it will be more evident if we plot the mean value of each y-array against the corresponding value of x, and the mean value of each x-array against the corresponding value of y. If we plot the set of points (x_i, ȳ_i) and the set (x̄_j, y_j), we shall find that in general each set will suggest a curve along which, or near which, the component points of that set lie. Increasing the sample size will generally tend more clearly to define these curves. We call these curves regression curves and their equations regression equations: the curve suggested by the set of points (x_i, ȳ_i) is the regression curve of y on x; that suggested by the set of points (x̄_j, y_j) is the regression curve of x on y. The former gives us some idea of how y varies with x, the latter some idea of how x changes with y. And it is intuitively fairly obvious that if there is a direct functional relationship between the variates in the population sampled, these two regression curves will tend to coincide. If the regression curves are straight lines, we say that the regression is linear; if not, the regression is curvilinear.

6.6. Linear Regression and the Correlation Coefficient. To begin with we confine ourselves to considering the case where regression is linear. Clearly, although the set of points (x_i, ȳ_i) may tend to lie on a straight line, they do not in general do so exactly, and our problem is to find that line about which they cluster most closely. Assume

the line to be ȳ_i = Ax_i + B, where A and B are constants to be determined accordingly. To do this we use the Method of Least Squares. The value of ȳ_i estimated from the line for x = x_i is Ax_i + B, and the difference between the actual value ȳ_i and this estimated value is ȳ_i − Ax_i − B. Now to each x_i there correspond f_i. values of y. To allow for this fact we form the sum

  S_y² = (1/N) Σ_i f_i. (ȳ_i − Ax_i − B)² . . . (6.6.1)

Since all the terms making up S_y² are positive, S_y² = 0 if and only if all the means of the y-arrays lie on the line; thus S_y² would appear to be a satisfactory measure of the overall discrepancy between the set of points (x_i, ȳ_i) and this theoretical straight line. The "best" line we can draw, using this criterion (there are others), will, then, be that line for which S_y² is a minimum. Now S_y² is a function of the two quantities A and B. To find the values of A and B which minimise S_y², we equate to zero the two partial derivatives of S_y² with respect to A and B (see Abbott, Teach Yourself Calculus, Chapter XVIII, for an introduction to partial differentiation). Thus we have

  ∂(S_y²)/∂A = −(2/N) Σ_i f_i. (ȳ_i − Ax_i − B)x_i = 0 . . . (6.6.2)

and

  ∂(S_y²)/∂B = −(2/N) Σ_i f_i. (ȳ_i − Ax_i − B) = 0 . . . (6.6.3)

Remembering that f_i. ȳ_i = Σ_j f_ij y_j, (6.6.2) may be written

  Σ_i Σ_j f_ij x_i y_j − A Σ_i Σ_j f_ij x_i² − B Σ_i Σ_j f_ij x_i = 0 . . . (6.6.4)

Dividing (6.6.3) through by N = Σ_i Σ_j f_ij, we have

  ȳ = Ax̄ + B . . . (6.6.5)

showing that the mean of the sample, (x̄, ȳ), lies on the line. Again, dividing (6.6.4) by N, this is

  m_11' − A m_20' − B x̄ = 0 . . . (6.6.6)

Solving for A between (6.6.5) and (6.6.6), we have

  A = (m_11' − x̄ȳ)/(m_20' − x̄²) = s_xy/s_x² . . . (6.6.8)

Finally, subtracting (6.6.5) from ȳ_i = Ax_i + B, we have

  ȳ_i − ȳ = (s_xy/s_x²)(x_i − x̄) . . . (6.6.9)

as the line about which the set of means of the y-arrays cluster most closely, the line of regression of y on x. The line of regression of x on y is immediately obtained by interchanging the roles of x and y; thus

  x̄_j − x̄ = (s_xy/s_y²)(y_j − ȳ) . . . (6.6.9a)

If, therefore, we transfer our origin to the mean of the sample distribution and measure the deviation of each variate in terms of the standard deviation of that variate as unit, i.e. if we put Y = (y − ȳ)/s_y and X = (x − x̄)/s_x, the equation of our regression line of y on x is

  Y = (s_xy/s_x s_y)X . . . (6.6.10)

and this may be further simplified by putting

  r = s_xy/s_x s_y . . . (6.6.11)

Thus

  Y = rX . . . (6.6.12)

and in this form the regression line of Y on X gives us a measure, r, of the change in Y for unit change in X. Likewise, the regression line of X on Y is X = rY, i.e.

  Y = (1/r)X . . . (6.6.12a)

We now have our two regression lines, one with gradient r, the other with gradient 1/r, each passing through the mean of the sample distribution. The angle θ between them is obtained by using a well-known formula of trigonometry (see Abbott, Teach Yourself Trigonometry, p. 101):

  tan θ = (1/r − r)/(1 + (1/r) × r) = (1 − r²)/2r . . . (6.6.13)
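These formulae can be sketched for the simple unweighted case (every f_ij = 1). A minimal Python illustration on invented data, showing that the standardized gradient of the regression line of y on x is exactly r:

```python
import math

# Regression gradient and correlation coefficient from raw paired
# observations (unweighted case; the data are invented).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.0]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs) / n           # s_x^2
s_yy = sum((y - y_bar) ** 2 for y in ys) / n           # s_y^2
s_xy = sum((x - x_bar) * (y - y_bar)
           for x, y in zip(xs, ys)) / n                # covariance

A = s_xy / s_xx                    # gradient of y on x, as in (6.6.8)
r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient, (6.6.11)
```

In standardized variates the gradient A becomes A × s_x/s_y, which equals r, exactly as (6.6.12) asserts.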

We see immediately that if r = ±1, tan θ = 0, θ = 0 and the two lines coincide. This means that in the sample the two variates x and y are functionally related by a linear relationship, and in the population which has been sampled it may also be the case. We say, then, that the two variates are perfectly correlated, positively if r = +1 and negatively if r = −1. On the other hand, if r = 0, θ = 90°, and there is no linear functional relationship between the variates in the sample and hence, probably, little or none in the parent population: the variates are uncorrelated. It is natural, therefore, to regard r as a measure of the degree to which the variates are related in the sample by a linear functional relationship. We accordingly call r the sample coefficient of product-moment correlation of x and y or, briefly, the sample correlation-coefficient.

The gradient of the line given by (6.6.9), the regression line of y on x, is s_xy/s_x² or, when regression is linear or assumed to be linear, cov (x, y)/var (x). This quantity, called the sample coefficient of regression of y on x, is denoted by b_yx. Similarly, cov (x, y)/var (y) is the coefficient of regression of x on y and is denoted by b_xy. It follows that

  b_yx = s_xy/s_x², b_xy = s_xy/s_y², and b_yx · b_xy = s_xy²/s_x²s_y² = r² . . . (6.6.14)

It should be noted that regression is a relation of dependence and is not symmetrical, while correlation is one of interdependence and is symmetrical.

6.7. Standard Error of Estimate. We must now examine r in a little more detail, so as to substantiate our statement that it is a measure of the degree to which the association between x and y in the sample tends toward a linear functional relationship. Taking the mean of the distribution as our origin of co-ordinates, we may write the equation of the line of regression of y on x in the form y = b_yx x. We now form the sum of the squared deviations, not of the array-means from the corresponding values predicted from this equation, but of all the points (x_i, y_j), from the points on the line y = b_yx x corresponding to x = x_i. Remembering that the frequency of (x_i, y_j) is f_ij, the total sum of squared deviations will be

  Σ_i Σ_j f_ij (y_j − b_yx x_i)² = N S_y², say.

Then

  N S_y² = Σ_i Σ_j f_ij y_j² − 2b_yx Σ_i Σ_j f_ij x_i y_j + b_yx² Σ_i Σ_j f_ij x_i²
         = N s_y² − 2b_yx N s_xy + b_yx² N s_x² . . . (6.7.1)

Since b_yx = s_xy/s_x², we have

  N S_y² = N s_y² − N s_xy²/s_x² = N s_y²(1 − s_xy²/s_x²s_y²)

or

  S_y² = s_y²(1 − r²) . . . (6.7.2)
  r² = 1 − S_y²/s_y² . . . (6.7.3)

S_y is called the Standard Error of Estimate of y from (6.6.9), the line of regression of y on x. Likewise, S_x, where S_x² = s_x²(1 − r²), is called the Standard Error of Estimate of x. Since S_y² is the mean of a sum of squares, it can never be negative; (6.7.3) shows, therefore, that r² cannot be greater than 1. But if r = ±1, S_y² = 0, and so every deviation (y_j − b_yx x_i) = 0; i.e. every point representing an observed value (x_i, y_j), and not just the points representing the array-means, lies on the regression line of y upon x. Likewise, when r = ±1, the line of regression of y on x and that of x on y coincide, and all the points representing the different observed values (x_i, y_j) lie on a single line. There is then a straight-line relationship between the variates in the sample, and the correlation in the sample is perfect. We see, then, that the nearer r approaches unity, the more closely the observed values cluster about the regression lines and the closer these lines lie to each other; r is therefore a measure of the extent to which any relationship there may be between the variates tends towards linearity.
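The identity S_y² = s_y²(1 − r²) can be verified numerically on the unweighted case. A minimal Python sketch with invented data, measuring deviations from the sample mean as in the text:

```python
import math

# Check of (6.7.2): the residual mean square about the regression line
# of y on x equals s_y^2 (1 - r^2).  Invented data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
X = [x - x_bar for x in xs]           # deviations from the mean
Y = [y - y_bar for y in ys]

s_x2 = sum(v * v for v in X) / n
s_y2 = sum(v * v for v in Y) / n
s_xy = sum(a * b for a, b in zip(X, Y)) / n
b_yx = s_xy / s_x2
r2 = s_xy ** 2 / (s_x2 * s_y2)

# Residual mean square about y = b_yx x (origin at the mean):
S_y2 = sum((yv - b_yx * xv) ** 2 for xv, yv in zip(X, Y)) / n
```

The agreement is exact (to rounding), since the cross term in the expansion collapses just as in the derivation above.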

Exercise: Show that the line of regression of y on x as defined above is also the line of minimum mean square deviation for the set of points (x_i, y_j).

6.8. Worked Example. The marks, x and y, gained by 1,000 students for theory and laboratory work respectively, are grouped with common class-intervals of 5 marks for each variable, the frequencies for the various classes being shown in the correlation-table below. The values of x and y indicated are the mid-values of the classes. Show that the coefficient of correlation is 0.68 and the regression equation of y on x is y = 29.7 + 0.656x. (Weatherburn.)

[Correlation table: theory marks x (mid-values 42, 47, 52, 57, 62, 67, 72, 77, 82) against laboratory marks y (mid-values 52, 57, 62, 67, 72, 77, 82, 87); N = 1,000. The array totals are, for the x-arrays, 26, 97, 226, 244, 189, 137, 55, 21, 5, and, for the y-arrays, 35, 103, 192, 263, 214, 130, 46, 17. The individual cell frequencies are given in the printed table.]

Treatment: (1) To simplify working we select a working mean and new units. We take our working mean at (57, 67) and, since the class-interval for both variates is 5 marks, we take new variates:

  X = (x − 57)/5, Y = (y − 67)/5.

What will be the effect of these changes on our calculations? Consider a general transformation, x = a + cX, y = b + dY. We have

  x̄ = a + cX̄, ȳ = b + dȲ . . . (6.8.1)

Also

  s_x² = c²s_X² and, likewise, s_y² = d²s_Y² . . . (6.8.2)

and, since

  s_xy = (1/N) Σ_i Σ_j f_ij (x_i − x̄)(y_j − ȳ) = cd (1/N) Σ_i Σ_j f_ij (X_i − X̄)(Y_j − Ȳ) = cd s_XY,

we have

  r = s_xy/s_x s_y = cd s_XY/(cs_X × ds_Y) = s_XY/s_X s_Y . . . (6.8.3)

Again,

  b_yx = s_xy/s_x² = cd s_XY/c²s_X² = (d/c) b_YX

and, similarly, b_xy = (c/d) b_XY . . . (6.8.4)

We conclude, therefore, that such a transformation of variates does not affect our method of calculating r, while, if, as in the present case, the new units are equal (here c = d = 5), the regression coefficients may also be calculated directly from the new correlation table.

(2) We now set out the table on page 109.

(3) (i) X̄ = 0.239; Ȳ = 0.177. Consequently, x̄ = 57 + 5 × 0.239 = 58.195, and ȳ = 67 + 5 × 0.177 = 67.885.

(ii) (1/N) Σ_i Σ_j f_ij X_i² = 2.541.
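The invariance of r under such a coding, (6.8.3), is easy to check in code. A minimal Python sketch (the marks below are invented; the coding X = (x − 57)/5, Y = (y − 67)/5 is the one used in the text):

```python
import math

def corr(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / n
    sx2 = sum((x - xb) ** 2 for x in xs) / n
    sy2 = sum((y - yb) ** 2 for y in ys) / n
    return sxy / math.sqrt(sx2 * sy2)

# Invented marks, then the text's coding with c = d = 5.
xs = [42, 47, 52, 57, 62, 67, 72]
ys = [50, 58, 55, 66, 70, 75, 74]
X = [(x - 57) / 5 for x in xs]
Y = [(y - 67) / 5 for y in ys]
```

corr(xs, ys) and corr(X, Y) agree to rounding error, since a change of origin and (positive) unit cancels in numerator and denominator.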

  s_X² = (1/N) Σ_i Σ_j f_ij X_i² − X̄² = 2.541 − (0.239)² = 2.484, and s_X = 1.576.

Consequently, although we do not require it here, s_x = 5 × 1.576 = 7.880.

(iii) (1/N) Σ_i Σ_j f_ij Y_j² = 2.339, so s_Y² = 2.339 − (0.177)² = 2.308 and s_Y = 1.519. Consequently, s_y = 5 × 1.519 = 7.595.

(iv) (1/N) Σ_i Σ_j f_ij X_i Y_j = 1.671, so s_XY = 1.671 − (0.239)(0.177) = 1.629, giving s_xy (not here required) = 5² × 1.629 = 40.725.

(v) The correlation coefficient, r, is given by

  r = s_XY/s_X s_Y = 1.629/(1.576 × 1.519) = 0.679.

(vi) The line of regression of y on x is given by y − ȳ = b_yx(x − x̄). But b_yx = s_XY/s_X² = 1.629/2.484 = 0.656. Hence the required equation is

  y = 0.656x + 29.696.

Exercise: For the data of the above example, find the regression equation of x on y and the angle between the two regression lines. Use 6.6.13 and the value of tan θ to find r.

Check and Alternative Method of Finding the Product-moment: It will be noticed that along each diagonal line running from top right to bottom left of the correlation table, the value of X + Y is constant. Thus along the line from X = 4, Y = −3 to X = −3, Y = 4, X + Y = 1. Likewise, along each diagonal line from top left to bottom right, the value of X − Y is constant. For the line running from X = −3, Y = −3 to X = 4, Y = 4, for example, X − Y = 0.
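Steps (i)-(vi) above are pure arithmetic on the coded totals and can be replayed in a few lines. A Python sketch (the totals are those established in the text; the small discrepancies with the printed 0.679 and 29.696 come from the text's intermediate rounding):

```python
import math

# Replay of the worked example from the coded totals:
# N = 1,000, sum fX = 239, sum fY = 177, sum fX^2 = 2,541,
# sum fY^2 = 2,339, sum fXY = 1,671.
N = 1000
sum_X, sum_Y = 239, 177
sum_X2, sum_Y2, sum_XY = 2541, 2339, 1671

X_bar, Y_bar = sum_X / N, sum_Y / N
s_X2 = sum_X2 / N - X_bar ** 2          # ~2.484
s_Y2 = sum_Y2 / N - Y_bar ** 2          # ~2.308
s_XY = sum_XY / N - X_bar * Y_bar       # ~1.629

r = s_XY / math.sqrt(s_X2 * s_Y2)       # ~0.68
b_yx = s_XY / s_X2                      # same in original units, c = d = 5

# Back in original units: x = 57 + 5X, y = 67 + 5Y.
x_bar = 57 + 5 * X_bar                  # 58.195
y_bar = 67 + 5 * Y_bar                  # 67.885
intercept = y_bar - b_yx * x_bar        # ~29.7
```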

[Computation table (page 109): for each coded value X_i = −3, . . ., 5 and Y_j = −3, . . ., 4 the table sets out the cell frequencies f_ij, the array totals, and the products f_i.X_i, f_i.X_i² and Σ_j f_ij X_i Y_j. Its grand totals give N = 1,000, Σ f X = 239, Σ f Y = 177, Σ f X² = 2,541, Σ f Y² = 2,339 and Σ f XY = 1,671.]

Now

  Σ_i Σ_j f_ij (X_i + Y_j)² = Σ_i Σ_j f_ij X_i² + Σ_i Σ_j f_ij Y_j² + 2 Σ_i Σ_j f_ij X_i Y_j

and

  Σ_i Σ_j f_ij (X_i − Y_j)² = Σ_i Σ_j f_ij X_i² + Σ_i Σ_j f_ij Y_j² − 2 Σ_i Σ_j f_ij X_i Y_j.

If then, as in the present case, the entries in the table cluster around the leading diagonal, we may use the second of these identities to find the product-moment of X and Y. Tabulating, we have

  X_i − Y_j :         −3    −2    −1     0     1     2     3
  f :                  17    85   209   346   217   103    23
  (X_i − Y_j)² :        9     4     1     0     1     4     9
  f(X_i − Y_j)² :     153   340   209     0   217   412   207

From the table, Σ Σ f_ij (X_i − Y_j)² = 1,538; and Σ Σ f_ij X_i² = 2,541, Σ Σ f_ij Y_j² = 2,339. Therefore

  Σ Σ f_ij X_i Y_j = ½(2,541 + 2,339 − 1,538) = 1,671.

If the entries cluster about the other diagonal, the first of the above two identities is the more convenient with which to work.

6.9. Rank Correlation. Suppose we have n individuals which, in virtue of some selected characteristic A, may be arranged in order, so that to each individual a different ordinal number is assigned. The n individuals are then said to be ranked according to the characteristic A, and the ordinal number assigned to an individual is its rank. For example, "seeded" entries for the Wimbledon lawn-tennis championships are ranked: they are "seeded" 1, 2, 3 (first, second and third) and so on. The concept of rank is useful in the following ways:

(a) We may reduce the arithmetic involved in investigating the correlation between two variates if, for each variate separately, we first rank the given values and then calculate the product-moment correlation coefficient from these rank-values. In this way we have an approximation to the correlation coefficient, r.

(b) We may wish to estimate how good a judge of some characteristic of a number of objects a man is by asking
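The diagonal check just described is a one-line computation. A Python sketch using the tabulated distribution of X − Y from the text:

```python
# Check of the diagonal identity:
#   sum f XY = ( sum f X^2 + sum f Y^2 - sum f (X - Y)^2 ) / 2,
# with the tabulated frequencies of X - Y from the text.
diff_freq = {-3: 17, -2: 85, -1: 209, 0: 346, 1: 217, 2: 103, 3: 23}

sum_sq_diff = sum(f * d * d for d, f in diff_freq.items())   # 1,538
sum_X2, sum_Y2 = 2541, 2339
product_moment = (sum_X2 + sum_Y2 - sum_sq_diff) // 2        # 1,671
```

The frequencies of X − Y total 1,000, as they must, and the recovered product-moment agrees with the direct computation of 6.8.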

him to rank them and then comparing his ranking with some known objective standard.

(c) We may wish to investigate the degree of agreement between two judges of the relative merits of a number of objects each possessing some characteristic for which there is no known objective standard of measurement (e.g., two judges at a "Beauty Competition").

Assume a set of n individuals, a_1, a_2, . . . a_n, each possessing two characteristics x and y, such that the n individuals may be ranked according to x and, separately, according to y. Let the rankings be as follows:

  Individual :  a_1  a_2  . . .  a_i  . . .  a_n
  x-Rank :      x_1  x_2  . . .  x_i  . . .  x_n
  y-Rank :      y_1  y_2  . . .  y_i  . . .  y_n

Here x_1, x_2, . . . x_i, . . . x_n are the numbers 1, 2, 3, . . . n in some order without repetitions or gaps. Likewise the y's. Now if there were perfect correlation between the rankings we should have, for all i, x_i = y_i. If this is not so, write x_i − y_i = d_i. Then

  Σ_i d_i² = Σ_i (x_i − y_i)² = Σ_i x_i² + Σ_i y_i² − 2 Σ_i x_i y_i.

But Σ_i x_i² = Σ_i y_i² = 1² + 2² + 3² + . . . + n², the sum of the squares of the first n natural numbers. Consequently, Σ_i x_i² = Σ_i y_i² = n(n + 1)(2n + 1)/6. Therefore

  Σ_i x_i y_i = ½(Σ_i x_i² + Σ_i y_i²) − ½ Σ_i d_i² = n(n + 1)(2n + 1)/6 − ½ Σ_i d_i².

Now cov (x, y) = (1/n) Σ_i x_i y_i − x̄ȳ; and

  x̄ = ȳ = (1 + 2 + 3 + . . . + n)/n = (n + 1)/2,
  var x = var y = (1/n)(1² + 2² + . . . + n²) − ȳ² = (n + 1)(2n + 1)/6 − (n + 1)²/4 = (n² − 1)/12.

So

  cov (x, y) = (n + 1)(2n + 1)/6 − (Σ_i d_i²)/2n − (n + 1)²/4 = (n² − 1)/12 − (Σ_i d_i²)/2n,

and the product-moment correlation coefficient of the two sets of ranks is

  R = cov (x, y)/√(var x · var y) = cov (x, y)/var x = 1 − 6(Σ_i d_i²)/(n³ − n).

This is Spearman's coefficient of rank correlation. If Σ_i d_i² = 0, R = 1, and there is perfect correlation by rank (since d_i = 0, i.e. x_i = y_i, for all i). What happens, however, if the two rankings are exactly the reverse of one another? This is the most unfavourable case, and Σ_i d_i² is then a maximum. In this case x_i + y_i, for all i, will be equal to n + 1, so that Σ_i (x_i + y_i)² = n(n + 1)², or

  Σ_i x_i² + 2 Σ_i x_i y_i + Σ_i y_i² = n(n + 1)²,

giving Σ_i x_i y_i = n(n + 1)²/2 − n(n + 1)(2n + 1)/6 = n(n + 1)(n + 2)/6. Consequently

  cov (x, y) = (n + 1)(n + 2)/6 − (n + 1)²/4 = −(n² − 1)/12, and R = −1.

Thus R varies between the two limits ±1.

Worked Example 1: The figures in the following table give the number of criminal convictions (in thousands) and the numbers unemployed (in millions) for the years 1924-33. Find the coefficient of rank-correlation.

  Year :                       1924   1925   1926   1927   1928   1929   1930   1931   1932   1933
  Number convicted of crime :  7.88   8.12   7.86   7.25   7.44   7.22   8.28   8.83   10.64  9.46
  Number of unemployed :       1.26   1.24   1.43   1.19   1.33   1.34   2.5    2.67   2.78   2.26

Treatment: We rank the data, thus:

  Year :                 1924  1925  1926  1927  1928  1929  1930  1931  1932  1933
  Number convicted :      6     5     7     9     8    10     4     3     1     2
  Number unemployed :     8     9     5    10     7     6     3     2     1     4
  d_i² :                  4    16     4     1     1    16     1     1     0     4
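Spearman's formula is readily sketched in code. A minimal Python illustration using the data of Worked Example 1 (ranks are assigned with 1 for the largest value, as in the treatment; the helper assumes no ties):

```python
def ranks_descending(values):
    """Rank 1 for the largest value, as in the worked example (no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(rx, ry):
    """Spearman's R = 1 - 6 sum d^2 / (n^3 - n) for two rankings."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n ** 3 - n)

# Data of Worked Example 1 (convictions in thousands; unemployed in millions).
convictions = [7.88, 8.12, 7.86, 7.25, 7.44, 7.22, 8.28, 8.83, 10.64, 9.46]
unemployed = [1.26, 1.24, 1.43, 1.19, 1.33, 1.34, 2.5, 2.67, 2.78, 2.26]

R = spearman(ranks_descending(convictions), ranks_descending(unemployed))
```

The computed ranks reproduce the table above, and R comes out at 0.709 as in the text; a ranking against its exact reverse gives R = −1, confirming the limiting case derived above.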

We have Σ d_i² = 48 and, since n = 10, n³ − n = 990. Consequently,

  R = 1 − 6 × 48/990 = 0.709.

Exercise: Find r, the product-moment correlation coefficient, for the above data.

Worked Example 2: Two judges in a baby-competition rank the 12 entries as follows:

  X :  1   2   3   4   5   6   7   8   9  10  11  12
  Y : 12   9   6  10   3   5   4   7   8   2  11   1

What degree of agreement is there between the judges?

Treatment: Here we have no objective information about the babies, but the coefficient of rank correlation will tell us something about the judges. We have Σ d_i² = 416 and n³ − n = 1,716, so R = −0.455, indicating that the judges have fairly strongly divergent likes and dislikes where babies are concerned!

6.10. Kendall's Coefficient. A second coefficient of rank correlation has been suggested by M. G. Kendall. Consider the rankings in Example 2 above. If we take the first number of the second ranking, 12, and form the pair (12, b) with each succeeding number b, we shall have 11 number-pairs. To each pair (a, b) allot the score +1 if a < b, and the score −1 if a > b. For the 11 pairs (12, b), every score is −1, and the total score for this set is −11. For the 10 pairs (9, b), the scores are −1 + 1 − 1 − 1 − 1 − 1 − 1 − 1 + 1 − 1, totalling −6. Continuing in this way, we obtain the following scores, one for each starting number:

  −11, −6, −1, −6, 3, 0, 1, 0, −1, 0, −1,

the total of which is S = −22. Had the numbers been in their natural order, as in the upper ranking, the total score would have been 11 + 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 66. The Kendall rank correlation coefficient, T, is the ratio of the actual to the maximum score; in this case

  T = −22/66 = −1/3.

The Spearman coefficient, R, for the same data was −0.455.

Generally, if there are n individuals in the ranking,

  T = 2S/n(n − 1) . . . (6.10.1)

where S is the actual score calculated according to the method used above. Like Spearman's R, Kendall's T is +1 when the correspondence between the rankings is perfect, and −1 only if one is the inverse of the other. When n is large, T is about 2R/3.

A shorter method of calculating S is this. In the second ranking, the figure 1 has 0 numbers to its right and 11 to its left: allot the score 0 − 11 = −11 and cross out the 1. The figure 2 has 1 number to its right and 9 uncrossed numbers to its left: the score allotted is therefore 1 − 9 = −8; cross out the 2. Continue in this way; the scores so obtained again total S = −22.

Alternatively, we may set down the two rankings one above the other, one of them in natural order, and join 1 to 1, 2 to 2, 3 to 3, and so on. If we then count the number of intersections, N (care must be taken not to allow any two such intersections to coincide), S will be given by S = n(n − 1)/2 − 2N, and therefore

  T = 1 − 4N/n(n − 1) . . . (6.10.2)

N being the number of inversions of the natural order.

Worked Example: Show that the value of T between the natural order 1, 2, 3, . . . 10 and the ranking

  7, 10, 4, 1, 6, 8, 9, 5, 2, 3

is −0.24, and that for the second ranking given in the text T = 0.60. [The second ranking is illegible in this copy.] Find also T between the two rankings as they stand. (Modified from M. G. Kendall, Advanced Theory of Statistics, Vol. I, p. 437.)

Treatment: (1) Consider the first ranking. Using the short method of calculating S, we have

  S = (6 − 3) + (1 − 7) + (0 − 7) + (4 − 2) + (0 − 5) + (2 − 2) + (3 − 0) + (1 − 1) + (0 − 1) + (0 − 0) = −11.

Hence T = 2 × (−11)/90 = −11/45 = −0.24.

(2) For the second ranking, the alternative method of (6.10.2) gives N = 9 intersections, so that

  T = 1 − 4 × 9/90 = 3/5 = 0.60.

(3) To find T between the two rankings as they stand, rearrange both so that one is in the natural order (here it is easier to put the second in that order) and proceed as before; this gives S = −5 and T = −1/9.

Exercise: Show that R between the natural order 1, 2, . . . 10 and the above two rankings has the values −0.37 and 0.45 respectively, and that between the two rankings as they stand R = −0.19.

6.11. Coefficient of Concordance. Frequently we need to investigate the degree of concordance between more than two rankings. Suppose, for example, we have the following 3 rankings:

  X :  1   2   3   4   5   6   7   8   9  10
  Y :  7  10   4   1   6   8   9   5   2   3
  Z :  9   6  10   3   5   4   7   8   2   1

Summing the columns, we have the sums

  17  18  17  8  16  18  23  21  13  14.

The mean of each ranking is (n + 1)/2, so, if there are m rankings, the mean of the column sums will be m(n + 1)/2. Had there been perfect concordance, the sums would have been some arrangement of m, 2m, 3m, . . . nm; in the present case,

  3  6  9  12  15  18  21  24  27  30,

and the variance of these numbers would then have been a maximum, m²(n² − 1)/12. But when, as in the present case, there is little concordance, the variance of the sums is small. It is reasonable, therefore, to take the ratio of the variance of the actual sums to the variance in the case of perfect concordance as a measure of rank-concordance.
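The pair-scoring procedure of 6.10 can be sketched directly. A minimal Python illustration, scoring Judge Y's ranking from Worked Example 2 against the natural order (no ties assumed):

```python
def kendall_tau(ranking):
    """Kendall's T of a ranking against the natural order 1..n (no ties).

    Each pair scores +1 if it appears in natural order, -1 if inverted;
    T = (total score) / (n(n-1)/2), as in (6.10.1).
    """
    n = len(ranking)
    score = sum(1 if ranking[j] > ranking[i] else -1
                for i in range(n) for j in range(i + 1, n))
    return 2 * score / (n * (n - 1))

# Judge Y's ranking from Worked Example 2: total score -22 over 66 pairs.
y_ranking = [12, 9, 6, 10, 3, 5, 4, 7, 8, 2, 11, 1]
tau = kendall_tau(y_ranking)
```

The extremes behave as the text states: the natural order gives T = +1 and its exact inverse gives T = −1, while the judges' ranking gives T = −1/3.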

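The ratio-of-variances idea behind the coefficient of concordance can be checked numerically. A Python sketch on the three rankings X, Y, Z given above, using W = 12S/m²n(n² − 1), where S is the sum of squared deviations of the column sums from their mean:

```python
# Coefficient of concordance for the three rankings of 6.11.
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [7, 10, 4, 1, 6, 8, 9, 5, 2, 3]
Z = [9, 6, 10, 3, 5, 4, 7, 8, 2, 1]

m, n = 3, 10
sums = [a + b + c for a, b, c in zip(X, Y, Z)]   # column sums
mean = m * (n + 1) / 2                           # 16.5
S = sum((s - mean) ** 2 for s in sums)           # 158.5
W = 12 * S / (m ** 2 * n * (n ** 2 - 1))         # ~0.2134
```

The column sums, S = 158.5 and W ≈ 0.213 all agree with the figures quoted in the text, and W lies between 0 and 1 as required.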
we again use the Method of Least Squares to determine the coefficients.1) where the coefficients ar.2) Exercise : Verify that (6. We define the coefficient of concordance.2) holds in the case of the three rankings given at the beginning of this section. let y. by W = {S/n)/m*{n* . it will be recalled. If the regression equation of y on x (or x on y) is of this form. f.116 STATISTICS Let S be the sum of the squared deviations of the actual sums from their mean. (6. n = 10 and W = 12 X 158-5/9 X 990 = 0-2134 I t may be shown (see Kendall.12. which. The simplest type of non-linear equation is t h a t in which one of t h e variates is a polynomial function of the other. . . m = 3.2. Referring to Table 6. Or. y — a0 + axx -j.1) . 1. In the case of the three rankings given. t h e exception rather t h a n t h e rule.1 )/(m .11. i?aT. however. = (mW . ft. 2. 1. the coefficient of correlation. are not all zero.1) .1) Clearly. 6.2. vol. or predicted. m(n -f l)/2. ft). (If the data are not grouped . If. 411) t h a t if i?av. . 1. (r = 0. be the calculated. of the polynomial. (6. . p. .12. . + a^xk = S a / r= 0 . .11. although they are important exceptions.6. t h e line about which these points tend to cluster most closely is usually curved rather than straight. (6. we plot yi against Xi (or Xj against yj). . . is a measure of the extent to which any relationship between the variates tends towards linearity. ft). Once we have decided upon the degree. value when * = xt is substituted in (6. viz. When this is the case.11. .1). denote the average of Spearman's R between all possible pairs of rankings. W varies between 0 and 1.a^x* + . between the rankings. be the actual mean of t h e x(-array. + Orxr + . (r = 0. and let Y. So far we have limited our discussion of regression to bivariate distributions where the regression curves were straight lines.. Advanced Theory of Statistics. W.1)/12 = 12S/m*n(n* . 2. Such distributions are.12. Polynomial Regression. 
is no longer a suitable measure of correlation. . we have polynomial regression. using the notation of 6.

dS'/da = 2 S (a + bu. To find the values of these quantities which minimise S 2 . t h a t value of y is.250 Find the parabolic regression of y on x.650 1.L a. + cu.300 {Weatherburn.) y 1. to zero.950 2.) i i i .*. This gives us K + 1 simultaneous equations in the o's. we differentiate S 2 partially with respect to each of these quantities and equate each partial derivative. k). 1. v = (y — 1. the normal equations.) The sum of t h e squared residuals.* — v. The following example illustrates the method when k = 2. .950 5 2.300 — Then— vu. Or. v = a + bu + cul and. u.5 -37 6 52 21 v. 16 1 0 1 16 34 y1.2 v.i/i)2.650 4 1. 1 2 3 4 5 -2 -1 0 1 2 0 4 1 0 1 4 10 -8 -1 0 1 8 0 M4. S 2 . from which the required coefficients may be determined. + c L u.(y< < Y. f y .650) /50. dS2ldOr = 0 (r = 0. Worked Example : The profits. . 1. .x.12. I . . -32 . .400 1.)* = S/. . 16 5 0 6 26 53 vu*.y r • (6.) i = 2(na + 6 S «. 8SildaT. S* = 2 (a + bu.400 3 1. itself t h e mean.250 1. of course. k).8 — 5 -13 6 13 6 For parabolic regression of v on u.MORE VARIATES THAN O N E IO105 Io3 and only one value of y corresponds to a given x. so.(r = 0. . is then given b y : S2 = £/.2) S 2 is thus a function of the A + 1 quantities.(y. Treatment: Put u = x — 3 . + cu(' . of a certain company in the xth year of its life are given by : X 1 2 1.

' = 2 (a E «.F<) + 2 2 fi0i i j Yi)*. + cu? .*/) . If.6 = 0. The correlation table being (6.)u.140 + 72* + 32-15*2 6.2) partially with respect t o a. S y l is the standard error of estimate of y from this equation. + 6 S « i . l l .4 .2) i > i i In the present example.e. the mean square deviation of the y's from the regression curve. the required regression equation is y = 1. .S v. k oi the polynomial to those of the data.x^y. - . + cu.£a. Then Yi = y(xi).21 = 0 giving a = — 0 086. ( .2 — v. 1.xiY. Least Squares and Moments. .0-086 + 5-3« + 0-643»2. b = 5-3. If we differentiate (6. 2 f.yi)(yt . let the regression equation of y on x be y = y(x). 2 . NS/ = 2 i j = 2 ^My. the normal equations 0S2/Sa.2 + 6 S M<s + c S w.u. 6. 10a + 34c . . .yl + y( .?«)" i j . for all r. = 2 Jt. dS'/dc = 0.. Yi)" = 2 2 fijiy. equating to zero. therefore. v = .(y. then. Correlation Ratios. are 5a + 10c ..)u. i. 106 — 53 = 0.u.12.) < < i » • dS'ldc = 2 S (a + bu. i i showing t h a t — The process of fitting a polynomial curve of degree k to a set of data by the method of Least Squares is equivalent to equating the moments of order 0. i = 2(aS«. Changing back to our old variates. c = 0-643. The regression equation of v on u is. + c S t ( .yi)2 + 2 2 .118 STATISTICS 8S*l8b = 2 S (a + bu.2).2. SS'jSb.v.£ v. we have 8S*/8ar = Zft..2x() i r and.14.13.

When this is the case.7. Sv2 is the mean value of the variance of y in each #-array.2).MORE VARIATES THAN ONE IO105 Let fH = fi sa S fij.3) S„' ! = s„ 2 (l . Now let Sy'2 be the mean square deviation of the y's from the mean of their respective arrays. we write this where eyx = SylSy (6. (yi — Yi) = 0. — yi) = 0.)* = S Z f o ? . Consequently.y. the total frequency in the *<th array. each is equal to Sy2.2 (1 — r*) and so the standard deviation of each array is s„ (1 — r2)i.fii(yt i j .14.'2 = s. i J NSy'2 = £ Xfijyf > > i £ £ Mi2 • j i . i.y2 i i . since 2 fij(y.2 £ mzfvy. therefore. 2 . is the correlation ratio of x on y.Nm0i' = Nsy* - - £ rnyi2 = Nsv2 + Ny2 — £ run* (£ «iy. for all i.14. i NSy2 = i + S«i(y< i Yi)* .1) It follows t h a t if all the means of the ^-arrays lie on y = y(x). (6.2 £ £ t fofi. Likewise e-cy = SxJsr. and. If the regression is also linear.)] i + SS/(jy(« i j + S £ ftiy? i i i But £ f i f f j = yi £ fij.4) ana is called the correlation ratio of y on x.14. Then NS. • .«***) • . Then.:a — Ny2) i But y is the mean of the array-means. taken over all such arrays. • (6. By analogy with (6. the regression of y on * is said to be homoscedastic (equally scattered).e. i and sVti2. . the variance of the y's in the same array. Since both S y ' 2 and . if all the variances are equal. which we shall denote by So S.'2 = S Y. Sy2 = Sj. S i J = £ •Zf.sg2 .. therefore the expression in brackets is N times the variance of the means of the arrays. yi.2) (6.14.

Since both Sᵧ′² and sᵧ² are positive, 0 ≤ Sᵧ′² ≤ sᵧ², and, since the mean-square deviation of a set of quantities from their mean is a minimum, Sᵧ′² ≤ sᵧ²(1 − r²). Therefore

0 ≤ 1 − e²ᵧₓ ≤ 1 − r², or r² ≤ e²ᵧₓ ≤ 1 . . . (6.14.5)

Furthermore, when e²ᵧₓ = 1, Sᵧ′² = 0, and all the points (xᵢ, yⱼ) lie on the curve of means, the regression curve y = y(x): there is an exact functional relationship between the variates. Since we regard r as a measure of the degree to which any association between the variates tends towards linearity, and since the residual dispersion, 1 − r², is 0 when r² = 1 and 1 when r = 0, a non-zero value of e²ᵧₓ − r² may be regarded tentatively as a measure of the degree to which the regression departs from linearity. (But see 10.8(2).)

That eᵧₓ is a correlation measure will be appreciated by noting that, since deviation from the mean is unchanged by change of origin, and since both numerator and denominator of e²ᵧₓ involve only squares, which are changed in the same ratio by any change of unit employed, eᵧₓ is unchanged by change of unit and origin.

6.15. Worked Example. Calculate the correlation ratio eᵧₓ for the data of 6.8.

Treatment: (1) Let T(y) be the sum of the y's in the distribution and Tᵢ(y) the sum of the y's in the xᵢ-array. Then T(y) = Σᵢ Tᵢ(y), N ȳ = T(y), so that N ȳ² = [T(y)]²/N, and Σᵢ nᵢ ȳᵢ² = Σᵢ [Tᵢ(y)]²/nᵢ. Hence

e²ᵧₓ = (Σᵢ [Tᵢ(y)]²/nᵢ − [T(y)]²/N) / N sᵧ² . . . (6.15.1)

Since eᵧₓ is unchanged by change of unit and origin, we may work with the variables X and Y in our table of 6.8(2). From that table, T(y) is the total of row F, and Σᵢ [Tᵢ(y)]²/nᵢ is the sum of those quantities each of which is the square of a term in row F divided by the corresponding term in row E.
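The computation of e²ᵧₓ from grouped data can be sketched in Python. The arrays below are hypothetical illustrative data, not the data of 6.8; the sketch computes e²ᵧₓ = 1 − Sᵧ′²/sᵧ² directly from the array means and compares it with r²:

```python
# Hypothetical grouped data: each inner list is a y-array (the y-values
# observed at one value of x).
arrays = [[1.0, 2.0, 1.5], [2.0, 3.0, 2.5], [5.0, 4.0, 4.5]]
xs     = [0.0, 1.0, 2.0]           # the x-value attached to each array

N     = sum(len(a) for a in arrays)
y_all = [y for a in arrays for y in a]
y_bar = sum(y_all) / N
s_y2  = sum((y - y_bar) ** 2 for y in y_all) / N

# S_y'^2: mean square deviation of the y's about their own array means
S2 = sum(sum((y - sum(a) / len(a)) ** 2 for y in a) for a in arrays) / N

e_yx2 = 1.0 - S2 / s_y2            # squared correlation ratio of y on x

# product-moment r^2 for comparison (each y paired with its array's x)
pairs = [(x, y) for x, a in zip(xs, arrays) for y in a]
x_bar = sum(x for x, _ in pairs) / N
s_x2  = sum((x - x_bar) ** 2 for x, _ in pairs) / N
cov   = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / N
r2    = cov ** 2 / (s_x2 * s_y2)
```

For any such data the inequality r² ≤ e²ᵧₓ ≤ 1 of (6.14.5) holds.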

(2) We have, then: T(y) = sum of row F = 177 and N = 1,000, giving [T(y)]²/N = 31·329. Also

Σᵢ [Tᵢ(y)]²/nᵢ = 37²/26 + 113²/97 + 161²/226 + … = 1,117 (to 4 significant figures).

Thus, since sᵧ² = 2·308 (from 6.8(3)),

e²ᵧₓ = (1,117·01 − 31·329)/(1,000 × 2·308) = 0·471, or eᵧₓ = 0·686.

Since e²ᵧₓ − r² = 0·009, the departure from linearity is small.

6.16. Multivariate Regression. When we have more than two correlated variates, two major problems present themselves: (1) we may wish to examine the influence on one of the variates of the others of the set—this is the problem of multivariate regression; or (2) we may be interested in assessing the interdependence of two of the variates, after eliminating the influence of all the others—this is the problem of partial correlation.

Here we confine ourselves to the case of three variates, x₁, x₂, x₃, measured from their means, with variances s₁², s₂², s₃² respectively. Let the regression equations be

x₁ = b₁₂.₃x₂ + b₁₃.₂x₃ . . . (6.16.1)
x₂ = b₂₃.₁x₃ + b₂₁.₃x₁ . . . (6.16.2)
x₃ = b₃₁.₂x₁ + b₃₂.₁x₂ . . . (6.16.3)

We shall determine the b's by the method of Least Squares. Consider (6.16.1). The sum of the squared deviations of observed values of x₁ from the estimated values is given by

S² = Σ(x₁ − b₁₂.₃x₂ − b₁₃.₂x₃)² . . . (6.16.4)

Since the variates are measured from their means, the normal equations are:

b₁₂.₃ Σx₂² + b₁₃.₂ Σx₂x₃ = Σx₁x₂ . . . (6.16.5)
b₁₂.₃ Σx₂x₃ + b₁₃.₂ Σx₃² = Σx₁x₃ . . . (6.16.6)

Solving these, we obtain b₁₂.₃ and b₁₃.₂, which are called partial regression coefficients. Whereas rᵢⱼ = rⱼᵢ, bᵢⱼ.ₖ is not in general equal to bⱼᵢ.ₖ. Here r₁₂, r₂₃, r₃₁ are total correlations: r₁₂, for instance, is the correlation between x₁ and x₂, formed, in the usual manner, by ignoring the values of x₃.

The reader familiar with determinant notation¹ will realise that we may simplify these expressions. Let R denote the determinant

| r₁₁ r₁₂ r₁₃ |
| r₂₁ r₂₂ r₂₃ |
| r₃₁ r₃₂ r₃₃ |

where r₁₁ = r₂₂ = r₃₃ = 1 and rᵢⱼ = rⱼᵢ for i, j = 1, 2, 3. Then, if Rᵢⱼ denotes the cofactor of rᵢⱼ in R,

R₁₁ = 1 − r₂₃²;  R₁₂ = r₁₃r₂₃ − r₁₂;  R₁₃ = r₁₂r₂₃ − r₁₃;

also

r₁₁R₁₁ + r₁₂R₁₂ + r₁₃R₁₃ = R
r₂₁R₁₁ + r₂₂R₁₂ + r₂₃R₁₃ = 0 . . . (6.16.9)
r₃₁R₁₁ + r₃₂R₁₂ + r₃₃R₁₃ = 0

The regression equations become:

(1) Regression of x₁ on x₂ and x₃:
(R₁₁/s₁)x₁ + (R₁₂/s₂)x₂ + (R₁₃/s₃)x₃ = 0 . . . (6.16.10(a))

(2) Regression of x₂ on x₃ and x₁:
(R₂₁/s₁)x₁ + (R₂₂/s₂)x₂ + (R₂₃/s₃)x₃ = 0 . . . (6.16.10(b))

(3) Regression of x₃ on x₁ and x₂:
(R₃₁/s₁)x₁ + (R₃₂/s₂)x₂ + (R₃₃/s₃)x₃ = 0 . . . (6.16.10(c))

In the space of the three variates, these equations are represented by planes.

¹ See Mathematical Note at end of Chapter.
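The cofactor identities (6.16.9) can be verified numerically. A short Python sketch (assuming numpy; the r-values are merely illustrative) forms the correlation determinant, its cofactor matrix, and checks both the "own-row" and "alien-row" expansions:

```python
import numpy as np

r12, r13, r23 = 0.492, 0.927, 0.758   # illustrative total correlations
R = np.array([[1.0, r12, r13],
              [r12, 1.0, r23],
              [r13, r23, 1.0]])

detR = np.linalg.det(R)
# Cofactor matrix: since adj(R) = det(R) * inv(R) and adj(R) = C^T,
# the cofactors are C = det(R) * inv(R).T  (R non-singular here).
C = np.linalg.inv(R).T * detR

row1_own   = sum(R[0, j] * C[0, j] for j in range(3))   # equals det R
row2_alien = sum(R[1, j] * C[0, j] for j in range(3))   # equals 0
```

The first sum reproduces R itself; the second, with cofactors taken from a different row, vanishes, exactly as in (6.16.9).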

These regression planes should not be confused with the regression lines: the regression line of x₁ on x₂, for instance, being x₁ = (r₁₂s₁/s₂)x₂.

6.17. Multiple Correlation.

DEFINITION: The coefficient of multiple correlation of x₁ with x₂ and x₃, denoted by r₁.₂₃, is the coefficient of product-moment correlation of x₁ and its estimate, b₁₂.₃x₂ + b₁₃.₂x₃, from the regression equation of x₁ on x₂ and x₃.

That this is a natural definition is clear if we recall that if x₁ lies everywhere on the regression plane, there is an exact functional relationship between the three variates.

From the definition, since x̄₁ = x̄₂ = x̄₃ = 0,

cov(x₁, b₁₂.₃x₂ + b₁₃.₂x₃) = E[x₁(b₁₂.₃x₂ + b₁₃.₂x₃)]
= b₁₂.₃ E(x₁x₂) + b₁₃.₂ E(x₁x₃) = b₁₂.₃ r₁₂s₁s₂ + b₁₃.₂ r₁₃s₁s₃.

Now, from (6.16.10(a)), b₁₂.₃ = −(s₁/s₂)(R₁₂/R₁₁) and b₁₃.₂ = −(s₁/s₃)(R₁₃/R₁₁). So, using the first equation of (6.16.9),

cov(x₁, b₁₂.₃x₂ + b₁₃.₂x₃) = −(s₁²/R₁₁)[r₁₂R₁₂ + r₁₃R₁₃] = −(s₁²/R₁₁)[R − r₁₁R₁₁] = s₁²[1 − R/R₁₁].

Also

var(b₁₂.₃x₂ + b₁₃.₂x₃) = b₁₂.₃²s₂² + b₁₃.₂²s₃² + 2b₁₂.₃b₁₃.₂r₂₃s₂s₃
= (s₁²/R₁₁²)[R₁₂² + R₁₃² + 2R₁₂R₁₃r₂₃]
= (s₁²/R₁₁²)[R₁₂(R₁₂ + r₂₃R₁₃) + R₁₃(R₁₃ + r₃₂R₁₂)].

Using the second and third equations of (6.16.9), R₁₂ + r₂₃R₁₃ = −r₂₁R₁₁ and R₁₃ + r₃₂R₁₂ = −r₃₁R₁₁, whence

var(b₁₂.₃x₂ + b₁₃.₂x₃) = −(s₁²/R₁₁)[r₁₂R₁₂ + r₁₃R₁₃] = s₁²[1 − R/R₁₁].

Consequently

r₁.₂₃ = cov(x₁, b₁₂.₃x₂ + b₁₃.₂x₃) / [var(x₁) · var(b₁₂.₃x₂ + b₁₃.₂x₃)]^½ = [1 − R/R₁₁]^½.

Likewise r₂.₃₁ = [1 − R/R₂₂]^½ and r₃.₁₂ = [1 − R/R₃₃]^½ . . . (6.17.1)

6.18. Worked Example. Calculate the multiple correlation coefficient of x₁ on x₂ and x₃ from the following data. Find also the regression equation of x₁ on x₂ and x₃:

x₁:  5  3  2  4  3  1  8
x₂:  2  4  2  2  3  2  4
x₃: 21 21 15 17 20 13 32

Treatment: working with X₁ = x₁ − 4, X₂ = x₂ − 3, X₃ = x₃ − 20, we have

X₁:   +1 −1 −2  0 −1 −3 +4   ΣX₁ = −2
X₂:   −1 +1 −1 −1  0 −1 +1   ΣX₂ = −2
X₃:   +1 +1 −5 −3  0 −7 +12  ΣX₃ = −1
X₁²:   1  1  4  0  1  9 16   ΣX₁² = 32
X₂²:   1  1  1  1  0  1  1   ΣX₂² = 6
X₃²:   1  1 25  9  0 49 144  ΣX₃² = 229
X₁X₂: −1 −1 +2  0  0 +3 +4   ΣX₁X₂ = 7
X₁X₃: +1 −1 +10 0  0 +21 +48 ΣX₁X₃ = 79
X₂X₃: −1 +1 +5 +3  0 +7 +12  ΣX₂X₃ = 27

With N = 7: X̄₁ = −0·286, X̄₂ = −0·286, X̄₃ = −0·143, and

(1/N)ΣX₁² = 4·571, (1/N)ΣX₂² = 0·857, (1/N)ΣX₃² = 32·714,
(1/N)ΣX₁X₂ = 1, (1/N)ΣX₁X₃ = 11·286, (1/N)ΣX₂X₃ = 3·857.

Hence

cov(x₁, x₂) = 1 − (−0·286)(−0·286) = 1 − 0·082 = 0·918
cov(x₁, x₃) = 11·286 − (−0·286)(−0·143) = 11·286 − 0·041 = 11·245
cov(x₂, x₃) = 3·857 − 0·041 = 3·816
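The passage from the three total correlations to the multiple correlation coefficient can be sketched in a few lines of Python, using the r-values found in this worked example and the cofactor expansion of R:

```python
import math

r12, r13, r23 = 0.492, 0.927, 0.758    # total correlations of the example

# cofactors of the first row of the correlation determinant
R11 = 1 - r23 ** 2                     # cofactor of r11
R12 = r13 * r23 - r12                  # cofactor of r12
R13 = r12 * r23 - r13                  # cofactor of r13

# determinant of R, expanded along the first row (first eqn of 6.16.9)
R = R11 + r12 * R12 + r13 * R13

# multiple correlation of x1 with x2 and x3, by (6.17.1)
r1_23 = math.sqrt(1 - R / R11)
```

With these figures R ≈ 0·0155, R₁₁ ≈ 0·4254 and r₁.₂₃ ≈ 0·98.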

Then

var(x₁) = (1/N)ΣX₁² − X̄₁² = 4·571 − 0·082 = 4·489
var(x₂) = 0·857 − 0·082 = 0·775
var(x₃) = 32·714 − 0·020 = 32·694

so that s₁ = (4·489)^½ = 2·11873, s₂ = (0·775)^½ = 0·88034, s₃ = (32·694)^½ = 5·71787, and

r₁₂ = 0·918/[4·489 × 0·775]^½ = 0·492
r₁₃ = 11·245/[4·489 × 32·694]^½ = 0·927
r₂₃ = 3·816/[0·775 × 32·694]^½ = 0·758

Hence

R = | 1     0·492 0·927 |
    | 0·492 1     0·758 | = 0·0154
    | 0·927 0·758 1     |

R₁₁ = 1 − (0·758)² = 0·4254
R₁₂ = −[0·492 − 0·758 × 0·927] = 0·211
R₁₃ = 0·492 × 0·758 − 0·927 = −0·5541

r₁.₂₃ = [1 − R/R₁₁]^½ = [1 − 0·0154/0·4254]^½ = 0·98.

The regression equation of x₁ on x₂ and x₃ is, by (6.16.10(a)),

(R₁₁/s₁)(x₁ − x̄₁) + (R₁₂/s₂)(x₂ − x̄₂) + (R₁₃/s₃)(x₃ − x̄₃) = 0,

where x̄₁ = X̄₁ + 4 = 3·714, x̄₂ = X̄₂ + 3 = 2·714, x̄₃ = X̄₃ + 20 = 19·857. Hence the required regression equation is

0·20072(x₁ − 3·714) + 0·24024(x₂ − 2·714) − 0·09708(x₃ − 19·857) = 0,

or 20·1x₁ + 24·0x₂ − 9·7x₃ + 53·0 = 0, approximately.

Exercise: Calculate r₂.₃₁ and r₃.₁₂ and find the other two regression equations.

6.19. Partial Correlation. In many situations where we have three or more associated variates, it is useful to obtain some measure of the correlation between two of them when the influence of the others has been eliminated. Suppose we have the three variates x₁, x₂, x₃, with regression equations (6.16.1), (6.16.2) and (6.16.3). Let x₃ be held constant; then at this value of x₃, the two partial regression lines of x₁ on x₂ and of x₂ on x₁ will have regression coefficients b₁₂.₃ and b₂₁.₃. In line with the two-variate result, r² = bᵧₓ · bₓᵧ, we therefore define the partial correlation of x₁ and x₂ to be given by

r²₁₂.₃ = b₁₂.₃ · b₂₁.₃ . . . (6.19.1)

Likewise r²₂₃.₁ = b₂₃.₁ · b₃₂.₁ and r²₃₁.₂ = b₃₁.₂ · b₁₃.₂. It follows that

r²₁₂.₃ = (−s₁R₁₂/s₂R₁₁)(−s₂R₂₁/s₁R₂₂) = R₁₂²/R₁₁R₂₂,

i.e.

r₁₂.₃ = (r₁₂ − r₁₃r₂₃)/[(1 − r₁₃²)(1 − r₂₃²)]^½ . . . (6.19.2)

In practice it is seldom found that r₁₂.₃ is independent of x₃, and we therefore regard the value of r₁₂.₃ given by (6.19.2) as a rough average over the varying values of x₃. Using the data of 6.18, we have

r₁₂.₃ = (0·492 − 0·927 × 0·758)/[0·4254 × 0·1407]^½ = −0·211/0·245 = −0·861.

Mathematical Note

If a, b, c, d are any four numbers, we denote the quantity ad − bc by

| a b |
| c d |

Such a function of its four elements is called a determinant of order 2, having 2 × 2 elements. Thus

| 1 3 |
| 5 7 | = 1 × 7 − 3 × 5 = −8.

A determinant of order three has 3 × 3 elements and is written

| a₁₁ a₁₂ a₁₃ |
| a₂₁ a₂₂ a₂₃ |
| a₃₁ a₃₂ a₃₃ |

the suffixes of an element indicating the row and column it occupies in the determinant. Suppose we select any element, a₂₁ say, and rule out the row and column in which it occurs. We are left with the 2 × 2 elements

| a₁₂ a₁₃ |
| a₃₂ a₃₃ |

The determinant of these four numbers, multiplied by (−1)²⁺¹ = −1, is called the cofactor of a₂₁ in the determinant. The cofactor of a₃₃ is

(−1)³⁺³ | a₁₁ a₁₂ |
        | a₂₁ a₂₂ |

In general, we denote the cofactor of an element aᵢⱼ by Aᵢⱼ.
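Formula (6.19.2) and its cofactor form can be checked against one another in Python, again with the r-values of the worked example:

```python
import math

r12, r13, r23 = 0.492, 0.927, 0.758    # data of the worked example

# (6.19.2): partial correlation of x1 and x2, x3 eliminated
r12_3 = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# equivalent cofactor form: r12.3 = -R12 / sqrt(R11 * R22)
R11 = 1 - r23 ** 2
R22 = 1 - r13 ** 2
R12 = r13 * r23 - r12
r12_3_cof = -R12 / math.sqrt(R11 * R22)
```

Both forms give r₁₂.₃ ≈ −0·861: the strong positive total correlation between x₁ and x₂ is reversed once the influence of x₃ is removed.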

The value of the determinant may be obtained by forming the sum of the products of the elements of any row (or column) and their respective cofactors. Thus, for example,

a₁₁A₁₁ + a₁₂A₁₂ + a₁₃A₁₃ = a₂₁A₂₁ + a₂₂A₂₂ + a₂₃A₂₃ = a₁₂A₁₂ + a₂₂A₂₂ + a₃₂A₃₂.

For instance

| 1 −2  3 |
| 7  4 −1 | = 1(8 − 3) + 2(14 + 2) + 3(−21 − 8) = −50.
| 2 −3  2 |

On the other hand, if we form the sum of the products of the elements of any row (or column) and the cofactors of the corresponding elements of another row (or column), that sum is zero. For example,

a₁₁A₂₁ + a₁₂A₂₂ + a₁₃A₂₃ = 0.

Now suppose the elements of any row (or column) are proportional to those of another row (or column); say the second row is λ times the first:

| a₁₁  a₁₂  a₁₃ |
| λa₁₁ λa₁₂ λa₁₃ |
| a₃₁  a₃₂  a₃₃ |

Expanding by the second row, the value is λ(a₁₁A₂₁ + a₁₂A₂₂ + a₁₃A₂₃) = 0. In fact, if any two rows (or columns) of a determinant are identical or their elements are proportional, the value of the determinant is zero.

(The reader will find a useful introduction to determinants in C. A. B. Smith's Biomathematics (Griffin); a more detailed discussion is to be found in Professor A. C. Aitken's Determinants and Matrices (Oliver & Boyd).)
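The numerical example above, and the "alien cofactor" rule, can be verified with a short self-contained Python sketch:

```python
# Cofactor expansion of the 3x3 determinant used in the Note:
# | 1 -2  3 |
# | 7  4 -1 |
# | 2 -3  2 |
A = [[1, -2, 3],
     [7, 4, -1],
     [2, -3, 2]]

def minor(m, i, j):
    """2x2 matrix left after deleting row i and column j."""
    rows = [r for k, r in enumerate(m) if k != i]
    return [[v for l, v in enumerate(r) if l != j] for r in rows]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def cof(m, i, j):
    return (-1) ** (i + j) * det2(minor(m, i, j))

# expansion along the first row: 1(8-3) + 2(14+2) + 3(-21-8) = -50
d = sum(A[0][j] * cof(A, 0, j) for j in range(3))

# elements of row 1 with cofactors of row 2: the sum is zero
alien = sum(A[0][j] * cof(A, 1, j) for j in range(3))
```

Running this gives d = −50 and alien = 0, as the Note asserts.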

EXERCISES ON CHAPTER SIX

1. The corresponding values of x and y in the following table satisfy approximately the equation y = mx + c:

x: 0·25 1·00 2·25 4·00 6·25
y: 0·12 0·90 2·13 3·84 6·07

By the method of least squares obtain the best values of the constants m and c, assuming that there is error in the y values only. (L.U.)

2. Calculate the means of x and y and the coefficient of correlation between the continuous variables x and y from the data of the following table. The x-classes are −4 to −3, −3 to −2, −2 to −1, −1 to 0, 0 to 1, 1 to 2, 2 to 3; the y-classes −3 to −2, −2 to −1, −1 to 0, 0 to 1, 1 to 2, 2 to 3; total frequency 1,000. Cell frequencies as printed: 10, 24, 40, 60, 40, 30, 16, 38, 20, 90, 60, 36, 30, 48, 10, 20, 50, 42, 20, 34, 10, 20, 20, 16, 6, 16, 6, 6, 2; class totals 150, 214, 224, 284, 176, 72, and 220, 200, 180, 150, 100, 150. (I.A.)

3. Daily Newspapers (London and Provincial), 1930-40:

Year:      1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940
Number:     169  164  156  157  147  148  148  145  142  141  131
Average circulation (millions): 17·9 17·6 17·9 18·2 18·0 18·2 18·5 19·1 19·2 19·5 18·9

Fit a straight line to each of these series by the method of least squares, represent graphically and comment on the fit.

4. Calculate the correlation coefficient for the following U.S. data:

Index of income payments:   114 137 172 211 230 239
Index of retail food prices: 97 105 124 139 136 139  (L.U.)

5. The ranks of the same 15 students in Mathematics and Latin were as follows, the two numbers within brackets denoting the ranks of the same student: (1, …), (2, …), … (15, …). Show that the rank correlation coefficient is 0·51. (R.S.S.)

6. An ordinary pack of 52 cards is dealt to four whist players. If one player has r hearts, what is the average number held by his partner? Deduce that the correlation coefficient between the number of hearts in the two hands is −1/3. (L.U.)

7. In the table below, verify that the means of the x-arrays are collinear, and also those of the y-arrays. The values of x are 0, 1, 2, 3, 4; cell frequencies as printed: 3, 1, 12, 12, 18, 54, 18, 4, 36, 36, 4, 3, 9, 3. (Weatherburn.)

8. From the table below compute the correlation ratio of y on x and the correlation coefficient (Hint: use (6.15.1)):

Values of x:      0·5-1·5 1·5-2·5 2·5-3·5 3·5-4·5 4·5-5·5  Total
Number of cases:    20      30      35      25      15      125
Mean y:            11·3    12·7    14·5    16·5    19·1      —

The standard deviation of y is 3·1. (L.U.)

9. The three variates x₁, x₂, x₃ are measured from their means. If x₄ = x₁ + x₂, calculate the correlations of x₄ with the other variates, and verify that the two partial correlation coefficients are equal; explain this result. (L.U.)

Solutions

1. m = 0·988(4), c = −0·105. 2. r = 0·334. 4. r = 0·972. 8. eᵧₓ = 0·77(4), r = 0·76(4). 9. 0·836, 0·370, −0·586, …

CHAPTER SEVEN

SAMPLE AND POPULATION I: SOME FUNDAMENTALS OF SAMPLING THEORY

7.1. Inferences and Significance. So far we have been concerned with problems of descriptive statistics: we have concentrated on describing distributions, summarising their main properties mathematically and establishing certain general principles exemplified by them. We have not as yet used these summaries and general principles for other purposes. This we must now start to do. For one of the fundamental problems of statistics is: how, and with what accuracy, may we draw inferences about the nature of a population when we have only the evidence of samples of that population to go on?

Suppose, for example, that we wish to find whether among males in the British Isles belonging to some specified age-group there is any correlation between height and weight. In practice, we cannot weigh and measure every individual belonging to this " population "; we therefore resort to sampling. Common sense tells us, first, that, other things being equal, the larger the sample, the better any estimate we base on our examination of that sample; and, secondly, that, whatever the size of the sample, that sample must be a representative one. Assuming, for the time being, that we have settled on the size of the sample or samples we shall take, how do we make sure that the sample will be representative, a random sample? This is our first problem.

Suppose, however, that we are satisfied that our method of sampling is of a kind to ensure random samples: we take our sample, measure the height and weight of each individual in the sample, and calculate a value for r, the correlation coefficient. Immediately a host of new doubts and misgivings arise: how do we know that the value obtained for r is really significant? Could it not have arisen by chance? Can we be reasonably sure that, although the variate-values obtained from a sample show a certain degree of correlation, the variates are correlated in the population as a whole?
This is our first problem.c h a p t e r s e v e n SAMPLE AND POPULATION I : SOME FUNDAMENTALS O F SAMPLING THEORY 7. samples. j based on the sample size.

problems. from sample to sample of the same population. of these values shall we use as the best estimate of p. unless we can establish some general rules of guidance on such matters. let us set down what appear to be a few of the main types of problem with which the necessity of making statistical inference—inference from sample to population based on probabilities—confronts us: (а) There are those problems involved in the concept of randomness and in devising methods of obtaining random samples. And. The " dictionary definition " of random sample is usually something along the following lines : A sample obtained by selection of items of a population is a random sample from t h a t population if each item in the population has an equal chance of being selected. or suppose t h a t with a different value of N we obtain yet another value for r. Before starting a more detailed discussion. (б) There are those problems which arise from t h e variation. What Do We Mean by " Random "f Unfortunately it is not possible here to enter into a detailed discussion of the difficulties involved in the concept of randomness. i i 131 tion. for. of significance. broadly. of the various sample statistics—problems concerned with the distribution of sample statistics.2. This is one of the main tasks of t h a t branch of statistics usually termed Sampling Theory. (c) There are those problems connected with how to estimate population parameters from sample statistics and with the degree of trustworthiness of such estimates. Like most dictionary definitions. it has the air of trying . Which. t h e correlationcoefficient in the population ? Clearly. all our descriptive analysis will be of little use.sample and population. this one is not really very satisfactory. as the reader will realise. lastly. in the population as a whole the variates are correlated ? Suppose we obtain from a second sample of A a different T value for r . 7. if any. 
(d) There are those problems which arise when we seek t o test a hypothesis about a population or set of populations in the light of evidence afforded by sampling.

desperately to disguise something that looks suspiciously like circularity. Nevertheless, we must reconcile ourselves, here at least, to using it. It has, however, this virtue: it brings out the fact that the adjective random applies to the method of selection rather than to any characteristic of the sample detected after it has been drawn.

In this connection, two other, related, points must be made: (1) what we are out to get when we sample a population is information about that particular population in respect of some specified characteristic, or set of characteristics, of the items of that population, so we should keep asking ourselves, " What precisely are we trying to find out about what population? "; (2) a method that ensures random selection from one population need not necessarily do so when used to sample another population.

What are the main types of population we may sample? In the first place, there are those populations which actually exist and are finite. Because all measurement entails approximation, the distribution of any variate in such a population is necessarily discrete. There are two ways of sampling such a population: after each selection, we may either replace the selected item or we may not. Sampling without replacement will eventually exhaust a finite population, and after each selection the probability of any remaining item being selected is altered. Sampling with replacement, however, can never exhaust even a finite population, and is thus equivalent to sampling from a hypothetical infinite population. If the probability of any item in the population being chosen is constant throughout the sampling process, we call the sampling simple; thus sampling with replacement is simple sampling. It may happen, however, that a population is so large that even sampling without replacement does not materially alter the probability of an item being selected; in such a case, sampling without replacement approximates to simple sampling.

The second type of population we are likely to encounter are theoretical or conceptual populations. The difference between an actual population and a conceptual one is illustrated when we compare a truck-load of granite chips with, say, the population of all real numbers between 0 and 1. Conceptual populations may be finite or infinite, but any infinite population is necessarily conceptual.
In this connection. related. points must be made : (1) W h a t we are out t o get when we sample a population is information about t h a t particular population in respect of some specified characteristic. Apart from their intrinsic interest. can never exhaust even a finite population. we may either replace it or we may not. with a stable population. say. In such a case. W h a t are the main types of population we may sample ? In the first place. we should keep asking ourselves. Sampling without replacement will eventually exhaust a finite population and automatically.

" assume the mantle of reality " ? Perhaps all we can say at t h e moment is t h a t such " populations " receive their ultimate justification in the empirical fact t h a t some events do happen as if they are random samples of such " populations ". precise.ii133 I3I ceptual populations are important because they can be used as models of actual populations or arise in the solution of problems concerned with actual populations. nor are they anything like as definite as the population of " all real numbers between 0 and 1 " . These are certainly not existing populations like a truck-load of granite chips. A moment's reflection will convince us t h a t there will have been a t least a tendency for t h e more massive chips t o gravitate towards the bottom of t h e truck. since there is no selection. since the chips have been thoroughly shaken up on the journey. On the other hand. just arrived in the siding. For example. in Kendall's phrase. the method of selection must be independent of the property or variate in which we are interested. which is. while the lighter and smaller tend to come to the top. an adequate sampling scheme would have been to select a number of the churns a t random and. Random Sampling.3. successful sampling of the varieties of birds visiting a given 10 acres of common land during a certain period of the year requires t h a t the sampling scheme be drawn up with the assistance of ornithologists intimately acquainted with the habits of possible visitors. Can we. then. for chip-size. there are " populations " such as t h a t of " all possible throws of a die '' or t h a t of '' all possible measurements of this steel rod ''.sample and p o p u l a t i o n . If we wish to sample a truck-load of granite chips. no choice. To begin with. regard t h e result of six throws of an actual die as a random sample of some hypothetical population of " a l l possible throws " ? And. However. 
We come then to the very much more practical question of how to draw random samples from given populations for specific purposes. it would be fatal t o assume that. successful sampling demands specialised knowledge of the type of population to be sampled. 7. had we been interested in sampling a given number of churns of milk for f a t content. for instance. can they be regarded as constituting a random sample ? And in what way do we conceive essentially imaginary members of such a " population " as having the same probability of being selected as those members which. There are many difficulties with such "populations". Finally. having thoroughly stirred their . any shovelful will provide us with a random sample. mathematically. Certain general principles should be borne in mind.

A Million Random Digits. giving 15. No. N.000 digits arranged in twos. to ensure t h a t shuffling is really thorough is by no means as simple as it sounds. R. Random Sampling Numbers. California. Indeed. Given a finite population. 24. but. representing a sample of n from the actual population. Tippett. Santa Monica. giving five-figure numbers (see Table 7. say. This set of numbers is virtually a conceptual model of the actual population. involves much work preparing the model. C. as. A. No. 7. we should seek to eliminate the human factor as far as possible. it is essential not to do so. H. t o ladle out a given quantity from each. bias is most definitely operative.600 digits. for instance. Among t h e best known are : L. Agricultural and Medical Research. for it is extremely likely t h a t if this is done the bias of number- . Moreover. Tracts for Computers. 7. 3. The first method is ticket sampling. Tracts for Computers. at first sight. in choosing the final digit in a set of four digit numbers. we assign to each item of this population an ordinal number 1.1). Suppose we wish to draw a sample of n. For experience tells us t h a t human choice is certainly not random in accordance with the definition we have here adopted. . giving 10. Ticket Sampling.4. So to eliminate this factor. Indeed. We construct a model of this population as follows : On N similar cards we write down the relevant features of each member of t h e population. 15. Let us assume t h a t we have a finite population of N items. Babington Smith. We do not " pick out " numbers haphazardly from such tables as these. Yates. G. composed of 41. bias would seem hardly likely. Fisher and F. Even in cases where. giving 100.400 four-figure numbers. The second method is the method of using random sampling numbers. shuffle the cards thoroughly and draw n cards. 2. . We use a table of random numbers. if the population is large. Kendall and B.5. of which but two can be mentioned here. M. 
.000 digits grouped in twos and fours and in 100 separate thousands.134 statistics contents. Statistical Tables for Biological. This is a fairly reliable method. published by the Rand Corporation. we resort t o a number of methods. B u t how can we select a number of churns a t random ? Can we rely on " haphazard " human choice ? The answer is " No " .

class.sample and p o p u l a t i o n . ignoring those greater than 58703.class.ii135 I3I such tables. 1 item from the 68. 23 169 439 1030 2116 3947 5965 8012 9089 8763 7132 5314 3320 1884 876 383 153 63 25 58703 Sampling Number. 1-23 24-192 193-631 632-1161 1162-3777 3778-7724 7725-13689 13690-21701 21702-30790 30791-39553 39554-46685 46686-51999 52000-55319 55320-57203 57204-58079 58080-58462 58463-58615 58616-58678 58679-58703 — We now read off from Table 7.class. 3 items from the 66. Calculate the mean of the sample and compare it with the population mean of 67-98. which we seek t o eliminate by using will operate once more.class. We thus obtain : 23780 28391 05940 55583 45325 05490 11186 15367 11370 42789 29511 55968 17264 37119 08853 44155 44236 10089 44373 21149 Our sample of 20 is consequently made up as follows : 2 items from the 64. Instead. 4 items from the 65. t h e table of is taken. . 2 items from the 72. having stated numbers used and having indicated which section should work systematically through t h a t section. 3 items from the 67. will make the procedure clear. we An example preference. Example: Draw a random sample of 20 from the " population " given in the table below. Treatment: we number the items in the table as follows: Length (cm) 59 and under 606162636465666768697071727374757677 and over Frequency.1 20 successive five-figure numbers less than 58704.class.class.class. 5 items from the 69.

Random Numbers (From A Million Random Digits. Illinois). December 1953. we saw t h a t our sample mean differed somewhat f r o m the m e a n of t h e population. grouped for convenience.) 23780 88240 97523 80274 64971 67286 14262 39483 70908 94963 28391 92457 17264 79932 49055 28749 09513 62469 21506 22581 05940 89200 82840 44236 95091 81905 25728 30935 16269 17882 55583 94696 59556 10089 08367 15038 52539 79270 54558 83558 81256 11370 37119 44373 28381 38338 86806 91986 18395 31960 45325 42789 08853 82805 03606 65670 57375 51206 69944 99286 05490 69758 59083 21149 46497 72111 85062 65749 65036 45236 65974 79701 95137 03425 28626 91884 89178 11885 63213 47427 11186 29511 76538 17594 87297 66762 08791 49789 56631 74321 15357 55968 44155 31427 36568 11428 39342 97081 88862 67351 N o t e : This table gives 5 0 0 random digits. published for the Band Corporation by the Free Press (Glencoe. 48. W e r e we t o t a k e a large n u m b e r of samples of 20 we should h a v e w h a t in fact would be a frequency distribution of t h e m e a n of so m a n y samples of 20. 264. also differing f r o m t h e population mean. (Hi) the population of London. (it) a forest of mixed hardwood and softwood. rejecting those greater than 160. In the example of t h e previous section. . the population mean. (v) the varieties of bird_s visiting a given area of common land..6.136 statistics Taking the mid-value of the classes as 59-5. previously published in the Journal of the American Statistical Association. 61-6. read off successive groups of three successive digits.) Table 7. 60-5.1. as compared with 67-98 cm. A different sample of 20 would have yielded a different value. the mean value of the sample is immediately found to be 1354/20 = 67-70 cm. (vij plants in a very large area of the Scottish Highlands. (iv) all the cattle in Oxfordshire. Exercise : It is desired to obtain random samples from the following : (i) a truck-load of granite chips. 
Explain the principles which would guide you in collecting such samples. The Distribution of Sample Statistics. Vol. The result is : (237) (802) (839) 105 (940) (555) (838) 125 (645) (325) 054 (906) (597) (411) (186) 153 (578) (824) 092 (457) (892) 009 The numbers in brackets are those greater than 160 and are rejected.U. Suppose six random numbers less than 161 are required. No. The six random numbers less than 161 so obtained are therefore : 105 125 54 153 92 and 9 7. in five-digit numbers. etc. (L.

(i = f . + OnXn = 2 a{Xi i= 1 (7. So the question arises : What do we know about the distribution of sample statistics when : (a) the population sampled is not specified. . . we form a new variate n X = + a2x3 . and (b) the population is specified ? 7.. (How?) If we drew the relative frequency polygon for this distribution. £{X) =£( \ i S 1OixA = t S i ai£(Xi) = / = or ^'(X) S Oi^'iXi) i« 1 (7. If x<.2) If we put at = 1 /n. . reformulate it as follows : Definition: Random Sample. . . mathematical use. variance and moments of higher order. . then the set Xi. for all i. . • • = £(xn) = £(x) = (x. . other sample statistics. and subject the to the condition that they all have the same probability density.sample and p o p u l a t i o n . for instance. we have £•(*!> = £(X2) = .. + OiXi + . n) is a set of n statistically independent variates.2) and then. the mean of a . .7. ... . . The Distribution of the Sample Mean. + * 2 + . for sampling with replacement. Likewise.) . 4(x„).7. 2. 3 . .ii137 I3I Suppose we drew every possible sample of 20 from the population of 58703. . 3. . with means | i i ( x i ) . with its own mean.1) We have where the a's are arbitrary constants.7. it would approximate closely to a continuous probability curve. + xn) jn. also have their distributions for samples of a given size. (i = 1. x„) = $(1kx) . This would give us. Now suppose from the n variates x. We begin by recalling our definition of a random sample (7. 2. the enormous number of 58703 20 /20 ! such samples. to make it of practical. (f>(x2). rt) is a random sample of n from a population whose probability density is 4>(x). not all zero. the sample variance. and X becomes (*. each distributed with the same probability density <f>{x) and if the joint probability density of the set is given by f(xlt x2.

(x. (i # j) (7.(x^)2] = But £{(Xi 2 afE^Xi — (X()2] *=1 + 2 n 2 n 0{(lj£[(Xi .2) (7.7.)2 + 2 2 i= 1 ) So . Also let pg be the coefficient of t h e correlation between a n d Xj (i ^ /). assuming such correlation t o exist. .. Then (X . for. is a better estimate of t h e " length " of t h e rod t h a n a n y one measurement. if i * j. — [nXj .V-i)(Xj .7.) (Xi W = oflfa Hence ox* = 2 ai'm* + 2 2 a a p w j p f t . say. t h a t this is so is obvious. So (7. the distribution of t h e sample mean is identical with t h e distribution of t h e individual items in the population.138 statistics sample of n froih a population <j>{x) a n d mean (x.\ijXi + (XiW) = = €{XiXj) cov ' fax.3) is also t h e justification of t h e common-sense view t h a t t h e average of a number of measurements of the " length " of a rod. becomes 6(x) = I S €(xi) = i . £(Xi> . Oi<ij(Xi \n)(Xj w n n = 2 ai*(xi . when i £[{*• .7.ZMxi n .w)] = £(xtx.for t h e mean of Xi. If n = 1.(x x ) 2 = (. (xx for t h a t of X. or.(Xi)(*f — W)] . a n d a.|Xi2 = <Ji2 .7.-2 a n d aj2 for t h e variances of a n d X. (7. taking all possible samples of 1.4) . respectively.2(Xi*< + (Xi2) = e(Xi') Also.Hi)) . W h a t of t h e variance of x ? We write jx.3) Thus the mean value of the m e a n of all possible samples of n is the mean of the population sampled: or the mean of the sampling distribution of x is the population mean.|X<)«] = j. n£(x) = =i n .

If, however, the variates are independent, pij = 0 for all i, j, (i not= j), and (7.7.4) reduces to

    sX^2 = a1^2 s1^2 + a2^2 s2^2 + . . . + an^2 sn^2 = S ai^2 si^2   (7.7.5)

Again putting ai = 1/n, for all i, and subjecting the x's to the condition that they all have the same probability density f(x), so that si^2 = s^2 (the population variance) for all i and X = x, we have

    sx^2 = s^2/n, or sx = s/sqrt(n)   (7.7.4(a))

Thus: The variance of the distribution of the sample mean is 1/nth that of the variance of the parent population, n being the size of the sample. The larger the sample, the more closely the sample means cluster about the population mean value. In general, the standard deviation of the sampling distribution of any statistic is called the Standard Error of that statistic; the standard deviation of the sampling distribution of x is usually called the Standard Error of the Mean. This is an important formula in its own right.

The above results hold for any population, no matter how the variate is distributed. Moreover, it is known that, whatever the population sampled, the distribution of x tends towards normality as the sample size increases; even for relatively small values of n, there is evidence that the x-distribution is approximately normal.

7.8. The Distribution of x when the Population Sampled is Normal. Consider a normal population defined by

    f(x) = (1/s sqrt(2 pi)) exp [-(x - u)^2/2s^2]

The mean-moment generating function for a normal distribution of variance s^2 is, we recall, Mm(t) = exp (s^2 t^2/2), and the function generating the moments about the origin is M(t) = Mm(t) exp (ut) = exp (ut + s^2 t^2/2). Now, the m.g.f. of any distribution is M(t) = E(exp xt). Let x be the mean of a sample of n from f(x); its m.g.f. is found from those of the n independent variates xi, as follows.

Since x = S xi/n, and the x's are independent, each with probability function f(x),

    E(exp xt) = E(exp S xi t/n) = E(P exp xi t/n) = P E(exp xi t/n) = (M(t/n))^n
              = exp (n{ut/n + s^2 t^2/2n^2}) = exp (ut + (s^2/n)t^2/2)   (7.8.1)

But this is the m.g.f. of a normal population with mean u and variance s^2/n. Hence:

The mean of samples of n from a normal population (u, s) is itself normally distributed about u as mean with standard deviation (error) s/sqrt(n).

This is in agreement with (7.7.3) and (7.7.4(a)), but adds the important information that the distribution of x, when the population sampled is normal, is itself normal. Actually this is a particular case of a more general theorem:

If x1, x2, . . . xn are n independent variates normally distributed about a common mean (which may be taken at zero), with variances s1^2, s2^2, . . . sn^2, then any linear function of these n variates is itself normally distributed.

The proof is simple. Let X = S ai xi. The m.g.f. of xi, (i = 1, 2, 3, . . . n), is Mi(t) = exp (si^2 t^2/2), and so, the x's being independent,

    MX(t) = E(exp Xt) = E(P exp ai xi t) = P E(exp xi(ai t)) = P Mi(ai t)
          = exp ((S ai^2 si^2) t^2/2)   (7.8.2)

which is the m.g.f. of a normal distribution with variance s^2 = S ai^2 si^2.
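These results lend themselves to a quick numerical check. The following sketch (not part of the original text; it uses only Python's standard library, and all names in it are illustrative) draws repeated samples from a normal population and compares the scatter of the sample means with the standard error s/sqrt(n):

```python
# Check of (7.7.3) and (7.7.4(a)): sample means cluster about the population
# mean mu with standard deviation sigma/sqrt(n).
import math
import random

random.seed(1)
mu, sigma, n, trials = 0.51, 0.02, 100, 20000

sample_means = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = sum(sample_means) / trials
se_observed = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / trials
)
se_theory = sigma / math.sqrt(n)   # the Standard Error of the Mean
```

With 20,000 trials the observed standard error agrees with s/sqrt(n) to within a few per cent.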

Consequently, in many of the problems we encounter n is large, and, although we do not know that the population sampled is normal, we may assume that x, the mean of a sample of n from an infinite population of mean u and variance s^2, is approximately normally distributed about u as mean with variance s^2/n. This being the case, the variate t = (x - u)/(s/sqrt(n)) will be approximately normally distributed about zero with unit variance.

In other words: The probability that the sample mean will deviate from the population mean by less than an amount d (measured in units of the standard error of the sample mean) is given approximately by

    2 * integral from 0 to d of f(t)dt, where f(t) = (1/sqrt(2 pi)) exp (-t^2/2)

It must be emphasised, however, that here we are sampling a population whose variance is assumed known. When this is not the case, the problem is complicated somewhat and will be dealt with later.

Moreover, by (7.8.2), the distribution of the sum, x1 + x2, of two normal variates x1, x2 (put n = 2, a1 = a2 = 1) is normal about the sum of their means with variance s1^2 + s2^2; and the distribution of the difference, x1 - x2 (put n = 2, a1 = 1, a2 = -1), is normal about the difference of their means, likewise with variance s1^2 + s2^2.

7.9. Worked Examples.

1. The net weight of half-kg boxes of chocolates has a mean of 0.51 kg and a standard deviation of 0.02 kg. The chocolates are despatched from manufacturer to wholesaler in consignments of 2,500 boxes. What proportion of these consignments can be expected to weigh more than 1,276 kg net? What proportion will weigh between 1,273 and 1,277 kg net?

Treatment: We are here drawing samples of 2,500 from an assumed infinite population. The standard error of the mean net weight for samples of 2,500 will be 0.02/sqrt(2,500) = 0.0004 kg. The mean net weight of boxes in a consignment weighing 1,276 kg will be 1,276/2,500 kg = 0.5104 kg, a deviation of 0.0004 kg from the population mean. Thus, in this case, t = 0.0004/0.0004 = 1.

a deviation from the population mean of + 2 standard errors. P(t > «i/5) = 0-1 and.273 kg. and.2 < t < 2) = 2P(0 < l < 2 ) = 2 x 0-4772 = 0-9544 In other words. with variancesCT12. since xx and are approximately normally distributed. Then the standard error of the mean is 125/V «.277 kg. aj = or. Consequently. let X = x1 — x%. It is decided to sample the output so as to ensure that 90% of the bulbs do not fall short of the guaranteed average by more than 2-5%. The deviation from the mean is then — 0 0008.000) = 0-009375 or ox = 0-0968. »i/5 = 1-281 or n = 40-96. so. X = 0-5 and ox = (2-5) (1/1. if x1 and are the means of these two samples.2 + <ra2. just over 95% of the batches of 2.276 kg. too. by 7. so. 3. for we do not worry about those bulbs whose life is longer than the guaranteed average.142 statistics from the populati6n mean by more than this amount is P(t > 1) = 0-5 — P(0 < t sj 1).500 boxes will weigh more than 1. The means of simple samples of 1. Can the samples be regarded as drawn from a single population of standard deviation 2-5 ? Treatment: Just as the sample mean has its own distribution. Therefore just under 16% of the consignments of 2.7. Therefore. —2. Then X = /x. Also the deviation from the 1. Then. X is distributed about' zero mean with variance o-j' = a 2 /«! -f And. — ft2. If x1 and x% are two independent variates distributed about mean /i„ /i.000-hour mean must not be more than 25 hours. in standardised units. If the consignment weighs 1. P(t ^ 1) = 0-3413. % a In our example. for this is a " one-tail " problem. o).CT22irespectively.500 boxes will lie between the given net weights. the required minimum sample size is 41. The probability that a consignment will weigh between these two limits is then—this being a " two-tail " problem— P ( .4 (a). or. P(0 < t < = 0-4 Using Table 5-4.000 + 1/2. the mean weight of a box in that consignment is 0-5092. 
Thus the observed X is more than 5 times the standard error of its distribution calculated on the hypothesis that the samples are from the same population with standard deviation .000 hours with a standard deviation of 125 hours. If a consignment weighs 1. the corresponding mean weight is 0-5108 kg. or. does the difference between the means of two samples of a given size. 2. What must be the minimum sample size ? Treatment: Let n be the size of a sample that the conditions may be fulfilled. This is a " one-tail" problem. Now set up the hypothesis that two samples of n1 and « 2 are drawn from a single population (/i. we find that t = 1-281. in standardised units.000 are 67-5 and 68-0 respectively. The " guaranteed " average life of a certain type of electric light bulb is 1. so is X.000 and 2. not more than 25/(125/V») = Vnj5. Therefore P(t > 1) = 0-1587.

On the assumption that X is approximately normally distributed, a deviation of more than 5 standard errors is most unlikely to occur in random sampling. We therefore reject the hypothesis that the samples are from a single population of standard deviation 2.5.

7.10. Sampling Distribution of the Mean when Sampling is without Replacement from a Finite Population. Let the population sampled consist of N items, and let the sample size be n. The number of samples of n that can be drawn without replacement from N is C(N, n) = N!/n!(N - n)!. Take the population mean as origin, so that, if Xi is the ith item of the population, S Xi = 0, and the population variance is given by N s^2 = S Xi^2. In these C(N, n) samples any one value Xi figures C(N - 1, n - 1) times, for, once Xi is chosen, there remain C(N - 1, n - 1) ways of choosing the other n - 1 items. Let mj be the mean of the jth sample and m the mean value of all possible mj. Denoting the sum of the items in the jth sample by (S x)j,

    C(N, n) . m = S mj = (1/n) S (S x)j = (1/n) C(N - 1, n - 1) S Xi = 0

so m = 0. Thus the mean of the means of all possible samples drawn without replacement is the mean of the parent population.

Moreover, if sm^2 is the variance of the sample mean,

    C(N, n) sm^2 = S mj^2 = (1/n^2) S ((S x)j)^2
                 = (1/n^2)[C(N - 1, n - 1) S Xi^2 + C(N - 2, n - 2) SS(i not= j) Xi Xj]

since any pair Xi, Xj, (i not= j), figures together in C(N - 2, n - 2) samples. But

    SS(i not= j) Xi Xj = (S Xi)^2 - S Xi^2 = -N s^2

Using C(N - 1, n - 1)/C(N, n) = n/N and C(N - 2, n - 2)/C(N, n) = n(n - 1)/N(N - 1), we obtain

    sm^2 = (1/n^2)[(n/N) . N s^2 - {n(n - 1)/N(N - 1)} . N s^2] = (s^2/n) . (N - n)/(N - 1)   (7.10.1)

If we let N tend to infinity, sm^2 tends to s^2/n, showing that, when the population is very large, sampling without replacement approximates to simple sampling.
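Because the population is finite, (7.10.1) can be verified exactly by brute force: enumerate every sample of n drawn without replacement from a small population and compare the variance of the sample means with (s^2/n)(N - n)/(N - 1). A sketch (not from the book; the population chosen is arbitrary):

```python
# Exhaustive check of (7.10.1) for a small finite population.
from itertools import combinations

population = [2, 3, 5, 7, 11, 13]   # an arbitrary small population
N, n = len(population), 3

mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N

means = [sum(s) / n for s in combinations(population, n)]
m_bar = sum(means) / len(means)
var_means = sum((m - m_bar) ** 2 for m in means) / len(means)
var_theory = (sigma2 / n) * (N - n) / (N - 1)   # (7.10.1)
```

The agreement is exact (up to floating-point rounding), since (7.10.1) is an identity, not an approximation.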

7.11. Distribution of the Sample Variance. Let s^2 be the variance of a sample of n from a population of variance s^2(pop), and let x be the sample mean. Taking the population mean as origin, so that E(x) = 0 and E(x^2) = s^2(pop),

    s^2 = (1/n) S (xi - x)^2 = (1/n) S xi^2 - x^2, and x^2 = (1/n^2)(S xi)^2 = (1/n^2)[S xi^2 + SS(i not= j) xi xj]

Since xi and xj, (i not= j), are independent, E(SS xi xj) = SS E(xi)E(xj) = 0, while E(S xi^2) = n s^2(pop). Consequently

    E(s^2) = {(n - 1)/n} s^2(pop)   (7.11.1)

Thus the mean value of the variance of all possible samples of n is (n - 1)/n times the variance of the parent population: s^2 = S (xi - x)^2/n is a biased estimator of the population variance. If, on the other hand, we calculate S (xi - x)^2/(n - 1), we have

    E(n s^2/(n - 1)) = E(S (xi - x)^2/(n - 1)) = s^2(pop)   (7.11.2)

In other words: If we calculate S (xi - x)^2/(n - 1), instead of the usual S (xi - x)^2/n, we have an unbiased estimate of the population variance. Of course, the actual figure calculated from the data of a single sample will, in general, differ from the actual value of the population variance; but if we continue sampling and calculating n s^2/(n - 1), we shall obtain a set of values whose mean tends to the actual value of the population variance as the number, N, of samples of n drawn increases. A function of the sample values and n which in this way yields, as its values, unbiased estimates of a population parameter is called an unbiased estimator of that parameter: if theta is a population parameter and theta-cap is an estimator of theta, theta-cap is an unbiased estimator if E(theta-cap) = theta. Thus S xi/n is an unbiased estimator of u, the population mean, since E(S xi/n) = u.

For this reason, some writers define the variance of a sample to be S (xi - x)^2/(n - 1). Although we have not adopted this definition here, we are introduced by it to an important notion: that of the degrees of freedom of a sample. For a given x, only n - 1 of the xi are independent, for when we have selected n - 1 items of the sample, the nth is necessarily determined: S xi = nx is a linear equation of constraint on the sample. If there are p (less than n, of course) linear equations of constraint on the sample, the number of degrees of freedom of the sample is reduced by p. We must now ascertain the standard error of the sample variance.
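The bias of s^2 = S (xi - x)^2/n, and the effect of the n/(n - 1) correction, can be seen in a short simulation (illustrative only; not part of the original text):

```python
# The mean of s^2 over many samples approaches ((n-1)/n) * sigma^2 = 3.2 here,
# while n*s^2/(n-1) is unbiased for sigma^2 = 4.0.
import random

random.seed(2)
sigma2, n, trials = 4.0, 5, 40000

total_s2 = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    total_s2 += sum((x - xbar) ** 2 for x in xs) / n

mean_s2 = total_s2 / trials        # biased: near (n-1)/n * sigma2
unbiased = mean_s2 * n / (n - 1)   # corrected: near sigma2
```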

To find it, and, indeed, to find the distribution of s^2 itself (the sampling variance of the variance, as it is called), we confine our attention to the case when the population sampled is normal with zero mean and variance s^2(pop). What is the probability that a sample (x1, x2, . . . xn) from such a population has a mean lying between x +/- dx/2 and a standard deviation lying between s +/- ds/2? Since the n x's are independent and drawn from the same population, the probability that the n values of x shall lie simultaneously between x1 +/- dx1/2, x2 +/- dx2/2, . . . xn +/- dxn/2 is

    dp = (2 pi s^2(pop))^(-n/2) exp [-(x1^2 + x2^2 + . . . + xn^2)/2s^2(pop)] dx1 dx2 . . . dxn   (7.11.3)

Now think of (x1, x2, . . . xn) as the co-ordinates of a point P in a space of n dimensions; dx1 dx2 . . . dxn is then an element of volume in that space, and dp is the probability that the point P shall lie within this volume element. If we choose a volume element dv in such a way that any point P lying within it represents a sample of n with mean lying between x +/- dx/2 and standard deviation between s +/- ds/2, then dp, with dv in place of dx1 dx2 . . . dxn, will be the probability that our sample has a mean and a standard deviation lying between these limits. Our problem, therefore, is to find an appropriate dv.

Now we have the two equations

    S xi = nx and S (xi - x)^2 = ns^2

each of which represents a locus in our n-dimensional space. If n were equal to 3, the first, (x1 - x) + (x2 - x) + (x3 - x) = 0, represents a plane through the point (x, x, x); the length of the perpendicular from the origin on to this plane is 3x/sqrt(3) = x . sqrt(3), and the perpendicular distance between this plane and the parallel plane through (x + dx, x + dx, x + dx) is dx . sqrt(3). In the n-dimensional case, the equation S xi = nx represents a " hyperplane ", and the " distance " between this " plane " and the " parallel plane " corresponding to x + dx is dx . sqrt(n). Again, if n = 3, the second equation, (x1 - x)^2 + (x2 - x)^2 + (x3 - x)^2 = 3s^2, represents a sphere with centre (x, x, x) and radius s . sqrt(3). The plane x1 + x2 + x3 = 3x passes through the centre of this sphere; the section will therefore be a circle of radius s . sqrt(3), whose area is proportional to s^2. If s increases from s - ds/2 to s + ds/2, the increase in the area of section is proportional to d(s^2). So the volume, dv, enclosed by the two neighbouring spheres and the two neighbouring planes will be proportional to dx . d(s^2). In the n-dimensional case, instead of a sphere, we have a " hypersphere " of " radius " s . sqrt(n), and this " hypersphere " is cut by our " hyperplane " in a section which now has a " volume " proportional to s^(n-1). So, in this case, dv is proportional to dx . d(s^(n-1)) = k s^(n-2) ds dx, say. Consequently, the probability that our sample of n will lie within this volume is given by

    dp = k (2 pi s^2(pop))^(-n/2) exp (-S xi^2/2s^2(pop)) s^(n-2) ds dx   (7.11.3(a))
Now S (xi - x)^2 = ns^2 may be written S xi^2 = n(s^2 + x^2), and, therefore, (7.11.3(a)) separates into

    dp = k1 exp (-nx^2/2s^2(pop)) dx  x  k2 exp (-ns^2/2s^2(pop)) (s^2)^((n-3)/2) d(s^2)   (7.11.4)

where k1 and k2 are constants.

Determination of k1: Since the integral of k1 exp (-nx^2/2s^2(pop)) dx from minus to plus infinity is 1, we have immediately (5.4(e), footnote) k1 = (2 pi s^2(pop)/n)^(-1/2).

Determination of k2: s^2 varies from 0 to infinity; therefore the integral of k2 exp (-ns^2/2s^2(pop)) (s^2)^((n-3)/2) d(s^2) over that range is 1. Put ns^2/2s^2(pop) = x; the integral becomes k2 (2s^2(pop)/n)^((n-1)/2) times the integral of exp (-x) x^((n-1)/2 - 1) dx. But since, by definition (see Mathematical Note to Chapter Three),

    integral from 0 to infinity of exp (-x) x^((n-1)/2 - 1) dx = Gamma((n - 1)/2)

we have k2 = (n/2s^2(pop))^((n-1)/2) / Gamma((n - 1)/2).
We see immediately that, when the population sampled is normal: (i) the mean x and the variance s^2 of a sample of n are distributed independently; (ii) the sample mean is distributed normally about the population mean (taken here at the origin) with variance s^2(pop)/n; and (iii) the sample variance s^2 is not normally distributed.

The moment-generating function for the s^2-distribution is M(t) = E(exp ts^2). Thus

    M(t) = k2 integral from 0 to infinity of exp (-ns^2/2s^2(pop)) exp (ts^2) (s^2)^((n-3)/2) d(s^2)

Writing L = n/2s^2(pop) - t, the integral is that of exp (-Ls^2)(s^2)^((n-3)/2) d(s^2), which, by 3.A.3, equals Gamma((n - 1)/2) L^(-(n-1)/2). Substituting for k2, therefore,

    M(t) = (n/2s^2(pop))^((n-1)/2) (n/2s^2(pop) - t)^(-(n-1)/2) = (1 - 2s^2(pop) t/n)^(-(n-1)/2)   (7.11.5)

7.12. Worked Example. If s1^2 and s2^2 are the variances in two independent samples of the same size taken from a common normal population, determine the distribution of s1^2 + s2^2. (L.U.)

Treatment: The moment-generating function of s^2 for samples of n from a normal population (0, s) is, by (7.11.5), M(t) = (1 - 2s^2 t/n)^(-(n-1)/2). Hence E(exp t s1^2) = E(exp t s2^2) = M(t). But, since the samples are independent,

    E(exp t(s1^2 + s2^2)) = E[(exp t s1^2)(exp t s2^2)] = [M(t)]^2 = (1 - 2s^2 t/n)^(-(n-1))

Expanding this in powers of t, we find that the mean value of s1^2 + s2^2 is 2(n - 1)s^2/n (the coefficient of t/1!), and

    var (s1^2 + s2^2) = 4n(n - 1)s^4/n^2 - 4(n - 1)^2 s^4/n^2 = 4(n - 1)s^4/n^2

which, for large n, is approximately equal to 4s^4/n. The probability differential of s1^2 + s2^2 is

    dp = {(n/2s^2)^(n-1)/Gamma(n - 1)} exp (-n(s1^2 + s2^2)/2s^2) (s1^2 + s2^2)^(n-2) d(s1^2 + s2^2)

EXERCISES ON CHAPTER SEVEN

1. Using the table of random numbers given in the text, draw a random sample of 35 from the " population " in Table 5.1. Calculate the sample mean and compare it with the result obtained in 7.5.

2. A bowl contains a very large number of black and white balls. The probability of drawing a black ball in a single draw is p and that of drawing a white ball, therefore, 1 - p. A sample of m balls is drawn at random, and the number of black balls in the sample is counted and marked as the score for that draw. A second sample of m balls is drawn, and the number of white balls in this sample is the corresponding score. What is the expected combined score? Show also that the variance of the combined score is 2mp(1 - p).

3. Out of a batch of 1,000 kg of chestnuts from a large shipment, it is found that there are 200 kg of bad nuts. Estimate the limits between which the percentage of bad nuts in the shipment is almost certain to lie.

4. A sample of 400 items is drawn from a normal population whose mean is 5 and whose variance is 4. If the sample mean is 4.45, can the sample be regarded as a truly random sample?

5. A sample of 400 items has a mean of 1.13; a sample of 900 items has a mean of 1.01. Can the samples be regarded as having been drawn at random from a common population of standard deviation 0.1?


6. A random variate x is known to have the distribution p(x) = c(1 + x/a)^(m-1) exp (-mx/a), for x > -a. Find the constant c and the first four moments of x. Derive the linear relation between beta1 and beta2 of this distribution. (L.U.)

7. Pairs of values of two variables x and y are given. The variances of x, y and (x - y) are sx^2, sy^2 and s(x-y)^2 respectively. Show that the coefficient of correlation between x and y is (sx^2 + sy^2 - s(x-y)^2)/2 sx sy. (L.U.)

8. If u = ax + by and v = bx - ay, where x and y represent deviations from respective means, and if the correlation coefficient between x and y is p, but u and v are uncorrelated, show that su sv = (a^2 + b^2) sx sy (1 - p^2)^(1/2). (L.U.)

Solutions

2. Expected score is m.
3. Probability p of 1 kg of bad nuts is 200/1,000 = 0.2. Assume this is constant throughout the batch; q = 0.8. Mean is np and variance npq. For the proportion of bad nuts we divide the variate by n and hence the variance by n^2, giving variance pq/n. The standard error of the proportion of bad nuts is sqrt(0.2 x 0.8/1,000) = 0.01264. The probability that a normal variate will differ from its mean value by more than three times its standard error is 0.0027, so we can be practically sure that no deviation will be greater than this. The required limits for the percentage of bad nuts are therefore 100(0.2 +/- 3 x 0.01264), i.e., 23.8% and 16.2%.
4. No: the deviation of the sample mean from the population mean is more than 4 times the S.E. of the mean for a sample of the size given.
5. No: the difference between the means is nearly twenty times the S.E. of the difference of means.
6. Hint: the integral of p(x)dx from -a to infinity is 1. Transform by using the substitution 1 + x/a = t/m. Then c = m^m e^(-m)/a Gamma(m); the mean-moment generating function is e^(-at)(1 - at/m)^(-m); u2/2! = a^2/2m; u3/3! = a^3/3m^2; u4/4! = a^4(m + 2)/8m^3; and 2 beta2 - 3 beta1 = 6.

CHAPTER EIGHT

SAMPLE AND POPULATION II: t, z AND F

8.1. The t-distribution. We have seen that if x is the mean of a sample of n from a normal population (u, s), the variate t = (x - u)/(s/sqrt(n)) is normally distributed about zero mean with unit variance. But what if the variance of the parent population is unknown and we wish to test whether a given sample can be considered to be a random sample from that population? The best we can do is to use an unbiased estimate of s based on our sample of n, namely s(sample) {n/(n - 1)}^(1/2), s(sample) being the sample standard deviation. But if we do this, we cannot assume that the resulting variate

    t = (x - u)(n - 1)^(1/2)/s(sample)   (8.1.1)

is a normal variate; in fact, it is not. What, then, is the distribution of this t (called Student's t)? Since we may write t = (n - 1)^(1/2) {(x/s)/(s(sample)/s)}, t is independent of the population variance, a most convenient consequence which contributes greatly to the importance of the distribution.

Let the population mean be taken as origin. Now x is normally distributed about zero with variance s^2/n and, as we showed in the last chapter, when the parent population is normal, x and s(sample)^2 are statistically independent. Since t is independent of s, we may take s = 1. Thus

    dp(x) = (n/2 pi)^(1/2) exp (-nx^2/2) dx   (8.1.1(a))

If we hold s(sample) constant, we have s(sample) dt = (n - 1)^(1/2) dx, so the probability differential of t for a constant s(sample) may be obtained from dp(x) by using 8.1.1(a) and this relation:

    dp(t, constant s) = [n/2 pi (n - 1)]^(1/2) s exp [-ns^2 t^2/2(n - 1)] dt   (8.1.2)

If now we multiply this by the probability differential of s^2 and integrate with respect to s^2 from 0 to infinity, we obtain the probability differential of t for all s^2:

    dp(t) = dt integral of [n/2 pi (n - 1)]^(1/2) s exp [-ns^2 t^2/2(n - 1)] x (n/2)^((n-1)/2)/Gamma[(n - 1)/2] exp (-ns^2/2) (s^2)^((n-3)/2) d(s^2)

the second factor being the probability differential of s^2 with the population variance taken as unity. Collecting terms, the integrand is a multiple of (s^2)^((n-2)/2) exp [-(n/2)s^2 {1 + t^2/(n - 1)}] d(s^2). Putting (n/2)s^2 {1 + t^2/(n - 1)} = u, the integral reduces to a Gamma function, and, using Gamma(1/2) = sqrt(pi) and B[(n - 1)/2, 1/2] = Gamma[(n - 1)/2] Gamma(1/2)/Gamma(n/2), we obtain

    dp(t) = {Gamma(n/2)/sqrt(pi (n - 1)) Gamma[(n - 1)/2]} [1 + t^2/(n - 1)]^(-n/2) dt   (8.1.3)

We see at once that t is not normally distributed. If, for example, n = 2, we have dp(t) = (1/pi)(1 + t^2)^(-1) dt, which defines what is known as the Cauchy distribution, a distribution departing very considerably from normality (its variance, for instance, is infinite). However, since [1 + t^2/(n - 1)]^(-n/2) may be written {[1 + t^2/(n - 1)]^(n-1)}^(-1/2) [1 + t^2/(n - 1)]^(-1/2), and (1 + x/m)^m tends to exp x as m tends to infinity, while, using Stirling's approximation (5.7), Gamma(n/2)/Gamma[(n - 1)/2] behaves like sqrt(n/2) for large n, the t-distribution approaches normality, (1/sqrt(2 pi)) exp (-t^2/2) dt, as n increases.

It is customary to put v = (n - 1), the number of degrees of freedom of the sample, and to write the probability function of t for v degrees of freedom Fv(t). In his Statistical Methods for Research Workers, Sir R. A. Fisher gives a table of the values of |t| for given v which will be exceeded, in random sampling, with certain probabilities (P).

8.2. Worked Example. A random sample of 16 values from a normal population is found to have a mean of 41.5 and a standard deviation of 2.795. On this information is there any reason to reject the hypothesis that the population mean is 43?

Treatment: t = (41.5 - 43) x 15^(1/2)/2.795, so |t| = 2.078 for 15 degrees of freedom. Entering Table 8.1 at v = 15, we find that the value of |t| which will be exceeded with a probability of 0.05 is 2.13, while the probability of |t| > 1.75 is 0.10. Thus a value of |t| as large as ours will occur by chance with a probability of more than 0.05. On the information provided by the sample, there is then no reason for rejecting the hypothesis.

8.3. Confidence Limits. Suppose, in the above example, we had wanted to find, from the sample data, the limits within which the population mean will lie with a probability of 0.95. We call these limits the 95% confidence limits of the population mean for the sample in question. To find these limits, we put

    |t| = |41.5 - u| x 15^(1/2)/2.795 < 2.13

or 39.9 < u < 43.1. Exercise: Show that the 98% confidence limits are 39.62 and 43.38.
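The computation of Worked Example 8.2 in code (a sketch; the 5% point 2.13 for v = 15 is the Table 8.1 entry):

```python
# Worked Example 8.2: t = (xbar - mu) * sqrt(n - 1) / s.
import math

n, xbar, s, mu0 = 16, 41.5, 2.795, 43.0
t = (xbar - mu0) * math.sqrt(n - 1) / s   # about -2.078
significant_5pc = abs(t) > 2.13           # False: do not reject mu = 43
```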

Fisher's P is related to Fv(t) by P = 1 - 2 x (integral of Fv(t)dt from 0 to tP).

Table 8.1. Values of |t| for v Degrees of Freedom Exceeded with Probability P in Random Sampling

(Abridged, from Statistical Methods for Research Workers, by permission of the author, Sir R. A. Fisher, and the publishers, Messrs. Oliver and Boyd.)

    v     P = 0.50   0.10    0.05    0.02    0.01
    1      1.000     6.34   12.71   31.82   63.66
    2      0.816     2.92    4.30    6.96    9.92
    3      0.765     2.35    3.18    4.54    5.84
    4      0.741     2.13    2.78    3.75    4.60
    5      0.727     2.02    2.57    3.36    4.03
    6      0.718     1.94    2.45    3.14    3.71
    7      0.711     1.90    2.36    3.00    3.50
    8      0.706     1.86    2.31    2.90    3.36
    9      0.703     1.83    2.26    2.82    3.25
   10      0.700     1.81    2.23    2.76    3.17
   11      0.697     1.80    2.20    2.72    3.11
   12      0.695     1.78    2.18    2.68    3.06
   13      0.694     1.77    2.16    2.65    3.01
   14      0.692     1.76    2.14    2.62    2.98
   15      0.691     1.75    2.13    2.60    2.95
   16      0.690     1.75    2.12    2.58    2.92
   17      0.689     1.74    2.11    2.57    2.90
   18      0.688     1.73    2.10    2.55    2.88
   19      0.688     1.73    2.09    2.54    2.86
   20      0.687     1.72    2.09    2.53    2.84
   25      0.684     1.71    2.06    2.48    2.79
   30      0.683     1.70    2.04    2.46    2.75
   35      0.682     1.69    2.03    2.44    2.72
   40      0.681     1.68    2.02    2.42    2.71
   45      0.680     1.68    2.02    2.41    2.69
   50      0.679     1.68    2.01    2.40    2.68
   60      0.678     1.67    2.00    2.39    2.66
   inf     0.674     1.64    1.96    2.33    2.58

sample

and

population.

ii

155

For a sample of n with mean x and variance s^2, |t| = |x - u|(n - 1)^(1/2)/s = |x - u| v^(1/2)/s; and if tP is the value of t with a probability P of being exceeded for v degrees of freedom, then the (1 - P)100% confidence limits for u are:

    x - s tP/v^(1/2) < u < x + s tP/v^(1/2)   (8.3.1)
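Formula (8.3.1) applied to the sample of Worked Example 8.2 (a sketch; the tP values 2.13 and 2.60 are the P = 0.05 and P = 0.02 entries for v = 15 in Table 8.1):

```python
# 95% and 98% confidence limits for the population mean, from (8.3.1).
import math

xbar, s, nu = 41.5, 2.795, 15
half95 = 2.13 * s / math.sqrt(nu)
half98 = 2.60 * s / math.sqrt(nu)
limits95 = (xbar - half95, xbar + half95)   # about (39.96, 43.04)
limits98 = (xbar - half98, xbar + half98)   # about (39.62, 43.38)
```

The text rounds the 95% limits to 39.9 and 43.1.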

8.4. Other Applications of the t-distribution. It has been shown by Sir R. A. Fisher that:

If t is a variate which is a fraction, the numerator of which is a normally distributed statistic and the denominator the square root of an independently distributed and unbiased estimate of the variance of the numerator with v degrees of freedom, then t is distributed with probability function Fv(t).

Problem: Given two independent samples of n1 and n2 values with means x and X, how can we test whether they are drawn from the same normal population?

We begin by setting up the hypothesis that the samples are from the same population. Let xi, (i = 1, 2, . . . n1), and Xj, (j = 1, 2, . . . n2), be the two samples. Then x = S xi/n1 and X = S Xj/n2, while the sample variances are respectively

    s1^2 = S (xi - x)^2/n1 and s2^2 = S (Xj - X)^2/n2

These give the unbiased estimates of the population variance n1 s1^2/(n1 - 1) and n2 s2^2/(n2 - 1). Now since

    E(n1 s1^2 + n2 s2^2) = (n1 - 1)s^2 + (n2 - 1)s^2 = (n1 + n2 - 2)s^2,

    sigma-cap^2 = (n1 s1^2 + n2 s2^2)/(n1 + n2 - 2)   (8.4.1)

gives an unbiased estimate of s^2 based on the two samples, with v = n1 + n2 - 2 degrees of freedom. If our hypothesis is true, that is, if our samples are from the same normal population, x and X are normally distributed about u, the population mean, with variances s^2/n1 and s^2/n2 respectively. Therefore (7.9, Example 3), since the samples are independent, the difference, x - X, of their means is normally distributed with variance s^2(1/n1 + 1/n2). It follows that sigma-cap^2 (1/n1 + 1/n2) is an unbiased estimate of the variance of the normally distributed statistic x - X, and, therefore, in accordance with the opening statement of this section,

    t = {(x - X)/sigma-cap} [n1 n2/(n1 + n2)]^(1/2)   (8.4.2)

is distributed like t with v = n1 + n2 - 2 degrees of freedom.
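The two-sample statistic (8.4.1)-(8.4.2) is easily packaged as a function (a sketch, not from the book; pooled_t is an illustrative name). Applied to the rifle-range scores of the next section, it reproduces t = 0.82 with v = 18:

```python
# Pooled estimate (8.4.1) and Student's t (8.4.2) for two independent samples.
import math

def pooled_t(xs, Xs):
    n1, n2 = len(xs), len(Xs)
    xbar, Xbar = sum(xs) / n1, sum(Xs) / n2
    s1_2 = sum((x - xbar) ** 2 for x in xs) / n1   # sample variances with
    s2_2 = sum((X - Xbar) ** 2 for X in Xs) / n2   # divisor n, as in the text
    sig2 = (n1 * s1_2 + n2 * s2_2) / (n1 + n2 - 2)           # (8.4.1)
    t = (xbar - Xbar) / math.sqrt(sig2) * math.sqrt(n1 * n2 / (n1 + n2))
    return t, n1 + n2 - 2

week1 = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
week2 = [70, 38, 58, 58, 56, 67, 68, 77, 42, 38]
t, nu = pooled_t(week2, week1)   # about 0.82, with nu = 18
```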


8.5. Worked Examples. 1. Ten soldiers visit the rifle range two weeks running. The first week their scores were:

    67, 24, 57, 55, 63, 54, 56, 68, 33, 43

The second week they score, in the same order:

    70, 38, 58, 58, 56, 67, 68, 77, 42, 38

Is there any significant improvement? How would the test be affected if the scores were not shown in the same order each time? (A.I.S.)

Treatment:

    1st week     2nd week
    score (x).   score (X).   X - x   (X - x)^2     x^2       X^2
       67           70           3         9       4,489     4,900
       24           38          14       196         576     1,444
       57           58           1         1       3,249     3,364
       55           58           3         9       3,025     3,364
       63           56          -7        49       3,969     3,136
       54           67          13       169       2,916     4,489
       56           68          12       144       3,136     4,624
       68           77           9        81       4,624     5,929
       33           42           9        81       1,089     1,764
       43           38          -5        25       1,849     1,444
      520          572          52       764      28,922    34,458
    (10 x-bar)   (10 X-bar)

(1) We assume there is no significant improvement: that, consequently, both X and x are drawn from the same normal population and that, therefore, X - x is normally distributed about zero. Then, regarding the 10 values of X - x as our sample, we have

    s^2 = var (X - x) = S (X - x)^2/n - (X-bar - x-bar)^2 = 76.4 - 27.04 = 49.36

and, therefore, s = 7.026. Hence

    t = (X-bar - x-bar)(n - 1)^(1/2)/s = 5.2 x 3/7.026 = 2.22

Entering Table 8.1 at v = 9, we find that the probability with which t = 2.26 is exceeded is 0.05, while the probability with which t = 1.83 is exceeded is 0.10. Therefore the result, while significant at the 10% level, is not significant at the 5% level. We conclude, therefore, that there is some small evidence of improvement.

(2) Had the scores not been given in the same order, we should have had to rely on the difference between the mean scores.
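Part (1) of the treatment, recomputed (values from the text; the variance is taken with divisor n, as the book defines it):

```python
# Paired differences X - x, their variance, and t = dbar * sqrt(n-1) / s.
import math

x = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
X = [70, 38, 58, 58, 56, 67, 68, 77, 42, 38]
d = [b - a for a, b in zip(x, X)]
n = len(d)
dbar = sum(d) / n                             # 5.2
s2 = sum((di - dbar) ** 2 for di in d) / n    # 49.36
t = dbar * math.sqrt(n - 1) / math.sqrt(s2)   # about 2.22
```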


We again suppose that there has been no significant improvement and use the variate

    t = {(x-bar - X-bar)/sigma-cap} [n1 n2/(n1 + n2)]^(1/2)

where sigma-cap^2 = (n1 s1^2 + n2 s2^2)/(n1 + n2 - 2). In the present case n1 = n2 = 10, and we have 10 s1^2 = S x^2 - 10 x-bar^2 and 10 s2^2 = S X^2 - 10 X-bar^2. So

    10(s1^2 + s2^2) = S x^2 + S X^2 - 10(x-bar^2 + X-bar^2) = 28,922 + 34,458 - 10(52^2 + 57.2^2) = 3,622

Therefore sigma-cap^2 = 10(s1^2 + s2^2)/18 = 201.2, or sigma-cap = 14.18. Consequently,

    t = (5.2/14.18) x (100/20)^(1/2) = 0.82 for v = 18 d.f.
Entering Table 8.1 at v = 18, we find that there is a 0.5 probability that t will exceed 0.688 and a probability of 0.10 that t will exceed 1.73. Consequently, the result is not significant even at the 10% level, and there is no reason to reject the hypothesis that there has been no significant improvement.

2. In an ordnance factory two different methods of shell-filling are compared. The average and standard deviation of weights in a sample of 96 shells filled by one process are 1.26 kg and 0.013 kg, and a sample of 72 shells filled by the second process gave a mean of 1.28 kg and a standard deviation of 0.011 kg. Is the difference in weights significant? (Brookes and Dick.)

Treatment: Assuming that there is no significance in the difference of weights,

    sigma-cap^2 = {96 x (0.013)^2 + 72 x (0.011)^2}/(96 + 72 - 2), or sigma-cap = 0.0125

|x-bar - X-bar| = 0.02 and [n1 n2/(n1 + n2)]^(1/2) = (96 x 72/168)^(1/2) = 6.43. Therefore

    |t| = 0.020 x 6.43/0.0125 = 10.29 for 166 degrees of freedom.

Since v is so large in this case, we may assume that t is normally distributed about zero mean with unit variance. Then |t| is more than 10 standard deviations, which is highly unlikely to arise by chance alone. The difference in the weights is, then, highly significant.
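Worked Example 2 in code (a sketch; computed without the intermediate rounding of the text, so |t| comes out near 10.5 rather than 10.29, either way far beyond any conventional significance point):

```python
# Pooled estimate and |t| for the two shell-filling processes.
import math

n1, m1, s1 = 96, 1.26, 0.013
n2, m2, s2 = 72, 1.28, 0.011
sig = math.sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2))
t_abs = abs(m1 - m2) / sig * math.sqrt(n1 * n2 / (n1 + n2))
nu = n1 + n2 - 2   # 166: so large that t is effectively a normal variate
```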


samples are in fact from the same population. Thus, the present problem is logically the more fundamental.

Problem: A standard cell, whose voltage is known to be 1.10 volts, was used to test the accuracy of two voltmeters, A and B. Ten independent readings of the voltage of the cell were taken with each voltmeter. The results were:

A:  1.11  1.15  1.14  1.10  1.09  1.11  1.12  1.15  1.13  1.14
B:  1.12  1.06  1.02  1.08  1.11  1.05  1.06  1.03  1.05  1.08

Is there evidence of bias in either voltmeter, and is there any evidence that one voltmeter is more consistent than the other? (R.S.S.)

We already know how to tackle the first part of the problem (see the end of this section), but what about the second part? The consistency of either meter will be measured by the variance of the population of all possible readings of that voltmeter, and this variance will be estimated from the ten sample readings given. Thus we have to devise a test to compare the two estimates. This has been done by Sir R. A. Fisher, whose test is: if u² and v² are unbiased estimates of a population variance based on n₁ − 1 and n₂ − 1 degrees of freedom respectively (where n₁ and n₂ are the respective sample sizes), then by calculating z = ½ logₑ(u²/v²) and using the appropriate tables given in Statistical Methods for Research Workers, we can decide whether the value of this variance ratio, u²/v², is likely to result from random sampling from the same population.

Let x_i (i = 1, 2, 3, ... n₁) and X_j (j = 1, 2, 3, ... n₂) be two independent samples with means x̄ and X̄ respectively. Unbiased estimates of the population variance are:

u² = n₁s₁²/(n₁ − 1)  and  v² = n₂s₂²/(n₂ − 1),

where s₁² and s₂² are the respective sample variances. If ν₁ = n₁ − 1 and ν₂ = n₂ − 1,

u² = (ν₁ + 1)s₁²/ν₁  and  v² = (ν₂ + 1)s₂²/ν₂  .  .  (8.6.1)

Now the sample variance, s², has the probability differential (7.11.4)

dp(s²) = [(n/2σ²)^((n−1)/2)/Γ((n−1)/2)] exp(−ns²/2σ²)(s²)^((n−3)/2) d(s²)

Substituting from (8.6.1), the probability differential of u² is

dp(u²) = [(ν₁/2σ²)^(ν₁/2)/Γ(ν₁/2)](u²)^((ν₁−2)/2) exp(−ν₁u²/2σ²) d(u²)  .  (8.6.2)

and, likewise, that of v² is

dp(v²) = [(ν₂/2σ²)^(ν₂/2)/Γ(ν₂/2)](v²)^((ν₂−2)/2) exp(−ν₂v²/2σ²) d(v²)  .  (8.6.3)

But u² and v² are independent; therefore their joint probability differential is the product

[(ν₁/2σ²)^(ν₁/2)(ν₂/2σ²)^(ν₂/2)/Γ(ν₁/2)Γ(ν₂/2)](u²)^((ν₁−2)/2)(v²)^((ν₂−2)/2) exp[−(ν₁u² + ν₂v²)/2σ²] d(u²)d(v²)  .  (8.6.4)

Now let

z = ½ logₑ(u²/v²), i.e., u² = v² exp(2z), so that, for a given v², d(u²) = 2v² exp(2z)dz  .  (8.6.5)

Then (8.6.4) transforms into the joint probability differential of v² and z,

[2(ν₁/2σ²)^(ν₁/2)(ν₂/2σ²)^(ν₂/2)/Γ(ν₁/2)Γ(ν₂/2)] exp(ν₁z)(v²)^((ν₁+ν₂−2)/2) exp[−(ν₁ exp(2z) + ν₂)v²/2σ²] d(v²)dz  .  (8.6.6)

To find the probability differential of z, for a given z we integrate this with respect to v² between 0 and ∞. Recalling that Γ(n) = ∫₀^∞ x^(n−1) exp(−x)dx, we put x = (ν₁ exp(2z) + ν₂)v²/2σ² and obtain

dp(z) = [2ν₁^(ν₁/2)ν₂^(ν₂/2)/B(ν₁/2, ν₂/2)] exp(ν₁z)[ν₁ exp(2z) + ν₂]^(−(ν₁+ν₂)/2) dz  .  (8.6.7)

The parameter σ² has disappeared: the distribution of z is independent of the variance of the parent population.
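Readers with a computer to hand can confirm that the density of z just derived is a proper probability density. The following Python sketch (ours, not the book's) integrates it numerically over the real line for ν₁ = ν₂ = 9 and checks that the result is unity:

```python
import math

def fisher_z_pdf(z, nu1, nu2):
    """Probability density of Fisher's z = (1/2) ln(u^2/v^2)."""
    log_B = (math.lgamma(nu1 / 2) + math.lgamma(nu2 / 2)
             - math.lgamma((nu1 + nu2) / 2))
    log_pdf = (math.log(2) + (nu1 / 2) * math.log(nu1)
               + (nu2 / 2) * math.log(nu2) - log_B + nu1 * z
               - ((nu1 + nu2) / 2) * math.log(nu1 * math.exp(2 * z) + nu2))
    return math.exp(log_pdf)

# Riemann sum over a range wide enough that the tails are negligible.
nu1, nu2 = 9, 9
h, lo, hi = 0.001, -12.0, 12.0
n = int((hi - lo) / h)
total = sum(fisher_z_pdf(lo + i * h, nu1, nu2) for i in range(n + 1)) * h
print(round(total, 4))
```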

This defines Fisher's z-distribution. How do we use it? The probability, P(z ≤ Z), that z will be not greater than some given value Z, for ν₁, ν₂ degrees of freedom, is ∫ from −∞ to Z of dp(z), and

P(z > Z) = 1 − P(z ≤ Z).

In his book Fisher gives tables setting down the values of Z exceeded with probabilities 0.05 and 0.01 for given ν₁ and ν₂. He calls these values, Z₀.₀₅ and Z₀.₀₁, the "5% and 1% points" of z.

To obviate the necessity of using logarithms, G. W. Snedecor (Statistical Methods, Collegiate Press, Inc., Ames, Iowa) tabulated the 5% and 1% points of the variance ratio itself, u²/v², which he denotes by F, in honour of Fisher. Here u² is the larger of the two estimates of the population variance, so that F = u²/v² = exp(2z). Substituting F = u²/v² in (8.6.7), we have

dp(F) = [ν₁^(ν₁/2)ν₂^(ν₂/2)/B(ν₁/2, ν₂/2)] F^((ν₁−2)/2)(ν₁F + ν₂)^(−(ν₁+ν₂)/2) dF  .  (8.6.8)

which, it should be noted, is independent of the variance of the parent population. Writing x = ν₁F/(ν₁F + ν₂), or F = ν₂x/ν₁(1 − x), we find that P(F ≤ F₀) is the Incomplete B-function Ratio, I_x₀(ν₁/2, ν₂/2), where x₀ = ν₁F₀/(ν₁F₀ + ν₂), and can be found from the appropriate tables (see Mathematical Note to Chapter Three, D).

In the F-table, ν₁, the d.f. of the larger estimate, gives the column required; ν₂, the d.f. of the smaller estimate, the row required. At the intersection of the appropriate column and row we find two figures: the upper figure is that value of F exceeded with a probability of 0.05, the 5% point; the lower figure is that value exceeded with a probability of 0.01, the 1% point.

We may now return to the problem at the beginning of this section.
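The Incomplete B-function relation gives a practical way of computing tail probabilities of F without Fisher's z-tables. A Python sketch (ours; the equivalent upper-tail form P(F ≥ F₀) = I_x(ν₂/2, ν₁/2) with x = ν₂/(ν₁F₀ + ν₂) is used, and the quadrature is a plain midpoint rule) checks that the 5% point for ν₁ = ν₂ = 9, about 3.18, does cut off a tail probability of about 0.05:

```python
import math

def f_tail_prob(F0, nu1, nu2, steps=100000):
    """P(F >= F0) as an incomplete-beta ratio, by the midpoint rule."""
    a, b = nu2 / 2, nu1 / 2
    x = nu2 / (nu1 * F0 + nu2)
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    h = x / steps
    s = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        s += math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_B)
    return s * h

p = f_tail_prob(3.18, 9, 9)
print(round(p, 3))
```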

Table 8.6. 5% and 1% Points for the Distribution of the Variance Ratio, F
(Adapted from Table 10.3 of Statistical Methods by G. W. Snedecor (5th Edition, 1956, pp. 246-249), by permission of the author and publishers. Upper figure: 5% point; lower figure: 1% point.)

ν₂\ν₁    1      2      3      4      5      6      8      12     24     ∞
 1     161    200    216    225    230    234    239    244    249    254
      4052   4999   5403   5625   5764   5859   5981   6106   6234   6366
 2    18.51  19.00  19.16  19.25  19.30  19.33  19.37  19.41  19.45  19.50
      98.49  99.01  99.17  99.25  99.30  99.33  99.36  99.42  99.46  99.50
 3    10.13   9.55   9.28   9.12   9.01   8.94   8.84   8.74   8.64   8.53
      34.12  30.81  29.46  28.71  28.24  27.91  27.49  27.05  26.60  26.12
 4     7.71   6.94   6.59   6.39   6.26   6.16   6.04   5.91   5.77   5.63
      21.20  18.00  16.69  15.98  15.52  15.21  14.80  14.37  13.93  13.46
 5     6.61   5.79   5.41   5.19   5.05   4.95   4.82   4.70   4.53   4.36
      16.26  13.27  12.06  11.39  10.97  10.67  10.27   9.89   9.47   9.02
 6     5.99   5.14   4.76   4.53   4.39   4.28   4.15   4.00   3.84   3.67
      13.74  10.92   9.78   9.15   8.75   8.47   8.10   7.79   7.31   6.88
 7     5.59   4.74   4.35   4.12   3.97   3.87   3.73   3.57   3.41   3.23
      12.25   9.55   8.45   7.85   7.46   7.19   6.84   6.47   6.07   5.65
 8     5.32   4.46   4.07   3.84   3.69   3.58   3.44   3.28   3.12   2.93
      11.26   8.65   7.59   7.01   6.63   6.37   6.03   5.67   5.28   4.86
 9     5.12   4.26   3.86   3.63   3.48   3.37   3.23   3.07   2.90   2.71
      10.56   8.02   6.99   6.42   6.06   5.80   5.47   5.11   4.73   4.31
10     4.96   4.10   3.71   3.48   3.33   3.22   3.07   2.91   2.74   2.54
      10.04   7.56   6.55   5.99   5.64   5.39   5.06   4.71   4.33   3.91
11     4.84   3.98   3.59   3.36   3.20   3.09   2.95   2.79   2.61   2.40
       9.65   7.20   6.22   5.67   5.32   5.07   4.74   4.40   4.02   3.60
12     4.75   3.88   3.49   3.26   3.11   3.00   2.85   2.69   2.50   2.30
       9.33   6.93   5.95   5.41   5.06   4.82   4.50   4.16   3.78   3.36
13     4.67   3.80   3.41   3.18   3.02   2.92   2.77   2.60   2.42   2.21
       9.07   6.70   5.74   5.20   4.86   4.62   4.30   3.96   3.59   3.16
14     4.60   3.74   3.34   3.11   2.96   2.85   2.70   2.53   2.35   2.13
       8.86   6.51   5.56   5.03   4.69   4.46   4.14   3.80   3.43   3.00
15     4.54   3.68   3.29   3.06   2.90   2.79   2.64   2.48   2.29   2.07
       8.68   6.36   5.42   4.89   4.56   4.32   4.00   3.67   3.29   2.87
16     4.49   3.63   3.24   3.01   2.85   2.74   2.59   2.42   2.24   2.01
       8.53   6.23   5.29   4.77   4.44   4.20   3.89   3.55   3.18   2.75
17     4.45   3.59   3.20   2.96   2.81   2.70   2.55   2.38   2.19   1.96
       8.40   6.11   5.18   4.67   4.34   4.10   3.79   3.45   3.08   2.65
18     4.41   3.55   3.16   2.93   2.77   2.66   2.51   2.34   2.15   1.92
       8.28   6.01   5.09   4.58   4.25   4.01   3.71   3.37   3.00   2.57
19     4.38   3.52   3.13   2.90   2.74   2.63   2.48   2.31   2.11   1.88
       8.18   5.93   5.01   4.50   4.17   3.94   3.63   3.30   2.92   2.49

Table 8.6 (continued)

ν₂\ν₁    1      2      3      4      5      6      8      12     24     ∞
20     4.35   3.49   3.10   2.87   2.71   2.60   2.45   2.28   2.08   1.84
       8.10   5.85   4.94   4.43   4.10   3.87   3.56   3.23   2.86   2.42
21     4.32   3.47   3.07   2.84   2.68   2.57   2.42   2.25   2.05   1.81
       8.02   5.78   4.87   4.37   4.04   3.81   3.51   3.17   2.80   2.36
22     4.30   3.44   3.05   2.82   2.66   2.55   2.40   2.23   2.03   1.78
       7.94   5.72   4.82   4.31   3.99   3.76   3.45   3.12   2.75   2.31
23     4.28   3.42   3.03   2.80   2.64   2.53   2.38   2.20   2.00   1.76
       7.88   5.66   4.76   4.26   3.94   3.71   3.41   3.07   2.70   2.26
24     4.26   3.40   3.01   2.78   2.62   2.51   2.36   2.18   1.98   1.73
       7.82   5.61   4.72   4.22   3.90   3.67   3.36   3.03   2.66   2.21
30     4.17   3.32   2.92   2.69   2.53   2.42   2.27   2.09   1.89   1.62
       7.56   5.39   4.51   4.02   3.70   3.47   3.17   2.84   2.47   2.01
40     4.08   3.23   2.84   2.61   2.45   2.34   2.18   2.00   1.79   1.51
       7.31   5.18   4.31   3.83   3.51   3.29   2.99   2.66   2.29   1.80
60     4.00   3.15   2.76   2.52   2.37   2.25   2.10   1.92   1.70   1.39
       7.08   4.98   4.13   3.65   3.34   3.12   2.82   2.50   2.12   1.60
120    3.92   3.07   2.68   2.45   2.29   2.17   2.02   1.83   1.61   1.25
       6.85   4.79   3.95   3.48   3.17   2.96   2.66   2.34   1.95   1.38
∞      3.84   2.99   2.60   2.37   2.21   2.09   1.94   1.75   1.52   1.00
       6.64   4.60   3.78   3.32   3.02   2.80   2.51   2.18   1.79   1.00

Note to Table 8.6. (1) To find the 5% and 1% points for values of ν₁ or ν₂ not given in the above table, we interpolate linearly in 24/ν₁ when ν₁ > 8, and in 120/ν₂ when ν₂ > 24 (thus 24/8 = 3, 24/12 = 2, 24/24 = 1, 24/∞ = 0; 120/40 = 3, 120/60 = 2, 120/120 = 1, 120/∞ = 0). We proceed as illustrated below:

(a) To find the 5% point of F when ν₁ = 200, ν₂ = 18: enter the Table at ν₂ = 18. The 5% point for ν₁ = 24 is 2.15; that for ν₁ = ∞ is 1.92; the difference is 0.23. Divide: 24/200 = 0.12. Then 0.12 × 0.23 = 0.0276. We add this to 1.92, obtaining 1.95, correct to two decimal places.

(b) To find the 1% point of F when ν₁ = 11, ν₂ = 21: enter the Table at ν₂ = 21. The 1% point for ν₁ = 8 is 3.51 and that for ν₁ = 12 is 3.17; the difference is 0.34. Divide: 24/8 = 3, 24/12 = 2, 24/11 = 2.18; the required point therefore lies 0.18 of the way from ν₁ = 12 towards ν₁ = 8, and 0.18 × 0.34 = 0.06. Hence the required 1% point is 3.17 + 0.06 = 3.23.

(c) To find the 5% point of F when ν₁ = 4, ν₂ = 55: enter the Table at ν₁ = 4. The 5% point for ν₂ = 40 is 2.61 and that for ν₂ = 60 is 2.52; the difference is 0.09. Divide: 120/40 = 3, 120/60 = 2, 120/55 = 2.18; so 0.18 × 0.09 = 0.016, and the required 5% point is 2.52 + 0.016 = 2.54, correct to two decimal places.

(d) To find the 1% point of F when ν₁ = 12, ν₂ = 500: enter the Table at ν₁ = 12. The 1% point for ν₂ = 120 is 2.34 and that for ν₂ = ∞ is 2.18; the difference is 0.16. Divide: 120/500 = 0.24. Then 0.24 × 0.16 = 0.038, and the required 1% point is 2.18 + 0.038 = 2.22, correct to two decimal places.

(2) If we make the substitution F = t² in (8.6.8), simultaneously putting ν₁ = 1 and ν₂ = ν, we find that the probability differential of F transforms into that for t. Thus we may use the F-tables to find the 5% and 1% points of t: they are, in fact, the square roots of the 5% and 1% points of F for ν₁ = 1, ν₂ = ν. (See also 10.3.)
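The interpolation rule of the Note can be stated in two lines of code. The sketch below (ours) reproduces two of the worked figures, interpolating between a tabulated entry and the entry for ∞:

```python
# Harmonic interpolation towards infinity, as in the Note to Table 8.6:
# for nu1 beyond 24 interpolate linearly in 24/nu1, for nu2 beyond 120
# in 120/nu2.

def interp(value_at_inf, value_at_tab, tab, nu):
    """Interpolate between the tabulated point `tab` and infinity."""
    frac = tab / nu            # e.g. 120/500 = 0.24
    return value_at_inf + frac * (value_at_tab - value_at_inf)

# Example (d): 1% point for nu1 = 12, nu2 = 500,
# from F(12, 120) = 2.34 and F(12, inf) = 2.18.
p1 = interp(2.18, 2.34, 120, 500)
print(round(p1, 2))   # 2.22

# Example (a): 5% point for nu1 = 200, nu2 = 18,
# from F(24, 18) = 2.15 and F(inf, 18) = 1.92.
p2 = interp(1.92, 2.15, 24, 200)
print(round(p2, 2))   # 1.95
```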

Returning to the voltmeter problem, we tabulate the working as follows:

Voltmeter A                          Voltmeter B
x      x − 1.10  (x − 1.10)²         X      X − 1.10  (X − 1.10)²
1.11     0.01      0.0001           1.12     0.02      0.0004
1.15     0.05      0.0025           1.06    −0.04      0.0016
1.14     0.04      0.0016           1.02    −0.08      0.0064
1.10     0.00      0.0000           1.08    −0.02      0.0004
1.09    −0.01      0.0001           1.11     0.01      0.0001
1.11     0.01      0.0001           1.05    −0.05      0.0025
1.12     0.02      0.0004           1.06    −0.04      0.0016
1.15     0.05      0.0025           1.03    −0.07      0.0049
1.13     0.03      0.0009           1.05    −0.05      0.0025
1.14     0.04      0.0016           1.08    −0.02      0.0004
Totals:  0.24      0.0098                   −0.34      0.0208

x̄ = 1.10 + 0.24/10 = 1.124;  X̄ = 1.10 − 0.34/10 = 1.066.

s₁² = Σ(x − 1.10)²/10 − (x̄ − 1.10)² = 0.00098 − (0.024)² = 0.000404, so s₁ = 0.0201;
s₂² = Σ(X − 1.10)²/10 − (X̄ − 1.10)² = 0.00208 − (0.034)² = 0.000924, so s₂ = 0.0304.

For voltmeter A: |t| = 0.024 × 9^½/0.0201 = 3.58. Entering Table 8.1 at ν = 9 we find that the value of t exceeded with a probability of 0.01 is 3.25. The result is therefore significant at the 1% level. Since the value of t here is positive, the voltmeter A definitely reads high.

For voltmeter B: |t| = 0.034 × 9^½/0.0304 = 3.36. Once again the value of t is significant at the 1% level and we conclude that, since t is here negative, the voltmeter reads low.

To test whether there is evidence that one voltmeter is more consistent than the other, we set up the null hypothesis that there is no difference in consistency; in other words, we assume that the samples are from populations of the same variance. Since the samples are of equal size, the two unbiased estimates, u² and v², are based on the same number of degrees of freedom, and

F = u²/v² = s₂²/s₁² = 0.000924/0.000404 = 2.29, correct to two decimal places.

Entering Table 8.6 at ν₂ = 9, we read that the 5% point of F for ν₁ = 8 is 3.23, while that for ν₁ = 12 is 3.07; 3.23 − 3.07 = 0.16. Now 24/8 = 3, 24/12 = 2 and 24/9 = 2.67, so 0.16 × 0.67 = 0.107, and the 5% point of F for ν₁ = ν₂ = 9 is 3.07 + 0.107 = 3.177, or 3.18 correct to two decimal places. The value of F obtained, 2.29, is, therefore, not significant at the 5% level, and we have no reason to reject the hypothesis that there is no difference in consistency between the two voltmeters.

EXERCISES ON CHAPTER EIGHT

1. A sample of 14 eggs of a particular species of wild bird collected in a given area is found to have a mean length of 0.89 cm and a standard deviation of 0.154 cm. Is this compatible with the hypothesis that the mean length of the eggs of this bird is 0.99 cm?

2. A group of 8 psychology students were tested for their ability to remember certain material, and their scores (number of items remembered) were as follows:

A   B   C   D   E   F   G   H
19  14  13  16  19  18  16  17

They were then given special training purporting to improve memory and were retested after a month. Scores then:

A   B   C   D   E   F   G   H
26  20  17  21  23  24  21  18

A control group of 7 students was tested and retested after a month, but was given no special training. Scores in the two tests:

21  19  16  22  18  20  19
21  23  16  24  17  17  16

Compare the change in the two groups by calculating t and test whether there is significant evidence to show the value of the special training. Do you consider that the experiment was properly designed? (R.S.S.)

3. A sample of 6 values from an unknown normal population: 20, ... . Another sample of 5 values: 21, ... . Show that there is no good reason to suppose that the samples are not from the same population.

4. Two marksmen, P and Q, obtained the scores tabulated below, on 25 targets each. Ascertain whether one marksman may be regarded as the more consistent shot.

Score:        93  94  95  96  97  98  99  100 | Total
Frequency P:   2   1   4   0   5   5   2    6 |  25
Frequency Q:   0   2   2   3   3   8   5    2 |  25     (I.A.)

5. Latter has given the following data for the length in mm of cuckoos' eggs which were found in nests belonging to the hedge-sparrow (A), reed-warbler (B) and wren (C):

Host A: 22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8, 23.1
Host B: 23.2, 22.0, 22.2, 21.2, 21.6, 21.9, 22.0, 22.9, 22.8
Host C: 19.8, 22.1, 21.5, 20.9, 22.0, 21.0, 22.3, 21.0, 20.3, 20.9, 22.0

Is there any evidence from these data that the cuckoo can adapt the size of its egg to the size of the nest of the host?

Solutions

1. Not significant at 0.02 level.
2. Evidence of improvement in test group highly significant; that of control group highly insignificant. Initial scores in control group too high for control to be useful.
4. F not significant at 0.05 point, i.e., although there is evidence that Q is more consistent, this could arise from random variation alone.
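Before leaving this chapter, the whole voltmeter analysis of 8.6 can be verified from the raw readings. A Python sketch (ours; the function and variable names are not the book's) recomputes both t statistics and the variance ratio:

```python
import math

# Ten readings of a 1.10-volt standard cell on each voltmeter.
A = [1.11, 1.15, 1.14, 1.10, 1.09, 1.11, 1.12, 1.15, 1.13, 1.14]
B = [1.12, 1.06, 1.02, 1.08, 1.11, 1.05, 1.06, 1.03, 1.05, 1.08]

def t_and_var(readings, true_value=1.10):
    """Return t = (mean - true) * sqrt(n - 1) / s and the sample variance
    s^2 (divisor n, as in the text)."""
    n = len(readings)
    mean = sum(readings) / n
    var = sum((r - mean) ** 2 for r in readings) / n
    t = (mean - true_value) * math.sqrt(n - 1) / math.sqrt(var)
    return t, var

t_A, var_A = t_and_var(A)
t_B, var_B = t_and_var(B)
F = max(var_A, var_B) / min(var_A, var_B)   # larger estimate on top

print(round(t_A, 2), round(t_B, 2), round(F, 2))   # 3.58 -3.36 2.29
```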

CHAPTER NINE

ANALYSIS OF VARIANCE

9.1. The Problem Stated. The test of significance of the variance-ratio described in the previous chapter encourages us to embark on a much wider investigation, that of the analysis of variance. This important statistical technique has been defined by its originator, Sir R. A. Fisher, as "The separation of the variance ascribable to one group of causes from the variance ascribable to other groups" (Statistical Methods for Research Workers, Eleventh Edition, 1950, p. 211).

Suppose that from a large herd of cows we pick fifty animals at random and record the milk-yield of each over a given period. These fifty amounts (in litres, say) are a sample of fifty values of our variate, and we want to find an answer to the following problem: Does milk-yield vary with the breed of cow? In other words, are milk-yield and breed connected?

Now the herd may consist, perhaps, of five different breeds: Ayrshire, Jersey, etc. As a first step towards answering this question, it would, therefore, be reasonable to divide our sample into five sub-samples or classes according to breed. Then if we could split up the total variance of our sample into two components, that due to variation between the mean milk-yields of different breeds and that due to variation of yield within breeds, we could subject these two components to further scrutiny.

To do this we first set up the null hypothesis that the factor according to which we classify the population and, consequently, the sample values of the variate (in our present case, breed, the factor of classification) does not influence milk-yield, i.e., has no effect on the value of the variate. If, indeed, this is the case, each class into which we divide our sample will itself be a random sample from one and the same population. Consequently any unbiased estimates we may make of the population variance on the basis of these sub-samples should be compatible and, on the further assumption that the population sampled is normal, these estimates should not differ significantly

when subjected to a variance-ratio test. Should they be found to differ significantly, however, we should have to conclude that our sub-samples are not random samples from one homogeneous population, but are in fact drawn from several different populations brought into being, as it were, by our method of classification. We should have to conclude, in short, that our null hypothesis was untenable and that milk-yield and breed are connected.

In practice, of course, the problem is seldom as simple as this and may involve more than one criterion of classification. We may, for instance, have to analyse not merely the influence of breed on milk-yield, but also that of different varieties of feeding-stuffs. This would present us with a problem of analysis of variance with two criteria of classification. Problems arising from three or more criteria are also common. Although the general principle underlying the treatment of all such problems is the same, each presents its own particular problems.

9.2. One Criterion of Classification. Consider a random sample of N values of a given variate x. Let us classify these N values into m classes according to some criterion of classification and let the ith class have n_i members. Then

Σ_i n_i = N  (i = 1, 2, ... m).

Also, let the jth member of the ith class be x_ij, the mean of the ith class x̄_i, and the general mean of the N values x̄. The sample values may then be set out as follows:

Class 1:  x_11  x_12  ...  x_1n₁
Class 2:  x_21  x_22  ...  x_2n₂
  .
Class i:  x_i1  x_i2  ...  x_in_i
  .
Class m:  x_m1  x_m2  ...  x_mn_m

It is frequently the case that n_i = n for all i, i.e., each class has n members and, consequently, N = mn. Since x̄_i is the mean of the ith class,

Σ_j (x_ij − x̄_i) = 0, for all i  .  .  (9.2.1)

Consequently, writing x_ij − x̄ = (x_ij − x̄_i) + (x̄_i − x̄), squaring and summing over the whole sample, the cross-product term vanishes in virtue of (9.2.1), and we obtain

Σ_i Σ_j (x_ij − x̄)² = Σ_i n_i(x̄_i − x̄)² + Σ_i Σ_j (x_ij − x̄_i)²  .  (9.2.2)

The left-hand member of this equation is the total sum of the squared deviations of the sample values of the variate from the general mean; it is a measure of the "total variation". The right-hand side of the equation shows that this "total variation" may be resolved into two components: one, measured by the first term on the right-hand side, is the variation which would have resulted had there been no variation within the classes (this is easily seen by putting x_ij = x̄_i for all j); it is therefore the variation between classes. The other, measured by the second term on the right-hand side, is the residual variation within classes, after the variation between classes has been separated out from the total variation. In short,

Total variation = Variation between classes + Variation within classes.

Assuming that the classifying factor does not influence variate-values, each class into which the sample is divided by this factor will be a random sample from the parent population, whose variance is σ². Taking expected values of the terms in (9.2.2), we have

E[Σ_i Σ_j (x_ij − x̄)²] = (N − 1)σ²,

and, since the expected value of Σ_j (x_ij − x̄_i)² is (n_i − 1)σ² for each class,

E[Σ_i Σ_j (x_ij − x̄_i)²] = Σ_i (n_i − 1)σ² = (N − m)σ².
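The identity (9.2.2) is purely algebraic and can be verified numerically for any classified sample. A short Python sketch (ours, using arbitrary simulated data):

```python
import random

# Numerical check of identity (9.2.2):
# total SS = between-class SS + within-class SS.
random.seed(1)
classes = [[random.gauss(50, 5) for _ in range(8)] for _ in range(4)]

all_values = [x for cls in classes for x in cls]
grand = sum(all_values) / len(all_values)
class_means = [sum(cls) / len(cls) for cls in classes]

total = sum((x - grand) ** 2 for x in all_values)
between = sum(len(cls) * (m - grand) ** 2
              for cls, m in zip(classes, class_means))
within = sum((x - m) ** 2
             for cls, m in zip(classes, class_means) for x in cls)

print(abs(total - (between + within)) < 1e-8)   # True
```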

Consequently,

E[Σ_i n_i(x̄_i − x̄)²] = (N − 1)σ² − (N − m)σ² = (m − 1)σ².

Thus (9.2.2) leads us to two unbiased estimates of σ²: Σ_i n_i(x̄_i − x̄)²/(m − 1), based on m − 1 degrees of freedom, and Σ_i Σ_j (x_ij − x̄_i)²/(N − m), based on N − m degrees of freedom.

So far all we have said has been true for any population. We now impose the restriction that the population sampled be normal. With this restriction the two unbiased estimates are also independent, and all the conditions for applying either Fisher's z-test or Snedecor's version of that test are fulfilled. We now draw up the following analysis of variance table:

Analysis of Variance for One Criterion of Classification

Source of variation | Sum of squares                | Degrees of freedom | Estimate of variance
Between classes     | Σ_i n_i(x̄_i − x̄)²            | m − 1              | Σ_i n_i(x̄_i − x̄)²/(m − 1)
Within classes      | Σ_i Σ_j (x_ij − x̄_i)²        | N − m              | Σ_i Σ_j (x_ij − x̄_i)²/(N − m)
Total               | Σ_i Σ_j (x_ij − x̄)²          | N − 1              |

Since, from the conditions of the problem, both N and m are greater than 1, the estimate of σ² from the variation within classes must, of necessity, be based upon more degrees of freedom than that from the variation between classes. It is reasonable, therefore, to take the estimate of σ² from

the variation within classes as the more reliable estimate. If, then, the other estimate of σ², that from the variation between classes, is smaller than this, we may straightaway conclude that there is no evidence upon which to reject the hypothesis. If, however, it is greater, but not considerably so, we may test whether it is significantly greater by means of a variance-ratio test.

9.3. Worked Example. Six machines produce steel wire. The following data give the diameters at ten positions along the wire for each machine. Examine whether the machine means can be regarded as constant.

Diameters in m × 10⁻⁵

Machine A: 12 13 13 16 16 14 15 15 16 17
Machine B: 16 16 18 17 12 14 14 19 20 18
Machine C: 14 21 17 14 19 18 17 17 16 15
Machine D: 24 27 24 20 21 23 27 25 21 26
Machine E: 16 13 17 16 15 15 12 14 13 14
Machine F: 13 18 13 16 17 15 15 16 16 17

(Paradine and Rivett)

Treatment: (1) We set up the hypothesis that the machine means are constant. In our example n_i = 10 for all i, and m = 6; therefore N = nm = 60.

(2) We note that: (a) since the variance of a set of values is independent of the origin, a shift of origin does not affect variance-calculations, and we may choose any convenient origin which will reduce the arithmetic involved; we take our new origin at x = 20; (b) since we are concerned here only with the ratio of two variances, any change of scale will not affect the value of this ratio. This also contributes to reducing the arithmetic, but in the present example is unnecessary.

(3) We may further reduce the work of calculation as follows. Let v_ij = x_ij − 20 be the working deviations, T_i = Σ_j v_ij and T = Σ_i T_i. Then:

(a) Σ_i Σ_j (v_ij − v̄)² = Σ_i Σ_j v_ij² − T²/N;
(b) Σ_i n_i(v̄_i − v̄)² = Σ_i (T_i²/n_i) − T²/N;
(c) Σ_i Σ_j (v_ij − v̄_i)² = Σ_i Σ_j v_ij² − Σ_i (T_i²/n_i).
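With the machine data entered directly, the shortcut formulas of Treatment (3) are not even needed; the following Python sketch (ours, with the data as re-tabulated above) computes the sums of squares from first principles. Carrying full precision gives F about 28.4 rather than the 28.6 obtained below from rounded estimates; the conclusion is unchanged.

```python
# One-criterion analysis of variance for the six machines.
data = {
    "A": [12, 13, 13, 16, 16, 14, 15, 15, 16, 17],
    "B": [16, 16, 18, 17, 12, 14, 14, 19, 20, 18],
    "C": [14, 21, 17, 14, 19, 18, 17, 17, 16, 15],
    "D": [24, 27, 24, 20, 21, 23, 27, 25, 21, 26],
    "E": [16, 13, 17, 16, 15, 15, 12, 14, 13, 14],
    "F": [13, 18, 13, 16, 17, 15, 15, 16, 16, 17],
}

values = [x for grp in data.values() for x in grp]
N, m = len(values), len(data)
grand = sum(values) / N

ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                 for g in data.values())
ss_within = sum((x - sum(g) / len(g)) ** 2
                for g in data.values() for x in g)

F = (ss_between / (m - 1)) / (ss_within / (N - m))
print(round(ss_between, 1), round(ss_within, 1), round(F, 1))
```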

(4) We set up the working table. With origin at 20, the totals for the six machines (n_i = 10 each) are:

Machine:       A       B       C       D       E       F     | Total
T_i:          −53     −36     −32     +38     −55     −44    | T = −182
T_i²/n_i:    280.9   129.6   102.4   144.4   302.5   193.6  | 1153.4
Σ_j v_ij²:    305     186     146     202     325     218    | 1382

T²/N = (−182)²/60 = 552.1.

The analysis of variance table is, accordingly:

Source of variation | Sum of squares            | Degrees of freedom | Estimate of variance | F
Between machines    | 1153.4 − 552.1 = 601.3   | 5                  | 120.3               | 28.6
Within machines     | 1382.0 − 1153.4 = 228.6  | 6 × 9 = 54         | 4.2                 |
Total               | 829.9                     | 59                 |                     |

Entering Table 8.6 at ν₁ = 5, ν₂ = 54, we find that this value of F is significant at the 1% point. In other words, there is a significant variation, from machine to machine, of the diameter of the wire produced. We, therefore, reject the hypothesis that there is no difference between the machine-means.

9.4. Two Criteria of Classification. Now let us consider the case where there are two criteria of classification. Suppose, for instance, we classify our cows not only according to breed but also according to the variety of fodder given them. To examine

— x.)2j = (m - l)o 2 . while t h e third term.4.1 ' . t h a t due t o breed. t h a t due t o variety of fodder. i. . . 259-262) t h a t t h e three t e r m s on the right are independent.*. National Mathematical Magazine.. ... t h e residual variation.x. I I .x . assumed t o be normal.. pp.e.172 statistics where milk-yield' varies significantly with breed a n d with diet. t h e residual term.*. finally.j — x. " On t h e Difference between Two Sample Variates ". the variate. Let the sample variate-value in the i t h A-class a n d jth Bclass be Xij.!)(« . . measures t h e variation in x remaining a f t e r the variation due to t h a t between A-classes and t h a t between B-classes has been separated o u t . T . (1937). due t o unspecified or unknown causes.j + (9. Let our sample be of N values of x. \ S S (Xij . ( 1 0 4 2 ) ( and m n e { l m ( x .)* + -f £ £ {Xij — Xi. we have ( ( m n \ Z i=lj=] m 2 (Xij . such t h a t N = nm. using t h e m e t h o d of t h e last section. we m u s t analyse t h e variation in yield into three components..(mn — l)o 2 — (n — l)a a — (m — l)tj 2 = (m ..y + i=l )= 1 i= 1 m n m n £ 1 m(x. t h e second t e r m is the sum of squared deviations from the general m e a n if all variation within B-classes is eliminated.I)*. and. r ) = ( n . . if each item in a n A-class is replaced b y t h e mean value of t h a t class. into n classes.)A i-= l j .. The reader should verify.l)o a . once again we assume it t o be normal and d u e t o unspecified or unknown causes. Craig.)a = S n(x(. £ n(xi. Since it has been shown (A. and.)') = (mn »' l)o a .j + x. vol. t h a t E E (Xij . according t o another factor B.x. a n d let us classify it according t o some factor A into m classes.1) i-lj-1 The first term on t h e right-hand side of this equation is t h e sum of t h e squared deviations f r o m t h e general mean if all variation within the A-classes is eliminated.x.

In the above, σ² is, of course, the variance of the parent population, assumed homogeneous with respect to the factors of classification. The analysis of variance table is, therefore:

Analysis of Variance for Two Criteria of Classification

Source of variation | Sum of squares                          | Degrees of freedom | Estimate of variance
Between A-classes   | n Σ_i (x̄_i· − x̄)²                      | m − 1              | n Σ_i (x̄_i· − x̄)²/(m − 1)
Between B-classes   | m Σ_j (x̄_·j − x̄)²                      | n − 1              | m Σ_j (x̄_·j − x̄)²/(n − 1)
Residual (A × B)    | Σ_i Σ_j (x_ij − x̄_i· − x̄_·j + x̄)²     | (m − 1)(n − 1)     | Σ_i Σ_j (x_ij − x̄_i· − x̄_·j + x̄)²/(m − 1)(n − 1)
Total               | Σ_i Σ_j (x_ij − x̄)²                    | mn − 1             |

Let us call the three resulting estimates of σ²: Q_A, Q_B and Q_A×B respectively. The test procedure is, then, as follows:

(a) test Q_A/Q_A×B for m − 1 and (m − 1)(n − 1) degrees of freedom using the F-table; and
(b) test Q_B/Q_A×B for n − 1 and (m − 1)(n − 1) degrees of freedom using the F-table.

9.5. Three Criteria of Classification. We now assume that we have N = lmn sample values of a given normal variate, classified according to three criteria, A, B, C, into l groups each of m rows and n columns. Let x_ijk be the value of the variate in the jth row and kth column of the ith group (see Table 9.5); i takes the values 1, 2, 3, ... l; j the values 1, 2, 3, ... m; and k the values 1, 2, 3, ... n. We shall use the following notation:

x̄ = general mean;
x̄_i·· = (Σ_j Σ_k x_ijk)/mn = mean of ith group;
x̄_·j· = (Σ_i Σ_k x_ijk)/ln = mean of jth row;
x̄_··k = (Σ_i Σ_j x_ijk)/lm = mean of kth column;
x̄_ij· = (Σ_k x_ijk)/n = mean of values in ith group, jth row;
x̄_i·k = (Σ_j x_ijk)/m = mean of values in ith group, kth column;
x̄_·jk = (Σ_i x_ijk)/l = mean of values in jth row, kth column.

Table 9.5 will make this notation clear: it sets out the N = lmn sample values in l groups, each group an m × n array with rows j = 1 to m (factor B) and columns k = 1 to n (factor C), bordered by the row means x̄_ij·, the column means x̄_i·k and the group means x̄_i··, together with the corresponding means taken over all groups.

The identity upon which the analysis of variance table is based is:

Σ_i Σ_j Σ_k (x_ijk − x̄)²
  = mn Σ_i (x̄_i·· − x̄)² + ln Σ_j (x̄_·j· − x̄)² + lm Σ_k (x̄_··k − x̄)²
  + n Σ_i Σ_j (x̄_ij· − x̄_i·· − x̄_·j· + x̄)²
  + l Σ_j Σ_k (x̄_·jk − x̄_·j· − x̄_··k + x̄)²
  + m Σ_i Σ_k (x̄_i·k − x̄_i·· − x̄_··k + x̄)²
  + Σ_i Σ_j Σ_k (x_ijk − x̄_ij· − x̄_·jk − x̄_i·k + x̄_i·· + x̄_·j· + x̄_··k − x̄)²  .  (9.5.1)

The resulting analysis of variance table is given on page 175.

9.6. The Meaning of "Interaction". In the analysis of variance for three criteria of classification, we have spoken of "interactions" between the factors A, B and C. What do we mean by this term? Our null hypothesis is that no one of the factors A, B, C separately influences the variate-values. But it is conceivable that any two of these factors acting together may do so. When, therefore, we have three or more criteria of classification, it is necessary to test the estimates of σ² based on the "interaction" terms before testing the variations due to the individual factors separately. If none of these "interactions", when tested against the residual estimate of σ², is significant, we may pool the corresponding sums of squares, form a revised estimate of σ² based on more degrees of freedom, and then proceed to test the estimates due to the individual factors. If, however, one, or more, of the estimates due to interactions is found to be significant, we have reason for suspecting the null hypothesis.

Analysis of Variance for Three Criteria of Classification

Source of variation                            | Sum of squares                                                          | Degrees of freedom
Between groups (A)                             | mn Σ_i (x̄_i·· − x̄)²                                                   | l − 1
Between rows (B)                               | ln Σ_j (x̄_·j· − x̄)²                                                   | m − 1
Between columns (C)                            | lm Σ_k (x̄_··k − x̄)²                                                   | n − 1
Interaction between groups and rows (A × B)    | n Σ_i Σ_j (x̄_ij· − x̄_i·· − x̄_·j· + x̄)²                              | (l − 1)(m − 1)
Interaction between rows and columns (B × C)   | l Σ_j Σ_k (x̄_·jk − x̄_·j· − x̄_··k + x̄)²                              | (m − 1)(n − 1)
Interaction between columns and groups (C × A) | m Σ_i Σ_k (x̄_i·k − x̄_i·· − x̄_··k + x̄)²                              | (n − 1)(l − 1)
Residual (A × B × C)                           | Σ_i Σ_j Σ_k (x_ijk − x̄_ij· − x̄_·jk − x̄_i·k + x̄_i·· + x̄_·j· + x̄_··k − x̄)² | (l − 1)(m − 1)(n − 1)
Total                                          | Σ_i Σ_j Σ_k (x_ijk − x̄)²                                               | lmn − 1

In each line the estimate of variance is the sum of squares divided by its degrees of freedom.
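The seven-term identity (9.5.1) underlying this table can likewise be checked numerically. The following Python sketch (ours, on simulated data) computes each term for a 3 × 4 × 5 layout and confirms that they sum to the total:

```python
import random

# Numerical check of the seven-term identity (9.5.1).
random.seed(3)
L, m, n = 3, 4, 5
x = [[[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
     for _ in range(L)]

N = L * m * n
cells = [(i, j, k) for i in range(L) for j in range(m) for k in range(n)]
grand = sum(x[i][j][k] for i, j, k in cells) / N
gi = [sum(x[i][j][k] for j in range(m) for k in range(n)) / (m * n)
      for i in range(L)]
rj = [sum(x[i][j][k] for i in range(L) for k in range(n)) / (L * n)
      for j in range(m)]
ck = [sum(x[i][j][k] for i in range(L) for j in range(m)) / (L * m)
      for k in range(n)]
gij = [[sum(x[i][j][k] for k in range(n)) / n for j in range(m)]
       for i in range(L)]
gik = [[sum(x[i][j][k] for j in range(m)) / m for k in range(n)]
       for i in range(L)]
rjk = [[sum(x[i][j][k] for i in range(L)) / L for k in range(n)]
       for j in range(m)]

total = sum((x[i][j][k] - grand) ** 2 for i, j, k in cells)
parts = (
    m * n * sum((gi[i] - grand) ** 2 for i in range(L))
    + L * n * sum((rj[j] - grand) ** 2 for j in range(m))
    + L * m * sum((ck[k] - grand) ** 2 for k in range(n))
    + n * sum((gij[i][j] - gi[i] - rj[j] + grand) ** 2
              for i in range(L) for j in range(m))
    + m * sum((gik[i][k] - gi[i] - ck[k] + grand) ** 2
              for i in range(L) for k in range(n))
    + L * sum((rjk[j][k] - rj[j] - ck[k] + grand) ** 2
              for j in range(m) for k in range(n))
    + sum((x[i][j][k] - gij[i][j] - gik[i][k] - rjk[j][k]
           + gi[i] + rj[j] + ck[k] - grand) ** 2 for i, j, k in cells)
)
print(abs(total - parts) < 1e-9)   # True
```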

i j i i i i *)' ' £«<(*< . This is just another reason why the closest co-operation between statistician and technologist is always essential. each criterion grouping the data into five classes.T*IN.T*IN < 1 1 1 . The results are as follows (recovery time in days) : Doctor.7. however. Worked Examples. Significant interactions.*)' = S(77/«. 23 24 20 17 19 4. and N = 25.x)* = SSV . do not n e c e s s a r i l y imply real interactions between the factors concerned.3 Treatment (3)) : . (the number of items in row i) = nt (the number of items in column ») = 5. Five doctors each test five treatments for a certain disease and observe the number of days each patient takes to recover.T'/N•. would appear to be a sensational result. 1. and (b) treatments. A B C D E 10 11 9 8 12 2. (A.) . Thus w. 14 15 12 13 15 Treatment.) . With the suffix » denoting row and the suffix j denoting column. at first sight. but never more so than when the statistician finds something apparently of " significance ".) Treatment: This is a problem involving two criteria of classification.Nx* = SSV . 20 21 19 20 22 Discuss the difference between : (a) doctors. 1. 9.I.S. • i i 1 S S (*. let r = SL*„. The working scientist and technologist are only too aware that. Transfer to a new origin at 16.S * „ . we have reason for suspecting the null hypothesis. 3. 9. r 4 . experiment may give rise t o what. 18 17 16 17 15 5. = then (cf. .analysis of variance 177 found to be significant. from time to time.(g. -*)' = £ (Tfln. but t h a t closer scrutiny may well reveal t h a t it is due to chance heterogeneity in the data and is no sensation after all.*)» = S n.

We now draw up the following table of deviations from 16, with squares in brackets:

  Treatment      A        B        C        D        E     |  T_i.   T_i.²   Σ_j x_ij²
      1       −6(36)   −5(25)   −7(49)   −8(64)   −4(16)   |  −30     900      190
      2       −2(4)    −1(1)    −4(16)   −3(9)    −1(1)    |  −11     121       31
      3        7(49)    8(64)    4(16)    1(1)     3(9)    |   23     529      139
      4        2(4)     1(1)     0(0)     1(1)    −1(1)    |    3       9        7
      5        4(16)    5(25)    3(9)     4(16)    6(36)   |   22     484      102
  T_.j           5        8       −4       −5        3     |  T = 7   Σ T_i.² = 2043
  T_.j²         25       64       16       25        9     |  Σ T_.j² = 139
  Σ_i x_ij²    109      116       90       91       63     |  Σ Σ x_ij² = 469

With T²/N = 49/25 = 1.96:
  (i) Total sum of squared deviations, Σ Σ x_ij² − T²/N = 469 − 1.96 = 467.04.
  (ii) Sum of squares for treatments, Σ (T_i.²/n_i.) − T²/N = 2043/5 − 1.96 = 406.64.
  (iii) Sum of squares for doctors, Σ (T_.j²/n_.j) − T²/N = 139/5 − 1.96 = 25.84.
  (iv) Residual sum of squares = 467.04 − 406.64 − 25.84 = 34.56.

The analysis of variance is, then:

  Source of variation    Sum of squares   Degrees of freedom   Estimate of variance     F
  Between treatments         406.64              4                  101.66            47.00**
  Between doctors             25.84              4                    6.46             2.99
  Residual                    34.56             16                    2.16              -
  TOTAL                      467.04             24                      -               -

Entering Table 8.6 at ν1 = 4, ν2 = 16, we find the 5% and 1% points of F to be 3.01 and 4.77 respectively. We conclude, therefore, that the difference between doctors is hardly significant (at the 5% level), while that between treatments is highly so (significant at the 1% level).
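The whole computation above can be checked with a minimal sketch in plain Python (not from the book); the variable names are ours, and only the standard library is used.

```python
# Two-way analysis of variance for the doctors-vs-treatments example.
# Rows = treatments 1-5, columns = doctors A-E, as in the text.

data = [
    [10, 11,  9,  8, 12],   # treatment 1
    [14, 15, 12, 13, 15],   # treatment 2
    [23, 24, 20, 17, 19],   # treatment 3
    [18, 17, 16, 17, 15],   # treatment 4
    [20, 21, 19, 20, 22],   # treatment 5
]

N = sum(len(row) for row in data)            # 25
T = sum(sum(row) for row in data)            # grand total
correction = T * T / N                       # T^2 / N

total_ss = sum(x * x for row in data for x in row) - correction
treat_ss = sum(sum(row) ** 2 for row in data) / 5 - correction
cols = list(zip(*data))                      # transpose: one tuple per doctor
doctor_ss = sum(sum(col) ** 2 for col in cols) / 5 - correction
resid_ss = total_ss - treat_ss - doctor_ss

# Mean squares and variance ratios (degrees of freedom 4, 4, 16).
resid_ms = resid_ss / 16
F_treat = (treat_ss / 4) / resid_ms
F_doctor = (doctor_ss / 4) / resid_ms

print(round(total_ss, 2), round(treat_ss, 2),
      round(doctor_ss, 2), round(resid_ss, 2))   # 467.04 406.64 25.84 34.56
print(round(F_treat, 2), round(F_doctor, 2))     # 47.06 2.99 (text rounds 47.06 to 47.00)
```

The F values are then compared with the tabulated 5% and 1% points for (4, 16) degrees of freedom, exactly as in the text.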

2. P. R. Rider (An Introduction to Modern Statistical Methods, John Wiley, New York) quotes the following Western Electric Co. data on porosity readings of 3 lots of condenser paper. There are 3 readings on each of 9 rolls from each lot.

Porosity Readings on Condenser Paper

  Lot   Reading                    Roll number
                    1     2     3     4     5     6     7     8     9
   I       1       1.5   1.5   2.7   3.0   3.4   2.1   2.0   3.0   5.1
           2       1.7   1.6   1.9   2.4   5.6   4.1   2.5   2.0   5.0
           3       1.6   1.7   2.0   2.6   5.6   4.6   2.8   1.9   4.0
   II      1       1.9   2.3   1.8   1.9   2.0   3.0   2.4   1.7   2.6
           2       1.5   2.4   2.9   3.5   1.9   2.6   2.0   1.5   4.3
           3       2.1   2.4   4.7   2.8   2.1   3.5   2.1   2.0   2.4
   III     1       2.5   3.2   1.4   7.8   3.2   1.9   2.0   1.1   2.1
           2       2.9   5.5   1.5   5.2   2.5   2.2   2.4   1.4   2.5
           3       3.3   7.1   3.4   5.0   4.0   3.1   3.7   4.1   1.9

We shall carry out the appropriate analysis of variance assuming, for the time being, that we have here three criteria of classification: the roll number dividing the data into nine classes, the lot number and the reading number each dividing the data into three classes. This will illustrate the method employed in such a case. Later, and less artificially, we shall regard the data as classified by two criteria (roll and lot), with three values of the variate given for each lot × roll combination.

First Treatment: (1) We draw up the table as shown at the top of page 180. Then

  Σ Σ Σ x_ijk² = 1.5² + 1.5² + 2.7² + ... + 3.7² + 4.1² + 1.9² = 812.41,

and T = 231.1, N = 3 × 3 × 9 = 81 give T²/N = 659.35. The total sum of squared deviations from the mean is therefore 812.41 − 659.35 = 153.06.

(2) We draw up the following lot × roll table, each entry in the body of the table being the total of the three readings on that roll from that lot:

  Lot                          Roll number                          Total (Lots)
            1     2     3     4     5     6     7     8     9
   I       4.8   4.8   6.6   8.0  14.6  10.8   7.3   6.9  14.1       77.9
   II      5.5   7.1   9.4   8.2   6.0   9.1   6.5   5.2   9.3       66.3
   III     8.7  15.8   6.3  18.0   9.7   7.2   8.1   6.6   6.5       86.9
  Total
  (Rolls) 19.0  27.7  22.3  34.2  30.3  27.1  21.9  18.7  29.9   Grand Total 231.1

The corresponding reading totals are 69.1 (= 24.3 + 19.6 + 25.2), 75.5 and 86.5.

The sum of squares for the lot × roll classification (i.e., a two criteria classification) is (4.8² + 4.8² + 6.6² + ... + 8.1² + 6.6² + 6.5²) = 2,280.37, and the sum of the squared deviations from the mean is 2,280.37/3 − 659.35 = 100.77. Note that we divide 2,280.37 by 3 because each entry in the body of this lot × roll table is the sum of three readings.

The sum of squared deviations for rolls is (19.0² + 27.7² + ... + 18.7² + 29.9²)/9 − 659.35 = 26.31 (why do we here divide by 9?), while that for lots is (77.9² + 66.3² + 86.9²)/27 − 659.35 = 7.90 (why do we, this time, divide by 27?). Finally, the residual sum of squared deviations for this classification, now called Interaction (Lot × Roll), is found by subtracting from the total sum of squared deviations for the classification the sum of that for Rolls and that for Lots: 100.77 − 26.31 − 7.90 = 66.56.

(3) The reading × roll table is:

  Roll        Reading 1   Reading 2   Reading 3   Total (Rolls)
    1            5.9         6.1         7.0          19.0
    2            7.0         9.5        11.2          27.7
    3            5.9         6.3        10.1          22.3
    4           12.7        11.1        10.4          34.2
    5            8.6        10.0        11.7          30.3
    6            7.0         8.9        11.2          27.1
    7            6.4         6.9         8.6          21.9
    8            5.8         4.9         8.0          18.7
    9            9.8        11.8         8.3          29.9
  Total
  (Readings)    69.1        75.5        86.5

Here the sum of squares, 5.9² + 7.0² + 5.9² + ... + 8.6² + 8.0² + 8.3², is 2,107.73, so the sum of squared deviations is 2,107.73/3 − 659.35 = 43.23. The sum of squared deviations for readings is (69.1² + 75.5² + 86.5²)/27 − 659.35 = 5.73, and we have the corresponding sum for rolls already: 26.31. Interaction (Readings × Rolls) is, then, 43.23 − 5.73 − 26.31 = 10.19.

(4) The lot × reading table is:

  Lot      Reading 1   Reading 2   Reading 3   Total (Lots)
   I         24.3        26.8        26.8         77.9
   II        19.6        22.6        24.1         66.3
   III       25.2        26.1        35.6         86.9
  Total
  (Readings) 69.1        75.5        86.5

The sum of squares in this case is 24.3² + 26.8² + ... + 26.1² + 35.6² = 6,086.25, and the sum of squared deviations from the mean is 6,086.25/9 − 659.35 = 16.90. We already have the sums of squared deviations for lots (= 7.90) and for readings (= 5.73). Interaction (Lot × Reading) is therefore 16.90 − 7.90 − 5.73 = 3.27.

(5) The analysis of variance table is shown at the top of page 182. We see at once that the Interactions, lots × readings and readings × rolls, are not significant; nor, for that matter, are they significantly small. Interaction rolls × lots is significant at the 1% level and so is the variation between rolls, while that between lots is significant at the 5% level. Since the two Interactions, lots × readings and readings × rolls, are not significant, we may combine the corresponding sums of squares with that for residual to obtain a more accurate estimate of the assumed population variance. As the reader will confirm for himself, this is

  (3.27 + 10.19 + 33.10)/(4 + 16 + 32) = 0.89.

We find, then, that the levels of significance are unaltered when this new estimate of σ² is used.

The analysis of variance for the three-criteria (first) treatment is:

  Source of variation             Sum of squares   Degrees of freedom   Estimate of variance     F
  Between rolls                        26.31               8                   3.29            3.16**
  Between lots                          7.90               2                   3.95            3.80*
  Between readings                      5.73               2                   2.87            2.76
  Interaction (Rolls × Lots)           66.56              16                   4.16            4.00**
  Interaction (Lots × Readings)         3.27               4                   0.82              -
  Interaction (Readings × Rolls)       10.19              16                   0.64              -
  Residual                             33.10              32                   1.04              -
  TOTAL                               153.06              80                     -               -

Second Treatment: The conclusion we have reached justifies the view we suggested originally, that it is less artificial to regard the data as classified by two criteria only (roll and lot), with three values of the variate, instead of one, being taken. This being the case, the situation is summarised by the lot × roll table of Step (2) of our first treatment of the problem. The corresponding analysis of variance is:

  Source of variation             Sum of squares   Degrees of freedom   Estimate of variance     F
  Between rolls                        26.31               8                   3.29            3.39**
  Between lots                          7.90               2                   3.95            4.07*
  Interaction (Rolls × Lots)           66.56              16                   4.16            4.29**
  Residual                             52.29              54                   0.97              -
  TOTAL                               153.06              80                     -               -

Quite clearly, we conclude that the variation of rolls within lots and that between rolls are highly significant, while that between lots is significant. Our null hypothesis, that the condenser paper is homogeneous with respect to these two factors of classification, breaks down.
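The bookkeeping behind the tables above reduces to two simple rules, which the following sketch (not from the book, figures taken directly from the text) makes explicit: an interaction sum of squares is what remains of the two-way cell variation after the two main effects are removed, and non-significant interaction sums may be pooled with the residual.

```python
# Figures from the condenser-paper example.
cells_ss    = 100.77      # lot x roll two-way classification
rolls_ss    = 26.31
lots_ss     = 7.90

interaction_rolls_lots = cells_ss - rolls_ss - lots_ss
print(round(interaction_rolls_lots, 2))          # 66.56

# Pooling the two non-significant interactions with the residual
# (first treatment: degrees of freedom 4, 16 and 32 respectively).
pooled = (3.27 + 10.19 + 33.10) / (4 + 16 + 32)
print(round(pooled, 3))                          # 0.895, quoted as 0.89 in the text
```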

9.8. Latin Squares. When, in the case of three-factor classification, each criterion results in the same number of classes, some simplification may be effected in our analysis of variance by the use of an arrangement known as a Latin Square. Essentially, this device aims at isolating the separate variations due to simultaneously operating causal factors.

Let us suppose that we wish to investigate the yield per acre of five varieties of a certain crop when subjected to treatment by two types of fertiliser, each of five different strengths. We divide the plot of land used for the experiment into 5² sub-plots, considered to be schematically arranged in five parallel rows and five parallel columns. Each of the five columns is treated with one of the five strengths of one of the fertilisers (call it Fertiliser A); each of the rows is likewise treated with one of the five strengths of the second fertiliser (B). Then the five varieties of the crop under investigation are sown at random in the sub-plots, but in such a way that any one variety occurs but once in any row or column. Denoting the crop-varieties by A, B, C, D, E, we shall have some such arrangement as:

  Strengths of B        Strengths of Fertiliser A
                     1     2     3     4     5
        1            A     E     B     D     C
        2            B     C     D     E     A
        3            C     A     E     B     D
        4            D     B     C     A     E
        5            E     D     A     C     B

Now assume that the following figures (fictitious) for yield per acre are obtained:

  A 3.2   E 2.0   B 3.0   D 2.4   C 2.6
  B 2.6   C 1.6   D 2.0   E 1.2   A 2.2
  C 1.8   A 2.6   E 1.6   B 1.4   D 2.8
  D 2.4   B 2.0   C 2.0   A 2.8   E 1.8
  E 1.8   D 2.2   A 3.6   C 2.2   B 2.4

Transferring our origin to 2.2 and multiplying each entry by 5, we obtain a table of coded yields, with row totals (strengths of Fertiliser B), column totals (strengths of Fertiliser A) and variety totals. With T = 6 and N = n² = 25, T²/N = 1.44. The sum total of squares is 198, and so the total sum of squared deviations is 198 − 1.44 = 196.56. The sum of squared deviations for Fertiliser A is 340/5 − 1.44 = 66.56; that for Fertiliser B is 18/5 − 1.44 = 2.16; and the sum of squared deviations for Varieties is 620/5 − 1.44 = 122.56. The residual sum of squares is therefore 196.56 − 122.56 − 66.56 − 2.16 = 5.28, with 12 degrees of freedom. The analysis of variance table is then as shown on page 185.

Both the variation between varieties and that due to the different strengths of Fertiliser A are significant at the 1% level. By pooling the sum of squares for Fertiliser B and the Residual sum of squares, we may obtain a more accurate estimate of the assumed population variance: (2.16 + 5.28)/(4 + 12) = 0.465. This is the estimated variance of the yield of a single sub-plot. The estimated variance of the mean of five such sub-plots is, consequently, 0.465/5 = 0.093; that of the difference of the means of any two independent samples of 5 sub-plots is 2 × 0.093 = 0.186.

  Source of variation                    Sum of squares   Degrees of freedom   Estimate of variance     F
  Between varieties                          122.56               4                  30.64            69.6**
  Between strengths of Fertiliser A          66.56                4                  16.64            37.8**
  Between strengths of Fertiliser B           2.16                4                   0.54             1.2
  Residual                                    5.28               12                   0.44              -
  TOTAL                                     196.56               24                     -               -

The standard error of the difference of the means of two such samples is, consequently, (0.186)^½ = 0.43. For 16 degrees of freedom the least difference, m, between the means of any two samples of 5 that is significant at the 5% level is given by m/0.43 = 2.12, i.e. m = 0.91. It will be seen, therefore, that all the five varieties differ significantly, while none of the strengths of Fertiliser B differs significantly at the 5% level, and that only strengths 1 and 5 of Fertiliser A do not differ significantly.

9.9. Making Latin Squares. If a Latin Square has n rows and n columns, it is said to be a square of order n. When the letters of the first row and first column are in correct alphabetical order, the square is a standard square. Thus

  A B C
  B C A
  C A B

is a standard square of order three; indeed, it is the only standard square of that order. The standard squares of order 4 are:

  A B C D    A B C D    A B C D    A B C D
  B A D C    B D A C    B C D A    B A D C
  C D B A    C A D B    C D A B    C D A B
  D C A B    D C B A    D A B C    D C B A

The number of possible squares of order n increases very rapidly with n. There are, for instance, 576 squares of order 4, 161,280 of order 5, 812,851,200 of order 6, and 61,479,419,904,000 of order 7. (See R. A. Fisher and F. Yates, Statistical Tables for Use in Biological, Agricultural and Medical Research.)
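A random square of a given order is obtained by taking a standard square and permuting its rows and columns at random, as the next section describes. A minimal Python sketch of that recipe (not from the book; the random numbers here stand in for the published random-number tables) is:

```python
# Derive a non-standard Latin square from a standard one: permute all
# columns, and all rows except the first, at random.

import random

standard = ["ABCD",
            "BADC",
            "CDBA",
            "DCAB"]          # one of the four standard squares of order 4

square = [list(row) for row in standard]

cols = list(range(4))
random.shuffle(cols)                          # permute all columns
square = [[row[c] for c in cols] for row in square]

rows = list(range(1, 4))
random.shuffle(rows)                          # permute rows 2..4, first row fixed
square = [square[0]] + [square[r] for r in rows]

# Verify the Latin property: every letter once in each row and column.
for line in square:
    assert sorted(line) == ["A", "B", "C", "D"]
for col in zip(*square):
    assert sorted(col) == ["A", "B", "C", "D"]
```

Both permutation steps preserve the Latin property, which is why the recipe always yields a valid square.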

From these standard squares all the remaining, non-standard, squares may be derived. To do so we must permute all columns and all rows except the first. We thereby obtain a total of 12 possible squares of order 3 (12 = 3! × 2!) and 576 of order 4 (in this case each standard square, of which there are four, yields 4! different column arrangements and 3! different row arrangements: 4 × 4! × 3! = 576). When all the standard squares of a given order have been set down, we say that the standard squares have been enumerated. This has been done for n less than 8, although for n ≥ 8 a considerable number have been listed.

It is important to understand that what is required when deriving such non-standard squares is a new pattern or lay-out. Merely interchanging letters does not suffice. For example, the two squares

  A B C D        D C B A
  B C D A        C B A D
  C D A B        B A D C
  D A B C        A D C B

present the same pattern and are, therefore, essentially no different. To choose a Latin square of given order, then, we select a standard square at random from those enumerated and permute at random, using a table of random numbers both for selection and permutation.

EXERCISES ON CHAPTER NINE

1. The following results were obtained in four independent samplings:

  (1)  2   6  14  12   6   5
  (2)  6  19  10  17  19  16
  (3) 11  19  23   8  17
  (4) 11  14   2  29  16  20  19

Carry out an analysis of variance on these data. (L.U.)

2. Four breeds of cattle B1, B2, B3, B4 were fed on three different rations R1, R2, R3. Gains in weight in kilogrammes over a given period were recorded:

        B1     B2     B3     B4
  R1   25.5   28.5   50     40
  R2   62     41     45     46.5
  R3   22     47.5   41.5   31.5

Is there a significant difference: (a) between breeds; (b) between rations?

3. Passenger Traffic Receipts of Main-Line Railways and L.P.T.B. (Weekly averages, £000.)

  Year    First Quarter   Second Quarter   Third Quarter   Fourth Quarter
  1944        3,320           3,872            4,548            3,836
  1945        3,611           4,120            4,898            3,752
  1946        3,376           3,991            4,758            3,556
  1947        3,072           3,120            4,884            3,703

Carry out an analysis of variance on these data. Is the between-years difference significant?

4. A chemical purification process is carried out in a particular plant with four solvents (i = 1, 2, 3, 4) at three different, equidistant temperatures (t = 1, 2, 3). For every one of the 4 × 3 = 12 combinations of solvents with temperatures the process is repeated four times, and the resulting 48 test measurements are shown below (a low value indicates a high degree of purity).

              Solvent:  i = 1   i = 2   i = 3   i = 4
  t = 1                  66.9    68.3    71.2    70.3
                         66.2    64.6    79.1    66.6
                         68.6    70.0    71.8    71.8
                         70.1    69.9    66.2    71.1
  t = 2                  63.4    63.9    70.7    69.0
                         64.9    62.7    65.9    64.9
                         67.2    71.2    69.0    69.3
                         69.5    66.9    66.2    72.0
  t = 3                  66.4    64.1    67.5    62.7
                         71.6    70.8    68.9    68.8
                         66.2    67.0    64.0    62.4
                         73.6    70.4    70.5    72.8

Carry out the appropriate analysis of variance. Are there differences between solvents and temperatures, taken as a whole? Is there any interaction between solvents and temperatures? (L.U.)

5. The atmosphere in 4 different districts of a large town was sampled, the samples being taken at 4 different heights. Four different tests for the presence of a certain chemical were made on the samples. The arrangement is shown in the following table, with the % by weight of the chemical as determined by the tests; letters denote the different tests.

  Height        District 2   District 4   District 1   District 3
     1           A 8.0        D 6.8        B 6.3        C 5.7
     2           B 5.3        A 4.9        C 4.7        D 3.3
     3           C 4.1        B 4.1        D 4.0        A 4.0
     4           D 5.0        C 3.2        A 5.0        B 4.2

Is there evidence of significant variation from district to district and between heights in the percentage of the chemical present in the atmosphere? Can it be said that there is a decided difference between the sensitivity of the tests?

Solutions

1. Variation between samples significant at 5% point but not at 1% point.
2. No significant difference between breeds or between rations.
3. No significant variation between years, but that between quarters is highly significant.
4. Variation between temperatures is not quite significant at 5% point; that between solvents is significant at that point; interaction between solvents and temperatures significant at 1% point.
5. Variation between districts significant at 5% point but not at 1% point; no significant variation between heights or between sensitivity of tests.

CHAPTER TEN

TESTING REGRESSION AND CORRELATION

10.1. The Correlation Coefficient Again. We now return to the problems raised at the beginning of Chapter Seven: How do we know that a value of r, the sample correlation coefficient, calculated from a sample of N from a bivariate normal population is really significant? Is there a way of deciding whether such a value of r could have arisen by chance as a result of random sampling from an uncorrelated parent population? Linked closely with this problem are several others: If a sample of N yields a value of r = r0, how can we test whether it can have been drawn from a population known to have a given ρ? Again, how shall we test whether two values of r obtained from different samples are consistent with the hypothesis of random sampling from a common parent population? Finally, given a number of independent estimates of a population correlation coefficient, how may we combine them to obtain an improved estimate?

We start to tackle these problems by what may at first appear to be a rather indirect method. For we shall use the technique of analysis of variance (or, more correctly in this case, analysis of covariance) to test the significance of a linear regression coefficient calculated from a sample drawn from a bivariate normal population. But this, as we shall soon see, is equivalent to testing the significance of a value of r, or, what is the same thing, to testing the hypothesis that in the parent population ρ = 0.

10.2. Testing a Regression Coefficient. Let (x_i, y_i), (i = 1, 2, ... N), be a sample of N pairs from what we assume to be an uncorrelated bivariate normal population. Taking the sample mean as origin (x̄ = 0 = ȳ), the regression equation of y on x is

  y = b_yx x, where b_yx = s_xy/s_x²  .  .  (10.2.1)

Let Y_i be the value of the ordinate at x = x_i on this

regression line. Then the sum of the squared deviations of the y's from the sample mean (ȳ = 0) is simply Σ y_i². But

  Σ y_i² = Σ (y_i − Y_i + Y_i)² = Σ (y_i − Y_i)² + 2 Σ (y_i − Y_i) Y_i + Σ Y_i².

However, since Y_i = b_yx x_i and b_yx = s_xy/s_x²,

  Σ (y_i − Y_i) Y_i = b_yx [Σ x_i y_i − b_yx Σ x_i²] = b_yx [N s_xy − (s_xy/s_x²) N s_x²] = 0.

Hence

  Σ y_i² = Σ (y_i − Y_i)² + Σ Y_i²  .  .  (10.2.2)

The sum of squared deviations of observed values of y from the sample mean = sum of squared deviations of observed values of y from the regression line of y on x + sum of squared deviations of the corresponding points on the regression line from the sample mean. Moreover,

  Σ (y_i − Y_i)² = Σ y_i² − 2 b_yx Σ x_i y_i + b_yx² Σ x_i² = N s_y² − 2N s_xy²/s_x² + N s_xy²/s_x² = N s_y²(1 − s_xy²/s_x² s_y²) = N s_y²(1 − r²),

while

  Σ Y_i² = b_yx² Σ x_i² = N s_xy²/s_x² = N s_y² r².

Thus (10.2.2) may be re-written

  N s_y² = N s_y²(1 − r²) + N s_y² r²,

now seen as an obvious algebraic identity, or

  VARIATION ABOUT MEAN = VARIATION ABOUT REGRESSION LINE + VARIATION OF REGRESSION LINE ABOUT MEAN.

Thus the sample variation of y about the regression line of y on x is measured by N s_y²(1 − r²), while the variation of the regression line about the sample mean is measured by N s_y² r².
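The identity just established can be checked numerically on any small sample; the following sketch (not from the book, with made-up data) takes deviations from the sample means throughout, as the text does.

```python
# Verify N*sy^2 = N*sy^2*(1 - r^2) + N*sy^2*r^2, i.e. that the sum of
# squared y-deviations splits into "about the line" and "of the line".

xs = [1.0, 2.0, 4.0, 5.0, 7.0]
ys = [2.0, 3.0, 3.0, 6.0, 8.0]
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
dx = [x - mx for x in xs]
dy = [y - my for y in ys]

sxy = sum(a * b for a, b in zip(dx, dy)) / n
sx2 = sum(a * a for a in dx) / n
byx = sxy / sx2                       # regression coefficient of y on x

total = sum(b * b for b in dy)                            # about the mean
about_line = sum((b - byx * a) ** 2 for a, b in zip(dx, dy))
of_line = byx ** 2 * sum(a * a for a in dx)

assert abs(total - (about_line + of_line)) < 1e-9
```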

To the term Σ y_i² there correspond N − 1 degrees of freedom, since the y's are subject only to the restriction that their mean is given (in our treatment, ȳ = 0). Corresponding to the sum Σ (y_i − Y_i)² there is one additional restraint: that the regression coefficient of y on x shall be b_yx. Thus corresponding to N s_y²(1 − r²) we have N − 2 degrees of freedom. Consequently Σ Y_i² = N s_y² r² has but one degree of freedom.

We may therefore set out an analysis of covariance table as follows:

  Source of variation                          Sum of squares    Degrees of freedom    Mean square
  Of regression line about mean                 N s_y² r²               1               N s_y² r²
  Residual (of variate about regression line)   N s_y²(1 − r²)        N − 2             N s_y²(1 − r²)/(N − 2)
  TOTAL                                         N s_y²                N − 1                 -

Now suppose that the parent population is uncorrelated (ρ = 0). Then the sample variation of the regression line about the mean and the random variation of the y's about the regression line should yield estimates of the corresponding population parameter not significantly different. If, on the other hand, there is in fact an association between the variates in the population of the kind indicated by the regression equation, the estimate provided by Σ Y_i²/1 = N s_y² r² should be significantly greater than that provided by Σ (y_i − Y_i)²/(N − 2) = N s_y²(1 − r²)/(N − 2). If then the sample data does indicate a significant association of the variates in the form suggested by the regression equation, the value of z given by

  z = ½ ln [(N − 2) r²/(1 − r²)]  .  .  (10.2.4)

will be significant at least at the 5% point; in other words, the regression coefficient is significant. Alternatively, the value of

  F = r²(N − 2)/(1 − r²)

can be tested using Table 8.6. A significant value of either z or F requires the rejection of the null hypothesis that, in the parent population, ρ = 0. Thus when we test the significance of a regression coefficient we are also testing the significance of r, the correlation coefficient.

10.3. Relation between the t- and z-distributions. If neither z- nor F-tables are available, we may use t-tables, for, as we shall now show, r[(N − 2)/(1 − r²)]^½ is actually distributed like t with N − 2 degrees of freedom. We have (8.7)

  dp(z) = [2 ν1^(ν1/2) ν2^(ν2/2) / B(ν1/2, ν2/2)] e^(ν1 z) dz / (ν1 e^(2z) + ν2)^((ν1 + ν2)/2).

Put ν1 = 1 and ν2 = ν, and write t = e^z, so that t² = e^(2z) = F. Then, remembering that z ranges from 0 to ∞ while t ranges from −∞ to +∞, so that the factor 2 must be removed, we obtain

  dp(t) = dt / [ν^½ B(ν/2, ½) (1 + t²/ν)^((ν + 1)/2)]  (see 8.4.1).

In other words the distribution of t, like that of F, is a special case of that of z.

10.4. Worked Example: In a sample of N = 16 pairs of values drawn from a bivariate population, the observed correlation coefficient between the variates is 0.5. Is this value significant? Find the minimum value of r for a sample of this size which is significant at the 5% level.

Treatment: N = 16, r = 0.5. Then F = 0.25 × 14/(1 − 0.25) = 4.667 for ν1 = 1 and ν2 = 14 degrees of freedom, or t = F^½ = 2.16 for 14 degrees of freedom. The 5% and 1% points of F for ν1 = 1 and ν2 = 14 degrees of freedom are 4.60 and 8.86, while the value of t significant at the 5% level

r)] = t a n h .r)] = tanh" 1 r. Therefore for r to differ significantly from the given value of p at the 5% level. that the observed value of r. Then (z — Z) /(N — 3)~* is approximately normally distributed with unit variance. Z = i In [(1 + P )/(l . therefore. If the sampling is strictly random from the same population or from two equivalent populations.1 r . We conclude. Now t h e value of such a variate which is exceeded with a probability of 5% is 1-96 (see Table 5.p)] = tanh' 1 p and variance 1/{N .5. and in the neighbourhood of p = ± 1 it is extremely skew even for large N. we put z = i In [(1 + r)/( 1 . as N increases. We must now consider the problem of testing the significance of an observed value of r when p 0. The distribution of r for random samples of N pairs of values from a bivariate normal population in which p ^ 0 is by no means normal. Should there be a significant difference.r') = 4-60 or r = 0-497 10.1) The importance of this transformation lies in the fact t h a t — z is approximately normally distributed with mean \ In [(1 + p)/(l . I t was for this reason t h a t Fisher introduced the important transformation z = | In [(1 + r)/( 1 . is just significant at the 5% level: there is less than 1 chance in 20 but more than 1 chance in 100 that this value should arise by chance in random sampling of an uncorrelated population.4). we should . this distribution tends to normality quite rapidly. however.Z)(N . rx and r2 will not differ significantly.testing regression and correlation 193 is 2-14 (using Tables 8-6 and 8-1 respectively). So far we have assumed t h a t the population we have been sampling is uncorrelated.3) and. (10. we must have (* . The required minimum value of r is given by r* X 14/(1. 0-5.3)4 > 1-96 (б) Now assume t h a t a sample of N t pairs yields a value of r = rt and a second sample of N z pairs a value r — r2.5. 
10.5. The Distribution of r. So far we have assumed that the population we have been sampling is uncorrelated. We must now consider the problem of testing the significance of an observed value of r when ρ ≠ 0. The distribution of r for random samples of N pairs of values from a bivariate normal population in which ρ ≠ 0 is by no means normal, and in the neighbourhood of ρ = ±1 it is extremely skew even for large N. It was for this reason that Fisher introduced the important transformation

  z = ½ ln [(1 + r)/(1 − r)] = tanh⁻¹ r  .  .  (10.5.1)

The importance of this transformation lies in the fact that z is approximately normally distributed with mean ½ ln [(1 + ρ)/(1 − ρ)] = tanh⁻¹ ρ and variance 1/(N − 3) and, as N increases, this distribution tends to normality quite rapidly.

(a) To decide whether a value of r calculated from a sample of N pairs of values from a bivariate normal distribution is consistent with a known value of the population correlation coefficient ρ, we put z = tanh⁻¹ r and Z = tanh⁻¹ ρ. Then (z − Z)(N − 3)^½ is approximately normally distributed with unit variance. Now the value of such a variate which is exceeded in absolute value with a probability of 5% is 1.96 (see Table 5.4). Therefore for r to differ significantly from the given value of ρ at the 5% level, we must have

  (z − Z)(N − 3)^½ > 1.96.

(b) Now assume that a sample of N1 pairs yields a value r = r1 and a second sample of N2 pairs a value r = r2. If the sampling is strictly random from the same population, or from two equivalent populations, r1 and r2 will not differ significantly. Should there be a significant difference, however, we should

we have (p. Consequently if 1*1 . (1-40/0-60) = 0-4236 (z — Z) (N — 3)* is normally distributed about zero mean with unit variance. (1-65/0-35) = 0-7753 Z = i log. 141). p = 0-40 ? (6) What are the 95% confidence limits for p in the light of the information provided by this sample ? (c) If a second sample of 23 pairs shows a correlation. (a) Is this consistent with an assumed population correlation. In the present case (z .194 statistics have reason to suspect either t h a t the sampling had not been strictly random or t h a t the two samples had been drawn from different populations. Let zx be the ^-transform of r1 and z2 t h a t of r2. . Consequently the value r = 0-65 from a sample of 19 pairs is compatible with an assumed population correlation of 0-40. if they have been drawn from one population. if. var — z2) = var zt + var z2 = l/(Af t — 3) + 1 /{N2 — 3) Hence the standard error of zt — z2 is V i V 7 ^ 3 + * n d <*>- ' • > / ' + will be approximately normally distributed with unit variance. they are not random samples.Z)(N ~ 3)* = 1-4068.* t l / V i v 7 ^ " 3 + < 1-96 there is no significant difference between rl and r2 and we have no grounds for rejecting the hypothesis t h a t t h e samples have been drawn at random from the same population. which is less than 1-96. we have grounds for suspecting t h a t they have been drawn from different populations or that. Worked Example: A sample of 19 pairs drawn at random from a bivariate normal population shows a correlation coefficient of 0-65. r = 0-40. 10.6. On the hypothesis t h a t the two samples are random samples from the same population (or equivalent populations). however. can this have been drawn from the same parent population ? Treatment: (a) z = i log.

'2/( 2 m. a 2 .0-49 < Z < 0-7753 + 0-49 giving 0-2853 < Z < 1-2653 or 0-2775 < p < 0-8524 and these are the required 95% confidence limits for p. the weighted mean is k t k k £ niiZil £ mi = S MiZi i=l 1 t=1 where k M. Consequently 0-7753 . . {i = 1. and since (0-7753 . k). we put | Z . . (e) The ^-transforms of rt = 0-65 and of »-s = 0-40 are respectively zt = 0-7753 and = 0-4236. 2. . is given by i= 1 a 2 = 2 M M 3 = 2 mi2o. Combining Estimates of p. Let samples of Nu N t . (i — 1. k) about a common mean Z = tanh _ 1 p. t=i i=i t=i Now a 2 is a function of t h e k quantities nu. t h a t of S MiZi. (i = 1. .: = w. . . . . • • • ?k. Then these k values are values of variates which are approximately normally distributed with variances (Ni — 3).-/ S m.How shall we combine these k estimates of p. the variance of their difference is equal to the sum of their variances. . .-2. If we " weight " these k values with weights tm.7.Z | x 4 < 1-96 or | z . k). (i = 1.)». Let us choose these ft quantities in such a way t h a t a 2 is a minimum. we conclude that there is no ground to reject the hypothesis that the two samples have been drawn from the same population (or from equivalent populations). . Thus — 2a)/0-3354 is distributed normally about zero mean with unit variance.. .1 . .Z | < 0-49. t'=l If the variance of zi is a. 10. N t be drawn from a population and let t h e corresponding values of r be rv r2. 2. The standard error of — z t is then (1/16 + 1/20) i = 0-3354. . The necessary condition t h a t this should be so is t h a t for all i.0-4236)/0-3354 = 1-044 < 1-96. A) be Zi. da" 18mi -0 k k k . the population correlation. 2. . On the assumption that the samples are from the same normal population (or from equivalent normal populations). to obtain a better estimate of t h a t parameter ? Let the 2-transforms of n. 2.testing regression and correlation 195 (6) To find the 95% confidence limits for p on the basis of the information provided by the present sample.

1) N( . k 2 f'e] (Ni 3)zil =0 m^a. 0-48 respectively. Use these values of r to obtain a combined estimate of the population correlation coefficient. of 0-41..e. 2 m^j mm* — ( . a constant . for all i. for all i. (10.'2 = k k 2 m^c^j 1n I 2 m%. the sample correlation coefficient.3)z<.V(-S)». 2 m S a f ) ( . i. k (Ni / 2= 1 3) The minimum-variance estimate of Z is then and the required combined estimate of p is p = t a n h ^ 2 (Ni . yielding values of r. 0-60. 30..3 ) * / S ^ (iV< 10.e.7.. 17 27 37 47 128 (Nt .3. Treatment: We form the following table : z. (-i giving ' p = tanh 0-5589 = 0-507 N O T E : (1) To save work tables of the inverse hyperbolic functions should be used to find the ^-transforms of r. 1= 1 nti oc 1 /<Tia. lor all i. 40 and 50 are drawn from the same parent population. (2) The weighted mean of z obtained in the previous section and used here is approximately normally distributed with variance . 7-412 18-711 20-831 24-581 71-535 0-436 0-693 0-563 0-523 — £(. Worked Example : Samples of 20. 0-41 0-60 0-51 0-48 TOTALS 3) J .196 s t a t i s t i c s i.e. 2 m ^ i. 0-51.8.

14. Moreover.2.2 to be our correlation table and assume our origin t o be taken at the sample mean (x = 0 = y)... i the frequency of the y's in the #<th array. whether such a value of could have arisen by chance in random sampling. with our present origin). 10. eyx. where s s is the variance of t h e means of the . the standard error of Z is 0-0883. Thus we may expect p to lie between 0-368 and 0-624. is defined by (6. the right-hand side. Testing a Correlation Ratio.9. s ' ZfiWiVi i ~ yi) = S(y.9. — 3) + 3 J pairs. i. by 6.10).e. the correlation ratio of y on x for the sample.-) + 2 2 fry?. (10.^-arrays and s„ 2 is the sample variance of y.2) = 0.1) If now there are p ^-arrays.y.TESTING REGRESSION AND CORRELATION 197 1/ E (Nt — 3). The value of Z we have obtained may be treated as an individual value calculated from a single sample of 131 pairs. The accuracy of our estimate is then that to be expected from a sample of £ E (N.-2fyy}) «' j 2 (y. Thus the term Nsv2eyxs < j t has p — 1 degrees of . be taken as a measure of the degree t o which the regression departs from linearity (but see below. i j i i i i = Nsy*(\ . In the present example this variance is 1/128 = 0-0078. 2 i X f y y f j Expanding Vi) = 2 2My i j } . the p values of yi are subject t o the single restriction 2 2 fyyj = 2 M. When the regression of y on x in a sample of N pairs of values from a bivariate population is curvilinear. which..4) eyx1 = V / V .y.n.2 2 f y ) ' j 1 = 2 («iy<2 . 10. Then the sum of the squared deviations of the y's from this mean is a S V f y y f = i j S i j - yt + vi)\ where yi is t h e mean of t h e y's in the *jth array.7. .yt. provisionally. We take Table 6. We now have to devise a method of testing whether a given evx is significant.e. Consequently 2 2 2 f a y ? = 2 2 f i } ( y i . i.y'02 + 2 S/yyi 2 + 2 2 i i 2 f y U V i ~ i j The cross-product vanishes because if 2 f% = «<.eyx*) + Ns/efX* . eyX2 — r* may.= Ny{ = 0.

freedom. Similarly, since the N deviations are subject to the single restriction Σ_i Σ_j f_ij y_j = 0, the term Σ_i Σ_j f_ij y_j² has N - 1 degrees of freedom; hence the term N s_y² (1 - e_yx²) involves (N - 1) - (p - 1) = N - p degrees of freedom.

On the null hypothesis that there is no association between the variates in the population, each array may be regarded as a random sample from the population of y's. The term Σ_i Σ_j f_ij (y_j - ȳ_i)² is the variation sum of squares within arrays, while the term Σ_i n_i ȳ_i² is the variation sum of squares between arrays. On the null hypothesis, then, each of these sums, divided by the appropriate number of degrees of freedom, should give an unbiased estimate of σ_y². The corresponding analysis of variance is, then:

Source of variation.   Sum of squares.        Degrees of freedom.   Estimate of σ_y².
Between arrays         N s_y² e_yx²           p - 1                 N s_y² e_yx² / (p - 1)
Within arrays          N s_y² (1 - e_yx²)     N - p                 N s_y² (1 - e_yx²) / (N - p)
TOTALS                 N s_y²                 N - 1

If, then, the value of F, the ratio of the first of these estimates to the second, is found to be significant, the value of e_yx obtained from the sample data is significant, and we must reject the hypothesis that there is no association of the kind indicated by the regression function in the population.

10.10. Linear or Non-linear Regression? In 10.2 we assumed that the regression of y on x was linear; in 10.9 we assumed it to be non-linear. We must now complete this set of tests with one which will indicate whether, on the sample data, regression is linear or non-linear, a test which is in fact logically prior to the other two. Let b_yx be the coefficient of regression of y on x; with our assumption that x̄ = 0 = ȳ, the regression line is y = b_yx x. To devise the test we return to equation (10.9.1):

Σ_i Σ_j f_ij y_j² = Σ_i Σ_j f_ij (y_j - ȳ_i)² + Σ_i n_i ȳ_i².
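The within/between decomposition underlying the correlation-ratio test can be checked numerically. The grouped data below are invented purely for the demonstration; they are not from the text.

```python
# Check that the total sum of squares splits exactly into the within-arrays
# and between-arrays sums, as in (10.9.1), and compute e_yx^2 from the split.

# arrays[i] is the list of y-values in the i-th x-array (made-up data)
arrays = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 6.0, 7.0]]

ys = [y for arr in arrays for y in arr]
N = len(ys)
grand_mean = sum(ys) / N
total_ss = sum((y - grand_mean) ** 2 for y in ys)            # N * s_y^2
within_ss = sum(sum((y - sum(a) / len(a)) ** 2 for y in a) for a in arrays)
between_ss = sum(len(a) * (sum(a) / len(a) - grand_mean) ** 2 for a in arrays)

assert abs(total_ss - (within_ss + between_ss)) < 1e-9
e2 = between_ss / total_ss    # sample correlation ratio squared, e_yx^2
print(round(e2, 3))  # → 0.667
```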

Let Y_i be the estimate of y obtained from the regression line when x = x_i. Then we may write

Σ_i Σ_j f_ij y_j² = Σ_i Σ_j f_ij (y_j - ȳ_i)² + Σ_i Σ_j f_ij (ȳ_i - Y_i)² + Σ_i Σ_j f_ij Y_i²,

the cross-products again vanishing (why?). The first term on the right-hand side of this equation still represents the variation of y within arrays; the second term represents the variation of array-means from the regression line; and the third term represents the variation of the regression line about the sample mean.

Now Σ_i Σ_j f_ij Y_i² = b_yx² Σ_i Σ_j f_ij x_i² = b_yx² Σ_i n_i x_i². But Σ_i n_i x_i² = N s_x² is independent of the regression, and b_yx = r s_y / s_x, so that Σ_i Σ_j f_ij Y_i² = b_yx² N s_x² = N r² s_y². Moreover, the variation this term represents depends only on b_yx, and to it, therefore, corresponds but one degree of freedom. Again,

Σ_i Σ_j f_ij (ȳ_i - Y_i)² + Σ_i Σ_j f_ij Y_i² = Σ_i n_i ȳ_i² = N s_y² e_yx², with p - 1 degrees of freedom.

Consequently the term Σ_i Σ_j f_ij (ȳ_i - Y_i)² = N s_y² (e_yx² - r²), with p - 2 degrees of freedom. We have already seen that the term Σ_i Σ_j f_ij (y_j - ȳ_i)² = N s_y² (1 - e_yx²) has N - p degrees of freedom.

On the hypothesis that regression is linear, the mean square deviation of array-means from the regression line should not be significantly greater than that of y within arrays. We may thus test

F = [N s_y² (e_yx² - r²) / (p - 2)] / [N s_y² (1 - e_yx²) / (N - p)] = (e_yx² - r²)(N - p) / [(1 - e_yx²)(p - 2)]   (10.10.1)

for (p - 2, N - p) degrees of freedom. If this value of F is significant, the hypothesis of linear regression must be rejected. It follows that it is not sufficient to regard e_yx² - r² by itself as a measure of departure from linearity, for F depends also on e_yx², N and p. The complete analysis is shown in the following table.

Source of variation.                    Sum of squares.        Degrees of freedom.   Mean square.
Of regression line about sample mean    N r² s_y²              1                     N r² s_y²
Of array means about regression line    N s_y² (e_yx² - r²)    p - 2                 N s_y² (e_yx² - r²) / (p - 2)
Within arrays                           N s_y² (1 - e_yx²)     N - p                 N s_y² (1 - e_yx²) / (N - p)
TOTALS                                  N s_y²                 N - 1

Worked Example: Test for non-linearity of regression the data of 6.8 and 6.15.

Treatment: We have e_yx² = 0.471, e_yx² - r² = 0.009, N = 1000 and p = 9. Then

F = (0.009 × 991) / (0.529 × 7) = 2.409 for ν₁ = 7, ν₂ = 991.

Using Table 9.6, we find that the 1% and 5% points of F for ν₁ = 7, ν₂ = 991 are 2.66 and 2.02. The value of F is, therefore, significant at the 5% level, but not at the 1% level. There is some ground for believing that the regression is non-linear.

EXERCISES ON CHAPTER TEN

1. Test for significance the value of r found in Exercise 4 to Chapter Six.
2. Test the values of e_yx and r found in Exercise 8 to Chapter Six. Are these values consistent with the assumption that the regression of y on x is linear?
3. A sample of 140 pairs is drawn at random from a bivariate normal population and grouped in 14 arrays; the data yielded r = 0.35 and e_yx = 0.45. Is there reason to believe that the regression of y on x is non-linear?
4. Random samples of 10, 15 and 20 are drawn from a bivariate normal population, yielding r = 0.3, 0.4, 0.49 respectively. Form a combined estimate of the population correlation coefficient.

Solutions: 2. Yes. 4. ρ̂ = 0.43 (2 d.p.).
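The linearity test (10.10.1) can be sketched directly, using the numbers of the worked example above. One caveat: the text quotes only the difference e_yx² - r² = 0.009, so the value r² = 0.462 below is inferred from e_yx² = 0.471; the F ratio depends only on the difference.

```python
# F test for departure from linearity, equation (10.10.1):
# F = [(e^2 - r^2)/(p - 2)] / [(1 - e^2)/(N - p)] on (p - 2, N - p) d.f.

def linearity_F(e2, r2, N, p):
    return ((e2 - r2) / (p - 2)) / ((1 - e2) / (N - p))

F = linearity_F(e2=0.471, r2=0.462, N=1000, p=9)
print(round(F, 2))  # → 2.41, on (7, 991) degrees of freedom
```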

CHAPTER ELEVEN

CHI-SQUARE AND ITS USES

11.1. Curve-fitting. What we are actually trying to do when we " fit " a continuous curve to an observed frequency distribution is to find a curve such that the given frequency distribution is that of a random sample from the (hypothetical) population defined by the curve we ultimately choose. Let φ(x) be the probability density of the continuous distribution corresponding to the curve we fit to the data. A value x_i will in fact be the mid-value of a class-interval, x_i ± ½h say, and the theoretical frequency of x_i in a sample of N will be N p_i, where p_i is the integral of φ(x) over that class-interval.

The question we now ask is: How well does this theoretical curve fit the observed data? On the hypothesis that the fitted curve does in fact represent the (hypothetical) population from which the set of observed values of x is a random sample, the divergence of observed from theoretical frequencies must result from random sampling fluctuations only. If, however, this total divergence is greater than that which, to some specified degree of probability, is likely to result from random sampling, we shall be forced to conclude that, at this level of significance, the fitted curve does not adequately represent the population of which the observed x's have been regarded as a sample. Crudely, the " fit " is not good.

11.2. The Chi-Square Distribution. Of a sample of N values of a variate x, let x take the value x_1 on n_1 occasions, the value x_2 on n_2 occasions, and so on, and, finally, the value x_k on n_k occasions. The observed frequencies of the values x_i of the variate x are thus n_i (i = 1, 2, . . . k), where Σ n_i = N. If now the probability of x taking the value x_i is p_i (i = 1, 2, . . . k), the probability of x taking the value x_1 on n_1 occasions, the value x_2 on n_2 occasions and so on,

regardless of order, will be

P = [N! / (n_1! n_2! . . . n_k!)] p_1^n_1 p_2^n_2 . . . p_k^n_k   (11.2.1)

For sufficiently large values of n_i we may use Stirling's approximation to n!,

n! ≈ (2π)^(1/2) n^(n + 1/2) exp (-n),

so that, for sufficiently large n_i,

P ≈ (2π)^(1/2) N^(N + 1/2) exp (-N) Π p_i^n_i / Π [(2π)^(1/2) n_i^(n_i + 1/2) exp (-n_i)].

Since Σ n_i = N, the exponential factors cancel, and

P ≈ N^(N + 1/2) Π p_i^n_i / [(2π)^((k-1)/2) Π n_i^(n_i + 1/2)]   (11.2.2)

Now the expression (2πN)^((k-1)/2) Π p_i^(1/2) is independent of the n_i's; putting 1/C for this expression, (11.2.2) may be rewritten

P ≈ C Π (N p_i / n_i)^(n_i + 1/2), i.e. ln P ≈ ln C + Σ (n_i + ½) ln (N p_i / n_i).

Write n_i = N p_i + X_i (N p_i)^(1/2).

Then, since Σ n_i = N = Σ N p_i, we have

Σ X_i (N p_i)^(1/2) = 0,

indicating that only k - 1 of the X_i's are independent. Also

ln P ≈ ln C - Σ [N p_i + X_i (N p_i)^(1/2) + ½] ln [1 + X_i (N p_i)^(-1/2)].

Expanding the logarithm in a power series (see Abbott, Teach Yourself Calculus, p. 332), and remembering that N is large, so that, provided none of the p_i is of order 1/N, terms in N^(-1/2) and higher powers of N^(-1/2) may be neglected, the terms linear in X_i vanish by the restriction above and we are left with

ln P ≈ ln C - ½ Σ X_i², or P ≈ C exp (-½ Σ X_i²)   (11.2.3)

Now the n_i are integers, and, since n_i = N p_i + X_i (N p_i)^(1/2), to a change of unity in n_i there corresponds a change in X_i of ΔX_i = (N p_i)^(-1/2). The probability of the set of frequencies n_1, n_2, . . . n_k may therefore be put in the form

P ≈ B exp (-½ Σ X_i²) ΔX_1 ΔX_2 . . . ΔX_(k-1),

where B does not involve the X_i's.

Let us now consider the variate X_i = (n_i - N p_i) / (N p_i)^(1/2) more closely. When N is treated as constant, if we treat the event of x taking the particular value x_i as a success in N trials, the frequency of success is distributed binomially about mean N p_i with variance N p_i (1 - p_i). If, however, we allow N, the sample size, to vary, we proceed as follows. Put z_i = n_i - N p_i; then

var (n_i) = var (z_i) + p_i² var (N), where var (z_i) = N p_i (1 - p_i),

so that

var (n_i) = N p_i (1 - p_i) + p_i² var (N)   (11.2.5)

On the assumption that the frequencies are independent, var (N) = Σ var (n_i). Summing (11.2.5) over the i's, therefore,

var (N) = Σ var (n_i) = N Σ p_i (1 - p_i) + var (N) Σ p_i²,

and, since Σ p_i = 1, this gives var (N) = N - N Σ p_i² + var (N) Σ p_i², whence

var (N) = N   (11.2.5a)

Therefore var (n_i) = N p_i (1 - p_i) + N p_i² = N p_i   (11.2.6)

Thus, as N tends to infinity, the variates X_i = (n_i - N p_i) / (N p_i)^(1/2) are approximately normally distributed about zero mean with unit variance, subject to the restriction that only k - 1 of them are independent, and the probability of the set n_1, n_2, . . . n_k is given approximately by the probability differential of a continuous distribution defined by

dp = B exp (-½ Σ X_i²) dX_1 dX_2 . . . dX_(k-1)   (11.2.7)

It remains, then, to find the distribution of the sum of the squares of k standardised normal variates subject to a single linear constraint. Let P = (X_1, X_2, . . . X_k) be a point in a k-dimensional Euclidean space and put

χ² = X_1² + X_2² + . . . + X_k², subject to the restriction Σ X_i (N p_i)^(1/2) = 0. An element of volume in the X-space now corresponds (see 7.11) to the element of volume between the two hyperspheres Σ X_i² = χ² and Σ X_i² = (χ + dχ)². Using arguments parallel to those used in 7.11, we find that the probability that χ = (Σ X_i²)^(1/2) lies between χ and χ + dχ is approximately

dp = A exp (-½ χ²) χ^(k-2) dχ   (11.2.8)

This defines a continuous distribution which, although actually that of χ, is sometimes called the χ²-distribution. Since, of the k X_i's, only k - 1 are independent, we may say that χ has k - 1 degrees of freedom. It can be shown, moreover, that if instead of one equation of constraint there are p such linear equations, the number of degrees of freedom for χ is k - p. Putting, conventionally, ν = k - 1, we have

dp = A exp (-½ χ²) χ^(ν-1) dχ   (11.2.9)

and, since the probability of χ taking some value between 0 and ∞ is unity,

1 = A ∫ exp (-½ χ²) χ^(ν-1) dχ (taken from 0 to ∞), giving 1/A = 2^(ν/2 - 1) Γ(ν/2).

When ν = 1, dp = (2/π)^(1/2) exp (-½ χ²) dχ, which is the normal distribution with probability density doubled, due to the fact that χ ≥ 0, whereas in the normal distribution the variate takes negative and positive values; (11.2.9) still holds. The χ²-distribution proper is obtained by writing (11.2.9) in the form

dp = [1 / (2^(ν/2) Γ(ν/2))] exp (-½ χ²) (χ²)^(ν/2 - 1) d(χ²)   (11.2.10)

If now we write S = χ², (11.2.10) becomes

dp = [1 / (2^(ν/2) Γ(ν/2))] exp (-½ S) S^(ν/2 - 1) dS   (11.2.10a)

Then the probability that χ² will not exceed a given value χ₀² is

P(χ² ≤ χ₀²) = [1 / (2^(ν/2) Γ(ν/2))] ∫ exp (-½ S) S^(ν/2 - 1) dS, the integral being taken from 0 to χ₀².

The right-hand side is, in fact, an Incomplete Γ-function, and the above equation may be written in Karl Pearson's notation

P(χ² ≤ χ₀²) = I(χ₀² / (2ν)^(1/2), ν/2 - 1)   (11.2.11)

Tables of this function are given in Tables of the Incomplete Γ-function, edited by Pearson and published by the Biometrika Office, University College, London.

[Fig. 11.1. Probability density curves of the distribution for ν = 2, 4 and 8.]
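In place of the printed tables, P(χ² ≤ χ₀²) can be evaluated by integrating the density (11.2.10a) numerically. This is an illustrative sketch (Simpson's rule, my own helper name `chi2_cdf`), reproducing entries of Table 11.2 below.

```python
import math

# Numerical evaluation of P(chi^2 <= x0) by Simpson's rule applied to the
# density (1 / (2^(v/2) * Gamma(v/2))) * exp(-S/2) * S^(v/2 - 1).

def chi2_cdf(x0, v, steps=20000):
    norm = 1.0 / (2 ** (v / 2) * math.gamma(v / 2))
    f = lambda s: norm * math.exp(-s / 2) * s ** (v / 2 - 1)
    h = x0 / steps
    total = f(1e-12) + f(x0)   # start just above S = 0, where S^(v/2-1) may blow up
    for i in range(1, steps):
        total += f(i * h) * (4 if i % 2 else 2)
    return total * h / 3

print(round(chi2_cdf(9.49, 4), 3))  # → 0.95  (the 5% point for v = 4)
```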

Tables of χ² for various values of ν ≤ 30 are readily available. They include those in: (1) Statistical Tables for use in Biological, Agricultural and Medical Research, by Sir R. A. Fisher and F. Yates; (2) Statistical Methods for Research Workers, by Sir R. A. Fisher; and (3) Cambridge Elementary Statistical Tables, by D. V. Lindley and J. C. P. Miller. With them we can evaluate P(χ² ≤ χ₀²) for ν ≤ 30. Our Table 11.2 is reproduced, from (2), by permission of the author and publisher.

For many practical purposes, however, we can use two formulae which give an approximate value of the point of χ² exceeded with probability 0.05: for ν ≤ 10 the approximate 0.05 point is 1.55(ν + 2), while for 35 ≥ ν > 10 it is 1.25(ν + 5). And when ν > 30, Fisher has shown that √(2χ²) is approximately normally distributed about mean √(2ν - 1) with unit variance.

TABLE 11.2. Values of χ² with Probability P of Being Exceeded in Random Sampling

  ν     P = 0.99   0.95    0.05    0.01
  1      0.0002    0.004    3.84    6.64
  2      0.020     0.103    5.99    9.21
  3      0.115     0.35     7.82   11.34
  4      0.30      0.71     9.49   13.28
  5      0.55      1.14    11.07   15.09
  6      0.87      1.64    12.59   16.81
  7      1.24      2.17    14.07   18.48
  8      1.65      2.73    15.51   20.09
  9      2.09      3.32    16.92   21.67
 10      2.56      3.94    18.31   23.21
 11      3.05      4.58    19.68   24.72
 12      3.57      5.23    21.03   26.22
 13      4.11      5.89    22.36   27.69
 14      4.66      6.57    23.68   29.14
 15      5.23      7.26    25.00   30.58
 16      5.81      7.96    26.30   32.00
 17      6.41      8.67    27.59   33.41
 18      7.02      9.39    28.87   34.80
 19      7.63     10.12    30.14   36.19
 20      8.26     10.85    31.41   37.57
 21      8.90     11.59    32.67   38.93
 22      9.54     12.34    33.92   40.29
 23     10.20     13.09    35.17   41.64
 24     10.86     13.85    36.42   42.98
 25     11.52     14.61    37.65   44.31
 26     12.20     15.38    38.88   45.64
 27     12.88     16.15    40.11   46.96
 28     13.56     16.93    41.34   48.28
 29     14.26     17.71    42.56   49.59
 30     14.95     18.49    43.77   50.89

NOTE: (1) The value of χ² obtained from a sample may be significantly small. When the value of P obtained from the table is greater than 0.95,

Mi Hence = vt—I.higher powers of v.2-'2r(v/2) / e x P ( .2Q--/2 j T(v/2) Y~r~2tdv exp (— v)v"!2.1) i. (11.208 STATISTICS 11.2t)-"l2 Expanding [1 — 2 w e M(t) = 1 + have + v(v + 2) ^ + . Hence = v.f 21 + 5 . fx2' = v(v + 2) and.e.3.. . (11.3. 4i2 + • • • .iS)S»l2~ 1 exp (St)dS = 2W2r(v/2l / 5 W 2 _ 1 ex P ~ %t)}dS Putting M(t) = f S ( l — it) — v. (i2 = jz.. .3) .' . The mean of the ^-distribution for v degrees of freedom is. v and the variance 2v. therefore.'» = 2v.~1 v 1 -s. r(v/2) 20--I w r(v/2) ' . „ 1 5 F V2v 2LV2v 2 2v T J t2 = ^ + higher powers of v-1 Thus Mm(t) -> exp (-J/2) as v-> to .1dv = (1 — tzl s . More Properties of Chi-Squares. . The mean-moment-generating function in standardised units is. then. M(f) = (1 . . . consequently. . The moment-generating function of the -//-distribution is M(t) s £(«*"') = €(e&) 1 f . • ..3.n. dS = (1 .

Consequently. v (1 = 2. . . or 2 2 2 Z =0 (v/2 .e. are each distributed like x 2 with w degrees of freedom. the moment-generating-function of S< with respect to the origin is [1 — 2*]-W».1)S"/ .2 t y m * i=l t=1 P n ^(exp Sit) t-i = (1 .f. of I Si = n (1 .chi-square and its uses 209 and.. (11.g. we see that the distribution tends to normality about mean v with unit variance. . . is. then 2 Xi is distributed like x 2 with p i o l Z n degrees of freedom. P But the moment generating function of S S. by definition i= 1 f ^ e x p ( 2 S<i)J = ^ ( . p).(«' «= 1 . the m. . . 3 . vp degrees of freedom respectively. The mode of the /^distribution is given by 3g[S»/*-iexp(-iS)] i. . »=i I t follows at once that if the sum of two independent positive variates x1 and xt is distributed like x2 with v degrees of v = . 2.2i) =» (1 — 2<)-"/2.^ S * / 2 . 2 . Xi2> X22. . . the skewness of x 2 is given by Now consider ^>x 2 . . .}S) = 0 S = =v-2.i e x p ( .exp ( . comparing with (5.v a r i a t e s .3.4) Using Karl Pearson's measure of skewness.. . (mean-mode) / standard deviation.-JS) . p). p).2). where v = £ v« Hence we have the important theorem : If the independent positive variates *<.4. . . If S< = Xi2 (i = 1. • • • Xp2 with Vj. v 2 . n exp S<<) = since the S's are independent.

n) for v. In p = — This is very convenient. X22 = 13-2 for v2 = 10. is. For example— Three experiments designed to test a certain hypothesis yielded 2 = 9-00 for v. Xa the values of x 2 for v = 2 corresponding t o Pi> P2> Ps respectively. m) degrees of freedom.4-333 226 X 2 8-666 452 .) for v = 2 + 2 + 2 = 6 d.210 statistics freedom. (2) Next assume t h a t a number of tests of significance (three. 2.f. . ps. If then we write X2 = S x»'2. None of these on its own is significant at the 10% point. . as the reader may verify.1-386 294 p3 = 0-350 In ps= — 1 049 812 . Let the values of y2 corresponding to these experiments be x*2 (» = 1. = 5. Xl X3 2 = 19-1 f o r v3 = 15. and i f ' * ! is distributed like x 2 with vx degrees of freedom. this value of x 2 will i n fact be the value obtained from pooling the data of the n experiments and will correspond to v = 2w degrees of freedom. \ 2 a are X2 .(i = 1. Now a n y probability may be translated into a value of x2 f ° r a n arbitrarily chosen v. give us less reason for con-] fidence in our hypothesis than do those of any one of the experiments taken singly. x 2 is distributed like x 2 with v2 = v — degrees of freedom. . . yet—in view of t h e experience of (1)—we require to obtain some over-all probability corresponding to the pooled data.f. Some consequences of this additive property are : (1) Suppose we conduct a set of n similar experiments to test a hypothesis. when pooled. for if Xi2. . significant at the 10% point. Their sum x* = 41-3 for v = 30 d. 2. We know nothing more about the tests than this. . say) have yielded probabilities plt p2. B u t when v = 2. required pooled probability is obtained. For example:] p1 = 0-150 In — 1-897 120 pt = 0-250 In p% = . Thus we see t h a t the data of the three experiments. and the. however. the pooled value of x 2 is — 2(ln pr + In p 2 + In £«.

while the 5% point exceeds the 10% by 1-947.chi-square 2 and its uses 211 The 5% point of x for v = 6 is 12-592 „ 10% „ „ v = 6 „ 10-645 We see then t h a t t h e pooled probability is slightly less than 0-10. The number of successes in each throw was noted. The antilog of 2-9675 is 0-0944.1-30103. we find the required pooled probability t o be 0-094. we notice t h a t the pooled value of x 2 exceeds the 10% point by 0-021. with the following results (Weldon's data) : Number of successes. The theoretical frequency generating function on this hypothesis is then 26. 3.306 times and a 5 or a 6 was counted as a success. a success. 0 1 2 3 4 5 Frequency 185 1. then.114 5.149 3.475 6. This means that the probability of throwing a 5 or a 6. Therefore the pooled value of -/_2 corresponds t o 1X 0-30103 = . Interpolating thus. f. to three decimal places.194 Number of successes. Frequency.306 a + 1 Either natural logarithms (base e) or common logarithms (base 10) may be used. To find this more accurately. 6 7 8 9 10 Total Is there evidence that the dice are biased ? Treatment: We set up the hypothesis that the dice are unbiased. Now 1 log 10 0-10 = — 1 and log 10 0-05 = g-69897 = .067 1.265 5.331 403 105 18 26. is -J = i and the probability of a failure.1-0325 = 2-9675. since In x = In 10 X log10 x. 1 1 A Some Examples of the Application of x a : (A) Theoretical Probabilities Given Twelve dice were thrown 26.306 . The difference between these is 0-30103.

has been imposed..114 5. measured in cm.067 1. 0 185 1 2 3 4 5 6 1.345 5.254 8 403 392 9 105 87 ^ = 38-2.149 3. the size of the sample total.194 3. namely. • 4-2 4-3 4-4 4-5 4-6 4-7 Frequency. rejected. The number of degrees of freedom is thus 10. of 294 eggs of the Common Tern collected in one small coastal area: Length (central values).265 5. The value of x 2 obtained is then highly significant and the hypothesis that the dice are unbiased is. therefore. (B) Theoretical Probabilities not Given The following table gives the distribution of the length. 54 34 12 6 1 2 .425 6. The 1% level of for v = 10 is 23-31.306 26. 10 18 14 Xs == 2 There are 11 classes and one restriction.331 1. 3-5 3-6 3-7 3-8 3-9 40 41 Frequency.576 6. 1 1 6 20 35 53 69 Length (central values).927 Totals 26.273 5.217 3.306 Theoretical frequency (<r) 203 Number of successes Observed frequency (o) .018 2.212 statistics The estimated frequencies are then found to be (correct to the nearest integer) : Number of successes Observed frequency (o) . Theoretical frequency (e) Then 7 1.

Estimated frequencies are obtained by the method of 5-6 and we find Length Observed frequency (0) Estimated frequency (c) 3-5 3-6 8 1 0-4 s 3-7 3-8 3-9 4-0 4-1 1 2-0 9-1 v 6 6-7 J 20 17-9 35 37-0 63 55-3 69 62-6 Length Observed frequency (0) Estimated frequency (c) 4-2 4-3 4-4 4-5 4-6 9 4-7 Total 54 54-1 34 33-8 12 16-3 6 6-1 1 1-4 7-9 2 0-4 294 294 e — = 2 — — 2 2 o + 2 e.) Treatment: We have first to fit a normal curve. The sample mean is an unbiased estimate of the population mean. but iV = 2o = 2 e. We find by the usual methods that x = 4-094. we group together into one class the first 3 classes with theoretical frequencies < 10 and into another class the last 3 classes with frequencies < 10. There is some divergence of opinion as to what constituted a .e. This entails calculating the sample mean and sample variance.U. The sample variance may be taken to be an unbiased estimate of the population variance. that the class frequencies were sufficiently large. e . o — e is not integral. since it removes the labour of squaring non-integers. s = 0-184. (2) Because we derived the distribution of x* on the assumption that Stirling's approximation for n ! held. consequently..-. This effectively reduces the number of classes to 9. since N = 294 is large.chi-square and its uses 213 Test whether these results are consistent with the hypothesis that egg-length is normally distributed. i. (L. x2 = 2 t _ N N O T E : (1) This form is preferable where the estimated frequencies are not integers and.

we have estimated both the mean and variance of the theoretical parent population.5.214 statistics " low frequency '' in this connection. We conclude. consequently there are 9 — 3 = 6 degrees of freedom. If there is no link-up between the disease and the . and. Tests of independence and homogeneity also come under this general heading.294 = 2-5 We must now calculate the corresponding degrees of freedom. Homogeneity. When we use the //-distribution to test goodness of fit we are testing for agreement between expectation and observation. . . ii! " Generally such cases are demonstrably due to the use of inaccurate formulae. We take a sample of individuals and classify them in two ways : into those deficient in the vitamin and those not. There are effectively 9 classes. W A R N I N G : It may happen that the fit is unnaturally good. In these cases the hypothesis. 11th Edition. . or more. there is good reason to believe that egg-length is normally distributed.\ Suppose the value of x2 obtained was such that its probability of occurrence was 0-999.: therefore. p. considered is as definitely disproved as if P had been 0-001" (Statistical Methods for Research Workers. i i 2 4. we find that the chance of such a value of x2 being obtained at v — 6 lies between P = 0-95 and P = 0-50 at approximately P = 0-82. different ways. 11. 4 . Aitken (Statistical Mathematics) prefers < 10. from the sample data. 81). but occasionally small values of x2 beyond the expected range do occur. we m a y wish to determine whether deficiency in a certain vitamin is a factor contributory t o t h e development of a certain disease. The reader would do well to compromise with a somewhat elastic figure around 10. consequently. Fisher has pointed out that in this case if the hypothesis were true. Very often we want to know whether these classifications are independent. Also. Contingency Tables. 
There is some divergence of opinion as to what constitutes a " low frequency " in this connection. Fisher (Statistical Methods for Research Workers) has used the criterion < 5; Aitken (Statistical Mathematics) prefers < 10; while Kendall (Advanced Theory of Statistics) favours < 20. The reader would do well to compromise with a somewhat elastic figure around 10.

With the classes grouped, then,

χ² = 8²/9.1 + 20²/17.9 + 35²/37.0 + 53²/55.3 + 69²/62.6 + 54²/54.1 + 34²/33.8 + 12²/16.3 + 9²/7.9 - 294 = 296.5 - 294 = 2.5.

We must now calculate the corresponding degrees of freedom. There are effectively 9 classes. One restriction results from the fact that the total observed and total estimated frequencies are made to agree. Also, we have estimated both the mean and variance of the theoretical parent population from the sample data. We have then 3 constraints and, consequently, 9 - 3 = 6 degrees of freedom.

Entering the table at ν = 6, we find that the chance of such a value of χ² being obtained lies between P = 0.95 and P = 0.50, at approximately P = 0.82. We conclude, therefore, that the fit is good, but not unnaturally good: there is good reason to believe that egg-length is normally distributed.

WARNING: It may happen that the fit is unnaturally good. Suppose the value of χ² obtained was such that its probability of occurrence was 0.999. Fisher has pointed out that in this case, if the hypothesis were true, such a value would occur only once in a thousand trials. He adds: " Generally such cases are demonstrably due to the use of inaccurate formulae, but occasionally small values of χ² beyond the expected range do occur. In these cases the hypothesis considered is as definitely disproved as if P had been 0.001 " (Statistical Methods for Research Workers, 11th Edition, p. 81).

11.5. Contingency Tables. Independence. Homogeneity. When we use the χ²-distribution to test goodness of fit, we are testing for agreement between expectation and observation. Tests of independence and homogeneity also come under this general heading. Suppose we have a sample of individuals which we can classify in two, or more, different ways. Very often we want to know whether these classifications are independent. For instance, we may wish to determine whether deficiency in a certain vitamin is a factor contributory to the development of a certain disease. We take a sample of individuals and classify them in two ways: into those deficient in the vitamin and those not, and into those suffering from the disease and those not. If there is no link-up between the disease and the

vitamin deficiency (i.e., if the classifications are independent), then, working from the margin totals, we calculate the expected number of individuals in each of the four sub-groups resulting from the classification: those deficient in the vitamin and diseased; those deficient in the vitamin and not diseased; those diseased but not deficient in the vitamin; and those neither diseased nor deficient in the vitamin. We thus obtain four observed frequencies and four expected frequencies. If the divergence between observation and expectation is greater than is probable (to some specified degree of probability) as a result of random sampling fluctuations alone, we shall have to reject the hypothesis and conclude that there is a link-up between vitamin-deficiency and the disease.

It is usual to set out the sample data in a contingency table, a table in which the frequencies are grouped according to some non-metrical criterion or criteria. In the present case, where we have two factors of classification, each resulting in the division of the sample into two classes, we have a 2 × 2 contingency table. Suppose classification 1 divides the sample of N individuals into two classes, A and not-A, and classification 2 divides the sample into two classes, B and not-B. Let the observed frequency in the sub-class " A and B " be a; that in " not-A and B " be b; that in " A and not-B " be c; and that in " not-A and not-B " be d. We may display this in the 2 × 2 contingency table:

             A       Not-A     Totals
B            a         b        a + b
Not-B        c         d        c + d
TOTALS     a + c     b + d      a + b + c + d = N

On the assumption that the classifications are independent, we have, from the margin totals: the probability of being an A is (a + c)/N; of a not-A, (b + d)/N; of a B, (a + b)/N; of a not-B, (c + d)/N. Then the probability of being " A and B " will be

[(a + c)/N] × [(a + b)/N], and the expected frequency for this sub-class in a sample of N will be (a + c)(a + b)/N. Likewise, the probability of being " B and not-A " is [(b + d)/N] × [(a + b)/N], and the corresponding expected frequency (b + d)(a + b)/N. In this way we can set up a table of corresponding expected frequencies:

            A                      Not-A
B       (a + c)(a + b)/N       (b + d)(a + b)/N
Not-B   (a + c)(c + d)/N       (b + d)(c + d)/N

Providing the frequencies are not too small, we may then use the χ²-distribution to test for agreement between observation and expectation, as we did in the goodness-of-fit tests; and this will, in fact, be testing our hypothesis of independence. χ², in this case, will be given by

χ² = [a - (a + c)(a + b)/N]² ÷ [(a + c)(a + b)/N] + [b - (b + d)(a + b)/N]² ÷ [(b + d)(a + b)/N]
   + [c - (a + c)(c + d)/N]² ÷ [(a + c)(c + d)/N] + [d - (b + d)(c + d)/N]² ÷ [(b + d)(c + d)/N],

where N = a + b + c + d. Since a - (a + c)(a + b)/N = [a(a + b + c + d) - (a + c)(a + b)]/N = (ad - bc)/N, and similarly each of the other numerators is, apart from sign, (ad - bc)/N, this reduces to

χ² = [(ad - bc)²/N] [1/(a + c)(a + b) + 1/(b + d)(a + b) + 1/(a + c)(c + d) + 1/(b + d)(c + d)]

   = (ad - bc)² (a + b + c + d) / [(a + b)(a + c)(c + d)(b + d)]   (11.5.1)

It remains to determine the appropriate number of degrees of freedom for this value of χ². We recall that the expected values are calculated from the marginal totals of the sample. Directly, then, we calculate one value, say that for " A and B ", the others are fixed and may be written in by subtraction from the marginal totals. Thus the observed values can differ from the expected values by only 1 degree of freedom. Consequently, for a 2 × 2 contingency table, the number of degrees of freedom for χ² is one.

Worked Example: A certain type of surgical operation can be performed either with a local anaesthetic or with a general anaesthetic. Results are given below:

          Alive   Dead   Totals
Local      511     24     535
General    173     21     194
TOTALS     684     45     729

Test for any difference in the mortality rates associated with the different types of anaesthetic. (R.S.S.)

Treatment: Our hypothesis is that there is no difference in the mortality rates associated with the two types of anaesthetic. Using the marginal totals, the expected values are (correct to the nearest integer):

          Alive   Dead   Totals
Local      502     33     535
General    182     12     194
TOTALS     684     45     729
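The shortcut formula (11.5.1) gives the same result without tabulating expected values. A sketch with the data above (the helper name `chi2_2x2` is mine); the slight difference from the 9.85 obtained below comes from the rounding of the expected frequencies to integers.

```python
# 2x2 contingency chi-square via (ad - bc)^2 * N / [(a+b)(c+d)(a+c)(b+d)],
# applied to the anaesthetic mortality data.

def chi2_2x2(a, b, c, d):
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

chi2 = chi2_2x2(511, 24, 173, 21)   # rows: local, general; columns: alive, dead
print(round(chi2, 2))  # → 9.88
```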

Accordingly, χ² = 9.85 for ν = 1 d.f. The probability of this value of χ² is 0.0017 approximately. The value of χ² obtained is, therefore, highly significant, and we conclude that there is a difference between the mortality rates associated with the two types of anaesthetic.

11.6. Homogeneity Test for 2 × k Table. Suppose we were to regard the frequencies in the general 2 × 2 contingency table as representing two samples distributed, according to some factor of classification (being A or not-A), into 2 classes, thus:

              A      Not-A    Totals
Sample I      a        b       a + b
Sample II     c        d       c + d
TOTALS      a + c    b + d     a + b + c + d (= N)

We now ask: " On the evidence provided by the data, can these samples be regarded as drawn from the same population? " Assuming that they are from the same population, the probability that an individual falls into the class A will be the same for both samples. Again basing our estimates on the marginal totals, this probability will be (a + c)/N, and the estimated or expected frequency, on this hypothesis, of the A's in the first sample will be (a + c) × (a + b)/N. In this way we calculate the expected frequencies in both classes for the two samples. If the divergence between expectation and observation is greater, to some specified degree of probability, than the hypothesis of homogeneity demands, it will be revealed by a test which is mathematically identical with that for independence.

Now suppose the individuals in two very large populations can be classed into one or other of k categories. A random sample (small compared to the size of the population) is drawn from each population, and the following frequencies are observed in the categories:

Category    1     2    . . .    t    . . .    k     Total
Sample 1   n11   n12   . . .   n1t   . . .   n1k     N1
Sample 2   n21   n22   . . .   n2t   . . .   n2k     N2

Devise a suitable form of the χ²-criterion for testing the hypothesis that the probability that an individual falls into the t-th

category (t = 1, 2, . . . k) is the same in the two populations. Derive the appropriate number of degrees of freedom for χ². (L.U.)

This is a homogeneity problem, for the question could well be reformulated to ask whether the two samples could have come from the same population. On the assumption that the probability that an individual falls into the t-th class is the same for the two populations, we use the marginal totals to estimate these probabilities. Thus our estimate of the probability of an individual falling into the t-th class is, on this assumption, (n1t + n2t)/(N1 + N2). The expected frequency, on this assumption, in the t-th class of the 1st sample is therefore N1(n1t + n2t)/(N1 + N2), and that for the same class of the 2nd sample is N2(n1t + n2t)/(N1 + N2). Consequently

χ² = Σ_t {[n1t - N1(n1t + n2t)/(N1 + N2)]² / [N1(n1t + n2t)/(N1 + N2)] + [n2t - N2(n1t + n2t)/(N1 + N2)]² / [N2(n1t + n2t)/(N1 + N2)]}.

But n1t - N1(n1t + n2t)/(N1 + N2) = (N2 n1t - N1 n2t)/(N1 + N2), while n2t - N2(n1t + n2t)/(N1 + N2) is the same quantity with its sign reversed; hence the expression reduces to

χ² = [1/(N1 N2)] Σ_t (N2 n1t - N1 n2t)² / (n1t + n2t)   (11.6.1)

How many degrees of freedom must be associated with this value of χ²? To construct the table of expected frequencies, we must know the grand total, one of the sample totals and k - 1 of the class totals. There are, therefore, 1 + 1 + (k - 1) = k + 1 equations of constraint, and, since there are 2k theoretical frequencies to be calculated, ν = 2k - (k + 1) = k - 1 degrees of freedom.

11.7. h × k Table. In the case of an h × k table, whether we are testing for independence or homogeneity, we follow exactly the same principles. To calculate the appropriate number of degrees of freedom, we note that there are h × k theoretical frequencies to be calculated. Given the grand total, we require h - 1 and k - 1 of the marginal totals to be given also. There are thus h + k - 1 equations of constraint, and consequently ν = hk - (h + k - 1) = (h - 1)(k - 1) degrees of freedom.

Worked Example: A Ministry of Labour Memorandum on Carbon Monoxide Poisoning (1945) gives the following data on accidents due to gassing by carbon monoxide:

                                  1941   1942   1943   Totals
At blast furnaces                  24     20     19      63
At gas producers                   28     34     41     103
At gas-works and coke ovens        26     26     10      62
In distribution and use of gas     80    108    123     311
Miscellaneous sources              68     51     32     151
Totals                            226    239    225     690

Is there significant association between the site of the accident and the year?

Treatment: On the assumption that there is no association between the origin of an accident and the year, the probability of an accident in any given class will be constant for that class. The probability of an accident at a blast furnace is estimated from the data to be (24 + 20 + 19)/690 = 63/690. Hence the expected frequency of accidents from this source in a yearly total of 226 will be 63 × 226/690 = 20.64. Proceeding in this way, we set up the following table of expected frequencies:

                                  1941     1942     1943
At blast furnaces                 20.64    21.82    20.54
At gas producers                  33.74    35.68    33.59
At gas-works and coke ovens       20.31    21.48    20.22
In distribution and use of gas   101.86   107.72   101.41
Miscellaneous sources             49.46    52.30    49.27

We find that χ² = 34.22 for ν = (5 − 1)(3 − 1) = 8 d.f., and the table shows that this is a highly significant value. We, therefore, reject our hypothesis of no association between source of accident and year: the probability of an accident at a given source is not constant through the years considered.
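The whole calculation can be recomputed from the observed table alone, as in the Python sketch below. A straight recomputation with unrounded expected frequencies gives a value a little below the 34.22 quoted (small differences arise from the rounding used in the hand computation); either way the result lies far beyond the 1% point (20.09) for ν = 8.

```python
# Recompute the expected frequencies and chi-square for the accident table.
observed = [
    [24, 20, 19],    # at blast furnaces
    [28, 34, 41],    # at gas producers
    [26, 26, 10],    # at gas-works and coke ovens
    [80, 108, 123],  # in distribution and use of gas
    [68, 51, 32],    # miscellaneous sources
]
row_tot = [sum(row) for row in observed]
col_tot = [sum(col) for col in zip(*observed)]
N = sum(row_tot)                                 # grand total, 690
chi2 = sum((observed[i][j] - row_tot[i] * col_tot[j] / N) ** 2
           / (row_tot[i] * col_tot[j] / N)
           for i in range(len(observed)) for j in range(3))
df = (len(observed) - 1) * (3 - 1)               # (h - 1)(k - 1) = 8
```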

11.8. Correction for Continuity. The χ²-distribution is derived from the multinomial distribution on the assumption that the expected frequencies in the cells are sufficiently large to justify the use of Stirling's approximation to n! When, in some cells or classes, these frequencies have fallen below 10, we have so far adjusted matters by pooling the classes with such low frequencies. If c such cells are pooled, the number of degrees of freedom for χ² is reduced by c − 1. If, however, we have a 2 × 2 table with low expected frequencies, no pooling is possible, since ν = 1 for a 2 × 2 table. We have, therefore, to tackle the problem from some other angle: we may either modify the table and then apply the χ²-test, or we may abandon approximate methods and calculate from first principles the exact probability of any given set of frequencies in the cells for the given marginal totals. In the present section we shall consider the method by which we "correct" the observed frequencies to compensate somewhat for the fact that the distribution of observed frequencies is necessarily discrete, whereas that of the χ²-distribution is essentially continuous. In the course of treatment of the example in the next section we shall develop and illustrate the "exact" method.

Suppose we toss an unbiased coin ten times. The expected number of heads is ½ × 10 = 5, and the probability of obtaining just r heads is the coefficient of tʳ in the expansion of (½ + ½t)¹⁰. We have: the probability of 10 heads, P(10H) = (½)¹⁰ = 0.00099. This is also the probability of 0 heads (or 10 tails). Therefore the probability of either 10H or 0H is 2 × 0.00099 = 0.00198. The probability of 9 heads is 10 × 0.00099 = 0.0099, and hence the probability of 9 or more heads in 10 tosses is 0.00099 + 0.0099 = 0.01089, while the probability of a deviation as large in either direction (9 or more heads, or 9 or more tails) is 2 × 0.01089 = 0.02178.

Using the χ²-distribution, the value of χ² for 10 heads or 0 heads is

χ² = (10 − 5)²/5 + (0 − 5)²/5 = 10,

and this value is attained or exceeded, for ν = 1, with a probability of 0.00157. Half this value, 0.000785, gives the χ²-estimate of the probability of just 10 heads. The corresponding value of χ² for 9 heads is

χ² = (9 − 5)²/5 + (1 − 5)²/5 = 6.4,

and, for ν = 1, the probability of this value being attained or exceeded is 0.01141. Half this, 0.005705, gives us the χ²-estimate of the probability of obtaining 9 or more heads. We can see that the χ²-estimates are already beginning to diverge quite considerably from the "exact" values.
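The divergence just noted is easily tabulated by machine. In the Python sketch below (our illustration), the exact binomial probabilities are set against the halved χ² tail estimates; for ν = 1 the tail probability P(χ² ≥ x) equals erfc(√(x/2)), a standard identity.

```python
import math

n = 10                                     # tosses of an unbiased coin

def heads_prob(r):                         # P(exactly r heads)
    return math.comb(n, r) / 2 ** n

def chi2_tail_v1(x):                       # P(chi-square with v = 1 >= x)
    return math.erfc(math.sqrt(x / 2))

# 10 heads: exact 2^-10 = 0.0009766 (the text rounds this to 0.00099)
exact_10 = heads_prob(10)
est_10 = 0.5 * chi2_tail_v1((10 - 5) ** 2 / 5 + (0 - 5) ** 2 / 5)    # half of P(chi2 >= 10)

# 9 or more heads: exact 11/1024 = 0.010742 (text: 0.01089, built from its rounded 0.00099)
exact_9plus = heads_prob(9) + heads_prob(10)
est_9plus = 0.5 * chi2_tail_v1((9 - 5) ** 2 / 5 + (1 - 5) ** 2 / 5)  # half of P(chi2 >= 6.4)
```

The χ²-estimates (0.000785 and 0.005705) fall noticeably short of the exact tail probabilities, just as the text observes.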

The problem is: can we improve matters by finding out why this should be? We recall that when ν = 1, the χ²-distribution reduces to the positive half of a normal distribution. The area of the tail of this distribution to the right of the ordinate corresponding to a given deviation of observed from expected frequency gives, therefore, a normal-distribution approximation to the probability of a deviation, irrespective of sign, attaining or exceeding this given deviation. The symmetrical binomial histogram, however, is composed of frequency cells based on unit class intervals, the central values of the intervals being the various values of r, and the sum of the areas of the cells corresponding to the values r, r + 1, r + 2, etc., gives the exact probability of a deviation ≥ + r. When, therefore, the frequencies in the tail are small and we take, for the continuous curve, the area to the right of r, we should, for the histogram, be taking the area to the right of r − ½ (see Fig. 11.8). Clearly a closer approximation would be obtained if we calculated χ² for values not of r, the deviation of observed from expected frequency, but of r − ½: i.e., if we "correct" the observed frequencies by making them ½ nearer expectation, we shall obtain a "better" value of χ². This is Yates' correction for continuity for small expected frequencies. Its justification is based on the assumption that the theoretical frequency distribution is a symmetrical binomial distribution (p = q = ½).
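The effect of taking the area from r − ½ rather than from r is easily demonstrated. The Python sketch below (an illustration of the argument; n = 10, p = ½ and r = 7 are our own choices) compares the exact binomial tail with the two normal-curve approximations:

```python
import math

n, p, r = 10, 0.5, 7
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

def normal_tail(z):                        # area of the normal curve to the right of z
    return 0.5 * math.erfc(z / math.sqrt(2))

exact = sum(math.comb(n, k) for k in range(r, n + 1)) / 2 ** n   # P(X >= r), here 0.1719
from_r = normal_tail((r - mu) / sigma)               # area taken from r
from_r_half = normal_tail((r - 0.5 - mu) / sigma)    # area taken from r - 1/2
```

Taking the area from r − ½ (about 0.171) reproduces the exact tail (0.172) far better than taking it from r (about 0.103).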

If this is not so, the theoretical distribution is skew, and no simple adjustment has been discovered as yet to offset this. However, if p is near ½, the correction should still be made when the expected frequencies are small, for the resulting value of χ² yields a probability definitely closer to the "exact" value than that we obtain when the correction is not made. This is brought out in the following example.

11.9. Worked Example. In experiments on the immunisation of cattle from tuberculosis, the following results were obtained:

                                    Died of T.B. or very    Unaffected or
                                    seriously affected      slightly affected    Totals
Inoculated with vaccine                      6                     13              19
Not inoculated, or inoculated
  with control media                         8                      3              11
Totals                                      14                     16              30

(Data from Report on the Spahlinger Experiments in Northern Ireland, 1931-1934, H.M.S.O., 1934, quoted in Kendall, Advanced Theory of Statistics, I.)

Show that for this table χ² = 4.75, with P = 0.029; that with a correction for continuity χ² = 3.24, with P = 0.072; and that by the exact method P = 0.071.

Treatment: (1) On the hypothesis of independence (i.e., that the probability of death is independent of inoculation, and so, on the hypothesis that inoculation and susceptibility to tuberculosis are independent, estimated from the marginal totals to be 14/30), the expected frequencies are:

    14 × 19/30 = 8.87        10.13
              5.13            5.87

Each observed frequency deviates from the corresponding expected frequency by ± 2.87. Hence

χ² = (2.87)²(1/8.87 + 1/10.13 + 1/5.13 + 1/5.87) = 8.237 × 0.577 = 4.75,

and for ν = 1 the probability of χ² attaining or exceeding this value is 0.029. This figure, it must be emphasised, is the probability of a proportion of deaths to unaffected cases of 6 : 13 or lower in a sample of 19 inoculated animals, and of a proportion of 8 : 3 or higher in a sample of 11 animals not inoculated, on the assumption that the expected proportion for either sample is 14 : 16.

(2) The observed frequencies with the continuity correction applied are:

    6.5    12.5
    7.5     3.5

Each now deviates from expectation by ± 2.37, and consequently χ² = (2.37)² × 0.577 = 3.24, for which, with ν = 1, P = 0.072.

(3) We must now discuss the method of finding the exact probability of any particular array of cell frequencies in a 2 × 2 table. Consider the table

      a        b       (a + b)
      c        d       (c + d)
   (a + c)  (b + d)    (a + b + c + d) = N

First, we consider the number of ways in which such a table can be set up with the marginal totals given, from a sample of N. From N items we may select a + c items in N!/{(a + c)!(b + d)!} ways, while from N items we may select a + b items in N!/{(a + b)!(c + d)!} ways. The total number of ways of setting up such a table with the marginal totals as above is therefore

(N!)²/{(a + c)!(b + d)!(a + b)!(c + d)!} = n₁, say.

Secondly, we ask in how many ways we can complete the 4 cells in the body of the table with N items. Clearly this is the number of ways in which we can divide the N items into groups of a items of one kind, b items of a second kind, c items of a third kind and d items of a fourth kind. By (2.10) we know this to be

N!/(a! b! c! d!) = n₂, say.

Consequently the probability of any particular arrangement, a, b, c, d, with the given marginal totals, will be given by

P(a, b, c, d) = n₂/n₁ = (a + b)!(c + d)!(a + c)!(b + d)!/(N! a! b! c! d!)   (11.9.1)

How shall we use this result to solve our present problem? We are interested here, we emphasise, in the probability of obtaining a proportion of deaths to unaffected cases of 6 : 13 or lower in a sample of 19 inoculated animals, and of obtaining a proportion of deaths to unaffected cases of 8 : 3 or higher in a sample of 11 animals not inoculated.

In other words, we are interested in the probability of each of the following arrays:

    6 13      5 14      4 15      3 16
    8  3      9  2     10  1     11  0

But it will be seen immediately that the probability of obtaining a 6 : 13 ratio among inoculated animals is also precisely that of obtaining a ratio of 8 : 3 among animals not inoculated. Hence the required probability will be twice the sum of the probabilities of these 4 arrays. The probability of the last array is, by (11.9.1),

P(3, 16, 11, 0) = 19! 11! 14! 16!/(30! 3! 16! 11! 0!) = 19! 14!/(30! 3!).

We may evaluate this by means of a table of log factorials (e.g., Chambers's Shorter Six-Figure Mathematical Tables). We have

    log 19! = 17.085095        log 30! = 32.423660
    log 14! = 10.940408        log  3! =  0.778151
              28.025503                  33.201811

28.025503 − 33.201811 = −5.176308, and the antilog of this is

P(3, 16, 11, 0) = 0.00000666.

Similarly,

P(4, 15, 10, 1) = 44 × 0.00000666 = 0.00029304,
P(5, 14, 9, 2)  = 15 × 0.00029304 = 0.00439560,

and, finally,

P(6, 13, 8, 3) = 7 × 0.00439560 = 0.03076920.

The required probability then is

2 × (0.00000666 + 0.00029304 + 0.00439560 + 0.03076920) = 2 × 0.03546450 = 0.07092900,

i.e., P = 0.071.
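All three figures of the worked example can be checked by machine. The Python sketch below recomputes the uncorrected and corrected values of χ² and the exact probability; the exact probability is written with binomial coefficients, which is algebraically the same as (11.9.1):

```python
from math import comb

a, b, c, d = 6, 13, 8, 3                   # the observed table
N = a + b + c + d
expected = [(a + c) * (a + b) / N, (b + d) * (a + b) / N,
            (a + c) * (c + d) / N, (b + d) * (c + d) / N]

def chi2(obs):
    return sum((o - e) ** 2 / e for o, e in zip(obs, expected))

x2_plain = chi2([a, b, c, d])              # about 4.74 (text: 4.75)
x2_yates = chi2([6.5, 12.5, 7.5, 3.5])     # observed moved 1/2 nearer expectation: about 3.23

# P(a, b, c, d) of (11.9.1) as a hypergeometric term:
# C(19, a) * C(11, 14 - a) / C(30, 14), where a = deaths among the 19 inoculated
def array_prob(deaths):
    return comb(19, deaths) * comb(11, 14 - deaths) / comb(30, 14)

p_exact = 2 * sum(array_prob(k) for k in range(3, 7))   # about 0.0710
```

The successive array probabilities stand in the ratios 44, 15 and 7, exactly as in the log-factorial computation above.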

11.10. χ²-determination of the Confidence Limits of the Variance of a Normal Population. We conclude with an example of the way the χ²-distribution may be used to give exact results when the observed data are not frequencies. Let us draw a small sample of N (< 30) from a normal population. If NS² = Σ (xᵢ − x̄)² and σ² is the population variance, NS²/σ² = Σ (xᵢ − x̄)²/σ² and is thus distributed like χ² with N − 1 degrees of freedom (the one constraint being Σ xᵢ = Nx̄). Our problem is this: given N and S², to find the 95% confidence limits for σ². Since NS²/σ² is distributed like χ², the value of NS²/σ² that will be exceeded with a probability of 0.05 will be the 0.05 point of the χ²-distribution for ν = N − 1. Let χ²(0.05) be this value. Then the lower 95% confidence limit required, on the basis of the sample information, will be NS²/χ²(0.05). Likewise the upper 95% confidence limit will be NS²/χ²(0.95), where χ²(0.95) is the 0.95 point of the χ²-distribution for ν = N − 1.

Worked Example: A sample of 8 from a normal population yields an unbiased estimate of the population variance of 4.4. Find the 95% confidence limits for σ.

Treatment: We have 4.4 = 8S²/(8 − 1), or 8S² = 30.8. The 0.95 and 0.05 points of the χ²-distribution for ν = 7 are 2.17 and 14.07 respectively. Therefore the lower and upper 95% confidence limits for σ² are, respectively, 30.8/14.07 = 2.19 and 30.8/2.17 = 14.19, and the corresponding limits for σ are (2.19)^½ = 1.48 and (14.19)^½ = 3.77.
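Percentage points of χ² can be computed rather than read from tables. The Python sketch below is our own construction (not a method given in the text): the χ² distribution function is obtained from the power series for the regularized incomplete gamma function, and the percentage point is then found by bisection. It reproduces the limits of the worked example.

```python
import math

def gamma_p(s, x, eps=1e-12):
    # Regularized lower incomplete gamma P(s, x), by its power series.
    term = 1.0 / s
    total = term
    k = 1
    while term > eps * total:
        term *= x / (s + k)
        total += term
        k += 1
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))

def chi2_cdf(x, v):
    return gamma_p(v / 2, x / 2)

def chi2_point(p_exceed, v):
    # The value exceeded with probability p_exceed, found by bisection.
    lo, hi = 1e-9, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, v) > p_exceed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

N, est_var = 8, 4.4
NS2 = (N - 1) * est_var                     # 8S^2 = 30.8
upper = chi2_point(0.05, N - 1)             # 0.05 point for v = 7: about 14.07
lower = chi2_point(0.95, N - 1)             # 0.95 point for v = 7: about 2.17
var_limits = (NS2 / upper, NS2 / lower)     # about (2.19, 14.21)
sigma_limits = tuple(v ** 0.5 for v in var_limits)   # about (1.48, 3.77)
```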

EXERCISES ON CHAPTER ELEVEN

1. The Registrars-General give the following estimates of children under five at mid-1947:

              England and Wales    Scotland       Total
Males             1,813,000         228,000     2,041,000
Females           1,723,000         221,000     1,944,000
TOTAL             3,536,000         449,000     3,985,000

On the assumption that there is no difference between the proportion of males to females in the two regions, calculate the probability that a child under five will be a girl. Hence find the expected number of girls under five in Scotland and say whether the proportion is significantly high. (L.U.)

2. The following data give N, the number of days on which rainfall exceeded R cm at a certain station over a period of a year:

R:    0.00   0.04   0.10   0.20   0.50   1.00
N:     296    246    187    119     30      3

Test by means of χ² whether the data are consistent with the law log₁₀ N = 2.47 − 1.98R.

3. The following information was obtained in a sample of 50 small general shops:

                   Shops in
               Urban Districts   Rural Districts   Total
Owned by men         18                17            35
Owned by women       12                 3            15
TOTAL                30                20            50

Can it be said that there are relatively more women owners of small general shops in rural than in urban districts? (L.U.)

4. Apply the χ² test of goodness of fit to the two theoretical distributions obtained in 4. Is the "fit" too good? (R.S.S.)

5. A certain hypothesis is tested by three similar experiments. These gave χ² = 11.9 for ν = 6, χ² = 14.2 for ν = 8 and χ² = 18.3 for ν = 11. Show that the three experiments together provide more justification for rejecting the hypothesis than any one experiment alone.

Solutions
1. 0.488; 219,000; yes.
2. Yes.
3. No.
4. χ² < 0.02 for ν = 5; far too good.

APPENDIX: CONTINUOUS BIVARIATE DISTRIBUTIONS

Suppose that we have a sample of N value-pairs (xᵢ, yᵢ) from some continuous bivariate parent population. Consider the rectangle ABCD of area ΔxΔy about the point (x, y): across the scatter diagram draw the lines x = x − ½Δx, x = x + ½Δx, y = y − ½Δy and y = y + ½Δy (Fig. A.1). Within this rectangle will fall all those points representing value-pairs (xᵢ, yⱼ) for which x − ½Δx < xᵢ < x + ½Δx and y − ½Δy < yⱼ < y + ½Δy. Let the number of these points be ΔN. The proportion of points inside the rectangle to the total number N in the diagram is then ΔN/N = Δp, the relative frequency of the value-pairs falling within the rectangle ABCD; Δp will clearly be less than 1.

The average relative frequency per unit area within this rectangle is Δp/ΔA, where ΔA = ΔxΔy. If we now increase N, the sample size, indefinitely and, simultaneously, reduce Δx and Δy, we may write

Limit (Δx → 0, Δy → 0) Δp/ΔA = dp/dA.

Since the relative frequency of an event E converges stochastically, as the number of occurrences of its context-event tends to infinity, to the probability of E's occurrence in a single occurrence of its context-event, we may say: dp is the probability that the variate x will assume a value between x ± ½dx while, simultaneously, the variate y assumes a value between y ± ½dy. In the parent population the values of the variates are distributed according to some law, which may be expressed by saying that the relative-frequency density at (x, y), dp/dA, is a certain function of x and y, φ(x, y), of the continuous parent population. Thus dp/dA = φ(x, y), or, in differentials,

dp = φ(x, y)dxdy   (A.1)

φ(x, y) is the joint probability function of x, y, or the joint probability density of x, y. Now let the range of possible values of x be a ≤ x ≤ b and that of y, c ≤ y ≤ d. Then, since both x and y must each assume some value,

∫ (x = a to b) ∫ (y = c to d) φ(x, y)dxdy = 1,

or, if we define φ(x, y) to be zero for values outside a ≤ x ≤ b, c ≤ y ≤ d,

∫∫ (−∞ to +∞) φ(x, y)dxdy = 1   (A.2)

It follows that the probability that X₁ ≤ x ≤ X₂ and Y₁ ≤ y ≤ Y₂ is

P(X₁ ≤ x ≤ X₂, Y₁ ≤ y ≤ Y₂) = ∫ (X₁ to X₂) ∫ (Y₁ to Y₂) φ(x, y)dxdy   (A.3)

If x and y are statistically independent, i.e., if the probability, dp₁ = φ₁(x)dx, of x taking a value between x ± ½dx is independent of the value taken by y, and if the probability, dp₂ = φ₂(y)dy, of y taking a value between y ± ½dy is independent of the value taken by x, then by the law of multiplication of probabilities we have

φ(x, y) = φ₁(x) · φ₂(y)   (A.4)

and all double integrals resolve into the product of two single integrals. For instance, (A.3) becomes

P(X₁ ≤ x ≤ X₂, Y₁ ≤ y ≤ Y₂) = ∫ (X₁ to X₂) φ₁(x)dx · ∫ (Y₁ to Y₂) φ₂(y)dy.

Clearly variates for which (A.4) holds are uncorrelated.

At every point P, (x, y), of the xOy plane for which φ(x, y) is defined, erect a perpendicular, PQ, of length φ(x, y). Then, as x and y vary over the xOy plane, Q generates a surface z = φ(x, y), the probability- or correlation-surface. In Fig. A.2 let the rectangle ABCD in the xOy plane be formed by the lines x = x ± ½dx, y = y ± ½dy. Then dp, the probability that x lies within x ± ½dx and y within y ± ½dy, is represented by the volume of the right prism on ABCD as base below the correlation-surface.

Moments. Let the ΔN values of (x, y) lying within the rectangle ΔxΔy of the scatter diagram of our sample of N value-pairs from a continuous bivariate population be considered as "grouped" in this class-rectangle. Their product moment of order r, s about x = 0, y = 0, mrs', is given by

mrs' = (1/N) Σ ΔN · xʳyˢ = Σ xʳyˢ Δp.

For the corresponding class rectangle of the continuous parent population we have dp · xʳyˢ = φ(x, y)xʳyˢ dA; accordingly, for the entire parent distribution,

μrs' = ∫∫ (−∞ to +∞) φ(x, y)xʳyˢ dxdy   (A.5)

In particular:

μ10' = ∫∫ xφ(x, y)dxdy,   μ01' = ∫∫ yφ(x, y)dxdy,
μ20' = ∫∫ x²φ(x, y)dxdy,  μ02' = ∫∫ y²φ(x, y)dxdy,  μ11' = ∫∫ xyφ(x, y)dxdy   (A.6)

The corresponding moments about the mean (x̄, ȳ) of the distribution are:

σx² = μ20 = ∫∫ (x − x̄)²φ(x, y)dxdy = ∫∫ (x² − 2xx̄ + x̄²)φ(x, y)dxdy = μ20' − x̄²   (A.7)

and likewise

σy² = μ02 = ∫∫ (y − ȳ)²φ(x, y)dxdy = μ02' − ȳ²   (A.8)

Finally,

cov (x, y) = μ11 = ∫∫ (x − x̄)(y − ȳ)φ(x, y)dxdy
           = ∫∫ (xy − x̄y − xȳ + x̄ȳ)φ(x, y)dxdy = μ11' − x̄ȳ   (A.9)

since ∫∫ φ(x, y)dxdy = 1.

The moment-generating function, M(t₁, t₂), of a bivariate continuous distribution is defined to be

M(t₁, t₂) = E(e^(xt₁ + yt₂)) = ∫∫ (−∞ to +∞) exp (xt₁ + yt₂)φ(x, y)dxdy   (A.10)

As the reader should verify, the moment of order r, s about x = 0, y = 0 is given by the coefficient of t₁ʳt₂ˢ/r!s! in the expansion of M(t₁, t₂).

Regression and Correlation. The probability that x lies between x ± ½dx, whatever the value assumed by y, is

dx ∫ (−∞ to +∞) φ(x, y)dy = φ₁(x)dx, say   (A.11)

and φ₁(x) = ∫ φ(x, y)dy, a function of x alone, is the probability function of x for any or all values of y. Since the probability that x lies between x ± ½dx when y lies between y ± ½dy is dp = φ(x, y)dxdy, the relative frequency of the y's in the x-array is

φ(x, y)dy / φ₁(x).

The mean of the y's in the x-array, ȳ(x), is therefore given by

ȳ(x) = ∫ (−∞ to +∞) yφ(x, y)dy / φ₁(x).

Likewise, the relative frequency of the x's in the y-array is φ(x, y)dx/φ₂(y), where φ₂(y) = ∫ (−∞ to +∞) φ(x, y)dx.

Exercise: Show that the variance of the y's in the x-array, (σy(x))², is

(σy(x))² = ∫ (−∞ to +∞) (y − ȳ(x))²φ(x, y)dy / φ₁(x).

The equation of the curve of means of the x-arrays, i.e., the curve of regression of y on x, is

ȳ(x) = ∫ yφ(x, y)dy / φ₁(x)   (A.12)

while that of the regression curve of x on y is

x̄(y) = ∫ xφ(x, y)dx / φ₂(y)   (A.13)

If the regression of y on x is linear, we must have

∫ yφ(x, y)dy / φ₁(x) = Ax + B   (A.14)

Multiply this equation by φ₁(x) and integrate over the whole x-range, obtaining

∫∫ yφ(x, y)dxdy = A ∫ xφ₁(x)dx + B ∫ φ₁(x)dx,

i.e., μ01' = Aμ10' + B   (A.15)

Now multiply (A.14) by xφ₁(x) and integrate, obtaining

∫∫ xyφ(x, y)dxdy = A ∫ x²φ₁(x)dx + B ∫ xφ₁(x)dx,

i.e., μ11' = Aμ20' + Bμ10'   (A.16)

Solving (A.15) and (A.16) for A and B, we have

A = (μ11' − μ10'μ01')/(μ20' − μ10'²) = μ11/μ20 = σxy/σx²,
B = (μ01'μ20' − μ10'μ11')/(μ20' − μ10'²) = ȳ − x̄σxy/σx².

Consequently, (A.14) becomes

ȳ(x) − ȳ = (σxy/σx²)(x − x̄)   (A.17)
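The relations (A.15)-(A.17) have an exact discrete analogue: for any finite set of value-pairs, the least-squares line y = Ax + B has A = μ11/μ20 and B = ȳ − Ax̄. A small Python check (the data are invented for illustration):

```python
# Least-squares line from the moments, on a tiny invented sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
mu20 = sum((x - xbar) ** 2 for x in xs) / n                      # variance of x
mu11 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n  # covariance
A = mu11 / mu20
B = ybar - A * xbar
# The residuals of the fitted line sum to zero, as (A.15) requires.
residual_sum = sum(y - (A * x + B) for x, y in zip(xs, ys))
```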

The correlation coefficient between x and y is

ρ = μ11/σxσy = σxy/σxσy   (A.18)

so that, since σxy/σx² = (σxy/σxσy)(σy/σx) = ρσy/σx, the regression equation (A.17) may be written ȳ(x) − ȳ = ρ(σy/σx)(x − x̄). It should be noted, however, that if we define ρ by (A.18), this does not necessitate that the regression be linear.

Finally, consider the standard error of estimate of y from (A.17), viz.:

Sy² = ∫∫ [y − ȳ − ρ(σy/σx)(x − x̄)]²φ(x, y)dxdy
    = ∫∫ [(y − ȳ)² − 2ρ(σy/σx)(y − ȳ)(x − x̄) + ρ²(σy²/σx²)(x − x̄)²]φ(x, y)dxdy
    = σy² − 2ρ(σy/σx)ρσxσy + ρ²σy² = σy²(1 − ρ²)   (A.19)

The Bivariate Normal Distribution. Consider the bivariate distribution whose probability density is

φ(x, y) = C exp (−Z),

where

Z = {x²/σx² − 2ρxy/σxσy + y²/σy²}/2(1 − ρ²)   and   C = 1/{2πσxσy(1 − ρ²)^½}   (A.20)

The distribution so defined is called the Bivariate Normal distribution. For the moment we shall regard σx, σy and ρ as undefined constants. We begin by finding what are usually called the marginal distributions: the probability function of x for any or all values of y, φ₁(x) = ∫ (−∞ to +∞) φ(x, y)dy, and the probability function of y for any or all x, φ₂(y) = ∫ (−∞ to +∞) φ(x, y)dx.

We have

φ₁(x) = C ∫ exp {−[(y/σy − ρx/σx)² + (1 − ρ²)x²/σx²]/2(1 − ρ²)}dy
      = C exp (−x²/2σx²) ∫ exp {−(y/σy − ρx/σx)²/2(1 − ρ²)}dy.

Put Y = y/σy − ρx/σx; then dy = σy dY and, as y → ± ∞, Y → ± ∞. But (see footnote to 5.4(e), page 82)

∫ (−∞ to +∞) exp [−Y²/2(1 − ρ²)]dY = [2π(1 − ρ²)]^½.

Hence

φ₁(x) = Cσy[2π(1 − ρ²)]^½ exp (−x²/2σx²) = (1/σx√(2π)) exp (−x²/2σx²),

and, likewise,

φ₂(y) = (1/σy√(2π)) exp (−y²/2σy²).

Thus, marginally, both x and y are normally distributed with zero mean and variances σx² and σy² respectively. Moreover, if we put ρ = 0 in (A.20),

φ(x, y) = (1/2πσxσy) exp (−x²/2σx²) exp (−y²/2σy²) = φ₁(x) · φ₂(y)   (A.21)

and thus we obtain a clue to the significance of ρ: when ρ = 0 the variates are uncorrelated. May not ρ then be the correlation coefficient of x and y? That is indeed so; and that var (x) and var (y) are respectively σx² and σy² may be seen by considering the moment-generating function of the distribution. We have

M(t₁, t₂) = C ∫∫ exp [xt₁ + yt₂ − {x²/σx² − 2ρxy/σxσy + y²/σy²}/2(1 − ρ²)]dxdy   (A.22)

The reader will verify that, if we make the substitutions

X = x − σx(σxt₁ + ρσyt₂),   Y = y − σy(σyt₂ + ρσxt₁),

(A.22) becomes

M(t₁, t₂) = exp [½(σx²t₁² + 2ρσxσyt₁t₂ + σy²t₂²)] × C ∫∫ exp [−{X²/σx² − 2ρXY/σxσy + Y²/σy²}/2(1 − ρ²)]dXdY.

But the second factor in the right-hand expression is equal to unity; therefore

M(t₁, t₂) = exp [½(σx²t₁² + 2ρσxσyt₁t₂ + σy²t₂²)]   (A.23)

The exponential contains only squares of t₁ and t₂ and the product t₁t₂; the coefficients of t₁ and t₂ in its expansion are therefore zero, i.e., x̄ = 0 = ȳ, and M(t₁, t₂) is also the mean-moment generating function. Consequently

μ20 = σx²,   μ02 = σy²,   μ11 = ρσxσy,   or   ρ = μ11/σxσy.

We see, then, that although up to now we have not considered σx, σy and ρ to be other than undefined constants, they are in fact the standard deviations of x and y and the correlation coefficient of x and y respectively.

Again, if x is held constant, the distribution of the y's in that x-array is given by

φ(x, y)/φ₁(x) = [1/σy√(2π(1 − ρ²))] exp [−(y − ρσyx/σx)²/2σy²(1 − ρ²)].

Thus, in any given x-array, the y's are distributed normally with mean ȳ(x) = ρσyx/σx and variance Sy² = σy²(1 − ρ²), as given by (A.19).

Some other properties: (i) μrs = 0 when r + s is odd; (ii) μ40 = 3σx⁴, μ31 = 3ρσx³σy, μ22 = (1 + 2ρ²)σx²σy².
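These moment values can be confirmed by direct numerical integration of (A.20). The Python sketch below (the parameter values are our own) evaluates ∫∫ φ dxdy and μ11 = ∫∫ xyφ dxdy over a wide grid by the trapezoidal rule:

```python
import math

sx, sy, rho = 1.0, 2.0, 0.6        # illustrative parameter values

def phi(x, y):                     # the bivariate normal density (A.20)
    z = (x * x / sx**2 - 2 * rho * x * y / (sx * sy) + y * y / sy**2) / (2 * (1 - rho**2))
    return math.exp(-z) / (2 * math.pi * sx * sy * math.sqrt(1 - rho**2))

L, n = 8.0, 200                    # integrate over +/- 8 standard deviations
hx, hy = 2 * L * sx / n, 2 * L * sy / n
total = m11 = 0.0
for i in range(n + 1):
    x = -L * sx + i * hx
    wx = 0.5 if i in (0, n) else 1.0          # trapezoidal end-weights
    for j in range(n + 1):
        y = -L * sy + j * hy
        w = wx * (0.5 if j in (0, n) else 1.0)
        f = w * phi(x, y)
        total += f
        m11 += f * x * y
total *= hx * hy                   # approximates 1
m11 *= hx * hy                     # approximates rho * sx * sy = 1.2
```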

Thus the regression of y on x is linear, the regression equation being

ȳ(x) = ρ(σy/σx)x   (A.24)

and the variance in each array, being σy²(1 − ρ²), is the same for all arrays. Consequently, regression for the bivariate normal distribution is homoscedastic (see 6.14).

SOME MATHEMATICAL SYMBOLS AND THEIR MEANINGS

exp x = eˣ, the exponential function, where e = 2.71828 . . . is the base of natural logarithms.
ln x = logₑ x, the natural logarithm of x.
n! = factorial n = n(n − 1)(n − 2) . . . 3 . 2 . 1.
(n s) = n!/{s!(n − s)!}, the number of ways of choosing s items from n.
Σ (i = 1 to n) xᵢ = x₁ + x₂ + . . . + xₙ.
Σ (i = 1 to m) Σ (j = 1 to n) xᵢⱼ = x₁₁ + x₁₂ + . . . + x₁ₙ + x₂₁ + x₂₂ + . . . + x₂ₙ + . . . + xₘ₁ + . . . + xₘₙ.
Π (i = 1 to n) xᵢ = x₁ . x₂ . x₃ . . . xₙ₋₁ . xₙ, the product of x₁ to xₙ.
x → a: x tends to a.
Lt (x → a) f(x) or Limit (x → a) f(x) = the limit of f(x) as x tends to a.
Δx = a small increment of x.
∞ = infinity.
> : greater than; ≥ : greater than or equal to; < : less than; ≤ : less than or equal to; ≈ : approximately equal to.

Population parameters are, in general, denoted by Greek letters; estimates of these parameters from a sample are denoted by the corresponding Roman letter.

SUGGESTIONS FOR FURTHER READING

B. C. Brookes and W. F. L. Dick, Introduction to Statistical Method (Heinemann).
K. A. Brownlee, Industrial Experimentation (H.M.S.O.).
F. N. David, A Statistical Primer (Griffin).
B. V. Gnedenko and A. Ya. Khinchin, An Elementary Introduction to the Theory of Probability (Freeman).
P. G. Hoel, Introduction to Mathematical Statistics (Wiley).
C. G. Lambe, Statistical Methods and Formulae (English Universities Press).
A. M. Mood, Introduction to the Theory of Statistics (McGraw-Hill); also Answers to Problems in Introduction to the Theory of Statistics (McGraw-Hill).
P. G. Moore, Principles of Statistical Techniques (Cambridge University Press).
M. J. Moroney, Facts from Figures (Penguin Books).
F. Mosteller, R. E. K. Rourke and G. B. Thomas Jr., Probability and Statistics (Addison-Wesley).
C. G. Paradine and B. H. P. Rivett, Statistical Methods for Technologists (English Universities Press).
M. H. Quenouille, Introductory Statistics (Pergamon).
L. H. C. Tippett, Statistics (Oxford University Press).
S. S. Wilks, Elementary Statistical Analysis (Princeton University Press).
G. U. Yule and M. G. Kendall, Introduction to the Theory of Statistics (Griffin).

More Advanced

A. C. Aitken, Statistical Mathematics (Oliver and Boyd).
F. N. David, Probability Theory for Statistical Methods (Cambridge University Press).
W. Feller, An Introduction to Probability Theory and its Applications (Wiley).
R. A. Fisher, Statistical Methods for Research Workers (Oliver and Boyd).
I. J. Good, Probability and the Weighing of Evidence (Griffin).
M. G. Kendall and A. Stuart, The Advanced Theory of Statistics (Vol. I of new three-volume edition) (Griffin).
M. G. Kendall, Exercises in Theoretical Statistics (Griffin).
M. H. Quenouille, The Design and Analysis of Experiment (Griffin).

Also C. E. Weatherburn, A First Course in Mathematical Statistics (Cambridge University Press), and M. G. Kendall and W. R. Buckland, A Dictionary of Statistical Terms (Oliver and Boyd).


THE ENGLISH LANGUAGE BOOK SOCIETY, in addition to its low-priced editions of specialist works, publishes a variety of series for the non-specialist, each volume of which has been selected for its high standard of accuracy and usefulness. Your bookseller may be able to show you a complete list of ELBS titles. In cases of difficulty consult your local branch of the British Council or write to: The English Language Book Society, P.O. Box 4, London, W.1.

TEACH YOURSELF STATISTICS is for sale at a U.K. price of £0.20 (4s. 0d.). This is the metricated edition of a book which will help readers who have to teach themselves something about statistics to understand some of the fundamental ideas and mathematics involved. After the first introductory chapter there is a set of exercises at each chapter-end for the reader to work through.

PRACTICAL BOOKS
ISBN 0 340 05906 0

Sign up to vote on this title

UsefulNot useful- An Introduction to Statistics
- Teach Yourself Algebra
- Probability and Statistics
- Teach Yourself Geometry
- Basic Skills in Statistics
- Statistics
- statistics_9781429277761
- Statistics for Business
- Statistics for Management-For VTU
- Applied Probability
- Statistics
- Teach Yourself Trigonometry
- 29518664-Introducing-Statistics
- Statistics
- Essential Mathematics and Statistics for Science~Tqw~_darksiderg
- Elementary_Probability.pdf
- Data Mining 2003
- 0521850126.Cambridge.University.Press.The.Mathematics.of.Behavior.Oct.2006
- Teach Yourself Accelerated Learning
- Statistics Without Mathematics
- Introductory Statistics
- Mathematics - Teach Yourself Trigonometry
- Algebra Know-It-ALL Beginner to Advanced, And Everything in Between
- Mechanics
- Schaum's Outline of Elements of Statistics II~Inferential Statistics by Ruth Bernstein, Stephen Bernstein
- Applied Statistics For Business.pdf
- Clustering for Data Mining
- Statistics
- Excel Statistics - Neil J. Salkind
- Teach Yourself Statistics

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue reading from where you left off, or restart the preview.