Professional Documents
Culture Documents
1900 On The Criterion That A Given System of Deviations From The Probable in The
1900 On The Criterion That A Given System of Deviations From The Probable in The
X. On the criterion that a given system of deviations from the probable in the
case of a correlated system of variables is such that it can be reasonably
supposed to have arisen from random sampling
Karl Pearson a
a
University College, London
To cite this Article Pearson, Karl(1900)'X. On the criterion that a given system of deviations from the probable in the case of a
correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling',Philosophical
Magazine Series 5,50:302,157 — 175
To link to this Article: DOI: 10.1080/14786440009463897
URL: http://dx.doi.org/10.1080/14786440009463897
This article may be used for research, teaching and private study purposes. Any substantial or
systematic reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or
distribution in any form to anyone is expressly forbidden.
The publisher does not give any warranty express or implied or make any representation that the contents
will be complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses
should be independently verified with primary sources. The publisher shall not be liable for any loss,
actions, claims, proceedings, demand or costs or damages whatsoever or howsoever caused arising directly
or indirectly in connection with or arising out of the use of this material.
[ 157 ]
Z - Z0e , . (i.)
where R is the determinant
I ~i~ rib . . rln
~1 r32 i . . r3.
9 o 9 9 ~ 9 . .
and Rpp, Rpq the minors obtained by striking out the pth row
and pth column, and the pth row and qth cotumn. $1 is the
sum for every value ofT, and S~ for every pair of values o f p
and q.
Now let
x ~ = S I ( R ~ , x ~p )' + 2 S ~' ( RPqXpX---A~
~ o'p~r~] . . . . (ii.)
Yo e-~X:U"-'dx=(n--2)(n--4)(n--6) . . . (n--2r)
e.~x d X + e-~X
Downloaded By: [McGill University Library] At: 22:36 8 November 2009
i- + 1--_g + t. .5 + . . . + f
P=
~ "| (iv.)
But
Thus
yo
~e-~x~dx _ /~ 9
P = e-~x~dx
t ~- ,- ~/X_ X3 9r Z'- ~
+~J~. e-,~ I 1 + ~ + 1----_3.5 + . . . + 1.3.5._: n-2)" (v.)
As soon as X is known this can be,at once evaluated.
Case (ii.) n even. Take r = ~n-- 1. Hence
-v ~rdo
Again~ the series in (vi.) for n very large becomes #•
and thus again P = I . These results show that if we have
only an indefinite number of groups, each of indefinitely
small range, it is practically certain tha~ a system of errors
as large or larger than that defined by any value of X will
appear.
Thus, if we take a very great number of groups our tesL
becomes illusory. W e must confine our attention in calcu-
lating P to a finite number of groups, and this is undoubtedly
what happens in actual statistics, n will rarely exceed 30,
often not be greater than 12.
(3) Now let us apply the, above results to the problem of
the fit of an observed to a theoretical frequency distribution.
Let there be an ( n + 1)-fold grouping, and let the observed
frequencies of the groups be
?~ll~ sly, mla.., mtn~ mfn+H
and the theoretical frequencies supposed known a priori be
ml, m2, ma 9 9 9 m~, ~tn+l
then S(m)-- S(m') = Iq = total frequency.
]~urther, if e = ,d--m give the error, we have
e l + e ~ + e 3 + 9 9 9 -t- e~+l----0.
Hence only n of the n + 1 errors are variables; the n + tth is
9 Write the series as F, then we easily find dF/dx=I+KF, whence by
integration the above result follows. Geometrically, P = I means that if
n be indefinitely large, the nth moment of the tail of the normal curve
is equal to the nth moment of the whole curve, however much or
however little we cut off as "tail."
Probable in a Correlated System of Variables. 161
determined when the first n are known, and in using formula
(it.) w e treat only of n variables. N o w the standard de-
viation for the random variation of ep is
~. = J N ( l _ m , ,m,
-ff]N' . . (,,ii.)
and if r~q be the correlation of random error ep and %
1~IP~q 9
O'pO'q?'pq N . . . . (viii.)
Downloaded By: [McGill University Library] At: 22:36 8 November 2009
N o w let us write mq
~- =sin~flq, where flq is an auxiliary
angle easily found. Then we have
crq= v'N- sinBq cos fiq, . . . . (ix.)
r~q = --tan/3~ tan Bp . . . . . (x.)
W e have from the value of R in w 1
~
1
9 9
tan fl~ tan/32
. . . - - tan fl,~ tan fl~
, ~ 9 9 , ~
/
9 9 , 9 ~ , , ~ , 9 9 , 9 ~ ~ 9 9 9
9 9 . . . . . . . , 9 9 ~ , . , 9 ~
j @ ~ O @ ~ I S g O 4 $ Q
9 ~ 9 , ~ ~ 9 9 ~ ~ ~ ,
1 1 1 - - cot~fl
J= --~h I 1 ...... 1
1 --~ 1 ...... 1
1 1 -- ~:~ . . . . . . 1
Downloaded By: [McGill University Library] At: 22:36 8 November 2009
9 o ~ ~ * ~ ~ ~ ~
1 1 1 ..... --~
Clearly,
J12 --- -- 1 1 1 ...... 1
1 --~ 1 ...... 1
1 1 --~4 . . . . . . 1
e e , t o * . .
9 ~ ~ . ~ 9 ~ ~
Now
//~n-b I
=S N 'by(xi')'= N =l---
N .
Thus :
J =
Probable in a Correlated System of Variables. 163
Similarly :
X [m~
Thus :
R Op 0 ~ Tl'lp "Inn+ 1
Again :
\ rap/ mn+l
But
81(~') = -- e.+l,
hence :
= e-,~x d X
P jr x
x ~ = S ~~"(,,r -, ~ } = s t f - / - -,,+t,
. , s - g ) ~}
_ -- -s } \G- ~-
Downloaded By: [McGill University Library] At: 22:36 8 November 2009
#Vl~ t #~s
No. of Dice in
Observed Theoretical
Cast with 5 or 6 Deviation, e.
Frequency, m'. Frequency, m .
X'oints.
26306 26306
Gr e 2, Group. e 2, e2/7/t.
higher points.
Illustration I I .
If we take the total nmnber of fives and sixes thrown
in the 26,306 casts of 12 dice, we find them to be 106,602
instead of the theoretical 105,224. Thus 106,602 _=.3377
nearly, instead of ~. 12 x 26,306
Professor Weldon has suggested to me that we ought
to take 26,306('3377+'6623) ~: instead of the binomial
26,306(89 1~ to represent the theoretical distribution, the
difference between "3377 and ~ representing the bias of the
dice. I f this be done we find :
' ! ! '
Runs ....../] 1 2 / 3 4 5 6 7 8 9 10 1112[ o~12
No. of Petals... 5 fl 7 8 9 10 ] 11
n e a s u r e areas : ~
Observed
Groups. m'. Curve m t. e2/:~, o
S u b s t i t u t i n g a n d w o r k i n g out we find
P='101='I, say.
Or, in one out of every ten trials we should expect to differ
from the frequencies given by the c u r v e b y a set of devia-
tions as i m p r o b a b l e or more improbable. Considering that we
should get a better fit of o u r observed and calculated fre-
quencies b y (i.) r e d u c i n g the moments, and (it.) a c t u a l l y
+ Due to taking ordinates ibr areas and fewer figures than were really
requirad in the calculations.
Probable in a Corl'elated @stem of V(o'iable~. 171
calculating the areas of the curve instead of using its ordinate~,
I think we may consider it not very improbable that the
observed frequencies are compatible with a random sampling
from a population described by the skew-curve of Type I.
Illustration VI.
In the current text-books of the theory o f errors it is cus-
tomary to give various series of actual errors of observation,
to compare them with theory by means of a table of distri-
bution based on the normal curv% or graphically by means of
a plotted frequency diagram, and on the basis of these com-
parisons to assert that an experimental foundation has been
Downloaded By: [McGill University Library] At: 22:36 8 November 2009
Observed ~Normal e~
Belt. Frequency. Distribution. e. m'
1 ........... 1 1 0 0
2 ............ 4 6 --2 "667
3 ............ 10 27 -17 10'704:
4 ............ 89 67 +22 7"224
5 ............ 190 162 + 28 4"839
6 ............ 212 242 -30 3-719
204 240 -36 5"400
Downloaded By: [McGill University Library] At: 22:36 8 November 2009
7 ............
8 ............ 193 157 +36 8'255
9 ............ 79 70 +9 1"157
10 ............ 16 26 --10 3"846
11 ............ 2 2 0 0
m
o
.m
Downloaded By: [McGill University Library] At: 22:36 8 November 2009
~ d ~ d ~ 4 g ~ d 4 ~ d d d d
ml
E'-+
9 ~ ~ ~ ~
~ ~4~4~dd~rdddgd
.~ ,,4
0 ~ t ~ ~ ~
~o
2 9
0
..+ ~,
9
0J
. +B