in
2008
Microsoft Corporation
http://www.archive.org/details/frequencycurvescOOelderich
FREQUENCY - CURVES
AND
CORRELATION.
\
BY
W. PALIN ELDERTON.
Published for the Institute of Actuaries
BY
CHARLES
FARRINGDON STREET,
Wb
&
TABLE OF CONTENTS.
Preface by the President of the Institute of Actuaries
Introduction by the Author
...
...
.
v
vii
...
Key
to Actuarial
...
xi
PART
CHAPTER
I.
I.
Page.
II.
Frequency
Method
Introductory
Distributions
III.
of
Moments
...
13
IV.
Frequency-Curves
Calculation
36
V.
48
PART
VI.
II.
Correlation
106
of
VII.
The
Correlation
Characters
not
125
131
quantitatively measurable
VIII.
Probable Errors
IX.
X.
139
145
APPENDICES.
I.
Useful Constants
151
...
II.
and T Functions
Integration
152
III.
The
of
connected
of Error...
IV.
with
Alternative Systems of
Books, References, &c.
Frequency-Curves
160
163
166
V.
...
INDEX
PREFACE.
The main
as
of the present
object
practical
those
modern
statistical
methods
name
of Professor
is
Karl Pearson.
follows
:
The history
January,
Institute
of
the
work
Palin
briefly
as
In
the
1903,
of
Mr.
W.
the
Elderton
read
before
Actuaries
of
an interesting
paper
dealing with
the application
was then
felt
the
discussion
fact
so
of
that
paper
suffered
considerably
from
the
that
Professor
Pearson's
methods, which
had attracted
much
of the
it
to the
profession
to the Journal of
and
actuarial
in the
data.
To
this
invitation
On
was,
however,
if
felt
that
the
work would
instead
published
separately
of in the
form of a paper
to
in the Journal,
undertake
the
necessary
hoped that
in its present
form the
volume
may
to
upon
the
results
enquiries
that
developments
and
of
improvements
great interest
to
in
to
statistical
methods
must always be
profession
is
Actuaries,
and
the
much indebted
great
what
will
it
extent,
the
to
expounded
prove
would
any opinion on
state
this point.
to
as
indications
of
any
view as
methods
to actuarial
problems.
The
fact
is
that
by the Institute
slightest
degree,
lessen
the
indebtedness of
all
actuarial
F. B.
W.
By
is
made
of
more
practical
tell
methods
modern
work.
It is difficult to
how
far
to
such methods
may prove
useful
if
in
direct application
it
have
some
knowledge
of
contemporary
study
of
subject
side.
It has
been necessary
selection
to
it
and
in
making a
from
its
practical
than
its
theoretical aspect,
is
and
any
rate, it
best to
proofs,
namely,
and
of
fit,
while the
has
been
omitted
of
because
as
it
is
not
necessary to an appreciation
moments
a
it
practical
method
(as is
till
obvious
when we remember
had been
to
that
was not
years),
published
the method
it
in use for
some
and
also because
seems
adjustments
have
is
been
found
possible
cases
its
practical value
somewhat discounted.
with,
contingency
is
dealt
but
the
mean contingency
it
are more
is less,
the results
Some
readers
may be
method
of least
mind that
is
the range of
its
applicability
to
is
a
it
growing tendency
put
it
more
this
it
likely to
is
whom
and
book
intended.
The median
is
not considered
;
because
if it
is its definition,
number
of
cases before
it is
equal to the
number
after
it.
The
here again
know
it
meaning.
In
comparing
the
way two
things
vary
necessary to
the
remember that
"but
only
mean
and
to the
mean
Vmean /
).
Part
II. will
difficulty
than
Part
I.
because
present received
it is
little
and
on actuarial work
is
is
dealt with
more
in outline
than Part
Here, as in
if
all
goes
through a book on a
is
as
and miss
real difficulties, as of
he
is
to fail to obtain
the
subject.
to
possess,
may
therefore
man
a proof
the
constants
not
followed
It
the
must
is
that
it is
belief
without
proof
man
which he can
test practically,
and
correlation
to
form
subject
in
which there
progress
is
still
much
been
that
has
made
work than
field the
mathematics and
its
applications.
In this
we
are indebted to
Karl Pearson
work
that
has proved a success in practice, and anyone writing on the subject for practical
footsteps.
men
is
bound
to
follow in
his
so
that
they
will
fully
merely tried
to
bring
members
examined by
his
methods
in
the same
way
and anthropology.
May
own
As
will
is
chiefly
is
of the
;
it
is
such
to
He
Gr.
much kind
help
from
Messrs.
Lidstone
and John
Spencer, both of
whom
with the arrangement of the matter, and the former has also
helped him in
S.
many ways
Ad lard and
arise
and ways
of
removing
them.
W.
London, August, 1906.
P. E.
AND SYMBOLS
USED.
The
For a
fuller
and
II.),
and
Methods adopted
:
in constructing
C.
&
E. Layton.
1903.
When
among
the
an investigation
is
made
number
of
observation
at
each
age owing
to
by the
or terminating from
some other
The exposed
to
die
When
large
still
an experience ends
of persons
in
will be a
number
of the observation.
When
are found
.
by
multiplying the exposed to risk by the graduated values of $q_x$; the result is then compared with the actual deaths.
qnm(5)
is
The
t jie
name given
from
first five
years of assurance.
is
and
31
(healthy males)
in 1863.
is
the
name given
to the
older experience
which ended
qx
is
dying in a year.
year.
px
So
if
is
we imagine
leave
to
X h+i, and
so on.
The value
a
sum
of 1 payable
is
if
person
'4-
aged x be alive at
,
the
end
of
years,
therefore
VH l
^-
+ vHx+2 +
it is
Now
for
well
vx
,
to multiply
as
this expression
by
and we have
vHx
where
Similarly
Dx
lx
T> x
D x =v x
and
*=^+S**+i+
jKT,
Tables of D,
Ua\
is
certain, independent of
tin
is
any
life,
so its value is v
+ v2 +
,
-\-v n .
annum payable
is
oo
dm.
o
Oology
hypothesis
log /.v log
is lx
is
px
'
and Makeham's
hut
colog_p.r
assumes
/a?+i;
that
its
value
is
A + .Bc a
therefore an alternative
way
=k&g*.
valuing the policies in
When
an assurance
office
the actuary
When
when
death
maturity age or
previous
(Endowment Assurances) they can be grouped either according to year of birth or according to the number of years to run (unexpired term).
more accurate
Progression.
result
is
The
model
office is
an imaginary specimen
valuations.
office
which
is
used for
making approximate
PART
CHAPTER
I.
I.
Introductory.
The ordinary treatment of probability begins with the assumption that the chance that a certain event will occur is known, and proceeds to solve the problems that arise from
1.
experiment
a limited
it
is
more
likely to
number
is
number
increased.
Experiments
theoretical
can
easily
be
made
method leads
;
to results
practice
when
beforehand
as the ordinary theory leads us to expect. Sequences of "heads" or "tails" form a series approximating to the Geometrical Progression with a common ratio of ½, and the drawing of cards from a pack gives a result
closely
5 (J 4- i)
agreeing with
the
numbers that
theoretical
work
suggests.
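The remark about sequences of heads or tails can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the number of tosses, the seed and the helper name `run_lengths` are arbitrary choices made here, not anything taken from the text. It counts the runs of each length in a long series of simulated tosses and compares successive counts, which should stand roughly in the ratio ½.

```python
import random
from collections import Counter

def run_lengths(tosses):
    """Return the lengths of the successive runs of identical outcomes."""
    lengths = []
    current = 1
    for prev, nxt in zip(tosses, tosses[1:]):
        if nxt == prev:
            current += 1
        else:
            lengths.append(current)
            current = 1
    lengths.append(current)  # close the final run
    return lengths

rng = random.Random(1906)  # fixed seed (arbitrary) so the run is repeatable
tosses = [rng.choice("HT") for _ in range(200_000)]
counts = Counter(run_lengths(tosses))

# Ratio of the number of runs of length k+1 to the number of runs of length k.
ratios = {k: counts[k + 1] / counts[k] for k in range(1, 5)}
for k, ratio in ratios.items():
    print(f"runs of length {k}: {counts[k]:6d}   ratio to next length: {ratio:.3f}")
```

With a much smaller number of tosses the ratios wander further from ½, which is the point made in the text about experiments on a limited number of cases.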
2.
It
it is impossible to tell whether we are dealing with an experiment like coin tossing or sequences or card-drawing; in fact, the only thing known is the distribution of
the
number
to
of
cases
into
certain
groups, and in
these
which the
statistics
statistics
approximate
may become an
important matter.
The
because
exactly,
impossible to tell where the differences between the actual and theoretical results lie. To make the position clearer it will be well to re-state the problem and ask whether it is possible to find the theoretical series to which a series resulting from a statistical experiment approximates. It may be difficult, perhaps impossible, to trace the simple probabilities corresponding to a given case, but yet practicable to form a reasonable opinion of the series of numbers that might be reached if the experiment could be repeated an infinite number of times. On turning to the reasons which make it advisable to find this ideal result to which statistics approach, it will be seen that the elementary probabilities are not so important as they seem to be, and a reasonable
it
and
is
is
We
first
objects of a statistician or an
statistical
work
is
to
express
the
can be easily drawn from the figures that have been collected.
If the
naturally into
fifty
or sixty
groups he has to decide how they can be arranged to bring out the important features of the problem on which he is working, and if he can find four or five numbers closely
connected with the original series which can be used as an
index to the whole, he can then give the result in a way that might assist comparison with similar statistics, and enable
others
who have
whole
3
distribution
its
original
if
it
remained
to
in
also
supply
approximate values for intermediate terms when only a few can be obtained from his experience, or complete or continue a series when only a part of it is known. In many cases
he has to keep the same terms as his original series, but remove the roughnesses of material due to limitations in the number of cases available for his investigation; that is, he
has to graduate his data.
3. In reality these objects are
tables can be represented
much
by an algebraic or transcendental formula, we can replace the whole series of numbers by a few
values
(the
constants
in
the
formula) which,
if
we
T
deal
systematically with
the
distributions
we meet,
facilitate
comparison or enable us to supply missing terms, while the roughness of the original material can be removed by making
a suitable formula represent the original statistics as nearly as possible. If a formula is based on the theoretical considerations, it should also give a solution of the problem in probabilities mentioned at the outset, and we see that both the practical and theoretical requirements can be dealt with at the same time, for the smooth series sought by the theoretical student is the same thing as the formula required for practical work.
4.
The advantages of any system of curves depend on the simplicity of the formulae and the number of classes of observations that can be
is
dealt
with
little
satisfactorily,
for
complicated expression
very
in
difficulties
;
whenever
a formula
it
breaks down.
necessary
if
is
known
to
method of finding the arithmetical constants that will give a good agreement in the particular case. Such a method, if it is to be of practical use, must be simple, reliable and capable of general and systematic application.
A broad idea of the objects to be accomplished ought to be kept clearly before the mind; they are likely to be forgotten because of the large amount of detail necessarily connected with the subject. It is also important because
and short cuts and rough and ready methods are adopted to the detriment of the work, and formulas having no scientific basis and having no connection with others suitable to similar cases are sometimes used in rather haphazard fashion by statisticians. The consequence is that generalisation is impossible, and where a law might be found one can see little but a great variety of attempts by energetic workers to reach their own conclusions regardless of the value of comparative statistics.
CHAPTER
II.
Frequency Distribution.
1.
of times,
way, then
the
arrangement
is
frequency distribution.
Although some
we
2.
shall generally
It is necessary to
been adopted for the purpose. The geometrical progression which describes the number of sequences in any direct experiment, such as coin tossing or dice throwing, is a frequency-curve, the equation to which is $y = Na^x$.
3.
Some
number
of cases fallin
others
for
more
clearly,
The diagrams on pp. 6 and 7 have been prepared. Drawings of distributions, such as those in the diagram, are called frequency polygons or histograms.
4. When statistics give the number of cases for an exact value of the independent variable, it is simple to plot them in a diagram by drawing ordinates and joining their tops, but in
the
the case of groups of values there
is
little
complication, for
we can
(Ex.
either
entire base
diagram) or put in ordinates at the middle points of the bases and then join their tops (Ex. III.). The former
II. of
method seems
(e.g.,
to
give
better
idea
of
the
more convenient.
5.
If the reader will
now examine
he
Table
Example
I.
Example
II.
Example
III.
Example
IV.
Example
v.
Withdrawals
Curtate Durations.
Exposed to
risk of
Ages.
Sickness (Watson,
.1/.
Terms of
the expansion
1000(i + f)
of
1J
No.
of term.
Without
Profit
"Old"
Annuities
(Females).
r. Tables,
p. 19).
"Old"
Assurances.
2
3
4 5 6
7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23
29 28 2G 21
18 18 12 11
5
11
7 6
1
-19 20-24 25-29 30-34 35-39 40-44 45-49 50-51 55-59 60-64 65-69 70-74 75-79 80-84 85-89 90-94 95-99 100-
2 3
4
5
6
7
11 2
1
8 9
10
11
...
3
1
3 2
...
1,000
1,000
2,995,721
1,000
2,674
1,000
1,000
True Total
1,308
172
79-400
3-998
Mean
Standard
Deviation
)
4-182
37*8750
68-485
4-1996
I
2-76810
I
1-771288
11
1-774894
1-46215
...
Type
VII
'
as the total
can be seen
conception
number of cases is increased, and from how naturally practical statistics lead
a frequency-curve
to
this it
to the
of
describe
the
smooth
distribution that would be obtained if an infinite supply of homogeneous material were available for investigation. In other words, such curves would give an approximation to the " total population" of which the particular case investigated
is
a sample.
6. It
may be
give
frequency corresponding to every value of the independent variable along the whole range of the distribution, and will
not restrict us to a few more or less arbitrary groups as
necessarily done
is
by the actual
statistics.
The binomial
series
and geometrical progression do the same when we imagine we are dealing with something that can be divided into a very
large
number
of groups.
Thus,
if
we mix
a large quantity of
sand of two colours and take out a fixed quantity of the mixture and record the number of grains of sand of either
colour in each drawing, we should obtain a continuous curve from a large number of trials.
7. We will now define some important functions. When a distribution is arranged according to the progressive values
of
a variable
characteristic,
e.g.,
duration,
age,
&c, the
is
called
the
mean
of
given by
$$\frac{\sum r f_r}{\sum f_r},$$
where $f_r$ is the frequency corresponding to $r$; thus,
2.
m
we
Example L, 200
If
given by
$$\frac{\int x f_x \, dx}{\int f_x \, dx},$$
where the limits of the integral will be such as to cover the whole distribution. The mean could also be described as the position of the ordinate through the centre of gravity of the distribution (centroid vertical), and this may be of help to some readers.
8.
The mode
its
is
or, in
other words,
maximum
ordinate,
and
made approximately,
the various groups.
until
unless
we know
find
the
law connecting
We
cannot
the
mode
it
exactly
is
we know the
is
frequency-curve,
because
and we cannot
greatest.
tell
which ordinate
9,
Now since
distributions,
we must have
itself.
a standard of reference
For
is
this
purpose a function
It is
known
used.
. . .
given by
$$\sqrt{\frac{f_a a'^2 + f_b b'^2 + \dots + f_n n'^2}{f_a + f_b + \dots + f_n}},$$
where $a', b', \dots n'$ are the distances from the mean. In the form of integrals the standard deviation is
$$\sqrt{\frac{\int f_x \times x^2 \, dx}{\int f_x \, dx}},$$
where the distances x are measured from the mean. The standard deviation measures the way the frequencies are distributed in terms of the unit of measurement. Since the frequencies furthest from the mean are multiplied by the largest values of $x^2$, a large standard deviation shows that the frequency distribution spreads out from the mean, while a small standard deviation shows that the frequency is closely concentrated about the mean. In considering the relative
sizes of
standard deviations,
it
is
a given distribution
is
arranged in two series, first, according to years of age, and then in quinquennial age groups, the standard deviation will
be
five
it is
in the former.
jm^\
The
and
jiugm
The values
of
I.
shows two curves having the same mean and the same area, but the dotted curve has the larger standard deviation because it spreads out more on each side of the mean. The reader will notice from the algebraic expressions given above that the standard deviation is not dependent on the number of cases (i.e., on the absolute size of the curve) but merely on the way they are distributed (i.e., on the proportionate numbers or the shape of the curve); it measures the "spread" or "scatter" of the statistics from the mean.
10. An examination of frequency distributions (see Table I. and pp. 6 and 7) shows that most of them start at zero, gradually rise to a maximum, and then fall, sometimes at a very different rate. If the rise and fall are at the same rate, the distribution will be symmetrical about its mean, which will obviously coincide with the mode. The difference between the mean and mode is therefore a function of the skewness or deviation from symmetry. In order to get a satisfactory measure, the way the material is grouped must be taken into account, and this leads us to measure skewness by (distance between mean and mode) ÷ standard deviation. If the mean is on the left-hand side of the mode when the statistics are plotted out in diagram, this function will be negative, and to remember the sign it is convenient to write
$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{standard deviation}}.$$
The diagram on p. 12 will help to show the rationale of the measure for skewness.
It gives two curves having the same mean B and the same mode A, but with different standard deviations, and it is clear that the dotted curve, with its larger standard deviation, is more nearly symmetrical than the other curve.
11. We may summarise these functions by saying that the mean and mode fix the position of the curve on the axis; the standard deviation shows how the material is distributed about the mean, and the skewness shows the amount of the deviation from symmetry exhibited by the material. These preliminary definitions will be sufficient for our present purpose, but the functions defined will be more easily
work
at
student working
the
subject
for
the
first
distributions on cross-ruled paper, in order to familiarise himself with their nature and appearance. He should calculate and insert the means in the diagrams, but should not attempt to calculate standard deviations until he knows something of the method of moments.
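The definitions of the mean and standard deviation can be tried on a grouped distribution. The sketch below is a minimal illustration, assuming the frequencies of Example IV with central ages 57 to 102 as they are tabulated later in Table II.; the function name `mean_and_sd` is a label introduced here. It also checks the two remarks made above: multiplying every frequency by the same number leaves the standard deviation unchanged, and measuring in years instead of quinquennial groups multiplies it by five.

```python
from math import sqrt

def mean_and_sd(values, freqs):
    """Mean and standard deviation of a distribution given as (value, frequency) pairs."""
    total = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / total
    variance = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / total
    return mean, sqrt(variance)

# Central ages and frequencies of Example IV (taken from Table II. of the text).
ages = [57, 62, 67, 72, 77, 82, 87, 92, 97, 102]
freqs = [29, 23, 81, 151, 192, 239, 157, 93, 29, 6]

mean_years, sd_years = mean_and_sd(ages, freqs)

# Ten times as many cases, distributed in the same proportions.
_, sd_more_cases = mean_and_sd(ages, [10 * f for f in freqs])

# The same distribution measured in quinquennial groups (0, 1, 2, ...) instead of years.
groups = [(age - ages[0]) // 5 for age in ages]
_, sd_groups = mean_and_sd(groups, freqs)

print(f"mean = {mean_years:.1f} years, standard deviation = {sd_years:.3f} years")
print(f"standard deviation with ten times the cases = {sd_more_cases:.3f} years")
print(f"standard deviation in groups of five years  = {sd_groups:.4f}")
```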
CHAPTER
III.
Method of Moments.
1. Before we proceed to frequency-curves, it will be well to see if some method of applying them to statistical examples can be found, for it is clearly useless to suggest a curve and have no way of using it. We require, therefore, a general method by which a given formula can be fitted to a particular statistical experience, and may be applied to any expression (for instance, Makeham's formula for the force of mortality) on which we may have decided as the basis of graduation. The first point to be noticed in searching for such a method is that if there are n constants in the formula, we must form n equations between the formula and the statistics. Thus, if we have three terms, say, y = 20, 40 and 88, when x = 1, 2, and 3 respectively, and wish to use the curve $y = a + bx + cx^2$ to describe them, we can, of course, find values of a, b and c so that each item is exactly reproduced by equating as follows:
$$a + b + c = 20$$
$$a + 2b + 2^2 c = 40$$
$$a + 3b + 3^2 c = 88$$
But if we have a fourth term y = 96 when x = 4, and use the values of a, b and c found from the three equations just given, we should find that when x = 4, y = 164. This suggests that when there are more terms in the statistics than there are constants, the equations must be formed by using all the terms, not by selecting from them. The graduating curve will not necessarily reproduce exactly any of the observations but will run evenly through the roughnesses of the observed
2. Let $a_1, a_2, a_3, \dots a_n$ be n terms to be graduated; then, if the series were perfectly smooth and followed a known law, each term could be reproduced exactly by, say, $b_1, b_2, b_3, \dots b_n$, where $a_1 = b_1$, $a_2 = b_2$, $a_3 = b_3$ and $a_n = b_n$. Now, if we consider the two series (the a's and the b's), we see that since each term is reproduced exactly
$$\sum_{r=1}^{r=n} a_r = \sum_{r=1}^{r=n} b_r \quad \text{and} \quad \sum_{r=1}^{r=n} c_r a_r = \sum_{r=1}^{r=n} c_r b_r,$$
where $c_r$ is a numerical coefficient.
The
total of the
and the further equations necessary for finding the unknown constants must be formed by multiplying the various terms by different factors and similarly equating the sums of the graduated and ungraduated products, i.e., $\sum c_r a_r = \sum c_r b_r$. It still remains to decide the best form to be given to $c_r$, and the mean being equal to
t
+ 2a +
2
+na n
one reasonable
use
equation.
of
suggests that
cr
=r
should give
we
shall
have to
to the
some
function
applied
make an
powers of r suggest themselves as convenient when integration by parts is attempted. If, therefore, we write $c_r = r^t$ and give t successively the values 0, 1, 2, ... we can obtain as many equations as we require, and the first two of them give the area and mean of the distribution, which will be the same in the graduated and ungraduated figures.
an
integrable
equation
cannot
the
This method is known as the Method of Moments (cf. moments of inertia), and Professor Karl Pearson has recently shown (Biometrika, vol. i., pp. 267, &c.) that it can
results.
equations given
$$(a + b + c) + (a + 2b + 2^2 c) + (a + 3b + 3^2 c) = 20 + 40 + 88$$
$$(a + b + c) + 2(a + 2b + 2^2 c) + 3(a + 3b + 3^2 c) = 20 + 2 \times 40 + 3 \times 88$$
$$(a + b + c) + 2^2 (a + 2b + 2^2 c) + 3^2 (a + 3b + 3^2 c) = 20 + 2^2 \times 40 + 3^2 \times 88$$
or
$$3a + 6b + 14c = 148$$
$$6a + 14b + 36c = 364$$
$$14a + 36b + 98c = 972$$
These equations will give the same result as those from which they were formed, because each of the three terms can be graduated exactly; but if we introduce the fourth term, x = 4, y = 96, we can modify the moment method by adding a fourth term to each equation given above and obtain
$$4a + 10b + 30c = 244$$
$$10a + 30b + 100c = 748$$
$$30a + 100b + 354c = 2508$$
The solution is a = −23, b = 42.6, c = −3, or
x = 1, y = 16.6
x = 2, y = 50.2
x = 3, y = 77.8
x = 4, y = 99.4
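The arithmetic of this example can be retraced in a few lines. The following is a sketch rather than a prescribed procedure: the helper names `solve`, `moment_equations` and `curve` are introduced here for illustration, and exact fractions are used only so that no rounding enters. It forms the equations obtained by equating the sums of $x^t y$ for t = 0, 1, 2 with the corresponding sums for $a + bx + cx^2$, first for the three terms and then for all four.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with exact fractions."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

def moment_equations(xs, ys, n_constants=3):
    """Equate sum(x**t * y) with sum(x**t * (a + b*x + c*x**2)) for t = 0, 1, 2."""
    matrix = [[sum(x ** (t + k) for x in xs) for k in range(n_constants)]
              for t in range(n_constants)]
    rhs = [sum(x ** t * y for x, y in zip(xs, ys)) for t in range(n_constants)]
    return matrix, rhs

def curve(coeffs, x):
    a, b, c = coeffs
    return a + b * x + c * x * x

# Three terms and three constants: every term is reproduced exactly.
three = solve(*moment_equations([1, 2, 3], [20, 40, 88]))
print("three terms:", [str(v) for v in three], "-> value at x = 4:", curve(three, 4))

# Four terms and three constants: the curve runs through the observations.
xs, ys = [1, 2, 3, 4], [20, 40, 88, 96]
four = solve(*moment_equations(xs, ys))
fitted = [curve(four, x) for x in xs]
print("four terms: ", [float(v) for v in four], "-> fitted:", [float(v) for v in fitted])
```

The first two moment equations ensure that the total and the mean of the fitted values agree with those of the observations, which gives a quick check on any solution.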
This is a very simple example, but it will probably help to show the way results are reached, and will serve as a foundation for what follows.
4. The nth moment of a particular frequency is defined as the product of the frequency and the nth power of the distance of the frequency from the vertical about which moments are being taken; or the nth moment of any ordinate y of a frequency-curve about the vertical through a point distance x from it is $yx^n$, and the nth moment of the whole distribution treated as a series of ordinates
is
y^ + y
1
2 a'2
where
+ y. +
2
is
the
total
frequency.
Thus,
in
of
81 x
2)
where 5 years
is
5. If the ordinates are known, we can calculate the moments for them immediately by multiplying the frequencies by the powers of the distances between them and the vertical about which the moments are required and then adding the results, care being taken to give the distances their proper signs. If areas are given, an approximation is made by assuming them to be concentrated about the ordinates at the middle points of the bases on which they stand; the moments thus obtained are sometimes said to be based on "loaded ordinates." The columns after the third in Table II. show the calculation of moments about the vertical through age 77 for the Example IV. of Table I., on the assumption that the frequencies are concentrated at the middle points of the bases.
Table II.

Central Age    Frequency   s = (x - 77)/5     fs       fs^2      fs^3      fs^4
of Group (x)      (f)
    (1)           (2)           (3)           (4)      (5)       (6)       (7)

     57            29           -4           -116      464     -1,856     7,424
     62            23           -3            -69      207       -621     1,863
     67            81           -2           -162      324       -648     1,296
     72           151           -1           -151      151       -151       151
     77           192            0
     82           239            1            239      239        239       239
     87           157            2            314      628      1,256     2,512
     92            93            3            279      837      2,511     7,533
     97            29            4            116      464      1,856     7,424
    102             6            5             30      150        750     3,750

Totals          1,000                        -498    3,464     -3,276    32,192
                                             +978              +6,612
                                             +480              +3,336
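The columns of Table II. can be reproduced directly from the frequencies. The sketch below assumes only the central ages and frequencies of the table; the variable names are chosen here for illustration. It measures each group's distance from age 77 in units of five years, forms the totals of $fs^t$ for t = 0 to 4, and divides by the number of cases to obtain the moments about the vertical through age 77.

```python
# Central ages and frequencies from Table II. (Example IV. of Table I.).
ages = [57, 62, 67, 72, 77, 82, 87, 92, 97, 102]
freqs = [29, 23, 81, 151, 192, 239, 157, 93, 29, 6]
origin, unit = 77, 5  # arbitrary origin and unit of grouping used in the table

# Column (3): distances from the origin in terms of the unit of grouping.
distances = [(age - origin) // unit for age in ages]

# Totals of columns (2), (4), (5), (6) and (7): sum of f * s**t for t = 0..4.
column_totals = [sum(f * s ** t for f, s in zip(freqs, distances)) for t in range(5)]

# Moments about the vertical through age 77, taking the total frequency as unity.
n_cases = column_totals[0]
raw_moments = [total / n_cases for total in column_totals]

print("distances:    ", distances)
print("column totals:", column_totals)
print("moments:      ", raw_moments)
```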
= total
frequency.
statistical
= nth.
unadjusted
'
otber point.
(.
= tk moment from curve about mean. = wth adjusted statistical moment about
moment from
and
H-
n =nth
= th
No TE.
-t/,
adjusted statistical
v',
fj.
moment about
other point.
fx
The
as
is
if,
often
convenient,
totals
we assume
have
the
be unity, the
will
be
divided by
in
1,000.
We
actual
numbers that
I.
occur,
been given
Table
as
the
distribution
in that
in
of
1,000
cases,
it
will
way
The numbers
4, 3
column
of the unit of
show the distances from age 77 in terms grouping. The centre of any other group
;
it
is
convenient to
of
This
easier
the
the
enables the calculator to get a rough check on these moments by comparing them with those about the arbitrary origin. The columns (4) to (7) are sufficiently explained by their headings; they are formed successively and checked by multiplying f by $s^4$, the values of $s^4$ being taken from a table of the powers of the natural numbers.
6. It has so far been assumed that moments can be calculated about any point, but it is frequently inconvenient to do so; for if we had required them about age 79.4, we should
4=
77-79-4
=
,
ot
82-79-4
^
and so on, and it is quite clear that the labour would have been very great. In such a case we can, however, take the moments about any other more convenient point, and then modify them in the following way: Let the distance between A, about which the moments are known, and B, about which they are required, be +d; thus, if we want moments about 25.7 and have found them about 25, d is .7; if we had found them about 26, d would have been −.3. Then, if the distance of any ordinate $y_r$ from A is $X_r$ and from B is $x_r$, then $x_r = X_r - d$ and $x_r^n = (X_r - d)^n$.
?ith
Now, the
so
moment
is
of the
}
a series of ordinates
%y
Xn
r
we have
v" n
= Zy
x r = Zy r {X r -dy>
= S\_y
(X rn -ndX r ,
x
+^Ijd*)]
,
v n ndv n _
n{nl)
-\
2"j
a v_
,
n (1)
c
where v" n
?ith
is
nth.
moment about
as follows
:
B, and v
the
Instead of
= v"n + ndv" n _ +
1
n(n 0}
1)
dV' n
4-
v\
There
7.
is
= v\ -ndv\^- ni
l
^^d^
n_2
...
(2)
little
to
We
will
{i.e.,
work out the moments through the mean) The distance of the mean from
(2) to
vertical
S(X,. ?/,.)
%y r
where
_ 2(X,.y, ~ N
or
is
distance of the
point
is
moment about the centroid vertical is zero, and this leads me to prefer formula (2) to formula (1) when moments are required about that vertical. When we come to deal with frequency-curves, we shall see that this is
generally the case.
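The transfer of moments from an arbitrary vertical to the centroid vertical can be sketched numerically. The code below is an illustration under stated assumptions: it expands $(X_r - d)^n$ by the binomial theorem, as in the derivation above, takes the total frequency as unity, and uses the moments about age 77 that follow from the totals of Table II.; the function name `moments_about_mean` is introduced here and is not the author's notation.

```python
from math import comb

def moments_about_mean(raw):
    """Transfer moments from an arbitrary origin to the mean.

    raw[t] is the t-th moment about the arbitrary origin, with raw[0] == 1.
    The distance d of the mean from that origin is the first moment raw[1],
    and each new moment comes from the binomial expansion of (X - d)**n.
    """
    d = raw[1]
    return [sum(comb(n, k) * (-d) ** k * raw[n - k] for k in range(n + 1))
            for n in range(len(raw))]

# Moments about the vertical through age 77 (totals of Table II. divided by 1,000).
raw = [1.0, 0.480, 3.464, 3.336, 32.192]
central = moments_about_mean(raw)

for n, value in enumerate(central):
    print(f"moment {n} about the mean: {value:.6f}")
```

The first moment about the mean comes out as zero, which serves as a check on the working; the second, third and fourth moments are in units of the five-year grouping.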
8.
as follows
(7)
observations (total of
moments
reference
i.e.,
[v)
to
by the number of and the quotients are the about 77. The moments are dealt with as having a case where unity is the total frequency,
are divided
(2)
),
$\nu'_1 = .480$, $\nu'_2 = 3.464$, $\nu'_3 = 3.336$, $\nu'_4 = 32.192$
The value
of
v\
gives the
(lj
mean
(2),
or
the value of d
required,
moments has to be made about the centroid vertical; its value is, as we have seen above, the same as $\nu'_1$; in the present case it is the first moment about the vertical through age 77. The powers of d are next calculated by logarithms; as it happens d is a comparatively
of
;
;
simple
number
if
it
$d^2 = .2304$, $d^3 = .110592$, $d^4 = .0530842$.
(2)
it
In modifying formula
moments
it
about
the
the
centroid
v\ is
are
v
is
required,
zero
and
unity, because
merely
'
total
frequency
divided
by
the
total
frequencv.
v,
v2
be used. It will be noticed that $6d^2\nu_2$ can be formed very easily from $3d\nu_2$ when logarithms are used, as log d is known. It is useful to keep a note of this value (log d) in a conspicuous place when the moments are being calculated.
9. Although the above is the most direct and obvious way by which moments can be calculated, another method was suggested by Mr. G. F. Hardy and used by him in his recent graduation of the British Offices Life Tables. He pointed out that by summing the statistical numbers and forming a new series in the same way as the $N_x$ column is formed from the $D_x$ column and then summing these results (cf. the $S_x$ column), and so on, equations can be formed. So far as I can trace, Mr. Hardy has not shown the connection between the summation method and the direct calculation of the moments, though he has pointed out that the same results can be obtained.
calculation
process.
p.
of the expression
first
we
notice that
series is given,
call
first
vertical situated at
to /(l).
:<
frequency is taken as whole distribution about a unit distance before the point corresponding
S 2 when the
moment
of the
Still
first
line,
we
see that
"
[Table III.]
V-2
or
,
'
!
i.e., it
'
gives
+V
2
where
is
written for
the moment, because by definition the tth moment ($\nu'_t$) of the whole distribution is given by the sum of $n^t f(n)$ for all values of n. $S_4$ and $S_5$ give each function multiplied by
$$\frac{n^3 + 3n^2 + 2n}{6} \quad \text{and} \quad \frac{n^4 + 6n^3 + 11n^2 + 6n}{24}$$
respectively.
A.
result
$$S_4 = \tfrac{1}{6}\left(\nu'_3 + 3\nu'_2 + 2\nu'_1\right)$$
These equations enable us to calculate the moments about the selected origin, but if it is necessary to find moments about
the mean, the following relations are more convenient
formula
v,
(2),
3
= 2$
-d(l + d)
v3
ir
10. The following table shows the working in the numerical example already dealt with by the direct method. The fifth
sum
is
sum
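Table IV.

Frequency    First Sum    Second Sum    Third Sum    Fourth Sum

     29        1,000         5,480        19,372        54,508
     23          971         4,480        13,892        35,136
     81          948         3,509         9,412        21,244
    151          867         2,561         5,903        11,832
    192          716         1,694         3,342         5,929
    239          524           978         1,648         2,587
    157          285           454           670           939
     93          128           169           216           269
     29           35            41            47            53
      6            6             6             6             6

Totals
  1,000        5,480        19,372        54,508       132,503
(for check)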
From
we have
The mean is at age 52 + 5.48 × 5 = 79.4. The age 52 is used because it is the centre of the group before that in which numbers first occur and, as has been already remarked, the summation method assumes the work to be done with reference to this position. The relations for $\nu_2$, $\nu_3$ and $\nu_4$ given above enable us to find
$\nu_2 = 3.2336$, $\nu_3 = -1.43099$, $\nu_4 = 30.4164$.
11. This is the most obvious way of using the summation method, but if the series contains a great number of terms, it is more convenient to use a central term instead of the first term as the starting point for the summation.* A slight adjustment is then needed because, though there is no difficulty
about the calculation of the sum for the terms on the positive side of the selected point, the moments for the terms on the
negative side are formed by multiplying by the powers of negative quantities. In order to use the formulae given above,
we
require Suf(n);
%^
-
when n
is
negative^ -nf(-n); X
K -
^_
ul
j,
)
j(-n)ovl2)
2
,,
-f(-n);
N
^
24
v ^~ n(n l){n jp
01
^/(-ra);and
- n(-n+l)(-n + 2)(-n + Z) _ A n
The first of these is given by the last term in the the second ordinary second summation taken negatively from Table III. to come from the term before the is seen
;
last
in
the
the third
is
the
second
term before
and the fourth is the third'. term before the last in the the sums in each case being begun from the fifth sum
;
* I
have
to
thank Mr. G.
J.
in the
method.
central term but in the reverse direction from the sums on the
positive side.
has been prepared, showing the calculation of the summation about the centre of the group of which the frequency is 192.
Frequency.
Second
Third
Fourth
Fifth
Sum.
Sum.
Sum.
Sum.
Sum.
29 23 8L 151
192
29 52 133 281
29
81
29
214 498
110 324
29 139
29
239 157 93 29
6
52
978
4-54
1,648
2,587
3,854
285 128 35
6
169
41 6
670 216 47
6
939 269
53 6
1,000
$S_2 = .978 - .498 = .48$
$S_3 = 1.648 + .324 = 1.972$
Hence
and
and
similarly
_ 1-43097
= 30-41621
agreeing with the previous results. 12. A comparison of Table IV. (A) with Table IV. will show that a saving of numerical work is effected by using a central
point as the starting point for the summation, for the sums
are numerically smaller
of
S2 or d, which enters
It will
much
smaller.
is
be readily
of
a large
number
terms
the summation method, and especially the form of it given in Table IV. (A), is a very great improvement on the product
method
of
calculating moments.
5
By means
of
an adding
marline, such as Burroughes adding machine, the summations can be obtained mechanically with little trouble, even for
series containing as
many
as a
hundred terms.
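The summation about a central term can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not the author's own working: the function names are hypothetical, and the frequencies are those of Table IV. (A), with the group of frequency 192 taken as the centre.

```python
from itertools import accumulate


def repeated_sums(side, depth=5):
    """Return the first, second, ... sums of `side`, each column being the
    running total of the column before it (the far end comes first)."""
    columns, column = [], list(side)
    for _ in range(depth):
        column = list(accumulate(column))
        columns.append(column)
    return columns  # columns[0] is the first sum, columns[4] the fifth


def back(column, k):
    """k-th term from the end of a column, or 0 if the column is too short."""
    return column[-k] if len(column) >= k else 0


def summation_about_centre(freqs, centre):
    """S2, S3, S4, S5 per unit frequency, summed about index `centre`."""
    total = sum(freqs)
    positive = freqs[:centre:-1]   # positive side, last term first
    negative = freqs[:centre]      # negative side, first term first
    pos = repeated_sums(positive)
    neg = repeated_sums(negative)
    results = []
    for k in range(1, 5):
        # Positive side: last term of the (k+1)-th sum.
        # Negative side: k-th term from the end of the (k+1)-th sum,
        # with alternating sign.
        value = back(pos[k], 1) + (-1) ** k * back(neg[k], k)
        results.append(value / total)
    return results


table_iv_a = [29, 23, 81, 151, 192, 239, 157, 93, 29, 6]
S2, S3, S4, S5 = summation_about_centre(table_iv_a, centre=4)
```

The alternating sign reproduces the rule that the first and third expressions on the negative side are taken negatively.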
13. It is now necessary to find the moments from the curve, for until this has been done it is impossible to form equations for finding the constants. Let $y_x = f(x, a, b, c, \ldots)$, where $a, b, c, \ldots$ are constants to be determined. We have seen, on pp. 13 and 14, that one way of working would be to find
$$f(1, a, b, c, \ldots)\times 1^n + f(2, a, b, c, \ldots)\times 2^n + \ldots + f(x, a, b, c, \ldots)\times x^n,$$
and this would give a result which might be used in forming equations if it were not for the fact that it is often impossible to find an algebraic expression for the sum of such series. It is, however, generally possible to integrate, and we therefore take as the nth moment
$$\int_h^k y_x x^n\,dx \quad\text{or}\quad \int_h^k f(x, a, b, c, \ldots)\,x^n\,dx.$$
The total frequency (i.e., total number of cases investigated) is $\int_h^k y_x\,dx$, and the mean is $\int_h^k y_x x\,dx \div \int_h^k y_x\,dx$, as we have already noticed.
14. If
the
calculated in this
moments from the equation to the curve are way and equated to the moments calculated
by assuming that the
is
:
from
statistics
of ordinates,
an inaccuracy
introduced.
When
(2)
we wish to pass a curve very through them. When they are a system of areas but the moments are calculated by assuming the areas to be
or ordinates* and
closely
speaking,
not
series
of
values
requiring graduation.
The
to
be dealt
the
way
the whole
number
of cases is
of
15.
(1)
?/
y l} y 2
y n -\
^i*e
given by
,
y x doo
is
approximately equal to yQ
it
y x dx is given by the equation to J-* the curve, and we have to find adjustments to counteract the x=n 1 Cn error caused by equating 2 X^. to X.yxdx (the error is
is
.7=0
-i
(l+t)*o;
= 5^).
by
The most
.
practical
way
of
overcoming the
corresponding
to
difficulty is
calculating the
. .
true
area
the ordinates
y 0f y\ y n -\ by means of a quadrature formula (formula of Many formulas are well known, approximate summation).
Text-Book,
it
Part
is
pp. 480-491
to
but
in
for
the
present
ordinates
purpose
lying
convenient
of
have
expressions
within
and
an area without
terms
of
the
base on
be valued stands.
y x dx in terms of
&c.
Symbolically, these
ijl,
i/_i,
y__n,
y^, &c,
yu y-u
y*> y-2,
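I. Let
$$y_x = a + bx + cx^2 + dx^3 + ex^4,$$
then
$$y_0 = a,\qquad y_{-1} + y_1 = 2(a + c + e),\qquad y_{-2} + y_2 = 2(a + 4c + 16e),$$
and
$$\int_{-\frac12}^{\frac12} y_x\,dx = a + \frac{c}{12} + \frac{e}{80}.$$
Now, assume
$$\int_{-\frac12}^{\frac12} y_x\,dx = h\,y_0 + k(y_{-1} + y_1) + l(y_{-2} + y_2),$$
substitute the values given just above, and equate the coefficients of $a$, $c$ and $e$ respectively to 1, $\frac{1}{12}$ and $\frac{1}{80}$ (if other limits have to be taken, such as ..., the corresponding values must be used); we have
$$h + 2k + 2l = 1,\qquad 2k + 8l = \frac{1}{12},\qquad 2k + 32l = \frac{1}{80}.$$
The solution is
$$h = \frac{5178}{5760},\qquad k = \frac{308}{5760}\qquad\text{and}\qquad l = -\frac{17}{5760},$$
and we obtain
$$\int_{-\frac12}^{\frac12} y_x\,dx = \frac{1}{5760}\{5178\,y_0 + 308(y_{-1} + y_1) - 17(y_{-2} + y_2)\}.$$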
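The three equations for $h$, $k$ and $l$ can be checked with exact fractions. The following is a small sketch in Python; the function name `formula_one` is merely a label chosen here for quadrature formula I.

```python
from fractions import Fraction as F

# Equating the coefficients of a, c and e in formula I.:
#   h + 2k +  2l = 1
#       2k +  8l = 1/12
#       2k + 32l = 1/80
l = (F(1, 80) - F(1, 12)) / 24   # subtract the second equation from the third
k = (F(1, 12) - 8 * l) / 2       # back-substitute in the second equation
h = 1 - 2 * k - 2 * l            # back-substitute in the first equation


def formula_one(y_m2, y_m1, y_0, y_1, y_2):
    """Area from -1/2 to +1/2 in terms of five equidistant ordinates."""
    return h * y_0 + k * (y_m1 + y_1) + l * (y_m2 + y_2)
```

Applied to the ordinates of $y = x^4$ at $x = -2, \ldots, 2$, the formula should return exactly $\frac{1}{80}$, the true area between $-\frac12$ and $\frac12$.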
II. If
r*
III.If
yx
= a + bx + ex? + iIm +
3
e.c
IV.If
16.
We
Vjcdx is
yx
vi
J-*
Now,
yxdx=\
yxdx+\
yxdx+...+
yxdx.
If formula I. be applied it can be used for all the integrals on the right-hand side of this equation except the first two and the last two, which have to be dealt with by the other formulae (e.g., by IV.). Summing the values, we have
$$\int_{-\frac12}^{n-\frac12} y_x\,dx = \frac{1}{5760}\{6463y_0 + 4371y_1 + 6669y_2 + 5537y_3 + 5760(y_4 + y_5 + \ldots + y_{n-5}) + 5537y_{n-4} + 6669y_{n-3} + 4371y_{n-2} + 6463y_{n-1}\},\qquad\text{V.}$$
which means that we can multiply the first and last ordinates by $\frac{6463}{5760}$ (= 1.1220485), the second and last but one by $\frac{4371}{5760}$ (= 0.7588541), the third and last but two by $\frac{6669}{5760}$ (= 1.1578125), the fourth and the last but three by $\frac{5537}{5760}$ (= 0.9612847), leave all the other ordinates unaltered, and sum in the usual way. Of course, if there are less than eight ordinates another formula must be evolved.
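As an illustration, the end weights of formula V. may be applied mechanically. The sketch below is in Python; the function name is hypothetical, and the refusal of fewer than eight ordinates simply mirrors the remark above.

```python
# End weights of formula V., counted inwards from either end of the series.
END_WEIGHTS = [6463 / 5760, 4371 / 5760, 6669 / 5760, 5537 / 5760]


def modify_ordinates(ys):
    """Weight the four ordinates at each end; leave the others unaltered."""
    if len(ys) < 8:
        raise ValueError("formula V. needs at least eight ordinates")
    weights = [1.0] * len(ys)
    for i, w in enumerate(END_WEIGHTS):
        weights[i] = w
        weights[-1 - i] = w
    return [w * y for w, y in zip(weights, ys)]


# The sum of the modified ordinates approximates the area from -1/2 to n - 1/2.
area_of_parabola = sum(modify_ordinates([x * x for x in range(10)]))
```

For $y = x^2$ with ten ordinates the modified sum should agree with the true area $(9.5^3 + 0.5^3)/3$, whereas the unmodified sum is 285.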
17. In the following table the original series and the modified one are set out in the first two columns, and in the other columns the calculations of the first four moments about the middle of the range by the direct method are shown:

Table V.

   x      y_x     y'_x (modified     y'_x × x      y'_x × x²     y'_x × x³     y'_x × x⁴
                  by Formula V.)
  -4     51.81        58.13          -232.52         930.08      -3,720.32     14,881.28
  -3     43.74        33.19           -99.57         288.71        -866.13      2,598.39
  -2     35.58        41.01           -82.02         164.04        -328.08        656.16
  -1      ...         26.72           -26.72          26.72         -26.72         26.72
   0     20.42        20.42
  +1      ...         13.26           +13.26          13.26         +13.26         13.26
  +2      ...          9.52           +19.04          38.08         +76.16        152.32
  +3      4.29         3.26            +9.78          29.34         +88.02        264.06
  +4      1.69         1.90            +7.60          30.40        +121.60        486.40

        208.38       207.41          -440.83       1,520.63      -4,941.25     19,078.59
                                      +49.68                       +299.04
                                     -391.15                     -4,642.21
The sum of the modified values, 207.41, is then treated as the total frequency, and the moments for unit frequency ($\mu_n$) would be obtained by dividing -391.15, 1520.63, &c., by 207.41, and not by 208.38, which is the uncorrected sum.
18. If the values at the ends of the experience are very small and have a tendency to keep close to the axis of $x$ before they finally vanish (i.e., if there is high contact; most actuarial functions, $l_x$, $a_x$, $D_x$, &c., have high contact at the old age end of the table), then it is reasonable to suppose that ordinates before the first and after the last exist, but are negligibly small. Thus the integral corresponding to the whole series of ordinates can be legitimately extended beyond the limits $-\frac12$ and $n-\frac12$ previously used, because the additional area thus introduced will be evanescent. Now if this is done, the significant ordinates from $y_0$ to $y_{n-1}$ will all have the coefficient unity, and the ordinates with weighted coefficients will all vanish.
The
practical result
is,
that
if
there
is
high
if
there
is
high contact at
necessary.
Mathematically, high
first
few
high contact at
p.
The diagrams on pp. 73 and 90 show both ends of the curves, and the diagram on
19. (2)
are
The second case, namely, that in which mid-ordinates used instead of areas, may now be examined. By concentrating areas about the middle points of their
bases,
ft
we assume
f
;
that the
distances
1*
yx dx
ijxds,
&c;
that
is,
the
tth.
+h
\
11
yx dxXH
ydx{X+iy+
+\"~ J
.
.
y x chiX+n-iy
and we require
(X + a) y x dx, where
f
is
the distance of
Bv formula
I.
-L{
where h is written for X + oj in order to simplify the expression, and working out this general coefficient Ave have
If
>>
=l
this
99
becomes h
}i
t l
t
9
=S
h2
,L
-\'
1 2
/i
+ J/t
It
=4
/l
+i
/l
+ _l_
if
there
is
value of
ordinates
+ xyydx
is,
is
that
the second
is
moment
;
,
is
given by a
hence,
the
if
series,
h2y
is
h3 y
and
be
v
moment about
relations
mean and
/x
moment, the
between
and
v are
given by
$\mu_2 + \frac{1}{12} = \nu_2$, or $\mu_2 = \nu_2 - \frac{1}{12}$.
The mean needs no adjustment, for if $n = 1$ the general term has the correct coefficient, $h$, and the third moment has to be adjusted by $\frac14$ of the first moment, which is zero where the moments are taken about the mean. These adjustments were first given by Mr. W. F. Sheppard in Proceedings of the London Mathematical Society, vol. xxix., pp. 353-380. In order to demonstrate the correction for the nth moment by the above method, a parabola of at least the nth order is necessary. If we apply these adjustments to the moments found on p. 19, for Example IV. of Table I., we have $\mu_2 = 3.1503$, $\mu_3 = -1.430976$, and $\mu_4 = 28.828322$. These adjustments are found to make a considerable difference in the moments, especially when there is a small number of terms.
20.
When there is not high contact, attempts have been made for finding the corrections, but they are not altogether satisfactory, and
to use the
it is
A
SD=
distributions,
also find
\/yL6 2
moments for one or two and make the necessary adjustments he can
;
standard
deviations of
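where the $\mu_2$ has been adjusted in accordance with the above rules. In Examples III. and IV. there is clearly high contact, in II. and V. there is more doubt but the adjustment is advisable, while in I. the rough moment should be used.
21. Before proceeding to deal with fitting more complicated curves it is advisable to consider the application of the method of moments to a simple case, namely, the curve
$$y = a + bx + cx^2 + dx^3 + \&c.,$$
the origin being taken at the middle of the range, which extends from $-l$ to $+l$, so that the whole range is $2l$. Then, if $m_n$ be the nth moment of the whole frequency about the middle of the range,
$$m_{2s} = \int_{-l}^{+l}(a + bx + cx^2 + \ldots)\,x^{2s}\,dx = 2l^{2s+1}\left\{\frac{a}{2s+1} + \frac{cl^2}{2s+3} + \frac{el^4}{2s+5} + \ldots\right\},$$
and similarly
$$m_{2s+1} = 2l^{2s+3}\left\{\frac{b}{2s+3} + \frac{dl^2}{2s+5} + \frac{fl^4}{2s+7} + \ldots\right\}.$$
These equations show that the even moments give the constants $a$, $c$, $e$, &c., and the odd moments give the constants $b$, $d$, $f$, &c. This is the advantage of taking the moments about the middle of the range, and makes the solution of the equations less laborious than they would otherwise have been. The equations may be set out by writing $s = 0, 1, 2$, so that
$$\frac{m_0}{2l} = a + \frac{cl^2}{3} + \frac{el^4}{5} + \ldots,\qquad \frac{m_2}{2l^3} = \frac{a}{3} + \frac{cl^2}{5} + \frac{el^4}{7} + \ldots,\qquad \frac{m_4}{2l^5} = \frac{a}{5} + \frac{cl^2}{7} + \frac{el^4}{9} + \ldots,$$
and similarly
$$\frac{m_1}{2l^3} = \frac{b}{3} + \frac{dl^2}{5} + \frac{fl^4}{7} + \ldots,\qquad \frac{m_3}{2l^5} = \frac{b}{5} + \frac{dl^2}{7} + \frac{fl^4}{9} + \ldots,\qquad \frac{m_5}{2l^7} = \frac{b}{7} + \frac{dl^2}{9} + \frac{fl^4}{11} + \ldots$$
The solution of these equations gives the constants:
(i.) if $y = a + bx$, we have
$$a = \frac{m_0}{2l},\qquad b = 3\,\frac{m_1}{2l^3};$$
(ii.) if $y = a + bx + cx^2$,
$$a = \frac34\left\{3\,\frac{m_0}{2l} - 5\,\frac{m_2}{2l^3}\right\},\qquad b = 3\,\frac{m_1}{2l^3},\qquad c = \frac{15}{4l^2}\left\{-\frac{m_0}{2l} + 3\,\frac{m_2}{2l^3}\right\};$$
(iii.) if $y = a + bx + cx^2 + dx^3$,
$$a = \frac34\left\{3\,\frac{m_0}{2l} - 5\,\frac{m_2}{2l^3}\right\},\qquad b = \frac{15}{4}\left\{5\,\frac{m_1}{2l^3} - 7\,\frac{m_3}{2l^5}\right\},$$
$$c = \frac{15}{4l^2}\left\{-\frac{m_0}{2l} + 3\,\frac{m_2}{2l^3}\right\},\qquad d = \frac{35}{4l^2}\left\{-3\,\frac{m_1}{2l^3} + 5\,\frac{m_3}{2l^5}\right\}.$$
The above results, which can easily be extended if it be wished, may now be applied to examples.
22. As a first example, we shall graduate the statistics in Table V., Art. 17, for which the moments about the middle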
of the range have been calculated. Taking the curve $y = a + bx + cx^2$, the values required are:
$2l = 9$, or $l = 4.5$
$m_0 = 207.41$
$m_1 = -391.15$
$m_2 = 1520.63$
Hence
$$a = \frac{3}{8 \times 4.5}\left\{622.23 - \frac{5}{(4.5)^2}\times 1520.63\right\} = 20.563$$
$$b = \frac{3}{2(4.5)^3}\,(-391.15) = -6.4387$$
$$c = \frac{15}{4(4.5)^2}\left\{-\frac{207.41}{9} + \frac{3 \times 1520.63}{9\,(4.5)^2}\right\} = 0.36815$$
23. The best way of calculating the graduation is by calculating $b + c$, the first difference, and $2c$, the second difference, from the middle term; their values are -6.0706 and 0.7363 respectively. Since second differences are constant, the work is done continuously, and is as follows:

   x     Graduated y     First difference
  -4       52.208            -9.016
  -3       43.192            -8.279
  -2       34.913            -7.543
  -1       27.370            -6.807
   0       20.563            -6.071
  +1       14.492            -5.335
  +2        9.157            -4.599
  +3        4.558            -3.862
  +4        0.696

The second difference is 0.736 throughout.
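The continuous working by differences can be sketched as follows. This is an illustrative Python fragment, with a hypothetical function name, using the constants $a = 20.563$, $b = -6.4387$ and $c = 0.36815$ quoted in Art. 22; it starts from the first ordinate rather than from the middle term, which leads to the same series.

```python
def graduate_by_differences(a, b, c, half_width):
    """Values of a + b*x + c*x*x for x = -half_width, ..., +half_width,
    built up by adding a first difference that grows by 2c at each step."""
    x0 = -half_width
    y = a + b * x0 + c * x0 * x0
    delta = b + c * (2 * x0 + 1)   # y(x0 + 1) - y(x0)
    second = 2 * c                 # constant second difference
    values = [y]
    for _ in range(2 * half_width):
        y += delta
        delta += second
        values.append(y)
    return values


graduated = graduate_by_differences(20.563, -6.4387, 0.36815, 4)
```

Small discrepancies in the third decimal place against the printed column are to be expected, since the printed working carries the differences to three places only.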
fairly well
24. As a further example the following statistics, taken from a paper by S. H. J. W. Allin (Journal of the Institute of Actuaries, xxxix., p. 350), and giving the values of annuities to widows ... member, may be considered:

  Age.   Value of    Modified by        x      a'_x × x     a'_x × x²     a'_x × x³
         Annuity.    Formula V. (a'_x)
   27     21.20         23.79          -7      -166.53      1,165.71      -8,159.97
   32     19.91         15.11          -5       -75.55        377.75      -1,888.75
   37     19.34         22.40          -3       -67.20        201.60        -604.80
   42     18.58         17.86          -1       -17.86         17.86         -17.86
   47     16.74         16.09          +1       +16.09         16.09         +16.09
   52     15.69         18.17          +3       +54.51        163.53        +490.59
   57     14.70         11.15          +5       +55.75        278.75      +1,393.75
   62     12.99         14.58          +7      +102.06        714.42      +5,000.94

                       139.15                  -327.14      2,935.71     -10,671.38
                                               +228.41                    +6,901.37
                                                -98.73                    -3,770.01

In calculating the above moments it has been assumed that the figures to be graduated represent a system of ordinates; if they had represented a system of areas the adjustment by formula V. would have been unsuitable.
When there is an even number of terms the difficulty of calculating the moments about the middle of the range is that the terms have to be multiplied by 0.5, 1.5, 2.5, &c., and their powers; it is easier to multiply by 1, 3, 5, &c., in the way shown above, and to divide the sums by 2, 4 and 8, in order to obtain the first, second and third moments respectively. In this way, we have
$l = 4$
$m_0 = 139.15$
$m_1 = -49.36$
$m_2 = 733.93$
$m_3 = -471.25$
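The device of working with the odd multipliers and dividing afterwards by 2, 4 and 8 may be sketched thus. The Python below is only an illustration; the function name is hypothetical and the input is the column already modified by formula V.

```python
def moments_about_middle(values, max_order=3):
    """Moments of the whole frequency about the middle of the range.

    The multipliers ..., -3, -1, +1, +3, ... are twice the true distances
    from the middle, so the k-th sum is divided by 2**k at the end."""
    n = len(values)
    doubled = [2 * i - (n - 1) for i in range(n)]
    return [
        sum(v * x ** k for v, x in zip(values, doubled)) / 2 ** k
        for k in range(max_order + 1)
    ]


modified = [23.79, 15.11, 22.40, 17.86, 16.09, 18.17, 11.15, 14.58]
m0, m1, m2, m3 = moments_about_middle(modified)
```

The same function serves for an odd number of terms, the doubled distances then being even numbers.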
We will now fit the statistics with each of the three curves, the formulae for which have been given, and compare the results:
(i.) $y = 17.394 - 1.157x$
(ii.) $y = 17.633 - 1.157x - 0.0451x^2$
(iii.) $y = 17.633 - 1.190x - 0.0451x^2 + 0.0035x^3$
The following table shows the graduations:

  Age.   Ungraduated.    (i.)     (ii.)     (iii.)
   27       21.20        ...      21.13     21.13
   32       19.91        ...      20.24     20.28
   37       19.34        ...      19.27     19.31
   42       18.58        ...      18.20     18.22
   47       16.74        ...      17.04     17.02
   52       15.69        ...      15.80     15.76
   57       14.70        ...      14.46     14.43
   62       12.99        ...      13.03     13.05
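The constants of the three curves follow from the moments by the solutions given in Art. 21. A short Python sketch is given here under the assumption that the moments are supplied as a list $[m_0, m_1, m_2, m_3]$; the function name is hypothetical.

```python
def fit_polynomial_by_moments(m, l, degree):
    """Constants [a, b, c, d][:degree + 1] of a polynomial on (-l, +l),
    found from the moments m about the middle of the range (Art. 21)."""
    A = m[0] / (2 * l)
    C = m[1] / (2 * l ** 3)
    if degree == 1:
        return [A, 3 * C]
    B = m[2] / (2 * l ** 3)
    a = 0.75 * (3 * A - 5 * B)
    c = 15 * (3 * B - A) / (4 * l ** 2)
    if degree == 2:
        return [a, 3 * C, c]
    D = m[3] / (2 * l ** 5)
    b = 3.75 * (5 * C - 7 * D)
    d = 35 * (5 * D - 3 * C) / (4 * l ** 2)
    return [a, b, c, d]


moments = [139.15, -49.36, 733.93, -471.25]
line = fit_polynomial_by_moments(moments, 4, 1)
parabola = fit_polynomial_by_moments(moments, 4, 2)
cubic = fit_polynomial_by_moments(moments, 4, 3)
```

With the moments of Art. 24 the results should agree with curves (i.) to (iii.) to about the third decimal place.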
Formulae (ii.) and (iii.) ... and both ... as (i.).
25. The results obtained so far may be summarized as follows:
(1) The method of moments is a general method of finding the constants in a formula suitable to a particular statistical example, and consists of equating $\sum x^n f(x)$ (which is the nth moment, and is summed for all values of $x$ that occur) to similar expressions found from the formula.
(2) These latter are obtained by integration. The moments from the statistics can be calculated by multiplying the frequencies by appropriate values of $x^n$ or by Mr. G. F. Hardy's summation method.
(3) If the moments have been calculated about any vertical, they can be transferred to any other by the formulae in Art. 6 of Chap. III.
(4) Since the moments from the graduation formula must generally be found by means of the integral calculus, while those from the statistics are found by summation, the latter must be adjusted before the equations can be correctly formed. The adjustments depend on whether the statistics are a system of ordinates or a system of areas; in the former case adjustment is made by equation V., and in the latter by Sheppard's adjustments if there is high contact.
CHAPTER IV.
Frequency-Curves.
1. When it becomes necessary in practical work to decide on a system of curves for describing frequency distributions, we have to bear in mind that
(1) Any expression used must be a graduation formula; it must remove the roughness of the material.
(2) The formula must not contain so many constants that we require a great number of moments, for this means that the accuracy is reduced. The higher the moment the more liable it is to error when deduced from ungraduated observations; this is clear, when we remember that the ends of the experiences are multiplied by the highest numbers and their powers.
(3)
2.
more obvious
characteristics
of
frequency distributions, we find they generally start at zero, rise to a maximum, and then fall sometimes at the same but
often at a different rate. there
is
At the ends
y
of
the
distribution
a series of equations
so that in
= f(x)
= <p(%),
-r^
clc
in certain
is
cases
; '
at the
maximum
the test of a
zero
to
maximum
that the
first
differential
coefficient is
when y = 0,
of
be contact at one end, at least, or, in other words, the angle formed by the tangent to the curve at this point
for there
is
the range
of
the
distribution,
must be
zero,
in
order
that
the
tangent of
the
angle
(i.e.,
differential coefficient)
may be
zero.
In non-geometrical
lano-uao-e,
the
finite
difference
The
f
above
-,
suggests
if
that
may be put
if
equal
to
J^f F(x)
then,
= 0,
'
~ =0, dx
--
and
x= a,
So
-~-=0, and dx
F(a')
is
we have
the
maximum we
require.
is
long as
when
-j*-
may not be
zero
when
is
zero.
If $F(x)$ be expanded in ascending powers of $x$, we have
$$\frac{dy}{dx} = \frac{y(x + a)}{b_0 + b_1x + b_2x^2 + \ldots}\qquad\text{I.}$$
We shall return to this equation and show how it can be put in the form $y = f(x)$, so as to express $y$ as a direct function of $x$; but as the matter has up to the present been approached from an experimental point of view, it will be interesting to see how equation I. can be obtained up to the $x^2$ term in the denominator from elementary propositions in the theory of probabilities.
3. If $p$ be the probability of an event happening and $q$ the probability of its failing, then the probabilities of its happening $n$, $n-1$, $n-2$, ... times in $n$ trials are given by the terms of the expansion $(p+q)^n$; or if we have $N$ cases, the terms of $N(p+q)^n$ give the frequency distribution of the $N$ cases
nearly
occurs
r,
into
n groups.
The binomial
all
is
.
the hypergeometrical.
.
of getting
1,
$$\frac{pn(pn-1)\ldots(pn-r+1)}{n(n-1)\ldots(n-r+1)}\left\{1 + r\,\frac{qn}{pn-r+1} + \frac{r(r-1)}{2!}\,\frac{qn(qn-1)}{(pn-r+1)(pn-r+2)} + \ldots\right\}$$
A numerical example may help to make the way the series arises clear. A bag contains 7 balls, of which 4 are black; then if 3 are drawn the probability that
all will be black is $\dfrac{4\cdot3\cdot2}{7\cdot6\cdot5}$,
two will be black is $\dfrac{4\cdot3\cdot3}{7\cdot6\cdot5}\times{}^3C_1$,
one will be black is $\dfrac{4\cdot3\cdot2}{7\cdot6\cdot5}\times{}^3C_2$,
none will be black is $\dfrac{3\cdot2\cdot1}{7\cdot6\cdot5}$.
The sum is unity. These values will be seen to agree with the series by putting $n = 7$, $pn = 4$, $qn = 3$, and $r = 3$. Other series may arise, but those given will be sufficient for the present purpose, and we shall proceed to consider how they can be put in the form of equation I. The inconvenience of the expressions as they stand is fairly obvious when an attempt is made to calculate numerical values for a large number of groups, and besides this, they are not continuous, while the statistics of practical work often are.
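The bag example and the hypergeometrical series can be compared numerically. The following Python sketch uses exact fractions; both function names are hypothetical, and it is assumed that the number of balls drawn does not exceed the number of black balls.

```python
from fractions import Fraction
from math import comb


def black_ball_probabilities(total, black, drawn):
    """Chance of drawing s black balls, for s = drawn, drawn - 1, ..., 0."""
    ways = comb(total, drawn)
    return [
        Fraction(comb(black, s) * comb(total - black, drawn - s), ways)
        for s in range(drawn, -1, -1)
    ]


def series_terms(total, black, drawn):
    """Successive terms of the hypergeometrical series (n = total,
    pn = black, qn = total - black, r = drawn)."""
    term = Fraction(1)
    for i in range(drawn):
        term *= Fraction(black - i, total - i)   # leading factor
    white = total - black
    terms = []
    for j in range(drawn + 1):
        terms.append(term)
        # Ratio of the (j + 1)-th term of the bracket to the j-th.
        term *= Fraction((drawn - j) * (white - j),
                         (j + 1) * (black - drawn + 1 + j))
    return terms


bag = black_ball_probabilities(7, 4, 3)
```

For the bag of 7 balls the four chances are 4/35, 18/35, 12/35 and 1/35, and the two functions should return identical lists.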
Considering the hypergeometrical
that the
the series
f auction
is
required for
y
discontinuous, finite differences must be used,
we
have
?-
pm{pn 1) n (w -l) ..
.
(pn r +
Y)
r(r
1)
.
(n-r+1)
(r a?-f 2) (-l)I
. .
{pn
.
r+l){pn r + 2)
qn(qn-l)
,.
*.=v.n-u=v.\i
Jx
-1
1
and
Ay*
'
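which may be put in the form of equation I.,
$$\frac{1}{y}\frac{dy}{dx} = \frac{a + x}{b_0 + b_1x + b_2x^2}.$$
4. Returning to equation I., we see that it can be written in the form
$$(b_0 + b_1x + b_2x^2 + \ldots)\frac{dy}{dx} = y(x + a);$$
multiplying each side by $x^n$ and integrating, we have
$$\int x^n(b_0 + b_1x + b_2x^2 + \ldots)\frac{dy}{dx}\,dx = \int y(x + a)x^n\,dx.$$
Integrating the left-hand side by parts, and treating the right-hand side as the sum of two functions, we have
$$x^n(b_0 + b_1x + b_2x^2 + \ldots)\,y - \int\{nb_0x^{n-1} + (n+1)b_1x^n + (n+2)b_2x^{n+1} + \ldots\}\,y\,dx = \int y\,x^{n+1}dx + a\int y\,x^n dx,$$
or since at the ends of the range the expression $x^n(b_0 + b_1x + b_2x^2 + \ldots)y$ vanishes, we have, in the notation $\mu'_n = \int y\,x^n dx$,
$$-nb_0\mu'_{n-1} - (n+1)b_1\mu'_n - (n+2)b_2\mu'_{n+1} - \ldots = \mu'_{n+1} + a\mu'_n.$$
If we put $n = 0, 1, \ldots s$ respectively, we get $s + 1$ equations to enable us to find $a$, $b_0$, $b_1$, &c., in terms of the moments ($\mu'$), as shown by the following equations, which have been obtained by writing the equation in the form
$$a\mu'_n + nb_0\mu'_{n-1} + (n+1)b_1\mu'_n + (n+2)b_2\mu'_{n+1} + \ldots = -\mu'_{n+1}$$
and putting $n = 0, 1, 2$, &c.:
$$a\mu'_0 + b_1\mu'_0 + 2b_2\mu'_1 + \ldots = -\mu'_1$$
$$a\mu'_1 + b_0\mu'_0 + 2b_1\mu'_1 + 3b_2\mu'_2 + \ldots = -\mu'_2$$
$$a\mu'_2 + 2b_0\mu'_1 + 3b_1\mu'_2 + 4b_2\mu'_3 + \ldots = -\mu'_3\qquad\text{II.}$$
$$a\mu'_3 + 3b_0\mu'_2 + 4b_1\mu'_3 + 5b_2\mu'_4 + \ldots = -\mu'_4$$
&c., &c.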
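Let us now make $\mu'_1 = 0$, and alter the other moments in the way indicated in Chap. II., for the result of making $\mu'_1 = 0$ is to change the origin of the system to the mean of the distribution. We can also treat $\mu_0$ as 1, and these changes simplify the equations.
(1) Keeping $b_0$ only, we have $a = 0$ and $b_0 = -\mu_2$, so that
$$\frac{1}{y}\frac{dy}{dx} = -\frac{x}{\mu_2}.$$
(2) Keeping $b_0$ and $b_1$, the first three equations in the system II. above give
$$a + b_1 = 0,\qquad b_0 = -\mu_2\qquad\text{and}\qquad a\mu_2 + 3b_1\mu_2 = -\mu_3,$$
or
$$b_1 = -a = -\frac{\mu_3}{2\mu_2},$$
and the differential equation becomes
$$\frac{1}{y}\frac{dy}{dx} = \frac{x + \dfrac{\mu_3}{2\mu_2}}{-\mu_2 - \dfrac{\mu_3}{2\mu_2}\,x}.$$
(3) Keeping $b_0$, $b_1$ and $b_2$, the first four equations give
$$a + b_1 = 0$$
$$b_0 + 3b_2\mu_2 = -\mu_2$$
$$a\mu_2 + 3b_1\mu_2 + 4b_2\mu_3 = -\mu_3$$
$$a\mu_3 + 3b_0\mu_2 + 4b_1\mu_3 + 5b_2\mu_4 = -\mu_4.$$
The solution is perfectly straightforward, and gives
$$\frac{1}{y}\frac{dy}{dx} = \frac{x + \dfrac{\mu_3(\mu_4 + 3\mu_2^2)}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2}}{-\dfrac{\mu_2(4\mu_2\mu_4 - 3\mu_3^2)}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2} - \dfrac{\mu_3(\mu_4 + 3\mu_2^2)}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2}\,x - \dfrac{2\mu_2\mu_4 - 3\mu_3^2 - 6\mu_2^3}{10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2}\,x^2}\qquad\text{III.}$$
In this last equation put $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$, and it becomes
$$\frac{1}{y}\frac{dy}{dx} = -\frac{2(5\beta_2 - 6\beta_1 - 9)\,x + \sqrt{\mu_2}\sqrt{\beta_1}(\beta_2 + 3)}{\mu_2(4\beta_2 - 3\beta_1) + \sqrt{\mu_2}\sqrt{\beta_1}(\beta_2 + 3)\,x + (2\beta_2 - 3\beta_1 - 6)\,x^2}.$$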
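As a check on the algebra, the constants of equation III. may be substituted back into the four equations of case (3). The Python sketch below does this; the function names are hypothetical, and the moments used are illustrative values of the same order as those of the Type I. example in Chap. V. (with $\mu_4$ taken as $\beta_2\mu_2^2$).

```python
def pearson_constants(mu2, mu3, mu4):
    """a, b0, b1, b2 of equation III., from moments about the mean."""
    D = 10 * mu2 * mu4 - 18 * mu2 ** 3 - 12 * mu3 ** 2
    a = mu3 * (mu4 + 3 * mu2 ** 2) / D
    b0 = -mu2 * (4 * mu2 * mu4 - 3 * mu3 ** 2) / D
    b1 = -a
    b2 = -(2 * mu2 * mu4 - 3 * mu3 ** 2 - 6 * mu2 ** 3) / D
    return a, b0, b1, b2


def residuals(mu2, mu3, mu4, a, b0, b1, b2):
    """Left-hand side minus right-hand side of the four equations of
    system II. with mu'_1 = 0 and mu_0 = 1; all should vanish."""
    return [
        a + b1,
        b0 + 3 * b2 * mu2 + mu2,
        a * mu2 + 3 * b1 * mu2 + 4 * b2 * mu3 + mu3,
        a * mu3 + 3 * b0 * mu2 + 4 * b1 * mu3 + 5 * b2 * mu4 + mu4,
    ]


mu2, mu3 = 7.662375, 15.1069
mu4 = 2.935110 * mu2 ** 2
a, b0, b1, b2 = pearson_constants(mu2, mu3, mu4)
check = residuals(mu2, mu3, mu4, a, b0, b1, b2)
```

The value of `a` obtained in this way is the distance between the mean and the mode in the working unit.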
5. The reasoning by which equation I. was first obtained showed that $a$ is the distance between the origin and the mode, or as the origin has now been transferred to the mean by putting $\mu'_1 = 0$, $a$ is the distance between the mean and the mode. This distance in terms of the moments is, therefore,
$$\frac{\sigma\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)},$$
where $\sigma = \sqrt{\mu_2}$.
6. It would be possible to obtain constants in the differential equation I. by using a greater number of terms and retaining $b_3$, $b_4$, &c., but there are strong practical objections to such a course. Besides the large increase in arithmetical work, the ... we have ... "random sample, but no safe argument can be drawn from this individual sample as to the general population at large, at any rate so far as the argument is based on the constants depending on these high moments."* In some actuarial statistics there might be as many as 100,000 cases, but even here the value of the work is discounted because any other smaller body of statistics on the same subject could not be compared satisfactorily with the result. For practical purposes it is probable that the equation taken as far as $b_2$ will be sufficient, and we shall confine our attention to the forms thus obtained, merely remarking that in some extreme cases in graduation another term might be required.
* ... "Skew ...," ... Company Research Memoir, 1905.
7.
Turning
to
it
I.
given in
in
equation III.
be seen that
the
it
is
possible to obtain a
formula representing
statistics
by inserting
that
but this would not give a graduation in the same form as that in which the original data appeared, for in the latter we have
?/, J
dx ^-^
It
we
obtain in order to
it
is
better
forms in which we require them for comparison, rather than by using the differential equations and then integrating the result.
work
The
8.
latter
frequencies.
The next step is to find the forms of the curves, and to do this
$$\frac{d\log y}{dx} = \frac{x + a}{b_0 + b_1x + b_2x^2}$$
must be integrated. Let us consider equation III. as a general expression for integration, then we notice that the form the integral takes depends on the particular values of the coefficients of $x$ in the denominator. The problem is, in fact, merely a consideration of the forms taken by the denominator for different values of $b_0$, $b_1$ and $b_2$, and the criterion for fixing the form in a particular case is, obviously, the same as that for the nature of the roots of the equation $b_0 + b_1x + b_2x^2 = 0$, viz.,
$$\frac{b_1^2}{4b_0b_2},$$
which, by substituting the values of the constants from equation III., becomes
$$\frac{\beta_1(\beta_2 + 3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)}.$$
9. If the criterion is negative the roots are real and of different sign (Type I.); if positive and less than unity they are complex (Type IV.); if positive and greater than unity they are real and of the same sign (Type VI.). This really covers all the cases, but just at the point where one type changes into another we can use a slightly simpler transition curve. Thus when the criterion is $\infty$ one root is $\infty$ (Type III.), when it is unity the two roots are equal (Type V.), while when it is zero the roots are equal in magnitude but of opposite sign (Type II.). The only other transition curve arises when $b_1 = b_2 = 0$, and the criterion is again zero (Normal Curve of Error, Type VII.).
10. The actual integration can now be considered.
Type I. The factors in the denominator, when the roots are real and of different signs, take the form
$$b_0 + b_1x + b_2x^2 = b_2(x + A_1)(x - A_2),$$
where $-A_1$ and $A_2$ are the roots $\dfrac{-b_1 \pm \sqrt{b_1^2 - 4b_0b_2}}{2b_2}$, the quantity under the root being positive, and the expression to be integrated is
$$\frac{x + a}{b_2(x + A_1)(x - A_2)} = \frac{1}{b_2}\left\{\frac{A_1 - a}{A_1 + A_2}\cdot\frac{1}{x + A_1} + \frac{A_2 + a}{A_1 + A_2}\cdot\frac{1}{x - A_2}\right\}$$
by partial fractions. The integration is now simple, and gives
$$\log y = \frac{A_1 - a}{b_2(A_1 + A_2)}\log(x + A_1) + \frac{A_2 + a}{b_2(A_1 + A_2)}\log(x - A_2) + \text{a constant},$$
or
$$y = y'(x + A_1)^{\frac{A_1 - a}{b_2(A_1 + A_2)}}\,(x - A_2)^{\frac{A_2 + a}{b_2(A_1 + A_2)}},$$
where $y'$ results from the constant introduced by integration. If the origin is now transferred to the mode (i.e., put $x$ for $x + a$), and $a_1$, $a_2$ are written for $A_1 - a$, $A_2 + a$, we have
$$y = y_0\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2},\qquad\text{where}\quad \frac{m_1}{a_1} = \frac{m_2}{a_2},$$
the form given in Table VI.
Type
II.
In
this
type a
= a.
in
Type L, and it
is,
therefore,
Type III. This type is reached when the criterion which happens when b 2 =Q,
is cc
a
1
ho
~l\
bx
bxx
/
+b
Qi
-^)Jlog(6i* + 6J+0
bl
and
or,
by
= y'
bl
(^ + W
(l+
5)
("
- ^)
bl
=w
x\y a
where a has a meaning different from that implied in Equation I. This type can be seen to be a particular case of Type I. when a 2 becomes infinite. Type IV. If the roots of the equation b -\-b x + b 2 x 2 = are complex, it is impossible to throw the denominator into and when this occurs, we have to integrate by real factors putting the expression on the right-hand side of the fundamental differential equation in the form
i)
X+c
fc 2
(X2 + A2)
bi
tT7-,
where
A.=x +
-7-
==-.
bi
26o
= a
dX
ana A~ =
.
b
=
26-,
o2
.
b{2
4o22
Then
loo
y=\b-(i?+A*)
=
=
u
\b 2 (X*+A>)
1
dX+
2
)
\x^ dX
c
-log(X2 +
A +
i *n--
-.tan
A.
1
X
-A.
+ constant.
2t>2
where a has a meaning different from that implied in equation I. The relation between this type and Type I. can be seen by
factorising the denominator of
differential equation, 62
of
the
and then obtaining an expression for y having the same form as Type L, but containing complex expressions.
Type V.
,
(X iA)(x + iA.)
In
this case,
fl
when
,
x+a
i(*
ftW" ft).
(-ft)
a
r
26 2
da;
01
= sM"
+
26)J
26 2
b2 (x
6,
a- ~26
2
r?
K\ + 2bJ
+ constant
^'('+ftK
= y xpe
Type VI.
y'
4 '^
The factorising
is
the same as
Type
I.,
but the
The work is then denominator take the form (x + A ) (a? + A2) the same, but at the end the origin is put not at the mode but so that one of the expressions x + Ai or x + A 2 can be written
2
.
as x.
The form
is
then
y
=y
(x
a) m 'X~ m
'-.
Type
VILPutting Ol =52 =0
x = ^,- + 2&
ax
-=
^0
2
constant
.
\-
constant
l2b
or,
of
negative,
11. The table on p. 47 gives a list of the curves, a description of their appearance and range, the position of the mode and the criteria. The values of $\beta_1$ and $\beta_2$ in the cases of Type II. and Type VII. can be seen to be required by examining equation III. The third moment about the mean must be very small (theoretically, zero) if the curve is symmetrical, and therefore $\beta_1 = 0$, and it is only when $\beta_2 = 3$ and $\beta_1 = 0$ that both the coefficients of $x$ and $x^2$ in equation III. vanish, this being the condition for Type VII.
12. It is now possible to state the steps to be taken in fitting a frequency-curve to statistics:
1. Arrange the statistics in sequence.
2. Calculate the moments about any convenient vertical.
3. Transfer the moments to the centroid vertical (vertical through the mean).
4. If there is high contact, apply Sheppard's adjustments to the moments (i.e., deduct $\frac{1}{12}$ and $\frac12\nu_2 - \frac{7}{240}$ from the second and fourth moments respectively).
5. Calculate $\beta_1$, $\beta_2$ and the criterion.
6. By means of the criterion decide which type of curve is to be used.
Table VI. gives a reference to the page on which the formulae for the constants of each curve in terms of the moments are to be found.
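Steps 5 and 6 can be sketched in Python as follows. The sketch follows the division of types by the criterion described in Art. 9 of Chap. IV.; the function names and the tolerance `tol`, used to recognise the transition curves, are assumptions made for the illustration and are not part of the text.

```python
def criterion(beta1, beta2):
    """The criterion b1^2 / (4 b0 b2) expressed in beta1 and beta2."""
    return beta1 * (beta2 + 3) ** 2 / (
        4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6)
    )


def pearson_type(beta1, beta2, tol=1e-9):
    """Type of curve indicated by the criterion (Roman numeral as text)."""
    if abs(beta1) <= tol:                      # symmetrical curves
        return "VII" if abs(beta2 - 3) <= tol else "II"
    denominator = (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6)
    if abs(denominator) <= tol:                # criterion infinite
        return "III"
    k = criterion(beta1, beta2)
    if k < 0:
        return "I"
    if abs(k - 1) <= tol:
        return "V"
    return "IV" if k < 1 else "VI"


example = pearson_type(0.5072949, 2.935110)
```

In practice the transition types would be chosen with a much wider tolerance, depending on the size of the experience.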
Table VI.

  Type.    Description of Curve.                                Criterion.
  I.       Limited range in both directions (skew)              κ negative
  II.      Limited range in both directions (symmetrical)       κ = 0, β1 = 0, β2 not = 3
  III.     Limited range in one direction (skew)                κ = ∞
  IV.      Unlimited range in both directions (skew)            κ > 0 and < 1
  V.       Limited range in one direction (skew)                κ = 1
  VI.      Limited range in one direction (skew)                κ > 1
  VII.     Unlimited range in both directions (symmetrical)     κ = 0, β1 = 0, β2 = 3

(Further columns: Equation of Curve; For calculation of Constants, see page ...)
CHAPTER V.
Calculation.
1. The next point to be considered is the calculation of the curves when the moments have been calculated and the type to be used has been decided. The formulas required for the numerical work will be given for each type, a numerical example, including the calculation of the graduated figures, will follow, and finally the proofs of the formulae.
2.
curves
Some general points relating to the calculation of the when the constants have been found may be
considered
here.
conveniently
When
the
constants
of x
are
for
any value
by
and if areas are required, some method of proceeding from ordiuates to areas must be found. The most simple is probably to calculate mid-ordinates, and then by the quadrature formula I. or II. find the areas. It is occasionally more convenient to calculate the ordinates at the beginning of each group, and then formula III. should be used. These
;
thus,
from
II.
we have
f*
J-i
from
I.
from
Formula II. is generally sufficiently accurate, while the others will be found to give a result true to five figures in ordinary
exceptional cases will be referred to in the numerical examples that follow. 3. It is sometimes a help to see the graduation expressed graphically, and this has been done with some of the examples.
cases
The
best
method
is
yQ
at the
mode
note the ends of the curve, and the heights of the ordinates
that have been calculated.
curve, which can be
In drawing the curve, as well as in calculating the constants, the sign of the skewness must be borne in mind, for it is possible to draw the curve with the skewness on the wrong side of the mode, and if the distribution is nearly symmetrical, it is not so easy to notice the mistake as it seems to be. The tangent
to the curve at the
mode
is
4. It
is
best to
distinctness,
draw on a rather large scale in order to gain and the curves given here were drawn larger
size
;
made
also
may
to conceal large
vertical
it
when
the curve
is
sometimes necessary to use more closely-ruled paper than that generally favoured by actuaries, and it can be procured in very convenient rulings from Messrs. W. G. Pye & Co., Granta Works, Cambridge.
5. The reader should notice that all the cases considered in the following pages assume complete distributions, and it is in
general only possible to find the curve from part of a distribution by means of successive approximation, which is extremely laborious. Another point, to which reference will again be made, is with regard to grouping statistics; it is sometimes impossible to obtain many groups, but for accuracy in finding moments the greater the number of groups the better, unless the number of cases is small. A little discretion is needed in this respect, but in actuarial statistics which are sometimes based on as many as 200,000 cases, seventy or eighty groups would not be excessive. In our examples we have grouped merely to save work, space and printing, and the grouping does not alter the method.
6. Another matter with which it seems advisable to deal here is the criterion κ. This may have any value from $-\infty$ to $+\infty$, and from the following diagram it will be seen how the types correspond to its values:

  κ = -∞            Type III.
  κ negative        Type I.
  κ = 0             Type II. (when β2 not = 3)
  κ > 0 and < 1     Type IV.
  κ = 1             Type V.
  κ > 1             Type VI.
  κ = ∞             Type III.
Just before
/c
Type
is
and
passed
is
II.,
If by a mistake a student should use the wrong type he will necessarily find his mistake by reaching an imaginary in one of the square roots which occur in the equations for the constants, but transition types can be used when the criterion is close to the theoretical values; they can, in fact, be viewed as ... Whether one is justified in using a transition type ...; the justification depends on the size of the probable error of the function dealt with, but in practice one can be guided to a great extent by the size of the experience; if there are few cases a larger deviation in the criterion will arise than if there are many. It would probably be sufficiently accurate to use Type III., provided κ was arithmetically greater than 4; individual cases must be considered on their merits, but if the student finds himself in doubt he should avoid using the transition type as he will then be on the safe side in the matter of accuracy.
7.
In the formulae that are given for the various types, the
//,3
.
If
the frequency
is
than after
fi 3
it,
;
the
mode
is
is
positive
therefore depend on the signs of fa in order that the mode and mean may lie in their correct relative positions. Where,
is
made
is
implied, and
become
.
easier to follow
Thus, if we imagine and the other a negative value for /x 3 the frequencies in the example for Type I. to be written in the opposite order 1, 3, 7, 13, &c, all the numerical work would be the same, but raj would be 2'776978, ra 2 = '409833, Oi = 13*52728, and a 2 =1*99638, and the graduation would be the same, but the numbers in the columns of the table on p. 56
would run
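$\nu'_1 = d$
$\nu_2 = \nu'_2 - d^2$
$\nu_3 = \nu'_3 - 3d\nu_2 - d^3$
$\nu_4 = \nu'_4 - 4d\nu_3 - 6d^2\nu_2 - d^4$
or
$S_2 = d$
$\nu_2 = 2S_3 - d(1+d)$
$\nu_3 = 6S_4 - 3\nu_2(1+d) - d(1+d)(2+d)$
$\nu_4 = 24S_5 - 2\nu_3\{2(1+d)+1\} - \nu_2\{6(1+d)(2+d)-1\} - d(1+d)(2+d)(3+d)$
Sheppard's adjustments when the curve has high contact:
$\mu_2 = \nu_2 - \frac{1}{12}$, $\mu_3 = \nu_3$, $\mu_4 = \nu_4 - \frac12\nu_2 + \frac{7}{240}$
$\sigma$ (standard deviation) $= \sqrt{\mu_2}$
$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$, $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$
$\kappa = \dfrac{\beta_1(\beta_2+3)^2}{4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)}$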
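The formulae above translate directly into a short routine. The Python sketch below (function names hypothetical) passes from the sums $S_2, \ldots, S_5$ to the moments about the mean, and applies Sheppard's adjustments separately, since they are only wanted when there is high contact.

```python
def moments_from_sums(S2, S3, S4, S5):
    """d and the unadjusted moments nu2, nu3, nu4 about the mean."""
    d = S2
    nu2 = 2 * S3 - d * (1 + d)
    nu3 = 6 * S4 - 3 * nu2 * (1 + d) - d * (1 + d) * (2 + d)
    nu4 = (24 * S5
           - 2 * nu3 * (2 * (1 + d) + 1)
           - nu2 * (6 * (1 + d) * (2 + d) - 1)
           - d * (1 + d) * (2 + d) * (3 + d))
    return d, nu2, nu3, nu4


def sheppard(nu2, nu3, nu4):
    """Adjusted moments mu2, mu3, mu4 (use only with high contact)."""
    return nu2 - 1 / 12, nu3, nu4 - nu2 / 2 + 7 / 240


# Sums of the Type I. example (no adjustment, as there is not high contact).
d, nu2, nu3, nu4 = moments_from_sums(5.175, 19.809, 64.389, 186.638)
beta1 = nu3 ** 2 / nu2 ** 3
beta2 = nu4 / nu2 ** 2
```

With the sums found about the central term in Chap. III. (0.48, 1.972, 2.448, 3.883) the same routine returns the moments quoted there to within the last figure or two.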
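TYPE I.
$$y = y_0\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2},\qquad\text{where}\quad \frac{m_1}{a_1} = \frac{m_2}{a_2}.$$
FORMULAE.
The values to be calculated in order are given by
$$r = \frac{6(\beta_2 - \beta_1 - 1)}{6 + 3\beta_1 - 2\beta_2}$$
$$a_1 + a_2 = b = \tfrac12\sqrt{\mu_2}\,\{\beta_1(r+2)^2 + 16(r+1)\}^{\frac12}$$
$m_1$ and $m_2$ are given by
$$\tfrac12\left\{r - 2 \pm r(r+2)\sqrt{\frac{\beta_1}{\beta_1(r+2)^2 + 16(r+1)}}\right\}$$
and
$$a_1 + a_2 = b,\qquad\text{and}\qquad a_1 \div m_1 = a_2 \div m_2$$
$$y_0 = \frac{N}{b}\cdot\frac{m_1^{m_1}\,m_2^{m_2}}{(m_1 + m_2)^{m_1 + m_2}}\cdot\frac{\Gamma(m_1 + m_2 + 2)}{\Gamma(m_1 + 1)\,\Gamma(m_2 + 1)}$$
A table of $\Gamma$ functions is required (see Appendix II.).
$$\text{Skewness} = \tfrac12\sqrt{\beta_1}\;\frac{r+2}{r-2}$$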
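The Type I. formulae may be sketched in Python as below. The function names are hypothetical; `math.lgamma` takes the place of the table of $\Gamma$ functions; and it is assumed that $m_1$ and $m_2$ are both positive and that the smaller exponent is to be called $m_1$ when $\mu_3$ is positive.

```python
from math import exp, lgamma, log, sqrt


def type_one_constants(mu2, beta1, beta2, mu3_positive=True):
    """r, b, m1, m2, a1, a2 for a Type I. curve."""
    r = 6 * (beta2 - beta1 - 1) / (6 + 3 * beta1 - 2 * beta2)
    root = beta1 * (r + 2) ** 2 + 16 * (r + 1)
    b = 0.5 * sqrt(mu2) * sqrt(root)
    spread = r * (r + 2) * sqrt(beta1 / root)
    small, large = 0.5 * (r - 2 - spread), 0.5 * (r - 2 + spread)
    m1, m2 = (small, large) if mu3_positive else (large, small)
    a1 = b * m1 / (m1 + m2)      # so that a1 + a2 = b and a1/m1 = a2/m2
    a2 = b * m2 / (m1 + m2)
    return r, b, m1, m2, a1, a2


def type_one_y0(N, b, m1, m2):
    """Ordinate at the mode, worked through logarithms (m1, m2 > 0)."""
    log_y0 = (log(N) - log(b)
              + m1 * log(m1) + m2 * log(m2)
              - (m1 + m2) * log(m1 + m2)
              + lgamma(m1 + m2 + 2) - lgamma(m1 + 1) - lgamma(m2 + 1))
    return exp(log_y0)


r, b, m1, m2, a1, a2 = type_one_constants(7.662375, 0.5072949, 2.935110)
y0 = type_one_y0(1000, 15.52366, 0.409833, 2.776978)
```

The second call uses the constants as printed in the numerical example, so that $y_0$ may be compared with the figure given there; constants recomputed from rounded moments will differ slightly in the later decimal places.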
NOTES.
mi
is
yLt 3
is positive,
and
when
yu, 3
is
negative.
is
negative, which
to that given in
Type
III.
it
starts at infinity,
is
though the
difference
is
ordinate
that
in
infinite,
I.
means that the curve has the numerical example of and falls rapidly so that The the area is finite.
;
Type
the
point, while in
Type
is
In this case a little care is needed in taking out the $\Gamma$ function, for $\Gamma(t)$ is required where $t < 1$; the tables give $\log\Gamma(1 + t)$, i.e., $\log t + \log\Gamma(t)$. If both $m_1$ and $m_2$ are negative, a U-shaped curve is obtained.
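EXAMPLE.
As an example of Type I., the statistics of Example II. of Table I. may be used. The moments were found by Mr. Hardy's Summation Method (see Chap. III., Art. 9) in the following form:

  Central Age   Exposed to Risk     First     Second      Third      Fourth
  of Group.     (Example II. of     Sum.      Sum.        Sum.       Sum.
                Table I.)
      17              34            1,000     5,175      19,809      64,389
      22             145              966     4,175      14,634      44,580
      27             156              821     3,209      10,459      29,946
      32             145              665     2,388       7,250      19,487
      37             123              520     1,723       4,862      12,237
      42             103              397     1,203       3,139       7,375
      47              86              294       806       1,936       4,236
      52              71              208       512       1,130       2,300
      57              55              137       304         618       1,170
      62              37               82       167         314         552
      67              21               45        85         147         238
      72              13               24        40          62          91
      77               7               11        16          22          29
      82               3                4         5           6           7
      87               1                1         1           1           1

  Totals           1,000            5,175    19,809      64,389     186,638

$S_2 = 5175 \div 1000 = 5.175$
$S_3 = 19809 \div 1000 = 19.809$
$S_4 = 64389 \div 1000 = 64.389$
$S_5 = 186638 \div 1000 = 186.638$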
By means of the formulae on p. 21 to find the moments about the centroid vertical, and, as in this case no adjustments* are to be made in the moments (the $\nu$'s and $\mu$'s are the same because there is not high contact), we have
$\mu_2 = 7.66237$
$\mu_3 = 15.1069$
$\beta_2 = 2.935110$
From the values of $\beta_1$ and $\beta_2$ the criterion (κ) can be calculated, and its value shows that Type I. must be used (see Table VI.).
$r = 5.186811$, $\log r = 0.7149004$
$r + 1 = 6.186811$, $\log(r+1) = 0.7914669$
$r + 2 = 7.186811$, $\log(r+2) = 0.8565363$
$r - 2 = 3.186811$, $\log(r-2) = 0.5033563$
The values of $\log(r+1)$, &c., were checked by Gauss-logarithm table.
$b = 15.52366$
$m_1 = 0.409833$
$m_2 = 2.776978$
$a_1 = 1.99638$
$a_2 = 13.52728$
Mean - mode = 2.223116
It will be noted that the expression $\{\beta_1(r+2)^2 + 16(r+1)\}^{\frac12}$ occurs in both the values of $b$ and $m$. The mean is at age $12 + 5.175 \times 5 = 37.8750$, and the mode at age $37.8750 - 2.223116 \times 5 = 26.75942$.
The skewness is 0.8032.
* In work which has a permanent object depending on a considerable degree of accuracy, ... In the examples given it was simply done to save labour, and the original reasons for which the curves were calculated did not require extreme accuracy. If we had not grouped our statistics we should have reduced the error resulting from our not knowing the best adjustments to use in cases in which there is not high contact.
b6
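The calculation of $\log y_0$ is as follows:
$\log N = 3.00000$
$\operatorname{colog} b = \bar{2}.80901$
$m_1 \log m_1 = \bar{1}.84123$
$m_2 \log m_2 = 1.23179$
$\log \Gamma(r) = 1.50406$
$\operatorname{colog} \Gamma(m_1 + 1) = 0.05219$
$\operatorname{colog} \Gamma(m_2 + 1)$, found from $\log 2.776978 + \log 1.776978 + \log \Gamma(1.776978)$,
the last value being taken from the table at the end of the book. The work to this point gives as the curve for graduating the statistics
$y = 149.471 \left(1 + \frac{x}{1.99638}\right)^{0.409833} \left(1 - \frac{x}{13.52728}\right)^{2.776978}$,
where the origin is at the mode and the unit is five years.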
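The logarithmic work for $y_0$ can be checked with a short sketch. It assumes the standard Type I. normalisation, $y_0 = \frac{N}{b}\,\frac{m_1^{m_1} m_2^{m_2}}{(m_1+m_2)^{m_1+m_2}}\,\frac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\Gamma(m_2+1)}$, which is consistent with the terms listed above but is not printed in full in this passage; the function name is hypothetical.

```python
import math


def type1_y0(n, b, m1, m2):
    """Ordinate at the mode of a Type I. curve (assumed standard form)."""
    log_y0 = (
        math.log(n)
        - math.log(b)
        + m1 * math.log(m1)
        + m2 * math.log(m2)
        - (m1 + m2) * math.log(m1 + m2)
        + math.lgamma(m1 + m2 + 2)  # Gamma(r), since r = m1 + m2 + 2
        - math.lgamma(m1 + 1)
        - math.lgamma(m2 + 1)
    )
    return math.exp(log_y0)


print(type1_y0(1000, 15.52366, 0.409833, 2.776978))
```

With the constants of the example the result is close to the 149.47 given in the text.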
The following
col (6)
Age
1+
*
a
\
1(3)
"
i
log- (2)
log
:;
(7)
Vx
(1)
(2)
(4)
(5)
(6)
(7)/
(8)
(9)
17
22 27 32 37 42 47 52 57 62 67
72 77
114429
1-07037 99614 92252 81859 77466 70074 62681 55289 47896 40501 33111 25719 18326 10934 03541
352865 402956
4-530*7
503136
5-53229 6-03320 6-53411 7-03502
82 87 92
753593
2-31792 1-71866 0-01034 18327 30662 40257 48111 51760 60526 "65615 70169 74291 "78055 81519 84726 87714
0-05854 02955 1-99815 96198 92870 S8911 84556 79714 74264 68030 60750 51997 41025 26307 03878 2-54913
1-3229
1-8847 0-0042 0751 1257 1650 1972 2241 2481 2689 2876 3045 3199 3341 3472 3595
04626
0821 1-9957 9027 8020 6921 5711 1367
1-6601
2
1404
21745
2-1525 2-1023
1266
107-6 87-7 68-5 51-0 36-0
20317
1-9429 1-8357 1-7080 1*5557 1-3722 1-1461
2S53 1122
29100
6670 3623 3-9535 3307 5-9709
236
14-0
8568
4622 1-8525
72
2-9
7
3-5050
57
Cols. (2) and (3) have a constant first difference, viz., $1/a_1$ or 0.500907, and $1/a_2$ or 0.073925. The value at any point having been calculated and checked, the other items are formed continuously. Cols. (4) to (9) explain themselves, but we may remark that it is generally advisable to use a larger number of decimal places if $m_1$ or $m_2$ is large. A little care is necessary in multiplying negative logarithms by $m_1$ (0.409833). If an arithmometer is used, $m_1$ is put on the plate, and is multiplied by 0.28134, and the result, $-0.1153$, must be put in the form $\bar{1}.8847$, to enable us to add it to other logarithms. Col. (10) gives the area, and was formed by applying one of the formulae on p. 48. The area of the first group must be treated separately, as the curve starts at age 16.7775, and the base of the group is therefore 2.7225 in length, instead of 5 years as in the other cases.
A good way to find the area is to calculate the ordinates for the middle and ends of the base, and apply Simpson's rule, viz.: $\int y_x\,dx = \frac{1}{6}\{y_0 + 4y_{1/2} + y_1\}$, remembering to multiply by $\frac{2.7225}{5}$ to allow for the base. The mid-ordinate is 92.1, the ordinate at the end of the base is 116.5, and the ordinate at the start is of course zero; the area is approximately
$\frac{2.7225}{5} \times \frac{1}{6}\{0 + 4 \times 92.1 + 116.5\} = 44.$
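The area of the first group can be re-worked as a rough check. The sketch below evaluates the fitted Type I. curve at the start, middle and end of the shortened base and applies Simpson's rule; the constants are those of the example, while the function names and default arguments are hypothetical conveniences.

```python
def type1_ordinate(age, y0=149.471, a1=1.99638, a2=13.52728,
                   m1=0.409833, m2=2.776978, mode_age=26.75942, unit=5.0):
    """Ordinate of the fitted curve at a given age (origin at the mode)."""
    x = (age - mode_age) / unit
    left = 1 + x / a1
    right = 1 - x / a2
    if left <= 0 or right <= 0:
        return 0.0  # outside the range of the curve
    return y0 * left ** m1 * right ** m2


def simpson_area(start, end, unit=5.0):
    """Simpson's rule over one group, with the base expressed in units."""
    mid = (start + end) / 2
    base = (end - start) / unit
    return base / 6 * (type1_ordinate(start)
                       + 4 * type1_ordinate(mid)
                       + type1_ordinate(end))


print(type1_ordinate(18.13875), type1_ordinate(19.5), simpson_area(16.7775, 19.5))
```

The three printed values correspond to the mid-ordinate, the end-ordinate and the area of about 44 mentioned in the text.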
PROOF OF FORMULA*
The equation
to
the
curve
is
= ijJl+'
J
M
J
m m where =
,
y
a2
Let
a.,
=b
and
=
|
+d
The reader
-who lias
little
T and
Appendix
II.
= a, to a?= +
is
//
- (a i
a?)
'
(a 2
x)
"'-
d.v
Jo
ft 2
"
_N
y ~~ 6
'
m,
(
ffi
'wi2'"=
'
+ ~m^ + '^
'
I>ir+l)r(7?i 2 +I)
Using the same method for the moments as that just given we see that the nth moment, about the line parallel to the axis of y through a?= a is
for the area,
Y ,
tt]
'fl 2
"
th^aj^
y (m
+m
2)
m^ 'W2m
=
(jy
Now,
since r(p)
l)r(pl),
+ l)
2
the
moments about
the
:
through a?=
are as follows
, l
bjm.
x
m + ra + 2
y(m + l)(m + 2)
1 1 7 (
A6
2=
wi,
and
so on
in order to get
GO
m =m +l
/
1 1
and
m 2 =m2 +l
,
and
= m\+iu'
h~m
r (r
2
vi)b.,
+ l)
x
+ !)( + 2)
r
]
^4 ~~~
We
p.
53 by writing
/3 l
=^r
2
j32
and
em\m 2
r
then
Pl
01
&(r + 2) 2 _
4(r + l)
~T
and
2t*
e
3(r
l)
Eliminating
we
e
find
2
ft(r+2) 2(r+l)
&(r + 2)Q+3) _
3(r + l)
(&-A-l)
'
3^ -2ft + 6
Using
this value in the equation ^j-^
^- =
2)
'
+ 4+Wr+1
4^'
for
^
e
at once
em\m The a = (a bm
2
.
distance between
/
the
which can be easily reduced i)-T-(m'i + m' 2 form given. A general value (regardless of type) for the distance was given in Chap. IV. Art. 5.
x
/j/ 'i
) J
to the
01
TYPE
II.
-0-5)
FORMULAE.
2(3-/32 )
a-
JW3
3- /3
2 2
62
/3]
=
/jl 3
ia
therefore
= 0.
is is
symmetrical, and
clear that
mi=m
to
r may
If
rises
a
it
maximum and
again to zero;
but
if
is
negative,
and then
EXAMPLE.
In the discussion that followed the reading of Mr. Lidstone's paper on Endowment Assurances, Mr. G. F. Hardy said that "the errors in the successive groups formed a curve very
similar to the
normal curve
and the
series in question is a
"Mean Age"
Method.
0- 4 5- 9
11
116
1,683
Moments were
found for the
first
= 19-992573),
$\mu_2 = 1.829172$, $\mu_3 = 0.120452$, $\mu_4 = 8.52636$, $\beta_1 = 0.0023706$, $\beta_2 = 2.548313$, $\kappa = -0.007492$, which shows that Type II. can be used.
The equations give $m = 4.141766$, $a = 4.543079$, $y_0 = 462.57$.
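The Type II. constants can be reproduced with the following sketch. It assumes the usual symmetrical form $y = y_0(1 - x^2/a^2)^m$ with $m = \frac{5\beta_2 - 9}{2(3 - \beta_2)}$, $a^2 = \frac{2\mu_2\beta_2}{3 - \beta_2}$ and $y_0 = \frac{N\,\Gamma(m + \frac{3}{2})}{a\sqrt{\pi}\,\Gamma(m + 1)}$; these expressions are stated here as assumptions because the printed formulae are not legible in this passage. The function name is hypothetical.

```python
import math


def type2_constants(n, mu2, beta2):
    """Return (m, a, y0) for a symmetrical Type II. curve (assumed forms)."""
    m = (5 * beta2 - 9) / (2 * (3 - beta2))
    a = math.sqrt(2 * mu2 * beta2 / (3 - beta2))
    # Area of (1 - x^2/a^2)^m over (-a, a) is a*sqrt(pi)*Gamma(m+1)/Gamma(m+3/2).
    y0 = n * math.exp(math.lgamma(m + 1.5) - math.lgamma(m + 1)) / (
        a * math.sqrt(math.pi)
    )
    return m, a, y0


print(type2_constants(1683, 1.829172, 2.548313))
```

With $N = 1{,}683$ and the moments of the example, the three values agree closely with those given above.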
The mean and mode coincide, because the curve is symmetrical. For calculating a series of values, the following arrangement is convenient:
(1) $x/a$ | (2) $\log(1 + x/a)$ | (3) $\log(1 - x/a)$ | (4) $= (2) + (3)$ | (5) $= m \times (4) + \log y_0$
It is easier to
work
in this
way than by
calculating values of
1
at
the
beginning,
middle,
and end
of
each
group,
and
y dx=-{ij
Group.
+ 4iA + y
Areas.
Mid-ordinates.
0- 4
5- 9 10-14 15-19
u
104 287 440 440 287 104
11
1,683
A comparison of the areas and mid-ordinates gives an idea of the error involved in using one for the other; the differences are largest at the "tails" and near the mode.
=- 272283, and
>3, and so a 2 and m are m'= m, in such a case the equation / x2 \ to the curve becomes y = yJl + -^ and the value of y can J
sometimes happens that
;
/3 2
negative
if a/ 2
= a
and
'"
best be found
by
64
+ *
i
x 2 \- r
'
r
[
J
-
x2 \" m
then putting In
^ =z
a
showing that
N=
P
Jo
v .a'(l
z)~^w/ "
^=
ay
B(m
J,
J)
by
Appendix
II.,
or y
= -,
'-i/m)
we
In a similar manner
value for
?/
to that given
Vo
=N
a
r(m+l
s/nrTim+l
TYPE
III.
*-*f*( i+
iF
FORMULAE.
2/x.,
fl3
2/A2
/X 3
fJL^
2/Xo
Mode = Mean
-^
NOTES.
If
is
positive, the
is
like that
shown
in the
example
it
of
Type
but instead of
ending at a
fixed point,
goes to
infinity.
EXAMPLE.
The following
statistics are
Transactions of the Actuarial Society of Edinburgh, vol. iv., p. 44, and give the numbers of wives tabulated for the ages
of mothers,
and according
to years
since
marriage.
The
Number
of Wives.
Graduated by Type
Curve.
III.
2 3
4
5 6 7 8
44 135 45 12 8
3
1
59 111 45
20
9 4
2
1
Total
251
251
The mean is 0.3346612 after the middle of the second group, and the moments about the centroid vertical are 1.441787, 3.606622, and 18.93221, so that $\kappa = -8.44$. As this value was large, Type III. was used, and
$\gamma = 0.7995221$, $p = -0.0783584$, $a = -0.098007$, $y_0 = 214.8$.
This example is given because it is one which shows a ... At first sight, a curve starting at zero, rising to a maximum, and then falling, might be expected.
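The constants $\gamma$, $p$ and $a$ follow directly from the second and third moments. The sketch below assumes the relations $\gamma = 2\mu_2/\mu_3$, $p = \gamma^2\mu_2 - 1$, $a = p/\gamma$ and mode $=$ mean $- \mu_3/(2\mu_2)$, which reproduce the figures of the example but are not legible in the formulae as printed here; the function name is hypothetical.

```python
def type3_constants(mu2, mu3, mean=0.0):
    """Return (gamma, p, a, mode) for a Type III. curve (assumed relations)."""
    gamma = 2 * mu2 / mu3
    p = gamma ** 2 * mu2 - 1
    a = p / gamma
    mode = mean - mu3 / (2 * mu2)
    return gamma, p, a, mode


print(type3_constants(1.441787, 3.606622))
```

A negative $p$, as here, corresponds to a curve whose ordinate is greatest at the start, which is why the expected rise to a maximum does not appear.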
In
reality,
we
The mode
case i^_
at
in ordinary cases of
so the
given by
mean
^3
In this
start
=1*25075
"
mode would be
at "58391,
{" mode
67
so that the first
group
is
made up
of
a strip
first
roup, though, of course, any ordinate read off within the oroup would be larger than any ordinate in the second
croup.
[Figure: Type III.]
PROOF.
viz.,
y = yJl+-)
if
e~yx, put
ya=p, and
frequency,
substitute
for y{a
+ x);
then,
be the total
ij
zPa-iJ e- z+ Py~ P +]
{
dz for -/
cto
=7
Jo
z p e~ zdz
\i4
=y<y+i I Tp+ 1
This
o-ives
?/
=45
71
7-r(p+i)
by using the value
Since
of
found above.
the
T(p)={p-l)T{p-l),
first
moment
29
+
,
is
the
(^H^^A
7 work,
it is
necessary
have moments about the centroid vertical, the position of which (the mean) can be found and as, by definition, the first moment about it is zero, we get
;
yu-3
and
-^-|
/^3
respectively.
TYPE
IV.
y=yo(i+j)
-^'
e
FORMULA.
6(ft-ft-l)
2^-3/9,-6
v/{16(r-l)-A(r- 2) 2 }
= Vl|v/ {16(r-l)-A(r-2)^}
y ~ aG(r,i/)
Sk.=
The
origin
is
/Tr-2
|V^I + 2
is
-
i.e.,
origin
= mean H
va
2/jL2 (r
+ 2)
i2f ,+
</>)'
N Ire r
/
3r
l/o
2-7T
(COS
is
NOTES.
/z 3
and
have opposite
signs,
i.e.,
when
to
/jl3
is
positive
v is negative.
is
put
it
in the
form
=y
'
Then 6
is
taken as
this
must
be
If
equidistant
little
is
be calculated accurately,
we had good
tables
of
EXAMPLES.
The
Sutton's
numbers
Sickness
of
in
the
following
nearly
risk
of
symmetrical
sickness
distribution represent
the
exposed
(males
is
to
all
by
the
Tables
durations)
when
number
weeks' sickness
No. Exposed.
10
15
20 25 30 35 40
-45
50
55 60 65 70 75 80 85 90 95
610 255 86 26
8 2
1 1
604 274
102 32 8 2
1
9,154
9,154
This group has been taken as the area of the rest of the curve.
Mean $= 44.5772339$, $\beta_2 = 3.169897$, $\kappa = 0.0125$.
Type IV. was used because, as there is a large number of cases, the probable error of $\kappa$ will be small (see Chapter VIII.).
$r = 40.12143$, $\nu = 4.450399$ (positive because $\mu_3$ is negative), $a = 13.39152$, $m = 21.06072$, Sk. $= -0.03313$.
When
is
changed
Ihe origin
=mean +
= 52-504394
The mode, which is wanted if the curve is drawn, is at 44.92989.
As
,
is
9=
4-450398 40-12143
(
log cos
approximate form for y was used, 8925 . or 19 n 9 = lo 8" tan 6 01Q/ hence g 11537 1*9973446, and from this y is found to be
large
the
'
'
'
273*3649.
The value was checked by Dr. Alice Lee's tables (see Appendix V). The calculation of ordinates by the double process is as
follows
:
iii
years of age.
4'450398 01og lo
<?
logy
0
1
243675
1-1687 2-3382
27337
251-38
T96637
1-93253
1-99721
2-40033 2-35813
f-98885
228-10
72
tan
directly as x
is
of
required in years,
The fourth
column
formed by multiplying L cos 6, and the third is negative, the fourth continuously by addition. When column has to be subtracted from the fifth i.e., it ceases In each case the sixth to be negative and becomes positive. is formed from the fourth + the fifth + log y If the calculation is made directly, the following columns would be required
:
^(i%:)
(3)
tan- 1 a
&c.
col (4)
in degrees,
in circular
Co1
,
5)
,xcol(3)
+ (6) +
au
(7)
measure
(5) ()
(7)
g
(9)
(i)
(2)
(4)
(8)
Col.
2 )
(2)
can
be
tan"
formed
1
best
by
differences
since
A(1+X = 2X+1,
of the tangents
of angles inversely.
A table helpful for obtaining col. (5) from col. (4) will be found on pp. 251 to 262 of Chambers' Mathematical Tables (1897 edition). When drawing a curve of this type the position and height of the mode can be noted and then corresponding points inserted, e.g., $x = +1.1687$ and $y = 251.38$. Care must be taken to give the curve its maximum at the right point.
[Figure: Type IV.]
PROOF.
In
V = yoU
#
+
a
)i-
tan0=
= tan -1 -
and
2
+ PYl
0)
-=cos 2w 0,
vB 2m y =y cos 6e-
Now
N=|
77
y {l
+ ^|
I ' fl
e-'^-^da?
=
tan0 =
?/
7T
cos-" l9 e~
cos 2
oxdO. by substituting
sec -6/=
so that
a
7T
dd
,.
=a
v9
cos
^
2
6/
=y a
cos *0e
I
dd where r = 2 ra 2
= y ae
sine/) for
'
sin r <
J o
e'"^c?
(/) ,
substituting
cos
so that
(j>
+ ^7r = 6 and
origin
=yQae-?vG;(r,v),
The nth moment about the
1
00
is
If 00
7T
=n.L^\
l = ==
1+
rh
~ Wi
e
'
&
substituting as above
ra
y
7T
a'
i+1
cos 2m_2
tan"0e-"W by
= iha~
2
1
cos'-" 6
cos r+w +
7'
&n*6e-d0
1
yoan+1 r
sin^-
^-^
71+ 1
2
]
I
p (sin*~ +
r
=
71
cos0e-*(n-l)-
ve~
'
sin""
*?)
;
<Z0
I-)
by integrating by parts and. treating sm n ~ 0e~ v0 as one part and cos r ~ n 6 sin as the other, and remembering that
1
PfK rn+1
J
rn + 1
Wow, since
cos
> n +
l
~ sin n 1 0e~ v0
=O
when
becomes
IT
or
-^
we have
rr
^
a
)J
- v
,
-VCO&r-n+W&mn-ide-'Odd
rn + 1
Further,
7T
//
a-
cos^tan^e-^W
= tr \-\" Nr[
ll
1
n=l
in
the
/x
because
N= ya
cos '0a~ ve d0
}
7T
Using the last result with the formula for the th in terms of the two previous moments, and remembering that
fi'
is
unity,
'
ii=
~r(r-l)(r-2j
(8r
-2 + ^
<8,(r
r(r-i)(r-g Xr-8)
,i
to
the
centroid
vertical,
we
76
or
(r-l)(r-2)
2
z;
3a 4 (r 2 +
fit-
(r
+ 6)
,2
>
(?-
+ v -8r2 }
2
)
(r-l)(r-2)(r-3)
2
If
now, we put
for
z/
2
,
and write
as before,
A-&
we have,
and
A=,
8
z
and
2{,-l)
A(r-2)(r-3)
8(r-l)
"- r +
r 2, we
1)
8r
7'
have
'and
6(13,-/3,2/3,-3/3,
-6'
A(r-2)
16(r-l)
Finally, since
at once.
v-
?:r
2
,
on
p.
69 follow
ordinate
maximum
mode
is
is
such that
dx
is
i.e.,
J
{
a?)
cases, x
L
,
^
.
aJ
,
is
zero.
of x such that
=2
the
mean from
the
= x x = -fco and a value The distance of + - is zero, or x= =2m a origin is /x\ or -, and, therefore, the
mean and mode
is
-.
r(r
2va ^.
+ 2)
which
77
p. 69,
when
on the same page, are inserted. It will be useful to give another example of the calculation of $y_0$ for curves of this type, and we may take a curve in which $r = 29.590$, $\nu = 19.886$, $a = 13.650$, $N = 2162$. $\therefore \tan \phi = 0.67205$, $\cos \phi = 0.82998$, $\log \cos \phi = \bar{1}.91907$, and $\phi = 33^\circ 54'$, which in circular measure is 0.59172.
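The terms $\log N$, $\operatorname{colog} a$, $\frac{1}{2} \log r$ and $\operatorname{colog} \sqrt{2\pi}$ are added to the following:
$\frac{\cos^2 \phi}{3r} = 0.00776$
$-\frac{1}{12r} = -0.00282$
$-\phi\nu = -11.76700$
Sum $= -11.762$, which multiplied by $\log_{10} e$ gives $\bar{6}.89183$
$\operatorname{colog} (\cos \phi)^{r+1} = 2.47564$
$\log y_0 = \bar{1}.90367$
$y_0 = 0.80107$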
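The approximate calculation of $y_0$ can be followed in a few lines. The sketch assumes that the terms listed above combine as $y_0 = \frac{N}{a}\sqrt{\frac{r}{2\pi}}\;\frac{\exp\{\cos^2\phi/(3r) - 1/(12r) - \phi\nu\}}{(\cos\phi)^{r+1}}$ with $\tan\phi = \nu/r$; this reading is inferred from the worked figures and is not a quotation of the printed formula. The function name is hypothetical.

```python
import math


def type4_y0_approx(n, a, r, nu):
    """Approximate y0 for a Type IV. curve when r is large (assumed form)."""
    phi = math.atan(nu / r)
    exponent = math.cos(phi) ** 2 / (3 * r) - 1 / (12 * r) - phi * nu
    return (n / a) * math.sqrt(r / (2 * math.pi)) * math.exp(exponent) / (
        math.cos(phi) ** (r + 1)
    )


print(type4_y0_approx(2162, 13.650, 29.590, 19.886))
```

For the second example this gives a value close to the 0.80107 found by logarithms.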
accurate for
If,
The form
just considered
is
sufficiently
all
is
however,
by
2i/7re-W>+l)
n= f Product (1
+
4
r 2 -\-v 2
< 1+ l)
TYPE
V.
y=yQx
e~y
,x
FORMULA.
y={p-2)y/'/ i {p-Z)
jL
l/o-
V
Origin = Mean
p2
^~
Mode = Mean
27
p(p-2)
yu, 3
The sign
of
is
79
EXAMPLE.
The following
paper "
series of deaths is
On
262-8)
Ages.
Deaths.
Graduated by Type V.
30-34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70-74 75-79 80-84 85-89 90-94 95-99
100, &c.
1 5
8 12
3 6
2,162
2,162
The mean
&c., are
is
at
$\mu_2 = 3.573346$, $\mu_3 = -4.752613$, $\mu_4 = 51.02583$, $\beta_1 = 0.4950399$, $\beta_2 = 3.996134$, $\kappa = 0.85$
Strictly speaking, Type IV. should be used, but the value of $\kappa$ is not very far from unity, and the following Type V. constants were found:
p=
7=
\ogy =
37-29145
/x3 is)
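The value of $p$ can be checked from $\beta_1$ alone. The sketch below assumes the Type V. form $y = y_0 x^{-p} e^{-\gamma/x}$, for which $\beta_1 = 16(p - 3)/(p - 4)^2$ and $\mu_2 = \gamma^2/\{(p - 2)^2(p - 3)\}$, so that $p = 4 + \{8 + 4\sqrt{4 + \beta_1}\}/\beta_1$ and $\gamma = (p - 2)\sqrt{\mu_2(p - 3)}$ with the sign of $\mu_3$. The function name is hypothetical.

```python
import math


def type5_constants(mu2, mu3, beta1):
    """Return (p, gamma) for a Type V. curve (assumed relations)."""
    p = 4 + (8 + 4 * math.sqrt(4 + beta1)) / beta1
    # gamma takes the sign of the third moment.
    gamma = math.copysign((p - 2) * math.sqrt(mu2 * (p - 3)), mu3)
    return p, gamma


p, gamma = type5_constants(3.573346, -4.752613, 0.4950399)
print(p, gamma)
```

The first value agrees with the $p$ of the example; $\gamma$ comes out negative because $\mu_3$ is negative.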
The approximation
origin
is
1) was used,
The
at
80
were
X
(I)
log.r
log
y = antilog
(6)
(3)
(5)
of
by
obtained, of course,
from a table of reciprocals. The point to be borne in mind in drawing a curve of this type is that, as the mode and origin are not at the same place, care must be taken to give the maximum ordinate its right position and magnitude (cf. Type IV.). The graduated figures agree fairly closely with the original statistics below the 90-94 group, but are unsuitable for that and the two later groups. The reason is that Type IV. should be used, and curves of Type V. have a range limited in one direction, while Type IV. curves have an unlimited range. The particular case was chosen partly because an example in which $\mu_3$ is negative is rather more awkward than one in which it is positive.
In such cases
it
is
a good check
(in
to
imagine the
4, 18, 53,
order
this
case
&c), and
[Figure: Type V.]
PROOF.
Putting
to
L
=z
in
=y
e~y'\c~i\
and integrating
from
x we
,
have
2sr
yo=
Using the same
orio-in is
p-l
r( P -T)
moment about
the
Jo
r(p-n-l)
1
r(p-i)
This gives
ii\
= *r. pl
mean
and. origin,
which
is
**- (p-2)(p-3)
^
3
(j,-2)0>-3)(p-4).
=
(
P -2y( P -3)
4 73
and
/J>3
(p-2}(p-3)( 1 i-4)
Pl =
.
^ = 16(y-3) _ ^_ 4 + ^ (^-4)2
16
.
16
_ 4)2 (p
root
is
16.
16
p4
will
have
to
be taken
as
the
positive
of
the
equation, or 7, which from the above equations 3), will be imaginary. 2)\Zfj,2 (2') (p
given by
maximum mode is
-
such that -/
civ
zero there,
i.e..
yQ J
pe~P~ 1 e~yfa
{ \
p+s x)
is
zero.
x=
axis
1
and a?=oo give the cases in which the curve touches the of x, and the other case, the one required, is when
x
or
p - = 0,
a?= -,
i.e..
the
mode
is
from the
origin. 5
88
TYPE VI
y=yo{ x - a ) x
l,
'
FOEMULyE.
6(ft-ft-l)
6 + 3/3,-2/32
r
1
rr+2
+ 2) +16(r+l)
2
\A(r+2)+16(r+l]
_
= 2 V/ ^V/A(r + 2)
+16(r+l)
^"r^-^-ijrfe+i)
sk -4v<-l
Origin = Mean
^ 31-2.-2
Mode = Means-
^ r+2
rs
>
NOTES.
The range is from $a$ to $\infty$ and the method is like that of Type I. $r$ and $\epsilon$ are found exactly as in Type I., and $1 - q_1$ and $1 + q_2$ are the roots of $z^2 - rz + \epsilon = 0$, just as $1 + m_1$ and $1 + m_2$ were in Type I. The origin is before the beginning of the curve. $1 - q_1$ is taken with the negative root and $1 + q_2$ with the positive root when $\mu_3$ is negative, and vice versa.
EXAMPLE.
The number
payment
were summed in groups of ten years of age and divided by 100, and the following series was
policies experience
obtained
Xo. of Entrants -100.
rH
56 167 98 34
9 2
1
50 168 100 36 10
2
'5
368
368
!
lie
moments
&c.j
at
were
mean
/*2
= =
fia=
f*>4
4-088800
9953605
A= &=
K2 =
4-739349
1-895
1-2,= l + q*= =
r
9.i
33-42429
41-03080
7*60950
42-03080
6-60950
10-37947
46-1821
q,=
a=
logy
85
The origin
is
or
12"34058
before the centre of the 167 group, and the curve starts at
of the largest
which
reasonable.
as follows
log-
log(x-a)
(3)
<l\
og
-'-'
q log(x
(5)
a)
logy
(6)
y
(7)
(1)
(2)
(4)
There is no difficulty in writing down the values for columns (2) and (3) without using column (1), as only the whole numbers in $x$ and $x - a$ change, the decimal remaining constant so long as equidistant ordinates are required. Columns (4) and (5) are obtained directly, and column (6) by adding columns (4) and (5) to $\log y_0$. The mode, which is useful for drawing the curve, is 0.02429. The skewness is 0.443.
PROOF.
N==
y (x a)^x~^dx
by
substituting
for
Jo
N
^"nlh-fc+iB.fe+lj qi-q-2-l)
86
origin
is
M
y
xU x (
~a
^x-^dx
__
by the same
From
&c. ;
we
obtain,
of y
l
and r(q
l),
/A 2
,_ =
a'(gi-l)(gi-2)
(i-9.-2)(a,-t-8;
&c.
It will
and m 2 = q.2 I. if m =q we can use the whole of the Type I. solution, provided Thus, we bear in mind that the range is from x = a to a?=oo
l
Type
VI.
87
TYPE
VII.
FORMULAE.
c=2/*2
2/o
N
v/2 7T/X 2
EXAMPLES.
in
column
(2),
(4) the reserves resulting from grouping a number of Endowment Assurances according to
Reserves -^ 1,000.
groups of
Ungraduated.
(4)
years of birth.
Graduated.
(5)
0)
(2)
17 22 27 32 37 42 47 52 57 62 67
11
13
e
2'8 11-5
104 40
15
3
2-7
232
12-2
1-3
522 250
8-4 2-4
Total
1,319
1,319
347-7
3477
The following
Constant.
Mean age
J*2
39-202426
3 066840
43-967213 2-769635
029805
M-s
650127
M4
0i
22-40663
0000418
2-920997
&
K
cr(
-005
1-751237
- -0002
1-664222
= V~)
(T- 1
5710248
300-4760
!h
6008813 83-34959
The values for the normal curve are $\kappa = 0$, $\beta_1 = 0$, and $\beta_2 = 3$. The values given above do not differ very greatly from these, but a comparison of the graduated and ungraduated figures shows that the reserve curve agrees better than the sum assured curve, partly because the value of $\beta_2$ is closer to 3, and $\beta_1$ has a larger value in the case of the sum assured.
89
of y
the value of
= 1-6009100657
y/2/i
is
required.
In finding the
graduated
and
The
best table
ii.,
was recently
pp. 174, &c),
it
given by Mr.
to
W. F. Sheppard
areas in
(Biometrika, vol.
show how
was used
Mr. Sheppard's tables give the areas and ordinates of the normal curve in terms of the standard deviation; that is, he assumes the standard deviation to be unity, and his tables must be entered by using intervals of $\sigma^{-1}$.
Di.stance
from
Previous
origin in
Age
calculation
units, i.e., 5 years of age.
column
xo- 1
Difference of Area multiprevious column plied bv 3-47-7 area for age (total group x to.r+5. frequency)
145
19-5 24-5 29'5 34-5 39-5 44-5
5-893443 4-893443 3-893443 2-893443 1-893443 893443 106557 1-106557 2-106557 3-106557 4-106557 5-106557
3-541258 2-940377 2-339496 1-738615 1-137734 536853 064028 664909 1-265790 1-866671 2-467552 3-068413
00144*
99836 99049 95S97 87238 70432 52553 74694 89712 96902 99320 99892
00785 03152 08659 16806 22985+ 22141 15018 07190 02418 00572 00108*
2-7
† This area is $(0.70432 - 0.50000) + (0.52553 - 0.50000)$, because we pass across the origin, and a piece of the group is on each side of it.
The second column can be left out when the method has been grasped. The ages in the first column were taken consistently with the assumptions that 17, 22, etc., were the central ages of the groups.
If ordinates are required, the $z$ column in Mr. Sheppard's tables must be used. It was with its help that the curves in the figure were drawn. The statistics and curve for the reserves are shown by the dotted lines.
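The use of the probability-integral table can be imitated with the error function. The sketch below graduates one group of the reserves by the normal curve, taking the mean age (43.967213), $\mu_2$ (2.769635, in units of five years) and total (347.7) from the table of constants; using `math.erf` in place of Sheppard's tables, and the function names, are assumptions made only for illustration.

```python
import math


def normal_cdf(z):
    """Area of the unit normal curve up to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def graduated_group(lower_age, upper_age, mean_age=43.967213,
                    mu2=2.769635, unit=5.0, total=347.7):
    """Graduated figure for the group lying between two boundary ages."""
    sigma = math.sqrt(mu2)  # standard deviation in units of five years
    z_lo = (lower_age - mean_age) / unit / sigma
    z_hi = (upper_age - mean_age) / unit / sigma
    return total * (normal_cdf(z_hi) - normal_cdf(z_lo))


print(graduated_group(39.5, 44.5))
```

For the group 39.5 to 44.5, which straddles the mean, the area is the sum of the two pieces on either side of the origin, about 0.22985 of the total.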
[Figure: Sums Assured ÷ 1,000, by Age.]
An average reserve for any group can be obtained by means of the graduated figures, and it could be used to test ... This is by no means the only rough check that can be applied, but it is interesting because it shows a use to which frequency-curves might be put.
PROOF.
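To show that $\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$, let $\int_0^\infty e^{-x^2}\,dx = K$; then, putting $ax$ for $x$, we have $\int_0^\infty e^{-a^2x^2}\,a\,dx = K$.
Hence, $\int_0^\infty\!\int_0^\infty e^{-a^2(1+x^2)}\,a\,da\,dx = K\int_0^\infty e^{-a^2}\,da = K^2$.
But $\int_0^\infty\!\int_0^\infty e^{-a^2(1+x^2)}\,a\,da\,dx = \frac{1}{2}\int_0^\infty \frac{dx}{1+x^2} = \frac{\pi}{4}$, so that $K = \frac{\sqrt{\pi}}{2}$.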
Hence,
J -co
"
v
:
is
obtained as follows
t/
cc
dx=yQ xe~
\
'-
+ e~ C
C&t'
<
xdx
J
_co
by parts
_2
1-f
AT
= 2/^a
ADDITIONAL EXAMPLES.
8. Up to the present the examples have been chosen with a view to illustrating the various types of frequency-curves, but it seems advisable to consider one or two practical examples which may help to show the range of applicability of the curves in actuarial work, and give an opportunity of noticing a few difficulties which may arise in applying them.
The function with which actuaries generally wish to deal in practical work is not an exposed to risk or series of deaths or withdrawals, but the ratio between the deaths and the exposed; that is, with the rates of mortality, sickness, &c. May such rates be graduated by means of the curves we have examined, and, if they fail, must they be put aside for some other method?
Now the first point to be considered is whether these rates are frequency distributions; if they are not, the use of the frequency-curve is empirical. A rate of
who die, we imagine 1,000 persons exposed to risk at each integral age, the number of deaths would be 1,000 times the
if
and
rate of mortality,
and
this
it is
possible to
it
though
is
is
me
that
experience.
It
frequency-curve.
On
the
marriage are certainly much like frequency-curves, and the rates of withdrawal, whether regarded according to age or duration, might take a form like our example in Type III.
There are, however, practical objections to the direct operation on rates, even apart from the very exaggerated idea of frequency distributions in which it is necessary to indulge.
risk at the
end
of
may
tends
to-
we
are
it,
which introduces
it
is
93
possible.
be inferred that a small number of say fifty or one hundred deaths must necessarily be grouped according to each year of age, but that even if there are two or three thousand the roughnesses
It
must
not, of
course,
introduced
by
the
use
of
rates
influence
the
result
considerably.
each rate of
9. It
The reason is that an equal weight is given to mortality which is very far from the weight
these
objections
and then deal with a practical method of overcoming them. The statistics to be considered have been taken from a paper by Mr. M. Mackenzie Lees " On Rates of Mortality and Marriage among daughters of Peers and Heirs
Apparent, &c." (Transactions of the Faculty of Actuaries, vol. i., p. 276), and may be summarized as on page 94.
The moments were calculated by Mr. G. F. Hardy's Summation Method, and were found, about the mean 28.77191, to be
$\mu_2 = 63.2092$, $\mu_3 = 627.101$, $\mu_4 = 19{,}103.3$, $\beta_1 = 1.557153$, $\beta_2 = 4.781321$.
The criterion
was
k= I'd,
The
The constants for Type III. were
$\gamma = 0.201592$, $p = 1.56881$, $a = 7.78189$, Mode $= 23.81128$, $y_0 = 890.05$.
The curve starts, therefore, at age 16.02939.
table,
The rates resulting from this graduation are given in the and while they tend to show that the distribution
do
not
give
they
satisfactory
graduation,
and
the
94
Xo. of Marriages
Rate of Marriage
E
3,658 3,603 3,528-5 3,393-5 3,187 2,945 2,688-5 2,443 2,187 1,956 1,758 1,583-5 1 ,417 1,270-5 1,148-5 1,068
M.x
mx
0008 0022 0139
Xo. of Marriages
M>
3 7
Curve.
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
...
8 49 114 176 219 192 211 212 194 146 137 121 105
75
0027 0132
0336
0552 0744 0714 0864 0969 0992 0831 0865 0854 0826 0653 0562 0650 0453 0354 0486 0266 0352 0268 0172 0229 0256
01
0332
0517 0667 0776 0846 0881 0S89 0875 0845 0803 0753 0698 0640 0583 0528 0475 0425 0378 0335 0295 0260 0228 0199 0173
0151
984
9045
848-5
60 64 41 30 39 20 25 18
3,695 3,433 3,187 2,957 2,742 2,541 2,354 2,179 2,016 1,861 1,723 1,591 1,469 1,355 1,249 1,151 1,061
44
99 151 189 168 188 196 185 143 138 126 112 82 65 69 43 32 40 20 25 17 10 12
12 7
5
0541
0695 0809 0880 0917
0920
0901 0861 0812 0754 0693 0631 0569 0508 0452 0400 0352 0309 0270 0235 0205 0176 0151
638
6125
586-5 568-5 541-5
14 15
9 6 8
58
515
491-5
2
5 5
476 454
440-5
2
5 2
'"
0015
0120 0051
.*::
416 395
378-5 363-5 348-5 335-5
50
51 52 53
0029 0089
54 55
56 57 58 59 60 61 62 63 64 65 66
67 68 69
3175
304 291
278-5
"3
0103 0036
.'.'.'
261
248-5 234-5
...-
2195
209-5 201-5 191
1
... ...
0046
177
165-5
:::
...
154
147-5 135-5 124-5 112-5 105-5
...
70 71
72
0131 0113 0097 0084 0072 0062 0054 0046 0039 0034 0029 0024 0020 0018 0015 0013 0011 0009 0007 0006 0005 0004 0004 0003 0002 0002 ouoi 0001
...
...
976 897 825 758 696 639 586 537 492 451 412 376 345 315 288 262 239 218 199 181 165 150 139 124
0130
0112 0096 0082 0070 0060 0051 0044 0037 0031 0026 0022 0019 0016
6
1 3 3
1
3
1
75*7
0007
45-7
0003
...
27-2
ocoi
...
1
...
0089
73 74 75
95
84-5
...
79
...
95
failure is due almost entirely to the objections referred to above. Of course, if we were examining the algebraic form taken by rates of marriage, we should begin by work on population data, where the roughness of material is avoided by the large numbers of individuals dealt with; as, however, we are seeking for a graduation, we must see how these objections, which of course apply to some extent to any method of graduation, can be overcome. It has been remarked that the cause of the difficulty is that incorrect weights are given to the items used, and the most obvious suggestion is that the actual exposed and marriages should be graduated separately. This, however, entails a large amount of additional work, and a shorter method can be used which avoids the double graduation. This method consists of using a series allied to the exposed, and treating it as a hypothetical exposed to risk from which a new series of marriages can be calculated. The advantages are that we have only to make one graduation, and the weights of the various parts of
be graduated, and in this connection it may be remarked that, as the exposed to risk is generally capable of being represented by a frequency-curve,
it is
assumed
tabulated.
by
such
curves
(viz.,
Type VII.);
this
is
also
10. The hypothetical exposed can be fixed by trial or from the values of the exposed. The column $E'_x$ in the table given above is taken from Sheppard's Tables of the Probability Integral, $x$ being taken as 3.06, 3.084, 3.108, 3.132, &c., and the entries were multiplied by a power of 10. The series $M'_x = E'_x \times m_x$ was then formed and graduated. The following values were obtained for the $M'_x$ series:
Mean $= 24.85779$, $\mu_2 = 29.5006$, $\mu_3 = 190.112$, $\mu_4 = 4361.12$, $\beta_1 = 1.40775$, $\beta_2 = 5.01114$, $\kappa = -7.102$.
96
As this is large, Type III. was used, with $a = 5.933325$, $y_0 = 192.625$, and Mode $= 21.63562$.
The curve was then worked out, and the rates of marriage in the final column were obtained by dividing $M'$ by $E'$. They
agree closely with the ungraduated figures. numerical example of the application of the method to MX """ Table may now be given. The normal curve with the <r = 10 and origin at age 524 was used, and the values were
11.
multiplied
b}^ q x
A part
gxEx
of the
work was
Age.
105
=E
3984439 3944793 3866681
Age.
gxEx
lO3
52 51 50
53 54
55
1
&c.
Summing
following
:
these entries (q x
Ex
10 5 ) in
fives, I
formed the
Age
q X
10
r>
20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
13
70 218 594
1,394 2,460 3,702 4,519 4,385 3,602 2,249 1,197
461 133
31
5
1
25,034
The abbreviations (use of Crelle's tables and grouping) were adopted to save labour, and as the figures were required for an example they are sufficiently accurate.
The following values were then found
a2
4-584327
^=-4999871 ^4 =61-17014
Type
of curve.
x
No.
I.
m = 32-81 166
^, = 26-57123 ! = 18-78553
02=15-21272
?/
= 4609-884
5 years of age.)
The ordinates were then calculated for every fifth age, and finding that the curve is not very far removed from the normal
curve of error, I interpolated in the second differences of the logarithms of the ordinates for those at the other ages.* A quadrature formula was used for finding areas, and q x was
as follows
Deviation.
Actual.
Expected.
Age
group.
+
1-5
1
15-19 20253035404550556065707580859095100-
...
00643 00731 00850 00991 01179 01452 01866 02505 03516 05118 07682 11648 17462 24870 33286 43289
1-5 8-9
61-0 204-6
8-0
4
3807
575*6 811-4 1063-8 1386-6 1773-2 2136-7 2261-2 1925-8 1241-9 514-4 126-0
11-7
12-4
104
"2
124
21-2
27-3
45-2
392
4-9
...
494 129
18
1
20-4
30
7
173
1-5
'5
14,480
14492-1
1158
21 9-5
103-7
*
say.
Ase - '* - ^
The
'
is
the equation to
is \x'~
+Bx + C,
criterion of course
shows
the curve
is
nearly normal.
98
12. It will be interesting to
method
~)
to represent
.
the exposed
and multiply by the values of colog p x This means that we assume that the products can be represented by
y
2"'
= Aytf-te-WI2** + HBi/ r
where
(:'
" 2[U(r=logff
'
]'
H=e
(
-
'
s< c +
o-
4 (iog^) 2
-h-)i-2a-
_ e k\ogec+ ^(logeO*-
y = Ajytf-(?-WP<r -+HBytf-<9
x - t) *'2<r
I.
i.e.,
the
sum
of
same
origin.
The
difference
o-
log f c,
so log" 10 e=
log 10 e.
is
The whole
solution
J'+x
first
xydx and
x2ydx (the
2 (<r
!
1 o"
+N
[t h) 2)^
given
first
we
obtain, as the
=-=r-.
or
~
h = fi'20
and
where //
is
= /,(N, + N2
11
written for
moments about h
the odd
Remember that tlie normal curve is symmetrical, so that mean of such a curve are zero. 2 t Can be seen at once as the sum of two integrals; Njtr moment of the first normal curve in I, and N 2 (<r 2 +(t h)'2 ) moment of the second normal curve.
about the
moments
99
logi
c=
-log
"'
10 e
II.
as stated above,
and
if
y=
y== a
10 7
as
is,
of course, generally
v'lrr
com enientj
r
then,
A =Ni-*-10*
and
2
J
1
s
N
=
13. If
(see
equation
II.)
10Ve-lF *>10grf
2
t
+h
10 k C-2~
we assume, as Mr. Hardy does in his recent graduations of the new experience, that $\log_{10} c$ is known, we only require to calculate one moment which
of
gives us -^
JN
!
-f JN
r^r
2
and. this,
with
the
help
equation
II.,
enables
us
to
complete
the solution.
If c
14.
interest.
in
2nd
=4-1929354
second
Deducting (W. F. Sheppard's adjustment) T^- from the moment and multiplying the first moment and the
h 2
100
adjusted second
moment by
to
make
/,=
then
7-080920
^2=164-384085 g (*-*)=
9586889
t-h =
log
10
9-092617
c=
03948873
00301749
A=
]ogi
B= B=
00004518782
5*6550214
$q_x$ was then calculated from the graduated colog $p_x$ obtained from the values of $A$, $B$ and $c$, and the following table of expected deaths was worked out. The values of $q_x$ are given in the table showing the frequency-curve graduation:
Graduated
Deviation.
Age
Group.
for Central
E xpected
Deaths. Deaths.
Age
of Group.
Under 25 253035404550556065707580859095-
01431 01854
02517 03551 05160 07639
4-0 2-0
6'6
69 205
11-8
211
1-3 6-5
6-3
38-2
11-0 33-3 76-3 23-4
25-1 7-6
1-6
...
21530 22493
1888-7 1213-6
11415
17053 23352 36484
5191
136-6 20-6
494 129
19
14460-3
14,480
128-2 276-1
147-9
all
very like that given by Mr. G. F. Hardy, but avoids having to obtain $c$ by trial. Mr. Hardy's expected and actual deaths balance better than the above, but I do not think the rates have been understated systematically, as the 75-79 group accounts for the
The
101'
15.
When
there
it
was
remarked that
is
moments except
In some
when
is
moments
and consequently the curve obtained is by no means the best that can be found. This most frequently happens when the curve rises very abruptly, as in our example for Type I., or when it takes the form of the example for Type III., the reason
being that, in such cases, the assumption that an area is concentrated at the middle of a base of unit length involves
a considerable error.
In the Type I. example the first group was assumed to be an ordinate at 17, whereas the curve starts at age 16.76, and the central point ought, therefore, to be later. The results can sometimes be improved considerably by basing a second graduation on the first and, in the particular case just referred to, since the first group is too large,
It is
we might assume
unnecessary to find as
many
as four
moments, for by
(b)
of the
are
p. 59, giving the moments about the have start of the curve, afford a very simple solution.
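We have
$\mu'_1 = \frac{b(m_1 + 1)}{m_1 + m_2 + 2}$ and $\mu'_2 = \frac{b^2 (m_1 + 1)(m_1 + 2)}{(m_1 + m_2 + 2)(m_1 + m_2 + 3)}$,
and writing $\gamma_1 = \frac{\mu'_1}{b}$ and $\gamma_2 = \frac{\mu'_2}{b\mu'_1}$, we have
$m_1 + 1 = \frac{\gamma_1 (\gamma_2 - 1)}{\gamma_1 - \gamma_2}$ and $m_2 + 1 = \frac{(\gamma_2 - 1)(1 - \gamma_1)}{\gamma_1 - \gamma_2}$,
where $\mu'$ is written for a moment about the start of the curve.
16. If the range of a curve can be fixed by general considerations, a good deal of labour can thus be saved, while, if the start of the curve is known, the following solution depending on three moments is of use.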
Writing
\.,=
-A and X =
3
-r~A-
2/A
102
the values of the constants in the equation to the curve are
given by
in
+ 1=
K,-x
2(\2 -\,)
A/o
A.oA-3
Wl
*
mi
+ mo + 2
and
??i 1
line for
17. Returning to the example of Type I., and considering the line for age 22 in the table on p. 54, we see that 4.175 and 14.634 give $S_2$ and $S_3$ excluding the first group, and the moments about age 17 are then found to be 4.175 and 29.268; transferring to 17.5, we have 4.075 and 24.268; adding the moments for the first group, $0.034 \times \frac{1}{5}$ and $0.034 \times (\frac{1}{5})^2$ respectively, $\mu'_1 = 4.0818$ and $\mu'_2 = 24.26936$. Assuming a range of 15.5, we find
$m_1 = 0.3498$, $m_2 = 2.7758$, $a_1 = 1.735$, $a_2 = 13.765$, $y_0 = 154.2$.
The mode is at $17.5 + 1.735 \times 5 = 26.175$.
first
From
four
groups are 37, 140, 152, 143, which is an improvement on the This example is used for figures obtained previously.
convenience, but with regard to adjustments in the particular
case the remarks in the footnote on p. 55 should be borne in
mind.
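The arithmetic of this example can be checked with a short script. The sketch below is not part of the original text: it assumes the two-moment relations for a Type I. curve of known range $b$ measured from its start, namely $\gamma_1 = \mu'_1/b$, $\gamma_2 = \mu'_2/(\mu'_1 b)$, $m_1 + 1 = \gamma_1(\gamma_2-1)/(\gamma_1-\gamma_2)$ and $m_2 + 1 = (\gamma_2-1)(1-\gamma_1)/(\gamma_1-\gamma_2)$, together with the usual Type I. conditions $a_1/m_1 = a_2/m_2$ and $a_1 + a_2 = b$. The function name is hypothetical.

```python
def type1_from_two_moments(mu1, mu2, b):
    """Constants of a Type I curve from two moments about its start.

    mu1, mu2 : first and second moments about the start of the curve
    b        : assumed range of the curve (same units as the moments)
    Returns (m1, m2, a1, a2).
    """
    g1 = mu1 / b                 # gamma_1
    g2 = mu2 / (mu1 * b)         # gamma_2
    m1 = g1 * (g2 - 1) / (g1 - g2) - 1
    m2 = (g2 - 1) * (1 - g1) / (g1 - g2) - 1
    # Split the range so that a1/m1 = a2/m2 (origin at the mode).
    a1 = b * m1 / (m1 + m2)
    a2 = b - a1
    return m1, m2, a1, a2


# Figures quoted in the worked example (unit of 5 years, start at age 17.5).
m1, m2, a1, a2 = type1_from_two_moments(4.0818, 24.26936, 15.5)
origin_age = 17.5 + 5 * a1       # position of the origin in years of age

print(round(m1, 4), round(m2, 4), round(a1, 3), round(a2, 3), round(origin_age, 3))
```

The values printed agree with those quoted in the example to within rounding in the last figure.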
18. As a second case, the example for Type III. may be considered. Assuming that the value of $p$ ($-.0783584$) is not to be altered, then the first moment about the start of the curve (see p. 08) is
$$\frac{p+1}{\gamma} = \frac{.9216416}{\gamma}$$
to
103
start at *8
the
first
moment about
44 x 135 x
the point
is
calculated
as follows
'1*= 4-4 -7 =94-5 45x17 =76-5 12x2-7 =32-4 8x3-7 =29-6 3x4-7 =14-1
1
x 5*7
5*7
3x6-7 =20-1
251
277-3
The
first
moment
The value
19,
8,
-
277-3
is
therefore
is
..
2ol
1*0875.
and hence
is
ry -84752.
47,
i/
of y
2,
1.
123, 48,
3,
r
is
= 205-0ctf~'
07186
e~' 84751
y-r(
P +i,
have no
difficulty in
When moments
convenient to use the equation in this form rather than in that given on p. 65. It may be mentioned that as the value of $p$ is nearly zero in the particular example, a good result would be obtained by assuming that value, or, which is the same thing, by putting $y = y_0e^{-\gamma x}$; $\gamma$ now becomes $1/1.0875 = .91958$, and $y_0 = 251 \times .91958 = 230.8$, and the graduation
19. It
is
1, 1.
sometimes happens
that
the
error involved
in
the
calculations of the
moment
from the curve not starting at the beginning of the unit base assumed for the first group. An instance of this is the first example of Table I., for which the mean is at duration 5.182, and the moments and constants are
^= ^=
* Strictly
17-63688
A=
3-34846
1355361 ^4=1923-565
rapidly.
104
so that the curve will
7
be of Type
(i29fi85
t
I.
and equation
'
to
it is
= -89O82 y-'
(25-49729-ci0
624275
where the origin is at 1.02897, where the curve starts. The graduation by this curve is shown in the following
table
Duration.
Withdrawals.
Graduated by
Type
I.
curve.
1 2
1
5
58
15
6 7 8 9 10
11 12 13
u
29 28 26
2L 18
18
37 30 25
21 18 15 13 11 9 7 6 5 1
3 2 2
1 1
12
11
5
14
15 16 17 IS 19
11
7
6
1
20 21 22 23 21
3
1
3 2
1,000
1,000
first
group
of
may
applied,
(bx) "-dx=\ y
l
x"
4b"
m
J
2 b"
-~ 1
*+
{i 2
l)
~2T
+1
/<mi
--x 2 .
.jdx
V"'i
+ 2)
last
which is a rapidly convergent series when $x$ is small. In the example where $x$ is $1.5 - 1.02897 = .47103$ the second term barely affects the result.
the formula
y must, of course, be calculated
by
^ +m
which
is
Y(r)
3
+l
an analogous form
to that
group
in
j;,
e-v^r = ,^
;/
(-|- 1
-^2+
...)
106
PART
CHAPTER
11/
VI.
COEEELATION.
1.
Two measurable
characteristics,
and B, are
a?
said to be
the same value, y of B, equally likely to be associated. In other words, certain values of B are relatively more likely to
the
other
it
generally either
increases
or
decreases,
and
is
one
increases
steadily
the
in
decreases.
Put in a rough-and-ready way, the definition can, in particular cases with which actuaries are familiar, be stated: "The mean ages at maturity in Endowment Assurances increase with the unexpired term when the policies are grouped according to
;
or,
less
marry and have children." There is correlation between ages at maturity and unexpired term, and between the age of a bachelor and the number of children, and it is required to find a method of measuring the amount of the correlation statistically. The easiest way to appreciate the nature of the problem is with the help of a table of double entry, such as the following, which gives particulars of 2,870 endowment assurances grouped according to their unexpired term. A little examination of the table shows that there is a connection between the two functions, but does not give any measure of the correlation suitable for comparison with
he
to
107
the experiences of other offices or with that of the same office
at a later date
Unexpired term of
:
Mean
Maturity
Endowment
Assurances.
Age
for
30 35 40 45
50
55
60
65
70
75
the row.
0-4
5-9
2
24 1
IS
2
20
1
15
26
8
6
4
14
6
4
56
2
6
53-75
16
12
2
12
6
9
62
(5
36
3
40
127 237 271
231
22
3
172
1
6
...
::
55-03
55-85
10-14
15-19
3
6
2
10
...
:.
9
S
17
117
4
99
2
52
2
8
4
2
432 665
6
4
...
24 145 155
3
2
l
84 11
l
56-59 57-58
57-88
20-24 25-29
3 9
3
133
167
78 20
71
i
674
538
247
77
90 123
2
1
11
2
3
3
30-34
35-39
...
1
6
11
4
49
2
127
49
2
8
4
2
(5
59-94
6
3
49
2
22
3
61-04
62-50 65-00
40-44
45-49
...
2
4
3
4
...
8
1
12
1
Total
6
17
62
8 2,870
Note.
For explanation
table
is
16.
The above
called an array. The middle value of the variable with which the row is associated is called its type, so that the third column (i.e., that headed 40) would be called the y-array of type 40, and the fourth row would be called the x-array of type 17.5, because 17.5 is the middle of the 15-19 group.
3. Now, returning to our definition of correlation, and examining the last column of the table, which gives the mean age in each row, we see that the numbers in it tend to increase as we go down the column, and the age at maturity is therefore correlated with the unexpired term. The figure on p. 108 shows the series of mean values clearly.
in
is
4.
if
A little
we imagine
is
no correlation, the
series of the
means
of the
is,
rows
will
be independent of the
other function;
that
they will
Another point
be noted
is
that
may
may
increase
JU8
FIGURE
'z/vV y
>
/
^
{/
10
!5
20
25
30
35
40
45
50 7c/
109
in
is
the
ease
of
the
endowment
positive,
mean maturity age also increases. These introductory remarks will give an indication of the nature of the problem to be solved, and may help to render the following proof easier to follow. It should be remembered that the proof deals with a function of n variables.* The table given above has only two variables, but it is easy to see how more variables may be introduced in similar tables; for instance, endowment assurances by limited payments give three variables (term of premium, term of assurance and age of life), while an increase in the number of lives, say, joint-life endowment assurances by limited payments, gives four variables.
what equation
will represent
the
numbers
body
and how
is
this
concerned.
be deviations from their respective means of a complex of measurable characteristics. The sizes
771,
772,
773
.
Let
rj n
by a
large
number
Let there
be
m
6. , 2
of these causes,
.
.
and
let their
x
means
will
be
6],
>, 1, eL 63
e lu
then
rj
rj 2 ,
ij 3
%% will be functions of
certain of
63
em .
Further,
if
m>n
the
e's
appear only in certain of the 77's, and the e's will not be fully determined for a given tj complex. We also assume that the
variations in intensity of the contributory causes are small
as
compared with
their
that
is,
we
assume that the deviations from the mean value can be graduated by the normal curve of error (Type VII.). The mean complex being reached with the mean intensities of contributory causes, we have, by the principle of the superposition of small quantities,
?; 1
=a
,e 1
+a
2 e2
+a
13 3H-
+a lm e m
V2
+2,nl
(i->
Vn
+ <*>wmt
A student reading the subject for the first time would do well to omit the paragraphs indicated by brackets. After the statistical idea underlying correlation has been understood, it will be found easier to follow the theoretical work.
110
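The a's are coefficients whose values have to be determined, and any of the system of a's may be zero, for a particular contributory cause may have no effect on a particular result. Further, the chance that we have a conjunction of contributory causes lying between $\epsilon_1$ and $\epsilon_1+\delta\epsilon_1$, $\epsilon_2$ and $\epsilon_2+\delta\epsilon_2$, ..., and between $\epsilon_m$ and $\epsilon_m+\delta\epsilon_m$ will be given by
$$P = C\,e^{-\frac{1}{2}\left(\frac{\epsilon_1^2}{\kappa_1^2}+\frac{\epsilon_2^2}{\kappa_2^2}+\dots+\frac{\epsilon_m^2}{\kappa_m^2}\right)}\,\delta\epsilon_1\,\delta\epsilon_2\dots\delta\epsilon_m \qquad \text{(ii.)}$$
where the standard deviations of the distributions are $\kappa_1, \kappa_2, \dots, \kappa_m$ and $C$ is constant.* Now, by (i.) let n of the variables $\epsilon$, say the first n, be replaced by the variables $\eta$, then the probability that we have a complex with organs lying between $\eta_1$ and $\eta_1+\delta\eta_1$, $\eta_2$ and $\eta_2+\delta\eta_2$, ..., $\eta_n$ and $\eta_n+\delta\eta_n$, together with a series of contributory causes lying between $\epsilon_{n+1}$ and $\epsilon_{n+1}+\delta\epsilon_{n+1}$, $\epsilon_{n+2}$ and $\epsilon_{n+2}+\delta\epsilon_{n+2}$, ..., $\epsilon_m$ and $\epsilon_m+\delta\epsilon_m$, will be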
.
.
where
C
(i.)
is
a constant, a function of
:
C and the
a/s,
and
2
cj>
(ii.)
(iii.)
A quadratic function of the t/s from A quadratic function of the e's from A series of functions of the type
e+i(&i, n + lVl+h, n + )V-2+
rjx
to
rj n
en +i to e m
+b n .n + iVn)
en+zfii, n+2Vi
+ b2}
n +iV-2+
+ &n>
n + iVn)
m (bi
mVi
+ h,
mV-2+-
+bn .mVn)
where some of the b's may be zero. Now, if P' be integrated for all values from
.
. .
go to + go of
x
e m we shall have the all the contributory causes en +i 3 e n+2 whole chance of a complex with organs falling between 7] and
,
V\ + &Vi>
and
772
+ 8772
7jn
r)
+ $7)n
say
2
<j>
,
e,
e n+x
we
alter
the
any particular case of the normal curves, the chance of getting a and ej + 8^ when the distribution is Type VII. is y^e-^l^^b^ where *i is the standard deviation similarly with each of the other causes. As the causes are independent, the product of the various chances gives the required
result
between
ei
chance,
Ill
triple constitution of
2
</>
e to
disappear
from
alter
its
(ii.)
and
(iii.)
any terms in 77. Thus, reduced to its first constituent, or we conclude that the chance of a complex of organs between 7) and ^ + ?/!, 77.,, and 7) 2 -\-Sr} 2 rj n and
without introducing into
finally, after
mn
x
integrations,
</>
Vn + &Vn
occurring
is
given by
1
P = Ce-?x '8r)
where
,8r) 2 , Sr] 3 ,
Srj n
(iii.)
is
77's.
This
is
the law of
but replace
X 2 by
-
a quadratic
P Q e -hi''
Here
C, cpp
cpq
ir]l -+C.,..,r,.."+
+2e
.,rll T
.,+2r 1
.,r1l
n .,+
are constants,
and
S]
p and
q in
case,
when
2
P Ce-K'i'h
Integrate
+C.r,.S
+ 2c
*
to
771
from
+x
and we
t]. 2
variation.
f,
2= C2
V
1<T\
C\C 2
all
values of
77-2,
2<x22
- =c/l- \
77!
C\C.2
Integrating for
frequency,
all
values of
and
77-2
we have
N=
*
Ctt
VC1C2
In Appendix
III.,
(c
2
12 )
some
The
index
as
result
za-j
h z
=eJl "\
(v.)
e1 c2 /
rearranging
the
in
is
the
expression
of
for
as
a
III.
jaerfect
square
-e 2
(l
L2
'
)v-2~
done in No.
Appendix
112
r= C C
\
Now
we have
put
for
rji
and
770,
anc^
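$$z = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\,\exp\left\{-\frac{1}{2}\left[\frac{x^2}{\sigma_1^2(1-r^2)} - \frac{2rxy}{\sigma_1\sigma_2(1-r^2)} + \frac{y^2}{\sigma_2^2(1-r^2)}\right]\right\} \qquad \text{(iv.)}$$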
The equation
just given
is
representing tables like that on p. 107. Tt has been obtained on certain assumptions which may not all be realised in
practice, but
it
Since
20V2
2<r 2 2
the other,
it
follows that
all
the
two chances,
i.e.,
proportional
-oW, which
a measure
size of the
term
jzr
<T x cr,{l
2
)
or
r, is
of the correlation,
is
[8.] Perhaps the easiest way to see how its value can be obtained from an actual experience is by looking at the matter from the curve-fitting point of view, and dealing with the expression for z by moments. It will be remembered that the moments were obtained by summing the frequencies multiplied by powers of the independent variable, but as we now have two variables we can take n powers of the one and m of the other. Thus, if we take the second powers of the x distances and the zero power of the y distances (i.e., neglect y), we obtain the ordinary second moment of the frequencies reckoned only in the x direction. Similarly, with the second powers of the y distances and the zero power of the x distances. These calculations give two of the unknown constants, for $\sigma = \sqrt{\mu_2}$ (see Type VII.). There is, however, another second-order term to be considered, namely, that obtained by taking the first powers of both the x distances and the y distances, i.e., multiplying the frequencies by xy. This may be written
113
This double integral reduces to $Nr\sigma_1\sigma_2$ (see Appendix III.), or
$$r = \frac{(xy)\ \text{moment}}{N\sigma_1\sigma_2}$$
To
calculate
the
coefficient
of
correlation
2
moment, the
xy moment about the centroid vertical. [9.] Now we have seen that equation (iv.) gives an expression for describing a correlation table such as the table of
endowment assurances on
p. 107,
it
is
= ZQe -{<l^-2ht>: +
make
t^
g.>
If Ave
=z
This last expression
is
but
its
mean
9i
and
it
follows
that
(1)
The deviation
arrays
of the
mean
or
or the
in
means
of the
decrease
arithmetical
progression or
regression line).
(2)
on a straight
The standard deviations of all parallel arrays are equal and independent of their types.
example
it
will
be well
to
proceeds on the
principle that
we
require to
fit
a straight line (y
=a
2 -\-b 2 x)
* This proof is a modification of one given by Mr. G. U. Yule (see Proc. Roy. Soc., 1897, vol. lx., pp. 477, &c.). It has been altered to avoid the introduction of the method of least squares.
114
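Let $x_1, y_1$; $x_2, y_2$; ... be the corresponding values of the two characteristics, and let $y = a_2 + b_2x$ be the line, so that the graduated value of y corresponding to $x_1$ is $a_2 + b_2x_1$. Now, if we proceed as we did in fitting frequency-curves by the method of moments, we make the graduated and ungraduated areas equal, i.e.,
$$(a_2 + b_2x_1) + (a_2 + b_2x_2) + \dots = y_1 + y_2 + \dots$$
or
$$Na_2 + b_2S'(x) = S'(y).$$
And, equating the first moments,
$$(a_2 + b_2x_1)x_1 + (a_2 + b_2x_2)x_2 + \dots = x_1y_1 + x_2y_2 + \dots$$
or
$$a_2S'(x) + b_2S'(x^2) = S'(xy),$$
where $S'(x)$ is the first moment of the x's, $S'(y)$ the first moment of the y's, $S'(x^2)$ the second moment for the x's, and $S'(xy)$ a moment in which any frequency is multiplied by the product of the distances in the x and y directions. If these moments are now transferred to the mean, as was done in fitting the frequency-curves, we have
$$Na_2 = 0, \quad\text{or}\quad a_2 = 0;$$
and
$$b_2S(x^2) = S(xy), \quad\text{or}\quad b_2 = \frac{S(xy)}{S(x^2)}$$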
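The two moment conditions can be illustrated with a few lines of code. This is a minimal sketch, not taken from the text: the function name and the four data points are hypothetical, and the line is fitted exactly as described, by transferring to the means so that the constant term vanishes and the slope becomes $S(xy)/S(x^2)$.

```python
def fit_line_by_moments(xs, ys, freqs=None):
    """Fit y = a + b*x by equating the total and the first moment.

    xs, ys : paired values
    freqs  : optional frequencies attached to each pair (default 1 each)
    Returns (a, b) referred to the original origin.
    """
    if freqs is None:
        freqs = [1] * len(xs)
    n = sum(freqs)
    mean_x = sum(f * x for f, x in zip(freqs, xs)) / n
    mean_y = sum(f * y for f, y in zip(freqs, ys)) / n
    # Moments about the means: S(x^2) and S(xy).
    s_xx = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, xs))
    s_xy = sum(f * (x - mean_x) * (y - mean_y) for f, x, y in zip(freqs, xs, ys))
    b = s_xy / s_xx              # slope: b2 = S(xy) / S(x^2)
    a = mean_y - b * mean_x      # constant is zero about the means
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
a, b = fit_line_by_moments(xs, ys)

# The fitted constants satisfy both moment equations about the original origin.
lhs_area = len(xs) * a + b * sum(xs)
lhs_moment = a * sum(xs) + b * sum(x * x for x in xs)
print(a, b, lhs_area, lhs_moment)
```

Working about the means is only a convenience; the same constants satisfy the two equations in $S'(x)$, $S'(y)$, $S'(x^2)$ and $S'(xy)$ taken about any origin.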
115
of the whole
Ncrr
h
-
_ s '(^)
No?
S'O/)
;
If
we now
we have
0-2
where r
measure of correlation
y's.
(coefficient
and
11.
At
first
sight
it
may appear
and
y,
are not
first,
y=:rx,
gives
the
mean
values
of
corresponding to
cr l
then
if
x=0
the
if
mean
of x is 0,
and
# = 20 the
mean
When
is
we turn
it
we
cannot, of course,
mean
=2
20
be
-2.
12. After this preliminary remark we may return to the two equations and consider how it is that r is a measure of correlation and whether it can always be treated as a satisfactory measure. We can best see that r is a measure of
correlation
u
*
y=rx
in the
form
x
0"i
or
Y = X?',
it
as giving
(To
origin (this
proof)
in
one characteristic in terms of the other where the mean is the is due to referring moments to the mean in the
of
measurement
is
each case.
mean
116
series
of
the other
characteristic
of
r,
increases to
an
extent
while
if
r is
negative
is
Y
of
decreases.
increments
and
If
reached.
Y Y
become
equal
increases
tells
us that
no correlation, and r in this case is zero, as can easily be seen from the equation Y = Xr. The value of r lies between −1 and +1 (see Chap. X., Art. 4), and its sign has no influence on its numerical value. In other words, a large negative value does not mean that the two characteristics do not vary together, but only that increases in the one correspond with decreases in the other; the numerical value of r indicates the extent to which variations in the two characteristics correspond. This indication is satisfactory provided the means, when plotted in a diagram such as that on p. 108, fall approximately in a straight line (i.e., "regression"* is linear). Distinct departures from a straight line are not so common as might be supposed, but if they are very marked in any case, r ceases to be an entirely satisfactory measure of the correlation.
13. We may take this opportunity of removing another difficulty that is sometimes met.
doubt which is best shown by be perfect correlation when one thing is always smaller than As an example we may take the correlation another " ? between the lengths of a man's right arm and his left arm
;
Some
and since each characteristic is measured from its own mean, and in terms of its own standard deviation, the coefficient would not be decreased if every left arm was a certain number
than the right or if 99 to the right arm. in length, say --.
of inches shorter
.
it
14. When calculating moments on p. 17, we noticed that though we required them about the mean, it was best in practice to take them about some point fixed arbitrarily so as to avoid fractions and then adjust the results afterwards. The values of $\sigma_1$ and $\sigma_2$ can, of course, be found with the help of the formula on p. 19, viz., $\nu_2 = \nu'_2 - d^2$. The deduction of $\frac{1}{12}$ from the second moment should be made for the same reason and in the same cases as in frequency-curve fitting.
* The term "regression" was invented by Mr. Francis Galton in connection with the study of heredity; it indicates the way the children of particular parents tend to "step back" to the ordinary population mean.
With regard
to the
S(*Y)=S(*+4)(y + dB
or since
S (x)
= S (y) =
S(ay)=S(*Y)-N4
where
S(V/') is calculated about a point distant d
x
from the
mean
of the y's.
It will
example on p. 107 can now be worked be found to make the proofs and methods
easier to grasp.
given above
much
moments
are to be calculated
is first
age 60 and unexpired terms 20-24 years, and for the present The following the calculations are made abont this point.
table shows the calculation of the
of the totals of the y-arrays,
i.e.,
Frequency.
x'
Frequency x x'
Frequency x
(jc')~
A
17
62
584
a 4- CO
-6 -5 -4 -3 _2 -1
1
36 20 68 186
1,168
216 100
272 558
2,336
c CO GC
co CO
00
60
8
2
3
2,870
=N
643
* = -z2870
1589
553659
118
because
is
5 years.
cj!
=1-13637
:
was formed
Frequency.
y'
Frequency x y'
Frequency x y'-
56 172 432
665
-4 -3
_o
-1
1
2 3
4
5
896
1,548
1,728
665
2,870 =
1,3U0
$$d_2 = \frac{969}{2870} = .337631$$
$$\sigma_2^2 = \frac{7209}{2870} - d_2^2 - \frac{1}{12} = 2.31453$$
and $\sigma_2 = 1.52135$.
16. The value of $S(x'y')$ is found by means of the numbers in very small type appearing under the frequencies in the correlation table. The frequency 62 in the 50 column, for instance, is distanced three spaces upwards and two sideways from the arbitrary origin, so the value of $x'y'$ by which it has to be multiplied is 3 × 2 = 6, as shown. The other figures are obtained in like manner, but the sign must be borne in mind.
Any
left-hand upper
will
and y having
like
signs
119
will
by which the frequencies are multiplied are of opposite The calculation of the product moment is as follows
Frequencies.
tfaf
Total of frequencies
/**y
+ 19
204 144 452
5
(/)
155 + 71-84-123 145 +99 + 11 +49 -11 -52-49 -90 24 + 36 + 3 + 22-22-6-9. 6 + 6 + 8 + 3-6-8-11-2 + 117
.
1
1
+ 19
102 48 113
1
2 3
4
5 6
3 9
+ 17 + 62 + 2-1-1-2 + 26
6 2
2+2+1
1
....
8 9 10 12 15 18 24
80 35
6 2 5
1 1
480 280 54 20 60
15
18 48
1,799
$$S(xy) = S(x'y') - Nd_1d_2 = 1799 - Nd_1d_2 = 1262.51$$
$$r = \frac{S(xy)}{N\sigma_1\sigma_2} = \frac{1262.51}{N\sigma_1\sigma_2} = .25445.$$
The coefficient of correlation between age at maturity and unexpired term in endowment assurances is .25445.
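The figures quoted in this example can be put through the same steps by machine. The sketch below is an illustration only: it takes N = 2,870, the sums 969 and 7,209 from the y table, the product sum 1,799 and $\sigma_1 = 1.13637$ as quoted, and assumes that $d_1$ has the magnitude 1589/2870 = .553659 shown in the x table, with $d_1$ and $d_2$ entered so that their product is positive, as the subtraction in the text implies. The variable names are hypothetical.

```python
import math

N = 2870                      # total number of assurances
d1 = 1589 / N                 # shift of the x mean from the arbitrary origin (magnitude assumed)
d2 = 969 / N                  # shift of the y mean from the arbitrary origin
sigma1 = 1.13637              # x standard deviation as quoted (unit of 5 years)

# y standard deviation: second moment about the arbitrary origin,
# transferred to the mean, less 1/12 (Sheppard's adjustment).
sigma2 = math.sqrt(7209 / N - d2 ** 2 - 1 / 12)

# Product moment transferred to the means.
S_xy = 1799 - N * d1 * d2

# Coefficient of correlation and the slope of x on y.
r = S_xy / (N * sigma1 * sigma2)
slope_x_on_y = r * sigma1 / sigma2

print(round(sigma2, 5), round(S_xy, 2), round(r, 5), round(slope_x_on_y, 5))
```

The printed values may be compared with 1.52135, 1262.51 and .25445 above, and with the coefficient .19007 of the line drawn in the figure; small differences in the last place are to be expected from rounding.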
The equation giving the age at maturity in terms of the unexpired term is $x = .19007y$, where all measurements are made from the mean and the unit is 5 years. The line drawn in the figure gives this result.
17. An alternative method similar to the summation method given in Art. 9, Chap. III. for moments can be conveniently used in connection with correlation tables. Taking the same example, we obtain from the given table
another in the same form, giving the y sum of it by summing each column continuously, and then form a third table by
120
Table of
or
Unexpired
Endowment
Assurances.
the
term
30
35
40
45
50
55
60
65
70
75
Totals.
0-4 5-9 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-49
>
4
3 3
4 4
3
1 1
17 17 15
6
62 60 54 37 13 10
1
643 1,098 637 1,084 601 1,044 502 917 680 347 409 180 178 57
8 2 51 2
60 60 58 50 39 19
8
8 8
7
7 6 3
1
1
871 333 86
9
1
Totals
16
13
55
237
2,363
2,977
49 13,381
The
give the
column
sum
correlation table,
column of the and are the same as the column a?=30 in The total of the y sum, or of the first column
of the total in the right-hand
y's
(13,381h-2,870),
of the
and
similarly the
sum
of the first
x's (18,501^-2,870).
Table,
i.e.,
Table giving
all cases
for
in Correlation Table.
- B
(3
30
35
40
45
50
55
60
65
70
ID
Totals.
0-4 5-9 10-14 15-19 20-24 25-29 30-34 35-39 40-44 45-19
2,870 2,864 2,860 2,814 2,810 2,806 2,642 2,639 2,636 2,210 2,207 2,206 1,545 1,545 1,544 871 871 871
2,843 2,781 2,197' 1,554 2,789 2,729 2,171 1.534 2.621 2.567 2.071 1,470 2.200 2.163 1,784 1,282 1.544 1,531 1,297 950 861 580 871 760
!
68 68 66 57 16 25
11
1
1
8 8 8 7
333 86
9
1
333 86
9
1
333 86
9
1
333 86
9
1
332 86
9
1
321 86
9
1
261 78
7
1
623 68 8
Totals
343 49 87,521
The total of the last table gives the xy moment (87,521), and the x standard deviation is found by forming from the
121
first
row the
series
8,
and summing
:
i.e.,
70,855.
The second
the numerical
moment about
work being
the
as follows
$$= \frac{2S}{N} - d(1+d) = \frac{2 \times 70855}{2870} - 6.4463 \times 7.4463 = 1.3747$$
Similarly with the y
moments
y mean $= \dfrac{13381}{2870} = 4.6624$
4-6624x56624
The xy moment $= \dfrac{87521}{2870} - 6.4463 \times 4.6624$
(Sheppard's adjustment)
to
=a
= 25, y 5
The xy moment
(-4399) is
007^
i- e ->
S(#y)-r-N.
it
may
be well to point out a use to which the particular example might be put. The result in the equation form gives the average age corresponding to each unexpired term. Now,
and get new series of average ages. The results used in a valuation would give the relative accuracy of the three methods. I have worked out the formula with the Z weights ($H^M$ Table), and found that Age at maturity = 57.595 + .1200 × (unexpired term).*
* The method used by me was approximate and can probably be improved; the result is merely given as an indication of a possible line for research.
122
The
possibility
and there certainly seems a towards making a simple " model office " for endowment assurances with the help of the method we have been using.
average ages
of
valuations,
doing something
19.
When
it
constructing
correlation
certain
to
tables
little
care
is
necessary,
because
is
in
arrangements of
statistical
material
possible
coefficient of correlation
when
absolutely uncorrelated.
correlation,"
and
it
as
is
arises is
by the
use of indices,
found between indices, when the absolute values of the functions dealt with have been selected purely at random.
As an example
of
the
way
introduced in actuarial
statistics,
we may
refer to
endowment
doing a large quantity of such business, and consider the term of the original assurance (1), the number of premiums
to
be paid
in future (t2 ),
of years for
which
ratios
If Ave
formed the
f and , and Avorked out the coefficients of correlation, Ave should not obtain a measure of
the
correlation
betAveen
of
number
of
premiums payable
in each
in future
same denominator
would be
to
exaggerate correlation
that
the
folloAvs
is,
just
mentioned
particular
case,
are
as
I. To find the mean of an index in terms of the means, standard deviations and coefficients of correlation of the two absolute measurements.
Let
#1,
cc->,
a' 3
#4
,
subjects,
mu
j
m>, e2
m3 w
, ,
their
r3i
,
mean
values;
''is?
<ri,
o-.>,
o- 3
or 4
their
vvj,
''24,
e4 ,
{
i.e., ^r 1
=w +
1
&c.
i13
the
mean
and
/2 i
the
mean
value of
^
;
123
indices
and
"
respectively,
and
groups.
values
We
mean
be neglected.
Tben
;, 3
1 S(-')
J. S Jf 1+
.y 1+ Y
ra 3
Wij
? 3
w3
wH
0-3"
-.
O"!
0-0
&ia=
;!
m.2
r Vi
and
?24=
2/ n
H
.
err
1 mf
cr2
0-4
m.2
\ ^J
J
II. To find the standard deviation of an index in terms of the standard deviations and coefficient of correlation of the two absolute measurements.
^3"
[V
lA
-f-
8/
W3
-
'l
*3
/J
square terms
W32
[mi
W2 3
=*(
N - +N
-2N.
r l3
or
-13 h.i
0+ VVwr w ^
3
'l3f
III. To find the coefficient of correlation of two indices in terms of the coefficients of correlation of four absolute measurements and their standard deviations.
Let
.1*3
and
X\
124
Then,
if
<** -*0G-*0
_
milHi
__
l 3 in x
2 g3
W3W4 V
A,
.
mi
e2
;;* 2
m3
e4
m3
.^1
wz 32
;!
m 32
4
2
W3
e2 e 4
7 4
m 2 mA
mA -
M4 2
W*2
*W4
iuizSl J)( m \m
x
\v1.2
mj
as
we
o- 2
0-1
<r 4
0^13^24= *13*24
\w*i
r m2
m
0"4
Tn
o- 2
0-3
cr 2
0-4
m2 m 3
Co
CT 3
r23_1
in 2
r24
4
>
Hence,
<Ti
(To r
<J\
(To
(T4
mi
Wo
.
r 12
m m
x
^14
4
m m3
2
)
)
;vH
0-4
4
m 2 m4
r24
cr 4
4
P/ f <ri
0-32
en
<r 2
/fo- 2 V
cr 2
\ Krnf
Proposition
the
I.
m3
m2
w 2 2
mf
m2 m
>
means
of
the
shows that the mean of an index is not the ratio of corresponding absolute measurements, and
Proposition III. shows that $\rho$ will vanish when the four subjects forming the indices are quite uncorrelated, while, if two, say, the
third and fourth, are identical, so that r 34
=l
and
O-3
m3
=m
0"3
we have
<T X
*
(T 2
>*12
=
\ \
\
m m2
x
(T X
*
O-3
<X 2
r 13
nil
O-3
H
f
^H
m3
~ ^23H
m3
<Tl
Kin x 2
+ m
.
C3 2
-
(To
a3
m2 m3
r23 \
>
payments to which we
referred.
An
when the
subjects
xu
.r 2 ,
x3
are
cc 3
and
x3
0V
O32 )
Ifctf
//^.^]
125
CHAPTER
VII.
Measurable.
1.
we
will give a
it
deals,
drawn
from vaccination
statistics.
in a
purpose, and the table is taken from a paper on the subject by Dr. W. R. Macdonell* and relates to the Sheffield smallpox outbreak of 1887-1888:
Strength to resist Smallpox when incurred.

Cicatrix.    Recoveries.    Deaths.    Total.
Present         3,951          200      4,151
Absent            278          274        552
Total           4,229          474      4,703
The functions between which we want to find the correlation are " Strength to resist smallpox when incurred " and " Degree of effective vaccination," and the statistics we
have cannot be arranged in a more detailed manner than the above. The characters cannot be measured quantitatively; but as the absence of such measurement does not mean that there is no correlation, we must see how the coefficient can
be obtained in such a case.
* Biometrika, vol. i., pp. 375, et seq. This paper and a supplementary one deal with the subject in a way that shows clearly the strength of the evidence on
The question
126
Table of Frequencie
a+b
+d
a+
b+d
2. Using the same notation as that of the previous chapter, imagine the frequency surface
$$z = \frac{N}{2\pi\sqrt{1-r^2}\,\sigma_1\sigma_2}\, e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2}+\frac{y^2}{\sigma_2^2}-\frac{2rxy}{\sigma_1\sigma_2}\right)}$$
the axes of x and y at distances h' and suggested by the figures above.
127
Then
27rv
<Tia-2j
i>'
fc*
=
by substituting
/__
xt
"
^^
11
2
?/
for
OtP
1
and
for
-^
V2
and writing
??
and & =
1 x-
Further,
00
27TCT! J
fc'
n = -7= p
\/27rJfc
--** X
2
6-
r/^
and
+ r/,=
-^-=
50
1
-
""dt/
the total
f requeney
=a+ b +c+
?,
we have
N-2(6 + ^) = N-N /V / 2
(a
fV^&
ZJ
+ c)-{b + d)_
.XT X>
~\/
/2 f*
> 7T J o
"^
and, similarly,
Since
a, b, c,
Sheppard's Tables, and the problem becomes " To find a value for r from the equation
XT
2
poo
poo
2Wl-r )h
where
d,
}k
N,
/i,
Appendix III.)
128
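The solution given by Professor Pearson is
$$\frac{ad-bc}{N^2HK} = r + \frac{hk}{2}r^2 + \frac{1}{6}(h^2-1)(k^2-1)r^3 + \frac{1}{24}h(h^2-3)k(k^2-3)r^4 + \frac{1}{120}(h^4-6h^2+3)(k^4-6k^2+3)r^5 + \frac{1}{720}h(h^4-10h^2+15)k(k^4-10k^2+15)r^6 + \frac{1}{5040}(h^6-15h^4+45h^2-15)(k^6-15k^4+45k^2-15)r^7 + \text{etc.},$$
where
$$H = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}h^2} \quad\text{and}\quad K = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}k^2}.$$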
The numerical solution has to be obtained by approximating to the roots, and Newton's method* is convenient for the purpose. 3. The numerical work of our example is as follows
$$\frac{(a+c)-(b+d)}{N} = \frac{3755}{4703} = .7984265 = \sqrt{\frac{2}{\pi}}\int_0^h e^{-\frac{1}{2}x^2}dx,$$
whence $h = 1.27716$ by interpolation in Sheppard's Tables.†
† The value .7984265 corresponds to $\alpha$ in his notation, so ½ of (1 + .7984265) = .8992132 must be looked up inversely in his Table I. If his Table III. be used it must be entered with .7984265.
Similarly,
$$\sqrt{\frac{2}{\pi}}\int_0^k e^{-\frac{1}{2}y^2}dy = .7652561, \qquad k = 1.18833.$$
We next require $\dfrac{ad-bc}{N^2HK}$, and we first get from Sheppard's Tables

$H = .1764870$, ∴ $\log H = \bar{1}.2467127$
$K = .1969111$, ∴ $\log K = \bar{1}.2942702$

* Newton's method of approximating to the root of an equation. Let $f(x) = 0$ be the equation of which the root is to be found, and let $b$ be a value near to $x$ so that $x = b + h$ where $h$ is small, then $f(x) = f(b + h) = f(b) + hf'(b)$ + terms involving higher powers of $h$ by Taylor's Theorem, and since $f(x) = 0$, we have, therefore,
$$h = -\frac{f(b)}{f'(b)}, \quad\text{or}\quad x = b - \frac{f(b)}{f'(b)}.$$
The
(f>)
(b)
chief objection J
b,
to the
method
is
that
may be more
application to correlation.
I.,
129
Hence $\log\dfrac{ad-bc}{N^2HK} = .1258266$ and $\dfrac{ad-bc}{N^2HK} = 1.336062$.
Dr. Macdonell gives 56 instead of 62 as the last two figures; the difference is probably due to interpolation. Turning to the expression for r, we notice that hk is a product in the coefficients of $r^2$, $r^4$, $r^6$, &c., so it is well to work
out
its
it
being found.
writing
also
by
down
the
first six
or seven powers of
:
and
h.
one as a
first
approximation.
-758844r2 + r- 1*336056 =
r
Taking
we have
-7588}
Now,
our only using two terms of the series on the left-hand side
of the equation for finding
rate.
r,
and
Ave
may
:
take 77 as a
#
trial
-1-336056+
(-77)
4
+-7588C77)
5
= 771
+ -1375(-77) + -1196(-77)
2
+ 2(*77) (-7588) + 3(-77) ('0434) +4(*77) 3 (*1375) + 5(-77) 4 (-1196) + 6(-77) (-0082) + 7('77) (-0971)
6
= 77- 2rMl
= 7692
0022
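The whole of this numerical work can be repeated by machine as a check. The sketch below is an illustration rather than the method of the text: it replaces the interpolation in Sheppard's Tables by the inverse normal function of Python's standard library, keeps only the first seven terms of the series (as the hand calculation does), writes the coefficient of $r^n$ as $He_{n-1}(h)He_{n-1}(k)/n!$ with $He$ the Hermite polynomials that appear in the brackets of the series, and applies Newton's method from the trial value .77. All names are hypothetical.

```python
import math
from statistics import NormalDist

# Fourfold table: cicatrix present/absent by recoveries/deaths.
a, b, c, d = 3951, 200, 278, 274
N = a + b + c + d
nd = NormalDist()

# Dividing points h and k from the marginal totals.
h = nd.inv_cdf(0.5 * (1 + ((a + c) - (b + d)) / N))
k = nd.inv_cdf(0.5 * (1 + ((a + b) - (c + d)) / N))
H, K = nd.pdf(h), nd.pdf(k)
target = (a * d - b * c) / (N * N * H * K)


def hermite(n, x):
    """Hermite polynomial He_n(x): He_{n+1} = x*He_n - n*He_{n-1}."""
    prev, cur = 1.0, x
    if n == 0:
        return prev
    for i in range(1, n):
        prev, cur = cur, x * cur - i * prev
    return cur


# Coefficients of r, r^2, ..., r^7 in the truncated series.
coeffs = [hermite(n - 1, h) * hermite(n - 1, k) / math.factorial(n)
          for n in range(1, 8)]


def f(r):
    return sum(cf * r ** (i + 1) for i, cf in enumerate(coeffs)) - target


def f_prime(r):
    return sum((i + 1) * cf * r ** i for i, cf in enumerate(coeffs))


r = 0.77                                   # trial value used in the text
for _ in range(20):                        # Newton's method
    step = f(r) / f_prime(r)
    r -= step
    if abs(step) < 1e-10:
        break

print(round(h, 5), round(k, 5), round(target, 5), round(r, 4))
```

Because h, k, H and K are here computed directly instead of by interpolation, the result differs slightly in the fourth decimal place from the value .7692 obtained by hand; a longer series would alter it further.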
In work such as
of the natural
this, a table
is
giving the
first
is
seven powers
first
numbers
of
it
a help.
There
edition
one in the
is
now
difficult to
though a copy
will
Institute.
table of the
ii.,
Biometrika, vol.
pp. 474,
4.
130
for r can be
checked in the
following
way
d=
AW- had
N
e
21-
-(x a +y 22rxy)
dxdy
2tt\/1'>:/<
"VT r
27r\/r
1ST
r Jk
[}*
*./
|
|
2tt\/1 r-j
,.-.,
.rPVl-rt/x'^
-r
where
t
"'i\
r e-^dX^y
1
u
. .
A/l-r 2
to
To approximate
the
double
integral
we
can
find
e~^dX
J
t
for a
of
formula.
The following
Values of
Values of
Vi_ r
h-yr _ f
-2
,*
^
1
/
Application
of
27^
Simpson's Rule.
(&)1-18833
(+)l-68833
2-18833 2-68833 3-18833 3-68833
9795
Xl = -0561
x4 = -1964 x 2 = -0530
X 4 = -0384 X 2 = -0048 X 4 = -0016
.3503 ÷ 6 = .0584
The
final
first
quadrature rule,
y^=g{yo+4y*+2yi+4y,|+
this
and gives .0584 as the value of the double integral; multiplying by N (4703), we have 274.6 as the value of the group called d, which agrees with the figure given in the correlation
table.
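The check by quadrature can be sketched as follows. This is an illustration under stated assumptions, not the author's working: the inner integral is taken from the normal distribution function of Python's standard library instead of from tables, the values $h = 1.27716$, $k = 1.18833$ and $r = .7692$ are those found above, and a seventh ordinate (which is negligible) is added so that Simpson's rule covers an even number of intervals of .5.

```python
import math
from statistics import NormalDist

nd = NormalDist()
N, h, k, r = 4703, 1.27716, 1.18833, 0.7692


def integrand(y):
    """Normal ordinate at y times the chance that x exceeds h given y."""
    t = (h - r * y) / math.sqrt(1 - r * r)
    return nd.pdf(y) * (1 - nd.cdf(t))


step = 0.5
ordinates = [k + i * step for i in range(7)]      # k, k + .5, ..., k + 3
weights = [1, 4, 2, 4, 2, 4, 1]                   # Simpson's rule
integral = step / 3 * sum(w * integrand(y) for w, y in zip(weights, ordinates))
d_estimate = N * integral

print(round(integrand(k), 4), round(integral, 4), round(d_estimate, 1))
```

With so coarse an interval the estimate agrees with the observed group d = 274 only to within a unit or two, which is as close as the hand calculation itself can be expected to come.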
131
CHAPTER
VIII.
Probable Errors.
1.
In the previous chapters we have assumed that the means, standard deviations, moments, constants, and coefficients of correlation obtained from a body of statistics give an exact measure of the constants or of the correlation between two functions. This is not really the case. If it were possible to make an infinite number of trials bearing on a given subject, we could obtain constants or measure correlation accurately, but in practice it is only possible to take a sample from this total "population." The variation that results from using a random sample, instead of the whole "population," to find the value of any particular constant, could be reasonably measured by the standard deviation of that constant, for, as we have already remarked, the standard deviation measures the way statistics are collected round their mean or their "scatter" from it. Custom has, however, led to the use of another function known as the Probable Error, which is .67449 times the Standard Deviation. The connection between these two functions is due to the theory having been developed from the normal curve of error, and arises in the following way. The probable error gives that value of x (say p) which divides the part of the normal curve representing positive errors into two equal portions; it is therefore given
of the
by
p y= e~*
1
J
v2
V27T
unity.
In
order to find
we have,
k 2
132
therefore,
-1(1
to
in
obtain
the
value
of
x,
corresponding
is
to
+ a) ='75
i
,
e~^''da\
Jo\/27r
This can be done by interpolating inversely by Lagrange's formula, and p is thus found to be .67449 approximately.
Probably the best way of viewing the use of the probable error is to regard it as a conventional reduction of the
standard deviation.
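The factor .67449 is easily confirmed. The following lines are a small sketch using the normal distribution in Python's standard library in place of inverse interpolation in tables; the variable names are hypothetical.

```python
from statistics import NormalDist

unit_normal = NormalDist()                 # mean 0, standard deviation 1

# The probable error is the deviate below which three-quarters of the
# whole area lies, so that half the positive errors fall short of it.
probable_error_factor = unit_normal.inv_cdf(0.75)

# Area enclosed between -p and +p; it should be one half.
central_area = (unit_normal.cdf(probable_error_factor)
                - unit_normal.cdf(-probable_error_factor))

print(round(probable_error_factor, 5), round(central_area, 5))
```

An error is therefore as likely to fall within the probable error as outside it, which is the sense in which the name is used.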
2. The general rule followed by statisticians when considering probable errors is that, unless a result differs from that expected by two or three times the probable error, it is not safe to assume that the particular case differs from the expected result.
3.
We
most simple
case,
and
find the
mp
times
when
m trials
made and p
q
of
its
is
whole series is given by m Taking moments about +q q+ (ft + q_) the centre of the group represented by p m the first moment ~ is mp m + m(m l)p m ~ 2 + + mq m = mq(p-\-q) m ~ = mq.
and
failing.
l
The
.
m =p m + inp) m ~
l
is
q^
+
.
+m q m
2
= mp m - q + m{in l)p m t
l
-.n
o 2
o 2
m(m 1) (m 2)
^-y+
a
+mqm
1
+ m(m Y)pj m -
q*
+m(ml)q
= mq-\-m(ml)q
The second moment about the mean is, therefore,
$$mq + m(m-1)q^2 - m^2q^2 = mpq,$$
and the probable error is $.67449\sqrt{mpq}$.
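The moments just obtained can be verified by summing the terms of the binomial directly. The sketch below is illustrative only: the function name is hypothetical, and the values m = 200 and q = .007 are chosen merely as an example.

```python
import math


def binomial_moments(m, q):
    """First and second moments of (p + q)^m about the term p^m.

    The term containing q^s is at distance s from p^m, so the moments
    are sums of s and s^2 weighted by the terms of the expansion.
    """
    p = 1 - q
    terms = [math.comb(m, s) * p ** (m - s) * q ** s for s in range(m + 1)]
    first = sum(s * t for s, t in enumerate(terms))
    second = sum(s * s * t for s, t in enumerate(terms))
    return first, second


m, q = 200, 0.007                      # illustrative values
first, second = binomial_moments(m, q)
variance = second - first ** 2         # second moment about the mean
probable_error = 0.67449 * math.sqrt(variance)

print(round(first, 4), round(second, 4), round(variance, 4), round(probable_error, 4))
```

The three quantities agree with $mq$, $mq + m(m-1)q^2$ and $mpq$ respectively, apart from the rounding of floating-point arithmetic.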
and
it
be advisable to see
its
application to a few
examples.
It
born
is
number
of female children
born as 1,050
1,000
133
in
'
is
If 51,350 out of
in a certain
community, would it be safe to base any theory connected with the variation from the usual probability on
statistics
?
the
The expected
is
result
is
51,220,
and the
probable error 1
difference
+ J^ = 103*9.
The
as this
between the actual case and the expected result was is only one and a quarter times the probable definite conclusion can be based on the divergence
result.
actual
number of cases had been 10,000,000, and the number 5,135,000, then the probable error being 1,039,
the
it
sufficient
evidence for the conclusion that the ratio 1,050 1,000 did not
the particular case.
the probability of death within a year
is
#
5. If
is
007, the
-67449\/200 x '007
possible
to
and
it
would,
therefore, be
if
2*2
number
on risk
to treat
for a year.
That
is, it
on the assumption that the number of cases is about 200 and the average age is such that '007 might be taken as the probability of death in a year. It has also been assumed that it is correct to treat each class as if it were subject to its own rate of mortality, and had to be treated independently of the
rest of the business
6. It will
;
this
is,
remains constant, then $\sqrt{mpq}$ has its largest numerical value when $p = q = \tfrac12$, which shows that an office will generally find that if it has two classes of
be noticed that
if
size, and one is subject to a higher rate of mortality than the other, the former will have the larger actual deviations from the expected number of claims, because the
equal
end
7. If
we now
individual experiment,
is
clear that
if
ys
is
the theoretical
frequency in the sth group that would occur in
probability of a particular one of the
stli
cases, the
group
is
H =p, and
Then
will
the probability of
falling elsewhere
is
l~=q.
group
in
m trials
of this
be given
by
{jp
+ q)m
and
its
standard
deviation
^=^/TO^=V m m(1 -y
where
y' s
=^ys, and
is
accordingly
the proportion
of
y8
"N individuals.
in the typical group of m out of In actual practice we have, however, only the sample, but since the sample is only likely to deviate from
we can
frequency,
and write
W -^
1
(i
where y s
8. If
is
now taken
group in
the sample.
is
others must on the average be too small, and this shows that
to
be investigated
is
the
amount
of
the
between
deviation y s and y s >, or between deviations in the frequencies of the sth and sth groups.
Let
8^= deviation
in the sth
yi+y2 +y3 +
+>/*+
ys>+
-\-yn-m
Now,
if
the sample has given too large a value in the sth group,
it is
among the
other
groups in the proportion of their relative frequencies. This assumes that deviations are only due to random sampling, not to defective
measurement or
latter causes, it
classification
if
might be reasonable
drawn from adjacent groups, but it is necessary to confine our attention to errors due to random sampling, and we then have
%fr=-8y.x
and
This gives the
Sy s ,Sy s
-^~
my
1
s
=-^ m
*"
ys\m
effect of the error in the sth group on that in the group on the assumption that the error in the sth group is the cause of all deviations to give effect to the fact that all groups contribute we must sum the expression for all samplings, and obtain
s'th
m lys/m
Hence
(i-)
^y^y^ym--
yj
^
is
The only
in
summing
of a
(ccy) -moment
should
immediately
disappear.
9.
To find
<rh
observations.
Measured from a
fixed point
we have
S(y) where y
is
ac.
mSh = S(.vSy)
Squaring each
side,
we have
for
where
S'
is
s'.
the
sum
all
is
not
equal to
By
dividing by the
number
2 2
summing
for all
= S OV oW 2 S
) 2
"
'
xs00s>o-y8(ry8 -rym )
using
(i.)
and
(ii.),
&(xs,ys,)
m
whole distribution about the
of
where
fixed
m/x' 2 is
the second
// 2
moment
2
of the
point.
sample. But $\mu_2 = \sigma^2$, the square of the standard deviation of the sample. Hence

$$\sigma_h = \frac{\sigma}{\sqrt{m}} \qquad \text{(iii.)}$$
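Result (iii.), read here as the standard deviation of the mean being the standard deviation of the sample divided by the square root of the number of observations, leads at once to the probable error of a mean. The following is a minimal sketch under that reading; the function name and the figures in `profits` are hypothetical and are not taken from the text.

```python
from math import sqrt
from statistics import pstdev

def probable_error_of_mean(values):
    """Probable error of the mean of a sample: .67449 * sigma / sqrt(m)."""
    m = len(values)
    # pstdev is the square root of the second moment about the mean.
    sigma = pstdev(values)
    return 0.67449 * sigma / sqrt(m)

profits = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical yearly figures
print(probable_error_of_mean(profits))
```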
10. This last result
is
is recorded and the mean used to compare the particular experiment with another of a like kind. Is an actual difference between the means due to some cause other than random sampling ? A practical application would be the comparison of the average profit from various classes
large
number
of cases
of business for a of
number
of years.
the
profits
taking the square root of the second moment about the mean and dividing it by the square root of the number of years the quotient would give $\sigma_h$ of (iii.). It is only by using the
;
.
that
chance or to some cause requiring removal. In a similar way it is possible to find the probable errors of the moments and constants, but this leads to the more theoretical parts of the subject, with which it is unadvisable
11.
to deal in a
to call
book
of this character.
It
is,
however, necessary
function
in
correlation
statistical
owing
to
the
importance of that
work.
Its value has been shown* to be $.67449\,\dfrac{1-r^2}{\sqrt{n}}$.
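The formula for the probable error of r, taken here in the form .67449(1 - r²)/√n, is easily evaluated. The sketch below applies it to the endowment assurance figures of Chapter VI. (r = .25445, n = 2870); the function name is an assumption of ours.

```python
from math import sqrt

def probable_error_of_r(r, n):
    """Probable error of a correlation coefficient r found from n cases."""
    return 0.67449 * (1.0 - r * r) / sqrt(n)

pe = probable_error_of_r(0.25445, 2870)
print(round(pe, 4))
```

Since n enters only through its square root, four times the number of cases is needed to halve the probable error.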
We
have
already found in Chapter VI. that the coefficient of correlation between the age at maturity and the unexpired term of
endowment assurances
definitely,
is
-25445, but
this
it
is
on the strength of
information,
coefficient represents
any
we have
seen
how
arise purely from our having taken a random sample. The probable error of r is found by inserting 2870 for n in the formula given above, and we have ±.0118 as the probable error. It is customary to show this by writing r = .25445 ± .0118. In this case, therefore, the probable error is so small that the result is reliable, but it sometimes happens, especially when only a few cases have been
* The proof is given in Phil. Trans. A., vol. cxci., pp. 231-241. It is, however, of a complicated nature and unsuitable for insertion in the present work. This last remark also applies to the proof for the probable error when the fourfold table is used.
considered,
that
the probable
error
is
large
enough
to
make
result
r
it
produced.
for instance,
it
= .0827 ± .0621,
would
definitely
chance.
12. In using the fourfold table the probable errors are larger, as
is
rougher, and
gives as the
becomes complicated.
probable error of
r,
to,
G7449f(fl + <Q ( C +
ft)
xV'n \
'
4ft 2
+ ^2
(a
+ c)(d+b)
ri
2
^
ab
(a
+ b) (d + c)
ft
ad be
n
2
cd
n
2
ac
bd] i
n2
J
where
v=
2Vl r
1 -=
f
_=
e -(fe
+fc 3 -2rfcfc)/2(i-r)
h =?*
Ji-r>e-& a dx
o
^2ic]
l-rh
fa
_1
Jl-r* e -1fc* fa
v2ttJo
and it is assumed that the fourfold table is so arranged that a + c > b + d and a + b > c + d, where a, b, c, and d have the meanings indicated on p. 126. The numerical work for finding the probable error of r for the example in Chapter VII. is as follows:
1
r
h ~ rk
fa
-t=\ v 2irJ
Vl -'"
e-^dx
_ v
7
r -5(3821
e-&dx= -21505
JJttJ
by Sheppard's Tables.
1
^
^-re-&*da;=
fa=
-=
1
-32230
v= x
= 2^1-^
;
e --S6744
2ttx -63900
10462
1
.-.
log
= 1-98039,
log
^ = 1-33254,
is-
and
logifr2
= 1-10171
A
.*.
G744Q 10462
^__l. j
^4703
[
-02283 + -00145
+ '00479 +
00252
- -00408- -01015 1*
J
= -0124.
by
13. It
is
The latter gives ± .0040 in the above case, which is a good approximation to ⅓ of .0124.
67449
CHAPTER
IX.
Fit.
When
made
reasonable.
graduated values of any table on which they may have been working rough checks which have amounted to a comparison of the totals in various groups and an inspection of the changes of sign in the differences between the graduated and
ungraduated
figures.
needs, however,
The problem of the goodness of fit more accurate treatment, for inspection, even
calculation of
mean
and
it is
the
mean
impossible to
excesses
are in
any way
is
balanced by equalities in the rest of the graduation. A test required which will give some measure of the disagreement
judged by the whole graduation. Now, if there be N observations distributed in n + 1 groups, the numbers in the groups being $m'_1, m'_2, \dots, m'_{n+1}$, we have to find a criterion to enable us to decide when the series $m_1, m_2, \dots, m_{n+1}$ will be a legitimate graduation. We may clearly take a legitimate graduation to be one in which the observed values ($m'$) do not differ from the theoretical ($m$) by more than the deviations that would be expected in
as
2.
. . .
random sampling.
What we
require
to
know
is
not the
* Generally calculated as i^ripq, which gives approximately the average magnitude of the deviations irrespective of sign from the mean result. G. F. Hardy, Journal of the Institute of Actuaries, xxvii., pp. l!14, et seq.
probability that the particular series of
m"s
??z/ s,
or
an equally
To appreciate this we may take a simple case, that of a coin-tossing experiment, and suppose that a coin has been tossed six times and come down 4 heads and 2 tails. The "graduation" we make is 3 heads and 3 tails, and to test it we require to find the probability of obtaining a result as unlikely, or more unlikely, than the observed one. This probability is the same as that of getting any one of the following results:

6 heads and 0 tails
5 heads and 1 tail
4 heads and 2 tails
2 heads and 4 tails
1 head and 5 tails
0 heads and 6 tails
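For so simple a case the probability can be found by direct enumeration. The following sketch assumes a fair coin and counts every result whose chance does not exceed that of the observed one; the function name `prob_as_unlikely` is ours.

```python
from math import comb

def prob_as_unlikely(heads, tosses):
    """Chance, with a fair coin, of a result as unlikely as, or more
    unlikely than, the observed number of heads."""
    observed = comb(tosses, heads)
    # Every result has chance comb(tosses, k) / 2**tosses, so results
    # may be compared by their binomial coefficients alone.
    ways = sum(comb(tosses, k) for k in range(tosses + 1)
               if comb(tosses, k) <= observed)
    return ways / 2 ** tosses

print(prob_as_unlikely(4, 6))   # six tosses, 4 heads and 2 tails
```

The six results listed above account for 1 + 6 + 15 + 15 + 6 + 1 = 44 of the 64 equally likely sequences.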
such probabilities
even when the simple probabilities leading to the deviations are known, in any but the easiest cases but when
directly,
;
we do not know
our inability to
actually arisen.
is
introduced, owing to
from a priori reasoning which of the more or less likely than that which has
would, for instance, be impossible to say,
It
when 20
dice
were being thrown, whether the probability of getting ten "sixes" or more was greater than that of getting two "sixes" or fewer; but this is an extremely simple case compared with
numbers have to be considered.
3. If it is assumed in any measurement on one subject that the deviations from the mean take the form of the normal curve of error (Type VII.), and it is required to estimate the chance of obtaining deviations greater than a certain value (t, say), it will be necessary to sum all values of the normal curve beyond t on each side of the mean, i.e., take

$$\int_{-\infty}^{-t} e^{-x^2/2\sigma^2}\,dx + \int_{t}^{\infty} e^{-x^2/2\sigma^2}\,dx = 2\int_{t}^{\infty} e^{-x^2/2\sigma^2}\,dx$$

and divide the result by the area of the whole curve, i.e., by
the total deviations.
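For a single measurement the quotient just described is the two-sided tail area of the normal curve, and it can be evaluated without tables. The sketch below assumes the complementary error function of the Python standard library; the function name and the default of unit standard deviation are ours.

```python
from math import erfc, sqrt

def two_sided_tail(t, sigma=1.0):
    """Chance of a deviation numerically greater than t, i.e. twice the
    area of the normal curve beyond t divided by the whole area."""
    return erfc(t / (sigma * sqrt(2.0)))

# A deviation equal to the probable error is exceeded in half the cases.
print(two_sided_tail(0.6744897501960817))
```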
two
it
and
it
is
normal curve
is
z=z
with which
lcr i"
"i"2
o a
we have already
performed for compared with the total. If there are n measurements, it becomes necessary to deal with a function of n variables, and this will give the reader a slight idea of the problem from the mathematical point of view, and suggest that he will expect the quotient of two n-fold integrals to give the probability. The next step is to reduce these n-fold integrals to the form of ordinary integrals, and it has been shown* that the result
I
The integrations must be both variables from t and t' onwards, and
dealt.
p_J
~ e~&* xn l dx
X
e
-},x*
4-
xn
cJ
I.
is
reached.
In
this
expression
the
function depending
indicated
test for
on
n variables
from which
the
is
position that
distribution, the
measure of the probability P can be obtained, a must be found from the statistics of the particular ^ graduation, and in the paper to which reference has already been made its value is shown to be such that
4. Before a
value for
mr
square of the
Karl
Pearson
"On
variables
such that
it
Random
p.
corresponding to \ 2 horn 1 to
be found in Biometrika, vol. i., values of n + 1 from 3 to 30, 30, with a few additional values and auxiliary
will
P P
for
all
positive differences, increase the improbability of the system,
while a ratio
is
an error of 15 in a group of 20 would be very large, but in a group of 1,000 would be negligible.
;
group
for
5.
The
fit
and
its
application
may now
be dealt with.
the facts representing the graduated and (1) If ungraduated figures are only available in groups, then the value of the probability by the test will, as a rule, be lower as the number of groups is increased. This practical point should be borne in mind, as it sometimes happens that graduations are tested in groups of, say, 5 years of age; but the graduated
figures for individual ages are then used unreservedly, though,
strictly speaking, they may be no better than interpolated values.
(2) The test assumes a distribution, and would not be applicable if the numbers were a series of ordinates, though
fit if
number
of ordinates
had been
given in the
series.
(3) The tails of the experience will be very small and never fit exactly. "We ought to take our final theoretical groups to cover as much of the tail area as amounts to at least a unit of frequency in such cases." (Phil. Mag., footnote, p. 164.)
(4)
If the
number
will
of observations be multiplied
by
/,
sa}^,
and
value
the
of
deviations
2
and χ² the test will show that the fit is worse. This may seem strange at first, but a little consideration will show that it is
figure,
multiplied
/,
theu
the
number
;
smoother
then,
tionally the
same
in
distribution but
follows
that
the
one with greater frequency is less probable than the one with less frequency. The probability of a result as bad,
or worse, than three heads and one tail in coin tossing (two heads and two tails being the theoretical result) is .625; but that for 3 × 2 = 6 heads and 1 × 2 = 2 tails is .289.
(5) I have found, in applying the test, that when the numbers dealt with are very large, the probability is often small, even though the curve appears to fit the statistics very
closely.
The explanation
is
we
of
amount
is
concealed in a
by the roughness
of the data.
The increase
number
of cases observed
of view,
is
made
up of more than one frequency-curve; but a certain curve, approximating to the one calculated, predominates.
(6) It is sometimes thought that the introduction of additional constants must necessarily improve the fit of a curve. It may do so in some cases, but it is quite possible to take a curve with ten constants and find it gives a worse result than another having only three.
(7)
It
may sometimes be
slightly worse
reasons
agreement than another for simplicity, or for such as those which prompt actuaries to employ
;
Makeham's hypothesis
best-fitting curve in
but, as a rule,
case, that
is,
it
is
any
highest value of P.
6. In a recent paper " On the Comparative Eeserves of Life Assurance Companies, &c." (Journal of the Institute of Actuaries, xxxvii., pp. 458-9), Mr. King remarked that it M Model Office for the M ; and it is permissible to use the
will
what
distribution
:
if
the
HM
be
Age-Groups.
o^_ -HM
+
Square of
M -H M
Group.
HM
6-97 17-75
QM
+HM
20 25 30 35 40 45 50 55 60 65
2101
18-41 13-82 9-45
623
3-51 1-97
85
33
02
41
2-70 3-07
01 77 1-01
15
00 04
11
22
1-16 1-93
77 45
1-60
30 21
100-00
100-00
6-10
640
X3-39
There are ten groups, and χ² = 3.39, and the table in Biometrika gives P = .964295 and .911413 when χ² = 3 and 4 respectively.
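Where the printed table is not at hand, P may be computed directly from χ² and the number of groups. The following is a sketch only: it assumes that the table is entered with the number of groups and that the integral reduces to the usual finite series for one fewer than that number; the function name `chi2_tail` is ours.

```python
from math import erfc, exp, pi, sqrt

def chi2_tail(chi2, groups):
    """Probability P of a system of deviations giving chi-squared as
    large as, or larger than, chi2 when there are `groups` groups."""
    df = groups - 1            # assumed convention: one less than groups
    if df < 1:
        raise ValueError("at least two groups are required")
    half = chi2 / 2.0
    if df % 2 == 0:
        # Even case: exp(-chi2/2) * (1 + h + h^2/2! + ... ), df/2 terms.
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return exp(-half) * total
    # Odd case: normal tail plus a finite series in odd powers of chi.
    chi = sqrt(chi2)
    series, term = 0.0, chi
    for k in range(1, df, 2):
        series += term
        term *= chi2 / (k + 2)
    return erfc(chi / sqrt(2.0)) + sqrt(2.0 / pi) * exp(-half) * series

print(round(chi2_tail(3.0, 10), 6), round(chi2_tail(4.0, 10), 6))
print(round(chi2_tail(3.39, 10), 4))   # the intermediate value of the text
```

Computing the intermediate value in this way avoids interpolating between the two tabular entries.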
There
is,
it
has to
be decided
if it is sufficient to test
new
policies.
500
would reduce the probability to about .05, which means that in only one case out of twenty would a random sampling lead to a system of deviations from the $H^M$ as great as that shown by the $O^M$. This result will remind the student of the great danger of dealing with percentages without considering the actual number of cases investigated. Mr. King's other table, which is of greater importance in his work (policies according to attained age), shows a much closer agreement,
as
is
In a paper on Makeham's formula {Journal of the Institute of Actuaries, xxxv.) Mr. Calderon gave some graduations of F mortality table, and on pp. 188 and 189 his results the are summarized in a form which is convenient for applying
The probability if χ² = 30 is .051798, and if χ² = 40 is .003272. The odds against the best of the three graduations must be 30 to 1, which shows that Makeham's formula is unsuitable or the methods of application unsatisfactory.
the test.
C, give 35*11, 34*02
His methods A,
2
and
for 20 groups.
Type I. is about *98. Type II. gives '7, Type IV. f, and the sums assured in Type VII. '2, and the reserves give
for
'9.
which reference is necessary is the which a good fit ends and a bad one begins. It is impossible to fix such a value. We have merely a measure of probability for the whole table, and if the odds against the graduation are twenty or thirty to one the result
to
actual value of
at
is
unsatisfactory
if
is
not
unreasonable,
but
exact
value
when
a result
must
be discarded cannot be given. As, however, it is clearly impossible to imagine any test which can fix an absolutely definite standard, there is no reason for objecting to the particular method because it fails to do so.
CHAPTER X.
The Theory of Contingency.
1.
totals
on
p.
107,
table,
such
Thus,
as
that
in
of
each
column
the
proportion
the
totals
of
the
rows.
first
column would be
...
4
)
9
172 2870
...
"
56 no ~ X ...> 2870
and the remaining part of the table would be formed in a similar way. A moment's consideration of the definition given at the beginning of Chapter VI. will show that such a method of formation must necessarily give the required table, because, since each column is formed in proportion to the total, the means of the columns must all be the same as the mean of the total, which shows at once from the definition that no correlation can exist in such a table.
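The formation of a table exhibiting no correlation may be sketched as follows. The marginal totals used are those read from the grouped table given later in this chapter and should be treated as assumptions, since that table is imperfectly legible here; the function name is ours.

```python
def no_correlation_table(row_totals, col_totals):
    """Frequencies that would arise with no correlation: each column is
    divided in proportion to the totals of the rows."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must agree")
    return [[r * c / n for c in col_totals] for r in row_totals]

# Assumed margins of the grouped endowment assurance table (N = 2,870).
rows = [228, 432, 665, 674, 538, 247, 86]
cols = [89, 584, 643, 1098, 388, 68]
table = no_correlation_table(rows, cols)
print([round(x, 1) for x in table[0]])
```

Every row of the result sums to its row total and every column to its column total, so that the means of the columns are all the same as the mean of the total.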
The table shows the figures exhibiting no correlation in ordinary type, and those exhibiting correlation in small type. Now, if these two sets of figures coincide exactly in any particular case there is clearly no correlation in the table; if they differ slightly there is a slight amount, and if
they
amount
of correlation,
and we come therefore to the conclusion that an alternative method of finding the correlation between two things is by measuring the difference between the figures in the actual correlation table and those that would have arisen if there had not been any correlation. In the last chapter we discussed a method of measuring the goodness of fit (or amount of agreement) between two sets of figures, and this suggests that we might calculate χ² by squaring the difference between each pair of figures in the table and dividing the result by the frequency when there is no correlation. The reason for choosing the figure from the table with no
correlation as the divisor
is
that
it
may
renders
it
Unexpired
tf rm of
at Maturity.
Total.
Endowment
Assurances.
30
35
40
43
50
55
60
65
70
75
0-4
5-9
i
2
1 2
1
1-1
11-4
26
12-5
6
21-4
14
7-6
6
1-2
2
"5
56 172
4
1
1-0
2
37
6
35-0
62
38-6
36
65-8 23-2
40 127 237
271
22
3-6
10-14
15-19
6
2
26
9
9-3
17
87-8
117
145
96-8
99
165-4 58-4
52
9~0
8
1-2
1
432 665
674
1-4
3
9 9
1
39
6
14-4
24
13-9
11
1-9 1-9
1
S4
20-24
25-29
1-4
4-0
11-6
3
91-1
7S
14-1
20
11
'5
8
4
1
32
1-5
5
11-6
9
109-5
90
120-6 205-9
231
72-7
71
11-2
11
1-4
3 7 2 2
538
247
77 8
1
30-34
35-39
5-3
1
50-3
11
55-3
49
1,7-2
6
94-6
127
33-4
49
5-2
8
1-7
2
15-7
1-6
2
29'4
49
10-4
22
1-6
"2
40-44
45-49
0
'0
0
'<>
0
:
1-8
2
31
4
1-1
-2
1
-2
Total
17
62
584
643
1,098
388
60
2,870
3.
ir
As
will
it is
clear that
will give a
coefficient of correlation r;
measure of the correlation, it and the and the following proof shows that
here
<-
*2
and the correlation table can be approximately represented by the normal correlation surface.
with no correlation
Using the same notation as that of Chapter VI., the frequency is given by
J.Tr(T]<r2
is
S=
^vl_ rv
+
+
J
N
d?
:c-
2rxy
y" \
l( r2
Then
^f
-f oo oo J
-<f=^^
-IN
r+
r+
-i/^y^-fsa+y.y^i
where
ar
and y
+1
\
U-f*/
(1-r2) 2
by No.
(vi.)
of
Appendix III.
r2
"1
or
?
"Vr^
L 2
4.
The
may
be considered a
:
little
more
closely, as it leads to
(1)
It
between
and
+ 1.
(2)
be affected by the order of the columns (or rows), it will be seen that it is permissible to interchange them, provided, of
<$r
will not
column
(or
row) be moved at
once.
(3)
will
not necessarily be
used, because
infinite
(4)
if a very small number of groups by using the integral calculus an number of groups was assumed.
We
also
is
measure
large
of
the
goodness of
fit
between
the
no-correlation figures, a very groups gives undue prominence to the chance deviation, due to the use of a
correlation
and
number
of
of r
found from
grouping
may
differ
Too
fine a
may
one.
These conclusions are borne out by practical work, and any student who cares to go into the subject can find the value of r by the two methods from a large table, using various groupings, and he will see that the best agreements are obtained when the grouping is neither very fine nor very rough. It should, however, be borne in mind that unless the correlation table takes the form assumed in the proof, an exact agreement between the two methods cannot be expected;
5.
it
is,
-\/
v 1
+ <pof
coefficient
seems to
me
that
if
the
difficulty about grouping could be overcome, the coefficient of contingency would be more useful than the coefficient of correlation (r).
6.
The following
grouped
more conveniently
Unexpired term of
Endowment
Assurances. 30, 35,
40
&
45
50
oo
60
65
70&75
55
2
0-9
10-14
15-19 20-24
7-0
14
46-4
ss
511
42
87-2
54
30-8
28
228
432 665
134
28
87-8
117
96-8
99
165-4
127
58-4
52
10-2
9
20-6
38
135-3
145
149-0
155
254-4
237
89-9
84
15-8
11
20-9
4
1372
133
151-0
1G7
257-8
271
91*1
7S
16-0
21
674
538
25-29
16-7
;>
109-5
;>0
120-6
123
205-9
231
72-7
71
12-6
14
30-34
35-49
7-7
l
50-3
11
55-3
49
94-6
127
334
49
5-9
10
247
86
2-7
175
19-2
s
32-9
51
11-7
26
2-0
1
Total
89
584
643
1,098
388
68
2,870
this
is
found
to
be
gives <*=
'2872.
^ ='0899,
*03
;
and the
coefficient of
contingency
is
)rr=
Vi5 i+<f
is i
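The arithmetic of the coefficient of contingency may be sketched as below, taking φ² as χ² divided by the total frequency and the coefficient as √(φ²/(1 + φ²)). The function name and the small fourfold table are hypothetical and serve only to show the steps.

```python
from math import sqrt

def contingency(observed):
    """Return chi-squared, phi-squared and the coefficient of contingency
    for a table of observed frequencies (a list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency with no correlation; every margin must exceed 0.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    phi2 = chi2 / n
    return chi2, phi2, sqrt(phi2 / (1.0 + phi2))

chi2, phi2, coefficient = contingency([[30, 10], [10, 30]])  # hypothetical
print(chi2, phi2, round(coefficient, 4))
```

With φ² = .0899, as found above, the same final step gives a coefficient of .2872.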
7. The probable error of the coefficient of contingency may be taken as approximately one and a third times that of r.
8. Though we have dealt with the theory of contingency
its
mind that
in
use
is
when
of quantitative
instance, as
application
may
may mention
that
2 x from
the
column are
7'0,
9. We may now, in conclusion, refer briefly to some of the practical applications to which the theories of correlation and contingency can be put. Some actuarial applications have already been made, such as the investigation by Professor Pearson and Miss Beeton into the inheritance of duration of life (see Journal of the Institute of Actuaries, vol. xxxv.,
pp. 112,
et
seq.j
and 458,
et
seq.
and Biometrika,
the
vol.
i.,
part
wife
I.)
and age
man and
mentioned case seems to suggest that it might be possible to apply the method in connection with the valuation of pension funds, while we have already noticed that it is possible that it might be of use for checking average ages, &c, in
endowment assurance
valuations.
APPENDIX
I.
USEFUL CONSTANTS.
$e = 2.71828\ 18285$
$\pi = 3.14159\ 26536$
$\log_{10} e = .43429\ 44820$
$\log_{10} \pi = .49714\ 98728$
$\log_{10} \sqrt{\pi} = .24857\ 49364$
$\log_{10} \dfrac{1}{\sqrt{2\pi}} = \bar{1}.60091\ 00657$
$\log_{10} e^{-1/12} = \bar{1}.96380\ 87932$
APPENDIX
II
AND Γ FUNCTIONS.
B(m,ri)
=
=
xm
~l
(lx) n
-
-l
dx
Jo
r( 1 >)
l
e- :cxi
d,e
I.
\x^- e- x dx=i
e-
ar
^-
+(p l)\xP-*6-*da:
vanishes
'
,
-3
-1
when
a?
=00
it
can
be written
and
evaluation
ch. xiv.)
forms
(Edwards'
Diff.
Gale,
xP- l e~ xdx={p 1)
Jo
aj*""%"*daj
Jo
I\p)=(p-l)T(p-l).
If
7
be an integer,
JT(^)
=|jp 1.
tt
II.
r{)n)r(n) ;
*
,\
r(m)=
and
r(m)e--z n
~
1
c^W" ^
1
Jo
e~ e
l+x} z m+n ~ l dx
Jo
But
if
g(l
f
+ x) =ij, we
get
T
p-zX+xtym + n-XJy
C
I
x
fjUuiii+ii-LI,,
r(m)r(n)=r(m + n)
But putting
we
d*
l+#=
in this integral,
obtain
fl_ z )m+n
and
dz
which reduces
IIITo
prove
r(4)=VV.
We
2
Jo
e~''dx-=-\/ir,
e-*"<fc=f e-*a-*<fe=r(i)=vV.
-'0
Jo
For statistical work a table of Γ(x) or log Γ(x) is required, and Legendre has given a table of the latter to twelve figures for values of x between 1 and 2, from which log Γ(x) can be found easily, provided x is small. When x is large, log Γ(x) can be approximated to. The
best
known approximation
is
V(x + 1) =\/2irxx x 6- x e ~ ux *
or
log 10 r (as
+ 1) =log
10
s/2ir +
{as
!j)
log 10 x
(x +
8.
~\ log
10
and
it
is
To show how
used,
and
also
how
the approximation
Algebra, vol.
pp. 308,
&c,
approaches
the
true
value,
the
following
table
has been
prepared
True.
Approximate.
1-372
2372 3372
4-372 5-372 6-372 7-372 8-372 9-372 10-372
1-948975 086329 461444 989332 1-630012 2-360148 3-164424 4-032009 4-954838 5-926670
3164430
4-032010 4-954837 5-926669
-000001 -000001
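The "True" values of the table can be reproduced without reference to Legendre. The sketch below assumes that the tabulated quantity is the common logarithm of Γ(x) for x = 1.372, 2.372, ... 10.372, and uses the natural logarithm of Γ supplied by the Python standard library; the function name is ours.

```python
from math import lgamma, log

def log10_gamma(x):
    """Common logarithm of the Gamma function for positive x."""
    return lgamma(x) / log(10.0)

for k in range(10):
    x = 1.372 + k
    # Negative results correspond to the characteristic-and-mantissa
    # form of the text, e.g. -0.051025 is written as bar-1.948975.
    print(f"{x:6.3f}  {log10_gamma(x):10.6f}")
```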
six-figure
is
table
of
Legendre
given on pp.
by
abridging
APPENDIX
III.
The Integration of some Expressions connected with the Normal Curve of Error.
1.
On
e~ x 'dx=. V7r
en
(i.)
(ii.)
is
symmetrical, we have
If
00
(iii.)
we integrate \x2tle~
'~dx
by
parts,
we have
r
I^n
and inserting the limits
j.-ji/i!
and
+x
:
we have
r+
J
-a
r+c
x2,l + 2 e- za dx
2-r-lJ_ x
connection
(iv.) v J
r^_2gyr
Then
2!
Z e
2(1-,-)
U,
# a
2
j.
_
e
//-'(I-/'- ) <r,-
2(1-/-)
Hi-ax
cr,
"
"I
=Z e
Then
j
jr_
2<r 2
a
2(l-i-=> x *
'
(ii.)
2dfo?d^=
^ v2tt(1
J
r-)<nc-ii-i-<r*"-dy
(v.)
00 J oo
oo
v^l
27ro- 1 o-o
r~~
3. Using the same
if
method
be shown that
ac> I'2
f+OD
I
f+
2g-KaaB -2toy+cy 3
)^^ ==
(v i.)
for the
-2V -^}
l
( "t
'- i)
and
if
to x,
we have
and
if
ac
2 is
positive,
we can
respect to
y and have
411acb+x
4.
\/ ac
b2
r+x
We
will
now
find
J
zxydxdy
ao J ao
Proceeding as in
r+oo
(v.),
we have
xyz
e
r+co
zxydx=
J -X,
oo
where
3la
:
X=a?
"
.
0(
yc--'-
-^"
y/'tri
"'
_
e
x2(i-r=)c
Jx
e~ x
+o
because by
r
(iii.)
~ !
XdX
is
zero
=*o
But, by putting
rV2ir(l r2)fe-y
and using
l2
*'-'-
=0
in (iv.)
(ii.),
we have
j.
*
j
zxydxdy=Ztfrfrr2vr
= ^S(T
because by (v.)
(T2r
(vii.)
'"2^
^P
5.
We may
now
"
To
find a value of r
to in Chap.
VII
2irvl
where
rf,
2
j7i Ja-
N, h and &
}
are
known."
-
e-fo
+.v" -2rey)/2(i
-r-) XJ say
(a)
and expand
it in
where
Wm
=p^
+//
)^_ J ^
differential
coefficient
/dnXJ\
( 7)
Now
respect to
take
r,
the logarithmic
of
with
and we have
(pt?+y2 2rxy) d
dr
(I
ldV
r2)-l
dr
= _(^ +
dV
,/r
//
(1
_ r2 )-l iry +
.( 1
_r
o )
-l
and we
have
+ ;ry (w w + w O 1) ?*
.
._
2)
Hence
^o=I
ux xy
(8)
- 6#a + 3) <y - 6/ + 3)
(8) are
t'
xv n _
( 1
y _ 2
....()
w n =yiv n -i (nl)w n _ 2
and we can, therefore,
re- write (j3) as
Ltj
7T
i
e
Z7T
-^
!(1+
^v+^+...)
1!
^1
(?)
oo
with respect to
.r,
wn
x we have
t
htt
/,
i
1-rr
lJ h
I
i r
1
!
^7rJ
/,
;,
+ r-=
w
!
e-h* v n dx+.
J
/,
=
where V
is
v^tt
-^fv. +
[
^
1!
+ ... +
^+..A
n\
|
M
f
written for
v n e-*'~dx.
v2irJJi
Now
that
y from & to
oo
remembering
e-%y*w n dy,
Yn
for
we
see that
r TJdxdy
V,
4
general term
is
n\
evaluate
OT
and
W^.
From
of v n
is
(e) it
xv
n(n-l) xn ~ 2
j-|
+ n(n-l)(n-2)(n-3) x n~4
&c.
(7?)
Now we
notice that
dvn
-j-
ax
nvn -i
dx
by
(c)
=*_!
#e
-'
v n -idx
- **
ax
*^tlie proof
is
as follows
] (
,,-(-
1)
, .^.,1
"-
,"- 2)
-',
":;
^-^-^-^-*> .^\.,
__
-(,,_!).,.,.
(-l)(-2)(-3).,^_
n(n +~
-*"
//(-])
ji
arn
--
and integrating the latter integral by parts we have
Now, writing
d
H for -= e-^
V2w
',
and
K for -= r-~^ \
:
we have from
(a)
V2tt
N
27rJ
/,
fc
\\n\
=*
or
(b
+ d)(c + d)
*(r n nwyF ,
1 ),
N
;l
0^-i)^,J
remembering that
N = a + h-\-c + J,
we write
\
W be
"/**,
= ,+
'J
*(**-3)*(**-3)
+ ^7*(^-10A + 15W^-10P+15)
2
&c
(viii.)
APPENDIX
IV.
deals with
not been
generally
directed
that
of
the
system
insufficient.
It
is
the Statistical
Societ}^ this
year, Professor F. Y.
Edgeworth, who
Pearson's
points
out that
subjected
to the
With
we may turn
suggested methods.
I.
Method of Translation.
by using y=-t/W3'. This merely conceals the use of an absolutely general expression, and It is hard one still requires to know what forms are best for f(jx) why the normal curve should be held to be anything more than to see For a fuller account of a first approximation to a general result. this method, the reader may refer to F. Y. Edgeworth, Journal of Statistical Society, vol. lxi., pp. 675-689, or J. C. Kapteyn, " Skew Frequency-Curves in Biology and Statistics," Groningen, 1903.
theoretical basis, graduation
might be
II. The use of half one normal curve for positive and half Obviously, there is another normal curve for negative frequencies.
meaning
is
empirical
so too, but
The method cannot give suitable curves for graduating the examples of Type II. or Type III., nor a curve rising abruptly from the
axis of x.
III.
series
y=A <(>) + A
vol.
3 </>"'0r)
+ A^ ( +
iv
...
where
(#)=
j= e-C*-'')-/^-. The
Trans.,
into
curve
has
been
given
by
C. Y. L. Charlier, "Researches
the
Meddelanden fran Lunds Astronomiska Observatorium, Lund, 1906, and T. N. Thiele, "Theory of Observations," London, 1903, either in
this form, or as
y=+(*){l + 08[(*-ft)-8e(*-ft)i
.}*
f <f>(.v) ,cf
J a x-{r l
a. .v2
.]-.
Charlier
method of moments
,
notation, 6=/*,
tables of
cr-=fJ>. 2i
&*<}>'"
A s=
and
cr
3
</>
A 4=
(\r),
and writes
&c.
he gives
o-</>(V),
(x)
lv
his formula as
An
we
find
mean
and
44 5772339,
N =43020,
Charlier's
^3
<7
= 012208
'007079, and
using
3!
Central
Age
.r- 44-57723
5a
sum
1
of three
First
Term.
Second Term.
Third
Term
-3-7200
0004
+ 0002
10 15 20 25 30 35
- 3-2500
-2-7800
0020
0084 0277 0734 1561 2661 3637 3986 3503 2468 1394 0632 0229
- 2-3100
-1-8401
- 1-3701 - -9002
40
45 50
55
-4302
+ -0397
5097 9797 1-4496 1-9196 2-3895 2-8595 3-3295 3-79S4
60 65 70 75 80 85 90
0067
0015 0003 oooo
43693
0003 0007 0010 0001 0030 C052 0023 0050 0084 0037 0032 0047 0025 0002 C010 0007 0003 0001
4
14
589 256 92 29 7
2
1
9,156
Thiele obtains the equation by writing e-<*- 6>' ^' = e -^e^'^'e'^'l 2 ^ and then expands the last term by Madaurin's theorem. Charlier in " Ueber das Fehlergesetz," Arkiv for Matematik, vol. ii., Stockholm, 1905, adopts a method which follows that of Laplace. EdgeAvorth gives more than one method of
lyl
This graduation, which shows the method in a suitable application,
which is less probable than the Type IV. graduation as judged by the test in Chapter IX., though the difference is entirely due to the bad agreement in the age 5 group. With the Type II.
gives a result
is
5
^ = 12M-4{o-^(.i)--0081o-VOr)--01882o-
iy
^>
Or)}
which gives a fair result that is improved into an excellent one by omitting the middle term, but negative frequencies arise (see below).
The formula fails to graduate distributions such as our example for Type I. or Example I. of Table I., and for these cases Charlier
suggests the use of the series
IV. y
= FGr- + c)=B
e
-
tf.r)
where
A sin irx |~1 tcx rl
A, A-
'V2
+
)
er x \*
2!(>-2)
Theory of Observations,
values
c
when x
p. 21).
is
Thiele,
is
we assume
based on
&c.
Thus one
examples
solution
2
w=l
and
c.
= 0,
He
while
another, by assuming B!
gives
= B =B =0
finds A, w,
and
only
two points in connection with its a third point to which he application have still to be cleared up is that a statistical criterion depending on the moments does not refer is required to show which series is to be used and which solution
two
because
;
of
IV.
is
to be taken.
Apart, however, from these points the use of a series seems open
to
many
objections
may
moments which
are untrustworthy
The use
of
a limited
number
as those suggested,
may
which
is
is
merely the
first
series
III.,
and
its
inability
is
to
rather
APPENDIX
V.
Ac.
BOOKS, REFERENCES,
The
will,
following
list
It
THEORETICAL PAPERS,
Biometrika
"
(Editorials)
:
&c.
On the
vol.
Biom.,
Biom.,
Blakeman,
"
J.
On
Tests for
Linearity
of
Regression
in
Frequency
Distributions."
Biom.,
:
332, et seq.
Blakejiax,
"
J.,
and Pearson, K.
On
the Probable Error of Mean Square Contingency." Biom., vol. v., pp. 191, et seq.
C. B.
:
Davenport,
" Statistical
New
&
&
Sons;
Hall, 1904.
Galton, F.
" Correlations
Pearson, Karl:
"
in
Skew Variation
Homogeneous Material."
Phil. Trans.
A., vol. clxxxvi., pp. 343, et seq., and a supplement in Phil. Trans. A., vol. cxcvii., pp. 443-459.
"Regression,
Phil. Trans.
On
Form
may
arise
when
lx.,
Indices
used, &c."
Proc. Boy.
Soc.,
vol.
pp. 489-498.
"
On
the criterion that a given system of Deviations from the Probable in the case
Variables
is
of
a Correlated
System of
such that
it
Random Sampling."
Pearson, K_u<l (continued):
"
of
On
the
Correlation
Characters not
quantitatively
measurable."
"
On
Points in Space."
"
On
Phil.
Trans. A.,
"
235-299.
On
and
vol.
Correlation
of
Organs."
Phil.
Trans.
A.,
"
On
a General
On
the
Theory
of
Contingency
and
its
relation
to
Association
Normal Correlation." Drapers Company Research Memoir: Dulau & Co., 1901.
and
the General Theory of
"
On
Skew
Correlation and
Non-
linear
degression."
Drapers"
Company
Research
Co., 1905.
On
Measurements."
vol.
ii.,
Biom.,
vol.
i.,
and
pp.
1, et seq.
Gr.
:
and
On
Pandom
Selection on Variation
,
pp.
229-311.
Sheppard, W.
"
F.
Distribution and
excii.. pp.
On
Normal
'
Normal
Correlation."
Phil.
101-167.
On
the Calculation of the Most Probable Values of the Frequency Constants for data arranged according to
equidistant divisions of a scale."
roc.
Lon. Math.
Hoc,
353-380.
Yule, G. U.
"
in the case of
On
&c,
"
Skew
Correlation."
On
Journal of Statistical
TABLES.
Professor Pearson informs us that he hopes to publish very shortly a volume of copyright tables for the use of statisticians, and it has therefore been decided not to include any tables in this volume except that of log Γ(p).
E.
&
F. N. Spon.
Goodness of
vol.
i.,
Fit of
Theory to
Biom.,
Numbers and
Sums
of
Biom.,
vol.
ii.,
Gibson,
W.
Tables
for
Facilitating
the
iv.,
Computation
of
Probable
Errors.
Biom.,
vol.
Lee, A.: Tables of F(r, ν)
Sheppard, W. F.: New Tables of the Probability Integral. Biom., vol. ii.
Statistical
Methods "
Normal Curve
of Error.
Table of
first six
numbers from 1
to 1,054.
L cos θ, L cot θ.
Table of log Γ(p).
p
1-00 1-01 1-02 1-03
2
9500 7043 4656 2338 0089 7907
5791 3741 1755 9833
3
9251 6801 4421 2110
4
9003 6560 4187 1883 9647 7478 5376 3338 1365 9456 7610 5825 4101 2438
5
8755 6320 3953 1656
9427 7265 5169 3138 1172 9269
6
8509 6080 3721 1430
7
8263 5841 3489 1205
|
8
8017 5602 3257 0981
8772 6629 4553 2541 0594 8710 6883
9
777: J
i
1-10
1-11
7529 5128 2796 0533 8338 6209 4145 2147 0212 8341
6531 4783 3096 1469 9901 8390 6939 5544 4205 2922
1695 0521 9401 8335 7321 6359 5449 4589 3780 3020 2310 1648 1035 0470 9951 9480 9054 8676 8342 8053 7808 7608 7451 7338 7268 7240 7254 7310 7407
112
1-13
1-97
1-97 1-96 1-96
9750 7285 4892 2567 0311 8122 6000 3943 1951 0022 8157 6354 4612 2931 1309 9747 8243
6797 5408 4075 2797 1575 0407 9292
117 118
1-19
1-20
1-24 1-25
1-26
1-30
1-31 1-32 1-33
7974 6177 4441 2766 1150 9594 8096 6655 5272 3944 2672 1456 0293 9184 8128 7125 6173 5273 4423 3624 2874
-174 1522 0918
22-12
137
1
1-38 1'39
1.40
1-41 1-42
1-94
1-94 1-94 1-94 1-94
143
1-44 1-45 1-46
1-47 1-48 1-49
T94
1-94
1-94 1-94 1-94
1585 0977 0416 9902 9435 9015 8640 8311 8026 7786 7590 7438 7329 7263 7239 7258 7317
7115) 1
0992 9442 7949 6514 5137 3815 2548 1337 0180 9076 8025 7027 6081 5185 4341 3547 2802 2106 1459 0861 0309 9805 9348 8936 8571 8250 7975 7744 7556 7413 7312 7255 7240 7266 7334
711
1
0835 9290 7803 6374 5002 3686 2425 1219 0067 8968 7923 6930 5989 5099 4259 3470 2730 2040 1397 0803 0257 9757 9304 8898 8537
8221
7428 5650 3932 2275 0677 9139 7658 6234 4868 3557 2302 1101 9955 8861 7821 6834 5898 5013 4178 3394 2659 1973 1336 0747 0205 9710 9262 8859 8503 8192 7925 7703 7524 7389 7298
72 48
9208 7053 4963 2939 0978 9082 7248 5475 3764 2113 0521 8988 7513 6095 4734 3429 2179 0984 9843 8755 7720 6738 5807 4927 4097 3318 2588 1907
1275
0690 0153 9663 9219 8822 8470 8163 7901 7683 7509 7378
7291 7246 7243
0786 8896 7068 5301 3596 1951 0365 8838 7369 5957 4601 3302 2057 0867 9732 8649 7620 6642 5716 4842 4017 3243 2518 1842 1214 0634 0102 9617 9178 8785 8437 8135 7877 7664 7494 7368 7284 7214
72 \b
5128 3429 1790 0210 8688 7225 5818 4169 3175 1936
0751 9621
2344 0403 525 6709 4955 3262 1629 0055 8539 7082
5681 4337 3048 1815 0636 9511 8439 7420 6453 5537 4673 3858 3094 2379 1712 1094 0524 0001 9525 9095 8711 8373 8080 7831 7626 7465 7348 7273 7241 7251 7302 7395
-7529
'
7242
7277 7353
7 474
'
8544 7520 6547 5627 4757 3938 3168 2448 1777 1154 0579 0051 9571 9136 8748 8405 8107 7854 7645 7479 7358 7278 7242 7248 7295 7384 7514
Table of log Γ(p) (continued).
5
|
p
1-50
1-51 1-52 1-53
1
|
2
|
4
7612
6
7647 7850 8093 8376 8698 9059 9458 9896 0372 0886 1437 2025 2650 3312 1010 4743 5513
7
7666 7873 8120 8406 8732 9097 9500 9912 0422 0939 1494 2086 2715 3380 4081 4819 5592 6400 7213 8122 9034 9980 0961 1976 3024 4105 5220 6367 7547 8759 0003 1279 2586 3925 5295 6697 8128 9591 1084 2607 4159 5742 7354 8996 0666 2366 4094
5851 7637
8
7685
9
7704 7919 8174 8468 8802 9174 9586 0035 0522 1047 1610 2209 2845 3517 4226 4970 5740 6566 7416 8301 9220 0174 1162 2183 3238 4326 5447 6600 7787 9005 0255 1538 2852 4197 5573 6980 8419 9887 1386 2915 4474 6062 7680 9327 1004 2709 4143 6206 7997 9816
9
1-94
1-94 1-94 1-91 1-94 1-94 1-94
1-94 1-95 1-95 1-95 1-95 1-95 1-95
1-95 1-95 1-95
!
1-60
1-61 1-62 1-63
7545 7724 7913 8201 8500 8837 9211 96^9 0082 0573 1102 1668 2271
2911
1-95
1-95 1-95
'
1-70
1-71 1-72 1-73
1-95
1-95
1-96 1-96
79
P80
1-81 1-82
183
1-84 1-85 1-86
1-87 1-88 1-89
1-97
1-97 1-97 1-97
1-90
1-91
1-98
1-98 1-98 1-98
1-92 1-93
194
1-95
196
1-97 1-98 1-99
199
1-99
3587 4299 5047 5830 6649 7503 8391 9311 0271 1262 2-87 3315 4436 5561 6718 7907 9129 0383 1668 2985 4333 5712 7123 8561 0036 1537 3069 4631 6223 7814 9191 1173 2S81 1618 6381 8178
9672 0130 062 4 1157 1727 2333 2977 3656 4372 5124 5911 6733 7590 8182 9409 0369 1363 2391 3453 4547 5675 6835 8028
9253 0509 1798 3118 4470 5852 7266 8710 0184 1689 3224 4789 6383 8007 9660 1343 3051 4794 6562 8359
1
7594 7785 8016 8287 8597 8946 9334 9701 0225 0728 1268 1815 2159 3110 3797 4519 5278 6072 6901 7766 8661 9598 0565 1566 26U1 3669 4770 5901 7071 8270
9501
4741 6132 7552 9002 0483 1994 3535 5105 6706 8336 9995 1683 3399 5145 6919 8722
3
7806 8041 8316 8630 8983 9375 9806 0274 0780 1321 1905 2522 3177 3867 4594 5356 6151 6986 7854 8756 9693 0664 1668 2706 3778 4882 6019 7189 8392 9626 0893 2191 3520 4881 6273 7696 9119 0633 2117 3690 5264 6867 8500 0162 1853 3573 5321 7098 8903
7629 7828 8067 8316 8664 9021 9117 9851 0323 0833 1380 1965 2586 3244 3938 4668 5434 6235 7072 7943 8848 9788 0763 1770 2812 3887 1991 6135 7308 8514 9751 1021 2322 3655 5019 6111 7810 9296 0783 2299 3816 5423 7029 8665 0330 2024 3746 5498 7277 9085 5
6317 7157 8032 8911 9881 0862 1873 2918 3996 5107 6251 7127 8636 9877 1150 2454 3790 5157 6555 7984 9443 0933 2453 4003 5582 7192 8830 0498 2195 3920 5674 7457 9268
9450
7896 8146 8437 8767 9135 9543 9989 0472 0993 1552 2147 2780 3449 4151 4894 5671 6482 7322 8211 9127 0077 1061 2079 3131 4215 5333 6484 7666 8882 0129 1408 2719 4061 5434 6838 8273 9739 1234 2761 4316 5902 7517 9161 0835 2537 4269 6029 7817 9633
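The scanned columns of the table above are badly scrambled, so a reader may wish to regenerate the entries rather than rely on them. The following is a minimal sketch, not the author's method of construction. It assumes that the table gives common (base-10) logarithms of Γ(p) to six places for p between 1 and 2, written with a negative characteristic and a positive mantissa; this reading is suggested by the surviving entry 7545 against p = 1.50, but the scan cannot confirm the layout. The function names are hypothetical.

```python
import math


def log10_gamma(p):
    """Common (base-10) logarithm of Gamma(p) for p > 0."""
    return math.lgamma(p) / math.log(10.0)


def bar_form(value, places=6):
    """Split a logarithm into (characteristic, mantissa digits).

    The mantissa is kept non-negative, as in printed log tables,
    so -0.052455 becomes characteristic -1 and mantissa 947545.
    """
    scaled = round(value * 10 ** places)
    return divmod(scaled, 10 ** places)


def format_entry(p, places=6):
    """Render one entry as 'characteristic + 0.mantissa'."""
    characteristic, mantissa = bar_form(log10_gamma(p), places)
    return f"{characteristic} + 0.{mantissa:0{places}d}"


def table_row(p_start, places=6):
    """Ten entries for p_start, p_start + 0.001, ..., p_start + 0.009.

    The 0.001 step is an assumption about the printed layout.
    """
    return [format_entry(p_start + k / 1000, places) for k in range(10)]


if __name__ == "__main__":
    # Print rows for p = 1.00, 1.01, ..., 1.99.
    for hundredths in range(100, 200):
        p0 = hundredths / 100
        print(f"{p0:.2f}", "  ".join(table_row(p0)))
```

Under these assumptions `format_entry(1.5)` gives `-1 + 0.947545`, which agrees with the surviving entry for p = 1.50.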
INDEX.
24-30, 102-104.
H.
J.
W., 33.
128, 123.
AREAS
for curves, 48, 49, 58, 63, 104, 105.
moments
of a
system
of,
28-30, 102-104.
ARRAY,
107.
B-FUNCTIONS,
59, 85,
App.
II.
BEETON,
M., 150.
CALCULATION OF CURVES,
ch.
V.,
161, 162.
&c. (see
CORRELATION, &c.).
ch.
V.
I.
CONTACT,
and
28.
CONTINGENCY
correlation, 145-148.
vii, viii.
mean,
mean
theory
square,
X.
probable error of
of,
mean
square, 149.
CORRELATION,
App.
III.
119, 128-130.
of,
136-138.
CRITERION
for type of curve, 42-47, 50.
for goodness of
fit,
ch. IX.
DEATHS,
graduation of
statistics,
79,
80.
DIAGRAMS
construction
of,
reproduction
of,
11,
114.
EDGEWORTH,
ENDOWMENT ASSURANCES
check valuation of, 121.
and correlation and contingency, 106-109, 117-122, 145-149.
ENTRANTS,
EXISTING,
EXPOSED TO RISK
hypothetical, 94-100.
statistics of, 8, 54-58.
FREQUENCY-CURVES
desiderata, ch.
I.,
36.
FREQUENCY DISTRIBUTIONS,
G-FUNCTIONS,
70, 71, 74-77.
ch.
II.
56,
App. II.-
when p <
App.
ii.
54.
2, 5,
103.
GRADUATION
36, ch. V., App. IV.
of rates, 92-100.
GRAPHICAL REPRESENTATION
of curves, 72, 80, 89.
of distributions, 5-8.
HARDY,
G. F., 19, 54, 62, 93, 99, 100, 119-121, 139 (footnote).
HYPERGEOMETRICAL
INDICES, dangers
SERIES,
37, 38.
of using,
122-124.
KAPTEYN,
KING,
J.
C, App. IV.
LEAST SQUARES,
LEE, A., 71.
LEES, M. M., 93.
LIDSTONE, G. J.,
Method
of, viii.
xiii.,
MACDONELL, W. R., 125.
MAKEHAM'S HYPOTHESIS, 13, 98-100.
MARRIAGE STATISTICS, 66, 67, 93-96.
MASCULINITY,
132, 133.
144.
MEAN, 9
distance between
41.
probable error
of,
47. for
MODEL
OFFICE,
King's
statistics, 143,
MOMENTS
adjustment
of,
24-30, 102-104.
method
of, ch.
III.,
summation method,
128 (footnote),
qnm(5)
96-100.
ORDINATES
loaded, 16.
mid-, 16, 48, 63.
moments
of
system
of,
15, 24-28.
13-15, 30-34.
ix.,
x.,
14,
ch. IV.,
App. IV.
33, 34, 150.
fitting, ch.
I.,
37-39.
Normal curve
of error ").
PROBABLE ERRORS,
QUADRATURE FORMULAE,
RATES, graduation of, 92-100.
REGRESSION, 113, 116.
SEX-RATIO,
132, 133.
SHEPPARD, W. F., 29, 89, 95.
SICKNESS TABLES, graduation, &c., 70-72.
SKEWNESS, 11, 41, 49.
and probable
of,
165.
TYPES OF FREQUENCY-CURVES,
VACCINATION STATISTICS,
VARIATION,
V, App.
IV.
ch. VIII.
8,
v.
103, 104.