
Lesson 35

Hypothesis Tests: Anderson-Darling

Reading: Loss Models Third Edition 16.4.2

The Anderson-Darling test was included in the pre-2000 Course 160 syllabus, but was not on the Course 4 syllabus during 2000-2002. It has been on the Course 4 or Course C syllabus since 2003. With the exception of theory questions that are parts of multiple true/false questions, no question on it has appeared on the released exams, although according to student reports there was a calculation question on it on the Fall 2008 exam. If you're in a hurry, skip this lesson and just study Table 36.1 carefully.
In the following, we continue to use the notation F* to indicate the fitted function adjusted for truncation;
see the discussion of equations (33.1) on page 613 for details.
The Kolmogorov-Smirnov statistic is crude in that it uses a single point, the point of maximal difference between $F_n$ and $F^*$. Anderson-Darling, in contrast, integrates the squared difference over the entire range from the lower truncation point $t$ to the upper censoring point $u$. It weights the squared difference by the reciprocal of the variance.
The formula for the Anderson-Darling $A^2$ statistic is:

$$A^2 = n \int_t^u \frac{\bigl(F_n(x) - F^*(x)\bigr)^2}{F^*(x)\bigl(1 - F^*(x)\bigr)}\, f^*(x)\,dx$$

That formula is usually hard to evaluate, but if the fitted distribution is uniform it may not be too bad (see the
exercises).
The product in the denominator is small when S*(x) or F*(x) is small, near t and u; thus heavier weight is
put on the tails of the distribution.
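
To get a feel for how strongly the tails are weighted, here is a small Python sketch. The function name `ad_weight` is ours, not from the text; it simply evaluates the weight $1/\bigl(F^*(x)(1 - F^*(x))\bigr)$ at a few values of $F^*(x)$.

```python
def ad_weight(F_star_x):
    """Weight applied to the squared difference where the fitted cdf equals F_star_x."""
    return 1.0 / (F_star_x * (1.0 - F_star_x))

# The weight is smallest at the fitted median (F* = 0.5) and grows toward either tail.
for p in (0.5, 0.1, 0.01):
    print(p, round(ad_weight(p), 2))
```

The weight is 4 at the fitted median and about 101 where $F^*(x) = 0.01$, and it is symmetric in $F^*$ and $S^*$.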
For individual data, this can be evaluated as a sum. Number the unique ordered non-censored data points¹ from 1 to $k$ (in other words, some observations may be tied; count these observations only once), so that they are $t = y_0 < y_1 < \cdots < y_k < y_{k+1} = u$. Note that $t$ is set equal to $y_0$ and $u$ is set equal to $y_{k+1}$ in order to make the following formula work.

$$A^2 = -nF^*(u) + n\sum_{j=0}^{k} \bigl(S_n(y_j)\bigr)^2 \bigl(\ln S^*(y_j) - \ln S^*(y_{j+1})\bigr) + n\sum_{j=1}^{k} \bigl(F_n(y_j)\bigr)^2 \bigl(\ln F^*(y_{j+1}) - \ln F^*(y_j)\bigr) \tag{35.1}$$

To help you memorize the formula, note the symmetry of the $S$'s and the $F$'s. The second factor in each sum is arranged in the order that makes the difference positive. The second sum could start from 0 too, but the summand corresponding to 0 would be 0 since $F_n(t) = 0$.
In the first sum, if $u = \infty$ or if $S^*(u) = 0$ for any other reason, skip the last term. If there are no censored observations, $S_n(y_k) = 0$, which will also allow skipping the last term.
Since this statistic is the integral of a square, it cannot be negative. If you get a negative answer, you know
you made a mistake.
¹ In this lesson, we'll use the $y_j$ notation for order statistics.

C/4 Study Manual-11th edition
Copyright ©2010 ASM

EXAMPLE 35A An insurer offers a coverage with a policy limit of 2000. The following three claims are observed on this coverage: 300, 300, 800. In addition, one claim is for an amount over 2000 and is censored at 2000. You model the ground-up losses using a uniform distribution on [0, 2500]. You test this model against the experience using the Anderson-Darling $A^2$ statistic.
Calculate $A^2$.

ANSWER: We have that $F^*(300) = 0.12$, $F^*(800) = 0.32$, and $F^*(2000) = 0.80$. Also, $F_n(300) = \frac12$ and $F_n(800) = \frac34$. Since $n = 4$, the first term of formula (35.1) is $-nF^*(u) = -4(0.8) = -3.2$. The first sum of the $A^2$ formula is:

$$4\left(1^2(0 - \ln 0.88) + \left(\tfrac12\right)^2(\ln 0.88 - \ln 0.68) + \left(\tfrac14\right)^2(\ln 0.68 - \ln 0.2)\right) = 4(0.268777) = 1.07511.$$

The second sum of the $A^2$ formula is:

$$4\left(\left(\tfrac12\right)^2(\ln 0.32 - \ln 0.12) + \left(\tfrac34\right)^2(\ln 0.8 - \ln 0.32)\right) = 4(0.760621) = 3.04248.$$

So $A^2$ is:

$$A^2 = -3.2 + 1.07511 + 3.04248 = 0.91759 \qquad \square$$
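
Formula (35.1) is mechanical enough to sketch in code. The following Python function is our own illustration rather than anything from the syllabus reading; the name `anderson_darling` and its argument layout are assumptions. It expects the fitted cdf `F_star` to be already adjusted for truncation at `t`, and it skips a term of the first sum whenever the empirical weight is 0 or $S^*$ at the right endpoint is 0, as described above.

```python
import math

def anderson_darling(uncensored, n, F_star, t, u):
    """A^2 by the summation formula (35.1) for individual data.

    uncensored: observed values strictly between t and u (ties allowed)
    n:          total number of observations, including those censored at u
    F_star:     fitted cdf, already adjusted for truncation at t
    """
    ys = sorted(set(uncensored))            # unique points y_1 < ... < y_k
    pts = [t] + ys + [u]                    # y_0 = t and y_{k+1} = u

    def F_n(y):                             # empirical cdf
        return sum(x <= y for x in uncensored) / n

    a2 = -n * F_star(u)
    for j in range(len(pts) - 1):           # j = 0, ..., k
        lo, hi = pts[j], pts[j + 1]
        S_emp = 1.0 - F_n(lo)
        S_hi = 1.0 - F_star(hi)
        if S_emp > 0 and S_hi > 0:          # otherwise the term is skipped
            a2 += n * S_emp**2 * (math.log(1.0 - F_star(lo)) - math.log(S_hi))
        if j >= 1:                          # second sum starts at j = 1
            a2 += n * F_n(lo)**2 * (math.log(F_star(hi)) - math.log(F_star(lo)))
    return a2

# Data of Example 35A: three uncensored claims plus one claim censored at 2000.
a2 = anderson_darling([300, 300, 800], n=4, F_star=lambda x: x / 2500, t=0, u=2000)
print(round(a2, 5))
```

With the data of Example 35A this reproduces 0.91759.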

The critical values for the Anderson-Darling statistic are fixed; they do not vary with sample size $n$ (unlike the Kolmogorov-Smirnov statistic). However, they should be smaller if parameters are estimated or if $u < \infty$, just like the Kolmogorov-Smirnov statistic.
A summary of the main characteristics of the Anderson-Darling test is given in Table 36.1 on page 653. It is
more likely you will be tested on these characteristics than on actually calculating the statistic.

Exercises

35.1. You have observed one loss of 1. The assumed distribution of losses is uniform on (0, 2].
Calculate the Anderson-Darling $A^2$ statistic for this fit.

35.2. You have observed one loss of 0.5. The assumed distribution of losses is uniform on (0, 2].
Calculate the Anderson-Darling $A^2$ statistic for this fit.

35.3. Losses are assumed to follow an exponential distribution with mean 100.
For an insurance coverage with policy limit 50, you observe one loss of 25.
Calculate the Anderson-Darling $A^2$ statistic for this fit.

35.4. Losses are assumed to follow a uniform distribution on (0, 100].
For an insurance coverage with policy limit 60, you observe one loss of 25 and one loss at the limit.
Calculate the Anderson-Darling $A^2$ statistic for this fit.

35.5-6. Use the following information for questions 35.5 and 35.6:

Losses are assumed to follow a uniform distribution on (0, 100].
For an insurance coverage with ordinary deductible 10, you observe one loss of 50 before deductible.

35.5. Calculate the Anderson-Darling $A^2$ statistic for this fit.

35.6. Suppose that there are five losses of 50 instead of one loss. How would the answer to the previous exercise
change?


35.7. You have observed the following 3 losses: 200, 500, 2000. You fit these to a parametric distribution $F$. You are given the following table of values for $F$:

x       F(x)
0       0.0
200     0.3
500     0.6
2000    0.9

You test the fit using the Anderson-Darling $A^2$ statistic.
Calculate $A^2$.

35.8. An insurance coverage has a deductible of 250 and a maximum covered loss of 20,000. 22 observed losses are fitted to a lognormal distribution with $\mu = 7$ and $\sigma = 2$. Of these 22 observed losses, 15 are below the limit. Each one is for a different amount.
Let $F_n$ and $S_n$ be the empirical distribution and survival functions, and $F^*$ and $S^*$ be the fitted distribution and survival functions. Let $500 = y_1 < y_2 < \cdots < y_{15}$ be the 15 observed losses below the limit. You are given:

(i) $\sum_{j=1}^{15} S_n^2(y_j)\bigl(\ln S^*(y_j) - \ln S^*(y_{j+1})\bigr) = 0.505$

(ii) $\sum_{j=1}^{15} F_n^2(y_j)\bigl(\ln F^*(y_{j+1}) - \ln F^*(y_j)\bigr) = 0.423$

Calculate the Anderson-Darling $A^2$ test statistic for the fit.

35.9. An insurer offers a coverage with a policy limit of 2000. The following 4 claims are observed on this coverage: 300, 300, 800, 1500. You model these losses using a uniform distribution on [0, 2500]. You test this model against the experience using the Anderson-Darling $A^2$ statistic.
Calculate $A^2$.

35.10. A mortality study on a group of 3 results in failure times of 2, 5, and 5. You fit an exponential distribution to this data using maximum likelihood. You test this fit using the Anderson-Darling $A^2$ statistic.
Calculate $A^2$.

35.11. An insurer offers a coverage with a deductible of 500 and a maximum covered loss of 5000. Observed losses (including the deductible) are 1000 and 2000; in addition, there is one claim censored at 5000. You fit a Pareto with $\alpha = 1$, $\theta = 1000$ to this data and then test the fit using the Anderson-Darling $A^2$ statistic.
Calculate $A^2$.

Solutions
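35.1. Since there's only one loss, this could be calculated directly from the integral. We have

$$F_n(x) = \begin{cases} 0 & x < 1 \\ 1 & x \ge 1 \end{cases}$$

and $F^*(x) = x/2$, $f^*(x) = 1/2$ for $0 \le x \le 2$. Then

$$A^2 = \int_0^1 \frac{(x/2)^2}{(x/2)(1 - x/2)}\left(\frac12\right)dx + \int_1^2 \frac{(1 - x/2)^2}{(x/2)(1 - x/2)}\left(\frac12\right)dx$$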


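Making the substitution $u = 2 - x$ in the second integral, we see it is the same as the first, so we'll calculate the first integral and double it.

$$\begin{aligned}
\int_0^1 \frac{(x/2)^2}{(x/2)(1 - x/2)}\left(\frac12\right)dx &= \frac12\int_0^1 \frac{x\,dx}{2 - x} \\
&= \frac12\int_0^1 \left(-1 + \frac{2}{2 - x}\right)dx \\
&= \frac12\bigl(-x - 2\ln(2 - x)\bigr)\Big|_0^1 \\
&= \frac12\left(-1 - 2\ln 1 + 2\ln 2\right)
\end{aligned}$$

Doubling this, we get $2\ln 2 - 1 = 0.3863$.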


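Using the formula with $t = y_0 = 0$, $y_1 = 1$, $y_2 = u = 2$, we get only one term in each sum ($k = 1$):

$$\begin{aligned}
A^2 &= -F^*(2) + \bigl(S_n(0)\bigr)^2\bigl(\ln S^*(0) - \ln S^*(1)\bigr) + \bigl(F_n(1)\bigr)^2\bigl(\ln F^*(2) - \ln F^*(1)\bigr) \\
&= -1 + (1^2)\left(\ln 1 - \ln\tfrac12\right) + (1^2)\left(\ln 1 - \ln\tfrac12\right) \\
&= -1 - \ln\tfrac12 - \ln\tfrac12 = -1 + 2\ln 2
\end{aligned}$$

which is the same as above.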
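
As a cross-check on the two approaches, the integral definition can also be approximated numerically. The sketch below is our own illustration, not the text's method: the helper name `a2_integral` and the choice of a midpoint rule are assumptions. It is applied to the single loss of 1 with a uniform fit on (0, 2].

```python
import math

def a2_integral(n, F_n, F_star, f_star, t, u, steps=20_000):
    """Midpoint-rule approximation of n * int_t^u (F_n - F*)^2 / (F*(1 - F*)) f* dx."""
    h = (u - t) / steps
    total = 0.0
    for i in range(steps):
        x = t + (i + 0.5) * h        # midpoints never hit t or u, where F*(1 - F*) = 0
        F = F_star(x)
        total += (F_n(x) - F) ** 2 / (F * (1.0 - F)) * f_star(x) * h
    return n * total

approx = a2_integral(
    n=1,
    F_n=lambda x: 1.0 if x >= 1 else 0.0,   # empirical cdf of the single loss at 1
    F_star=lambda x: x / 2,                 # uniform (0, 2] cdf
    f_star=lambda x: 0.5,                   # uniform (0, 2] density
    t=0.0,
    u=2.0,
)
exact = 2 * math.log(2) - 1
print(approx, exact)
```

The two printed values should agree to several decimal places, which matches the closed form $2\ln 2 - 1$.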


35.2. Unlike the previous exercise, the two integrals will not be the same now. We'll use the formula.

$$\begin{aligned}
A^2 &= -F^*(2) + \bigl(S_n(0)\bigr)^2\bigl(\ln S^*(0) - \ln S^*(0.5)\bigr) + \bigl(F_n(0.5)\bigr)^2\bigl(\ln F^*(2) - \ln F^*(0.5)\bigr) \\
&= -1 + (1^2)\left(\ln 1 - \ln\tfrac34\right) + (1^2)\left(\ln 1 - \ln\tfrac14\right) \\
&= -1 - \ln\tfrac34 - \ln\tfrac14 = 0.6740
\end{aligned}$$

The divergence of fitted from observed is greater in this exercise than the previous one, so it's not surprising the statistic is higher.
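
To see how the statistic grows as a single observation moves away from the fitted median, note that with $n = 1$, no truncation and no censoring, formula (35.1) collapses to $A^2 = -1 - \ln S^*(x) - \ln F^*(x)$. The Python sketch below evaluates this for a uniform fit; the function name `a2_single_uniform` and its `width` parameter are our own, chosen for illustration.

```python
import math

def a2_single_uniform(x, width=2.0):
    """A^2 for one uncensored observation x under a uniform (0, width] fit."""
    p = x / width                              # F*(x)
    return -1.0 - math.log(1.0 - p) - math.log(p)

for x in (1.0, 0.5, 0.1):
    print(x, round(a2_single_uniform(x), 4))
```

The statistic is smallest when the observation sits at the fitted median ($x = 1$) and increases as $x$ moves toward either end of the range.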
35.3. Using formula (35.1), we have $t = y_0 = 0$, $u = y_2 = 50$.

$j$    $y_j$    $F^*(y_j)$         $F_n(y_j)$
0      0        0                  0
1      25       $1 - e^{-0.25}$    1
2      50       $1 - e^{-0.5}$     1

The first sum has two terms, but the second term is 0 since $S_n(25) = 0$, while the second sum has one term.

$$\begin{aligned}
A^2 &= -(1 - e^{-0.5}) + (1^2)\bigl(0 - (-0.25)\bigr) + (1^2)\bigl(\ln(1 - e^{-0.5}) - \ln(1 - e^{-0.25})\bigr) \\
&= -0.393469 + 0.25 + 0.575939 = 0.4325
\end{aligned}$$

35.4. Using formula (35.1), we have $t = y_0 = 0$, $u = y_2 = 60$.

$j$    $y_j$    $F^*(y_j)$    $F_n(y_j)$
0      0        0             0
1      25       0.25          0.5
2      60       0.60          0.5


The first sum has two terms and the second sum has one term.

$$\begin{aligned}
A^2 &= -2(0.6) + 2\bigl((1^2)(0 - \ln 0.75) + (0.5^2)(\ln 0.75 - \ln 0.4)\bigr) + 2\bigl((0.5^2)(\ln 0.6 - \ln 0.25)\bigr) \\
&= -1.2 + 0.889668 + 0.437734 = 0.1274
\end{aligned}$$

35.5. Using the formula, we have $t = y_0 = 10$, $y_1 = 50$, and $u = y_2 = \infty$. When calculating $F^*$, the truncation at 10 is taken into account, so that $F^*(x) = \bigl(F(x) - F(10)\bigr)/\bigl(1 - F(10)\bigr)$, where $F(x)$ is the unmodified fitted distribution.

$j$    $y_j$       $F^*(y_j)$    $F_n(y_j)$
0      10          0             0
1      50          4/9           1
2      $\infty$    1             1

We get one term in each sum of the $A^2$ formula.

$$A^2 = -1 + (1^2)\bigl(0 - \ln(5/9)\bigr) + (1^2)\bigl(0 - \ln(4/9)\bigr) = 0.398717$$

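With some work, one can also get this answer using the integral formula. The reader may fill in the details of the following derivation:

$$\begin{aligned}
A^2 &= \int_{10}^{50} \frac{\bigl(0 - (x - 10)/90\bigr)^2}{\left(\frac{x - 10}{90}\right)\left(\frac{100 - x}{90}\right)}\,\frac{1}{90}\,dx + \int_{50}^{100} \frac{\bigl(1 - (x - 10)/90\bigr)^2}{\left(\frac{x - 10}{90}\right)\left(\frac{100 - x}{90}\right)}\,\frac{1}{90}\,dx \\
&= \frac{1}{90}\left(\int_{10}^{50}\left(-1 + \frac{90}{100 - x}\right)dx + \int_{50}^{100}\left(-1 + \frac{90}{x - 10}\right)dx\right) \\
&= \frac{1}{90}\left(\bigl(-x - 90\ln(100 - x)\bigr)\Big|_{10}^{50} + \bigl(-x + 90\ln(x - 10)\bigr)\Big|_{50}^{100}\right) \\
&= \frac{1}{90}\left(-40 - 90\ln 50 + 90\ln 90 - 50 + 90\ln 90 - 90\ln 40\right) \\
&= -1 - \ln(5/9) + \ln(9/4)
\end{aligned}$$

which is the same as above.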
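
The truncation adjustment is the step most easily forgotten, so here is a short Python sketch of it for this exercise. The helper name `truncate_cdf` is ours, introduced only for illustration; it builds $F^*$ from the unmodified cdf $F$ and the truncation point, and the rest is the one-term-per-sum calculation.

```python
import math

def truncate_cdf(F, t):
    """Return F*(x) = (F(x) - F(t)) / (1 - F(t)), the fitted cdf given a loss exceeds t."""
    Ft = F(t)
    return lambda x: (F(x) - Ft) / (1.0 - Ft)

F = lambda x: min(max(x / 100.0, 0.0), 1.0)    # unmodified uniform (0, 100] cdf
F_star = truncate_cdf(F, 10.0)                 # deductible of 10

# One observation at 50, n = 1, no censoring: one term in each sum of (35.1).
p = F_star(50.0)
a2 = -1.0 + (0.0 - math.log(1.0 - p)) + (0.0 - math.log(p))
print(p, a2)
```

Here `p` is $4/9$ and `a2` comes out at about 0.3987.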


35.6. The integral formula for $A^2$ shows that if there is no change to $F_n$ or $F^*$ but only to $n$, the statistic is multiplied by $n$. So the new answer would be $5(0.398717) = 1.99358$. You can also see this in formula (35.1), where each term is multiplied by $n$.
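
The scaling can be checked by replicating the data. The Python sketch below is our own and handles only the case of no censoring with every observation strictly inside the fitted support; the function name `a2_uncensored` is an assumption made for illustration.

```python
import math

def a2_uncensored(data, F_star):
    """Formula (35.1) with no censoring: F*(u) = 1 and the last term of the first sum drops out."""
    n = len(data)
    ys = sorted(set(data))                               # unique points y_1 < ... < y_k
    F_emp = [sum(x <= y for x in data) / n for y in ys]  # F_n at each unique point
    a2 = -float(n)
    prev_S_emp, prev_S_fit = 1.0, 1.0                    # S_n(t) and S*(t)
    for j, y in enumerate(ys):
        S_fit = 1.0 - F_star(y)
        # First sum: weight S_n at the left endpoint, log difference of S* across the interval.
        a2 += n * prev_S_emp**2 * (math.log(prev_S_fit) - math.log(S_fit))
        # Second sum: weight F_n at y, log difference of F* up to the next point (or to 1).
        F_next = F_star(ys[j + 1]) if j + 1 < len(ys) else 1.0
        a2 += n * F_emp[j]**2 * (math.log(F_next) - math.log(F_star(y)))
        prev_S_emp, prev_S_fit = 1.0 - F_emp[j], S_fit
    return a2

F_star = lambda x: (x - 10) / 90     # uniform (0, 100] truncated at 10
one = a2_uncensored([50], F_star)
five = a2_uncensored([50] * 5, F_star)
print(one, five)
```

Five tied losses of 50 leave $F_n$ and $F^*$ unchanged, so `five` is five times `one`.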
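35.7.

$$\begin{aligned}
A^2 &= -3 + 3\left(1^2(0 - \ln 0.7) + \left(\tfrac23\right)^2(\ln 0.7 - \ln 0.4) + \left(\tfrac13\right)^2(\ln 0.4 - \ln 0.1)\right) \\
&\quad + 3\left(\left(\tfrac13\right)^2(\ln 0.6 - \ln 0.3) + \left(\tfrac23\right)^2(\ln 0.9 - \ln 0.6) + 1^2(0 - \ln 0.9)\right) \\
&= -3 + 3(0.75943) + 3(0.36258) = 0.36603
\end{aligned}$$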

35.8. The formula for the Anderson-Darling statistic has three terms. We must calculate the first one, which means that we need $nF^*(u)$ or $22F^*(20{,}000)$. The second one involves a sum from 0 to $k = 15$. We are given the sum from 1 to 15, so we need the first term of the second sum, or $\bigl(S_n(250)\bigr)^2\bigl(\ln S^*(250) - \ln S^*(500)\bigr)$. $S^*(250) = 1$, but we need $S^*(500)$. We are given the entire third term, which needs no adjustment.
We therefore must calculate $F^*(20{,}000)$ and $F^*(500)$.

$$\begin{aligned}
F(20{,}000) &= \Phi\left(\frac{\ln 20{,}000 - 7}{2}\right) = \Phi(1.45) = 0.9265 \\
F(500) &= \Phi\left(\frac{\ln 500 - 7}{2}\right) = \Phi(-0.39) = 1 - 0.6517 = 0.3483 \\
F(250) &= \Phi\left(\frac{\ln 250 - 7}{2}\right) = \Phi(-0.74) = 1 - 0.7704 = 0.2296 \\
F^*(20{,}000) &= \frac{F(20{,}000) - F(250)}{1 - F(250)} = \frac{0.9265 - 0.2296}{1 - 0.2296} = 0.9046 \\
F^*(500) &= \frac{F(500) - F(250)}{1 - F(250)} = \frac{0.3483 - 0.2296}{1 - 0.2296} = 0.1541
\end{aligned}$$

The extra summand for the second term is $-\ln\bigl(1 - F^*(500)\bigr) = -\ln(1 - 0.1541) = 0.1674$. So

$$A^2 = -22(0.9046) + 22(0.1674 + 0.505) + 22(0.423) = 4.1976$$
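
The same steps can be sketched in Python using the standard library's normal cdf instead of a printed table. This is our own illustration under that assumption: the z-values are not rounded to two decimals, so the result need not match a table-based hand calculation to every digit. The names `lognormal_cdf`, `F_star`, `given_first_sum` and `given_second_sum` are ours.

```python
import math
from statistics import NormalDist

mu, sigma = 7.0, 2.0
d, u, n = 250.0, 20_000.0, 22

def lognormal_cdf(x):
    """Unmodified fitted lognormal cdf, using unrounded z-values."""
    return NormalDist().cdf((math.log(x) - mu) / sigma)

def F_star(x):
    """Fitted cdf adjusted for truncation at the deductible d."""
    return (lognormal_cdf(x) - lognormal_cdf(d)) / (1.0 - lognormal_cdf(d))

given_first_sum = 0.505     # sum over j = 1..15, item (i) of the exercise
given_second_sum = 0.423    # sum over j = 1..15, item (ii) of the exercise

# Missing j = 0 term: S_n(250)^2 (ln S*(250) - ln S*(500)) with S_n(250) = S*(250) = 1.
extra = -math.log(1.0 - F_star(500.0))

a2 = -n * F_star(u) + n * (extra + given_first_sum) + n * given_second_sum
print(round(a2, 2))
```

Because of the unrounded z-values, expect a result a few hundredths away from the hand calculation rather than an exact match.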

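35.9. Here we can skip the last term of the first sum since $S_n(1500) = 0$.

$$\begin{aligned}
A^2 &= -4(0.8) + 4\left(1^2(0 - \ln 0.88) + \left(\tfrac12\right)^2(\ln 0.88 - \ln 0.68) + \left(\tfrac14\right)^2(\ln 0.68 - \ln 0.4)\right) \\
&\quad + 4\left(\left(\tfrac12\right)^2(\ln 0.32 - \ln 0.12) + \left(\tfrac34\right)^2(\ln 0.6 - \ln 0.32) + 1^2(\ln 0.8 - \ln 0.6)\right) \\
&= -4(0.8) + 4(0.2255) + 4(0.8865) = 1.248
\end{aligned}$$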

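35.10. The exponential makes the first sum less painful, since it consists of logs of exponentials.
The maximum likelihood fitted exponential parameter is the observed mean, or 4. Then

$$\begin{aligned}
A^2 &= -3 + 3\left(1^2\left(\tfrac12\right) + \left(\tfrac23\right)^2\left(\tfrac34\right)\right) \\
&\quad + 3\left(\left(\tfrac13\right)^2\bigl(\ln(1 - e^{-5/4}) - \ln(1 - e^{-1/2})\bigr) + 1^2\bigl(-\ln(1 - e^{-5/4})\bigr)\right) \\
&= -3 + 3(0.83333) + 3(0.40371) = 0.71113
\end{aligned}$$
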
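
For readers who want to check the arithmetic, here is a Python sketch of this solution. It is our own illustration; the variable names are assumptions, and the empirical weights $1$, $2/3$ and $1/3$ are typed in directly from the data 2, 5, 5 rather than computed.

```python
import math

times = [2, 5, 5]
n = len(times)
theta = sum(times) / n                 # exponential MLE of the mean is the sample mean

def F_star(x):
    """Fitted exponential cdf with mean theta."""
    return 1.0 - math.exp(-x / theta)

# Unique points y_1 = 2 and y_2 = 5; no censoring, so u is infinite and F*(u) = 1.
# For an exponential, ln S*(a) - ln S*(b) = (b - a) / theta.
first_sum = 1.0**2 * (2 - 0) / theta + (2 / 3)**2 * (5 - 2) / theta
second_sum = ((1 / 3)**2 * (math.log(F_star(5)) - math.log(F_star(2)))
              + 1.0**2 * (0.0 - math.log(F_star(5))))

a2 = -n + n * first_sum + n * second_sum
print(round(a2, 5))
```

The first sum is $5/6$ exactly, which is why the exponential fit makes that part painless.
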
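35.11. We need $F^*(1000)$, $F^*(2000)$, and $F^*(5000)$.

$$F(1000) = 1 - \frac{1000}{2000} = \frac12 \qquad F(2000) = 1 - \frac{1000}{3000} = \frac23 \qquad F(5000) = 1 - \frac{1000}{6000} = \frac56 \qquad F(500) = 1 - \frac{1000}{1500} = \frac13$$

$$F^*(1000) = \frac{\frac12 - \frac13}{1 - \frac13} = \frac14 \qquad F^*(2000) = \frac{\frac23 - \frac13}{1 - \frac13} = \frac12 \qquad F^*(5000) = \frac{\frac56 - \frac13}{1 - \frac13} = \frac34$$

Now we can calculate $A^2$.

$$\begin{aligned}
A^2 &= -3(0.75) + 3\left(1^2(-\ln 0.75) + \left(\tfrac23\right)^2(\ln 0.75 - \ln 0.5) + \left(\tfrac13\right)^2(\ln 0.5 - \ln 0.25)\right) \\
&\quad + 3\left(\left(\tfrac13\right)^2(\ln 0.5 - \ln 0.25) + \left(\tfrac23\right)^2(\ln 0.75 - \ln 0.5)\right) \\
&= -2.25 + 3(0.54491) + 3(0.25722) = 0.1564
\end{aligned}$$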
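
The fractions in this solution are easy to get wrong by hand, so the sketch below redoes them with exact rational arithmetic before taking logs. It is our own Python illustration; the function name `pareto_cdf` and the variable names are assumptions, and the empirical weights are typed in from the data (two uncensored losses and one censored loss, so $n = 3$).

```python
import math
from fractions import Fraction

def pareto_cdf(x, alpha=1, theta=1000):
    """Two-parameter Pareto cdf F(x) = 1 - (theta / (theta + x))**alpha, exact for integer inputs."""
    return 1 - Fraction(theta, theta + x) ** alpha

d = 500                                              # deductible, the truncation point t

def F_star(x):
    """Fitted cdf adjusted for truncation at d, as an exact fraction."""
    return (pareto_cdf(x) - pareto_cdf(d)) / (1 - pareto_cdf(d))

F1, F2, Fu = F_star(1000), F_star(2000), F_star(5000)
print(F1, F2, Fu)

n = 3
ln = lambda q: math.log(float(q))
first = (1.0 * (0.0 - ln(1 - F1))
         + (2 / 3)**2 * (ln(1 - F1) - ln(1 - F2))
         + (1 / 3)**2 * (ln(1 - F2) - ln(1 - Fu)))
second = (1 / 3)**2 * (ln(F2) - ln(F1)) + (2 / 3)**2 * (ln(Fu) - ln(F2))

a2 = -n * float(Fu) + n * first + n * second
print(round(a2, 4))
```

The three adjusted cdf values come out as exactly $1/4$, $1/2$ and $3/4$, and the statistic is about 0.1564.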
