CS2B: Chapter 1 Poisson processes - Solutions Page 1

Poisson processes – Solutions


1.1 (i) Motor insurance claims, house insurance claims and travel insurance claims

We need to extract the relevant rows of the claims data frame for each of the three types of claim.
We can use the head function to display the first few rows of each subset, and the nrow function
to count the rows:

motor <- claims[claims$claim.type=="Motor",]
house <- claims[claims$claim.type=="House",]
travel <- claims[claims$claim.type=="Travel",]
head(motor)

claim.date claim.cost claim.type
1 2008-01-11 2372 Motor
2 2008-01-20 1214 Motor
4 2008-02-03 2945 Motor
6 2008-02-15 1231 Motor
7 2008-02-16 981 Motor
8 2008-02-17 1470 Motor

nrow(motor)

[1] 352

So, there were 352 claims on the motor insurance policies.

head(house)

claim.date claim.cost claim.type
5 2008-02-09 6893 House
9 2008-02-29 10478 House
14 2008-04-21 4610 House
15 2008-04-25 9599 House
17 2008-05-01 8776 House
22 2008-06-09 8294 House

nrow(house)

[1] 221

There were 221 claims on the house insurance policies.


head(travel)

claim.date claim.cost claim.type
3 2008-01-29 800 Travel
30 2008-07-12 259 Travel
31 2008-07-17 265 Travel
32 2008-07-23 232 Travel
33 2008-08-08 315 Travel
34 2008-08-15 268 Travel

nrow(travel)

[1] 136

There were 136 claims on the travel insurance policies.
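The three subsets and their sizes can also be obtained in a single step with base R's split() and table() functions. This is an illustration rather than the method used above: since the full claims data is not reproduced here, the sketch uses a small hypothetical data frame, toy.claims, built from the first five claims listed in part (iii)(a).

```r
# Hypothetical toy data: the first five claims listed in part (iii)(a)
toy.claims <- data.frame(
  claim.date = as.Date(c("2008-01-11", "2008-01-20", "2008-01-29",
                         "2008-02-03", "2008-02-09")),
  claim.cost = c(2372, 1214, 800, 2945, 6893),
  claim.type = c("Motor", "Motor", "Travel", "Motor", "House")
)

# split() returns a named list holding one data frame per claim type
by.type <- split(toy.claims, toy.claims$claim.type)
names(by.type)                     # "House" "Motor" "Travel"

# table() counts the claims of each type in one call
claim.counts <- table(toy.claims$claim.type)
claim.counts
```

Applied to the real data, table(claims$claim.type) should give the same three counts as the nrow calls above.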

The Actuarial Education Company © IFE: 2020 Examinations


(ii) Waiting times of a Poisson process $N_t \sim \text{Poi}(\lambda t)$

The waiting time $W_t$ from one claim to the next has the distribution:

$$W_t \sim \text{Exp}(\lambda)$$
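This link between exponential waiting times and Poisson claim counts can be checked by simulation. In the minimal sketch below, the rate lambda = 0.2 claims per day is an illustrative assumption rather than a value estimated from the data. Exponential waiting times are generated with rexp(), accumulated into arrival times with cumsum(), and the arrivals on each whole day are counted; if the daily counts are Poisson with parameter λ, their mean and variance should both be close to λ.

```r
set.seed(1)
lambda  <- 0.2       # illustrative daily rate (assumption, not estimated from the data)
horizon <- 100000    # number of days simulated

# Exponential waiting times between claims; generate comfortably more than needed
waits    <- rexp(ceiling(1.2 * lambda * horizon), rate = lambda)
arrivals <- cumsum(waits)
arrivals <- arrivals[arrivals < horizon]

# Number of claims on each day: day d covers the interval [d - 1, d)
daily.counts <- tabulate(floor(arrivals) + 1, nbins = horizon)

mean(daily.counts)   # close to lambda
var(daily.counts)    # also close to lambda, as expected for Poisson counts
```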

(iii)(a) Waiting times in days for all claims combined

The diff function in R takes the differences between successive elements of a vector, which is just
what we need for the observed waiting times, since the claims are in date order. We can set this up
as a function, which will begin like this:

add.date.diff <- function (data.with.dates) {

We set up a temporary table by column-binding a new column that holds the differences between
successive dates, as numbers of days:

temp <- cbind(data.with.dates,
              c(NA,diff(as.numeric(data.with.dates[,1]))))

We can then rename the new column so that we know what it represents:

colnames(temp)[4] <- "Diff"

Finally, we return the temporary table as the output of the function.

temp
}

Running the function:

claims <- add.date.diff(claims)
head(claims)

claim.date claim.cost claim.type Diff
1 2008-01-11 2372 Motor NA
2 2008-01-20 1214 Motor 9
3 2008-01-29 800 Travel 9
4 2008-02-03 2945 Motor 5
5 2008-02-09 6893 House 6
6 2008-02-15 1231 Motor 6

We can see that the new column has been created correctly. For example, it is correct that there
are 9 days between 2008-01-11 and 2008-01-20. The first entry shows NA (‘not available’)
because there is no preceding claim to compare with.
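For comparison, the same column can be added directly, without a helper function. This is a minimal sketch on a hypothetical three-row data frame containing the first three claims shown above. It assumes, as the output suggests, that claim.date is stored as an R Date, so that diff() returns differences in days, which as.numeric() turns into plain numbers.

```r
# Hypothetical toy data: the first three claims shown above
toy <- data.frame(
  claim.date = as.Date(c("2008-01-11", "2008-01-20", "2008-01-29")),
  claim.cost = c(2372, 1214, 800),
  claim.type = c("Motor", "Motor", "Travel")
)

# The first claim has no predecessor, hence the leading NA
toy$Diff <- c(NA, as.numeric(diff(toy$claim.date)))
toy$Diff                           # NA 9 9
```

Assigning to toy$Diff names the column directly, so the code does not depend on the new column being the fourth one.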

(iii)(b) Waiting times for each type of claim in days

We then repeat this separately for the motor, house and travel claims:

motor <- add.date.diff(motor)
house <- add.date.diff(house)
travel <- add.date.diff(travel)
head(motor)

claim.date claim.cost claim.type Diff
1 2008-01-11 2372 Motor NA
2 2008-01-20 1214 Motor 9
4 2008-02-03 2945 Motor 14
6 2008-02-15 1231 Motor 12
7 2008-02-16 981 Motor 1
8 2008-02-17 1470 Motor 1

head(house)

claim.date claim.cost claim.type Diff
5 2008-02-09 6893 House NA
9 2008-02-29 10478 House 20
14 2008-04-21 4610 House 52
15 2008-04-25 9599 House 4
17 2008-05-01 8776 House 6
22 2008-06-09 8294 House 39

head(travel)

claim.date claim.cost claim.type Diff
3 2008-01-29 800 Travel NA
30 2008-07-12 259 Travel 165
31 2008-07-17 265 Travel 5
32 2008-07-23 232 Travel 6
33 2008-08-08 315 Travel 16
34 2008-08-15 268 Travel 7

(iv) Observed waiting times indicative of a Poisson process?

We know the waiting times of a Poisson process are exponentially distributed with the same
parameter $\lambda$ as the Poisson process. So, we estimate $\lambda$ as the reciprocal of the mean
observed waiting time, leaving out the NA in the first row:

mean.claim.time = mean(claims[,4][2:length(claims[,4])])
Poiss.param.claims = 1/mean.claim.time
Poiss.param.claims

[1] 0.194773
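Using the reciprocal of the mean waiting time is the standard choice here: for an exponential sample it is both the maximum likelihood estimate and the method-of-moments estimate of λ. A simulation sketch follows, in which the true rate of 0.2 is an illustrative assumption and a leading NA mimics the first row of the Diff column; na.rm = TRUE drops that NA in the same way as indexing from element 2.

```r
set.seed(42)
true.rate <- 0.2                                 # illustrative value only
waits <- c(NA, rexp(50000, rate = true.rate))    # NA mimics the first Diff entry

# Same as 1/mean(waits[2:length(waits)]): the NA is skipped
rate.hat <- 1 / mean(waits, na.rm = TRUE)
rate.hat                                         # close to 0.2
```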

The test to use is a chi-squared goodness-of-fit test of whether the observed waiting times are
exponentially distributed (as they should be for a Poisson process). The null hypothesis $H_0$ is that
the waiting times are exponentially distributed with the parameter estimated above.

Looking ahead in the question, it will be useful to build a function that carries out this test for any
vector of waiting times:

test.wait.times=function(test.data,lambda,largest.time){
n=length(test.data)
test.data=as.matrix(table(test.data))

So, the table that summarises how many times each waiting time occurs has now been put in a matrix.

test.data=cbind(test.data,0)
test.data[1,2]=n*(1-exp(-lambda*0.5))

This uses the probability of a recorded waiting time of 0 days, ie a waiting time of less than 0.5 days,
$P(T < 0.5) = 1 - e^{-0.5\lambda}$ as per the question, from the distribution function of the
exponential distribution.

for (j in 2:(largest.time)){
test.data[j,2]=n*(exp(-lambda*(j-1.5))-exp(-lambda*(j-0.5)))
}

This part of the function uses the probabilities $P(0.5 < T < 1.5)$, $P(1.5 < T < 2.5)$,
$P(2.5 < T < 3.5)$, and so on up to some ‘largest time’; row $j$ gets
$P(j-1.5 < T < j-0.5) = e^{-\lambda(j-1.5)} - e^{-\lambda(j-0.5)}$. We ‘cut it off’ there and use the
fact that probabilities sum to 1 to fill in the very last probability:

test.data=test.data[1:(largest.time+1),]
test.data[(largest.time+1),2]=n-sum(test.data[,2])
test.data[(largest.time+1),1]=n-sum(test.data[,1])

So, column 1 contains the observed numbers of waiting times of each length and column 2 contains
the expected numbers. Finally, we add a third column of $(A-E)^2/E$, which we will need for the
chi-squared test statistic.

test.data=cbind(test.data,0)
for (j in 1:(largest.time+1)){
test.data[j,3]=(test.data[j,1]-
test.data[j,2])^2/test.data[j,2]
}
test.data
}
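The same observed and expected counts can also be produced without loops. The sketch below, under the hypothetical name test.wait.times2, is an alternative to test.wait.times, not the function used for the results in this solution. It drops the NA in the first row, groups every waiting time of largest.time days or more into the last category with pmin(), uses factor(..., levels = 0:largest.time) so that a waiting time that never occurs still gets a row (with a count of 0), and takes the expected probabilities from pexp(). One difference to be aware of: test.wait.times counts the leading NA in n, and sets the last observed count to n minus the sum of all the rows including that row's own original count, so its final-row observed count and its expected counts can differ somewhat from those given by this sketch.

```r
test.wait.times2 <- function(waits, lambda, largest.time) {
  waits <- waits[!is.na(waits)]        # drop the NA in the first row
  n <- length(waits)

  # Observed counts for 0, 1, ..., largest.time - 1 days, plus a final
  # category for largest.time days or more; absent values get a count of 0
  grouped  <- pmin(waits, largest.time)
  observed <- as.vector(table(factor(grouped, levels = 0:largest.time)))

  # A recorded wait of k days means k - 0.5 < T < k + 0.5 (0 days: T < 0.5);
  # the last category takes all the probability above largest.time - 0.5
  upper    <- c(seq(0.5, by = 1, length.out = largest.time), Inf)
  expected <- n * diff(c(0, pexp(upper, rate = lambda)))

  out <- cbind(observed, expected,
               chisq.contrib = (observed - expected)^2 / expected)
  rownames(out) <- 0:largest.time
  out
}
```

For the combined data the call would be test.wait.times2(claims[,4], Poiss.param.claims, 18), with the degrees of freedom again equal to the number of rows less 2.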

In our function, row $j$ of the table is assumed to correspond to a waiting time of $j-1$ days, so the
probabilities calculated in the for loop are only matched to the correct rows as long as the observed
waiting times go up by 1 each time, with no gaps. Looking at the table of claims data, we will need to
stop at 27. This is OK because there are instances of waiting times in days for all integers between
0 and 27:

test.claim.times=test.wait.times(claims[,4],Poiss.param.claims,27)
test.claim.times

[,1] [,2] [,3]
0 60 65.7914526 0.509806704
1 133 113.8341290 3.226893505
2 76 93.6879276 3.339414066
3 84 77.1071721 0.616169345
4 70 63.4608551 0.673807756
5 62 52.2296437 1.827695074
6 38 42.9861160 0.578357744
7 29 35.3784947 1.149997895
8 27 29.1172593 0.153956347
9 17 23.9641284 2.023820069
10 19 19.7229912 0.026502890
11 17 16.2324444 0.036294076
12 16 13.3596496 0.521828818
13 5 10.9952779 3.268981254
14 5 9.0493494 1.811978917
15 7 7.4478085 0.026925022
16 7 6.1297060 0.123564118
17 7 5.0448794 0.757698335
18 5 4.1520438 0.173174874
19 5 3.4172210 0.733107157
20 3 2.8124462 0.012507420
21 2 2.3147035 0.042786589
22 2 1.9050505 0.004732367
23 3 1.5678974 1.308068851
24 1 1.2904132 0.065358768
25 2 1.0620377 0.828382337
26 1 0.8740798 0.018140106
27 5 4.0648221 0.215152781

Usually with the chi-squared test, we need the numbers in the expected column to be at least 5,
$E \ge 5$. The expected numbers first fall below 5 at 18 days, so let’s rerun this table with a new
largest value of 18:

test.claim.times=test.wait.times(claims[,4],Poiss.param.claims,18)
test.claim.times

[,1] [,2] [,3]
0 60 65.791453 0.50980670
1 133 113.834129 3.22689351
2 76 93.687928 3.33941407
3 84 77.107172 0.61616935
4 70 63.460855 0.67380776
5 62 52.229644 1.82769507
6 38 42.986116 0.57835774
7 29 35.378495 1.14999789
8 27 29.117259 0.15395635
9 17 23.964128 2.02382007
10 19 19.722991 0.02650289
11 17 16.232444 0.03629408
12 16 13.359650 0.52182882
13 5 10.995278 3.26898125
14 5 9.049349 1.81197892
15 7 7.447808 0.02692502
16 7 6.129706 0.12356412
17 7 5.044879 0.75769834
18 25 23.460715 0.10099427

Now, all the numbers in the expected column are at least 5, $E \ge 5$.

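We now sum the last column and calculate the 5% critical value based on there being 17 degrees of
freedom: there are 19 categories, and we lose 2 degrees of freedom, one because the expected
numbers are constrained to add up to the total number of observations and one because $\lambda$
was estimated from the data.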

sum(test.claim.times[,3])

[1] 20.77469

DOF = length(test.claim.times[,3]) - 2
qchisq(0.95,df=DOF)

[1] 27.58711

So, because the test statistic of 20.77 is less than the critical value of 27.59, there is insufficient
evidence to reject $H_0$ at the 5% level, and it is reasonable to assume that the waiting times are
exponentially distributed.
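Equivalently, the decision can be based on a p-value: the probability that a chi-squared variable with 17 degrees of freedom exceeds the observed statistic. A minimal sketch, using the test statistic and degrees of freedom from above:

```r
test.stat <- 20.77469        # sum(test.claim.times[,3]) from above
dof       <- 17              # 19 categories less 2

# Upper-tail probability of the chi-squared distribution
p.value <- pchisq(test.stat, df = dof, lower.tail = FALSE)
p.value > 0.05               # TRUE, so H0 is not rejected at the 5% level

# The same decision as comparing the statistic with the 5% critical value
test.stat < qchisq(0.95, df = dof)   # TRUE
```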

(v)(a) Using actual waiting times of the motor insurance claims


We will need to calculate the Poisson parameter for the motor insurance policies:

mean.motor.time = mean(motor[,4][2:length(motor[,4])])
Poiss.param.motor = 1/mean.motor.time
Poiss.param.motor

[1] 0.09704175

(v)(b) Thinning the Poisson parameter

Thinning works by taking the Poisson parameter for all the claims and multiplying it by the
proportion of claims that are motor insurance claims:

proportion.motor = length(motor[,4])/length(claims[,4])
Poiss.param.motor2 = Poiss.param.claims*proportion.motor
Poiss.param.motor2

[1] 0.09669973

The answers are extremely similar.
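The thinning result says that if each event of a Poisson process with rate λ is, independently, of a given type with probability p, then the events of that type form a Poisson process with rate λp. The simulation sketch below takes λ = 0.194773 from part (iv) and p = 352/709, the proportion of motor claims from part (i). It assumes the independent labelling of claim types that is questioned in part (viii), and the simulated time span is an arbitrary choice.

```r
set.seed(2020)
lambda.all <- 0.194773     # daily rate for all claims, from part (iv)
p.motor    <- 352 / 709    # proportion of claims that are motor claims, from part (i)
horizon    <- 500000       # number of days simulated (arbitrary)

# Simulate the combined claims process from exponential waiting times
arrivals <- cumsum(rexp(ceiling(1.2 * lambda.all * horizon), rate = lambda.all))
arrivals <- arrivals[arrivals < horizon]

# Thinning: each claim is independently labelled "motor" with probability p.motor
is.motor       <- runif(length(arrivals)) < p.motor
motor.arrivals <- arrivals[is.motor]

# Rate of the thinned process, estimated as 1 / mean waiting time
rate.hat <- 1 / mean(diff(motor.arrivals))
c(estimated = rate.hat, theory = lambda.all * p.motor)
```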

(vi) House insurance claims and travel insurance claims

We need to run all the same code but with ‘house’ instead of ‘motor’:

mean.house.time = mean(house[,4][2:length(house[,4])])
Poiss.param.house = 1/mean.house.time
Poiss.param.house

[1] 0.06100943

proportion.house = length(house[,4])/length(claims[,4])
Poiss.param.house2 = Poiss.param.claims*proportion.house
Poiss.param.house2

[1] 0.06071205

The answers are very similar again, whether using the actual waiting times of the house insurance
claims or thinning the Poisson parameter used for all claims.

We then run all the code for a third time but with ‘travel’ instead of ‘house’:

mean.travel.time = mean(travel[,4][2:length(travel[,4])])
Poiss.param.travel = 1/mean.travel.time
Poiss.param.travel

[1] 0.03736507

proportion.travel = length(travel[,4])/length(claims[,4])
Poiss.param.travel2 = Poiss.param.claims*proportion.travel
Poiss.param.travel2

[1] 0.03736126

Again, the answers are very similar, whether using the actual waiting times of the travel insurance
claims or thinning the Poisson parameter used for all claims.

(vii)(a) Observed motor insurance claims waiting times from a Poisson process?

In our function to test the waiting times, the probabilities calculated in the for loop are only
matched to the correct rows as long as the observed waiting times go up by 1 each time, with no
gaps. So, remembering this and looking at the table of the motor insurance claims waiting times, we
will need to stop at $t = 34$:
test.motor.times = test.wait.times(motor[,4],Poiss.param.motor,34)
test.motor.times

[,1] [,2] [,3]
0 24 16.671616 3.221355932
1 63 31.011800 32.995342622
2 17 28.143770 4.412472545
3 31 25.540981 1.166786989
4 24 23.178903 0.029086783
5 26 21.035275 1.171769658
6 14 19.089893 1.357106118
7 15 17.324424 0.311868719
8 10 15.722228 2.082649970
9 11 14.268207 0.748599890
10 9 12.948657 1.204131763
11 7 11.751140 1.920948494
12 11 10.664373 0.010562793
13 4 9.678112 3.331326683
14 3 8.783062 3.807761270
15 7 7.970788 0.118235308
16 6 7.233634 0.210385705
17 5 6.564654 0.372927849
18 4 5.957543 0.643213648
19 6 5.406578 0.065133614
20 4 4.906567 0.167502898
21 1 4.452799 2.677376569
22 2 4.040996 1.030850606
23 2 3.667277 0.758004284
24 2 3.328120 0.529999740
25 1 3.020329 1.351418939
26 4 2.741003 0.578281735
27 2 2.487510 0.095543849
28 1 2.257461 0.700436285
29 2 2.048686 0.001157021
30 1 1.859220 0.397080031
31 3 1.687276 1.021317543
32 1 1.531233 0.184301746
33 3 1.389622 1.866202960
34 25 13.636263 9.469934598

Usually with the chi-squared test, we need the numbers in the expected column to be at least 5,
$E \ge 5$. So, let’s rerun this table with a new largest value of 21:

test.motor.times = test.wait.times(motor[,4],Poiss.param.motor,21)
test.motor.times

[,1] [,2] [,3]
0 24 16.671616 3.22135593
1 63 31.011800 32.99534262
2 17 28.143770 4.41247254
3 31 25.540981 1.16678699
4 24 23.178903 0.02908678
5 26 21.035275 1.17176966
6 14 19.089893 1.35710612
7 15 17.324424 0.31186872
8 10 15.722228 2.08264997
9 11 14.268207 0.74859989
10 9 12.948657 1.20413176
11 7 11.751140 1.92094849
12 11 10.664373 0.01056279
13 4 9.678112 3.33132668
14 3 8.783062 3.80776127
15 7 7.970788 0.11823531
16 6 7.233634 0.21038570
17 5 6.564654 0.37292785
18 4 5.957543 0.64321365
19 6 5.406578 0.06513361
20 4 4.906567 0.16750290
21 50 48.147796 0.07125271

The tail has been cut off quite abruptly here, and a slightly different grouping would have been
better (for example, the expected number for 20 days, 4.91, is just under 5). However, this is good
enough to perform the test, especially as the test turns out to reject $H_0$ clearly, even though the
observed and expected numbers in the last category are quite close.

sum(test.motor.times[,3])

[1] 59.42042

DOF=length(test.motor.times[,3])-2
qchisq(0.95,df=DOF)

[1] 31.41043

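So, because the test statistic of 59.42 is greater than the critical value of 31.41, there is sufficient
evidence to reject the null hypothesis that the motor insurance claims follow a Poisson process.
Much of the test statistic comes from the 1-day category, where 63 waiting times were observed
compared with about 31 expected.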

(vii)(b) Observed house insurance claims waiting times from a Poisson process?

First, we stop at $t = 25$:

test.house.times = test.wait.times(house[,4],Poiss.param.house,25)
test.house.times

[,1] [,2] [,3]
0 2 6.639755 3.242186947
1 11 12.687046 0.224333174
2 14 11.936156 0.356853101
3 17 11.229707 2.965018320
4 12 10.565069 0.194889966
5 13 9.939769 0.942176169
6 7 9.351478 0.591291226
7 12 8.798004 1.165352396
8 13 8.277289 2.694601966
9 4 7.787392 1.841995446
10 7 7.326491 0.014549417
11 9 6.892868 0.644145026
12 4 6.484909 0.952175691
13 1 6.101096 4.265000551
14 6 5.739998 0.011777154
15 5 5.400273 0.029668580
16 5 5.080654 0.001280372
17 3 4.779953 0.662816489
18 1 4.497048 2.719416225
19 5 4.230888 0.139813226
20 3 3.980480 0.241513741
21 5 3.744893 0.420651510
22 4 3.523249 0.064511988
23 6 3.314723 2.175358960
24 2 3.118539 0.401190939
25 47 49.572275 0.133473735

The expected numbers fall below 5 from 17 days onwards, which suggests redoing it with $t = 17$:

test.house.times = test.wait.times(house[,4],Poiss.param.house,17)
test.house.times

[,1] [,2] [,3]
0 2 6.639755 3.242186947
1 11 12.687046 0.224333174
2 14 11.936156 0.356853101
3 17 11.229707 2.965018320
4 12 10.565069 0.194889966
5 13 9.939769 0.942176169
6 7 9.351478 0.591291226
7 12 8.798004 1.165352396
8 13 8.277289 2.694601966
9 4 7.787392 1.841995446
10 7 7.326491 0.014549417
11 9 6.892868 0.644145026
12 4 6.484909 0.952175691
13 1 6.101096 4.265000551
14 6 5.739998 0.011777154
15 5 5.400273 0.029668580
16 5 5.080654 0.001280372
17 76 80.762046 0.280788851

sum(test.house.times[,3])

[1] 20.41808

DOF = length(test.house.times[,3]) - 2
qchisq(0.95,df=DOF)

[1] 26.29623

So, because the test statistic of 20.42 is less than the critical value of 26.30, there is insufficient
evidence to reject the null hypothesis that the house insurance claims follow a Poisson process.

Here we’d ideally redo this with a better grouping of the last category, which contains a large share
of the observations, to check whether the tail of the distribution also fits well.

(vii)(c) Observed travel insurance claims waiting times from a Poisson process?

First, we stop at $t = 12$:

test.travel.times=test.wait.times(travel[,4],Poiss.param.travel,12)
test.travel.times

[,1] [,2] [,3]
0 7 2.517237 7.98302164
1 7 4.895562 0.90462759
2 9 4.716014 3.89153537
3 11 4.543051 9.17713279
4 4 4.376432 0.03237824
5 7 4.215924 1.83852465
6 5 4.061302 0.21696326
7 6 3.912352 1.11397865
8 3 3.768864 0.15685134
9 3 3.630638 0.10954125
10 5 3.497482 0.64548119
11 6 3.369210 2.05420713
12 57 88.495931 11.20948343

sum(test.travel.times[,3])

[1] 39.33373

DOF = length(test.travel.times[,3]) - 2
qchisq(0.95,df=DOF)

[1] 19.67514

So, because the test statistic of 39.33 is greater than the critical value of 19.68, there is sufficient
evidence to reject the null hypothesis that the travel insurance claims follow a Poisson process.

Here we’d ideally redo this with a better grouping. Indeed, apart from the final grouped category,
none of the numbers in the expected column is at least 5, which means that the validity of this test
is questionable in this instance.
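A better grouping for the travel claims would use wider intervals, so that every expected count is at least 5. The sketch below is a hypothetical variant of the test function, named test.wait.times.binned here for illustration. It takes the lower limits (in whole days) of the second and later groups, so breaks = c(5, 10) gives the groups 0 to 4 days, 5 to 9 days and 10 days or more. Choosing the groups is a judgement call, and the values in the test example are illustrative only.

```r
test.wait.times.binned <- function(waits, lambda, breaks) {
  waits <- waits[!is.na(waits)]          # drop the NA in the first row
  n <- length(waits)

  # Group limits on the continuous scale: a recorded wait of k days
  # means k - 0.5 < T < k + 0.5, so group boundaries sit at half-days
  edges <- c(0, breaks - 0.5, Inf)

  # right = FALSE gives intervals [a, b), so a wait of 0 days is in the first group
  observed <- as.vector(table(cut(waits, breaks = edges, right = FALSE)))
  expected <- n * diff(pexp(edges, rate = lambda))

  cbind(observed, expected,
        chisq.contrib = (observed - expected)^2 / expected)
}
```

With g groups, the degrees of freedom would be g - 2, as before.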

(viii) Should the company model its future claim numbers as a Poisson process?

In part (iv), we found no evidence against the total claims arriving according to a Poisson process.
So, this assumption seems reasonable.

In parts (v) and (vi), we saw that the Poisson parameter estimated for motor insurance claims,
house insurance claims or travel insurance claims is extremely similar whether we use the actual
waiting times of the claims under consideration or thin the Poisson parameter for all the claims. So,
either choice of Poisson parameter would be appropriate.

However, the legitimacy of thinning a Poisson process, as in part (v)(b), relies on the motor
insurance claims, house insurance claims and travel insurance claims being distributed randomly
throughout all the claims, ie that the three types occur independently. This may not be the case.

In part (vii)(b) we found no evidence against the observed house insurance claims arriving
according to a Poisson process. So, this assumption seems reasonable. However, in parts (vii)(a) and
(vii)(c), when we tested whether the observed motor insurance claims waiting times and travel
insurance claims waiting times came from Poisson processes, we found sufficient evidence to
conclude that they didn’t.

So, it’s probable that the motor insurance claims and travel insurance claims are not randomly
distributed throughout all the claims: there may be some clumping of these types of claims in
particular periods, or some seasonality of claims, eg more motor insurance claims in the winter
when the roads are icy and cars break down in the cold weather, or more travel insurance claims in
the summer, when most people go on holiday.

You might expect a similar ‘seasonality’ in house insurance claims, eg houses being burgled in the
period leading up to Christmas when thieves know that houses will be full of high-value items, or
many claims following a one-off event such as a flood.

(ix)(a) Mean, standard deviation, $\alpha$ and $\lambda$ for motor insurance claims

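It’s going to be useful to have a function that gives us all four of the required items for some given
data. We have:

$$X_j \sim \text{Ga}(\alpha, \lambda)$$

which means that:

$$E(X_j) = \frac{\alpha}{\lambda} \quad \text{and} \quad \text{var}(X_j) = \frac{\alpha}{\lambda^2}$$

so that:

$$\lambda = \frac{E(X_j)}{\text{var}(X_j)} \quad \text{and} \quad \alpha = \lambda E(X_j)$$

Replacing $E(X_j)$ and $\text{var}(X_j)$ by the sample mean and sample variance gives the
method-of-moments estimates of $\alpha$ and $\lambda$.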

So, we code a function that returns the sample mean, the sample standard deviation, and the
estimated values of the parameters $\alpha$ and $\lambda$:

get.moments.parameters = function(claims){
claims = as.numeric(claims)
answer = numeric(4)
answer[1] = mean(claims)                # sample mean
answer[2] = sqrt(var(claims))           # sample standard deviation
answer[4] = mean(claims)/var(claims)    # lambda = mean/variance
answer[3] = answer[4]*answer[1]         # alpha = lambda*mean
answer
}

We can then use this for the motor insurance claims:


m.p.motor = get.moments.parameters(motor[,2])
m.p.motor

[1] 1.722952e+03 8.087160e+02 4.538937e+00 2.634396e-03

So, the mean is 1,722.95, the standard deviation is 808.72, $\alpha = 4.538937$ and $\lambda = 0.002634396$.
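As a sanity check on the method-of-moments formulas used in get.moments.parameters, the same formulas can be applied to simulated gamma data with known parameters. In this minimal sketch the true values are illustrative assumptions chosen close to the motor estimates; R's rgamma() is parameterised by shape = α and rate = λ, which matches $E(X) = \alpha/\lambda$.

```r
set.seed(7)
alpha.true  <- 4.5       # illustrative shape, close to the motor estimate
lambda.true <- 0.0026    # illustrative rate, close to the motor estimate

# rgamma() uses shape = alpha and rate = lambda, so E(X) = alpha / lambda
x <- rgamma(100000, shape = alpha.true, rate = lambda.true)

# Method-of-moments estimates, as in get.moments.parameters
lambda.hat <- mean(x) / var(x)
alpha.hat  <- lambda.hat * mean(x)
c(alpha = alpha.hat, lambda = lambda.hat)
```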

(ix)(b) Mean, standard deviation, $\alpha$ and $\lambda$ for house insurance claims


m.p.house = get.moments.parameters(house[,2])
m.p.house

[1] 7.259018e+03 3.426188e+03 4.488831e+00 6.183800e-04

So, for the house insurance claims, the mean is 7,259.02, the standard deviation is 3,426.19,
$\alpha = 4.488831$ and $\lambda = 0.00061838$.

(ix)(c) Mean, standard deviation, $\alpha$ and $\lambda$ for travel insurance claims


m.p.travel = get.moments.parameters(travel[,2])
m.p.travel

[1] 281.51470588 109.96181056 6.55418068 0.02328184

So, for the travel insurance claims, the mean is 281.52, the standard deviation is 109.96,
$\alpha = 6.55418068$ and $\lambda = 0.02328184$.

(x) Mean and standard deviation of total annual claims
If we are fitting individual claim amounts with a gamma distribution and claim numbers with a
Poisson process, the total annual claim amount $S$ will have a compound Poisson distribution, with
mean and variance formulae given on page 16 of the Tables:

$$E(S) = \lambda m_1$$

and

$$\text{var}(S) = \lambda m_2 = \lambda\left[\text{var}(X) + E^2(X)\right]$$

Here $\lambda$ is the Poisson parameter over a year, which will be 365 times the daily Poisson
parameter. The second moment $m_2$ of the claim amount distribution can be calculated as the
variance plus the square of the mean.
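The compound Poisson formulas can be checked by simulation. This is a minimal sketch for the motor claims, plugging in the annual Poisson parameter (365 × 0.09669973, from part (v)(b)) and the gamma estimates from part (ix)(a). Separate names are used for the annual Poisson parameter and the gamma rate, since the text uses λ for both; the number of simulated years and the seed are arbitrary choices. The formula values should agree with those in part (x)(a) up to rounding of the inputs.

```r
set.seed(11)
annual.rate <- 365 * 0.09669973   # annual Poisson parameter for motor claims, part (v)(b)
gamma.shape <- 4.538937           # alpha for motor claim amounts, part (ix)(a)
gamma.rate  <- 0.002634396        # lambda for motor claim amounts, part (ix)(a)
nsim        <- 20000              # number of simulated years (arbitrary)

# Simulate S = X_1 + ... + X_N with N ~ Poisson and X_j ~ Gamma(alpha, lambda)
N <- rpois(nsim, annual.rate)
S <- vapply(N, function(k) sum(rgamma(k, shape = gamma.shape, rate = gamma.rate)),
            numeric(1))

# Formula values: E(S) = lambda * m1 and var(S) = lambda * m2
m1 <- gamma.shape / gamma.rate
m2 <- gamma.shape / gamma.rate^2 + m1^2     # var(X) + E(X)^2
c(simulated = mean(S), formula = annual.rate * m1)
c(simulated = sd(S), formula = sqrt(annual.rate * m2))
```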

(x)(a) Mean and standard deviation of total annual claims for motor insurance
Mean:
expected.annual.motor = 365*Poiss.param.motor2*m.p.motor[1]
expected.annual.motor

[1] 60812.27

Standard deviation:
sd.annual.motor = sqrt(365*Poiss.param.motor2*((m.p.motor[2])^2+
(m.p.motor[1])^2))
sd.annual.motor

[1] 11307.54

(x)(b) Mean and standard deviation of total annual claims for house insurance
Mean:
expected.annual.house = 365*Poiss.param.house2*m.p.house[1]
expected.annual.house

[1] 160859.1

Standard deviation:
sd.annual.house = sqrt(365*Poiss.param.house2*((m.p.house[2])^2+
(m.p.house[1])^2))
sd.annual.house

[1] 37786.36

(x)(c) Mean and standard deviation of total annual claims for travel insurance
Mean:
expected.annual.travel = 365*Poiss.param.travel2*m.p.travel[1]
expected.annual.travel

[1] 3838.977

Standard deviation:
sd.annual.travel = sqrt(365*Poiss.param.travel2*((m.p.travel[2])^2+
(m.p.travel[1])^2))
sd.annual.travel

[1] 1116.073

(xi) Other issues the company should consider


The model probably needs to be broken down much further, to include different types of claim
within each category, as claim sizes are likely to be very different depending on the type. Examples:
• a house insurance claim could be for having contents stolen, or it could be for a house being
completely destroyed by a fire
• a travel insurance claim could be for being air-lifted off a mountain during an expensive winter
sports holiday, or for having a suitcase containing a few swimming costumes lost at an airport on a
summer holiday.

Using past data isn’t necessarily a good guide to predicting future claim patterns. There may be
changing trends in claim sizes, the number of claims arriving and the waiting times between
claims.
• Claims inflation (past and future) should be allowed for, especially as we are using 10 years of
data.
• There needs to be an allowance for the evolution in the kind of event that would lead to a claim.
Examples:
– developments in technology will mean that the types of goods desirable to a thief have
changed completely over 10 years
– the onset of global warming may have dramatically increased the number of claims for
houses being flooded.

The company should consider building seasonality into the model so that the rate at which claims
arrive changes depending on the time of year.

For the claim amounts, the gamma distribution assumption could be tested:
• other distributions could be considered for a better fit
• a distribution with 3 parameters would allow the skewness of claim sizes to be incorporated
more accurately.
