We need to extract the relevant part of the matrix for each of the three different types of claim.
We can use the head function to display the first few lines of each table, and the nrow function
to see the number of rows:
nrow(motor)
[1] 352
head(house)
nrow(house)
[1] 221
nrow(travel)
[1] 136
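As an illustration of the extraction step, here is a minimal sketch on a made-up toy table. The object name claims.toy, the column names (claim.date, claim.type, claim.amount) and all of the values are assumptions for illustration only; the layout of the real claims data may differ.

```r
# Toy stand-in for the claims table; names and values are hypothetical.
claims.toy = data.frame(
  claim.date = as.Date(c("2008-01-11", "2008-01-20", "2008-01-22",
                         "2008-02-03", "2008-02-04")),
  claim.type = c("motor", "house", "motor", "travel", "motor"),
  claim.amount = c(1500, 7200, 1900, 260, 2100)
)

# Keep only the rows belonging to each type of claim
motor.toy = claims.toy[claims.toy$claim.type == "motor", ]
house.toy = claims.toy[claims.toy$claim.type == "house", ]
travel.toy = claims.toy[claims.toy$claim.type == "travel", ]

head(motor.toy)   # first few rows of the motor subset
nrow(motor.toy)   # number of motor claims in the toy table
```

Every row ends up in exactly one subset, so the three row counts should add up to the number of rows in the full table.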
$W_t \sim \text{Exp}(\lambda)$
(iii)(a) Waiting times in days for the combined claims
The diff function in R takes the difference between neighbouring rows, which is just what we
need for the observed waiting times. We can set this up as a function, which will begin like this:
We set up a temporary table and column bind a new column with the difference of the dates as a
numerical value:
We can then rename the new column so that we know what it represents:
temp
}
We can see that the new column has been created correctly. For example, it is correct that there
are 9 days between 2008-01-11 and 2008-01-20. The first entry shows NA (‘not available’)
because there is no preceding claim to compare with.
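The steps just described (copy the table, column bind the differenced dates, rename the new column) can be sketched as follows. This is a hedged reconstruction, not necessarily the function used in this solution: the function name add.wait.times, the column names claim.date and wait.days, and the third date in the toy data are assumptions.

```r
# Hypothetical helper: append the waiting time (in days) since the previous claim.
add.wait.times = function(claims.table) {
  temp = claims.table
  # diff() on a Date vector gives day differences; the first claim has no predecessor
  wait = c(NA, as.numeric(diff(temp$claim.date)))
  temp = cbind(temp, wait)
  # Rename the new column so that we know what it represents
  colnames(temp)[ncol(temp)] = "wait.days"
  temp
}

toy = data.frame(claim.date = as.Date(c("2008-01-11", "2008-01-20", "2008-01-22")))
toy.waits = add.wait.times(toy)
toy.waits
```

The second entry of wait.days is 9, matching the gap between 2008-01-11 and 2008-01-20 noted above, and the first entry is NA.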
We then repeat this separately for the motor, house and travel claims:
head(house)
head(travel)
We know the waiting times of a Poisson process are exponentially distributed with the same
parameter as that of the Poisson process. So, we should find this:
mean.claim.time = mean(claims[,4][2:length(claims[,4])])
Poiss.param.claims = 1/mean.claim.time
Poiss.param.claims
[1] 0.194773
The test to use is a chi-squared test to see if the actual waiting times are exponentially distributed
(as they should be for a Poisson process). The null hypothesis H0 is that the claims are
exponentially distributed.
Looking ahead in the question, it will be useful to build a function that tests this for any matrix of
claims data:
test.wait.times=function(test.data,lambda,largest.time){
n=length(test.data)
test.data=as.matrix(table(test.data))
So, the table that summarises the claim counts has now been put in a matrix.
test.data=cbind(test.data,0)
test.data[1,2]=n*(1-exp(-lambda*0.5))
This uses the probability of a waiting time of 0 days (ie less than 0.5 days, $P(T < 0.5)$, as per the
question), calculated from the distribution function of an exponential distribution.
for (j in 2:(largest.time)){
test.data[j,2]=n*(exp(-lambda*(j-1.5))-exp(-lambda*(j-0.5)))
}
This part of the function uses the probabilities $P(0.5 < T < 1.5)$, $P(1.5 < T < 2.5)$, $P(2.5 < T < 3.5)$,
and so on until some ‘largest time’. We ‘cut it off’ here and use the fact that probabilities sum to
1 to fill in the very last probability:
test.data=test.data[1:(largest.time+1),]
test.data[(largest.time+1),2]=n-sum(test.data[,2])
test.data[(largest.time+1),1]=n-sum(test.data[,1])
So, column 1 contains the actual (observed) number of claims for each waiting time and column 2
contains the expected number. Finally, we can create a column of $(A - E)^2 / E$, which we will
need for the chi-squared test statistic.
test.data=cbind(test.data,0)
for (j in 1:(largest.time+1)){
test.data[j,3]=(test.data[j,1]-test.data[j,2])^2/test.data[j,2]
}
test.data
}
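As a cross-check on the cell probabilities, the same table can be sketched in vectorised form with pexp and diff. This is an alternative sketch, not the function defined above: the name wait.time.table is an assumption, and it drops the NA entry before counting and pools every waiting time at or beyond the cut-off into the last cell, so its figures need not match the output shown in this solution exactly.

```r
# Hypothetical vectorised version of the observed/expected table.
wait.time.table = function(wait.times, lambda, largest.time) {
  wait.times = wait.times[!is.na(wait.times)]   # the first claim has no waiting time
  n = length(wait.times)
  # Cell boundaries: [0, 0.5), [0.5, 1.5), ..., [largest.time - 0.5, Inf)
  breaks = c(0, seq(0.5, largest.time - 0.5, by = 1), Inf)
  expected = n * diff(pexp(breaks, rate = lambda))
  # Pool all waiting times at or beyond the cut-off into the final cell
  capped = pmin(wait.times, largest.time)
  observed = as.numeric(table(factor(capped, levels = 0:largest.time)))
  out = cbind(observed, expected, chi.sq = (observed - expected)^2 / expected)
  rownames(out) = 0:largest.time
  out
}

# Usage on simulated whole-day waiting times (illustrative data only)
set.seed(1)
sim.waits = c(NA, round(rexp(500, rate = 0.2)))
sim.table = wait.time.table(sim.waits, 0.2, 10)
sim.table
```

Because the last boundary is infinite, the expected column sums to the number of observed waiting times by construction.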
In our function, the probabilities calculated in the for loop are only calculated correctly whilst
the waiting times are going up by 1 each time. Looking at the table of claims data, we will need to
stop at 27. This is OK because there are instances of waiting times in days for all integers between
1 and 27:
test.claim.times=test.wait.times(claims[,4],Poiss.param.claims,27)
test.claim.times
14 5 9.0493494 1.811978917
15 7 7.4478085 0.026925022
16 7 6.1297060 0.123564118
17 7 5.0448794 0.757698335
18 5 4.1520438 0.173174874
19 5 3.4172210 0.733107157
20 3 2.8124462 0.012507420
21 2 2.3147035 0.042786589
22 2 1.9050505 0.004732367
23 3 1.5678974 1.308068851
24 1 1.2904132 0.065358768
25 2 1.0620377 0.828382337
26 1 0.8740798 0.018140106
27 5 4.0648221 0.215152781
Usually with the chi-squared test, we need the numbers in the expected column to be at least 5,
ie $E \geq 5$. So, let’s rerun this table with a new largest value of 18:
test.claim.times=test.wait.times(claims[,4],Poiss.param.claims,18)
test.claim.times
We now sum the last column and calculate the critical value based on there being 17 degrees of
freedom (19 categories and we lose 2 degrees of freedom).
sum(test.claim.times[,3])
[1] 20.77469
DOF = length(test.claim.times[,3]) - 2
qchisq(0.95,df=DOF)
[1] 27.58711
So, because the test statistic of 20.77 is less than the critical value of 27.59, there is insufficient
evidence to reject H0 and we can conclude that the waiting times are exponentially distributed.
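The same decision can also be read from a p-value. The sketch below reuses the test statistic and degrees of freedom quoted above; the variable names are illustrative.

```r
# Figures quoted in the solution for the combined claims
chi.sq.stat = 20.77469
dof = 17

critical.value = qchisq(0.95, df = dof)                        # 5% critical value
p.value = pchisq(chi.sq.stat, df = dof, lower.tail = FALSE)    # upper-tail probability
reject.H0 = chi.sq.stat > critical.value

critical.value
p.value
reject.H0
```

Comparing the statistic with the critical value and comparing the p-value with 5% are equivalent ways of reaching the same decision.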
mean.motor.time = mean(motor[,4][2:length(motor[,4])])
Poiss.param.motor = 1/mean.motor.time
Poiss.param.motor
[1] 0.09704175
Thinning works by taking the Poisson parameter for all the claims and multiplying it by the
proportion of claims that are motor insurance claims:
proportion.motor = length(motor[,4])/length(claims[,4])
Poiss.param.motor2 = Poiss.param.claims*proportion.motor
Poiss.param.motor2
[1] 0.09669973
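To see why thinning gives a sensible answer, here is a small simulation sketch. It assumes that each claim is independently a motor claim with a fixed probability; that is an assumption of the sketch, not something established for the real data. The proportion 352/709 uses the row counts shown earlier.

```r
set.seed(2024)
lambda.all = 0.194773   # daily rate for all claims, as estimated above
p.motor = 352 / 709     # proportion of motor claims (row counts from earlier)

# Simulate a Poisson process through its exponential waiting times
n.events = 200000
arrival.times = cumsum(rexp(n.events, rate = lambda.all))

# Independently mark each claim as motor with probability p.motor
is.motor = runif(n.events) < p.motor

# Waiting times between successive motor claims and the implied rate
motor.waits = diff(arrival.times[is.motor])
rate.simulated = 1 / mean(motor.waits)
rate.thinned = lambda.all * p.motor

c(simulated = rate.simulated, thinned = rate.thinned)
```

Under the independence assumption, the two rates should agree closely. A large gap on real data would suggest that the claim types are not spread randomly through the combined claims.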
We need to run all the same code but with ‘house’ instead of ‘motor’:
mean.house.time = mean(house[,4][2:length(house[,4])])
Poiss.param.house = 1/mean.house.time
Poiss.param.house
[1] 0.06100943
proportion.house = length(house[,4])/length(claims[,4])
Poiss.param.house2 = Poiss.param.claims*proportion.house
Poiss.param.house2
[1] 0.06071205
The answers are very similar again, whether using the actual waiting times of the house insurance
claims or thinning the Poisson parameter used for all claims.
We then run all the code for a third time but with ‘travel’ instead of ‘house’:
mean.travel.time = mean(travel[,4][2:length(travel[,4])])
Poiss.param.travel = 1/mean.travel.time
Poiss.param.travel
[1] 0.03736507
proportion.travel = length(travel[,4])/length(claims[,4])
Poiss.param.travel2 = Poiss.param.claims*proportion.travel
Poiss.param.travel2
[1] 0.03736126
Again, the answers are very similar, whether using the actual waiting times of the travel insurance
claims or thinning the Poisson parameter used for all claims.
(vii)(a) Observed motor insurance claims waiting times from a Poisson process?
In our function to test the waiting times, the probabilities calculated in the for loop are only
calculated correctly whilst the waiting times are going up by 1 each time. So, remembering this
and looking at the table of the motor insurance claims waiting times, we will need to stop when
$t = 34$:
test.motor.times = test.wait.times(motor[,4],Poiss.param.motor,34)
test.motor.times
Usually with the chi-squared test, we need the numbers in the expected column to be at least 5,
ie $E \geq 5$. So, let’s rerun this table with a new largest value of 21:
test.motor.times = test.wait.times(motor[,4],Poiss.param.motor,21)
test.motor.times
The tail has been cut off quite abruptly here. It would have been better to group slightly
differently. However, this is good enough to perform a test, especially as this turns out to be a fail
anyway but with the last category numbers being quite close.
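One way to group the tail less abruptly is to pool neighbouring cells, working back from the largest waiting time, until every pooled cell has an expected frequency of at least 5. The helper below is a sketch of that idea and is not part of the solution's code. The name pool.cells is an assumption, and the illustrative input is the tail (rows 14 to 27) of the combined-claims output shown earlier.

```r
# Hypothetical helper: pool adjacent cells from the tail until each expected >= min.exp
pool.cells = function(obs, expd, min.exp = 5) {
  new.obs = numeric(0)
  new.exp = numeric(0)
  acc.obs = 0
  acc.exp = 0
  for (k in length(expd):1) {
    acc.obs = acc.obs + obs[k]
    acc.exp = acc.exp + expd[k]
    if (acc.exp >= min.exp) {          # close the current group
      new.obs = c(acc.obs, new.obs)
      new.exp = c(acc.exp, new.exp)
      acc.obs = 0
      acc.exp = 0
    }
  }
  if (acc.exp > 0) {                   # leftover cells at the front
    if (length(new.exp) == 0) {
      new.obs = acc.obs
      new.exp = acc.exp
    } else {
      new.obs[1] = new.obs[1] + acc.obs
      new.exp[1] = new.exp[1] + acc.exp
    }
  }
  cbind(observed = new.obs, expected = new.exp)
}

obs.tail = c(5, 7, 7, 7, 5, 5, 3, 2, 2, 3, 1, 2, 1, 5)
exp.tail = c(9.0493494, 7.4478085, 6.1297060, 5.0448794, 4.1520438,
             3.4172210, 2.8124462, 2.3147035, 1.9050505, 1.5678974,
             1.2904132, 1.0620377, 0.8740798, 4.0648221)
pooled = pool.cells(obs.tail, exp.tail)
pooled
```

The degrees of freedom would then be based on the number of pooled cells rather than on the original number of rows.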
sum(test.motor.times[,3])
[1] 59.42042
DOF=length(test.motor.times[,3])-2
qchisq(0.95,df=DOF)
[1] 31.41043
So, there is sufficient evidence here to reject the null hypothesis that the motor insurance claims
follow a Poisson process.
(vii)(b) Observed house insurance claims waiting times from a Poisson process?
test.house.times = test.wait.times(house[,4],Poiss.param.house,25)
test.house.times
test.house.times = test.wait.times(house[,4],Poiss.param.house,17)
test.house.times
sum(test.house.times[,3])
[1] 20.41808
DOF = length(test.house.times[,3]) - 2
qchisq(0.95,df=DOF)
[1] 26.29623
So, there is insufficient evidence here to reject the null hypothesis that the house insurance
claims follow a Poisson process.
Here we’d ideally redo this with a better grouping of the last category, to see if the tail of the
distribution is good enough.
(vii)(c) Observed travel insurance claims waiting times from a Poisson process?
test.travel.times=test.wait.times(travel[,4],Poiss.param.travel,12)
test.travel.times
sum(test.travel.times[,3])
[1] 39.33373
DOF = length(test.travel.times[,3]) - 2
qchisq(0.95,df=DOF)
[1] 19.67514
So, there is sufficient evidence here to reject the null hypothesis that the travel insurance claims
follow a Poisson process.
Here we’d ideally redo this with a better grouping of the last category. Indeed, here none of the
numbers in the expected column are greater than 5, which means that we could question the
validity of this test in this instance.
(viii) Should the company model its future claim numbers as a Poisson process?
In part (iv), we saw that the total claims arrive according to a Poisson process. So, this assumption
seems reasonable.
In part (v), we saw that the Poisson parameter chosen for motor insurance claims, house
insurance claims or travel insurance claims would be extremely similar whether using the actual
waiting times of the claims under consideration or thinning the Poisson parameter used for all the
claims. So, the Poisson parameters chosen would be appropriate.
However, the legitimacy of thinning a Poisson process, as in part (v)(b), relies on the motor
insurance claims, house insurance claims and travel insurance claims being distributed randomly
throughout all the claims, ie that the three types occur independently. This may not be the case.
In part (vii)(b) we saw that the observed house insurance claims arrive according to a Poisson
process. So, this assumption seems reasonable. However, in parts (vii)(a) and (vii)(c), when we
tested whether the observed motor insurance claims waiting times and travel insurance claims
waiting times came from Poisson processes, we found that there was sufficient evidence to
conclude that they didn’t.
So, it’s probable that the motor insurance claims and travel insurance claims are not randomly
distributed throughout all the claims and there is some clumping of these types of claims in one
place or some seasonality of claims, eg more motor insurance claims in the winter when the roads
are icy and cars break down in the cold weather, or more travel insurance claims in the summer,
when most people go on holiday.
You might expect a similar ‘seasonality’ in house insurance claims, eg houses being burgled in the
period leading up to Christmas when thieves know that houses will be full of high-value items, or
many claims following a one-off event such as a flood.
It’s going to be useful to have a function that gives us all four of the required items for some given
data. We have:
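$X_j \sim \text{Ga}(\alpha, \lambda)$
$E[X_j] = \alpha / \lambda$ and $\text{var}(X_j) = \alpha / \lambda^2$
$\Rightarrow \lambda = E[X_j] / \text{var}(X_j)$ and $\alpha = \lambda E[X_j]$
So, we code a function that returns the sample mean, standard deviation, and the estimated
values for the parameters $\alpha$ and $\lambda$: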
get.moments.parameters = function(claims){
claims = as.numeric(claims)
answer = numeric(4)
answer[1] = mean(claims)
answer[2] = sqrt(var(claims))
answer[4] = mean(claims)/var(claims)
answer[3] = answer[4]*answer[1]
answer
}
So, for the motor insurance claims, the mean is 1,722.95, the standard deviation is 808.72,
$\alpha = 4.538937$ and $\lambda = 0.002634396$.
So, for the house insurance claims, the mean is 7,259.02, the standard deviation is 3,426.19,
$\alpha = 4.488831$ and $\lambda = 0.00061838$.
So, for the travel insurance claims, the mean is 281.52, the standard deviation is 109.96,
$\alpha = 6.55418068$ and $\lambda = 0.02328184$.
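As a sanity check on the method-of-moments function, it can be run on simulated gamma data with known parameters. The sketch below repeats the function so that it runs on its own. The simulated sample and the true parameter values (shape 4.5, rate 0.0026) are assumptions chosen only to be of a similar size to the motor estimates.

```r
# Same function as in the solution, repeated so that this block is self-contained
get.moments.parameters = function(claims){
  claims = as.numeric(claims)
  answer = numeric(4)
  answer[1] = mean(claims)                 # sample mean
  answer[2] = sqrt(var(claims))            # sample standard deviation
  answer[4] = mean(claims)/var(claims)     # lambda estimate
  answer[3] = answer[4]*answer[1]          # alpha estimate
  answer
}

set.seed(42)
sim.claims = rgamma(50000, shape = 4.5, rate = 0.0026)   # illustrative data only
m.p.sim = get.moments.parameters(sim.claims)
m.p.sim
```

With a sample this large, the third and fourth entries should land close to the shape and rate used in the simulation.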
(x) Mean and standard deviation of total annual claims
If we are fitting individual claim amounts as a gamma distribution and claim numbers as a Poisson
process, the total annual claim amounts will be a compound Poisson distribution with mean and
variance formulae given on page 16 of the Tables:
$E(S) = \lambda m_1$
and
$\text{var}(S) = \lambda m_2 = \lambda \left( \text{var}(X) + E^2(X) \right)$
Here $\lambda$ is the Poisson parameter over a year, which will be 365 times the daily Poisson
parameter. The second moment of the claim amount distribution, $m_2$, can be calculated as
the variance plus the mean squared.
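The two formulae can be wrapped in a small helper. This is a sketch with an assumed function name (compound.poisson.moments). The inputs in the usage line are the rounded motor figures quoted in this solution, so the results agree with the motor answers only up to rounding.

```r
# Hypothetical helper: annual mean and sd of a compound Poisson total
compound.poisson.moments = function(daily.rate, claim.mean, claim.sd) {
  annual.rate = 365 * daily.rate          # Poisson parameter over a year
  m2 = claim.sd^2 + claim.mean^2          # second moment = variance + mean squared
  c(mean = annual.rate * claim.mean,
    sd = sqrt(annual.rate * m2))
}

# Motor insurance: thinned daily rate, sample mean and sd of claim amounts
motor.annual = compound.poisson.moments(0.09669973, 1722.95, 808.72)
motor.annual
```

The same call with the house or travel figures gives the corresponding annual mean and standard deviation.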
(x)(a) Mean and standard deviation of total annual claims for motor insurance
Mean:
expected.annual.motor = 365*Poiss.param.motor2*m.p.motor[1]
expected.annual.motor
[1] 60812.27
Standard deviation:
sd.annual.motor = sqrt(365*Poiss.param.motor2*((m.p.motor[2])^2+
(m.p.motor[1])^2))
sd.annual.motor
[1] 11307.54
(x)(b) Mean and standard deviation of total annual claims for house insurance
Mean:
expected.annual.house = 365*Poiss.param.house2*m.p.house[1]
expected.annual.house
[1] 160859.1
Standard deviation:
sd.annual.house = sqrt(365*Poiss.param.house2*((m.p.house[2])^2+
(m.p.house[1])^2))
sd.annual.house
[1] 37786.36
(x)(c) Mean and standard deviation of total annual claims for travel insurance
Mean:
expected.annual.travel = 365*Poiss.param.travel2*m.p.travel[1]
expected.annual.travel
[1] 3838.977
Standard deviation:
sd.annual.travel = sqrt(365*Poiss.param.travel2*((m.p.travel[2])^2+
(m.p.travel[1])^2))
sd.annual.travel
[1] 1116.073
Using past data isn’t necessarily a good guide to predicting future claim patterns. There may be
changing trends in claim sizes, the number of claims arriving and the waiting times between
claims.
Claims inflation (past and future) should be allowed for, especially as we are using 10
years of data.
There needs to be an allowance for the evolution in the kind of event that would lead to a
claim. Examples:
– developments in technology will mean that the type of goods desirable for a thief
has completely changed over 10 years
– the onset of global warming may have dramatically increased the number of
claims for houses being flooded.
The company should consider building seasonality into the model so that the rate at which claims
arrive changes depending on the time of year.
For the claim amounts, the gamma distribution assumption could be tested:
– other distributions could be considered for a better fit
– a distribution with 3 parameters would allow for the skewness of claim sizes to be
incorporated more accurately.
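One possible way to test the gamma assumption is a Kolmogorov-Smirnov test against the fitted distribution, sketched below on simulated data. The data and variable names are assumptions. Because the parameters are estimated from the same sample, the p-value is only approximate and should be treated as a rough guide rather than an exact test.

```r
set.seed(1)
sim.amounts = rgamma(500, shape = 4.5, rate = 0.0026)   # illustrative claim amounts

# Method-of-moments estimates, as in the function used earlier
lambda.hat = mean(sim.amounts) / var(sim.amounts)
alpha.hat = lambda.hat * mean(sim.amounts)

# Compare the sample with the fitted gamma distribution function
ks.result = ks.test(sim.amounts, "pgamma", shape = alpha.hat, rate = lambda.hat)
ks.result$statistic
ks.result$p.value
```

A small p-value would suggest trying a different distribution. A Q-Q plot against the fitted gamma quantiles is a useful visual complement.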