Non-Life Insurance:
Mathematics & Statistics
– Lecture Notes –

Mario V. Wüthrich
RiskLab Switzerland
Department of Mathematics
ETH Zurich
Lecture notes. The present lecture notes cover the lecture Non-Life Insurance: Mathematics & Statistics which is held at the Department of Mathematics at ETH Zurich. This lecture is a merger of the two lectures Nicht-Leben Versicherungsmathematik and Risk Theory for Insurance. It is held for the first time in Spring 2014. The lecture aims at providing a basis in non-life insurance mathematics, which forms a core subject of actuarial science. After this course, the students may follow lectures that give a deeper specialization in non-life insurance mathematics, such as Credibility Theory, Non-Life Insurance Pricing with Generalized Linear Models, Stochastic Claims Reserving Methods, Market-Consistent Actuarial Valuation, Quantitative Risk Management, etc.

Prerequisites. The prerequisites for this lecture are a solid education in mathematics, in particular in probability theory and statistics.
Terms of Use. These lecture notes are an ongoing project which will be continuously revised and updated. Of course, there may be errors in the notes and there is always room for improvement. Therefore, I appreciate any comments and/or corrections that readers may have. However, I would like you to respect the following rules:

• These notes are provided solely for educational, personal and non-commercial use.

• All rights remain with the author. He may update the manuscript or withdraw the manuscript at any time. There is no right of availability of any (old) version of these notes. The author may also change these terms of use at any time.

• The author disclaims all warranties, including but not limited to the use or the contents of these notes. On using these notes, you fully agree to this.
Writing these notes, I profited greatly from various inspiring as well as ongoing discussions, concrete contributions and critical comments with and by several people: first of all our students who have been following our lectures at ETH Zurich since 2006; furthermore Hans Bühlmann, Philippe Deprez, Paul Embrechts, Laurent Huber, Michael Merz, Gareth Peters, Simon Rentzmann. I especially thank Alois Gisler for providing his lecture notes [48] and the corresponding exercises.
Contents

1 Introduction
1.1 Nature of non-life insurance
1.1.1 Non-life insurance and the law of large numbers
1.1.2 Risk components and premium elements
1.2 Probability theory and statistics
1.2.1 Random variables and distribution functions
1.2.2 Terminology in statistics
4.2.2 Fast Fourier transform
5.2 Lundberg bound
5.3 Pollaczek-Khinchin formula
5.4 Subexponential claims
9.3.1 Gamma-gamma Bayesian CL model
9.3.2 Over-dispersed Poisson model
9.4 Claims development result
10 Solvency Considerations
10.1 Balance sheet and solvency
10.2 Risk modules
10.3 Insurance liability variables
10.3.1 Market-consistent value
10.3.2 Insurance risk
1 Introduction

1.1 Nature of non-life insurance

1.1.1 Non-life insurance and the law of large numbers
Insurance originates from a general demand of our society that asks for protection against unforeseeable random events which might cause serious (financial) damage to individuals and society. Insurance then organizes the financial protection against such unforeseeable random events, meaning that it takes care of the financial replacement of the damage. The general idea is to build a community to which everybody contributes a certain amount (fixed deterministic premium) and then the (random) financial damages are financed by the savings of this community.

[Figure: the insurance contract relates the insured (policyholder) to the insurer (insurance company).]
The basic features of such communities are that every member faces similar risks
and by building such communities the individual members receive diversification
benefits in the form of a law of large numbers that applies to the community.
Insurance companies organize this equal balance within the community.
Modern insurance as known today is traced back to the Great Fire of London in 1666, which destroyed a big part of London. [Figure: Great Fire of London 1666] This event initiated fire insurance protection against such disastrous events. Today, fire insurance belongs to the branch of non-life insurance, which is also known as property and casualty insurance in the US and general insurance in the UK. Non-life insurance comprises car insurance, liability insurance, property insurance, accident and health insurance, marine insurance, credit insurance, legal protection insurance, travel insurance and other similar products. Insurance contracts for these types of products have in common that they specify an insurance period (typically one year); all (random) events that occur within this insurance period and cause financial damage to which the insurance contract applies are then compensated. Such random events to which insurance contracts apply are called insurance claims.
Typically, the insurance premium for these contracts is paid at the beginning of the insurance period (in advance). To determine this insurance premium, the insurance company pools similar risks.

The weak law of large numbers goes back to Jakob Bernoulli, whose path-breaking work Ars Conjectandi appeared in 1713, eight years after his death, see Bolthausen-Wüthrich [15].
For independent and identically distributed random variables Y_1, Y_2, ... with finite variance σ² < ∞ the weak law of large numbers can further be refined by Chebychev's inequality, which provides rates of convergence, and by the central limit theorem (CLT), which provides the asymptotic limit distribution. The CLT states under the above assumptions that we have the following convergence in distribution

\[
\frac{\sum_{i=1}^{n} Y_i - n\mu}{\sqrt{n}\,\sigma} \;\Rightarrow\; \mathcal{N}(0, 1) \qquad \text{as } n \to \infty. \tag{1.2}
\]

The crucial feature is that the denominator only increases of order √n, i.e. it increases at a slower rate than n. This exactly implies that the total claim amount of the portfolio becomes predictable in the limit because the relative confidence bounds get narrower the bigger the portfolio is. These are the basics of why insurance works. The CLT goes back to Abraham De Moivre (1667-1754), who published a first article on the CLT in 1733 based on coin tossing (way ahead of its time), and to Pierre-Simon Laplace (1749-1827), who provided an extension in 1812.
This risk is taken care of by the volume n of the insurance portfolio (as
described above). This implies that it can be controlled in a sufficient way if
the insurance portfolio is built in an appropriately large fashion.
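The effect of the portfolio volume on the relative confidence bounds can be sketched numerically. The following snippet (all portfolio sizes and claim moments are purely illustrative, using only the Python standard library) simulates total claim amounts for two portfolio sizes and shows that the relative spread decays like n^{-1/2}.

```python
import random
import statistics

random.seed(1)

def total_claim(n, mu=1000.0, sigma=300.0):
    # total claim amount of a portfolio of n i.i.d. risks
    # (Gaussian claim sizes chosen here purely for illustration)
    return sum(random.gauss(mu, sigma) for _ in range(n))

rel_spread = {}
for n in (100, 10_000):
    totals = [total_claim(n) for _ in range(200)]
    # coefficient of variation of the portfolio total, ~ (sigma/mu) / sqrt(n)
    rel_spread[n] = statistics.stdev(totals) / statistics.mean(totals)
    print(n, round(rel_spread[n], 4))
```

Increasing the portfolio size by a factor 100 shrinks the relative spread by roughly a factor 10, in line with the √n normalization in (1.2).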
(a) the model world does not provide an appropriate description of real world
behavior;
All these uncertainties ask for a risk loading (risk margin) beyond the pure risk
premium defined by µ = E[Yi ]. This will be described in detail below.
We close this section by describing the elements that are considered for the insurance premium calculation. The premium items are:

• pure risk premium µ = E[Y_i]
• risk margin to protect against the risks mentioned above
• profit margin
• minus: financial gains on investments
• state taxes

The sum of all these items specifies the insurance premium. Non-life insurance mathematics and statistics typically studies the first two items. This will be done in the subsequent chapters.
We work on a probability space (Ω, F, P) and assume throughout that this probability space is sufficiently rich so that it carries all the objects that we are going to consider. Random variables on this probability space (Ω, F, P) are denoted by capital letters X, Y, S, N, ... and the corresponding observations are denoted by x, y, s, n, .... That is, x constitutes a realization of X. Random vectors are denoted by boldface, e.g., X = (X_1, ..., X_d)' and the corresponding observation by x = (x_1, ..., x_d)' for a given dimension d ∈ N. Since there is broad similarity between random variables and random vectors, we restrict to random variables for introducing the crucial terminology from probability theory.
Random variables X are characterized by (probability) distribution functions F = F_X : R → [0, 1], meaning that for all x ∈ R we have F(x) = P[X ≤ x]. Two cases are of special interest: X is called discrete if it takes values in a countable set A with probability weights

\[
p_k = P[X = k] > 0 \qquad \text{for } k \in A,
\]

with \(\sum_{k \in A} p_k = 1\). We call p_k the probability weight of X in k ∈ A. X is called absolutely continuous if there exists a non-negative measurable function f with

\[
F(x) = \int_{-\infty}^{x} f(y)\, dy \qquad \text{for all } x \in \mathbb{R}.
\]

This function f is called the density of X and in that case we also use the terminology X ∼ f.
Assume X ∼ F and h : R → R is a sufficiently nice measurable function. We define the expected value of h(X) by

\[
E[h(X)] = \int_{\mathbb{R}} h(x)\, dF(x) =
\begin{cases}
\sum_{k \in A} h(k)\, p_k & \text{if } X \text{ is discrete}, \\
\int_{\mathbb{R}} h(x) f(x)\, dx & \text{if } X \text{ is absolutely continuous}.
\end{cases}
\]

The middle term uses the general framework of the Riemann-Stieltjes integral \(\int_{\mathbb{R}} h\, dF\). The "sufficiently nice" refers to the fact that E[h(X)] is only defined upon existence. The most important functions h in our analysis define the following quantities:
• k-th moment of X ∼ F

\[
E[X^k] = \int_{\mathbb{R}} x^k \, dF(x),
\]

• variance of X ∼ F

\[
\sigma_X^2 = \mathrm{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - E[X]^2,
\]

• skewness of X ∼ F

\[
\varsigma_X = \frac{E\big[(X - E[X])^3\big]}{\sigma_X^3},
\]

• moment generating function of X ∼ F at r ∈ R

\[
M_X(r) = E[\exp\{rX\}].
\]
The moment generating function will be crucial to identify the properties of random variables X. If we replace r by −r, i.e. if we consider M_X(−r), we obtain the Laplace-Stieltjes transform of the distribution function F at position r, which will be denoted by

\[
\widehat{m}_F(r) = M_X(-r).
\]
Lemma 1.1. Choose X ∼ F and assume that there exists r_0 > 0 such that M_X(r) < ∞ for all r ∈ (−r_0, r_0). Then M_X(r) has a power series expansion for r ∈ (−r_0, r_0) with

\[
M_X(r) = \sum_{k \ge 0} \frac{r^k}{k!}\, E[X^k].
\]

Proof. Note that it suffices to choose r ∈ (−r_0, r_0) with r ≠ 0. Since \(e^{|rx|} \le e^{-rx} + e^{rx}\), the assumptions imply integrability E[exp{|rX|}] < ∞. This implies that E[|X|^k] < ∞ for all k ∈ N because |x|^k is dominated by \(e^{|rx|}\) for sufficiently large |x|. It also implies that the partial sums \(|f_m(x)| = |\sum_{k=0}^{m} (rx)^k / k!|\) are uniformly bounded by the integrable (w.r.t. dF) function \(\sum_{k \ge 0} |rx|^k / k! = e^{|rx|}\). This allows to apply the dominated convergence theorem which provides

\[
\lim_{m \to \infty} \sum_{k=0}^{m} \frac{r^k}{k!}\, E[X^k]
= \lim_{m \to \infty} E[f_m(X)]
= E\Big[\lim_{m \to \infty} f_m(X)\Big] = M_X(r). \qquad \Box
\]
Lemma 1.1 implies that the power series converges for all r ∈ (−r_0, r_0) for given r_0 > 0 and, thus, we have a strictly positive radius of convergence ρ_0 > 0. A standard result from analysis implies that in the interior of the interval [−ρ_0, ρ_0] we can differentiate M_X(·) arbitrarily often (term by term of the power series) and the derivatives at the origin are given by

\[
\frac{d^k}{dr^k} M_X(r)\Big|_{r=0} = E[X^k] < \infty \qquad \text{for } k \in \mathbb{N}_0. \tag{1.3}
\]

Lemma 1.2. Choose a random variable X ∼ F and assume that there exists r_0 > 0 such that M_X(r) < ∞ for all r ∈ (−r_0, r_0). Then the distribution function F of X is completely determined by its moment generating function M_X.

Proof. The existence of a strictly positive radius of convergence ρ_0 implies that all moments of X exist and that they are directly determined by the moment generating function via (1.3). Theorem 30.1 of Billingsley [13] then implies that there is at most one distribution function F which has the same moments (1.3) for all k ∈ N. □
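Lemma 1.1 can be illustrated numerically. The following sketch uses the standard Gaussian, where both sides of the power series expansion are known in closed form: M_X(r) = e^{r²/2}, the odd moments vanish, and E[X^{2k}] = (2k−1)!!.

```python
import math

def mgf_normal(r):
    # moment generating function of N(0,1): M_X(r) = exp(r^2 / 2)
    return math.exp(r * r / 2.0)

def moment_normal(k):
    # E[X^k] for X ~ N(0,1): 0 for odd k, double factorial (k-1)!! for even k
    if k % 2 == 1:
        return 0.0
    return math.prod(range(1, k, 2)) if k > 0 else 1.0

def mgf_series(r, terms=40):
    # truncated power series sum_{k>=0} r^k / k! * E[X^k] from Lemma 1.1
    return sum(r**k / math.factorial(k) * moment_normal(k) for k in range(terms))

r = 0.7
print(mgf_normal(r), mgf_series(r))  # the two values agree
```

The agreement to machine precision reflects the strictly positive radius of convergence (here ρ_0 = ∞).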
For one-sided random variables the statement even holds true in general:

Proof. See Section 22 of Billingsley [13], in particular Theorem 22.2. □

In particular, we have the implication

\[
M_X \equiv M_Y \quad \Longrightarrow \quad X \stackrel{(d)}{=} Y.
\]
Lemma 1.4. Assume that the random variables X_n, n ∈ N, and X have finite moment generating functions M_{X_n}, n ∈ N, and M_X on a common interval (−r_0, r_0) with r_0 > 0. Suppose lim_{n→∞} M_{X_n}(r) = M_X(r) for all r ∈ (−r_0, r_0). Then (X_n)_n converges in distribution to X, write X_n ⇒ X for n → ∞.

Proof. See Section 30 of Billingsley [13]. Basically, Chebychev's inequality implies tightness of the underlying probability measures from which the convergence in distribution is derived. □
\[
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2} \right\}.
\]

\[
\mu_X = E[X] = \frac{d}{dr} M_X(r)\Big|_{r=0}
= \exp\left\{ r\mu + \frac{1}{2} r^2 \sigma^2 \right\} \left( \mu + r\sigma^2 \right) \Big|_{r=0} = \mu,
\]

Moreover, any random variable Y that has moment generating function of the form (1.4) is Gaussian with mean µ_Y = µ and variance σ_Y² = σ², see Lemma 1.2.
Exercise 1 (Gaussian distribution).

(a) Assume X ∼ N(0, 1). Prove that a + bX ∼ N(a, b²) for a, b ∈ R.

(b) Assume that the X_i are independent with X_i ∼ N(µ_i, σ_i²). Prove that \(\sum_i X_i \sim \mathcal{N}(\sum_i \mu_i, \sum_i \sigma_i^2)\).

(c) Assume X ∼ N(0, 1). Prove that E[X^{2k+1}] = 0 for all k ∈ N_0.

Lemma 1.6. Assume that M_X is finite on (−r_0, r_0) with r_0 > 0. Then log M_X(·) is a convex function on (−r_0, r_0).
Proof. In order to prove convexity we calculate the second derivative at r ∈ (−r_0, r_0):

\[
\frac{d^2}{dr^2} \log M_X(r)
= \frac{M_X''(r)\, M_X(r) - (M_X'(r))^2}{(M_X(r))^2}
= \frac{M_X''(r)}{M_X(r)} - \left( \frac{M_X'(r)}{M_X(r)} \right)^2
= \frac{E[X^2 e^{rX}]}{E[e^{rX}]} - \left( \frac{E[X e^{rX}]}{E[e^{rX}]} \right)^2.
\]

Define X_r ∼ F_r with distribution function

\[
F_r(x) = \frac{1}{M_X(r)} \int_{-\infty}^{x} e^{ry} \, dF(y). \tag{1.6}
\]

Then

\[
0 \le \mathrm{Var}(X_r) = E[X_r^2] - E[X_r]^2
= \frac{E[X^2 e^{rX}]}{E[e^{rX}]} - \left( \frac{E[X e^{rX}]}{E[e^{rX}]} \right)^2
= \frac{d^2}{dr^2} \log M_X(r). \qquad \Box
\]

Remark. The distribution function F_r defined in (1.6) gives the Esscher measure of F. The Esscher measure has been introduced by Bühlmann [19] for a new premium calculation principle. Bühlmann has called it the Esscher pricing principle because of its formal connection to the Esscher transform. We come back to this in Section 6.2.2.
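The identity Var(X_r) = d²/dr² log M_X(r) can be checked numerically. A small sketch (rates chosen purely for illustration) for X ∼ Exp(λ): the Esscher-tilted law F_r has density proportional to e^{rx} λe^{−λx}, i.e. it is again exponential with rate λ − r for r < λ.

```python
import math

lam, r = 2.0, 0.5   # illustrative rates; the tilt needs r < lam

def log_mgf(s):
    # log moment generating function of Exp(lam): log(lam / (lam - s)), s < lam
    return math.log(lam / (lam - s))

# numerical second derivative of log M_X at r (central differences)
h = 1e-4
d2 = (log_mgf(r + h) - 2 * log_mgf(r) + log_mgf(r - h)) / h**2

# the Esscher measure F_r of Exp(lam) is Exp(lam - r), with variance 1/(lam-r)^2
var_tilted = 1.0 / (lam - r) ** 2

print(d2, var_tilted)  # the two values agree up to discretization error
```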
The next formula will often be used: assume that X ∼ F is non-negative, P-a.s., and has finite first moment. Then we have the identity

\[
E[X] = \int_0^\infty x \, dF(x) = \int_0^\infty [1 - F(x)] \, dx = \int_0^\infty P[X > x] \, dx.
\]

The proof goes by integration by parts and the result says that we can calculate expected values from the survival function F̄(x) = 1 − F(x) = P[X > x].
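A quick numerical check of this identity for an Exp(1) random variable, where E[X] = 1 and the survival function is e^{−x} (integration bounds and grid are illustrative):

```python
import math

def survival_exp(x, lam=1.0):
    # survival function of an Exp(lam) random variable: P[X > x] = exp(-lam * x)
    return math.exp(-lam * x)

# midpoint-rule integration of the survival function on [0, 50];
# the tail mass beyond 50 is negligible for lam = 1
n, upper = 200_000, 50.0
dx = upper / n
integral = sum(survival_exp((i + 0.5) * dx) * dx for i in range(n))

print(integral)  # close to E[X] = 1 / lam = 1
```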
Another property that is going to be used quite frequently is the so-called tower property, see Williams [85]. It states that for any sub-σ-algebra G ⊂ F on our probability space (Ω, F, P) we have, for any integrable random variable X ∼ F,

\[
E[X] = E\big[E[X \mid \mathcal{G}]\big]. \tag{1.7}
\]
w)
non-decreasing. This allows to define the left-continuous generalized inverse of F
by
F ← (p) = inf {x; F (x) ≥ p} ,
where we use the convention that inf ∅ = ∞. For p ∈ (0, 1), F ← (p) is often called
(m
the p-quantile of X ∼ F . The generalized inverse F ← is only tricky at the places
where F has an discontinuity or where F is not strictly increasing. It satisfies the
following properties, see Proposition A3 in McNeil et al. [68],
5. F ← (F (x)) ≤ x.
6. F (F ← (z)) ≥ z.
Items 4. to 8. need that F ← (z) < ∞. Note that the first part of item 4. is put in
brackets because distribution functions are right-continuous. However, generalized
inverses can also be defined for functions that are not right-continuous (as long as
they are non-decreasing) and then the condition in the bracket of item 4. is needed.
In practice, the model quantities are estimated from data and expert opinion. For example, we can estimate the (unknown) mean µ_X of X by an estimator µ̂_X. If we now choose the predictor X̂ = µ̂_X for predicting X, then µ̂_X serves at the same time as estimator for µ_X and as predictor for X. In this sense we obtain an estimation error which is specified by the difference µ_X − µ̂_X and we obtain a prediction error which is characterized by the following difference

\[
X - \widehat{X} = X - \widehat{\mu}_X = (X - \mu_X) + (\mu_X - \widehat{\mu}_X). \tag{1.8}
\]

The second term on the right-hand side of (1.8) again specifies the estimation error and the first term on the right-hand side of (1.8) is often called pure process error, which is due to the stochastic nature of X, see also Section 9.3.
Statistical tests deal with the problem of making decisions. Assume we have an observation x of a random vector X ∼ F_θ with given but unknown parameter θ which lies in a given set Θ of possible parameters. The aim is to test whether the (true, unknown) parameter θ that has generated x may belong to some subset Θ_0 ⊂ Θ. In the simplest case we have a singleton Θ_0 = {θ_0}. Assume that we would like to check whether x may have been generated by a given parameter θ_0:

• Null hypothesis H_0: θ = θ_0.

We then build a test statistic T(X) whose distribution function is known under the null hypothesis H_0 and we consider the question whether T(x) takes an unlikely value under the null hypothesis. This means that one chooses a significance level q ∈ (0, 1) (typically 5% or 1%) which provides a critical region C_q with P[T(X) ∈ C_q] ≤ q. The null hypothesis is then rejected if T(x) falls into this critical region. In practice, one often calculates the so-called p-value. This denotes the critical probability at which the null hypothesis is just rejected.
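The rejection rule can be sketched for a hypothetical two-sided z-test, where the test statistic is standard normal under H_0 and the p-value follows from the Gaussian distribution function:

```python
import math

def p_value_two_sided_z(t):
    # two-sided p-value for a standard normal test statistic T(X) = t:
    # p = P[|Z| >= |t|] = erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2.0))

# reject H_0 at significance level q if the p-value falls below q
t_obs = 2.3            # hypothetical observed test statistic
q = 0.05
p = p_value_two_sided_z(t_obs)
print(round(p, 4), p < q)  # small p-value -> reject H_0 at the 5% level
```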
(a) Prove that f is a density. Hint: see Section 3.2.1 and the proof of Proposition 2.20.

(b) Prove that

\[
M_{X_k}(r) = (1 - 2r)^{-k/2} \qquad \text{for } r < 1/2.
\]

(c) Choose Z ∼ N(0, 1) and prove that \(Z^2 \stackrel{(d)}{=} X_1\).

(d) Choose Z_1, ..., Z_k i.i.d. ∼ N(0, 1). Prove that \(\sum_{i=1}^{k} Z_i^2 \stackrel{(d)}{=} X_k\) and calculate the first two moments of the latter.
2 Collective Risk Model

The aim of this chapter is to describe the probability distribution of the total claim amount S that an insurance company faces within a fixed time period. For the time period we take one (accounting) year. Assume that N counts all claims that occur within this fixed accounting year. The total claim amount is then given by

\[
S = Y_1 + Y_2 + \ldots + Y_N = \sum_{i=1}^{N} Y_i,
\]

where Y_1, ..., Y_N model the individual claim sizes. If we are at the beginning of this accounting year then neither the number of claims N nor the individual claim sizes Y_1, ..., Y_N are known. Therefore, we model all these unknowns with random variables that describe the possible outcomes of the total claim amount S (which, of course, then is also a random variable). We call such models for S collective risk models because we consider the whole portfolio as a collective. The hope is to discover a law of large numbers for the total insurance portfolio claim so that the insurance company can benefit from diversification benefits (between individual risks) that allow to predict possible outcomes of S (more) accurately.
Remarks.

• The first assumption of the compound distribution says that the number of claims N takes only non-negative integer values. The event {N = 0} means that no claim occurs, which provides a total claim amount of S = 0.

• The second assumption means that the individual claims Y_i do not affect each other, i.e. if we face a large first claim Y_1 this does not give any information for the remaining claims Y_i, i ≥ 2. Moreover, we have homogeneity meaning that all claims have the same marginal distribution function G with

\[
0 = G(0) = P[Y_1 \le 0],
\]

i.e. the individual claim sizes Y_i are strictly positive, P-a.s. We use the terminology (individual) claim size, (individual) claim and claims severity synonymously for Y_i.

• Finally, the last assumption says that the individual claim sizes are not affected by the number of claims and vice versa; for instance, if we observe many claims this does not contain any information whether these claims are of smaller or larger size.

This compound distribution is the base model for collective risk modeling and we are going to describe different choices for the claims count distribution of N and for the individual claim size distributions of Y_i. We start with the basic recognition features of compound distributions.
Proof. Using the tower property (1.7) we obtain for the mean of S

\[
E[S] = E\left[\sum_{i=1}^{N} Y_i\right]
= E\left[E\left[\sum_{i=1}^{N} Y_i \,\middle|\, N\right]\right]
= E\left[\sum_{i=1}^{N} E[Y_i]\right]
= E[N\, E[Y_1]] = E[N]\, E[Y_1].
\]

For the variance we use the variance decomposition and obtain

\[
\mathrm{Var}(S)
= \mathrm{Var}\left( E\left[\sum_{i=1}^{N} Y_i \,\middle|\, N\right] \right)
+ E\left[ \mathrm{Var}\left( \sum_{i=1}^{N} Y_i \,\middle|\, N \right) \right]
= \mathrm{Var}\left( \sum_{i=1}^{N} E[Y_i] \right)
+ E\left[ \sum_{i=1}^{N} \mathrm{Var}(Y_i) \right]
= \mathrm{Var}(N)\, E[Y_1]^2 + E[N]\, \mathrm{Var}(Y_1).
\]

Finally, for the moment generating function we have

\[
M_S(r) = E\left[\exp\left\{ r \sum_{i=1}^{N} Y_i \right\}\right]
= E\left[ E\left[ \prod_{i=1}^{N} \exp\{r Y_i\} \,\middle|\, N \right] \right]
= E\left[ \prod_{i=1}^{N} E[\exp\{r Y_i\} \mid N] \right]
= E\big[ M_{Y_1}(r)^N \big]
= E[\exp\{N \log(M_{Y_1}(r))\}]
= M_N(\log(M_{Y_1}(r))).
\]
Moreover, for the distribution function of S we have

\[
P[S \le x]
= \sum_{k \in A} P\left[ \sum_{i=1}^{k} Y_i \le x \right] P[N = k]
= \sum_{k \in A} G^{*k}(x)\, P[N = k], \tag{2.1}
\]

where G^{*k} denotes the k-th convolution of the distribution function G. In particular, we have for Y_1, Y_2 i.i.d. ∼ G

\[
P[Y_1 + Y_2 \le x] = \int G(x - y) \, dG(y) = G^{*2}(x).
\]

With formula (2.1) we obtain a closed form solution for the distribution function of S. However, in general, this formula is not useful due to the computational complexity of calculating G^{*k} for too many k ∈ A. We present other solutions for the calculation of the distribution function of S. These involve simulations, approximations and smart analytic techniques under additional model assumptions.
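When the claim sizes take only finitely many lattice values, formula (2.1) can be evaluated exactly by discrete convolution. The following sketch uses purely illustrative probability weights for N and for G, and also confirms the identity E[S] = E[N] E[Y_1] derived above.

```python
# hypothetical example: claim count weights P[N = k] and a three-point law G
pn = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
g = {1: 0.5, 2: 0.3, 3: 0.2}

def convolve(a, b):
    # discrete convolution of two probability-weight dictionaries
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

def compound_weights():
    # P[S = s] = sum_k G^{*k}({s}) P[N = k], i.e. formula (2.1)
    weights = {}
    gk = {0: 1.0}                      # G^{*0} = unit mass at 0
    for k in sorted(pn):
        if k > 0:
            gk = convolve(gk, g)       # G^{*k} built from G^{*(k-1)}
        for s, p in gk.items():
            weights[s] = weights.get(s, 0.0) + p * pn[k]
    return weights

w = compound_weights()
mean_S = sum(s * p for s, p in w.items())
mean_N = sum(k * p for k, p in pn.items())   # = 1.0
mean_Y = sum(y * p for y, p in g.items())    # = 1.7
print(mean_S, mean_N * mean_Y)               # E[S] = E[N] E[Y_1]
```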
The volume measure depends on the line of business and the insurance policy, and one should always choose the most appropriate one. Typical volume measures are: number of insured, number of policies, number of risks. But in health and accident insurance it could also be the aggregated wages insured, or in fire insurance the total value of the buildings insured. To make language simple, we assume that v > 0 denotes the number of risks insured and N counts the number of risks that have a claim. The ratio N/v is called claims frequency and the expected number of claims is given by

\[
E[N] = \lambda v,
\]

where λ > 0 denotes the expected claims frequency. Under these assumptions we would like to describe the probability weights

\[
p_k = P[N = k] \qquad \text{for } k \in A \subset \mathbb{N}_0.
\]

Binomial distribution. Here the probability weights are given by

\[
p_k = P[N = k] = \binom{v}{k} p^k (1 - p)^{v - k} \qquad \text{for all } k \in \{0, \ldots, v\} = A.
\]

The binomial formula provides \(\sum_{k \in A} p_k = 1\), see e.g. Section 5.3 in Merz-Wüthrich [70], and, hence, we have a discrete distribution function on the set A = {0, ..., v}. The special case v = 1 is called Bernoulli distribution or Bernoulli experiment, write N ∼ Bernoulli(p), and reflects the coin tossing experiment

\[
P[N = k] =
\begin{cases}
1 - p & \text{for } k = 0, \\
p & \text{for } k = 1.
\end{cases}
\]
Proposition 2.3. Assume N ∼ Binom(v, p) for fixed v ∈ N and p ∈ (0, 1). Then

\[
E[N] = vp, \qquad \mathrm{Var}(N) = vp(1 - p), \qquad \mathrm{Vco}(N) = \sqrt{\frac{1 - p}{vp}},
\]
\[
M_N(r) = (p e^r + (1 - p))^v \qquad \text{for all } r \in \mathbb{R}.
\]

Proof. We calculate the moment generating function; the first two moments then follow from formula (1.5). For the moment generating function we have

\[
M_N(r) = \sum_{k \in A} e^{rk} \binom{v}{k} p^k (1 - p)^{v - k}
= \sum_{k \in A} \binom{v}{k} (p e^r)^k (1 - p)^{v - k}
= (p e^r + (1 - p))^v \sum_{k \in A} \binom{v}{k} \left( \frac{p e^r}{p e^r + (1 - p)} \right)^{k} \left( \frac{1 - p}{p e^r + (1 - p)} \right)^{v - k}.
\]

The last sum is again a summation over probability weights p*_k, k ∈ A, of a binomial distribution with default probability p* = (p e^r)/(p e^r + (1 − p)) ∈ (0, 1). Therefore it adds up to 1, which completes the proof. □
Proof. In view of Lemma 1.3 it suffices to prove that N and \(X = \sum_{i=1}^{v} X_i\) have the same moment generating function. The moment generating function of the latter is given by

\[
M_X(r) = E\big[e^{rX}\big]
= E\left[ \prod_{i=1}^{v} e^{r X_i} \right]
= \prod_{i=1}^{v} E\big[e^{r X_i}\big]
= \prod_{i=1}^{v} (p e^r + (1 - p)) = M_N(r). \qquad \Box
\]

Remarks. The corollary states that N describes the number of defaults within a portfolio of fixed size v ∈ N. Every risk in this portfolio has the same default probability p and defaults between different risks do not influence each other (they are independent). Thus, if N has a binomial distribution then every risk in such a portfolio can at most default once. This is the case, for instance, for life insurance policies where an insured can die at most once. In non-life insurance this distribution is less commonly used because for typical non-life insurance policies we can have more than one claim within a fixed time interval, e.g., a car insurance policy can suffer two or more accidents within the same accounting year. Therefore, the binomial distribution is only of marginal interest in non-life insurance modeling.
Definition 2.5 (compound binomial model). The total claim amount S has a compound binomial distribution, write

\[
S \sim \mathrm{CompBinom}(v, p, G),
\]

if S has a compound distribution with N ∼ Binom(v, p) for given v ∈ N, p ∈ (0, 1) and individual claim size distribution G. In this case we have

\[
E[S] = vp\, E[Y_1], \qquad
\mathrm{Var}(S) = vp\left(E[Y_1^2] - p\, E[Y_1]^2\right),
\]
\[
\mathrm{Vco}(S) = \sqrt{\frac{1}{vp}\left(1 - p + \mathrm{Vco}(Y_1)^2\right)}, \qquad
M_S(r) = \left(p\, M_{Y_1}(r) + (1 - p)\right)^v \quad \text{for } r \in \mathbb{R}.
\]

Remark. The coefficient of variation Vco(S) is a measure for the degree of diversification within the portfolio. If S has a compound binomial distribution with fixed default probability p and fixed claim size distribution G having finite second moment, then the coefficient of variation converges to zero of order v^{−1/2} as the portfolio size v increases.
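The v^{−1/2} decay of Vco(S) can be sketched by Monte Carlo simulation. All parameters below (default probability, portfolio sizes, exponential claim sizes with Vco(Y_1) = 1) are purely illustrative.

```python
import math
import random

random.seed(7)

def vco_compound_binomial(v, p=0.1, mu=10.0, n_sim=4000):
    # Monte Carlo coefficient of variation of S ~ CompBinom(v, p, G),
    # with illustrative exponential claim sizes: E[Y_1] = mu, Vco(Y_1) = 1
    totals = []
    for _ in range(n_sim):
        n_claims = sum(random.random() < p for _ in range(v))
        totals.append(sum(random.expovariate(1.0 / mu) for _ in range(n_claims)))
    m = sum(totals) / n_sim
    var = sum((t - m) ** 2 for t in totals) / (n_sim - 1)
    return math.sqrt(var) / m

def vco_formula(v, p=0.1):
    # closed form with Vco(Y_1) = 1: Vco(S) = sqrt((1 - p + 1) / (v p))
    return math.sqrt((1 - p + 1.0) / (v * p))

results = {v: (vco_compound_binomial(v), vco_formula(v)) for v in (50, 200)}
for v, (mc, th) in results.items():
    print(v, round(mc, 3), round(th, 3))  # simulation vs. closed form
```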
Proof. Exercise. □

Poisson distribution. Here the probability weights are given by

\[
p_k = P[N = k] = e^{-\lambda v} \frac{(\lambda v)^k}{k!} \qquad \text{for all } k \in A = \mathbb{N}_0.
\]
In complete analogy to the binomial case we could define the parameter c = λv > 0 and then work solely with c. This is the way the Poisson distribution is typically treated in the literature. However, we would like to keep the separation of c into λ and v because we would like to have the frequency interpretation for λ, which allows for the study of diversification benefits. This is exactly the statement of the next proposition.
Proof. We calculate the moment generating function; the first two moments then follow from formula (1.5). Using the power series expansion of the exponential function we have

\[
M_N(r) = \sum_{k \ge 0} e^{rk}\, e^{-\lambda v} \frac{(\lambda v)^k}{k!}
= e^{-\lambda v} \sum_{k \ge 0} \frac{(\lambda v e^r)^k}{k!}
= \exp\{-\lambda v + \lambda v e^r\}. \qquad \Box
\]
Proposition 2.8 provides the interpretation of the parameter λ. For given volume v, the expected claims frequency is

\[
E\left[\frac{N}{v}\right] = \lambda.
\]

Moreover, for the coefficient of variation of the claims frequency N/v we obtain

\[
\mathrm{Vco}\left(\frac{N}{v}\right) = (\lambda v)^{-1/2} \to 0 \qquad \text{for } v \to \infty. \tag{2.2}
\]

Next we give a constructive characterization of the Poisson distribution.
Proof. In view of Lemma 1.4 we need to prove that the moment generating functions of N_v have the appropriate convergence property. We have

\[
M_{N_v}(r) = (p e^r + (1 - p))^v
= \left[ \left(1 + p(v)\,(e^r - 1)\right)^{1/p(v)} \right]^{v p(v)}.
\]

Note that p(v) → 0 as v → ∞. If we apply this limit to the inner bracket \((1 + p(v)(e^r - 1))^{1/p(v)}\) we exactly obtain the limit definition of the exponential function exp{e^r − 1}, see Definition 14.30 in Merz-Wüthrich [70]. This together with the fact that vp(v) → c > 0 as v → ∞ provides the proof. □

This result says that the binomial distribution can be approximated by a Poisson distribution if the default probability p is very small compared to the portfolio size v.
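The Poisson approximation can be illustrated by computing the total variation distance between Binom(v, c/v) and Poi(c) for growing v; the value c = 2 is chosen purely for illustration. Probabilities are evaluated via logarithms to stay numerically stable for large v.

```python
import math

def binom_pmf(k, v, p):
    # binomial probability weight, evaluated via log-gamma to avoid overflow
    log_pmf = (math.lgamma(v + 1) - math.lgamma(k + 1) - math.lgamma(v - k + 1)
               + k * math.log(p) + (v - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k, c):
    return math.exp(-c + k * math.log(c) - math.lgamma(k + 1))

def total_variation(v, c=2.0):
    # total variation distance between Binom(v, c/v) and Poi(c)
    p = c / v
    kmax = max(v, 50)
    s = 0.0
    for k in range(kmax + 1):
        b = binom_pmf(k, v, p) if k <= v else 0.0
        s += abs(b - poisson_pmf(k, c))
    return 0.5 * s

tv = {v: total_variation(v) for v in (10, 100, 1000)}
for v, d in tv.items():
    print(v, round(d, 5))  # the distance shrinks as v grows
```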
Definition 2.10 (compound Poisson model). The total claim amount S has a compound Poisson distribution, write

\[
S \sim \mathrm{CompPoi}(\lambda v, G),
\]

if S has a compound distribution with N ∼ Poi(λv) for given λ, v > 0 and individual claim size distribution G.

Proposition 2.11. Assume S ∼ CompPoi(λv, G). We have

\[
E[S] = \lambda v\, E[Y_1], \qquad
\mathrm{Var}(S) = \lambda v\, E[Y_1^2], \qquad
\mathrm{Vco}(S) = \sqrt{\frac{1}{\lambda v}\left(1 + \mathrm{Vco}(Y_1)^2\right)},
\]
\[
M_S(r) = \exp\{\lambda v (M_{Y_1}(r) - 1)\} \qquad \text{for } r \in \mathbb{R}.
\]
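The moment formulas of Proposition 2.11 can be checked by Monte Carlo simulation. The sketch below uses hypothetical parameters λv = 5 and exponential claim sizes with mean 100, so that E[S] = 5 · 100 = 500 and Var(S) = λv E[Y_1²] = 5 · 2 · 100² = 100000.

```python
import math
import random

random.seed(3)

lam_v, mu = 5.0, 100.0   # illustrative: N ~ Poi(5), exponential claims, mean 100

def poisson(c):
    # simple multiplicative Poisson sampler (adequate for small c)
    l, k, p = math.exp(-c), 0, 1.0
    while True:
        p *= random.random()
        if p <= l:
            return k
        k += 1

def sample_S():
    # one realization of S = sum_{i=1}^N Y_i
    return sum(random.expovariate(1.0 / mu) for _ in range(poisson(lam_v)))

n_sim = 20_000
sims = [sample_S() for _ in range(n_sim)]
mean_S = sum(sims) / n_sim
var_S = sum((s - mean_S) ** 2 for s in sims) / (n_sim - 1)

# Proposition 2.11: E[S] = 500 and Var(S) = lam*v*E[Y_1^2] = 100000
print(round(mean_S, 1), round(var_S, 0))
```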
The compound Poisson distribution has the so-called aggregation property and the
disjoint decomposition property. These are two extremely beautiful and useful
properties which explain part of the popularity of the compound Poisson model.
We first state and prove these two properties and then we give interpretations in
the light of non-life insurance portfolio modeling.
with

\[
v = \sum_{j=1}^{n} v_j, \qquad
\lambda = \sum_{j=1}^{n} \frac{v_j}{v}\,\lambda_j \qquad \text{and} \qquad
G = \sum_{j=1}^{n} \frac{\lambda_j v_j}{\lambda v}\, G_j.
\]

Proof. We have assumed that G_j(0) = 0 for all j = 1, ..., n, which implies that S ≥ 0, P-a.s. From Lemma 1.3 it follows that we only need to identify the moment generating function of S in order to prove that it is compound Poisson distributed. Observe that M_S(r) exists at least for r ≤ 0. Thus, we calculate (using the independence of the S_j's)

\[
M_S(r) = E\left[\exp\left\{r \sum_{j=1}^{n} S_j\right\}\right]
= E\left[\prod_{j=1}^{n} \exp\{r S_j\}\right]
= \prod_{j=1}^{n} E[\exp\{r S_j\}]
\]
\[
= \prod_{j=1}^{n} \exp\left\{\lambda_j v_j \left(M_{Y_1^{(j)}}(r) - 1\right)\right\}
= \exp\left\{\lambda v \left(\sum_{j=1}^{n} \frac{\lambda_j v_j}{\lambda v}\, M_{Y_1^{(j)}}(r) - 1\right)\right\},
\]

where we have assumed Y_1^{(j)} ∼ G_j. This is a compound Poisson distribution with expected number of claims λv and claim size distribution G obtained from the moment generating function \(\sum_{j=1}^{n} \frac{\lambda_j v_j}{\lambda v} M_{Y_1^{(j)}}(r)\): note that \(G = \sum_{j=1}^{n} \frac{\lambda_j v_j}{\lambda v} G_j\) is a distribution function (non-decreasing, right-continuous, lim_{x→−∞} G(x) = 0 and lim_{x→∞} G(x) = 1). We choose Y ∼ G and obtain

\[
M_Y(r) = \int_0^\infty e^{ry}\, dG(y)
= \int_0^\infty e^{ry}\, d\left(\sum_{j=1}^{n} \frac{\lambda_j v_j}{\lambda v} G_j\right)(y)
= \sum_{j=1}^{n} \frac{\lambda_j v_j}{\lambda v} \int_0^\infty e^{ry}\, dG_j(y)
= \sum_{j=1}^{n} \frac{\lambda_j v_j}{\lambda v}\, M_{Y_1^{(j)}}(r).
\]

Using Lemma 1.3 once more for these claim sizes proves the theorem. □
Consider an indicator I with

\[
P[I = j] = p_j^+ \qquad \text{for all } j \in \{1, \ldots, m\}. \tag{2.3}
\]

This allows to extend the compound Poisson model from Definition 2.10.

Definition 2.13 (extended compound Poisson model). The total claim amount \(S = \sum_{i=1}^{N} Y_i\) has a compound Poisson distribution as defined in Definition 2.10. In addition, we assume that (Y_i, I_i)_{i≥1} are i.i.d. and independent of N, with Y_i having marginal distribution function G with G(0) = 0 and I_i having marginal distribution function given by (2.3).

Remark. Note that Definition 2.13 gives a well-defined extension, i.e. it fully respects the assumptions made in Definition 2.10 because (Y_i, I_i)_{i≥1} are i.i.d. and independent of N with Y_i having the appropriate marginal distribution function G. Observe that we do not specify the dependence structure between Y_i and I_i. If we choose m = 1 in (2.3) we are back in the classical compound Poisson model. Therefore, the next theorem especially applies to the compound Poisson model.

Choose mutually disjoint measurable sets A_1, ..., A_n with

\[
\bigcup_{k=1}^{n} A_k = \mathbb{R}_+ \times \{1, \ldots, m\}, \tag{2.4}
\]

and set p^{(k)} = P[(Y_1, I_1) ∈ A_k]. Note that \(\sum_{k=1}^{n} p^{(k)} = 1\), due to (2.4) and the mutual disjointness. Define

\[
\lambda_k v_k = \lambda v\, p^{(k)} > 0 \qquad \text{and} \qquad G_k(y) = P[Y_1 \le y \mid (Y_1, I_1) \in A_k].
\]
Proof of Theorem 2.14. We prove the theorem using the multivariate version of the moment generating function. Choose r = (r_1, ..., r_n)' ∈ R^n. The multivariate moment generating function of S = (S_1, ..., S_n)' is given by

\[
M_{\mathbf{S}}(\mathbf{r}) = E[\exp\{\mathbf{r}'\mathbf{S}\}]
= E\left[\exp\left\{\sum_{k=1}^{n} r_k S_k\right\}\right]
= E\left[\exp\left\{\sum_{k=1}^{n} r_k \sum_{i=1}^{N} Y_i 1_{\{(Y_i, I_i) \in A_k\}}\right\}\right]
\]
\[
= E\left[\prod_{i=1}^{N} E\left[\exp\left\{\sum_{k=1}^{n} r_k Y_i 1_{\{(Y_i, I_i) \in A_k\}}\right\} \,\middle|\, N\right]\right]
= E\left[\prod_{i=1}^{N} E\left[\exp\left\{\sum_{k=1}^{n} r_k Y_i 1_{\{(Y_i, I_i) \in A_k\}}\right\}\right]\right].
\]

Decomposing the inner expectation over the disjoint sets A_l gives

\[
E\left[\exp\left\{\sum_{k=1}^{n} r_k Y_1 1_{\{(Y_1, I_1) \in A_k\}}\right\}\right]
= \sum_{l=1}^{n} p^{(l)}\, M_{Y_1^{(l)}}(r_l),
\]

where we assume Y_1^{(l)} ∼ G_l. Collecting all terms we obtain

\[
M_{\mathbf{S}}(\mathbf{r})
= E\left[\left(\sum_{l=1}^{n} p^{(l)} M_{Y_1^{(l)}}(r_l)\right)^{N}\right]
= E\left[\exp\left\{N \log\left(\sum_{l=1}^{n} p^{(l)} M_{Y_1^{(l)}}(r_l)\right)\right\}\right]
\]
\[
= \exp\left\{\lambda v\left(\sum_{l=1}^{n} p^{(l)} M_{Y_1^{(l)}}(r_l) - 1\right)\right\}
= \prod_{l=1}^{n} \exp\left\{\lambda v p^{(l)}\left(M_{Y_1^{(l)}}(r_l) - 1\right)\right\}
= \prod_{l=1}^{n} M_{S_l}(r_l).
\]

This proves the theorem because we have obtained a product (i.e. independence) of moment generating functions of compound Poisson distributed random variables S_l, l = 1, ..., n. □
• The disjoint decomposition property implies that we can also follow a top-down ↓ modeling approach. Thus, we model the overall portfolio by a compound Poisson distribution and by the disjoint decomposition property we can easily allocate the total claim to sub-portfolios. The crucial result here is, at first sight surprising, that this allocation results in independent compound Poisson distributions for S_j. This property does not hold true for other compound distributions because it essentially uses the independent space and time decoupling property of Poisson point processes, see also Section 3.3.2 in Mikosch [71].

• For I we have chosen a finite (discrete) indicator. Of course, this model can easily be extended to other indicators. The crucial property is the i.i.d. assumption on (Y_i, I_i)_{i≥1} together with the independence from N.

The choice of the appropriate volume on the sub-portfolios depends on the choice of the indicator I. If m = 1, i.e. if we only consider one portfolio, but we apply a disjoint decomposition of the claim sizes into layers, then it is natural to set v_k = v and λ_k = λ p^{(k)} for k = 1, ..., n. That is, the volume v > 0 remains constant but the expected claims frequencies λ_k change according to A_k. This is also called thinning of the Poisson point process.

The second extreme case is m = n > 1 and the disjoint decomposition is given by the claims types, i.e. A_k = R_+ × {k}.
w)
Assume that S ∼ CompPoi(λv, G). We define the total claim Ssc in the small
claims layer and the total claim Slc in the large claims layer by
\[
S_{\mathrm{sc}} = \sum_{i=1}^N Y_i\, 1_{\{Y_i \le M\}}
\qquad\text{and}\qquad
S_{\mathrm{lc}} = \sum_{i=1}^N Y_i\, 1_{\{Y_i > M\}}.
\]
Theorem 2.14 implies that Ssc and Slc are independent and compound Poisson
distributed with
Ssc ∼ CompPoi (λsc v = λG(M )v , Gsc (y) = P [Y1 ≤ y|Y1 ≤ M ]) ,
and
Slc ∼ CompPoi (λlc v = λ(1 − G(M ))v , Glc (y) = P [Y1 ≤ y|Y1 > M ]) .
In particular, this means that we can model the small and the large claims layers
completely separately and then obtain the total claim distribution by a simple
convolution of the two resulting distribution functions (due to independence), see
Example 4.11, below.
For the large claims layer we need to determine the expected large claims frequency λlc. The individual claim sizes Y1|{Y1>M} are often modeled with a Pareto distribution with threshold M, see Sections 3.2.5 and 3.4.1.
The small claims layer is often approximated by a parametric distribution function: we have seen in (2.1) that evaluating compound distributions becomes rather time consuming when the expected number of claims λsc v is large. Therefore, one
typically assumes that the expected number of claims is sufficiently large so that
we are already in the asymptotic regime of the central limit theorem and then we
approximate this compound distribution by the Gaussian distribution (or maybe
by a distribution function that is slightly skewed, see Sections 4.1.2 and 4.1.3).
Note that the small claims layer cannot be distorted by large claims because they
are already sorted out by the threshold M . We will describe this in more detail in
Section 3.4.1, below.
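The layer split above is easy to illustrate with a short simulation. A minimal Python sketch (the values of λ, v, M and the log-normal claim size distribution are illustrative choices, not taken from the text): Theorem 2.14 predicts that the layer claim counts are again Poisson with thinned frequencies λG(M)v and λ(1 − G(M))v.

```python
import numpy as np

def simulate_layers(lam, v, M, n_sim, rng):
    """Simulate N ~ Poi(lam*v) claims per scenario and split them at threshold M."""
    counts_sc, counts_lc = np.empty(n_sim), np.empty(n_sim)
    for j in range(n_sim):
        N = rng.poisson(lam * v)
        Y = rng.lognormal(mean=7.0, sigma=1.0, size=N)  # illustrative claim size law G
        counts_sc[j] = np.sum(Y <= M)                   # small claims count
        counts_lc[j] = np.sum(Y > M)                    # large claims count
    return counts_sc, counts_lc

rng = np.random.default_rng(0)
lam, v, M = 0.05, 10_000, 5_000.0
sc, lc = simulate_layers(lam, v, M, n_sim=2_000, rng=rng)

# the two layer counts together must average to lam*v = 500, and each layer
# count should itself look Poisson (empirical variance close to its mean)
print(sc.mean() + lc.mean())
print(lc.var() / lc.mean())
```

In a full model one would additionally check the independence of the two layers, e.g. via the empirical correlation of the layer totals.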
E[N] < Var(N),

i.e. the mixed Poisson distribution is over-dispersed: its variance strictly exceeds its mean. We remark that similar constructions could also be done for the binomial distribution. We refrain from doing so because the Poisson case is more appropriate for non-life insurance modeling.
The mixed Poisson distribution gives the general principle and a specific example
will be given in the next section. The idea is to attach volatility (or uncertainty)
to the claims frequency parameter λ, thus, the claims frequency will be modeled
as a latent variable, and based on this latent variable we choose the claims count
distribution.
In the next section we make an explicit choice for the distribution function H.
We say X has a gamma distribution, write X ∼ Γ(γ, c), with shape parameter
γ > 0 and scale parameter c > 0 if X is a non-negative, absolutely continuous
random variable with density
\[
f(x) = \frac{c^\gamma}{\Gamma(\gamma)}\, x^{\gamma-1} \exp\{-cx\} \qquad \text{for } x \ge 0.
\]
The gamma distribution has many nice properties and it is used rather frequently
for the modeling of latent variables and for the modeling of individual claim sizes,
see Section 3.2.1.
Definition 2.19 (negative-binomial distribution). We say N has a negative-binomial distribution, write N ∼ NegBin(λv, γ), with volume v > 0, expected claims frequency λ > 0 and dispersion parameter γ > 0, if N|Θ ∼ Poi(Θλv) with Θ ∼ Γ(γ, γ).
Note that for Λ = Θλ we are exactly in the context of Definition 2.17. This latter representation is the definition often used for the negative-binomial distribution. However, in our context, it is simpler to work with the first definition. Especially, parameter estimation will give an explicit meaning to the latent variable Θ.
\[
P[N=k] = E\big[P[N=k\,|\,\Theta]\big] = E\left[\exp\{-\Theta\lambda v\}\,\frac{(\Theta\lambda v)^k}{k!}\right]
= \int_0^\infty \frac{(x\lambda v)^k}{k!}\,\exp\{-x\lambda v\}\,\frac{\gamma^\gamma}{\Gamma(\gamma)}\,x^{\gamma-1}\exp\{-\gamma x\}\,dx
\]
\[
= \frac{(\lambda v)^k\,\gamma^\gamma}{\Gamma(\gamma)\,k!}\,\frac{\Gamma(\gamma+k)}{(\gamma+\lambda v)^{\gamma+k}}
\int_0^\infty \frac{(\gamma+\lambda v)^{\gamma+k}}{\Gamma(\gamma+k)}\,x^{\gamma+k-1}\exp\{-(\gamma+\lambda v)x\}\,dx
\]
\[
= \frac{\Gamma(\gamma+k)}{\Gamma(\gamma)\,k!}\left(\frac{\gamma}{\gamma+\lambda v}\right)^{\!\gamma}\left(\frac{\lambda v}{\gamma+\lambda v}\right)^{\!k}
= \binom{k+\gamma-1}{k}\,(1-p)^\gamma\, p^k,
\qquad \text{with } p = \frac{\lambda v}{\gamma+\lambda v};
\]
notice that the second last equality follows because we have a gamma density with shape parameter γ + k and scale parameter γ + λv under the integral. This trick of completion should be remembered because it is applied rather frequently. 2
\[
E[N] = \lambda v,
\qquad
\mathrm{Var}(N) = \lambda v\,(1+\lambda v/\gamma) > \lambda v,
\]
\[
\mathrm{Vco}(N) = \sqrt{\frac{1}{\lambda v}\left(1+\lambda v/\gamma\right)} > \gamma^{-1/2} > 0,
\qquad
M_N(r) = \left(\frac{1-p}{1-p\,e^r}\right)^{\!\gamma} \quad \text{for all } r < -\log p,
\]
properties of the gamma distribution. Therefore, it remains to calculate the moment generating function. The tower property implies
\[
M_N(r) = E\big[E[\exp\{rN\}\,|\,\Theta]\big] = E\big[\exp\{\Theta\lambda v\,(e^r-1)\}\big] = M_\Theta\big(\lambda v\,(e^r-1)\big),
\]
from which the claim follows for Θ ∼ Γ(γ, γ) and 1 − p = γ/(γ + λv). 2
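The gamma-mixed construction is easy to simulate. A Python sketch (parameter values are illustrative) confirming the over-dispersion formula Var(N) = λv(1 + λv/γ):

```python
import numpy as np

rng = np.random.default_rng(1)
lam_v, gamma = 50.0, 2.0            # illustrative expected count and dispersion

# N | Theta ~ Poi(Theta * lam_v) with Theta ~ Gamma(gamma, gamma), so E[Theta] = 1
theta = rng.gamma(shape=gamma, scale=1.0 / gamma, size=200_000)
N = rng.poisson(theta * lam_v)

print(N.mean())   # should be close to lam_v = 50
print(N.var())    # should be close to lam_v * (1 + lam_v / gamma) = 1300
```

The latent Θ is exactly the over-dispersion source discussed below: it does not average out even for large volumes.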
Proposition 2.21 provides a nice interpretation. For given volume v > 0 the expected claims frequency is
\[
E\left[\frac{N}{v}\right] = \lambda.
\]
Moreover, for the coefficient of variation of the claims frequency N/v we obtain
\[
\mathrm{Vco}\left(\frac{N}{v}\right) = \sqrt{(\lambda v)^{-1} + \gamma^{-1}} \;\longrightarrow\; \gamma^{-1/2} > 0 \qquad \text{for } v \to \infty.
\]
Thus, the random variable Θ reflects the uncertainty in the “true” underlying
frequency parameter of the Poisson distribution. This uncertainty also remains
in the portfolio for infinitely large volume v, i.e. it is not diversifiable, and the
positive lower bound is determined by the dispersion parameter γ ∈ (0, ∞). The
interpretation of this is as follows: consider the time series N1, N2, . . . of claims counts of successive accounting years.

Figure: probability weights p_k of the binomial, Poisson and negative-binomial distributions.
\[
E[S] = \lambda v\, E[Y_1],
\qquad
\mathrm{Var}(S) = \lambda v\, E[Y_1^2] + (\lambda v)^2\, E[Y_1]^2/\gamma,
\]
\[
\mathrm{Vco}(S) = \sqrt{\frac{1}{\lambda v}\left(1+\mathrm{Vco}(Y_1)^2+\lambda v/\gamma\right)} > \gamma^{-1/2},
\qquad
M_S(r) = \left(\frac{1-p}{1-p\,M_{Y_1}(r)}\right)^{\!\gamma} \quad \text{for } r \in \mathbb{R} \text{ such that } M_{Y_1}(r) < 1/p,
\]
Proof. The proof is an immediate consequence of Propositions 2.2 and 2.21.
In this section we describe the first two methods. For the Bayesian inference method
we refer to Chapter 8.
We define the sample mean and sample variance by, T ≥ 2 for the latter,
\[
\widehat{\mu}_T = \frac{1}{T}\sum_{t=1}^T X_t
\qquad\text{and}\qquad
\widehat{\sigma}_T^2 = \frac{1}{T-1}\sum_{t=1}^T (X_t - \widehat{\mu}_T)^2. \tag{2.5}
\]
A straightforward calculation shows that these are unbiased estimators for µ and σ², that is, \(E[\widehat{\mu}_T] = \mu\) and \(E[\widehat{\sigma}_T^2] = \sigma^2\).
This motivates the moment estimator \((\widehat{\vartheta}_1, \widehat{\vartheta}_2)\) for \((\vartheta_1, \vartheta_2)\), obtained by solving the system of equations
\[
\widehat{\mu}_T = \mu(\widehat{\vartheta}_1,\widehat{\vartheta}_2)
\qquad\text{and}\qquad
\widehat{\sigma}_T^2 = \sigma^2(\widehat{\vartheta}_1,\widehat{\vartheta}_2).
\]
In our situation the problem is more involved. Assume we have a vector of obser-
vations N = (N1 , . . . , NT )0 , where Nt denotes the number of claims in accounting
year t. The difficulty is that Nt , t = 1, . . . , T , are not i.i.d. because they depend
on different volumes vt . That is, in general, the portfolio changes over accounting
years. Therefore, we need to slightly modify the framework described above.
Assumption 2.25. Assume that there exist strictly positive volumes v₁, . . . , v_T such that the components of F = (N₁/v₁, . . . , N_T/v_T)′ are independent with
\[
E[N_t/v_t] = \lambda
\qquad\text{and}\qquad
\mathrm{Var}(N_t/v_t) = \tau_t^2 < \infty,
\]
for all t = 1, . . . , T.
Lemma 2.26. We make Assumption 2.25. The unbiased linear (in F) estimator
for λ with minimal variance is given by
\[
\widehat{\lambda}_T^{\mathrm{MV}} = \left(\sum_{t=1}^T \frac{1}{\tau_t^2}\right)^{\!-1} \sum_{t=1}^T \frac{N_t/v_t}{\tau_t^2},
\qquad\text{with}\qquad
\mathrm{Var}\left(\widehat{\lambda}_T^{\mathrm{MV}}\right) = \left(\sum_{t=1}^T \frac{1}{\tau_t^2}\right)^{\!-1}.
\]
Proof. We apply the method of Lagrange, see Section 24.3 in Merz-Wüthrich [70]. We define the mean vector λ = λe = λ(1, . . . , 1)′ ∈ R^T and the diagonal positive definite covariance matrix T = diag(τ₁², . . . , τ_T²) of F. Then we would like to solve the following minimization problem
\[
x^+ = \arg\min_{\{x\in\mathbb{R}^T;\; x'\boldsymbol{\lambda}=\lambda\}} \; \tfrac{1}{2}\, x' T x,
\]
thus, we minimize the variance Var(x′F) = x′T x subject to all unbiased linear combinations of F, which gives the constraint λ = E[x′F] = x′λ. The Lagrangian for this problem is given by
\[
\mathcal{L}(x,c) = -\tfrac{1}{2}\, x' T x - c\,(x'\boldsymbol{\lambda} - \lambda),
\]
with Lagrange multiplier c. The optimal value x⁺ is found by the solution of
\[
\frac{\partial}{\partial x}\mathcal{L}(x,c) = -T x - c\boldsymbol{\lambda} = 0
\qquad\text{and}\qquad
\frac{\partial}{\partial c}\mathcal{L}(x,c) = -x'\boldsymbol{\lambda} + \lambda = 0.
\]
The first requirement implies x = −cT⁻¹λ = −cλT⁻¹e. Plugging this into the second requirement implies λ = x′λ = −cλ²e′T⁻¹e. If we solve this for the Lagrange multiplier we obtain −c = λ⁻¹(e′T⁻¹e)⁻¹. This provides
w)
\[
x^+ = \frac{1}{e' T^{-1} e}\, T^{-1} e = \left(\sum_{t=1}^T \frac{1}{\tau_t^2}\right)^{\!-1} \big(\tau_1^{-2}, \ldots, \tau_T^{-2}\big)'.
\]
Moreover,
\[
\mathrm{Var}\left(\widehat{\lambda}_T^{\mathrm{MV}}\right) = (x^+)'\, T\, x^+ = (e' T^{-1} e)^{-1} = \left(\sum_{t=1}^T \frac{1}{\tau_t^2}\right)^{\!-1}. \qquad 2
\]
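Lemma 2.26 is straightforward to implement. A Python sketch (function name and numerical values are ours, for illustration); in the Poisson case τ_t² = λ/v_t, so the weights are proportional to v_t and the estimator reduces to ΣN_t/Σv_t:

```python
import numpy as np

def lambda_mv(freq, tau2):
    """Minimum variance unbiased linear estimator of lambda.

    freq : observed frequencies N_t / v_t
    tau2 : variances tau_t^2 of the frequencies
    """
    w = 1.0 / np.asarray(tau2)                      # inverse-variance weights
    est = np.sum(w * np.asarray(freq)) / np.sum(w)  # weighted average
    var = 1.0 / np.sum(w)                           # its variance
    return est, var

# Poisson sanity check: tau_t^2 = lam / v_t recovers sum(N_t) / sum(v_t)
lam = 0.05
v = np.array([1000.0, 2000.0, 4000.0])
N = np.array([48.0, 104.0, 209.0])
est, var = lambda_mv(N / v, lam / v)
print(est, var)
```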
We apply this lemma to the case of the binomial and the Poisson distributions.
Assume that N_t, t = 1, . . . , T, are independent with N_t ∼ Binom(v_t, p) or N_t ∼ Poi(λv_t), respectively. Then we have in the binomial case
\[
E[N_t/v_t] = p
\qquad\text{and}\qquad
\mathrm{Var}(N_t/v_t) = p(1-p)/v_t = \tau_t^2,
\]
These variances (and uncertainties) converge to zero for \(\sum_{s=1}^T v_s \to \infty\). Moreover, in the Poisson case we have
\[
\sum_{t=1}^T N_t \sim \mathrm{Poi}\left(\lambda \sum_{t=1}^T v_t\right).
\]
In the negative-binomial case we have E[N_t/v_t] = λ and Var(N_t/v_t) = λ/v_t + λ²/γ = τ_t².
The variance term has two unknown parameters λ and γ, and we lose the nice multiplicative structure of the binomial and the Poisson case which allowed us to apply Lemma 2.26 in a straightforward manner. If we drop the condition “minimum variance” we obtain the following unbiased linear estimator.
Estimator 2.28 (moment estimator in the negative-binomial case (1/2)).
We have the following unbiased linear estimator for λ
\[
\widehat{\lambda}_T^{\mathrm{NB}} = \frac{1}{\sum_{s=1}^T v_s}\sum_{t=1}^T N_t = \sum_{t=1}^T \frac{v_t}{\sum_{s=1}^T v_s}\,\frac{N_t}{v_t},
\]
with variance
\[
\mathrm{Var}\left(\widehat{\lambda}_T^{\mathrm{NB}}\right) = \left(\frac{1}{\sum_{s=1}^T v_s}\right)^{\!2}\sum_{t=1}^T \mathrm{Var}(N_t) = \frac{\sum_{t=1}^T \lambda v_t + (\lambda v_t)^2/\gamma}{\left(\sum_{s=1}^T v_s\right)^2}.
\]
Proof of Lemma 2.29. Using the unbiasedness of \(\widehat{\lambda}_T^{\mathrm{NB}}\) for λ we have
\[
E\left[\widehat{V}_T^2\right] = \frac{1}{T-1}\sum_{t=1}^T v_t\, E\left[\left(\frac{N_t}{v_t}-\widehat{\lambda}_T^{\mathrm{NB}}\right)^{\!2}\right]
= \frac{1}{T-1}\sum_{t=1}^T v_t\, \mathrm{Var}\left(\frac{N_t}{v_t}-\widehat{\lambda}_T^{\mathrm{NB}}\right)
\]
\[
= \frac{1}{T-1}\sum_{t=1}^T v_t\left[\mathrm{Var}\left(\frac{N_t}{v_t}\right) - 2\,\mathrm{Cov}\left(\frac{N_t}{v_t},\widehat{\lambda}_T^{\mathrm{NB}}\right) + \mathrm{Var}\left(\widehat{\lambda}_T^{\mathrm{NB}}\right)\right]
\]
\[
= \frac{1}{T-1}\left[\sum_{t=1}^T \frac{\lambda v_t + (\lambda v_t)^2/\gamma}{v_t} - \frac{\sum_{t=1}^T \lambda v_t + (\lambda v_t)^2/\gamma}{\sum_{s=1}^T v_s}\right].
\]
\[
\widehat{\gamma}_T^{\mathrm{NB}} = \frac{\big(\widehat{\lambda}_T^{\mathrm{NB}}\, v\big)^2}{\widehat{V}_T^2\, v - \widehat{\lambda}_T^{\mathrm{NB}}\, v}
\qquad\text{with}\qquad
\widehat{V}_T^2\, v = \frac{1}{T-1}\sum_{t=1}^T \big(N_t - \widehat{\lambda}_T^{\mathrm{NB}}\, v\big)^2 = \widehat{\sigma}_T^2,
\]
where the latter term is the sample variance of i.i.d. random variables Nt . Or in
other words, the proposed estimators in the uniform volume vt = v case are found
by looking at the system of equations (2.6). In the negative-binomial model this
system is given by
\[
\mu = \lambda v
\qquad\text{and}\qquad
\sigma^2 = \lambda v + (\lambda v)^2/\gamma.
\]
Replacing µ and σ² by their sample estimators and solving this system provides \(\widehat{\lambda}_T^{\mathrm{NB}}\) and \(\widehat{\gamma}_T^{\mathrm{NB}}\) in the uniform volume case.
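In the uniform volume case these moment estimators can be sketched as follows (Python; the data are simulated from the gamma-mixed Poisson model with illustrative parameters, so the estimators should approximately recover λ and γ):

```python
import numpy as np

def negbin_moment_estimators(N, v):
    """Moment estimators (lambda_hat, gamma_hat) for uniform volumes v_t = v."""
    lam_hat = N.mean() / v
    sigma2_hat = N.var(ddof=1)                            # sample variance of N_t
    gamma_hat = (lam_hat * v) ** 2 / (sigma2_hat - lam_hat * v)
    return lam_hat, gamma_hat

rng = np.random.default_rng(2)
lam, gamma, v, T = 0.05, 2.0, 10_000, 5_000               # illustrative values

# simulate T accounting years: N_t | Theta_t ~ Poi(Theta_t * lam * v)
theta = rng.gamma(shape=gamma, scale=1.0 / gamma, size=T)
N = rng.poisson(theta * lam * v)

lam_hat, gamma_hat = negbin_moment_estimators(N, v)
print(lam_hat, gamma_hat)   # close to lam = 0.05 and gamma = 2.0
```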
Assume that the components of N = (N₁, . . . , N_T)′ are independent with probability weights \(p_k^{(t)}(\vartheta) = P_\vartheta[N_t = k] = P[N_t = k]\) which depend on a common unknown parameter ϑ. The independence of N₁, . . . , N_T implies that their joint likelihood function is given by
\[
L_{\mathbf{N}}(\vartheta) = \prod_{t=1}^T p_{N_t}^{(t)}(\vartheta).
\]
The MLE for ϑ is based on the rationale that ϑ should be chosen such that the probability of observing {N₁, . . . , N_T} is maximized. The MLE \(\widehat{\vartheta}_T^{\mathrm{MLE}}\) for ϑ based on the observation N is given by
\[
\widehat{\vartheta}_T^{\mathrm{MLE}} = \arg\max_{\vartheta}\, L_{\mathbf{N}}(\vartheta).
\]
If the probability weights \(p_k^{(t)}(\vartheta)\) are sufficiently regular as a function of ϑ in a regular domain which contains the true parameter ϑ, then the MLE \(\widehat{\vartheta}_T^{\mathrm{MLE}}\) is asymptotically unbiased for T → ∞ and under appropriate scaling it has an asymptotic Gaussian distribution with inverse Fisher's information as covariance matrix, for details see Theorem 4.1 in Lehmann [61].
Calculating the derivative w.r.t. p provides the requirement
\[
\frac{\partial}{\partial p}\,\ell_{\mathbf{N}}(p) = \sum_{t=1}^T \frac{N_t}{p} - \frac{v_t - N_t}{1-p} = 0.
\]
Solving this for p proves the claim. 2
\[
\frac{\partial}{\partial(\lambda,\gamma)} \sum_{t=1}^T \left[\log\binom{N_t+\gamma-1}{N_t} + \gamma\log(1-p_t) + N_t\log p_t\right] = 0,
\]
Unfortunately, this system of equations does not have a closed form solution, and
a root search algorithm is needed to find the MLE solution for (λ, γ), see also page
61 below.
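Such a root search can be sketched by numerically minimizing the negative log-likelihood (a Python sketch using scipy; the claim counts and starting values are illustrative, and we optimize on the log scale to keep λ, γ positive):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def negbin_negloglik(logparams, N, v):
    """Negative log-likelihood of N_t ~ NegBin(lambda*v_t, gamma)."""
    lam, gam = np.exp(logparams)              # log scale enforces positivity
    p = lam * v / (gam + lam * v)             # p_t per accounting year
    ll = (gammaln(N + gam) - gammaln(gam) - gammaln(N + 1)
          + gam * np.log(1 - p) + N * np.log(p))
    return -ll.sum()

# illustrative, clearly over-dispersed claim counts with uniform volumes
N = np.array([400.0, 700.0, 350.0, 650.0, 500.0])
v = np.full(5, 10_000.0)

res = minimize(negbin_negloglik, x0=np.log([0.05, 5.0]), args=(N, v),
               method="Nelder-Mead",
               options={"maxiter": 5_000, "xatol": 1e-8, "fatol": 1e-10})
lam_mle, gam_mle = np.exp(res.x)
print(lam_mle, gam_mle)
```

For uniform volumes the MLE of λ coincides with ΣN_t/Σv_t, which serves as a sanity check on the optimizer.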
year t   volume v_t   number of claims N_t   frequency N_t/v_t
1982     240'755      13'153                 5.46%
1983     255'571      14'186                 5.55%
1984     269'739      14'207                 5.27%
1985     281'708      13'461                 4.78%
1986     306'888      21'261                 6.93%
1987     320'265      19'934                 6.22%
1988     323'481      15'796                 4.88%
1989     334'753      15'157                 4.53%
1990     340'265      17'483                 5.14%
1991     344'757      19'185                 5.56%
total    3'018'182    163'823                5.43%
Table 2.1: Private households water insurance: number of policies and claims
We observe a strong growth of volume of more than 40% in this insurance portfolio, from v₁₉₈₂ = 240'755 policies to v₁₉₉₁ = 344'757 policies. Such a strong growth might question the stationarity assumption in the expected claims frequency λ_t ≡ λ because it might also reflect a substantial change in the portfolio (and perhaps in the product). Nevertheless we assume its validity (because the observed claims frequencies N_t/v_t do not show any structure such as a linear trend, see Figure 2.2) and we fit the Poisson and, if necessary, the negative-binomial distribution to this data set.
Figure 2.2: Observed yearly claims frequencies Nt /vt from t = 1982 to 1991 com-
pared to the overall average of 5.43%, see Table 2.1.
The coefficient of variation in the Poisson model is given by (2.2); the resulting 1 standard deviation confidence bounds for the claims frequency are
\[
\widehat{\mathrm{CI}} = (5.39\%,\, 5.47\%).
\]
These resulting confidence bounds are very narrow and we observe that most of
the observed claims frequencies Nt /vt in Table 2.1 lie outside of these confidence
bounds, see Figure 2.3 (lhs). This clearly rejects the assumption of having Pois-
son distributions for the number of claims and suggests that we should study the
negative-binomial model for Nt .
Figure 2.3: Observed yearly claims frequencies N_t/v_t from t = 1982 to 1991 compared to the estimated overall frequency of 5.43%. (lhs): 1 standard deviation confidence bounds in the Poisson case; (rhs): 1 standard deviation confidence bounds in the negative-binomial case.
This makes much more sense in view of the observed frequencies Nt /vt in Table
NL
2.1. We see that 7/10 of the observations are within these confidence bounds, see
Figure 2.3 (rhs).
We close this example with a statistical test: In the previous example it was obvious
that the Poisson model does not fit to the data. In situations where this is less
obvious we can use the following χ2 -goodness-of-fit test.
We are going to build a test statistic for the evaluation of this null hypothesis H0. We define
\[
\chi^* = \chi^*(\mathbf{N}) = \sum_{t=1}^T \frac{(N_t/v_t - \lambda)^2}{\lambda/v_t}.
\]
Under H0 we can write \(N_t = \sum_{i=1}^{v_t} X_i\) with X₁, X₂, . . . i.i.d. Poisson distributed with E[X₁] = λ and Var(X₁) = λ. But then the CLT (1.2) applies with
\[
\widetilde{Z}_t = \frac{N_t/v_t - \lambda}{\sqrt{\lambda/v_t}} = \frac{N_t - \lambda v_t}{\sqrt{\lambda v_t}} \stackrel{(d)}{=} \frac{\sum_{i=1}^{v_t} X_i - \lambda v_t}{\sqrt{\lambda v_t}} \;\Rightarrow\; \mathcal{N}(0,1) \qquad \text{as } v_t \to \infty.
\]
Next, if we assume that \(Z_1,\ldots,Z_T \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0,1)\), then a standard result in statistics says that \(\sum_{t=1}^T Z_t^2\) has a χ²-distribution with T degrees of freedom, denoted by χ²_T, see also Exercise 2 on page 21. Therefore, we obtain the asymptotic approximation in distribution
\[
\chi^* = \chi^*(\mathbf{N}) = \sum_{t=1}^T \widetilde{Z}_t^2 \;\stackrel{(d)}{\approx}\; \sum_{t=1}^T Z_t^2 \sim \chi_T^2.
\]
In the last step we need to replace the unknown parameter λ by its estimate \(\widehat{\lambda}_T^{\mathrm{MLE}}\). By doing so, we lose one degree of freedom; thus, we get the test statistic \(\widehat{\chi}^*\) and the corresponding distributional approximation
\[
\widehat{\chi}^* = \sum_{t=1}^T v_t\, \frac{\left(N_t/v_t - \widehat{\lambda}_T^{\mathrm{MLE}}\right)^2}{\widehat{\lambda}_T^{\mathrm{MLE}}} \;\stackrel{(d)}{\approx}\; \chi_{T-1}^2.
\]
For the data in Table 2.1 we obtain \(\widehat{\chi}^* = 2'627\). The 99%-quantile of the χ²-distribution with T − 1 = 9 degrees of freedom is given by 21.67. Since this is by far smaller than \(\widehat{\chi}^*\), we reject the null hypothesis H0 at the significance level of q = 1%. This, of course, is not surprising in view of Figure 2.3 (lhs).
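The test statistic is easy to compute in code. A Python sketch (the helper name is ours; scipy provides the χ² quantile):

```python
import numpy as np
from scipy.stats import chi2

def chi2_gof_poisson(N, v, q=0.99):
    """Chi-square goodness-of-fit test of H0: N_t ~ Poi(lambda * v_t)."""
    N, v = np.asarray(N, float), np.asarray(v, float)
    lam_mle = N.sum() / v.sum()                      # Poisson MLE for lambda
    stat = np.sum(v * (N / v - lam_mle) ** 2 / lam_mle)
    critical = chi2.ppf(q, df=len(N) - 1)            # T - 1 degrees of freedom
    return stat, critical, stat > critical           # reject H0 if True

# sanity check: identical observed frequencies give a statistic of zero
stat, crit, reject = chi2_gof_poisson([50, 100], [1000, 2000])
print(stat, crit, reject)
```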
Exercise 3. Consider the data given in Table 2.2.

t      1       2       3       4       5       6       7       8       9       10
N_t    1'000   997     985     989     1'056   1'070   994     986     1'093   1'054
v_t    10'000  10'000  10'000  10'000  10'000  10'000  10'000  10'000  10'000  10'000

Estimate the parameters for the Poisson and the negative-binomial models. Which model is preferred? Does a χ²-goodness-of-fit test reject the null hypothesis on the 5% significance level of having Poisson distributions?
In Model Assumptions 2.1 we have introduced the compound distribution
\[
S = Y_1 + Y_2 + \ldots + Y_N = \sum_{i=1}^N Y_i,
\]
Chapter 3. Individual Claim Size Modeling
the claim, reporting a (small) claim reduces the no-claims-benefit too much so that
the insured decides to withdraw the claim, etc.
We can deal in two different ways with such zero claims: (i) estimate the probability
of a zero claim separately and add a probability weight to G at 0; (ii) we simply
reduce the expected claims frequency λ by these zero claims. The first way (i) is
mathematically consistent, but contradicts our model assumption G(0) = 0; the
second way (ii) perfectly fits into the compound Poisson modeling framework due
to the disjoint decomposition Theorem 2.14. In general, the second version is the
simpler one to deal with (however, one may lose some information). Therefore, we assume that G(0) = 0 and E[N] = λv, where v > 0 is the portfolio size and the expected claims frequency λ > 0 only assesses strictly positive claims. Hence, after subtracting these zero claims we have n = 61'053 claims records in PP insurance and n = 14'532 in CP insurance.
We start with the scatter plots of the data, see Figures 3.1 and 3.2. We plot the
individual claim sizes (ordered by arrival date) both on the original scale (lhs) and
on the log scale (rhs). These scatter plots do not offer much information because
Figure 3.1: Scatter plot of the n = 61'053 strictly positive claims records of PP insurance ordered by arrival date: original scale (lhs) and log scale (rhs).
they are overloaded, at least they do not show any obvious trends. We calculate the sample means and the sample variances of the observations, see also (2.5),
\[
\widehat{\mu}_n = \frac{1}{n}\sum_{i=1}^n Y_i
\qquad\text{and}\qquad
\widehat{\sigma}_n^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \widehat{\mu}_n)^2,
\]
Figure 3.2: Scatter plot of the n = 14'532 strictly positive claims records of CP insurance ordered by arrival date: original scale (lhs) and log scale (rhs).
Figure 3.3: Histogram of the n = 61'053 strictly positive claims records of PP insurance: original scale (lhs) and log scale (rhs).
Next we give the histogram for PP insurance, see Figure 3.3 (lhs). We see that a
few large claims distort the whole picture so that the histogram is not helpful. We
could plot a second one only considering small claims. In Figure 3.3 (rhs) we plot
the histogram for logged claim sizes. In Figure 3.4 we give the corresponding box
plots, they show positive skewness. The ultimate goal is to have the full distribution
functions G(y) = P[Y ≤ y] of the two claims classes PP and CP. Since we have
so many observations we could directly work (at least for small claims, see Section
Figure 3.4: Box plots of claims records of PP and CP insurance: original scale (lhs)
and log scale (rhs).
The empirical distribution function with logged claim sizes is given in Figure 3.5
(lhs). For a sequence of observations Y1 , . . . , Yn we denote the ordered sample by
Y(1) ≤ Y(2) ≤ . . . ≤ Y(n) . For the next definitions we assume that Y ∼ G has finite
mean. We define the loss size index function and the empirical loss size index function by
\[
I(G(y)) = \frac{\int_0^y z\, dG(z)}{\int_0^\infty z\, dG(z)}
\qquad\text{and}\qquad
\widehat{I}_n(\alpha) = \frac{\sum_{i=1}^{\lfloor n\alpha\rfloor} Y_{(i)}}{\sum_{i=1}^n Y_i},
\]
for α ∈ [0, 1]. The loss size index function I(G(y)) chooses a claim size threshold y and then evaluates the relative expected claim amount that can be explained by claim sizes up to this threshold y. The resulting empirical graphs are presented in Figure 3.5 (rhs). Rather typically in non-life insurance we see that the 20% largest claims cause roughly 75% of the total claim amount! This explains why we need to understand large claims well: they heavily influence the total claim amount.
We have already seen in the previous figures that large claims may lead to several
modeling difficulties. Two plots that especially focus on large claims are the mean
excess plot and the log-log plot. We define the mean excess function and empirical
mean excess function by
\[
e(u) = E\left[\,Y_i - u \,\middle|\, Y_i > u\,\right]
\qquad\text{and}\qquad
\widehat{e}_n(u) = \frac{\sum_{i=1}^n (Y_i - u)\, 1_{\{Y_i > u\}}}{\sum_{i=1}^n 1_{\{Y_i > u\}}}.
\]
The (empirical) log-log plot considers the graphs
\[
y \mapsto \big(\log y,\; \log(1-G(y))\big)
\qquad\text{and}\qquad
y \mapsto \big(\log y,\; \log(1-\widehat{G}_n(y))\big).
\]
Figure 3.6: Empirical log-log plot (lhs) and empirical mean excess plot (rhs) of PP
and CP insurance data.
In Figure 3.6 we present the empirical log-log and mean excess plots of the two
data sets. Linear decrease in the log-log plot and linear increase in the mean excess
plot will have the interpretation of heavy tailed distributions in the sense that the
survival function Ḡ = 1 − G is regularly varying at infinity.
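The two empirical diagnostic functions above can be sketched directly (Python; the function names and the tiny test sample are ours, for illustration):

```python
import numpy as np

def loss_size_index(y_sample, alpha):
    """Empirical loss size index: share of the total claim amount explained
    by the floor(n*alpha) smallest claims."""
    y = np.sort(np.asarray(y_sample, float))
    k = int(np.floor(len(y) * alpha))
    return y[:k].sum() / y.sum()

def mean_excess(y_sample, u):
    """Empirical mean excess over threshold u."""
    y = np.asarray(y_sample, float)
    exceed = y[y > u]
    return (exceed - u).mean()

y = [1.0, 2.0, 3.0, 4.0]
print(loss_size_index(y, 0.5))   # (1 + 2) / 10 = 0.3
print(mean_excess(y, 2.0))       # ((3 - 2) + (4 - 2)) / 2 = 1.5
```

For a heavy tailed sample, plotting `mean_excess` over a grid of thresholds u reproduces the empirical mean excess plot discussed above.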
the following notation for a random variable Y ∼ G:

M_Y(r)       moment generating function of Y in r ∈ R, where it exists,
µ_Y          expected value of Y, if it exists,
σ²_Y         variance of Y, if it exists,
Vco(Y)       coefficient of variation of Y, if it exists,
ς_Y          skewness of Y, if it exists,
Ḡ = 1 − G    survival function of Y, i.e. Ḡ(y) = P[Y > y].
From a risk management point of view, distribution functions G with Ḡ ∈ R₋α for some α > 0 are dangerous because they have a large potential for big claims, see Chapter 3 in Embrechts et al. [36]. Therefore, it is crucial to know this index of regular variation at infinity, see also Remarks 5.17.
3.2.1 Gamma distribution
The gamma distribution has two parameters, a shape parameter γ > 0 and a scale
parameter c > 0. We write Y ∼ Γ(γ, c). The distribution function of Y has positive
support R+ with density for y ≥ 0 given by
\[
g(y) = \frac{c^\gamma}{\Gamma(\gamma)}\, y^{\gamma-1} \exp\{-cy\}.
\]
There is no closed form solution for the distribution function G. For y ≥ 0 it can only be expressed as
\[
G(y) = \int_0^y \frac{c^\gamma}{\Gamma(\gamma)}\, x^{\gamma-1} e^{-cx}\, dx = \frac{1}{\Gamma(\gamma)} \int_0^{cy} z^{\gamma-1} e^{-z}\, dz = G(\gamma, cy),
\]
where G(·, ·) is the incomplete gamma function. From this we see that the family
of gamma distributions is closed towards multiplication with a positive constant,
that is, for ρ > 0 we have
ρY ∼ Γ(γ, c/ρ). (3.4)
This property is important when we deal with claims inflation. For the moment
generating function and the moments we have
\[
M_Y(r) = \left(\frac{c}{c-r}\right)^{\!\gamma} \quad \text{for } r < c,
\qquad
\mu_Y = \frac{\gamma}{c},
\qquad
\sigma_Y^2 = \frac{\gamma}{c^2},
\]
\[
\mathrm{Vco}(Y) = \gamma^{-1/2},
\qquad
\varsigma_Y = 2\gamma^{-1/2} > 0.
\]
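The identity G(y) = G(γ, cy) and the moment formulas can be cross-checked numerically, e.g. against scipy, where the regularized incomplete gamma function is `scipy.special.gammainc` and the gamma distribution uses shape a = γ and scale = 1/c (the parameter values below are illustrative):

```python
from scipy.special import gammainc
from scipy.stats import gamma as gamma_dist

g_shape, c = 2.5, 0.01          # illustrative shape gamma and rate c
y = 300.0

cdf_incomplete = gammainc(g_shape, c * y)                 # G(gamma, c*y)
cdf_scipy = gamma_dist.cdf(y, a=g_shape, scale=1.0 / c)   # same quantity
print(cdf_incomplete, cdf_scipy)

# moment formulas: mu_Y = gamma/c = 250 and sigma_Y^2 = gamma/c^2 = 25000
mean, var = gamma_dist.stats(a=g_shape, scale=1.0 / c, moments="mv")
print(float(mean), float(var))
```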
• Prove the statements of the moment generating function MY and the loss
index function I(G(y)). Hint: use the trick of the proof of Proposition 2.20.
\[
e(u) = \mu_Y\, \frac{1 - I(G(u))}{1 - G(u)} - u,
\qquad
E\big[Y\, 1_{\{u_1 < Y \le u_2\}}\big] = \mu_Y \big(I(G(u_2)) - I(G(u_1))\big).
\]
The gamma distribution does not have a regularly varying tail at infinity, see Table 3.4.4 in Embrechts et al. [36]. In fact, Ḡ(y) = 1 − G(y) decays roughly as exp{−cy} to 0 as y → ∞, because exp{−cy} gives an asymptotic lower bound and exp{−(c − ε)y} an asymptotic upper bound on Ḡ(y) for any ε > 0.
For generating gamma random numbers in R the following code is used (n stands for the number of random numbers to be generated):

> rgamma(n, shape=gamma, rate=c)
\[
\widehat{c}^{\,\mathrm{MM}} = \frac{\widehat{\mu}_n}{\widehat{\sigma}_n^2}
\qquad\text{and}\qquad
\widehat{\gamma}^{\,\mathrm{MM}} = \frac{\widehat{\mu}_n^2}{\widehat{\sigma}_n^2}.
\]
For the MLE we have log-likelihood function, set Y = (Y₁, . . . , Y_n)′,
\[
\ell_{\mathbf{Y}}(\gamma, c) = \sum_{i=1}^n \Big[\gamma \log c - \log\Gamma(\gamma) + (\gamma-1)\log Y_i - c\, Y_i\Big].
\]
The numerical fitting does not always work when the range of the observations Y is too large. In such cases it is recommended to scale the data by a constant factor ρ > 0 in a first step (this is possible due to (3.4)) and to estimate the parameters on the scaled data; in a second step the scale parameter is transformed back by the same factor. An alternative way is to explicitly program the function given in (3.5) and then apply the root search command uniroot(). The term Γ′(γ)/Γ(γ)
is calculated with digamma(), see also Section 3.9.5 in Kaas et al. [57].
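The same fitting procedure can be sketched in Python (scipy's gamma parametrization uses shape a = γ and scale = 1/c; the data below are simulated for illustration, so both estimators should approximately recover the true parameters):

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

rng = np.random.default_rng(3)
true_gamma, true_c = 1.5, 0.002
data = rng.gamma(shape=true_gamma, scale=1.0 / true_c, size=50_000)

# method of moments: c = mean/var and gamma = mean^2/var
mu, s2 = data.mean(), data.var(ddof=1)
c_mm, gamma_mm = mu / s2, mu**2 / s2

# MLE with location fixed at 0; scipy returns (shape, loc, scale)
gamma_mle, _, scale_mle = gamma_dist.fit(data, floc=0)
c_mle = 1.0 / scale_mle
print(gamma_mm, c_mm, gamma_mle, c_mle)   # all close to 1.5 and 0.002
```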
The special case γ = 1 corresponds to the exponential distribution with parameter c > 0, denoted by expo(c). The special case γ = k/2 and c = 1/2 is the χ²-distribution with k ∈ N degrees of freedom, see Exercise 2 on page 21.
Example 3.2 (gamma distribution for PP data). We fit the PP insurance data
displayed in Figure 3.1 to the gamma distribution.
Figure 3.7: Gamma distribution with MM and MLE fits applied to the PP insur-
ance data. lhs: QQ plot; rhs: loss size index function.
From Figures 3.7 and 3.8 we immediately conclude that the gamma model does
not fit to the PP data. The reason is that the data is too heavy tailed, which
can be seen in the QQ plot in Figure 3.7 (lhs): the data at the right end of the
distribution lies substantially above the line. The MM estimators manage to model
Figure 3.8: Gamma distribution with MM and MLE fits applied to the PP insur-
ance data. lhs: log-log plot; rhs: mean excess plot.
the data up to some layer; the MLE estimators, however, are heavily distorted by the small claims, which can be seen in the mean excess plot in Figure 3.8 (rhs). In fact, we have too many small claims (observations below 1'500). The MLE is heavily based on these small observations; in Figure 3.7 (rhs) and Figure 3.8 (lhs) we see that the MLE fits well for small claims, whereas the MM provides more appropriate results in the upper range of the data. Summarizing, we should choose more heavy tailed distribution functions to model this data.
\[
g(y) = \frac{\alpha}{\sqrt{2\pi c}}\, y^{-3/2} \exp\left\{-\frac{(\alpha - cy)^2}{2cy}\right\}
= \frac{\alpha}{\sqrt{2\pi c}}\, y^{-3/2} \exp\left\{-\frac{\alpha^2}{2cy} + \alpha - \frac{1}{2}\,cy\right\},
\]
where α > 0 is a shape parameter and c > 0 a scale parameter. Observe that this density behaves similarly to the gamma density for y → ∞. For the distribution function we have a closed form solution
\[
G(y) = \Phi\left(-\frac{\alpha}{\sqrt{cy}} + \sqrt{cy}\right) + e^{2\alpha}\, \Phi\left(-\frac{\alpha}{\sqrt{cy}} - \sqrt{cy}\right),
\]
where Φ(·) is the standard Gaussian distribution function. This can easily be checked by calculating the derivative of the latter.
Figure 3.9: Inverse Gaussian distribution with MM and MLE fits applied to the
PP insurance data: QQ plot.
In Figure 3.9 we see that this leads to an improvement of the fit compared to the gamma distribution. Overall it is still not convincing, especially in the tails, and because the inverse Gaussian distribution is less handy than the distributions presented below, we refrain from discussing it further.
density for y ≥ 0 given by
\[
g(y) = \tau c\, (cy)^{\tau-1} \exp\{-(cy)^\tau\}.
\]
We are especially interested in τ ∈ (0, 1) because this provides a slower decay of
the survival function compared to the gamma distribution. For y ≥ 0 we have
G(y) = 1 − exp {−(cy)τ } ,
which still does not provide regularly varying tails at infinity but the decay of the
survival function Ḡ is slower than in the gamma case for τ < 1. The family of
Weibull distributions is closed towards multiplication with positive constants, that
is, for ρ > 0 we have
ρY ∼ Weibull(τ, c/ρ).
For generating Weibull random numbers observe that we have the identity
\[
Y \stackrel{(d)}{=} \frac{1}{c}\, Z^{1/\tau} \qquad \text{with } Z \sim \mathrm{expo}(1) \stackrel{(d)}{=} \Gamma(1,1).
\]
The R code for the Γ(1, 1) distribution is

> rgamma(n, shape=1, rate=1)
f <- function(x,a){lgamma(1+2/x)-2*lgamma(1+1/x)-log(a+1)}
tau <- uniroot(f, c(0.0001,1), tol=0.0001, a=var(data)/mean(data)^2)
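The same root search can be sketched in Python with `scipy.optimize.brentq`; the equation solved is Γ(1 + 2/τ)/Γ(1 + 1/τ)² = 1 + a, where a is the sample variance divided by the squared sample mean (the bracketing interval below is an illustrative choice):

```python
import math
from scipy.optimize import brentq

def weibull_tau_mm(a):
    """Solve lgamma(1+2/tau) - 2*lgamma(1+1/tau) = log(1+a) for tau,
    where a = sample variance / sample mean^2."""
    f = lambda t: math.lgamma(1 + 2 / t) - 2 * math.lgamma(1 + 1 / t) - math.log(1 + a)
    return brentq(f, 0.05, 10.0)

# For tau = 0.5 the theoretical ratio is Gamma(5)/Gamma(3)^2 = 24/4 = 6,
# i.e. a = 5, so the root search should recover tau = 0.5:
tau = weibull_tau_mm(5.0)
print(tau)   # 0.5
```

The scale then follows from ĉ = Γ(1 + 1/τ̂)/µ̂_n, since E[Y] = Γ(1 + 1/τ)/c in this parametrization.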
Example 3.4 (Weibull distribution for PP data). We fit the PP insurance data
displayed in Figure 3.1. From Figures 3.10 and 3.11 we see that the Weibull model
gives a better fit to the PP data compared to the gamma model. The reason is that
it allows for more probability mass in the tail of the distribution, the estimate for
τ is in the interval (0.5, 0.75). The MM estimators manage to model the data up
to some layer. The MLE estimators, however, are still distorted by the big mass
of small claims which can be seen in the mean excess plot in Figure 3.11 (rhs).
Summarizing, we should choose even more heavy tailed distributions to model this
data, and we should carefully treat large claims.
Figure 3.10: Weibull distribution with MM and MLE fits applied to the PP insur-
ance data. lhs: QQ plot; rhs: loss size index function.
Figure 3.11: Weibull distribution with MM and MLE fits applied to the PP insur-
ance data. lhs: log-log plot; rhs: mean excess plot.
with Φ(·) denoting the standard Gaussian distribution function. The family of
log-normal distributions is closed towards multiplication with a positive constant,
that is, for ρ > 0 we have
ρY ∼ LN(µ + log ρ, σ 2 ).
We have
\[
M_Y(r) \text{ does not exist for } r > 0,
\qquad
\mu_Y = \exp\{\mu + \sigma^2/2\},
\qquad
\sigma_Y^2 = \exp\{2\mu + \sigma^2\}\left(\exp\{\sigma^2\} - 1\right),
\]
\[
\mathrm{Vco}(Y) = \left(\exp\{\sigma^2\} - 1\right)^{1/2},
\qquad
\varsigma_Y = \left(\exp\{\sigma^2\} + 2\right)\left(\exp\{\sigma^2\} - 1\right)^{1/2}.
\]
Moreover,
\[
I(G(y)) = \Phi\left(\frac{\log y - (\mu+\sigma^2)}{\sigma}\right),
\qquad
e(u) = \mu_Y\, \frac{1 - \Phi\left(\frac{\log u - (\mu+\sigma^2)}{\sigma}\right)}{1 - \Phi\left(\frac{\log u - \mu}{\sigma}\right)} - u.
\]
The log-normal distribution does not have regularly varying survival function at
infinity, see Table 3.4.4 in Embrechts et al. [36]. For generating log-normal random
numbers we simply choose standard Gaussian random numbers Z ∼ Φ and then
set Y = exp{µ + σZ}.
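The moment formulas above can be cross-checked against scipy's log-normal implementation, which uses s = σ and scale = e^µ (the parameter values below are illustrative):

```python
import math
from scipy.stats import lognorm

mu, sigma = 7.0, 1.0                      # illustrative parameters
mean, var = lognorm.stats(s=sigma, scale=math.exp(mu), moments="mv")

# formulas from the text
mu_Y = math.exp(mu + sigma**2 / 2)
sigma2_Y = math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)
print(float(mean) - mu_Y, float(var) - sigma2_Y)   # both essentially zero
```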
Example 3.5 (log-normal distribution for PP data). We fit the PP insurance data
displayed in Figure 3.1. In Figures 3.12 and 3.13 we present the results. We observe
Figure 3.12: Log-normal distribution with MM and MLE fits applied to the PP
insurance data. lhs: QQ plot; rhs: loss size index function.
Figure 3.13: Log-normal distribution with MM and MLE fits applied to the PP
insurance data. lhs: log-log plot; rhs: mean excess plot.
that the log-normal distribution gives quite a good fit. We give some comments
on the plots: The MM estimator looks convincing because the observations match
the lines quite well. The only things that slightly disturb the picture are the three
largest observations, see the QQ plot. It seems that they are less heavy tailed than the
log-normal distribution would suggest. This is also the reason why the empirical
mean excess plot deviates from the log-normal distribution, see Figure 3.13 (rhs).
A little bit puzzling is the bad performance of the MLE. The reason is again that
more than 50% of the claims are less than 1’500. The MLE therefore is very
much based on these small observations and provides a good fit in that range of
observations but it gives a bad fit for larger claims. We conclude that this PP data
set should be modeled with different distributions in different layers. The reason
for this heterogeneity is that PP insurance contracts have different modules such as
theft, water damage, fire, etc. and it is recommended (if data allows) to model each
of these modules separately. This may also explain the abnormalities in the log-log
plot because these different modules, in general, have different maximal covers.
3.2.4 Log-gamma distribution
The log-gamma distribution is more heavy tailed than the log-normal distribution
and is obtained by assuming that log Y ∼ Γ(γ, c) for positive parameters γ and c.
The density for y ≥ 1 is given by
\[
g(y) = \frac{c^\gamma}{\Gamma(\gamma)}\, (\log y)^{\gamma-1}\, y^{-(c+1)},
\]
and the distribution function can be written as G(y) = G(γ, c log y) for y ≥ 1, where G(·, ·) denotes the incomplete gamma function introduced in Section 3.2.1. For the moments we have
\[
\mu_Y = \left(\frac{c}{c-1}\right)^{\!\gamma} \quad \text{for } c > 1,
\qquad
\sigma_Y^2 = \left(\frac{c}{c-2}\right)^{\!\gamma} - \mu_Y^2 \quad \text{for } c > 2,
\]
\[
\varsigma_Y = \frac{1}{\sigma_Y^3}\left[\left(\frac{c}{c-3}\right)^{\!\gamma} - 3\mu_Y \sigma_Y^2 - \mu_Y^3\right] \quad \text{for } c > 3.
\]
Figure 3.14: Log-gamma distribution with MM and MLE fits applied to the PP
insurance data. lhs: QQ plot; rhs: loss size index function.
Figure 3.15: Log-gamma distribution with MM and MLE fits applied to the PP
insurance data. lhs: log-log plot; rhs: mean excess plot.
where the latter is solved numerically using, e.g., the R command uniroot().
The MLE is obtained analogously to the MLE for gamma observations by simply
replacing Yi by log Yi .
The resulting fits are compared in Figures 3.16 and 3.17 for the distributions considered.
Figure 3.16: Cumulative distribution functions with logged claim sizes.
Figure 3.16 (lhs) shows the gamma, inverse Gaussian and Weibull fits, Figure 3.16 (rhs) the log-normal and log-gamma fits. In Figure 3.17 we give the histogram for the Weibull, log-normal and log-gamma fits.
3.2.5 Pareto distribution
The Pareto distribution specifies a (large claims) threshold θ >
0 and then only models claims above this threshold, see also
Example 2.16. The claims above this threshold are assumed to have regularly
varying tails with tail index α > 0. For Y ∼ Pareto(θ, α), the density for y ≥ θ is given by
\[
g(y) = \frac{\alpha}{\theta}\left(\frac{y}{\theta}\right)^{\!-(\alpha+1)},
\]
and the distribution function can be written as
\[
G(y) = 1 - \left(\frac{y}{\theta}\right)^{\!-\alpha} \qquad \text{for } y \ge \theta.
\]
We have closedness towards multiplication with a positive constant, that is, for ρ > 0 we have ρY ∼ Pareto(ρθ, α).
As soon as we only study tails of distributions we should use MLEs for parameter estimation (the method of moments is not sufficiently robust against outliers). Since the threshold θ has a natural meaning we only need to estimate α. The MLE is given by
\[
\widehat{\alpha}^{\mathrm{MLE}} = \left(\frac{1}{n}\sum_{i=1}^n \log Y_i - \log\theta\right)^{\!-1}.
\]
Lemma 3.8. Assume \(Y_1, \ldots, Y_n \stackrel{\text{i.i.d.}}{\sim} \mathrm{Pareto}(\theta, \alpha)\). We have
\[
E\left[\widehat{\alpha}^{\mathrm{MLE}}\right] = \frac{n}{n-1}\,\alpha
\qquad\text{and}\qquad
\mathrm{Var}\left(\widehat{\alpha}^{\mathrm{MLE}}\right) = \frac{n^2}{(n-1)^2(n-2)}\,\alpha^2.
\]
Proof. Choose Z ∼ expo(α) = Γ(1, α). Then \(\theta e^Z \stackrel{(d)}{=} Y \sim \mathrm{Pareto}(\theta, \alpha)\) (this can be seen by a change of variables in the corresponding densities). This immediately implies that \(Z_i = \log Y_i - \log\theta \stackrel{\text{i.i.d.}}{\sim} \mathrm{expo}(\alpha)\). The sum of these i.i.d. exponential random variables is gamma distributed with parameters γ = n and c = α. Using the scaling property (3.4) we conclude that
\[
\widehat{\alpha}^{H}_{k,n} = \left(\frac{1}{n-k+1}\sum_{i=k}^{n} \log Y_{(i)} - \log Y_{(k)}\right)^{\!-1}.
\]
The Hill estimator is based on the rationale that the Pareto distribution is closed towards increasing thresholds, i.e. for Y ∼ Pareto(θ0, α) and θ1 > θ0 we have for all y ≥ θ1

P[ Y > y | Y ≥ θ1 ] = (y/θ0)^(−α) / (θ1/θ0)^(−α) = (y/θ1)^(−α).

Therefore, if the data comes from a Pareto distribution we should observe stability in α̂^H_{k,n} for changing k. The confidence bounds of the Hill estimators are determined by Lemma 3.8.
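The Hill plot itself is easy to generate from the formula above; a sketch (the ordering convention — ascending order statistics — and the names are ours):

```python
import math

def hill_estimator(ys, k):
    # alpha^H_{k,n} from the formula above, with ascending order statistics
    # Y_(1) <= ... <= Y_(n): average the log-exceedances of the n - k + 1
    # largest observations over the k-th order statistic
    ys = sorted(ys)
    n = len(ys)
    mean_excess = sum(math.log(ys[i] / ys[k - 1])
                      for i in range(k - 1, n)) / (n - k + 1)
    return 1.0 / mean_excess

# synthetic Pareto(100, 2.5) sample on a quantile grid
theta, alpha, n = 100.0, 2.5, 1000
ys = [theta * (1 - (i + 0.5) / n) ** (-1 / alpha) for i in range(n)]

# the Hill plot visualizes the stability of k -> hill_estimator(ys, k)
est = hill_estimator(ys, 500)
```

For genuinely Pareto data the estimates stay stable over a wide range of k; destabilization for small k (few exceedances) is the pattern seen in the Hill plots below.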
Example 3.9 (Pareto for extremes of PP insurance). We start the analysis with
the PP insurance data.
Figure 3.18: PP insurance data; lhs: Hill plot k ↦ α̂^H_{k,n} with confidence bounds of 1 standard deviation; rhs: log-log plot for α = 2.5.
Figure 3.19: PP insurance data largest claims only; lhs: QQ plot; rhs: mean excess
plot for α = 2.5.
but the data becomes less heavy-tailed further out in the tails. This also becomes obvious from the mean excess plot and the QQ plot in Figure 3.19.
Example 3.10 (Pareto for extremes of CP insurance). In a second analysis we
analyze the extremes of the CP claims data of Figure 3.2. The results are presented
Figure 3.20: CP insurance data; lhs: Hill plot k ↦ α̂^H_{k,n} with a confidence interval of 1 standard deviation; rhs: log-log plot for α = 1.4.
in Figure 3.20. At first sight they look similar to the PP insurance example, i.e. they begin to destabilize between the 150 and 100 largest claims. However, the main difference is that the tail index is much smaller in the CP example. That is, there is a higher potential for large claims in this line of business. Of course, this
Example 3.11 (nuclear power accident example). We revisit the nuclear power
accident data set studied in Hofert-Wüthrich [53].
In Figure 3.21 we plot all nuclear power acci-
dents that have occurred until the end of 2011
and which have a claim size larger than 20 mio.
USD (as of 2010). These events include Three
Mile Island (United States, 1979), Chernobyl
(Ukraine, 1986) and Fukushima (Japan, 2011).
In Figure 3.22 we provide the Hill plot. We
observe that this data is very heavy tailed.
Figure 3.21: 61 largest nuclear power accidents until 2011; lhs: logged claim sizes
(in chronological order, the last entry is Fukushima); rhs: empirical distribution
function of claim sizes.
The Hill plot suggests setting the tail index α to around 0.64, which means that we have an infinite mean model. The log-log plot in Figure 3.22 shows that this tail index choice captures the slope quite well.
Figure 3.22: 61 largest nuclear power accidents until 2011; lhs: Hill plot k ↦ α̂^H_{k,n} with confidence bounds of 1 standard deviation; rhs: log-log plot for α = 0.64.
(KS) test and the Anderson-Darling (AD) test. These are discussed in Sections
3.3.1 and 3.3.2.
In Section 3.3.3 we give the χ2 -goodness-of-fit test and we discuss the Akaike
information criterion (AIC) as well as the Bayesian information criterion (BIC).
function, P-a.s., if the number n of i.i.d. observations goes to infinity, see Theorem
20.6 in Billingsley [13].
Assume we have an i.i.d. sequence Y1, Y2, . . . from an unknown continuous distribution function G and we denote the corresponding empirical distribution function of finite sample size n by Ĝn. We would like to test whether these samples Y1, Y2, . . . may stem from G0. Consider the null hypothesis H0: G = G0 against the two-sided alternative hypothesis that these distribution functions differ. We define the KS test statistic by

Dn = Dn(Y1, . . . , Yn) = ‖Ĝn − G0‖_∞ = sup_y |Ĝn(y) − G0(y)|.
This KS test statistic has the property that, see (13.4) in Billingsley [12],

√n Dn ⇒ Kolmogorov distribution K as n → ∞.

The Kolmogorov distribution K is for y ∈ R+ given by

K(y) = 1 − 2 Σ_{j=1}^∞ (−1)^(j+1) exp{−2 j² y²}.
The null hypothesis H0 is rejected on the significance level q ∈ (0, 1) if

Dn > n^(−1/2) K^←(1 − q),

where K^←(1 − q) denotes the (1 − q)-quantile of the Kolmogorov distribution K.

q            20%    10%    5%     2%     1%
K^←(1 − q)   1.07   1.22   1.36   1.52   1.63
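The test is simple to carry out by hand; a minimal sketch (null distribution and names are ours — for a continuous G0 the supremum is attained at a jump of the empirical distribution function):

```python
import math

def ks_statistic(sample, cdf):
    # D_n = sup_y |Ghat_n(y) - G0(y)|; check both one-sided gaps at
    # every jump of the empirical distribution function
    ys = sorted(sample)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        g = cdf(y)
        d = max(d, i / n - g, g - (i - 1) / n)
    return d

# hypothetical null distribution G0: unit exponential
G0 = lambda y: 1.0 - math.exp(-y)

# a sample sitting exactly on the (i - 0.5)/n quantiles of G0 gives D_n = 0.5/n
n = 100
sample = [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
d_n = ks_statistic(sample, G0)
critical_5pct = 1.36 / math.sqrt(n)  # reject H0 on the 5% level if exceeded
```

Here Dn = 0.005 stays far below the 5% critical value 0.136, so H0 is not rejected, as expected for a sample constructed from G0 itself.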
Figure 3.23: KS test statistic for method of moments and MLE fits applied to the PP insurance data; lhs: log-normal distribution; rhs: log-gamma distribution.
Example 3.13 (KS test, tail distribution). In this example we investigate the tail
fits of the Pareto distributions in the CP and the PP examples for the n = 505
largest claims, see Examples 3.9 and 3.10. The results are presented in Figure 3.24.
Figure 3.24: Point-wise terms of KS test statistics for MLE fits applied to the 505
largest claims; lhs: PP insurance data; rhs: CP insurance data.
For the PP insurance data we obtain Dn = 0.027 (for α = 2.5) and for the CP insurance data Dn = 0.061 (for α = 1.4). The first value is sufficiently small so that the null hypothesis cannot be rejected on the 5% significance level; the CP insurance value is just about at the critical value on the 5% significance level, i.e. the resulting p-value is just about 5%. The plot of the point-wise terms of Dn looks fine for the PP insurance data, however the graph for the CP insurance data looks a bit one-sided, suggesting two different regimes (this can also be seen from Figure 3.20).
Theodore Wilbur Anderson and Donald Allan Darling have developed a modification of the KS test, the so-called AD test, which gives more weight to the tails of the distributions. It is therefore more sensitive in detecting tail fits, but on the other hand it has the disadvantage of not being non-parametric, and critical values need to be calculated for every chosen distribution function.

The KS test statistic is modified by the introduction of a weight function ψ : [0, 1] → R+ which then modifies the KS test statistic Dn as follows

sup_y |Ĝn(y) − G0(y)| √ψ(G0(y)).
X²_{n,K} = Σ_{k=1}^K (Ok − Ek)² / Ek.   (3.8)

If d parameters were estimated in G0, then X²_{n,K} is compared to a χ²-distribution with K − 1 − d degrees of freedom, see also Exercise 2 on page 21. Often it is suggested that we should have Ek > 4 for reasonable results. However, these rules of thumb are not very reliable.
Within the framework of MLE methods the Hirotugu Akaike (1927-2009) information criterion (AIC) and the Bayesian information criterion (BIC) are often used, we refer to Akaike [2] and Section 2.2 in Congdon [27]. These criteria are used to compare different distribution functions and densities. Assume we want to compare two different densities g1 and g2 that were fitted to the data Y = (Y1, . . . , Yn)′. The AIC is defined by

AIC^(i) = −2 ℓ_Y^(i) + 2 d^(i),

where ℓ_Y^(i) is the log-likelihood function of density gi for data Y and d^(i) denotes the number of estimated parameters in gi, for i = 1, 2. For MLE we maximize ℓ_Y^(i) and in order to evaluate the AIC we penalize the model for having too many parameters. The AIC then says that the model with the smallest AIC value should be preferred.
The BIC uses a different penalty term for the number of parameters (all these penalty terms are motivated by asymptotic results). It reads as

BIC^(i) = −2 ℓ_Y^(i) + log(n) d^(i),

and the model with the smallest BIC value should be preferred.
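Both criteria are one-liners. A small sketch with hypothetical log-likelihoods (the numbers and model labels below are made up for illustration):

```python
import math

def aic(loglik, d):
    # AIC = -2 * log-likelihood + 2 * number of estimated parameters
    return -2.0 * loglik + 2.0 * d

def bic(loglik, d, n):
    # BIC = -2 * log-likelihood + log(n) * number of estimated parameters
    return -2.0 * loglik + math.log(n) * d

# hypothetical nested models on n = 500 observations: model A has 3
# parameters and log-likelihood -1234.5, model B has 1 parameter and -1236.0
n = 500
aic_a, bic_a = aic(-1234.5, 3), bic(-1234.5, 3, n)
aic_b, bic_b = aic(-1236.0, 1), bic(-1236.0, 1, n)
# the small likelihood gain of model A does not justify the two extra
# parameters here: both criteria prefer model B (smaller value)
```

Note that BIC penalizes additional parameters more heavily than AIC as soon as log(n) > 2, i.e. for n ≥ 8.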
Exercise 5 (AIC and BIC). Assume we have claim sizes Y = (Y1 , . . . , Yn )0 with
n = 1000 which were generated by a gamma distribution, see Figure 3.25.
The sample mean and sample standard deviation are given by
Figure 3.25: Claim sizes Y = (Y1 , . . . , Yn )0 with n = 1000; lhs: observed data; rhs:
empirical distribution.
Figure 3.26: Fitted gamma distributions; lhs: log-log plot; rhs: QQ plot.
If we fit the parameters of the gamma distribution we obtain the method of moments estimators and the MLEs

γ̂^MM = 0.9794 and ĉ^MM = 9.4249,
γ̂^MLE = 1.0013 and ĉ^MLE = 9.6360.

This provides the fitted distributions displayed in Figure 3.26. The fits look perfect and the corresponding log-likelihoods are given by

ℓ_Y(γ̂^MM, ĉ^MM) = 1264.013 and ℓ_Y(γ̂^MLE, ĉ^MLE) = 1264.171.
(a) Why is ℓ_Y(γ̂^MLE, ĉ^MLE) > ℓ_Y(γ̂^MM, ĉ^MM) and which fit should be preferred according to AIC?

(b) The estimates of γ are very close to 1 and we could also use an exponential distribution function. For the exponential distribution function we obtain MLE ĉ^MLE = 9.6231 and ℓ_Y(ĉ^MLE) = 1264.169. Which model (gamma or exponential) should be preferred according to the AIC and the BIC?
distribution function to the entire range of possible outcomes of the claim sizes.
Therefore, we consider claim sizes in different layers. Another reason why different
layers of claim sizes are of interest is that re-insurance can often be bought for
different claims layers. For these reasons we would like to understand how claim
sizes behave in different layers. First we discuss the modeling issue and second we
describe modeling of re-insurance layers.
Assume that S ∼ CompPoi(λv, G). We consider the total claim Ssc in the small claims layer and the total claim Slc in the large claims layer given by

Ssc = Σ_{i=1}^N Yi 1{Yi ≤ M}   and   Slc = Σ_{i=1}^N Yi 1{Yi > M}.
Theorem 2.14 implies that Ssc and Slc are independent and compound Poisson distributed with

Ssc ∼ CompPoi (λsc v = λG(M)v , Gsc(y) = P[Y1 ≤ y | Y1 ≤ M])

and

Slc ∼ CompPoi (λlc v = λ(1 − G(M))v , Glc(y) = P[Y1 ≤ y | Y1 > M]).
Thus, we can model large claims and small claims separately. Observe that we
have the following decomposition
2. Estimate probability G(M ) of the event {Y1 ≤ M }.
3. Fit a Pareto distribution to Glc for threshold θ = M , i.e. estimate the tail
index α > 0 from the observations exceeding this threshold.
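Steps 2 and 3 can be sketched as follows (the synthetic portfolio and the function name are ours; step 1, the small claims layer, would be modeled e.g. by the empirical distribution below M):

```python
import math

def fit_layers(claims, M):
    # step 2: empirical estimate of G(M) = P[Y1 <= M]
    large = [y for y in claims if y > M]
    g_M = 1.0 - len(large) / len(claims)
    # step 3: Pareto MLE for the tail index with threshold theta = M,
    # applied to the observations exceeding M
    alpha = len(large) / sum(math.log(y / M) for y in large)
    return g_M, alpha

# synthetic portfolio: 1000 small claims below M plus 100 Pareto(M, 2.5)
# exceedances placed on a quantile grid
M = 500_000.0
small = [500.0 * (i + 1) for i in range(1000)]                  # all <= M
tail = [M * (1 - (i + 0.5) / 100) ** (-1 / 2.5) for i in range(100)]
g_M, alpha_hat = fit_layers(small + tail, M)
```

On this synthetic portfolio the estimate of 1 − G(M) is exactly 100/1100 and the tail index estimate recovers α = 2.5.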
Example 3.14. We revisit the PP and the CP insurance data set. We choose
Figure 3.27: Empirical fit in small claims layer and Pareto distribution fit in large
claims layer, the gray lines show the large claims threshold; lhs: PP insurance data;
rhs: CP insurance data.
large claims threshold M = 500 000 in both cases. In the PP insurance data set we have 237 observations above this threshold, which provides the estimate 1 − Ĝ(M) = 237/61 053 = 0.39%. For the CP insurance example we have 272 claims above this threshold, which provides the estimate 1 − Ĝ(M) = 1.87%. Next we calculate the sample mean and the sample coefficient of variation in the small claims layer {Yi ≤ M}:

PP: μ̂^PP_{Yi≤M} = 2 805,   V̂co^PP_{Yi≤M} = 1.80,
CP: μ̂^CP_{Yi≤M} = 4 377,   V̂co^CP_{Yi≤M} = 1.51.
fit in the large claims layer, having tail parameters α as estimated in Examples 3.9 and 3.10 (this is also supported by the KS tests, see Example 3.13). The results are presented in Figure 3.27. For PP insurance they look convincing, whereas the CP insurance fit is not entirely satisfactory in the large claims layer (which might call for a larger claims threshold M).
Above we have calculated expected values in claims layers which were given by E[Y 1{u1 < Y ≤ u2}] for various parametric distribution functions. This is of interest for several reasons which we are going to discuss next.
(i) The first reason is that insurance contracts often have a deductible. On the one hand small claims often cause too much administrative cost, and on the other hand deductibles are also an instrument to prevent fraud. For instance, it can become quite expensive if every insured claims that his umbrella got stolen. Therefore, a deductible d > 0 of, say, 200 CHF is introduced and the insurance company only covers the claim part (Y − d)+ that exceeds this deductible. In this case the pure risk premium is given by

E[(Y − d)+] = ∫_d^∞ (y − d) dG(y) = E[Y 1{Y > d}] − d P[Y > d]   (3.9)
= P[Y > d] (E[Y | Y > d] − d) = P[Y > d] e(d),

under the assumption that P[Y > d] > 0, where e(·) is the mean excess function of Y.
(ii) The second reason is that the insurance company may have a maximal insurance cover per contract, i.e. it covers claims up to a maximal size of M > 0 and the exceedances need to be paid by the insured; or, similarly, it may cover claims exceeding M but has a re-insurance cover for these exceedances. In that case the insurance company covers (Y ∧ M) and the pure risk premium for this (bounded) claim is given by

E[Y ∧ M] = ∫_0^M y dG(y) + M P[Y > M] = E[Y 1{Y ≤ M}] + M P[Y > M]
= E[Y] − E[Y 1{Y > M}] + M P[Y > M]
= E[Y] − P[Y > M] e(M) = E[Y] − E[(Y − M)+].
If we combine the deductibles with the maximal cover we obtain the excess-of-loss (XL) (re-)insurance treaty. Assume we have a deductible u1 > 0 (in re-insurance terminology this is also called priority or retention). Then the insurance treaty "u2 XL u1" covers the claims layer (u1, u1 + u2], that is, this contract covers a maximal excess of u2 above the priority u1. The pure risk premium for such contracts is then given by

E[Y 1{u1 < Y ≤ u1 + u2}].
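For a concrete claim size distribution these layer premiums reduce to one-dimensional integrals of the survival function, see (3.9). A numerical sketch for a Pareto claim (all parameter values are hypothetical):

```python
def pareto_sf(y, theta, alpha):
    # survival function P[Y > y] = (y/theta)^(-alpha) for y >= theta
    return (y / theta) ** (-alpha)

def stop_loss(d, theta, alpha, upper, steps=200_000):
    # E[(Y - d)_+] = integral of P[Y > y] over (d, infinity), cf. (3.9),
    # truncated at `upper` and evaluated with the midpoint rule
    h = (upper - d) / steps
    return h * sum(pareto_sf(d + (i + 0.5) * h, theta, alpha)
                   for i in range(steps))

theta, alpha, d = 100.0, 2.5, 500.0
numeric = stop_loss(d, theta, alpha, upper=5_000_000.0)
# closed form for d >= theta and alpha > 1: d * (d/theta)^(-alpha) / (alpha - 1)
closed = d * (d / theta) ** (-alpha) / (alpha - 1)
```

The numerical integral and the closed form agree up to the (small) discretization and truncation error; premiums for layers (u1, u1 + u2] follow analogously by integrating the survival function over the layer.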
An issue, when dealing with layers, is claims inflation. Assume we sell insurance
contracts with a deductible d > 0 and we ask for a pure risk premium E [(Y − d)+ ].
Since cash flows have time values this premium has to be revised carefully for later
periods as the following theorem shows.
Theorem 3.15 (leverage effect of claims inflation). Choose a fixed deductible d > 0 and assume that the claim at time 0 is given by Y0. Assume that there is a (deterministic) inflation index i > 0 such that the claim at time 1 can be represented in distribution as Y1 = (1 + i)Y0. We have

E[(Y1 − d)+] ≥ (1 + i) E[(Y0 − d)+].

Proof. We have

E[(Y1 − d)+] = ∫_0^∞ P[(Y1 − d)+ > y] dy = ∫_0^∞ P[Y1 > y + d] dy
= ∫_d^∞ P[Y1 > x] dx = ∫_d^∞ P[Y0 > x/(1 + i)] dx
= (1 + i) ∫_{d/(1+i)}^∞ P[Y0 > y] dy,

where we have twice applied a change of variables. The latter is calculated as follows

E[(Y1 − d)+] = (1 + i) ( ∫_{d/(1+i)}^d P[Y0 > y] dy + ∫_d^∞ P[Y0 > y] dy )
= (1 + i) ∫_{d/(1+i)}^d P[Y0 > y] dy + (1 + i) E[(Y0 − d)+].
Choose inflation index i > 0 such that θ(1 + i) < d. From (3.6) we obtain

Y1 = (1 + i)Y0 ∼ Pareto(θ(1 + i), α).

This provides

E[(Y1 − d)+] = d (d/(θ(1 + i)))^(−α) · 1/(α − 1) = (1 + i)^α d (d/θ)^(−α) · 1/(α − 1) > (1 + i) E[(Y0 − d)+].
We see that we obtain a strict inequality, i.e. the pure risk premium grows faster than the claim sizes themselves. The reason for this faster growth is that claims Y0 ≤ d may entitle the insured to claims payments after the claims inflation adjustment, i.e. not only the claim sizes grow under inflation but also the number of claims grows if one does not adapt the deductible to inflation.
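A quick numerical check of the leverage effect for a Pareto claim (the parameter values are hypothetical):

```python
def pareto_stop_loss(d, theta, alpha):
    # E[(Y - d)_+] = d * (d/theta)^(-alpha) / (alpha - 1)
    # for Y ~ Pareto(theta, alpha) with d >= theta and alpha > 1
    return d * (d / theta) ** (-alpha) / (alpha - 1)

theta, alpha, d, i = 10.0, 2.5, 100.0, 0.05
p0 = pareto_stop_loss(d, theta, alpha)               # premium at time 0
p1 = pareto_stop_loss(d, theta * (1 + i), alpha)     # Y1 ~ Pareto(theta*(1+i), alpha)
ratio = p1 / p0
# ratio = (1 + i)^alpha > 1 + i for alpha > 1: with a fixed deductible
# the pure risk premium grows faster than the claims inflation rate
```

With i = 5% and α = 2.5 the premium grows by about 13%, illustrating the strict inequality above.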
Chapter 4

Approximations for Compound Distributions
In Chapter 2 we have introduced several claims count distributions for the modeling of the number of claims N within a fixed time period. In Chapter 3 we have met several claim size distribution functions G for modeling the claim sizes Y1, Y2, . . . . Ultimately, we would always like to calculate the compound distribution function of S, see Definition 2.1. As explained in Proposition 2.2, we can easily calculate the moments and the moment generating function of this compound distribution. On the other hand the distribution function of S given in (2.1) is a notoriously difficult object because it involves (too many) convolutions of the claim size distribution function G. The aim of this chapter is to explain how we can circumvent this difficulty.
4.1 Approximations
In many cases approximations to S are used. This may be justified by the central
limit theorem (CLT) if the number of claims is large. Compound distributions may
have two different risk drivers in the tail of the distribution function, namely the
number of claims N may contribute to large values of S or single large claims in
Y1 , . . . , YN may drive extreme values in S. Let us concentrate on the compound
Poisson model, in particular, we would like to use the decomposition theorem in
the spirit of Example 2.16. In this case, mostly the claim sizes Yi contribute to
the tail of the distribution (if these are heavy-tailed). Therefore, we emphasize
that in the light of the compound Poisson model one should separate small from
large claims resulting in the independent decomposition S = Ssc + Slc . Then, if
the expected number of small claims vλsc is large, Ssc can be approximated by a
parametric distribution function and Slc should be modeled explicitly. This we are going to describe in detail in the remainder of this chapter.
4.1.1 Normal approximation

The normal approximation is motivated by the CLT which goes back to de Moivre (1733) and Laplace (1812), see (1.2). It was then Aleksandr Mikhailovich Lyapunov (1857-1918) who stated it in the general version and who discovered the importance of the CLT.

The classical CLT holds for a fixed number of claims. In our approach the number of claims is not fixed, therefore we need a refinement of the CLT. We do this for a Poissonian number of claims N by keeping the expected claims frequency λ fixed and by sending the volume v → ∞.
(S − λv E[Y1]) / √(λv E[Y1²]) ⇒ N(0, 1) as v → ∞.
We study the asymptotic behavior. For v → ∞ both numerator and denominator of the following expression go to zero, therefore we can apply l'Hôpital's rule:

lim_{v→∞} [ M_{Y1}(r/√(λvE[Y1²])) − 1 − rμ/√(λvE[Y1²]) ] / (λv)^(−1)
= lim_{v→∞} [ −M′_{Y1}(r/√(λvE[Y1²])) · r v^(−3/2)/(2√(λE[Y1²])) + r μ v^(−3/2)/(2√(λE[Y1²])) ] / ( −λ^(−1) v^(−2) )
= lim_{v→∞} r ( M′_{Y1}(r/√(λvE[Y1²])) − μ ) / ( 2 λ^(−1) v^(−1/2) √(λE[Y1²]) ).

Since numerator and denominator still converge to zero as v → ∞ we can apply l'Hôpital's rule once more and obtain

lim_{v→∞} r ( M′_{Y1}(r/√(λvE[Y1²])) − μ ) / ( 2 λ^(−1) v^(−1/2) √(λE[Y1²]) )
= lim_{v→∞} r M″_{Y1}(r/√(λvE[Y1²])) · ( −r v^(−3/2)/(2√(λE[Y1²])) ) / ( −λ^(−1) v^(−3/2) √(λE[Y1²]) )
= lim_{v→∞} (1/2) r² M″_{Y1}(r/√(λvE[Y1²])) / E[Y1²] = r²/2,

where the last step follows from (1.3). This last expression exactly reflects the moment generating function of the standard Gaussian distribution, see (1.4), therefore the claim follows from Lemma 1.4. 2
Theorem 4.1 is the motivation for the following approximation of the distribution function of S:

P[S ≤ x] = P[ (S − λv E[Y1]) / √(λv E[Y1²]) ≤ (x − λv E[Y1]) / √(λv E[Y1²]) ] ≈ Φ( (x − λv E[Y1]) / √(λv E[Y1²]) ).   (4.1)

This approximation should be used solely around the mean of S. For rates of convergence we refer to the literature, for instance, see Embrechts et al. [36].
Note that the normal approximation (4.1) also allows for negative claims S, which
under our model assumptions is excluded, thus, it is really an approximation that
needs to be considered carefully.
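The approximation (4.1) only needs the first two moments of Y1. A sketch with hypothetical moments (the values of E[Y1], E[Y1²] and λv below are made up for illustration):

```python
import math

def normal_approx(x, lam_v, m1, m2):
    # (4.1): P[S <= x] ~ Phi((x - lam*v*E[Y1]) / sqrt(lam*v*E[Y1^2]))
    z = (x - lam_v * m1) / math.sqrt(lam_v * m2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# hypothetical moments: E[Y1] = 3000, E[Y1^2] = 3e7, expected claim count 100
lam_v, m1, m2 = 100.0, 3000.0, 3.0e7
at_mean = normal_approx(lam_v * m1, lam_v, m1, m2)  # 0.5 by symmetry
p_negative = normal_approx(0.0, lam_v, m1, m2)      # mass put on S < 0
```

The second value quantifies the (here very small) probability mass the normal approximation assigns to a negative total claim amount, which our model assumptions exclude.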
Example 4.2 (Normal approximation for PP insurance). We revisit the PP insur-
ance data of Example 3.14. We consider 3 different examples:
(a) Only small claims: in this example we only consider claim size distribution
function G(y) = P [Y ≤ y|Y ≤ M ], i.e. the claims are compactly supported in
(0, M ]. As explicit claim size distribution we choose the empirical distribution
of Example 3.14, see Figure 3.27 (lhs), with M = 500 000. We choose portfolio
size v such that λv = 100.
(b) Claim size distribution function G is chosen as in (a), but this time we choose
portfolio size v such that λv = 1000.
(c) In addition to (b) we add the large claims layer modeled by a Pareto distri-
bution with M = 500 000 and α = 2.5 and for the expected number of large
claims we set λlc v = 3.9.
Figure 4.1: Compound Poisson distribution of S and normal approximation (4.1)
in case (a), i.e. no large claims, expected number of claims 100; lhs: distribution
function; rhs: log-log plot.
For simplicity the true distribution function is evaluated by Monte Carlo simula-
tion, which contradicts our statement above, but is appropriate for sufficiently large
samples (and sufficient patience). We choose 100’000 simulations, this is further
illustrated in Example 4.11 below.
In Figure 4.1 we present the results of the normal approximation (4.1) in case (a).
We observe an appropriately good fit around the mean but the normal approximation clearly under-estimates the tails of the true distribution function, see the log-log plot in Figure 4.1 (rhs). Moreover, the true distribution function has positive skewness ςS = 0.43 whereas the normal approximation has zero skewness. In the normal approximation we obtain probability mass Φ(−λvE[Y1]/√(λvE[Y1²])) = 6 · 10^(−7) for a negative total claim amount (which is fairly small).
In Figure 4.2 we show situation (b), which is the same as situation (a), the only change being that we enlarge the portfolio size by a factor of 10. We see better approximation properties due to the fact that we have convergence in distribution for portfolio size v → ∞. We observe a lower skewness ςS = 0.15 which improves the normal approximation, also in the tails.
Finally, in Figure 4.3 we also include large claims (in contrast to Figure 4.2) having
an expected number of large claims of 3.9 and a Pareto tail parameter of α = 2.5.
Figure 4.2: Compound Poisson distribution of S and normal approximation (4.1)
in case (b), i.e. no large claims, expected number of claims 1000; lhs: distribution
function; rhs: log-log plot.
We see that in this case the normal approximation is useless in the tail, which
strongly favors the large claims separation as suggested in Example 2.16.
sizes the approximation may be bad because the true distribution has substantial
skewness. This leads to the idea of approximating the small claims layer by other
distribution functions that also enjoy skewness.
We choose k ∈ R and define the random variable
and in the translated log-normal case

ςX = (e^(σ²) + 2)(e^(σ²) − 1)^(1/2).
The idea now is to do a fit of moments between S and X. Assume that S has finite third moment and then we choose

X = k + Z,   where Z ∼ Γ(γ, c) or Z ∼ LN(μ, σ²),

such that the three parameters of X fulfill
1. Prove that the fit of moments approximation (4.2) for a translated gamma distribution for X provides the following system of equations:

λv E[Y1] = k + γ/c,   λv E[Y1²] = γ/c²   and   E[Y1³] / ((λv)^(1/2) E[Y1²]^(3/2)) = 2 γ^(−1/2).

2. Solve this system of equations for k ∈ R, γ > 0 and c > 0 and prove that it has a well-defined solution for G(0) = 0.

3. Why should this approximation not be applied to case (c) of Example 4.2?
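For given mean, variance and skewness of S, the translated gamma fit of moments can be inverted explicitly; a numerical sketch (the target moments below are hypothetical):

```python
import math

def translated_gamma_fit(mean_s, var_s, skew_s):
    # invert the moment equations: the skewness of k + Gamma(gamma, c)
    # is 2/sqrt(gamma), its variance gamma/c^2 and its mean k + gamma/c
    gamma = 4.0 / skew_s ** 2
    c = math.sqrt(gamma / var_s)
    k = mean_s - gamma / c
    return k, gamma, c

# hypothetical compound moments
mean_s, var_s, skew_s = 300_000.0, 3.0e9, 0.43
k, gamma, c = translated_gamma_fit(mean_s, var_s, skew_s)

# the fitted X = k + Z reproduces the three target moments
fit_mean = k + gamma / c
fit_var = gamma / c ** 2
fit_skew = 2.0 / math.sqrt(gamma)
```

Note that the solution requires a strictly positive skewness of S, which holds under G(0) = 0.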
Figure 4.4: Compound Poisson distribution of S and normal approximation (4.1),
translated gamma and log-normal approximation (4.2) in case (a), i.e. no large
claims, expected number of claims 100; lhs: distribution function; rhs: log-log plot.
i.e. case (b), in Figure 4.5. In both cases we see that the translated gamma and log-
normal approximations provide remarkably good fits. For this reason, the small
claims layer is often approximated by one of these two parametric distribution
functions.
Observe that for k > λv we have a Chernoff type bound of (Stirling’s formula
provides asymptotic behavior k! = O(exp{k log(k/e)}) as k → ∞)
This explains that the compound Poisson distribution with bounded claim sizes
Yi ≤ M is less heavy tailed compared to the translated gamma and log-normal
distributions.
The KS test rejects the null hypothesis on the 5% significance level for the normal
approximation in both cases (a) and (b), whereas this is not the case for the
translated gamma and log-normal approximations in both cases (a) and (b), the
p-values are clearly bigger than 5%; for the exact p-values we refer to Table 4.1,
below. In case (a) the translated gamma approximation is favored, in case (b)
the translated log-normal approximation (though the differences in the latter are
negligible).
4.1.3 Edgeworth approximation
Z = (S − λv E[Y1]) / √(λv E[Y1²]).
We have E[Z] = 0, Var(Z) = 1 and ςZ = ςS , and in fact the latter identity applies
to all further normalized moments of Z and S. The aim now is to approximate
the moment generating function of Z by appropriate terms coming from normal
distributions. Therefore, we first consider the following Taylor expansion around
Using another Taylor expansion for e^x = 1 + x + x²/2! + . . . applied to the latter exponential function in the last expression, the moment generating function of Z is approximated by

MZ(r) ≈ e^(r²/2) ( 1 + Σ_{k=3}^n ak r^k + (Σ_{k=3}^n ak r^k)² / 2! + . . . ).
Lemma 4.4. Let Φ denote the standard Gaussian distribution function and Φ^(k) its k-th derivative. Then for k ∈ N0 and r ∈ R

r^k e^(r²/2) = (−1)^k ∫_{−∞}^∞ e^(rx) Φ^(k+1)(x) dx.
Note that the first term on the right-hand side is equal to zero because Φ^(k+1)(x) goes faster to zero than e^(rx) may possibly converge to infinity. This and the induction assumption for k provide the identity

(−1)^(k+1) ∫_{−∞}^∞ e^(rx) Φ^(k+2)(x) dx = r (−1)^k ∫_{−∞}^∞ e^(rx) Φ^(k+1)(x) dx = r · r^k e^(r²/2),
Lemma 4.4 allows us to rewrite approximation (4.3) as follows; set X ∼ N(0, 1), then

MZ(r) ≈ E[e^(rX)] − a3 ∫_{−∞}^∞ e^(rx) Φ^(4)(x) dx + Σ_{k≥4} bk (−1)^k ∫_{−∞}^∞ e^(rx) Φ^(k+1)(x) dx
= ∫_{−∞}^∞ e^(rx) ( Φ′(x) − a3 Φ^(4)(x) + Σ_{k≥4} bk (−1)^k Φ^(k+1)(x) ) dx.
Assume that Z has distribution function denoted by FZ, then the latter suggests the approximation, see Lemmas 1.2-1.3,

dFZ(z) ≈ ( Φ′(z) − a3 Φ^(4)(z) + Σ_{k≥4} bk (−1)^k Φ^(k+1)(z) ) dz.
Integration then provides the Edgeworth approximation; set x = √(λv E[Y1²]) z + λv E[Y1], then

P[S ≤ x] = FZ(z) ≈ EW(z) := Φ(z) − a3 Φ^(3)(z) + Σ_{k≥4} bk (−1)^k Φ^(k)(z).   (4.4)
This formula now highlights the refinement of the normal approximation (4.1), namely we correct the first order approximation Φ by higher order terms involving the skewness and further higher order terms reflected by a3 and bk in (4.4).

The Edgeworth approximation (4.4) is elegant but its use requires some care, as we are just going to highlight.
We first consider the derivatives Φ^(k) for k ≥ 1. The first derivative is given by

Φ′(z) = (1/√(2π)) e^(−z²/2),

and the higher order derivatives for k ≥ 2 are given by

Φ^(k)(z) = d^(k−1)/dz^(k−1) (1/√(2π)) e^(−z²/2) = O( z^(k−1) e^(−z²/2) ) for |z| → ∞.

From this we immediately see that
Attention. The issue with the Edgeworth approximation EW(z) is that it is not
necessarily a distribution function because it does not need to be monotone in z,
see Example 4.5, below!
Φ^(4)(z) = (z + 2z − z³) (1/√(2π)) e^(−z²/2) = (3z − z³) Φ′(z).

This implies

(d/dz) EW(z) = Φ′(z) − a3 Φ^(4)(z) = Φ′(z) ( 1 − 3 a3 z + a3 z³ ).   (4.5)
Consider the function h(z) = 1 − 3 a3 z + a3 z³ for positive skewness ςS > 0. Then we have

lim_{z→−∞} h(z) = −∞   and   lim_{z→∞} h(z) = ∞,

which explains that the derivative of EW(z) takes both signs and therefore EW(z) is not monotone. However, in the upper tail of the distribution of S, that is, for z sufficiently large, the Edgeworth approximation (4.5) is monotone and can be used as an appropriate approximation. We would like to emphasize that these monotonicity properties should always be carefully checked in the Edgeworth approximation.
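This monotonicity check is easy to automate; a sketch with a hypothetical value of a3 (for positive skewness, a3 > 0):

```python
import math

def ew_derivative(z, a3):
    # (4.5): d/dz EW(z) = Phi'(z) * (1 - 3*a3*z + a3*z^3)
    phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return phi * (1.0 - 3.0 * a3 * z + a3 * z ** 3)

a3 = 0.1  # hypothetical coefficient
grid = [z / 10.0 for z in range(-50, 51)]
vals = [ew_derivative(z, a3) for z in grid]
# the 'density' turns negative somewhere on the left (EW is not monotone
# there), but it stays positive in the upper tail
negative_somewhere = any(v < 0.0 for v in vals)
positive_upper_tail = all(ew_derivative(z, a3) > 0.0 for z in [3.0, 4.0, 5.0])
```

Such a scan of the sign of (4.5) over a grid makes the warning above operational before the approximation is used in the tail.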
We revisit the numerical examples given in Example 4.3.
In Figure 4.6 we give the approximation in case (a), i.e. expected number of claims
equal 100, and in Figure 4.7 we give the approximation in case (b), i.e. expected
number of claims equal 1000. In both cases we only choose the next additional
moment which is the skewness and refers to term a3 and we choose approximation
ez ≈ 1 + z in (4.4). We see in both cases that the Edgeworth approximation clearly
Figure 4.6: Compound Poisson distribution of S and normal approximation (4.1),
translated gamma approximation (4.2) and Edgeworth approximation (4.4) in case
(a), i.e. no large claims, expected number of claims 100; lhs: distribution function;
rhs: log-log plot.
Finally, in Table 4.1 we present the p-values of the different approximations re-
sulting from the KS test, see Section 3.3.1. In this particular case we see that
the translated gamma distribution is preferred in case (a), whereas in case (b) the
approximations are very similar. For this reason, one often chooses a translated
gamma distribution in practice. Note that the Edgeworth approximation can be
Figure 4.8: We compare the Edgeworth 'density' (4.5) to the Gaussian density; lhs: case (a), i.e. expected number of claims 100; rhs: case (b), i.e. expected number of claims 1000.
refined and improved by considering more terms in the Taylor expansion. This
closes the example.
pk = pk−1 (a + b/k) .
Bjørn Sundt and William S. Jewell (1932-2003) have characterized the Panjer distributions. This is exactly stated in the following lemma.
Lemma 4.7 (Sundt-Jewell [83]). Assume N has a non-degenerate Panjer distribution. Then N is either binomially, Poisson or negative-binomially distributed.
pk = pk−1 · (b/k) > 0   for all k ∈ N.

This is exactly the Poisson distribution with parameters a = 0 and b = λv > 0 for A = N0, because for the Poisson distribution we have, see Section 2.2.2, pk/pk−1 = λv/k.
Case (ii). Assume a < 0. To have positive probabilities we need to make sure that a + b/k remains positive for all k ∈ A. This requires |A| < ∞. We denote the maximal value in A by v ∈ N (assuming it has pv > 0). The positivity constraint then provides b/v > −a and a + b/(v + 1) = 0. The latter implies that pk = 0 for all k > v and is equivalent to the requirement v = −(a + b)/a > 0. We set p = −a/(1 − a) ∈ (0, 1) which provides

pk = pk−1 (a + b/k) = pk−1 ( −p/(1 − p) + b/k ).

This is exactly the binomial distribution with parameters a = −p/(1 − p) and b = (v + 1)p/(1 − p) and A = {0, . . . , v}.
Case (iii). Assume a > 0. In this case we define γ = (a + b)/a > 0. This provides b = a(γ − 1) and

pk = pk−1 (a + b/k) = pk−1 a ( 1 + (γ − 1)/k ).

Since the latter should be summable in order to obtain a well-defined distribution function we need to have a < 1. For the negative-binomial distribution we have, see Proposition 2.20,

pk/pk−1 = (1 − p)(k + γ − 1)/k = 1 − p + (1 − p)(γ − 1)/k.

This is exactly the negative-binomial distribution with parameters a = 1 − p and b = (1 − p)(γ − 1) and A = N0. This proves the lemma. 2
The previous lemma shows that the (important) claims count distributions that we have considered in Chapter 2 are Panjer distributions, and the corresponding choices a, b ∈ R are provided in the proof of Lemma 4.7.
rameters a, b ∈ R and the claim size distribution G is discrete with support N. Denote gm = P[Y1 = m] for m ∈ N. Then we have for r ∈ N0

fr := P[S = r] = p0 for r = 0,   and   fr = Σ_{k=1}^r (a + b k/r) gk f_{r−k} for r > 0.
Remarks.
• The Panjer algorithm requires a Panjer distribution for N and strictly positive and discrete claim sizes Yi ∈ N, P-a.s. Then it provides an algorithm that easily allows us to calculate the compound distribution without doing the convolutions. For the compound Poisson distribution the recursion reads fr = Σ_{k=1}^r λv (k/r) gk f_{r−k}, which provides

f0 = p0 = e^(−λv),
f1 = λv g1 f0,
f2 = (1/2) λv g1 f1 + λv g2 f0,
f3 = (1/3) λv g1 f2 + (2/3) λv g2 f1 + λv g3 f0,
. . .
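The recursion above takes only a few lines of code; a sketch for the compound Poisson case (the claim size distribution g below is hypothetical):

```python
import math

def panjer_compound_poisson(lam_v, g, r_max):
    # Panjer recursion with a = 0, b = lam_v:
    # f_0 = exp(-lam_v), f_r = sum_{k=1}^r lam_v * (k/r) * g_k * f_{r-k}
    f = [math.exp(-lam_v)]
    for r in range(1, r_max + 1):
        f.append(sum(lam_v * (k / r) * g.get(k, 0.0) * f[r - k]
                     for k in range(1, r + 1)))
    return f

lam_v = 2.0
g = {1: 0.5, 2: 0.3, 3: 0.2}   # hypothetical discrete claim sizes on N
f = panjer_compound_poisson(lam_v, g, 50)
# f reproduces the first steps displayed above, e.g. f_1 = lam_v * g_1 * f_0,
# and the probabilities sum to (essentially) one
```

The cost is O(r_max²) operations, as opposed to the exponential cost of evaluating all convolutions directly.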
In order to prove Theorem 4.8 we need a technical lemma, see also Lemma 1.5 in Schmidli [81].

(i) E[ Y1 | Σ_{i=1}^n Yi = r ] = r/n,

where, of course, we assume positive probability of the event {Σ_{i=1}^n Yi = r}.
(ii) For r ≥ n ≥ 2

pn g_r^{*n} = Σ_{k=1}^{r−1} (a + b k/r) gk pn−1 g_{r−k}^{*(n−1)}.

For the proof of (ii) we compute

Σ_{k=1}^{r−1} (a + b k/r) gk pn−1 g_{r−k}^{*(n−1)} = pn−1 Σ_{k=1}^{r−1} (a + b k/r) P[ Y1 = k | Σ_{i=1}^n Yi = r ] g_r^{*n}.

Observe that the last line is exactly the conditional expectation of a + b Y1/r, conditioned on the event {Σ_{i=1}^n Yi = r}, supposed that g_r^{*n} > 0. Therefore, we can apply (i) of this lemma which provides
" #
r−1 n
X k ∗(n−1) Y1 X b
a+b gk pn−1 gr−k = pn−1 E a + b Yi = r gr∗n = pn−1 a + gr∗n = pn gr∗n ,
r r n
k=1 i=1
where in the last step we have used that (pn )n≥0 is a Panjer distribution. 2
P[S = 0] = P[N = 0] = p0. Then, for r ≥ 1, using Lemma 4.9 in the third step,

fr = Σ_{n=1}^r pn g_r^{*n} = p1 gr + Σ_{n=2}^r pn g_r^{*n}
= p1 gr + Σ_{n=2}^r Σ_{k=1}^{r−1} (a + b k/r) gk pn−1 g_{r−k}^{*(n−1)}
= p1 gr + Σ_{k=1}^{r−1} (a + b k/r) gk Σ_{n=2}^r pn−1 g_{r−k}^{*(n−1)}
= p1 gr + Σ_{k=1}^{r−1} (a + b k/r) gk Σ_{n=1}^{r−k} pn g_{r−k}^{*(n)}
= p1 gr + Σ_{k=1}^{r−1} (a + b k/r) gk f_{r−k},

where in the second last step we have used that g_m^{*(n)} = 0 for n > m. Observe that p1 gr = p0 (a + b) gr = f0 (a + b) gr. Therefore the right-hand side is transformed to

fr = p0 (a + b) gr + Σ_{k=1}^{r−1} (a + b k/r) gk f_{r−k} = Σ_{k=1}^r (a + b k/r) gk f_{r−k},
Remarks.
• In practical applications the situation may occur that the initial value $f_0$ cannot be represented on the IT systems. This has to do with the fact that we can represent numbers only up to some precision. Let us explain this using the compound Poisson version (4.6) of the Panjer algorithm. If the expected number of claims $\lambda v$ is very large, then on IT systems the initial value $f_0 = p_0 = e^{-\lambda v}$ may be interpreted as zero and thus the algorithm cannot start, due to the missing precision for a meaningful starting value. This is called numerical underflow.
In this case we can modify the Panjer algorithm as follows: choose any strictly positive starting value $\widetilde{f}_0 > 0$ and develop the iteration
$$\widetilde{f}_r = \sum_{k=1}^{r} \left(a + b\,\frac{k}{r}\right) g_k\, \widetilde{f}_{r-k} \qquad \text{for } r \geq 1.$$
Observe that this provides a multiplicative shift from $f_r$ to $\widetilde{f}_r$. The true probability weights are then found by
$$f_r = \exp\left\{\log \widetilde{f}_r + \log f_0 - \log \widetilde{f}_0\right\},$$
where in the compound Poisson case $\log f_0 = -\lambda v$ can be evaluated without underflow.
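A minimal sketch of this shifted iteration (the parameters are an assumed example): the shifted run starts at $\widetilde{f}_0 = 1$, and the true weights are recovered on the log scale and compared with a direct run.

```python
import math

def panjer_from(lam_v, g, r_max, f0):
    """Compound Poisson Panjer recursion started from an arbitrary value f0."""
    f = [f0]
    for r in range(1, r_max + 1):
        s = sum(k * g.get(k, 0.0) * f[r - k] for k in range(1, r + 1))
        f.append(lam_v / r * s)
    return f

lam_v, g = 30.0, {1: 0.6, 2: 0.4}        # assumed example parameters
ft = panjer_from(lam_v, g, 150, f0=1.0)  # shifted run with f~_0 = 1 > 0
# undo the multiplicative shift on the log scale:
# f_r = exp(log f~_r + log f_0 - log f~_0), with log f_0 = -lam_v, log f~_0 = 0
f_shift = [math.exp(math.log(x) - lam_v) for x in ft]
f_direct = panjer_from(lam_v, g, 150, f0=math.exp(-lam_v))
err = max(abs(a - b) for a, b in zip(f_shift, f_direct))
```

Note that for very large $\lambda v$ the shifted values $\widetilde{f}_r$ can themselves overflow; production implementations therefore rescale repeatedly during the recursion.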
If the claim sizes live on a grid with span $d > 0$ we can always rescale, because
$$P[S = dr] = P\!\left[\sum_{i=1}^{N} Y_i = dr\right] = P\!\left[\sum_{i=1}^{N} Y_i/d = r\right] = P\!\left[\sum_{i=1}^{N} \widetilde{Y}_i = r\right],$$
with $\widetilde{Y}_i = Y_i/d \in \mathbb{N}$.
$$g_{k+1}^{+} = P\!\left[Y_1^{+} = (k+1)d\right] = G((k+1)d) - G(kd), \tag{4.7}$$
and
$$g_{k}^{-} = P\!\left[Y_1^{-} = kd\right] = G((k+1)d) - G(kd). \tag{4.8}$$
This provides the following stochastic ordering
$$Y_1^{-} \leq Y_1 \leq Y_1^{+}, \qquad \text{and} \qquad S^{-} = \sum_{i=1}^{N} Y_i^{-} \;\leq\; S = \sum_{i=1}^{N} Y_i \;\leq\; S^{+} = \sum_{i=1}^{N} Y_i^{+},$$
for Yi− being i.i.d. copies of Y1− and Yi+ being i.i.d. copies of Y1+ (also inde-
pendent of N ). Thus, we get lower and upper bounds S − ≤ S ≤ S + which
become more narrow the smaller we choose the span d. In most applications,
especially for small λv, these bounds/approximations are sufficient compared
to the other uncertainties involved in the prediction process (parameter esti-
mation uncertainty, etc.).
To $S^{+}$ we can directly apply the Panjer algorithm; $S^{-}$ is more subtle because it may happen that $g_0^{-} > 0$ and, thus, the Panjer algorithm cannot be applied in its classical form. In the case of the compound Poisson distribution this problem can be circumvented quite easily due to the disjoint decomposition
theorem, Theorem 2.14, which says that
$$S^{-} = \sum_{i=1}^{N} Y_i^{-} = \sum_{i=1}^{N} Y_i^{-}\, 1_{\{Y_i^{-} > 0\}} \overset{(d)}{=} \widetilde{S}^{-},$$
where $\widetilde{S}^{-}$ is compound Poisson distributed with expected number of claims $\lambda v\, P[Y_1^{-} > 0]$ and i.i.d. strictly positive claim sizes with distribution $P[Y_1^{-} \in \cdot \,|\, Y_1^{-} > 0]$, so that the Panjer algorithm can be applied to $\widetilde{S}^{-}$.
Of course, there are more sophisticated discretization methods but often our
(rough) proposal is sufficient.
Example 4.10. We consider a compound Poisson model with expected number of claims $\lambda v = 1$ and Pareto claim size distribution $Y_i \overset{\text{i.i.d.}}{\sim} \text{Pareto}(\theta, \alpha)$ with $\theta = 500\,000$ and $\alpha = 2.5$. In a first step we
need to discretize the claim sizes. We calculate the distributions of Yi− ≤ Yi ≤ Yi+
according to (4.7) and (4.8) with
$$g_k^{-} = g_{k+1}^{+} = G((k+1)d) - G(kd) = \left(\frac{kd}{\theta}\right)^{-\alpha} - \left(\frac{(k+1)d}{\theta}\right)^{-\alpha} \qquad \text{for } kd \geq \theta.$$
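A sketch of this discretization in code, under the Pareto assumption of this example (the function and variable names are illustrative):

```python
def pareto_cdf(x, theta, alpha):
    """G(x) = 1 - (x/theta)^(-alpha) for x > theta, and 0 otherwise."""
    return 0.0 if x <= theta else 1.0 - (x / theta) ** (-alpha)

def discretize(theta, alpha, d, k_max):
    """Lower/upper discretizations (4.8) and (4.7) on the grid {0, d, 2d, ...}:
    g_minus[k] = P[Y^- = k d],  g_plus[k] = P[Y^+ = k d] = g_minus[k-1]."""
    g_minus = [pareto_cdf((k + 1) * d, theta, alpha) - pareto_cdf(k * d, theta, alpha)
               for k in range(k_max + 1)]
    g_plus = [0.0] + g_minus[:-1]   # shifts all mass one grid point upwards
    return g_minus, g_plus

g_minus, g_plus = discretize(500_000, 2.5, 10_000, 2_000)
```

By construction $Y_1^{-}$ is stochastically dominated by $Y_1^{+}$, which shows up in the means of the two discretized distributions.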
Figure 4.9: Discretized claim size distributions (gk− )k and (gk+ )k ; lhs: case (i) with
span d = 100 000; rhs: case (ii) with span d = 10 000.
As span size we choose two different values: (i) d = 100 000 and (ii) d = 10 000. In
Figure 4.9 we plot the resulting probability weights (gk− )k and (gk+ )k . We see that
the discretization error disappears for decreasing span d.
We then implement the Panjer algorithm in R. The implementation is rather
straightforward. In a first step we invert the ordering in the claim size distributions
(gk− )k and (gk+ )k so that in the second step we can apply matrix multiplications.
This looks as follows:
The results are presented in Figures 4.10 and 4.11.
In Figure 4.10 we plot the resulting probability weights of the (discretized) compound Poisson distribution; the left-hand side gives the picture for span $d = 100\,000$ and the right-hand side for $d = 10\,000$. We observe that span $d = 100\,000$ gives quite some difference between the lower and upper bounds reflected by $(g_k^-)_k$ and $(g_k^+)_k$; for span $d = 10\,000$ they are sufficiently close so that we obtain appropriate approximations to the continuous Pareto distribution case. We also observe that the resulting distribution has two obvious modes, see Figure 4.10 (rhs); these reflect the cases of having $N = 1$ claim and $N = 2$ claims, while the cases $N \geq 3$ only give smaller discontinuities.
Finally, in Figure 4.11 we show the log-log plots of the distribution functions. The
straight blue line reflects the Pareto distribution Y1 ∼ Pareto(θ, α), i.e. of having
exactly one claim with tail parameter α = 2.5 (which corresponds to the negative
slope of the blue line). We observe that asymptotically the compound Poisson
distribution with λv = 1 coincides with the Pareto claim size distribution.
Figure 4.11: Log-log plot of compound Poisson distribution with λv = 1 from
Panjer algorithm; lhs: case (i) with span d = 100 000; rhs: case (ii) with span
d = 10 000.
Example 4.11. We revisit case (c) of Example 4.2, that is, for large claims Slc we
assume a compound Poisson distribution with expected number of claims λlc v = 3.9
and Pareto(θ, α) claim size distribution with θ = 500 000 and α = 2.5. We choose
the same discretizations as in Example 4.10, see Figure 4.9, and then we apply the
Panjer algorithm to the large claims layer as explained above. The results for the
distribution of Slc are presented in Figures 4.12 and 4.13.
Figure 4.13: Log-log plot of compound Poisson distribution with λlc v = 3.9 from
Panjer algorithm; lhs: case (i) with span d = 100 000; rhs: case (ii) with span
d = 10 000.
The results are very much in line with the ones of Example 4.10 and we should go for span $d = 10\,000$, which gives a sufficiently good approximation to the continuous Pareto claim size distribution. Observe that due to $\lambda_{lc} v = 3.9$ the resulting compound Poisson distribution has more modes now, see Figure 4.12 (rhs). In Figure 4.13 we see that the asymptotic behavior is sandwiched between the Pareto distribution Pareto$(\theta, \alpha)$ with tail parameter $\alpha = 2.5$ and this Pareto distribution stretched with the expected number of claims $\lambda_{lc} v = 3.9$ (blue lines in Figure 4.13). We also observe a very slow convergence to the asymptotic slope $-\alpha$, which tells us that parameter estimation is a difficult task if only few observations are available.
Finally, we merge the large claims layer $S_{lc}$ of case (c) in Example 4.2 with the corresponding small claims layer $S_{sc}$, see case (b) of Example 4.2. In the small claims layer we choose a translated gamma distribution as approximation to the distribution of $S_{sc}$, and we consider
$$S \approx X_{sc} + S_{lc}, \tag{4.9}$$
where $X_{sc}$ is the translated gamma approximation to $S_{sc}$ (see Example 2.16 and (4.2)) and $S_{lc}$ models the large claims layer having a compound Poisson distribution with Pareto claim sizes as described above.
In order to calculate the compound Poisson random variable Slc we apply the Panjer
algorithm with span d = 10 000. The disjoint decomposition theorem, see Theorem
2.14 and Example 2.16, implies that in the compound Poisson case we may and
will assume that the large claims separation leads to an independent decoupling
of Ssc and Slc , and Xsc and Slc , respectively, see (4.9). Therefore, the aggregate
distribution of $X_{sc} + S_{lc}$ is obtained by a simple convolution of the marginal distributions of $X_{sc}$ and $S_{lc}$.

Figure 4.14: Case (c) of Example 4.2: exact discretized distribution Xsc + Slc for span d = 10 000, Monte Carlo approximation and normal approximation (only rhs). lhs: discrete probability weights (upper and lower bounds); rhs: log-log plot (see also Figure 4.3 (rhs)).

Using a further discretization of the distribution function of $X_{sc}$ to the same span $d = 10\,000$ as in the Panjer algorithm for $S_{lc}$, the convolution of $X_{sc} + S_{lc}$ can easily be calculated analytically, i.e. no Monte Carlo simulation is needed. Namely, denote the discrete probability weights of $X_{sc}$ by $(f_k^{(1)})_{k\geq 0}$ and the discrete probability weights of $S_{lc}$ by $(f_k^{(2)})_{k\geq 0}$, i.e. set
$$P[X_{sc} = kd] = f_k^{(1)} \qquad \text{and} \qquad P[S_{lc} = kd] = f_k^{(2)},$$
so that, by independence,
$$f_r = P[X_{sc} + S_{lc} = rd] = \sum_{k=0}^{r} f_k^{(1)}\, f_{r-k}^{(2)}. \tag{4.10}$$
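A minimal numerical sketch of this convolution step, with two short made-up weight vectors standing in for the discretized distributions of $X_{sc}$ and $S_{lc}$:

```python
def convolve(f1, f2):
    """Discrete convolution: f[r] = sum_k f1[k] * f2[r - k]."""
    f = [0.0] * (len(f1) + len(f2) - 1)
    for k, a in enumerate(f1):
        for j, b in enumerate(f2):
            f[k + j] += a * b
    return f

f1 = [0.2, 0.5, 0.3]    # made-up weights standing in for P[X_sc = k d]
f2 = [0.6, 0.3, 0.1]    # made-up weights standing in for P[S_lc = k d]
f = convolve(f1, f2)    # weights of the sum X_sc + S_lc on the same grid
```

Since both inputs are probability distributions, the output weights again sum to 1.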
The results are presented in Figure 4.14. On the left-hand side we present the
probability weights (fr )r≥0 and on the right-hand side the log-log plot of the re-
sulting distribution function. We observe that the Monte Carlo approximation
(100’000 simulations) has bad properties in the tail of the distribution, see Figure
4.14 (rhs), and one should avoid the simulation approach if possible. Especially for heavy-tailed distribution functions the Monte Carlo simulation approach converges slowly. Note that convolution (4.10) is exact
up to the discretization error, and in some sense this discretized version can be
interpreted as an optimal Monte Carlo sample with equidistant observations.
99.5%-VaR lower bound 40 0380 500
99.5%-VaR − E[S] ≈ 9120 500
Table 4.2: Resulting key figures; the 99.5%-VaR corresponds to the 99.5%-quantile of $S$, see Example 6.25 below. The 99.5%-VaR is calculated with the discretized version with span $d = 10\,000$, therefore we obtain upper and lower bounds resulting from the discretization error.
Finally, in Table 4.2 we present the resulting key figures. We observe that the resulting distribution is substantially more heavy-tailed than the Gaussian distribution, which is not surprising in view of Figure 4.14 (rhs).
We only briefly sketch the fast Fourier transform (FFT) in order to explain the
main idea. Therefore, we follow Section 6.7 in Panjer [74].
In Chapter 1 we have introduced the Laplace-Stieltjes transform of $X \sim F$ given by
$$\widehat{m}_F(r) = M_X(-r) = \mathbb{E}\left[e^{-rX}\right] = \int_{\mathbb{R}} e^{-rx}\, dF(x).$$
The beauty of such transforms is that they allow for dealing elegantly with inde-
pendent random variables, in the sense that convolutions turn into products, i.e. for
X and Y independent we have
MX+Y (−r) = MX (−r)MY (−r).
Moreover, for compound distributed random variables S we have, see Proposition
2.2,
MS (−r) = MN (log MY1 (−r)). (4.11)
If we manage to identify the right-hand side of the latter equation, that is, find Z
such that MN (log MY1 (−r)) = MZ (−r), then Lemma 1.2 explains that S and Z
have the same distribution function and we do not need to perform the convolutions
(if Z is sufficiently explicit). This is also the idea behind the FFT.
Assume we have finite support $A = \{0, \ldots, n-1\}$ and that $(f_l)_{l\in A}$ is a discrete distribution on $A$. The discrete Fourier transform of $(f_l)_l$ is defined by
$$\hat{f}_z = \sum_{l=0}^{n-1} f_l \exp\left\{2\pi i\, \frac{zl}{n}\right\} \qquad \text{for } z \in A.$$
Assume $S \sim (f_l)_l$; then we have, by a slight abuse of notation,
$$\hat{f}_z = M_S\!\left(2\pi i\,\frac{z}{n}\right) = \mathbb{E}\left[\exp\left\{2\pi i\,\frac{zS}{n}\right\}\right].$$
The discrete Fourier transform has the following nice inversion formula
$$f_l = \frac{1}{n} \sum_{z=0}^{n-1} \hat{f}_z \exp\left\{-2\pi i\,\frac{zl}{n}\right\} \qquad \text{for } l \in A.$$
This now provides the idea for the first part of the algorithm: if we are able to explicitly calculate the discrete Fourier transform $(\hat{f}_z)_z$, the inversion formula provides the probability weights $(f_l)_l$. Note that this idea also applies if $(f_l)_l$ are weights that do not necessarily add up to 1. This gives the following recipe.

• Step 1. Choose a sufficiently large support size $n$, typically a power of 2 so that the FFT below applies, and set $A = \{0, \ldots, n-1\}$.
• Step 2. Discretize the claim severity distribution $G$ to obtain weights $(g_k)_{k\in A}$; for the discretization we refer to the last section on the Panjer algorithm. Note that typically we have $\sum_{k\in A} g_k < 1$, because claims $Y_i$ may exceed the threshold $n-1$ with positive probability.
• Step 3. Calculate the discrete Fourier transform $(\hat{f}_z)_{z\in A}$ of $S \sim (f_l)_{l\in A}$ using identity (4.11) with $-r = 2\pi i z/n$. Note that $\sum_{k\in A} g_k < 1$ does not harm the calculation since it will simply cancel out in the next step, because there is only a scaling factor missing.
• Step 4. Apply the inversion formula to obtain (fl )l∈A from (fˆz )z∈A .
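For a compound Poisson distribution we have $M_N(\log \hat{g}_z) = \exp\{\lambda v(\hat{g}_z - 1)\}$, so the recipe above can be sketched as follows. A plain $O(n^2)$ discrete Fourier transform is used for transparency (a real implementation would use an FFT routine), and the claim size weights are made up for the example:

```python
import cmath, math

def dft(x, sign=1):
    """Plain O(n^2) discrete Fourier transform; sign=-1 gives the inversion sum."""
    n = len(x)
    return [sum(x[l] * cmath.exp(sign * 2j * math.pi * z * l / n) for l in range(n))
            for z in range(n)]

n, lam_v = 256, 2.0                     # Step 1: support A = {0, ..., n-1}
g = [0.0] * n
g[1], g[2], g[3] = 0.5, 0.3, 0.2        # Step 2: made-up discretized claim sizes

g_hat = dft(g)                                         # Step 3: transform of (g_k)
f_hat = [cmath.exp(lam_v * (gz - 1)) for gz in g_hat]  # M_N(log g_hat) for Poisson N
f = [abs(v) / n for v in dft(f_hat, sign=-1)]          # Step 4: inversion formula
```

The result agrees with the Panjer recursion, e.g. $f_0 = e^{-\lambda v}$ and $f_1 = \lambda v\, g_1 f_0$; note that $n$ must be large relative to the bulk of $S$ to keep the wrap-around (aliasing) error negligible.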
The remaining part now is the FFT which explains how we calculate the discrete
Fourier transform (ĝz )z∈A of Y1 ∼ (gl )l∈A which is needed to apply identity (4.11)
for $-r = 2\pi i z/n$, i.e. for
$$\hat{f}_z = M_S\!\left(2\pi i\,\frac{z}{n}\right) = M_N\!\left(\log M_{Y_1}\!\left(2\pi i\,\frac{z}{n}\right)\right) = M_N\!\left(\log \hat{g}_z\right).$$
There is a nice recursive algorithm that allows to calculate these discrete Fourier
transforms for the choices n = 2d , d ∈ N0 .
$$\begin{aligned}
\hat{g}_z &= \sum_{l=0}^{2^d-1} g_l \exp\left\{2\pi i\,\frac{zl}{2^d}\right\} \\
&= \sum_{l=0}^{2^{d-1}-1} g_{2l} \exp\left\{2\pi i\,\frac{2zl}{2^d}\right\} + \sum_{l=0}^{2^{d-1}-1} g_{2l+1} \exp\left\{2\pi i\,\frac{z(2l+1)}{2^d}\right\} \\
&= \sum_{l=0}^{2^{d-1}-1} g_{2l} \exp\left\{2\pi i\,\frac{zl}{2^{d-1}}\right\} + \exp\left\{2\pi i\,\frac{z}{2^d}\right\} \sum_{l=0}^{2^{d-1}-1} g_{2l+1} \exp\left\{2\pi i\,\frac{zl}{2^{d-1}}\right\} \\
&= \hat{g}_z^{(0)} + \exp\left\{2\pi i\,\frac{z}{2^d}\right\} \hat{g}_z^{(1)},
\end{aligned}$$
where $\hat{g}_z^{(0)}$ is the discrete Fourier transform of $(g_l^{(0)})_{l=0,\ldots,m-1} = (g_{2l})_{l=0,\ldots,m-1}$ and $\hat{g}_z^{(1)}$ is the discrete Fourier transform of $(g_l^{(1)})_{l=0,\ldots,m-1} = (g_{2l+1})_{l=0,\ldots,m-1}$ for $m = 2^{d-1}$. This can now be iterated until we have reduced the total length $2^d$ to $2^0 = 1$. Observe that the total length of $(\hat{f}_z)_z$ is also $n = 2^d$. Therefore, exactly the same recursive algorithm can also be applied for the calculation of the inversion formula to obtain $(f_l)_l$.
Ruin theory has its origin in the early twentieth century when Ernst Filip Oskar Lundberg (1876-1965) [62] wrote his famous Uppsala PhD thesis in 1903. It was later the distinguished Swedish mathematician and actuary Harald Cramér (1893-1985) [28, 29] who developed the cornerstones in collective risk and ruin theory and made many of Lundberg's ideas mathematically rigorous. Therefore, the underlying process studied in ruin theory is called Cramér-Lundberg process. For the collected work of Cramér we refer to [30]. Since then a vast literature has developed in this field; important contributions are Feller [42], Bühlmann [19], Rolski et al. [79], Asmussen-Albrecher [7], Dickson [34], Kaas et al. [57] and many scientific papers by Hans-Ulrich Gerber and Elias S.W. Shiu. Therefore, this theory is sometimes also called Gerber-Shiu risk theory, see Kyprianou [59].
Because it is not our intention to write another textbook
on ruin theory we keep this chapter rather short and
only give some key results. In particular, we investigate
Chapter 5. Ruin Theory in Discrete Time
Definition 5.1 (surplus process). The surplus process $(C_t)_{t\in\mathbb{N}_0}$ is defined by
$$C_t = c_0 + \sum_{s=1}^{t} (\pi_s - S_s),$$
for initial capital $C_0 = c_0 \geq 0$ at time 0 and an i.i.d. sequence $(\pi_t, S_t)_{t\in\mathbb{N}}$ with:

• $\pi_t$ and $S_t$ are independent for all $t \in \mathbb{N}$;

• $S_t \geq 0$, $P$-a.s.

The independence assumption between premium $\pi_t$ and claim $S_t$ is not necessary but it may simplify calculations.
The surplus process $(C_t)_{t\in\mathbb{N}_0}$ models the equity or the net asset value process of an insurance company which starts with (deterministic) initial capital $C_0 = c_0 \geq 0$, collects every year a premium $\pi_t$ and pays for the corresponding (non-negative) claim $S_t$. At first sight it looks artificial to model the premium $\pi_t$ stochastically. The reason is that some results in ruin theory are derived under randomized premia. The ultimate goal is to achieve
$$C_t \geq 0 \qquad \text{for all } t \geq 0,$$
otherwise the company cannot fulfill its liabilities at some point in time $t \in \mathbb{N}_0$. In the present set-up we look at a homogeneous surplus process (having independent and stationary increments $X_t = \pi_t - S_t$). Moreover, no financial return on assets is considered. Of course, this is a rather synthetic situation. For the present purpose it is sufficient because it already highlights crucial issues, and it will be refined for solvency considerations in Chapter 10.
Definition 5.2 (ruin time and finite horizon ruin probability). We define the ruin
time τ of the surplus process (Ct )t∈N0 by
τ = inf {s ∈ N0 ; Cs < 0} ≤ ∞.
The finite horizon ruin probability up to time $t \in \mathbb{N}$ and for initial capital $c_0 \geq 0$ is defined by
$$\psi_t(c_0) = P\left[\left.\tau \leq t\,\right| C_0 = c_0\right] = P\left[\inf_{s=0,\ldots,t} C_s^{(c_0)} < 0\right].$$
Remark on the notation. Below we use that for $c_0 = 0$ the stochastic process $(C_t^{(0)})_{t\in\mathbb{N}_0} = (C_t)_{t\in\mathbb{N}_0}$ is a random walk on the probability space $(\Omega, \mathcal{F}, P)$ starting at zero. The general surplus process can then be described by $(C_t^{(c_0)})_{t\in\mathbb{N}_0} = (C_t^{(0)} + c_0)_{t\in\mathbb{N}_0}$ under $P$ and, as stated in Definition 5.2, we can indicate the initial capital by using the notation $P[\,\cdot\,|C_0 = c_0]$. In Markov process theory it has become customary to write the latter as $P_{c_0}[\,\cdot\,]$, meaning that $(C_t)_{t\in\mathbb{N}_0}$ under $P_{c_0}$ is equal in law to $(C_t^{(0)} + c_0)_{t\in\mathbb{N}_0}$ under $P$.
Note that $\{\tau \leq t\}$ is determined by $C_0, \ldots, C_t$, and therefore $\tau$ is a stopping time w.r.t. the filtration generated by $(C_t)_{t\in\mathbb{N}_0}$. To consider the limiting case $t \to \infty$ we need to extend the positive real line by an additional point $\{\infty\}$ because $\tau$ is not necessarily finite, $P$-a.s. We use the notation $\overline{\mathbb{R}}_+$ for the extended positive real line $[0, \infty]$.

The finite horizon ruin probability $\psi_t(c_0)$ is non-decreasing in $t$ and it is bounded by 1 (because it is a probability). This immediately implies convergence for $t \to \infty$ and gives the following lemma.
Lemma 5.3 (ultimate ruin probability). The ultimate ruin probability for initial capital $c_0 \geq 0$ is given by
$$\psi(c_0) = P_{c_0}[\tau < \infty] = P_{c_0}\left[\inf_{t\in\mathbb{N}_0} C_t < 0\right] \in [0, 1].$$
Proof. The second equality is a direct consequence of the definition; note that
$$\{\tau < \infty\} = \bigcup_{t\in\mathbb{N}_0} \{\tau \leq t\} = \bigcup_{t\in\mathbb{N}_0} \bigcup_{s=0,\ldots,t} \{C_s < 0\} = \bigcup_{t\in\mathbb{N}_0} \{C_t < 0\} = \left\{\inf_{t\in\mathbb{N}_0} C_t < 0\right\}.$$
For the first equality we use the monotone convergence property of probability measures; note that $\{\tau \leq t\} \subset \{\tau \leq t+1\}$, hence
$$P_{c_0}[\tau < \infty] = P_{c_0}\left[\bigcup_{t\in\mathbb{N}_0} \{\tau \leq t\}\right] = \lim_{t\to\infty} P_{c_0}[\tau \leq t] = \lim_{t\to\infty} \psi_t(c_0) = \psi(c_0).$$
2
Theorem 5.4 (random walk theorem). Assume $X_t$ are i.i.d. with $P[X_1 = 0] < 1$ and $\mathbb{E}[|X_1|] < \infty$. The random walk $(Z_t)_{t\in\mathbb{N}_0}$ defined in (5.2) has one of the following three behaviors:

• if $\mathbb{E}[X_1] < 0$ then $\lim_{t\to\infty} Z_t = -\infty$, $P$-a.s.;

• if $\mathbb{E}[X_1] > 0$ then $\lim_{t\to\infty} Z_t = +\infty$, $P$-a.s.;

• if $\mathbb{E}[X_1] = 0$ then $\limsup_{t\to\infty} Z_t = +\infty$ and $\liminf_{t\to\infty} Z_t = -\infty$, $P$-a.s.
Proof. See, e.g., Proposition 7.2.3 in Resnick [77]. 2
Corollary 5.5. Assume $\mathbb{E}[\pi_1] \leq \mathbb{E}[S_1]$. Then $\psi(c_0) = 1$ for all initial capitals $c_0 \geq 0$.

Proof. The random walk theorem implies for $\mathbb{E}[X_1] = \mathbb{E}[\pi_1] - \mathbb{E}[S_1] \leq 0$ that $\liminf_{t\to\infty} Z_t = -\infty$, $P$-a.s., and thus $\liminf_{t\to\infty} C_t = -\infty$, $P_{c_0}$-a.s. (for any $c_0 \geq 0$). But this means that we have ultimate ruin with probability 1. 2
Henceforth, to avoid ultimate ruin with probability 1, we need to charge an (expected) annual premium $\mathbb{E}[\pi_1]$ which exceeds the expected annual claim
E[S1 ]. This gives rise to the following standard assumption.
Assumption 5.6 (net profit condition). The surplus process satisfies the net profit
condition (NPC) given by
E[π1 ] > E[S1 ].
Corollary 5.7. Assume that E[π1 ] > E[S1 ], then ψ(0) < 1.
Proof. The assumption E[π1 ] > E[S1 ] implies E[X1 ] > 0 and, thus, limt→∞ Zt = ∞, P-a.s. This
implies that P[lim inf t→∞ Zt = −∞] = 0. The latter is equivalent to P[inf t∈N0 Zt ≥ 0] > 0, see
for instance Proposition 7.2.1 in Resnick [77]. But then the proof follows. 2
Our next goal is to find more explicit bounds on the ruin probability as a function
of the initial capital c0 ≥ 0.
5.2 Lundberg bound
We start with a lemma which gives the renewal property of the surplus process.
We define the distribution function $F$ by $S_1 - \pi_1 \sim F$. Thus, we have $-X_t \overset{\text{i.i.d.}}{\sim} F$. Note that from $S_1 \sim F_S$, $-\pi_1 \sim F_{-\pi}$ and independence of $S_1$ and $\pi_1$ it follows that $F = F_S * F_{-\pi}$.
Lemma 5.8. The finite horizon ruin probability and the ultimate ruin probability satisfy the following equations for $t \in \mathbb{N}_0$ and initial capital $c_0 \geq 0$:
$$\psi_{t+1}(c_0) = 1 - F(c_0) + \int_{-\infty}^{c_0} \psi_t(c_0 - y)\, dF(y),$$
$$\psi(c_0) = 1 - F(c_0) + \int_{-\infty}^{c_0} \psi(c_0 - y)\, dF(y).$$
Proof. We start with the finite horizon ruin probability. Observe that for $c_0 \geq 0$ we have a disjoint decomposition over the first period: either $S_1 - \pi_1 > c_0$, i.e. ruin occurs at time 1 (with probability $1 - F(c_0)$), or $S_1 - \pi_1 = y \leq c_0$ and, by the i.i.d. structure, ruin within the following $t$ periods has probability $\psi_t(c_0 - y)$; integrating over $y$ w.r.t. $dF$ provides the first claim.
The ultimate ruin probability statement is a direct consequence of the finite horizon statement.
Using that we have point-wise convergence (5.1) and that ψt is bounded by 1 which is integrable
w.r.t. dF we can apply the dominated convergence theorem to the finite horizon ruin probability
statement which provides the claim for the ultimate ruin probability as t → ∞. 2
A constant $R > 0$ satisfying $M_{S_1 - \pi_1}(R) = 1$ is called Lundberg coefficient.

Lemma 5.10 (uniqueness of Lundberg coefficient). Assume that (NPC) holds and that a Lundberg coefficient $R > 0$ exists. Then, $R$ is unique.
Proof. Due to the existence of a Lundberg coefficient R > 0 and due to the independence
between S1 and π1 the following function is well-defined for all r ∈ [0, R] and satisfies
$$r \mapsto h(r) = \log M_{S_1-\pi_1}(r) = \log\left(M_{S_1}(r)\, M_{-\pi_1}(r)\right) = \log \mathbb{E}\left[e^{rS_1}\right] + \log \mathbb{E}\left[e^{-r\pi_1}\right].$$
Similar to Lemma 1.6 we see that $h(r)$ is a convex function on $[0, R]$ with $h(0) = 0$ and $h'(0) = \mathbb{E}[S_1 - \pi_1] < 0$ under (NPC). But then there is at most one $R > 0$ with $h(R) = 0$. This proves the uniqueness of the Lundberg coefficient. 2
Theorem 5.11 (Lundberg's exponential bound). Assume (NPC) holds and the Lundberg coefficient $R > 0$ exists. Then, for all initial capital $c_0 \geq 0$,
$$\psi(c_0) \leq e^{-Rc_0}.$$
Proof. It suffices to prove that $\psi_t(c_0) \leq e^{-Rc_0}$ for all $t \in \mathbb{N}$ because $\psi_t(c_0) \uparrow \psi(c_0)$ for $t \to \infty$. We apply Lemma 5.8 to the finite horizon ruin probability $\psi_t(c_0)$ to obtain the following proof by induction.

$t = 1$: since $e^{-R(c_0-y)} \geq 1$ for $y \geq c_0$ we have
$$\psi_1(c_0) = 1 - F(c_0) = \int_{c_0}^{\infty} dF(y) \leq \int_{-\infty}^{\infty} e^{-R(c_0-y)}\, dF(y) = e^{-Rc_0}\, M_{S_1-\pi_1}(R) = e^{-Rc_0}.$$

$t \to t+1$: we assume that the claim holds true for $\psi_t(c_0)$. Then with Lemma 5.8
$$\psi_{t+1}(c_0) = \int_{c_0}^{\infty} dF(y) + \int_{-\infty}^{c_0} \psi_t(c_0-y)\, dF(y) \leq \int_{c_0}^{\infty} e^{-R(c_0-y)}\, dF(y) + \int_{-\infty}^{c_0} e^{-R(c_0-y)}\, dF(y) = e^{-Rc_0}\, M_{S_1-\pi_1}(R) = e^{-Rc_0},$$
due to the choice of the Lundberg coefficient $R > 0$. This proves the Lundberg bound. 2
• Under (NPC) and the existence of the Lundberg coefficient $R > 0$ we have an exponentially decaying bound on the ultimate ruin probability as a function of the initial capital $c_0$, i.e.
$$\psi(c_0) \leq e^{-Rc_0}.$$
Set $\varepsilon > 0$ (small). There exists $c_0 = c_0(R, \varepsilon) \geq 0$ such that $\psi(c_0) \leq \varepsilon$. This means that in the Lundberg case we can specify a maximal admissible ruin probability $\varepsilon$ as tolerance, and then we can choose an appropriate initial capital $c_0$ which implies that the ultimate ruin probability $\psi(c_0)$ is bounded by this tolerance.
• The existence of the Lundberg coefficient $R > 0$ implies that $M_{S_1}(R) < \infty$ and, using Chebychev's inequality,
$$P[S_1 > x] = P\left[e^{RS_1} > e^{Rx}\right] \leq e^{-Rx}\, M_{S_1}(R) \qquad \text{for all } x > 0.$$
This means that the claims $S_1$ have exponentially decaying tails; such claims are called light tailed.
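As a small numerical sketch (an assumed example: $S_1$ exponentially distributed with mean $\mu$ and deterministic premium $\pi_1 \equiv \pi > \mu$), the Lundberg coefficient can be found by solving $h(R) = 0$ with bisection, using the convexity of $h$:

```python
import math

def h(r, mu, pi):
    """h(r) = log M_{S1 - pi1}(r) = -log(1 - mu r) - r pi for r < 1/mu,
    assuming S1 ~ Exp with mean mu and deterministic premium pi1 = pi."""
    return -math.log(1.0 - mu * r) - r * pi

def lundberg_coefficient(mu, pi, tol=1e-12):
    """Unique R > 0 with h(R) = 0; it exists here since pi > mu (NPC) and
    h is convex with h(0) = 0, h'(0) < 0 and h(r) -> infinity as r -> 1/mu."""
    lo, hi = 1e-9, 1.0 / mu - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid, mu, pi) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

R = lundberg_coefficient(mu=1.0, pi=1.25)
```

The resulting $R$ then gives the explicit Lundberg bound $\psi(c_0) \leq e^{-Rc_0}$ for this example.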
A main question is whether this exponential bound can be improved in the case
where the Lundberg coefficient exists. The difficulty in most cases is that the ulti-
mate ruin probability cannot be calculated explicitly. An exception is the Bernoulli
case.
Proposition 5.12 (Bernoulli random walk). Assume that $X_t$ are i.i.d. with $P[X_t = 1] = p$ and $P[X_t = -1] = 1-p$ for given $p > 1/2$. For all $c_0 \in \mathbb{N}$ we have
$$\psi(c_0) = \left(\frac{1-p}{p}\right)^{c_0+1}.$$
Note that this model is obtained by assuming $\pi_t \equiv 1$ and $S_t \in \{0, 2\}$ with probability $p$ of having a zero claim.
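A Monte Carlo sketch of Proposition 5.12 (the parameters $p = 0.7$ and $c_0 = 2$ are an assumed example): paths are stopped once the surplus is so high that the residual ruin probability is negligible.

```python
import random

def ruin_prob_mc(p, c0, n_paths, barrier=30, seed=1):
    """Estimate the ultimate ruin probability of the Bernoulli random walk;
    a path counts as surviving once it reaches the barrier, because the
    residual ruin probability ((1-p)/p)^(barrier+1) is negligible there."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_paths):
        c = c0
        while 0 <= c < barrier:
            c += 1 if rng.random() < p else -1
        if c < 0:
            ruined += 1
    return ruined / n_paths

p, c0 = 0.7, 2
est = ruin_prob_mc(p, c0, 50_000)
exact = ((1 - p) / p) ** (c0 + 1)    # formula of Proposition 5.12
```

The estimate matches the exact value $((1-p)/p)^{c_0+1}$ up to Monte Carlo error.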
Proof. We choose a finite interval $(-1, a)$ for $a \in \mathbb{N}$ and define for fixed $c_0 \in [0, a) \cap \mathbb{N}_0$ the stopping time
$$\tau_a = \inf\left\{s \in \mathbb{N}_0;\; C_s = c_0 + Z_s \notin (-1, a)\right\}.$$
The random walk theorem implies $\tau_a < \infty$, $P$-a.s., because the interval $(-1, a)$ is finite. We define the random variable
$$Y_t = \left(\frac{1-p}{p}\right)^{c_0 + Z_t} = \left(\frac{1-p}{p}\right)^{C_t}.$$
It satisfies
$$\mathbb{E}\left[\left.Y_t\,\right|Y_{t-1}\right] = \mathbb{E}\left[\left.\left(\frac{1-p}{p}\right)^{c_0+Z_{t-1}+X_t}\,\right|Y_{t-1}\right] = Y_{t-1}\,\mathbb{E}\left[\left.\left(\frac{1-p}{p}\right)^{X_t}\,\right|Y_{t-1}\right] = Y_{t-1}\left((1-p)\left(\frac{1-p}{p}\right)^{-1} + p\,\frac{1-p}{p}\right) = Y_{t-1},$$
thus $(Y_t)_{t\geq 0}$ is a martingale. Note that also the stopped process $(Y_{\tau_a \wedge t})_{t\geq 0}$ is a martingale. Moreover, the latter martingale is bounded and since the stopping time is finite, $P$-a.s., we can apply the stopping theorem, see Section 10.10 in Williams [85], which provides
$$\left(\frac{1-p}{p}\right)^{c_0} = \mathbb{E}[Y_0] = \mathbb{E}[Y_{\tau_a}] = \left(\frac{1-p}{p}\right)^{-1} P_{c_0}[C_{\tau_a} = -1] + \left(\frac{1-p}{p}\right)^{a} P_{c_0}[C_{\tau_a} = a] = \left(\frac{1-p}{p}\right)^{-1} P_{c_0}[C_{\tau_a} = -1] + \left(\frac{1-p}{p}\right)^{a}\left(1 - P_{c_0}[C_{\tau_a} = -1]\right),$$
where the last step follows because $(C_t)_{t\in\mathbb{N}_0}$ leaves the interval $(-1, a)$, $P_{c_0}$-a.s., either at $-1$ or at $a$. This provides the identity
$$P_{c_0}[C_{\tau_a} = -1] = \frac{\left(\frac{1-p}{p}\right)^{c_0} - \left(\frac{1-p}{p}\right)^{a}}{\left(\frac{1-p}{p}\right)^{-1} - \left(\frac{1-p}{p}\right)^{a}}.$$
Letting $a \to \infty$, and noting that $((1-p)/p)^a \to 0$ for $p > 1/2$, provides $\psi(c_0) = \lim_{a\to\infty} P_{c_0}[C_{\tau_a} = -1] = ((1-p)/p)^{c_0+1}$. This proves the proposition. 2
That is, the Lundberg bound is optimal in the sense that we cannot improve the
exponential order of decay because R is already maximal.
In most cases we cannot explicitly calculate the ultimate ruin probability ψ(c0 ).
Exceptions are the Bernoulli random walk of Proposition 5.12 and the Cramér-
Lundberg process in continuous time with exponential claim size distribution, see
(5.3.8) in Rolski et al. [79]. In the other cases where the Lundberg coefficient
exists we apply Lundberg’s exponential bound of Theorem 5.11, or refined versions
thereof. But the following question remains: what can we do if the Lundberg
coefficient does not exist, i.e. if the tail probability of St does not necessarily decay
exponentially?
We assume (NPC) throughout this section, thus we know
that Ct → ∞, Pc0 -a.s., and ψ(0) < 1. Under these as-
sumptions we can study the (local) minima of the surplus
process. This study is done by looking at the ladder heights
that define these minima. We follow Bühlmann [18], Section 6.2.6, Feller [42], Chapter XII, and Rolski et al. [79],
Chapter 6. We define the stopping times ν0 = 0 and for
k∈N
$$\nu_k = \begin{cases} \inf\left\{t > \nu_{k-1};\; Z_t < Z_{\nu_{k-1}}\right\} & \text{if } \nu_{k-1} < \infty, \\ \infty & \text{otherwise.} \end{cases}$$
νk is called the k-th strong descending ladder epoch, see (6.3.6) in Rolski et al. [79].
These stopping times form an increasing sequence that record the arrivals of new
ladder heights (descending records). For their distribution functions we have under
the i.i.d. properties of the Xt ’s (independent and stationary increments)
$$P\left[\left.\nu_k < \infty\,\right|\nu_{k-1} < \infty\right] = P\left[\left.\inf\left\{t > \nu_{k-1};\; Z_t < Z_{\nu_{k-1}}\right\} < \infty\,\right|\nu_{k-1} < \infty\right] = \psi(0).$$
The probability of a finite ladder epoch is exactly equal to the ultimate ruin prob-
ability ψ(0) with initial capital c0 = 0.
Note that we could have πt − St ≥ 0, P-a.s., which would imply that the ultimate
ruin probability ψ(0) = 0 because the premium collected is bigger than the max-
imal claim. We exclude this situation as it is not interesting for ruin probability
considerations and because the insured will (hopefully) never pay a premium that
exceeds his maximal possible loss. Henceforth, under (NPC) we assume throughout that $\psi(0) \in (0, 1)$ (where the upper bound follows from (NPC)). We define
$$K^{+} = \sup\left\{k \in \mathbb{N}_0;\; \nu_k < \infty\right\}.$$
K + counts the total number of finite ladder epochs, i.e. the total number of strong
descending records. We have (applying the tower property several times)
$$P\left[K^{+} = k\right] = P[\nu_k < \infty,\, \nu_{k+1} = \infty] = \psi(0)^k\, (1 - \psi(0)),$$
that is, the total number of finite ladder epochs has a geometric distribution with
success probability 1 − ψ(0) ∈ (0, 1) under (NPC). On the set {K + = k}, k ≥ 1,
we study the ladder heights which are for $l \leq k$ given by
$$Z_l^{+} = Z_{\nu_{l-1}} - Z_{\nu_l} > 0.$$
The random variable $Z_l^{+}$ measures by which amount the old local minimum $Z_{\nu_{l-1}}$ is improved. Due to the i.i.d. property of the $X_t$'s, we have
$$P\left[\left.\bigcap_{l=1}^{k} \left\{Z_l^{+} \leq x_l\right\}\,\right| K^{+} = k\right] = \prod_{l=1}^{k} P\left[\left. Z_l^{+} \leq x_l\,\right| \nu_l < \infty\right] = \prod_{l=1}^{k} H(x_l), \tag{5.3}$$
where the distribution function H neither depends on k nor on l. Thus, the ladder
heights (Zl+ )l=1,...,k are i.i.d. on the set {K + = k}. Finally, we consider the maximal
heights $(Z_l^{+})_{l=1,\ldots,k}$ are i.i.d. on the set $\{K^{+} = k\}$. Finally, we consider the maximal height achieved by $(-Z_t)_{t\in\mathbb{N}_0}$; this is the global minimum of the random walk $(Z_t)_{t\in\mathbb{N}_0}$,
$$M = \sum_{l=1}^{K^{+}} Z_l^{+} = -\inf_{t\in\mathbb{N}_0} Z_t.$$
This now allows us to study the ultimate ruin probability as follows. Choose initial capital $c_0 \geq 0$. The ultimate ruin probability is given by
$$\begin{aligned}
\psi(c_0) &= P_{c_0}\left[\inf_{t\in\mathbb{N}_0} C_t < 0\right] = P_{c_0}\left[\inf_{t\in\mathbb{N}_0} C_t - c_0 < -c_0\right] = P\left[\inf_{t\in\mathbb{N}_0} Z_t < -c_0\right] \\
&= P[M > c_0] = \sum_{k\in\mathbb{N}_0} P\left[K^{+} = k\right] P\left[\left.\sum_{l=1}^{K^{+}} Z_l^{+} > c_0\,\right| K^{+} = k\right] \\
&= (1 - \psi(0)) \sum_{k\in\mathbb{N}} \psi(0)^k \left(1 - P\left[\left.\sum_{l=1}^{K^{+}} Z_l^{+} \leq c_0\,\right| K^{+} = k\right]\right) \\
&= (1 - \psi(0)) \sum_{k\in\mathbb{N}} \psi(0)^k \left(1 - H^{*k}(c_0)\right).
\end{aligned}$$
This proves Spitzer's formula, which is Corollary 6.3.1 in Rolski et al. [79]:

Theorem 5.13 (Spitzer's formula). Assume $\psi(0) \in (0, 1)$. Then
$$\psi(c_0) = (1 - \psi(0)) \sum_{k\in\mathbb{N}} \psi(0)^k \left(1 - H^{*k}(c_0)\right).$$
In classical (continuous time) ruin theory one starts with a homogeneous Poisson point process $(\widetilde{N}_t)_{t\in\mathbb{R}_+}$ having constant intensity $\lambda v > 0$ for the arrival of claims. The premium income is modeled proportionally to time with constant premium rate $\beta > 0$. The continuous time surplus process is then defined by $\widetilde{C}_0 = c_0 \geq 0$ and for $t > 0$
$$\widetilde{C}_t = c_0 + \beta t - \sum_{u=1}^{\widetilde{N}_t} S_u, \tag{5.4}$$
with i.i.d. claim amounts $S_t$ satisfying $S_t > 0$, $P$-a.s., and with these claim amounts being independent of the claims arrival process $(\widetilde{N}_t)_{t\in\mathbb{R}_+}$. This continuous time surplus process $(\widetilde{C}_t)_{t\in\mathbb{R}_+}$ is called Cramér-Lundberg process. Definition 5.2 of the ruin time is then extended to continuous time, namely
$$\widetilde{\tau} = \inf\left\{s \in \mathbb{R}_+;\; \widetilde{C}_s < 0\right\} \leq \infty.$$
Note that ruin can only occur at time points where claims happen, otherwise the
continuous time surplus process (Cet )t∈R+ is strictly increasing with constant slope
β > 0 (in fact, the continuous time surplus process is a spectrally negative Lévy
process, see Chapter 1 in Kyprianou [59]). We define the inter-arrival times between
two claims by Wu , u ∈ N. For the homogeneous Poisson point process (N f)
t t∈R+
these inter-arrival times are i.i.d. exponentially distributed with parameter λv > 0.
Therefore, we can rewrite the continuous time surplus process at the claims arrival times $V_n = \sum_{u=1}^{n} W_u$, $n \in \mathbb{N}_0$, as
$$C_n \overset{\text{def.}}{=} \widetilde{C}_{V_n} = c_0 + \beta V_n - \sum_{u=1}^{\widetilde{N}_{V_n}} S_u = c_0 + \sum_{u=1}^{n} (\beta W_u - S_u).$$
This is exactly the set-up of Definition 5.1 with i.i.d. premia πt = βWt , t ∈ N.
The only thing that has changed is time, moving from t ∈ R+ to operational time
n ∈ N0 , and therefore
h i
P τe < ∞| Ce0 = c0 = Pc0 [τ < ∞] = ψ(c0 ), (5.5)
with $\pi_t = \beta W_t$. For (NPC) we require premium rate $\beta > 0$ such that
$$0 < \mathbb{E}[X_1] = \beta\, \mathbb{E}[W_1] - \mathbb{E}[S_1] = \beta/(\lambda v) - \mathbb{E}[S_1], \qquad \text{i.e. } \beta > \lambda v\, \mathbb{E}[S_1].$$
In the Cramér-Lundberg model the ladder height distribution $H$ allows for an explicit characterization, for details we refer to Theorem 6.4.4 in Rolski et al. [79]; it is given by
$$H(y) = \frac{1}{\mathbb{E}[S_1]} \int_0^{y} P[S_1 > u]\, du \qquad \text{for } y \geq 0.$$
Note that $H$ is a distribution function on $\mathbb{R}_+$ because $\int_0^{\infty} P[S_1 > y]\, dy = \mathbb{E}[S_1]$.
This then allows us to state the following theorem which gives the Félix Pollaczek (1892-1981) and Aleksandr Yakovlevich Khinchin (1894-1959) formula.
Theorem 5.14 (Pollaczek-Khinchin formula). Assume we have the compound Poisson model (5.4) with (NPC) given by $\rho = \lambda v\, \mathbb{E}[S_1]/\beta \in (0, 1)$. The ultimate ruin probability for initial capital $c_0 \geq 0$ is given by
$$\psi(c_0) = (1 - \rho) \sum_{k\in\mathbb{N}} \rho^k \left(1 - H^{*k}(c_0)\right).$$
Moreover, the ultimate ruin probability satisfies the integral equation
$$\psi(c_0) = \frac{\lambda v}{\beta} \left(\int_{c_0}^{\infty} (1 - F_S(x))\, dx + \int_0^{c_0} \psi(c_0 - x)(1 - F_S(x))\, dx\right),$$
with distribution function $S_1 \sim F_S$. We do not prove this statement because the Pollaczek-Khinchin formula is sufficient for our purposes. The exact assumptions and a proof of this integral equation are, for instance, provided in Rolski et al. [79], Theorem 5.3.2.
We conclude that for the compound Poisson case (5.4) we have three different
descriptions for the ultimate ruin probability: (i) probabilistic description, (ii)
Pollaczek-Khinchin formula from renewal theory, and (iii) the integral equation.
Depending on the problem one then chooses the most appropriate one, i.e. we can
apply different techniques coming from different fields to solve the questions.
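As a numerical illustration (an assumed example: exponential claim sizes with mean $\mu$, for which $H$ is again exponential and $H^{*k}$ has a closed-form Erlang tail), the Pollaczek-Khinchin series can be evaluated and compared with the known closed-form ruin probability $\rho\, e^{-(1-\rho)c_0/\mu}$ for this special case:

```python
import math

def erlang_tail(k, x, mu):
    """1 - H^{*k}(x) = e^{-x/mu} * sum_{j=0}^{k-1} (x/mu)^j / j!  (Erlang tail)."""
    t = x / mu
    term, s = 1.0, 0.0
    for j in range(k):
        s += term
        term *= t / (j + 1)
    return math.exp(-t) * s

def psi_pk(c0, rho, mu, k_max=400):
    """Pollaczek-Khinchin: psi(c0) = (1 - rho) sum_{k>=1} rho^k (1 - H^{*k}(c0))."""
    return (1.0 - rho) * sum(rho ** k * erlang_tail(k, c0, mu)
                             for k in range(1, k_max + 1))

rho, mu, c0 = 0.7, 1.0, 5.0
pk = psi_pk(c0, rho, mu)
closed = rho * math.exp(-(1.0 - rho) * c0 / mu)   # known closed form in this case
```

The truncation error of the series is bounded by the geometric tail $\sum_{k > k_{\max}} \rho^k$ and is negligible here.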
Lemma 5.15 (subexponential distribution functions). Assume $F$ is subexponential. Then the following statements hold true:
1. For all n ∈ N
$$\lim_{x\to\infty} \frac{1 - F^{*n}(x)}{1 - F(x)} = n.$$
In fact, this is an if and only if statement.

2. For all $r > 0$,
$$\lim_{x\to\infty} e^{rx}\left(1 - F(x)\right) = \infty.$$
3. For all $\varepsilon > 0$ there exists $D < \infty$ such that for all $n \geq 2$ and all $x \geq 0$
$$\frac{1 - F^{*n}(x)}{1 - F(x)} \leq D(1 + \varepsilon)^n.$$
Proof of Lemma 5.15. We start with the following statement for subexponential distribution
functions F : for all t ∈ R
$$\lim_{x\to\infty} \frac{1 - F(x-t)}{1 - F(x)} = 1. \tag{5.7}$$
We first prove (5.7). Choose t ≥ 0, then we have for x > t, using monotonicity of F ,
$$\frac{1 - F^{*2}(x)}{1 - F(x)} - 1 = \frac{F(x) - F^{*2}(x)}{1 - F(x)} = \int_0^{x} \frac{1 - F(x-y)}{1 - F(x)}\, dF(y) = \int_0^{t} \frac{1 - F(x-y)}{1 - F(x)}\, dF(y) + \int_t^{x} \frac{1 - F(x-y)}{1 - F(x)}\, dF(y) \geq F(t) + (F(x) - F(t))\, \frac{1 - F(x-t)}{1 - F(x)}.$$
This implies the sandwich (for $\liminf_{x\to\infty}$ and $\limsup_{x\to\infty}$)
$$1 \leq \limsup_{x\to\infty} \frac{1 - F(x-t)}{1 - F(x)} \leq \limsup_{x\to\infty}\, (F(x) - F(t))^{-1}\left(\frac{1 - F^{*2}(x)}{1 - F(x)} - 1 - F(t)\right) = \frac{2 - 1 - F(t)}{1 - F(t)} = 1,$$
using the definition of subexponentiality, $\lim_{x\to\infty}(1 - F^{*2}(x))/(1 - F(x)) = 2$, in the last step.
For $t < 0$ note that
$$\lim_{x\to\infty} \frac{1 - F(x-t)}{1 - F(x)} = \lim_{x\to\infty}\left(\frac{1 - F(x)}{1 - F(x-t)}\right)^{-1} = \lim_{y\to\infty}\left(\frac{1 - F(y-(-t))}{1 - F(y)}\right)^{-1} = 1,$$
with the substitution $y = x - t$, using the case $-t > 0$ treated above. This proves (5.7).
Next we prove the first statement by induction; for $n = 1$ it is trivial. For the induction step $n \to n+1$ we write, for fixed $x_0 \in (0, \infty)$,
$$\frac{1 - F^{*(n+1)}(x)}{1 - F(x)} - 1 = \frac{F(x) - F^{*(n+1)}(x)}{1 - F(x)} = \int_0^x \frac{1 - F^{*n}(x-y)}{1 - F(x)}\, dF(y) = \int_0^{x-x_0} \frac{1 - F^{*n}(x-y)}{1 - F(x-y)}\,\frac{1 - F(x-y)}{1 - F(x)}\, dF(y) + \int_{x-x_0}^{x} \frac{1 - F^{*n}(x-y)}{1 - F(x)}\, dF(y).$$
The second integral is non-negative and using (5.7) we obtain
$$\limsup_{x\to\infty} \int_{x-x_0}^{x} \frac{1 - F^{*n}(x-y)}{1 - F(x)}\, dF(y) \leq \limsup_{x\to\infty} \int_{x-x_0}^{x} \frac{1}{1 - F(x)}\, dF(y) = \limsup_{x\to\infty} \frac{F(x) - F(x-x_0)}{1 - F(x)} = -1 + \limsup_{x\to\infty} \frac{1 - F(x-x_0)}{1 - F(x)} = 0.$$
For the first integral we have for $x > x_0$, using the triangle inequality and choosing $x_0 = x_0(\varepsilon)$ from the induction assumption such that $\left|\frac{1-F^{*n}(z)}{1-F(z)} - n\right| \leq \varepsilon$ for all $z \geq x_0$,
$$\begin{aligned}
\left|\int_0^{x-x_0} \frac{1 - F^{*n}(x-y)}{1 - F(x-y)}\,\frac{1 - F(x-y)}{1 - F(x)}\, dF(y) - n\right| &\leq n\left|\int_0^{x-x_0} \frac{1 - F(x-y)}{1 - F(x)}\, dF(y) - 1\right| + \int_0^{x-x_0} \left|\frac{1 - F^{*n}(x-y)}{1 - F(x-y)} - n\right| \frac{1 - F(x-y)}{1 - F(x)}\, dF(y) \\
&\leq n\left|\int_0^{x-x_0} \frac{1 - F(x-y)}{1 - F(x)}\, dF(y) - 1\right| + \varepsilon \int_0^{x-x_0} \frac{1 - F(x-y)}{1 - F(x)}\, dF(y).
\end{aligned}$$
Finally observe
$$\int_0^{x-x_0} \frac{1 - F(x-y)}{1 - F(x)}\, dF(y) = \int_0^{x} \frac{1 - F(x-y)}{1 - F(x)}\, dF(y) - \int_{x-x_0}^{x} \frac{1 - F(x-y)}{1 - F(x)}\, dF(y),$$
the first integral converges to 1, see (5.8), and the second integral converges to 0 because it is non-negative with
$$\limsup_{x\to\infty} \int_{x-x_0}^{x} \frac{1 - F(x-y)}{1 - F(x)}\, dF(y) \leq \limsup_{x\to\infty} \int_{x-x_0}^{x} \frac{1}{1 - F(x)}\, dF(y) = \limsup_{x\to\infty} \frac{F(x) - F(x-x_0)}{1 - F(x)} = -1 + \limsup_{x\to\infty} \frac{1 - F(x-x_0)}{1 - F(x)} = 0.$$
This proves that for all $\varepsilon > 0$ there exists $x_1 \ge x_0$ such that for all $x > x_1$ we have
\[
\left| \frac{1-F^{*(n+1)}(x)}{1-F(x)} - (n+1) \right| \le 4\varepsilon.
\]
This proves the first statement of Lemma 5.15. We now turn to the second statement of the lemma. Note that for $0 < y < x$
\[
e^{rx}\, (1-F(x)) = \frac{1-F(x)}{1-F(x-y)}\, e^{ry}\, e^{r(x-y)}\, (1-F(x-y)).
\]
Choose $\varepsilon > 0$ and $y > \frac{1}{r} \log\!\big(3/(1-\varepsilon)\big) > 0$. With (5.7) there exists $x_0$ such that for all $x > x_0$ we have $(1-F(x))/(1-F(x-y)) \ge 1-\varepsilon$, and hence
\[
e^{rx}\, (1-F(x)) \ge (1-\varepsilon)\, e^{ry}\, e^{r(x-y)}\, (1-F(x-y)) \ge 3\, e^{r(x-y)}\, (1-F(x-y)).
\]
Iterating this bound shows that the function $x \mapsto e^{rx}(1-F(x))$ is eventually strictly increasing with limit $+\infty$. So there remains the proof
of the last statement of Lemma 5.15. Define $\alpha_n = \sup_{x\ge 0} (1-F^{*n}(x))/(1-F(x))$. Note that the first assertion of the lemma implies $\alpha_n < \infty$. Moreover, we have $1-F^{*(n+1)}(x) = 1 - F * F^{*n}(x) = 1-F(x) + F * (1-F^{*n})(x)$. This implies for any $x_0 \in (0, \infty)$
\[
\alpha_{n+1} = \sup_{x\ge 0} \frac{1-F(x) + F*(1-F^{*n})(x)}{1-F(x)}
\le 1 + \sup_{0\le x\le x_0} \int_0^x \frac{1-F^{*n}(x-y)}{1-F(x)}\, dF(y) + \sup_{x>x_0} \int_0^x \frac{1-F^{*n}(x-y)}{1-F(x)}\, dF(y)
\]
\[
\le 1 + \frac{1}{1-F(x_0)} + \sup_{x>x_0} \int_0^x \frac{1-F^{*n}(x-y)}{1-F(x-y)}\, \frac{1-F(x-y)}{1-F(x)}\, dF(y)
\le 1 + \frac{1}{1-F(x_0)} + \alpha_n \sup_{x>x_0} \int_0^x \frac{1-F(x-y)}{1-F(x)}\, dF(y)
\]
\[
= 1 + \frac{1}{1-F(x_0)} + \alpha_n \sup_{x>x_0} \left[ \frac{1-F^{*2}(x)}{1-F(x)} - 1 \right],
\]
where we have used (5.9) in the last step. The subexponentiality of $F$ implies that for all $\varepsilon > 0$ there exists $x_0$ such that
\[
\alpha_{n+1} \le 1 + \frac{1}{1-F(x_0)} + \alpha_n (1+\varepsilon).
\]
tes
Iteration provides
\[
\alpha_{n+1} \le 1 + \frac{1}{1-F(x_0)} + \left( 1 + \frac{1}{1-F(x_0)} + \alpha_{n-1}(1+\varepsilon) \right)(1+\varepsilon)
\le \left( 1 + \frac{1}{1-F(x_0)} \right) \sum_{k=0}^{n-1} (1+\varepsilon)^k + (1+\varepsilon)^n
\]
\[
\le \left( 1 + \frac{1}{1-F(x_0)} \right) \sum_{k=0}^{n} (1+\varepsilon)^k \le \left( 1 + \frac{1}{1-F(x_0)} \right) \frac{1}{\varepsilon}\, (1+\varepsilon)^{n+1},
\]
which proves the claim for $D = (1 + (1-F(x_0))^{-1})/\varepsilon \in (0, \infty)$. This proves Lemma 5.15. □
We conclude that for subexponential distribution functions the moment generating function does not exist in any $r > 0$, and therefore there is no Lundberg coefficient in this case. We call such subexponential distributions heavy tailed distributions.

Theorem 2.5.5 in Rolski et al. [79] gives an important sufficient condition for having a subexponential distribution.

Lemma 5.16. Assume the distribution function $F$ of a positive random variable has a regularly varying survival function $1-F$ with tail index $\alpha > 0$. Then $F$ is subexponential.
Proof. Assume that $X_1$ and $X_2$ are two i.i.d. random variables with regularly varying survival functions with parameter $\alpha > 0$. Note that we have for all $\varepsilon \in (0,1)$
\[
\{X_1 + X_2 > x\} \subset \{X_1 > (1-\varepsilon)x\} \cup \{X_2 > (1-\varepsilon)x\} \cup \{X_1 > \varepsilon x,\; X_2 > \varepsilon x\}.
\]
The i.i.d. property implies
\[
\mathbb{P}[X_1 + X_2 > x] \le 2\,\big(1-F((1-\varepsilon)x)\big) + \big(1-F(\varepsilon x)\big)^2.
\]
Thus, using the regular variation of $1-F$, we have
\[
\limsup_{x\to\infty} \frac{1-F^{*2}(x)}{1-F(x)} \le \inf_{\varepsilon\in(0,1)} \limsup_{x\to\infty} \frac{2\,(1-F((1-\varepsilon)x)) + (1-F(\varepsilon x))^2}{1-F(x)}
= \inf_{\varepsilon\in(0,1)} 2\,(1-\varepsilon)^{-\alpha} = 2.
\]
On the other hand we have for any positively supported distribution function $F$, see also (5.9),
\[
\frac{1-F^{*2}(x)}{1-F(x)} = 1 + \int_0^x \frac{1-F(x-y)}{1-F(x)}\, dF(y) \ge 1 + \int_0^x dF(y) = 1 + F(x) \to 2 \qquad \text{as } x \to \infty.
\]
This proves $\lim_{x\to\infty} (1-F^{*2}(x))/(1-F(x)) = 2$, i.e. $F$ is subexponential. □
Remarks 5.17. Lemma 5.16 gives the connection to classical extreme value theory.
In extreme value theory one distinguishes three different domains of attraction for
tail behavior, see Section 3.3 in Embrechts et al. [36]: (i) Weibull case, which are
distribution functions with finite right endpoint of their support; (ii) Gumbel case,
which are light tailed to moderately heavy tailed distribution functions; (iii) Fréchet
case, which are heavy tailed distribution functions. The Fréchet case is exactly
characterized by regularly varying survival functions with (tail) index α > 0, see
Theorem 3.3.7 in Embrechts et al. [36]. This index has already been met in Section
3.2, see formula (3.3). Lemma 5.16 now says that every distribution function that
belongs to the Fréchet domain of attraction is also subexponential. However, the
class of subexponential distribution functions is larger than the class of distribution
functions with regularly varying survival functions, see Example 1.4.3 in Embrechts
et al. [36].
We apply the Pollaczek-Khinchin formula, see Theorem 5.14, to obtain the following result in the subexponential case.

Theorem 5.18. Assume $\rho < 1$ and that the ladder height distribution $H$ is subexponential. Then
\[
\lim_{c_0\to\infty} \frac{\psi(c_0)}{1-H(c_0)} = \frac{\rho}{1-\rho}.
\]

Proof. The Pollaczek-Khinchin formula provides
\[
\lim_{c_0\to\infty} \frac{\psi(c_0)}{1-H(c_0)} = (1-\rho) \lim_{c_0\to\infty} \sum_{k\in\mathbb{N}} \rho^k\, \frac{1-H^{*k}(c_0)}{1-H(c_0)}.
\]
By Lemma 5.15 we can choose $\varepsilon > 0$ with $\rho(1+\varepsilon) < 1$ and $D \in (0,\infty)$ such that $(1-H^{*k}(c_0))/(1-H(c_0)) \le D\,(1+\varepsilon)^{k+1}$ for all $k$ and $c_0 \ge 0$; the resulting bound $\sum_k \rho^k D (1+\varepsilon)^{k+1}$ is finite because $\rho(1+\varepsilon) < 1$. Thus, we have found a uniformly integrable upper bound and this allows us to exchange the two limits. This provides
\[
\lim_{c_0\to\infty} \frac{\psi(c_0)}{1-H(c_0)} = (1-\rho) \sum_{k\in\mathbb{N}} \rho^k \lim_{c_0\to\infty} \frac{1-H^{*k}(c_0)}{1-H(c_0)} = (1-\rho) \sum_{k\in\mathbb{N}} \rho^k\, k.
\]
The last term is the expected value of the geometric distribution, which is given by $\rho/(1-\rho)$. This proves the theorem. □
This implies that $H$ has a regularly varying survival function with tail index $\alpha - 1 > 0$. Therefore, Lemma 5.16 implies that $H$ is subexponential and we can apply Theorem 5.18 to obtain
\[
\lim_{c_0\to\infty} \frac{\psi(c_0)}{c_0^{-(\alpha-1)}} = \frac{\lambda v \theta \alpha}{(\alpha-1)\beta - \lambda v \theta \alpha}.
\]
That is, we have found in the Pareto (subexponential) case for $\alpha > 1$
\[
\psi(c_0) \sim \frac{\lambda v \theta \alpha}{(\alpha-1)\beta - \lambda v \theta \alpha}\; c_0^{-(\alpha-1)} \qquad \text{as } c_0 \to \infty.
\]
Conclusions. We conclude that the heavy tailed case may lead to a much more dangerous ruin behavior. In Example 5.19 we obtain for the asymptotic ruin behavior a power law decay as the initial capital goes to infinity, whereas in the light tailed case we obtain the exponentially decaying Lundberg bound, see Theorem 5.11. This is an impressive example showing that heavy tailed claims require careful risk management practices. For instance, an excess-of-loss reinsurance cover with retention level $M > \theta$ would completely change the ruin behavior of a company facing Pareto distributed claims $S_t$, $t \in \mathbb{N}$. Also the triggers of ruin are very different in the two cases. In the light tailed case it is the big mass of (small) claims that causes ruin, whereas in the heavy tailed case it is the single large claim event that causes ruin.

The most general version of the asymptotic ruin behavior in the subexponential case goes back to Paul Embrechts and Noël Veraverbeke [38]. However, an important missing piece in the argumentation was provided by Charles M. Goldie. The Pareto case had previously been solved by Bengt von Bahr [8].
Chapter 6. Premium Calculation Principles

In Assumption 5.6 and the random walk Theorem 5.4 we have seen that we need to charge an expected premium that exceeds the expected claim amount $E[S_t]$, otherwise we have ultimate ruin, $\mathbb{P}$-a.s. This is the so-called net profit condition (NPC). For the present chapter we assume that the premium $\pi_t$ is deterministic; then (NPC) reads as $\pi_t > E[S_t]$. For simplicity (because we consider a fixed accounting year in this chapter) we drop the time index $t$, and then (NPC) is fulfilled by charging a premium with loading constant $\alpha > 0$,
\[
\pi = (1+\alpha)\, E[S]. \tag{6.2}
\]
• $S_2 \equiv \gamma/c$ is a constant.

Conclusion. The premium loading should be risk-based! That is, the loading $\pi - E[S] > 0$ should reflect the risk of fluctuations of $S$ around its mean $E[S]$.

A first notion of risk is the variance of a random variable. Therefore, we assume in this section that the second moment of $S$ exists.

Variance loading principle. Choose a fixed constant $\alpha > 0$ and define the insurance premium $\pi$ by
\[
\pi = E[S] + \alpha\, \mathrm{Var}(S).
\]
Revisiting Example 6.1 we obtain insurance premia with variance loading given by
\[
\pi_1 = E[S_1] + \alpha\, \mathrm{Var}(S_1) = \frac{\gamma}{c} + \alpha\, \frac{\gamma}{c^2} > \frac{\gamma}{c} = E[S_2] + \alpha\, \mathrm{Var}(S_2) = \pi_2.
\]
That is, for the risky position $S_1$ we now charge a premium that strictly exceeds the expected value, and the loading is zero for the deterministic claim $S_2$. An unpleasant feature of the variance loading principle is that it is difficult to calibrate, because the variance is not handy for this purpose, and, related to this, the principle is not invariant under changes of currency. Assume that $r_{fx} > 0$ is the (deterministic) exchange rate between two different currencies with $r_{fx} \neq 1$; then we obtain
\[
\pi_{fx} = E[r_{fx}\, S] + \alpha\, \mathrm{Var}(r_{fx}\, S) = r_{fx}\, E[S] + r_{fx}^2\, \alpha\, \mathrm{Var}(S) \neq r_{fx}\, \pi.
\]
This non-linearity of the variance implies that the premium cannot easily be scaled with exchange rates and inflation indexes. Therefore, one often studies other versions of variance principles, which brings us to the next principle.
Standard deviation loading principle. Choose a fixed constant $\alpha > 0$ and define the insurance premium $\pi$ by
\[
\pi = E[S] + \alpha\, \mathrm{Var}(S)^{1/2}.
\]
This principle gives an explicit meaning to the loading constant in (6.2), namely it says that the loading constant should be proportional to the coefficient of variation of $S$. If we revisit Example 6.1 we obtain premia
\[
\pi_1 = E[S_1] + \alpha\, \mathrm{Var}(S_1)^{1/2} = \frac{\gamma}{c} + \alpha\, \frac{\gamma^{1/2}}{c} > \frac{\gamma}{c} = E[S_2] + \alpha\, \mathrm{Var}(S_2)^{1/2} = \pi_2.
\]
For the risky position $S_1$ we charge a premium that strictly exceeds the expected value, and the loading is zero for the deterministic claim $S_2$. The standard deviation loading principle is much better understood than the variance loading principle because practitioners often have a good feeling for appropriate ranges of the coefficient of variation. For instance, they know that for certain lines of business it should be around 10%. Moreover, this principle is invariant under changes of currency. Assume that $r_{fx} > 0$ is again the (deterministic) exchange rate between two different currencies. Then we obtain the identity
\[
\pi_{fx} = E[r_{fx}\, S] + \alpha\, \mathrm{Var}(r_{fx}\, S)^{1/2} = r_{fx}\, E[S] + r_{fx}\, \alpha\, \mathrm{Var}(S)^{1/2} = r_{fx}\, \pi.
\]
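The different scaling behavior of the two loadings can be checked with a few lines of code. This is a sketch with purely illustrative numbers (mean 100, variance 400, exchange rate 1.2 are not from the text):

```python
# Compare variance and standard deviation loadings under a currency change.
mean_S, var_S = 100.0, 400.0    # illustrative first two moments of S
alpha, r_fx = 0.1, 1.2          # loading constant and exchange rate

pi_var = mean_S + alpha * var_S             # variance loading principle
pi_std = mean_S + alpha * var_S ** 0.5      # standard deviation loading principle

# premia computed directly in the foreign currency for r_fx * S
pi_var_fx = r_fx * mean_S + alpha * (r_fx ** 2) * var_S
pi_std_fx = r_fx * mean_S + alpha * (r_fx ** 2 * var_S) ** 0.5

print(pi_var_fx, r_fx * pi_var)   # differ: variance loading is not scale-invariant
print(pi_std_fx, r_fx * pi_std)   # agree: standard deviation loading scales linearly
```

The variance loading picks up a factor $r_{fx}^2$ and therefore fails to scale, while the standard deviation loading commutes with the currency change, exactly as in the displayed identities above.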
The previous examples consider rather simple premium loading principles, and there are more principles of this type, such as the modified variance principle. In the next section we describe more sophisticated principles which are motivated by the economic behavior of financial agents and give risk management perspectives. These more advanced principles include:

• utility theory pricing principles
Consider a set of risky positions
\[
\mathcal{X} \subset L^1(\Omega, \mathcal{F}, \mathbb{P}),
\]
together with a utility function
\[
u: I \to \mathbb{R},
\]
which is a strictly increasing and strictly concave function on the interval $I \subset \mathbb{R}$ such that $X \in I$, $\mathbb{P}$-a.s., for all $X \in \mathcal{X}$, see Figure 6.1 for two examples.
Figure 6.1: lhs: exponential utility function with α = 0.05 and I = R, see (6.6);
rhs: power utility function with γ ∈ {0.5, 1, 1.5} and I = R+ , see (6.7).
A utility function $u$ induces a preference ordering on $\mathcal{X}$: position $X \in \mathcal{X}$ is (weakly) preferred to $Y \in \mathcal{X}$ if
\[
E[u(X)] \ge E[u(Y)].
\]
If
\[
E[u(X)] > E[u(Y)], \tag{6.3}
\]
we strictly prefer $X$ over $Y$. In this context, $X$ always has the interpretation of a gain, and if the gain of position $X$ dominates the gain of position $Y$ (in the above sense) we have strict preference $X \succ Y$. We conclude: $u$ introduces a preference ordering on $\mathcal{X}$ where positive outcomes of $X \in \mathcal{X}$ describe gains and negative outcomes describe losses.
Strict concavity property. Strict concavity implies that we can apply Jensen's inequality, which provides
\[
E[u(X)] \le u(E[X]), \tag{6.4}
\]
and if $X \in \mathcal{X}$ is non-deterministic we even have a strict inequality in (6.4). Thus, for non-deterministic positions, strict concavity of $u$ implies $E[X] \succ X$: a risk-averse agent prefers the certain amount $E[X]$ over the random position $X$.
This latter property is exactly the argument why policyholders are willing to pay an insurance premium $\pi$ that exceeds their average claim amount $E[Y]$, and hence finance (NPC). Assume that a policyholder has (deterministic) initial wealth $c_0$ and he faces a risk that may reduce his wealth by (the random amount) $Y$. Hence, he holds a risky position $c_0 - Y$ and his happiness index of this position is given by $E[u(c_0 - Y)]$ if $u$ describes the (risk-averse) utility function of this policyholder. The strict concavity and increasing properties now imply the following preference ordering
\[
E[u(c_0 - Y)] < u(c_0 - E[Y]).
\]
The left-hand side describes his present happiness and the right-hand side describes the happiness that he would achieve if he could exchange $Y$ for $E[Y]$. Therefore, any deterministic premium $\pi > E[Y]$ such that
\[
u(c_0 - \pi) > E[u(c_0 - Y)]
\]
would make him happier than his current position $c_0 - Y$. Thus, strict concavity and the increasing property of $u$ imply that he is willing to pay any premium $\pi$ in the (non-empty) interval
\[
\pi \in \Big( E[Y],\; c_0 - u^{-1}\big( E[u(c_0 - Y)] \big) \Big), \tag{6.5}
\]
to improve his happiness position. The lower bound of this interval is the (NPC) and the upper bound is the maximal price $\pi$ that the policyholder will just tolerate according to his risk-averse utility function $u$. The less risk-averse he is, the narrower the interval gets. The extreme case of risk-neutrality, which corresponds to the linear function $u(x) = x$, makes the upper bound equal to the lower bound in (6.5), and no insurance is necessary.
The most popular utility functions are, see also Figure 6.1:

• exponential utility function, constant absolute risk-aversion (CARA) utility function: for $\alpha > 0$ (defined on $I = \mathbb{R}$)
\[
u(x) = \frac{1}{\alpha}\left( 1 - \exp\{-\alpha x\} \right). \tag{6.6}
\]

• power utility function, constant relative risk-aversion (CRRA) utility function: for $\gamma > 0$ (defined on $I = \mathbb{R}_+$)
\[
u(x) = \frac{x^{1-\gamma}}{1-\gamma} \;\text{ for } \gamma \neq 1, \qquad u(x) = \log(x) \;\text{ for } \gamma = 1. \tag{6.7}
\]
Example 6.3 (exponential utility function). Assume that the policyholder has the exponential utility function (6.6), initial wealth $c_0$, and faces a risky position $Y \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ with $\mathrm{Var}(Y) > 0$ and $Y \ge 0$, $\mathbb{P}$-a.s. This implies that the expected claim is given by $E[Y] > 0$. The exponential utility function has the properties
\[
u'(x) = \exp\{-\alpha x\} > 0 \qquad \text{and} \qquad u''(x) = -\alpha\, \exp\{-\alpha x\} < 0,
\]
therefore it is strictly increasing and strictly concave on $\mathbb{R}$, see Figure 6.1 (lhs). The inverse is given by
\[
u^{-1}(y) = -\frac{1}{\alpha}\, \log(1 - \alpha y).
\]
This implies that the possible premium $\pi$ lies in the non-empty interval, see (6.5),
\[
\pi \in \left( E[Y],\; \frac{1}{\alpha}\, \log E\big[ \exp\{\alpha Y\} \big] \right).
\]
The important observation in this example is that the price tolerance in $\pi$ does not depend on the initial wealth $c_0$ of the policyholder. We will see that this property holds true only for the exponential utility function, and we may ask how realistic this property is in real world behavior.
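The premium interval of Example 6.3 can be evaluated explicitly whenever the moment generating function of $Y$ is known. A minimal numerical sketch, assuming an exponentially distributed claim $Y \sim \mathrm{Exp}(c)$ with illustrative parameters $c = 1$, $\alpha = 0.4$ (not from the text):

```python
import math
import random

random.seed(1)
alpha = 0.4                 # risk-aversion parameter (must satisfy alpha < c)
c = 1.0                     # rate of the exponential claim Y ~ Exp(c)

# exact bounds of the premium interval (6.5) for exponential utility:
# lower bound E[Y], upper bound (1/alpha) * log E[exp(alpha * Y)]
lower = 1.0 / c
upper = (1.0 / alpha) * math.log(c / (c - alpha))   # MGF of Exp(c) at alpha

# Monte Carlo check of the upper bound
n = 100_000
mc = sum(math.exp(alpha * random.expovariate(c)) for _ in range(n)) / n
upper_mc = (1.0 / alpha) * math.log(mc)

print(f"premium interval: ({lower:.3f}, {upper:.3f}), MC upper bound: {upper_mc:.3f}")
```

Note that both bounds are free of $c_0$, illustrating the wealth-independence of the exponential utility tolerance interval.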
Example 6.4 (power utility function). Assume that the policyholder has the power utility function (6.7), initial wealth $c_0 > 1$, and faces a risky position $Y \sim \text{Bernoulli}(p = 1/2)$. This implies that the expected claim is given by $E[Y] = 1/2$. For our example we choose $\gamma = 1$, i.e. $u(x) = \log(x)$, which has the properties
\[
u'(x) = 1/x > 0 \qquad \text{and} \qquad u''(x) = -1/x^2 < 0,
\]
therefore it is strictly increasing and strictly concave on $I = \mathbb{R}_+$, see Figure 6.1 (rhs). In this case the inverse of the utility function is given by
\[
u^{-1}(y) = \exp\{y\}.
\]
We calculate the upper bound in (6.5),
\[
c_0 - u^{-1}\big( E[u(c_0 - Y)] \big) = c_0 - \exp\left\{ \tfrac{1}{2}\, \log(c_0) + \tfrac{1}{2}\, \log(c_0 - 1) \right\} = c_0 - \sqrt{c_0\,(c_0 - 1)} \;=:\; b(c_0).
\]
The important observation in this example is that the price tolerance in $\pi$ depends on the initial wealth $c_0 > 1$ of the policyholder.
The function $b$ is defined on $(1, \infty)$ and we have
\[
\lim_{c_0 \to 1} b(c_0) = 1 \qquad \text{and} \qquad \lim_{c_0 \to \infty} b(c_0) = \frac{1}{2}.
\]
This shows that we have strict monotonicity in the initial capital $c_0 > 1$, i.e. the richer the policyholder, the narrower the price tolerance interval (6.5), see also Example 6.14, below.
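The wealth dependence can be made tangible numerically. A short sketch, assuming the upper bound $b(c_0) = c_0 - \sqrt{c_0(c_0-1)}$ obtained for log-utility and $Y \sim \text{Bernoulli}(1/2)$ as above:

```python
import math

def b(c0):
    # upper premium bound for log-utility and Y ~ Bernoulli(1/2), c0 > 1
    return c0 - math.sqrt(c0 * (c0 - 1.0))

# the bound decreases from 1 (near c0 = 1) towards the expected claim 1/2
for c0 in (1.01, 2.0, 10.0, 1000.0):
    print(f"c0 = {c0:>8}: b(c0) = {b(c0):.5f}")
```

The printed values decrease monotonically towards $1/2 = E[Y]$, i.e. a very rich policyholder is hardly willing to pay any loading at all.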
Of course, $\pi$ and $S$ need to be such that $c_0 + \pi - S \in I$, $\mathbb{P}$-a.s. This may give some restrictions on the range of $S$ if $I$ is a bounded interval, see also Example 6.4.

The utility indifference price given in Definition 6.5 takes the insurance company's point of view. It is assumed that the insurance company has initial capital $c_0 \in I$, similar to the surplus process given in Definition 5.1. It will then only accept an insurance contract $S$ at price $\pi$ if its utility does not decrease, i.e. if it is indifferent between accepting $S$ at price $\pi$ and not selling such a contract.

Jensen's inequality and the strictly increasing property of $u$ immediately provide the following corollary.

Corollary 6.6. The utility indifference price $\pi = \pi(u, S, c_0)$ for initial capital $c_0$, risk-averse utility function $u$ and risky position $S$ satisfies $\pi \ge E[S]$, with strict inequality for non-deterministic $S$.
Assume the insurance company has the exponential utility function (6.6) with risk-aversion parameter $\alpha > 0$, and we would like to insure a risky position $S \sim \mathcal{N}(\mu, \sigma^2)$. Thus, we need to solve
\[
\frac{1}{\alpha}\left( 1 - \exp\{-\alpha c_0\} \right) = E\left[ \frac{1}{\alpha}\left( 1 - \exp\{-\alpha (c_0 + \pi - S)\} \right) \right].
\]
This is equivalent to solving $1 = e^{-\alpha \pi}\, E[\exp\{\alpha S\}]$, i.e. $\pi = \frac{1}{\alpha} \log M_S(\alpha)$, which in the Gaussian case gives the utility indifference price
\[
\pi = \pi(u, S, c_0) = \mu + \alpha \sigma^2 / 2 > \mu.
\]

Remarks.

• The loading is of the form $\alpha\sigma^2/2 = \alpha\, \mathrm{Var}(S)/2$. That is, for the exponential utility function we get a variance loading. This is exact for $S \sim \mathcal{N}(\mu, \sigma^2)$ and approximately true for other distribution functions (using a Taylor approximation).

• The utility indifference price does not depend on the initial capital $c_0$.
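The closed form $\mu + \alpha\sigma^2/2$ can be verified by simulation, using the identity that for exponential utility the indifference price equals $\frac{1}{\alpha}\log E[e^{\alpha S}]$. A sketch with illustrative parameters (not from the text):

```python
import math
import random

random.seed(7)
alpha, mu, sigma = 0.05, 100.0, 20.0   # illustrative parameters

# closed form for Gaussian S: pi = mu + alpha * sigma^2 / 2
pi_exact = mu + alpha * sigma ** 2 / 2.0

# Monte Carlo: pi = (1/alpha) * log E[exp(alpha * S)] for S ~ N(mu, sigma^2)
n = 200_000
mc = sum(math.exp(alpha * random.gauss(mu, sigma)) for _ in range(n)) / n
pi_mc = math.log(mc) / alpha

print(f"exact: {pi_exact:.2f}, Monte Carlo: {pi_mc:.2f}")
```

Both values agree up to Monte Carlo error, confirming the variance loading $\alpha\sigma^2/2$ above the pure risk premium $\mu$.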
Exercise 7. Choose the exponential utility function (6.6).

• Calculate the utility indifference price for $S \sim \text{Pareto}(\theta, \alpha)$.

Remark. Note that the utility function $u(x) = a - b\, \exp\{-cx\}$ gives the same preference ordering as the exponential utility function (6.6) with $c = \alpha$: if we have two different utility functions $u(\cdot)$ and $v(\cdot)$ with $v = a + bu$ for $a \in \mathbb{R}$ and $b \in \mathbb{R}_+$, then they generate the same preference ordering.
Proof of Proposition 6.8. The direction "⇐" is immediately clear just by evaluating Definition 6.5. So we prove the direction "⇒". Definition 6.5 implies for the derivative w.r.t. $c_0$
\[
u'(c_0) = \frac{\partial}{\partial c_0}\, E[u(c_0 + \pi - S)] = E\left[ u'(c_0 + \pi - S) \left( 1 + \frac{\partial}{\partial c_0}\, \pi(c_0) \right) \right] = E\left[ u'(c_0 + \pi - S) \right],
\]
where in the last step we have used the assumption that the premium does not depend on $c_0$. This implies, using a change of sign, that the function $v = -u'$ satisfies
\[
v(c_0) = E[v(c_0 + \pi - S)];
\]
note that $v$ is strictly increasing because $u$ is strictly concave (and, assuming $u''' > 0$, $v$ is again a risk-averse utility function). The last equation explains that $\pi = \pi(u, S, c_0) = \pi(v, S, c_0)$ is also the utility indifference price for utility function $v$, since $v(c_0) = E[v(c_0 + \pi - S)]$ for any $c_0$ and $S$ where $\pi = \pi(u, S, c_0)$ exists. The latter implies that (the proof is provided below)
\[
\frac{u''(x)}{u'(x)} = \frac{v''(x)}{v'(x)} \qquad \text{for all } x \in \mathbb{R}. \tag{6.9}
\]
Before we prove (6.9) we show that it provides the claim. Calculate
\[
\frac{d}{dx} \frac{u''(x)}{u'(x)} = \frac{u'''(x)\, u'(x) - (u''(x))^2}{(u'(x))^2} = \frac{u''(x)}{u'(x)} \left( \frac{u'''(x)}{u''(x)} - \frac{u''(x)}{u'(x)} \right) = \frac{u''(x)}{u'(x)} \left( \frac{v''(x)}{v'(x)} - \frac{u''(x)}{u'(x)} \right) = 0,
\]
where we have used $v''/v' = u'''/u''$ for $v = -u'$ and then (6.9). The solution to this differential equation is given by $u(x) = a - b\, \exp\{-cx\}$. The risk-aversion and increasing properties provide $b, c > 0$.

So there remains to prove that (6.8) implies (6.9). This is proved by contradiction. Assume that (6.9) does not hold true. Due to the differentiability of $u$ we can find a non-empty open interval $O \subset \mathbb{R}$ with, w.l.o.g.,
\[
\frac{u''(x)}{u'(x)} < \frac{v''(x)}{v'(x)} \qquad \text{for all } x \in O.
\]
We consider the function $u(v^{-1}(\cdot))$ on the non-empty open interval $v(O)$ (note that $v$ is strictly increasing). We calculate
\[
\frac{d}{dx}\, u(v^{-1}(x)) = u'(v^{-1}(x))\, \frac{d}{dx}\, v^{-1}(x) = \frac{u'(v^{-1}(x))}{v'(v^{-1}(x))} > 0,
\]
and on $v(O)$ the inequality above makes the second derivative strictly negative (see also the proof of Proposition 6.11). This implies that $u(v^{-1}(\cdot))$ is a (strictly concave) risk-averse utility function on the non-empty interval $v(O)$. Choose a non-deterministic random variable $Y$ such that $Y \in O$, $\mathbb{P}$-a.s. Since $O$ is a non-empty open interval such a random variable can be chosen (i.e. no concentration in a single point). This implies that $v(Y)$ is a non-deterministic random variable with range in $v(O)$, and the strict concavity of $u(v^{-1}(\cdot))$ on $v(O)$ implies
\[
u^{-1}\big( E[u(Y)] \big) = u^{-1}\Big( E\big[ u(v^{-1}(v(Y))) \big] \Big) < u^{-1}\Big( u\big( v^{-1}(E[v(Y)]) \big) \Big) = v^{-1}\big( E[v(Y)] \big). \tag{6.10}
\]
This contradicts the equality of the certainty equivalents under $u$ and $v$ implied by $\pi(u, S, c_0) = \pi(v, S, c_0)$, which proves (6.9). □
The proof of Proposition 6.8 provides insights into risk-aversion. Define the absolute and the relative risk-aversion of a twice differentiable utility function $u$ by
\[
\rho_{ARA}(x) = \rho^u_{ARA}(x) = -\frac{u''(x)}{u'(x)} \qquad \text{and} \qquad \rho_{RRA}(x) = \rho^u_{RRA}(x) = -x\, \frac{u''(x)}{u'(x)}.
\]

Example 6.9 (exponential utility function). The exponential utility function (6.6) with risk-aversion parameter $\alpha > 0$ satisfies for all $x \in \mathbb{R}$
\[
\rho_{ARA}(x) = \alpha.
\]

Example 6.10 (power utility function). The power utility function (6.7) with risk-aversion parameter $\gamma > 0$ satisfies for all $x \in \mathbb{R}_+$
\[
\rho_{RRA}(x) = \gamma.
\]
Assume that $u$ and $v$ are two utility functions that are defined on the same interval $I$. Then, $u$ is called more risk-averse than $v$ if for any $X$ with range in $I$ we have
\[
u^{-1}\big( E[u(X)] \big) \le v^{-1}\big( E[v(X)] \big),
\]
i.e. the certainty equivalent under $u$ is at most the certainty equivalent under $v$.
Proposition 6.11. Assume that $u$ and $v$ are twice differentiable utility functions defined on the same interval $I \subset \mathbb{R}$. The following are equivalent:

(i) $u$ is more risk-averse than $v$;

(ii) $\rho^u_{ARA}(x) \ge \rho^v_{ARA}(x)$ for all $x \in I$.

Proof. The direction "⇒" follows by the same argument that was used to prove claim (6.9) from (6.8) in the proof of Proposition 6.8. For the direction "⇐" we consider the function $u(v^{-1}(\cdot))$ on $v(I)$. This is a strictly increasing function because $u$ and $v$ are utility functions, see the proof of Proposition 6.8. The latter proof also implies
\[
\frac{d^2}{dx^2}\, u(v^{-1}(x)) = \frac{u'(v^{-1}(x))}{(v'(v^{-1}(x)))^2} \left( \frac{u''(v^{-1}(x))}{u'(v^{-1}(x))} - \frac{v''(v^{-1}(x))}{v'(v^{-1}(x))} \right)
= \frac{u'(v^{-1}(x))}{(v'(v^{-1}(x)))^2} \left( \rho^v_{ARA}(v^{-1}(x)) - \rho^u_{ARA}(v^{-1}(x)) \right) \le 0
\]
for all $x \in v(I)$. Hence $u(v^{-1}(\cdot))$ is concave on $v(I)$, and the proof then follows similarly to (6.10). □

As a consequence we obtain the following statement for utility indifference prices: if $\rho^u_{ARA}(x) \ge \rho^v_{ARA}(x)$ for all $x \in I$, then $\pi(u, S, c_0) \ge \pi(v, S, c_0)$.

Proof. $\rho^u_{ARA}(x) \ge \rho^v_{ARA}(x)$ for all $x \in I$ implies, by Proposition 6.11,
\[
u^{-1}\big( E[u(c_0 + \pi - S)] \big) \le v^{-1}\big( E[v(c_0 + \pi - S)] \big).
\]
Since both $v^{-1}$ and $v$ are strictly increasing we see that $\pi(u, S, c_0) \ge \pi(v, S, c_0)$. □
The last corollary also explains that the price elasticity interval (6.5) becomes narrower for decreasing risk-aversion.
Theorem 6.13. Assume $u$ is a three times differentiable utility function. The following are equivalent:

• $\pi(u, S, c_0)$ is decreasing in $c_0$ for all $S$;

• the absolute risk-aversion $\rho^u_{ARA}(x)$ is decreasing in $x$.

Proof of Theorem 6.13. In complete analogy to the proof of Proposition 6.8 we obtain for the direction "⇒" (calculating derivatives w.r.t. $c_0$, with $v = -u'$)
\[
-\frac{u'''(x)}{u''(x)} = -\frac{v''(x)}{v'(x)} \ge -\frac{u''(x)}{u'(x)},
\]
and thus
\[
\frac{d}{dx}\, \rho^u_{ARA}(x) = -\frac{d}{dx}\, \frac{u''(x)}{u'(x)} = -\frac{u'''(x)}{u'(x)} + \frac{(u''(x))^2}{(u'(x))^2} = -\frac{u''(x)}{u'(x)} \left( \frac{u'''(x)}{u''(x)} - \frac{u''(x)}{u'(x)} \right) \le 0.
\]
This proves the first direction of the equivalence. The direction "⇐" is obtained by reading the argument in the reverse direction (all the statements are equivalences). □
Example 6.14 (power utility function). The power utility function (6.7) with risk-aversion parameter $\gamma > 0$ satisfies for all $x \in \mathbb{R}_+$
\[
\rho^u_{ARA}(x) = \frac{\gamma}{x},
\]
which is strictly decreasing in $x$, consistent with the decreasing price tolerance observed in Example 6.4.
Choose a random variable $S \sim F$ with finite first moment given by
\[
E[S] = \int_{\mathbb{R}} s\, dF(s).
\]
The Esscher transform $F_\alpha$ of $F$ is defined as follows:
\[
F_\alpha(s) = \frac{1}{M_S(\alpha)} \int_{-\infty}^s e^{\alpha x}\, dF(x),
\]
under the additional assumption that the moment generating function $M_S(\alpha)$ of $S$ exists in $\alpha$. Note that this defines a (normalized) probability measure $F_\alpha$.
Definition 6.15 (Esscher premium). Choose $S \sim F$ and assume that there exists $r_0 > 0$ such that $M_S(r) < \infty$ for all $r \in (-r_0, r_0)$. The Esscher premium $\pi_\alpha$ of $S$ in $\alpha \in (0, r_0)$ is defined by
\[
\pi_\alpha = E_\alpha[S] = \int_{\mathbb{R}} s\, dF_\alpha(s).
\]
The Esscher premium satisfies
\[
\pi_\alpha = \frac{d}{dr}\, \log M_S(r)\, \Big|_{r=\alpha} \ge E[S],
\]
where the inequality is strict for non-deterministic $S$. Indeed,
\[
\pi_\alpha = \frac{1}{M_S(\alpha)} \int_{\mathbb{R}} s\, e^{\alpha s}\, dF(s) = \frac{M_S'(\alpha)}{M_S(\alpha)} = \frac{d}{dr}\, \log M_S(r)\, \Big|_{r=\alpha}.
\]
Example 6.17 (Esscher premium for Gaussian distributions). Choose $\alpha > 0$ and assume that $S \sim \mathcal{N}(\mu, \sigma^2)$. Then we have
\[
\pi_\alpha = \frac{d}{dr}\, \log M_S(r)\, \Big|_{r=\alpha} = \mu + \alpha \sigma^2 > \mu = E[S].
\]
In the Gaussian case we obtain the variance loading. Thus, the variance loading, the exponential utility function and the Esscher premium principles provide exactly the same insurance premium in the Gaussian case.
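The identity $\pi_\alpha = \frac{d}{dr}\log M_S(r)|_{r=\alpha}$ can be evaluated numerically from the log-MGF alone. A sketch for the Gaussian case with illustrative parameters (not from the text), differentiating by a central difference:

```python
mu, sigma, a = 50.0, 10.0, 0.02     # illustrative Gaussian parameters and alpha

def log_mgf(r):
    # Gaussian log-MGF: log M_S(r) = mu * r + sigma^2 * r^2 / 2
    return mu * r + 0.5 * sigma ** 2 * r ** 2

# Esscher premium = derivative of the log-MGF at r = a (central finite difference)
h = 1e-6
esscher = (log_mgf(a + h) - log_mgf(a - h)) / (2.0 * h)

print(esscher, mu + a * sigma ** 2)   # both equal mu + alpha * sigma^2 = 52
```

The finite-difference derivative reproduces the closed form $\mu + \alpha\sigma^2$ of Example 6.17, illustrating that the Esscher premium only needs the moment generating function.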
Conclusions.

• The Esscher premium can easily be calculated from the moment generating function $M_S(r)$.

• The Esscher premium can only be calculated for light tailed claims, see also Section 5.2 on the Lundberg coefficient. To more heavy tailed claims the Esscher premium reacts so sensitively that it becomes infinite. In the next section we study probability distortion principles that allow for heavier tails in premium calculation while still leading to a finite premium.
In the present section we look at probability distortions from a different angle which will allow for more flexibility. Assume that $S \sim F$ with $S \ge 0$, $\mathbb{P}$-a.s.; then, using integration by parts, the expected claim is calculated as
\[
E[S] = \int_0^\infty x\, dF(x) = \int_0^\infty \mathbb{P}[S > x]\, dx.
\]
In this section we distort the survival function $\mathbb{P}[S > x]$. Therefore, we introduce a distortion function $h: [0,1] \to [0,1]$, which is a continuous, increasing and concave function with $h(0) = 0$ and $h(1) = 1$; in Figure 6.2 we give two examples.
Figure 6.2: Distortion functions h of Examples 6.19 and 6.20, below, with γ = 1/2
and q = 0.1, respectively.
• $h(p)$ distorts the probability $p$, with $h(p) \ge p$ because $h$ is concave with $h(0) = 0$ and $h(1) = 1$.

• The concavity of $h$ reflects risk aversion, similar to the utility functions used in Section 6.2.1.

• Note that the existence of $p \in (0,1)$ with $h(p) > p$ implies that $h(p) > p$ for all $p \in (0,1)$. Therefore, under strict risk-aversion we assume that $h(p) > p$ for all $p \in (0,1)$.
The probability distorted price of $S$ is then defined by
\[
\pi_h = \int_0^\infty h\big( \mathbb{P}[S > x] \big)\, dx.
\]
Remarks.

• Similar to the Esscher premium we modify the probability distribution function of the claims $S$ (in contrast to the utility theory approach where we modify the claim sizes).

• The probability distortion approach is a technique to construct coherent risk measures for bounded random variables. For a detailed outline we refer to Freddy Delbaen [31], in particular to the corresponding Example 4.7 and Corollary 7.6, which relate convex games to coherent risk measures.
Example 6.19 (power distortion). Choose a claim $S \sim \text{Pareto}(\theta, \alpha)$ with $\alpha > 1$ and $\theta > 0$, and the power probability distortion function, see Example 4.5 in Delbaen [31] and Figure 6.2,
\[
h(x) = x^\gamma \qquad \text{for some } \gamma \in (0, 1]. \tag{6.11}
\]
Exercise 10. Choose power distortion function (6.11). Calculate the probability
distorted price of S ∼ Γ(1, c) and of S ∼ Bernoulli(p).
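The distorted price $\pi_h = \int_0^\infty h(\mathbb{P}[S > x])\, dx$ can always be approximated by numerical integration. A sketch for the Pareto/power-distortion case, assuming the Pareto parametrization with survival function $(x/\theta)^{-\alpha}$ for $x \ge \theta$ and $h(p) = p^\gamma$; under these conventions a direct integration (not from the text) gives the closed form $\theta\gamma\alpha/(\gamma\alpha - 1)$ whenever $\gamma\alpha > 1$:

```python
# Numerically evaluate pi_h = integral of h(P[S > x]) dx for a power distortion
# h(p) = p^gamma and a Pareto claim with survival (x/theta)^(-alpha), x >= theta.
theta, alpha, gamma = 1.0, 2.0, 0.75    # illustrative; need gamma * alpha > 1

def survival(x):
    return 1.0 if x < theta else (x / theta) ** (-alpha)

def h(p):
    return p ** gamma

# simple midpoint rule on a large truncated interval
n, upper = 200_000, 10_000.0
dx = upper / n
pi_h = sum(h(survival((i + 0.5) * dx)) * dx for i in range(n))

pi_exact = theta * gamma * alpha / (gamma * alpha - 1.0)   # derived closed form
print(f"numerical: {pi_h:.3f}, closed form: {pi_exact:.3f}")
```

Note how the distortion makes the effective tail index $\gamma\alpha$: for $\gamma\alpha \le 1$ the distorted premium is infinite, even though $E[S]$ itself is finite for $\alpha > 1$.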
Example 6.20 (expected shortfall distortion). For a fixed $q \in (0,1)$ define the distortion function, see Figure 6.2,
\[
h(x) = \begin{cases} x/q & \text{for } x \in [0, q], \\ 1 & \text{otherwise.} \end{cases} \tag{6.12}
\]
The generalized inverse of $F$ is given by
\[
F^{\leftarrow}(\alpha) = \inf\{x \in \mathbb{R};\; F(x) \ge \alpha\}.
\]
For simplicity we assume that $F$ is continuous and strictly increasing. This simplifies considerations because then also $F^{\leftarrow}$ is continuous and strictly increasing with $F(F^{\leftarrow}(\alpha)) = \alpha$ and $F^{\leftarrow}(F(x)) = x$, see Chapter 1 (the strictly increasing property of $F$ would not be necessary for getting the full flavor of this example). Consider the survival function of $S$ given by $\bar{F}(x) = 1 - F(x) = \mathbb{P}[S > x]$. Note that under our assumptions $h(\bar{F}(x)) = 1$ exactly on $\{x < F^{\leftarrow}(1-q)\}$, so that
\[
\pi_h = \int_0^\infty h\big( \mathbb{P}[S > x] \big)\, dx = F^{\leftarrow}(1-q) + \frac{1}{q} \int_{F^{\leftarrow}(1-q)}^\infty \mathbb{P}[S > x]\, dx.
\]
The latter is exactly the so-called Tail-Value-at-Risk (TVaR) or the conditional tail expectation (CTE) of the random variable $S$ at the $1-q$ security level. Moreover, $F^{\leftarrow}(1-q)$ is the Value-at-Risk (VaR) of the random variable $S$ at the $1-q$ security level. The continuity of $F$ implies that this TVaR is equal to the expected shortfall (ES) of $S$ at the security level $1-q$, that is,
\[
\pi_h = E\left[ S \,\middle|\, S \ge F^{\leftarrow}(1-q) \right] = \frac{1}{q} \int_{1-q}^1 F^{\leftarrow}(u)\, du = \mathrm{ES}_{1-q}(S),
\]
see Artzner et al. [5, 6], Acerbi-Tasche [1] and Lemma 2.16 in McNeil et al. [68]. The proof again uses the fact that for continuous distribution functions $F$ we have $F(F^{\leftarrow}(\alpha)) = \alpha$; then the left-hand side of the above statement can be obtained by a change of variables from the right-hand side.
We conclude that under continuity assumptions the risk measure $\mathrm{ES}_{1-q}(S)$ can be obtained via the probability distortion (6.12), and following Delbaen [31] it is therefore coherent.
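The three representations above (conditional tail expectation, quantile integral, distorted premium) all describe the same number. A minimal check for a Gaussian claim, using the standard closed form $\mathrm{ES}_{1-q}(S) = \mu + \sigma\,\varphi(\Phi^{-1}(1-q))/q$ (a well-known identity, not derived in the text) against a plain Monte Carlo tail average; all parameters are illustrative:

```python
import math
import random
from statistics import NormalDist

random.seed(3)
mu, sigma, q = 0.0, 1.0, 0.01        # security level 1 - q = 99%

z = NormalDist().inv_cdf(1.0 - q)    # standard normal quantile (= VaR for N(0,1))
phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
es_exact = mu + sigma * phi / q      # closed-form expected shortfall

# Monte Carlo: average of the q-fraction largest of n simulated outcomes
n = 400_000
sample = sorted(random.gauss(mu, sigma) for _ in range(n))
k = int(n * q)
es_mc = sum(sample[-k:]) / k

print(f"exact ES: {es_exact:.3f}, Monte Carlo ES: {es_mc:.3f}")
```

The empirical tail average of the worst 1% of outcomes converges to the analytical expected shortfall, confirming the TVaR/ES identity for continuous distributions.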
Exercise 11. Choose probability distortion (6.12) for $q = 1\%$ and calculate the probability distorted price for

• $S \sim \mathrm{LN}(\mu, \sigma^2)$,

• $S \sim \text{Pareto}(\theta, \alpha)$ with $\alpha > 1$,

• $S_n = \sum_{i=1}^n Y_i$ with $Y_i \overset{\text{i.i.d.}}{\sim} \Gamma(1,1)$, and study the diversification benefit (note that we consider losses here).
Remarks.
• If the risk measure % is the regulatory risk measure then %(X) ∈ R reflects the
necessary risk bearing capital that needs to be available within the insurance
company to run business X. This is the minimal equity the insurance com-
pany needs to hold to balance possible shortfalls in the insurance portfolio.
• For having a “good” risk measure one requires additional properties for %
such as monotonicity, coherence, etc. This is described below.
• The most commonly used risk measures are: the variance, the Value-at-Risk (VaR) and the expected shortfall (ES), the latter already met in Example 6.20. We further discuss them below.
Assume a (regulatory) risk measure $\varrho: \mathcal{X} \to \mathbb{R}$, $X \mapsto \varrho(X)$, is given. We would like to price an insurance portfolio $S$ under the assumption $X = S - E[S] \in \mathcal{X}$. The regulatory capital requirement then prescribes that the insurance company needs to hold at least risk bearing capital $\varrho(S - E[S])$. This risk bearing capital quantifies the necessary financial strength of the insurance company so that it is able to finance shortfalls beyond $E[S]$ exactly up to the amount $\varrho(S - E[S])$.

We assume $\varrho(S - E[S]) \ge 0$ (which is going to be justified below). Then the insurance company looks for shareholders that are willing to provide this risk bearing capital $\varrho(S - E[S]) > 0$. The shareholders will provide this capital as soon as the promised expected return on this (invested) capital is sufficiently high. We call the expected rate of return on this shareholder capital the cost-of-capital rate $r_{CoC} > 0$. Thus, the shareholders'/investors' expected return is
\[
r_{CoC}\; \varrho(S - E[S]).
\]

Interpretation.

• For outcomes $S \le E[S]$: the claim can be financed by the pure risk premium $E[S]$ alone.

• For outcomes $S > E[S]$: the pure risk premium $E[S]$ is not sufficient and the shortfall $S - E[S] > 0$ needs to be paid from $\varrho(S - E[S])$. Thus, the investors' capital $\varrho(S - E[S])$ is at risk, and they may lose (part of) it. Therefore, they will ask for a cost-of-capital rate
\[
r_{CoC} > r_0,
\]
where $r_0$ denotes the risk-free rate (which they would receive on a risk-free bank account with the same time to maturity as the investment).
Then we state the following axioms for risk measures $\varrho$ on $\mathcal{X}$.

Axioms 6.22 (axioms for risk measures $\varrho$). Assume $\varrho$ is a risk measure on the convex cone $\mathcal{X}$ containing $\mathbb{R}$. Then we define for $X, Y \in \mathcal{X}$, $c \in \mathbb{R}$ and $\lambda > 0$:

(a) normalization: $\varrho(0) = 0$;

(b) monotonicity: for all $X, Y$ with $X \le Y$, $\mathbb{P}$-a.s., we have $\varrho(X) \le \varrho(Y)$;

(c) translation invariance: for all $X$ and every $c$ we have $\varrho(X + c) = \varrho(X) + c$;

(d) positive homogeneity: for all $X$ and for every $\lambda > 0$ we have $\varrho(\lambda X) = \lambda\, \varrho(X)$;

(e) subadditivity: for all $X, Y$ we have $\varrho(X + Y) \le \varrho(X) + \varrho(Y)$.

Observe that some of the axioms imply others, e.g. positive homogeneity implies normalization since $\varrho(0) = \varrho(\lambda 0) = \lambda\, \varrho(0)$ for all $\lambda > 0$ immediately gives $\varrho(0) = 0$.
Assume that the insurance company charges the premium $\pi$ and at the same time it has initial capital $c_0 = \varrho(S - E[S]) \ge 0$. Then the future surplus of the company is given by $C_1 = c_0 + \pi - S$. The regulator then checks the acceptability condition (6.13). In addition, the insurance company needs to finance the cost-of-capital cash flow $r_{CoC}\, c_0 = r_{CoC}\, \varrho(S - E[S])$ to the investors. This can exactly be done with the cost-of-capital premium $\pi_{CoC}$, and the insurance company keeps its acceptable position in (6.13) if $r_{CoC}\, c_0$ is also considered as a liability of the insurance company.
Monotonicity and normalization imply that more risky positions are charged
with higher capital requirements and, in particular, if we have only downside risks,
i.e. X ≥ 0, P-a.s., then we will have positive capital charges %(X) ≥ %(0) = 0.
Definition 6.23 (coherent risk measure). The risk measure $\varrho$ is called coherent if it satisfies Axioms 6.22.
Example (standard deviation risk measure). Define, for a given parameter $\alpha > 0$,
\[
\varrho(X) = \alpha\, \mathrm{Var}(X)^{1/2}.
\]
This risk measure is normalized, positively homogeneous and subadditive. But it is neither translation invariant nor monotone. Note that for the standard deviation risk measure the cost-of-capital pricing principle coincides with the standard deviation loading principle presented in Section 6.1.
Example 6.26 (expected shortfall). The expected shortfall has already been introduced in Example 6.20, where we have stated that the expected shortfall is equal to the TVaR for continuous distribution functions $F$. Instead of introducing it via probability distortion functions we can also define it directly. Assume that $S \sim F$ with $F$ continuous. Then we have
\[
\varrho(S) = \mathrm{TVaR}_{1-q}(S) = E\left[ S \,\middle|\, S \ge \mathrm{VaR}_{1-q}(S) \right] = \frac{1}{q} \int_{1-q}^1 \mathrm{VaR}_u(S)\, du = \mathrm{ES}_{1-q}(S).
\]
$\mathrm{ES}_{1-q}(S)$ is a coherent risk measure on $L^1(\Omega, \mathcal{F}, \mathbb{P})$. The cost-of-capital pricing principle is then given by
\[
\pi_{CoC} = E[S] + r_{CoC}\, \mathrm{ES}_{1-q}(S - E[S]).
\]
This cost-of-capital pricing principle can also be obtained with probability distortion functions: choose $h$ as in Example 6.20 and define the distortion function $\tilde{h}: [0,1] \to [0,1]$ by
\[
\tilde{h}(x) = (1 - r_{CoC})\, x + r_{CoC}\, h(x).
\]
Remarks.
• The Swiss Solvency Test considers ES1−q (S − E[S]) for 1 − q = 99% as the
regulatory risk measure.
• For rCoC one often sets 6% above the risk-free rate. However, this is a heavily
debated number because in stress periods this rate should probably be higher.
Figure 6.3: Distortion functions $h$ of Example 6.20 (expected shortfall) and corresponding $\tilde{h}$ for the expected shortfall cost-of-capital loading.
Exercise 12. Assume that $S \sim \mathcal{N}(\mu, \sigma^2)$ has a Gaussian distribution. Choose $1-q = 99\%$ and $r_{CoC} = 6\%$. The cost-of-capital pricing principle for the expected shortfall risk measure gives the price
\[
\pi = \mu + r_{CoC}\, \frac{\sigma}{q}\, \frac{1}{\sqrt{2\pi}}\, \exp\left\{ -\frac{1}{2} \left( \Phi^{-1}(1-q) \right)^2 \right\}.
\]
capital insurance price is the same as for the expected shortfall risk measure.
(c) Calibrate the standard deviation risk measure loading parameter $\alpha > 0$ such that the price is the same as for the expected shortfall risk measure.
Remark. This parameter calibration only holds true under the Gaussian model
assumption.
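The displayed price formula in Exercise 12 is easy to evaluate numerically. A sketch with illustrative parameters $\mu = 100$, $\sigma = 25$ (not from the text), using the stated $1-q = 99\%$ and $r_{CoC} = 6\%$:

```python
import math
from statistics import NormalDist

mu, sigma = 100.0, 25.0              # illustrative Gaussian parameters
q, r_coc = 0.01, 0.06                # 99% security level, 6% cost-of-capital rate

z = NormalDist().inv_cdf(1.0 - q)    # standard normal quantile Phi^{-1}(1 - q)
loading = r_coc * (sigma / q) * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
pi = mu + loading

print(f"cost-of-capital premium: {pi:.2f} (loading {loading:.2f} above the mean)")
```

The loading equals $r_{CoC}$ times the expected shortfall of $S - E[S]$, so even a 6% cost-of-capital rate produces only a modest premium above the pure risk premium $\mu$ for a Gaussian risk.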
Assume that $\varphi$ is an integrable and strictly positive random variable with
\[
E[\varphi] = d_0 = \frac{1}{1 + r_0} \in (0, 1].
\]
Then, $d_0$ can be seen as a deterministic discount factor and $r_0 \ge 0$ can be seen as a deterministic risk-free rate. This is the general version of a deflator $\varphi$. To make deflator pricing comparable to the previously introduced pricing principles we assume that $d_0 = 1$, i.e. no time values are added to cash flows.
Fix ϕ ∈ L1 (Ω, F, P) strictly positive with d0 = 1 and assume that ϕ and S are
tes
positively correlated. Then we can define the deflator based price by
Thus, all random variables S which are positively correlated with ϕ receive a posi-
tive premium loading. The next example shows that this is a generalization of the
Esscher premium, or more generally, it can be understood as a probability distor-
tion principle because ϕ allows to define the equivalent probability measure P∗ by
the Radon-Nikodym derivative as follows
\frac{dP^*}{dP} = \varphi,
because ϕ is a strictly positive density w.r.t. P for d0 = 1. Then, we price S under the equivalent probability measure P∗ by π_ϕ = E∗[S] = E[ϕ S]. Choosing the deflator ϕ = e^{αS}/E[e^{αS}] provides the price E[S e^{αS}]/E[e^{αS}],
which is exactly the Esscher premium πα , and P∗ is the Esscher measure corre-
sponding to Fα , see Section 6.2.2.
Random variables S that are positively correlated with the deflator ϕ receive a positive premium loading. Moreover, this deflator approach also allows for stochastic discounting by choosing a deflator ϕ with E[ϕ] ∈ (0, 1), and generalizations to multiperiod problems are easily possible and straightforward. For more details we refer to Wüthrich-Merz [88] and Wüthrich et al. [86].
\pi_\varphi^{(0)} = E\left[ \left( (1 - r_{CoC}) + \frac{r_{CoC}}{q}\, 1_{\{S \ge VaR_{1-q}(S)\}} \right) S \right]
= (1 - r_{CoC})\, E[S] + \frac{r_{CoC}}{q}\, E\left[ 1_{\{S \ge VaR_{1-q}(S)\}}\, S \right]
= (1 - r_{CoC})\, E[S] + r_{CoC}\, E\left[ S \,\middle|\, S \ge VaR_{1-q}(S) \right]
= E[S] + r_{CoC} \left( E\left[ S \,\middle|\, S \ge VaR_{1-q}(S) \right] - E[S] \right)
= E[S] + r_{CoC}\, ES_{1-q}(S - E[S]).
We conclude that we exactly obtain the cost-of-capital loading principle with ex-
pected shortfall as risk measure, see Example 6.26.
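This identity can also be checked by Monte Carlo simulation; the sketch below uses simulated standard Gaussian claims (sample size and seed are arbitrary choices of ours):

```python
import random
from math import exp, pi, sqrt
from statistics import NormalDist, mean

random.seed(1)
q, r_coc = 0.01, 0.06
n = 200_000
s = [random.gauss(0.0, 1.0) for _ in range(n)]

# empirical VaR_{1-q}(S): the (1-q)-quantile of the sample
var_1q = sorted(s)[int((1 - q) * n)]

# deflator phi^{(0)} = (1 - r_CoC) + (r_CoC / q) 1{S >= VaR_{1-q}(S)}, price = E[phi S]
price_deflator = mean(((1 - r_coc) + (r_coc / q) * (x >= var_1q)) * x for x in s)

# closed-form cost-of-capital price E[S] + r_CoC ES_{1-q}(S - E[S]) for S ~ N(0, 1)
z = NormalDist().inv_cdf(1 - q)
price_exact = r_coc * exp(-z * z / 2) / (q * sqrt(2 * pi))
```

Both prices agree up to Monte Carlo error.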
Chapter 7. Tariffication and Generalized Linear Models
Assume we have v ∈ N insurance policies denoted by l = 1, . . . , v. These insurance
policies should be sufficiently similar such that we have a homogeneous insurance
portfolio to which the law of large numbers (LLN) applies, see (1.1). The ideal
case of i.i.d. risks justifies the charge of the same premium to every policy. If
there is no perfect homogeneity (and there never is) then there are two different possibilities of charging a premium: (a) everyone pays the same premium which
reflects more the aspect of social insurance, where one tries to achieve a balance
between the rich and the poor; (b) the individual premium should reflect the quality
of the specific insurance policy, i.e. we try to calculate risk adjusted premia. In
the present chapter we try to achieve (b). We explain this with the compound Poisson model at hand. The aggregation and the disjoint decomposition properties of the compound Poisson model S ∼ CompPoi(λv, G), see Theorems 2.12 and 2.14,
suggest the consideration of the following identity
S = \sum_{i=1}^{N} Y_i = \sum_{l=1}^{v} \sum_{i=1}^{N^{(l)}} Y_i^{(l)} = \sum_{l=1}^{v} S_l,

with E[S_l] = \mu\, \chi^{(l)},

where µ = E[S]/v = λE[Y1] is the average claim over all policies and χ(l) > 0
reflects the risk characteristics of policy l = 1, . . . , v. This means that in the case
of heterogeneity we should determine the risk characteristics χ(l) for every policy l
to obtain risk adjusted premia because these risk characteristics may differ.
For this exposition we assume that we have only two tariff criteria, i.e. K = 2, and
we would like to set up a multiplicative tariff.
The generalization to K > 2 is then straightforward.
Assume we have K = 2 tariff criteria. The first criterion has i ∈ {1, . . . , I} risk
characteristics and the second criterion has j ∈ {1, . . . , J} risk characteristics.
Thus, we have I · J different risk classes, see Table 7.1.
            1   ···   j   ···   J
      1
      .
      i          risk classes (i, j)
      .
      I

Table 7.1: risk classes (i, j), i = 1, . . . , I, j = 1, . . . , J.
We assume that policy l belongs to risk class (i, j), write χ(l) = χ(i,j) . This provides
E[S] = \mu \sum_{i,j} v_{i,j}\, \chi^{(i,j)},
where vi,j denotes the number of policies belonging to risk class (i, j). Our aim is to set up a multiplicative tariff structure

\chi^{(i,j)} = \chi_{1,i}\, \chi_{2,j}. \qquad (7.1)
Observe that the 1st tariff criterion is continuous, but typically it is discretized for
having finitely many risk characteristics, see Table 7.2 for an example.
yearly km     χ1,·  |  χ2,·:  1.2   1.1   1.0   0.9   0.8   0.7   0.5
0-10’000       0.8
10-15’000      0.9
15-20’000      1.0
20-25’000      1.1  |  e.g. χ(4,5) = χ1,4 · χ2,5 = 1.1 · 0.8 = 0.88
25’000+        1.2

Table 7.2: example of a discretized multiplicative tariff structure.
homogeneous risk classes. These risk classes are then priced by choosing appropri-
ate multiplicative pricing factors χk,lk .
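With the factors of Table 7.2 the tariff lookup is just a product of the two factors; a minimal sketch (the function name is ours):

```python
# Tariff factors chi_{1,i} (yearly km classes) and chi_{2,j} from Table 7.2
chi1 = [0.8, 0.9, 1.0, 1.1, 1.2]
chi2 = [1.2, 1.1, 1.0, 0.9, 0.8, 0.7, 0.5]

def tariff_factor(i, j):
    """Risk characteristic chi^{(i,j)} = chi_{1,i} * chi_{2,j} (1-based indices)."""
    return chi1[i - 1] * chi2[j - 1]
```

`tariff_factor(4, 5)` reproduces the table's example χ(4,5) = 1.1 · 0.8 = 0.88.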
Remarks.
• A prior choice of tariff criteria should be done using expert opinion. Statistical analysis should then select as few significant criteria as possible. However,
also market specifications of competitors are important to avoid adverse se-
lection.
• Related to the first item: the aim should be to build homogeneous risk classes
of sufficient volume such that LLN applies and we get statistical significance.
Assume we have two tariff criteria i and j which give I · J risk classes. Our aim
is to find appropriate multiplicative pricing factors χ1,i , i ∈ {1, . . . , I}, and χ2,j ,
j ∈ {1, . . . , J}, which describe the risk classes (i, j) according to the multiplicative
tariff structure (7.1).
We define by Si,j the total claim of risk class (i, j) and by vi,j the corresponding
volume with
\sum_{i,j} v_{i,j} = v \qquad \text{and} \qquad \sum_{i,j} S_{i,j} = S.
The multiplicative tariff structure (7.1) provides

E[S_{i,j}] = v_{i,j}\, \mu\, \chi_{1,i}\, \chi_{2,j},

where µ = λE[Y1] is the average claim per policy over the whole portfolio v, i.e. E[S] = vµ.
The method of Bailey & Simon determines the parameters µ, χ1,i and χ2,j by minimizing the test statistic

X^2 = \sum_{i,j} \frac{\left( S_{i,j} - v_{i,j}\, \mu\, \chi_{1,i}\, \chi_{2,j} \right)^2}{v_{i,j}\, \mu\, \chi_{1,i}\, \chi_{2,j}}. \qquad (7.2)
This can either be done by first summing over rows i or columns j. Note that \widehat{\chi}_{2,j} is found by

\widehat{\chi}_{2,j} = \left( \frac{ \sum_i S_{i,j}^2 / (v_{i,j}\, \widehat{\mu}\, \widehat{\chi}_{1,i}) }{ \sum_i v_{i,j}\, \widehat{\mu}\, \widehat{\chi}_{1,i} } \right)^{1/2}.
This provides

\sum_i v_{i,j}\, \widehat{\mu}\, \widehat{\chi}_{1,i}\, \widehat{\chi}_{2,j} = \left( \sum_i v_{i,j}\, \widehat{\mu}\, \widehat{\chi}_{1,i} \right)^{1/2} \left( \sum_i \frac{S_{i,j}^2}{v_{i,j}\, \widehat{\mu}\, \widehat{\chi}_{1,i}} \right)^{1/2}.
Next we apply the Schwarz inequality to the terms on the right-hand side which provides the following lower bound

\sum_i v_{i,j}\, \widehat{\mu}\, \widehat{\chi}_{1,i}\, \widehat{\chi}_{2,j} \;\ge\; \sum_i \left( v_{i,j}\, \widehat{\mu}\, \widehat{\chi}_{1,i} \right)^{1/2} \left( \frac{S_{i,j}^2}{v_{i,j}\, \widehat{\mu}\, \widehat{\chi}_{1,i}} \right)^{1/2} = \sum_i S_{i,j}.
Example 7.3 (method of Bailey & Simon). We choose an example with two tariff criteria. The first one specifies whether the car is owned or leased, the second one specifies the age of the driver. For simplicity we set vi,j ≡ 1 and we aim to determine the tariff factors µ, χ1,i and χ2,j . The method of Bailey & Simon then requires minimization of the test statistic X² given in (7.2).
Note that we need to initialize the estimators for obtaining a unique solution. We
set µb = 1 and χb1,1 = 1. The observations Si,j are given by, see also Figure 7.1,
[Figure 7.1: scatter plot of the observed claim amounts Si,j per age class; L = leased, O = owned.]
In this example we have a (systematic) positive bias as stated in Lemma 7.2, i.e. the fitted marginal sums \sum_i \widehat{\mu}\, \widehat{\chi}_{1,i}\, \widehat{\chi}_{2,j} exceed the observed marginal sums \sum_i S_{i,j}.
The method of total marginal sums avoids the systematic positive bias of the previous method, see Lemma 7.2 and Example 7.3. But it is still a simple method that is not directly motivated by a stochastic model; however, we will see below that it has its groundings in a stochastic model. It imposes unbiasedness of rows and columns by definition: Choose µ, χ1,i and χ2,j > 0 such that the rows i and columns j satisfy
\sum_{j=1}^{J} v_{i,j}\, \mu\, \chi_{1,i}\, \chi_{2,j} = \sum_{j=1}^{J} S_{i,j}, \qquad (7.3)

\sum_{i=1}^{I} v_{i,j}\, \mu\, \chi_{1,i}\, \chi_{2,j} = \sum_{i=1}^{I} S_{i,j}. \qquad (7.4)
Remarks.
• Both the method of Bailey & Simon and the method of Bailey & Jung are
rather pragmatic methods because they are not directly based on a stochastic
model. Therefore, in the remainder of this chapter we are going to describe
more sophisticated methods from a probabilistic point of view.
Example 7.4 (method of Bailey & Jung, method of total marginal sums). We
revisit the data of Example 7.3. This time we determine the parameters by solving
the system (7.3)-(7.4). This needs to be done numerically and provides the following
multiplicative tariff structure:
           21-30y   31-40y   41-50y   51-60y   χ̂1,i
owned        1375     1108     1020     1197   1.0000
leased       1725     1392     1280     1503   1.2553
χ̂2,j         1375     1108     1020     1197
We conclude that both methods give similar results for this example.
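The marginal sum conditions (7.3)-(7.4) can be solved numerically by simply alternating between them. The sketch below uses hypothetical observations (the data of Example 7.3 are not reproduced here), vi,j ≡ 1 and the normalization µ̂ = 1, χ̂1,1 = 1:

```python
# HYPOTHETICAL observations S_{i,j}; rows: owned/leased, columns: age classes
S = [
    [1500, 1100, 1000, 1300],   # owned
    [1900, 1400, 1200, 1600],   # leased
]
I, J = len(S), len(S[0])

chi1 = [1.0] * I
chi2 = [1.0] * J
for _ in range(100):  # alternate the marginal sum conditions (7.3)-(7.4)
    chi1 = [sum(S[i]) / sum(chi2) for i in range(I)]
    chi2 = [sum(S[i][j] for i in range(I)) / sum(chi1) for j in range(J)]

# normalization chi_{1,1} = 1 (rescaling leaves the fitted values unchanged)
scale = chi1[0]
chi1 = [c / scale for c in chi1]
chi2 = [c * scale for c in chi2]
```

For vi,j ≡ 1 the iteration converges to the closed-form fit (row sum) × (column sum) / (total sum), so both marginal conditions hold exactly.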
Combining these two items implies that we want to consider the following model

X_{i,j} := \log R_{i,j} \sim N\left( \beta_0 + \beta_{1,i} + \beta_{2,j},\, \sigma^2 \right).
Thus, taking logarithms may turn the multiplicative tariff structure into an additive
structure. If this logarithm Xi,j of Ri,j has a Gaussian distribution we have nice
mathematical properties. Therefore, we assume a log-normal distribution for Ri,j
which hopefully gives a good approximation to the true tariffication problem. These
choices imply for the first two moments

E[R_{i,j}] = e^{\beta_0 + \sigma^2/2}\, e^{\beta_{1,i}}\, e^{\beta_{2,j}} \qquad \text{and} \qquad \mathrm{Var}(R_{i,j}) = E[R_{i,j}]^2 \left( e^{\sigma^2} - 1 \right).

Observe that the mean has the right multiplicative structure, set µ = e^{β0 + σ²/2}, χ1,i = e^{β1,i} and χ2,j = e^{β2,j}. However, the distributional properties are rather
different from compound models. Nevertheless, this log-linear Gaussian structure
is often used for tariffication because of its nice mathematical structure and because
popular statistical methods can be applied.
Set M = I · J and define for Xi,j = log Ri,j = log(Si,j /vi,j ) the vector
X = (X1 , . . . , XM )0 = (X1,1 , . . . , X1,J , . . . , XI,1 , . . . , XI,J )0 ∈ RM . (7.5)
Note that we change the enumeration of the observations because this is going to be simpler in the sequel. Index m always refers to the risk class (i, j) via m = m(i, j) = (i − 1) J + j. (7.6)
X ∼ N (Zβ, Σ) , (7.7)
Throughout we assume that Z has full rank. We initialize β1,1 = β2,1 = 0 and
β0 plays the role of the intercept. At the moment the weights wm do not have a natural meaning; often one sets w_m = v_{i,j}^{-1} (inverse proportional to the underlying volume).
In view of Example 7.3 this gives the following table where the “1”s show to which
class the observations belong to:
This table needs to be turned into the appropriate form so that it fits to (7.7).
Therefore we need to drop the columns “owned” and “21-30y” because of the
chosen normalization β1,1 = β2,1 = 0. This provides the following table:
intercept  leased  31-40y  41-50y  51-60y

Z\beta = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_{1,2} \\ \beta_{2,2} \\ \beta_{2,3} \\ \beta_{2,4} \end{pmatrix}
f(x) = \frac{1}{(2\pi)^{M/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - Z\beta)' \Sigma^{-1} (x - Z\beta) \right\}.
\widehat{\beta}^{\mathrm{MLE}} = \left( Z' \Sigma^{-1} Z \right)^{-1} Z' \Sigma^{-1} X. \qquad (7.8)
The tariff factors can then be estimated by (avoiding the variance correction term)

\widehat{\mu} = \exp\{ \widehat{\beta}_0^{\mathrm{MLE}} \}, \qquad \widehat{\chi}_{1,i} = \exp\{ \widehat{\beta}_{1,i}^{\mathrm{MLE}} \} \qquad \text{and} \qquad \widehat{\chi}_{2,j} = \exp\{ \widehat{\beta}_{2,j}^{\mathrm{MLE}} \}.
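In the homoscedastic case Σ = σ²·Id, (7.8) reduces to ordinary least squares on the log-claims and can be computed directly. A sketch with numpy and hypothetical claim amounts (the data of Example 7.3 are not reproduced):

```python
import numpy as np

# HYPOTHETICAL claim amounts S_{i,j}; rows: owned/leased, columns: 4 age classes
S = np.array([[1500., 1100., 1000., 1300.],
              [1900., 1400., 1200., 1600.]])
X = np.log(S).reshape(-1)                  # observation vector, ordering as in (7.5)

# design matrix: intercept, "leased" dummy, age dummies (base class: owned, 21-30y)
Z = np.array([[1.0, o, float(j == 1), float(j == 2), float(j == 3)]
              for o in (0.0, 1.0) for j in range(4)])

beta = np.linalg.solve(Z.T @ Z, Z.T @ X)   # beta_hat^MLE = (Z'Z)^{-1} Z'X

mu_hat   = np.exp(beta[0])                 # mu (without variance correction)
chi1_hat = np.exp(beta[1])                 # leased factor
chi2_hat = np.exp(beta[2:])                # age factors for classes 2, 3, 4
```

Since the leased row is uniformly more expensive in this illustration, the fitted leased factor exceeds 1.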
Example 7.5 (log-linear model). We use the data Si,j from Example 7.3. Assume
vi,j ≡ 1 and initialize µb = 1 and χb1,1 = 1. The log-linear MLE formula (7.8)
provides the following multiplicative tariff structure:
We compare the results from the method of Bailey & Simon, the method of total
marginal sums (Bailey & Jung) and the log-linear MLE method.
[Bar charts: observed claim amounts and fitted values of the three methods (Bailey & Simon, marginal sums, linear regression) for the classes “owned” and “leased” across the age classes 21-30y to 51-60y.]
• We see that in this example all three methods provide similar results.
• Observe: the risk class (owned, 21-30y) is punished by the bad performance of (leased, 21-30y) and vice versa. A similar remark holds true for risk class (leased, 31-40y).
Remarks.
• The multiplicative tariff construction above has used the design matrix Z =
(zm,k )m,k ∈ RM ×(I+J−1) which was generated by categorical variables. Cate-
gorical variables allow to group observations into disjoint risk categories.
• Binary variables are a special case of categorical variables that can only have two specifications, 1 for true and 0 for false. Recall that all our z_{m,k} ∈ {0, 1}.
E.g., the observation Si,j either belongs to the class “owned” or to the class
“leased”.
• Often the linear regression model X = Zβ + ε with ε ∼
N (0, Σ) is introduced for continuous variables (zn,k )n,k
which generate the design matrix Z. E.g. if there is a
(clear) functional relationship between the age and the
tariff criterion χ2 , for instance if χ2 is a linear function
of age, then variable zn,k ∈ R+ modeling age is directly
reflecting this relationship (linear regression). For more
on this subject we refer to Frees [46], for the present dis-
cussion we concentrate on binary variables.
\mathrm{SS}_{\mathrm{tot}} = \sum_m \left( X_m - \bar{X} \right)^2 = \sum_m \left( \widehat{X}_m - \bar{X} \right)^2 + \sum_m \left( X_m - \widehat{X}_m \right)^2 = \mathrm{SS}_{\mathrm{reg}} + \mathrm{SS}_{\mathrm{err}}, \qquad (7.9)

with \bar{X} = \frac{1}{M} \sum_{m=1}^{M} X_m and \widehat{X} = Z \widehat{\beta}^{\mathrm{MLE}}.
• SStot measures the total variation of the observations Xm around the sample mean \bar{X}, without using the explanatory variables Z.
Proof of (7.9). We rewrite the total sums of squares SStot in vector notation. Therefore we define

\widehat{\varepsilon} = X - Z\widehat{\beta}^{\mathrm{MLE}} = X - \widehat{X} \qquad \text{and} \qquad \bar{\boldsymbol{X}} = \bar{X}\, (1, \ldots, 1)'. \qquad (7.10)
We calculate

X'X = (\widehat{X} + \widehat{\varepsilon})' (\widehat{X} + \widehat{\varepsilon}) = \widehat{X}' \widehat{X} + 2\, \widehat{X}' \widehat{\varepsilon} + \widehat{\varepsilon}' \widehat{\varepsilon}.
The MLE \widehat{\beta}^{\mathrm{MLE}} minimizes in the homoscedastic case (X - Z\beta)'(X - Z\beta) and thus we have

0 = Z' \left( X - Z\widehat{\beta}^{\mathrm{MLE}} \right) = Z' \widehat{\varepsilon}, \qquad (7.11)
and as a consequence

\widehat{X}' \widehat{\varepsilon} = \left( Z \widehat{\beta}^{\mathrm{MLE}} \right)' \widehat{\varepsilon} = 0.
This implies

X'X = \widehat{X}' \widehat{X} + \widehat{\varepsilon}' \widehat{\varepsilon}.
We subtract on both sides \bar{\boldsymbol{X}}'\bar{\boldsymbol{X}} to obtain

\mathrm{SS}_{\mathrm{tot}} = X'X - \bar{\boldsymbol{X}}'\bar{\boldsymbol{X}} = \widehat{\varepsilon}'\widehat{\varepsilon} + \widehat{X}'\widehat{X} - \bar{\boldsymbol{X}}'\bar{\boldsymbol{X}} = \mathrm{SS}_{\mathrm{err}} + \mathrm{SS}_{\mathrm{reg}},

where for the last step we need to observe that the intercept β0 is contained in every row of the design matrix Z, therefore the first column in Z is equal to (1, . . . , 1)'. This and (7.11) implies 0 = (1, \ldots, 1)' \widehat{\varepsilon} = \sum_m X_m - \sum_m \widehat{X}_m. This treats the cross-product terms leading to \widehat{X}'\widehat{X} - \bar{\boldsymbol{X}}'\bar{\boldsymbol{X}} = \mathrm{SS}_{\mathrm{reg}}. This proves (7.9). □
R^2 = \frac{\mathrm{SS}_{\mathrm{reg}}}{\mathrm{SS}_{\mathrm{tot}}} = 1 - \frac{\mathrm{SS}_{\mathrm{err}}}{\mathrm{SS}_{\mathrm{tot}}} \in [0, 1].
This is the ratio of the explained sum of squares SSreg to the total sum of squares SStot. If the model explains the structure well then R² should be close to 1, because \widehat{X} is able to explain the underlying structure.
For Example 7.5 we obtain R² = 0.9202 which speaks in favor of this model explaining the data Si,j .
Residual standard deviation σ: For further analysis we also need the residual
standard deviation σ. It is estimated (in the homoscedastic case) by
\widehat{\sigma}^2 = \frac{1}{M} \sum_m \left( X_m - \widehat{X}_m \right)^2 = \frac{\widehat{\varepsilon}'\,\widehat{\varepsilon}}{M} = \frac{\mathrm{SS}_{\mathrm{err}}}{M},
where \widehat{\varepsilon} was defined in (7.10). Define r = I + J − 2, i.e. the dimension of the parameter β is r + 1. \widehat{\sigma}^2 is the MLE for σ² and M\widehat{\sigma}^2 is distributed as σ² χ²_{M−r−1}, see, for instance, Section 7.4 in Johnson-Wichern [55]. Often, one also considers the unbiased variance parameter estimator

s^2 = \frac{M}{M - r - 1}\, \widehat{\sigma}^2.
Likelihood ratio test: Finally, we would like to see whether we need to include
a specific parameter βk,lk .
Note that the model is, of course, invariant under permutation of parameters and
components. Therefore, we can choose any specific ordering and to simplify nota-
tion we define
β = (β0 , β1 , . . . , βr )0 ∈ Rr+1 , (7.12)
so that we have the ordering of components that is appropriate for the next layout.
\Lambda = \frac{\widehat{f}_{H_0}(X)}{\widehat{f}_{\mathrm{full}}(X)} = \frac{ \widehat{\sigma}_{H_0}^{-M} \exp\left\{ -\frac{1}{2 \widehat{\sigma}_{H_0}^2} (X - Z_0 \widehat{\beta}_{H_0})' (X - Z_0 \widehat{\beta}_{H_0}) \right\} }{ \widehat{\sigma}_{\mathrm{full}}^{-M} \exp\left\{ -\frac{1}{2 \widehat{\sigma}_{\mathrm{full}}^2} (X - Z \widehat{\beta}^{\mathrm{MLE}})' (X - Z \widehat{\beta}^{\mathrm{MLE}}) \right\} }

= \left( \frac{ \mathrm{SS}_{\mathrm{err}}^{H_0} / M }{ \mathrm{SS}_{\mathrm{err}}^{\mathrm{full}} / M } \right)^{-M/2} = \left( \frac{ \mathrm{SS}_{\mathrm{err}}^{H_0} }{ \mathrm{SS}_{\mathrm{err}}^{\mathrm{full}} } \right)^{-M/2} = \left( 1 + \frac{ \mathrm{SS}_{\mathrm{err}}^{H_0} - \mathrm{SS}_{\mathrm{err}}^{\mathrm{full}} }{ \mathrm{SS}_{\mathrm{err}}^{\mathrm{full}} } \right)^{-M/2}. \qquad (7.13)
The likelihood ratio test rejects the null hypothesis H0 for small values of Λ. This is equivalent to rejection for large values of \left( \mathrm{SS}_{\mathrm{err}}^{H_0} - \mathrm{SS}_{\mathrm{err}}^{\mathrm{full}} \right) / \mathrm{SS}_{\mathrm{err}}^{\mathrm{full}}.
where the latter denotes the α quantile of the F-distribution with the corresponding degrees of freedom.
• The lines Coefficients give the MLE for the parameters β0 (intercept), β1,2
(leased) and β2,2 , . . . , β2,4 . For these parameters a standard estimation error is
calculated and a t-test is applied to each parameter individually, whether they
are different from zero, see formula (7.14) in Johnson-Wichern [55]. From this
analysis we see that we might only question β2,4 because of its large p-value of 0.1675; the other parameters are well justified by the observations.
Figure 7.2: R output of Example 7.5 using R command lm.
• The bottom lines then display the residual standard error \widehat{s} = 0.7447 on df = 3 degrees of freedom, the coefficient of determination R² = 0.9202, and the adjusted coefficient of determination R_a^2 which corrects for the degrees of freedom:

R_a^2 = 1 - \frac{\mathrm{SS}_{\mathrm{err}}}{\mathrm{SS}_{\mathrm{tot}}}\; \frac{M - 1}{M - r - 1}.
• The final line displays an F test statistic (7.14) of value 8.653 for df1 = 4 and df2 = 3 for dropping all variables except the intercept β0 . This gives a p-value of 5.36% which says that the null hypothesis is just about to be rejected on the 5% significance level and we stay with the full model.
where Ni,j describes the number of claims in risk class (i, j) and Y_{i,j}^{(l)} the corresponding i.i.d. claim sizes for l = 1, . . . , Ni,j in risk class (i, j). We now analyze Ni,j and Y_{i,j}^{(l)} separately.
Definition 7.7 (exponential dispersion family). X ∼ fX belongs to the exponential
dispersion family if fX is of the form
f_X(x; \theta, \phi) = \exp\left\{ \frac{x\theta - b(\theta)}{\phi/w} + c(x, \phi, w) \right\},
weights in the discrete case. Moreover, depending on the choice of the cumulant
function b(·) and of the possible parameters Θ the support of X may need to be
restricted to subsets of R.
Lemma 7.8. Choose a fixed cumulant function b(·) and assume that the exponen-
tial dispersion family EDF(θ, φ, w, b(·)) gives well-defined densities with identical
supports for all parameters θ ∈ Θ in an open set Θ. Assume that for any θ ∈ Θ
there exists a neighborhood of zero such that the moment generating function MX (r)
of X ∼ EDF(θ, φ, w, b(·)) is finite in this neighborhood of zero (for r). Then we
have for all θ ∈ Θ and r sufficiently close to zero
M_X(r) = \exp\left\{ \frac{b(\theta + r\phi/w) - b(\theta)}{\phi/w} \right\}.
Proof. Choose θ ∈ Θ and r in the neighborhood of zero such that MX(r) exists. Then we have

M_X(r) = \int e^{rx} \exp\left\{ \frac{x\theta - b(\theta)}{\phi/w} + c(x, \phi, w) \right\} dx

= \int \exp\left\{ \frac{x(\theta + r\phi/w) - b(\theta)}{\phi/w} + c(x, \phi, w) \right\} dx

= \exp\left\{ \frac{b(\theta + r\phi/w) - b(\theta)}{\phi/w} \right\} \int \exp\left\{ \frac{x(\theta + r\phi/w) - b(\theta + r\phi/w)}{\phi/w} + c(x, \phi, w) \right\} dx.
We have assumed that Θ is an open set. Therefore, for any θ ∈ Θ we have that θr = θ + rφ/w ∈ Θ for r sufficiently close to zero. Therefore, the last integral is the density that corresponds to EDF(θr , φ, w, b(·)) and since this is a well-defined density with identical support for all θr ∈ Θ this last integral is equal to 1. This proves the claim. □
Corollary 7.9. We make the same assumptions as in Lemma 7.8 and in addition
we assume that b ∈ C 2 in the interior of Θ. Then we have
E[X] = b'(\theta) \qquad \text{and} \qquad \mathrm{Var}(X) = \frac{\phi}{w}\, b''(\theta).
Proof. In view of (1.3) we only need to calculate the first and second derivatives at zero of the moment generating function. We have from Lemma 7.8

\frac{d}{dr} M_X(r) \Big|_{r=0} = \exp\left\{ \frac{b(\theta + r\phi/w) - b(\theta)}{\phi/w} \right\} b'(\theta + r\phi/w) \Big|_{r=0} = b'(\theta),

and

\frac{d^2}{dr^2} M_X(r) \Big|_{r=0} = \exp\left\{ \frac{b(\theta + r\phi/w) - b(\theta)}{\phi/w} \right\} \left[ \left( b'(\theta + r\phi/w) \right)^2 + \frac{\phi}{w}\, b''(\theta + r\phi/w) \right] \Big|_{r=0} = \left( b'(\theta) \right)^2 + \frac{\phi}{w}\, b''(\theta).

This proves the claim. □
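Lemma 7.8 and Corollary 7.9 can be checked numerically by differentiating the moment generating function at zero; a sketch for the Poisson member b(θ) = e^θ, φ = 1, w = v (parameter values are arbitrary choices of ours):

```python
from math import exp

theta, v = -1.5, 50.0   # lambda = e^theta, weight w = v

def mgf(r):
    """M_X(r) = exp{ (b(theta + r phi/w) - b(theta)) / (phi/w) } with b(t) = e^t, phi = 1."""
    return exp((exp(theta + r / v) - exp(theta)) * v)

h = 1e-4
mean_num = (mgf(h) - mgf(-h)) / (2 * h)                # ~ E[X] = b'(theta) = e^theta
second_num = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2  # ~ E[X^2]
var_num = second_num - mean_num**2                     # ~ (phi/w) b''(theta) = e^theta / v
```

The finite differences reproduce E[X] = e^θ and Var(X) = e^θ/v up to discretization error.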
Next we present examples and explain how they fit into the exponential dispersion family framework. These considerations also lead to an explicit explanation of the weight w > 0. We start with the discrete case assuming X ∼ fX .
• Binomial distribution: Choose Θ = R, b(θ) = log(1 + eθ ), φ = 1 and w = v.
In this case we obtain for x ∈ {0, 1/v, 2/v, . . . , 1}
\frac{f_X(x; \theta, 1)}{\exp\{c(x, 1, v)\}} = \exp\left\{ v \left( x\theta - \log(1 + e^\theta) \right) \right\} = \exp\left\{ v \left( x \log e^\theta - \log(1 + e^\theta) \right) \right\}

= \exp\left\{ vx \log \frac{e^\theta}{1 + e^\theta} \right\} \exp\left\{ v(1 - x) \log \frac{1}{1 + e^\theta} \right\} = p^{vx} (1 - p)^{v - vx},

for p = e^θ/(1 + e^θ) ∈ (0, 1). The first two moments are obtained by

E[X] = b'(\theta) = \frac{e^\theta}{1 + e^\theta} = p \qquad \text{and} \qquad \mathrm{Var}(X) = \frac{1}{v}\, b''(\theta) = \frac{1}{v}\, \frac{e^\theta}{1 + e^\theta}\, \frac{1}{1 + e^\theta} = \frac{1}{v}\, p(1 - p).

From this we see that N = vX ∼ Binom(v, p).
• Poisson distribution: Choose Θ = R, b(θ) = e^θ, φ = 1 and w = v; this provides the Poisson probability weights for λ = e^θ > 0. The first two moments are obtained by

E[X] = b'(\theta) = e^\theta = \lambda \qquad \text{and} \qquad \mathrm{Var}(X) = \frac{1}{v}\, b''(\theta) = \frac{1}{v}\, e^\theta = \frac{\lambda}{v}.

From this we see that N = vX ∼ Poi(λv).
In the absolutely continuous case we have the following examples.
• Gamma distribution: Choose Θ = (−∞, 0), b(θ) = −log(−θ) and weight w > 0. In this case we have for x ∈ R+

\frac{f_X(x; \theta, \phi)}{\exp\{c(x, \phi, w)\}} = \exp\left\{ \frac{x\theta + \log(-\theta)}{\phi/w} \right\} = (-\theta)^{w/\phi} \exp\left\{ -\frac{-\theta w}{\phi}\, x \right\},

this is a gamma density with shape parameter γ = w/φ > 0 and scale parameter c = −θw/φ > 0. The first two moments are obtained by

E[X] = b'(\theta) = -1/\theta = \frac{\gamma}{c} \qquad \text{and} \qquad \mathrm{Var}(X) = \frac{\phi}{w}\, \frac{1}{\theta^2} = \frac{\gamma}{c^2}.
For more examples we refer to Table 13.8 in Frees [46] on page 379.
The previous example shows that several popular distribution functions belong to
the exponential dispersion family. In the present layout we concentrate on the
Poisson and the gamma distributions for pricing the two components ’number
of claims’ and ’claims severities’. However, the theory holds true in much more
generality, especially within the exponential dispersion family. Our aim is to express
the expected claim of risk class (i, j) as expected number of claims times the average
claim, i.e.

E[S_{i,j}] = E[N_{i,j}]\; E[Y_{i,j}^{(l)}],

where Ni,j describes the number of claims in risk class (i, j) and Y_{i,j}^{(l)} the corresponding i.i.d. claim sizes for l = 1, . . . , Ni,j in risk class (i, j). We then aim for
calculating a multiplicative tariff which considers risk characteristics χ’s for both
the number of claims and the claims severities.
We assume that Ni,j are independent with Ni,j ∼ Poi(λi,j vi,j ) and vi,j counting
the number of policies in risk class (i, j). Under these assumptions we derive a
multiplicative tariff structure for the number of claims determining the risk characteristics. For the claim sizes we will do a similar construction by making a gamma distributional assumption. Since the latter is slightly more involved than the former we start with the Poisson case.
We assume that Ni,j are independent with Ni,j ∼ Poi(λi,j vi,j ) and vi,j counting the
number of policies in risk class (i, j). In view of the exponential dispersion family
we set for the mean frequency, see Example 7.10,

\lambda_{i,j} = E\left[ \frac{N_{i,j}}{v_{i,j}} \right] = b'(\theta_{i,j}) = \exp\{\theta_{i,j}\} = \exp\{(Z\beta)_m\}, \qquad (7.16)
where we make the assumption of having a multiplicative tariff structure which
provides an additive structure on the log-scale reflected by Zβ and the index m =
m(i, j) was defined in (7.6). Thus, we assume that Xi,j = Ni,j /vi,j ∈ N0 /vi,j are independent with vi,j Xi,j = Ni,j ∼ Poi(λi,j vi,j ).
Our aim is to estimate the parameter β under the assumption that for every risk
class (i, j) we have
θi,j = (Zβ)m .
The last piece in this consideration is the link function g which connects the mean
λi,j with the parameter vector. For obtaining a multiplicative structure the natural
link function is the so-called log-link function g(·) = log(·). Applying the log-link
function to (7.16) releases the parameter β in the following linear form

g(\lambda_{i,j}) = \log \lambda_{i,j} = (Z\beta)_m,

where we have applied the relabeling of the components of X and vi,j such that they fit to the design matrix Z, see also (7.5).
The MLE \widehat{\beta}^{\mathrm{MLE}} for β is found by the solution of

\frac{\partial}{\partial \beta}\, \ell_X(\beta) = 0. \qquad (7.17)
We calculate the partial derivatives of the log-likelihood function

\frac{\partial}{\partial \beta_l}\, \ell_X(\beta) = \frac{\partial}{\partial \beta_l} \sum_m \frac{X_m \theta_m - \exp\{\theta_m\}}{1/v_m} = \sum_m \frac{X_m - \exp\{\theta_m\}}{1/v_m}\, \frac{\partial \theta_m}{\partial \beta_l} = \sum_m \frac{X_m - \exp\{(Z\beta)_m\}}{1/v_m}\, z_{m,l},
where Z = (zm,l )m,l ∈ RM ×(r+1) for β ∈ Rr+1 , see also (7.12). If we define the weight
matrix V = diag(v1 , . . . , vM ) then we have just proved the following proposition:
Proposition 7.11. The solution to the MLE problem (7.17) in the Poisson case
is given by the solution of
Z 0 V exp{Zβ} = Z 0 V X.
Remarks. One should observe the similarities between the Gaussian case (7.8)
and the Poisson case

Z' \Sigma^{-1} Z\, \widehat{\beta}^{\mathrm{MLE}} = Z' \Sigma^{-1} X \qquad \text{and} \qquad Z' V \exp\{ Z \widehat{\beta}^{\mathrm{MLE}} \} = Z' V X.
The Gaussian case is solved analytically (assuming full rank of Z), the Poisson case
can only be solved numerically, due to the presence of the exponential function. The Poisson case is rewritten as

Z' V \exp\{ Z \widehat{\beta}^{\mathrm{MLE}} \} - Z' N = 0.
Observe that the latter exactly leads to the method of total marginal sums of Bailey & Jung, see (7.3)-(7.4).
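The Poisson MLE equation of Proposition 7.11 is solved with a few Newton (Fisher scoring) steps. A sketch with a hypothetical 2 × 2 tariff (data and function name are ours):

```python
import numpy as np

def poisson_glm(Z, X, v, n_iter=50):
    """Newton-Raphson for Z'V exp(Z beta) = Z'V X (Proposition 7.11);
    X are claim frequencies N_m / v_m, V = diag(v_m)."""
    V = np.diag(v)
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = np.exp(Z @ beta)              # fitted frequencies
        score = Z.T @ V @ (X - mu)
        info = Z.T @ V @ np.diag(mu) @ Z   # Fisher information
        beta = beta + np.linalg.solve(info, score)
    return beta

# hypothetical example: two tariff criteria with two levels each
Z = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])
v = np.array([100., 120., 80., 90.])       # policies per risk class
N = np.array([12., 10., 14., 13.])         # observed claim counts
beta_hat = poisson_glm(Z, N / v, v)
```

Because Z contains an intercept, the fitted total claim count matches the observed total, which is exactly the marginal sum property noted above.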
From the moment generating function given in Section 3.2.1 we immediately see
that for given ni,j the convolution is given by
Y_{i,j} = \sum_{l=1}^{n_{i,j}} Y_{i,j}^{(l)} \sim \Gamma(\gamma_{i,j}\, n_{i,j},\, c_{i,j}).
We define the normalized random variable Xm = Yi,j /ni,j where we again use the
relabeling defined in (7.6). Observe that the family of gamma distributions is closed
towards multiplication, see (3.4). Therefore, the density of Xm is then given by
f_{X_m}(x) = \frac{(c_m n_m)^{\gamma_m n_m}}{\Gamma(\gamma_m n_m)}\, x^{\gamma_m n_m - 1} \exp\{ -c_m n_m x \}. \qquad (7.18)
Next we do a reparametrization similar to Example 7.10. Set γm = 1/φm and
cm = −θm/φm . This provides

f_{X_m}(x) = \frac{(-\theta_m n_m/\phi_m)^{n_m/\phi_m}}{\Gamma(n_m/\phi_m)}\, x^{n_m/\phi_m - 1} \exp\left\{ \frac{\theta_m n_m}{\phi_m}\, x \right\}.
Finally, define cumulant function b(θ) = − log(−θ) for θ < 0, see Example 7.10.
The density of Xm = Yi,j /ni,j is then given by
(m
( ) !nm /φm
θm x − b(θm ) 1 nm
fXm (x) = exp xnm /φm −1 .
φm /nm Γ(nm /φm ) φm
Analogously to the Poisson case we make the log-link ansatz for the mean E[Xm] = b'(θm) = −1/θm, i.e.

-\log(-\theta_m) = (Z\beta)_m.
For rewriting the previous equation in matrix form we define the weight matrix
Vθ = diag(−θ1 n1 /φ1 , . . . , −θM nM /φM ). The last equation is then written as
\frac{\partial}{\partial \beta}\, \ell_X(\beta) = Z' V_\theta\, X - Z' V_\theta \exp\{Z\beta\}.
Proposition 7.12. The solution to the MLE problem (7.19) in the gamma case is
given by the solution of
Z' V_\theta \exp\{Z\beta\} = Z' V_\theta\, X.
Remarks.
• Proposition 7.12 for the gamma case looks very promising because it has
the same structure as Proposition 7.11 for the Poisson case. However, this
similarity is only at the first sight: the parameter β determines the θ which
is also integrated into the weight matrix Vθ . Therefore, the MLE β b MLE is
only found numerically, using either Fisher’s scoring method or the Newton-
Raphson algorithm.
• For the general case within the exponential dispersion family with link func-
tion g we refer to Section 2.3.2 in Ohlsson-Johansson [72].
• We have seen that the weights wi,j are given by the number of policies vi,j in
the Poisson case and by the number of claims ni,j in the gamma case.
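Proposition 7.12 can be solved by Fisher scoring, re-computing the weight matrix V_θ in every step. A sketch with hypothetical average claim sizes (data and function name are ours; φ = 1):

```python
import numpy as np

def gamma_glm(Z, X, n, phi=1.0, n_iter=100):
    """Fisher scoring for Z'V_theta exp(Z beta) = Z'V_theta X (Proposition 7.12),
    log-link, theta = -exp(-Z beta), V_theta = diag(-theta_m n_m / phi)."""
    W = np.diag(n / phi)                   # expected information weights (log-link gamma)
    beta = np.zeros(Z.shape[1])
    beta[0] = np.log(X.mean())             # start at the overall mean level
    for _ in range(n_iter):
        mu = np.exp(Z @ beta)              # fitted means, mu_m = -1/theta_m
        V_theta = np.diag(n / (phi * mu))  # -theta_m n_m / phi
        score = Z.T @ V_theta @ (X - mu)
        beta = beta + np.linalg.solve(Z.T @ W @ Z, score)
    return beta

# hypothetical example: average claim sizes X_m with claim counts n_m as weights
Z = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])
X = np.array([2000., 1500., 2600., 2100.])
n = np.array([12., 10., 14., 13.])
beta_hat = gamma_glm(Z, X, n)
```

Note the difference to the Poisson case: the weight matrix depends on the current β through θ, so the re-weighting is part of the iteration.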
In summary, the MLE is found by solving:

Gaussian case:

Z' \Sigma^{-1} Z\, \widehat{\beta}^{\mathrm{MLE}} - Z' \Sigma^{-1} X = 0.

Poisson case:

Z' V \exp\{ Z \widehat{\beta}^{\mathrm{MLE}} \} - Z' V X = 0.

Gamma case:

Z' \widehat{V}_\theta \exp\{ Z \widehat{\beta}^{\mathrm{MLE}} \} - Z' \widehat{V}_\theta X = 0, \qquad \text{with } \widehat{\theta} = -\exp\{ -Z \widehat{\beta}^{\mathrm{MLE}} \}.
We define the function h = (b')^{-1} which implies that \widehat{\theta}_m = h(\widehat{\mu}_m). The log-likelihood function for this estimate is then given by

\ell_X(\widehat{\mu}) = \sum_m \frac{X_m h(\widehat{\mu}_m) - b(h(\widehat{\mu}_m))}{\phi/w_m} + c(X_m, \phi, w_m),
D^*(X, \widehat{\mu}) = 2 \left( \ell_X(X) - \ell_X(\widehat{\mu}) \right) = \frac{2}{\phi} \sum_m w_m \left[ X_m h(X_m) - b(h(X_m)) - X_m h(\widehat{\mu}_m) + b(h(\widehat{\mu}_m)) \right].
and the deviance statistics

D(X, \widehat{\mu}) = \phi\, D^*(X, \widehat{\mu}) = 2\phi \left( \ell_X(X) - \ell_X(\widehat{\mu}) \right).
Observe that these deviance statistics play the role of the likelihood ratio Λ given
in (7.13) if we compare the model Zβ to the saturated model which is used as
benchmark in this analysis.
Similar to Section 7.2.2 we would now like to see whether we can reduce the number
of parameters in β ∈ Rr+1 .
F = \frac{ D(X, \widehat{\mu}_{H_0}) - D(X, \widehat{\mu}_{\mathrm{full}}) }{ D(X, \widehat{\mu}_{\mathrm{full}}) }\; \frac{M - r - 1}{p}. \qquad (7.20)
given by df 1 = p and df 2 = M − r − 1. Therefore, we apply the same criterion as
in (7.15).
A second test statistic considered is, see Lemma 3.1 in Ohlsson-Johansson [72],

X^2 = D^*(X, \widehat{\mu}_{H_0}) - D^*(X, \widehat{\mu}_{\mathrm{full}}).
Finally, to check the accuracy of the model and the fit one should also study the
residuals. We can study Pearson’s residuals given by
Xm − b0 (θbm )
rP,m = q ,
b00 (θbm )/wm
and the deviance residuals

r_{D,m} = \mathrm{sgn}\left( X_m - b'(\widehat{\theta}_m) \right) \sqrt{ 2 w_m \left[ X_m h(X_m) - b(h(X_m)) - X_m \widehat{\theta}_m + b(\widehat{\theta}_m) \right] },
for m = 1, . . . , M . These residuals should not show any structure because the Xm were assumed to be independent, and the observed rP,m should roughly be centered with similar variances φ.
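For the Poisson member (b(θ) = e^θ, h(x) = log x) both residuals take an explicit form; a minimal sketch (the function name is ours):

```python
from math import log, sqrt, copysign

def poisson_residuals(x, mu, w):
    """Pearson and deviance residuals for the Poisson EDF member:
    b'(theta_hat) = mu, h(x) = log x."""
    r_p = (x - mu) / sqrt(mu / w)
    # unit deviance X h(X) - b(h(X)) - X theta_hat + b(theta_hat), with the x = 0 limit
    unit_dev = x * log(x / mu) - x + mu if x > 0 else mu
    r_d = copysign(sqrt(2 * w * unit_dev), x - mu)
    return r_p, r_d
```

Near the fitted mean the two residuals are close; in the far tails they differ, which is one diagnostic for model misfit.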
From Example 7.10 we know that these Xm’s have a Gaussian distribution, i.e. their densities are given by

f(x_m; \theta_m, \phi) = \frac{1}{\sqrt{2\pi \phi/w_m}} \exp\left\{ -\frac{1}{2}\, \frac{(x_m - \theta_m)^2}{\phi/w_m} \right\}.
The scaled deviance is given by, set \widehat{\mu} = b'(\widehat{\theta}) = \widehat{\theta} = \exp\{ Z\widehat{\beta} \},

D^*(X, \widehat{\mu}) = \frac{1}{\phi} \sum_m w_m \left( X_m - \widehat{\theta}_m \right)^2,

and the deviance statistics is given by

D(X, \widehat{\mu}) = \sum_m w_m \left( X_m - \widehat{\theta}_m \right)^2.
Chapter 8. Bayesian Models and Credibility Theory
In the previous chapter we have done tariffication using GLM. This was done
by splitting the total portfolio into different homogeneous risk classes (i, j). The
volume measures in these risk classes (i, j) were given by vi,j in Section 7.3.1 and
by ni,j in Section 7.3.2, respectively. There might occur the situation where a risk
class (i, j) has only small volume vi,j and ni,j , respectively, i.e. only a few policies or claims fall into that risk class. In that case an observation Ni,j and Si,j may not
be very informative and single outliers may disturb the whole picture. Credibility
theory aims to deal with such situations in that it specifies a tariff of the following
structure
\mu_{i,j} = \alpha_{i,j}\, S_{i,j} + (1 - \alpha_{i,j})\, \mu,

i.e. the tariff µi,j for the next accounting year is calculated as a credibility weighted average between the individual past observation Si,j and the overall average µ with credibility weight αi,j ∈ [0, 1]. For αi,j = 1 we fully trust the past observations, for αi,j = 0 we only trust the overall average µ. Credibility theory makes this approach rigorous and specifies the credibility weights.
• There are exact Bayesian methods which allow for analytical solutions.
• There are simulation methods such as the Markov chain Monte Carlo (MCMC)
method which allow for numerical solutions of Bayesian models.
• There are approximations such as linear credibility methods which give opti-
mal solutions in sub-spaces of possible solutions.
and he has brought it into today’s form. Therefore, the Bayes’ rule should be called
Bayes-Price-Laplace’s rule. For a historical review we refer to McGrayne [67]. As
we will just see, Bayes’ rule is the mathematical tool to combine prior knowledge
and observations into posterior knowledge. Technically it exchanges probabilities, therefore it is also known under the name method of inverse probabilities.
\pi(\theta|x) = \frac{ f_\theta(x)\, \pi(\theta) }{ \int f_\theta(x)\, \pi(\theta)\, d\theta } \;\propto\; f_\theta(x)\, \pi(\theta).
This means we start with a prior distribution π(θ). This prior distribution either expresses expert knowledge or is determined from a portfolio of similar business. Having observed x, we modify the prior belief π to obtain the posterior distribution π(θ|x) that reflects both prior knowledge π(θ) about θ and experience x, that is, the prior belief π(θ) is improved by the arriving observation x. Thus, whenever an observation arrives we can update our knowledge about θ, which constantly improves our estimation of the unknown model parameter θ.
mentioned in Bühlmann-Gisler [24], this mathematical model goes
back to Fritz Bichsel (1921-1999) [11]. He has introduced it
in the 1960s to calculate a bonus-malus tariff system for Swiss
motor third party liability insurance. The aim was to punish bad
drivers and to reward good drivers, which has led to bonus-malus considerations.
Definition 8.1 (Poisson-gamma model). Assume fixed volumes vt > 0 are given for t ∈ N. Conditionally on Λ, the claim counts Nt , t ∈ N, are independent with Nt ∼ Poi(Λ vt ), and Λ ∼ Γ(γ, c) with prior parameters γ, c > 0.
Remarks 8.3.
• The posterior is again a gamma distribution but with modified parameters.
For the parameters we obtain the updates
\gamma \;\mapsto\; \gamma_T^{\mathrm{post}} = \gamma + \sum_{t=1}^{T} N_t \qquad \text{and} \qquad c \;\mapsto\; c_T^{\mathrm{post}} = c + \sum_{t=1}^{T} v_t.
Often γ and c are called prior parameters and \gamma_T^{\mathrm{post}} and c_T^{\mathrm{post}} posterior parameters (at time T ).
• The remarkable property in the Poisson-gamma model is that the posterior
distribution stays in the same family of distributions as the prior distribution.
There are more examples of this kind as we will see below. Many of these
examples belong to the exponential dispersion family with conjugate priors.
• For the estimation of the unknown parameter Λ we obtain the following prior and posterior estimators

\lambda_0 = E[\Lambda] = \frac{\gamma}{c},

\widehat{\lambda}_T^{\mathrm{post}} = E[\Lambda|N] = \frac{ \gamma_T^{\mathrm{post}} }{ c_T^{\mathrm{post}} } = \frac{ \gamma + \sum_{t=1}^T N_t }{ c + \sum_{t=1}^T v_t }.
We analyze the posterior estimator \widehat{\lambda}_T^{\mathrm{post}} in more detail below, which will provide the basic credibility formula.
\widehat{\lambda}_T^{\mathrm{post}} = \alpha_T\, \widehat{\lambda}_T + (1 - \alpha_T)\, \lambda_0,

with

\alpha_T = \frac{ \sum_{t=1}^T v_t }{ c + \sum_{t=1}^T v_t } \in (0, 1) \qquad \text{and} \qquad \widehat{\lambda}_T = \frac{1}{\sum_{t=1}^T v_t} \sum_{t=1}^T N_t.
This proves the first claim. For the estimation uncertainty we have

E\left[ \left( \Lambda - \widehat{\lambda}_T^{\mathrm{post}} \right)^2 \,\Big|\, N \right] = \mathrm{Var}(\Lambda|N) = \frac{ \gamma_T^{\mathrm{post}} }{ (c_T^{\mathrm{post}})^2 } = (1 - \alpha_T)\, \frac{ \widehat{\lambda}_T^{\mathrm{post}} }{c}. \qquad \Box
Remarks 8.5.
• Corollary 8.4 shows that the posterior estimator \widehat{\lambda}_T^{\mathrm{post}} is a credibility weighted average between the prior guess λ0 and the purely observation based estimator \widehat{\lambda}_T with credibility weight αT ∈ (0, 1).
• The credibility weight αT has the following properties:
  1. αT is increasing in the volumes vt (and in the number of observed periods t if vt counts the number of policies);
  2. for the volume vt → ∞: αT → 1;
  3. for the prior uncertainty going to infinity, i.e. c → 0: αT → 1;
  4. for the prior uncertainty going to zero, i.e. c → ∞: αT → 0.
Note that

\mathrm{Var}(\Lambda) = \frac{\gamma}{c^2} = \frac{\lambda_0}{c}.
For c large we have informative prior distribution, for c small we have vague
prior distribution and for c = 0 we have non-informative or improper prior
distribution. The latter means that we have no prior parameter knowledge.
• The observation based estimator satisfies, see Estimators 2.27 and 2.32,

\widehat{\lambda}_T^{\mathrm{MV}} = \widehat{\lambda}_T^{\mathrm{MLE}} = \widehat{\lambda}_T.
• The posterior estimator \widehat{\lambda}_T^{\mathrm{post}} has the nice property of a recursive update structure which is important in many situations, see the next corollary.
\widehat{\lambda}_T^{\mathrm{post}} = \beta_T\, \frac{N_T}{v_T} + (1 - \beta_T)\, \widehat{\lambda}_{T-1}^{\mathrm{post}},

with credibility weight

\beta_T = \frac{ v_T }{ c + \sum_{t=1}^T v_t } \in (0, 1).
Proof. We rewrite the observation based part of \widehat{\lambda}_T^{\mathrm{post}} as

\left( c + \sum_{t=1}^{T} v_t \right)^{-1} \left( \sum_{t=1}^{T-1} N_t + N_T \right)
= \frac{v_T}{c + \sum_{t=1}^{T} v_t}\, \frac{N_T}{v_T} + \frac{c + \sum_{t=1}^{T-1} v_t}{c + \sum_{t=1}^{T} v_t}\; \frac{\sum_{t=1}^{T-1} v_t}{c + \sum_{t=1}^{T-1} v_t}\; \frac{1}{\sum_{t=1}^{T-1} v_t} \sum_{t=1}^{T-1} N_t
= \beta_T\, \frac{N_T}{v_T} + (1 - \beta_T)\, \alpha_{T-1}\, \widehat{\lambda}_{T-1}.

Collecting all terms provides the claim. □
\widehat{\lambda}_t^{\mathrm{post}} = \beta_t\, \frac{N_t}{v_t} + (1 - \beta_t)\, \widehat{\lambda}_{t-1}^{\mathrm{post}},
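The recursive update can be checked against the direct posterior mean; a small sketch with arbitrary illustration data of ours:

```python
# Poisson-gamma credibility: direct posterior mean vs. recursive update
gamma_, c = 2.0, 40.0               # prior parameters, lambda_0 = gamma / c = 5%
N = [3, 5, 2, 4, 6]                 # observed claim counts N_t
v = [60.0, 70.0, 80.0, 80.0, 90.0]  # volumes v_t

# direct formula: lambda_T^post = (gamma + sum N_t) / (c + sum v_t)
lam_direct = (gamma_ + sum(N)) / (c + sum(v))

# recursive update: lambda_t^post = beta_t N_t / v_t + (1 - beta_t) lambda_{t-1}^post
lam = gamma_ / c
cum_v = 0.0
for n_t, v_t in zip(N, v):
    cum_v += v_t
    beta_t = v_t / (c + cum_v)
    lam = beta_t * (n_t / v_t) + (1 - beta_t) * lam
```

Both computations agree, which is exactly the recursive structure of the corollary: each new observation year only requires one convex combination.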
parameters change from prior parameters to posterior parameters. There are many
examples of this type. The best known examples belong to the exponential disper-
sion family with conjugate priors. We have already met the exponential dispersion
family in Definition 7.7, X ∼ EDF(θ, φ, w, b(·)) has (generalized) density
( )
xθ − b(θ)
fX (x; θ, φ) = exp + c(x, φ, w) ,
φ/w
for an (unknown) parameter θ in the open set Θ. In the Bayesian case we will model
this parameter Θ = θ with a prior distribution π on Θ and then try to determine the
posterior distribution after we have collected observations X1 , . . . , XT that belong
)
to this EDF(θ, φ, w, b(·)).
Model Assumptions 8.7 (exponential dispersion family with conjugate priors).
Assume fixed volumes $w_t > 0$, $t = 1, \ldots, T$, a dispersion parameter $\phi > 0$ and a cumulant function $b: \Theta \to \mathbb{R}$ on an open set $\Theta \subset \mathbb{R}$ are given.
Theorem 8.8. We make Model Assumptions 8.7 and assume that the domain $I$ of possible prior choices $x_0$ is an open interval which contains the range of $X_t$ for all $\Theta \in \Theta$ and $t = 1, \ldots, T$. The posterior distribution of $\Theta$, given $\boldsymbol{X}$, is given by the density $\pi_{\widehat{x}_T^{\rm post}, \tau^{\rm post}}(\theta)$ with
\[
\tau^{\rm post} = \left[ \sum_{t=1}^T \frac{w_t}{\phi} + \frac{1}{\tau^2} \right]^{-1/2} < \tau,
\qquad\text{with}\qquad
\tau^{\rm post} \in (0, c_\tau),
\]
\[
\widehat{x}_T^{\rm post} = \alpha_T\, \widehat{x}_T^{\rm MV} + (1-\alpha_T)\, x_0 \;\in\; I,
\]
with credibility weight $\alpha_T$ and (minimum variance) estimator $\widehat{x}_T^{\rm MV}$ given by
\[
\alpha_T = \frac{\sum_{t=1}^T w_t}{\sum_{t=1}^T w_t + \frac{\phi}{\tau^2}}
\qquad\text{and}\qquad
\widehat{x}_T^{\rm MV} = \frac{1}{\sum_{t=1}^T w_t}\, \sum_{t=1}^T w_t X_t,
\]
where for the minimum variance statement we additionally assume that the second moments of $X_t|\{\Theta\}$ exist and that the cumulant function satisfies $b \in C^2$ in the interior of $\Theta$.
Proof. Bayes' rule gives for the posterior distribution of $\Theta$, conditionally given $\boldsymbol{X}$,
\[
\pi(\theta|\boldsymbol{X}) \;\propto\; \prod_{t=1}^T f_{X_t}(X_t; \theta, \phi)\, \pi_{x_0, \tau}(\theta)
\;\propto\; \prod_{t=1}^T \exp\left\{ \frac{X_t \theta - b(\theta)}{\phi/w_t} \right\} \exp\left\{ \frac{x_0 \theta - b(\theta)}{\tau^2} \right\}
\]
\[
= \exp\left\{ \left[ \sum_{t=1}^T \frac{X_t w_t}{\phi} + \frac{x_0}{\tau^2} \right] \theta - \left[ \sum_{t=1}^T \frac{w_t}{\phi} + \frac{1}{\tau^2} \right] b(\theta) \right\}
\]
\[
= \exp\left\{ (\tau^{\rm post})^{-2} \left( \left[ \sum_{t=1}^T \frac{w_t}{\phi} + \frac{1}{\tau^2} \right]^{-1} \left[ \sum_{t=1}^T \frac{X_t w_t}{\phi} + \frac{x_0}{\tau^2} \right] \theta - b(\theta) \right) \right\}.
\]
Moreover, we have
\[
\left[ \sum_{t=1}^T \frac{w_t}{\phi} + \frac{1}{\tau^2} \right]^{-1} \left[ \sum_{t=1}^T \frac{X_t w_t}{\phi} + \frac{x_0}{\tau^2} \right]
= \alpha_T\, \frac{1}{\sum_{t=1}^T w_t} \sum_{t=1}^T w_t X_t + (1-\alpha_T)\, x_0 \;\in\; I.
\]
This proves the claim about the posterior distribution. There remains the proof of the minimum variance statement. For fixed parameter $\Theta \in \Theta$ we know that $\boldsymbol{X} = (X_1, \ldots, X_T)$ are independent with $X_t \sim {\rm EDF}(\Theta, \phi, w_t, b(\cdot))$. Corollary 7.9 (or its generalization) implies
\[
E[X_t|\Theta] = b'(\Theta) \qquad\text{and}\qquad \mathrm{Var}(X_t|\Theta) = \frac{\phi}{w_t}\, b''(\Theta). \tag{8.1}
\]
Note that $\Theta$ does not depend on $t$, therefore the statement of the minimum variance estimator follows from Lemma 2.26. This closes the proof. □
Proof. In view of Theorem 8.8 it suffices to prove the first statement for all $x_0 \in I$ and $\tau \in (0, c_\tau)$. We have
\[
E[b'(\Theta)] = \int_\Theta b'(\theta)\, \exp\left\{ \frac{x_0 \theta - b(\theta)}{\tau^2} + d(x_0, \tau) \right\} d\theta
\]
\[
= \int_\Theta \left( x_0 - \tau^2\, \frac{x_0 - b'(\theta)}{\tau^2} \right) \exp\left\{ \frac{x_0 \theta - b(\theta)}{\tau^2} + d(x_0, \tau) \right\} d\theta
\]
\[
= x_0 - \tau^2\, \exp\{d(x_0, \tau)\}\, \left. \exp\left\{ \frac{x_0 \theta - b(\theta)}{\tau^2} \right\} \right|_{\partial\Theta}
\;=\; x_0,
\]
because the density vanishes on the boundary $\partial\Theta$. For the second statement we obtain
\[
E[X_{T+1}|X_1, \ldots, X_T]
= E\big[ E[X_{T+1}|\Theta, X_1, \ldots, X_T]\, \big|\, X_1, \ldots, X_T \big]
= E\big[ E[X_{T+1}|\Theta]\, \big|\, X_1, \ldots, X_T \big]
\]
\[
= E[b'(\Theta)| X_1, \ldots, X_T] \tag{8.2}
\]
\[
= \alpha_T\, \widehat{x}_T^{\rm MV} + (1-\alpha_T)\, x_0. \qquad\qquad \Box
\]
Thus, we get a credibility weighted average for the premium of $X_{T+1}$ which is based on the prior knowledge $\pi_{x_0, \tau}$ and on the past experience $X_1, \ldots, X_T$. Similarly to Corollary 8.6, we obtain a recursive update structure for this experience premium, which allows us to express the premium more and more accurately as time passes (under the above stationarity assumptions, of course).

Remarks 8.11.

• The Bayesian approach allows us to combine the observations with prior information, which may come from experts or from similar business. Moreover, parameter uncertainty is quantified by the posterior distribution.
Example 8.12 (gamma-gamma model). We close this section with the example of the gamma-gamma model; recall Example 7.10. Choose fixed volumes $w_t > 0$, $t = 1, \ldots, T$, and dispersion parameter $\phi = 1/\gamma > 0$. Assume that, conditionally given $\Theta > 0$, the observations $X_1, \ldots, X_T$ are independent gamma distributed, with densities of the form used in (7.18) with $c = \Theta/\phi$. Observe that the range of the random variables $X_t$ is $\mathbb{R}_+$ and that we obtain well-defined gamma densities on $\mathbb{R}_+$ for all $\Theta \in \mathbb{R}_+$ and all $t = 1, \ldots, T$. This motivates the choice of the open set $\widetilde{\Theta} = \mathbb{R}_+$ for the possible parameter choices $\Theta$.

Thus, we need to show two things: (i) the density $f_{X_t}(x; \Theta, \phi)$ belongs to the exponential dispersion family for a particular cumulant function $b: \Theta \to \mathbb{R}$; (ii) this will allow us to define the conjugate prior density $\pi_{x_0, \tau}$, for which we would like to show that we can apply Theorem 8.9.
Item (i) was already done in Example 7.10; however, we will do it once more because the signs need careful treatment. We have
\[
f_{X_t}(x; \Theta, \phi)
= \exp\left\{ \frac{w_t}{\phi}\, \log\Theta - \frac{w_t}{\phi}\, \Theta\, x \right\} \exp\{c(x, \phi, w_t)\}
= \exp\left\{ \frac{x(-\Theta) - \big( {-\log(-(-\Theta))} \big)}{\phi/w_t} \right\} \exp\{c(x, \phi, w_t)\}.
\]
The last formula seems to be a waste of minus signs, but with the definitions $\vartheta = -\Theta$ and $b(\vartheta) = -\log(-\vartheta)$ for $\vartheta < 0$ we see that the gamma density belongs to the exponential dispersion family, that is, by a slight abuse of notation in $f_{X_t}$,
\[
f_{X_t}(x; \vartheta, \phi) = \exp\left\{ \frac{x\vartheta - b(\vartheta)}{\phi/w_t} \right\} \exp\{c(x, \phi, w_t)\}.
\]
For item (ii) the conjugate prior density takes the form
\[
\pi_{x_0, \tau}(\vartheta) \;\propto\; \exp\left\{ \frac{x_0 \vartheta - b(\vartheta)}{\tau^2} \right\}
= (-\vartheta)^{1/\tau^2}\, \exp\left\{ \frac{x_0 \vartheta}{\tau^2} \right\}.
\]
This is a gamma density, set $\theta = -\vartheta$, with shape parameter $1 + 1/\tau^2 > 0$ and scale parameter $x_0/\tau^2$. This implies that we should choose $I = \mathbb{R}_+$ and $\tau > 0$. In view of Theorem 8.8 the assumptions are fulfilled because $I$ is an open interval containing all possible observations $X_t$, and thus Theorem 8.8 can be applied.

Next we observe that this density vanishes on the boundary of the parameter set of $\vartheta = -\Theta$, given by the set $\{0\} \cup \{-\infty\}$. Therefore, we have from Theorem 8.9 (we also perform the calculation explicitly)
\[
x_0 = E[b'(\vartheta)] = E\big[\Theta^{-1}\big]
= \int_{\mathbb{R}_+} \theta^{-1}\, \frac{(x_0/\tau^2)^{1+1/\tau^2}}{\Gamma(1+1/\tau^2)}\, \theta^{\frac{1}{\tau^2}+1-1}\, \exp\left\{ -\frac{x_0}{\tau^2}\, \theta \right\} d\theta
\]
\[
= \frac{(x_0/\tau^2)^{1+1/\tau^2}\, \Gamma(1/\tau^2)}{\Gamma(1+1/\tau^2)\, (x_0/\tau^2)^{1/\tau^2}}
\int_{\mathbb{R}_+} \frac{(x_0/\tau^2)^{1/\tau^2}}{\Gamma(1/\tau^2)}\, \theta^{\frac{1}{\tau^2}-1}\, \exp\left\{ -\frac{x_0}{\tau^2}\, \theta \right\} d\theta \;=\; x_0.
\]
The experience premium is then given by
\[
E[X_{T+1}| X_1, \ldots, X_T] = \alpha_T\, \widehat{x}_T^{\rm MV} + (1-\alpha_T)\, x_0,
\]
with credibility weight
\[
\alpha_T = \frac{\sum_{t=1}^T w_t}{\sum_{t=1}^T w_t + \frac{\phi}{\tau^2}}.
\]
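The key conjugate-prior identity $E[\Theta^{-1}] = x_0$ derived above can be verified by simulation. The sketch below draws from a gamma prior with shape $1 + 1/\tau^2$ and scale $\tau^2/x_0$ and compares the Monte Carlo mean of $1/\Theta$ with $x_0$; the parameter values are hypothetical illustration choices.

```python
# Monte Carlo check of E[1/Theta] = x0 for the conjugate prior of the
# gamma-gamma model: Theta ~ Gamma(shape = 1 + 1/tau^2, scale = tau^2/x0).
import random

random.seed(1)
x0, tau2 = 2.5, 0.25                     # hypothetical prior mean x0 and tau^2
shape, scale = 1.0 + 1.0 / tau2, tau2 / x0

n = 100_000
mc = sum(1.0 / random.gammavariate(shape, scale) for _ in range(n)) / n

# exact value: for Gamma with shape a and rate b, E[1/Theta] = b/(a-1) = x0
assert abs(mc - x0) / x0 < 0.02
```

Note that `random.gammavariate(alpha, beta)` uses the shape/scale parameterization, so the rate $x_0/\tau^2$ of the prior corresponds to the scale $\tau^2/x_0$ above.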
Markov chain Monte Carlo (MCMC) methods will provide the posterior distribution in almost any situation where we can write down the posterior density up to the normalizing constant, that is, whenever we have a posterior density of the form
\[
\pi(\theta|x) \;\propto\; f_\theta(x)\, \pi(\theta).
\]

The key model for the analysis in this section is the model of Hans Bühlmann and Erwin Straub (1938-2004) [25].
Model 8.13 (Bühlmann-Straub (BS) model [25]). Assume we have $I$ risk classes and $T$ random variables per risk class. Assume fixed volumes $w_{i,t} > 0$, $i = 1, \ldots, I$ and $t = 1, \ldots, T$, are given, and that conditionally, given $\Theta_i$,
\[
E[X_{i,t}|\Theta_i] = \mu(\Theta_i)
\qquad\text{and}\qquad
\mathrm{Var}(X_{i,t}|\Theta_i) = \frac{\sigma^2(\Theta_i)}{w_{i,t}}.
\]
Remarks 8.14.

• The conditional mean and variance are characterized by the two functions $\mu: \Theta \to \mathbb{R}$ and $\sigma^2: \Theta \to \mathbb{R}_+$; $\Theta \mapsto \mu(\Theta)$ and $\Theta \mapsto \sigma^2(\Theta)$.

• If we set $I = 1$, i.e. we only have one risk class, then an explicit example of the BS Model 8.13 is given by the exponential dispersion family with conjugate priors, Model Assumptions 8.7. The conditional mean and variance are then modeled by, see (8.1),
\[
\mu(\Theta_1) = b'(\Theta_1)
\qquad\text{and}\qquad
\sigma^2(\Theta_1) = \phi\, b''(\Theta_1).
\]

• The structural parameters are given by
\[
\tau^2 = \mathrm{Var}(\mu(\Theta_1)) \quad \text{(variance between risk classes)}, \tag{8.4}
\]
\[
\sigma^2 = E[\sigma^2(\Theta_1)] \quad \text{(variance within risk classes)}. \tag{8.5}
\]
8.2.2 Bühlmann-Straub credibility formula

The full Bayesian estimator for the (unknown) mean $\mu(\Theta_i)$ of risk class $i$ is given by
\[
\widehat{\mu(\Theta_i)} = E[\mu(\Theta_i)| \boldsymbol{X}_1, \ldots, \boldsymbol{X}_I]. \tag{8.6}
\]
In the exponential dispersion family with conjugate priors this posterior mean can be calculated explicitly, see Theorem 8.9. In most other situations, however, this is not the case. Therefore, we approximate this posterior mean. We briefly describe how this approximation is done. Assume that all considered random variables are square integrable, thus we work in the Hilbert space $L^2(\Omega, \mathcal{F}, P)$ of square integrable random variables.

In this Hilbert space the random variables $\boldsymbol{X}_1, \ldots, \boldsymbol{X}_I$ generate the subspace $G(\boldsymbol{X})$ of all $\sigma(\boldsymbol{X}_1, \ldots, \boldsymbol{X}_I)$-measurable random variables. The posterior mean $\widehat{\mu(\Theta_i)}$, given by (8.6), is the element of the subspace $G(\boldsymbol{X})$ that minimizes the $L^2$-distance between this subspace $G(\boldsymbol{X})$ and $\mu(\Theta_i)$. Since we have a Hilbert space, this estimate $\widehat{\mu(\Theta_i)}$ corresponds to the orthogonal projection of $\mu(\Theta_i)$ onto $G(\boldsymbol{X})$. In the general case this minimization and orthogonal projection onto $G(\boldsymbol{X})$, respectively, has a too complicated form. To reduce this complexity we restrict the orthogonal projection to simpler subsets $L$ of $G(\boldsymbol{X})$. This will provide approximations to $\widehat{\mu(\Theta_i)} \in G(\boldsymbol{X})$ in the more restricted subsets $L \subset G(\boldsymbol{X})$. We define the following two subsets
\[
L(\boldsymbol{X}, 1) = \left\{ \widehat{\mu} = a_0 + \sum_{i,t} a_{i,t} X_{i,t};\; a_0 \in \mathbb{R},\; a_{i,t} \in \mathbb{R} \text{ for all } i, t \right\} \subset G(\boldsymbol{X}),
\]
\[
L_{\mu_0}(\boldsymbol{X}) = \left\{ \widehat{\mu} = \sum_{i,t} a_{i,t} X_{i,t};\; a_{i,t} \in \mathbb{R} \text{ for all } i, t \text{ and } E[\widehat{\mu}] = \mu_0 \right\} \subset G(\boldsymbol{X}).
\]
The first subset $L(\boldsymbol{X}, 1)$ includes the constants, which will imply unbiasedness of the estimators, whereas in the second case, for $L_{\mu_0}(\boldsymbol{X})$, we need to enforce unbiasedness by a side constraint.
The inhomogeneous credibility estimator and the homogeneous credibility estimator are then defined as the corresponding minimizers
\[
\widehat{\widehat{\mu(\Theta_i)}} = \underset{\widehat{\mu} \in L(\boldsymbol{X}, 1)}{\operatorname{arg\,min}}\; E\Big[ \big( \mu(\Theta_i) - \widehat{\mu} \big)^2 \Big]
\qquad\text{and}\qquad
\widehat{\widehat{\mu(\Theta_i)}}^{\rm hom} = \underset{\widehat{\mu} \in L_{\mu_0}(\boldsymbol{X})}{\operatorname{arg\,min}}\; E\Big[ \big( \mu(\Theta_i) - \widehat{\mu} \big)^2 \Big].
\]

Remark 8.16. The inhomogeneous credibility estimator $\widehat{\widehat{\mu(\Theta_i)}}$ is the best approximation to $\mu(\Theta_i)$ (in the $L^2$-sense) among all linear estimators given by $L(\boldsymbol{X}, 1)$. Because $L(\boldsymbol{X}, 1)$ is a subset of $G(\boldsymbol{X})$, we immediately obtain for the mean square error, with the Pythagorean theorem for successive orthogonal projections,
\[
E\bigg[ \Big( \mu(\Theta_i) - \widehat{\widehat{\mu(\Theta_i)}} \Big)^2 \bigg]
= E\bigg[ \Big( \mu(\Theta_i) - \widehat{\mu(\Theta_i)} \Big)^2 \bigg]
+ E\bigg[ \Big( \widehat{\mu(\Theta_i)} - \widehat{\widehat{\mu(\Theta_i)}} \Big)^2 \bigg]. \tag{8.7}
\]
Theorem 8.17. Under the assumptions of the BS Model 8.13 the inhomogeneous credibility estimator is given by
\[
\widehat{\widehat{\mu(\Theta_i)}} = \alpha_{i,T}\, \widehat{X}_{i,1:T} + (1-\alpha_{i,T})\, \mu_0,
\]
with credibility weight and weighted average
\[
\alpha_{i,T} = \frac{\sum_{t=1}^T w_{i,t}}{\sum_{t=1}^T w_{i,t} + \frac{\sigma^2}{\tau^2}}
\qquad\text{and}\qquad
\widehat{X}_{i,1:T} = \frac{1}{\sum_{t=1}^T w_{i,t}} \sum_{t=1}^T w_{i,t}\, X_{i,t}.
\]
The homogeneous credibility estimator is given by
\[
\widehat{\widehat{\mu(\Theta_i)}}^{\rm hom} = \alpha_{i,T}\, \widehat{X}_{i,1:T} + (1-\alpha_{i,T})\, \widehat{\mu}_T,
\]
with estimate
\[
\widehat{\mu}_T = \frac{1}{\sum_{i=1}^I \alpha_{i,T}} \sum_{i=1}^I \alpha_{i,T}\, \widehat{X}_{i,1:T}.
\]
Proof of Theorem 8.17. The theorem can be proved by brute force doing convex optimizations (using the method of Lagrange in the latter case), or we can apply Hilbert space techniques using projection properties, see Chapters 3 and 4 in Bühlmann-Gisler [24]. We do the brute force calculation because it is quite straightforward. We minimize
\[
h(a) = E\bigg[ \Big( a_0 + \sum_{l,t} a_{l,t} X_{l,t} - \mu(\Theta_i) \Big)^2 \bigg]
\]
over all possible choices $a_0, a_{i,t} \in \mathbb{R}$. This requires that we calculate all derivatives w.r.t. these parameters and set them equal to zero:
\[
\frac{\partial}{\partial a_0} h(a) = 2\, E\Big[ a_0 + \sum_{l,t} a_{l,t} X_{l,t} - \mu(\Theta_i) \Big] \stackrel{!}{=} 0, \tag{8.8}
\]
\[
\frac{\partial}{\partial a_{j,s}} h(a) = 2\, E\Big[ X_{j,s} \Big( a_0 + \sum_{l,t} a_{l,t} X_{l,t} - \mu(\Theta_i) \Big) \Big] \stackrel{!}{=} 0. \tag{8.9}
\]
Plugging this into (8.9) and using (8.8) once more immediately gives for all $j, s$ the requirement
\[
\mathrm{Cov}\Big( X_{j,s},\; \sum_{l,t} a_{l,t} X_{l,t} - \mu(\Theta_i) \Big) = 0.
\]
Using the uncorrelatedness between different risk classes (which is implied by the independence) we obtain the following (normal) equations, see Corollary 3.17 and Section 4.3 in Bühlmann-Gisler [24],
\[
a_0 = \mu_0 \Big( 1 - \sum_{l,t} a_{l,t} \Big), \tag{8.10}
\]
\[
\mathrm{Cov}(X_{j,s}, \mu(\Theta_i)) = \sum_{t=1}^T a_{j,t}\, \mathrm{Cov}(X_{j,s}, X_{j,t}) \qquad\text{for all } j, s. \tag{8.11}
\]
The first covariance is given by
\[
\mathrm{Cov}(X_{j,s}, \mu(\Theta_i)) = \mathrm{Cov}\big( E[X_{j,s}|\Theta_j],\, \mu(\Theta_i) \big) = \tau^2\, 1_{\{i=j\}},
\]
and for the second covariance we have
\[
\mathrm{Cov}(X_{j,s}, X_{j,t}) = E\big[ \mathrm{Cov}(X_{j,s}, X_{j,t}|\Theta_j) \big] + \mathrm{Cov}\big( E[X_{j,s}|\Theta_j],\, E[X_{j,t}|\Theta_j] \big)
\]
\[
= \frac{1}{w_{j,s}}\, E\big[\sigma^2(\Theta_j)\big]\, 1_{\{t=s\}} + \mathrm{Var}(\mu(\Theta_j))
= \frac{\sigma^2}{w_{j,s}}\, 1_{\{t=s\}} + \tau^2 \;>\; 0.
\]
This implies that the left-hand side of (8.11) is equal to 0 for $j \neq i$, and because $\mathrm{Cov}(X_{j,s}, X_{j,t}) \geq \tau^2 > 0$ it follows that $a_{j,s} = 0$ for all $j \neq i$. This is not surprising because we have assumed that the different risk classes are independent. Therefore, (8.10)-(8.11) reduce to
\[
a_0 = \mu_0 \Big( 1 - \sum_{t=1}^T a_{i,t} \Big) \stackrel{\rm def.}{=} \mu_0\, (1 - \alpha_{i,T}), \tag{8.12}
\]
\[
\tau^2 = \frac{\sigma^2}{w_{i,s}}\, a_{i,s} + \tau^2 \sum_{t=1}^T a_{i,t} = \frac{\sigma^2}{w_{i,s}}\, a_{i,s} + \tau^2\, \alpha_{i,T} \qquad\text{for all } s. \tag{8.13}
\]
This defines $\alpha_{i,T} = \sum_{t=1}^T a_{i,t}$, and we still need to see that this credibility weight has the claimed form. Requirement (8.13) then implies for all $s$
\[
a_{i,s} = \frac{\tau^2}{\sigma^2}\, (1 - \alpha_{i,T})\, w_{i,s}.
\]
If we sum this over $s$ we obtain
\[
\alpha_{i,T} = \sum_{s=1}^T a_{i,s} = \frac{\tau^2}{\sigma^2}\, (1 - \alpha_{i,T}) \sum_{s=1}^T w_{i,s},
\]
and solving for $\alpha_{i,T}$ gives the claimed form. If we collect all the terms we have found the following inhomogeneous credibility estimator
\[
\widehat{\widehat{\mu(\Theta_i)}} = \alpha_{i,T}\, \frac{1}{\sum_{t=1}^T w_{i,t}} \sum_{s=1}^T w_{i,s}\, X_{i,s} + (1-\alpha_{i,T})\, \mu_0
= \alpha_{i,T}\, \widehat{X}_{i,1:T} + (1-\alpha_{i,T})\, \mu_0.
\]
This proves the first claim, and an important observation is that this credibility estimator is unbiased for $\mu_0$. Therefore, it coincides with the estimator we would have obtained by projecting onto
\[
L_{\mu_0}(\boldsymbol{X}, 1) = L(\boldsymbol{X}, 1) \cap \big\{ \widehat{\mu} \in L^2(\Omega, \mathcal{F}, P):\; E[\widehat{\mu}] = \mu_0 \big\}.
\]
The proof for the homogeneous credibility estimator goes along the same lines as the inhomogeneous one, using the method of Lagrange for replacing (8.8) by the side constraint
\[
\mu_0 = E[\widehat{\mu}] = E\Big[ \sum_{i,t} a_{i,t} X_{i,t} \Big] = \sum_{i,t} a_{i,t}\, E[X_{i,t}] = \sum_{i,t} a_{i,t}\, \mu_0,
\]
which implies $\sum_{i,t} a_{i,t} = 1$. An alternative proof would use the iterative property and the linearity of orthogonal projections on subspaces. For details we refer to Section 4.6 in Bühlmann-Gisler [24]. This closes the proof of Theorem 8.17. □
The credibility estimators are the best approximations to the full Bayesian estimator $\widehat{\mu(\Theta_i)}$ in the $L^2$-sense, see also (8.7). The inhomogeneous and the homogeneous credibility estimators are somewhat different, which may also lead to different interpretations.

This latter case can now be used for the tariffication of risk factors on different risk classes, similar to the GLM Chapter 7. The overall premium is given by $\widehat{\mu}_T$, the experience of risk class $i$ is given by $\widehat{X}_{i,1:T}$, and the credibility weight $\alpha_{i,T} \in (0,1)$ explains how this information needs to be combined to obtain the risk adjusted premium of risk class $i$.
In applications the structural parameters $\sigma^2$ and $\tau^2$ need to be estimated from the data, see Section 4.8 in Bühlmann-Gisler [24]. We define
\[
\widehat{s}_i^{\,2} = \frac{1}{T-1} \sum_{t=1}^T w_{i,t} \big( X_{i,t} - \widehat{X}_{i,1:T} \big)^2.
\]
A straightforward calculation shows that this is an unbiased estimator for $\sigma^2(\Theta_i)$, conditionally given $\Theta_i$. But this immediately implies that $\widehat{s}_i^{\,2}$ is an unbiased estimator for $\sigma^2$ for all $i$. Therefore, we set
\[
\widehat{\sigma}_T^2 = \frac{1}{I} \sum_{i=1}^I \widehat{s}_i^{\,2}, \tag{8.15}
\]
with $E[\widehat{\sigma}_T^2] = \sigma^2$. Observe that one risk class is sufficient to get an estimate for $\sigma^2$.

If we have prior knowledge $\mu_0$ then $\tau^2$ should be calibrated such that it quantifies the reliability of this prior knowledge. If we use the homogeneous credibility estimator then $\tau^2$ is estimated from the volatility between the risk classes (here we need more than one risk class $i$). Therefore, we define the weighted sample mean over all observations
\[
\bar{X} = \frac{1}{\sum_{i,t} w_{i,t}} \sum_{i,t} w_{i,t}\, X_{i,t}
= \frac{1}{\sum_{i,t} w_{i,t}} \sum_{i} \Big( \sum_t w_{i,t} \Big)\, \widehat{X}_{i,1:T}.
\]
Similarly to Lemma 2.29 we can calculate the expected value of $\widehat{v}_T^{\,2}$, which then shows that we need to define
\[
\widehat{t}_T^{\,2} = c_w \left( \widehat{v}_T^{\,2} - \frac{I\, \widehat{\sigma}_T^2}{\sum_{j,s} w_{j,s}} \right).
\]
This estimator has the unbiasedness property $E[\widehat{t}_T^{\,2}] = \tau^2$; we refer to Section 4.8 in Bühlmann-Gisler [24]. The only difficulty is that it might become negative which, of course, is nonsense for estimating $\tau^2$. Therefore, we set for the final estimator
\[
\widehat{\tau}_T^{\,2} = \max\big\{ \widehat{t}_T^{\,2},\; 0 \big\}. \tag{8.16}
\]
                       t = 1       2       3       4       5
risk class 1  v_{1,t}    729     786     872     951    1019
              S_{1,t}    583    1100     262     837    1630
              X_{1,t}   80.0%  139.9%   30.0%   88.0%  160.0%
risk class 2  v_{2,t}   1631    1802    2090    2300    2368
              S_{2,t}     99    1298     326     463     895
              X_{2,t}    6.1%   72.0%   15.6%   20.1%   37.8%
risk class 3  v_{3,t}    796     827     874     917     944
              S_{3,t}   1433     496     699    1742    1038
              X_{3,t}  180.0%   60.0%   80.0%  190.0%  110.0%
risk class 4  v_{4,t}   3152    3454    3715    3859    4198
              S_{4,t}   1765    4145    3121    4129    3358
              X_{4,t}   56.0%  120.0%   84.0%  107.0%   80.0%
risk class 5  v_{5,t}    400     420     422     424     440
              S_{5,t}     40       0     169    1018      44
              X_{5,t}   10.0%    0.0%   40.0%  240.1%   10.0%

Table 8.1: Observed claims S_{i,t}, corresponding numbers of policies v_{i,t} and claims ratios X_{i,t} = S_{i,t}/v_{i,t}.
The data is provided in Table 8.1. We have claims $S_{i,t}$ and corresponding numbers of policies $v_{i,t}$. In order to apply the BS model we choose volumes $w_{i,t} = v_{i,t}$, i.e. the volumes $w_{i,t}$ are determined by the number of policies in the corresponding cell $(i,t)$, and we define the claims ratios $X_{i,t} = S_{i,t}/v_{i,t}$. Our aim is to apply the BS model to $(X_{i,t})_{i,t}$. Observe that the application of the BS model is motivated by the fact that some cells have small volumes and volatile claims ratios. Therefore, Bayesian methods are applied to smooth the premia.

We would like to calculate the homogeneous credibility estimator $\widehat{\widehat{\mu(\Theta_i)}}^{\rm hom}$ for the claims ratios of the risk classes $i = 1, \ldots, 5$, see Theorem 8.17. Therefore, we first need to estimate the structural parameters. With formulas (8.15) and (8.16) we obtain $\widehat{\sigma}_T^2 = 261.2$ and $\widehat{\tau}_T^{\,2} = 0.1021$. This gives the estimated credibility coefficient $\widehat{\kappa}_T = \widehat{\sigma}_T^2/\widehat{\tau}_T^{\,2} = 2558$, and from this we can estimate the credibility weights $\alpha_{i,T}$. The estimates are provided in Table 8.2.

                         risk class 1  risk class 2  risk class 3  risk class 4  risk class 5
α̂_{i,T}                       63.0%         79.9%         63.0%         87.8%         45.2%
X̂_{i,1:T}                    101.3%         30.2%        124.1%         89.9%         60.4%
µ̂̂(Θ_i)^hom                   93.5%         40.3%        107.9%         88.7%         71.3%

Table 8.2: Estimated credibility weights α̂_{i,T}, claims ratios X̂_{i,1:T} and homogeneous credibility estimators µ̂̂(Θ_i)^hom.

We see that in risk class 4 we have big volumes $v_{4,t}$, which results in a high credibility weight estimate of $\widehat{\alpha}_{4,T} = 87.8\%$. In risk class 5 we have small volumes $v_{5,t}$, which results in a low credibility weight estimate of $\widehat{\alpha}_{5,T} = 45.2\%$. From this we calculate the credibility weighted overall claims ratio
$\widehat{\mu}_T = 80.4\%$ (which should be compared to the sample mean $\bar{X} = 77.9\%$), and from this we finally calculate the homogeneous credibility estimators for the claims ratios, see Table 8.2. We observe a smoothing of $\widehat{X}_{i,1:T}$ towards $\widehat{\mu}_T$ according to the credibility weights $\widehat{\alpha}_{i,T}$.
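The homogeneous BS computation can be sketched in a few lines. The following minimal implementation follows (8.15) and Theorem 8.17; note that it uses a simplified moment estimator for $\tau^2$ without the exact bias correction $c_w$ of (8.16), so it is an illustrative sketch rather than the estimator of Bühlmann-Gisler, and the small data set at the bottom is hypothetical.

```python
# Homogeneous Bühlmann-Straub estimator: a minimal sketch of (8.15) and
# Theorem 8.17, with a simplified between-class variance estimator.

def bs_homogeneous(w, X):
    """w[i][t] volumes, X[i][t] claims ratios; returns credibility estimates."""
    I, T = len(w), len(w[0])
    wsum = [sum(w[i]) for i in range(I)]
    # individual weighted means X_hat_{i,1:T}
    Xbar_i = [sum(w[i][t] * X[i][t] for t in range(T)) / wsum[i] for i in range(I)]
    # (8.15): within-class variance estimator sigma^2
    s2 = [sum(w[i][t] * (X[i][t] - Xbar_i[i]) ** 2 for t in range(T)) / (T - 1)
          for i in range(I)]
    sigma2 = sum(s2) / I
    # between-class variance estimator tau^2 (crude analogue of (8.16);
    # the exact correction factor c_w is omitted in this sketch)
    w_tot = sum(wsum)
    Xbar = sum(wsum[i] * Xbar_i[i] for i in range(I)) / w_tot
    v2 = sum(wsum[i] * (Xbar_i[i] - Xbar) ** 2 for i in range(I)) / (I - 1)
    tau2 = max(v2 - I * sigma2 / w_tot, 0.0)
    # credibility weights and homogeneous estimator of Theorem 8.17
    kappa = sigma2 / tau2                      # assumes tau2 > 0
    alpha = [wsum[i] / (wsum[i] + kappa) for i in range(I)]
    mu_T = sum(alpha[i] * Xbar_i[i] for i in range(I)) / sum(alpha)
    return [alpha[i] * Xbar_i[i] + (1 - alpha[i]) * mu_T for i in range(I)]

w = [[100] * 5, [200] * 5, [50] * 5]           # volumes per risk class (hypothetical)
X = [[80, 140, 30, 88, 160],
     [6, 72, 16, 20, 38],
     [180, 60, 80, 190, 110]]                  # claims ratios in percent (hypothetical)
estimates = bs_homogeneous(w, X)               # credibility-smoothed claims ratios
```

Each estimate is a convex combination of the class mean and the collective mean, so the output always lies between the smallest and largest individual class mean.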
Exercise 14.
(a) Choose the data of Table 8.1 and calculate the inhomogeneous credibility estimators $\widehat{\widehat{\mu(\Theta_i)}}$ for the claims ratios under the assumption that the collective mean is given by $\mu_0 = 90\%$ and the variance between risk classes is given by $\tau^2 = 0.20$.
(b) What changes if the variance between risk classes is given by $\tau^2 = 0.05$?
We decompose the prediction error as
\[
X_{i,T+1} - \widehat{\widehat{\mu(\Theta_i)}} = \big( X_{i,T+1} - \mu(\Theta_i) \big) + \Big( \mu(\Theta_i) - \widehat{\widehat{\mu(\Theta_i)}} \Big).
\]
This implies
\[
E\bigg[ \Big( X_{i,T+1} - \widehat{\widehat{\mu(\Theta_i)}} \Big)^2 \bigg]
= E\big[ \mathrm{Var}(X_{i,T+1}|\Theta_i) \big] + E\bigg[ \Big( \mu(\Theta_i) - \widehat{\widehat{\mu(\Theta_i)}} \Big)^2 \bigg]
= \frac{\sigma^2}{w_{i,T+1}} + (1-\alpha_{i,T})\, \tau^2, \tag{8.17}
\]
see Theorem 4.3 in Bühlmann-Gisler [24]. Similarly we obtain for the homogeneous credibility estimator, see Theorem 4.6 in Bühlmann-Gisler [24],
\[
E\bigg[ \Big( X_{i,T+1} - \widehat{\widehat{\mu(\Theta_i)}}^{\rm hom} \Big)^2 \bigg]
= \frac{\sigma^2}{w_{i,T+1}} + (1-\alpha_{i,T})\, \tau^2 \left( 1 + \frac{1-\alpha_{i,T}}{\sum_i \alpha_{i,T}} \right). \tag{8.18}
\]
The expressions in (8.17) and (8.18) are called mean square error of prediction (MSEP). We will come back to this notion in Section 9.3, and for a comprehensive treatment we refer to Section 3.1 in Wüthrich-Merz [87].
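Formulas (8.17) and (8.18) are straightforward to evaluate once the structural parameters and credibility weights are available. The sketch below uses the parameter estimates quoted in the example above together with a 5% volume growth as in Exercise 15; the choice of risk class and next-year volume is illustrative.

```python
# MSEP of the inhomogeneous (8.17) and homogeneous (8.18) BS predictors.

def msep_inhomogeneous(sigma2, tau2, alpha_i, w_next):
    # (8.17): process variance plus estimation error of the credibility mean
    return sigma2 / w_next + (1.0 - alpha_i) * tau2

def msep_homogeneous(sigma2, tau2, alpha_i, alphas, w_next):
    # (8.18): additional term from estimating the collective mean mu_T
    correction = 1.0 + (1.0 - alpha_i) / sum(alphas)
    return sigma2 / w_next + (1.0 - alpha_i) * tau2 * correction

sigma2, tau2 = 261.2, 0.1021                 # structural parameter estimates
alphas = [0.630, 0.799, 0.630, 0.878, 0.452] # credibility weights of Table 8.2
w_next = 1019 * 1.05                         # risk class 1 volume, 5% growth assumed

inhom = msep_inhomogeneous(sigma2, tau2, alphas[0], w_next)
hom = msep_homogeneous(sigma2, tau2, alphas[0], alphas, w_next)
assert hom >= inhom > 0.0                    # homogeneous MSEP carries the extra term
```

The homogeneous MSEP is never smaller than the inhomogeneous one because the collective mean $\widehat{\mu}_T$ itself has to be estimated from the data.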
Exercise 15. Estimate the prediction uncertainty $E[(X_{i,T+1} - \widehat{\widehat{\mu(\Theta_i)}}^{\rm hom})^2]$ for the data of Example 8.19 under the assumption that the volume grows by 5% in each risk class.
Exercise 16. We consider Example 4.1 of Bühlmann-Gisler [24]. The observed numbers of policies $v_i$ and claims counts $N_i$ in 21 different regions are given in Table 8.3.

region i       v_i       N_i
1           50'061     3'880
2           10'135       794
3          121'310     8'941
4           35'045     3'448
5           19'720     1'672
6           39'092     5'186
7            4'192       314
8           19'635     1'934
9           21'618     2'285
10          34'332     2'689
11          11'105       661
12          56'590     4'878
13          13'551     1'205
14          19'139     1'646
15          10'242       850
16          28'137     2'229
17          33'846     3'389
18          61'573     5'937
19          17'067     1'530
20           8'263       671
21         148'872    15'014
total      763'525    69'153

Table 8.3: Observed numbers of policies $v_i$ and claims counts $N_i$.

Calculate the homogeneous credibility estimators for each region $i$ under the assumption that $N_i|\Theta_i$ has a Poisson distribution with mean $\mu(\Theta_i) v_i = \Theta_i \lambda_0 v_i$.
Hint: for the estimation of the credibility coefficient $\kappa = \sigma^2/\tau^2$ one should use that $N_i|\Theta_i$ is Poisson distributed, which has direct consequences for the corresponding variance $\sigma^2(\Theta_i)$, see also Proposition 2.8.
Chapter 9

Claims Reserving
This chapter gives a completely new perspective on non-life insurance business which has not been tackled so far in these notes. Until now we have assumed that the total claim amount for a fixed accounting year can be described by a compound distribution of the form
\[
S_t = \sum_{i=1}^{N_t} Y_i^{(t)},
\]
where $t = 1, \ldots, T$ denotes the different accounting years and $N_t$ counts the number of claims in accounting year $t$. This was the base model for the study of the surplus process $(C_t)_{t \in \mathbb{N}_0}$ in Chapter 5, and it was also the base assumption for parameter estimation (based on past claims experience) for the prediction of future claims.

This model suggests that we have $N_t$ claims in accounting year $t$ and their claim sizes $Y_1^{(t)}, \ldots, Y_{N_t}^{(t)}$ describe the total payouts to the insured. The issue in practice is that a typical non-life insurance claim cannot be settled immediately at occurrence, i.e. if $Y_i^{(t)}$ describes the claim amount of claim $i$ in accounting year $t$ then, in general, this claim amount is not observable at time $t$ due to a settlement delay that allows for a final assessment only later. Likewise, $N_t$ is not observable at the end of accounting year $t$ because there might be claims that have occurred in year $t$ but which are reported only later. We describe reasons for such reporting and settlement delays in the next section. As a consequence, we need to predict future cash flows of claims that have occurred in the past in order to have a sound basis for pricing future insurance contracts. This task is exactly the claims reserving problem, and it assesses the outstanding loss liabilities for past claims. The predictions of these outstanding loss liabilities constitute the claims reserves. Importantly, these claims reserves typically are the largest position on the liability side of the balance sheet of a non-life insurance company and are essential for the financial strength of the company. Therefore, we aim to describe the claims reserving process in this chapter and we also would like to describe the uncertainties involved.
A claim may be reported with substantial delay after its occurrence, for example several years. Reasons for such reporting delays are that claims are not immediately reported to the insurance company; for instance, a stolen bike is only reported once it is clear that it will not "reappear", but of course the accident date is the day the bike was stolen. Large reporting delays are typically caused by claims which are not immediately noticed. A common example is an asbestos claim, which is typically caused a long time before the cancer is diagnosed and reported. The accident date refers to the event when there was contact with asbestos, the trigger of the cancer, and not to the date of the outbreak of the disease.

Once a claim is reported to the insurance company it typically cannot be settled immediately. The insurance company starts an investigation, observes the recovery process, waits for external information, external bills, court decisions, etc. This process may last for several years for more involved claims. Of course, the insurance company cannot postpone its claims payments until there is a final assessment of the claim, but it will continuously pay for justified claims benefits. Therefore, insurance claims trigger a whole sequence of cash flows after the reporting date. This period is called the settlement period, and the final assessment of a claim is called the settlement date or closing date.
Thus, we have three important (ordered) dates for non-life insurance claims:

• the accident date $T_1$ (the date of claims occurrence);
• the reporting date $T_2$ (the date of registration at the insurance company);
• the settlement date $T_3$ (the closing date of the claim).

In addition, there are the following two important dates: the beginning of the insurance period $U_1$ and the ending of the insurance period $U_2 > U_1$; we always assume $U_2 < \infty$. Typically, the insurance company is only liable for a claim if $T_1 \in [U_1, U_2]$, thus we only consider claims that have accident dates $T_1$ which fall into the insured period $[U_1, U_2]$ specified in the insurance contract.
At time $t$ we distinguish the following cases:

1. $t < T_1$. Such (possible) claims have not yet occurred. If the company is "lucky" then $T_1 > U_2$. This means that it is not liable for this particular claim with the actual insurance policy because the contract is already terminated at claims occurrence. Be careful: the company may still be liable for this particular claim, namely, if the contract is renewed and $T_1$ falls into the renewed insurance period, but renewals are not of interest for the present discussion. In this first case $t < T_1$ the only information available at the insurance company is the insurance contracts signed, i.e. the exposure for which it is liable in case of a claims occurrence $T_1 \in [U_1, U_2]$.

Figure 9.1: Non-life insurance run-off showing insurance period $[U_1, U_2]$ and a claim with accident date $T_1 \in [U_1, U_2]$, reporting date $T_2 > U_2$ and settlement date $T_3 > T_2$. Moreover, we have claims payments during the settlement period.
2. $T_1 \leq t < T_2$ and $T_1 \in [U_1, U_2]$. In this case the insurance claim has occurred but it has not yet been reported to the insurance company. These claims are called incurred but not yet reported (IBNyR) claims. For such claims we do not have any individual claims information (because the claim is IBNyR).

3. $T_2 \leq t < T_3$ and $T_1 \in [U_1, U_2]$. The claim has been reported but it is not yet settled; these claims belong to the class of incurred but not enough reported (IBNeR) claims, and during the settlement period the insurance company continuously pays the justified claims benefits.
4. $T_3 < t$ and $T_1 \in [U_1, U_2]$. The claim is settled, the file is closed and stored, and we expect no further payments for that claim. In some circumstances, it may be necessary that a claim file is re-opened due to unexpected further claims development. If this happens too often then the files are probably closed too early and the claims settlement philosophy should be reviewed in that particular company. If there are systematic re-openings, this may also call for a special assessment for unexpected re-openings, for example, for contracts with a timely unlimited cover for relapses.
To give statistical statements about insurance contracts and claims behavior, insurance companies build homogeneous groups and sub-portfolios to which a LLN applies. In non-life insurance, contracts are often grouped into business lines such as private property, commercial property, private liability, commercial liability, accident insurance, health insurance, motor third party liability insurance, motor hull insurance, etc. If this classification is too rough it can further be divided into sub-portfolios; for example, private property can be divided by hazard categories like fire, water, theft, etc. Often such sub-classes are built by geographical markets and for different legislations.

Once these (hopefully) homogeneous risk classes are built we can study all claims that belong to such a sub-portfolio. These claims are further categorized by the accident date. Claims that fall into the same period are triggered by similar external factors like weather conditions and the economic environment, therefore such a classification is reasonable. Since the usual time scale for insurance contracts and business consolidation is years, claims are typically gathered on the yearly time scale. Therefore, we consider accounting years denoted by $k \in \mathbb{N}$. All claims that have occurrence date $T_1 \in (k-1, k]$ are called claims with accident year $k$. These claims generate cash flows which are also considered on the consolidated yearly level, i.e. all payments that are done in the same accounting year are aggregated. This motivates the claims reserving notation $X_{i,j}$ for fixed $i \in \mathbb{N}$ and $j \in \mathbb{N}_0$: we consider all claims (for a given sub-portfolio) which have accident dates $T_1 \in (i-1, i]$ in the same year, i.e. the same accident year $i$. For these claims we consider aggregate cash flows $X_{i,j}$ which are further sub-divided by their payment delays, denoted by $j \in \mathbb{N}_0$ and called development years. For instance,
Table 9.1: Claims development triangle/trapezoid. The upper-left part $\{X_{i,j};\, i+j \leq t\}$ contains the observations $\mathcal{D}_t$, the lower-right part contains the payments $\mathcal{D}_t^c$ to be predicted.
In this notation (1) $i$ describes the accident year; (2) $j$ describes payments with the same payment delay (relative to the accident year); and (3) $k = i + j$ describes the payments that are done in the same accounting year (and hence are influenced by the same external factors like inflation). Therefore, we denote the accounting year payments by
\[
X_k = \sum_{i+j=k} X_{i,j}
= \sum_{i = 1 \vee (k-J+1)}^{t \wedge k} X_{i,k-i}
= \sum_{j = 0 \vee (k-t)}^{(J-1) \wedge (k-1)} X_{k-j,j}.
\]
At time $t \in \mathbb{N}$ we are liable for all claims that have occurred with accident years $i \leq t$. We call these claims past exposure claims. Some of these past exposure claims are already settled (if the settlement date $T_3 \leq t$), others belong either to the class of IBNeR claims (if the reporting date $T_2 \leq t$ but the settlement date $T_3 > t$) or to the class of IBNyR claims (if the reporting date $T_2 > t$).

On the aggregate level we have the following payment information at time $t \in \mathbb{N}$ for past exposure claims
\[
\mathcal{D}_t = \{ X_{i,j};\; i+j \leq t,\; 1 \leq i \leq t,\; 0 \leq j \leq J-1 \}. \tag{9.1}
\]
This information exactly corresponds to the upper triangle (if $t = J$) or the upper trapezoid (if $t > J$) of Table 9.1. These past exposure claims will generate cash flows in the future, given by the complement
\[
\mathcal{D}_t^c = \{ X_{i,j};\; i+j > t,\; 1 \leq i \leq t,\; 0 \leq j \leq J-1 \}.
\]
This corresponds to the lower triangle in Table 9.1. This lower triangle $\mathcal{D}_t^c$ is called the outstanding loss liabilities and it is the major object of interest. Namely, these outstanding loss liabilities constitute the liabilities of the insurance company originating from past exposures. In particular, the company needs to build appropriate
provisions so that it is able to fulfill these future cash flows. These provisions are called claims reserves and they should satisfy the following requirements:

• the claims reserves should be evaluated such that they consider all relevant (past) information;
• the claims reserves should be a best-estimate for the outstanding loss liabilities, adjusted for the time value of money.

Basically, this means that we need to predict the lower triangle $\mathcal{D}_t^c$ based on all available information $\mathcal{F}_t \supset \mathcal{D}_t$ at time $t$. In particular, we need to define a stochastic model on the probability space $(\Omega, \mathcal{F}, P)$ (i) that allows us to incorporate past information $\mathcal{F}_t \subset \mathcal{F}$; (ii) that reflects the characteristics of past observations $\mathcal{D}_t$; (iii) that is able to predict future payments of the outstanding loss liabilities $\mathcal{D}_t^c$; and (iv) that is able to attach time values to these future cash flows $X_{i,j}$, $i+j > t$. Of course, this is ambitious and we will build such a stochastic model step-by-step. For the time being we skip the task of attaching time values to cash flows and we only consider nominal payments. The total nominal claims payments for accident year $i$ are given by
\[
S_i = \sum_{j=0}^{J-1} X_{i,j};
\]
thus, for assessing the total claim amount $S_i$ of accident year $i$ we need to describe the claims settlement process $X_{i,0}, \ldots, X_{i,J-1}$. In particular, we need to predict the (unobserved) future cash flows of the outstanding loss liabilities to quantify the total claim $S_i$ of accident year $i$.
The (nominal) best-estimate reserves at time $I \geq J$ for past exposure claims are then (under these model assumptions) defined by
\[
\mathcal{R} = \sum_{i+j > I} E[X_{i,j}| \mathcal{F}_I] = \sum_{(i,j) \in \mathcal{I}_I^c} E[X_{i,j}| \mathcal{F}_I],
\]
i.e. $\mathcal{I}_I^c$ exactly corresponds to the lower triangle $\mathcal{D}_I^c$. Here, $(\mathcal{F}_t)_{t \geq 0}$ is a filtration on $(\Omega, \mathcal{F}, P)$ with $X_{i,j}$ being $\mathcal{F}_{i+j}$-measurable for all $(i,j)$. The best-estimate reserves $\mathcal{R}$ are a predictor for the (nominal) outstanding loss liabilities of past exposure claims given by
\[
\sum_{(i,j) \in \mathcal{I}_I^c} X_{i,j}.
\]
The title of this section contains the word "algorithms". Initially, actuaries in the insurance industry designed algorithms that determine the claims reserves $\mathcal{R}$. These algorithms need to be understood as guidelines to obtain claims reserves. Only much later did actuaries start to think about the stochastic models underlying these algorithms. In this section we present claims reserving from this algorithmic point of view, and in the next section we present stochastic models that support these algorithms.

The two most popular algorithms are the so-called chain-ladder (CL) algorithm and the Bornhuetter-Ferguson (BF) algorithm [16]. These two algorithms take different viewpoints. The CL algorithm takes the position that the observations $\mathcal{D}_I$ are extrapolated into the lower triangle, whereas the BF algorithm takes the position that the lower triangle $\mathcal{D}_I^c$ is extrapolated independently of the observations using expert knowledge. Depending on the line of business and the progress of the claims development process, one or the other may provide better predictions. Only actuarial experience may tell which one should be preferred in which particular situation. Therefore, we are going to present both algorithms from a rather mechanical point of view.
We define the cumulative payments
\[
C_{i,j} = \sum_{l=0}^j X_{i,l},
\]
that is, we sum all the payments $X_{i,l}$ for fixed accident year $i$ so that ultimately we obtain $C_{i,J-1} = S_i$, if $S_i$ denotes the total claim that corresponds to accident year $i$.

CL idea. All accident years $i \in \{1, \ldots, I\}$ behave similarly and for cumulative payments we have approximately
\[
C_{i,j+1} \approx f_j\, C_{i,j}, \tag{9.2}
\]
for given factors $f_j > 0$. These factors $f_j$ are called CL factors, age-to-age factors or link ratios.

The structure (9.2) immediately provides the intuition for estimating the ultimate claim $C_{i,J-1}$ based on the observations $\mathcal{D}_I$, namely, choose for every accident year $i$ the observation on the last observed diagonal, that is $C_{i,I-i}$, and multiply this observation with the successive CL factors $f_{I-i}, \ldots, f_{J-2}$.
The remaining difficulty is that, in general, the CL factors $f_j$ are not known and hence need to be estimated. Assuming that a volume weighted estimate provides the most reliable results, we set in view of (9.2)
\[
\widehat{f}_j^{\rm CL}
= \frac{\sum_{i=1}^{I-j-1} C_{i,j+1}}{\sum_{i=1}^{I-j-1} C_{i,j}}
= \sum_{i=1}^{I-j-1} \frac{C_{i,j}}{\sum_{n=1}^{I-j-1} C_{n,j}}\; \frac{C_{i,j+1}}{C_{i,j}}. \tag{9.3}
\]
This formula (9.3) expresses that we should divide the sums of observed successive columns by each other, which exactly reflects (9.2).
This provides the CL predictions: for i + n > I,
$$\hat{C}_{i,n}^{CL} = C_{i,I-i} \prod_{j=I-i}^{n-1} \hat{f}_j^{CL}. \qquad (9.4)$$
The CL reserves for accident year i are then predicted by
$$\hat{R}_i^{CL} = \hat{C}_{i,J-1}^{CL} - C_{i,I-i} = C_{i,I-i} \left(\prod_{j=I-i}^{J-2} \hat{f}_j^{CL} - 1\right),$$
and aggregated over all accident years we predict the outstanding loss liabilities of
past exposure by
$$\hat{R}^{CL} = \sum_{i > I-(J-1)} \hat{R}_i^{CL}.$$
0 1 2 3 4 5 6 7 8 9
1 5’946’975 3’721’237 895’717 207’760 206’704 62’124 65’813 14’850 11’130 15’813
2 6’346’756 3’246’406 723’222 151’797 67’824 36’603 52’752 11’186 11’646
3 6’269’090 2’976’223 847’053 262’768 152’703 65’444 53’545 8’924
4 5’863’015 2’683’224 722’532 190’653 132’976 88’340 43’329
5 5’778’885 2’745’229 653’894 273’395 230’288 105’224
6 6’184’793 2’828’338 572’765 244’899 104’957
7 5’600’184 2’893’207 563’114 225’517
8 5’288’066 2’440’103 528’043
9 5’290’793 2’357’936
10 5’675’568
Table 9.3: Observed cumulative payments Ci,j with (i, j) ∈ II and estimated CL factors fbjCL .
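The CL algorithm can be illustrated numerically. The following sketch (plain Python, not part of the original notes) uses the payment data displayed above; note that the displayed entries are the incremental payments X_{i,j}, and summing along each accident year row gives the cumulative C_{i,j} and reproduces the CL figures of Table 9.4. It computes the CL factors (9.3), the CL predictions (9.4) and the CL reserves:

```python
# Chain-ladder sketch on the payment data of Table 9.3, with entries treated
# as incremental payments X_{i,j} (cumulating each row reproduces Table 9.4).
X = [
    [5946975, 3721237, 895717, 207760, 206704, 62124, 65813, 14850, 11130, 15813],
    [6346756, 3246406, 723222, 151797,  67824, 36603, 52752, 11186, 11646],
    [6269090, 2976223, 847053, 262768, 152703, 65444, 53545,  8924],
    [5863015, 2683224, 722532, 190653, 132976, 88340, 43329],
    [5778885, 2745229, 653894, 273395, 230288, 105224],
    [6184793, 2828338, 572765, 244899, 104957],
    [5600184, 2893207, 563114, 225517],
    [5288066, 2440103, 528043],
    [5290793, 2357936],
    [5675568],
]

# cumulative payments C_{i,j} = sum_{l<=j} X_{i,l}
C = [[sum(row[:j + 1]) for j in range(len(row))] for row in X]
I, J = len(C), len(C[0])

# volume-weighted CL factor estimates, formula (9.3)
f = [sum(C[i][j + 1] for i in range(I - j - 1))
     / sum(C[i][j] for i in range(I - j - 1)) for j in range(J - 1)]

# CL predictions (9.4): develop the last observed diagonal with the CL factors
reserves = []
for row in C:
    c = row[-1]
    for j in range(len(row) - 1, J - 1):
        c *= f[j]
    reserves.append(c - row[-1])      # CL reserve per accident year

total_reserves = sum(reserves)        # compare with the total of Table 9.4
```

For this data set the first CL factor is f̂_0 ≈ 1.4925 and the total CL reserves reproduce the 6'047'061 of Table 9.4.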
Chapter 9. Claims Reserving
i      predicted cumulative payments Ĉ^CL_{i,j}, j = I-i+1, . . . , 9                                  R̂_i^CL
1                                                                                                           0
2      10'663'318                                                                                      15'126
3      10'646'884  10'662'008                                                                          26'257
4      9'734'574  9'744'764  9'758'606                                                                 34'538
5      9'837'277  9'847'906  9'858'214  9'872'218                                                      85'302
6      10'005'044  10'056'528  10'067'393  10'077'931  10'092'247                                     156'494
7      9'419'776  9'485'469  9'534'279  9'544'580  9'554'571  9'568'143                               286'121
8      8'445'057  8'570'389  8'630'159  8'674'568  8'683'940  8'693'030  8'705'378                    449'167
9      8'243'496  8'432'051  8'557'190  8'616'868  8'661'208  8'670'566  8'679'642  8'691'971       1'043'242
10     8'470'989  9'129'696  9'338'521  9'477'113  9'543'206  9'592'313  9'602'676  9'612'728  9'626'383   3'950'815
total                                                                                               6'047'061

Table 9.4: CL predicted cumulative payments Ĉ^CL_{i,j} and estimated CL reserves R̂_i^CL.
i    prior estimate μ̂_i    β̂^CL_{I-i}    Ĉ^BF_{i,J-1}    Ĉ^CL_{i,J-1}    R̂_i^BF     R̂_i^CL
1    11'653'101            100.0%         11'148'124      11'148'124
2    11'367'306             99.9%         10'664'316      10'663'318       16'124      15'126
3    10'962'965             99.8%         10'662'749      10'662'008       26'998      26'257
4    10'616'762             99.6%          9'761'643       9'758'606       37'575      34'538
5    11'044'881             99.1%          9'882'350       9'872'218       95'434      85'302
6    11'480'700             98.4%         10'113'777      10'092'247      178'024     156'494
7    11'413'572             97.0%          9'623'328       9'568'143      341'305     286'121

Table 9.5: prior estimates μ̂_i, estimated reporting pattern β̂^CL_{I-i}, BF and CL predicted ultimate claims, and the corresponding reserves.
BF idea. All accident years i ∈ {1, . . . , I} behave similarly and payments approximately behave as
$$X_{i,j} \approx \gamma_j\, \hat{\mu}_i, \qquad (9.5)$$
for given prior information μ̂_i and a given development pattern (γ_j)_{j=0,...,J-1} with
normalization ∑_{j=0}^{J-1} γ_j = 1.
The prior value μ̂_i should reflect the total expected ultimate claim
E[C_{i,J-1}] = E[S_i] of accident year i. It is assumed that this
estimate is given externally by expert opinion which, in theory,
should not be based on D_I. There only remains the estimation of the
development pattern γ_j. In view of the CL method, one defines
the following estimates for the development pattern:
$$\hat{\gamma}_0^{CL} = \hat{\beta}_0^{CL}, \qquad \hat{\gamma}_j^{CL} = \hat{\beta}_j^{CL} - \hat{\beta}_{j-1}^{CL} \ \text{ for } j = 1, \ldots, J-2, \qquad \hat{\gamma}_{J-1}^{CL} = 1 - \hat{\beta}_{J-2}^{CL}.$$
Equipped with these estimators we predict the ultimate claim C_{i,J-1} for i > I − J + 1
in the BF method by
$$\hat{C}_{i,J-1}^{BF} = C_{i,I-i} + \hat{\mu}_i \sum_{j=I-i+1}^{J-1} \hat{\gamma}_j^{CL} = C_{i,I-i} + \hat{\mu}_i \left(1 - \hat{\beta}_{I-i}^{CL}\right). \qquad (9.6)$$
For the CL predictor, on the other hand, we have
$$\hat{C}_{i,J-1}^{CL} = C_{i,I-i} + C_{i,I-i} \left(1 - \prod_{j=I-i}^{J-2} \frac{1}{\hat{f}_j^{CL}}\right) \prod_{j=I-i}^{J-2} \hat{f}_j^{CL},$$
that is,
$$\hat{C}_{i,J-1}^{CL} = C_{i,I-i} + \left(1 - \hat{\beta}_{I-i}^{CL}\right) \hat{C}_{i,J-1}^{CL}, \qquad \hat{C}_{i,J-1}^{BF} = C_{i,I-i} + \left(1 - \hat{\beta}_{I-i}^{CL}\right) \hat{\mu}_i.$$
Thus, we see that we have the same structure. The only difference is that for
the BF method we use the external estimate μ̂_i for the ultimate claim, whereas
the CL method uses the observation based estimate Ĉ^CL_{i,J-1}. Therefore, we have two
complementary prediction positions, which exactly gives the explanation mentioned
in the introduction to Section 9.2. For further remarks (also detailed remarks on
the example in Tables 9.2-9.5) we refer to Wüthrich-Merz [87].
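Numerically, the BF reserves of the example follow directly from the figures already displayed. The sketch below (not part of the original notes) uses the identity β̂^CL_{I-i} = C_{i,I-i}/Ĉ^CL_{i,J-1}, equivalently 1 − β̂^CL_{I-i} = R̂_i^CL/Ĉ^CL_{i,J-1}, together with the prior estimates μ̂_i of Table 9.5 to evaluate (9.6):

```python
# BF reserves (9.6) from CL quantities (Table 9.4) and priors (Table 9.5),
# via R^BF_i = mu_i * (1 - beta^CL_{I-i}) = mu_i * R^CL_i / Chat^CL_{i,J-1}.
prior_mu    = {2: 11367306, 3: 10962965, 4: 10616762,
               5: 11044881, 6: 11480700, 7: 11413572}
cl_ultimate = {2: 10663318, 3: 10662008, 4: 9758606,
               5: 9872218, 6: 10092247, 7: 9568143}
cl_reserve  = {2: 15126, 3: 26257, 4: 34538,
               5: 85302, 6: 156494, 7: 286121}

bf_reserve = {i: prior_mu[i] * cl_reserve[i] / cl_ultimate[i] for i in prior_mu}
```

For accident year i = 7 this yields roughly 341'306, matching the 341'305 of Table 9.5 up to rounding of the displayed figures; since all priors μ̂_i exceed the CL ultimates here, every BF reserve exceeds the corresponding CL reserve.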
Next we would like to quantify the uncertainty in these predictions, i.e. by how much the true payouts $\sum_{i+j>I} X_{i,j}$ may deviate from these
predictions, see also (1.8). This brings us back to the notion of risk measures of
Section 6.2.4. In claims reserving, the most popular risk measure is the conditional
mean square error of prediction (MSEP) because it can be calculated or
estimated explicitly in many examples. Assume X̂ is a D_I-measurable predictor
for the random variable X. The conditional MSEP is defined by
$$\mathrm{msep}_{X|\mathcal{D}_I}\big(\hat{X}\big) = \mathbb{E}\left[\big(X - \hat{X}\big)^2 \,\Big|\, \mathcal{D}_I\right]. \qquad (9.7)$$
It allows for the decomposition
$$\mathrm{msep}_{X|\mathcal{D}_I}\big(\hat{X}\big) = \mathrm{Var}\left(X|\mathcal{D}_I\right) + \big(\mathbb{E}[X|\mathcal{D}_I] - \hat{X}\big)^2. \qquad (9.8)$$
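For completeness, the decomposition (9.8) follows from (9.7) by adding and subtracting the conditional mean:

```latex
\begin{align*}
\mathrm{msep}_{X|\mathcal{D}_I}\big(\hat{X}\big)
  &= \mathbb{E}\Big[\big( (X - \mathbb{E}[X|\mathcal{D}_I])
     + (\mathbb{E}[X|\mathcal{D}_I] - \hat{X}) \big)^2 \,\Big|\, \mathcal{D}_I\Big] \\
  &= \mathrm{Var}\left(X|\mathcal{D}_I\right)
     + \big(\mathbb{E}[X|\mathcal{D}_I] - \hat{X}\big)^2,
\end{align*}
```

where the cross term vanishes because $\mathbb{E}[X|\mathcal{D}_I]-\hat{X}$ is $\mathcal{D}_I$-measurable and $X-\mathbb{E}[X|\mathcal{D}_I]$ has vanishing conditional expectation.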
If all parameters are known and if we can calculate E[X|D_I], then we should set
X̂ = E[X|D_I] because this minimizes the conditional MSEP, see (9.8). In any
other case we try to estimate E[X|D_I] as accurately as possible, and then we try
to determine the possible sources of parameter uncertainty in this estimation. In
order to analyze this prediction uncertainty we need to put the claims reserving
algorithms into a stochastic framework.
For the CL method there are different stochastic models that provide the CL reserves as predictors:
• the distribution-free CL model of Thomas Mack [64].
A key contribution of Mack [64] was the derivation of an estimate for the estimation error term. In the present
text we consider the gamma-gamma Bayesian CL model in detail. This model
belongs to the family of Bayesian CL models for which the conditional MSEP can
be calculated explicitly. Mack's formula will drop out as an approximation to the
full Bayesian formula in the non-informative prior case of this model.
Some of these models also use estimates of γj different from the ones previously
suggested. In the present text we are not going to consider stochastic models for
the BF method.
Model Assumptions 9.1 (gamma-gamma Bayesian CL model).
(a) Conditionally, given Θ = (Θ_0, . . . , Θ_{J-2}), the (C_{i,j})_{j=0,...,J-1} are independent (in
i) Markov processes (in j) with conditional distributions
$$F_{i,j+1} = \frac{C_{i,j+1}}{C_{i,j}} \,\bigg|\, \{C_{i,j}, \Theta\} \;\sim\; \Gamma\left(C_{i,j}\,\sigma_j^{-2},\; \Theta_j\, C_{i,j}\,\sigma_j^{-2}\right).$$
(b) The Θ_j are independent and Γ(γ_j, f_j(γ_j − 1))-distributed with given prior constants
f_j > 0, γ_j > 1.
Here, g(C_{1,0}, . . . , C_{I,0}) denotes the density of the first column j = 0. This allows us to apply
Bayes' rule, which provides for the posterior of Θ, conditionally given D_I,
$$h(\theta|\mathcal{D}_I) \;\propto\; \prod_{j=0}^{J-2} \theta_j^{\,\gamma_j + \sum_{i=1}^{I-j-1} C_{i,j}/\sigma_j^2 - 1}\; \exp\left\{-\theta_j \left(f_j(\gamma_j-1) + \sum_{i=1}^{I-j-1} C_{i,j+1}/\sigma_j^2\right)\right\}.$$
Lemma 9.2. Under Model Assumptions 9.1, the posteriors of Θ_0, . . . , Θ_{J-2} are,
conditionally given D_I, independent with
$$\Theta_j \,|\, \mathcal{D}_I \;\sim\; \Gamma\left(\gamma_j + \sum_{i=1}^{I-j-1} \frac{C_{i,j}}{\sigma_j^2},\;\; f_j(\gamma_j-1) + \sum_{i=1}^{I-j-1} \frac{C_{i,j+1}}{\sigma_j^2}\right).$$
Corollary 9.3. Under Model Assumptions 9.1, the posterior Bayesian CL factors
are given by
$$\hat{f}_j^{BCL} = \mathbb{E}\left[\Theta_j^{-1}\,\big|\,\mathcal{D}_I\right] = \alpha_j \hat{f}_j^{CL} + (1-\alpha_j)\, f_j,$$
with credibility weights
$$\alpha_j = \frac{\sum_{i=1}^{I-j-1} C_{i,j}}{\sum_{i=1}^{I-j-1} C_{i,j} + \sigma_j^2(\gamma_j-1)} \;\in\; (0,1). \qquad (9.9)$$
Proof. The proof is a straightforward application of the gamma distributional properties, namely
$$\mathbb{E}\left[\Theta_j^{-1}\,\big|\,\mathcal{D}_I\right] = \frac{1}{\gamma_j - 1 + \sum_{i=1}^{I-j-1} C_{i,j}/\sigma_j^2}\left(f_j(\gamma_j-1) + \sum_{i=1}^{I-j-1} \frac{C_{i,j+1}}{\sigma_j^2}\right)$$
$$= \frac{\gamma_j - 1}{\gamma_j - 1 + \sum_{i=1}^{I-j-1} C_{i,j}/\sigma_j^2}\; f_j \;+\; \frac{\sum_{i=1}^{I-j-1} C_{i,j+1}/\sigma_j^2}{\gamma_j - 1 + \sum_{i=1}^{I-j-1} C_{i,j}/\sigma_j^2}. \qquad \Box$$
Remarks 9.4.
• Corollary 9.3 is the key for the derivation of the CL reserves. The result says
that the CL factors should be estimated by a credibility weighted average between
the classical CL estimate f̂_j^{CL} and the prior estimate f_j with credibility
weight α_j ∈ (0, 1).
• Observe that the individual development factors F_{i,j+1} satisfy the Bühlmann-Straub
model, see Model 8.13: conditionally given Θ_j and C_{1,j}, . . . , C_{I,j}, the
F_{i,j+1} are independent with
$$\mathbb{E}\left[F_{i,j+1}|\Theta_j, C_{i,j}\right] = \Theta_j^{-1} = \mu(\Theta_j) \qquad \text{and} \qquad \mathrm{Var}\left(F_{i,j+1}|\Theta_j, C_{i,j}\right) = \frac{\sigma_j^2\, \Theta_j^{-2}}{C_{i,j}}.$$
Thus, C_{i,j} plays the role of the volume measure and σ_j^2(Θ) = σ_j^2 Θ_j^{-2} plays
the role of the variance function. We calculate, see (8.4) and (8.5),
$$\tau_j^2 = \mathrm{Var}(\mu(\Theta_j)) = \frac{f_j^2}{\gamma_j - 2}, \qquad \tilde{\sigma}_j^2 = \mathbb{E}\left[\sigma_j^2 \Theta_j^{-2}\right] = \sigma_j^2\, f_j^2\, \frac{\gamma_j - 1}{\gamma_j - 2},$$
and hence
$$\kappa_j = \frac{\tilde{\sigma}_j^2}{\tau_j^2} = \sigma_j^2\, (\gamma_j - 1).$$
Therefore, we obtain the classical structure for the credibility weights, see
Theorem 8.17 and (9.9),
$$\alpha_j = \frac{\sum_{i=1}^{I-j-1} C_{i,j}}{\sum_{i=1}^{I-j-1} C_{i,j} + \kappa_j}.$$
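The credibility structure (9.9) is easy to explore numerically. The following sketch (all inputs are hypothetical illustration values, not taken from the example) shows that for γ_j → 1 the prior becomes non-informative (α_j → 1 and f̂_j^{BCL} → f̂_j^{CL}), while a large γ_j pulls the Bayesian CL factor towards the prior f_j:

```python
# Bayesian CL factor as credibility mixture, Corollary 9.3 / (9.9).
def bayes_cl_factor(f_cl, col_sum, sigma2, gamma, f_prior):
    """Return (alpha_j, f_hat^BCL_j) for one development period j."""
    alpha = col_sum / (col_sum + sigma2 * (gamma - 1.0))
    return alpha, alpha * f_cl + (1.0 - alpha) * f_prior

# hypothetical inputs: observed CL factor, column volume, sigma_j^2, prior f_j
f_cl, col_sum, sigma2, f_prior = 1.49, 5.0e7, 8.0e3, 1.60

# nearly non-informative prior: gamma close to 1
a_non, f_non = bayes_cl_factor(f_cl, col_sum, sigma2, gamma=1.0001, f_prior=f_prior)

# very informative prior: huge gamma pulls the estimate towards f_prior
a_inf, f_inf = bayes_cl_factor(f_cl, col_sum, sigma2, gamma=1.0e6, f_prior=f_prior)
```

In the first case the data dominates, in the second the prior does; this is exactly the non-informative limit used in Remark 9.6 below.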
Theorem 9.5. Under Model Assumptions 9.1, the Bayesian CL predictor for the ultimate claim C_{i,J-1} of accident year i > I − J + 1 is given by
$$\hat{C}_{i,J-1}^{BCL} = \mathbb{E}\left[C_{i,J-1}\,\big|\,\mathcal{D}_I\right] = C_{i,I-i} \prod_{j=I-i}^{J-2} \hat{f}_j^{BCL}.$$
Proof. We use the conditional independence between different accident years, the conditional
Markov property and the tower property to obtain
$$\hat{C}_{i,J-1}^{BCL} = \mathbb{E}\left[\,\mathbb{E}\left[C_{i,J-1}|C_{i,0},\ldots,C_{i,I-i},\Theta\right]\,\big|\,\mathcal{D}_I\right] = C_{i,I-i}\; \mathbb{E}\left[\prod_{j=I-i}^{J-2} \Theta_j^{-1}\,\bigg|\,\mathcal{D}_I\right].$$
Using the posterior independence of Lemma 9.2 and Corollary 9.3 proves the claim. □
Remark 9.6. Theorem 9.5 explains that our Model Assumptions 9.1 give the CL
reserves if we let the prior distributions of Θ_j^{-1} become non-informative, i.e. for
γ_j → 1, j = I − i, . . . , J − 2, we have
$$\hat{C}_{i,J-1}^{BCL} \;\to\; \hat{C}_{i,J-1}^{CL}.$$
For this reason we can use the gamma-gamma Bayesian CL model as one example
that replicates the CL algorithm (9.4) in the non-informative limit. This analogy
allows us to study prediction uncertainty within Model Assumptions 9.1.
This shows the optimality of the Bayesian CL predictor within our model, and there
remains the calculation of the conditional variance of the ultimate claim.
Theorem 9.7. Under Model Assumptions 9.1 the Bayesian CL predictor satisfies
$$\mathrm{msep}_{C_{i,J-1}|\mathcal{D}_I}\left(\hat{C}_{i,J-1}^{BCL}\right) = \hat{C}_{i,J-1}^{BCL}\, \Gamma_{I-i} + \left(\hat{C}_{i,J-1}^{BCL}\right)^2 \Delta_{I-i},$$
and for aggregated accident years
$$\mathrm{msep}_{\sum_i C_{i,J-1}|\mathcal{D}_I}\left(\sum_i \hat{C}_{i,J-1}^{BCL}\right) = \sum_i \mathrm{msep}_{C_{i,J-1}|\mathcal{D}_I}\left(\hat{C}_{i,J-1}^{BCL}\right) + 2 \sum_{i<l} \hat{C}_{i,J-1}^{BCL}\, \hat{C}_{l,J-1}^{BCL}\, \Delta_{I-i},$$
where we define
$$\Gamma_k = \sum_{j=k}^{J-2} \sigma_j^2 \prod_{n=j}^{J-2} \hat{f}_n^{BCL}\; \frac{\sigma_n^2(\gamma_n-1) + \sum_{i=1}^{I-n-1} C_{i,n}}{\sigma_n^2(\gamma_n-2) + \sum_{i=1}^{I-n-1} C_{i,n}},$$
$$\Delta_k = \prod_{j=k}^{J-2} \frac{\sigma_j^2(\gamma_j-1) + \sum_{i=1}^{I-j-1} C_{i,j}}{\sigma_j^2(\gamma_j-2) + \sum_{i=1}^{I-j-1} C_{i,j}} \;-\; 1.$$
Proof. We first decouple accident years:
$$\mathrm{msep}_{\sum_i C_{i,J-1}|\mathcal{D}_I}\left(\sum_i \hat{C}_{i,J-1}^{BCL}\right) = \mathrm{Var}\left(\sum_i C_{i,J-1}\,\bigg|\,\mathcal{D}_I\right) = \sum_{i,l} \mathrm{Cov}\left(C_{i,J-1}, C_{l,J-1}|\mathcal{D}_I\right).$$
We calculate these covariance terms. Applying the tower property for conditional expectations
implies
$$\mathrm{Cov}\left(C_{i,J-1}, C_{l,J-1}|\mathcal{D}_I\right) = \mathbb{E}\left[\mathrm{Cov}\left(C_{i,J-1}, C_{l,J-1}|\mathcal{D}_I,\Theta\right)\big|\,\mathcal{D}_I\right] + \mathrm{Cov}\left(\mathbb{E}\left[C_{i,J-1}|\mathcal{D}_I,\Theta\right],\, \mathbb{E}\left[C_{l,J-1}|\mathcal{D}_I,\Theta\right]\big|\,\mathcal{D}_I\right). \qquad (9.12)$$
We start with the first term on the right-hand side of (9.12). Observe that this term is zero for
i ≠ l, by the conditional independence of the accident years given Θ. For i = l we use the conditional Markov property:
$$\mathrm{Var}\left(C_{i,J-1}|\mathcal{D}_I,\Theta\right) = \mathbb{E}\left[\mathrm{Var}\left(C_{i,J-1}|C_{i,J-2},\Theta\right)\big|\,\mathcal{D}_I,\Theta\right] + \mathrm{Var}\left(\mathbb{E}\left[C_{i,J-1}|C_{i,J-2},\Theta\right]\big|\,\mathcal{D}_I,\Theta\right)$$
$$= \mathbb{E}\left[C_{i,J-2}\,\sigma_{J-2}^2\,\Theta_{J-2}^{-2}\,\big|\,\mathcal{D}_I,\Theta\right] + \mathrm{Var}\left(C_{i,J-2}\,\Theta_{J-2}^{-1}\,\big|\,\mathcal{D}_I,\Theta\right)$$
$$= C_{i,I-i} \prod_{j=I-i}^{J-3} \Theta_j^{-1}\;\; \sigma_{J-2}^2\, \Theta_{J-2}^{-2} \;+\; \Theta_{J-2}^{-2}\; \mathrm{Var}\left(C_{i,J-2}|\mathcal{D}_I,\Theta\right).$$
Hence, we obtain the well-known recursive formula for the process variance in the CL method
(see Section 3.2.2 in Wüthrich-Merz [87]). By iterating the recursion we find for given Θ (see
also Lemma 3.6 in Wüthrich-Merz [87])
$$\mathrm{Var}\left(C_{i,J-1}|\mathcal{D}_I,\Theta\right) = C_{i,I-i} \sum_{j=I-i}^{J-2} \left(\prod_{m=I-i}^{j-1} \Theta_m^{-1}\right) \sigma_j^2\, \Theta_j^{-2} \prod_{n=j+1}^{J-2} \Theta_n^{-2}. \qquad (9.13)$$
Applying the operator E[·|D_I] to (9.13) and using the posterior independence of the random
variables Θ_j we obtain
$$\mathbb{E}\left[\mathrm{Var}\left(C_{i,J-1}|\mathcal{D}_I,\Theta\right)\big|\,\mathcal{D}_I\right] = C_{i,I-i} \sum_{j=I-i}^{J-2} \prod_{m=I-i}^{j-1} \hat{f}_m^{BCL}\;\; \sigma_j^2 \prod_{n=j}^{J-2} \mathbb{E}\left[\Theta_n^{-2}|\mathcal{D}_I\right]$$
$$= C_{i,I-i} \sum_{j=I-i}^{J-2} \prod_{m=I-i}^{j-1} \hat{f}_m^{BCL}\;\; \sigma_j^2 \prod_{n=j}^{J-2} \left(\hat{f}_n^{BCL}\right)^2 \frac{\gamma_n - 1 + \sum_{i=1}^{I-n-1} C_{i,n}/\sigma_n^2}{\gamma_n - 2 + \sum_{i=1}^{I-n-1} C_{i,n}/\sigma_n^2}$$
$$= \hat{C}_{i,J-1}^{BCL} \sum_{j=I-i}^{J-2} \sigma_j^2 \prod_{n=j}^{J-2} \hat{f}_n^{BCL}\; \frac{\gamma_n - 1 + \sum_{i=1}^{I-n-1} C_{i,n}/\sigma_n^2}{\gamma_n - 2 + \sum_{i=1}^{I-n-1} C_{i,n}/\sigma_n^2} = \hat{C}_{i,J-1}^{BCL}\; \Gamma_{I-i}.$$
For the second term in (9.12) we have, w.l.o.g. assuming i ≤ l,
$$\mathrm{Cov}\left(\mathbb{E}\left[C_{i,J-1}|\mathcal{D}_I,\Theta\right],\, \mathbb{E}\left[C_{l,J-1}|\mathcal{D}_I,\Theta\right]\big|\,\mathcal{D}_I\right) = C_{i,I-i}\, C_{l,I-l}\; \mathrm{Cov}\left(\prod_{j=I-i}^{J-2} \Theta_j^{-1},\; \prod_{j=I-l}^{J-2} \Theta_j^{-1}\,\Bigg|\,\mathcal{D}_I\right)$$
$$= C_{i,I-i}\, C_{l,I-l} \left(\prod_{j=I-l}^{I-i-1} \mathbb{E}\left[\Theta_j^{-1}|\mathcal{D}_I\right] \prod_{j=I-i}^{J-2} \mathbb{E}\left[\Theta_j^{-2}|\mathcal{D}_I\right] - \prod_{j=I-i}^{J-2} \mathbb{E}\left[\Theta_j^{-1}|\mathcal{D}_I\right] \prod_{j=I-l}^{J-2} \mathbb{E}\left[\Theta_j^{-1}|\mathcal{D}_I\right]\right)$$
$$= \hat{C}_{i,J-1}^{BCL}\, \hat{C}_{l,J-1}^{BCL} \left(\prod_{j=I-i}^{J-2} \frac{\gamma_j - 1 + \sum_{i=1}^{I-j-1} C_{i,j}/\sigma_j^2}{\gamma_j - 2 + \sum_{i=1}^{I-j-1} C_{i,j}/\sigma_j^2} \;-\; 1\right) = \hat{C}_{i,J-1}^{BCL}\, \hat{C}_{l,J-1}^{BCL}\; \Delta_{I-i}. \qquad \Box$$
Under the assumption that $\sigma_j^2 \ll \sum_{i=1}^{I-j-1} C_{i,j}$ we obtain for γ_j → 1 (we also use a
first order Taylor expansion for Δ_k, see also (9.25) below)
$$\Gamma_k \;\approx\; \sum_{j=k}^{J-2} \sigma_j^2 \prod_{n=j}^{J-2} \hat{f}_n^{CL} \;\stackrel{\text{def.}}{=}\; \tilde{\Gamma}_k, \qquad (9.14)$$
$$\Delta_k \;\approx\; \sum_{j=k}^{J-2} \frac{\sigma_j^2}{\sum_{i=1}^{I-j-1} C_{i,j} - \sigma_j^2} \;\approx\; \sum_{j=k}^{J-2} \frac{\sigma_j^2}{\sum_{i=1}^{I-j-1} C_{i,j}} \;\stackrel{\text{def.}}{=}\; \tilde{\Delta}_k. \qquad (9.15)$$
This motivates for the non-informative prior case γ_j → 1 the approximation
$$\mathrm{msep}_{C_{i,J-1}|\mathcal{D}_I}\left(\hat{C}_{i,J-1}^{CL}\right) = \left(\hat{C}_{i,J-1}^{CL}\right)^2 \sum_{j=I-i}^{J-2} \frac{s_j^2}{(\hat{f}_j^{CL})^2} \left(\frac{1}{\hat{C}_{i,j}^{CL}} + \frac{1}{\sum_{l=1}^{I-j-1} C_{l,j}}\right), \qquad (9.16)$$
for s_j^2 = σ_j^2 (f̂_j^{CL})^2. This is the famous Mack formula [64] that gave the first rigorous
derivation of an estimate for the conditional MSEP in the CL model (note that
Mack [64] uses s_j^2 as variance parameter, whereas in the Bayesian model we use
σ_j^2 Θ_j^{-2}, see (9.11)).
If we estimate Θ_j^{-2} by (f̂_j^{CL})^2 then we can find estimates σ̂_j^2 = ŝ_j^2/(f̂_j^{CL})^2 once we have
estimated s_j^2. The estimation of the latter is done rather ad hoc by the classical
estimates, see Lemma 3.5 in Wüthrich-Merz [87],
$$\hat{s}_j^2 = \frac{1}{I-j-2} \sum_{i=1}^{I-j-1} C_{i,j} \left(\frac{C_{i,j+1}}{C_{i,j}} - \hat{f}_j^{CL}\right)^2. \qquad (9.17)$$
j       0       1      2      3      4     5     6     7     8
ŝ_j   135.25  33.80  15.76  19.85  9.34  2.00  0.82  0.22  0.06
σ̂_j    90.62  31.36  15.41  19.56  9.27  1.99  0.82  0.22  0.06
These parameters provide the results for the square-rooted conditional MSEPs
given in Table 9.7. We observe that for the total claims reserves the 1 standard
deviation confidence bounds are about 7.7% of the total claims reserves. We also
observe that the full formula given by Theorem 9.7 with non-informative priors and
Mack’s formula (9.16) are very close, i.e. 462’967 versus 462’960. This observation
holds true for many typical non-life insurance data sets and it justifies the use of
the simpler formula.
i                R̂_i^CL     msep^{1/2} (Thm. 9.7)   msep^{1/2} (Mack (9.16))   in % of reserves
7                 449'167     85'399                   85'398                     19.0%
8               1'043'242    134'337                  134'337                     12.9%
9               3'950'815    410'824                  410'817                     10.4%
covariance^{1/2}             116'811                  116'810
total           6'047'061    462'967                  462'960                      7.7%

Table 9.7: Claims reserves and prediction uncertainty in the non-informative priors
gamma-gamma Bayesian CL model and Mack's formula (9.16).
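The figures of Table 9.7 can be reproduced with a short script. The sketch below (not part of the original notes) re-estimates everything from the payment data of Table 9.3 (entries treated as incremental payments), uses (9.17) for ŝ_j^2 — with the usual ad-hoc extrapolation for the last development period, which is not identifiable from (9.17) — and evaluates Mack's formula (9.16) per accident year plus the cross-covariance terms via approximation (9.15) of Theorem 9.7:

```python
from math import sqrt

# incremental payments of Table 9.3
X = [
    [5946975, 3721237, 895717, 207760, 206704, 62124, 65813, 14850, 11130, 15813],
    [6346756, 3246406, 723222, 151797,  67824, 36603, 52752, 11186, 11646],
    [6269090, 2976223, 847053, 262768, 152703, 65444, 53545,  8924],
    [5863015, 2683224, 722532, 190653, 132976, 88340, 43329],
    [5778885, 2745229, 653894, 273395, 230288, 105224],
    [6184793, 2828338, 572765, 244899, 104957],
    [5600184, 2893207, 563114, 225517],
    [5288066, 2440103, 528043],
    [5290793, 2357936],
    [5675568],
]
C = [[sum(r[:j + 1]) for j in range(len(r))] for r in X]
I, J = len(C), len(C[0])
f = [sum(C[i][j + 1] for i in range(I - j - 1))
     / sum(C[i][j] for i in range(I - j - 1)) for j in range(J - 1)]

# full CL rectangle Chat_{i,j}
Chat = [row[:] for row in C]
for row in Chat:
    for j in range(len(row) - 1, J - 1):
        row.append(row[-1] * f[j])

# s_j^2 via (9.17); the last period via the usual ad-hoc extrapolation
s2 = [sum(C[i][j] * (C[i][j + 1] / C[i][j] - f[j]) ** 2
          for i in range(I - j - 1)) / (I - j - 2) for j in range(J - 2)]
s2.append(min(s2[-1] ** 2 / s2[-2], s2[-2], s2[-1]))

sig2 = [s2[j] / f[j] ** 2 for j in range(J - 1)]          # sigma_j^2
colsum = [sum(C[i][j] for i in range(I - j - 1)) for j in range(J - 1)]

# Mack's formula (9.16) per accident year (0-based row index a)
msep = [Chat[a][-1] ** 2
        * sum(sig2[j] * (1 / Chat[a][j] + 1 / colsum[j])
              for j in range(I - a - 1, J - 1)) for a in range(I)]

# cross-covariance terms of Theorem 9.7 with approximation (9.15)
delta = [sum(sig2[j] / colsum[j] for j in range(k, J - 1)) for k in range(J)]
cross = 2 * sum(Chat[a][-1] * Chat[b][-1] * delta[I - a - 1]
                for a in range(I) for b in range(a + 1, I))

total_msep_half = sqrt(sum(msep) + cross)   # compare with Table 9.7
```

This reproduces the msep^{1/2} column of Mack's formula in Table 9.7 up to rounding of the displayed figures.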
Observe that
$$\mathbb{E}[X_{i,j}] = \mu_i\, \gamma_j, \qquad \mathrm{Var}(X_{i,j}) = \varphi\, \mu_i\, \gamma_j.$$
We have a cross-classified mean with μ_i modeling the exposure
of accident year i and γ_j the development pattern of the payout
delay j. In order to make the parameters μ_i and γ_j uniquely
identifiable we need a side constraint. The two commonly used
side constraints are either
$$\mu_1 = 1 \qquad \text{or} \qquad \sum_{j=0}^{J-1} \gamma_j = 1.$$
The first option is more convenient in the application of GLM methods; the second
option gives an explicit meaning to the pattern (γ_j)_j, namely, that it corresponds
to the cash flow pattern.
The best-estimate reserves at time I are given by
$$\mathcal{R} = \sum_{i+j>I} \mathbb{E}\left[X_{i,j}|\mathcal{D}_I\right] = \sum_{i+j>I} \mu_i\, \gamma_j.$$
Hence, we need to estimate the parameters μ_i and γ_j. This is done with MLE
methods. We assume J = I which simplifies notation. Having observations D_I
allows us to estimate the parameters. The log-likelihood function for μ = (μ_1, . . . , μ_I),
γ = (γ_0, . . . , γ_{J-1}) and φ is given by
$$\ell_{\mathcal{D}_I}(\mu, \gamma, \varphi) = \sum_{(i,j)\in \mathcal{I}_I} \left[-\mu_i\gamma_j/\varphi + (X_{i,j}/\varphi)\log(\mu_i\gamma_j/\varphi) - \log\left((X_{i,j}/\varphi)!\right)\right].$$
Calculating the derivatives w.r.t. μ and γ and setting them equal to zero implies
that we need to solve the following system of equations to find the MLEs:
$$\gamma_j \sum_{i=1}^{I-j} \mu_i = \sum_{i=1}^{I-j} X_{i,j} \qquad \text{for all } j = 0, \ldots, J-1, \qquad (9.19)$$
$$\mu_i \sum_{j=0}^{I-i} \gamma_j = \sum_{j=0}^{I-i} X_{i,j} \qquad \text{for all } i = 1, \ldots, I. \qquad (9.20)$$
The solution of this system of equations provides the CL reserves. Moreover, the constant dispersion parameter φ cancels and is not
relevant for estimating the reserves.
Theorem 9.10. Under Model Assumptions 9.9, the MLEs for μ and γ, given D_I,
are given by
$$\hat{\mu}_i^{MLE} = \hat{C}_{i,J-1}^{CL} \qquad \text{and} \qquad \hat{\gamma}_j^{MLE} = \left(\prod_{k=j}^{J-2} \frac{1}{\hat{f}_k^{CL}}\right)\left(1 - \frac{1}{\hat{f}_{j-1}^{CL}}\right),$$
for i = 1, . . . , I and j = 1, . . . , J − 1. Moreover, $\hat{\gamma}_0^{MLE} = \prod_{k=0}^{J-2} 1/\hat{f}_k^{CL}$. For the
estimated reserves we have
$$\hat{R}_i^{ODP} = \hat{\mu}_i^{MLE} \sum_{j=I-i+1}^{J-1} \hat{\gamma}_j^{MLE} = \hat{R}_i^{CL}.$$
Proof. For the proof we refer to Lemma 2.16, Corollary 2.18 and Remarks 2.19 in Wüthrich-Merz
[87]. Basically, the proof goes by induction along the last observed diagonal in DI . 2
Remarks 9.11.
• Theorem 9.10 goes back to Hachemeister-Stanard [52], Kremer [60] and Mack
[63].
• Theorem 9.10 explains the popularity of the ODP model for claims reserving,
because it provides the CL reserves. Thus, we have found a second stochastic
model that can be used to explain the CL algorithm from a stochastic point
of view.
• In this model we can also give an estimate for the conditional MSEP. This
uses that MLEs are approximated by standard Gaussian asymptotic results
for GLM. For details we refer to England-Verrall [39] and Wüthrich-Merz [87],
Section 6.4.3.
• The ODP framework also allows to give an estimate for the conditional MSEP
in the BF method, and it justifies the choice $\hat{\beta}_j^{CL} = \sum_{k=0}^{j} \hat{\gamma}_k^{MLE}$. For details
we refer to Alai et al. [3, 4].
In the run-off situation the flow of information (9.1) is changed to (we do a slight
abuse of notation)
$$\mathcal{D}_t = \left\{X_{i,j};\; i+j \le t,\; 1 \le i \le I,\; 0 \le j \le J-1\right\},$$
i.e. this generates a filtration denoted by (D_t)_{t≥0} on (Ω, F, P) that describes the
flow of information (we set D_t = σ(D_t)). At time t ≥ I the ultimate claim of
accident year i > t − J + 1 is predicted by
$$\hat{C}_{i,J-1}^{(t)} = \mathbb{E}\left[C_{i,J-1}|\mathcal{D}_t\right]. \qquad (9.21)$$
This is the predictor that minimizes the conditional MSEP at time t. The best-estimate
reserves at time t for accident year i > t − J + 1 are provided by
$$\mathcal{R}_i^{(t)} = \hat{C}_{i,J-1}^{(t)} - C_{i,t-i}. \qquad (9.22)$$
The claims development result (CDR) of accounting year (t, t+1] is then given by
$$\mathrm{CDR}_{i,t+1} = \mathcal{R}_i^{(t)} - \left(X_{i,t-i+1} + \mathcal{R}_i^{(t+1)}\right) = \hat{C}_{i,J-1}^{(t)} - \hat{C}_{i,J-1}^{(t+1)}. \qquad (9.23)$$
This is exactly the classical earning statement view in order to
understand the risk that derives from the development of the
outstanding loss liabilities.
The tower property immediately gives the following crucial statement:
Corollary 9.12. Assume C_{i,J-1} has finite first moment. Then we have
$$\mathbb{E}\left[\mathrm{CDR}_{i,t+1}|\mathcal{D}_t\right] = 0.$$
This corollary explains that on average we expect neither losses nor gains, i.e.
the prediction is unbiased. Note that (9.21) defines a martingale in t, and
recall that square-integrable martingales have uncorrelated innovations (claims
development results). Our aim is to study the uncertainty in this position measured
by the conditional MSEP. For simplicity we set t = I,
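Corollary 9.12 can be made concrete in a two-step toy model (all numbers hypothetical): with independent development factors F_1, F_2 with known discrete distributions, the time-t predictor is C_0 E[F_1]E[F_2], one period later it is C_0 F_1 E[F_2], and enumerating the outcomes of F_1 shows that the expected claims development result is exactly zero:

```python
# toy illustration of E[CDR | D_t] = 0 (Corollary 9.12); all values hypothetical
C0 = 1000.0
F1 = {1.4: 0.5, 1.6: 0.5}   # first-period development factor (observed next year)
F2 = {1.0: 0.7, 1.1: 0.3}   # second-period development factor (still unknown)

E_F1 = sum(f * p for f, p in F1.items())
E_F2 = sum(f * p for f, p in F2.items())

pred_now = C0 * E_F1 * E_F2   # predictor \hat C^{(t)} at time t

# CDR_{t+1} = \hat C^{(t)} - \hat C^{(t+1)}, with \hat C^{(t+1)} = C0 * F1 * E[F2]
expected_cdr = sum(p * (pred_now - C0 * f1 * E_F2) for f1, p in F1.items())
```

The expected CDR vanishes by the tower property, even though each individual realization of the CDR is nonzero.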
$$\mathrm{msep}_{\mathrm{CDR}_{i,I+1}|\mathcal{D}_I}(0) = \mathbb{E}\left[\left(\mathrm{CDR}_{i,I+1} - 0\right)^2\,\big|\,\mathcal{D}_I\right] = \mathrm{Var}\left(\mathrm{CDR}_{i,I+1}|\mathcal{D}_I\right) = \mathrm{Var}\left(\hat{C}_{i,J-1}^{(I+1)}\,\Big|\,\mathcal{D}_I\right). \qquad (9.24)$$
Thus, we need to study the volatility of the one-period update. We do this in the
gamma-gamma Bayes CL Model 9.1. Of course, Lemma 9.2 easily extends to the
D_t-posteriors, with posterior expected Bayesian CL factors given by
$$\hat{f}_j^{(t)} = \mathbb{E}\left[\Theta_j^{-1}|\mathcal{D}_t\right].$$
Next we exploit the recursive structure of credibility estimators, see for instance
Corollary 8.6. These hold true in quite some generality; for the current exposition
we restrict to t = I, I + 1 because these are the only indexes of interest for the
analysis of (9.24). For t = I + 1 and j ≥ 0 we have
$$\hat{f}_j^{(I+1)} = \mathbb{E}\left[\Theta_j^{-1}|\mathcal{D}_{I+1}\right] = \frac{f_j(\gamma_j-1) + \sum_{i=1}^{I-j} C_{i,j+1}/\sigma_j^2}{\gamma_j - 1 + \sum_{i=1}^{I-j} C_{i,j}/\sigma_j^2}$$
$$= \frac{C_{I-j,j+1}/\sigma_j^2}{\gamma_j - 1 + \sum_{i=1}^{I-j} C_{i,j}/\sigma_j^2} + \frac{f_j(\gamma_j-1) + \sum_{i=1}^{I-j-1} C_{i,j+1}/\sigma_j^2}{\gamma_j - 1 + \sum_{i=1}^{I-j} C_{i,j}/\sigma_j^2}$$
$$= \beta_j^{(I+1)}\, \frac{C_{I-j,j+1}}{C_{I-j,j}} + \left(1 - \beta_j^{(I+1)}\right) \hat{f}_j^{(I)},$$
with D_I-measurable credibility weight
$$\beta_j^{(I+1)} = \frac{C_{I-j,j}}{\sigma_j^2(\gamma_j-1) + \sum_{i=1}^{I-j} C_{i,j}} \;\in\; [0,1].$$
The important observation is that, conditionally given D_I, there is only one random term in f̂_j^{(I+1)}, namely C_{I-j,j+1}. This is crucial in the calculation of the conditional MSEP of the
claims development result prediction. We start with a lemma.
Lemma 9.14. Under Model Assumptions 9.1 we have for I − i + 1 ≤ J − 1
$$\mathrm{Var}\left(C_{i,I-i+1}|\mathcal{D}_I\right) = C_{i,I-i}\, \sigma_{I-i}^2 \left(\hat{f}_{I-i}^{(I)}\right)^2 \Upsilon_{I-i} + C_{i,I-i}^2 \left(\hat{f}_{I-i}^{(I)}\right)^2 \left(\Upsilon_{I-i} - 1\right),$$
where
$$\Upsilon_k = 1 + \frac{\sigma_k^2}{\sigma_k^2(\gamma_k - 2) + \sum_{l=1}^{I-k-1} C_{l,k}}.$$
Theorem 9.15. Under Model Assumptions 9.1 the Bayesian CL predictor satisfies
$$\mathrm{msep}_{\mathrm{CDR}_{i,I+1}|\mathcal{D}_I}(0) = \left(\hat{C}_{i,J-1}^{(I)}\right)^2 \left[\left(\Psi_{I-i} + 1\right) \prod_{j=I-i+1}^{J-2} \left(\left(\beta_j^{(I+1)}\right)^2 \Psi_j + 1\right) - 1\right],$$
and for aggregated accident years
$$\mathrm{msep}_{\sum_i \mathrm{CDR}_{i,I+1}|\mathcal{D}_I}(0) = \sum_i \mathrm{msep}_{\mathrm{CDR}_{i,I+1}|\mathcal{D}_I}(0) + 2\sum_{i<l} \hat{C}_{i,J-1}^{(I)}\, \hat{C}_{l,J-1}^{(I)} \left[\left(\beta_{I-i}^{(I+1)}\, \Psi_{I-i} + 1\right) \prod_{j=I-i+1}^{J-2}\left(\left(\beta_j^{(I+1)}\right)^2 \Psi_j + 1\right) - 1\right],$$
where
$$\Psi_{I-i} = \frac{\mathrm{Var}\left(C_{i,I-i+1}|\mathcal{D}_I\right)}{\left(\hat{f}_{I-i}^{(I)}\, C_{i,I-i}\right)^2} = \left(\frac{\sigma_{I-i}^2}{C_{i,I-i}} + 1\right) \Upsilon_{I-i} - 1.$$
Proof. The only random terms under the measure P(·|D_I) are C_{i,I-i+1}, C_{i-1,I-i+2}, . . . , C_{I-J+2,J-1}. All
these random variables belong to different accident years i and to different development periods j.
Therefore, they are independent given D_I; this follows from Model Assumptions 9.1 and Lemma
9.2. Moreover, we have unbiasedness in the sense (use the tower property)
$$\mathbb{E}\left[\beta_j^{(I+1)} \frac{C_{I-j,j+1}}{C_{I-j,j}} + \left(1 - \beta_j^{(I+1)}\right)\hat{f}_j^{(I)}\,\bigg|\,\mathcal{D}_I\right] = \mathbb{E}\left[\hat{f}_j^{(I+1)}\,\Big|\,\mathcal{D}_I\right] = \hat{f}_j^{(I)}.$$
In the first step we decouple the covariance as follows:
$$\mathrm{Cov}\left(\hat{C}_{i,J-1}^{(I+1)}, \hat{C}_{l,J-1}^{(I+1)}\,\Big|\,\mathcal{D}_I\right) = \mathbb{E}\left[\hat{C}_{i,J-1}^{(I+1)}\, \hat{C}_{l,J-1}^{(I+1)}\,\Big|\,\mathcal{D}_I\right] - \hat{C}_{i,J-1}^{(I)}\, \hat{C}_{l,J-1}^{(I)},$$
with
$$\mathbb{E}\left[\hat{C}_{i,J-1}^{(I+1)}\, \hat{C}_{l,J-1}^{(I+1)}\,\Big|\,\mathcal{D}_I\right] = \mathbb{E}\left[C_{i,I-i+1} \prod_{j=I-i+1}^{J-2} \hat{f}_j^{(I+1)}\;\; C_{l,I-l+1} \prod_{m=I-l+1}^{J-2} \hat{f}_m^{(I+1)}\,\Bigg|\,\mathcal{D}_I\right].$$
We first treat the case i = l. Using the conditional independence of the random terms given D_I we obtain
$$\mathbb{E}\left[\left(\hat{C}_{i,J-1}^{(I+1)}\right)^2\,\Big|\,\mathcal{D}_I\right] = \mathbb{E}\left[C_{i,I-i+1}^2\,\big|\,\mathcal{D}_I\right] \prod_{j=I-i+1}^{J-2} \mathbb{E}\left[\left(\hat{f}_j^{(I+1)}\right)^2\,\Big|\,\mathcal{D}_I\right],$$
where
$$\mathbb{E}\left[\left(\hat{f}_j^{(I+1)}\right)^2\,\Big|\,\mathcal{D}_I\right] = \mathbb{E}\left[\left(\beta_j^{(I+1)} \frac{C_{I-j,j+1}}{C_{I-j,j}} + \left(1-\beta_j^{(I+1)}\right)\hat{f}_j^{(I)}\right)^2\,\Bigg|\,\mathcal{D}_I\right]$$
$$= \mathrm{Var}\left(\beta_j^{(I+1)} \frac{C_{I-j,j+1}}{C_{I-j,j}} + \left(1-\beta_j^{(I+1)}\right)\hat{f}_j^{(I)}\,\Bigg|\,\mathcal{D}_I\right) + \left(\hat{f}_j^{(I)}\right)^2$$
$$= \frac{\left(\beta_j^{(I+1)}\right)^2}{C_{I-j,j}^2}\, \mathrm{Var}\left(C_{I-j,j+1}|\mathcal{D}_I\right) + \left(\hat{f}_j^{(I)}\right)^2 = \left(\left(\beta_j^{(I+1)}\right)^2 \Psi_j + 1\right)\left(\hat{f}_j^{(I)}\right)^2.$$
There remains the case of different accident years. W.l.o.g. we assume i < l, which implies
I − i + 1 > I − l + 1. This and the conditional independence given D_I imply for the covariance term
$$\mathbb{E}\left[\hat{C}_{i,J-1}^{(I+1)}\, \hat{C}_{l,J-1}^{(I+1)}\,\Big|\,\mathcal{D}_I\right] = \hat{C}_{i,J-1}^{(I)}\, \hat{C}_{l,J-1}^{(I)} \left(\beta_{I-i}^{(I+1)}\, \Psi_{I-i} + 1\right) \prod_{j=I-i+1}^{J-2} \left(\left(\beta_j^{(I+1)}\right)^2 \Psi_j + 1\right). \qquad \Box$$
Similar to (9.16) we consider in a first step the linear approximation
$$\mathrm{msep}_{\mathrm{CDR}_{i,I+1}|\mathcal{D}_I}(0) = \left(\hat{C}_{i,J-1}^{(I)}\right)^2 \left[\left(\Psi_{I-i}+1\right)\prod_{j=I-i+1}^{J-2}\left(\left(\beta_j^{(I+1)}\right)^2\Psi_j + 1\right) - 1\right] \approx \left(\hat{C}_{i,J-1}^{(I)}\right)^2 \left[\Psi_{I-i} + \sum_{j=I-i+1}^{J-2} \left(\beta_j^{(I+1)}\right)^2 \Psi_j\right]. \qquad (9.25)$$
Moreover, we define the D_I-measurable weight
$$\tilde{\beta}_j^{(I+1)} = \frac{C_{I-j,j}}{\sum_{l=1}^{I-j} C_{l,j}}. \qquad (9.26)$$
For the variance constants we consider the same limit γ_k → 1 which provides,
see also page 223 and in particular approximation (9.15),
$$\Upsilon_k = 1 + \frac{\sigma_k^2}{\sigma_k^2(\gamma_k-2) + \sum_{l=1}^{I-k-1} C_{l,k}} \;\to\; 1 + \frac{\sigma_k^2}{\sum_{l=1}^{I-k-1} C_{l,k} - \sigma_k^2} \;\approx\; 1 + \frac{\sigma_k^2}{\sum_{l=1}^{I-k-1} C_{l,k}}.$$
Applying this non-informative prior case and using again the linear approximation
we obtain
$$\Psi_{I-i} = \left(\frac{\sigma_{I-i}^2}{C_{i,I-i}} + 1\right)\Upsilon_{I-i} - 1 \;\approx\; \left(\frac{\sigma_{I-i}^2}{C_{i,I-i}} + 1\right)\left[1 + \frac{\sigma_{I-i}^2}{\sum_{l=1}^{i-1} C_{l,I-i}}\right] - 1 \;\approx\; \frac{\sigma_{I-i}^2}{C_{i,I-i}} + \frac{\sigma_{I-i}^2}{\sum_{l=1}^{i-1} C_{l,I-i}}.$$
The remaining terms are approximated in the non-informative prior case completely
similarly, i.e.
$$\left(\beta_j^{(I+1)}\right)^2 \Psi_j \;\approx\; \left(\frac{C_{I-j,j}}{\sum_{l=1}^{I-j} C_{l,j}}\right)^2 \left(\frac{\sigma_j^2}{C_{I-j,j}} + \frac{\sigma_j^2}{\sum_{l=1}^{I-j-1} C_{l,j}}\right) = \left(\frac{C_{I-j,j}}{\sum_{l=1}^{I-j} C_{l,j}}\right)^2 \left(\frac{\sum_{l=1}^{I-j-1} C_{l,j} + C_{I-j,j}}{C_{I-j,j}\, \sum_{l=1}^{I-j-1} C_{l,j}}\right)\sigma_j^2$$
$$= \frac{C_{I-j,j}}{\sum_{l=1}^{I-j} C_{l,j}}\; \frac{\sigma_j^2}{\sum_{l=1}^{I-j-1} C_{l,j}} = \tilde{\beta}_j^{(I+1)}\; \frac{\sigma_j^2}{\sum_{l=1}^{I-j-1} C_{l,j}}.$$
In analogy to (9.16) we obtain for the uncertainties in the claims development
result prediction in the non-informative prior case
$$\mathrm{msep}_{\mathrm{CDR}_{i,I+1}|\mathcal{D}_I}(0) = \left(\hat{C}_{i,J-1}^{CL}\right)^2 \left[\frac{s_{I-i}^2}{(\hat{f}_{I-i}^{CL})^2}\, \frac{1}{C_{i,I-i}} + \frac{s_{I-i}^2}{(\hat{f}_{I-i}^{CL})^2}\, \frac{1}{\sum_{l=1}^{i-1} C_{l,I-i}} + \sum_{j=I-i+1}^{J-2} \tilde{\beta}_j^{(I+1)}\, \frac{s_j^2}{(\hat{f}_j^{CL})^2}\, \frac{1}{\sum_{l=1}^{I-j-1} C_{l,j}}\right], \qquad (9.27)$$
for s_j^2 = σ_j^2 (f̂_j^{CL})^2 and β̃_j^{(I+1)} given by (9.26). This is the Merz-Wüthrich (MW)
formula, see (3.17) in [69]. We also refer to Bühlmann et al. [23]. Formula (9.16)
is often called total run-off uncertainty and formula (9.27) corresponds to the one-year
run-off uncertainty. Comparing these two formulas we observe that from the
total uncertainty the first term with index j = I − i also appears in the one-year
uncertainty. From the remaining terms j ≥ I − i + 1 of the summation
in (9.16) only the second terms survive. These second terms correspond to the
parameter estimation error and need to be scaled with β̃_j^{(I+1)} to obtain the one-year
uncertainty, which reflects the release of parameter uncertainty when new
information (a new diagonal in the claims development triangle) arrives.
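As for Mack's formula, formula (9.27) can be evaluated directly on the example data (Table 9.3, entries treated as incremental payments). The sketch below (not in the original notes) computes the per-accident-year one-year uncertainty, checks it against the corresponding total uncertainty of (9.16), and for the latest accident year reproduces the 385'773 of Table 9.8:

```python
from math import sqrt

X = [
    [5946975, 3721237, 895717, 207760, 206704, 62124, 65813, 14850, 11130, 15813],
    [6346756, 3246406, 723222, 151797,  67824, 36603, 52752, 11186, 11646],
    [6269090, 2976223, 847053, 262768, 152703, 65444, 53545,  8924],
    [5863015, 2683224, 722532, 190653, 132976, 88340, 43329],
    [5778885, 2745229, 653894, 273395, 230288, 105224],
    [6184793, 2828338, 572765, 244899, 104957],
    [5600184, 2893207, 563114, 225517],
    [5288066, 2440103, 528043],
    [5290793, 2357936],
    [5675568],
]
C = [[sum(r[:j + 1]) for j in range(len(r))] for r in X]
I, J = len(C), len(C[0])
f = [sum(C[i][j + 1] for i in range(I - j - 1))
     / sum(C[i][j] for i in range(I - j - 1)) for j in range(J - 1)]

Chat = [row[:] for row in C]
for row in Chat:
    for j in range(len(row) - 1, J - 1):
        row.append(row[-1] * f[j])

s2 = [sum(C[i][j] * (C[i][j + 1] / C[i][j] - f[j]) ** 2
          for i in range(I - j - 1)) / (I - j - 2) for j in range(J - 2)]
s2.append(min(s2[-1] ** 2 / s2[-2], s2[-2], s2[-1]))     # ad-hoc last period

sig2 = [s2[j] / f[j] ** 2 for j in range(J - 1)]
colsum = [sum(C[i][j] for i in range(I - j - 1)) for j in range(J - 1)]
# beta tilde (9.26): diagonal cell over the column sum including it
btilde = [C[I - j - 1][j] / (colsum[j] + C[I - j - 1][j]) for j in range(J - 1)]

# Merz-Wuethrich one-year uncertainty (9.27), 0-based accident index a
mw = []
for a in range(I):
    k = I - a - 1                      # last observed development period
    if k >= J - 1:
        mw.append(0.0)                 # fully developed accident year
        continue
    term = sig2[k] * (1 / C[a][k] + 1 / colsum[k])
    term += sum(btilde[j] * sig2[j] / colsum[j] for j in range(k + 1, J - 1))
    mw.append(Chat[a][-1] ** 2 * term)

# total (Mack) uncertainty (9.16) for comparison
mack = [Chat[a][-1] ** 2
        * sum(sig2[j] * (1 / Chat[a][j] + 1 / colsum[j])
              for j in range(I - a - 1, J - 1)) for a in range(I)]
```

By construction every one-year msep is bounded by the corresponding total msep, reflecting that only part of the total uncertainty is released over the next accounting year.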
Example 9.16. We revisit claims reserving Example 9.8 and calculate the claims
development result uncertainties; the results are summarized in Table 9.8.
i        R̂_i^CL     msep^{1/2} total (9.16)   msep^{1/2} CDR (9.27)   ratio
8       1'043'242    134'337                   104'311                  78%
9       3'950'815    410'817                   385'773                  94%
total   6'047'061    462'960                   420'220                  91%

Table 9.8: Claims reserves and prediction uncertainty: Mack's formula (9.16) for
the total uncertainty and Merz-Wüthrich formula (9.27) for one-year claims development
uncertainty.
Exercise 17 (Italian motor third party liability insurance example). We revisit the
Italian motor third party liability insurance example of Bühlmann et al. [23]. The
field study considers 12 × 12 run-off triangles of 37 Italian insurance companies
at the end of 2006. For these data the claims reserves and the corresponding
conditional MSEPs for the total run-off uncertainty and for the one-year claims
development result uncertainty using Mack's formula (9.16) and the MW formula
(9.27), respectively, were calculated. The results are presented in Table 9.9. Note
that for confidentiality reasons the volumes of the 4 biggest companies were all set
equal to 100.0 and the order of these 4 companies is arbitrary.
Give interpretations to these results.
company   business   msep^{1/2} total    msep^{1/2} CDR     msep^{1/2} CDR / msep^{1/2} total
          volume     (in % reserves)     (in % reserves)    (in %)
1         100.0       4.03                3.24               80.4
2         100.0       2.90                2.36               81.4
3         100.0       2.41                1.98               82.3
4         100.0       3.45                2.85               82.6
5          61.8       3.66                3.04               82.9
6          56.9       5.54                4.50               81.2
7          53.0       4.52                3.70               81.8
8          49.4       4.60                3.82               83.1
9          46.2       5.61                4.59               81.8
10         41.6       5.32                4.36               82.0
...        ...        ...                 ...                ...
30          3.5      18.02               14.78               82.0
31          3.4      17.23               13.92               80.8
32          2.6      18.73               14.89               79.5
Table 9.9: Italian motor third party liability insurance example of Bühlmann et
al. [23]. Prediction uncertainties: Mack’s formula (9.16) for the total uncertainty
and Merz-Wüthrich formula (9.27) for one-year claims development uncertainty.
Solvency Considerations
In the previous chapters we have mainly discussed the modeling
of insurance contracts, the related liability cash flows and
the implications for tariffication. If we recall the discussion
in Chapter 1, the insurance company organizes
the equal balance within the community. That is, it issues insurance
contracts at a fixed premium and in return it promises
to cover all (financial) claims that fall under these contracts.
Of course, we need to make sure that the insurance company
can keep its promises. This is exactly the crucial task of supervision
(regulation) and sound risk management practice. Regulation aims to
protect the policyholder in that it enforces (by law) requirements the insurance company has to follow.
The ruin probability framework of Chapter 5 is much too simple to reflect real world insurance problems. Therefore,
we modify the ultimate ruin probability considerations so that they reflect the
current risk management task. In a first step we will discuss more general risk
management views, for a comprehensive discussion we refer to Wüthrich-Merz [88],
and in a second step we discuss more explicitly the solvency and risk management
implementations used in the insurance industry.
assets                                liabilities
cash and cash equivalents             deposits
debt securities                       policyholder deposits
bonds                                 reinsurance deposits
loans                                 borrowings
mortgages                             money market
real estate                           hybrid debt
equity                                convertible debt
equity securities                     insurance liabilities
private equity                        claims reserves
investments in associates             premium reserves
hedge funds
derivatives                           derivatives
futures, swaptions, equity options
insurance and other receivables       insurance and other payables
reinsurance assets                    reinsurance liabilities
property and equipment                employee benefit plan
intangible assets                     provisions
goodwill
deferred acquisition costs
income tax assets                     income tax liabilities
other assets                          other liabilities

Table 10.1: Balance sheet of a non-life insurance company at a fixed point in time.
Table 10.1 presents a snapshot of a non-life insurance company's balance sheet,
that is, it reflects all positions at a certain moment in time t ∈ R_+. The left
hand side shows the assets at time point t and the right hand side shows the
liabilities at the same time point t. We denote the value of the assets at time t by
A_t, and L_t denotes the value of the liabilities at time t.
In the language of Chapter 5, we can think of A_t denoting all asset values in the
company at time t. These comprise the initial capital, all premia received and all
other amounts received minus the payments made up to time t. These amounts
are invested in the financial market and, thus, are allocated to the different asset
classes displayed in Table 10.1. On the other hand, the liabilities L_t reflect the
value of all obligations accepted by the insurance company that are still open at
time t.
In the context of the ruin theory of Chapter 5 we should have A_t ≥ L_t in order to cover
the liabilities by asset values at time t. In fact, we have studied the continuous
time surplus process (C̃_t)_{t∈R_+}, given by C̃_t = A_t − L_t, which should fulfill, for a
given large probability, the no ruin condition
$$\tilde{C}_t \ge 0 \qquad \text{for all } t \in \mathbb{R}_+, \qquad (10.1)$$
see (5.4) and (5.5). Since an insurance company cannot continuously verify the
solvency situation, condition (10.1) is only checked on the discrete time grid t ∈ N_0;
this is similar to (5.5). But in fact, one even goes beyond that, which we are just
going to describe. This will be done in several steps.
Step 1. We would like to check a solvency condition (no ruin condition) similar to (10.1).
Moreover, we assume that at time 0 we have only sold one-year contracts (one-year
risk exposures) for which we receive a premium at time 0 and for which the claim
is paid at the end of the year, i.e. at time t = 1.
The total asset value at time 0 is given by A_0. This value is invested in the financial
market and generates value A_1 at time 1. Thus, for this one-period problem the
no ruin condition reads as follows:
for a given large probability 1 − p ∈ (0, 1) the initial capital and the asset strategy
should be chosen such that
$$\mathbb{P}\left[A_1 \ge L_1\right] = \mathbb{P}\left[L_1 - A_1 \le 0\right] \ge 1 - p. \qquad (10.2)$$
This means that we need to choose the initial capital c_0 and the asset strategy, which
maps value A_0 at time 0 to value A_1 at time 1, such that the (given stochastic)
liabilities L_1 can be covered with large probability at time 1. Note that A_1 and L_1
are, in general, not independent.
Step 2 (risk measure). The no ruin condition in (10.2) is described under the
Value-at-Risk risk measure VaR_{1−p}(L_1 − A_1) on security level 1 − p ∈ (0, 1), see
Example 6.25. Assume we have a normalized and translation invariant risk measure
ϱ, see (6.13); then, more generally, the initial capital and the asset strategy should be chosen such that
$$\varrho\left(L_1 - A_1\right) \le 0. \qquad (10.3)$$
Solvency II uses the VaR risk measure on the 1 − p = 99.5% security level and the
Swiss Solvency Test (SST) uses the TVaR risk measure on the 1 − p = 99% security
level, see also Examples 6.25, 6.20 and 6.26. The main aspect is now concerned
with the stochastic modeling of the position L_1 − A_1.
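A minimal numerical illustration of the one-period conditions (10.2)-(10.3) (all figures hypothetical): assets grow deterministically, the liability L_1 is modeled as lognormal, and solvency under the 99.5% Value-at-Risk of Solvency II amounts to checking that the 99.5% quantile of L_1 does not exceed A_1:

```python
from math import exp
from statistics import NormalDist

A0, r = 110.0, 0.01              # hypothetical initial assets and riskless return
mu, sigma = 4.55, 0.05           # hypothetical parameters of log L_1 (lognormal)

A1 = A0 * (1 + r)                # deterministic asset value at time 1

p = 0.005                        # Solvency II security level 1 - p = 99.5%
q_L1 = exp(NormalDist(mu, sigma).inv_cdf(1 - p))   # 99.5% quantile of L_1

# VaR_{1-p}(L_1 - A_1) <= 0  <=>  q_L1 <= A_1
value_at_risk = q_L1 - A1
solvent = value_at_risk <= 0.0
```

In practice A_1 is of course also random and dependent on L_1, which is exactly why the joint stochastic modeling of the position L_1 − A_1 is the main task.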
Assume we can split the liabilities L_1 into two components:
(i) payments X_1 made at time 1 (similar to Section 9.1 we map all payments in
accounting year (0, 1] to its endpoint);
(ii) outstanding loss liabilities L_1^+ at time 1.
Thus, we have
$$L_1 = X_1 + L_1^+.$$
The easier part is the modeling of X_1. We need to find a stochastic model that is
able to predict the payments X_1 and capture the dependencies with A_1 and L_1^+.
The more complicated part is L_1^+. This amount should reflect a market-consistent
value for the outstanding loss liabilities at time 1. Observe that it differs from the
best-estimate reserves R^{(1)} given in (9.22) in two crucial ways:
(1) The best-estimate reserves R^{(1)} were calculated on a nominal basis, i.e. the
time value of money was not considered because no discounting was applied
to R^{(1)}.
(2) The best-estimate reserves R^{(1)} are conditional expectations of the outstanding payments, given
the information F_1. That is, these are expected payouts, and we should add
a (risk, market-value) margin/loading to obtain market-consistent values for
risk averse financial agents being willing to do the run-off of these liabilities,
see Chapter 6.
The aim of these two tasks (1) and (2) is motivated by the fact that L_1^+ should
reflect a price at which another insurance company is willing to take over the
liabilities at time 1 and to complete the run-off of the outstanding loss liabilities.
If (10.3) is fulfilled we have an acceptable balance sheet and the company is solvent at time 0. If (10.3) is not fulfilled we have an unacceptable
balance sheet and it needs to be modified to achieve solvency. Options
for modification are the following: change the asset strategy so that it matches
better to the liabilities; reduce liabilities and mitigate uncertainties in liabilities (if
possible); inject more initial capital.
In the remainder of this chapter we discuss the modeling of the asset deficit at time
t = 1, where the asset deficit is for t ∈ N_0 defined by
$$\mathrm{AD}_t \stackrel{\text{def.}}{=} L_t - A_t = X_t + L_t^+ - A_t. \qquad (10.4)$$
Thus, the insurance company is solvent at time 0 (w.r.t. the risk measure ϱ) if
$$\varrho\left(\mathrm{AD}_1\right) = \varrho\left(L_1 - A_1\right) \le 0.$$
10.2 Risk modules
Typically the modeling of the asset deficit AD_1 defined in (10.4) is split into different
modules that reflect different risk classes. In a first step each risk class is studied
individually and in a second step the results are aggregated to obtain the overall
picture.
Figure 10.1: lhs: Swiss Solvency Test risk modules; rhs: Solvency II risk modules
(sources [26] and [41]).
One can question whether this modeling approach is smart. Modeling individual
risk classes can still be fine, but aggregation is far from straightforward because
it is very difficult to capture the interaction between the different risk classes.
Nevertheless we would like to describe the approach used in practice.
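For orientation, a widespread aggregation device in practice is a correlation-matrix (square-root) formula over the stand-alone module charges; the sketch below (module names and all numbers are hypothetical illustration values) shows the diversification effect it produces:

```python
from math import sqrt

# hypothetical stand-alone capital charges per risk module
charge = {"market": 120.0, "insurance": 200.0, "credit": 50.0, "operational": 30.0}
corr = {  # hypothetical symmetric correlations (diagonal is 1 by convention)
    ("market", "insurance"): 0.25,
    ("market", "credit"): 0.25,
    ("insurance", "credit"): 0.25,
}

def rho(a, b):
    """Look up the correlation between two modules (symmetric, default 0)."""
    if a == b:
        return 1.0
    return corr.get((a, b), corr.get((b, a), 0.0))

modules = list(charge)
aggregate = sqrt(sum(rho(a, b) * charge[a] * charge[b]
                     for a in modules for b in modules))
```

With non-perfect correlations the aggregate charge lies strictly between the largest single module charge and the plain sum of the charges; the criticism mentioned above is precisely that fixed correlations cannot capture the true interaction (e.g. tail dependence) between risk classes.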
In Figure 10.1 we show the individual risk modules used in the Swiss Solvency Test
and in Solvency II. Overall they are rather similar though some differences exist.
Often one considers the following 4 risk classes that are driven by the risk factors
that we will just describe:
1. Market risk. We cite SCR.5.1 of QIS5 [41]: "Market risk arises from the level or volatility of market prices of financial instruments. Exposure to market risk is measured by the impact of movements in the level of financial variables such as stock prices, interest rates, real estate prices and exchange rates."
2. Insurance risk. Insurance risk is typically split into the different insurance branches: non-life insurance, life insurance, health insurance and reinsurance. Here we concentrate on non-life insurance risk. This is further subdivided into (i) reserve risk, which describes the outstanding loss liabilities of past exposure claims; and (ii) premium risk, which describes the risk deriving from newly sold contracts that give an exposure over the next accounting period.
3. Credit risk. We cite SCR.6.1 of QIS5 [41]: "The counterparty default risk module should reflect possible losses due to unexpected default, or deterioration in the credit standing, of the counterparties and debtors of undertakings over the forthcoming twelve months. The scope of the counterparty default risk module includes risk-mitigating contracts, such as reinsurance arrangements, securitisations and derivatives, and receivables from intermediaries, as well as any other credit exposures which are not covered in the spread risk sub-module."
4. Operational risk. We cite SCR.3.1 of QIS5 [41]: "Operational risk is the risk of loss arising from inadequate or failed internal processes, or from personnel and systems, or from external events. Operational risk should include legal risks, and exclude risks arising from strategic decisions, as well as reputation risks. The operational risk module is designed to address operational risks to the extent that these have not been explicitly covered in other risk modules."
Let us formalize these risk factors and classes. Therefore, we first consider the beginning of accounting year 1. At time t = 0 the asset deficit is given by

AD_0 = L_0 − A_0.

On the liability side of the balance sheet we have (in a simplified version)

L_0 = L_0^+ = L_0^{PY} + L_0^{CY}.

On the asset side of the balance sheet we have (this is also a simplified version)

A_0 = c_0 + A_0^{PY} + π^{CY},

where A_0^{PY} are the provisions to cover the PY liabilities L_0^{PY}, π^{CY} is the premium received for the CY claims L_0^{CY} and c_0 is the initial capital. As described above, this amount A_0 is invested at the financial market and provides value A_1 at time t = 1. This value needs to be compared to

L_1 = X_1 + L_1^+ = X_1^{PY} + X_1^{CY} + X_1^{Op} + L_1^{+,PY} + L_1^{+,CY},

where X_1^{PY} are the payments for PY claims, X_1^{CY} are the payments for CY claims, L_1^{+,PY} is the value of the outstanding loss liabilities at time t = 1 for claims with accident year prior to t = 0, and L_1^{+,CY} is the value of the outstanding loss liabilities at time t = 1 for CY claims (i.e. accident date in (0, 1]). Thus, if we merge these two values, L_1^+ = L_1^{+,PY} + L_1^{+,CY}, we obtain the new outstanding loss liabilities for past exposure claims with accident date prior to t = 1. Finally, X_1^{Op} denotes the operational risk loss payment where, for simplicity, we assume that it can be settled immediately. We conclude that the asset deficit at time 1 is given by

AD_1 = X_1^{PY} + X_1^{CY} + X_1^{Op} + L_1^{+,PY} + L_1^{+,CY} − A_1   (10.5)
     = X_1^{PY} + L_1^{+,PY} + X_1^{CY} + L_1^{+,CY} + X_1^{Op} − A_1.   (10.6)
• Formula (10.5) gives the split into payments and outstanding liabilities. This view is crucial for asset-and-liability management, i.e. for comparing the structure of the asset portfolio with the maturities of the liabilities.
• Formula (10.6) provides the split into PY risk and CY risk. The PY risk is mainly captured by the claims development result introduced in Section 9.4. The CY risk is described by a compound distribution as, for instance, seen in Example 4.11. However, both descriptions only consider nominal claims, and in order to obtain values we still need to add time values for the cash flow payments and a risk margin for bearing the run-off risks; thus these values also depend on financial market movements.
• Coming back to the risk modules: market risk affects all variables in (10.5); insurance risk is mainly reflected in X_1^{PY}, L_1^{+,PY}, X_1^{CY} and L_1^{+,CY}; credit risk is a main risk driver in A_1; and operational risk is reflected in X_1^{Op}.
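The equivalence of the two splits (10.5) and (10.6) is pure bookkeeping; a tiny numerical check with made-up figures:

```python
# hypothetical time-1 figures (illustrative only)
X1_PY, X1_CY, X1_Op = 40.0, 55.0, 2.0   # payments for PY, CY and operational risk
L1_PY, L1_CY = 120.0, 45.0              # outstanding liabilities L_1^{+,PY}, L_1^{+,CY}
A1 = 270.0                              # asset value at time 1

ad_maturity_view = (X1_PY + X1_CY + X1_Op) + (L1_PY + L1_CY) - A1  # (10.5)
ad_exposure_view = (X1_PY + L1_PY) + (X1_CY + L1_CY) + X1_Op - A1  # (10.6)

assert ad_maturity_view == ad_exposure_view
print(ad_maturity_view)   # -8.0: negative asset deficit, the assets suffice
```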
assume that this cash flow is adapted to the filtration F = (F_s)_{s≥1}. In analogy to Wüthrich-Merz [87] we need to choose an appropriate (state price) deflator φ = (φ_1, . . . , φ_n) (which is F-adapted and strictly positive, P-a.s.) and then

L_1^{Ins} = (1/φ_1) E[ Σ_{s≥1} φ_s X_s | F_1 ] = X_1^{PY} + X_1^{CY} + Σ_{s≥2} P(1, s) E[ X_s | F_1 ],

where P(1, s) denotes the price at time 1 of the zero-coupon bond that matures at time s. Note that viewed from time 0 both P(1, s) and E[ X_s | F_1 ] are F_1-measurable random variables.
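Numerically, the second expression for L_1^{Ins} is just the expected cash flows discounted with zero-coupon bond prices; a sketch with a hypothetical flat yield curve and made-up cash flows:

```python
# payments in accounting year 1 (X_1^{PY} + X_1^{CY}) and expected later cash flows
x1 = 30.0
expected_cf = {2: 25.0, 3: 15.0, 4: 10.0, 5: 5.0}   # E[X_s | F_1] for s >= 2
r = 0.02                                             # flat annual yield (assumption)

def zcb_price(t, s, r):
    """Price at time t of the zero-coupon bond maturing at time s."""
    return (1 + r) ** -(s - t)

L1_ins = x1 + sum(zcb_price(1, s, r) * x for s, x in expected_cf.items())
print(f"L_1^Ins = {L1_ins:.2f}")   # 82.97 with these figures
```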
Under all the previous assumptions (in particular the uncorrelatedness assumption (10.7)) the acceptability requirement (10.3) reads as: the initial capital c_0 and the asset strategy should be chosen such that

ρ( Σ_{s≥1} P(1, s) E[ X_s | F_1 ] + X_1^{Op} − A_1 ) ≤ 0.   (10.8)
Since the asset deficit still has a rather involved form, the model is further simplified. Denote the expected values, viewed from time 0, by x_s = E[X_s] and p(1, s) = E[P(1, s)]. Then one uses the approximation

P(1, s) E[ X_s | F_1 ] ≈ p(1, s) x_s + (P(1, s) − p(1, s)) x_s + p(1, s) (E[ X_s | F_1 ] − x_s).

The first term p(1, s) x_s is the expected value (viewed from time 0) of the time-1 price P(1, s) E[ X_s | F_1 ]. The term (P(1, s) − p(1, s)) x_s captures the uncertainty in financial discounting and p(1, s) (E[ X_s | F_1 ] − x_s) describes the volatility in the insurance cash flows. The cross term of the two uncertainties was dropped in this approximation. Typically, the above terms are assumed to be independent so that they can be studied individually; aggregation is then obtained by simply convoluting their marginal distributions.
This approximation implies that for (10.8) we study the following three terms

Z_1 = Σ_{s≥1} [ p(1, s) x_s + (P(1, s) − p(1, s)) x_s ] − A_1,

Z_2 = Σ_{s≥1} p(1, s) (E[ X_s | F_1 ] − x_s),

Z_3 = X_1^{Op}.
Z1 describes market and credit risks, Z2 describes insurance risk and Z3 describes
operational risk. In non-life insurance one often assumes that these three random
variables are independent.
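The aggregation by convolution of the marginals can be mimicked by simulating the three terms independently and adding the samples; the marginal distributions below are arbitrary placeholders, not calibrated models.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# placeholder marginals for the three independent modules
z1 = rng.normal(-5.0, 8.0, n)    # market and credit risk
z2 = rng.normal(0.0, 6.0, n)     # insurance risk (centered, E[Z_2] = 0)
z3 = rng.gamma(2.0, 1.5, n)      # operational risk losses

z = z1 + z2 + z3                 # sample from the convolution of the marginals

# under independence the variances simply add up
print(np.var(z), np.var(z1) + np.var(z2) + np.var(z3))
```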
In the remainder of this chapter we describe the insurance liability variable Z2 . For
the other terms we refer to the solvency literature QIS5 [41], Swiss Solvency Test
[43] and Wüthrich-Merz [88].
10.3.2 Insurance risk

We study the insurance risk given by

Z_2 = Σ_{s≥1} p(1, s) (E[ X_s | F_1 ] − x_s).

As already mentioned, the insurance variables are separated into PY variables and CY variables w.r.t. the valuation date t = 0. This provides the split

Z_2 = Z_2^{PY} + Z_2^{CY} = Σ_{s≥1} p(1, s) (E[ X_s^{PY} | F_1 ] − x_s^{PY}) + Σ_{s≥1} p(1, s) (E[ X_s^{CY} | F_1 ] − x_s^{CY}).
The final simplification is to assume that there are deterministic payout patterns (γ_s^{PY})_{s≥1} and (γ_s^{CY})_{s≥1}, for instance obtained by the CL method, see Theorem 9.10 (the estimation errors in these patterns are neglected). Then the last expressions can be rewritten as

Z_2^{PY} = ( Σ_{s≥1} p(1, s) γ_s^{PY} ) [ X_1^{PY} + R^{(1)} − R^{(0)} ],

Z_2^{CY} = ( Σ_{s≥1} p(1, s) γ_s^{CY} ) [ E[ S_1 | F_1 ] − E[S_1] ].
The first line Z_2^{PY} reflects the study of the claims development result, see (9.23). The second line Z_2^{CY} describes the total nominal claim S_1 of accident year (0, 1] that is caused by the premium exposure π^{CY}. The terms in the round brackets are the deterministic discount factors that respect the underlying maturities of the cash flows; the terms in the square brackets are the random terms that need further modeling and analysis.
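The deterministic discount factor in the round brackets, Σ_{s≥1} p(1, s) γ_s, is easy to evaluate once a payout pattern and a yield curve are fixed; both choices below are hypothetical.

```python
# assumed payout pattern (gamma_s)_{s>=1}, summing to 1, and a flat yield curve
gamma = [0.45, 0.25, 0.15, 0.10, 0.05]
r = 0.02

# p(1, s) discounts a payment at time s back to time 1
discount = sum(g * (1 + r) ** -(s - 1) for s, g in enumerate(gamma, start=1))
print(f"discount factor = {discount:.4f}")   # slightly below 1 for r > 0
```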
The claims development result for PY claims, given by

CDR_1 = − [ X_1^{PY} + R^{(1)} − R^{(0)} ],

has expected value 0, see Corollary 9.12, if the claims reserves are defined by conditional expectations in a Bayesian model. Therefore, there remains the study of higher moments. In practice, one restricts to the second moment:
• Calculate for every line of business the conditional MSEP of the claims development result prediction, for instance using the MW formula (9.27). This provides a variance estimate for every line of business.

• Specify a correlation matrix between the different lines of business, see for instance SCR.9.34. in QIS5 [41].

• The previous two items allow one to aggregate the uncertainties of the individual lines of business to obtain the overall variance of the sum over all lines of business.

• Finally, a distribution with these first two moments is fitted to obtain an approximation to the distribution of CDR_1.
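The variance aggregation in the third item is the quadratic form sd' Corr sd; a sketch with hypothetical stand-alone standard deviations and a hypothetical correlation matrix:

```python
import numpy as np

# hypothetical stand-alone standard deviations of CDR_1 per line of business
sd = np.array([40.0, 25.0, 10.0])
# hypothetical correlation matrix between the lines of business
corr = np.array([[1.00, 0.50, 0.25],
                 [0.50, 1.00, 0.25],
                 [0.25, 0.25, 1.00]])

total_sd = np.sqrt(sd @ corr @ sd)   # std. dev. of the sum over all lines
print(f"aggregated sd = {total_sd:.2f}, sum of sds = {sd.sum():.2f}")
```

The aggregated standard deviation (about 60.4 here) stays below the sum 75 of the stand-alone standard deviations: the diversification benefit of correlations below 1.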
The claim E[ S_1 | F_1 ] resulting from the premium exposure π^{CY} is split into two independent random variables S_sc and S_lc, where S_sc reflects all small claims below a given threshold M and S_lc the claims above that threshold, see Examples 2.16 and 4.11.
The large claims layer S_lc is modeled per line of business (or by peril) by independent compound Poisson distributions with Pareto claim severities, and aggregation is done using the aggregation Theorem 2.12, resulting in a compound Poisson distribution. The latter can be evaluated, for instance, with the Panjer algorithm, see Theorem 4.8.
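For a discretized severity distribution the Panjer recursion is only a few lines; the following generic sketch (Poisson case a = 0, b = λ, with made-up parameters) is an illustration, not a reproduction of the text's Theorem 4.8.

```python
import math

def panjer_poisson(lam, f, n_max):
    """Panjer recursion for compound Poisson: return [P(S=0), ..., P(S=n_max)].

    N ~ Poisson(lam); i.i.d. claim sizes with pmf f, where f[k-1] = P(Y = k)
    for k = 1, ..., len(f). Poisson is the Panjer case a = 0, b = lam.
    """
    g = [math.exp(-lam)]                      # P(S = 0) = P(N = 0)
    for s in range(1, n_max + 1):
        g.append(sum(lam * k / s * f[k - 1] * g[s - k]
                     for k in range(1, min(s, len(f)) + 1)))
    return g

# usage: expected number of claims lam = 2, claim sizes uniform on {1, 2, 3, 4}
probs = panjer_poisson(2.0, [0.25] * 4, n_max=20)
print(sum(probs))   # probability mass covered up to 20, close to 1
```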
The small claims layer S_sc is treated similarly to the claims development result, i.e. one estimates the first two moments per line of business, aggregates these moments using an appropriate correlation matrix, see for instance Section 8.4.2 in the technical Swiss Solvency Test document [43], and fits a gamma or a log-normal distribution to these first two moments.
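Matching a gamma and a log-normal distribution to two given moments is a one-liner each; the mean and standard deviation below are placeholders for the aggregated small claims moments.

```python
import math

mean, sd = 100.0, 20.0   # hypothetical aggregated first two moments of S_sc

# gamma with mean = shape * scale and variance = shape * scale^2
shape = (mean / sd) ** 2
scale = sd ** 2 / mean

# log-normal with mean = exp(mu + sigma^2 / 2), var = mean^2 (exp(sigma^2) - 1)
sigma2 = math.log(1.0 + (sd / mean) ** 2)
mu = math.log(mean) - sigma2 / 2.0

print(f"gamma: shape = {shape:.1f}, scale = {scale:.1f}")
print(f"log-normal: mu = {mu:.4f}, sigma = {math.sqrt(sigma2):.4f}")
```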
Remarks.
• In the Swiss Solvency Test one distinguishes between pure process risk and parameter uncertainty for the small claims layer. Process risk is diversifiable with increasing volume, whereas parameter uncertainty is not. As a result, the coefficient of variation per line of business has a similar form as was found for the compound binomial distribution, see Proposition 2.24. That is, for volume v → ∞ the coefficient of variation does not vanish but stays strictly positive.
• In the Swiss Solvency Test one additionally aggregates so-called scenarios. The motivation is that the present model cannot reflect all uncertainties and therefore it is slightly disturbed by scenarios. These scenarios are basically claims of Bernoulli type, i.e. they occur with a certain probability and, if they occur, they have a given amount.
• For the aggregation between PY and CY claims it is either assumed that they are independent, or the claims development result uncertainty CDR_1 and the CY small claims layer S_sc are again aggregated via a correlation matrix and then
Market-value margin
The careful reader will have noticed that we have lost the risk margin somewhere on the way to the final result. We do not discuss the risk and market-value margin further here; we only mention that the current calculation of the market-value margin is quite ad hoc, see Chapter 6 in the Swiss Solvency Test [43] and Section 10.3 in Wüthrich-Merz [87], and further refinements are necessary.
Bibliography

[1] Acerbi, C., Tasche, D. (2002). On the coherence of expected shortfall. Journal of Banking and Finance 26/7, 1487-1503.
[2] Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control 19/6, 716-723.
[3] Alai, D.H., Merz, M., Wüthrich, M.V. (2009). Mean square error of prediction in the Bornhuetter-Ferguson claims reserving method. Annals of Actuarial Science 4/1, 7-31.

[4] Alai, D.H., Merz, M., Wüthrich, M.V. (2010). Prediction uncertainty in the Bornhuetter-Ferguson claims reserving method: revisited. Annals of Actuarial Science 5/1, 7-17.
[5] Artzner, P., Delbaen, F., Eber, J.M., Heath, D. (1997). Thinking coherently. Risk
10/11, 68-71.
[6] Artzner, P., Delbaen, F., Eber, J.M., Heath, D. (1999). Coherent measures of risk.
Mathematical Finance 9/3, 203-228.
[7] Asmussen, S., Albrecher, H. (2010). Ruin Probabilities. 2nd edition. World Scientific.
[8] Bahr, von B. (1975). Asymptotic ruin probabilities when exponential moments do
not exist. Scandinavian Actuarial Journal 1975, 6-10.
[9] Bailey, R.A. (1963). Insurance rates with minimum bias. Proc. CAS, 4-11.
[10] Bailey, R.A., Simon, L.J. (1960). Two studies on automobile insurance ratemaking.
ASTIN Bulletin 1, 192-217.
[14] Boland, P.J. (2007). Statistical and Probabilistic Methods in Actuarial Science.
Chapman & Hall/CRC.
[15] Bolthausen, E., Wüthrich, M.V. (2013). Bernoulli’s law of large numbers. ASTIN
Bulletin 43/2, 73-79.
[16] Bornhuetter, R.L., Ferguson, R.E. (1972). The actuary and IBNR. Proc. CAS,
Vol. LIX, 181-195.
[19] Bühlmann, H. (1980). An economic premium principle. ASTIN Bulletin 11/1, 52-
60.
[20] … Insurance: Mathematics and Economics 11/2, 113-127.
[21] Bühlmann, H. (1995). Life insurance with stochastic interest rates. In: Financial
Risk in Insurance, G. Ottaviani (ed.), Springer, 1-24.
[22] Bühlmann, H. (2004). Multidimensional valuation. Finance 25, 15-29.
[23] Bühlmann, H., De Felice, M., Gisler, A., Moriconi, F., Wüthrich, M.V. (2009).
Recursive credibility formula for chain ladder factors and the claims development
result. ASTIN Bulletin 39/1, 275-306.
[24] Bühlmann, H., Gisler, A. (2005). A Course in Credibility Theory and its Applications. Springer.
[28] Cramér, H. (1930). On the Mathematical Theory of Risk. Skandia Jubilee Volume,
Stockholm.
[29] Cramér, H. (1955). Collective Risk Theory. Skandia Jubilee Volume, Stockholm.
[30] Cramér, H. (1994). Collected Works. Volumes I & II. Edited by A. Martin-Löf.
Springer.
[33] Denuit, M., Maréchal, X., Pitrebois, S., Walhin, J.-F. (2007). Actuarial Modelling of Claim Counts. Wiley.
[34] Dickson, D.C.M. (2005). Insurance Risk and Ruin. Cambridge University Press.
[35] Duffie, D. (2001). Dynamic Asset Pricing Theory. 3rd edition. Princeton University
Press.
[36] Embrechts, P., Klüppelberg, C., Mikosch, T. (2003). Modelling Extremal Events
for Insurance and Finance. 4th printing. Springer.
[37] Embrechts, P., Nešlehová, J., Wüthrich, M.V. (2009). Additivity properties for
Value-at-Risk under Archimedean dependence and heavy-tailedness. Insurance:
Mathematics and Economics 44/2, 164-169.
[38] Embrechts, P., Veraverbeke, N. (1982). Estimates for the probability of ruin with special emphasis on the possibility of large claims. Insurance: Mathematics and Economics 1/1, 55-72.
[39] England, P.D., Verrall, R.J. (2002). Stochastic claims reserving in general insurance. British Actuarial Journal 8/3, 443-518.
[40] England, P.D., Verrall, R.J., Wüthrich, M.V. (2012). Bayesian overdispersed Poisson model and the Bornhuetter-Ferguson claims reserving method. Annals of Actuarial Science 6/2, 258-283.
[41] European Commission (2010). QIS 5 Technical Specifications, Annex to Call for
Advice from CEIOPS on QIS5.
[42] Feller, W. (1966). An Introduction to Probability Theory and its Applications.
Volume II. Wiley.
[43] FINMA (2006). Swiss Solvency Test. FINMA SST Technisches Dokument, Version 2, October 2006.
[46] Frees, E.W. (2010). Regression Modeling with Actuarial and Financial Applications. Cambridge University Press.
[47] Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo
in Practice. Chapman & Hall.
[49] Gisler, A., Wüthrich, M.V. (2008). Credibility for the chain ladder reserving
method. ASTIN Bulletin 38/2, 565-600.
[50] Green, P.J. (1995). Reversible jump Markov chain Monte Carlo computation and
Bayesian model determination. Biometrika 82/4, 711-732.
[51] Green, P.J. (2003). Trans-dimensional Markov chain Monte Carlo. In: Highly Structured Stochastic Systems, P.J. Green, N.L. Hjort, S. Richardson (eds.), Oxford Statistical Science Series, 179-206. Oxford University Press.
[52] Hachemeister, C.A., Stanard, J.N. (1975). IBNR claims count estimation with
static lag functions. ASTIN Colloquium 1975, Portugal.
[53] Hofert, M., Wüthrich, M.V. (2013). Statistical review of nuclear power accidents.
Asia-Pacific Journal of Risk and Insurance 7/1, Article 1.
[54] Johansen, A.M., Evers, L., Whiteley, N. (2010). Monte Carlo Methods. Lecture Notes, Department of Mathematics, University of Bristol.
[55] Johnson, R.A., Wichern, D.W. (1998). Applied Multivariate Statistical Analysis.
4th edition. Prentice-Hall.
[56] Jung, J. (1968). On automobile insurance ratemaking. ASTIN Bulletin 5, 41-48.
[57] Kaas, R., Goovaerts, M., Dhaene, J., Denuit, M. (2008). Modern Actuarial Risk
Theory, Using R. 2nd edition. Springer.
[63] Mack, T. (1991). A simple parametric model for rating automobile insurance or
estimating IBNR claims reserves. ASTIN Bulletin 21/1, 93-109.
[64] Mack, T. (1993). Distribution-free calculation of the standard error of chain ladder
reserve estimates. ASTIN Bulletin 23/2, 213-225.
[66] McCullagh, P., Nelder, J.A. (1983). Generalized Linear Models. Chapman & Hall.
[67] McGrayne, S.B. (2011). The Theory That Would Not Die. Yale University Press.
[68] McNeil, A.J., Frey, R., Embrechts, P. (2005). Quantitative Risk Management: Con-
cepts, Techniques and Tools. Princeton University Press.
[69] Merz, M., Wüthrich, M.V. (2008). Modelling the claims development result for
solvency purposes. CAS E-Forum, Fall 2008, 542-568.
[72] Ohlsson, E., Johansson, B. (2010). Non-Life Insurance Pricing with Generalized
Linear Models. Springer.
[75] Renshaw, A.E., Verrall, R.J. (1998). A stochastic model underlying the chain-ladder technique. British Actuarial Journal 4/4, 903-923.
[76] Resnick, S.I. (1997). Heavy tail modeling of teletraffic data. Annals of Statistics 25/5, 1805-1869.
[77] Resnick, S.I. (2002). Adventures in Stochastic Processes. 3rd printing. Birkhäuser.
[78] Robert, C.P. (2001). The Bayesian Choice. 2nd edition. Springer.
[79] Rolski, T., Schmidli, H., Schmidt, V., Teugels, J. (1999). Stochastic Processes for Insurance and Finance. Wiley.
[80] Saluz, A., Gisler, A., Wüthrich, M.V. (2011). Development pattern and prediction error for the stochastic Bornhuetter-Ferguson claims reserving model. ASTIN Bulletin 41/2, 279-317.
[82] Schweizer, M. (2009). Stochastic Processes and Stochastic Analysis. Lecture Notes,
ETH Zurich.
[83] Sundt, B., Jewell, W.S. (1981). Further results on recursive evaluation of compound distributions. ASTIN Bulletin 12/1, 27-39.
[84] Tsanakas, A., Christofides, N. (2006). Risk exchange with distorted probabilities.
ASTIN Bulletin 36/1, 219-243.
[87] Wüthrich, M.V., Merz, M. (2008). Stochastic Claims Reserving Methods in Insurance. Wiley.
[88] Wüthrich, M.V., Merz, M. (2013). Financial Modeling, Actuarial Valuation and
Solvency in Insurance. Springer.
Exercise 1, page 18
Exercise 2, page 21
Corollary 2.7, page 28
Exercise 3, page 50
Exercise 4, page 60
Exercise 5, page 81
Exercise 6, page 94
Corollary 6.6, page 140
Exercise 7, page 141
Exercise 8, page 144
Exercise 9, page 146
Exercise 10, page 149
Exercise 11, page 150
Exercise 12, page 155
Exercise 13, page 182
Index
p-value, 21
absolutely continuous distribution, 15
acceptable, 152, 238
accident
  date, 206
  year, 208
AD (asset deficit), 239
AD test, 80
adjustment coefficient, 120
admissible, 32
age-to-age factor, 212
aggregation property, 30
AIC, 81
Akaike information criterion, 81
Bayesian CL
  factor, 220
  predictor, 221
Bayesian information criterion, 81
Bernoulli
  distribution, 26
  experiment, 26
  random walk, 121
Bernoulli, Jakob, 12
best-estimate reserves, 211
BF
  method, 211, 216
  reserves, 216
BIC, 81
CL reserves, 213
claims
  counts, 23
  frequency, 26
claims development
  result, 227, 244
  triangle, 209
claims inflation, 86
claims reserves, 210, 211
claims reserving, 205
  algorithm, 211
  method, stochastic, 217
closing date, 206
CLT, 13, 90
CoC, 151
convergence in distribution, 17
convex cone, 152
convolution, 25
cost-of-capital, 151, 154
  rate, 151, 154
Cramér, Harald, 115
Cramér-Lundberg process, 115
credibility coefficient, 199, 221
credibility estimator, 190
  homogeneous, 196
  inhomogeneous, 196
credibility weight, 183, 186, 189
credit risk, 240, 243
CRRA utility function, 138
CTE, 150
Edgeworth approximation, 96
Edgeworth, Francis Ysidro, 96
Embrechts, Paul, 132
Embrechts-Veraverbeke theorem, 131
empirical
  distribution function, 56
  loss size index function, 57
  mean excess function, 57
England, Peter D., 225
ES, 150
Esscher
  measure, 145
  premium, 145, 156
estimation error, 21
estimator, 21
expectation, 15
expected claims frequency, 26
expected shortfall, 149, 150, 154
expected value, 58
expected value principle, 133
Gerber-Shiu risk theory, 115
Gisler, Alois, 196
Glivenko-Cantelli theorem, 77, 89
GLM, 159, 173
Goldie, Charles M., 132
goodness-of-fit, 80, 169
happiness index, 135
heavy tailed, 129
Hill
  estimator, 73
  plot, 73
histogram, 55
homogeneous credibility estimator, 196
i.i.d., 19
IBNeR, 207
IBNyR, 207
incomplete gamma function, 59
independent and identically distributed, 19
law of large numbers, 12
layer, 58, 83
leverage effect, 86
likelihood function, 45
likelihood ratio test, 170
linear credibility, 183, 193
link function, 176
link ratio, 212
LLN, 12
log-gamma distribution, 69
log-likelihood function, 45
log-linear model, 167
log-link function, 176
log-log plot, 57
log-normal distribution, 65
loss size index function, 57, 58
Lundberg
  bound, 119, 120
  coefficient, 120
Lundberg, Ernst Filip Oskar, 115
  Bailey & Jung, 164
  Bailey & Simon, 162
  total marginal sums, 164
method of moments, 40
minimum variance estimator, 41
mixed Poisson distribution, 36
  definition, 36
MLE, 40, 45
MM, 40
model risk, 13
moment estimator, 41
moment generating function, 16, 18, 58
moments, 15
monotonicity, 152
Monte Carlo simulation, 89
Morgenstern, Oskar, 136
MSEP, 217, 222
multiplicative tariff, 160
multivariate Gaussian distribution, 166
  density, 167
p-value, 21
Panjer
  algorithm, 101, 103
  distribution, 102
  recursion, 101
Panjer, Harry H., 101
parameter estimation
  claims count distribution, 40
  error, 217
Pareto distribution, 72
Pareto, Vilfredo Federico Damaso, 72
past exposure claim, 209
Pearson's residuals, 181
Poisson distribution, 28, 175
  definition, 28
PY claim, 240
PY risk, 243
radius of convergence, 16
Radon-Nikodym derivative, 156
random variables, 14
random walk theorem, 118
RBNS, 207
re-insurance, 85
regularly varying, 59, 130
renewal property, 119
reporting
  date, 206
  delay, 206
reserve risk, 240
  delay, 208
  period, 206
shape parameter, 59
Shiu, Elias S.W., 115
significance level, 21
Simon, LeRoy J., 162
skewness, 16, 58
Smirnov, Nikolai Vasilyevich, 77
solvency, 238
Solvency II, 237
Spitzer's formula, 124
Spitzer, Frank Ludvig, 125
SST, 237
standard assumptions for compound distributions, 23
  power, 138
utility indifference price, 140
utility theory, 135
vague prior, 187
value
  assets, 236
  liabilities, 236
Value-at-Risk, 150, 153
VaR, 150, 153, 237
Var, 15
variable reduction analysis, 180
variance, 15, 58
variance loading principle, 134
Vco, 16