
ACTS 6306: Lecture Notes: Part 4

Bayesian Credibility: Continuous Prior for Various Models


Yuly Koshevnik

Introduction
These notes are based on Chapter 3 of the textbook.
Tentatively, the material will be covered on September 14 - 16.
Our first exam is scheduled for 09/21, so there will be no class session on that day. After the exam, our
sessions will resume on September 23.

1 Setup and Terminology


A random variable, X, is the quantity whose credibility is being assessed. It can be either discrete (such
as claim frequency or the number of years before the first claim occurs) or continuous, like loss or claim
severity. Our starting point is based on a discrete X. In examples of this type we assume that one or more
records of X are collected and form a sample of independent, identically distributed (i.i.d.) variables with
a common probability function, denoted as

f_X(x; θ) = f(x|θ),

with an unknown parameter, θ. Since θ is not known, the Bayesian framework suggests using a prior
distribution, with probability function

π(θ) = f_Θ(θ),

which can be either a mass or a density function.

1.1 Bayesian Framework


To summarize what has already been explained, the Bayesian framework operates with the following
notions.

• Model. Recorded values of X form a data set,

X = {X1, X2, . . . , Xn} ∼ f_{X|Θ}(x|θ) = f(x|θ),

such that conditionally, given Θ = θ, their common probability function is f(x|θ).

• Prior. The unknown parameter θ is viewed as a random variable, Θ, with prior probability function

f_Θ(θ) = π(θ)

• Posterior. Having observed the values as a set X of n records, the posterior distribution of (Θ|X) can
be found using the formula for its probability function:

π(θ|x) = f_{Θ|X}(θ|x) = π(θ) · f(x|θ) / f(x),

where the denominator represents the joint marginal probability function for X, obtained by taking
the expectation of the numerator with respect to the prior.
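
For concreteness, here is a minimal numerical sketch of this formula for a discrete prior. All grid values
and data below are made up for illustration and are not taken from the textbook.

import numpy as np

# Posterior over a discrete grid of theta values for a Bernoulli model
theta = np.array([0.2, 0.5, 0.8])        # candidate parameter values
prior = np.array([0.25, 0.50, 0.25])     # pi(theta), sums to 1
x = np.array([1, 0, 1, 1])               # observed Bernoulli records

# Likelihood f(x|theta) of the whole sample, one value per grid point
likelihood = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())

# Denominator f(x): expectation of the numerator over the prior
marginal = np.sum(prior * likelihood)

posterior = prior * likelihood / marginal   # pi(theta|x)
print(posterior, posterior.sum())           # posterior weights sum to 1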

1.2 Objectives
The Bayesian framework pursues objectives related to its structure. They are listed below.

1. Joint marginal associated with the sample. This is what was shown in the denominator of the posterior
density, π(θ|x). In particular, if there is a single record, X = X1, then its marginal probability
function needs to be derived.

2. When a potential new value, Y = Xn+1, is about to be recorded, we assume that altogether the values

{X1, X2, . . . , Xn, Y = Xn+1},

given Θ = θ, form a collection of conditionally i.i.d. variables. The predictive distribution of
(Y|X = x) is then one of the goals that can be achieved by using the Bayesian framework. It is the
conditional distribution of Y, given the observations carried by the array X.

3. Particularly important is the Bayesian credibility premium for Y = Xn+1, defined as the conditional
expectation

E[Y|X] = Ẽ[Y|Θ],

where the symbol Ẽ indicates that the posterior distribution of (Θ|X) replaces the prior.

2 Geometric Model and Continuous Prior
Assume that the variables X = {Xj : j ≥ 1} are independent and identically distributed, representing
the counts of claim-free time periods (usually years) that precede the first claim. The model in this case
is geometric, so

f(x|q) = (1 − q)^x · q for integer x ≥ 0.

The parameter q is unknown, so the Bayesian framework is based on assumptions expressed in terms of the
prior distribution. We restrict our considerations to conjugate pairs of model and prior.
That is why we assume that Q ∼ Beta[a, b], with specified parameters a > 0 and b > 0. For the geometric
model we will sometimes require a > 1 or a > 2, when needed.

2.1 Joint Probability Function


For a pair, (Q, X), where X is a single observation, the joint probability function is

f_{Q,X}(q, x) = π(q) · f(x|q),

where the prior density is

π(q) = [Γ(a + b) / (Γ(a) · Γ(b))] · q^(a−1) · (1 − q)^(b−1) for 0 < q < 1.

2.2 Marginal Distribution of X


It is defined by the formula:

P[X = x] = ∫₀¹ P[X = x|Q = q] · π(q) dq = [Γ(a + b) / (Γ(a) · Γ(b))] · [Γ(a + 1) · Γ(b + x) / Γ(a + b + x + 1)]

Such a formula is of little direct use and is presented only to illustrate how marginal probabilities can be
evaluated. Fortunately, since we use conjugate priors, this exercise will be easy to bypass. Generally,
when n records are observed, the joint probability function is derived similarly by using

w = x1 + x2 + · · · + xn

as the observed total count of claim-free time periods. Then the joint probability function is

f(x) = [Γ(a + b) / (Γ(a) · Γ(b))] · [Γ(n + a) · Γ(w + b) / Γ(n + a + w + b)]

The sum, W = X1 + X2 + · · · + Xn, is a sufficient statistic that carries all information about the
parameter, as will be seen immediately.
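
As a quick numerical check (not part of the textbook; the values of a, b and the sample below are
arbitrary illustrative choices), the closed form above can be compared with direct integration against
the Beta prior:

import math
from scipy.integrate import quad

a, b = 2.0, 3.0
x = [1, 0, 2, 4]                  # claim-free counts for n = 4 insureds
n, w = len(x), sum(x)

# Direct integration of q^n (1-q)^w against the Beta(a, b) prior density
c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
integral, _ = quad(lambda q: c * q ** (a - 1) * (1 - q) ** (b - 1)
                             * q ** n * (1 - q) ** w, 0.0, 1.0)

# Closed form from the text
closed = c * math.gamma(n + a) * math.gamma(w + b) / math.gamma(n + a + w + b)
print(integral, closed)           # the two values agree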

2.3 Posterior Distribution
Having observed X = {Xj : 1 ≤ j ≤ n}, we derive the posterior density for (Q|X) as follows:

π(q|x) = π(q) · f(x|q) / P[X = x],

which can be written in the form:

π(q|x) = C · q^((n+a)−1) · (1 − q)^((w+b)−1).

Therefore, (Q|X) ∼ Beta[a′ = a + n, b′ = b + w]

2.4 Predictive Distribution


If Y = Xn+1 is to be predicted after having observed the n records presented in the sample X, then the
conditional distribution of (Y|X) is:

P[Y = y|X = x] = P[Y = y|W = w] = (a + n) · [Γ(a + n + w + b) · Γ(w + b + y)] / [Γ(w + b) · Γ(a + n + w + b + y + 1)]

See formula (3.11) in the book.
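
A sanity check (not from the book; a, b, n, w below are arbitrary illustrative values) is that these
probabilities sum to one over y = 0, 1, 2, . . . The log-gamma function avoids overflow:

import math

a, b, n, w = 2.0, 3.0, 4, 7

def predictive(y):
    # P[Y = y | W = w] from the formula above, computed on the log scale
    log_p = (math.log(a + n)
             + math.lgamma(a + n + w + b) + math.lgamma(w + b + y)
             - math.lgamma(w + b) - math.lgamma(a + n + w + b + y + 1))
    return math.exp(log_p)

print(sum(predictive(y) for y in range(500)))   # very close to 1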

2.5 Credibility Premium


Since the posterior density for (Q|X) is already found and the conditional distribution of (Y|Q = q) is
the same geometric distribution starting at zero, the identity

E[Y|Q] = (1 − Q) / Q

leads to the Bayesian premium as follows:

E[Y|X] = b′ / (a′ − 1) = (b + w) / (a + n − 1)
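
The identity E[(1 − Q)/Q] = b′/(a′ − 1) for a Beta[a′, b′] variable can be verified numerically; this
sketch uses arbitrary illustrative parameters and is not from the textbook:

import math
from scipy.integrate import quad

a, b, n, w = 3.0, 2.0, 5, 6            # a + n > 1, so the premium is finite
ap, bp = a + n, b + w                  # posterior Beta parameters a', b'

# Posterior expectation of E[Y|Q] = (1 - Q)/Q under Beta(a', b')
c = math.gamma(ap + bp) / (math.gamma(ap) * math.gamma(bp))
premium, _ = quad(lambda q: ((1 - q) / q) * c * q ** (ap - 1)
                            * (1 - q) ** (bp - 1), 0.0, 1.0)

print(premium, bp / (ap - 1))          # both equal (b + w)/(a + n - 1)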

3 Poisson Model and Gamma Prior
Typical considerations for claim frequency are based on a Poisson model with an unknown parameter λ
that will be viewed as a random variable Λ with a specified prior distribution. There is a minor ambiguity
in notation related to the gamma distribution. Using a detailed description, we assume that Λ ∼ Gamma[a, θ]
if its density function is

π(λ) = f_Λ(λ) = [1 / (Γ(a) · θ^a)] · λ^(a−1) · exp(−λ/θ)

for λ > 0. Often b = θ^(−1) is used as a parameter, so the same density is presented as

π(λ) = f_Λ(λ) = [b^a / Γ(a)] · λ^(a−1) · exp(−bλ)

3.1 Posterior Distribution


Start with n = 1 exposure showing claim frequency X1 = X = x. Then the marginal distribution of X
can be obtained by integration:

P[X = x] = ∫₀^∞ f(x|λ) · π(λ) dλ = [b^a / (Γ(a) · x!)] · ∫₀^∞ λ^((x+a)−1) · exp(−λ(b + 1)) dλ

= [b^a / (Γ(a) · x!)] · [Γ(x + a) / (b + 1)^(a+x)]
If a = r is an integer, the marginal distribution of X is negative binomial and can be interpreted as
follows: (X = k) means that k failures occur before the rth success, where q = b/(b + 1) is the success
probability.
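
This identification can be checked against scipy's negative binomial; the parameter values below are
arbitrary illustrations, not textbook data:

import math
from scipy.stats import nbinom

a, b = 3, 2.0
p = b / (b + 1)                  # success probability q = b/(b + 1)

def marginal(x):
    # b^a * Gamma(x + a) / (Gamma(a) * x! * (b + 1)^(a + x)) from the text
    return b ** a * math.gamma(x + a) \
           / (math.gamma(a) * math.factorial(x) * (b + 1) ** (a + x))

for k in range(5):
    print(marginal(k), nbinom.pmf(k, a, p))   # the two columns agree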
Now move on to a sample of size n. The formula for the joint probability distribution of X = x is more
complicated, yet the good news is that one can proceed with no attention paid to it.
Consider W = X1 + X2 + · · · + Xn and notice that (W|Λ = λ) also has a Poisson distribution, with rate
λ · n. Given X = x, W carries the same information about Λ as the full sample. The posterior
distribution for (Λ|W = w) is derived as follows:

π(λ|W = w) = [(n + b)^(w+a) / Γ(w + a)] · λ^((w+a)−1) · exp(−(n + b)λ),

which can also be stated in the equivalent form:

(Λ|W = w) ∼ Gamma[a′ = w + a, b′ = n + b]

3.2 Bayesian Credibility


Since E[Xj|Λ = λ] = λ, the posterior expectation represents the Bayesian credibility for a future record,
Y = Xn+1:

E[Y|X] = E[Y|W = w] = a′/b′ = (w + a)/(n + b)

In terms of θ = b^(−1), this formula can be presented as follows:

E[Y|W = w] = (w + a) · θ / (n · θ + 1)
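
The whole conjugate update fits in a few lines; this sketch uses made-up prior parameters and data
rather than anything from the textbook:

a, b = 2.0, 4.0                   # prior: Lambda ~ Gamma[a, b] with rate b
x = [0, 1, 0, 2, 1]               # observed claim counts
n, w = len(x), sum(x)

a_post, b_post = a + w, b + n     # posterior: Gamma[a' = w + a, b' = n + b]
premium = a_post / b_post         # Bayesian credibility E[Y|W = w]
print(a_post, b_post, premium)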

3.3 Poisson with Nonstandard Prior


Assume that claim frequency is still Poisson distributed with unknown rate λ, but the parameter is set
as q = P[X > 0] = 1 − exp(−λ). The prior density for Q is Beta[a, b]. Events of interest are X = 0 and
X > 0, so for instance,

P[X = 0|Q = q] = 1 − q and P[X > 0|Q = q] = q.

The example below explains how this problem can be solved.


Assume that q is associated with a random variable, Q ∼ Beta [a, b]. Then observations are binary, and
we return to the situation with Bernoulli and Beta.

3.3.1 Example
Assume that the prior density of Q is

π(q) = K · q^3 · (1 − q)^2 for 0 < q < 1

and π(q) = 0 otherwise. Thus, Q ∼ Beta[a = 4, b = 3] and

K = Γ(a + b) / (Γ(a) · Γ(b)) = 6! / ((3!) · (2!)) = 60.

Given X1 = 0, X2 > 0, and X3 = 0, we are about to determine all elements of the Bayesian framework.
For the sake of brevity, introduce the event A = (X1 = 0) ∩ (X2 > 0) ∩ (X3 = 0).

1. The marginal probability of A is obtained as follows.

P[A] = E[P[A|Q]]

Since P[A|Q = q] = (1 − q) · q · (1 − q) = (1 − q)^2 · q, conclude that

P[A] = E[(1 − Q)^2 · Q] = K · ∫₀¹ [(1 − q)^2 · q] · [q^3 · (1 − q)^2] dq = K · ∫₀¹ q^4 · (1 − q)^4 dq

= 60 · [Γ(5) · Γ(5) / Γ(10)] = 60 · [(4!)(4!) / 9!] ≈ 0.09524

2. The posterior distribution of (Q|A) can be found directly, by using the same trick as before:

π(q|A) = π(q) · P[A|Q = q] / P[A] = C · q^4 · (1 − q)^4,

where the factor C does not contain q. Therefore, (Q|A) ∼ Beta[5, 5]

3. The predictive distribution of (Y = X4|A) is then evaluated using the formula:

P[Y > 0|A] = Ẽ[Q] = 5/(5 + 5) = 1/2 = 0.5

4. Given A, the expected value of the binary random variable that indicates Y = X4 > 0 is the posterior
expectation of Q, which is 0.5
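
The numbers in this example are easy to verify by direct integration; the following sketch is purely a
check and not part of the notes:

from scipy.integrate import quad

K = 60.0                                   # Beta[4, 3] normalizing constant
prior = lambda q: K * q ** 3 * (1 - q) ** 2

# Marginal probability P[A] = E[(1 - Q)^2 * Q]
p_a, _ = quad(lambda q: (1 - q) ** 2 * q * prior(q), 0.0, 1.0)
print(p_a)                                 # ~ 0.09524

# Posterior mean of Q given A, i.e. the Beta[5, 5] expectation
post_mean, _ = quad(lambda q: q * (1 - q) ** 2 * q * prior(q) / p_a, 0.0, 1.0)
print(post_mean)                           # ~ 0.5, matching P[Y > 0|A]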

4 Continuous Model and Prior
Assume that a sample X = {Xj : j ≥ 1} represents recorded losses or claim severity values. Assume that
they are all independent and identically distributed:

{Xj : 1 ≤ j ≤ n} ∼ Gamma[r, λ],

where r is a specified natural number and the parameter λ > 0 is unknown. Let us consider what happens
when this parameter is viewed as a random variable, Λ, with a specified continuous prior distribution.

4.1 Example
Assume that the model for losses is described in terms of a Gamma density,

f(x|θ) = (θ^3 / 2) · x^2 · exp(−θx) for x > 0

and the prior density for Θ is

π(θ) = (1/6) · θ^3 · exp(−θ).

It is clear that (X|Θ = θ) ∼ Gamma[3, θ^(−1)] and Θ ∼ Gamma[4, 1].
Those who prefer the inverse Gamma can use the transformed parametrization with Λ = Θ^(−1), which
follows that distribution with parameters [a = 4, b = 1].
This example and its further general extensions are to be considered on September 16. The following
tasks are performed.

• Marginal distribution of X

• Posterior density of (Θ|X = x)

• Predictive distribution of (Y |X = x)

• Bayesian credibility, E [Y |X = x]

4.2 Detailed Solutions


1. Start with the joint probability (in this case, density) function:

f_{Θ,X}(θ, x) = π(θ) · f(x|θ) = (1/12) · θ^6 · x^2 · exp(−(x + 1) · θ)

defined for θ > 0 and x > 0. The marginal density for X is obtained by integration in θ that leads to

f(x) = (x^2/12) · ∫₀^∞ θ^6 · exp(−(x + 1)θ) dθ = (x^2/12) · [Γ(7) / (x + 1)^7] = 60 · x^2 / (x + 1)^7,

which is identifiable as a Pareto distribution.
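
As a numerical check (illustrative only, not from the notes), this density integrates to one and matches
the defining integral pointwise:

import math
from scipy.integrate import quad

f = lambda x: 60.0 * x ** 2 / (x + 1) ** 7

total, _ = quad(f, 0.0, math.inf)
print(total)                     # ~ 1.0, so f is a proper density

# Cross-check one point against the integral over theta
x0 = 1.5
direct, _ = quad(lambda t: (x0 ** 2 / 12.0) * t ** 6
                           * math.exp(-(x0 + 1) * t), 0.0, math.inf)
print(direct, f(x0))             # the two values agree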

2. The posterior density of (Θ|X = x) formally should be found using the formula:

π(θ|x) = π(θ) · f(x|θ) / f(x) = [(x + 1)^7 / 720] · θ^6 · exp(−(x + 1)θ),

so (Θ|X = x) ∼ Gamma[7, (x + 1)^(−1)] with shape parameter a′ = 7 and scale (x + 1)^(−1). Equivalently,
if you prefer the reciprocal of the scale, the rate is b′ = x + 1.
The same result could be obtained faster, since the model and prior are conjugate. Present the posterior
density in the form:

π(θ|x) = K · π(θ) · f(x|θ) = C · θ^6 · exp(−(x + 1)θ),

where the factor C does not contain θ. Therefore, we obtain the same posterior for (Θ|X = x) as
before, and the normalizing constant is

C = (x + 1)^7 / 6!

3. The predictive distribution for (Y|X = x) is obtained via integration of f(y|θ) with respect to the
posterior of (Θ|X = x) as follows:

f(y|x) = ∫₀^∞ [(x + 1)^7 / 720] · θ^6 · exp(−(x + 1)θ) × [(θ^3 / 2) · y^2 · exp(−θ · y)] dθ = 252 · (x + 1)^7 · y^2 / (x + y + 1)^10,

which is sometimes called a generalized Beta distribution.

4. The easiest part is the Bayesian credibility premium. Since E[Y|Θ = θ] = 3/θ and, for the
Gamma[7, (x + 1)^(−1)] posterior, E[Θ^(−1)|X = x] = (x + 1)/6, we obtain

E[Y|X = x] = 3 · (x + 1)/6 = (x + 1)/2
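
This value can be confirmed by integrating against the predictive density found above; the sketch below
is a check, with x chosen arbitrarily:

import math
from scipy.integrate import quad

x = 2.0                                    # an arbitrary observed loss

# Predictive density f(y|x) = 252 (x + 1)^7 y^2 / (x + y + 1)^10
f_pred = lambda y: 252.0 * (x + 1) ** 7 * y ** 2 / (x + y + 1) ** 10

premium, _ = quad(lambda y: y * f_pred(y), 0.0, math.inf)
print(premium, (x + 1) / 2.0)              # both equal (x + 1)/2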

Multiple Observations. If a sample of n recorded X-values were given, then evaluation of the joint
density associated with the values of X would be time consuming, but the same representation of
π(θ|(x1, x2, . . . , xn)) would lead to the formula:

π(θ|x) = K · θ^(3n+a−1) · exp(−(w + b)θ),

where w = x1 + x2 + · · · + xn is the value of a sufficient statistic, Θ ∼ Gamma[a, b^(−1)] is the prior,
and K can be found from the recognized posterior Gamma distribution:

(Θ|W = w) ∼ Gamma[a′ = 3n + a, (b′)^(−1) = (w + b)^(−1)]

4.3 Poisson Model and Gamma Prior: Example
Assume that observations, X = {Xj : 1 ≤ j ≤ n}, represent claim frequency values. The following
information is provided.
1. {Xj : 1 ≤ j ≤ n} are independent Poisson distributed with unknown rate λ.
2. Prior distribution for Λ is Gamma [α, β] such that
E [Λ] = 0.10 and Var [Λ] = 0.0003

3. During a three-year observation period, there are 150 claims


4. For each of three years, there are 200 policies observed.
Determine the posterior expectation of Λ.

4.3.1 Solution
Identify the parameters of the prior distribution as follows. With the moments of Λ being specified, we have:

E[Λ] = α · β = 0.10 and Var[Λ] = α · β^2 = 0.0003,

which implies that β = 0.003 and α = (0.03)^(−1). The total number of exposures is n = 200 · 3 = 600
and the total claim count is w = 150. Therefore, in terms of the rate b = β^(−1) = (0.003)^(−1), the
posterior distribution for Λ is

Gamma[α′ = α + 150, b′ = b + 600],

so the posterior expectation of Λ is

Ẽ[Λ] = (150 + 1/0.03) · (1/0.003 + 600)^(−1) ≈ 0.1964
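
The arithmetic is quick to reproduce:

alpha = 1 / 0.03          # prior shape, about 33.33
rate = 1 / 0.003          # prior rate b = 1/beta, about 333.33
w, n = 150, 600           # total claims and total exposures

posterior_mean = (alpha + w) / (rate + n)
print(posterior_mean)     # ~ 0.1964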

4.4 Conclusion
Assume that Λ ∼ Gamma[a, b^(−1)], so

π(λ) = [b^a / Γ(a)] · λ^(a−1) · exp(−b · λ).

For n being the number of exposures, denote

W = X1 + X2 + · · · + Xn,

the total number of claims, and let w be the sum of the observed claim frequency values. Then the
posterior distribution of (Λ|X = x) is the same as that of (Λ|W = w), and

(Λ|W = w) ∼ Gamma[a′ = a + w, b′ = b + n],

so the posterior scale is β′ = (b′)^(−1) = (b + n)^(−1) and

E[Λ|W = w] = a′/b′ = (a + w)/(b + n)
