
North American Actuarial Journal

ISSN: 1092-0277 (Print) 2325-0453 (Online) Journal homepage: http://www.tandfonline.com/loi/uaaj20

Credibility Estimation of Distribution Functions with Applications to Experience Rating in General Insurance

Xiaoqiang Cai, Limin Wen, Xianyi Wu & Xian Zhou

To cite this article: Xiaoqiang Cai, Limin Wen, Xianyi Wu & Xian Zhou (2015)
Credibility Estimation of Distribution Functions with Applications to Experience
Rating in General Insurance, North American Actuarial Journal, 19:4, 311-335, DOI:
10.1080/10920277.2015.1057649

To link to this article: http://dx.doi.org/10.1080/10920277.2015.1057649

Published online: 02 Oct 2015.

North American Actuarial Journal, 19(4), 311–335, 2015
Copyright © Society of Actuaries

ISSN: 1092-0277 print / 2325-0453 online


DOI: 10.1080/10920277.2015.1057649

Credibility Estimation of Distribution Functions with Applications to Experience Rating in General Insurance

Xiaoqiang Cai,¹ Limin Wen,² Xianyi Wu,³ and Xian Zhou⁴

¹ Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong, Hong Kong, China
² Institute of Mathematics and Information Science, Jiangxi Normal University, Jiangxi, China
³ Center of International Finance and Risk Management, East China Normal University, Shanghai, China
⁴ Department of Applied Finance and Actuarial Studies, Macquarie University, Sydney, Australia

This article presents a new credibility estimation of the probability distributions of risks under Bayes settings in a completely nonparametric framework. In contrast to Ferguson's Bayesian nonparametric method, it does not need to specify a mathematical form of the prior distribution (such as a Dirichlet process). We then show the applications of the method in general insurance premium pricing, a procedure commonly known as experience rating, which utilizes the insured's claim experience to calculate a proper premium under a given premium principle (referred to as a risk measure). As this method estimates the probability distributions of losses, not just the means and variances, it provides a unified nonparametric framework for experience rating under arbitrary premium principles. It retains the advantages of the well-known approaches of Bühlmann and Ferguson while overcoming their drawbacks. We first establish a linear Bayes method and prove its strong consistency in nonparametric settings that require only knowledge of the first two moments of the loss distributions considered as a stochastic process. Then an empirical Bayes method is developed for the more general situation where a portfolio of risks is observed but no knowledge is available or assumed on their loss and prior distributions, including their moments. It is shown to be asymptotically optimal. The performance of our estimates in comparison with traditional methods is also evaluated through theoretical analysis and numerical studies, which show that our approach produces premium estimates close to the optima.

1. INTRODUCTION
Pricing or measuring risks is one of the central tasks of financial enterprises and regulators in risk management. Numerous risk measures have been proposed for this purpose, including value-at-risk, conditional value-at-risk, coherent risk measures, and distortion risk measures, to name just a few; see, for example, Dhaene et al. (2006), Natarajan et al. (2009), Szegö (2002), Wu and Zhou (2006), and the references therein. In the context of general insurance, to compensate insured losses in property, health, business, employment, etc., pricing risks is carried out by premium calculation principles, or premium principles for short, which are the translations of risk measures in insurance markets. Excellent reviews of premium principles can be found in Kaas et al. (2001, chapter 5), Sundt (1999), and Young (2004), among others. In this article, the terms "risk measure" and "premium principle" are used interchangeably to indicate a rule $H(X)$ that assigns a premium to a risk $X$ in terms of a functional of its distribution function.
In practice, determining an appropriate premium principle is only the first step in the process of risk pricing, because the distributions involved are generally unknown, so that $H(X)$ can be estimated only by using the insured's claim experience together with certain effective statistical methods. These procedures form the so-called experience rating. We are concerned with solutions to this problem under a Bayesian setting: The decumulative distribution function $\Pr(X > x)$ (abbreviated as ddf hereafter) of a risk $X$ is identified by an unknown and unobservable parameter (vector) $\Theta$, written formally as $S(x, \Theta) = \Pr(X > x \mid \Theta)$, and $\Theta$ is a random variable. The distribution of $\Theta$, denoted by $\pi(\theta)$ (specified or unspecified), is referred to as a prior distribution in statistics and a structural function in actuarial science. The majority of the literature has focused on the situations where $S(x, \Theta)$ is fully specified given $\Theta$ and $\pi(\theta)$ is known (or assumed); thus the problems can be solved with the standard parametric Bayesian methodology through posterior update. Relevant work in this area includes Heilmann (1989), Klugman (1992), Makov et al. (1996), Pai (1997), Schmidt (1998), and Gómez et al. (2000, 2006). Although the applications of the parametric Bayesian methodology have been extensively investigated for general insurance, the reality is that the knowledge of the mechanism underlying the contingent loss is generally insufficient to specify $\pi(\theta)$, making these applications impractical. To deal with such situations, one approach is to allow some unknown parameters in $\pi(\theta)$ but retain an assumed mathematical form of $\pi(\theta)$. Then the risk pricing can be carried out by means of the empirical Bayes analysis introduced by Robbins (1955, 1964).

Address correspondence to Xianyi Wu, Department of Statistics and Actuarial Science, East China Normal University, 200241 Shanghai, China. E-mail: xywu@stat.ecnu.edu.cn

In many insurance practices, however, even a mathematical formula of $S(x, \Theta)$ is not available due to the scarcity of information. Under this circumstance, the analysis can be done only on a distribution-free or nonparametric basis. The best-known solution in the community of actuarial science so far is Bühlmann's approach to the net premium principle (known as credibility theory), whose optimality is demonstrated by Bühlmann (1967). Under the net premium principle, Bühlmann's credibility premium of a future claim $X_{n+1}$ can be simply expressed as a weighted average between the empirical individual mean (indicating the information delivered by historical data of the risk itself) and the collective mean (indicating the aggregated information obtained from all possible insureds):

$$H(X_{n+1}) = z\bar{X}_n + (1 - z)\mu_0, \tag{1.1}$$

where $\bar{X}_n = n^{-1}\sum_{i=1}^{n} X_i$ is the empirical mean of the claims data $\{X_1, X_2, \ldots, X_n\}$, $\mu_0 = E[X]$ is known as the collective mean of the risk, and $z$ is the credibility factor. This linear weighted average has been so well received that it is almost regarded as a synonym for credibility theory. Quite a few remarkable contributions have been reported in this direction since Bühlmann's pioneering work, including the weighted credibility models of Bühlmann and Straub (1970), the credibility for linear regression models by Hachemeister (1975), the strong consistency of credibility estimates by Schmidt (1991), the asymptotic optimality of empirical credibility by Mashayekhi (2002), and the credibility for seemingly unrelated regression models by Pitselis (2004); see Norberg (2004) or Bühlmann and Gisler (2005) for a more comprehensive review.
Another important solution stream to this problem originated from the seminal paper of Ferguson (1973) and a following paper by Antoniak (1974) on Bayesian nonparametrics. It was subsequently applied to experience rating by Zehnwirth (1977, 1979, 1981) and Lau et al. (2006), among others. Its idea is briefly outlined below; more details can be found in the monograph of Ghosh and Ramamoorthi (2003).
Let $X$ and $\Theta$ be two random variables on a measurable space $(\mathbb{R}, \mathcal{B})$, where $X$ has a (conditional) ddf $S(x, \Theta) = \Pr(X > x \mid \Theta)$ and the distribution of $\Theta$ is determined by a finite Borel measure $\alpha(\cdot)$ on the real line $\mathbb{R}$ such that $\Pr(\Theta \in A) = \alpha(A)/\alpha(\mathbb{R})$ for all $A \in \mathcal{B}$. We can consider $\{S(x, \Theta), x \in \mathbb{R}\}$ as a stochastic process indexed by $x$ in the sense that, given each $x \in \mathbb{R}$, $S(x, \Theta)$ is a random variable (as a function of $\Theta$). Denote by $P_\Theta$ the probability measure with ddf $S(x, \Theta)$ such that $P_\Theta((x, \infty)) = S(x, \Theta)$ for all $x \in \mathbb{R}$. Then for each $A \in \mathcal{B}$, $P_\Theta(A)$ is a random variable with distribution determined by $\alpha(\cdot)$, and $\{P_\Theta(A), A \in \mathcal{B}\}$ is a stochastic process indexed by $A \in \mathcal{B}$. It is in the above sense that $P_\Theta$ is considered as a random probability measure and referred to as a Dirichlet process with time horizon $\mathcal{B}$ and prior distribution determined by $\alpha(\cdot)$.
After losses $X_1, \ldots, X_n$ are observed, the posterior distribution of $P_\Theta$ is also a Dirichlet process but with $\alpha$ being updated to $\alpha + \sum_{i=1}^{n} \delta_{X_i}$, where $\delta_{X_i}$ is the probability measure degenerated at $X_i$. Consequently, $S(x, \Theta)$ can be estimated by the posterior mean

$$\hat{S}(x, \Theta) = \frac{\alpha((x, \infty))}{\alpha(\mathbb{R}) + n} + \frac{n}{\alpha(\mathbb{R}) + n}\, S_n(x), \tag{1.2}$$

where $S_n(x) = n^{-1}\sum_{i=1}^{n} I(X_i > x)$. The corresponding empirical Bayes versions when the prior Dirichlet process contains a few superparameters are discussed in Zehnwirth (1981). Then the experience rating can be carried out by inserting the estimated ddf of $X$ into any premium principle.
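
As an illustration of the posterior mean (1.2), the following is a minimal sketch rather than code from the article. The function name, the argument `prior_ddf` (the normalized prior guess $\alpha((x,\infty))/\alpha(\mathbb{R})$), and `total_mass` (standing for $\alpha(\mathbb{R})$) are all hypothetical names chosen for the example.

```python
import numpy as np

def dirichlet_posterior_ddf(x, losses, prior_ddf, total_mass):
    """Posterior-mean ddf in the form of (1.2).

    prior_ddf(x) is assumed to return alpha((x, inf)) / alpha(R);
    total_mass plays the role of alpha(R). Both names are illustrative.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    # Empirical ddf S_n(x): share of observed losses strictly above x.
    empirical = (losses[None, :] > x[:, None]).mean(axis=1)
    # alpha((x, inf)) recovered from the normalized prior guess.
    alpha_tail = total_mass * prior_ddf(x)
    return (alpha_tail + n * empirical) / (total_mass + n)

# Example with an assumed exponential prior guess and alpha(R) = 2.
estimate = dirichlet_posterior_ddf([0.0, 1.5], [1.0, 2.0, 3.0],
                                   lambda t: np.exp(-t), 2.0)
```

The weight on the empirical ddf is $n/(\alpha(\mathbb{R}) + n)$, so a larger `total_mass` keeps the estimate closer to the prior guess.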
For Bühlmann-type solutions, unfortunately, as noted by Bühlmann (1970) and Gerber (1980), it is not easy to directly transplant Bühlmann's method to other premium principles, so that almost all contributions in this area have been limited to the net premium principle. The few exceptions are, chronologically, the variance premium principle (Bühlmann 1970, chapter 4), the Esscher premium principle and a Bühlmann-type credibility estimation of the variance premium principle (Gerber 1980, Goovaerts et al. 1990, and Pan et al. 2008), and the general weighted loss function premium principle (due to Furman and Zitikis 2008) studied by Wen et al. (2009). These exceptions show that it is not feasible to directly apply the idea of Bühlmann (1967) to experience ratemaking under arbitrary premium principles. On the other hand, for Ferguson-type solutions, while the approach provides a unified solution for all premium principles, it requires the assumption of a Dirichlet prior and thus has an apparent drawback: like any other prior in Bayesian methodology, the Dirichlet prior is proposed mainly for mathematical convenience but is hardly justifiable in practice. Without the Dirichlet assumption, the estimator in (1.2) would lose its justification, and the risk premium $H(X_{n+1})$ by Ferguson's approach would not be credible when the true prior is far from the Dirichlet.

In this article we introduce a new distribution-free approach for experience rating under arbitrary premium principles without precise specification of the prior distribution. It has two direct advantages: (1) truly distribution-free settings as in Bühlmann's credibility theory and (2) experience rating for arbitrary premium principles as achieved by Ferguson's Bayesian nonparametric method. The main idea is to first derive an estimate of $S(x, \Theta)$ by minimizing the $L^2$-distance and then embed the estimated ddf, say $\hat{S}(x, \Theta)$, into the premium principle functional to obtain an estimate of the corresponding premium, $\hat{H}(X, \Theta) = H(\hat{S}(x, \Theta))$. In addition, Jewell (1974) estimated $S(x, \Theta)$ separately for every fixed point $x$, so that his estimators are not necessarily ddfs and hence may not produce empirical premiums when plugged into the mathematical formulae of premium principles. In contrast, our estimates work smoothly for empirical ratemaking because they use a ddf to estimate $S(x, \Theta)$.
Specifically, we consider the following two models:
(1) Model I: The first two moments of $S(x, \Theta)$ with respect to the prior can be specified. In this case the data $X_1, X_2, \ldots, X_n, \ldots$ are (conditionally) i.i.d. copies of $X$ given $\Theta$. The investigation of Model I brings insight into and motivates the estimation methods of the more practically meaningful Model II next.
(2) Model II: This is a general model in which we do not need to specify any moments of $S(x, \Theta)$. Suppose that we have a portfolio of risks $X_i$, $i = 1, 2, \ldots, K$, where each $X_i$ is identified with a parameter (vector) $\Theta_i$ and contributes a series of data $X_{i1}, X_{i2}, \ldots, X_{in_i}, \ldots$, which are i.i.d. copies of $X_i$, given $\Theta_i$. The purpose is again to estimate the distributions and then assign proper premiums to the future claims $X_{i,n_i+1}$, $i = 1, 2, \ldots, K$, under a certain premium principle.
This case belongs to the framework of empirical Bayes methods first introduced by Robbins (1955, 1964), then applied to credibility theory by Norberg (1980) under the net premium principle, and by Zehnwirth (1981) for any premium principle under Dirichlet priors.
Our work leads to a new and purely nonparametric estimation method for the ddf under the framework of Bayesian methodology. To overcome the restriction of Dirichlet priors, we develop an approach in a quite different way from the so-called Bayesian nonparametrics initiated by Ferguson (1973). It turns out, however, that our estimation happens to coincide with that of Ferguson (1973) when the prior distribution is Dirichlet; see Remark 2.2 in Section 2 for more details. This is a surprising by-product and shows that our results actually generalize those of Ferguson (1973).
The remainder of the article is organized as follows. In Section 2, starting with a naive idea of using Bühlmann's method considered by Jewell (1974), we show by a counterexample that Jewell's credibility does not estimate a ddf by a function that is itself a ddf. We then develop the criterion for deriving the linear Bayes estimation and discuss the optimal estimation of $S(x, \Theta)$ under Models I and II together with the asymptotic properties of the estimators. Section 3 treats the experience rating under a number of well-known premium principles. Concluding remarks are given in Section 4. To smooth the flow of presentation, most of the technical proofs and auxiliary materials are relegated to appendices.

2. THE LINEAR BAYES ESTIMATION OF DISTRIBUTIONS


We first study Model I; that is, both the ddf $S(x, \Theta)$ of the risk $X$ and the prior distribution $\pi(\theta)$ of the unknown parameter $\Theta$ are unspecified, but their first two moments are known. This assumption will be removed when the empirical Bayes estimation method is utilized. This section first analyzes a naive application of Bühlmann's idea to estimate $S(x, \Theta)$ (Jewell 1974) to show that such a scheme does not work for Model I and therefore a new framework is needed.
Write

$$S_0(x) = E[S(x, \Theta)], \quad \tau_0^2(x) = \mathrm{Var}(S(x, \Theta)), \quad \text{and} \quad \sigma_0^2(x) = E\left[\mathrm{Var}\left(I_{\{X > x\}} \mid \Theta\right)\right], \tag{2.1}$$

where $I_{\{X > x\}}$ is the indicator of the event $\{X > x\}$ and $E$ and $\mathrm{Var}$ indicate that the expectations are computed with respect to the distribution of $\Theta$. We also denote $\mu(\Theta) = E[X \mid \Theta]$, $\sigma^2 = E[\mathrm{Var}(X \mid \Theta)]$, and $\tau^2 = \mathrm{Var}(E[X \mid \Theta])$. The existence of the expectations in (2.1) is obvious.
2.1. The Performance Measure to Be Optimized
Following Jewell (1974), because $S(x, \Theta) = E[I(X > x) \mid \Theta]$ for a fixed $x$, the idea of Bühlmann (1967) suggests a credibility estimate of $S(x, \Theta)$ given by

$$\tilde{S}(x, \Theta) = Z(x) S_n(x) + (1 - Z(x)) S_0(x), \tag{2.2}$$

where $S_n(x) = n^{-1}\sum_{i=1}^{n} I(X_i > x)$ is the empirical ddf based on the available data set $(X_1, X_2, \ldots, X_n)$, and $Z(x) = n\tau_0^2(x)/[\sigma_0^2(x) + n\tau_0^2(x)]$ is the credibility factor with $\tau_0^2(x)$ and $\sigma_0^2(x)$ defined in (2.1); $x$ is included to indicate the dependence of the quantities on $x$. Using the expressions

$$\sigma_0^2(x) = S_0(x) - E\left[S^2(x, \Theta)\right] \quad \text{and} \quad \tau_0^2(x) = E\left[S^2(x, \Theta)\right] - S_0^2(x) \tag{2.3}$$

and inserting $Z(x)$ into (2.2), $\tilde{S}(x, \Theta)$ can be more precisely rewritten as

$$\tilde{S}(x, \Theta) = \sum_{i=1}^{n+1} \frac{(n - i + 1)E[S^2(x, \Theta)] - (n - i)S_0^2(x) - S_0(x)E[S^2(x, \Theta)]}{(n - 1)E[S^2(x, \Theta)] - nS_0^2(x) + S_0(x)}\, I_{[X_{(i-1)}, X_{(i)})}(x), \tag{2.4}$$

where $-\infty = X_{(0)} < X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)} < X_{(n+1)} = \infty$ are the order statistics of the sample.
This $\tilde{S}(x, \Theta)$, however, is not generally monotone in $x$ and thus not suitable as an estimate of the ddf, as explicitly shown in the counterexample below.
Counterexample: Consider the situation where $X$ is nonnegative with $E[S^2(x, \Theta)] = q[(1 - x)_+]^2$ and $S_0(x) = q(1 - x)_+$ for all $x \ge 0$ and a known constant $q \in (0, 1)$, where $(a)_+ = \max(a, 0)$. One example is that $\Theta$ takes only the values 0 (with probability $1 - q$) and 1 (with probability $q$), $X$ degenerates at 0 for $\Theta = 0$, and $S(x, 1) = (1 - x)_+$ for all $x \ge 0$, so that $\sigma_0^2(x) = qx(1 - x)$ and $\tau_0^2(x) = q(1 - q)(1 - x)^2$ for all $x \in [0, 1]$ and zero otherwise. For $x \in (X_{(n)}, 1)$, the empirical ddf $S_n(x)$ vanishes and the credibility estimate of $S(x, \Theta)$ is

$$\tilde{S}(x, \Theta) = (1 - Z(x))S_0(x) = \frac{qx(1 - x)}{n(1 - q) - (n(1 - q) - 1)x},$$

of which the derivative in $x \in (X_{(n)}, 1)$ is

$$\frac{d\tilde{S}(x, \Theta)}{dx} = \frac{q\left[\left(\sqrt{n(1 - q)} - 1\right)x - \sqrt{n(1 - q)}\right]\left[\left(\sqrt{n(1 - q)} + 1\right)x - \sqrt{n(1 - q)}\right]}{[n(1 - q) - (n(1 - q) - 1)x]^2}.$$

It is easy to check that $d\tilde{S}(x, \Theta)/dx$ is strictly positive at every $x \in (0, b)$ and negative at every $x \in (b, 1]$, where $b = \sqrt{n(1 - q)}/(\sqrt{n(1 - q)} + 1) \in (0, 1)$. Consequently, as long as $X_{(n)} < b$, $\tilde{S}(x, \Theta)$ is increasing in $x \in (X_{(n)}, b)$ and decreasing in $x \in (b, 1)$. Thus $\tilde{S}(x, \Theta)$ is not monotone.
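
The counterexample can be checked numerically. The sketch below is an illustration, not part of the article: it evaluates $(1 - Z(x))S_0(x)$ from the stated $\sigma_0^2(x)$, $\tau_0^2(x)$, and $S_0(x)$ on a grid, under the assumption that the largest observed loss lies below the grid so that $S_n(x) = 0$ there. The values `n = 5` and `q = 0.5` are arbitrary choices.

```python
import numpy as np

def naive_tail_estimate(x, n, q):
    """(1 - Z(x)) * S_0(x) in the two-point example, valid where S_n(x) = 0."""
    sigma2 = q * x * (1 - x)             # sigma_0^2(x) on [0, 1]
    tau2 = q * (1 - q) * (1 - x) ** 2    # tau_0^2(x) on [0, 1]
    s0 = q * (1 - x)                     # collective ddf S_0(x) on [0, 1]
    z = n * tau2 / (sigma2 + n * tau2)   # pointwise credibility factor Z(x)
    return (1 - z) * s0

n, q = 5, 0.5                            # illustrative values only
a = n * (1 - q)
b = np.sqrt(a) / (np.sqrt(a) + 1)        # turning point from the text
grid = np.linspace(0.05, 0.95, 19)       # assumes X_(n) < 0.05
values = naive_tail_estimate(grid, n, q)

rising = np.diff(values[grid < b])       # positive: estimate increases
falling = np.diff(values[grid > b])      # negative: estimate decreases
```

The positive entries of `rising` show the estimate increasing in $x$ below $b$, which a ddf cannot do.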
The lack of monotonicity in the naive credibility estimate $\tilde{S}(x, \Theta)$ makes it difficult to, for example,
• give an intuitive interpretation for $\tilde{S}(x, \Theta)$ and
• compute premiums under $\tilde{S}(x, \Theta)$.
This difficulty is obviously due to the dependence of the credibility factor $Z(x)$ on $x$. A natural and tractable remedy is to restrict the credibility factor $Z$ to be a constant free of $x$, construct a convex set of ddfs containing the empirical distributions, and then find an optimal estimate $\hat{S}(x, \Theta)$ in that convex set. In this article, we propose to obtain such an optimal estimate $\hat{S}(x, \Theta)$ by seeking a function of the form

$$\alpha_0(x) + \sum_{i=1}^{n} \alpha_i I(X_i > x) \tag{2.5}$$

to minimize the square of the $L^2$-distance:

$$\min_{\alpha_0(x), \alpha_1, \ldots, \alpha_n} E\left[\int_{-\infty}^{+\infty} \left(S(x, \Theta) - \alpha_0(x) - \sum_{i=1}^{n} \alpha_i I(X_i > x)\right)^2 dx\right], \tag{2.6}$$

where $\alpha_0(x)$ is a nonincreasing function of $x$, independent of the claims history, and $\alpha_1, \ldots, \alpha_n$ are real-valued decision variables of the optimization problem. Though initially it appears that we need to require the estimating function in (2.5) to be a ddf, this turns out to be unnecessary because the solution to (2.6) meets this condition automatically and $\alpha_0(x)$ is proportional to the marginal ddf $S_0(x)$ of $X$; see Theorem 2.1 below.

Remark 2.1. If $X$ has a finite mean, it can be easily checked that the integral in (2.6) as well as $\int \tau_0^2(x)\,dx$ and $\int \sigma_0^2(x)\,dx$ are finite and all the theory below is valid. Otherwise, if the integral in (2.6) may be infinite, so that the criterion (2.6) fails, an easy remedy is to use a probability distribution function $W(x)$ and replace $dx$ with $dW(x)$. Then all the results below remain valid under $W(x)$.

2.2. Estimation of Distribution Functions


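The following theorem gives an estimator of the ddf by solving the optimization problem (2.6).

Theorem 2.1. The credibility estimator of the ddf $S(x, \Theta)$, which minimizes (2.6), is given by

$$\hat{S}(x, \Theta) = Z S_n(x) + (1 - Z) S_0(x), \tag{2.7}$$

where

$$Z = \frac{n\tau_0^2}{n\tau_0^2 + \sigma_0^2} \quad \text{with} \quad \tau_0^2 = \int \tau_0^2(x)\,dx \quad \text{and} \quad \sigma_0^2 = \int \sigma_0^2(x)\,dx, \tag{2.8}$$

which is referred to as the credibility factor as well. Moreover, the mean integrated squared error for the optimal estimate is

$$E\left[\int_{-\infty}^{+\infty} \left(\hat{S}(x, \Theta) - S(x, \Theta)\right)^2 dx\right] = \frac{\tau_0^2 \sigma_0^2}{\sigma_0^2 + n\tau_0^2}. \tag{2.9}$$

Proof. By temporarily writing $Y_i = I(X_i > x)$ and using the notation in (2.1), the integrand in (2.6) can be computed by

$$E\left[\left(S(x, \Theta) - \alpha_0(x) - \sum_{i=1}^{n} \alpha_i Y_i\right)^2\right] = \mathrm{Var}\left(S(x, \Theta) - \sum_{i=1}^{n} \alpha_i Y_i\right) + \left[\left(1 - \sum_{i=1}^{n} \alpha_i\right) S_0(x) - \alpha_0(x)\right]^2$$

$$= \sum_{i=1}^{n} \alpha_i^2\, \sigma_0^2(x) + \left(1 - \sum_{i=1}^{n} \alpha_i\right)^2 \tau_0^2(x) + \left[\left(1 - \sum_{i=1}^{n} \alpha_i\right) S_0(x) - \alpha_0(x)\right]^2$$

and thus is minimized with respect to $\alpha_0(x)$ at $\alpha_0(x) = \left(1 - \sum_{i=1}^{n} \alpha_i\right) S_0(x)$. The corresponding minimum is

$$\mathrm{Var}\left(S(x, \Theta) - \sum_{i=1}^{n} \alpha_i Y_i\right) = \sum_{i=1}^{n} \alpha_i^2\, \sigma_0^2(x) + \left(1 - \sum_{i=1}^{n} \alpha_i\right)^2 \tau_0^2(x).$$

Integrating it with respect to $x$ and then minimizing with respect to the $\alpha_i$ leads to $\alpha_j = \tau_0^2/(n\tau_0^2 + \sigma_0^2)$, $j = 1, 2, \ldots, n$, and the final minimum $\tau_0^2\sigma_0^2/(\sigma_0^2 + n\tau_0^2)$. This completes the proof. □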
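
To make Theorem 2.1 concrete, here is a small sketch (an illustration under stated assumptions, not the authors' code) that evaluates (2.7) to (2.9) for the two-point example of Section 2.1, where integrating the stated $\tau_0^2(x)$ and $\sigma_0^2(x)$ over $[0, 1]$ gives $\tau_0^2 = q(1 - q)/3$ and $\sigma_0^2 = q/6$. The function name and the sample losses are hypothetical.

```python
import numpy as np

def credibility_ddf(x, losses, s0, tau2, sigma2):
    """Constant-Z credibility estimate (2.7) with Z from (2.8) and MISE from (2.9)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    z = n * tau2 / (n * tau2 + sigma2)
    s_n = (losses[None, :] > x[:, None]).mean(axis=1)   # empirical ddf S_n(x)
    mise = tau2 * sigma2 / (sigma2 + n * tau2)
    return z, z * s_n + (1 - z) * s0(x), mise

q = 0.5                                   # illustrative value
tau2 = q * (1 - q) / 3                    # integral of q(1-q)(1-x)^2 over [0, 1]
sigma2 = q / 6                            # integral of q x (1-x) over [0, 1]

def s0(x):
    # Collective ddf: 1 below zero, q(1 - x)_+ for x >= 0.
    return np.where(x < 0, 1.0, q * np.clip(1 - x, 0.0, None))

losses = [0.2, 0.4, 0.5, 0.9]             # made-up claims history, n = 4
grid = np.linspace(-0.5, 1.5, 201)
z, estimate, mise = credibility_ddf(grid, losses, s0, tau2, sigma2)
```

Because $Z$ no longer depends on $x$, the estimate is a convex combination of two nonincreasing functions and is therefore nonincreasing itself, unlike the naive estimate (2.2).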

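Remark 2.2. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $S(x, \Theta) = \Pr(X > x \mid \Theta) = P_\Theta((x, \infty))$, where $P_\Theta$ is a random probability measure on $(\mathbb{R}, \mathcal{B})$ (a stochastic process indexed by $B \in \mathcal{B}$, for which the randomness comes from $\Theta(\omega)$, $\omega \in \Omega$). If $P_\Theta$ is a Dirichlet process with parameter $\alpha(\cdot)$ (a finite Borel measure on the real line $\mathbb{R}$) as defined in Ferguson (1973), then $S(x, \Theta) \sim \mathrm{Beta}(\alpha(x), \alpha(\mathbb{R}) - \alpha(x))$ with density

$$\pi(\theta \mid x) = \frac{\Gamma(\alpha(\mathbb{R}))}{\Gamma(\alpha(x))\,\Gamma(\alpha(\mathbb{R}) - \alpha(x))}\, \theta^{\alpha(x) - 1}(1 - \theta)^{\alpha(\mathbb{R}) - \alpha(x) - 1},$$

where $\alpha(x) = \alpha((x, \infty))$, so that

$$S_0(x) = E[S(x, \Theta)] = \frac{\alpha(x)}{\alpha(\mathbb{R})} \quad \text{and} \quad E\left[S^2(x, \Theta)\right] = \frac{\alpha(x)(\alpha(x) + 1)}{\alpha(\mathbb{R})(\alpha(\mathbb{R}) + 1)}.$$

Consequently, by (2.3),

$$\sigma_0^2(x) = \frac{\alpha(x)(\alpha(\mathbb{R}) - \alpha(x))}{\alpha(\mathbb{R})(\alpha(\mathbb{R}) + 1)} \quad \text{and} \quad \tau_0^2(x) = \frac{\alpha(x)(\alpha(\mathbb{R}) - \alpha(x))}{\alpha^2(\mathbb{R})(\alpha(\mathbb{R}) + 1)}.$$

Since $\sigma_0^2(x) = \alpha(\mathbb{R})\,\tau_0^2(x)$ for every $x$, inserting these into (2.8) gives the credibility factor $Z = n\tau_0^2/(n\tau_0^2 + \sigma_0^2) = n/(\alpha(\mathbb{R}) + n)$. Thus the estimate in (2.7) is the same as (1.2). In this respect, Theorem 2.1 acts as a linearized version of Ferguson's theory with weaker prior assumptions: Theorem 2.1 does not require the Dirichlet prior as in Ferguson (1973).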

Since $Z \in (0, 1)$, as a weighted sum of the empirical ddf $S_n(x)$ and the marginal ddf $S_0(x)$ of $X$ (the so-called collective ddf), the credibility estimator (2.7) is clearly a ddf. In addition, $Z \to 1$ as $n \to \infty$ and $Z \to 0$ as $n \to 0$, which allows the classical credibility interpretation: More data lead to a more credible empirical ddf $S_n(x)$, where $n = 0$ indicates the extreme situation where no sample is observed. Furthermore, we have the following theorem on the strong consistency of the estimator.

Theorem 2.2. The estimator (2.7) is uniformly strongly consistent:

$$\lim_{n \to \infty} \sup_x \left|S(x, \Theta) - \hat{S}(x, \Theta)\right| = 0 \quad \text{almost surely (a.s.)}.$$

Proof. This follows from the well-known Glivenko-Cantelli Theorem on empirical distributions:

$$\sup_x \left|S(x, \Theta) - \hat{S}(x, \Theta)\right| \le Z \sup_x |S(x, \Theta) - S_n(x)| + (1 - Z) \sup_x |S(x, \Theta) - S_0(x)| \to 0$$

(cf., e.g., DasGupta 2008, Theorem 1.7, p. 5), where the second term vanishes because $1 - Z \to 0$ and the supremum is bounded by 1. □

2.3. The Empirical Bayes Estimation


We now consider the general situation where both the conditional ddf $S(x, \Theta)$ and the structure function $\pi(\theta)$ are completely unknown, including the structure parameters $S_0(x)$, $\tau_0^2$, and $\sigma_0^2$. We propose an empirical Bayes method to estimate $S_0(x)$, $\tau_0^2$, and $\sigma_0^2$ based on the claim experiences over a number of risks in the same portfolio. More specifically, let $X_1, X_2, \ldots, X_K$ denote $K$ risks under observation. Each $X_i$ has a ddf characterized by a risk parameter $\Theta_i$ and contributes a sequence of claim experiences denoted by the vector $\mathbf{X}_i = (X_{i1}, X_{i2}, \ldots, X_{in_i})$ over $n_i$ time periods, $i = 1, \ldots, K$, subject to the following usual assumptions.

Assumption 2.1

1. Conditional on $\Theta_i$, the random variables $X_{ij}$ ($j = 1, 2, \ldots, n_i$) are i.i.d. copies of $X_i$ with common unknown ddf and moments:

$$S(x, \Theta_i) = \Pr\left(X_{ij} > x \mid \Theta_i\right), \quad \sigma^2(x, \Theta_i) = \mathrm{Var}\left(I_{(X_{ij} > x)} \mid \Theta_i\right), \quad i = 1, 2, \ldots, K, \; j = 1, 2, \ldots, n_i.$$

2. The random variables $\Theta_1, \ldots, \Theta_K$ are $K$ i.i.d. random variables with a common but unknown prior distribution $\pi(\theta)$, so that $S_0(x) = E[S(x, \Theta)]$, $\tau_0^2(x) = \mathrm{Var}(S(x, \Theta))$, and $\sigma_0^2(x) = E[\sigma^2(x, \Theta)]$.

The task of estimating the individual ddf in a distribution-free setting is then accomplished in two steps. The first step involves homogeneous estimation of the ddfs, and the second estimates the structural parameters $\tau_0^2$ and $\sigma_0^2$.

2.3.1. Homogeneous Estimation of the Distribution Functions


Write $S_i(x) = n_i^{-1}\sum_{j=1}^{n_i} I(X_{ij} > x)$, $i = 1, 2, \ldots, K$. To obtain the credibility estimator of $S(x, \Theta_i)$, consider the class of ddfs

$$\mathcal{L} = \left\{\sum_{s=1}^{K}\sum_{t=1}^{n_s} a_{st} I(X_{st} > x) : a_{st} \in \mathbb{R}, \; \sum_{s=1}^{K}\sum_{t=1}^{n_s} a_{st} = 1\right\} \tag{2.10}$$

and solve the minimization problem

$$\min_{g \in \mathcal{L}} E\left[\int_{-\infty}^{+\infty} \left[g - S(x, \Theta_i)\right]^2 dx\right]. \tag{2.11}$$

The solution is stated in the next theorem. The proof of the first part is similar to that of Theorem 2.1 and hence omitted; the proof of the second part is given in Section A.1.1 of the appendices.

Theorem 2.3. The homogeneous credibility estimator of $S(x, \Theta_i)$ as the solution to (2.11) is

$$\hat{S}^{\mathrm{hom}}(x, \Theta_i) = Z_i S_i(x) + (1 - Z_i)\hat{S}(x), \tag{2.12}$$

where

$$Z_i = \frac{n_i \tau_0^2}{\sigma_0^2 + n_i \tau_0^2}, \quad i = 1, 2, \ldots, K, \quad \text{and} \quad \hat{S}(x) = \frac{1}{\sum_{r=1}^{K} Z_r} \sum_{r=1}^{K} Z_r S_r(x). \tag{2.13}$$

Moreover, the mean integrated squared error of $\hat{S}^{\mathrm{hom}}(x, \Theta_i)$ is

$$E\left[\int \left(\hat{S}^{\mathrm{hom}}(x, \Theta_i) - S(x, \Theta_i)\right)^2 dx\right] = \frac{\sum_{r \ne i} Z_r + 1}{\sum_{r \ne i} Z_r + Z_i} \cdot \frac{\tau_0^2 \sigma_0^2}{\sigma_0^2 + n_i \tau_0^2}. \tag{2.14}$$

It is obvious that the mean integrated squared error in (2.14):
• is decreasing in every $n_r$, $r = 1, 2, \ldots, K$, indicating that adding samples to any policy improves the estimate;
• converges to zero as $n_i \to \infty$, regardless of how the other $n_r$ ($r \ne i$) change, indicating that increasing the sample size of policy $i$ will pinpoint the true $S(x, \Theta_i)$; and
• does not tend to zero if $n_i \not\to \infty$, indicating that observing the claims from the policies other than $i$ cannot pinpoint the true risk characteristic of policy $i$.
One can also consider the class

$$\mathcal{L}_1 = \left\{\alpha_0(x) + \sum_{s=1}^{K}\sum_{t=1}^{n_s} a_{st} I(X_{st} > x) : a_{st} \in \mathbb{R}, \text{ and } \alpha_0(x) \text{ is an arbitrary ddf}\right\} \tag{2.15}$$

to derive an inhomogeneous estimator. The resulting estimator, however, is just the same as the one presented in Equations (2.7)–(2.8). This is a typical phenomenon in credibility theory; see, for example, Bühlmann and Gisler (2005), section 3.1.4.

2.3.2. Estimation of Structure Parameters


We now proceed to the estimation of the structure parameters $\tau_0^2$ and $\sigma_0^2$, which needs the estimation of $S_0(x)$. At this moment, however, $\hat{S}(x)$ is not a suitable estimate of $S_0(x)$ because the $Z_i$'s involve $\tau_0^2$ and $\sigma_0^2$ (cf. [2.13]). On the other hand, if we use an estimator that minimizes $\int (\bar{S}(x) - S_0(x))^2\,dx$, where $\bar{S}(x) = \sum_{i=1}^{K} w_i S_i(x)$, with respect to $w_1, w_2, \ldots, w_K$, $\sum_{i=1}^{K} w_i = 1$, the solution for $\bar{S}(x)$ is also $\bar{S}(x) = \hat{S}(x)$. Therefore, it is reasonable to select the unbiased estimate

$$\bar{S}(x) = \frac{1}{N}\sum_{i=1}^{K} n_i S_i(x), \quad \text{where } N = \sum_{i=1}^{K} n_i \text{ depends on the portfolio size } K. \tag{2.16}$$

Recall that under the general strong law of large numbers, if the summands are mutually independent, then $c_n^{-1}\sum_{j=1}^{n} (X_j - E[X_j]) \to 0$ a.s. for any sequence $\{c_n\}$ of real numbers, provided that $\sum_{k=1}^{\infty} c_k^{-2}\,\mathrm{Var}(X_k) < \infty$; see, e.g., theorem 3.1 of DasGupta (2008, p. 35). It is then easy to see that $\bar{S}(x)$ is strongly consistent: Taking $c_K = N$, since $\mathrm{Var}[n_K S_K(x)] = \mathrm{Var}\left[\sum_{j=1}^{n_K} I(X_{Kj} > x)\right] \le 2n_K^2$, it follows that $\bar{S}(x) \to S_0(x)$ a.s. as $K \to \infty$ under the condition

$$\sum_{K=1}^{\infty} \frac{n_K^2}{N^2} < \infty. \tag{2.17}$$

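In view of the definitions of $\sigma_0^2(x)$ and $\tau_0^2(x)$, the parameters $\sigma_0^2$ and $\tau_0^2$ can be estimated using

$$\mathrm{SSE}(x) = \sum_{i=1}^{K}\sum_{j=1}^{n_i} \left[I\left(X_{ij} > x\right) - S_i(x)\right]^2 \quad \text{and} \quad \mathrm{SSA}(x) = \sum_{i=1}^{K} n_i \left[S_i(x) - \bar{S}(x)\right]^2, \tag{2.18}$$

an idea borrowed from the analysis of variance (ANOVA). The estimators can be formally defined as

$$\hat{\sigma}_0^2 = \frac{1}{N - K}\int \mathrm{SSE}(x)\,dx \tag{2.19}$$

and

$$\hat{\tau}_0^2 = \frac{N}{N^2 - \sum_{i=1}^{K} n_i^2}\left[\int \mathrm{SSA}(x)\,dx - \frac{K - 1}{N - K}\int \mathrm{SSE}(x)\,dx\right]. \tag{2.20}$$

Order the claims of individual $i$ increasingly as $X_{i(1)}, X_{i(2)}, \ldots, X_{i(n_i)}$ and all $N = n_1 + \cdots + n_K$ claims jointly as $R_1, R_2, \ldots, R_N$. Then some algebraic computations give rise to

$$\int \mathrm{SSE}(x)\,dx = \sum_{i=1}^{K}\sum_{j=1}^{n_i} \left(\frac{2j - 1}{n_i} - 1\right) X_{i(j)} \quad \text{and} \tag{2.21}$$

$$\int \mathrm{SSA}(x)\,dx = \sum_{i=1}^{K} \frac{1}{n_i}\sum_{j=1}^{n_i} (2n_i - 2j + 1)\, X_{i(j)} - \frac{1}{N}\sum_{j=1}^{N} (2N - 2j + 1) R_j, \tag{2.22}$$

which are useful in computation.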
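
The closed forms (2.21) and (2.22) translate directly into code. The following is a minimal sketch, assuming each risk's claims are given as a list in `samples` and that $N > K$; the function names are hypothetical and not taken from the article.

```python
import numpy as np

def integrated_sse(samples):
    """Integral of SSE(x) via the order-statistic form (2.21)."""
    total = 0.0
    for xs in samples:
        xs = np.sort(np.asarray(xs, dtype=float))   # X_i(1) <= ... <= X_i(n_i)
        n = xs.size
        j = np.arange(1, n + 1)
        total += np.sum(((2 * j - 1) / n - 1) * xs)
    return total

def integrated_ssa(samples):
    """Integral of SSA(x) via the order-statistic form (2.22)."""
    total = 0.0
    for xs in samples:
        xs = np.sort(np.asarray(xs, dtype=float))
        n = xs.size
        j = np.arange(1, n + 1)
        total += np.sum((2 * n - 2 * j + 1) * xs) / n
    # R_1 <= ... <= R_N: all claims of the portfolio ordered jointly.
    pooled = np.sort(np.concatenate([np.asarray(xs, dtype=float) for xs in samples]))
    big_n = pooled.size
    j = np.arange(1, big_n + 1)
    return total - np.sum((2 * big_n - 2 * j + 1) * pooled) / big_n

def structure_estimates(samples):
    """Estimators (2.19) and (2.20); requires N > K."""
    k = len(samples)
    sizes = np.array([len(xs) for xs in samples])
    big_n = sizes.sum()
    sse, ssa = integrated_sse(samples), integrated_ssa(samples)
    sigma2_hat = sse / (big_n - k)
    tau2_hat = big_n / (big_n ** 2 - np.sum(sizes ** 2)) * (
        ssa - (k - 1) / (big_n - k) * sse)
    return sigma2_hat, tau2_hat
```

Note that `tau2_hat` is a difference of two terms and can come out negative in small portfolios; the text above does not say how such a case should be handled, so the sketch returns the raw value.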


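The theorem below provides the properties of estimators (2.19) and (2.20); its proof is given in Section A.1.2 of the appendices.

Theorem 2.4. $\hat{\sigma}_0^2$ and $\hat{\tau}_0^2$ have the following properties.
1. $\hat{\sigma}_0^2$ and $\hat{\tau}_0^2$ are unbiased estimators of $\sigma_0^2$ and $\tau_0^2$, respectively.
2. Under condition (2.17), $\hat{\sigma}_0^2 \to \sigma_0^2$ and $\hat{\tau}_0^2 \to \tau_0^2$ almost surely as $K \to \infty$.

Finally, by inserting the estimates of the structure parameters $\sigma_0^2$ and $\tau_0^2$ into (2.12) and (2.13), we can get empirical Bayes estimators of the ddfs $S(x, \Theta_i)$ as

$$\hat{S}^{\mathrm{EB}}(x, \Theta_i) = \hat{Z}_i S_i(x) + \left(1 - \hat{Z}_i\right)\hat{S}_0(x), \quad i = 1, 2, \ldots, K, \tag{2.23}$$

where

$$\hat{Z}_i = \frac{n_i \hat{\tau}_0^2}{\hat{\sigma}_0^2 + n_i \hat{\tau}_0^2}, \quad i = 1, 2, \ldots, K, \quad \text{and} \quad \hat{S}_0(x) = \frac{1}{\sum_{r=1}^{K} \hat{Z}_r}\sum_{r=1}^{K} \hat{Z}_r S_r(x). \tag{2.24}$$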
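
The empirical Bayes estimator (2.23)–(2.24) can be assembled as follows. This is a sketch under the assumption that estimates of the two structure parameters are already available and that the estimate of $\tau_0^2$ is positive; the function and argument names are hypothetical.

```python
import numpy as np

def empirical_bayes_ddfs(x, samples, sigma2_hat, tau2_hat):
    """Empirical Bayes ddf estimates (2.23) with weights and collective ddf from (2.24).

    Returns the estimated credibility factors (length K) and a K x len(x)
    array whose i-th row is the estimated ddf of risk i on the grid x.
    Assumes tau2_hat > 0 so that every factor lies in (0, 1).
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sizes = np.array([len(s) for s in samples], dtype=float)
    z_hat = sizes * tau2_hat / (sigma2_hat + sizes * tau2_hat)
    # Row i holds the empirical ddf S_i(x) of risk i.
    s_i = np.array([
        (np.asarray(s, dtype=float)[None, :] > x[:, None]).mean(axis=1)
        for s in samples
    ])
    # Credibility-weighted collective ddf.
    s0_hat = (z_hat[:, None] * s_i).sum(axis=0) / z_hat.sum()
    ddfs = z_hat[:, None] * s_i + (1 - z_hat[:, None]) * s0_hat[None, :]
    return z_hat, ddfs

# Illustrative call with made-up claims and made-up structure estimates.
samples = [[1.0, 3.0, 2.0], [0.5, 4.0], [2.5, 1.5, 3.5, 0.2]]
z_hat, ddfs = empirical_bayes_ddfs(np.linspace(0, 5, 51), samples, 1.0, 0.5)
```

Risks with more observations receive a larger factor and therefore lean more on their own empirical ddf.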

2.3.3. Asymptotic Optimality


We now establish the asymptotic optimality of $\hat{S}^{\mathrm{EB}}(x, \Theta_i)$ (a similar treatment for the credibility estimate of Bühlmann [1967] can be found in Mashayekhi [2002], whereas an earlier discussion is Norberg [1980]). For simplicity, we discuss the balanced case in which all the $n_i$ take the same value, say $n_i \equiv n$, and thus the $Z_i$ are all the same over $i = 1, 2, \ldots, K$ (so are the $\hat{Z}_i$) and, consequently,

$$\hat{S}_0(x) = \frac{1}{K}\sum_{r=1}^{K} S_r(x).$$

It turns out that the asymptotic optimality needs an extra condition

$$\int_{-\infty}^{+\infty} \left[\frac{\sigma_0^2(x)}{n} + \tau_0^2(x)\right]^{\delta/(2+\delta)} dx < \infty \quad \text{for some constant } \delta > 0. \tag{2.25}$$

The result is stated in the theorem below, and its proof is found in Section A.1.3 of the appendices.

Theorem 2.5. Under conditions (2.17) and (2.25), the $\hat{S}^{\mathrm{EB}}(x, \Theta_i)$ defined by (2.23) and (2.24) are asymptotically optimal in the sense that

$$\lim_{K \to \infty} \max_{1 \le i \le K} \left| E\left[\int_{-\infty}^{+\infty} \left(\hat{S}^{\mathrm{EB}}(x, \Theta_i) - S(x, \Theta_i)\right)^2 dx\right] - E\left[\int_{-\infty}^{+\infty} \left(\hat{S}(x, \Theta_i) - S(x, \Theta_i)\right)^2 dx\right] \right| = 0, \tag{2.26}$$

where $\hat{S}(x, \Theta_i)$ is the linear estimator defined by (2.7) and (2.8).

3. APPLICATIONS TO EXPERIENCE RATEMAKING


As mentioned earlier, in this article we are concerned with premium principles $H(X)$ that are expressed as a functional of the ddf of $X$; for example, for a nonnegative loss $X$ the net premium principle is expressed as $H(X) = E[X] = \int_0^\infty S(x)\,dx$, where $S(x)$ indicates the ddf of $X$. Having estimated the distributions, the application to experience ratemaking is apparent: replace the ddf $S(x)$ by its estimate $\hat{S}(x)$; for example, $\hat{H}(X) = \int_0^\infty \hat{S}(x)\,dx$ for the net premium principle. Details for every premium principle can be found below.
For a premium principle $H$, write $H(X \mid \Theta)$ for the premium of a risk $X$ that is calculated by applying $H$ to $X$ with ddf $S(x, \Theta)$, referred to also as the risk premium of $X$. By replacing $S(x, \Theta)$ with the credibility estimator $\hat{S}(x, \Theta)$ (cf. [2.7]), or the empirical Bayes estimator $\hat{S}^{\mathrm{EB}}(x, \Theta_i)$ (cf. [2.23]) under Model II for risk $i$, an estimator $\hat{H}(X \mid \Theta)$ of the risk premium $H(X \mid \Theta)$ is obtained. We discuss the properties of the experience premium obtained by inserting $\hat{S}(x, \Theta)$ for $S(x, \Theta)$ in Section 3.1 and compare it with existing methods in Section 3.2. The performance of the empirical Bayes estimator $\hat{S}^{\mathrm{EB}}(x, \Theta_i)$ of $S(x, \Theta_i)$ and the corresponding version of experience ratemaking using $\hat{S}^{\mathrm{EB}}(x, \Theta_i)$ are discussed in Section 3.3 by means of numerical studies.
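
As a small illustration of the plug-in step for the net premium principle, the sketch below integrates the credibility ddf numerically over $[0, \infty)$ for nonnegative losses. It is not the authors' code: the exponential collective ddf with mean `mu0`, the factor `z`, the truncation point `upper`, and the claims are all assumptions made for the example.

```python
import numpy as np

def plug_in_net_premium(losses, z, collective_ddf, upper, num=200_001):
    """Net premium as the integral over [0, upper] of z*S_n(x) + (1-z)*S_0(x).

    upper truncates the integral and should lie well beyond the observed
    losses and the bulk of the collective distribution.
    """
    losses = np.asarray(losses, dtype=float)
    grid = np.linspace(0.0, upper, num)
    s_n = (losses[None, :] > grid[:, None]).mean(axis=1)   # empirical ddf
    s_hat = z * s_n + (1 - z) * collective_ddf(grid)       # credibility ddf
    # Trapezoidal rule written out explicitly.
    return float(np.sum((s_hat[1:] + s_hat[:-1]) / 2 * np.diff(grid)))

mu0 = 2.0                                  # assumed collective mean
losses = [1.0, 2.0, 6.0]                   # made-up claims, mean 3
z = 0.6                                    # assumed credibility factor
premium = plug_in_net_premium(losses, z, lambda t: np.exp(-t / mu0), upper=60.0)
```

For nonnegative losses the integral of the empirical ddf is the sample mean and the integral of the collective ddf is the collective mean, so `premium` should agree, up to discretization error, with the linear form $Z\bar{X}_n + (1 - Z)\mu_0$ of (1.1), here $0.6 \cdot 3 + 0.4 \cdot 2 = 2.6$.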

3.1. Experience Ratemaking and the Consistency


Because the target of pricing is to pinpoint the risk premium $H(X \mid \Theta)$, it is a basic requirement that, as an estimator of the risk premium, $\hat{H}(X \mid \Theta)$ should pinpoint the risk premium if the experience can provide perfect statistical information. In statistical language, this can be achieved by the strong consistency of the ratemaking, in the sense that with probability one the estimate $\hat{H}(X \mid \Theta)$ approaches the true value of $H(X \mid \Theta)$ as the available information increases without limit. We first present the strong consistency of $\hat{H}(X \mid \Theta)$ in the theorem below, which follows directly from Theorems 2.1 and 2.2.

Theorem 3.1. The estimator $\hat{H}(X \mid \Theta)$ is strongly consistent for $H(X \mid \Theta)$ if the premium principle $H$ is continuous with respect to the $L^\infty$-norm of ddfs, where the $L^\infty$-norm of a function $f(x)$ on $\mathbb{R}$ is denoted and defined by $\|f(x)\|_\infty = \sup_{x \in \mathbb{R}} |f(x)|$. In other words, if the estimator $\hat{S}(x, \Theta)$ satisfies $\sup_{x \in \mathbb{R}} |\hat{S}(x, \Theta) - S(x, \Theta)| \to 0$ a.s., then $\hat{H}(X \mid \Theta) \to H(X \mid \Theta)$ a.s.

According to Theorem 3.1, the strong consistency of $\hat{H}(X \mid \Theta)$ is guaranteed under certain regularity conditions. This is a significant improvement over such literature as Gerber (1980), where the credibility estimator is not generally (strongly) consistent, and Pan et al. (2008) and Wen et al. (2009), where the consistency of the credibility estimators needs to be proved separately in every case. Quite a few premium principles $H$ can be represented as a continuous function of expectations of certain functions of $X$, for example, Kamps' premium $H(X \mid \Theta) = E[X(1 - e^{-\lambda X}) \mid \Theta]/E[(1 - e^{-\lambda X}) \mid \Theta]$. When the functions are bounded and continuous on the support of $X$, it is well known that $H$ is continuous with respect to the weak convergence of the distribution of $X$ (Portmanteau theorem; cf. DasGupta 2008, Theorem 1.4), which is a stronger requirement than the continuity with respect to the $L^\infty$-norm in our Theorem 3.1. If the limiting ddf is continuous in $x$, however, these two conditions are equivalent; see theorem 1.3 of DasGupta (2008).
There are many well-known and extensively discussed premium principles (cf. Young 2004) for which strong consistency of the
experience ratemaking can be easily checked, although not as a result of Theorem 3.1. They include the net premium $H(X|\theta) =
E[X|\theta]$, variance premium $H(X|\theta) = E[X|\theta] + \lambda\,\mathrm{Var}(X|\theta)$, modified variance premium $H(X|\theta) = E[X|\theta] + \lambda\,\mathrm{Var}(X|\theta)/E[X|\theta]$,
standard deviation premium $H(X|\theta) = E[X|\theta] + \lambda\sqrt{\mathrm{Var}(X|\theta)}$, Esscher premium $H(X|\theta) = E[Xe^{hX}|\theta]/E[e^{hX}|\theta]$, Kamps'
premium $H(X|\theta) = E[X(1-e^{-\lambda X})|\theta]/E[(1-e^{-\lambda X})|\theta]$, conditional tail expectation premium $H(X|\theta) = E[X|X > x_q, \theta]$, and
exponential premium $H(X|\theta) = \alpha^{-1}\log[E(e^{\alpha X}|\theta)]$. The following are two exceptions in which the consistency is not straightforward.

1. Dutch premium principle: $H(X|\theta) = E[X|\theta] + \rho\,E[(X - \alpha E[X|\theta])_+ | \theta]$, where $0 < \rho \le 1$ and $\alpha \ge 1$. Observe that
$E_{S_n}[(X - \alpha E[X|\theta])_+] = n^{-1}\sum_{i=1}^n (X_i - \alpha\hat\mu(\theta))_+$, where $E_{S_n}[g(X)]$ denotes the expectation of $g(X)$ with respect to the
distribution generated by $S_n(x)$. The estimate of $H(X|\theta)$ is then given by

$$\hat H(X|\theta) = \hat\mu(\theta) + \rho\left[\frac{Z}{n}\sum_{i=1}^n (X_i - \alpha\hat\mu(\theta))_+ + (1-Z)\int (x - \alpha\hat\mu(\theta))_+\,dF_0(x)\right], \tag{3.1}$$

where $F_0(x) = 1 - S_0(x)$ is the marginal cumulative distribution function of $X$, $\hat\mu(\theta) = Z\bar X_n + (1-Z)\mu_0$, and $\bar X_n = n^{-1}\sum_{i=1}^n X_i$
(which will be used thereafter). Its consistency is proved in Section A.2.1 of the appendices.
2. Distortion premium principle:

$$H(X|\theta) = \int_0^{\infty} g(S(x,\theta))\,dx - \int_{-\infty}^{0} g(1 - S(x,\theta))\,dx. \tag{3.2}$$

The premium estimate is

$$\hat H(X|\theta) = \int_0^{\infty} g\big(ZS_n(x) + (1-Z)S_0(x)\big)\,dx - \int_{-\infty}^{0} g\big(ZF_n(x) + (1-Z)F_0(x)\big)\,dx,$$

where the credibility factor $Z$ is given by (2.8). For example, if $X$ has bounded support, then $\hat H(X|\theta) \to H(X|\theta)$ a.s. as $n \to \infty$
if $g$ is a Lipschitz function on $[0, 1]$ such that $|g(x) - g(y)| \le C|x - y|$ for some constant $C$, which follows easily from
the Glivenko-Cantelli theorem. For the case of unbounded support, the consistency still holds if $g'$ is a Lipschitz function on
$[0, 1]$; see Section A.2.2 of the appendices for more details.
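
The plug-in distortion premium can be sketched numerically as follows. This is only an illustration under stated assumptions, not the authors' implementation: losses are taken to be nonnegative and $g(0) = 0$, so the second integral vanishes; the collective ddf is a hypothetical exponential one; the credibility factor is passed in rather than computed from (2.8); and the function names are invented for this example.

```python
import numpy as np

def empirical_ddf(sample, x):
    """S_n(x): fraction of observations strictly greater than x."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    return 1.0 - np.searchsorted(ordered, x, side="right") / ordered.size

def distortion_premium(sample, s0, z, g, upper, num=200_001):
    """Trapezoidal approximation of the integral of g(Z*S_n + (1-Z)*S_0) over [0, upper].

    Assumes nonnegative losses and g(0) = 0, so the integral over the
    negative half-line in the distortion premium is zero.
    """
    x = np.linspace(0.0, upper, num)
    s_hat = z * empirical_ddf(sample, x) + (1.0 - z) * s0(x)
    y = g(s_hat)
    return float(np.sum(0.5 * (y[:-1] + y[1:]) * np.diff(x)))

# Hypothetical inputs: three observed claims and an exponential collective ddf.
claims = [1.0, 2.0, 3.0]
collective_ddf = lambda x: np.exp(-x)   # collective mean equal to 1

# With g(u) = u the distortion premium reduces to the net premium.
net = distortion_premium(claims, collective_ddf, z=0.5, g=lambda u: u, upper=50.0)
# A concave distortion such as g(u) = sqrt(u) adds a positive loading.
loaded = distortion_premium(claims, collective_ddf, z=0.5, g=np.sqrt, upper=50.0)
```

With the identity distortion the result agrees with $Z\bar X_n + (1-Z)\mu_0$ up to discretization error, which gives a quick sanity check of the integration grid.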

3.2. Performance Evaluation of Experience Rating by $\hat S(x,\theta)$


In this section, we evaluate the performance of the newly proposed approach by comparing it with some well-known
representative methods in the literature. More specifically, the experience rating developed in the last section is compared with the
methods of Bühlmann (1967) for a net premium, Bühlmann (1970) for a variance premium, and the credibility premiums of Gerber
(1980) and Pan et al. (2008) under the Esscher premium principle. This section uses the notation $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$.

3.2.1. Evaluation under the Net Premium Principle


Under the net premium, $H(X|\theta) = E[X|\theta] = \mu(\theta)$. Replacing $S(x,\theta)$ with the estimate $\hat S(x,\theta)$ presented in (2.7), the estimate
of $H(X|\theta)$ is

$$\hat H(X|\theta) = Z\bar X_n + (1-Z)\mu_0, \tag{3.3}$$

which has the same form as Bühlmann's credibility premium $\hat\mu^c(\theta) = Z^c\bar X_n + (1-Z^c)\mu_0$ but uses a different credibility
factor, where the superscript $c$ denotes "classical," $Z^c = n\tau^2/(n\tau^2 + \sigma^2)$ is the credibility factor, $\tau^2 = \mathrm{Var}(\mu(\theta))$, and
$\sigma^2 = E[\mathrm{Var}(X|\theta)]$; see Bühlmann and Gisler (2005). It is clear that $\hat\mu^c(\theta)$ differs from $\hat\mu(\theta)$ only in the credibility factors $Z^c$ and
$Z = n\tau_0^2/(n\tau_0^2 + \sigma_0^2)$. The following theorem provides the expected squared errors of the two estimators, of which the proof is
straightforward.

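Theorem 3.2. The expected quadratic losses of the credibility estimators $\hat\mu(\theta)$ and $\hat\mu^c(\theta)$ are, respectively,

$$E[\hat\mu(\theta) - \mu(\theta)]^2 = Z^2\frac{\sigma^2}{n} + (1-Z)^2\tau^2 \quad\text{and}\quad E[\hat\mu^c(\theta) - \mu(\theta)]^2 = (Z^c)^2\frac{\sigma^2}{n} + (1-Z^c)^2\tau^2. \tag{3.4}$$

As a result,

$$\lim_{n\to\infty}\frac{E[\hat\mu(\theta) - \mu(\theta)]^2}{E[\hat\mu^c(\theta) - \mu(\theta)]^2} = 1. \tag{3.5}$$

To see how (3.5) follows from (3.4), just plug $Z = n\tau_0^2/(n\tau_0^2 + \sigma_0^2)$ and $Z^c = n\tau^2/(n\tau^2 + \sigma^2)$ in (3.4) to obtain

$$E[\hat\mu(\theta) - \mu(\theta)]^2 = \frac{n\tau_0^4\sigma^2 + \sigma_0^4\tau^2}{(n\tau_0^2 + \sigma_0^2)^2} \quad\text{and}\quad E[\hat\mu^c(\theta) - \mu(\theta)]^2 = \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}.$$

It then follows that

$$\lim_{n\to\infty}\frac{E[\hat\mu(\theta) - \mu(\theta)]^2}{E[\hat\mu^c(\theta) - \mu(\theta)]^2} = \lim_{n\to\infty}\frac{(\tau_0^4\sigma^2 + \sigma_0^4\tau^2/n)(\tau^2 + \sigma^2/n)}{(\tau_0^2 + \sigma_0^2/n)^2\,\sigma^2\tau^2} = \frac{\tau_0^4\sigma^2\tau^2}{(\tau_0^2)^2\sigma^2\tau^2} = 1.$$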

Under the net premium principle, $\hat\mu^c(\theta)$ is optimal and hence theoretically better than $\hat\mu(\theta)$. The limit in (3.5), however, indicates
that the two are actually asymptotically equivalent. On the other hand, the worst estimate of the conditional mean should be the
collective mean $\mu_0 = E[\mu(\theta)]$, which has an expected squared loss $\tau^2$, as it does not take into account any information from the
historical data. The following example gives a clear picture of the performance of $\hat\mu(\theta)$ for small sample sizes, which shows that
$\hat\mu(\theta)$ lies between $\mu_0$ and $\hat\mu^c(\theta)$.

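Example 3.1. Assume that $X_1, X_2, \ldots, X_n$ are i.i.d. as $S(x,\theta) = e^{-\theta x}$, $x > 0$, and $\theta \sim \mathrm{Gamma}(\alpha, \beta)$ with density $\pi(\theta) =
\beta^{\alpha}\theta^{\alpha-1}e^{-\beta\theta}/\Gamma(\alpha)$, $\theta > 0$, $\alpha > 2$, $\beta > 0$, where $\alpha$ and $\beta$ are known quantities. By some algebraic computations, the expected
squared errors of $\hat\mu(\theta)$, $\hat\mu^c(\theta)$, and $\mu_0$ can be shown to satisfy the equalities

$$\frac{E[(\hat\mu(\theta) - \mu(\theta))^2]}{E[(\hat\mu^c(\theta) - \mu(\theta))^2]} = 1 + \frac{\alpha^2 n}{(\alpha-1)(n + 2\alpha - 1)^2} \quad\text{and}\quad \frac{E[(\hat\mu^c(\theta) - \mu(\theta))^2]}{E[(\mu_0 - \mu(\theta))^2]} = \frac{\alpha - 1}{n + \alpha - 1}. \tag{3.6}$$

The two equalities clearly show that the estimate $\hat\mu(\theta)$ is slightly worse than $\hat\mu^c(\theta)$ and that both have MSEs that are only of
order $n^{-1}$ of the MSE of the collective premium $\mu_0$.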
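
The equalities in (3.6) can be checked numerically against (3.4). The sketch below is only illustrative: the function names are invented here, and the structural parameters of the exponential-Gamma model are worked out by hand under the assumption (taken from Section 2 and the appendices) that $\sigma_0^2 = \int E[S(x,\theta)(1 - S(x,\theta))]\,dx$ and $\tau_0^2 = \int \mathrm{Var}(S(x,\theta))\,dx$.

```python
def credibility_factor(n, between, within):
    """Credibility factor of the form n*between / (n*between + within)."""
    return n * between / (n * between + within)

def expected_squared_error(z, n, sigma2, tau2):
    """Right-hand side of (3.4) for a generic credibility factor z."""
    return z**2 * sigma2 / n + (1.0 - z)**2 * tau2

def example_3_1_ratios(n, alpha, beta=1.0):
    """Return the two ratios of (3.6), computed from (3.4).

    The four structural parameters below are hand-derived for
    X | theta ~ Exp(theta), theta ~ Gamma(alpha, rate beta), alpha > 2.
    """
    tau2 = beta**2 / ((alpha - 1)**2 * (alpha - 2))          # Var(mu(theta))
    sigma2 = beta**2 / ((alpha - 1) * (alpha - 2))           # E[Var(X | theta)]
    sigma0_2 = beta / (2 * (alpha - 1))                      # integral of E[S(1 - S)]
    tau0_2 = beta / (2 * (alpha - 1) * (2 * alpha - 1))      # integral of Var(S)
    z = credibility_factor(n, tau0_2, sigma0_2)              # factor of the new estimator
    z_c = credibility_factor(n, tau2, sigma2)                # classical Buhlmann factor
    mse_new = expected_squared_error(z, n, sigma2, tau2)
    mse_classical = expected_squared_error(z_c, n, sigma2, tau2)
    return mse_new / mse_classical, mse_classical / tau2

ratio_new_vs_classical, ratio_classical_vs_collective = example_3_1_ratios(n=5, alpha=3.0)
```

For $n = 5$ and $\alpha = 3$ the two ratios come out as $1.225$ and $2/7$, matching the closed forms in (3.6); the first ratio tends to 1 as $n$ grows, as (3.5) requires.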

3.2.2. Evaluation under the Variance Premium Principle


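Under the variance premium principle, the risk premium is $H(X|\theta) = E[X|\theta] + \lambda\,\mathrm{Var}(X|\theta)$ for some positive constant $\lambda$. Thus

$$\hat H(X|\theta) = Z\bar X_n + (1-Z)\mu_0 + \lambda\left[ZD_n^2 + (1-Z)(\sigma^2 + \tau^2) + Z(1-Z)(\bar X_n - \mu_0)^2\right], \tag{3.7}$$

where $D_n^2 = n^{-1}\sum_{i=1}^n (X_i - \bar X_n)^2$. Recall Bühlmann's credibility premium (Bühlmann 1970, chapter 4):

$$H_{Bul}(\mathbf{X}_n) = b\bar X_n + (1-b)\mu_0 + \lambda\left[cs_n^2 + (1-c)\sigma^2 + (1-b)\tau^2\right], \tag{3.8}$$

where

$$s_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2, \qquad b = \frac{n\tau^2}{n\tau^2 + \sigma^2}, \qquad c = \frac{\mathrm{Var}(\sigma^2(\theta))}{\mathrm{Var}(\sigma^2(\theta)) + E[\mathrm{Var}(s_n^2|\theta)]}.$$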

Because the quantity $\mathrm{Var}(s_n^2|\theta)$ in the credibility factor $c$ involves the fourth moment of $X_i$ given $\theta$, Bühlmann suggested using
$2\sigma^4(\theta)/(n-1)$ as an approximation under certain assumptions, so that $c$ was in fact approximated by $\mathrm{Var}(\sigma^2(\theta))/\{\mathrm{Var}(\sigma^2(\theta)) +
E[2\sigma^4(\theta)/(n-1)]\}$. Therefore, in contrast to Bühlmann's version, a direct advantage of (3.7) is that we need only the first two
moments of the risk distribution.
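
The bracket in (3.7) is the variance of the mixture distribution that puts weight $Z$ on the empirical distribution and weight $1-Z$ on the collective distribution, whose variance is $\sigma^2 + \tau^2$. A minimal sketch of the formula follows; the function and argument names are hypothetical, and the credibility factor is supplied by the caller rather than computed from (2.8).

```python
import numpy as np

def variance_premium(sample, z, mu0, collective_var, lam):
    """Plug-in variance premium of the form (3.7).

    collective_var stands for sigma^2 + tau^2, the marginal variance of X;
    lam is the loading constant of the variance premium principle.
    """
    x = np.asarray(sample, dtype=float)
    xbar = x.mean()
    d2 = np.mean((x - xbar) ** 2)                      # D_n^2, with divisor n
    mean_part = z * xbar + (1.0 - z) * mu0
    var_part = (z * d2 + (1.0 - z) * collective_var
                + z * (1.0 - z) * (xbar - mu0) ** 2)
    return float(mean_part + lam * var_part)

# Hypothetical data: three claims, collective mean 2 and collective variance 4.
premium = variance_premium([1.0, 2.0, 6.0], z=0.4, mu0=2.0, collective_var=4.0, lam=0.3)
```

Only $\mu_0$ and the total variance $\sigma^2 + \tau^2$ enter beyond the data and $Z$, which is the "first two moments" advantage noted above.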
The following example numerically illustrates how close $\hat H(X|\theta)$ in (3.7) is to Bühlmann's credibility estimator $H_{Bul}$.

Example 3.2. Poisson-exponential model: Let $X$ be Poisson distributed with $\Pr(X = k|\theta) = \theta^k e^{-\theta}/k!$ given $\theta$, $k = 0, 1, \ldots$,
and let $\theta$ have an exponential prior density $\pi(\theta) = \beta e^{-\beta\theta}$ ($\theta > 0$), such that $\mu(\theta) = \theta$, $\sigma^2(\theta) = \theta$, $\mu_0 = 1/\beta$, $\tau^2 = 1/\beta^2$, and
$\sigma^2 = 1/\beta$. The risk and collective premiums are $H(X|\theta) = (1+\lambda)\theta$ and $H_{col}(X) = 1/\beta + \lambda(\beta+1)/\beta^2$, respectively. The posterior
distribution of $\theta$ given $\mathbf{X}_n$ is $\mathrm{Gamma}(n\bar X_n + 1, n + \beta)$, with (conditional) mean $(n\bar X_n + 1)/(n+\beta)$ and variance $(n\bar X_n + 1)/(n+\beta)^2$.
The estimators of $H(X|\theta)$ include the Bayes premium

$$H_B(\mathbf{X}_n) = E[X_{n+1}|\mathbf{X}_n] + \lambda\,\mathrm{Var}(X_{n+1}|\mathbf{X}_n) = \frac{(n+\beta)(\lambda+1) + \lambda}{(n+\beta)^2}\,(n\bar X_n + 1),$$

the premium $H_{cu}(\mathbf{X}_n)$ in (3.7), Bühlmann's credibility premium $H_{Bul}$ in (3.8), and the collective premium $H_{col}(X)$, where, here and
henceforth, the subscript "cu" indicates the current experience premium. Consider the mean squared error of these estimators,
$V_\& = E[(H_\&(\mathbf{X}_n) - H(X|\theta))^2]$ for $\& = B, Bul, cu, col$. While $V_{col} = (\lambda+1)^2/\beta^2 + \lambda^2/\beta^4$ and $V_B = \beta^{-2}[M^2 n\beta + (Mn - 1 - \lambda)^2
+ (M\beta + Mn - 1 - \lambda)^2]$ are both exact, where $M = [(n+\beta)(\lambda+1) + \lambda]/(n+\beta)^2$, the values of $V_{Bul}$ and $V_{cu}$ can be approximated
only by the Monte Carlo method. We approximated the values of $V_{Bul}$ and $V_{cu}$ for fixed $\lambda = 0.2$ and a variety of $\beta$ values with
sample sizes $n = 30$ and $n = 100$. Accordingly, we also computed their relative efficiencies $\mathrm{Eff}_\& = (V_{col} - V_\&)/(V_{col} - V_B)$
($\mathrm{Eff}_{col} = 0$ and $\mathrm{Eff}_B = 1$, and a larger value of $\mathrm{Eff}_\&$ stands for higher efficiency of the method $\&$). The results are reported in
Table 1, which shows that (1) the estimate $H_{cu}(\mathbf{X}_n)$ is better than the collective premium $H_{col}$, (2) $V_{cu}$ is slightly larger than
$V_{Bul}$ but the differences become negligible as the sample size increases, and (3) the estimates $H_{cu}(\mathbf{X}_n)$ and $H_{Bul}(\mathbf{X}_n)$ are both very close
to the Bayes premium $H_B(\mathbf{X}_n)$.
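
The two exact mean squared errors of Example 3.2 are easy to evaluate directly. The sketch below (function names are illustrative only) codes the closed forms for $V_{col}$ and $V_B$ with loading $\lambda$ and prior rate $\beta$; it does not attempt the Monte Carlo approximation of $V_{Bul}$ and $V_{cu}$.

```python
def v_col(beta, lam):
    """Exact MSE of the collective premium in the Poisson-exponential model."""
    return (lam + 1.0) ** 2 / beta**2 + lam**2 / beta**4

def v_bayes(n, beta, lam):
    """Exact MSE of the Bayes premium H_B = M * (n * Xbar_n + 1)."""
    m = ((n + beta) * (lam + 1.0) + lam) / (n + beta) ** 2
    a = m * n - 1.0 - lam
    return (m**2 * n * beta + a**2 + (m * beta + a) ** 2) / beta**2

# lambda = 0.2 throughout, as in the numerical study.
print(round(v_col(0.2, 0.2), 4))         # collective premium, beta = 0.2
print(round(v_bayes(30, 0.2, 0.2), 4))   # Bayes premium, beta = 0.2, n = 30
print(round(v_bayes(100, 1.0, 0.2), 4))  # Bayes premium, beta = 1.0, n = 100
```

These values can be compared with the $V_{col}$ and $V_B$ columns of Table 1, which agree with the closed forms up to rounding.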

3.2.3. Evaluation under the Esscher Premium Principle


The Esscher premium principle (Bühlmann 1980), expressed as $H(X) = E[Xe^{hX}]/E[e^{hX}]$, is the optimal solution to minimizing
the expected exponentially weighted loss, $\min_{P \in R} E[e^{hX}(X - P)^2]$. Gerber (1980) was the first to propose a version of its credibility
premium (referred to as Gerber's premium below). Recent work by Pan et al. (2008) found that Gerber's premium does not
converge to the risk premium in general and suggested a new credibility premium that does so (written as Pan's premium below).
Generally, it is very difficult to compute Gerber's and Pan's premiums; see Pan et al. (2008) or Wen et al. (2009) for detailed
accounts. The various versions of Esscher premiums can be regarded as the solutions to the unified minimization problem

$$\min_{P \in \mathcal{P}} E\left[(X_{n+1} - P)^2 e^{hX_{n+1}}\right] \tag{3.9}$$

TABLE 1
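Numerical Results of V& and Eff & for n = 30 and n = 100 (λ = 0.2)
Columns: β, then VB, Vcu, Eff cu, VBul, Eff Bul, Vcol for n = 30, then the same six quantities for n = 100
β VB Vcu Eff cu VBul Eff Bul Vcol VB Vcu Eff cu VBul Eff Bul Vcol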
0.2 0.2405 0.4050 0.9822 0.3937 0.9832 61.000 0.0720 0.0958 0.9996 0.1180 0.9992 61.000
0.3 0.1530 0.1582 0.9886 0.1538 0.9893 20.938 0.0479 0.0842 0.9982 0.0779 0.9985 20.938
0.4 0.1189 0.1365 0.9907 0.1546 0.9863 10.562 0.0359 0.0496 0.9986 0.0503 0.9986 10.562
0.5 0.0947 0.18 0.9885 0.0840 0.9939 6.4000 0.0270 0.0306 0.9994 0.0285 0.9997 6.4000
0.6 0.0786 0.0803 0.9886 0.0792 0.9919 4.3086 0.0238 0.0288 0.9988 0.0270 0.9992 4.3086
0.7 0.0671 0.0769 0.9906 0.0720 0.9929 3.1053 0.0180 0.0207 0.9991 0.0199 0.9993 3.1053
0.8 0.0585 0.0847 0.9829 0.0876 0.9812 2.3476 0.0160 0.0174 0.9993 0.0170 0.9995 2.3476
0.9 0.0518 0.0399 0.9916 0.0383 0.9927 1.8387 0.0158 0.0162 0.9998 0.0160 0.9999 1.8387
1.0 0.0465 0.0656 0.9795 0.0664 0.9789 1.4800 0.0142 0.0171 0.9980 0.0171 0.9980 1.4800
Note: Vcol is independent of the sample size and thus is shared by n = 30 and n = 100.

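under a certain domain $\mathcal{P}$ listed below:

• $\mathcal{P} = R$ for the collective premium $H(X_{n+1}) = E[X_{n+1}e^{hX_{n+1}}]/E[e^{hX_{n+1}}]$
• $\mathcal{P}$ = the collection of all measurable functions $g(\theta)$ of the parameter $\theta$ for the risk premium $H[X_{n+1}|\theta] =
E[X_{n+1}e^{hX_{n+1}}|\theta]/E[e^{hX_{n+1}}|\theta]$
• $\mathcal{P}$ = the collection of all measurable functions $P(X_1, X_2, \ldots, X_n)$ of the samples for the Bayes premium $H_B(\mathbf{X}_n) =
E[X_{n+1}e^{hX_{n+1}}|\mathbf{X}_n]/E[e^{hX_{n+1}}|\mathbf{X}_n]$
• $\mathcal{P}_G = \{q_0 + \sum_{i=1}^n q_iX_i : q_0, q_1, \ldots, q_n \in R\}$ for Gerber's premium $H_G(\mathbf{X}_n) = Z^G\bar X_n + (1 - Z^G E^*[\mu(\theta)]/H_{col}(X))\,H_{col}(X)$ and
• $\mathcal{P}_P = \{p + q\sum_{i=1}^n X_ie^{hX_i}/\sum_{i=1}^n e^{hX_i} : p, q \in R\}$ for Pan's premium $H_P(\mathbf{X}_n) = Z^PH_n + (1 - Z^P E^*[h_n(\theta)]/H(X))\,H(X)$, where

$$Z^P = \frac{\mathrm{Cov}^*(H(X|\theta), h_n(\theta))}{\mathrm{Var}^*[h_n(\theta)] + E^*[\mathrm{Var}(H_n(\mathbf{X}_n)|\theta)]}, \qquad Z^G = \frac{\mathrm{Cov}^*(H(X|\theta), \mu(\theta))}{\mathrm{Var}^*[\mu(\theta)] + n^{-1}E^*[\mathrm{Var}(X|\theta)]},$$

$H_n = \sum_{i=1}^n X_ie^{hX_i}/\sum_{i=1}^n e^{hX_i}$ and $h_n(\theta) = E(H_n|\theta)$, with $E^*$, $\mathrm{Var}^*$, and $\mathrm{Cov}^*$ denoting, respectively, the expectation, variance,
and covariance with respect to a fictitious distribution of $\theta$, defined in terms of density by $\pi^*(\theta) = \pi(\theta)m_h(\theta)/m_h$
with $m_h(\theta) = E(e^{hX}|\theta)$ and $m_h = E[m_h(\theta)] = E(e^{hX})$. See, for example, Pan et al. (2008) or Wen et al. (2009). In addition,
the credibility factor used by $H_{cu}(\mathbf{X}_n)$ is $\tilde Z = Z\bar h_n/(Z\bar h_n + (1-Z)h_0)$, where $\bar h_n = n^{-1}\sum_{i=1}^n e^{hX_i}$, $h_0 = E[e^{hX}]$, and $Z$ is given by (2.8) in Theorem 2.1.
Note that the individual and collective premiums are independent of $n$.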

Because the Esscher premium principle is obtained by minimizing the exponentially weighted quadratic error in (3.9), we use
the weighted quadratic loss $E[L(H_\&(\mathbf{X}_n))] = E[(X_{n+1} - H_\&(\mathbf{X}_n))^2e^{hX_{n+1}}]$, for $\& = col, B, P, G$, and $cu$, to measure the closeness
of the experience premiums $H_\&(\mathbf{X}_n)$. This is similar to what we have done in the case of Bühlmann's credibility formula with
quadratic errors in Section 3.2.1. Note that it can be represented in terms of the risk premium $H(X|\theta)$ as

$$E[L(H_\&(\mathbf{X}_n))] = E\left[(X_{n+1} - H(X|\theta))^2e^{hX_{n+1}}\right] + E\left[(H(X|\theta) - H_\&(\mathbf{X}_n))^2e^{hX_{n+1}}\right], \tag{3.10}$$

where the first term of the right-hand side is independent of $\&$. Hence, comparing $E[L(H_\&(\mathbf{X}_n))]$ can be reduced to comparing

$$V_\& = E\left[(H(X|\theta) - H_\&(\mathbf{X}_n))^2e^{hX_{n+1}}\right] = E\left[(H(X|\theta) - H_\&(\mathbf{X}_n))^2m_h(\theta)\right]. \tag{3.11}$$

Obviously $E[L(H_B(\mathbf{X}_n))] \le E[L(H_\&(\mathbf{X}_n))]$ for all $\&$ and $E[L(H_\&(\mathbf{X}_n))] \le E[L(H_{col}(\mathbf{X}_n))]$ for $\& = B, P$, and $G$. Thus, $H_B(\mathbf{X}_n)$
is the best of $H_\&(\mathbf{X}_n)$ over all values of $\&$. While we do not generally know whether $H_{cu}$ is better than $H_P$, $H_G$, and $H_{col}$, it is
highly interesting that $H_{cu}(\mathbf{X}_n)$ is optimal under the Bernoulli-Uniform model: It coincides with the Bayes premium $H_B(\mathbf{X}_n)$. This
is stated in Example 3.3 below. Example 3.4 compares $V_\&$ for $\& = G, P$, and $cu$ under the Poisson-Gamma model.

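Example 3.3 (Bernoulli-Uniform Model). Let $X$ be a Bernoulli variable given $\theta$ with $\Pr(X = 1|\theta) = 1 - \Pr(X = 0|\theta) = \theta$ and
$\theta$ uniformly distributed over the interval $(0, 1)$. Then $H_{cu}(\mathbf{X}_n) = H_B(\mathbf{X}_n)$; see Section A.3.2 of the appendices for a proof.

Example 3.4 (Poisson-Gamma Model). Let $X_i \overset{\text{i.i.d.}}{\sim} \mathrm{Poisson}(\theta)$ and $\theta \sim \mathrm{Gamma}(\alpha, \beta)$ with density $\pi(\theta) = \beta^{\alpha}\theta^{\alpha-1}e^{-\beta\theta}/\Gamma(\alpha)$,
$\theta > 0$, $\alpha > 2$, $\beta > 0$. It follows that $E(X|\theta) = \mathrm{Var}(X|\theta) = \theta$ and, given $\mathbf{X}_n$, the posterior distribution of $\theta$ is $\mathrm{Gamma}(\alpha +
n\bar X_n, \beta + n)$. The corresponding premiums are listed in Table 2, where

• The term "approx" indicates that Pan's premium $H_P(\mathbf{X}_n)$ can be computed only by a Monte Carlo approximation (an algorithm
is presented in Algorithm A.1)
• For $H_{cu}(\mathbf{X}_n)$, the credibility factor is $\tilde Z = Z\bar h_n/(Z\bar h_n + (1-Z)h_0)$, with $Z = n\tau_0^2/(n\tau_0^2 + \sigma_0^2)$,

$$\tau_0^2 = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\frac{\min(i,j)}{i!\,j!}\left[\frac{\beta^{\alpha}\,\Gamma(\alpha+i+j)}{\Gamma(\alpha)(\beta+2)^{\alpha+i+j}} - \frac{\beta^{2\alpha}\,\Gamma(\alpha+i)\Gamma(\alpha+j)}{\Gamma(\alpha)^2(\beta+1)^{2\alpha+i+j}}\right] \tag{3.12}$$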

TABLE 2
Experience Premiums under Poisson-Gamma Model
Premium      Denotation     Expression
Individual   H(X|θ)         θ e^h
Collective   Hcol(X)        α e^h / (β − e^h + 1)
Bayes        HB(Xn)         (α + n X̄n) e^h / (β + n − e^h + 1)
Pan          HP(Xn)         approx
Gerber       HG(Xn)         (α + n X̄n) e^h / (β + n − e^h + 1)
Current      Hcu(Xn)        Z̃ Hn + (1 − Z̃) H(X)

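and

$$\sigma_0^2 = \frac{\alpha}{\beta} - \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\frac{\min(i,j)\,\beta^{\alpha}\,\Gamma(\alpha+i+j)}{i!\,j!\,\Gamma(\alpha)(\beta+2)^{\alpha+i+j}}; \tag{3.13}$$

see Section A.3.4 of the appendices for proofs of both (3.12) and (3.13)
• It is also interesting that

$$H_G(\mathbf{X}_n) = H_B(\mathbf{X}_n); \tag{3.14}$$

see Section A.3.5 of the appendices for the computations of $H_G$ and $H_B$.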


We then compute $V_\&$. While $V_{col} = E[(H(X) - H(X|\theta))^2e^{hX_{n+1}}] = \alpha\beta^{\alpha}e^{2h}/(\beta - e^h + 1)^{\alpha+2}$ is easy to obtain, the quantities $V_\&$
for $\& = B, P$ can only be computed numerically. In the following computation, $h = 0.6$ and $\beta = 6$ are taken. For different values
of $\alpha$, the sample sizes $n = 10$ and $n = 100$ are considered. For the pairs of $(\alpha, n)$, because $H_G(\mathbf{X}_n) = H_B(\mathbf{X}_n)$, only the values of
$V_\&$ for $\& = col, B, P$, and $cu$ are necessary. The outcomes of a numerical experiment are listed in Table 3, where the efficiency, which
is defined as $\mathrm{Eff}_\& = (V_{col} - V_\&)/(V_{col} - V_B)$, is a measure of how well the experience premium $H_\&(\mathbf{X}_n)$ performs in comparison
with the best, $H_B(\mathbf{X}_n)$. Table 3 shows that $V_B < V_P < V_{cu} < V_{col}$ for $n = 10$ and $V_B < V_P \approx V_{cu} < V_{col}$ for $n = 100$. It is also
apparent that although $H_{cu}$ is not the best, it is sufficiently good for practical use.
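
For the Poisson-Gamma model the current premium can be sketched in a few lines. The code below is an illustration under assumptions, not the authors' implementation: the function name is invented, the credibility factor $Z$ is passed in directly rather than evaluated from the double series for $\tau_0^2$ and $\sigma_0^2$, and the closed forms $h_0 = E[e^{hX}] = (\beta/(\beta - e^h + 1))^{\alpha}$ and $H(X) = \alpha e^h/(\beta - e^h + 1)$ require $\beta > e^h - 1$.

```python
import math

def esscher_current_premium(sample, z, h, alpha, beta):
    """Current Esscher premium Z~ * H_n + (1 - Z~) * H(X) for Poisson-Gamma risks.

    z is the credibility factor of the ddf estimator (supplied by the caller).
    """
    e_h = math.exp(h)
    rate = beta - e_h + 1.0
    if rate <= 0.0:
        raise ValueError("requires beta > exp(h) - 1")
    h_col = alpha * e_h / rate                    # collective Esscher premium H(X)
    h0 = (beta / rate) ** alpha                   # E[exp(h X)]
    weights = [math.exp(h * x) for x in sample]
    h_bar = sum(weights) / len(sample)            # sample average of exp(h X_i)
    h_n = sum(x * w for x, w in zip(sample, weights)) / sum(weights)
    z_tilde = z * h_bar / (z * h_bar + (1.0 - z) * h0)
    return z_tilde * h_n + (1.0 - z_tilde) * h_col

# Hypothetical claim counts and parameters.
counts = [0, 1, 3, 2]
premium_cu = esscher_current_premium(counts, z=0.6, h=0.6, alpha=3.0, beta=6.0)
```

The same value is obtained by applying the Esscher principle directly to the mixture $Z S_n + (1-Z) S_0$, which is how the modified factor $\tilde Z$ arises.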

We conclude this section with two remarks: (1) there are cases where $H_{cu}$ is optimal, and (2) even if $H_{cu}$ is not optimal, it is
very close to the optimum, which is strongly supported by the numerical results in Table 3, where the lowest efficiency of $H_{cu}$ is
0.9217 at $\alpha = 7.5$ and $n = 10$ (a very small sample size).

TABLE 3
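Numerical Results of V& and Eff & for n = 10 and n = 100 (h = 0.6, β = 6)
Columns: α, then VB, Vcu, Eff cu, VP, Eff P, Vcol for n = 10, then the same six quantities for n = 100
α VB Vcu Eff cu VP Eff P Vcol VB Vcu Eff cu VP Eff P Vcol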
2.0 0.1128 0.1740 0.9475 0.1521 0.9663 1.2776 0.0151 0.0368 0.9828 0.0369 0.9827 1.2776
2.5 0.1320 0.1847 0.9668 0.1697 0.9762 1.7191 0.0217 0.0398 0.9893 0.0391 0.9897 1.7191
3.0 0.2087 0.2574 0.9758 0.2423 0.9833 2.2207 0.0277 0.0517 0.9891 0.0515 0.9892 2.2207
3.5 0.2245 0.3002 0.9705 0.2829 0.9772 2.7889 0.0292 0.0651 0.9870 0.0654 0.9869 2.7889
4.0 0.3390 0.5018 0.9474 0.4518 0.9635 3.4311 0.0390 0.1177 0.9768 0.1121 0.9784 3.4311
4.5 0.3898 0.5987 0.9445 0.5144 0.9669 4.1551 0.0484 0.1423 0.9771 0.1453 0.9764 4.1551
5.0 0.4718 0.7885 0.9296 0.6411 0.9624 4.9698 0.0733 0.2139 0.9713 0.2166 0.9707 4.9698
5.5 0.4463 0.7589 0.9425 0.6646 0.9598 5.8848 0.0647 0.1809 0.9800 0.1678 0.9823 5.8848
6.0 0.5736 0.8912 0.9499 0.7999 0.9643 6.9106 0.1004 0.2412 0.9793 0.2403 0.9795 6.9106
6.5 0.6296 1.1399 0.9313 0.9642 0.9550 8.0590 0.1181 0.3274 0.9736 0.3052 0.9764 8.0590
7.0 0.8354 1.3253 0.9424 1.1425 0.9639 9.3425 0.1248 0.3946 0.9707 0.3773 0.9726 9.3425
7.5 1.0700 1.8302 0.9217 1.5353 0.9520 10.7752 0.1745 0.6393 0.9562 0.6018 0.9597 10.7752
8.0 1.3422 2.0515 0.9357 1.7321 0.9646 12.3724 0.1542 0.5338 0.9689 0.5107 0.9708 12.3724

TABLE 4
Averages of 100 ISEs of the estimates of the ddf
Policy No. i 1 2 3 4 5 6 7 8 9 10
ISE of 
S(x, i ) 1.832 1.752 1.801 1.797 1.870 1.859 1.838 1.739 1.861 1.796
ISE of 
S (x, i ) 1.914 1.799 1.868 1.854 1.937 1.936 1.903 1.793 1.933 1.870
ISE of 
S(x, i ) 1.973 1.858 1.979 1.909 2.006 2.011 1.958 1.860 1.991 1.915

3.3. Performance Evaluation of the Empirical Bayes Estimation


This section reports the numerical results of a small simulation study conducted under Model II so as to show the closeness
of the empirical Bayes estimate of $S(x,\theta_i)$ to the true $S(x,\theta_i)$, with comparisons to that of the inhomogeneous and the
homogeneous estimates, as well as the performance of empirical premiums obtained by plugging in the empirical Bayes estimate for $S(x,\theta_i)$ under,
respectively, the net, variance, modified variance, and standard deviation premium principles. This simulation was performed under
the following settings:
• The size of the simulation is set to $K = 10$ and $n_i = 20$, $i = 1, 2, \ldots, 10$.
• The experiential claims $X_{ij}$ were sampled from the exponential density $f(x,\theta_i) = \theta_i\exp(-\theta_ix)$, $x > 0$, in which the
risk parameter $\theta_i$ followed the Gamma distribution with shape parameter $\alpha = 4$ and scale parameter $\beta = 3$.
• To fix the premium principles, their risk-loading coefficients (i.e., [risk premium − mean loss] divided by the standard
deviation) were set to a common value 0.3.
The corresponding averages of the integrated squared errors (ISE) of the empirical Bayesian, inhomogeneous, and homogeneous
estimates, that is, the averages of

$$\int \big(\hat S(x,\theta_i) - S(x,\theta_i)\big)^2\,dx,$$

with $\hat S(x,\theta_i)$ standing in turn for each of the three estimates,
were computed and, after being multiplied by 100 to bring the values to a moderate scale, are listed in Table 4. In this table, in
terms of the average of the ISEs, the empirical Bayes estimation is slightly worse than the inhomogeneous estimation, and the
latter is slightly worse than the homogeneous estimation. This loss of accuracy is clearly caused by the additional estimation of
the unknown structure parameters $\sigma_0^2$, $\tau_0^2$, and $S_0(x)$.
This simulation also computed the averages of the squared errors (SE) of the empirical premiums obtained by plugging in
the empirical Bayes estimate of $S(x,\theta_i)$. The squares of differences between the empirical and theoretical premiums, under the net, variance, modified variance, and
standard deviation principles, after being multiplied by 10, are listed in Table 5.
To measure the efficiency of the empirical premium, we computed the quantities

$$\mathrm{Eff} = \frac{\mathrm{ASE(col)} - \mathrm{ASE(EB)}}{\mathrm{ASE(col)} - \mathrm{ASE(Bayes)}}, \tag{3.15}$$

where ASE(H) is the average of the squared errors obtained by applying premium H: "col" means the collective premium $H(X)$,
"EB" the empirical premium computed from the empirical Bayes estimate, and "Bayes" the Bayesian premium computed as follows. Under the probability
distribution setting in the simulation, the predictive distribution of a future loss, such as $X_{i,n_i+1}$, given $\{X_{ij}, j = 1, 2, \ldots, n_i\}$ is

TABLE 5
Averages of 10 SEs for Empirical Premiums
Policy No. i 1 2 3 4 5 6 7 8 9 10
SE of Net 0.731 0.658 0.778 0.679 0.848 0.699 0.794 0.683 0.764 0.753
SE of Var. 5.310 5.246 40.51 5.794 13.30 4.771 7.171 8.373 10.28 11.93
SE of ModVar 1.579 1.522 1.842 1.460 1.771 1.562 1.695 1.548 1.704 1.691
SE of StDev 1.303 1.189 1.439 1.188 1.484 1.249 1.403 1.232 1.374 1.362

TABLE 6
Efficiency of Empirical Premiums with Respect to Collective Premium
Policy No. i 1 2 3 4 5 6 7 8 9 10
Eff of Net 0.988 0.974 1.000 0.991 0.971 0.982 0.978 0.973 0.956 0.957
Eff of Var. 0.868 0.614 0.793 0.917 0.834 0.851 0.906 0.949 0.889 0.801
Eff of ModVar 0.934 0.930 0.958 0.956 0.937 0.931 0.943 0.931 0.922 0.924
Eff of StD 0.975 0.966 0.991 0.985 0.965 0.970 0.973 0.963 0.948 0.953

the Pareto distribution with shape parameter $n_i + \alpha$ and scale parameter $\sum_{j=1}^{n_i} X_{ij} + \beta$, so that

$$E[X_{i,n_i+1}|X_{i1}, \ldots, X_{in_i}] = \frac{\sum_{j=1}^{n_i} X_{ij} + \beta}{n_i + \alpha - 1} \quad\text{and}\quad \mathrm{Var}\big(X_{i,n_i+1}|X_{i1}, \ldots, X_{in_i}\big) = \frac{(n_i + \alpha)\big(\sum_{j=1}^{n_i} X_{ij} + \beta\big)^2}{(n_i + \alpha - 1)^2(n_i + \alpha - 2)}.$$

The Bayes premium for $X_{i,n_i+1}$ was then computed by substituting the predictive distribution into a risk premium for the risk
distribution $S(x,\theta)$ under the net, variance, modified variance, and standard deviation premium principles. The resulting efficiencies
from the simulation under the four principles above are listed in Table 6, which shows that the empirical Bayes premiums under
all four premium principles are highly efficient, though the efficiency varies across premium principles.
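
The Bayes premiums described above can be sketched as follows, assuming the predictive distribution is the Lomax (Pareto type II) law with shape $a = n_i + \alpha$ and scale $s = \sum_j X_{ij} + \beta$, whose mean is $s/(a-1)$ and variance $a s^2/((a-1)^2(a-2))$. The function names and the single loading argument are hypothetical; the study itself calibrates a separate loading for each principle.

```python
import math

def predictive_moments(claims, alpha, beta):
    """Mean and variance of the Lomax predictive distribution (needs shape > 2)."""
    a = len(claims) + alpha          # predictive shape n_i + alpha
    s = sum(claims) + beta           # predictive scale sum_j X_ij + beta
    if a <= 2.0:
        raise ValueError("predictive variance requires n_i + alpha > 2")
    mean = s / (a - 1.0)
    var = a * s**2 / ((a - 1.0) ** 2 * (a - 2.0))
    return mean, var

def bayes_premiums(claims, alpha, beta, loading):
    """Net, variance, modified variance and standard deviation premiums."""
    mean, var = predictive_moments(claims, alpha, beta)
    return {
        "net": mean,
        "variance": mean + loading * var,
        "modified_variance": mean + loading * var / mean,
        "standard_deviation": mean + loading * math.sqrt(var),
    }

# Hypothetical claim history for one policy.
history = [0.2, 1.1, 0.4, 0.9, 0.3]
premiums = bayes_premiums(history, alpha=4.0, beta=3.0, loading=0.3)
```

The variance formula can be cross-checked with the law of total variance, since given $\theta$ the loss is exponential with mean $1/\theta$ and variance $1/\theta^2$.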

4. CONCLUDING REMARKS
We have developed a completely nonparametric estimation for loss distributions and established a unified distribution-free
approach to experience rating for arbitrary premium principles. The method combines the advantages of Bühlmann's credibility
theory and Ferguson's nonparametric Bayes premiums and thus provides a powerful tool to generate appropriate experience rating
given the growing body of premium principles developed in general insurance. It is demonstrated under a number of principles
that, although this new approach does not guarantee theoretical optimality, it does produce solutions that are close to the optima.
In the examples we have examined (Section 3.2.3) for the Esscher premium principle, the efficiencies with respect to the optimal
premium range between 92.17% and 97.58% even with a small sample size of n = 10 (cf. Table 3).
This new approach can be broadly applied in almost all premium pricing problems in general insurance, including health care,
income protection, property, financial products, and business. More broadly, our distribution-free approach to estimate distribution
functions can be applied to many other areas, such as reserve evaluation (including incurred but not reported and reported but
not settled claims) to predict outstanding claim losses, Bonus-Malus insurance systems (cf. Ferreira 1974; Lemaire 1995) that
give premium discount to low risks in the past year, the optimal claim decision problem of policyholders (see, e.g., Haehling von
Lanzenauer 1974; Braun et al. 2006), health care cost analysis (Bertsimas et al. 2008; Enthoven and Fuchs 2006; Stephens et al.
2005), and simulation of health insurance markets (Feldman and Dowd 1982). This approach is also useful in economics, finance,
and other areas where previous experiences influence present and future risks.
The data structure we have used is, however, limited to the Bühlmann type (conditionally i.i.d.). The extension of the approach
to the Bühlmann-Straub model (Bühlmann and Straub 1970) is not difficult. There are, however, further interesting topics for future
research, including problems where the data possess certain types of hierarchical settings, losses or risks with regression structures
dependent on covariates, and correlation structures such as panel data. It will be desirable and of practical importance to investigate
whether results parallel to what we have found here can be derived for problems with different data settings. On the other hand, our
approach has been established by means of optimal estimation of the risk distributions under the $L^2$-distance measure, where
optimization could be performed based on derivative equations. It will be interesting to investigate whether distribution-free approaches
of comparable performance can be developed under other distance measures. Another interesting topic is to theoretically identify
the conditions under which the experience ratings deduced by inserting the estimated distribution would agree with existing ones
such as Bühlmann's credibility premiums for the net and variance premium principles, and Gerber's and Pan's versions for
Esscher premiums.

FUNDING
The authors acknowledge the support of GRF Grants No. 410211 and 410213 from the Research Grants Council of Hong
Kong, for X. Q. Cai; NSFC Grant No. 71361015, Jiangxi Provincial Natural Science Foundation Grant No. 20142BAB201013,
No. 2013M540534 from the China Postdoctoral Science Foundation, and No. 2014T70615 from the China Postdoctoral Fund Special
Project, for L. M. Wen; and Shanghai Philosophy and Social Science Foundation Grant No. 2010BJB004, the 111 Project under
Grant No. B14019, and NSFC Grant No. 71371074, for X. Y. Wu.

REFERENCES
Antoniak, C. E. 1974. Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems. Annals of Statistics 2(6): 1152–1174.
Bertsimas, D., M. V. Bjarnadottir, M. A. Kane, J. C. Kryder, R. Pandey, S. Vempala, and G. Wang. 2008. Algorithmic Prediction of Health-Care Costs. Operations Research 56: 1382–1392.
Braun, M., P. S. Fader, E. T. Bradlow, and H. Kunreuther. 2006. Modeling the Pseudodeductible in Insurance Claims Decisions. Management Science 52(8): 1258–1272.
Bühlmann, H. 1967. Experience Rating and Credibility. ASTIN Bulletin 4: 199–207.
Bühlmann, H. 1970. Mathematical Methods in Risk Theory. Berlin: Springer-Verlag.
Bühlmann, H. 1980. An Economic Premium Principle. ASTIN Bulletin 11: 52–60.
Bühlmann, H., and A. Gisler. 2005. A Course in Credibility Theory and Its Applications. Amsterdam: Springer.
Bühlmann, H., and E. Straub. 1970. Glaubwürdigkeit für Schadensätze. Bulletin of the Swiss Association of Actuaries 70(1): 111–133.
DasGupta, A. 2008. Asymptotic Theory of Statistics and Probability. New York: Springer Science+Business Media.
Dhaene, J., S. Vanduffel, M. J. Goovaerts, R. Kaas, Q. Tang, and D. Vyncke. 2006. Risk Measures and Comonotonicity: A Review. Stochastic Models 22: 573–606.
Enthoven, A. C., and V. R. Fuchs. 2006. Employment-Based Health Insurance: Past, Present, and Future. Health Affairs 25: 1538–1547.
Feldman, R. D., and B. E. Dowd. 1982. Simulation of a Health Insurance Market with Adverse Selection. Operations Research 30: 1027–1042.
Ferreira, J. 1974. The Long-Term Effects of Merit-Rating Plans on Individual Motorists. Operations Research 22: 954–978.
Ferguson, T. 1973. A Bayesian Analysis of Some Non-parametric Problems. Annals of Statistics 1(2): 209–230.
Furman, E., and R. Zitikis. 2008. Weighted Premium Principles. Insurance: Mathematics and Economics 42(1): 459–465.
Gerber, H. U. 1980. Credibility for Esscher Premium. Mitteilungen der Vereinigung schweiz. Versicherungsmathematiker 3: 307–312.
Ghosh, J. K., and R. V. Ramamoorthi. 2003. Bayesian Nonparametrics. Springer Series in Statistics. New York: Springer-Verlag.
Gomez, E., A. Hernandez, and F. J. Vazquez-Polo. 2000. Robust Bayesian Premium Principles in Actuarial Science. Journal of the Royal Statistical Society, Series D 49(2): 241–252.
Gomez, E., A. Hernandez, and F. J. Vazquez-Polo. 2006. On the Use of Posterior Regret Γ-Minimax Actions to Obtain Credibility Premiums. Insurance: Mathematics and Economics 39(1): 115–121.
Goovaerts, M. J., R. Kaas, A. E. Van Heerwaarden, and T. Bauwelinckx. 1990. Effective Actuarial Methods. Amsterdam: North-Holland.
Hachemeister, C. A. 1975. Credibility for Regression Models with Application to Trend. In Credibility: Theory and Applications, Proceedings of the Berkeley Actuarial Research Conference on Credibility, New York: Academic, pp. 129–163.
Haehling von Lanzenauer, C. 1974. Optimal Claim Decisions by Policyholders in Automobile Insurance with Merit-Rating Structures. Operations Research 22: 979–990.
Heilmann, W. R. 1989. Decision Theoretic Foundations of Credibility Theory. Insurance: Mathematics and Economics 8: 77–95.
Jewell, W. S. 1974. The Credible Distribution. ASTIN Bulletin 7(3): 237–269.
Kaas, R., M. Goovaerts, J. Dhaene, and M. Denuit. 2001. Modern Actuarial Risk Theory. New York: Kluwer Academic.
Klugman, S. A. 1992. Bayesian Statistics in Actuarial Science: With Emphasis on Credibility. Boston: Kluwer.
Lau, J. W., T. K. Siu, and H. Yang. 2006. On Bayesian Mixture Credibility. ASTIN Bulletin 36(2): 573–588.
Lemaire, J. 1995. Bonus-Malus Systems in Automobile Insurance. New York: Kluwer Academic.
Makov, U. E., A. F. M. Smith, and Y. H. Liu. 1996. Bayesian Methods in Actuarial Science. Journal of the Royal Statistical Society, Series D 45(4): 503–515.
Mashayekhi, M. 2002. On Asymptotic Optimality in Empirical Bayes Credibility. Insurance: Mathematics and Economics 31: 285–295.
Natarajan, K., D. Pachamanova, and M. Sim. 2009. Constructing Risk Measures from Uncertainty Sets. Operations Research 57(5): 1129–1141.
Norberg, R. 1980. Empirical Bayes Credibility. Scandinavian Actuarial Journal 1980: 172–194.
Norberg, R. 2004. Credibility Theory. In Encyclopedia of Actuarial Science, edited by J. Teugels and B. Sundt. Chichester, UK: Wiley.
Pai, J. S. 1997. Bayesian Analysis of Compound Loss Distributions. Journal of Econometrics 79(1): 129–146.
Pan, M., R. Wang, and X. Wu. 2008. On the Consistency of Credibility Premiums Regarding Esscher Principle. Insurance: Mathematics and Economics 42: 119–126.
Pitselis, G. 2004. A Seemingly Unrelated Regression Model in a Credibility Framework. Insurance: Mathematics and Economics 34: 37–54.
Robbins, H. 1955. An Empirical Bayes Approach to Statistics. In Proceedings of the Third Berkeley Symposium on Mathematics, Statistics and Probability 1: 157–164.
Robbins, H. 1964. The Empirical Bayes Approach to Statistical Decision Problems. Annals of Mathematical Statistics 35: 1–20.
Schmidt, K. D. 1991. Convergence of Bayes and Credibility Premiums. ASTIN Bulletin 20(2): 167–172.
Schmidt, K. D. 1998. Bayesian Models in Actuarial Mathematics. Mathematical Methods of Operations Research 48: 117–146.
Stephens, C. R., H. Waelbroeck, and S. Talley. 2005. Predicting Healthcare Costs Using GAs. In Proceedings of the 2005 Workshops on Genetic and Evolutionary Computation, June 25–26, Washington, D.C., GECCO '05. ACM, New York, pp. 159–163. http://doi.acm.org/10.1145/1102256.1102291.
Sundt, B. 1999. An Introduction to Non-life Insurance Mathematics. 4th edition. Karlsruhe: Verlag Versicherungswirtschaft.
Szego, G. 2002. Measures of Risk. Journal of Banking and Finance 26: 1253–1272.
Wen, L., X. Wu, and X. Zhao. 2009. The Credibility Estimators under Generalized Weighted Loss Functions. Journal of Industrial and Management Optimization 5(4): 893–910.
Wu, X., and X. Zhou. 2006. A New Characterization of Distortion Premiums Via Countable Additivity for Comonotonic Risks. Insurance: Mathematics and Economics 38: 324–334.
Young, V. R. 2004. Premium Principles. In Encyclopedia of Actuarial Science, edited by J. Teugels and B. Sundt, pp. 1322–1331. New York: Wiley.
Zehnwirth, B. 1977. The Mean Credibility Formula is a Bayes Rule. Scandinavian Actuarial Journal 212–216.
Zehnwirth, B. 1979. Credibility and the Dirichlet Process. Scandinavian Actuarial Journal 13–23.
Zehnwirth, B. 1981. A Note on the Asymptotic Optimality of the Empirical Bayes Distribution Function. Annals of Statistics 9: 221–224.

Discussions on this article can be submitted until July 1, 2016. The authors reserve the right to reply to any discussion. Please see
the Instructions for Authors found online at http://www.tandfonline.com/uaaj for submission instructions.

APPENDICES
A.1. Proofs of Theorems in Section 2.3
A.1.1. Proof of Theorem 2.3
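Proof. Note that the mean squared error of $\tilde S(x, \Theta_i)$ can be decomposed as
\[
\begin{aligned}
E\int \big[\tilde S(x,\Theta_i) - S(x,\Theta_i)\big]^2 dx
&= E\int \big[\hat S(x,\Theta_i) - S(x,\Theta_i) + \tilde S(x,\Theta_i) - \hat S(x,\Theta_i)\big]^2 dx \\
&= E\int \big[\hat S(x,\Theta_i) - S(x,\Theta_i)\big]^2 dx
 + 2E\int \big[\tilde S(x,\Theta_i) - \hat S(x,\Theta_i)\big]\big[\hat S(x,\Theta_i) - S(x,\Theta_i)\big] dx \\
&\quad + E\int \big[\tilde S(x,\Theta_i) - \hat S(x,\Theta_i)\big]^2 dx.
\end{aligned} \tag{A.1}
\]
First, (2.9) states that
\[
E\int \big[\hat S(x,\Theta_i) - S(x,\Theta_i)\big]^2 dx = (1 - Z_i)\,\tau_0^2. \tag{A.2}
\]
Second, it follows from the equalities
$\tilde S(x,\Theta_i) - \hat S(x,\Theta_i) = (1 - Z_i) \sum_{r=1}^K Z_r (S_r(x) - S_0(x)) \big/ \sum_{r=1}^K Z_r$
and $\mathrm{Cov}\big(\hat S(x,\Theta_i) - S(x,\Theta_i),\, S_r(x)\big) = 0$, $r = 1, 2, \ldots, K$, that
\[
E\Big\{\big[\tilde S(x,\Theta_i) - \hat S(x,\Theta_i)\big]\big[\hat S(x,\Theta_i) - S(x,\Theta_i)\big]\Big\} = 0. \tag{A.3}
\]
Third, as $\int \mathrm{Var}(S_r(x))\,dx = \sigma_0^2/n_r + \tau_0^2 = \tau_0^2/Z_r$,
\[
\begin{aligned}
E\int \big[\tilde S(x,\Theta_i) - \hat S(x,\Theta_i)\big]^2 dx
&= \int \mathrm{Var}\big(\tilde S(x,\Theta_i) - \hat S(x,\Theta_i)\big)\,dx \\
&= \frac{(1 - Z_i)^2}{\big(\sum_{r=1}^K Z_r\big)^2} \sum_{r=1}^K Z_r^2 \int \mathrm{Var}(S_r(x))\,dx
 = \frac{\tau_0^2 (1 - Z_i)^2}{\sum_{r=1}^K Z_r}.
\end{aligned} \tag{A.4}
\]
Inserting (A.2), (A.3), and (A.4) into (A.1) leads to the desired equality:
\[
E\int \big[\tilde S(x,\Theta_i) - S(x,\Theta_i)\big]^2 dx
= \frac{\tau_0^2 (1 - Z_i)^2}{\sum_{r=1}^K Z_r} + (1 - Z_i)\,\tau_0^2
= \frac{\sum_{r \ne i} Z_r + 1}{\sum_{r \ne i} Z_r + Z_i} \cdot \frac{\sigma_0^2 \tau_0^2}{\sigma_0^2 + n_i \tau_0^2}.
\]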
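The last algebraic step of the proof can be checked with exact rational arithmetic. The sketch below is only an illustration: it assumes the credibility factors have the form $Z_r = n_r\tau_0^2/(n_r\tau_0^2 + \sigma_0^2)$, and the function name and the numerical inputs are hypothetical.

```python
from fractions import Fraction


def mse_identity_sides(n, sigma2, tau2, i):
    """Return both sides of the final equality in the proof of Theorem 2.3.

    n      : list of sample sizes n_1, ..., n_K
    sigma2 : structural parameter sigma_0^2 (a Fraction)
    tau2   : structural parameter tau_0^2 (a Fraction)
    i      : zero-based index of the contract of interest
    """
    # Assumed form of the credibility factors Z_r.
    z = [(n_r * tau2) / (n_r * tau2 + sigma2) for n_r in n]
    z_sum = sum(z)
    z_i = z[i]
    # Left side: (A.4) plus (A.2).
    lhs = tau2 * (1 - z_i) ** 2 / z_sum + (1 - z_i) * tau2
    # Right side: the closed form stated at the end of the proof.
    rhs = (z_sum - z_i + 1) / z_sum * (sigma2 * tau2) / (sigma2 + n[i] * tau2)
    return lhs, rhs


lhs, rhs = mse_identity_sides([3, 5, 8], Fraction(2), Fraction(1, 3), 1)
print(lhs == rhs)
```

Because `Fraction` arithmetic is exact, the comparison does not depend on floating-point tolerances.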


A.1.2. Proof of Theorem 2.4


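Proof. Define $Y_{ij} = I(X_{ij} > x)$ and write $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{i,n_i})'$, $i = 1, 2, \ldots, K$. Then $E[Y_i] = S_0(x)\mathbf{1}$ and $\mathrm{Var}(Y_i) = \sigma_0^2(x) I + \tau_0^2(x)\mathbf{1}\mathbf{1}'$, where $I$ is the identity matrix and $\mathbf{1}$ is the column vector of 1s, both with proper dimensions. Since $\sum_{j=1}^{n_i} \big[I(X_{ij} > x) - S_i(x)\big]^2 = Y_i'(I - \mathbf{1}\mathbf{1}'/n_i)Y_i$, it is easy to check
\[
E[SSE(x)] = E\Big[\sum_{i=1}^K Y_i'\Big(I - \frac{\mathbf{1}\mathbf{1}'}{n_i}\Big)Y_i\Big]
= \sum_{i=1}^K \mathrm{trace}\Big[\Big(I - \frac{\mathbf{1}\mathbf{1}'}{n_i}\Big)\mathrm{Var}(Y_i)\Big]
= \sum_{i=1}^K (n_i - 1)\,\sigma_0^2(x),
\]
which implies the unbiasedness of $\hat\sigma_0^2$. Furthermore, write $\mathbf{n} = (n_1, n_2, \ldots, n_K)'$, $N = \mathrm{diag}(n_1, n_2, \ldots, n_K)$ (the diagonal matrix with diagonal elements $n_1, n_2, \ldots, n_K$) and $\mathbf{S} = (S_1(x), S_2(x), \ldots, S_K(x))'$. Then $E[\mathbf{S}] = S_0(x)\mathbf{1}$ and $\mathrm{Var}(\mathbf{S}) = \sigma_0^2(x)N^{-1} + \tau_0^2(x)I$, because
\[
\mathrm{Var}(S_i(x)) = \mathrm{Var}(\mathbf{1}'Y_i/n_i) = n_i^{-2}\,\mathbf{1}'\big(\sigma_0^2(x)I + \tau_0^2(x)\mathbf{1}\mathbf{1}'\big)\mathbf{1} = \sigma_0^2(x)/n_i + \tau_0^2(x).
\]
Note the expression $SSA(x) = \mathbf{S}'(N - \mathbf{n}\mathbf{n}'/\mathbf{1}'\mathbf{n})\mathbf{S}$. It follows that
\[
\begin{aligned}
E[SSA(x)] &= S_0^2(x)\,\mathbf{1}'\Big(N - \frac{\mathbf{n}\mathbf{n}'}{\mathbf{1}'\mathbf{n}}\Big)\mathbf{1}
 + \mathrm{trace}\Big[\Big(N - \frac{\mathbf{n}\mathbf{n}'}{\mathbf{1}'\mathbf{n}}\Big)\big(\sigma_0^2(x)N^{-1} + \tau_0^2(x)I\big)\Big] \\
&= (K - 1)\,\sigma_0^2(x) + \frac{\big(\sum_{i=1}^K n_i\big)^2 - \sum_{i=1}^K n_i^2}{\sum_{i=1}^K n_i}\,\tau_0^2(x),
\end{aligned}
\]
which implies the unbiasedness of $\hat\tau_0^2$.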
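The two coefficients in $E[SSA(x)]$ come from traces of $N - \mathbf{n}\mathbf{n}'/\mathbf{1}'\mathbf{n}$, and they can be verified exactly for any set of sample sizes. The following sketch uses only the diagonal of that matrix, which is all the traces need; the function name and the sample sizes are illustrative assumptions.

```python
from fractions import Fraction


def ssa_trace_coefficients(n):
    """Coefficients of sigma_0^2(x) and tau_0^2(x) in E[SSA(x)].

    With M = N - n n' / (1'n), the coefficient of sigma_0^2(x) is
    trace(M N^{-1}) and the coefficient of tau_0^2(x) is trace(M).
    """
    total = sum(n)
    # Diagonal entries of M: n_i - n_i^2 / total.
    diag_m = [Fraction(n_i) - Fraction(n_i * n_i, total) for n_i in n]
    # N^{-1} is diagonal, so trace(M N^{-1}) = sum_i M_ii / n_i.
    coef_sigma = sum(d / n_i for d, n_i in zip(diag_m, n))
    coef_tau = sum(diag_m)
    return coef_sigma, coef_tau


sizes = [3, 5, 8]
coef_sigma, coef_tau = ssa_trace_coefficients(sizes)
print(coef_sigma, coef_tau)
```

For these sizes the first coefficient equals $K - 1$ and the second equals $\big((\sum n_i)^2 - \sum n_i^2\big)/\sum n_i$, as in the display above.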
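We next prove the consistency of $\hat\sigma_0^2$. First note that (2.21) yields
\[
\hat\sigma_0^2 = \frac{\sum_{i=1}^K T_i}{\sum_{i=1}^K (n_i - 1)}, \quad \text{where } T_i = \sum_{j=1}^{n_i} \frac{2j - n_i - 1}{n_i}\, X_{i(j)}.
\]
Since, by the Cauchy-Schwarz inequality,
\[
\begin{aligned}
E[T_i^2] &= E\Big[\sum_{j=1}^{n_i} \frac{2j - n_i - 1}{n_i}\, X_{i(j)}\Big]^2
 \le \frac{1}{n_i}\sum_{j=1}^{n_i} (2j - n_i - 1)^2\; E\Big[\frac{1}{n_i}\sum_{j=1}^{n_i} X_{i(j)}^2\Big] \\
&= \frac{1}{n_i}\sum_{j=1}^{n_i} \big[4j^2 - 4j(n_i + 1) + (n_i + 1)^2\big]\, E\big[X_{i1}^2\big]
 = \frac{n_i^2 - 1}{3}\, E\big[X_{i1}^2\big],
\end{aligned}
\]
we have
\[
\sum_{K=1}^\infty \frac{\mathrm{Var}(T_K)}{\big(\sum_{i=1}^K (n_i - 1)\big)^2}
\le \sum_{K=1}^\infty \frac{E\big[T_K^2\big]}{\big(\sum_{i=1}^K (n_i - 1)\big)^2}
\le \frac{1}{3}\, E\big[X_{ij}^2\big] \sum_{K=1}^\infty \frac{n_K^2 - 1}{\big(\sum_{i=1}^K (n_i - 1)\big)^2} < \infty
\]
due to condition (2.17). Thus the consistency of $\hat\sigma_0^2$ follows from Kolmogorov's strong law of large numbers for independent but not identically distributed series. To show the consistency of $\hat\tau_0^2$, note the expression
\[
\frac{1}{\sum_{i=1}^K n_i} \int SSA(x)\,dx
= \frac{1}{\sum_{i=1}^K n_i} \sum_{i=1}^K n_i \int (S_i(x) - S_0(x))^2\,dx - \int \big(S_0(x) - \bar S(x)\big)^2\,dx.
\]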

 
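First, as $\int |x|\,d\bar S(x) \to \int |x|\,dS_0(x)$ by the strong law of large numbers and $\max_x |S_0(x) - \bar S(x)| \to 0$ (the Glivenko-Cantelli theorem), under condition (2.17) integration by parts gives
\[
\begin{aligned}
\int \big(S_0(x) - \bar S(x)\big)^2 dx
&= -2\int x \big(S_0(x) - \bar S(x)\big)\, d\big(S_0(x) - \bar S(x)\big) \\
&= 2\int x \big(S_0(x) - \bar S(x)\big)\, d\bar S(x) - 2\int x \big(S_0(x) - \bar S(x)\big)\, dS_0(x) \\
&\le 2\max_x \big|S_0(x) - \bar S(x)\big| \Big(\int |x|\, d\bar S(x) + \int |x|\, dS_0(x)\Big) \to 0.
\end{aligned}
\]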
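We next treat $\sum_{i=1}^K n_i \int (S_i(x) - S_0(x))^2\,dx \big/ \sum_{i=1}^K n_i$. Note that
\[
\begin{aligned}
\int (S_i(x) - S_0(x))^2\,dx
&= 2\int x\,(S_i(x) - S_0(x))\, d(F_i(x) - F_0(x)) \\
&= 2\int_0^\infty x\,(S_i(x) - S_0(x))\, d(F_i(x) - F_0(x)) + 2\int_{-\infty}^0 x\,(F_0(x) - F_i(x))\, d(F_i(x) - F_0(x)) \\
&= 2\Big[\int_0^\infty x S_i(x)\,dF_i(x) + \int_0^\infty x S_0(x)\,dF_0(x) - \int_{-\infty}^0 x F_i(x)\,dF_i(x) - \int_{-\infty}^0 x F_0(x)\,dF_0(x) \\
&\qquad + \int_{-\infty}^0 x F_0(x)\,dF_i(x) + \int_{-\infty}^0 x F_i(x)\,dF_0(x) - \int_0^\infty x S_0(x)\,dF_i(x) - \int_0^\infty x S_i(x)\,dF_0(x)\Big].
\end{aligned}
\]
Define $H_i(x) = S_i(x) I(x \ge 0) + F_i(x) I(x < 0)$ and $H_0(x) = S_0(x) I(x \ge 0) + F_0(x) I(x < 0)$. Then
\[
\begin{aligned}
\int (S_i(x) - S_0(x))^2\,dx
&= 2\Big[\int |x| H_i(x)\,dF_i(x) + \int |x| H_0(x)\,dF_0(x) - \int |x| H_0(x)\,dF_i(x) - \int |x| H_i(x)\,dF_0(x)\Big] \\
&\le 2\Big[\int |x| H_i(x)\,dF_i(x) + \int |x| H_0(x)\,dF_0(x)\Big].
\end{aligned}
\]
Thus
\[
\begin{aligned}
\mathrm{Var}\Big(n_i \int (S_i(x) - S_0(x))^2\,dx\Big)
&\le n_i^2\, E\Big[\int (S_i(x) - S_0(x))^2\,dx\Big]^2 \\
&\le 4 n_i^2\, E\Big[\int |x| H_i(x)\,dF_i(x) + \int |x| H_0(x)\,dF_0(x)\Big]^2 \\
&\le 8 n_i^2 \Big\{E\Big[\int |x| H_i(x)\,dF_i(x)\Big]^2 + E\Big[\int |x| H_0(x)\,dF_0(x)\Big]^2\Big\} \\
&\le 8 n_i^2\, E\Big[\int x^2\,dF_i(x) + \int x^2\,dF_0(x)\Big] = 16\, n_i^2\, E\big[X_{i1}^2\big].
\end{aligned}
\]
Again, by Kolmogorov's strong law of large numbers,
\[
\frac{1}{\sum_{i=1}^K n_i} \sum_{i=1}^K n_i \Big\{\int (S_i(x) - S_0(x))^2\,dx - E\int (S_i(x) - S_0(x))^2\,dx\Big\} \to 0
\]
almost surely under condition (2.17).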

A.1.3. Proof of Theorem 2.5


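Proof. Write $D = \int_{-\infty}^{+\infty} E\big[\hat{\hat S}(x,\Theta_i) - S(x,\Theta_i)\big]^2 dx - \int_{-\infty}^{+\infty} E\big[\hat S(x,\Theta_i) - S(x,\Theta_i)\big]^2 dx$. It can be rearranged as
\[
\begin{aligned}
D &= \int_{-\infty}^{+\infty} E\Big[\big(\hat{\hat S}(x,\Theta_i) + \hat S(x,\Theta_i) - 2S(x,\Theta_i)\big)\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)\Big] dx \\
&= \int_{-\infty}^{+\infty} E\Big[\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)^2\Big] dx
 + 2\int_{-\infty}^{+\infty} E\Big[\big(\hat S(x,\Theta_i) - S(x,\Theta_i)\big)\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)\Big] dx.
\end{aligned}
\]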

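Application of the triangle inequality and the Cauchy-Schwarz inequality yields
\[
\begin{aligned}
|D| &\le \int_{-\infty}^{+\infty} E\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)^2 dx
 + 2\Big\{\int_{-\infty}^{+\infty} E\big(\hat S(x,\Theta_i) - S(x,\Theta_i)\big)^2 dx \int_{-\infty}^{+\infty} E\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)^2 dx\Big\}^{1/2} \\
&= \int_{-\infty}^{+\infty} E\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)^2 dx
 + 2\sqrt{\frac{\sigma_0^2 \tau_0^2}{\sigma_0^2 + n\tau_0^2}}\,\Big\{\int_{-\infty}^{+\infty} E\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)^2 dx\Big\}^{1/2},
\end{aligned} \tag{A.5}
\]
where the equality is due to (2.9). By (A.5), it suffices to show that
\[
\lim_{K \to \infty} \max_{1 \le i \le K} \int_{-\infty}^{+\infty} E\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)^2 dx = 0. \tag{A.6}
\]
Note $|\hat Z_i - Z_i| \le 1$ and
\[
|\hat Z_i - Z_i|
= \Big|\frac{n\hat\tau_0^2}{\hat\sigma_0^2 + n\hat\tau_0^2} - \frac{n\tau_0^2}{\sigma_0^2 + n\tau_0^2}\Big|
= \frac{1}{n}\, \frac{\big|\sigma_0^2/\tau_0^2 - \hat\sigma_0^2/\hat\tau_0^2\big|}{\big(1 + \sigma_0^2/(n\tau_0^2)\big)\big(1 + \hat\sigma_0^2/(n\hat\tau_0^2)\big)}
\le \Big|\frac{\hat\sigma_0^2}{\hat\tau_0^2} - \frac{\sigma_0^2}{\tau_0^2}\Big|.
\]
It follows that
\[
\max_{1 \le i \le K} |\hat Z_i - Z_i| \le \min\Big\{\Big|\frac{\hat\sigma_0^2}{\hat\tau_0^2} - \frac{\sigma_0^2}{\tau_0^2}\Big|,\, 1\Big\} = A \text{ (say)}.
\]
Because $\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i) = (1 - \hat Z_i)\big[\hat S_0(x) - S_0(x)\big] + (\hat Z_i - Z_i)(S_i(x) - S_0(x))$, we see that
\[
\begin{aligned}
\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)^2
&\le 2(1 - \hat Z_i)^2 \Big(\frac{1}{K}\sum_{r=1}^K S_r(x) - S_0(x)\Big)^2 + 2(\hat Z_i - Z_i)^2 (S_i(x) - S_0(x))^2 \\
&\le \frac{2}{K^2}\Big(\sum_{r=1}^K (S_r(x) - S_0(x))\Big)^2 + 2A^2 (S_i(x) - S_0(x))^2.
\end{aligned}
\]
Consequently,
\[
\begin{aligned}
\max_{1 \le i \le K} \int_{-\infty}^{+\infty} E\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)^2 dx
&\le \max_{1 \le i \le K} \Big\{\frac{2}{K^2}\int_{-\infty}^{+\infty} E\Big(\sum_{r=1}^K (S_r(x) - S_0(x))\Big)^2 dx + 2\int_{-\infty}^{+\infty} E\big[A^2 (S_i(x) - S_0(x))^2\big] dx\Big\} \\
&= \frac{2}{K}\Big(\frac{\sigma_0^2}{n} + \tau_0^2\Big) + 2\max_{1 \le i \le K} \int_{-\infty}^{+\infty} E\big[A^2 (S_i(x) - S_0(x))^2\big] dx.
\end{aligned}
\]
By the Hölder inequality, for any $\delta > 0$, and because $|S_i(x) - S_0(x)| \le 1$,
\[
\begin{aligned}
\int_{-\infty}^{+\infty} E\big[A^2 (S_i(x) - S_0(x))^2\big] dx
&\le \big\{E[A^{2+\delta}]\big\}^{2/(2+\delta)} \int_{-\infty}^{+\infty} \big\{E\,|S_i(x) - S_0(x)|^{2(2+\delta)/\delta}\big\}^{\delta/(2+\delta)} dx \\
&\le \big\{E[A^{2+\delta}]\big\}^{2/(2+\delta)} \int_{-\infty}^{+\infty} \big\{E\,(S_i(x) - S_0(x))^2\big\}^{\delta/(2+\delta)} dx \\
&= \big\{E[A^{2+\delta}]\big\}^{2/(2+\delta)} \int_{-\infty}^{+\infty} \Big(\frac{\sigma_0^2(x)}{n} + \tau_0^2(x)\Big)^{\delta/(2+\delta)} dx.
\end{aligned}
\]
It follows that, under condition (2.25),
\[
\max_{1 \le i \le K} \int_{-\infty}^{+\infty} E\big(\hat{\hat S}(x,\Theta_i) - \hat S(x,\Theta_i)\big)^2 dx
\le \frac{2}{K}\Big(\frac{\sigma_0^2}{n} + \tau_0^2\Big) + 2\big\{E[A^{2+\delta}]\big\}^{2/(2+\delta)} \int_{-\infty}^{+\infty} \Big(\frac{\sigma_0^2(x)}{n} + \tau_0^2(x)\Big)^{\delta/(2+\delta)} dx \to 0 \quad \text{as } K \to \infty,
\]
since $\lim_{K \to \infty} E[A^{2+\delta}] = 0$ by Theorem 2.4.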
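The proof above relies on bounding the error in the estimated credibility factor by the error in the estimated ratio of structural parameters, capped at 1. A small numerical experiment can illustrate this bound. The sketch assumes the factor has the form $Z = n\tau_0^2/(\sigma_0^2 + n\tau_0^2)$; the function names and the random inputs are hypothetical and serve only as an illustration.

```python
import random


def credibility_factor(n, sigma2, tau2):
    """Credibility factor Z = n * tau2 / (sigma2 + n * tau2) (assumed form)."""
    return n * tau2 / (sigma2 + n * tau2)


def bound_holds(n, sigma2, tau2, sigma2_hat, tau2_hat, tol=1e-12):
    """Check |Z_hat - Z| <= min(|sigma2_hat/tau2_hat - sigma2/tau2|, 1)."""
    diff = abs(credibility_factor(n, sigma2_hat, tau2_hat)
               - credibility_factor(n, sigma2, tau2))
    a = min(abs(sigma2_hat / tau2_hat - sigma2 / tau2), 1.0)
    return diff <= a + tol  # tol absorbs floating-point rounding


rng = random.Random(0)
checks = [
    bound_holds(rng.randint(1, 50),
                rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0),
                rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0))
    for _ in range(10_000)
]
print(all(checks))
```

The random draws stand in for pairs of true and estimated structural parameters; they are not simulated from the model in the paper.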

A.2. Proofs for the Consistency of Experience Premiums


A.2.1. Dutch's Premium Principle

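Proof. Since $\big|(X_i - \alpha\hat\mu(\Theta))_+ - (X_i - \alpha\mu(\Theta))_+\big| \le \alpha\,|\hat\mu(\Theta) - \mu(\Theta)|$ and $\hat\mu(\Theta) = Z\bar X_n + (1 - Z)\mu_0 \to E[X|\Theta]$ a.s.,
\[
\Big|\frac{Z}{n}\sum_{i=1}^n (X_i - \alpha\hat\mu(\Theta))_+ - \frac{Z}{n}\sum_{i=1}^n (X_i - \alpha\mu(\Theta))_+\Big| \le \alpha Z\,|\hat\mu(\Theta) - \mu(\Theta)| \to 0 \quad \text{a.s.}
\]
Together with $Z \to 1$ and the strong law of large numbers, this gives
\[
\lim_{n \to \infty} \frac{Z}{n}\sum_{i=1}^n (X_i - \alpha\hat\mu(\Theta))_+
= \lim_{n \to \infty} Z \cdot \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^n (X_i - \alpha\mu(\Theta))_+
= E\big[(X - \alpha\mu(\Theta))_+ \,\big|\, \Theta\big]. \tag{A.7}
\]
It follows that
\[
\begin{aligned}
\int (x - \alpha\hat\mu(\Theta))_+\,dF_0(x)
&\le \int (x - \alpha\mu(\Theta))_+\,dF_0(x) + \int \big|(x - \alpha\hat\mu(\Theta))_+ - (x - \alpha\mu(\Theta))_+\big|\,dF_0(x) \\
&\le \int (x - \alpha\mu(\Theta))_+\,dF_0(x) + \int \alpha\,|\hat\mu(\Theta) - \mu(\Theta)|\,dF_0(x) \\
&= \int (x - \alpha\mu(\Theta))_+\,dF_0(x) + \alpha\,|\hat\mu(\Theta) - \mu(\Theta)|.
\end{aligned}
\]
Thus
\[
(1 - Z)\int (x - \alpha\hat\mu(\Theta))_+\,dF_0(x) \le (1 - Z)\Big[\int (x - \alpha\mu(\Theta))_+\,dF_0(x) + \alpha\,|\hat\mu(\Theta) - \mu(\Theta)|\Big] \to 0 \tag{A.8}
\]
almost surely as $Z \to 1$. Here (A.7) and (A.8) imply that
\[
\hat H(X|\Theta) = \hat\mu(\Theta) + \theta\Big[\frac{Z}{n}\sum_{i=1}^n (X_i - \alpha\hat\mu(\Theta))_+ + (1 - Z)\int (x - \alpha\hat\mu(\Theta))_+\,dF_0(x)\Big]
\]
converges to $E[X|\Theta] + \theta\,E\big[(X - \alpha\mu(\Theta))_+ \,\big|\, \Theta\big]$ almost surely. This completes the proof.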
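The estimator whose consistency is proved above mixes an individual and a collective excess term. A minimal sketch of how it might be evaluated follows. It assumes the general Dutch form $H(X) = E[X] + \theta E[(X - \alpha E[X])_+]$, with both parameters defaulting to 1, and it approximates the integral against $F_0$ by an average over a sample from the collective; the function name, the parameter names, and the use of a collective sample are all assumptions made for illustration.

```python
def dutch_credibility_premium(xs, z, mu0, collective_sample,
                              alpha=1.0, theta=1.0):
    """Credibility estimate of a Dutch-type premium (illustrative sketch).

    xs                : individual claims X_1, ..., X_n
    z                 : credibility factor Z in [0, 1]
    mu0               : collective mean mu_0
    collective_sample : draws standing in for the collective law F_0
    alpha, theta      : assumed Dutch-principle parameters
    """
    n = len(xs)
    # Credibility estimate of the individual mean.
    mu_hat = z * sum(xs) / n + (1 - z) * mu0
    # Empirical excess over alpha * mu_hat for the individual data.
    individual = sum(max(x - alpha * mu_hat, 0.0) for x in xs) / n
    # Sample average approximating the integral against F_0.
    collective = sum(max(x - alpha * mu_hat, 0.0)
                     for x in collective_sample) / len(collective_sample)
    return mu_hat + theta * (z * individual + (1 - z) * collective)


print(dutch_credibility_premium([0.0, 2.0], z=1.0, mu0=5.0,
                                collective_sample=[1.0]))
```

With `z=1.0` the collective term drops out, which mirrors the limit $Z \to 1$ used in the proof.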
A.2.2. Distortion Premium Principle



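Proof. First, note that for any $S(x)$, $H(S(x))$ can be represented also as $H(S(x)) = \int x\,g'(S(x))\,dF(x)$. We thus have
\[
\begin{aligned}
\hat H(X|\Theta) - H(X|\Theta)
&= \int x\,g'\big(ZS_n(x) + (1 - Z)S_0(x)\big)\,d\big(ZF_n(x) + (1 - Z)F_0(x)\big) - \int x\,g'(S(x,\Theta))\,dF(x,\Theta) \\
&= \int x\,\big[g'\big(ZS_n(x) + (1 - Z)S_0(x)\big) - g'(S(x,\Theta))\big]\,d\big(ZF_n(x) + (1 - Z)F_0(x)\big) \\
&\quad + \int x\,g'(S(x,\Theta))\,d\big(ZF_n(x) + (1 - Z)F_0(x)\big) - \int x\,g'(S(x,\Theta))\,dF(x,\Theta).
\end{aligned}
\]
Therefore, the consistency follows from Theorem 2.2, the strong law of large numbers, and $Z \to 1$ as $n \to \infty$:
\[
\begin{aligned}
&\Big|\int x\,\big[g'\big(ZS_n(x) + (1 - Z)S_0(x)\big) - g'(S(x,\Theta))\big]\,d\big(ZF_n(x) + (1 - Z)F_0(x)\big)\Big| \\
&\quad \le \int |x|\,\big|g'\big(ZS_n(x) + (1 - Z)S_0(x)\big) - g'(S(x,\Theta))\big|\,d\big(ZF_n(x) + (1 - Z)F_0(x)\big) \\
&\quad \le C \max_x \big|ZS_n(x) + (1 - Z)S_0(x) - S(x,\Theta)\big| \int |x|\,\big|d\big(ZF_n(x) + (1 - Z)F_0(x)\big)\big| \to 0
\end{aligned}
\]
and
\[
\int x\,g'(S(x,\Theta))\,d\big(ZF_n(x) + (1 - Z)F_0(x)\big) - \int x\,g'(S(x,\Theta))\,dF(x,\Theta) \to 0.
\]
Consequently, $\hat H(X|\Theta) \to H(X|\Theta)$ almost surely.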

A.3. Proofs for Section 3.2


A.3.1. Proof of Example 3.1

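Proof. By the modeling assumptions it is easy to see $\mu(\Theta) = 1/\Theta$, $\mu_0 = \beta/(\alpha - 1)$, $\sigma^2 = \beta^2/[(\alpha - 1)(\alpha - 2)]$, and $\tau^2 = \beta^2/[(\alpha - 1)^2(\alpha - 2)]$. Hence $Z^c = n/(n + \alpha - 1)$ and $\hat\mu^c(\Theta) = (n\bar X_n + \beta)/(n + \alpha - 1)$. On the other hand, as $S_0(x) = (\beta/(\beta + x))^\alpha$ and $E[S(x,\Theta)^2] = (\beta/(\beta + 2x))^\alpha$, we have
\[
\tau_0^2(x) = \Big(\frac{\beta}{\beta + 2x}\Big)^{\alpha} - \Big(\frac{\beta}{\beta + x}\Big)^{2\alpha}
\quad \text{and} \quad
\sigma_0^2(x) = \Big(\frac{\beta}{\beta + x}\Big)^{\alpha} - \Big(\frac{\beta}{\beta + 2x}\Big)^{\alpha},
\]
implying $\tau_0^2 = \beta/[2(2\alpha - 1)(\alpha - 1)]$ and $\sigma_0^2 = \beta/[2(\alpha - 1)]$. The credibility factor and the estimator of (3.3) are then given, respectively, by
\[
Z = \frac{n}{n + 2\alpha - 1} \quad \text{and} \quad \hat\mu(\Theta) = \frac{n}{n + 2\alpha - 1}\,\bar X_n + \frac{2\alpha - 1}{n + 2\alpha - 1} \cdot \frac{\beta}{\alpha - 1}.
\]
Hence the expected squared errors are given by (3.6).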
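The closed forms for $\tau_0^2$, $\sigma_0^2$ and $Z$ can be cross-checked by numerical integration. The sketch below assumes that $\sigma_0^2$ and $\tau_0^2$ are the integrals over $x \ge 0$ of $\sigma_0^2(x) = S_0(x) - E[S(x,\Theta)^2]$ and $\tau_0^2(x) = E[S(x,\Theta)^2] - S_0(x)^2$, with $S_0(x) = (\beta/(\beta+x))^\alpha$ and $E[S(x,\Theta)^2] = (\beta/(\beta+2x))^\alpha$ as in the proof. The helper names and the parameter values are illustrative.

```python
def integrate_0_inf(f, steps=100_000):
    """Midpoint rule on (0, 1) after the substitution x = t / (1 - t)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        x = t / (1.0 - t)
        total += f(x) / (1.0 - t) ** 2  # dx = dt / (1 - t)^2
    return total * h


def structural_integrals(alpha, beta):
    """Numerically integrate sigma_0^2(x) and tau_0^2(x) over x >= 0."""
    s0 = lambda x: (beta / (beta + x)) ** alpha        # E[S(x, Theta)]
    s2 = lambda x: (beta / (beta + 2 * x)) ** alpha    # E[S(x, Theta)^2]
    sigma0 = integrate_0_inf(lambda x: s0(x) - s2(x))
    tau0 = integrate_0_inf(lambda x: s2(x) - s0(x) ** 2)
    return sigma0, tau0


alpha, beta, n = 3.0, 2.0, 10
sigma0, tau0 = structural_integrals(alpha, beta)
z_numeric = n * tau0 / (n * tau0 + sigma0)
z_closed = n / (n + 2 * alpha - 1)
print(sigma0, tau0, z_numeric, z_closed)
```

The substitution maps the half-line to the unit interval, so no truncation point has to be chosen for the heavy-tailed integrands.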


A.3.2. Proof of Example 3.3


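Proof. First, $\pi(\theta|\mathbf{X}_n) \propto \pi(\theta)\prod_{i=1}^n f(X_i, \theta) \propto \theta^{\sum_{i=1}^n X_i}(1 - \theta)^{n - \sum_{i=1}^n X_i} \sim \mathrm{Beta}\big(\sum_{i=1}^n X_i + 1,\; n - \sum_{i=1}^n X_i + 1\big)$. Next, as $E[Xe^{hX}|\Theta] = \Theta e^h$ and $m_h(\Theta) = E[e^{hX}|\Theta] = 1 + \Theta(e^h - 1)$, the Bayes premium can be given by
\[
H_B(\mathbf{X}_n) = \frac{E[X_{n+1}e^{hX_{n+1}}|\mathbf{X}_n]}{E[e^{hX_{n+1}}|\mathbf{X}_n]}
= \frac{E[\Theta e^h|\mathbf{X}_n]}{E[1 + \Theta(e^h - 1)|\mathbf{X}_n]}
= \frac{(n\bar X_n + 1)e^h}{(n\bar X_n + 1)(e^h - 1) + n + 2}.
\]
Further note that
\[
S_0(x) = \begin{cases} 1, & \text{if } x < 0, \\ 1/2, & \text{if } 0 \le x < 1, \\ 0, & \text{if } x \ge 1, \end{cases}
\qquad
E\big[S(x,\Theta)^2\big] = \begin{cases} 1, & \text{if } x < 0, \\ 1/3, & \text{if } 0 \le x < 1, \\ 0, & \text{if } x \ge 1, \end{cases}
\]
$\tau_0^2 = 1/12$, $\sigma_0^2 = 1/6$, and $Z = n/(n + 2)$. Thus $H_{cu}(\mathbf{X}_n) = \big(\sum_{i=1}^n X_i e^{hX_i} + e^h\big)\big/\big(\sum_{i=1}^n e^{hX_i} + 1 + e^h\big)$. Because $X_i$ takes values 0 and 1 only, we have $X_i e^{hX_i} = X_i e^h$ and $e^{hX_i} = e^h X_i + 1 - X_i$. Straightforward computation then gives $H_{cu}(\mathbf{X}_n) = H_B(\mathbf{X}_n)$.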
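The equality of the two premiums for 0/1 data is easy to confirm numerically. The sketch below implements the two closed-form expressions from the proof; the function names and the sample data are illustrative assumptions.

```python
import math


def bayes_premium(xs, h):
    """Bayes premium H_B for 0/1 data under the Beta posterior of the proof."""
    n = len(xs)
    successes = sum(xs)  # equals n * mean(xs)
    eh = math.exp(h)
    return (successes + 1) * eh / ((successes + 1) * (eh - 1) + n + 2)


def credibility_premium(xs, h):
    """Premium H_cu computed from the credibility distribution estimate."""
    eh = math.exp(h)
    numerator = sum(x * math.exp(h * x) for x in xs) + eh
    denominator = sum(math.exp(h * x) for x in xs) + 1 + eh
    return numerator / denominator


data = [1, 0, 0, 1, 1, 0, 1]
for h in (0.1, 0.5, 1.0):
    print(h, bayes_premium(data, h), credibility_premium(data, h))
```

For each value of `h` the two printed premiums should agree up to rounding, since the algebra in the proof holds for every 0/1 sample.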

A.3.3. The Monte Carlo Approximation of Pan's Credibility Premiums under the Esscher Principle

Algorithm A.1. The algorithm comprises the following steps:



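Step 1. Randomly sample $m$ values $\theta_i$, $i = 1, 2, \ldots, m$, from the distribution $\tilde\pi = \mathrm{Gamma}(\alpha, \beta - e^h + 1)$ and compute their sample mean $\bar\theta = \frac{1}{m}\sum_{i=1}^m \theta_i$;
Step 2. For each $\theta_i$, generate $r$ samples, each of which consists of $n$ i.i.d. values from $\mathrm{Poisson}(\theta_i)$: $\{x_{ij1}, x_{ij2}, \ldots, x_{ijn}\}$, $j = 1, 2, \ldots, r$. For each $j$, compute $H_{ij} = \sum_{s=1}^n x_{ijs}e^{hx_{ijs}} \big/ \sum_{s=1}^n e^{hx_{ijs}}$ and let $U_i = \bar H_i = \frac{1}{r}\sum_{j=1}^r H_{ij}$ and $V_i = (r - 1)^{-1}\sum_{j=1}^r (H_{ij} - \bar H_i)^2$.
Step 3. Let $a = e^h (m - 1)^{-1}\sum_{i=1}^m (\theta_i - \bar\theta)U_i$, $b = (m - 1)^{-1}\sum_{i=1}^m (U_i - \bar U)^2$, $c = m^{-1}\sum_{i=1}^m V_i$, and $d = m^{-1}\sum_{i=1}^m U_i$. Then $Z^{(P)} \approx a/(b + c)$ and $E[h_n(\Theta)] \approx d$.
Step 4. Finally, the credibility estimator $H_P(\mathbf{X}_n)$ can be computed by
\[
H_P(\mathbf{X}_n) \approx \frac{a}{b + c} \cdot \frac{\sum_{i=1}^n X_i e^{hX_i}}{\sum_{i=1}^n e^{hX_i}} + \frac{\alpha e^h}{\beta - e^h + 1} - \frac{ad}{b + c}. \tag{A.9}
\]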
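The four steps of Algorithm A.1 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the Gamma distribution is taken to have shape $\alpha$ and rate $\beta - e^h + 1$, the Poisson sampler is a simple multiplication method suitable only for small means, and the function names, default sample sizes and seed are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Multiplication-method Poisson sampler; adequate for small lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def esscher_ratio(xs, h):
    """Empirical Esscher functional: sum x e^{hx} / sum e^{hx}."""
    weights = [math.exp(h * x) for x in xs]
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)


def pan_premium_mc(xs, alpha, beta, h, m=200, r=50, seed=0):
    """Monte Carlo approximation of (A.9), following Steps 1 to 4."""
    rng = random.Random(seed)
    n = len(xs)
    eh = math.exp(h)
    rate = beta - eh + 1            # assumed rate of the Gamma distribution
    if rate <= 0:
        raise ValueError("beta - e^h + 1 must be positive")
    # Step 1: sample theta_i; gammavariate takes (shape, scale).
    thetas = [rng.gammavariate(alpha, 1.0 / rate) for _ in range(m)]
    theta_bar = sum(thetas) / m
    # Step 2: r Poisson samples of size n for each theta_i.
    u, v = [], []
    for theta in thetas:
        hs = [esscher_ratio([sample_poisson(theta, rng) for _ in range(n)], h)
              for _ in range(r)]
        h_bar = sum(hs) / r
        u.append(h_bar)
        v.append(sum((x - h_bar) ** 2 for x in hs) / (r - 1))
    # Step 3: moment estimates a, b, c, d.
    u_bar = sum(u) / m
    a = eh * sum((t - theta_bar) * ui for t, ui in zip(thetas, u)) / (m - 1)
    b = sum((ui - u_bar) ** 2 for ui in u) / (m - 1)
    c = sum(v) / m
    d = u_bar
    # Step 4: plug into (A.9).
    z = a / (b + c)
    return z * esscher_ratio(xs, h) + alpha * eh / rate - z * d


print(pan_premium_mc([1, 0, 2, 3, 1], alpha=3.0, beta=2.0, h=0.1))
```

Fixing the seed makes repeated runs reproducible; larger `m` and `r` reduce the Monte Carlo error at a cost proportional to `m * r * n` Poisson draws.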

A.3.4. Proof of Equalities (3.12) and (3.13)



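Proof. Since $S(x,\Theta) = \sum_{k=x+1}^\infty \Theta^k e^{-\Theta}/k!$, we can write
\[
E[S(x,\Theta)] = \frac{1}{\Gamma(\alpha)}\sum_{k=x+1}^\infty \frac{\beta^\alpha\,\Gamma(\alpha + k)}{k!\,(\beta + 1)^{\alpha + k}}
\quad \text{and} \quad
E\big[S(x,\Theta)^2\big] = \frac{1}{\Gamma(\alpha)}\sum_{i=x+1}^\infty \sum_{j=x+1}^\infty \frac{\beta^\alpha\,\Gamma(\alpha + i + j)}{i!\,j!\,(\beta + 2)^{\alpha + i + j}}.
\]
Consequently,
\[
\sum_{x=0}^\infty E[S(x,\Theta)]
= \frac{1}{\Gamma(\alpha)}\sum_{x=0}^\infty \sum_{k=x+1}^\infty \frac{\beta^\alpha\,\Gamma(\alpha + k)}{k!\,(\beta + 1)^{\alpha + k}}
= \frac{\alpha}{\beta}\sum_{k=0}^\infty \frac{\Gamma(\alpha + 1 + k)}{k!\,\Gamma(\alpha + 1)}\Big(\frac{\beta}{\beta + 1}\Big)^{\alpha + 1}\Big(\frac{1}{\beta + 1}\Big)^{k}
= \frac{\alpha}{\beta},
\]
where the last equality holds because the summands are the probabilities of a negative binomial distribution. It follows that
\[
\sigma_0^2 = \sum_{x=0}^\infty \Big\{E[S(x,\Theta)] - E\big[S(x,\Theta)^2\big]\Big\}
= \frac{\alpha}{\beta} - \sum_{i=1}^\infty \sum_{j=1}^\infty \frac{\min(i, j)\,\beta^\alpha\,\Gamma(\alpha + i + j)}{i!\,j!\,\Gamma(\alpha)\,(\beta + 2)^{\alpha + i + j}}
\]
and
\[
\begin{aligned}
\tau_0^2 &= \sum_{x=0}^\infty \mathrm{Var}(S(x,\Theta)) = \sum_{x=0}^\infty E\big[S(x,\Theta)^2\big] - \sum_{x=0}^\infty \{E[S(x,\Theta)]\}^2 \\
&= \sum_{x=0}^\infty \sum_{i=x+1}^\infty \sum_{j=x+1}^\infty \frac{\beta^\alpha\,\Gamma(\alpha + i + j)}{i!\,j!\,\Gamma(\alpha)\,(\beta + 2)^{\alpha + i + j}}
 - \sum_{x=0}^\infty \Big(\sum_{k=x+1}^\infty \frac{\beta^\alpha\,\Gamma(\alpha + k)}{k!\,\Gamma(\alpha)\,(\beta + 1)^{\alpha + k}}\Big)^2 \\
&= \sum_{x=0}^\infty \sum_{i=x+1}^\infty \sum_{j=x+1}^\infty \frac{\beta^\alpha\,\Gamma(\alpha + i + j)}{i!\,j!\,\Gamma(\alpha)\,(\beta + 2)^{\alpha + i + j}}
 - \sum_{x=0}^\infty \sum_{i=x+1}^\infty \sum_{j=x+1}^\infty \frac{\beta^{2\alpha}\,\Gamma(\alpha + i)\,\Gamma(\alpha + j)}{i!\,j!\,\Gamma(\alpha)^2\,(\beta + 1)^{2\alpha + i + j}} \\
&= \sum_{x=0}^\infty \sum_{i=x+1}^\infty \sum_{j=x+1}^\infty \frac{\beta^\alpha}{i!\,j!\,\Gamma(\alpha)}\Big[\frac{\Gamma(\alpha + i + j)}{(\beta + 2)^{\alpha + i + j}} - \frac{\beta^\alpha\,\Gamma(\alpha + i)\,\Gamma(\alpha + j)}{\Gamma(\alpha)\,(\beta + 1)^{2\alpha + i + j}}\Big] \\
&= \sum_{i=1}^\infty \sum_{j=1}^\infty \sum_{x=0}^{\min(i,j) - 1} \frac{\beta^\alpha}{i!\,j!\,\Gamma(\alpha)}\Big[\frac{\Gamma(\alpha + i + j)}{(\beta + 2)^{\alpha + i + j}} - \frac{\beta^\alpha\,\Gamma(\alpha + i)\,\Gamma(\alpha + j)}{\Gamma(\alpha)\,(\beta + 1)^{2\alpha + i + j}}\Big] \\
&= \sum_{i=1}^\infty \sum_{j=1}^\infty \frac{\min(i, j)\,\beta^\alpha}{i!\,j!\,\Gamma(\alpha)}\Big[\frac{\Gamma(\alpha + i + j)}{(\beta + 2)^{\alpha + i + j}} - \frac{\beta^\alpha\,\Gamma(\alpha + i)\,\Gamma(\alpha + j)}{\Gamma(\alpha)\,(\beta + 1)^{2\alpha + i + j}}\Big].
\end{aligned}
\]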
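The double series for $\sigma_0^2$ and $\tau_0^2$ converge geometrically, so truncated sums give accurate values. The sketch below evaluates them on the log scale, assuming a Poisson model with a $\mathrm{Gamma}(\alpha, \beta)$ prior in the rate parametrization (which is what the formulas above imply). The function names and the truncation point `n_terms` are illustrative choices.

```python
import math


def log_p(k, alpha, beta):
    """log of beta^a Gamma(a+k) / (k! Gamma(a) (beta+1)^(a+k))."""
    return (alpha * math.log(beta) + math.lgamma(alpha + k)
            - math.lgamma(k + 1) - math.lgamma(alpha)
            - (alpha + k) * math.log(beta + 1))


def log_q(i, j, alpha, beta):
    """log of beta^a Gamma(a+i+j) / (i! j! Gamma(a) (beta+2)^(a+i+j))."""
    return (alpha * math.log(beta) + math.lgamma(alpha + i + j)
            - math.lgamma(i + 1) - math.lgamma(j + 1) - math.lgamma(alpha)
            - (alpha + i + j) * math.log(beta + 2))


def structural_parameters(alpha, beta, n_terms=150):
    """Truncated evaluation of sigma_0^2, tau_0^2 and sum_x E[S(x, Theta)]."""
    p = [math.exp(log_p(k, alpha, beta)) for k in range(n_terms + 1)]
    mean = sum(k * p[k] for k in range(1, n_terms + 1))  # should be alpha/beta
    d = 0.0    # sum_{i,j} min(i,j) q_ij   = sum_x E[S(x, Theta)^2]
    pp = 0.0   # sum_{i,j} min(i,j) p_i p_j = sum_x {E[S(x, Theta)]}^2
    for i in range(1, n_terms + 1):
        for j in range(1, n_terms + 1):
            m = min(i, j)
            d += m * math.exp(log_q(i, j, alpha, beta))
            pp += m * p[i] * p[j]
    sigma0 = alpha / beta - d
    tau0 = d - pp
    return sigma0, tau0, mean


sigma0, tau0, mean = structural_parameters(3.0, 2.0)
print(sigma0, tau0, mean)
```

Working with `math.lgamma` avoids overflow in the Gamma functions and factorials for large indices.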


A.3.5. Proof of Equality (3.14)


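Proof. It is easy to see that the Bayes premium is
\[
H_B(\mathbf{X}_n) = \frac{(\alpha + n\bar X_n)\,e^h}{\beta + n - e^h + 1}. \tag{A.10}
\]
With moments taken under $\tilde\pi = \mathrm{Gamma}(\alpha, \beta - e^h + 1)$, it is not difficult to calculate $\mathrm{Cov}_{\tilde\pi}(H(X|\Theta), \mu(\Theta)) = \alpha e^h/(\beta - e^h + 1)^2$, $\mathrm{Var}_{\tilde\pi}(\mu(\Theta)) = \alpha/(\beta - e^h + 1)^2$ and $E_{\tilde\pi}[\mu(\Theta)] = E_{\tilde\pi}[\mathrm{Var}(X|\Theta)] = \alpha/(\beta - e^h + 1)$. Therefore,
\[
Z_G = \frac{\mathrm{Cov}_{\tilde\pi}(H(X|\Theta), \mu(\Theta))}{\mathrm{Var}_{\tilde\pi}(\mu(\Theta)) + \frac{1}{n}E_{\tilde\pi}[\mathrm{Var}(X|\Theta)]} = \frac{n e^h}{\beta + n - e^h + 1},
\]
and the credibility estimator of Gerber's is given by
\[
H_G(\mathbf{X}_n) = Z_G \bar X_n + H_{col}(X) - Z_G\,E_{\tilde\pi}[\mu(\Theta)] = \frac{(\alpha + n\bar X_n)\,e^h}{\beta + n - e^h + 1}. \tag{A.11}
\]
From (A.10) and (A.11), we have $H_G(\mathbf{X}_n) = H_B(\mathbf{X}_n)$.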
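The coincidence of the two premiums can also be confirmed numerically from the ingredients listed in the proof. The sketch assumes that the covariance, variance and mean are those of a Gamma distribution with shape $\alpha$ and rate $\beta - e^h + 1$, and that the collective premium is $H_{col}(X) = \alpha e^h/(\beta - e^h + 1)$; the function names and parameter values are illustrative.

```python
import math


def bayes_esscher_premium(alpha, beta, h, n, xbar):
    """Bayes premium (A.10)."""
    return (alpha + n * xbar) * math.exp(h) / (beta + n - math.exp(h) + 1)


def gerber_premium(alpha, beta, h, n, xbar):
    """Gerber-type credibility premium (A.11) built from its ingredients."""
    eh = math.exp(h)
    rate = beta - eh + 1                 # assumed rate of the Gamma law
    cov = alpha * eh / rate ** 2         # Cov(H(X|Theta), mu(Theta))
    var_mu = alpha / rate ** 2           # Var(mu(Theta))
    mean_mu = alpha / rate               # E[mu(Theta)] = E[Var(X|Theta)]
    z_g = cov / (var_mu + mean_mu / n)   # credibility factor Z_G
    h_col = alpha * eh / rate            # assumed collective premium
    return z_g * xbar + h_col - z_g * mean_mu


args = dict(alpha=3.0, beta=2.0, h=0.2, n=10, xbar=1.7)
print(bayes_esscher_premium(**args), gerber_premium(**args))
```

Both functions require $\beta - e^h + 1 > 0$, so `h` must be small enough relative to `beta`.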