
Insurance Analytics

Prof. Julien Trufin

Academic year 2020-2021

Performance evaluation

Introduction

Context

• Actuaries resort to advanced statistical tools to accurately assess the risk
profile of the policyholders.
• If the data are subdivided into groups determined by many features, actuaries
are often faced with sparsely populated risk classes so that simple averages
become suspect and regression models are needed.
• Regression models predict a response variable from a function of features.
• Actuarial pricing models are generally calibrated so that a measure of the
goodness-of-fit is optimized (deviance or log-likelihood, in most cases).
- These criteria include in-sample errors and out-of-sample errors (predictive
performance criteria).


• In this chapter, we aim to evaluate the performance of a candidate premium
based on the following two aspects :
- the variability of the resulting premium amounts, as larger premium
differentials induce more lift.
- the ability of the premium income to match the true one for increasing risk
profiles.
• The first objective can be formalized with the help of the convex order that
can be characterized by means of the Lorenz curves.
• The second objective can be assessed by means of concentration curves.

True and working pure premiums

Regression function

• Consider a response Y and a set of features X1 , . . . , Xp gathered in the
vector X .
• The dependence structure inside the random vector (Y , X1 , . . . , Xp ) is
exploited to extract the information contained in X about Y .
• In actuarial pricing, the target is µ(X ) = E[Y |X ].
• µ(X ) is generally unknown and approximated by a (working, or actual)
premium π(X ).
• The merits of a given pricing tool can be assessed using the pair
(µ(X), π(X)).
- The premium π(X ) has to be as close as possible to the true premium µ(X ).


Technical assumptions

• All predictors π(X ) under consideration and µ(X ) are continuous random
variables.
• A predictor π(X ) is supposed to be correct on average, that is,

E[π(X )] = E[µ(X )] = E[Y ].


Notation

• Fπ (t) the distribution function of π(X ), i.e.

Fπ (t) = P[π(X ) ≤ t], t ≥ 0,

• fπ the probability density function of π(X ), i.e.

Fπ(t) = ∫₀ᵗ fπ(s) ds, t ≥ 0.

• Fπ⁻¹ the associated quantile function (or Value-at-Risk), defined as the
generalized inverse of Fπ , i.e.

Fπ⁻¹(α) = inf{t | Fπ(t) ≥ α} for a probability level α.

• Our continuity assumption ensures that the identity Fπ(Fπ⁻¹(α)) = α holds
true for all probability levels α.
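As an aside, the generalized inverse can be sketched for an empirical distribution in a few lines of Python (the function name and the sample are ours, purely illustrative):

```python
import numpy as np

def quantile_fn(sample, alpha):
    """Generalized inverse of the empirical cdf: inf{t | F(t) >= alpha}."""
    s = np.sort(np.asarray(sample, dtype=float))
    # F(s[k]) = (k + 1) / n, so the smallest t with F(t) >= alpha is s[ceil(n * alpha) - 1].
    k = int(np.ceil(alpha * s.size)) - 1
    return s[max(k, 0)]

print(quantile_fn([3.0, 1.0, 2.0, 4.0], 0.5))   # -> 2.0
```

Here F(2) = 0.5 ≥ 0.5 while F(1) = 0.25 < 0.5, so the infimum is attained at 2.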


Convex order

• The more π(X ) is dispersed, the more information it contains about the true
premium.
- The constant predictor π(X ) = E[Y ], the least dispersed one, does not bring
any information about the relative riskiness of the different policies.
• Definition :
Consider two non-negative random variables Z1 and Z2 . Then, Z1 is said to
be smaller than Z2 in the convex order, henceforth denoted as Z1 ≼cx Z2 , if

E[g(Z1 )] ≤ E[g(Z2 )]

for all the convex functions g for which the expectations exist.
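A quick Monte Carlo illustration (a simulation sketch, not part of the course material): two Gamma variables with common mean 1 but different variances satisfy the defining inequality for typical convex test functions g.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Same mean k * s = 1, different variances k * s^2: 0.5 versus 2.0.
z1 = rng.gamma(shape=2.0, scale=0.5, size=n)
z2 = rng.gamma(shape=0.5, scale=2.0, size=n)

# Z1 ≼cx Z2, so E[g(Z1)] <= E[g(Z2)] for convex g (checked here by simulation).
for g in (np.square, lambda z: np.maximum(z - 1.0, 0.0), lambda z: z ** 4):
    assert g(z1).mean() <= g(z2).mean()

# In particular, the variances are ordered.
assert z1.var() < z2.var()
```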


• We have
Z1 ≼cx Z2 ⇒ V[Z1 ] ≤ V[Z2 ].
⇒ ≼cx is a variability order : it only applies to random variables with the
same expected value and compares the dispersion of these variables.
• We can interpret Z1 ≼cx Z2 as “Z2 is more variable than Z1 ”.
- The variability in question extends beyond a simple comparison of standard
deviations.

Performance curves

Concentration curve

• Definition :
The concentration curve of the true premium µ(X ) with respect to the
working premium π(X ) is defined as

α ↦ CC[µ(X), π(X); α] = E[µ(X) I[π(X) ≤ Fπ⁻¹(α)]] / E[µ(X)].

• Interpretation :
CC[µ(X), π(X); α] represents the proportion of the total true premium
income corresponding to the sub-portfolio π(X) ≤ Fπ⁻¹(α), i.e. to the
100α% of contracts with the smallest premiums π.


• Idea :
Policies with low-risk profiles are at risk of leaving the portfolio, being
attracted by a competitor.
- It is therefore important not to over-charge this group of policyholders.
- Hence the importance of the concentration curve to assess the appropriateness
of the premium π.


From premiums to ranks

• Notice that
π(X) ≤ Fπ⁻¹(α) ⇔ Fπ(π(X)) ≤ α.

- It is enough to consider the ranking induced by the predictor.
- We are free to replace every predictor π(X) with the corresponding rank

Π = Fπ(π(X)) ∼ Uni(0, 1).

• Π is the rank of a policyholder, once all contracts have been ordered
according to their corresponding premiums.
• A concentration curve alone is not enough to assess the performance of π :
it only depends on the rank induced by π, not on the actual values of π.
⇒ We also rely on the Lorenz curve.
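In practice, the ranks Π can be computed directly from the premiums; a minimal numpy sketch (the helper name is ours):

```python
import numpy as np

def uniform_ranks(pi_hat):
    """Pi_i = F_pi(pi(X_i)): empirical ranks of the premiums, scaled to (0, 1]."""
    pi_hat = np.asarray(pi_hat, dtype=float)
    ranks = np.argsort(np.argsort(pi_hat))   # 0-based position of each premium
    return (ranks + 1) / pi_hat.size

print(uniform_ranks([120.0, 80.0, 150.0, 95.0]))   # ranks 0.75, 0.25, 1.0, 0.5
```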


Lorenz curve

• Definition :
The Lorenz curve LC associated with the predictor π(X ) is defined as

α ↦ LC[π(X); α] = CC[π(X), π(X); α]
                = E[π(X) I[π(X) ≤ Fπ⁻¹(α)]] / E[π(X)].

• Interpretation :
A Lorenz curve is thus strictly related to dispersion (or variability) by
definition.
- It is known that increasing the predictor π(X ) in the convex order moves its
Lorenz curve lower.


Concentration curve and Lorenz curve

• If π(X ) = µ(X ) then

LC[π(X ); α] = CC[µ(X ), π(X ); α]

for all probability levels α.


- The sub-portfolio corresponding to π(X) ≤ Fπ⁻¹(α) is in equilibrium.
• In practice, π(X) only approximates µ(X) : π(X) ≠ µ(X).
⇒ We resort to the pair of curves CC[µ(X ), π(X ); α] and LC[π(X ); α] to
evaluate performance of a pricing model.
• A large difference between the two performance curves thus suggests that
π(X ) poorly approximates µ(X ).


Estimating the concentration curve


 
• µ(X) is not observed in reality ⇒ how can we estimate CC[µ(X), π(X); α] ?
• Actually, from the law of total expectation :

CC[µ(X), π(X); α] = CC[Y, π(X); α] = E[Y I[π(X) ≤ Fπ⁻¹(α)]] / E[Y].

⇒ A concentration curve can also be interpreted as the proportion of the
total losses Y attributable to the sub-portfolio gathering a given proportion
α of policies with the lowest predictions.


• We can equivalently replace the pure premium µ(X ) with the response Y in
the concentration curve.
• Assuming the samples (Yi , X i ), i = 1, . . . , n, to be iid, the concentration
curve can be estimated as follows :

ĈC[µ(X), π(X); α] = ĈC[Y, π(X); α]
                  = (1 / (n Ȳ)) Σ_{i : π̂(X i) ≤ F̂π⁻¹(α)} Yi
                  = ( Σ_{i : π̂(X i) ≤ F̂π⁻¹(α)} Yi ) / ( Σ_{i=1}^n Yi ).

• ĈC expresses the total sub-portfolio loss in relative terms, as a percentage of
the aggregate loss at the entire portfolio level.
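The estimator above translates almost literally into code; a sketch (we use np.quantile as the empirical F̂π⁻¹, which differs from the generalized inverse only by its interpolation convention):

```python
import numpy as np

def concentration_curve(y, pi_hat, alpha):
    """Share of the total losses carried by the 100*alpha% smallest premiums."""
    y, pi_hat = np.asarray(y, dtype=float), np.asarray(pi_hat, dtype=float)
    cutoff = np.quantile(pi_hat, alpha)          # empirical F_pi^{-1}(alpha)
    return y[pi_hat <= cutoff].sum() / y.sum()

# Half of the portfolio (the two smallest premiums) carries 1 of 4 claims:
print(concentration_curve([0, 1, 0, 3], [1.0, 2.0, 3.0, 4.0], 0.5))   # -> 0.25
```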


Estimating the Lorenz curve

• The empirical version of the Lorenz curve is obtained as

L̂C[π(X); α] = ( Σ_{i : π̂(X i) ≤ F̂π⁻¹(α)} π̂(X i) ) / ( Σ_{i=1}^n π̂(X i) ).

• L̂C expresses the percentage of the total premium income corresponding to
the 100α% smallest premiums when the latter are computed using a predictor
π.
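Since LC[π(X); α] = CC[π(X), π(X); α], the Lorenz estimator is the same computation applied to the premiums themselves; a self-contained sketch:

```python
import numpy as np

def lorenz_curve(pi_hat, alpha):
    """Share of the total premium income carried by the 100*alpha% smallest premiums."""
    pi_hat = np.asarray(pi_hat, dtype=float)
    small = pi_hat <= np.quantile(pi_hat, alpha)   # empirical F_pi^{-1}(alpha)
    return pi_hat[small].sum() / pi_hat.sum()

# With premiums 1, 2, 3, 4, the lower half collects (1 + 2) / 10 = 30% of the income:
print(lorenz_curve([1.0, 2.0, 3.0, 4.0], 0.5))   # -> 0.3
```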


Properties

• The Lorenz curve inherits the properties of the concentration curve as a
special case.
• Monotonicity :
The concentration curve is based on the function
t ↦ E[Y I[π(X) ≤ t]] / E[Y] evaluated at quantiles of π(X). This function is
clearly non-decreasing, starting from (0, 0) to reach (1, 1).
⇒ α ↦ CC[Y, π(X); α] is non-decreasing and satisfies

lim_{α→0} CC[Y, π(X); α] = 0 and lim_{α→1} CC[Y, π(X); α] = 1.


• Line of independence/equality :
- If Y and π(X ) are independent then

CC[µ(X), π(X); α] = E[Y] P[π(X) ≤ Fπ⁻¹(α)] / E[Y] = α.

- If π(X ) brings a lot of information about the true premium µ(X ), then the
concentration curve should be far from the line of independence.


• Line of independence/equality :
- Proposition :
If µ(X ) is positively expectation dependent on π(X ), that is, if the inequality
 
E[µ(X)] ≥ E[µ(X) | π(X) ≤ t]

holds for all t, then

CC[µ(X ), π(X ); α] ≤ α for all probability levels α.

Proof :
It suffices to write
   
E[µ(X) I[π(X) ≤ t]] / E[Y] = P[π(X) ≤ t] E[µ(X) | π(X) ≤ t] / E[Y]
                           ≤ P[π(X) ≤ t].

The announced result then follows by replacing t with Fπ⁻¹(α).


• Convexity :
- Proposition :
The concentration curve α 7→ CC[µ(X ), π(X ); α] is convex if, and only if,
µ(X ) is positively regression dependent on π(X ), that is, if the function
 
t ↦ E[µ(X) | π(X) = t]

is non-decreasing.
- The increments of the function

CC[Y, π(X); α + ∆] − CC[Y, π(X); α] = E[Y I[Fπ⁻¹(α) < π(X) ≤ Fπ⁻¹(α + ∆)]] / E[Y]

are thus non-decreasing in α.


Measuring goodness-of-lift

• Performance of a predictor :
- The performance of a predictor π(X ) is assessed by means of the respective
positions of the two curves

α 7→ LC[π(X ); α] and α 7→ CC[µ(X ), π(X ); α].

- As the total expected incomes under π and µ both match the total expected
loss, the two ratios are directly comparable.
- As actuaries, we would like that the graph of CC is as close as possible to the
graph of LC.
⇒ The smaller the area between the two curves the better.

Comparison of the performances of two predictors

Concentration and Lorenz curves

• We have two predictors π1 and π2 .


• Definition :
The premium π1 (X 1 ) is more discriminatory than π2 (X 2 ) if, and only if,

π2 (X 2 ) ≼cx π1 (X 1 ) ⇔ LC[π1 (X 1 ); α] ≤ LC[π2 (X 2 ); α] for all α

and the inequality

CC[µ(X ), π1 (X 1 ); α] ≤ CC[µ(X ), π2 (X 2 ); α]

holds for all probability levels α.


• Proposition :
If
π2 (X 2 ) ≼cx π1 (X 1 ) and (Y , Π2 ) ≼conc (Y , Π1 )
then predictor π1 (X 1 ) is more discriminatory than predictor π2 (X 2 ) for
response Y .
• π1 (X 1 ) is more discriminatory than π2 (X 2 ) if π1 (X 1 ) is simultaneously
more variable (in the sense of the convex order) and more correlated (in the
sense of the concordance order) with the response Y than π2 (X 2 ).


Integrated concentration and Lorenz curves

• The preference relation proposed earlier only forms a partial ranking :
- Two predictors might well be incomparable because their respective
concentration or Lorenz curves intersect.
• In such a case, we can base the comparison on the integral of the
concentration curves :
ICC[µ(X), π(X); α] = ∫₀^α CC[µ(X), π(X); ξ] dξ
                   = ∫₀^α E[µ(X) I[Π ≤ ξ]] / E[Y] dξ
                   = E[µ(X) (α − Π)+] / E[Y]
                   = Cov(µ(X), (α − Π)+) / E[Y] + E[(α − Π)+],

where

E[(α − Π)+] = ∫₀^α (α − ξ) dξ = α²/2.


• Again, as

E[Y (α − Π)+] = E[ E[Y (α − Π)+ | X] ] = E[µ(X) (α − Π)+],

we are allowed to replace µ(X) with Y in the definition of the integrated
concentration curve.
• ICC is the integral of the concentration curve over the whole interval [0, 1],
i.e.

ICC = ICC[µ(X), π(X); 1]
    = Cov(µ(X), 1 − Π) / E[Y] + 1/2
    = 1/2 − Cov(µ(X), Π) / E[Y].
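The identity ICC = 1/2 − Cov(µ(X), Π)/E[Y] can be sanity-checked by simulation in a synthetic setup where the true premium is observed (all choices below, including the lognormal µ and the perfect predictor π = µ, are ours and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

mu = rng.lognormal(mean=0.0, sigma=0.5, size=n)    # a known "true premium" mu(X)
Pi = (np.argsort(np.argsort(mu)) + 1) / n          # ranks of a perfect predictor pi = mu

# ICC as the numerical (trapezoidal) integral of the concentration curve over [0, 1] ...
alphas = np.linspace(0.0, 1.0, 201)
cc = np.array([mu[Pi <= a].sum() for a in alphas]) / mu.sum()
icc_integral = ((cc[1:] + cc[:-1]) / 2 * np.diff(alphas)).sum()

# ... against the closed form 1/2 - Cov(mu(X), Pi) / E[Y], with E[Y] = E[mu(X)].
icc_formula = 0.5 - np.cov(mu, Pi)[0, 1] / mu.mean()

assert abs(icc_integral - icc_formula) < 0.01
```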


Some useful insurance metrics : Area Between the Curves

• The area between the two curves CC and LC turns out to be a better
performance indicator.
• This area between the curves, ABC in short, is given by

ABC[π(X)] = ∫₀¹ ( CC[Y, π(X); α] − LC[π(X); α] ) dα
          = (1 / E[π(X)]) ∫₀¹ ( E[Y I[Π ≤ α]] − E[π(X) I[Π ≤ α]] ) dα
          = (1 / E[π(X)]) ∫₀¹ ∫₀^∞ ( P[π(X) ≤ y, Π ≤ α] − P[Y ≤ y, Π ≤ α] ) dy dα
          = ( Cov(π(X), Π) − Cov(Y, Π) ) / E[π(X)].
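Both routes to ABC, integrating CC − LC and the covariance identity, can be compared on simulated data (the portfolio below is synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

y = rng.gamma(shape=2.0, scale=0.5, size=n)        # losses, mean 1
pi_hat = y * rng.lognormal(sigma=0.3, size=n)      # a noisy working premium
pi_hat *= y.mean() / pi_hat.mean()                 # enforce E[pi] = E[Y] in-sample

Pi = (np.argsort(np.argsort(pi_hat)) + 1) / n      # premium ranks
alphas = np.linspace(0.0, 1.0, 201)
cc = np.array([y[Pi <= a].sum() for a in alphas]) / y.sum()
lc = np.array([pi_hat[Pi <= a].sum() for a in alphas]) / pi_hat.sum()

# ABC as the numerical (trapezoidal) integral of CC - LC ...
diff = cc - lc
abc_integral = ((diff[1:] + diff[:-1]) / 2 * np.diff(alphas)).sum()

# ... against ABC = (Cov(pi, Pi) - Cov(Y, Pi)) / E[pi].
abc_formula = (np.cov(pi_hat, Pi)[0, 1] - np.cov(y, Pi)[0, 1]) / pi_hat.mean()

assert abs(abc_integral - abc_formula) < 0.01
```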

Numerical examples

Assumptions

• π(X) ∼ Gam(µ, σ²) with µ = 1.
- Ordered in the ≼cx-sense with σ.
• µ(X) ∼ Gam(µ, σY²) with µ = 1.
• Remark : E[µ(X)] = E[π(X)] = 1.
• Dependence structures linking µ(X ) and π(X ) :
- Clayton copula :

Cθ(u, v) = (u^(−θ) + v^(−θ) − 1)^(−1/θ), θ > 0.

- Frank’s copula :

Cθ(u, v) = −(1/θ) ln( 1 + (exp(−θu) − 1)(exp(−θv) − 1) / (exp(−θ) − 1) ), θ ≠ 0.

Both copulas express positive dependence ⇒ the previous results hold true.
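The Gam(1, 1)/Clayton(τ = 0.5) configuration appearing in the tables that follow can be reproduced approximately by simulation; a sketch (our own, not from the slides) using the Marshall–Olkin gamma-frailty sampler for the Clayton copula, with θ = 2τ/(1 − τ):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
tau = 0.5
theta = 2 * tau / (1 - tau)                   # Clayton parameter (theta = 2)

# Marshall-Olkin construction: U_j = (1 + E_j / W)^(-1/theta), W ~ Gamma(1/theta).
w = rng.gamma(shape=1 / theta, size=n)
e = rng.exponential(size=(2, n))
u1, u2 = (1 + e / w) ** (-1 / theta)

# Gam(1, 1) margins are unit exponentials, so invert the exponential cdf.
mu, pi = -np.log1p(-u1), -np.log1p(-u2)

Pi = (np.argsort(np.argsort(pi)) + 1) / n
abc = (np.cov(pi, Pi)[0, 1] - np.cov(mu, Pi)[0, 1]) / pi.mean()
print(abc)   # should land close to the 9.66% reported for this configuration
```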


Variability

Line type     π(X)       µ(X)         Copula C           ABC
medium dash   Gam(1, 1)  Gam(1, 2)    Clayton(τ = 0.5)   6.33%
short dash    Gam(1, 1)  Gam(1, 1)    Clayton(τ = 0.5)   9.66%
dotted        Gam(1, 1)  Gam(1, 0.5)  Clayton(τ = 0.5)   13.08%


Dependence

Line type     π(X)       µ(X)       C                   ABC
medium dash   Gam(1, 1)  Gam(1, 1)  Clayton(τ = 0.75)   3.46%
short dash    Gam(1, 1)  Gam(1, 1)  Clayton(τ = 0.50)   9.66%
dotted        Gam(1, 1)  Gam(1, 1)  Clayton(τ = 0.25)   17.04%


Crossing copulas
• Consider a Clayton copula C1 and a Frank copula C2 .
• There exists a function f such that C1 (u, v ) − C2 (u, v ) ≤ 0 if v ≤ f (u) and
C1 (u, v ) − C2 (u, v ) ≥ 0 if v ≥ f (u).
⇒ Not ordered according to the concordance order.
Line type    π(X)       µ(X)       C                  ABC
short dash   Gam(1, 1)  Gam(1, 1)  Frank(τ = 0.5)     7.79%
dotted       Gam(1, 1)  Gam(1, 1)  Clayton(τ = 0.5)   9.66%


Non-regression dependent copula impact


• Consider the mixture

C(u, v) = (1 − θ) min{u, v} + θ max{0, u + v − 1}.

- Does not exhibit positive quadrant dependence.
- Positive expectation dependence if, and only if, θ ≤ 1/2.

Line type   π(X)       µ(X)       C         ABC
dotted      Gam(1, 1)  Gam(1, 1)  θ = 0.8   10%

Case study

Data set

• French motor third-party liability insurance portfolio freMTPL2freq, available
in the CASdatasets package in R.
- 678,013 observations ;
- Response Y : number of claims ;
- 9 explanatory variables (X = (X1 , . . . , X9 )) :
Policyholder : age, density of inhabitants in the home city, region, area,
bonus-malus ;
Car : power, age, brand, fuel type ;
- Exposure-to-risk.
• Partition the data set into
- Training set of 610,000 observations ;
- Validation set comprising the remaining observations.


Models

• Models of Noll et al. (2018) for the predictors πk (X k ) :
- glm1 : Poisson GLM with a log-link function and all explanatory variables
- glm3 : same as glm1 but without area and region variables
- pbm1 : boosted SBS (Standardized Binary Splits) tree (depth = 1, iterations
= 30)
- pbm3 : boosted SBS tree (depth = 3, iterations = 50)
- pbm3.s2 : boosted SBS tree (depth = 3, iterations = 50, shrinkage = 0.5)
- glm1.pbm3 : boosted SBS tree starting from glm1 fit (depth = 3, iterations =
50)
- nn : shallow neural network (20 neurons with one hidden layer).


In- and out-of-sample errors


Goodness-of-lift metrics



References
Denuit, M., Sznajder, D., Trufin, J. (2019).
Model selection based on Lorenz and concentration curves, Gini indices and convex order.
Insurance : Mathematics and Economics 89, 128-139.
Frees, E.W., Meyers, G., Cummings, A.D. (2011).
Summarizing insurance scores using a Gini index.
Journal of the American Statistical Association 106, 1085-1098.
Frees, E.W., Meyers, G., Cummings, A.D. (2013).
Insurance ratemaking and a Gini index.
Journal of Risk and Insurance 81, 335-366.
Gourieroux, C. (1992).
Courbes de performance, de sélection et de discrimination.
Annales d’Économie et de Statistique 28, 107-123.
Gourieroux, C., Jasiak, J. (2011).
The Econometrics of Individual Risk : Credit, Insurance, and Marketing.
Princeton University Press.
