
Deeper Understanding, Faster Calculation

--Exam C Insights & Shortcuts


6th Edition

by Yufeng Guo
Fall 2009
The Missing Manual

This electronic book is intended for individual buyer use for the sole purpose of preparing for
Exam C. This book can NOT be resold to others or shared with others. No part of this publication
may be reproduced for resale or multiple copy distribution without the express written permission
of the author.

© 2009, 2010 by Yufeng Guo



Table of Contents

Introduction  4

Chapter 1  Doing calculations 100% correct 100% of the time  5
  6 strategies for improving calculation accuracy  5
  6 powerful calculator shortcuts  6
    #1  Solve $ax^2 + bx + c = 0$  6
    #2  Keep track of your calculation  10
    #3  Calculate mean and variance of a discrete random variable  21
    #4  Calculate the sample variance  29
    #5  Find the conditional mean and conditional variance  30
    #6  Do the least squares regression  36
    #7  Do linear interpolation  46

Chapter 2  Maximum likelihood estimator  52
  Basic idea  52
  General procedure to calculate the maximum likelihood estimator  53
  Fisher Information  58
  The Cramer-Rao theorem  62
  Delta method  66

Chapter 3  Kernel smoothing  75
  Essence of kernel smoothing  75
  Uniform kernel  77
  Triangular kernel  82
  Gamma kernel  90

Chapter 4  Bootstrap  95
  Essence of bootstrapping  95
  Recommended supplemental reading  96

Chapter 5  Bühlmann credibility model  102
  Trouble with black-box formulas  102
  Rating challenges facing insurers  102
  3 preliminary concepts for deriving the Bühlmann premium formula  106
    Preliminary concept #1  Double expectation  106
    Preliminary concept #2  Total variance formula  108
    Preliminary concept #3  Linear least squares regression  111
  Derivation of Bühlmann's credibility formula  112
  Summary of how to derive the Bühlmann credibility premium formulas  117
  Special case  122
  How to tackle Bühlmann credibility problems  123
  An example illustrating how to calculate the Bühlmann credibility premium  123
  Shortcut  126
  Practice problems  126

Chapter 6  Bühlmann-Straub credibility model  148
  Context of the Bühlmann-Straub credibility model  148
  Assumptions of the Bühlmann-Straub credibility model  149
  Summary of the Bühlmann-Straub credibility model  154
  General Bühlmann-Straub credibility model (more realistic)  155
  How to tackle the Bühlmann-Straub premium problem  158

Chapter 7  Empirical Bayes estimate for the Bühlmann model  168
  Empirical Bayes estimate for the Bühlmann model  168
  Summary of the estimation process for the empirical Bayes estimate for the Bühlmann model  170
  Empirical Bayes estimate for the Bühlmann-Straub model  173
  Semi-parametric Bayes estimate  182

Chapter 8  Limited fluctuation credibility  187
  General credibility model for the aggregate loss of r insureds  188
  Key interim formula: credibility for the aggregate loss  190
  Final formula you need to memorize  191
  Special case  192

Chapter 9  Bayesian estimate  202
  Intuitive review of Bayes' Theorem  202
  How to calculate the discrete posterior probability  206
  Framework for calculating the discrete posterior probability  208
  How to calculate the continuous posterior probability  213
  Framework for calculating discrete-prior Bayesian premiums  219
  Calculate Bayesian premiums when the prior probability is continuous  251
  Poisson-gamma model  260
  Binomial-beta model  264

Chapter 10  Claim payment per payment  268

Chapter 11  LER (loss elimination ratio)  274

Chapter 12  Find $E[(Y-M)_+]$  276

About the author  284

Introduction

This manual is intended to be a missing manual. It skips what other manuals explain well. It focuses on what other manuals don't explain or don't explain well. This way, you get your money's worth.

Chapter 1 teaches you how to do manual calculations quickly and accurately. If you studied hard but failed Exam C repeatedly, chances are that you are concept strong, calculation weak. The calculator techniques will improve your calculation accuracy.

Chapter 2 focuses on the variance of a maximum likelihood estimator (MLE), a difficult topic for many.

Chapter 3 explains the essence of kernel smoothing and teaches you how to derive complex kernel smoothing formulas for $k_y(x)$ and $K_y(x)$. You shouldn't have any trouble memorizing complex kernel smoothing formulas after this chapter.

Many candidates don't know the essence of the bootstrap. Chapter 4 is about the bootstrap.

Chapter 5 explains the core theory behind the Bühlmann credibility model.

Chapter 6 compares and contrasts the Bühlmann-Straub credibility model with the Bühlmann credibility model.

Many candidates are afraid of empirical Bayes estimate problems. The formulas are just too hard to remember. Chapter 7 will relieve your pain.

Many candidates find that there are just too many limited fluctuation credibility formulas to memorize. To address this, Chapter 8 gives you a unified formula.

Chapter 9 presents a framework for quickly calculating the posterior probability (discrete or continuous) and the posterior mean (discrete or continuous). Many candidates can recite Bayes' theorem but can't solve related problems under exam conditions. Their calculations are long, tedious, and prone to errors. This chapter will drastically improve your calculation efficiency.

Chapter 10 is about claim payment per payment.

Chapter 11 is about the loss elimination ratio.

Chapter 12 is about how to quickly calculate $E[(Y-M)_+]$.

Chapter 1   Doing calculations 100% correct 100% of the time

>From: Exam C candidate (name removed)
>To: yufeng_guo@msn.com
>Subject: Help..
>Date: someday in 2006
>
>Hello Mr. Guo,
>
>I tried Exam C problems under exam-like conditions. To my surprise, I found that I made
>too many mistakes; one mistake was 1+1=3. How can I improve my accuracy?

6 strategies for improving calculation accuracy

1. Gain a deeper understanding of a core concept. People tend to make errors if they memorize a black-box formula without understanding the formula. To reduce errors, try to understand core concepts and formulas.

2. Learn how to solve a problem faster. Many exam candidates solve hundreds of practice problems yet fail Exam C miserably. One major cause is that their solutions are inefficient. Typically, these candidates copy solutions presented in textbooks and study manuals. Authors of textbooks and many study manuals generally use software to do the calculations. To solve a messy calculation, they just type up the formula and click the "Compute" button. However, when you take the exam, you have to calculate the answer manually. A solution that looks clean and easy in a textbook may be a nightmare in the exam. When you prepare for Exam C, don't copy textbook solutions. Improve them. Learn how to do manual calculations faster.

3. Build solution frameworks and avoid reinventing the wheel. If you analyze the Exam C problems tested in the past, you'll see that the SOA pretty much tests the same things over and over. For example, the Poisson-gamma model is tested over and over. When preparing for Exam C, come up with a ready-to-use solution framework for each of the commonly tested problems in Exam C. This way, when you walk into the exam room and see a commonly tested problem, you don't need to solve the problem from scratch. You can use your pre-built solution framework and solve it quickly and accurately.

4. Keep an error log. Whenever you solve some practice problems, record your errors in a notebook. Analyze why you made errors. Try to solve a problem differently to avoid the error. Review your error log from time to time. Using an error log helps you avoid making the same calculation errors over and over.

5. Avoid doing mental math in the exam, even for the simplest calculations. Even if you are solving a simple problem like 2+3, use your calculator to solve the problem. Simply enter 2 + 3 in your calculator. This will reduce your silly errors.

6. Learn some calculator tricks.

6 powerful calculator shortcuts

Fast and safe techniques for common calculations.

#1   Solve $ax^2 + bx + c = 0$

The formula $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ is OK when $a$, $b$, and $c$ are nice, small numbers. However, when $a$, $b$, and $c$ have many decimals or are large numbers and we are in a pressured situation, the standard solution often falls apart in the heat of the exam.

Example 1. Solve $0.3247x^2 - 89.508x + 0.752398 = 0$ in 15 seconds.

If candidates need to solve this equation in the exam, many will get flustered. The standard approach $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ is labor intensive and prone to errors when $a$, $b$, and $c$ are messy.

To solve this equation 100% right under pressure and in a hurry, we'll do a little trick. First, we set $x = v = \frac{1}{1+r}$. So we treat $x$ as a dummy discount factor. The original equation becomes:

$0.3247v^2 - 89.508v + 0.752398 = 0$

If we can find $r$, the dummy interest rate, we'll be able to find $x$.

Finding $r$ is a concept you learned in Exam FM. We first convert the equation to the following cash flow diagram:

Time t:      0            1           2
Cash flow:   $0.752398    -$89.508    $0.3247

So at time zero, you receive $0.752398. At time one, you pay $89.508. Finally, at time two, you receive $0.3247. What's your IRR?
To find $r$ (the IRR), we simply use the Cash Flow Worksheet in BA II Plus or BA II Plus Professional. Enter the following cash flows into the Cash Flow Worksheet:

Cash Flow       | Frequency
CF0 = 0.752398  |
C01 = -89.508   | F01 = 1
C02 = 0.3247    | F02 = 1

Because the cash flow frequency is one for both C01 and C02, we don't need to enter F01 = 1 and F02 = 1. If we don't enter a cash flow frequency, BA II Plus and BA II Plus Professional use one as the default cash flow frequency.

Using the IRR function, we find that IRR = -99.63722807. Remember this is a percentage. So $r = -99.63722807\%$.

$x_1 = \frac{1}{1+r} = \frac{1}{1 - 99.63722807\%} = 275.6552834$

How are we going to find the second root? We'll use the following formula: if $x_1$ and $x_2$ are the two roots of $ax^2 + bx + c = 0$, then

$x_1 x_2 = \frac{c}{a}$,  so  $x_2 = \frac{1}{x_1} \cdot \frac{c}{a}$

$x_2 = \frac{1}{x_1} \cdot \frac{c}{a} = \frac{1}{275.6552834} \times \frac{0.752398}{0.3247} = 0.00840619$

Keystrokes in BA II Plus / BA II Plus Professional (assume we set the calculator to display 8 decimal places):

Procedure | Keystroke | Display
Use the Cash Flow Worksheet | CF | CF0 = (old content)
Clear the worksheet | 2nd [CLR WORK] | CF0 = 0.00000000
Enter the cash flow at t = 0 | .752398 Enter | CF0 = 0.752398
Enter the cash flow at t = 1 | ↓ 89.508 +/- Enter | C01 = -89.50800000
Enter the # of cash flows for C01 (the default is 1, so there is no need to enter anything) | ↓ | F01 = 1.00000000
Enter the cash flow at t = 2 | ↓ .3247 Enter | C02 = 0.32470000
Calculate IRR | IRR CPT | IRR = -99.63722807
Convert the IRR to the dummy interest rate | % | -0.99637228 (this is the dummy interest rate)
Add one | + 1 = | 0.00362772
Find the dummy discount factor $x_1 = \frac{1}{1 + IRR\%}$ | 1/x | 275.65528324 (this is $x_1$)
Store in Memory 0 (this leaves an auditing trail) | STO 0 | 275.65528324
Find the 2nd root $x_2 = \frac{1}{x_1} \cdot \frac{c}{a}$ | 1/x × .752398 ÷ .3247 = | 0.00840619 (this is $x_2$)
Store in Memory 1 (this leaves an auditing trail) | STO 1 | 0.00840619

You can always double check your calculations. Retrieve $x_1$ and $x_2$ from the calculator memory and plug them into $0.3247x^2 - 89.508x + 0.752398$. You should get a value close to zero.

For example, plugging in $x_1 = 275.6552834$:

$0.3247x^2 - 89.508x + 0.752398 = 0.00000020$ (OK)

Plugging in $x_2 = 0.00840619$:

$0.3247x^2 - 89.508x + 0.752398 = 6.2 \times 10^{-12}$ (OK)

We didn't get exactly zero due to rounding.

Does this look like a lot of work? Yes, the first time. Once you get familiar with this process, it takes you 15 seconds to finish calculating $x_1$ and $x_2$ and double checking that they are right.
Quick and error-free solution process for $ax^2 + bx + c = 0$:

Step 1. Rearrange $ax^2 + bx + c = 0$ into $c + bx + ax^2 = 0$.

Step 2. Use the BA II Plus/BA II Plus Professional Cash Flow Worksheet to find the IRR, treating the coefficients as cash flows:

CF0 = c (cash flow at time zero)
C01 = b (cash flow at time one)
C02 = a (cash flow at time two)

Step 3. Find $x_1$ and $x_2$:

$x_1 = \frac{1}{1 + \frac{IRR}{100}}$,  $x_2 = \frac{1}{x_1} \cdot \frac{c}{a}$

Step 4. Plug in $x_1$ and $x_2$. Check whether $ax^2 + bx + c = 0$.

In the exam, if an equation is overly simple, just try out the answer. If an equation is not overly simple, always use the above process to solve $ax^2 + bx + c = 0$. For example, if you see $x^2 - 2x - 3 = 0$, you can guess that $x_1 = -1$ and $x_2 = 3$. However, if you see $x^2 - 2x - 7.3 = 0$, use the Cash Flow Worksheet to solve it.
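If you want to convince yourself that this trick is legitimate, here is a short Python sketch (my own off-exam check, not something you can use in the exam room): it solves Example 1 with the standard formula and confirms both the dummy-interest-rate relationship and the product-of-roots identity used in Step 3.

```python
# Off-exam check of the IRR trick for 0.3247x^2 - 89.508x + 0.752398 = 0.
import math

a, b, c = 0.3247, -89.508, 0.752398

# Standard quadratic formula gives the root the IRR method finds first:
x1 = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)   # 275.6552834...

# The dummy interest rate r satisfies x1 = 1/(1 + r), i.e. r = IRR:
r = 1 / x1 - 1                                       # -0.9963722... = -99.637...%

# Step 3's product-of-roots identity x1 * x2 = c/a recovers the second root:
x2 = (1 / x1) * (c / a)                              # 0.00840619...

print(x1, r, x2)
# Both roots nearly zero out the original equation (rounding aside):
print(a * x1**2 + b * x1 + c, a * x2**2 + b * x2 + c)
```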
Exercise

#1  Solve $10{,}987x^2 - 65{,}864x - 98{,}321 = 0$.
Answer: $x_1 = 7.2321003$ and $x_2 = -1.23737899$

#2  Solve $x^2 - 2x - 7.3 = 0$.
Answer: $x_1 = 3.88097206$ and $x_2 = -1.88097206$

#3  Solve $0.9080609x^2 - 0.00843021x - 0.99554743 = 0$.
Answer: $x_1 = 1.0517168$ and $x_2 = -1.04243305$

#4  Solve $x^2 - 2x + 3 = 0$.
Answer: you'll get an error message if you try to calculate the IRR. There's no solution: $x^2 - 2x + 3 = (x-1)^2 + 2 \ge 2$, so the equation has no real root.

#2   Keep track of your calculation

Example 1

A group of 23 highly talented actuary students in a large insurance company is taking SOA Exam C at the next exam sitting. The probability of each candidate passing Exam C is 0.73, independent of other students passing or failing the exam. The company promises to give each actuary student who passes Exam C a raise of $2,500. What's the probability that the insurance company will spend at least $50,000 on raises associated with passing Exam C?
Solution

If the company spends at least $50,000 on exam-related raises, then the number of students who pass Exam C must be at least 50,000/2,500 = 20. So we need to find the probability of having at least 20 students pass Exam C.

Let X = the number of students who will pass Exam C. The problem does not specify the distribution of X. So possibly X has a binomial distribution. Let's check the conditions for a binomial distribution:

- There are only two outcomes for each student taking the exam: either pass or fail.
- The probability of passing (0.73) or not passing (0.27) remains constant from one student to another.
- The exam result of one student does not affect that of another student.

X satisfies the requirements of a binomial random variable with parameters n = 23 and p = 0.73. We need to find the probability of $x \ge 20$:

$\Pr(x \ge 20) = \Pr(x=20) + \Pr(x=21) + \Pr(x=22) + \Pr(x=23)$

Applying the formula $f_X(x) = C_n^x\, p^x (1-p)^{n-x}$, we have:

$\Pr(x \ge 20) = C_{23}^{20}(0.73)^{20}(0.27)^3 + C_{23}^{21}(0.73)^{21}(0.27)^2 + C_{23}^{22}(0.73)^{22}(0.27) + C_{23}^{23}(0.73)^{23} = 0.09608$

Therefore, there is a 9.6% chance that the company will have to spend at least $50,000 to pay for exam-related raises.
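As an off-exam sanity check (Python here is only for home study; `math.comb` plays the role of $C_n^x$), a minimal sketch reproducing the 0.09608 figure:

```python
# Verify Pr(X >= 20) for X ~ Binomial(n=23, p=0.73).
from math import comb

n, p = 23, 0.73
prob = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(20, n + 1))
print(prob)  # 0.09608...
```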
Calculator key sequence for BA II Plus:

Method #1: direct calculation without using memories

Procedure | Keystroke | Display
Set to display 8 decimal places (4 decimal places are sufficient, but assume you want to see more decimals) | 2nd [FORMAT] 8 Enter | DEC=8.00000000
Set AOS (Algebraic operating system) | 2nd [FORMAT], keep pressing ↓ until you see Chn, then press 2nd [ENTER] (if you see AOS, your calculator is already in AOS, in which case press [CLR Work]) | AOS
Calculate $C_{23}^{20}$ | 23 2nd [nCr] 20 | 1,771.00000000
Multiply by $(0.73)^{20}$ | × .73 y^x 20 | 3.27096399
Multiply by $(0.27)^3$ and add | × .27 y^x 3 + | 0.06438238
Calculate $C_{23}^{21}$ | 23 2nd [nCr] 21 | 253.00000000
Multiply by $(0.73)^{21}$ | × .73 y^x 21 | 0.34111482
Multiply by $(0.27)^2$ and add | × .27 x² + | 0.08924965
Calculate $C_{23}^{22}$ | 23 2nd [nCr] 22 | 23.00000000
Multiply by $(0.73)^{22}$ | × .73 y^x 22 | 0.02263762
Multiply by $0.27$ and add | × .27 + | 0.09536181
Calculate $C_{23}^{23}$ | 23 2nd [nCr] 23 | 1.00000000
Multiply by $(0.73)^{23}$ and get the final result | × .73 y^x 23 = | 0.09608031
Method #2: store intermediate values in your calculator's memories

Procedure | Keystroke | Display
Set to display 8 decimal places | 2nd [FORMAT] 8 Enter | DEC=8.00000000
Set AOS (Algebraic operating system) | 2nd [FORMAT], keep pressing ↓ until you see Chn, then press 2nd [ENTER] | AOS
Clear the memories | 2nd [MEM] 2nd [CLR Work] | M0=0.00000000
Get back to calculation mode | CE/C | 0.00000000
Calculate $C_{23}^{20}$ | 23 2nd [nCr] 20 | 1,771.00000000
Multiply by $(0.73)^{20}$ | × .73 y^x 20 | 3.27096399
Multiply by $(0.27)^3$ | × .27 y^x 3 = | 0.06438238
Store $C_{23}^{20}(0.73)^{20}(0.27)^3$ in Memory 0 | STO 0 | 0.06438238
Get back to calculation mode | CE/C | 0.00000000
Calculate $C_{23}^{21}$ | 23 2nd [nCr] 21 | 253.00000000
Multiply by $(0.73)^{21}$ | × .73 y^x 21 | 0.34111482
Multiply by $(0.27)^2$ | × .27 x² = | 0.02486727
Store $C_{23}^{21}(0.73)^{21}(0.27)^2$ in Memory 1 | STO 1 | 0.02486727
Get back to calculation mode | CE/C | 0.00000000
Calculate $C_{23}^{22}$ | 23 2nd [nCr] 22 | 23.00000000
Multiply by $(0.73)^{22}$ | × .73 y^x 22 | 0.02263762
Multiply by $0.27$ | × .27 = | 0.00611216
Store $C_{23}^{22}(0.73)^{22}(0.27)$ in Memory 2 | STO 2 | 0.00611216
Get back to calculation mode | CE/C | 0.00000000
Calculate $C_{23}^{23}$ | 23 2nd [nCr] 23 | 1.00000000
Multiply by $(0.73)^{23}$ | × .73 y^x 23 = | 0.00071850
Store $C_{23}^{23}(0.73)^{23}$ in Memory 3 | STO 3 | 0.00071850
Recall the values stored in Memories 0, 1, 2, and 3 and sum them up | RCL 0 + RCL 1 + RCL 2 + RCL 3 = | 0.09608031
Comparing Method #1 with Method #2:

Method #1 is quicker but riskier. Because you don't have an audit history, if you miscalculate one item, you'll need to recalculate everything again from scratch.

Method #2 is slower but leaves a good auditing trail by storing all your intermediate values in your calculator's memories. If you miscalculate one item, you need to recalculate that item alone and can reuse the results of the other calculations (which are correct).

For example, suppose that instead of calculating $C_{23}^{20}(0.73)^{20}(0.27)^3$ as you should, you calculated $C_{23}^{20}(0.73)^{20}(0.27)$. To correct this error under Method #1, you have to start from scratch and calculate each of the following four items:

$C_{23}^{20}(0.73)^{20}(0.27)^3$,  $C_{23}^{21}(0.73)^{21}(0.27)^2$,  $C_{23}^{22}(0.73)^{22}(0.27)$,  and  $C_{23}^{23}(0.73)^{23}$

In contrast, correcting this error under Method #2 is a lot easier. You just need to recalculate $C_{23}^{20}(0.73)^{20}(0.27)^3$; you don't need to recalculate any of the following three items:

$C_{23}^{21}(0.73)^{21}(0.27)^2$,  $C_{23}^{22}(0.73)^{22}(0.27)$,  and  $C_{23}^{23}(0.73)^{23}$

You can easily retrieve these three items from your calculator's memories and calculate the final result:

$C_{23}^{20}(0.73)^{20}(0.27)^3 + C_{23}^{21}(0.73)^{21}(0.27)^2 + C_{23}^{22}(0.73)^{22}(0.27) + C_{23}^{23}(0.73)^{23} = 0.09608$

Example 2 (a reserve example for Exam C)

Given:

$l_{20}$ = 9,617,802
$l_{30}$ = 9,501,381
$l_{50}$ = 8,950,901
$A_{50}$ = 0.24905
$\ddot{a}_{20}$ = 16.5133
$\ddot{a}_{30}$ = 15.8561
$\ddot{a}_{50}$ = 13.2668
Interest rate: 6%

Calculate

$V = A_{50}\, v^{20}\, \frac{l_{50}}{l_{30}} \cdot \frac{\ddot{a}_{20} - \frac{l_{30}}{l_{20}} v^{10} \ddot{a}_{30}}{\ddot{a}_{20} - \frac{l_{50}}{l_{20}} v^{30} \ddot{a}_{50}}$

Solution

This calculation is complex. Unless you use a systematic method, you'll make mistakes.

Calculation steps using BA II Plus/BA II Plus Professional:

Step 1. Simplify the calculation. With $v = 1.06^{-1}$:

$V = A_{50}\, v^{20}\, \frac{l_{50}}{l_{30}} \cdot \frac{\ddot{a}_{20} - \frac{l_{30}}{l_{20}} v^{10} \ddot{a}_{30}}{\ddot{a}_{20} - \frac{l_{50}}{l_{20}} v^{30} \ddot{a}_{50}} = A_{50}\, v^{20} \cdot \frac{\frac{1}{l_{30}}\left(\ddot{a}_{20} - \frac{l_{30}}{l_{20}} v^{10} \ddot{a}_{30}\right)}{\frac{1}{l_{50}}\left(\ddot{a}_{20} - \frac{l_{50}}{l_{20}} v^{30} \ddot{a}_{50}\right)}$

$V = A_{50} \cdot 1.06^{-20} \cdot \frac{\frac{\ddot{a}_{20}}{l_{30}} - \frac{\ddot{a}_{30}}{l_{20}} \cdot 1.06^{-10}}{\frac{\ddot{a}_{20}}{l_{50}} - \frac{\ddot{a}_{50}}{l_{20}} \cdot 1.06^{-30}}$

Make sure you don't make mistakes in the simplification. If you are afraid of making mistakes, don't simplify; just do your calculations using the original equation:

$V = A_{50}\, v^{20}\, \frac{l_{50}}{l_{30}} \cdot \frac{\ddot{a}_{20} - \frac{l_{30}}{l_{20}} v^{10} \ddot{a}_{30}}{\ddot{a}_{20} - \frac{l_{50}}{l_{20}} v^{30} \ddot{a}_{50}}$

Step 2. Assign a memory to each input in the formula above.

Input | Memory | Value
$l_{20}$ | M0 | 9,617,802
$l_{30}$ | M1 | 9,501,381
$l_{50}$ | M2 | 8,950,901
$A_{50}$ | M3 | 0.24905
$\ddot{a}_{20}$ | M4 | 16.5133
$\ddot{a}_{30}$ | M5 | 15.8561
$\ddot{a}_{50}$ | M6 | 13.2668

After you assign a memory to each input, the formula becomes:

$V = (M3) \cdot 1.06^{-20} \cdot \frac{\frac{M4}{M1} - \frac{M5}{M0} \cdot 1.06^{-10}}{\frac{M4}{M2} - \frac{M6}{M0} \cdot 1.06^{-30}}$
Calculator key sequence to assign memories to the inputs:

Procedure | Keystroke | Display
Set to display 8 decimal places | 2nd [FORMAT] 8 Enter | DEC=8.00000000
Set AOS (Algebraic operating system) | 2nd [FORMAT], keep pressing ↓ until you see Chn, then press 2nd [ENTER] (if you see AOS, your calculator is already in AOS, in which case press [CLR Work]) | AOS
Clear existing numbers from the memories | 2nd [MEM] 2nd [CLR Work] | M0=0.00000000
Enter 9,617,802 in M0 | 9617802 Enter | M0=9,617,802.000
Move to the next memory | ↓ | M1=0.00000000
Enter 9,501,381 in M1 | 9501381 Enter | M1=9,501,381.000
Move to the next memory | ↓ | M2=0.00000000
Enter 8,950,901 in M2 | 8950901 Enter | M2=8,950,901.000
Move to the next memory | ↓ | M3=0.00000000
Enter 0.24905 in M3 | .24905 Enter | M3=0.24905000
Move to the next memory | ↓ | M4=0.00000000
Enter 16.5133 in M4 | 16.5133 Enter | M4=16.51330000
Move to the next memory | ↓ | M5=0.00000000
Enter 15.8561 in M5 | 15.8561 Enter | M5=15.85610000
Move to the next memory | ↓ | M6=0.00000000
Enter 13.2668 in M6 | 13.2668 Enter | M6=13.26680000
Leave the memory workbook and get back to the normal calculation mode | CE/C (the button on the bottom left corner; the same button as CLR Work) |

Step 3. Double check the data entry. Don't bypass this step; it's easy to enter wrong data. Keystrokes: press 2nd [MEM], then keep pressing the down-arrow key ↓ to view all the data you entered in the memories. Make sure all the correct numbers are entered.
Step 4. Do the final calculation.

$V = (M3) \cdot 1.06^{-20} \cdot \frac{\frac{M4}{M1} - \frac{M5}{M0} \cdot 1.06^{-10}}{\frac{M4}{M2} - \frac{M6}{M0} \cdot 1.06^{-30}}$

We'll break the calculation into two pieces:

$\frac{M4}{M1} - \frac{M5}{M0} \cdot 1.06^{-10} = M7$ (store the result in M7)

$\frac{M4}{M2} - \frac{M6}{M0} \cdot 1.06^{-30} = M8$ (store the result in M8)

$V = (M3) \cdot 1.06^{-20} \cdot \frac{M7}{M8}$

Procedure | Keystroke | Display
Calculate $\frac{M4}{M1} - \frac{M5}{M0} \cdot 1.06^{-10}$ | RCL 4 ÷ RCL 1 - RCL 5 ÷ RCL 0 × 1.06 y^x 10 +/- = | 0.00000082
Store the result in M7; go back to the normal calculation mode | STO 7 CE/C | 0.00000082
Calculate $\frac{M4}{M2} - \frac{M6}{M0} \cdot 1.06^{-30}$ | RCL 4 ÷ RCL 2 - RCL 6 ÷ RCL 0 × 1.06 y^x 30 +/- = | 0.00000160
Store the result in M8; go back to the normal calculation mode | STO 8 CE/C | 0.00000160
Calculate $V = (M3) \cdot 1.06^{-20} \cdot \frac{M7}{M8}$ | RCL 3 × 1.06 y^x 20 +/- × RCL 7 ÷ RCL 8 = | 0.03955560

So $V = 0.0395556 \approx 0.04$.

Though this calculation process looks long, once you get used to it, you can do it in less than one minute.
Advantages of this calculation process:

- Inputs are entered only once. In this problem, $l_{20}$ and $\ddot{a}_{20}$ are each used twice in the formula, but we enter them into the memories only once. This reduces data entry errors.
- This process gives us a good auditing trail, enabling us to check the data entry and the calculations.
- We can isolate errors. For example, if a wrong value of $l_{30}$ is entered into the memory, we can re-enter $l_{30}$, recalculate $\frac{M4}{M1} - \frac{M5}{M0} \cdot 1.06^{-10}$, and store the recalculated value in M7. Next, we recalculate $V = (M3) \cdot 1.06^{-20} \cdot \frac{M7}{M8}$.

Bottom line: I recommend that you master this calculation method. It costs you extra work, but it enables you to do messy calculations 100% right in the exam. When exams get tough and calculations get messy, many candidates who know as much as you do will make calculation errors here and there and fail the exam. In contrast, you'll stand above the crowd and make no errors, passing another exam.
Problem 3 (reserve example revisited)

In Example 2, you calculated that $V = 0.04$. However, none of the answer choices given is 0.04. Suspecting that you made an error in your calculations, you decided to redo the calculation. First, you scrolled over the memories and gladly found no error in the data entry. Next, you recalculated $\frac{M4}{M1} - \frac{M5}{M0} \cdot 1.06^{-10} = M7$ and $\frac{M4}{M2} - \frac{M6}{M0} \cdot 1.06^{-30} = M8$. Once again, you found your previous calculations were right. Finally, you recalculated $V = (M3) \cdot 1.06^{-20} \cdot \frac{M7}{M8}$. Once again, you got $V = 0.04$.

You have already spent four minutes on this problem. You decided to spend two more minutes on it. If you couldn't figure out the right answer, you would just have to give it up and move on to the next problem.

So you quickly read the problem again. Oops! You found that your formula was wrong. Your original formula was:

$V = A_{50}\, v^{20}\, \frac{l_{50}}{l_{30}} \cdot \frac{\ddot{a}_{20} - \frac{l_{30}}{l_{20}} v^{10} \ddot{a}_{30}}{\ddot{a}_{20} - \frac{l_{50}}{l_{20}} v^{30} \ddot{a}_{50}}$

The correct formula should be:

$V = \ddot{a}_{50}\, v^{20}\, \frac{l_{50}}{l_{30}} \cdot \frac{\ddot{a}_{20} - \frac{l_{30}}{l_{20}} v^{10} \ddot{a}_{30}}{\ddot{a}_{20} - \frac{l_{50}}{l_{20}} v^{30} \ddot{a}_{50}}$

How could you find the answer quickly, using the correct formula?

Solution

The situation described here sometimes happens in the actual exam. If you don't use a systematic method to do calculations, you won't leave a good auditing trail. In that case, all your previous calculations are gone and you have to redo the calculations from scratch. This is awful. Fortunately, you left a good auditing trail, and correcting the error is easy.

Your previous formula, after assigning memories to the inputs:

$V = A_{50}\, v^{20}\, \frac{l_{50}}{l_{30}} \cdot \frac{\ddot{a}_{20} - \frac{l_{30}}{l_{20}} v^{10} \ddot{a}_{30}}{\ddot{a}_{20} - \frac{l_{50}}{l_{20}} v^{30} \ddot{a}_{50}} = (M3) \cdot 1.06^{-20} \cdot \frac{M7}{M8}$

The correct formula is:

$V = \ddot{a}_{50}\, v^{20}\, \frac{l_{50}}{l_{30}} \cdot \frac{\ddot{a}_{20} - \frac{l_{30}}{l_{20}} v^{10} \ddot{a}_{30}}{\ddot{a}_{20} - \frac{l_{50}}{l_{20}} v^{30} \ddot{a}_{50}} = (M6) \cdot 1.06^{-20} \cdot \frac{M7}{M8}$

Remember $\ddot{a}_{50} = M6$. You simply reuse M7 and M8 and calculate:

$V = (M6) \cdot 1.06^{-20} \cdot \frac{M7}{M8} = 2.10713362 \approx 2.11$

Now you look at the answer choices again. Good. 2.11 is there!
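If you want to re-check this example away from the exam room, the following Python sketch mirrors the memory layout M0-M8 used above (an off-exam verification with the same inputs; the variable names are mine):

```python
# Off-exam check of the reserve example; M7/M8 mirror the calculator memories.
l20, l30, l50 = 9_617_802, 9_501_381, 8_950_901   # M0, M1, M2
A50 = 0.24905                                     # M3
a20, a30, a50 = 16.5133, 15.8561, 13.2668         # M4, M5, M6 (annuity values)

M7 = a20 / l30 - (a30 / l20) * 1.06 ** -10
M8 = a20 / l50 - (a50 / l20) * 1.06 ** -30

V_wrong = A50 * 1.06 ** -20 * M7 / M8   # the mistaken formula: ~0.0396
V_right = a50 * 1.06 ** -20 * M7 / M8   # the corrected formula: ~2.1071
print(V_wrong, V_right)
```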
#3   Calculate the mean and variance of a discrete random variable

There are two approaches:
- Use the TI-30X IIS (using the redo capability of the TI-30X IIS)
- Use the BA II Plus/BA II Plus Professional 1-V Statistics Worksheet

Example 1   (#8, Course 1, May 2000) A probability distribution of the claim sizes for an auto insurance policy is given in the table below:

Claim Size | Probability
20 | 0.15
30 | 0.10
40 | 0.05
50 | 0.20
60 | 0.10
70 | 0.10
80 | 0.30

What percentage of the claims are within one standard deviation of the mean claim size?

(A) 45%, (B) 55%, (C) 68%, (D) 85%, (E) 100%

Solution

This problem is conceptually easy but calculation-intensive, and it is easy to make calculation errors. Always let the calculator do all the calculations for you.

One critical thing to remember about the BA II Plus and BA II Plus Professional Statistics Worksheet is that you cannot directly enter the probability mass function $f(x_i)$ into the calculator to find $E(X)$ and $Var(X)$. The BA II Plus and BA II Plus Professional 1-V Statistics Worksheet accepts only scaled-up probabilities that are positive integers. If you enter a non-integer value into the statistics worksheet, you will get an error when attempting to retrieve $E(X)$ and $Var(X)$.

To overcome this constraint, first scale up $f(x_i)$ to an integer by multiplying $f(x_i)$ by a common integer.

Claim Size x | Probability Pr(x) | Scaled-up probability = 100 Pr(x)
20 | 0.15 | 15
30 | 0.10 | 10
40 | 0.05 | 5
50 | 0.20 | 20
60 | 0.10 | 10
70 | 0.10 | 10
80 | 0.30 | 30
Total | 1.00 | 100

Next, enter the 7 data pairs of (claim size, scaled-up probability) into the BA II Plus Statistics Worksheet to get $E(X)$ and $\sigma_X$.
BA II Plus and BA II Plus Professional calculator key sequences:

Procedure | Keystrokes | Display
Set the calculator to display 4 decimal places | 2nd [FORMAT] 4 ENTER | DEC=4.0000
Set AOS (Algebraic operating system) | 2nd [FORMAT], keep pressing ↓ until you see Chn, then press 2nd [ENTER] (if you see AOS, your calculator is already in AOS, in which case press [CLR Work]) | AOS
Select the data entry portion of the Statistics worksheet | 2nd [Data] | X01 (old contents)
Clear the worksheet | 2nd [CLR Work] | X01 0.0000
Enter the data set | 20 ENTER ↓ 15 ENTER ↓ | X01=20.0000, Y01=15.0000
 | 30 ENTER ↓ 10 ENTER ↓ | X02=30.0000, Y02=10.0000
 | 40 ENTER ↓ 5 ENTER ↓ | X03=40.0000, Y03=5.0000
 | 50 ENTER ↓ 20 ENTER ↓ | X04=50.0000, Y04=20.0000
 | 60 ENTER ↓ 10 ENTER ↓ | X05=60.0000, Y05=10.0000
 | 70 ENTER ↓ 10 ENTER ↓ | X06=70.0000, Y06=10.0000
 | 80 ENTER ↓ 30 ENTER | X07=80.0000, Y07=30.0000
Select the statistical calculation portion of the Statistics worksheet | 2nd [Stat] | (old content)
Select the one-variable calculation method | keep pressing 2nd SET until you see 1-V | 1-V
View the sum of the scaled-up probabilities | ↓ | n=100.0000 (make sure the sum of the scaled-up probabilities equals the common scale-up factor, which in this problem is 100; if n is not equal to the common factor, you've made a data entry error)
View the mean | ↓ | x̄=55.0000
View the sample standard deviation | ↓ | Sx=21.9043 (this is a sample standard deviation; don't use this value). Note that $S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$
View the standard deviation | ↓ | σx=21.7945
View $\sum X$ | ↓ | 5,500.0000 (not needed for this problem)
View $\sum X^2$ | ↓ | 350,000.0000 (not needed for this problem, though this function might be useful for other calculations)

You should always double check (using ↑ and ↓ to scroll up or down the data pairs of X and Y) that your data entry is correct before accepting the $E(X)$ and $\sigma_X$ generated by the BA II Plus.

If you have made an error in data entry, you can press 2nd DEL to delete a data pair (X, Y) or 2nd INS to insert a data pair (X, Y). If you typed a wrong number, you can use the ← key to delete the wrong number and then re-enter the correct number. Refer to the BA II Plus guidebook for details on how to correct data entry errors.

If this procedure for calculating $E(X)$ and $\sigma_X$ seems more time-consuming than the formula-driven approach, it could be because you are not familiar with the BA II Plus Statistics Worksheet yet. With practice, you will find that using the calculator is quicker than manually calculating with formulas.
Then, we have:

$(\bar{X} - \sigma,\ \bar{X} + \sigma) = (55 - 21.7945,\ 55 + 21.7945) = (33.21,\ 76.79)$

Finally, you find:

$\Pr(33.21 \le X \le 76.79) = \Pr(X=40) + \Pr(X=50) + \Pr(X=60) + \Pr(X=70) = 0.05 + 0.20 + 0.10 + 0.10 = 0.45$
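A quick off-exam verification of the whole example in Python (a sketch; in the exam you have only the calculator):

```python
# Verify the mean, standard deviation, and within-one-sigma probability.
import math

pmf = {20: 0.15, 30: 0.10, 40: 0.05, 50: 0.20, 60: 0.10, 70: 0.10, 80: 0.30}

mean = sum(x * p for x, p in pmf.items())                         # 55.0
sd = math.sqrt(sum(x * x * p for x, p in pmf.items()) - mean**2)  # 21.7945

within = sum(p for x, p in pmf.items() if mean - sd <= x <= mean + sd)
print(within)  # 0.45, answer (A)
```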

Using the TI-30X IIS

First, calculate $E(X)$ using $E(X) = \sum x f(x)$. Then modify the formula $\sum x f(x)$ to $\sum x^2 f(x)$ to calculate $Var(X)$ without re-entering $f(x)$.

To find $E(X)$, we type:

20*.15+30*.1+40*.05+50*.2+60*.1+70*.1+80*.3

Then press Enter. $E(X) = 55$.

Next we modify the formula

20*.15+30*.1+40*.05+50*.2+60*.1+70*.1+80*.3

to

20²*.15+30²*.1+40²*.05+50²*.2+60²*.1+70²*.1+80²*.3

To change 20 to 20², move the cursor immediately to the right of the number 20 so your cursor is blinking on top of the multiplication sign. Press 2nd INS x². You find that

20²*.15+30²*.1+40²*.05+50²*.2+60²*.1+70²*.1+80²*.3 = 3500

So $E(X^2) = 3{,}500$ and $Var(X) = E(X^2) - E^2(X) = 3{,}500 - 55^2 = 475$.

Finally, you can calculate $\sigma = \sqrt{475} = 21.7945$ and the range $(\bar{X} - \sigma,\ \bar{X} + \sigma)$.

Keep in mind that you can enter up to 88 digits for a formula in the TI-30X IIS. If your formula exceeds 88 digits, the TI-30X IIS will ignore the digits entered after the 88th digit.

Example 2   (#19, Course 1, November 2001)

A baseball team has scheduled its opening game for April 1. If it rains on April 1, the game is postponed and will be played on the next day that it does not rain. The team purchases insurance against rain. The policy will pay 1,000 for each day, up to 2 days, that the opening game is postponed. The insurance company determines that the number of consecutive days of rain beginning on April 1 is a Poisson random variable with mean 0.6. What is the standard deviation of the amount the insurance company will have to pay?

(A) 668, (B) 699, (C) 775, (D) 817, (E) 904

Solution

Let N = the number of days it rains consecutively. N can be 0, 1, 2, or any non-negative integer.

$\Pr(N = n) = e^{-\lambda} \frac{\lambda^n}{n!} = e^{-0.6} \frac{0.6^n}{n!}$   (n = 0, 1, 2, ...)

Let X = the payment by the insurance company. According to the insurance contract, if there is no rain (n = 0), X = 0. If it rains for only 1 day, X = $1,000. If it rains for two or more days in a row, X is always $2,000. We are asked to calculate $\sigma_X$.

If a problem asks you to calculate the mean, standard deviation, or other statistics of a discrete random variable, it is always a good idea to list the variable's values and their corresponding probabilities in a table before doing the calculation, to organize your data. So let's list the data pairs (X, probability) in a table:

Payment X | Probability of receiving X
0 | $\Pr(N=0) = e^{-0.6} \frac{0.6^0}{0!} = e^{-0.6}$
1,000 | $\Pr(N=1) = e^{-0.6} \frac{0.6^1}{1!} = 0.6e^{-0.6}$
2,000 | $\Pr(N \ge 2) = \Pr(N=2) + \Pr(N=3) + \dots = 1 - [\Pr(N=0) + \Pr(N=1)] = 1 - 1.6e^{-0.6}$

Once you set up the table above, you can use the BA II Plus's Statistics Worksheet or the TI-30X IIS to find the mean and variance.
Once you set up the table above, you can use BA II Pluss Statistics Worksheet or TI-30
IIS to find the mean and variance.
Calculation Method 1 --- Using TI-30X IIS
First we calculate the mean by typing:
1000*.6e^(-.6)+2000(1-1.6e^(-.6
When typing e^(-.6) for e 0.6 , you need to use the negative sign, not the minus sign, to
get -6. If you type the minus sign in e^( .6), you will get an error message.
Additionally, for 0.6 e 0.6 , you do not need to type 0.6*e^(-.6), just type .6e^(-.6). Also,
to calculate 2000(1 1.6e .6 ) , you do not need to type 2000*(1-1.6*(e^(-.6))). Simply
type
2000(1-1.6e^(-.6
Your calculator understands you are trying to calculate 2000(1 1.6e .6 ) . However, the
omission of the parenthesis sign works only for the last item in your formula. In other
words, if your equation is
2000(1 1.6e

.6

) + 1000 .6e

.6

Guo Fall 2009 C, Page 26 / 284

you have to type the first item in its full parenthesis, but can skip typing the closing
parenthesis in the 2nd item:
2000(1-1.6e^(-.6)) + 1000*.6e^(-.6
If you type
2000(1-1.6e^(-.6 + 1000*.6e^(-.6
your calculator will interpret this as
2000(1-1.6e^(-.6 + 1000*.6e^(-.6) ) )
Of course, this is not your intention.
Lets come back to the calculation. After you type
1000*.6e^(-.6)+2000(1-1.6e^(-.6
press ENTER. You should get E ( X ) = 573.0897. This is an intermediate value. You
can store it on your scrap paper or in one of your calculators memories.
Next, modify your formula to get E (x 2 ) by typing:
1000 2 .6e ^ ( .6) + 2000 2 (1 1.6 ^ ( .6
You will get 816892.5107. This is E (x 2 ) . Next, calculate Var ( X )
Var (X ) = E (x 2 ) E 2 (x ) =488460.6535
X

= Var (x ) = 698.9960 .

Calculation Method 2 --- Using BA II Plus/BA II Plus Professional

First, please note that you can always calculate $\sigma_X$ without using the BA II Plus built-in Statistics Worksheet. You can calculate $E(X)$, $E(X^2)$, and $Var(X)$ in BA II Plus as you do any other calculations, without using the built-in worksheet. In this problem, the equations used to calculate $\sigma_X$ are:

$E(X) = 0 \cdot e^{-0.6} + 1{,}000(0.6e^{-0.6}) + 2{,}000(1 - 1.6e^{-0.6})$

$E(X^2) = 0^2 e^{-0.6} + 1{,}000^2 (0.6e^{-0.6}) + 2{,}000^2 (1 - 1.6e^{-0.6})$

$Var(X) = E(X^2) - E^2(X)$,   $\sigma_X = \sqrt{Var(X)}$

You simply calculate each item in the above equations with BA II Plus. This will give you the required standard deviation. However, we do not want to do this hard-core calculation in an exam. BA II Plus already has a built-in statistics worksheet and we should utilize it.

The key to using the BA II Plus Statistics Worksheet is to scale up the probabilities to integers. Scaling the three probabilities

$\left(e^{-0.6},\ 0.6e^{-0.6},\ 1 - 1.6e^{-0.6}\right)$

is a bit challenging, but there is a way:

Payment X | Probability (assuming you set your BA II Plus to display 4 decimal places) | Scaled-up probability (multiply the original probability by 10,000)
0 | $e^{-0.6}$ = 0.5488 | 5,488
1,000 | $0.6e^{-0.6}$ = 0.3293 | 3,293
2,000 | $1-1.6e^{-0.6}$ = 0.1219 | 1,219
Total | 1.0 | 10,000

Then we just enter the following data pairs into the BA II Plus's statistics worksheet:

X01=0, Y01=5,488;   X02=1,000, Y02=3,293;   X03=2,000, Y03=1,219

Then the calculator will give you $\sigma_X = 698.8966$.

Make sure your calculator gives you an n that matches the sum of the scaled-up probabilities. In this problem, the sum of your scaled-up probabilities is 10,000, so you should get n=10,000. If your calculator gives you an n that is not 10,000, you know that at least one of the scaled-up probabilities is wrong.

Of course, you can scale up the probabilities with better precision (more closely resembling the original probabilities). For example, you can scale them up this way (assuming you set your calculator to display 8 decimal places):

Payment X | Probability | Scaled-up probability, more precise (multiply the original probability by 100,000,000)
0 | $e^{-0.6}$ = 0.54881164 | 54,881,164
1,000 | $0.6e^{-0.6}$ = 0.32928698 | 32,928,698
2,000 | $1-1.6e^{-0.6}$ = 0.12190138 | 12,190,138
Total | | 100,000,000

Then we just enter the following data pairs into the BA II Plus's statistics worksheet:

X01=0, Y01=54,881,164;   X02=1,000, Y02=32,928,698;   X03=2,000, Y03=12,190,138

Then the calculator will give you $\sigma_X$ = 698.8995993 (remember to check that n=100,000,000).

For exam problems, scaling up the original probabilities by multiplying them by 10,000 is good enough to give you the correct answer. Under exam conditions it is unnecessary to scale the probabilities up by multiplying by 100,000,000.
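For home study, you can also verify the exact answer in Python without any scaling at all (a sketch):

```python
# Exact check of the rain-insurance example; no probability scaling needed.
import math

p1 = 0.6 * math.exp(-0.6)       # Pr(payment = 1,000)
p2 = 1 - 1.6 * math.exp(-0.6)   # Pr(payment = 2,000)

mean = 1000 * p1 + 2000 * p2                   # 573.0897...
var = 1000**2 * p1 + 2000**2 * p2 - mean**2    # 488,460.65...
print(math.sqrt(var))                          # 698.8996, answer (B)
```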
#4   Calculate the sample variance

May 2000 #33

The number of claims a driver has during the year is assumed to be Poisson distributed with an unknown mean that varies by driver. The experience for 100 drivers is as follows:

# of claims during the year | # of drivers
0 | 54
1 | 33
2 | 10
3 | 2
4 | 1
Total | 100

Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation.
Solution

For now, don't worry about credibility; focus on calculating the sample mean and sample variance.

Standard calculation, not using the 1-V Statistics Worksheet. Let X represent the # of claims in a year. Then:

$\bar{X} = \frac{54(0) + 33(1) + 10(2) + 2(3) + 1(4)}{54+33+10+2+1} = \frac{63}{100} = 0.63$

$\widehat{Var}(X) = \frac{1}{n-1}\sum_{i=1}^{100}\left(X_i - \bar{X}\right)^2 = \frac{54(0-0.63)^2 + 33(1-0.63)^2 + 10(2-0.63)^2 + 2(3-0.63)^2 + 1(4-0.63)^2}{100-1} \approx 0.68$

Using the 1-V Statistics Worksheet: enter

X01=0, Y01=54
X02=1, Y02=33
X03=2, Y03=10
X04=3, Y04=2
X05=4, Y05=1

You should get:

$\bar{X} = 0.63$
$S_X = 0.82455988$ (this is the unbiased sample standard deviation)

While your calculator displays $S_X = 0.82455988$, press the x² key of your calculator. You should get 0.67989899. This is $\widehat{Var}(X) = S_X^2$. So $\widehat{Var}(X) = 0.67989899 \approx 0.68$.
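The same numbers fall out of a few lines of Python (an off-exam sketch using the unbiased n-1 divisor, matching $S_X^2$):

```python
# Verify the sample mean and unbiased sample variance.
data = [0] * 54 + [1] * 33 + [2] * 10 + [3] * 2 + [4] * 1

n = len(data)                                       # 100
mean = sum(data) / n                                # 0.63
var = sum((x - mean) ** 2 for x in data) / (n - 1)  # 0.6798989...
print(mean, var)
```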
#5   Find the conditional mean and conditional variance

Example

For an insurance policy:
- A policyholder's annual losses can be 100, 200, 300, and 400 with respective probabilities 0.1, 0.2, 0.3, and 0.4.
- The insurance has an annual deductible of $250 per loss.

Calculate the mean and the variance of the annual payment made by the insurer to the policyholder, given there's a payment.

Solution

Let X represent the annual loss. Let Y represent the claim payment by the insurer to the policyholder. Then:

Y = 0 if $X \le 250$;   Y = X - 250 if $X > 250$

We are asked to find $E(Y \mid X > 250)$ and $Var(Y \mid X > 250)$.

Standard solution

X | 100 | 200 | 300 | 400
Y | 0 | 0 | 50 | 150
P(X) | 0.1 | 0.2 | 0.3 | 0.4

$P(X > 250) = P(X=300) + P(X=400) = 0.3 + 0.4 = 0.7$

$P(X=300 \mid X>250) = \frac{0.3}{0.7} = \frac{3}{7}$,   $P(X=400 \mid X>250) = \frac{0.4}{0.7} = \frac{4}{7}$

$E(X - 250 \mid X > 250) = 50\left(\frac{3}{7}\right) + 150\left(\frac{4}{7}\right) = 107.1428571$

$E\left[(X-250)^2 \mid X>250\right] = 50^2\left(\frac{3}{7}\right) + 150^2\left(\frac{4}{7}\right) = 13{,}928.57143$

$Var(X - 250 \mid X>250) = 13{,}928.57143 - 107.1428571^2 = 2{,}448.98$


Fast solution using the BA II Plus/BA II Plus Professional 1-V Statistics Worksheet

X | 100 | 200 | 300 | 400
X > 250? (if yes, keep; if no, discard) | No. Discard. | No. Discard. | Yes. Keep. | Yes. Keep.

New table after discarding $X \le 250$:

X | 300 | 400
Y | 50 | 150
P(X) | 0.3 | 0.4
10 P(X) -- scaled-up probability | 3 | 4

Enter the following into the 1-V Statistics Worksheet:

X01=50, Y01=3;   X02=150, Y02=4

BA II Plus or BA II Plus Professional should give you:

n = 7,   $\bar{X}$ = 107.14,   σ = 49.48716593,   Var = σ² = 2,448.98
This is how the BA II Plus/Professional 1-V Statistics Worksheet works. After you enter X01=50, Y01=3, X02=150, Y02=4, BA II Plus/Professional knows that your random variable X takes on two values: 50 (with frequency 3) and 150 (with frequency 4). Next, BA II Plus/Professional sets up the following table for the statistics calculation:

X = $50 with probability $\frac{3}{3+4} = \frac{3}{7}$;   X = $150 with probability $\frac{4}{3+4} = \frac{4}{7}$

Then, BA II Plus/Professional calculates the mean and variance:

$E(X) = 50 \cdot \frac{3}{7} + 150 \cdot \frac{4}{7}$,   $E(X^2) = 50^2 \cdot \frac{3}{7} + 150^2 \cdot \frac{4}{7}$,   $Var(X) = E(X^2) - E^2(X)$
Compare the BA II Plus/Professional calculation with our manual calculation presented earlier:

$E(X-250 \mid X>250) = 50\left(\frac{3}{7}\right) + 150\left(\frac{4}{7}\right) = 107.1428571$

$E\left[(X-250)^2 \mid X>250\right] = 50^2\left(\frac{3}{7}\right) + 150^2\left(\frac{4}{7}\right) = 13{,}928.57143$

$Var(X-250 \mid X>250) = 13{,}928.57143 - 107.1428571^2 = 2{,}448.98$

Now you see that BA II Plus/Professional correctly calculates the mean and variance. In the BA II Plus/Professional 1-V Statistics Worksheet, what's important is the relative data frequency, not the absolute data frequency.
The following entries produce an identical mean, sample mean, and variance:

Entry One:   X01=50, Y01=3;   X02=150, Y02=4
Entry Two:   X01=50, Y01=6;   X02=150, Y02=8
Entry Three: X01=50, Y01=30;  X02=150, Y02=40

In each entry, BA II Plus/Professional produces the following table for the calculation:

X = $50 with probability $\frac{3}{7}$;   X = $150 with probability $\frac{4}{7}$

General procedure to calculate $E[Y(x) \mid x > a]$ and $Var[Y(x) \mid x > a]$ using the BA II Plus and BA II Plus Professional 1-V Statistics Worksheet:
- Throw away all the data pairs $(X_i, Y_i)$ where the condition $X > a$ is NOT met.
- Use the remaining data pairs to calculate $E(Y)$ and $Var(Y)$.

General procedure to calculate $E[Y(x) \mid x < a]$ and $Var[Y(x) \mid x < a]$ using the BA II Plus and BA II Plus Professional 1-V Statistics Worksheet:
- Throw away all the data pairs $(X_i, Y_i)$ where the condition $X < a$ is NOT met.
- Use the remaining data pairs to calculate $E(Y)$ and $Var(Y)$.
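The same throw-away idea is easy to express in Python. Here is a minimal sketch (the function name `conditional_mean_var` is mine, not a library routine); it reproduces the example above:

```python
# Conditional mean and variance of Y given a condition on X:
# drop the pairs that fail the condition, renormalize, then take moments.
def conditional_mean_var(triples, cond):
    """triples: list of (x, y, prob); cond: predicate on x."""
    kept = [(y, p) for x, y, p in triples if cond(x)]
    total = sum(p for _, p in kept)          # e.g. P(X > 250) = 0.7
    mean = sum(y * p for y, p in kept) / total
    ey2 = sum(y * y * p for y, p in kept) / total
    return mean, ey2 - mean ** 2

triples = [(100, 0, 0.1), (200, 0, 0.2), (300, 50, 0.3), (400, 150, 0.4)]
print(conditional_mean_var(triples, lambda x: x > 250))  # (107.14..., 2448.98...)
```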
Example

You are given the following information (where k is a constant):

X = x | $p_X(x)$
0.5 | $\frac{4}{6}(0.5^4)k$
0.25 | $\frac{1}{6}(0.25^3)(0.75)k$
0.75 | $\frac{1}{6}(0.75^3)(0.25)k$

Calculate $E(X)$ using the BA II Plus shortcut.

Solution

Please note that you don't need to calculate k.

X = x | $p_X(x)$ | Scaled-up $p_X(x)$: multiply $p_X(x)$ by $\frac{1{,}000{,}000}{k}$
0.5 | $\frac{4}{6}(0.5^4)k = 0.041667k$ | 41,667
0.25 | $\frac{1}{6}(0.25^3)(0.75)k = 0.001953k$ | 1,953
0.75 | $\frac{1}{6}(0.75^3)(0.25)k = 0.017578k$ | 17,578

Next, we enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=0.5, Y01=41,667
X02=0.25, Y02=1,953
X03=0.75, Y03=17,578

You should get: n = 61,198, $\bar{X}$ = 0.56382970. So $E(X) = 0.56382970$.
Exam C, Nov 2002, #29

You are given the following joint distribution:

X \ Θ | 0 | 1
0 | 0.4 | 0.1
1 | 0.1 | 0.2
2 | 0.1 | 0.1

For a given value of Θ and a sample of size 10 for X: $\sum_{i=1}^{10} X_i = 10$.

Determine the Bühlmann credibility premium.

Solution

Don't worry about the Bühlmann credibility premium for now. All you need to do right now is to calculate the following 7 items:

$E(X \mid \Theta=0)$, $Var(X \mid \Theta=0)$, $E(X \mid \Theta=1)$, $Var(X \mid \Theta=1)$, $E[E(X \mid \Theta)]$, $Var[E(X \mid \Theta)]$, $E[Var(X \mid \Theta)]$

First, let's calculate $E(X \mid \Theta=0)$ and $Var(X \mid \Theta=0)$:

X | $P(X, \Theta=0)$ | $10\,P(X, \Theta=0)$
0 | 0.4 | 4
1 | 0.1 | 1
2 | 0.1 | 1
Enter the following into the 1-V Statistics Worksheet:

X01=0, Y01=4;   X02=1, Y02=1;   X03=2, Y03=1

BA II Plus or BA II Plus Professional should give you:

n = 6,   $\bar{X}$ = 0.5,   σ = 0.76376262,   Var = σ² = 0.58333333 = $\frac{7}{12}$

$E(X \mid \Theta=0) = 0.5$,   $Var(X \mid \Theta=0) = \frac{7}{12}$
Next, let's calculate $E(X \mid \Theta=1)$ and $Var(X \mid \Theta=1)$:

X | $P(X, \Theta=1)$ | $10\,P(X, \Theta=1)$
0 | 0.1 | 1
1 | 0.2 | 2
2 | 0.1 | 1

Enter the following into the 1-V Statistics Worksheet:

X01=0, Y01=1;   X02=1, Y02=2;   X03=2, Y03=1

BA II Plus or BA II Plus Professional should give you:

n = 4,   $\bar{X}$ = 1,   σ = 0.70710678,   Var = σ² = 0.70710678² = 0.5

$E(X \mid \Theta=1) = 1$,   $Var(X \mid \Theta=1) = 0.5$

Next, let's calculate $E[E(X \mid \Theta)]$ and $Var[E(X \mid \Theta)]$:

$E(X \mid \Theta=0) = 0.5$ | $P(\Theta=0) = 0.4+0.1+0.1 = 0.6$ | $10\,P(\Theta=0) = 6$
$E(X \mid \Theta=1) = 1$ | $P(\Theta=1) = 0.1+0.2+0.1 = 0.4$ | $10\,P(\Theta=1) = 4$

Enter the following into the 1-V Statistics Worksheet:

X01=0.5, Y01=6;   X02=1, Y02=4

BA II Plus or BA II Plus Professional should give you:

n = 10,   $\bar{X}$ = 0.7,   σ = 0.24494897,   Var = σ² = 0.24494897² = 0.06

$E[E(X \mid \Theta)] = 0.7$,   $Var[E(X \mid \Theta)] = 0.06$

Finally, let's calculate $E[Var(X \mid \Theta)]$:

$Var(X \mid \Theta=0) = \frac{7}{12}$ | $P(\Theta=0) = 0.4+0.1+0.1 = 0.6$ | $10\,P(\Theta=0) = 6$
$Var(X \mid \Theta=1) = 0.5$ | $P(\Theta=1) = 0.1+0.2+0.1 = 0.4$ | $10\,P(\Theta=1) = 4$

Enter the following into the 1-V Statistics Worksheet:

X01=$\frac{7}{12}$, Y01=6;   X02=0.5, Y02=4

BA II Plus or BA II Plus Professional should give you:

n = 10,   $\bar{X}$ = 0.55,   σ = 0.04082483

$E[Var(X \mid \Theta)] = 0.55$
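Off the exam, all seven items can be checked at once with a short Python sketch built directly on the joint distribution (the variable names are mine):

```python
# Verify the seven quantities from the joint distribution of (X, theta).
joint = {(0, 0): 0.4, (1, 0): 0.1, (2, 0): 0.1,
         (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.1}    # key = (x, theta)

def cond_moments(theta):
    p = sum(pr for (x, t), pr in joint.items() if t == theta)
    m = sum(x * pr for (x, t), pr in joint.items() if t == theta) / p
    ex2 = sum(x * x * pr for (x, t), pr in joint.items() if t == theta) / p
    return m, ex2 - m * m, p

(m0, v0, p0), (m1, v1, p1) = cond_moments(0), cond_moments(1)
print(m0, v0)                              # 0.5, 0.583333... (= 7/12)
print(m1, v1)                              # 1.0, 0.5
print(p0 * m0 + p1 * m1)                   # E[E(X|theta)]   = 0.7
print(p0 * m0**2 + p1 * m1**2 - 0.7**2)    # Var[E(X|theta)] = 0.06
print(p0 * v0 + p1 * v1)                   # E[Var(X|theta)] = 0.55
```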

#6   Do the least squares regression

One useful yet neglected feature of the BA II Plus/BA II Plus Professional is its linear least squares regression functionality. This feature can help you quickly solve a tricky problem with a few simple keystrokes. Unfortunately, 99.9% of exam candidates don't know of this feature. Even the SOA doesn't know.

Let me quickly walk through the basic formula behind linear least squares regression. This part is also explained in the chapter on the Bühlmann credibility premium, so I will just repeat what I said in that chapter.

In a regression analysis, you try to fit a line (or a function) through a set of points. With least squares regression, you want to get a better fit by minimizing the squared distance of each point to the fitted line. You then use the fitted line to project where the data point is most likely to be.

Say you want to find out how one's income level affects how much life insurance he buys. Let X represent one's income. Let Y represent the amount of life insurance this person buys. You have collected some data pairs (X, Y) from a group of consumers. You suspect there's a linear relationship between X and Y. So you want to predict Y using the function a + bX, where a and b are constants. With least squares regression, you want to minimize the following:

$Q = E\left[(a + bX - Y)^2\right]$

Next, we'll derive a and b.

$\frac{\partial Q}{\partial a} = \frac{\partial}{\partial a} E\left[(a+bX-Y)^2\right] = E\left[\frac{\partial}{\partial a}(a+bX-Y)^2\right] = E\left[2(a+bX-Y)\right] = 2\left[a + bE(X) - E(Y)\right]$

Setting $\frac{\partial Q}{\partial a} = 0$:

$a + bE(X) - E(Y) = 0$   (Equation I)

$\frac{\partial Q}{\partial b} = \frac{\partial}{\partial b} E\left[(a+bX-Y)^2\right] = E\left[\frac{\partial}{\partial b}(a+bX-Y)^2\right] = E\left[2(a+bX-Y)X\right] = 2\left[aE(X) + bE(X^2) - E(XY)\right]$

Setting $\frac{\partial Q}{\partial b} = 0$:

$aE(X) + bE(X^2) - E(XY) = 0$   (Equation II)

(Equation II) - (Equation I) × E(X):

$b\left[E(X^2) - E^2(X)\right] = E(XY) - E(X)E(Y)$

However, $E(X^2) - E^2(X) = Var(X)$ and $E(XY) - E(X)E(Y) = Cov(X, Y)$. So:

$b = \frac{Cov(X,Y)}{Var(X)}$,   $a = E(Y) - bE(X)$

where $Var(X) = E(X^2) - E^2(X)$, $E(X) = \sum p_i x_i$, $E(X^2) = \sum p_i x_i^2$, $Cov(X,Y) = E(XY) - E(X)E(Y)$, $E(XY) = \sum p_i x_i y_i$, $E(Y) = \sum p_i y_i$, and $p_i$ represents the probability that the data pair $(x_i, y_i)$ occurs.
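These two formulas translate directly into code. Here is a small Python sketch (the helper name `lsq_line` is my own) that you can use to double-check the examples that follow:

```python
# Least squares line a + b*x for probability-weighted data pairs.
def lsq_line(points):
    """points: list of (p, x, y), with the probabilities p summing to 1."""
    ex = sum(p * x for p, x, _ in points)
    ey = sum(p * y for p, _, y in points)
    var_x = sum(p * x * x for p, x, _ in points) - ex ** 2
    cov_xy = sum(p * x * y for p, x, y in points) - ex * ey
    b = cov_xy / var_x              # b = Cov(X,Y) / Var(X)
    a = ey - b * ex                 # a = E(Y) - b E(X)
    return a, b

# Example 1 below (uniform probabilities 1/3) gives a = 2.5, b = 0.5:
print(lsq_line([(1/3, 0, 1), (1/3, 3, 6), (1/3, 12, 8)]))
```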


Example 1. For the following data pairs $(x_i, y_i)$, find the linear least squares regression line $a + bX$:

i | $p_i(x_i, y_i)$ | $X = x_i$ | $Y = y_i$
1 | 1/3 | 0 | 1
2 | 1/3 | 3 | 6
3 | 1/3 | 12 | 8

Also, calculate $a + bX$ for X = 0, 3, 12 respectively.

Solution

$E(X) = \frac{1}{3}(0+3+12) = 5$,   $E(X^2) = \frac{1}{3}(0^2+3^2+12^2) = 51$,   $Var(X) = 51 - 5^2 = 26$

$E(Y) = \frac{1}{3}(1+6+8) = 5$,   $E(XY) = \frac{1}{3}(0 \cdot 1 + 3 \cdot 6 + 12 \cdot 8) = 38$

$Cov(X,Y) = E(XY) - E(X)E(Y) = 38 - 5 \cdot 5 = 13$

$b = \frac{Cov(X,Y)}{Var(X)} = \frac{13}{26} = 0.5$,   $a = E(Y) - bE(X) = 5 - 0.5 \cdot 5 = 2.5$

So the least squares regression line is $2.5 + 0.5X$.

Next, we'll calculate $a + bX$ for X = 0, 3, 12:

If X = 0, $2.5 + 0.5X = 2.5 + 0.5(0) = 2.5$;
If X = 3, $2.5 + 0.5X = 2.5 + 0.5(3) = 4$;
If X = 12, $2.5 + 0.5X = 2.5 + 0.5(12) = 8.5$.

Now you understand linear least squares regression. Next, let's talk about how to use BA II Plus/BA II Plus Professional to find a and b and calculate $a + bX$ for X = 0, 3, 12.

Example 2. For the following data pairs $(x_i, y_i)$, find the linear least squares regression line $a + bX$ using BA II Plus/BA II Plus Professional:

i | $p_i(x_i, y_i)$ | $X = x_i$ | $Y = y_i$
1 | 1/3 | 0 | 1
2 | 1/3 | 3 | 6
3 | 1/3 | 12 | 8

Also, calculate $a + bX$ for X = 0, 3, 12 respectively.

Solution

In BA II Plus/Professional, the linear least squares regression functionality is called LIN. The keystrokes to find $a + bX$ using BA II Plus/Professional:

2nd [Data] (activate the statistics worksheet)
2nd [CLR Work] (clear the old contents)
X01=0, Y01=1
X02=3, Y02=6
X03=12, Y03=8
2nd [Stat] (keep pressing 2nd ENTER until your calculator displays LIN)

Press the down arrow key ↓, you'll see n = 3
Press ↓, you'll see $\bar{X}$ = 5
Press ↓, you'll see $S_X$ = 6.24499800 (sample standard deviation)
Press ↓, you'll see $\sigma_X$ = 5.09901951 (standard deviation)
Press ↓, you'll see $\bar{Y}$ = 5
Press ↓, you'll see $S_Y$ = 3.60555128 (sample standard deviation)
Press ↓, you'll see $\sigma_Y$ = 2.94392029 (standard deviation)
Press ↓, you'll see a = 2.5
Press ↓, you'll see b = 0.5
Press ↓, you'll see r = 0.8660254 (the correlation coefficient)
Press ↓, you'll see X' = 0
Enter X' = 0 (to do this, press 0 Enter)
Press ↓. Press CPT. You'll get Y' = 2.5 (this is a + bX when X = 0)
Press the up arrow key ↑, you'll see X' = 0
Enter X' = 3 (press 3 Enter)
Press ↓. Press CPT. You'll get Y' = 4 (this is a + bX when X = 3)
Press ↑, you'll see X' = 3
Enter X' = 12 (press 12 Enter)
Press ↓. Press CPT. You'll get Y' = 8.5 (this is a + bX when X = 12)

You see that using the BA II Plus/Professional LIN Statistics Worksheet, we get the same result.

You might wonder why we didn't use the probabilities $p_i(x_i, y_i)$. Here is an important point: the BA II Plus/Professional Statistics Worksheet (including LIN) can't directly handle probabilities. To use the Statistics Worksheet, you have to first convert the probabilities to numbers of occurrences. In this problem, $p_i(x_i, y_i) = \frac{1}{3}$ for i = 1, 2, and 3. So we have 3 data pairs $(x_i, y_i)$ and each data pair is equally likely to occur. So we arbitrarily let each data pair occur only once. This way, BA II Plus/Professional knows that each of the three data pairs has a 1/3 chance of occurring. Later I will show you how to use LIN when $p_i(x_i, y_i)$ is not uniform.
Some of you might complain: "I can easily use my pen and find the answers. Why do I need to bother using LIN?" There are several reasons why you might want to use LIN to find the regression line $a + bX$ and calculate various values of $a + bX$:

- In the heat of the exam, it's easy for you to be brain dead and forget the formulas $b = \frac{Cov(X,Y)}{Var(X)}$, $a = E(Y) - bE(X)$.
- Even if you are not brain dead, you can easily make mistakes calculating $a + bX$ from scratch. In contrast, if you have entered your data pairs $(x_i, y_i)$ correctly, BA II Plus/Professional will generate the results 100% right.
- Even if you want to calculate $a + bX$ from scratch, it's good to use LIN to double check your work.

Example 3. For the following data pairs $(x_i, y_i)$, find the linear least squares regression line $a + bX$ using BA II Plus/BA II Plus Professional:

i | $p_i(x_i, y_i)$ | $X = x_i$ | $Y = y_i$
1 | 1/6 | 0 | 1
2 | 1/3 | 3 | 6
3 | 1/2 | 12 | 8

Also, calculate $a + bX$ for X = 0, 3, 12 respectively.

Solution

Here $p_i(x_i, y_i)$ is not uniform. To convert the probabilities to numbers of occurrences, let's assume we have a total of 6 occurrences. Then $(x_1, y_1)$ occurs once; $(x_2, y_2)$ occurs twice; and $(x_3, y_3)$ occurs three times. When calculating $a + bX$, the LIN Statistics Worksheet automatically figures out that $p_1(x_1, y_1) = \frac{1}{6}$, $p_2(x_2, y_2) = \frac{1}{3}$, and $p_3(x_3, y_3) = \frac{1}{2}$.

Of course, you could also assume that the total # of occurrences is 60. Then $(x_1, y_1)$ occurs 10 times; $(x_2, y_2)$ occurs 20 times; and $(x_3, y_3)$ occurs 30 times. However, this approach will make your data entry tedious. The following calculation assumes the total # of occurrences is 6.

When using the LIN Statistics Worksheet, we enter the following data:

X01=0, Y01=1
X02=3, Y02=6
X03=3, Y03=6
X04=12, Y04=8
X05=12, Y05=8
X06=12, Y06=8

Your calculator should give you:

n = 6,   $\bar{X}$ = 7,   $S_X$ = 5.58569602,   $\sigma_X$ = 5.09901951
$\bar{Y}$ = 6.16666667,   $S_Y$ = 2.71416040,   $\sigma_Y$ = 2.47767812
a = 3.25,   b = 0.41666667,   r = 0.85749293

$a + bX = 3.25 + 0.41666667X$

Set X' = 0. Press CPT. You should get Y' = 3.25.
Set X' = 3. Press CPT. You should get Y' = 4.5.
Set X' = 12. Press CPT. You should get Y' = 8.25.
Double checking BA II Plus/Professional LIN functionality:
i
Y = yi
pi ( xi , yi ) X = xi
1
2
3

0
3
12

16
13
12

1
1
1
( 0 ) + ( 3) + (12 ) = 7 ,
6
3
2
Var ( X ) = 75 72 = 26

E(X ) =

1
6
8
E(X2) =

1 2 1 2 1
0 ) + ( 3 ) + (122 ) = 75
(
6
3
2

1
1
1
(1) + ( 6 ) + ( 8 ) = 6.1667
6
3
2
1
1
1
E ( X Y ) = ( 0 1) + ( 3 6 ) + (12 8 ) = 54
6
3
2
E (Y ) =

Cov ( X , Y ) = E ( X Y ) E ( X ) E (Y ) = 54 7 6.1667 = 10.8331


b=

Cov ( X , Y )
Var ( X )

10.8331
= 0.41666
26

a = E (Y ) bE ( X ) = 6.1667 0.41666 ( 7 ) = 3.25


a + bX = 3.25 + 0.41666 X

If X = 0 , then Y ' = a + bX = 3.25 + 0.41666 ( 0 ) = 3.25


If X = 3 , then Y ' = a + bX = 3.25 + 0.41666 ( 3) = 4.5
If X = 12 , then Y ' = a + bX = 3.25 + 0.41666 (12 ) = 8.25
Now you should be convinced that LIN Statistics Worksheet produces the correct result.
Application of LIN Statistics Worksheet in Exam C

Guo Fall 2009 C, Page 42 / 284

There are at least two places you can use LIN. One is to calculate Bhlmann credibility
premium as the least squares regression of Bayesian premium. Another situation is to use
LIN for liner interpolation. Ill walk you through both.
Bhlmann credibility premium as the least squares regression of Bayesian premium
Example 4. (old SOA problem)
Let X 1 represent the outcome of a single trial and let E ( X 2 X 1 ) represent the expected
value of the outcome of a 2nd trial as described in the table below:
Outcome
k

Initial probability
of outcome

Bayesian Estimate
E ( X 2 X1 = k )

0
3
12

13
13
13

1
6
8

Calculate the Bhlmann credibility premium corresponding to Bayesian estimates (1,6,8).


Solution

Bhlmann credibility premium is P = a + Z X , which minimizes the following items:


E ( a + ZX 1 Y )

where Y = E ( X 2 X 1 ) .
Since the probability of data pair is uniformly 1 3, we enter the following data in LIN:
X01=0,
X02=3,
X03=12,

Y01=1
Y02=6
Y03=8

We should get:
a = 2.5 , b = 0.5
Enter X ' = 0 . Press CPT. Youll get Y ' = 2.5 (this is a + bX when X =0)
Enter X ' = 3 . Press CPT. Youll get Y ' = 4 (this is a + bX when X =3)
Enter X ' = 12 Press CPT. Youll get Y ' = 8.5 (this is a + bX when X =12)

So the Bhlmann credibility premium corresponding to Bayesian estimates (1,6,8) is (2.5,


4, 8.5).
Guo Fall 2009 C, Page 43 / 284

Example 5 (another old SOA problem)


You are given the following information about insurance coverage:
# of losses
n

Probability

Bayesian Premium
E ( X 2 X1 = n )

0
1
2

14
12
14

0.5
0.9
1.7

Determine the Bhlmann credibility factor for this experience.


Solution
The probability is not uniform. Assume the total # of occurrences is 4. Then the data pair
" n = 0, E ( X 2 X 1 = 0 ) = 0.5!# occurs once, " n = 1, E ( X 2 X 1 = 1) = 0.9 !# occurs twice, and
" n = 2, E ( X 2 X 1 = 2 ) = 1.7 !# occurs once.

So we enter the following data into LIN:


X01=0,
X02=1,
X03=1,
X04=2,

Y01=0.5
Y02=0.9
Y03=0.9
Y03=1.7

We should get:
a = 0.4 , b = 0.6 . So the Bhlmann credibility factor is Z = b = 0.6 .

Example 6 (old SOA problem)

Outcome
Ri

Probability
Pi

Bayesian Estimate Ei
given outcome Ri

0
2
14

23
29
19

7 4
55 24
35 12

The Bhlmann credibility factor after one experiment is

1
. Calculate a and b that
12

minimize the following expression:


Guo Fall 2009 C, Page 44 / 284

3
i =1

Pi ( a + bRi

Ei )

Solution
1
. However, to solve this problem, you
12
really dont need to know b . Once again, well use LIN to solve the problem. Lets
assume the total # of occurrences of data pairs ( Ri , Ei ) is 9. Then (0, 7 4 ) occurs 6

SOA makes your life easier by giving you b =

times; (2, 55 24 ) occurs 2 times; and (14, 35 12 ) occurs one time.


Enter the following into LIN:
X01=0,
X02=0,
X03=0,
X04=0,
X05=0,
X06=0,

Y01= 7 4 = 1.75
Y02=1.57
Y03=1.57
Y04=1.57
Y05=1.57
Y06=1.57

X07=2,
X08=3,

Y07= 55 24
Y08= 55 24

X09=14,

Y09= 35 12

We should get:
a = 1.8333 , b = 0.08333 =

1
.
12

Does this solution sound too much data entry? Not to me. Yes, I can figure out the
answers using the equations:
b=

Cov ( X , Y )
Var ( X )

, a = E (Y ) bE ( X )

I might solve this problem using the above equations when Im not taking the exam.
However, in the exam room, you bet I wont bother using these equations. I will enter 18
numbers into the calculator and let the calculator do the math for me. This way, I dont
have to think. I just enter the numbers and the calculator will spit out the answer for me.
And I know that my result is 100% right.

Guo Fall 2009 C, Page 45 / 284

#7

Do linear interpolation

Another use of LIN is to do linear interpolation. You are given two data pairs ( x1 , y1 ) and

( x2 , y2 ) . Then you are given a single value

x3 . You need to find y3 using linear

interpolation.
The equation for linear interpolation is this:
y3
x3

y1 y2
=
x1 x2
y3 =

y1
= slop of line ( x1 , y1 ) and ( x2 , y2 )
x1
y2
x2

y1
( x3
x1

x1 ) + y1

Under exam conditions, this standard approach is often prone to errors.


To use LIN for linear interpolation, please note that the least squares regression line for
two data points ( x1 , y1 ) and ( x2 , y2 ) is just an ordinary straight line connecting ( x1 , y1 )
and ( x2 , y2 ) . To find y3 , we simply find the least squares regression line a + bX for

( x1 , y1 )

and ( x2 , y2 ) . Then we enter x3 into LIN. Then LIN will produce y3 .

Example 1. (May 2000, #2)

You are given the following random sample of 10 claims:


46
121 493 738 775
1078 1452 2054 2199 3207
Determine the smoothed empirical estimate of the 90th percentile, as defined in Klugman,
Panjer, and Willmot.
Solution

To find the smoothed empirical estimate, we arrange the n observations in ascending


100k
percentile. For example, the 1st observation 46
order. Then the k -th number is the
n +1
100 (1)
100 ( 2 )
= 9.09 percentile; the 2nd observation 121 is the
= 18.18 percentile.
is the
10 + 1
10 + 1
So on and so forth.

Guo Fall 2009 C, Page 46 / 284

To find the smoothed estimate of the 90-th percentile, we linearly interpolate between the
100 ( 9 )
9-th observation, which is
= 81.82 -th percentile, and the 10th observation, which
10 + 1
100 (10 )
is
= 90.91 -th percentile.
10 + 1

2,199

x90

3,207

81.82

90

90.91

x90 = x81.82 +

percentile

90 81.82
( x90.91 x81.82 )
90.91 81.82

= 2,199 +

90 81.82
( 3, 207 2,199 ) = 3,106.09
90.91 81.82

The above is the standard solution, which is prone to errors.


Next, Ill show you two shortcuts. One is without using LIN; the other with using LIN.

Shortcut without LIN:


Since the k -th number is the

100k
100k
percentile, the m =
percentile corresponds to
n +1
n +1

m ( n + 1)
- th observation. For example, the 81.82-th percentile corresponds to
100
81.82 (10 + 1)
= 9 -th observation; 90.91-th percentile corresponds to the
100
90.91(10 + 1)
= 10 -th observation.
100
Important Rules:
The k -th observation is the

100k
percentile.
n +1

Guo Fall 2009 C, Page 47 / 284

The m -th percentile is the

m ( n + 1)
- th observation.
100

Once you understand the above two rules, you can quickly find the 90-th percentile.
Set m = 90 : k =

m ( n + 1) 90 (10 + 1)
=
= 9.9 . So 9.9-th observation is what we are
100
100

looking for.
Of course, there isnt 9.9-th observation. So we need to find it using linear interpolation.

2,199

x90

3,207

9.9

10

x90 = 2,199 +

9.9 9
( 3, 207 2,199 ) = 3,106.2
10 9

You see that this linear interpolation is must faster than the previous linear interpolation.

Shortcut using LIN


We have two data pairs (9, 2,199) and (10, 3,207). As said before, if you have only two
points, then the least squares line is just the ordinary line connecting the two points. We
are interested in finding the ordinary straight line connecting (9, 2,199) and (10, 3,207).
So well use the LIN function to find the least squares line, which is the ordinary line.
Enter the following into LIN:
X01=9,
X02=10,

Y01=2199
Y02=3207

Youll find that: a = 6,873 , b = 1, 008 , r = 1 . The correlation coefficient should be one
because we have only two data pairs. Two data points always produce perfectly linear
relationship. So if your r is not equal to one, you did something wrong.
Next, set X ' = 9.9 . Press CPT. You should get: Y ' = 3,106.2 . This is the 90th percentile
you are looking for.
Guo Fall 2009 C, Page 48 / 284

Example 2
You are given the following values of the cdf of a standard normal distribution:

, ( 0.4 ) = 0.6554 , , ( 0.5) = 0.6915


Use linear interpolation, calculate , ( 0.443)
Solution

The standard solution is


, ( 0.443 ) =

0.5 0.443
0.443 0.4
, ( 0.4 ) +
, ( 0.5 )
0.5 0.4
0.5 0.4

= 0.57, ( 0.4 ) + 0.43, ( 0.5)


= 0.57 ( 0.6554 ) + 0.43 ( 0.6915) = 0.6709
This approach is prone to errors. The math logic is simple, but there are simply too many
numbers to calculate. And its very easy to make a mistake, especially in the heat of the
exam.
To quickly solve this problem, well use LIN. Enter the following data:
X01=0.4, Y01=0.6554
X02=0.5, Y02=0.6915
2nd STAT (keep pressing 2nd Enter until you see LIN)
Press the down arrow key , youll see n = 2
Press the down arrow key , youll see X = 0.45
Press the down arrow key , youll see S X = 0.07071068
= 0.05

Press the down arrow key

, youll see

Press the down arrow key


Press the down arrow key

, youll see Y = 0.67345


, youll see S y = 0.02552655

Press the down arrow key

, youll see

Press the down arrow key


Press the down arrow key
Press the down arrow key
Press the down arrow key
Enter X ' = 0.443

, youll see a = 0.511


, youll see b = 0.361
, youll see r = 1 (this is the correlation coefficient)
, youll see X ' = 0.00

= 0.01805

Guo Fall 2009 C, Page 49 / 284

Press the down arrow key .


Press CPT. Youll get Y ' = 0.670923
So , ( 0.443) = 0.670923

In the above example, after generating , ( 0.443) = 0.670923 , you want to generate
, ( 0.412345 ) , this is what you do:
Enter X ' = 0.412345
Press the down arrow key .
Press CPT. Youll get Y ' = 0.65985655 . This is , ( 0.412345 ) .
If you want to generate , ( 0.46789 ) , this is what you do:
Enter X ' = 0.46789
Press the down arrow key .
Press CPT. Youll get Y ' = 0.67990829 . This is , ( 0.46789 ) .
General procedure
Given two data pairs ( c1 , d1 ) and ( c2 , d 2 ) and a single data c3 , to use BA II Plus and BA
II Plus Professional LIN Worksheet to generate d3 , enter
X01= c1 , Y01= d1
X02= c2 , Y02= d 2
X ' = c3

In other words, the independent variable c1 , c2 , c3 must be entered as X ' s and d1 , d 2


must be entered as Y ' s .

Example 3
You are given the following values of the cdf of a standard normal distribution:

, ( 0.4 ) = 0.6554 , , ( 0.5) = 0.6915


Use linear interpolation, find a, b, c , and e (all these are positive numbers) such that
, ( a ) = 0.6666
, ( b ) = 0.6777
, ( c ) = 0.6888
Guo Fall 2009 C, Page 50 / 284

, ( d ) = 0.6999
Solution

In BA II Plus and BA II Plus Professional LIN Statistics Worksheet, enter


X01=0.6554, Y01=0.4
X02=0.6915, Y02=0.5
Enter X ' = 0.6666 . Then the calculator will generate Y ' = 0.43102493 .
So a = 0.43102493 .
Enter X ' = 0.6777 . Then the calculator will generate Y ' = 0.46177285
So b = 0.46177285 .
Enter X ' = 0.6888 . Then the calculator will generate Y ' = 0.49252078
c = 0.49252078
Enter X ' = 0.6999 . Then the calculator will generate Y ' = 0.52326870
So d = 0.52326870

Example 4
The population of a survivor group is assumed to be linear between two consecutive ages.
You are given the following:

Age
50
51

# of people alive at this age


598
534

Calculate the # of people alive at the following fractional ages:


50.2, 50.5, 50.7, 50.9
Solution

In BA II Plus and BA II Plus Professional LIN Statistics Worksheet, enter


X01=50, Y01=598
X02=51, Y02=534
Enter
Enter
Enter
Enter

X'
X'
X'
X'

= 50.2 . Then the calculator will generate


= 50.5 . Then the calculator will generate
= 50.7 . Then the calculator will generate
= 50.9 . Then the calculator will generate

Y'
Y'
Y'
Y'

= 585.2
= 566
= 553.2
= 540.4
Guo Fall 2009 C, Page 51 / 284

Chapter 2

Maximum likelihood estimator

Basic idea
An urn has two coins, one fair and the other biased. In one flip, the fair coin has 50%
chance of landing with heads, while the biased one has 90% chance of landing with
heads. Now a coin is randomly chosen from the urn and is tossed. The outcome is a head.
Question: Which coin was chosen from the urn? The fair coin or the biased coin?
Imagine you have entered a bet. If your guess is correct, youll earn $10. If your guess is
wrong, youll lose $10. How would you guess?
Most people will guess that the coin chosen from the urn was the biased coin; the biased
coin is far more likely to land on heads.
This simple example illustrates the intuition behind the maximum likelihood estimator. If
we have to estimate a parameter from an n -size sample X 1 , X 2 ,, X n , we can choose a
parameter that has the highest probability to be observed.
Example. You flip a coin 9 times and observe HTTTHHHTH. You dont know whether
the coin is fair and you need to estimate the probability of getting H in one flip.
Let p represent the probability of getting a head in one flip. The probability for us to
observe HTTTHHHTH is

P ( HTTTHHHTH p ) = p5 (1 p )

This is called the likelihood function L ( p ) .

Sample values of p and the corresponding likelihood function are:


p
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1

P ( HTTTHHHTH p ) = p5 (1 p )

0.000000000
0.000006561
0.000131072
0.000583443
0.001327104
0.001953125
0.001990656
0.001361367
0.000524288
0.000059049
0.000000000
Guo Fall 2009 C, Page 52 / 284

If we have to guess p among the possible values 0, 0.1, 0.2, , we might guess p = 0.6 ,
which has the highest probability to produce the outcome of HTTTHHHTH.

General procedure to calculate the maximum likelihood estimator


A coin is tossed n times and x number of heads are observed. Let p represent the
probability that a head shows up in one flip of coin. Calculate the maximum likelihood
estimator of p .
Step One
function)

Write the probability that the observed event happens (the likelihood

The probability for us to observe x heads out of n flips of a coin is:

P ( getting x heads out of n flips p ) = Cnx p x (1 p )

n x

Step Two
Take logarithms of the likelihood function (called log-likelihood
function). This step simplifies our calculation (as youll see soon).
ln P ( getting x heads out of n flips p ) = ln Cnx + x ln p + ( n x ) ln (1 p )

Step Three Take the 1st derivative of the log-likelihood function regarding the
parameter. Set the 1st derivative to zero.
d
ln P ( getting x heads out of n flips p ) = 0
dp
d
ln Cnx + x ln p + ( n x ) ln (1 p ) = 0 ,
dp
d
d
d
ln Cnx + ( x ln p ) +
dp
dp
dp

(n

x ) ln (1 p ) = 0 ,

In the above equation, the variable is p ; n and x are constants.


d
ln Cnx = 0 ,
dp
d
dp

(n

d
d
x
( x ln p ) = x ( ln p ) = ,
dp
dp
p

x ) ln (1 p ) = ( n x )

d
n x
ln (1 p ) =
1 p
dp

Guo Fall 2009 C, Page 53 / 284

x n x
1 p n x 1 n
=0,
=
,
= ,
p 1 p
p
x
p x

p=

x
n

Nov 2000 #6
You have observed the following claim severities:
11.0,

15.2,

18.0,

21.0,

25.8

You fit the following probability density function to the data:


1
exp
2 x

f ( x) =

1
2
(x ) , x > 0 , > 0
2x

Determine the maximum likelihood estimator of .


Solution
First, make sure you understand the theoretical framework.
Here we take a random sample of 5 claims X 1 , X 2 , X 3 , X 4 , and X 5 . We assume that
X 1 , X 2 , X 3 , X 4 , and X 5 are independent identically distributed with a common pdf
f ( x) =

1
exp
2 x

1
2
(x )
2x

The joint density of X 1 , X 2 , X 3 , X 4 , and X 5 is:

f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 )

= f X ( x1 ) f X ( x2 ) f X ( x3 ) f X ( x4 ) f X ( x5 )
1
exp
2 x1

1
exp
2 x4

1
1
2
exp
( x1 )
2 x1
2 x2
1
( x4
2 x4

)
2

1
1
2
exp
( x2 )
2 x2
2 x3

1
exp
2 x5

1
2
( x3 )
2 x3

1
2
( x5 )
2 x5

The probability that we observe X 1 , X 2 , X 3 , X 4 , and X 5 is:


P ( x1

X1

x1 + dx1 , x2

X2

x2 + dx2 , x3

X3

x3 + dx3 , x4

X4

x4 + dx4 , x5

X5

x5 + dx5 )

= f X ( x1 ) f X ( x2 ) f X ( x3 ) f X ( x4 ) f X ( x5 ) dx1dx2 dx3dx4 dx5


Guo Fall 2009 C, Page 54 / 284

Our goal is to find a parameter that will maximize our chance of observing X 1 , X 2 ,
X 3 , X 4 , and X 5 . To maximize our chance of observing X 1 , X 2 , X 3 , X 4 , and X 5 is to
maximize the joint pdf f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 ) . To maximize the joint pdf

f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 ) , we can set the 1st derivative of the joint pdf regarding to

equal to zero:

d
f X , X , X , X , X ( x1 , x2 , x3 , x4 , x5 ) = 0
d 1 2 3 4 5
Though we can solve the above equation by pure hard work, an easier approach is to find
a parameter that will maximize the log-likelihood of us observing X 1 , X 2 , X 3 , X 4 ,
and X 5 :
ln f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 )

If ln f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 ) is maximized, f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 ) will
surely be maximized. So the task boils down to finding such that the 1st derivative of
the log pdf is zero:
d
ln f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 ) = 0
d
ln f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 )

ln
i =1

1
exp
2 xi

1
( xi
2 xi

ln
i =1

1
2 xi

1
( xi
2 xi

Setting the 1st derivative of the log joint pdf to zero:


d
d

ln
i =1

1
2 xi

1
( xi
2 xi

=0

In the above equation, the random variable is ; x1 , x2 , x3 , x4 , and x5 are constants. So


1
is a constant and its derivative regarding is zero.
2 xi
d
d

5
i =1

1
( xi
2 xi

d
= 0,
d

2
( xi )

i =1

xi

=0

Guo Fall 2009 C, Page 55 / 284

d
d

2
( xi )

i =1

xi

i =1

5
5
xi
d ( xi )

= 2
= 2
1
=0
d
xi
xi
xi
i =1
i =1
2

=0, 5

xi

i =1

1 1 1 1 1
+ + + +
=0
x1 x2 x3 x4 x5

5
5
=
= 16.74
1 1 1 1 1
1
1
1 1
1
+ + + +
+
+ + +
x1 x2 x3 x4 x5 11 15.2 18 21 25.8

After understanding the theoretical framework and detailed calculation, we are ready to
use a shortcut. First, lets isolate the variable :
f ( x) =

1
exp
2 x

1
2
(x )
2x

f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 )

ln f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 )

exp
i =1

1
( xi
2 xi

2
( xi )

i =1

xi

d
ln f X1 , X 2 , X 3 , X 4 , X 5 ( x1 , x2 , x3 , x4 , x5 ) = 0
d

1
2
(x )
2x

exp

5
i =1

5
xi
d ( xi )
= 2
=0
d
xi
xi
i =1
2

5
5
=
= 16.74
1 1 1 1 1
1
1
1 1
1
+ + + +
+
+ + +
x1 x2 x3 x4 x5 11 15.2 18 21 25.8

May 2000 #21

You are given the following five observations:


521

658

702

819

1217

You use the single-parameter Pareto with cumulative distribution function:


F ( x) = 1

500
x

, x > 500 , ! > 0

Guo Fall 2009 C, Page 56 / 284

Calculate the maximum likelihood estimate of the parameter ! .


Solution
From Exam C Table, you should be able to find:
f ( x) =

! 500!
x! +1

The joint pdf of having 5 observations is:


5

! 500!

i =1

xi! +1

f ( x1 , x2 , x3 , x4 , x5 ) = "

ln f ( x1 , x2 , x3 , x4 , x5 ) = ln

! 5 5005!

( x1 x2 x3 x4 x5 )

! +1

! 5 5005!

( x1 x2 x3 x4 x5 )

! +1

= 5ln ! + 5! ln 500

(! + 1) ln ( x1 x2 x3 x4 x5 )

d
5
ln f ( x1 , x2 , x3 , x4 , x5 ) = + 5 ln 500 ln ( x1 x2 x3 x4 x5 ) = 0
d!
!
5

+ 5ln 500 ln ( 521 658 702 819 1217 ) = 0 ,

! = 2.453

Nov 2000 #22


You are given the following information about a random sample:
The sample size equals five
The sample is from a Weibull distribution with $ = 2
Two of the sample observations are known to exceed 50, and the remaining three
observations are 20, 30, and 45
Calculate the maximum likelihood estimator of % .
Solution
From Exam C table, youll find the Weibull pdf and cdf:

f ( x) =

%
x

2x

%2

F ( x) = 1 e

, S ( x) = e

We have observed the following:


Guo Fall 2009 C, Page 57 / 284

x1 > 50 , x2 > 50 , x3 = 20 , x4 = 30 , x5 = 45
The likelihood function is:

L (% ) = f ( 20 ) f ( 30 ) f ( 45 ) S ( 50 ) S ( 50 )
=

2 ( 20 )

%2

exp

L (% )

20

2 ( 30 )

%2

8,325

30

exp

ln L (% ) = k

%2

2 ( 8,325 ) 6
d
ln L (% ) =
= 0,
d%
%3
%

2 ( 40 )

%6

8,325

%2

exp

40

exp

50

exp

50

6ln % , where k is a constant

% = 52.7

Fisher Information
One key theorem you need to memorize for Exam C is that the maximum likelihood
1
:
estimator % is approximately normally distributed with mean %0 and variance
I (% )

N %0 ,

I (% )

Here %0 is the true parameter. L ( x,% ) , called Fisher information or information, is the
variance of

I (% ) = VarX

d
ln L ( x,% ) :
d%

d
ln L (% ) = E X
d%

d
ln L ( x, % )
d%

= EX

d2
ln L ( x, % )
d% 2

Please note in the above equation, the expectation and variance are regarding X .

Its quit a bit a math to prove that %

N %0 ,

I (% )

. So I wont show you the proof.

Youll just need to memorize it. However, Ill show you why
Guo Fall 2009 C, Page 58 / 284

I (% ) = VarX

d
ln L ( x,% ) = E X
d%

d
ln L ( x,% )
d%

= EX

d2
ln L ( x,% )
d% 2

First, let me introduce a new concept to you called score. The term score is not the
syllabus. However, its a building block for Fisher information. So lets take a look.
Assume we have observed x1 , x2 ,, xn . Let L ( x,% ) represent the likelihood function.
n

L ( x,% ) = " f ( xi ,% ) , where % is the unobservable parameter of the density function.


i =1

When calculating the maximum likelihood estimator % , we often use the log-likelihood
function. So lets consider log-likelihood function, ln L ( x,% ) . The derivative of the logd
ln L ( x, % ) , is called the score of the
d%
log-likelihood function. Lets find the mean and variance of the score.

likelihood function regarding the estimator % ,

d
1 d
ln L ( x,% ) =
L ( x, % )
d%
L (% ) d%
Using the standard formula E X g ( x ) = & g ( x ) f ( x ) dx , we have:

EX

=&

d
1
d
L ( x, % )
ln L ( x,% ) = E X
d%
L ( x , % ) d%

d
1
d
d
L ( x, % ) L ( x, % ) dx = &
L ( x, % ) dx =
d%
L ( x , % ) d%
d%

& L ( x,% ) dx

density

random variable

However,

EX

& L ( x,% ) dx = 1 ( property of pdf). So we have:

d
1
d
d
ln L ( x, % ) = E X
1= 0
L ( x, % ) =
d%
L ( x , % ) d%
d%

Next, let me explain why E

d
ln L ( x,% )
d%

= E

d2
ln L ( x,% )
d% 2

Guo Fall 2009 C, Page 59 / 284

We know that E X

d
d
ln L ( x, % ) = &
ln L ( x, % ) L ( x,% ) dx = 0
d%
d%

Taking derivative regarding % at both sides of the above equation:

d
d%
Moving
d
d%

&

d
d
ln L ( x, % ) L ( x,% ) dx =
0=0
d%
d%

d
inside the integration, we have:
d%

d
d
ln L ( x, % ) L ( x,% ) dx = &
d%
d%

&

Using the formula


d
d%

d
ln L ( x, % ) L ( x, % ) dx
d%

d
d
d
u ( x) v ( x) = u ( x)
v ( x) + v ( x)
u ( x ) , we have:
dx
dx
dx

d
ln L ( x, % ) L ( x,% )
d%

= L ( x, % )

However,

d d
d
d
ln L ( x, % ) +
ln L ( x, % )
L ( x, % )
d% d%
d%
d%

d
d
1
ln L ( x,% ) =
L ( x, % ) .
d%
L ( x,% ) d%

d
d
L ( x , % ) = L ( x, % )
ln L ( x, % )
d%
d%

So we have:
d
d%

d
ln L ( x,% ) L ( x,% )
d%

= L ( x, % )

d d
d
d
ln L ( x, % ) +
ln L ( x, % )
L ( x, % )
d% d%
d%
d%

= L ( x, % )

d d
d
ln L ( x, % ) + L ( x, % )
ln L ( x, % )
d% d%
d%

d2
d
= L ( x, % ) 2 ln L ( x, % ) + L ( x, % )
ln L ( x, % )
d%
d%

= L ( x, % )

d2
d
ln L ( x, % ) +
ln L ( x, % )
2
d%
d%

Guo Fall 2009 C, Page 60 / 284

Then

&

d
d%

&

d
ln L ( x, % ) L ( x,% ) dx = 0 becomes:
d%

d2
d
ln L ( x, % ) L ( x,% ) dx + &
ln L ( x, % )
2
d%
d%

However,

L ( x, % ) dx = 0

&

d2
d2
L
x
L
x
dx
E
ln
,
%
,
%
=
ln L ( x,% ) ,
(
)
(
)
d% 2
d% 2

&

d
ln L ( x, % )
d%

L ( x, % ) dx = E

d
ln L ( x, % )
d%

d2
d
Then it follows that E
ln L ( x, % ) + E
ln L ( x, % )
2
d%
d%

Since we know that E

Var

= 0.

d
ln L ( x, % ) = 0 , it follows that
d%

d
d
ln L ( x, % ) = E
ln L ( x,% )
d%
d%

The score

= E

d2
ln L ( x,% )
d% 2

d
ln L ( x, % ) has
d%

d
zero mean and variance E
ln L ( x, % )
d%

d2
ln L ( x,% )
= E
d% 2

Nov 2003 #18


The information associated with the maximum likelihood estimator of a parameter % is
4n , where n is the number of observations.

Calculate the asymptotic variance of the maximum likelihood estimator of 2% .


Solution

()

()

Var % is the inverse of the information. So Var % =

( )

()

Var 2% = 4Var % = 4

1
4n

1
1
= .
4n
n
Guo Fall 2009 C, Page 61 / 284

The Cramer-Rao theorem


Suppose the random variable X has density function f ( x,% ) . If g ( x ) is any unbiased
estimator of % , then Var g ( x ) '

Var f ( x,% )

. The proof is as follows:

E g ( x ) = & g ( x ) f ( x, % ) dx . Since g ( x ) is an unbiased estimator of % , E g ( x ) = % .

& g ( x ) f ( x,% ) dx = % .
Taking derivative regarding % at both sides of the above equation:
d
d
% =1
g ( x ) f ( x, % ) dx =
&
d%
d%

Moving

d
inside the integration:
d%
d
d
g ( x ) f ( x, % ) dx = &
g ( x ) f ( x, % ) dx = 1
&
d%
d%

g ( x ) is a constant if the derivative is regarding % . So we have:


d

& d%
However,

g ( x ) f ( x, % ) dx = & g ( x )

d
f ( x, % )dx = 1
d%

d
d
f ( x, % ) = f ( x, % )
ln f ( x, % ) . So we have
d%
d%

d
d
& g ( x ) d% f ( x,% )dx = & g ( x ) d% ln f ( x,% ) f ( x,% ) dx = 1

However,

d
d
& g ( x ) d% ln f ( x,% ) f ( x,% ) dx = E g ( x ) d% ln f ( x,% )

EX g ( x )

d
ln f ( x, % ) = 1 .
d%

Guo Fall 2009 C, Page 62 / 284

Next, consider covariance Cov g ( x ) ,


Cov g ( x ) ,

d
ln f ( x, % ) .
d%

{g ( x )

d
ln f ( x,% ) = E X
d%

Eg ( x )}

d
d
ln f ( x, % ) E
ln f ( x,% )
d%
d%

The above is just the standard formula Cov ( X , Y ) = E


However, E X g ( x ) = % , E X

E(X ) Y

E (Y ) .

d
d
ln f ( x,% ) is the score and has
ln f ( x, % ) = 0 .
d%
d%

zero mean. Then it follows:

{ g ( x ) % } dd% ln f ( x,% )

Cov g ( x ) ,

d
ln f ( x, % )
d%

= EX g ( x )

d
d
ln f ( x,% ) %
ln f ( x, % )
d%
d%

= EX g ( x )

d
ln f ( x,% )
d%

EX %

d
ln f ( x, % )
d%

= EX g ( x )

d
ln f ( x, % )
d%

% EX

d
ln f ( x,% )
d%

= EX

=1 % 0 =1

Cov g ( x ) ,

d
ln f ( x, % ) = 1
d%

Next, applying the general rule:

Cov ( X , Y ) = * X ,Y + X + Y , where * X ,Y is the correlation coefficient. Because * X ,Y

1 , we

have:
Cov ( X , Y )

= * X ,Y + X + Y

d
1 = Cov g ( x ) ,
ln f ( x,% )
d%

[+ X + Y ]

= Var ( X ) Var (Y )

Var g ( x ) Var

d
ln f ( x,% )
d%

Guo Fall 2009 C, Page 63 / 284

Var g ( x ) '
Var

d
ln f ( x, % )
d%

The above formula means this:


For an unbiased estimator g ( x ) , its variance is no less than the reciprocal of the variance
of the score

d
ln f ( x,% ) .
d%
1

Var g ( x ) '

is a generic formula. When we use the maximum


d
Var
ln f ( x,% )
d%
likelihood estimator, then the density function is:

f ( x,% ) = f ( x1 , % ) f ( x2 ,% ) ... f ( xn ,% ) = L ( x, % )
When the

d
ln f ( x,% ) meets certain condition, Var g ( x ) =
d%

. We
d
Var
ln f ( x,% )
d%
are not going to worry about what these conditions are. All we need to know is that for
the maximum likelihood estimator g ( x ) , when n , the sample size of the observed data
X 1 , X 2 ,..., X n , approaches infinity, the variance of g ( x ) approaches
1
Var

d
ln L ( x, % )
d%

For a single maximum likelihood estimator % ,


Var (% ) .

1
d
Var
ln L ( x, % )
d%

as simple size n approaches infinity.

Extending the above result to a series of maximum likelihood estimators (presented


without proof):

Guo Fall 2009 C, Page 64 / 284

Assume that random variable X has density f ( x;%1 ,% 2 ,...,% k ) . The covariance

Cov (% i , % j ) between two maximum likelihood estimators %i and % j , as simple size n

approaches infinity, is equal to the inverse of ( i, j ) entry of Fisher Information:


Ii , j = E

/ 2 ln f ( x;%1 ,% 2 ,...,% k )
/%i /% j

=E

/ 2 ln L ( x;%1 , % 2 ,..., % k )
/%i /% j

For two maximum likelihood estimators, Fisher Information matrix is:


/2
ln L ( x;%1 ,% 2 )
/%12

E
I=
E

/2
ln L ( x;%1 ,% 2 )
/%1/% 2

Where I1,2 = I 2,1 = E

E
E

/2
ln L ( x;%1 ,% 2 )
/%1/% 2
/2
ln L ( x;%1 ,% 2 )
/% 22

/2
ln L ( x;%1 , % 2 )
/%1/% 2

Then
Cov (%1 , %1 ) = Var (%1 )
Cov (%1 , % 2 )
=I
Cov (% 2 , %1 )
Cov (% 2 , % 2 ) = Var (% 2 )

Nov 2000 #13


A sample of ten observations comes from a parametric family f ( x, y;%1 , % 2 ) with log
likelihood function
ln L (%1 , % 2 ) =

10
i =1

ln f ( xi , yi ;%1 , % 2 ) = 2.5%12 3%1% 2 % 22 + 5%1 + 2% 2 + k

where k is a constant.
Determine the estimated covariance matrix of the maximum likelihood estimator

%1
.
%2

Solution
Guo Fall 2009 C, Page 65 / 284

/2
ln L (%1 ,% 2 ) = E
/%12

/2
2.5%12 3%1% 2 % 22 + 5%1 + 2% 2 + k ) = E ( 5) = 5
2 (
/%1

/2
ln L (%1 , % 2 ) = E
/% 22

/2
2.5%12 3%1% 2 % 22 + 5%1 + 2% 2 + k ) = E ( 2 ) = 2
2 (
/% 2

/2
ln L (%1 , % 2 ) = E
/%1/% 2

/2
2.5%12 3%1% 2 % 22 + 5%1 + 2% 2 + k ) = E ( 3) = 3
(
/%1/% 2

Fisher Information is:


I=

5 3
3 2

The general formula inversing a 22 matrix is:


a b
c d

d
ad bc b

c
, if ad bc 0 0
a

Var (%1 )
Cov (%1 ,% 2 )
=I
Cov (%1 ,% 2 ) Var (% 2 )

5 3
=
3 2

2
5 2 3 3 3
1

3
2
=
5
3

3
5

Fisher Information matrix is good for estimating the variance and covariance of a series
of maximum likelihood estimators. What if we need to estimate the variance and
covariance of a function of a series of maximum likelihood estimators? We can use the
delta method.

Delta method
Assume that random variable X has mean X and variance + X2 . Define a new function

Y = f ( X ) . Assume that f ( X ) is differentiable, we have:


f ( X ) . f ( X ) + f / ( X )( X

X )

Take variance at both sizes and notice that f ( X ) and f / ( X ) are constants:
Var f ( X ) . Var f ( X ) + f / ( X )( X

X )

Guo Fall 2009 C, Page 66 / 284

= f / ( X ) Var ( X

X ) = f / ( X ) Var ( X )

We get the delta formula Var f ( X ) . f / ( X ) Var ( X ) .


2

Example. Y = X . Then Var

( )

d
X .
X
dx

Var ( X ) =
X =X

1
2 X

Var ( X )

To get a feel of this formula, set Y = f ( X ) = cX , where c is a constant. Then the delta
formula becomes: Var [ cX ] . c 2Var ( X ) .
We can rewrite the formula Var f ( X ) . f / ( X ) Var ( X ) as
2

Var f ( X ) . f / ( X ) Var ( X ) f / ( X )

()

Suppose we want to find the variance of f % , where % is an estimator of a true


parameter % . Please note that % is a random variable. For example, if % is the maximum
likelihood estimator, % varies depending on the sample size and on the sample data we
have observed. Also assume based on the sample data we have, we get one estimator %0 .

()

Set X = % and E ( X ) = E % :

()

Var f %

()

. f/ E %

()

Var %

()

If % is the MLE of an unobservable true parameter % , then % is unbiased and E % = % .

()

However, we dont know the true value of % . Nor do we know f / E %

. Assume that,

based on your sample data on hand, the maximum likelihood estimators for the true
parameters % is a . Then we might want to set % . a .Then we have:

()

Var f %

()

. f / ( a ) Var %
2

Variance of a function of two random variables


X has mean X and variance + X2 ; random variable Y has mean Y and variance + Y2

Define a new function Z = f ( X , Y ) . Assume that f ( Z ) is differentiable, we have:

Guo Fall 2009 C, Page 67 / 284

f ( X , Y ) . f ( X , Y ) + f X/ ( X , Y )( X

X ) + fY/ ( X , Y )(Y X )

Take variance at both sides of the equation and notice that X , Y , f ( X , Y ) ,


f X/ ( X , Y ) , and fY/ ( X , Y ) are all constants:
Var f ( X , Y )

. f X/ ( X , Y ) Var ( X
2

X ) + fY/ ( X , X ) Var (Y X )
2

+2 f X/ ( X , Y ) f X/ ( X , Y ) Cov

( X X ) , ( X X )

. f X/ ( X , Y ) Var ( X ) + fY/ ( X , X ) Var (Y )


2

+2 f X/ ( X , Y ) f X/ ( X , Y ) Cov ( X , Y )

Express this formula in a matrix:


Var f ( X , Y )

. f

/
X

( X , Y )

/
X

Var ( X ) Cov ( X , Y )
Cov ( X , Y )
Var ( Y )

( X , Y )

f X/ ( X , Y )
f X/ ( X , Y )

Many times we are interested in finding the variance of a function of maximum


likelihood estimators. As a simple case, say we have two maximum likelihood estimators

%1 and % 2 . We want to find the variance of f %1 ,% 2 . Setting X = %1 , Y = % 2 ,

( )

( )

X = E %1 , X = E % 2 , we have:

Var f %1 , % 2
/

fE %

(% ,% )
1

( )

Var %1 + f E %
/

( )
2

(% ,% )
1

( )

Var % 2 + 2 f E/ % %1 ,% 2 f E/ %

( )
1

( )
2

(% ,% ) Cov (% ,% )
1

( )

If %1 and % 2 are MLE of the true unobservable parameters %1 and % 2 , then E %1 = %1

( )

and E % 2 = % 2 . Then

Var f %1 , % 2

f%/1 %1 ,% 2

( )

Var %1 + f%/2 %1 ,% 2

( )

) (

) (

Var % 2 + 2 f%/1 %1 ,% 2 f%/2 %1 ,% 2 Cov %1 ,% 2

Guo Fall 2009 C, Page 68 / 284

However, we dont know the true value of %1 and % 2 . Nor do we know f%/1 %1 ,% 2 and

f%/2 %1 ,% 2 . Assume that, based on your sample data on hand, the maximum likelihood
estimators for the true parameters %1 and % 2 are a and b respectively. Then we might
want to set

1
f %1 , % 2
1%1

f%/1 %1 , % 2 =

1
f %1 , % 2
1% 2

f%/2 %1 , % 2 =

1
f %1 , % 2
1%1

.
%1

1
f %1 , % 2
1% 2

.
%2

,
%1 = a

)
% 2 =b

Then we have:

Var f %1 , % 2

1
f %1 , % 2
1%1

+2

( )

1
f %1 , % 2
1% 2

Var %1 +
%1 = a

1
f %1 , % 2
1%1

%1 = a

1
f %1 , % 2
1% 2

1
f %1 , % 2
1% 2

Var f %1 , % 2

+2

as
% 2 =b

% 2 =b

%2 =b

1
f %1 , % 2
1% 2

1
f %1 , % 2
1%1

as
%1 = a

1
f %1 ,% 2
1%1

)
%1

. Then
%2

1
f %1 , % 2
1%1

( )

Var % 2

Cov %1 , % 2

To simply the notation, well rewrite the symbol

and

1
f %1 , % 2
1%1

( )

Var %1 +
%1

%1

1
f %1 , % 2
1% 2

1
f %1 , % 2
1% 2

Cov %1 , % 2

( )

Var % 2
%2

%2

Guo Fall 2009 C, Page 69 / 284

1
f %1 , % 2
1%1

However, youll need to remember that

1
f %1 , % 2
1%1

1
f %1 , % 2
1% 2

and that
%1 = a

really means
%1

really means
%2

1
f %1 , % 2
1% 2

.
% 2 =b

Otherwise, youll get in a conceptual mess that %1 in the function f %1 ,% 2 is a random

variable, yet %1 in the symbol

[ ]%

is not a random variable but a fixed maximum

likelihood estimator.
Expressing the above formula in a matrix:

Var f %1 , % 2

. f% %1 ,% 2
/

f% %1 , % 2
/

Cov % 1 ,% 2

( )

Var % 1

Please note that

( )

Var % 1

Cov % 1 , % 2

Cov % 1 , % 2

Cov % 1 ,% 2

( )

Var % 2

( )

Var % 2

( )
(% ,% )

f%/ %1 ,% 2
1

f%/

= I 1 , where I is Fisher Information.

May 2000 #25


You model a loss function using lognormal distribution with parameters and + . You
are given:
The maximum likelihood estimates of and + are

= 4.215
+ = 1.093

The estimated covariance matrix of and + is:


0.1195
0
0
0.0597

1
The mean of the lognormal distribution is exp + + 2
2

Estimate the variance of the maximum likelihood estimate of the mean of the lognormal
distribution, using the delta method.
Guo Fall 2009 C, Page 70 / 284

Solution
1
The mean function is f ( , + ) = exp + + 2 . The maximum likelihood estimator of
2
1 2
f ( , + ) is f , + = exp + + , where and + are maximum likelihood
2
estimator of and + respectively.

( )

( )

We are asked to find Var f , +

1 2
= Var exp + +
2

Using Taylor series approximation around ( , + ) , we have:

( )

( ) ( ) +

1
f ,+
1

f ,+ . f ( ,+ ) +

( ) (+ + )

1
f ,+
1+

Taking variance at both sides of the equation:

( )

Var f , +

1
.
f ,+
1

( )

+2

1
Var +
f ,+
1+

( )

( )

1
f ,+
1

( )

( )

1
f ,+
1+

( )

Var +
+

( )

Cov , +
+

We are told that The estimated covariance matrix of and + is:


0.1195
0
0
0.0597

( )

( )

( )

So Var . 0.1195 , Var + . 0.0597 , Cov , + . 0 .

( )

Var f , +

( )

1
.
f ,+
1

( )

1
0.1195 +
f ,+
1+

However, we dont know and + . Nor do we know

0.0597
+

( )

1
f ,+
1

and

( )

1
f ,+
1+

.
+

Consequently, we set
Guo Fall 2009 C, Page 71 / 284

( )

1
f ,+
1

( )

1
f ,+
1

( )

1
f ,+
1+

.
+

( )

1 2
= exp + +
2

( )

1 2
= + exp + +
2

1
1
1 2
f ,+ =
exp + +
2
1
1

1
1
1 2
f ,+ =
exp + +
2
1+
1+

( )

1
f ,+
1

( )

1
f ,+
1

1 2
. exp + +
2

1 2
. + exp + +
2

( )

Var f , +

( )

1
f ,+
1+

1
. exp 4.125 + 1.0932 = 123.02
2

1
. 1.093exp 4.125 + 1.0932 = 134.46
2

. 123.022 0.1195 + 134.462 0.0597 = 2,888

Please note that you can also solve this problem using the black-box formula

Var f %1 , % 2
.

+2

1
f %1 , % 2
1%1

1
f %1 , % 2
1%1

( )

Var %1 +
%1

%1

1
f %1 , % 2
1% 2

1
f %1 , % 2
1% 2

Cov %1 , % 2

( )

Var % 2
%2

%2

However, I recommend that you first solve the problem using Taylor series
approximation. This forces you to understand the logic behind the messy formula. Once
you understand the formula, next time you can use the memorized formula for

Var f %1 , % 2

and quickly solve the problem.

May 2005 #9, #10


The time to an accident follows an exponential distribution. A random sample of size two
has a mean time of 6. Let Y represent the mean of a new sample of size two.

Determine the maximum likelihood estimator of Pr ( Y > 10 ) .


Guo Fall 2009 C, Page 72 / 284

Use the delta method to approximate the variance of the maximum likelihood estimator
of FY (10 ) .
Solution

The time to an accident follows an exponential distribution. Assume % is the mean for
this exponential distribution. If X 1 and X 2 are two random samples of time-to-accident,
then the maximum likelihood estimator of % is just the sample mean. So % = 6 .
Pr (Y > 10 ) = Pr

X1 + X 2
> 10 = Pr ( X 1 + X 2 > 20 )
2

X 1 + X 2 is gamma with parameters ! = 2 and % . 6 . Then


Pr ( X 1 + X 2 > 20 ) =

te t 6
&20 36 dt

To calculate

&
&

+2
a

+2
a

x2

te t 6
& 36 dt , youll need to memorize the following shortcut:
20

x /%

x /%

dx = (a + % ) e

dx =

(a + % )

a /%

+% 2 e

a /%

If interested, you can download the proof of this shortcut from my website
http://www.guo.coursehost.com. The shortcut and the proof are in the sample chapter of
my P manual. Just download the sample chapter of P manual and youll get the proof and
more worked out examples using this shortcut.
2

te t 6
1
1
&20 36 dt = 6 20& t 6 e

t 6

dt =

1
[ 20 + 6] e
6

20 6

= 0.1546

If two new samples X 1 and X 2 are taken, then


FY (10 ) = Pr ( X 1 + X 2

20 ) =

20

&
0

te

t%

%2

dt

FY (10 ) =

20

&
0

te

t %
2

dt

Guo Fall 2009 C, Page 73 / 284

2
20

Var FY (10 ) = Var

&

te

()

t %
2

1
1%

dt .

20

&
0

t %

te

()

Var %

dt

()

E % .6

X1 + X 2
1
1
1
= ( 2 ) Var ( X ) = % 2 . ( 6 2 ) = 18
2
4
2
2
Please note that the two samples X 1 and X 2 are independent identically distributed with

( )

Var % = Var X = Var

a common variance Var ( X ) = % 2 .

1
1%

Next, we need to calculate

20

&
0

t %

te

dt =

20

&
0

20

&t
0

( 20 + % ) e

1
1
1%

1
1%

1+

te

20

20 %

20 %

t %
2

dt

()

E % .6

t %

&

&t

1+

400

dt .

1
1%

t %

te

dt =

=1

20

20

1+

20

&t

dt

20

t %

dt

20 %

20

exp

t %

20 %

400

20 %

= 0.066
6

Var FY (10 ) .

1
1%

20

&
0

te

t %
2

dt

()

()

Var % = 0.0662 (18) = 0.078

E % .6

Guo Fall 2009 C, Page 74 / 284

Chapter 3

Kernel smoothing

Essence of kernel smoothing


Kernel smoothing
=Set your point estimate equal to the average of a neighborhood
=Recalculate at every point by averaging this point and the nearby points
Let me illustrate this with a story. You want to buy a house. After looking at many
houses, you find one house you like most. You go the current owner of the house and ask
for the price. The current owner tells you, Im asking for $210,000. Make me an offer.
What are you going to offer? 200,000? $203,000? $205,000 or something else? You are
not sure. And you know the danger: if your offer is too high, the seller accepts your offer
and youll overpay the house; if your offer is too low, youll look stupid and the seller
may refuse to deal with you anymore. So to your best interest, youll want to make your
offer reasonable, not too high, not too low.
If you talk to someone experienced in the real estate market, hell tell you how (and this
works): instead of making a random offer, you can make your offering price to be around
the average selling price of the similar houses sold in the same neighborhood.
Say four similar houses in the same neighborhood are sold this year. Their prices are
$198,000, $200,000, $201,000, and $202,000. So the average selling price is $200,250. If
the house you want to be is truly similar to these four houses, then the seller is asking for
too much. You can offer around $200, 250 and explain to the seller that your asking price
is very similar to the selling price of the houses in the same neighborhood. A reasonable
seller will be willing to lower his asking price.
What advantage do we gain by looking at a neighborhood? A smoothed, better estimate.
If we focus on one house alone, its selling price appears random. However, when we
broaden our view and look at many similar houses nearby, well remove the randomness
of the asking price and see a more reasonable price.
This simple story illustrates the spirit of kernel smoothing. When we want to estimate
f X ( x ) , probability density of a random variable X at point x . Instead of looking at one
# of x's in the sample
, we may want to look at the x s
sample size n
neighborhood. For example, we may want to look at 3 data points x b , x , and x + b
where b is a constant. Then we calculate the average of empirical densities at x b , x ,
and x + b and use it as an estimator of f X ( x ) :
point x and say f

( x) = p ( x) =

Guo Fall 2009 C, Page 75 / 284

( x) =

1
1
1
p ( x b) + p ( x) + p ( x + b)
3
3
3
calculate f ( x ) by averaing the empirical densities
of a neighborhood x b , x , x + b

Please note the analogy of determining the house price is not perfect. Theres one small
difference between how we estimate the price of a house located at x and how we
estimate f X ( x ) . When we estimate the fair price of a house located at x , we exclude the
data point x because we dont know the value of the house located at x :
Value of a house located at x
= 0.5 *value of the houses located at x b + 0.5 *value of the houses located at x + b
In contrast, when we estimate the density at x , we include the empirical density p ( x ) in
our estimate:
f

( x) =

1
1
1
p ( x b) + p ( x) + p ( x + b)
3
3
3

We include p ( x ) in our f

( x)

calculation because f

( x)

by itself is an estimate of

p ( x ) . Stated differently, in kernel smoothing, we estimate f X ( x ) twice. The first time,


we use the empirical density p ( x ) =
time, we refine our estimate f

# of x's in the sample


to estimate f X ( x ) . The 2nd
sample size n

( x ) = p ( x ) by taking the average empirical densities of

x and its nearby points x b and x + b . This is why kernel smoothing recalculates at
every point by averaging this point and its nearby points.

Of course, we can expand our neighborhood. Instead of looking at only two nearby points,
we may look at 4 nearby points and calculate the average empirical density of a 5-point
neighborhood:
f

( x) =

1
1
1
1
1
p ( x 2b ) + p ( x b ) + p ( x ) + p ( x + b ) + p ( x + 2b )
5
5
5
5
5
calculate f ( x ) by averaing the empirical densities
of a neighborhood x 2b , x b , x , x +b , x + 2b

In addition, we dont need to use equal weighting. We can assign more weight to the data
points near x . For example, we can set
f

( x) =

1
2
4
2
1
p ( x 2b ) +
p ( x b) + p ( x) +
p ( x + b ) + p ( x + 2b )
10
10
10
10
10

Guo Fall 2009 C, Page 76 / 284

Now you understand the essence of kernel smoothing. Lets talk about the two major
issues to think about if you want to use kernel smoothing:

How big is the neighborhood? This is called the bandwidth. The bigger the
neighborhood, the greater the smoothing. However, if your neighbor is too big,
you may run the risk of over-smoothing and finding false patterns.

How much weight you do give to each data point in the neighborhood? For
example, you can assign equal weight to each data point in the neighborhood.
You can also give more weight to the data point closer to the point whose density
you want to estimate. There are many weighting methods out there for you to use.
The weighting method is called kernel.

Of these two factors, the bandwidth is typically more important than the weighting
method. Your final result may not change much if you use different weighting method.
However, if you change the bandwidth, your estimated density may change widely.
Theres some literature out there explaining in more details on how to choose a proper
bandwidth and a proper weighting method. However, for the purpose of passing Exam C,
you dont need to know that much.
3 kernels you need to know
Loss Models explains three kernels. Youll need to understand them.

Uniform kernel. This is one of the easiest weighting methods. If you use this
method to estimate density, youll assign equal weight to each data point in the
neighborhood.

Triangular kernel. Under this weighting method, you give more weight to the
data points that are closer to the point for which you are estimating density.

Gamma kernel. This is more complex but less important than the uniform kernel
and the triangular kernel. If you want to cut some corners, you can skip the
gamma kernel.

Now lets look at the math formulas. Lets focus on the uniform kernel first.

Uniform kernel
The uniform kernel for estimating density function:
0
ky ( x) =

1
2b
0

if x < y - b
if y - b

y+b

if x > y + b
Guo Fall 2009 C, Page 77 / 284

Lets look at the symbol k y ( x ) . Here x is your target data point (the location of the house
you want to buy) for which you want to estimate the density (the fair price of the house
you want to buy). y is a data point in the neighborhood (location of a similar house in the
neighborhood). k y ( x ) is y s weight for estimating the density function of x .
The uniform kernel estimator of the density function at x :
f ( x)

p ( yi )

=
All yi

kernel estimator of the


density function at x

k yi ( x )

empirical density of yi

yi 's weight

Calculate the density at x by taking the average of


the empirical densities of the nearby points yi 's

The uniform kernel for estimating the distribution function:


if x < y - b

0
K y ( x) =

x y+b
if y - b x y + b
2b
1
if x > y + b

The uniform kernel estimator of the distribution function at x :


F ( x)

p ( yi )

=
All yi

kernel estimator of the


distribution function at x

empirical density of yi

K yi ( x )
yi 's weight

Calculate the distribution function at x by taking the


average of the empirical densities of the nearby points yi 's

Now lets look at the formula for k y ( x ) . The formula looks intimidating. The good news
is that you really dont need to memorize it. You just need to understand the essence of
the uniform weighting method. Once you understand the essence, you can derive the
formula effortless on the spot.
Lets rewrite the uniform kernel formula as:

0
1
ky ( x) =
2b
0

if x < y - b
if y - b

if x > y + b

y+b

ky ( x) =

if y - x > b

1
2b

if y - x

Guo Fall 2009 C, Page 78 / 284

To help us remember the formula, lets draw a neighborhood diagram:


A
x b

y1

y3

y4

B
x+b

y2

Here your neighborhood is [x b, x + b]. b is called the bandwidth, which is half of the
width of the neighborhood you have chosen. Now the formula for k y ( x ) becomes:

ky ( x) =

if y - x > b

1
2b

if y - x

ky ( x) =

if y is OUT of the
neighborhood [ x - b, x + b]

1
2b

if y is in the
neighborhood [ x - b, x + b]

If the data point y is out of the neighborhood [x b, x + b] , its weight is zero. We throw
this data point away and not use it in our estimation. And this should make intuitive sense.
In the neighborhood diagram, data points y1 and y2 are discarded.
If the data point y is in the neighborhood [x b, x + b], well use this data point in our
estimation and assign a weight 1 2b . In the neighborhood diagram, data points y3 and
y4 are used in the estimation and each gets a weight1 2b .
This is how we get 1 2b . Area ABCD represents the total weight we can possibly assign
to all the data points in the neighborhood. So well want the total area ABCD equal to
one.
1
Area ABCD = AB * BC = (2b) * BC =1, so BC =
.
2b
So for each data point that falls in the neighborhood AB, its weight is BC = 1 2b . For
each data point that falls out of the neighborhood AB, its weight is zero.
Now you shouldnt have trouble memorizing the uniform kernel formula for k y ( x ) .
Next, lets look at the formula for K y ( x ) , the weighting factor for the distribution
function at x :
Guo Fall 2009 C, Page 79 / 284

if x < y - b

0
K y ( x) =

x y+b
if y - b x y + b
2b
1
if x > y + b

Its quite complex to derive the K y ( x ) . So lets not worry about how to derive the
formula. Lets just find an easy way to memorize the formula. Once again, lets draw a
neighborhood diagram:

A
F
x b y

B
x+b

To find how much weight to give to the data point y toward calculating the F ( x ) , draw
a vertical line at the data point y (Line EF). Next, imagine that you use a pair of scissors
to cut off whats to the left of Line EF while keeping whats to the right of Line EF. Next,
calculate the area of the neighborhood rectangular ABCD thats remaining after the cut.
This remaining area of the neighborhood rectangular ABCD that survives the cut is
K y ( x ) . Lets walk through this rule.
If x y

Situation One

b (see the diagram below), we draw a vertical line EF at

the data point y .


A
x b

F
y

B
x+b

Next, we use a pair of scissors and cut off whats to the left of Line EF. New the diagram
becomes:
F
y

B
x+b

Guo Fall 2009 C, Page 80 / 284

Next, we calculate the area of the neighborhood rectangular ABCD that survives the cut.
After the cut, the original neighborhood rectangular ABCD shrinks to the rectangular
EFBC. The area of surviving area is:
EFBC = EF EC =

1
x y+b
( x + b y) =
2b
2b

This is the weight assigned to the data point y toward calculating F ( x ) .


Situation Two
the data point y .
F
y

A
x b

If y < x b (see the diagram below), we draw a vertical line EF at

B
x+b

Next, we use a pair of scissors and cut off whats to the left of Line EF. New the diagram
is as follows:
F
y

A
x b

B
x+b

The original neighborhood rectangular ABCD completely survives the cut. So well set
K y ( x ) = ABCD = 1 .
Situation Three
the data point y .

A
x b

If y > x + b (see the diagram below), we draw a vertical line EF at

B
x+b

F
y

Next, we use a pair of scissors and cut off whats to the left of Line EF. New the diagram
is as follows:

Guo Fall 2009 C, Page 81 / 284

F
y

E
The original neighborhood rectangular ABCD is completely cut off. So well set
K y ( x) = 0 .

Now you see that you really dont need to memorize the ugly K y ( x ) formula. Just draw
a neighborhood diagram, use a pair of scissors, choose y at the cutting point and cut off
the left side of the diagram. Then you just calculate the surviving area of the
neighborhood rectangle. The surviving area is the K y ( x ) .

Triangular kernel
In the uniform kernel, every data point in the neighborhood gets an identical weight of
1 2b . Say we have two data points in the neighborhood y3 and y4 , but y4 is closer to x
and y4 is farther away from x (see the diagram below).
A
x b

y3

x y4

B
x+b

The uniform kernel will gives 1 2b to y3 and y4 .


However, often times it makes sense for us to give y4 more weight than y3 . For example,
x is the location of the house you want to buy; y3 and y4 are the locations of the two
similar houses in your neighborhood. It makes intuitive sense for us to give more weight
to the house located at y4 than the one located at y3 . If the house located at y3 was sold
at $200,000 and the house located at y4 was once sold at $198,000, we might want to
assign 40% weight to the house located at y3 and 60% to the one located at y4 . Then the
estimated fair price of the house located at x is:
60%* Price of the house located at y4 + 40% * Price of the house located at y3
= 60% * 198,000 + 40% * 200,000 = $198,800

Guo Fall 2009 C, Page 82 / 284

Here comes the kernel smoothing. Kernel smoothing assigns more weight to a data point
closer to the point for which we need to estimate the density. Its assign less weight to a
data point farther away from the point for which we need to estimate the density.
Lets make sense of the triangular kernel formulas for k y ( x ) and K y ( x ) . First, lets look
at k y ( x ) :
0
b+ x y
b2
ky ( x) =
b+ y x
b2
0

if x < y - b
if y - b
if y

y+b

if x > y + b

Lets rewrite this formula as:


0
b+ x y
b2
ky ( x) =
b+ y x
b2
0

if x < y - b
0
if y - b
if y

ky ( x) =

y+b

if x > y + b

Please note that y - b x y is equivalent to x


equivalent to x - b y x .

x- y >b

if

b+ x y
b2
b+ y x
b2

if x

if x - b

x + b . And y

x+b
y

y + b is

To make sense of the k y ( x ) formula, lets draw a neighborhood diagram:


D
H
F

y1

A
x b

E
y2

C
x

G
y3

B
x+b

y4

Guo Fall 2009 C, Page 83 / 284

The neighborhood is [A, B]= [x b, x + b]. Now the k y ( x ) formula becomes:

0
ky ( x) =

b+ x y
b2
b+ y x
b2

0
ky ( x) =

b+ x y
b2
b+ y x
b2

if
if x

x- y >b
y

if x - b

x+b
y

if y is OUT of the neighborhood [ x - b, x + b ]

[ x, x + b ]

if y is in the right-half neighborhood, that is y


if y is in the left-half neighborhood, that is y

[ x - b, x ]

It makes sense that k y ( x ) = 0 if y is out of the neighborhood [ x - b, x + b] . Data points


y1 and y4 are out of the neighbor and have zero weight.

Now lets find k y when the data point y is in the neighborhood [ x - b, x + b] . Data points
y2 and y3 are in the neighborhood and their weights are equal to the height EF and GH
respectively.

Before calculating EF and GH, let me give you a preliminary high school math formula.
This formula is used over and over in the triangle kernel smoothing:
In a triangle ABC,
DE EC
=
,
AB BC

B = 90 degrees and DE is parallel to AB. Then

DE = AB

1
DE EC
DEC 2
DE
=
=
ABC 1 AB BC
AB
2

EC
BC

EC
=
BC

where DEC represents the area DEC and

DE
DEC = ABC
AB

EC
= ABC
BC

ABC the area of ABC.

Guo Fall 2009 C, Page 84 / 284

DE EC
If you dont understand why
=
and
AB BC
high school geometry.

DEC
EC
=
ABC
BC

, youll want to review

Now lets come back to the following diagram and calculate EF and GH. EF is the weight
assigned to the data point y2 . GH is the weight assigned to the data point y3 .
D
H
F

y1

A
x b

E
y2

C
x

G
y3

B
x+b

y4

First, please note that the area of the triangle ABD represents the total weight assigned to
all the data points in the neighborhood [A, B]. So the area of the triangle ABD should be
one:
1
ABD = 0.5 * AB * CD = 1. However, AB= 2b .
0.5* 2b *CD=1, CD =
b

y
EF AE
AE
=
, EF =
CD = 2
CD AC
AC

(x
b

b ) 1 b + y2
=
b
b2

x + b y3 1 b + x y3
GH BG
BG
=
, GH =
CD =
=
CD BC
BC
b
b
b2

if y2

[x

b, x ] ;

if y3

[ x, x + b ]
Guo Fall 2009 C, Page 85 / 284

So we have:
if y is OUT of the neighborhood [ x - b, x + b ]

0
ky ( x) =

b+ x y
b2
b+ y x
b2

[ x, x + b ]

if y is in the right-half neighborhood, that is y

[ x - b, x ]

if y is in the left-half neighborhood, that is y

Next, lets look at the triangle kernel formula K y ( x )

if x < y - b

(b + x

y)

if y - b

2b 2

K y ( x) =

(b + y

x)

2b 2

if y

y+b

if x > y + b

Lets rewrite this formula as:


0

(b + x

y)

(b + y

Please note that y - b

to y

x)

2b 2

if y

[ x, x + b ]

if y

[ x - b, x ]

if y

( x + b, + )

2b 2

K y ( x) =

if y

, x b)

y is equivalent to y

[ x, x + b ]

and y

y + b equivalent

[ x - b, x ] .

To make sense of the K y ( x ) formula, well apply the scissor-cut rule.

Guo Fall 2009 C, Page 86 / 284

D
H
F

y1

Situation One

A
x b

If y

E
y2

C
x

G
y3

B
x+b

y4

[ x, x + b ]

Draw a vertical line at the data point y (Line GH). Next, imagine that you use a pair of
scissors and cut off whats to the left of Line GH while keeping whats to the right of
Line GH. Next, calculate the area of the triangle ABD remaining after the cut. This
remaining area after the cut is K y ( x ) .

D
H

A
x b

C
x

G
y

B
x+b

After the cut:


H

G
y

B
x+b
Guo Fall 2009 C, Page 87 / 284

BG
K y ( x ) = BGH = BDC
BC
If y

Situation Two

1 x+b y
=
2
b

(x +b
=

y)

2b 2

[ x - b, x ]
D

A
x b

E
y

C
x

B
x+b

Draw a vertical line at data point y (Line EF). Cut off whats to the left of EF.
After the cut:
D

E
y

K y ( x ) = BDFE = 1
AE
AEF = ACD
AC

K y ( x ) = BDFE = 1

C
x

B
x+b

AEF
2

1 y
= !
2

(b + x

y)

(x
b

b)

"

(b + x
=

y)

2b 2

2b 2

Guo Fall 2009 C, Page 88 / 284

If y

Situation three

, x b)
D

M
y

A
x b

C
x

B
x+b

Draw a vertical line MN at data point y . Cut off whats to the left of line MN. Now the
whole area ABD will survive the cut. So K y ( x ) = 1 .

If y

Situation Four

( x + b, + )
D
S

A
x b

C
x

B
x+b

R
y

Draw a vertical line RS at data point y . Cut off whats to the left of line RS. Now the
whole area ABD will be cut off. So K y ( x ) = 0 .
Now you see that you really dont need to memorize the complex formulas for K y ( x ) .
Just draw a diagram and directly calculate K y ( x ) .
Finally, lets look at the gamma kernel.

Guo Fall 2009 C, Page 89 / 284

Gamma kernel

$$k_y(x)=\frac{x^{\alpha-1}e^{-x\alpha/y}}{\left(y/\alpha\right)^{\alpha}\Gamma(\alpha)},\quad\text{where } x>0$$

To understand the gamma kernel, you'll need to know this: in kernel smoothing, all the weights should add up to one. Because of this, for convenience, we can use a density function as weights. This way, the weights automatically add up to one.

In the gamma kernel, we just use the gamma pdf

$$f(x)=\frac{x^{\alpha-1}e^{-x/\theta}}{\theta^{\alpha}\Gamma(\alpha)}$$

However, we set $\theta=\dfrac{y}{\alpha}$ (so that the kernel centered at $y$ has mean $\alpha\theta=y$):

$$k_y(x)=\frac{x^{\alpha-1}e^{-x/\theta}}{\theta^{\alpha}\Gamma(\alpha)}\Bigg|_{\theta=y/\alpha}=\frac{x^{\alpha-1}e^{-x\alpha/y}}{\left(y/\alpha\right)^{\alpha}\Gamma(\alpha)}$$

The simplest gamma pdf is the one with $\alpha=1$ (i.e. the exponential pdf). So the simplest gamma kernel is an exponential kernel:

$$k_y(x)=\frac1y e^{-x/y},\quad\text{where } x>0$$

If you need to find the exponential kernel for $F(x)$, then

$$K_y(x)=\int_0^x k_y(t)\,dt=1-e^{-x/y}$$

This is all you need to know about the gamma kernel.

Problem 1

A random sample of size 12 gives us the following data:

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Using a uniform kernel with bandwidth 2, calculate $\hat f(6)$ and $\hat F(6)$.

Solution

Uniform kernel with bandwidth 2: the neighborhood runs from $6-b=6-2=4$ to $6+b=6+2=8$. When calculating $\hat f(6)$, we discard any data points that are outside of the neighborhood [4, 8]. So 1, 2, 3, 3, 9, 9, 11, 12 are discarded. We only consider 5, 6, 7, 8. Each of these four data points has a weight of $1/(2b)=1/4$.
So

$$\hat f(6)=\sum_y p(y)\,k_y(6)=\frac1{12}\cdot\frac14+\frac1{12}\cdot\frac14+\frac1{12}\cdot\frac14+\frac1{12}\cdot\frac14=\frac1{12}$$

In the calculation of $\hat F(6)$, any data point that falls below the lower bound of the neighborhood [4, 8], or touches it, gets a full weight of 1. Data points 1, 2, 3, 3 are below the lower bound, so they each get a weight of 1. Any data point that falls above the upper bound of the neighborhood, or touches it, gets zero weight. So 8 (touching the upper bound) and 9, 9, 11, 12 (above the upper bound) each get zero weight. Data points $y=5,6,7$ are inside the neighborhood [4, 8]. If you draw a diagram, you'll find that the weights for $y=5,6,7$ are:
$$K_5(6)=\frac34,\qquad K_6(6)=\frac24,\qquad K_7(6)=\frac14$$

$$\hat F(6)=\sum_y p(y)\,K_y(6)=\frac1{12}(1)+\frac1{12}(1)+\frac1{12}(1)+\frac1{12}(1)+\frac1{12}\cdot\frac34+\frac1{12}\cdot\frac24+\frac1{12}\cdot\frac14\approx0.4583$$

The full table of weights:

y        1     2     3     3     5     6     7     8     9     9     11    12
p(y)     1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12
k_y(6)   0     0     0     0     1/4   1/4   1/4   1/4   0     0     0     0
K_y(6)   1     1     1     1     3/4   2/4   1/4   0     0     0     0     0
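As a quick machine check of Problem 1 (my own sketch, not part of the original solution), the uniform kernel estimates can be reproduced in a few lines:

```python
data = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]
b, x = 2.0, 6.0

def k_unif(y, x, b):                  # density: 1/(2b) inside [y-b, y+b]
    return 1 / (2*b) if y - b <= x <= y + b else 0.0

def K_unif(y, x, b):                  # CDF: ramps from 0 to 1 across [y-b, y+b]
    if x <= y - b: return 0.0
    if x >= y + b: return 1.0
    return (x - (y - b)) / (2*b)

f6 = sum(k_unif(y, x, b) for y in data) / len(data)   # 1/12 ≈ 0.0833
F6 = sum(K_unif(y, x, b) for y in data) / len(data)   # ≈ 0.4583
print(f6, F6)
```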

Problem 2

A random sample of size 12 gives us the following data:

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Using a triangle kernel with bandwidth 2, calculate $\hat f(6)$ and $\hat F(6)$.

Solution

If you draw the diagram, you should get:

y        1     2     3     3     5     6     7     8     9     9     11    12
p(y)     1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12
k_y(6)   0     0     0     0     1/4   1/2   1/4   0     0     0     0     0
K_y(6)   1     1     1     1     7/8   1/2   1/8   0     0     0     0     0

$$\hat f(6)=\sum_y p(y)\,k_y(6)=\frac1{12}\cdot\frac14+\frac1{12}\cdot\frac12+\frac1{12}\cdot\frac14=\frac1{12}$$

$$\hat F(6)=\sum_y p(y)\,K_y(6)=\frac1{12}(1)+\frac1{12}(1)+\frac1{12}(1)+\frac1{12}(1)+\frac1{12}\cdot\frac78+\frac1{12}\cdot\frac12+\frac1{12}\cdot\frac18=\frac{5.5}{12}\approx0.4583$$

Problem 3

A random sample of size 12 gives us the following data:

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Using the gamma kernel with $\alpha=1$, calculate $\hat f(6)$ and $\hat F(6)$.

Solution

Gamma kernel with $\alpha=1$:

$$k_y(x)=\frac1y e^{-x/y},\qquad K_y(x)=\int_0^x k_y(t)\,dt=1-e^{-x/y}$$

$$\hat f(6)=\sum_y p(y)\,k_y(6)=\frac1{12}\left(\frac{e^{-6/1}}{1}+\frac{e^{-6/2}}{2}+\frac{e^{-6/3}}{3}+\frac{e^{-6/3}}{3}+\frac{e^{-6/5}}{5}+\frac{e^{-6/6}}{6}+\frac{e^{-6/7}}{7}+\frac{e^{-6/8}}{8}+\frac{e^{-6/9}}{9}+\frac{e^{-6/9}}{9}+\frac{e^{-6/11}}{11}+\frac{e^{-6/12}}{12}\right)\approx0.048$$

$$\hat F(6)=\sum_y p(y)\,K_y(6)=\frac1{12}\left[\left(1-e^{-6/1}\right)+\left(1-e^{-6/2}\right)+\left(1-e^{-6/3}\right)+\left(1-e^{-6/3}\right)+\left(1-e^{-6/5}\right)+\left(1-e^{-6/6}\right)+\left(1-e^{-6/7}\right)+\left(1-e^{-6/8}\right)+\left(1-e^{-6/9}\right)+\left(1-e^{-6/9}\right)+\left(1-e^{-6/11}\right)+\left(1-e^{-6/12}\right)\right]\approx0.658$$
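As a sanity check of Problem 3 (again my own sketch, not part of the original solution), the two sums can be evaluated directly:

```python
from math import exp

data = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]
x = 6.0
f6 = sum(exp(-x/y) / y for y in data) / len(data)   # ≈ 0.048
F6 = sum(1 - exp(-x/y) for y in data) / len(data)   # ≈ 0.658
print(round(f6, 4), round(F6, 4))
```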

Nov 2003 #4

You study five lives to estimate the time from the onset of a disease to death. The times to death are:

2, 3, 3, 3, 7

Using a triangular kernel with bandwidth 2, estimate the density function at 2.5.

Solution

The neighborhood is [0.5, 4.5]. If you draw a neighborhood diagram, you should get:

y          2      3      3      3      7
p(y)       1/5    1/5    1/5    1/5    1/5
k_y(2.5)   1.5/4  1.5/4  1.5/4  1.5/4  0

$$\hat f(2.5)=\sum_y p(y)\,k_y(2.5)=\frac15\cdot\frac{1.5}{4}+\frac15\cdot\frac{1.5}{4}+\frac15\cdot\frac{1.5}{4}+\frac15\cdot\frac{1.5}{4}=0.3$$

Nov 2004 #20

From a population having distribution function $F$, you are given the following sample:

2.0, 3.3, 3.3, 4.0, 4.0, 4.7, 4.7, 4.7

Calculate the kernel density estimate $\hat F(4)$, using the uniform kernel with bandwidth 1.4.

Solution

The neighborhood is [4 − 1.4, 4 + 1.4] = [2.6, 5.4] = [B, E].

[Diagram: number line with A = 2, B = 2.6, C = 3.3, D = 4, E = 4.7, F = 5.4; vertical lines AG, CI, DJ, EK at the data points.]

If you use scissors to cut off what's to the left of the line AG at $y=2$, the neighborhood rectangle BEKG completely survives the cut. So $K_{y=2}(4)=1$.

If you use scissors to cut off what's to the left of the line CI at $y=3.3$, the surviving area is CFLI. Area CFLI = 0.75. So $K_{y=3.3}(4)=0.75$.

If you use scissors to cut off what's to the left of the line DJ at $y=4$, the surviving area is DFLJ, which is 0.5. So $K_{y=4}(4)=0.5$.

If you use scissors to cut off what's to the left of the line EK at $y=4.7$, the surviving area is EFLK, which is 0.25. So $K_{y=4.7}(4)=0.25$.
y        2.0   3.3   3.3   4.0   4.0   4.7   4.7   4.7
p(y)     1/8   1/8   1/8   1/8   1/8   1/8   1/8   1/8
K_y(4)   1     0.75  0.75  0.5   0.5   0.25  0.25  0.25

$$\hat F(4)=\sum_y p(y)\,K_y(4)=\frac18(1)+\frac18(0.75)\times2+\frac18(0.5)\times2+\frac18(0.25)\times3=0.53125$$


Chapter 4  Bootstrap

Essence of bootstrapping

Loss Models doesn't explain the bootstrap much. As a result, many candidates just memorize a black-box formula without understanding the essence of the bootstrap.

Let me explain the bootstrap with an example. Suppose you want to find out the mean and variance of the GRE scores of a group of 5,000 students. One way to do so is to take out a lot of random samples. For example, you can sample 20 students' GRE scores and calculate the mean and variance of the GRE score. Here you have one sample of size 20. Of course, you want to take many samples. For example, you can take out 30 samples, each sample consisting of 20 students' GRE scores. For each of the 30 samples, you can calculate the mean and variance of the GRE score.

As you can see, taking 30 samples of size 20 takes a lot of time and money. As a research scientist, you are short of research grants. And your life is busy. Is there any way you can cut some corners?

You can cut corners this way. Instead of taking out 30 samples of size 20, you just take out one sample of size 20 and collect 20 students' GRE scores. These 20 scores are $X_1$, $X_2$, ..., $X_{20}$. You bring these 20 scores home. Your data collection is done.

Next, you reproduce 30 samples of size 20 each from your one sample of size 20. How? Just resample from your one sample of 20 scores. You randomly select 20 scores with replacement from the 20 scores you have. This is your 1st resample. Next, you randomly select 20 scores with replacement from the 20 scores you have. This is your 2nd resample. If you repeat this process 30 times, you'll get 30 resamples of size 20 each. If you repeat this process 100 times, you'll get 100 resamples of size 20 each. Now your original one sample gives birth to many resamples. How wonderful.

The rest is easy. If you have 30 resamples, you can calculate the mean and variance of the GRE scores for each resample. This should give you a good idea of the mean and variance of the GRE scores.

Does this sound like fraud? Not really. Your original sample of size 20, $X_1$, $X_2$, ..., $X_{20}$, reflects the population. As a result, resamples from this sample are pretty much what you would get if you took out many samples from the population. (By the way, the term bootstrap comes from the phrase "to pull oneself up by one's bootstraps.")

To use the bootstrap, you'll need to have a computer and some bootstrapping software to quickly create a great number (such as 10,000) of resamples and to calculate the statistics of the resamples. The bootstrap is a computer-intensive technique.

To summarize, the bootstrap reduces researchers' time and money spent on data collection. Researchers just need to collect one good sample and bring it home. Then they can use computers to create resamples and calculate statistics.
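Here is a minimal sketch of the resampling idea in code (my own illustration; the simulated scores are made-up stand-ins for real GRE data):

```python
import random
import statistics

random.seed(1)
sample = [random.gauss(600, 100) for _ in range(20)]    # stand-in GRE scores

resample_means = []
for _ in range(30):                                     # 30 bootstrap resamples
    resample = random.choices(sample, k=len(sample))    # draw WITH replacement
    resample_means.append(statistics.mean(resample))

print(statistics.mean(resample_means))    # clusters around the original sample mean
print(statistics.stdev(resample_means))   # bootstrap estimate of the standard error
```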

Recommended supplemental reading

For more information on the bootstrap, you can download the free PDF file at
http://bcs.whfreeman.com/pbs/cat_160/PBS18.pdf

May 2000 #17

You are given a random sample of two values from a distribution $F$: 1, 3.

You estimate $\theta(F)=\operatorname{Var}(X)$ using the estimator $g(X_1,X_2)=\dfrac12\sum_{i=1}^{2}\left(X_i-\bar X\right)^2$, where $\bar X=\dfrac{X_1+X_2}{2}$. Determine the bootstrap approximation to the mean square error.

Solution

Your original sample is (1, 3). The variance of your original sample is

$$\operatorname{Var}(X)=E\left(X^2\right)-E^2(X)=\frac12\left(1^2+3^2\right)-\left[\frac12(1+3)\right]^2=1$$

Under the bootstrap method, you resample from your original sample with replacement. Your resamples are (1,1), (1,3), (3,1), and (3,3), each having probability $\frac14$. For each resample, you calculate $g(X_1,X_2)$. Then the mean square error is $MSE=E\left[g(X_1,X_2)-\operatorname{Var}(X)\right]^2$.

Resample (X1, X2)   Xbar   g(X1, X2)
(1,1)               1      (1/2)[(1-1)² + (1-1)²] = 0
(1,3)               2      (1/2)[(1-2)² + (3-2)²] = 1
(3,1)               2      (1/2)[(3-2)² + (1-2)²] = 1
(3,3)               3      (1/2)[(3-3)² + (3-3)²] = 0

$$MSE=E\left[g(X_1,X_2)-\operatorname{Var}(X)\right]^2=\frac14(0-1)^2+\frac14(1-1)^2+\frac14(1-1)^2+\frac14(0-1)^2=\frac12$$
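A brute-force sketch (mine, not part of the exam solution) that reproduces the table and the answer:

```python
from itertools import product

sample = (1, 3)
theta = 1                                      # variance of the original sample

def g(xs):                                     # g = (1/2) * sum (Xi - Xbar)^2
    xbar = sum(xs) / len(xs)
    return sum((x - xbar)**2 for x in xs) / 2

mse = sum((g(r) - theta)**2 for r in product(sample, repeat=2)) / 4
print(mse)                                     # 0.5
```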

Nov 2006 #26

You are given a random sample of two values from a distribution $F$: 1, 3.

You estimate $\theta(F)=\operatorname{Var}(X)$ using the estimator $g(X_1,X_2)=\sum_{i=1}^{2}\left(X_i-\bar X\right)^2$, where $\bar X=\dfrac{X_1+X_2}{2}$. Determine the bootstrap approximation to the mean square error.

Solution

The only difference between this problem and the previous problem (May 2000 #17) is the definition of $g(X_1,X_2)$. In this problem, $g(X_1,X_2)=\sum_{i=1}^{2}\left(X_i-\bar X\right)^2$; in the previous problem, $g(X_1,X_2)=\frac12\sum_{i=1}^{2}\left(X_i-\bar X\right)^2$.

Your original sample is (1, 3). The variance of your original sample is

$$\operatorname{Var}(X)=E\left(X^2\right)-E^2(X)=\frac12\left(1^2+3^2\right)-\left[\frac12(1+3)\right]^2=1$$

Under the bootstrap method, you resample from your original sample with replacement. Your resamples are (1,1), (1,3), (3,1), and (3,3), each having probability $\frac14$. For each resample, you calculate $g(X_1,X_2)$. Then the mean square error is $MSE=E\left[g(X_1,X_2)-\operatorname{Var}(X)\right]^2$.

Resample (X1, X2)   Xbar   g(X1, X2)
(1,1)               1      (1-1)² + (1-1)² = 0
(1,3)               2      (1-2)² + (3-2)² = 2
(3,1)               2      (3-2)² + (1-2)² = 2
(3,3)               3      (3-3)² + (3-3)² = 0

$$MSE=E\left[g(X_1,X_2)-\operatorname{Var}(X)\right]^2=\frac14(0-1)^2+\frac14(2-1)^2+\frac14(2-1)^2+\frac14(0-1)^2=1$$

May 2005 #4

Three observed values of the random variable $X$ are: 1, 1, 4.

You estimate the 3rd central moment of $X$ using the estimator

$$g(X_1,X_2,X_3)=\frac13\sum\left(X_i-\bar X\right)^3$$

Determine the bootstrap estimate of the mean squared error of $g$.

Solution

First, you need to understand that the $n$-th central moment is $E\left\{\left[X-E(X)\right]^n\right\}$. For example, the 1st central moment is

$$E\left[X-E(X)\right]=E(X)-E\left[E(X)\right]=E(X)-E(X)=0$$

The 2nd central moment is $E\left\{\left[X-E(X)\right]^2\right\}=\operatorname{Var}(X)$. The 3rd central moment is $E\left\{\left[X-E(X)\right]^3\right\}$.

Your original sample is (1, 1, 4). The 3rd central moment of this sample is calculated as follows:

$$\bar X=\frac{1+1+4}{3}=2,\qquad E\left\{\left[X-E(X)\right]^3\right\}=\frac13(1-2)^3+\frac13(1-2)^3+\frac13(4-2)^3=2$$

The third central moment of this original sample is used to approximate the true 3rd central moment of the population. So the true parameter is $\theta=2$.

Next, you need to understand the bootstrap. Under the bootstrap, you resample from the original sample with replacement. Imagine you have 3 boxes to fill from left to right. The 1st box can be filled with any number of your original sample (1, 1, 4); the 2nd box can be filled with any number of your original sample (1, 1, 4); and the 3rd box can be filled with any number of your original sample (1, 1, 4). The number of resamples is $3^3=27$. This is a concept from Exam P.

For each resample $(X_1,X_2,X_3)$, you calculate $g(X_1,X_2,X_3)=\frac13\sum\left(X_i-\bar X\right)^3$. Your resamples are:

(1) Three 1s. The number of permutations is 8. To understand why, let's denote the original sample as (a, b, c) with a = 1, b = 1, and c = 4. Then the following 8 resamples will produce (1,1,1): aaa, aab, aba, baa, bba, bab, abb, bbb. For the resample (1,1,1),

$$\bar X=\frac{1+1+1}{3}=1,\qquad g=\frac13(1-1)^3+\frac13(1-1)^3+\frac13(1-1)^3=0,\qquad (g-\theta)^2=(0-2)^2=4$$

(2) Two 1s and one 4. The following 12 permutations will produce two 1s and one 4: aac, aca, caa, bbc, bcb, cbb, abc, acb, cab, bac, bca, cba.

$$\bar X=\frac{1+1+4}{3}=2,\qquad g=\frac13(1-2)^3+\frac13(1-2)^3+\frac13(4-2)^3=2,\qquad (g-\theta)^2=(2-2)^2=0$$

(3) Two 4s and one 1. The following 6 permutations will produce two 4s and one 1: cca, cac, acc, ccb, cbc, bcc.

$$\bar X=\frac{1+4+4}{3}=3,\qquad g=\frac13(1-3)^3+\frac13(4-3)^3+\frac13(4-3)^3=-2,\qquad (g-\theta)^2=(-2-2)^2=16$$

(4) Three 4s. The following 1 permutation will produce three 4s: ccc.

$$\bar X=\frac{4+4+4}{3}=4,\qquad g=\frac13(4-4)^3+\frac13(4-4)^3+\frac13(4-4)^3=0,\qquad (g-\theta)^2=(0-2)^2=4$$

Finally, the mean squared error is:

$$E\left[(g-\theta)^2\right]=\frac{8}{27}(4)+\frac{12}{27}(0)+\frac{6}{27}(16)+\frac{1}{27}(4)\approx4.9$$
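The 27-resample argument can be checked by brute force (my own sketch):

```python
from itertools import product

sample = (1, 1, 4)

def third_central(xs):
    xbar = sum(xs) / 3
    return sum((x - xbar)**3 for x in xs) / 3

theta = third_central(sample)                     # = 2
errors = [(third_central(r) - theta)**2
          for r in product(sample, repeat=3)]     # all 27 resamples
print(sum(errors) / 27)                           # 132/27 ≈ 4.89
```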

Nov 2004 #16

A sample of claim amounts is {300, 600, 1500}. By applying the deductible to this sample, the loss elimination ratio for a deductible of 100 per claim is estimated to be 0.125.

You are given the following simulations from the sample:

Simulation   Claim Amounts
1            600    600    1500
2            1500   300    1500
3            1500   300    600
4            600    600    300
5            600    300    1500
6            600    600    1500
7            1500   1500   1500
8            1500   300    1500
9            300    600    300
10           600    600    600

Determine the bootstrap approximation to the mean square error of the estimate.

Solution

Your original sample is {300, 600, 1500}. If you resample this sample with replacement, you'll get $3^3=27$ resamples. However, calculating the mean square error based on 27 resamples is too much work under exam conditions. That's why the SOA gives you only 10 resamples.

The loss elimination ratio is

$$LER_X(d)=\frac{E\left[\min(X,d)\right]}{E(X)}$$

The loss elimination ratio for the original sample {300, 600, 1500} with a 100 deductible is 0.125. The SOA already gives this ratio. If we needed to calculate it, this is how:

For the loss amount 300, the insurer pays only 200, saving 100.
For the loss amount 600, the insurer pays only 500, saving 100.
For the loss amount 1500, the insurer pays only 1400, saving 100.

The expected saving due to the 100 deductible is: $\frac13(100+100+100)=100$

The expected loss amount is: $\frac13(300+600+1500)=100+200+500=800$

So the loss elimination ratio is: $100/800=0.125$

Next, for each of the 10 resamples, you calculate the loss elimination ratio as we did for the original sample. To speed up the calculation, let's set $100 as one unit of money. Then the deductible is one.

Resample   X1   X2   X3   LER    (LER − 0.125)²
1          6    6    15   1/9    0.000193
2          15   3    15   1/11   0.001162
3          15   3    6    1/8    0
4          6    6    3    1/5    0.005625
5          6    3    15   1/8    0
6          6    6    15   1/9    0.000193
7          15   15   15   1/15   0.003403
8          15   3    15   1/11   0.001162
9          3    6    3    1/4    0.015625
10         6    6    6    1/6    0.001736
Total                            0.0291

For example, for the 1st resample {6, 6, 15}, the claim payment after the deductible of 1 is {5, 5, 14}. So the LER is (1+1+1)/(6+6+15) = 3/27 = 1/9.

The MSE is

$$MSE=\frac1{10}\sum_{i=1}^{10}\left(LER_i-0.125\right)^2=\frac{0.0291}{10}=0.0029$$
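A short sketch (mine) that reproduces the LER table and the MSE, working in units of $100 so the deductible is 1:

```python
resamples = [(6, 6, 15), (15, 3, 15), (15, 3, 6), (6, 6, 3), (6, 3, 15),
             (6, 6, 15), (15, 15, 15), (15, 3, 15), (3, 6, 3), (6, 6, 6)]

def ler(xs, d=1):                              # LER = E[min(X, d)] / E[X]
    return sum(min(x, d) for x in xs) / sum(xs)

mse = sum((ler(r) - 0.125)**2 for r in resamples) / len(resamples)
print(round(mse, 4))                           # ≈ 0.0029
```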


Chapter 5  Bühlmann credibility model

Trouble with black-box formulas

The Bühlmann credibility premium formula is tested over and over in Course 4 and Exam C. However, many candidates don't have a good understanding of the inner workings of the Bühlmann credibility premium model. They just memorize a series of black-box formulas:

$$Z=\frac{n}{n+k},\qquad k=\frac{E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]}{\operatorname{Var}\left[E\left(X\mid\Theta\right)\right]},\qquad P=(1-Z)\mu+Z\bar X$$

Rote memorization of a formula without fully grasping the concepts is tedious, difficult, and prone to errors. Additionally, a memorized formula will not yield the understanding needed to grapple with difficult problems. In this chapter, we're going to dig deep into Bühlmann's credibility premium formula and gain a crystal clear understanding of the concepts.

Rating challenges facing insurers

Let's start with a simple example to illustrate one major challenge an insurance company faces when determining premium rates. Imagine you are the founder and the actuary of an auto insurance company. Your company's specialty is to provide auto insurance for taxi drivers.

Before you open your business, there are half a dozen insurance companies in your area that offer auto insurance to taxi drivers. The world has been going on fine for many years without your start-up. It can continue going on without your start-up. So it's tough for you to get customers. Finally, you take out a big portion of your savings account and buy TV advertising, which brings in your first three customers: Adam, Bob, and Colleen. Since your corporate office is your garage and you have only one employee (you), you decide that three customers is good enough for you to start your business.

When you open your business at $t=0$, you sell three auto insurance policies to Adam, Bob, and Colleen. The contract of your insurance policy says that the premium rate is guaranteed for only two years. Once the two-year guarantee period is over, you have the right to set the renewal premium, which can be higher than the guaranteed initial premium.

When you set your premium rate at $t=0$, you notice that Adam, Bob, and Colleen are similar in many ways. They are all taxicab drivers. They work at the same taxi company in the same city. They are all 35 years old. They all graduated from the same high school.

They are all careful drivers. Therefore, at $t=0$ you treat Adam, Bob, and Colleen as identical risks and charge the same premium for the first two years.

To actually set the initial premium for the first two years, you decide to buy a rate book from a consulting firm. This consulting firm is well-known in the industry. Each year it publishes a rate manual that lists the average claim cost of a taxi driver by city, by mileage, and by several other criteria. Based on this rate manual, you estimate that Adam, Bob, and Colleen may each incur $4 of claim cost per year. So at $t=0$, you charge Adam, Bob, and Colleen $4 each. This premium rate is guaranteed for two years.

During the 2-year guaranteed period, Adam, Bob, and Colleen have incurred the following claims:

              Year 1 Claim   Year 2 Claim   Total Claim   Average claim per insured per year
Adam          $0             $0             $0            $0 / 2 = $0
Bob           $1             $7             $8            $8 / 2 = $4
Colleen       $4             $9             $13           $13 / 2 = $6.5
Grand Total                                 $21

Average claim per person per year (for the 3-person group): $21 / (3 × 2) = $3.5

Now the two-year guarantee period is over. You need to determine the renewal premium rates for Adam, Bob, and Colleen for the third year. Once you have determined the premium rates, you will need to file these rates with the insurance department of the state where you do business (called the domicile state).

Question: How do you determine the renewal premium rate for the third year for Adam, Bob, and Colleen respectively?

One simple approach is to charge Adam, Bob, and Colleen a uniform rate (i.e. the group premium rate). After all, Adam, Bob, and Colleen are similar risks; they form a homogeneous group. As such, they should pay a uniform group premium rate, even though their actual claim patterns for the past two years are different. You can continue charging them the old rate of $4 per insured per year. However, since the average claim cost for the past two years is $3.50 per insured per year, you can charge them $3.50 per person for year three.

Under the uniform group rate of $3.50, Bob and Colleen will probably underpay their premiums; their actual average annual claims for the past two years exceed this group premium rate. Adam, on the other hand, may overpay his premiums; his average annual claim for the past two years is below the group premium rate. When you charge each policyholder the uniform group premium rate, low-risk policyholders will overpay their premiums and high-risk policyholders will underpay their premiums. Your business as a whole, however, will collect just enough premiums to pay the claim costs.

However, in the real world, most likely you won't be able to charge Adam, Bob, and Colleen a uniform rate of $3.50. Any of your customers can easily shop around, compare premium rates, and buy an insurance policy elsewhere with a better rate. For example, Adam can easily find another insurer who sells a similar insurance policy for less than your $3.50 group rate. Additionally, the commissioner of your state insurance department is unlikely to approve your uniform rate. The department will want to see that your low-risk customers pay lower premiums.

Key points to remember:

Under the classical theory of insurance, people with similar risks form a homogeneous group to share the risk. Members of a homogeneous group are photocopies of each other. The claim random variable for each member is independent identically distributed with a common density function $f_X(x)$. The uniform pure premium rate is $E(X)$. Each member of the homogeneous group should pay $E(X)$.

In reality, however, there's no such thing as a homogeneous group. No two policyholders, however similar, have exactly the same risks. If you as an insurer charge everybody a uniform group rate, then low-risk policyholders will leave and buy insurance elsewhere. To stay in business, you have no choice but to charge individualized premium rates that are proportional to policyholders' risks.

Now let's come back to our simple case. We know that uniform rating won't work in the real world. We'll want to set up a mathematical model to calculate the fair renewal premium rate for Adam, Bob, and Colleen respectively. Our model should reflect the following observations and intuition:

- Adam, Bob, and Colleen are largely similar risks. We'll need to treat them as a rating group. This way, our renewal rates for Adam, Bob, and Colleen are somewhat related.

- On the other hand, we need to differentiate between Adam, Bob, and Colleen. We might want to treat Adam, Bob, and Colleen as potentially different sub-risks within a largely similar rate group. This way, our model will produce different renewal rates. We hope the renewal rates calculated from our model will agree with our intuition that Adam deserves the lowest renewal rate, Bob a higher rate, and Colleen the highest rate.

- To reflect the idea that Adam, Bob, and Colleen are different sub-risks within a largely similar rate group, we may want to divide the largely similar rate group into four sub-risks (or more sub-risks if you like): super preferred, preferred, standard, and sub-standard. So the rate group actually consists of four sub-risks. Adam or Bob or Colleen can be any one of the four sub-risks.

Here comes a critical point: we don't know who belongs to which sub-risk. We don't know whether Adam is a super-preferred sub-risk, a preferred sub-risk, a standard sub-risk, or a sub-standard sub-risk. Nor do we know to which sub-risk Bob or Colleen belongs. This is so even though we have Adam's two-year claim data. Judged from his 2-year claim history, Adam seems to be a super preferred or at least a preferred sub-risk. However, a bad driver can have no accidents for a while due to good luck; a good driver can have several big accidents in a row due to bad luck. So we really can't say for sure that Adam is indeed a better risk. All we know is that Adam's sub-risk class is a random variable consisting of 4 possible values: super preferred, preferred, standard, and substandard.

To visualize that Adam's sub-risk class is a random variable, think about rolling a 4-sided die. One side of the die is marked with the letters "SP" (super preferred); another side is marked with "PF" (preferred); the third side is marked with "STD" (standard); and the fourth side is marked with "SUB" (substandard). To determine which sub-class Adam belongs to, we'll roll the die. If the result is "SP", then we'll assign Adam to the super preferred class. If the result is "PF", we'll assign him to the preferred class. And so on and so forth. Similarly, we can roll the die and randomly assign Bob or Colleen to one of the four sub-classes: SP, PF, STD, and SUB.
Now we are ready to come up with a model to calculate the renewal premium rate:

Let random variable $X_{j\,t}$ represent the claim cost incurred in year $t$ by the $j$-th insured, where $t=1,2,\dots,n,\ n+1$ and $j=1,2,\dots,m$. Here in our example, $n=2$ (we have two years of claim data) and $m=3$ (with $j=1,2,3$ corresponding to Adam, Bob, and Colleen).

For any $j$, the claims $X_{j\,1}, X_{j\,2},\dots,X_{j\,n}, X_{j\,n+1}$ are identically distributed with a common density function $f_{X\mid\Theta}(x,\theta)$, a common mean $\mu=E\left(X_{j\,t}\right)$, and a common variance $\sigma^2=\operatorname{Var}\left(X_{j\,t}\right)$. What we are saying here is that all policyholders $j=1,2,\dots,m$ have an identical mean claim $\mu$ and an identical claim variance $\sigma^2$.

$\Theta$ is a random variable (or a vector of random variables) representing the presence of multiple sub-risks; $\theta$ is a realization of $\Theta$. $X_{j\,1}, X_{j\,2},\dots,X_{j\,n}, X_{j\,n+1}$, which represent the claim costs incurred by the same policyholder, belong to the same sub-risk class $\theta$. However, $\theta$ is unknown to us. All we know is that $\theta$ is a random realization of $\Theta$. Here in our example, $\Theta=\{SP,\ PF,\ STD,\ SUB\}$. When we say that $\theta$ is a realization of $\Theta$, we mean that with probability $p_1$, $\theta=SP$; with probability $p_2$, $\theta=PF$; with probability $p_3$, $\theta=STD$; and with probability $p_4=1-(p_1+p_2+p_3)$, $\theta=SUB$.

Because $X_{j\,1}, X_{j\,2},\dots,X_{j\,n}, X_{j\,n+1}$ are claims generated from the same (unknown) sub-risk class, we assume that given $\Theta=\theta$, the claims $X_{j\,1}\mid\theta,\ X_{j\,2}\mid\theta,\dots,X_{j\,n}\mid\theta,\ X_{j\,n+1}\mid\theta$ are independent identically distributed with a common conditional mean $E\left(X_{j\,t}\mid\theta\right)=\mu(\theta)$ and a common conditional variance $\operatorname{Var}\left(X_{j\,t}\mid\theta\right)=\sigma^2(\theta)$.

We have observed $X_{j\,1}, X_{j\,2},\dots,X_{j\,n}$. Our goal is to estimate $X_{j\,n+1}$, the claim cost in year $n+1$ by the $j$-th insured, using his prior $n$-year average claim cost $\bar X_j=\frac1n\sum_{t=1}^{n}X_{j\,t}$. The estimated value of $X_{j\,n+1}$ is the pure renewal premium for year $n+1$. Bühlmann's approach is to use $a+Z\bar X_j$ to approximate $X_{j\,n+1}$ subject to the condition that

$$E\left[\left(a+Z\bar X_j-X_{j\,n+1}\right)^2\right]\ \text{is minimized.}$$

The final result:

$$a+Z\bar X_j=(1-Z)\mu+Z\bar X_j,\qquad Z=\frac{n}{n+k},\qquad k=\frac{E\left[\operatorname{Var}\left(X_{j\,t}\mid\Theta\right)\right]}{\operatorname{Var}\left[E\left(X_{j\,t}\mid\Theta\right)\right]}$$

$$\mu=E\left(X_{j\,t}\right)=E\left[E\left(X_{j\,t}\mid\Theta\right)\right]$$

Next, we'll derive the above formulas. However, before we derive the Bühlmann premium formulas, let's go over some preliminary concepts.

3 preliminary concepts for deriving the Bühlmann premium formula

Preliminary concept #1    Double expectation

$$E(X)=E_\Theta\left[E\left(X\mid\Theta\right)\right]$$

If $X$ is discrete, $E(X)=E_\Theta\left[E\left(X\mid\Theta\right)\right]=\sum_{\text{all }\theta}p(\theta)\,E\left(X\mid\theta\right)$.

If $X$ is continuous, $E(X)=E_\Theta\left[E\left(X\mid\Theta\right)\right]=\int E\left(X\mid\theta\right)f(\theta)\,d\theta$.

I'll explain the double expectation theorem assuming $X$ is discrete. However, the same logic applies when $X$ is continuous.

Let's use a simple example to understand the meaning behind the above formula. A class has 6 boys and 4 girls. These 10 students take a final. The average score of the 6 boys is 80; the average score of the 4 girls is 85. What's the average score of the whole class?

This is an elementary level math problem. The average score of the whole class is:

$$\text{Average score}=\frac{\text{Total score}}{\text{\# of students}}=\frac{6(80)+4(85)}{10}=\frac{820}{10}=82$$

Now let's rearrange the above equation:

$$\text{Average score}=\frac{6}{10}(80)+\frac{4}{10}(85)$$

If we express the above calculation using the double expectation theorem, then we have:

$$E(\text{Score})=E_{\text{Gender}}\left[E\left(\text{Score}\mid\text{Gender}\right)\right]=\sum P(\text{Gender})\,E\left(\text{Score}\mid\text{Gender}\right)$$
$$=P(\text{boy})\,E\left(\text{score}\mid\text{boy}\right)+P(\text{girl})\,E\left(\text{score}\mid\text{girl}\right)=\frac{6}{10}(80)+\frac{4}{10}(85)=82$$

So instead of directly calculating the average score for the whole class, we first break down the whole class into two groups based on gender. We then calculate the average score of these two groups: boys and girls. Next, we calculate the weighted average of these two group averages. This weighted average is the average of the whole class. If you understand this formula, you have understood the essence of the double expectation theorem.

The Double Expectation Theorem in plain English:

Instead of directly calculating the mean of the whole population, you first break down the population into several groups based on one standard (such as gender). You calculate the mean of each group. Next, you calculate the weighted average of all the group means. This is the mean of the whole population.

Problem. A group of 20 graduate students (12 with a non-math major and 8 with a math major) have a total GRE score of 12,940. The GRE score distribution by major is as follows:

Total GRE scores of 12 non-math majors   7,740
Total GRE scores of 8 math majors        5,200
Total GRE score                          12,940

Find the average GRE score twice. The first time, do not use the double expectation theorem. The second time, use the double expectation theorem. Show that you get the same result.

Solution

(1) Find the mean without using the double expectation theorem. The average GRE score for the 20 graduate students is:

$$\text{Average score}=\frac{\text{Total score}}{\text{\# of students}}=\frac{12{,}940}{20}=647$$

(2) Find the mean using the double expectation theorem.

$$E(\text{GRE})=E_{\text{Major}}\left[E\left(\text{GRE}\mid\text{Major}\right)\right]=\sum P(\text{Major})\,E\left(\text{GRE}\mid\text{Major}\right)$$
$$=P(\text{non-math})\,E\left(\text{GRE}\mid\text{non-math}\right)+P(\text{math})\,E\left(\text{GRE}\mid\text{math}\right)=\frac{12}{20}\cdot\frac{7{,}740}{12}+\frac{8}{20}\cdot\frac{5{,}200}{8}=647$$

You can see the two methods produce an identical result.


Preliminary concept #2    Total variance formula

$$\operatorname{Var}(X)=E_Y\left[\operatorname{Var}\left(X\mid Y\right)\right]+\operatorname{Var}_Y\left[E\left(X\mid Y\right)\right]$$

Proof. $\operatorname{Var}(X)=E\left(X^2\right)-E^2(X)$.

Put the double expectation theorem to use:

$$E(X)=E_Y\left[E\left(X\mid Y\right)\right],\qquad E\left(X^2\right)=E_Y\left[E\left(X^2\mid Y\right)\right]$$

However, $E\left(X^2\mid Y\right)=\operatorname{Var}\left(X\mid Y\right)+E^2\left(X\mid Y\right)$. So

$$\operatorname{Var}(X)=E\left(X^2\right)-E^2(X)=E_Y\left[\operatorname{Var}\left(X\mid Y\right)+E^2\left(X\mid Y\right)\right]-\left\{E_Y\left[E\left(X\mid Y\right)\right]\right\}^2$$
$$=E_Y\left[\operatorname{Var}\left(X\mid Y\right)\right]+\left(E_Y\left[E^2\left(X\mid Y\right)\right]-\left\{E_Y\left[E\left(X\mid Y\right)\right]\right\}^2\right)=E_Y\left[\operatorname{Var}\left(X\mid Y\right)\right]+\operatorname{Var}_Y\left[E\left(X\mid Y\right)\right]$$

If $X$ is the loss amount of a policyholder and $Y$ is the risk class of the policyholder, then $\operatorname{Var}(X)=E_Y\left[\operatorname{Var}\left(X\mid Y\right)\right]+\operatorname{Var}_Y\left[E\left(X\mid Y\right)\right]$ means that the total variance of the loss consists of two components:

- $E_Y\left[\operatorname{Var}\left(X\mid Y\right)\right]$, the average variance by risk class
- $\operatorname{Var}_Y\left[E\left(X\mid Y\right)\right]$, the variance of the average loss by risk class

$E_Y\left[\operatorname{Var}\left(X\mid Y\right)\right]$ is called the expected value of the process variance. $\operatorname{Var}_Y\left[E\left(X\mid Y\right)\right]$ is called the variance of the hypothetical means.

$$\underbrace{\operatorname{Var}(X)}_{\text{total variance}}=\underbrace{E_Y\left[\operatorname{Var}\left(X\mid Y\right)\right]}_{\text{expected process variance}}+\underbrace{\operatorname{Var}_Y\left[E\left(X\mid Y\right)\right]}_{\text{variance of hypothetical means}}$$

Next, let's look at a comprehensive example using double expectation and total variance.

Example. The number of claims, $N$, incurred by a policyholder has the following distribution:

$$P(n)=\frac{3!}{n!\,(3-n)!}\,p^{n}(1-p)^{3-n}$$

$P$ is uniformly distributed over [0, 1]. Find $E(N)$ and $\operatorname{Var}(N)$.

Solution

If $p$ were constant, $N$ would have a binomial distribution with mean and variance:

$$E(N)=3p,\qquad \operatorname{Var}(N)=3p(1-p)$$

However, $p$ is also a random variable. So we cannot directly use the above formula.

To find $E(N)$, we divide $N$ into different groups by $p$, just as we divided the class into boys and girls. The only difference is that this time we have an infinite number of groups ($p$ is a continuous random variable). Let's consider a small group $[p,\ p+dp]$. Each value of $p$ is a separate group. For each group, we will calculate its mean. Then we will find the weighted average of the means of all the groups, with the weight being the probability of each group's $p$ value. The result should be $E(N)$.

$$E(N)=E_P\left[E\left(N\mid p\right)\right]=\int_{p=0}^{1}E\left(N\mid p\right)f_P(p)\,dp=\int_{p=0}^{1}3p\,dp=\frac32 p^2\Big|_0^1=\frac32$$

Please note that $p$ is uniform over [0, 1]. Consequently, $f_P(p)=1$.

Alternatively, $E(N)=E_P\left[E\left(N\mid p\right)\right]=E_P\left[3p\right]=3E(P)=3\cdot\frac12=\frac32$.

Next, we'll calculate $\operatorname{Var}(N)$. One method is to calculate $\operatorname{Var}(N)$ from scratch using the standard formula $\operatorname{Var}(N)=E\left(N^2\right)-E^2(N)$. We'll use the double expectation theorem to calculate $E\left(N^2\right)$ and $E(N)$.

$$E\left(N^2\right)=E_P\left[E\left(N^2\mid p\right)\right]=\int_0^1 E\left(N^2\mid p\right)f(p)\,dp$$

$$E\left(N^2\mid p\right)=E^2\left(N\mid p\right)+\operatorname{Var}\left(N\mid p\right)=(3p)^2+3p(1-p)=6p^2+3p$$

$$E\left(N^2\right)=\int_0^1\left(6p^2+3p\right)dp=\left(2p^3+\frac32 p^2\right)\Big|_0^1=\frac72$$

$$\operatorname{Var}(N)=E\left(N^2\right)-E^2(N)=\frac72-\left(\frac32\right)^2=\frac54$$

Alternatively, you can use the total variance formula to calculate the variance:

$$\operatorname{Var}(N)=E_P\left[\operatorname{Var}\left(N\mid p\right)\right]+\operatorname{Var}_P\left[E\left(N\mid p\right)\right]$$

Because $N\mid p$ is binomial with parameters 3 and $p$, we have:

$$E\left(N\mid p\right)=3p,\qquad \operatorname{Var}\left(N\mid p\right)=3p(1-p)$$

$$E_P\left[\operatorname{Var}\left(N\mid p\right)\right]=E_P\left[3p(1-p)\right]=E_P\left(3p-3p^2\right)=3E_P(p)-3E_P\left(p^2\right)$$

$$\operatorname{Var}_P\left[E\left(N\mid p\right)\right]=\operatorname{Var}_P(3p)=9\operatorname{Var}(p)$$

Applying the general formula that if $X$ is uniform over $[a,b]$, then $E(X)=\dfrac{a+b}{2}$ and $\operatorname{Var}(X)=\dfrac{(b-a)^2}{12}$, we have:

$$E(P)=\frac{0+1}{2}=\frac12,\qquad \operatorname{Var}(P)=\frac{(1-0)^2}{12}=\frac1{12},\qquad E\left(P^2\right)=E^2(P)+\operatorname{Var}(P)=\frac14+\frac1{12}=\frac4{12}$$

$$\operatorname{Var}(N)=3E_P(p)-3E_P\left(p^2\right)+9\operatorname{Var}(p)=3\cdot\frac12-3\cdot\frac4{12}+9\cdot\frac1{12}=\frac54$$
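A quick Monte Carlo sanity check (my own sketch) of $E(N)=\frac32$ and $\operatorname{Var}(N)=\frac54$:

```python
import random
import statistics

random.seed(0)
draws = []
for _ in range(100_000):
    p = random.random()                                       # P ~ Uniform(0, 1)
    draws.append(sum(random.random() < p for _ in range(3)))  # N | p ~ Binomial(3, p)

print(statistics.mean(draws), statistics.variance(draws))     # ≈ 1.5 and ≈ 1.25
```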

Preliminary concept #3    Linear least squares regression

In a regression analysis, you try to fit a line (or a function) through a set of points. With least squares regression, you get a better fit by minimizing the squared distance of each point to the fitted line.

Let's say you want to find out how a person's income level affects how much life insurance he buys. Let $X$ represent income. Let $Y$ represent the amount of life insurance this person buys. You have collected some data pairs of $(X,Y)$ from a group of consumers. You suspect there's a linear relationship between $X$ and $Y$. You want to predict $Y$ using the function $a+bX$, where $a$ and $b$ are constants. With least squares regression, you want to minimize the following:

$$Q=E\left[\left(a+bX-Y\right)^2\right]$$

Next, we'll derive $a$ and $b$.

$$\frac{\partial Q}{\partial a}=\frac{\partial}{\partial a}E\left[\left(a+bX-Y\right)^2\right]=E\left[\frac{\partial}{\partial a}\left(a+bX-Y\right)^2\right]=2E\left(a+bX-Y\right)=2\left[a+bE(X)-E(Y)\right]$$

Setting $\dfrac{\partial Q}{\partial a}=0$:  $a+bE(X)-E(Y)=0$  (Equation I)

$$\frac{\partial Q}{\partial b}=\frac{\partial}{\partial b}E\left[\left(a+bX-Y\right)^2\right]=E\left[\frac{\partial}{\partial b}\left(a+bX-Y\right)^2\right]=2E\left[\left(a+bX-Y\right)X\right]=2\left[aE(X)+bE\left(X^2\right)-E(XY)\right]$$

Setting $\dfrac{\partial Q}{\partial b}=0$:  $aE(X)+bE\left(X^2\right)-E(XY)=0$  (Equation II)

(Equation II) − (Equation I) × $E(X)$:

$$b\left[E\left(X^2\right)-E^2(X)\right]=E(XY)-E(X)E(Y)$$

However, $E\left(X^2\right)-E^2(X)=\operatorname{Var}(X)$ and $E(XY)-E(X)E(Y)=\operatorname{Cov}(X,Y)$. So

$$b=\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)},\qquad a=E(Y)-bE(X)$$

Derivation of Bühlmann's Credibility Formula

Now I'm ready to give you a quick proof of the Bühlmann credibility formula. To simplify notation, I'm going to fix on one particular insured (such as Adam) and change the symbol $X_{j\,t}$ to $X_t$. Remember, our goal is to estimate $X_{n+1}$, the individualized premium rate for year $n+1$, using $a+Z\bar X$. $Z$ is the credibility factor assigned to the mean of past claims $\bar X=\frac1n\left(X_1+X_2+\dots+X_n\right)$. We'll want to find $a$ and $Z$ that minimize the following:

$$E\left[\left(a+Z\bar X-X_{n+1}\right)^2\right]$$

Please note that $X_1$, $X_2$, ..., $X_n$, and $X_{n+1}$ are claims incurred by the same policyholder (whose risk class is unknown to us) during years 1, 2, ..., $n$, and $n+1$.

Applying the formula developed in preliminary concept #3, we have:

$$Z=\frac{\operatorname{Cov}\left(\bar X,\ X_{n+1}\right)}{\operatorname{Var}\left(\bar X\right)}$$

$$\operatorname{Cov}\left(\bar X,\ X_{n+1}\right)=\operatorname{Cov}\left[\frac1n\left(X_1+X_2+\dots+X_n\right),\ X_{n+1}\right]=\frac1n\left[\operatorname{Cov}\left(X_1,X_{n+1}\right)+\operatorname{Cov}\left(X_2,X_{n+1}\right)+\dots+\operatorname{Cov}\left(X_n,X_{n+1}\right)\right]$$

One common mistake is to assume that $X_1$, $X_2$, ..., $X_n$, $X_{n+1}$ are independent identically distributed. If indeed they were independent identically distributed, we would have

$$\operatorname{Cov}\left(X_1,X_{n+1}\right)=\operatorname{Cov}\left(X_2,X_{n+1}\right)=\dots=\operatorname{Cov}\left(X_n,X_{n+1}\right)=0\quad\Rightarrow\quad Z=\frac{\operatorname{Cov}\left(\bar X,\ X_{n+1}\right)}{\operatorname{Var}\left(\bar X\right)}=0$$

The result $Z=0$ simply doesn't make sense. What went wrong is the assumption that $X_1$, $X_2$, ..., $X_n$, $X_{n+1}$ are independent identically distributed. The correct statement is that $X_1$, $X_2$, ..., $X_n$, and $X_{n+1}$ are identically distributed with a common density function $f(x,\theta)$, where $\theta$ is unknown to us.

Or stated differently, $X_1$, $X_2$, ..., $X_n$, and $X_{n+1}$ are independent identically distributed given the risk class $\Theta=\theta$. In other words, if we fix the sub-class variable at $\Theta=\theta$, then all the claims incurred by the policyholder who belongs to sub-class $\theta$ are independent identically distributed. Mathematically, this means that $X_1\mid\theta$, $X_2\mid\theta$, ..., $X_n\mid\theta$, and $X_{n+1}\mid\theta$ are independent identically distributed.

Here is an intuitive way to see why $X_i$ and $X_j$ have non-zero covariance. $X_i$ and $X_j$ represent the claim amounts incurred at times $i$ and $j$ by the policyholder whose sub-class $\theta$ is unknown to us. So $X_i$ and $X_j$ are controlled by the same risk-class factor $\theta$. If $\theta$ is a low risk, then $X_i$ and $X_j$ both tend to be small. On the other hand, if $\theta$ is a high risk, then $X_i$ and $X_j$ both tend to be big. So $X_i$ and $X_j$ are correlated and have a non-zero covariance.

Next, let's derive the formula:

$$\operatorname{Cov}\left(X_i,X_j\right)=E\left(X_iX_j\right)-E\left(X_i\right)E\left(X_j\right)=\operatorname{Var}\left[\mu(\Theta)\right],\quad\text{where } i\ne j$$

Using the double expectation theorem, we have $E\left(X_iX_j\right)=E_\Theta\left[E\left(X_iX_j\mid\Theta\right)\right]$. Because $X_i\mid\Theta$ and $X_j\mid\Theta$ are independent identically distributed with a common conditional mean $\mu(\Theta)$, we have:

$$E\left(X_iX_j\mid\Theta\right)=E\left(X_i\mid\Theta\right)E\left(X_j\mid\Theta\right)=\mu^2(\Theta),\qquad E\left(X_i\right)=E_\Theta\left[E\left(X_i\mid\Theta\right)\right]=E\left[\mu(\Theta)\right]$$

$$\operatorname{Cov}\left(X_i,X_j\right)=E\left[\mu^2(\Theta)\right]-\left\{E\left[\mu(\Theta)\right]\right\}^2=\operatorname{Var}\left[\mu(\Theta)\right]$$

$$\operatorname{Cov}\left(\bar X,\ X_{n+1}\right)=\frac1n\left[\operatorname{Cov}\left(X_1,X_{n+1}\right)+\dots+\operatorname{Cov}\left(X_n,X_{n+1}\right)\right]=\frac1n\cdot n\operatorname{Var}\left[\mu(\Theta)\right]=\operatorname{Var}\left[\mu(\Theta)\right]$$

Next, we'll calculate $\operatorname{Var}\left(\bar X\right)$:

$$\operatorname{Var}\left(\bar X\right)=\operatorname{Var}\left[\frac1n\left(X_1+X_2+\dots+X_n\right)\right]=\frac1{n^2}\operatorname{Var}\left(X_1+X_2+\dots+X_n\right)$$

Once again, we have to be careful here. One temptation is to write:

$$\operatorname{Var}\left(X_1+X_2+\dots+X_n\right)=\operatorname{Var}\left(X_1\right)+\operatorname{Var}\left(X_2\right)+\dots+\operatorname{Var}\left(X_n\right)\qquad\text{Wrong!}$$

This is wrong because $X_1$, $X_2$, ..., $X_n$ are not independent. Instead, $X_1\mid\theta$, $X_2\mid\theta$, ..., $X_n\mid\theta$ are independent. So we have to include the covariances among $X_1$, $X_2$, ..., $X_n$. The correct expression is:

$$\operatorname{Var}\left(X_1+X_2+\dots+X_n\right)=\operatorname{Var}\left(X_1\right)+\dots+\operatorname{Var}\left(X_n\right)+2\operatorname{Cov}\left(X_1,X_2\right)+2\operatorname{Cov}\left(X_1,X_3\right)+\dots+2\operatorname{Cov}\left(X_{n-1},X_n\right)$$

So we have $n$ variance terms. Though $X_1$, $X_2$, ..., $X_n$ are not independent, they have a common mean $\mu=E(X)$ and a common variance $\operatorname{Var}(X)$:

$$\operatorname{Var}\left(X_1\right)+\operatorname{Var}\left(X_2\right)+\dots+\operatorname{Var}\left(X_n\right)=n\operatorname{Var}(X)$$

Next, let's look at the covariance terms $2\operatorname{Cov}\left(X_1,X_2\right)+2\operatorname{Cov}\left(X_1,X_3\right)+\dots+2\operatorname{Cov}\left(X_{n-1},X_n\right)$. Out of $X_1$, $X_2$, ..., $X_n$, if you take out any two items $X_i$ and $X_j$ where $i\ne j$, you'll get a covariance $\operatorname{Cov}\left(X_i,X_j\right)=\operatorname{Var}\left[\mu(\Theta)\right]$. Since there are $C_n^2=\frac{n(n-1)}{2}$ ways of taking out two items $X_i$ and $X_j$ where $i\ne j$, the sum of the covariance terms becomes:

$$2\operatorname{Cov}\left(X_1,X_2\right)+\dots+2\operatorname{Cov}\left(X_{n-1},X_n\right)=2\operatorname{Var}\left[\mu(\Theta)\right]C_n^2=2\operatorname{Var}\left[\mu(\Theta)\right]\cdot\frac12 n(n-1)=n(n-1)\operatorname{Var}\left[\mu(\Theta)\right]$$

$$\operatorname{Var}\left(\bar X\right)=\frac1{n^2}\left\{n\operatorname{Var}(X)+n(n-1)\operatorname{Var}\left[\mu(\Theta)\right]\right\}=\frac1n\left\{\operatorname{Var}(X)+(n-1)\operatorname{Var}\left[\mu(\Theta)\right]\right\}$$

Using the total variance formula, we have:

$$\operatorname{Var}(X)=E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]+\operatorname{Var}\left[E\left(X\mid\Theta\right)\right]\quad\Rightarrow\quad\operatorname{Var}(X)-\operatorname{Var}\left[\mu(\Theta)\right]=E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]$$

$$\operatorname{Var}\left(\bar X\right)=\operatorname{Var}\left[\mu(\Theta)\right]+\frac1n E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]$$

Finally, we have:

$$Z=\frac{\operatorname{Cov}\left(\bar X,\ X_{n+1}\right)}{\operatorname{Var}\left(\bar X\right)}=\frac{\operatorname{Var}\left[\mu(\Theta)\right]}{\operatorname{Var}\left[\mu(\Theta)\right]+\dfrac1n E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]}=\frac{n}{n+\dfrac{E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]}{\operatorname{Var}\left[\mu(\Theta)\right]}}$$

Let $k=\dfrac{E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]}{\operatorname{Var}\left[\mu(\Theta)\right]}$. Then $Z=\dfrac{n}{n+k}$.

Next, we need to find $a=E\left(X_{n+1}\right)-Z\,E\left(\bar X\right)$. Remember, $X_1$, $X_2$, ..., $X_n$, though not independent, have a common mean $E(X)=\mu$ and a common variance $\operatorname{Var}(X)$:

$$E\left(\bar X\right)=E\left[\frac1n\left(X_1+X_2+\dots+X_n\right)\right]=\frac1n(n\mu)=\mu,\qquad E\left(X_{n+1}\right)=\mu$$

$$a=E\left(X_{n+1}\right)-Z\,E\left(\bar X\right)=\mu-Z\mu=(1-Z)\mu$$

$$a+Z\bar X=(1-Z)\mu+Z\bar X=Z\bar X+(1-Z)\mu,\qquad\text{where } Z=\frac{n}{n+k}$$

Summary of how to derive the Bühlmann credibility premium formulas

$$Z=\frac{\operatorname{Cov}\left(\bar X,\ X_{n+1}\right)}{\operatorname{Var}\left(\bar X\right)},\qquad a=(1-Z)\mu$$

$$\operatorname{Cov}\left(\bar X,\ X_{n+1}\right)=\operatorname{Cov}\left(X_i,X_j\right)=\operatorname{Var}\left[\mu(\Theta)\right]=VE,\quad\text{where } i\ne j$$

$$\operatorname{Var}\left(\bar X\right)=\frac{\operatorname{Var}\left(X_1+X_2+\dots+X_n\right)}{n^2}$$

$$\operatorname{Var}\left(X_1+X_2+\dots+X_n\right)=n\operatorname{Var}(X)+n(n-1)\operatorname{Cov}\left(X_i,X_j\right)=n\operatorname{Var}(X)+n(n-1)\operatorname{Var}\left[\mu(\Theta)\right]$$
$$=n\left\{\operatorname{Var}(X)-\operatorname{Var}\left[\mu(\Theta)\right]\right\}+n^2\operatorname{Var}\left[\mu(\Theta)\right]=nE\left[\operatorname{Var}\left(X\mid\Theta\right)\right]+n^2\operatorname{Var}\left[\mu(\Theta)\right]$$

$$\operatorname{Var}\left(\bar X\right)=\operatorname{Var}\left[\mu(\Theta)\right]+\frac1n E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]=VE+\frac{EV}{n}$$

$$Z=\frac{\operatorname{Cov}\left(\bar X,\ X_{n+1}\right)}{\operatorname{Var}\left(\bar X\right)}=\frac{VE}{VE+\dfrac{EV}{n}}=\frac{n}{n+\dfrac{EV}{VE}}=\frac{n}{n+k},\qquad k=\frac{EV}{VE}$$

$$P=a+Z\bar X=(1-Z)\mu+Z\bar X$$

Let's look at the final formula:

$$\underbrace{P}_{\text{renewal premium}}=Z\times\underbrace{\bar X}_{\text{risk-specific sample mean}}+(1-Z)\times\underbrace{\mu}_{\text{global mean}}$$

Here $P$ is the renewal premium rate during year $n+1$ for a policyholder whose sub-risk class is unknown to us. $\bar X$ is the sample mean of the claims incurred by the same policyholder (hence the same sub-risk class) during years 1, 2, ..., $n$. $\mu$ is the mean claim cost of all the sub-risks combined.

If we apply this formula to set the renewal premium rate for Adam for Year 3, then the formula becomes:

$$\underbrace{P^{Adam}}_{\text{renewal premium}}=Z\times\underbrace{\bar X^{Adam}}_{\text{risk-specific sample mean}}+(1-Z)\times\underbrace{\mu^{Adam,\,Bob,\,Colleen}}_{\text{global mean}}$$

At first, the above formula may seem counter-intuitive. If we are interested only in Adam's claim cost in Year 3, why not set Adam's renewal premium for Year 3 equal to his prior two-year average claim $\bar X$ (so $P=\bar X$)? Why do we need to drag in $\mu$, the global average, which includes the claim costs incurred by Bob and Colleen?

Actually, it's a blessing that the renewal premium formula includes $\mu$. $\bar X$ varies widely based on your sample size. However, the state insurance departments generally want the renewal premium to be both stable and responsive to the past claim data. If your renewal premium $P$ is set to $\bar X$, then $P$ will fluctuate wildly depending on the sample size. Then you'll have a difficult time getting your renewal rates approved by state insurance departments.

In addition, you may have $P=\bar X=0$; this is the case for Adam. You'll provide free insurance to the policyholder who has not incurred any claim yet. This certainly doesn't make any sense.

By including the global mean $\mu$, the renewal premium $P=(1-Z)\mu+Z\bar X$ is stabilized. At the same time, $P$ is still responsive to $\bar X$. Since $\bar X^{Adam}<\bar X^{Bob}$, the renewal premium formula $P=(1-Z)\mu+Z\bar X$ will produce $P^{Adam}<P^{Bob}$.
There are other ways to derive the Bühlmann credibility formula. For example, instead of minimizing $E\left[\left(a+Z\bar X-X_{n+1}\right)^2\right]$, we can minimize

$$E\left\{\left[a+Z\bar X-\mu(\Theta)\right]^2\right\}$$

Please note that $\mu(\Theta)=E\left(X\mid\Theta\right)$ is a random variable. In our taxi driver insurance example, $\mu(\Theta)$ has four possible values:

$$E\left(X\mid SP\right),\quad E\left(X\mid PF\right),\quad E\left(X\mid STD\right),\quad\text{and}\quad E\left(X\mid SUB\right)$$

The idea behind minimizing $E\left\{\left[a+Z\bar X-\mu(\Theta)\right]^2\right\}$ is this. If we knew that a policyholder belonged to sub-risk $\theta$, then we could set our renewal premium for year $n+1$ equal to his conditional mean claim cost $\mu(\theta)=E\left(X_{n+1}\mid\theta\right)=E\left(X_1\mid\theta\right)=E\left(X_2\mid\theta\right)=\dots=E\left(X_n\mid\theta\right)$. However, we don't know $\theta$. As a result, we consider all the possible values of $\mu(\Theta)$ and find the least mean squared error estimator of $\mu(\Theta)$ by minimizing $E\left\{\left[a+Z\bar X-\mu(\Theta)\right]^2\right\}$.

Then using Preliminary Concept #3, we have:

$$Z=\frac{\operatorname{Cov}\left[\bar X,\ \mu(\Theta)\right]}{\operatorname{Var}\left(\bar X\right)}$$

$$\operatorname{Cov}\left[\bar X,\ \mu(\Theta)\right]=\operatorname{Cov}\left[\frac1n\left(X_1+\dots+X_n\right),\ \mu(\Theta)\right]=\frac1n\left\{\operatorname{Cov}\left[X_1,\mu(\Theta)\right]+\operatorname{Cov}\left[X_2,\mu(\Theta)\right]+\dots+\operatorname{Cov}\left[X_n,\mu(\Theta)\right]\right\}$$

For $i=1,2,\dots,n$, we have:

$$\operatorname{Cov}\left[X_i,\mu(\Theta)\right]=E\left[X_i\,\mu(\Theta)\right]-E\left(X_i\right)E\left[\mu(\Theta)\right]$$

$$E\left[X_i\,\mu(\Theta)\right]=E\left\{E\left[X_i\,\mu(\Theta)\mid\Theta\right]\right\}$$

For a fixed $\Theta=\theta$, $\mu(\theta)$ is a constant. Hence

$$E\left[X_i\,\mu(\Theta)\mid\Theta\right]=\mu(\Theta)\,E\left(X_i\mid\Theta\right)=\mu^2(\Theta),\qquad E\left[X_i\,\mu(\Theta)\right]=E\left[\mu^2(\Theta)\right]$$

$$E\left(X_i\right)E\left[\mu(\Theta)\right]=\left\{E\left[\mu(\Theta)\right]\right\}^2$$

$$\operatorname{Cov}\left[X_i,\mu(\Theta)\right]=E\left[\mu^2(\Theta)\right]-\left\{E\left[\mu(\Theta)\right]\right\}^2=\operatorname{Var}\left[\mu(\Theta)\right]$$

$$\operatorname{Cov}\left[\bar X,\ \mu(\Theta)\right]=\frac1n\left\{\operatorname{Cov}\left[X_1,\mu(\Theta)\right]+\dots+\operatorname{Cov}\left[X_n,\mu(\Theta)\right]\right\}=\frac1n\cdot n\operatorname{Var}\left[\mu(\Theta)\right]=\operatorname{Var}\left[\mu(\Theta)\right]$$

$\operatorname{Var}\left(\bar X\right)$ is the same whether $E\left[\left(a+Z\bar X-X_{n+1}\right)^2\right]$ or $E\left\{\left[a+Z\bar X-\mu(\Theta)\right]^2\right\}$ is to be minimized:

$$\operatorname{Var}\left(\bar X\right)=\frac1n E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]+\operatorname{Var}\left[\mu(\Theta)\right]$$

Once again, we get:

$$Z=\frac{\operatorname{Cov}\left[\bar X,\ \mu(\Theta)\right]}{\operatorname{Var}\left(\bar X\right)}=\frac{n}{n+\dfrac{E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]}{\operatorname{Var}\left[\mu(\Theta)\right]}}$$

$$a=E\left(X_{n+1}\right)-Z\,E\left(\bar X\right)=\mu-Z\mu=(1-Z)\mu,\qquad a+Z\bar X=(1-Z)\mu+Z\bar X=Z\bar X+(1-Z)\mu$$
There's a third approach to deriving Bühlmann's credibility formula. Instead of minimizing $E\left[\left(a+Z\bar X-X_{n+1}\right)^2\right]$ or $E\left\{\left[a+Z\bar X-\mu(\Theta)\right]^2\right\}$, we can minimize

$$E\left\{\left[a+Z\bar X-E\left(X_{n+1}\mid X_1,X_2,\dots,X_n\right)\right]^2\right\}$$

Here $X_{n+1}\mid X_1,X_2,\dots,X_n$ represents the claim cost in year $n+1$ of the policyholder who incurred claims $X_1,X_2,\dots,X_n$ in years 1, 2, ..., $n$. The notation $X_{n+1}\mid X_1,X_2,\dots,X_n$ emphasizes that the claim amounts $X_1,X_2,\dots,X_n,X_{n+1}$ are from the same sub-class $\theta$. This condition must hold for the Bühlmann credibility formula to be valid. For example, if $X_{n+1}$ comes from sub-class $\theta_1$ and $X_1,X_2,\dots,X_n$ from sub-class $\theta_2$, then the Bühlmann credibility formula will not hold true.

However, the requirement that the claim amounts $X_1,X_2,\dots,X_n,X_{n+1}$ are from the same sub-class shouldn't bother us at all. At the very beginning, when we presented the Bühlmann credibility formula, we already used $X_1,X_2,\dots,X_n,X_{n+1}$ to refer to the claims incurred by the same policyholder, whose sub-risk class is $\theta$. As a result, minimizing $E\left\{\left[a+Z\bar X-E\left(X_{n+1}\mid X_1,X_2,\dots,X_n\right)\right]^2\right\}$ produces the same $Z$ and $a$ as the other two minimizations.
Key Points

We can derive the Bühlmann credibility formula by minimizing any of the following three terms:

$$E\left[\left(a+Z\bar X-X_{n+1}\right)^2\right],\quad E\left\{\left[a+Z\bar X-\mu(\Theta)\right]^2\right\},\quad E\left\{\left[a+Z\bar X-E\left(X_{n+1}\mid X_1,X_2,\dots,X_n\right)\right]^2\right\}$$

The Bühlmann credibility premium is the least squares linear estimator of any of the following three terms:

- $X_{n+1}$, the claim amount in year $n+1$ incurred by the policyholder who has claims $X_1,X_2,\dots,X_n$ in years $1,2,\dots,n$.
- $\mu(\theta)$, the mean claim amount of the sub-class $\theta$ that has generated $X_1,X_2,\dots,X_n$.
- $E\left(X_{n+1}\mid X_1,X_2,\dots,X_n\right)$, the Bayesian posterior estimate of the mean claim in year $n+1$, given that we have observed claim costs $X_1,X_2,\dots,X_n$ for the same policyholder in years $1,2,\dots,n$ respectively.

Even though we have derived the Bühlmann credibility formula assuming $X$ is the claim cost, the Bühlmann credibility formula works if $X$ is any other quantity, such as the loss ratio, the aggregate loss amount, or the number of claims.

Popularity of the Bühlmann credibility formula

The Bühlmann credibility formula is popular due to its simplicity. The renewal premium is the weighted average of the uniform group rate and the sample mean of the past claims. The renewal premium is easy to calculate and easy to explain to clients. In contrast, Bayesian premiums (the posterior means) are often difficult to calculate, requiring knowledge of prior distributions and involving complex integrations.

Next, let's derive a special case of the Bühlmann credibility formula. This special case is presented in Loss Models.

Special case

If $E\left(X_i\right)=\mu$, $\operatorname{Var}\left(X_i\right)=\sigma^2$, and for $i\ne j$, $\operatorname{Cov}\left(X_i,X_j\right)=\rho\sigma^2$, where the correlation coefficient $\rho$ satisfies $-1<\rho<1$, determine the Bühlmann credibility premium.

Once again, the credibility premium is $a+Z\bar X$:

$$Z=\frac{\operatorname{Cov}\left(\bar X,\ X_{n+1}\right)}{\operatorname{Var}\left(\bar X\right)},\qquad \operatorname{Cov}\left(\bar X,\ X_{n+1}\right)=\operatorname{Cov}\left(X_i,X_j\right)=\rho\sigma^2$$

$$\operatorname{Var}\left(\bar X\right)=\frac{\operatorname{Var}\left(X_1+X_2+\dots+X_n\right)}{n^2},\qquad \operatorname{Var}\left(X_1+X_2+\dots+X_n\right)=n\sigma^2+n(n-1)\rho\sigma^2$$

$$Z=\frac{\rho\sigma^2}{\dfrac{n\sigma^2+n(n-1)\rho\sigma^2}{n^2}}=\frac{n\rho}{1+(n-1)\rho}$$

$$a=(1-Z)\mu=\left[1-\frac{n\rho}{1+(n-1)\rho}\right]\mu=\frac{(1-\rho)\mu}{1+(n-1)\rho}$$

The Bühlmann credibility premium is

$$Z\bar X+(1-Z)\mu=\frac{n\rho}{1+(n-1)\rho}\,\bar X+\frac{(1-\rho)\mu}{1+(n-1)\rho}=\frac{\rho}{1-\rho+n\rho}\sum_{i=1}^{n}X_i+\frac{(1-\rho)\mu}{1-\rho+n\rho}$$

You don't need to memorize the Bühlmann credibility premium formula for this special case. If you understand how to derive the general Bühlmann credibility premium formula, you can derive the special case formula any time by setting $\operatorname{Cov}\left(X_i,X_j\right)=\rho\sigma^2$.

Next, let's turn our attention toward how to solve Bühlmann credibility problems on the exam.


How to tackle Bühlmann credibility problems

Step 1    Divide the policyholders into sub-classes $\theta$.

Step 2    For each sub-class $\theta$, calculate the average claim cost (or loss ratio, aggregate claim, etc.) $\mu(\theta)=E\left(X\mid\theta\right)$; calculate the variance of the claim cost $\operatorname{Var}\left(X\mid\theta\right)$.

Step 3    Calculate $EV=E\left[\operatorname{Var}\left(X\mid\Theta\right)\right]$, the average variance for all sub-classes combined. Calculate $VE=\operatorname{Var}\left[E\left(X\mid\Theta\right)\right]$, the variance of the average claim for all sub-classes combined.

Step 4    Calculate $k=\dfrac{EV}{VE}$ and $Z=\dfrac{n}{n+k}$.

Step 5    Calculate $\mu=E\left[E\left(X\mid\Theta\right)\right]$, the average claim cost for all sub-classes combined. This is the uniform group premium rate you would charge under the classical theory of insurance.

Step 6    Calculate the sample mean of the past claim data, $\bar X=\dfrac1n\sum_{i=1}^{n}X_i$.

Step 7    Calculate the Bühlmann credibility premium $Z\bar X+(1-Z)\mu$. This is the weighted average of the sample mean and the uniform group rate. (A generic implementation sketch follows.)
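The seven steps translate directly into code. Below is a generic sketch (my own illustration; the function name and data layout are hypothetical): `classes` maps each sub-class to a tuple of (probability, conditional mean, conditional variance), and `past_claims` holds the observed claims $X_1,\dots,X_n$. We'll reuse it to check two exam problems below.

```python
def buhlmann_premium(classes, past_claims):
    """Steps 1-7; classes = {name: (prob, cond_mean, cond_var)}."""
    mu = sum(p * m for p, m, v in classes.values())             # Step 5: global mean
    ev = sum(p * v for p, m, v in classes.values())             # Step 3: EV = E[Var(X|theta)]
    ve = sum(p * m**2 for p, m, v in classes.values()) - mu**2  # Step 3: VE = Var[E(X|theta)]
    n = len(past_claims)
    k = ev / ve                                                 # Step 4
    z = n / (n + k)
    xbar = sum(past_claims) / n                                 # Step 6: sample mean
    return z * xbar + (1 - z) * mu                              # Step 7: Z*Xbar + (1-Z)*mu
```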

An example illustrating how to calculate the Bühlmann credibility premium

(Nov 2003 #23) You are given:

Two risks have the following severity distributions:

Amount of Claim   Probability of Claim Amount for Risk 1   Probability of Claim Amount for Risk 2
250               0.5                                      0.7
2,500             0.3                                      0.2
60,000            0.2                                      0.1

Risk 1 is twice as likely to be observed as Risk 2. A claim of 250 is observed.

Determine the Bühlmann credibility estimate of the second claim amount from the same risk.

Solution

This is a typical problem for Exam C. Here policyholders are from two risk classes. Even though the problem doesn't say that Risk 1 and Risk 2 are two sub-risks of a similar bigger risk group (i.e. a homogeneous group), we should assume so. Otherwise, the Bühlmann credibility formula won't work. Remember, the Bühlmann credibility premium is the weighted average of the uniform group rate $\mu$ and the risk-specific sample mean $\bar X$. If Risk 1 and Risk 2 are not sub-risks of a homogeneous group, then the uniform group rate $\mu$ doesn't exist; we have no way of calculating $Z\bar X+(1-Z)\mu$.

The problem says that a claim of 250 is observed. This means that a policyholder of an unknown sub-class has incurred a claim of $X_1=\$250$. Since Risk 1 is twice as likely as Risk 2, the \$250 claim has a $\frac23$ chance of coming from Risk 1 and a $\frac13$ chance of coming from Risk 2. The question asks us to estimate the next claim amount $X_2$ incurred by the same policyholder.

Let $X$ represent the dollar amount of a randomly chosen claim.

$$E\left(X\mid\text{risk 1}\right)=250(0.5)+2{,}500(0.3)+60{,}000(0.2)=12{,}875$$
$$E\left(X\mid\text{risk 2}\right)=250(0.7)+2{,}500(0.2)+60{,}000(0.1)=6{,}675$$

The uniform group rate is:

$$\mu=E(X)=P\left(\text{risk 1}\right)E\left(X\mid\text{risk 1}\right)+P\left(\text{risk 2}\right)E\left(X\mid\text{risk 2}\right)=\frac23(12{,}875)+\frac13(6{,}675)=10{,}808.33$$

The variance of the conditional means is:

$$VE=\frac23(12{,}875)^2+\frac13(6{,}675)^2-10{,}808.33^2=8{,}542{,}222.22$$

$$E\left(X^2\mid\text{risk 1}\right)=250^2(0.5)+2{,}500^2(0.3)+60{,}000^2(0.2)=721{,}906{,}250$$
$$E\left(X^2\mid\text{risk 2}\right)=250^2(0.7)+2{,}500^2(0.2)+60{,}000^2(0.1)=361{,}293{,}750$$

$$\operatorname{Var}\left(X\mid\text{risk 1}\right)=721{,}906{,}250-12{,}875^2=556{,}140{,}625$$
$$\operatorname{Var}\left(X\mid\text{risk 2}\right)=361{,}293{,}750-6{,}675^2=316{,}738{,}125$$

The average conditional variance is:

$$EV=\frac23(556{,}140{,}625)+\frac13(316{,}738{,}125)=476{,}339{,}791.67$$

$$k=\frac{EV}{VE}=\frac{476{,}339{,}791.67}{8{,}542{,}222.22}=55.76,\qquad Z=\frac{n}{n+k}=\frac1{1+55.76}=1.76\%$$

$$P=Z\bar X+(1-Z)\mu=1.76\%(250)+(1-1.76\%)(10{,}808.33)=10{,}622.50$$
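Feeding this problem's numbers into the hypothetical `buhlmann_premium` sketch from the previous section reproduces the answer (up to rounding of $Z$):

```python
classes = {
    "risk 1": (2/3, 12875, 556_140_625),   # P, E(X|risk), Var(X|risk) from above
    "risk 2": (1/3,  6675, 316_738_125),
}
print(buhlmann_premium(classes, past_claims=[250]))   # ≈ 10622, matching the answer
```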


Next, I want to emphasize an important point. In the Bühlmann credibility premium formula, what matters is the sample mean $\bar X=\frac1n\left(X_1+X_2+\dots+X_n\right)$, not the individual claims data $X_1$, $X_2$, ..., $X_n$. For example, for $n=3$, the claim histories $(X_1,X_2,X_3)=(0,3,6)$, $(X_1,X_2,X_3)=(1,7,1)$, and $(X_1,X_2,X_3)=(2,2,5)$ all have the same $\bar X=3$ and will produce the same renewal premium $P=Z\bar X+(1-Z)\mu=3Z+(1-Z)\mu$.

Shortcut

We can rewrite the Bühlmann credibility premium formula as

$$P=Z\bar X+(1-Z)\mu=\frac{n}{n+k}\bar X+\frac{k}{n+k}\mu=\frac{k\mu+n\bar X}{n+k}=\frac{k\mu+\sum_{i=1}^{n}X_i}{n+k}$$

We can interpret $k=\dfrac{EV}{VE}$ as the number of samples taken out of the global mean $\mu$. Imagine we have two urns, A and B. A contains an infinite number of identical balls, with each ball marked with the number $\mu$. B contains an infinite number of identical balls, with each ball marked with the number $\bar X$. You take out $k$ balls from Urn A and $n$ balls from Urn B. Then the average value per ball is:

$$P=\frac{k\mu+n\bar X}{n+k}=\frac{k\mu+\sum_{i=1}^{n}X_i}{n+k}$$

This is the renewal premium for year $n+1$.

Practice problems

Q1. You are an actuary working on group health insurance pricing. You want to use the Bühlmann credibility premium formula $P=Z\bar X+(1-Z)\mu$ to set the renewal premium rate for a policy. One day the vice president of your company stops by. He has a Ph.D. in statistics and is widely regarded as an expert on the central limit theorem. He asks you to throw the formula $P=Z\bar X+(1-Z)\mu$ into the trash can and focus on $\mu$. "All we care about is $\mu$. As long as we charge each policyholder $\mu$, we'll be okay," the vice president says. "The fundamental concept of insurance is that many people form a group to share the risk. If we charge $\mu$, the law of large numbers will work its magic and we'll be able to collect enough premiums to pay our guarantees."

Comment on the vice president's remarks.

Solution

According to the law of large numbers, for a homogeneous group of policyholders, we can set the premium rate equal to the average claim cost $\mu=E(X)$. Some policyholders will suffer losses greater than $E(X)$, while others will suffer losses less than $E(X)$. However, on average, the insurance company will have collected just enough premiums to offset the losses. As long as each policyholder pays $\mu$, the insurer will be solvent.

However, in practice, insurance companies can't charge everyone $\mu$. Members of a so-called homogeneous risk group are really different risks. Policyholders of different risks can shop around and compare premium rates. If any policyholder believes that his premium is too high, he can terminate his policy and buy cheaper insurance elsewhere.

If an insurer charges $\mu$ to similar yet different risks, good risks will stop doing business with the insurer and buy cheaper insurance elsewhere; only bad risks will remain in the insurer's book of business. As more and more good risks leave the insurer's book of business, the actual expected claim cost will exceed the original average premium rate $\mu$. Then the insurer has to increase $\mu$, causing more policyholders to terminate their policies. Gradually, the insurer's customer base will shrink and the insurer will go bankrupt.

Q2. Compare and contrast the classical theory of insurance and the credibility theory of insurance.

Solution

The classical theory vs. the credibility theory:

Is there a homogeneous group?
- Classical theory of insurance: Yes. This is the foundation of insurance. Identical risks form a homogeneous group to share risks.
- Credibility theory: Each member of a seemingly homogeneous group belongs to a sub-class. The insurer doesn't know who belongs to which sub-class.

Are the claim random variables X of different members of a group independent identically distributed?
- Classical theory of insurance: Yes. Since each member of a homogeneous group has identical risk, each member's claim random variable is independent identically distributed at all times.
- Credibility theory: No. Since members of a similar risk group are actually of different sub-risk classes, only claims incurred by the same sub-class are independent identically distributed.

What's the fair premium rate?
- Classical theory of insurance: The fair premium is $E(X)=\mu$, where $X$ is the random loss variable of any policyholder in the homogeneous group. Every member of a homogeneous group needs to pay $\mu$, the uniform group pure premium rate.
- Credibility theory: The fair premium is $E\left(X\mid\Theta=\theta\right)=\mu(\theta)$, which is the mean claim cost of the sub-class $\theta$. Every member of the same sub-class $\theta$ needs to pay $\mu(\theta)$.
Q3. One day you visited your college statistics professor. He asked what you were doing in your job. You told him that you used the Bühlmann credibility premium formula to set the renewal premiums for group health insurance policies. The Bühlmann credibility theory was new to the professor. After listening to your explanation of the formula $P=Z\bar X+(1-Z)\mu$, he looked puzzled. He told you that for 20 years he had been telling his students that $\bar X$ is the unbiased estimator of $E(X)$. "I don't get it. Why don't you just set $P=\bar X$?"

Explain why it's not a good idea to set $P=\bar X$.

Solution

Your stats professor is perfectly correct in saying that the sample mean is an unbiased estimator of the population mean. If the number of observations $n$ is large (so we have observed claims $X_1$, $X_2$, ..., $X_n$), then for any policyholder, setting his renewal premium equal to his prior average claim is a good idea.

In reality, however, it's hard to implement the idea $P=\bar X$. Often you, as an insurer, have to set the renewal premium with limited data (so $n$ may be small). For a small $n$, $\bar X$ may not be a good estimate of $E(X)$. In addition, we may have a weird situation where $\bar X=0$. In our taxi driver insurance example, if you use $P=\bar X$ to set the renewal premium for Adam, you'll get $P=0$. This clearly doesn't make any sense.
Q4
Nov 2005 #26
For each policyholder, losses X1, X2, …, Xn, conditional on Θ, are independent
identically distributed with mean

μ(θ) = E(Xj | Θ = θ),  j = 1, 2, …, n

and variance

v(θ) = Var(Xj | Θ = θ),  j = 1, 2, …, n

You are given:
- The Bühlmann credibility factor assigned for estimating X5 based on X1, X2, X3, X4 is Z = 0.4.
- The expected value of the process variance is known to be 8.

Calculate Cov(Xi, Xj), i ≠ j.


Solution

Z = Cov(X̄, X_{n+1}) / Var(X̄),  where Var(X̄) = VE + (1/n)EV and Cov(X̄, X_{n+1}) = Var[E(X | Θ)] = VE

So Z = VE / (VE + EV/n).

We are told that n = 4 (we have four years of claim data), Z = 0.4, and EV = 8.

0.4 = VE / (VE + 8/4) = VE / (VE + 2),  so VE = 1.33

So Cov(Xi, Xj) = Cov(X̄, X_{n+1}) = Var[E(X | Θ)] = VE = 1.33
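As a quick sanity check, here is a minimal Python sketch (my own illustration, not part of the SOA solution) that solves Z = VE/(VE + EV/n) for VE:

```python
# Z = VE / (VE + EV/n)  =>  VE = Z * (EV/n) / (1 - Z)
def ve_from_credibility(z, ev, n):
    return z * (ev / n) / (1 - z)

print(ve_from_credibility(0.4, 8, 4))  # 1.333... = Cov(Xi, Xj)
```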

Q5
Nov 2005 #19
For a portfolio of independent risks, the number of claims for each risk in a year follows
a Poisson distribution with means given in the following table:

Class | Mean # of claims per risk | # of risks
1     | 1                         | 900
2     | 10                        | 90
3     | 20                        | 10

You observe x claims in Year 1 for a randomly selected risk.

The Bühlmann credibility estimate of the number of claims for the same risk in Year 2 is
11.983. Determine x.
Solution
The problem states that x claims in Year 1 have been observed for a randomly selected
risk. The wording "a randomly selected risk" is needed because in order for the
Bühlmann credibility formula to work, the risk class must be unknown to us. If we
already know the risk class, we can calculate the expected number of claims in Year 2
directly; we don't need to estimate any more.

Please also pay attention to the wording "the Bühlmann credibility estimate of the
number of claims for the same risk in Year 2 is …" In order for the Bühlmann credibility
formula to work, the renewal premium (or the expected number of claims in this
problem) in year n + 1 and the prior n years of claims X1, X2, …, Xn must refer to the
same (unknown) risk class.
And now back to the problem. Let Y represent the number of claims incurred in a year
by a randomly chosen risk. Since Y | λ is a Poisson random variable,
E(Y | λ) = Var(Y | λ) = λ.

Class | λ = E(Y | λ) = Var(Y | λ) | # of risks | P(λ)
1     | 1                         | 900        | 90%
2     | 10                        | 90         | 9%
3     | 20                        | 10         | 1%
Total |                           | 1,000      | 100%

The global mean (using the double expectation theorem):

μ = E(Y) = E[E(Y | λ)] = Σ P(λ) E(Y | λ) = 1(90%) + 10(9%) + 20(1%) = 2

The average conditional variance:

EV = Σ P(λ) Var(Y | λ) = 1(90%) + 10(9%) + 20(1%) = 2

The variance of conditional means:

VE = Var[E(Y | λ)] = E[E(Y | λ)²] − {E[E(Y | λ)]}² = 1²(90%) + 10²(9%) + 20²(1%) − 2² = 9.9

Alternatively, VE = E[(E(Y | λ) − μ)²] = (1 − 2)²(90%) + (10 − 2)²(9%) + (20 − 2)²(1%) = 9.9

k = EV/VE = 2/9.9

P = (n Ȳ + k μ)/(n + k) = (Σ Yi + k μ)/(n + k)

With n = 1 and Y1 = x:

[x + (2/9.9)(2)] / (1 + 2/9.9) = 11.983,
x = 14
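The arithmetic above is easy to mis-key under exam pressure. Here is a minimal Python sketch (my own illustration, not from the exam solution) that reproduces the calculation and solves for x:

```python
# Discrete risk classes: (lambda, probability)
classes = [(1, 0.90), (10, 0.09), (20, 0.01)]

mu = sum(lam * p for lam, p in classes)               # global mean = 2
ev = mu                                               # Poisson: Var(Y | lam) = lam
ve = sum(lam**2 * p for lam, p in classes) - mu**2    # 9.9
k = ev / ve

# Buhlmann estimate with n = 1 observation: P = (x + k*mu) / (1 + k)
# Solve 11.983 = (x + k*mu) / (1 + k) for x:
x = 11.983 * (1 + k) - k * mu
print(round(x))  # 14
```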

Q6
Nov 2005 #7
For a portfolio of policies, you are given:
- The annual claim amount on a policy has probability density function f(x | θ) = 2x/θ², 0 < x < θ.
- The prior distribution of θ has density function π(θ) = 4θ³, 0 < θ < 1.
- A randomly selected policy had claim amount 0.1 in Year 1.

Determine the Bühlmann credibility estimate of the claim amount for the selected risk in
Year 2.
Solution

The conditional mean is:

E(X | θ) = ∫₀^θ x f(x | θ) dx = ∫₀^θ x (2x/θ²) dx = (2/θ²) ∫₀^θ x² dx = (2/θ²)(θ³/3) = 2θ/3

Please don't write:

E(X | θ) = ∫ x f(x | θ) dθ    Wrong!

E(X | θ) is the expected value of X if we fix the random variable Θ at Θ = θ. So θ should
be treated as a constant and dθ = 0. The correct calculation is to integrate x f(x | θ)
with respect to x, not θ.

E(X² | θ) = ∫₀^θ x² f(x | θ) dx = ∫₀^θ x² (2x/θ²) dx = (2/θ²)(θ⁴/4) = θ²/2

Conditional variance: Var(X | θ) = E(X² | θ) − E²(X | θ) = θ²/2 − (2θ/3)² = θ²/18

The global mean:

μ = E(X) = E[E(X | Θ)] = E(2Θ/3) = (2/3)E(Θ)

E(Θ) = ∫₀¹ θ π(θ) dθ = ∫₀¹ θ(4θ³) dθ = 4/5,   E(Θ²) = ∫₀¹ θ²(4θ³) dθ = 4/6 = 2/3

μ = (2/3)(4/5) = 8/15,   Var(Θ) = E(Θ²) − E²(Θ) = 2/3 − (4/5)² = 2/75

The expected conditional variance: EV = E[Var(X | Θ)] = E(Θ²/18) = (1/18)(4/6)

The variance of the conditional mean: VE = Var[E(X | Θ)] = Var(2Θ/3) = (2/3)² Var(Θ) = (2/3)²(2/75)

k = EV/VE = [(1/18)(4/6)] / [(2/3)²(2/75)] = 3.125

The above fraction is complex. We don't want to bother expressing k as a neat fraction;
trying to express k as a neat fraction is prone to errors.

P = (k μ + Σ Xi)/(n + k) = (k μ + X1)/(1 + k) = [3.125(8/15) + 0.1] / (1 + 3.125) = 0.428

Alternative method of calculation. This method is more complex.

E(X | θ) = ∫₀^θ x f(x | θ) dx = (2/θ²) ∫₀^θ x² dx = 2θ/3

μ = E(X) = E[E(X | Θ)] = ∫₀¹ (2θ/3) π(θ) dθ = ∫₀¹ (2θ/3)(4θ³) dθ = (8/3)(1/5) = 8/15

VE = Var[E(X | Θ)] = E[E(X | Θ)²] − {E[E(X | Θ)]}²

E[E(X | Θ)²] = E(4Θ²/9) = ∫₀¹ (4θ²/9)(4θ³) dθ = (16/9)(1/6) = 8/27

VE = 8/27 − (8/15)² = 0.01185

E(X² | θ) = ∫₀^θ x² f(x | θ) dx = (2/θ²)(θ⁴/4) = θ²/2

Var(X | θ) = E(X² | θ) − E²(X | θ) = θ²/2 − (2θ/3)² = θ²/18

EV = E[Var(X | Θ)] = ∫₀¹ (θ²/18)(4θ³) dθ = (4/18)(1/6)

k = EV/VE = [(4/18)(1/6)] / 0.01185 = 3.125   (as before)

P = (k μ + Σ Xi)/(n + k) = (k μ + X1)/(1 + k) = [3.125(8/15) + 0.1] / (1 + 3.125) = 0.428
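Continuous-prior problems like this one are easy to verify numerically. Below is a minimal sketch (my own check, assuming scipy is available; not part of the exam solution) that recomputes μ, EV, VE, and the premium by quadrature:

```python
from scipy.integrate import quad

prior = lambda t: 4 * t**3                      # pi(theta), 0 < theta < 1
cond_mean = lambda t: 2 * t / 3                 # E(X | theta)
cond_var = lambda t: t**2 / 18                  # Var(X | theta)

mu = quad(lambda t: cond_mean(t) * prior(t), 0, 1)[0]            # 8/15
ev = quad(lambda t: cond_var(t) * prior(t), 0, 1)[0]             # E[Var(X|Theta)]
ve = quad(lambda t: cond_mean(t)**2 * prior(t), 0, 1)[0] - mu**2 # Var[E(X|Theta)]

k = ev / ve                                     # 3.125
p = (k * mu + 0.1) / (1 + k)                    # 0.428
print(k, p)
```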

Q7
May 2005, #20
For a particular policy, the conditional probability of the annual number of claims given
Θ = θ, and the probability distribution of Θ, are as follows:

# of claims | 0  | 1 | 2
Probability | 2θ | θ | 1 − 3θ

θ           | 0.05 | 0.30
Probability | 0.80 | 0.20

Two claims are observed in Year 1.

Calculate the Bühlmann credibility estimate of the number of claims in Year 2.
Solution
Let X represent the annual number of claims.

E(X | θ) = 0(2θ) + 1(θ) + 2(1 − 3θ) = 2 − 5θ
E(X² | θ) = 0²(2θ) + 1²(θ) + 2²(1 − 3θ) = 4 − 11θ
Var(X | θ) = E(X² | θ) − E²(X | θ) = 4 − 11θ − (2 − 5θ)² = 9θ − 25θ²

μ = E[E(X | Θ)] = E(2 − 5Θ) = 2 − 5E(Θ)
VE = Var[E(X | Θ)] = Var(2 − 5Θ) = Var(5Θ) = 25 Var(Θ)
EV = E[Var(X | Θ)] = E(9Θ − 25Θ²) = 9E(Θ) − 25E(Θ²)

E(Θ) = 0.05(0.8) + 0.3(0.2) = 0.1
E(Θ²) = 0.05²(0.8) + 0.3²(0.2) = 0.02
Var(Θ) = 0.02 − 0.1² = 0.01

μ = 2 − 5E(Θ) = 2 − 5(0.1) = 1.5
VE = 25 Var(Θ) = 25(0.01) = 0.25
EV = 9E(Θ) − 25E(Θ²) = 9(0.1) − 25(0.02) = 0.4

k = EV/VE = 0.4/0.25 = 1.6

P = (k μ + Σ Xi)/(n + k) = (k μ + X1)/(1 + k) = [1.6(1.5) + 2]/(1 + 1.6) = 1.69

Q8
May 2005, #17
You are given:
- The annual number of claims on a given policy has a geometric distribution with parameter θ.
- The prior distribution of θ has the Pareto density function

  π(θ) = α / (θ + 1)^(α+1),  0 < θ < ∞

  where α is a known constant greater than 2.
- A randomly selected policy has x claims in Year 1.

Calculate the Bühlmann credibility estimate of the number of claims for the selected
policy in Year 2.
Solution

Let X represent the annual number of claims on a randomly selected policy. Here the
risk factor is θ. The conditional random variable X | θ has a geometric distribution. If you
look up Tables for Exam C/4, you'll find that a geometric random variable N with parameter
θ has mean and variance as follows:

E(N) = θ,   Var(N) = θ(1 + θ)

Applying the above mean and variance formulas, we have:

The conditional mean: E(X | θ) = θ.   The conditional variance: Var(X | θ) = θ(1 + θ)

EV = E_θ[Var(X | θ)] = E_θ[θ(1 + θ)] = E_θ(θ) + E_θ(θ²). Typically, we write E_θ(θ)
as E(θ) and E_θ(θ²) as E(θ²). So

EV = E(θ) + E(θ²)

VE = Var_θ[E(X | θ)] = Var(θ)

The global mean is: μ = E_θ[E(X | θ)] = E(θ)

We are told that the prior distribution of θ has the Pareto density function

π(θ) = α / (θ + 1)^(α+1),  0 < θ < ∞

Here the phrase "prior distribution" refers to the fact that we know π(θ) prior to our
observation of x claims in Year 1. In other words, π(θ) hasn't incorporated our
observation of x claims in Year 1 yet. Please note that the prior distribution, not the
posterior distribution, is used in Bühlmann's credibility estimate.

Frankly, I think SOA's emphasis that π(θ) is the prior (as opposed to posterior) distribution
is unnecessary and really meant to scare exam candidates. When we talk about a density
function, we always refer to the prior distribution. So there's never a need to say "prior
distribution." If we want to refer to a distribution that has incorporated our recent
observations, at that time we say "posterior distribution."

Back to the problem. We are told that θ has a Pareto distribution. Is it a one-parameter
Pareto or a two-parameter Pareto? Many candidates have trouble knowing which one to
use. Here is a simple rule:

To decide whether to use the one-parameter Pareto or the two-parameter Pareto, look at your
random variable X. If X is greater than a positive number, then use the single-parameter
Pareto. If X is greater than zero, then use the two-parameter Pareto:

If X > θ, a positive constant, then use the single-parameter Pareto f(x) = α θ^α / x^(α+1)

If X > 0, then use the two-parameter Pareto f(x) = α θ^α / (x + θ)^(α+1)

In this problem, the Pareto random variable θ > 0. So we should use the two-parameter
Pareto formulas in Tables for Exam C/4.

The two-parameter Pareto with parameters α and θ has moments

E(X^k) = θ^k k! / [(α − 1)(α − 2) … (α − k)]

Please note that the denominator has k items.

E(X) = θ/(α − 1),   E(X²) = 2θ²/[(α − 1)(α − 2)]

Var(X) = E(X²) − E²(X) = 2θ²/[(α − 1)(α − 2)] − θ²/(α − 1)² = αθ² / [(α − 1)²(α − 2)]

Since the two-parameter Pareto is frequently tested in Exam C, you might want to
memorize these formulas.

Here θ is a two-parameter Pareto random variable with pdf π(θ) = α/(θ + 1)^(α+1). So the two
parameters are α and θ = 1. So we have:

E(θ) = 1/(α − 1),   E(θ²) = 2/[(α − 1)(α − 2)],   Var(θ) = α/[(α − 1)²(α − 2)]

EV = E(θ) + E(θ²) = 1/(α − 1) + 2/[(α − 1)(α − 2)] = α/[(α − 1)(α − 2)]

VE = Var(θ) = α/[(α − 1)²(α − 2)],   μ = E(θ) = 1/(α − 1)

k = EV/VE = {α/[(α − 1)(α − 2)]} / {α/[(α − 1)²(α − 2)]} = α − 1

P = (k μ + Σ Xi)/(n + k) = (k μ + X1)/(1 + k) = [(α − 1)·1/(α − 1) + x] / [1 + (α − 1)] = (x + 1)/α
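Because the answer (x + 1)/α is a clean closed form, it is worth a quick symbolic check. The sketch below (my own illustration, assuming sympy is available) rebuilds k and P from the Pareto moments:

```python
import sympy as sp

alpha, x = sp.symbols('alpha x', positive=True)

e1 = 1 / (alpha - 1)                       # E(theta)
e2 = 2 / ((alpha - 1) * (alpha - 2))       # E(theta^2)
ev = e1 + e2                               # EV = E(theta) + E(theta^2)
ve = e2 - e1**2                            # VE = Var(theta)

k = sp.simplify(ev / ve)                   # alpha - 1
p = sp.simplify((k * e1 + x) / (1 + k))    # (x + 1)/alpha
print(k, p)
```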

Q9
May 2005 #11
You are given:
- The number of claims in a year for a selected risk follows a Poisson distribution with mean λ.
- The severity of claims for the selected risk follows an exponential distribution with mean θ.
- The number of claims is independent of the severity of claims.
- The prior distribution of λ is exponential with mean 1.
- The prior distribution of θ is Poisson with mean 1.
- A priori, λ and θ are independent.

Using the Bühlmann credibility for aggregate losses, determine k.
Solution
Let N represent the annual number of claims for a randomly selected risk.
Let X represent the loss dollar amount per loss incident.
Let S represent the aggregate annual claim dollar amount incurred by a risk.
Then S = Σ_{i=1}^{N} Xi = X1 + X2 + … + XN.

N | λ is a Poisson random variable with mean λ. So f_N(n | λ) = e^{−λ} λⁿ/n!  (n = 0, 1, 2, …).

Here λ is an exponential random variable with pdf f(λ) = e^{−λ}. We have E(λ) = 1,
Var(λ) = 1² = 1, and E(λ²) = E²(λ) + Var(λ) = 1² + 1 = 2.

X1, X2, …, XN are independent identically distributed with a common pdf
f_X(x | θ) = (1/θ) e^{−x/θ}. Here θ is a Poisson random variable with pdf f(θ) = e^{−1}/θ!. We
have E(θ) = Var(θ) = 1. Hence E(θ²) = E²(θ) + Var(θ) = 1 + 1 = 2.

Here the risk parameters are (λ, θ).

E(S) = E(N) E(X),   Var(S) = E(N) Var(X) + Var(N) E²(X)

To remember that you need to use E²(X), not E(X), in the Var(S) formula, please
note that Var(S) is dollars squared. If you use Var(N) E(X), you'll get dollars, not dollars
squared. As a result, you need to use Var(N) E²(X).

For a fixed pair (λ, θ), the conditional mean is:

E[S | (λ, θ)] = E(N | λ) E(X | θ) = λθ

The conditional variance is:

Var[S | (λ, θ)] = E(N | λ) Var(X | θ) + Var(N | λ) E²(X | θ) = λθ² + λθ² = 2λθ²

Please note that N | λ is a Poisson random variable with mean λ; X | θ is an exponential
random variable with mean θ.

EV = E_{λ,θ}{Var[S | (λ, θ)]} = E(2λθ²). Since λ and θ are independent, we have:

EV = 2 E(λ) E(θ²) = 2(1)(2) = 4

VE = Var_{λ,θ}{E[S | (λ, θ)]} = Var(λθ) = E(λ²θ²) − E²(λθ)

E(λ²θ²) = E(λ²) E(θ²) = 2(2) = 4,   E(λθ) = E(λ) E(θ) = 1(1) = 1

VE = 4 − 1² = 3

k = EV/VE = 4/3
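For readers who like to double-check the conditional-variance algebra, here is a minimal Monte Carlo sketch (my own illustration; the sampling helper is hypothetical, not from the exam) estimating EV and VE for this compound model:

```python
import math, random

random.seed(1)

def poisson1():
    # sequential-search draw from Poisson(1)
    u, p, j = random.random(), math.exp(-1.0), 0
    while u > p:
        u -= p
        j += 1
        p *= 1.0 / j
    return j

N_SIM = 200_000
cond_means, cond_vars = [], []
for _ in range(N_SIM):
    lam = random.expovariate(1.0)          # prior of lambda: exponential, mean 1
    theta = poisson1()                     # prior of theta: Poisson, mean 1
    cond_means.append(lam * theta)         # E(S | lam, theta) = lam * theta
    cond_vars.append(2 * lam * theta**2)   # Var(S | lam, theta) = 2 * lam * theta^2

ev = sum(cond_vars) / N_SIM                             # approx 4
m = sum(cond_means) / N_SIM
ve = sum(x * x for x in cond_means) / N_SIM - m * m     # approx 3
print(ev, ve, ev / ve)                                  # k approx 4/3
```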

Q10 May 2005 #6

You are given:
- Claims are conditionally independent and identically Poisson distributed with mean Θ.
- The prior distribution of Θ is:

  F(θ) = 1 − [1/(1 + θ)]^2.6,  θ > 0

- Five claims are observed.

Determine the Bühlmann credibility factor.

Solution

Let X represent the number of claims. The risk factor is Θ. We are told that X | θ is a
Poisson random variable with mean θ.

The conditional mean is: E(X | θ) = θ.  The conditional variance is: Var(X | θ) = θ.

EV = E[Var(X | θ)] = E(Θ),   VE = Var[E(X | θ)] = Var(Θ),   k = EV/VE = E(Θ)/Var(Θ)

To quickly calculate E(Θ) and Var(Θ), you'll need to recognize that the cdf of a
two-parameter Pareto random variable is F(x) = 1 − [θ/(x + θ)]^α, where x > 0, with

E(X) = θ/(α − 1),   E(X²) = 2θ²/[(α − 1)(α − 2)],   Var(X) = αθ²/[(α − 1)²(α − 2)]

Here we are given F(θ) = 1 − [1/(1 + θ)]^2.6. So Θ is a two-parameter Pareto random
variable with parameters θ = 1 and α = 2.6.

So E(Θ) = 1/(2.6 − 1) = 1/1.6 and Var(Θ) = 2.6/[(2.6 − 1)²(2.6 − 2)] = 2.6/[1.6²(0.6)]

k = E(Θ)/Var(Θ) = (1/1.6) / {2.6/[1.6²(0.6)]} = 1.6(0.6)/2.6 = 0.369

Z = n/(n + k) = 5/(5 + 0.369) = 0.93

Q11 Nov 2004 #29

You are given:
- Claim counts follow a Poisson distribution with mean λ.
- Claim sizes follow a lognormal distribution with parameters μ and σ.
- Claim counts and claim amounts are independent.
- The prior distribution has joint pdf:

  f(λ, μ, σ) = 2σ,  0 < λ < 1, 0 < μ < 1, 0 < σ < 1

Calculate Bühlmann's k credibility for aggregate losses.


Solution

Let N represent the claim counts, Xi the dollar amount of the i-th claim, and S the
aggregate losses. N | λ has a Poisson distribution with mean λ. Xi | (μ, σ) has a lognormal
distribution with parameters μ and σ. In addition, for i = 1 to N, the Xi | (μ, σ) are
independent identically distributed.

The aggregate loss is: S = Σ_{i=1}^{N} Xi

E(S) = E(N) E(X),   Var(S) = E(N) Var(X) + Var(N) E²(X)

The risk parameters are λ, μ, and σ. If we fix λ, μ, and σ, then

E(S | λ, μ, σ) = E(N | λ) E(X | μ, σ) = λ E(X | μ, σ)

Var(S | λ, μ, σ) = E(N | λ) Var(X | μ, σ) + Var(N | λ) E²(X | μ, σ)
               = λ Var(X | μ, σ) + λ E²(X | μ, σ) = λ E(X² | μ, σ)

From Tables for Exam C/4, we know the lognormal distribution has the following
moments:

E(X^k) = exp(kμ + k²σ²/2),  so  E(X) = exp(μ + σ²/2) and E(X²) = exp(2μ + 2σ²)

E(S | λ, μ, σ) = λ exp(μ + σ²/2),   Var(S | λ, μ, σ) = λ exp(2μ + 2σ²)

The global mean is:

E[λ exp(μ + σ²/2)] = ∫₀¹∫₀¹∫₀¹ λ exp(μ + σ²/2) f(λ, μ, σ) dλ dμ dσ
 = ∫₀¹ λ dλ · ∫₀¹ e^μ dμ · ∫₀¹ 2σ e^{σ²/2} dσ
 = (1/2)(e − 1) ∫₀¹ 2σ e^{σ²/2} dσ

Set 0.5σ² = y. Then σ dσ = dy and ∫₀¹ 2σ e^{σ²/2} dσ = 2∫₀^{0.5} e^y dy = 2(e^{0.5} − 1).
So the global mean is (e − 1)(e^{0.5} − 1).

VE = Var[E(S | λ, μ, σ)] = E{[E(S | λ, μ, σ)]²} − {E[E(S | λ, μ, σ)]}²

E{[E(S | λ, μ, σ)]²} = ∫₀¹∫₀¹∫₀¹ λ² exp(2μ + σ²) 2σ dλ dμ dσ
 = ∫₀¹ λ² dλ · ∫₀¹ e^{2μ} dμ · ∫₀¹ 2σ e^{σ²} dσ
 = (1/3) · (1/2)(e² − 1) · (e − 1) = (1/6)(e² − 1)(e − 1)

(Set σ² = y; then 2σ dσ = dy and ∫₀¹ 2σ e^{σ²} dσ = ∫₀¹ e^y dy = e − 1.)

VE = (1/6)(e² − 1)(e − 1) − [(e − 1)(e^{0.5} − 1)]² = 0.5872

EV = E[Var(S | λ, μ, σ)] = ∫₀¹∫₀¹∫₀¹ λ exp(2μ + 2σ²) 2σ dλ dμ dσ
 = ∫₀¹ λ dλ · ∫₀¹ e^{2μ} dμ · ∫₀¹ 2σ e^{2σ²} dσ
 = (1/2) · (1/2)(e² − 1) · (1/2)(e² − 1) = (1/8)(e² − 1)² = 5.103

k = EV/VE = 5.103/0.5872 = 8.69

Shortcut to avoid the hard-core integration seen above.

The joint pdf is f(λ, μ, σ) = 2σ = a(λ) b(μ) c(σ), where a(λ) = 1, b(μ) = 1, and
c(σ) = 2σ. In addition, λ, μ, and σ lie in the cube 0 < λ < 1, 0 < μ < 1, 0 < σ < 1.

Consequently, λ, μ, and σ are independent random variables with the following
marginal pdfs:

f_λ(λ) = 1, 0 < λ < 1;   f_μ(μ) = 1, 0 < μ < 1;   f_σ(σ) = 2σ, 0 < σ < 1.

E(λ) = 1/2,   E(e^μ) = ∫₀¹ e^μ dμ = e − 1,   E(e^{σ²/2}) = ∫₀¹ e^{σ²/2} 2σ dσ = 2(e^{0.5} − 1)

E[E(S | λ, μ, σ)] = E[λ exp(μ + σ²/2)] = E(λ) E(e^μ) E(e^{σ²/2})
 = (1/2)(e − 1) · 2(e^{0.5} − 1) = (e − 1)(e^{0.5} − 1)

E(λ²) = ∫₀¹ λ² dλ = 1/3,   E(e^{2μ}) = (1/2)(e² − 1),
E(e^{σ²}) = ∫₀¹ e^{σ²} 2σ dσ = e − 1,   E(e^{2σ²}) = ∫₀¹ e^{2σ²} 2σ dσ = (1/2)(e² − 1)

EV = E[λ exp(2μ + 2σ²)] = E(λ) E(e^{2μ}) E(e^{2σ²}) = (1/2)·(1/2)(e² − 1)·(1/2)(e² − 1)
 = (1/8)(e² − 1)² = 5.103

VE = E[λ² exp(2μ + σ²)] − {E[λ exp(μ + σ²/2)]}²
 = E(λ²) E(e^{2μ}) E(e^{σ²}) − [(e − 1)(e^{0.5} − 1)]²
 = (1/6)(e² − 1)(e − 1) − [(e − 1)(e^{0.5} − 1)]² = 0.5872

k = EV/VE = 5.103/0.5872 = 8.69

Please note:
The joint pdf f(λ, μ, σ) = a(λ) b(μ) c(σ) alone doesn't guarantee that λ, μ, and σ are
independent. The additional requirement for λ, μ, and σ to be independent is that λ, μ,
and σ lie in a cube A < λ < B, C < μ < D, E < σ < F, where A, B, C, D, E, and F are
constants. For example, say A < λ < B, C < μ < D, e(λ) < σ < f(λ). Even if
f(λ, μ, σ) = a(λ) b(μ) c(σ), then λ, μ, and σ are NOT independent.
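The factorization shortcut is easy to verify numerically. A minimal sketch (my own check, reusing scipy as in Q6; not part of the exam solution) recomputes EV and VE from the one-dimensional moments:

```python
import math
from scipy.integrate import quad

e_lam, e_lam2 = 0.5, 1.0 / 3.0                            # uniform(0,1) moments
e_exp_mu = quad(lambda m: math.exp(m), 0, 1)[0]           # e - 1
e_exp_2mu = quad(lambda m: math.exp(2 * m), 0, 1)[0]      # (e^2 - 1)/2
dens = lambda s: 2 * s                                    # marginal pdf of sigma
e_half = quad(lambda s: math.exp(s**2 / 2) * dens(s), 0, 1)[0]  # 2(e^0.5 - 1)
e_one = quad(lambda s: math.exp(s**2) * dens(s), 0, 1)[0]       # e - 1
e_two = quad(lambda s: math.exp(2 * s**2) * dens(s), 0, 1)[0]   # (e^2 - 1)/2

ev = e_lam * e_exp_2mu * e_two                                      # 5.103
ve = e_lam2 * e_exp_2mu * e_one - (e_lam * e_exp_mu * e_half) ** 2  # 0.5872
print(ev, ve, ev / ve)                                              # k = 8.69
```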

Q12 Nov 2004 #25

You are given:
- A portfolio of independent risks is divided into two classes.
- Each class contains the same number of risks.
- For each risk in Class 1, the number of claims per year follows a Poisson distribution with mean 5.
- For each risk in Class 2, the number of claims per year follows a binomial distribution with m = 8 and q = 0.55.
- A randomly selected risk has three claims in Year 1, r claims in Year 2, and four claims in Year 3.

The Bühlmann credibility estimate for the number of claims in Year 4 for this risk is
4.6019. Determine r.
Solution

Risk | P(Risk) | Distribution of X | Risk       | E(X | Risk)   | Var(X | Risk)
#1   | 0.5     | Poisson with mean 5            | 5             | 5
#2   | 0.5     | Binomial, m = 8 and q = 0.55   | 8(0.55) = 4.4 | 8(0.55)(0.45) = 1.98

μ = (1/2)(5 + 4.4) = 4.7,   EV = (1/2)(5 + 1.98) = 3.49

VE = (5 − 4.4)²(0.5²) = 0.09

k = EV/VE = 3.49/0.09 = 38.78

P = (k μ + Σ Xi)/(n + k):   4.6019 = [38.78(4.7) + (3 + r + 4)] / (3 + 38.78),   r = 3

Q13 Nov 2001 #23

You are given the following information on claim frequency of auto accidents for
individual drivers:

        | Business Use                      | Pleasure Use
        | Expected claims | Claim variance  | Expected claims | Claim variance
Rural   | 1.0             | 0.5             | 1.5             | 0.8
Urban   | 2.0             | 1.0             | 2.5             | 1.0
Total   | 1.8             | 1.06            | 2.3             | 1.12

You are given:
- Each driver's claim experience is independent of every other driver's.
- There are an equal number of business and pleasure use drivers.

Determine the Bühlmann credibility factor for a single driver.
Solution
The key to solving this problem is correctly identifying the risk classes. There are four risk
classes:

Θ = (BR, BU, PR, PU)
BR = Business & Rural Use, BU = Business & Urban Use
PR = Pleasure & Rural Use, PU = Pleasure & Urban Use

Next, we need to calculate the probability of Rural Use and Urban Use.

        | Expected claims
Rural   | 1.0
Urban   | 2.0
Total   | 1.8

P(R)(1.0) + P(U)(2.0) = 1.8;   P(R) + P(U) = 1

Solving these two equations, we get: P(R) = 0.2, P(U) = 0.8.

Next, we list the probability for each class:

          | Business Use 0.5       | Pleasure Use 0.5
Rural 0.2 | P(BR) = 0.2(0.5) = 0.1 | P(PR) = 0.2(0.5) = 0.1
Urban 0.8 | P(BU) = 0.8(0.5) = 0.4 | P(PU) = 0.8(0.5) = 0.4

Let X represent the claim frequency of auto accidents of a randomly selected driver.

θ  | E(X | θ) | Var(X | θ) | P(θ) | E²(X | θ)
BR | 1.0      | 0.5        | 0.1  | 1.0
BU | 2.0      | 1.0        | 0.4  | 4.0
PR | 1.5      | 0.8        | 0.1  | 2.25
PU | 2.5      | 1.0        | 0.4  | 6.25

μ = E[E(X | θ)] = 1.0(0.1) + 2.0(0.4) + 1.5(0.1) + 2.5(0.4) = 2.05

EV = E[Var(X | θ)] = 0.5(0.1) + 1.0(0.4) + 0.8(0.1) + 1.0(0.4) = 0.93

E[E²(X | θ)] = 1.0(0.1) + 4.0(0.4) + 2.25(0.1) + 6.25(0.4) = 4.425

VE = Var[E(X | θ)] = E[E²(X | θ)] − {E[E(X | θ)]}² = 4.425 − 2.05² = 0.2225

k = EV/VE = 0.93/0.2225 = 4.18

The Bühlmann credibility factor for a single driver is:

Z = n/(n + k) = 1/(1 + 4.18) = 0.193
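Q5, Q7, Q12, and Q13 all follow the same discrete-prior recipe: tabulate P(θ), E(X | θ), and Var(X | θ), then form μ, EV, VE, and k. A small helper (my own illustration, not from the manual) captures the pattern:

```python
def buhlmann_k(classes):
    """classes: list of (prob, cond_mean, cond_var) triples."""
    mu = sum(p * m for p, m, _ in classes)
    ev = sum(p * v for p, _, v in classes)
    ve = sum(p * m**2 for p, m, _ in classes) - mu**2
    return mu, ev, ve, ev / ve

# Q13: the four driver classes (prob, expected claims, claim variance)
mu, ev, ve, k = buhlmann_k([(0.1, 1.0, 0.5), (0.4, 2.0, 1.0),
                            (0.1, 1.5, 0.8), (0.4, 2.5, 1.0)])
print(mu, ev, ve, k)          # 2.05, 0.93, 0.2225, 4.18
print(1 / (1 + k))            # Z = 0.193 for a single driver
```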

Chapter 6

Bühlmann-Straub credibility model

In the Bühlmann credibility model, we focus on one policyholder. We know that this
policyholder has incurred claim amounts X1, X2, …, Xn in Years 1, 2, …, n
respectively. We want to estimate his conditional mean claim amount in Year n + 1:

E(X_{n+1} | X1, X2, …, Xn)

We found that the conditional mean claim is E(X_{n+1} | X1, X2, …, Xn) ≈ (1 − Z)μ + Z X̄.

Now we move from the Bühlmann credibility world to a more complex one, the Bühlmann-Straub
credibility world. Instead of looking at only one policyholder, we look at a group
of policyholders.

Context of the Bühlmann-Straub credibility model


In Year 1, there are m1 policyholders. The 1st policyholder has incurred claim amount X(1, t = 1).
The 2nd policyholder has incurred X(2, t = 1). And the m1-th policyholder has
incurred X(m1, t = 1).

In Year 2, there are m2 policyholders. The 1st policyholder has incurred X(1, t = 2).
The 2nd policyholder has incurred X(2, t = 2). And the m2-th policyholder has
incurred X(m2, t = 2).

In Year t, there are mt policyholders. The 1st policyholder has incurred X(1, t).
The 2nd policyholder has incurred X(2, t). And the mt-th policyholder has
incurred X(mt, t).

In Year n, there are mn policyholders. The 1st policyholder has incurred X(1, t = n).
The 2nd policyholder has incurred X(2, t = n). And the mn-th policyholder has
incurred X(mn, t = n).

In Year n + 1, there are m_{n+1} policyholders.

Question: In Year n + 1, how much renewal premium should each of the m_{n+1}
policyholders pay?

Assumptions of the Bühlmann-Straub credibility model

All the observed policyholders belong to the same sub-risk class θ. That is, the m1
policyholders in Year 1, the m2 policyholders in Year 2, …, the mn policyholders in Year n, and
the m_{n+1} policyholders in Year n + 1 all belong to the same sub-risk class θ.
We don't know the specific value of θ. All we know is that θ takes on a random value
from Θ = {θ1, θ2, …}.

Given θ, all the claims throughout the n + 1 years, X(1, t = 1), X(2, t = 2), …,
X(mn, t = n), X(m_{n+1}, t = n + 1), are independent identically distributed with a common
conditional mean E[X(i, t) | θ] = μ(θ) and a common conditional variance
Var[X(i, t) | θ] = σ²(θ).

One approach is to calculate the renewal premium for Year n + 1 from scratch. An
easier approach is to convert the Bühlmann-Straub credibility problem into a standard
Bühlmann credibility problem. I'll do both.

First, let's look at the problem from the Bühlmann world. In Year 1, m1 policyholders
have incurred a total claim amount of Σ_{i=1}^{m1} X(i, t = 1). Because these m1 policyholders
belong to the same, unknown, sub-risk class θ, there's no distinction between any two of these
m1 policyholders. All these m1 policyholders are just photocopies of one another.

So

"In Year 1, m1 policyholders have incurred a total of Σ_{i=1}^{m1} X(i, t = 1) claims"

is the same as

"In the first m1 years, one policyholder has incurred a total of Σ_{i=1}^{m1} X(i, t = 1) claims."

In either case, the total claim amount is Σ_{i=1}^{m1} X(i, t = 1); the average claim per policyholder
per year is (1/m1) Σ_{i=1}^{m1} X(i, t = 1).

Similarly,

"In Year 2, m2 policyholders have incurred a total of Σ_{i=1}^{m2} X(i, t = 2) claims"

is the same as

"In the next m2 years, the policyholder (who has incurred Σ_{i=1}^{m1} X(i, t = 1) in the first m1
years) has incurred a total of Σ_{i=1}^{m2} X(i, t = 2) claims."

So on and so forth.

Now the original Bühlmann-Straub problem becomes a standard Bühlmann problem:

- In the first m1 years, one policyholder has incurred total claims Σ_{i=1}^{m1} X(i, t = 1).
- In the next m2 years, the policyholder has incurred total claims Σ_{i=1}^{m2} X(i, t = 2).
- In the next m3 years, the policyholder has incurred total claims Σ_{i=1}^{m3} X(i, t = 3).
- …
- In the next mn years, the policyholder has incurred total claims Σ_{i=1}^{mn} X(i, t = n).

This is the same as:

In m = m1 + m2 + … + mn years, one policyholder has incurred total claims Σ_{t=1}^{n} Σ_{i=1}^{mt} X(i, t).

Then the expected claim cost in Year m + 1 for one policyholder can be calculated using
the Bühlmann credibility formula:

P = Z X̄ + (1 − Z)μ, where

X̄ = (total observed claims)/(total # of observed years) = (1/m) Σ_{t=1}^{n} Σ_{i=1}^{mt} X(i, t)

Z = (# of observation years)/(# of observation years + k) = m/(m + k),   k = E[σ²(Θ)]/Var[μ(Θ)]

There's nothing new under the sun in the Bühlmann-Straub credibility model. Every
problem about the Bühlmann-Straub credibility model can be solved using the Bühlmann
credibility model.

Actually, we can have a unified formula for the Bühlmann-Straub and the Bühlmann
credibility models:

P = Z X̄ + (1 − Z)μ

X̄ = (observed claim dollar amounts)/(# of observed exposures, measured on the insured-year basis)

Z = (# of observed exposures)/(# of observed exposures + k),   k = E[σ²(Θ)]/Var[μ(Θ)]

In this unified formula, the observed exposure is measured on the insured-year basis. For
example, if one policyholder has incurred a $500 claim in one year, the exposure is:

1 insured × 1 year = 1 insured-year

If the policyholder has incurred a $500 claim over a 2-year period, then the exposure is:

1 insured × 2 years = 2 insured-years

Let's see how the unified formula works for the Bühlmann and the Bühlmann-Straub
credibility models. In the Bühlmann model, we have an n-year claim history of one
policyholder. So the observed exposure is:

1 insured × n years = n insured-years.

Then the formula becomes:

Z = n/(n + k),   X̄ = (X1 + X2 + … + Xn)/n

In the Bühlmann-Straub model, we have observed m = m1 + m2 + … + mn policyholders.
However, we have only 1-year claim data for each of these m policyholders. So the total
# of exposures is:

m insureds × 1 year = m insured-years

Then the unified formula becomes:

Z = m/(m + k),   X̄ = (1/m) Σ_{t=1}^{n} Σ_{i=1}^{mt} X(i, t)
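Since the unified formula is just an exposure-weighted average, it is straightforward to code. A minimal sketch (my own illustration; the numbers in the demo call are hypothetical) that works for both the Bühlmann and the Bühlmann-Straub settings:

```python
def credibility_premium(total_claims, exposures, mu, ev, ve):
    """Unified Buhlmann / Buhlmann-Straub premium per exposure.

    total_claims: observed claim dollars; exposures: insured-years;
    mu: global mean per exposure; ev = E[process variance];
    ve = Var[hypothetical mean].
    """
    k = ev / ve
    z = exposures / (exposures + k)
    x_bar = total_claims / exposures
    return z * x_bar + (1 - z) * mu

# Buhlmann: one insured observed n = 5 years -> 5 insured-years.
# Buhlmann-Straub: 5 insureds observed 1 year each -> also 5 insured-years.
print(credibility_premium(total_claims=12.0, exposures=5, mu=2.0, ev=4.0, ve=3.0))
```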

Now you know how to convert a Bühlmann-Straub problem into a Bühlmann problem
and how to use a unified formula for the Bühlmann-Straub model and the Bühlmann
model. Next, I'll derive the Bühlmann-Straub credibility formula from scratch. First, let's
create an average policyholder and reorganize each year's claim data from the viewpoint
of this average policyholder.

Let's look at the claim history data in the Bühlmann-Straub model from the average
policyholder's point of view:

In Year 1, m1 policyholders have incurred total claims Σ_{i=1}^{m1} X(i, t = 1). So the average
policyholder has incurred X̄1 = (1/m1) Σ_{i=1}^{m1} X(i, t = 1).

In Year 2, the average policyholder has incurred X̄2 = (1/m2) Σ_{i=1}^{m2} X(i, t = 2).

In Year t, the average policyholder has incurred X̄t = (1/mt) Σ_{i=1}^{mt} X(i, t).

In Year n, the average policyholder has incurred X̄n = (1/mn) Σ_{i=1}^{mn} X(i, t = n).

Our goal is to estimate E(X̄_{n+1}), the average claim in Year n + 1. We'll use a + Z X̄ to
approximate E(X̄_{n+1}), where X̄ = Σ_{i=1}^{n} (mi/m) X̄i. We'll minimize
E{[a + Z X̄ − E(X̄_{n+1})]²}, which gives

Z = Cov(X̄, X̄_{n+1}) / Var(X̄),   a = (1 − Z)μ

E(X̄t | θ) = E[(1/mt) Σ_{i=1}^{mt} X(i, t) | θ] = (1/mt) Σ_{i=1}^{mt} E[X(i, t) | θ] = μ(θ)

Var(X̄t | θ) = Var[(1/mt) Σ_{i=1}^{mt} X(i, t) | θ] = (1/mt²) Σ_{i=1}^{mt} Var[X(i, t) | θ]
           = (1/mt²) mt σ²(θ) = σ²(θ)/mt

Conditional on θ, the X̄i are independent, so

Var(X̄ | θ) = Var[Σ_{i=1}^{n} (mi/m) X̄i | θ] = (1/m²) Σ_{i=1}^{n} mi² Var(X̄i | θ)
           = (1/m²) Σ_{i=1}^{n} mi² σ²(θ)/mi = (1/m²) σ²(θ) Σ_{i=1}^{n} mi = σ²(θ)/m

Here's a quick way to find E(X̄) and Var(X̄) without using the complex summation symbols
above. The # of policyholders observed is m = m1 + m2 + … + mn. The claims incurred by
these m policyholders, given θ, are independent identically distributed with a common
mean μ(θ) and a common variance σ²(θ). As a result, the average claim amount X̄ incurred
by these m policyholders, given θ, has mean μ(θ) and variance σ²(θ)/m:

E(X̄ | θ) = μ(θ),   Var(X̄ | θ) = σ²(θ)/m

E(X̄) = E[E(X̄ | θ)] = E[μ(θ)] = μ,   E(X̄_{n+1}) = E[E(X̄_{n+1} | θ)] = E[μ(θ)] = μ

Var(X̄) = E[Var(X̄ | θ)] + Var[E(X̄ | θ)] = (1/m) E[σ²(θ)] + Var[μ(θ)]

Cov(X̄, X̄_{n+1}) = E(X̄ X̄_{n+1}) − E(X̄) E(X̄_{n+1})

E(X̄ X̄_{n+1}) = E{E(X̄ X̄_{n+1} | θ)} = E{E(X̄ | θ) E(X̄_{n+1} | θ)} = E[μ²(θ)]

Cov(X̄, X̄_{n+1}) = E[μ²(θ)] − μ² = Var[μ(θ)]

Z = Cov(X̄, X̄_{n+1}) / Var(X̄) = Var[μ(θ)] / {Var[μ(θ)] + (1/m) E[σ²(θ)]}
  = m / {m + E[σ²(θ)]/Var[μ(θ)]}

Summary of the Bühlmann-Straub credibility model

            | Period 1 | … | Period n
Exposure    | m1       | … | mn

Hypothetical mean for risk θ per unit exposure: μ(θ) = E(X̄1 | θ) = E(X̄2 | θ) = … = E(X̄n | θ)

Process variance for risk θ: Var(X̄1 | θ) = σ²(θ)/m1, …, Var(X̄n | θ) = σ²(θ)/mn

Then

E(X̄_{n+1} | X̄1, X̄2, …, X̄n) = Z X̄ + (1 − Z)μ

X̄ = Σ_{i=1}^{n} (mi/m) X̄i,   Z = m/(m + k),   k = E[σ²(Θ)]/Var[μ(Θ)]

Remember that X̄1, X̄2, …, X̄n are claims incurred by an artificially created average
policyholder in Years 1, 2, …, n respectively. In addition, Z ≠ n/(n + k). Don't make the
common mistake of writing Z = n/(n + k). The formula Z = n/(n + k) is not good for the
Bühlmann-Straub credibility model.

Key point
In the Bühlmann-Straub credibility model, what matters is the total exposure m and the
historical average claim per exposure X̄. The individual claim amounts X(i, t) don't
matter.

For example, everything else being equal, the following two cases have the same
Bühlmann-Straub credibility estimate.


Case #1
m1 = 2, X(1, t = 1) = 7, X(2, t = 1) = 1;
m2 = 3, X(1, t = 2) = 0, X(2, t = 2) = 4, X(3, t = 2) = 2.

Case #2
m1 = 1, X(1, t = 1) = 9;
m2 = 4, X(1, t = 2) = 3, X(2, t = 2) = 0.6, X(3, t = 2) = 1, X(4, t = 2) = 0.4.

In both cases,
- the total exposure is m1 + m2 = 5;
- the total claim dollar amount is 14 = 7+1+0+4+2 = 9+3+0.6+1+0.4;
- the average claim per insured per year is 14/5 = 2.8.

General Bühlmann-Straub credibility model (more realistic)

This is a minor point. If you don't care, just skip it.

Loss Models mentions Hewitt's version of the Bühlmann-Straub credibility model.
This model assumes that the X̄i, the average claims, given the sub-risk class θ, are
independently distributed with a common mean E(X̄i | θ) = μ(θ) and a variance

Var(X̄i | θ) = w(θ) + v(θ)/mi

So the difference between the general and the standard Bühlmann-Straub models is the
conditional variance assumption. Hewitt's assumption is Var(X̄i | θ) = w(θ) + v(θ)/mi;
the standard Bühlmann-Straub assumption is Var(X̄i | θ) = σ²(θ)/mi.

Then Loss Models derives the formulas:

w = E[w(Θ)],   v = E[v(Θ)],   w + v/mj = E[Var(X̄j | Θ)]

m* = Σ_{j=1}^{n} mj/(v + w mj) = Σ_{j=1}^{n} 1/(w + v/mj) = Σ_{j=1}^{n} 1/E[Var(X̄j | Θ)]

P = Z X̄ + (1 − Z)μ,   Z = am*/(1 + am*),

X̄ = Σ_{j=1}^{n} {X̄j / E[Var(X̄j | Θ)]} ÷ Σ_{j=1}^{n} {1/E[Var(X̄j | Θ)]}

If m1 = m2 = … = mn = m, then

m* = Σ_{j=1}^{n} 1/(w + v/m) = n/(w + v/m)

Z = am*/(1 + am*) = 1/[1 + 1/(a m*)] = 1/{1 + (1/a)[(w + v/m)/n]} = n/[n + (w + v/m)/a]

Two points to remember:

First, X̄ weights each X̄j inversely proportionally to E[Var(X̄j | Θ)], the
expected process variance. The higher the expected process variance of X̄j, the less
weight is assigned to X̄j. This way, X̄ will have the minimum variance. This point is
explained in the study note by Curtis Gary Dean. Refer to that study note if you want to
find out more.

Next, let's look at the crazy formula Z = am*/(1 + am*). To get comfortable with this formula,
look at the basic formula Z = n/(n + v/a) = 1/[1 + (1/a)(v/n)]. Let's compare these two formulas:

Z = n/(n + v/a) = 1/[1 + (1/a)(v/n)],    Z = am*/(1 + am*) = 1/[1 + (1/a)(1/m*)]

Now you see that these two formulas are similar. If Var(X̄i | θ) = σ²(θ) and mi = 1 as in
the Bühlmann model, then

m* = Σ_{j=1}^{n} 1/E[Var(X̄j | Θ)] = Σ_{j=1}^{n} 1/v = n/v,

Z = 1/[1 + (1/a)(1/m*)] = 1/[1 + (1/a)(v/n)] = n/(n + v/a)

This is the Bühlmann credibility formula Z = n/(n + v/a).

The third point. Loss Models points out that in this version of the model, as mj
approaches infinity, the credibility factor Z won't approach one. Let's take a look at
this.

Var(X̄i | θ) = w(θ) + v(θ)/mi.   When mj → ∞, Var(X̄i | θ) → w(θ), so

m* = Σ_{j=1}^{n} 1/w = n/w,   Z = 1/[1 + (1/a)(1/m*)] = 1/[1 + (1/a)(w/n)] < 1

Compare this with the Bühlmann model or the Bühlmann-Straub model. In the Bühlmann
model, as the number of exposures n approaches ∞,

Z = n/(n + v/a) = 1/[1 + (1/a)(v/n)] → 1

In the Bühlmann-Straub model, w = 0. So as mj → ∞,

Var(X̄i | θ) = σ²(θ)/mi → 0,   m* = Σ_{j=1}^{n} 1/E[Var(X̄j | Θ)] → ∞,
Z = 1/[1 + (1/a)(1/m*)] → 1

Finally, Loss Models has a special case of the general Bühlmann-Straub model. In this
special case, Var(X̄i | θ) = w(θ) + v(θ)/mi as in the general model above.

What's new is Var[μ(Θ)] = a + b/m, as opposed to Var[μ(Θ)] = a in the Bühlmann
model and the Bühlmann-Straub model. Here m = Σ_{j=1}^{n} mj represents the total exposure.

As you can see, this special case just changes Var[μ(Θ)] = a to Var[μ(Θ)] = a + b/m. In
other words, this special case just changes a to a + b/m. Loss Models points out that to find
the credibility factor for this special case, we just need to change a to a + b/m:

Z = am*/(1 + am*)   becomes   Z = (a + b/m)m* / [1 + (a + b/m)m*]

How to tackle the Bühlmann-Straub premium problem

Most likely, Exam C won't have problems on the generalized version of the Bühlmann-Straub
model. So you should focus on the standard Bühlmann-Straub model. To tackle
the standard Bühlmann-Straub model, you can use any of the following 3 approaches:

- Use the Bühlmann-Straub model formula Z = m/(m + k).
- Convert the Bühlmann-Straub problem into a Bühlmann problem. So instead of having m = Σ mj policyholders, we have m years of observation of one policyholder. Then Z = (# of observation years)/(# of observation years + k) = m/(m + k).
- Use the unified formula (without converting into the Bühlmann model): Z = (# of observed exposures)/(# of observed exposures + k) = m/(m + k).

Sample SOA Problems

Nov 2001 #26
You are given the following data on large business policyholders:
- Losses for each employee of a given policyholder are independent and have a common mean and variance.
- The overall average loss per employee for all policyholders is 20.
- The variance of the hypothetical means is 40.
- The expected value of the process variance is 8,000.
- The following experience is observed for a randomly selected policyholder:

Year | Average loss per employee | Number of employees
1    | 15                        | 800
2    | 10                        | 600
3    | 5                         | 400

Determine the Bühlmann-Straub credibility premium per employee for this policyholder.
Solution
Method 1: Use the Bühlmann-Straub credibility premium formula

The global mean: μ = 20.
The expected process variance: EV = 8,000.
The variance of the hypothetical means: VE = 40.

So k = EV/VE = 8,000/40 = 200

m = m1 + m2 + m3 = 800 + 600 + 400 = 1,800

Z = m/(m + k) = 1,800/(1,800 + 200) = 0.9

X̄ = (1/m) Σ_{i=1}^{3} mi X̄i = [800(15) + 600(10) + 400(5)]/1,800 = 11.111

P = Z X̄ + (1 − Z)μ = 0.9(11.111) + 0.1(20) = 12

Alternatively,

P = [k μ + Σ_{i=1}^{3} mi X̄i]/(m + k) = [200(20) + 800(15) + 600(10) + 400(5)]/(1,800 + 200) = 12

Method 2: Convert the Bühlmann-Straub problem into a Bühlmann problem

We convert this table

Year | Average loss per employee | Number of employees
1    | 15                        | 800
2    | 10                        | 600
3    | 5                         | 400

into

Year            | Total loss | Number of employees
First 800 years | 15 × 800   | 1
Next 600 years  | 10 × 600   | 1
Next 400 years  | 5 × 400    | 1

The above two tables are essentially the same. In both tables, the average loss per
employee per year is

X̄ = [800(15) + 600(10) + 400(5)]/1,800 = 11.111

After the conversion, the # of observation years is n = 800 + 600 + 400 = 1,800. This seems
crazy, but it is merely a conceptual tool for us to transform a Bühlmann-Straub problem
into a Bühlmann problem.

Using the Bühlmann premium formulas, we have:

Z = n/(n + k) = 1,800/(1,800 + 200) = 0.9,   P = Z X̄ + (1 − Z)μ = 0.9(11.111) + 0.1(20) = 12

Method 3: Use the unified credibility premium formula

In this method, we don't care about the distinction between the Bühlmann and the
Bühlmann-Straub models. We just use the following unified formulas:

P = Z X̄ + (1 − Z)μ

X̄ = (observed claims)/(# of observed exposures, measured on the insured-year basis)

Z = (# of observed exposures)/(# of observed exposures + k),   k = E[σ²(Θ)]/Var[μ(Θ)]

Z = 1,800/(1,800 + 200) = 0.9,   X̄ = [800(15) + 600(10) + 400(5)]/1,800 = 11.111

P = Z X̄ + (1 − Z)μ = 0.9(11.111) + 0.1(20) = 12


May 2001 #23
You are given the following information about a single risk:
- The risk has m exposures in each year.
- The risk is observed for n years.
- The variance of the hypothetical means is a.
- The expected value of the annual process variance is w + v/m.

Determine the limit of the Bühlmann-Straub credibility factor as m approaches infinity.

Solution
A naïve approach is to use the Bühlmann credibility formula:

Z = n/(n + k) = n/(n + EV/VE) = n/[n + (w + v/m)/a]

Then as m approaches infinity, w + v/m approaches w and Z approaches Z = n/(n + w/a).

Incidentally, this leads to the correct answer. However, this line of thinking is
problematic. As explained earlier, in the Bühlmann-Straub model, the credibility factor is

Z = m/(m + k) = Σ mi / (Σ mi + k),   not Z = n/(n + k).

The correct logic is to realize that this problem involves the general Bühlmann-Straub
credibility model where Var(X̄i | θ) = w(θ) + v(θ)/mi. We are told that m1 = m2 = … = mn = m.

As derived earlier, when m1 = m2 = … = mn = m, we have:

Z = am*/(1 + am*) = 1/[1 + 1/(a m*)] = n/[n + (w + v/m)/a]

As m → ∞, v/m → 0, so Z → n/(n + w/a).

Nov 2004 #9
Members of three classes of insureds can have 0, 1, or 2 claims, with the following
probabilities:

        # of claims
Class | 0   | 1   | 2
I     | 0.9 | 0.0 | 0.1
II    | 0.8 | 0.1 | 0.1
III   | 0.7 | 0.2 | 0.1

A class is chosen at random, and varying # of insureds from that class are observed over
2 years, as shown below:

Year | # of insureds | # of claims
1    | 20            | 7
2    | 30            | 10

Determine the Bühlmann-Straub credibility estimate of the number of claims in Year 3
for 35 insureds.

Solution

Method 1: Use the Bühlmann-Straub credibility premium formula

Class | P(θ) | E(X | θ) | Var(X | θ)
I     | 1/3  | 0.2 (1)  | 0.36 (2)
II    | 1/3  | 0.3      | 0.41
III   | 1/3  | 0.4      | 0.44

Note (1)  0.2 = 0(0.9) + 1(0.0) + 2(0.1)
     (2)  0.36 = 0²(0.9) + 1²(0.0) + 2²(0.1) − 0.2²

m = m1 + m2 = 20 + 30 = 50,   μ = (1/3)(0.2 + 0.3 + 0.4) = 0.3

VE = Var[E(X | θ)] = (1/3)(0.2² + 0.3² + 0.4²) − 0.3² = 0.00667

EV = E[Var(X | θ)] = (1/3)(0.36 + 0.41 + 0.44) = 0.4033

k = EV/VE = 0.4033/0.00667 = 60.5,   Z = m/(m + k) = 50/(50 + 60.5) = 0.4525,
X̄ = (7 + 10)/50 = 0.34

P = 0.4525(0.34) + (1 − 0.4525)(0.3) = 0.318

Alternatively, after we find k, we can skip Z:

P = [k μ + Σ mi X̄i]/(m + k) = [60.5(0.3) + (7 + 10)]/(50 + 60.5) = 0.318

Method 2: Convert the Bühlmann-Straub problem into a Bühlmann problem

We convert this table

Year | # of insureds | # of claims
1    | 20            | 7
2    | 30            | 10

into

Year           | Total # of claims | Number of insureds
First 20 years | 7                 | 1
Next 30 years  | 10                | 1

The above two tables are essentially the same. In both tables, the average loss per insured
per year is X̄ = (7 + 10)/50 = 0.34.

After the conversion, the # of observation years is n = 20 + 30 = 50. Using the Bühlmann
premium formulas, we have:

Z = n/(n + k) = 50/(50 + 60.5) = 0.4525,   P = 0.4525(0.34) + (1 − 0.4525)(0.3) = 0.318

Method 3: Use the unified credibility premium formula

In this method, we don't care about the distinction between the Bühlmann and the
Bühlmann-Straub models.

Z = 50/(50 + 60.5) = 0.4525,   X̄ = (7 + 10)/50 = 0.34

P = 0.4525(0.34) + (1 − 0.4525)(0.3) = 0.318

This is the estimate per insured; for 35 insureds, the estimated number of claims in
Year 3 is 35(0.318) = 11.13.


Nov 2002 #32
You are given four classes of insureds, each of whom may have zero or one claim, with
the following probabilities:

        # of claims
Class | 0   | 1
I     | 0.9 | 0.1
II    | 0.8 | 0.2
III   | 0.5 | 0.5
IV    | 0.1 | 0.9

A class is selected at random, and four insureds are selected at random from the class.
The total number of claims is two. If five insureds are selected at random from the same
class, estimate the total number of claims using Bühlmann-Straub credibility.
Solution

You can use any one of the three methods. Here I use the Bühlmann-Straub credibility
formula Z = m/(m + k).

Class | P(θ) | E(X | θ) | Var(X | θ)
I     | 1/4  | 0.1      | 0.09 (1)
II    | 1/4  | 0.2      | 0.16
III   | 1/4  | 0.5      | 0.25
IV    | 1/4  | 0.9      | 0.09

Note (1)  0.09 = (1 − 0)²(0.1)(0.9). Use the following shortcut:

If X = a with probability p and X = b with probability q = 1 − p, then
E(X) = ap + bq and Var(X) = (a − b)² pq.

μ = (1/4)(0.1 + 0.2 + 0.5 + 0.9) = 0.425

EV = (1/4)(0.09 + 0.16 + 0.25 + 0.09) = 0.1475

VE = (1/4)(0.1² + 0.2² + 0.5² + 0.9²) − 0.425² = 0.096875

k = EV/VE = 0.1475/0.096875 = 1.5226,   Z = m/(m + k) = 4/(4 + 1.5226) = 0.7243,
X̄ = 2/4 = 0.5

P = 0.7243(0.5) + (1 − 0.7243)(0.425) = 0.4793  (this is for one insured)

The credibility premium for the 5 insureds is 0.4793 × 5 = 2.4

Nov 2003 #27

You are given:
- The # of claims incurred in a month by any insured has a Poisson distribution with mean λ.
- The claim frequencies of different insureds are independent.
- The prior distribution of λ is gamma with probability density function

  f(λ) = (100λ)⁶ e^{−100λ} / (120λ)

Month | # of insureds | # of claims
1     | 100           | 6
2     | 150           | 8
3     | 200           | 11
4     | 300           | ?

Determine the Bühlmann-Straub credibility estimate of the # of claims in Month 4.


Solution
This time, let's solve it by converting the Bühlmann-Straub credibility problem into a
Bühlmann credibility problem.

This table

Month | # of insureds | # of claims
1     | 100           | 6
2     | 150           | 8
3     | 200           | 11

is the same as

Month     | # of insureds | # of claims
First 100 | 1             | 6
Next 150  | 1             | 8
Next 200  | 1             | 11

So the total number of observation years is n = 100 + 150 + 200 = 450. The total # of
observed claims is 6 + 8 + 11 = 25. So X̄ = 25/450.

Let N represent the # of claims incurred in a month by a randomly chosen policyholder.
Then N | λ is Poisson with mean λ. So the risk random variable is λ, and

E(N | λ) = Var(N | λ) = λ

The prior f(λ) = (100λ)⁶ e^{−100λ}/(120λ) is a gamma density with α = 6 and θ = 1/100.

μ = EV = E[Var(N | λ)] = E(λ) = αθ = 6(1/100) = 0.06

VE = Var[E(N | λ)] = Var(λ) = α(α + 1)θ² − (αθ)² = 6(7)(1/100)² − 0.06² = 0.0006

k = EV/VE = 0.06/0.0006 = 100

Z = n/(n + k) = 450/(450 + 100) = 0.818

P = Z X̄ + (1 − Z)μ = 0.818(25/450) + (1 − 0.818)(0.06) = 0.0564

The estimated # of claims in Month 4 for the 300 insureds: 300(0.0564) = 16.9

Nov 2005 #22

You are given:
- A region is comprised of three territories. Claims experience for Year 1 is as follows:

  Territory | # of insureds | # of claims
  A         | 10            | 4
  B         | 20            | 5
  C         | 30            | 3

- The # of claims for each insured each year has a Poisson distribution.
- Each insured in a territory has the same expected claim frequency.
- The # of insureds is constant over time for each territory.

Determine the Bühlmann-Straub empirical Bayes estimate of the credibility factor Z for
Territory A.

Solution

Territory | P(θ) | E(X | θ)
A         | 1/6  | 4/10 = 0.4
B         | 2/6  | 5/20 = 0.25
C         | 3/6  | 3/30 = 0.1

Since X | θ is Poisson, E(X | θ) = Var(X | θ).

μ = E[E(X | θ)] = EV = E[Var(X | θ)] = (1/6)(0.4) + (2/6)(0.25) + (3/6)(0.1) = 0.2

VE = Var[E(X | θ)] = E[E(X | θ)²] − μ² = (1/6)(0.4²) + (2/6)(0.25²) + (3/6)(0.1²) − 0.2² = 0.0125

k = EV/VE = 0.2/0.0125 = 16,   Z = n/(n + k) = 10/(10 + 16) = 0.385

Chapter 7
Empirical Bayes estimate for the Bühlmann model

Dean's study note has a good explanation of the formulas and worked-out problems.
Read Dean's study note along with my explanation.

This topic is among the least interesting ones in Exam C. However, it has been repeatedly
tested in Exam C. The exam problems on this topic are easy. The difficulty is
memorizing the formulas. In this chapter, I will show you some ideas behind the formulas
to help you memorize them.

Empirical Bayes estimate for the Bühlmann model

We have n-year claim data about r risks. For each risk, we have its claim amount in
Year 1, Year 2, …, Year n. Let X_{ij} represent the claim incurred by the i-th
policyholder in Year j. This is what we know:

Risk | Year 1 | Year 2 | … | Year n
1    | X11    | X12    | … | X1n
2    | X21    | X22    | … | X2n
…    | …      | …      | … | …
r    | Xr1    | Xr2    | … | Xrn

The issue here is that we don't know the probability distribution of the conditional claim
random variable X | θ or the probability distribution of the risk variable Θ. As a result, we
can't calculate the two inputs for the credibility factor Z: the expected process variance
EV = E[Var(X | Θ)] and the variance of the hypothetical mean VE = Var[E(X | Θ)].

So we need to estimate EV and VE from the past claim data given to us.

It's easy to estimate EV = E[Var(X | Θ)]. We can estimate Var(X | θ) for each risk
using the sample variance

σi² = [1/(n − 1)] Σ_{t=1}^{n} (X_{it} − X̄i)²

Then we'll take the average over the r risks. This estimation process can be summarized as follows:

Risk | Year 1 … Year n  | Sample mean                    | Sample variance
1    | X11, …, X1n      | X̄1 = (1/n) Σ_{t=1}^{n} X_{1t} | σ1² = [1/(n−1)] Σ_{t=1}^{n} (X_{1t} − X̄1)²
2    | X21, …, X2n      | X̄2 = (1/n) Σ_{t=1}^{n} X_{2t} | σ2² = [1/(n−1)] Σ_{t=1}^{n} (X_{2t} − X̄2)²
…    |                  |                                |
r    | Xr1, …, Xrn      | X̄r = (1/n) Σ_{t=1}^{n} X_{rt} | σr² = [1/(n−1)] Σ_{t=1}^{n} (X_{rt} − X̄r)²

Then the expected process variance is estimated as:

EV = (1/r) Σ_{i=1}^{r} σi² = [1/(r(n − 1))] Σ_{i=1}^{r} Σ_{t=1}^{n} (X_{it} − X̄i)²

Next, we need to estimate VE = Var[E(X_{ij} | Θ)]. We don't directly estimate VE.
Instead, we estimate VE using the following equation:

Var(X̄) = VE + (1/n) EV

We derived this equation in the chapter on the Bühlmann model. Then:

VE = Var(X̄) − (1/n) EV

Var(X̄) is estimated by the sample variance of the risk means:
[1/(r − 1)] Σ_{i=1}^{r} (X̄i − X̄)², where X̄ = (1/r) Σ_{i=1}^{r} X̄i. So

VE = [1/(r − 1)] Σ_{i=1}^{r} (X̄i − X̄)² − [1/(n r(n − 1))] Σ_{i=1}^{r} Σ_{t=1}^{n} (X_{it} − X̄i)²

In some situations, VE calculated this way may be negative. If VE is negative, we set
VE = 0. If VE = 0, then k = EV/VE → ∞ and Z = 0.
Guo Fall 2009 C, Page 169 / 284

Summary of the estimation process for the empirical Bayes estimate for the
Bühlmann model

Step 1  Calculate the sample variance for each risk and the expected process variance for
all risks combined:

σi² = [1/(n − 1)] Σ_{t=1}^{n} (X_{it} − X̄i)²,
EV = (1/r) Σ_{i=1}^{r} σi² = [1/(r(n − 1))] Σ_{i=1}^{r} Σ_{t=1}^{n} (X_{it} − X̄i)²

Step 2  Calculate the sample variance of the risk means: [1/(r − 1)] Σ_{i=1}^{r} (X̄i − X̄)².

Step 3  Use the equation Var(X̄) = VE + (1/n) EV. Find VE = [estimate of Var(X̄)] − (1/n) EV.
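These three steps are mechanical, so they translate directly into code. A minimal sketch (my own illustration, not from the manual) of the nonparametric empirical Bayes estimator for balanced data:

```python
def empirical_bayes_buhlmann(data):
    """data[i][t] = claim of risk i in year t; equal # of years per risk."""
    r, n = len(data), len(data[0])
    means = [sum(row) / n for row in data]
    grand = sum(means) / r
    # Step 1: EV = average within-risk sample variance
    ev = sum(sum((x - means[i])**2 for x in row) / (n - 1)
             for i, row in enumerate(data)) / r
    # Steps 2-3: VE = Var(X-bar) - EV/n, floored at 0
    var_xbar = sum((m - grand)**2 for m in means) / (r - 1)
    ve = max(var_xbar - ev / n, 0.0)
    z = n / (n + ev / ve) if ve > 0 else 0.0
    return ev, ve, z

# Nov 2003 #15 data below: policyholders X and Y over 4 years
ev, ve, z = empirical_bayes_buhlmann([[730, 800, 650, 700],
                                      [655, 650, 625, 750]])
print(ev, ve, z)   # 3475, 381.25, 0.305
```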

May 2000 #15

An insurer has data on losses for four policyholders for seven years. X_{ij} is the loss from
the i-th policyholder for year j. You are given:

Σ_{i=1}^{4} Σ_{j=1}^{7} (X_{ij} − X̄i)² = 33.6,   Σ_{i=1}^{4} (X̄i − X̄)² = 3.3

Calculate the Bühlmann credibility factor for an individual policyholder using
nonparametric empirical Bayes estimation.

Solution

Here the # of risks is r = 4; the # of observation years is n = 7.

Step 1  EV = [1/(r(n − 1))] Σ Σ (X_{it} − X̄i)² = 33.6/[4(7 − 1)] = 1.4

Step 2  Estimate of Var(X̄) = [1/(r − 1)] Σ (X̄i − X̄)² = 3.3/(4 − 1) = 1.1

Step 3  Var(X̄) = VE + (1/n) EV,  so VE = 1.1 − 1.4/7 = 0.9

k = EV/VE = 1.4/0.9,   Z = n/(n + k) = 7/(7 + 1.4/0.9) = 0.818

Nov 2002 #11

An insurer has data on losses for four policyholders for seven years. X_{ij} is the loss from
the i-th policyholder for year j. You are given:

Σ_{i=1}^{4} Σ_{j=1}^{7} (X_{ij} − X̄i)² = 33.6,   Σ_{i=1}^{4} (X̄i − X̄)² = 3.3

Using nonparametric empirical Bayes estimation, calculate the Bühlmann credibility
factor for an individual policyholder.

Solution

This is the same as May 2000 #15.

Nov 2003 #15

You are given total claims for two policyholders:

Policyholder | Year 1 | Year 2 | Year 3 | Year 4
X            | 730    | 800    | 650    | 700
Y            | 655    | 650    | 625    | 750

Using nonparametric empirical Bayes estimation, calculate the Bühlmann credibility
factor for Policyholder Y.

Solution
r = 2, n = 4.

Step 1  Calculate the sample variance for each risk and the expected process variance:

X̄ = (730 + 800 + 650 + 700)/4 = 720,   Ȳ = (655 + 650 + 625 + 750)/4 = 670

Sample variance for X:
[1/(4 − 1)][(730 − 720)² + (800 − 720)² + (650 − 720)² + (700 − 720)²] = 3,933.33

Sample variance for Y:
[1/(4 − 1)][(655 − 670)² + (650 − 670)² + (625 − 670)² + (750 − 670)²] = 3,016.67

EV = (1/2)(3,933.33 + 3,016.67) = 3,475

Step 2  Calculate the sample variance of the risk means.

The global mean: μ = (1/2)(X̄ + Ȳ) = (1/2)(720 + 670) = 695

Estimate of Var(X̄) = [1/(2 − 1)][(720 − 695)² + (670 − 695)²] = 1,250

Step 3  Use the equation Var(X̄) = VE + (1/n) EV:

VE = 1,250 − (1/4)(3,475) = 381.25

The final result:

k = EV/VE = 3,475/381.25 = 9.115,   Z_Y = m_Y/(m_Y + k) = 4/(4 + 9.115) = 0.305

P_Y = Z_Y Ȳ + (1 − Z_Y)μ = 0.305(670) + (1 − 0.305)(695) = 687.4


Empirical Bayes estimate for the Bühlmann-Straub model

Here the # of policyholders varies from risk to risk and from year to year. For risk 1, m11
policyholders have incurred average claim X11 in Year 1; m12 policyholders have
incurred average claim X12 in Year 2; …; and m_{1,n1} policyholders have incurred average
claim X_{1,n1} in Year n1.

For risk 2, m21 policyholders have incurred average claim X21 in Year 1; m22
policyholders have incurred average claim X22 in Year 2; …; and m_{2,n2} policyholders
have incurred average claim X_{2,n2} in Year n2. So on and so forth.

This is the information given to you (the exposure for each cell is in parentheses):

Risk | Periods | Year 1    | Year 2    | … | Last year           | Total exposure           | Sample mean
1    | n1      | X11 (m11) | X12 (m12) | … | X_{1,n1} (m_{1,n1}) | m1 = Σ_{t=1}^{n1} m_{1t} | X̄1 = (1/m1) Σ_{t=1}^{n1} m_{1t} X_{1t}
2    | n2      | X21 (m21) | X22 (m22) | … | X_{2,n2} (m_{2,n2}) | m2 = Σ_{t=1}^{n2} m_{2t} | X̄2 = (1/m2) Σ_{t=1}^{n2} m_{2t} X_{2t}
…    |         |           |           |   |                     |                          |
r    | nr      | Xr1 (mr1) | Xr2 (mr2) | … | X_{r,nr} (m_{r,nr}) | mr = Σ_{t=1}^{nr} m_{rt} | X̄r = (1/mr) Σ_{t=1}^{nr} m_{rt} X_{rt}

The total exposure is m = Σ_{i=1}^{r} mi, and the overall mean is X̄ = (1/m) Σ_{i=1}^{r} mi X̄i.

The sample variance for risk i is:

σi² = [1/(ni − 1)] Σ_{t=1}^{ni} m_{it}(X_{it} − X̄i)²

Step 1  Calculate the sample variance for each risk and the expected process variance for
all risks combined:

EV = Σ_{i=1}^{r} (ni − 1) σi² / Σ_{i=1}^{r} (ni − 1) = Σ_{i=1}^{r} Σ_{t=1}^{ni} m_{it}(X_{it} − X̄i)² / Σ_{i=1}^{r} (ni − 1)

Step 2  Calculate VE:

VE = [Σ_{i=1}^{r} mi(X̄i − X̄)² − (r − 1) EV] / [m − (1/m) Σ_{i=1}^{r} mi²]

This formula is counter-intuitive and very hard to remember. However, you'll just have
to memorize it. Perhaps Dean's explanation might help you a little bit. He says that a
crude estimate for VE is

VE ≈ [1/(r − 1)] Σ_{i=1}^{r} mi(X̄i − X̄)²

However, this estimate is biased. To have an unbiased estimator, we need to change the
above estimate to

VE = [Σ_{i=1}^{r} mi(X̄i − X̄)² − (r − 1) EV] / [m − (1/m) Σ_{i=1}^{r} mi²]

This isn't a big help for memorizing the formula. This formula is hard. You'll just
have to memorize it.

Final point. Loss Models mentions the concept of the credibility-weighted average premium.
It proves that the total loss will be equal to the total premium if we set

μ = Σ_{i=1}^{r} Zi X̄i / Σ_{i=1}^{r} Zi

Refer to Loss Models to understand the proof.
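The Bühlmann-Straub estimators are equally mechanical once the data is organized as (exposure, average-claim) pairs per risk and year. A minimal sketch (my own illustration, not from the manual), checked against Nov 2004 #17 below:

```python
def empirical_bayes_bs(data):
    """data[i] = list of (m_it, x_it) pairs for risk i:
    exposure and average claim per exposure in year t."""
    r = len(data)
    m_i = [sum(m for m, _ in rows) for rows in data]
    xbar_i = [sum(m * x for m, x in rows) / m_i[i] for i, rows in enumerate(data)]
    m = sum(m_i)
    xbar = sum(m_i[i] * xbar_i[i] for i in range(r)) / m

    ev = (sum(mt * (x - xbar_i[i])**2 for i, rows in enumerate(data)
              for mt, x in rows)
          / sum(len(rows) - 1 for rows in data))
    ve = ((sum(m_i[i] * (xbar_i[i] - xbar)**2 for i in range(r)) - (r - 1) * ev)
          / (m - sum(mi**2 for mi in m_i) / m))
    k = ev / ve
    z = [m_i[i] / (m_i[i] + k) for i in range(r)]
    return ev, ve, z

# Nov 2004 #17: (automobiles, average loss) per company-year
ev, ve, z = empirical_bayes_bs([[(100, 500), (200, 250)],
                                [(500, 300), (300, 500)],
                                [(50, 3000), (150, 1000)]])
print(round(ev), round(ve, 1), round(z[2], 3))   # 53888889, 157035.6, 0.368
```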


Nov 2000 #27
You are given the following information on towing losses for two classes of insureds,
adults and youths:

Exposures
Year  | Adult | Youth | Total
1996  | 2000  | 450   | 2450
1997  | 1000  | 250   | 1250
1998  | 1000  | 175   | 1175
1999  | 1000  | 125   | 1125
Total | 5000  | 1000  | 6000

Pure Premium
Year             | Adult | Youth | Total
1996             | 0     | 15    | 2.755
1997             | 5     | 2     | 4.400
1998             | 6     | 15    | 7.340
1999             | 4     | 1     | 3.667
Weighted Average | 3     | 10    | 4.167

You are also given that the estimated variance of the hypothetical means is 17.125.

Determine the nonparametric empirical Bayes credibility premium for the youth class,
using the method that preserves the total losses.

Solution
We have two risk groups: adults and youths. So r = 2.

EV = Σ_i Σ_t m_{it}(X_{it} − X̄i)² / Σ_i (ni − 1)

EV = [1/((4 − 1) + (4 − 1))] {2,000(0 − 3)² + 1,000(5 − 3)² + 1,000(6 − 3)² + 1,000(4 − 3)²
  + 450(15 − 10)² + 250(2 − 10)² + 175(15 − 10)² + 125(1 − 10)²}
  = 12,291.7

The calculation of VE is complex. Fortunately, we are given that a = VE = 17.125.
(Thank you, SOA!)

k = EV/VE = 12,291.7/17.125 = 717.76

Z_A = 5,000/(5,000 + 717.76) = 0.874,   Z_Y = 1,000/(1,000 + 717.76) = 0.582

The credibility-weighted average global mean is:

μ = (Z_A X̄_A + Z_Y X̄_Y)/(Z_A + Z_Y) = [0.874(3) + 0.582(10)]/(0.874 + 0.582) = 5.8

The nonparametric empirical Bayes credibility premium for the youth class is:

Z_Y X̄_Y + (1 − Z_Y)μ = 0.582(10) + (1 − 0.582)(5.8) = 8.24

Let's verify that the total credibility premium is equal to the total loss.
The nonparametric empirical Bayes credibility premium for the adult class is:

Z_A X̄_A + (1 − Z_A)μ = 0.874(3) + (1 − 0.874)(5.8) = 3.35

The total credibility premium is: 1,000(8.24) + 5,000(3.35) = 25,000

The total loss is:
Adult: 2,000(0) + 1,000(5) + 1,000(6) + 1,000(4) = 15,000,
or 5,000 (total exposure) × 3 (average pure premium per exposure) = 15,000
Youth: 450(15) + 250(2) + 175(15) + 125(1) = 10,000,
or 1,000 (total exposure) × 10 (average pure premium per exposure) = 10,000
Total: 25,000

May 2001 #32

You are given the following experience for two insured groups:

Group |                         | Year 1 | Year 2 | Year 3 | Total
1     | # of members            | 8      | 12     | 5      | 25
      | Average loss per member | 96     | 91     | 113    | 97
2     | # of members            | 25     | 30     | 20     | 75
      | Average loss per member | 113    | 111    | 116    | 113
Total | # of members            |        |        |        | 100
      | Average loss per member |        |        |        | 109

Σ_{i=1}^{2} Σ_j m_{ij}(x_{ij} − x̄i)² = 2020,   Σ_{i=1}^{2} mi(x̄i − x̄)² = 4800

Determine the nonparametric empirical Bayes credibility premium for group 1, using the
method that preserves the total loss.

Solution

EV = Σ_i Σ_j m_{ij}(x_{ij} − x̄i)² / Σ_i (ni − 1) = 2020/(2 + 2) = 505

VE = [Σ_i mi(x̄i − x̄)² − (r − 1)EV] / [m − (1/m) Σ mi²]
   = [4800 − (2 − 1)(505)] / [100 − (25² + 75²)/100] = 4295/37.5 = 114.533

k = EV/VE = 505/114.533 = 4.409

Z1 = m1/(m1 + k) = 25/(25 + 4.409) = 0.85,   Z2 = m2/(m2 + k) = 75/(75 + 4.409) = 0.944

Please don't write Z1 = n/(n + k). As mentioned before, in the Bühlmann-Straub model,
Z = m/(m + k), not Z = n/(n + k).

μ = (Z1 x̄1 + Z2 x̄2)/(Z1 + Z2) = [0.85(97) + 0.944(113)]/(0.85 + 0.944) = 105.42

P1 = Z1 x̄1 + (1 − Z1)μ = 0.85(97) + (1 − 0.85)(105.42) = 98.26

Nov 2001 #30

You are making credibility estimates for regional rating factors. You observe that the
Bühlmann-Straub nonparametric empirical Bayes method can be applied, with the rating
factor playing the role of pure premium. X_{ij} denotes the rating factor for region i and
year j, where i = 1, 2, 3 and j = 1, 2, 3, 4. Corresponding to each rating factor is the
number of reported claims, m_{ij}, measuring exposure.

You are given:

Region i | mi = Σ_{j=1}^{4} m_{ij} | X̄i = (1/mi) Σ_j m_{ij} X_{ij} | vi = (1/3) Σ_j m_{ij}(X_{ij} − X̄i)² | mi(X̄i − X̄)²
1        | 50                      | 1.406                         | 0.536                               | 0.887
2        | 300                     | 1.298                         | 0.125                               | 0.191
3        | 150                     | 1.178                         | 0.172                               | 1.348

Determine the credibility estimate of the rating factor for region 1 using the method that
preserves Σ_{i=1}^{3} mi X̄i.

Solution

EV = Σ_i Σ_j m_{ij}(X_{ij} − X̄i)² / Σ_i (ni − 1)
   = [3(0.536) + 3(0.125) + 3(0.172)] / [(4 − 1) + (4 − 1) + (4 − 1)] = 0.2777

VE = [Σ_i mi(X̄i − X̄)² − (r − 1)EV] / [m − (1/m) Σ mi²]
   = [0.887 + 0.191 + 1.348 − 0.2777(2)] / [500 − (50² + 300² + 150²)/500]
   = 1.8706/270 = 0.00693

k = EV/VE = 0.2777/0.00693 = 40.0829

Z1 = m1/(m1 + k) = 50/(50 + 40.0829) = 0.555
Z2 = m2/(m2 + k) = 300/(300 + 40.0829) = 0.882
Z3 = m3/(m3 + k) = 150/(150 + 40.0829) = 0.789

μ = Σ Zi X̄i / Σ Zi = [0.555(1.406) + 0.882(1.298) + 0.789(1.178)]/(0.555 + 0.882 + 0.789) = 1.2824

P1 = Z1 X̄1 + (1 − Z1)μ = 0.555(1.406) + (1 − 0.555)(1.2824) = 1.35


Nov 2004 #17
You are given the following commercial automobile policy experience:

Losses
# of automobile
Losses
# of automobile
Losses
# of automobile

Company
I
II
III

Year 1
50,000
100
?
?
150,000
50

Year 2
50,000
200
150,000
500
?
?

Year 3
?
?
150,000
300
150,000
150

Determine the nonparametric Bayes credibility factor, Z , for Company III.



Solution

Company | Year 1                   | Year 2                   | Year 3                    | X̄i
I       | X11 = 50,000/100 = 500   | X12 = 50,000/200 = 250   |                           | X̄1 = 100,000/300 = 333.33
II      |                          | X21 = 150,000/500 = 300  | X22 = 150,000/300 = 500   | X̄2 = 300,000/800 = 375
III     | X31 = 150,000/50 = 3,000 |                          | X32 = 150,000/150 = 1,000 | X̄3 = 300,000/200 = 1,500

X̄ = (100,000 + 300,000 + 300,000)/(300 + 800 + 200) = 538.46

EV = Σ_i Σ_j m_{ij}(X_{ij} − X̄i)² / Σ_i (ni − 1)
   = [100(500 − 333.33)² + 200(250 − 333.33)² + 500(300 − 375)² + 300(500 − 375)²
      + 50(3,000 − 1,500)² + 150(1,000 − 1,500)²] / [(2 − 1) + (2 − 1) + (2 − 1)]
   = 53,888,889

VE = [Σ_i mi(X̄i − X̄)² − (r − 1)EV] / [m − (1/m) Σ mi²]
   = [300(333.33 − 538.46)² + 800(375 − 538.46)² + 200(1,500 − 538.46)² − 53,888,889(3 − 1)]
     / [1,300 − (300² + 800² + 200²)/1,300]
   = 157,035.6

k = EV/VE = 53,888,889/157,035.6 = 343.16,   Z = 200/(200 + 343.16) = 0.368
Guo Fall 2009 C, Page 180 / 284

May 2005 #25

You are given:

                        Year 1     Year 2     Year 3     Total
Group 1  Total claims              10,000     15,000     25,000
         # in group                50         60         110
         Average                   200        250        227.27
Group 2  Total claims   16,000     18,000                34,000
         # in group     100        90                    190
         Average        160        200                   178.95
Total    Total claims                                    59,000
         # in group                                      300
         Average                                         196.67

You are also given $\hat{a} = 651.03$.

Use the nonparametric empirical Bayes method to estimate the credibility factor for Group 1.

Solution

$\widehat{EV} = \frac{\sum_{i=1}^{r}\sum_{j} m_{ij}\left(X_{ij}-\bar{X}_i\right)^2}{\sum_{i=1}^{r}\left(n_i-1\right)} = \frac{50(200-227.27)^2 + 60(250-227.27)^2 + 100(160-178.95)^2 + 90(200-178.95)^2}{(2-1)+(2-1)} = 71{,}985.65$

$Z_1 = \frac{m_1}{m_1 + \widehat{EV}\big/\hat{a}} = \frac{110}{110 + \frac{71{,}985.65}{651.03}} = 0.5$

Semi-parametric Bayes estimate

We have a parametric model for $X|\Theta$, but we don't have a parametric model for $\Theta$ (hence the name semi-parametric). Typically, a problem will tell us that $X|\Theta$ is a Poisson random variable with mean $\Theta$.

This is how to find EV and VE. Since $X|\Theta$ is Poisson, we have:

$E(X|\Theta) = Var(X|\Theta) = \Theta$

$\mu = E(X) = E\left[E(X|\Theta)\right] = E\left[Var(X|\Theta)\right] = EV$

However, $Var(X) = E\left[Var(X|\Theta)\right] + Var\left[E(X|\Theta)\right] = EV + VE$

$\Rightarrow VE = Var(X) - EV$
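Since the whole semiparametric method boils down to one sample mean and one unbiased sample variance, it is easy to automate. Here is a minimal Python sketch (my own illustration; the function name is hypothetical, not from the text) that takes a claim-count frequency table and returns the estimates. On the May 2000 #33 driver data below it reproduces $\hat{\mu} = 0.63$, $\widehat{Var}(X) \approx 0.68$ and $Z \approx 0.073$.

```python
def semiparametric_z(freq, n_years=1):
    """Semiparametric empirical Bayes credibility for Poisson claim counts.

    freq: dict mapping claim count -> number of insureds observed.
    Returns (mu_hat, var_hat, Z). For Poisson, EV = mu_hat and
    VE = var_hat - mu_hat.
    """
    n = sum(freq.values())
    mean = sum(x * w for x, w in freq.items()) / n
    # unbiased sample variance, divisor n - 1
    var = sum(w * (x - mean) ** 2 for x, w in freq.items()) / (n - 1)
    ve = var - mean                     # VE = Var(X) - EV
    k = mean / ve                       # k = EV / VE
    z = n_years / (n_years + k)
    return mean, var, z

# May 2000 #33: 100 drivers with 0..4 claims during the year
print(semiparametric_z({0: 54, 1: 33, 2: 10, 3: 2, 4: 1}))
# (0.63, 0.679..., 0.073...)
```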

May 2000 #33

The number of claims a driver has during the year is assumed to be Poisson distributed with an unknown mean that varies by driver. The experience for 100 drivers is as follows:

# of claims during the year   0    1    2    3   4
# of drivers                  54   33   10   2   1

Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation.
Solution

Let X represent the # of claims in a year and $\Theta$ represent the mean of X. We are told that $X|\Theta$ is a Poisson random variable.

$\mu = E(X) = E\left[E(X|\Theta)\right] = E(\Theta)$, $\quad EV = E\left[Var(X|\Theta)\right] = E(\Theta)$

$\hat{\mu} = \bar{X} = \frac{54(0) + 33(1) + 10(2) + 2(3) + 1(4)}{54+33+10+2+1} = \frac{63}{100} = 0.63$

$\widehat{EV} = \hat{\mu} = 0.63$

$\widehat{Var}(X) = \frac{1}{100-1}\sum_{i=1}^{100}\left(X_i-\bar{X}\right)^2 = \frac{54(0-0.63)^2 + 33(1-0.63)^2 + 10(2-0.63)^2 + 2(3-0.63)^2 + 1(4-0.63)^2}{100-1} = 0.68$

$Var(X) = E\left[Var(X|\Theta)\right] + Var\left[E(X|\Theta)\right] = EV + VE$

$\Rightarrow \widehat{VE} = \widehat{Var}(X) - \widehat{EV} = 0.68 - 0.63 = 0.05$

We need to calculate Z for a single driver, so $n = 1$:

$Z = \frac{n}{n + \widehat{EV}\big/\widehat{VE}} = \frac{1}{1 + \frac{0.63}{0.05}} = 0.073$

When taking the exam, you should use BA II Plus/ Professional 1-V Statistics Worksheet
to quickly calculate the sample mean and the sample variance.
Nov 2000 #7

The following information comes from a study of robberies of convenience stores over the course of a year:

- $X_i$ is the number of robberies of the $i$-th store, with $i = 1, 2, \ldots, 500$
- $\sum X_i = 50$
- $\sum X_i^2 = 220$
- The number of robberies of a given convenience store during the year is assumed to be Poisson distributed with an unknown mean that varies by store.

Determine the semiparametric empirical Bayes estimate of the expected number of robberies next year of a store that reported no robberies during the studied year.

Solution

$\hat{\mu} = \widehat{EV} = \bar{X} = \frac{50}{500} = 0.1$

$\widehat{Var}(X) = \frac{1}{500-1}\sum_{i=1}^{500}\left(X_i-\bar{X}\right)^2$

A general formula you should memorize:

$\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2 = \sum_{i=1}^{n}X_i^2 - n\left(\bar{X}\right)^2$

To see why this formula works, notice that the (biased) sample variance is:

$\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2 = \widehat{E}\left(X^2\right) - \left[\widehat{E}(X)\right]^2 = \frac{1}{n}\sum_{i=1}^{n}X_i^2 - \left(\bar{X}\right)^2$

Multiply both sides by $n$ and the formula follows. So:

$\widehat{Var}(X) = \frac{1}{500-1}\left[\sum_{i=1}^{500}X_i^2 - 500\left(\bar{X}\right)^2\right] = \frac{220 - 500(0.1)^2}{499} = \frac{220 - 5}{499} = 0.43086$

$\widehat{VE} = \widehat{Var}(X) - \widehat{EV} = 0.43086 - 0.1 = 0.33086$

For a single store, $n = 1$:

$Z = \frac{n}{n + \widehat{EV}\big/\widehat{VE}} = \frac{1}{1 + \frac{0.1}{0.33086}} = 0.768$

The single store reported no robberies during the studied year, so its sample mean is zero:

$P = Z\bar{X} + (1-Z)\hat{\mu} = (1-Z)\hat{\mu} = (1 - 0.768)(0.1) = 0.0232$

Nov 2004 #37

For a portfolio of motorcycle insurance policyholders, you are given:

- The number of claims for each policyholder has a conditional Poisson distribution.
- For Year 1, the following data are observed:

Number of claims   Number of Policyholders
0                  2000
1                  600
2                  300
3                  80
4                  20
Total              3000

Determine the credibility factor Z for Year 2.

Solution

Enter the following in the BA II Plus/Professional 1-V Statistics Worksheet:
X01=0, Y01=2000
X02=1, Y02= 600
X03=2, Y03= 300
X04=3, Y04= 80
X05=4, Y05= 20

You should get:

The sample mean is $\bar{X} = 0.50666667 \approx 0.507$. This is $\hat{\mu}$ and $\widehat{EV}$.
The sample standard deviation is $S_X = 0.83077411$.
The sample variance is $S_X^2 = 0.83077411^2 = 0.69018562 \approx 0.69019$. This is $\widehat{Var}(X)$.
So $\widehat{VE} = \widehat{Var}(X) - \widehat{EV} = 0.69019 - 0.507 = 0.183$

$Z = \frac{n}{n + \widehat{EV}\big/\widehat{VE}} = \frac{1}{1 + \frac{0.507}{0.183}} = 0.265$

Please note that $n = 1$ (we have only one year's data).

May 2005 #28

During a 2-year period, 100 policies had the following claim experience:

Number of claims in Year 1 and Year 2   Number of Policyholders
0                                       50
1                                       30
2                                       15
3                                       4
4                                       1

The number of claims per year follows a Poisson distribution. Each policyholder was insured for the entire 2-year period. A randomly selected policyholder had one claim over the 2-year period.

Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the number of claims in Year 3 for the same policyholder.

Solution

We'll use a 2-year period as one unit of time. So we'll calculate the Bühlmann estimate of the number of claims in Years 3 and 4 combined; half of this amount is then the Bühlmann estimate for the number of claims in Year 3.

Enter the following in the BA II Plus/Professional 1-V Statistics Worksheet:
X01=0, Y01=50
X02=1, Y02=30
X03=2, Y03=15
X04=3, Y04= 4
X05=4, Y05= 1

You should get:

The sample mean is $\bar{X} = 0.76$. This is $\hat{\mu}$ and $\widehat{EV}$.
The sample standard deviation is $S_X = 0.92244734$.
The sample variance is $S_X^2 = 0.92244734^2 = 0.85090909 \approx 0.851$. This is $\widehat{Var}(X)$.
$\widehat{VE} = \widehat{Var}(X) - \widehat{EV} = 0.851 - 0.76 = 0.091$

$Z = \frac{n}{n + \widehat{EV}\big/\widehat{VE}} = \frac{1}{1 + \frac{0.76}{0.091}} = 0.107$

A randomly selected policyholder had one claim over the 2-year period, so the sample claim frequency per 2-year period is $\bar{X} = 1$:

$P = Z\bar{X} + (1-Z)\hat{\mu} = 0.107(1) + (1-0.107)(0.76) \approx 0.786$

$P_{\text{Year 3}} = \frac{1}{2}P = \frac{1}{2}(0.786) = 0.393$

Chapter 8   Limited fluctuation credibility

The study note titled "Chapter 8 Credibility," jointly written by Mahler and Dean, provides an excellent explanation of the limited fluctuation credibility theory. Please read this study note along with my explanation.

The goal of the limited fluctuation credibility model is the same as the goal of the Bühlmann credibility model. We observe that a policyholder has incurred $S_1, S_2, \ldots, S_n$ claim dollar amounts in Years 1, 2, ..., n respectively. We want to estimate the policyholder's renewal premium in Year $n+1$. The renewal premium in Year $n+1$ is $E\left(S_{n+1} \mid S_1, S_2, \ldots, S_n\right)$, the expected claim dollar amount in Year $n+1$.

Here I wrote the past n years' claim amounts as $S_1, S_2, \ldots, S_n$ instead of $X_1, X_2, \ldots, X_n$ as in the Bühlmann credibility model. There's a reason for using a different notation. In the limited fluctuation credibility model, we typically break down the annual claim dollar amount S into two components:

- the number of claims incurred by a policyholder in a year (loss frequency)
- the claim dollar amount per loss incurred by a policyholder in a year (loss severity)

Mathematically, $S = \sum_{i=1}^{N} X_i$. Here N is the total number of claims incurred in a year (loss frequency) by a policyholder; $X_i$ is the claim dollar amount of the $i$-th claim (loss severity) incurred by the policyholder; and S is the total claim dollar amount incurred in a year (also called the annual aggregate claim) by the policyholder. In contrast, in the Bühlmann credibility model, we don't break down the annual claim dollar amount into loss frequency and loss severity.

In the limited fluctuation credibility model, we assume, as in the Bühlmann credibility model, that the renewal premium is the weighted average of the global premium rate $\mu$ (called the manual rate) and the sample mean $\bar{S} = \frac{1}{n}\left(S_1 + S_2 + \cdots + S_n\right)$:

$P_{\text{renewal premium}} = E\left(S_{n+1} \mid S_1, S_2, \ldots, S_n\right) = Z\underbrace{\bar{S}}_{\substack{\text{policyholder-specific}\\ \text{sample mean}}} + (1-Z)\underbrace{\mu}_{\substack{\text{global mean}\\ \text{(manual rate)}}}$

Here $\bar{S}$ is specific to a policyholder. Different policyholders have different claim amounts $S_1, S_2, \ldots, S_n$ and hence different $\bar{S}$. However, $\mu$ is the same for all policyholders regardless of their different claim histories.

The limited fluctuation credibility model assumes that the above renewal premium equation automatically holds true without any proof. This equation is the starting point for the limited fluctuation credibility. So when you study the limited fluctuation credibility, you'll need to accept the above equation without demanding proof. In contrast, the Bühlmann credibility theory doesn't assume the above equation holds true automatically; it derives this equation using basic probability theories.

Next, we need to calculate the weighting factor Z ($0 \le Z \le 1$), which is the credibility assigned to the policyholder's sample mean $\bar{S}$. The limited fluctuation credibility calculates Z as follows:

$Z = \sqrt{\frac{\text{# of observations you actually have}}{\text{expected # of observations needed to make } Z=1}} = \sqrt{\frac{\text{your } n}{E(N)\text{ to make } Z=1}}$

If Z calculated above exceeds one, we'll set $Z = 1$.

Once again, the limited fluctuation credibility assumes that this square-root rule holds true automatically, without the need to prove it. So you need to accept it without demanding any proof. The core theory of the limited fluctuation credibility is to calculate $E(N)$ to make $Z = 1$.

General credibility model for the aggregate loss of r insureds

We first derive a model for r insureds. Then, to calculate the renewal premium for one insured, we just set $r = 1$.

The aggregate annual loss for r insureds is:

$S = \sum_{i=1}^{M} X_i = X_1 + X_2 + \cdots + X_M$

Here $X_i$ is the dollar amount of the $i$-th claim. $M = \sum_{j=1}^{r} N_j = N_1 + N_2 + \cdots + N_r$ is the total # of annual claims for the r insureds; $N_j$ is the number of claims incurred by the $j$-th insured.

We assume that $X_1, X_2, \ldots, X_M$ are independent identically distributed with a common pdf $f_X(x)$, and that $N_1, N_2, \ldots, N_r$ are independent identically distributed with a common pdf $f_N(n)$.

We arbitrarily set $Z = 1$ if $E(M)$ satisfies the following equation:

$P\left(\left|S - E(S)\right| \le k\,E(S)\right) \ge p$

A simplifying assumption is that $\frac{S - E(S)}{\sigma_S}$ is approximately a standard normal random variable.

Please note that for a non-negative $a$ and a standard normal $W$:

$P\left(\left|W\right| \le a\right) = P(-a \le W \le a) = \Phi(a) - \Phi(-a)$

However, $\Phi(a) + \Phi(-a) = 1$ (this holds whether $a$ is positive, negative, or zero), so

$P\left(\left|W\right| \le a\right) = 2\Phi(a) - 1$

Applying this with $W = \frac{S - E(S)}{\sigma_S}$ and $a = \frac{k\,E(S)}{\sigma_S}$:

$P\left(\left|S - E(S)\right| \le k\,E(S)\right) = 2\Phi\left(\frac{k\,E(S)}{\sigma_S}\right) - 1$

Let's consider the worst case, $P\left(\left|S - E(S)\right| \le k\,E(S)\right) = p$, i.e. $2\Phi\left(\frac{k\,E(S)}{\sigma_S}\right) - 1 = p$. We still set $Z = 1$ for this worst case. Then:

$\Phi\left(\frac{k\,E(S)}{\sigma_S}\right) = \frac{1+p}{2}$

Define $CV_S = \frac{\sigma_S}{E(S)}$ as the coefficient of variation; it's the standard deviation divided by the mean. Then:

$\frac{k}{CV_S} = \Phi^{-1}\left(\frac{1+p}{2}\right)$

Next, define $y = \Phi^{-1}\left(\frac{1+p}{2}\right)$. Then $k = y\,CV_S$.

Key interim formula: credibility for the aggregate loss

As actuaries, we set $k$ and $p$. Then we find $E(N)$ to make $Z = 1$ by solving the equation $\frac{k}{CV_S} = y = \Phi^{-1}\left(\frac{1+p}{2}\right)$, or $k = y\,CV_S$.

Next, let's derive the full formula.

$E(M) = E\left(N_1 + N_2 + \cdots + N_r\right) = r\,E(N)$
$Var(M) = Var\left(N_1 + N_2 + \cdots + N_r\right) = r\,Var(N)$
$E(S) = E\left(X_1 + X_2 + \cdots + X_M\right) = E(M)\,E(X) = r\,E(N)\,E(X)$
$Var(S) = Var\left(X_1 + X_2 + \cdots + X_M\right) = E(M)\,Var(X) + Var(M)\,E^2(X) = r\left[E(N)\,Var(X) + Var(N)\,E^2(X)\right]$

$CV_S = \frac{\sigma_S}{E(S)} = \frac{\sqrt{r\left[E(N)Var(X) + Var(N)E^2(X)\right]}}{r\,E(N)E(X)} = \sqrt{\frac{1}{r\,E(N)}\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]}$

Substituting into $k = y\,CV_S$ and solving for $r\,E(N)$:

$r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]$

Final formula you need to memorize

(You also know how to derive it from scratch; this is the mother of all the formulas for the limited fluctuation credibility model.)

If $\quad r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] = \left(\frac{y}{k}\right)^2\left[CV_X^2 + \frac{Var(N)}{E(N)}\right]$, where $y = \Phi^{-1}\left(\frac{1+p}{2}\right)$, then $Z = 1$.

Please note that $r$ is the number of insureds needed to achieve full credibility, and $E(N)$ is the expected number of annual claims per insured. So $r\,E(N)$, which is (# of insureds in the book of business) × (expected # of claims per insured), represents the expected number of claims the insurer needs to have in its book of business to have full credibility.

One easy mistake made by many is to write $r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E(X)} + \frac{Var(N)}{E(N)}\right]$. Wrong!

To remember the term $\frac{Var(X)}{E^2(X)}$, please note that $X$ is the claim dollar amount. So $E(X)$ is a dollar amount and $Var(X)$ is dollars squared. To have a meaningful ratio, we need to square $E(X)$ so that the numerator and the denominator are both dollars squared. Please also note that $\frac{Var(N)}{E(N)}$ is fine as it stands: $N$ is the claim count, so $Var(N)$ and $E(N)$ are both pure numbers, and their ratio is fine.

Once again, remember that $X$ is the dollar amount of a single claim incurred by one policyholder and that $N$ is the annual number of claims incurred by the policyholder.
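Because every problem in this chapter plugs different moments into this same mother formula, a small helper makes the pattern explicit. The Python sketch below is my own illustration (the function names are hypothetical, not from the text): `full_credibility_claims` evaluates $r\,E(N)$ for given $p$, $k$ and moment ratios, and `partial_z` applies the square-root rule used in the worked problems that follow. The usage lines reproduce the May 2000 #26 figures below (about 15,366 expected claims for full credibility; $Z \approx 0.255$ for 1,000 claims).

```python
from statistics import NormalDist

def full_credibility_claims(p, k, cv_x2=0.0, var_over_mean_n=1.0):
    """Expected # of claims r*E(N) needed for full credibility (Z = 1).

    p: probability level (e.g. 0.95); k: tolerance (e.g. 0.05)
    cv_x2: Var(X)/E(X)^2 for severity (0 if frequency only)
    var_over_mean_n: Var(N)/E(N) for frequency (1 for Poisson)
    """
    y = NormalDist().inv_cdf((1 + p) / 2)   # y = Phi^{-1}((1+p)/2)
    return (y / k) ** 2 * (cv_x2 + var_over_mean_n)

def partial_z(n_observed, n_full):
    """Square-root rule: Z = min(sqrt(n / n_full), 1)."""
    return min((n_observed / n_full) ** 0.5, 1.0)

n_full = full_credibility_claims(p=0.95, k=0.05, cv_x2=9, var_over_mean_n=1)
print(n_full)                    # ~15365.8 (the text rounds y to 1.96: 15,366.4)
print(partial_z(1000, n_full))   # ~0.255
```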

Special case
Credibility formulas for the aggregate loss of one insured (credibility in terms of the expected number of annual claims)

Set $r = 1$. If $E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]$, then $Z = 1$. Writing $n_0 = \left(\frac{y}{k}\right)^2 = \left[\Phi^{-1}\left(\frac{1+p}{2}\right)\Big/k\right]^2$:

$Z = \min\left(\sqrt{\frac{\text{your } n}{E(N)\text{ to make } Z=1}},\ 1\right) = \min\left(\sqrt{\frac{\text{your } n}{n_0\left[CV_X^2 + \frac{Var(N)}{E(N)}\right]}},\ 1\right)$
May 2000 #26

You are given:
- Claim counts follow a Poisson distribution.
- Claim sizes follow a lognormal distribution with coefficient of variation of 3.
- Claim sizes and claim counts are independent.
- The number of claims in the 1st year is 1,000.
- The aggregate loss in the 1st year was 6.75 million.
- The manual premium for the 1st year was 5 million.
- The exposure in the 2nd year is identical to the exposure in the 1st year.
- The full credibility standard is to be within 5% of the expected aggregate loss 95% of the time.

Determine the limited fluctuation credibility net premium (in millions) for the 2nd year.

Solution

We are asked to find the limited fluctuation credibility renewal net premium for Year 2. So we are concerned with just one policy (one insured). Set $r = 1$.

The full credibility standard for the aggregate loss is:

$E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]$, with $y = \Phi^{-1}\left(\frac{1+95\%}{2}\right) = 1.96$ and $k = 5\%$

We are told that the claim size X is lognormal with a coefficient of variation of 3. The information that X is lognormal is not needed; SOA just wants to scare us. What matters is $CV_X = 3$. In addition, we know that N is Poisson, so $\frac{Var(N)}{E(N)} = 1$.

So to have full credibility ($Z = 1$), the expected number of claims is:

$E(N) = \left(\frac{1.96}{5\%}\right)^2\left(3^2 + 1\right) = 10\left(\frac{1.96}{5\%}\right)^2 = 15{,}366.4$

$Z = \min\left(\sqrt{\frac{\text{your } n}{E(N)\text{ to make } Z=1}},\ 1\right) = \min\left(\sqrt{\frac{1000}{10\left(\frac{1.96}{5\%}\right)^2}},\ 1\right) = \min\left(10\cdot\frac{5\%}{1.96},\ 1\right) = 0.255$

$P = Z\bar{S} + (1-Z)\mu = 0.255(6.75) + (1-0.255)(5) = 5.446$

Nov 2000 #14

For an insurance policy, you are given:
- For each individual insured, the number of claims follows a Poisson distribution.
- The mean claim count varies by insured, and the distribution of mean claim counts follows a gamma distribution.
- For a random sample of 1000 insureds, the observed claim counts are as follows:

# of claims, n       0     1     2     3    4    5
# of insureds, f_n   512   307   123   41   11   6

$\sum n\,f_n = 750$, $\quad \sum n^2 f_n = 1494$

- Claim sizes follow a Pareto distribution with mean 1,500 and variance 6,750,000.
- Claim sizes and claim counts are independent.
- The full credibility standard is to be within 5% of the expected aggregate loss 95% of the time.

Determine the minimum number of insureds needed for the aggregate loss to be fully credible.
Solution

$r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]$

$\Rightarrow\ r = \frac{1}{E(N)}\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]$, where $y = \Phi^{-1}\left(\frac{1+95\%}{2}\right) = 1.96$ and $k = 5\%$.

We know that $CV_X = \frac{\sigma_X}{E(X)} = \frac{\sqrt{6{,}750{,}000}}{1{,}500} = \sqrt{3}$, so $\frac{Var(X)}{E^2(X)} = CV_X^2 = 3$.

We can use the method of moments to estimate $E(N)$ and $Var(N)$:

$\hat{E}(N) = \frac{\sum n\,f_n}{1000} = \frac{750}{1000} = 0.75$, $\quad \hat{E}\left(N^2\right) = \frac{\sum n^2 f_n}{1000} = \frac{1494}{1000} = 1.494$

$\widehat{Var}(N) = 1.494 - 0.75^2 = 0.9315$, $\quad \frac{\widehat{Var}(N)}{\hat{E}(N)} = \frac{0.9315}{0.75} = 1.242$

$r = \frac{1}{0.75}\left(\frac{1.96}{5\%}\right)^2\left(3 + 1.242\right) = \frac{6{,}518.43}{0.75} = 8{,}691.24$

Nov 2001 #15

You are given the following information about a general liability book of business comprised of 2500 insureds:

- $X_i = \sum_{j=1}^{N_i} Y_{ij}$ is a random variable representing the annual loss of the $i$-th insured.
- $N_1, N_2, \ldots, N_{2500}$ are independent and identically distributed random variables following a negative binomial distribution with parameters $r = 2$ and $\beta = 0.2$.
- $Y_{i1}, Y_{i2}, \ldots, Y_{iN_i}$ are independent and identically distributed random variables following a Pareto distribution with parameters $\alpha = 3$ and $\theta = 1000$.
- The full credibility standard is to be within 5% of the expected aggregate losses 90% of the time.

Using classical credibility theory, determine the partial credibility of the annual loss experience for the book of business.

Solution

First, let's calculate the # of insureds needed for full credibility:

$r_{\text{full}} = \frac{1}{E(N)}\left(\frac{y}{k}\right)^2\left[\frac{Var(Y)}{E^2(Y)} + \frac{Var(N)}{E(N)}\right]$, where $y = \Phi^{-1}\left(\frac{1+90\%}{2}\right) = 1.645$ and $k = 5\%$.

However, $\frac{Var(Y)}{E^2(Y)} = \frac{E\left(Y^2\right) - E^2(Y)}{E^2(Y)} = \frac{E\left(Y^2\right)}{E^2(Y)} - 1$, so

$r_{\text{full}} = \frac{1}{E(N)}\left(\frac{y}{k}\right)^2\left[\frac{E\left(Y^2\right)}{E^2(Y)} - 1 + \frac{Var(N)}{E(N)}\right]$

N is negative binomial with parameters $r = 2$ and $\beta = 0.2$:

$E(N) = r\beta = 2(0.2) = 0.4$, $\quad Var(N) = r\beta(1+\beta)$, $\quad \frac{Var(N)}{E(N)} = 1+\beta = 1+0.2 = 1.2$

Y is a 2-parameter Pareto with $\alpha = 3$ and $\theta = 1000$:

$E\left(Y^k\right) = \frac{\theta^k\,k!}{(\alpha-1)(\alpha-2)\cdots(\alpha-k)}$

$E(Y) = \frac{\theta}{\alpha-1}$, $\quad E\left(Y^2\right) = \frac{2\theta^2}{(\alpha-1)(\alpha-2)}$, $\quad \frac{E\left(Y^2\right)}{E^2(Y)} = \frac{2(\alpha-1)}{\alpha-2} = \frac{2(3-1)}{3-2} = 4$

You'll want to memorize: if Y is a 2-parameter Pareto, then $\frac{E\left(Y^2\right)}{E^2(Y)} = \frac{2(\alpha-1)}{\alpha-2}$.

$r_{\text{full}} = \frac{1}{0.4}\left(\frac{1.645}{5\%}\right)^2\left(4 - 1 + 1.2\right) = 10.5\left(\frac{1.645}{5\%}\right)^2$

Please note that many times it's advantageous not to expand $\left(\frac{y}{k}\right)^2$. For example, in this problem it's not necessary to calculate $10.5\left(\frac{1.645}{5\%}\right)^2 = 11{,}365.305$.

Let's continue. $10.5\left(\frac{1.645}{5\%}\right)^2$ is the number of insureds needed for full credibility. However, the number of insureds in the book of business is 2500:

$Z = \sqrt{\frac{\text{your } r}{r\text{ to make } Z=1}} = \sqrt{\frac{2500}{10.5\left(\frac{1.645}{5\%}\right)^2}} = \frac{50}{\sqrt{10.5}\left(\frac{1.645}{5\%}\right)} = 0.469$

Nov 2002 #14

You are given the following information about a commercial auto liability book of business:
- Each insured's claim count has a Poisson distribution with mean $\lambda$, where $\lambda$ has a gamma distribution with $\alpha = 1.5$ and $\theta = 0.2$.
- Individual claim size amounts are independent and exponentially distributed with mean 5000.
- The full credibility standard is for the aggregate losses to be within 5% of the expected with probability 0.9.

Using classical credibility, determine the expected number of claims required for full credibility.

Solution

$\underbrace{r\,E(N)}_{\substack{\text{the expected # of claims the insurer needs}\\ \text{in its book of business for full credibility}}} = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]$, with $y = \Phi^{-1}\left(\frac{1+90\%}{2}\right) = 1.645$ and $k = 5\%$

N is the annual number of claims incurred by one insured. $N|\lambda$ is Poisson and $\lambda$ is gamma with $\alpha = 1.5$ and $\theta = 0.2$. So N is negative binomial with parameters $r = \alpha = 1.5$ and $\beta = \theta = 0.2$:

$E(N) = r\beta$, $\quad Var(N) = r\beta(1+\beta)$, $\quad \frac{Var(N)}{E(N)} = 1+\beta = 1+0.2 = 1.2$

X is exponentially distributed, so $\frac{Var(X)}{E^2(X)} = 1$.

$r\,E(N) = \left(\frac{1.645}{5\%}\right)^2\left(1 + 1.2\right) = 2{,}381.3$

So the insurer needs to have at least 2,381 expected claims in a year to have full credibility.

Please note that the following information is not necessary to solve the problem:
- $\alpha = 1.5$ (one parameter of the gamma distribution). If N is negative binomial, then $\frac{Var(N)}{E(N)} = 1+\beta$ regardless of $r$.
- The mean 5000 of the individual claim size random variable. If X is exponential, then $\frac{Var(X)}{E^2(X)} = 1$ regardless of the mean.

Nov 2003 #3

You are given:
- The number of claims has a Poisson distribution.
- Claim sizes have a Pareto distribution with parameters $\theta = 0.5$ and $\alpha = 6$.
- The number of claims and claim sizes are independent.
- The observed pure premium should be within 2% of the expected pure premium 90% of the time.

Determine the expected number of claims needed for full credibility.

Solution

The pure premium is the expected total annual claim dollar amount incurred by one policyholder. Setting $r = 1$, we have:

$E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] = \left(\frac{y}{k}\right)^2\left[\frac{E\left(X^2\right)}{E^2(X)} - 1 + \frac{Var(N)}{E(N)}\right]$

The claim size X has a Pareto distribution with parameters $\theta = 0.5$ and $\alpha = 6$:

$\frac{E\left(X^2\right)}{E^2(X)} = \frac{2(\alpha-1)}{\alpha-2} = \frac{2(6-1)}{6-2} = 2.5$

N is Poisson, so $\frac{Var(N)}{E(N)} = 1$.

$E(N) = \left(\frac{1.645}{2\%}\right)^2\left(2.5 - 1 + 1\right) = \left(\frac{1.645}{2\%}\right)^2(2.5) = 16{,}912.66$

Nov 2004 #21

You are given:
- The number of claims has probability function

$p(x) = \binom{m}{x} q^x (1-q)^{m-x}$, $\quad x = 0, 1, 2, \ldots, m$

- The actual number of claims must be within 1% of the expected number of claims with probability 0.95.
- The expected number of claims for full credibility is 34,574.

Determine q.

Solution

This problem is concerned only with loss frequency. So in the aggregate loss model $S = \sum_{i=1}^{N} X_i$, we set $X_i = 1$. This way, $S = N$ becomes the total number of claims. Setting $Var(X) = Var(1) = 0$, we have:

$r\,E(N) = \left(\frac{y}{k}\right)^2\,\frac{Var(N)}{E(N)}$

Plugging in the numbers: $p = 95\%$ and $k = 1\%$, so $y = 1.96$. N is binomial, so

$\frac{Var(N)}{E(N)} = \frac{mq(1-q)}{mq} = 1-q$

$r\,E(N) = \left(\frac{1.96}{1\%}\right)^2(1-q) = 38{,}416\,(1-q) = 34{,}574$

$\Rightarrow 1-q = 0.9$, $\quad q = 0.1$

May 2005 #2

You are given:
- The number of claims follows a negative binomial distribution with parameters $r$ and $\beta = 3$.
- Claim severity has the following distribution:

Claim Size   Probability
1            0.4
10           0.4
100          0.2

- The number of claims is independent of the severity of claims.

Determine the expected number of claims needed for aggregate losses to be within 10% of the expected aggregate losses with 95% probability.

Solution

You can verify that $E(X) = 24.4$ and $Var(X) = 1{,}445.04$.

The claim count N is negative binomial, so $\frac{Var(N)}{E(N)} = 1+\beta = 1+3 = 4$.

$r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] = \left(\frac{1.96}{10\%}\right)^2\left(\frac{1{,}445.04}{24.4^2} + 4\right) = 2{,}469.06$

Nov 2005 #35

You are given:
- The number of claims follows a Poisson distribution.
- Claim sizes follow a gamma distribution with parameters $\alpha$ (unknown) and $\theta = 10{,}000$.
- The number of claims and claim sizes are independent.
- The full credibility standard has been selected so that actual aggregate losses will be within 10% of the expected aggregate losses 95% of the time.

Using limited fluctuation (classical) credibility, determine the expected number of claims required for full credibility.

Solution

$r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]$

N is Poisson, so $\frac{Var(N)}{E(N)} = 1$:

$r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + 1\right] = \left(\frac{y}{k}\right)^2\,\frac{Var(X) + E^2(X)}{E^2(X)} = \left(\frac{y}{k}\right)^2\,\frac{E\left(X^2\right)}{E^2(X)}$

X is gamma. From the Exam C table, we know $E(X) = \alpha\theta$ and $E\left(X^2\right) = \alpha(\alpha+1)\theta^2$, so

$r\,E(N) = \left(\frac{1.96}{10\%}\right)^2\,\frac{\alpha(\alpha+1)\theta^2}{(\alpha\theta)^2} = \left(\frac{1.96}{10\%}\right)^2\,\frac{\alpha+1}{\alpha}$

Since $\alpha$ is unknown, we don't have enough information to find $r\,E(N)$.

Chapter 9   Bayesian estimate

Exam C routinely tests Bayesian premium problems. Though many seem to understand the theory behind Bayesian premiums, they have trouble calculating them. Most candidates are weak in the following two areas:

- When the prior probability is continuous, many candidates don't know how to calculate the posterior probability or how to find the Bayesian premium. Continuous-prior problems are typically harder than discrete-prior problems.
- When the prior probability is discrete and the calculation is messy, many candidates don't know how to solve the problem in a few minutes. Many candidates have inefficient calculation methods that are long and prone to errors.

In this chapter, I will first give you an intuitive review of Bayes' Theorem. Next, I will give you a framework for quickly solving Bayesian premium problems, whether the prior probability is discrete or continuous. In addition, I will give you a BA II Plus/BA II Plus Professional shortcut for calculating Bayesian premiums when the prior probability is discrete.

Even if you are proficient in Bayes' Theorem, I recommend that you still go over the review. It is the foundation for the framework and shortcut to be presented later.

Intuitive review of Bayes' Theorem

Prior probability. Before anything happens, as our baseline analysis, we believe (based on existing information we have up to now, or using purely subjective judgment) that our total risk pool consists of several homogeneous groups. As a part of our baseline analysis, we also assume that these homogeneous groups have different sizes. Any insured person randomly chosen from the population is charged a weighted average premium.

As an over-simplified example, we can divide, by the aggressiveness of a person's driving habits, all insureds into two homogeneous groups: aggressive drivers and non-aggressive drivers. In regards to the sizes of these two groups, we assume (based on existing information we have up to now, or using purely subjective judgment) that aggressive insureds account for 40% of the total insureds and non-aggressive insureds account for the remaining 60%.

So for an average driver randomly chosen from the population, we charge a weighted average premium rate (we believe that an average driver has some aggressiveness and some non-aggressiveness):

Premium charged on a person randomly chosen from the population
= 40% × premium rate for an aggressive driver
+ 60% × premium rate for a non-aggressive driver

Posterior probability. Then, after a year, an event changed our belief about the makeup of the homogeneous groups for a specific insured. For example, we found that in one year one particular insured had three car accidents, while an average driver had only one accident in the same time period. So the three-accident insured definitely involves more risk than the average driver randomly chosen from the population. As a result, the premium rate for the three-accident insured should be higher than an average driver's premium rate.

The new premium rate we will charge is still a weighted average of the rates for the two homogeneous groups, except that we use a higher weighting factor for the aggressive driver's rate and a lower weighting factor for the non-aggressive driver's rate. For example, we can charge the following new premium rate:

Premium rate for a driver who had 3 accidents last year
= 67% × premium rate for an aggressive driver
+ 33% × premium rate for a non-aggressive driver

In other words, we still think this particular driver's risk consists of two risk groups, aggressive and non-aggressive, but we alter the sizes of these two risk groups for this specific insured. So instead of assuming that this person's risk consists of 40% of an aggressive driver's risk and 60% of a non-aggressive driver's risk, we assume that his risk consists of 67% of an aggressive driver's risk and 33% of a non-aggressive driver's risk.

How do we come up with the new group sizes (i.e. the new weighting factors)? There is a specific formula for calculating the new group sizes. For any given group:

Group size after an event
= K × (the group size before the event) × (this group's probability to make the event happen)

K is a scaling factor that makes the sum of the new sizes of all groups equal to 100%.

In our example above, this is how we got the new size for the aggressive group and the new size for the non-aggressive group. Suppose we know that the probability for an aggressive driver to have 3 car accidents in a year is 15%, and the probability for a non-aggressive driver to have 3 car accidents in a year is 5%. Then for the driver who had 3 accidents in a year,

the size of the aggressive risk for someone who had 3 accidents in a year
= K × (prior size of the aggressive risk)
× (probability of an aggressive driver having 3 car accidents in a year)
= K (40%)(15%)

the size of the non-aggressive risk for someone who had 3 accidents in a year
= K × (prior size of the non-aggressive risk)
× (probability of a non-aggressive driver having 3 car accidents in a year)
= K (60%)(5%)

K is a scaling factor such that the sum of the posterior sizes is equal to one. So

K (40%)(15%) + K (60%)(5%) = 1, $\quad K = \frac{1}{40\%(15\%) + 60\%(5\%)} = 11.11$

the size of the aggressive risk for someone who had 3 accidents in a year
= 11.11 (40%)(15%) = 66.67%

the size of the non-aggressive risk for someone who had 3 accidents in a year
= 11.11 (60%)(5%) = 33.33%

The above logic should make intuitive sense. The bigger the size of a group prior to the event, the higher the contribution this group makes to the event's occurrence; likewise, the bigger the probability for this group to make the event happen, the higher the contribution this group makes to the event's occurrence. So the product of the prior size of the group and the group's probability to make the event happen captures this group's total contribution to the event's occurrence.

If we assign the post-event size of a group proportional to the product of its prior size and its probability to make the event happen, we are really assigning the post-event size of a group proportional to this group's total contribution to the event's occurrence. Again, this should make sense.

Let's summarize the logic for finding the new size of each group in the following table:

Event: An insured had 3 accidents in a year.

                  A: Homogeneous groups        B: Before-event   C: Group's probability   D = K × B × C:
                  (also called segments,       group size        to make the event        Post-event group size
                  the components of a risk)                      happen
Aggressive                                     40%               15%                      K(40%)(15%) = (40% × 15%) / (40% × 15% + 60% × 5%) = 66.67%
Non-aggressive                                 60%               5%                       K(60%)(5%) = (60% × 5%) / (40% × 15% + 60% × 5%) = 33.33%
We can translate the above rule into a formal theorem:

If we divide the population into n non-overlapping groups $G_1, G_2, \ldots, G_n$ such that each element of the population belongs to one and only one group, then after the event E occurs,

$\Pr(G_i \mid E) = K\,\Pr(G_i)\Pr(E \mid G_i)$

K is a scaling factor such that $\Pr(G_1|E) + \Pr(G_2|E) + \cdots + \Pr(G_n|E) = 1$, i.e.

$K\left[\Pr(G_1)\Pr(E|G_1) + \Pr(G_2)\Pr(E|G_2) + \cdots + \Pr(G_n)\Pr(E|G_n)\right] = 1$

So $\quad K = \dfrac{1}{\Pr(G_1)\Pr(E|G_1) + \Pr(G_2)\Pr(E|G_2) + \cdots + \Pr(G_n)\Pr(E|G_n)}$

And $\quad \Pr(G_i|E) = \dfrac{\Pr(G_i)\Pr(E|G_i)}{\Pr(G_1)\Pr(E|G_1) + \Pr(G_2)\Pr(E|G_2) + \cdots + \Pr(G_n)\Pr(E|G_n)}$

$\Pr(G_i|E)$ is the conditional probability that $G_i$ has happened given that the event E happened, so it is called the posterior probability. $\Pr(G_i|E)$ can be conveniently interpreted as the new size of group $G_i$ after the event E happened. Intuitively, a probability can often be interpreted as a group size. For example, if the probability for a female to pass Course 4 is 55% and for a male 45%, we can say that the total pool of passing candidates consists of 2 groups, female and male, with respective sizes of 55% and 45%.

$\Pr(G_i)$ is the probability that $G_i$ happens prior to the event E's occurrence, so it's called the prior probability. $\Pr(G_i)$ can be conveniently interpreted as the size of group $G_i$ prior to the occurrence of E.

$\Pr(E|G_i)$ is the conditional probability that E will happen given that $G_i$ has happened. It is group $G_i$'s probability of making the event E happen. For example, say a candidate who has passed Course 3 has a 50% chance of passing Course 4, that is:

Pr(passing Course 4 | passing Course 3) = 50%

We can say that the people who passed Course 3 have a 50% chance of passing Course 4.
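The scaling-factor recipe is a one-liner in code. The sketch below is my own illustration (the function name `posterior` is hypothetical, not from the text): it multiplies each group's prior size by its probability of producing the event and rescales so the posterior sizes sum to one. Run on the aggressive/non-aggressive example above, it returns 66.67% and 33.33%.

```python
def posterior(prior, likelihood):
    """Bayes' Theorem for a discrete partition G1..Gn.

    prior[i]      = Pr(Gi)    (relative sizes are enough)
    likelihood[i] = Pr(E|Gi)  (each group's probability to make E happen)
    Returns the posterior sizes Pr(Gi|E), which sum to 1.
    """
    raw = [pr * lk for pr, lk in zip(prior, likelihood)]
    K = 1 / sum(raw)                 # scaling factor
    return [K * r for r in raw]

# Aggressive vs. non-aggressive drivers, given 3 accidents in a year:
print(posterior([0.40, 0.60], [0.15, 0.05]))   # [0.6667, 0.3333]
```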

How to calculate the discrete posterior probability

Before we jump into the formula, let's look at a sixth-grade level math problem, which requires zero knowledge about probability. If you understand this problem, you should have no trouble understanding Bayes' Theorem.

Problem 1

A rock is found to contain gold. It has 3 layers, each with a different density of gold. You are given:

- The top layer, which accounts for 80% of the mass of the rock, has a gold density of only 10% (i.e. the amount of gold contained in the top layer is equal to 10% of the mass of the top layer).
- The middle layer, which accounts for 15% of the rock's mass, has a gold density of 5%.
- The bottom layer, which accounts for only 5% of the rock's mass, has a gold density of 0.2%.

Questions

What is the rock's density of gold (i.e. what % of the rock's mass is gold)?

Of the total amount of gold contained in the rock, what % comes from the top layer? What % comes from the middle layer? What % comes from the bottom layer?

Solution

Let's set up a table to solve the problem. Assume that the mass of the rock is one (it can be 1 pound, 1 gram, or 1 ton; it doesn't matter).

A
Layer

2
3
4
5

Top
Middle
Bottom
Total

B
Mass of
the layer
0.80
0.15
0.05
1.00

C
Density of
gold in the
layer
10.0%
5.0%
0.2%

D=BC
Mass of gold
contained in the
layer
0.0800
0.0075
0.0001
0.0876

E=D/0.0876
Of the total amount of
gold in the rock, what %
comes from this layer?
91.3%
8.6%
0.1%
100%

As an example of the calculations in the above table,


Cell(D,2)=0.810%=0.08,
Cell(D,5)=0.0800+0.0075+0.0001=0.0876,
Cell(E,2)= 0.08/0.0876=91.3%.
So the rock has a gold density of 0.0876 (i.e. 8.76% of the mass of the rock is gold).
Of the total amount of gold contained in the rock, 91.3% of the gold comes from the top
layer, 8.6% of the gold comes from the middle layer, and the remaining 0.1% of the gold
comes from the bottom layers. In other words, the top layer contributes to 91.3% of the
gold in the rock, the middle layer 8.6%, and the bottom layer 0.1%.
The logic behind this simple math problem is exactly the same logic behind Bayes
Theorem.
Now let's change the problem into one about prior and posterior probabilities.

Problem 2

In underwriting life insurance applications for nonsmokers, an insurance company believes that there's an 80% chance that an applicant for life insurance qualifies for the standard nonsmoker class (which has the standard underwriting criteria and the standard premium rate); there's a 15% chance that an applicant qualifies for the preferred nonsmoker class (which has more stringent qualifying standards and a lower premium rate than the standard nonsmoker class); and there's a 5% chance that the applicant qualifies for the super preferred class (which has the highest underwriting standards and the lowest premium rate among nonsmokers).

According to medical statistics, different nonsmoker classes have different probabilities of having a specific heart-related illness:

- The standard nonsmoker class has a 10% chance of getting the specific heart disease.
- The preferred nonsmoker class has a 5% chance of getting the specific heart disease.
- The super preferred nonsmoker class has a 0.2% chance of getting the specific heart disease.

If a nonsmoking applicant was found to have this specific heart-related illness, what is the probability of this applicant coming from the standard risk class? From the preferred risk class? From the super preferred risk class?

Solution

The solution to this problem is exactly the same as the one to the rock problem.

Event: the applicant was found to have the specific heart disease.

Group             A: Before-event    B: This group's probability   C = A×B: After-event       D = C/0.0876: After-event
(or segment)      size of the group  of having the heart illness   size (not yet scaled)      size of the group (scaled)
Standard          0.80               10.0%                         0.0800                     91.3%
Preferred         0.15               5.0%                          0.0075                     8.6%
Super Preferred   0.05               0.2%                          0.0001                     0.1%
Total             1.00                                             0.0876                     100%

So if the applicant was found to have the specific heart disease, then:
- there's a 91.3% chance he comes from the standard risk class;
- there's an 8.6% chance he comes from the preferred risk class;
- there's a 0.1% chance he comes from the super preferred risk class.
Framework for calculating the discrete posterior probability

When calculating the discrete posterior probability, if the problem is tricky, try to set up the table as we did in Problems 1 and 2. Use this table to help you keep track of your data and work.

Problem 3

1% of the women at age 45 who participate in a study are found to have breast cancer. 80% of women with breast cancer will have a positive mammogram. 10% of women without breast cancer will also have a positive mammogram. One woman aged 45 who participated in the study was found to have a positive mammogram.

Calculate the probability that this woman has breast cancer.


Solution

This problem is tricky, and many folks won't be able to solve it correctly. To solve this problem, we need to correctly identify the following 3 items:

- What's the event?
- What are the distinct causes (i.e. segments) that can possibly produce the event? Make sure your causes are mutually exclusive (no two causes can happen simultaneously) and collectively exhaustive (there are no other causes).
- What is each cause's probability to produce the event?

Event: a woman (who participated in the study) is found to have a positive mammogram.

Causes of this event: two distinct causes, women with breast cancer and women without breast cancer. These are the two segments. In terms of the size of each segment, women with breast cancer account for 1% of the participants, and women without breast cancer account for 99%.

Each cause's probability to produce the event: women with breast cancer have an 80% chance of having a positive mammogram; women without breast cancer have a 10% chance of having a positive mammogram.

Next, we set up the following table:

Event: a woman in the study is found to have a positive mammogram.

Segment                       Segment's   Segment's probability   Segment's contribution   Segment's contribution %
(distinct causes)             size        to produce the event    amount to the event      (posterior probability)
Women with breast cancer      1%          80%                     1%(80%) = 0.008          0.008/0.107 = 7.48%
Women without breast cancer   99%         10%                     99%(10%) = 0.099         0.099/0.107 = 92.52%
Total                         100%                                0.107                    100%

So if a woman aged 45 who participated in the study is found to have a positive mammogram, she has a 7.48% chance of actually having breast cancer.

Problem 4 (SOA May 2003, Course 1, #31)

A health study tracked a group of persons for five years. At the beginning of the study, 20% were classified as heavy smokers, 30% as light smokers, and 50% as nonsmokers. Results of the study showed that light smokers were twice as likely as nonsmokers to die during the five-year study, but only half as likely as heavy smokers.

A randomly selected participant from the study died over the five-year period. Calculate the probability that the participant was a heavy smoker.

Solution

Let p = the probability that a nonsmoker will die during the next 5 years. Then:
- the probability that a light smoker will die during the next 5 years is 2p;
- the probability that a heavy smoker will die during the next 5 years is 4p.

Please note that we don't have enough information to calculate p. This shouldn't bother us: we don't need to know the value of p to solve the problem.

Event: a participant died during the 5-year period.

Segment        Segment   Segment's probability   Segment's             Segment's
               size      to produce the event    contribution amount   contribution %
Heavy smoker   20%       4p                      20%(4p) = 0.8p        0.8p/1.9p = 42.11%
Light smoker   30%       2p                      30%(2p) = 0.6p        0.6p/1.9p = 31.58%
Nonsmoker      50%       p                       50%(p) = 0.5p         0.5p/1.9p = 26.32%
Total          100%                              1.9p                  100.00%

The probability that the participant was a heavy smoker is 42.11%.
The probability that the participant was a light smoker is 31.58%.
The probability that the participant was a nonsmoker is 26.32%.

Moral of this problem:

In problems related to Bayes' Theorem, the absolute size of each segment doesn't matter; only the ratio of the segment sizes matters. Similarly, the absolute probability for each segment to produce the event doesn't matter; only the ratio of the probabilities matters.

If we are to solve this problem quickly, we can set up the following table:

Event: a participant died during the 5-year period.

Segment        Segment   Segment's probability   Segment's             Segment's
               size      to produce the event    contribution amount   contribution %
Heavy smoker   2         4                       2(4) = 8              8/19 = 42.11%
Light smoker   3         2                       3(2) = 6              6/19 = 31.58%
Nonsmoker      5         1                       5(1) = 5              5/19 = 26.32%
Total          10                                19                    100%

In the above table, we changed the segment sizes from 20%, 30%, and 50% to 2, 3, and 5. Similarly, we changed the segments' probabilities from 4p, 2p, and p to 4, 2, and 1. This speeds up our calculations. You can use this technique when taking the exam.
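Using the `posterior` helper sketched earlier, you can verify this scale-invariance directly: normalized and unnormalized inputs give identical posteriors.

```python
# Same posterior whether or not the inputs are normalized:
print(posterior([0.20, 0.30, 0.50], [0.04, 0.02, 0.01]))  # [0.4211, 0.3158, 0.2632]
print(posterior([2, 3, 5], [4, 2, 1]))                    # identical output
```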
Problem 5 (May 2000, #22)

You are given:
- A portfolio of independent risks is divided into two classes, Class A and Class B.
- There are twice as many risks in Class A as in Class B.
- The number of claims for each insured during a single year follows a Bernoulli distribution.
- Classes A and B have claim size distributions as follows:

Claim Size   Class A   Class B
50,000       0.60      0.36
100,000      0.40      0.64

- The expected number of claims per year is 0.22 for Class A and 0.11 for Class B.

One insured is chosen at random. The insured's loss for the two years combined is 100,000. Calculate the probability that the selected insured belongs to Class A.

Solution

This time, we'll use a formula-driven approach without a table. Let S represent the total claim dollar amount incurred by the randomly chosen insured during the 2-year period. We observe that $S = 100{,}000$. We are asked to find $P(A \mid S=100{,}000)$, the posterior probability that the insured belongs to Class A, given the observed total loss of $100,000 during the 2-year period.

Using either the conditional probability formula or Bayes' Theorem, we have:

$P(A \mid S=100{,}000) = \frac{P(A)\,P(S=100{,}000 \mid A)}{P(S=100{,}000)} = \frac{P(A)\,P(S=100{,}000 \mid A)}{P(A)\,P(S=100{,}000 \mid A) + P(B)\,P(S=100{,}000 \mid B)}$

There are twice as many risks in Class A as in Class B, so $P(A) = 2P(B)$:

$P(A \mid S=100{,}000) = \frac{1}{1 + \dfrac{P(B)\,P(S=100{,}000 \mid B)}{P(A)\,P(S=100{,}000 \mid A)}} = \frac{1}{1 + \dfrac{1}{2}\cdot\dfrac{P(S=100{,}000 \mid B)}{P(S=100{,}000 \mid A)}}$

Once again, you see that the posterior probability depends on:
- the ratio of $P(A)$ to $P(B)$, not their absolute amounts;
- the ratio of $P(S=100{,}000 \mid A)$ to $P(S=100{,}000 \mid B)$, not their absolute amounts.

So we need to find the ratio $\frac{P(S=100{,}000 \mid B)}{P(S=100{,}000 \mid A)}$. Here $P(S=100{,}000 \mid A)$ is the probability that Class A produces the observation (i.e. a Class A insured incurs a $100,000 total loss in 2 years).

We are told that the # of claims for Classes A and B is a Bernoulli random variable. Remember that a Bernoulli random variable is just a binomial random variable with $n = 1$ (only one trial). Let X represent the # of claims incurred by the insured in a year and let p represent the probability of the insured having a claim. Then $E(X) = p$. We are told that $E(X \mid A) = 0.22$, so $p_A = 0.22$; similarly, $E(X \mid B) = p_B = 0.11$.

So each year, Class A has either zero claims (probability 0.78) or one claim (probability 0.22); the claim amount is either 50,000 (probability 0.6) or 100,000 (probability 0.4). Each year, Class B has either zero claims (probability 0.89) or one claim (probability 0.11); the claim amount is either 50,000 (probability 0.36) or 100,000 (probability 0.64).

There are only 3 ways for Class A or B to produce $100,000 of claims in two years:
- a $50,000 claim in Year 1 and a $50,000 claim in Year 2;
- a $100,000 claim in Year 1 and no claim in Year 2;
- no claim in Year 1 and a $100,000 claim in Year 2.

$P(S=100{,}000 \mid A) = \left(0.22^2\right)\left(0.6^2\right) + 2(0.22)(0.78)(0.4) = 0.1547$

$P(S=100{,}000 \mid B) = \left(0.11^2\right)\left(0.36^2\right) + 2(0.11)(0.89)(0.64) = 0.1269$

$P(A \mid S=100{,}000) = \frac{1}{1 + \frac{1}{2}\cdot\frac{0.1269}{0.1547}} = 0.709$
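If you want to double-check the two-year convolution, here is a brute-force Python sketch (mine, not the author's method; it reuses the `posterior` helper from earlier). It enumerates the annual outcomes per class and reproduces $P(S=100{,}000|A) \approx 0.1547$, $P(S=100{,}000|B) \approx 0.1269$ and the posterior 0.709.

```python
def p_total_100k(p_claim, sev):
    """P(two-year total = 100,000) for a Bernoulli(p_claim) annual claim
    count with severity pmf sev = {amount: prob}."""
    # annual pmf of the year's claim amount (0 if no claim)
    year = {0: 1 - p_claim}
    for amt, pr in sev.items():
        year[amt] = year.get(amt, 0) + p_claim * pr
    # convolve two independent years
    return sum(p1 * p2 for a1, p1 in year.items()
               for a2, p2 in year.items() if a1 + a2 == 100_000)

pA = p_total_100k(0.22, {50_000: 0.60, 100_000: 0.40})   # 0.1547...
pB = p_total_100k(0.11, {50_000: 0.36, 100_000: 0.64})   # 0.1269...
print(posterior([2, 1], [pA, pB]))                       # [0.709..., 0.291...]
```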

How to calculate the continuous posterior probability

Problem 6 (continuous random variable)

You are tossing a coin. Not knowing p, the probability of a head showing up in one toss of the coin, you subjectively assume that p is uniformly distributed over $[0,1]$. Next, you do an experiment by tossing the coin 3 times. You find that, in this experiment, 2 out of 3 tosses are heads.

Calculate the posterior distribution of p.

Solution

Event: getting 2 heads out of 3 tosses.

Group: each value of p in $[0,1]$ is a group.
Before-event size of the group: $f(p) = 1$.
The group's probability to make the event happen: $C_3^2\,p^2(1-p)$.
After-event size of the group (not yet scaled): $C_3^2\,p^2(1-p)$.
After-event size of the group (scaled): $\dfrac{C_3^2\,p^2(1-p)}{\int_0^1 C_3^2\,p^2(1-p)\,dp}$.

The key to solving this problem is to understand that we have an infinite number of groups: each value of p ($0 \le p \le 1$) is a group. Because p is uniform over $[0,1]$, $f(p) = 1$. As a result, for a given group p, the before-event size is one. And for a given group p, this group's probability to make the event "getting 2 heads out of 3 tosses" happen is binomial: $C_3^2\,p^2(1-p)$. So the after-event size of the group is

after-event size = $k$ × (before-event group size) × (the group's probability to have 2 heads out of 3 tosses) = $k\,C_3^2\,p^2(1-p)$

$k$ is a scaling factor such that the sum of the after-event sizes of all the groups is equal to one. Since we have an infinite number of groups, we have to use integration to sum up the after-event sizes of the groups:

$k\int_0^1 C_3^2\,p^2(1-p)\,dp = 1 \quad\Rightarrow\quad k = \frac{1}{\int_0^1 C_3^2\,p^2(1-p)\,dp}$

Then the after-event size (i.e. the posterior density) is:

$k\,C_3^2\,p^2(1-p) = \frac{C_3^2\,p^2(1-p)}{\int_0^1 C_3^2\,p^2(1-p)\,dp} = \frac{p^2(1-p)}{\int_0^1 p^2(1-p)\,dp}$

It turns out that the posterior probability we just calculated is a Beta distribution.

Key point

The process for calculating the continuous posterior probability is the same as for calculating the discrete posterior probability. The only difference is this: you use integration for a continuous posterior, and summation for a discrete posterior.
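For a continuous prior, the normalizing integral can always be checked numerically. The following Python sketch is my own illustration (a grid approximation under the stated assumptions: uniform prior on [0,1] and the binomial likelihood above); it normalizes on a grid and agrees with the exact Beta(3,2) posterior $12p^2(1-p)$.

```python
def continuous_posterior(prior_pdf, likelihood, grid):
    """Grid approximation of the posterior density on [grid[0], grid[-1]]."""
    h = grid[1] - grid[0]
    raw = [prior_pdf(t) * likelihood(t) for t in grid]
    k = 1 / (sum(raw) * h)               # 1 / integral of the raw density
    return [k * r for r in raw]

n = 10_000
grid = [i / n for i in range(n + 1)]
post = continuous_posterior(lambda p: 1.0,                 # uniform prior
                            lambda p: 3 * p**2 * (1 - p),  # C(3,2) p^2 (1-p)
                            grid)
# Exact posterior is Beta(3,2): 12 p^2 (1-p). Check at p = 0.5:
print(post[n // 2], 12 * 0.5**2 * 0.5)   # both ~1.5
```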
Problem 7 (May 2000 #10)

The size of a claim for an individual insured follows an inverse exponential distribution with the following probability density function:

$f(x \mid \theta) = \frac{\theta\,e^{-\theta/x}}{x^2}, \quad x > 0$

The parameter $\theta$ has the prior distribution with the following probability density function:

$g(\theta) = \frac{e^{-\theta/4}}{4}, \quad \theta > 0$

One claim of size 2 has been observed for a particular insured. Which of the following is proportional to the posterior distribution of $\theta$?

Solution

The observation is $x = 2$. We need to find $g(\theta \mid x=2)$:

$\underbrace{g(\theta \mid x=2)}_{\text{posterior density}} = \underbrace{k}_{\text{scaling factor}}\ \underbrace{g(\theta)}_{\text{prior density}}\ \underbrace{f(x=2 \mid \theta)}_{\substack{\text{this group's density to}\\ \text{make the event happen}}}$

$f(x=2 \mid \theta) = \frac{\theta\,e^{-\theta/2}}{2^2} = \frac{\theta\,e^{-\theta/2}}{4}$

$g(\theta \mid x=2) = k\left(\frac{e^{-\theta/4}}{4}\right)\left(\frac{\theta\,e^{-\theta/2}}{4}\right) = \frac{k}{16}\,\theta\,e^{-3\theta/4}$

So the posterior distribution of $\theta$ is proportional to $\theta\,e^{-3\theta/4}$.

Here the problem didn't ask you to find the full posterior probability. If you have to find it, this is how. One way is to do integration. Assume $g(\theta \mid x=2) = K\,\theta\,e^{-3\theta/4}$. Because the total posterior probability should be one, we have:

$\int_0^{+\infty} g(\theta \mid x=2)\,d\theta = K\int_0^{+\infty}\theta\,e^{-3\theta/4}\,d\theta = 1$

To calculate $\int_0^{+\infty}\theta\,e^{-3\theta/4}\,d\theta$, set $y = \frac{3\theta}{4}$. Then $\theta = \frac{4}{3}y$ and $d\theta = \frac{4}{3}dy$:

$\int_0^{+\infty}\theta\,e^{-3\theta/4}\,d\theta = \int_0^{+\infty}\frac{4}{3}y\,e^{-y}\ \frac{4}{3}\,dy = \left(\frac{4}{3}\right)^2\int_0^{+\infty} y\,e^{-y}\,dy = \frac{16}{9}$

Here $\int_0^{+\infty} y\,e^{-y}\,dy = 1$ because $y\,e^{-y}$ is a simple gamma density. So $K = \frac{9}{16}$, and

$g(\theta \mid x=2) = \frac{9}{16}\,\theta\,e^{-3\theta/4}$

Another, quicker way to find the full expression of $g(\theta \mid x=2)$ is to notice that $\theta\,e^{-3\theta/4}$ is the kernel of a gamma distribution with parameters $\alpha = 2$ and scale $\frac{4}{3}$. If you look at the table for Exam C, you'll see the gamma pdf:

$f(x) = \frac{1}{\Gamma(2)\,(4/3)^2}\,x^{2-1}e^{-x/(4/3)} = \frac{9}{16}\,x\,e^{-3x/4}$

Problem (Nov 2004, #33)

You are given:
- In a portfolio of risks, each policyholder can have at most one claim per year.
- The probability of a claim for a policyholder during a year is q.
- The prior density is $\pi(q) = \dfrac{q^3}{0.07}$, $\quad 0.6 < q < 0.8$

A randomly selected policyholder has one claim in Year 1 and zero claims in Year 2. For this policyholder, determine the posterior probability that $0.7 < q < 0.8$.

Solution

The observation is $N_1 = 1$ and $N_2 = 0$. We are asked to find the posterior probability

$P\left(0.7 < q < 0.8 \mid N_1=1, N_2=0\right) = \int_{0.7}^{0.8} f\left(q \mid N_1=1, N_2=0\right)dq$

$f\left(q \mid N_1=1, N_2=0\right) = \frac{f(q)\,P\left(N_1=1, N_2=0 \mid q\right)}{P\left(N_1=1, N_2=0\right)} = \frac{f(q)\,P\left(N_1=1, N_2=0 \mid q\right)}{\int_{0.6}^{0.8} f(q)\,P\left(N_1=1, N_2=0 \mid q\right)dq}$

We assume $N_1$ and $N_2$ are independent given q, so:

$P\left(N_1=1, N_2=0 \mid q\right) = P\left(N_1=1 \mid q\right)P\left(N_2=0 \mid q\right) = q(1-q)$

$f(q)\,P\left(N_1=1, N_2=0 \mid q\right) = \frac{q^3}{0.07}\,q(1-q) = \frac{q^4 - q^5}{0.07}$

$f\left(q \mid N_1=1, N_2=0\right) = \frac{\left(q^4-q^5\right)/0.07}{\int_{0.6}^{0.8}\left(q^4-q^5\right)/0.07\ dq} = \frac{q^4-q^5}{\int_{0.6}^{0.8}\left(q^4-q^5\right)dq}$

$P\left(0.7<q<0.8 \mid N_1=1, N_2=0\right) = \frac{\int_{0.7}^{0.8}\left(q^4-q^5\right)dq}{\int_{0.6}^{0.8}\left(q^4-q^5\right)dq} = \frac{\left[\frac{1}{5}q^5 - \frac{1}{6}q^6\right]_{0.7}^{0.8}}{\left[\frac{1}{5}q^5 - \frac{1}{6}q^6\right]_{0.6}^{0.8}} = \frac{\frac{1}{5}\left(0.8^5-0.7^5\right) - \frac{1}{6}\left(0.8^6-0.7^6\right)}{\frac{1}{5}\left(0.8^5-0.6^5\right) - \frac{1}{6}\left(0.8^6-0.6^6\right)} = 0.5572$
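A quick numeric cross-check of the two integrals, using the exact antiderivative $\frac{1}{5}q^5 - \frac{1}{6}q^6$ (my own sketch):

```python
def F(q):
    # antiderivative of q^4 - q^5
    return q**5 / 5 - q**6 / 6

num = F(0.8) - F(0.7)    # integral over (0.7, 0.8)
den = F(0.8) - F(0.6)    # integral over (0.6, 0.8)
print(num / den)         # 0.5572...
```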

Nov 2001 #34

You are given:
- The # of claims for each policyholder follows a Poisson distribution with mean $\lambda$.
- The distribution of $\lambda$ across all policyholders has probability density function $\pi(\lambda) = \lambda\,e^{-\lambda}$, $\quad \lambda > 0$
- $\int_0^{\infty}\lambda\,e^{-n\lambda}\,d\lambda = \dfrac{1}{n^2}$

A randomly selected policyholder is known to have had at least one claim last year. Determine the posterior probability that this same policyholder will have at least one claim this year.

Solution

The observation is $N_1 \ge 1$. We are asked to find $P\left(N_2 \ge 1 \mid N_1 \ge 1\right)$. If we ignore $N_1 \ge 1$, then by conditioning on $\lambda$, we have:

$P\left(N_2 \ge 1\right) = \int_{\lambda=0}^{\infty} P\left(N_2 \ge 1 \mid \lambda\right)\pi(\lambda)\,d\lambda$

$N_2 \mid \lambda$ is a Poisson random variable with mean $\lambda$, so $P\left(N_2 \ge 1 \mid \lambda\right) = 1 - P\left(N_2 = 0 \mid \lambda\right) = 1 - e^{-\lambda}$:

$P\left(N_2 \ge 1\right) = \int_{\lambda=0}^{\infty}\left(1 - e^{-\lambda}\right)\pi(\lambda)\,d\lambda$

The observation $N_1 \ge 1$ changes the above equation to:

$P\left(N_2 \ge 1 \mid N_1 \ge 1\right) = \int_{\lambda=0}^{\infty}\left(1 - e^{-\lambda}\right)\pi\left(\lambda \mid N_1 \ge 1\right)d\lambda$

Next, we have:

$\pi\left(\lambda \mid N_1 \ge 1\right) = \frac{\pi(\lambda)\,P\left(N_1 \ge 1 \mid \lambda\right)}{\int_0^{\infty}\pi(\lambda)\,P\left(N_1 \ge 1 \mid \lambda\right)d\lambda} = \frac{\lambda e^{-\lambda}\left(1 - e^{-\lambda}\right)}{\int_0^{\infty}\lambda e^{-\lambda}\left(1 - e^{-\lambda}\right)d\lambda}$

$\int_0^{\infty}\lambda e^{-\lambda}\left(1 - e^{-\lambda}\right)d\lambda = \int_0^{\infty}\lambda e^{-\lambda}\,d\lambda - \int_0^{\infty}\lambda e^{-2\lambda}\,d\lambda = 1 - \frac{1}{2^2} = \frac{3}{4}$

$\Rightarrow \pi\left(\lambda \mid N_1 \ge 1\right) = \frac{4}{3}\,\lambda e^{-\lambda}\left(1 - e^{-\lambda}\right)$

$P\left(N_2 \ge 1 \mid N_1 \ge 1\right) = \frac{4}{3}\int_0^{\infty}\lambda e^{-\lambda}\left(1 - e^{-\lambda}\right)^2 d\lambda = \frac{4}{3}\int_0^{\infty}\lambda\left(e^{-\lambda} - 2e^{-2\lambda} + e^{-3\lambda}\right)d\lambda$

$= \frac{4}{3}\left(1 - \frac{2}{2^2} + \frac{1}{3^2}\right) = \frac{4}{3}\left(1 - \frac{1}{2} + \frac{1}{9}\right) = 0.8148$
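The same answer drops out of the given identity $\int_0^{\infty}\lambda e^{-n\lambda}d\lambda = 1/n^2$ directly; here is a two-line Python check (my own sketch):

```python
# P(N2>=1 | N1>=1) = (4/3) * [ I(1) - 2*I(2) + I(3) ], where I(n) = 1/n^2
I = lambda n: 1 / n**2
print(4 / 3 * (I(1) - 2 * I(2) + I(3)))   # 0.8148...
```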

Calculate the Bayesian premium when the prior probability is discrete

Next, I'll give you a framework for calculating Bayesian premiums. As I explain my framework, I will also give you a shortcut.

Framework for calculating discrete-prior Bayesian premiums

Step 1   Determine the observation.
Step 2   Discard the observation. Set up your partition equation.
Step 3   Consider the observation. Modify your partition equation obtained in Step 2. Change the prior probabilities to posterior probabilities.
Step 4   Use Bayes' Theorem to calculate the posterior probabilities.
Step 5   Calculate the final answer.

Let me use examples to illustrate this solution framework.


Problem 6 (Nov 2001 #7)

You are given the following information about six coins:

Coin   Probability of Heads
1-4    0.50
5      0.25
6      0.75

A coin is selected at random and then flipped repeatedly. $X_i$ denotes the outcome of the $i$-th flip, where 1 indicates heads and 0 indicates tails. The following sequence is obtained:

$S = \left\{X_1, X_2, X_3, X_4\right\} = \left\{1, 1, 0, 1\right\}$

Determine $E\left(X_5 \mid S\right)$ using Bayesian analysis.

Solution

Step 1   Determine the observation. This is easy; we are already told the observation is $S = \left\{X_1, X_2, X_3, X_4\right\} = \left\{1, 1, 0, 1\right\}$.

Step 2   Discard the observation. Set up the partition equation.

Now we're going to simplify the problem by purposely discarding the observation. So instead of calculating $E\left(X_5 \mid S\right)$, we'll just calculate $E\left(X_5\right)$. $X_5$ is the # of heads showing up in the fifth flip of the randomly chosen coin. $X_5$ is a binomial random variable with parameters $n = 1$ (one flip of the coin) and $p$ (the probability of a head showing up). Using the binomial distribution formula, we have:

$E\left(X_5\right) = n\,p = p$

However, the parameter p varies by coin type. For Coins 1-4, $p = 0.5$; for Coin 5, $p = 0.25$; and for Coin 6, $p = 0.75$. Because the coin is randomly chosen from Coins 1, 2, 3, 4, 5, and 6, we don't know which coin was chosen. So we'll need to partition $E\left(X_5\right)$ over the coin types:

$E\left(X_5\right) = E\left(X_5 \mid \text{Coin 1-4}\right)P\left(\text{Coin 1-4}\right) + E\left(X_5 \mid \text{Coin 5}\right)P\left(\text{Coin 5}\right) + E\left(X_5 \mid \text{Coin 6}\right)P\left(\text{Coin 6}\right)$

We already know that:

$E\left(X_5 \mid \text{Coin 1-4}\right) = P\left(\text{Coin 1-4 showing a head in one flip}\right) = 0.5$
$E\left(X_5 \mid \text{Coin 5}\right) = P\left(\text{Coin 5 showing a head in one flip}\right) = 0.25$
$E\left(X_5 \mid \text{Coin 6}\right) = P\left(\text{Coin 6 showing a head in one flip}\right) = 0.75$

$E\left(X_5\right) = 0.5\,P\left(\text{Coin 1-4}\right) + 0.25\,P\left(\text{Coin 5}\right) + 0.75\,P\left(\text{Coin 6}\right)$

We can go one step further and calculate $E\left(X_5\right)$. Though the problem doesn't specifically tell us $P\left(\text{Coin 1-4}\right)$, $P\left(\text{Coin 5}\right)$, and $P\left(\text{Coin 6}\right)$, we assume the coin is drawn uniformly, so each coin is equally likely to be chosen:

$P\left(\text{Coin 1-4}\right) = \frac{4}{6}$, $\quad P\left(\text{Coin 5}\right) = \frac{1}{6}$, $\quad P\left(\text{Coin 6}\right) = \frac{1}{6}$

$E\left(X_5\right) = 0.5\left(\frac{4}{6}\right) + 0.25\left(\frac{1}{6}\right) + 0.75\left(\frac{1}{6}\right) = 0.5$

Of course, this problem isn't as simple as this. Otherwise, everyone who has passed Exam P would pass Exam C.
Step 3   Consider the observation. Modify the equation obtained in Step 2. Change the prior probabilities to posterior probabilities.

We have found $E\left(X_5\right)$. The real problem, however, is to find $E\left(X_5 \mid S\right)$. So we'll need to modify the equation obtained in Step 2. The original partition equation (if we discard the observation) is:

$E\left(X_5\right) = E\left(X_5 \mid \text{Coin 1-4}\right)P\left(\text{Coin 1-4}\right) + E\left(X_5 \mid \text{Coin 5}\right)P\left(\text{Coin 5}\right) + E\left(X_5 \mid \text{Coin 6}\right)P\left(\text{Coin 6}\right)$

How to modify:

$E\left(X_5\right) \rightarrow E\left(X_5 \mid S\right)$
$P\left(\text{Coin 1-4}\right) \rightarrow P\left(\text{Coin 1-4} \mid S\right)$
$P\left(\text{Coin 5}\right) \rightarrow P\left(\text{Coin 5} \mid S\right)$
$P\left(\text{Coin 6}\right) \rightarrow P\left(\text{Coin 6} \mid S\right)$

Here the observation $S = \{1,1,0,1\}$ changes our equation. Because of this observation, we can no longer assume that the randomly chosen coin has a 4/6 chance of being Coin 1-4, a 1/6 chance of being Coin 5, and a 1/6 chance of being Coin 6; those probabilities would have been fine if we hadn't observed $S = \{1,1,0,1\}$. Now that we have this new information, we need to reevaluate the probability that the coin belongs to each type. So we replace the prior probabilities $P\left(\text{Coin 1-4}\right)$, $P\left(\text{Coin 5}\right)$, and $P\left(\text{Coin 6}\right)$ with the posterior probabilities $P\left(\text{Coin 1-4} \mid S\right)$, $P\left(\text{Coin 5} \mid S\right)$, and $P\left(\text{Coin 6} \mid S\right)$ respectively. In addition, we change $E\left(X_5\right)$ to $E\left(X_5 \mid S\right)$ to indicate that we are calculating the conditional expectation.

Now the new equation is:

$E\left(X_5 \mid S\right) = E\left(X_5 \mid \text{Coin 1-4}\right)P\left(\text{Coin 1-4} \mid S\right) + E\left(X_5 \mid \text{Coin 5}\right)P\left(\text{Coin 5} \mid S\right) + E\left(X_5 \mid \text{Coin 6}\right)P\left(\text{Coin 6} \mid S\right)$

$= 0.5\,P\left(\text{Coin 1-4} \mid S\right) + 0.25\,P\left(\text{Coin 5} \mid S\right) + 0.75\,P\left(\text{Coin 6} \mid S\right)$

Please note that our observation $S = \{1,1,0,1\}$ doesn't change how likely each coin actually is to produce a head in one flip. So the following three items are fixed regardless of our observation:

$E\left(X_5 \mid \text{Coin 1-4}\right) = 0.5$, $\quad E\left(X_5 \mid \text{Coin 5}\right) = 0.25$, $\quad E\left(X_5 \mid \text{Coin 6}\right) = 0.75$
Step 4   Calculate the posterior probabilities using Bayes' Theorem.

P(Coin 1-4 | S) = P(Coin 1-4) P(S | Coin 1-4) / P(S)
P(Coin 5 | S) = P(Coin 5) P(S | Coin 5) / P(S)
P(Coin 6 | S) = P(Coin 6) P(S | Coin 6) / P(S)

where

P(S) = P(Coin 1-4) P(S | Coin 1-4) + P(Coin 5) P(S | Coin 5) + P(Coin 6) P(S | Coin 6)

Detailed calculation:

P(S | Coin 1-4) = P(1,1,0,1 | Coin 1-4) = 0.5(0.5)(0.5)(0.5) = 0.5^4
P(S | Coin 5) = P(1,1,0,1 | Coin 5) = 0.25(0.25)(0.75)(0.25) = 0.25^3 (0.75)
P(S | Coin 6) = P(1,1,0,1 | Coin 6) = 0.75(0.75)(0.25)(0.75) = 0.75^3 (0.25)

P(S ∩ Coin 1-4) = P(Coin 1-4) P(S | Coin 1-4) = (4/6)(0.5^4)
P(S ∩ Coin 5) = P(Coin 5) P(S | Coin 5) = (1/6)(0.25^3)(0.75)
P(S ∩ Coin 6) = P(Coin 6) P(S | Coin 6) = (1/6)(0.75^3)(0.25)

P(S) = (4/6)(0.5^4) + (1/6)(0.25^3)(0.75) + (1/6)(0.75^3)(0.25)

P(Coin 1-4 | S) = (4/6)(0.5^4) / [(4/6)(0.5^4) + (1/6)(0.25^3)(0.75) + (1/6)(0.75^3)(0.25)] = 0.681

P(Coin 5 | S) = (1/6)(0.25^3)(0.75) / [(4/6)(0.5^4) + (1/6)(0.25^3)(0.75) + (1/6)(0.75^3)(0.25)] = 0.032

P(Coin 6 | S) = (1/6)(0.75^3)(0.25) / [(4/6)(0.5^4) + (1/6)(0.25^3)(0.75) + (1/6)(0.75^3)(0.25)] = 0.287

Step 5   The final result:

E(X5 | S) = 0.5 P(Coin 1-4 | S) + 0.25 P(Coin 5 | S) + 0.75 P(Coin 6 | S)
  = 0.5(0.681) + 0.25(0.032) + 0.75(0.287) = 0.564
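If you'd like to double-check a discrete-prior answer away from the exam table, the whole 5-step framework fits in a few lines of code. Below is a minimal Python sketch (an illustration of my own; the function name bayesian_premium is made up, not standard terminology): the raw posterior sizes are prior × likelihood, the normalizer is P(S), and the premium is the posterior-weighted conditional mean.

    def bayesian_premium(priors, likelihoods, cond_means):
        """Posterior-weighted conditional mean, e.g. E(X5 | S)."""
        raw = [p * L for p, L in zip(priors, likelihoods)]  # raw posterior sizes
        total = sum(raw)                                    # P(S), the normalizer
        posteriors = [r / total for r in raw]
        return sum(m * q for m, q in zip(cond_means, posteriors))

    priors = [4/6, 1/6, 1/6]            # P(Coin 1-4), P(Coin 5), P(Coin 6)
    likelihoods = [0.5**4,              # P(HHTH | Coin 1-4)
                   0.25**3 * 0.75,      # P(HHTH | Coin 5)
                   0.75**3 * 0.25]      # P(HHTH | Coin 6)
    cond_means = [0.5, 0.25, 0.75]      # E(X5 | coin type)

    print(round(bayesian_premium(priors, likelihoods, cond_means), 3))  # 0.564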


I recommend that initially you use the 5-step framework to calculate discrete-prior Bayesian premiums. Just copy what I did. Explicitly write out each of the 5 steps; don't skip a step. Solve as many problems as you need until you are proficient with the framework.

Once you are familiar with the 5-step process, let's learn how to improve it. We'll focus on improving Step 4 (calculating the posterior probabilities). If you have ever solved a Bayesian premium problem, you'll have discovered that Step 4 is long, tedious, and prone to errors. Take a look at Step 4 in Problem 4 and see how involved the calculation is. When taking the exam, you are really stressed. In addition, you have only about 3 minutes to solve a problem. If you follow the standard solution approach, chances are high that you'll mess up at least one step of your calculation. Then all your hard work is ruined; you won't be able to score a point.

Most exam candidates will mess up in Step 4. Let's find a better way to do Step 4.


What are we doing in Step 4? Two things. First, we calculate the raw posterior probabilities:

P(S ∩ Coin 1-4) = P(Coin 1-4) P(S | Coin 1-4) = (4/6)(0.5^4)
P(S ∩ Coin 5) = P(Coin 5) P(S | Coin 5) = (1/6)(0.25^3)(0.75)
P(S ∩ Coin 6) = P(Coin 6) P(S | Coin 6) = (1/6)(0.75^3)(0.25)

Next, we normalize these raw posterior probabilities. We do so by using a normalizing constant

k = 1 / P(S) = 1 / [P(Coin 1-4) P(S | Coin 1-4) + P(Coin 5) P(S | Coin 5) + P(Coin 6) P(S | Coin 6)]

After multiplying each raw posterior probability by this constant, the three posterior probabilities will nicely add up to one. Normalization is necessary; it's a part of Bayes' Theorem. However, it is a messy calculation, so ideally we'll want to avoid it.

It turns out that we really can avoid normalizing the raw posterior probabilities. To understand how to avoid normalization, let's formally present the question:

Problem -- Calculate E(X) given the following information:

    X = x    pX(x)
    0.5      (4/6)(0.5^4) k
    0.25     (1/6)(0.25^3)(0.75) k
    0.75     (1/6)(0.75^3)(0.25) k

Please note that E(X) is exactly E(X5 | S) in this problem.


We have seen this problem in the chapter on how to use the BA II Plus/Professional 1-V Statistics Worksheet. This is how we solved it without calculating k:

    X = x    pX(x)                                 Scaled-up pX(x)
    0.5      (4/6)(0.5^4) k = 0.041667 k           41,667
    0.25     (1/6)(0.25^3)(0.75) k = 0.001953 k    1,953
    0.75     (1/6)(0.75^3)(0.25) k = 0.017578 k    17,578

Next, we enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=0.5, Y01=41,667
X02=0.25, Y02=1,953
X03=0.75, Y03=17,578

Next, set your BA II Plus/Professional to the 1-V Statistics Worksheet. You do this by pressing 2ND STAT and then pressing ENTER repeatedly until your calculator displays "1-V."

Press the down arrow key. You should get: n = 61,198.
Press the down arrow key. You should get: X̄ = 0.56382970.

So E(X5 | S) ≈ X̄ = 0.564.

This result, calculated using the BA II Plus/Professional 1-V Statistics Worksheet, matches what we calculated in the 5-step process.

Now it's time for me to present my shortcut.
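Before the shortcut itself, here is the one fact that makes it work: multiplying every raw posterior weight by the same positive constant leaves the weighted average, and hence the Bayesian premium, unchanged. A quick numerical check in Python (my own illustration, reusing the coin numbers above):

    raw = [4/6 * 0.5**4, 1/6 * 0.25**3 * 0.75, 1/6 * 0.75**3 * 0.25]
    means = [0.5, 0.25, 0.75]

    def weighted_mean(weights):
        return sum(m * w for m, w in zip(means, weights)) / sum(weights)

    print(weighted_mean(raw))                            # 0.5638...
    print(weighted_mean([1_000_000 * w for w in raw]))   # identical result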


Event: the coin produces HHTH

    A         B              C                 D = B × C                          E = 1,000,000 × D   F
    Group     Before-event   This group's      After-event size of the group      Scaled-up raw       Conditional
    (Coin     size of the    probability to    (raw posterior probability)        posterior           mean
    Type)     group          produce HHTH                                         probability
    1-4       4/6            0.5^4             (4/6)(0.5^4) = 0.041667            41,667              0.50
    5         1/6            0.25^3 (0.75)     (1/6)(0.25^3)(0.75) = 0.001953     1,953               0.25
    6         1/6            0.75^3 (0.25)     (1/6)(0.75^3)(0.25) = 0.017578     17,578              0.75

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=0.5, Y01=41,667
X02=0.25, Y02=1,953
X03=0.75, Y03=17,578

Next, set your BA II Plus/Professional to the 1-V Statistics Worksheet. You do this by pressing 2ND STAT and then pressing ENTER repeatedly until your calculator displays "1-V."

Press the down arrow key. You should get: n = 61,198.
Press the down arrow key. You should get: X̄ = 0.56382970.

So E(X5 | S) ≈ X̄ = 0.564.


Better yet, we can round Column D to 4 decimal places. This is even faster:

Event: the coin produces HHTH

    A          B              C                 D = B × C                        E = 10,000 × D   F
    Group      Before-event   This group's      After-event size of the group    Scaled-up raw    Conditional
    (Coin      size of the    probability to    (raw posterior probability)      posterior        mean
    Type)      group          produce HHTH                                       probability
    I (1-4)    4/6            0.5^4             (4/6)(0.5^4) = 0.0417            417              0.50
    II (5)     1/6            0.25^3 (0.75)     (1/6)(0.25^3)(0.75) = 0.0020     20               0.25
    III (6)    1/6            0.75^3 (0.25)     (1/6)(0.75^3)(0.25) = 0.0176     176              0.75

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=0.5, Y01=417
X02=0.25, Y02=20
X03=0.75, Y03=176

Using the 1-V Statistics Worksheet, you should get: n = 613, X̄ = 0.56362153 ≈ 0.564.

Next, we'll practice this shortcut.


Problem 7 (May 2000 #7)
You are given the following information about two classes of risks:

Risks in Class A have a Poisson claim count distribution with a mean of 1.0 per year.
Risks in Class B have a Poisson claim count distribution with a mean of 3.0 per year.
Risks in Class A have an exponential severity distribution with a mean of 1.0 per year.
Risks in Class B have an exponential severity distribution with a mean of 3.0 per year.
Each class has the same number of risks.
Within each class, severities and claim counts are independent.
A risk is randomly selected and observed to have 2 claims during one year. The observed
claim amounts were 1.0 and 3.0. Calculate the posterior expected value of the aggregate
loss for this risk during the next year.
Solution

This is Bayes' Theorem applied in the context of compound loss distributions.



Conceptual framework

Let
S represent the aggregate claim dollar amount,
X represent the individual claim dollar amount,
N represent the # of claims.

Then S = X1 + X2 + … + XN. We are told that N and X are independent. In addition, the Xi are independent identically distributed. We have observed {N = 2, X1 = 1, X2 = 3}.

We are asked to calculate E(S | N = 2, X1 = 1, X2 = 3).

First, let's make things simple and forget about the condition N = 2, X1 = 1, X2 = 3. Then E(S) = E(N) E(X). Since the risk is randomly chosen from Class A and Class B, we have:

E(S) = E(S | A) P(A) + E(S | B) P(B)

The above formula is an Exam P concept. You shouldn't have trouble understanding it. Here P(A) and P(B) are prior probabilities, which are probabilities prior to our observation {N = 2, X1 = 1, X2 = 3}.

Next,

E(S | A) = E(N | A) E(X | A) = λA θA = 1(1) = 1
E(S | B) = E(N | B) E(X | B) = λB θB = 3(3) = 9

Here λA and λB are the Poisson means for claim counts for Classes A and B respectively, and θA and θB are the exponential mean claim amounts for Classes A and B respectively.

E(S) = P(A) + 9 P(B)

Now let's move to the more complex quantity E(S | N = 2, X1 = 1, X2 = 3). To calculate this amount, we'll still use the formula E(S) = P(A) + 9 P(B). However, we'll replace the prior probabilities P(A) and P(B) with the posterior probabilities

P(A | N = 2, X1 = 1, X2 = 3),   P(B | N = 2, X1 = 1, X2 = 3)

Our observation {N = 2, X1 = 1, X2 = 3} has changed our belief about how likely the risk is to come from Class A or Class B. So we'll no longer use the prior probabilities P(A) and P(B) to calculate E(S).

In addition, we'll replace E(S) with E(S | N = 2, X1 = 1, X2 = 3) to indicate that the expected aggregate claim amount is based on the observation {N = 2, X1 = 1, X2 = 3}.

Then our original partition equation becomes:

E(S | N = 2, X1 = 1, X2 = 3) = P(A | N = 2, X1 = 1, X2 = 3) + 9 P(B | N = 2, X1 = 1, X2 = 3)

Next, we'll need to use Bayes' Theorem to calculate the posterior probabilities:

P(A | N = 2, X1 = 1, X2 = 3)
  = P(A) P(N = 2, X1 = 1, X2 = 3 | A) / P(N = 2, X1 = 1, X2 = 3)
  = P(A) P(N = 2, X1 = 1, X2 = 3 | A) / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]

P(B | N = 2, X1 = 1, X2 = 3)
  = P(B) P(N = 2, X1 = 1, X2 = 3 | B) / P(N = 2, X1 = 1, X2 = 3)
  = P(B) P(N = 2, X1 = 1, X2 = 3 | B) / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]

If you understand the logic so far, you are in good shape. The remaining work is just the calculation.
Standard calculation

We'll calculate the probability of a Class A risk and a Class B risk each producing the observed outcome {N = 2, X1 = 1, X2 = 3}:

P(N = 2, X1 = 1, X2 = 3 | A) = P(N = 2 | A) f(1 | A) f(3 | A)
  = [e^(-λA) λA² / 2!] (e^(-1))(e^(-3)) = (e^(-1)/2!)(e^(-1))(e^(-3)) = (1/2) e^(-5) = 0.00337

P(N = 2, X1 = 1, X2 = 3 | B) = P(N = 2 | B) f(1 | B) f(3 | B)
  = [e^(-λB) λB² / 2!] [(1/3) e^(-1/3)] [(1/3) e^(-3/3)] = [e^(-3) 3²/2!] (1/9) e^(-1/3) e^(-1) = (1/2) e^(-13/3) = 0.00656

Next, we'll calculate the posterior probabilities:

P(A | N = 2, X1 = 1, X2 = 3)
  = P(A) P(N = 2, X1 = 1, X2 = 3 | A) / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]
  = 0.5(0.00337) / [0.5(0.00337) + 0.5(0.00656)] = 0.339

Similarly,

P(B | N = 2, X1 = 1, X2 = 3)
  = P(B) P(N = 2, X1 = 1, X2 = 3 | B) / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]
  = 0.5(0.00656) / [0.5(0.00337) + 0.5(0.00656)] = 0.661

Finally,

E(S | N = 2, X1 = 1, X2 = 3)
  = P(A | N = 2, X1 = 1, X2 = 3) + 9 P(B | N = 2, X1 = 1, X2 = 3)
  = 1(0.339) + 9(0.661) = 6.29


Shortcut

When taking the exam, you'll still need to understand the conceptual framework explained at the beginning of the solution. However, you'll skip the normalizing step and avoid the need to manually calculate the mean.

This is all you need when solving this problem under exam conditions:

Event: {N = 2, X1 = 1, X2 = 3}

    Group   Before-event   This group's probability        After-event size of    Scaled-up raw posterior      Conditional
            size of the    to produce the event            the group (raw         probability (multiply the    mean
            group                                          posterior prob.)       raw probability by 200,000)
    A       0.5            (e^(-1)/2!)(e^(-1))(e^(-3))     0.5(0.00337)           337                          λA θA = 1(1) = 1
                             = (1/2) e^(-5) = 0.00337
    B       0.5            (e^(-3) 3²/2!)(1/3 e^(-1/3))    0.5(0.00656)           656                          λB θB = 3(3) = 9
                             (1/3 e^(-1)) = (1/2) e^(-13/3)
                             = 0.00656

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=1, Y01=337
X02=9, Y02=656

You should get: n = 993, X̄ ≈ 6.28. So E(S | N = 2, X1 = 1, X2 = 3) ≈ 6.28.
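For off-exam verification, here is a short Python sketch of the same shortcut (my own illustration; poisson_pmf and exp_pdf are helper names I made up): the weights are the unnormalized posterior masses, and the values are the class conditional means E(S | class).

    from math import exp, factorial

    def poisson_pmf(lam, n):
        return exp(-lam) * lam**n / factorial(n)

    def exp_pdf(theta, x):
        return exp(-x / theta) / theta

    # likelihood of {N=2, X1=1, X2=3} under each class
    like_A = poisson_pmf(1, 2) * exp_pdf(1, 1) * exp_pdf(1, 3)
    like_B = poisson_pmf(3, 2) * exp_pdf(3, 1) * exp_pdf(3, 3)

    weights = [0.5 * like_A, 0.5 * like_B]   # raw posterior masses
    values = [1 * 1, 3 * 3]                  # E(S | A) = 1, E(S | B) = 9

    premium = sum(v * w for v, w in zip(values, weights)) / sum(weights)
    print(round(premium, 2))  # ≈ 6.29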

Problem 8 (Nov 2000 #3)

You are given the following for a dental insurance:

Claim counts for individual insureds follow a Poisson distribution.


Half of the insureds are expected to have 2 claims per year.
The other half of the insureds are expected to have 4 claims per year.


A randomly selected insured has made 4 claims in each of the first two policy years. Determine the Bayesian estimate of this insured's claim count in the next (third) policy year.
Solution

The observation is {N1 = 4, N2 = 4}. We are asked to find E(N3 | N1 = 4, N2 = 4).

The insured can belong to either Class A with λA = 2 or Class B with λB = 4. So if we don't worry about the observation {N1 = 4, N2 = 4}, we have:

E(N3) = E(N3 | A) P(A) + E(N3 | B) P(B) = λA P(A) + λB P(B) = 2 P(A) + 4 P(B)

Next, we'll modify the above partition equation by considering the observation {N1 = 4, N2 = 4}. We'll change the prior probabilities to posterior probabilities:

E(N3 | N1 = 4, N2 = 4) = 2 P(A | N1 = 4, N2 = 4) + 4 P(B | N1 = 4, N2 = 4)

Next, we need to calculate the posterior probabilities:

P(A | N1 = 4, N2 = 4) = P(A) P(N1 = 4, N2 = 4 | A) / P(N1 = 4, N2 = 4)
  = P(A) P(N1 = 4, N2 = 4 | A) / [P(A) P(N1 = 4, N2 = 4 | A) + P(B) P(N1 = 4, N2 = 4 | B)]

Similarly,

P(B | N1 = 4, N2 = 4) = P(B) P(N1 = 4, N2 = 4 | B) / [P(A) P(N1 = 4, N2 = 4 | A) + P(B) P(N1 = 4, N2 = 4 | B)]

Detailed calculations (if you use my shortcut, you'll avoid most of these calculations):

P(N1 = 4, N2 = 4 | A) = P(N1 = 4 | A) P(N2 = 4 | A) = [e^(-λA) λA⁴ / 4!]² = [e^(-2) 2⁴ / 4!]²

P(N1 = 4, N2 = 4 | B) = P(N1 = 4 | B) P(N2 = 4 | B) = [e^(-λB) λB⁴ / 4!]² = [e^(-4) 4⁴ / 4!]²

P(A | N1 = 4, N2 = 4) = 0.5 [e^(-2) 2⁴/4!]² / {0.5 [e^(-2) 2⁴/4!]² + 0.5 [e^(-4) 4⁴/4!]²} = 0.176

P(B | N1 = 4, N2 = 4) = 0.5 [e^(-4) 4⁴/4!]² / {0.5 [e^(-2) 2⁴/4!]² + 0.5 [e^(-4) 4⁴/4!]²} = 0.824

The above two calculations are nasty and prone to errors. Many candidates will mess up these calculations and won't score a point. Assuming you have done the calculation right, you should get:

E(N3 | N1 = 4, N2 = 4) = 2 P(A | N1 = 4, N2 = 4) + 4 P(B | N1 = 4, N2 = 4)
  = 2(0.176) + 4(0.824) = 3.648


What you should do in the exam room

Just set up the following table and let the BA II Plus/Professional 1-V Statistics Worksheet do the magic for you. Watch and relax.

Event: {N1 = 4, N2 = 4}

    Group   Before-event   This group's            After-event size of      Scale up raw   Conditional
            size of the    probability to          the group (raw           posterior      mean
            group          produce the event       posterior probability)   probability
    A       0.5            [e^(-2) 2⁴/4!]²         0.5 [e^(-2) 2⁴/4!]²                     λA = 2
    B       0.5            [e^(-4) 4⁴/4!]²         0.5 [e^(-4) 4⁴/4!]²                     λB = 4

Next, we'll need to scale the raw posterior probabilities up. We'll want to avoid the error-prone calculation of the two raw posterior probabilities

0.5 [e^(-2) 2⁴/4!]²   and   0.5 [e^(-4) 4⁴/4!]²

Remember what I said earlier when I was explaining Bayes' Theorem to you: what matters is the ratio of these two (or more) raw posterior probabilities, not their absolute amounts.

So we'll rescale the two raw posterior probabilities by dividing each by the first one:

0.5 [e^(-2) 2⁴/4!]² ÷ 0.5 [e^(-2) 2⁴/4!]² = 1

0.5 [e^(-4) 4⁴/4!]² ÷ 0.5 [e^(-2) 2⁴/4!]² = (4⁴/2⁴)² (e^(-4)/e^(-2))² = (2⁴)² e^(-4) = 256 e^(-4) = 4.689

New Table

Event: {N1 = 4, N2 = 4}

    Group   Before-event   This group's         Raw posterior          Raw posterior          Scale up raw       Conditional
            size of the    probability to       probability            probability after      posterior prob.    mean
            group          produce the event                           simplification         (multiply by 1,000)
    A       0.5            [e^(-2) 2⁴/4!]²      0.5 [e^(-2) 2⁴/4!]²    1                      1,000              λA = 2
    B       0.5            [e^(-4) 4⁴/4!]²      0.5 [e^(-4) 4⁴/4!]²    256 e^(-4) = 4.689     4,689              λB = 4

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=2, Y01=1,000
X02=4, Y02=4,689

You should get: n = 5,689, X̄ ≈ 3.648. So E(N3 | N1 = 4, N2 = 4) ≈ 3.648.
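Here is the ratio trick in code form, a sketch of my own rather than an official algorithm: divide every raw posterior weight by the first one so the numbers stay small, then take the weighted average of the class means.

    from math import exp, factorial

    def poisson_pmf(lam, n):
        return exp(-lam) * lam**n / factorial(n)

    raw = [0.5 * poisson_pmf(2, 4)**2,   # Class A, two years of N = 4
           0.5 * poisson_pmf(4, 4)**2]   # Class B, two years of N = 4
    scaled = [w / raw[0] for w in raw]   # [1.0, 4.689...]
    means = [2, 4]

    premium = sum(m * w for m, w in zip(means, scaled)) / sum(scaled)
    print(round(premium, 3))  # ≈ 3.648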
Problem 9 (Nov 2000 #28)

Prior to observing any claims, you believed that claim sizes followed a Pareto distribution with parameters θ = 10 and α = 1, 2, or 3, with each value of α equally likely. You then observe one claim of 20 for a randomly selected risk. Determine the posterior probability that the next claim for this risk will be greater than 30.
Solution

The observation is X1 = 20. If we don't bother with this new information, then

P(X2 > 30) = P(X2 > 30 | α = 1) P(α = 1) + P(X2 > 30 | α = 2) P(α = 2) + P(X2 > 30 | α = 3) P(α = 3)

If you look at the Tables for Exam C/4, you'll see that the survival function of a two-parameter Pareto distribution with parameters α and θ is S(x) = [θ / (x + θ)]^α. Here the problem doesn't say whether the Pareto is one-parameter or two-parameter. One quick way to decide which to use is this:

If the random variable is greater than zero, use the two-parameter Pareto.
If the random variable is greater than a positive constant, use the one-parameter Pareto.

The problem just vaguely says that claim sizes follow a Pareto distribution. Here the claim size (i.e., the claim dollar amount) must be greater than zero. There's no reason for us to think that the claim dollar amount must exceed a positive constant (such as $500). As a result, we'll use the two-parameter Pareto.

Then for θ = 10, using the two-parameter Pareto survival function, we have:

P(X2 > 30 | α) = S(30) = [10 / (30 + 10)]^α = (1/4)^α

P(X2 > 30) = (1/4) P(α = 1) + (1/4)² P(α = 2) + (1/4)³ P(α = 3)
  = (1/4) P(α = 1) + (1/16) P(α = 2) + (1/64) P(α = 3)

Now the observation X1 = 20 will change the above calculation to:

P(X2 > 30 | X1 = 20) = (1/4) P(α = 1 | X1 = 20) + (1/16) P(α = 2 | X1 = 20) + (1/64) P(α = 3 | X1 = 20)

Next, we'll calculate the posterior probabilities. If you look at the Tables for Exam C/4, you'll find that the density function of a two-parameter Pareto distribution with parameters α and θ is:

f(x | α) = α θ^α / (x + θ)^(α+1)

Then for θ = 10:

f(20 | α) = α 10^α / (20 + 10)^(α+1) = (α/10)(1/3)^(α+1)

Then the posterior probabilities are:

P(α = 1 | X1 = 20) = P(α = 1) f(20 | α = 1) / f(20)
P(α = 2 | X1 = 20) = P(α = 2) f(20 | α = 2) / f(20)
P(α = 3 | X1 = 20) = P(α = 3) f(20 | α = 3) / f(20)

where

f(20) = P(α = 1) f(20 | α = 1) + P(α = 2) f(20 | α = 2) + P(α = 3) f(20 | α = 3)

Apply the formula f(20 | α) = α 10^α / (20 + 10)^(α+1) = (α/10)(1/3)^(α+1). Assuming you do the above calculation right, you'll find:

f(20) = 0.3704% + 0.2469% + 0.1235% = 0.7408%

P(α = 1 | X1 = 20) = 0.3704% / 0.7408% = 1/2
P(α = 2 | X1 = 20) = 0.2469% / 0.7408% = 1/3
P(α = 3 | X1 = 20) = 0.1235% / 0.7408% = 1/6

Then

P(X2 > 30 | X1 = 20)
  = (1/4) P(α = 1 | X1 = 20) + (1/16) P(α = 2 | X1 = 20) + (1/64) P(α = 3 | X1 = 20)
  = (1/4)(1/2) + (1/16)(1/3) + (1/64)(1/6) = 0.148

If you ever try to reproduce my answers, you'll find that the calculation outlined above is an absolute nightmare. In addition, I must acknowledge that I used an Excel spreadsheet to help me do the above calculations when preparing this manual. I must also acknowledge that there's little chance I would get the calculation right in the heat of the exam.

In the exam, I'll never use the standard approach above, which is prone to errors. Here is what you should do instead, dramatically reducing the complexity of the calculations:

What you should do in the exam room

Event: X1 = 20

    A        B              C                       D = B × C                 E                         F
    Group    Before-event   This group's density    After-event size of       Scale up raw posterior    P(X2 > 30 | α)
    (α)      size of the    to produce the event:   the group (raw            probability: multiply
             group          f(20 | α)               posterior probability)    the raw prob by
                                                                              3(30)(3²) = 810
    α = 1    1/3            (1/10)(1/3)²            (1/3)(1/10)(1/3)²         3                         1/4 = 0.25
    α = 2    1/3            (2/10)(1/3)³            (1/3)(2/10)(1/3)³         2                         (1/4)² = 0.0625
    α = 3    1/3            (3/10)(1/3)⁴            (1/3)(3/10)(1/3)⁴         1                         (1/4)³ = 0.015625

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01 = 0.25,     Y01 = 3
X02 = 0.0625,   Y02 = 2
X03 = 0.015625, Y03 = 1

You should get: n = 6, X̄ = 0.14843750. So P(X2 > 30 | X1 = 20) = 0.14843750.

You see how nice and easy the shortcut calculation is.
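To convince yourself the shortcut agrees with the standard approach, here is a Python sketch (my own illustration; pareto_pdf and pareto_sf are names I chose) that builds the posterior over α from the observed claim of 20 and then evaluates the posterior predictive probability.

    theta = 10.0
    alphas = [1, 2, 3]
    prior = 1 / 3  # each alpha equally likely

    def pareto_pdf(a, x):
        return a * theta**a / (x + theta)**(a + 1)

    def pareto_sf(a, x):  # survival function S(x)
        return (theta / (x + theta))**a

    weights = [prior * pareto_pdf(a, 20) for a in alphas]  # raw posteriors
    total = sum(weights)
    answer = sum(pareto_sf(a, 30) * w for a, w in zip(alphas, weights)) / total
    print(round(answer, 4))  # 0.1484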


May 2001 #10

The claim count and claim size distributions for risks of Type A are:

    # of claims   Probability        Claim size   Probability
    0             4/9                500          1/3
    1             4/9                1,235        2/3
    2             1/9

The claim count and claim size distributions for risks of Type B are:

    # of claims   Probability        Claim size   Probability
    0             1/9                250          2/3
    1             4/9                328          1/3
    2             4/9

Risks are equally likely to be Type A and Type B.
Claim counts and claim sizes are independent within each risk type.
The variance of the total losses is 296,962.

A randomly selected risk is observed to have total annual losses of 500.

Determine the Bayesian premium for the next year for this same risk.
Solution

Let S = X1 + X2 + … + XN represent the total annual loss. The observation is S1 = 500. We are asked to find E(S2 | S1 = 500). If we ignore the observation S1 = 500, then the problem becomes finding E(S2). Since the risk can be of either Type A or Type B, we'll condition S2 on the risk type:

E(S2) = E(S2 | A) P(A) + E(S2 | B) P(B)

Since E(S) = E(N) E(X):

E(S2 | A) = E(N2 | A) E(X | A),   E(S2 | B) = E(N2 | B) E(X | B)

E(N2 | A) = 0(4/9) + 1(4/9) + 2(1/9) = 6/9,   E(N2 | B) = 0(1/9) + 1(4/9) + 2(4/9) = 12/9

E(X | A) = 500(1/3) + 1235(2/3) = 990,   E(X | B) = 250(2/3) + 328(1/3) = 276

E(S2 | A) = E(N2 | A) E(X | A) = (6/9)(990) = 660
E(S2 | B) = E(N2 | B) E(X | B) = (12/9)(276) = 368

E(S2) = E(S2 | A) P(A) + E(S2 | B) P(B) = 660 P(A) + 368 P(B)

The observation S1 = 500 will change the above equation to:

E(S2 | S1 = 500) = 660 P(A | S1 = 500) + 368 P(B | S1 = 500)

P(A | S1 = 500) = P(A) P(S1 = 500 | A) / P(S1 = 500),   P(B | S1 = 500) = P(B) P(S1 = 500 | B) / P(S1 = 500)

We'll calculate the ratio:

P(A | S1 = 500) / P(B | S1 = 500) = [P(A) P(S1 = 500 | A)] / [P(B) P(S1 = 500 | B)]

The only way for Type A to incur a 500 loss in Year 1 is to have one claim of 500. The only way for Type B to incur a 500 loss in Year 1 is to have two claims of 250 each.

So P(S1 = 500 | A) = (4/9)(1/3),   P(S1 = 500 | B) = (4/9)(2/3)²

We are told that P(A) = P(B) = 0.5.

P(A | S1 = 500) / P(B | S1 = 500) = [0.5 (4/9)(1/3)] / [0.5 (4/9)(2/3)²] = 3/4

Because P(A | S1 = 500) + P(B | S1 = 500) = 1, we have:

P(A | S1 = 500) = 3/7,   P(B | S1 = 500) = 4/7

Finally,

E(S2 | S1 = 500) = 660 P(A | S1 = 500) + 368 P(B | S1 = 500) = 660(3/7) + 368(4/7) = 493.14

What you should do in the exam room

Event: S1 = 500

    Group     Before-event   This group's         After-event size of       Scale up raw posterior     E(S2 | Type)
              size of the    probability to       the group (raw            probability: multiply
              group          produce the event    posterior probability)    the raw prob by 81/2
    Type A    0.5            (4/9)(1/3)           0.5(4/9)(1/3) = 2/27      3                          660
    Type B    0.5            (4/9)(2/3)²          0.5(4/9)(2/3)² = 8/81     4                          368

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=660, Y01=3; X02=368, Y02=4.

You should get: n = 7, X̄ = 493.14. So E(S2 | S1 = 500) = 493.14.
Nov 2002 #39

You are given:

    Class   # of insureds   Claim Count Probabilities
                            0      1      2      3      4
    1       3000            1/3    1/3    1/3    0      0
    2       2000            0      1/6    2/3    1/6    0
    3       1000            0      0      1/6    2/3    1/6

A randomly selected insured has one claim in Year 1.

Determine the expected number of claims in Year 2 for that insured.
Solution

Conceptual framework

The observation is N1 = 1. We are asked to find E(N2 | N1 = 1). If we ignore the observation N1 = 1, then the problem becomes finding E(N2). Since N2 can be generated by each of the three classes, we'll condition N2 on the classes:

E(N2) = Σ E(N2 | Class i) P(Class i), summing over i = 1, 2, 3

E(N2 | Class 1) = 0(1/3) + 1(1/3) + 2(1/3) = 1
E(N2 | Class 2) = 1(1/6) + 2(2/3) + 3(1/6) = 2
E(N2 | Class 3) = 2(1/6) + 3(2/3) + 4(1/6) = 3

E(N2) = P(Class 1) + 2 P(Class 2) + 3 P(Class 3)

The observation N1 = 1 will change the above equation into:

E(N2 | N1 = 1) = P(Class 1 | N1 = 1) + 2 P(Class 2 | N1 = 1) + 3 P(Class 3 | N1 = 1)

P(Class 1 | N1 = 1) = P(Class 1) P(N1 = 1 | Class 1) / P(N1 = 1) = (3/6)(1/3) / P(N1 = 1)
P(Class 2 | N1 = 1) = P(Class 2) P(N1 = 1 | Class 2) / P(N1 = 1) = (2/6)(1/6) / P(N1 = 1)
P(Class 3 | N1 = 1) = P(Class 3) P(N1 = 1 | Class 3) / P(N1 = 1) = (1/6)(0) / P(N1 = 1) = 0

P(N1 = 1) = (3/6)(1/3) + (2/6)(1/6) + 0 = 2/9

P(Class 1 | N1 = 1) = (3/6)(1/3) / (2/9) = 3/4

P(Class 2 | N1 = 1) = (2/6)(1/6) / (2/9) = 1/4

E(N2 | N1 = 1) = P(Class 1 | N1 = 1) + 2 P(Class 2 | N1 = 1) + 3 P(Class 3 | N1 = 1)
  = 3/4 + 2(1/4) + 3(0) = 1.25

This is what you should do in the exam room

Event: N1 = 1

    Group     Before-event   This group's         After-event size of       Scale up raw posterior    E(N2 | Class)
    (class)   size of the    probability to       the group (raw            probability: multiply
              group          produce the event    posterior probability)    the raw prob by 18
    1         3/6            1/3                  (3/6)(1/3) = 1/6          3                         1
    2         2/6            1/6                  (2/6)(1/6) = 1/18         1                         2
    3         1/6            0                    0                         0                         3

Because the posterior probability of Class 3 producing N1 = 1 is zero, we can delete the last row.

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=1, Y01=3; X02=2, Y02=1.

You should get: n = 4, X̄ = 1.25. So E(N2 | N1 = 1) = 1.25.

Nov 2000 #33


A car manufacturer is testing the ability of safety devices to limit damage in car
accidents. You are given:

A test car has either front air bags or side air bags (but not both), each type being
equally likely
The test car will be driven into either a wall or a lake, with each accident type
being equally likely
The manufacturer randomly selects 1, 2, 3, or 4 crash test dummies to put into a
car with front air bags.
The manufacturer randomly selects 2 or 4 crash test dummies to put into a car with side air bags.
Each crash test dummy in a wall-impact accident suffers damage randomly equal
to either 0.5 or 1, with damage to each dummy being independent of damage to
the others.
Each crash test dummy in a lake-impact accident suffers damage randomly equal
to either 1 or 2, with damage to each dummy being independent of damage to the
others.

One test car is selected at random, and a test dummy accident produces total damage of 1.
Determine the expected value of the total damage for the next accident, given that the
kind of safety device (front or side air bags) and accident type (wall or lake) remain the
same.
Solution

This is one of the most feared exam problems. If you use the framework and shortcut, however, you should do just fine.

Conceptual framework

Damage S = X1 + X2 + … + XN, where X is the damage incurred by one test dummy and N is the number of dummies chosen for the crash test. The observation is S1 = 1. We are asked to find E(S2 | S1 = 1).

To simplify the problem, let's first discard the observation. Then the problem becomes finding E(S2). The crash testing falls into four types:

Front air bag, wall collision (FW)
Front air bag, lake collision (FL)
Side air bag, wall collision (SW)
Side air bag, lake collision (SL)

Next, we set up the partition equation:

E(S2) = E(S2 | FW) P(FW) + E(S2 | FL) P(FL) + E(S2 | SW) P(SW) + E(S2 | SL) P(SL)

Next, let's calculate E(S2 | FW). The manufacturer randomly selects 1, 2, 3, or 4 crash test dummies to put into a car with front air bags. Each count is equally likely to be chosen. So the expected number of dummies used for crash testing under FW is:

E(N | FW) = E(N | F) = (1 + 2 + 3 + 4)/4 = 2.5

If the car is tested in a wall collision, then the damage to a tested dummy is either 0.5 or 1, with each amount equally likely:

E(X | FW) = E(X | W) = (0.5 + 1)/2 = 0.75

E(S2 | FW) = E(N | FW) E(X | FW) = 2.5(0.75)

Similarly,

E(S2 | FL) = E(N | FL) E(X | FL) = [(1 + 2 + 3 + 4)/4] [(1 + 2)/2] = 2.5(1.5)

E(S2 | SW) = E(N | SW) E(X | SW) = [(2 + 4)/2] [(0.5 + 1)/2] = 3(0.75)

E(S2 | SL) = E(N | SL) E(X | SL) = [(2 + 4)/2] [(1 + 2)/2] = 3(1.5)

E(S2) = 2.5(0.75) P(FW) + 2.5(1.5) P(FL) + 3(0.75) P(SW) + 3(1.5) P(SL)

If we wanted to complete the above calculation, we would plug in

P(FW) = P(FL) = P(SW) = P(SL) = 0.25

This would produce the prior mean.

However, we are interested in finding the posterior mean E(S2 | S1 = 1). So we need to consider the impact of the observation S1 = 1. This observation will change the partition equation into:

E(S2 | S1 = 1) = 2.5(0.75) P(FW | S1 = 1) + 2.5(1.5) P(FL | S1 = 1) + 3(0.75) P(SW | S1 = 1) + 3(1.5) P(SL | S1 = 1)

Using Bayes' Theorem, we have:

P(FW | S1 = 1) = P(FW) P(S1 = 1 | FW) / P(S1 = 1)
P(FL | S1 = 1) = P(FL) P(S1 = 1 | FL) / P(S1 = 1)
P(SW | S1 = 1) = P(SW) P(S1 = 1 | SW) / P(S1 = 1)
P(SL | S1 = 1) = P(SL) P(S1 = 1 | SL) / P(S1 = 1)

where

P(S1 = 1) = P(FW) P(S1 = 1 | FW) + P(FL) P(S1 = 1 | FL) + P(SW) P(S1 = 1 | SW) + P(SL) P(S1 = 1 | SL)

The key is to calculate P(S1 = 1 | FW). In a front air bag, wall collision test, the number of dummies can be 1, 2, 3, or 4; the damage per dummy can be 0.5 or 1. So there are only 2 ways for FW to produce S1 = 1:

Two dummies were chosen, each suffering 0.5 damage. Probability: 0.25(0.5)(0.5).
One dummy was chosen, suffering 1 damage. Probability: 0.25(0.5).

Total probability: P(S1 = 1 | FW) = 0.25(0.5)(0.5) + 0.25(0.5) = 0.1875

We can apply the same logic and find (please verify my calculations):

P(S1 = 1 | FL) = 0.125,   P(S1 = 1 | SW) = 0.125,   P(S1 = 1 | SL) = 0

We are given that P(FW) = P(FL) = P(SW) = P(SL) = 0.25. Finally,

P(S1 = 1) = 0.25 (0.1875 + 0.125 + 0.125)

P(FW | S1 = 1) = 0.25(0.1875) / [0.25 (0.1875 + 0.125 + 0.125)] = 3/7

P(FL | S1 = 1) = 0.25(0.125) / [0.25 (0.1875 + 0.125 + 0.125)] = 2/7

P(SW | S1 = 1) = 0.25(0.125) / [0.25 (0.1875 + 0.125 + 0.125)] = 2/7

P(SL | S1 = 1) = 0.25(0) / [0.25 (0.1875 + 0.125 + 0.125)] = 0

Finally,

E(S2 | S1 = 1) = 2.5(0.75) P(FW | S1 = 1) + 2.5(1.5) P(FL | S1 = 1) + 3(0.75) P(SW | S1 = 1) + 3(1.5) P(SL | S1 = 1)
  = 2.5(0.75)(3/7) + 2.5(1.5)(2/7) + 3(0.75)(2/7) = 2.518

This is what you should do in the exam room

Event: S1 = 1

    Group    Before-event   This group's         After-event size of       Scale up raw posterior    E(S2 | group)
    (class)  size of the    probability to       the group (raw            probability: multiply
             group          produce the event    posterior probability)    the raw prob by 40,000
    FW       1/4            0.1875               (1/4)(0.1875)             1875                      2.5(0.75)
    FL       1/4            0.125                (1/4)(0.125)              1250                      2.5(1.5)
    SW       1/4            0.125                (1/4)(0.125)              1250                      3(0.75)
    SL       1/4            0                    0                         0                         3(1.5)

Because the posterior probability of SL producing S1 = 1 is zero, we can delete the last row.

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=2.5(0.75), Y01=1875;
X02=2.5(1.5),  Y02=1250;
X03=3(0.75),   Y03=1250.

You should get: n = 4375, X̄ = 2.518. So E(S2 | S1 = 1) = 2.518.
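Because the hard part of this problem is enumerating the ways each testing type can produce total damage 1, a brute-force check is reassuring. Below is a Python sketch (my own; p_total and the types dictionary are my constructions) that enumerates every dummy-count and damage combination.

    from itertools import product

    def p_total(counts, damages, target):
        """P(total damage == target); counts and damages uniform as in the problem."""
        p = 0.0
        for n in counts:
            for outcome in product(damages, repeat=n):
                if abs(sum(outcome) - target) < 1e-12:
                    p += (1 / len(counts)) * (1 / len(damages))**n
        return p

    types = {  # (dummy counts, per-dummy damages, E(S2 | type))
        "FW": ([1, 2, 3, 4], [0.5, 1], 2.5 * 0.75),
        "FL": ([1, 2, 3, 4], [1, 2],   2.5 * 1.5),
        "SW": ([2, 4],       [0.5, 1], 3 * 0.75),
        "SL": ([2, 4],       [1, 2],   3 * 1.5),
    }
    weights = {t: 0.25 * p_total(c, d, 1) for t, (c, d, _) in types.items()}
    total = sum(weights.values())
    premium = sum(types[t][2] * w for t, w in weights.items()) / total
    print(round(premium, 3))  # ≈ 2.518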


Problem 10 (May 2005 #35)

The # of claims on a given policy has a geometric distribution with parameter β. One-third of the policies have β = 2; the remaining two-thirds have β = 5.

A randomly selected policy had two claims in Year 1.

Calculate the Bayesian expected # of claims for the selected policy in Year 2.

Solution

The observation is N1 = 2. We are asked to find E(N2 | N1 = 2). If we don't worry about the observation N1 = 2, then

E(N2) = E(N2 | β = 2) P(β = 2) + E(N2 | β = 5) P(β = 5)

Because N2 has a geometric distribution, we have E(N2 | β) = β:

E(N2) = 2 P(β = 2) + 5 P(β = 5)

The observation N1 = 2 will change the above equation to

E(N2 | N1 = 2) = 2 P(β = 2 | N1 = 2) + 5 P(β = 5 | N1 = 2)

Next, we'll calculate the raw (un-normalized) posterior probabilities:

Event: a policy has 2 claims in Year 1.

    A        B              C                          D = B × C                   E                         F
    Group    Before-event   This group's probability   After-event size of the     Scale up raw posterior    E(N2 | β) = β
             size of the    to produce the event (a    group (raw posterior        probability: multiply
             group          geometric distribution):   probability)                the raw prob by 100,000
                            P(N1 = 2 | β) = β²/(1+β)³
    β = 2    1/3            2²/(1+2)³ = 4/27           (1/3)(4/27) = 0.04938       4,938                     2
    β = 5    2/3            5²/(1+5)³ = 25/216         (2/3)(25/216) = 0.07716     7,716                     5

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=2, Y01=4,938; X02=5, Y02=7,716.

You should get: X̄ ≈ 3.83. So E(N2 | N1 = 2) ≈ 3.83.
Problem 11 (Nov 2005, #15)

For a particular policy, the conditional probability of the annual number of claims given Θ = θ, and the probability distribution of Θ, are as follows:

    # of claims    0      1      2
    Probability    2θ     θ      1 - 3θ

    θ              0.10   0.30
    Probability    0.80   0.20

One claim was observed in Year 1.

Calculate the Bayesian estimate of the expected # of claims in Year 2.

Solution

The observation is X1 = 1. We are asked to find E(X2 | X1 = 1).

Ignoring this observation, we have:

E(X2) = E(X2 | θ = 0.1) P(θ = 0.1) + E(X2 | θ = 0.3) P(θ = 0.3)

E(X2 | θ) = 0(2θ) + 1(θ) + 2(1 - 3θ) = 2 - 5θ

E(X2 | θ = 0.1) = 2 - 5(0.1) = 1.5,   E(X2 | θ = 0.3) = 2 - 5(0.3) = 0.5

E(X2) = 1.5 P(θ = 0.1) + 0.5 P(θ = 0.3)

Considering the observation X1 = 1, we have:

E(X2 | X1 = 1) = 1.5 P(θ = 0.1 | X1 = 1) + 0.5 P(θ = 0.3 | X1 = 1)

Event: X1 = 1

    A          B              C                          D = B × C             E                         F
    Group      Before-event   This group's probability   After-event size of   Scale up raw posterior    E(X2 | θ)
               size of the    to produce the event       the group (raw        probability: multiply
               group          (the probability of one    posterior prob.)      the raw prob by 100
                              claim is θ)
    θ = 0.1    0.8            0.1                        0.8(0.1) = 0.08       8                         1.5
    θ = 0.3    0.2            0.3                        0.2(0.3) = 0.06       6                         0.5

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01=1.5, Y01=8; X02=0.5, Y02=6.

You should get: X̄ = 1.07142857. So E(X2 | X1 = 1) ≈ 1.07.


Calculate Bayesian premiums when the prior probability is continuous

The solution process for continuous-prior problems is similar to the process for discrete-prior problems. There are two major differences:

We use integration for continuous-prior problems, whereas we use summation for discrete-prior problems.

You can't use the BA II Plus/Professional 1-V Statistics Worksheet shortcut to solve a continuous-prior premium problem. In contrast, you can use the BA II Plus/Professional 1-V Statistics Worksheet shortcut to solve a discrete-prior premium problem.

To calculate the Bayesian premium when the prior probability is continuous:

Step 1    Determine the observation.
Step 2    Discard the observation. Set up your partition equation.
Step 3    Consider the observation. Modify your partition equation obtained in Step 2. Change the prior probability to the posterior probability.
Step 4    Use Bayes' Theorem and calculate the posterior probability.
Step 5    Calculate the final answer.

I'll illustrate this process with examples.


Problem 1 (May 2001 #37)

You are given the following information about workers compensation coverage:

The # of claims from an employee during the year follows a Poisson distribution with mean (100 - p)/100, where p is the salary (in thousands) for the employee.
The distribution of p is uniform on the interval [0, 100].

An employee is selected at random. No claims were observed for this employee during the year. Determine the posterior probability that the selected employee has a salary greater than 50.

Solution

Step 1   Determine the observation. This is N = 0. We are asked to find P(p > 50 | N = 0). Please note we are NOT asked to find P(N2 > 50 | N1 = 0).

Step 2   Ignore the observation. Set up your partition equation.

If we ignore the observation, we just need to find P(p > 50). Since p is uniform on the interval [0, 100], we have:

P(p > 50) = ∫ from 50 to 100 of f(p) dp

Step 3   Consider the observation. Modify the equation.

P(p > 50 | N = 0) = ∫ from 50 to 100 of f(p | N = 0) dp

Step 4   Calculate the posterior probability.

f(p | N = 0) = f(p) P(N = 0 | p) / P(N = 0) = f(p) P(N = 0 | p) / ∫ from 0 to 100 of f(p) P(N = 0 | p) dp

N | p is a Poisson random variable with mean λ = (100 - p)/100 = 1 - 0.01p. So

P(N = 0 | p) = e^(0.01p - 1),   f(p) P(N = 0 | p) = 0.01 e^(0.01p - 1)

P(N = 0) = ∫ from 0 to 100 of 0.01 e^(0.01p - 1) dp = e^(-1) ∫ from 0 to 100 of 0.01 e^(0.01p) dp = e^(-1)(e - 1) = 1 - e^(-1)

f(p | N = 0) = f(p) P(N = 0 | p) / P(N = 0) = 0.01 e^(0.01p - 1) / (1 - e^(-1)) = [0.01/(e - 1)] e^(0.01p)

Step 5   Calculate the final answer.

P(p > 50 | N = 0) = ∫ from 50 to 100 of f(p | N = 0) dp = [0.01/(e - 1)] ∫ from 50 to 100 of e^(0.01p) dp = (e - e^0.5)/(e - 1) = 0.622
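Since the worksheet shortcut no longer applies here, numerical integration is a handy way to check a continuous-prior answer. Here is a small Python sketch (my own illustration; the midpoint integrator is a crude stand-in for exact integration):

    from math import exp

    def likelihood(p):  # P(N = 0 | p) for Poisson mean (100 - p)/100
        return exp(-(100 - p) / 100)

    def integrate(f, a, b, n=100_000):
        h = (b - a) / n
        return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

    prior_density = 0.01  # uniform on [0, 100]
    numer = integrate(lambda p: prior_density * likelihood(p), 50, 100)
    denom = integrate(lambda p: prior_density * likelihood(p), 0, 100)
    print(round(numer / denom, 4))  # ≈ 0.6225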


Shortcut

Since N | p is a Poisson random variable with mean (100 - p)/100, we naturally set λ = (100 - p)/100. Since p is uniform over [0, 100], λ = (100 - p)/100 is uniform over [0, 1], so f(λ) = 1.

f(λ | N = 0) = f(λ) P(N = 0 | λ) / P(N = 0) = e^(-λ) / ∫ from 0 to 1 of e^(-λ) dλ = e^(-λ) / (1 - e^(-1))

Since λ = (100 - p)/100 = 1 - p/100, we have p = 100(1 - λ), and

p > 50 if and only if 100(1 - λ) > 50, i.e., λ < 0.5

P(p > 50 | N = 0) = P(λ < 0.5 | N = 0) = ∫ from 0 to 0.5 of e^(-λ)/(1 - e^(-1)) dλ = (1 - e^(-0.5))/(1 - e^(-1)) = 0.6225

Problem 13 (Nov 2005 #32)

You are given:
In a portfolio of risks, each policyholder can have at most two claims per year.
For each year, the distribution of the number of claims is:

    # of claims    Probability
    0              0.1
    1              0.9 - q
    2              q

The prior density is π(q) = q²/0.039, 0.2 < q < 0.5.

A randomly selected policyholder had two claims in Year 1 and two claims in Year 2. For this insured, determine the Bayesian estimate of the expected number of claims in Year 3.

Solution


Continuous-prior problems are harder than discrete-prior ones and many candidates are
scared of them. However, if you can follow the 5-step framework, youll be on the right
track.
The observation is ( N1 = 2, N 2 = 2 ) . We are asked to find E ( N 3 N1 = 2, N 2 = 2 ) .

Lets simplify the problem by discarding the observation ( N1 = 2, N 2 = 2 ) . Then our task
is to find prior mean E ( N 3 ) . This is an Exam P problem.
N 3 is distributed as follows:

+0
,
N 3 = -1
,2
.

with probability

0.1

with probability
with probability

0.9 - q
q

Here q is a random variable with pdf

(q) =

q2
, 0.2 < q < 0.5 . If q is fixed, then
0.039

the prior mean given q is:


E ( N 3 q ) = 0 ( 0.1) + 1( 0.9 q ) + 2 ( q ) = q + 0.9

Next, we take the expectation of the above equation regarding q :


Eq E ( N 3 q ) ! = Eq ( q + 0.9 ) = E ( q ) + 0.9

However, Eq E ( N 3 q ) ! = E ( N 3 ) -- this is the double expectation theorem.


E ( N 3 ) = E ( q ) + 0.9
0.5

0.5

q2
E ( q ) = q ( q ) dq = q
dq = 0.39
0.039
0.2
0.2
E ( N 3 ) = E ( q ) + 0.9 = 0.9 + 0.39 = 1.29
So the mean prior to the observation is 1.29. Please note that we dont need to calculate
the prior mean. I calculated it just to show you this: if you discard the observation, then
the problem becomes an Exam P problem.
Next, lets add in the observation. The observation ( N1 = 2, N 2 = 2 ) will change the
equation from E ( N 3 ) = E ( q ) + 0.9 to

E(N3 | N1 = 2, N2 = 2) = E(q | N1 = 2, N2 = 2) + 0.9

E(q | N1 = 2, N2 = 2) = ∫ from 0.2 to 0.5 of q f(q | N1 = 2, N2 = 2) dq

f(q | N1 = 2, N2 = 2) = π(q) P(N1 = 2, N2 = 2 | q) / P(N1 = 2, N2 = 2)
  = π(q) P(N1 = 2, N2 = 2 | q) / ∫ from 0.2 to 0.5 of π(q) P(N1 = 2, N2 = 2 | q) dq

P(N1 = 2, N2 = 2 | q) = q²,   π(q) = q²/0.039

f(q | N1 = 2, N2 = 2) = (q²/0.039) q² / ∫ from 0.2 to 0.5 of (q²/0.039) q² dq = q⁴ / ∫ from 0.2 to 0.5 of q⁴ dq

E(q | N1 = 2, N2 = 2) = ∫ from 0.2 to 0.5 of q⁵ dq / ∫ from 0.2 to 0.5 of q⁴ dq
  = [q⁶/6 evaluated from 0.2 to 0.5] / [q⁵/5 evaluated from 0.2 to 0.5] = 0.419

E(N3 | N1 = 2, N2 = 2) = E(q | N1 = 2, N2 = 2) + 0.9 = 0.419 + 0.9 = 1.32

Problem (Nov 2000 #23)

You are given:
The parameter Λ has an inverse gamma distribution with probability density function

g(λ) = 500 λ^(-4) e^(-10/λ),   λ > 0

The size of a claim has an exponential distribution with probability density function

f(x | Λ = λ) = λ^(-1) e^(-x/λ),   x > 0, λ > 0

For a single insured, two claims were observed that totaled 50. Determine the expected value of the next claim from the same insured.

Solution

We are asked to find E(X3 | X1 + X2 = 50). If we ignore the observation X1 + X2 = 50, then the problem becomes

E(X3) = ∫ x f(x) dx = ∫∫ x f(x | λ) g(λ) dλ dx = ∫∫ x (λ^(-1) e^(-x/λ)) g(λ) dλ dx

If we consider the observation, we'll need to change the prior density g(λ) to the posterior density g(λ | X1 + X2 = 50):

E(X3 | X1 + X2 = 50) = ∫∫ x (λ^(-1) e^(-x/λ)) g(λ | X1 + X2 = 50) dλ dx

Nov 2001 #14

For a group of insureds, you are given:
The amount of a claim is uniformly distributed, but will not exceed a certain unknown limit θ.
The prior distribution of θ is π(θ) = 500/θ², θ > 500.

Two independent claims of 400 and 600 are observed.

Determine the probability that the next claim will exceed 550.

Solution

The observation is X1 = 400, X2 = 600. We are asked to find P(X3 > 550 | X1 = 400, X2 = 600).

If we ignore the observation, then

P(X3 > 550) = ∫ from 500 to ∞ of P(X3 > 550 | θ) π(θ) dθ

X3 | θ is uniformly distributed over [0, θ]. So P(X3 > 550 | θ) = 1 - 550/θ for θ > 550.

Since we have the observation X1 = 400, X2 = 600, we will modify the above equation by changing the prior density π(θ) to the posterior density π(θ | X1 = 400, X2 = 600):

P(X3 > 550 | X1 = 400, X2 = 600) = ∫ from 600 to ∞ of P(X3 > 550 | θ) π(θ | X1 = 400, X2 = 600) dθ

Please note that we've also changed the lower limit of integration from 500 to 600, because we've observed X2 = 600: the posterior density is zero for θ < 600.

Next, we'll find the posterior density. For θ > 600,

π(θ | X1 = 400, X2 = 600) is proportional to π(θ) f(400 | θ) f(600 | θ) = (500/θ²)(1/θ)(1/θ), i.e., proportional to 1/θ⁴

To find the normalizing constant k such that π(θ | X1 = 400, X2 = 600) = k/θ⁴ integrates to one:

∫ from 600 to ∞ of k θ^(-4) dθ = k / (3 × 600³) = 1,   so k = 3(600³)

π(θ | X1 = 400, X2 = 600) = 3(600³)/θ⁴,   θ > 600

P(X3 > 550 | X1 = 400, X2 = 600) = ∫ from 600 to ∞ of (1 - 550/θ) 3(600³) θ^(-4) dθ
  = 3(600³) [1/(3 × 600³) - 550/(4 × 600⁴)] = 1 - (3/4)(550/600) = 0.3125

Nov 2002 #24

You are given:
The amount of a claim, X, is uniformly distributed on the interval [0, θ].
The prior distribution of θ is π(θ) = 500/θ², θ > 500.

Two claims, x1 = 400 and x2 = 600, are observed. You calculate the posterior distribution as:

π(θ | x1, x2) = 3(600³)/θ⁴,   θ > 600

Calculate the Bayesian premium E(X3 | x1, x2).

Solution

This problem is a recycled version of Nov 2001 #14.

E(X3 | x1, x2) = ∫ from 600 to ∞ of E(X3 | θ) π(θ | x1, x2) dθ

X3 | θ is uniform over [0, θ]. So E(X3 | θ) = θ/2.

E(X3 | x1, x2) = ∫ from 600 to ∞ of (θ/2) [3(600³)/θ⁴] dθ = (3/2)(600³) ∫ from 600 to ∞ of θ^(-3) dθ
  = (3/2)(600³) × 1/(2 × 600²) = 450
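The same posterior density 3(600³)/θ⁴ drives both this problem and the previous one, so one numerical check covers both. A Python sketch (my own illustration; the finite upper limit is a stand-in for infinity):

    def integrate(f, a, b, n=300_000):
        h = (b - a) / n
        return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

    B = 600.0
    posterior = lambda t: 3 * B**3 / t**4   # pi(theta | x1, x2), theta > 600
    UPPER = 600_000.0                       # numeric stand-in for infinity

    # Nov 2001 #14: P(X3 > 550 | data) with X3 | theta uniform on [0, theta]
    print(round(integrate(lambda t: (1 - 550 / t) * posterior(t), B, UPPER), 4))  # ≈ 0.3125
    # Nov 2002 #24: E(X3 | data) = E(theta/2 | data)
    print(round(integrate(lambda t: (t / 2) * posterior(t), B, UPPER), 1))        # ≈ 450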

May 2001 #18

You are given:
An individual automobile insured has annual claim frequencies that follow a Poisson distribution with mean λ.
An actuary's prior distribution for the parameter λ has probability density function:

π(λ) = (0.5) 5 e^(-5λ) + (0.5)(1/5) e^(-λ/5)

In the first policy year, no claims were observed for the insured.

Determine the expected # of claims in the 2nd policy year.

Solution

The observation is N1 = 0. We are asked to find E(N2 | N1 = 0). If we ignore the observation N1 = 0, then the problem becomes finding E(N2). Using the double expectation theorem, we have:

E(N2) = E_λ[E(N2 | λ)] = E(λ) = ∫ from 0 to ∞ of λ π(λ) dλ

If we consider the observation N1 = 0, the above equation becomes:

E(N2 | N1 = 0) = E(λ | N1 = 0) = ∫ from 0 to ∞ of λ π(λ | N1 = 0) dλ

So the key is to find the posterior distribution π(λ | N1 = 0).

π(λ | N1 = 0) = k π(λ) P(N1 = 0 | λ) = k [(0.5) 5 e^(-5λ) + (0.5)(1/5) e^(-λ/5)] e^(-λ)
  = k [(0.5) 5 e^(-6λ) + (0.5)(1/5) e^(-6λ/5)]
  = k [(5/6)(0.5)(6 e^(-6λ)) + (0.5)(1/6)((6/5) e^(-6λ/5))]

So π(λ | N1 = 0) is a mixture of two exponential distributions (with means 1/6 and 5/6).

Next, we'll need to find the normalizing constant k. The total probability should be one:

∫ from 0 to ∞ of π(λ | N1 = 0) dλ = k [(5/6)(0.5) + (0.5)(1/6)] = k(0.5) = 1,   so k = 2

π(λ | N1 = 0) = (5/6)(6 e^(-6λ)) + (1/6)((6/5) e^(-6λ/5))

E(N2 | N1 = 0) = E(λ | N1 = 0) = ∫ from 0 to ∞ of λ π(λ | N1 = 0) dλ = (5/6)(1/6) + (1/6)(5/6) = 0.278


Poisson-gamma model

Problem (May 2000, #30)

You are given:
An individual automobile insured has an annual claim frequency distribution that follows a Poisson distribution with mean λ.
λ follows a gamma distribution with parameters α and θ.
The 1st actuary assumes that α = 1 and θ = 1/6.
The 2nd actuary assumes the same mean for the gamma distribution, but only half the variance.
A total of one claim is observed for the insured over a 3-year period.
Both actuaries determine the Bayesian premium for the expected number of claims in the next year using their model assumptions.

Determine the ratio of the Bayesian premium that the 1st actuary calculates to the Bayesian premium that the 2nd actuary calculates.

Solution

If
N | λ is Poisson with mean λ,
λ follows a gamma distribution with parameters α and θ, and
n1, n2, …, nk claims are observed in Year 1, Year 2, …, Year k respectively,

then the conditional random variable λ | n1, n2, …, nk also follows a gamma distribution, with parameters

α* = α + n1 + n2 + … + nk = α + total # of claims observed

1/θ* = 1/θ + k = 1/θ + # of observation years

The Bayesian premium for the next year, Year k + 1, is

E(N_{k+1} | n1, n2, …, nk) = E(λ | n1, n2, …, nk) = α* θ*
  = (α + total # of claims observed) / (1/θ + # of observation years)

This theorem is tested over and over, and you should memorize it. If you want to see the proof of this theorem, refer to the textbook Loss Models.

In this problem,
the observation period = 3 years,
# of claims observed = 1.

1st actuary: α = 1, θ = 1/6. The Bayesian premium for the 4th year is

(α + total # of claims observed) / (1/θ + # of observation years) = (1 + 1)/(6 + 3) = 2/9

2nd actuary: You need to know that a gamma distribution with parameters α and θ has mean αθ and variance αθ². We are told that the two actuaries have the same mean, but the 2nd actuary has half the variance of the 1st:

αθ = 1(1/6) = 1/6,   αθ² = (1/2)(1)(1/6)²,   so α = 2 and θ = 1/12

The Bayesian premium for the 4th year is

(α + total # of claims observed) / (1/θ + # of observation years) = (2 + 1)/(12 + 3) = 1/5

So the ratio is (2/9) / (1/5) = 10/9.
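The update rule above is mechanical enough to capture in one line of code. Here is a Python sketch (my own illustration; the function name is mine) reproducing both actuaries' premiums and their ratio:

    def poisson_gamma_premium(alpha, theta, total_claims, years):
        # posterior is gamma with alpha* = alpha + claims, 1/theta* = 1/theta + years
        return (alpha + total_claims) / (1 / theta + years)

    p1 = poisson_gamma_premium(1, 1/6, total_claims=1, years=3)   # 2/9
    p2 = poisson_gamma_premium(2, 1/12, total_claims=1, years=3)  # 1/5
    print(p1 / p2)  # 10/9 ≈ 1.111...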

Nov 2001 #3

You are given:
The # of claims per auto insured follows a Poisson distribution with mean λ.
The prior distribution for λ has the following probability density function:

f(λ) = (500λ)^50 e^(-500λ) / (λ Γ(50))

A company observes the following claims experience:

                          Year 1   Year 2
    # of claims           75       210
    # of autos insured    600      900

The company expects to insure 1,100 autos in Year 3.

Determine the expected # of claims in Year 3.

Solution

The observation is N1 = 75, N2 = 210, where N1 is the # of claims in Year 1 for the 600 auto policies and N2 is the # of claims in Year 2 for the 900 auto policies. N1 has a Poisson distribution with mean 600λ. N2 has a Poisson distribution with mean 900λ.

We need to find E(N3 | N1 = 75, N2 = 210), where N3 is the # of claims in Year 3 for one auto policy. Then the expected # of auto claims in Year 3 for 1,100 auto policies is simply

1,100 E(N3 | N1 = 75, N2 = 210)

If we ignore the observation N1 = 75, N2 = 210, then E(N3) = E_λ[E(N3 | λ)] = E(λ).

We are told that

f(λ) = (500λ)^50 e^(-500λ) / (λ Γ(50))

If you look at the Tables for Exam C, you'll find the gamma pdf is:

f(x) = (x/θ)^α e^(-x/θ) / (x Γ(α))

You should immediately recognize that f(λ) is a gamma pdf with parameters α = 50 and θ = 1/500. Then using the gamma distribution formulas listed in the Tables for Exam C, we have

E(N3) = E(λ) = αθ = 50/500 = 0.1

If we consider the observation N1 = 75, N2 = 210, then we need to modify the formula E(N3) = E(λ) to E(N3 | N1 = 75, N2 = 210) = E(λ | N1 = 75, N2 = 210).

f(λ | N1 = 75, N2 = 210) is proportional to f(λ) P(N1 = 75 | λ) P(N2 = 210 | λ), i.e., proportional to

(λ^49 e^(-500λ)) (e^(-600λ) (600λ)^75) (e^(-900λ) (900λ)^210),

which is proportional to λ^(49 + 75 + 210) e^(-(500 + 600 + 900)λ) = λ^334 e^(-2000λ)

So λ | N1 = 75, N2 = 210 has a gamma distribution with parameters α* = 335 and θ* = 1/2,000.

E(N3 | N1 = 75, N2 = 210) = E(λ | N1 = 75, N2 = 210) = α* θ* = 335/2,000

Then the expected # of auto claims in Year 3 for 1,100 auto policies is simply

1,100 × 335/2,000 = 184.25

May 2001 #2

You are given:
Annual claim counts follow a Poisson distribution with mean λ.
The parameter λ has a prior distribution with probability density function:

f(λ) = (1/3) e^(-λ/3),   λ > 0

Two claims were observed during the 1st year. Determine the variance of the posterior distribution of λ.

Solution

Please note that the exponential distribution is a gamma distribution with parameter α = 1. So this is the Poisson-gamma model.

The observation is N1 = 2. We are asked to find the variance Var(λ | N1 = 2). We are told that N | λ is Poisson with mean λ, and λ is gamma with α = 1, θ = 3.

Then λ | N1 = 2 is also gamma, with updated parameters

α* = α + # of observed claims = 1 + 2 = 3

1/θ* = 1/θ + # of observation periods = 1/3 + 1,   so θ* = 0.75

Then Var(λ | N1 = 2) = α* (θ*)² = 3(0.75)² = 1.6875

Binomial-beta model

Problem (Nov 2000, #11)

For a risk, you are given:
The # of claims during a single year follows a Bernoulli distribution with mean p.
The prior distribution for p is uniform on the interval [0, 1].
The claims experience is observed for a number of years.
The Bayesian premium is calculated as 1/5 based on the observed claims.

Which of the following observed claims data could have yielded this calculation?

0 claims during 3 years
0 claims during 4 years
0 claims during 5 years
1 claim during 4 years
1 claim during 5 years

Solution

Please note that a uniform distribution is a special case of the beta distribution with parameters a = b = θ = 1. In addition, the Bernoulli distribution is a special case of the binomial distribution with n = 1.

Next, I'll give you the general binomial-beta formula.

If
X | p has a binomial distribution with parameters n and p,
p has a beta distribution with parameters a and b, and
x1, x2, …, xk claims are observed in Year 1, Year 2, …, Year k respectively (where each xi can be 0, 1, …, n),

then the conditional random variable p | x1, x2, …, xk also has a beta distribution, with parameters

a* = a + x1 + x2 + … + xk = a + total # of claims observed

b* = b + k n - (x1 + x2 + … + xk) = b + k n - total # of claims observed

The Bayesian premium for Year k + 1 is:

E(X_{k+1} | x1, x2, …, xk) = n E(p | x1, x2, …, xk) = n a* / (a* + b*)

Proof.

f(p | x1, x2, …, xk) = f(p) P(x1, x2, …, xk | p) / ∫ from 0 to 1 of f(p) P(x1, x2, …, xk | p) dp

where 1 / ∫ from 0 to 1 of f(p) P(x1, x2, …, xk | p) dp is a normalizing constant. So f(p | x1, x2, …, xk) is proportional to f(p) P(x1, x2, …, xk | p).

Next, let's find the beta pdf f(p). If you look at the Exam C table, you'll see that the beta distribution has the following pdf:

f(x) = [Γ(a + b) / (Γ(a) Γ(b))] u^a (1 - u)^(b-1) (1/x),   0 < x < θ,   u = x/θ

This pdf is really annoying: it has the variables u and x. To simplify the pdf, set θ = 1. Then u = x and 0 < x < 1. The pdf becomes:

f(x) = [Γ(a + b) / (Γ(a) Γ(b))] x^(a-1) (1 - x)^(b-1),   0 < x < 1

This is the most commonly used beta pdf. This is the one you should use for Exam C.

Back to the problem. Since p has a beta distribution with parameters a and b, the pdf is

f(p) = [Γ(a + b) / (Γ(a) Γ(b))] p^(a-1) (1 - p)^(b-1), which is proportional to p^(a-1) (1 - p)^(b-1).

Next, let's look at P(x1, x2, …, xk | p) = P(x1 | p) P(x2 | p) … P(xk | p). This is so because x1, x2, …, xk are independent identically distributed given p. For i = 1 to k, xi | p is binomial with parameters n and p. So P(xi | p) = C(n, xi) p^(xi) (1 - p)^(n - xi).

So P(x1, x2, …, xk | p) is proportional to

p^(x1) (1-p)^(n-x1) × p^(x2) (1-p)^(n-x2) × … × p^(xk) (1-p)^(n-xk) = p^(Σ xi) (1-p)^(k n - Σ xi)

Therefore f(p | x1, x2, …, xk) is proportional to

f(p) p^(Σ xi) (1-p)^(k n - Σ xi), which is proportional to p^(a-1) (1-p)^(b-1) p^(Σ xi) (1-p)^(k n - Σ xi) = p^(a + Σ xi - 1) (1-p)^(b + k n - Σ xi - 1)

We now see that f(p | x1, x2, …, xk) is a beta distribution with parameters

a* = a + x1 + x2 + … + xk,   b* = b + k n - (x1 + x2 + … + xk)

Next, we'll calculate E(X_{k+1} | x1, x2, …, xk), the Bayesian estimate for Year k + 1, using the 5-step framework.

We first discard the observation x1, x2, …, xk. Then E(X_{k+1} | x1, x2, …, xk) becomes E(X_{k+1}). Using the double expectation theorem, we have:

E(X_{k+1}) = E_p[E(X_{k+1} | p)] = E_p[n p] = n E(p)

Next, we consider the observation x1, x2, …, xk. We'll modify the above equation by changing the prior mean E(p) to the posterior mean E(p | x1, x2, …, xk). We already know that p | x1, x2, …, xk has a beta distribution with parameters

a* = a + x1 + x2 + … + xk,   b* = b + k n - (x1 + x2 + … + xk)

Looking up the beta expectation formula in the Exam C table, we have:

E(p | x1, x2, …, xk) = a* / (a* + b*)

Finally, we have:

E(X_{k+1} | x1, x2, …, xk) = n E(p | x1, x2, …, xk) = n a* / (a* + b*)

Now let's apply the binomial-beta formula to this problem. We are told that the # of claims in a year is a Bernoulli random variable, so the number of trials is n = 1. In addition, the prior distribution of p is uniform over [0, 1], which is a beta distribution with parameters a = b = 1.

Assume we have observed a total of Σ xi claims in k years. Then the Bayesian premium for the next year is:

E(X_{k+1} | x1, x2, …, xk) = n (a + Σ xi) / (a + b + k n) = (1)(1 + Σ xi) / (1 + 1 + k(1)) = (1 + Σ xi) / (2 + k)

We are told that E(X_{k+1} | x1, x2, …, xk) = 1/5:

(1 + Σ xi) / (2 + k) = 1/5

We have two unknowns in one equation, so we can't solve it directly. One way to find the right answer is to test each answer choice. If Σ xi = 0 and k = 3, we have (1 + Σ xi)/(2 + k) = 1/5. So zero claims during 3 years is the right answer.

Also see Problem #15, May 2007.
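Since the fastest way through this problem is to test each answer choice, here is a Python sketch (my own illustration; beta_binomial_premium is a name I made up) that runs all five choices through the binomial-beta premium formula:

    from fractions import Fraction

    def beta_binomial_premium(a, b, n, total_claims, years):
        a_star = a + total_claims
        b_star = b + years * n - total_claims
        return n * Fraction(a_star, a_star + b_star)

    choices = [(0, 3), (0, 4), (0, 5), (1, 4), (1, 5)]  # (claims, years)
    for claims, years in choices:
        premium = beta_binomial_premium(1, 1, 1, claims, years)
        print(claims, years, premium)  # 0 claims in 3 years gives 1/5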


Chapter 10   Claim payment per payment

2005 Exam M May #32

For an insurance:
Losses can be 100, 200 or 300 with respective probabilities 0.2, 0.2, and 0.6.
The insurance has an ordinary deductible of 150 per loss.
Y^P is the claim payment per payment random variable.

Calculate Var(Y^P).

(A) 1500   (B) 1875   (C) 2250   (D) 2625   (E) 3000

Core concepts:
Ground-up loss
Ordinary deductible
Claim payment
Claim payment per payment

Explanation

Let X represent the ground-up loss amount (the ground-up loss is the actual loss incurred by the policyholder). Let d, where d ≥ 0, represent the deductible.

Amount paid by the insurer (called the claim payment):

(X - d)+ = max(X - d, 0) = 0 if X ≤ d;   X - d if X > d

Amount the insured needs to pay out of his own pocket:

(X ∧ d) = min(X, d) = X if X ≤ d;   d if X > d

Please note that

X = (X - d)+ + (X ∧ d)

ground-up loss = amount paid by the insurance company + amount paid by the insured out of his own pocket


Example. Your deductible for your car insurance is $500. If you have an accident and the
loss is $600, you pay $500 out of your own pocket and your insurance company pays you
$100. In this case,

600
ground up loss

100
amount paid by the
insurance company

500
amount paid by the insured
out of his own pocket

However, if the loss is $400, then you pay all the loss and the insurance company pays
zero.
=

400
ground up loss

0
amount paid by the
insurance company

400
amount paid by the insured
out of his own pocket

Claim payment per payment

Let $Y$ represent the claim payment. Then $Y = (X - d)_+$. Claim payment per payment means $(Y \mid Y > 0)$. Evidently, if $X \le d$, then $Y = 0$. In this case, the insured covers the whole loss with his own money and won't need to report the loss to the insurance company, so the insurance company may not even know that a loss has occurred. For the insurance company to pay any claim, $Y$ must be positive. This is why the claim payment per payment is $(Y \mid Y > 0)$.
Full solution

Let $X$ represent the ground up loss. Let $Y$ represent the claim payment. The deductible is $d = 150$.

$$Y = (X - 150)_+ = \max(X - 150,\, 0), \qquad Y^P = Y \mid Y > 0$$

We are asked to find $\text{Var}(Y^P)$.

$$\text{Var}(Y^P) = \text{Var}(Y \mid Y > 0) = \text{Var}\left[(X - 150)_+ \mid X > 150\right]$$

$$= E\left[(X - 150)_+^2 \mid X > 150\right] - E^2\left[(X - 150)_+ \mid X > 150\right]$$

Please note that we write the second moment as $E\left[(X - 150)^2 \mid X > 150\right]$, not as $E(X - 150 \mid X > 150)^2$. The latter is not an appropriate symbol because it can be confused with $E^2(X - 150 \mid X > 150)$, the square of the conditional mean.

X              100    200    300
(X - 150)_+      0     50    150
P(X)            0.2    0.2    0.6

$$P(X > 150) = P(X = 200) + P(X = 300) = 0.8$$

Conditioning on $X > 150$ rescales each remaining probability by $P(X > 150)$, giving $\frac{0.2}{0.8}$ for X = 200 and $\frac{0.6}{0.8}$ for X = 300.

$$E\left[(X - 150)_+ \mid X > 150\right] = 50\left(\frac{0.2}{0.8}\right) + 150\left(\frac{0.6}{0.8}\right) = 125$$

$$E\left[(X - 150)_+^2 \mid X > 150\right] = 50^2\left(\frac{0.2}{0.8}\right) + 150^2\left(\frac{0.6}{0.8}\right) = 17{,}500$$

$$\text{Var}\left[(X - 150)_+ \mid X > 150\right] = 17{,}500 - 125^2 = 1{,}875$$

We'll use the BA II Plus or BA II Plus Professional 1-V Statistics Worksheet to calculate $\text{Var}\left[(X - 150)_+ \mid X > 150\right]$. As explained in the chapter on calculators, when using the 1-V Statistics Worksheet we can simply discard the data that falls outside the conditioning event and calculate the mean/variance on the remaining data.
X      Is X > 150?
100    No, so discard this data point.
200    Yes. Keep this data point.
300    Yes. Keep this data point.

After we discard X = 100, the remaining data is:

X                                200    300
(X - 150)_+                       50    150
P(X)                             0.2    0.6
10 P(X) (scaled-up probability)    2      6


Enter the following into the Statistics Worksheet (the X entries are the payment amounts $(X - 150)_+$ and the Y entries are the scaled-up probabilities):

X01 = 50, Y01 = 2;   X02 = 150, Y02 = 6

BA II Plus or BA II Plus Professional should give you:

$$n = 8, \quad \bar{X} = 125, \quad \sigma = 43.30127019, \quad \text{Var} = \sigma^2 = 1{,}875$$
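If you want to double-check the calculator work, the discard-and-renormalize logic takes only a few lines of Python. This is a minimal sketch (the helper name `var_per_payment` is mine):

```python
def var_per_payment(losses, probs, d):
    """Var(Y^P), where Y = (X - d)+ and Y^P = Y | Y > 0,
    for a discrete ground up loss X with an ordinary deductible d."""
    # Keep only losses above the deductible, then renormalize.
    kept = [(x - d, p) for x, p in zip(losses, probs) if x > d]
    total = sum(p for _, p in kept)
    mean = sum(y * p for y, p in kept) / total
    second = sum(y * y * p for y, p in kept) / total
    return second - mean * mean

print(var_per_payment([100, 200, 300], [0.2, 0.2, 0.6], d=150))  # 1875.0
```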

Additional practice problems

#1 For an insurance policy:

Losses can be 100, 200, 300, and 400 with respective probabilities 0.1, 0.2, 0.3, and 0.4.
The insurance has an ordinary deductible of 250 per loss.
$Y^P$ is the claim payment per payment random variable.

Calculate $\text{Var}(Y^P)$.
Solution

Fast solution

Ground up loss X    Is X > 250?
100                 No. Discard.
200                 No. Discard.
300                 Yes. Keep.
400                 Yes. Keep.

New table after discarding the losses at or below 250:

X                                300    400
(X - 250)_+                       50    150
P(X)                             0.3    0.4
10 P(X) (scaled-up probability)    3      4

Enter the following into the 1-V Statistics Worksheet:

X01 = 50, Y01 = 3;   X02 = 150, Y02 = 4

BA II Plus or BA II Plus Professional should give you:

$$n = 7, \quad \bar{X} = 107.14, \quad \sigma = 49.48716593, \quad \text{Var} = \sigma^2 = 2{,}448.98$$

Standard solution

X              100    200    300    400
(X - 250)_+      0      0     50    150
P(X)            0.1    0.2    0.3    0.4

$$P(X > 250) = P(X = 300) + P(X = 400) = 0.3 + 0.4 = 0.7$$

Conditioning on $X > 250$ rescales the remaining probabilities to $\frac{0.3}{0.7} = \frac{3}{7}$ for X = 300 and $\frac{0.4}{0.7} = \frac{4}{7}$ for X = 400.

$$E\left[(X - 250)_+ \mid X > 250\right] = 50\left(\frac{3}{7}\right) + 150\left(\frac{4}{7}\right) = 107.1428571$$

$$E\left[(X - 250)_+^2 \mid X > 250\right] = 50^2\left(\frac{3}{7}\right) + 150^2\left(\frac{4}{7}\right) = 13{,}928.57143$$

$$\text{Var}\left[(X - 250)_+ \mid X > 250\right] = 13{,}928.57143 - 107.1428571^2 = 2{,}448.98$$

#2 For an insurance policy:

Losses can be 1,000, 4,000, 5,000, 9,000, and 12,000 with respective probabilities 0.11, 0.17, 0.24, 0.36, and 0.12.
The insurance has an ordinary deductible of 900 per loss.
$Y^P$ is the claim payment per payment random variable.

Calculate $\text{Var}(Y^P)$.

Solution


To speed up calculations, we set one unit of money equal to $1,000.

Ground up loss X    Is X > 0.9?    (X - 0.9)_+    P(X)    100 P(X) (scaled-up probability)
1                   Yes. Keep.     0.1            0.11    11
4                   Yes. Keep.     3.1            0.17    17
5                   Yes. Keep.     4.1            0.24    24
9                   Yes. Keep.     8.1            0.36    36
12                  Yes. Keep.     11.1           0.12    12

Enter the following into the 1-V Statistics Worksheet:

X01 = 0.1, Y01 = 11;   X02 = 3.1, Y02 = 17;   X03 = 4.1, Y03 = 24;
X04 = 8.1, Y04 = 36;   X05 = 11.1, Y05 = 12

BA II Plus or BA II Plus Professional should give you:

$$n = 100, \quad \bar{X} = 5.77, \quad \sigma = 3.28345854, \quad \text{Var} = \sigma^2 = 10.7811$$

Converting back to the original unit of money: $\text{Var}(Y^P) = 10.7811 \times (\$1{,}000)^2 = 10{,}781{,}100 \; \$^2$.
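As a cross-check, note that the unit change is purely a calculator convenience; in code you can work in dollars directly. Reusing the `var_per_payment` sketch from earlier in this chapter:

```python
# Working in dollars directly -- no change of unit needed in code:
print(var_per_payment([1000, 4000, 5000, 9000, 12000],
                      [0.11, 0.17, 0.24, 0.36, 0.12], d=900))  # ~10,781,100
```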


Chapter 11

LER (loss elimination ratio)

Exam M Sample #27


You are given:
Losses follow an exponential distribution with the same mean in all years.
The loss elimination ratio this year is 70%.
The ordinary deductible for the coming year is 4/3 of the current deductible.
Compute the loss elimination ratio for the coming year.

Core concept: Loss elimination ratio (LER)

$$LER = \frac{\text{Expected loss amount paid by the insured}}{\text{Expected loss amount}} = \frac{E(X \wedge d)}{E(X)}$$

LER answers the question, "What % of the expected loss amount is absorbed by the policyholder due to the deductible?"

How to calculate LER:

$$E(X) = \int_0^{\infty} x f(x)\,dx = \int_0^{\infty} s(x)\,dx$$

$$(X \wedge d) = \min(X,\, d) = \begin{cases} X & \text{if } X \le d \\ d & \text{if } X > d \end{cases}$$

$$E(X \wedge d) = \int_0^{d} x f(x)\,dx + d \int_d^{\infty} f(x)\,dx \quad \text{(intuitive formula)}$$

Alternatively,

$$E(X \wedge d) = \int_0^{d} s(x)\,dx = \int_0^{d} \left[1 - F_X(x)\right] dx$$

You can find the proof of the 2nd formula in Loss Models.

To help memorize the above formulas, notice that if we set $d = \infty$, then

$$E(X) = E(X \wedge \infty) = \int_0^{\infty} s(x)\,dx$$

Solution to Sample #27

Ground up loss X has an exponential distribution with mean $\theta$:

$$f(x) = \frac{1}{\theta}\, e^{-x/\theta}, \qquad s(x) = 1 - F(x) = e^{-x/\theta}, \qquad E(X) = \theta$$

$$E(X \wedge d) = \int_0^{d} s(x)\,dx = \int_0^{d} e^{-x/\theta}\,dx = \theta\left(1 - e^{-d/\theta}\right)$$

$$LER = \frac{E(X \wedge d)}{E(X)} = 1 - e^{-d/\theta} \quad \text{(you might want to memorize this result)}$$

Under the original deductible, LER = 70%:

$$1 - e^{-d/\theta} = 0.7 \quad \Rightarrow \quad e^{-d/\theta} = 0.3$$

Under the new deductible (which is $\frac{4}{3}$ of the original deductible):

$$LER' = 1 - e^{-\frac{4d}{3\theta}} = 1 - \left(e^{-d/\theta}\right)^{4/3} = 1 - 0.3^{4/3} = 0.799$$
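A quick numerical check of this shortcut, as a minimal Python sketch (the helper name `ler_exponential` is mine):

```python
import math

def ler_exponential(d_over_theta):
    """LER for an exponential ground up loss: 1 - exp(-d/theta)."""
    return 1.0 - math.exp(-d_over_theta)

# From LER = 0.7 we get exp(-d/theta) = 0.3, so d/theta = -ln(0.3).
d_over_theta = -math.log(0.3)
print(ler_exponential(4.0 / 3.0 * d_over_theta))  # ~0.799
```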



Chapter 12

Find $E(Y - m)_+$

$$E(Y - m)_+ = E(Y) - m + m f_Y(0) + (m-1) f_Y(1) + (m-2) f_Y(2) + \cdots + 1 \cdot f_Y(m-1)$$

where Y takes non-negative integer values and m is a non-negative integer.

The above formula works whether Y is a simple random variable or a compound random variable $Y = \sum_{i=1}^{n} X_i$. If $Y = \sum_{i=1}^{n} X_i$, make sure you write

$$E(Y - m)_+ = E(Y) - m + m f_Y(0) + (m-1) f_Y(1) + (m-2) f_Y(2) + \cdots + 1 \cdot f_Y(m-1)$$

Don't write

$$E(Y - m)_+ = E(Y) - m + m f_X(0) + (m-1) f_X(1) + (m-2) f_X(2) + \cdots + 1 \cdot f_X(m-1)$$

In other words, the probability function on the right-hand side must match the random variable on the left-hand side. If the random variable on the left-hand side is $Y = \sum_{i=1}^{n} X_i$, you need to use $f_Y(y)$ on the right-hand side and write:

$$E(Y - m)_+ = E(Y) - m + m f_Y(0) + (m-1) f_Y(1) + (m-2) f_Y(2) + \cdots + 1 \cdot f_Y(m-1)$$

If the random variable on the left-hand side is $X$, then you need to write

$$E(X - m)_+ = E(X) - m + m f_X(0) + (m-1) f_X(1) + (m-2) f_X(2) + \cdots + 1 \cdot f_X(m-1)$$

To use the above formula in the heat of the exam, we rewrite it as:

$$E(Y - m)_+ = E(Y) - m + \begin{bmatrix} f_Y(0) \\ f_Y(1) \\ f_Y(2) \\ \vdots \\ f_Y(m-1) \end{bmatrix} \cdot \begin{bmatrix} m \\ m-1 \\ m-2 \\ \vdots \\ 1 \end{bmatrix}$$

In the above formula,



$$\begin{bmatrix} f_Y(0) \\ f_Y(1) \\ f_Y(2) \\ \vdots \\ f_Y(m-1) \end{bmatrix} \cdot \begin{bmatrix} m \\ m-1 \\ m-2 \\ \vdots \\ 1 \end{bmatrix} = m f_Y(0) + (m-1) f_Y(1) + (m-2) f_Y(2) + \cdots + 1 \cdot f_Y(m-1)$$

This is not a standard notation. However, we use it anyway to help us memorize the formula. In the exam, you just write these 2 matrixes. Then you simply take out each element in the 1st matrix and multiply it by the corresponding element in the 2nd matrix. Next, sum everything up.

Please note that if you take out an element $f_Y(k)$ (where $0 \le k \le m-1$) from the 1st matrix, then you need to multiply it by $m - k$ from the 2nd matrix, so that $(m - k) + k = m$ holds.
The proof of this formula is simple. The standard formula is:

$$E(S - d)_+ = E(S) - \sum_{s=0}^{d-1} \left[1 - F_S(s)\right]$$

Please note that I didn't write the formula as

$$E(S - d)_+ = E(S) - \sum_{s=0}^{d-1} \left[1 - F_S(x)\right]$$

The above formula is confusing. $F_S(x)$ is not a good notation because $S$ and $x$ don't match. The right notation should be $F_S(s)$.

Let's move on from the formula $E(S - d)_+ = E(S) - \sum_{s=0}^{d-1} \left[1 - F_S(s)\right]$. To make our proof simple, let's set $d = 3$. The proof is the same if $d$ is bigger.

$$E(S - 3)_+ = E(S) - \sum_{s=0}^{2} \left[1 - F_S(s)\right]$$

$$\sum_{s=0}^{2} \left[1 - F_S(s)\right] = 3 - \left[F_S(0) + F_S(1) + F_S(2)\right]$$

$$F_S(0) = P(S \le 0) = P(S = 0) = f_S(0)$$


$$F_S(1) = P(S \le 1) = P(S = 0) + P(S = 1) = f_S(0) + f_S(1)$$

$$F_S(2) = P(S \le 2) = P(S = 0) + P(S = 1) + P(S = 2) = f_S(0) + f_S(1) + f_S(2)$$

$$F_S(0) + F_S(1) + F_S(2) = 3 f_S(0) + 2 f_S(1) + f_S(2)$$

$$E(S - 3)_+ = E(S) - 3 + 3 f_S(0) + 2 f_S(1) + f_S(2)$$

Now you should be convinced that the following formula is correct:

$$E(Y - m)_+ = E(Y) - m + \begin{bmatrix} f_Y(0) \\ f_Y(1) \\ f_Y(2) \\ \vdots \\ f_Y(m-1) \end{bmatrix} \cdot \begin{bmatrix} m \\ m-1 \\ m-2 \\ \vdots \\ 1 \end{bmatrix}$$
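In code, the same bookkeeping is one line once the probability function is in hand. Here is a minimal Python sketch (the helper name `stop_loss` is mine); it attaches the weight $m - y$ to each $f_Y(y)$ with $y < m$, which also covers random variables such as the aggregate losses below that jump in steps larger than 1.

```python
def stop_loss(mean_y, pmf, m):
    """E(Y - m)+ = E(Y) - m + sum over y < m of (m - y) * f_Y(y).
    `pmf` maps each value y of Y to its probability f_Y(y)."""
    return mean_y - m + sum((m - y) * p for y, p in pmf.items() if y < m)

# Sanity check of the d = 3 proof above with a toy distribution on {0,1,2,3}:
toy = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
mean = sum(y * p for y, p in toy.items())
print(stop_loss(mean, toy, 3))  # ~0, i.e. E(Y) - 3 + 3f(0) + 2f(1) + f(2)
```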

Problem 1

May 2000 Course 3, #11

A company provides insurance to a concert hall for losses due to power failure. You are given:

The number of power failures in a year has a Poisson distribution with mean 1.

The distribution of ground up losses due to a single power failure is:

x     Probability of x
10    0.3
20    0.3
50    0.4

The number of power failures and the amounts of losses are independent.

There is an annual deductible of 30.

Calculate the expected amount of claims paid by the insurer in one year.
Solution

Let N = # of power failures, S = total claim dollar amount before deductible.



Then $S = \sum_{i=1}^{N} X_i$.

The total claim dollar amount after the deductible of $30 is:

$$(S - 30)_+ = \left(\sum_{i=1}^{N} X_i - 30\right)_+$$

Applying the formula, we have:

$$E(S - 30)_+ = E(S) - 30 + \begin{bmatrix} f_S(0) \\ f_S(1) \\ f_S(2) \\ \vdots \\ f_S(29) \end{bmatrix} \cdot \begin{bmatrix} 30 \\ 29 \\ 28 \\ \vdots \\ 1 \end{bmatrix}$$

It seems like we have an awful lot of work to do with the two matrixes. Before you start to panic, please note that many of the values $f_S(0), f_S(1), \ldots, f_S(29)$ will be zero. This is because X has only 3 distinct values: 10, 20, and 50, with probabilities 0.3, 0.3, and 0.4 respectively. Evidently, we can throw away X = 50: if X = 50, then S is at least 50 and is out of the range $S \le 29$.

Please also note that $S = \sum_{i=1}^{N} X_i$, where N is a Poisson random variable with mean $\lambda = 1$:

$$P(N = n) = \frac{1}{n!}\, e^{-\lambda} \lambda^n = \frac{1}{n!}\, e^{-1}$$

So for $S \le 29$, the possible values of S are:

N    X values               P(N)           P(X1, ..., XN)    S = sum of Xi    P(S)
0    -                      e^{-1}         -                 0                e^{-1}
1    X = 10                 e^{-1}         0.3               10               0.3e^{-1}
1    X = 20                 e^{-1}         0.3               20               0.3e^{-1}
2    (X1, X2) = (10, 10)    (1/2)e^{-1}    0.3^2             20               (1/2)(0.3^2)e^{-1}



Next, we consolidate the probabilities:

S = sum of Xi    P(S)
0                e^{-1}
10               0.3e^{-1}
20               0.3e^{-1}
20               (1/2)(0.3^2)e^{-1}

After consolidation:

S = sum of Xi    P(S)
0                e^{-1}
10               0.3e^{-1}
20               0.3e^{-1} + (1/2)(0.3^2)e^{-1} = 0.345e^{-1}

$$E(S - 30)_+ = E(S) - 30 + \begin{bmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{bmatrix} \cdot \begin{bmatrix} 30 \\ 20 \\ 10 \end{bmatrix}$$

In the actual exam, to help remember the two matrixes, you can write only the 1st matrix:

$$\begin{bmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix}$$

As said earlier, the sum of the two elements in each row needs to be m (or 30 in this problem). As a result:

$$0 + a = 30, \quad 10 + b = 30, \quad 20 + c = 30 \quad \Rightarrow \quad a = 30, \; b = 20, \; c = 10$$

Then, you can fill out the 2nd matrix:



$$\begin{bmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{bmatrix} \cdot \begin{bmatrix} 30 \\ 20 \\ 10 \end{bmatrix} = \begin{bmatrix} e^{-1} \\ 0.3e^{-1} \\ 0.345e^{-1} \end{bmatrix} \cdot \begin{bmatrix} 30 \\ 20 \\ 10 \end{bmatrix} = \left(30 + 0.3 \times 20 + 0.345 \times 10\right) e^{-1} = 39.45e^{-1}$$

$$E(S) = E(N)\,E(X), \qquad E(N) = 1, \qquad E(X) = 10(0.3) + 20(0.3) + 50(0.4) = 29$$

$$E(S) = E(N)\,E(X) = 29$$

$$E(S - 30)_+ = E(S) - 30 + 39.45e^{-1} = 29 - 30 + 39.45e^{-1} = 13.5128$$
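As a check on the whole argument, you can enumerate the small values of S by brute force and then apply the stop-loss formula. A minimal Python sketch, assuming integer severities (the helper name `compound_pmf_below` is mine):

```python
import math
from itertools import product

def compound_pmf_below(lam, sev_pmf, cap):
    """pmf of S = X1 + ... + XN, N ~ Poisson(lam), restricted to s < cap.
    Brute-force enumeration; feasible because few claim counts fit below cap."""
    f = {0: math.exp(-lam)}                # N = 0 puts all its mass at S = 0
    n_max = (cap - 1) // min(sev_pmf)      # more claims would push S past cap
    for n in range(1, n_max + 1):
        p_n = math.exp(-lam) * lam**n / math.factorial(n)
        for xs in product(sev_pmf, repeat=n):   # ordered severity combos
            s = sum(xs)
            if s < cap:
                prob = p_n
                for x in xs:
                    prob *= sev_pmf[x]
                f[s] = f.get(s, 0.0) + prob
    return f

sev = {10: 0.3, 20: 0.3, 50: 0.4}
f_s = compound_pmf_below(lam=1.0, sev_pmf=sev, cap=30)
mean_s = 1.0 * sum(x * p for x, p in sev.items())       # E(S) = E(N)E(X) = 29
print(mean_s - 30 + sum((30 - s) * p for s, p in f_s.items()))  # ~13.5128
```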

Problem 2

May 2005 Exam M, #18

For a collective risk model:

The number of losses has a Poisson distribution with $\lambda = 2$.

The common distribution of the individual losses is:

x    f_X(x)
1    0.6
2    0.4

An insurance covers aggregate losses subject to a deductible of 3.

Calculate the expected aggregate payments of the insurance.

Solution



$S = \sum_{i=1}^{N} X_i$, where S is the aggregate loss and X is the individual loss dollar amount. We are asked to find $E(S - 3)_+$:

$$E(S - 3)_+ = E(S) - 3 + \begin{bmatrix} f_S(0) \\ f_S(1) \\ f_S(2) \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$$

where $E(S) = E(N)\,E(X) = 2\left[1(0.6) + 2(0.4)\right] = 2.8$.


Next, we need to find $f_S(0)$, $f_S(1)$, and $f_S(2)$.

N    X values             P(N)                     P(X1, ..., XN)    S = sum of Xi    P(S)
0    -                    e^{-2}                   -                 0                e^{-2}
1    X = 1                2e^{-2}                  0.6               1                (0.6)2e^{-2}
1    X = 2                2e^{-2}                  0.4               2                (0.4)2e^{-2}
2    (X1, X2) = (1, 1)    (2^2/2!)e^{-2} = 2e^{-2} 0.6^2             2                (0.6^2)2e^{-2}

Next, we consolidate the table into:

S = sum of Xi    P(S)
0                e^{-2}
1                (0.6)2e^{-2} = 1.2e^{-2}
2                (0.4)2e^{-2} + (0.6^2)2e^{-2} = 1.52e^{-2}

$$E(S - 3)_+ = E(S) - 3 + \begin{bmatrix} f_S(0) \\ f_S(1) \\ f_S(2) \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} = 2.8 - 3 + \begin{bmatrix} e^{-2} \\ 1.2e^{-2} \\ 1.52e^{-2} \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$$

$$= 2.8 - 3 + \left(3 + 1.2 \times 2 + 1.52\right) e^{-2} = -0.2 + 6.92e^{-2} = 0.73652$$
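The same brute-force check works here, reusing the `compound_pmf_below` sketch from Problem 1:

```python
sev2 = {1: 0.6, 2: 0.4}
f2 = compound_pmf_below(lam=2.0, sev_pmf=sev2, cap=3)
print(2.8 - 3 + sum((3 - s) * p for s, p in f2.items()))  # ~0.73652
```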

Problem 3

Exam M Sample #45

Prescription drug losses, S, are modeled assuming the number of claims has a geometric distribution with mean 4, and the amount of each prescription is 40.

Calculate $E(S - 100)_+$.

I'll leave this problem for you to solve.


About the author

Yufeng Guo was born in central China. After receiving his Bachelor's degree in physics at Zhengzhou University, he attended Beijing Law School and received his Master's of law. He was an attorney and law school lecturer in China before immigrating to the United States. He received his Master's of accounting at Indiana University. He has pursued a life actuarial career and passed exams 1, 2, 3, 4, 5, 6, and 7 in rapid succession after discovering a successful study strategy.
Mr. Guo's exam records are as follows:

Fall 2002      Passed Course 1
Spring 2003    Passed Courses 2, 3
Fall 2003      Passed Course 4
Spring 2004    Passed Course 6
Fall 2004      Passed Course 5
Spring 2005    Passed Course 7
Mr. Guo currently teaches online prep courses for Exams P, FM, MFE, and MLC. For more information, visit http://actuary88.com/.

If you have any comments or suggestions, you can contact Mr. Guo at yufeng_guo@msn.com.
