
6th Edition

by Yufeng Guo

Fall 2009

The Missing Manual

This electronic book is intended for individual buyer use for the sole purpose of preparing for Exam C. This book can NOT be resold to others or shared with others. No part of this publication may be reproduced for resale or multiple copy distribution without the express written permission of the author.

Guo Fall 2009 C, Page 1 / 284

Table of Contents

Introduction .............................................................. 4

Chapter 1  Doing calculations 100% correct 100% of the time ............... 5
  6 strategies for improving calculation accuracy ......................... 5
  7 powerful calculator shortcuts ......................................... 6
    #1  Solve ax^2 + bx + c = 0 ........................................... 6
    #2  Keep track of your calculation .................................... 10
    #3  Calculate mean and variance of a discrete random variable ......... 21
    #4  Calculate the sample variance ..................................... 29
    #5  Find the conditional mean and conditional variance ................ 30
    #6  Do the least squares regression ................................... 36
    #7  Do linear interpolation ........................................... 46

Chapter 2
  General procedure to calculate the maximum likelihood estimator ......... 53
  Fisher information ...................................................... 58
  The Cramér-Rao theorem .................................................. 62
  Delta method ............................................................ 66

Chapter 3  Kernel smoothing ................................................ 75
  Uniform kernel .......................................................... 77
  Triangular kernel ....................................................... 82
  Gamma kernel ............................................................ 90

Chapter 4  Bootstrap ....................................................... 95
  Recommended supplemental reading ........................................ 96

Chapter 5
  Rating challenges facing insurers ....................................... 102
  3 preliminary concepts for deriving the Bühlmann premium formula ........ 106
    Preliminary concept #1  Double expectation ............................ 106
    Preliminary concept #2  Total variance formula ........................ 108
    Preliminary concept #3  Linear least squares regression ............... 111
  Derivation of Bühlmann's credibility formula ............................ 112
  Summary of how to derive the Bühlmann credibility premium formulas ...... 117
  Special case ............................................................ 122
  How to tackle Bühlmann credibility problems ............................. 123
  An example illustrating how to calculate the Bühlmann credibility premium 123
  Shortcut ................................................................ 126
  Practice problems ....................................................... 126

Chapter 6
  Assumptions of the Bühlmann-Straub credibility model .................... 149
  Summary of the Bühlmann-Straub credibility model ........................ 154
  How to tackle the Bühlmann-Straub premium problem ....................... 158

Chapter 7
  Summary of the estimation process for the empirical Bayes estimate
    for the Bühlmann model ................................................ 170
  Empirical Bayes estimate for the Bühlmann-Straub model .................. 173
  Semi-parametric Bayes estimate .......................................... 182

Chapter 8
  General credibility model for the aggregate loss of r insureds .......... 188
  Key interim formula: credibility for the aggregate loss ................. 190
  Final formula you need to memorize ...................................... 191
  Special case ............................................................ 192

Chapter 9
  How to calculate the discrete posterior probability ..................... 206
  Framework for calculating the discrete posterior probability ............ 208
  How to calculate the continuous posterior probability ................... 213
  Framework for calculating discrete-prior Bayesian premiums .............. 219
  Calculate Bayesian premiums when the prior probability is continuous .... 251
  Poisson-gamma model ..................................................... 260
  Binomial-beta model ..................................................... 264

Chapter 11  LER (loss elimination ratio) ................................... 274

Chapter 12  Find E[(Y - M)+] ............................................... 276

About the author ........................................................... 284

Introduction

This manual is intended to be a missing manual. It skips what other manuals explain well. It focuses on what other manuals don't explain or don't explain well. This way, you get your money's worth.

Chapter 1 teaches you how to do manual calculations quickly and accurately. If you studied hard but failed Exam C repeatedly, chances are that you are concept strong, calculation weak. The calculator techniques will improve your calculation accuracy.

Chapter 2 focuses on the variance of a maximum likelihood estimator (MLE), a difficult topic for many.

Chapter 3 explains the essence of kernel smoothing and teaches you how to derive complex kernel smoothing formulas for k_y(x) and K_y(x). You shouldn't have any trouble memorizing complex kernel smoothing formulas after this chapter.

Many candidates don't know the essence of the bootstrap. Chapter 4 is about the bootstrap.

Chapter 5 explains the core theory behind the Bühlmann credibility model.

Chapter 6 compares and contrasts the Bühlmann-Straub credibility model with the Bühlmann credibility model.

Many candidates are afraid of empirical Bayes estimate problems. The formulas are just too hard to remember. Chapter 7 will relieve your pain.

Many candidates find that there are just too many limited fluctuation credibility formulas to memorize. To address this, Chapter 8 gives you a unified formula.

Chapter 9 presents a framework for quickly calculating the posterior probability (discrete or continuous) and the posterior mean (discrete or continuous). Many candidates can recite Bayes' theorem but can't solve related problems under exam conditions. Their calculations are long, tedious, and prone to errors. This chapter will drastically improve your calculation efficiency.

Chapter 10 is about claim payment per payment.

Chapter 11 is about the loss elimination ratio.

Chapter 12 is about how to quickly calculate E[(Y - M)+].

Chapter 1  Doing calculations 100% correct 100% of the time

>To: yufeng_guo@msn.com

>Subject: Help..

>Date: someday in 2006

>

>Hello Mr. Guo.

>

> I tried Exam C problems under exam-like conditions. To my surprise, I found that I
> made too many mistakes; one mistake was 1+1=3. How can I improve my accuracy?

Here are 6 strategies for improving your calculation accuracy:

1. Gain a deeper understanding of core concepts. People tend to make errors if they memorize a black-box formula without understanding it. To reduce errors, try to understand core concepts and formulas.

2. Learn how to solve a problem faster. Many exam candidates solve hundreds of practice problems yet fail Exam C miserably. One major cause is that their solutions are inefficient. Typically, these candidates copy solutions presented in textbooks and study manuals. Authors of textbooks and many study manuals generally use software to do the calculations: to solve a messy calculation, they just type up the formula and click the "Compute" button. However, when you take the exam, you have to calculate the answer manually. A solution that looks clean and easy in a textbook may be a nightmare in the exam. When you prepare for Exam C, don't copy textbook solutions. Improve them. Learn how to do manual calculations faster.

3. Build solution frameworks and avoid reinventing the wheel. If you analyze the Exam C problems tested in the past, you'll see that the SOA pretty much tests the same things over and over. For example, the Poisson-gamma model is tested over and over. When preparing for Exam C, come up with a ready-to-use solution framework for each of the commonly tested problems. This way, when you walk into the exam room and see a commonly tested problem, you don't need to solve the problem from scratch. You can use your pre-built solution framework and solve it quickly and accurately.

4. Keep an error log. Whenever you solve practice problems, record your errors in a notebook. Analyze why you made each error. Try to solve the problem differently to avoid the error. Review your error log from time to time. Using an error log helps you avoid making the same calculation errors over and over.

5. Avoid doing mental math in the exam, even for the simplest calculations. Even if you are solving a simple problem like 2+3, use your calculator: simply enter 2 + 3. This will reduce your silly errors.

6. Learn some calculator tricks.

Fast and safe techniques for common calculations.

#1  Solve ax^2 + bx + c = 0

The standard formula

    x = [-b ± sqrt(b^2 - 4ac)] / (2a)

is OK when a, b, and c are nice and small numbers. However, when a, b, and c have many decimals or are large numbers and we are in a pressured situation, the standard solution often falls apart in the heat of the exam.

For example, consider solving

    0.3247x^2 - 89.508x + 0.752398 = 0

If candidates need to solve this equation in the exam, many will be flustered. The standard formula is labor intensive and prone to errors when a, b, and c are messy.

To solve this equation 100% right under pressure and in a hurry, we'll use a little trick. First, we set

    x = v = 1 / (1 + r)

So we treat x as a dummy discount factor. The original equation becomes:

    0.3247v^2 - 89.508v + 0.752398 = 0

Finding r is a concept you learned in Exam FM. We first convert the equation to the following cash flow diagram:

    Time t       0            1           2
    Cash flow    $0.752398    -$89.508    $0.3247


So at time zero, you receive $0.752398. At time one, you pay $89.508. Finally, at time two, you receive $0.3247. What's your IRR?

To find r (the IRR), we simply use the Cash Flow Worksheet in the BA II Plus or BA II Plus Professional. Enter the following cash flows into the Cash Flow Worksheet:

    Cash flow          Frequency
    CF0 = 0.752398
    C01 = -89.508      F01 = 1
    C02 = 0.3247       F02 = 1

Because the cash flow frequency is one for both C01 and C02, we don't need to enter F01 = 1 and F02 = 1. If we don't enter a cash flow frequency, the BA II Plus and BA II Plus Professional use one as the default cash flow frequency.

Using the IRR function, we find that IRR = -99.63722807. Remember, this is a percentage. So r = -99.63722807%.

    x1 = 1 / (1 + r) = 1 / (1 - 99.63722807%) = 275.6552834

How are we going to find the second root? We'll use the following fact: if x1 and x2 are the two roots of ax^2 + bx + c = 0, then

    x1 x2 = c/a

So

    x2 = (1/x1)(c/a) = (1/275.6552834)(0.752398/0.3247) = 0.00840619

Procedure (assume we set the calculator to display 8 decimal places):

    Procedure                          Keystroke                       Display
    Use the Cash Flow Worksheet        CF                              CF0 = (old content)
    Clear the worksheet                2nd [CLR Work]                  CF0 = 0.00000000
    Enter the cash flow at t = 0       0.752398 Enter                  CF0 = 0.75239800
    Enter the cash flow at t = 1       ↓ 89.508 +/- Enter              C01 = -89.50800000
    (The default frequency is 1, so    ↓                               F01 = 1.00000000
    no need to enter anything)
    Enter the cash flow at t = 2       ↓ 0.3247 Enter                  C02 = 0.32470000
    Calculate the IRR                  IRR CPT                         IRR = -99.63722807
    Convert to the dummy interest      %                               -0.99637228
    rate
    Calculate x1 = 1/(1 + IRR%)        + 1 =                           0.00362772
                                       1/x                             275.65528324
    Store x1 in Memory 0 (this is      STO 0                           275.65528324
    your audit trail)
    Find the 2nd root                  1/x × 0.752398 ÷ 0.3247 =       0.00840619
    x2 = (1/x1)(c/a)
    Store x2 in Memory 1               STO 1                           0.00840619

You can always double check your calculations. Retrieve x1 and x2 from the calculator's memories and plug them into 0.3247x^2 - 89.508x + 0.752398. You should get a value close to zero. For example, plugging in x1 = 275.6552834:

    0.3247x^2 - 89.508x + 0.752398 = 0.00000020  (OK)

Plugging in x2 = 0.00840619:

    0.3247x^2 - 89.508x + 0.752398 = 6.2 × 10^-12  (OK)

Does this look like a lot of work? Yes, the first time. Once you get familiar with the process, it takes you 15 seconds to calculate x1 and x2 and double check that they are right.

Quick and error-free solution process for ax^2 + bx + c = 0

Step 1  Rearrange ax^2 + bx + c = 0 into c + bx + ax^2 = 0.

Step 2  Use the BA II Plus/BA II Plus Professional Cash Flow Worksheet to find the IRR, entering:

    CF0 = c (cash flow at time zero)
    C01 = b (cash flow at time one)
    C02 = a (cash flow at time two)

Step 3  Calculate the two roots:

    x1 = 1 / (1 + IRR/100),    x2 = (1/x1)(c/a)

In the exam, if an equation is overly simple, just try out the answer. If an equation is not overly simple, always use the above process to solve ax^2 + bx + c = 0. For example, if you see x^2 - 2x - 3 = 0, you can guess that x1 = -1 and x2 = 3. However, if you see x^2 - 2x - 7.3 = 0, use the Cash Flow Worksheet to solve it.

Exercises

#1  Solve 10,987x^2 + 65,864x - 98,321 = 0.
    Answer: x1 = -7.2321003 and x2 = 1.23737899

#2  Solve x^2 - 2x - 7.3 = 0.
    Answer: x1 = 3.88097206 and x2 = -1.88097206

#3  Solve 0.9080609x^2 - 0.00843021x - 0.99554743 = 0.
    Answer: x1 = 1.0517168 and x2 = -1.04243305

#4  Solve x^2 - 2x + 3 = 0.
    Answer: you'll get an error message if you try to calculate the IRR. There's no solution:
    x^2 - 2x + 3 = (x - 1)^2 + 2 ≥ 2 > 0, so the equation has no real root.
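Off the exam, you can verify these answers (and the product-of-roots shortcut itself) with a short Python sketch. This is not part of the manual; it assumes c ≠ 0 so the shortcut x2 = c/(a·x1) is safe:

```python
import math

def solve_quadratic(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0, largest first.

    Mirrors the manual's shortcut: find one root, then use x1*x2 = c/a.
    Assumes c != 0 (otherwise the shortcut divides by zero).
    """
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")  # the IRR approach errors out here too
    x1 = (-b + math.sqrt(disc)) / (2 * a)
    x2 = c / (a * x1)                      # product-of-roots shortcut
    return max(x1, x2), min(x1, x2)

# Exercise #1 and the worked example:
print(solve_quadratic(10987, 65864, -98321))        # ≈ (1.23737899, -7.2321003)
print(solve_quadratic(0.3247, -89.508, 0.752398))   # ≈ (275.6552834, 0.00840619)
```

Since the calculator's IRR search and the closed-form formula solve the same polynomial, they must agree; the sketch is just a fast way to double check practice work.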

#2  Keep track of your calculation

Example 1

A group of 23 highly talented actuarial students in a large insurance company are taking SOA Exam C at the next exam sitting. The probability of each candidate passing Exam C is 0.73, independent of other students passing or failing the exam. The company promises to give each actuarial student who passes Exam C a raise of $2,500. What's the probability that the insurance company will spend at least $50,000 on raises associated with passing Exam C?

Solution

If the company spends at least $50,000 on exam-related raises, then the number of students who pass Exam C must be at least 50,000/2,500 = 20. So we need to find the probability of having at least 20 students pass Exam C.

Let X = the number of students who pass Exam C. The problem does not specify the distribution of X, so let's check the conditions for a binomial distribution:

- There are only two outcomes for each student taking the exam: either Pass or Fail.
- The probability of Pass (0.73) or Fail (0.27) remains constant from one student to another.
- The exam result of one student does not affect that of another student.

X satisfies the requirements of a binomial random variable with parameters n = 23 and p = 0.73. We need to find the probability Pr(X ≥ 20).

Since f(x) = C(n, x) p^x (1 - p)^(n - x), we have:

    Pr(X ≥ 20)
    = C(23,20)(.73)^20(.27)^3 + C(23,21)(.73)^21(.27)^2 + C(23,22)(.73)^22(.27) + C(23,23)(.73)^23
    = 0.09608

Therefore, there is a 9.6% chance that the company will have to spend at least $50,000 to pay for exam-related raises.

Calculator key sequence for BA II Plus:

Method #1: direct calculation without using the memories

    Procedure                            Keystroke                        Display
    Set to display 8 decimal places      2nd [FORMAT] 8 ENTER             DEC = 8.00000000
    (4 decimal places are sufficient,
    but assume you want to see more
    decimals)
    Set AOS (algebraic operating         2nd [FORMAT], keep pressing      AOS
    system)                              the down arrow until you see
                                         Chn; press 2nd [ENTER] (if
                                         you see AOS, your calculator
                                         is already in AOS, in which
                                         case press 2nd [CLR Work])
    Calculate C(23,20)                   23 2nd [nCr] 20 ×                1,771.00000000
    Multiply by (.73)^20                 .73 y^x 20 ×                     3.27096399
    Multiply by (.27)^3, start the sum   .27 y^x 3 +                      0.06438238
    Calculate C(23,21)                   23 2nd [nCr] 21 ×                253.00000000
    Multiply by (.73)^21                 .73 y^x 21 ×                     0.34111482
    Multiply by (.27)^2, add             .27 x² +                         0.08924965
    Calculate C(23,22)                   23 2nd [nCr] 22 ×                23.00000000
    Multiply by (.73)^22                 .73 y^x 22 ×                     0.02263762
    Multiply by (.27), add               .27 +                            0.09536181
    Calculate C(23,23)                   23 2nd [nCr] 23 ×                1.00000000
    Multiply by (.73)^23; this is        .73 y^x 23 =                     0.09608031
    the final result

Method #2: calculation using the calculator's memories

    Procedure                            Keystroke                        Display
    Set to display 8 decimal places      2nd [FORMAT] 8 ENTER             DEC = 8.00000000
    (4 decimal places are sufficient,
    but assume you want to see more
    decimals)
    Set AOS (algebraic operating         2nd [FORMAT], keep pressing      AOS
    system)                              the down arrow until you see
                                         Chn; press 2nd [ENTER] (if
                                         you see AOS, your calculator
                                         is already in AOS, in which
                                         case press 2nd [CLR Work])
    Clear the memories                   2nd [MEM] 2nd [CLR Work]         M0 = 0.00000000
    Get back to calculation mode         CE/C                             0.00000000
    Calculate C(23,20)(.73)^20(.27)^3    23 2nd [nCr] 20 ×                1,771.00000000
                                         .73 y^x 20 ×                     3.27096399
                                         .27 y^x 3 =                      0.06438238
    Store it in Memory 0; get back       STO 0 CE/C                       0.00000000
    to calculation mode
    Calculate C(23,21)(.73)^21(.27)^2    23 2nd [nCr] 21 ×                253.00000000
                                         .73 y^x 21 ×                     0.34111482
                                         .27 x² =                         0.02486727
    Store it in Memory 1; get back       STO 1 CE/C                       0.00000000
    to calculation mode
    Calculate C(23,22)(.73)^22(.27)      23 2nd [nCr] 22 ×                23.00000000
                                         .73 y^x 22 ×                     0.02263762
                                         .27 =                            0.00611216
    Store it in Memory 2; get back       STO 2 CE/C                       0.00000000
    to calculation mode
    Calculate C(23,23)(.73)^23           23 2nd [nCr] 23 ×                1.00000000
                                         .73 y^x 23 =                     0.00071850
    Store it in Memory 3                 STO 3                            0.00071850
    Recall the values stored in          RCL 0 + RCL 1 + RCL 2            0.09608031
    Memories 0-3 and sum them up         + RCL 3 =

Method #1 is quicker but riskier. Because you don't have an audit history, if you miscalculate one item, you'll need to recalculate everything from scratch.

Method #2 is slower but leaves a good audit trail by storing all your intermediate values in your calculator's memories. If you miscalculate one item, you need to recalculate that item alone and can reuse the results of the other calculations (which are correct).

For example, suppose that instead of calculating C(23,20)(.73)^20(.27)^3 you mistakenly calculated C(23,20)(.73)^20(.27)^20. To correct this error under Method #1, you have to start from scratch and calculate each of the following four items:

    C(23,20)(.73)^20(.27)^3, C(23,21)(.73)^21(.27)^2, C(23,22)(.73)^22(.27), and C(23,23)(.73)^23

In contrast, correcting this error under Method #2 is a lot easier. You just need to recalculate C(23,20)(.73)^20(.27)^3; you don't need to recalculate any of the following three items:

    C(23,21)(.73)^21(.27)^2, C(23,22)(.73)^22(.27), and C(23,23)(.73)^23

You can easily retrieve these three items from your calculator's memories and calculate the final result:

    C(23,20)(.73)^20(.27)^3 + C(23,21)(.73)^21(.27)^2 + C(23,22)(.73)^22(.27) + C(23,23)(.73)^23 = 0.09608
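As a study aid (you obviously cannot do this in the exam room), the 0.09608 figure is easy to confirm with Python's standard library:

```python
from math import comb

n, p = 23, 0.73
# Pr(X >= 20) for X ~ Binomial(23, 0.73)
prob = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(20, n + 1))
print(round(prob, 5))   # 0.09608
```

Running this kind of one-liner against your calculator work is a cheap way to build trust in a keystroke routine before exam day.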

Example 2

Given:

    l20 = 9,617,802     a20 = 16.5133
    l30 = 9,501,381     a30 = 15.8561
    l50 = 8,950,901     a50 = 13.2668
    A50 = 0.24905       Interest rate: 6%

Calculate

    V = A50 v^20 (l50/l30) × [a20 - (l30/l20) v^10 a30] / [a20 - (l50/l20) v^30 a50]

Solution

This calculation is complex. Unless you use a systematic method, you'll make mistakes.

Calculation steps using BA II Plus/BA II Plus Professional

Step 1  Simplify the formula. With v = 1.06^-1, the formula becomes

    V = A50 (1.06^-20)(l50/l30) × [a20 - a30 (l30/l20) 1.06^-10] / [a20 - a50 (l50/l20) 1.06^-30]

Make sure you don't make mistakes in the simplification. If you are afraid of making mistakes, don't simplify; just do your calculations using the original equation.

Step 2  Assign each input to one of the calculator's memories:

    Input   Memory   Value
    l20     M0       9,617,802
    l30     M1       9,501,381
    l50     M2       8,950,901
    A50     M3       0.24905
    a20     M4       16.5133
    a30     M5       15.8561
    a50     M6       13.2668

In memory notation, the formula is

    V = (M3) 1.06^-20 (M2/M1) × [M4 - M5 (M1/M0) 1.06^-10] / [M4 - M6 (M2/M0) 1.06^-30]

Enter the inputs into the memories:

    Procedure                          Keystroke                       Display
    Set to display 8 decimal places    2nd [FORMAT] 8 ENTER            DEC = 8.00000000
    Set AOS (algebraic operating       2nd [FORMAT], keep pressing     AOS
    system)                            the down arrow until you see
                                       Chn; press 2nd [ENTER] (if
                                       you see AOS, your calculator
                                       is already in AOS, in which
                                       case press 2nd [CLR Work])
    Open the memory worksheet          2nd [MEM]                       M0 = 0.00000000
    Enter 9,617,802 in M0              9,617,802 Enter ↓               M1 = 0.00000000
    Enter 9,501,381 in M1              9,501,381 Enter ↓               M2 = 0.00000000
    Enter 8,950,901 in M2              8,950,901 Enter ↓               M3 = 0.00000000
    Enter 0.24905 in M3                0.24905 Enter ↓                 M4 = 0.00000000
    Enter 16.5133 in M4                16.5133 Enter ↓                 M5 = 0.00000000
    Enter 15.8561 in M5                15.8561 Enter ↓                 M6 = 0.00000000
    Enter 13.2668 in M6                13.2668 Enter                   M6 = 13.26680000
    Leave the memory worksheet and     CE/C (the button in the         0.00000000
    get back to the normal             bottom left corner; the same
    calculation mode                   button as CLR Work)

Step 3  Double-check your data entry. Don't bypass this step; it's easy to enter wrong data. Keystrokes: press 2nd [MEM], then keep pressing the down-arrow key to view all the data you entered in the memories. Make sure all the correct numbers are entered.

Step 4  Do the final calculation:

    V = (M3) 1.06^-20 (M2/M1) (M7/M8), where
    M7 = M4 - M5 (M1/M0) 1.06^-10
    M8 = M4 - M6 (M2/M0) 1.06^-30

    Procedure                               Keystroke                          Display
    Calculate M4 - M5 (M1/M0) 1.06^-10      RCL 4 - RCL 5 × RCL 1 ÷ RCL 0      ≈ 7.76651
                                            × 1.06 y^x 10 +/- =
    Store the result in M7; get back to     STO 7 CE/C                         0.00000000
    the normal calculation mode
    Calculate M4 - M6 (M2/M0) 1.06^-30      RCL 4 - RCL 6 × RCL 2 ÷ RCL 0      ≈ 14.36358
                                            × 1.06 y^x 30 +/- =
    Store the result in M8; get back to     STO 8 CE/C                         0.00000000
    the normal calculation mode
    Calculate the final result              RCL 3 × 1.06 y^x 20 +/- ×          ≈ 0.03955600
    V = (M3) 1.06^-20 (M2/M1)(M7/M8)        RCL 2 ÷ RCL 1 × RCL 7 ÷ RCL 8 =

So V = 0.0395560 ≈ 0.04.

Though this calculation process looks long, once you get used to it, you can do it in less than one minute.

Advantages of this calculation process:

- Inputs are entered only once. In this problem, l20 and a20 are each used twice in the formula, but each is keyed into the memories only once. This reduces data entry errors.

- The process gives us a good audit trail, enabling us to check the data entry and the calculations.

- We can isolate errors. For example, if a wrong value of l30 was entered into the memory, we can re-enter l30, recalculate M7 = M4 - M5 (M1/M0) 1.06^-10, and store the recalculated value in M7. Then we recalculate V = (M3) 1.06^-20 (M2/M1)(M7/M8).

Bottom line: I recommend that you master this calculation method. It costs you extra work, but it enables you to do messy calculations 100% right in the exam. When exams get tough and calculations get messy, many candidates who know as much as you do will make calculation errors here and there and fail the exam. In contrast, you'll stand above the crowd and make no errors, passing another exam.

Problem 3 (Reserve example revised)

In Example 2, you calculated that V = 0.04. However, none of the answer choices given is 0.04. Suspecting that you made an error, you decided to redo the calculation. First, you scrolled over the memories and gladly found no error in the data entry. Next, you recalculated M7 = M4 - M5 (M1/M0) 1.06^-10 and M8 = M4 - M6 (M2/M0) 1.06^-30. Once again, you found your previous calculations were right. Finally, you recalculated V = (M3) 1.06^-20 (M2/M1)(M7/M8). Once again, you got V = 0.04.

You have already spent four minutes on this problem. You decided to spend two more minutes on it. If you couldn't figure out the right answer, you would just have to give it up and move on to the next problem.

So you quickly read the problem again. Oops! You found that your formula was wrong. Your original formula was:

    V = A50 v^20 (l50/l30) × [a20 - (l30/l20) v^10 a30] / [a20 - (l50/l20) v^30 a50]

The correct formula is:

    V = a50 v^20 (l50/l30) × [a20 - (l30/l20) v^10 a30] / [a20 - (l50/l20) v^30 a50]

How could you find the answer quickly, using the correct formula?

Solution

The situation described here sometimes happens in the actual exam. If you don't use a systematic method to do your calculations, you won't leave a good audit trail. In that case, all your previous calculations are gone and you have to redo them from scratch. This is awful.

Fortunately, you left a good audit trail, and correcting the error is easy. Your previous formula, after assigning memories to the inputs, was

    V = (M3) 1.06^-20 (M2/M1)(M7/M8)

The correct formula is

    V = (M6) 1.06^-20 (M2/M1)(M7/M8)

Remember a50 = M6. You simply reuse M7 and M8 and calculate

    V = (M6) 1.06^-20 (M2/M1)(M7/M8) = 2.10713362 ≈ 2.11

Now you look at the answer choices again. Good. 2.11 is there!
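For off-exam study, the whole Example 2/Problem 3 audit trail can be replayed in a few lines of Python. This sketch (not part of the manual) names its intermediate values after the calculator memories M7 and M8:

```python
l20, l30, l50 = 9_617_802, 9_501_381, 8_950_901
A50, a20, a30, a50 = 0.24905, 16.5133, 15.8561, 13.2668
v = 1 / 1.06

M7 = a20 - a30 * (l30 / l20) * v**10   # numerator bracket, stored in M7
M8 = a20 - a50 * (l50 / l20) * v**30   # denominator bracket, stored in M8

V_wrong = A50 * v**20 * (l50 / l30) * M7 / M8   # Example 2's (incorrect) formula
V_right = a50 * v**20 * (l50 / l30) * M7 / M8   # Problem 3's corrected formula
print(round(V_wrong, 2), round(V_right, 2))     # 0.04 2.11
```

Because M7 and M8 are computed once and reused, swapping A50 for a50 is a one-character change here, just as it is a one-keystroke change (RCL 6 instead of RCL 3) on the calculator.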

#3  Calculate the mean and variance of a discrete random variable

Two good tools for this:

- the TI-30X IIS (using its redo capability)
- the BA II Plus/BA II Plus Professional 1-V Statistics Worksheet

Example 1 (#8, Course 1, May 2000)

A probability distribution of the claim sizes for an auto insurance policy is given in the table below:

    Claim Size     20     30     40     50     60     70     80
    Probability    0.15   0.10   0.05   0.20   0.10   0.10   0.30

What percentage of the claims are within one standard deviation of the mean claim size?

(A) 45%   (B) 55%   (C) 68%   (D) 85%   (E) 100%

Solution

This problem is conceptually easy but calculation-intensive, and it is easy to make calculation errors. Always let the calculator do the calculations for you.

One critical thing to remember about the BA II Plus and BA II Plus Professional 1-V Statistics Worksheet is that you cannot directly enter the probability mass function f(xi) into the calculator to find E(X) and Var(X). The worksheet accepts only scaled-up probabilities that are positive integers. If you enter a non-integer value, you will get an error when attempting to retrieve E(X) and Var(X).

To overcome this constraint, first scale up each f(xi) to an integer by multiplying it by a common integer.

    Claim Size x             20     30     40     50     60     70     80    Total
    Probability Pr(x)        0.15   0.10   0.05   0.20   0.10   0.10   0.30  1.00
    Scaled-up probability
    (100 Pr(x))              15     10     5      20     10     10     30    100

Next, enter the 7 data pairs (claim size, scaled-up probability) into the BA II Plus Statistics Worksheet to get E(X) and σx.

BA II Plus and BA II Plus Professional calculator key sequences:

    Procedure                          Keystrokes                      Display
    Set the calculator to display      2nd [FORMAT] 4 ENTER            DEC = 4.0000
    4 decimal places
    Set AOS (algebraic operating       2nd [FORMAT], keep pressing     AOS
    system)                            the down arrow until you see
                                       Chn; press 2nd [ENTER] (if
                                       you see AOS, your calculator
                                       is already in AOS, in which
                                       case press 2nd [CLR Work])
    Select the data entry portion      2nd [Data]                      X01  0.0000
    of the Statistics Worksheet
    Clear the worksheet                2nd [CLR Work]                  X01  0.0000
    Enter the data set                 20 ENTER ↓ 15 ENTER ↓           X01 = 20.0000, Y01 = 15.0000
                                       30 ENTER ↓ 10 ENTER ↓           X02 = 30.0000, Y02 = 10.0000
                                       40 ENTER ↓ 5 ENTER ↓            X03 = 40.0000, Y03 = 5.0000
                                       50 ENTER ↓ 20 ENTER ↓           X04 = 50.0000, Y04 = 20.0000
                                       60 ENTER ↓ 10 ENTER ↓           X05 = 60.0000, Y05 = 10.0000
                                       70 ENTER ↓ 10 ENTER ↓           X06 = 70.0000, Y06 = 10.0000
                                       80 ENTER ↓ 30 ENTER             X07 = 80.0000, Y07 = 30.0000
    Select the statistical             2nd [Stat]                      Old content
    calculation portion of the
    Statistics Worksheet
    Select the one-variable            keep pressing 2nd [ENTER]       1-V
    calculation method                 until you see 1-V
    View the sum of the scaled-up      ↓                               n = 100.0000 (Make sure the sum of
    probabilities                                                      the scaled-up probabilities equals
                                                                       the common scale factor, which in
                                                                       this problem is 100. If n is not
                                                                       equal to the common factor, you've
                                                                       made a data entry error.)
    View the mean                      ↓                               x̄ = 55.0000
    View the sample standard           ↓                               Sx = 21.9043 (this is the sample
    deviation                                                          standard deviation; don't use this
                                                                       value). Note that
                                                                       Sx = sqrt[Σ(Xi - x̄)^2 / (n - 1)]
    View σx                            ↓                               σx = 21.7945
    View ΣX                            ↓                               ΣX = 5,500.0000 (not needed here)
    View ΣX²                           ↓                               ΣX² = 350,000.0000 (not needed here,
                                                                       though this function might be useful
                                                                       for other calculations)

You should always double check (using the arrow keys to scroll up and down the data pairs of X and Y) that your data entry is correct before accepting the E(X) and σx generated by the BA II Plus.

If you have made an error in data entry, you can press 2nd [DEL] to delete a data pair (X, Y) or 2nd [INS] to insert a data pair (X, Y). If you typed a wrong number, you can delete the wrong number and then re-enter the correct one. Refer to the BA II Plus guidebook for details on how to correct data entry errors.

If this procedure for calculating E(X) and σx seems more time-consuming than the formula-driven approach, it could be because you are not familiar with the BA II Plus Statistics Worksheet yet. With practice, you will find that using the calculator is quicker than manually calculating with formulas.

Then, we have

    (x̄ - σx, x̄ + σx) = (55 - 21.7945, 55 + 21.7945) = (33.21, 76.79)

Finally, you find

    Pr(33.21 < X < 76.79) = 0.05 + 0.20 + 0.10 + 0.10 = 0.45

First, calculate E ( X ) using E ( X ) =

xf (x ) to

To find E ( X ) , we type:

20*.15+30*.1+40*.05+50*.2+60*.1+70*.1+80*.3

Then press Enter. E ( X ) =55.

Next we modify the formula

20 .15+30 .1+40 .05+50 .2+60 .1+70 .1+80 .3

Guo Fall 2009 C, Page 24 / 284

to

20 2 .15+30 2 .1+40 2 .05+50 2 .2+60 2 .1+70 2 .1+80 2 .3

To change 20 to 20 2 , move the cursor immediately to the right of the number 20 so

your cursor is blinking on top of the multiplication sign . Press 2nd INS x 2 .

You find that

20 2 .15+30 2 .1+40 2 .05+50 2 .2+60 2 .1+70 2 .1+80 2 .3

=3500

So E ( X 2 ) =3,500

Var ( X ) = E ( X 2 ) E 2 ( X ) =3,500- 552 =475.

Finally, you can calculate

, X +

).

Keep in mind that you can enter up to 88 digits for a formula in TI-30X IIS. If your

formula exceeds 88 digits, TI 30X IIS will ignore the digits entered after the 88th digit.
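As an off-calculator sanity check, the discrete mean/variance recipe above can be sketched in a few lines of Python (using the same distribution as the keystrokes above):

```python
# Mean and variance of a discrete random variable, checked against the
# distribution Pr(X=20)=.15, ..., Pr(X=80)=.3 used in the example above.
xs = [20, 30, 40, 50, 60, 70, 80]
ps = [0.15, 0.10, 0.05, 0.20, 0.10, 0.10, 0.30]

mean = sum(x * p for x, p in zip(xs, ps))        # E(X)
ex2 = sum(x * x * p for x, p in zip(xs, ps))     # E(X^2)
var = ex2 - mean ** 2                            # Var(X) = E(X^2) - E^2(X)
sd = var ** 0.5                                  # population standard deviation

print(round(mean, 4), round(var, 4), round(sd, 4))  # 55.0 475.0 21.7945
```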

Example 2

A baseball team has scheduled its opening game for April 1. If it rains on April 1, the

game is postponed and will be played on the next day that it does not rain. The team

purchases insurance against rain. The policy will pay 1,000 for each day, up to 2 days,

that the opening game is postponed. The insurance company determines that the number

of consecutive days of rain beginning on April 1 is a Poisson random variable with a 0.6

mean. What is the standard deviation of the amount the insurance company will have to

pay?

(A) 668, (B) 699, (C) 775, (D) 817, (E) 904

Solution

Let N = # of days it rains consecutively. N can be 0, 1, 2, or any other non-negative integer.

Pr(N = n) = e^(−λ) λ^n / n! = e^(−0.6) (0.6^n) / n!   (n = 0, 1, 2, ...)

Let X = payment by the insurance company. According to the insurance contract, if there is no rain (n = 0), X = 0. If it rains for only 1 day, X = $1,000. If it rains for two or more days in a row, X is always $2,000. We are asked to calculate σX.

If a problem asks you to calculate the mean, standard deviation, or other statistics of a discrete random variable, it is always a good idea to list the variable's values and their corresponding probabilities in a table before doing the calculation, to organize your data.

So let's list the data pairs (X, probability) in a table:

Payment X    Probability of receiving X
0            Pr(N = 0) = e^(−0.6) (0.6⁰ / 0!) = e^(−0.6)
1,000        Pr(N = 1) = e^(−0.6) (0.6¹ / 1!) = 0.6e^(−0.6)
2,000        Pr(N ≥ 2) = 1 − [Pr(N = 0) + Pr(N = 1)] = 1 − 1.6e^(−0.6)

Once you set up the table above, you can use the BA II Plus Statistics Worksheet or the TI-30X IIS to find the mean and variance.

Calculation Method 1 --- Using TI-30X IIS

First we calculate the mean by typing:

1000*.6e^(-.6)+2000(1-1.6e^(-.6

When typing e^(-.6) for e^(−0.6), you need to use the negative sign, not the minus sign, to get −.6. If you type the minus sign in e^(−.6), you will get an error message.

Additionally, for 0.6e^(−0.6), you do not need to type 0.6*e^(-.6); just type .6e^(-.6). Also, to calculate 2000(1 − 1.6e^(−0.6)), you do not need to type 2000*(1-1.6*(e^(-.6))). Simply type:

2000(1-1.6e^(-.6

Your calculator understands that you are trying to calculate 2000(1 − 1.6e^(−0.6)). However, the omission of the closing parentheses works only for the last item in your formula. In other words, if your equation is

2000(1 − 1.6e^(−0.6)) + 1000 × .6e^(−0.6)

you have to type the first item with its full parentheses, but you can skip typing the closing parentheses in the 2nd item:

2000(1-1.6e^(-.6)) + 1000*.6e^(-.6

If you type

2000(1-1.6e^(-.6 + 1000*.6e^(-.6

your calculator will interpret this as

2000(1-1.6e^(-.6 + 1000*.6e^(-.6)))

Of course, this is not your intention.

Let's come back to the calculation. After you type

1000*.6e^(-.6)+2000(1-1.6e^(-.6

press ENTER. You should get E(X) = 573.0897. This is an intermediate value. You can store it on your scrap paper or in one of your calculator's memories.

Next, modify your formula to get E(X²) by typing:

1000²*.6e^(-.6)+2000²(1-1.6e^(-.6

You will get 816,892.5107. This is E(X²). Next, calculate Var(X):

Var(X) = E(X²) − E²(X) = 488,460.6535

σX = √Var(X) = 698.8996 ≈ 699. The answer is B.
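The arithmetic above can be verified in a few lines of Python, assuming N ~ Poisson(0.6) and the payment X = 0, 1,000, 2,000 for N = 0, N = 1, N ≥ 2 respectively:

```python
# Standard deviation of the rain-insurance payment from the example above.
from math import exp, sqrt

p0 = exp(-0.6)          # Pr(N = 0)
p1 = 0.6 * exp(-0.6)    # Pr(N = 1)
p2 = 1 - p0 - p1        # Pr(N >= 2)

mean = 1000 * p1 + 2000 * p2              # E(X)
ex2 = 1000 ** 2 * p1 + 2000 ** 2 * p2     # E(X^2)
sd = sqrt(ex2 - mean ** 2)                # sigma_X

print(round(mean, 4), round(sd, 4))  # 573.0897 698.8996
```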

Calculation Method 2 --- Using BA II Plus

First, please note that you can always calculate σX and any other statistics without using the built-in worksheet. In this problem, the equations used to calculate σX are:

E(X) = 0(e^(−0.6)) + 1,000(0.6e^(−0.6)) + 2,000(1 − 1.6e^(−0.6))

E(X²) = 0²(e^(−0.6)) + 1,000²(0.6e^(−0.6)) + 2,000²(1 − 1.6e^(−0.6))

Var(X) = E(X²) − E²(X),   σX = √Var(X)

You simply calculate each item in the above equations with BA II Plus. This will give you the required standard deviation.

However, we do not want to do this hard-core calculation in an exam. BA II Plus already has a built-in statistics worksheet and we should utilize it.

The key to using the BA II Plus Statistics Worksheet is to scale up the probabilities to integers. To scale up the three probabilities (e^(−0.6), 0.6e^(−0.6), 1 − 1.6e^(−0.6)):

Payment X    Probability (set your BA II Plus to display 4 decimal places)    Scaled-up probability (multiply the original probability by 10,000)
0            e^(−0.6) = 0.5488                                                5,488
1,000        0.6e^(−0.6) = 0.3293                                             3,293
2,000        1 − 1.6e^(−0.6) = 0.1219                                         1,219
Total        1.0                                                              10,000

Then we just enter the following data pairs into BA II Plus's Statistics Worksheet:

X01=0,     Y01=5,488;
X02=1,000, Y02=3,293;
X03=2,000, Y03=1,219.

You should get σX = 698.8966, which rounds to 699 (answer B).

Make sure your calculator gives you an n that matches the sum of the scaled-up probabilities. In this problem, the sum of your scaled-up probabilities is 10,000, so you should get n = 10,000. If your calculator gives you an n that is not 10,000, you know that at least one of the scaled-up probabilities is wrong.

Of course, you can scale up the probabilities with better precision (more closely resembling the original probabilities). For example, you can scale them up this way (assuming you set your calculator to display 8 decimal places):

Payment X    Probability                    Scaled-up probability, more precisely (multiply the original probability by 100,000,000)
0            e^(−0.6) = 0.54881164          54,881,164
1,000        0.6e^(−0.6) = 0.32928698       32,928,698
2,000        1 − 1.6e^(−0.6) = 0.12190138   12,190,138
Total                                       100,000,000

Then we just enter the following data pairs into BA II Plus's Statistics Worksheet:

X01=0,     Y01=54,881,164;
X02=1,000, Y02=32,928,698;
X03=2,000, Y03=12,190,138.

You should get σX = 698.8996 (and n = 100,000,000).

For exam problems, scaling up the original probabilities by multiplying them by 10,000

is good enough to give you the correct answer. Under exam conditions it is unnecessary

to scale the probability up by multiplying by 100,000,000.

#4 Calculate the sample variance

The number of claims a driver has during the year is assumed to be Poisson distributed

with an unknown mean that varies by driver.

The experience for 100 drivers is as follows:

# of claims during the year    0     1     2     3    4    Total
# of drivers                   54    33    10    2    1    100

Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation.

Solution

For now, don't worry about credibility; focus on calculating the sample mean and sample variance.

Standard calculation, not using the 1-V Statistics Worksheet

Let X represent the # of claims in a year. Then

μ̂ = X̄ = [54(0) + 33(1) + 10(2) + 2(3) + 1(4)] / (54 + 33 + 10 + 2 + 1) = 63/100 = 0.63

Var(X) ≈ S² = [1/(n − 1)] Σ from i=1 to n of (Xi − X̄)²
= (1/99) [54(0 − 0.63)² + 33(1 − 0.63)² + 10(2 − 0.63)² + 2(3 − 0.63)² + 1(4 − 0.63)²] ≈ 0.68

Fast calculation using the 1-V Statistics Worksheet. Enter:

X01=0, Y01=54
X02=1, Y02=33
X03=2, Y03=10
X04=3, Y04=2
X05=4, Y05=1

You should get:

X̄ = 0.63
SX = 0.82455988 (this is the unbiased sample standard deviation)

While your calculator displays SX = 0.82455988, press the x² key of your calculator. You should get 0.67989899. This is Var(X) = SX². So Var(X) = 0.67989899 ≈ 0.68.
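The frequency-weighted sample mean and unbiased sample variance can be verified with a short script (the data are the 100-driver table above):

```python
# Sample mean and unbiased sample variance for frequency data.
claims = [0, 1, 2, 3, 4]
drivers = [54, 33, 10, 2, 1]

n = sum(drivers)
mean = sum(x * w for x, w in zip(claims, drivers)) / n
ss = sum(w * (x - mean) ** 2 for x, w in zip(claims, drivers))
sample_var = ss / (n - 1)   # divide by n-1 for the unbiased estimator

print(mean, round(sample_var, 8))  # 0.63 0.67989899
```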

#5 Find the conditional mean and conditional variance

Example

For an insurance policy:

A policyholder's annual losses can be 100, 200, 300, and 400 with respective probabilities 0.1, 0.2, 0.3, and 0.4. The policy has an annual deductible of 250.

Calculate the mean and the variance of the annual payment made by the insurer to the policyholder, given there's a payment.

Solution

Let X represent the annual loss. Let Y represent the claim payment by the insurer to the policyholder. Then

Y = 0 if X ≤ 250;   Y = X − 250 if X > 250

Standard solution

X       100    200    300    400
Y       0      0      50     150
P(X)    0.1    0.2    0.3    0.4

There's a payment only if X > 250, and P(X > 250) = 0.3 + 0.4 = 0.7. Given a payment, the conditional probabilities are:

P(X = 300 | X > 250) = 0.3/0.7 = 3/7,   P(X = 400 | X > 250) = 0.4/0.7 = 4/7

E(Y | X > 250) = 50(3/7) + 150(4/7) = 107.1428571

E(Y² | X > 250) = 50²(3/7) + 150²(4/7) = 13,928.57143

Var(Y | X > 250) = 13,928.57143 − 107.1428571² = 2,448.98

Fast solution using the BA II Plus/BA II Plus Professional 1-V Statistics Worksheet

X           100           200           300          400
X > 250?    No. Discard.  No. Discard.  Yes. Keep.   Yes. Keep.

New table after discarding X ≤ 250:

X       300    400
Y       50     150
P(X)    0.3    0.4

Enter:

X01=50,  Y01=3;
X02=150, Y02=4

You should get:

n = 7,   X̄ = 107.14,   σX = 49.48716593,   Var = σX² = 2,448.98

This is how the BA II Plus/Professional 1-V Statistics Worksheet works. After you enter X01=50, Y01=3, X02=150, Y02=4, BA II Plus/Professional knows that your random variable X takes on two values: 50 (with frequency 3) and 150 (with frequency 4). Next, BA II Plus/Professional sets up the following table for the statistics calculation:

X = $50 with probability 3/(3 + 4) = 3/7;   X = $150 with probability 4/(3 + 4) = 4/7

E(X) = 50(3/7) + 150(4/7),   E(X²) = 50²(3/7) + 150²(4/7),   Var(X) = E(X²) − E²(X)

These are exactly the conditional mean and conditional second moment we calculated earlier:

E(Y | X > 250) = 50(3/7) + 150(4/7) = 107.14

E(Y² | X > 250) = 50²(3/7) + 150²(4/7) = 13,928.57

Now you see that BA II Plus/Professional correctly calculates the mean and variance.

In the BA II Plus/Professional 1-V Statistics Worksheet, what's important is the relative data frequency, not the absolute data frequency. The following entries produce identical mean, sample mean, and variance:

Entry One:    X01=50, Y01=3;    X02=150, Y02=4
Entry Two:    X01=50, Y01=6;    X02=150, Y02=8
Entry Three:  X01=50, Y01=30;   X02=150, Y02=40

Each of these tells the calculator that X = $50 with probability 3/7 and X = $150 with probability 4/7.

General procedure to calculate E[Y(x) | x > a] and Var[Y(x) | x > a] using the BA II Plus and BA II Plus Professional 1-V Statistics Worksheet:

Throw away all the data pairs (Yi, Xi) where the condition X > a is NOT met.
Use the remaining data pairs to calculate E(Y) and Var(Y).

General procedure to calculate E[Y(x) | x < a] and Var[Y(x) | x < a]:

Throw away all the data pairs (Yi, Xi) where the condition X < a is NOT met.
Use the remaining data pairs to calculate E(Y) and Var(Y).
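The "discard and renormalize" idea behind this procedure can be sketched directly, using the loss/deductible example above (losses X, probabilities P, payment Y = max(X − 250, 0), conditioned on a payment being made):

```python
# Conditional mean and variance via "keep only the qualifying pairs".
xs = [100, 200, 300, 400]
ps = [0.1, 0.2, 0.3, 0.4]

kept = [(max(x - 250, 0), p) for x, p in zip(xs, ps) if x > 250]
total = sum(p for _, p in kept)                 # P(X > 250) = 0.7

mean = sum(y * p for y, p in kept) / total      # E(Y | Y > 0)
ex2 = sum(y * y * p for y, p in kept) / total   # E(Y^2 | Y > 0)
var = ex2 - mean ** 2

print(round(mean, 2), round(var, 2))  # 107.14 2448.98
```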

Example

A discrete random variable X has the following probability mass function, where k is a constant:

X = x    pX(x)
0.5      (4/6)(0.5⁴) k
0.25     (1/6)(0.25³)(0.75) k
0.75     (1/6)(0.75³)(0.25) k

Calculate E(X).

Solution

Please note that you don't need to calculate k.

X = x    pX(x)                                Scaled-up pX(x) -- multiply pX(x) by 1,000,000/k
0.5      (4/6)(0.5⁴) k = 0.041667 k           41,667
0.25     (1/6)(0.25³)(0.75) k = 0.001953 k    1,953
0.75     (1/6)(0.75³)(0.25) k = 0.017578 k    17,578

Enter:

X01=0.5,  Y01=41,667
X02=0.25, Y02=1,953
X03=0.75, Y03=17,578

You should get: n = 61,198, X̄ = 0.56382970. So E(X) = 0.56382970.

Exam C Nov 2002 #29

You are given the following joint distribution:

          X
Θ         0      1      2
0         0.4    0.1    0.1
1         0.1    0.2    0.1

For a given value of Θ, ten observations of X yield Σ from i=1 to 10 of Xi = 10. Determine the Bühlmann credibility premium.

Solution

Don't worry about the Bühlmann credibility premium for now. All you need to do right now is to calculate the following 7 items:

E(X | Θ=0), Var(X | Θ=0), E(X | Θ=1), Var(X | Θ=1), E[E(X | Θ)], Var[E(X | Θ)], E[Var(X | Θ)]

X    P(X, Θ=0)    10 P(X, Θ=0)
0    0.4          4
1    0.1          1
2    0.1          1

Enter:

X01=0, Y01=4;
X02=1, Y02=1;
X03=2, Y03=1

You should get: n = 6, X̄ = 0.5, σX = 0.76376262, Var = σX² = 0.58333333 = 7/12.
So E(X | Θ=0) = 0.5 and Var(X | Θ=0) = 7/12.

X    P(X, Θ=1)    10 P(X, Θ=1)
0    0.1          1
1    0.2          2
2    0.1          1

Enter:

X01=0, Y01=1;
X02=1, Y02=2;
X03=2, Y03=1

You should get: n = 4, X̄ = 1, σX = 0.70710678, Var = 0.70710678² = 0.5.
So E(X | Θ=1) = 1 and Var(X | Θ=1) = 0.5.

Next, treat E(X | Θ) as a random variable taking the value 0.5 with probability P(Θ=0) = 0.6 and the value 1 with probability P(Θ=1) = 0.4:

E(X | Θ=0) = 0.5    10 P(Θ=0) = 6
E(X | Θ=1) = 1      10 P(Θ=1) = 4

Enter:

X01=0.5, Y01=6;
X02=1,   Y02=4

You should get: n = 10, X̄ = 0.7, σX = 0.24494897, Var = 0.24494897² = 0.06.
So E[E(X | Θ)] = 0.7 and Var[E(X | Θ)] = 0.06.

Finally, do the same for Var(X | Θ):

Var(X | Θ=0) = 7/12    10 P(Θ=0) = 6
Var(X | Θ=1) = 0.5     10 P(Θ=1) = 4

Enter:

X01=7/12, Y01=6;
X02=0.5,  Y02=4

You should get: n = 10, X̄ = 0.55, σX = 0.04082483. So E[Var(X | Θ)] = 0.55.

With these seven building blocks, the Bühlmann machinery gives μ = 0.7, a = 0.06, v = 0.55, k = v/a = 55/6, Z = 10/(10 + 55/6) = 0.5217, and the credibility premium Z X̄ + (1 − Z)μ = 0.5217(1) + 0.4783(0.7) = 0.8565.
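The same seven building blocks can be computed numerically, assuming the joint probabilities P(Θ = t, X = x) from the table above:

```python
# Numeric check of E[E(X|Theta)], Var[E(X|Theta)], E[Var(X|Theta)].
rows = {0: {0: 0.4, 1: 0.1, 2: 0.1},   # Theta = 0 row of the joint table
        1: {0: 0.1, 1: 0.2, 2: 0.1}}   # Theta = 1 row

def cond_moments(row):
    """Return P(Theta = t), E(X | Theta = t), Var(X | Theta = t)."""
    w = sum(row.values())
    m = sum(x * p for x, p in row.items()) / w
    v = sum(x * x * p for x, p in row.items()) / w - m * m
    return w, m, v

stats = [cond_moments(row) for row in rows.values()]

mu = sum(w * m for w, m, _ in stats)                # E[E(X|Theta)]
a = sum(w * m * m for w, m, _ in stats) - mu ** 2   # Var[E(X|Theta)]
v = sum(w * cv for w, _, cv in stats)               # E[Var(X|Theta)]

print(round(mu, 2), round(a, 2), round(v, 2))  # 0.7 0.06 0.55
```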

#6 Do the least squares regression

One useful yet neglected feature of the BA II Plus/BA II Plus Professional is the linear least squares regression functionality. This feature can help you quickly solve a tricky problem with a few simple keystrokes. Unfortunately, 99.9% of exam candidates don't know of this feature. Even SOA doesn't know.

Let me quickly walk through the basic formula behind linear least squares regression. This part is also explained in the chapter on the Bühlmann credibility premium, so I will just repeat what I said in that chapter.

In a regression analysis, you try to fit a line (or a function) through a set of points. With least squares regression, you get a better fit by minimizing the squared distance of each point to the fitted line. You then use the fitted line to project where a data point is most likely to be.

Say you want to find out how one's income level affects how much life insurance one buys. Let X represent income. Let Y represent the amount of life insurance this person buys. You have collected some data pairs (X, Y) from a group of consumers. You suspect there's a linear relationship between X and Y, so you want to predict Y using the function a + bX, where a and b are constants. With least squares regression, you want to minimize the following:

Q = E[(a + bX − Y)²]

∂Q/∂a = ∂/∂a E[(a + bX − Y)²] = E[∂/∂a (a + bX − Y)²] = E[2(a + bX − Y)] = 2[a + bE(X) − E(Y)]

Setting ∂Q/∂a = 0:

a + bE(X) − E(Y) = 0    (Equation I)

∂Q/∂b = ∂/∂b E[(a + bX − Y)²] = E[∂/∂b (a + bX − Y)²] = E[2(a + bX − Y)X] = 2[aE(X) + bE(X²) − E(XY)]

Setting ∂Q/∂b = 0:

aE(X) + bE(X²) − E(XY) = 0    (Equation II)

(Equation II) − (Equation I) × E(X):

b[E(X²) − E²(X)] = E(XY) − E(X)E(Y)

b = Cov(X, Y) / Var(X),   a = E(Y) − bE(X)

where

Var(X) = E(X²) − E²(X),   E(X) = Σ pi xi,   E(X²) = Σ pi xi²

Cov(X, Y) = E(XY) − E(X)E(Y),   E(XY) = Σ pi xi yi,   E(Y) = Σ pi yi

Example 1. For the following data pairs (xi, yi), find the linear least squares regression line a + bX:

i             1      2      3
pi(xi, yi)    1/3    1/3    1/3
X = xi        0      3      12
Y = yi        1      6      8

Solution

E(X) = (1/3)(0 + 3 + 12) = 5,   E(X²) = (1/3)(0² + 3² + 12²) = 51

Var(X) = 51 − 5² = 26

E(Y) = (1/3)(1 + 6 + 8) = 5,   E(XY) = (1/3)(0×1 + 3×6 + 12×8) = 38

Cov(X, Y) = E(XY) − E(X)E(Y) = 38 − 5×5 = 13

b = Cov(X, Y)/Var(X) = 13/26 = 0.5,   a = E(Y) − bE(X) = 5 − 0.5×5 = 2.5

Next, we'll calculate a + bX when X = 0, 3, 12:

If X = 0, 2.5 + 0.5X = 2.5 + 0.5(0) = 2.5;
if X = 3, 2.5 + 0.5(3) = 4;
if X = 12, 2.5 + 0.5(12) = 8.5.
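The formulas b = Cov(X,Y)/Var(X) and a = E(Y) − bE(X) derived above translate directly into code; here is a small sketch checked against Example 1:

```python
# Least squares line through probability-weighted points.
def lsq_line(points, probs):
    ex = sum(p * x for (x, _), p in zip(points, probs))
    ey = sum(p * y for (_, y), p in zip(points, probs))
    ex2 = sum(p * x * x for (x, _), p in zip(points, probs))
    exy = sum(p * x * y for (x, y), p in zip(points, probs))
    b = (exy - ex * ey) / (ex2 - ex ** 2)   # Cov(X,Y) / Var(X)
    return ey - b * ex, b                   # (a, b)

a, b = lsq_line([(0, 1), (3, 6), (12, 8)], [1/3, 1/3, 1/3])
print(round(a, 4), round(b, 4))                       # 2.5 0.5
print([round(a + b * x, 4) for x in (0, 3, 12)])      # [2.5, 4.0, 8.5]
```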

Now you understand linear least squares regression. Next, let's talk about how to use BA II Plus/BA II Plus Professional to find a and b and to calculate a + bX when X = 0, 3, 12.

Example 2. For the following data pairs (xi, yi), find the linear least squares regression line a + bX using BA II Plus/BA II Plus Professional:

i             1      2      3
pi(xi, yi)    1/3    1/3    1/3
X = xi        0      3      12
Y = yi        1      6      8

Solution

The keystrokes to find a + bX using BA II Plus/Professional:

2nd Data (activate the statistics worksheet)
2nd CLR Work (clear the old contents)
X01=0,  Y01=1
X02=3,  Y02=6
X03=12, Y03=8
2nd STAT (keep pressing 2nd Enter until your calculator displays LIN)

Press the down arrow key; you'll see n = 3
Press the down arrow key; you'll see X̄ = 5
Press the down arrow key; you'll see SX = 6.24499800 (sample standard deviation)
Press the down arrow key; you'll see σX = 5.09901951
Press the down arrow key; you'll see Ȳ = 5
Press the down arrow key; you'll see SY = 3.60555128 (sample standard deviation)
Press the down arrow key; you'll see σY = 2.94392029
Press the down arrow key; you'll see a = 2.5
Press the down arrow key; you'll see b = 0.5
Press the down arrow key; you'll see r = 0.8660254 (the correlation coefficient)

Enter X' = 0 (to do this, press 0 Enter)
Press the down arrow key. Press CPT. You'll get Y' = 2.5 (this is a + bX when X = 0)
Press the up arrow key; you'll see X' = 0
Enter X' = 3 (press 3 Enter)
Press the down arrow key. Press CPT. You'll get Y' = 4 (this is a + bX when X = 3)
Press the up arrow key; you'll see X' = 3
Enter X' = 12 (press 12 Enter)
Press the down arrow key. Press CPT. You'll get Y' = 8.5 (this is a + bX when X = 12)

You see that using the BA II Plus/Professional LIN Statistics Worksheet, we get the same result.

You might wonder why we didn't use the probability pi(xi, yi). Here is an important point: the BA II Plus/Professional Statistics Worksheet (including LIN) can't directly handle probabilities. To use the Statistics Worksheet, you have to first convert the probabilities to a # of occurrences. In this problem, pi(xi, yi) = 1/3 for i = 1, 2, and 3. So we have 3 data pairs (xi, yi), each equally likely to occur, and we arbitrarily let each data pair occur only once. This way, BA II Plus/Professional knows that each of the three data pairs has a 1/3 chance of occurring. Later I will show you how to use LIN when pi(xi, yi) is not uniform.

Some of you might complain: "I can easily use my pen and find the answers. Why do I need to bother using LIN?" There are several reasons why you might want to use LIN to find the regression line a + bX and calculate various values of a + bX:

In the heat of the exam, it's easy for you to be brain dead and forget the formulas b = Cov(X, Y)/Var(X), a = E(Y) − bE(X).

Even if you are not brain dead, you can easily make mistakes calculating a + bX from scratch. In contrast, if you have entered your data pairs (xi, yi) correctly, BA II Plus/Professional will generate the results 100% right.

Even if you want to calculate a + bX from scratch, it's good to use LIN to double check your work.


Example 3. For the following data pairs (xi, yi), find the linear least squares regression line a + bX using BA II Plus/BA II Plus Professional.

i             1      2      3
pi(xi, yi)    1/6    1/3    1/2
X = xi        0      3      12
Y = yi        1      6      8

Solution

Since the probabilities are not uniform, first convert them to frequencies. Assume we have a total of 6 occurrences. Then (x1, y1) occurs once; (x2, y2) occurs twice; and (x3, y3) occurs three times. When calculating a + bX, the LIN Statistics Worksheet automatically figures out that p1(x1, y1) = 1/6, p2(x2, y2) = 1/3, and p3(x3, y3) = 1/2.

Of course, you can also assume that the total # of occurrences is 60. Then (x1, y1) occurs 10 times; (x2, y2) occurs 20 times; and (x3, y3) occurs 30 times. However, this approach will make your data entry tedious.

The following calculation assumes the total # of occurrences is 6. When using the LIN Statistics Worksheet, we enter the following data:

X01=0,  Y01=1
X02=3,  Y02=6
X03=3,  Y03=6
X04=12, Y04=8
X05=12, Y05=8
X06=12, Y06=8

You should get:

n = 6, X̄ = 7, SX = 5.58569602, σX = 5.09901951, a = 3.25, b = 0.41666667, r = 0.85749293

a + bX = 3.25 + 0.41666667X

Set X' = 3. Press CPT. You should get Y' = 4.5.
Set X' = 12. Press CPT. You should get Y' = 8.25.

Double checking the BA II Plus/Professional LIN functionality:

i             1      2      3
pi(xi, yi)    1/6    1/3    1/2
X = xi        0      3      12
Y = yi        1      6      8

E(X) = (1/6)(0) + (1/3)(3) + (1/2)(12) = 7,   E(X²) = (1/6)(0²) + (1/3)(3²) + (1/2)(12²) = 75

Var(X) = 75 − 7² = 26

E(Y) = (1/6)(1) + (1/3)(6) + (1/2)(8) = 6.1667

E(XY) = (1/6)(0×1) + (1/3)(3×6) + (1/2)(12×8) = 54

Cov(X, Y) = E(XY) − E(X)E(Y) = 54 − 7(6.1667) = 10.8331

b = Cov(X, Y)/Var(X) = 10.8331/26 = 0.41666,   a = E(Y) − bE(X) = 6.1667 − 0.41666(7) = 3.25

a + bX = 3.25 + 0.41666X

If X = 3, then Y' = 3.25 + 0.41666(3) = 4.5
If X = 12, then Y' = 3.25 + 0.41666(12) = 8.25

Now you should be convinced that the LIN Statistics Worksheet produces the correct result.

Application of the LIN Statistics Worksheet in Exam C

There are at least two places you can use LIN. One is to calculate the Bühlmann credibility premium as the least squares regression of the Bayesian premium. The other is to use LIN for linear interpolation. I'll walk you through both.

Bühlmann credibility premium as the least squares regression of the Bayesian premium

Example 4. (old SOA problem)

Let X1 represent the outcome of a single trial and let E(X2 | X1) represent the expected value of the outcome of a 2nd trial, as described in the table below:

Outcome k    Initial probability of outcome    Bayesian estimate E(X2 | X1 = k)
0            1/3                               1
3            1/3                               6
12           1/3                               8

Determine the Bühlmann credibility estimate of E(X2 | X1 = k) for k = 0, 3, 12.

Solution

The Bühlmann credibility estimate is the least squares regression line a + ZX1 that minimizes E[(a + ZX1 − Y)²], where Y = E(X2 | X1).

Since the probability of each data pair is uniformly 1/3, we enter the following data in LIN:

X01=0,  Y01=1
X02=3,  Y02=6
X03=12, Y03=8

We should get: a = 2.5, b = 0.5.

Enter X' = 0. Press CPT. You'll get Y' = 2.5 (this is a + bX when X = 0).
Enter X' = 3. Press CPT. You'll get Y' = 4 (this is a + bX when X = 3).
Enter X' = 12. Press CPT. You'll get Y' = 8.5 (this is a + bX when X = 12).

So the Bühlmann credibility estimates for k = 0, 3, 12 are (2.5, 4, 8.5).


Example 5. (old SOA problem)

You are given the following information about an insurance coverage:

# of losses n    Probability    Bayesian premium E(X2 | X1 = n)
0                1/4            0.5
1                1/2            0.9
2                1/4            1.7

Determine the Bühlmann credibility factor Z.

Solution

The probability is not uniform. Assume the total # of occurrences is 4. Then the data pair [n = 0, E(X2 | X1 = 0) = 0.5] occurs once, [n = 1, E(X2 | X1 = 1) = 0.9] occurs twice, and [n = 2, E(X2 | X1 = 2) = 1.7] occurs once. Enter:

X01=0, Y01=0.5
X02=1, Y02=0.9
X03=1, Y03=0.9
X04=2, Y04=1.7

We should get: a = 0.4, b = 0.6. So the Bühlmann credibility factor is Z = b = 0.6.

Example 6. (old SOA problem)

You are given:

Outcome Ri    Probability Pi    Bayesian estimate Ei given outcome Ri
0             2/3               7/4
2             2/9               55/24
14            1/9               35/12

Calculate a and b that minimize Σ from i=1 to 3 of Pi(a + bRi − Ei)².

Solution

From Bühlmann credibility theory, b equals the credibility factor Z = 1/12. However, to solve this problem, you really don't need to know b in advance. Once again, we'll use LIN. Let's assume the total # of occurrences of the data pairs (Ri, Ei) is 9. Then (0, 7/4) occurs 6 times, (2, 55/24) occurs twice, and (14, 35/12) occurs once. Enter the following into LIN:

X01=0,  Y01= 7/4 = 1.75
X02=0,  Y02=1.75
X03=0,  Y03=1.75
X04=0,  Y04=1.75
X05=0,  Y05=1.75
X06=0,  Y06=1.75
X07=2,  Y07= 55/24 ≈ 2.2917
X08=2,  Y08=2.2917
X09=14, Y09= 35/12 ≈ 2.9167

We should get: a = 1.8333, b = 0.08333 = 1/12.

Does this solution sound like too much data entry? Not to me. Yes, I could figure out the answers using the equations

b = Cov(X, Y)/Var(X),   a = E(Y) − bE(X)

I might solve this problem using the above equations when I'm not taking the exam. However, in the exam room, you bet I won't bother with these equations. I will enter 18 numbers into the calculator and let the calculator do the math for me. This way, I don't have to think. I just enter the numbers and the calculator spits out the answer. And I know that my result is 100% right.

#7 Do linear interpolation

Another use of LIN is to do linear interpolation. You are given two data pairs (x1, y1) and (x2, y2), and you are asked to find the value y3 that corresponds to x3 using linear interpolation.

The equation for linear interpolation is:

(y3 − y1)/(x3 − x1) = (y2 − y1)/(x2 − x1) = slope of the line through (x1, y1) and (x2, y2)

y3 = [(y2 − y1)/(x2 − x1)] (x3 − x1) + y1

To use LIN for linear interpolation, please note that the least squares regression line for two data points (x1, y1) and (x2, y2) is just the ordinary straight line connecting (x1, y1) and (x2, y2). To find y3, we simply find the least squares regression line a + bX for (x1, y1) and (x2, y2) and evaluate it at X = x3.

Example 1

You are given the following random sample of 10 claims:

46    121    493    738    775    1078    1452    2054    2199    3207

Determine the smoothed empirical estimate of the 90th percentile, as defined in Klugman, Panjer, and Willmot.

Solution

First, sort the claims in ascending order. Then the k-th number is the 100k/(n+1) percentile. For example, the 1st observation, 46, is the 100(1)/(10+1) = 9.09th percentile; the 2nd observation, 121, is the 100(2)/(10+1) = 18.18th percentile. And so on.

To find the smoothed estimate of the 90th percentile, we linearly interpolate between the 9th observation, which is the 100(9)/(10+1) = 81.82th percentile, and the 10th observation, which is the 100(10)/(10+1) = 90.91th percentile:

percentile     81.82    90     90.91
observation    2,199    x90    3,207

x90 = x81.82 + [(90 − 81.82)/(90.91 − 81.82)] (x90.91 − x81.82)
    = 2,199 + [(90 − 81.82)/(90.91 − 81.82)] (3,207 − 2,199) = 3,106.09

Next, I'll show you two shortcuts: one without using LIN, the other using LIN.

Shortcut without LIN. Since the k-th number is the 100k/(n+1) percentile, the m-th percentile corresponds to the m(n+1)/100-th observation. For example, the 81.82th percentile corresponds to the 81.82(10+1)/100 = 9th observation; the 90.91th percentile corresponds to the 90.91(10+1)/100 = 10th observation.

Important rules:

The k-th observation is the 100k/(n+1) percentile.
The m-th percentile is the m(n+1)/100-th observation.

Once you understand the above two rules, you can quickly find the 90th percentile. Set m = 90: k = m(n+1)/100 = 90(10+1)/100 = 9.9. So the 9.9th observation is what we are looking for. Of course, there isn't a 9.9th observation, so we find it using linear interpolation:

observation #    9        9.9    10
value            2,199    x90    3,207

x90 = 2,199 + [(9.9 − 9)/(10 − 9)] (3,207 − 2,199) = 3,106.2

You see that this linear interpolation is much faster than the previous one.

You see that this linear interpolation is must faster than the previous linear interpolation.

We have two data pairs (9, 2,199) and (10, 3,207). As said before, if you have only two

points, then the least squares line is just the ordinary line connecting the two points. We

are interested in finding the ordinary straight line connecting (9, 2,199) and (10, 3,207).

So well use the LIN function to find the least squares line, which is the ordinary line.

Enter the following into LIN:

X01=9,

X02=10,

Y01=2199

Y02=3207

Youll find that: a = 6,873 , b = 1, 008 , r = 1 . The correlation coefficient should be one

because we have only two data pairs. Two data points always produce perfectly linear

relationship. So if your r is not equal to one, you did something wrong.

Next, set X ' = 9.9 . Press CPT. You should get: Y ' = 3,106.2 . This is the 90th percentile

you are looking for.

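The two rules above (k-th observation = 100k/(n+1) percentile, m-th percentile = m(n+1)/100-th observation) are easy to check in code; here is a minimal sketch using the claim sample above:

```python
# Smoothed empirical percentile (Klugman/Panjer/Willmot convention):
# interpolate between order statistics at the fractional observation number.
def smoothed_percentile(sorted_data, m):
    n = len(sorted_data)
    k = m * (n + 1) / 100          # fractional observation number (1-based);
    i = int(k)                     # valid when 1 <= k < n
    frac = k - i
    return sorted_data[i - 1] + frac * (sorted_data[i] - sorted_data[i - 1])

claims = [46, 121, 493, 738, 775, 1078, 1452, 2054, 2199, 3207]
print(round(smoothed_percentile(claims, 90), 1))  # 3106.2
```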

Example 2

You are given the following values of the cdf of the standard normal distribution: Φ(0.4) = 0.6554 and Φ(0.5) = 0.6915. Using linear interpolation, calculate Φ(0.443).

Solution

Φ(0.443) = [(0.5 − 0.443)/(0.5 − 0.4)] Φ(0.4) + [(0.443 − 0.4)/(0.5 − 0.4)] Φ(0.5)
         = 0.57(0.6554) + 0.43(0.6915) = 0.6709

This approach is prone to errors. The math logic is simple, but there are simply too many numbers to calculate, and it's very easy to make a mistake, especially in the heat of the exam.

To quickly solve this problem, we'll use LIN. Enter the following data:

X01=0.4, Y01=0.6554
X02=0.5, Y02=0.6915

2nd STAT (keep pressing 2nd Enter until you see LIN)

Press the down arrow key; you'll see n = 2
Press the down arrow key; you'll see X̄ = 0.45
Press the down arrow key; you'll see SX = 0.07071068
Press the down arrow key; you'll see σX = 0.05
Press the down arrow key; you'll see Ȳ = 0.67345
Press the down arrow key; you'll see SY = 0.02552655
Press the down arrow key; you'll see σY = 0.01805
Press the down arrow key; you'll see a = 0.511
Press the down arrow key; you'll see b = 0.361
Press the down arrow key; you'll see r = 1 (this is the correlation coefficient)

You'll see X' = 0.00. Enter X' = 0.443. Press the down arrow key. Press CPT. You'll get Y' = 0.670923.

So Φ(0.443) = 0.670923.

In the above example, after generating Φ(0.443) = 0.670923, if you want to generate Φ(0.412345), this is what you do:

Enter X' = 0.412345. Press the down arrow key. Press CPT. You'll get Y' = 0.65985655. This is Φ(0.412345).

If you want to generate Φ(0.46789), this is what you do:

Enter X' = 0.46789. Press the down arrow key. Press CPT. You'll get Y' = 0.67990829. This is Φ(0.46789).

General procedure

Given two data pairs (c1, d1) and (c2, d2) and a single value c3, to use the BA II Plus and BA II Plus Professional LIN Worksheet to generate d3, enter:

X01 = c1, Y01 = d1
X02 = c2, Y02 = d2
X' = c3; press the down arrow key and CPT to get Y' = d3

Note that the c's must be entered as X's and the d's must be entered as Y's.

Example 3

You are given the following values of the cdf of the standard normal distribution: Φ(0.4) = 0.6554 and Φ(0.5) = 0.6915. Using linear interpolation, find a, b, c, and d (all positive numbers) such that

Φ(a) = 0.6666
Φ(b) = 0.6777
Φ(c) = 0.6888
Φ(d) = 0.6999

Solution

This time we interpolate in the reverse direction, so the cdf values are entered as X's and the z-values as Y's:

X01=0.6554, Y01=0.4
X02=0.6915, Y02=0.5

Enter X' = 0.6666. The calculator will generate Y' = 0.43102493. So a = 0.43102493.
Enter X' = 0.6777. The calculator will generate Y' = 0.46177285. So b = 0.46177285.
Enter X' = 0.6888. The calculator will generate Y' = 0.49252078. So c = 0.49252078.
Enter X' = 0.6999. The calculator will generate Y' = 0.52326870. So d = 0.52326870.

Example 4

The population of a survivor group is assumed to be linear between two consecutive ages. You are given the following:

Age    # of survivors
50     598
51     534

Calculate the population at ages 50.2, 50.5, 50.7, and 50.9.

Solution

X01=50, Y01=598
X02=51, Y02=534

Enter X' = 50.2. The calculator will generate Y' = 585.2.
Enter X' = 50.5. The calculator will generate Y' = 566.
Enter X' = 50.7. The calculator will generate Y' = 553.2.
Enter X' = 50.9. The calculator will generate Y' = 540.4.

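Two-point interpolation, which is all LIN does when only two data pairs are entered, is one line of code; here it is checked against the normal-cdf and survivor-group examples above:

```python
# Two-point linear interpolation: the line through (x1,y1), (x2,y2) at x3.
def lerp(x1, y1, x2, y2, x3):
    return y1 + (y2 - y1) * (x3 - x1) / (x2 - x1)

# Phi(0.443) from Phi(0.4) = 0.6554 and Phi(0.5) = 0.6915:
print(round(lerp(0.4, 0.6554, 0.5, 0.6915, 0.443), 6))  # 0.670923

# Survivor-group population at age 50.7 from 598 at 50 and 534 at 51:
print(round(lerp(50, 598, 51, 534, 50.7), 1))           # 553.2
```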

Chapter 2    Maximum likelihood estimator

Basic idea

An urn has two coins, one fair and the other biased. In one flip, the fair coin has a 50% chance of landing heads, while the biased one has a 90% chance of landing heads. Now a coin is randomly chosen from the urn and tossed. The outcome is a head.

Question: Which coin was chosen from the urn, the fair coin or the biased coin?

Imagine you have entered a bet. If your guess is correct, you'll earn $10. If your guess is wrong, you'll lose $10. How would you guess?

Most people will guess that the coin chosen from the urn was the biased coin; the biased coin is far more likely to land on heads.

This simple example illustrates the intuition behind the maximum likelihood estimator. If we have to estimate a parameter from an n-size sample X1, X2, ..., Xn, we can choose the parameter value that has the highest probability of producing the observed sample.

Example. You flip a coin 9 times and observe HTTTHHHTH. You don't know whether the coin is fair and you need to estimate the probability of getting H in one flip.

Let p represent the probability of getting a head in one flip. The probability of observing HTTTHHHTH is

P(HTTTHHHTH | p) = p⁵(1 − p)⁴

p      P(HTTTHHHTH | p) = p⁵(1 − p)⁴
0      0.000000000
0.1    0.000006561
0.2    0.000131072
0.3    0.000583443
0.4    0.001327104
0.5    0.001953125
0.6    0.001990656
0.7    0.001361367
0.8    0.000524288
0.9    0.000059049
1      0.000000000

If we have to guess p among the possible values 0, 0.1, 0.2, ..., we should guess p = 0.6, which has the highest probability of producing the outcome HTTTHHHTH.

Example. A coin is tossed n times and x heads are observed. Let p represent the probability that a head shows up in one flip of the coin. Calculate the maximum likelihood estimator of p.

Step One. Write the probability that the observed event happens (the likelihood function):

P(getting x heads out of n flips | p) = C(n, x) p^x (1 − p)^(n − x)

Step Two. Take the logarithm of the likelihood function (called the log-likelihood function). This step simplifies our calculation (as you'll see soon):

ln P(getting x heads out of n flips | p) = ln C(n, x) + x ln p + (n − x) ln(1 − p)

Step Three. Take the 1st derivative of the log-likelihood function with respect to the parameter and set it to zero:

(d/dp) ln P(getting x heads out of n flips | p) = 0

(d/dp) [ln C(n, x)] + (d/dp) [x ln p] + (d/dp) [(n − x) ln(1 − p)] = 0

(d/dp) ln C(n, x) = 0 (it doesn't contain p),

(d/dp) (x ln p) = x/p,

(d/dp) [(n − x) ln(1 − p)] = −(n − x)/(1 − p)

So

x/p − (n − x)/(1 − p) = 0,   x/p = (n − x)/(1 − p),   (1 − p)/p = (n − x)/x,   1/p = n/x,

p = x/n
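A brute-force check of the p = x/n result, using the HTTTHHHTH sample from above (x = 5 heads in n = 9 flips): scan p over a fine grid and confirm the log-likelihood peaks near x/n.

```python
# Grid search for the binomial MLE; the ln C(n,x) term is dropped because
# it is constant in p and does not move the maximizer.
from math import log

n, x = 9, 5

def loglik(p):
    return x * log(p) + (n - x) * log(1 - p)

grid = [i / 1000 for i in range(1, 1000)]   # p in (0, 1)
best = max(grid, key=loglik)

print(best, x / n)   # the grid maximizer sits next to 5/9 = 0.5556
```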

Nov 2000 #6

You have observed the following claim severities:

11.0,   15.2,   18.0,   21.0,   25.8

You fit the following probability density function to the data:

f(x) = [1/√(2πx)] exp[−(x − θ)²/(2x)],   x > 0, θ > 0

Determine the maximum likelihood estimate of θ.

Solution

First, make sure you understand the theoretical framework. Here we take a random sample of 5 claims: X1, X2, X3, X4, and X5. We assume that X1, X2, X3, X4, and X5 are independent and identically distributed with the common pdf

f(x) = [1/√(2πx)] exp[−(x − θ)²/(2x)]

The likelihood function is the joint pdf of the sample:

f(x1, x2, x3, x4, x5) = f(x1) f(x2) f(x3) f(x4) f(x5) = Π from i=1 to 5 of [1/√(2πxi)] exp[−(xi − θ)²/(2xi)]

This joint pdf is proportional to the probability of observing the sample:

P(x1 ≤ X1 ≤ x1 + dx1, ..., x5 ≤ X5 ≤ x5 + dx5) ≈ f(x1, ..., x5) dx1 ... dx5


Our goal is to find a parameter θ that will maximize our chance of observing X1, X2, X3, X4, and X5. To maximize our chance of observing X1, ..., X5 is to maximize the joint pdf f(x1, x2, x3, x4, x5). To maximize the joint pdf, we set its 1st derivative with respect to θ equal to zero:

d/dθ f(x1, x2, x3, x4, x5) = 0

Though we can solve the above equation by pure hard work, an easier approach is to find a parameter θ that will maximize the log-likelihood of observing X1, ..., X5:

ln f(x1, x2, x3, x4, x5)

If ln f(x1, ..., x5) is maximized, f(x1, ..., x5) will surely be maximized. So the task boils down to finding θ such that the 1st derivative of the log pdf is zero:

d/dθ ln f(x1, x2, x3, x4, x5) = 0

ln f(x1, ..., x5) = Σ (i=1 to 5) ln[1/√(2πxi)] − Σ (i=1 to 5) (xi−θ)²/(2xi)

Each term ln[1/√(2πxi)] is a constant and its derivative with respect to θ is zero. So

d/dθ ln f = −d/dθ Σ (xi−θ)²/(2xi) = Σ (xi−θ)/xi = Σ (1 − θ/xi) = 5 − θ Σ 1/xi = 0

θ = 5 / (Σ 1/xi) = 5 / (1/11 + 1/15.2 + 1/18 + 1/21 + 1/25.8) = 16.74
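A quick sanity check of θ = 5/(Σ 1/xi) with the five observed severities. This is a minimal sketch: only the θ-dependent part of the log-likelihood is needed, and the two comparisons confirm the closed form sits at the maximum.

```python
xs = [11.0, 15.2, 18.0, 21.0, 25.8]

# Closed-form MLE derived above: theta = n / sum(1/x_i)
theta_hat = len(xs) / sum(1 / x for x in xs)

def loglik(theta):
    # theta-dependent part of the log-likelihood: -sum (x_i - theta)^2 / (2 x_i)
    return -sum((x - theta) ** 2 / (2 * x) for x in xs)

# The closed form should beat nearby candidate values of theta
assert loglik(theta_hat) > loglik(theta_hat + 0.5)
assert loglik(theta_hat) > loglik(theta_hat - 0.5)
print(round(theta_hat, 2))  # 16.74
```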

After understanding the theoretical framework and detailed calculation, we are ready to use a shortcut. First, let's isolate the terms that contain the parameter θ:

f(x) = [1/√(2πx)] exp[ −(x−θ)²/(2x) ]

f(x1, x2, x3, x4, x5) ∝ exp[ −Σ (i=1 to 5) (xi−θ)²/(2xi) ]

ln f(x1, x2, x3, x4, x5) = constant − Σ (i=1 to 5) (xi−θ)²/(2xi)

Setting d/dθ ln f(x1, ..., x5) = 0:

Σ (i=1 to 5) (xi−θ)/xi = 0

θ = 5 / (Σ 1/xi) = 5 / (1/11 + 1/15.2 + 1/18 + 1/21 + 1/25.8) = 16.74

You have observed the following claim sizes:

521,  658,  702,  819,  1217

The claims follow a distribution with cdf F(x) = 1 − (500/x)^α, x > 500. Calculate the maximum likelihood estimate of α.

Solution

From the Exam C Table, you should be able to find the pdf:

f(x) = α 500^α / x^(α+1)

The likelihood is:

f(x1, x2, x3, x4, x5) = Π (i=1 to 5) α 500^α / xi^(α+1) = α^5 500^(5α) / (x1 x2 x3 x4 x5)^(α+1)

ln f(x1, ..., x5) = 5 ln α + 5α ln 500 − (α+1) ln(x1 x2 x3 x4 x5)

d/dα ln f(x1, ..., x5) = 5/α + 5 ln 500 − ln(x1 x2 x3 x4 x5) = 0

α = 5 / [ ln(x1 x2 x3 x4 x5) − 5 ln 500 ] = 2.453
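The closed-form α can be verified directly from the data. This sketch codes only the last step of the derivation above:

```python
import math

xs = [521, 658, 702, 819, 1217]
n, theta = len(xs), 500

# From d/d(alpha) ln L = n/alpha + n ln(theta) - sum ln(x_i) = 0
alpha_hat = n / (sum(math.log(x) for x in xs) - n * math.log(theta))
print(round(alpha_hat, 3))  # 2.453
```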

You are given the following information about a random sample:

The sample size equals five.

The sample is from a Weibull distribution with τ = 2.

Two of the sample observations are known to exceed 50, and the remaining three observations are 20, 30, and 45.

Calculate the maximum likelihood estimator of θ.

Solution

From the Exam C table, you'll find the Weibull pdf, cdf, and survival function. With τ = 2:

f(x) = (2x/θ²) exp[−(x/θ)²],  F(x) = 1 − exp[−(x/θ)²],  S(x) = exp[−(x/θ)²]

x1 > 50,  x2 > 50,  x3 = 20,  x4 = 30,  x5 = 45

The likelihood function is:

L(θ) = f(20) f(30) f(45) S(50) S(50)

= (2·20/θ²) exp[−(20/θ)²] · (2·30/θ²) exp[−(30/θ)²] · (2·45/θ²) exp[−(45/θ)²] · exp[−(50/θ)²] · exp[−(50/θ)²]

L(θ) ∝ θ⁻⁶ exp(−8,325/θ²),  where 8,325 = 20² + 30² + 45² + 50² + 50²

ln L(θ) = k − 6 ln θ − 8,325/θ²  (k is a constant)

d/dθ ln L(θ) = −6/θ + 2(8,325)/θ³ = 0,  θ² = 2(8,325)/6

θ = 52.7
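The censored-data likelihood above collapses to a one-line closed form; here is a minimal numeric check of it:

```python
import math

uncensored = [20, 30, 45]   # exact observations
censored = [50, 50]         # observations known only to exceed 50

# ln L(theta) = k - 6 ln(theta) - S/theta^2, with S = sum of all x^2
S = sum(x ** 2 for x in uncensored + censored)        # 8,325
theta_hat = math.sqrt(2 * S / (2 * len(uncensored)))  # solves theta^2 = 2S/6
print(round(theta_hat, 1))  # 52.7
```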

Fisher Information

One key theorem you need to memorize for Exam C is that the maximum likelihood estimator θ̂ is approximately normally distributed with mean θ0 and variance 1/I(θ):

θ̂ ~ N( θ0, 1/I(θ) )

Here θ0 is the true parameter. I(θ), called the Fisher information or simply the information, is the variance of d/dθ ln L(x,θ):

I(θ) = Var_X[ d/dθ ln L(x,θ) ] = E_X[ (d/dθ ln L(x,θ))² ] = −E_X[ d²/dθ² ln L(x,θ) ]

Please note that in the above equation, the expectation and variance are with respect to X.

You'll just need to memorize θ̂ ~ N( θ0, 1/I(θ) ). However, I'll show you why

I(θ) = Var_X[ d/dθ ln L(x,θ) ] = E_X[ (d/dθ ln L(x,θ))² ] = −E_X[ d²/dθ² ln L(x,θ) ]

First, let me introduce a new concept to you called the score. The term "score" is not in the syllabus. However, it's a building block for Fisher information, so let's take a look.

Assume we have observed x1, x2, ..., xn. Let L(x,θ) = Π (i=1 to n) f(xi,θ) represent the likelihood function. When calculating the maximum likelihood estimator θ̂, we often use the log-likelihood function. So let's consider the log-likelihood function ln L(x,θ). The derivative of the log-likelihood function, d/dθ ln L(x,θ), is called the score. Let's find the mean and variance of the score.

d/dθ ln L(x,θ) = [1/L(x,θ)] d/dθ L(x,θ)

Using the standard formula E_X[g(x)] = ∫ g(x) f(x) dx, and noting that L(x,θ) is both the density of the observed random variable and the function we differentiate, we have:

E_X[ d/dθ ln L(x,θ) ] = E_X[ (1/L(x,θ)) dL(x,θ)/dθ ]
= ∫ [1/L(x,θ)] [dL(x,θ)/dθ] L(x,θ) dx = ∫ dL(x,θ)/dθ dx = d/dθ ∫ L(x,θ) dx

However, ∫ L(x,θ) dx = 1 because L(x,θ) is a density. So

E_X[ d/dθ ln L(x,θ) ] = d/dθ 1 = 0

The score has zero mean. Equivalently,

E_X[ d/dθ ln L(x,θ) ] = ∫ [d/dθ ln L(x,θ)] L(x,θ) dx = 0

Next, let's find the variance of the score. Take d/dθ of both sides of ∫ [d/dθ ln L(x,θ)] L(x,θ) dx = 0:

d/dθ ∫ [d/dθ ln L(x,θ)] L(x,θ) dx = d/dθ 0 = 0

Moving d/dθ inside the integration, we have:

∫ d/dθ { [d/dθ ln L(x,θ)] L(x,θ) } dx = 0

Using the product rule d/dx [u(x) v(x)] = u(x) d/dx v(x) + v(x) d/dx u(x), we have:

d/dθ { [d/dθ ln L(x,θ)] L(x,θ) }
= L(x,θ) [d²/dθ² ln L(x,θ)] + [d/dθ ln L(x,θ)] [d/dθ L(x,θ)]

However, d/dθ ln L(x,θ) = [1/L(x,θ)] d/dθ L(x,θ), so

d/dθ L(x,θ) = L(x,θ) [d/dθ ln L(x,θ)]

So we have:

d/dθ { [d/dθ ln L(x,θ)] L(x,θ) }
= L(x,θ) [d²/dθ² ln L(x,θ)] + L(x,θ) [d/dθ ln L(x,θ)]²
= L(x,θ) { d²/dθ² ln L(x,θ) + [d/dθ ln L(x,θ)]² }

Then ∫ d/dθ { [d/dθ ln L(x,θ)] L(x,θ) } dx = 0 becomes:

∫ [d²/dθ² ln L(x,θ)] L(x,θ) dx + ∫ [d/dθ ln L(x,θ)]² L(x,θ) dx = 0

However,

∫ [d²/dθ² ln L(x,θ)] L(x,θ) dx = E[ d²/dθ² ln L(x,θ) ]
∫ [d/dθ ln L(x,θ)]² L(x,θ) dx = E[ (d/dθ ln L(x,θ))² ]

Then it follows that

E[ d²/dθ² ln L(x,θ) ] + E[ (d/dθ ln L(x,θ))² ] = 0

Since E[ d/dθ ln L(x,θ) ] = 0, it follows that

Var[ d/dθ ln L(x,θ) ] = E[ (d/dθ ln L(x,θ))² ] = −E[ d²/dθ² ln L(x,θ) ]

The score d/dθ ln L(x,θ) has zero mean and variance E[ (d/dθ ln L(x,θ))² ] = −E[ d²/dθ² ln L(x,θ) ].
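The two expressions for the information can be checked by simulation. This sketch uses a single exponential observation with mean θ, so ln f = −ln θ − x/θ and the score is (x−θ)/θ²; it confirms that the score has mean near zero and that Var(score) matches −E[d²/dθ² ln f] = 1/θ². The sample size, seed, and choice of θ = 5 are arbitrary.

```python
import random

random.seed(1)
theta = 5.0
xs = [random.expovariate(1 / theta) for _ in range(200000)]

# For one exponential observation: ln f = -ln(theta) - x/theta
score = [-1 / theta + x / theta ** 2 for x in xs]           # d/dtheta ln f
second = [1 / theta ** 2 - 2 * x / theta ** 3 for x in xs]  # d^2/dtheta^2 ln f

mean_score = sum(score) / len(score)
var_score = sum(s ** 2 for s in score) / len(score) - mean_score ** 2
neg_E_second = -sum(second) / len(second)

print(round(mean_score, 3))    # ≈ 0   (score has mean zero)
print(round(var_score, 4))     # ≈ 0.04 = 1/theta^2
print(round(neg_E_second, 4))  # ≈ 0.04
```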

The information associated with the maximum likelihood estimator θ̂ of a parameter θ is 4n, where n is the number of observations. Calculate the asymptotic variance of the maximum likelihood estimator of 2θ.

Solution

Var(2θ̂) = 4 Var(θ̂) = 4 × 1/(4n) = 1/n


Suppose the random variable X has density function f(x,θ). If g(x) is any unbiased estimator of θ, then

Var[g(x)] ≥ 1 / Var[ d/dθ ln f(x,θ) ]

Here is why. Since g(x) is unbiased,

∫ g(x) f(x,θ) dx = θ

Taking the derivative with respect to θ on both sides of the above equation:

d/dθ ∫ g(x) f(x,θ) dx = d/dθ θ = 1

Moving d/dθ inside the integration:

∫ g(x) [d/dθ f(x,θ)] dx = 1

However, d/dθ f(x,θ) = f(x,θ) [d/dθ ln f(x,θ)]. So we have

∫ g(x) [d/dθ ln f(x,θ)] f(x,θ) dx = 1

However, ∫ g(x) [d/dθ ln f(x,θ)] f(x,θ) dx = E_X[ g(x) d/dθ ln f(x,θ) ], so

E_X[ g(x) d/dθ ln f(x,θ) ] = 1

Next, consider Cov[ g(x), d/dθ ln f(x,θ) ]. Using Cov(X,Y) = E{ [X − E(X)] [Y − E(Y)] }:

Cov[ g(x), d/dθ ln f(x,θ) ] = E_X{ [g(x) − E g(x)] [d/dθ ln f(x,θ) − E d/dθ ln f(x,θ)] }

However, E_X[g(x)] = θ, and E_X[ d/dθ ln f(x,θ) ] = 0 because d/dθ ln f(x,θ) is the score. So

Cov[ g(x), d/dθ ln f(x,θ) ] = E_X{ [g(x) − θ] d/dθ ln f(x,θ) }
= E_X[ g(x) d/dθ ln f(x,θ) ] − θ E_X[ d/dθ ln f(x,θ) ]
= 1 − θ × 0 = 1

So Cov[ g(x), d/dθ ln f(x,θ) ] = 1.

Since the correlation coefficient satisfies |ρ(X,Y)| ≤ 1, we have:

[Cov(X,Y)]² = [ρ(X,Y) σX σY]² ≤ [σX σY]² = Var(X) Var(Y)

1 = Cov²[ g(x), d/dθ ln f(x,θ) ] ≤ Var[g(x)] Var[ d/dθ ln f(x,θ) ]

Var[g(x)] ≥ 1 / Var[ d/dθ ln f(x,θ) ]

For an unbiased estimator g(x), its variance is no less than the reciprocal of the variance of the score d/dθ ln f(x,θ).

If the observed data are x1, x2, ..., xn and g(x) is the maximum likelihood estimator, then the density function is:

f(x,θ) = f(x1,θ) f(x2,θ) ... f(xn,θ) = L(x,θ)

When d/dθ ln f(x,θ) meets certain conditions, Var[g(x)] = 1 / Var[ d/dθ ln f(x,θ) ]. We are not going to worry about what these conditions are. All we need to know is that for the maximum likelihood estimator g(x), when n, the sample size of the observed data X1, X2, ..., Xn, approaches infinity, the variance of g(x) approaches

Var(θ̂) ≈ 1 / Var[ d/dθ ln L(x,θ) ]

The same idea extends to several parameters (stated here without proof):

Assume that random variable X has density f(x; θ1, θ2, ..., θk). The entries of the information matrix are

I(i,j) = −E[ ∂² ln f(x; θ1, θ2, ..., θk) / ∂θi ∂θj ] = −E[ ∂² ln L(x; θ1, θ2, ..., θk) / ∂θi ∂θj ]

For two parameters:

I = | −E[∂²/∂θ1² ln L(x;θ1,θ2)]     −E[∂²/∂θ1∂θ2 ln L(x;θ1,θ2)] |
    | −E[∂²/∂θ1∂θ2 ln L(x;θ1,θ2)]   −E[∂²/∂θ2² ln L(x;θ1,θ2)]   |

Then

| Cov(θ̂1,θ̂1) = Var(θ̂1)   Cov(θ̂1,θ̂2)            |  =  I⁻¹
| Cov(θ̂2,θ̂1)              Cov(θ̂2,θ̂2) = Var(θ̂2) |

A sample of ten observations comes from a parametric family f(x, y; θ1, θ2) with log-likelihood function

ln L(θ1, θ2) = Σ (i=1 to 10) ln f(xi, yi; θ1, θ2) = −2.5 θ1² − 3 θ1 θ2 − θ2² + 5 θ1 + 2 θ2 + k

where k is a constant.

Determine the estimated covariance matrix of the maximum likelihood estimators (θ̂1, θ̂2).

Solution

−E[ ∂²/∂θ1² ln L(θ1,θ2) ] = −E[ ∂²/∂θ1² (−2.5θ1² − 3θ1θ2 − θ2² + 5θ1 + 2θ2 + k) ] = −E(−5) = 5

−E[ ∂²/∂θ2² ln L(θ1,θ2) ] = −E[ ∂²/∂θ2² (−2.5θ1² − 3θ1θ2 − θ2² + 5θ1 + 2θ2 + k) ] = −E(−2) = 2

−E[ ∂²/∂θ1∂θ2 ln L(θ1,θ2) ] = −E[ ∂²/∂θ1∂θ2 (−2.5θ1² − 3θ1θ2 − θ2² + 5θ1 + 2θ2 + k) ] = −E(−3) = 3

I = | 5  3 |
    | 3  2 |

To invert a 2×2 matrix, use:

| a  b |⁻¹  =  1/(ad − bc) |  d  −b |,   if ad − bc ≠ 0
| c  d |                   | −c   a |

| Var(θ̂1)       Cov(θ̂1,θ̂2) |  =  I⁻¹  =  1/(5×2 − 3×3) |  2  −3 |  =  |  2  −3 |
| Cov(θ̂1,θ̂2)   Var(θ̂2)     |                           | −3   5 |     | −3   5 |
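The 2×2 inversion above, coded directly from the cofactor formula (a trivial check of I⁻¹):

```python
# Information matrix I = [[5, 3], [3, 2]] from the second partials above
a, b, c, d = 5, 3, 3, 2
det = a * d - b * c                              # 10 - 9 = 1
inv = [[d / det, -b / det], [-c / det, a / det]]
print(inv)  # [[2.0, -3.0], [-3.0, 5.0]]
# So Var(theta1) = 2, Var(theta2) = 5, Cov(theta1, theta2) = -3
```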

Fisher Information matrix is good for estimating the variance and covariance of a series

of maximum likelihood estimators. What if we need to estimate the variance and

covariance of a function of a series of maximum likelihood estimators? We can use the

delta method.

Delta method

Assume that random variable X has mean μX and variance σX². Define a new function f(X). The first-order Taylor expansion about μX gives:

f(X) ≈ f(μX) + f′(μX)(X − μX)

Take the variance of both sides and notice that f(μX) and f′(μX) are constants:

Var[f(X)] ≈ Var[ f(μX) + f′(μX)(X − μX) ] = [f′(μX)]² Var(X − μX) = [f′(μX)]² Var(X)

For example, if f(X) = √X, then f′(μX) = (d/dx)√x evaluated at x = μX, which is 1/(2√μX), so

Var(√X) ≈ [1/(2√μX)]² Var(X) = Var(X)/(4μX)

To get a feel for this formula, set Y = f(X) = cX, where c is a constant. Then the delta formula becomes Var[cX] ≈ c² Var(X), which is exact.

We can rewrite the formula Var[f(X)] ≈ [f′(μX)]² Var(X) as

Var[f(X)] ≈ f′(μX) Var(X) f′(μX)
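To see the one-variable delta method in action, the sketch below compares the approximation Var(√X) ≈ Var(X)/(4μX) with a simulated variance. The normal distribution, its parameters, and the sample size are my own illustrative choices (a small coefficient of variation keeps the linear approximation accurate), not from the text.

```python
import random

random.seed(2)
mu, sigma = 100.0, 2.0
xs = [random.gauss(mu, sigma) for _ in range(100000)]

# Delta method: Var(sqrt(X)) ≈ [f'(mu)]^2 Var(X) with f'(mu) = 1/(2 sqrt(mu))
delta_var = (1 / (2 * mu ** 0.5)) ** 2 * sigma ** 2   # 0.01

ys = [x ** 0.5 for x in xs]
m = sum(ys) / len(ys)
sim_var = sum((y - m) ** 2 for y in ys) / len(ys)

print(round(delta_var, 4))  # 0.01
print(round(sim_var, 4))    # ≈ 0.01
```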

Now apply the delta method to a maximum likelihood estimator θ̂ of a parameter θ. Please note that θ̂ is a random variable. For example, if θ̂ is the maximum likelihood estimator, θ̂ varies depending on the sample size and on the sample data we have observed.

Set X = θ̂ and μX = E(θ̂):

Var[f(θ̂)] ≈ [ f′(E(θ̂)) ]² Var(θ̂)

However, we don't know E(θ̂). Assume that, based on the sample data on hand, the maximum likelihood estimate of the true parameter θ is a. Then we might want to set E(θ̂) ≈ a. Then we have:

Var[f(θ̂)] ≈ [ f′(a) ]² Var(θ̂)

Next, extend the delta method to two variables. Assume random variable X has mean μX and variance σX², and random variable Y has mean μY and variance σY². The first-order Taylor expansion gives:

f(X,Y) ≈ f(μX, μY) + f′X(μX, μY)(X − μX) + f′Y(μX, μY)(Y − μY)

Take the variance of both sides and notice that f(μX, μY), f′X(μX, μY), and f′Y(μX, μY) are all constants:

Var[f(X,Y)] ≈ [f′X(μX,μY)]² Var(X) + [f′Y(μX,μY)]² Var(Y) + 2 f′X(μX,μY) f′Y(μX,μY) Cov(X,Y)

In matrix form:

Var[f(X,Y)] ≈ [ f′X(μX,μY)  f′Y(μX,μY) ] | Var(X)     Cov(X,Y) | | f′X(μX,μY) |
                                         | Cov(X,Y)   Var(Y)   | | f′Y(μX,μY) |

This formula lets us estimate the variance of a function of maximum likelihood estimators. As a simple case, say we have two maximum likelihood estimators θ̂1 and θ̂2. Setting μX = E(θ̂1) and μY = E(θ̂2), we have:

Var[f(θ̂1,θ̂2)] ≈ [f′θ1(Eθ̂1, Eθ̂2)]² Var(θ̂1) + [f′θ2(Eθ̂1, Eθ̂2)]² Var(θ̂2)
+ 2 f′θ1(Eθ̂1, Eθ̂2) f′θ2(Eθ̂1, Eθ̂2) Cov(θ̂1, θ̂2)

If E(θ̂1) = θ1 and E(θ̂2) = θ2, then

Var[f(θ̂1,θ̂2)] ≈ [f′θ1(θ1,θ2)]² Var(θ̂1) + [f′θ2(θ1,θ2)]² Var(θ̂2) + 2 f′θ1(θ1,θ2) f′θ2(θ1,θ2) Cov(θ̂1,θ̂2)

However, we don't know the true values of θ1 and θ2. Nor do we know f′θ1(θ1,θ2) and f′θ2(θ1,θ2). Assume that, based on the sample data on hand, the maximum likelihood estimates of the true parameters θ1 and θ2 are a and b respectively. Then we might want to set

f′θ1(θ1,θ2) ≈ ∂f(θ1,θ2)/∂θ1 evaluated at θ1 = a, θ2 = b
f′θ2(θ1,θ2) ≈ ∂f(θ1,θ2)/∂θ2 evaluated at θ1 = a, θ2 = b

Then we have:

Var[f(θ̂1,θ̂2)] ≈ [∂f/∂θ1]² Var(θ̂1) + [∂f/∂θ2]² Var(θ̂2) + 2 [∂f/∂θ1] [∂f/∂θ2] Cov(θ̂1,θ̂2)

where each partial derivative is evaluated at θ1 = a and θ2 = b, the maximum likelihood estimates.

Expressing the above formula in a matrix:

Var[f(θ̂1,θ̂2)] ≈ [ ∂f/∂θ1  ∂f/∂θ2 ] | Var(θ̂1)       Cov(θ̂1,θ̂2) | | ∂f/∂θ1 |
                                     | Cov(θ̂1,θ̂2)   Var(θ̂2)     | | ∂f/∂θ2 |

You model a loss function using a lognormal distribution with parameters μ and σ. You are given:

The maximum likelihood estimates of μ and σ are μ̂ = 4.215 and σ̂ = 1.093.

The estimated covariance matrix of (μ̂, σ̂) is

| 0.1195   0      |
| 0        0.0597 |

The mean of the lognormal distribution is exp(μ + σ²/2).

Estimate the variance of the maximum likelihood estimate of the mean of the lognormal distribution, using the delta method.


Solution

The mean function is f(μ,σ) = exp(μ + σ²/2). The maximum likelihood estimator of f(μ,σ) is f(μ̂,σ̂) = exp(μ̂ + σ̂²/2), where μ̂ and σ̂ are the maximum likelihood estimators of μ and σ respectively. We need

Var[f(μ̂,σ̂)] = Var[ exp(μ̂ + σ̂²/2) ]

Using the Taylor expansion:

f(μ̂,σ̂) ≈ f(μ,σ) + [∂f(μ,σ)/∂μ](μ̂ − μ) + [∂f(μ,σ)/∂σ](σ̂ − σ)

Var[f(μ̂,σ̂)] ≈ [∂f/∂μ]² Var(μ̂) + [∂f/∂σ]² Var(σ̂) + 2 [∂f/∂μ][∂f/∂σ] Cov(μ̂,σ̂)

We don't know the true μ and σ, so we evaluate the partial derivatives at the estimates μ̂ = 4.215 and σ̂ = 1.093:

∂f(μ,σ)/∂μ = exp(μ + σ²/2)
∂f(μ,σ)/∂σ = σ exp(μ + σ²/2)

∂f/∂μ ≈ exp(4.215 + 1.093²/2) = 123.02
∂f/∂σ ≈ 1.093 exp(4.215 + 1.093²/2) = 134.46

From the given covariance matrix, Var(μ̂) = 0.1195, Var(σ̂) = 0.0597, and Cov(μ̂,σ̂) = 0. So

Var[f(μ̂,σ̂)] ≈ 123.02²(0.1195) + 134.46²(0.0597) + 0 ≈ 2,887.7

Please note that you can also solve this problem directly with the black-box formula

Var[f(θ̂1,θ̂2)] ≈ [∂f/∂θ1]² Var(θ̂1) + [∂f/∂θ2]² Var(θ̂2) + 2 [∂f/∂θ1] [∂f/∂θ2] Cov(θ̂1,θ̂2)

with the partial derivatives evaluated at the maximum likelihood estimates. However, I recommend that you first solve the problem using the Taylor series approximation. This forces you to understand the logic behind the messy formula. Once you understand the formula, next time you can use the memorized formula for Var[f(θ̂1,θ̂2)].
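The arithmetic of this example in code, using the given estimates. The general three-term formula reduces to two terms here because Cov(μ̂, σ̂) = 0:

```python
import math

mu_hat, sigma_hat = 4.215, 1.093
var_mu, var_sigma, cov = 0.1195, 0.0597, 0.0

mean_hat = math.exp(mu_hat + sigma_hat ** 2 / 2)
d_mu = mean_hat                  # partial of exp(mu + sigma^2/2) wrt mu
d_sigma = sigma_hat * mean_hat   # partial wrt sigma

var_mean = d_mu ** 2 * var_mu + d_sigma ** 2 * var_sigma + 2 * d_mu * d_sigma * cov
print(round(d_mu, 2), round(d_sigma, 2))  # 123.02 134.46
print(round(var_mean, 1))                 # ≈ 2887.7
```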

The time to an accident follows an exponential distribution. A random sample of size two has a mean time of 6. Let Y represent the mean of a new sample of size two.

Use the delta method to approximate the variance of the maximum likelihood estimator of F_Y(10).

Solution

The time to an accident follows an exponential distribution. Assume θ is the mean of this exponential distribution. If X1 and X2 are two random samples of the time to accident, then the maximum likelihood estimator of θ is just the sample mean. So θ̂ = 6.

Since Y = (X1 + X2)/2,

F_Y(10) = Pr(Y ≤ 10) = Pr[ (X1 + X2)/2 ≤ 10 ] = Pr(X1 + X2 ≤ 20)

The sum of two independent exponential random variables with mean θ has density t e^(−t/θ)/θ². With θ̂ = 6:

Pr(X1 + X2 > 20) = ∫ (20 to ∞) [t e^(−t/6)/36] dt

To calculate this integral, you'll want to memorize the following shortcut:

∫ (a to ∞) (x/θ) e^(−x/θ) dx = (a + θ) e^(−a/θ)
∫ (a to ∞) (x²/θ) e^(−x/θ) dx = [ (a + θ)² + θ² ] e^(−a/θ)

If interested, you can download the proof of this shortcut from my website http://www.guo.coursehost.com. The shortcut and the proof are in the sample chapter of my P manual. Just download the sample chapter of the P manual and you'll get the proof and more worked-out examples using this shortcut.

∫ (20 to ∞) [t e^(−t/6)/36] dt = (1/6) ∫ (20 to ∞) (t/6) e^(−t/6) dt = (1/6)(20 + 6) e^(−20/6) = 0.1546

In general,

F_Y(10) = Pr(X1 + X2 ≤ 20) = ∫ (0 to 20) [t e^(−t/θ)/θ²] dt

By the delta method,

Var[F̂_Y(10)] ≈ [ d/dθ F_Y(10) ]² Var(θ̂), with the derivative evaluated at E(θ̂) ≈ 6.

First, find Var(θ̂):

Var(θ̂) = Var[ (X1 + X2)/2 ] = (1/4)(2) Var(X) = θ²/2 ≈ 6²/2 = 18

Please note that the two samples X1 and X2 are independent identically distributed with common variance θ².

Next, the derivative. Using the shortcut,

F_Y(10) = ∫ (0 to 20) [t e^(−t/θ)/θ²] dt = 1 − (1 + 20/θ) e^(−20/θ)

d/dθ F_Y(10) = −(400/θ³) e^(−20/θ)

Evaluated at θ = 6: −(400/6³) e^(−20/6) = −0.066

So

Var[F̂_Y(10)] ≈ (0.066)²(18) ≈ 0.0784
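The final numbers of this example in code. The derivative comes from the closed-form gamma cdf F_Y(10) = 1 − (1 + 20/θ)e^(−20/θ); no intermediate rounding is applied here, so the result is a touch above the hand calculation that rounds the derivative to 0.066 first:

```python
import math

theta_hat = 6.0                 # MLE = sample mean
var_theta = theta_hat ** 2 / 2  # Var(theta_hat) = theta^2 / 2 = 18

# d/d(theta) F_Y(10) = -(400 / theta^3) exp(-20/theta)
dF = -(400 / theta_hat ** 3) * math.exp(-20 / theta_hat)

var_F = dF ** 2 * var_theta
print(round(abs(dF), 3))  # 0.066
print(round(var_F, 3))    # ≈ 0.079
```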

Chapter 3  Kernel smoothing

Kernel smoothing
= Set your point estimate equal to the average of a neighborhood
= Recalculate at every point by averaging this point and the nearby points

Let me illustrate this with a story. You want to buy a house. After looking at many houses, you find one house you like most. You go to the current owner of the house and ask for the price. The current owner tells you, "I'm asking for $210,000. Make me an offer."

What are you going to offer? $200,000? $203,000? $205,000? Or something else? You are not sure. And you know the danger: if your offer is too high, the seller accepts your offer and you'll overpay for the house; if your offer is too low, you'll look stupid and the seller may refuse to deal with you anymore. So in your best interest, you'll want to make your offer reasonable: not too high, not too low.

If you talk to someone experienced in the real estate market, he'll tell you how (and this works): instead of making a random offer, you can set your offering price to be around the average selling price of similar houses sold in the same neighborhood.

Say four similar houses in the same neighborhood were sold this year. Their prices are $198,000, $200,000, $201,000, and $202,000. So the average selling price is $200,250. If the house you want to buy is truly similar to these four houses, then the seller is asking for too much. You can offer around $200,250 and explain to the seller that your offer is in line with the selling prices of similar houses in the same neighborhood. A reasonable seller will be willing to lower his asking price.

What advantage do we gain by looking at a neighborhood? A smoothed, better estimate. If we focus on one house alone, its selling price appears random. However, when we broaden our view and look at many similar houses nearby, we'll remove the randomness of the asking price and see a more reasonable price.

This simple story illustrates the spirit of kernel smoothing. Say we want to estimate f_X(x), the probability density of a random variable X at point x. Instead of looking at one point x and setting

f̂(x) = p(x) = (# of x's in the sample) / (sample size n)

we may want to look at the x's in a neighborhood. For example, we may want to look at the 3 data points x−b, x, and x+b, where b is a constant. Then we calculate the average of the empirical densities at x−b, x, and x+b and use it as an estimator of f_X(x):

f̂(x) = (1/3) p(x−b) + (1/3) p(x) + (1/3) p(x+b)

That is, we calculate f̂(x) by averaging the empirical densities of the neighborhood x−b, x, x+b.

Please note the analogy of determining the house price is not perfect. There's one small difference between how we estimate the price of a house located at x and how we estimate f_X(x). When we estimate the fair price of a house located at x, we exclude the data point x because we don't know the value of the house located at x:

Value of a house located at x
= 0.5 × value of the houses located at x−b + 0.5 × value of the houses located at x+b

In contrast, when we estimate the density at x, we include the empirical density p(x) in our estimate:

f̂(x) = (1/3) p(x−b) + (1/3) p(x) + (1/3) p(x+b)

We include p(x) in our f̂(x) calculation because p(x) by itself is an estimate of f_X(x). The first time around, we use the empirical density p(x) = (# of x's in the sample)/(sample size n) to estimate f_X(x). The second time around, we refine our estimate f̂(x) by averaging the empirical densities at x and its nearby points x−b and x+b. This is why kernel smoothing recalculates at every point by averaging this point and its nearby points.

Of course, we can expand our neighborhood. Instead of looking at only two nearby points, we may look at 4 nearby points and calculate the average empirical density of a 5-point neighborhood:

f̂(x) = (1/5) p(x−2b) + (1/5) p(x−b) + (1/5) p(x) + (1/5) p(x+b) + (1/5) p(x+2b)

That is, we calculate f̂(x) by averaging the empirical densities of the neighborhood x−2b, x−b, x, x+b, x+2b.

In addition, we don't need to use equal weighting. We can assign more weight to the data points near x. For example, we can set

f̂(x) = (1/10) p(x−2b) + (2/10) p(x−b) + (4/10) p(x) + (2/10) p(x+b) + (1/10) p(x+2b)

Now you understand the essence of kernel smoothing. Let's talk about the two major issues to think about if you want to use kernel smoothing:

How big is the neighborhood? This is called the bandwidth. The bigger the neighborhood, the greater the smoothing. However, if your neighborhood is too big, you run the risk of over-smoothing and blurring away real patterns; if it is too small, you may find false patterns in the noise.

How much weight do you give to each data point in the neighborhood? For example, you can assign equal weight to each data point in the neighborhood. You can also give more weight to the data points closer to the point whose density you want to estimate. There are many weighting methods out there for you to use. The weighting method is called the kernel.

Of these two factors, the bandwidth is typically more important than the weighting method. Your final result may not change much if you use a different weighting method. However, if you change the bandwidth, your estimated density may change widely. There's some literature out there explaining in more detail how to choose a proper bandwidth and a proper weighting method. However, for the purpose of passing Exam C, you don't need to know that much.

3 kernels you need to know

Loss Models explains three kernels. You'll need to understand them.

Uniform kernel. This is one of the easiest weighting methods. If you use this method to estimate density, you'll assign equal weight to each data point in the neighborhood.

Triangular kernel. Under this weighting method, you give more weight to the data points that are closer to the point for which you are estimating density.

Gamma kernel. This is more complex but less important than the uniform kernel and the triangular kernel. If you want to cut some corners, you can skip the gamma kernel.

Now let's look at the math formulas. Let's focus on the uniform kernel first.

Uniform kernel

The uniform kernel for estimating the density function:

k_y(x) = 0        if x < y − b
       = 1/(2b)   if y − b ≤ x ≤ y + b
       = 0        if x > y + b

Let's look at the symbol k_y(x). Here x is your target data point (the location of the house you want to buy) for which you want to estimate the density (the fair price of the house you want to buy). y is a data point in the neighborhood (the location of a similar house in the neighborhood). k_y(x) is y's weight for estimating the density function at x.

The uniform kernel estimator of the density function at x:

f̂(x) = Σ over all yi of p(yi) k_yi(x)

where p(yi) is the empirical density of yi and k_yi(x) is yi's weight. The density estimate at x is a weighted average of the empirical densities of the nearby points yi.

Similarly, the uniform kernel for estimating the distribution function:

K_y(x) = 0                 if x < y − b
       = (x − y + b)/(2b)  if y − b ≤ x ≤ y + b
       = 1                 if x > y + b

F̂(x) = Σ over all yi of p(yi) K_yi(x)

where K_yi(x) is yi's weight toward the distribution function at x.

Now let's look at the formula for k_y(x). The formula looks intimidating. The good news is that you really don't need to memorize it. You just need to understand the essence of the uniform weighting method. Once you understand the essence, you can derive the formula effortlessly on the spot.

Let's rewrite the uniform kernel formula as:

k_y(x) = 0        if |y − x| > b
       = 1/(2b)   if |y − x| ≤ b

[Diagram: a number line with the neighborhood from A = x−b to B = x+b. Data points y1 and y2 lie outside the interval; y3 and y4 lie inside.]

Here your neighborhood is [x−b, x+b]. b is called the bandwidth, which is half of the width of the neighborhood you have chosen. Now the formula for k_y(x) becomes:

k_y(x) = 0        if y is OUT of the neighborhood [x−b, x+b]
       = 1/(2b)   if y is in the neighborhood [x−b, x+b]

If the data point y is out of the neighborhood [x−b, x+b], its weight is zero. We throw this data point away and do not use it in our estimation. And this should make intuitive sense. In the neighborhood diagram, data points y1 and y2 are discarded.

If the data point y is in the neighborhood [x−b, x+b], we'll use this data point in our estimation and assign it a weight of 1/(2b). In the neighborhood diagram, data points y3 and y4 are used in the estimation and each gets a weight of 1/(2b).

This is how we get 1/(2b). Picture a rectangle ABCD sitting on the neighborhood [x−b, x+b]. The area ABCD represents the total weight we can possibly assign to all the data points in the neighborhood, so we want the total area ABCD to equal one:

Area ABCD = AB × BC = (2b) × BC = 1, so BC = 1/(2b).

So for each data point that falls in the neighborhood AB, its weight is BC = 1/(2b). For each data point that falls out of the neighborhood AB, its weight is zero.

Now you shouldn't have trouble memorizing the uniform kernel formula for k_y(x).

Next, let's look at the formula for K_y(x), the weighting factor for the distribution function at x:

K_y(x) = 0                 if x < y − b
       = (x − y + b)/(2b)  if y − b ≤ x ≤ y + b
       = 1                 if x > y + b

It's quite complex to derive K_y(x). So let's not worry about how to derive the formula; let's just find an easy way to memorize it. Once again, draw a neighborhood diagram:

[Diagram: rectangle ABCD sitting on the neighborhood [x−b, x+b], with a vertical line EF drawn at the data point y.]

To find how much weight to give to the data point y toward calculating F̂(x), draw a vertical line at the data point y (Line EF). Next, imagine that you use a pair of scissors to cut off what's to the left of Line EF while keeping what's to the right of Line EF. Then calculate the area of the neighborhood rectangle ABCD that remains after the cut. This remaining area that survives the cut is K_y(x). Let's walk through this rule.

Situation One: x − b ≤ y ≤ x + b

[Diagram: the cut at y leaves the sub-rectangle EFBC, stretching from y to x+b.]

After the cut, the original neighborhood rectangle ABCD shrinks to the rectangle EFBC. The surviving area is:

K_y(x) = EFBC = EF × EC = [1/(2b)] (x + b − y) = (x − y + b)/(2b)

Situation Two: the whole neighborhood [x−b, x+b] lies to the right of the data point y (that is, y < x − b).

[Diagram: the vertical line EF at y lies entirely to the left of the rectangle ABCD.]

The original neighborhood rectangle ABCD completely survives the cut. So we set K_y(x) = ABCD = 1.

Situation Three: the whole neighborhood [x−b, x+b] lies to the left of the data point y (that is, y > x + b).

[Diagram: the vertical line EF at y lies entirely to the right of the rectangle ABCD.]

The original neighborhood rectangle ABCD is completely cut off. So we set K_y(x) = 0.

Now you see that you really don't need to memorize the ugly K_y(x) formula. Just draw a neighborhood diagram, place the cut at y, and cut off the left side of the diagram. Then calculate the surviving area of the neighborhood rectangle. The surviving area is K_y(x).
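The uniform-kernel estimators f̂(x) = Σ p(yi) k_yi(x) and F̂(x) = Σ p(yi) K_yi(x) can be coded directly from the piecewise definitions. The sample data and bandwidth below are made up for illustration:

```python
def k_uniform(x, y, b):
    """Uniform-kernel weight of data point y toward the density at x."""
    return 1 / (2 * b) if abs(x - y) <= b else 0.0

def K_uniform(x, y, b):
    """Uniform-kernel weight of data point y toward the cdf at x."""
    if x < y - b:
        return 0.0
    if x > y + b:
        return 1.0
    return (x - y + b) / (2 * b)

data = [25, 30, 35, 35, 37, 39, 45, 47, 49, 55]  # hypothetical sample
b, n = 5, len(data)

def f_hat(x):  # each point has empirical weight 1/n
    return sum(k_uniform(x, y, b) for y in data) / n

def F_hat(x):
    return sum(K_uniform(x, y, b) for y in data) / n

print(round(f_hat(36), 3))  # 0.04: four points fall in [31, 41]
print(round(F_hat(36), 3))  # 0.38
```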

Triangular kernel

In the uniform kernel, every data point in the neighborhood gets an identical weight of 1/(2b). Say we have two data points in the neighborhood, y3 and y4, where y4 is closer to x and y3 is farther away from x (see the diagram below).

[Diagram: number line showing x−b, y3, x, y4, x+b.]

However, oftentimes it makes sense for us to give y4 more weight than y3. For example, x is the location of the house you want to buy; y3 and y4 are the locations of two similar houses in your neighborhood. It makes intuitive sense for us to give more weight to the house located at y4 than to the one located at y3. If the house located at y3 was sold at $200,000 and the house located at y4 was once sold at $198,000, we might want to assign 40% weight to the house located at y3 and 60% to the one located at y4. Then the estimated fair price of the house located at x is:

60% × price of the house located at y4 + 40% × price of the house located at y3
= 60% × 198,000 + 40% × 200,000 = $198,800

This is the idea behind the triangular kernel: assign more weight to a data point closer to the point for which we need to estimate the density, and less weight to a data point farther away from it.

Let's make sense of the triangular kernel formulas for k_y(x) and K_y(x). First, let's look at k_y(x):

k_y(x) = 0               if x < y − b
       = (b + x − y)/b²  if y − b ≤ x ≤ y
       = (b + y − x)/b²  if y ≤ x ≤ y + b
       = 0               if x > y + b

We can rewrite this formula in terms of where y sits relative to x. Note that y − b ≤ x ≤ y is equivalent to y ∈ [x, x+b], and y ≤ x ≤ y + b is equivalent to y ∈ [x−b, x]. So:

k_y(x) = 0               if |x − y| > b
       = (b + x − y)/b²  if y is in the right-half neighborhood, that is y ∈ [x, x+b]
       = (b + y − x)/b²  if y is in the left-half neighborhood, that is y ∈ [x−b, x]

[Diagram: triangle ABD over the neighborhood [x−b, x+b], with apex D above the center C = x and height CD. Vertical lines EF and GH mark data points y2 ∈ [x−b, x] and y3 ∈ [x, x+b]; y1 and y4 lie outside the neighborhood.]

y1 and y4 are out of the neighborhood and have zero weight.

Now let's find k_y when the data point y is in the neighborhood [x−b, x+b]. Data points y2 and y3 are in the neighborhood and their weights are equal to the heights EF and GH respectively.

Before calculating EF and GH, let me give you a preliminary high school math formula. This formula is used over and over in triangular kernel smoothing. In a right triangle ABC (right angle at B), draw a vertical line DE at a point E on the base BC, with D on the hypotenuse AC. By similar triangles:

DE/AB = EC/BC, so DE = AB × (EC/BC)

Area(DEC)/Area(ABC) = [(1/2) DE × EC] / [(1/2) AB × BC] = (EC/BC)²

so Area(DEC) = Area(ABC) × (EC/BC)²

If you don't understand why DE/AB = EC/BC and Area(DEC)/Area(ABC) = (EC/BC)², please review high school geometry.

Now let's come back to the following diagram and calculate EF and GH. EF is the weight assigned to the data point y2. GH is the weight assigned to the data point y3.

[Diagram: triangle ABD over [x−b, x+b], with apex D above the center C = x; vertical lines EF at y2 and GH at y3.]

First, please note that the area of the triangle ABD represents the total weight assigned to all the data points in the neighborhood [A, B]. So the area of the triangle ABD should be one:

Area ABD = 0.5 × AB × CD = 1. However, AB = 2b. So 0.5 × 2b × CD = 1, and CD = 1/b.

By similar triangles:

EF/CD = AE/AC, so EF = (AE/AC) CD = [ (y2 − (x − b)) / b ] (1/b) = (b + y2 − x)/b², if y2 ∈ [x−b, x]

GH/CD = BG/BC, so GH = (BG/BC) CD = [ (x + b − y3) / b ] (1/b) = (b + x − y3)/b², if y3 ∈ [x, x+b]


So we have:

    k_y(x) = 0                    if y is OUT of the neighborhood [x − b, x + b]
    k_y(x) = (b + x − y)/b²       if y ∈ [x, x + b]
    k_y(x) = (b + y − x)/b²       if y ∈ [x − b, x]

The kernel distribution function K_y(x), written as a function of x, is:

    K_y(x) = 0                         if x < y − b
    K_y(x) = (b + x − y)²/(2b²)        if y − b ≤ x ≤ y
    K_y(x) = 1 − (b + y − x)²/(2b²)    if y ≤ x ≤ y + b
    K_y(x) = 1                         if x > y + b

Written as a function of the data point y (note that y − b ≤ x ≤ y is equivalent to y ∈ [x, x + b], and y ≤ x ≤ y + b is equivalent to y ∈ [x − b, x]):

    K_y(x) = 1                         if y ∈ (−∞, x − b)
    K_y(x) = 1 − (b + y − x)²/(2b²)    if y ∈ [x − b, x]
    K_y(x) = (b + x − y)²/(2b²)        if y ∈ [x, x + b]
    K_y(x) = 0                         if y ∈ (x + b, +∞)

Situation One: y ∈ [x, x + b]

Draw a vertical line at the data point y (line GH). Next, imagine that you use a pair of scissors and cut off what's to the left of line GH while keeping what's to the right of line GH. Next, calculate the area of the triangle ABD remaining after the cut. This remaining area after the cut is K_y(x).

[Diagram: before the cut, the full triangle ABD; after the cut, only the corner triangle BGH remains, with G at y and B at x + b.]

    K_y(x) = Area(BGH) = (BG/BC)² × Area(BDC) = ½ × [ (x + b − y)/b ]² = (x + b − y)²/(2b²)

Situation Two: y ∈ [x − b, x]

[Diagram: the triangle ABD with a vertical line EF drawn at the data point y, with E at y on the base and F on side AD.]

Draw a vertical line at the data point y (line EF). Cut off what's to the left of EF. After the cut:

[Diagram: after the cut, the trapezoid BDFE remains; the small triangle AEF has been cut off.]

    K_y(x) = Area(BDFE) = 1 − Area(AEF) = 1 − (AE/AC)² × Area(ACD)
           = 1 − ½ × [ (y − (x − b))/b ]² = 1 − (b + y − x)²/(2b²)

Situation Three: y ∈ (−∞, x − b)

[Diagram: the triangle ABD with a vertical line MN drawn at the data point y, which lies to the left of A.]

Draw a vertical line MN at the data point y. Cut off what's to the left of line MN. Now the whole area ABD will survive the cut. So K_y(x) = 1.

Situation Four: y ∈ (x + b, +∞)

[Diagram: the triangle ABD with a vertical line RS drawn at the data point y, which lies to the right of B.]

Draw a vertical line RS at the data point y. Cut off what's to the left of line RS. Now the whole area ABD will be cut off. So K_y(x) = 0.

Now you see that you really don't need to memorize the complex formulas for K_y(x). Just draw a diagram and directly calculate K_y(x).
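The four situations above can be turned into a short function. The sketch below is my own (not from the manual); the function names are mine, and it implements k_y(x) and K_y(x) exactly as derived above:

```python
def triangular_k(y, x, b):
    """Density weight the data point y puts on the estimation point x."""
    if abs(x - y) > b:
        return 0.0                           # y outside the neighborhood [x - b, x + b]
    if y >= x:                               # y in the right half [x, x + b]
        return (b + x - y) / b ** 2
    return (b + y - x) / b ** 2              # y in the left half [x - b, x]

def triangular_K(y, x, b):
    """Distribution weight K_y(x): the triangle area left after the cut at y."""
    if y <= x - b:                           # Situation Three: whole triangle survives
        return 1.0
    if y >= x + b:                           # Situation Four: whole triangle is cut off
        return 0.0
    if y >= x:                               # Situation One: corner triangle BGH remains
        return (b + x - y) ** 2 / (2 * b ** 2)
    return 1 - (b + y - x) ** 2 / (2 * b ** 2)   # Situation Two: trapezoid BDFE remains

# With b = 2 and x = 6:
print(triangular_K(5, 6, 2))   # 0.875  (= 7/8)
print(triangular_K(7, 6, 2))   # 0.125  (= 1/8)
```

The function mirrors the scissors argument one-to-one: each branch is one of the four cut positions.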

Finally, let's look at the gamma kernel.

Gamma kernel

    k_y(x) = x^(α−1) e^(−xα/y) / [ (y/α)^α Γ(α) ],   where x > 0

To understand the gamma kernel, you'll need to know this: in kernel smoothing, all the weights should add up to one. Because of this, for convenience, we can use a density function as weights. This way, the weights automatically add up to one.

The gamma pdf with parameters α and θ is

    f(x) = x^(α−1) e^(−x/θ) / [ θ^α Γ(α) ]

The mean of this gamma distribution is αθ. For the kernel, we set θ = y/α so that the mean is the data point y. This gives

    k_y(x) = x^(α−1) e^(−x/θ) / [ θ^α Γ(α) ] = x^(α−1) e^(−xα/y) / [ (y/α)^α Γ(α) ]

The simplest gamma pdf is when α = 1 (i.e. the exponential pdf). So the simplest gamma kernel is an exponential kernel:

    k_y(x) = (1/y) e^(−x/y),   where x > 0

If you need to find the exponential kernel for F(x), then

    K_y(x) = ∫₀ˣ k_y(t) dt = 1 − e^(−x/y)

Problem 1

A random sample of size 12 gives us the following data:

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Using the uniform kernel with bandwidth 2, calculate f(6) and F(6).

Solution

With the uniform kernel with bandwidth 2, we discard any data points that are outside the neighborhood [4, 8]. So 1, 2, 3, 3, 9, 9, 11, 12 are discarded. We only consider 5, 6, 7, 8. Each of these four data points has a weight of 1/(2b) = 1/4.

So

    f(6) = Σ p(y) k_y(6) = (1/12)(1/4) + (1/12)(1/4) + (1/12)(1/4) + (1/12)(1/4) = 1/12

In the calculation of F(6), any data point that falls below the lower bound, or touches the lower bound, of the neighborhood [4, 8] gets a full weight of 1. Data points 1, 2, 3, 3 are below the lower bound of the neighborhood [4, 8], so they each get a weight of 1. Any data point that rises above the upper bound, or touches the upper bound, of the neighborhood [4, 8] gets zero weight. So 8 (touching the upper bound) and 9, 9, 11, 12 (staying above the upper bound) each get zero weight.

Data points y = 5, 6, 7 are inside the neighborhood [4, 8]. If you draw a diagram, you'll find that the weights for y = 5, 6, 7 are:

    K₅(6) = 3/4,   K₆(6) = 2/4,   K₇(6) = 1/4

    F(6) = Σ p(y) K_y(6)
         = (1/12)(1) + (1/12)(1) + (1/12)(1) + (1/12)(1) + (1/12)(3/4) + (1/12)(2/4) + (1/12)(1/4)
         ≈ 0.4583

    y:        1     2     3     3     5     6     7     8     9     9     11    12
    p(y):     1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12
    k_y(6):   0     0     0     0     1/4   1/4   1/4   1/4   0     0     0     0
    K_y(6):   1     1     1     1     3/4   2/4   1/4   0     0     0     0     0

Problem 2

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Using the triangular kernel with bandwidth 2, calculate f(6) and F(6).

Solution

    y:        1     2     3     3     5     6     7     8     9     9     11    12
    p(y):     1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12
    k_y(6):   0     0     0     0     1/4   1/2   1/4   0     0     0     0     0
    K_y(6):   1     1     1     1     7/8   1/2   1/8   0     0     0     0     0

    f(6) = Σ p(y) k_y(6) = (1/12)(1/4) + (1/12)(1/2) + (1/12)(1/4) = 1/12

    F(6) = Σ p(y) K_y(6)
         = (1/12)(1) + (1/12)(1) + (1/12)(1) + (1/12)(1) + (1/12)(7/8) + (1/12)(1/2) + (1/12)(1/8)
         = 5.5/12 ≈ 0.4583
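The triangular-kernel estimates of Problem 2 can be recomputed in a few lines. Again this is my own check, not part of the manual:

```python
def tri_k(y, x, b):
    # Density weight of data point y at x (triangular kernel).
    return max(0.0, (b - abs(x - y)) / b ** 2)

def tri_K(y, x, b):
    # Distribution weight of data point y at x (triangular kernel).
    if y <= x - b:
        return 1.0
    if y >= x + b:
        return 0.0
    if y >= x:
        return (b + x - y) ** 2 / (2 * b ** 2)
    return 1 - (b + y - x) ** 2 / (2 * b ** 2)

data = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]
f6 = sum(tri_k(y, 6, 2) for y in data) / len(data)
F6 = sum(tri_K(y, 6, 2) for y in data) / len(data)
print(round(f6, 4), round(F6, 4))   # 0.0833 0.4583
```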

Problem 3

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Using the gamma kernel with α = 1, calculate f(6) and F(6).

Solution

With α = 1, the gamma kernel becomes the exponential kernel:

    k_y(x) = (1/y) e^(−x/y),   K_y(x) = ∫₀ˣ k_y(t) dt = 1 − e^(−x/y)

    f(6) = Σ p(y) k_y(6)
         = (1/12)[ (1/1)e^(−6/1) + (1/2)e^(−6/2) + (1/3)e^(−6/3) + (1/3)e^(−6/3)
                 + (1/5)e^(−6/5) + (1/6)e^(−6/6) + (1/7)e^(−6/7) + (1/8)e^(−6/8)
                 + (1/9)e^(−6/9) + (1/9)e^(−6/9) + (1/11)e^(−6/11) + (1/12)e^(−6/12) ]
         ≈ 0.048

    F(6) = Σ p(y) K_y(6)
         = (1/12)[ (1 − e^(−6/1)) + (1 − e^(−6/2)) + (1 − e^(−6/3)) + (1 − e^(−6/3))
                 + (1 − e^(−6/5)) + (1 − e^(−6/6)) + (1 − e^(−6/7)) + (1 − e^(−6/8))
                 + (1 − e^(−6/9)) + (1 − e^(−6/9)) + (1 − e^(−6/11)) + (1 − e^(−6/12)) ]
         ≈ 0.658
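A quick numerical check of the exponential-kernel estimates (my own code, not from the manual):

```python
import math

data = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]

# Exponential kernel (gamma kernel with alpha = 1): k_y(x) = (1/y) e^(-x/y)
f6 = sum(math.exp(-6 / y) / y for y in data) / len(data)
F6 = sum(1 - math.exp(-6 / y) for y in data) / len(data)
print(round(f6, 4), round(F6, 4))   # 0.048 0.6582
```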

Nov 2003 #4

You study five lives to estimate the time from the onset of a disease to death. The times to death are:

    2    3    3.2    3.3    4

Using a triangular kernel with bandwidth 2, estimate the density function at 2.5.

Solution

The neighborhood is [0.5, 4.5], so all five data points are in the neighborhood. If you draw a neighborhood diagram, you should get:

    y:           2       3       3.2     3.3     4
    p(y):        1/5     1/5     1/5     1/5     1/5
    k_y(2.5):    1.5/4   1.5/4   1.3/4   1.2/4   0.5/4

    f(2.5) = Σ p(y) k_y(2.5) = (1/5)(1/4)(1.5 + 1.5 + 1.3 + 1.2 + 0.5) = 6/20 = 0.3

From a population having distribution function F , you are given the following sample:

    2.0    3.3    3.3    4.0    4.0    4.7    4.7    4.7

Calculate the kernel density estimate of F(4), using the uniform kernel with bandwidth 1.4.

Solution

Mark the following points on a number line:

    A = 2,   B = 2.6,   C = 3.3,   D = 4,   E = 4.7,   F = 5.4

With the uniform kernel with bandwidth 1.4, each data point y spreads its weight uniformly over the rectangle on [y − 1.4, y + 1.4], and K_y(4) is the fraction of that rectangle lying to the left of x = 4 — the part of the rectangle that survives a scissors cut at x = 4.

For y = 2 (point A), the rectangle [0.6, 3.4] lies entirely to the left of 4; the whole rectangle survives the cut. So K_{y=2}(4) = 1.

For y = 3.3 (point C), the rectangle is [1.9, 4.7]; the surviving fraction is (4 − 1.9)/2.8 = 0.75. So K_{y=3.3}(4) = 0.75.

For y = 4 (point D), the rectangle is [2.6, 5.4]; exactly half survives the cut. So K_{y=4}(4) = 0.5.

For y = 4.7 (point E), the rectangle is [3.3, 6.1]; the surviving fraction is (4 − 3.3)/2.8 = 0.25. So K_{y=4.7}(4) = 0.25.

    y:        2.0    3.3    3.3    4.0    4.0    4.7    4.7    4.7
    p(y):     1/8    1/8    1/8    1/8    1/8    1/8    1/8    1/8
    K_y(4):   1      0.75   0.75   0.5    0.5    0.25   0.25   0.25

    F(4) = Σ p(y) K_y(4) = (1/8)(1) + (1/8)(0.75)(2) + (1/8)(0.5)(2) + (1/8)(0.25)(3) = 0.53125

Chapter 4

Bootstrap

Essence of bootstrapping

Loss Models doesn't explain bootstrap much. As a result, many candidates just memorize a black-box formula without understanding the essence of bootstrap.

Let me explain bootstrap with an example. Suppose you want to find out the mean and variance of the GRE scores of a group of 5,000 students. One way to do so is to take out a lot of random samples. For example, you can sample 20 students' GRE scores and calculate the mean and variance of the GRE score. Here you have one sample of size 20. Of course, you want to take many samples. For example, you can take out 30 samples, each sample consisting of 20 students' GRE scores. For each of the 30 samples, you can calculate the mean and variance of the GRE score.

As you can see, taking 30 samples of size 20 takes a lot of time and money. As a research scientist, you are short of research grants. And your life is busy. Is there any way you can cut some corners?

You can cut corners this way. Instead of taking out 30 samples of size 20, you just take out one sample of size 20 and collect 20 students' GRE scores. These 20 scores are X₁, X₂, …, X₂₀. You bring these 20 scores home. Your data collection is done.

Next, you reproduce 30 samples of size 20 each from your one sample of size 20. How? Just resample from your one sample of 20 scores. You randomly select 20 scores with replacement from the 20 scores you have. This is your 1st resample. Next, you randomly select 20 scores with replacement from the 20 scores you have. This is your 2nd resample. If you repeat this process 30 times, you'll get 30 resamples of size 20 each. If you repeat this process 100 times, you'll get 100 resamples of size 20 each. Now your original one sample gives birth to many resamples. How wonderful.

The rest is easy. If you have 30 resamples, you can calculate the mean and variance of the GRE scores for each resample. This should give you a good idea of the mean and variance of the GRE scores.

Does this sound like a fraud? Not really. Your original sample of size 20, X₁, X₂, …, X₂₀, reflects the population. As a result, resamples from this sample are pretty much what you would get if you took out many samples from the population. (By the way, the term bootstrap comes from the phrase "to pull oneself up by one's bootstraps.")

To use bootstrap, you'll need to have a computer and some bootstrapping software to quickly create a great number (such as 10,000) of resamples and to calculate the statistics of the resamples. Bootstrap is a computer-intensive technique.


To summarize, bootstrap reduces researchers' time and money spent on data collection. Researchers just need to collect one good sample and bring it home. Then they can use computers to create resamples and calculate statistics.

For more information on bootstrap, you can download the free PDF file at

http://bcs.whfreeman.com/pbs/cat_160/PBS18.pdf
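The resampling procedure described above takes only a few lines of code. This sketch is mine (with made-up scores): it draws 30 resamples of size 20, with replacement, from a single observed sample:

```python
import random

random.seed(1)

# One observed sample of 20 GRE scores (made-up numbers, for illustration only)
sample = [random.randint(400, 800) for _ in range(20)]

# Draw 30 bootstrap resamples: select 20 scores WITH replacement each time
resamples = [random.choices(sample, k=20) for _ in range(30)]

# The statistic of interest can now be computed on every resample
means = [sum(r) / len(r) for r in resamples]
print(min(means), max(means))   # the resample means cluster around the sample mean
```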

May 2000 #17

You are given a random sample of two values from a distribution F:

    1    3

You estimate θ(F) = Var(X) using the estimator g(X₁, X₂) = (1/2) Σᵢ₌₁² (Xᵢ − X̄)², where X̄ = (X₁ + X₂)/2. Determine the bootstrap approximation to the mean square error.

Solution

Your original sample is (1, 3). The variance of your original sample is

    Var(X) = E(X²) − E²(X) = (1/2)(1² + 3²) − [ (1/2)(1 + 3) ]² = 5 − 4 = 1

Under the bootstrap method, you resample from your original sample with replacement. Your resamples are (1,1), (1,3), (3,1), and (3,3), each having probability 1/4.

For each resample, you calculate g(X₁, X₂) = (1/2) Σᵢ₌₁² (Xᵢ − X̄)²:

    Resample (X₁, X₂)    X̄ = (X₁ + X₂)/2    g(X₁, X₂)
    (1,1)                1                   (1/2)[(1 − 1)² + (1 − 1)²] = 0
    (1,3)                2                   (1/2)[(1 − 2)² + (3 − 2)²] = 1
    (3,1)                2                   (1/2)[(3 − 2)² + (1 − 2)²] = 1
    (3,3)                3                   (1/2)[(3 − 3)² + (3 − 3)²] = 0

    MSE = E[ g(X₁, X₂) − Var(X) ]²
        = (1/4)(0 − 1)² + (1/4)(1 − 1)² + (1/4)(1 − 1)² + (1/4)(0 − 1)² = 1/2
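Because the sample has only two points, we can enumerate all 2² = 4 equally likely resamples and compute the bootstrap MSE exactly. A sketch of mine:

```python
from itertools import product

sample = [1, 3]
theta = 1.0   # Var of the empirical distribution: (1/2)(1^2 + 3^2) - 2^2 = 1

def g(xs):
    # The estimator: (1/2) * sum of (x_i - xbar)^2
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

resamples = list(product(sample, repeat=2))   # (1,1), (1,3), (3,1), (3,3)
mse = sum((g(r) - theta) ** 2 for r in resamples) / len(resamples)
print(mse)   # 0.5
```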

You are given a random sample of two values from a distribution F:

    1    3

You estimate θ(F) = Var(X) using the estimator g(X₁, X₂) = Σᵢ₌₁² (Xᵢ − X̄)², where X̄ = (X₁ + X₂)/2. Determine the bootstrap approximation to the mean square error.

Solution

The only difference between this problem and the previous problem (May 2000 #17) is the definition of g(X₁, X₂). In this problem, g(X₁, X₂) = Σᵢ₌₁² (Xᵢ − X̄)²; in the previous problem, g(X₁, X₂) = (1/2) Σᵢ₌₁² (Xᵢ − X̄)².

    Var(X) = E(X²) − E²(X) = (1/2)(1² + 3²) − [ (1/2)(1 + 3) ]² = 1

Under the bootstrap method, you resample from your original sample with replacement. Your resamples are (1,1), (1,3), (3,1), and (3,3), each having probability 1/4.

    Resample (X₁, X₂)    g(X₁, X₂) = Σᵢ₌₁² (Xᵢ − X̄)²
    (1,1)                (1 − 1)² + (1 − 1)² = 0
    (1,3)                (1 − 2)² + (3 − 2)² = 2
    (3,1)                (3 − 2)² + (1 − 2)² = 2
    (3,3)                (3 − 3)² + (3 − 3)² = 0

    MSE = E[ g(X₁, X₂) − Var(X) ]²
        = (1/4)(0 − 1)² + (1/4)(2 − 1)² + (1/4)(2 − 1)² + (1/4)(0 − 1)² = 1

May 2005 #4

You are given a random sample of three values from a distribution F:

    1    1    4

You estimate the third central moment of F, θ = E{ [X − E(X)]³ }, using the estimator

    g(X₁, X₂, X₃) = (1/3) Σᵢ₌₁³ (Xᵢ − X̄)³

Determine the bootstrap estimate of the mean-squared error of g.

Solution

Recall that the 2nd central moment E{ [X − E(X)]² } is Var(X); here we need the 3rd central moment E{ [X − E(X)]³ }.

Your original sample is (1, 1, 4). The 3rd central moment of this sample is calculated as follows:

    X̄ = (1 + 1 + 4)/3 = 2

    E{ [X − E(X)]³ } = (1/3)(1 − 2)³ + (1/3)(1 − 2)³ + (1/3)(4 − 2)³ = 2

The third central moment of this original sample is used to approximate the true 3rd central moment of the population. So the true parameter is θ = 2.

Next, you need to understand bootstrap. Under bootstrap, you resample from the original sample with replacement. Imagine you have 3 boxes to fill from left to right. The 1st box can be filled with any number of your original sample (1,1,4); the 2nd box can be filled with any number of your original sample (1,1,4); and the 3rd box can be filled with any number of your original sample (1,1,4). The # of resamples is 3³ = 27. This is a concept from Exam P.

For each resample (X₁, X₂, X₃), you calculate g(X₁, X₂, X₃) = (1/3) Σᵢ₌₁³ (Xᵢ − X̄)³. There are four cases:

(1) Three 1's. The number of permutations is 8. To understand why, let's denote the original sample as (a, b, c) with a = 1, b = 1, and c = 4. Then the following 8 resamples will produce (1,1,1): aaa, aab, aba, baa, bba, bab, abb, bbb. For the resample (1,1,1),

    X̄ = (1 + 1 + 1)/3 = 1,   g = (1/3)[(1 − 1)³ + (1 − 1)³ + (1 − 1)³] = 0,   (g − θ)² = (0 − 2)² = 4

(2) Two 1's and one 4. The following 12 permutations will produce two 1's and one 4: aac, aca, caa, bbc, bcb, cbb, abc, acb, cab, bac, bca, cba.

    X̄ = (1 + 1 + 4)/3 = 2,   g = (1/3)[(1 − 2)³ + (1 − 2)³ + (4 − 2)³] = 2,   (g − θ)² = (2 − 2)² = 0

(3) Two 4's and one 1. The following 6 permutations will produce two 4's and one 1: acc, cac, cca, bcc, cbc, ccb.

    X̄ = (1 + 4 + 4)/3 = 3,   g = (1/3)[(1 − 3)³ + (4 − 3)³ + (4 − 3)³] = −2,   (g − θ)² = (−2 − 2)² = 16

(4) Three 4's. The following 1 permutation will produce (4,4,4): ccc.

    X̄ = (4 + 4 + 4)/3 = 4,   g = (1/3)[(4 − 4)³ + (4 − 4)³ + (4 − 4)³] = 0,   (g − θ)² = (0 − 2)² = 4

    MSE = E(g − θ)² = (8/27)(4) + (12/27)(0) + (6/27)(16) + (1/27)(4) = 132/27 ≈ 4.89
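Enumerating all 3³ = 27 resamples confirms both the permutation counts in each case and the final answer. My own check:

```python
from itertools import product

sample = [1, 1, 4]
xbar = sum(sample) / 3
theta = sum((x - xbar) ** 3 for x in sample) / 3   # true parameter: 2.0

def g(xs):
    # The estimator: (1/3) * sum of (x_i - mean)^3
    m = sum(xs) / len(xs)
    return sum((x - m) ** 3 for x in xs) / len(xs)

resamples = list(product(sample, repeat=3))        # 27 equally likely resamples
mse = sum((g(r) - theta) ** 2 for r in resamples) / 27
print(round(mse, 2))   # 4.89
```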

A sample of claim amounts is {300, 600, 1500}. By applying the deductible to this sample, the loss elimination ratio for a deductible of 100 per claim is estimated to be 0.125.

You are given the following simulations from the sample:

    Simulation        Claim Amounts
    1                 600      600      1500
    2                 1500     300      1500
    3                 1500     300      600
    4                 600      600      300
    5                 600      300      1500
    6                 600      600      1500
    7                 1500     1500     1500
    8                 1500     300      1500
    9                 300      600      300
    10                600      600      600

Determine the bootstrap approximation to the mean square error of the estimate.

Solution

Your original sample is {300, 600, 1500}. If you resample this sample with replacement, you'll get 3³ = 27 resamples. However, calculating the mean square error based on 27 resamples is too much work under exam conditions. That's why SOA gives you only 10 resamples.

The loss elimination ratio is

    LER_X(d) = E[ min(X, d) ] / E(X)

The loss elimination ratio for the original sample {300, 600, 1500} with a 100 deductible is 0.125. SOA already gives this ratio. If we needed to calculate it, this is how:

For the loss amount 300, the insurer pays only 200, saving 100.
For the loss amount 600, the insurer pays only 500, saving 100.
For the loss amount 1500, the insurer pays only 1400, saving 100.

The expected saving due to the 100 deductible is (1/3)(100 + 100 + 100) = 100.
The expected loss amount is (1/3)(300 + 600 + 1500) = 100 + 200 + 500 = 800.
So the loss elimination ratio is 100 / 800 = 0.125.

Next, for each of the 10 resamples, you calculate the loss elimination ratio as we did for the original sample. To speed up the calculation, let's set $100 as one unit of money. Then the deductible is one.

    Resample    X1    X2    X3    LER      (LER − 0.125)²
    1           6     6     15    1/9      0.000193
    2           15    3     15    1/11     0.001162
    3           15    3     6     1/8      0
    4           6     6     3     1/5      0.005625
    5           6     3     15    1/8      0
    6           6     6     15    1/9      0.000193
    7           15    15    15    1/15     0.003403
    8           15    3     15    1/11     0.001162
    9           3     6     3     1/4      0.015625
    10          6     6     6     1/6      0.001736
    Total                                  0.0291

For example, for the 1st resample {6, 6, 15}, the claim payment after the deductible of 1 is {5, 5, 14}. So the LER is (1 + 1 + 1)/(6 + 6 + 15) = 3/27 = 1/9.

    MSE = Σᵢ₌₁¹⁰ (1/10)(LERᵢ − 0.125)² = 0.0291 / 10 = 0.0029
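The table above can be reproduced with a short script of my own:

```python
# The 10 simulated resamples from the problem (in dollars)
resamples = [
    (600, 600, 1500), (1500, 300, 1500), (1500, 300, 600), (600, 600, 300),
    (600, 300, 1500), (600, 600, 1500), (1500, 1500, 1500), (1500, 300, 1500),
    (300, 600, 300), (600, 600, 600),
]

def ler(claims, d=100):
    # Loss elimination ratio: E[min(X, d)] / E[X]
    return sum(min(c, d) for c in claims) / sum(claims)

mse = sum((ler(r) - 0.125) ** 2 for r in resamples) / len(resamples)
print(round(mse, 4))   # 0.0029
```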

Chapter 5    Bühlmann credibility

The Bühlmann credibility premium formula is tested over and over in Course 4 and Exam C. However, many candidates don't have a good understanding of the inner workings of the Bühlmann credibility premium model. They just memorize a series of black-box formulas:

    Z = n/(n + k),   k = E[ Var(X | Θ) ] / Var[ E(X | Θ) ],   and   P = (1 − Z)μ + Z X̄

Rote memorization of a formula without fully grasping the concepts is tedious, difficult, and prone to errors. Additionally, a memorized formula will not yield the needed understanding to grapple with difficult problems.

In this chapter, we're going to dig deep into Bühlmann's credibility premium formula and gain a crystal clear understanding of the concepts.

Let's start with a simple example to illustrate one major challenge an insurance company faces when determining premium rates. Imagine you are the founder and the actuary of an auto insurance company. Your company's specialty is to provide auto insurance for taxi drivers.

Before you open your business, there are half a dozen insurance companies in your area that offer auto insurance to taxi drivers. The world has been going on fine for many years without your start-up. It can continue going on without your start-up. So it's tough for you to get customers. Finally, you take out a big portion of your savings account and buy TV advertising, which brings in your first three customers: Adam, Bob, and Colleen. Since your corporate office is your garage and you have only one employee (you), you decide that three customers is good enough for you to start your business.

When you open your business at t = 0 , you sell three auto insurance policies to Adam,

Bob, and Colleen. The contract of your insurance policy says that the premium rate is

guaranteed for only two years. Once the two-year guarantee period is over, you have the

right to set the renewal premium, which can be higher than the guaranteed initial

premium.

When you set your premium rate at t = 0 , you notice that Adam, Bob, and Colleen are

similar in many ways. They are all taxicab drivers. They work at the same taxi company

in the same city. They are all 35 years old. They all graduated from the same high school.


They are all careful drivers. Therefore, at t = 0 you treat Adam, Bob, and Colleen as

identical risks and charge the same premium for the first two years.

To actually set the initial premium for the first two years, you decide to buy a rate book

from a consulting firm. This consulting firm is well-known in the industry. Each year it

publishes a rate manual that lists the average claim cost of a taxi driver by city, by

mileage and by several other criteria. Based on this rate manual, you estimate that Adam,

Bob, and Colleen may each incur $4 claim cost per year. So at t = 0 , you charge Adam,

Bob, and Colleen $4 each. This premium rate is guaranteed for two years.

During the 2-year guaranteed period, Adam, Bob, and Colleen have incurred the

following claims:

                  Year 1 Claim    Year 2 Claim    Total Claim    Average claim per insured per year
    Adam          $0              $0              $0             $0 / 2 = $0
    Bob           $1              $7              $8             $8 / 2 = $4
    Colleen       $4              $9              $13            $13 / 2 = $6.5
    Grand Total                                   $21

Average claim per person per year (for the 3-person group): $21 / (3 × 2) = $3.5

Now the two-year guarantee period is over. You need to determine the renewal premium

rate for Adam, Bob, and Colleen respectively for the third year. Once you have

determined the premium rates, you will need to file these rates with the insurance

department of the state where you do business (called domicile state).

Question: How do you determine the renewal premium rate for the third year for Adam,

Bob, and Colleen respectively?

One simple approach is to charge Adam, Bob, and Colleen a uniform rate (i.e. the group

premium rate). After all, Adam, Bob, and Colleen are similar risks; they form a

homogeneous group. As such, they should pay a uniform group premium rate, even

though their actual claim patterns for the past two years are different. You can continue

charging them the old rate of $4 per insured per year. However, since the average claim

cost for the past two years is $3.50 per insured per year, you can charge them $3.50 per

person for year three.

Under the uniform group rate of $3.50, Bob and Colleen will probably underpay their

premiums; their actual average annual claim for the past two years exceeds this group

premium rate. Adam, on the other hand, may overpay his premiums; his average annual

claim for the past two years is below the group premium rate. When you charge each

policyholder the uniform group premium rate, low-risk policyholders will overpay their

premiums and the high-risk policyholders will underpay their premiums. Your business

as whole, however, will collect just enough premiums to pay the claim costs.

However, in the real world, most likely you won't be able to charge Adam, Bob, and Colleen a uniform rate of $3.50. Any of your customers can easily shop around, compare premium rates, and buy an insurance policy elsewhere with a better rate. For example, Adam can easily find another insurer who sells a similar insurance policy for less than your $3.50 group rate. Additionally, the commissioner of your state insurance department is unlikely to approve your uniform rate. The department will want to see that your low-risk customers pay lower premiums.

Key points to remember:

Under the classical theory of insurance, people with similar risks form a homogeneous

group to share the risk. Members of a homogeneous group are photocopies of each other.

The claim random variable for each member is independent identically distributed with a

common density function f X ( x ) . The uniform pure premium rate is E ( X ) . Each member

of the homogeneous group should pay E ( X ) .

In reality, however, there's no such thing as a homogeneous group. No two policyholders, however similar, have exactly the same risks. If you as an insurer charge everybody a uniform group rate, then low-risk policyholders will leave and buy insurance elsewhere.

To stay in business, you have no choice but to charge individualized premium rates that are proportional to policyholders' risks.

Now let's come back to our simple case. We know that uniform rating won't work in the real world. We'll want to set up a mathematical model to calculate the fair renewal premium rate for Adam, Bob, and Colleen respectively. Our model should reflect the following observations and intuition:

Adam, Bob, and Colleen are largely similar risks. We'll need to treat them as a rating group. This way, our renewal rates for Adam, Bob, and Colleen are somewhat related.

On the other hand, we need to differentiate between Adam, Bob, and Colleen. We

might want to treat Adam, Bob, and Colleen as potentially different sub-risks

within a largely similar rate group. This way, our model will produce different

renewal rates. We hope the renewal rate calculated from our model will agree

with our intuition that Adam deserves the lowest renewal rate, Bob a higher rate,

and Colleen the highest rate.

To reflect the idea that Adam, Bob, and Colleen are different sub-risks within a

largely similar rate group, we may want to divide the largely similar rate group

into four sub-risks (or more sub-risks if you like): super preferred, preferred,

standard, and sub-standard. So the rate group actually consists of four sub-risks.

Adam or Bob or Colleen can be any one of the four sub-risks.


Here comes a critical point: we don't know who belongs to which sub-risk. We don't know whether Adam is a super-preferred sub-risk, a preferred sub-risk, a standard sub-risk, or a sub-standard sub-risk. Nor do we know to which sub-risk Bob or Colleen belongs. This is so even if we have Adam's two-year claim data. Judged from his 2-year claim history, Adam seems to be a super preferred or at least a preferred sub-risk. However, a bad driver can have no accidents for a while due to good luck; a good driver can have several big accidents in a row due to bad luck. So we really can't say for sure that Adam is indeed a better risk. All we know is that Adam's sub-risk class is a random variable consisting of 4 possible values: super preferred, preferred, standard, and substandard.

To visualize that Adam's sub-risk class is a random variable, think about rolling a 4-sided die. One side of the die is marked with the letters "SP" (super preferred); another side is marked with "PF" (preferred); the third side is marked with "STD" (standard); and the fourth side is marked with "SUB" (substandard). To determine which sub-class Adam belongs to, we'll roll the die. If the result is "SP", then we'll assign Adam to the super preferred class. If the result is "PF", we'll assign him to the preferred class. And so on and so forth. Similarly, we can roll the die and randomly assign Bob or Colleen to one of the four sub-classes: SP, PF, STD, and SUB.

Now we are ready to come up with a model to calculate the renewal premium rate.

Let the random variable X_{j,t} represent the claim cost incurred in year t by the j-th insured, where t = 1, 2, …, n, n + 1 and j = 1, 2, …, m. Here in our example, n = 2 (we have two years of claim data) and m = 3 (Adam, Bob, and Colleen).

For any j = 1, 2, …, m, the claim costs X_{j,1}, X_{j,2}, …, X_{j,n}, X_{j,n+1} are identically distributed with a common conditional density function f(x | θ), where θ is a realization of Θ. Θ is a random variable (or a vector of random variables) representing the presence of multiple sub-risks. X_{j,1}, X_{j,2}, …, X_{j,n}, X_{j,n+1}, which represent the claim costs incurred by the same policyholder, belong to the same sub-risk class θ.

However, θ is unknown to us. All we know is that θ is a random realization of Θ. Here in our example, Θ = {SP, PF, STD, SUB}. When we say that θ is a realization of Θ, we mean that with probability p₁, θ = SP; with probability p₂, θ = PF; with probability p₃, θ = STD; and with probability 1 − (p₁ + p₂ + p₃), θ = SUB.

Though we don't know each policyholder's sub-risk class, we assume that, given θ, the claim costs are independent identically distributed. That is, X_{j,1} | θ, X_{j,2} | θ, …, X_{j,n} | θ, X_{j,n+1} | θ are independent identically distributed, with conditional mean μ(θ) = E(X_{j,t} | θ) and conditional variance σ²(θ) = Var(X_{j,t} | θ).

Our goal is to estimate the claim cost incurred in year n + 1 by the j-th insured, using his prior n-year average claim cost X̄_j = (1/n) Σₜ₌₁ⁿ X_{j,t}.

The estimated value of X_{j,n+1} is the pure renewal premium for year n + 1. Bühlmann's approach is to use a + Z X̄_j to approximate X_{j,n+1}, subject to the condition that

    E[ (a + Z X̄_j − X_{j,n+1})² ]

is minimized.

The result is:

    a + Z X̄_j = (1 − Z)μ + Z X̄_j,

    Z = n/(n + k),   k = E[ Var(X_{j,t} | Θ) ] / Var[ E(X_{j,t} | Θ) ],

    μ = E(X_{j,t}) = E[ E(X_{j,t} | Θ) ]
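To make these formulas concrete, here is a sketch of my own applying them to the Adam/Bob/Colleen data. Since the distribution of Θ is unknown, the code estimates E[Var(X|Θ)] and Var[E(X|Θ)] from the data using the standard nonparametric empirical Bayes estimators (developed later in the credibility material); the variable names epv and vhm are mine:

```python
# Two years of claims for Adam, Bob, and Colleen (from the example above)
data = {"Adam": [0, 0], "Bob": [1, 7], "Colleen": [4, 9]}

m = len(data)        # 3 policyholders
n = 2                # years of data per policyholder
means = {j: sum(xs) / n for j, xs in data.items()}
mu = sum(sum(xs) for xs in data.values()) / (m * n)      # overall mean: 3.5

# epv estimates E[Var(X|Theta)]; vhm estimates Var[E(X|Theta)]
epv = sum((x - means[j]) ** 2 for j, xs in data.items() for x in xs) / (m * (n - 1))
vhm = sum((xbar - mu) ** 2 for xbar in means.values()) / (m - 1) - epv / n

k = epv / vhm
Z = n / (n + k)
for j in data:
    # Each renewal premium is a blend of the overall mean and the insured's own mean
    print(j, round((1 - Z) * mu + Z * means[j], 2))
```

Note that the three premiums average back to $3.50, so the portfolio still collects enough premium in total.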

Next, we'll derive the above formulas. However, before we derive the Bühlmann premium formulas, let's go over some preliminary concepts.

Preliminary concept #1    Double expectation

    E(X) = E_Θ[ E(X | Θ) ]

If X is discrete, E(X) = E_Θ[ E(X | Θ) ] = Σ over all θ of p(θ) E(X | θ).

If X is continuous, E(X) = E_Θ[ E(X | Θ) ] = ∫ E(X | θ) f(θ) dθ.

I'll explain the double expectation theorem assuming X is discrete. However, the same logic applies when X is continuous.

Let's use a simple example to understand the meaning behind the above formula. A class has 6 boys and 4 girls. These 10 students take a final. The average score of the 6 boys is 80; the average score of the 4 girls is 85. What's the average score of the whole class?

This is an elementary-level math problem. The average score of the whole class is:

    Average score = (total score) / (# of students) = [ 6(80) + 4(85) ] / 10 = 82

If we express the above calculation using the double expectation theorem, then we have:

    E(Score) = E_Gender[ E(Score | Gender) ] = (6/10)(80) + (4/10)(85) = 82

So instead of directly calculating the average score for the whole class, we first break

down the whole class into two groups based on gender. We then calculate the average

score of these two groups: boys and girls. Next, we calculate the weighted average of

these two group averages. This weighted average is the average of the whole class. If you

understand this formula, you have understood the essence of the double expectation

theorem.

The Double Expectation Theorem in plain English:

Instead of directly calculating the mean of the whole population, you first break down the

population into several groups based on one standard (such as gender). You calculate the

mean of each group. Next, you calculate the mean of all the group means. This is the

mean of the whole population.

Problem    A group of 20 graduate students (12 with non-math major and 8 with math major) have a total GRE score of 12,940. The GRE score distribution by major is as follows:

                                Total GRE score
    12 non-math majors          7,740
    8 math majors               5,200
    Total                       12,940

Find the average GRE score twice. The first time, do not use the double expectation theorem. The second time, use the double expectation theorem. Show that you get the same result.

Solution

(1) Find the mean without using the double expectation theorem. The average GRE score of the 20 graduate students is:

    Average score = (total score) / (# of students) = 12,940 / 20 = 647

(2) Find the mean using the double expectation theorem:

    E(GRE) = E_Major[ E(GRE | Major) ] = (12/20)(7,740/12) + (8/20)(5,200/8) = 647
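A one-line check of both calculations (my own code):

```python
# (count of students, total GRE score) by major, from the problem above
totals = {"non-math": (12, 7740), "math": (8, 5200)}

# Direct mean: grand total divided by total head count
n = sum(c for c, _ in totals.values())
direct = sum(t for _, t in totals.values()) / n

# Double expectation: sum over majors of P(major) * E(GRE | major)
double_exp = sum((c / n) * (t / c) for c, t in totals.values())

print(direct, double_exp)   # both ≈ 647
```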

Preliminary concept #2    Total variance formula

    Var(X) = E_Y[ Var(X | Y) ] + Var_Y[ E(X | Y) ]

Proof.

    Var(X) = E(X²) − E²(X)

By double expectation, E(X) = E_Y[ E(X | Y) ] and E(X²) = E_Y[ E(X² | Y) ]. However, E(X² | Y) = Var(X | Y) + E²(X | Y). So

    Var(X) = E(X²) − E²(X) = E_Y[ Var(X | Y) + E²(X | Y) ] − { E_Y[ E(X | Y) ] }²
           = E_Y[ Var(X | Y) ] + ( E_Y[ E²(X | Y) ] − { E_Y[ E(X | Y) ] }² )
           = E_Y[ Var(X | Y) ] + Var_Y[ E(X | Y) ]

If X is the loss amount of a policyholder and Y is the risk class of the policyholder, then Var(X) = E_Y[ Var(X | Y) ] + Var_Y[ E(X | Y) ] means that the total variance of the loss consists of two components:

    Var(X)                 total variance
    E_Y[ Var(X | Y) ]      expected process variance
    Var_Y[ E(X | Y) ]      variance of hypothetical means

Next, lets look at a comprehensive example using double expectation and total variance.

Example. The number of claims, N , incurred by a policyholder has the following

distribution:

P (n) =

3!

3 n

p n (1 p ) .

n !( 3 n ) !

Solution

E ( N ) = 3 p , Var ( N ) = 3 p (1 p )

However, p is also a random variable. So we cannot directly use the above formula.

Guo Fall 2009 C, Page 109 / 284

To find E ( N ) , we divide N into different groups by p , just as we divided the class into

boys and girls. The only difference is that this time we have an infinite number of groups

( p is a continuous random variable).

Lets consider a small group [ p, p + dp ]

Each value of p is a separate group. For each group, we will calculate its mean. Then we

will find the weighted average mean of all the groups, with weight being the probability

of each groups p value. The result should be E ( N ) .

1

E ( N ) = EP E ( N p )

3 2

=

E ( N p ) f P ( p ) dp =

3 p dp =

p

2

p= 0

p= 0

=

0

3

2

Alternatively, E ( N ) = EP E ( N p ) = EP [3 p ] = 3E ( P ) = 3

1

3

=

2

2

Next, we'll calculate Var(N). One method is to calculate Var(N) from scratch using the standard formula Var(N) = E(N²) − E²(N). We'll use the double expectation theorem to calculate E(N²) and E(N).

E(N²) = E_P[ E(N²|p) ] = ∫₀¹ E(N²|p) f_P(p) dp

E(N²|p) = E²(N|p) + Var(N|p) = (3p)² + 3p(1−p) = 6p² + 3p

E(N²) = ∫₀¹ (6p² + 3p) dp = [ 2p³ + (3/2)p² ] evaluated from 0 to 1 = 2 + 3/2 = 7/2

Var(N) = E(N²) − E²(N) = 7/2 − (3/2)² = 7/2 − 9/4 = 5/4
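The answers E(N) = 3/2 and Var(N) = 5/4 can be double-checked exactly: with a uniform p, the unconditional probabilities P(N = n) follow from a Beta integral. A sketch in Python (not part of the manual):

```python
from math import comb, factorial

# Exact check that E(N) = 3/2 and Var(N) = 5/4 when
# N | p ~ Binomial(3, p) and p ~ Uniform(0, 1). Unconditionally,
# P(N = n) = C(3, n) * integral of p^n (1-p)^(3-n) dp over [0, 1],
# and the Beta integral gives: integral p^a (1-p)^b dp = a! b! / (a+b+1)!.

def pmf(n):
    a, b = n, 3 - n
    return comb(3, n) * factorial(a) * factorial(b) / factorial(a + b + 1)

probs = [pmf(n) for n in range(4)]            # each works out to 1/4
mean = sum(n * probs[n] for n in range(4))    # 3/2
var = sum(n * n * probs[n] for n in range(4)) - mean ** 2   # 5/4
```

Incidentally, each P(N = n) equals 1/4: mixing a Binomial(3, p) over a uniform p makes N uniform on {0, 1, 2, 3}.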

Alternatively, you can use the following formula to calculate the variance:

Var(N) = E_P[ Var(N|p) ] + Var_P[ E(N|p) ]

E(N|p) = 3p,  Var(N|p) = 3p(1−p)

E_P[ Var(N|p) ] = E_P[ 3p(1−p) ] = E_P( 3p − 3p² ) = 3E_P(p) − 3E_P(p²)

Var(N) = E_P[ Var(N|p) ] + Var_P[ E(N|p) ] = 3E(p) − 3E(p²) + 9Var(p)

If X is uniform over [a, b], then E(X) = (a+b)/2 and Var(X) = (b−a)²/12. We have:

E(P) = (0+1)/2 = 1/2,  Var(P) = (1−0)²/12 = 1/12

E(P²) = E²(P) + Var(P) = 1/4 + 1/12 = 4/12 = 1/3

Var(N) = 3(1/2) − 3(1/3) + 9(1/12) = 3/2 − 1 + 3/4 = 5/4

Preliminary concept #3

In a regression analysis, you try to fit a line (or a function) through a set of points. With least squares regression, you get the best fit by minimizing the expected squared distance of each point from the fitted line.

Let's say you want to find out how a person's income level affects how much life insurance he buys. Let X represent income. Let Y represent the amount of life insurance this person buys. You have collected some data pairs (X, Y) from a group of consumers. You suspect there's a linear relationship between X and Y. You want to predict Y using the function a + bX, where a and b are constants. With least squares regression, you want to minimize the following:

Q = E[ (a + bX − Y)² ]

∂Q/∂a = ∂/∂a E[ (a + bX − Y)² ] = E[ ∂/∂a (a + bX − Y)² ] = 2E( a + bX − Y ) = 2[ a + bE(X) − E(Y) ]

Setting ∂Q/∂a = 0:

  a + bE(X) − E(Y) = 0    (Equation I)

∂Q/∂b = ∂/∂b E[ (a + bX − Y)² ] = E[ ∂/∂b (a + bX − Y)² ] = 2E[ (a + bX − Y)X ] = 2[ aE(X) + bE(X²) − E(XY) ]

Setting ∂Q/∂b = 0:

  aE(X) + bE(X²) − E(XY) = 0    (Equation II)

Taking (Equation II) minus E(X) times (Equation I):

  b[ E(X²) − E²(X) ] = E(XY) − E(X)E(Y)

  b = Cov(X, Y) / Var(X),  a = E(Y) − bE(X)
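The result b = Cov(X, Y)/Var(X), a = E(Y) − bE(X) can be checked numerically on a small made-up data set: perturbing a or b away from these values should never reduce Q. A sketch (the data are hypothetical):

```python
# Numeric sketch of the least squares result on made-up data:
# b = Cov(X, Y) / Var(X), a = E(Y) - b * E(X) minimize Q = E[(a + bX - Y)^2].
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
ex = sum(xs) / n
ey = sum(ys) / n
cov_xy = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
var_x = sum((x - ex) ** 2 for x in xs) / n

b = cov_xy / var_x          # slope
a = ey - b * ex             # intercept

def q(a_, b_):
    """Sum of squared residuals for candidate coefficients."""
    return sum((a_ + b_ * x - y) ** 2 for x, y in zip(xs, ys))
```

For this data set the formulas give b = 1.96 and a = 0.14, and nudging either coefficient in any direction increases Q.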

Now I'm ready to give you a quick proof of the Bühlmann credibility formula. To simplify notation, I'm going to fix on one particular insured (such as Adam) and change the symbol X_{j,t} to X_t. Remember, our goal is to estimate X_{n+1}, the individualized premium rate for year n+1, using a + Z X̄. Z is the credibility factor assigned to the mean of past claims X̄ = (1/n)(X₁ + X₂ + ... + X_n). We want to find the a and Z that minimize the following:

E[ (a + Z X̄ − X_{n+1})² ]

Please note that X₁, X₂, ..., X_n, and X_{n+1} are claims incurred by the same policyholder (whose risk class is unknown to us) during years 1, 2, ..., n, and n+1.

Applying the formula developed in preliminary concept #3, we have:

Z = Cov(X̄, X_{n+1}) / Var(X̄)

Cov(X̄, X_{n+1}) = Cov[ (1/n)(X₁ + X₂ + ... + X_n), X_{n+1} ] = (1/n) Cov[ (X₁ + X₂ + ... + X_n), X_{n+1} ]
  = (1/n)[ Cov(X₁, X_{n+1}) + Cov(X₂, X_{n+1}) + ... + Cov(X_n, X_{n+1}) ]

At first glance, you might think that X₁, X₂, ..., X_n, X_{n+1} are independent identically distributed. If indeed X₁, X₂, ..., X_n, X_{n+1} were independent identically distributed, each covariance term would be zero and we would have

Z = Cov(X̄, X_{n+1}) / Var(X̄) = 0

The result Z = 0 simply doesn't make sense. What went wrong is the assumption that X₁, X₂, ..., X_n, X_{n+1} are independent identically distributed. The correct statement is that X₁, X₂, ..., X_n, and X_{n+1} are identically distributed with a common density function f(x, θ), where θ is unknown to us; they are independent only for a given risk class θ. In other words, if we fix the sub-class variable at θ, then all the claims incurred by the policyholder who belongs to sub-class θ are independent identically distributed. Mathematically, this means that X₁, X₂, ..., X_n, and X_{n+1} are independent identically distributed conditional on θ.

Here is an intuitive way to see why X_i and X_j have non-zero covariance. X_i and X_j represent the claim amounts incurred at times i and j by the policyholder whose sub-class is θ. If θ is a low risk, then X_i and X_j both tend to be small. On the other hand, if θ is a high risk, then X_i and X_j both tend to be big. So X_i and X_j are correlated and have a non-zero covariance.

Next, let's derive the formula:

Cov(X_i, X_j) = E(X_i X_j) − E(X_i)E(X_j) = Var[ μ(θ) ],  where i ≠ j

Because X_i and X_j are independent for a given θ, each with conditional mean μ(θ) = E(X_i|θ) = E(X_j|θ), we have:

E(X_i X_j | θ) = E(X_i|θ) E(X_j|θ) = μ²(θ)

E(X_i X_j) = E_θ[ E(X_i X_j | θ) ] = E_θ[ μ²(θ) ]

E(X_i)E(X_j) = { E_θ[ μ(θ) ] }²

Cov(X_i, X_j) = E_θ[ μ²(θ) ] − { E_θ[ μ(θ) ] }² = Var[ μ(θ) ]

So:

Cov(X̄, X_{n+1}) = (1/n) Cov[ (X₁ + X₂ + ... + X_n), X_{n+1} ]
  = (1/n)[ Cov(X₁, X_{n+1}) + Cov(X₂, X_{n+1}) + ... + Cov(X_n, X_{n+1}) ]
  = (1/n) n Var[ μ(θ) ] = Var[ μ(θ) ]

Next, let's find Var(X̄):

Var(X̄) = Var[ (1/n)(X₁ + X₂ + ... + X_n) ] = (1/n²) Var(X₁ + X₂ + ... + X_n)

It is tempting to write Var(X₁ + X₂ + ... + X_n) = Var(X₁) + Var(X₂) + ... + Var(X_n). Wrong! X₁, X₂, ..., X_n are not independent.

Var(X₁ + X₂ + ... + X_n)
  = Var(X₁) + Var(X₂) + ... + Var(X_n)
  + 2Cov(X₁, X₂) + 2Cov(X₁, X₃) + ... + 2Cov(X_{n−1}, X_n)

So we have n variance terms. Though X₁, X₂, ..., X_n are not independent, they have a common variance Var(X), so the variance terms sum to nVar(X). Next, count the covariance terms 2Cov(X₁, X₂) + 2Cov(X₁, X₃) + ... + 2Cov(X_{n−1}, X_n). Out of X₁, X₂, ..., X_n, if you take out any two items X_i and X_j where i ≠ j, you'll get one covariance term. Since there are C(n, 2) = n(n−1)/2 ways of taking out two items X_i and X_j where i ≠ j, and each Cov(X_i, X_j) = Var[μ(θ)], the sum of the covariance terms becomes:

2 Var[μ(θ)] C(n, 2) = 2 Var[μ(θ)] n(n−1)/2 = n(n−1) Var[μ(θ)]

Var(X̄) = (1/n²) Var(X₁ + X₂ + ... + X_n)
  = (1/n²) { Var(X₁) + Var(X₂) + ... + Var(X_n) + 2Cov(X₁, X₂) + 2Cov(X₁, X₃) + ... + 2Cov(X_{n−1}, X_n) }
  = (1/n²) { nVar(X) + n(n−1) Var[μ(θ)] }
  = (1/n) { Var(X) + (n−1) Var[μ(θ)] }
  = (1/n) { Var(X) − Var[μ(θ)] } + Var[μ(θ)]

By the total variance formula, Var(X) = E[ Var(X|θ) ] + Var[ E(X|θ) ] = E[ Var(X|θ) ] + Var[ μ(θ) ], so

Var(X) − Var[μ(θ)] = E[ Var(X|θ) ]

Var(X̄) = Var[ μ(θ) ] + (1/n) E[ Var(X|θ) ]

Finally, we have:

Z = Cov(X̄, X_{n+1}) / Var(X̄) = Var[μ(θ)] / { Var[μ(θ)] + (1/n) E[Var(X|θ)] }
  = n Var[μ(θ)] / { n Var[μ(θ)] + E[Var(X|θ)] }
  = n / { n + E[Var(X|θ)] / Var[μ(θ)] }

Let k = E[Var(X|θ)] / Var[μ(θ)]. Then Z = n / (n + k).

Next, let's find a. Although X₁, X₂, ..., X_n, and X_{n+1} are not independent, they have a common mean E(X) = μ and a common variance Var(X).

E(X̄) = E[ (1/n)(X₁ + X₂ + ... + X_n) ] = (1/n) E(X₁ + X₂ + ... + X_n) = (1/n)(nμ) = μ

E(X_{n+1}) = μ

a = E(X_{n+1}) − Z E(X̄) = μ − Zμ = (1−Z)μ

a + Z X̄ = (1−Z)μ + Z X̄ = Z X̄ + (1−Z)μ,  where Z = n/(n+k)

To summarize the derivation:

Z = Cov(X̄, X_{n+1}) / Var(X̄),  a = (1−Z)μ

Cov(X_i, X_j) = Var[μ(θ)] = VE,  where i ≠ j

Var(X₁ + X₂ + ... + X_n) = nVar(X) + n(n−1)Var[μ(θ)]
  = n{ Var(X) − Var[μ(θ)] } + n²Var[μ(θ)] = nE[Var(X|θ)] + n²Var[μ(θ)]

Var(X̄) = (1/n²) Var(X₁ + X₂ + ... + X_n) = Var[μ(θ)] + (1/n)E[Var(X|θ)] = VE + (1/n)EV

Cov(X̄, X_{n+1}) = Var[μ(θ)] = VE

Z = Cov(X̄, X_{n+1}) / Var(X̄) = VE / [ VE + (1/n)EV ] = n / (n + k),  where k = EV/VE

Here EV = E[Var(X|θ)] is the expected process variance, and VE = Var[μ(θ)] is the variance of the hypothetical means.
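The pivotal and perhaps surprising fact in this derivation, Cov(X_i, X_j) = Var[μ(θ)] for i ≠ j, can be illustrated by simulation. Below is a sketch using a hypothetical two-class Poisson model (the classes, probabilities, and sample size are my own choices, not from the manual):

```python
import random
from math import exp

# Monte Carlo sketch: claims from the same, randomly drawn risk class
# satisfy Cov(Xi, Xj) = Var[mu(theta)] for i != j, even though they are
# conditionally independent given the class. Here theta is 1 or 5 with
# equal probability, so Var[mu(theta)] = (1 + 25)/2 - 3^2 = 4.

def poisson_draw(lam, rng):
    """Poisson sample by inversion; fine for small lam."""
    term = exp(-lam)
    k, target, cum = 0, rng.random(), term
    while target > cum and k < 1000:
        k += 1
        term *= lam / k
        cum += term
    return k

rng = random.Random(1)
xi, xj = [], []
for _ in range(300_000):
    theta = 1.0 if rng.random() < 0.5 else 5.0   # unknown risk class
    xi.append(poisson_draw(theta, rng))          # claim count in year i
    xj.append(poisson_draw(theta, rng))          # claim count in year j

m = len(xi)
mean_i = sum(xi) / m
mean_j = sum(xj) / m
cov_ij = sum((a - mean_i) * (b - mean_j) for a, b in zip(xi, xj)) / m
```

The sample covariance lands near 4, the variance of the hypothetical means, not near 0.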

P = a + Z X̄ = (1−Z)μ + Z X̄

Let's look at the final formula:

  P           =  Z × X̄                   +  (1−Z) × μ
  renewal        risk-specific               global mean
  premium        sample mean

Here P is the renewal premium rate during year n+1 for a policyholder whose sub-risk class is unknown to us. X̄ is the sample mean of the claims incurred by the same policyholder (hence the same sub-risk class) during years 1, 2, ..., n. μ is the mean claim cost of all the sub-risks combined.

If we apply this formula to set the renewal premium rate for Adam for Year 3, then the formula becomes:

  P_Adam      =  Z × X̄_Adam              +  (1−Z) × μ
  renewal        Adam's risk-specific        global mean
  premium        sample mean

At first, the above formula may seem counter-intuitive. If we are interested only in Adam's claim cost in Year 3, why not set Adam's renewal premium for Year 3 equal to his prior two-year average claim X̄ (so P = X̄)? Why do we need to drag in μ, the global average, which includes the claim costs incurred by Bob and Colleen?

Actually, it's a blessing that the renewal premium formula includes μ. X̄ varies widely based on your sample size, yet state insurance departments generally want the renewal premium to be both stable and responsive to the past claim data. If your renewal premium P is set to X̄, then P will fluctuate wildly depending on the sample size. Then you'll have a difficult time getting your renewal rates approved by state insurance departments.

In addition, you may have P = X̄ = 0; this is the case for Adam. You'll provide free insurance to the policyholder who has not incurred any claim yet. This certainly doesn't make any sense.

Finally, since X̄_Adam < X̄_Bob, the renewal premium formula P = (1−Z)μ + Z X̄ will produce P_Adam < P_Bob.

There are other ways to derive the Bühlmann credibility formula. For example, instead of minimizing E[ (a + Z X̄ − X_{n+1})² ], we can minimize E[ (a + Z X̄ − μ(θ))² ].

If we knew that the policyholder belongs to sub-risk class θ, then we could set our renewal premium for year n+1 equal to his conditional mean claim cost μ(θ) = E(X_{n+1}|θ) = E(X₁|θ) = E(X₂|θ) = ... = E(X_n|θ). However, we don't know θ. As a result, we estimate μ(θ) by minimizing E[ (a + Z X̄ − μ(θ))² ]. Applying the least squares result again:

Z = Cov(X̄, μ(θ)) / Var(X̄)

Cov(X̄, μ(θ)) = Cov[ (1/n)(X₁ + X₂ + ... + X_n), μ(θ) ] = (1/n) Cov[ (X₁ + X₂ + ... + X_n), μ(θ) ]
  = (1/n)[ Cov(X₁, μ(θ)) + Cov(X₂, μ(θ)) + ... + Cov(X_n, μ(θ)) ]

Cov(X_i, μ(θ)) = E[ X_i μ(θ) ] − E(X_i) E[ μ(θ) ]

E[ X_i μ(θ) ] = E_θ{ E[ X_i μ(θ) | θ ] }. For a fixed θ, μ(θ) is a constant. Hence

E[ X_i μ(θ) | θ ] = μ(θ) E(X_i | θ) = μ²(θ)

E[ X_i μ(θ) ] = E_θ[ μ²(θ) ]

E(X_i) E[ μ(θ) ] = { E_θ[ μ(θ) ] }²

Cov(X_i, μ(θ)) = E_θ[ μ²(θ) ] − { E_θ[ μ(θ) ] }² = Var[ μ(θ) ]

Cov(X̄, μ(θ)) = (1/n)[ Cov(X₁, μ(θ)) + Cov(X₂, μ(θ)) + ... + Cov(X_n, μ(θ)) ] = (1/n) n Var[μ(θ)] = Var[μ(θ)]

This is the same as Cov(X̄, X_{n+1}). So the same credibility factor Z = n/(n+k) results whether E[ (a + Z X̄ − X_{n+1})² ] or E[ (a + Z X̄ − μ(θ))² ] is to be minimized:

Var(X̄) = (1/n) E[ Var(X|θ) ] + Var[ μ(θ) ]

Z = Cov(X̄, μ(θ)) / Var(X̄) = Var[μ(θ)] / { Var[μ(θ)] + (1/n)E[Var(X|θ)] } = n / (n + k)

a = E[ μ(θ) ] − Z E(X̄) = μ − Zμ = (1−Z)μ

a + Z X̄ = (1−Z)μ + Z X̄ = Z X̄ + (1−Z)μ

There's a third approach to deriving Bühlmann's credibility formula. Instead of minimizing E[ (a + Z X̄ − X_{n+1})² ] or E[ (a + Z X̄ − μ(θ))² ], we can minimize E[ (a + Z X̄ − E(X_{n+1}|X₁, X₂, ..., X_n))² ].

Here X_{n+1}|X₁, X₂, ..., X_n represents the claim cost at year n+1 of the policyholder who has incurred claims X₁, X₂, ..., X_n in years 1 through n. The conditioning emphasizes that the claim amounts X₁, X₂, ..., X_n, X_{n+1} are from the same sub-class θ. This condition must hold for the Bühlmann credibility formula to be valid. For example, if X_{n+1} comes from sub-class θ₁ and X₁, X₂, ..., X_n from sub-class θ₂, then the Bühlmann credibility formula will not hold true.

However, the requirement that the claim amounts X₁, X₂, ..., X_n, X_{n+1} are from the same sub-class shouldn't bother us at all. At the very beginning, when we presented the Bühlmann credibility formula, we already used X₁, X₂, ..., X_n, X_{n+1} to refer to the claims incurred by the same policyholder, whose sub-risk class is θ. As a result, minimizing E[ (a + Z X̄ − E(X_{n+1}|X₁, X₂, ..., X_n))² ] produces the same credibility premium.

Key Points

We can derive the Bühlmann credibility formula by minimizing any of the following three terms:

E[ (a + Z X̄ − X_{n+1})² ],  E[ (a + Z X̄ − μ(θ))² ],  E[ (a + Z X̄ − E(X_{n+1}|X₁, X₂, ..., X_n))² ]

The Bühlmann credibility premium is the least squares linear estimator of any of the following three quantities:

X_{n+1}, the claim amount in year n+1 incurred by the policyholder who has incurred claims X₁, X₂, ..., X_n in years 1, 2, ..., n.

μ(θ), the mean claim cost of the policyholder's (unknown) sub-class θ.

E(X_{n+1}|X₁, X₂, ..., X_n), the Bayesian premium, given we have observed that the same policyholder has incurred claim costs X₁, X₂, ..., X_n in years 1, 2, ..., n respectively.

Even though we have derived the Bühlmann credibility formula assuming X is the claim cost, the Bühlmann credibility formula works if X is any other quantity such as the loss ratio, the aggregate loss amount, or the number of claims.

Popularity of the Bühlmann credibility formula

The Bühlmann credibility formula is popular due to its simplicity. The renewal premium is the weighted average of the uniform group rate and the sample mean of the past claims. The renewal premium is easy to calculate and easy to explain to clients.

In contrast, Bayesian premiums (the posterior means) are often difficult to calculate, requiring knowledge of prior distributions and involving complex integrations.

Next, let's derive a special case of the Bühlmann credibility formula. This special case is presented in Loss Models.

Special case

If E(X_i) = μ, Var(X_i) = σ², and Cov(X_i, X_j) = ρσ² for i ≠ j, where ρ is the correlation coefficient, then:

Z = Cov(X̄, X_{n+1}) / Var(X̄)

Cov(X̄, X_{n+1}) = (1/n) n ρσ² = ρσ²

Var(X̄) = (1/n²) Var(X₁ + X₂ + ... + X_n) = (1/n²)[ nσ² + n(n−1)ρσ² ] = (σ²/n)[ 1 + (n−1)ρ ]

Z = ρσ² / { (σ²/n)[ 1 + (n−1)ρ ] } = nρ / [ 1 + (n−1)ρ ]

a = (1−Z)μ = { 1 − nρ/[1 + (n−1)ρ] } μ = (1−ρ)μ / [ 1 + (n−1)ρ ]

Z X̄ + (1−Z)μ = { nρ / [1 + (n−1)ρ] } X̄ + (1−ρ)μ / [1 + (n−1)ρ]
  = { ρ / [1 + (n−1)ρ] } Σ X_i + (1−ρ)μ / [1 + (n−1)ρ]

You don't need to memorize the Bühlmann credibility premium formula for this special case. If you understand how to derive the general Bühlmann credibility premium formula, you can derive the special-case formula any time by setting Cov(X_i, X_j) = ρσ².

Next, let's turn our attention toward how to solve a Bühlmann credibility problem on the exam.

Step 1  Identify the sub-classes θ and the probability that a randomly chosen policyholder belongs to each sub-class.

Step 2  For each sub-class θ, calculate the average claim cost (or loss ratio, aggregate claim, etc.) μ(θ) = E(X|θ); calculate the variance of the claim cost Var(X|θ).

Step 3  Calculate the expected process variance EV = E_θ[ Var(X|θ) ] and the variance of the hypothetical means VE = Var_θ[ E(X|θ) ].

Step 4  Calculate k = EV/VE and Z = n/(n+k).

Step 5  Calculate μ = E_θ[ E(X|θ) ], the mean claim cost of all the sub-risks combined. This is the uniform group premium rate you would charge under the classical theory of insurance.

Step 6  Calculate the sample mean X̄ = (1/n) Σ X_i.

Step 7  Calculate the premium P = Z X̄ + (1−Z)μ, the weighted average of the sample mean and the uniform group rate.

(Nov 2003 #23)

You are given:

Two risks have the following severity distributions:

  Amount of Claim    Probability of Claim       Probability of Claim
                     Amount for Risk 1          Amount for Risk 2
  250                0.5                        0.7
  2,500              0.3                        0.2
  60,000             0.2                        0.1

Risk 1 is twice as likely to be observed as Risk 2. A claim of 250 is observed.

Determine the Bühlmann credibility estimate of the second claim amount from the same risk.

Solution

This is a typical problem for Exam C. Here policyholders are from two risk classes. Even though the problem doesn't say that Risk 1 and Risk 2 are two sub-risks of a similar bigger risk group (i.e., a homogeneous group), we should assume so. Otherwise, the Bühlmann credibility formula won't work. Remember the Bühlmann credibility premium is the weighted average of the uniform group rate μ and the risk-specific sample mean X̄. If Risk 1 and Risk 2 are not sub-risks of a homogeneous group, then the uniform group rate μ doesn't exist; we have no way of calculating Z X̄ + (1−Z)μ.

The problem says that a claim of 250 is observed. This means that a policyholder of an unknown sub-class has incurred a claim of X₁ = $250. Since Risk 1 is twice as likely as Risk 2, the $250 claim has a 2/3 chance of coming from Risk 1 and a 1/3 chance of coming from Risk 2. The question asks us to estimate the next claim amount X₂ incurred by the same policyholder.

  Amount of Claim    Probability of Claim       Probability of Claim
                     Amount for Risk 1          Amount for Risk 2
  250                0.5                        0.7
  2,500              0.3                        0.2
  60,000             0.2                        0.1

E(X|risk 1) = 250(0.5) + 2,500(0.3) + 60,000(0.2) = 12,875
E(X|risk 2) = 250(0.7) + 2,500(0.2) + 60,000(0.1) = 6,675

μ = E(X) = P(risk 1) E(X|risk 1) + P(risk 2) E(X|risk 2) = (2/3)(12,875) + (1/3)(6,675) = 10,808.33

VE = P(risk 1) E²(X|risk 1) + P(risk 2) E²(X|risk 2) − μ²
  = (2/3)(12,875²) + (1/3)(6,675²) − 10,808.33² = 8,542,222.22

E(X²|risk 1) = 250²(0.5) + 2,500²(0.3) + 60,000²(0.2) = 721,906,250
Var(X|risk 1) = E(X²|risk 1) − E²(X|risk 1) = 721,906,250 − 12,875² = 556,140,625

E(X²|risk 2) = 250²(0.7) + 2,500²(0.2) + 60,000²(0.1) = 361,293,750
Var(X|risk 2) = E(X²|risk 2) − E²(X|risk 2) = 361,293,750 − 6,675² = 316,738,125

EV = P(risk 1) Var(X|risk 1) + P(risk 2) Var(X|risk 2)
  = (2/3)(556,140,625) + (1/3)(316,738,125) = 476,339,791.67

k = EV/VE = 476,339,791.67 / 8,542,222.22 = 55.76

Z = n/(n+k) = 1/(1 + 55.76) = 1.76%

P = Z X₁ + (1−Z)μ = 0.0176(250) + 0.9824(10,808.33) ≈ 10,622
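The arithmetic above can be reproduced with a short script (a sketch; the variable names are mine):

```python
# Reproduce the Nov 2003 #23 numbers computed in the solution above.
amounts = [250, 2500, 60000]
p1 = [0.5, 0.3, 0.2]     # Risk 1 severity probabilities
p2 = [0.7, 0.2, 0.1]     # Risk 2 severity probabilities
w1, w2 = 2 / 3, 1 / 3    # Risk 1 is twice as likely as Risk 2

def mean(p):
    return sum(a * q for a, q in zip(amounts, p))

def var(p):
    m = mean(p)
    return sum(a * a * q for a, q in zip(amounts, p)) - m * m

mu = w1 * mean(p1) + w2 * mean(p2)                        # 10,808.33
ve = w1 * mean(p1) ** 2 + w2 * mean(p2) ** 2 - mu ** 2    # 8,542,222.22
ev = w1 * var(p1) + w2 * var(p2)                          # 476,339,791.67
k = ev / ve                                               # 55.76
z = 1 / (1 + k)                                           # 1.76%
premium = z * 250 + (1 - z) * mu                          # about 10,622
```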

Next, I want to emphasize an important point. In the Bühlmann credibility premium formula, what matters is the sample mean X̄ = (1/n)(X₁ + X₂ + ... + X_n), not the individual claims data X₁, X₂, ..., X_n. For example, for n = 3, the claim histories (X₁, X₂, X₃) = (0, 3, 6), (X₁, X₂, X₃) = (1, 7, 1), and (X₁, X₂, X₃) = (3, 3, 3) all have the same sample mean X̄ = 3, so each produces the same premium P = Z X̄ + (1−Z)μ = 3Z + (1−Z)μ.

Shortcut

We can rewrite the Bühlmann credibility premium formula as:

P = Z X̄ + (1−Z)μ = [n/(n+k)] X̄ + [k/(n+k)] μ = (kμ + nX̄)/(n+k) = (kμ + Σ X_i)/(n+k)

We can interpret k = EV/VE as the number of samples taken out of the global mean μ.

Imagine we have two urns, A and B. A contains an infinite number of identical balls, each marked with the number μ. B contains an infinite number of identical balls, each marked with the number X̄. You take out k balls from Urn A and n balls from Urn B. Then the average value per ball is:

P = (kμ + nX̄)/(n+k) = (kμ + Σ X_i)/(n+k)
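The shortcut lends itself to a small helper function (illustrative; the function name is mine). Note that the different mean-3 claim histories mentioned earlier produce identical premiums:

```python
# Helper implementing the shortcut P = (k*mu + sum of claims) / (n + k).
def buhlmann_premium(mu, k, claims):
    n = len(claims)
    return (k * mu + sum(claims)) / (n + k)

# Different claim histories with the same sample mean -> same premium
# (mu = 2.0 and k = 1.6 are made-up illustration values):
premium_a = buhlmann_premium(2.0, 1.6, [0, 3, 6])
premium_b = buhlmann_premium(2.0, 1.6, [1, 7, 1])
```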

Practice problems

Q1  You are an actuary working on group health insurance pricing. You want to use the Bühlmann credibility premium formula P = Z X̄ + (1−Z)μ to set the renewal premium rate for a policy. One day the vice president of your company stops by. He has a Ph.D. in statistics and is widely regarded as an expert on the central limit theorem. He asks you to throw the formula P = Z X̄ + (1−Z)μ into the trash can and focus on μ.

"All we care about is μ. As long as we charge each policyholder μ, we'll be okay," the vice president says. "The fundamental concept of insurance is that many people form a group to share the risk. If we charge μ, the law of large numbers will work its magic and we'll be able to collect enough premiums to pay our guarantees."

How do you respond?

Solution

According to the law of large numbers, for a homogeneous group of policyholders, we can set the premium rate equal to the average claim cost μ = E(X). Some policyholders will suffer losses greater than E(X), while others will suffer losses less than E(X). However, on average, insurance companies will have collected just enough premiums to offset the losses. As long as each policyholder pays μ, the insurer will be solvent.

However, in practice, insurance companies can't charge every policyholder μ. Members of a so-called homogeneous risk group are really different risks. Policyholders of different risks can shop around and compare premium rates. If any policyholder believes that his premium is too high, he can terminate his policy and buy cheaper insurance elsewhere.

If an insurer charges μ to similar yet different risks, good risks will stop doing business with the insurer and buy cheaper insurance elsewhere; only bad risks will remain in the insurer's book of business. As more and more good risks leave the insurer's book of business, the actual expected claim cost will exceed the original average premium rate μ. Then the insurer has to increase μ, causing more policyholders to terminate their policies. Gradually, the insurer's customer base will shrink and the insurer will go bankrupt.

Q2

Compare and contrast the classical theory of insurance and the credibility theory

of insurance.

Solution

Is there a homogeneous group?
  Classical theory of insurance: Yes. This is the foundation of insurance. Identical risks form a homogeneous group to share risks.
  Credibility theory: Each member of a seemingly homogeneous group belongs to a sub-class. The insurer doesn't know who belongs to which sub-class.

Are the claim random variables X of different members of a group independent identically distributed?
  Classical theory of insurance: Yes. Since each member of a homogeneous group has identical risk, each member's claim random variable is independent identically distributed at all times.
  Credibility theory: No. Since members of a similar risk group are actually of different sub-risk classes, only claims incurred by the same sub-class are independent identically distributed.

What's the fair premium rate?
  Classical theory of insurance: The fair premium is E(X) = μ, where X is the random loss variable of any policyholder in the homogeneous group. Every member of a homogeneous group needs to pay μ, the uniform group pure premium rate.
  Credibility theory: The fair premium is E(X|Θ = θ) = μ(θ), which is the mean claim cost of the sub-class θ. Every member of the same sub-class θ needs to pay μ(θ).

Q3  One day you visited your college statistics professor. He asked what you were doing in your job. You told him that you used the Bühlmann credibility premium formula to set the renewal premium for group health insurance policies. The Bühlmann credibility theory was new to the professor. After listening to your explanation of the formula P = Z X̄ + (1−Z)μ, he looked puzzled. He told you that for 20 years he had been telling his students that X̄ is the unbiased estimator of E(X). "I don't get it. Why don't you just set P = X̄?"

Explain to your professor why you don't simply set P = X̄.

Solution

Your stats professor is perfectly correct in saying that the sample mean is an unbiased estimator of the population mean. If the number of observations n is large (so we have observed claims X₁, X₂, ..., X_n), then for any policyholder, setting his renewal premium equal to his prior average claim is a good idea.

In reality, however, it's hard to implement the idea P = X̄. Often you, as an insurer, have to set the renewal premium with limited data (so n may be small). For a small n, X̄ may not be a good estimate of E(X). In addition, we may have a weird situation where X̄ = 0. In our taxi driver insurance example, if you use P = X̄ to set the renewal premium for Adam, you'll get P = 0. This clearly doesn't make any sense.

Q4  (Nov 2005 #26)

For each policyholder, losses X₁, X₂, ..., X_n, conditional on Θ, are independent identically distributed with mean

  μ(θ) = E(X_j | Θ = θ),  j = 1, 2, ..., n

and variance

  v(θ) = Var(X_j | Θ = θ),  j = 1, 2, ..., n

The Bühlmann credibility factor assigned for estimating X₅ based on X₁, X₂, X₃, X₄ is Z = 0.4. The expected value of the process variance is known to be 8. Calculate Cov(X_i, X_j), i ≠ j.

Solution

As derived earlier, Cov(X_i, X_j) = Var[μ(Θ)] = VE for i ≠ j, and

Z = Cov(X̄, X_{n+1}) / Var(X̄) = VE / [ VE + (1/n)EV ]

We are told that n = 4 (we have four years of claim data), Z = 0.4, and EV = 8.

0.4 = VE / (VE + 8/4) = VE / (VE + 2)

Solving, VE = 0.8/0.6 = 1.33.

Cov(X_i, X_j) = VE = 1.33
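A quick numeric confirmation of the solved value (a sketch; variable names are mine):

```python
# With n = 4 and EV = 8, the stated Z = 0.4 pins down VE:
# solve z = ve / (ve + ev/n) for ve.
ev, n, z = 8.0, 4, 0.4
ve = z * (ev / n) / (1 - z)   # = 4/3, the covariance Cov(Xi, Xj)
```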

Q5  (Nov 2005 #19)

For a portfolio of independent risks, the number of claims for each risk in a year follows a Poisson distribution with means given in the following table:

  Class    Mean number of claims    # of risks
  1        1                        900
  2        10                       90
  3        20                       10

A randomly selected risk has x claims in Year 1. The Bühlmann credibility estimate of the number of claims for the same risk in Year 2 is 11.983. Determine x.

Solution

The problem states that x claims in Year 1 have been observed for a randomly selected risk. The wording "a randomly selected risk" is needed because in order for the Bühlmann credibility formula to work, the risk class must be unknown to us. If we already knew the risk class, we could calculate the expected number of claims in Year 2 directly; we wouldn't need to estimate it any more.

Please also pay attention to the wording "the Bühlmann credibility estimate of the number of claims for the same risk in Year 2 is ...". In order for the Bühlmann credibility formula to work, the renewal premium (or the expected number of claims in this problem) in year n+1 and the prior n years of claims X₁, X₂, ..., X_n must refer to the same (unknown) risk class.

And now back to the problem. Let Y represent the number of claims incurred in a year by a randomly chosen risk. Since Y given λ is a Poisson random variable, E(Y|λ) = Var(Y|λ) = λ.

  Class    λ = E(Y|λ) = Var(Y|λ)    # of risks    P(λ)
  1        1                         900           90%
  2        10                        90            9%
  3        20                        10            1%
  Total                              1,000         100%

μ = E(Y) = E_λ[ E(Y|λ) ] = Σ P(λ) λ = 1(90%) + 10(9%) + 20(1%) = 2

EV = E_λ[ Var(Y|λ) ] = E(λ) = 2

VE = Var_λ[ E(Y|λ) ] = Var(λ) = E(λ²) − E²(λ) = 1²(90%) + 10²(9%) + 20²(1%) − 2² = 13.9 − 4 = 9.9

k = EV/VE = 2/9.9

P = (nȲ + kμ)/(n + k) = [ x + (2/9.9)(2) ] / (1 + 2/9.9) = 11.983

Solving, x = 14.
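The back-solve for x can be verified numerically (a sketch):

```python
# Recover x from the stated Buhlmann estimate 11.983.
lams = [1.0, 10.0, 20.0]
probs = [0.9, 0.09, 0.01]

mu = sum(l * p for l, p in zip(lams, probs))                 # 2.0
ev = mu                                                      # Poisson: EV = E(lambda)
ve = sum(l * l * p for l, p in zip(lams, probs)) - mu ** 2   # 9.9
k = ev / ve

x = 11.983 * (1 + k) - k * mu   # invert P = (x + k*mu) / (1 + k)
```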

Q6  (Nov 2005 #7)

For a portfolio of policies, you are given:

The annual claim amount on a policy has probability density function:

  f(x|θ) = 2x/θ²,  0 < x < θ

The prior distribution of θ has density function:

  π(θ) = 4θ³,  0 < θ < 1

A randomly selected policy had claim amount 0.1 in Year 1.

Determine the Bühlmann credibility estimate of the claim amount for the selected risk in Year 2.

Solution

E(X|θ) = ∫₀^θ x f(x|θ) dx = ∫₀^θ x (2x/θ²) dx = (2/θ²)(θ³/3) = (2/3)θ

A common mistake is to write E(X|θ) = ∫ x f(x|θ) dθ. Wrong! For a given θ, the integration should be with respect to x, not θ.

So μ(θ) = E(X|θ) = (2/3)θ.

E(X²|θ) = ∫₀^θ x² f(x|θ) dx = ∫₀^θ x² (2x/θ²) dx = (2/θ²)(θ⁴/4) = θ²/2

Var(X|θ) = E(X²|θ) − E²(X|θ) = θ²/2 − (4/9)θ² = θ²/18

Next, we need the moments of θ under π(θ) = 4θ³, 0 < θ < 1:

E(θ) = ∫₀¹ θ (4θ³) dθ = 4/5,  E(θ²) = ∫₀¹ θ² (4θ³) dθ = 4/6 = 2/3

Var(θ) = E(θ²) − E²(θ) = 2/3 − (4/5)² = 2/3 − 16/25 = 2/75

μ = E[ E(X|θ) ] = E[ (2/3)θ ] = (2/3)(4/5) = 8/15

EV = E[ Var(X|θ) ] = E[ θ²/18 ] = (1/18)(2/3) = 1/27

VE = Var[ E(X|θ) ] = Var[ (2/3)θ ] = (4/9)Var(θ) = (4/9)(2/75) = 8/675 = 0.01185

k = EV/VE = (1/27) / (8/675) = 3.125

The above fraction is complex. We don't want to bother expressing k as a neat fraction; trying to express k as a neat fraction is prone to errors.

P = (kμ + Σ X_i)/(n + k) = (kμ + X₁)/(1 + k) = [ 3.125(8/15) + 0.1 ] / (1 + 3.125) = 0.428

3.125

E(X

)=

x f (x

) dx =

2x

= E(X ) = E E(X

dx =

2 x2

dx =

= E(X

)+ (

)d =

VE = Var

E(X

E(X

)

1

=

1

=E

E(X

E(X

+(

=0

VE = VE

EV = E Var ( X

E X2

E(X

= x2 f ( x

EV = E Var ( X

d =

16

9

) dx = x 2

=

1

5

=0

= +(

0

)

)

8

=

27

+(

Var ( X

d =

2 4

3 5

5

0

8

15

)=

)

1

2

)d

2

3

1

d = 4

0

= 0.01185

2 1 4

x

2

4

dx =

8

15

E2 ( X

E2 ( X

(as before)

16 1

8

=

9 6

27

2 x3

dx =

)= E X2

d =

) = E(X

2x

2

3

{E E ( X ) }

2

3

)d

Var ( X

Var ( X

2

3

{E E ( X ) }

=0

2

3

=0

2 1 3

x

2

3

1

18

1

18

2

1

2

d =

4 1

18 6

4 1

EV 18 6

k=

=

= 3.125

VE 0.01185

k +

P=

Xi

i =1

n+k

k + X1

=

1+ k

8

+ 0.1

15

= 0.428

1 + 3.125

3.125
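The moments and the final estimate can be recomputed numerically (a sketch; E[θ^j] = 4/(j+4) follows from π(θ) = 4θ³ on (0,1)):

```python
# Check of Q6 using E[theta^j] = 4/(j + 4) under pi(theta) = 4*theta^3.
e_t = 4 / 5         # E[theta]
e_t2 = 4 / 6        # E[theta^2]

mu = (2 / 3) * e_t                   # E[X] = E[(2/3)*theta] = 8/15
ev = e_t2 / 18                       # E[theta^2 / 18] = 1/27
ve = (4 / 9) * (e_t2 - e_t ** 2)     # Var((2/3)*theta) = 8/675

k = ev / ve                          # 3.125
z = 1 / (1 + k)
premium = z * 0.1 + (1 - z) * mu     # about 0.428
```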

Q7  (May 2005 #20)

For a particular policy, the conditional probability of the annual number of claims given Θ = θ, and the probability distribution of Θ, are as follows:

  Number of claims    0      1     2
  Probability         2θ     θ     1 − 3θ

  θ                   0.05   0.30
  Probability         0.80   0.20

Two claims are observed in Year 1. Calculate the Bühlmann credibility estimate of the number of claims in Year 2.

Solution

Let X represent the annual number of claims.

E(X|θ) = 0(2θ) + 1(θ) + 2(1 − 3θ) = 2 − 5θ
E(X²|θ) = 0²(2θ) + 1²(θ) + 2²(1 − 3θ) = 4 − 11θ
Var(X|θ) = E(X²|θ) − E²(X|θ) = 4 − 11θ − (2 − 5θ)² = 9θ − 25θ²

μ = E[ E(X|θ) ] = E(2 − 5θ) = 2 − 5E(θ)
VE = Var[ E(X|θ) ] = Var(2 − 5θ) = Var(5θ) = 25Var(θ)
EV = E[ Var(X|θ) ] = E(9θ − 25θ²) = 9E(θ) − 25E(θ²)

E(θ) = 0.05(0.8) + 0.3(0.2) = 0.1
E(θ²) = 0.05²(0.8) + 0.3²(0.2) = 0.02
Var(θ) = 0.02 − 0.1² = 0.01

μ = 2 − 5E(θ) = 2 − 5(0.1) = 1.5
VE = 25Var(θ) = 25(0.01) = 0.25
EV = 9E(θ) − 25E(θ²) = 9(0.1) − 25(0.02) = 0.4

k = EV/VE = 0.4/0.25 = 1.6

P = (kμ + Σ X_i)/(n + k) = (kμ + X₁)/(1 + k) = [ 1.6(1.5) + 2 ] / (1 + 1.6) = 1.69
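A short numeric check of the solution (a sketch):

```python
# Check of Q7's moments and premium.
thetas = [0.05, 0.30]
probs = [0.80, 0.20]

e_t = sum(t * p for t, p in zip(thetas, probs))        # 0.1
e_t2 = sum(t * t * p for t, p in zip(thetas, probs))   # 0.02

mu = 2 - 5 * e_t                  # E[2 - 5*theta] = 1.5
ve = 25 * (e_t2 - e_t ** 2)       # Var(2 - 5*theta) = 0.25
ev = 9 * e_t - 25 * e_t2          # E[9*theta - 25*theta^2] = 0.4

k = ev / ve                       # 1.6
premium = (k * mu + 2) / (1 + k)  # two claims in Year 1 -> about 1.69
```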

Q8  (May 2005 #17)

You are given:

The annual number of claims on a given policy has a geometric distribution with parameter β. The prior distribution of β has the Pareto density function

  π(β) = α / (β + 1)^(α+1),  0 < β < ∞

A randomly selected policy has x claims in Year 1.

Calculate the Bühlmann credibility estimate of the number of claims for the selected policy in Year 2.

Solution

Let X represent the annual number of claims on a randomly selected policy. Here the risk factor is β. The conditional random variable X|β has a geometric distribution. If you look up Tables for Exam C/4, you'll find that a geometric random variable N with parameter β has mean and variance:

E(N) = β,  Var(N) = β(1 + β)

The conditional mean: E(X|β) = β. The conditional variance: Var(X|β) = β(1 + β). So

EV = E_β[ Var(X|β) ] = E_β[ β(1 + β) ] = E(β) + E(β²)
VE = Var_β[ E(X|β) ] = Var(β)
μ = E_β[ E(X|β) ] = E(β)

We are told that the prior distribution of β has the Pareto density function π(β) = α/(β+1)^(α+1), 0 < β < ∞. Here the phrase "prior distribution" refers to the fact that we know π(β) prior to our observation of x claims in Year 1. In other words, π(β) hasn't incorporated our observation of x claims in Year 1 yet. Please note that the prior distribution, not the posterior distribution, is used in Bühlmann's credibility estimate.

Frankly, I think SOA's emphasis that π(β) is a prior (as opposed to posterior) distribution is unnecessary and really meant to scare exam candidates. When we talk about a density function, we always refer to the prior distribution, so there's never a need to say "prior distribution." If we want to refer to a distribution that has incorporated our recent observations, at that time we say "posterior distribution."

Back to the problem. We are told that β has a Pareto distribution. Is it a one-parameter Pareto or a two-parameter Pareto? Many candidates have trouble knowing which one to use. Here is a simple rule:

To decide whether to use the one-parameter Pareto or the two-parameter Pareto, look at your random variable X. If X is greater than a positive constant, use the single-parameter Pareto; if X is greater than zero, use the two-parameter Pareto:

  If X > θ (a positive constant):  f(x) = αθ^α / x^(α+1)
  If X > 0:  f(x) = αθ^α / (x + θ)^(α+1)

In this problem, the Pareto random variable β > 0. So we should use the two-parameter Pareto formulas in Tables for Exam C/4:

E(X^k) = θ^k k! / [ (α−1)(α−2)...(α−k) ]

E(X) = θ/(α−1),  E(X²) = 2θ² / [ (α−1)(α−2) ]

Var(X) = E(X²) − E²(X) = 2θ²/[(α−1)(α−2)] − θ²/(α−1)² = αθ² / [ (α−1)²(α−2) ]

Since the two-parameter Pareto is frequently tested in Exam C, you might want to memorize these formulas.

Comparing π(β) = α/(β+1)^(α+1) with f(x) = αθ^α/(x+θ)^(α+1), we see the two parameters are α and θ = 1. So we have:

E(β) = 1/(α−1),  E(β²) = 2/[(α−1)(α−2)],  Var(β) = α/[(α−1)²(α−2)]

EV = E(β) + E(β²) = 1/(α−1) + 2/[(α−1)(α−2)] = α / [ (α−1)(α−2) ]

VE = Var(β) = α / [ (α−1)²(α−2) ]

μ = E(β) = 1/(α−1)

k = EV/VE = { α/[(α−1)(α−2)] } / { α/[(α−1)²(α−2)] } = α − 1

P = (kμ + Σ X_i)/(n + k) = (kμ + X₁)/(1 + k) = [ (α−1)/(α−1) + x ] / [ 1 + (α−1) ] = (x + 1)/α
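Since the final answer (x + 1)/α is symbolic, a spot check with a hypothetical α = 3 and x = 5 (my own choices, not from the problem) confirms the closed form:

```python
# Spot check of P = (x + 1)/alpha using two-parameter Pareto moments
# with theta = 1 and a hypothetical alpha = 3, x = 5.
alpha, x = 3.0, 5.0

e_b = 1 / (alpha - 1)                   # E[beta]
e_b2 = 2 / ((alpha - 1) * (alpha - 2))  # E[beta^2]
var_b = e_b2 - e_b ** 2

ev = e_b + e_b2     # geometric: Var(N | beta) = beta * (1 + beta)
ve = var_b
k = ev / ve         # alpha - 1

premium = (k * e_b + x) / (1 + k)
```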


Q9  (May 2005 #11)

You are given:

The number of claims in a year for a selected risk follows a Poisson distribution with mean λ. The severity of claims for the selected risk follows an exponential distribution with mean θ. The number of claims is independent of the severity of claims. The prior distribution of λ is exponential with mean 1. The prior distribution of θ is Poisson with mean 1. A priori, λ and θ are independent.

Using the Bühlmann credibility for aggregate losses, determine k.

Solution

Let N represent the annual number of claims for a randomly selected risk. Let X represent the loss dollar amount per loss incident. Let S represent the aggregate annual claim dollar amount incurred by a risk. Then

S = Σ X_i = X₁ + X₂ + ... + X_N

N|λ is Poisson: P(N = n|λ) = e^(−λ) λⁿ/n!  (n = 0, 1, 2, ...). Here λ is an exponential random variable with pdf f(λ) = e^(−λ). We have E(λ) = 1 and E(λ²) = 2.

X|θ is exponential with mean θ: f_X(x|θ) = (1/θ)e^(−x/θ). Here θ is a Poisson random variable with pdf f(θ) = e^(−1)/θ!, so E(θ) = Var(θ) = 1. Hence E(θ²) = E²(θ) + Var(θ) = 1 + 1 = 2.

For the aggregate loss:

E(S) = E(N)E(X),  Var(S) = E(N)Var(X) + Var(N)E²(X)

To remember that you need to use E²(X), not E(X), in the Var(S) formula, note that Var(S) is in dollars squared. If you used Var(N)E(X), you'd get dollars, not dollars squared. As a result, you need to use Var(N)E²(X).

For a fixed pair (λ, θ):

E(S|λ, θ) = E(N|λ)E(X|θ) = λθ

Var(S|λ, θ) = E(N|λ)Var(X|θ) + Var(N|λ)E²(X|θ) = λθ² + λθ² = 2λθ²

(For an exponential severity, Var(X|θ) = θ² = E²(X|θ).)

EV = E_{λ,θ}[ Var(S|λ, θ) ] = E(2λθ²) = 2E(λ)E(θ²) = 2(1)(2) = 4

VE = Var_{λ,θ}[ E(S|λ, θ) ] = Var(λθ) = E(λ²θ²) − E²(λθ) = E(λ²)E(θ²) − [ E(λ)E(θ) ]² = 2(2) − 1² = 3

k = EV/VE = 4/3
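A one-screen check of EV, VE, and k (a sketch):

```python
# Check of Q9: EV = E[2*lambda*theta^2] and VE = Var(lambda*theta),
# with lambda ~ Exponential(mean 1) and theta ~ Poisson(mean 1).
e_lam, e_lam2 = 1.0, 2.0     # exponential(1): E = 1, second moment = 2
e_th, e_th2 = 1.0, 2.0       # Poisson(1): E = 1, E[theta^2] = 1 + 1 = 2

ev = 2 * e_lam * e_th2                      # 4
ve = e_lam2 * e_th2 - (e_lam * e_th) ** 2   # 4 - 1 = 3
k = ev / ve                                 # 4/3
```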

You are given:

- Claims are conditionally independent and identically Poisson distributed with mean λ.
- The prior distribution of λ is F(λ) = 1 − [1/(1 + λ)]^2.6, λ > 0.
- Five years of claim experience are observed.

Determine the Bühlmann credibility factor.

Solution

Guo Fall 2009 C, Page 139 / 284

Given λ, the annual claim count X is a Poisson random variable with mean λ. The conditional mean is E(X | λ) = λ, and since X | λ is Poisson, Var(X | λ) = E(X | λ) = λ. Hence

k = EV/VE = E[Var(X | λ)] / Var[E(X | λ)] = E(λ) / Var(λ)

To quickly calculate E(λ) and Var(λ), recognize the prior as a two-parameter Pareto. From Tables for Exam C/4, a Pareto random variable with parameters α and θ has

F(x) = 1 − [θ/(x + θ)]^α, where x > 0

E(X) = θ/(α − 1),  E(X²) = 2θ²/[(α − 1)(α − 2)]

Here we are given that F(λ) = 1 − [1/(λ + 1)]^2.6. So λ is a Pareto random variable with parameters θ = 1 and α = 2.6.

E(λ) = 1/(2.6 − 1) = 1/1.6

E(λ²) = 2/[(2.6 − 1)(2.6 − 2)] = 2/[1.6(0.6)]

Var(λ) = 2/[1.6(0.6)] − (1/1.6)²

k = E(λ)/Var(λ) = (1/1.6) / {2/[1.6(0.6)] − (1/1.6)²} = 0.369

Z = n/(n + k) = 5/(5 + 0.369) = 0.93
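The Pareto-moment shortcut can be verified numerically (a sketch; alpha and theta denote the Pareto parameters of the prior):

```python
# Check of k and Z using the Pareto moment formulas from the exam tables.
alpha, theta, n = 2.6, 1.0, 5

E_lam = theta / (alpha - 1)                            # E(lambda)
E_lam2 = 2 * theta ** 2 / ((alpha - 1) * (alpha - 2))  # E(lambda^2)
var_lam = E_lam2 - E_lam ** 2

# Poisson claims: EV = E(lambda), VE = Var(lambda)
k = E_lam / var_lam
Z = n / (n + k)
print(round(k, 3), round(Z, 2))   # 0.369 0.93
```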

You are given:

- Claim counts follow a Poisson distribution with mean λ.
- Claim sizes follow a lognormal distribution with parameters μ and σ.
- Claim counts and claim amounts are independent.
- The prior distribution has joint pdf: f(λ, μ, σ) = 2σ, where 0 < λ < 1, 0 < μ < 1, 0 < σ < 1.

Determine the Bühlmann credibility k for aggregate losses.

Solution

Guo Fall 2009 C, Page 140 / 284

Let N represent the claim counts, X_i the dollar amount of the i-th claim, and S the aggregate losses. N | λ has a Poisson distribution with mean λ. X_i | (μ, σ) has a lognormal distribution with parameters μ and σ. In addition, for i = 1 to N, the X_i | (μ, σ) are independent identically distributed.

S = Σ_{i=1}^{N} X_i.  If we fix λ, μ, and σ, then

E(S | λ, μ, σ) = E(N | λ) E(X | μ, σ) = λ E(X | μ, σ)

Var(S | λ, μ, σ) = E(N | λ) Var(X | μ, σ) + Var(N | λ) E²(X | μ, σ)
 = λ Var(X | μ, σ) + λ E²(X | μ, σ) = λ E(X² | μ, σ)

From Tables for Exam C/4, we know the lognormal distribution has the following moments:

E(X^k) = exp(kμ + k²σ²/2)

E(X | μ, σ) = exp(μ + σ²/2),  E(X² | μ, σ) = exp(2μ + 2σ²)

So

E(S | λ, μ, σ) = λ exp(μ + σ²/2),  Var(S | λ, μ, σ) = λ exp(2μ + 2σ²)

Next, notice that the joint pdf factors: f(λ, μ, σ) = 2σ = a(λ) b(μ) c(σ), where a(λ) = 1, b(μ) = 1, and c(σ) = 2σ, and the support is the unit cube. Consequently, λ, μ, and σ are independent, with marginal pdfs:

f(λ) = 1, 0 < λ < 1;  f(μ) = 1, 0 < μ < 1;  f(σ) = 2σ, 0 < σ < 1

We need the following expectations. (For the σ integrals, set σ² = y, so 2σ dσ = dy.)

E(λ) = 1/2,  E(λ²) = 1/3

E(e^μ) = ∫₀¹ e^μ dμ = e − 1,  E(e^(2μ)) = ∫₀¹ e^(2μ) dμ = (1/2)(e² − 1)

E(e^(0.5σ²)) = ∫₀¹ e^(0.5σ²) 2σ dσ = ∫₀¹ e^(0.5y) dy = 2(e^0.5 − 1)

E(e^(σ²)) = ∫₀¹ e^y dy = e − 1

E(e^(2σ²)) = ∫₀¹ e^(2y) dy = (1/2)(e² − 1)

Then:

EV = E[λ exp(2μ + 2σ²)] = E(λ) E(e^(2μ)) E(e^(2σ²)) = (1/2)(1/2)(e² − 1)(1/2)(e² − 1) = (1/8)(e² − 1)² = 5.103

E[E(S | λ, μ, σ)] = E(λ) E(e^μ) E(e^(0.5σ²)) = (1/2)(e − 1)·2(e^0.5 − 1) = (e − 1)(e^0.5 − 1)

E[E²(S | λ, μ, σ)] = E(λ²) E(e^(2μ)) E(e^(σ²)) = (1/3)(1/2)(e² − 1)(e − 1) = (1/6)(e² − 1)(e − 1)

VE = Var[E(S | λ, μ, σ)] = (1/6)(e² − 1)(e − 1) − [(e − 1)(e^0.5 − 1)]² = 0.5872

k = EV/VE = 5.103/0.5872 = 8.69

The same numbers can be obtained, far more slowly, by brute-force triple integration of E(S | λ, μ, σ) and Var(S | λ, μ, σ) against f(λ, μ, σ) = 2σ over the unit cube; exploiting independence is much faster.

Please note

The factoring argument needs two conditions. The joint pdf f(λ, μ, σ) must factor as a(λ) b(μ) c(σ), AND λ, μ, and σ must lie in a cube A < λ < B, C < μ < D, E < σ < F, where A, B, C, D, E, and F are constants. If the support is not a cube, for example A < λ < B, C < μ < D, e(λ) < σ < f(λ), then even if f(λ, μ, σ) = a(λ) b(μ) c(σ), the parameters λ, μ, and σ are not independent.
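The closed-form expectations above can be checked with a short script (a sketch; the variable names are mine):

```python
import math

# Check of EV, VE, k for the Poisson-lognormal problem, using the
# independence of lambda, mu, sigma under the joint pdf f = 2*sigma.
e = math.e
E_lam, E_lam2 = 0.5, 1 / 3                   # lambda ~ uniform(0,1)
E_e_mu = e - 1                               # mu ~ uniform(0,1)
E_e_2mu = (e ** 2 - 1) / 2
E_e_half_s2 = 2 * (math.exp(0.5) - 1)        # sigma has pdf 2*sigma on (0,1)
E_e_s2 = e - 1
E_e_2s2 = (e ** 2 - 1) / 2

EV = E_lam * E_e_2mu * E_e_2s2               # = (e^2 - 1)^2 / 8
first = E_lam * E_e_mu * E_e_half_s2         # E[E(S | lam, mu, sigma)]
second = E_lam2 * E_e_2mu * E_e_s2           # E[E(S | ...)^2]
VE = second - first ** 2
k = EV / VE
print(round(EV, 3), round(VE, 4), round(k, 2))   # 5.103 0.5872 8.69
```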

You are given:

- A portfolio of independent risks is divided into two classes.
- Each class contains the same number of risks.
- For each risk in Class 1, the number of claims per year follows a Poisson distribution with mean 5.
- For each risk in Class 2, the number of claims per year follows a binomial distribution with m = 8 and q = 0.55.
- A randomly selected risk has three claims in Year 1, r claims in Year 2, and four claims in Year 3.
- The Bühlmann credibility estimate for the number of claims in Year 4 for this risk is 4.6019.

Determine r.

Solution

Risk   P(Risk)   X | Risk                     E(X | Risk)     Var(X | Risk)
#1     0.5       Poisson with mean 5          5               5
#2     0.5       Binomial m = 8, q = 0.55     8(0.55) = 4.4   8(0.55)(0.45) = 1.98

μ = (1/2)(5 + 4.4) = 4.7,  EV = (1/2)(5 + 1.98) = 3.49

VE = (5 − 4.4)²(0.5²) = 0.09

k = EV/VE = 3.49/0.09 = 38.78

P = [kμ + Σ_{i=1}^{n} X_i]/(n + k):

4.6019 = [38.78(4.7) + (3 + r + 4)]/(3 + 38.78),  r = 3
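Solving the credibility equation for r can be automated (a sketch; the names are mine):

```python
# Back out r from the Buhlmann estimate 4.6019.
mu, EV = 4.7, 3.49
VE = (5 - 4.4) ** 2 * 0.5 ** 2      # 0.09
k = EV / VE                          # 38.78
n = 3                                # three observed years

# 4.6019 = (k*mu + 3 + r + 4) / (n + k)  =>  solve for r
r = 4.6019 * (n + k) - k * mu - 7
print(round(r))   # 3
```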

You are given the following information on claim frequency of auto accidents for individual drivers:

                    Rural                    Urban                    Total
Business use
  Expected claims   1.0                      2.0                      1.8
  Claim variance    0.5                      1.0                      1.06
Pleasure use
  Expected claims   1.5                      2.5                      2.3
  Claim variance    0.8                      1.0                      1.12

- Each driver's claim experience is independent of every other driver's.
- There are an equal number of business and pleasure use drivers.

Determine the Bühlmann credibility factor for a single driver.

Solution

The key to solving this problem is correctly identifying risk classes. There are four risk classes:

Θ = (BR, BU, PR, PU)
BR = Business & Rural Use, BU = Business & Urban Use
PR = Pleasure & Rural Use, PU = Pleasure & Urban Use

Next, we need to calculate the probability of Rural Use and Urban Use. From the Business Use row of the table, P(R)(1.0) + P(U)(2.0) = 1.8 and P(R) + P(U) = 1, so P(R) = 0.2 and P(U) = 0.8.

Next, we list the probability for each class:

                 Rural 0.2              Urban 0.8
Business use 0.5  P(BR) = 0.2(0.5) = 0.1  P(BU) = 0.8(0.5) = 0.4
Pleasure use 0.5  P(PR) = 0.2(0.5) = 0.1  P(PU) = 0.8(0.5) = 0.4

Let X represent the claim frequency of auto accidents of a randomly selected driver.

Class   P(Θ)   E(X | Θ)   Var(X | Θ)   E²(X | Θ)
BR      0.1    1.0        0.5          1.0
BU      0.4    2.0        1.0          4.0
PR      0.1    1.5        0.8          2.25
PU      0.4    2.5        1.0          6.25

EV = E[Var(X | Θ)] = 0.1(0.5) + 0.4(1.0) + 0.1(0.8) + 0.4(1.0) = 0.93

E[E(X | Θ)] = 0.1(1.0) + 0.4(2.0) + 0.1(1.5) + 0.4(2.5) = 2.05

E[E²(X | Θ)] = 0.1(1.0) + 0.4(4.0) + 0.1(2.25) + 0.4(6.25) = 4.425

VE = Var[E(X | Θ)] = E[E²(X | Θ)] − {E[E(X | Θ)]}² = 4.425 − 2.05² = 0.2225

k = EV/VE = 0.93/0.2225 = 4.18

Z = n/(n + k) = 1/(1 + 4.18) = 0.193
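The four-class bookkeeping is easy to verify with a small script (a sketch; the dictionary layout is mine):

```python
# Check of EV, VE, k, Z for the four driver classes.
classes = {            # class: (probability, conditional mean, conditional variance)
    "BR": (0.1, 1.0, 0.5),
    "BU": (0.4, 2.0, 1.0),
    "PR": (0.1, 1.5, 0.8),
    "PU": (0.4, 2.5, 1.0),
}
EV = sum(p * v for p, m, v in classes.values())
mean = sum(p * m for p, m, v in classes.values())
VE = sum(p * m * m for p, m, v in classes.values()) - mean ** 2
k = EV / VE
Z = 1 / (1 + k)                      # one year of observation
print(round(EV, 2), round(VE, 4), round(k, 2), round(Z, 3))   # 0.93 0.2225 4.18 0.193
```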

Chapter 6  The Bühlmann-Straub credibility model

In the Bühlmann credibility model, we focus on one policyholder. We know that this policyholder has incurred claim amounts X₁, X₂, ..., Xₙ in Years 1, 2, ..., n respectively. We want to estimate his conditional mean claim amount in Year n + 1:

E(X_{n+1} | X₁, X₂, ..., Xₙ) ≈ (1 − Z)μ + Z X̄

Now we move from the Bühlmann credibility world to a more complex, Bühlmann-Straub credibility world. Instead of looking at only one policyholder, we look at a group of policyholders.

In Year 1, there are m₁ policyholders. The 1st policyholder has incurred X(1, t = 1) claim dollar amount. The 2nd policyholder has incurred X(2, t = 1). And the m₁-th policyholder has incurred X(m₁, t = 1).

In Year 2, there are m₂ policyholders. The 1st policyholder has incurred X(1, t = 2). The 2nd policyholder has incurred X(2, t = 2). And the m₂-th policyholder has incurred X(m₂, t = 2).

In Year t, there are m_t policyholders, who have incurred X(1, t), X(2, t), ..., X(m_t, t) respectively.

In Year n, there are mₙ policyholders, who have incurred X(1, t = n), X(2, t = n), ..., X(mₙ, t = n) respectively.

In Year n + 1, there are m_{n+1} policyholders.

Question: In Year n + 1, how much renewal premium should each of the m_{n+1} policyholders pay?

All the observed policyholders belong to the same sub-risk class θ. That is, the m₁ policyholders in Year 1, the m₂ policyholders in Year 2, ..., and the m_{n+1} policyholders in Year n + 1 all belong to the same sub-risk θ. We don't know the specific value of θ; it is drawn from Θ = {θ₁, θ₂, ...}.

Given θ, each policyholder's annual claim X(i, t) has conditional mean E[X(i, t) | θ] = μ(θ) and conditional variance Var[X(i, t) | θ] = σ²(θ).

One approach is to calculate the renewal premium for Year n + 1 from scratch. An easier approach is to convert the Bühlmann-Straub credibility problem into a standard Bühlmann credibility problem. I'll do both.

First, let's look at the problem from the Bühlmann world. In Year 1, the m₁ policyholders have incurred a total of Σ_{i=1}^{m₁} X(i, t = 1) claim dollars. Since all m₁ policyholders belong to the same, unknown, sub-risk θ, there's no distinction between any two of these m₁ policyholders. All these m₁ policyholders are just photocopies of one another.

So

"m₁ policyholders have incurred total Σ_{i=1}^{m₁} X(i, t = 1) claims in one year"

is the same as

"one policyholder has incurred total Σ_{i=1}^{m₁} X(i, t = 1) claims in m₁ years,"

whose average claim per year is (1/m₁) Σ_{i=1}^{m₁} X(i, t = 1).

Similarly, "m₂ policyholders have incurred total Σ_{i=1}^{m₂} X(i, t = 2) claims in one year" is the same as "one policyholder has incurred total Σ_{i=1}^{m₂} X(i, t = 2) claims in m₂ years." So on and so forth through "one policyholder has incurred total Σ_{i=1}^{mₙ} X(i, t = n) claims in mₙ years."

So in m = m₁ + m₂ + ... + mₙ years, one policyholder has incurred total Σ_{t=1}^{n} Σ_{i=1}^{m_t} X(i, t) claims.

Then the expected claim cost in Year m + 1 for one policyholder can be calculated using the Bühlmann credibility formula:

P = Z X̄ + (1 − Z)μ, where

X̄ = (Total observed claims)/(Total # of observed years) = (1/m) Σ_{t=1}^{n} Σ_{i=1}^{m_t} X(i, t)

Z = (# of observation years)/(# of observation years + k) = m/(m + k)

k = E[σ²(θ)] / Var[μ(θ)]

There's nothing new under the sun in the Bühlmann-Straub credibility model. Every problem about the Bühlmann-Straub credibility model can be solved using the Bühlmann credibility model.

Actually, we can have a unified formula for the Bühlmann-Straub and the Bühlmann credibility models:

P = Z X̄ + (1 − Z)μ

X̄ = (Total observed claims)/(# of observed exposures, measured on the insured-year basis)

Z = (# of observed exposures)/(# of observed exposures + k)

k = E[σ²(θ)] / Var[μ(θ)]

In this unified formula, the observed exposure is measured on the insured-year basis. For example, if one policyholder has incurred $500 claim in one year, the exposure is:

1 insured × 1 year = 1 insured-year

If the policyholder has incurred $500 claim in a 2-year period, then the exposure is:

1 insured × 2 years = 2 insured-years

Let's see how the unified formula works for the Bühlmann and the Bühlmann-Straub credibility models. In the Bühlmann model, we have an n-year claim history of one policyholder. So the observed exposure is:

1 insured × n years = n insured-years.

Then the formula becomes:

Z = n/(n + k),  X̄ = (X₁ + X₂ + ... + Xₙ)/n

In the Bühlmann-Straub model, however, we have only 1-year claim data for each of the m policyholders. So the total # of exposures is m = m₁ + m₂ + ... + mₙ insured-years. Then the unified formula becomes:

Z = m/(m + k),  X̄ = (1/m) Σ_{t=1}^{n} Σ_{i=1}^{m_t} X(i, t)

Now you know how to convert a Bühlmann-Straub problem into a Bühlmann problem and how to use a unified formula for the Bühlmann-Straub model and the Bühlmann model. Next, I'll derive the Bühlmann-Straub credibility formula from scratch. First, let's create an average policyholder and reorganize each year's claim data from the viewpoint of this average policyholder.

Let's look at the claim history data in the Bühlmann-Straub model from the average policyholder's point of view:

In Year 1, the average policyholder has incurred X̄₁ = (1/m₁) Σ_{i=1}^{m₁} X(i, t = 1) claim.

In Year 2, the average policyholder has incurred X̄₂ = (1/m₂) Σ_{i=1}^{m₂} X(i, t = 2) claim.

In Year t, the average policyholder has incurred X̄_t = (1/m_t) Σ_{i=1}^{m_t} X(i, t) claim.

In Year n, the average policyholder has incurred X̄ₙ = (1/mₙ) Σ_{i=1}^{mₙ} X(i, t = n) claim.

We use a + Z X̄ to approximate E(X_{n+1}), where X̄ = Σ_{i=1}^{n} (m_i/m) X̄_i is the exposure-weighted average. We minimize E{[(a + Z X̄) − X_{n+1}]²}. The standard least-squares argument gives

a = (1 − Z) E(X_{n+1}),  Z = Cov(X̄, X_{n+1}) / Var(X̄)

First, for each period t,

E(X̄_t | θ) = (1/m_t) Σ_{i=1}^{m_t} E[X(i, t) | θ] = μ(θ)

Var(X̄_t | θ) = (1/m_t²) Σ_{i=1}^{m_t} Var[X(i, t) | θ] = (1/m_t²) m_t σ²(θ) = σ²(θ)/m_t

Next, for the weighted average X̄ itself,

E(X̄ | θ) = Σ_{i=1}^{n} (m_i/m) E(X̄_i | θ) = μ(θ) (1/m) Σ_{i=1}^{n} m_i = μ(θ)

Var(X̄ | θ) = Σ_{i=1}^{n} (m_i/m)² Var(X̄_i | θ) = (1/m²) Σ_{i=1}^{n} m_i² σ²(θ)/m_i = (σ²(θ)/m²) Σ_{i=1}^{n} m_i = σ²(θ)/m

So, without conditioning on θ, E(X̄) = E[μ(θ)] and the total variance is

Var(X̄) = Var[E(X̄ | θ)] + E[Var(X̄ | θ)] = Var[μ(θ)] + E[σ²(θ)]/m

For the covariance, note that given θ, the future claim X_{n+1} is independent of X̄, and both have conditional mean μ(θ). Hence

Cov(X̄, X_{n+1}) = E(X̄ X_{n+1}) − E(X̄) E(X_{n+1}) = E[μ²(θ)] − E²[μ(θ)] = Var[μ(θ)]

Therefore

Z = Cov(X̄, X_{n+1})/Var(X̄) = Var[μ(θ)] / {Var[μ(θ)] + E[σ²(θ)]/m} = m/(m + k)

where k = E[σ²(θ)] / Var[μ(θ)].

To summarize:

             Period 1        Period 2   ...   Period n
Exposure     m₁              m₂         ...   mₙ
Average      X̄₁              X̄₂         ...   X̄ₙ

E(X̄₁ | θ) = E(X̄₂ | θ) = ... = E(X̄ₙ | θ) = μ(θ)

Var(X̄₁ | θ) = σ²(θ)/m₁, ..., Var(X̄ₙ | θ) = σ²(θ)/mₙ

Then

E(X_{n+1} | X₁, X₂, ..., Xₙ) = Z X̄ + (1 − Z)μ

X̄ = Σ_{i=1}^{n} (m_i/m) X̄_i,  Z = m/(m + k),  k = E[σ²(θ)] / Var[μ(θ)]

where m_i is the number of policyholders in Year i and m = Σ m_i. Don't make the common mistake of writing Z = n/(n + k). The formula Z = n/(n + k) is not good for the Bühlmann-Straub credibility model.

Key point

In the Bühlmann-Straub credibility model, what matters is the total exposure m and the historical average claim per exposure X̄. The individual claim amounts X(i, t) don't matter.

For example, everything else being equal, the following two cases have the same Bühlmann-Straub credibility estimate.

Case #1
m₁ = 2: X(1, t = 1) = 7, X(2, t = 1) = 1;
m₂ = 3: X(1, t = 2) = 0, X(2, t = 2) = 4, X(3, t = 2) = 2.

Case #2
m₁ = 1: X(1, t = 1) = 9;
m₂ = 4: X(1, t = 2) = 3, X(2, t = 2) = 0.6, X(3, t = 2) = 1, X(4, t = 2) = 0.4.

In both cases,

- the total exposure is m₁ + m₂ = 5;
- the total claim dollar amount is 14 = 7+1+0+4+2 = 9+3+0.6+1+0.4;
- the average claim per insured-year is 14/5 = 2.8.

This is a minor point. If you don't care, just skip it.

Loss Models mentions Hewitt's version of the Bühlmann-Straub credibility model. This model assumes that X̄_i, the average claim in period i, given the sub-risk class θ, has mean E(X̄_i | θ) = μ(θ) and a variance

Var(X̄_i | θ) = w(θ) + v(θ)/m_i

So the difference between the general and the standard Bühlmann-Straub model is about the conditional variance assumption. Hewitt's assumption is Var(X̄_i | θ) = w(θ) + v(θ)/m_i; the standard Bühlmann-Straub assumption is Var(X̄_i | θ) = σ²(θ)/m_i.

Then Loss Models derives the formulas:

w = E[w(θ)],  v = E[v(θ)],  a = Var[μ(θ)]

E[Var(X̄_j | θ)] = w + v/m_j

m* = Σ_{j=1}^{n} m_j/(v + w m_j) = Σ_{j=1}^{n} 1/(w + v/m_j)

Guo Fall 2009 C, Page 155 / 284

P = Z X̄ + (1 − Z)μ,  Z = a m*/(1 + a m*)

X̄ = {Σ_{j=1}^{n} X̄_j / E[Var(X̄_j | θ)]} / {Σ_{j=1}^{n} 1 / E[Var(X̄_j | θ)]}

If m₁ = m₂ = ... = mₙ = m, then

m* = Σ_{j=1}^{n} 1/(w + v/m) = n/(w + v/m)

Z = a m*/(1 + a m*) = 1/[1 + (1/a)(1/m*)] = 1/[1 + (1/a)(w + v/m)/n] = n/[n + (w + v/m)/a]

Several points are worth noting.

First, X̄ weights each X̄_j by the reciprocal of its expected process variance E[Var(X̄_j | θ)]. The higher the expected process variance of X̄_j, the less weight is assigned to X̄_j. This way, X̄ will have the minimum variance. This point is explained in the study note by Curtis Gary Dean. Refer to this study note if you want to find out more.

Second, the credibility factor is Z = a m*/(1 + a m*). To get comfortable with this formula, look at the basic Bühlmann formula Z = n/(n + v/a). Let's compare these two formulas:

Z = n/(n + v/a) = 1/[1 + (1/a)(v/n)],  Z = a m*/(1 + a m*) = 1/[1 + (1/a)(1/m*)]

Now you see that these two formulas are similar. If Var(X̄_i | θ) = σ²(θ), that is, w(θ) = 0 and m_i = 1 as in the Bühlmann model, then m* = Σ_{j=1}^{n} 1/v = n/v, and

Z = 1/[1 + (1/a)(1/m*)] = 1/[1 + (1/a)(v/n)] = n/(n + v/a)

The third point. Loss Models points out that in this version of the model, as each m_j approaches infinity, the credibility factor Z won't approach one. Let's take a look at this. Under Hewitt's assumption, when m_j → ∞,

Var(X̄_i | θ) = w(θ) + v(θ)/m_i → w(θ)

so m* → Σ_{j=1}^{n} 1/w = n/w, and

Z = 1/[1 + (1/a)(1/m*)] → 1/[1 + (1/a)(w/n)] = n/(n + w/a) < 1

Compare this with the Bühlmann model or the standard Bühlmann-Straub model. In the Bühlmann model, as the number of exposures n approaches ∞, Z = n/(n + v/a) = 1/[1 + (1/a)(v/n)] → 1. In the standard Bühlmann-Straub model, Var(X̄_i | θ) = σ²(θ)/m_i → 0 as m_i → ∞, so m* = Σ_{j=1}^{n} 1/E[Var(X̄_j | θ)] → ∞ and Z = 1/[1 + (1/a)(1/m*)] → 1.

Finally, Loss Models has a special case of the general Bühlmann-Straub model. In this special case, the variance of the hypothetical means changes from Var[μ(θ)] = a, as in the standard model, to a + b/m. Loss Models points out that to find the credibility factor for this special case, we just need to change a to a + b/m:

Z = a m*/(1 + a m*)  becomes  Z = (a + b/m) m* / [1 + (a + b/m) m*]

Most likely, Exam C won't have problems on the generalized version of the Bühlmann-Straub model. So you should focus on the standard Bühlmann-Straub model. To tackle the standard Bühlmann-Straub model, you can use any of the following 3 approaches:

1. Directly use the Bühlmann-Straub formula Z = m/(m + k).

2. Convert the problem into a Bühlmann problem. Treating the total exposure m = Σ m_j as m years of observation of one policyholder, Z = (# of observation years)/(# of observation years + k) = m/(m + k).

3. Use the unified formula (without converting into the Bühlmann model): Z = (# of observed exposures)/(# of observed exposures + k).
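The unified formula is easy to code once. The sketch below is my own wrapper, not the manual's; it takes per-period exposures and per-exposure averages plus the structural parameters μ and k:

```python
# A minimal sketch of the unified Buhlmann-Straub premium formula.
def buhlmann_straub_premium(exposures, averages, mu, k):
    """Credibility premium per exposure: Z*Xbar + (1 - Z)*mu."""
    m = sum(exposures)                                   # total exposure (insured-years)
    xbar = sum(mi * xi for mi, xi in zip(exposures, averages)) / m
    Z = m / (m + k)
    return Z * xbar + (1 - Z) * mu

# The Buhlmann model is the special case of one exposure per year:
P = buhlmann_straub_premium([1, 1, 1], [3, 4, 5], mu=4.0, k=1.0)
print(P)   # 4.0
```

The design choice worth noticing: only the total exposure m and the exposure-weighted average enter the answer, which is exactly the "key point" above.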

Nov 2001 #26

You are given the following data on large business policyholders:

- Losses for each employee of a given policyholder are independent and have a common mean and variance.
- The overall average loss per employee for all policyholders is 20.

Year                      1     2     3
Average loss per employee 15    10    5
Number of employees       800   600   400

Determine the Bühlmann-Straub credibility premium per employee for this policyholder.

Solution

Method 1

From the given information, the expected process variance is EV = 8,000, the variance of the hypothetical means is VE = 40, and the overall mean is μ = 20.

So k = EV/VE = 8,000/40 = 200

Guo Fall 2009 C, Page 159 / 284

m = 800 + 600 + 400 = 1,800,  Z = m/(m + k) = 1,800/(1,800 + 200) = 0.9

X̄ = (1/m) Σ_{i=1}^{3} m_i X̄_i = [800(15) + 600(10) + 400(5)]/1,800 = 20,000/1,800 = 11.111

P = Z X̄ + (1 − Z)μ = 0.9(11.111) + 0.1(20) = 12

Alternatively,

P = [kμ + Σ_{i=1}^{3} m_i X̄_i]/(m + k) = [200(20) + 20,000]/(1,800 + 200) = 12

Method 2

Convert

Year                      1       2       3
Average loss per employee 15      10      5
Number of employees       800     600     400

into

Year                First 800 years   Next 600 years   Next 400 years
Total loss          15*800            10*600           5*400
Number of employees 1                 1                1

The above two tables are essentially the same. In both tables, the average loss per employee per year is X̄ = 20,000/1,800 = 11.111.

After the conversion, the # of observation years n = 800 + 600 + 400 = 1,800. This seems crazy, but it is merely a conceptual tool for us to transform a Bühlmann-Straub problem into a Bühlmann problem.

Using the Bühlmann premium formulas, we have:

Z = n/(n + k) = 1,800/(1,800 + 200) = 0.9,  P = Z X̄ + (1 − Z)μ = 0.9(11.111) + 0.1(20) = 12

Method 3

In this method, we don't care about the distinction between the Bühlmann and the Bühlmann-Straub models. We just use the following unified formulas:

P = Z X̄ + (1 − Z)μ

X̄ = (observed claims)/(# of observed exposures, insured-year basis) = 20,000/1,800 = 11.111

Z = (# of observed exposures)/(# of observed exposures + k) = 1,800/(1,800 + 200) = 0.9

k = E[σ²(θ)]/Var[μ(θ)] = 200

P = 0.9(11.111) + 0.1(20) = 12
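All three methods reduce to the same arithmetic, which a few lines confirm (a sketch; names are mine):

```python
# Direct check of Nov 2001 #26.
mu, EV, VE = 20.0, 8000.0, 40.0
k = EV / VE                                   # 200
exposures = [800, 600, 400]
loss_per_employee = [15, 10, 5]

m = sum(exposures)                            # 1800 insured-years
xbar = sum(mi * xi for mi, xi in zip(exposures, loss_per_employee)) / m
Z = m / (m + k)                               # 0.9
P = Z * xbar + (1 - Z) * mu
print(round(P, 4))   # 12.0
```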

May 2001 #23

You are given the following information about a single risk:

- The risk has m exposures in each year.
- The risk is observed for n years.
- The variance of the hypothetical means is a.
- The expected value of the annual process variance is w + v/m.

Determine the limit of the Bühlmann-Straub credibility factor as m approaches infinity.

Solution

A naive approach is to use the Bühlmann credibility formula:

Z = n/(n + k) = n/(n + EV/VE) = n/[n + (w + v/m)/a]

As m approaches ∞, (w + v/m) approaches w, and Z approaches n/(n + w/a).

Incidentally, this leads to the correct answer. However, this line of thinking is problematic. As explained earlier, in the Bühlmann-Straub model, the credibility factor is

Guo Fall 2009 C, Page 161 / 284

Z = m/(m + k) = Σ m_i / (Σ m_i + k),  not Z = n/(n + k).

The correct logic is to realize that this problem involves the general (Hewitt) Bühlmann-Straub credibility model, where Var(X̄_i | θ) = w(θ) + v(θ)/m_i and m₁ = m₂ = ... = mₙ = m.

As derived earlier, when m₁ = m₂ = ... = mₙ = m, we have:

Z = a m*/(1 + a m*) = 1/[1 + (1/a)(1/m*)] = n/[n + (w + v/m)/a]

As m → ∞, v/m → 0 and Z → n/(n + w/a).

Nov 2004 #9

Members of three classes of insureds can have 0, 1, or 2 claims, with the following probabilities:

Class   # of claims
        0     1     2
I       0.9   0.0   0.1
II      0.8   0.1   0.1
III     0.7   0.2   0.1

A class is chosen at random, and varying # of insureds from that class are observed over 2 years, as shown below:

Year   # of insureds   # of claims
1      20              7
2      30              10

Determine the Bühlmann-Straub credibility estimate of the number of claims in Year 3 for 35 insureds from the same class.

Solution

Guo Fall 2009 C, Page 162 / 284

Method 1

Class   P(θ)   E(X | θ)   Var(X | θ)
I       1/3    0.2        0.36
II      1/3    0.3        0.41
III     1/3    0.4        0.44

Note: for Class I, E(X | θ) = 0(0.9) + 1(0.0) + 2(0.1) = 0.2 and Var(X | θ) = 0²(0.9) + 1²(0.0) + 2²(0.1) − 0.2² = 0.36. The other classes are similar.

m = m₁ + m₂ = 20 + 30 = 50,  μ = (1/3)(0.2 + 0.3 + 0.4) = 0.3

VE = Var[E(X | θ)] = (1/3)(0.2² + 0.3² + 0.4²) − 0.3² = 0.00667

EV = E[Var(X | θ)] = (1/3)(0.36 + 0.41 + 0.44) = 0.4033

k = EV/VE = 0.4033/0.00667 = 60.5

Z = m/(m + k) = 50/(50 + 60.5) = 0.4525

X̄ = (7 + 10)/50 = 0.34

P = [kμ + Σ m_i X̄_i]/(m + k) = [60.5(0.3) + 17]/(50 + 60.5) = 0.318 per insured

For 35 insureds, the estimated number of claims in Year 3 is 35(0.318) = 11.13.

Method 2

Convert

Year   # of insureds   # of claims
1      20              7
2      30              10

into

Year                First 20 years   Next 30 years
Total # of claims   7                10
Number of insureds  1                1

The above two tables are essentially the same. In both tables, the average loss per insured per year is X̄ = (7 + 10)/50 = 0.34. Using the Bühlmann premium formulas, we have:

Z = n/(n + k) = 50/(50 + 60.5) = 0.4525,  P = 0.4525(0.34) + (1 − 0.4525)(0.3) = 0.318

Method 3

In this method, we don't care about the distinction between the Bühlmann and the Bühlmann-Straub models.

Z = (# of observed exposures)/(# of observed exposures + k) = 50/(50 + 60.5) = 0.4525

X̄ = (7 + 10)/50 = 0.34,  P = 0.318,  and for 35 insureds the estimate is 35(0.318) = 11.13.
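The class moments and the final estimate can be checked mechanically (a sketch; the dictionary layout is mine):

```python
# Check of Nov 2004 #9: per-class claim-count distributions.
probs = {"I": [0.9, 0.0, 0.1], "II": [0.8, 0.1, 0.1], "III": [0.7, 0.2, 0.1]}

means, variances = [], []
for p in probs.values():
    m1 = sum(n * pn for n, pn in enumerate(p))
    m2 = sum(n * n * pn for n, pn in enumerate(p))
    means.append(m1)
    variances.append(m2 - m1 * m1)

mu = sum(means) / 3
EV = sum(variances) / 3
VE = sum(m * m for m in means) / 3 - mu ** 2
k = EV / VE                                   # 60.5

m, total_claims = 50, 17
Z = m / (m + k)
P = Z * (total_claims / m) + (1 - Z) * mu     # per-insured estimate
print(round(P, 3), round(35 * P, 2))   # 0.318 11.13
```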

Nov 2002 #32

You are given four classes of insureds, each of whom may have zero or one claim, with the following probabilities:

Class   # of claims
        0     1
I       0.9   0.1
II      0.8   0.2
III     0.5   0.5
IV      0.1   0.9

A class is selected at random, and four insureds are selected at random from the class. The total number of claims is two. If five insureds are selected at random from the same class, estimate the total number of claims using Bühlmann-Straub credibility.

Solution

You can use any one of the three methods. Here I use the Bühlmann-Straub credibility formula Z = m/(m + k).

Class   P(θ)   E(X | θ)   Var(X | θ)
I       1/4    0.1        0.09
II      1/4    0.2        0.16
III     1/4    0.5        0.25
IV      1/4    0.9        0.09

Note (1): if X = a with probability p and X = b with probability q = 1 − p, then E(X) = ap + bq and Var(X) = (a − b)² pq. For Class I, Var(X | θ) = (1 − 0)²(0.1)(0.9) = 0.09, and so on.

μ = (1/4)(0.1 + 0.2 + 0.5 + 0.9) = 0.425

EV = (1/4)(0.09 + 0.16 + 0.25 + 0.09) = 0.1475

VE = (1/4)(0.1² + 0.2² + 0.5² + 0.9²) − 0.425² = 0.096875

k = EV/VE = 0.1475/0.096875 = 1.5226

Z = m/(m + k) = 4/(4 + 1.5226) = 0.7243

X̄ = 2/4 = 0.5

P = Z X̄ + (1 − Z)μ = 0.7243(0.5) + 0.2757(0.425) = 0.4793 per insured. The estimated total number of claims for five insureds is 5(0.4793) = 2.40.
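A quick script confirms the Bernoulli-class arithmetic (a sketch; names are mine):

```python
# Check of Nov 2002 #32: each class has one claim with probability q.
qs = [0.1, 0.2, 0.5, 0.9]

mu = sum(qs) / 4
EV = sum(q * (1 - q) for q in qs) / 4          # Bernoulli variances q(1-q)
VE = sum(q * q for q in qs) / 4 - mu ** 2
k = EV / VE

m, claims = 4, 2
Z = m / (m + k)
per_insured = Z * (claims / m) + (1 - Z) * mu
print(round(per_insured, 4), round(5 * per_insured, 2))   # 0.4793 2.4
```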

You are given:

- The # of claims incurred in a month by any insured has a Poisson distribution with mean λ.
- The claim frequencies of different insureds are independent.
- The prior distribution of λ is gamma, with probability density function f(λ) = (100λ)⁶ e^(−100λ)/(120λ), that is, α = 6 and θ = 1/100 (note Γ(6) = 5! = 120).

Months          1     2     3     4
# of insureds   100   150   200   300
# of claims     6     8     11    ?

Determine the Bühlmann-Straub credibility estimate of the number of claims in Month 4.

Solution

This time, let's solve it by converting the Bühlmann-Straub credibility problem into a Bühlmann credibility problem.

This table

Months          1     2     3
# of insureds   100   150   200
# of claims     6     8     11

is the same as

Months          First 100   Next 150   Next 200
# of insureds   1           1          1
# of claims     6           8          11

So the total number of the observation months is n = 100 + 150 + 200 = 450. The total # of observed claims is 6 + 8 + 11 = 25. So X̄ = 25/450.

Let N represent the # of claims incurred in a month by a randomly chosen policyholder. Then N | λ is Poisson with mean λ. So the risk random variable is λ, and

E(N | λ) = Var(N | λ) = λ

μ = EV = E[Var(N | λ)] = E(λ) = αθ = 6/100 = 0.06

VE = Var[E(N | λ)] = Var(λ) = E(λ²) − E²(λ) = α(α + 1)θ² − (αθ)² = 6(7)(1/100)² − 0.06² = 0.0006

k = EV/VE = 0.06/0.0006 = 100

Z = n/(n + k) = 450/(450 + 100) = 0.818

Guo Fall 2009 C, Page 166 / 284

P = Z X̄ + (1 − Z)μ = 0.818(25/450) + (1 − 0.818)(0.06) = 0.0564

The estimated number of claims in Month 4 for 300 insureds is 300(0.0564) = 16.9.
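The gamma-prior shortcut (EV = αθ, VE = αθ² for Poisson frequencies) checks out numerically (a sketch; names are mine):

```python
# Check of the Poisson-gamma month problem.
alpha, theta = 6, 1 / 100          # gamma prior with Gamma(6) = 120 in the pdf

EV = alpha * theta                 # E(lambda) = 0.06; Poisson, so EPV = mean
VE = alpha * theta ** 2            # Var(lambda) = 0.0006
k = EV / VE                        # 100

n = 100 + 150 + 200                # 450 insured-months observed
xbar = (6 + 8 + 11) / n
Z = n / (n + k)
P = Z * xbar + (1 - Z) * EV        # estimate per insured-month
print(round(P, 4), round(300 * P, 1))   # 0.0564 16.9
```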

You are given:

- A region is comprised of 3 territories. Claims experience for Year 1 is as follows:

Territory   # of insureds   # of claims
A           10              4
B           20              5
C           30              3

- The # of claims for each insured each year has a Poisson distribution.
- Each insured in a territory has the same expected claim frequency.
- The # of insureds is constant over time for each territory.

Determine the Bühlmann-Straub empirical Bayes estimate of the credibility factor Z for Territory A.

Solution

The probability of each territory is proportional to its number of insureds:

Territory   P(θ)   E(X | θ)
A           1/6    4/10 = 0.4
B           2/6    5/20 = 0.25
C           3/6    3/30 = 0.1

Since claims are Poisson, Var(X | θ) = E(X | θ). Hence

μ = E[E(X | θ)] = (1/6)(0.4) + (2/6)(0.25) + (3/6)(0.1) = 0.2 = EV = E[Var(X | θ)]

VE = Var[E(X | θ)] = (1/6)(0.4²) + (2/6)(0.25²) + (3/6)(0.1²) − 0.2² = 0.0125

k = EV/VE = 0.2/0.0125 = 16

Z = n/(n + k) = 10/(10 + 16) = 0.385
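The exposure-weighted moments are easy to check (a sketch; names are mine):

```python
# Check of the territory problem (Poisson claims, so EPV = overall mean).
insureds = [10, 20, 30]
claims = [4, 5, 3]

total = sum(insureds)
probs = [m / total for m in insureds]                # 1/6, 2/6, 3/6
means = [c / m for c, m in zip(claims, insureds)]    # 0.4, 0.25, 0.1

mu = sum(p * m for p, m in zip(probs, means))
EV = mu                                              # Poisson: Var(X|t) = E(X|t)
VE = sum(p * m * m for p, m in zip(probs, means)) - mu ** 2
k = EV / VE
Z_A = insureds[0] / (insureds[0] + k)
print(round(mu, 3), round(VE, 4), round(k, 2), round(Z_A, 3))   # 0.2 0.0125 16.0 0.385
```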

Chapter 7  Empirical Bayes estimate for the Bühlmann model

Dean's study note has a good explanation of the formulas and worked-out problems. Read Dean's study note along with my explanation.

This topic is among the least interesting ones in Exam C. However, it is repeatedly tested in Exam C. The exam problems on this topic are easy. The difficulty is to memorize the formulas. In this chapter, I will show you some ideas behind the formulas to help you memorize them.

We have n-year claim data about r risks. For each risk, we have its claim amount in Year 1, Year 2, ..., Year n. Let X_ij represent the claim incurred by the i-th policyholder in Year j. This is what we know:

Risk   Year 1   Year 2   ...   Year n
1      X₁₁      X₁₂      ...   X₁ₙ
2      X₂₁      X₂₂      ...   X₂ₙ
...
r      X_r1     X_r2     ...   X_rn

The issue here is that we don't know the probability distribution of the conditional claim random variable X | θ or the probability distribution of the risk variable θ. As a result, we can't calculate the two inputs for the credibility factor Z: the expected process variance EV = E[Var(X | θ)] and the variance of the hypothetical means VE = Var[E(X | θ)]. So we need to estimate EV and VE from the past claim data given to us.

It's easy to estimate EV = E[Var(X | θ)]. For each risk i, calculate the sample mean and sample variance:

X̄_i = (1/n) Σ_{t=1}^{n} X_it,  s_i² = [1/(n − 1)] Σ_{t=1}^{n} (X_it − X̄_i)²

Risk   Year 1 ... Year n    Sample mean                      Sample variance
1      X₁₁ ... X₁ₙ          X̄₁ = (1/n) Σ_t X₁t               s₁² = [1/(n−1)] Σ_t (X₁t − X̄₁)²
2      X₂₁ ... X₂ₙ          X̄₂ = (1/n) Σ_t X₂t               s₂² = [1/(n−1)] Σ_t (X₂t − X̄₂)²
...
r      X_r1 ... X_rn        X̄_r = (1/n) Σ_t X_rt             s_r² = [1/(n−1)] Σ_t (X_rt − X̄_r)²

Averaging the r sample variances gives the estimate

EV = (1/r) Σ_{i=1}^{r} s_i² = [1/(r(n − 1))] Σ_{i=1}^{r} Σ_{t=1}^{n} (X_it − X̄_i)²

Estimating VE takes one more step. The variance of each risk's sample mean is

Var(X̄_i) = Var[E(X̄_i | θ)] + E[Var(X̄_i | θ)] = VE + (1/n) EV

An unbiased estimate of Var(X̄_i) is the sample variance of the r risk means,

V̂ar(X̄) = [1/(r − 1)] Σ_{i=1}^{r} (X̄_i − X̄)²,  where X̄ = (1/r) Σ_{i=1}^{r} X̄_i

So

VE = V̂ar(X̄) − (1/n) EV

It's possible for this estimate to come out zero or negative. If VE ≤ 0, then k = EV/VE blows up, and we set Z = 0.

Summary of the estimation process for the empirical Bayes estimate for the Bühlmann model

Step 1. Calculate the sample variance for each risk and the expected process variance for all risks combined:

s_i² = [1/(n − 1)] Σ_{t=1}^{n} (X_it − X̄_i)²,  EV = (1/r) Σ_{i=1}^{r} s_i² = [1/(r(n − 1))] Σ_{i=1}^{r} Σ_{t=1}^{n} (X_it − X̄_i)²

Step 2. Calculate V̂ar(X̄) = [1/(r − 1)] Σ_{i=1}^{r} (X̄_i − X̄)².

Step 3. Find VE = V̂ar(X̄) − (1/n) EV. Then k = EV/VE and Z = n/(n + k).

An insurer has data on losses for four policyholders for seven years. X_ij is the loss from the i-th policyholder for year j. You are given:

Σ_{i=1}^{4} Σ_{j=1}^{7} (X_ij − X̄_i)² = 33.6

Σ_{i=1}^{4} (X̄_i − X̄)² = 3.3

Using the nonparametric empirical Bayes estimation, calculate the Bühlmann credibility factor for an individual policyholder.

Solution

Step 1

EV = [1/(r(n − 1))] Σ_{i=1}^{r} Σ_{t=1}^{n} (X_it − X̄_i)² = 33.6/[4(7 − 1)] = 1.4

Step 2

V̂ar(X̄) = [1/(r − 1)] Σ_{i=1}^{r} (X̄_i − X̄)² = 3.3/(4 − 1) = 1.1

Step 3

VE = V̂ar(X̄) − (1/n) EV = 1.1 − 1.4/7 = 0.9

k = EV/VE = 1.4/0.9,  Z = n/(n + k) = 7/(7 + 1.4/0.9) = 0.818
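The three steps translate directly into code using only the two sums the problem supplies (a sketch; names are mine):

```python
# Nonparametric empirical Bayes steps for the 4-policyholder, 7-year data.
r, n = 4, 7
within_sum = 33.6        # sum of (X_it - Xbar_i)^2 over all i, t
between_sum = 3.3        # sum of (Xbar_i - Xbar)^2 over i

EV = within_sum / (r * (n - 1))        # 1.4
var_xbar = between_sum / (r - 1)       # 1.1
VE = var_xbar - EV / n                 # 0.9
Z = n / (n + EV / VE)
print(round(EV, 2), round(var_xbar, 2), round(VE, 2), round(Z, 3))   # 1.4 1.1 0.9 0.818
```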

You are given total claims for two policyholders:

Policyholder   Year 1   Year 2   Year 3   Year 4
X              730      800      650      700
Y              655      650      625      750

Using the nonparametric empirical Bayes estimation, calculate the Bühlmann credibility factor for Policyholder Y.

Solution

r = 2, n = 4.

Step 1. Calculate the sample mean and the sample conditional variance for each risk.

X̄ = (730 + 800 + 650 + 700)/4 = 720

Ȳ = (655 + 650 + 625 + 750)/4 = 670

s_X² = [(730 − 720)² + (800 − 720)² + (650 − 720)² + (700 − 720)²]/(4 − 1) = 3,933.33

s_Y² = [(655 − 670)² + (650 − 670)² + (625 − 670)² + (750 − 670)²]/(4 − 1) = 3,016.67

EV = (1/2)(3,933.33 + 3,016.67) = 3,475

Step 2. The overall mean is (1/2)(X̄ + Ȳ) = (1/2)(720 + 670) = 695, so

V̂ar(X̄) = [1/(2 − 1)][(720 − 695)² + (670 − 695)²] = 1,250

Step 3.

VE = V̂ar(X̄) − (1/n) EV = 1,250 − (1/4)(3,475) = 381.25

k = EV/VE = 3,475/381.25 = 9.115,  Z_Y = m_Y/(m_Y + k) = 4/(4 + 9.115) = 0.305
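Running the same procedure on the raw data confirms every intermediate number (a sketch; names are mine):

```python
# Nonparametric empirical Bayes on raw data: two policyholders, 4 years each.
data = {"X": [730, 800, 650, 700], "Y": [655, 650, 625, 750]}
r, n = len(data), 4

means = {p: sum(v) / n for p, v in data.items()}
svars = {p: sum((x - means[p]) ** 2 for x in v) / (n - 1) for p, v in data.items()}

EV = sum(svars.values()) / r                          # 3475
grand = sum(means.values()) / r                       # 695
var_xbar = sum((m - grand) ** 2 for m in means.values()) / (r - 1)   # 1250
VE = var_xbar - EV / n                                # 381.25
k = EV / VE
Z_Y = n / (n + k)
print(EV, VE, round(k, 3), round(Z_Y, 3))   # 3475.0 381.25 9.115 0.305
```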

Empirical Bayes estimate for the Bühlmann-Straub model

Here the # of policyholders varies from risk to risk and year to year, and X_it now denotes the average claim per policyholder in cell (i, t). For risk 1, m₁₁ policyholders have incurred an average claim of X₁₁ in Year 1; m₁₂ policyholders have incurred an average of X₁₂ in Year 2; ...; m₁ₙ₁ policyholders have incurred an average of X₁ₙ₁ in Year n₁.

For risk 2, m₂₁ policyholders have incurred an average of X₂₁ in Year 1; m₂₂ policyholders have incurred an average of X₂₂ in Year 2; ...; m₂ₙ₂ policyholders have incurred an average of X₂ₙ₂ in Year n₂. So on and so forth.

This is the information given to you:

Risk   Year 1        Year 2        ...   Last year       Total exposure        Sample mean
1      X₁₁, m₁₁      X₁₂, m₁₂      ...   X₁ₙ₁, m₁ₙ₁      m₁ = Σ_t m₁t          X̄₁ = (1/m₁) Σ_t m₁t X₁t
2      X₂₁, m₂₁      X₂₂, m₂₂      ...   X₂ₙ₂, m₂ₙ₂      m₂ = Σ_t m₂t          X̄₂ = (1/m₂) Σ_t m₂t X₂t
...
r      X_r1, m_r1    X_r2, m_r2    ...   X_rn_r, m_rn_r  m_r = Σ_t m_rt        X̄_r = (1/m_r) Σ_t m_rt X_rt

The overall mean is X̄ = (1/m) Σ_{i=1}^{r} m_i X̄_i, where m = Σ_{i=1}^{r} m_i.

The sample variance for risk i is

s_i² = [1/(n_i − 1)] Σ_{t=1}^{n_i} m_it (X_it − X̄_i)²

How to estimate:

Step 1. Calculate the sample variance for each risk and the expected process variance for all risks combined:

EV = Σ_{i=1}^{r} (n_i − 1) s_i² / Σ_{i=1}^{r} (n_i − 1) = Σ_{i=1}^{r} Σ_{t=1}^{n_i} m_it (X_it − X̄_i)² / Σ_{i=1}^{r} (n_i − 1)

Step 2. Calculate VE:

VE = [Σ_{i=1}^{r} m_i (X̄_i − X̄)² − (r − 1) EV] / [m − (1/m) Σ_{i=1}^{r} m_i²]

This formula is counter-intuitive and very hard to remember. However, you'll just have to memorize it. Perhaps Dean's explanation might help you a little bit. He says that the crude estimate for VE is

VE = Σ_{i=1}^{r} m_i (X̄_i − X̄)² / (r − 1)

However, this estimate is biased. To have an unbiased estimator, we need to change the above estimate to

VE = [Σ_{i=1}^{r} m_i (X̄_i − X̄)² − (r − 1) EV] / [m − (1/m) Σ_{i=1}^{r} m_i²]

This isn't a big help on how to memorize the formula. This formula is hard. You'll just have to memorize it.

Final point. Loss Models mentions the concept of the credibility-weighted average premium. It proves that the total loss will be equal to the total premium if we set

μ̂ = Σ_{i=1}^{r} Z_i X̄_i / Σ_{i=1}^{r} Z_i

Nov 2000 #27
You are given the following information on towing losses for two classes of insureds, adults and youths:

Exposures
Year    Adult   Youth   Total
1996    2000    450     2450
1997    1000    250     1250
1998    1000    175     1175
1999    1000    125     1125
Total   5000    1000    6000

Pure Premium
Year               Adult   Youth   Total
1996               0       15      2.755
1997               5       2       4.400
1998               6       15      7.340
1999               4       1       3.667
Weighted Average   3       10      4.167

You are also given that the estimated variance of the hypothetical means is 17.125.

Determine the nonparametric empirical Bayes credibility premium for the youth class, using the method that preserves the total losses.

Solution

We have two risk groups, adults and youths, so $r=2$.

$$EV=\frac{\sum_{i=1}^{r}\sum_{t=1}^{n_i} m_{it}\left(X_{it}-\bar X_i\right)^2}{\sum_{i=1}^{r}(n_i-1)}$$

$$=\frac{2000(0-3)^2+1000(5-3)^2+1000(6-3)^2+1000(4-3)^2+450(15-10)^2+250(2-10)^2+175(15-10)^2+125(1-10)^2}{(4-1)+(4-1)}=12{,}291.7$$

The calculation of VE is complex. Fortunately, we are given that $\hat a = VE = 17.125$ (thank you, SOA!).

$$Z_A=\frac{5000}{5000+\dfrac{12{,}291.7}{17.125}}=0.874\,,\qquad Z_Y=\frac{1000}{1000+\dfrac{12{,}291.7}{17.125}}=0.582$$

The credibility weighted average is:

$$\mu=\frac{\sum_{i=1}^{r} Z_i\bar X_i}{\sum_{i=1}^{r} Z_i}=\frac{0.874(3)+0.582(10)}{0.874+0.582}=5.8$$

The nonparametric empirical Bayes credibility premium for the youth class is:

$$P_Y=Z_Y\bar X_Y+(1-Z_Y)\mu=0.582(10)+0.418(5.8)=8.24$$

Let's verify that the total credibility premium is equal to the total loss.

The nonparametric empirical Bayes credibility premium for the adult class is:

$$P_A=Z_A\bar X_A+(1-Z_A)\mu=0.874(3)+0.126(5.8)=3.35$$

The total credibility premium is:
1,000(8.24) + 5,000(3.35) = 25,000 (up to rounding)

The total loss is:
Adult: 2,000(0) + 1,000(5) + 1,000(6) + 1,000(4) = 15,000
Or 5,000 (total exposure) × 3 (average pure premium per exposure) = 15,000
Youth: 450(15) + 250(2) + 175(15) + 125(1) = 10,000
Or 1,000 (total exposure) × 10 (average pure premium per exposure) = 10,000
Total: 25,000
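The "preserves total losses" property is exact, not a rounding coincidence: since $m_i(1-Z_i)=kZ_i$, using the credibility weighted average as the complement forces the total credibility premium to equal the total loss. A small Python check with the adult/youth figures from this problem (EV computed exactly as 73,750/6):

```python
# Adult and youth classes: exposures, pure premiums, and structural parameters.
m = [5000.0, 1000.0]          # exposures: adult, youth
xbar = [3.0, 10.0]            # average pure premiums per exposure
ev, ve = 73750 / 6, 17.125    # EV from the data; VE = a is given
k = ev / ve

z = [mi / (mi + k) for mi in m]
mu = sum(zi * xi for zi, xi in zip(z, xbar)) / sum(z)   # credibility-weighted average
prem = [zi * xi + (1 - zi) * mu for zi, xi in zip(z, xbar)]

total_premium = sum(mi * pi for mi, pi in zip(m, prem))
total_loss = sum(mi * xi for mi, xi in zip(m, xbar))
print(total_premium, total_loss)   # both equal 25,000 (to floating-point accuracy)
```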

Guo Fall 2009 C, Page 176 / 284

You are given the following experience for two insured groups:

                                   Year
Group                        1     2     3     Total
1      # of members          8     12    5     25
       Average loss
       per member            96    91    113   97
2      # of members          25    30    20    75
       Average loss
       per member            113   111   116   113
Total  # of members                            100
       Average loss
       per member                              109

$$\sum_{i=1}^{2}\sum_{j=1}^{3} m_{ij}\left(x_{ij}-\bar x_i\right)^2=2020\,,\qquad \sum_{i=1}^{2} m_i\left(\bar x_i-\bar x\right)^2=4800$$

Determine the nonparametric empirical Bayes credibility premium for group 1, using the method that preserves the total loss.

Solution

$$EV=\frac{\sum_{i}\sum_{j} m_{ij}\left(X_{ij}-\bar X_i\right)^2}{\sum_{i}(n_i-1)}=\frac{2020}{2+2}=505$$

$$VE=\frac{\sum_{i} m_i\left(\bar X_i-\bar X\right)^2-(r-1)\,EV}{m-\dfrac{1}{m}\sum_{i} m_i^2}=\frac{4800-(2-1)505}{100-\dfrac{1}{100}\left(25^2+75^2\right)}=114.533$$

$$k=\frac{EV}{VE}=\frac{505}{114.533}=4.409$$

$$Z_1=\frac{m_1}{m_1+k}=\frac{25}{25+4.409}=0.85\,,\qquad Z_2=\frac{m_2}{m_2+k}=\frac{75}{75+4.409}=0.944$$

Please note that in the Bühlmann-Straub model the credibility factor uses the total exposure, $Z=\dfrac{m}{m+k}$, not the number of years, $Z=\dfrac{n}{n+k}$.

The credibility weighted average is:

$$\mu=\frac{\sum_i Z_i\bar X_i}{\sum_i Z_i}=\frac{0.85(97)+0.944(113)}{0.85+0.944}=105.42$$

The credibility premium for group 1 is:

$$P_1=Z_1\bar X_1+(1-Z_1)\mu=0.85(97)+0.15(105.42)=98.26$$

You are making credibility estimates for regional rating factors. You observe that the Bühlmann-Straub nonparametric empirical Bayes method can be applied, with the rating factor playing the role of pure premium. $X_{ij}$ denotes the rating factor for region $i$ and year $j$, with the number of reported claims, $m_{ij}$, measuring exposure.

You are given ($i=1,2,3$; $j=1,\dots,4$):

Region $i$   $m_i=\sum_{j=1}^{4}m_{ij}$   $\bar X_i=\frac{1}{m_i}\sum_{j=1}^{4}m_{ij}X_{ij}$   $v_i=\frac13\sum_{j=1}^{4}m_{ij}\left(X_{ij}-\bar X_i\right)^2$   $m_i\left(\bar X_i-\bar X\right)^2$
1            50                           1.406                                                 0.536                                                          0.887
2            300                          1.298                                                 0.125                                                          0.191
3            150                          1.178                                                 0.172                                                          1.348

Determine the credibility estimate of the rating factor for region 1 using the method that preserves $\sum_{i=1}^{3} m_i\bar X_i$.

Solution

$$EV=\frac{\sum_i(n_i-1)\,v_i}{\sum_i(n_i-1)}=\frac{3(0.536)+3(0.125)+3(0.172)}{(4-1)+(4-1)+(4-1)}=\frac{0.536+0.125+0.172}{3}=0.2777$$

$$VE=\frac{\sum_i m_i\left(\bar X_i-\bar X\right)^2-(r-1)\,EV}{m-\dfrac{1}{m}\sum_i m_i^2}=\frac{(0.887+0.191+1.348)-(3-1)(0.2777)}{500-\dfrac{1}{500}\left(50^2+300^2+150^2\right)}=\frac{1.8706}{270}=0.00693$$

$$k=\frac{EV}{VE}=\frac{0.2777}{0.00693}=40.0829$$

$$Z_1=\frac{50}{50+40.0829}=0.555\,,\quad Z_2=\frac{300}{300+40.0829}=0.882\,,\quad Z_3=\frac{150}{150+40.0829}=0.789$$

The credibility weighted average is:

$$\mu=\frac{\sum_{i=1}^{3} Z_i\bar X_i}{\sum_{i=1}^{3} Z_i}=\frac{0.555(1.406)+0.882(1.298)+0.789(1.178)}{0.555+0.882+0.789}=1.2824$$

The credibility estimate of the rating factor for region 1 is:

$$Z_1\bar X_1+(1-Z_1)\mu=0.555(1.406)+0.445(1.2824)=1.35$$

Nov 2004 #17
You are given the following commercial automobile policy experience:

Company                  Year 1     Year 2     Year 3
I    Losses              50,000     50,000     ?
     # of automobiles    100        200        ?
II   Losses              ?          150,000    150,000
     # of automobiles    ?          500        300
III  Losses              150,000    ?          150,000
     # of automobiles    50         ?          150

Determine the nonparametric empirical Bayes credibility factor for Company III.

Solution

First compute the average loss per automobile for each observed cell:

Company I:   X11 = 50,000/100 = 500 (Year 1);    X12 = 50,000/200 = 250 (Year 2);    X̄1 = 100,000/300 = 333.33
Company II:  X21 = 150,000/500 = 300 (Year 2);   X22 = 150,000/300 = 500 (Year 3);   X̄2 = 300,000/800 = 375
Company III: X31 = 150,000/50 = 3,000 (Year 1);  X32 = 150,000/150 = 1,000 (Year 3); X̄3 = 300,000/200 = 1,500

$$\bar X=\frac{100{,}000+300{,}000+300{,}000}{300+800+200}=\frac{700{,}000}{1300}=538.46$$

$$EV=\frac{\sum_i\sum_t m_{it}\left(X_{it}-\bar X_i\right)^2}{\sum_i(n_i-1)}$$

$$=\frac{100(500-333.33)^2+200(250-333.33)^2+500(300-375)^2+300(500-375)^2+50(3000-1500)^2+150(1000-1500)^2}{(2-1)+(2-1)+(2-1)}=53{,}888{,}889$$

$$VE=\frac{\sum_i m_i\left(\bar X_i-\bar X\right)^2-(r-1)\,EV}{m-\dfrac{1}{m}\sum_i m_i^2}=\frac{300(333.33-538.46)^2+800(375-538.46)^2+200(1500-538.46)^2-(3-1)(53{,}888{,}889)}{1300-\dfrac{1}{1300}\left(300^2+800^2+200^2\right)}=157{,}035.6$$

$$k=\frac{EV}{VE}=\frac{53{,}888{,}889}{157{,}035.6}=343.16\,,\qquad Z_{III}=\frac{200}{200+343.16}=0.368$$


You are given:

                         Year 1    Year 2    Year 3    Total
Group 1  Total Claims              10,000    15,000    25,000
         # in Group                50        60        110
         Average                   200       250       227.27
Group 2  Total Claims    16,000    18,000              34,000
         # in Group      100       90                  190
         Average         160       200                 178.95
Total    Total Claims                                  59,000
         # in Group                                    300
         Average                                       196.67

Use the nonparametric empirical Bayes method to estimate the credibility factor for Group 1.

Solution

$$EV=\frac{\sum_i\sum_t m_{it}\left(X_{it}-\bar X_i\right)^2}{\sum_i(n_i-1)}=\frac{50(200-227.27)^2+60(250-227.27)^2+100(160-178.95)^2+90(200-178.95)^2}{(2-1)+(2-1)}=71{,}985.65$$

$$VE=\frac{\sum_i m_i\left(\bar X_i-\bar X\right)^2-(r-1)\,EV}{m-\dfrac{1}{m}\sum_i m_i^2}=\frac{110(227.27-196.67)^2+190(178.95-196.67)^2-71{,}985.65}{300-\dfrac{1}{300}\left(110^2+190^2\right)}=651.03$$

$$Z_1=\frac{m_1}{m_1+\dfrac{EV}{VE}}=\frac{110}{110+\dfrac{71{,}985.65}{651.03}}=0.5$$

We have a parametric model for $X|\lambda$: given $\lambda$, $X$ is a Poisson random variable with mean $\lambda$, so

$$E(X|\lambda)=Var(X|\lambda)=\lambda$$

Because $X|\lambda$ is Poisson, we have:

$$EV=E\left[Var(X|\lambda)\right]=E(\lambda)=E\left[E(X|\lambda)\right]=E(X)$$

However, $Var(X)=E\left[Var(X|\lambda)\right]+Var\left[E(X|\lambda)\right]=EV+VE$, so

$$VE=Var(X)-EV$$

The number of claims a driver has during the year is assumed to be Poisson distributed with an unknown mean that varies by driver.

The experience for 100 drivers is as follows:

# of claims during the year   0    1    2   3   4
# of drivers                  54   33   10  2   1

Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation.

Solution

Let $X$ represent the # of claims in a year. Given $\lambda$, $X$ is a Poisson random variable, so

$$\mu=E(X)=E\left[E(X|\lambda)\right]=E(\lambda)\,,\qquad EV=E\left[Var(X|\lambda)\right]=E(\lambda)$$

$$\hat\mu=\bar X=\frac{54(0)+33(1)+10(2)+2(3)+1(4)}{54+33+10+2+1}=\frac{63}{100}=0.63$$

$$EV=\hat\mu=0.63$$

$$\widehat{Var}(X)=\frac{1}{100-1}\sum_{i=1}^{100}\left(X_i-\bar X\right)^2=0.68$$

$$VE=\widehat{Var}(X)-EV=0.68-0.63=0.05$$

$$Z=\frac{n}{n+\dfrac{EV}{VE}}=\frac{1}{1+\dfrac{0.63}{0.05}}=0.073$$
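The semiparametric shortcut (EV = sample mean, VE = sample variance minus sample mean) is easy to script. A minimal Python sketch, using the 100-driver frequency table from this problem:

```python
# Claim-count frequency table: 54 drivers with 0 claims, 33 with 1, etc.
counts = [0, 1, 2, 3, 4]
drivers = [54, 33, 10, 2, 1]

n = sum(drivers)                                           # 100 drivers
mean = sum(c * f for c, f in zip(counts, drivers)) / n     # = EV for a Poisson model
# Unbiased sample variance via the sum-of-squares shortcut
ssq = sum(c * c * f for c, f in zip(counts, drivers))
var = (ssq - n * mean ** 2) / (n - 1)

ve = var - mean                          # VE = Var(X) - EV
z = 1 / (1 + mean / ve)                  # credibility of one year of experience
print(round(mean, 2), round(var, 2), round(z, 3))   # 0.63 0.68 0.073
```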

When taking the exam, you should use BA II Plus/ Professional 1-V Statistics Worksheet

to quickly calculate the sample mean and the sample variance.

Nov 2000 #7
The following information comes from a study of robberies of 500 convenience stores over the course of a year:

• $X_i$ is the number of robberies of the $i$-th store, with $\sum X_i=50$ and $\sum X_i^2=220$
• The number of robberies of a given convenience store during the year is assumed to be Poisson distributed with an unknown mean that varies by store.

Determine the semiparametric empirical Bayes estimate of the expected number of robberies next year of a store that reported no robberies during the studied year.

Solution

$$EV=\hat\mu=\bar X=\frac{50}{500}=0.1$$

$$\widehat{Var}(X)=\frac{1}{500-1}\sum_{i=1}^{500}\left(X_i-\bar X\right)^2=\frac{1}{500-1}\left(\sum_{i=1}^{500}X_i^2-500\,\bar X^2\right)$$

To see why this formula works, notice that the (biased) sample variance is:

$$\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar X\right)^2=\frac{1}{n}\sum_{i=1}^{n}X_i^2-\bar X^2$$

so

$$\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar X\right)^2=\frac{1}{n-1}\left(\sum_{i=1}^{n}X_i^2-n\,\bar X^2\right)$$

$$\widehat{Var}(X)=\frac{220-500(0.1)^2}{499}=\frac{220-5}{499}=0.43086$$

$$VE=\widehat{Var}(X)-EV=0.43086-0.1=0.33086$$

$$Z=\frac{n}{n+\dfrac{EV}{VE}}=\frac{1}{1+\dfrac{0.1}{0.33086}}=0.768$$

The store didn't have any robbery incidents during the studied year, so its sample mean is zero. The estimated number of robberies next year is:

$$Z(0)+(1-Z)(0.1)=(1-0.768)(0.1)=0.023$$

For a portfolio of motorcycle insurance policyholders, you are given:

• The number of claims for each policyholder has a conditional Poisson distribution.
• For Year 1, the following data are observed:

Number of claims          0      1     2     3    4    Total
Number of policyholders   2000   600   300   80   20   3000

Determine the semiparametric empirical Bayes credibility factor for an individual policyholder.

Solution

Enter the following in the BA II Plus/Professional 1-V Statistics Worksheet:
X01=0, Y01=2000
X02=1, Y02=600
X03=2, Y03=300
X04=3, Y04=80
X05=4, Y05=20

You should get:
The sample mean is X̄ = 0.50666667 ≈ 0.507. This is μ̂ and EV.
The sample standard deviation is S_X = 0.83077411.
The sample variance is S_X² = 0.83077411² = 0.69018562 ≈ 0.69019. This is Var(X).
So VE = Var(X) − EV = 0.69019 − 0.507 = 0.183.

$$Z=\frac{n}{n+\dfrac{EV}{VE}}=\frac{1}{1+\dfrac{0.507}{0.183}}=0.265$$

During a 2-year period, 100 policies had the following claim experience:

Number of claims in Year 1 and Year 2    Number of policyholders
0                                        50
1                                        30
2                                        15
3                                        4
4                                        1

The number of claims per year follows a conditional Poisson distribution, and each policyholder was insured for the entire 2-year period.

A randomly selected policyholder had one claim over the 2-year period.

Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the number of claims in Year 3 for the same policyholder.

Solution

We'll use a 2-year period as one unit of time. So we'll calculate the Bühlmann estimate of the number of claims in Year 3 and Year 4 combined. Then half of this amount will be the Bühlmann estimate for the number of claims in Year 3.

Enter the following in the BA II Plus/Professional 1-V Statistics Worksheet:
X01=0, Y01=50
X02=1, Y02=30
X03=2, Y03=15
X04=3, Y04=4
X05=4, Y05=1

You should get:
The sample mean is X̄ = 0.76. This is μ̂ and EV.
The sample standard deviation is S_X = 0.92244734.
The sample variance is S_X² = 0.92244734² = 0.85090909 ≈ 0.851. This is Var(X).
VE = Var(X) − EV = 0.851 − 0.76 = 0.091.

$$Z=\frac{n}{n+\dfrac{EV}{VE}}=\frac{1}{1+\dfrac{0.76}{0.091}}=0.107$$

A randomly selected policyholder had one claim over the 2-year period, so the sample claim frequency per 2-year unit is 1. The estimate for the next 2-year period is:

$$Z(1)+(1-Z)(0.76)=0.107(1)+0.893(0.76)=0.786$$

The Bühlmann estimate for the number of claims in Year 3 is:

$$P=\frac{1}{2}(0.786)=0.393$$
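The change-of-time-unit trick above can be checked numerically. A short Python sketch using this problem's data; the division by 2 at the end converts the 2-year estimate back to a single year:

```python
# Claims over the 2-year period: 50 policies with 0 claims, 30 with 1, 15 with 2, ...
claims = [0, 1, 2, 3, 4]
policies = [50, 30, 15, 4, 1]

n = sum(policies)                                          # 100 policies
mean = sum(c * f for c, f in zip(claims, policies)) / n    # EV per 2-year unit
ssq = sum(c * c * f for c, f in zip(claims, policies))
var = (ssq - n * mean ** 2) / (n - 1)                      # unbiased sample variance
ve = var - mean                                            # VE = Var(X) - EV

z = 1 / (1 + mean / ve)                     # credibility of one 2-year observation
est_2yr = z * 1 + (1 - z) * mean            # policyholder observed 1 claim in 2 years
est_year3 = est_2yr / 2                     # convert back to a single year
print(round(z, 3), round(est_year3, 3))     # 0.107 0.393
```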

Chapter 8   Limited fluctuation credibility

The study note titled "Chapter 8 Credibility" jointly written by Mahler and Dean provides an excellent explanation of the limited fluctuation credibility theory. Please read this study note along with my explanation.

The goal of the limited fluctuation credibility model is the same as the goal of the Bühlmann credibility model. We observe that a policyholder has incurred $S_1, S_2, \dots, S_n$ claim dollar amounts in Years $1, 2, \dots, n$ respectively. We want to estimate the policyholder's renewal premium in Year $n+1$. The renewal premium in Year $n+1$ is $E(S_{n+1}|S_1,S_2,\dots,S_n)$, the expected claim dollar amount in Year $n+1$.

Here I wrote the past $n$ years' claim amounts as $S_1, S_2, \dots, S_n$ instead of $X_1, X_2, \dots, X_n$ as in the Bühlmann credibility model. There's a reason for using a different notation. In the limited fluctuation credibility model, we typically break down the annual claim dollar amount $S$ into two components:

• the number of claims incurred by a policyholder in a year (loss frequency)
• the claim dollar amount per loss incurred by a policyholder in a year (loss severity)

Mathematically, $S=\sum_{i=1}^{N}X_i$, where $N$ is the number of claims (loss frequency) incurred by a policyholder. $X_i$ is the claim dollar amount of the $i$-th claim (loss severity) incurred by the policyholder. $S$ is the total claim dollar amount incurred in a year (also called the annual aggregate claim) by the policyholder. In contrast, in the Bühlmann credibility model, we don't break down the annual claim dollar amount into loss frequency and loss severity.

In the limited fluctuation credibility model, we assume, as in the Bühlmann credibility model, that the renewal premium is the weighted average of the global premium rate $\mu$ (called the manual rate) and the sample mean $\bar S=\frac{1}{n}(S_1+S_2+\dots+S_n)$:

$$P_{\text{renewal}}=E(S_{n+1}|S_1,S_2,\dots,S_n)=Z\underbrace{\bar S}_{\text{policyholder-specific sample mean}}+(1-Z)\underbrace{\mu}_{\text{global mean (manual rate)}}$$

Different policyholders have different claim amounts $S_1, S_2, \dots, S_n$ and hence different $\bar S$. However, $\mu$ is the same for all policyholders regardless of their different claim history.

The limited fluctuation credibility model assumes that the above renewal premium equation automatically holds true without any proof. This equation is the starting point for the limited fluctuation credibility. So when you study the limited fluctuation credibility, you'll need to accept the above equation without demanding proof.

In contrast, the Bühlmann credibility theory doesn't assume the above equation holds true automatically. It derives this equation using basic probability theories.

Next, we need to calculate the weighting factor $Z$ ($0\le Z\le 1$), which is the credibility assigned to the prior sample mean $\bar S$. The limited fluctuation credibility model calculates $Z$ as follows:

$$Z=\sqrt{\frac{\text{your }n}{\text{expected \# of observations needed to make }Z=1}}=\sqrt{\frac{\text{your }n}{E(N)\text{ to make }Z=1}}$$

Once again, the limited fluctuation credibility model assumes that this formula holds true automatically without the need to prove it. So you need to accept it without demanding any proof. The core theory of the limited fluctuation credibility is to calculate $E(N)$ to make $Z=1$.

We first derive a model for $r$ insureds. Then to calculate the renewal premium for one insured, we just set $r=1$.

The aggregate annual loss for $r$ insureds is:

$$S=\sum_{i=1}^{M}X_i=X_1+X_2+\dots+X_M\,,\qquad M=\sum_{j=1}^{r}N_j$$

where $M$ is the total # of annual claims for the $r$ insureds; $N_j$ is the number of claims incurred by the $j$-th insured.

We assume that $X_1, X_2, \dots, X_M$ are independent identically distributed with a common pdf $f_X(x)$, and $N_1, N_2, \dots, N_r$ are independent identically distributed with a common pdf $f_N(n)$.

The full credibility standard requires that $S$ be within $100k\%$ of its mean with probability at least $p$:

$$P\left(\left|S-E(S)\right|\le k\,E(S)\right)\ge p
\quad\Longleftrightarrow\quad
P\left(\frac{\left|S-E(S)\right|}{\sigma_S}\le\frac{k\,E(S)}{\sigma_S}\right)\ge p$$

Assume $S$ is approximately normal, and let $a=\dfrac{k\,E(S)}{\sigma_S}$. Then

$$P\left(-a\le\frac{S-E(S)}{\sigma_S}\le a\right)=\Phi(a)-\Phi(-a)=2\Phi(a)-1$$

Setting $2\Phi(a)-1=p$ gives $\Phi(a)=\dfrac{1+p}{2}$, so $a=y$, where $y=\Phi^{-1}\!\left(\dfrac{1+p}{2}\right)$ is the $\dfrac{1+p}{2}$ percentile of the standard normal distribution.

Define $CV_S=\dfrac{\sigma_S}{E(S)}$, the coefficient of variation of $S$ (standard deviation over the mean). Then the full credibility condition $a=\dfrac{k\,E(S)}{\sigma_S}=\dfrac{k}{CV_S}=y$ becomes

$$k=y\,CV_S$$

Next, compute $CV_S$:

$$E(M)=E(N_1+N_2+\dots+N_r)=r\,E(N)$$
$$Var(M)=Var(N_1+N_2+\dots+N_r)=r\,Var(N)$$
$$E(S)=E(X_1+X_2+\dots+X_M)=E(M)\,E(X)=r\,E(N)\,E(X)$$
$$Var(S)=E(M)\,Var(X)+Var(M)\,E^2(X)=r\left[E(N)\,Var(X)+Var(N)\,E^2(X)\right]$$

$$CV_S=\frac{\sigma_S}{E(S)}=\frac{\sqrt{r\left[E(N)\,Var(X)+Var(N)\,E^2(X)\right]}}{r\,E(N)\,E(X)}$$

Substituting into $k=y\,CV_S$, squaring, and solving for $r\,E(N)$:

$$r\,E(N)=\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)}+\frac{Var(N)}{E(N)}\right]=\left(\frac{y}{k}\right)^2\left[CV_X^2+\frac{Var(N)}{E(N)}\right]$$

If $r\,E(N)=\left(\dfrac{y}{k}\right)^2\left[CV_X^2+\dfrac{Var(N)}{E(N)}\right]$, then $Z=1$. You should also know how to derive this from scratch: this is the mother of all the formulas for the limited fluctuation credibility model.

Please note that $r$ is the number of insureds needed to achieve full credibility and $E(N)$ is the expected number of annual claims per insured, so $r\,E(N)$ represents the expected number of claims the insurer needs to have in its book of business to have full credibility:

$$\underbrace{r}_{\text{\# of insureds in the book of business}}\times\underbrace{E(N)}_{\text{expected \# of claims per insured}}=\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)}+\frac{Var(N)}{E(N)}\right]$$

A common mistake is to write $\dfrac{Var(X)}{E(X)}$ instead of $\dfrac{Var(X)}{E^2(X)}$. Wrong! You need to square $E(X)$ so the numerator and denominator are both dollars squared. Please also note that $\dfrac{Var(N)}{E(N)}$ (not $\dfrac{Var(N)}{E^2(N)}$) is fine, because $N$ is a pure count.

Once again, remember that $X$ is the dollar amount of a single claim incurred by one policyholder and that $N$ is the annual number of claims incurred by the policyholder.

Special case
Credibility formulas for the aggregate loss for one insured (credibility in terms of the expected number of annual claims)

Set $r=1$. If

$$E(N)=\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)}+\frac{Var(N)}{E(N)}\right]$$

then $Z=1$. For partial credibility,

$$Z=\min\left(\sqrt{\frac{\text{your }n}{E(N)\text{ to make }Z=1}}\,,\ 1\right)=\min\left(\sqrt{\frac{n}{n_0\left[CV_X^2+\dfrac{Var(N)}{E(N)}\right]}}\,,\ 1\right)\,,\qquad n_0=\left(\frac{y}{k}\right)^2$$
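The full-credibility standard and the square-root partial-credibility rule fit in a few lines of Python. A minimal sketch (the function names are mine; y is the standard normal percentile Φ⁻¹((1+p)/2), passed in directly):

```python
import math

def full_credibility_claims(y, k, cv_x_sq, var_over_mean_n):
    """Expected number of claims for full credibility:
    (y/k)^2 * (CV_X^2 + Var(N)/E(N))."""
    return (y / k) ** 2 * (cv_x_sq + var_over_mean_n)

def partial_z(n, n_full):
    """Square-root rule for partial credibility, capped at 1."""
    return min(math.sqrt(n / n_full), 1.0)

# Example: Poisson frequency (Var(N)/E(N) = 1), severity CV = 3,
# within 5% of the mean with probability 95% (y = 1.96).
n_full = full_credibility_claims(1.96, 0.05, 9.0, 1.0)
print(round(n_full, 1), round(partial_z(1000, n_full), 3))   # 15366.4 0.255
```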

You are given:

• Claim counts follow a Poisson distribution
• Claim sizes follow a lognormal distribution with coefficient of variation of 3
• Claim sizes and claim counts are independent
• The number of claims in the 1st year is 1,000
• The aggregate loss in the 1st year was 6.75 million
• The manual premium for the 1st year was 5 million
• The exposure in the 2nd year is identical to the exposure in the 1st year
• The full credibility standard is to be within 5% of the expected aggregate loss 95% of the time

Determine the limited fluctuation credibility net premium (in millions) for the 2nd year.

Solution

We are asked to find the limited fluctuation credibility renewal net premium for Year 2. So we are just concerned with one policy (one insured). Set $r=1$.

The full credibility standard for the aggregate loss is:

$$E(N)=\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)}+\frac{Var(N)}{E(N)}\right]\,,\qquad y=\Phi^{-1}\left(\frac{1+95\%}{2}\right)=1.96\,,\quad k=5\%$$

We are told that the claim size $X$ is lognormal with a coefficient of variation of 3. The information that $X$ is lognormal is not needed; SOA just wants to scare us. What matters is $CV_X=3$.

In addition, we know that $N$ is Poisson, so $\dfrac{Var(N)}{E(N)}=1$.

$$E(N)=\left(\frac{1.96}{5\%}\right)^2\left(3^2+1\right)=15{,}366.4$$

$$Z=\min\left(\sqrt{\frac{\text{your }n}{E(N)\text{ to make }Z=1}}\,,\ 1\right)=\min\left(\sqrt{\frac{1000}{15{,}366.4}}\,,\ 1\right)=0.255$$

$$P=Z\bar S+(1-Z)\mu=0.255(6.75)+(1-0.255)(5)=5.446\text{ million}$$

For each individual insured, the number of claims follows a Poisson distribution. The mean claim count varies by insured, and the distribution of mean claim counts follows a gamma distribution. For a random sample of 1000 insureds, the observed claim counts are as follows:

# of claims, n       0     1     2     3    4    5
# of insureds, fₙ    512   307   123   41   11   6

$$\sum n\,f_n=750\,,\qquad \sum n^2 f_n=1494$$

Claim sizes follow a Pareto distribution with mean 1,500 and variance 6,750,000. Claim counts and claim sizes are independent.

The full credibility standard is to be within 5% of the expected aggregate loss 95% of the time.

Determine the minimum number of insureds needed for the aggregate loss to be fully credible.

Solution

$$r\,E(N)=\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)}+\frac{Var(N)}{E(N)}\right]
\quad\Rightarrow\quad
r=\frac{1}{E(N)}\left(\frac{1.96}{5\%}\right)^2\left[\frac{Var(X)}{E^2(X)}+\frac{Var(N)}{E(N)}\right]$$

$$CV_X^2=\frac{Var(X)}{E^2(X)}=\frac{6{,}750{,}000}{1500^2}=3$$

$$E(N)=\frac{\sum n\,f_n}{1000}=\frac{750}{1000}=0.75\,,\qquad E(N^2)=\frac{\sum n^2 f_n}{1000}=\frac{1494}{1000}=1.494$$

$$Var(N)=1.494-0.75^2=0.9315\,,\qquad \frac{Var(N)}{E(N)}=\frac{0.9315}{0.75}=1.242$$

$$r=\frac{1}{0.75}\left(\frac{1.96}{5\%}\right)^2\left(3+1.242\right)=\frac{6518.43}{0.75}=8{,}691.24$$

So at least 8,692 insureds are needed.
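When the full credibility standard is expressed in insureds rather than claims, divide the claim standard by E(N). A quick Python sketch reproducing this problem's arithmetic:

```python
# Frequency moments from the observed claim-count table
e_n = 750 / 1000                            # E(N) = 0.75
e_n2 = 1494 / 1000                          # E(N^2) = 1.494
var_over_mean = (e_n2 - e_n ** 2) / e_n     # Var(N)/E(N) = 1.242

cv_x_sq = 6_750_000 / 1500 ** 2             # Pareto severity: CV^2 = 3

y, k = 1.96, 0.05                           # within 5%, 95% of the time
n_insureds = (1 / e_n) * (y / k) ** 2 * (cv_x_sq + var_over_mean)
print(round(n_insureds, 2))                 # 8691.24
```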


You are given the following information about a general liability book of business comprised of 2,500 insureds:

• $X_i=\sum_{j=1}^{N_i}Y_{ij}$ is the annual loss for insured $i$
• $N_1, N_2, \dots$ are independent and identically distributed random variables following a negative binomial distribution with parameters $r=2$ and $\beta=0.2$
• $Y_{i1}, Y_{i2}, \dots, Y_{iN_i}$ are independent and identically distributed random variables following a Pareto distribution with parameters $\alpha=3$ and $\theta=1000$
• The full credibility standard is to be within 5% of the expected aggregate losses 90% of the time

Using classical credibility theory, determine the partial credibility of the annual loss experience for the book of business.

Solution

First, let's calculate the # of insureds needed to have full credibility:

$$r_{\text{full}}=\frac{1}{E(N)}\left(\frac{y}{k}\right)^2\left[\frac{Var(Y)}{E^2(Y)}+\frac{Var(N)}{E(N)}\right]$$

However, $\dfrac{Var(Y)}{E^2(Y)}=\dfrac{E(Y^2)-E^2(Y)}{E^2(Y)}=\dfrac{E(Y^2)}{E^2(Y)}-1$, so

$$r_{\text{full}}=\frac{1}{E(N)}\left(\frac{y}{k}\right)^2\left[\frac{E(Y^2)}{E^2(Y)}-1+\frac{Var(N)}{E(N)}\right]$$

For the negative binomial frequency: $E(N)=r\beta=2(0.2)=0.4$ and $Var(N)=r\beta(1+\beta)$, so

$$\frac{Var(N)}{E(N)}=1+\beta=1+0.2=1.2$$

For the Pareto severity with $\alpha=3$ and $\theta=1000$:

$$E(Y^k)=\frac{\theta^k\,k!}{(\alpha-1)(\alpha-2)\cdots(\alpha-k)}$$

$$E(Y)=\frac{\theta}{\alpha-1}\,,\qquad E(Y^2)=\frac{2\theta^2}{(\alpha-1)(\alpha-2)}\,,\qquad \frac{E(Y^2)}{E^2(Y)}=\frac{2(\alpha-1)}{\alpha-2}=\frac{2(3-1)}{3-2}=4$$

$$r_{\text{full}}=\frac{1}{0.4}\left(\frac{1.645}{5\%}\right)^2\left(4-1+1.2\right)=10.5\left(\frac{1.645}{5\%}\right)^2=11{,}365.305$$

This is the number of insureds needed to get full credibility. However, the number of insureds in the book of business is 2,500:

$$Z=\sqrt{\frac{\text{your }r}{r\text{ to make }Z=1}}=\sqrt{\frac{2500}{11{,}365.305}}=0.469$$
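Putting the severity-moment ratio and the square-root rule together for this book of business; a Python sketch with the Pareto raw moments computed from α and θ:

```python
import math

# Frequency: negative binomial with r = 2, beta = 0.2
r_nb, beta = 2, 0.2
var_over_mean_n = 1 + beta            # Var(N)/E(N) for a negative binomial
e_n = r_nb * beta                     # 0.4 expected claims per insured

# Severity: Pareto with alpha = 3, theta = 1000
alpha, theta = 3, 1000
e_y = theta / (alpha - 1)                              # E(Y) = 500
e_y2 = 2 * theta ** 2 / ((alpha - 1) * (alpha - 2))    # E(Y^2) = 1,000,000
cv_y_sq = e_y2 / e_y ** 2 - 1                          # CV^2 = 3

y, k = 1.645, 0.05                    # within 5%, 90% of the time
insureds_full = (1 / e_n) * (y / k) ** 2 * (cv_y_sq + var_over_mean_n)
z = min(math.sqrt(2500 / insureds_full), 1.0)
print(round(insureds_full, 1), round(z, 3))   # 11365.3 0.469
```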

You are given the following information about a commercial auto liability book of business:

• Each insured's claim count has a Poisson distribution with mean $\lambda$, where $\lambda$ has a gamma distribution with $\alpha=1.5$ and $\theta=0.2$
• Individual claim size amounts are independent and exponentially distributed with mean 5,000
• The full credibility standard is for the aggregate losses to be within 5% of the expected with probability 0.9

Using classical credibility, determine the expected number of claims required for full credibility.

Solution

$$r\,E(N)=\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)}+\frac{Var(N)}{E(N)}\right]\,,\qquad y=\Phi^{-1}\left(\frac{1+90\%}{2}\right)=1.645\,,\quad k=5\%$$

The claim count $N$ is a gamma-Poisson mixture, so $N$ is negative binomial with $r=\alpha=1.5$ and $\beta=\theta=0.2$:

$$E(N)=r\beta\,,\qquad Var(N)=r\beta(1+\beta)\,,\qquad \frac{Var(N)}{E(N)}=1+\beta=1+0.2=1.2$$

$X$ is exponentially distributed, so $\dfrac{Var(X)}{E^2(X)}=1$.

$$r\,E(N)=\left(\frac{1.645}{5\%}\right)^2\left(1+1.2\right)=2381.3$$

So the insurer needs to have at least 2,381 claims in a year to have full credibility.

Please note that the following information is not necessary for us to solve the problem:

• $\alpha=1.5$. $\dfrac{Var(N)}{E(N)}=1+\beta$ regardless of $\alpha$.
• The mean 5,000 for the individual claim size random variable. If $X$ is exponential, then $\dfrac{Var(X)}{E^2(X)}=1$ regardless of the mean.

Nov 2003 #3
You are given:

• The number of claims has a Poisson distribution
• Claim sizes have a Pareto distribution with parameters $\theta=0.5$ and $\alpha=6$
• The number of claims and claim sizes are independent
• The observed pure premium should be within 2% of the expected pure premium 90% of the time

Determine the expected number of claims needed for full credibility.

Solution

The pure premium is the expected total annual claim dollar amount incurred by one policyholder. Set $r=1$:

$$E(N)=\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)}+\frac{Var(N)}{E(N)}\right]=\left(\frac{y}{k}\right)^2\left[\frac{E(X^2)}{E^2(X)}-1+\frac{Var(N)}{E(N)}\right]$$

For the Pareto with $\theta=0.5$ and $\alpha=6$:

$$\frac{E(X^2)}{E^2(X)}=\frac{2(\alpha-1)}{\alpha-2}=\frac{2(6-1)}{6-2}=2.5$$

$N$ is Poisson, so $\dfrac{Var(N)}{E(N)}=1$.

$$E(N)=\left(\frac{1.645}{2\%}\right)^2\left(2.5-1+1\right)=16{,}913$$
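For Pareto severities the second-moment ratio collapses to 2(α−1)/(α−2), so θ drops out of the calculation entirely. A short Python sketch of this computation:

```python
# Pareto severity: E(X^2)/E(X)^2 = 2(alpha-1)/(alpha-2); theta cancels out
alpha = 6
moment_ratio = 2 * (alpha - 1) / (alpha - 2)   # = 2.5

y, k = 1.645, 0.02                # within 2%, 90% of the time
var_over_mean_n = 1               # Poisson frequency
n_claims = (y / k) ** 2 * (moment_ratio - 1 + var_over_mean_n)
print(round(n_claims))            # 16913
```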

You are given:

• The number of claims has probability function:

$$p(x)=\binom{m}{x}q^x(1-q)^{m-x}\,,\qquad x=0,1,2,\dots,m$$

• The actual number of claims must be within 1% of the expected number of claims with probability 0.95
• The expected number of claims needed for full credibility is 34,574

Determine $q$.

Solution

This problem is concerned only with the loss frequency, so in the aggregate loss model $S=\sum_{i=1}^{N}X_i$ we drop the severity term:

$$r\,E(N)=\left(\frac{y}{k}\right)^2\,\frac{Var(N)}{E(N)}$$

For the binomial distribution:

$$\frac{Var(N)}{E(N)}=\frac{mq(1-q)}{mq}=1-q$$

$$\left(\frac{1.96}{1\%}\right)^2(1-q)=34{,}574
\quad\Rightarrow\quad 1-q=0.9\,,\quad q=0.1$$

May 2005 #2
You are given:

• The number of claims follows a negative binomial distribution with parameters $r$ and $\beta=3$
• Claim severity has the following distribution:

Claim Size    1     10    100
Probability   0.4   0.4   0.2

• The number of claims is independent of the severity of claims

Determine the expected number of claims needed for aggregate losses to be within 10% of the expected aggregate losses with 95% probability.

Solution

$$E(X)=1(0.4)+10(0.4)+100(0.2)=24.4$$
$$E(X^2)=1^2(0.4)+10^2(0.4)+100^2(0.2)=2040.4$$
$$Var(X)=2040.4-24.4^2=1445.04$$

For the negative binomial, $\dfrac{Var(N)}{E(N)}=1+\beta=1+3=4$.

$$r\,E(N)=\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)}+\frac{Var(N)}{E(N)}\right]=\left(\frac{1.96}{10\%}\right)^2\left(\frac{1445.04}{24.4^2}+4\right)=2469.06$$

You are given:

• The number of claims follows a Poisson distribution
• Claim sizes follow a gamma distribution with parameters $\alpha$ (unknown) and $\theta=10{,}000$
• The number of claims and claim sizes are independent
• The full credibility standard has been selected so that actual aggregate losses will be within 10% of the expected aggregate losses 95% of the time

Using limited fluctuation (classical) credibility, determine the expected number of claims required for full credibility.

Solution

$$r\,E(N)=\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)}+\frac{Var(N)}{E(N)}\right]$$

Since $N$ is Poisson, $\dfrac{Var(N)}{E(N)}=1$. For the gamma severity, $E(X)=\alpha\theta$ and $Var(X)=\alpha\theta^2$, so

$$\frac{Var(X)}{E^2(X)}=\frac{\alpha\theta^2}{(\alpha\theta)^2}=\frac{1}{\alpha}$$

$$r\,E(N)=\left(\frac{1.96}{10\%}\right)^2\left(\frac{1}{\alpha}+1\right)=384.16\left(\frac{1}{\alpha}+1\right)$$

Note that $\theta=10{,}000$ is not needed; the answer is expressed in terms of the unknown $\alpha$.

Chapter 9

Bayesian estimate

Exam C routinely tests Bayesian premium problems. Though many seem to understand the theory behind Bayesian premiums, they have trouble calculating Bayesian premiums. Most candidates are weak in the following two areas:

When the prior probability is continuous, many candidates don't know how to calculate the posterior probability or how to find the Bayesian premium. Continuous-prior problems are typically harder than discrete-prior problems.

When the prior probability is discrete and the calculation is messy, many candidates don't know how to solve the problem in a few minutes. Many candidates have inefficient calculation methods that are long and prone to errors.

In this chapter, I will first give you an intuitive review of Bayes' Theorem. Next, I will give you a framework for quickly solving Bayesian premium problems whether the prior probability is discrete or continuous. In addition, I will give you a BA II Plus/BA II Plus Professional shortcut for calculating Bayesian premiums when the prior probability is discrete.

Even if you are proficient in Bayes' Theorem, I recommend that you still go over the review. It is the foundation for the framework and shortcut to be presented later.

Prior probability. Before anything happens, as our baseline analysis, we believe (based on existing information we have up to now or using purely subjective judgment) that our total risk pool consists of several homogeneous groups. As a part of our baseline analysis, we also assume that these homogeneous groups have different sizes. Any insured randomly chosen from the population is charged a weighted average premium.

As an over-simplified example, we can divide, by the aggressiveness of a person's driving habits, all insureds into two homogeneous groups: aggressive drivers and non-aggressive drivers. In regards to the sizes of these two groups, we assume (based on existing information we have up to now or using purely subjective judgment) that the aggressive insureds account for 40% of the total insureds and the non-aggressive insureds account for the remaining 60%.

So for an average driver randomly chosen from the population, we charge a weighted average premium rate (we believe that an average driver has some aggressiveness and some non-aggressiveness):

Premium rate for an average driver
= 40% × premium rate for an aggressive driver
+ 60% × premium rate for a non-aggressive driver

Posterior probability. Then after a year, an event changed our belief about the makeup of the homogeneous groups for a specific insured. For example, we found that in one year one particular insured had three car accidents while an average driver had only one accident in the same time period. So the three-accident insured definitely involved more risk than did the average driver randomly chosen from the population. As a result, the premium rate for the three-accident insured should be higher than an average driver's premium rate.

The new premium rate we will charge is still a weighted average of the rates for the two homogeneous groups, except that we use a higher weighting factor for an aggressive driver's rate and a lower weighting factor for a non-aggressive driver's rate.

For example, we can charge the following new premium rate:

Premium rate for a driver who had 3 accidents last year
= 67% × premium rate for an aggressive driver
+ 33% × premium rate for a non-aggressive driver

In other words, we still think this particular driver's risk consists of two risk groups, aggressive and non-aggressive, but we alter the sizes of these two risk groups for this specific insured. So instead of assuming that this person's risk consists of 40% of an aggressive driver's risk and 60% of a non-aggressive driver's risk, we assume that his risk consists of 67% of an aggressive driver's risk and 33% of a non-aggressive driver's risk.

How do we come up with the new group sizes (or the new weighting factors)? There is a specific formula for calculating the new group sizes. For any given group,

Group size after an event
= K × (the group size before the event) × (this group's probability to make the event happen)

K is a scaling factor to make the sum of the new sizes for all groups equal to 100%.

In our example above, this is how we got the new size for the aggressive group and the new size for the non-aggressive group. Suppose we know that the probability for an aggressive driver to have 3 car accidents in a year is 15%, and the probability for a non-aggressive driver to have 3 car accidents in a year is 5%. Then for the driver who had 3 accidents in a year:

the size of the aggressive risk for someone who had 3 accidents in a year
= K × (prior size of the aggressive risk) × (probability of an aggressive driver having 3 car accidents in a year)
= K (40%)(15%)

the size of the non-aggressive risk for someone who had 3 accidents in a year
= K × (prior size of the non-aggressive risk) × (probability of a non-aggressive driver having 3 car accidents in a year)
= K (60%)(5%)

K is a scaling factor such that the sum of the posterior sizes is equal to one. So

K (40%)(15%) + K (60%)(5%) = 1,   K = 1 / [40%(15%) + 60%(5%)] = 1/0.09 = 11.11

the size of the aggressive risk for someone who had 3 accidents in a year
= 11.11 (40%)(15%) = 66.67%

the size of the non-aggressive risk for someone who had 3 accidents in a year
= 11.11 (60%)(5%) = 33.33%

The above logic should make intuitive sense. The bigger the size of the group prior to the event, the higher the contribution this group will make to the event's occurrence; the bigger the probability for this group to make the event happen, the higher the contribution this group will make to the event's occurrence. So the product of the prior size of the group and the group's probability to make the event happen captures this group's total contribution to the event's occurrence.

If we assign the post-event size of a group proportional to the product of the prior size and the group's probability to make the event happen, we are really assigning the post-event size of a group proportional to this group's total contribution to the event's occurrence. Again, this should make sense.

Let's summarize the logic for finding the new size of each group in the following table:

A: Homogeneous groups (also called segments, which are the 2 components of a risk)
B: Before-event group size
C: Group's probability to make the event happen
D = (scaling factor K) × B × C: Post-event group size

Group            B      C      D = K·B·C
Aggressive       40%    15%    K(40%)(15%) = (40% × 15%) / (40% × 15% + 60% × 5%)
Non-aggressive   60%    5%     K(60%)(5%) = (60% × 5%) / (40% × 15% + 60% × 5%)

If we divide the population into n non-overlapping groups G1, G2, ..., Gn such that each element in the population belongs to one and only one group, then after the event E occurs,

Pr(Gi | E) = K Pr(Gi) Pr(E | Gi)

K is a scaling factor such that

Pr(G1 | E) + Pr(G2 | E) + ... + Pr(Gn | E) = 1

Or K [Pr(G1) Pr(E | G1) + Pr(G2) Pr(E | G2) + ... + Pr(Gn) Pr(E | Gn)] = 1

So K = 1 / [Pr(G1) Pr(E | G1) + Pr(G2) Pr(E | G2) + ... + Pr(Gn) Pr(E | Gn)]

And Pr(Gi | E) = Pr(Gi) Pr(E | Gi) / [Pr(G1) Pr(E | G1) + Pr(G2) Pr(E | G2) + ... + Pr(Gn) Pr(E | Gn)]

Pr(Gi | E) is the conditional probability that Gi will happen given that the event E happened, so it is called the posterior probability. Pr(Gi | E) can be conveniently interpreted as the new size of group Gi after the event E happened. Intuitively, a probability can often be interpreted as a group size.

For example, if the probability for a female to pass Course 4 is 55% and for a male 45%, we can say that the total pool of the passing candidates consists of 2 groups, female and male, with their respective sizes of 55% and 45%.


Pr(Gi) is the probability that Gi will happen prior to the event E's occurrence, so it's called the prior probability. Pr(Gi) can be conveniently interpreted as the size of group Gi prior to the occurrence of E.

Pr(E | Gi) is the conditional probability that E will happen given that Gi has happened. It is group Gi's probability of making the event E happen. For example, say a candidate who has passed Course 3 has a 50% chance of passing Course 4, that is:

Pr(passing Course 4 | passing Course 3) = 50%

We can say that the people who passed Course 3 have a 50% chance of passing Course 4.

Before we jump into the formula, let's look at a sixth-grade level math problem, which requires zero knowledge of probability. If you understand this problem, you should have no trouble understanding Bayes' Theorem.

Problem 1

A rock is found to contain gold. It has 3 layers, each with a different density of gold. You are given:

The top layer, which accounts for 80% of the mass of the rock, has a gold density of only 10% (i.e. the amount of gold contained in the top layer is equal to 10% of the mass of the top layer).

The middle layer, which accounts for 15% of the rock's mass, has a gold density of 5%.

The bottom layer, which accounts for only 5% of the rock's mass, has a gold density of 0.2%.

Questions

What is the rock's density of gold (i.e. what % of the rock's mass is gold)?

Of the total amount of gold contained in the rock, what % of the gold comes from the top layer? What % comes from the middle layer? What % comes from the bottom layer?

Solution

Let's set up a table to solve the problem. Assume that the mass of the rock is one (it can be 1 pound, 1 gram, or 1 ton; it doesn't matter).


     A        B           C            D = B x C         E = D / 0.0876
     Layer    Mass of     Density of   Mass of gold      Of the total amount of
              the layer   gold in the  contained in      gold in the rock, what %
                          layer        the layer         comes from this layer?
2    Top      0.80        10.0%        0.0800            91.3%
3    Middle   0.15        5.0%         0.0075            8.6%
4    Bottom   0.05        0.2%         0.0001            0.1%
5    Total    1.00                     0.0876            100%

Cell(D,2) = 0.8 x 10% = 0.08
Cell(D,5) = 0.0800 + 0.0075 + 0.0001 = 0.0876
Cell(E,2) = 0.08 / 0.0876 = 91.3%

So the rock has a gold density of 0.0876 (i.e. 8.76% of the mass of the rock is gold).

Of the total amount of gold contained in the rock, 91.3% of the gold comes from the top layer, 8.6% of the gold comes from the middle layer, and the remaining 0.1% of the gold comes from the bottom layer. In other words, the top layer contributes 91.3% of the gold in the rock, the middle layer 8.6%, and the bottom layer 0.1%.

The logic behind this simple math problem is exactly the same logic behind Bayes' Theorem.

Now let's change the problem into one about prior and posterior probabilities.

Problem 2

In underwriting life insurance applications for nonsmokers, an insurance company believes that there's an 80% chance that an applicant for life insurance qualifies for the standard nonsmoker class (which has the standard underwriting criteria and the standard premium rate); there's a 15% chance that an applicant qualifies for the preferred nonsmoker class (which has more stringent qualifying standards and a lower premium rate than the standard nonsmoker class); and there's a 5% chance that the applicant qualifies for the super preferred class (which has the highest underwriting standards and the lowest premium rate among nonsmokers).

According to medical statistics, different nonsmoker classes have different probabilities of having a specific heart-related illness:

The standard nonsmoker class has a 10% chance of getting the specific heart disease.

The preferred nonsmoker class has a 5% chance of getting the specific heart disease.

The super preferred nonsmoker class has a 0.2% chance of getting the specific heart disease.

If a nonsmoking applicant was found to have this specific heart-related illness, what is

the probability of this applicant coming from the standard risk class? What is the

probability of this applicant coming from the preferred risk class? What is the probability

of this applicant coming from the super preferred risk class?

Solution

The solution to this problem is exactly the same as the one to the rock problem.

Event: the applicant was found to have the specific heart disease

     A           B            C              D = B x C          E = D / 0.0876
1    Group       Before-      This group's   After-event size   (i.e. the scaling factor
     (or         event size   probability    of the group       = 1/0.0876)
     segment)    of the       of having      (not yet scaled)   After-event size of the
                 group        the specific                      group (scaled)
                              heart illness
2    Standard    0.80         10.0%          0.0800             91.3%
3    Preferred   0.15         5.0%           0.0075             8.6%
4    Super       0.05         0.2%           0.0001             0.1%
     Preferred
5    Total       1.00                        0.0876             100%

So if the applicant was found to have the specific heart disease, then

There's a 91.3% chance he comes from the standard risk class;
There's an 8.6% chance he comes from the preferred risk class;
There's a 0.1% chance he comes from the super preferred risk class.

Framework for calculating the discrete posterior probability

When calculating the discrete posterior probability, if the problem is tricky, try to set up the table as we did in Problem 1 and Problem 2. Use this table to help you keep track of your data and work.

Problem 3

1% of the women at age 45 who participate in a study are found to have breast cancer. 80% of women with breast cancer will have a positive mammogram. 10% of women without breast cancer will also have a positive mammogram. One woman aged 45 who participated in the study was found to have a positive mammogram. Calculate the probability that this woman actually has breast cancer.


Solution

This problem is tricky, and many folks won't be able to solve it right.

To solve this problem, we need to correctly identify the following 3 items:

What is the event? Here, a woman in the study is found to have a positive mammogram.

What are the distinct causes (i.e. segments) that can possibly produce the event? Make sure your causes are mutually exclusive (i.e. no two causes can happen simultaneously) and collectively exhaustive (i.e. there are no other causes). Here there are two distinct causes: women with breast cancer and women without breast cancer. These are the two segments. In terms of the size of each segment, women with breast cancer account for 1% of the participants; women without breast cancer account for 99%.

What is each cause's probability of producing the event? Women with breast cancer have an 80% chance of having a positive mammogram. Women without breast cancer have a 10% chance of having a positive mammogram.

Next, we set up the following table:

Event: a woman in the study is found to have a positive mammogram.

Segment (distinct      Segment's   Segment's        Segment's               Segment's contribution %
causes)                size        probability to   contribution amount     to the event (post-event
                                   produce the      to the event            probability)
                                   event
Women with breast      1%          80%              1%(80%) = 0.008         0.008/0.107 = 7.48%
cancer
Women without breast   99%         10%              99%(10%) = 0.099        0.099/0.107 = 92.52%
cancer
Total                  100%                         0.107                   100%

So if a woman in the study has a positive mammogram, she has only a 7.48% chance of actually having breast cancer.

Problem 4

A health study tracked a group of persons for five years. At the beginning of the study, 20% were classified as heavy smokers, 30% as light smokers, and 50% as nonsmokers. Results of the study showed that light smokers were twice as likely as nonsmokers to die during the five-year study, but only half as likely as heavy smokers.

A randomly selected participant from the study died over the five-year period. Calculate the probability that the participant was a heavy smoker.

Solution

Let p = the probability that a nonsmoker will die during the next 5 years. Then:

The probability that a light smoker will die during the next 5 years is 2p.
The probability that a heavy smoker will die during the next 5 years is 4p.

Please note that we don't have enough information to calculate p. This shouldn't bother us: we don't need to know the value of p to solve the problem.

Event: a participant died during the 5-year period

Segment        Segment   Segment's probability   Segment's             Segment's
               size      to produce the event    contribution amount   contribution %
Heavy smoker   20%       4p                      20%(4p) = 0.8p        0.8p/1.9p = 42.11%
Light smoker   30%       2p                      30%(2p) = 0.6p        0.6p/1.9p = 31.58%
Nonsmoker      50%       p                       50%(p) = 0.5p         0.5p/1.9p = 26.32%
Total          100%                              1.9p                  100.00%

The probability that the participant was a heavy smoker is 42.11%.
The probability that the participant was a light smoker is 31.58%.
The probability that the participant was a nonsmoker is 26.32%.

Moral of this problem

In problems related to Bayes' Theorem, the absolute size of each segment doesn't matter; only the ratio of the segment sizes matters. Similarly, the absolute probability for each segment to produce the event doesn't matter; only the ratio of the probabilities matters.

If we are to solve this problem quickly, we can set up the following table:

Segment        Segment   Segment's probability   Segment's             Segment's
               size      to produce the event    contribution amount   contribution %
Heavy smoker   2         4                       2(4) = 8              8/19 = 42.11%
Light smoker   3         2                       3(2) = 6              6/19 = 31.58%
Nonsmoker      5         1                       5(1) = 5              5/19 = 26.32%
Total          10                                19                    100%

In the above table, we changed the segment sizes from 20%, 30%, and 50% to 2, 3, and 5. Similarly, we changed the segments' probabilities from 4p, 2p, and p to 4, 2, and 1. This speeds up the calculation. You can use this technique when taking the exam.
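The ratio trick can be checked numerically. The sketch below (function name is mine) shows that the scaled integer inputs give exactly the same posterior as the original percentage inputs:

```python
# Posterior probabilities from prior sizes and event probabilities;
# only ratios matter, so any common scaling of either input cancels.
def posterior(sizes, probs):
    raw = [s * p for s, p in zip(sizes, probs)]
    total = sum(raw)
    return [r / total for r in raw]

# Original inputs: sizes 20%/30%/50%, death probabilities 4p/2p/p
# (p cancels, so take p = 1)
exact = posterior([0.20, 0.30, 0.50], [4, 2, 1])
# Scaled-up inputs from the quick table: sizes 2/3/5, probabilities 4/2/1
scaled = posterior([2, 3, 5], [4, 2, 1])
print(exact[0], scaled[0])   # both equal 8/19 = 0.4211
```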

Problem 5 (May 2000, #22)

You are given:

A portfolio of independent risks is divided into two classes, Class A and Class B.

There are twice as many risks in Class A as in Class B.

The number of claims for each insured during a single year follows a Bernoulli

distribution.

Class A and B have claim size distributions as follows:

Claim Size   Class A   Class B
50,000       0.60      0.36
100,000      0.40      0.64

The expected number of claims per year is 0.22 for Class A and 0.11 for Class B.

One insured is chosen at random. The insured's loss for the two years combined is 100,000. Calculate the probability that the selected insured belongs to Class A.

Solution

This time, we'll use a formula-driven approach without a table. Let S represent the total claim dollar amount incurred by the randomly chosen insured during the 2-year period. We observe that S = 100,000. We are asked to find P(A | S = 100,000), the posterior probability that the insured belongs to Class A given that a total loss of $100,000 was incurred during the 2-year period.

Using either the conditional probability formula or Bayes' Theorem, we have:

P(A | S = 100,000) = P(A) P(S = 100,000 | A) / P(S = 100,000)

where P(S = 100,000) = P(A) P(S = 100,000 | A) + P(B) P(S = 100,000 | B). So

P(A | S = 100,000) = 1 / { 1 + [P(B) P(S = 100,000 | B)] / [P(A) P(S = 100,000 | A)] }

Because there are twice as many risks in Class A as in Class B, P(B)/P(A) = 1/2, and

P(A | S = 100,000) = 1 / { 1 + (1/2) [P(S = 100,000 | B) / P(S = 100,000 | A)] }

Notice that only two ratios matter:

the ratio of P(A) and P(B), not their absolute amounts;
the ratio of P(S = 100,000 | B) and P(S = 100,000 | A), not their absolute amounts.

P(S = 100,000 | A) is the probability that Class A produces the observation (i.e. that a Class A insured incurs a total loss of 100,000 over the two years).

We are told that the number of claims for Class A and B is a Bernoulli random variable. Remember that a Bernoulli random variable is just a binomial random variable with n = 1 (only one trial). Let X represent the number of claims incurred by the insured in a year, and let p represent the probability for the insured to have a claim. Then E(X) = p. We are told that E(X | A) = 0.22, so pA = 0.22. Similarly, E(X | B) = pB = 0.11.

So each year, a Class A insured has either zero claims (with probability 0.78) or one claim (with probability 0.22). The claim amount is either 50,000 (probability 0.6) or 100,000 (probability 0.4). Each year, a Class B insured has either zero claims (with probability 0.89) or one claim (with probability 0.11). The claim amount is either 50,000 (probability 0.36) or 100,000 (probability 0.64).


There are only 3 ways for Class A or B to produce $100,000 of claims in two years:

Have a $50,000 claim in Year 1 and a $50,000 claim in Year 2.
Have a $100,000 claim in Year 1 and $0 of claims in Year 2.
Have $0 of claims in Year 1 and a $100,000 claim in Year 2.

P(S = 100,000 | A) = (0.22^2)(0.6^2) + 2(0.22)(0.78)(0.4) = 0.1547
P(S = 100,000 | B) = (0.11^2)(0.36^2) + 2(0.11)(0.89)(0.64) = 0.1269

P(A | S = 100,000) = 1 / { 1 + (1/2) [P(S = 100,000 | B) / P(S = 100,000 | A)] }
= 1 / [1 + (1/2)(0.1269 / 0.1547)] = 0.709
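A quick numeric check of this solution. The helper below (name and parameters are mine) enumerates the three ways each class can reach a 100,000 total over two years:

```python
# Two-year likelihood of a combined loss of exactly 100,000:
# claim occurs each year w.p. q; severity is 50,000 (p50) or 100,000 (p100).
def p_total_100k(q, p50, p100):
    # 50k+50k, 100k+0, and 0+100k are the only three ways
    return (q * p50) ** 2 + 2 * (q * p100) * (1 - q)

pA = p_total_100k(0.22, 0.60, 0.40)   # 0.1547
pB = p_total_100k(0.11, 0.36, 0.64)   # 0.1269
post_A = (2 * pA) / (2 * pA + pB)     # Class A is twice as likely a priori
print(round(pA, 4), round(pB, 4), round(post_A, 3))
```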

Problem 6 (continuous random variable)

You are tossing a coin. Not knowing p, the probability of a head showing up in one toss of the coin, you subjectively assume that p is uniformly distributed over [0, 1]. Next, you do an experiment by tossing the coin 3 times. You find that, in this experiment, 2 out of the 3 tosses are heads.

Calculate the posterior density of p.

Solution

     A            B             C                    D = B x C
1    Group        Before-event  This group's         After-event size      After-event size of the
                  size of the   probability to       of the group          group (scaled)
                  group         make the event       (not yet scaled)
                                happen
2    Any p in     1             C(3,2) p^2 (1-p)     C(3,2) p^2 (1-p)      C(3,2) p^2 (1-p)
     [0, 1]                                                                / ∫[0,1] C(3,2) p^2 (1-p) dp
     Total                                           ∫[0,1] C(3,2)         100%
                                                     p^2 (1-p) dp

The key to solving this problem is to understand that we have an infinite number of groups. Each value of p (0 ≤ p ≤ 1) is a group. Because p is uniform over [0, 1], f(p) = 1. As a result, for a given group p, the before-event size is one. And for a given group p, this group's probability of making the event (getting 2 heads out of 3 tosses) happen is the binomial probability C(3,2) p^2 (1-p). So the after-event size is

k x (group size) x (probability to have 2 heads out of 3 tosses) = k C(3,2) p^2 (1-p)

k is a scaling factor such that the sum of the after-event sizes over all the groups is equal to one. Since we have an infinite number of groups, we have to use integration to sum up the after-event sizes:

k ∫[0,1] C(3,2) p^2 (1-p) dp = 1,  so  k = 1 / ∫[0,1] C(3,2) p^2 (1-p) dp

The posterior density is therefore

k C(3,2) p^2 (1-p) = C(3,2) p^2 (1-p) / ∫[0,1] C(3,2) p^2 (1-p) dp = p^2 (1-p) / ∫[0,1] p^2 (1-p) dp = 12 p^2 (1-p)

It turns out that the posterior density we just calculated is a Beta distribution.

Key point

The process for calculating the continuous posterior probability is the same as for calculating the discrete posterior probability. The only difference is this: you use integration for the continuous posterior probability; you use summation for the discrete posterior probability.
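As a numeric check, the sketch below (function name is mine) normalizes p^2 (1-p) by brute-force numerical integration and agrees with the closed form 12 p^2 (1-p), the Beta(3, 2) density:

```python
# Numerically normalize the posterior kernel p^2 (1-p) over [0, 1]
# and compare with the closed form 12 p^2 (1-p).
def posterior_density(p, n_grid=100000):
    # midpoint-rule estimate of the normalizing integral (~ 1/12)
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    norm = sum(g * g * (1 - g) for g in grid) / n_grid
    return p * p * (1 - p) / norm

print(round(posterior_density(0.5), 3))   # 12 * 0.25 * 0.5 = 1.5
```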

Problem 7 (May 2000 #10)

The size of a claim for an individual insured follows an inverse exponential distribution with the following probability density function:

f(x | θ) = θ e^(-θ/x) / x^2,  x > 0

The parameter θ has a prior distribution with the following probability density function:

g(θ) = (1/4) e^(-θ/4),  θ > 0

One claim of size 2 has been observed for a particular insured. Which of the following is proportional to the posterior distribution of θ?

Solution

The observation is x = 2. We need to find the posterior density g(θ | x = 2):

g(θ | x = 2) = k g(θ) f(x = 2 | θ)

where k is a scaling factor, g(θ) is the prior density, and f(x = 2 | θ) is this θ's probability of making the event happen:

f(x = 2 | θ) = θ e^(-θ/2) / 2^2 = (θ/4) e^(-θ/2)

g(θ | x = 2) = k (1/4) e^(-θ/4) (θ/4) e^(-θ/2) = (k/16) θ e^(-3θ/4)

So the posterior is proportional to θ e^(-3θ/4).

Here the problem didn't ask you to find the full posterior density. If you have to find it, this is how. One way is to do the integration. Assume g(θ | x = 2) = K θ e^(-3θ/4). Because the total posterior probability should be one, we have:

∫[0,∞] g(θ | x = 2) dθ = K ∫[0,∞] θ e^(-3θ/4) dθ = 1,  so  K = 1 / ∫[0,∞] θ e^(-3θ/4) dθ

To calculate ∫[0,∞] θ e^(-3θ/4) dθ, set 3θ/4 = y. Then θ = (4/3) y, dθ = (4/3) dy, and

∫[0,∞] θ e^(-3θ/4) dθ = (4/3)^2 ∫[0,∞] y e^(-y) dy = 16/9

Here ∫[0,∞] y e^(-y) dy = 1 because y e^(-y) is a simple gamma density. So K = 9/16, and

g(θ | x = 2) = (9/16) θ e^(-3θ/4)

Alternatively, look at the table for the gamma distribution with α = 2 and scale parameter 4/3:

f(θ) = [1/Γ(2)] [1/(4/3)^2] θ^(2-1) e^(-θ/(4/3)) = (9/16) θ e^(-3θ/4)

You are given:

In a portfolio of risks, each policyholder can have at most one claim per year.
The probability of a claim for a policyholder during a year is q.
The prior density of q is π(q) = q^3 / 0.07, 0.6 < q < 0.8.

A randomly selected policyholder has one claim in Year 1 and zero claims in Year 2. For this policyholder, determine the posterior probability that 0.7 < q < 0.8.

Solution

The observation is N1 = 1 and N2 = 0. We are asked to find the posterior probability

P(0.7 < q < 0.8 | N1 = 1, N2 = 0) = ∫[0.7, 0.8] f(q | N1 = 1, N2 = 0) dq

f(q | N1 = 1, N2 = 0) = π(q) P(N1 = 1, N2 = 0 | q) / P(N1 = 1, N2 = 0)
= π(q) P(N1 = 1, N2 = 0 | q) / ∫[0.6, 0.8] π(q) P(N1 = 1, N2 = 0 | q) dq

Given q, the claim counts in the two years are independent, so

π(q) P(N1 = 1, N2 = 0 | q) = (q^3 / 0.07) q (1 - q) = (q^4 - q^5) / 0.07

f(q | N1 = 1, N2 = 0) = [(q^4 - q^5) / 0.07] / ∫[0.6, 0.8] [(q^4 - q^5) / 0.07] dq = (q^4 - q^5) / ∫[0.6, 0.8] (q^4 - q^5) dq

P(0.7 < q < 0.8 | N1 = 1, N2 = 0) = ∫[0.7, 0.8] (q^4 - q^5) dq / ∫[0.6, 0.8] (q^4 - q^5) dq

= [ (1/5)(0.8^5 - 0.7^5) - (1/6)(0.8^6 - 0.7^6) ] / [ (1/5)(0.8^5 - 0.6^5) - (1/6)(0.8^6 - 0.6^6) ]

= 0.5572
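The integral ratio above is easy to verify with the antiderivative q^5/5 - q^6/6; the sketch below (names are mine) does exactly that:

```python
# Posterior probability P(0.7 < q < 0.8 | one claim, then none).
# The posterior density is proportional to q^3 * q(1-q) = q^4 - q^5 on (0.6, 0.8).
def integral(a, b):
    # antiderivative of q^4 - q^5 is q^5/5 - q^6/6
    F = lambda q: q ** 5 / 5 - q ** 6 / 6
    return F(b) - F(a)

prob = integral(0.7, 0.8) / integral(0.6, 0.8)
print(round(prob, 4))   # 0.5572
```

Note that the constant 1/0.07 from the prior cancels in the ratio, which is why it never needs to be carried through the calculation.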

You are given:

The number of claims for each policyholder follows a Poisson distribution with mean λ.
The distribution of λ across all policyholders has probability density function π(λ) = λ e^(-λ), λ > 0.
∫[0,∞] λ e^(-nλ) dλ = 1/n^2

A randomly selected policyholder is known to have had at least one claim last year. Determine the posterior probability that this same policyholder will have at least one claim this year.

Solution

The observation is N1 ≥ 1. We are asked to find P(N2 ≥ 1 | N1 ≥ 1). If we ignore N1 ≥ 1, then by conditioning on λ, we have:

P(N2 ≥ 1) = ∫[0,∞] P(N2 ≥ 1 | λ) f(λ) dλ

Given λ, N2 is Poisson, so P(N2 ≥ 1 | λ) = 1 - P(N2 = 0 | λ) = 1 - e^(-λ), and

P(N2 ≥ 1) = ∫[0,∞] (1 - e^(-λ)) f(λ) dλ

Considering the observation, we replace f(λ) with the posterior density f(λ | N1 ≥ 1):

P(N2 ≥ 1 | N1 ≥ 1) = ∫[0,∞] (1 - e^(-λ)) f(λ | N1 ≥ 1) dλ

Next, we have:

f(λ | N1 ≥ 1) = π(λ) P(N1 ≥ 1 | λ) / ∫[0,∞] π(λ) P(N1 ≥ 1 | λ) dλ = λ e^(-λ) (1 - e^(-λ)) / ∫[0,∞] λ e^(-λ) (1 - e^(-λ)) dλ

∫[0,∞] λ e^(-λ) (1 - e^(-λ)) dλ = ∫[0,∞] λ e^(-λ) dλ - ∫[0,∞] λ e^(-2λ) dλ = 1 - 1/2^2 = 3/4

So f(λ | N1 ≥ 1) = (4/3) λ e^(-λ) (1 - e^(-λ)), and

P(N2 ≥ 1 | N1 ≥ 1) = (4/3) ∫[0,∞] (1 - e^(-λ))^2 λ e^(-λ) dλ
= (4/3) ∫[0,∞] λ (e^(-λ) - 2 e^(-2λ) + e^(-3λ)) dλ
= (4/3) (1 - 2/2^2 + 1/3^2) = (4/3)(1 - 1/2 + 1/9) = 0.8148
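The final number can be checked with the given identity ∫[0,∞] λ e^(-nλ) dλ = 1/n^2; the variable names below are mine:

```python
# P(N2 >= 1 | N1 >= 1) under the prior pi(lambda) = lambda * e^(-lambda),
# using integral_0^inf lambda * e^(-n*lambda) d lambda = 1/n^2.
num = 1 - 2 * (1 / 2 ** 2) + 1 / 3 ** 2   # integral of lam*e^-lam*(1-e^-lam)^2
den = 1 - 1 / 2 ** 2                       # integral of lam*e^-lam*(1-e^-lam)
print(round(num / den, 4))                 # 0.8148
```

Notice that the 4/3 normalizing factor is just 1/den, so the answer is simply the ratio of the two integrals.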

Next, I'll give you a framework for how to calculate Bayesian problems. As I explain my framework, I will also give you a shortcut.

Framework for calculating discrete-prior Bayesian premiums

Step 1. Determine the observation.
Step 2. Discard the observation. Set up the partition equation.
Step 3. Consider the observation. Modify the partition equation: change the prior probabilities to posterior probabilities, and change the unconditional expectation to the conditional expectation.
Step 4. Calculate the posterior probabilities.
Step 5. Calculate the conditional expectation (the Bayesian premium).

Problem 6 (Nov 2001 #7)

You are given the following information about six coins:

Coin   Probability of Heads
1-4    0.50
5      0.25
6      0.75

A coin is selected at random and then flipped repeatedly. Xi denotes the outcome of the i-th flip, where 1 indicates heads and 0 indicates tails. The following sequence is obtained:

S = {X1, X2, X3, X4} = {1, 1, 0, 1}

Determine E(X5 | S) using Bayesian analysis.

Solution

Step 1 Determine the observation. This is easy; we are already told the observation is

S = { X 1 , X 2 , X 3 , X 4 } = {1,1, 0,1}

Step 2 Discard the observation. Set up the partition equation.


Now we're going to simplify the problem by purposely discarding the observation. So instead of calculating E(X5 | S), we'll just calculate E(X5). X5 is the number of heads showing up in the fifth flip of the randomly chosen coin. X5 is a binomial random variable with parameters n = 1 (one flip of the coin) and p (the probability of a head showing up). Using the binomial distribution formula, we have:

E(X5) = n p = p

However, the parameter p varies by coin type. For Coins 1-4, p = 0.5; for Coin 5, p = 0.25; and for Coin 6, p = 0.75. Because the coin is randomly chosen from Coins 1 through 6, we don't know which coin was chosen. So we'll need to partition E(X5) over the coin types:

E(X5) = E(X5 | Coin 1-4) P(Coin 1-4) + E(X5 | Coin 5) P(Coin 5) + E(X5 | Coin 6) P(Coin 6)

E(X5 | Coin 1-4) = P(Coin 1-4 showing a head in one flip) = 0.5
E(X5 | Coin 5) = P(Coin 5 showing a head in one flip) = 0.25
E(X5 | Coin 6) = P(Coin 6 showing a head in one flip) = 0.75

We can go one step further and calculate E(X5). Though the problem doesn't specifically tell us P(Coin 1-4), P(Coin 5), and P(Coin 6), we assume that the coins are uniformly distributed, so each coin is equally likely to be chosen. So

P(Coin 1-4) = 4/6,  P(Coin 5) = 1/6,  P(Coin 6) = 1/6

E(X5) = 0.5 (4/6) + 0.25 (1/6) + 0.75 (1/6) = 0.5

Of course, the problem isn't as simple as this. Otherwise, everyone who has passed Exam P would pass Exam C.

Step 3 Consider the observation. Modify the equation obtained in Step 2. Change the prior probabilities to posterior probabilities.

We have found E(X5). The real problem, however, is to find E(X5 | S). So we'll need to modify the equation obtained in Step 2. The original partition equation (if we discard the observation) is:

E(X5) = E(X5 | Coin 1-4) P(Coin 1-4) + E(X5 | Coin 5) P(Coin 5) + E(X5 | Coin 6) P(Coin 6)

How to modify:

E(X5) becomes E(X5 | S)
P(Coin 1-4) becomes P(Coin 1-4 | S)
P(Coin 5) becomes P(Coin 5 | S)
P(Coin 6) becomes P(Coin 6 | S)

Having made this observation, we can no longer assume that the randomly chosen coin has a 4/6 chance of being Coin 1-4, a 1/6 chance of being Coin 5, and a 1/6 chance of being Coin 6; those probabilities would have been fine if we hadn't observed S = {X1, X2, X3, X4} = {1, 1, 0, 1}. Now we must reevaluate the probability of the coin belonging to each type. So we'll replace the prior probabilities P(Coin 1-4), P(Coin 5), and P(Coin 6) with the posterior probabilities P(Coin 1-4 | S), P(Coin 5 | S), and P(Coin 6 | S) respectively. We also replace the unconditional expectation E(X5) with the conditional expectation E(X5 | S).

Now the new equation is:

E(X5 | S)
= E(X5 | Coin 1-4) P(Coin 1-4 | S) + E(X5 | Coin 5) P(Coin 5 | S) + E(X5 | Coin 6) P(Coin 6 | S)
= 0.5 P(Coin 1-4 | S) + 0.25 P(Coin 5 | S) + 0.75 P(Coin 6 | S)

Please note that our observation S = {X1, X2, X3, X4} = {1, 1, 0, 1} doesn't change how likely each coin actually produces a head in one flip. So the following three items are fixed regardless of our observation:

E(X5 | Coin 1-4) = P(Coin 1-4 showing a head in one flip) = 0.5
E(X5 | Coin 5) = P(Coin 5 showing a head in one flip) = 0.25
E(X5 | Coin 6) = P(Coin 6 showing a head in one flip) = 0.75

Step 4 Calculate the posterior probabilities:

P(Coin 1-4 | S) = P(S ∩ Coin 1-4) / P(S) = P(Coin 1-4) P(S | Coin 1-4) / P(S)
P(Coin 5 | S) = P(S ∩ Coin 5) / P(S) = P(Coin 5) P(S | Coin 5) / P(S)
P(Coin 6 | S) = P(S ∩ Coin 6) / P(S) = P(Coin 6) P(S | Coin 6) / P(S)

where

P(S) = P(Coin 1-4) P(S | Coin 1-4) + P(Coin 5) P(S | Coin 5) + P(Coin 6) P(S | Coin 6)

Detailed calculation:

P(S | Coin 1-4) = P(1, 1, 0, 1 | Coin 1-4) = 0.5(0.5)(0.5)(0.5) = 0.5^4
P(S | Coin 5) = P(1, 1, 0, 1 | Coin 5) = 0.25(0.25)(0.75)(0.25) = 0.25^3 (0.75)
P(S | Coin 6) = P(1, 1, 0, 1 | Coin 6) = 0.75(0.75)(0.25)(0.75) = 0.75^3 (0.25)

P(S ∩ Coin 1-4) = P(Coin 1-4) P(S | Coin 1-4) = (4/6)(0.5^4)
P(S ∩ Coin 5) = P(Coin 5) P(S | Coin 5) = (1/6)(0.25^3)(0.75)
P(S ∩ Coin 6) = P(Coin 6) P(S | Coin 6) = (1/6)(0.75^3)(0.25)

P(S) = (4/6)(0.5^4) + (1/6)(0.25^3)(0.75) + (1/6)(0.75^3)(0.25)

P(Coin 1-4 | S) = (4/6)(0.5^4) / [(4/6)(0.5^4) + (1/6)(0.25^3)(0.75) + (1/6)(0.75^3)(0.25)] = 0.681

P(Coin 5 | S) = (1/6)(0.25^3)(0.75) / [(4/6)(0.5^4) + (1/6)(0.25^3)(0.75) + (1/6)(0.75^3)(0.25)] = 0.032

P(Coin 6 | S) = (1/6)(0.75^3)(0.25) / [(4/6)(0.5^4) + (1/6)(0.25^3)(0.75) + (1/6)(0.75^3)(0.25)] = 0.287

Step 5 Calculate the conditional expectation:

E(X5 | S) = 0.5 P(Coin 1-4 | S) + 0.25 P(Coin 5 | S) + 0.75 P(Coin 6 | S)
= 0.5(0.681) + 0.25(0.032) + 0.75(0.287) = 0.564

I recommend that initially you use the 5-step framework to calculate discrete-prior Bayesian premiums. Just copy what I did. Explicitly write out each of the 5 steps; don't skip any step. Solve as many problems as you need until you are proficient with the framework.

Once you are familiar with the 5-step process, let's learn how to improve it. We'll focus on improving Step 4 (calculating the posterior probabilities). If you have ever solved a Bayesian premium problem, you'll have discovered that Step 4 is long, tedious, and prone to errors. Take a look at Step 4 in Problem 4 and see how involved the calculation is. When taking the exam, you are really stressed. In addition, you have only 3 minutes to solve a problem. If you follow the standard solution approach, chances are high that you'll mess up at least one step of your calculation. Then all your hard work is ruined; you won't be able to score a point.

Most exam candidates will mess up in Step 4. Let's find a better way to do Step 4.

What are we doing in Step 4? Two things. First, we calculate the raw posterior probabilities:

P(S ∩ Coin 1-4) = P(Coin 1-4) P(S | Coin 1-4) = (4/6)(0.5^4)
P(S ∩ Coin 5) = P(Coin 5) P(S | Coin 5) = (1/6)(0.25^3)(0.75)
P(S ∩ Coin 6) = P(Coin 6) P(S | Coin 6) = (1/6)(0.75^3)(0.25)

Second, we normalize them by multiplying each raw posterior probability by the constant

k = 1/P(S) = 1 / [P(Coin 1-4) P(S | Coin 1-4) + P(Coin 5) P(S | Coin 5) + P(Coin 6) P(S | Coin 6)]

After multiplying each raw posterior probability by this constant, the three posterior probabilities nicely add up to one. Normalization is necessary; it's part of Bayes' Theorem. However, it is a messy calculation, so ideally we want to avoid it.

It turns out that we really can avoid normalizing the raw posterior probabilities. To understand how to avoid normalization, let's formally present the question:

Problem: Calculate E(X) given the following information:

X = x    pX(x)
0.50     (4/6)(0.5^4) k
0.25     (1/6)(0.25^3)(0.75) k
0.75     (1/6)(0.75^3)(0.25) k

This can be solved with the BA II Plus/Professional 1-V Statistics Worksheet. This is how we solve it without calculating k. We scale each pX(x) up by multiplying it by 1,000,000/k:

X = x    pX(x)                                  Scaled pX(x) (multiply by 1,000,000/k)
0.50     (4/6)(0.5^4) k = 0.041667 k            41,667
0.25     (1/6)(0.25^3)(0.75) k = 0.001953 k     1,953
0.75     (1/6)(0.75^3)(0.25) k = 0.017578 k     17,578

Enter the scaled values:

X01 = 0.5, Y01 = 41,667
X02 = 0.25, Y02 = 1,953
X03 = 0.75, Y03 = 17,578

Next, set your BA II Plus/Professional to the 1-V Statistics Worksheet. You do this by pressing 2ND Stat and then pressing ENTER repeatedly until your calculator displays 1-V.

Press the down arrow key. You should get: n = 61,198.
Press the down arrow key again. You should get: x̄ = 0.56382970.

So E(X5 | S) ≈ x̄ = 0.564. The result calculated using the BA II Plus/Professional 1-V Statistics Worksheet matches what we calculated in the 5-step process.

Now it's time for me to present my shortcut.

Event: the coin produces HHTH

A        B            C                D = B x C                E                F
Group    Before-      This group's     After-event size of      Scale up raw     Conditional
(Coin    event size   probability to   the group (raw           posterior        mean
Type)    of the       produce HHTH     posterior probability)   probability by
         group                                                  1,000,000
1-4      4/6          0.5^4            (4/6)(0.5^4)             41,667           0.50
                                       = 0.041667
5        1/6          0.25^3 (0.75)    (1/6)(0.25^3)(0.75)      1,953            0.25
                                       = 0.001953
6        1/6          0.75^3 (0.25)    (1/6)(0.75^3)(0.25)      17,578           0.75
                                       = 0.017578

Enter into the 1-V Statistics Worksheet:

X01 = 0.5, Y01 = 41,667
X02 = 0.25, Y02 = 1,953
X03 = 0.75, Y03 = 17,578

You should get: n = 61,198 and x̄ = 0.56382970. So E(X5 | S) ≈ x̄ = 0.564.

You can also round the raw posterior probabilities to fewer decimal places. This is even faster:

Event: the coin produces HHTH

A        B            C                D = B x C                E                F
Group    Before-      This group's     After-event size of      Scale up raw     Conditional
(Coin    event size   probability to   the group (raw           posterior        mean
Type)    of the       produce HHTH     posterior probability)   probability by
         group                                                  10,000
1-4      4/6          0.5^4            (4/6)(0.5^4) = 0.0417    417              0.50
5        1/6          0.25^3 (0.75)    (1/6)(0.25^3)(0.75)      20               0.25
                                       = 0.0020
6        1/6          0.75^3 (0.25)    (1/6)(0.75^3)(0.25)      176              0.75
                                       = 0.0176

X01 = 0.5, Y01 = 417
X02 = 0.25, Y02 = 20
X03 = 0.75, Y03 = 176

Using the 1-V Statistics Worksheet, you should get: n = 613, x̄ = 0.56362153 ≈ 0.564.
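What the worksheet computes is just a frequency-weighted mean, which is why normalizing the posteriors is unnecessary; a short sketch (function name is mine) reproduces the shortcut:

```python
# What the BA II Plus 1-V Statistics Worksheet computes with X/Y pairs:
# a frequency-weighted mean. The scaling constant cancels in the ratio,
# so raw (unnormalized) posterior weights give the same answer.
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Scaled raw posterior probabilities from the coin problem (x 10,000)
print(round(weighted_mean([0.5, 0.25, 0.75], [417, 20, 176]), 3))  # 0.564
```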

Problem 7 (May 2000 #7)

You are given the following information about two classes of risks:

Risks in Class A have a Poisson claim count distribution with a mean of 1.0 per year.

Risks in Class B have a Poisson claim count distribution with a mean of 3.0 per year.

Risks in Class A have an exponential severity distribution with a mean of 1.0 per year.

Risks in Class B have an exponential severity distribution with a mean of 3.0 per year.

Each class has the same number of risks.

Within each class, severities and claim counts are independent.

A risk is randomly selected and observed to have 2 claims during one year. The observed

claim amounts were 1.0 and 3.0. Calculate the posterior expected value of the aggregate

loss for this risk during the next year.

Solution


Conceptual framework

Let
S represent the aggregate claim dollar amount,
X represent the individual claim dollar amount,
N represent the number of claims.

Then S = X1 + X2 + ... + XN.

First, let's make things simple and forget about the condition {N = 2, X1 = 1, X2 = 3}. Then we have:

E(S) = E(S | A) P(A) + E(S | B) P(B)

The above formula is an Exam P concept; you shouldn't have trouble understanding it. Here P(A) and P(B) are prior probabilities, probabilities prior to our observation {N = 2, X1 = 1, X2 = 3}.

Next,

E(S | A) = E(N | A) E(X | A) = λA θA = 1(1) = 1
E(S | B) = E(N | B) E(X | B) = λB θB = 3(3) = 9

Here λA and λB are the Poisson mean claim counts for Classes A and B respectively, and θA and θB are the exponential mean claim amounts for Classes A and B respectively.

E(S) = P(A) + 9 P(B)

Now let's move to the complex quantity E(S | N = 2, X1 = 1, X2 = 3). To calculate this amount, we'll still use the formula E(S) = P(A) + 9 P(B). However, we'll replace the prior probabilities P(A) and P(B) with the posterior probabilities P(A | N = 2, X1 = 1, X2 = 3) and P(B | N = 2, X1 = 1, X2 = 3). Our observation {N = 2, X1 = 1, X2 = 3} has changed our belief about the likelihood that the risk is from Class A or Class B, so we'll no longer use the prior probabilities P(A) and P(B) to calculate E(S). In addition, we'll replace E(S) with E(S | N = 2, X1 = 1, X2 = 3) to indicate that the expected aggregate claim amount is based on the observation {N = 2, X1 = 1, X2 = 3}. Then our original partition equation becomes:

E(S | N = 2, X1 = 1, X2 = 3) = P(A | N = 2, X1 = 1, X2 = 3) + 9 P(B | N = 2, X1 = 1, X2 = 3)

Next, we'll need to use Bayes' Theorem to calculate the posterior probabilities:

P(A | N = 2, X1 = 1, X2 = 3) = P(A) P(N = 2, X1 = 1, X2 = 3 | A) / P(N = 2, X1 = 1, X2 = 3)
= P(A) P(N = 2, X1 = 1, X2 = 3 | A) / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]

P(B | N = 2, X1 = 1, X2 = 3) = P(B) P(N = 2, X1 = 1, X2 = 3 | B) / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]

If you understand my logic so far, you are in good shape. The remaining work is just calculation.

Standard calculation

We'll calculate the probability for a Class A risk and a Class B risk to each produce the observed outcome {N = 2, X1 = 1, X2 = 3}:

P(N = 2, X1 = 1, X2 = 3 | A) = P(N = 2 | A) f(1 | A) f(3 | A)
= e^(-1) (1^2 / 2!) (e^(-1))(e^(-3)) = (1/2) e^(-5) = 0.00337

P(N = 2, X1 = 1, X2 = 3 | B) = P(N = 2 | B) f(1 | B) f(3 | B)
= e^(-3) (3^2 / 2!) [(1/3) e^(-1/3)] [(1/3) e^(-1)] = (1/2) e^(-13/3) = 0.00656

P(A | N = 2, X1 = 1, X2 = 3)
= P(A) P(N = 2, X1 = 1, X2 = 3 | A) / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]
= 0.5(0.00337) / [0.5(0.00337) + 0.5(0.00656)] = 0.339

Similarly,

P(B | N = 2, X1 = 1, X2 = 3)
= P(B) P(N = 2, X1 = 1, X2 = 3 | B) / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]
= 0.5(0.00656) / [0.5(0.00337) + 0.5(0.00656)] = 0.661

Finally,

E(S | N = 2, X1 = 1, X2 = 3)
= P(A | N = 2, X1 = 1, X2 = 3) + 9 P(B | N = 2, X1 = 1, X2 = 3)
= 0.339 + 9(0.661) = 6.29

Shortcut

When taking the exam, you'll still need to understand the conceptual framework explained at the beginning of the solution. However, you'll skip the normalizing step and avoid the need to manually calculate the mean.

This is what you need when solving this problem in the exam condition:

Event: { N = 2, X 1 = 1, X 2 = 3}

probability to produce

event

size of the event

the

group

1

e 1 )( e 3 )

(

2!

1

= e 5 = 0.00337

2

0.5

0.5

32 1

e

e

2! 3

1

= e

2

1

3

1

3

1

e

3

3

3

After-event

size of the

group (raw

posterior

probability)

Scale up

Conditional

raw

mean

posterior

probability

(multiply

the raw

probability

by

200,000)

0.5(0.00337)

337

"A A = 1(1) = 1

656

0.5(0.00656)

"B

= 3 ( 3) = 9

= 0.00656

X01=1,

Y01=337

X02=9,

Y02=656

You should get: n = 993 , X ' 6.28 . So E ( S N = 2, X 1 = 1, X 2 = 3) ' 6.28 .
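As a sanity check, the full Bayesian update is easy to script. Below is a minimal Python sketch (the helper name `likelihood` is mine, not the manual's) using the class parameters from the solution: Poisson claim-count means λ_A = 1, λ_B = 3 and exponential claim-size means θ_A = 1, θ_B = 3.

```python
import math

# Likelihood of the observation {N = 2, X1 = 1, X2 = 3} for a class with
# Poisson claim-count mean lam and exponential claim-size mean theta.
def likelihood(lam, theta):
    count = math.exp(-lam) * lam**2 / math.factorial(2)
    sizes = (math.exp(-1 / theta) / theta) * (math.exp(-3 / theta) / theta)
    return count * sizes

raw_a = 0.5 * likelihood(1, 1)    # raw posterior weight for Class A
raw_b = 0.5 * likelihood(3, 3)    # raw posterior weight for Class B
post_b = raw_b / (raw_a + raw_b)

# Conditional means E(S | class) = lam * theta: 1 for Class A, 9 for Class B.
premium = 1 * (1 - post_b) + 9 * post_b
```

Only the ratio raw_a : raw_b matters, which is why scaling both weights by 200,000 (giving 337 and 656) leaves the mean unchanged.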

Half of the insureds are expected to have 2 claims per year. The other half of the insureds are expected to have 4 claims per year. A randomly selected insured has made 4 claims in each of the first two policy years. Determine the Bayesian estimate of this insured's claim count in the next (third) policy year.

Solution

The observation is {N1 = 4, N2 = 4}. We are asked to find E(N3 | N1 = 4, N2 = 4).

However, the insured can belong to either Class A with λ_A = 2 or Class B with λ_B = 4. Ignoring the observation:

E(N3) = E(N3 | A) P(A) + E(N3 | B) P(B) = λ_A P(A) + λ_B P(B) = 2 P(A) + 4 P(B)

Next, we'll modify the above partition equation by considering the observation {N1 = 4, N2 = 4}. We'll change the prior probabilities to posterior probabilities:

E(N3 | N1 = 4, N2 = 4) = 2 P(A | N1 = 4, N2 = 4) + 4 P(B | N1 = 4, N2 = 4)

P(A | N1 = 4, N2 = 4) = P(A) P(N1 = 4, N2 = 4 | A) / P(N1 = 4, N2 = 4)
= P(A) P(N1 = 4, N2 = 4 | A) / [ P(A) P(N1 = 4, N2 = 4 | A) + P(B) P(N1 = 4, N2 = 4 | B) ]

Similarly,

P(B | N1 = 4, N2 = 4) = P(B) P(N1 = 4, N2 = 4 | B) / [ P(A) P(N1 = 4, N2 = 4 | A) + P(B) P(N1 = 4, N2 = 4 | B) ]

Detailed calculations (if you use my shortcut, you'll avoid most of these calculations):

P(N1 = 4, N2 = 4 | A) = P(N1 = 4 | A) P(N2 = 4 | A) = [e^(-λ_A) λ_A^4 / 4!]^2 = [e^(-2) 2^4 / 4!]^2

P(N1 = 4, N2 = 4 | B) = P(N1 = 4 | B) P(N2 = 4 | B) = [e^(-λ_B) λ_B^4 / 4!]^2 = [e^(-4) 4^4 / 4!]^2

P(A | N1 = 4, N2 = 4) = 0.5 [e^(-2) 2^4/4!]^2 / { 0.5 [e^(-2) 2^4/4!]^2 + 0.5 [e^(-4) 4^4/4!]^2 } = 0.176

P(B | N1 = 4, N2 = 4) = 0.5 [e^(-4) 4^4/4!]^2 / { 0.5 [e^(-2) 2^4/4!]^2 + 0.5 [e^(-4) 4^4/4!]^2 } = 0.824

The above two calculations are nasty and prone to errors. Many candidates will mess up these calculations and won't score a point. Assuming you have done your calculation right, you should get:

E(N3 | N1 = 4, N2 = 4) = 2 P(A | N1 = 4, N2 = 4) + 4 P(B | N1 = 4, N2 = 4) = 2(0.176) + 4(0.824) = 3.648

What you should do in the exam room

Just set up the following table and let the BA II Plus/Professional 1-V do the magic for you. Watch and relax.

Event: {N1 = 4, N2 = 4}

Group | Before-event size of the group | This group's probability to produce the event | After-event size of the group (raw posterior probability) | Conditional mean
A | 0.5 | [e^(-2) 2^4/4!]^2 | 0.5 [e^(-2) 2^4/4!]^2 | λ_A = 2
B | 0.5 | [e^(-4) 4^4/4!]^2 | 0.5 [e^(-4) 4^4/4!]^2 | λ_B = 4

Next, we'll need to scale the raw posterior probabilities up. We'll want to avoid the error-prone evaluation of the following two raw posterior probabilities:

0.5 [e^(-2) 2^4/4!]^2,  0.5 [e^(-4) 4^4/4!]^2

Remember what I said earlier when I was explaining Bayes' theorem to you: what matters is the ratio of these two (or more) raw posterior probabilities, not their absolute amounts.

0.5 [e^(-2) 2^4/4!]^2 / { 0.5 [e^(-2) 2^4/4!]^2 } = 1

0.5 [e^(-4) 4^4/4!]^2 / { 0.5 [e^(-2) 2^4/4!]^2 } = (4^4)^2 (e^(-4))^2 / [ (2^4)^2 (e^(-2))^2 ] = (2^16 / 2^8) e^(-4) = 256 e^(-4) = 4.689

New Table

Event: {N1 = 4, N2 = 4}

Group | Before-event size of the group | This group's probability to produce the event | After-event size of the group (raw posterior probability) | Raw posterior after simplification | Scaled-up posterior (multiply by 1,000) | Conditional mean
A | 0.5 | [e^(-2) 2^4/4!]^2 | 0.5 [e^(-2) 2^4/4!]^2 | 0.5 [e^(-2) 2^4/4!]^2 / { 0.5 [e^(-2) 2^4/4!]^2 } = 1 | 1,000 | λ_A = 2
B | 0.5 | [e^(-4) 4^4/4!]^2 | 0.5 [e^(-4) 4^4/4!]^2 | 0.5 [e^(-4) 4^4/4!]^2 / { 0.5 [e^(-2) 2^4/4!]^2 } = 256 e^(-4) = 4.689 | 4,689 | λ_B = 4

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01 = 2, Y01 = 1,000
X02 = 4, Y02 = 4,689

You should get: n = 5,689, x̄ ≈ 3.648. So E(N3 | N1 = 4, N2 = 4) ≈ 3.648.
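The table logic above can be checked numerically. A minimal Python sketch (the helper `poisson_pmf` is just for illustration):

```python
import math

def poisson_pmf(lam, n):
    return math.exp(-lam) * lam**n / math.factorial(n)

# Raw posterior weights: prior 0.5 times the two-year likelihood.
raw = {2: 0.5 * poisson_pmf(2, 4)**2,
       4: 0.5 * poisson_pmf(4, 4)**2}
total = sum(raw.values())
posterior = {lam: w / total for lam, w in raw.items()}
premium = sum(lam * p for lam, p in posterior.items())
```

The ratio raw[4]/raw[2] reproduces the 256 e^(-4) = 4.689 simplification, and the weighted mean reproduces the Bayesian estimate 3.648.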

Problem 9 (Nov 2000 #28)

Prior to observing any claims, you believed that claim sizes followed a Pareto distribution with parameters θ = 10 and α = 1, 2, or 3, with each value of α equally likely. You then observe one claim of 20 for a randomly selected risk. Determine the posterior probability that the next claim for this risk will be greater than 30.

Solution

P(X2 > 30)
= P(X2 > 30 | α = 1) P(α = 1) + P(X2 > 30 | α = 2) P(α = 2) + P(X2 > 30 | α = 3) P(α = 3)

If you look at Tables for Exam C/4, you'll see that the survival function of a two-parameter Pareto is S(x) = [θ/(x + θ)]^α. The problem doesn't say whether the Pareto is one-parameter or two-parameter. One quick way to decide which to use is this:

If the random variable is greater than zero, use the two-parameter Pareto.
If the random variable is greater than a positive constant, use the one-parameter Pareto.

The problem just vaguely says that claim sizes follow a Pareto distribution. Here the claim size (i.e., the claim dollar amount) must be greater than zero. There's no reason for us to think that the claim dollar amount must exceed a positive constant (such as $500). As a result, we'll use the two-parameter Pareto.

Then for θ = 10:

P(X2 > 30 | α) = S(30) = [10/(30 + 10)]^α = (1/4)^α

P(X2 > 30) = (1/4) P(α = 1) + (1/16) P(α = 2) + (1/64) P(α = 3)

Considering the observation X1 = 20:

P(X2 > 30 | X1 = 20) = (1/4) P(α = 1 | X1 = 20) + (1/16) P(α = 2 | X1 = 20) + (1/64) P(α = 3 | X1 = 20)

Next, we'll calculate the posterior probabilities. If you look at Tables for Exam C/4, you'll find the density function of a two-parameter Pareto distribution with parameters α and θ is:

f(x | α) = α θ^α / (x + θ)^(α + 1)

Then for θ = 10:

f(20 | α) = α 10^α / (20 + 10)^(α + 1) = α 10^α / 30^(α + 1)

P(α = 1 | X1 = 20) = P(α = 1) f(20 | α = 1) / f(20)
P(α = 2 | X1 = 20) = P(α = 2) f(20 | α = 2) / f(20)
P(α = 3 | X1 = 20) = P(α = 3) f(20 | α = 3) / f(20)

where

f(20) = P(α = 1) f(20 | α = 1) + P(α = 2) f(20 | α = 2) + P(α = 3) f(20 | α = 3)

Apply the formula f(20 | α) = α 10^α / 30^(α + 1). If you do the calculation right, you'll find:

P(α = 1) f(20 | α = 1) = 0.3704%,  P(α = 2) f(20 | α = 2) = 0.2469%,  P(α = 3) f(20 | α = 3) = 0.1235%

f(20) = 0.3704% + 0.2469% + 0.1235% = 0.7408%

P(α = 1 | X1 = 20) = 0.3704% / 0.7408% = 1/2
P(α = 2 | X1 = 20) = 0.2469% / 0.7408% = 1/3
P(α = 3 | X1 = 20) = 0.1235% / 0.7408% = 1/6

Then

P(X2 > 30 | X1 = 20) = (1/4) P(α = 1 | X1 = 20) + (1/16) P(α = 2 | X1 = 20) + (1/64) P(α = 3 | X1 = 20)
= (1/4)(1/2) + (1/16)(1/3) + (1/64)(1/6) = 0.148

If you ever try to reproduce my answers, you'll find the calculation outlined above is an absolute nightmare. In addition, I must acknowledge that I used an Excel spreadsheet to help me do the above calculations when I was preparing this manual. I must also acknowledge that there's little chance I would do the calculation right in the heat of the exam.

In the exam, I'll never use the above standard approach, which is prone to errors. This is what I will do in the exam (dramatically reducing the complexity of the calculations).

This is what you should do in the exam:

Event: X1 = 20

Group (A) | Before-event size of the group (B) | This group's density to produce the event (C) | After-event size of the group (raw posterior, D = B × C) | Scaled-up posterior (multiply the raw probability by 3(30)(3^2) = 810) | P(X2 > 30 | α)
α = 1 | 1/3 | (1/30)(1/3) | (1/3)(1/30)(1/3) | 3 | 1/4
α = 2 | 1/3 | (2/30)(1/3^2) | (1/3)(2/30)(1/3^2) | 2 | (1/4)^2
α = 3 | 1/3 | (3/30)(1/3^3) | (1/3)(3/30)(1/3^3) | 1 | (1/4)^3

Here column C uses f(20 | α) = α 10^α / 30^(α + 1) = (α/30)(1/3)^α.

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01 = 1/4 = 0.25, Y01 = 3
X02 = 1/16 = 0.0625, Y02 = 2
X03 = 1/64 = 0.015625, Y03 = 1

You should get: x̄ ≈ 0.148. So P(X2 > 30 | X1 = 20) ≈ 0.148.

You see how nice and easy the shortcut calculation is.
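A quick numeric check of the shortcut table, scripting the posterior over α and the predictive survival probability (a sketch, not part of the original solution):

```python
# Posterior over alpha after observing one claim of 20 (two-parameter Pareto, theta = 10).
theta = 10
raw = {a: (1/3) * a * theta**a / (20 + theta)**(a + 1) for a in (1, 2, 3)}
total = sum(raw.values())
posterior = {a: w / total for a, w in raw.items()}

# Predictive probability: P(X2 > 30 | alpha) = (theta / (30 + theta))**alpha = (1/4)**alpha
answer = sum(posterior[a] * (theta / (30 + theta))**a for a in (1, 2, 3))
```

The raw weights come out in the ratio 3 : 2 : 1, exactly the Y-values entered into the calculator worksheet.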

The claim count and claim size distributions for risks of Type A are:

# of claims:  0    1    2        Claim size:   500   1235
Probability: 4/9  4/9  1/9       Probability:  1/3   2/3

The claim count and claim size distributions for risks of Type B are:

# of claims:  0    1    2        Claim size:   250   328
Probability: 1/9  4/9  4/9       Probability:  2/3   1/3

Risks are equally likely to be of Type A or Type B. Claim counts and claim sizes are independent within each risk type. The variance of the total losses is 296,962. A risk is selected at random, and total annual losses of 500 are observed for Year 1. Determine the Bayesian premium for the next year for this same risk.

Solution

Let S = Σ_{i=1}^{N} X_i represent the total annual loss. The observation is S1 = 500. We are asked to find E(S2 | S1 = 500). If we ignore the observation S1 = 500, then the problem becomes finding E(S2). Since the risk can be either Type A or Type B, we'll condition S2 on the risk type:

E(S2) = E(S2 | A) P(A) + E(S2 | B) P(B)

Since E(S) = E(N) E(X):

E(S2 | A) = E(N2 | A) E(X | A),  E(S2 | B) = E(N2 | B) E(X | B)

E(N2 | A) = 0(4/9) + 1(4/9) + 2(1/9) = 6/9,  E(N2 | B) = 0(1/9) + 1(4/9) + 2(4/9) = 12/9

E(X | A) = 500(1/3) + 1235(2/3) = 990,  E(X | B) = 250(2/3) + 328(1/3) = 276

E(S2 | A) = E(N2 | A) E(X | A) = (6/9)(990) = 660
E(S2 | B) = E(N2 | B) E(X | B) = (12/9)(276) = 368

E(S2) = E(S2 | A) P(A) + E(S2 | B) P(B) = 660 P(A) + 368 P(B)

Considering the observation S1 = 500:

E(S2 | S1 = 500) = 660 P(A | S1 = 500) + 368 P(B | S1 = 500)

P(A | S1 = 500) = P(A) P(S1 = 500 | A) / P(S1 = 500),  P(B | S1 = 500) = P(B) P(S1 = 500 | B) / P(S1 = 500)

P(A | S1 = 500) / P(B | S1 = 500) = [ P(A) P(S1 = 500 | A) ] / [ P(B) P(S1 = 500 | B) ]

The only way for Type A to incur a 500 claim in Year 1 is to have one claim of 500. The only way for Type B to incur a 500 claim in Year 1 is to have two claims of 250 each.

So P(S1 = 500 | A) = (4/9)(1/3),  P(S1 = 500 | B) = (4/9)(2/3)^2

P(A | S1 = 500) / P(B | S1 = 500) = [ 0.5 (4/9)(1/3) ] / [ 0.5 (4/9)(2/3)^2 ] = 3/4

P(A | S1 = 500) = 3/7,  P(B | S1 = 500) = 4/7

E(S2 | S1 = 500) = 660 (3/7) + 368 (4/7) = 493.14

Event: S1 = 500

Group | Before-event size of the group | This group's probability to produce the event | After-event size of the group (raw posterior, D = B × C) | Scaled-up posterior (scale the raw probabilities so they become 3 and 4; only the ratio matters) | E(S2 | Type)
Type A | 0.5 | (4/9)(1/3) | 0.5(4/9)(1/3) = 2/27 | 3 | 660
Type B | 0.5 | (4/9)(2/3)^2 | 0.5(4/9)(2/3)^2 = 8/81 | 4 | 368

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01 = 660, Y01 = 3; X02 = 368, Y02 = 4.

You should get: n = 7, x̄ = 493.14. So E(S2 | S1 = 500) = 493.14.
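The posterior-ratio argument above is easy to verify in a few lines of Python (a sketch of the same calculation, not part of the original solution):

```python
# Raw posterior weights: one claim of 500 (Type A) vs two claims of 250 (Type B).
raw_a = 0.5 * (4/9) * (1/3)
raw_b = 0.5 * (4/9) * (2/3)**2
post_a = raw_a / (raw_a + raw_b)

premium = 660 * post_a + 368 * (1 - post_a)
```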

Nov 2002 #39

Class | # of insureds | Claim count probabilities (0, 1, 2, 3, 4 claims)
1 | 3000 | 1/3, 1/3, 1/3, 0, 0
2 | 2000 | 0, 1/6, 2/3, 1/6, 0
3 | 1000 | 0, 0, 1/6, 2/3, 1/6

A randomly selected insured has one claim in Year 1. Determine the expected number of claims in Year 2 for that insured.

Solution

Conceptual framework

The observation is N1 = 1. We are asked to find E(N2 | N1 = 1). If we ignore the observation N1 = 1, then the problem becomes finding E(N2). Since N2 can be generated from each of the three classes, we'll condition N2 on the classes:

E(N2) = Σ_{i=1}^{3} E(N2 | Class i) P(Class i)

E(N2 | Class 1) = 0(1/3) + 1(1/3) + 2(1/3) = 1
E(N2 | Class 2) = 1(1/6) + 2(2/3) + 3(1/6) = 2
E(N2 | Class 3) = 2(1/6) + 3(2/3) + 4(1/6) = 3

The observation N1 = 1 changes the above equation into:

E(N2 | N1 = 1) = 1 P(Class 1 | N1 = 1) + 2 P(Class 2 | N1 = 1) + 3 P(Class 3 | N1 = 1)

P(Class 1 | N1 = 1) = P(Class 1) P(N1 = 1 | Class 1) / P(N1 = 1) = (3/6)(1/3) / P(N1 = 1)
P(Class 2 | N1 = 1) = P(Class 2) P(N1 = 1 | Class 2) / P(N1 = 1) = (2/6)(1/6) / P(N1 = 1)
P(Class 3 | N1 = 1) = P(Class 3) P(N1 = 1 | Class 3) / P(N1 = 1) = (1/6)(0) / P(N1 = 1) = 0

P(N1 = 1) = (3/6)(1/3) + (2/6)(1/6) + 0 = 2/9

P(Class 1 | N1 = 1) = (3/6)(1/3) / (2/9) = 3/4
P(Class 2 | N1 = 1) = (2/6)(1/6) / (2/9) = 1/4

E(N2 | N1 = 1) = 1(3/4) + 2(1/4) + 3(0) = 1.25

Event: N1 = 1

Group (class) | Before-event size of the group | This class's probability to produce the event | After-event size of the group (raw posterior, D = B × C) | Scaled-up posterior (multiply the raw probability by 18) | E(N2 | Class)
Class 1 | 3/6 | 1/3 | (3/6)(1/3) = 1/6 | 3 | 1
Class 2 | 2/6 | 1/6 | (2/6)(1/6) = 1/18 | 1 | 2
Class 3 | 1/6 | 0 | 0 | 0 | 3

Because the posterior probability is zero for Class 3 to produce N1 = 1, we can delete the last row.

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01 = 1, Y01 = 3; X02 = 2, Y02 = 1.

You should get: n = 4, x̄ = 1.25. So E(N2 | N1 = 1) = 1.25.
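The class-size-weighted update can be scripted as a check (a sketch; the dictionary names are mine):

```python
# Priors proportional to class sizes; likelihood is P(N1 = 1 | class).
sizes = {1: 3000, 2: 2000, 3: 1000}
like = {1: 1/3, 2: 1/6, 3: 0}
cond_mean = {1: 1, 2: 2, 3: 3}

raw = {c: (sizes[c] / 6000) * like[c] for c in sizes}
total = sum(raw.values())
answer = sum(cond_mean[c] * raw[c] / total for c in raw)
```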

A car manufacturer is testing the ability of safety devices to limit damage in car accidents. You are given:

- A test car has either front air bags or side air bags (but not both), each type being equally likely.
- The test car will be driven into either a wall or a lake, with each accident type being equally likely.
- The manufacturer randomly selects 1, 2, 3, or 4 crash test dummies to put into a car with front air bags.
- The manufacturer randomly selects 2 or 4 crash test dummies to put into a car with side air bags.
- Each crash test dummy in a wall-impact accident suffers damage randomly equal to either 0.5 or 1, with damage to each dummy being independent of damage to the others.
- Each crash test dummy in a lake-impact accident suffers damage randomly equal to either 1 or 2, with damage to each dummy being independent of damage to the others.

One test car is selected at random, and a test accident produces total damage of 1. Determine the expected value of the total damage for the next accident, given that the kind of safety device (front or side air bags) and accident type (wall or lake) remain the same.

Solution

This is one of the most feared exam problems. If you use the framework and shortcut, however, you should do just fine.

Conceptual framework

Let S = Σ_{i=1}^{N} X_i represent the total damage, where N is the number of dummies chosen for the crash testing. The observation is S1 = 1. We are asked to find E(S2 | S1 = 1).

To simplify the problem, let's first discard the observation. Then the problem becomes finding E(S2). The crash testing falls into four types:

- Front air bag, wall collision (FW)
- Front air bag, lake collision (FL)
- Side air bag, wall collision (SW)
- Side air bag, lake collision (SL)

E(S2) = E(S2 | FW) P(FW) + E(S2 | FL) P(FL) + E(S2 | SW) P(SW) + E(S2 | SL) P(SL)

The manufacturer randomly selects 1, 2, 3, or 4 test dummies to put into a car with front air bags. Each count is equally likely to be chosen. So the expected number of dummies used for crash testing under FW is:

E(N | FW) = E(N | F) = (1 + 2 + 3 + 4)/4 = 2.5

If the car is tested for wall collision, the damage to a tested dummy can be either 0.5 or 1, with each amount equally likely:

E(X | FW) = E(X | W) = (0.5 + 1)/2 = 0.75

E(S2 | FW) = E(N | FW) E(X | FW) = 2.5(0.75)

Similarly,

E(S2 | FL) = E(N | FL) E(X | FL) = [(1 + 2 + 3 + 4)/4][(1 + 2)/2] = 2.5(1.5)
E(S2 | SW) = E(N | SW) E(X | SW) = [(2 + 4)/2][(0.5 + 1)/2] = 3(0.75)
E(S2 | SL) = E(N | SL) E(X | SL) = [(2 + 4)/2][(1 + 2)/2] = 3(1.5)

If we wanted to complete the above calculation, we'd plug in P(FW) = P(FL) = P(SW) = P(SL) = 0.25. This would produce the prior mean. Next, consider the impact of the observation S1 = 1. This observation changes the partition equation into:

E(S2 | S1 = 1) = 2.5(0.75) P(FW | S1 = 1) + 2.5(1.5) P(FL | S1 = 1) + 3(0.75) P(SW | S1 = 1) + 3(1.5) P(SL | S1 = 1)

P(FW | S1 = 1) = P(FW) P(S1 = 1 | FW) / P(S1 = 1)
P(FL | S1 = 1) = P(FL) P(S1 = 1 | FL) / P(S1 = 1)
P(SW | S1 = 1) = P(SW) P(S1 = 1 | SW) / P(S1 = 1)
P(SL | S1 = 1) = P(SL) P(S1 = 1 | SL) / P(S1 = 1)

where

P(S1 = 1) = P(FW) P(S1 = 1 | FW) + P(FL) P(S1 = 1 | FL) + P(SW) P(S1 = 1 | SW) + P(SL) P(S1 = 1 | SL)

The key is to calculate P(S1 = 1 | FW). In a front air bag, wall collision test, the number of dummies can be 1, 2, 3, or 4; the damage per dummy can be 0.5 or 1. So there are only two ways for FW to produce S1 = 1:

Two dummies were chosen, each suffering 0.5 damage. Probability: 0.25(0.5)(0.5)
One dummy was chosen, suffering 1 damage. Probability: 0.25(0.5)

Total probability: P(S1 = 1 | FW) = 0.25(0.5)(0.5) + 0.25(0.5) = 0.1875

We can apply the same logic and find (please verify my calculation):

P(S1 = 1 | FL) = 0.125,  P(S1 = 1 | SW) = 0.125,  P(S1 = 1 | SL) = 0

We are given that P(FW) = P(FL) = P(SW) = P(SL) = 0.25.

Finally,

P(S1 = 1) = 0.25 (0.1875 + 0.125 + 0.125)

P(FW | S1 = 1) = 0.25(0.1875) / [ 0.25(0.1875 + 0.125 + 0.125) ] = 3/7
P(FL | S1 = 1) = 0.25(0.125) / [ 0.25(0.1875 + 0.125 + 0.125) ] = 2/7
P(SW | S1 = 1) = 0.25(0.125) / [ 0.25(0.1875 + 0.125 + 0.125) ] = 2/7
P(SL | S1 = 1) = 0.25(0) / [ 0.25(0.1875 + 0.125 + 0.125) ] = 0

Finally,

E(S2 | S1 = 1) = 2.5(0.75)(3/7) + 2.5(1.5)(2/7) + 3(0.75)(2/7) + 3(1.5)(0) = 2.518

Event: S1 = 1

Group (class) | Before-event size of the group | This group's probability to produce the event | After-event size of the group (raw posterior, D = B × C) | Scaled-up posterior (multiply the raw probability by 40,000) | E(S2 | group)
FW | 1/4 | 0.1875 | (1/4)(0.1875) | 1875 | 2.5(0.75)
FL | 1/4 | 0.125 | (1/4)(0.125) | 1250 | 2.5(1.5)
SW | 1/4 | 0.125 | (1/4)(0.125) | 1250 | 3(0.75)
SL | 1/4 | 0 | 0 | 0 | 3(1.5)

Because the posterior probability is zero for SL to produce S1 = 1, we can delete the last row.

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01 = 2.5(0.75) = 1.875, Y01 = 1875
X02 = 2.5(1.5) = 3.75, Y02 = 1250
X03 = 3(0.75) = 2.25, Y03 = 1250

You should get: n = 4,375, x̄ ≈ 2.518. So E(S2 | S1 = 1) ≈ 2.518.
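The likelihoods P(S1 = 1 | group) can be verified by brute-force enumeration, a useful cross-check on this feared problem type. A Python sketch (the helper `prob_total_one` is mine):

```python
from itertools import product

# P(total damage = 1 | group), by enumerating dummy counts and per-dummy damages.
def prob_total_one(counts, damages):
    p = 0.0
    for n in counts:
        for outcome in product(damages, repeat=n):
            if sum(outcome) == 1:
                p += (1 / len(counts)) * (1 / len(damages))**n
    return p

like = {'FW': prob_total_one([1, 2, 3, 4], [0.5, 1]),
        'FL': prob_total_one([1, 2, 3, 4], [1, 2]),
        'SW': prob_total_one([2, 4], [0.5, 1]),
        'SL': prob_total_one([2, 4], [1, 2])}
mean = {'FW': 2.5 * 0.75, 'FL': 2.5 * 1.5, 'SW': 3 * 0.75, 'SL': 3 * 1.5}

total = sum(like.values())          # the common prior 1/4 cancels
answer = sum(mean[g] * like[g] / total for g in like)
```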

Problem 10 (May 2005 #35)

The # of claims on a given policy has the geometric distribution with parameter β. One-third of the policies have β = 2; the remaining two-thirds have β = 5. A randomly selected policy had two claims in Year 1. Calculate the Bayesian expected # of claims for the selected policy in Year 2.

Solution

The observation is N1 = 2. We are asked to find E(N2 | N1 = 2). If we ignore the observation N1 = 2, then:

E(N2) = E(N2 | β = 2) P(β = 2) + E(N2 | β = 5) P(β = 5)

Since E(N2 | β) = β:

E(N2) = 2 P(β = 2) + 5 P(β = 5)

Considering the observation N1 = 2:

E(N2 | N1 = 2) = 2 P(β = 2 | N1 = 2) + 5 P(β = 5 | N1 = 2)

For a geometric distribution, P(N = n | β) = β^n / (1 + β)^(n + 1), so P(N1 = 2 | β) = β^2 / (1 + β)^3.

Event: N1 = 2

Group (A) | Before-event size of the group (B) | This group's probability to produce the event (C) | After-event size of the group (raw posterior, D = B × C) | Scaled-up posterior (multiply the raw probability by 100,000) | E(N2 | β) = β
β = 2 | 1/3 | 2^2/(1 + 2)^3 = 4/27 | (1/3)(4/27) = 0.04938 | 4,938 | 2
β = 5 | 2/3 | 5^2/(1 + 5)^3 = 25/216 | (2/3)(25/216) = 0.07716 | 7,716 | 5

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01 = 2, Y01 = 4,938; X02 = 5, Y02 = 7,716.

You should get: x̄ ≈ 3.83. So E(N2 | N1 = 2) ≈ 3.83.
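A numeric check of the geometric-prior update (sketch only):

```python
# Geometric pmf: P(N = n | beta) = beta**n / (1 + beta)**(n + 1)
def geometric_pmf(beta, n):
    return beta**n / (1 + beta)**(n + 1)

raw = {2: (1/3) * geometric_pmf(2, 2),
       5: (2/3) * geometric_pmf(5, 2)}
total = sum(raw.values())
premium = sum(beta * w / total for beta, w in raw.items())
```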

Problem 11 (Nov 2005, #15)

For a particular policy, the conditional probability of the annual number of claims given Θ = θ, and the probability distribution of Θ, are as follows:

# of claims:  0     1    2
Probability:  2θ    θ    1 - 3θ

θ:            0.10   0.30
Probability:  0.80   0.20

One claim was observed in Year 1. Calculate the Bayesian estimate of the expected # of claims in Year 2.

Solution

The observation is X1 = 1. Ignoring this observation, we have:

E(X2) = E(X2 | θ = 0.1) P(θ = 0.1) + E(X2 | θ = 0.3) P(θ = 0.3)

E(X2 | θ) = 0(2θ) + 1(θ) + 2(1 - 3θ) = 2 - 5θ

So E(X2 | θ = 0.1) = 1.5 and E(X2 | θ = 0.3) = 0.5.

Considering the observation X1 = 1, we have:

E(X2 | X1 = 1) = 1.5 P(θ = 0.1 | X1 = 1) + 0.5 P(θ = 0.3 | X1 = 1)

Event: X1 = 1

Group (A) | Before-event size of the group (B) | This group's probability to produce the event (C): the probability of one claim is θ | After-event size of the group (raw posterior, D = B × C) | Scaled-up posterior (multiply the raw probability by 100) | E(X2 | θ)
θ = 0.1 | 0.8 | 0.1 | 0.8(0.1) = 0.08 | 8 | 1.5
θ = 0.3 | 0.2 | 0.3 | 0.2(0.3) = 0.06 | 6 | 0.5

Enter the following into the BA II Plus/Professional 1-V Statistics Worksheet:

X01 = 1.5, Y01 = 8; X02 = 0.5, Y02 = 6.

You should get: x̄ = 1.07142857. So E(X2 | X1 = 1) ≈ 1.07.
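A quick check of the table (a sketch of the same update):

```python
# P(one claim | theta) = theta; conditional mean E(X2 | theta) = 2 - 5 * theta.
raw = {0.1: 0.8 * 0.1, 0.3: 0.2 * 0.3}
total = sum(raw.values())
premium = sum((2 - 5 * theta) * w / total for theta, w in raw.items())
```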

The solution process for continuous-prior problems is similar to the process for discrete-prior problems. There are two major differences:

- We'll use integration for continuous-prior problems; we use summation for discrete-prior problems.
- You can't use the BA II Plus/Professional 1-V Statistics Worksheet shortcut to solve a continuous-prior premium problem. In contrast, you can use that shortcut to solve a discrete-prior premium problem.

The same five-step framework applies:

Step 1  Determine the observation.
Step 2  Ignore the observation and set up the partition equation for the prior mean (or prior probability).
Step 3  Change the prior probability to the posterior probability.
Step 4  Find the posterior density using Bayes' theorem.
Step 5  Integrate against the posterior density to get the answer.

Problem 1 (May 2001 #37)

You are given the following information about workers compensation coverage:

- The # of claims from an employee during the year follows a Poisson distribution with mean (100 - p)/100, where p is the salary (in thousands) for the employee.
- The distribution of p is uniform on the interval [0, 100].

An employee is selected at random. No claims were observed for this employee during the year. Determine the posterior probability that the selected employee has a salary greater than 50.

Solution

Step 1. Determine the observation. This is N = 0. We are asked to find P(p > 50 | N = 0). Please note we are NOT asked to find P(N2 > 50 | N1 = 0).

If we ignore the observation, we just need to find P(p > 50). Since p is uniform on the interval [0, 100]:

P(p > 50) = ∫_50^100 f(p) dp

Considering the observation N = 0:

P(p > 50 | N = 0) = ∫_50^100 f(p | N = 0) dp

f(p | N = 0) = f(p) P(N = 0 | p) / P(N = 0) = f(p) P(N = 0 | p) / ∫_0^100 f(p) P(N = 0 | p) dp

The Poisson mean is (100 - p)/100 = 1 - 0.01p, so P(N = 0 | p) = e^(-(1 - 0.01p)) = e^(0.01p - 1).

f(p) P(N = 0 | p) = 0.01 e^(0.01p - 1)

P(N = 0) = ∫_0^100 f(p) P(N = 0 | p) dp = ∫_0^100 0.01 e^(0.01p - 1) dp = e^(-1)(e - 1) = 1 - e^(-1)

f(p | N = 0) = 0.01 e^(0.01p - 1) / (1 - e^(-1)) = 0.01 e^(0.01p) / (e - 1)

P(p > 50 | N = 0) = ∫_50^100 f(p | N = 0) dp = ∫_50^100 [0.01/(e - 1)] e^(0.01p) dp = (e - e^0.5)/(e - 1) = 0.622

Shortcut

Since the Poisson mean is (100 - p)/100, we naturally set λ = (100 - p)/100. Since p is uniform over [0, 100], λ = (100 - p)/100 is uniform over [0, 1], so f(λ) = 1.

f(λ | N = 0) = f(λ) P(N = 0 | λ) / P(N = 0) = e^(-λ) / ∫_0^1 e^(-λ) dλ = e^(-λ) / (1 - e^(-1))

Since λ = (100 - p)/100 = 1 - p/100, we have p = 100(1 - λ), and p > 50 corresponds to λ < 0.5.

P(p > 50 | N = 0) = ∫_0^0.5 f(λ | N = 0) dλ = ∫_0^0.5 e^(-λ)/(1 - e^(-1)) dλ = (1 - e^(-0.5))/(1 - e^(-1)) = 0.6225
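Both the standard approach and the shortcut can be verified numerically. The sketch below computes the closed form and cross-checks it with a midpoint-rule integration of f(p) P(N = 0 | p) (the helper `integrate` is mine):

```python
import math

# Closed form from the shortcut: (1 - e^-0.5) / (1 - e^-1)
closed = (1 - math.exp(-0.5)) / (1 - math.exp(-1))

# Midpoint-rule check of the standard approach: integrate f(p) * P(N = 0 | p).
def integrate(f, a, b, n=100000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

g = lambda p: 0.01 * math.exp(0.01 * p - 1)   # f(p) * P(N = 0 | p)
numeric = integrate(g, 50, 100) / integrate(g, 0, 100)
```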

You are given:

- In a portfolio of risks, each policyholder can have at most two claims per year.
- For each year, the distribution of the number of claims is:

  # of claims:  0    1        2
  Probability:  0.1  0.9 - q  q

- The prior density of q is π(q) = q^2/0.039, 0.2 < q < 0.5.

A randomly selected policyholder had two claims in Year 1 and two claims in Year 2. For this insured, determine the Bayesian estimate of the expected number of claims in Year 3.

Solution

Continuous-prior problems are harder than discrete-prior ones, and many candidates are scared of them. However, if you can follow the 5-step framework, you'll be on the right track.

The observation is (N1 = 2, N2 = 2). We are asked to find E(N3 | N1 = 2, N2 = 2).

Let's simplify the problem by discarding the observation (N1 = 2, N2 = 2). Then our task is to find the prior mean E(N3). This is an Exam P problem.

N3 is distributed as follows:

N3 = 0 with probability 0.1
N3 = 1 with probability 0.9 - q
N3 = 2 with probability q

where π(q) = q^2/0.039, 0.2 < q < 0.5. If q is fixed, then

E(N3 | q) = 0(0.1) + 1(0.9 - q) + 2(q) = q + 0.9

E(N3) = E_q[ E(N3 | q) ] = E_q(q + 0.9) = E(q) + 0.9

E(q) = ∫_0.2^0.5 q π(q) dq = ∫_0.2^0.5 q (q^2/0.039) dq = 0.39

E(N3) = 0.9 + 0.39 = 1.29

So the mean prior to the observation is 1.29. Please note that we don't actually need the prior mean; I calculated it just to show you this: if you discard the observation, the problem becomes an Exam P problem.

Next, let's add in the observation. The observation (N1 = 2, N2 = 2) changes the equation from E(N3) = E(q) + 0.9 into

E(N3 | N1 = 2, N2 = 2) = E(q | N1 = 2, N2 = 2) + 0.9

E(q | N1 = 2, N2 = 2) = ∫_0.2^0.5 q f(q | N1 = 2, N2 = 2) dq

f(q | N1 = 2, N2 = 2) = π(q) P(N1 = 2, N2 = 2 | q) / ∫_0.2^0.5 π(q) P(N1 = 2, N2 = 2 | q) dq

P(N1 = 2, N2 = 2 | q) = q^2, and π(q) = q^2/0.039. So

f(q | N1 = 2, N2 = 2) = (q^2/0.039) q^2 / ∫_0.2^0.5 (q^2/0.039) q^2 dq = q^4 / ∫_0.2^0.5 q^4 dq

E(q | N1 = 2, N2 = 2) = ∫_0.2^0.5 q^5 dq / ∫_0.2^0.5 q^4 dq = [(1/6) q^6]_0.2^0.5 / [(1/5) q^5]_0.2^0.5 = (5/6)(0.5^6 - 0.2^6)/(0.5^5 - 0.2^5) = 0.419

Finally,

E(N3 | N1 = 2, N2 = 2) = 0.9 + 0.419 = 1.319
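Because the posterior density is proportional to q^4 on [0.2, 0.5], the posterior mean reduces to a ratio of power integrals, which is easy to script as a check (sketch; the helper `power_integral` is mine):

```python
# Posterior density is proportional to q**4 on [0.2, 0.5].
def power_integral(k, a=0.2, b=0.5):
    return (b**(k + 1) - a**(k + 1)) / (k + 1)

post_mean_q = power_integral(5) / power_integral(4)   # E(q | N1 = 2, N2 = 2)
premium = 0.9 + post_mean_q                           # E(N3 | N1 = 2, N2 = 2)
```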

You are given:

- The parameter Λ has an inverse gamma distribution with probability density function g(λ) = 500 λ^(-4) e^(-10/λ), λ > 0.
- The size of a claim has an exponential distribution with probability density function f(x | Λ = λ) = λ^(-1) e^(-x/λ).

For a single insured, two claims were observed that totaled 50. Determine the expected value of the next claim from the same insured.

Solution

Guo Fall 2009 C, Page 255 / 284

If we ignore the observation X1 + X2 = 50, then the problem becomes:

E(X3) = ∫ x f(x) dx = ∫∫ x f(x | λ) g(λ) dλ dx = ∫∫ x (λ^(-1) e^(-x/λ)) g(λ) dλ dx

If we consider the observation, we'll need to change the prior density g(λ) to the posterior density g(λ | X1 + X2 = 50):

E(X3 | X1 + X2 = 50) = ∫∫ x (λ^(-1) e^(-x/λ)) g(λ | X1 + X2 = 50) dλ dx = ∫ λ g(λ | X1 + X2 = 50) dλ

since E(X3 | λ) = λ for the exponential distribution. The posterior is proportional to the prior times the likelihood of the two claims x1 and x2 (with x1 + x2 = 50):

g(λ | X1 + X2 = 50) ∝ g(λ) (λ^(-1) e^(-x1/λ)) (λ^(-1) e^(-x2/λ)) ∝ λ^(-4) e^(-10/λ) λ^(-2) e^(-50/λ) = λ^(-6) e^(-60/λ)

This is again an inverse gamma density, now with α* = 5 and θ* = 60 (the prior g(λ) is inverse gamma with α = 3, θ = 10). Using the inverse gamma mean formula from Tables for Exam C:

E(X3 | X1 + X2 = 50) = E(λ | X1 + X2 = 50) = θ*/(α* - 1) = 60/4 = 15
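The conjugate update implied by the setup above (posterior kernel λ^(-6) e^(-60/λ), an inverse gamma with mean 60/4 = 15) can be checked numerically. A sketch, using the substitution u = 1/λ:

```python
import math

def integrate(f, a, b, n=200000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Posterior kernel: lambda**-6 * exp(-60/lambda). With u = 1/lambda,
# E(lambda | data) = integral(u**3 e^-60u) / integral(u**4 e^-60u).
num = integrate(lambda u: u**3 * math.exp(-60 * u), 0, 2)
den = integrate(lambda u: u**4 * math.exp(-60 * u), 0, 2)
posterior_mean = num / den
```

The upper limit 2 is effectively infinity here, since e^(-60u) is negligible well before u = 2.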

For a group of insureds, you are given:

- The amount of a claim is uniformly distributed on [0, θ], but will not exceed a certain unknown limit θ.
- The prior distribution of θ is π(θ) = 500/θ^2, θ > 500.
- Two claims, X1 = 400 and X2 = 600, are observed.

Determine the probability that the next claim will exceed 550.

Solution

The observation is X1 = 400, X2 = 600. We are asked to find P(X3 > 550 | X1 = 400, X2 = 600).

If we ignore the observation:

P(X3 > 550) = ∫_500^∞ P(X3 > 550 | θ) π(θ) dθ

X3 | θ is uniform over [0, θ], so P(X3 > 550 | θ) = (θ - 550)/θ for θ > 550.

Since we have the observation X1 = 400, X2 = 600, we modify the above equation by changing the prior density π(θ) to the posterior density π(θ | X1 = 400, X2 = 600):

P(X3 > 550 | X1 = 400, X2 = 600) = ∫_600^∞ P(X3 > 550 | θ) π(θ | X1 = 400, X2 = 600) dθ

The posterior is proportional to the prior times the likelihood. Given θ, each observed claim has density 1/θ, and θ must be at least 600 to produce the claim of 600:

π(θ | X1 = 400, X2 = 600) ∝ π(θ)(1/θ)(1/θ) = (500/θ^2)(1/θ^2) ∝ 1/θ^4,  θ > 600

Write π(θ | X1 = 400, X2 = 600) = k/θ^4. Then ∫_600^∞ k θ^(-4) dθ = k/[3(600^3)] = 1, so k = 3(600^3) and

π(θ | X1 = 400, X2 = 600) = 3(600^3)/θ^4,  θ > 600

P(X3 > 550 | X1 = 400, X2 = 600) = ∫_600^∞ [(θ - 550)/θ] [3(600^3)/θ^4] dθ
= 3(600^3) [ ∫_600^∞ θ^(-4) dθ - 550 ∫_600^∞ θ^(-5) dθ ]
= 3(600^3) [ 1/(3 × 600^3) - 550/(4 × 600^4) ]
= 1 - (3/4)(550/600) = 0.3125
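The closed-form answer can be cross-checked numerically. The substitution u = 600/θ turns the posterior 3(600^3)/θ^4 on (600, ∞) into the density 3u^2 on (0, 1), giving a finite-interval integral (a sketch):

```python
# Closed form from the solution:
answer = 1 - (3/4) * (550/600)

# Numerical check via u = 600/theta; posterior density becomes 3*u**2 on (0, 1)
# and P(X3 > 550 | theta) = 1 - (550/600)*u.
n = 100000
h = 1.0 / n
numeric = sum((1 - (550/600) * ((i + 0.5) * h)) * 3 * ((i + 0.5) * h)**2
              for i in range(n)) * h
```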

You are given:

- The amount of a claim, X, is uniformly distributed on the interval [0, θ].
- The prior density of θ is π(θ) = 500/θ^2, θ > 500.
- Two claims, x1 = 400 and x2 = 600, are observed. You calculate the posterior distribution as: π(θ | x1, x2) = 3(600^3)/θ^4, θ > 600.

Calculate E(X3 | x1, x2), the expected value of the next claim.

Solution

E(X3 | x1, x2) = ∫_600^∞ E(X3 | θ) π(θ | x1, x2) dθ

X3 | θ is uniform over [0, θ], so E(X3 | θ) = θ/2.

E(X3 | x1, x2) = ∫_600^∞ (θ/2) [3(600^3)/θ^4] dθ = (3/2)(600^3) ∫_600^∞ θ^(-3) dθ = (3/2)(600^3) [1/(2 × 600^2)] = (3/4)(600) = 450

You are given:

- An individual automobile insured has annual claim frequencies that follow a Poisson distribution with mean λ.
- An actuary's prior distribution for the parameter λ has probability density function π(λ) = (0.5) 5e^(-5λ) + (0.5)(1/5) e^(-λ/5), λ > 0.
- In the first policy year, no claims were observed for the insured.

Determine the expected number of claims in the second policy year.

observation N1 = 0 , then the problem becomes finding E ( N 2 ) . Using the double

expectation theorem, we have:

Guo Fall 2009 C, Page 258 / 284

(" ) d"

0

E ( N 2 N1 = 0 ) = E ( " N1 = 0 ) =

(" N

= 0) d "

(" N

(" N

= 0) 0

(" N

= 0) = k

=k

So

(" N

( 0.5 ) 5e

6"

5 ( 0.5)

( 6e

6

1

+ ( 0.5 ) e

5

6"

) + 0.5

6

5"

= 0) .

1

+ ( 0.5 ) e

5

" 5

"

6" 5

6

e

5

6" 5

Next, well need to find the normalizing constant k . The total probability should be one.

We have:

(" N

5 ( 0.5)

( 6e

6

= 0 )d " = k

0

5 ( 0.5)

( 6e

6

6"

(" N

) + 0.5

6

= 0) = 2

=

6

e

5

6" 5

5 ( 0.5 )

( 6e

6

5

( 6e

6

6"

) + 16

E ( N 2 N1 = 0 ) = E ( " N1 = 0 ) =

) + 0.5

6

6

e

5

6" 5

6"

) + 0.5

6

6

e

5

k =2

6

e

5

=1

6" 5

6" 5

(" N

6"

= 0) d " =

5 1

1 5

+

= 0.278

6 6

6 6
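The mixture-of-exponentials bookkeeping is error prone, so a direct numerical integration is a useful check (midpoint rule; the upper limit 60 is effectively infinity here):

```python
import math

def integrate(f, a, b, n=200000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

prior = lambda lam: 0.5 * 5 * math.exp(-5 * lam) + 0.5 * (1/5) * math.exp(-lam / 5)
kernel = lambda lam: math.exp(-lam) * prior(lam)   # prior times P(N1 = 0 | lambda)

posterior_mean = integrate(lambda t: t * kernel(t), 0, 60) / integrate(kernel, 0, 60)
```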

Poisson-gamma model

Problem (May 2000, #30)

You are given:

- An individual automobile insured has an annual claim frequency distribution that follows a Poisson distribution with mean λ.
- λ follows a gamma distribution with parameters α and θ.
- The 1st actuary assumes that α = 1 and θ = 1/6.
- The 2nd actuary assumes the same mean for the gamma distribution, but only half the variance.
- A total of one claim is observed for the insured over a 3-year period.
- Both actuaries determine the Bayesian premium for the expected number of claims in the next year using their model assumptions.

Determine the ratio of the Bayesian premium that the 1st actuary calculates to the Bayesian premium that the 2nd actuary calculates.

Solution

If

- N | λ is Poisson with mean λ,
- λ has a gamma distribution with parameters α and θ, and
- n1, n2, ..., nk claims are observed in Year 1, Year 2, ..., Year k respectively,

then the conditional random variable λ | n1, n2, ..., nk also follows a gamma distribution, with parameters

α* = α + n1 + n2 + ... + nk = α + total # of claims observed
1/θ* = 1/θ + k = 1/θ + # of observation years

E(N_{k+1} | n1, n2, ..., nk) = E(λ | n1, n2, ..., nk) = α* θ* = (α + total # of claims observed) / (1/θ + # of observation years)

This theorem is tested over and over, and you should memorize it. If you want to find the proof of this theorem, refer to the textbook Loss Models.

Guo Fall 2009 C, Page 260 / 284

In this problem, the observation period is 3 years and the # of claims observed is 1.

1st actuary: α = 1 and θ = 1/6, so the premium is

(α + 1)/(1/θ + 3) = (1 + 1)/(6 + 3) = 2/9

2nd actuary: You need to know that a gamma distribution with parameters α and θ has mean αθ and variance αθ^2. We are told that the two actuaries get the same mean but the 2nd actuary gets half the variance of the 1st one:

α'θ' = αθ = 1(1/6) = 1/6,  α'(θ')^2 = (1/2) αθ^2 = (1/2)(1)(1/6)^2 = 1/72

So θ' = (1/72)/(1/6) = 1/12 and α' = 2. The 2nd actuary's premium is

(α' + 1)/(1/θ' + 3) = (2 + 1)/(12 + 3) = 3/15 = 1/5

So the ratio is (2/9)/(1/5) = 10/9.
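The Poisson-gamma update formula is easy to encode, which makes a quick check of both actuaries' premiums possible (sketch; the helper `bayes_premium` is mine):

```python
# Conjugate update: alpha* = alpha + total claims, 1/theta* = 1/theta + years;
# the Bayesian premium is alpha* * theta*.
def bayes_premium(alpha, theta, claims, years):
    return (alpha + claims) / (1 / theta + years)

p1 = bayes_premium(1, 1/6, claims=1, years=3)    # 1st actuary
p2 = bayes_premium(2, 1/12, claims=1, years=3)   # 2nd actuary (same mean, half variance)
ratio = p1 / p2
```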

Nov 2001 #3

You are given:

- The # of claims per auto insured follows a Poisson distribution with mean λ.
- The prior distribution for λ has the following probability density function: f(λ) = (500λ)^50 e^(-500λ) / (λ Γ(50))

                      Year 1   Year 2
# of claims              75      210
# of autos insured      600      900

Determine the expected # of claims in Year 3 for the 1,100 autos insured.

Solution

The observation is N1 = 75, N2 = 210, where N1 is the # of claims in Year 1 for the 600 auto policies and N2 is the # of claims in Year 2 for the 900 auto policies. Given λ, N1 is Poisson with mean 600λ and N2 is Poisson with mean 900λ.

We need to find E(N3 | N1 = 75, N2 = 210), where N3 is the # of claims in Year 3 for one auto policy. Then the expected # of claims in Year 3 for 1,100 auto policies is simply

1,100 E(N3 | N1 = 75, N2 = 210)

We are told that f(λ) = (500λ)^50 e^(-500λ) / (λ Γ(50)).

If you look at Tables for Exam C, you'll find the gamma pdf is:

f(x) = (x/θ)^α e^(-x/θ) / (x Γ(α))

You should immediately recognize that f(λ) is a gamma density with parameters α = 50 and θ = 1/500. Then, using the gamma distribution formulas listed in Tables for Exam C:

E(N3) = E(λ) = αθ = 50/500 = 0.1

If we consider the observation N1 = 75, N2 = 210, then we need the posterior density:

f(λ | N1 = 75, N2 = 210) ∝ f(λ) P(N1 = 75 | λ) P(N2 = 210 | λ)
∝ (λ^49 e^(-500λ)) (e^(-600λ) (600λ)^75) (e^(-900λ) (900λ)^210)
∝ λ^(49 + 75 + 210) e^(-2000λ) = λ^334 e^(-2000λ)

This is a gamma density with α* = 335 and θ* = 1/2000.

E(N3 | N1 = 75, N2 = 210) = E(λ | N1 = 75, N2 = 210) = α* θ* = 335/2000

Then the expected # of claims in Year 3 for 1,100 auto policies is simply

1,100 (335/2000) = 184.25
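The exposure-weighted conjugate update can be checked in a few lines (a sketch; working with the rate 1/θ = 500 avoids fractions):

```python
# Prior: gamma with alpha = 50 and rate 500 (theta = 1/500).
alpha, rate = 50, 500
alpha_post = alpha + 75 + 210    # add the observed claim counts
rate_post = rate + 600 + 900     # add the exposures (auto-years)
expected_year3 = 1100 * alpha_post / rate_post
```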

May 2001 #2

- Annual claim counts follow a Poisson distribution with mean λ.
- The parameter λ has prior distribution with probability density function

  f(λ) = (1/3) e^(−λ/3), λ > 0

- Two claims were observed during the 1st year.

Determine the variance of the posterior distribution of λ.

Solution

So this is the Poisson-gamma model. The prior f(λ) = (1/3) e^(−λ/3) is an exponential density, i.e. a gamma density with α = 1 and θ = 3.

The observation is N1 = 2. We are asked to find the variance Var(λ | N1 = 2). We are told that N | λ is Poisson with mean λ, and λ is gamma with α = 1, θ = 3. The posterior is again gamma, with

α* = α + # of observed claims = 1 + 2 = 3

1/θ* = 1/θ + # of observation periods = 1/3 + 1 = 4/3, so θ* = 0.75

Var(λ | N1 = 2) = α* (θ*)² = 3(0.75²) = 1.6875
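The Poisson-gamma update is mechanical enough to wrap in a tiny helper. This is a sketch of the shortcut used above (the function name is mine), using the α, θ parameterization of the Exam C table:

```python
# Gamma prior (alpha, theta) + Poisson data: after observing a total of
# `claims` claims over `periods` periods,
#   alpha* = alpha + claims,   1/theta* = 1/theta + periods.
def gamma_poisson_update(alpha, theta, claims, periods):
    alpha_new = alpha + claims
    theta_new = 1.0 / (1.0 / theta + periods)
    return alpha_new, theta_new

a_new, t_new = gamma_poisson_update(alpha=1, theta=3, claims=2, periods=1)
posterior_var = a_new * t_new ** 2   # Var of gamma(alpha, theta) = alpha theta^2
print(a_new, round(t_new, 4), round(posterior_var, 4))   # 3 0.75 1.6875
```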

Binomial-beta model

Problem (Nov 2000, #11)

For a risk, you are given:

- The # of claims during a single year follows a Bernoulli distribution with mean p
- The prior distribution for p is uniform on the interval [0, 1]
- The claims experience is observed for a number of years
- The Bayesian premium is calculated as 1/5 based on the observed claims

Which of the following observed claims data could have yielded this calculation?

(A) 0 claims during 3 years
(B) 0 claims during 4 years
(C) 0 claims during 5 years
(D) 1 claim during 4 years
(E) 1 claim during 5 years

Solution

Please note that a uniform distribution on [0, 1] is a special case of the beta distribution with parameters a = b = 1. In addition, a Bernoulli distribution is a special case of the binomial distribution with n = 1.

Next, I'll give you the general binomial-beta formula.

If

- p has beta distribution with parameters a and b
- x1, x2, ..., xk claims are observed in Year 1, Year 2, ..., Year k respectively (where each xi can be 0, 1, ..., n)

Then the conditional random variable p | x1, x2, ..., xk also has beta distribution, with parameters

a* = a + x1 + x2 + ... + xk = a + total # of claims observed
b* = b + k n − (x1 + x2 + ... + xk) = b + k n − total # of claims observed

E(X_{k+1} | x1, x2, ..., xk) = n E(p | x1, x2, ..., xk) = n a*/(a* + b*)

Proof.

f(p | x1, x2, ..., xk) = f(p) P(x1, x2, ..., xk | p) / ∫₀¹ f(p) P(x1, x2, ..., xk | p) dp,

where the denominator ∫₀¹ f(p) P(x1, x2, ..., xk | p) dp is a normalizing constant that does not depend on p.

Next, let's find the beta pdf f(p). If you look at the Exam C table, you'll see that the beta distribution has the following pdf:

f(x) = [Γ(a + b) / (Γ(a) Γ(b))] u^a (1 − u)^(b−1) (1/x),   0 < x < θ,   u = x/θ

This pdf is really annoying: it has two variables, u and x. To simplify the pdf, set θ = 1. Then u = x and 0 < x < 1. The pdf becomes:

f(x) = [Γ(a + b) / (Γ(a) Γ(b))] x^a (1 − x)^(b−1) (1/x) = [Γ(a + b) / (Γ(a) Γ(b))] x^(a−1) (1 − x)^(b−1),   0 < x < 1

This is the most commonly used beta pdf. This is the one you should use for Exam C.

Back to the proof. Since p has beta distribution with parameters a and b, the pdf is

f(p) = [Γ(a + b) / (Γ(a) Γ(b))] p^(a−1) (1 − p)^(b−1), which is proportional to p^(a−1) (1 − p)^(b−1)

Because x1, x2, ..., xk are independent identically distributed given p, and for i = 1 to k, xi | p is binomial with parameters n and p, we have P(xi | p) = C(n, xi) p^(xi) (1 − p)^(n−xi).

So P(x1, x2, ..., xk | p) is proportional to

p^(x1) (1 − p)^(n−x1) × p^(x2) (1 − p)^(n−x2) × ... × p^(xk) (1 − p)^(n−xk) = p^(Σ xi) (1 − p)^(k n − Σ xi)

Hence f(p | x1, x2, ..., xk) is proportional to

p^(a−1) (1 − p)^(b−1) × p^(Σ xi) (1 − p)^(k n − Σ xi) = p^(a + Σ xi − 1) (1 − p)^(b + k n − Σ xi − 1)

This is a beta pdf with parameters

a* = a + x1 + x2 + ... + xk,   b* = b + k n − (x1 + x2 + ... + xk)

Next, we'll calculate E(X_{k+1} | x1, x2, ..., xk), the Bayesian estimate for Year k + 1, using the 5-step framework.

We first discard the observations x1, x2, ..., xk. Then E(X_{k+1} | x1, x2, ..., xk) becomes E(X_{k+1}). Using the double expectation theorem, we have:

E(X_{k+1}) = E_p[ E(X_{k+1} | p) ] = E_p[ n p ] = n E(p)

Next, we consider the observations x1, x2, ..., xk. We modify the above equation by replacing p with p | x1, x2, ..., xk. We know that p | x1, x2, ..., xk has beta distribution with parameters a* and b* as above. Looking up the beta expectation formula from the Exam C table, we have:

E(p | x1, x2, ..., xk) = a* / (a* + b*)

Finally, we have:

E(X_{k+1} | x1, x2, ..., xk) = n E(p | x1, x2, ..., xk) = n a* / (a* + b*)

Now let's apply the binomial-beta formula to this problem. We are told that the # of claims in a year is a Bernoulli random variable, so the number of trials is n = 1. In addition, the prior distribution of p is uniform over [0, 1], which is a beta distribution with parameters a = b = 1. Since a* + b* = a + b + k n,

E(X_{k+1} | x1, ..., xk) = n (a + Σ xi) / (a + b + k n) = (1)(1 + Σ xi) / (1 + 1 + k) = (1 + Σ xi) / (2 + k) = 1/5

We have two unknowns in one equation, so we can't solve it directly. One way to find the right answer is to test each answer choice. If zero claims are observed, then Σ xi = 0 and (1 + 0)/(2 + k) = 1/5 gives k = 3. So zero claims during 3 years yields a Bayesian premium of 1/5. The answer is A.
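Rather than solving the equation by hand, the five answer choices can also be tested directly. A sketch (the helper name is mine), using exact fractions so no rounding can mislead:

```python
from fractions import Fraction

# Beta-binomial Bayesian premium with beta(a, b) prior and n trials per year:
#   E(X_{k+1} | data) = n (a + total claims) / (a + b + k n)
def bayesian_premium(a, b, n, k, claims):
    return Fraction(n * (a + claims), a + b + k * n)

# Uniform prior on [0, 1] is beta(1, 1); Bernoulli means n = 1.
choices = {"A": (3, 0), "B": (4, 0), "C": (5, 0), "D": (4, 1), "E": (5, 1)}
for label, (k, claims) in choices.items():
    print(label, bayesian_premium(1, 1, 1, k, claims))  # only A gives 1/5
```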

Chapter 10

Problem

For an insurance:

- Losses can be 100, 200 or 300 with respective probabilities 0.2, 0.2, and 0.6.
- The insurance has an ordinary deductible of 150 per loss.
- Y^P is the claim payment per payment random variable.

Calculate Var(Y^P).

(A) 1500   (B) 1875   (C) 2250   (D) 2625   (E) 3000

Core concepts:

- Ground up loss
- Ordinary deductible
- Claim payment
- Claim payment per payment

Explanation

Let X represent the ground up loss amount (the ground up loss amount is the actual loss incurred by the policyholder). Let d, where d ≥ 0, represent the deductible.

Amount paid by the insurer (called the claim payment):

(X − d)+ = max(X − d, 0) = 0 if X ≤ d;  X − d if X > d

Amount paid by the insured out of his own pocket:

X ∧ d = min(X, d) = X if X ≤ d;  d if X > d

So

X (ground up loss) = (X − d)+ (paid by the insurance company) + X ∧ d (paid by the insured out of his own pocket)

Example. Your deductible for your car insurance is $500. If you have an accident and the loss is $600, you pay $500 out of your own pocket and your insurance company pays you $100. In this case,

600 (ground up loss) = 100 (amount paid by the insurance company) + 500 (amount paid by the insured out of his own pocket)

However, if the loss is $400, then you pay the whole loss and the insurance company pays zero:

400 (ground up loss) = 0 (amount paid by the insurance company) + 400 (amount paid by the insured out of his own pocket)

Let Y represent the claim payment. Then Y = (X − d)+. If X ≤ d, then Y = 0: the insured absorbs the loss with his own money and won't need to report the loss to the insurance company, so the insurance company may not even know that a loss has occurred. For the insurance company to pay any claim, Y must be positive. This is why the claim payment per payment is (Y | Y > 0).

Full solution

Let X represent the ground up loss. Let Y represent the claim payment. The deductible is d = 150.

Y^P = Y | Y > 0

We are asked to find Var(Y^P):

Var(Y^P) = Var(Y | Y > 0) = Var[(X − 150)+ | X > 150]

= E[(X − 150)² | X > 150] − E²(X − 150 | X > 150)

X            100    200    300
(X − 150)+     0     50    150
P(X)         0.2    0.2    0.6

P(X > 150) = 0.2 + 0.6 = 0.8, so given X > 150, the payment is 50 with probability 0.2/0.8 and 150 with probability 0.6/0.8.

E(X − 150 | X > 150) = 50(0.2/0.8) + 150(0.6/0.8) = 125

E[(X − 150)² | X > 150] = 50²(0.2/0.8) + 150²(0.6/0.8) = 17,500

Var(Y^P) = 17,500 − 125² = 1,875. The answer is B.

Calculator shortcut

As explained in the chapter on calculators, when using the BA II Plus or BA II Plus Professional 1-V Statistics Worksheet, we can simply discard the data that falls outside the conditional probability and calculate the mean/variance on the remaining data.

X      Is X > 150?
100    No, so discard this data point.
200    Yes. Keep this data point.
300    Yes. Keep this data point.

X                                 200    300
X − 150                            50    150
P(X)                              0.2    0.6
10 P(X) -- scaled up probability    2      6

Enter X01=50, Y01=2; X02=150, Y02=6. The worksheet gives:

n = 8,   mean = 125,   σX = 43.30127019,   Var = σX² = 1,875
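The conditional-variance computation generalizes to any discrete loss table. A quick sketch (my own helper, not from the manual) of the same "discard, shift, renormalize" logic:

```python
# Variance of the per-payment variable (X - d | X > d) for a discrete loss X.
def per_payment_variance(losses, probs, d):
    kept = [(x - d, p) for x, p in zip(losses, probs) if x > d]
    total = sum(p for _, p in kept)                   # P(X > d)
    mean = sum(y * p for y, p in kept) / total        # E(X - d | X > d)
    second = sum(y * y * p for y, p in kept) / total  # E[(X - d)^2 | X > d]
    return second - mean ** 2

v = per_payment_variance([100, 200, 300], [0.2, 0.2, 0.6], 150)
print(round(v, 2))    # 1875.0
```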

Problem

For an insurance:

- Losses can be 100, 200, 300, and 400 with respective probabilities 0.1, 0.2, 0.3, and 0.4.
- The insurance has an ordinary deductible of 250 per loss.
- Y^P is the claim payment per payment random variable.

Calculate Var(Y^P).

Solution

Fast solution

Ground up loss X    Is X > 250?
100                 No. Discard.
200                 No. Discard.
300                 Yes. Keep.
400                 Yes. Keep.

Remaining data after the deductible of 250:

X                                 300    400
(X − 250)+                         50    150
P(X)                              0.3    0.4
10 P(X) -- scaled up probability    3      4

Enter X01=50, Y01=3; X02=150, Y02=4. The worksheet gives:

n = 7,   mean = 107.14,   σX = 49.48716593,   Var = σX² = 2,448.98

Standard solution

X            100    200    300    400
(X − 250)+     0      0     50    150
P(X)         0.1    0.2    0.3    0.4

P(X > 250) = 0.3 + 0.4 = 0.7, so given X > 250, the payment is 50 with probability 0.3/0.7 = 3/7 and 150 with probability 0.4/0.7 = 4/7.

E(X − 250 | X > 250) = 50(3/7) + 150(4/7) = 107.1428571

E[(X − 250)² | X > 250] = 50²(3/7) + 150²(4/7) = 13,928.57143

Var(Y^P) = 13,928.57143 − 107.1428571² = 2,448.98
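The "fast solution" weights translate directly into code: keep only the payments above the deductible and use the scaled-up integer probabilities. A sketch of that arithmetic:

```python
# Scaled-weight mean/variance, mirroring the BA II Plus worksheet entries.
payments = [50, 150]     # (X - 250)+ for the kept losses 300 and 400
weights = [3, 4]         # 10 * P(X) for those losses
n = sum(weights)         # 7
mean = sum(x * w for x, w in zip(payments, weights)) / n
second = sum(x * x * w for x, w in zip(payments, weights)) / n
var = second - mean ** 2
print(round(mean, 2), round(var, 2))    # 107.14 2448.98
```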

Problem

For an insurance:

- Losses can be 1,000, 4,000, 5,000, 9,000, and 12,000 with respective probabilities 0.11, 0.17, 0.24, 0.36, and 0.12.
- The insurance has an ordinary deductible of 900 per loss.
- Y^P is the claim payment per payment random variable.

Calculate Var(Y^P).

Solution

To keep the calculator entries small, work in units of 1,000. The deductible is then 0.9, and every loss exceeds it, so no data is discarded.

Ground up loss X    Is X > 0.9?    (X − 0.9)+    P(X)    100 P(X) -- scaled up probability
 1                  Yes. Keep.        0.1        0.11    11
 4                  Yes. Keep.        3.1        0.17    17
 5                  Yes. Keep.        4.1        0.24    24
 9                  Yes. Keep.        8.1        0.36    36
12                  Yes. Keep.       11.1        0.12    12

Enter X01=0.1, Y01=11; X02=3.1, Y02=17; X03=4.1, Y03=24; X04=8.1, Y04=36; X05=11.1, Y05=12. The worksheet gives:

n = 100,   mean = 5.77,   σX = 3.28345854

Var(Y^P) = σX² = 3.28345854² = 10.7811 in units of 1,000², i.e. 10,781,100 in the original units.
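Scaling by 1,000 only rescales the variance by 1,000². A quick check of the numbers above:

```python
payments = [0.1, 3.1, 4.1, 8.1, 11.1]   # (X - 0.9)+ in units of 1,000
weights = [11, 17, 24, 36, 12]          # 100 * P(X)
n = sum(weights)                        # 100
mean = sum(x * w for x, w in zip(payments, weights)) / n
var = sum(x * x * w for x, w in zip(payments, weights)) / n - mean ** 2
print(round(mean, 2))             # 5.77 (thousands)
print(round(var * 1000 ** 2))     # Var(Y^P) in the original units: 10781100
```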

Chapter 11

Problem

You are given:

- Losses follow an exponential distribution with the same mean in all years.
- The loss elimination ratio this year is 70%.
- The ordinary deductible for the coming year is 4/3 of the current deductible.

Compute the loss elimination ratio for the coming year.

Core concept: Loss elimination ratio (LER)

LER = E(X ∧ d) / E(X) = (expected loss eliminated by the deductible) / (expected loss amount)

LER answers the question, "What % of the expected loss amount is absorbed by the policyholder due to the deductible?"

How to calculate LER:

E(X) = ∫₀^∞ x f(x) dx = ∫₀^∞ s(x) dx

X ∧ d = min(X, d) = X if X ≤ d;  d if X > d

E(X ∧ d) = ∫₀^d x f(x) dx + d s(d)

Alternatively,

E(X ∧ d) = ∫₀^d s(x) dx = ∫₀^d [1 − F_X(x)] dx

You can find the proof of the 2nd formula in Loss Models. Setting d = ∞ in it recovers E(X) = ∫₀^∞ s(x) dx.

Solution

The ground up loss X has an exponential distribution with mean θ:

f(x) = (1/θ) e^(−x/θ),   s(x) = 1 − F(x) = e^(−x/θ),   E(X) = θ

E(X ∧ d) = ∫₀^d s(x) dx = ∫₀^d e^(−x/θ) dx = θ(1 − e^(−d/θ))

LER = E(X ∧ d) / E(X) = 1 − e^(−d/θ)

This year, 1 − e^(−d/θ) = 0.7, so e^(−d/θ) = 0.3.

For the coming year the deductible is (4/3)d (4/3 of the original deductible), so

LER′ = 1 − e^(−(4/3)d/θ) = 1 − (e^(−d/θ))^(4/3) = 1 − 0.3^(4/3) = 0.799
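Because LER = 1 − e^(−d/θ) for exponential losses, the answer needs no θ at all, only the survival factor 0.3 implied by this year's ratio. A one-line check:

```python
survival = 1 - 0.7                   # e^(-d/theta) implied by this year's LER
new_ler = 1 - survival ** (4 / 3)    # deductible scaled up to (4/3) d
print(round(new_ler, 3))             # 0.799
```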

http://www.guo.coursehost.com

Chapter 12

Find E(Y − m)+

E(Y − m)+ = E(Y) − m + m f_Y(0) + (m − 1) f_Y(1) + (m − 2) f_Y(2) + ... + 1 × f_Y(m − 1)

The above formula works whether Y is a simple random variable or a compound random variable Y = Σ_{i=1}^{n} X_i. If Y = Σ_{i=1}^{n} X_i, don't write

E(Y − m)+ = E(Y) − m + m f_X(0) + (m − 1) f_X(1) + (m − 2) f_X(2) + ... + 1 × f_X(m − 1)

In other words, the pdf on the right hand side must match up with the random variable on the left hand side. If the random variable on the left hand side is Y = Σ_{i=1}^{n} X_i, the right hand side must use f_Y. If your random variable on the left hand side is X, then you need to write

E(X − m)+ = E(X) − m + m f_X(0) + (m − 1) f_X(1) + (m − 2) f_X(2) + ... + 1 × f_X(m − 1)

To use the above formula in the heat of the exam, we rewrite it as a pairing of two columns:

f_Y(0)        m
f_Y(1)        m − 1
f_Y(2)        m − 2
...           ...
f_Y(m − 1)    1

E(Y − m)+ = E(Y) − m + [m f_Y(0) + (m − 1) f_Y(1) + (m − 2) f_Y(2) + ... + 1 × f_Y(m − 1)]

This is not a standard notation. However, we use it anyway to help us memorize the formula. In the exam, you just write these two columns. Then you simply take each element of the 1st column, multiply it by the corresponding element of the 2nd column, and sum everything up.

Please note that if you take an element f_Y(k) (where 0 ≤ k ≤ m − 1) from the 1st column, then you need to multiply it by m − k from the 2nd column, so that (m − k) + k = m always stands.

The proof of this formula is simple. The standard formula is:

E(S − d)+ = E(S) − Σ_{s=0}^{d−1} [1 − F_S(s)]

(Some books write the summand as 1 − F_S(x), which is confusing because S and x don't match; the right notation is F_S(s).)

Let's work from this formula with d = 3:

E(S − 3)+ = E(S) − Σ_{s=0}^{2} [1 − F_S(s)]

Σ_{s=0}^{2} [1 − F_S(s)] = [1 − F_S(0)] + [1 − F_S(1)] + [1 − F_S(2)] = 3 − [F_S(0) + F_S(1) + F_S(2)]

F_S(0) = P(S ≤ 0) = P(S = 0) = f_S(0)

F_S(1) = P(S ≤ 1) = P(S = 0) + P(S = 1) = f_S(0) + f_S(1)

F_S(2) = P(S ≤ 2) = P(S = 0) + P(S = 1) + P(S = 2) = f_S(0) + f_S(1) + f_S(2)

F_S(0) + F_S(1) + F_S(2) = 3 f_S(0) + 2 f_S(1) + f_S(2)

E(S − 3)+ = E(S) − 3 + 3 f_S(0) + 2 f_S(1) + f_S(2)

This is exactly the two-column pattern above with m = 3.
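The two-column mnemonic is easy to mechanize and to sanity-check against the direct definition E[max(Y − m, 0)]. A sketch for Y supported on the nonnegative integers (the function names and the small test pmf are mine):

```python
# Stop-loss premium via the column mnemonic:
#   E(Y - m)+ = E(Y) - m + sum_{k=0}^{m-1} (m - k) f_Y(k)
def stop_loss(pmf, mean, m):
    return mean - m + sum((m - k) * pmf.get(k, 0.0) for k in range(m))

def stop_loss_direct(pmf, m):
    # direct definition E max(Y - m, 0), as a cross-check
    return sum(max(y - m, 0) * p for y, p in pmf.items())

pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.25, 4: 0.15}
mean = sum(y * p for y, p in pmf.items())
print(round(stop_loss(pmf, mean, 3), 4), round(stop_loss_direct(pmf, 3), 4))  # 0.15 0.15
```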

Problem 1 (# 11)

A company provides insurance to a concert hall for losses due to power failure. You are given:

- The number of power failures in a year has a Poisson distribution with mean 1.
- Individual loss amounts:

  x                   10     20     50
  Probability of x    0.3    0.3    0.4

- The number of power failures and the amounts of losses are independent.
- The insurance has an annual deductible of 30.

Calculate the expected amount of claims paid by the insurer in one year.

Solution

Let N be the annual number of power failures and X_i the individual loss amounts. Then the annual aggregate loss is S = Σ_{i=1}^{N} X_i.

The total claim dollar amount after the deductible of $30 is:

(S − 30)+ = (Σ_{i=1}^{N} X_i − 30)+

E(S − 30)+ = E(S) − 30 + [30 f_S(0) + 29 f_S(1) + 28 f_S(2) + ... + 1 × f_S(29)]

It seems like we have an awful lot of work to do with the two columns. Before you start to panic, please note that many of the values f_S(0), f_S(1), ..., f_S(29) will be zero. This is because X has only 3 distinct values: 10, 20, and 50, with probabilities 0.3, 0.3, and 0.4 respectively. Evidently, we can throw away X = 50: if any loss is 50, then S is at least 50 and falls outside the range S ≤ 29. The only reachable values below 30 are S = 0, 10, 20.

N is Poisson with mean 1: P(N = n) = (1/n!) e^(−1).

N    X_1, X_2, ..., X_N       P(N)           P(X_1, ..., X_N)    S = Σ X_i    P(S)
0    --                       e^(−1)         1                   0            e^(−1)
1    X = 10                   e^(−1)         0.3                 10           0.3 e^(−1)
1    X = 20                   e^(−1)         0.3                 20           0.3 e^(−1)
2    (X_1, X_2) = (10, 10)    (1/2) e^(−1)   0.3²                20           (1/2) e^(−1) (0.3²)

After consolidation:

S = Σ X_i    P(S)
0            e^(−1)
10           0.3 e^(−1)
20           0.3 e^(−1) + (1/2) e^(−1)(0.3²) = 0.345 e^(−1)

So

E(S − 30)+ = E(S) − 30 + [30 f_S(0) + 20 f_S(10) + 10 f_S(20)]

In the actual exam, to help remember the two columns, you can write only the 1st column and solve for the multipliers:

f_S(0)     a
f_S(10)    b
f_S(20)    c

As said earlier, the sum of the two elements in each row needs to be m (or 30 in this problem). As a result,

0 + a = 30,   10 + b = 30,   20 + c = 30   →   a = 30,   b = 20,   c = 10

30 f_S(0) + 20 f_S(10) + 10 f_S(20) = 30 e^(−1) + 20(0.3 e^(−1)) + 10(0.345 e^(−1)) = (30 + 6 + 3.45) e^(−1) = 39.45 e^(−1)

E(S) = E(N) E(X) = 1 × (10 × 0.3 + 20 × 0.3 + 50 × 0.4) = 29

E(S − 30)+ = E(S) − 30 + 39.45 e^(−1) = 29 − 30 + 39.45 e^(−1) = 13.5128
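For a machine check of f_S and the final answer, the aggregate pmf can be built by brute-force convolution (a verification sketch, not exam technique; the helper names are mine):

```python
from math import exp, factorial

sev = {10: 0.3, 20: 0.3, 50: 0.4}   # severity pmf
lam = 1.0                           # Poisson mean

def convolve(pmf1, pmf2):
    # pmf of the sum of two independent discrete variables
    out = {}
    for x, p in pmf1.items():
        for y, q in pmf2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

def aggregate_pmf(n_max=8):
    # mix the n-fold severity convolutions with Poisson weights
    agg, conv = {}, {0: 1.0}
    for n in range(n_max + 1):
        pn = exp(-lam) * lam ** n / factorial(n)
        for s, p in conv.items():
            agg[s] = agg.get(s, 0.0) + pn * p
        conv = convolve(conv, sev)
    return agg

agg = aggregate_pmf()
mean_S = lam * sum(x * p for x, p in sev.items())   # E(S) = E(N) E(X) = 29
answer = mean_S - 30 + sum((30 - s) * p for s, p in agg.items() if s < 30)
print(round(answer, 4))    # 13.5128
```

Truncating N at n_max = 8 is harmless here: only N ≤ 2 can produce S < 30, and E(S) is computed exactly from E(N) E(X).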

Problem 2

The number of claims in a year has a Poisson distribution with mean λ = 2. Individual loss amounts are:

x         1      2
f_X(x)    0.6    0.4

The insurance pays the aggregate losses in excess of an aggregate deductible of 3.

Calculate the expected aggregate payments of the insurance.

Solution

Let S = Σ_{i=1}^{N} X_i. We need

E(S − 3)+ = E(S) − 3 + [3 f_S(0) + 2 f_S(1) + 1 × f_S(2)]

so we need f_S(0), f_S(1), f_S(2).

N    X_1, X_2, ..., X_N     P(N)                         P(X_1, ..., X_N)    S = Σ X_i    P(S)
0    --                     e^(−2)                       1                   0            e^(−2)
1    X = 1                  2 e^(−2)                     0.6                 1            0.6 (2 e^(−2))
1    X = 2                  2 e^(−2)                     0.4                 2            0.4 (2 e^(−2))
2    (X_1, X_2) = (1, 1)    (2²/2!) e^(−2) = 2 e^(−2)    0.6²                2            0.6² (2 e^(−2))

After consolidation:

S = Σ X_i    P(S)
0            e^(−2)
1            0.6 (2 e^(−2)) = 1.2 e^(−2)
2            0.4 (2 e^(−2)) + 0.6² (2 e^(−2)) = 1.52 e^(−2)

E(S) = E(N) E(X) = 2 (1 × 0.6 + 2 × 0.4) = 2.8

E(S − 3)+ = 2.8 − 3 + [3 e^(−2) + 2(1.2 e^(−2)) + 1(1.52 e^(−2))] = −0.2 + 6.92 e^(−2) = 0.7365
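A quick numerical check of this answer, plugging the three point masses straight into the formula:

```python
from math import exp

e2 = exp(-2)
f0 = e2                                    # N = 0
f1 = 0.6 * 2 * e2                          # N = 1, X = 1
f2 = 0.4 * 2 * e2 + 0.6 ** 2 * 2 * e2      # N = 1, X = 2  or  N = 2, (1, 1)
mean_S = 2 * (1 * 0.6 + 2 * 0.4)           # E(N) E(X) = 2.8
answer = mean_S - 3 + 3 * f0 + 2 * f1 + 1 * f2
print(round(answer, 4))    # 0.7365
```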

Problem 3

Sample M #45

Prescription drug losses, S, are modeled assuming the number of claims has a geometric distribution with mean 4, and the amount of each prescription is 40.

Calculate E(S − 100)+.
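The manual's worked solution to this problem does not appear in this excerpt. As a sketch of one route: with every prescription equal to 40 we have S = 40N, so E(S − 100)+ = E(S) − E(S ∧ 100), and only P(N = 0), P(N = 1), P(N = 2) are needed. This assumes the Exam C table's geometric pmf P(N = n) = β^n / (1 + β)^(n+1) with β = 4:

```python
beta = 4.0                                  # geometric mean = beta = 4

def p_n(n):
    # geometric pmf in the Exam C table parameterization
    return beta ** n / (1 + beta) ** (n + 1)

mean_S = 40 * beta                          # E(S) = 40 E(N) = 160
# S ^ 100 equals 0, 40, 80 for N = 0, 1, 2 and is capped at 100 for N >= 3
limited = 40 * p_n(1) + 80 * p_n(2) + 100 * (1 - p_n(0) - p_n(1) - p_n(2))
print(round(mean_S - limited, 2))    # 92.16
```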

Yufeng Guo was born in central China. After receiving his Bachelor's degree in physics at Zhengzhou University, he attended Beijing Law School and received his Master's of law. He was an attorney and law school lecturer in China before immigrating to the United States. He received his Master's of accounting at Indiana University. He has pursued an actuarial career and passed exams 1, 2, 3, 4, 5, 6, and 7 in rapid succession after discovering a successful study strategy.

Mr. Guo's exam records are as follows:

Fall 2002      Passed Course 1
Spring 2003    Passed Courses 2, 3
Fall 2003      Passed Course 4
Spring 2004    Passed Course 6
Fall 2004      Passed Course 5
Spring 2005    Passed Course 7

Mr. Guo currently teaches an online prep course for Exam P, FM, MFE, and MLC. For more information, visit http://actuary88.com/.

If you have any comments or suggestions, you can contact Mr. Guo at yufeng_guo@msn.com.
