EXERCISES FOR LESSON 36

36.23. [4B-F92:22] (3 points) You are given the following information for 10,000 risks grouped by number of claims:

• A Poisson distribution was fit to the grouped risks.
• Minimum chi-square estimation has been used to estimate the Poisson distribution parameter.
• The results are as follows:

    Number of    Actual Number    Estimated Number of
    Claims       of Risks         Risks Using Poisson
    0            7873             7788
    1            1387
    2
    Total        10,000

You are given the following chi-square table:

    Degrees of    Level of Significance
    Freedom       0.050    0.010    0.005
    2             5.99     9.21     10.60
    3             7.81     11.34    12.84
    4             9.49     13.28    14.86
    5             11.07    15.09    16.75

You are to use Pearson's chi-square statistic to test the hypothesis, H0, that the Poisson provides an acceptable fit. Which of the following is true?

(A) Reject H0 at α = 0.005.
(B) Accept H0 at α = 0.005, but reject H0 at α = 0.010.
(C) Accept H0 at α = 0.010, but reject H0 at α = 0.025.
(D) Accept H0 at α = 0.025, but reject H0 at α = 0.050.
(E) Accept H0 at α = 0.050.

C/4 Study Manual, 13th edition. Copyright ©2011 ASM

36. HYPOTHESIS TESTS: CHI-SQUARE

36.24. [4B-F97:20] (3 points) You are given the following:

• The observed number of claims for a group of 100 risks has been recorded as follows:

    Number of Claims    Number of Risks
    0                   80
    1                   20

• The null hypothesis, H0, is that the number of claims per risk follows a Bernoulli distribution with mean p.
• A chi-square test is performed using Pearson's goodness-of-fit statistic.

Using the chi-square table shown below, determine the smallest value of p for which H0 will be accepted at the 0.01 significance level.

    Degrees of    Significance Level
    Freedom       0.01
    1             6.63
    2             9.21
    3             11.34

(A) Less than 0.08
(B) At least 0.08, but less than 0.09
(C) At least 0.09, but less than 0.10
(D) At least 0.10, but less than 0.11
(E) At least 0.11

36.25.
[160-F87:18] In the following table, the values of q_{x+k} are calculated from a fully specified survival model, and the values of d_{x+k} are observed deaths from the complete mortality experience of 100 cancer patients age x at entry to the study.

    k    q_{x+k}    d_{x+k}
    0    0.10       15
    1    0.25       30
    2    0.25       20
    3    0.20       15
    4    0.15       10
    5    0.05       10

You hypothesize that the mortality of cancer patients is governed by the specified model. Let Q be the value of the chi-square statistic used to test the validity of this model, and let n be the degrees of freedom. Determine Q − n.

(A) 6.4    (B) 7.4    (C) 8.4    (D) 45.3    (E) 46.3

36.26. [4-S00:29 and 4-F02:28] You are given the following observed claim frequency data collected over a period of 365 days:

    Number of Claims per Day    Observed Number of Days
    0                           50
    1                           122
    2                           101
    3                           92
    4                           0

Fit a Poisson distribution to the above data, using the method of maximum likelihood. Group the data by number of claims per day into four groups:

    0    1    2    3 or more

Apply the chi-square goodness-of-fit test to evaluate the null hypothesis that the claims follow a Poisson distribution. Determine the result of the chi-square test.

(A) Reject at the 0.005 significance level.
(B) Reject at the 0.010 significance level, but not at the 0.005 level.
(C) Reject at the 0.025 significance level, but not at the 0.010 level.
(D) Reject at the 0.050 significance level, but not at the 0.025 level.
(E) Do not reject at the 0.050 significance level.

36.27.
[4-F04:10] You are given the following random sample of 30 auto claims:

    54      140     230     560     600     1,100
    1,500   1,800   1,920   2,000   2,450   2,500
    2,580   2,910   3,800   3,800   3,810   3,870
    4,000   4,800   7,200   7,390   11,750  12,000
    15,000  25,000  30,000  32,300  35,000  55,000

You test the hypothesis that auto claims follow a continuous distribution F(x) with the following percentiles:

    x       310     500     2,498    4,876    7,498    12,990
    F(x)    0.16    0.27    0.55     0.81     0.90     0.95

You group the data using the largest number of groups such that the expected number of claims in each group is at least 5.

Calculate the chi-square goodness-of-fit statistic.

(A) Less than 7
(B) At least 7, but less than 10
(C) At least 10, but less than 13
(D) At least 13, but less than 16
(E) At least 16

36.28. [C-S05:19] Which of the following statements is true?

(A) For a null hypothesis that the population follows a particular distribution, using sample data to estimate the parameters of the distribution tends to decrease the probability of a Type II error.
(B) The Kolmogorov-Smirnov test can be used on individual or grouped data.
(C) The Anderson-Darling test tends to place more emphasis on a good fit in the middle rather than in the tails of the distribution.
(D) For a given number of cells, the critical value for the chi-square goodness-of-fit test becomes larger with increased sample size.
(E) None of (A), (B), (C), or (D) is true.

36.29. [C-S05:39] You test the hypothesis that a given set of data comes from a known distribution with distribution function F(x). The following data were collected:

    Interval     F(x)     Number of Observations
    x < 2        0.035    5
    2 ≤ x < 5    0.130    42
    5 ≤ x < 7    0.630    137
    7 ≤ x < 8    0.830    66
    8 ≤ x        1.000    50
    Total                 300

where F(x) is evaluated at the upper endpoint of each interval.

You test the hypothesis using the chi-square goodness-of-fit test. Determine the result of the test.

(A) The hypothesis is not rejected at the 0.10 significance level.
(B) The hypothesis is rejected at the 0.10 significance level, but is not rejected at the 0.05 significance level.
(C) The hypothesis is rejected at the 0.05 significance level, but is not rejected at the 0.025 significance level.
(D) The hypothesis is rejected at the 0.025 significance level, but is not rejected at the 0.01 significance level.
(E) The hypothesis is rejected at the 0.01 significance level.

36.30. An automobile insurance coverage has the following claim frequency experience for 2005-2008:

    Year    Exposures    Claims
    2005    500          113
    2006    527          92
    2007    535          94
    2008    538          121

The data are fitted to a Poisson distribution using maximum likelihood. A chi-square test is performed to test the fit. At what significance level is the fit to a Poisson distribution accepted?

(A) H0 will be rejected at the 0.01 significance level.
(B) H0 will be rejected at the 0.02 significance level, but not at the 0.01 level.
(C) H0 will be rejected at the 0.05 significance level, but not at the 0.02 level.
(D) H0 will be rejected at the 0.10 significance level, but not at the 0.05 level.
(E) H0 will be accepted at the 0.10 significance level.

36.31. For a group dental insurance policy, the number of claims submitted in a 3-year period is:

    Year    Number of Covered Individuals    Number of Claims
    2004    122                              155
    2005    134                              215
    2006    144                              230

You test the null hypothesis that the number of claims per covered individual follows a Poisson distribution with a constant mean over the three-year period using the Pearson chi-square goodness-of-fit test. Which of the following statements is correct?

(A) Reject the null hypothesis at α = 0.005.
(B) Accept the null hypothesis at α = 0.005 but not at α = 0.01.
(C) Accept the null hypothesis at α = 0.01 but not at α = 0.025.
(D) Accept the null hypothesis at α = 0.025 but not at α = 0.05.
(E) Accept the null hypothesis at α = 0.05.

36.32. An insurance coverage has the following claim frequency experience for 2005-2008:
    Year    Exposures    Claims
    2005    1000         80
    2006    1025         90
    2007    975          95
    2008    1000         105

The underlying distribution is hypothesized to be a negative binomial distribution with r = 1, β = 0.09. A chi-square test is performed to test the fit. Determine the chi-square statistic and the number of degrees of freedom it has.

Additional released exam questions: C-F05:10,34, C-S07:5

Solutions

36.1. Chi-square has 4 − 1 = 3 degrees of freedom, so 1 is false. 2 is true. All 25 loss values (if they are distinct) must be tested for the Kolmogorov-Smirnov statistic.

36.2. First calculate the distribution function at the endpoints. Then calculate the expected number in each cell:

E1 = 250(0.5) = 125
E2 = 250(0.25) = 62.5
E3 = 250(0.0833) = 20.83
E4 = 250(0.875 − 0.8333) = 10.42
E5 = 250(0.125) = 31.25

Now we calculate the chi-square statistic:

Q = (120 − 125)²/125 + (70 − 62.5)²/62.5 + (15 − 20.83)²/20.83 + (1 − 10.42)²/10.42 + (0 − 31.25)²/31.25

36.3. λ̂ = (806 + 170 + 24)/5000 = 0.2. Then

E0 = 5000e^−0.2 = 4093.7
E1 = 5000e^−0.2(0.2) = 818.7
E2 = 5000e^−0.2(0.2²/2) = 81.9
E3 = 5000e^−0.2(0.2³/6) = 5.5
E4+ = 5000 − (4093.7 + 818.7 + 81.9 + 5.5) = 0.2

Q = 7.3²/4093.7 + 12.7²/818.7 + 3.1²/81.9 + 2.5²/5.5 + 0.2²/0.2 = 1.6637  (C)

Notice that if 3+ claims were made into one class (as they should be under most rules, even rules that only require 1 expected exposure per cell), the answer would be 1.255, or (B). The question is ambiguous about whether to merge 4+ into 3+, making it unfair.

36.4. If there are 5 classes, then there are 5 − 1 − 1 = 3 degrees of freedom. (C) Unbelievably, the official answer was (B), contradicting official answer (C) to the last question!

36.5. The sample mean is (200 + 2(100))/1000 = 0.4, and the second moment is (200 + 4(100))/1000 = 0.6, so the variance is 0.6 − 0.4² = 0.44. Then β̂ = 0.44/0.4 − 1 = 0.1 and r̂ = 0.4/0.1 = 4. The expected numbers of observations are:

1000p0 = 1000/1.1⁴ = 683.013
1000p1 = 1000(4)(0.1)/1.1⁵ = 248.369
1000p2 = 1000(10)(0.01)/1.1⁶ = 56.447
1000 Pr(N ≥ 3) = 1000 − 683.013 − 248.369 − 56.447 = 12.171

Notice that the expected number of observations greater than 2
is greater than 5, so we do not merge this class with another. The chi-square statistic is

Q = (700 − 683.013)²/683.013 + (200 − 248.369)²/248.369 + (100 − 56.447)²/56.447 + (0 − 12.171)²/12.171
  = 0.4225 + 9.4195 + 33.6035 + 12.1710 = 55.62

This has 4 − 1 − 2 = 1 degree of freedom.

36.6. Since there are fewer than 5 expected observations for 4 and 5 claims, the two classes are combined. Summing (Oj − Ej)²/Ej over the remaining classes gives Q = 8. Since the parameters were estimated, there are 5 − 2 − 1 = 2 degrees of freedom. Accept at 0.010, reject at 0.025. (C)

As an additional exercise, you can determine what a and b are for the negative binomial distribution using the fitted values for 0, 1, and 2, and then verify that the fitted values for 3 and 4 are consistent, subject to rounding to the nearest integer.

36.7. Comparing the observed counts nj with the fitted counts Ej in each cell gives the chi-square statistic, which lies between the critical values at 0.05 and 0.025 significance for 5 degrees of freedom. Accept at 0.025 significance but not at 0.05. (D)

36.8. In this exercise the chi-square test is applied to groups with fewer than 5 expected observations. While this should not be done ordinarily, the question mandates that you do it, so follow instructions.

36.9. The expected numbers of observations in the five cells are 184.49, 202.37, 213.98, 271.41, and 127.80; summing (Oj − Ej)²/Ej gives Q = 13.93. Since the distribution was fitted, the number of degrees of freedom is 5 − 2 − 1 = 2. Accept at 0.005. (B)

36.10. Let's calculate the expected number of observations in each loss size category. We shall use the formula Pr(X > x) = S(x). The expected count in the last category is

E5 = 1000S(1) − (E2 + E3 + E4) = 45.1

36.11. The chi-square statistic is computed using the alternative formula; there are 3 degrees of freedom. Accept at 0.05, reject at 0.10. (D)

36.12. Since no parameters are estimated, (B) and (D) are false. The Kolmogorov-Smirnov statistic depends on the data points themselves, so (C) is false.
However, (A) is true.

36.13. The fitted counts are 10 in each of the five cells, so Q = Σ(Oj − 10)²/10. There are 5 − 1 = 4 degrees of freedom. Accept at 0.02, reject at 0.05. (C)

36.14. There will be 3 degrees of freedom.

(A) replaces (3² + 0²)/10 with 3²/20, so Q = 9.8 − 0.45 = 9.35. (No change.)
(B) replaces (0² + 2²)/10 with 2²/20, so Q = 9.8 − 0.2 = 9.6. (No change.)
(C) replaces (2² + 7²)/10 with 9²/20, so Q = 9.8 − 1.25 = 8.55. (No change.)
(D) replaces (7² + 6²)/10 with 1²/20, so Q = 9.8 − 8.45 = 1.35, which is accepted at all significance levels.

We see that the biggest change happens when two intervals, one with an excess and one with a shortage (compared to expected), are combined. (D)

36.15. λ̂ = 0.69.

36.16. Two of the expected counts are 12.89 and 23.11, and Q = 6.89²/12.89 + ⋯ + 5.36²/23.11. There are 4 − 1 = 3 degrees of freedom. Accept at 0.05 significance but not at 0.10. (D)

36.17. S(1) = 0.9, S(2) = 0.7, S(3) = 0.4. The expected numbers of deaths in the four years are 15, 30, 45, and 60.

Q = (21 − 15)²/15 + (27 − 30)²/30 + (39 − 45)²/45 + (63 − 60)²/60 = 3.65

There are 3 degrees of freedom, so the answer is 3.65/3 = 1.22.

36.18. The fitted counts are 36.8, 225.1, and 38.1; summing (Oj − Ej)²/Ej gives the chi-square statistic. There is 3 − 1 − 1 = 1 degree of freedom. Accept at 0.05, not at 0.10. (D)

36.19. The Poisson probabilities are:

    j     pj                    365pj
    0     e^−0.6 = 0.548812     200.32
    1     0.329287              120.19
    2     0.098786              36.06
    3     0.019757              7.21
    4     0.002964              1.08
    5+    0.000394              0.14

We see that to get 5 expected claims (not actual claims), we must combine 3+ (not 4+). This will give us four groups, with the fourth group having expected claims 7.21 + 1.08 + 0.14 = 8.43, and the chi-square statistic is, using the alternative formula (36.2),

Q = 209²/200.32 + 111²/120.19 + 33²/36.06 + 12²/8.43 − 365
  = 218.0561 + 102.5127 + 30.1997 + 17.0819 − 365 = 2.85  (B)

36.20. The expected number of accidents is 1000 times the standard probability. There are 5 degrees of freedom, since there are 6 groups and no parameters are estimated.
Then

Q = 0.9 + 0.6429 + 0.4167 + 0.0081 + 1.225 + 14.4 = 17.59

This is higher than the critical value at 0.005, so the answer is (A).

36.21. The total number of claims is 112 + 180 + 138 = 430. Fitted claims are 430 times the historical probabilities, or 117.992, 151.016, and 160.992. Then the chi-square statistic is

Q = 5.992²/117.992 + 28.984²/151.016 + 22.992²/160.992 = 0.3043 + 5.5628 + 3.2836 = 9.15  (B)

36.22. We calculate the Pareto cdf, F(x) = 1 − (10,000/(x + 10,000))^2.5, at each boundary.

    x_{j−1}    x_j       F(x_j)     np_j = 100(F(x_j) − F(x_{j−1}))    (n_j − np_j)²/np_j
    0          2,000     0.36606    36.606                             0.15654
    2,000      4,000     0.56880    20.274                             0.14699
    4,000      8,000     0.76995    20.115                             0.48247
    8,000      15,000    0.89881    12.886                             0.06086
    15,000     ∞         1          10.119                             0.00141

The chi-square statistic is 0.15654 + 0.14699 + 0.48247 + 0.06086 + 0.00141 = 0.84827. Since the parameters were fixed based on a different data set, no degrees of freedom are lost for parameters, so there are 4 degrees of freedom, one less than the number of groups.

36.23. Using the alternative formula (36.2), Q = 7873²/7788 + ⋯ − 10,000. There are 4 − 1 − 1 = 2 degrees of freedom. Accept at 0.005, reject at 0.010. (B)

36.24. Using the usual formula:

(80 − 100(1 − p))²/(100(1 − p)) + (20 − 100p)²/(100p) = 6.63
(20 − 100p)² = 663p(1 − p)
400 − 4000p + 10,000p² = 663p − 663p²
10,663p² − 4663p + 400 = 0
p = [4663 − √(4663² − 1600(10,663))]/(2(10,663)) = 0.1172  (E)

Alternatively, you could directly set up a chi-square with 1 degree of freedom using one of the cells and the fact that this distribution is binomial, so the mean is 100p and the variance is 100p(1 − p). Using the "1" cell, we subtract the mean, square, and divide by the variance to get

(20 − 100p)²/(100p(1 − p)) = 6.63

which immediately leads to the second line above.

36.25.

Q = (15 − 10)²/10 + (30 − 25)²/25 + (20 − 25)²/25 + (15 − 20)²/20 + (10 − 15)²/15 + (10 − 5)²/5 = 12.4167

Since the model is fully specified, there are 5 degrees of freedom. The answer is therefore 12.4167 − 5 = 7.4167. (B)

36.26.
For a Poisson distribution the maximum likelihood estimator is the sample mean, or

λ̂ = [50(0) + 122(1) + 101(2) + 92(3)]/365 = 600/365 = 1.6438

Then the expected observations are, using the (a, b, 0) properties of the Poisson,

E0 = 365e^−1.6438 = 70.5342
E1 = 1.6438E0 = 115.9441
E2 = (1.6438/2)E1 = 95.2904
E3+ = 365 − (70.5342 + 115.9441 + 95.2904) = 83.2313

Using the alternative formula,

Q = 50²/70.5342 + 122²/115.9441 + 101²/95.2904 + 92²/83.2313 − 365 = 7.56

Since one parameter was fitted, the number of degrees of freedom is 4 − 1 − 1 = 2, and the statistic is between the critical values for 2.5% and 1%. (C)

36.27. With 30 claims, you need a probability of at least 1/6 to have 5 expected claims. Thus we must group 0-500 and 4,876+, giving us 4 groups. Then we have

    Group             Claims    Expected
    (0, 500]          3         8.1
    (500, 2,498]      8         8.4
    (2,498, 4,876]    9         7.8
    (4,876, ∞)        10        5.7

The chi-square statistic is, using formula (36.2),

Q = 3²/8.1 + 8²/8.4 + 9²/7.8 + 10²/5.7 − 30 = 6.66  (A)

36.28. While statement (A) may be true in many cases, such as when using chi-square, it may not be true if other test statistics are used. If the critical value is not affected by using sample data, the probability of accepting the null hypothesis when it is false may not decrease.

Statement (B) was controversial. The textbook, which was the second edition of Loss Models at the time, said "[The Kolmogorov-Smirnov] test should only be used on individual data", and thus this statement was considered false. However, it is possible to find bounds for the Kolmogorov-Smirnov statistic when you have grouped data, and sometimes these bounds may be adequate to accept or reject a hypothesis. In fact, the third edition of Loss Models softened its tone and says (page 448) "This test as presented here should only be used on individual data ..." and in footnote 6 says "It is possible to modify the Kolmogorov-Smirnov test for use with grouped data." I think that the answer for (B) with the current edition of Loss Models would be true.
Because the Anderson-Darling test divides by the variance, F(x)(1 − F(x)), it will put more weight on the tails, where the variance is smallest, making (C) false.

Sample size does not affect the critical value of the chi-square goodness-of-fit test; the critical value depends on the number of cells, not the sample size. Thus (D) is false. (E)

36.29. The expected number of observations in each interval is 300(F(x_j) − F(x_{j−1})), or 10.5, 28.5, 150, 60, and 51 in the five intervals given. The chi-square statistic is

Q = (5 − 10.5)²/10.5 + (42 − 28.5)²/28.5 + (137 − 150)²/150 + (66 − 60)²/60 + (50 − 51)²/51
  = 2.8810 + 6.3947 + 1.1267 + 0.6 + 0.0196 = 11.0220

There are 4 degrees of freedom, one less than the number of intervals. This is between the critical values for 0.05 significance and 0.025 significance. (C)

36.30. We fit λ, then calculate the chi-square statistic.

λ̂ = (113 + 92 + 94 + 121)/(500 + 527 + 535 + 538) = 420/2100 = 0.2

The fitted claim counts are 0.2 times the exposures, or 100, 105.4, 107, and 107.6, so

Q = 13²/100 + 13.4²/105.4 + 13²/107 + 13.4²/107.6 = 6.64

There are 4 − 1 = 3 degrees of freedom. One degree of freedom is lost by fitting the Poisson, but no other degree of freedom is lost, as discussed in Section 36.4. Accept at 0.05, reject at 0.10. (D)

36.31. The total number of covered individuals is 122 + 134 + 144 = 400 and the total number of claims is 155 + 215 + 230 = 600. The Poisson parameter is estimated as 600/400 = 1.5 (using method of moments or maximum likelihood). The fitted claims are 1.5 times the number of covered individuals, or 183, 201, and 216 for the three years. The chi-square statistic is

Q = (155 − 183)²/183 + (215 − 201)²/201 + (230 − 216)²/216 = 4.2842 + 0.9751 + 0.9074 = 6.1667

Because a Poisson was fitted, the alternative formula (36.2) may also be used:

Q = 155²/183 + 215²/201 + 230²/216 − 600 = 131.2842 + 229.9751 + 244.9074 − 600 = 6.1667

There are 2 degrees of freedom: no degree of freedom is lost for the years since each year is independent, but 1 degree of freedom is lost because the mean was estimated. 5.991 < 6.1667 < 7.378, so the answer is (D).
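Poisson-fit chi-square computations like those in solutions 36.26 and 36.30-36.31 are easy to check numerically. Here is a minimal sketch (not part of the manual) using the data of solution 36.26, grouping claims of 3 or more:

```python
from math import exp, factorial

# Fit a Poisson by maximum likelihood (the sample mean) to the
# claim-count data of solution 36.26 and compute Pearson's chi-square.
observed = {0: 50, 1: 122, 2: 101, 3: 92}    # days with k claims
n = sum(observed.values())                    # 365 days
lam = sum(k * d for k, d in observed.items()) / n

# Expected day counts for 0, 1, 2 claims; lump the rest into "3 or more"
expected = [n * exp(-lam) * lam ** k / factorial(k) for k in range(3)]
expected.append(n - sum(expected))

obs = [observed[0], observed[1], observed[2], observed[3]]
q = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
print(round(q, 2))  # 7.56, between the 2.5% and 1% critical values for 2 df
```

The same pattern applies to solution 36.30: there the expected counts are simply 0.2 times each year's exposures.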
36.32. Since this is a case of separate years, not intervals of data, Section 36.4 applies. The variance, not the expected value, is used in the denominator. The mean of the fitted distribution is rβ = 0.09 and the variance is rβ(1 + β) = (0.09)(1.09) = 0.0981. Therefore, the means and variances for each year are

    Year    Mean                  Variance
    2005    1000(0.09) = 90       1000(0.0981) = 98.1
    2006    1025(0.09) = 92.25    1025(0.0981) = 100.5525
    2007    975(0.09) = 87.75     975(0.0981) = 95.6475
    2008    1000(0.09) = 90       1000(0.0981) = 98.1

The chi-square statistic is

Q = (80 − 90)²/98.1 + (90 − 92.25)²/100.5525 + (95 − 87.75)²/95.6475 + (105 − 90)²/98.1 = 3.9129

There are 4 degrees of freedom. No degree of freedom is lost; if you don't understand why, read Section 36.4. The alternative formula (36.2) cannot be used for this exercise.

QUIZ SOLUTIONS FOR LESSON 36

Quiz Solutions

36-1. The fitted values are:

F(1000) = 1 − e^−1 = 0.632121
F(4000) = 1 − e^−4 = 0.981684
E1 = 524(0.632121) = 331.2
E2 = 524(0.981684 − 0.632121) = 183.2
E3 = 524(1 − 0.981684) = 9.6

The chi-square statistic, using the alternative formula, is

Q = 322²/331.2 + 185²/183.2 + 17²/9.6 − 524 = 5.98

Lesson 37

Likelihood Ratio Algorithm, Schwarz Bayesian Criterion

Reading: Loss Models Third Edition 16.4.4-16.5

They've been asking close to one question per exam on this easy material.

We shall discuss methods for selecting a model. There are judgment-based and score-based methods. Examples of judgment-based methods are:

1. Graphs, like p-p plots.
2. Success of similar models in similar situations. If Pareto worked well in the past, you may want to continue using it.
3. Forced models. If only one claim can occur per year, claim counts are Bernoulli.
4. Judgment about which score-based method to use; they can lead to different results.

Score-based methods are:

1. Lowest Kolmogorov-Smirnov statistic
2. Lowest Anderson-Darling statistic
3. Lowest chi-square statistic
4. Highest p-value for chi-square test
5.
Highest value of the likelihood function at the maximum

The first three score-based methods result in higher scores for a model with more parameters; there is no penalty for adding additional parameters. But this is no good; there should be a cost for adding complexity to the model. The p-value of the chi-square test charges for extra parameters by reducing the number of degrees of freedom. For the fifth approach, two methods for adjusting the loglikelihood are provided, and we will now discuss them.

37.1 Likelihood Ratio algorithm

If parameters are added to a model, the new model will have a loglikelihood at least as great, since the former model is a special case of the new model. For example, a gamma model will have a loglikelihood at least as great as an exponential model, since at the very worst we can set α = 1 in the gamma model and obtain an exponential model. To test whether it is worthwhile adding new parameters, we use the fact that twice the increase in the loglikelihood is chi-square with degrees of freedom equal to the number of additional parameters. Since testing the difference of loglikelihoods is equivalent to testing the ratio of likelihoods, this is called the likelihood ratio test.

Let's state this more precisely:

A free parameter is one that is not specified, and that is therefore maximized using maximum likelihood. The number of degrees of freedom for the likelihood ratio test is the number of free parameters in the alternative model, the model of the alternative hypothesis, minus the number of free parameters in the base model, the model of the null hypothesis.

The likelihood ratio test accepts the alternative model if its loglikelihood exceeds the loglikelihood of the base model by at least one-half of the appropriate chi-square percentile (1 minus the significance level of the test) at the number of degrees of freedom for the test.
In other words, the alternative model is accepted if 2 ln(L1/L0) > c, where Pr(X > c) = α for X a chi-square random variable with the number of degrees of freedom for the test.

Thus if an exponential model is the base model and a gamma model is the alternative, the exponential model has 1 free parameter, since an exponential is a gamma distribution with α = 1, so only θ is free. The gamma model has 2 free parameters: α and θ are both unspecified. To accept the gamma model, we require that its loglikelihood be greater than the loglikelihood of the exponential model by at least one-half the chi-square percentile with 1 degree of freedom.

This test may also be used when one model is a limiting case of another. For example, a Poisson is a limiting case of a negative binomial as β → 0. Even when one model is not a subset of another, so that the test is not mathematically valid, we can use the likelihood ratio test as a plausibility argument; the model with more parameters should have a loglikelihood increase at least as high as indicated by the chi-square distribution.

The way the test works is that you start off by selecting, for every number of parameters, the model with the highest loglikelihood. Let's suppose that we are using 5% as the significance level. In order to prefer the best 2-parameter model over the best 1-parameter model, twice the excess of the loglikelihood of the 2-parameter model over the loglikelihood of the 1-parameter model must be at least as high as the 95th percentile of chi-square with 1 degree of freedom, or 3.8415. If the best 2-parameter model isn't this good, you look at the best 3-parameter model, and twice the excess of its loglikelihood over the loglikelihood of the 1-parameter model must be at least as high as the 95th percentile of chi-square with 2 degrees of freedom, or 5.9915. And so on.
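This stepwise procedure can be sketched in code. The following is a minimal illustration (the loglikelihood values at the bottom are hypothetical inputs, and the chi-square percentiles are hardcoded):

```python
# Stepwise likelihood ratio model selection (a sketch, not the manual's code).
# loglik: {number of parameters r: best loglikelihood among r-parameter models}
# crit:   {degrees of freedom: chi-square critical value}

def likelihood_ratio_select(loglik, crit):
    sizes = sorted(loglik)
    current = sizes[0]              # start from the simplest model
    changed = True
    while changed:
        changed = False
        for r in sizes:
            if r <= current:
                continue
            # Accept the richer model if twice the loglikelihood gain
            # meets the critical value for r - current degrees of freedom.
            if 2 * (loglik[r] - loglik[current]) >= crit[r - current]:
                current = r
                changed = True
                break               # restart comparisons from the new base
    return current

crit_5pct = {1: 3.8415, 2: 5.9915, 3: 7.8147}   # 95th percentiles
crit_10pct = {1: 2.706, 2: 4.605, 3: 6.251}     # 90th percentiles
loglik = {1: -321.32, 2: -319.93, 3: -319.12, 4: -318.12}
print(likelihood_ratio_select(loglik, crit_10pct))  # 2
print(likelihood_ratio_select(loglik, crit_5pct))   # 1
```

Note how the selected model depends on the significance level: at 10% the 2-parameter model clears its threshold, while at 5% no model improves enough on the 1-parameter model.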
If the 2-parameter model is preferred, you start all over again, comparing 3-, 4-, and higher-parameter models to the 2-parameter model using the same criterion: to prefer a model, twice the excess of the loglikelihood over the previously accepted model must be at least as high as the 95th percentile of a chi-square with degrees of freedom equal to the difference in the number of parameters between the models.

Example 37A You have derived parameters of various models for your observed data, using maximum likelihood. The following table summarizes the loglikelihoods of those models with maximal loglikelihood for any given number of parameters:

    Number of Parameters    Maximal Loglikelihood
    1                       −321.32
    2                       −319.93
    3                       −319.12
    4                       −318.12

You select the model by using the likelihood ratio algorithm at 10% significance.

1. Which model do you select?
2. How would the answer change if the best 2-parameter model had a loglikelihood of −319.98?

Answer: 1. With 10% significance, the critical values of chi-square at 1, 2, and 3 degrees of freedom are 2.71, 4.61, and 6.25 respectively. Dividing by 2, we get 1.35, 2.30, and 3.13, which are the thresholds required to add additional parameters. The 2-parameter model increases the loglikelihood by 1.39, thus meeting
Thus| the 4-parameter model is selected o Quiz37-1 You are given the following results for fitting models to a data set, Model Maximal loglikelihood Burr “738 Exponential 795 Inverse Pareto “727 Paralogistic 772 For which ofthe following levels of confidence is the paralogistic model preferred by the likelihood ratio test? (More than one choice may be correct.) (A) 90% 8) 95% (©) 97.5% (0) 99% E1 99.5% ‘The following example shows how the likelihood ratio test would be used to compare two models using the same distribution, but with different numbers of specified parameters. EXAMPLE 37B You are given a sample with five observations: 05 5 10 2 30 Vinx ‘uate considering wo models forthe underying dstrbuton 1 Wei 5,0=25 I, Weibull fewith maximum iketIhood 6158 ull with > ‘The loglikelihood function of the Weibull fit at the optimal value of the parameters is ~17.8625. Calculate the likelihood ratio statistic, and using a chi-square table, determine the significance levels at which the first model is accepted. ‘Answer: The probability density function for'a Weibull is fl: ‘The loglikelthood function for the 5 points is 1(¢,0)=nine+(e—-D Inxs Evaluated at (1.5,25}: zat 105,25) an In1.5+0.5(9.6158). —9(1.5)In25=~19.6817 ‘/4stuy Manat 3thedon (Copyign cz0 Aste 720 37, LIKELIHOOD RATIO ALGORITHM, SCHWARZ. BAYESIAN CRITERION ‘The statistic Is 2'~17.8625 + 19.6817) = 3.638. There are two degrees of freedom, since Model I has no free parameters and and Model II has two free parameters. Model I is accepted at all levels of significance in the tables, since even at 10% the critical value is 4.605 which is greater than 3.638. Since a chi-square distribution ‘with 2 degrees of freedom is an exponential with mean 2 (although you're not responsible for that fact), we can explicitly calculate the p-value: Pr(X> 3.638) = e982 1622 o EXAMPLE 37 You are given a sample with five observations: 05 5 10 2 30 You are considering two models for the underlying distribution: 1. 
Exponential fit with maximum likelihood, TL Weibull fitwith maximum likelihood, ‘The loglikelihood function at the optimal value of the parameters is -17.8625.. Calculate the likelihood ratio statistic, and using a chi-square table, determine the significance levels at ‘which the first model is accepted. ANswer: An exponential isa special case of a Weibull with + = 1, so Model I constrains the @ parameter, while ‘Model II constrains both parameters. The exponential maximum likelihood fit sets @ = %= 13.1, The density and loglikelinood for an exponential are ene at almost any degree of significance. a ‘The likelihood ratio test can be used to decide whether to combine two data sets into one model or to model, them separately. Ifthe data sets are combined, then the number of free parameters of the overall model is the number of parameters in the single model. If they are not combined, the number of free parameters of the overall model is the sum of the number of parameters of the two models. Twice the logarithm ofthe likelihood. ratio should exceed the chi-square distribution with degrees of freedom equal to the difference in the number of fiee paranievers i the overall 1iodels. ExampLe 37D For a group covered by insurance, there are 200 members and 20 claims in 2001. There are 250 ‘members and 30 claims in 2002. ‘Two alternative models are proposed: 1. APoisson distribution with parameter A; to model 2001 claim frequency and a Poisson dist parameter 42 to model 2002 claim frequency. 2. A Poisson distribution with parameter A to model 2001 claim frequency and a Poisson distribution with parameter 1.12 to model 2002 claim frequency. (jai Manas ation Copyign C20 Asie 37:1. LIKELIHOOD RATIO ALGORITHM 721 Calculate the likelihood ratio statistic to decide whether to prefer the frst model to the second one. 
Answer: We must calculate the maximum likelihood of each model.

Suppose that in a given year there are n members having x_1, ..., x_n claims, with the total number of claims Σ x_i = m. The density function for a Poisson with parameter λ is f(x) = e^{−λ}λ^x/x!, so the likelihood and loglikelihood functions are

L(λ) = ∏ e^{−λ}λ^{x_i}/x_i! = e^{−nλ}λ^m / ∏ x_i!
l(λ) = −nλ + m ln λ − ln ∏ x_i!

In the first model, we know that the MLE of each Poisson distribution is the sample mean, or m/n. The loglikelihood for each year j = 1, 2 then becomes

l(λ̂_j) = −m_j + m_j ln(m_j/n_j) − ln ∏ x_i!

with n_1 = 200, m_1 = 20, n_2 = 250, m_2 = 30. Adding these together for the two years, we get

l(λ̂_1, λ̂_2) = −20 + 20 ln 0.1 − 30 + 30 ln 0.12 − ln ∏ x_i! = −159.66 − ln ∏ x_i!

Let's calculate the MLE of the second model. The likelihoods in the first year are e^{−λ}λ^x/x!, and in the second year they are e^{−1.1λ}(1.1λ)^x/x!. Temporarily ignoring the constants x_i!, we have

L(λ) = e^{−200λ}λ^{20} e^{−275λ}(1.1λ)^{30}
l(λ) = −475λ + 50 ln λ + 30 ln 1.1
dl/dλ = −475 + 50/λ = 0
λ̂ = 50/475

At 50/475, the loglikelihood is

l(λ̂) = −50 + 50 ln(50/475) + 30 ln 1.1 − ln ∏ x_i! = −159.71 − ln ∏ x_i!

The likelihood ratio statistic is twice the difference. Notice that the term ln ∏ x_i! is the same in both models, since the product in each case is over the number of claims of all members in both years, so the same terms are being multiplied. Therefore this term drops out when subtracting the second model's loglikelihood from the first model's. Therefore, twice the difference is 2(−159.66 + 159.71) = 0.1.
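The arithmetic in this answer is easy to verify numerically. A minimal sketch (the ln ∏ x_i! constants cancel between the models and are omitted):

```python
from math import log

n1, m1 = 200, 20    # 2001: members, claims
n2, m2 = 250, 30    # 2002: members, claims

# Model I: separate Poisson means m_j / n_j for each year
ll1 = (-m1 + m1 * log(m1 / n1)) + (-m2 + m2 * log(m2 / n2))

# Model II: lambda in 2001 and 1.1*lambda in 2002; the MLE is 50/475
lam = (m1 + m2) / (n1 + 1.1 * n2)
ll2 = -(n1 + 1.1 * n2) * lam + (m1 + m2) * log(lam) + m2 * log(1.1)

stat = 2 * (ll1 - ll2)
print(round(stat, 2))  # 0.09
```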
The Schwarz Bayesian Criterion (SBC) attempts to compensate for this by applying a penalty of (r/2) ln n, where r is the number of parameters, to each model before comparing them. This means that each additional parameter must increase the loglikelihood by (ln n)/2 to justify its inclusion.

Example 37E: As in Example 37A, you have derived maximum loglikelihoods for several models, and they are the ones presented in the table in that example. The data consists of 10 points. You select the model by using the Schwarz Bayesian Criterion. Which model do you select?

Answer: (ln 10)/2 = 1.15, so you will charge a penalty of 1.15 for each parameter. This means that the penalized values of the four models are:

Number of parameters    Penalized value
1                       −321.32 − 1.15 = −322.47
2                       −319.93 − 2.30 = −322.23
3                       −319.12 − 3.45 = −322.57
4                       −318.12 − 4.61 = −322.73

(On the 4-parameter line, 4.61 rather than 4.60 resulted when multiplying a more precise value of (ln 10)/2 by 4 and rounding to two places.) We see in this table that the highest value, after the penalty, occurs for the 2-parameter model. Therefore the 2-parameter model is selected.

Exercises

37.1. Which of the following is true?

(A) A high p-value tends to imply rejection of the null hypothesis.
(B) The significance level of a hypothesis test is the probability of making a type I error given that the null hypothesis is false.
(C) If the test statistic lies within the rejection region, the alternative hypothesis is accepted.
(D) If T is the test statistic from a likelihood ratio test, the test rejects the null hypothesis if T > c, where T has a chi-square distribution with the number of degrees of freedom equal to the number of free parameters in the model, and c is such that Pr(T > c) = α, where α is the significance level.
(E) The critical values for a hypothesis test do not depend on the significance level of the test.
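The penalized comparison in Example 37E above can be sketched in a few lines of Python (the helper name `sbc_pick` is mine; the loglikelihoods are the ones in the example):

```python
import math

def sbc_pick(loglik_by_r, n):
    """Penalize each model's loglikelihood by (r/2) ln n, where r is the
    number of parameters, and return the best r with all penalized values."""
    penalty = math.log(n) / 2
    scores = {r: ll - r * penalty for r, ll in loglik_by_r.items()}
    return max(scores, key=scores.get), scores

# Maximum loglikelihoods by number of parameters, for n = 10 data points
models = {1: -321.32, 2: -319.93, 3: -319.12, 4: -318.12}
best, scores = sbc_pick(models, n=10)
print(best, round(scores[best], 2))  # the 2-parameter model wins
```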
Copyright e201 ASM EXERCISES FOR LESSON 37, 723 37.2, Claim sizes are fitted to several models, with the following results Model Negative Loglikelihood Exponential 613.7 Pareto 6133 Loglogistic 613.1 Weibull 6124 Burr 6109) Generalized Pareto 6106 Using the likelihood ratio algorithm at 5% significance, which model is preferred? Use the following information for questions 37.3 and 37.4: You fit various models to 20 loss observations using maximum likelihood. The fits maximizing the likeli hood for given number of parameters have the following loglikelihoods: ‘Number of parameters | Loglikelihood 1 142.32 2 —140.75, 3 139.40 4 138.30 5 =137.40 37.3. Using the likelihood ratio algorithm at 95% confidence, how many parameters are inthe selected model? 37.4. Using the Schwarz Bayesian Criterion, how many parameters are in the selected model? 87.8. 196 claimssizes are fitted to several models, with the following results: Model [Negative Loglikelihood Exponential 4253 Lognormat 423.9 Burr 4216 Weibull 4210 Inverse Gaussian 420.0 Using the Sctwarz Bayesian Criterion, which model is selected? 37.6. 95 claim sizes are fitted to several models, with the following results: Model Negative Loglikelihood Exponential 487.0 Inverse Exponential 4875 Gamma 487.0 Inverse gamma 484.1 Burr 482.0 Using the Schwarz Bayesian Criterion, which model is selected? /4 Suey Manust3theion xeries continue on the next page Copyign O20 Asse 724 37, LIKELIHOOD RATIO ALGORITHM, SCHWARZ BAYESIAN CRITERION 37.7. Your company sells auto collision coverage with a choice of two deductibles: a 500 deductible and a 1000 deductible. Last year, your experience was as follows: Deductible | Number of Claims | Average Claim Size (after deductible) 500 55 700 1000 45 1100 It is suspected that policyholders with the higher deductible pad their claims so that they are above the deductible, resulting in higher average loss size. 
To investigate this, you assume that losses on the coverage have an exponential distribution. You test the hypothesis that the mean of this distribution is the same for each coverage using the likelihood ratio test. Which of the following statements is truct (A) The hypothesis is rejected at 0.5% significance. (B) The hypothesis is accepted at 0.5% significance but rejected at 1% significance. (©) The hypothesis is accepted at 1% significance but rejected at 2.5% significance. (D) The hypothesis is accepted at 2.5% significance but rejected at 5% significance. (©) The hypothesis is accepted at 5% significance. 87.8. For an insurance coverage, you are given two populations. Claim sizes for both are assumed to follow a lognormal distribution, possibly with different parameters. Parameters for each population, and for both com- bined, are estimzted using maximum likelihood, with the following results: Population Estimated Parameters _ Negative Loglikelhood First population only 7235 ‘Second population only 7227 Bot populations combined 14498 ‘The null hypothesis is that parameters are the same for both populations. Which ofthe following statements s truet (A) ‘The null hypothesis is rejected at 1% significance. ®)_Thenullhypothesisis accepted at 1% significance, but rejected at 2.5% significance. (©) Thenullhypothesisis accepted at 2.5% significance, but rejected at 5% significance. (D) The null hypothesis is accepted at 5% significance, but rejected at 10% significance. (©) Thenullhypothesisis accepted at 10% significance. 37.9, ‘The underlying distribution for a random variable X is lognormal with @ = 2.5. The parameter jis estimated from a sample of 100 observations using maximum likelihood. The resulting estimate is l=6. et Iq) be the loglikelihood of the 100 observations based on a lognormal distribution with parameters ando=25. Determine 1(4)— 1(6). 
37.10, The underlying distribution for a random variable X is lognormal with @ = 2.5, The parameter jis estimated asf based on a sample of 100 observations using maximum likelihood. ‘The true value of is 4. et Iq) be the logikelinood of the 100 observations based on a lognormal distribution with parameters and =25, Determine Es(U(4) 1(Q) (/4S Manat 128 eon -eries conte onthe net page hr e201 St EXERCISES FOP LESSON 37 725 87.11, [4-801:20] During a one-year period, the number of accidents per day was distributed as follows: ‘Number of Accidents Days 209 Ty 33 7 3 2 For these data, the maximum likelihood estimate for the Poisson distribution is 4 = 0.60, and forthe negative binomial distribution, itis #=2.9 and f 0.21. The Poisson has a negative loglikelihood value of 385.9, and the negative binomial has a negative loglikeihood value of 382.4. Determine the likelihood ratio test statistic, treating the Poisson distribution as the nullhypothess w @1 3 os @7 37.12. You ft atwo:parameter Pareto toa sample of 100 claim amounts x... igo and use the likelihood ratio test to test the hypothesis Hy: 8 =9and against 1:0 =9anéa¢3 You are given that 3°)? In(9 +x, Determine the result ofthe test. (A) Reject at the 0.005 significance level. (B) Reject a the 0.010 significance level, but not at the 0.006 level, (© Reject at the 025 significance level, but not at the 0.010 level (D) _Rejectat the 0.050 significance level, but not atthe 0.025 level ©) Donot wject atthe 0.050 significance level /4stady Mana —3thedon ‘Bercises continue onthe nxt page. (CopyigtC201 ASM 726 37, LIKELIHOOD RATIO ALGORITHM, SCHWARZ BAYESIAN CRITERION 37.13, You fit Weibull distribution toa sample of 20 claim amounts. You test Hy: =2 against Ay t #2 using the likelncod ratio statist. You are given @ Sinx,=73.6177 Exp 07.266 (ii) Atthe maximum likelihood estimate, the loglikelihood is -98.443 Determine the result of the test. 
(A) _ Reject atthe 0.008 significance level (B) _Rejectat the 0.010 significance level, but not at the 0.005 level, (©) Rejectat the 0.025 significance level, but not at the 0.010 level (D) _ Reject at the 0.050 significance level, but not atthe 0.025 level (BE) Donot wject atthe 0.050 significance level 37.14, [4-F03:28] You fit a Pareto distribution to a sample of 200 claim amounts and use the likelihood ratio testo test the hypothesis that a = 1.5 and @=7.8. You are given: (®) The maximum likelihood estimates are @= 1.4 and =7.6 Gi) The nataral logarithm of the likelihood function evaluated at the maximum likeihood estimates is ~81792 (ai). Sint +7.8)=607.68 Determine the result ofthe test. (A) Reject ar the 0.005 significance level (B) Reject ar the 0.010 significance level, but not at the 0.05 level (©) Reject atthe 0.025 significance level, but not atthe 0.010 level, (D) Reject athe 0.050 significance level, but not at the 0.025 level (©) Donot ject at the 0.050 significance level 37.15. [4-FO4:22| Ifthe proposed model is appropriate, which of the following tends to zerm as the sample size {goes to infinity? (A) Kolmogorov-Smirmov test statistic (B) Anderson-Darling test statistic (©) Chiesquare goodness-of-fit rst statistic (D) _Schware Bayesian adjustment ©) None of), (),(C) or (D) Additional released exam questions: C-F0S5:25, C-F06:22, C-S07:14 fasta Manastisi eon Coppi ozo Aste EXERCISE SOLUTIONS FOR LESSON 37 727 Solutions 87.1. (A) would be correct if*high’ is replaced with "low, (B) would be correct if “false” is replaced with “true”, (D) would be correct if the number of degrees of freedom is the number of free parameters of the model minus the number of fee parameters of the null hypothesis. (E) would be correct if the words “donot were removed, ‘The answers (©. 37.2, Pareto, Weibull, and loglogistic are 2-parameter distributions. 
Weibull is the best 2-parameter distribu- tion, the one with the lowest negative loglikelihood of the three, but 2613.7 ~612.4)=2.6 < 3.84, the 95% point with I degree of freedom, so itis not significantly better than exponential. Burr and gene-alized Pareto are 3- parameter distributions. The generalized Pareto is the better 3-parameter distribution, the one with the lowest. negative loglikelihood of the two, and 2(613.7— 610.6) = 6.2 > 5.99, the 95% point with 2 degrees of freedom, so itis better than the exponential. [ Generalized Pareto 387.3. At 95% confidence, the critical value of chi-square is 3.84 at 1 degree of freedom, 5.99 at 2 degrees of freedom, and 7.81 at 3 degrees of freedom. Dividing each of these by 2, we compare the likelihood ratio statistic against 1.92, 3.00, and 3.91 respectively ‘+ Comparing the 2-parameter model to the 1-parameter model, —140.75 + 142.32. 2-parameter model ‘+ Comparing the 3-parameter model to the 1-parameter model, 139.40-+ 142.32 = 2.92<3.00, so reject the 3-parameter model, 5T< 1.92, so reject the ‘+ Comparing the 4-parameter model to the 1-parameter model, ~138,0+ 142.32 =4.02> 3.91, so select the 4-parameter model. ‘+ Comparing the 5-parameter model to the 4-parameter model, 137.40 + 138,30=0.90< 1.92, so reject the 5-parameter model ‘The 4-parameter model is selected, 374, (In20)/2= 1.50, so the adjusted loglikelihoods are: T-parameter model | 4232-15: 2-parameter model | ~140.75—3.0: 3-parameter model | ~139.40—4.5= 4-parameter model | —138.30-6.0: S-parameter model | ~137.40-755 143.90 144.30 144,90 ‘The 2-parameter model is selected. 37.5. Negative loglikelihoods are given, so we are minimizing and must add the penalty function, which is a ‘multiple of (In186)/2 = 2.64. 
The only 1-parameter distribution is the exponential, for waich 425.3 + 2.64 427.94, The 2-parameter distribution with lowest negative loglikelihood is the inverse Gaussian (whose negative loglikelihood is even lower than the 3-parameter Burt); 420.0 + 2(2.64)= 425.28, ‘The inverse Gaussian is selected. 87.6. The penalties are multiples of (In95)/2 = 2.3. The best 1-parameter distribution is the exponential; 487.0+2.3= 409.3. The best 2-parameter distribution is the inverse gamma; 484.1+-4.6= 4887. The3-parameter Burr yields 482.0+ 6.8 = 488.8. The inverse gammais selected. 37.7. For an exponential with mean @, the conditional density function of claim sizes x, after applying the deductible, given that the loss is over a deductible d is flxitd) _ pertsave 1-F(d)~ eat /4 Sy Manual i3teaon (Coprigh 02011 Ashe 728 37, LIKELIHOOD RATIO ALGORITHM, SCHWARZ BAYESIAN CRITERION This is, of course, the memoryless property of the exponential distribution: f(x: +d | x;>d)= f(x. Ifthere are n claims of sizes x,....%n, multiplying this results in ikelthood and loglikelihood functions of, ‘The MLE for an exponential isthe sample mean, as discussed in shortcut #1 on page 537. Plugging in 0 =. the formula for (8), we get: 1(3)=-ming—n ‘Thus, ifthe models are estimated separately, the MLE for the 500 deductible is 700 and the MLE for the 1000 ‘deductible is 1100. The loglikelinood function forall the data would be the sum of the loglikelihoods of each deductible, or -551n700 ~ 55 ~451n 1100 ~ 45. If the combined data is used to create a single model, since the density function is the same for both de- ‘ductibles (due to the lack of memory), the likelihood function is the same too. The mean forthe combined data is 55(700) +45(1100) _ 5 45555 ‘0 the loglikelihood is -1001n880— 100. ‘The loglikelihood of the seperate models minus the loglikelihood of the combined model is 351700 451n 1100+ 1001n880=2.545. 
Doubling this, we obtain as the likelihood ratio statistic 2(2.545) = 5.090. The combined model has one parameter (the parameter of the exponential distribution), while the separate models have two parameters (each one has one parameter of the exponential distribution), so there is 1 degree of freedom in the statistic. Looking up the chi-square table, we see that 5.02 < 5.09 < 6.64. We therefore accept at 1% significance but reject at 2.5%. (C)

37.8. The alternative hypothesis has two free parameters additional to the null hypothesis. Using the likelihood ratio test, 2(1449.8 − (723.5 + 722.7)) = 7.2 with 2 degrees of freedom. Hence accept at 2.5%, reject at 5%. (C)

37.9. The likelihood and loglikelihood are (note that 2(2.5²) = 12.5)

L(μ) = ∏ e^(−(ln xᵢ − μ)²/12.5) / (xᵢ(2.5)√(2π))
l(μ) = −Σ(ln xᵢ − μ)²/12.5 − Σ ln xᵢ − 100 ln 2.5 − 50 ln 2π

In the difference l(4) − l(6), only the first term survives:

l(4) − l(6) = (Σ(ln xᵢ − 6)² − Σ(ln xᵢ − 4)²)/12.5
            = −100(6 − 4)²/12.5 = −32

In the second line, the cross term 2Σ(ln xᵢ − 6)(6 − 4) = 0, since 6 is the maximum likelihood estimator: we know that the maximum likelihood estimator of μ for a lognormal is the average of the logs of the data, so (Σ ln xᵢ)/n = 6. The final answer is negative, since the loglikelihood is maximized at 6.
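The arithmetic in these solutions, doubling a loglikelihood gap and scanning a chi-square table, can be sketched as follows for exercise 37.8 (the critical values are the usual 2-degree-of-freedom table entries; the variable names are mine):

```python
# Chi-square critical values with 2 degrees of freedom, as in the
# table used throughout this lesson.
CRIT_2DF = {0.10: 4.605, 0.05: 5.991, 0.025: 7.378, 0.01: 9.210}

neg_ll_separate = 723.5 + 722.7   # two lognormal fits, 4 parameters total
neg_ll_combined = 1449.8          # one fit to the pooled data, 2 parameters

lr_stat = 2 * (neg_ll_combined - neg_ll_separate)
rejected_at = sorted(a for a, c in CRIT_2DF.items() if lr_stat > c)
print(round(lr_stat, 1), rejected_at)  # reject at 5% but not at 2.5%
```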
Therefore, S(lnx; ~ 2} is the biased sample variance ofthe Inx:S, of n 1 times the unbiased sample variance ofthe normal distribution ts expected value is(n 1a, The difference is therefore ~0?, and B14) ~ 1a = 52 -[0s 37.11, The statistic is twice the difference of loglikelihoods, oF 2(385.9 -382.4)=[7 87.12, The density and loglikelihood functions for a Pareto are oy 1(a,0)=nIna+naind ~(a+1) 7 In(6-+x1) flxiaa)= At(a,8)=(3,9), this is 1(3,9)= 1001n3 + 100(3)In3 — 4(262) For 0 =8, the optimal value of « using maximum likelihood is, based on the formula in Subsection 30.4.2, "78.971 42.275 K-=1001n9 Yn ¢5))=219.7205-202 100 a a= = 2.365322 ‘The loglikelthood at (2.365322, 9) is (2.365322) = 100In2.365322 + 100(2.365322)In9 ~ 3.365322(262) =~275.909 This fit constrained a to be the maximum, so the difference in the number of free parameters between the ‘two models is 1: 2 free parameters in H, 1 free parameter in Fy. The likelihood ratio statistic is 2(-275.909 + 278,971) = 6.124 which at one degree of freedom is between the 97.5th percentile (5.024 and the 99th per- centile (6,635), so the answer is (©), ‘37.13. We need find the optimal 0 for a Weibull with + = 2, Use formula (30.1) with > and d;=0 forall i: , No censored data, = ‘The density and loglikelihood functions are Slxit.8) (4st anus I3teon Copnign ezai sie 730 37, LIKELIHOOD RATIO ALGORITHM, SCHWARZ BAYESIAN CRITERION nine +(e) Ena For t =2 and the corresponding optimal value of 8, 16,2) =~ 20(2)in 66.0553 =201n2+73.6177 ~20~401n66,0553 = —100.139 istic is 2(-98.443 + 100.139) strains both variables (both are optimized), so the difference is 1 degree of freedom. The c square at 1 degree of freedom are 2.706 at 90% and 3.841 at 95%, so the answer is (E).. 87.14. 
The Pareto density is

f(x) = αθ^α / (x + θ)^(α+1)

The likelihood of 200 claim amounts with α = 1.5 and θ = 7.8 is

L(1.5, 7.8) = 1.5²⁰⁰ (7.8^1.5)²⁰⁰ / ∏(xᵢ + 7.8)^2.5

The loglikelihood is

l(1.5, 7.8) = 200 ln 1.5 + 300 ln 7.8 − 2.5(607.64) = 81.0930 + 616.2371 − 1519.1000 = −821.77

Twice the difference between −821.77 and −817.92 is 7.70. You are restricting two parameters, so the critical value is the chi-square critical value at 2 degrees of freedom. 7.70 is greater than the 2.5% significance level critical value (7.378) but less than the 1.0% significance level critical value (9.210). (C)

37.15. Only the Kolmogorov-Smirnov critical value gets smaller based on the size of the sample; the critical value is a constant over the square root of the size of the sample. For an appropriate model, the statistic should be less than the critical value, and thus must go to zero as the sample size goes to infinity. The Anderson-Darling and chi-square statistics are independent of the size of the sample, and the Schwarz Bayesian adjustment increases with the size of the sample (logarithmically). (A)

Quiz Solutions

37-1. The Burr model has 3 parameters and the exponential model has 1 parameter. The inverse Pareto model has 2 parameters and is inferior to the 2-parameter paralogistic model since its likelihood is lower, so it is rejected.

In order to prefer the paralogistic to the exponential, 2(−77.2 + 79.5) = 4.6 must be at least as high as the chi-square critical value with 1 degree of freedom. It is greater at 90% and 95% but not at higher levels of confidence.

In order to prefer the paralogistic to the Burr, 2(−75.8 + 77.2) = 2.8 must be less than the chi-square critical value with 1 degree of freedom. It is less than 3.841, at 95% confidence, but not less than 2.706, at 90% confidence.

We conclude that the paralogistic model is preferred only at 95% confidence. (B)

Although not requested, you can show that the exponential is preferred at 99% and 99.5%, while the Burr is preferred at 90% and 97.5%.

Part IV

Credibility
Credibility theory has been on the CAS syllabus longer than I've been a fellow. The SOA started giving credit for taking credibility on the CAS syllabus in the 1990s, and it became a popular option. In 2000, when the two organizations decided to jointly sponsor preliminary exams, credibility became a requirement for SOA students. As a result of this topic being a CAS topic before 2000, loads of released exam questions are available, and these questions are still useful and relevant for most of the current syllabus, which has hardly changed.

There are now three options for texts which may be used for studying credibility. To help you choose one (or maybe more), here is my review:

Loss Models, by Klugman et al.

Description: This is the same textbook as the rest of Exam C/Exam 4's syllabus, and the style is similar. The book is organized as a college textbook, and as such has an academic orientation (which I like). It provides illustrative examples and old exam questions heavily edited into textbook style, but also theoretical exercises and full derivations of all major concepts.

Dean (in the References on page 41 of the Dean study note) says that this book "Provides a more rigorous treatment of credibility premium (apparently comparing it to Herzog) and includes most the (sic) material covered in this study note". This is a bit boastful, since the textbook provides more material than the Dean study note; it features an extension to the Bühlmann-Straub model in which the process variance is not σ²/m, and it provides an empirical Bayes estimator when you wish to use a manual premium. (We will discuss these differences in coverage in the lessons.) Apparently, Dean is referring to the fact that the Loss Models textbook does not offer opinions on the applicability of credibility methods to real life. Dean definitely provides better coverage of the practical side.

Advantages: If you bought the textbook for the rest of the syllabus anyway, there is no extra expense.
You are never asked to memorize a formula, since everything is derived from basic principles. (This, however, assumes that you carefully read the part which is background material only on the syllabus.) The authors do not hesitate to criticize traditional actuarial methods (such as limited fluctuation credibility), so you get an unbiased review here.

Disadvantages: If you want answers to the exercises, the answer book sells separately. The academic orientation won't help you much on an exam. You may not care for this mathematical approach.

Mahler-Dean and Dean

Description: A very hands-on exposition. These authors hardly care about the derivation of their formulas, or about their statistical validity, which they take for granted. They are working actuaries who want to get the job done. There are ample exercises including old exam questions, and all of them are worth doing. (The exercises in the Dean study note are in the back, starting on page 42; don't miss them!) Dean says that the Mahler-Dean study note is "An introduction to credibility theory with many examples and problems," a very apt description.

Advantages: Can be downloaded from the web for free.¹ All exercises have worked-out solutions. Easy to read and exam oriented. Dean offers practical advice on the use of Bühlmann, e.g. the weakness of using unweighted least squares for data varying greatly by dollar amount.

Disadvantages: The Mahler-Dean study note omits derivations of methods; you simply have to accept them on faith. On the other hand, Dean provides derivations in appendices, so they're there if you really must know. The Mahler-Dean study note suffers from overuse of capital letters; to me, capital letters look very pretentious. This is less so in the Dean study note.

¹Go to www.soa.org/files/edu-2010-spring-exam-c.pdf, and on page 4 in Option B click on both links.
(Loss Models goes to the other extreme, refusing to capitalize even the gamma distribution, even though it is represented by a Greek capital letter.)

Introduction to Credibility Theory, by Herzog

Description: This is a later edition of the textbook that was used on the former CAS syllabus. It is somewhere between Klugman and Mahler-Dean. It is easy reading. The exercises are good and should be done, but are easy. Answers, but not worked-out solutions, are provided. The book emphasizes the practical methods for working out credibility problems. Some of the derivations are there too, but in different chapters which are off the syllabus, so you only read them if you need to slake your curiosity. Dean says that the Herzog text is "A well organized survey of credibility theory."

Advantages: An easy read. Most complete coverage of conjugate prior pairs; covers the (important for exams) Bernoulli-beta pair very openly, unlike the other two options, as well as normal-normal and exponential-inverse gamma.

Disadvantages: This means paying for another book. The book doesn't cover several topics, such as non-Poisson limited fluctuation credibility. The author has a habit of sticking in small, hard-to-understand sections with reference material; of course you can skip these, but they may annoy you.

Since there are three options, the question that arises is, will items not covered in all three options be tested on? It seems unfair to test on something which Herzog doesn't talk about but the other two options do, when the syllabus says that you're free to use any option.

It appears that the exams are only testing on material found in all three options. This allows you to skip a lot of material in the Mahler-Dean and Dean study notes or Loss Models if you are using those options.
However, exam questions have been asked on conjugate priors covered only in Herzog. They expect you to work these questions out on the fly, but someone who is familiar with the conjugate priors has an advantage. I recommend you study all conjugate priors discussed in this manual. Even the normal/normal pair, which no released exam question covers, was reported to have been tested in the Spring 2009 exam.

It is also a good idea to know the general credibility formula, (38.1), even though Herzog doesn't mention it, since it is useful in the simulation part of the course. Also, it looks like they are testing on limited fluctuation credibility with non-Poisson frequency, which Herzog omits.

Each lesson discusses the coverage of the material by the three options, and with the exception of limited fluctuation credibility with non-Poisson frequency, you can skip anything not covered by all three.

I am covering Bayesian estimation together with Bayesian credibility. Bayesian estimation (I am not talking about Bayesian credibility) is covered by Loss Models as part of parametric estimation, and as such you are responsible for this material, including topics such as loss functions and Bayesian credibility intervals, even though the other options may not cover these topics. However, there are no questions on loss functions and Bayesian credibility intervals on released exams since 2000, and I have not received any student reports about such questions appearing on exams, so if short of time, you may skip those topics.

Lesson 38

Limited Fluctuation Credibility: Poisson Frequency

Reading: Loss Models Third Edition 20.2.1 or SN C-21-01 2.1-2.5 or Introduction to Credibility Theory 5.3

There is about one question per exam on limited fluctuation credibility, the topic covered in the next three lessons. Limited fluctuation credibility is also known as classical credibility.
We begin with the following situation. Your company offers a group medical coverage. Ventnor Manufacturing, a company with 1000 employees, placed its health coverage with your firm. The rate you charge is based on a pure premium of 900. This pure premium represents your expectation of 0.2 claims per employee, each one averaging 4500. After one year, Ventnor has submitted 160 claims averaging 5000. The VP of Human Resources of Ventnor meets with your company's salesman. He argues that Ventnor's average cost per employee is 0.16 claims times 5000 per claim, or 800, which is less than your assumption of 900. Ventnor's employees have superior health records. Therefore the premium being charged for this coverage should be reduced. What is your response?

One way to approach the above situation is to temporarily ignore your estimate of expected pure premium. Instead, you want to determine whether you would be justified using a pure premium based on Ventnor's experience alone.

To start, you can build a 90% confidence interval for Ventnor's aggregate claims. The sample mean of aggregate claims, 160(5000), is the center of the confidence interval. We need to calculate the variance, and we will have to make some assumptions in order to do that. Suppose that, based on the data you have for Ventnor's claims, you estimate the variance of claim size as 1,000,000. (This estimate may simply be the unbiased sample variance of the claims.) You also assume that claim frequency is Poisson. For a compound Poisson distribution, if the Poisson parameter is λ, the mean of claim size is μ, and the variance is σ², then, by equation (14.4) on page 226, the variance is

Var(S) = λ(μ² + σ²)

In this case, if we assume that mean expected claims and mean loss size are equal to the Ventnor experience (160 and 5000 respectively), then the variance for aggregate claims is 160(5000² + 1,000,000) = 160(26,000,000). The confidence interval is then

160(5000) ± 1.645 √(160(26,000,000))

If we charge 160(5000), can we tolerate this amount of fluctuation? The question is, what percentage of the mean is that "plus or minus" term? Well, we can divide the "plus or minus" term by the mean:

1.645 √(160(26,000,000)) / 160(5000) = 0.1326

13.26% is a lot of fluctuation! You can tolerate 5% fluctuation, but no more. So we don't want to charge 160(5000). But we can still do something. We can use a weighted average of what we think the right rate is based on the general population (900) and Ventnor's experience rate. The point is that the "right rate" has no variance. If we use 1 − Z times our rate and Z times Ventnor's experience rate, then the standard deviation of the sum will be Z times the standard deviation of using 100% of Ventnor's experience rate.

We said that we want 5% fluctuation. So we want the standard deviation to be 0.05/0.1326 = 0.3771 of the standard deviation of Ventnor's experience. This means putting 0.3771 weight on Ventnor's experience and the rest of the weight on 900. By doing this, the fluctuation of the rate will be (0.3771)(0.1326) = 0.05 of the mean. Our conclusion is that we will offer Ventnor the following pure premium:

0.3771(800) + 0.6229(900) = 862.29

The above is an illustration of the limited fluctuation credibility method. To carry out the method:

1. You make assumptions for the mean and variance of claim size and claim frequency. Often claim frequency is assumed to be Poisson.
2. You establish credibility standards based on two parameters: the probability of being in a certain interval, which is something like a confidence level (which was 90% here), and the size of the interval you want to be in, which is expressed as a percentage of the mean (which was 5% here).

3. You determine how many exposures, or claims, or aggregate claim amounts, you would need to satisfy this standard and grant full credibility. We didn't do this directly here. Instead, we determined how large the interval would be if we insisted on a certain probability.

4. If full credibility cannot be granted, you determine what percentage of credibility can be granted.

Let's turn the above calculation into formulas. Let e_F be the exposure needed for full credibility,¹ where an exposure is a unit insured for a time period, e.g. a person-year. 1000 was the exposure in the above example, but it was not enough for full credibility. Let μ be the expected aggregate claims per exposure² (900 in the above example), and σ the standard deviation per exposure. Let y_p be the coefficient from the standard normal distribution for the confidence interval which you desire, which was 1.645 in the above example for 90%. In general, y_p = Φ⁻¹((1 + p)/2), where p is the level of confidence that the mean is in the interval.³ Let k be the maximum fluctuation you will accept, the size of the confidence interval on each side as a percentage of the mean.⁴ In other words, you want confidence level p that actual losses will be within k of expected losses. The parameter p is sometimes called the probability parameter, and k is sometimes called the range parameter.

Then the aggregate mean is e_F μ and the aggregate variance is e_F σ². We want

y_p √(e_F σ²) ≤ k e_F μ

e_F ≥ (y_p/k)² (σ/μ)² = λ₀ CV²        (38.1)

where λ₀ = (y_p/k)² and CV = σ/μ is the coefficient of variation for the aggregate distribution. This is the general formula for full credibility.

Example 38A: Aggregate claims follow a lognormal distribution with parameters μ = 7.5, σ = 2. The full credibility standard is set according to the methods of limited fluctuation credibility so that actual aggregate claims are within 10% of expected aggregate claims 95% of the time.

Determine the number of exposures needed for full credibility.

¹e_F is my own notation and is not used by any of the syllabus materials.
²Loss Models uses ξ for this concept, which is something like the hypothetical mean of Bühlmann credibility; do not confuse it with the μ of Bühlmann credibility, which is the overall mean. Usually we use μ for this concept, but Loss Models uses ξ in this context.
³Loss Models uses y_p for this coefficient; Mahler-Dean use y with no subscript and Herzog uses z. For the confidence level, Loss Models uses p, Mahler-Dean uses P, and Herzog uses 1 − α.
⁴Loss Models uses r for the range parameter. Mahler-Dean uses k, one of their rare uses of lowercase letters, and Herzog uses c. This k has no relationship to Bühlmann's k.
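The full credibility standard in Example 38A can be evaluated directly from formula (38.1); for a lognormal, CV² = e^(σ²) − 1. This Python sketch is mine, not the manual's:

```python
import math
from statistics import NormalDist

def full_credibility_exposures(p, k, cv2):
    """e_F = (y_p / k)^2 * CV^2, where y_p is the standard normal
    quantile at (1 + p) / 2, per formula (38.1)."""
    y_p = NormalDist().inv_cdf((1 + p) / 2)
    return (y_p / k) ** 2 * cv2

# Example 38A: aggregate claims are lognormal with sigma = 2 (mu does not
# matter, since CV^2 = exp(sigma^2) - 1 for a lognormal); the standard is
# to be within 10% of the mean 95% of the time.
cv2 = math.exp(2 ** 2) - 1
e_f = full_credibility_exposures(p=0.95, k=0.10, cv2=cv2)
print(round(e_f))  # roughly 20,590 exposures
```

Using the rounded table value y_0.95 = 1.96 instead of the exact quantile changes the answer only trivially.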
