19.1 BACKGROUND
A disability insurer developed a new underwriting manual for small groups. As with other rate
manuals, the company’s underwriting manual provides pricing, based on a number of variables,
to be applied to groups that apply for disability insurance. Daniel Skwire, writing in Chapter
25 (“The Rate Manual”) of Group Insurance (Skwire, ed., 7th ed., Actex Learning, 2016 [12]),
defines the rate manual as consisting of “rates which vary by allowable case characteristics
(such as age, gender, and family composition) and with rating factors to be applied for
other rating characteristics, like geographic area, group size, industry, trend factor, and
morbidity factors” (Skwire, ed. [12]). Chapter 26 (“Long term Disability,” in Skwire [12])
addresses the type and size of rate adjustments appropriate for a number of risk characteristics
when underwriting and pricing a disability risk:
• Social Security Offsets (the probability that a claimant will receive a disability benefit
  from Social Security, and the amount of that benefit);
• Plan variations:
  o Benefits as a percentage of income, between 50% and 70%;
  o Maximum benefits;
  o Elimination period (3-, 6- and 12-month elimination periods);
  o Benefit period (to age 65, to Social Security normal retirement age, or for life);
  o Definition of disability (“any reasonable occupation” vs. “own occupation,” with
    the latter increasing the basic premium from the “any reasonable occupation” level);
  o Offsets (e.g., for workers compensation benefits, state sickness plans, or pension
    benefits);
  o Limits on certain conditions, such as mental/nervous conditions, alcoholism, and drug abuse;
• Group size and composition;
• Employee contribution and participation rates;
• Employment type (white collar vs. manual);
• Industry; and
• Average earnings.
As we discussed in Chapter 2, a detailed rating manual developed using age, sex, and other
factors should enable the carrier to make reasonably accurate predictions of the likely cost of
disability coverage for employer groups. Rating factors, however, are frequently developed
one dimension at a time and then combined within a multivariate rating model. Within individual
rate cells, therefore, manual rates may not accurately reflect the combination of risk factors
that the specific rate cell represents. Consequently, some rating cells will exhibit greater
profitability than others.
374 CHAPTER 19
In this chapter, we consider the feasibility and process of refining a basic manual rating model
using predictive modeling. By identifying those cells or combinations of rating factors that
represent better or worse risks, a refined model can be obtained that would better identify risk
factors associated with disability costs, and allow the company’s underwriters to segment and
rate risks more appropriately. This chapter describes the modeling and model evaluation process.
The model was developed to predict Excess Profit Margin (EPM) based on five years of active
and lapsed policy data. The manual rates contain a margin for profit, risk and contingencies,
so any profitability within the book greater than zero represents excess profit over and above
that assumed in pricing. The data are adjusted and summarized over five years at the employer
group (“policy”) level. The original five years of data contained 10,438 policies. To limit the
effect of large policies, each policy was truncated at a maximum of 500 lives. A large number of
variables are available for modeling purposes. Variables are described (including, where
appropriate, values of categorical variables) in Table 19.1.
One derived variable was added to the database described in Table 19.1. This variable, called
EPM_Set, is a categorical version of the dependent variable, obtained by grouping the continuous
EPM values into intervals of width 0.1. It is needed because the Quest and C5.0 trees (tested in
this project) accept only a categorical variable for their target field.
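The binning step can be sketched in a few lines of Python. This is an illustration only: the function name `epm_set` is hypothetical, and the source does not say how the intervals are aligned, so the sketch assumes half-open intervals that start at multiples of 0.1 and labels each interval by its lower edge.

```python
import math

def epm_set(epm, width=0.1):
    """Return the lower edge of the interval of the given width that contains epm.

    Assumption (not stated in the source): intervals are [k*width, (k+1)*width).
    """
    # Round the ratio first so that a value such as 0.3, which is stored as
    # 0.2999..., still lands in the interval that starts at 0.3.
    index = math.floor(round(epm / width, 9))
    return round(index * width, 10)

# Hypothetical EPM values, used only to show the labelling.
labels = [epm_set(x) for x in (-0.68, -0.04, 0.0, 0.29, 0.3)]
print(labels)
```

Each distinct label would then be treated as one category of the target field rather than as a number.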
The need for a model that could be used by underwriters in an operational setting suggested the
use of a tree form, which would generate profitable and unprofitable combinations of risk factors.
Three different types of decision trees were considered for developing the model: Quest, C5.0,
and C&R Tree. The evaluation was performed using Clementine 9.0, a data mining package from
SPSS that makes several model types available. Each type of decision tree was run on training
and testing datasets to help us decide on the final decision tree model.
TABLE 19.1
Independent Variables

Independent Variable Name  Explanation                                            Categories
SIC¹ category              SIC Group (defined by the first two digits             Group 1: 01-47, 49, 74-77
                           of a case’s SIC code)                                  Group 2: 48, 60-69, 73, 81, 84-89
                                                                                  Group 3: 80
                                                                                  Group 4: 50-59, 70-72, 78-79
                                                                                  Group 5: 82-83, 90-99
Region indicator           Region                                                 Northeast (1), South (2), Midwest (3), West (4)
M_LT_40 index              % Males < 40 (based on covered salary)                 < 20% (1), ≥ 20% (2)
M_OT_40 index              % Males ≥ 40 (based on covered salary)                 < 40% (1), ≥ 40% (2)
F_LT_40 index              % Females < 40 (based on covered salary)               < 20% (1), ≥ 20% (2)
F_OT_40 index              % Females ≥ 40 (based on covered salary)               < 20% (1), ≥ 20% (2)
PWC index                  Professional/White-collar % (based on covered salary)  < 60% (1), ≥ 60% (2)
Avgsalary index            Average Covered Salary                                 < $45k (1), ≥ $45k (2)
Ownocc index               Own Occupation Period                                  < 5 years (1), > 5 years (2)
Integrated indicator       Integrated Indicator                                   No (0), Yes (1)
Case size                  Case Size                                              < 100 (1), ≥ 100 (2)
Ben% Group                 Benefit %                                              ≤ 60% (1), > 60% (2)
SSI indicator              Social Security Integration                            Full (1), Other (2)
BC index                   Blue-collar %                                          < 10% (1), ≥ 10% (2)
Contributory ind           Contributory Indicator                                 Yes (1), No (2)
Multi-location ind         Multi-location Indicator                               Yes (1), No (2)
Pop Den ind                Population Density                                     ≥ 1 million (1), < 1 million (2)
EP index                   Elimination Period                                     ≤ 90 days (1), > 90 days (2)
Ben Per                    Benefit Period                                         < to age 65 (1), ≥ to age 65 (2)
MN limit                   Mental/Nervous Limit                                   Yes (1), No (2)
DA limit                   Drug/Alcohol Limitation                                Yes (1), No (2)
Definition of Disability   Definition of Disability                               Partial and Residual (1), Other (2)
COLA indicator             Cost of Living Adjustment Indicator                    Yes (1), No (2)
Agency polcnt index        Agency Policy Count                                    < 10 (1), ≥ 10 (2)
Maximum Benefit            Maximum Benefit                                        < $10,000 (1), ≥ $10,000 (2)
1 The Standard Industrial Classification (abbreviated SIC) is a United States government system for classifying
industries by a four-digit code. Established in 1937, it is being replaced by the six-digit North American Industry
Classification System, which was released in 1997. This insurer continues to use the SIC codes.
The dataset contains 10,438 records, representing policy experience from 1999 to 2003. It
is divided into three subsets: training (4,233 records, about 40%), testing (3,129 records,
about 30%), and evaluation (3,089 records, about 30%). The training dataset is used to develop
a model, and the testing dataset is used to check the developed models. If a test result is not
satisfactory, the model is adjusted. The models that pass the testing phase are evaluated using
the evaluation dataset.
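A three-way partition of this kind can be sketched as follows. The function name, the fixed random seed, and the use of simple random assignment are assumptions made for illustration; the source does not describe how records were allocated to the three subsets.

```python
import random

def three_way_split(records, fractions=(0.4, 0.3, 0.3), seed=0):
    """Shuffle the records and cut them into training, testing and evaluation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * fractions[0])
    n_test = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    evaluation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, evaluation

# Policy identifiers 0..10437 stand in for the 10,438 policy records.
train, test, evaluation = three_way_split(range(10438))
print(len(train), len(test), len(evaluation))
```

Keeping the evaluation subset untouched until the model is final is what allows it to serve as an independent check on models that were tuned against the testing subset.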
Four models were tested with the three different types of decision trees. The four models are
based on different choices of independent variables:
Company and Benefit variables were tested separately to determine whether each of the two
groups of variables influences the model’s performance.
Each model was run at least 30 times with each type of decision tree, resulting in approximately
100 runs for each of the three tree algorithms. The testing indicated that Model 1, which
included all independent variables as input, outperformed the other three models, and that the
C&R Tree performed better than either of the other two types of decision tree. Thus, the final
model choice was Model 1 (all variables) within the C&R Tree. Note that the C&R Tree accepts any
variable type for its target field, and therefore the original (continuous) dependent variable
could be used.
The following are the results of training and testing of the three decision trees with all
independent variables as inputs.
TABLE 19.2
TABLE 19.3
Although the C&R Tree was chosen for final model development, the percentage of correct
assignments using the C&R Tree was less than 50% (44.6%). To overcome this accuracy problem
and to improve model performance, the first and second predictive variables are selected
and then “fixed” as the first and second nodes of the decision tree. The data are then divided
into four subgroups according to the values of these two variables, and the C&R Tree is rerun
within each subgroup. This process produces a different model for each subgroup, which may have
some additional operational implications for implementation but can improve model accuracy.
The subgroups were then appended again to form an overall decision tree model.
Figure 19.1 illustrates the output of the tree classification process for one subgroup (Male >
40). Table 19.1 also provides an interpretation of variable values.
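The idea of fixing two variables and modeling each subgroup separately can be illustrated with a deliberately simplified sketch. It is not the C&R Tree algorithm as implemented in Clementine: it only finds the single best next split within each subgroup, using reduction in squared error as the criterion, and it assumes every variable is coded 1 or 2. All names (`best_binary_split`, `splits_by_subgroup`, and the fields `a`, `b`, `x`, `y`, `epm`) and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_binary_split(rows, target, features):
    """Return the 1/2-coded feature whose split most reduces the SSE of the target."""
    total = sse([r[target] for r in rows])

    def gain(feature):
        low = [r[target] for r in rows if r[feature] == 1]
        high = [r[target] for r in rows if r[feature] == 2]
        return total - sse(low) - sse(high)

    return max(features, key=gain)

def splits_by_subgroup(rows, target, fixed, features):
    """Partition rows by the fixed variables, then choose a split within each subgroup."""
    subgroups = defaultdict(list)
    for r in rows:
        subgroups[tuple(r[f] for f in fixed)].append(r)
    return {key: best_binary_split(g, target, features) for key, g in subgroups.items()}

# Toy data: in subgroup (1, 1) the target follows x, elsewhere it follows y.
rows = [
    {"a": a, "b": b, "x": x, "y": y, "epm": float(x if (a, b) == (1, 1) else y)}
    for a in (1, 2) for b in (1, 2) for x in (1, 2) for y in (1, 2)
]
chosen = splits_by_subgroup(rows, "epm", fixed=("a", "b"), features=("x", "y"))
print(chosen)
```

The toy data show why the approach can help: a variable that matters in one subgroup may be irrelevant in another, so each subgroup ends up with its own tree.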
FIGURE 19.1
SAMPLE TREE DIAGRAM (LIMITED TO NODES 1-12 OF FINAL MODEL)
[Tree diagram not reproduced. Root node (Node 0): target $R-EPM 99-03, n = 4,138, predicted EPM
0.00. Split variables shown in the diagram include Region Indicator, Multi Location Ind,
Ben% Group, and Avg Salary Index.]
A PREDICTIVE MODEL FOR DISABILITY UNDERWRITING PROFITABILITY 379
As the partial model in Figure 19.1 shows, the C&R Tree process identifies “nodes,” or groups
of independent variables that segment the entire database into groups with common characteristics
that predict the subgroup’s profitability. The model may be evaluated in at least two ways:
1. Statistically in terms of “fit.” We will examine the statistical performance of the model first
in this section; or
2. In terms of whether the model serves a business purpose. In this case, a comparison model
exists—the rate manual—and the alternative, tree-based model may be evaluated in terms of
its ability to discriminate between potentially profitable and potentially unprofitable groups.
We will examine the business performance of the model in the latter part of this section.
The model was evaluated by assessing the correlation between predicted and actual profitabil-
ity, using a number of different datasets.
N = 10,453
Minimum Error 0.9600
Maximum Error 301.9000
Mean Error 0.0286
Standard Deviation 4.8747
Correlation 0.0200
Significant at the 95% level
As an “overall” validation, the fact that the linear correlation statistic is significant at the 95%
level is a good sign. It indicates a positive correlation between the predicted EPM and the
actual EPM that is unlikely to be due to chance alone, although a correlation of 0.02 is small.
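The significance of a correlation coefficient is commonly checked with the statistic t = r·sqrt((N − 2)/(1 − r²)). The sketch below applies that formula to the reported values r = 0.0200 and N = 10,453. The source does not say which test was actually used, so this is an assumption, and the function names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing whether a correlation r from n pairs differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Values reported in the text for the full dataset.
t_all = t_statistic(0.0200, 10453)
print(round(t_all, 2))
```

With N this large the t distribution is very close to the normal distribution, so the two-sided 5% critical value is about 1.96. The statistic exceeds it only narrowly, which illustrates how a very small correlation can still be statistically significant when the sample is large.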
N = 5,178
Minimum Error 0.9600
Maximum Error 301.9000
Mean Error 0.0221
Standard Deviation 4.7685
Correlation 0.0240
Significant at the 95% level
As with any of the validation statistics, there is some variation in the data not explained by the
model. In general, however, the model is capable of distinguishing the higher EPMs from the
lower EPMs. Again, the relationship explained by the model is significant at the 95% level, which
means that a correlation of this size would be unlikely to arise by chance if predicted and
actual EPM were unrelated.
The statistically significant correlations seen when the sample size (N) is larger are a positive
sign. They suggest that the model can be used to predict EPM and to assess which variables
indicate that an employer group is more likely to have a higher EPM.
There is still significant variation around the predicted EPM numbers on a case-by-case basis,
but taken across the entire population, the model should help identify groups that are likely
to have higher EPM numbers.
The dependent variable for this model is the level of excess profitability. Table 19.4 summarizes
the excess profitability of the book of business by model node. The baseline excess profitability
of the entire book is zero. The underlying book, of course, may generate profits at the
level anticipated in pricing. The dependent variable is profit in excess of the pricing level. As
the table shows, certain nodes are relatively profitable and others are relatively unprofitable.
Overall, the relatively profitable nodes are large, so that identification of these nodes holds the
promise of increased profitability of the book.
The model assigns groups prospectively to different nodes (“predicted number in node”) depending
on the values of the group’s variables. We can then identify the actual classification
of groups based on each group’s outcomes. Similarly, we can predict the profit of groups in
each node, based on the model, and compare predicted and actual profit for each node. Ideally,
a model would correctly assign groups to each node so that expected and actual numbers are
the same. Because it is highly unlikely that a model will do this, we examine, instead, ways
that management may use the model results to improve its underwriting process.
TABLE 19.4
Table 19.5 shows that the predictive model, while it does not accurately predict the level of
profit within individual nodes, predicts both the number and direction of the profitability by
node with reasonable accuracy. Thirteen of the nineteen nodes are predicted accurately in terms
of direction, accounting for 7,135 or 68% of all groups. Overall profitability of these groups
(even with those that are predicted to be profitable but turn out to be unprofitable in actuality)
is 0.035. In total, profits amount to $247.8, or about 6 times the level of the book using the rate
manual only.
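The directional comparison described above (does the sign of predicted EPM match the sign of actual EPM in each node?) can be sketched as follows. The node values are made up for illustration and are not taken from Table 19.5; the function name is hypothetical.

```python
def direction_summary(nodes):
    """Summarize directional accuracy by node.

    nodes: a list of (predicted_epm, actual_epm, number_of_groups) tuples.
    Returns (nodes with matching sign, groups in those nodes, share of all groups).
    """
    total_groups = sum(n for _, _, n in nodes)
    hits = [(p, a, n) for p, a, n in nodes if (p > 0) == (a > 0)]
    groups_in_hits = sum(n for _, _, n in hits)
    return len(hits), groups_in_hits, groups_in_hits / total_groups

# Hypothetical nodes: (predicted EPM, actual EPM, number of groups).
example_nodes = [
    (0.30, 0.10, 600),    # predicted profitable, actually profitable
    (-0.50, -0.20, 50),   # predicted unprofitable, actually unprofitable
    (0.05, -0.05, 150),   # direction missed
    (-0.40, 0.02, 250),   # direction missed
]
summary = direction_summary(example_nodes)
print(summary)
```

The same share calculation applied to the figures quoted in the text, 7,135 groups out of 10,438, gives 7135 / 10438 ≈ 0.68, matching the 68% stated.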
TABLE 19.5
Table 19.6 identifies only those nodes (11) that are predicted to be profitable. These nodes
account for considerable profitability, both at the group level and in total.
TABLE 19.6
A question that arises is why a predictive model is required at all in underwriting. The model
results suggest that there are some “nodes” or groupings of underwriting risk factors that are
more profitable than others. The obvious implication of this result is that the rate manual that
was the basis for the study should be updated. Updating a rate manual, however, is a large
undertaking. Using a predictive model (that can be easily and frequently updated) in combination
with the Rate Manual is therefore a practical and cost-effective solution.
1. A means of practically applying the risk factors that result in the model “node” classifications.
Node 1, for example, an unprofitable node, consists of groups that meet the following
criteria:
Multi-location Indicator: 1 Multi-locations
Own Occupation Indicator: 1 Less than 5 years
COLA Indicator: 1 Yes
Accounts that meet these criteria could either be avoided or an additional rating factor
could be introduced, as discussed below.
Conversely, Node 11 is a profitable node. Groups in Node 11 meet the following criteria:
Multi-location Indicator: 1 Multi-locations
Own Occupation Indicator: 2 Greater than 5 years
PWC (Prof White-collar % based on covered salary): 1 Less than 60%
Benefit Percentage: 1 Less than 60%
Accounts that meet these criteria are potentially more profitable and therefore more attractive
and could be pursued.
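Node criteria of this kind translate directly into lookup rules that an underwriting system could apply to a new case. The sketch below encodes the two nodes listed above. The field names (`multi_location`, `own_occ`, `cola`, `pwc`, `benefit_pct`) are hypothetical stand-ins for the Table 19.1 variables, and the function name is illustrative.

```python
# Criteria for the two nodes described in the text, keyed by node number.
NODE_RULES = {
    1: {"multi_location": 1, "own_occ": 1, "cola": 1},                      # unprofitable
    11: {"multi_location": 1, "own_occ": 2, "pwc": 1, "benefit_pct": 1},    # profitable
}

def matching_nodes(group, rules=NODE_RULES):
    """Return the node numbers whose criteria are all satisfied by the group."""
    return [
        node
        for node, criteria in rules.items()
        if all(group.get(field) == value for field, value in criteria.items())
    ]

# A hypothetical case: multi-location, own-occupation period over 5 years,
# under 60% professional/white-collar, benefit percentage category 1, no COLA.
case = {"multi_location": 1, "own_occ": 2, "pwc": 1, "benefit_pct": 1, "cola": 2}
print(matching_nodes(case))
```

A case that matches Node 1 could be flagged for avoidance or a rate-up, and one that matches Node 11 could be flagged as attractive.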
Scenario 3 combines an underwriting and pricing strategy. Groups that are predicted to be
unprofitable are rated up by 10%. Assuming that the increased rates do not result in loss of
business, the overall profitability of the block will be higher than Scenario 2, at about 5
times that of the base case.
Scenario 4 requires that only those groups that are identified as potentially profitable and
confirmed by actual experience (testing) of the models be written. This results in a reduction
of almost 50% in the number of groups written but in significantly higher profits.
In Scenario 5, cases are accepted if the model indicates profitability (and this is confirmed
by the actual model test) while unprofitable groups are rated up 10%. In this scenario, some
groups reject the rating increase or seek another carrier, so the number of groups written
falls by about one-third while the profitability of the book is significantly increased. Overall,
Scenario 5 increases total profitability, although profitability per account is lower, because
it identifies a larger number of potentially profitable accounts than Scenario 4.
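The arithmetic behind a rate-up scenario such as Scenario 3 can be sketched under simplifying assumptions that are not stated in the source: each group has a known premium and excess profit in dollars, a 10% rate-up adds 10% of premium directly to excess profit (premium-related expenses are ignored), and no group lapses. The field names, figures, and function name are hypothetical.

```python
def scenario_3_excess_profit(groups, rate_up=0.10):
    """Total excess profit when predicted-unprofitable groups are rated up, with no lapses."""
    total = 0.0
    for g in groups:
        # Only groups that the model predicts to be unprofitable receive the rate-up.
        extra_premium = rate_up * g["premium"] if g["predicted_epm"] < 0 else 0.0
        total += g["excess_profit"] + extra_premium
    return total

# Two hypothetical groups, amounts in arbitrary currency units.
book = [
    {"premium": 100.0, "excess_profit": -8.0, "predicted_epm": -0.05},
    {"premium": 200.0, "excess_profit": 12.0, "predicted_epm": 0.04},
]
base_case = sum(g["excess_profit"] for g in book)
with_rate_up = scenario_3_excess_profit(book)
print(base_case, with_rate_up)
```

Relaxing the no-lapse assumption, as Scenario 5 does, would require removing some rated-up groups from the book before totalling.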
TABLE 19.7
While no model will ever replace an underwriter’s judgment (particularly our analysis of the
model and possible underwriter reactions), this analysis shows that a predictive model, used in
combination with underwriter rules, has a capacity to increase profitability of a book of
business.