
European Actuarial Journal manuscript No.

(will be inserted by the editor)


An actuarial model for assessing general practitioners'
prescription costs
Giorgio Spedicato
the date of receipt and acceptance should be inserted later
Abstract Monitoring general practitioners' prescription costs is important for the efficient
allocation of national health insurance resources. To this end, this paper proposes a
methodology based on non-life actuarial models. In our approach, the frequency and the cost
of patients' drug prescriptions are modelled by means of Generalized Additive Models for
Location, Scale and Shape (GAMLSS). The total cost of the drug prescriptions of the pool of
patients is then modelled by means of convolutions, following a classical risk theory
approach. An example based on a quasi-real data set illustrates the proposed methodology.
Keywords GAMLSS · public health insurance · drug prescriptions coverage · predictive models
1 Introduction
Monitoring the cost of general practitioners' (GPs) drug prescriptions is important for the
efficient allocation of the National Health Insurance (NHI) budget. The prolonged economic
downturn has increased the pressure on governments toward rationalization and budget
restrictions. For example, the NHI policy discussion in Italy [3] has drawn attention to
standard costs of service, which should represent the efficient price of any service granted
by the NHI. This paper aims to show a rational approach to assessing the standard cost of
drug prescriptions charged to the NHI.
Drug prescription expenditure has been widely studied by health econometricians and
medical researchers. In particular, [26] analysed GPs' drug prescription costs in Ireland.
The yearly total cost was estimated by means of a linear regression model based on aggregate
demographic variables of each GP's pool of patients. [21] conducted a similar study in
Northern Italy, applying multiple linear regression and a LISREL model based on both
patient- and GP-level demographic variables. [9] studied the effect of GPs' age and sex on
the number and cost of drug prescriptions in the Catalunya region. Finally, [10] applied a
panel data econometric model to data from the Catalunya region. In summary, the medical
literature confirms the availability of data and the importance of using statistical models;
moreover, empirical studies show that both patient- and GP-level demographic variables
play a significant role in determining the total cost of the yearly prescriptions. However,
the approaches proposed in the medical literature have three drawbacks: the number and the
cost of single prescriptions are not modelled separately; linear regression models are used,
while generalized linear models (GLMs) seem more adequate; and only the expected value of
the total cost of the yearly prescriptions is modelled, without taking its variability into
account.
Even though no actuarial literature exists on this topic, some well-known actuarial
approaches can be usefully applied in this context. In particular, we propose a new
methodology which combines four techniques that are widely used in non-life actuarial
practice.
The first technique consists of the convolution of stochastic distributions (see e.g. [4] for
a theoretical introduction). In particular, risk theory models the total cost of claims by
convolution of the distributions of the number and the cost of claims. One of the major
applications of risk theory in the actuarial context is the estimation of insurers' Solvency
Capital Requirement ([2]); see an example in [20]. Here, we propose to model the distribution
of the yearly total cost of the drug prescriptions of a single patient as a convolution of the
stochastic number and cost of the prescriptions associated with the patient.
The second technique is an extension of GLMs, namely Generalized Additive Models for
Location, Scale and Shape (GAMLSS) [19]. GLMs are widely used in non-life rate-making
([1] and [5]). In particular, over-dispersed Poisson and Gamma GLMs are applied to model
the frequency and the severity of claims as a function of policyholders' characteristics in
order to assess the risk premium of insurance coverages (see [25] for details). However, the
variability of the number and cost of claims is rarely taken into consideration in standard
rate-making. GAMLSS allow modelling as a function of covariates not only the mean, but also
the other parameters that completely define the conditional distribution of the dependent
variable. Very few actuarial applications of GAMLSS exist. In particular, GAMLSS
have been proposed to assess the frequency and the cost of claims in the Australian market
in [6] and to analyse mortality trends in [24]. Moreover, in order to assess the premium risk
Solvency II capital requirement, [22] applies GAMLSS to better take into account portfolio
heterogeneity. Here, we propose to model the frequency and the cost of drug prescriptions for
each patient by means of GAMLSS, in order to estimate location and dispersion parameters
as a function of patient characteristics.
The third technique is represented by models for lapse probability and conversion rate,
widely used in actuarial practice to predict drop-outs and arrivals, given that a portfolio of
policyholders is an open collectivity (see e.g. [25] and [23] for a practical discussion). We
propose to model the probability that any subject may leave the GP due to death or other
causes, as well as the probability that a new subject may enter the GP's pool of patients.
The fourth technique consists of approximating the total loss distribution of a portfolio
by a theoretical distribution (see [14] for details). We extend this approach to approximate
the yearly total cost of drug prescriptions arising from a GP's pool of patients.
The paper is structured as follows: the methodology is introduced in Section 2 and an
example based on a quasi-real data set is discussed in Section 3. Finally, Section 4 provides
conclusions and suggestions for further research.
2 The methodology
This section introduces the theoretical tools which are the basis of the new methodology
proposed.
2.1 Risk theory
One of the goals of risk theory is modelling the total cost of a portfolio of policyholders.
Given that patients are heterogeneous, we follow the so-called individual risk theory approach
to model the distribution of the yearly total cost of prescription drugs. In particular, the
yearly total cost $\tilde{T}$ of prescription drugs can be expressed as the sum of the single
patients' costs $\tilde{t}_i$, $i = 1, \ldots, N$, that is:
\[ \tilde{T} = \sum_{i=1}^{N} \tilde{t}_i, \tag{1} \]
where both $\tilde{T}$ and $\tilde{t}_i$, $i = 1, \ldots, N$, are random variables.
Then, the yearly cost of prescription drugs $\tilde{t}_i$ for patient $i$ can be seen as a
convolution (random sum) of the costs $\tilde{c}_{ij}$, $j = 1, \ldots, \tilde{n}_i$, of the
single prescriptions for patient $i$, that is:
\[ \tilde{t}_i = \sum_{j=1}^{\tilde{n}_i} \tilde{c}_{ij}, \tag{2} \]
where $\tilde{n}_i$ represents the stochastic number of drug prescriptions during the exposure
period for patient $i$ and the $\tilde{c}_{ij}$ represent the $\tilde{n}_i$ stochastic costs of
the drug prescriptions for patient $i$; the sum is zero when $\tilde{n}_i = 0$.
2.2 GAMLSS
The GLM extends the classical linear model to dependent variables that are not conditionally
Gaussian; here, the expected value of the dependent variable $\tilde{y}_i$ is expressed as a
function of the covariates through the GLM link function, that is:
\[ \begin{cases} E[\tilde{y}_i] = \mu_i = g^{-1}(\eta_i) = f(x_i) \\ \operatorname{var}[\tilde{y}_i] = \phi \, V(\mu_i) \end{cases} \tag{3} \]
where $g(\cdot)$ is the link function (so that $g^{-1}(\cdot)$ maps the linear predictor
$\eta_i$ to the mean), $V(\mu_i)$ is a function that depends on the distribution family and
$\phi$ is a constant that can be estimated from the data (see [1] for details). However, the
standard GLM framework leads to restrictive modelling of the variance of $\tilde{y}_i$, since
it depends on $\mu_i$.
A recent extension of GLMs, the GAMLSS family, overcomes this limitation. GAMLSS allow up to
four parameters of the distribution of $\tilde{y}_i$ to be modelled as functions of the
covariates (i.e. location $\mu_i$, scale $\sigma_i$ and shape parameters $\nu_i$ and
$\tau_i$). Then, we have:
\[ \begin{cases} \mu_i = f_1(x_i) \\ \sigma_i = f_2(x_i) \\ \nu_i = f_3(x_i) \\ \tau_i = f_4(x_i) \end{cases} \tag{4} \]
The distribution of $\tilde{y}_i$ is therefore fully characterized by a set of flexible
equations. In particular, equation (4) implies that the moments of $\tilde{y}_i$ can be
directly expressed as a function of the covariates after a convenient parametrization, that is:
\[ \begin{cases} E[\tilde{y}_i] = f(x_i) \\ \operatorname{var}[\tilde{y}_i] = g(x_i) \end{cases} \tag{5} \]
The current GAMLSS R package [19] supports more than 60 distributions, non-linear and
non-parametric relationships (e.g. cubic splines, loess and other non-parametric smoothers)
and random effect modelling; moreover, it provides a full set of diagnostic tools.
In order to assess the total cost of drug prescriptions in (2) for a GP's pool of patients, we
propose to model $\tilde{n}_i$ and $\tilde{c}_{ij}$ by means of the GAMLSS framework as a
function of patients' characteristics. This yields expressions for $E[\tilde{n}_i]$,
$\operatorname{var}[\tilde{n}_i]$, $E[\tilde{c}_i]$ and $\operatorname{var}[\tilde{c}_i]$
following equation (5). We propose to use a count data regression model for $\tilde{n}_i$ and
a regression model with a positive-valued distribution for $\tilde{c}_i$. Suitable candidates
for $\tilde{n}_i$ are the Poisson (POI) and Negative Binomial (NB) distributions, which are
widely used in non-life actuarial practice; the advantage is that closed forms for the moments
exist as a function of the distribution parameters. Formulas (6) and (7) show convenient
parametrizations of the POI and NB probability mass functions, respectively:
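\[ p_Y(y \mid \mu) = \frac{e^{-\mu} \mu^{y}}{y!}, \qquad E[Y] = \mu, \qquad \operatorname{var}[Y] = \mu \tag{6} \]
\[ p_Y(y \mid \mu, \sigma) = \frac{\Gamma\left(y + \frac{1}{\sigma}\right)}{\Gamma\left(\frac{1}{\sigma}\right) \Gamma(y+1)} \left(\frac{\sigma\mu}{1+\sigma\mu}\right)^{y} \left(\frac{1}{1+\sigma\mu}\right)^{1/\sigma}, \qquad E[Y] = \mu, \qquad \operatorname{var}[Y] = \mu + \sigma\mu^{2} \tag{7} \]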
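Suitable candidates for $\tilde{c}_i$, in turn, are the Gamma (GA) and Inverse Gaussian (IG)
distributions. Equations (8) and (9) show convenient parametrizations of the GA and IG density
functions, respectively:
\[ f_Y(y \mid \mu, \sigma) = \frac{1}{(\sigma^{2}\mu)^{1/\sigma^{2}}} \, \frac{y^{\frac{1}{\sigma^{2}} - 1} \, e^{-y/(\sigma^{2}\mu)}}{\Gamma\left(\frac{1}{\sigma^{2}}\right)}, \qquad E[Y] = \mu, \qquad \operatorname{var}[Y] = \sigma^{2}\mu^{2} \tag{8} \]
\[ f_Y(y \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}y^{3}}} \exp\left(-\frac{(y-\mu)^{2}}{2\mu^{2}\sigma^{2}y}\right), \qquad E[Y] = \mu, \qquad \operatorname{var}[Y] = \sigma^{2}\mu^{3} \tag{9} \]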
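The moment formulas attached to (7) and (9) can be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the function names and the parameter values are illustrative and do not come from the paper. The NB draw uses the gamma-mixed Poisson representation and the IG draw uses the Michael-Schucany-Haas transformation with $\lambda = 1/\sigma^2$, both of which reproduce the means and variances stated above.

```python
import math
import random


def nb_moments(mu, sigma):
    """Mean and variance of the NB parametrization in formula (7)."""
    return mu, mu + sigma * mu ** 2


def ig_moments(mu, sigma):
    """Mean and variance of the IG parametrization in formula (9)."""
    return mu, sigma ** 2 * mu ** 3


def sample_poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_nb(rng, mu, sigma):
    """NB draw as a Poisson whose mean is Gamma(shape=1/sigma, scale=sigma*mu)."""
    return sample_poisson(rng, rng.gammavariate(1.0 / sigma, sigma * mu))


def sample_ig(rng, mu, sigma):
    """IG draw by the Michael-Schucany-Haas method, with lambda = 1/sigma^2."""
    lam = 1.0 / sigma ** 2
    w = mu * rng.gauss(0.0, 1.0) ** 2
    x = mu + mu * (w - math.sqrt(w * (4.0 * lam + w))) / (2.0 * lam)
    return x if rng.random() <= mu / (mu + x) else mu * mu / x


# Illustrative parameter values only (not the fitted values of the paper).
rng = random.Random(2011)
counts = [sample_nb(rng, 3.0, 0.5) for _ in range(20000)]
costs = [sample_ig(rng, 20.0, 0.2) for _ in range(20000)]
print(sum(counts) / len(counts), nb_moments(3.0, 0.5))
print(sum(costs) / len(costs), ig_moments(20.0, 0.2))
```

The sample means printed by the sketch should be close to the theoretical means returned by `nb_moments` and `ig_moments`; the samplers are only meant for moderate means, since Knuth's method slows down as the Poisson mean grows.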
2.3 Lapse probability and conversion rate
In order to optimize the proposed tariffs, actuaries usually fit models for lapse probability
and conversion rates, which take into account existing customers' drop-outs and new
policyholders' flows, respectively. The standard approach is logistic regression with
covariates regarding the policyholder's demographic profile and the competitiveness of the
market environment (see [25]). Lapse and conversion modelling allows the effective period of
exposure at risk of each subject during the time of the study, $e_i$, to be properly defined.
Our application models the drug prescription cost of a pool of patients during one calendar
year. However, each patient can enter the pool after the beginning of the year, e.g. after a
change of residence, and can leave the pool before the end of the year, e.g. due to death.
Therefore the effective exposure period becomes a stochastic variable, $e_i$, that must be
modelled in order to properly assess $\tilde{t}_i$. We assume the expected value of
$\tilde{n}_i$ to be proportional to $e_i$. GLM modelling handles this by means of offsets, as
[1] shows: the $\ln(e_i)$ term in the link equation has its coefficient fixed at 1, as
equation 10 shows.
\[ \begin{aligned} E[\tilde{n}_i] &= e_i \exp\left(x_i^{T}\beta\right) \\ \ln\left(E[\tilde{n}_i]\right) &= \ln(e_i) + x_i^{T}\beta \end{aligned} \tag{10} \]
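The effect of the offset can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the covariates (intercept, age in decades, a female indicator) and the coefficient values are hypothetical and are not the fitted values of the paper.

```python
import math


def expected_count(exposure, covariates, beta):
    """E[n_i] = e_i * exp(x_i' beta): a log link with ln(e_i) as offset."""
    eta = sum(x * b for x, b in zip(covariates, beta))  # linear predictor x_i' beta
    return math.exp(math.log(exposure) + eta)           # offset coefficient fixed at 1


# Hypothetical patient: intercept, age (in decades) and a female indicator.
beta = [0.2, 0.15, 0.3]   # illustrative coefficients, not fitted values
x_i = [1.0, 4.5, 1.0]
full_year = expected_count(1.0, x_i, beta)
half_year = expected_count(0.5, x_i, beta)
print(full_year, half_year)  # the half-year figure is half the full-year one
```

Because the coefficient of $\ln(e_i)$ is fixed at 1, halving the exposure halves the expected number of prescriptions while leaving the covariate effects unchanged.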
However, since in our application the exposure variable is stochastic, equation 10 is modified
to take into account the contribution of inflows and outflows:
\[ e_i = 1 - e_i^{l} + e_i^{nb} \tag{11} \]
Equation 11 expresses the exposure of the $i$-th patient, $e_i$, as the algebraic sum of three
components: the exposure amount, 1, that would be achieved if the patient stayed within the
pool for the full calendar year; less the fraction of the year, $e_i^{l}$, that must not be
counted if the patient leaves the pool before the year end; plus the exposure contribution,
$e_i^{nb}$, of new patients that share the demographic profile of the $i$-th patient.
$e_i^{l} = q_i \, \tilde{I}_d$ can be expressed as the product of a Bernoulli random variable
$q_i$ and a uniform (0,1) random variable $\tilde{I}_d$. In particular, $q_i$ indicates
whether the $i$-th patient leaves the pool within the year, while $\tilde{I}_d$ represents the
fraction of the year lost. By using a uniform distribution, we are assuming that the lapse
probability is constant throughout the year.
Similarly, we can express the exposure due to the flow of new patients as
$e_i^{nb} = \sum_{j=1}^{m_i} \tilde{I}_j^{nb}$. $m_i$ represents the random number of new
patients and is modelled by a Poisson distribution with parameter $\lambda_i$. Moreover, we
are assuming that these $m_i$ patients share the demographic profile of the $i$-th patient.
The interpretation of $\tilde{I}_j^{nb}$ parallels that of $\tilde{I}_d$.
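The exposure model of equation 11 can be sketched as follows. This is an illustration under stated assumptions: it uses only the Python standard library, the function names are hypothetical, and the probabilities 0.02 and 0.03 are the flat lapse and conversion values adopted later in the paper (mortality is ignored here for brevity).

```python
import math
import random


def sample_poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_exposure(rng, leave_prob, arrival_rate):
    """One draw of e_i = 1 - e_i^l + e_i^nb, following equation (11)."""
    leaves = rng.random() < leave_prob              # Bernoulli indicator q_i
    lost = rng.random() if leaves else 0.0          # e_i^l: uniform fraction of the year lost
    m_i = sample_poisson(rng, arrival_rate)         # number of new, similar patients
    gained = sum(rng.random() for _ in range(m_i))  # e_i^nb: their fractions of the year
    return 1.0 - lost + gained


rng = random.Random(42)
draws = [simulate_exposure(rng, 0.02, 0.03) for _ in range(50000)]
print(sum(draws) / len(draws))  # theoretical mean: 1 - 0.02/2 + 0.03/2 = 1.005
```

With these values the expected exposure per patient is slightly above one patient/year, because the assumed inflow rate exceeds the assumed outflow probability.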
2.4 Loss distribution modeling
Many actuarial applications use loss distribution modelling to assess the shape of claim
costs. Loss distribution modelling fits the parameters of a theoretical distribution to real
data in order to fully characterize the distribution that best fits the empirical claim costs
under study. Fitting distributions requires choosing theoretical functions as candidates,
estimating their parameters and assessing their goodness of fit. Another application of loss
distribution modelling lies in approximating the total cost of an insurer's portfolio,
$\tilde{T}$, by a simple theoretical distribution.
The book [14] provides a comprehensive treatment of loss distribution modelling.
An analytical expression of the loss distribution allows key moments (e.g. mean and variance)
and other statistics to be obtained in closed form instead of by simulation, which can be
time-consuming. However, real data are very often difficult to summarize by a theoretical
distribution because of data quality problems or excessive heterogeneity.
Loss distribution fitting has two applications in this paper. The first is the selection of
the conditional distributions for $\tilde{n}_i$ and $\tilde{c}_{ij}$ when performing GAMLSS
modelling; plots of normalized quantile residuals (see [8] for details) aided the assessment
of the reasonableness of the chosen conditional distributions. The second is the closed-form
approximation of the shape of $\tilde{T}$ by means of a log-normal distribution, following the
approach outlined in [15].
2.5 The estimation procedure
In order to estimate $\tilde{T}$ we define the distributions of $\tilde{n}_i$ and
$\tilde{c}_i$ by means of GAMLSS predictive models. Patients with full-year exposure are used
to calibrate the model for $\tilde{n}_i$.
The distributions of $\tilde{t}_i$ and $\tilde{T}$ can be obtained empirically by means of
Monte Carlo simulation. In particular, a random realization from the distribution of the total
cost $\tilde{t}_i$ for patient $i$ can be simulated using the convolution algorithm:
1. Sample one realization of the effective yearly exposure of the $i$-th patient, $e_i$.
2. Select the number of drug prescriptions, k, at random from the assumed drug prescription
frequency distribution $\tilde{n}_i$.
3. Do the following k times: select the drug prescription cost, z, at random from the
assumed drug prescription cost distribution $\tilde{c}_i$.
4. The total cost for patient $i$ is the sum of the k costs, z, selected in step 3.
Then, if the outlined process is repeated for all N patients of the general practitioner's
portfolio and the resulting costs are summed, we obtain one random realization from the
distribution of the total cost $\tilde{T}$.
Finally, in order to obtain the distribution of $\tilde{t}_i$ or $\tilde{T}$ it is necessary to
repeat the previous steps M times (M >> 0).
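The steps above can be sketched in code. This is a simplified illustration, not the implementation used in the paper: it assumes a Poisson frequency whose mean is scaled by the exposure and a Gamma cost in the parametrization of formula (8), whereas the empirical application selects the negative binomial and the inverse Gaussian. All names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_patient_cost(rng, patient):
    """Steps 1-4 for one patient: exposure, number k of prescriptions, sum of k costs."""
    exposure = patient["draw_exposure"](rng)                    # step 1
    k = sample_poisson(rng, exposure * patient["mean_count"])   # step 2
    return sum(patient["draw_cost"](rng) for _ in range(k))     # steps 3 and 4


def simulate_total_costs(rng, patients, n_sims):
    """n_sims realizations of the pool total: one sum over all patients per run."""
    return [sum(simulate_patient_cost(rng, p) for p in patients) for _ in range(n_sims)]


def gamma_cost(mu, sigma):
    """Cost sampler for the GA parametrization: shape 1/sigma^2, scale sigma^2 * mu."""
    return lambda rng: rng.gammavariate(1.0 / sigma ** 2, sigma ** 2 * mu)


# Hypothetical pool: 50 identical patients observed for the whole year.
pool = [{"draw_exposure": lambda rng: 1.0, "mean_count": 3.0,
         "draw_cost": gamma_cost(20.0, 1.0)} for _ in range(50)]
totals = simulate_total_costs(random.Random(1), pool, 400)
print(sum(totals) / len(totals))  # theoretical mean: 50 * 3 * 20 = 3000
```

Passing the exposure and cost samplers as callables keeps the convolution loop independent of the distributions chosen, so other frequency or cost models can be swapped in without touching it.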
3 An empirical application
3.1 Data sources and preparation
3.1.1 Data sources
An empirical application is presented in this study to illustrate numerically the framework
outlined previously. We assess the distribution of the yearly total cost of drug prescriptions
for a target GP's pool of patients. The data sources used in the application are:
1. A data set, the prescriptions data set (PDS), containing the number of prescriptions
received by 6,000+ patients from their GPs [11]. Each row in the PDS contains the number of
prescriptions during a whole year (dependent variable) plus a wide choice of demographic
data. The PDS is used to calibrate the frequency model. We have not verified the
reliability of the PDS, as we had no means to do so. Moreover, the PDS was collected on
patients between 25 and 65 years of age. All analyses are therefore limited to the
corresponding age span, without loss of generality.
2. A life table split by sex, used to model the probability of death as a function of age
(source [13]).
3. A data set in the same format as the VDS containing the demographic data of 600 patients,
henceforth the target data set (TDS). The TDS represents the pool of patients of a GP that
we code as XY. The distribution of $\tilde{T}$ for GP XY is to be assessed by the
methodology proposed in this paper.
4. A data set containing a sample of drug costs along with the age and sex of the patient
for whom the prescription was issued. This data set, henceforth the Costs Data Set
(CDS), is used to calibrate the drug prescription cost model. It was collected in Spring
2011 thanks to the cooperation of an Italian drugstore.
5. A function that models the probability of drop-out due to reasons other than death
(lapse probability). Due to limited data availability, we have set this probability to a
flat value of 2.0%, after a discussion with a panel of experienced GPs.
6. A function that gives the rate of newly enrolled patients (conversion rate). Due to limited
data availability, we have set this rate to a flat value of 3.0%, after a discussion
with a panel of experienced GPs.
Standard lapse and conversion models deployed by personal lines pricing actuaries use
logistic regression to predict the yearly lapse probability of each policyholder. The
variables used in such regression models consist of policyholder demographics, policyholder
purchasing behaviour and market competitiveness.
In our problem it is clear that the risk of entering and dropping out of the pool is not
uniform among patients. Age is indeed a systematic risk factor, but we did not have the data
to build predictive models with covariates for lapse and conversion rates, and we therefore
chose a flat lapse rate to model drop-out for reasons other than death. Even though this
approach is simple, it still permits simulating the patient flows of an open collectivity.
As the aim of the paper is to demonstrate the feasibility of the process, we did not seek
data sets that completely match the real problem. The PDS and VDS come from a German study
on the yearly number of visits to GPs conducted in the 1980s. We have assumed that the number
of visits to the doctor is a perfect proxy for the number of drug prescriptions and that the
populations sampled in the PDS and VDS data sets are representative of the target population,
that is northern Italy NHI patients. On the other hand, the CDS represents a sample of drug
prescription amounts collected in Spring 2011 thanks to the cooperation of a drugstore in
Nibionno (Italy). The number and the cost data sets were not collected on the same subjects.
This does not limit the analysis, as the cost distribution has been assumed independent of the
distribution of the number of drug prescriptions once the effect of structural variables (like
age and sex) is taken into account. Nevertheless, the data sources employed allowed us to
adequately illustrate the operative methodology discussed in Section 2.5.
3.2 Predictive models estimation
GAMLSS can be fitted by means of an R package ([18]).
Since the purpose of this article is to illustrate the application of an actuarial
methodology to a health economics problem, the modelling stage has been kept simple and
follows an approach resembling the usual pricing practice in non-life insurance.
The average number of drug prescriptions in the PDS is 3.33 and the corresponding standard
deviation is 6.03. The average sampled cost of a drug prescription is 20.3 and the
corresponding standard deviation is 24.1.
Two predictive models, for $\tilde{n}_i$ and $\tilde{c}_{ij}$, were fitted using the GAMLSS
framework. The model building process consisted of experimenting with and assessing different
distributional assumptions for the dependent variables, the significance of candidate
predictors and their functional relationship within the regression equation, as described in
[17]. Finally, the following decisions were taken with respect to the selected models:
- The negative binomial has been chosen as the underlying distribution for the frequency of
prescriptions, while the inverse Gaussian has been chosen as the underlying distribution for
the cost of a single drug prescription. They were parametrized using formulas 7 and 9
respectively.
- Cubic splines have been used in both the frequency and the cost model to handle non-linear
marginal relationships between the continuous covariates and the dependent variables.
Splines are suggested in applied statistical modelling (see e.g. [12]) to overcome
the naive assumption of marginal linearity. Another approach to this issue, widely
used in personal lines rate-making (as described in [25]), consists of binning continuous
variables into categorical variables with properly chosen brackets and using the binned
variables in the regression models.
- In this exploratory study, we have performed no analysis of interactions between predictors.
- An exposure variable has been added to the data set, with constant value $e_i = 1$ (assuming
every patient of the PDS to have been observed for a whole year without censoring).
The regression model for the number of prescriptions had a $\ln(e_i)$ term as offset. This
offset term had to be inserted into the formula explicitly, as required by the GAMLSS package.
Even if $e_i = 1$ for all records in the PDS, within the simulation process described below
$e_i$ becomes a random variable that takes into account the open collectivity structure.
Inspection of the marginal effects plots for the $\mu$ parameter of the frequency and cost
models in Figures 1 and 2 leads to the following conclusions:
- The relationship between age and $\tilde{n}_i$ is positive and almost linear.
- Females receive more drug prescriptions than males.
- The relationship between handicap percentage and the number of drug prescriptions is
positive and non-linear.
- The relationship between income and the number of drug prescriptions is negative and almost
linear.
- The cost of prescriptions seems to have a parabolic relationship with age: it increases
sharply, peaks at around 55 years and then seems to drop.
As previously noted, the most relevant advantage of GAMLSS models is that parameters other
than $\mu$ can be fitted as a function of covariates, as shown in equation 4. The analysis
has shown that modelling $\sigma(\tilde{n}_i)$ as a function of the patient's age improves the
GAIC goodness of fit index appreciably. On the other hand, the goodness of fit did not improve
when a regression relationship between $\sigma(\tilde{c}_i)$ and either age or sex was
introduced.
The diagnostic plots of the GAMLSS models for the number and cost of prescriptions are
reported in Figures 3 and 4 respectively. GAMLSS diagnostic plots show the normalized quantile
residuals plotted against the fitted values and against the position in the data base, the
kernel density of the residuals and a normal QQ-plot. Normalized quantile residuals are a
generalized version of residuals [8] that follow a normal distribution by construction. They
are useful to assess the correctness of the probabilistic distribution of the model being
tested. See [17] for further details.
The residual analysis of the frequency model in Figure 3 shows that the body of the
distribution has been fitted fairly well, while the fit of the rightmost tail is not perfect
because of residual over-dispersion. The diagnostic plot of the cost model in Figure 4 shows
that the chosen probabilistic distribution fits the empirical data very well.
Fig. 1 Drug prescriptions frequency model marginal effects plot, parameter $\mu$
Fig. 2 Drug prescriptions costs model marginal effects plot, parameter $\mu$
Fig. 3 GAMLSS diagnostic output of drug prescriptions frequency model
3.3 The simulation process
The distribution of the yearly cost of drug prescriptions for the TDS has initially been
simulated as follows:
1. The TDS has been duplicated into two distinct data sets: the first represents the patients
in force at the beginning of the period (henceforth IFP), the second represents the
patients (henceforth NP) that enter the GP's pool after the beginning of the period.
2. The following steps have been repeated m = 1, . . . , M = 1000 times in order to simulate
the distribution of the total drug prescription cost of the pool of patients:
  (a) The exposure in terms of patient/years, $\tilde{E}$, has been determined for the rows of
  both the IFP and the NP data sets, as follows:
    - For IFP patients, one number $I_i$ has been drawn from a Bernoulli variable with
    probability equal to $q_i^{(d)} + q_i^{(l)}$, where $q_i^{(d)}$ and $q_i^{(l)}$ represent
    the probability of lapse due to death and to other causes respectively. Because of the
    limitations of the collected data, the model assumes that only age and sex affect the
    lapse probability, with the contribution of other causes set at the flat value
    $q_i^{(l)} = 0.02$. If $I_i = 1$, the yearly exposure of the $i$-th patient is drawn from
    a uniform [0, 1] distribution. The exposure of the IFP data set records is then expressed
    as $e_i = \mathbf{1}(I_i = 0) \cdot 1 + U(0, 1) \cdot \mathbf{1}(I_i = 1)$.
    - For the NP data set, the exposure has been determined by first sampling, for each row, a
    number $\widetilde{in}_i$ from a Poisson distribution with rate parameter 0.03.
    $\widetilde{in}_i$ represents the number of patients with the same demographic
    characteristics as the patient in the $i$-th row that enter the data set within the year.
    For each row, the convolution approach has been applied to determine the total exposure
    of the new incoming patients, by sampling $\widetilde{in}_i$ outcomes from a uniform
    [0, 1] distribution.
  (b) Predict $E[\tilde{n}_i]$, $\operatorname{var}[\tilde{n}_i]$, $E[\tilde{c}_i]$ and
  $\operatorname{var}[\tilde{c}_i]$ for each row of the IFP and NP data sets using the GAMLSS
  models calibrated in the previous step. $\tilde{n}_i$ and $\tilde{c}_i$ are then fully
  defined, since both the $\mu$ and the $\sigma$ parameters are known for each of them.
  (c) Apply the convolution process to $\tilde{n}_i$ and $\tilde{c}_i$ to determine the total
  cost of drug prescriptions in the year, as shown in formula 2. The parameters of the number
  and cost distributions have been estimated in the previous step.
  (d) Sum the simulated amounts $\tilde{t}_i$, numbers $\tilde{n}_i$ and exposures $e_i$ over
  the IFP and NP data sets and then add the two subtotals in order to determine the yearly
  patient exposure $\tilde{E}$, number of prescriptions $\tilde{N}$ and total cost $\tilde{T}$
  for the analysed pool of patients.
Fig. 4 GAMLSS diagnostic output of drug prescriptions costs model
The object-oriented structure of R makes it possible to perform the simulation process using
the predict methods applied to the GAMLSS regression models estimated in Section 3.2. The
simulation steps have proved quite slow: several hours were needed to simulate the yearly
total expenditure of the 600-patient group of the hypothetical general practitioner XY using
just M = 750 simulations on a standard desktop PC. A short-cut would therefore be useful to
apply the proposed approach operationally.
In [15] the log-normal distribution has been suggested to fit the total loss distribution of
a personal line non-life portfolio. We followed this suggestion and fitted the log-normal
distribution to the total prescription cost distribution simulated in the previous step
by the Monte Carlo approach. The R fitdistrplus package [7] was used to estimate the
parameters and to assess the goodness of fit, both graphically and using suitable statistical
tests (Anderson-Darling and Kolmogorov-Smirnov).
Fig. 5 General practitioner XY yearly total cost of drug prescriptions, log-normal distribution fit
The fitting results in Figure 5 show that the log-normal distribution provides a very good
fit of $\tilde{T}$. Moreover, the p-values of the two goodness of fit tests were not
significant. Therefore, if the parameters of the log-normal distribution of $\tilde{T}$ were
known in advance, there would be no need to conduct a time-consuming Monte Carlo simulation
to assess the distribution of $\tilde{T}$. We will show that it is possible to know these
parameters in advance. As $\tilde{T}$ is the sum of independent $\tilde{t}_i$, equation 12
follows.
\[ E[\tilde{T}] = \sum_{i=1}^{N} E[\tilde{t}_i], \qquad \operatorname{var}[\tilde{T}] = \sum_{i=1}^{N} \operatorname{var}[\tilde{t}_i] \tag{12} \]
Moreover, each $\tilde{t}_i$ represents an outcome of a compound distribution. Following [4],
the expected value and the variance of $\tilde{t}_i$ can be obtained in closed form from
equation ??. All terms in ?? are obtained from the previously fitted GAMLSS models.
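For reference, the standard risk theory result for the moments of a compound sum is reproduced here. It is a textbook identity, stated under the assumption that the single prescription costs of patient $i$ are independent, identically distributed and independent of the number of prescriptions; whether it coincides with the equation whose reference is missing above cannot be confirmed from the text.

```latex
\begin{aligned}
E[\tilde{t}_i] &= E[\tilde{n}_i] \, E[\tilde{c}_i], \\
\operatorname{var}[\tilde{t}_i] &= E[\tilde{n}_i] \, \operatorname{var}[\tilde{c}_i]
  + \operatorname{var}[\tilde{n}_i] \, \left(E[\tilde{c}_i]\right)^{2}.
\end{aligned}
```

With the parametrizations (7) and (9), the four quantities on the right-hand side are $\mu$, $\mu + \sigma\mu^{2}$ for the frequency model and $\mu$, $\sigma^{2}\mu^{3}$ for the cost model, each evaluated at the parameters predicted for patient $i$.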
Since the theoretical expected value and variance of $\tilde{T}$ are known, the parameters of
the log-normal approximation of the total cost distribution can be evaluated directly using
the method of moments formulas 13. The direct estimation of the parameters $\mu_T$ and
$\sigma_T$ completely defines the distribution of $\tilde{T}$.
\[ \begin{aligned} \mu_T &= \ln\left(E(T)\right) - \frac{1}{2} \ln\left(1 + \frac{\operatorname{var}(T)}{E^{2}(T)}\right) \\ \sigma_T^{2} &= \ln\left(1 + \frac{\operatorname{var}(T)}{E^{2}(T)}\right) \end{aligned} \tag{13} \]
Therefore the total cost distribution can be almost perfectly approximated using a quite
simple analytical distribution.
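The method of moments in formulas 13 takes only a few lines of code. This is a minimal sketch, assuming the Python standard library; the function names are illustrative. As inputs it uses the simulated mean and standard deviation of the total cost reported in Table 3, purely to show the mechanics; in the shortcut described above the inputs would be the theoretical moments from equation 12.

```python
import math
from statistics import NormalDist


def lognormal_params(mean, variance):
    """Method of moments of formulas (13): returns (mu_T, sigma_T)."""
    sigma2 = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def lognormal_quantile(p, mu, sigma):
    """p-th quantile of a log-normal distribution with parameters mu and sigma."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


# Inputs taken from Table 3 (simulated mean and standard deviation of the total cost).
mu_T, sigma_T = lognormal_params(39967.79, 2721.93 ** 2)
q995 = lognormal_quantile(0.995, mu_T, sigma_T)
print(mu_T, sigma_T, q995)
```

The resulting 99.5% quantile can then be compared with the simulated percentile in Table 3 as a rough check of the approximation.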
3.4 Results
The outlined algorithm has been applied to the TDS data set, which contains the demographic
data of the 600 patients of general practitioner XY. Tables 1, 2 and 3 show key statistics of
the patient/years, the number of prescriptions and the total cost of prescriptions for this
general practitioner. The 99.5% percentile has been added for the number and the total amount.
Such a figure may be used to budget and monitor the drug prescription expenditure of GP XY.
mean      Q1        Q3
602.84    599.69    606.10
Table 1 Doctor XY patient/years distribution

mean       SD        Q1         Q3         p99.5
1952.96    125.33    1868.50    2038.00    2260.52
Table 2 Doctor XY number of prescriptions distribution

mean        SD         Q1          Q3          p99.5
39967.79    2721.93    38182.61    41814.40    46118.52
Table 3 Doctor XY total cost of prescriptions distribution
4 Conclusions and further research
4.1 Discussion of results
This article has shown how non-life actuarial techniques can be successfully applied to a
health economics problem. We have used GAMLSS to evaluate the frequency and the cost
of drug prescriptions following an approach closely resembling personal line rate-making.
The predictive models we propose can be used to assess and explain which demographic
risk factors significantly affect the number and the cost of drug prescriptions paid by the NHI.
Moreover, the convolution approach of collective risk theory has been used to assess
the distribution of the yearly expenditure of a GP's pool of patients. The assessment of the
total cost distribution can be used to monitor the prescriptions granted by the GP using a
statistically grounded approach.
A relevant limitation of the modelling approach followed is that pandemic events are
not handled properly, as each patient is assumed independent of the others. Pandemic
events would affect many patients at the same time through contagion, especially if they are
spatially close (as with catastrophic insurance losses). On the other hand, seasonal diseases
do not represent an issue because of the yearly period of observation (fractional exposures
have appeared to be a small issue in this problem).
The approach followed in this paper has modelled all prescriptions granted by the GP, without
creating sub-models, e.g. for disease groups. Further subdivisions of drug prescriptions
might be of interest to investigate more deeply the risk factors influencing the frequency
and the cost of homogeneous groups of drug expenditure.
We think that the most valuable use of the proposed model within health economics
would be a rationale assessment of the standard cost of drug prescription for a GP pool
of patient. Assessing

T and t
i
distributions would permit to obtain statistics useful for the
planning and budgeting process like:
- The expected value and any desired dispersion measures.
- Extreme percentiles (e.g. 99th), that may be used as a threshold for further actions in
  order to investigate potential inefficiencies or abuses.
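The statistics listed above can be read off a simulated total cost distribution. The following is a minimal sketch only: it replaces the convolution with a plain Monte Carlo simulation, and it assumes Poisson prescription counts and gamma costs per prescription with made-up parameters, not the GAMLSS distributions fitted in the paper. All function names and values are hypothetical.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_pool_cost(rng, patients):
    """One simulated yearly total cost for a pool of independent patients.

    Each patient is a (lam, shape, scale) tuple: the expected number of
    prescriptions and the gamma parameters of the cost per prescription.
    """
    total = 0.0
    for lam, shape, scale in patients:
        n_prescriptions = sample_poisson(rng, lam)
        total += sum(rng.gammavariate(shape, scale) for _ in range(n_prescriptions))
    return total


def summarize(costs, level=0.99):
    """Mean, standard deviation and empirical percentile of simulated costs."""
    ordered = sorted(costs)
    index = min(len(ordered) - 1, math.ceil(level * len(ordered)) - 1)
    return {
        "mean": statistics.fmean(ordered),
        "sd": statistics.stdev(ordered),
        "percentile": ordered[index],
    }


# Hypothetical pool: 50 identical patients with made-up parameters (not from the paper).
rng = random.Random(2011)
pool = [(6.0, 2.0, 10.0)] * 50
simulated = [simulate_pool_cost(rng, pool) for _ in range(1000)]
summary = summarize(simulated)
```

In practice each patient's tuple would come from the fitted frequency and severity models evaluated at that patient's demographic profile, so the pool need not be homogeneous as it is here.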
If the predictive models were calibrated on a certified sample, they could be used to
estimate the standard cost of yearly drug prescriptions for any GP's pool of patients, given the
patients' demographics. The use of standard costs for government-provided services has
been gaining importance in the period of budget pressure that Italy and many
OECD countries are facing. At the same time, the model yields
the percentiles of the drug prescription cost distribution, which can be used to monitor expenditures,
e.g. by prioritizing routine audits of individual GPs' drug prescriptions.
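One possible way to turn the percentiles into an audit ranking is sketched below. It assumes that a simulated total cost distribution is already available for each GP; the GP identifiers, the 1% threshold and the input values are hypothetical and serve only to illustrate the idea.

```python
from bisect import bisect_right


def exceedance_probability(simulated_costs, observed_cost):
    """Share of simulated yearly totals strictly above the observed total."""
    ordered = sorted(simulated_costs)
    return 1.0 - bisect_right(ordered, observed_cost) / len(ordered)


def audit_priority(gps, threshold=0.01):
    """Return the GP ids whose observed cost lies in the upper tail, most extreme first.

    `gps` maps a GP id to a (simulated_costs, observed_cost) pair.
    """
    tail = {gp: exceedance_probability(sim, obs) for gp, (sim, obs) in gps.items()}
    flagged = [gp for gp, p in tail.items() if p <= threshold]
    return sorted(flagged, key=lambda gp: tail[gp])


# Hypothetical inputs: the same simulated totals 1..1000 for each GP, made-up observed costs.
simulated = list(range(1, 1001))
gps = {"GP_A": (simulated, 500), "GP_B": (simulated, 995), "GP_C": (simulated, 1000)}
flagged = audit_priority(gps)
```

A GP is flagged here only when the observed yearly cost exceeds the model's 99th percentile, which keeps the audit list short and ordered by how unusual the expenditure is under the model.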
Moreover, the proposed methodology can easily be used to estimate drug prescription
costs taking into account inflation and changes in the coverage offered by the NHI, such as the
application of a yearly deductible or a coinsurance percentage. On the actuarial
side of the analysis, another relevant application lies in the estimation of the multi-year
actuarial present value of the drug prescription costs of any patient, given his or her demographic
profile. The GAMLSS models for the number and the cost of prescriptions let us obtain a yearly
average total cost (a pure premium) $pr_x$ for any patient, as a function of age $x, x+1, \ldots$
and of other demographic variables. Once assumptions have been made about the future inflation
rate $i_t$, the financial discount factor $v^t$ and the survival probability ${}_tp_x$, the lifetime
actuarial present value of the drug prescription cost for any patient in the pool can be expressed
by formula 14, where $\omega$ denotes the limiting age of the mortality table.

$$c_i = \sum_{t=0}^{\omega - x} (1+i_t)^t \, {}_tp_x \, v^t \, pr_{x+t} \qquad (14)$$
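Formula 14 can be evaluated numerically once the assumptions are fixed. The sketch below is illustrative only: it assumes a constant yearly inflation rate and a flat discount rate, and the pure premiums and survival probabilities are made-up values rather than model output or life table figures. The function name is hypothetical.

```python
def lifetime_apv(pure_premiums, survival_probs, inflation, discount_rate):
    """Actuarial present value of future yearly prescription costs.

    pure_premiums[t]  : expected yearly cost at age x + t, in today's prices
    survival_probs[t] : probability that a patient aged x survives t years
    inflation         : assumed constant yearly inflation rate
    discount_rate     : assumed flat yearly financial rate, so v = 1 / (1 + rate)
    """
    v = 1.0 / (1.0 + discount_rate)
    return sum(
        (1.0 + inflation) ** t * tpx * v ** t * pr
        for t, (pr, tpx) in enumerate(zip(pure_premiums, survival_probs))
    )


# Hypothetical three-year horizon (values are illustrative, not from the paper).
premiums = [100.0, 110.0, 120.0]
survival = [1.0, 0.99, 0.97]
apv = lifetime_apv(premiums, survival, inflation=0.02, discount_rate=0.03)
```

When the inflation rate equals the discount rate the two factors cancel, and the value reduces to the survival-weighted sum of the pure premiums, which is a convenient sanity check.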
4.2 Further research
Finally, the proposed approach can certainly be applied to more traditional actuarial
applications such as personal lines rate-making or capital modelling.
The model discussed shows a rational approach to assessing the distribution of a general
practitioner's drug prescription costs. Actuarial techniques common in general insurance pricing
and risk management practice, as well as GAMLSS models, a frontier method in regression
modelling, have been applied to a health economics problem.
As a further research direction, predictive models more flexible than the standard log-linear
regression framework should be tested in order to better assess the frequency and the cost
of prescriptions. More refined models for patient lapses and patient conversions could be built,
following what is done in pricing optimization for personal lines rate-making [23], once
the data availability issue is solved.
We suggest expanding the study by increasing the sample of physicians analysed and of
patients' transactions. The use of GP-level data would be a valuable improvement in the
explanatory power of the data set. In fact, the literature (e.g. [21]) has shown that GP characteristics
such as length of practice affect the outcome significantly. Including GP-level variables in the
model could therefore increase the consistency of the model estimates and improve the model's
explanatory and predictive power.
Acknowledgements The author wishes to thank Dr. Stefania Giacalone for providing the drug
prescription cost data and Simona Minotti for her outstanding contribution in reviewing the document.
The data analysis in this paper was performed with R, a statistical software released under the GNU
General Public License (GPL). For more information on R, the interested reader is referred to R Development
Core Team [16].
References
1. Duncan Anderson, Sholom Feldblum, Claudine Modlin, Doris Schirmacher, Ernesto Schirmacher, and
Neeza Thandi. A practitioner's guide to generalized linear models. Technical report, Casualty Actuarial
Society, 2007.
2. CEIOPS. QIS5 technical specifications, July 2010.
3. Cermlab. Alla ricerca di standard per la sanità federalista.
http://www.cermlab.it/argomenti.php?group=sanita&item=43, 02 2010.
4. C.D. Daykin, T. Pentikäinen, and M. Pesonen. Practical risk theory for actuaries. Monographs on
statistics and applied probability. Chapman & Hall, 1994.
5. Piet de Jong and Gillian Heller. Generalized linear models for insurance data. Cambridge University
Press, New York, first edition, 2008.
6. Piet de Jong, Mikis Stasinopoulos, Robert Stasinopoulos, and Gillian Heller. Mean and dispersion
modeling for policy claim cost. Scandinavian Actuarial Journal, 2007.
7. Marie Laure Delignette-Muller, Regis Pouillot, Jean-Baptiste Denis, and Christophe Dutang. fitdistrplus:
help to fit of a parametric distribution to non-censored or censored data, 2010. R package version 0.1-3.
8. Peter Dunn and Gordon K. Smyth. Randomized quantile residuals. J. Computat. Graph. Statist,
5:236-244, 1996.
9. E. Fernandez-Liz, P. Modamio, A. Catalan, C. F. Lastra, T. Rodriguez, and E. L. Marino. Identifying
how age and gender influence prescription drug use in a primary health care environment in Catalonia,
Spain. Br J Clin Pharmacol, 65:407-417, Mar 2008.
10. M. Garcia-Goni and P. Ibern. Predictability of drug expenditures: an application using morbidity data.
Health Econ, 17:119-126, Jan 2008.
11. Professor W. Greene. German health care usage data. online:
http://pages.stern.nyu.edu/~wgreene/Econometrics/PanelDataSets.htm, 1997.
12. Frank E. Harrell. title. Technical report, Vanderbilt University School of Medicine, 2011.
13. Istat. Geodemo istat: Tavole di mortalità regionali, 2011. Online; accessed 26-June-2011.
14. S.A. Klugman, H.H. Panjer, and G.E. Willmot. Loss Models: From Data to Decisions (Book, Solutions
Manual, and ExamPrep). John Wiley & Sons, 2009.
15. D.E. Papush, G.S. Patrik, and F. Podgaits. Approximations of the aggregate loss distribution. In CAS
Forum, pages 175-186, 2001.
16. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria, 2010. ISBN 3-900051-07-0.
17. Bob Rigby and Mikis Stasinopoulos. A flexible regression approach using GAMLSS in R, 11 2009.
18. R. A. Rigby and D. M. Stasinopoulos. Generalized additive models for location, scale and shape (with
discussion). Applied Statistics, 54:507-554, 2005.
19. Robert Rigby and Mikis Stasinopoulos. Generalized additive models for location, scale and shape (with
discussion). Applied Statistics, 54:507-554, 2005.
20. Nino Savelli and Gianpaolo Clemente. Hierarchical structures in the aggregation of premium risk for
insurance underwriting. Scandinavian Actuarial Journal, 1:1, 2010.
21. G. Simon, C. Francescutti, S. Brusin, and F. Rosa. Variation in drug prescription costs and general
practitioners in an area of north-east Italy. The use of current data. Epidemiol Prev, 18:224-229, Dec
1994.
22. Giorgio Alfredo Spedicato. Solvency II premium risk modeling under the direct compensation CARD
system. PhD thesis, La Sapienza, Università di Roma, 2011.
23. James Tanser. Pretium manual. Towers Watson, 3.1 edition, 2010.
24. Gary Venter. Mortality trend models. Casualty Actuarial Society Forum, 1:1, 2011.
25. Geoff Werner and Claudine Modlin. Basic Ratemaking, 2009.
26. Keith Wilson-Davis and William G. Stevenson. Predicting prescribing costs: A model of Northern Ireland
general practices. Pharmacoepidemiology and Drug Safety, 1(6):341-345, 1992.